GSoC Project

May 20, 2017

The project I will be working on this (G) summer (oC)

Overview

The motivation here is to extend the functionality of vulture as a library, pass all of its metadata through the API, and then harness that API in VultureBear to remove dead code automatically, which would greatly optimise the bear. The second part of the project focuses on reporting the source range of dead code, which would make auto-removal much easier; as of now, vulture only supplies the line where the dead code begins. The project also proposes to enhance vulture to detect unreachable code (such as if False, the else branch of if True, or any code written after a return statement), which would help users trim down their codebase without affecting usability. The third part is to implement a confidence value for every result, which would help when tackling false positives.

Project Architecture

Goals

Specifications

1.) Realise vulture’s API in VultureBear

Extending vulture’s API: This would allow the user to find all unused code through a single abstraction layer, get_unused_code. The strategy here would be to parse all files directly from dict(filename: filecontent). This would considerably improve the performance of the bear later on, since we save the time otherwise spent copying memory.

Return a sorted list of tuples [(item.filename, item.lineno, item.typ, item), ...], which would be easy to configure. This can be implemented readily, given the already existing Vulture.scan(), Vulture.report(), Vulture.unused_funcs(), etc.
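To make the shape of such an entry point concrete, here is a minimal, self-contained sketch. It does not use vulture at all: the function body is a toy stand-in built on the standard ast module, and only the name get_unused_code, the dict(filename: filecontent) input and the sorted list of tuples come from the proposal. The fourth tuple element is just the name here, whereas the proposal would carry the item object itself.

```python
import ast


def get_unused_code(files):
    """Toy stand-in: report functions and classes that are never referenced.

    `files` maps filename -> file content (str), as in the proposal.
    Returns a sorted list of (filename, lineno, typ, name) tuples.
    """
    defined = []  # every function/class definition we come across
    used = set()  # every name or attribute that is read somewhere

    for filename, content in files.items():
        tree = ast.parse(content, filename=filename)
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                defined.append((filename, node.lineno, "function", node.name))
            elif isinstance(node, ast.ClassDef):
                defined.append((filename, node.lineno, "class", node.name))
            elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
                used.add(node.id)
            elif isinstance(node, ast.Attribute):
                used.add(node.attr)

    # Tuples sort by filename first, then by line number.
    return sorted(item for item in defined if item[3] not in used)


files = {
    "b.py": "def used():\n    pass\n\ndef unused():\n    pass\n\nused()\n",
    "a.py": "class Orphan:\n    pass\n",
}
for filename, lineno, typ, name in get_unused_code(files):
    print(filename, lineno, typ, name)
```

Because the contents arrive as strings, nothing is read from disk, which is the saving the strategy above is after.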

Enhance VultureBear

Refactor VultureBear to fetch results directly through get_unused_code (the API), thus making it more efficient: files would be passed in memory, which removes an extra layer of parsing.

Further enhancements in vulture (detecting unreachable code and reporting ranges of dead code) would influence the API, which would in turn require refactoring the bear.

2.) Making whitelist default and extending it further

The first step here would be to make the whitelist the default. The important thing would be to identify possible cases that might cause vulture to report a false positive. This can be achieved through extensive testing on major projects; trending Python projects on GitHub would serve the purpose. This approach would bring several benefits:

We can identify what should ideally be in our whitelist file, since we may come across lesser-known constructs.

We can test vulture for any unreported bugs.

We can find many projects which use, or might want to use, vulture; they may further collaborate with us in building the whitelists together. (As proposed by @jendrikseipp)

3.) Acquiring source range and implementing auto-removal

Analyse and discuss with the community whether ast or an enhanced pyflakes AST would better cater to our problem and offer simplicity for source-range acquisition, and arrive at a concrete conclusion. There was also another proposal by @m0hawk: take everything until the next node starts. Dialogue here - #25

Also, once the source range can be fetched successfully, implement the pathway through which the metadata flows in and out of the API. This would not require much work, because we can easily change item.lineno (int) to item.dead_range (tuple of ints) and pass it on to VultureBear.
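The "everything until the next node starts" idea can be sketched with the standard ast module alone. This is only an illustration under my own assumptions: statement_ranges is a hypothetical helper, it looks at top-level statements only, and it trims trailing blank lines, which the proposal does not specify. The resulting (start, end) pairs are the kind of tuple of ints that item.dead_range could hold.

```python
import ast


def statement_ranges(source):
    """Map each top-level statement to a (first_line, last_line) tuple.

    A statement is assumed to run until the line before the next
    statement starts; the last one runs to the end of the file.
    """
    lines = source.splitlines()
    body = ast.parse(source).body
    ranges = {}
    for index, node in enumerate(body):
        start = node.lineno
        if index + 1 < len(body):
            end = body[index + 1].lineno - 1
        else:
            end = len(lines)
        # Do not count trailing blank lines as part of the statement.
        while end > start and not lines[end - 1].strip():
            end -= 1
        # Use the definition name when there is one, else the node type.
        name = getattr(node, "name", type(node).__name__)
        ranges[name] = (start, end)
    return ranges


source = "def a():\n    return 1\n\n\ndef b():\n    return 2\n"
print(statement_ranges(source))
```

The approach is crude: comments, and on recent Python versions decorators, that sit directly above the next node would be attributed to the preceding statement, which is part of what the discussion would need to settle.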

4.) Detecting unreachable code

We would first need to identify cases where code cannot be reached. Some of the common ones are:

Similar constructs would have to be looked into. The crude form of this would be:
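A minimal sketch of such a check, assuming Python 3.8 or newer (where literals such as False parse as ast.Constant). It is not vulture's implementation; find_unreachable and the reason strings are hypothetical names chosen for illustration. It covers the cases named in the overview: a branch guarded by a constant condition, and statements that follow a return (plus raise, continue and break, which I added as similar constructs).

```python
import ast

# Statements after which nothing else in the same block can run.
TERMINATORS = (ast.Return, ast.Raise, ast.Continue, ast.Break)


def find_unreachable(source):
    """Return a sorted list of (lineno, reason) for unreachable code."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        # Case 1: `if <constant>:` makes one of its branches dead.
        if isinstance(node, ast.If) and isinstance(node.test, ast.Constant):
            dead = node.orelse if node.test.value else node.body
            if dead:
                findings.append((dead[0].lineno, "constant condition"))

        # Case 2: anything following a terminator in the same block.
        for field in ("body", "orelse", "finalbody"):
            block = getattr(node, field, None)
            if not isinstance(block, list):
                continue  # e.g. a lambda body is an expression, not a block
            for stmt, following in zip(block, block[1:]):
                if isinstance(stmt, TERMINATORS):
                    reason = "after " + type(stmt).__name__.lower()
                    findings.append((following.lineno, reason))
                    break  # report only the first dead line per block
    return sorted(findings)


source = (
    "def f(x):\n"
    "    if False:\n"
    "        print('a')\n"
    "    if True:\n"
    "        x += 1\n"
    "    else:\n"
    "        x -= 1\n"
    "    return x\n"
    "    print('b')\n"
)
for lineno, reason in find_unreachable(source):
    print(lineno, reason)
```

Only literal conditions are recognised here; something like `if DEBUG:` with a module-level constant would need real data-flow analysis and is out of scope for a crude first version.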

5.) Implementing a confidence value for results

We would need to analyse every construct individually, on a case-by-case basis. For example, we already know that import statements can be predicted with 100% certainty (except for * imports, where it would be 0%), but functions often produce false positives.

The confidence values would be along the lines of those given below. (The finer-grained distinctions will need further discussion.)

Table 1: Tentative confidence values per construct

Construct            Confidence Value
import               100%
from foo import *    0%
variable             <100%
function             <100%
class                <100%
if False             100%
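One way such values could be put to use is as a reporting threshold. The sketch below is hypothetical throughout: Finding, CONFIDENCE and filter_by_confidence are names I made up, and the value 60 is an arbitrary placeholder for the "<100%" rows, since the actual numbers are still to be discussed.

```python
from collections import namedtuple

# Hypothetical record for one reported piece of dead code.
Finding = namedtuple("Finding", "filename lineno typ name")

# Arbitrary placeholder for the "<100%" rows; the real values are undecided.
PLACEHOLDER = 60

CONFIDENCE = {
    "import": 100,
    "star_import": 0,       # from foo import *
    "variable": PLACEHOLDER,
    "function": PLACEHOLDER,
    "class": PLACEHOLDER,
    "unreachable": 100,     # e.g. the body of `if False`
}


def filter_by_confidence(findings, min_confidence):
    """Keep only findings whose construct meets the threshold."""
    return [
        finding
        for finding in findings
        if CONFIDENCE.get(finding.typ, 0) >= min_confidence
    ]


findings = [
    Finding("a.py", 1, "import", "os"),
    Finding("a.py", 2, "star_import", "foo"),
    Finding("a.py", 5, "function", "helper"),
    Finding("a.py", 9, "unreachable", "if False"),
]
for finding in filter_by_confidence(findings, 100):
    print(finding.lineno, finding.typ, finding.name)
```

With a threshold of 100, only the results that are safe to remove automatically would be passed on, while a lower threshold would surface the likelier false positives for manual review.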


Milestones

PREPARATION/BONDING

CODING PHASE 1

CODING PHASE 2

CODING PHASE 3

Thank you for reading along. Please feel free to tweet to @rahul722j, reach out to me at rahul722j@gmail.com, or just comment below with any queries.

GSoC Project - May 20, 2017 - Rahul Jha