Phase-1
June 24, 2017
Phase 1 of the coding period ended on 26th June, 23:30 GMT+5:30. With this post, I would like to reflect upon the development progress so far and share some of the challenges I faced.
Overview
I had the following things to tackle:
- Make vulture available as an API
- Refactor VultureBear accordingly
- Add an option to exclude some files from being analysed by vulture
- Implement confidence values for results given by vulture
- Make use of the stdlib whitelist in every run by default
- Add appveyor CI
Making vulture available as an API
Thanks to the already implemented Vulture.Item class and vulture.scavenge method, almost nothing had to be done on vulture's side to make it usable as an API. We only had to change the way VultureBear calls vulture.
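For reference, here is a minimal sketch of what calling vulture programmatically looks like, based on the vulture 0.x attributes used later in this post (the file name is only an illustration):

from vulture import Vulture

# Run vulture over a list of files and inspect the collected items directly,
# instead of parsing command-line output.
vulture = Vulture()
vulture.scavenge(['some_module.py'])

for item in vulture.unused_funcs + vulture.unused_vars:
    print(item.filename, item.lineno, item.typ, item)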
Refactor VultureBear
Previously, VultureBear created a Popen instance, executed vulture over the input files and piped the output. The output was then parsed for information (filename, lineno, type, etc.) using regular expressions.
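Roughly, that old approach looked something like the sketch below; the exact command, output format and regex are only illustrative, not the original code:

import re
import subprocess

# Run vulture as an external process and capture its textual output.
process = subprocess.Popen(['vulture', 'some_module.py'],
                           stdout=subprocess.PIPE,
                           universal_newlines=True)
output, _ = process.communicate()

# Parse lines of the form "path/to/file.py:42: Unused function 'foo'".
pattern = re.compile(r'(?P<file>.+):(?P<line>\d+): (?P<message>.+)')
for line in output.splitlines():
    match = pattern.match(line)
    if match:
        print(match.group('file'), match.group('line'), match.group('message'))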
You can read more about it here - Meeting Jendrik
Add an option to exclude files
There are a variety of reasons why people might want VultureBear to skip some files, for example:
- vulture reports false positives, and they are fed up with the errors.
- Some configuration files define environment variables, which vulture reports as dead code.
- The author plans to use the specified code later.
Previously, I had vulture's exclude option in mind, but then @jayvdb pointed out that coala already has an ignore option. After further discussion, the issue was marked as already resolved. :-) Thanks, @jayvdb!
Implement confidence values for results given by vulture
This was an important change, as it would have changed the format in which vulture reports its results.
In a prior meeting with @jendrikseipp, it was decided that imports would be reported with 100% confidence and everything else with 70%. There was also some discussion about a --min-confidence flag, which would filter the results and only keep those whose confidence is at least the given value. But after further thought, the idea of making this change upstream was dropped and the CONFIDENCE_MAP was implemented in VultureBear itself. The implementation is quite simple:
from vulture import Vulture

from coalib.bears.GlobalBear import GlobalBear
from coalib.results.Result import Result
# Depending on the coala version, PipRequirement lives either here or under
# coalib.bears.requirements.
from dependency_management.requirements.PipRequirement import PipRequirement

# Confidence assigned to each kind of unused code reported by vulture.
CONFIDENCE_MAP = {
    'attribute': 70,
    'class': 70,
    'function': 70,
    'import': 95,
    'property': 70,
    'variable': 70,
}


def _find_unused_code(filenames):
    """
    :param filenames: List of filenames to check.
    :return: Generator of Result objects.
    """
    def file_lineno(item):
        return (item.filename.lower(), item.lineno)

    vulture = Vulture()
    vulture.scavenge(filenames)
    for item in sorted(vulture.unused_funcs + vulture.unused_imports +
                       vulture.unused_props + vulture.unused_vars +
                       vulture.unused_attrs, key=file_lineno):
        message = 'Unused {0}: {1}'.format(item.typ, item)
        yield Result.from_values(origin='VultureBear',
                                 message=message,
                                 file=item.filename,
                                 line=item.lineno,
                                 confidence=CONFIDENCE_MAP[item.typ])


class VultureBear(GlobalBear):
    LANGUAGES = {'Python', 'Python 3'}
    REQUIREMENTS = {PipRequirement('vulture', '0.14.0')}
    AUTHORS = {'The coala developers'}
    AUTHORS_EMAILS = {'coala-devel@googlegroups.com'}
    LICENSE = 'AGPL-3.0'
    ASCIINEMA_URL = 'https://asciinema.org/a/82256'
    CAN_DETECT = {'Unused Code'}
    SEE_MORE = 'https://github.com/jendrikseipp/vulture'

    def run(self):
        """
        Check Python code for unused variables and functions using `vulture`.
        """
        filenames = list(self.file_dict.keys())
        return _find_unused_code(filenames)
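As a quick sanity check, the helper can also be exercised outside of coala, assuming the snippet above is importable and the given file exists:

for result in _find_unused_code(['some_module.py']):
    print(result.message)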
Default whitelisting
The idea here was to take a whitelist into account by default in every run. For this, some groundwork needed to be laid first. It seemed like a daunting task at first because it needed a major change in the way vulture was shipped: it had to be shipped as a package, so that any additional package data, like whitelists, could be bundled with it.
On the contrary, this turned out to be relatively easy: we just needed to rename some files and change the directory structure.
Now only the whitelists needed to be bundled. For this, I added the following to setup.py:
package_data={'vulture': ['whitelists/**/*.py']},
But no matter what, an error kept cropping up:
no whitelists/ file were found in the package-dir
Now, here comes the confusing part: package_data does not support recursive glob patterns, and this isn't mentioned anywhere in the documentation. :-( I found it buried in a comment on Stack Overflow. Anyway, removing the ** made it work.
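Presumably the working entry then looked something like this (the exact pattern depends on how the whitelists directory is laid out):

package_data={'vulture': ['whitelists/*.py']},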
Finally, we could add the whitelists to the list of modules being scanned. @jendrikseipp suggested a much more efficient way to append the core whitelist data, and I was finally able to merge the PR. :-)
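One possible way to locate the bundled whitelists at runtime looks roughly like the sketch below; this is only an illustration of the idea, not the exact code that was merged:

import glob
import os.path


def get_whitelist_files():
    """Return the whitelist files shipped inside the vulture package."""
    package_dir = os.path.dirname(os.path.abspath(__file__))
    return glob.glob(os.path.join(package_dir, 'whitelists', '*.py'))

These files can then simply be appended to the list of paths that vulture scavenges.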
Here are the commits:
Add appveyor CI
After going through the AppVeyor documentation more than twice, I still had no idea what the constructs of appveyor.yml (the configuration file for AppVeyor) should be. Then it came to me: just copy the file from coala-bears and make suitable changes.
Here’s the appveyor file for coala-bears:
environment:
  global:
    # SDK v7.0 MSVC Express 2008's SetEnv.cmd script will fail if the
    # /E:ON and /V:ON options are not enabled in the batch script intepreter
    # See: http://stackoverflow.com/a/13751649/163740
    CMD_IN_ENV: "cmd /E:ON /V:ON /C .\\.ci\\run_with_env.cmd"

  matrix:
    - PYTHON: "C:\\Python34"
      PYTHON_VERSION: "3.4"
      PYTHON_ARCH: "32"

    - PYTHON: "C:\\Python34-x64"
      PYTHON_VERSION: "3.4"
      PYTHON_ARCH: "64"

cache:
  - "C:\\pip_cache"
  - "node_modules"
  - "C:\\Users\\appveyor\\AppData\\Local\\coala-bears\\coala-bears"
  - "C:\\Users\\appveyor\\AppData\\Roaming\\nltk_data"

branches:
  except:
    - /^sils\/.*/

install:
  # Prepend newly installed Python to the PATH of this build (this cannot be
  # done from inside the powershell script as it would require to restart
  # the parent CMD process).
  - "SET PATH=%PYTHON%;%PYTHON%\\Scripts;%PATH%"
  - "SET PATH=C:\\Program\ Files\\Java\\jdk1.7.0\\bin;%PATH%"

  # language-tool needs the registry tweaked here since it determines the java
  # version wrong (since appveyor has both, 1.7 and 1.8 in x86 and x64).
  - "SET KEY_NAME=HKLM\\Software\\JavaSoft\\Java Runtime Environment"
  - "REG add \"%KEY_NAME%\" /v CurrentVersion /t REG_SZ /d 1.7 /f"

  # Check that we have the expected version and architecture for Python
  - "python --version"
  - "python -c \"import struct; print(struct.calcsize('P') * 8)\""

  - >
    %CMD_IN_ENV% pip install
    --cache-dir=C:\\pip_cache -r requirements.txt -r test-requirements.txt

  - ps: "Install-Product node ''"  # Use latest node v5.x.x
  - "npm config set loglevel warn"
  - "npm install"

build: false  # Not a C# project, build stuff at the test step instead.

test_script:
  # Force DOS format, as Checkstyle configs enable NewlineAtEndOfFile,
  # which defaults to CRLF on Windows, and Appveyor CI ignores .gitattributes
  # http://help.appveyor.com/discussions/problems/5687-gitattributes-changes-dont-have-any-effect
  - unix2dos tests/java/test_files/CheckstyleGood.java

  # Clang DLLs x64 were nowadays installed, but the x64 version hangs, so we
  # exclude according tests. See https://github.com/appveyor/ci/issues/495 and
  # https://github.com/appveyor/ci/issues/688
  - >
    %CMD_IN_ENV% python -m pytest
    --cov -k "not ClangASTPrintBear and not ClangCloneDetectionBear and
    not ClangComplexityBear and not ClangCountVectorCreator and
    not ClangCountingConditions"

  - "%CMD_IN_ENV% python setup.py install"

on_success:
  - codecov

on_failure:
  - codecov

matrix:
  fast_finish: true
After tweaking it a little, my version looked like this:
environment:
  global:
    # SDK v7.0 MSVC Express 2008's SetEnv.cmd script will fail if the
    # /E:ON and /V:ON options are not enabled in the batch script intepreter
    # See: http://stackoverflow.com/a/13751649/163740
    CMD_IN_ENV: "cmd /E:ON /V:ON /C .\\.ci\\run_with_env.cmd"

  matrix:
    - PYTHON: "C:\\Python34"
      PYTHON_VERSION: "3.4"
      PYTHON_ARCH: "32"

    - PYTHON: "C:\\Python34-x64"
      PYTHON_VERSION: "3.4"
      PYTHON_ARCH: "64"

cache:
  - "C:\\pip_cache"

install:
  # Prepend newly installed Python to the PATH of this build (this cannot be
  # done from inside the powershell script as it would require to restart
  # the parent CMD process).
  - "SET PATH=%PYTHON%;%PYTHON%\\Scripts;%PATH%"

  # Check that we have the expected version and architecture for Python
  - "python --version"
  - "python -c \"import struct; print(struct.calcsize('P') * 8)\""

  - >
    %CMD_IN_ENV% pip install
    --cache-dir=C:\\pip_cache tox

build: false  # Not a C# project, build stuff at the test step instead.

test_script:
  # tox takes care of everything, from installing to running tests and even
  # reporting the code coverage.
  - >
    %CMD_IN_ENV% tox

matrix:
  fast_finish: true
But, sadly, this didn't work. :-( Then an option called Ignore YAML caught my eye, which meant there was another way to do this: AppVeyor's helper GUI. The plan became to configure everything there and then export the settings as a YAML file. After browsing through every setting and ten failed builds, the eleventh one finally finished successfully, and I was all set to celebrate. :-)
Here's what the current appveyor.yml file looks like:
version: 1.0.{build}

install:
  - cmd: pip install tox

build_script:
  - cmd: python setup.py install

test_script:
  - cmd: tox

matrix:
  fast_finish: true
I was finally able to get everything working in the given time. Here's what my burndown chart looks like for Phase 1:
Special thanks to @jendrikseipp and @jayvdb. Thank You! :-)