Phase-1
June 24, 2017
Phase 1 of the coding period ended on 26th June, 23:30 GMT+5:30. With this post, I would like to reflect upon the development progress so far and share some of the challenges I faced.
Overview
I had the following things to tackle:
- Make vulture available as an API
- Refactor VultureBear accordingly
- Add an option to exclude some files from being analysed by vulture
- Implement confidence values for results given by vulture
- Make use of the stdlib whitelist in every run by default
- Add appveyor CI
Making vulture available as an API
Thanks to the already implemented Vulture.Item class and vulture.scavenge method, almost nothing had to be done on vulture's side to make it usable as an API. We only had to change the way VultureBear calls vulture.
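For reference, here is a minimal sketch of what calling vulture programmatically looks like, based on the vulture 0.x attributes used later in this post (the file name is only an illustration):

from vulture import Vulture

# Run vulture over a list of files and inspect the collected items directly,
# instead of parsing command-line output.
vulture = Vulture()
vulture.scavenge(['some_module.py'])

for item in vulture.unused_funcs + vulture.unused_vars:
    print(item.filename, item.lineno, item.typ, item)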
Refactor VultureBear
Previously, VultureBear created a Popen instance, executed vulture over the input files and piped the output. The output was then parsed for information (filename, lineno, type, etc.) using regular expressions.
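Roughly, that old approach looked something like the sketch below; the exact command, output format and regex are only illustrative, not the original code:

import re
import subprocess

# Run vulture as an external process and capture its textual output.
process = subprocess.Popen(['vulture', 'some_module.py'],
                           stdout=subprocess.PIPE,
                           universal_newlines=True)
output, _ = process.communicate()

# Parse lines of the form "path/to/file.py:42: Unused function 'foo'".
pattern = re.compile(r'(?P<file>.+):(?P<line>\d+): (?P<message>.+)')
for line in output.splitlines():
    match = pattern.match(line)
    if match:
        print(match.group('file'), match.group('line'), match.group('message'))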
You can read more about it here - Meeting Jendrik
Add an option to exclude files
There are a variety of reasons why people might want VultureBear to skip some files, for example:
- vulture reports false positives, and they are fed up with the errors.
- Some configuration files define environment variables, which vulture reports as dead code.
- The author plans to use the specified code later.
Previously, I had vulture's exclude option in mind, but then @jayvdb pointed out that coala already has an ignore option. After further discussion, the issue was marked as already resolved. :-) Thanks, @jayvdb!
Implement confidence values for results given by vulture
This was an important change, as it would have changed the format in which vulture reports its results.
In a prior meeting with @jendrikseipp, it was decided that imports would be reported with 100% confidence and everything else with 70%. There was also some discussion about a --min-confidence flag, which would filter the results and only keep those whose confidence is at least the given value. But after further thought, the idea of making this change upstream was dropped and the CONFIDENCE_MAP was implemented in VultureBear itself. The implementation is quite simple:
from vulture import Vulture

from coalib.bears.GlobalBear import GlobalBear
from coalib.results.Result import Result
# Depending on the coala version, PipRequirement lives either here or under
# coalib.bears.requirements.
from dependency_management.requirements.PipRequirement import PipRequirement

# Confidence assigned to each kind of unused code reported by vulture.
CONFIDENCE_MAP = {
    'attribute': 70,
    'class': 70,
    'function': 70,
    'import': 95,
    'property': 70,
    'variable': 70,
}


def _find_unused_code(filenames):
    """
    :param filenames: List of filenames to check.
    :return: Generator of Result objects.
    """
    def file_lineno(item):
        return (item.filename.lower(), item.lineno)

    vulture = Vulture()
    vulture.scavenge(filenames)
    for item in sorted(vulture.unused_funcs + vulture.unused_imports +
                       vulture.unused_props + vulture.unused_vars +
                       vulture.unused_attrs, key=file_lineno):
        message = 'Unused {0}: {1}'.format(item.typ, item)
        yield Result.from_values(origin='VultureBear',
                                 message=message,
                                 file=item.filename,
                                 line=item.lineno,
                                 confidence=CONFIDENCE_MAP[item.typ])


class VultureBear(GlobalBear):
    LANGUAGES = {'Python', 'Python 3'}
    REQUIREMENTS = {PipRequirement('vulture', '0.14.0')}
    AUTHORS = {'The coala developers'}
    AUTHORS_EMAILS = {'coala-devel@googlegroups.com'}
    LICENSE = 'AGPL-3.0'
    ASCIINEMA_URL = 'https://asciinema.org/a/82256'
    CAN_DETECT = {'Unused Code'}
    SEE_MORE = 'https://github.com/jendrikseipp/vulture'

    def run(self):
        """
        Check Python code for unused variables and functions using `vulture`.
        """
        filenames = list(self.file_dict.keys())
        return _find_unused_code(filenames)
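As a quick sanity check, the helper can also be exercised outside of coala, assuming the snippet above is importable and the given file exists:

for result in _find_unused_code(['some_module.py']):
    print(result.message)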
Default whitelisting
The idea here was to take a whitelist into account by default in every run. For this, some groundwork needed to be laid first. It seemed like a daunting task at first because it needed a major change in the way vulture was shipped: it had to be shipped as a package, so that any additional package data, like whitelists, could be bundled with it.
On the contrary, this turned out to be relatively easy: we just needed to rename some files and change the directory structure.
Now only the whitelists needed to be bundled. For this, I added the following to setup.py:
package_data={'vulture': ['whitelists/**/*.py']},
But no matter what, an error kept cropping up:
no whitelists/ file were found in the package-dir
Now, here comes the confusing part: package_data does not support recursive glob patterns, and this isn't mentioned anywhere in the documentation. :-( I found it buried in a comment on Stack Overflow. Anyway, removing the ** made it work.
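Presumably the working entry then looked something like this (the exact pattern depends on how the whitelists directory is laid out):

package_data={'vulture': ['whitelists/*.py']},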
Finally, we could add the whitelists to the list of modules being scanned. @jendrikseipp suggested a much more efficient way to append the core whitelist data, and I was finally able to merge the PR. :-)
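One possible way to locate the bundled whitelists at runtime looks roughly like the sketch below; this is only an illustration of the idea, not the exact code that was merged:

import glob
import os.path


def get_whitelist_files():
    """Return the whitelist files shipped inside the vulture package."""
    package_dir = os.path.dirname(os.path.abspath(__file__))
    return glob.glob(os.path.join(package_dir, 'whitelists', '*.py'))

These files can then simply be appended to the list of paths that vulture scavenges.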
Here are the commits:
Add appveyor CI
After going through the AppVeyor documentation more than twice, I still had no idea what the constructs of appveyor.yml (the configuration file for AppVeyor) should be. Then it came to me: just copy the file from coala-bears and make suitable changes.
Here’s the appveyor file for coala-bears:
environment:
  global:
    # SDK v7.0 MSVC Express 2008's SetEnv.cmd script will fail if the
    # /E:ON and /V:ON options are not enabled in the batch script intepreter
    # See: http://stackoverflow.com/a/13751649/163740
    CMD_IN_ENV: "cmd /E:ON /V:ON /C .\\.ci\\run_with_env.cmd"

  matrix:
    - PYTHON: "C:\\Python34"
      PYTHON_VERSION: "3.4"
      PYTHON_ARCH: "32"

    - PYTHON: "C:\\Python34-x64"
      PYTHON_VERSION: "3.4"
      PYTHON_ARCH: "64"

cache:
  - "C:\\pip_cache"
  - "node_modules"
  - "C:\\Users\\appveyor\\AppData\\Local\\coala-bears\\coala-bears"
  - "C:\\Users\\appveyor\\AppData\\Roaming\\nltk_data"

branches:
  except:
    - /^sils\/.*/

install:
  # Prepend newly installed Python to the PATH of this build (this cannot be
  # done from inside the powershell script as it would require to restart
  # the parent CMD process).
  - "SET PATH=%PYTHON%;%PYTHON%\\Scripts;%PATH%"
  - "SET PATH=C:\\Program\ Files\\Java\\jdk1.7.0\\bin;%PATH%"

  # language-tool needs the registry tweaked here since it determines the java
  # version wrong (since appveyor has both, 1.7 and 1.8 in x86 and x64).
  - "SET KEY_NAME=HKLM\\Software\\JavaSoft\\Java Runtime Environment"
  - "REG add \"%KEY_NAME%\" /v CurrentVersion /t REG_SZ /d 1.7 /f"

  # Check that we have the expected version and architecture for Python
  - "python --version"
  - "python -c \"import struct; print(struct.calcsize('P') * 8)\""

  - >
    %CMD_IN_ENV% pip install
    --cache-dir=C:\\pip_cache -r requirements.txt -r test-requirements.txt

  - ps: "Install-Product node ''"  # Use latest node v5.x.x
  - "npm config set loglevel warn"
  - "npm install"

build: false  # Not a C# project, build stuff at the test step instead.

test_script:
  # Force DOS format, as Checkstyle configs enable NewlineAtEndOfFile,
  # which defaults to CRLF on Windows, and Appveyor CI ignores .gitattributes
  # http://help.appveyor.com/discussions/problems/5687-gitattributes-changes-dont-have-any-effect
  - unix2dos tests/java/test_files/CheckstyleGood.java

  # Clang DLLs x64 were nowadays installed, but the x64 version hangs, so we
  # exclude according tests. See https://github.com/appveyor/ci/issues/495 and
  # https://github.com/appveyor/ci/issues/688
  - >
    %CMD_IN_ENV% python -m pytest
    --cov -k "not ClangASTPrintBear and not ClangCloneDetectionBear and
    not ClangComplexityBear and not ClangCountVectorCreator and
    not ClangCountingConditions"

  - "%CMD_IN_ENV% python setup.py install"

on_success:
  - codecov

on_failure:
  - codecov

matrix:
  fast_finish: true
After tweaking it a little, my version looked like this:
environment:
  global:
    # SDK v7.0 MSVC Express 2008's SetEnv.cmd script will fail if the
    # /E:ON and /V:ON options are not enabled in the batch script intepreter
    # See: http://stackoverflow.com/a/13751649/163740
    CMD_IN_ENV: "cmd /E:ON /V:ON /C .\\.ci\\run_with_env.cmd"

  matrix:
    - PYTHON: "C:\\Python34"
      PYTHON_VERSION: "3.4"
      PYTHON_ARCH: "32"

    - PYTHON: "C:\\Python34-x64"
      PYTHON_VERSION: "3.4"
      PYTHON_ARCH: "64"

cache:
  - "C:\\pip_cache"

install:
  # Prepend newly installed Python to the PATH of this build (this cannot be
  # done from inside the powershell script as it would require to restart
  # the parent CMD process).
  - "SET PATH=%PYTHON%;%PYTHON%\\Scripts;%PATH%"

  # Check that we have the expected version and architecture for Python
  - "python --version"
  - "python -c \"import struct; print(struct.calcsize('P') * 8)\""

  - >
    %CMD_IN_ENV% pip install
    --cache-dir=C:\\pip_cache tox

build: false  # Not a C# project, build stuff at the test step instead.

test_script:
  # tox takes care of everything, from installing to running tests and even
  # reporting the code coverage.
  - >
    %CMD_IN_ENV% tox

matrix:
  fast_finish: true
But, sadly, this didn't work. :-( Then an option called Ignore YAML caught my eye, which meant there was another way to do this: AppVeyor's helper GUI. The plan became to configure everything there and then export the settings as a YAML file. After browsing through every setting and ten failed builds, the eleventh one finally finished successfully, and I was all set to celebrate. :-)
Here's what the current appveyor.yml file looks like:
version: 1.0.{build}

install:
  - cmd: pip install tox

build_script:
  - cmd: python setup.py install

test_script:
  - cmd: tox

matrix:
  fast_finish: true
I was finally able to get everything working in the given time. Here's what my burndown chart looks like for Phase 1:
Special thanks to @jendrikseipp and @jayvdb. Thank You! :-)