@bilderbuchi
Created November 7, 2017 13:57
"""Identify redundant tests from coverage data.
STATUS: Rough draft, not working 100% correctly, probably badly documented.
This needs smother (https://github.com/ChrisBeaumont/smother), a tool using coverage,
and offering a pytest plugin, which reports on which source section is covered by which tests.
Usage:
First run your test suite (probably pytest) to generate smother info for the module under
test, e.g. in this case smother itself:
$ pytest --smother=smother [...]
This creates a `.smother` file containing the data (looks like JSON)
Next, you create a csv which contains (source, test) pairs clustered by semantic code sections
$ smother --semantic csv report.csv
Now you can run this script which parses the `report.csv` and reports on which tests can be removed
(if any) without decreasing coverage.
Useful commands to run
pytest --smother=smother --tb=no --cov-branch [test set]
smother to_coverage && coverage report [html]
smother --semantic csv report.csv && python ../find_redundant_tests.py
"""
import csv
import logging
from collections import defaultdict
# smother csv data looks like this:
# source_context, test_context
# \smother\control:Smother.save_context,
# \smother\control:Smother.start,
# \smother\control:Smother.write,
# \smother\control:Smother.write,smother/tests/test_cli.py::test_combine
# \smother\control:Smother.write,smother/tests/test_cli.py::test_combine_different_root
# \smother\control:Smother.write,smother/tests/test_controller.py::test_append
# \smother\control:Smother.write_coverage,
# \smother\control:Smother.write_coverage,smother/tests/test_controller.py::test_write_coverage
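# To make the csv format concrete, here is a tiny, self-contained illustration
# (sample rows assumed, not taken from a real report.csv) of how such rows map
# into a {section: [tests]} dict, mirroring the parsing loop below. The aliased
# imports and underscore names are only there to avoid clashing with the script.
import csv as _csv_demo
import io as _io_demo
from collections import defaultdict as _defaultdict_demo

_sample = (
    "source_context,test_context\n"
    "smother/control:Smother.write,smother/tests/test_cli.py::test_combine\n"
    "smother/control:Smother.save_context,\n"
)
_demo_sections = _defaultdict_demo(list)
_demo_reader = _csv_demo.reader(_io_demo.StringIO(_sample))
next(_demo_reader)  # skip the header row
for _demo_row in _demo_reader:
    if _demo_row[1]:  # only keep rows that name a covering test
        _demo_sections[_demo_row[0]].append(_demo_row[1])
# Smother.save_context has no covering test, so only Smother.write survives.
assert dict(_demo_sections) == {
    'smother/control:Smother.write': ['smother/tests/test_cli.py::test_combine']
}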
LOGGER = logging.getLogger(__name__)
# Read data into dict with section names as keys, list of related tests as values
sections = defaultdict(list)
all_sections = []
with open('./report.csv', newline='') as f:
    reader = csv.reader(f)
    next(reader)  # skip header row
    for row in reader:
        all_sections.append(row[0])
        if row[1]:  # there is actual content in the test column
            sections[row[0]].append(row[1])
del reader
all_sections = set(all_sections)
# Create tests dict with tests as keys, list of related sections as values
tests = defaultdict(list)
for k, v in sections.items():
    for test in v:
        tests[test].append(k)
sections_set = set(sections.keys())
tests_set = set(tests.keys())
def impact(sectionlist, duration=1):
    """Return the impact of a certain test, given its duration and list of touched sections.

    This uses a simple metric where the impact of the test is considered proportional to
    the number of code sections touched, and inversely proportional to the test duration,
    if provided.
    Later it could be expanded to also take the length of the code sections into account.
    """
    return len(sectionlist) / duration
impactdict = {k: impact(v) for k, v in tests.items()}
# create a list of sections sorted by coverage multiplier, i.e. the number of tests covering a
# particular section
sections_sorted = sorted(sections_set, key=lambda ele: len(sections[ele]))
def most_impactful_test(target, sections, impacts):
    """Return the test that has the most impact of all tests touching the target section."""
    candidates = sorted(sections[target], key=lambda t: impacts[t], reverse=True)
    return candidates[0]
# The meat of the script:
# For the full set of sections and tests, successively go through the sections,
# starting with those covered only once and working in order of increasing
# coverage multiplier. Identify the most impactful test touching each section,
# then remove that test (i.e. regard it as essential for testing) together with
# all the sections it touches.
# In the end all sections are consumed, and the tests remaining in tests_to_drop
# are exactly those that can be dropped without impacting coverage.
tests_to_drop = set(tests_set)
sections_to_consume = set(sections_set)
for s in sections_sorted:
    if s in sections_to_consume:  # skip sections that have been consumed already
        t = most_impactful_test(s, sections, impactdict)
        sections_to_consume -= set(tests[t])
        tests_to_drop.discard(t)
assert len(sections_to_consume) == 0, 'Not all sections have been consumed!'
message = 'Could drop %i of %i tests without impacting coverage, remaining tests are:'
print(message % (len(tests_to_drop), len(tests_set)))
for t in sorted(tests_set - tests_to_drop):
    print(t)
print('For pytest invocation, remaining tests:')
print(' '.join(sorted(tests_set - tests_to_drop)))
print('For pytest invocation, all tests:')
print(' '.join(sorted(tests_set)))
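# --- Optional self-check (not part of the original gist): the greedy selection
# above, re-packaged as a function so it can be exercised on toy data without a
# report.csv. The name _greedy_keep and the toy sections/impacts are
# illustrative only; the logic mirrors the loop above.
def _greedy_keep(section_to_tests, impacts):
    """Return the set of tests to keep under the greedy strategy used above."""
    from collections import defaultdict
    test_to_sections = defaultdict(list)
    for sec, tlist in section_to_tests.items():
        for t in tlist:
            test_to_sections[t].append(sec)
    keep = set()
    to_consume = set(section_to_tests)
    # visit sections in order of increasing coverage multiplier
    for sec in sorted(section_to_tests, key=lambda s: len(section_to_tests[s])):
        if sec in to_consume:
            best = max(section_to_tests[sec], key=lambda t: impacts[t])
            to_consume -= set(test_to_sections[best])
            keep.add(best)
    return keep

# Section 'a' is only hit by t1, so t1 must be kept; t2 touches no section
# beyond what t1 already covers, so only t1 survives.
assert _greedy_keep({'a': ['t1'], 'b': ['t1', 't2']}, {'t1': 2, 't2': 1}) == {'t1'}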