- Name: Yashu Garg (@yashugarg)
- Organization: Python Software Foundation
- Sub-organization: CVE Binary Tool
- Project: Improving code coverage and Implementing fuzzing
- Proposal: View/Download
Code coverage is a measure of which lines of code a test suite executes. Codecov keeps track of the code coverage reported for every commit. Before the GSoC period started, the total test coverage was slightly below 80%.
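The sketch below is a rough illustration of what "which lines a test suite executes" means. It records executed line numbers with Python's standard `sys.settrace` hook. It is a toy: real coverage tools such as coverage.py are far more complete. `classify` and `run_with_line_coverage` are hypothetical names chosen for illustration.

```python
import sys
from types import FrameType
from typing import Any, Callable, Optional


def classify(n: int) -> str:
    # Toy function under test with three branches.
    if n < 0:
        return "negative"
    if n == 0:
        return "zero"
    return "positive"


def run_with_line_coverage(func: Callable[..., Any], *args: Any) -> set[int]:
    """Call func(*args) and return the line numbers executed inside func."""
    target = func.__code__
    executed: set[int] = set()

    def tracer(frame: FrameType, event: str, arg: Any) -> Optional[Callable[..., Any]]:
        # Ignore frames that do not belong to the function being measured.
        if frame.f_code is not target:
            return None
        if event == "line":
            executed.add(frame.f_lineno)
        return tracer  # keep receiving events for this frame

    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(None)
    return executed


if __name__ == "__main__":
    one_test = run_with_line_coverage(classify, 5)
    all_tests: set[int] = set()
    for value in (-1, 0, 5):
        all_tests |= run_with_line_coverage(classify, value)
    print(f"one test executed {len(one_test)} lines, three tests executed {len(all_tests)}")
```

A single test such as `classify(5)` leaves the `"negative"` and `"zero"` return lines unexecuted. Gaps like these are what new unit tests are written to close.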
The goal was to increase the tool's code coverage to 95% or more, which involved:
- Writing new unit tests and improving the existing ones.
- Improving the test harnesses and CI infrastructure to cover everything.
- Successfully added unit tests for modules that previously had no code coverage, and revisited and optimized existing tests. The tool's total test coverage crossed 90%.
- PRs:
  - intel/cve-bin-tool#1683 (tests for `.zst` and `.pkg` file extractors)
  - intel/cve-bin-tool#1709 (unit tests for `format_checkers` script)
  - intel/cve-bin-tool#1720 (tests to cover all execution paths in `extractor` module)
  - intel/cve-bin-tool#1737 (unit tests for `csv2cve.py`)
  - intel/cve-bin-tool#1739 (unit tests for `version.py`)
  - intel/cve-bin-tool#1758 (Python package tests in `language_scanner`)
  - intel/cve-bin-tool#1770 (tests to cover all methods in the `scanner` module)
  - intel/cve-bin-tool#1778 (include `intermediate_report` in `output_html` tests)
- Successfully added a new CI job to run Long Tests on Windows, which included writing tests to cover the Windows-specific code paths and optimizing the code to make a few Long Tests pass on all operating systems.
- PRs:
The graph shows a growth pattern in code coverage during Phase 1 of GSoC.
- #1720 took exceptionally long to implement, requiring me to cover all possible execution paths for both operating systems.
- With new code added regularly, keeping the test coverage percentage up was also challenging.
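One common technique for reaching OS-specific branches without switching machines is to patch `sys.platform` during the test. This was not necessarily the approach used in #1720. The sketch uses the standard library's `unittest.mock`; pytest-mock's `mocker` fixture exposes the same patching API as `mocker.patch.object`. `default_cache_dir` and its paths are hypothetical. Mocking only exercises branch logic, which is why running the tests on real Windows runners in CI is still valuable.

```python
import sys
import unittest
from unittest import mock


def default_cache_dir() -> str:
    # Hypothetical function with a Windows-specific branch.
    if sys.platform.startswith("win"):
        return "C:\\cve-cache"
    return "/tmp/cve-cache"


class DefaultCacheDirTest(unittest.TestCase):
    def test_windows_branch(self) -> None:
        # Pretend to be on Windows regardless of the host OS.
        with mock.patch.object(sys, "platform", "win32"):
            self.assertEqual(default_cache_dir(), "C:\\cve-cache")

    def test_posix_branch(self) -> None:
        # Pretend to be on Linux regardless of the host OS.
        with mock.patch.object(sys, "platform", "linux"):
            self.assertEqual(default_cache_dir(), "/tmp/cve-cache")
```

Both `python -m unittest` and pytest discover `unittest.TestCase` subclasses, so these tests run under either runner.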
Fuzz testing, or fuzzing, is an automated software testing technique that feeds random, malformed, or invalid data to a program. The goal of fuzzing is to find bugs and vulnerabilities in the program. The team decided to use Atheris as the project's primary fuzzing tool.
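A minimal Atheris harness looks roughly like the sketch below. `atheris.instrument_all`, `atheris.Setup`, and `atheris.Fuzz` are part of Atheris's public API. `parse_report` is a hypothetical stand-in for one of the tool's parsers, not cve-bin-tool code. The harness treats a clean `ValueError` rejection of malformed input as expected behavior, and any other exception surfaces as a crash.

```python
import json
import sys


def parse_report(text: str) -> dict:
    # Hypothetical stand-in for one of the tool's input parsers.
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("report must be a JSON object")
    return data


def TestOneInput(data: bytes) -> None:
    # Atheris calls this entry point once per generated input.
    text = data.decode("utf-8", errors="replace")
    try:
        parse_report(text)
    except ValueError:
        # Clean rejection of malformed input is expected behavior, not a bug
        # (json.JSONDecodeError is a subclass of ValueError).
        pass
    # Any other exception propagates and Atheris reports it as a crash.


def main() -> None:
    # Imported here so the harness logic above can be exercised without Atheris.
    import atheris

    atheris.instrument_all()  # add coverage feedback for already-loaded Python code
    atheris.Setup(sys.argv, TestOneInput)
    atheris.Fuzz()
```

With Atheris installed, call `main()` from the script's `if __name__ == "__main__":` block. Extra command-line arguments are passed through to libFuzzer, so a flag such as `-runs=10000` bounds the fuzzing session.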
The goal was to implement fuzzing in the tool to find bugs and vulnerabilities, which involved:
- Set up fuzzing infrastructure.
- Implement structure-aware fuzzing for JSON inputs in merged reports and for the various input formats of CycloneDX and SPDX SBOMs.
- Successfully implemented structure-aware fuzzing from scratch to test different parsers and input paths in the tool.
- PRs:
  - intel/cve-bin-tool#1873 (tried out structure-aware fuzzing using `protobuf`)
  - intel/cve-bin-tool#1888 (test merged reports with fuzzy JSON inputs)
  - intel/cve-bin-tool#1898 (added default values for `protobuf` message fields)
  - intel/cve-bin-tool#1906 (restructured the fuzzing code for better readability and CI integration)
  - intel/cve-bin-tool#1923 (fuzzed Python and Linux package-list parsers)
  - intel/cve-bin-tool#1924 (fuzz tested CycloneDX SBOM parser)
With the fuzzing infrastructure in place, the tool now supports fuzzing for various input formats and parsers. The fuzz tests are still a work in progress and cover a relatively small percentage of the code. So far they have not uncovered any new bugs, which likely reflects the project's sound code quality and the robust parsing tools it uses.
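Structure-aware fuzzing turns the fuzzer's raw bytes into inputs that are already well-formed. Mutations then reach the parser's logic instead of being rejected as invalid JSON. The project's PRs describe the input structure with `protobuf` messages. The sketch below swaps in plain dataclasses with default values and a small byte reader, so it runs with the standard library alone. The byte reader is a simplified stand-in for `atheris.FuzzedDataProvider`. The report fields and `parse_findings` are hypothetical, not cve-bin-tool's actual merged-report schema.

```python
import json
from dataclasses import asdict, dataclass, field


class ByteReader:
    """Consumes fuzzer bytes deterministically; a tiny stand-in for
    atheris.FuzzedDataProvider."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0

    def byte(self) -> int:
        # Yield 0 once input is exhausted, so every input maps to a document.
        if self.pos >= len(self.data):
            return 0
        value = self.data[self.pos]
        self.pos += 1
        return value

    def text(self, max_len: int = 16) -> str:
        # A length byte followed by that many bytes; latin-1 decodes any byte.
        length = self.byte() % (max_len + 1)
        return bytes(self.byte() for _ in range(length)).decode("latin-1")


@dataclass
class Finding:
    # Hypothetical fields; defaults play the role of protobuf field defaults.
    product: str = ""
    version: str = ""
    cve_number: str = ""
    score: float = 0.0


@dataclass
class Report:
    findings: list[Finding] = field(default_factory=list)


def build_report(data: bytes) -> Report:
    # Derive a structurally valid report from arbitrary fuzzer bytes.
    reader = ByteReader(data)
    report = Report()
    for _ in range(reader.byte() % 4):  # 0 to 3 findings
        report.findings.append(
            Finding(
                product=reader.text(),
                version=reader.text(),
                cve_number="CVE-" + reader.text(max_len=8),
                score=reader.byte() / 25.5,  # maps 0..255 onto 0.0..10.0
            )
        )
    return report


def parse_findings(text: str) -> list[dict]:
    # Hypothetical parser under test: expects {"findings": [{...}, ...]}.
    doc = json.loads(text)
    return [f for f in doc["findings"] if f["score"] >= 0]


def fuzz_one_input(data: bytes) -> None:
    # Every generated input is valid JSON with the expected shape, so the
    # fuzzer's mutations reach parse_findings' logic, not json.loads errors.
    parse_findings(json.dumps(asdict(build_report(data))))
```

Byte-level mutations map to small structural changes: one more finding, a longer product name, or a different score. This keeps the fuzzer's search close to the shape of real reports.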
- Setting up fuzzing from scratch was a challenge in itself. I had no experience with `atheris`, but the mentors helped me immensely during this process.
- It was my first time writing unit tests in Python; I learned about `pytest-mock` and implemented mock tests from scratch.
- I also learned about `protobuf` while setting up structure-aware fuzzing in the project.
- Besides soft skills like communication and time management, the program also taught me a lot about best coding practices and open source etiquette.
You can find a detailed description of my progress and work in my weekly blogs.
I plan on contributing significantly to the project after the GSoC period. Things I plan to do:
- Working with SBOMs and improving the tool to support more formats.
- Adding more tests to cover the code paths that are not covered yet.
- Fuzz testing the project for data vulnerability formats.
I am thankful to Google, the Python Software Foundation, and Intel for providing me with this excellent opportunity, and to my mentors, Terri Oda, Suhail, and Anthony Harrison, who guided me throughout the program.
I also want to thank my fellow GSoC contributors Rhythm and Anant and the cve-bin-tool community for helping me during the program.