Google Summer of Code Final Work Report


Name: Yashu Garg (@yashugarg)
Organization: Python Software Foundation
Sub-organization: CVE Binary Tool
Project: Improving code coverage and Implementing fuzzing
Proposal: View/Download


Phase-1: Improving code coverage

Summary

Code coverage is a metric that expresses which lines of code a test suite executes. Codecov tracks the code coverage reported for every commit. Before the GSoC period started, the total test coverage was slightly below 80%.
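The sketch below shows what "lines executed by a test suite" means in practice, using only the standard library. `sys.settrace` records which lines of a toy function run, and `dis.findlinestarts` lists the lines that contain executable code. `classify`, `executable_lines`, `run_with_line_coverage`, and `line_coverage` are hypothetical names chosen for illustration. A real project would rely on a dedicated tool such as coverage.py to collect this data and on Codecov to track it.

```python
import dis
import sys


def classify(n: int) -> str:
    # Toy function under test with two execution paths.
    if n < 0:
        return "negative"
    return "non-negative"


def executable_lines(func) -> set[int]:
    """Source lines of `func` that carry bytecode, excluding the `def` line."""
    code = func.__code__
    return {
        lineno
        for _, lineno in dis.findlinestarts(code)
        if lineno is not None and lineno != code.co_firstlineno
    }


def run_with_line_coverage(func, *args) -> set[int]:
    """Call func(*args) and return the line numbers of `func` that ran."""
    target = func.__code__
    hit: set[int] = set()

    def tracer(frame, event, arg):
        # The global tracer is invoked for each new frame ('call' event);
        # returning `tracer` enables per-line events only for the target.
        if frame.f_code is not target:
            return None
        if event == "line":
            hit.add(frame.f_lineno)
        return tracer

    previous = sys.gettrace()  # restore any tracer already installed
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)
    return hit


def line_coverage(func, calls) -> float:
    """Fraction of executable lines in `func` hit by the given argument tuples."""
    lines = executable_lines(func)
    hit: set[int] = set()
    for args in calls:
        hit |= run_with_line_coverage(func, *args)
    return len(hit & lines) / len(lines)


print(line_coverage(classify, [(5,)]))         # only the non-negative path runs
print(line_coverage(classify, [(5,), (-1,)]))  # both paths run
```

A single call with `(5,)` reports partial coverage because the `return "negative"` line never runs. Adding a call with `(-1,)` exercises the other branch as well. Closing gaps of this kind is what new unit tests do.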
Aim

Increase the tool's code coverage to 95% or more, which included:

Writing new unit tests and improving the existing ones.
Improving the test harnesses and CI infrastructure so that all supported code paths are tested.

Tasks Achieved

Writing and improving unit tests.


Added unit tests for modules that previously had no code coverage, and revisited and optimized the existing tests. As a result, the tool's total test coverage crossed 90%.
PRs:

intel/cve-bin-tool#1683 (tests for .zst and .pkg file extractors)
intel/cve-bin-tool#1709 (unit tests for format_checkers script)
intel/cve-bin-tool#1720 (tests to cover all execution paths in extractor module)
intel/cve-bin-tool#1737 (unit tests for csv2cve.py)
intel/cve-bin-tool#1739 (unit tests for version.py)
intel/cve-bin-tool#1758 (python package tests in language_scanner)
intel/cve-bin-tool#1770 (tests to cover all methods in the scanner module)
intel/cve-bin-tool#1778 (include intermediate_report in output_html tests)


Working on tests and CI for Windows


Added a new CI job to run the Long Tests on Windows. This included writing tests to cover Windows-specific code paths and adjusting the code so that a few Long Tests pass on all operating systems.
PRs:

intel/cve-bin-tool#1788
intel/cve-bin-tool#1801
intel/cve-bin-tool#1822


Results


The graph shows the growth in code coverage during Phase 1 of GSoC.

The tool's total test coverage increased significantly during this phase of the project, by more than 10 percentage points. The test suite now covers execution paths on both Windows and Linux.
Challenges


#1720 took exceptionally long to implement because it required covering all possible execution paths on both operating systems.
With new code added regularly, keeping the test coverage percentage up was also challenging.


Phase-2: Implementing fuzzing

Summary

Fuzz testing, or fuzzing, is an automated software testing technique that feeds random, malformed, or invalid data to a program. The goal is to uncover crashes, bugs, and vulnerabilities in the program. The team chose Atheris, a coverage-guided fuzzing engine for Python, as the primary fuzzing tool for the project.
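The idea can be shown with a deliberately simple fuzz loop that needs no external library. This is not Atheris. It generates purely random bytes and has no coverage feedback, whereas Atheris uses coverage information to steer its input mutations toward new code paths. `parse_package_line`, `random_bytes`, and `fuzz` are hypothetical names. The parser contains a planted bug so that the loop has something to find.

```python
import random


def parse_package_line(data: bytes) -> dict:
    """Hypothetical toy parser for lines like b"name=1.2.3"."""
    text = data.decode("utf-8")  # may raise UnicodeDecodeError (a ValueError)
    name, _, version = text.partition("=")
    if not name:
        raise ValueError("missing package name")
    # BUG (intentional, for the demo): assumes the version is never empty,
    # so input without "=" or with nothing after it raises IndexError.
    major = version.split(".")[0][0]
    return {"name": name, "major": major}


def random_bytes(rng: random.Random, max_len: int = 8) -> bytes:
    """Return a random byte string of length 0..max_len."""
    return bytes(rng.randrange(256) for _ in range(rng.randint(0, max_len)))


def fuzz(target, iterations: int = 1000, seed: int = 0) -> list[tuple[bytes, Exception]]:
    """Feed random inputs to `target` and collect unexpected exceptions.

    ValueError (including UnicodeDecodeError) counts as the parser correctly
    rejecting bad input; any other exception counts as a crash.
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    crashes = []
    for _ in range(iterations):
        data = random_bytes(rng)
        try:
            target(data)
        except ValueError:
            pass  # expected rejection of invalid input
        except Exception as exc:  # anything else is a bug worth reporting
            crashes.append((data, exc))
    return crashes


crashes = fuzz(parse_package_line)
print(f"{len(crashes)} crashing inputs found")
if crashes:
    data, exc = crashes[0]
    print(f"first crash: {data!r} -> {type(exc).__name__}")
```

The parser treats `ValueError` as a clean rejection, so only the planted `IndexError` shows up as a crash. A real harness draws the same line between documented error handling and unexpected exceptions.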
Aim

Implement fuzzing in the tool to find bugs and vulnerabilities, which included:

Setting up the fuzzing infrastructure.
Implementing structure-aware fuzzing for JSON inputs in merged reports and for the various input formats of CycloneDX and SPDX SBOMs.

Tasks Achieved


Implemented structure-aware fuzzing from scratch to test the tool's parsers and input paths.
PRs:

intel/cve-bin-tool#1873 (tried out structure-aware fuzzing using protobuf)
intel/cve-bin-tool#1888 (test merged reports with fuzzed JSON inputs)
intel/cve-bin-tool#1898 (added default values for protobuf message fields)
intel/cve-bin-tool#1906 (restructured the fuzzing code for better readability and CI integration)
intel/cve-bin-tool#1923 (fuzzed Python and Linux package-list parsers)
intel/cve-bin-tool#1924 (fuzz tested CycloneDX SBOM parser)
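Structure-aware fuzzing generates inputs that already have the shape a parser expects (here, syntactically valid JSON with the expected keys) and randomizes only the values. Test cases therefore get past the syntax check and reach the logic behind it. The PRs above described the input structure with protobuf messages. This sketch swaps that for a plain-Python generator to show the idea on its own. The report schema, `random_report`, `count_critical`, and `fuzz_structured` are hypothetical and simplified, not the real cve-bin-tool format. `count_critical` carries a planted bug.

```python
import json
import random

# Hypothetical, simplified report schema used only for this sketch.
FIELDS = ["vendor", "product", "version", "cve_number", "severity"]


def random_value(rng: random.Random):
    """Pick a JSON value of a random type, including types a parser may not expect."""
    choice = rng.randrange(5)
    if choice == 0:
        return "".join(rng.choice("abcXYZ0123.-") for _ in range(rng.randint(0, 6)))
    if choice == 1:
        return rng.randint(-10, 10)
    if choice == 2:
        return None
    if choice == 3:
        return [rng.randint(0, 9)]
    return rng.random() < 0.5  # a bool


def random_report(rng: random.Random) -> str:
    """Generate a syntactically valid JSON report whose field values are fuzzed."""
    entries = []
    for _ in range(rng.randint(0, 3)):
        # Randomly drop fields so missing keys are exercised too.
        entries.append({f: random_value(rng) for f in FIELDS if rng.random() < 0.9})
    return json.dumps({"report": entries})


def count_critical(text: str) -> int:
    """Hypothetical consumer with a planted bug: assumes severity is a string."""
    doc = json.loads(text)
    total = 0
    for entry in doc["report"]:
        severity = entry.get("severity", "UNKNOWN")
        if severity.upper() == "CRITICAL":  # fails for int/None/list/bool values
            total += 1
    return total


def fuzz_structured(target, iterations: int = 500, seed: int = 1):
    """Run `target` on generated reports and collect every exception raised."""
    rng = random.Random(seed)
    crashes = []
    for _ in range(iterations):
        text = random_report(rng)
        try:
            target(text)
        except Exception as exc:
            crashes.append((text, exc))
    return crashes


crashes = fuzz_structured(count_critical)
print(f"{len(crashes)} crashing reports found")
```

Every generated input parses as JSON, so none of the iterations is wasted on syntax errors. Every crash comes from the type assumption inside `count_critical`.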


Results

With the fuzzing infrastructure in place, the tool now supports fuzzing for several input formats and parsers. The fuzz tests are still a work in progress and cover a relatively small percentage of the code. So far they have not found any new bugs, which likely reflects the project's sound code quality and the mature parsing libraries it uses.
Challenges


Setting up fuzzing from scratch was a challenge in itself. I had no experience with Atheris, but the mentors helped me immensely during this process.


Things I Learned


This was my first time writing unit tests in Python. I learned to use pytest-mock and wrote mock-based tests from scratch.
I also learned about protobuf while setting up structure-aware fuzzing in the project.
Besides soft skills such as communication and time management, the program taught me a lot about coding best practices and open source etiquette.
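The kind of mock-based test mentioned above can be sketched with the standard library's `unittest.mock`. pytest-mock exposes the same `patch` API through its `mocker` fixture (`mocker.patch(...)`) and undoes the patch automatically when the test finishes. `tool_version` and `example-tool` are hypothetical and not taken from cve-bin-tool, and the canned output is illustrative only.

```python
import subprocess
from unittest import mock


def tool_version(binary: str) -> str:
    """Hypothetical helper: return the first line printed by `<binary> --version`."""
    result = subprocess.run(
        [binary, "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()[0]


def test_tool_version_uses_mocked_subprocess():
    # Canned result standing in for a real process; the values are illustrative.
    fake = subprocess.CompletedProcess(
        args=["example-tool", "--version"],
        returncode=0,
        stdout="example-tool 1.0\nmore details\n",
        stderr="",
    )
    # Replace subprocess.run only inside this block, so no external binary is needed.
    with mock.patch("subprocess.run", return_value=fake) as run:
        assert tool_version("example-tool") == "example-tool 1.0"
    # The mock also records how it was called.
    run.assert_called_once_with(
        ["example-tool", "--version"], capture_output=True, text=True, check=True
    )


if __name__ == "__main__":
    test_tool_version_uses_mocked_subprocess()
    print("mocked test passed")
```

Patching `subprocess.run` works here because `tool_version` looks the function up on the `subprocess` module at call time. Code that did `from subprocess import run` would need the patch target to be its own module instead.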

A detailed description of my progress and work is available in my weekly blogs.
Future Work

I plan to keep contributing significantly to the project after the GSoC period. Things I plan to do:

Working with SBOMs and improving the tool to support more formats.
Adding more tests to cover the code paths that are not covered yet.
Fuzz testing the parsing of vulnerability data formats.


I am thankful to Google, the Python Software Foundation, and Intel for this excellent opportunity, and to my mentors, Terri Oda, Suhail, and Anthony Harrison, who guided me throughout the program.
I also want to thank my fellow GSoC contributors Rhythm and Anant, as well as the cve-bin-tool community, for helping me during the program.