@sbs2001
Last active August 9, 2021 16:09

Google Summer of Code 2020 Report

Organisation : AboutCode

Project Title : Enhancement of VulnerableCode

Quick primer on VulnerableCode

VulnerableCode is a Django project that aggregates data about software vulnerabilities from multiple sources and transforms it into an easy-to-use format.

Things achieved in GSoC

  1. Implemented an OVAL document parser. During this task I noticed that the element lookup method used in CIS's implementation was unusually slow. After some tweaks and algorithmic changes, a performance increase of over 600% was achieved. The improved implementation is now used by CIS. Related PRs :

  2. Added over 8 data pipelines, also known as importers. These cover security advisories provided by NVD, GitHub, Gentoo, Debian OVAL, Ubuntu OVAL, Ubuntu USN, SUSE Backports and RedHat RHSAs. Related PRs :

  3. Changes in data structures and schemas.

    1. Used JSONField instead of CharField to store the qualifiers of a Package URL. This allows packages to be looked up via Package URLs without having to normalize the qualifiers first. Related PRs :

    2. Combined the ResolvedPackage and ImpactedPackage models into a single PackageRelatedVulnerability model, distinguished by a simple boolean field. This allows database-level constraints to prevent conflicting relationships between packages and vulnerabilities. Also added a basic mechanism to store these conflicts so they can later be manually reviewed and resolved. Some basic heuristics reduce database queries and allow bulk inserts while maintaining data integrity, which reduced the time needed to perform data imports. Related PRs :

    3. Passed the reference URLs and reference ids of a vulnerability as a mapping using a Reference dataclass. This preserves the relationship between the URLs and ids, which was not possible earlier when URLs and ids were passed separately. Related PRs :

  4. Designed and implemented the UI for community curation. Implemented views for creating, searching, deleting and modifying packages, vulnerabilities and the relationships between them. Related PRs : - nexB/vulnerablecode#230
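
The idea behind the Reference dataclass mentioned in item 3.3 can be sketched roughly as follows. This is only an illustration: the field names `url` and `reference_id`, the helper `ids_by_url` and the example URLs are assumptions made for this sketch, and the actual class in VulnerableCode may look different.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Reference:
    """Hypothetical stand-in: one URL and the id that belongs to it."""

    url: str = ""
    reference_id: str = ""


# Earlier style (for contrast): two separate collections.
# Nothing records which id belongs to which URL.
urls = ["https://example.com/advisory/1", "https://example.com/blog/post"]
ids = ["ADV-1"]

# With a dataclass, each URL travels together with its id (or with none).
references: List[Reference] = [
    Reference(url="https://example.com/advisory/1", reference_id="ADV-1"),
    Reference(url="https://example.com/blog/post"),
]


def ids_by_url(refs: List[Reference]) -> Dict[str, str]:
    """Hypothetical helper: map each URL to its id ('' when it has none)."""
    return {ref.url: ref.reference_id for ref in refs if ref.url}
```

With the two separate lists there is no way to tell that `ADV-1` belongs to the first URL and that the second URL has no id; the list of `Reference` objects keeps that pairing explicit.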


  1. Refactored the whole codebase to make asynchronous calls to the APIs of various package managers. This gave a significant speed boost to the data pipelines that depend on API calls to resolve version ranges of packages. Related PRs :

  2. API redesign :

    1. Added an API endpoint to search for packages using their Package URL.
    2. Added an API endpoint to search for vulnerabilities using various vulnerability ids.
    3. Made the API more RESTful by adding hyperlinks, which provide a navigation experience comparable to the web interface.
    4. Added Swagger API documentation, which makes the API easy to use and understand. Related PRs :
  3. Miscellaneous improvements :

    1. Added a mechanism for importers to check whether the data obtained is new, by checking either the last-modified date or ETags. This prevents duplicate work if the data has not changed since the previous import. Related PRs :

    2. Switched to using a registry and a class that yields objects containing the seed data for importers. This avoided a lot of boilerplate code compared to the previous approach of using migration scripts to provide the initial seed data. Related PRs :

    3. Added documentation to make the project and its features easy to understand and to ease the onboarding of new contributors. Related PRs :
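
The freshness check described under "Miscellaneous improvements" (checking ETags or the last-modified date before importing) might look roughly like the sketch below. The names `SourceState` and `is_new_data` are hypothetical and are not taken from the VulnerableCode codebase. The sketch also assumes that the ETag takes precedence when both values are known, which is a choice made here for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class SourceState:
    """Hypothetical record of what a data source reported about itself."""

    etag: Optional[str] = None
    last_modified: Optional[datetime] = None


def is_new_data(previous: SourceState, current: SourceState) -> bool:
    """Return True when the source appears to have changed since the last import."""
    # Assumption: prefer the ETag when both runs recorded one.
    if previous.etag is not None and current.etag is not None:
        return previous.etag != current.etag
    # Otherwise fall back to comparing modification dates.
    if previous.last_modified is not None and current.last_modified is not None:
        return current.last_modified > previous.last_modified
    # Nothing to compare against: import anyway rather than risk missing data.
    return True
```

An importer could store the `SourceState` seen during each run and skip the import whenever `is_new_data` returns `False` for the next run.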

Closing Thoughts

As expected, and as Haiko had warned, the time estimates in my proposal turned out to be wrong. It took too long to get the models right and refactor the codebase, but on the brighter side implementing the frontend was quicker than estimated, so it all balanced out. There were many unexpected hiccups, but documentation + Stack Overflow + mentors allowed me to overcome these with ease. Thanks to Philippe, Steven, Ted and Haiko for their invaluable guidance and for making this GSoC smooth. Summer well spent.
