@DanSeraf
Last active August 22, 2021 13:26
GSoC 2021 - Final Report

Software Heritage Code Scanner Improvements for Production Environments

Software Heritage maintains the largest open archive of publicly available source code. It collects software projects from various hosting platforms and stores all of them in a single giant Merkle DAG. Software Heritage currently has an experimental tool, the Code Scanner, that checks which parts of a given code base are already stored in the archive. The main idea of this proposal is to enhance the Software Heritage Code Scanner so that it is usable in real production use cases: Software Heritage GSoC task

What was done

Main changes

  • (D5926) Refactoring of the swh-scanner model: since swh-model already provides on-disk caching of software artifacts, the source code is stored directly in the swh-model Merkle data structure. This part also involved refactoring the output functions and creating a new data structure to store information about the Merkle nodes. Task(s) Involved: T3349, T2730, T2692
  • (D5996) Abstraction of scan policies in order to easily create new scan algorithms. All the scan approaches present in the benchmark branch were moved to the master branch. Task(s) Involved: T3420
  • (D6114) Store provenance information about software artifacts using the Software Heritage graph service.
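
The policy abstraction from D5996 can be illustrated with a minimal sketch. It is not the swh-scanner code: `Node`, `ScanPolicy`, `QueryAll`, `TopDown` and the `lookup` callable are hypothetical names chosen for this example, and the archive is simulated with a plain set. The sketch assumes that when a directory is known to the archive, everything below it is known as well, which is what lets a policy skip queries.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Node:
    """Hypothetical stand-in for a Merkle node (file or directory)."""
    ident: str
    children: List["Node"] = field(default_factory=list)


# Hypothetical archive lookup: takes identifiers, returns the known subset.
Lookup = Callable[[List[str]], Set[str]]


class ScanPolicy(ABC):
    """Common interface: each policy decides which nodes get queried."""

    @abstractmethod
    def run(self, root: Node, lookup: Lookup) -> Dict[str, bool]:
        """Return a mapping from node identifier to 'known' status."""


class QueryAll(ScanPolicy):
    """Collect every node of the tree and query them in one batch."""

    def run(self, root: Node, lookup: Lookup) -> Dict[str, bool]:
        idents: List[str] = []
        stack = [root]
        while stack:
            node = stack.pop()
            idents.append(node.ident)
            stack.extend(node.children)
        known = lookup(idents)
        return {ident: ident in known for ident in idents}


class TopDown(ScanPolicy):
    """Query level by level; the subtree of a known node is not queried."""

    def run(self, root: Node, lookup: Lookup) -> Dict[str, bool]:
        result: Dict[str, bool] = {}
        level = [root]
        while level:
            known = lookup([node.ident for node in level])
            next_level: List[Node] = []
            for node in level:
                if node.ident in known:
                    self._mark_known(node, result)
                else:
                    result[node.ident] = False
                    next_level.extend(node.children)
            level = next_level
        return result

    @staticmethod
    def _mark_known(node: Node, result: Dict[str, bool]) -> None:
        # Assumption: a known directory implies known descendants.
        stack = [node]
        while stack:
            current = stack.pop()
            result[current.ident] = True
            stack.extend(current.children)
```

Both policies return the same mapping for the same tree; they differ only in how many identifiers they send to the archive. Adding a new scan algorithm then means writing one more subclass of `ScanPolicy`.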

Minor changes

  • (D5951) Make the Merkle data structure in swh-model iterate over nodes without deduplicating them.
  • (D6027) Added an "auto" option to the CLI that selects the most efficient approach for scanning the input source code, and created a new policy that scans and queries all the software artifacts.
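
To show what iterating without deduplication (D5951) means in practice, here is a small sketch. `MerkleNode` and `iter_tree` below are simplified, hypothetical stand-ins written for this example, not the swh-model implementation. In a Merkle structure, two identical files share one identifier, so a deduplicating traversal yields that node once, while a non-deduplicating traversal yields it once per place where it occurs in the tree.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, Set


@dataclass
class MerkleNode:
    """Hypothetical node: an identifier plus named children."""
    ident: str
    children: Dict[str, "MerkleNode"] = field(default_factory=dict)


def iter_tree(root: MerkleNode, dedup: bool = True) -> Iterator[MerkleNode]:
    """Yield the nodes reachable from root.

    With dedup=True each identifier is yielded at most once;
    with dedup=False a node is yielded for every path that reaches it.
    """
    seen: Set[str] = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if dedup:
            if node.ident in seen:
                continue
            seen.add(node.ident)
        yield node
        stack.extend(node.children.values())


# The same LICENSE content appears in two different directories.
license_file = MerkleNode("c1")
dir_a = MerkleNode("d1", {"LICENSE": license_file})
dir_b = MerkleNode("d2", {"LICENSE": license_file, "README": MerkleNode("c2")})
root = MerkleNode("d0", {"a": dir_a, "b": dir_b})

unique_nodes = list(iter_tree(root, dedup=True))
all_occurrences = list(iter_tree(root, dedup=False))
```

A scanner that reports results per path needs the non-deduplicating form, since otherwise only one of the two LICENSE locations would show up in the output.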