@creachadair
Last active July 5, 2019 14:35
Accurate reverse package dependencies for Go

Idea: Read each repository on GitHub (et al.) that contains Go code, perhaps limited to repositories with a go.mod file. The godoc.org API cannot provide this, because it only updates a package's imports when someone visits the importer; if nobody does, the imports don't change. (You can verify this by checking cases you know of manually and reloading to watch the counter go up.)

Use go list ./... to list the import paths of all the packages in the repository, and find the import paths of the packages each one depends on.
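A minimal sketch of this step, assuming the scan uses `go list -json ./...`, which emits one JSON object per package with `ImportPath` and `Imports` fields. The names `pkgInfo` and `parseGoList` are made up for illustration and are not taken from repodeps. `Imports` holds direct imports only; the `Deps` field of the same output would give the transitive set instead.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"log"
	"os/exec"
)

// pkgInfo holds the subset of `go list -json` output used here.
type pkgInfo struct {
	ImportPath string   // import path of the package itself
	Imports    []string // import paths it imports directly
}

// parseGoList decodes the stream of concatenated JSON objects that
// `go list -json` writes, one object per package.
func parseGoList(data []byte) ([]pkgInfo, error) {
	dec := json.NewDecoder(bytes.NewReader(data))
	var pkgs []pkgInfo
	for {
		var p pkgInfo
		if err := dec.Decode(&p); err == io.EOF {
			return pkgs, nil
		} else if err != nil {
			return nil, err
		}
		pkgs = append(pkgs, p)
	}
}

func main() {
	// Intended to run in the root of a fetched repository.
	out, err := exec.Command("go", "list", "-json", "./...").Output()
	if err != nil {
		log.Printf("go list failed: %v", err)
		return
	}
	pkgs, err := parseGoList(out)
	if err != nil {
		log.Printf("decoding go list output: %v", err)
		return
	}
	for _, p := range pkgs {
		fmt.Printf("%s depends on %v\n", p.ImportPath, p.Imports)
	}
}
```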

Build a matrix of: depends-on(x ipath) : [ipath]

Include version numbers maybe, if they're available (e.g., from a go.mod file).

Invert the matrix to get depended-on-by(x ipath) : [ipath]

Now, for any package, you can query which packages depend on it.
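The inversion can be sketched as follows, assuming the matrix is held in memory as a Go map from import path to a list of import paths. The function name `invert` and the example.com paths are illustrative only.

```go
package main

import (
	"fmt"
	"sort"
)

// invert turns depends-on(x) : [ipath] into depended-on-by(x) : [ipath].
// Duplicates are dropped and each result list is sorted.
func invert(dependsOn map[string][]string) map[string][]string {
	seen := make(map[string]map[string]bool)
	for pkg, deps := range dependsOn {
		for _, dep := range deps {
			if seen[dep] == nil {
				seen[dep] = make(map[string]bool)
			}
			seen[dep][pkg] = true
		}
	}
	out := make(map[string][]string, len(seen))
	for dep, users := range seen {
		for u := range users {
			out[dep] = append(out[dep], u)
		}
		sort.Strings(out[dep])
	}
	return out
}

func main() {
	dependsOn := map[string][]string{
		"example.com/a": {"example.com/c", "fmt"},
		"example.com/b": {"example.com/c", "example.com/c"},
	}
	rev := invert(dependsOn)
	fmt.Println(rev["example.com/c"]) // prints [example.com/a example.com/b]
}
```

An in-memory map is only a starting point; at the scale of all public Go repositories the same inversion would probably need to run against a sorted on-disk table.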

As a side effect we could also record for each repository the import paths of all the Go packages defined in that repository.

Problem steps:

  1. List the repositories to examine.
  2. Fetch each repository (maybe shallowly?)
  3. Scan the repository for Go packages and record their deps.
  4. Merge, dedup, invert.

How to list the repos:

  • https://github.com/golang/gddo/wiki/API
  • curl -L https://api.godoc.org/packages | tee capture.json
  • jq .results[].path
  • Per the documentation "This API returns all packages, including packages with errors, vendored packages, internal packages and more."
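The same extraction as the jq expression above, sketched in Go. The only thing assumed about the response is what `.results[].path` implies: a top-level `results` array whose elements have a `path` field. The names `packageList` and `packagePaths` are hypothetical, and the sample input is invented; in practice the bytes would come from capture.json.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// packageList mirrors only the fields selected by `jq .results[].path`.
type packageList struct {
	Results []struct {
		Path string `json:"path"`
	} `json:"results"`
}

// packagePaths returns the import paths listed in a captured API response.
func packagePaths(data []byte) ([]string, error) {
	var pl packageList
	if err := json.Unmarshal(data, &pl); err != nil {
		return nil, err
	}
	paths := make([]string, 0, len(pl.Results))
	for _, r := range pl.Results {
		paths = append(paths, r.Path)
	}
	return paths, nil
}

func main() {
	// Invented sample standing in for the contents of capture.json.
	sample := []byte(`{"results":[{"path":"example.com/a"},{"path":"example.com/b/internal/x"}]}`)
	paths, err := packagePaths(sample)
	if err != nil {
		log.Fatal(err)
	}
	for _, p := range paths {
		fmt.Println(p)
	}
}
```

Because the API returns vendored and internal packages too, the resulting paths would still need filtering and mapping back to repository roots.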

Resolving vanity URLs:

  • HTTP GET <url>?go-get=1, follow redirects if necessary till you get a 200.
  • Parse <meta name="go-import" content="<import-path> <vcs> <fetch-url>">
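A rough sketch of the two steps above using only the standard library. It parses the page with `encoding/xml` in non-strict mode and stops at the end of the head element. The names `goImport`, `parseGoImports`, and `resolve` are hypothetical, and the sample page in `main` is invented. `http.Get` follows redirects on its own, so `resolve` only has to check for a 200.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"net/http"
	"strings"
)

// goImport is one parsed <meta name="go-import"> tag.
type goImport struct{ Prefix, VCS, RepoRoot string }

// attr returns the value of the named attribute, or "" if it is absent.
func attr(e xml.StartElement, name string) string {
	for _, a := range e.Attr {
		if strings.EqualFold(a.Name.Local, name) {
			return a.Value
		}
	}
	return ""
}

// parseGoImports scans the head of an HTML page for go-import meta tags.
func parseGoImports(page string) []goImport {
	d := xml.NewDecoder(strings.NewReader(page))
	d.Strict = false // tolerate HTML that is not well-formed XML
	d.AutoClose = xml.HTMLAutoClose
	d.Entity = xml.HTMLEntity
	var out []goImport
	for {
		tok, err := d.Token()
		if err != nil {
			return out // io.EOF or a parse error: keep what was found
		}
		switch t := tok.(type) {
		case xml.StartElement:
			if strings.EqualFold(t.Name.Local, "body") {
				return out
			}
			if !strings.EqualFold(t.Name.Local, "meta") || attr(t, "name") != "go-import" {
				continue
			}
			if f := strings.Fields(attr(t, "content")); len(f) == 3 {
				out = append(out, goImport{f[0], f[1], f[2]})
			}
		case xml.EndElement:
			if strings.EqualFold(t.Name.Local, "head") {
				return out
			}
		}
	}
}

// resolve fetches https://<importPath>?go-get=1 and parses the result.
func resolve(importPath string) ([]goImport, error) {
	resp, err := http.Get("https://" + importPath + "?go-get=1")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %s", resp.Status)
	}
	body, err := io.ReadAll(io.LimitReader(resp.Body, 1<<20))
	if err != nil {
		return nil, err
	}
	return parseGoImports(string(body)), nil
}

func main() {
	// Invented page; resolve would fetch a real one over the network.
	page := `<html><head><meta name="go-import" content="example.com/pkg git https://github.com/example/pkg"></head><body></body></html>`
	for _, gi := range parseGoImports(page) {
		fmt.Println(gi.Prefix, gi.VCS, gi.RepoRoot)
	}
}
```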

Tools: https://github.com/creachadair/repodeps. Currently does not use module information at all.

Per Alex: ~42K repositories on GitHub with go.mod files in the root (not in a vendor directory, for example).

Future Work

  • Use the module files to record which versions of each import path are being used.
  • Include file content digests in the index, so that dependencies can be matched against file contents during/near a parse.
  • Plugin model for other languages: Run in the root of a repo, do whatever you have to do, spit out dependencies in this format. Should work for Python, maybe others (though ipath format will vary; we might not care).
  • Preserve mapping from ipath to repository during indexing ("which repositories provide this package?")
  • PyPI: https://warehouse.readthedocs.io/api-reference/#

Interesting Stats & Visualizations

  1. Adoption: How many external dependencies are there on packages in my org? This one package?
  2. Migration: How many depend on Old instead of New, and how does it change over time?
  3. Breakdown of transitive dependencies by org: Which non-standard packages are "crucial" to the health of the ecosystem?
@bzz
bzz commented Jun 25, 2019

~30K repositories on GitHub with go.mod files in the root

To be precise, right now the search request for filename:go extension:mod path:/ returns 42287 results.

@creachadair
Author

  1. Generate output for each .siva file.
  2. Combine the results into a string-string map (import-path to import-path).
  3. Serve the map over HTTP to answer queries for:
  • forward(path) = [q | path depends on q]
  • reverse(path) = [q | q depends on path]
  • connect(a, b) = [ai | a0 = a, ai depends on a(i+1), an = b]
  • locate(hash) = repo and package containing file with this hash
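The forward, reverse, and connect queries could be sketched like this, leaving out the HTTP layer and locate(hash). This assumes the combined map fits in memory; the `index` type and its method names are hypothetical. Here connect uses a breadth-first search, so it returns one shortest dependency chain from a to b, or nil if none exists.

```go
package main

import (
	"fmt"
	"sort"
)

// index holds the dependency map in both directions.
type index struct {
	fwd map[string][]string // path -> packages it depends on
	rev map[string][]string // path -> packages that depend on it
}

// newIndex builds the reverse map from the forward map.
func newIndex(fwd map[string][]string) *index {
	rev := make(map[string][]string)
	for p, deps := range fwd {
		for _, d := range deps {
			rev[d] = append(rev[d], p)
		}
	}
	for _, users := range rev {
		sort.Strings(users) // make results deterministic
	}
	return &index{fwd: fwd, rev: rev}
}

func (ix *index) forward(path string) []string { return ix.fwd[path] }
func (ix *index) reverse(path string) []string { return ix.rev[path] }

// connect returns a chain a0=a, ..., an=b in which each element depends
// on the next, found by breadth-first search over the forward map.
func (ix *index) connect(a, b string) []string {
	prev := map[string]string{a: ""} // "" marks the start of the chain
	queue := []string{a}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		if cur == b {
			var path []string
			for p := b; p != ""; p = prev[p] {
				path = append(path, p)
			}
			for i, j := 0, len(path)-1; i < j; i, j = i+1, j-1 {
				path[i], path[j] = path[j], path[i]
			}
			return path
		}
		for _, next := range ix.fwd[cur] {
			if _, seen := prev[next]; !seen {
				prev[next] = cur
				queue = append(queue, next)
			}
		}
	}
	return nil
}

func main() {
	ix := newIndex(map[string][]string{
		"a": {"b", "d"},
		"b": {"c"},
	})
	fmt.Println(ix.forward("a"))      // prints [b d]
	fmt.Println(ix.reverse("c"))      // prints [b]
	fmt.Println(ix.connect("a", "c")) // prints [a b c]
}
```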
