@Akrabut
Last active July 27, 2021 11:49
Google Summer of Code 2019 Summary (Ruby Language)

Summer of Code

Ruby Bundler and gem metrics, statistics, and analytics

Goals and Achievements

Project Goal

To have Bundler report anonymous usage metrics over HTTP to rubygems.org when a heavy command is run (to avoid deteriorating the runtime), and to have a backend API endpoint in rubygems.org to instrument it.

Did we achieve the goal?

Basically, yes. The actual implementation changed a little bit, and we decided to only instrument the low-cardinality metrics using Datadog, as the free Datadog version limits value cardinality and wouldn't work with, for example, time-related metrics. Switching to another metrics database has been left out of this project for another time.

Overall, Bundler sends all the metrics we planned in addition to all the metrics the core team wanted, and rubygems.org accepts most of them, so I'd deem the project a complete success :]

"Just shut up and give me the code"

OK.

Pull Request   Commits      Changes    Status
Bundler        84 commits   +532/-10   Awaiting approval
rubygems.org   13 commits   +228/-0    Approved
bundler.io     2 commits    +61/-0     Awaiting approval
TOTAL          99 commits   +821/-10
  • This will be updated until the code is merged.

Completed work

Bundler

Metric collection

Collects the following metrics when any command is run:

Command used, Options if specified, Time taken to fully execute for commands that don't kill Bundler, Time taken to start executing for commands that do, Timestamp.

Additionally, collects the following metrics when install/outdated/package/update/pristine is run:

Randomly generated hex ID, Remote git repository (hashed), git version, rvm version, rbenv version, chruby version, Host system details, Ruby version, Bundler version, Rubygems version, Ruby engine, CI services, Extra user agent strings, Gemfile gem count, Actually installed gem count, git gem count, Path gem count, Gem source count, List of gem sources (hashed).

Additionally, when a gem is downloaded:

Gem download time.

Additionally, when a gem is installed:

Gem install time.

Additionally, when a gemfile is resolved:

Gemfile resolve time.

Additionally, when a gem fails to install:

Name and version of the criminal.

The aforementioned metrics are appended as Ruby hashes to the metrics.yml file in the global Bundler directory until install/outdated/package/update/pristine is run. When any of these commands is run, Bundler reads the buffered metrics into an array of hashes, converts it to YAML, and sends it over HTTP to the server.
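A minimal sketch of that buffer-then-flush flow, not the code from the pull request. The module and method names, the string keys, and the application/x-yaml media type are all assumptions made for illustration; only the metrics.yml file name, the list of flushing commands, and the /api/metrics path come from this summary. The request is built but never sent.

```ruby
require "yaml"
require "net/http"
require "tmpdir"

# Hypothetical helper names; this is not Bundler's actual API.
module MetricsSketch
  # Commands that trigger sending the buffered metrics (from the summary).
  FLUSH_COMMANDS = %w[install outdated package update pristine].freeze

  # Append one metrics hash to the buffer file as its own YAML document.
  def self.record(path, metrics)
    File.open(path, "a") { |file| file.write(metrics.to_yaml) }
  end

  # Read every buffered YAML document back into an array of hashes.
  def self.buffered(path)
    return [] unless File.exist?(path)

    YAML.load_stream(File.read(path))
  end

  # Build (but do not send) the POST request carrying the buffered metrics.
  def self.build_request(uri, records)
    request = Net::HTTP::Post.new(uri)
    request["Content-Type"] = "application/x-yaml" # assumed media type
    request.body = records.to_yaml
    request
  end

  # True when the command is one of the heavy commands that flush the buffer.
  def self.flush?(command)
    FLUSH_COMMANDS.include?(command)
  end
end

# Buffer two commands in a temporary metrics.yml, then read them back.
records = Dir.mktmpdir do |dir|
  path = File.join(dir, "metrics.yml")
  MetricsSketch.record(path, { "command" => "exec", "time_taken" => 2 })
  MetricsSketch.record(path, { "command" => "install", "options" => "--jobs 4" })
  MetricsSketch.buffered(path)
end

request = MetricsSketch.build_request(URI("https://rubygems.org/api/metrics"), records)
```

Writing each hash as a separate YAML document keeps the append cheap, since the file never has to be parsed until a flushing command runs.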

Opt out and in to metric collection

Privacy is sacred, and although we don't violate it in any way, we still graciously give you the choice to opt out (and later opt back in once you've faced the rightful regret).

bundle config set disable_metrics true is used to opt out, and bundle config set disable_metrics false is used to opt back in.

This setting is saved in the global config file, and when the setting is absent the default is opt in. When the user opts out, none of the metric-related code is executed.
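The opt-out behaviour can be pictured as a guard around all metric code. This is only a sketch: the plain hash stands in for Bundler's settings lookup, and both method names are hypothetical.

```ruby
# `settings` is a plain hash standing in for Bundler's settings (assumption).
# A missing key means the user never opted out, so metrics stay enabled.
def metrics_enabled?(settings)
  ![true, "true"].include?(settings["disable_metrics"])
end

# Run the block only when metrics are enabled; otherwise do nothing at all.
def with_metrics(settings)
  return nil unless metrics_enabled?(settings)

  yield
end

with_metrics({ "disable_metrics" => true }) { puts "collecting metrics" }
```

With disable_metrics set to true the block is never called, which matches the "none of the metric-related code is executed" rule.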

Tests

The code is pretty much fully covered in unit tests.

rubygems.org

Routing

The metrics are accepted in YAML format in a POST request to /api/metrics.

Parsing

The data type is declared as YAML in the request header before sending, so the metrics are read from request.raw_post rather than from params. They are parsed back into an array with Psych.safe_load and instrumented by their hash keys.
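A rough sketch of the parsing step, assuming the body is a YAML list of flat hashes with string keys. The parse_metrics name and the choice to fall back to an empty array on bad input are assumptions, not the endpoint's actual behaviour; Psych.safe_load is the call named above.

```ruby
require "yaml"

# Parse a raw request body into an array of metric hashes.
# Anything that is not a YAML list of hashes yields an empty array.
def parse_metrics(raw_post)
  parsed = Psych.safe_load(raw_post)
  return [] unless parsed.is_a?(Array)

  parsed.select { |entry| entry.is_a?(Hash) }
rescue Psych::SyntaxError, Psych::DisallowedClass, Psych::BadAlias
  # Malformed YAML, or YAML that tries to build arbitrary Ruby objects.
  []
end

metrics = parse_metrics("---\n- command: install\n- command: update\n")
```

Psych.safe_load only builds basic types by default, so a payload that tries to instantiate arbitrary Ruby objects is rejected instead of deserialized.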

Validation

  1. Values that are supposed to be version numbers but aren't are discarded.
  2. Values that are unrealistically long are discarded.
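The two rules above could look roughly like this. The key names and the 100-character limit are made up for the example; Gem::Version.correct? is the RubyGems check for a well-formed version string.

```ruby
require "rubygems"

# Hypothetical key names and length limit, chosen only for this sketch.
VERSION_KEYS = %w[git_version ruby_version bundler_version rubygems_version].freeze
MAX_VALUE_LENGTH = 100

# Return a copy of the record without malformed versions or oversized values.
def validate(record)
  record.reject do |key, value|
    text = value.to_s
    too_long = text.length > MAX_VALUE_LENGTH
    bad_version = VERSION_KEYS.include?(key) && !Gem::Version.correct?(text)
    too_long || bad_version
  end
end

clean = validate(
  "command" => "install",
  "ruby_version" => "2.6.3",
  "git_version" => "not-a-version",
  "host" => "x" * 500
)
```

Here only the command and ruby_version entries survive; the other two are dropped rather than failing the whole request.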

Instrumentation

We currently use Datadog's StatsD to increment counters for the low-cardinality metrics (such as the command used). The high-cardinality metrics are not instrumented yet, but we will probably replace Datadog entirely with Timescale at some point to instrument all the metrics.
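As an illustration of counting only the low-cardinality keys, here is a sketch with a stand-in client. RecordingStatsd is a fake that mimics an increment(name, tags:) call so the example runs without a Datadog agent; the metric names, tag format, and key list are assumptions.

```ruby
# Hypothetical list of keys considered low cardinality.
LOW_CARDINALITY_KEYS = %w[command ruby_engine].freeze

# Stand-in for a StatsD client: it only records what would be incremented.
class RecordingStatsd
  attr_reader :calls

  def initialize
    @calls = []
  end

  def increment(name, tags: [])
    @calls << [name, tags]
  end
end

# Increment one counter per low-cardinality key; skip everything else.
def instrument(statsd, metrics)
  metrics.each do |record|
    record.each do |key, value|
      next unless LOW_CARDINALITY_KEYS.include?(key)

      statsd.increment("bundler.metrics.#{key}", tags: ["#{key}:#{value}"])
    end
  end
end

statsd = RecordingStatsd.new
instrument(statsd, [{ "command" => "install", "time_taken" => 12 }, { "command" => "update" }])
```

The time_taken value is skipped because every distinct duration would become a new tag value, which is exactly the cardinality problem described above.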

Tests

The code is pretty much fully covered in unit tests.

bundler.io

The documentation page for the metric collection project.

Tests

Dude

Miscellaneous Work

Just a small bug fix I helped with while working on rubygems.org that was merged.

Original Project Purpose:

The description in the idea list can be found here.

What was the problem?

The actual purpose changed quite a bit since the original description was written. Previously, Bundler sent a few metrics in the User-Agent HTTP header during requests to gem servers. Those metrics were left uninstrumented, aside from the use of kirby to parse the HTTP logs from Fastly, which resulted in a very large overhead for a few KB worth of information.

What was the proposed solution?

The overall purpose of this project was to come up with functionality that collects those previous metrics along with many others the core development team is interested in.

Future Plans

Probably taking part in the development when the team decides to make the move from Datadog to Timescale. Hopefully I'll find the time to keep contributing!

Experience

The entire journey has been amazingly fun and educational. I took part in GSoC while going through a full semester, and it has probably been the busiest three months of my life. Still, I had the opportunity to learn A LOT, work with real professionals and, above all, give back and contribute to the language we all love.

Hacking through unknown problems and putting together code that not only solves them but is top notch and production ready for best-of-the-industry projects is a gigantic challenge, but a fun, educational one. Not only did I learn a lot on the technical side, I was also reminded (for the millionth time) of our capacity to learn new things and use them to overcome technical difficulties.

The GSoC experience is second to none and I strongly suggest taking part!

Acknowledgements!

  • Ruby Organization - For giving me the opportunity to not only contribute to the language I love, but also to gain invaluable knowledge and experience.
  • Andre - @indirect - For mentoring and guidance, and voluntarily putting in the time and effort despite real hardships. This definitely wouldn't have been possible without you.
  • Aditya - @sonalkr132 - For co-mentoring and helping with the backend part, and managing the weekly meetings.
  • Google - For funding the project, granting us students the opportunity to kickstart our careers, and helping us realize how possible, approachable and wonderful the world of open source is - I really wasn't aware of that.
  • App Academy - Last but far from least! For App Academy Open, without a single doubt the best single code learning resource on the internet (and it's all for free!!!!). I shit you not, I would have never accomplished even a fraction of what I did without those guys.
