To have Bundler report anonymous usage metrics over HTTP to rubygems.org, but only when a heavy command is run (so that reporting doesn't noticeably deteriorate the runtime), and to have a backend API endpoint in rubygems.org to instrument them.
Basically, yes. The actual implementation changed a little bit, and we decided to only instrument the low cardinality metrics using Datadog, as the free Datadog version limits value cardinality and wouldn't work with, for example, time related metrics. Switching to a different metrics database has been left out of this project for another time.
Overall, Bundler sends all the metrics we planned in addition to all the metrics the core team wanted, and rubygems.org accepts most of them, so I'd deem the project a complete success :]
OK.
| Pull Request | Commits | Changes | Status |
|---|---|---|---|
| Bundler | 84 commits | +532/-10 | Awaiting approval |
| rubygems.org | 13 commits | +228/-0 | Approved |
| bundler.io | 2 commits | +61/-0 | Awaiting approval |
| TOTAL | 99 commits | +821/-10 | |
- This will be updated until the code is merged.
Collects the following metrics when any command is run:
Command used, Options if specified, Time taken to fully execute for commands that don't kill Bundler, Time taken to start executing for commands that do, Timestamp.
Additionally, collects the following metrics when install/outdated/package/update/pristine is run:
(generated) randomized hex ID, Remote git repository (hashed), git version, rvm version, rbenv version, chruby version, Host system details, Ruby version, Bundler version, Rubygems version, Ruby engine, CI’s, Extra user agent strings, Gemfile gem count, Actually installed gem count, git gem count, Path gem count, Gem source count, List of gem sources (hashed).
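As a rough illustration of the "randomized hex ID" and "(hashed)" entries above, here is a minimal sketch. The helper names, the choice of SHA-256 and the ID length are my assumptions for the example, not necessarily what the Bundler code does.

```ruby
require "digest/sha2"
require "securerandom"

# Hypothetical helper: hash an identifying string (e.g. a git remote URL)
# so that the raw value never leaves the machine. SHA-256 is an assumption.
def anonymize(value)
  Digest::SHA256.hexdigest(value.to_s)
end

# Hypothetical helper: an ID generated from random bytes only,
# so it cannot be traced back to any user data.
def random_request_id
  SecureRandom.hex(8) # 8 random bytes, rendered as 16 hex characters
end

# Example remote; the URL is made up for illustration.
hashed_remote = anonymize("git@github.com:example/app.git")
puts hashed_remote
puts random_request_id
```

The same input always hashes to the same digest, so the server can still count distinct repositories or sources without ever seeing their names.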
Additionally, when a gem is downloaded:
Gem download time.
Additionally, when a gem is installed:
Gem install time.
Additionally, when a gemfile is resolved:
Gemfile resolve time.
Additionally, when a gem fails to install:
Name and version of the criminal.
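The time related metrics above all boil down to measuring how long a piece of work takes. A minimal sketch of that idea, assuming a monotonic clock and hypothetical hash keys (`timed`, `"time_taken"` and friends are names made up for this example):

```ruby
require "time"

# Hypothetical helper: run a block and return [result, elapsed_seconds].
# A monotonic clock is used so system clock adjustments can't skew the result.
def timed
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  [result, elapsed]
end

# Hypothetical metrics hash for a single command run.
metrics = { "command" => "install", "timestamp" => Time.now.utc.iso8601 }

# Stand-in for the real work (resolving, downloading, installing...).
_result, seconds = timed { sleep 0.01 }
metrics["time_taken"] = seconds.round(3)

puts metrics
```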
The aforementioned metrics are appended to the `metrics.yml` file in the global Bundler directory as a Ruby hash until install/outdated/package/update/pristine is run.
When any of these commands is run, Bundler reads the stored metrics into an array of hashes, converts it to YAML and sends it over HTTP to the server.
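The store-then-send flow described above could look roughly like this. It is a sketch under assumptions, not the actual Bundler code: the helper names, storing each hash as its own YAML document, the `application/x-yaml` content type and clearing the file after sending are all my choices for the example.

```ruby
require "yaml"
require "net/http"
require "uri"
require "tmpdir"

# Append one metrics hash to the file as its own YAML document.
def record_metrics(path, metrics)
  File.open(path, "a") { |file| file.write(metrics.to_yaml) }
end

# Read every stored document back into an array of hashes.
def read_metrics(path)
  return [] unless File.exist?(path)
  YAML.load_stream(File.read(path))
end

# Build the POST request carrying the whole array as YAML.
# The content type string is an assumption.
def build_request(uri, metrics_list)
  request = Net::HTTP::Post.new(uri)
  request["Content-Type"] = "application/x-yaml"
  request.body = metrics_list.to_yaml
  request
end

# Send everything collected so far, then clear the file (assumed behaviour).
def flush_metrics(path, uri)
  metrics_list = read_metrics(path)
  return if metrics_list.empty?
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(build_request(uri, metrics_list))
  end
  File.delete(path)
end

# Demo without touching the network: record twice, then inspect the payload.
Dir.mktmpdir do |dir|
  path = File.join(dir, "metrics.yml")
  record_metrics(path, { "command" => "exec" })
  record_metrics(path, { "command" => "install", "gemfile_gem_count" => 3 })
  request = build_request(URI("https://rubygems.org/api/metrics"), read_metrics(path))
  puts request.body
end
```

Light commands only pay for a small file append, while the HTTP round trip is reserved for commands that already take a while.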
Privacy is sacred, and although we don't violate it in any way, we still gracefully provide you the choice to opt out (and later opt in when you've faced the rightful regret).
`bundle config set disable_metrics true` is used to opt out, and `bundle config set disable_metrics false` is used to opt back in.
This setting is saved in the global config file, and when the setting doesn't exist the default is opt in. When the user opts out, none of the metric related code is executed.
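The opt-out logic is simple enough to sketch in a few lines. Here a plain Hash stands in for Bundler's global config, and both helper names are hypothetical:

```ruby
# Hypothetical helper: `settings` is a plain Hash standing in for the
# global Bundler config. A missing key means the user never chose,
# which counts as opted in.
def metrics_enabled?(settings)
  ![true, "true"].include?(settings["disable_metrics"])
end

# Hypothetical guard: metric code only runs when the user hasn't opted out.
def with_metrics(settings)
  return unless metrics_enabled?(settings)
  yield
end

with_metrics({}) { puts "collecting metrics" }                       # runs
with_metrics({ "disable_metrics" => true }) { puts "never printed" } # skipped
```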
The code is pretty much fully covered in unit tests.
The metrics are accepted in YAML format in a POST request to `/api/metrics`.
The data type is declared as YAML in the header before sending, so the metrics are accessed from the `request.raw_post` data rather than the params.
They are parsed back into an array with `Psych.safe_load`, and are instrumented by the hash keys.
- All values that are supposed to be version numbers but aren't valid ones are discarded.
- All values that are unrealistically long are discarded.
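The parsing and the two sanity checks above can be sketched outside of Rails like this. The key names, the length limit and the use of `Gem::Version.correct?` for the version check are assumptions made for the example. The actual controller may do it differently.

```ruby
require "yaml"

# Hypothetical key names for values that must look like version numbers.
VERSION_KEYS = %w[ruby_version bundler_version rubygems_version git_version].freeze
# Assumed upper bound for a "realistic" value length.
MAX_VALUE_LENGTH = 100

# Parse the raw POST body and drop suspicious values.
# Returns an array of cleaned hashes, or [] for unusable input.
def sanitize_metrics(raw_post)
  parsed = Psych.safe_load(raw_post)
  return [] unless parsed.is_a?(Array)

  parsed.grep(Hash).map do |metrics|
    metrics.reject do |key, value|
      text = value.to_s
      text.length > MAX_VALUE_LENGTH ||
        (VERSION_KEYS.include?(key) && !Gem::Version.correct?(text))
    end
  end
rescue Psych::Exception
  # Malformed YAML or disallowed object types: accept nothing.
  []
end

raw_post = [
  { "command" => "install", "ruby_version" => "2.6.3",
    "bundler_version" => "not-a-version", "options" => "x" * 500 }
].to_yaml

p sanitize_metrics(raw_post)
```

`Psych.safe_load` only builds basic types (strings, numbers, arrays, hashes and so on), which is what you want for data coming from arbitrary clients.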
We currently use Datadog's StatsD to increment the counter for the low cardinality metrics (such as command used). We currently don't instrument the high cardinality metrics, but we will probably replace Datadog entirely with timescale to instrument all the metrics at some point.
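To show what "incrementing a counter for low cardinality metrics" means in practice, here is a sketch with an in-memory stand-in for the StatsD client. `FakeStatsd`, the key list and the metric naming scheme are all hypothetical. A real client would send the increments to a StatsD agent instead of counting them locally.

```ruby
# In-memory stand-in for a StatsD client: it only mimics an `increment`
# call and keeps the counts in a Hash, so the example runs anywhere.
class FakeStatsd
  attr_reader :counts

  def initialize
    @counts = Hash.new(0)
  end

  def increment(name)
    @counts[name] += 1
  end
end

# Hypothetical list of keys whose values come from a small, fixed set.
LOW_CARDINALITY_KEYS = %w[command].freeze

# Increment one counter per (key, value) pair; every other key is ignored.
def instrument(statsd, metrics_list)
  metrics_list.each do |metrics|
    LOW_CARDINALITY_KEYS.each do |key|
      value = metrics[key]
      next if value.nil?
      statsd.increment("bundler.#{key}.#{value}") # hypothetical metric name
    end
  end
end

statsd = FakeStatsd.new
instrument(statsd, [
  { "command" => "install", "time_taken" => 12.4 },
  { "command" => "install" },
  { "command" => "update" }
])
p statsd.counts
```

Values like execution times would create a new counter for nearly every request, which is exactly the cardinality problem mentioned above.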
The code is pretty much fully covered in unit tests.
The documentation page for the metric collection project.
Dude
Just a small bug fix I helped with while working on rubygems.org that was merged.
The description in the idea list can be found here.
The actual purpose changed quite a bit since the original description was written. The previous state of this project was that Bundler was sending a few metrics in the User Agent HTTP header during requests to gem servers. Those metrics were left uninstrumented, aside from the usage of kirby to parse the HTTP logs from Fastly, which resulted in a very large overhead for a few KB worth of information.
The overall purpose of this project was to come up with functionality that will collect those previous metrics along with many others the core development team is interested in.
Probably taking part in the development when the team decides to make the move from Datadog to timescale. Hopefully I'll find the time to keep contributing!
The entire journey has been amazingly fun and educational. I took part in GSoC while going through a full semester, and it has probably been the busiest three months of my life. Still, I had the opportunity to learn A LOT, work with real professionals and above all, give back and contribute to the language we all love.
Hacking through unknown problems and putting together code that not only solves them but is top notch and production ready for best-of-the-industry projects is a gigantic challenge, but a fun and educational one. Not only did I learn a lot on the technical side, I was also reminded (for the millionth time) of our capability to learn new things and use them to overcome technical difficulties.
The GSoC experience is second to none and I strongly suggest taking part!
- Ruby Organization - For giving me the opportunity to not only contribute to the language I love, but also to gain invaluable knowledge and experience.
- Andre - @indirect - For mentoring and guidance, and voluntarily putting in the time and effort despite real hardships. This definitely wouldn't have been possible without you.
- Aditya - @sonalkr132 - For co-mentoring and helping with the backend part, and managing the weekly meetings.
- Google - For funding the project, granting us students the opportunity to kickstart our careers, and helping us realize how possible, approachable and wonderful the world of open source is - I really wasn't aware of that.
- App Academy - Last but far from least! For App Academy Open, without a single doubt the best single code learning resource on the internet (and it's all for free!!!!). I shit you not, I would have never accomplished even a fraction of what I did without those guys.