@indirect
Forked from sidk/proposal.md
Last active August 9, 2019 21:44

How would you summarize your proposal, in one sentence?

A web application that collects usage information from RubyGems.org, and surfaces that information in charts and graphs for use by the Ruby community—the initial version includes Ruby, Bundler, and RubyGems versions, as well as Ruby platform and CI platform.

Who are you, and how are you related to the project?

Sid Krishnan, a contributor to the project for the last few months, collaborating with André Arko, the original creator.

What are your preferred pronouns? (e.g. she/her, they/them, ze/zir)

he/him

What is your home timezone?

UTC-4/UTC-5 (depending on daylight saving time)

What is your location?

Toronto, Ontario, Canada

Do you consider yourself an underrepresented minority in tech?

No

How much funding are you requesting?

$36,000 (20 hours a week for 12 weeks)

What do you hope to accomplish with this funding?

Project Goal: Capture, organize, correlate and surface usage data for Ruby gems.

This funding will allow me to actively develop this application towards several milestones:

Phase 1 - MVP (~90% complete) ~v1.0

A public dashboard that collects and graphs high-level metrics about Bundler and RubyGems usage on RubyGems.org over time:

  • Ruby version
  • Bundler version
  • RubyGems version
  • CI Provider
  • Unix Platform
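
The metrics listed above could be tallied roughly as follows. This is a minimal sketch, assuming each request has already been parsed into a hash. The field names (`:ruby`, `:bundler`, `:ci`), the sample values, and the `tally_by` helper are hypothetical and are not taken from the Ecosystem app.

```ruby
# Hypothetical, already-parsed request records; field names are illustrative.
REQUESTS = [
  { ruby: "2.4.1", bundler: "1.15.0", ci: "travis" },
  { ruby: "2.4.1", bundler: "1.14.6", ci: nil },
  { ruby: "2.3.4", bundler: "1.15.0", ci: "circle" },
].freeze

# Count how many requests reported each value of the given field.
# Requests that did not report the field are counted under "none".
def tally_by(requests, field)
  requests.each_with_object(Hash.new(0)) do |request, counts|
    counts[request[field] || "none"] += 1
  end
end

RUBY_COUNTS = tally_by(REQUESTS, :ruby)
CI_COUNTS = tally_by(REQUESTS, :ci)
```

Each resulting hash maps a reported value to a request count, which is the shape a chart of one metric would need for a single time bucket.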

Phase 2 - Expansion ~v2.0

Track and display per-version-per-day statistics across all gems. RubyGems.org previously captured this information, but stopped tracking it when the volume of data began to overwhelm the existing servers—and that was more than 5 years ago, when RubyGems.org experienced only 10% of the traffic it experiences today.

A feasible architecture for this will have several strong requirements:

  • Low ongoing human maintenance needs (5 hours per week or less, ideally closer to 1)
  • Relatively low ongoing cost ($4000 per month or less, ideally closer to $1000)
  • Capacity to store a relatively large amount of data for relatively quick retrieval
    • One record per gem version per day means over one million rows per day
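
To make the storage requirement concrete, here is a minimal sketch of rollup counters keyed by gem, version, and day. The event hashes and the `rollup` helper are hypothetical, and the row count simply reuses the "one million rows per day" figure from the list above as a lower bound.

```ruby
# Hypothetical download events; real ones would come from request logs.
DOWNLOAD_EVENTS = [
  { gem: "rails", version: "5.1.0", at: Time.utc(2017, 6, 1, 9, 30) },
  { gem: "rails", version: "5.1.0", at: Time.utc(2017, 6, 1, 17, 5) },
  { gem: "rails", version: "5.1.0", at: Time.utc(2017, 6, 2, 8, 0) },
  { gem: "rake",  version: "12.0.0", at: Time.utc(2017, 6, 1, 12, 0) },
].freeze

# Collapse raw events into one counter per [gem, version, day].
def rollup(events)
  events.each_with_object(Hash.new(0)) do |event, counters|
    day = event[:at].strftime("%Y-%m-%d")
    counters[[event[:gem], event[:version], day]] += 1
  end
end

ROLLUP = rollup(DOWNLOAD_EVENTS)

# Back-of-envelope row growth, using the proposal's figure as a lower bound.
ROWS_PER_DAY = 1_000_000
ROWS_PER_YEAR = ROWS_PER_DAY * 365
```

With this layout the number of stored rows grows with the number of gem versions, not with the number of downloads, so the one-million-rows-per-day figure works out to at least 365 million rows per year.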

"Ecosystem 2.0" will consist of a public website graphing usage-over-time information for every gem and every version of every gem.

Phase 3 - Correlation

Add the ability to correlate different data sets to each other. This would allow users to answer questions that involve potentially disparate data points. Examples:

  • "Show me a graph of platform usage over time for Bundler v1.15 requests"
  • "Show me reverse dependencies over time on the Windows platform for Rails" (not sure about this one)
  • "Show me Bundler usage metrics over time, collated by project, instead of by raw request count"

This requires at least an order of magnitude more storage, since the individual events have to be stored and collated for each query, rather than simply storing rollup counters.
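
The first example question above illustrates why individual events are needed. The sketch below assumes each stored event keeps every dimension (day, Bundler version, platform). The field names, sample data, and the `platform_usage_for_bundler` helper are hypothetical.

```ruby
# Hypothetical raw request events with every dimension kept.
RAW_EVENTS = [
  { day: "2017-06-01", bundler: "1.15.0", platform: "x86_64-linux" },
  { day: "2017-06-01", bundler: "1.15.1", platform: "x86_64-darwin" },
  { day: "2017-06-01", bundler: "1.14.6", platform: "x86_64-linux" },
  { day: "2017-06-02", bundler: "1.15.0", platform: "x86_64-linux" },
].freeze

# Count requests per [day, platform], restricted to one Bundler release series.
def platform_usage_for_bundler(events, version_prefix)
  events
    .select { |event| event[:bundler].start_with?(version_prefix) }
    .each_with_object(Hash.new(0)) do |event, counts|
      counts[[event[:day], event[:platform]]] += 1
    end
end

BUNDLER_1_15_USAGE = platform_usage_for_bundler(RAW_EVENTS, "1.15.")
```

A separate per-day counter for Bundler versions and another for platforms could not answer this question, because neither records which platform a given Bundler 1.15 request came from. The query has to filter and group the events themselves.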

Phase 4 - Consolidation

Build a user interface that incorporates the best features from rubygems.org, ruby-toolbox.com, npmjs.com, and cocoapods.org to offer a comprehensive in-browser experience to a user evaluating a gem. Features can include:

  • Gem readme and documentation
  • Number of projects using a gem
  • Other gems depending on a gem
  • One-click in-browser Ruby interpreter to try out the gem
  • Usage-over-time and version-popularity-over-time graphs

What is your expected timeline for that work, broken down across the 12 weeks you will have funding?

Assuming about 2 days' worth of work per week:

Weeks 1-2: Complete v1.0 of the Ecosystem app. This involves:

  • Styling the app
  • Adding a couple of new graphs (Bundler vs. RubyGems and CI vs. non-CI usage)
  • Initiating user testing
  • Launching it to the community

Weeks 3-6: Research data storage architectures that are able to store a large amount of data in a cost-effective manner.

Weeks 6-9: Build process to collect and store usage data for all available gems.

Week 9: Collaborate on design for graphing individual gem data, and start building out Ecosystem v2.0.

Weeks 10-12: Continue building Ecosystem v2.0.

How will this work improve the Ruby community?

Apart from being interesting to look at, graphs of metrics from Bundler and RubyGems.org have the potential to help:

  • Gem authors understand what versions of Ruby they should consider supporting
  • Developers make decisions on if and when they should upgrade their Ruby version
  • The Ruby community as a whole better understand how it is evolving and changing

Do you receive any other funding for work on this project?

No

Do you have any existing OSS work you would like to mention or show the selection committee?

Only the work so far on this project, as this application has been my first OSS project.
