Move the Bundler API back into RubyGems.org
What is your project idea?
My project would re-implement the Bundler API inside the Rails app that powers RubyGems.org. Bundler uses bundler-api to download dependency information about all the gems it is trying to install. Serving these requests involves complex database queries and marshalling the output, which is quite CPU-intensive, and until now RubyGems.org's limited hardware infrastructure could not support that load. With help from Ruby Together, we hope to make bundle install faster and more reliable for everyone.
Have you found someone to mentor you already?
What related experience and education do you have?
I am a junior computer science undergraduate at the National Institute of Technology, Durgapur, India. I was a GSoC student on GlitterGallery under the Fedora Project in 2015, and I am now a maintainer of GlitterGallery. That experience taught me to love reading code and documentation and figuring out solutions on my own. More often than not, you can find me watching talks on Confreaks or listening to podcasts from Ruby Rogues or thoughtbot.
It is difficult to find time while classes are going on, but I have still managed to contribute to quite a few open source projects: glittergallery/GlitterGallery, openSUSE/osem, github/government.github.com, openstreetmap/openstreetmap-website, rails/spring, rails/rails, github/pages-gem and others. I am proud of all of them, even those where my contribution was as small as a documentation fix.
Why do you think this idea is worth doing?
bundler-api and RubyGems.org maintain two databases with the same data, and it is crucial that they remain in sync #118, #53. Besides the webhook provided by RubyGems.org #580, bundler-api runs a cron job #25, #115 to keep its data up to date with that of RubyGems.org. I hope this shows how much unnecessary, duplicated work we are doing. Didn't Rails and Ruby teach us to keep things DRY?
If the API used by Bundler talked directly to the RubyGems.org database, and had tests ensuring that it never broke, none of these issues would have existed in the first place, and no one would have to spend time fixing them.
Maintainers wouldn't have to maintain two separate infrastructures. Both the bundle install and gem install commands (gem install uses bundler-api too) would be more reliable.
gem push would be faster because RubyGems.org would no longer have to wait for bundler-api to acknowledge that its database has been updated. Hopefully, dealing with yanked gems will be a little less painful too #49.
What are your plans for the summer?
I have no engagements other than GSoC.
What is your expected timeline?
Community Bonding (before 23 May)
I have my end-of-semester exams from 22 April to 7 May, so I am unlikely to be active for those two weeks. That still leaves plenty of time to get acquainted with tools I am less comfortable with, such as memcached and metriks, and to familiarize myself with the RubyGems.org codebase.
Iteration 1 - Import api/v1/dependencies endpoint (24 May - 20 June)
- It would return both JSON and a binary Marshal dump.
- The response would contain gem name, number (version), platform, rubygems_version, ruby_version, checksum, created_at and dependencies.
- The API would be tested exhaustively for cases such as too many gems, no gems and yanked gems.
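As a rough illustration of the endpoint's two output formats, here is a minimal sketch in plain Ruby. It assumes a hypothetical in-memory VERSIONS list in place of the RubyGems.org database and a hypothetical dependencies_for helper in place of the real ActiveRecord queries, and it only includes a subset of the fields listed above (name, number, platform, dependencies). Yanked versions are filtered out, and an empty gem list yields an empty response.

```ruby
require "json"

# Hypothetical in-memory stand-in for the RubyGems.org versions table.
# In the real app this data would come from database queries.
VERSIONS = [
  { name: "rack", number: "1.6.4", platform: "ruby", yanked: false,
    dependencies: [] },
  { name: "rails", number: "4.2.6", platform: "ruby", yanked: false,
    dependencies: [["rack", "~> 1.6"], ["bundler", ">= 1.3.0, < 2.0"]] },
  { name: "rails", number: "4.2.5", platform: "ruby", yanked: true,
    dependencies: [["rack", "~> 1.6"]] }
].freeze

# Hypothetical helper: one hash per non-yanked version of each requested gem,
# keeping only the fields this sketch exposes.
def dependencies_for(gem_names)
  VERSIONS
    .select { |v| gem_names.include?(v[:name]) && !v[:yanked] }
    .map { |v| v.slice(:name, :number, :platform, :dependencies) }
end

# Serializes the same payload as either JSON or a binary Marshal dump.
def render_dependencies(gem_names, format)
  payload = dependencies_for(gem_names)
  case format
  when :json    then JSON.generate(payload)
  when :marshal then Marshal.dump(payload)
  else raise ArgumentError, "unsupported format: #{format}"
  end
end
```

Note that JSON turns the symbol keys into strings, while Marshal preserves them as symbols, so any tests need to check each format separately.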
Mid-term Evaluation (21 June - 28 June)
Hopefully, during this time we can alpha test the new API and fix any issues along the way.
Iteration 2 - Remove the webhook setup from RubyGems.org (29 June - 3 July)
- The Pusher model in RubyGems.org sends data about newly pushed gems to bundler-api.
- I am going to remove this setup on an experimental branch of RubyGems.org and make sure that everything still works without the webhook. We will probably transition users to the new API version by version.
Iteration 3 - Add caching (4 July - 24 July)
- bundler-api uses Fastly and memcached to cache its responses. We would want the same functionality for the new API.
- We should be able to store and purge cached responses on a per-gem basis.
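The per-gem caching requirement could be sketched as follows. This is only an assumption about how keys might be organised: a plain Hash stands in for memcached, and the DependencyCache class and the "deps/v1/<gem name>" key format are hypothetical. A multi-gem response is assembled from per-gem entries, so pushing or yanking one gem only needs to purge that gem's entry.

```ruby
# Minimal per-gem cache sketch. A plain Hash stands in for memcached;
# the "deps/v1/<gem name>" key format is a hypothetical choice.
class DependencyCache
  def initialize(store = {})
    @store = store
  end

  def key_for(gem_name)
    "deps/v1/#{gem_name}"
  end

  # Returns cached rows for one gem, computing them with the block on a miss.
  def fetch(gem_name)
    key = key_for(gem_name)
    return @store[key] if @store.key?(key)

    @store[key] = yield(gem_name)
  end

  # Drops only this gem's entry, e.g. after a push or a yank.
  def purge(gem_name)
    @store.delete(key_for(gem_name))
  end

  # Builds a multi-gem response out of the per-gem entries.
  def fetch_many(gem_names, &loader)
    gem_names.flat_map { |name| fetch(name, &loader) }
  end
end
```

The Fastly layer in front of the API would need its own per-gem purge strategy, which this sketch does not cover.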
Iteration 4 - Integrate monitoring (25 July - 15 August)
- bundler-api uses Librato to track response times, database queries, request complexity, etc. We would want similar monitoring support on RubyGems.org. I found this blog post informative: Production Visualization and Metrics Tools
- We would also need integration with a log aggregator to watch for exceptions.
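For the monitoring iteration, the kind of data we want to collect (response times and a measure of request complexity) can be sketched with a tiny in-process recorder. MetricsRecorder and the metric names used with it are hypothetical; a real setup would forward the samples to a service such as Librato rather than keep them in memory.

```ruby
# Hypothetical in-process metrics recorder; a production setup would ship
# these samples to an external metrics service instead.
class MetricsRecorder
  attr_reader :samples

  def initialize
    @samples = Hash.new { |hash, name| hash[name] = [] }
  end

  # Times the block with a monotonic clock and records the duration in ms.
  # Returns the block's own value.
  def time(name)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
  ensure
    elapsed_ms = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000.0
    @samples[name] << elapsed_ms
  end

  # Records a plain value, e.g. how many gems a request asked for.
  def measure(name, value)
    @samples[name] << value
  end
end
```

Timing with a monotonic clock avoids skew from system clock changes, and the ensure clause records a sample even when the timed request raises.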
Final Evaluation (16 August - 24 August)
I want all the code I have written to be well tested and of reasonable quality. If it falls short on either count, I will use this time to fix it.
Do you have plans if you finish sooner than expected? How about slower than expected?
I am not sure whether anyone is applying for the "Bundler and gem metrics, statistics, and analytics" idea; if I finish sooner than my tentative timeline, I would like to help with that.
How do you expect to accomplish your project idea?
Most of the code I need to write already exists in bundler-api, and previous attempts #1163 and #966 make it clear what I am supposed to do. I am sure André, Samuel and probably every other member of the RubyGems.org and Bundler teams would be there to help me if I mess up. I am counting on an eventful summer.