@aeden
Forked from copiousfreetime/gem-index-with-dns.md
Created June 17, 2011 14:16

Currently the RubyGems index is stored as a gzipped file containing a marshalled array. Whenever Bundler needs to install a gem that is not yet installed, it downloads the index, gunzips it and unmarshals it. It then looks up dependencies, which are described in another file that is also gzipped and marshalled. Several issues arise from this setup:

  • The full index must be downloaded and parsed, yet it does not contain dependency data, which must then be downloaded and parsed separately. This is a relatively time-consuming process.
  • The index must be centralized.

Additionally, the gems themselves are currently centralized, since nothing in the metadata indicates where a gem should be downloaded from. However, in order to allow this, it is important to find ways of keeping the index from being poisoned (is this an issue in the centralized system?).

Dependency Resolution

I'd like to propose an alternate way of indexing RubyGems: let's use DNS.

Here's how this might work. For this example, I want to get the latest version of Rails, which is 3.0.1 (in this scenario):

  • Client sends question to local name server for ALL records at rails.index.rubygems.org
  • Local name server does not have the record cached, so it sends the query upstream, starting with the root name servers
  • Root delegates to .org name servers
  • .org name servers delegate to rubygems.org name servers
  • rubygems.org name servers can either respond to the query or delegate to another set of name servers. It'll answer in this case.
  • rubygems.org name servers respond with a CNAME record pointing to 1.0.3.rails.index.rubygems.org and all PTR records for 1.0.3.rails.index.rubygems.org.

For example:

rails.index.rubygems.org.         600   CNAME 1.0.3.rails.index.rubygems.org.
1.0.3.rails.index.rubygems.org.   600   PTR   0.0.3.activesupport.index.rubygems.org.
1.0.3.rails.index.rubygems.org.   600   PTR   0.0.3.actionpack.index.rubygems.org.
1.0.3.rails.index.rubygems.org.   600   PTR   0.0.3.activerecord.index.rubygems.org.
1.0.3.rails.index.rubygems.org.   600   PTR   0.0.3.activeresource.index.rubygems.org.
1.0.3.rails.index.rubygems.org.   600   PTR   0.0.3.actionmailer.index.rubygems.org.
1.0.3.rails.index.rubygems.org.   600   PTR   0.0.3.railties.index.rubygems.org.
1.0.3.rails.index.rubygems.org.   600   PTR   1.bundler.index.rubygems.org.

Note that some PTR records point at canonical gem names, while others point at a name that is itself a CNAME for the appropriate canonical version. The last record is an example of the latter: 1.bundler.index.rubygems.org would likely be a CNAME resolving to something like 7.0.1.bundler.index.rubygems.org (the reversed notation for bundler-1.0.7).
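On the client side, following the CNAME and reading the PTR targets back as gem versions might look like the minimal Python sketch below. It is not a resolver: it assumes the answer has already been fetched and is held as (owner, ttl, type, data) tuples copied from a subset of the records above, and it assumes gem names contain no dots, so the gem name is always the label just before index.rubygems.org. The helper names are hypothetical.

```python
from typing import List, Tuple

ZONE = "index.rubygems.org."
Record = Tuple[str, int, str, str]  # (owner, ttl, type, data)

# A subset of the example answer for rails.index.rubygems.org.
ANSWER: List[Record] = [
    ("rails.index.rubygems.org.", 600, "CNAME", "1.0.3.rails.index.rubygems.org."),
    ("1.0.3.rails.index.rubygems.org.", 600, "PTR", "0.0.3.activesupport.index.rubygems.org."),
    ("1.0.3.rails.index.rubygems.org.", 600, "PTR", "0.0.3.railties.index.rubygems.org."),
    ("1.0.3.rails.index.rubygems.org.", 600, "PTR", "1.bundler.index.rubygems.org."),
]


def canonical_name(qname: str, records: List[Record]) -> str:
    """Follow CNAME records from qname until reaching a name without one."""
    seen = set()
    name = qname
    while name not in seen:
        seen.add(name)
        targets = [d for (o, _, t, d) in records if o == name and t == "CNAME"]
        if not targets:
            return name
        name = targets[0]
    raise ValueError(f"CNAME loop detected at {name}")


def dependencies(qname: str, records: List[Record]) -> List[str]:
    """PTR targets of the canonical (versioned) name, i.e. the dependencies."""
    canon = canonical_name(qname, records)
    return [d for (o, _, t, d) in records if o == canon and t == "PTR"]


def gem_to_name(gem: str, version: str, zone: str = ZONE) -> str:
    """('bundler', '1.0.7') -> '7.0.1.bundler.index.rubygems.org.'"""
    return ".".join(list(reversed(version.split("."))) + [gem, zone])


def name_to_gem(name: str, zone: str = ZONE) -> Tuple[str, str]:
    """'7.0.1.bundler.index.rubygems.org.' -> ('bundler', '1.0.7')"""
    suffix = "." + zone
    if not name.endswith(suffix):
        raise ValueError(f"{name} is not under {zone}")
    labels = name[: -len(suffix)].split(".")
    return labels[-1], ".".join(reversed(labels[:-1]))


if __name__ == "__main__":
    for dep in dependencies("rails.index.rubygems.org.", ANSWER):
        print(name_to_gem(dep))
```

With the sample data this yields ('activesupport', '3.0.0'), ('railties', '3.0.0') and ('bundler', '1'); the last one is a series name that the client would resolve through its own CNAME.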

This scheme also supports the ~>, = and >= operators. For instance, the Amalgalite 1.0.0 gem has the following runtime dependencies:

  • arrayfields ~> 4.7.4
  • fastercsv ~> 1.5.4

This can be modeled with the following set of records:

amalgalite.index.rubygems.org.       600     CNAME   0.0.1.amalgalite.index.rubygems.org.
0.0.1.amalgalite.index.rubygems.org. 600     PTR     5.1.fastercsv.index.rubygems.org.
0.0.1.amalgalite.index.rubygems.org. 600     PTR     7.4.arrayfields.index.rubygems.org.

This is not exactly the same as ~>, but it is close enough: 5.1.fastercsv.index.rubygems.org would be a CNAME record for the latest 1.5.x version of fastercsv.
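Keeping a series name such as 5.1.fastercsv.index.rubygems.org pointed at the newest 1.5.x release means whatever generates the zone has to recompute that CNAME each time a version is published. A minimal Python sketch, assuming purely numeric versions (no prerelease tags) and a hypothetical list of published fastercsv versions that is not the gem's real release history:

```python
from typing import List, Optional, Tuple

ZONE = "index.rubygems.org."


def version_key(version: str) -> Tuple[int, ...]:
    """Numeric sort key; assumes versions like '1.5.10' with no prerelease tags."""
    return tuple(int(part) for part in version.split("."))


def reverse_version(version: str) -> str:
    """'1.5.10' -> '10.5.1', the label order used in the index names."""
    return ".".join(reversed(version.split(".")))


def latest_in_series(series: str, published: List[str]) -> Optional[str]:
    """Newest published version whose leading components equal series, e.g. '1.5'."""
    prefix = version_key(series)
    matching = [v for v in published if version_key(v)[: len(prefix)] == prefix]
    return max(matching, key=version_key) if matching else None


def series_cname(gem: str, series: str, published: List[str]) -> Optional[Tuple[str, str]]:
    """(owner, target) of the CNAME for a series name such as 5.1.fastercsv."""
    latest = latest_in_series(series, published)
    if latest is None:
        return None
    owner = f"{reverse_version(series)}.{gem}.{ZONE}"
    target = f"{reverse_version(latest)}.{gem}.{ZONE}"
    return owner, target


if __name__ == "__main__":
    # Hypothetical release list, used only for illustration.
    published = ["1.4.1", "1.5.0", "1.5.3", "1.5.10", "1.6.0"]
    print(series_cname("fastercsv", "1.5", published))
```

Comparing versions numerically matters here: a plain string comparison would rank "1.5.3" above "1.5.10" and publish a stale CNAME.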

For an = dependency, the records would be:

amalgalite.index.rubygems.org.       600     CNAME   0.0.1.amalgalite.index.rubygems.org.
0.0.1.amalgalite.index.rubygems.org. 600     PTR     4.5.1.fastercsv.index.rubygems.org.
0.0.1.amalgalite.index.rubygems.org. 600     PTR     4.7.4.arrayfields.index.rubygems.org.

For a >= dependency, the PTR record points at the bare gem name, since the CNAME of that name always resolves to the most recent release of the gem:

amalgalite.index.rubygems.org.       600     CNAME   0.0.1.amalgalite.index.rubygems.org.
0.0.1.amalgalite.index.rubygems.org. 600     PTR     fastercsv.index.rubygems.org.
0.0.1.amalgalite.index.rubygems.org. 600     PTR     arrayfields.index.rubygems.org.
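The three cases differ only in how many version components end up in the name. A minimal Python sketch of that mapping, assuming requirements are written with a space between operator and version (as in "~> 1.5.4"); the function name is hypothetical:

```python
ZONE = "index.rubygems.org"


def requirement_to_name(gem: str, requirement: str, zone: str = ZONE) -> str:
    """Map a requirement such as '~> 1.5.4' to the index name a client would query."""
    op, version = requirement.split()
    parts = version.split(".")
    if op == "=":
        kept = parts          # exact release: every component
    elif op == "~>":
        kept = parts[:-1]     # series: drop the last component (1.5.4 -> 1.5)
    elif op == ">=":
        kept = []             # newest release: the bare gem name, followed via CNAME
    else:
        raise ValueError(f"operator not covered by this scheme: {op}")
    # Version components are reversed so the gem name stays next to the zone.
    return ".".join(list(reversed(kept)) + [gem, zone])


if __name__ == "__main__":
    for req in ("~> 1.5.4", "= 1.5.4", ">= 1.5.4"):
        print(req, "->", requirement_to_name("fastercsv", req))
```

The same rule also explains the 1.bundler name in the Rails example: a requirement of "~> 1.0" keeps only the leading "1".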

Downloads

In addition to dependency management, another interesting use of DNS is to provide references to where gems can be downloaded. Here is how this might work:

  • Client sends question to local name server for ALL records at rails.index.rubygems.org
  • Local name server does not have the record cached, so it sends the query upstream, starting with the root name servers
  • Root delegates to .org name servers
  • .org name servers delegate to rubygems.org name servers
  • rubygems.org name servers can either respond to the query or delegate to another set of name servers. It'll answer in this case.
  • rubygems.org name servers respond with a CNAME record pointing to 1.0.3.rails.index.rubygems.org and all NAPTR records for 1.0.3.rails.index.rubygems.org.

For example:

  rails.index.rubygems.org.         600   CNAME   1.0.3.rails.index.rubygems.org.
  1.0.3.rails.index.rubygems.org.   600   NAPTR   100 10 "U" "TCP+http" "!^.*$!http://rubygems.org/rails-3.0.1.gem!i" .
  1.0.3.rails.index.rubygems.org.   600   NAPTR   100 20 "U" "TCP+http" "!^.*$!http://backup.rubygems.org/rails-3.0.1.gem!i" .

Note that there is no need to do any complex regex translation to get the various URLs since they are mapped directly to the canonical name of the gem.
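Handling these NAPTR records on the client is correspondingly simple. Following RFC 3403, records are sorted by order and then preference, and the regexp field's first character is its delimiter. The minimal Python sketch below assumes the delimiter never appears escaped inside the fields, keeps only terminal "U" records with the "TCP+http" service used in the example, and uses hypothetical helper names:

```python
import re
from typing import List, NamedTuple


class Naptr(NamedTuple):
    order: int
    preference: int
    flags: str
    service: str
    regexp: str
    replacement: str


def apply_regexp(regexp: str, subject: str) -> str:
    """Apply a NAPTR '!pattern!replacement!flags' field to subject."""
    delim = regexp[0]
    _, pattern, repl, flags = regexp.split(delim)
    re_flags = re.IGNORECASE if "i" in flags else 0
    return re.sub(pattern, repl, subject, count=1, flags=re_flags)


def download_urls(records: List[Naptr], subject: str) -> List[str]:
    """URLs in the order a client should try them (lowest order/preference first)."""
    urls = []
    for rr in sorted(records, key=lambda r: (r.order, r.preference)):
        if rr.flags.upper() == "U" and rr.service == "TCP+http":
            urls.append(apply_regexp(rr.regexp, subject))
    return urls


# The two example records, deliberately listed out of preference order.
RECORDS = [
    Naptr(100, 20, "U", "TCP+http", "!^.*$!http://backup.rubygems.org/rails-3.0.1.gem!i", "."),
    Naptr(100, 10, "U", "TCP+http", "!^.*$!http://rubygems.org/rails-3.0.1.gem!i", "."),
]

if __name__ == "__main__":
    for url in download_urls(RECORDS, "1.0.3.rails.index.rubygems.org"):
        print(url)
```

Because every pattern is ^.*$, the input string is replaced wholesale, so the client never has to construct a URL from the gem name itself.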

Other Considerations

Platforms

To support multiple platforms (e.g. JRuby), the client will first try platform.z.y.x.gemname.index.rubygems.org. If this is not found, the client should fall back to z.y.x.gemname.index.rubygems.org. If a platform gem is provided, CNAME records will also need to be provided for all of the shorter variations, i.e. platform.y.x, platform.x and platform.
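The lookup order and the extra CNAME owners can be sketched in a few lines of Python. This assumes the platform label is a single DNS label (for example "java", the RubyGems platform name for JRuby gems), uses a hypothetical gem called examplegem, and replaces real DNS lookups with a caller-supplied `exists` check:

```python
from typing import Callable, List, Optional

ZONE = "index.rubygems.org"


def base_name(gem: str, version: str, zone: str = ZONE) -> str:
    """'examplegem', '1.2.0' -> '0.2.1.examplegem.index.rubygems.org'"""
    return ".".join(list(reversed(version.split("."))) + [gem, zone])


def candidate_names(gem: str, version: str, platform: Optional[str] = None) -> List[str]:
    """Platform-specific name first, then the generic name as a fallback."""
    base = base_name(gem, version)
    return ([f"{platform}.{base}"] if platform else []) + [base]


def platform_aliases(gem: str, version: str, platform: str, zone: str = ZONE) -> List[str]:
    """Shorter names (platform.y.x, platform.x, platform) that need CNAMEs."""
    rev = list(reversed(version.split(".")))  # '1.2.3' -> ['3', '2', '1']
    return [".".join([platform] + rev[i:] + [gem, zone]) for i in range(1, len(rev) + 1)]


def resolve_with_fallback(gem: str, version: str, platform: Optional[str],
                          exists: Callable[[str], bool]) -> Optional[str]:
    """Return the first candidate name that exists, or None."""
    for name in candidate_names(gem, version, platform):
        if exists(name):
            return name
    return None


if __name__ == "__main__":
    # Hypothetical published names: a generic build and a java build of examplegem 1.2.0.
    published = {"0.2.1.examplegem.index.rubygems.org", "java.0.2.1.examplegem.index.rubygems.org"}
    print(resolve_with_fallback("examplegem", "1.2.0", "java", published.__contains__))
    print(platform_aliases("examplegem", "1.2.3", "java"))
```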

Decentralization

DNS provides the tools necessary to make this a decentralized system if we desire. This would be accomplished by delegating responsibility for gem names to DNS servers other than the rubygems.org servers. For example, if management of the Rails gem metadata were decentralized, the interaction might look like this:

  • Client sends question to local name server for ALL records at rails.index.rubygems.org
  • Local name server does not have the record cached, so it sends the query upstream, starting with the root name servers
  • Root delegates to .org name servers
  • .org name servers delegate to rubygems.org name servers
  • rubygems.org name servers respond with the following NS records

Example:

  rails.index.rubygems.org.   600   NS   ds1.rubyonrails.org.
  rails.index.rubygems.org.   600   NS   ds2.rubyonrails.org.
  • The question is then sent to one of the two name servers, which responds with a CNAME record pointing rails.index.rubygems.org to 1.0.3.rails.index.rubyonrails.org.
  • The rubyonrails.org name servers would then respond as shown in the scenarios above.
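The referral chain in this scenario can be simulated with a toy, in-memory model. In the Python sketch below, the server names root, org-ns and rubygems-ns are hypothetical stand-ins (only ds1/ds2.rubyonrails.org come from the example), each server either answers from its own records or refers the client to the closest enclosing delegated zone, and the walk simply uses the first listed name server rather than retrying others as a real resolver would:

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory name servers: "delegations" maps a zone to the
# servers it is delegated to; "records" holds answers the server owns.
SERVERS: Dict[str, dict] = {
    "root": {"delegations": {"org": ["org-ns"]}, "records": {}},
    "org-ns": {"delegations": {"rubygems.org": ["rubygems-ns"]}, "records": {}},
    "rubygems-ns": {
        "delegations": {"rails.index.rubygems.org": ["ds1.rubyonrails.org", "ds2.rubyonrails.org"]},
        "records": {},
    },
    "ds1.rubyonrails.org": {
        "delegations": {},
        "records": {"rails.index.rubygems.org": [("CNAME", "1.0.3.rails.index.rubyonrails.org")]},
    },
}


def in_zone(name: str, zone: str) -> bool:
    return name == zone or name.endswith("." + zone)


def query(server: str, qname: str) -> Tuple[str, List]:
    """Answer from local records, else refer to the closest enclosing delegation."""
    data = SERVERS[server]
    if qname in data["records"]:
        return "answer", data["records"][qname]
    zones = [z for z in data["delegations"] if in_zone(qname, z)]
    if zones:
        closest = max(zones, key=len)  # the longest matching zone is the closest cut
        return "referral", data["delegations"][closest]
    return "nxdomain", []


def resolve(qname: str, start: str = "root", max_hops: int = 10) -> Tuple[List[str], List]:
    """Follow referrals from the root; return the servers visited and the answer."""
    path, server = [], start
    for _ in range(max_hops):
        path.append(server)
        kind, data = query(server, qname)
        if kind == "referral":
            server = data[0]
            continue
        return path, data
    raise RuntimeError("too many referrals")


if __name__ == "__main__":
    print(resolve("rails.index.rubygems.org"))
```

The only change needed on the rubygems.org side to hand a gem over is the delegation entry; the client logic stays the same.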

Security

DNSSEC provides a means of signing DNS records so that a client can verify that the answers really come from the zone's authoritative source and were not altered in transit. This technology is not yet widely deployed; however, it does have the potential to provide a layer of protection against gem poisoning when used in conjunction with a SHA digest of each gem. The digest could also be stored in the name servers using a TXT or SIG record. This technology is still very experimental, but the potential exists for a highly trusted distribution system.
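Once DNSSEC vouches for the record holding the digest, checking a downloaded gem against it is a local hash comparison. The Python sketch below assumes a hypothetical TXT payload of the form "sha256=<hexdigest>"; neither that format nor the helper names come from an existing specification:

```python
import hashlib
import hmac
from typing import Tuple

SUPPORTED = {"sha1", "sha256", "sha512"}


def parse_digest_txt(txt: str) -> Tuple[str, str]:
    """Split a hypothetical 'algorithm=hexdigest' TXT string."""
    algo, sep, hexdigest = txt.strip().partition("=")
    algo = algo.lower()
    if not sep or algo not in SUPPORTED:
        raise ValueError(f"unsupported digest record: {txt!r}")
    return algo, hexdigest.lower()


def verify_gem(gem_bytes: bytes, txt: str) -> bool:
    """True if the gem's digest matches the one published in DNS."""
    algo, expected = parse_digest_txt(txt)
    actual = hashlib.new(algo, gem_bytes).hexdigest()
    # Constant-time comparison; both sides are ASCII hex strings.
    return hmac.compare_digest(actual, expected)


if __name__ == "__main__":
    data = b"example gem bytes"
    record = "sha256=" + hashlib.sha256(data).hexdigest()
    print(verify_gem(data, record), verify_gem(b"tampered", record))
```

The digest only helps if the TXT record itself is trustworthy, which is the part DNSSEC would supply.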

Searching

DNS does not provide a mechanism for searching for records given part of a name. For example, there is no way in DNS to query for the term "active" and get back "activerecord", "activeresource", etc. This functionality would need to be provided using a protocol other than DNS.
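As an illustration only, a separate search service (hypothetical, outside DNS and not part of this proposal) could keep the gem names in a sorted list and answer prefix queries with a binary search, as in this minimal Python sketch:

```python
import bisect
from typing import List


def prefix_search(sorted_names: List[str], prefix: str) -> List[str]:
    """Return every name in sorted_names that starts with prefix."""
    start = bisect.bisect_left(sorted_names, prefix)  # first name >= prefix
    matches = []
    for name in sorted_names[start:]:
        if not name.startswith(prefix):
            break  # sorted order means no later name can match
        matches.append(name)
    return matches


GEM_NAMES = sorted(["rails", "activerecord", "activeresource", "activesupport", "actionpack", "bundler"])

if __name__ == "__main__":
    print(prefix_search(GEM_NAMES, "active"))
```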


@qrush

qrush commented Jun 19, 2011

This is great, not sure if Gist is the right place to discuss this though.

@aeden
Author

aeden commented Jun 19, 2011

@qrush this was more to get it out there and work on it initially. It seems that there is sufficient momentum now to discuss in the RubyGems mailing list (or in issues, although it may be premature for that).
