@aeden
Created January 16, 2011 13:31
Using DNS for Rubygems index

Currently the RubyGems index is stored as a gzipped file containing a marshalled array. Whenever Bundler needs to install a gem that is not yet installed, it downloads the index, gunzips it and unmarshals it. It then looks for dependencies, which are described in a second file that is also gzipped and marshalled. Several issues arise from this setup:

  • The full index must be downloaded and parsed, but it does not contain dependency data, which must then be downloaded and parsed separately. This is a relatively time-consuming process.
  • The index must be centralized.

Additionally, the gems themselves are currently centralized, since nothing in the metadata indicates where a gem should be downloaded from. However, before gems can be hosted elsewhere, it is important to find ways of keeping the index from being poisoned (is this an issue in the centralized system?).

Dependency Resolution

I'd like to propose an alternate way of indexing RubyGems: let's use DNS.

Here's how this might work:

  • Client sends question to local name server for ALL records at rails.index.rubygems.org

  • Local name server does not have the record so it sends the usual response indicating that the search should go upstream to the roots

  • Root delegates to .org name servers

  • .org name servers delegate to rubygems.org name servers

  • rubygems.org name servers can either respond to the query or delegate to another set of name servers. In this case they answer.

  • rubygems.org name servers respond with a CNAME record pointing to 1.0.3.rails.index.rubygems.org and all PTR records for 1.0.3.rails.index.rubygems.org. For example:

    rails.index.rubygems.org.         600   CNAME 1.0.3.rails.index.rubygems.org.
    1.0.3.rails.index.rubygems.org.   600   PTR   0.0.3.activesupport.index.rubygems.org.
    1.0.3.rails.index.rubygems.org.   600   PTR   0.0.3.actionpack.index.rubygems.org.
    1.0.3.rails.index.rubygems.org.   600   PTR   0.0.3.activerecord.index.rubygems.org.
    1.0.3.rails.index.rubygems.org.   600   PTR   0.0.3.activeresource.index.rubygems.org.
    1.0.3.rails.index.rubygems.org.   600   PTR   0.0.3.actionmailer.index.rubygems.org.
    1.0.3.rails.index.rubygems.org.   600   PTR   0.0.3.railties.index.rubygems.org.
    1.0.3.rails.index.rubygems.org.   600   PTR   1.bundler.index.rubygems.org.
    

Note that some PTR records point directly to canonical gem names, while others point to a name that is itself a CNAME for the appropriate canonical version. The last record is an example of the latter: 1.bundler.index.rubygems.org would be a CNAME that likely resolves to something like 7.0.1.bundler.index.rubygems.org (the reverse notation for bundler-1.0.7).
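The reverse notation and the CNAME/PTR lookup can be sketched as follows. This is only an illustration over an in-memory copy of a few of the example records, not a real DNS client. The helper names (`index_name`, `parse_index_name`, `canonical`, `dependencies`) and the `RECORDS` table are hypothetical, the bundler alias target is the "something like" guess from the note above, and the sketch assumes gem names contain no dots.

```python
INDEX_SUFFIX = "index.rubygems.org."

def index_name(gem, version=None):
    """rails 3.0.1 -> 1.0.3.rails.index.rubygems.org. (version labels reversed)."""
    labels = list(reversed(version.split("."))) if version else []
    return ".".join(labels + [gem, INDEX_SUFFIX])

def parse_index_name(name):
    """Inverse of index_name: returns (gem, version); version may be ''."""
    if not name.endswith("." + INDEX_SUFFIX):
        raise ValueError("not an index name: " + name)
    labels = name[: -len(INDEX_SUFFIX) - 1].split(".")
    return labels[-1], ".".join(reversed(labels[:-1]))

# Hypothetical in-memory stand-in for answers from the name servers.
RECORDS = {
    "rails.index.rubygems.org.": [("CNAME", "1.0.3.rails.index.rubygems.org.")],
    "1.0.3.rails.index.rubygems.org.": [
        ("PTR", "0.0.3.activesupport.index.rubygems.org."),
        ("PTR", "1.bundler.index.rubygems.org."),
    ],
    "1.bundler.index.rubygems.org.": [("CNAME", "7.0.1.bundler.index.rubygems.org.")],
}

def canonical(name, records):
    """Follow CNAME records until reaching a name that has none."""
    seen = set()
    while name not in seen:
        seen.add(name)
        targets = [t for rtype, t in records.get(name, []) if rtype == "CNAME"]
        if not targets:
            return name
        name = targets[0]
    raise ValueError("CNAME loop at " + name)

def dependencies(gem, records, version=None):
    """Return ((gem, version), [(dep_gem, dep_version), ...]) for one gem."""
    name = canonical(index_name(gem, version), records)
    deps = [canonical(t, records) for rtype, t in records.get(name, []) if rtype == "PTR"]
    return parse_index_name(name), [parse_index_name(d) for d in deps]
```

Calling `dependencies("rails", RECORDS)` resolves the unversioned name through its CNAME first, then maps each PTR target back to a gem name and version.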

Downloads

In addition to dependency management, another interesting use of DNS is to provide references to where gems can be downloaded. Here is how this might work:

  • Client sends question to local name server for ALL records at rails.index.rubygems.org

  • Local name server does not have the record so it sends the usual response indicating that the search should go upstream to the roots

  • Root delegates to .org name servers

  • .org name servers delegate to rubygems.org name servers

  • rubygems.org name servers can either respond to the query or delegate to another set of name servers. In this case they answer.

  • rubygems.org name servers respond with a CNAME record pointing to 1.0.3.rails.index.rubygems.org and all NAPTR records for 1.0.3.rails.index.rubygems.org, for example:

    rails.index.rubygems.org.         600   CNAME   1.0.3.rails.index.rubygems.org.
    1.0.3.rails.index.rubygems.org.   600   NAPTR   100 10 "U" "TCP+http" "!^.*$!http://rubygems.org/rails-3.0.1.gem!i" .
    1.0.3.rails.index.rubygems.org.   600   NAPTR   100 20 "U" "TCP+http" "!^.*$!http://backup.rubygems.org/rails-3.0.1.gem!i" .
    

Note that there is no need to do any complex regex translation to get the various URLs since they are mapped directly to the canonical name of the gem.
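A client could turn the NAPTR answers into an ordered list of download URLs roughly as sketched below. The function names are hypothetical, the record data is taken from the example above in presentation (text) form, and Python's `re` module is used as an approximation of the regular expression syntax that NAPTR specifies. Records are tried lowest order first, then lowest preference.

```python
import re
import shlex
from typing import NamedTuple

class Naptr(NamedTuple):
    order: int
    preference: int
    flags: str
    service: str
    regexp: str
    replacement: str

def parse_naptr(rdata):
    """Parse NAPTR record data given in zone-file text form."""
    order, preference, flags, service, regexp, replacement = shlex.split(rdata)
    return Naptr(int(order), int(preference), flags, service, regexp, replacement)

def naptr_url(record, name):
    """Apply the record's delimiter-separated pattern/substitution to `name`."""
    delimiter = record.regexp[0]
    _, pattern, substitution, flags = record.regexp.split(delimiter)
    re_flags = re.IGNORECASE if "i" in flags else 0
    return re.sub(pattern, substitution, name, count=1, flags=re_flags)

def download_urls(rdatas, name):
    """Return URLs from terminal "U" records, best candidate first."""
    records = sorted((parse_naptr(r) for r in rdatas),
                     key=lambda r: (r.order, r.preference))
    return [naptr_url(r, name) for r in records if r.flags.upper() == "U"]

# The two example records, deliberately listed out of order.
EXAMPLE = [
    '100 20 "U" "TCP+http" "!^.*$!http://backup.rubygems.org/rails-3.0.1.gem!i" .',
    '100 10 "U" "TCP+http" "!^.*$!http://rubygems.org/rails-3.0.1.gem!i" .',
]
urls = download_urls(EXAMPLE, "1.0.3.rails.index.rubygems.org.")
```

Because each pattern matches the whole name and the substitution is a fixed string, the result is simply the URL embedded in each record, with the primary location ahead of the backup.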

Other Considerations

Platforms

To support multiple platforms (e.g. JRuby), the client will first try platform.z.y.x.gemname.index.rubygems.org. If this is not found, the client should fall back to z.y.x.gemname.index.rubygems.org. If a platform gem is provided, CNAME records will also need to be provided for all of the variations, i.e. platform.y.x, platform.x and platform.
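The lookup order can be sketched as below. The helper names are hypothetical, and `exists` stands in for whatever DNS query the client would actually perform. The `platform_aliases` helper reflects one reading of the variations listed above, namely that each alias drops one more trailing version component.

```python
INDEX_SUFFIX = "index.rubygems.org."

def index_name(gem, version, platform=None):
    """Build [platform.]z.y.x.gemname.index.rubygems.org. for version x.y.z."""
    labels = list(reversed(version.split("."))) if version else []
    if platform:
        labels.insert(0, platform)
    return ".".join(labels + [gem, INDEX_SUFFIX])

def lookup_order(gem, version, platform=None):
    """Names to try, most specific first: platform name, then generic name."""
    names = []
    if platform:
        names.append(index_name(gem, version, platform))
    names.append(index_name(gem, version))
    return names

def first_existing(names, exists):
    """Return the first name for which the `exists` callback is true, else None."""
    for name in names:
        if exists(name):
            return name
    return None

def platform_aliases(gem, version, platform):
    """Shorter platform names that would need CNAME records (assumed reading)."""
    parts = version.split(".")
    return [index_name(gem, ".".join(parts[:n]), platform)
            for n in range(len(parts) - 1, -1, -1)]
```

For example, with `published = {"1.0.3.rails.index.rubygems.org."}`, calling `first_existing(lookup_order("rails", "3.0.1", "jruby"), published.__contains__)` falls back to the generic name because no JRuby-specific record exists.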

Decentralization

DNS provides the tools necessary to make this a decentralized system if we desire. This would be accomplished by delegating responsibility for gem names to DNS servers other than the rubygems.org servers. For example, if responsibility for managing the Rails gem metadata were decentralized, the interaction might look like this:

  • Client sends question to local name server for TXT records at rails.index.rubygems.org

  • Local name server does not have the record so it sends the usual response indicating that the search should go upstream to the roots

  • Root delegates to .org name servers

  • .org name servers delegate to rubygems.org name servers

  • rubygems.org name servers respond with the following NS record:

    rails.index.rubygems.org.   600   NS   ds1.rubyonrails.org.
    rails.index.rubygems.org.   600   NS   ds2.rubyonrails.org.
    
  • The question is then sent to one of the two name servers which responds with a CNAME record pointing rails.index.rubygems.org to 1.0.3.rails.index.rubyonrails.org.

  • The rubyonrails.org name servers would then respond as shown in the scenarios above.
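The delegation steps above can be simulated with a small in-memory model. This is a toy sketch rather than a resolver: the `SERVERS` table, the server name `ns.rubygems.org.` and the function names are all hypothetical, and only the hand-off from the rubygems.org servers to the delegated servers is modelled.

```python
# Each hypothetical server has zone cuts it delegates and records it answers.
DELEGATED_RECORDS = {
    "rails.index.rubygems.org.": [("CNAME", "1.0.3.rails.index.rubyonrails.org.")],
}
SERVERS = {
    "ns.rubygems.org.": {
        "delegations": {
            "rails.index.rubygems.org.": ["ds1.rubyonrails.org.", "ds2.rubyonrails.org."],
        },
        "records": {},
    },
    "ds1.rubyonrails.org.": {"delegations": {}, "records": DELEGATED_RECORDS},
    "ds2.rubyonrails.org.": {"delegations": {}, "records": DELEGATED_RECORDS},
}

def find_delegation(server, name):
    """Return the name servers `name` is delegated to, or None if not delegated."""
    for zone, nameservers in server["delegations"].items():
        if name == zone or name.endswith("." + zone):
            return nameservers
    return None

def resolve(name, servers, start, max_hops=8):
    """Follow NS delegations from `start`; return (records, servers visited)."""
    server_name, path = start, [start]
    for _ in range(max_hops):
        server = servers[server_name]
        delegated = find_delegation(server, name)
        if not delegated:
            return server["records"].get(name, []), path
        server_name = delegated[0]  # a real client could pick any listed server
        path.append(server_name)
    raise RuntimeError("too many delegations for " + name)

records, path = resolve("rails.index.rubygems.org.", SERVERS, "ns.rubygems.org.")
```

Here `path` records that the question moved from the rubygems.org server to one of the rubyonrails.org servers, which then supplied the CNAME.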

Security

DNSSEC provides a means of signing DNS records so that you can verify that the name server is authoritative for the particular question. This technology is not yet widely deployed; however, it has the potential to provide a layer of protection against gem poisoning when used in conjunction with a SHA signature. The SHA signature could also be stored in the name servers using a TXT or SIG record. This technology is still very experimental, but the potential exists for a highly trusted distribution system.
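The digest check on the client side might look like the sketch below. The `sha256=<hex>` TXT payload format is an assumption made for illustration, since the proposal does not define one, and the function names are hypothetical. DNSSEC validation of the record itself is out of scope here and would have to happen before the digest is trusted.

```python
import hashlib
import hmac

ALLOWED_ALGORITHMS = {"sha256", "sha512"}

def parse_digest_txt(txt):
    """Parse a hypothetical TXT payload of the form '<algorithm>=<hex digest>'."""
    algorithm, _, digest = txt.partition("=")
    algorithm = algorithm.strip().lower()
    if algorithm not in ALLOWED_ALGORITHMS or not digest:
        raise ValueError("unsupported digest record: " + txt)
    return algorithm, digest.strip().lower()

def gem_matches(gem_bytes, txt):
    """True if the downloaded gem's digest equals the one published in DNS."""
    algorithm, expected = parse_digest_txt(txt)
    actual = hashlib.new(algorithm, gem_bytes).hexdigest()
    return hmac.compare_digest(actual, expected)
```

A client would download the gem from one of the advertised locations, call `gem_matches` with the file contents and the TXT value, and refuse to install on a mismatch.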

Searching

DNS does not provide a mechanism for searching for records given only part of a name. For example, there is no way in DNS to query for the term "active" and get "activerecord", "activeresource", etc. This functionality would need to be provided using a protocol other than DNS.

aeden commented Jan 18, 2011

I've removed the section on TXT records for holding metadata since it's not clear whether DNS TXT records are the appropriate place for it. The goal is to focus on dependency resolution first and foremost.

Also, it may make sense to remove download routing from this as well, at least initially.

@copiousfreetime

add examples of what various dependency operators would look like -- https://gist.github.com/791687
