How Shopify Scales Rails via @JohnDuff #railsConf

How Shopify Scales Rails John Duff

The Stack:

  • Ruby 1.9.3-p327
  • Rails 3.2
  • Unicorn 4.5
  • Percona MySQL 5.5
  • memcached 1.4.14
  • Redis 2.6

33 app servers, 1172 Unicorn workers, 5 job servers, 370 job workers, 211 controllers, 468 models.

Current Scale:

9.9 M orders last year; 2008 sales per minute (Cyber Monday); 50,000 RPM; 45 ms response time.

Looking back:

First line of code written in 2004; Shopify released in July 2005.

Know where to optimize the system. one request, one process

increase workers, decrease response time
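
A rough sketch of the arithmetic behind "one request, one process", assuming each Unicorn worker serves exactly one request at a time and that the 45 ms figure is an average. The method names are made up for illustration; the inputs are the numbers from these notes.

```ruby
# Upper bound on requests per minute when every worker is busy all the time
# and each one handles a single request at a time.
def max_rpm(workers, avg_response_ms)
  workers * 60_000.0 / avg_response_ms
end

# Average number of workers that are busy at a given load (Little's law):
# arrival rate multiplied by time spent in the system.
def busy_workers(rpm, avg_response_ms)
  rpm * avg_response_ms / 60_000.0
end

puts max_rpm(1172, 45)        # theoretical ceiling for 1172 workers
puts busy_workers(50_000, 45) # 37.5 workers busy on average at 50,000 RPM
```

Both levers show up in the formulas: more workers raise the ceiling, and a lower response time raises the ceiling while also freeing up workers at the same load.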

Know the system

  • avoid network calls during requests
  • speed up unavoidable network calls
  • the storefront and checkout
  • the Chive: handle spikes in the system

Measure ALL THE THINGS

  • New Relic
  • splunk
  • statsD
  • Cacti //mysql records historic stats
  • Conan //test production system

Dashboards all over the office
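
As an illustration of how cheap a StatsD measurement is, here is a minimal sketch that uses only Ruby's standard library. It assumes a StatsD daemon listening on UDP port 8125 on localhost (the usual default); the metric name and helper names are hypothetical, and a real app would more likely use a client gem.

```ruby
require 'socket'

# Builds one StatsD datagram in the plain-text form "<name>:<value>|<type>".
def statsd_line(name, value, type)
  "#{name}:#{value}|#{type}"
end

# Fire-and-forget UDP send. Host and port are assumptions, not from the talk.
def statsd_send(line, host = '127.0.0.1', port = 8125)
  socket = UDPSocket.new
  socket.send(line, 0, host, port)
ensure
  socket.close if socket
end

# Times a block and reports the elapsed milliseconds as a timer ("ms") metric.
def measure(name)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  statsd_send(statsd_line(name, (elapsed * 1000).round, 'ms'))
  result
end
```

Usage would look like `measure('checkout.render') { render_checkout }`, where `render_checkout` stands for whatever code is being timed. Because UDP needs no reply, the measurement itself adds no blocking network call to the request.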

Caching

cacheable https://github.com/Shopify/cacheable

  • serve gzip'd content, stored in memcached; speeds up the response
  • generational caching
  • no explicit expiry
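
The two ideas above (gzip'd bodies, generational keys with no explicit expiry) can be sketched in plain Ruby. This is not the cacheable gem's code: the class, the per-shop generation counter, and the key layout are assumptions for illustration, and a Hash stands in for memcached.

```ruby
require 'zlib'

class GenerationalCache
  def initialize
    @store = {}                  # stand-in for memcached
    @generations = Hash.new(0)   # current generation number per shop
  end

  # Bumping the generation makes every older key unreachable.
  # Nothing is deleted; a real memcached would evict stale entries over time.
  def bump(shop_id)
    @generations[shop_id] += 1
  end

  def key_for(shop_id, path)
    "shop:#{shop_id}:gen:#{@generations[shop_id]}:#{path}"
  end

  # Returns the gzipped body, rendering and compressing only on a miss.
  def fetch_gzipped(shop_id, path)
    @store[key_for(shop_id, path)] ||= Zlib.gzip(yield)
  end
end

cache = GenerationalCache.new
body = cache.fetch_gzipped(1, '/products') { '<html>products</html>' }
cache.bump(1) # e.g. after the shop's data changes; the next fetch re-renders
```

Storing the body already compressed means a cache hit can be sent to the client as-is, without rendering or compressing again.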

caching 404s

Identity Cache: https://github.com/Shopify/identity_cache

  • cache full model objects in memcached
  • can include associated objects in cache
  • must opt in to the cache
  • explicit, but expiry
class Product < ActiveRecord::Base
  include IdentityCache

  has_many :images

  cache_has_many :images, :embed => true
end

# Fetch the product by its id, the primary index.
@product = Product.fetch(id)

# Fetch the images for the Product. Images are embedded so the product fetch would have already loaded them.
@images = @product.fetch_images
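
To make the fetch behaviour concrete, here is a minimal read-through cache in plain Ruby. It is a sketch of the general pattern only, not IdentityCache's implementation: the class name is hypothetical and two Hashes stand in for MySQL and memcached.

```ruby
class ReadThroughCache
  attr_reader :misses

  def initialize(database)
    @database = database # Hash standing in for the products table
    @cache = {}          # Hash standing in for memcached
    @misses = 0
  end

  # Serve from the cache when possible; on a miss, load the record from the
  # database and remember it for next time.
  def fetch(id)
    @cache.fetch(id) do
      @misses += 1
      @cache[id] = @database.fetch(id)
    end
  end

  # Writes go to the database and drop the cached copy, so the next fetch
  # reloads fresh data.
  def update(id, attributes)
    @database[id] = @database.fetch(id).merge(attributes)
    @cache.delete(id)
  end
end

products = ReadThroughCache.new(1 => { :title => 'Shirt' })
products.fetch(1)                      # miss, loads from the database
products.fetch(1)                      # hit, no database access
products.update(1, :title => 'Hoodie') # expires the cached copy
```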

Get out of my process

Delayed Job

  • jobs stored in the db
  • workers run in their own process
  • workers poll for jobs periodically

moved to:

Resque: https://github.com/resque/resque

  • redis backed
  • no more db contention
  • faster (300 jobs/sec vs 120 jobs/sec)
  • extensible

ex:

  • sending emails
  • processing payments
  • geolocation
  • import/export
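class AddressGeolocationJob
  # max_retries is not part of stock Resque; presumably it comes from a retry extension.
  max_retries 3

  # Resque passes job arguments through JSON, so the hash keys may arrive as
  # strings ('model', 'id') unless they are converted back to symbols.
  def self.perform(params)
    object = params[:model].constantize.find(params[:id])
    object.latitude, object.longitude = Geocoder.geocode(object)
    object.save!
  end
end

Resque.enqueue(AddressGeolocationJob, :id => 1, :model => 'Address')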
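
What happens between enqueueing and performing a job like the one above can be sketched without Redis. This is a simplified model, not Resque's code: an Array stands in for the Redis list, and `TinyQueue` and `EchoJob` are hypothetical names. The payload shape (a JSON object with "class" and "args") mirrors what Resque stores.

```ruby
require 'json'

class TinyQueue
  def initialize
    @list = [] # Array standing in for a Redis list
  end

  # Serialize the job class name and its arguments, then push onto the list.
  def enqueue(job_class, args)
    @list.push(JSON.generate('class' => job_class.name, 'args' => [args]))
  end

  # Pop one payload, look up the class by name and call its perform method.
  # Returns false when the queue is empty.
  def work_one
    payload = @list.shift
    return false unless payload
    job = JSON.parse(payload)
    Object.const_get(job['class']).perform(*job['args'])
    true
  end
end

class EchoJob
  @seen = []
  class << self
    attr_reader :seen

    def perform(params)
      @seen << params
    end
  end
end

queue = TinyQueue.new
queue.enqueue(EchoJob, :id => 1, :model => 'Address')
queue.work_one
p EchoJob.seen # the symbol keys come back as strings after the JSON round trip
```

The worker runs in its own process and only ever sees the serialized payload, which is why jobs take ids and class names rather than live objects.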

Redis

  • Inventory reservation system
  • sessions
  • theme uploads
  • throttling
  • sequenced column
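
Of the uses above, throttling is the easiest to sketch. The notes do not say how Shopify implements it; the version below is a common fixed-window counter, written with a Hash where Redis would use INCR on a key with a TTL. The class name, key layout, and limits are all assumptions.

```ruby
class FixedWindowThrottle
  def initialize(limit, window_seconds)
    @limit = limit
    @window = window_seconds
    @counters = Hash.new(0) # stand-in for Redis counters that expire by TTL
  end

  # Counts the request in the current time window and reports whether the
  # client is still within its limit. The clock is injectable for testing.
  def allow?(client_id, now = Time.now.to_i)
    bucket = now / @window
    key = "throttle:#{client_id}:#{bucket}"
    @counters[key] += 1
    @counters[key] <= @limit
  end
end

throttle = FixedWindowThrottle.new(2, 60) # 2 requests per 60-second window
throttle.allow?('shop-1', 0)  # true
throttle.allow?('shop-1', 1)  # true
throttle.allow?('shop-1', 2)  # false, third request in the same window
throttle.allow?('shop-1', 60) # true, a new window has started
```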

Speed up MySQL

  • 4 x 8-core processors
  • SSD
  • 256 GB RAM
  • full working set in memory

Query optimization

  • pt-query-digest
  • avoid queries that generate temp tables
  • adding the right indexes
  • forcing/ignoring indexes
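
For the last bullet, MySQL index hints are written right after the table name in the FROM clause. A small helper that builds such a clause, with hypothetical table and index names:

```ruby
# Builds "<table> FORCE|IGNORE|USE INDEX (<indexes>)" for a MySQL FROM clause.
# Table and index names below are hypothetical.
def with_index_hint(table, hint, *indexes)
  unless %w[FORCE IGNORE USE].include?(hint)
    raise ArgumentError, "unknown index hint: #{hint}"
  end
  "#{table} #{hint} INDEX (#{indexes.join(', ')})"
end

from = with_index_hint('orders', 'FORCE', 'index_orders_on_shop_id')
sql = "SELECT id FROM #{from} WHERE shop_id = 1"
puts sql
```

Hints like this override the optimizer, so they are worth re-checking whenever the data distribution or the indexes change.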

MySQL tuning

  • disable innodb_stats_on_metadata
  • increase table_open_cache
  • replace glibc memory allocator with tcmalloc
  • innodb_autoinc_lock_mode = 2 ("interleaved")

after_commit //db transactions

  • after transaction has been committed
  • webhooks
  • cache expiry
  • update associated objects
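
The reason to wait for the commit is that a webhook or cache expiry fired inside the transaction can run before other connections are able to see the new data, or can run for a write that later rolls back. A plain-Ruby sketch of the idea, independent of ActiveRecord (`TinyTransaction` is a hypothetical name):

```ruby
class TinyTransaction
  def initialize
    @callbacks = []
  end

  # Registers a side effect to run once the transaction has committed.
  def after_commit(&block)
    @callbacks << block
  end

  # Runs the block. If it raises, the transaction counts as rolled back and
  # the pending callbacks are dropped; otherwise they run afterwards, in order.
  def run
    begin
      yield self
    rescue StandardError
      @callbacks.clear
      return false
    end
    @callbacks.each(&:call)
    true
  end
end

log = []
TinyTransaction.new.run do |tx|
  tx.after_commit { log << :send_webhook }
  tx.after_commit { log << :expire_cache }
  log << :write_rows
end
p log # [:write_rows, :send_webhook, :expire_cache]
```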

Services

  • split out standalone services as needed
  • independently scaled
  • segmented metrics
  • overall system is more complex

ex: Imagery

@johnduff http://www.youtube.com/watch?v=j347oSSuNHA
