
http://highscalability.com/blog/2016/4/20/how-twitter-handles-3000-images-per-second.html

The New Way - Twitter In 2016

The Write Path

Decoupling media upload from tweeting.

Uploading was made a first-class citizen. An upload endpoint was created whose only responsibility is to put the original media in BlobStore.

This gives a lot of flexibility in how upload is handled.

The client talks to TFE, which talks to the Image Service, which puts the image in BlobStore and adds data to a Metadata store. That’s it. There are no other hidden services involved; no one else handles the media, and no one passes it around.
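A minimal sketch of that write path, using in-memory stand-ins for BlobStore and the Metadata store (the article names the components but not their interfaces, so everything here is an assumption for illustration):

```python
import uuid

# Hypothetical in-memory stand-ins for BlobStore and the Metadata store.
blob_store = {}      # mediaId -> original bytes
metadata_store = {}  # mediaId -> metadata

def handle_upload(media_bytes: bytes, content_type: str) -> str:
    """Upload endpoint: its only responsibility is to persist the original
    media, record its metadata, and hand back an opaque mediaId."""
    media_id = uuid.uuid4().hex
    blob_store[media_id] = media_bytes
    metadata_store[media_id] = {
        "content_type": content_type,
        "size": len(media_bytes),
        "state": "uploaded",
    }
    return media_id  # the client keeps this handle instead of the media itself
```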

A mediaId, a unique identifier for the media, is returned from the Image Service. When a client wants to create a tweet, send a DM, or update a profile photo, the mediaId is used as a handle to reference the media rather than supplying the media itself.

Let’s say we want to create a tweet with the media that was just uploaded. The flow goes like this:

1. The client hits the update endpoint, passing the mediaId in the post.
2. The request hits the Twitter Front End, which routes to the service appropriate for the entity being created: for tweets that’s TweetyPie, with different services for DMs and profiles.
3. All of these services talk to the Image Service.
4. The Image Service has post-processing queues that handle features like face detection and child pornography detection.
5. When that’s finished, the Image Service talks to ImageBird for images or VideoBird for videos; ImageBird generates variants, VideoBird does some transcoding.
6. Whatever media is generated is put in BlobStore.
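The fan-out by media type at the end of that flow might look like the sketch below; ImageBird, VideoBird, and BlobStore are stubbed out here, since the article describes only their roles, not their APIs:

```python
# Stubs standing in for ImageBird, VideoBird, and BlobStore (all hypothetical).
def image_bird_variants(media_id):
    return {f"{media_id}:thumb": b"...", f"{media_id}:large": b"..."}

def video_bird_transcode(media_id):
    return {f"{media_id}:mp4": b"..."}

blob_store = {}

def post_process(media_id, content_type):
    """After the post-processing queues finish, route by media type."""
    if content_type.startswith("image/"):
        generated = image_bird_variants(media_id)    # ImageBird generates variants
    elif content_type.startswith("video/"):
        generated = video_bird_transcode(media_id)   # VideoBird transcodes
    else:
        raise ValueError(f"unsupported media type: {content_type}")
    blob_store.update(generated)  # whatever is generated lands in BlobStore
```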

No media is being passed around. A lot of wasted bandwidth has been saved.
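From the client’s side, the whole exchange reduces to two small requests. The endpoint paths and field names below are assumptions for illustration, not Twitter’s actual API:

```python
import requests

API = "https://api.example.com/1.1"  # hypothetical base URL

# Step 1: upload the media once; only a handle comes back.
with open("photo.jpg", "rb") as f:
    resp = requests.post(f"{API}/media/upload", files={"media": f})
resp.raise_for_status()
media_id = resp.json()["media_id"]

# Step 2: create the tweet by reference. The media bytes never travel again.
resp = requests.post(
    f"{API}/statuses/update",
    data={"status": "hello from the new write path", "media_ids": media_id},
)
resp.raise_for_status()
```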

Segmented resumable uploads.

Walk into a subway and come out 10 minutes later, and the upload resumes from where it left off. It’s completely seamless for the user.

A client initializes an upload session using the upload API. The backend gives it a mediaId that identifies the entire upload session.

An image is divided into segments, say three. The segments are appended using the API; each append call gives the segment index, and all appends are for the same mediaId. When the upload completes, it is finalized and the media is ready to be used.
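A sketch of that session, modeled on the init/append/finalize shape the article describes (the endpoint, command names, and field names are assumptions):

```python
import requests

UPLOAD = "https://upload.example.com/1.1/media/upload"  # hypothetical endpoint
SEGMENT_SIZE = 512 * 1024  # arbitrary segment size for the sketch

def segmented_upload(path: str) -> str:
    data = open(path, "rb").read()

    # Initialize the session; the returned mediaId ties every call together.
    r = requests.post(UPLOAD, data={"command": "INIT", "total_bytes": len(data)})
    r.raise_for_status()
    media_id = r.json()["media_id"]

    # Append each segment with its index, all under the same mediaId.
    for index, offset in enumerate(range(0, len(data), SEGMENT_SIZE)):
        r = requests.post(
            UPLOAD,
            data={"command": "APPEND", "media_id": media_id, "segment_index": index},
            files={"media": data[offset:offset + SEGMENT_SIZE]},
        )
        r.raise_for_status()

    # Finalize; the media can now be referenced by its mediaId.
    r = requests.post(UPLOAD, data={"command": "FINALIZE", "media_id": media_id})
    r.raise_for_status()
    return media_id
```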

This approach is much more resilient to network failures. Each individual segment can be retried. If the network goes down for any reason, you can pause and pick up at the segment you left off when the network comes back.
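Because each segment is addressed independently, a failure only costs that one segment. A minimal retry loop around a single append might look like this (the backoff policy and the append_segment callable are assumptions, not Twitter’s actual code):

```python
import time

def append_with_retry(append_segment, media_id, index, segment, attempts=5):
    """Retry one segment append; already-uploaded segments are untouched.

    append_segment is any callable performing a single APPEND call that
    raises on failure (for example, a network drop mid-upload).
    """
    for attempt in range(attempts):
        try:
            return append_segment(media_id, index, segment)
        except OSError:  # network errors from requests subclass OSError
            if attempt == attempts - 1:
                raise
            time.sleep(2 ** attempt)  # back off, then resume this segment only
```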

A simple approach with huge gains. For files > 50KB there was a 33% drop in image upload failure rate in Brazil, 30% in India, and 19% in Indonesia.
