@jianminchen
Created February 18, 2020 19:58
System design - Feb. 16, 2020 - design Twitter (first 40 minutes), design code deployment (last 10 minutes)
Design a simple version of Twitter: at most 100 followers per person, RESTful API. Later on: celebrities with millions of followers, hashtags, and search by hashtag.
Accounts, 100 followers max
Searching - hashtag, text
Feed generation
SCHEMA
------
Account
- handle (PK)
- display name
- followers
- feed (point to head of linked list)
- last_login
- src IP address
Follow table
- src
- dst
Message
- message_id (PK)
- author (FK into account)
- content (str, max 140 chars)
- hashtags
- time
Hashtags
- hashtag (PK)
- message_id (FK into message)
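A minimal sketch of the schema above in SQLite, for illustration only. The column types, the table and column spellings, and the composite keys on follow and hashtag are assumptions (the notes list hashtag alone as the key, which would allow only one message per hashtag). The denormalized followers and feed fields on Account are left out here.

```python
import sqlite3

# Assumed DDL: types and composite primary keys are not specified in the notes.
SCHEMA = """
CREATE TABLE account (
    handle       TEXT PRIMARY KEY,
    display_name TEXT NOT NULL,
    last_login   TEXT,
    src_ip       TEXT
);
CREATE TABLE follow (
    src TEXT NOT NULL REFERENCES account(handle),
    dst TEXT NOT NULL REFERENCES account(handle),
    PRIMARY KEY (src, dst)
);
CREATE TABLE message (
    message_id INTEGER PRIMARY KEY,
    author     TEXT NOT NULL REFERENCES account(handle),
    content    TEXT NOT NULL CHECK (length(content) <= 140),
    time       TEXT NOT NULL
);
CREATE TABLE hashtag (
    hashtag    TEXT NOT NULL,
    message_id INTEGER NOT NULL REFERENCES message(message_id),
    PRIMARY KEY (hashtag, message_id)
);
"""


def create_db():
    """Create an in-memory database with the sketched tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.executescript(SCHEMA)
    return conn
```

With this sketch, inserting a message longer than 140 characters fails the CHECK constraint and raises sqlite3.IntegrityError.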
API
---
different endpoints for different functions
use cases:
- post message
- delete message?
- search_text
- search_hashtag
- request_feed
- follow
post_message(handle, message) -> POST
delete_message(handle, message_id) -> DELETE
search_text(text) -> GET # need the requesting user's handle so we can see private tweets
search_hashtag(hashtag) -> GET
request_feed(handle) -> GET
follow(handle_src, handle_dst) -> POST
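One way to map the calls above onto REST endpoints is sketched below. The HTTP methods come from the notes; the URL paths, the ROUTES table, and the build_request helper are hypothetical and only illustrate the "different endpoints for different functions" idea.

```python
from urllib.parse import quote

# Method per call is from the notes; every path template here is an assumption.
ROUTES = {
    "post_message":   ("POST",   "/accounts/{handle}/messages"),
    "delete_message": ("DELETE", "/accounts/{handle}/messages/{message_id}"),
    "search_text":    ("GET",    "/search/text?q={text}"),
    "search_hashtag": ("GET",    "/search/hashtags/{hashtag}"),
    "request_feed":   ("GET",    "/accounts/{handle}/feed"),
    "follow":         ("POST",   "/accounts/{handle_src}/following/{handle_dst}"),
}


def build_request(name, **params):
    """Return (method, url) for an API call, percent-encoding each parameter."""
    method, template = ROUTES[name]
    encoded = {key: quote(str(value), safe="") for key, value in params.items()}
    return method, template.format(**encoded)
```

For example, build_request("follow", handle_src="a", handle_dst="b") gives a POST to /accounts/a/following/b. Message bodies would travel in the request body rather than the URL.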
SQL Database + cache (write through to elasticsearch)
Elasticsearch (for text queries only)
Text search service # maybe backed by an alternative database suited for querying, e.g. Elasticsearch or another document store
Hashtag search service
Feed generation service
Fraud detection service
Notification service -> which protocol? e.g. long polling
Follow service (checks whether exceeded follow limit, raise event to regenerate feed)
When user A posts a tweet -> raise event -> queue to recompute feeds for each user B that follows user A
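The follow limit check and the post -> event -> feed update flow can be sketched in memory as below. This is a toy under stated assumptions: the FeedSystem class, its method names, and the in-process deque standing in for a real message queue are all hypothetical, and the limit is read as "at most 100 followers per account".

```python
from collections import defaultdict, deque

MAX_FOLLOWERS = 100  # from the notes: 100 followers max per account


class FeedSystem:
    def __init__(self):
        self.followers = defaultdict(set)  # dst handle -> handles following dst
        self.feeds = defaultdict(deque)    # handle -> message ids, newest first
        self.events = deque()              # stand-in for a real message queue

    def follow(self, src, dst):
        """Follow service: reject the follow once dst is at the follower limit."""
        if src == dst or src in self.followers[dst]:
            return False
        if len(self.followers[dst]) >= MAX_FOLLOWERS:
            raise ValueError(f"{dst} already has {MAX_FOLLOWERS} followers")
        self.followers[dst].add(src)
        return True

    def post_message(self, author, message_id):
        """Posting only raises an event; feeds are updated asynchronously."""
        self.events.append((author, message_id))

    def process_events(self):
        """Worker: push each queued message onto every follower's feed."""
        while self.events:
            author, message_id = self.events.popleft()
            for follower in self.followers[author]:
                self.feeds[follower].appendleft(message_id)

    def request_feed(self, handle, limit=20):
        return list(self.feeds[handle])[:limit]
```

Pushing on write keeps request_feed cheap while follower counts are capped at 100. With celebrity accounts and millions of followers, the loop in process_events is the part that would need rethinking.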
Elasticsearch
words -> documents (tweets)
"the dog went to the park" Tweet #23
Index - inverted index
"the" -> [23]
"dog" -> [23]
..
..
..
"park" -> [23]
"typical index" id -> contents
"inverted index" contents -> id
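A toy inverted index matching the example above. It only illustrates the words -> tweet ids mapping; the InvertedIndex class and its tokenizer are assumptions, and Elasticsearch's actual analysis and ranking are far more involved.

```python
import re
from collections import defaultdict


class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # word -> set of tweet ids

    @staticmethod
    def tokenize(text):
        # Assumed tokenizer: lowercase runs of letters, digits, and apostrophes.
        return re.findall(r"[a-z0-9']+", text.lower())

    def add(self, tweet_id, text):
        for word in self.tokenize(text):
            self.postings[word].add(tweet_id)

    def search(self, query):
        """Return ids of tweets containing every word in the query."""
        words = self.tokenize(query)
        if not words:
            return set()
        result = set(self.postings.get(words[0], set()))
        for word in words[1:]:
            result &= self.postings.get(word, set())
        return result
```

After add(23, "the dog went to the park"), both search("dog") and search("dog park") return {23}, while a word that was never indexed returns an empty set.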
How to scale the system?
How to see tweets of people they are following?
14:00 create new account
14:000001 100 new follows
14:02 1 follow
14:03 1 follow
How to deploy code?
Build,
4GB -> code
10,000 deployment instances?
7 regions
Problems:
- Efficient way to build code so we don't have to build it 10,000 times? A: build once into a Docker container image to avoid recompiling 4GB of code for each deployment
- How to update live version with new build?
- Load balancers:
  - spin up a small number of instances with the new build
  - load balancer redirects a small portion of traffic to the new instances
  - wait to see if there are errors/crashes etc.
  - gradually introduce new instances of the new build, and remove instances of the old build
  - if errors, take the new instances down until we fix the bug
  - if no errors, continue increasing instances of the new build until it reaches 10,000
  - schedule based on low-demand times for each region
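The gradual rollout steps above can be sketched as a loop. This assumes a hypothetical error_rate callback that reports the observed error fraction once a given number of new-build instances is live. The batch sizes, the doubling factor, and the 1% threshold are illustrative values, not numbers from the discussion.

```python
def rollout_steps(total=10_000, first=10, factor=2):
    """Instance counts for each rollout stage, e.g. 10, 20, 40, ... up to total."""
    if first < 1 or factor <= 1:
        raise ValueError("first must be >= 1 and factor must be > 1")
    steps, count = [], first
    while count < total:
        steps.append(count)
        count *= factor
    steps.append(total)
    return steps


def canary_rollout(total, error_rate, max_error_rate=0.01, first=10, factor=2):
    """Grow the new build stage by stage; roll back to 0 instances on errors.

    error_rate(n_new) is a hypothetical health check: the fraction of failed
    requests observed while n_new instances run the new build.
    """
    for n_new in rollout_steps(total, first, factor):
        if error_rate(n_new) > max_error_rate:
            return 0  # take the new instances down until the bug is fixed
    return total
```

A real rollout would also wait between stages and run per region during low-demand hours; this sketch omits both.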