@jianminchen
Created March 23, 2020 22:24
Design Twitter - March 22, 2020, 10:00 PM. I worked as the interviewer. Excellent performance from an anonymous interviewee.
Use cases
Functional requirements:
We'll scope the problem to handle only the following use cases:
i) User posts a tweet
average size of a tweet, including any video or audio, is 10 KB
ii) Service pushes tweets to followers, sending push notifications and emails
latency is less than 2 seconds
iii) User views the user timeline (activity from the user) - chronological order
50 tweets
iv) User views the home timeline (activity from people the user is following) - chronological order
50 tweets
v) User searches keywords
returns tweets related to the keywords -> at most 20 results, sorted by timestamp
vi) Performance metrics, monitoring, and logging
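A minimal in-memory sketch of how use cases i to iv could fit together with fan-out on write, assuming tweet IDs are assigned in increasing order so that ID order is chronological order. The class and method names are hypothetical, push notifications and emails are left out, and a real service would keep these timelines in a distributed cache rather than in process memory.

```python
from collections import defaultdict, deque
from itertools import count

TIMELINE_SIZE = 50  # use cases iii and iv: timelines return 50 tweets


class FanOutService:
    """Hypothetical in-memory sketch of fan-out on write."""

    def __init__(self):
        self._next_id = count(1)  # increasing IDs double as chronological order
        self.followers = defaultdict(set)  # user_id -> IDs of users following them
        self.user_timelines = defaultdict(list)  # user_id -> own tweet IDs, oldest first
        # follower_id -> tweet IDs, newest first; maxlen drops the oldest automatically
        self.home_timelines = defaultdict(lambda: deque(maxlen=TIMELINE_SIZE))

    def follow(self, follower_id, followee_id):
        self.followers[followee_id].add(follower_id)

    def post_tweet(self, user_id):
        tweet_id = next(self._next_id)
        self.user_timelines[user_id].append(tweet_id)
        # Push the new ID to the front of every follower's home timeline.
        for follower_id in self.followers[user_id]:
            self.home_timelines[follower_id].appendleft(tweet_id)
        return tweet_id

    def get_home_timeline(self, user_id):
        return list(self.home_timelines[user_id])

    def get_user_timeline(self, user_id):
        # Newest first, capped at TIMELINE_SIZE.
        return self.user_timelines[user_id][::-1][:TIMELINE_SIZE]
```

Fan-out on write makes home timeline reads cheap at the cost of one write per follower each time a tweet is posted.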
Non-functional requirements:
i) Service has high availability
ii) Latency should be low
Layout:
Estimates: 100 million daily active users, 500 million tweets per day -> 500 million / 24 ~ 20.8 million per hour => peak hours (2x) ~ 41.7 million per hour / 3600 ~ 11,600 requests/sec
RPS: ~11,600 req/sec at peak
Tweet storage (2 years): 500 million * 365 * 2 = 1 billion * 365 => 365 billion tweets stored -> 365 billion * 10 KB => ~3.65 PB data
User profile info: 200 KB per user -> 200 KB * 500 million total users => 100 TB data
Timeline: 50 tweet IDs per user -> 50 * 8 bytes = 400 bytes per user -> 400 bytes * 500 million => 200 GB data
Followers: 10 followers per user -> 10 * 8 bytes = 80 bytes per user -> 80 bytes * 500 million => 40 GB data
Storage estimate: ~3.75 PB data (tweets + user profiles + timelines + followers), dominated by tweet storage
Memory estimate: 20% rule applied to timelines -> 200 GB * 20% => 40 GB data
Bandwidth estimate: 11,600 req/sec * 50 tweets * 10 KB => ~5.8 GB/sec
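The arithmetic above can be checked with a short script. It assumes decimal units (1 KB = 1,000 bytes), a peak factor of 2, 8-byte tweet and user IDs, and that each peak request reads a full 50-tweet timeline, all as in the notes; the constant names are illustrative.

```python
# Back-of-envelope capacity estimates; inputs mirror the notes above.
KB, GB, TB, PB = 10**3, 10**9, 10**12, 10**15  # decimal units

TWEETS_PER_DAY = 500_000_000
TOTAL_USERS = 500_000_000
TWEET_SIZE = 10 * KB
PEAK_FACTOR = 2            # assumed peak-to-average ratio
RETENTION_DAYS = 365 * 2   # keep 2 years of tweets
TIMELINE_LEN = 50          # tweet IDs kept per user timeline
FOLLOWERS_PER_USER = 10
ID_SIZE = 8                # bytes per tweet or user ID
PROFILE_SIZE = 200 * KB


def estimates():
    peak_rps = TWEETS_PER_DAY / 86_400 * PEAK_FACTOR
    tweets = TWEETS_PER_DAY * RETENTION_DAYS * TWEET_SIZE
    profiles = TOTAL_USERS * PROFILE_SIZE
    timelines = TOTAL_USERS * TIMELINE_LEN * ID_SIZE
    followers = TOTAL_USERS * FOLLOWERS_PER_USER * ID_SIZE
    return {
        "peak_rps": peak_rps,
        "tweet_storage": tweets,
        "profile_storage": profiles,
        "timeline_storage": timelines,
        "follower_storage": followers,
        "total_storage": tweets + profiles + timelines + followers,
        "cache": timelines * 20 // 100,  # 20% rule applied to timelines
        # Each peak request is assumed to read a full timeline of tweets.
        "bandwidth": peak_rps * TIMELINE_LEN * TWEET_SIZE,  # bytes/sec
    }


if __name__ == "__main__":
    e = estimates()
    print(f"peak RPS        ~ {e['peak_rps']:,.0f}")
    print(f"tweet storage   ~ {e['tweet_storage'] / PB:.2f} PB")
    print(f"profile storage ~ {e['profile_storage'] / TB:.0f} TB")
    print(f"total storage   ~ {e['total_storage'] / PB:.2f} PB")
    print(f"timeline cache  ~ {e['cache'] / GB:.0f} GB")
    print(f"bandwidth       ~ {e['bandwidth'] / GB:.1f} GB/sec")
```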
API design:
i) postTweet(userId, tweetText, location, timestamp, byte[] media)
ii) getHomeTimeline(userId, timestamp, size)
iii) getUserTimeline(userId (owner of the user timeline), visitorId, size)
iv) search(userId, location, timestamp, searchText)
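A minimal sketch of the keyword matching that search() could rely on, assuming an inverted index from lowercase word to tweet IDs and AND semantics across keywords. The class is hypothetical and ignores the userId, location, and timestamp parameters; it only shows the 20-result cap with newest-first ordering from use case v.

```python
import heapq
import re
from collections import defaultdict

MAX_RESULTS = 20  # use case v: at most 20 results


class KeywordIndex:
    """Hypothetical inverted index: word -> set of tweet IDs."""

    def __init__(self):
        self._postings = defaultdict(set)
        self._timestamps = {}  # tweet ID -> timestamp

    @staticmethod
    def _tokenize(text):
        return set(re.findall(r"\w+", text.lower()))

    def add(self, tweet_id, text, timestamp):
        self._timestamps[tweet_id] = timestamp
        for token in self._tokenize(text):
            self._postings[token].add(tweet_id)

    def search(self, search_text, limit=MAX_RESULTS):
        tokens = self._tokenize(search_text)
        if not tokens:
            return []
        # AND semantics: a tweet must contain every keyword.
        matches = set.intersection(*(self._postings.get(t, set()) for t in tokens))
        # Keep only the newest `limit` matches, newest first.
        return heapq.nlargest(limit, matches, key=lambda tid: self._timestamps[tid])
```

heapq.nlargest avoids sorting every match when a popular keyword hits many tweets but only 20 are returned.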
Database design:
User:
UserID, location, creationdate, password, lastlogindatetime, thumbnail, profiledesc
UserTweet:
UserID (row key)
tweetId, creation date
Tweet
TweetId, Text, mediaid (unique identifier), comments, retweetCount, likeCount
Media
MediaId, blobpath, creationdate
Followers: (who is following me)
UserId
FollowerId
Following: (who I am following)
UserId
FollowingIds
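One way to express part of this schema in code, assuming Python dataclasses as a stand-in for table rows. The field types are guesses (for example, comments is modeled as a list of comment IDs), and FollowGraph is a hypothetical helper that writes the Followers and Following tables together so they never disagree.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Media:
    media_id: int
    blob_path: str
    creation_date: datetime


@dataclass
class Tweet:
    tweet_id: int
    text: str
    media_id: Optional[int] = None  # points at a Media row, if any
    comment_ids: List[int] = field(default_factory=list)  # assumed representation
    retweet_count: int = 0
    like_count: int = 0


@dataclass
class UserTweet:
    user_id: int  # row key: all of a user's tweets are stored together
    tweet_id: int
    creation_date: datetime


class FollowGraph:
    """Hypothetical helper keeping Followers and Following in sync."""

    def __init__(self):
        self.followers = defaultdict(set)  # user_id -> who is following me
        self.following = defaultdict(set)  # user_id -> who I am following

    def follow(self, user_id, target_id):
        self.following[user_id].add(target_id)
        self.followers[target_id].add(user_id)

    def unfollow(self, user_id, target_id):
        self.following[user_id].discard(target_id)
        self.followers[target_id].discard(user_id)
```

Storing both directions doubles the follower data (still small per the estimates) but lets fan-out read Followers and the home timeline read Following without a reverse lookup.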
High level design:
Caching
Partitioning
Scalability
Availability
Latency
Bottlenecks
Extra stuff
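For the Partitioning item, a minimal sketch of hash partitioning keyed on UserID, matching the UserTweet row key. It assumes a fixed shard count and uses MD5 only as a stable, well-distributed hash, because Python's built-in hash() is randomized per process for strings. The function name is hypothetical.

```python
import hashlib


def shard_for_user(user_id: int, num_shards: int) -> int:
    """Return the shard index for a user (hypothetical helper).

    MD5 gives the same answer in every process, unlike hash() on strings,
    so every service instance routes a given user to the same shard.
    """
    digest = hashlib.md5(str(user_id).encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_shards


if __name__ == "__main__":
    # All UserTweet rows for user 42 live on one shard, so the
    # user timeline query touches a single partition.
    print(shard_for_user(42, num_shards=16))
```

With plain hash-mod placement, changing num_shards moves most users to a different shard; consistent hashing is the usual way to limit that movement.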