Created
March 23, 2020 22:24
Design Twitter - March 22, 2020, 10:00 PM - worked as the interviewer. Excellent performance from an anonymous interviewee.
Use cases
Functional requirements:
We'll scope the problem to handle only the following use cases:
i) User posts a tweet
avg tweet, video, or audio combo size is 10 KB
ii) Service pushes tweets to followers, sending push notifications and emails
latency is less than 2 seconds
iii) User views the user timeline (activity from the user) - chronological order
50 tweets
iv) User views the home timeline (activity from people the user is following) - chronological order
50 tweets
v) User searches keywords
tweets related to keywords -> max search result range 20 tweets (sorted by timestamp)
vi) Performance metrics, monitoring, and logging
Non-functional requirements:
i) Service has high availability
ii) Latency should be low
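Use case (v), keyword search, can be served by an inverted index over tweet text. A minimal in-memory sketch (the class and method names here are illustrative, not a real search backend):

```python
from collections import defaultdict

# Hypothetical in-memory inverted index for keyword search.
# Maps each lowercased token to a list of (timestamp, tweet_id) postings.
class TweetSearchIndex:
    def __init__(self):
        self.index = defaultdict(list)

    def add_tweet(self, tweet_id, text, timestamp):
        # Index each distinct token of the tweet once
        for token in set(text.lower().split()):
            self.index[token].append((timestamp, tweet_id))

    def search(self, keyword, limit=20):
        # Sort postings by timestamp, newest first, capped at 20 results
        postings = sorted(self.index[keyword.lower()], reverse=True)
        return [tweet_id for _, tweet_id in postings[:limit]]

idx = TweetSearchIndex()
idx.add_tweet(1, "Design Twitter at scale", 100)
idx.add_tweet(2, "Twitter home timeline", 200)
print(idx.search("twitter"))  # newest first: [2, 1]
```

A production system would shard this index by term and merge-sort results at query time; the 20-tweet cap from the use case keeps each query cheap.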
Layout:
Estimates: 100 million daily active users, 500 million tweets per day -> 500 million / 24 ≈ 21 million tweets per hour on average => peak hours ≈ 2x average => 42 million per hour / 3600 ≈ 11,600 requests/sec
RPS: ~12,000 req/sec at peak
Tweet storage: 500 million * 365 days * 2 years = 365 billion tweets retained -> 365 billion * 10 KB ≈ 4 PB data
User profile info: 200 KB per user -> 200 KB * 500 million total users = 100 TB data
Timeline: 50 tweets per user -> 50 tweet IDs * 8 bytes = 400 bytes per user -> 400 bytes * 500 million = 200 GB data
Followers: ~10 followers per user -> 8 bytes * 10 = 80 bytes per user -> 80 bytes * 500 million = 40 GB data
Storage estimates: ~4 PB data total (tweets + user profiles + timelines + followers)
Memory estimates: 20% rule -> cache 20% of daily tweet data (500 million * 10 KB = 5 TB/day) => ~1 TB of cache
Bandwidth estimates: ~12,000 req/sec * 50 tweets * 10 KB ≈ 6 GB/sec
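These back-of-the-envelope figures can be sanity-checked in a few lines (assuming peak traffic is roughly 2x the daily average):

```python
# Back-of-envelope check of the capacity estimates.
tweets_per_day = 500e6
avg_tweet_kb = 10

avg_per_sec = tweets_per_day / 86_400             # tweets/sec, daily average
peak_per_sec = avg_per_sec * 2                    # assume peak = 2x average

tweet_storage_pb = tweets_per_day * 365 * 2 * avg_tweet_kb / 1e12  # KB -> PB
profile_tb = 200 * 500e6 / 1e9                    # 200 KB/user, KB -> TB
timeline_gb = 50 * 8 * 500e6 / 1e9                # 50 ids * 8 B/user -> GB
followers_gb = 10 * 8 * 500e6 / 1e9               # 10 ids * 8 B/user -> GB
cache_tb = tweets_per_day * avg_tweet_kb * 0.20 / 1e9   # 20% of daily data
bandwidth_gb_s = peak_per_sec * 50 * avg_tweet_kb / 1e6  # KB/s -> GB/s

print(round(peak_per_sec), tweet_storage_pb, profile_tb,
      timeline_gb, followers_gb, cache_tb, round(bandwidth_gb_s, 1))
```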
API design: | |
i) postTweet(userId, tweetText, location, timestamp, byte[] media) | |
ii) getHomeTimeline(userId, timestamp, size) | |
iii) getUserTimeline(userId, visitorId, size)
iv) search(userId, location, timestamp, searchText) | |
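A toy in-memory version of the post/read path can make the API concrete. Method names mirror the API above; the storage layout, sequential ID generation, and the omission of media handling are all simplifying assumptions:

```python
import time
from collections import defaultdict

# Minimal in-memory sketch of postTweet / getUserTimeline (no media, no auth).
class TweetService:
    def __init__(self):
        self.tweets = {}                      # tweetId -> tweet record
        self.user_tweets = defaultdict(list)  # userId -> [tweetId], append order
        self.next_id = 1

    def post_tweet(self, user_id, tweet_text, location=None, timestamp=None):
        tweet_id = self.next_id
        self.next_id += 1
        self.tweets[tweet_id] = {
            "tweetId": tweet_id, "userId": user_id, "text": tweet_text,
            "location": location, "timestamp": timestamp or time.time(),
        }
        self.user_tweets[user_id].append(tweet_id)
        return tweet_id

    def get_user_timeline(self, user_id, visitor_id=None, size=50):
        # Reverse chronological order, capped at `size` (50 per the use case)
        ids = self.user_tweets[user_id][-size:]
        return [self.tweets[t] for t in reversed(ids)]

svc = TweetService()
svc.post_tweet(7, "first", timestamp=1)
svc.post_tweet(7, "second", timestamp=2)
print([t["text"] for t in svc.get_user_timeline(7)])  # ['second', 'first']
```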
Database design: | |
User:
UserID, Location, CreationDate, Password, LastLoginDateTime, Thumbnail, ProfileDesc
UserTweet:
UserID (row key)
TweetId, CreationDate
Tweet:
TweetId, Text, MediaId (unique identifier), Comments, RetweetCount, LikeCount
Media:
MediaId, BlobPath, CreationDate
Followers: (who is following me)
UserId
FollowerId
Following: (whom I am following)
UserId
FollowingIds
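The tables above can be written down as plain records. The field types here (ints for IDs, floats for timestamps) are assumptions, and the Password column is represented as a hash, which is standard practice:

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Record shapes matching the schema sketch; types are assumptions.
@dataclass
class User:
    user_id: int
    location: str
    creation_date: float
    password_hash: str          # store a hash, never the raw password
    last_login: Optional[float] = None
    thumbnail: Optional[str] = None
    profile_desc: str = ""

@dataclass
class Tweet:
    tweet_id: int
    text: str
    media_id: Optional[int] = None
    retweet_count: int = 0
    like_count: int = 0

@dataclass
class Media:
    media_id: int
    blob_path: str
    creation_date: float

@dataclass
class UserTweet:                # row key = user_id; one row per (user, tweet)
    user_id: int
    tweet_id: int
    creation_date: float

@dataclass
class Following:                # whom user_id follows
    user_id: int
    following_ids: List[int] = field(default_factory=list)
```

Splitting Followers/Following into their own tables keeps both directions of the relationship cheap to query and easy to partition by user ID.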
High level design: | |
Caching
Partitioning
Scalability
Availability
Latency
Bottlenecks
Extra stuff
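Use case (ii), pushing tweets to followers, is commonly handled with fan-out-on-write: at post time, the new tweet ID is prepended to each follower's cached home timeline, so reads are a simple lookup. A minimal in-memory sketch (the data structures and names here are hypothetical, not the real architecture):

```python
from collections import defaultdict, deque

TIMELINE_SIZE = 50  # the home timeline holds 50 tweets per the use cases

# followers[user] = set of users who follow `user` (the fan-out targets)
followers = defaultdict(set)
# home_timeline[user] = newest-first tweet ids, capped at TIMELINE_SIZE
home_timeline = defaultdict(lambda: deque(maxlen=TIMELINE_SIZE))

def follow(follower_id, followee_id):
    followers[followee_id].add(follower_id)

def post_tweet(user_id, tweet_id):
    # Fan-out on write: push the new tweet id to every follower's timeline.
    # (Real systems special-case celebrity accounts, merging their tweets
    # at read time instead, to avoid multi-million-row fan-outs.)
    for follower_id in followers[user_id]:
        home_timeline[follower_id].appendleft(tweet_id)

follow(1, 2)   # user 1 follows user 2
follow(3, 2)
post_tweet(2, 101)
post_tweet(2, 102)
print(list(home_timeline[1]))  # [102, 101]
```

The bounded deque mirrors the 50-tweet cap from the requirements and keeps the per-user cache footprint fixed, matching the timeline storage estimate above.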