jianminchen/Design twitter

## Design twitter
Use cases

Functional requirements:

We'll scope the problem to handle only the following use cases
i) User posts a tweet
  Average tweet size, including any video or audio, is 10 KB

ii) Service pushes tweets to followers, sending push notifications and emails
  latency is less than 2 seconds

iii) User views the user timeline (activity from the user) - chronological order
  50 tweets
iv) User views the home timeline (activity from people the user is following) - chronological order
  50 tweets

v) User searches keywords
  tweets related to the keywords -> at most 20 results, sorted by timestamp

vi) Performance metrics, monitoring, or logging

Non-functional requirements:

i) Service has high availability
ii) Latency should be low

Layout:
Estimates 100 million daily active users, 500 million tweets per day -> 500 million / 24 = ~21 million per hour => peak hours (2x) => ~42 million per hour / 3600 = ~11,600 requests/sec
  RPS : ~11,600 req/sec at peak (~5,800 req/sec on average)
 tweet storage: 500 million * 365 * 2 = 1 billion * 365 => 365 billion tweets held -> 365 billion * 10 KB => ~3.65 PB data (about 4 PB)
 user profile info : 200 KB per user -> 200 KB * 500 million total users => 100 TB data
 Timeline: 50 tweets per user -> 50 * 8 bytes = 400 bytes per user -> 400 bytes * 500 million => 200 GB data
 Followers: 10 followers -> 8 bytes * 10 = 80 bytes per user -> 80 bytes * 500 million => 40 GB data

  Storage estimates - ~3.75 PB data (tweet info + user profile + timeline + followers)
  Memory estimates - 20 % rule on the timeline data -> 200 GB * 20 % => 40 GB data
  Bandwidth estimates -> 11,600 req/sec * 50 tweets * 10 KB => ~5.8 GB / sec
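
Back-of-envelope figures like these are easy to get wrong by a factor of ten, so it can help to recompute them from the stated inputs. The sketch below is only a sanity check: the function name `estimate`, its parameter names, the 2x peak factor, and decimal units (1 KB = 1000 bytes) are assumptions made for illustration.

```python
# Decimal units are an assumption; the notes do not say KB = 1000 or 1024 bytes.
KB, GB, TB, PB = 10**3, 10**9, 10**12, 10**15


def estimate(tweets_per_day=500_000_000, total_users=500_000_000,
             tweet_bytes=10 * KB, profile_bytes=200 * KB,
             timeline_len=50, followers_per_user=10, id_bytes=8,
             retention_factor=2, peak_factor=2):
    """Recompute the capacity numbers from the inputs listed in the notes."""
    avg_rps = tweets_per_day / 86_400          # seconds per day
    peak_rps = avg_rps * peak_factor           # assumed 2x peak-hour load

    tweet_storage = tweets_per_day * 365 * retention_factor * tweet_bytes
    profile_storage = total_users * profile_bytes
    timeline_storage = total_users * timeline_len * id_bytes
    follower_storage = total_users * followers_per_user * id_bytes

    return {
        "avg_rps": avg_rps,
        "peak_rps": peak_rps,
        "tweet_storage": tweet_storage,
        "profile_storage": profile_storage,
        "timeline_storage": timeline_storage,
        "follower_storage": follower_storage,
        "total_storage": (tweet_storage + profile_storage
                          + timeline_storage + follower_storage),
        # 20 % rule applied to the timeline data set
        "memory": timeline_storage * 0.2,
        # each request is assumed to return a 50-tweet page
        "bandwidth": peak_rps * timeline_len * tweet_bytes,
    }


if __name__ == "__main__":
    est = estimate()
    print(f"peak RPS        : {est['peak_rps']:,.0f} req/sec")
    print(f"tweet storage   : {est['tweet_storage'] / PB:.2f} PB")
    print(f"profile storage : {est['profile_storage'] / TB:.0f} TB")
    print(f"timeline storage: {est['timeline_storage'] / GB:.0f} GB")
    print(f"follower storage: {est['follower_storage'] / GB:.0f} GB")
    print(f"total storage   : {est['total_storage'] / PB:.2f} PB")
    print(f"bandwidth       : {est['bandwidth'] / GB:.1f} GB/sec")
```

Changing a single input, for example `tweet_bytes` or `retention_factor`, shows which assumption dominates the total; with the defaults above, tweet storage dwarfs everything else.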

API design:
i) postTweet(userId, tweetText, location, timestamp, byte[] media)
ii) getHomeTimeline(userId, timestamp, size)
iii) getUserTimeline(userId (timeline owner), visitorId, size)
iv) search(userId, location, timestamp, searchText)
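
To make the four calls concrete, here is a minimal in-memory sketch of how they might fit together. It is not a production design: the class name `TwitterService`, the `follow` helper, fan-out on write, newest-first ordering, and treating `timestamp` as a "return tweets at or before this time" cursor are all assumptions. The `location` argument of `search` and the `visitor_id` argument are accepted but not used here.

```python
import itertools
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tweet:
    tweet_id: int
    user_id: int
    text: str
    location: Optional[str]
    timestamp: float
    media: Optional[bytes] = None


class TwitterService:
    """Hypothetical in-memory stand-in for the API listed above."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._followers = defaultdict(set)       # user -> users following them
        self._user_timeline = defaultdict(list)  # user -> their own tweets
        self._home_timeline = defaultdict(list)  # user -> tweets from followees

    def follow(self, follower_id, followee_id):
        # Helper not in the notes; needed so fan-out has someone to reach.
        self._followers[followee_id].add(follower_id)

    def post_tweet(self, user_id, tweet_text, location, timestamp, media=None):
        tweet = Tweet(next(self._ids), user_id, tweet_text,
                      location, timestamp, media)
        self._user_timeline[user_id].append(tweet)
        # Fan-out on write: copy the tweet into each follower's home timeline.
        for follower_id in self._followers[user_id]:
            self._home_timeline[follower_id].append(tweet)
        return tweet.tweet_id

    def get_home_timeline(self, user_id, timestamp, size=50):
        return self._latest(self._home_timeline[user_id], timestamp, size)

    def get_user_timeline(self, user_id, visitor_id, size=50):
        # visitor_id mirrors the signature; no access checks are done here.
        return self._latest(self._user_timeline[user_id], float("inf"), size)

    def search(self, user_id, location, timestamp, search_text, limit=20):
        words = search_text.lower().split()
        hits = [t for tweets in self._user_timeline.values() for t in tweets
                if t.timestamp <= timestamp
                and all(w in t.text.lower() for w in words)]
        return sorted(hits, key=lambda t: t.timestamp, reverse=True)[:limit]

    @staticmethod
    def _latest(tweets, before, size):
        older = [t for t in tweets if t.timestamp <= before]
        return sorted(older, key=lambda t: t.timestamp, reverse=True)[:size]
```

A linear scan for `search` only works at toy scale; a real service would more likely use an inverted index, and fan-out on write becomes expensive for accounts with very many followers.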

Database design:
User:
UserID, location, creationdate, password, lastlogindatetime, thumbnail, profiledesc

UserTweet:
UserID (row key)
tweetId, creation date

Tweet
TweetId, Text, mediaid (unique identifier), comments, retweetCount, likeCount

Media
MediaId, blobpath, creationdate

Followers: (who is following me)
UserId
FollowerId

Following: (who I am following)
UserId
FollowingIds
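
The tables above can be written down as plain record types to check that the fields hang together. This is a sketch only: the snake_case names, the Python types, and the `FollowGraph` helper that keeps the Followers and Following tables in step are assumptions, not part of the notes.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class User:
    user_id: int
    location: str
    creation_date: datetime
    password: str                      # in practice, store only a salted hash
    last_login: Optional[datetime] = None
    thumbnail: Optional[str] = None
    profile_desc: str = ""


@dataclass
class UserTweet:
    user_id: int                       # row key
    tweet_id: int
    creation_date: datetime


@dataclass
class Tweet:
    tweet_id: int
    text: str
    media_id: Optional[int] = None     # points at a Media row
    comments: list = field(default_factory=list)
    retweet_count: int = 0
    like_count: int = 0


@dataclass
class Media:
    media_id: int
    blob_path: str
    creation_date: datetime


@dataclass
class FollowGraph:
    """Keeps the two follow tables consistent with each other."""
    followers: dict = field(default_factory=dict)   # user -> who follows them
    following: dict = field(default_factory=dict)   # user -> who they follow

    def follow(self, user_id, target_id):
        self.following.setdefault(user_id, set()).add(target_id)
        self.followers.setdefault(target_id, set()).add(user_id)

    def unfollow(self, user_id, target_id):
        self.following.get(user_id, set()).discard(target_id)
        self.followers.get(target_id, set()).discard(user_id)
```

Storing the relationship in both directions doubles the writes, but lets "who follows me" (needed for pushing tweets) and "who do I follow" (needed for building the home timeline) each be answered with a single lookup by `user_id`.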

High level design:


Caching
Partitioning
Scalability
Availability
Latency
bottlenecks
extra stuff