@jianminchen
Created January 28, 2020 19:27
Design twitter - Jan. 27, 2020, 10:00 PM system design mock interview as an interviewer
design a twitter, let us start: fewer than 1000 users, RESTful API, post tweet, follow, followers can see the timeline, hashtag, search using hashtag -> scalability, bottlenecks
API
authentication jwt token
encrypt user_id with other information in the token; the token remains valid for around 1 week
define salt key: a random string
if decrypt fails, the token is invalid
we store the token in redis after the user authenticates
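A minimal sketch of the token idea above, using only the Python standard library. It is not a real JWT implementation: standard JWTs are usually signed rather than encrypted, so this sketch signs the payload with HMAC instead. SECRET_KEY, issue_token and verify_token are hypothetical names, and the redis storage step is left out.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

SECRET_KEY = b"random-string-kept-on-the-server"  # assumption: the "salt key" from the notes
ONE_WEEK = 7 * 24 * 3600  # token lifetime in seconds


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).decode()


def _sign(payload: str) -> str:
    return _b64(hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).digest())


def issue_token(user_id: int, now: Optional[float] = None) -> str:
    """Return "<payload>.<signature>" carrying user_id and an expiry time."""
    now = time.time() if now is None else now
    payload = _b64(json.dumps({"user_id": user_id, "exp": now + ONE_WEEK}).encode())
    return f"{payload}.{_sign(payload)}"


def verify_token(token: str, now: Optional[float] = None) -> Optional[int]:
    """Return the user_id, or None if the token is malformed, tampered with, or expired."""
    now = time.time() if now is None else now
    try:
        payload, sig = token.split(".")
    except ValueError:
        return None
    if not hmac.compare_digest(sig, _sign(payload)):
        return None
    claims = json.loads(base64.urlsafe_b64decode(payload))
    return claims["user_id"] if claims["exp"] > now else None
```

Because the expiry is inside the signed payload, the server can reject an old token without a lookup; storing it in redis as well would allow early logout.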
http protocol
register(username,password)
login(username, password)
post_tweet(user_id, content)
follow(user_id, followed_user_id)
get_recent_timeline(user_id, start, end) return array of tweets
search(hashtag) return array of tweets
header: 200 ok
500 server internal error
400 request parameter invalid
body: {
message: "ok",
error_code: 23,
error_message: ""
}
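A small sketch of how a handler might build that status plus body pair. make_response and STATUS_TEXT are hypothetical names, and using error_code 0 for success is an assumption; the notes only show the shape of the body.

```python
import json

# Assumed mapping; the notes only name these three statuses.
STATUS_TEXT = {
    200: "ok",
    400: "request parameter invalid",
    500: "server internal error",
}


def make_response(status, error_code=0, error_message=""):
    """Return (http_status, json_body) in the envelope shape from the notes."""
    body = {
        "message": STATUS_TEXT[status],
        "error_code": error_code,
        "error_message": error_message,
    }
    return status, json.dumps(body)
```

Example: make_response(400, 23, "content is empty") gives status 400 and a body whose error_code is 23.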
Table
User table
id , username , password, created_at
Tweet table
id, user_id , content, created_at
Hashtag table
id, tweet_id, hashtag
1, 1, hashtag1
2, 1, hashtag2
Follower table
user_id, follow_id (both ids are foreign keys to the user table), created_at
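The four tables above can be sketched with sqlite3 from the standard library (the notes use MySQL; sqlite is swapped in only so the example runs anywhere). Plural table names, column types and the composite primary key on the follower table are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    username TEXT UNIQUE,
    password TEXT,               -- store a password hash, never plain text
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE tweets (
    id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),
    content TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE hashtags (
    id INTEGER PRIMARY KEY,
    tweet_id INTEGER REFERENCES tweets(id),
    hashtag TEXT
);
CREATE TABLE followers (
    user_id INTEGER REFERENCES users(id),
    follow_id INTEGER REFERENCES users(id),
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (user_id, follow_id)
);
""")

# Sample rows matching the hashtag example: one tweet with two hashtags.
conn.execute("INSERT INTO users (username, password) VALUES (?, ?)", ("alice", "hashed-password"))
conn.execute("INSERT INTO tweets (user_id, content) VALUES (?, ?)", (1, "hello #hashtag1 #hashtag2"))
conn.executemany(
    "INSERT INTO hashtags (tweet_id, hashtag) VALUES (?, ?)",
    [(1, "#hashtag1"), (1, "#hashtag2")],
)
```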
Timeline:
aggregate recent tweets from the users we follow
the aggregation runs in the background and the result is stored in a redis cache
redis
user_id: [tweet_id, tweet_id, tweet_id ]
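A sketch of that background aggregation, with a plain dict standing in for redis and in-memory lists standing in for the Follower and Tweet tables. All names and sample rows here are hypothetical.

```python
# Hypothetical in-memory stand-ins for the Follower and Tweet tables.
follows = {1: [2, 3]}  # user_id -> ids of the users they follow
tweets = [             # (tweet_id, user_id, created_at)
    (10, 2, 100),
    (11, 3, 105),
    (12, 2, 110),
    (13, 4, 120),      # author 4 is not followed by user 1
]
timeline_cache = {}    # stands in for redis: user_id -> [tweet_id, ...]


def rebuild_timeline(user_id, limit=100):
    """Background job: collect recent tweets from followed users, newest first."""
    followed = set(follows.get(user_id, []))
    recent = sorted(
        (t for t in tweets if t[1] in followed),
        key=lambda t: t[2],
        reverse=True,
    )
    timeline_cache[user_id] = [tweet_id for tweet_id, _, _ in recent[:limit]]


def get_recent_timeline(user_id, start, end):
    """Read path: only touches the cache, never the tweet table."""
    return timeline_cache.get(user_id, [])[start:end]
```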
Diagram:
fewer than 1000 users
Client -> LB -> Reverse Proxy -> UserService -> Redis ->MYSQL
-> TimelineService -> Redis -> MYSQL
please describe and write down why we need reverse proxy:
Reverse Proxy
we can add rate limiter
SSL termination
compress outbound responses
blacklist certain IPs
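As an illustration of two of those duties, here is a sketch of a per-IP token-bucket rate limiter with an IP blacklist. In practice this would be proxy configuration rather than application code; token bucket is one common choice, not something the notes specify, and the rate, capacity and addresses are made up.

```python
import time


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, now=None):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = time.monotonic() if now is None else now

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


BLACKLIST = {"203.0.113.7"}  # hypothetical blocked address
buckets = {}                 # ip -> TokenBucket


def admit(ip, now=None):
    """Return True if the request should be forwarded to the backend."""
    if ip in BLACKLIST:
        return False
    if ip not in buckets:
        buckets[ip] = TokenBucket(rate=1.0, capacity=2, now=now)
    return buckets[ip].allow(now)
```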
select * from hashtag_table where hashtag like '#hashtag' - for searching hashtag
ranking based on like, retweet on tweet
Activity:
type (like, retweet) | actor_id ( user_id) | tweet_id | created_at
first query the activity table; if it grows big, store the totals in cache
tweet_id_total_likes: 123
tweet_id_total_retweet: 123
every time we do the aggregation we can also update the tweet table
denormalize
Tweet table
id, user_id , content, created_at, total_likes, total_retweet
Ranking:
we can read based on total_likes or total_retweet, depending on the algorithm
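A sketch of the aggregation and ranking steps above: Activity rows are folded into the denormalized totals, then tweets are sorted by a score. The weights in rank() are purely an assumption; the notes do not say how likes and retweets are combined.

```python
from collections import Counter


def aggregate(activity):
    """Fold (type, actor_id, tweet_id) rows into per-tweet totals."""
    counts = Counter((tweet_id, kind) for kind, _actor_id, tweet_id in activity)
    tweet_ids = {tweet_id for tweet_id, _kind in counts}
    return {
        tid: {
            "total_likes": counts[(tid, "like")],
            "total_retweet": counts[(tid, "retweet")],
        }
        for tid in tweet_ids
    }


def rank(totals, like_weight=1, retweet_weight=2):
    """Return tweet ids ordered by a weighted score, highest first."""
    def score(tid):
        t = totals[tid]
        return like_weight * t["total_likes"] + retweet_weight * t["total_retweet"]

    return sorted(totals, key=score, reverse=True)
```

The totals dict plays the role of the total_likes / total_retweet columns, so ranking never has to scan the Activity table.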
100 Million
something #hashtag something
if a tweet includes a hashtag, tokenize it with the hashtag as the main key, send it to a queue, and a worker indexes it into elastic search, which uses an inverted index
on a new tweet, send it async to the queue
a worker instance calls the index api of elastic search
an inverted index is a data structure for faster lookup
let's say
doc1 tweet #something #abc
doc2 tweet #something #abc2
tokenizer
#s
#so
#som
#some
// keyword index criteria -
1 "something like this #abc"
fuzzy search:
some -> 1
#ab -> -
#abc -> 1
tokenize words that do not have a hashtag
we define min length 4
some
somet
someth
somethi
somethin
something
like
#abc
#something doc1, doc2
#abc doc1
#abc2 doc2
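The tokenizer and inverted index above can be sketched in a few lines. This is a toy in-memory version, not how elastic search is implemented: hashtags are indexed whole, plain words are indexed by every prefix of length 4 or more, and words shorter than 4 are skipped here (an assumption). All function names are hypothetical.

```python
import re
from collections import defaultdict

MIN_LEN = 4  # minimum prefix length for words without a hashtag


def tokenize(text):
    """Return the set of index terms for one tweet."""
    tokens = set()
    for word in re.findall(r"#?\w+", text.lower()):
        if word.startswith("#"):
            tokens.add(word)  # hashtags are indexed as a whole
        else:
            # "something" -> some, somet, someth, ..., something
            for n in range(MIN_LEN, len(word) + 1):
                tokens.add(word[:n])
    return tokens


index = defaultdict(set)  # term -> set of doc ids


def index_doc(doc_id, text):
    for term in tokenize(text):
        index[term].add(doc_id)


def search(term):
    return sorted(index.get(term.lower(), ()))
```

With the two example docs indexed, search("#something") returns both docs while search("#abc") returns only doc1, matching the listing above.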
Client -> LB -> Reverse Proxy -> Fanout -> SearchService -> Queue -> Worker -> ElasticSearch
-> PostTweetService -> MYSQL
-> Redis
Fanout
Please write down some keywords to help understand Fanout
Fanout: send / forward one request to multiple services
Fan-in: call several services and merge the results
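A sketch of fanout followed by fan-in using a thread pool. The three service functions are hypothetical stand-ins for the SearchService, PostTweetService and TimelineService in the diagram; real ones would make network calls.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for the downstream services.
def search_service(tweet):
    return "search", f"queued tweet {tweet['id']} for indexing"


def post_tweet_service(tweet):
    return "post", f"stored tweet {tweet['id']}"


def timeline_service(tweet):
    return "timeline", f"updated timelines for tweet {tweet['id']}"


SERVICES = [search_service, post_tweet_service, timeline_service]


def fan_out(tweet):
    """Send one request to every service in parallel, then merge the replies."""
    with ThreadPoolExecutor(max_workers=len(SERVICES)) as pool:
        futures = [pool.submit(service, tweet) for service in SERVICES]
        # Fan-in: wait for each reply and merge them into a single result.
        return dict(future.result() for future in futures)
```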
-> TimelineService -> Redis -> MYSQL
we will have a lot of followers
we can use 2 mechanisms for the timeline
push / pull
hybrid
if a user has fewer than 1 million followers, the server pushes each new tweet to the followers
if a user has more than 1 million followers, followers must pull to get new tweets from that user
the push mechanism can be divided in two
SSE / Websocket
SSE
Pull mechanism
short polling: call every 5 minutes to check for new tweets from followed celebrities
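The hybrid rule can be sketched as below: tweets from small accounts are pushed into follower timelines at write time, tweets from large accounts are pulled at read time. The data is made up, and treating exactly 1 million followers as "pull" is an assumption since the notes leave that case open.

```python
PUSH_LIMIT = 1_000_000  # follower threshold from the notes

# Hypothetical sample data.
followers = {"alice": ["bob", "carol"], "celebrity": ["bob"]}
follower_count = {"alice": 2, "celebrity": 5_000_000}

pushed = {}            # user -> tweet ids pushed into their timeline
celebrity_tweets = {}  # author -> tweet ids that followers must pull


def post_tweet(author, tweet_id):
    if follower_count[author] < PUSH_LIMIT:
        # Push: write the tweet into every follower's timeline now.
        for follower in followers[author]:
            pushed.setdefault(follower, []).append(tweet_id)
    else:
        # Pull: store once; followers fetch it when they poll.
        celebrity_tweets.setdefault(author, []).append(tweet_id)


def read_timeline(user, followed_celebrities):
    """Merge the pushed timeline with tweets pulled from followed celebrities."""
    pulled = [t for c in followed_celebrities for t in celebrity_tweets.get(c, [])]
    return pushed.get(user, []) + pulled
```

A real merge would order by created_at; here the two lists are simply concatenated to keep the sketch short.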
Survey -
Cassandra -
distributed storage -
AWS - storage -
zookeeper - Redis - idea
Cassandra: high throughput and high availability, but the downside is eventual consistency
Timeline
TimelineFeed
User_id -> list(tweet_id)
redis
tweet_id->"content"
mysql uses master-slave replication
quorum ring based consistent hashing
consistent hashing
5 servers
size of the ring 0 - 2^32-1
server1 1
server2 3
server3 4
hash the data and take % with the size of the ring
the data is placed on the first server found going clockwise
let's say I hash the data and get number 2: it is stored on server2 (position 3)
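A sketch of the clockwise lookup using a sorted list and bisect. The server positions are the ones from the example above; using md5 to hash keys and the HashRing name are assumptions, and virtual nodes and replication are left out.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 32  # positions run from 0 to 2^32 - 1


class HashRing:
    def __init__(self, positions):
        # positions: server name -> position on the ring
        self._ring = sorted((pos % RING_SIZE, name) for name, pos in positions.items())
        self._keys = [pos for pos, _name in self._ring]

    def server_for_position(self, pos):
        """Return the first server at or after pos, going clockwise."""
        i = bisect.bisect_left(self._keys, pos % RING_SIZE)
        return self._ring[i % len(self._ring)][1]  # wrap around past the last server

    def server_for_key(self, key):
        digest = hashlib.md5(key.encode()).digest()
        return self.server_for_position(int.from_bytes(digest[:4], "big"))


ring = HashRing({"server1": 1, "server2": 3, "server3": 4})
```

With these positions, data that hashes to 2 lands on server2, and data that hashes past 4 wraps around to server1.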
large data-intensive applications (this I read in the book)
- do you learn by reading, or do you also work on large distributed systems?
real experience - large distributed system -
what technology is different from the twitter system?
what are the weaknesses / strengths in system design?
if familiar with the question, it is easy
let's say design something I'm not familiar with, like live streaming like