jianminchen/System design

## System design
Design a simple version of Twitter: at most 100 followers per person, RESTful API. Later on: celebrities with millions of followers, hashtags, and search by hashtag.


Accounts, 100 followers max
Searching - hashtag, text
Feed generation


SCHEMA
------

Account
- handle (PK)
- display name
- followers
- feed (point to head of linked list)
- last_login
- src IP address

Follow table
- src
- dst

Message
- message_id (PK)
- author (FK into account)
- content (str, max 140 chars)
- hashtags
- time

Hashtags
- hashtag (part of composite PK with message_id, so one hashtag can map to many messages)
- message_id (FK into message)
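
The schema above could be sketched in SQLite roughly as follows. This is only an illustration under assumptions: the column types, the snake_case column names (`display_name`, `src_ip`), and the function name `create_schema` are not from the notes. The `followers` and `feed` fields are left out on the assumption that they can be derived from the follow table and the feed service, and the hashtag table uses a composite key so that one hashtag can point at many messages.

```python
import sqlite3


def create_schema(conn):
    """Create the four tables from the notes (types and names are assumptions)."""
    # Foreign keys are off by default in SQLite; enable them per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE account (
            handle       TEXT PRIMARY KEY,
            display_name TEXT NOT NULL,
            last_login   TEXT,
            src_ip       TEXT
        );
        CREATE TABLE follow (
            src TEXT NOT NULL REFERENCES account(handle),  -- the follower
            dst TEXT NOT NULL REFERENCES account(handle),  -- the account being followed
            PRIMARY KEY (src, dst)
        );
        CREATE TABLE message (
            message_id INTEGER PRIMARY KEY,
            author     TEXT NOT NULL REFERENCES account(handle),
            content    TEXT NOT NULL CHECK (length(content) <= 140),
            time       TEXT NOT NULL
        );
        CREATE TABLE hashtag (
            hashtag    TEXT NOT NULL,
            message_id INTEGER NOT NULL REFERENCES message(message_id),
            PRIMARY KEY (hashtag, message_id)
        );
        """
    )
```

With this layout, a hashtag search is a lookup on the `hashtag` table followed by a join to `message`.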


API
---
different endpoints for different functions

use cases:
- post message
- delete message?
- search_text
- search_hashtag
- request_feed
- follow


post_message(handle, message) -> POST
delete_message(handle, message_id) -> DELETE
search_text(text) -> GET  # needs the requesting user's handle so results can include private tweets they are allowed to see
search_hashtag(hashtag) -> GET
request_feed(handle) -> GET
follow(handle_src, handle_dst) -> POST
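
One way to map these functions onto REST endpoints is a small route table. The URL paths below are hypothetical (the notes only give the function names and HTTP methods), and `resolve` is an illustrative helper, not part of any framework.

```python
import re

# (HTTP method, path pattern, function name). All paths are assumptions.
ROUTES = [
    ("POST", r"^/accounts/(?P<handle>\w+)/messages$", "post_message"),
    ("DELETE", r"^/accounts/(?P<handle>\w+)/messages/(?P<message_id>\d+)$", "delete_message"),
    ("GET", r"^/search/text$", "search_text"),
    ("GET", r"^/search/hashtags/(?P<hashtag>\w+)$", "search_hashtag"),
    ("GET", r"^/accounts/(?P<handle>\w+)/feed$", "request_feed"),
    ("POST", r"^/accounts/(?P<handle_src>\w+)/following/(?P<handle_dst>\w+)$", "follow"),
]


def resolve(method, path):
    """Return (function_name, path_params) for a request, or None if no route matches."""
    for route_method, pattern, name in ROUTES:
        if route_method != method:
            continue
        match = re.match(pattern, path)
        if match:
            return name, match.groupdict()
    return None
```

For example, `resolve("GET", "/accounts/alice/feed")` resolves to `request_feed` with `handle` set to `alice`. The search text itself would travel as a query parameter, which this sketch does not parse.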


SQL database + cache (write-through to Elasticsearch)
Elasticsearch (for text queries only)
Text search service  # maybe backed by an alternative database suited to querying, e.g. Elasticsearch or another document store
Hashtag search service
Feed generation service
Fraud detection service
Notification service -> question about protocol? long polling
Follow service (checks whether the follow limit is exceeded, raises an event to regenerate the feed)
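
A minimal in-memory sketch of the follow service, assuming the limit applies to the number of followers an account can have (100 in the simple version). The class name, the exception, and the `events` list standing in for a real event queue are all hypothetical.

```python
MAX_FOLLOWERS = 100  # limit from the simple version of the design


class FollowLimitExceeded(Exception):
    """Raised when the target account already has the maximum number of followers."""


class FollowService:
    def __init__(self, max_followers=MAX_FOLLOWERS):
        self.max_followers = max_followers
        self.followers = {}  # dst handle -> set of src handles
        self.events = []     # stand-in for an event queue

    def follow(self, src, dst):
        """Record that src follows dst. Returns False if the follow already existed."""
        if src == dst:
            raise ValueError("an account cannot follow itself")
        current = self.followers.setdefault(dst, set())
        if src in current:
            return False
        if len(current) >= self.max_followers:
            raise FollowLimitExceeded(dst)
        current.add(src)
        # src's feed now needs dst's tweets, so ask for it to be regenerated.
        self.events.append(("regenerate_feed", src))
        return True
```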

when a user A posts a tweet -> raise event -> queue to recompute feeds for users B that follow user A
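
The event flow above is a fan-out-on-write approach: posting enqueues an event, and a worker later pushes the tweet into each follower's feed. Here is a rough in-memory sketch. The class and method names, the `feed_size` cap, and the use of a `deque` in place of a real message queue are assumptions.

```python
from collections import defaultdict, deque


class FeedService:
    def __init__(self, feed_size=50):
        self.followers = defaultdict(set)  # author -> handles following that author
        # Each feed keeps only the newest feed_size message ids, newest first.
        self.feeds = defaultdict(lambda: deque(maxlen=feed_size))
        self.queue = deque()  # pending "tweet posted" events

    def follow(self, src, dst):
        self.followers[dst].add(src)

    def post_message(self, author, message_id):
        # Posting only enqueues an event; feeds are updated asynchronously.
        self.queue.append((author, message_id))

    def process_events(self):
        """Worker step: push each queued tweet into the feed of every follower."""
        while self.queue:
            author, message_id = self.queue.popleft()
            for follower in self.followers[author]:
                self.feeds[follower].appendleft(message_id)

    def request_feed(self, handle):
        return list(self.feeds[handle])
```

With at most 100 followers, each post costs at most 100 feed writes. For an account with millions of followers that write cost grows accordingly, so merging such an author's tweets in at read time is a commonly considered alternative.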

Elasticsearch
words -> documents (tweets)

"the dog went to the park" Tweet #23

Index - inverted index
"the" -> [23]
"dog" -> [23]
..
..
..
"park" -> [23]

"typical index" id -> contents
"inverted index" contents -> id
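
A toy version of the inverted index, to make the word -> tweet ids mapping concrete. This is only an illustration of the idea, not how Elasticsearch is implemented. The function names and the simple regex tokenizer are assumptions.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9#']+")


def build_inverted_index(tweets):
    """Map each word to the set of tweet ids whose content contains it."""
    index = defaultdict(set)
    for tweet_id, content in tweets.items():
        for word in TOKEN.findall(content.lower()):
            index[word].add(tweet_id)
    return index


def search(index, query):
    """Return ids of tweets containing every word in the query (AND semantics)."""
    words = TOKEN.findall(query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

For the tweet in the notes, `build_inverted_index({23: "the dog went to the park"})` maps "the", "dog", "went", "to", and "park" each to `{23}`.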


How to scale the system?
How to see tweets of people they are following?

14:00 create new account
14:000001 100 new follows
14:02 1 follow
14:03 1 follow
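
One possible reading of this timeline is that a burst of 100 follows right after account creation is a signal the fraud detection service should catch, while one follow every minute or so is normal. Under that assumption, a sliding-window counter is a simple way to flag the burst. The class name and the thresholds (20 follows per 60 seconds) are hypothetical.

```python
from collections import deque


class FollowBurstDetector:
    def __init__(self, max_events=20, window_seconds=60):
        self.max_events = max_events
        self.window_seconds = window_seconds
        self.events = deque()  # timestamps of recent follows, oldest first

    def record(self, timestamp):
        """Record a follow at `timestamp` (seconds); return True if it looks like a burst."""
        self.events.append(timestamp)
        # Drop follows that have fallen out of the window.
        while self.events and self.events[0] <= timestamp - self.window_seconds:
            self.events.popleft()
        return len(self.events) > self.max_events
```

Timestamps are expected in non-decreasing order, with one detector per account.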


how to deploy code?

Build,

4GB -> code
10,000 deployment instances?
7 regions


Problems:
- Efficient way to build code so we don't have to build it 10,000 times? A: build once into a Docker image and deploy that same image to every instance, to avoid recompiling 4GB of code for each deployment
- How to update live version with new build?
- Load balancers:
    - spin up small number of instances with new build
    - load balancer redirects small portion of traffic to new instances
    - wait to see if there are errors/crashes etc
    - gradually introduce new instances of new build, and remove instances of old build
    - if errors, take new instances down until we fix bug
    - if no errors, continue increasing instances of new build until it reaches 10,000
    - schedule based on low-demand times for each region
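
The rollout steps above describe a canary-style deployment. The loop can be sketched as below. The step percentages, the function name, and the `is_healthy` callback (standing in for real error and crash monitoring) are assumptions.

```python
def canary_rollout(total_instances, step_percents, is_healthy):
    """Grow the share of new-build instances step by step, rolling back on errors.

    step_percents: increasing integer percentages, e.g. [1, 10, 50, 100].
    is_healthy: callback taking the current number of new-build instances and
                returning False if errors or crashes were observed.
    """
    history = []
    new_count = 0
    for percent in step_percents:
        # Always run at least one new instance so there is something to observe.
        new_count = max(1, total_instances * percent // 100)
        history.append(new_count)
        if not is_healthy(new_count):
            # Take the new instances down until the bug is fixed.
            return {"status": "rolled_back", "new_instances": 0, "history": history}
    return {"status": "complete", "new_instances": new_count, "history": history}
```

In practice this would run per region, scheduled for that region's low-demand hours, with a waiting period between steps.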