@jianminchen
Created October 11, 2020 04:21
Oct 10 2020 - system design as an interviewee - key value cache for search engine
key-value cache for search engine
cache - fast compared to permanent storage (NoSQL, data store)
key-value -
text - key - stock name -
value - list of articles -
billions of keywords - storage -
value - larger than the key in size - thousand by - page by page - 10 pages of output
first page - considering - value ->
web -> load balancer -> search service -> read API -> cache - sharded by key -> avoid hotspots -> distribute to different DBs
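A minimal sketch of "sharded by key", assuming a fixed number of cache shards and a stable hash. The shard names and the `route` helper are hypothetical, for illustration only.

```python
import hashlib

# Hypothetical shard names; a real deployment would use host addresses.
SHARDS = ["cache-0", "cache-1", "cache-2", "cache-3"]


def shard_for_key(key: str, num_shards: int) -> int:
    """Map a cache key to a shard index using a stable hash.

    Python's built-in hash() is randomized per process for strings,
    so a digest is used to keep routing consistent across servers.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def route(key: str) -> str:
    """Return the shard that should hold this key."""
    return SHARDS[shard_for_key(key, len(SHARDS))]
```

Plain modulo hashing moves most keys when the shard count changes; consistent hashing is a common alternative when shards are added or removed often.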
Data center -> Asia/ - replication -> single - CAP theorem - consistency, availability, partition tolerance
availability, partition tolerance
consistency
-
Cache - strategy - look aside, write through - consistency
availability
consistency
latency
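The two strategies named above can be sketched as follows. This is a simplified illustration, not a production design: `Store` is a hypothetical stand-in for the permanent data store, and the cache is a plain dict.

```python
class Store:
    """Hypothetical stand-in for the permanent data store."""

    def __init__(self):
        self.data = {}
        self.reads = 0  # counts reads so cache hits can be observed

    def get(self, key):
        self.reads += 1
        return self.data.get(key)

    def put(self, key, value):
        self.data[key] = value


class LookAsideCache:
    """Look-aside: the caller checks the cache, then falls back to the store."""

    def __init__(self, store):
        self.store = store
        self.cache = {}

    def read(self, key):
        if key in self.cache:
            return self.cache[key]       # cache hit
        value = self.store.get(key)      # cache miss: go to the store
        if value is not None:
            self.cache[key] = value      # populate for later reads
        return value

    def write(self, key, value):
        self.store.put(key, value)
        self.cache.pop(key, None)        # invalidate; next read reloads


class WriteThroughCache(LookAsideCache):
    """Write-through: every write updates the store and the cache together."""

    def write(self, key, value):
        self.store.put(key, value)
        self.cache[key] = value
```

Look-aside keeps the write path simple but pays a miss after each write; write-through keeps the cache consistent with the store at the cost of a slower write.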
can you type
OK
and speak at the same time
I want to know, Query Processing Overview:
LRU - least recently used - cached - maintain eviction - set high priority - fixed size
LFU - least frequently used - eviction - stay in/out of cache
maintain cache - in terms of what gets in/out
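A minimal sketch of fixed-size LRU eviction, assuming Python's `collections.OrderedDict` to track recency. The class name `LRUCache` is illustrative.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-size cache that evicts the least recently used key."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.items = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used
```

An LFU variant would track an access count per key and evict the key with the lowest count instead of the oldest one.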
Level - cache - high -> refine -> small detail
Key-value cache - how to store those key-value pairs in the cache? Memcached - Redis - cache
lookup O(1) - linear search
Trie - hashset
text: keys
keys
key
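A small sketch of the trie option for text keys. A hash set only answers exact-match lookups, while a trie can also list keys that share a prefix. The class and method names are illustrative.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # character -> TrieNode
        self.is_key = False  # True if a stored key ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, key: str) -> None:
        node = self.root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
        node.is_key = True

    def _walk(self, text: str):
        """Follow text from the root; return the last node or None."""
        node = self.root
        for ch in text:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, key: str) -> bool:
        node = self._walk(key)
        return node is not None and node.is_key

    def keys_with_prefix(self, prefix: str) -> list:
        """Return all stored keys that start with prefix, sorted."""
        start = self._walk(prefix)
        if start is None:
            return []
        found, stack = [], [(start, prefix)]
        while stack:
            node, text = stack.pop()
            if node.is_key:
                found.append(text)
            for ch, child in node.children.items():
                stack.append((child, text + ch))
        return sorted(found)
```

Lookup cost in the trie grows with the key length rather than with the number of stored keys.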
similar keys - edit distance - text, natural language - AI - machine learning
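For reference, edit distance between two keys can be computed with the standard Levenshtein dynamic program. This sketch only measures similarity; how near-duplicate keys would then map to one cache entry is not specified in these notes.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, or substitutions to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(
                prev[j] + 1,         # delete ca
                cur[j - 1] + 1,      # insert cb
                prev[j - 1] + cost,  # substitute or match
            ))
        prev = cur
    return prev[-1]
```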
keys - hundreds - use data structure
- normal - horizontal - cheap - micro - cache - RAM - 2GB - memory - outside -> store as cache - ROM
Can you type the question in a short sentence? I cannot hear clearly.
type line 62
In a typical search engine, there are five types of data items that are accessed or generated during the search process: query results, precomputed scores, posting lists, precomputed intersections of posting lists, and documents.
What is your question?
put question - sentence - Query processing involves a number of steps ? 1. Query submission 2. 3.
query submission
full text match -
tennis super star - tennis, super stars
2020 corona virus - keyword
- search relational data -> index - based on keyword
- full - inverted binary search - document - linear search -
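A toy sketch of keyword search over an inverted index, assuming whitespace tokenization and AND semantics across query terms. The set stored for each term plays the role of a posting list; the function names are illustrative.

```python
from collections import defaultdict


def build_inverted_index(docs: dict) -> dict:
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index: dict, query: str) -> list:
    """Return sorted ids of documents containing every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings))
```

With the index in place, a query avoids a linear scan over every document and intersects a few posting lists instead.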
save something to cache
Question: The first time someone searches "2020 corona virus", we get the data from the main data storage and display the primary or top news related to the search.
For this, how will we save it in terms of the key-value cache?
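One possible answer, sketched under assumptions: use the normalized query text plus the page number as the key, and the list of results for that page as the value. The key format and the `fetch_from_store` callback are hypothetical.

```python
def cache_key(query: str, page: int = 1) -> str:
    """Build a cache key from the normalized query and the page number."""
    normalized = " ".join(query.lower().split())  # fold case and spacing
    return f"search:{normalized}:page:{page}"


def cached_search(cache: dict, query: str, fetch_from_store, page: int = 1):
    """Return (results, was_hit); fill the cache on the first search."""
    key = cache_key(query, page)
    if key in cache:
        return cache[key], True
    results = fetch_from_store(query, page)  # first time: main storage
    cache[key] = results                     # save for repeat searches
    return results, False
```

Normalizing the query lets "2020 Corona  Virus" and "2020 corona virus" share one entry, which helps the hit ratio when lookups must match the key exactly.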
Recently - key value into cache - Run out of disk
search Cache -
break into multiple keys - permanent - > keys -> 10 -1000 keys -> cache ->
replication - shard - cache server -> permanent -> A, T -> shard them multiple, 10,000 -> 5 - 6 T
partition
- AI - should be new key -> permanent -> cache - hit ratio - exactly same
- what should I do next? Should we swap roles? Good with your answer...
I'm going to swap the roles