Created
October 11, 2020 04:21
Oct 10 2020 - system design as an interviewee - key value cache for search engine
This editor is synced in real time with your peer.
Use it to share thoughts and resources, such as:
- Features scope
- API design
- Pseudo code for specific components
- Data model/schema
- Back-of-the-envelope calculations
- Reference links
- Link to whiteboard or diagram such as https://sketchboard.me/new
Good luck!
Key-value cache for search engine
cache - quick and fast compared to permanent storage (NoSQL, data store)
key-value -
key - text, e.g. stock name
value - list of articles
billions of keywords - storage
value - larger than the key in size - about a thousand bytes - page by page - 10 pages of output
first page - consider caching it as the value ->
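A rough back-of-the-envelope sketch for the sizing above. Every number is an assumption for illustration: 1 billion keys (the notes only say "billions"), 20 bytes per key, 1,000 bytes per cached first page, and 2 GB of RAM per cache node (the figure mentioned later in these notes).

```python
# Back-of-the-envelope sizing. All constants are assumptions, not measurements.
NUM_KEYS = 1_000_000_000         # "billions" of keywords, taking 1 billion
AVG_KEY_BYTES = 20               # assumed average length of a query string
FIRST_PAGE_BYTES = 1_000         # assumed size of one cached first page
RAM_PER_NODE_BYTES = 2 * 10**9   # assumed 2 GB of cache memory per node


def cache_size_bytes(num_keys, key_bytes, value_bytes):
    """Total bytes needed if every key caches exactly one value."""
    return num_keys * (key_bytes + value_bytes)


def nodes_needed(total_bytes, ram_per_node):
    """Number of cache nodes, rounded up (ceiling division)."""
    return -(-total_bytes // ram_per_node)


total = cache_size_bytes(NUM_KEYS, AVG_KEY_BYTES, FIRST_PAGE_BYTES)
print(f"{total / 10**12:.2f} TB across {nodes_needed(total, RAM_PER_NODE_BYTES)} nodes")
```

Under these assumptions the first pages alone are on the order of a terabyte, which is why the cache has to be sharded rather than kept on one machine.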
web -> load balancer -> search service -> read API -> cache - sharded by key -> avoid hotspots -> distribute to different DBs
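"Sharded by key" can be sketched with a consistent-hash ring. This is one common technique, not necessarily what was meant in the interview; the class name `HashRing`, the node names, and the virtual-node count are all hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring mapping cache keys to cache servers."""

    def __init__(self, nodes, vnodes=100):
        # Each server is placed on the ring many times (virtual nodes)
        # so keys spread more evenly and hotspots are less likely.
        ring = []
        for node in nodes:
            for i in range(vnodes):
                ring.append((self._hash(f"{node}#{i}"), node))
        ring.sort()
        self._ring = ring
        self._hashes = [h for h, _ in ring]

    @staticmethod
    def _hash(text):
        # md5 is used only to spread keys, not for security.
        return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)

    def node_for(self, key):
        # First ring position clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["cache-a", "cache-b", "cache-c"])
print(ring.node_for("2020 corona virus"))
```

The same key always lands on the same server, and adding or removing a server only remaps the keys next to it on the ring.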
Data center -> Asia/ - replication -> single - CAP theorem - consistency, availability, partition tolerance
availability, partition tolerance
consistency
-
Cache - strategy - look-aside, write-through - consistency
availability
consistency
latency
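A minimal sketch of the two strategies named above, with plain dicts standing in for the cache and the permanent store. The function names are hypothetical.

```python
def read_look_aside(cache, store, key):
    """Look-aside read: check the cache, fall back to the store on a miss."""
    if key in cache:
        return cache[key]
    value = store.get(key)
    if value is not None:
        cache[key] = value  # populate the cache for the next reader
    return value


def write_through(cache, store, key, value):
    """Write-through: update the store and the cache in the same operation."""
    store[key] = value
    cache[key] = value


def write_with_invalidation(cache, store, key, value):
    """Look-aside write: update the store, drop the stale cache entry."""
    store[key] = value
    cache.pop(key, None)


store = {"tennis": ["article-1", "article-2"]}
cache = {}
print(read_look_aside(cache, store, "tennis"))  # miss, loaded from the store
print(read_look_aside(cache, store, "tennis"))  # hit, served from the cache
```

Write-through keeps the cache consistent at the cost of write latency; look-aside with invalidation keeps writes cheap but the next read pays for a miss.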
can you type
OK
and speak at the same time
I want to know, Query Processing Overview:
LRU - least recently used - cached - maintain eviction - set high priority - fixed size
LFU - least frequently used - eviction - stay in/out of cache
maintain cache - in terms of get in/out
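A small sketch of the fixed-size LRU eviction described above, built on `collections.OrderedDict`. The class name `LRUCache` and the sample keys are hypothetical; LFU would track a use count per key instead of recency.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-size cache that evicts the least recently used key."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self._items:
            return None  # cache miss
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used


lru = LRUCache(2)
lru.put("tennis", ["article-1"])
lru.put("corona", ["article-2"])
lru.get("tennis")                 # "tennis" is now the most recent
lru.put("stocks", ["article-3"])  # evicts "corona"
```

Both `get` and `put` are O(1), which matches the constant-time lookup the notes ask for.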
Level - cache - high -> refine -> small detail
Key-value cache - how to store those key-values in the cache? Memcached - Redis - cache
lookup O(1) - linear search
Trie - hashset
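A hash set gives O(1) average lookup for exact keys; a trie costs O(length of the key) but also answers prefix queries. A minimal trie sketch follows, with hypothetical class and method names.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # character -> TrieNode
        self.is_key = False  # True if a stored key ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, key):
        node = self.root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
        node.is_key = True

    def _walk(self, text):
        """Return the node reached by following text, or None."""
        node = self.root
        for ch in text:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, key):
        node = self._walk(key)
        return node is not None and node.is_key

    def keys_with_prefix(self, prefix):
        """All stored keys starting with prefix, in sorted order."""
        start = self._walk(prefix)
        if start is None:
            return []
        found, stack = [], [(start, prefix)]
        while stack:
            node, text = stack.pop()
            if node.is_key:
                found.append(text)
            for ch, child in node.children.items():
                stack.append((child, text + ch))
        return sorted(found)


trie = Trie()
for k in ["tennis", "tennis star", "tesla"]:
    trie.insert(k)
print(trie.keys_with_prefix("ten"))
```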
text: keys
keys
key
similar - edit distance - text, natural language - AI - machine learning
keys - hundreds - use data structure
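For "similar" keys, edit distance can be sketched with the standard Levenshtein dynamic program. The helper `similar_keys` and its `max_distance` threshold are hypothetical; a linear scan like this only makes sense for a small key set (hundreds, as in the notes).

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum inserts, deletes, substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similar_keys(query, keys, max_distance=1):
    """Keys within max_distance edits of query (linear scan)."""
    return [k for k in keys if edit_distance(query, k) <= max_distance]


print(similar_keys("carona virus", ["corona virus", "tennis"]))
```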
- normal - horizontal - cheap - micro - cache - RAM - 2GB - memory - outside -> store as cache - ROM
Can you type the question in a short sentence? I cannot hear clearly.
type line 62
In a typical search engine, there are five types of data items that are accessed or generated during the search process: query results, precomputed scores, posting lists, precomputed intersections of posting lists, and documents.
What is your question?
put question - sentence - Query processing involves a number of steps? 1. Query submission 2. 3.
query submission
full text match -
tennis super star - tennis, super star
2020 corona virus - keyword
- search relational data -> index - keyword based
- full - inverted binary search - document - linear search -
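The keyword index and posting lists mentioned above can be sketched as a tiny inverted index. The sample documents and function names are made up for illustration; a real engine would also tokenize, stem, and rank.

```python
from collections import defaultdict


def build_index(docs):
    """docs: doc_id -> text. Returns term -> sorted posting list of doc ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def intersect(a, b):
    """Merge-intersect two sorted posting lists in O(len(a) + len(b))."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def search(index, query):
    """Doc ids containing every keyword in the query (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return []
    result = index.get(terms[0], [])
    for term in terms[1:]:
        result = intersect(result, index.get(term, []))
    return result


docs = {
    1: "tennis super star wins",
    2: "2020 corona virus news",
    3: "super star of tennis",
    4: "corona virus and tennis",
}
index = build_index(docs)
print(search(index, "tennis super star"))
```

Frequently requested intersections are exactly the "precomputed intersections of posting lists" that the quoted passage lists as cacheable items.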
save something to cache
Question: For the first-time search "2020 corona virus", we get data from main data storage and display the primary or top news related to the search.
For this, how will we save it in terms of the key-value cache?
Recently - key-value into cache - run out of disk
search cache -
break into multiple keys - permanent -> keys -> 10 - 1000 keys -> cache ->
replication - shard - cache server -> permanent -> A, T -> shard them multiple, 10,000 -> 5 - 6 T
partition
- AI - should be new key -> permanent -> cache - hit ratio - exactly same
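One possible reading of the answer above: a cache hit needs exactly the same key, so normalize the query text and use one key per results page. The key format, `cache_key`, `get_results`, and `fetch_from_store` are all hypothetical names for this sketch.

```python
def cache_key(query, page=1):
    """Normalize the query so trivially different spellings share a key."""
    normalized = " ".join(query.lower().split())
    return f"search:{normalized}:page:{page}"


def get_results(query, page, cache, fetch_from_store):
    """Return (results, was_hit). On a miss, load from the store and cache."""
    key = cache_key(query, page)
    if key in cache:
        return cache[key], True
    results = fetch_from_store(query, page)  # slow path: permanent storage
    cache[key] = results
    return results, False


def fake_store(query, page):
    # Stand-in for the main data storage.
    return [f"top news for '{query}' (page {page})"]


cache = {}
print(get_results("2020 corona virus", 1, cache, fake_store))    # first search: miss
print(get_results("2020  Corona Virus ", 1, cache, fake_store))  # same key: hit
```

Without normalization the second query would be a new key and a miss, which lowers the hit ratio.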
- what should I do next? should we swap roles? good with your answer...
I'm going to swap the roles