@DagnyTagg2013
Created November 13, 2017 22:31
MRU Cache
"""
PROBLEM: Implement MRU Cache
- i.e. cache the MOST-MOST-RECENTLY-USED-OR-ACCESSED, and drop LEAST-RECENTLY-USED
- FIXED-size cache
initialized with a size, never to exceed that size
- put(key, val), O(1)
- get(key) -> val (or None), O(1)
TEST CASES:
testCache = Cache(3)
testCache.put("A", 1)
testCache.put("B", 2)
testCache.put("C", 3)
print testCache.get("B") # 2
testCache.put("D", 4)
print testCache.get("A") # One of these should be None
print testCache.get("B")
print testCache.get("C")
print testCache.get("D")
"""
"""
SUMMARY:
- Lookup with DICT using a String -> Node reference
- The referenced Nodes are contained in a PriorityQ implemented as a
DEQUE
- NODE key=String, value=Integer
- need random access lookup for Get/Put via DICT
- need to determine which Node to evict in PriorityQ;
then lookup corresponding DICT entry to remove
So KEY has to be part of Node.
CACHE EVICTION POLICY VARIATIONS from SIMPLEST to REAL-WORLD:
1) RECENTLY WRITTEN-INSERTED
- if you just want to evict ANY node, a simple List (array) implementation
evicting the last-appended is OK; but then the cache policy retains the OLDEST
WRITTEN-INSERTED and not the most USED-ACCESSED, which is bad
- a LIST implemented as an Array in Python makes removal from the FRONT
expensive due to copying; whereas add to the END is cheap due to amortized
block allocation
2) ACCESS RECENCY: most recently accessed stays CACHED; stale accesses evicted
- if we want to evict nodes by ACCESS recency -- e.g. retain the most recently
accessed, so FIFO according to ACCESS -- then use a Doubly-Linked-List or DEQUE,
since remove from HEAD and add to TAIL are constant-time operations
3) ATTRIBUTE VALUE: Use HEAP, with value according to ATTRIBUTE
- Price (cache High-Price, so evict low-price with MIN-Heap)
- Time Written (cache recently written with smallest time difference from NOW, so evict largest time difference with MAX-Heap)
TRICK:
- on access, a Node is REMOVED from its position in the Priority Q,
then MOVED to the HEAD of the Q
- DICT is UNCHANGED, as it holds a reference to the Node, which didn't itself change -- EXCEPT when the Q is FULL:
- the Node at the TAIL of the Q is EVICTED on a full Q,
then a LOOKUP on the Node's KEY is needed to REMOVE it from the DICT!
ATTENTION TO:
- keep data IN-SYNC on each PUT modification: Q, DICT, COUNT;
GET only reorders the Q -- DICT and COUNT are unaffected
- need to EVICT the TAIL on the FULL-Q case,
then LOOKUP the corresponding KEY in the DICT to RELEASE that reference!
WHAT MATTERS:
- DICT always points to the Node directly for random-access GET
- PUT needs to check for FULL and use the
PriorityQ with an EVICTION POLICY:
- on eviction from the PriorityQ, the Node needs BOTH KEY and VALUE,
so the corresponding Entry can be looked up and evicted from the DICT!
- SIMPLEST to have the DICT value reference the SAME Node structure as the Q!
- DEQUE, since insert-delete at HEAD and TAIL are needed;
AND we want efficient DELETE, so a BACK ptr is needed!
- update COUNT on PUT only
DETAILS:
get => the lookup Dict is NOT modified; instead the PriorityQ is
- retrieves the Node from the DICT if it exists
- REMOVEs the accessed item from its position in the PriorityQ
- ADDs it to the HEAD to TRACK the most recent access
- otherwise returns None if it does not exist
put => the PriorityQ is modified, THEN the lookup Dict is synced
- if the key EXISTS
- UPDATE the value on the Node the DICT references,
and REMOVE that Node from its position in the PriorityQ
- if the key does NOT exist
- create a new Node
- if FULL at capacity
- REMOVE the stale item from the TAIL of the PriorityQ,
then LOOKUP its KEY in the DICT to REMOVE the corresponding entry
- INSERT the updated/new Node at the HEAD of the PriorityQ
SIMPLIFYING ASSUMPTIONS:
- ignore concurrent access; otherwise have to copy and return MODIFIED
collection each time, leaving original dictionary intact
- ignore WEAK references on DICT key which allow garbage collection for
stale keys
- NO DEEP copies required; just REFERENCES to VALUE objects saved;
shallow copy OK. eg not copy.deepcopy(), but dict.copy()
LANGUAGE-SPECIFIC SUPPORT:
- JAVA
- LinkedHashMap with a ctor parameter for accessOrder (true for Access order, false for Insertion order)
- PriorityQueue with a ctor parameter for a Comparator, where the DEFAULT
compare returns -1 for L < R, i.e. a MIN HEAP; multiply by -1 to REVERSE
the ordering and convert to a MAX HEAP
- Java Generics PECS rule: Producer Extends, Consumer Super
Comparator<? super E>
- https://stackoverflow.com/questions/2723397/what-is-pecs-producer-extends-consumer-super
- PYTHON
- heapq supports a min-heap by default,
but needs customization for a MAX-heap
- i.e. needs a ctor-parameterized comparator function for the specific contained element type
*** BUT I CUSTOMIZED THIS MYSELF -- WHOOHOO!
https://github.com/DagnyTagg2013/CTCI-Python/blob/master/HEAP/binheap.py
- OR from queue import PriorityQueue (synchronized and slower),
which adds elements as TUPLEs for automatic field-sequence ordering;
but you may also need to multiply by -1 to REVERSE the ordering for a MAX heap
- https://stackoverflow.com/questions/407734/a-generic-priority-queue-fo-python
- https://docs.python.org/3/library/heapq.html
"""
from collections import deque


class Node:

    def __init__(self, key=None, value=None):
        self.key = key
        self.value = value

    def __repr__(self):
        return "({},{})".format(self.key, self.value)


class MRUCache:

    def __init__(self, maxSize=3):
        self.capacity = maxSize
        self.currSize = 0
        self.lookup = {}
        self.priorityQ = deque()

    def __repr__(self):
        # ATTN format syntax!
        return '\nMRUCache\n {}, {}\n{}\n{}'.format(
            self.capacity, self.currSize, self.lookup, self.priorityQ)

    def get(self, key):
        # RETURN None if not in lookup cache
        foundNode = self.lookup.get(key, None)
        if foundNode is None:
            return None
        # node found, so reorder by ACCESS-ORDER:
        # REMOVE the corresponding node reference from the priority Q
        # (SAME reference as the one held by the dict!),
        # then put it at the FRONT of the Q
        # ATTENTION: deque.remove is an O(n) SEARCH
        self.priorityQ.remove(foundNode)
        self.priorityQ.appendleft(foundNode)
        # RETURN the cached value, per the spec get(key) -> val
        return foundNode.value

    def put(self, key, value):
        currNode = None
        # UPDATE value if ALREADY in lookup cache
        if key in self.lookup:
            currNode = self.lookup[key]
            # ATTN: the dict only saves a REFERENCE to the Node,
            # so modify the value on that Node in place!
            currNode.value = value
            # REMOVE the SAME Node reference from the Q;
            # the common code below re-adds it at the FRONT
            # ATTENTION: deque.remove is an O(n) SEARCH
            self.priorityQ.remove(currNode)
        # otherwise, CREATE a new Node, EVICTING first if FULL
        else:
            currNode = Node(key, value)
            # ATTN: check the FWD bound before EVICTION
            if (len(self.priorityQ) + 1) > self.capacity:
                # ATTN: keep the Q in sync WITH its corresponding DICT entry!
                # IFF we're at CAPACITY, first remove the STALE node at the
                # END of the priorityQ, then remove its corresponding entry
                # from the lookup cache
                staleNode = self.priorityQ.pop()
                # ATTN: dict.pop is the way to REMOVE from a DICT in Python!
                self.lookup.pop(staleNode.key, None)
        # COMMON CODE:
        # the updated/added node ALWAYS goes to the HEAD of the Q,
        # AND exists in the lookup cache keyed by the Node key
        self.priorityQ.appendleft(currNode)
        self.lookup[currNode.key] = currNode
        # UPDATE size from the Q, so updates and evictions stay consistent
        self.currSize = len(self.priorityQ)
# ************ TEST DRIVER ******************
print("\n*** INIT CACHE")
testCache = MRUCache(3)
print(testCache)
print("\n*** PUT A")
testCache.put("A", 1)
print(testCache)
print("\n*** PUT B")
testCache.put("B", 2)
print(testCache)
print("\n*** PUT C")
testCache.put("C", 3)
print(testCache)
print("\n*** GET B")
print(testCache.get("B")) # 2
print(testCache)
print("\n*** PUT D")
testCache.put("D", 4)
print(testCache)
print("\n*** GET A")
print(testCache.get("A")) # THIS should be None
print(testCache)
print("\n*** GET B")
print(testCache.get("B"))
print(testCache)
print("\n*** GET C")
print(testCache.get("C"))
print(testCache)
print("\n*** GET D")
print(testCache.get("D"))
print(testCache)
print("\n*** UPDATE B")
testCache.put("B", 100)
print(testCache)
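As an aside, the O(n) deque.remove search flagged above can be avoided entirely in Python with collections.OrderedDict, whose move_to_end and popitem are O(1). A minimal sketch with the same policy (most recently accessed at the front, least recently accessed evicted); the class name OrderedDictCache is made up here:

```python
from collections import OrderedDict


class OrderedDictCache:
    """Same eviction policy as MRUCache above, but every op is O(1):
    no Node class needed, since the OrderedDict tracks order itself."""

    def __init__(self, maxSize=3):
        self.capacity = maxSize
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        # O(1) reorder instead of the O(n) deque.remove search
        self.data.move_to_end(key, last=False)  # move to FRONT
        return self.data[key]

    def put(self, key, value):
        if key not in self.data and len(self.data) >= self.capacity:
            # evict from the TAIL = least recently accessed
            self.data.popitem(last=True)
        self.data[key] = value
        self.data.move_to_end(key, last=False)  # FRONT = most recent
```

Running the same test sequence as the driver above: after puts of A, B, C, a get of B, and a put of D, the key A is the least recently accessed and gets evicted.

```python
c = OrderedDictCache(3)
c.put("A", 1); c.put("B", 2); c.put("C", 3)
assert c.get("B") == 2
c.put("D", 4)
assert c.get("A") is None  # A evicted
assert c.get("D") == 4
```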