palexs/caching.md

## caching.md

      
CURRENT CACHING LAYER
magnetFetch (src/lib/magnetFetch/magnetFetch.js)
Data is cached when the request is a GET and when any max-age is provided.

GET is reading a server resource; other requests are for modification.
The expectation for the app is that a modification should fail if it
can't hit the server, and that it should always know when a modification
fails. Therefore cached data is not appropriate for non-GET requests.
A max-age of 0 means that the response is valid in offline mode,
even though it is immediately stale when online.

The cache is used in the following cases:

When the app is offline, any data in the cache is considered fresh.
When the app is online, only fresh data (age < max-age) is used and the network
request is skipped.
When the app is online but the data is stale and the server returns a 50x
error, the app tries to recover by providing the cached data, even though it is stale.
When the response code is 304 (Not Modified), the cached data is provided (on iOS and Android).
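
The rules above can be condensed into a single decision function. This is a minimal sketch and not the actual magnetFetch code: the name `shouldServeFromCache`, its parameters, and the `entry.staleTime` field (assumed to be an absolute timestamp in milliseconds) are hypothetical.

```javascript
// Hypothetical helper: decides whether a cached entry may be returned.
// `entry` is the cached record (or undefined). `status` is the HTTP status of
// the network response, or undefined when no request has been made yet.
function shouldServeFromCache({ method, isOnline, entry, now, status }) {
  if (method !== 'GET' || !entry) return false; // only GET responses are cached
  if (!isOnline) return true; // offline: any cached data counts as fresh
  if (status === undefined) return now < entry.staleTime; // fresh data skips the network
  if (status === 304) return true; // Not Modified: reuse the cached body
  if (status >= 500 && status < 510) return true; // 50x: recover with stale data
  return false;
}
```

A caller would invoke it once before the request (without `status`) and, if a request was made, once more with the response status.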

The caching mechanism stores and uses the ETag header value.
React Native caches images by default. See more: https://github.com/facebook/react-native/blob/master/Libraries/Image/RCTImageCache.m
JSONCache (src/lib/magnetFetch/JSONCache.js)
export default class JSONCache {
  cache = new Cache({
    namespace: 'JSONCache',
    policy: {
      // ENG4BCMA-317. Fix sqlite error 13 - database is full
      // ~450-500 entries exceeds permitted limit in 6Mb, lets decrease max entries limit
      maxEntries: 200,
    },
    backend: AsyncStorage,
  });

  // We use this to keep a small number of responses in memory
  // this is mostly for when users spam refresh, that we aren't
  // hit for the AsyncStorage penalty
  memoryCache = new LRUMap(20);

  getEntry(key) {...}

  setEntry(key, json, etag, staleTime) {...}
}

Example
GET https://cdn-mobapi.bloomberg.com/wssmobile/v1/stories/PWXTSR6KLVR801
Date Wed, 28 Aug 2019 12:52:42 GMT
Server Apache
x-cache-hits 1
Etag "0970737e5cd4f8b43ad8025ece5a0f704"
x-cache HIT
Content-Type application/json;charset=UTF-8
Cache-Control public, max-age=297
Accept-Ranges bytes
x-timer S1566996762.255292,VS0,VE338
Age 0
Via 1.1 varnish
x-served-by cache-hhn4051-HHN
Content-Length 13402
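
To illustrate how the `Cache-Control` header in this response could turn into a stale time, here is a small sketch. The helper names `parseMaxAge` and `computeStaleTime` are hypothetical, and it is an assumption that the stale time is simply the time the response was received plus max-age.

```javascript
// Hypothetical helper: extracts max-age (in seconds) from a Cache-Control
// header value. Returns null when the directive is absent.
function parseMaxAge(cacheControl) {
  const match = /(?:^|,)\s*max-age\s*=\s*(\d+)/i.exec(cacheControl || '');
  return match ? Number(match[1]) : null;
}

// Assumption: staleTime = time the response was received + max-age, in ms.
function computeStaleTime(cacheControl, receivedAt) {
  const maxAge = parseMaxAge(cacheControl);
  return maxAge === null ? null : receivedAt + maxAge * 1000;
}

// For the response above: 297 seconds after the moment it was received.
const staleTime = computeStaleTime('public, max-age=297', Date.now());
```

A `null` result means the response carries no max-age and, by the rule stated earlier, would not be cached.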

PROPOSED CACHING LAYER
Terminology

LRU (least recently used) cache - caching algorithm that discards the least recently used items first. This algorithm requires keeping track of what was used when (lastReadTime field).
Implicit cache - cache entries that are eligible for removal at any time (during cache clean-up process).
Explicit cache - cache entries that contain userTag(s) and represent persistent entries that have been downloaded/bookmarked by a user. These entries are not eligible for purging and should remain in the cache as long as the user needs them.
User tag - string that represents ownership, e.g. issue or article id. If a cache entry has a user tag, it holds data for the downloaded/bookmarked issue/article or for one of its dependent resources (e.g. an image). Such a cache entry should remain in the cache for at least as long as its owning entity.

Requirements

The app should support implicit caching and user initiated downloading
Response's JSON and media data (such as images) should be cached
Cached data should not be stored in AsyncStorage
Caching layer should allow purging of implicitly cached data
Caching should happen on a conditional basis (e.g. devmode flag, type of article, free space available, etc)
The downloading system will need to support downloading a collection of articles that is managed under a single 'Issue' entity
The downloading system should be able to report progress

Implicit cache vs user initiated downloads

It's possible for a user to see a list of downloaded articles, so they can view and manage them easily.
Downloaded articles are not automatically purged; they require explicit removal.
Implicitly cached data could be purged automatically.
Because it's user initiated, download progress should be shown.
Downloaded articles could include more data than a cached article (such as images).

"Issue" downloading

If a user downloads an issue, every article in that issue will be downloaded.
If a user removes/deletes an issue, every article in that issue will be removed.
If a user bookmarks an individual article from a downloaded issue, we should make sure not to download that article twice.
If an issue is deleted, but contains a bookmarked article, that article is not deleted (but all other articles are).
In any listing of downloaded content, an issue should appear as a single entity (rather than a list of its constituent articles).
Download progress reporting should apply to the issue as a whole.
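
The "no double download" and "bookmarked article survives issue deletion" rules both follow from treating user tags as a list of owners. The sketch below assumes each cached resource carries a `userTags` array; the helper names `tagEntry` and `isPurgeable` and the in-memory `entries` object are hypothetical.

```javascript
// Tagging an entry that already exists only records a second owner;
// the resource itself does not need to be downloaded again.
function tagEntry(entries, url, tag) {
  const entry = entries[url];
  if (!entry) return false; // not cached yet: the caller has to download it
  const tags = entry.userTags || [];
  if (!tags.includes(tag)) entry.userTags = [...tags, tag];
  return true;
}

// Deleting an issue removes its tag everywhere. Entries left without tags
// become implicitly cached; bookmarked articles keep their own tag.
function removeUserTag(entries, tag) {
  Object.values(entries).forEach((entry) => {
    entry.userTags = (entry.userTags || []).filter((t) => t !== tag);
  });
}

const isPurgeable = (entry) => !entry.userTags || entry.userTags.length === 0;

// Usage: article2 belongs to issue1 and is also bookmarked.
const entries = {
  article1Url: { userTags: ['issue1'] },
  article2Url: { userTags: ['issue1', 'article2'] },
};
removeUserTag(entries, 'issue1');
// article1Url is now purgeable, article2Url is still owned by 'article2'.
```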

Manifest
Manifest will be the source of truth for cached/downloaded contents.
// downloaded issue with 3 articles. 'url1' key value is the issue itself.
url1: {
	userTags: ['issue1'],
	etag: '0636cccf1fa3433a56d37d46091c7f224',
	staleTime: 1567592998697, // max-age header value
	lastReadTime: 1567595302365
},
// article #1
url2: {
	userTags: ['issue1'],
	etag: '0636cccf1fa3433a56d37d46091c7f224',
	staleTime: 1567592998697, // max-age header value
	lastReadTime: 1567595302365
},
// article #2 - this article belongs to the downloaded issue and has also been bookmarked
url3: {
	userTags: ['issue1', 'article2'],
	etag: '0636cccf1fa3433a56d37d46091c7f224',
	staleTime: 1567592998697, // max-age header value
	lastReadTime: 1567595302365
},
// article #3
url4: {
	userTags: ['issue1'],
	etag: '0636cccf1fa3433a56d37d46091c7f224',
	staleTime: 1567592998697, // max-age header value
	lastReadTime: 1567595302365
},
// bookmarked article
url5: {
	userTags: ['article4'],
	etag: '0636cccf1fa3433a56d37d46091c7f224',
	staleTime: 1567592998697, // max-age header value
	lastReadTime: 1567595302365
},
// cache entry eligible for purging
url6: {
	// userTags: [],
	etag: '0636cccf1fa3433a56d37d46091c7f224',
	staleTime: 1567592998697, // max-age header value
	lastReadTime: 1567595302365
}

The idea is to store the manifest on disk (in JSON format), but to keep it in memory for quick access while the app is running. The manifest should be loaded into memory at app startup. Because the manifest will be updated frequently, it should be flushed to disk periodically. The proposal is to mark the manifest as dirty on every cache write operation and to flush it every 10 s if it is dirty.
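
A minimal sketch of the dirty-flag flush described above. The class name `ManifestStore` and its methods are hypothetical, and `writeToDisk` is injected so the sketch does not assume any particular filesystem library.

```javascript
// Hypothetical in-memory manifest with periodic flushing.
class ManifestStore {
  constructor(writeToDisk, initialEntries = {}) {
    this.entries = initialEntries; // loaded once at app startup
    this.writeToDisk = writeToDisk;
    this.dirty = false;
    this.timer = null;
  }

  setEntry(url, entry) {
    this.entries[url] = entry;
    this.dirty = true; // every cache write marks the manifest dirty
  }

  // Writes the manifest only when something changed since the last flush.
  async flushIfDirty() {
    if (!this.dirty) return false;
    this.dirty = false; // cleared first so writes made during the flush are kept
    try {
      await this.writeToDisk(JSON.stringify(this.entries));
    } catch (error) {
      this.dirty = true; // retry on the next tick
      throw error;
    }
    return true;
  }

  start(intervalMs = 10000) {
    // A failed flush is swallowed here and retried on the next interval.
    this.timer = setInterval(() => this.flushIfDirty().catch(() => {}), intervalMs);
  }

  stop() {
    clearInterval(this.timer);
  }
}
```

Clearing the flag before the write (rather than after) avoids losing an update that arrives while the flush is still in progress.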
On disk, there will be our cache/download space.
cache/
   hash(url1).json
   hash(url2).json
   hash(url3).json
   hash(image-url1).jpeg
   hash(url4).json
   ...
   hash(image-urlN).jpeg
   hash(urlN).json

Notes:
article json => file://localstorage/cache/hash(url).json
image file => file://localstorage/cache/hash(image-url).jpeg
magnetFetch will need to be capable of handling raw image responses. Currently it knows how to handle JSON responses only.
All cache actions related to downloads should also update the corresponding reducer.
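
The `hash(url)` file naming could look like the sketch below. The document does not say which hash is intended, so FNV-1a (32-bit) is used purely as a stand-in; `hashUrl`, `cachePathFor` and the `cacheDir` argument are hypothetical names. A 32-bit hash can collide, so a real implementation would likely want a longer digest.

```javascript
// FNV-1a (32-bit) over UTF-16 code units, a stand-in for the unspecified hash().
function hashUrl(url) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < url.length; i += 1) {
    hash ^= url.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep the value an unsigned 32-bit integer
  }
  return hash.toString(16).padStart(8, '0');
}

// Maps a resource URL to its file inside the cache directory.
function cachePathFor(url, cacheDir, extension) {
  return `${cacheDir}/${hashUrl(url)}.${extension}`;
}

// Usage: the same URL always maps to the same file name.
const articlePath = cachePathFor('https://example.com/stories/1', 'cache', 'json');
```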
Managers

CacheManager is a class that is responsible for managing cache. In the current implementation we rely on react-native-cache that stores data to AsyncStorage. This shouldn't be the case as AsyncStorage is meant to store only a small amount of key-value data. The idea is to create our own substitute for react-native-cache library and implement LRU cache logic from scratch. This homegrown solution should store data directly to the filesystem.

CacheManager API:

getEntry(url)
setEntry(url, value)
removeUserTag(issueID/articleID) - iterates over all cache entries and removes the provided tag, so that entries left without any tag become implicitly cached.
sweepCache() - scans entries and evicts the least recently used ones until the cache is within the maximum number of entries. sweepCache should be called on startup, with some frequency. Cache items with user tag(s) are not eligible for removal.
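
A possible shape for sweepCache, as a sketch only. It assumes the manifest is a plain object keyed by URL with `userTags` and `lastReadTime` fields, and that the entry limit applies to untagged entries only (the text does not say whether tagged entries count towards the limit).

```javascript
// Evicts untagged entries, least recently read first, until at most
// `maxEntries` untagged entries remain. Returns the evicted URLs so that
// the caller can delete the corresponding files from disk.
function sweepCache(manifest, maxEntries) {
  const purgeable = Object.keys(manifest)
    .filter((url) => (manifest[url].userTags || []).length === 0)
    .sort((a, b) => manifest[a].lastReadTime - manifest[b].lastReadTime);

  const excess = Math.max(0, purgeable.length - maxEntries);
  const evicted = purgeable.slice(0, excess);
  evicted.forEach((url) => {
    delete manifest[url];
  });
  return evicted;
}
```

Sorting all purgeable entries keeps the sketch short; with the current limit of a few hundred entries this should be cheap enough to run at startup.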

Note: There's no need for an addUserTag method. Tags should be added by calling magnetFetch and providing a userTag.
magnetFetch should accept new option flags, e.g. userTag:
// download issue
magnetFetch(url, {
	userTag: 'issue1'
})

// bookmark article
magnetFetch(url, {
	userTag: 'article4'
})

// access article from home feed (no userTag, so the entry is implicitly cached)
magnetFetch(url, {})

When a user initiates a download, DownloadManager comes into play.

DownloadManager: orchestrates the download process, from initiating a fetch request by means of magnetFetch to delegating filesystem manipulations (via FileSystemManager) and triggering Redux store updates.

DownloadManager.downloadIssue(url, issueID) {
	magnetFetch(url, options: { userTag: issueID })
	parse response
	
	for each image
		downloadImage
	success callback => [FileSystemManager.writeToDisk => manifest update => Redux store update]
	failure callback => [notify user => Redux store update]
	
	for each article
		downloadArticle
		progress callback(1/N)
	success callback => [FileSystemManager.writeToDisk => manifest update => Redux store update]
	failure callback => [notify user => Redux store update]
}

DownloadManager.downloadArticle(url, issueID/articleID) {
	magnetFetch(url, options: { userTag: issueID/articleID })
	parse response
	for each image
		downloadImage
	success callback => [FileSystemManager.writeToDisk => manifest update => Redux store update]
	failure callback => [notify user => Redux store update]
}

DownloadManager.downloadImage(url, issueID/articleID) {
	magnetFetch(url, options: { userTag: issueID/articleID })
	parse response
	success callback => [FileSystemManager.writeToDisk => manifest update]
	failure callback => [notify user => Redux store update]
}

DownloadManager.removeIssue(url) {...} // makes the issue become implicitly cached by removing user tag(s)

DownloadManager.removeArticle(url) {...} // makes the article become implicitly cached by removing user tag(s)
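
The issue download flow with per-article progress could be sketched as follows. This is not the actual DownloadManager: `fetchResource(url, { userTag })` stands in for magnetFetch, `onProgress(done, total)` for the progress callback, and the `articleUrls` field of the issue response is an assumed shape. Image downloads, disk writes and Redux updates are left out for brevity.

```javascript
// Hypothetical sketch of downloadIssue with progress reporting.
async function downloadIssue(url, issueID, { fetchResource, onProgress }) {
  const options = { userTag: issueID }; // every resource is owned by the issue
  const issue = await fetchResource(url, options);
  const articleUrls = issue.articleUrls || []; // assumed response shape

  let done = 0;
  for (const articleUrl of articleUrls) {
    // A rejected fetch aborts the whole download and propagates to the caller.
    await fetchResource(articleUrl, options);
    done += 1;
    onProgress(done, articleUrls.length); // progress applies to the issue as a whole
  }
  return done;
}
```

Downloading articles sequentially keeps the progress reporting simple; a real implementation might fetch several in parallel.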

The corresponding reducer should contain data needed for updating UI based on the downloads statuses.
We might want to add the redux-persist library, so that the reducer's data is persisted across app launches. Another option is to save the reducer data to AsyncStorage and restore it on startup (using the startup actions mechanism).
// issues/reducer.js
savedIssues: [
{
	id: 'issue1',
	status: 'downloading'
},
{
	id: 'issue2',
	status: 'downloaded'
}
]
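
A reducer producing this state might look like the sketch below. The action type names and the `id` payload field are hypothetical; only the `savedIssues` shape and the two status values come from the example above.

```javascript
// Hypothetical action types.
const DOWNLOAD_STARTED = 'issues/DOWNLOAD_STARTED';
const DOWNLOAD_SUCCEEDED = 'issues/DOWNLOAD_SUCCEEDED';
const DOWNLOAD_REMOVED = 'issues/DOWNLOAD_REMOVED';

const initialState = { savedIssues: [] };

function issuesReducer(state = initialState, action) {
  switch (action.type) {
    case DOWNLOAD_STARTED:
      return {
        ...state,
        savedIssues: [
          // drop any previous record for the same issue before re-adding it
          ...state.savedIssues.filter((issue) => issue.id !== action.id),
          { id: action.id, status: 'downloading' },
        ],
      };
    case DOWNLOAD_SUCCEEDED:
      return {
        ...state,
        savedIssues: state.savedIssues.map((issue) =>
          (issue.id === action.id ? { ...issue, status: 'downloaded' } : issue)),
      };
    case DOWNLOAD_REMOVED: // failed download or removal requested by the user
      return {
        ...state,
        savedIssues: state.savedIssues.filter((issue) => issue.id !== action.id),
      };
    default:
      return state;
  }
}
```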


FileSystemManager: reads/writes to disk (react-native-fs).
Native filesystem access for React Native apps: https://github.com/itinance/react-native-fs

Image wrapper component
An Image-based component is needed that is capable of displaying cached image data. This component could be a wrapper that:

takes image url as a prop
makes a magnetFetch call with it (returns from cache or real url if needed)
downloads the image and returns a path to the downloaded file

The benefits of this wrapper are:

it is not cache aware
if there's no corresponding cache entry for a given url it will delegate to magnetFetch to fetch the image and put it to the cache
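
The resolution step inside such a wrapper could be sketched like this. `fetchImage(url)` is a hypothetical stand-in for a magnetFetch call that resolves to the local file path of the image (from the cache, or after downloading it); falling back to the remote URL on failure is one reading of "returns from cache or real url if needed". The returned object uses the `{ uri }` shape accepted by the React Native Image `source` prop.

```javascript
// Hypothetical resolver used by the image wrapper component.
async function resolveImageSource(url, fetchImage) {
  try {
    const localPath = await fetchImage(url);
    return { uri: `file://${localPath}` };
  } catch (error) {
    return { uri: url }; // let the Image component load the remote URL instead
  }
}
```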

Note: Unfortunately, this means that there's no way we can handle all responses uniformly - JSON and image responses will be handled differently.
Handling failures
If a failure happens for any reason during downloading, userTags should be removed from the corresponding resources (using the removeUserTag method), thus making them purgeable. It's important to inform the user about the failure, e.g. by showing an alert.
A reconciliation process should take place on every app startup. It's needed because, for example, after a crash our manifest might be out of sync with the state of the filesystem. The reconciliation process should identify issues that failed to download (reducer -> issues with downloading status) and remove userTags from the corresponding resources, as in the case of the recoverable failures described above.
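
The startup reconciliation could be as small as the sketch below. It assumes the persisted reducer state has the `savedIssues` shape shown earlier and that `removeUserTag` is the CacheManager method; the function name `reconcileDownloads` is hypothetical.

```javascript
// Any issue still marked 'downloading' at startup never finished, so its
// resources are untagged and become purgeable. Returns the affected issue ids
// so that the caller can update the reducer or notify the user.
function reconcileDownloads(savedIssues, removeUserTag) {
  const interrupted = savedIssues
    .filter((issue) => issue.status === 'downloading')
    .map((issue) => issue.id);
  interrupted.forEach((id) => removeUserTag(id));
  return interrupted;
}
```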
Miscellaneous
Currently image resources that we deal with are of JPEG and PNG formats.
Default size: ~20-50 KB
Large size: from ~700-800 KB up to 1.5-2.0 MB

Turn off RN caching for images.
Add cache size to dev menu.
Add "Clear all cached data" in the Settings menu.
Attempt recovery (re-download) if a crash happened during downloading.