
@palexs
Last active September 16, 2019 15:40

CURRENT CACHING LAYER

magnetFetch (src/lib/magnetFetch/magnetFetch.js)

Data is cached when the request is a GET and the response provides any max-age value.

  • GET reads a server resource; other request types modify it. The app expects a modification to fail if it can't reach the server, and it should always know when a modification fails. Cached data is therefore not appropriate for non-GET requests.
  • max-age == 0 means that the response is valid for offline mode, even though it is immediately stale when online.

The cache is used in the following cases:

  • The app is offline: any data in the cache is considered fresh.
  • The app is online and the data is fresh (age < max-age): the cached data is used and the network request is skipped.
  • The app is online, the data is stale, and the server returns a 50x error: the app tries to recover by providing the cached data, even though it is stale.
  • The response code is 304 (Not Modified): the cached data is provided, on both iOS and Android.
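
The rules above can be sketched as a single decision function. This is only an illustration, not the actual magnetFetch code: the function name, the argument shape, and the `reason` strings are assumptions, and `staleTime` is assumed to be an absolute timestamp in milliseconds.

```javascript
// Hypothetical helper illustrating when cached data would be served.
// entry: { json, etag, staleTime } or undefined (staleTime in epoch ms, assumed).
// status: HTTP status of the network response, or undefined if no request was made yet.
function resolveFromCache({ entry, isOnline, now, status }) {
  if (!entry) return { useCache: false, reason: 'no-entry' };
  // Offline: any cached data counts as fresh.
  if (!isOnline) return { useCache: true, reason: 'offline' };
  // Online and still fresh: the network request can be skipped.
  if (status === undefined && now < entry.staleTime) {
    return { useCache: true, reason: 'fresh' };
  }
  // Online, revalidated, and the server reports no change.
  if (status === 304) return { useCache: true, reason: 'not-modified' };
  // Online, stale, and the server failed: recover with the stale data.
  if (status >= 500 && status <= 599) {
    return { useCache: true, reason: 'server-error-fallback' };
  }
  return { useCache: false, reason: 'use-network' };
}
```

A caller would first invoke it without `status` to decide whether to hit the network, and again with the response status once the request completes.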

The caching mechanism stores and uses the ETag header value.

React Native caches images by default. See more: https://github.com/facebook/react-native/blob/master/Libraries/Image/RCTImageCache.m

JSONCache (src/lib/magnetFetch/JSONCache.js)

export default class JSONCache {
  cache = new Cache({
    namespace: 'JSONCache',
    policy: {
      // ENG4BCMA-317. Fix sqlite error 13 - database is full
      // ~450-500 entries exceeds permitted limit in 6Mb, lets decrease max entries limit
      maxEntries: 200,
    },
    backend: AsyncStorage,
  });

  // We use this to keep a small number of responses in memory
  // this is mostly for when users spam refresh, that we aren't
  // hit for the AsyncStorage penalty
  memoryCache = new LRUMap(20);

  getEntry(key) {...}

  setEntry(key, json, etag, staleTime) {...}
}

Example

GET https://cdn-mobapi.bloomberg.com/wssmobile/v1/stories/PWXTSR6KLVR801

Date Wed, 28 Aug 2019 12:52:42 GMT
Server Apache
x-cache-hits 1
Etag "0970737e5cd4f8b43ad8025ece5a0f704"
x-cache HIT
Content-Type application/json;charset=UTF-8
Cache-Control public, max-age=297
Accept-Ranges bytes
x-timer S1566996762.255292,VS0,VE338
Age 0
Via 1.1 varnish
x-served-by cache-hhn4051-HHN
Content-Length 13402
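
As a rough illustration of how the headers above could translate into a cache entry, the sketch below derives an expiry timestamp from Cache-Control. The helper names are hypothetical, and it assumes staleTime is stored as "time received + max-age" in milliseconds; the actual JSONCache code may compute it differently.

```javascript
// Returns max-age in seconds, or null when the directive is absent.
function parseMaxAge(cacheControl) {
  const match = /(?:^|,)\s*max-age\s*=\s*(\d+)/i.exec(cacheControl || '');
  return match ? Number(match[1]) : null;
}

// Assumption: staleTime is an absolute epoch timestamp in milliseconds.
function computeStaleTime(cacheControl, receivedAt) {
  const maxAge = parseMaxAge(cacheControl);
  return maxAge === null ? null : receivedAt + maxAge * 1000;
}

// Values taken from the example response above.
const receivedAt = Date.parse('Wed, 28 Aug 2019 12:52:42 GMT');
const staleTime = computeStaleTime('public, max-age=297', receivedAt);
console.log(new Date(staleTime).toUTCString()); // Wed, 28 Aug 2019 12:57:39 GMT
```

Note that `max-age=0` yields a number rather than null, so such a response would still be stored and remain usable offline.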

PROPOSED CACHING LAYER

Terminology

  • LRU (least recently used) cache - caching algorithm that discards the least recently used items first. This algorithm requires keeping track of what was used when (lastReadTime field).
  • Implicit cache - cache entries that are eligible for removal at any time (during cache clean-up process).
  • Explicit cache - cache entries that contain userTag(s) and represent persistent entries that have been downloaded/bookmarked by a user. These entries are not eligible for purging and should remain in the cache as long as the user needs them.
  • User tag - string that represents ownership, e.g. issue or article id. If a cache entry has a user tag, it holds data for the downloaded/bookmarked issue/article or one of its dependent resources (e.g. an image). Such a cache entry should remain in the cache for at least as long as its owning entity.

Requirements

  • The app should support implicit caching and user initiated downloading
  • Response's JSON and media data (such as images) should be cached
  • Cached data should not be stored in AsyncStorage
  • Caching layer should allow purging of implicitly cached data
  • Caching should happen on a conditional basis (e.g. devmode flag, type of article, free space available, etc)
  • The downloading system will need to support downloading a collection of articles that is managed under a single 'Issue' entity
  • The downloading system should be able to report progress

Implicit cache vs user initiated downloads

  • It's possible for a user to see a list of downloaded articles, so they can view and manage them easily.
  • Downloaded articles are not automatically purged; they require explicit removal.
  • Implicitly cached data could be purged automatically.
  • Because it's user initiated, download progress should be shown.
  • Downloaded articles could include more data than a cached article (such as images).

"Issue" downloading

  • If a user downloads an issue, every article in that issue will be downloaded.
  • If a user removes/deletes an issue, every article in that issue will be removed.
  • If a user bookmarks an individual article from a downloaded issue, we should make sure not to download that article twice.
  • If an issue is deleted, but contains a bookmarked article, that article is not deleted (but all other articles are).
  • In any listing of downloaded content, an issue should appear as a single entity (rather than a list of its constituent articles).
  • Download progress reporting should apply to the issue as a whole.

Manifest

Manifest will be the source of truth for cached/downloaded contents.

// downloaded issue with 3 articles. 'url1' key value is the issue itself.
url1: {
	userTags: ['issue1'],
	etag: '0636cccf1fa3433a56d37d46091c7f224',
	staleTime: 1567592998697, // expiry timestamp (ms), derived from the max-age header value
	lastReadTime: 1567595302365
},
// article #1
url2: {
	userTags: ['issue1'],
	etag: '0636cccf1fa3433a56d37d46091c7f224',
	staleTime: 1567592998697, // expiry timestamp (ms), derived from the max-age header value
	lastReadTime: 1567595302365
},
// article #2 - this article belongs to the downloaded issue and has also been bookmarked
url3: {
	userTags: ['issue1', 'article2'],
	etag: '0636cccf1fa3433a56d37d46091c7f224',
	staleTime: 1567592998697, // expiry timestamp (ms), derived from the max-age header value
	lastReadTime: 1567595302365
},
// article #3
url4: {
	userTags: ['issue1'],
	etag: '0636cccf1fa3433a56d37d46091c7f224',
	staleTime: 1567592998697, // expiry timestamp (ms), derived from the max-age header value
	lastReadTime: 1567595302365
},
// bookmarked article
url5: {
	userTags: ['article4'],
	etag: '0636cccf1fa3433a56d37d46091c7f224',
	staleTime: 1567592998697, // expiry timestamp (ms), derived from the max-age header value
	lastReadTime: 1567595302365
},
// cache entry eligible for purging
url6: {
	// userTags: [],
	etag: '0636cccf1fa3433a56d37d46091c7f224',
	staleTime: 1567592998697, // expiry timestamp (ms), derived from the max-age header value
	lastReadTime: 1567595302365
}
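
To make the tagging rules concrete, here is a small sketch of what deleting the downloaded issue could do to a manifest like the one above. The helper names and the trimmed-down manifest are illustrative assumptions; the point is that the bookmarked article keeps its own tag and therefore survives, while untagged entries become purgeable.

```javascript
// Hypothetical in-memory manifest, trimmed to the field needed here.
const manifest = {
  url1: { userTags: ['issue1'] },             // the issue itself
  url3: { userTags: ['issue1', 'article2'] }, // issue article that is also bookmarked
  url5: { userTags: ['article4'] },           // standalone bookmarked article
  url6: {},                                   // implicitly cached entry
};

// Removes the tag from every entry; entries left without tags lose the field.
function removeUserTag(entries, tag) {
  for (const entry of Object.values(entries)) {
    if (!entry.userTags) continue;
    entry.userTags = entry.userTags.filter((t) => t !== tag);
    if (entry.userTags.length === 0) delete entry.userTags;
  }
}

// An entry is purgeable (implicit cache) when it carries no user tags.
function isPurgeable(entry) {
  return !entry.userTags || entry.userTags.length === 0;
}

removeUserTag(manifest, 'issue1');
console.log(isPurgeable(manifest.url1)); // true
console.log(manifest.url3.userTags);     // [ 'article2' ]
```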

The idea is to store the manifest on disk (in JSON format) and to keep a copy in memory for quick access while the app is running. The manifest should be loaded into memory at app startup. Because the manifest will be updated frequently, it should be flushed to disk periodically: the proposal is to mark the manifest as dirty on every cache write operation and to flush it every 10s if it is dirty.
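
A minimal sketch of the dirty-flag idea, assuming a hypothetical ManifestStore class. The disk write is injected as a function so the sketch does not depend on any particular filesystem library; the class and method names are not part of the existing codebase.

```javascript
// Hypothetical in-memory manifest with periodic flushing.
class ManifestStore {
  constructor(writeFile, initial = {}) {
    this.writeFile = writeFile; // assumed signature: (jsonString) => Promise<void>
    this.entries = initial;
    this.dirty = false;
  }

  setEntry(url, entry) {
    this.entries[url] = entry;
    this.dirty = true; // every cache write marks the manifest as dirty
  }

  // Writes the manifest only if something changed; resolves to true if it wrote.
  async flushIfDirty() {
    if (!this.dirty) return false;
    // Clear the flag before writing so that writes landing mid-flush re-mark it.
    this.dirty = false;
    try {
      await this.writeFile(JSON.stringify(this.entries));
      return true;
    } catch (error) {
      this.dirty = true; // keep the flag so the next tick retries
      throw error;
    }
  }

  // Starts the periodic flush and returns a function that stops it.
  startAutoFlush(intervalMs = 10000) {
    const timer = setInterval(() => {
      this.flushIfDirty().catch(() => {});
    }, intervalMs);
    return () => clearInterval(timer);
  }
}
```

With a 10s interval, a crash can lose at most the last few seconds of manifest updates, which is one reason a startup reconciliation step is still useful.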

On disk, there will be our cache/download space.

cache/
   hash(url1).json
   hash(url2).json
   hash(url3).json
   hash(image-url1).jpeg
   hash(url4).json
   ...
   hash(image-urlN).jpeg
   hash(urlN).json

Notes: article json => file://localstorage/cache/hash(url).json

image file => file://localstorage/cache/hash(image-url).jpeg
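
The url-to-path mapping could look roughly like the sketch below. The hash function is an assumption: FNV-1a over UTF-16 code units is used only to keep the example dependency-free, and a real implementation would more likely use a stronger hash to avoid collisions. The content-type table and the helper names are also hypothetical.

```javascript
// 32-bit FNV-1a; chosen here only because it needs no libraries (assumption).
function hashUrl(url) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < url.length; i += 1) {
    hash ^= url.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, '0');
}

// Assumption: the file extension is derived from the response content type.
const EXTENSIONS = {
  'application/json': 'json',
  'image/jpeg': 'jpeg',
  'image/png': 'png',
};

function cachePathFor(url, contentType, cacheDir = 'file://localstorage/cache') {
  const mime = contentType.split(';')[0].trim().toLowerCase();
  const extension = EXTENSIONS[mime];
  if (!extension) throw new Error(`Unsupported content type: ${contentType}`);
  return `${cacheDir}/${hashUrl(url)}.${extension}`;
}
```

Because the path is a pure function of the url and content type, the manifest does not need to store file names.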

magnetFetch will need to be capable of handling raw image responses. Currently it knows how to handle JSON responses only.

All cache actions related to downloads should also update the corresponding reducer.

Managers

  • CacheManager is a class responsible for managing the cache. The current implementation relies on react-native-cache, which stores data in AsyncStorage. This shouldn't be the case, as AsyncStorage is meant to store only a small amount of key-value data. The idea is to replace the react-native-cache library with our own implementation of the LRU cache logic, written from scratch. This homegrown solution should store data directly on the filesystem.

CacheManager API:

  1. getEntry(url)
  2. setEntry(url, value)
  3. removeUserTag(issueID/articleID) - iterates over all cache entries and removes the provided tag, so that entries left without any tags become implicitly cached.
  4. sweepCache() - scans entries and brings the cache within the max # of entries. Call sweepCache on startup, at some frequency. Cache items with user tag(s) are not eligible for removal.
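
A possible shape for sweepCache, sketched over an in-memory manifest. It assumes the limit counts all entries while only untagged entries can be evicted, and that eviction order follows lastReadTime (oldest first); both are interpretations of the description above rather than settled decisions.

```javascript
// Evicts the least recently read implicit entries until the manifest is within
// maxEntries (or no purgeable entries remain). Returns the evicted urls so the
// caller can delete the corresponding files.
function sweepCache(manifest, maxEntries) {
  const purgeable = Object.entries(manifest)
    .filter(([, entry]) => !entry.userTags || entry.userTags.length === 0)
    .sort(([, a], [, b]) => a.lastReadTime - b.lastReadTime); // oldest first

  const excess = Math.max(0, Object.keys(manifest).length - maxEntries);
  const evicted = purgeable.slice(0, excess).map(([url]) => url);
  evicted.forEach((url) => { delete manifest[url]; });
  return evicted;
}
```

If tagged entries alone exceed the limit, the function evicts every implicit entry and leaves the cache over the limit, since downloaded content must not be purged.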

Note: There's no need for an addUserTag method. Tag(s) should be added by calling magnetFetch and providing a userTag.

magnetFetch should accept new option flags, e.g. userTag:

// download issue
magnetFetch(url, {
	userTag: 'issue1'
})
// bookmark article
magnetFetch(url, {
	userTag: 'article4'
})
// access article from home feed
magnetFetch(url, {})
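
One way the userTag option could be applied to the manifest after a successful response is sketched below. upsertEntry is a hypothetical helper, not an existing magnetFetch function. Tags accumulate across calls, so bookmarking an article that already belongs to a downloaded issue only adds a tag to the existing entry, and an untagged read never strips existing tags.

```javascript
// Hypothetical helper that magnetFetch could call once a response is stored.
function upsertEntry(manifest, url, { etag, staleTime }, options = {}, now = Date.now()) {
  const previous = manifest[url] || {};
  const tags = new Set(previous.userTags || []);
  if (options.userTag) tags.add(options.userTag);

  const entry = { etag, staleTime, lastReadTime: now };
  if (tags.size > 0) entry.userTags = [...tags]; // untagged entries stay purgeable
  manifest[url] = entry;
  return entry;
}
```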

When a user initiates a download, DownloadManager comes into play.

  • DownloadManager: orchestrates the download process, from initiating a fetch request by means of magnetFetch to delegating filesystem manipulations (via FileSystemManager) and triggering Redux store updates.
DownloadManager.downloadIssue(url, issueID) {
	magnetFetch(url, options: { userTag: issueID })
	parse response
	
	for each image
		downloadImage
	success callback => [FileSystemManager.writeToDisk => manifest update => Redux store update]
	failure callback => [notify user => Redux store update]
	
	for each article
		downloadArticle
		progress callback(1/N)
	success callback => [FileSystemManager.writeToDisk => manifest update => Redux store update]
	failure callback => [notify user => Redux store update]
}

DownloadManager.downloadArticle(url, issueID/articleID) {
	magnetFetch(url, options: { userTag: issueID/articleID })
	parse response
	for each image
		downloadImage
	success callback => [FileSystemManager.writeToDisk => manifest update => Redux store update]
	failure callback => [notify user => Redux store update]
}

DownloadManager.downloadImage(url, issueID/articleID) {
	magnetFetch(url, options: { userTag: issueID/articleID })
	parse response
	success callback => [FileSystemManager.writeToDisk => manifest update]
	failure callback => [notify user => Redux store update]
}

DownloadManager.removeIssue(url) {...} // makes the issue become implicitly cached by removing user tag(s)

DownloadManager.removeArticle(url) {...} // makes the article become implicitly cached by removing user tag(s)
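
The downloadIssue flow above could be sketched in runnable form as follows. Everything here is an assumption made for illustration: the fetch function is injected in place of magnetFetch, the issue response is assumed to expose an `articleUrls` array, articles are fetched sequentially, image handling is left to the injected function, and the 'failed' status is not part of the proposal.

```javascript
// fetchResource(url, { userTag }) is assumed to resolve to parsed JSON.
// onProgress(completed, total) reports progress for the issue as a whole.
// onFailure(error) is where the caller would notify the user and remove tags.
async function downloadIssue(issueUrl, issueID, { fetchResource, onProgress, onFailure }) {
  try {
    const issue = await fetchResource(issueUrl, { userTag: issueID });
    const total = issue.articleUrls.length;
    let completed = 0;
    for (const articleUrl of issue.articleUrls) {
      // Every article is tagged with the issue id so it is not purged.
      await fetchResource(articleUrl, { userTag: issueID });
      completed += 1;
      onProgress(completed, total);
    }
    return { id: issueID, status: 'downloaded' };
  } catch (error) {
    onFailure(error);
    return { id: issueID, status: 'failed' };
  }
}
```

Sequential fetching keeps progress reporting trivial; a bounded-concurrency variant would be faster but needs more care around partial failures.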

The corresponding reducer should contain the data needed for updating the UI based on download statuses. We might want to add the redux-persist library, so that the reducer's data is persisted across app launches. Another option is to save the reducer data to AsyncStorage and restore it on startup (using the startup actions mechanism).

// issues/reducer.js
savedIssues: [
{
	id: 'issue1',
	status: 'downloading'
},
{
	id: 'issue2',
	status: 'downloaded'
}
]
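
A reducer producing the savedIssues shape above might look like the following sketch. The action type names are hypothetical, and dropping an issue from the list when its download fails is just one possible choice.

```javascript
// Only the savedIssues shape comes from the proposal; action types are assumed.
const initialState = { savedIssues: [] };

function issuesReducer(state = initialState, action) {
  switch (action.type) {
    case 'ISSUE_DOWNLOAD_STARTED':
      return {
        ...state,
        savedIssues: [
          ...state.savedIssues.filter((issue) => issue.id !== action.id),
          { id: action.id, status: 'downloading' },
        ],
      };
    case 'ISSUE_DOWNLOAD_SUCCEEDED':
      return {
        ...state,
        savedIssues: state.savedIssues.map((issue) => (
          issue.id === action.id ? { ...issue, status: 'downloaded' } : issue
        )),
      };
    case 'ISSUE_DOWNLOAD_FAILED':
    case 'ISSUE_REMOVED':
      return {
        ...state,
        savedIssues: state.savedIssues.filter((issue) => issue.id !== action.id),
      };
    default:
      return state;
  }
}
```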

Image wrapper component

There's a need for an Image-based component that is capable of displaying cached image data. This component could be a wrapper that:

  • takes image url as a prop
  • makes a magnetFetch call with it (returns from cache or real url if needed)
  • downloads the image and returns a path to the downloaded file

The benefits of this wrapper are:

  • it is not cache aware
  • if there's no corresponding cache entry for a given url, it will delegate to magnetFetch to fetch the image and put it in the cache

Note: Unfortunately, this means that there's no way we can handle all responses uniformly - JSON and image responses will be handled differently.

Handling failures

If a failure happens for any reason during downloading, userTags should be removed from the corresponding resources (relying on the removeUserTag method), thus making them purgeable. It's important to inform the user about the failure, e.g. by showing an alert.

A reconciliation process should take place on every app startup. It's needed because, for example, after a crash our manifest might be out of sync with the state of the filesystem. The reconciliation process should identify issues that failed to download (reducer -> issues with downloading status) and remove userTags from the corresponding resources, as in the case of the recoverable failures described above.
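
The reducer-driven part of reconciliation could be sketched as below. The function name is hypothetical, and it assumes user tags are the same ids that appear in savedIssues. Checking that every manifest entry still has a file on disk would be a separate step and is not shown.

```javascript
// Finds issues that were still 'downloading' at startup (i.e. interrupted)
// and removes their tags so the partial data becomes purgeable.
function reconcile(manifest, savedIssues) {
  const interrupted = savedIssues
    .filter((issue) => issue.status === 'downloading')
    .map((issue) => issue.id);

  for (const entry of Object.values(manifest)) {
    if (!entry.userTags) continue;
    entry.userTags = entry.userTags.filter((tag) => !interrupted.includes(tag));
    if (entry.userTags.length === 0) delete entry.userTags;
  }
  return interrupted; // the caller can update the reducer or retry the downloads
}
```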

Miscellaneous

Currently image resources that we deal with are of JPEG and PNG formats.

Default size: ~20-50 KB

Large size: from ~700-800 KB up to 1.5-2.0 MB

  • Turn off RN caching for images.
  • Add cache size to dev menu.
  • Add "Clear all cached data" in the Settings menu.
  • Attempt recovery (re-download) if a crash happened during downloading.
@Komarev

Komarev commented Aug 28, 2019

Is it worth 'zipping' downloaded articles?

@sbaruth

sbaruth commented Aug 28, 2019

Thanks for the demo Alex! I had one question for you:

Is it really necessary to have a manifest? It looks like magnetFetch is currently caching responses using the url as the key. Could we do the same thing in our solution? For instance, if you hit a url to retrieve an article, you would:

  • Hash the url (or similar) to create a filename on the filesystem
  • Store the response in the file
  • Also store additional metadata in the file, or in a companion file with a .meta extension (isSavedArticle, isDownloaded, TTL, etc)

We can do the same thing for issues, articles, images, and other items down the road (video clips?). It feels like the manifest approach, and maintaining a hierarchy, may require some custom tweaking every time we want to start caching something different.

From my brief look at it, I feel like magnet fetch is actually a fairly decent caching solution, it just needs to be moved from async storage to the filesystem and allow adding some metadata. Let me know what you think!

@palexs
Author

palexs commented Aug 29, 2019

@Komarev We might want to add archiving later on, if it's needed. Currently we're trying to build a kind of MVP version of the caching layer, whereas a zipping/unzipping step might complicate the solution and require adding a 3rd party library. Moreover, this decision should be based on real-life figures, e.g. size of cache, archiving/unarchiving time, etc.

@palexs
Author

palexs commented Aug 29, 2019

@sbaruth Thanks for your comment! Initially I was also hesitant about adding the manifest file. It's an additional entity (sort of "middleman") that we should keep in sync and up-to-date, which is an extra burden anyways. It reminds me of convention over configuration principle. I personally have always preferred convention.

The approach you propose looks fine to me and has its pros (e.g., universality), but my concern is that we will have to perform various traversal operations, for example getting all implicitly cached articles older than a given date, and it might be rather time-consuming to loop over all .meta files in order to select the ones that are eligible for deletion.

Another thought. Consider a scenario when a downloaded issue should be deleted. It means that we should identify all articles that belong to this issue and all the corresponding images. Without the manifest it seems like a challenging task to pull off.

@palexs
Author

palexs commented Aug 29, 2019

Given that the manifest will be accessed often, for performance reasons it should be loaded into memory (on app startup) or reside in the Redux store (and be persisted to local storage, e.g. by means of redux-persist).

@sbaruth

sbaruth commented Aug 29, 2019

Good morning (evening) Alex! You bring up some very good points. Yes, I agree with you that the solution I suggested presents more traversal complications. Particularly when marking/unmarking an issue or article as saved. However, regarding cleanup, I don't believe implicitly cached items actually need a .meta file. I'm assuming they can be cleaned up just based on the file modified timestamp? If so, at least one of your concerns could be alleviated. This is all assuming filesystem operations like checking for meta file exists and retrieving files to cleanup by date are fast.

It sounds like you had roughly the same concerns I had regarding the manifest and no longer caching purely by url. As you mentioned, there's pros and cons to both approaches, so if you think it's not worth the tradeoff I definitely trust your judgement. Thanks for discussing!

Oh, I just saw your edit! "Consider a scenario when a downloaded issue should be deleted". This is a good point as well. You'd likely want to put all related url hashes into the item's .meta file for later traversal. So for instance, an issue url would have a list of article url hashes in its meta file. An article url would have a list of image hashes in its meta file, etc.

@palexs
Author

palexs commented Aug 29, 2019

@sbaruth Ok, now I'm more inclined towards revisiting the proposed approach :) I believe the right direction might be to enhance the current magnetFetch based mechanism to make it capable of:

  • storing downloaded contents alongside the purgeable cache;
  • pre-caching related contents (images in case of articles and articles in case of issues);
  • storing/manipulating data directly on disk (abolishing AsyncStorage dependency);
  • being versatile.

@sbaruth

sbaruth commented Aug 29, 2019

Yea, I think what you outlined sounds pretty solid. Will be interesting to hear input from the rest of the team members. This is a tricky problem!

@mishagreenberg

mishagreenberg commented Sep 3, 2019

cacheEntries.json:

url1: {
	userTags: ['issue1591', 'article1345'],
	etag: 'a2t376uf2fj48fkjiugh2376',
	cacheDate: 124142,
	lastReadDate: 124124,
}

removeUserTag('issue1591')
vacuumCache() - scans entries and makes sure the cache is within the max # of entries. Call vacuumCache on startup, at some frequency?

issues/reducer.js:

savedIssues: [{
	id: 'issue1591',
	status: 'downloading',
}]

// download issue article
magnetFetch(url, {
	userTags: ['issue123'],
})

// download article - saved bookmark
magnetFetch(url, {
	userTags: ['article1'],
})

// access article from home feed
magnetFetch(url, {
	userTags: [],
})
