akiradeveloper/read-caching.md

## read-caching.md

      
    Raw
  

              read-caching.md
            
          
    I am writing this blog as a documentation draft and the aim is to make all users feel easy to use read-caching in their production.
writeboost is a log-structured writeback-caching software so it sounds bit strange that it can supports read-caching. The idea is implement read-caching as lightweight write-caching. The basic idea is:

Only caches 4KB (full-sized) read requests
Captivate read data into the read-cache cells, in the endio callback. (read_cache_cell_copy_data) There are 2048 cells preallocated. reserve_read_cache_cell is called when the read request results in cache miss (then read from the backing device) and the function reserves one of the cells. In endio, the read data payload is copied to the cell reserved.
Once all cells are occupied writeboost "injects" the read data captivated in the cells into the write-caching path. (inject_read_cache) This is something like read-requests turns into writers. But the code path is simpler than the actual write path because the data is assured to be 4KB (never be partial) and there is no chance of overwriting the existing cache block because the captivated cell data was cancelled when the write-caching wrote the same address. (might_cancel_read_cache_cell)

struct read_cache_cell {
	sector_t sector;
	void *data; /* 4KB data read */
	bool cancelled; /* Don't include this */
	struct rb_node rb_node;
};
With the tunable parameter read_cache_threshold set, read-caching detects the sequentiality within the 2048 cells and cancels the cells that is sequential along the sector. read_cache_cell is maintained in the rb tree because taking the cells out of the tree is sorted in ascending order which is helpful to detect sequentiality.
the struct dirtiness indicates the dirtiness of a cache block. is_dirty is set false if it's read-cached and no need to write back. data_bits is the bit mask of valid regions in the 4KB cache block. Therefore all read-cached blocks are set is_dirty=false and data_bits = 255. This way, we can mix dirty and clean cache data.
struct dirtiness {
	bool is_dirty;
	u8 data_bits;
};
In summary, read-caching is implemented using write-caching so nothing special was added. The codebase is only 400 lines against the total 4000 lines. The algorithm is conservative so to be implemented safely. So why not using read-caching.