Skip to content

Instantly share code, notes, and snippets.

@maddyblue
Last active September 20, 2017 21:06
Show Gist options
  • Save maddyblue/7187003ad87dc5a4417c929cca203dcd to your computer and use it in GitHub Desktop.
Save maddyblue/7187003ad87dc5a4417c929cca203dcd to your computer and use it in GitHub Desktop.
load csv progress notes
For local:
First phase is all of total progress. Initial, simple, implementation is to count the number of files (N), each file, after being converted to KVs, counts for 1/N of this phase's progress. Follow up work to get the file size from the export storage. Then progress for this phase is number of bytes processed / total number of bytes (among all files).
For distributed:
Sampling phase is 1/3 of total progress. Same deal as local with counting files and then file sizes for progress, also keeping track of number of KVs produced.
Second phase is 1/3 of total progress. As KVs are consumed and written to RocksDB, progress is total number of written KVs / total number of KVs (among all distsql processors).
Third phase is 1/3 of total progress. Each SST written to storage is 1/N of this phase.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment