Numpy out-of-core processing study
#Python particles simulator: numpy out-of-core processing
> This notebook benchmarks different solutions to [this Stack Overflow question](
##Utility Functions
##Simulate "`emission`"
Time needed to save the data to a file with **pandas** ([Jeff version](
##Simulate "`timestamps`"
####Compute timestamps
###PyTables: store `counts`, then compute `timestamps`
Each particle has a different table of "counts".
Let's create the tables on disk:
####Compute timestamps
###PyTables: compute `timestamps` (without storing `counts`)
###PyTables: compute "`timestamps`", storing them on disk on-the-go
### Storing "`counts`"
Storing `counts` takes 10% more space and 40% more time to compute the timestamps. Having `counts` stored is not an advantage per se, because only the timestamps are needed in the end.
The advantage is that reconstructing the index (timestamps) is simpler, because we query the full time axis in a single command (`.get_where_list('counts >= 1')`). Conversely, with chunked processing we need to perform some index arithmetic that is tricky and may be a burden to maintain.
However, the added code complexity may be small compared to the sorting and merging that is needed in both cases.
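
To make the "index arithmetic" concrete, here is a minimal NumPy-only sketch of the chunked approach. The function name `timestamps_from_chunks` and the toy `counts` array are made up for illustration and are not taken from the notebook's code; the point is only that each chunk's local indices must be shifted by the number of bins already processed.

```python
import numpy as np

def timestamps_from_chunks(counts_chunks):
    """Return the global indices of all time bins with counts >= 1,
    processing the counts one chunk at a time (hypothetical helper)."""
    offset = 0   # global index of the first bin of the current chunk
    found = []
    for chunk in counts_chunks:
        chunk = np.asarray(chunk)
        local_idx = np.nonzero(chunk >= 1)[0]   # indices relative to this chunk
        found.append(local_idx + offset)        # shift onto the global time axis
        offset += chunk.size
    if not found:
        return np.empty(0, dtype=np.int64)
    return np.concatenate(found)

# Toy data (assumed, for illustration): 10 time bins in three uneven chunks
counts = np.array([0, 1, 0, 0, 2, 0, 1, 0, 0, 1])
chunks = [counts[:4], counts[4:7], counts[7:]]
timestamps = timestamps_from_chunks(chunks)
```

On the full array the same result would come from a single `np.nonzero(counts >= 1)[0]`, which plays the role that `.get_where_list('counts >= 1')` plays for a stored table.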
### Storing "`timestamps`"
Timestamps can be accumulated in RAM. However, a final `hstack()` is needed to "merge" the chunks stored in a list. This doubles the memory requirement, so the RAM may be insufficient.
We can instead store timestamps as we go to a table using `.append()`. At the end we can load the table in memory with `.read()`. This is only 10% slower than the all-in-memory computation but avoids the "double the RAM" requirement. Moreover, we can skip the final full load, resulting in minimal RAM usage.
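
The store-as-we-go idea might look roughly like the sketch below, which uses a PyTables extendable array (`EArray`). The helper name `store_timestamps`, the node name `timestamps`, the compression settings and the toy data are all assumptions for illustration, not the notebook's actual code.

```python
import os
import tempfile

import numpy as np
import tables

def store_timestamps(counts_chunks, fname):
    """Append the timestamps of each chunk to an on-disk EArray
    (hypothetical helper); return the number of stored timestamps."""
    with tables.open_file(fname, mode="w") as h5file:
        ts_array = h5file.create_earray(
            h5file.root, "timestamps",
            atom=tables.Int64Atom(), shape=(0,),   # extendable along axis 0
            filters=tables.Filters(complevel=5, complib="blosc"))
        offset = 0
        for chunk in counts_chunks:
            local_idx = np.nonzero(np.asarray(chunk) >= 1)[0]
            if local_idx.size:
                # Only this chunk's timestamps are held in RAM
                ts_array.append((local_idx + offset).astype(np.int64))
            offset += len(chunk)
        return int(ts_array.nrows)

# Toy data (assumed, for illustration)
counts = np.array([0, 1, 0, 0, 2, 0, 1, 0, 0, 1])
chunks = [counts[:4], counts[4:7], counts[7:]]

with tempfile.TemporaryDirectory() as tmpdir:
    fname = os.path.join(tmpdir, "timestamps.h5")
    n_stored = store_timestamps(chunks, fname)
    # Optional final full load; skip it to keep RAM usage minimal
    with tables.open_file(fname, mode="r") as h5file:
        timestamps = h5file.root.timestamps.read()
```

No list of chunks and no final `hstack()` is involved, so the peak memory is set by the chunk size rather than by the total number of timestamps.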
### H5Py
**H5py** is a much simpler library than PyTables. For this use case of sequential processing it seems a better fit than PyTables. The only missing feature is 'blosc' compression. It must be tested whether this results in a big performance penalty.
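
For comparison, a sequential append with h5py could be sketched with a resizable dataset, as below. This is an untimed illustration under stated assumptions: the helper name `store_timestamps_h5py`, the dataset name, the chunk shape and the toy data are hypothetical, and `gzip` is swapped in as the compressor only because it ships with HDF5, so it says nothing about the performance question raised above.

```python
import os
import tempfile

import h5py
import numpy as np

def store_timestamps_h5py(counts_chunks, fname):
    """Append the timestamps of each chunk to a resizable h5py dataset
    (hypothetical helper)."""
    with h5py.File(fname, "w") as f:
        dset = f.create_dataset(
            "timestamps", shape=(0,), maxshape=(None,),   # unlimited along axis 0
            dtype="int64", chunks=(1024,), compression="gzip")
        offset = 0
        for chunk in counts_chunks:
            local_idx = np.nonzero(np.asarray(chunk) >= 1)[0]
            if local_idx.size:
                n_old = dset.shape[0]
                dset.resize((n_old + local_idx.size,))   # grow, then fill the tail
                dset[n_old:] = local_idx + offset
            offset += len(chunk)

# Toy data (assumed, for illustration)
counts = np.array([0, 1, 0, 0, 2, 0, 1, 0, 0, 1])
chunks = [counts[:4], counts[4:7], counts[7:]]

with tempfile.TemporaryDirectory() as tmpdir:
    fname = os.path.join(tmpdir, "timestamps_h5py.h5")
    store_timestamps_h5py(chunks, fname)
    with h5py.File(fname, "r") as f:
        timestamps_h5py = f["timestamps"][:]   # full load into a NumPy array
```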
