Created August 5, 2019 17:47
MICMoR Daniel Lee.ipynb
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 5 Sep 2019: EDA, ETL, Visualisation\n",
"## D. Lee: Data processing, formats, etc.\n",
"Resources:\n",
"- [Landing page](https://micmor.kit.edu/summer-schools)\n",
"- [GitHub repo](https://github.com/cwerner/kit_micmor_summerschool_2019)\n",
"\n",
"### First block: 9:00 - 9:30\n",
"Why do we care about this? What's the point? Rosetta Stone comparison.\n",
"Maybe a short excursion into communicating with aliens, poking fun at movies.\n",
"How would they even know the endianness? Binary is probably obvious, but how are floats encoded?\n",
"As humans, we pretty much have data encodings figured out... kind of. Actually, not really. And data formats are even harder.\n",
"#### Basic formats\n",
"- CSV\n",
"- Image formats like JPG\n",
"- xarray (strictly a library rather than a format, but its data model follows netCDF)\n",
"- GeoTIFF\n",
"- netCDF\n",
"- HDF5\n",
"- Meteorological formats (e.g. GRIB)\n",
"- Other formats you might encounter\n",
"\n",
"Other topics:\n",
"- Which libraries to use\n",
"- Conventions\n",
"  - Coordinates\n",
"  - Encoding\n",
"- Sometimes your data is organised awkwardly, and you first have to convert it into a format you can work with\n",
"- Libraries that abstract this away, such as pandas and NumPy, can help\n",
"#### Beyond formats, how do you process data?\n",
"- Some good analogies for how much more efficient you are when you stay in one place (data locality)\n",
"- How to organise your data so that it's optimised for your access patterns (basically: which dimension increments fastest in memory?)\n",
"### Second block: 9:30 - 10:00\n",
"#### But what about the cloud?\n",
"- Moving algorithms to the data instead of moving the data.\n",
"\n",
"Then some material on cloud-optimised formats:\n",
"- Parquet\n",
"- zarr\n",
"- COG (Cloud Optimized GeoTIFF)\n",
"- The importance of streaming\n",
"- Object stores vs. files vs. databases\n",
"\n",
"#### Generating interoperable data\n",
"Formats to consider, standards, and other data format engineering questions. Keep this brief."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
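The first block asks how anyone would know the endianness of a bit stream, or how floats are encoded. A small sketch using Python's standard-library `struct` module makes both concrete: the same integer packed in big- and little-endian byte order, a double split into its IEEE 754 sign, exponent and mantissa fields, and what happens when bytes are decoded with the wrong byte order. The helper name `float64_fields` is hypothetical, chosen only for illustration.

```python
import struct
from typing import Tuple


def float64_fields(x: float) -> Tuple[int, int, int]:
    """Split an IEEE 754 double into its (sign, biased exponent, mantissa) bit fields."""
    bits = int.from_bytes(struct.pack(">d", x), "big")  # big-endian: most significant byte first
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF     # 11 exponent bits, bias 1023
    mantissa = bits & ((1 << 52) - 1)   # 52 fraction bits
    return sign, exponent, mantissa


# The same 32-bit unsigned integer in big-endian (">") and little-endian ("<") byte order.
big = struct.pack(">I", 1)
little = struct.pack("<I", 1)
print(big.hex(), little.hex())  # 00000001 01000000

# 1.0 as a big-endian double: sign 0, biased exponent 1023, mantissa 0.
encoded = struct.pack(">d", 1.0)
print(encoded.hex())            # 3ff0000000000000
print(float64_fields(1.0))      # (0, 1023, 0)

# Decoding with the wrong byte order raises no error; it silently
# yields an unrelated number (here a tiny subnormal value).
misread = struct.unpack("<d", encoded)[0]
print(misread == 1.0)           # False
```

Nothing in the bytes themselves records the byte order: the reader has to know it from the format specification or the file's metadata, which is exactly the Rosetta Stone problem.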
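For the point about organising data around access patterns ('which dimension increments fastest'), the following sketch computes where an element of a multi-dimensional array lands in a flat buffer under row-major (C order, last dimension fastest) and column-major (Fortran order, first dimension fastest) layouts. The `linear_offset` helper and the `(time, lat, lon)` shape are hypothetical; NumPy's `np.ravel_multi_index` performs the same calculation with its `order` parameter.

```python
from typing import Sequence


def linear_offset(index: Sequence[int], shape: Sequence[int], order: str = "C") -> int:
    """Return the position of a multi-dimensional index in a flat buffer.

    order="C": row-major, the LAST dimension increments fastest.
    order="F": column-major, the FIRST dimension increments fastest.
    """
    if order not in ("C", "F"):
        raise ValueError("order must be 'C' or 'F'")
    if len(index) != len(shape):
        raise ValueError("index and shape must have the same length")
    # Walk from the fastest-varying dimension to the slowest, accumulating strides.
    dims = range(len(shape) - 1, -1, -1) if order == "C" else range(len(shape))
    offset, stride = 0, 1
    for d in dims:
        offset += index[d] * stride
        stride *= shape[d]
    return offset


shape = (3, 2, 4)  # hypothetical (time, lat, lon) grid

# Full time series at one grid cell (lat=1, lon=2).
series_c = [linear_offset((t, 1, 2), shape, "C") for t in range(shape[0])]
series_f = [linear_offset((t, 1, 2), shape, "F") for t in range(shape[0])]
print(series_c)  # [6, 14, 22]  strided: jumps of lat * lon = 8 elements
print(series_f)  # [15, 16, 17]  contiguous

# Full map at one time step (time=0), iterating lat then lon.
map_c = [linear_offset((0, la, lo), shape, "C") for la in range(shape[1]) for lo in range(shape[2])]
map_f = [linear_offset((0, la, lo), shape, "F") for la in range(shape[1]) for lo in range(shape[2])]
print(map_c)  # [0, 1, 2, 3, 4, 5, 6, 7]  contiguous
print(map_f)  # [0, 6, 12, 18, 3, 9, 15, 21]  strided
```

With a C-ordered (time, lat, lon) array, reading a whole map at one time step touches one contiguous run, while reading a long time series at one grid cell jumps by lat × lon elements per step; column-major order reverses that trade-off. Chunked formats such as netCDF-4/HDF5 and zarr let you choose chunk shapes for the same reason.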
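On the importance of streaming: a computation that consumes records one at a time needs only constant memory, however large the input. A minimal standard-library sketch, assuming a CSV with a header row; `io.StringIO` stands in for a file opened in text mode, and the helper names, column names and values are made up for illustration.

```python
import csv
import io
from typing import Iterable, Iterator, TextIO


def stream_column(handle: TextIO, column: str) -> Iterator[float]:
    """Lazily yield one float per CSV row; rows are parsed as they are read."""
    for row in csv.DictReader(handle):
        yield float(row[column])


def streaming_mean(values: Iterable[float]) -> float:
    """Mean in a single pass, keeping only a count and a running total in memory."""
    count, total = 0, 0.0
    for value in values:
        count += 1
        total += value
    if count == 0:
        raise ValueError("cannot take the mean of an empty stream")
    return total / count


# io.StringIO stands in for a text-mode file handle; the data are hypothetical.
data = io.StringIO("time,temperature\n0,10.0\n1,12.0\n2,14.0\n")
result = streaming_mean(stream_column(data, "temperature"))
print(result)  # 12.0
```

Cloud-optimised formats such as zarr and COG push the same idea further: they are laid out so that a reader can fetch just the chunks or byte ranges it needs over the network instead of downloading the whole file.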