@kwinkunks
Created September 27, 2021 14:08
Making a striplog from a DataFrame
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"id": "bearing-tuition",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'0.8.9'"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import striplog\n",
"striplog.__version__"
]
},
{
"cell_type": "markdown",
"id": "flexible-sierra",
"metadata": {},
"source": [
"# Making a striplog from a `pandas.DataFrame`"
]
},
{
"cell_type": "markdown",
"id": "linear-sussex",
"metadata": {},
"source": [
"Let's make a CSV file with some data in it:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "finished-survey",
"metadata": {},
"outputs": [],
"source": [
"data = \"\"\"comp unit,top,base,comp lithology,comp grain size,description\n",
"A,0,0.3,sand,medium,\"Medium-grained unconsolidated sand.\"\n",
"B,0.3,1.2,mixed,NA,\"Broken shale and unconsolidated sand as above.\"\n",
"C,1.2,3.7,shale,NA,\"Dark grey shale.\"\n",
"D,3.7,5.1,sandstone,fine,\"fine-grained brown sandstone with fractures.\"\n",
"\"\"\"\n",
"\n",
"with open(\"data.csv\", \"wt\") as f:\n",
"    f.write(data)"
]
},
{
"cell_type": "markdown",
"id": "small-amendment",
"metadata": {},
"source": [
"## Read the CSV directly\n",
"\n",
"Striplog can read CSVs directly, provided there are columns called `'top'` and `'base'`.\n",
"\n",
"If there are columns whose names start with `'comp'`, these will be used to construct `Component` objects. Alternatively, if you provide a `Lexicon`, striplog will try to parse the descriptions into `Component` objects."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "level-belgium",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Striplog(4 Intervals, start=0.0, stop=5.1)"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from striplog import Striplog, Lexicon\n",
"\n",
"s = Striplog.from_csv(\"data.csv\")\n",
"s"
]
},
{
"cell_type": "markdown",
"id": "focal-muslim",
"metadata": {},
"source": [
"Look at the first unit in the striplog:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "subsequent-billy",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table><tr><td style=\"width:2em; background-color:#DDDDDD\" rowspan=\"6\"></td><td><strong>top</strong></td><td>0.0</td></tr><tr><td><strong>primary</strong></td><td><table><tr><td><strong>unit</strong></td><td>A</td></tr><tr><td><strong>lithology</strong></td><td>sand</td></tr><tr><td><strong>grain size</strong></td><td>medium</td></tr></table></td></tr><tr><td><strong>summary</strong></td><td>0.30 m of A, sand, medium</td></tr><tr><td><strong>description</strong></td><td>Medium-grained unconsolidated sand.</td></tr><tr><td><strong>data</strong></td><td><table></table></td></tr><tr><td><strong>base</strong></td><td>0.3</td></tr></table>"
],
"text/plain": [
"Interval({'top': Position({'middle': 0.0, 'upper': 0.0, 'lower': 0.0, 'units': 'm'}), 'base': Position({'middle': 0.3, 'units': 'm'}), 'description': 'Medium-grained unconsolidated sand.', 'data': {}, 'components': [Component({'unit': 'A', 'lithology': 'sand', 'grain size': 'medium'})]})"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s[0]"
]
},
{
"cell_type": "markdown",
"id": "retired-london",
"metadata": {},
"source": [
"Make a plot:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "registered-future",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAFwAAADfCAYAAAB7y7QuAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAADjElEQVR4nO3cP2sVaRiG8fvRRDcrInaCbqEgrGBnsBZEsBELG/0CqRZrLfwiFtaCCwsRRGE/wEJsFZS4zQZbFfHvCo+NhaghRzNzSSbXrztnhoeXi+FlTvGe6u6Is+NnL2C7MTjM4DCDwwwOMzhspuBVdbaqHlXValVdGXtRU1YbvYdX1c4kj5OcSbKWZCXJpe5+OP7ypmeWJ/xkktXu/re73ye5meT8uMuarlmCH0zy32ef1z59px8wN8M99Y3vvtqHqmopyVKSHPj96Imr9+9ucmlb1+U9R77VLMlsT/hakt8++3woydMvb+ru69292N2L8wu7v3+V28QswVeSHK2qw1W1K8nFJMvjLmu6NtxSuvtDVf2R5F6SnUludPeD0Vc2UbPs4enuO0nujLyWbcFfmjCDwwwOMzjM4DCDwwwOm+k9/HvV6rvMnX8yxuit4e8j617yCYcZHGZwmMFhBocZHGZwmMFhBocZHGZwmMFhBocZHGZwmMFhBocZHGZwmMFhBocZHGZwmMFhBocZHGZw2KDBq+pcVV1/8+HVkGMnZdDg3X27u5cW5vYMOXZS3FJgBocZHGZwmMFhBodt+BdMP+LE8f39z61Tg8/dKuaP/bWp/0vRgAwOMzjM4DCDwwwOMzjM4DCDwwwOMzjM4DCDwwwOMzjM4DCDwwwOMzjM4DCDwwwOMzjM4DCDwwwOMzjM4DCDwwwOMzhslHOaL17+P+TYSRnlnOa+vfNDjp0UtxSYwWEGhxkcZnCYwWEGhxkcZnCYwWEGhxkcZnCYwWEGhxkcZnCYwWEGhxkcZnCYwWEGhxkcZnCYwWEGhxkcZnCYwWEGhxkcZnCYwWEGhxkcZnCYwWGe04R5ThPmlgIzOMzgMIPDDA4zOMzgMIPDDA4zOMzgMIPDDA4zOMzgMIPDDA4zOMzgMIPDDA4zOMzgMIPDDA4zOGxujKHPn+3N8p+nxxi9JVy4tv41n3CYwWEGhxkcZnCYwWEGhxkcZnCYwWEGhxkcNsrB2Ndv3ww5dlJGORj76y8LQ46dFLcUmMFhBocZHGZwmMFhBocZHGZwmMFhBocZHGZwmMFhBocZHGZwmMFhBocZHGZwmMFhBocZHGZwmMFhBocZHGZwmMFhBocZHGZwmMFhBocZHGZwmMFhntOEeU4T5pYCMzisunv4oVV3u/vs4IMnYJTgWp9bCszgMIPDDA4zOOwj5ZSOhITpHxQAAAAASUVORK5CYII=\n",
"text/plain": [
"<Figure size 108x216 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"s.plot(aspect=2)"
]
},
{
"cell_type": "markdown",
"id": "small-refund",
"metadata": {},
"source": [
"## Using a `DataFrame`"
]
},
{
"cell_type": "markdown",
"id": "furnished-coffee",
"metadata": {},
"source": [
"Let's load the data as a `pandas.DataFrame`:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "prepared-macro",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>comp unit</th>\n",
" <th>top</th>\n",
" <th>base</th>\n",
" <th>comp lithology</th>\n",
" <th>comp grain size</th>\n",
" <th>description</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>A</td>\n",
" <td>0.0</td>\n",
" <td>0.3</td>\n",
" <td>sand</td>\n",
" <td>medium</td>\n",
" <td>Medium-grained unconsolidated sand.</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>B</td>\n",
" <td>0.3</td>\n",
" <td>1.2</td>\n",
" <td>mixed</td>\n",
" <td>NaN</td>\n",
" <td>Broken shale and unconsolidated sand as above.</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>C</td>\n",
" <td>1.2</td>\n",
" <td>3.7</td>\n",
" <td>shale</td>\n",
" <td>NaN</td>\n",
" <td>Dark grey shale.</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>D</td>\n",
" <td>3.7</td>\n",
" <td>5.1</td>\n",
" <td>sandstone</td>\n",
" <td>fine</td>\n",
" <td>fine-grained brown sandstone with fractures.</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" comp unit top base comp lithology comp grain size \\\n",
"0 A 0.0 0.3 sand medium \n",
"1 B 0.3 1.2 mixed NaN \n",
"2 C 1.2 3.7 shale NaN \n",
"3 D 3.7 5.1 sandstone fine \n",
"\n",
" description \n",
"0 Medium-grained unconsolidated sand. \n",
"1 Broken shale and unconsolidated sand as above. \n",
"2 Dark grey shale. \n",
"3 fine-grained brown sandstone with fractures. "
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"\n",
"df = pd.read_csv(\"data.csv\")\n",
"df"
]
},
{
"cell_type": "markdown",
"id": "emotional-setting",
"metadata": {},
"source": [
"If this were the starting point, we could export it as a CSV with `df.to_csv(fname)` and then load it directly into `striplog` using the code above.\n",
"\n",
"Or, we can parse these data ourselves. The best way to do this is probably with `df.apply()`. That way, we get to write the function that maps our data to all the striplog things, so we can tweak anything we want, like the order and spellings of properties. We could also add things on the way.\n",
"\n",
"I know this looks like a lot of code to do something easy, but if you can get your head around it, it provides a lot of flexibility."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "prescription-glance",
"metadata": {},
"outputs": [],
"source": [
"from striplog import Component, Interval\n",
"\n",
"def make_interval(row):\n",
"    \"\"\"\n",
"    This function processes a row to make a striplog.Interval object.\n",
"    \"\"\"\n",
"\n",
"    # Make the Component properties.\n",
"    props = {'unit': row['comp unit'],\n",
"             'lithology': row['comp lithology'],\n",
"             'grainsize': row['comp grain size'],\n",
"             'source': 'Company database',\n",
"             }\n",
"\n",
"    # Make the Interval and return.\n",
"    iv = Interval(top=row['top'],\n",
"                  base=row['base'],\n",
"                  description=row['description'],\n",
"                  components=[Component(properties=props)]\n",
"                  )\n",
"    return iv"
]
},
{
"cell_type": "markdown",
"id": "determined-alert",
"metadata": {},
"source": [
"Now we can apply this row processor to the DataFrame, one row at a time (`axis=1`). It will emit a `pandas.Series` of `striplog.Interval` objects. We can convert this to a list and pass it to the `Striplog` constructor:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "rental-phase",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Striplog(4 Intervals, start=0.0, stop=5.1)"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s = Striplog(df.apply(make_interval, axis=1).to_list())\n",
"s"
]
},
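{
"cell_type": "markdown",
"id": "csv-round-trip-note",
"metadata": {},
"source": [
"For comparison, the CSV route mentioned earlier can be sketched as follows. This is a minimal sketch, not part of the original notebook: the filename `from_df.csv` and the name `s_csv` are made up for illustration. `index=False` stops pandas from writing its row index as an extra column, and `na_rep='NA'` writes missing values the way the original file spelled them. It is an assumption that `Striplog.from_csv` treats this file the same way as `data.csv`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "csv-round-trip-code",
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical filename, used only for this illustration.\n",
"fname = 'from_df.csv'\n",
"\n",
"# Write the DataFrame without the row index, spelling missing values as 'NA'.\n",
"df.to_csv(fname, index=False, na_rep='NA')\n",
"\n",
"# Load the exported file with the same reader used at the top of the notebook.\n",
"s_csv = Striplog.from_csv(fname)\n",
"s_csv"
]
},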
{
"cell_type": "code",
"execution_count": 9,
"id": "alert-labor",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAFwAAADfCAYAAAB7y7QuAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAADeklEQVR4nO3bTYuOYRiH8f85ZhiRJKGwoJSdBfkGymay5QvMSlYUn0C2lhbWVhZIfAEba4pkY5KllLzWaWMhDA/u+9Dcc/x2z0tnV0d3V89MndXdEWfufx9gvTE4zOAwg8MMDjM4bKbgVXWyqp5U1bOqujj2oaasfvc7vKo2JHma5ESSlSQPk5zp7sfjH296ZnnCjyd51t3Pu/tjkhtJTo17rOmaJfjeJC++eb3y9T39hfkZvlM/ee+He6iqlpMsJ8nBXXuO3jx/6R+PtnYduXDuZ82SzPaEryTZ/83rfUlefv+l7r7W3ce6+9jiwsKfn3KdmCX4wySHqupAVW1McjrJrXGPNV2/vVK6+3NVnU1yP8mGJNe7+9HoJ5uoWe7wdPfdJHdHPsu64F+aMIPDDA4zOMzgMIPDDA6b6Xf4Hw9929n54NMYo9c8n3CYwWEGhxkcZnCYwWEGhxkcZnCYwWEGhxkcZnCYwWEGhxkcZnCYwWEGhxkcZnCYwWEGhxkcZnCYwWEGhw0avKqWquram0/vhhw7KYMG7+7b3b28bWHzkGMnxSsFZnCYwWEGhxkcZnDYKDs+rxbncuXw1jFGrwlXf/GZTzjM4DCDwwwOMzjM4DCDwwwOMzjM4DCDwwwOMzjM4DCDwwwOMzjM4DCDwwwOMzjM4DCDwwwOMzjM4DCDwwwOMzjM4DCDw0bZ0/zw3j3N1Yyyp7lp0T3N1XilwAwOMzjM4DCDwwwOMzjM4DCDwwwOMzjM4DCDwwwOMzjM4DCDwwwOMzjM4DCDwwwOMzjM4DCDwwwOMzjM4DCDwwwOMzjM4DCDwwwOMzjM4DCDwwwOc08T5p4mzCsFZnCYwWEGhxkcZnCYwWEGhxkcZnCYwWEGhxkcZnCYwWEGhxkcZnCYwWEGhxkcZnCYwWEGhxkcZnDY/ChDX89n153tY4xeGy6v/pFPOMzgMIPDDA4zOMzgMIPDDA4zOMzgMIPDDA4b9J9XVbWUZGnHxt1Djp2UURZjF+e2DDl2UrxSYAaHGRxmcJjBYQaHGRxmcJjBYQaHGRxmcJjBYQaHGRxmcJjBYQaHGRxmcJjBYQaHGRxmcJjBYQaHGRxmcJjBYQaHGRxmcJjBYQaHGRxmcJjBYQaHuacJc08T5pUCMzisunv4oVX3uvvk4IMnYJTgWp1XCszgMIPDDA4zOOwLqsyBmaFRGqQAAAAASUVORK5CYII=\n",
"text/plain": [
"<Figure size 108x216 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"s.plot(aspect=2)"
]
},
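{
"cell_type": "markdown",
"id": "skip-missing-note",
"metadata": {},
"source": [
"Because we control the row processor, we can also tweak how missing values are handled. In the DataFrame above, the grain size of units B and C is `NaN`, and `make_interval` passes that value straight into the `Component`. The variant below is a sketch, not part of the original notebook, that leaves out any property whose value is missing. The names `make_interval_skip_missing` and `s_clean` are hypothetical."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "skip-missing-code",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"from striplog import Component, Interval, Striplog\n",
"\n",
"def make_interval_skip_missing(row):\n",
"    \"\"\"\n",
"    Like make_interval, but omit properties whose value is missing.\n",
"    \"\"\"\n",
"    props = {'unit': row['comp unit'],\n",
"             'lithology': row['comp lithology'],\n",
"             'grainsize': row['comp grain size'],\n",
"             'source': 'Company database',\n",
"             }\n",
"\n",
"    # Keep only the properties that have a value; pd.notna is False for NaN.\n",
"    props = {k: v for k, v in props.items() if pd.notna(v)}\n",
"\n",
"    return Interval(top=row['top'],\n",
"                    base=row['base'],\n",
"                    description=row['description'],\n",
"                    components=[Component(properties=props)]\n",
"                    )\n",
"\n",
"# Same pattern as before: one Interval per row, collected into a list.\n",
"s_clean = Striplog(df.apply(make_interval_skip_missing, axis=1).to_list())\n",
"s_clean[1]"
]
},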
{
"cell_type": "markdown",
"id": "emerging-senator",
"metadata": {},
"source": [
"---\n",
"\n",
"© Agile Scientific 2021, licensed CC BY"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "welly",
"language": "python",
"name": "welly"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}