Skip to content

Instantly share code, notes, and snippets.

@knu2xs
Last active May 31, 2018 21:10
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save knu2xs/9b8cde1f656af47dfc147921a14ddbba to your computer and use it in GitHub Desktop.
Save knu2xs/9b8cde1f656af47dfc147921a14ddbba to your computer and use it in GitHub Desktop.
Loading data from a CSV file previously saved from a SpatialDataFrame back into another SpatialDataFrame.
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Load Data from a CSV Previously Exported from a SpatialDataFrame\n",
"\n",
"Data previously in a SpatialDataFrame exported to a CSV file unfortunately does not currently (at v1.4) load directly back into a SpatialDataFrame. An intermediate step is required to get the SpatialDataFrame to correclty recognize the SHAPE column as geometry."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from arcgis.features import SpatialDataFrame as SDF\n",
"from arcgis.geometry import Geometry\n",
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load Data into a Pandas DataFrame\n",
"\n",
"The first step is to load data into a normal Pandas DataFrame. This gives us a starting point to start manipulating the data."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>ID</th>\n",
" <th>STATE</th>\n",
" <th>CLUSTER</th>\n",
" <th>PROJ_EARLY_MORN</th>\n",
" <th>SHAPE</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>150010201001</td>\n",
" <td>HI</td>\n",
" <td>5</td>\n",
" <td>214127.969928</td>\n",
" <td>{\"rings\": [[[-155.09868999999998, 19.801140001...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>150010201002</td>\n",
" <td>HI</td>\n",
" <td>5</td>\n",
" <td>177040.856850</td>\n",
" <td>{\"rings\": [[[-155.10497000099997, 19.869760001...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>150010201003</td>\n",
" <td>HI</td>\n",
" <td>5</td>\n",
" <td>211432.928668</td>\n",
" <td>{\"rings\": [[[-155.09024999999997, 19.804200001...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>150010201004</td>\n",
" <td>HI</td>\n",
" <td>5</td>\n",
" <td>208857.945568</td>\n",
" <td>{\"rings\": [[[-155.11466041399996, 19.793709357...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>150010202021</td>\n",
" <td>HI</td>\n",
" <td>5</td>\n",
" <td>162845.872009</td>\n",
" <td>{\"rings\": [[[-155.09386972499996, 19.757200099...</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" ID STATE CLUSTER PROJ_EARLY_MORN \\\n",
"0 150010201001 HI 5 214127.969928 \n",
"1 150010201002 HI 5 177040.856850 \n",
"2 150010201003 HI 5 211432.928668 \n",
"3 150010201004 HI 5 208857.945568 \n",
"4 150010202021 HI 5 162845.872009 \n",
"\n",
" SHAPE \n",
"0 {\"rings\": [[[-155.09868999999998, 19.801140001... \n",
"1 {\"rings\": [[[-155.10497000099997, 19.869760001... \n",
"2 {\"rings\": [[[-155.09024999999997, 19.804200001... \n",
"3 {\"rings\": [[[-155.11466041399996, 19.793709357... \n",
"4 {\"rings\": [[[-155.09386972499996, 19.757200099... "
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.read_csv(path_csv, index_col=0)\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create a DataFrame Without the Geometry Column\n",
"\n",
"When we create the SpatialDataFrame, we will load the geometry in through a separate input argumen than the rest of the data. Hence, we need to create a DataFrame _without_ the `SHAPE` column."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>ID</th>\n",
" <th>STATE</th>\n",
" <th>CLUSTER</th>\n",
" <th>PROJ_EARLY_MORN</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>150010201001</td>\n",
" <td>HI</td>\n",
" <td>5</td>\n",
" <td>214127.969928</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>150010201002</td>\n",
" <td>HI</td>\n",
" <td>5</td>\n",
" <td>177040.856850</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>150010201003</td>\n",
" <td>HI</td>\n",
" <td>5</td>\n",
" <td>211432.928668</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>150010201004</td>\n",
" <td>HI</td>\n",
" <td>5</td>\n",
" <td>208857.945568</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>150010202021</td>\n",
" <td>HI</td>\n",
" <td>5</td>\n",
" <td>162845.872009</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" ID STATE CLUSTER PROJ_EARLY_MORN\n",
"0 150010201001 HI 5 214127.969928\n",
"1 150010201002 HI 5 177040.856850\n",
"2 150010201003 HI 5 211432.928668\n",
"3 150010201004 HI 5 208857.945568\n",
"4 150010202021 HI 5 162845.872009"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_data = df[[col for col in df.columns if col != 'SHAPE']]\n",
"df_data.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Convert the SHAPE Column to Geometry\n",
"\n",
"To illustrate, if we take a look at the first value from the `SHAPE` column, it is actually an extremely long string. To be recoginzed as a Geometry object by the Python API, it needs to be a dictionary."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"str"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(df.SHAPE.iloc[0])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can convert the string to a dictionary by using the Python `eval` method."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"dict"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(eval(df.SHAPE.iloc[0]))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This gets us _most_ of the way there. We still need to create a geometry object. This can be accomplished by converting the dictionary using the Geometry object constructor. The Geometry object constructor reads the geometry type based on the JSON strucutre of the input dictionary."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"arcgis.geometry._types.Polygon"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(Geometry(eval(df.SHAPE.iloc[0])))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To convert the entire column, we can use the Pandas `apply` method to apply this conversion to every value in the column."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 {'rings': [[[-155.09868999999998, 19.801140001...\n",
"1 {'rings': [[[-155.10497000099997, 19.869760001...\n",
"2 {'rings': [[[-155.09024999999997, 19.804200001...\n",
"3 {'rings': [[[-155.11466041399996, 19.793709357...\n",
"4 {'rings': [[[-155.09386972499996, 19.757200099...\n",
"Name: SHAPE, dtype: object"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"geom = df.SHAPE.apply(lambda value: Geometry(eval(value)))\n",
"geom.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, with a Pandas `DataFrame` containing the normal tabular data columns, and a Pandas `Series` popualted with valid `Geometry` objects, we use these as input parameters to create a `SpatialDataFrame`."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>ID</th>\n",
" <th>STATE</th>\n",
" <th>CLUSTER</th>\n",
" <th>PROJ_EARLY_MORN</th>\n",
" <th>SHAPE</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>150010201001</td>\n",
" <td>HI</td>\n",
" <td>5</td>\n",
" <td>214127.969928</td>\n",
" <td>{'rings': [[[-155.09868999999998, 19.801140001...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>150010201002</td>\n",
" <td>HI</td>\n",
" <td>5</td>\n",
" <td>177040.856850</td>\n",
" <td>{'rings': [[[-155.10497000099997, 19.869760001...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>150010201003</td>\n",
" <td>HI</td>\n",
" <td>5</td>\n",
" <td>211432.928668</td>\n",
" <td>{'rings': [[[-155.09024999999997, 19.804200001...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>150010201004</td>\n",
" <td>HI</td>\n",
" <td>5</td>\n",
" <td>208857.945568</td>\n",
" <td>{'rings': [[[-155.11466041399996, 19.793709357...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>150010202021</td>\n",
" <td>HI</td>\n",
" <td>5</td>\n",
" <td>162845.872009</td>\n",
" <td>{'rings': [[[-155.09386972499996, 19.757200099...</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" ID STATE CLUSTER PROJ_EARLY_MORN \\\n",
"0 150010201001 HI 5 214127.969928 \n",
"1 150010201002 HI 5 177040.856850 \n",
"2 150010201003 HI 5 211432.928668 \n",
"3 150010201004 HI 5 208857.945568 \n",
"4 150010202021 HI 5 162845.872009 \n",
"\n",
" SHAPE \n",
"0 {'rings': [[[-155.09868999999998, 19.801140001... \n",
"1 {'rings': [[[-155.10497000099997, 19.869760001... \n",
"2 {'rings': [[[-155.09024999999997, 19.804200001... \n",
"3 {'rings': [[[-155.11466041399996, 19.793709357... \n",
"4 {'rings': [[[-155.09386972499996, 19.757200099... "
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sdf = SDF(data=df_data, geometry=geom)\n",
"sdf.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## A Function Doing It All\n",
"\n",
"This can all be condensed into a single function facilitating using this as part of larger workflows."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"def read_csv(path_csv):\n",
" df = pd.read_csv(path_csv, index_col=0)\n",
" sdf = SDF(\n",
" data=df[[col for col in df.columns if col != 'SHAPE']],\n",
" geometry=df.SHAPE.apply(lambda value: Geometry(eval(value)))\n",
" )\n",
" return sdf"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For the sake of completeness, this is how using this function looks when used to get the same result."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>ID</th>\n",
" <th>STATE</th>\n",
" <th>CLUSTER</th>\n",
" <th>PROJ_EARLY_MORN</th>\n",
" <th>SHAPE</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>150010201001</td>\n",
" <td>HI</td>\n",
" <td>5</td>\n",
" <td>214127.969928</td>\n",
" <td>{'rings': [[[-155.09868999999998, 19.801140001...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>150010201002</td>\n",
" <td>HI</td>\n",
" <td>5</td>\n",
" <td>177040.856850</td>\n",
" <td>{'rings': [[[-155.10497000099997, 19.869760001...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>150010201003</td>\n",
" <td>HI</td>\n",
" <td>5</td>\n",
" <td>211432.928668</td>\n",
" <td>{'rings': [[[-155.09024999999997, 19.804200001...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>150010201004</td>\n",
" <td>HI</td>\n",
" <td>5</td>\n",
" <td>208857.945568</td>\n",
" <td>{'rings': [[[-155.11466041399996, 19.793709357...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>150010202021</td>\n",
" <td>HI</td>\n",
" <td>5</td>\n",
" <td>162845.872009</td>\n",
" <td>{'rings': [[[-155.09386972499996, 19.757200099...</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" ID STATE CLUSTER PROJ_EARLY_MORN \\\n",
"0 150010201001 HI 5 214127.969928 \n",
"1 150010201002 HI 5 177040.856850 \n",
"2 150010201003 HI 5 211432.928668 \n",
"3 150010201004 HI 5 208857.945568 \n",
"4 150010202021 HI 5 162845.872009 \n",
"\n",
" SHAPE \n",
"0 {'rings': [[[-155.09868999999998, 19.801140001... \n",
"1 {'rings': [[[-155.10497000099997, 19.869760001... \n",
"2 {'rings': [[[-155.09024999999997, 19.804200001... \n",
"3 {'rings': [[[-155.11466041399996, 19.793709357... \n",
"4 {'rings': [[[-155.09386972499996, 19.757200099... "
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sdf = read_csv(path_csv)\n",
"sdf.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment