Skip to content

Instantly share code, notes, and snippets.

@d-wasserman
Last active February 19, 2024 23:23
Show Gist options
  • Star 39 You must be signed in to star a gist
  • Fork 8 You must be signed in to fork a gist
  • Save d-wasserman/e9c98be1d0caebc2935afecf0ba239a0 to your computer and use it in GitHub Desktop.
Save d-wasserman/e9c98be1d0caebc2935afecf0ba239a0 to your computer and use it in GitHub Desktop.
Functions to convert a ArcGIS Table/Feature Class in arcpy to a pandas dataframe. For other options, check the new ArcGIS Python API, but this works across versions.
import arcpy
import pandas as pd
def arcgis_table_to_df(in_fc, input_fields=None, query=""):
"""Function will convert an arcgis table into a pandas dataframe with an object ID index, and the selected
input fields using an arcpy.da.SearchCursor.
:param - in_fc - input feature class or table to convert
:param - input_fields - fields to input to a da search cursor for retrieval
:param - query - sql query to grab appropriate values
:returns - pandas.DataFrame"""
OIDFieldName = arcpy.Describe(in_fc).OIDFieldName
if input_fields:
final_fields = [OIDFieldName] + input_fields
else:
final_fields = [field.name for field in arcpy.ListFields(in_fc)]
data = [row for row in arcpy.da.SearchCursor(in_fc,final_fields,where_clause=query)]
fc_dataframe = pd.DataFrame(data,columns=final_fields)
fc_dataframe = fc_dataframe.set_index(OIDFieldName,drop=True)
return fc_dataframe
def arcgis_table_to_dataframe(in_fc, input_fields, query="", skip_nulls=False, null_values=None):
"""Function will convert an arcgis table into a pandas dataframe with an object ID index, and the selected
input fields. Uses TableToNumPyArray to get initial data.
:param - in_fc - input feature class or table to convert
:param - input_fields - fields to input into a da numpy converter function
:param - query - sql like query to filter out records returned
:param - skip_nulls - skip rows with null values
:param - null_values - values to replace null values with.
:returns - pandas dataframe"""
OIDFieldName = arcpy.Describe(in_fc).OIDFieldName
if input_fields:
final_fields = [OIDFieldName] + input_fields
else:
final_fields = [field.name for field in arcpy.ListFields(in_fc)]
np_array = arcpy.da.TableToNumPyArray(in_fc, final_fields, query, skip_nulls, null_values)
object_id_index = np_array[OIDFieldName]
fc_dataframe = pd.DataFrame(np_array, index=object_id_index, columns=input_fields)
return fc_dataframe
@vitale232
Copy link

vitale232 commented Aug 16, 2019

Thanks for this! I'm committing your arcgis_table_to_df function to an internal library with a minor edit. If no input_fields are provided, read all fields:

def table_to_data_frame(in_table, input_fields=None, where_clause=None):
    """Function will convert an arcgis table into a pandas dataframe with an object ID index, and the selected
    input fields using an arcpy.da.SearchCursor."""
    OIDFieldName = arcpy.Describe(in_table).OIDFieldName
    if input_fields:
        final_fields = [OIDFieldName] + input_fields
    else:
        final_fields = [field.name for field in arcpy.ListFields(in_table)]
    data = [row for row in arcpy.da.SearchCursor(in_table, final_fields, where_clause=where_clause)]
    fc_dataframe = pd.DataFrame(data, columns=final_fields)
    fc_dataframe = fc_dataframe.set_index(OIDFieldName, drop=True)
    return fc_dataframe

@d-wasserman
Copy link
Author

This behavior makes sense. I think I might add it too.

Glad you found this useful!

@GIS-Luke
Copy link

I wanted to add my thanks. I'll put aside some time to share some of my time saving snippets.

@d-wasserman
Copy link
Author

d-wasserman commented Feb 10, 2020

Looking forward to seeing what you have!

@GIS-Luke
Copy link

GIS-Luke commented Feb 24, 2020

Over here
https://my.usgs.gov/confluence/x/YSxKI
I found a useful snippet to go back to arcgis table, or swap the arcpy.da.NumPyArrayToTable for arcpy.da.NumPyArrayToFeatureClass if you have XY attributes.
'Shape' can be resource intensive with polygons. Since my workflow involved going back to arcgis, I also had to drop 'Shape_Area' and 'Shape_Length' so I added the below line.
[final_fields.remove(bad_fld for bad_fld in ['Shape', 'Shape_Area', 'Shape_Length']

@d-wasserman
Copy link
Author

d-wasserman commented Feb 28, 2020

Over here
https://my.usgs.gov/confluence/x/YSxKI
I found a useful snippet to go back to arcgis table, or swap the arcpy.da.NumPyArrayToTable for arcpy.da.NumPyArrayToFeatureClass if you have XY attributes.
'Shape' can be resource intensive with polygons. Since my workflow involved going back to arcgis, I also had to drop 'Shape_Area' and 'Shape_Length' so I added the below line.
[final_fields.remove(bad_fld for bad_fld in ['Shape', 'Shape_Area', 'Shape_Length']

Nice! Something I found out recently that might make it a little cleaner is the required property of fields. It might help with filtering out derived/geo fields.

@TyceHerrman
Copy link

This is so useful - exactly what I was looking for! Would love to see this added to arcpy or the ArcGIS Python API. I haven't found anything comparable in either.

@d-wasserman
Copy link
Author

This is so useful - exactly what I was looking for! Would love to see this added to arcpy or the ArcGIS Python API. I haven't found anything comparable in either.

Thanks! I keep on using it as it still seems faster and dependable than the API. The trick I am running into now is join functions, and have had some success there. Let me know if you can share any back! I want to move beyond record joins.

@JJoffre
Copy link

JJoffre commented May 24, 2020

THANK YOU !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! So ...... Umm how can I change this to get me .aliasName and description row Values.

@AnneEstoppey
Copy link

Thanks, I am using the edited version from @vitale232 and that works like a charm!

@d-wasserman
Copy link
Author

The most updated version should match the functionality provided by @vitale232. Is there an error in the current gist?

@Orrgvili
Copy link

Orrgvili commented Oct 2, 2020

thanks, this is wonderful :)
solved me some problems

@d-wasserman
Copy link
Author

Not enough information is provided to assist. Can you provide a full snippet?

@dkitoko
Copy link

dkitoko commented Feb 7, 2021

Sorry, I was not able to visualize the data using the code below:

def table_to_data_frame(in_table, input_fields=None, where_clause="insert_date >= TO_DATE('01/01/2021','DD/MM/YYYY') and insert_date < TO_DATE('02/01/2021','DD/MM/YYYY')"):

    """Function will convert an arcgis table into a pandas dataframe with an object ID index, and the selected
    input fields using an arcpy.da.SearchCursor."""
    OIDFieldName = arcpy.Describe(in_table).OIDFieldName
    if input_fields:
        final_fields = [OIDFieldName] + input_fields
    else:
        final_fields = [field.name for field in arcpy.ListFields(in_table)]
    data = [row for row in arcpy.da.SearchCursor(in_table, final_fields, where_clause=where_clause)]
    fc_dataframe = pd.DataFrame(data, columns=final_fields)
    fc_dataframe = fc_dataframe.set_index(OIDFieldName, drop=True)
   return fc_dataframe

Than I realize that I needed to cal it using "fc_dataframe = table_to_data_frame(in_table)" at the ends. And also that my where_clause is invalid. If i remove the SQL query it's works. I teste other less complex queries and it's also works. My issue now is how to query by date. Before converting the table I want to filter by date.

@d-wasserman
Copy link
Author

This modification is beyond the scope of the gist. My only advice is check that you are using the appropriate flavor of SQL with regard to temporal queries for your RDBMS. Thanks.

@yodrin
Copy link

yodrin commented Apr 11, 2022

Thank you very much! An excellent tool! Very useful!

@Choumingzhao
Copy link

I got following Exception: Data must be 1-dimensional. And it turns out that the @Shape field may cause the problem. So I add a drop_shape argument to the functions.

# -*- encoding: utf-8 -*-

# Functionality: Convert feature class to pandas dataframe
# Original Author: [d-wasserman](https://gist.github.com/d-wasserman)
# Modified by [Choumingzhao](https://gist.github.com/Choumingzhao)
# Source: https://gist.github.com/d-wasserman/e9c98be1d0caebc2935afecf0ba239a0

import arcpy
import pandas as pd

def fc_to_df(in_fc, input_fields=None, drop_shape=True, query=""):
    """Function will convert an arcgis feature class table into a pandas dataframe with an object ID index, and the selected
    input fields using an arcpy.da.SearchCursor.
    :param - in_fc - input feature class or table to convert
    :param - input_fields - fields to input to a da search cursor for retrieval
    :param - drop_shape - drop the shape field from the dataframe
    :param - query - sql query to grab appropriate values
    :returns - pandas.DataFrame"""
    OIDFieldName = arcpy.Describe(in_fc).OIDFieldName
    if input_fields:
        final_fields = [OIDFieldName] + input_fields
    else:
        final_fields = [field.name for field in arcpy.ListFields(in_fc)]
    if drop_shape and u"Shape" in final_fields:
        final_fields.remove(u"Shape")
    data = [row for row in arcpy.da.SearchCursor(in_fc,final_fields,where_clause=query)]
    fc_dataframe = pd.DataFrame(data, columns=final_fields)
    fc_dataframe = fc_dataframe.set_index(OIDFieldName,drop=True)
    return fc_dataframe

def fc_to_df2(in_fc, input_fields=None, drop_shape=True, query="", skip_nulls=False, null_values=None):
    """Function will convert an arcgis feature class table into a pandas dataframe with an object ID index, and the selected
    input fields. Uses TableToNumPyArray to get initial data.
    :param - in_fc - input feature class or table to convert
    :param - input_fields - fields to input into a da numpy converter function
    :param - drop_shape - drop the shape field from the dataframe
    :param - query - sql like query to filter out records returned
    :param - skip_nulls - skip rows with null values
    :param - null_values - values to replace null values with.
    :returns - pandas dataframe"""
    OIDFieldName = arcpy.Describe(in_fc).OIDFieldName
    if input_fields:
        final_fields = [OIDFieldName] + input_fields
    else:
        final_fields = [field.name for field in arcpy.ListFields(in_fc)]
    if drop_shape and u"Shape" in final_fields:
        final_fields.remove(u"Shape")
    np_array = arcpy.da.TableToNumPyArray(in_fc, final_fields, query, skip_nulls, null_values)
    object_id_index = np_array[OIDFieldName]
    fc_dataframe = pd.DataFrame(np_array, index=object_id_index, columns=input_fields)
    return fc_dataframe

@d-wasserman
Copy link
Author

I got following Exception: Data must be 1-dimensional. And it turns out that the @Shape field may cause the problem. So I add a drop_shape argument to the functions.

# -*- encoding: utf-8 -*-

# Functionality: Convert feature class to pandas dataframe
# Original Author: [d-wasserman](https://gist.github.com/d-wasserman)
# Modified by [Choumingzhao](https://gist.github.com/Choumingzhao)
# Source: https://gist.github.com/d-wasserman/e9c98be1d0caebc2935afecf0ba239a0

import arcpy
import pandas as pd

def fc_to_df(in_fc, input_fields=None, drop_shape=True, query=""):
    """Function will convert an arcgis feature class table into a pandas dataframe with an object ID index, and the selected
    input fields using an arcpy.da.SearchCursor.
    :param - in_fc - input feature class or table to convert
    :param - input_fields - fields to input to a da search cursor for retrieval
    :param - drop_shape - drop the shape field from the dataframe
    :param - query - sql query to grab appropriate values
    :returns - pandas.DataFrame"""
    OIDFieldName = arcpy.Describe(in_fc).OIDFieldName
    if input_fields:
        final_fields = [OIDFieldName] + input_fields
    else:
        final_fields = [field.name for field in arcpy.ListFields(in_fc)]
    if drop_shape and u"Shape" in final_fields:
        final_fields.remove(u"Shape")
    data = [row for row in arcpy.da.SearchCursor(in_fc,final_fields,where_clause=query)]
    fc_dataframe = pd.DataFrame(data, columns=final_fields)
    fc_dataframe = fc_dataframe.set_index(OIDFieldName,drop=True)
    return fc_dataframe

def fc_to_df2(in_fc, input_fields=None, drop_shape=True, query="", skip_nulls=False, null_values=None):
    """Function will convert an arcgis feature class table into a pandas dataframe with an object ID index, and the selected
    input fields. Uses TableToNumPyArray to get initial data.
    :param - in_fc - input feature class or table to convert
    :param - input_fields - fields to input into a da numpy converter function
    :param - drop_shape - drop the shape field from the dataframe
    :param - query - sql like query to filter out records returned
    :param - skip_nulls - skip rows with null values
    :param - null_values - values to replace null values with.
    :returns - pandas dataframe"""
    OIDFieldName = arcpy.Describe(in_fc).OIDFieldName
    if input_fields:
        final_fields = [OIDFieldName] + input_fields
    else:
        final_fields = [field.name for field in arcpy.ListFields(in_fc)]
    if drop_shape and u"Shape" in final_fields:
        final_fields.remove(u"Shape")
    np_array = arcpy.da.TableToNumPyArray(in_fc, final_fields, query, skip_nulls, null_values)
    object_id_index = np_array[OIDFieldName]
    fc_dataframe = pd.DataFrame(np_array, index=object_id_index, columns=input_fields)
    return fc_dataframe

I might suggest using a describe object to make sure you actually get the geometry field. I can take a look at the drop shape component.

@salvuccigianluigi
Copy link

thanks for your usefull function, but sometimes I've problems.
I resolve with this:

def table_to_data_frame(in_table, input_fields=None, where_clause=None):
OIDFieldName = arcpy.Describe(in_table).OIDFieldName
if input_fields:
final_fields = [OIDFieldName] + input_fields
else:
final_fields = [field.name for field in arcpy.ListFields(in_table)]

df = pd.DataFrame(columns=final_fields)
data = [row for row in arcpy.da.SearchCursor(feature, final_fields)]
df = pd.DataFrame(data, columns =final_fields)
return df

it seems more fast, thanks again!

@d-wasserman
Copy link
Author

I think I need to see the full function definition to understand what you are suggesting @salvuccigianluigi. In my experience, arcgis_table_to_df is faster.

@DS-Koala
Copy link

Does anyone have an easy way to go backwards? After converting the table to a DataFrame, I want to do some modifications to the DataFrame and then update the original feature's attribute table to reflect it.

Thanks!!

@d-wasserman
Copy link
Author

There are a few approaches to this.

  1. Use the ArcGIS Python API and work with a spatially enabled dataframe and export back instead of these functions.
  2. Use ExtendTable. It is very finicky and works only with numeric columns reliably. It can work with text with some set up I have been told, but in practice I stay away.
  3. Use insert/update cursors. This takes some set up, but you can iterrows and approach it this way.
  4. Export to CSV, arcpy.TableToTable, join fields. This is pretty slow, but can be faster if you do field indexes. I don't prefer this approach.

@GIS-Luke
Copy link

GIS-Luke commented Mar 1, 2023

Does anyone have an easy way to go backwards? After converting the table to a DataFrame, I want to do some modifications to the DataFrame and then update the original feature's attribute table to reflect it.

Thanks!!

You could retain the geometry by reading the entire feature class with geopandas, then write it to a shapefile or a geopackage and read it back into ArcGIS. This can also be used to perform functions in geopandas that you may not have a license for in ArcGIS.

@Orrgvili
Copy link

Orrgvili commented Mar 1, 2023 via email

@sivogel
Copy link

sivogel commented Nov 7, 2023

There are a few approaches to this.

1. Use the ArcGIS Python API and work with a spatially enabled dataframe and export back instead of these functions.

2. Use [ExtendTable](https://github.com/d-wasserman/arc-sampling-and-scoring/blob/f5f442675447eda2c7da7822789bde054d65952b/Scripts/PercentileScoreFields.py#L89). It is very finicky and works only with numeric columns reliably. It can work with text with some set up I have been told, but in practice I stay away.

3. Use insert/update cursors. This takes some set up, but you can iterrows and approach it this way.

4. Export to CSV, arcpy.TableToTable, join fields. This is pretty slow, but can be faster if you do field indexes. I don't prefer this approach.

Hi GIS Luke,

Is there a limitation to your first option? I usually only use arcpy and not arcgis as I always thought the arcgis API was only for web applications.

ESRI writes:

ArcPy and the ArcGIS API for Python are complimentary libraries; ArcPy allows you to use, automate and extend desktop GIS, and the ArcGIS API for Python supports the same for web GIS.

If I can work with arcpy and sedf from arcgis on local data without any Enterprise/ArcGISOnline dependencies, it is not complementary to me. On the contrary, it is a good addition to arcpy!

@d-wasserman
Copy link
Author

They don't emphasize the fact it can create SEDF locally and then back, but just that functionality alone makes it a good complement these days.

@Orrgvili
Copy link

look here
to read from local FC
sdf = pd.DataFrame.spatial.from_featureclass("path\to\your\data\census_example\census.gdb\cities")
sdf.head()

write it back to local fc
sdf.spatial.to_featureclass(location=r"c:\output_examples\census.gdb\cities");

the problen with this it someting chagnge the type on numerc fields (int to float)

@samsung-chungso
Copy link

Has anyone encountered a ValueError: Object: Error in accessing describe while trying to access 'arcgis.Describe()'? I didn't encounter any error messages previously, but suddenly this error occurs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment