@jitsejan
Last active February 13, 2024 12:54
Azure Table Storage Pandas Dataframe
import pandas as pd
from azure.cosmosdb.table.tableservice import TableService

CONNECTION_STRING = "DUMMYSTRING"
SOURCE_TABLE = "DUMMYTABLE"


def set_table_service():
    """ Set the Azure Table Storage service """
    return TableService(connection_string=CONNECTION_STRING)


def get_dataframe_from_table_storage_table(table_service, filter_query):
    """ Create a dataframe from table storage data """
    return pd.DataFrame(get_data_from_table_storage_table(table_service,
                                                          filter_query))


def get_data_from_table_storage_table(table_service, filter_query):
    """ Retrieve data from Table Storage """
    for record in table_service.query_entities(
        SOURCE_TABLE, filter=filter_query
    ):
        yield record


fq = "PartitionKey eq '12345'"
ts = set_table_service()
df = get_dataframe_from_table_storage_table(table_service=ts,
                                            filter_query=fq)
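
The azure-cosmosdb-table package used above has since been deprecated in favour of azure-data-tables. A minimal sketch of the same pattern with the newer SDK, assuming the same dummy connection string and table name, could look like this:

import pandas as pd
from azure.data.tables import TableServiceClient

CONNECTION_STRING = "DUMMYSTRING"
SOURCE_TABLE = "DUMMYTABLE"


def get_dataframe_with_data_tables(filter_query):
    """ Build a dataframe with the newer azure-data-tables SDK """
    # Build the client from the storage account connection string.
    service = TableServiceClient.from_connection_string(conn_str=CONNECTION_STRING)
    table_client = service.get_table_client(table_name=SOURCE_TABLE)
    # query_entities returns an iterator of dict-like entities,
    # which pandas can consume directly.
    return pd.DataFrame(table_client.query_entities(query_filter=filter_query))


df = get_dataframe_with_data_tables("PartitionKey eq '12345'")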
@cliffeby

Typo in: df = get_data_dataframe_from_table_storage_table(table_service=ts,
filter_query=fq)

Should be: df = get_dataframe_from_table_storage_table(table_service=ts,
filter_query=fq)

@nigelainscoe

nigelainscoe commented May 24, 2019

Plus 1 for cliffeby above.

@Aatmaj1

Aatmaj1 commented Dec 20, 2019

Hi,
I tried this code for extracting nearly 1 TB of data, but it runs out of memory. Can you suggest how to optimize this code?

@jitsejan
Author

Typo in: df = get_data_dataframe_from_table_storage_table(table_service=ts,
filter_query=fq)

Should be: df = get_dataframe_from_table_storage_table(table_service=ts,
filter_query=fq)

Thanks @cliffeby and @nigelainscoe. Fixed it within 1 year!

@jitsejan
Author

Hi,
I tried this code for extracting nearly 1 TB of data, but it runs out of memory. Can you suggest how to optimize this code?

Hi @Aatmaj1,

I have not tried this with big data sets. In one of my bigger projects, however, I used the above code, but instead of writing the whole table to a Pandas dataframe at once, I modified the fq filter to iterate through the table by month and year and concatenated the resulting dataframes with pandas.concat to get a single dataframe in the end.

If you have a more specific issue, please let me know.
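
For reference, a rough sketch of that month-by-month approach, reusing set_table_service() and get_dataframe_from_table_storage_table() from the gist. The Timestamp property and the date range are assumptions; filter on whatever date column your table actually has:

import pandas as pd

ts = set_table_service()
frames = []
# Iterate month by month instead of pulling the whole table in one query.
for start in pd.date_range("2019-01-01", "2019-12-01", freq="MS"):
    end = start + pd.DateOffset(months=1)
    fq = (f"Timestamp ge datetime'{start:%Y-%m-%dT%H:%M:%SZ}' "
          f"and Timestamp lt datetime'{end:%Y-%m-%dT%H:%M:%SZ}'")
    frames.append(get_dataframe_from_table_storage_table(table_service=ts,
                                                         filter_query=fq))
# Concatenate the monthly chunks into a single dataframe at the end.
df = pd.concat(frames, ignore_index=True)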
