Last active
February 13, 2024 12:54
Azure Table Storage Pandas Dataframe
```python
import pandas as pd
from azure.cosmosdb.table.tableservice import TableService

CONNECTION_STRING = "DUMMYSTRING"
SOURCE_TABLE = "DUMMYTABLE"


def set_table_service():
    """ Set the Azure Table Storage service """
    return TableService(connection_string=CONNECTION_STRING)


def get_dataframe_from_table_storage_table(table_service, filter_query):
    """ Create a dataframe from table storage data """
    return pd.DataFrame(get_data_from_table_storage_table(table_service,
                                                          filter_query))


def get_data_from_table_storage_table(table_service, filter_query):
    """ Retrieve data from Table Storage """
    for record in table_service.query_entities(
        SOURCE_TABLE, filter=filter_query
    ):
        yield record


fq = "PartitionKey eq '12345'"
ts = set_table_service()
df = get_dataframe_from_table_storage_table(table_service=ts,
                                            filter_query=fq)
```
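Note that while `query_entities` yields records lazily, `pd.DataFrame(...)` still materializes every record at once. A minimal sketch of consuming such a generator in bounded batches instead (the `batched_frames` helper is hypothetical, and plain dicts stand in for Table Storage entities):

```python
from itertools import islice

import pandas as pd


def batched_frames(records, batch_size=50_000):
    """Yield DataFrames of at most batch_size rows from an iterable of dicts."""
    it = iter(records)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            break
        yield pd.DataFrame(chunk)


# Plain dicts stand in for Table Storage entities here:
rows = ({"PartitionKey": "12345", "RowKey": str(i)} for i in range(120_000))
sizes = [len(frame) for frame in batched_frames(rows)]
```

Each batch can be processed or written to disk before the next one is built, so only one batch is held in memory at a time.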
Hi,
I tried this code to extract nearly 1 TB of data, but it runs out of memory. Can you suggest how to optimize it?
Hi @Aatmaj1,
I have not tried this with big data sets. In one of my bigger projects, however, I used the above code, but instead of writing the whole table to a Pandas dataframe at once, I modified the `fq` filter to iterate through the table by month and year, and concatenated the resulting dataframes with `pandas.concat` to get a single dataframe in the end.
If you have a more specific issue, please let me know.
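For reference, the month-and-year chunking described above could be sketched like this. The `Timestamp` property name and the OData datetime filter syntax are assumptions about the table schema; `set_table_service` and `get_dataframe_from_table_storage_table` are the helpers from the gist:

```python
import pandas as pd


def month_filters(year):
    """Build one OData filter string per month so each query stays small.

    The Timestamp property name is an assumption about the table schema.
    """
    filters = []
    for month in range(1, 13):
        start = pd.Timestamp(year=year, month=month, day=1)
        end = start + pd.offsets.MonthBegin(1)  # first day of the next month
        filters.append(
            f"Timestamp ge datetime'{start.isoformat()}' "
            f"and Timestamp lt datetime'{end.isoformat()}'"
        )
    return filters


# Concatenate one dataframe per month (helpers from the gist above):
# ts = set_table_service()
# df = pd.concat(
#     (get_dataframe_from_table_storage_table(ts, fq)
#      for fq in month_filters(2023)),
#     ignore_index=True,
# )
```

Note that the final `pd.concat` still holds the full result in memory; for a 1 TB table it would be better to write each monthly dataframe to disk (e.g. as Parquet) instead of concatenating them.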
Thanks @cliffeby and @nigelainscoe. Fixed it within 1 year!