Skip to content

Instantly share code, notes, and snippets.

@evan-burke
Last active July 5, 2022 16:08
Show Gist options
  • Save evan-burke/15ab3418c4239a81f463b2b7d1080034 to your computer and use it in GitHub Desktop.
Save evan-burke/15ab3418c4239a81f463b2b7d1080034 to your computer and use it in GitHub Desktop.
Fastest way to generate a SQL schema from raw data and/or insert data to the table
# Use Pandas & SQLAlchemy.
# https://stackoverflow.com/questions/23103962/how-to-write-dataframe-to-postgres-table
# Note this will create a new table; see the 'faster option' at the above link for a method using 'copy' to an existing table.
# However, 'copy' cannot do an upsert; that requires inserting to a temp table, then upserting form temp table to destination table.
# This will lack PKs and FKs and indexes, of course, so if used naively you may see data duplication.
# Also the df.to_sql command can do an append (but not upsert), using the if_exists param:
# https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html
# If using 'append' with an existing table with a PK, you'll get a key error on insert.
import pandas as pd
import sqlalchemy as sa
# DB URL follows format defined here
# https://docs.sqlalchemy.org/en/13/core/engines.html#database-urls
db_url = 'postgresql://user:pass@localhost:5432/mydatabase'
engine = sa.create_engine(db_url)
df = pd.DataFrame(mydata)
# or
df = pd.read_csv(mycsv)
# do quick and dirty date conversion if needed
object_cols = [col for col, col_type in df.dtypes.iteritems() if col_type == 'object']
df[object_cols] = df[object_cols].apply(pd.to_datetime, errors='ignore')
# write to the table
df.to_sql('new_tablename', engine, method='multi')
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment