Bulk Inserts via SQLAlchemy and Flask-SQLAlchemy

Problem

I ran into an issue today where I had to perform a bulk insert into a Postgres database. I was already using SQLAlchemy and Flask-SQLAlchemy to manage connections to the DB, and I didn't want to drop down to something like psycopg2 directly.

Solution

Note: SQLAlchemy provides an ORM, but it isn't just an ORM, and that is important to keep in mind. It means you can choose not to use the ORM layer when you don't need it. The point of an ORM is to track changes to objects, and that's the case where you'd want it. In a bulk-insert scenario you don't need to track changes to objects; all you care about is that everything gets pushed into the DB.

SQLAlchemy (and Flask-SQLAlchemy) lets us do this by using the engine directly. This is how I did it:

from xref import db
from xref.models import user

# A list of dicts, one per row to insert
users = get_users_to_insert()

# Bypass the ORM and hand the insert straight to the engine;
# passing a list of dicts makes the DBAPI use executemany()
db.engine.execute(user.__table__.insert(), users)

Of course, this assumes that you have a list of dicts whose keys match the Columns defined in your SQLAlchemy model.
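Here's a minimal self-contained sketch of the same pattern using SQLAlchemy Core against an in-memory SQLite DB (the table and column names are made up for illustration; in the gist above, user.__table__ plays the role of users_table):

from sqlalchemy import create_engine, Table, Column, Integer, String, MetaData

# Hypothetical table standing in for user.__table__; the column
# names must match the dict keys below
metadata = MetaData()
users_table = Table(
    "users", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String),
    Column("email", String),
)

engine = create_engine("sqlite:///:memory:")
metadata.create_all(engine)

# One dict per row, keys matching the column names
users = [
    {"name": "alice", "email": "alice@example.com"},
    {"name": "bob", "email": "bob@example.com"},
]

with engine.connect() as conn:
    # The list of dicts triggers an executemany() on the DBAPI side
    conn.execute(users_table.insert(), users)
    rows = conn.execute(users_table.select()).fetchall()

Note that on SQLAlchemy 1.4+/2.0, db.engine.execute() is deprecated/removed; the connection-based form shown here works on both old and new versions.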

Resources

@chefdarek

chefdarek commented Nov 14, 2019

How'd you do that?
I have a list of dicts:
[{0: {'date': Timestamp('2018-11-15 00:00:00'),
      'all_day': 17883,
      'am': 11114,
      'pm': 11944,
      'id': 11944},
  1: {'date': Timestamp('2018-11-16 00:00:00'),
      'all_day': 16170,
      'am': 6899,
      'pm': 13914,
      'id': 13914},
  2: {'date': Timestamp('2018-11-17 00:00:00'),
      'all_day': 27978,
      'am': 13001,
      'pm': 9064,
      'id': 9064}}]

A table in Postgres:

class SalesDaily(db.Model):
    __tablename__ = 'salesdaily'
    id = db.Column(db.Integer, primary_key=True)
    date = db.Column(db.DateTime, nullable=True)
    all_day = db.Column(db.Integer, nullable=True)
    am = db.Column(db.Integer, nullable=True)
    pm = db.Column(db.Integer, nullable=True)

and when I try to call the function:
csv_file_path = 'Miseenplace/data/test_sales_daily.csv'

# Read CSV with Pandas
with open(csv_file_path, 'r') as file:
    df = pd.read_csv(file)
records = [df.to_dict('date')]
db.engine.execute(SalesDaily.__table__.insert(), records)

I just get empty records with all NULLs?

@shrayasr

shrayasr commented Nov 15, 2019

Hi Darek,

This is how your list looks:

[
  {0: {'date': Timestamp('2018-11-15 00:00:00'), 'all_day': 17883, 'am': 11114, 'pm': 11944, 'id': 11944}},
  {1: {'date': Timestamp('2018-11-16 00:00:00'), 'all_day': 16170, 'am': 6899, 'pm': 13914, 'id': 13914}},
  {2: {'date': Timestamp('2018-11-17 00:00:00'), 'all_day': 27978, 'am': 13001, 'pm': 9064, 'id': 9064}}
]

The keys there are 0, 1 and 2. I think you should transform this so that it looks like this:

[
  {'date': Timestamp('2018-11-15 00:00:00'), 'all_day': 17883, 'am': 11114, 'pm': 11944, 'id': 11944},
  {'date': Timestamp('2018-11-16 00:00:00'), 'all_day': 16170, 'am': 6899, 'pm': 13914, 'id': 13914},
  {'date': Timestamp('2018-11-17 00:00:00'), 'all_day': 27978, 'am': 13001, 'pm': 9064, 'id': 9064}
]

IIRC, this should work
