@shrayasr
Last active December 11, 2020 16:04
Bulk Inserts via SQLAlchemy and Flask-SQLAlchemy

Problem

I ran into an issue today where I had to perform a bulk insert into a PostgreSQL DB. I was already using SQLAlchemy and Flask-SQLAlchemy to manage the connections to the DB, and I didn't want to have to use things like psycopg2 directly.

Solution

Note: SQLAlchemy provides an ORM, but it isn't just an ORM. That is important to keep in mind, because it means you can choose to bypass the ORM layer when you don't want it. The idea of an ORM is to track changes to objects, and that is the case where you'd use it. In a bulk upload scenario, you don't need to track changes to objects. All you care about is that everything gets pushed into the DB.

SQLAlchemy (and Flask-SQLAlchemy) lets us do this by using the engine directly. This is how I did it:

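from xref import db            # the Flask-SQLAlchemy instance
from xref.models import user   # a SQLAlchemy model

# A list of dicts, one dict per row to insert
users = get_users_to_insert()
# Passing a list of dicts inserts all the rows with a single execute() call
db.engine.execute(user.__table__.insert(), users)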

Of course, this assumes that you have a list of dicts whose key names match the Columns defined in your SQLAlchemy model.
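To make the "list of dicts" requirement concrete, here is a minimal, self-contained sketch of the same idea against an in-memory SQLite database. It assumes SQLAlchemy 1.4 or newer; the `user_table` definition and the sample rows are made up for illustration and are not the model from the snippet above. It also goes through an explicit connection, because the `engine.execute()` shortcut was deprecated in SQLAlchemy 1.4 and removed in 2.0.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine, select

# Hypothetical table standing in for user.__table__ from the snippet above.
metadata = MetaData()
user_table = Table(
    "user",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String(50), nullable=False),
)

# In-memory SQLite keeps the sketch runnable without a PostgreSQL server.
engine = create_engine("sqlite:///:memory:")
metadata.create_all(engine)

# One dict per row; the keys match the column names of the table.
users = [
    {"id": 1, "name": "alice"},
    {"id": 2, "name": "bob"},
    {"id": 3, "name": "carol"},
]

# engine.begin() opens a connection and commits when the block exits.
# Passing a list of dicts makes SQLAlchemy run the insert in executemany style.
with engine.begin() as conn:
    conn.execute(user_table.insert(), users)

# Read the rows back to confirm that they were inserted.
with engine.connect() as conn:
    result = conn.execute(select(user_table.c.name).order_by(user_table.c.id))
    names = [row[0] for row in result]
```

With Flask-SQLAlchemy, the same pattern should work by using `db.engine` in place of `engine`, inside an application context.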

@shrayasr
Author

Hi Darek,

This is how your list looks:

[
  {0: {'date': Timestamp('2018-11-15 00:00:00'), 'all_day': 17883, 'am': 11114, 'pm': 11944, 'id': 11944}},
  {1: {'date': Timestamp('2018-11-16 00:00:00'), 'all_day': 16170, 'am': 6899, 'pm': 13914, 'id': 13914}},
  {2: {'date': Timestamp('2018-11-17 00:00:00'), 'all_day': 27978, 'am': 13001, 'pm': 9064, 'id': 9064}}
]

The keys there are 0, 1 and 2. I think you should transform this so that it looks like this:

[
  {'date': Timestamp('2018-11-15 00:00:00'), 'all_day': 17883, 'am': 11114, 'pm': 11944, 'id': 11944},
  {'date': Timestamp('2018-11-16 00:00:00'), 'all_day': 16170, 'am': 6899, 'pm': 13914, 'id': 13914},
  {'date': Timestamp('2018-11-17 00:00:00'), 'all_day': 27978, 'am': 13001, 'pm': 9064, 'id': 9064}
]

IIRC, this should work ⬆
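One possible way to do that transformation is sketched below. It uses `datetime` in place of the pandas `Timestamp` values so that it runs with the standard library alone, and the names `nested_rows` and `flat_rows` are just for illustration.

```python
from datetime import datetime

# The list as it looks now: every row is wrapped in a dict keyed by its index.
nested_rows = [
    {0: {"date": datetime(2018, 11, 15), "all_day": 17883, "am": 11114, "pm": 11944, "id": 11944}},
    {1: {"date": datetime(2018, 11, 16), "all_day": 16170, "am": 6899, "pm": 13914, "id": 13914}},
    {2: {"date": datetime(2018, 11, 17), "all_day": 27978, "am": 13001, "pm": 9064, "id": 9064}},
]

# Drop the index keys and keep only the inner dicts, one per row.
flat_rows = [row for wrapper in nested_rows for row in wrapper.values()]
```

`flat_rows` is then a plain list of dicts keyed by column name, which is the shape the bulk insert expects.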
