Skip to content

Instantly share code, notes, and snippets.

Embed
What would you like to do?
sqlalchemy upsert supporting delayed ORM insertion and duplicate removal (inside a single query)
def upsert(session, model, rows):
table = model.__table__
stmt = postgresql.insert(table)
primary_keys = [key.name for key in inspect(table).primary_key]
update_dict = {c.name: c for c in stmt.excluded if not c.primary_key}
if not update_dict:
raise ValueError("insert_or_update resulted in an empty update_dict")
stmt = stmt.on_conflict_do_update(index_elements=primary_keys,
set_=update_dict)
seen = set()
foreign_keys = {col.name: list(col.foreign_keys)[0].column for col in table.columns if col.foreign_keys}
unique_constraints = [c for c in table.constraints if isinstance(c, UniqueConstraint)]
def handle_foreignkeys_constraints(row):
for c_name, c_value in foreign_keys.items():
foreign_obj = row.pop(c_value.table.name, None)
row[c_name] = getattr(foreign_obj, c_value.name) if foreign_obj else None
for const in unique_constraints:
unique = tuple([const,] + [row[col.name] for col in const.columns])
if unique in seen:
return None
seen.add(unique)
return row
rows = list(filter(None, (handle_foreignkeys_constraints(row) for row in rows)))
session.execute(stmt, rows)
@nirizr

This comment has been minimized.

Copy link
Owner Author

@nirizr nirizr commented Jul 28, 2018

This upsert method lets you reference ORM objects in future upsert SQL queries, and will also remove any local duplicates (which is something postgresql does not handle, i.e. if you insert the same row twice in the same upsert query postgreql will still fail).

example usage may, given the following Model objects:

class ParentModel(Model):
    id = Column(Integer, primary_key=True)

class ChildModel(Model):
    id = Column(Integer, primary_key=True)
    child_number = Column(Integer)
    parent_id = Column(Integer, ForeignKey(ParentModel.id))

be something like the following:

upsert_rows = {}

for parent in [ParentModel(), ParentModel()]:
   for i in range(10):
        child_model = {'id': i, 'parent': parent, 'child_number': i}
        upsert_rows.append(child_model)

# Parent objects will only be committed now, and their `id` fields will be populated.
session.commit()

# upsert will now fetch `ParentModel.id`s and will populate the `parent_id` fields.
upsert(session, ChildModel, upsert_rows)

Without using this upsert, you'd be requried to commit parent at the beginning of every loop iteration, instead of once at the end of the entire for block and each ParentModel object will have to be inserted individually.

Additionally, items violating UniqueConstraints in a single query will be filtered out to avoid postgresql errors.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment