@inklesspen
Created April 4, 2017 07:10
from alembic import op
import sqlalchemy as sa


def upgrade():
    # Lightweight table construct: just enough schema to generate SQL.
    t_files = sa.table('files', sa.column('id', sa.String(32)),
                       sa.column('content', sa.LargeBinary()))
    # We have to use get_bind() because op.execute() doesn't have a return value,
    # but by using get_bind() we can no longer generate SQL scripts for the
    # migration; we have to run it live.
    connection = op.get_bind()
    my_id = '14c3e928ed9d4ecaa57226bc5b628132'
    data = connection.execute(
        sa.select([t_files.c.content]).where(t_files.c.id == my_id)).scalar()
    new_data = transform_data(data)  # transform_data() is defined elsewhere
    connection.execute(
        t_files.update().where(t_files.c.id == my_id).values(content=new_data))


def downgrade():
    pass
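
Note: transform_data is not defined in the gist; it stands in for whatever rewrite of the blob's contents the migration needs. A purely hypothetical placeholder, just to make the example self-contained, might look like:

def transform_data(data):
    # Hypothetical example only -- the real transformation depends entirely
    # on what the migration needs to do to each file's contents.
    return data.decode('utf-8').upper().encode('utf-8')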

jenstroeger commented Apr 4, 2017

How does this scale with large amounts of data? At some point the transaction size must be exhausted, causing the migration to fail, unless I set the execution_options to autocommit?

For example, if I was to run through all files:

result = connection.execute('select id, content from files')               
for id, data in result.fetchall():
    new_data = transform_data(data)
    connection.execute(t_files.update().where(t_files.c.id == id).values(content=new_data))

then at some point this loop would exhaust the transaction buffer, no? Also, the offline SQL must be huge...

@inklesspen (Author)

You can't do offline SQL with this method, because you're using Python to transform the data, and you cannot generate a SQL script which runs Python (unless you are using some unusual functionality built into your DB, in which case you should forgo the connection.execute stuff and use that SQL directly).

As far as scaling goes, in Postgres the limit is based on the number of commands executed, not the raw size of the data; and the limit is in the billions. I don't know what DB you are using, but I would not expect this to be an issue.
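
One way to make that constraint explicit is to fail fast when someone tries to render this migration as offline SQL. A sketch, assuming Alembic's context.is_offline_mode() helper (not part of the original gist):

from alembic import context, op

def upgrade():
    # Data migrations that read rows back into Python cannot be rendered
    # as an offline --sql script, so refuse to run in that mode.
    if context.is_offline_mode():
        raise RuntimeError(
            "This migration needs a live database connection; "
            "it cannot be run with 'alembic upgrade --sql'.")
    connection = op.get_bind()
    # ... proceed with the select / transform / update shown in the gist ...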

jenstroeger commented Apr 5, 2017

I noticed that you’re using sa.table, but I assume that’s either sa.Table or something from import sqlalchemy.sql. What’s the advantage of one over the other? Since the auto-generated migration uses the capitalized Column etc., perhaps I ought to stick with that?

You also mentioned that MySQL doesn’t implement transactions for ALTER TABLE. If I use sa.execute(), is that wrapped into a transaction (which Alembic commits upon a successful upgrade() or downgrade()?), or is that execute issued immediately?

inklesspen commented Apr 6, 2017

No, it's not. It's sqlalchemy.table, in lowercase. (In older SQLAlchemy versions you had to use it as sqlalchemy.sql.table, but 1.x made it simpler.)

You should use the lowercase table and column for lightweight table constructs like this.

Alembic runs the migrations inside a transaction. However, MySQL issues an implicit commit whenever you use ALTER TABLE.
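
To illustrate the difference between the two constructs (a sketch using standard SQLAlchemy, not code from the gist):

import sqlalchemy as sa

# Lightweight construct: an ad-hoc description of only the columns the
# migration touches; no MetaData registry, constraints, or reflection.
t_files = sa.table('files',
                   sa.column('id', sa.String(32)),
                   sa.column('content', sa.LargeBinary()))

# Full Table object: tied to a MetaData collection and carrying the complete
# schema definition -- usually more than a data migration needs.
metadata = sa.MetaData()
files = sa.Table('files', metadata,
                 sa.Column('id', sa.String(32), primary_key=True),
                 sa.Column('content', sa.LargeBinary()))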

@jenstroeger

I need to update my Alchemy then, yes.

And after the implicit commit, will there be a new transaction, or implicit commits from then on?

@jenstroeger

I’ve finally updated the SO answer here.
