Skip to content

Instantly share code, notes, and snippets.

View jdavcs's full-sized avatar
🐧

John Davis jdavcs

🐧
  • Johns Hopkins University
  • Baltimore, MD
View GitHub Profile

Migrating a distributed system to SQLAlchemy 2.0 at "Galactic" scale: lessons learned

Galaxy (https://galaxyproject.org) is a globally-distributed open source software platform that connects analysis tools, datasets, compute resources, a graphical user interface, and a programmatic API. It enables accessible, reproducible, and collaborative data science and is used by thousands of scientists across the world. It has been in continuous development for 17 years and is among the most active open source projects in the world.

From an engineering standpoint, Galaxy is a large distributed system (400K+ lines of Python code), at the core of which is the data model with objects persisted in a relational database. Our data access layer relies heavily on SQLAlchemy (sqlalchemy.org) - a SQL automation tool kit and object-relational mapper widely-used in the Python ecosystem. SQLAlchemy is about the same age as Galaxy and has recently went through the largest upgrade process in its history. The complexity of upgradin

@jdavcs
jdavcs / demo.py
Last active November 27, 2023 18:28
nested transactions demo
import contextlib
import logging
from sqlalchemy import Column, Integer, String, create_engine, select
from sqlalchemy.orm import declarative_base, sessionmaker
#################### setup ####################
engine = create_engine(CONNECTION-STRING)
@jdavcs
jdavcs / running with autocommit
Last active September 14, 2023 13:44
savepoint transaction
import contextlib
from sqlalchemy import Column, Integer, String
from sqlalchemy import select
from sqlalchemy import create_engine
from sqlalchemy.orm import (
declarative_base,
sessionmaker,
)
#################### setup ####################
# Date: 6/15/23
# PR #16209
# api tests
2023-06-15T21:54:46.7968842Z Traceback (most recent call last):
2023-06-15T21:54:46.7969423Z File "/home/runner/work/galaxy/galaxy/galaxy root/lib/galaxy/web/framework/decorators.py", line 337, in decorator
2023-06-15T21:54:46.7969911Z rval = func(self, trans, *args, **kwargs)
2023-06-15T21:54:46.7970491Z File "/home/runner/work/galaxy/galaxy/galaxy root/lib/galaxy/webapps/galaxy/api/tools.py", line 563, in create
2023-06-15T21:54:46.7971005Z return self.service._create(trans, payload, **kwd)
2023-06-15T21:54:46.7971600Z File "/home/runner/work/galaxy/galaxy/galaxy root/lib/galaxy/webapps/galaxy/services/tools.py", line 175, in _create
# Date: 06/13/23
# PR #16209
# api tests
2023-06-13T19:14:00.1624839Z [SQL: UPDATE job SET update_time=%(update_time)s, state=%(state)s, job_runner_name=%(job_runner_name)s, handler=%(handler)s WHERE job.id = %(job_id)s]
2023-06-13T19:14:00.1626020Z [parameters: {'update_time': datetime.datetime(2023, 6, 13, 18, 17, 46, 763299), 'state': <JobState.QUEUED: 'queued'>, 'job_runner_name': 'celery', 'handler': 'fv-az1234-686', 'job_id': 4}]
2023-06-13T19:14:00.1626864Z (Background on this error at: https://sqlalche.me/e/14/e3q8) (Background on this error at: https://sqlalche.me/e/14/7s2a)
2023-06-13T19:14:00.1628035Z celery.worker.request DEBUG 2023-06-13 18:17:48,871 [pN:main,p:2329,tN:Thread-2] Task accepted: galaxy.celery.tasks.setup_fetch_data[a554b187-f219-412b-a9da-2d73aa5dda92] pid:2329
2023-06-13T19:14:00.1629163Z galaxy.celery WARNING 2023-06-13 18:17:48,872 [pN:main,p:2329,tN:Thread-2] Celery task execution failed for setup_fetch_data setup_fetch_data (0.599 ms)
2023-06-13T19:14:00.1633944Z celery.app.t
# Date: 06/13/23
# PR #16209
# api tests
2023-06-13T19:36:43.8337105Z galaxy.workflow.scheduling_manager DEBUG 2023-06-13 18:41:56,943 [pN:main,p:2288,tN:WorkflowRequestMonitor.monitor_thread] Attempting to schedule workflow invocation [42]
2023-06-13T19:36:43.8338459Z galaxy.workflow.run DEBUG 2023-06-13 18:41:56,981 [pN:main,p:2288,tN:WorkflowRequestMonitor.monitor_thread] Marking step 459 outputs of invocation 42 delayed (workflow paused at this step waiting for review)
2023-06-13T19:36:43.8339675Z galaxy.workflow.run DEBUG 2023-06-13 18:41:56,983 [pN:main,p:2288,tN:WorkflowRequestMonitor.monitor_thread] Marking step 460 outputs of invocation 42 delayed (dependent step [459] delayed, so this step must be delayed)
2023-06-13T19:36:43.8340702Z galaxy.workflow.scheduling_manager DEBUG 2023-06-13 18:41:56,988 [pN:main,p:2288,tN:WorkflowRequestMonitor.monitor_thread] Workflow invocation [42] scheduled
2023-06-13T19:36:43.8341815Z galaxy.workflow.scheduling_manager DEBUG 2023-06-13 18:41:57,196 [pN:main,p:2288,
1 ==================================================================================
2 TESTING: new db/sqlite, pre-conflict dev (current dev minus 3a29, caa7)
3 ==================================================================================
4 - manage_db.sh init
5
6 - manage_db.sh version
7 9540a051226e (gxy) (head)
8 d4a650f47a3c (tsi) (head)
9 - manage_db.sh dbversion
10 9540a051226e (head)
@jdavcs
jdavcs / gist:331bfc06c029a9d125b44001e150c240
Created September 22, 2021 17:09
UI performance data
history,pairs,ui click,ui screen done,ui total,request 1,request 2,request total
old,10,860,969,109,52,18,70
old,10,779,884,105,55,17,72
old,10,720,821,101,52,18,70
old,10,670,781,111,51,18,69
old,10,671,769,98,39,14,53
old,10,870,971,101,55,12,67
old,10,563,665,102,38,17,55
old,10,649,729,80,37,11,48
old,10,311,397,86,42,19,61
@jdavcs
jdavcs / subquery_vs_no_subquery.py
Created May 21, 2021 04:28
subquery load options
from sqlalchemy import (
create_engine,
inspect,
Column,
ForeignKey,
Integer,
MetaData,
String,
Table,
)
@jdavcs
jdavcs / gist:ad7f31bd780ea9c8abd87f3ab480bb0c
Last active April 20, 2021 05:20
pre and post flush difference between SA 1.3 and 1.4 at checkpoint 4
PRE-FLUSH:
(Pdb) pp items
<galaxy.model.MetadataFile(None) at 0x7fb433652b20>,
<galaxy.model.Job(1) at 0x7fb4336496a0>,
<galaxy.model.User(1) at 0x7fb4336432e0>,
<galaxy.model.JobToOutputDatasetAssociation(1) at 0x7fb4336a5cd0>,
<galaxy.model.HistoryDatasetAssociation(1) at 0x7fb4336a5880>,
<galaxy.model.Dataset(1) at 0x7fb4336a5d00>