uWSGI Zerg Mode + Mules

Background

Two commonly used Galaxy server configuration features are uWSGI Zerg Mode and uWSGI Mules as Galaxy job handlers. These features are not easily compatible: Galaxy job handlers rely heavily on having unique server names, and handlers' server names must persist across restarts. Because zerg mode (however briefly) runs two Galaxy servers simultaneously, using mules with zerg mode would necessarily mean running mules with overlapping server names.

Solution

In a typical Galaxy zerg mode setup, the newly started zergling (B) terminates the old zergling (A) once B is ready to serve requests. Zergling B then continues to serve requests until another zergling (C) is started and terminates B.

It is possible to get both zerg mode and mules working together by configuring zergling B to start without mules and performing a double zerg dance on each restart:

  1. Zergling A is running with job handler mules.
  2. The admin starts zergling B, which is configured without job handler mules.
  3. Zergling B finishes loading and terminates zergling A. Job handling is paused.
  4. The Emperor automatically restarts zergling A when it shuts down.
  5. Zergling A finishes loading and terminates zergling B. Job handling resumes.

This setup requires the use of an additional uWSGI feature, Emperor.
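
The dance itself is coordinated through uWSGI's master FIFO: each zergling defines three FIFOs (new, running, and old), writing a digit to a zergling's active FIFO switches it to that FIFO slot, and writing q requests a graceful shutdown. The takeover that the writefifo hook in the configs below performs automatically could be issued by hand like so (a sketch, using the FIFO paths from this gist):

echo 2q > /srv/galaxy/var/zerg/zergling-running.fifo    # running zergling: move to the "old" FIFO, then gracefully shut down

The Emperor's role is simpler: pointed at a directory of vassal configs, it starts each one and respawns any vassal that exits, which is what brings zergling A back in step 4.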

HOWTO

Assuming the admin training layout under /srv/ (the resulting file layout is summarized after this list):

  1. Create /srv/galaxy/vassals
  2. Place vassal-zergpool.yml and vassal-zergling.yml in /srv/galaxy/vassals
  3. Integrate the galaxy.yml below into /srv/galaxy/config/galaxy.yml
  4. Place galaxy-restarter.yml in /srv/galaxy/config and integrate your galaxy.yml settings into it. The important things are:
    1. galaxy-restarter.yml should not contain any mule or farm directives in the uwsgi section
    2. A custom job_config_file should be defined in the galaxy section
  5. Place job_conf-restarter.xml in /srv/galaxy/config
  6. Start the zergpool and zergling A with: /srv/galaxy/venv/bin/uwsgi --emperor /srv/galaxy/vassals --emperor-wrapper /srv/galaxy/venv/bin/uwsgi
  7. Start zergling B to initiate a restart with: /srv/galaxy/venv/bin/uwsgi --yaml /srv/galaxy/config/galaxy-restarter.yml
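
The resulting layout, showing only the files these steps add or modify (a sketch; the rest of the training layout is unchanged):

/srv/galaxy/config/galaxy.yml              # main Galaxy config, with mules (below)
/srv/galaxy/config/galaxy-restarter.yml    # mule-less restarter config (below)
/srv/galaxy/config/job_conf-restarter.xml  # job conf used by the restarter (below)
/srv/galaxy/vassals/vassal-zergling.yml    # zergling A, managed by the Emperor
/srv/galaxy/vassals/vassal-zergpool.yml    # zerg pool, managed by the Emperor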

Mostly

OK, it mostly works. It's not possible to start Galaxy with a job config that allows job records to be created in the database with handler = null, which is what zergling A needs in order to pick up and recover, at startup, jobs that were submitted while zergling B was running. You can see that we set the handler ID to an empty string in job_conf-restarter.xml, but that's not the same as null.

You could run this in a loop while restarting to catch most of them, but it'd still be possible to miss some (and those jobs would never run until Galaxy was restarted again):

UPDATE job SET handler = null WHERE handler = '' AND state = 'new';
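
Wrapped in a shell loop, that might look like the following (a sketch: it assumes a PostgreSQL backend reachable via psql with a database and user both named galaxy, none of which is specified in this gist):

while true; do
    # reassign jobs stranded on the restarter's empty-string handler
    psql -U galaxy -d galaxy -c "UPDATE job SET handler = null WHERE handler = '' AND state = 'new';"
    sleep 1
done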

For a proper solution, it'd probably be 30 minutes of Galaxy dev work to make some special handler ID named _null_ that Galaxy would turn into a real null.

---
# galaxy-restarter.yml
uwsgi:

  # zerg dance FIFOs: slot 0 = new, slot 1 = running, slot 2 = old
  master-fifo: /srv/galaxy/var/zerg/zergling-new.fifo
  master-fifo: /srv/galaxy/var/zerg/zergling-running.fifo
  master-fifo: /srv/galaxy/var/zerg/zergling-old.fifo

  # attach to the zerg pool to receive its socket
  zerg: /srv/galaxy/var/zerg/pool.sock

  # once worker 1 is accepting requests: if another zergling is running, move
  # it to the "old" FIFO and gracefully shut it down (2q), then take over the
  # "running" FIFO (1) ourselves
  if-exists: /srv/galaxy/var/zerg/zergling-running.fifo
  hook-accepting1-once: writefifo:/srv/galaxy/var/zerg/zergling-running.fifo 2q
  endif:
  hook-accepting1-once: spinningfifo:/srv/galaxy/var/zerg/zergling-new.fifo 1

  chdir: /srv/galaxy/server
  socket: 127.0.0.1:0
  buffer-size: 16384
  processes: 2
  threads: 4
  offload-threads: 2
  static-map: /static/style=static/style/blue
  static-map: /static=static
  master: true
  virtualenv: .venv
  pythonpath: lib
  module: galaxy.webapps.galaxy.buildapp:uwsgi_app()
  thunder-lock: true
  die-on-term: true
  hook-master-start: unix_signal:2 gracefully_kill_them_all
  hook-master-start: unix_signal:15 gracefully_kill_them_all
  enable-threads: true
  # note: no mule or farm directives, the restarter does not handle jobs

galaxy:
  job_config_file: /srv/galaxy/config/job_conf-restarter.xml
  # other galaxy settings here
---
# galaxy.yml
uwsgi:
  socket: 127.0.0.1:0
  buffer-size: 16384
  processes: 2
  threads: 4
  offload-threads: 2
  static-map: /static/style=static/style/blue
  static-map: /static=static
  master: true
  virtualenv: .venv
  pythonpath: lib
  module: galaxy.webapps.galaxy.buildapp:uwsgi_app()
  thunder-lock: true
  die-on-term: true
  hook-master-start: unix_signal:2 gracefully_kill_them_all
  hook-master-start: unix_signal:15 gracefully_kill_them_all
  enable-threads: true
  # two job handler mules, grouped into the job-handlers farm
  mule: lib/galaxy/main.py
  mule: lib/galaxy/main.py
  farm: job-handlers:1,2

galaxy:
  job_config_file: /srv/galaxy/config/job_conf.xml
  # other galaxy settings here
<?xml version="1.0"?>
<!-- job_conf-restarter.xml -->
<job_conf>
    <plugins>
        <plugin id="local" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner" workers="4"/>
    </plugins>
    <handlers>
        <!-- this prevents the restarter from handling jobs itself in web workers -->
        <handler id=""/>
    </handlers>
    <destinations>
        <destination id="null" runner="local"/>
    </destinations>
</job_conf>
---
# vassal-zergling.yml
uwsgi:
  # same zerg dance FIFOs and takeover hooks as galaxy-restarter.yml
  master-fifo: /srv/galaxy/var/zerg/zergling-new.fifo
  master-fifo: /srv/galaxy/var/zerg/zergling-running.fifo
  master-fifo: /srv/galaxy/var/zerg/zergling-old.fifo
  zerg: /srv/galaxy/var/zerg/pool.sock
  if-exists: /srv/galaxy/var/zerg/zergling-running.fifo
  hook-accepting1-once: writefifo:/srv/galaxy/var/zerg/zergling-running.fifo 2q
  endif:
  hook-accepting1-once: spinningfifo:/srv/galaxy/var/zerg/zergling-new.fifo 1
  chdir: /srv/galaxy/server
  # include the full Galaxy config (mules and all) from galaxy.yml
  yaml: /srv/galaxy/config/galaxy.yml
---
# vassal-zergpool.yml
uwsgi:
  master: true
  # remove the http* options to listen for requests proxied by nginx using
  # the uWSGI protocol on localhost:4001
  http: :8080
  http-to: 127.0.0.1:4001
  # bind 127.0.0.1:4001 and pass the socket to zerglings attached at pool.sock
  zerg-pool: /srv/galaxy/var/zerg/pool.sock:127.0.0.1:4001
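
If you do proxy from nginx rather than using uWSGI's built-in HTTP router, the matching nginx config might look like this (a sketch, not part of this gist):

# inside the server {} block for your Galaxy virtual host
location / {
    uwsgi_pass 127.0.0.1:4001;
    include uwsgi_params;
}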