@jmchilton
Created November 16, 2012 05:50
Deploying Production Galaxy Environments with CloudBioLinux and CloudMan

Typically, CloudMan is used to spin up a dedicated Galaxy virtual cluster for a single user or small group on Amazon EC2. This document outlines methods for deploying customized Galaxy instances that are more suitable for production environments. These enhancements include load balancing, external authentication, SSL, and reporting, as well as taking advantage of external resources such as dedicated file and database servers and even compute resources outside of the cloud.

Configuring Galaxy Tools

Splitting Galaxy into Multiple Processes

CloudMan's default configuration launches just one Galaxy process. To scale up the number of concurrent users Galaxy can handle, it must be split into multiple processes, as documented here <http://wiki.galaxyproject.org/Admin/Config/Performance/Web%20Application%20Scaling>.

galaxy_conf_dir: /opt/galaxy/web/conf.d

configure_multiple_galaxy_processes: True
web_thread_count: 2
handler_thread_count: 2

When these options are enabled, CloudMan will rewrite the body of the upstream galaxy_app {...} block to load balance web traffic across the number of web threads you specify. When Galaxy runs as multiple processes, however, job administration requests need to be routed to Galaxy's job manager. This can be done by adding the location /admin/jobs subsection shown below to your nginx.conf file.

location / {
    ...

    location /admin/jobs {
        proxy_pass http://localhost:8079;
    }
}
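
To make the load balancing step more concrete, here is a minimal Python sketch of what an upstream galaxy_app block with one server entry per web process could look like. The function name render_upstream, the base port 8080, and the use of consecutive ports on 127.0.0.1 are assumptions made for illustration only; the text CloudMan actually generates may differ.

```python
def render_upstream(web_thread_count, base_port=8080, host="127.0.0.1"):
    """Return an nginx upstream block with one server line per web process.

    base_port and host are illustrative assumptions, not CloudMan defaults.
    """
    if web_thread_count < 1:
        raise ValueError("web_thread_count must be at least 1")
    servers = [
        "    server {}:{};".format(host, base_port + i)
        for i in range(web_thread_count)
    ]
    return "upstream galaxy_app {\n" + "\n".join(servers) + "\n}"


# Two web processes, matching web_thread_count: 2 in the options above.
print(render_upstream(2))
```

With several server lines in an upstream block, nginx distributes requests among them in round-robin fashion by default, which is what spreads web traffic across the Galaxy web processes.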

External Authentication (LDAP)

Galaxy requires a proxy web server (in our case, nginx) to enable external authentication. See TODO:link_nginx for an example of how to enable such authentication.

Galaxy must also be configured to use the information passed through by nginx. The method I have implemented for doing this involves telling CloudMan to use a Galaxy configuration directory (TODO:link_to_pull_request) and then setting the Galaxy options use_remote_user, remote_user_maildomain, remote_user_logout_href, and require_login appropriately.

galaxy_conf_dir: /opt/galaxy/web/conf.d
galaxy_universe_use_remote_user: True
galaxy_universe_remote_user_maildomain: <domain_name (e.g. example.org)>
galaxy_universe_remote_user_logout_href: https://logout@<galaxy_url>/
galaxy_universe_require_login: True

If you are using external authentication in this manner, it is also likely a good idea to enable SSL.
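
As a rough illustration of what remote_user_maildomain is for: when the proxy passes through a bare username, the mail domain is appended so that Galaxy ends up with an email-style user name. The sketch below shows that intended effect only; the function remote_user_to_email is hypothetical and is not Galaxy's own code.

```python
def remote_user_to_email(remote_user, maildomain=None):
    """Map a proxy-supplied user name to an email-style Galaxy user name.

    Hypothetical helper: illustrates the effect of remote_user_maildomain,
    it is not taken from the Galaxy source.
    """
    user = remote_user.strip()
    if not user:
        raise ValueError("the proxy did not supply a remote user")
    if "@" in user:
        # Already a full address, so no domain needs to be appended.
        return user
    if not maildomain:
        raise ValueError("bare user name received but no mail domain is set")
    return "{}@{}".format(user, maildomain)


print(remote_user_to_email("alice", "example.org"))
```

If your authentication backend already returns full email addresses, the mail domain would simply go unused in this sketch.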

Enable SSL

Your cloud security group will likely block port 443 by default, so this port must be opened.

If you are using Amazon EC2, when following the instructions on the CloudMan <http://wiki.galaxyproject.org/CloudMan> wiki site, be sure to add the HTTPS inbound rule in addition to the HTTP one mentioned.

Instructions for opening this port on private clouds will vary. For instance, the following command will open it on OpenStack.

nova secgroup-add-rule <security_group> tcp 443 443 0.0.0.0/0
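
Once the rule has been added, it can be useful to confirm that the port is actually reachable from outside the cloud. The following is a minimal sketch using only the Python standard library; the helper name port_is_open is an assumption for illustration, and a successful connection only shows that something is listening on the port, not that SSL is configured correctly.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, port_is_open("<galaxy_url>", 443) run from a machine outside the security group should return True once the rule is in place and nginx is listening.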

CloudMan will need to set up the desired SSL key and certificate before nginx starts up. The CloudMan augmentations allow this by including a conf_files section such as the following in the userdata:

conf_files:
  - path: /usr/nginx/conf/key
    content: <base64 encoding of key>
  - path: /usr/nginx/conf/cert
    content: <base64 encoding of cert>
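
The base64 values for the conf_files entries can be produced with any base64 tool. As one possibility, here is a small Python sketch that encodes file contents and prints a conf_files fragment. The helper names encode_file_content and render_conf_files are assumptions for illustration, and the byte strings in the example are placeholders rather than real key material.

```python
import base64


def encode_file_content(data):
    """Return the base64 encoding of raw file bytes as an ASCII string."""
    return base64.b64encode(data).decode("ascii")


def render_conf_files(entries):
    """Render (path, bytes) pairs as a conf_files userdata fragment."""
    lines = ["conf_files:"]
    for path, data in entries:
        lines.append("  - path: {}".format(path))
        lines.append("    content: {}".format(encode_file_content(data)))
    return "\n".join(lines)


# Placeholder bytes; in practice read the real files with open(path, "rb").
print(render_conf_files([
    ("/usr/nginx/conf/key", b"placeholder key bytes"),
    ("/usr/nginx/conf/cert", b"placeholder cert bytes"),
]))
```

Since the private key ends up in the userdata, the resulting file should be handled with the same care as the key itself.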

Reports Server

The Galaxy reports webapp is a small web application that runs in parallel with Galaxy and provides a wealth of data on every job that Galaxy has run, as well as disk usage accounting and more.

CloudMan can now enable the reports application by simply adding it to the list of services.

services:
  - name: Galaxy
  - name: GalaxyReports
  - name: Postgres

External Postgres Server

When deploying to Amazon, running a Postgres server right on the CloudMan/Galaxy head node makes a lot of sense. For private cloud deployments, however, many institutions may already have well-optimized, well-maintained production Postgres servers.

To disable the Postgres server, explicitly specify the list of services CloudMan should start and exclude Postgres. For instance:

services:
  - name: Galaxy
  - name: GalaxyReports

Galaxy must then be configured to use your external Postgres server. This can be done by passing the connection string in via the userdata variable galaxy_universe_database_connection.

galaxy_universe_database_connection: postgres://user:password@host:port/schema
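
Passwords containing characters such as @ or / can break a connection string of this form. The sketch below assembles the string with the user, password, and database name percent-encoded, assuming the value is parsed as an ordinary URL. The helper name build_database_connection and all values in the example are placeholders for illustration.

```python
from urllib.parse import quote


def build_database_connection(user, password, host, port, database):
    """Assemble a postgres:// connection string with percent-encoded parts."""
    return "postgres://{}:{}@{}:{}/{}".format(
        quote(user, safe=""),
        quote(password, safe=""),
        host,
        int(port),
        quote(database, safe=""),
    )


# Placeholder credentials and host name, not real values.
print("galaxy_universe_database_connection: "
      + build_database_connection("galaxy", "p@ss/word", "db.example.org", 5432, "galaxy"))
```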

Running Jobs on External Compute Resources
