

@TorHou
Last active August 29, 2015 14:17
Small example server to communicate with a Galaxy Instance
<?xml version="1.0"?>
<!--
    If the value of 'URL_method' is 'get', the request will consist of the value of 'URL' coming back in
    the initial response. If value of 'URL_method' is 'post', any additional params coming back in the
    initial response ( in addition to 'URL' ) will be encoded and appended to URL and a post will be performed.
-->
<tool name="Cherry Py Synchronous" id="cherrypy" tool_type="data_source">
    <description>test</description>
    <command interpreter="python">data_source.py $output $__app__.config.output_size_limit</command>
    <inputs action="http://localhost:8090/getdata" check_values="false" method="post">
        <display>go to cherrypy server $GALAXY_URL</display>
        <param name="GALAXY_URL" type="baseurl" value="/tool_runner" />
        <param name="tool_id" type="hidden" value="cherrypy" />
        <param name="sendToGalaxy" type="hidden" value="1" />
        <param name="hgta_compressType" type="hidden" value="none" />
        <param name="hgta_outputType" type="hidden" value="tabular" />
    </inputs>
    <request_param_translation>
        <request_param galaxy_name="URL_method" remote_name="URL_method" missing="post" />
        <request_param galaxy_name="URL" remote_name="URL" missing="" />
        <request_param galaxy_name="data_type" remote_name="outputType" missing="auto" >
            <value_translation>
                <value galaxy_value="tabular" remote_value="table" />
            </value_translation>
        </request_param>
    </request_param_translation>
    <uihints minwidth="800"/>
    <outputs>
        <data name="output" format="tabular" label="${tool.name}"/>
    </outputs>
    <options sanitize="False" refresh="True"/>
</tool>
<?xml version="1.0"?>
<!--
    If the value of 'URL_method' is 'get', the request will consist of the value of 'URL' coming back in
    the initial response. If value of 'URL_method' is 'post', any additional params coming back in the
    initial response ( in addition to 'URL' ) will be encoded and appended to URL and a post will be performed.
-->
<tool name="Cherry Py Async" id="cherrypy_async" tool_type="data_source">
    <description>test</description>
    <command interpreter="python">data_source.py $output $__app__.config.output_size_limit</command>
    <inputs action="http://localhost:8090/getdata_async" check_values="false" method="get">
        <display>go to cherrypy server $GALAXY_URL</display>
        <param name="GALAXY_URL" type="baseurl" value="/async/cherrypy_async" />
        <param name="tool_id" type="hidden" value="cherrypy_async" />
        <param name="sendToGalaxy" type="hidden" value="1" />
    </inputs>
    <request_param_translation>
        <request_param galaxy_name="URL_method" remote_name="URL_method" missing="get" />
        <request_param galaxy_name="URL" remote_name="URL" missing="" />
        <request_param galaxy_name="data_type" remote_name="outputType" missing="auto" >
            <value_translation>
                <value galaxy_value="tabular" remote_value="table" />
            </value_translation>
        </request_param>
    </request_param_translation>
    <uihints minwidth="800"/>
    <outputs>
        <data name="output" format="tabular" label="${tool.name}"/>
    </outputs>
    <options sanitize="False" refresh="True"/>
</tool>

#Overview

This is a small example of how Galaxy interacts with an external data source.

As the external source I chose CherryPy, a Python module that provides a minimalistic web server. CherryPy takes away some of the work that would be needed when starting from SimpleHTTPServer.

One has to choose whether the data is retrieved synchronously, if it can be provided quickly, or asynchronously, if the server needs some time to collect it. This decision has to be made in the data_source configuration file and can't be made depending on the query (AFAIK).

##Synchronous Communication

###Step 1. Galaxy to CherryPy

In Galaxy, an external data source is implemented as a tool with a specific type: data_source. A lot of information on data sources can be found on the Galaxy help pages. The two main parameters that Galaxy sends to the external data source (EDS) are sendToGalaxy and GALAXY_URL. A sendToGalaxy value of 1 tells the EDS that the result is destined for a Galaxy instance rather than for a normal user. The value of GALAXY_URL tells the EDS where to send the result (or where to send a query). On the Galaxy side, these parameters are set in the tool's XML file.

As the example (cherrypy.xml) shows, the tool goes to the URL http://localhost:8090/getdata with the arguments ?sendToGalaxy=1&GALAXY_URL=...localhost:8080... and some others.
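To make the shape of that first request concrete, here is a minimal Python 3 sketch that assembles it from the `<param>` values in cherrypy.xml. The helper name `initial_request_url` is made up for this illustration, and the full GALAXY_URL (`http://localhost:8080/tool_runner`) is an assumption about a local Galaxy on port 8080. The pairs are shown as a query string, as in the text above; since cherrypy.xml sets method="post", the same pairs would normally travel in the request body instead.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def initial_request_url(action, params):
    """Append the tool's parameter values to the <inputs> action URL."""
    return action + "?" + urlencode(params)


# Values copied from cherrypy.xml; the GALAXY_URL host and port are assumed.
params = {
    "GALAXY_URL": "http://localhost:8080/tool_runner",
    "tool_id": "cherrypy",
    "sendToGalaxy": "1",
    "hgta_compressType": "none",
    "hgta_outputType": "tabular",
}

url = initial_request_url("http://localhost:8090/getdata", params)

# Decode the query again to see what the getdata handler would receive.
query = parse_qs(urlsplit(url).query)
```

The keys in `query` match the keyword arguments of the `getdata` handler on the CherryPy side, which is how CherryPy maps request parameters onto handler arguments.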

###Step 2. CherryPy to Galaxy

The idea is that the server now provides a form specifically for use within Galaxy. This can be an adapted version of a form already offered on the website; it just needs to send the result to Galaxy and add some special parameters. If the results have to be calculated, or if providing them takes some time, there is a way to make Galaxy idle and query a specific URL until the results are finished (I didn't test this yet).

  1. The CherryPy server's getdata function distinguishes between a query coming from a "normal" user and one coming from a Galaxy instance.
  2. For Galaxy, it provides a form whose action points to GALAXY_URL and which sets the following parameters, most of them hidden:
     • a non-hidden length parameter (Galaxy will put this parameter in the final query, see Step 3)
     • a hidden tool_id, which it got from Galaxy
     • a hidden output type. You can name the type however you want; the wrapper translates it to a type Galaxy understands
     • a hidden URL, which is where Galaxy will send the next/final query (Step 3). Again, you can name this parameter whatever you want; the author of Galaxy's XML wrapper takes care of the translation
     • optionally, the HTTP method with which Galaxy should request the result in the last step. If you don't specify it, the author of the wrapper can set one (see URL_method in cherrypy.xml)
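The form described in this list could be produced by a small helper like the one below. This is only a sketch under a few assumptions: the function name `galaxy_form` is invented here, it targets Python 3 (`html.escape`), and it escapes the values it interpolates, which the server code in this gist does not do. The field names and the default data URL are the ones the gist's server uses.

```python
from html import escape


def galaxy_form(galaxy_url, tool_id, data_url="http://localhost:8090/generate"):
    """Build the HTML form that hands the user's choice back to Galaxy."""
    fields = [
        # Visible field: Galaxy forwards it in the final query.
        '<input type="text" name="length" value="8" />',
        # Hidden fields: echoed tool_id, remote output type, and the data URL.
        '<input type="hidden" name="tool_id" value="%s" />' % escape(tool_id, quote=True),
        '<input type="hidden" name="outputType" value="table" />',
        '<input type="hidden" name="URL" value="%s" />' % escape(data_url, quote=True),
    ]
    return (
        '<form method="get" action="%s">\n' % escape(galaxy_url, quote=True)
        + "\n".join(fields)
        + '\n<button type="submit">Send result to galaxy!</button>\n</form>'
    )


form_html = galaxy_form("http://localhost:8080/tool_runner", "cherrypy")
```

Escaping matters because GALAXY_URL and tool_id arrive as request parameters, so a stray quote in either value would otherwise break the generated markup.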

###Step 3. Galaxy to CherryPy

Galaxy then sends the final query back to the CherryPy server. It queries the "generate" method with the length parameter and loads the resulting dataset into the user's history.
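What the wrapper's translation does to the submitted form fields before this final query can be pictured with a small sketch. It restates the `<request_param_translation>` block of cherrypy.xml as a Python dict; the name `translate` and the table layout are illustrative only and are not Galaxy's own code. It also covers only the three parameters listed in the XML, not how Galaxy treats the remaining fields.

```python
# galaxy_name -> (remote_name, default when missing, {remote_value: galaxy_value})
TRANSLATION = {
    "URL_method": ("URL_method", "post", {}),
    "URL": ("URL", "", {}),
    "data_type": ("outputType", "auto", {"table": "tabular"}),
}


def translate(remote_params):
    """Map parameter names and values sent by the remote form to Galaxy's names."""
    galaxy_params = {}
    for galaxy_name, (remote_name, missing, values) in TRANSLATION.items():
        value = remote_params.get(remote_name, missing)
        # Apply the value translation if one is defined, else keep the value.
        galaxy_params[galaxy_name] = values.get(value, value)
    return galaxy_params


# The fields the CherryPy form submits in the synchronous example.
submitted = {
    "length": "8",
    "tool_id": "cherrypy",
    "outputType": "table",
    "URL": "http://localhost:8090/generate",
}
translated = translate(submitted)
```

Here the form's `outputType=table` becomes `data_type=tabular`, and because the form sends no URL_method, the `missing` default from the XML (`post`) is used.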

##Asynchronous Communication

The asynchronous flow differs from the synchronous one in a few places.

In step 2, Galaxy does not care about the URL parameter; it sends another GET to the same URL as before.

As in step 3, the Galaxy server contacts the CherryPy server again, only this time GALAXY_URL is different and the parameter data_id is added. Galaxy expects the server to answer with a message that ends in "OK". The server should remember this new GALAXY_URL and start the process of retrieving the data.
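Because both requests hit the same handler, the server has to tell them apart by their parameters. The sketch below isolates that decision; the function name `request_phase` and the returned labels are invented for illustration, and it assumes, as the server in this gist does, that data_id can be parsed as an integer and that -1 means "not sent".

```python
def request_phase(send_to_galaxy=0, data_id=-1):
    """Classify a request to the asynchronous handler by its parameters."""
    if int(data_id) != -1:
        # Second request from Galaxy: remember GALAXY_URL, start the job, answer "OK".
        return "start_job"
    if int(send_to_galaxy) == 1:
        # First request from Galaxy: show the form that posts back to GALAXY_URL.
        return "show_form"
    # No Galaxy parameters at all: an ordinary visitor.
    return "plain_user"


first = request_phase(send_to_galaxy="1")
second = request_phase(send_to_galaxy="1", data_id="42")
visitor = request_phase()
```

The data_id check comes first because the second request may still carry sendToGalaxy, and it must not be answered with the form again.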

###Step 4. CherryPy is done with the data

CherryPy has finished preparing the data and has put it in a certain location. CherryPy now contacts Galaxy at the second GALAXY_URL, sends the URL parameter to indicate where the resource can be downloaded, and adds "STATUS=OK" to the parameters so that Galaxy knows that everything went fine.
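The callback can be sketched without any network access by just building the URL that would be requested. The helper name `callback_url` is hypothetical, and both example URLs below are placeholders chosen for this sketch (in particular the shape of the second GALAXY_URL and the file path are assumptions). The script in this gist sends the same two parameters with `requests.get(..., params=...)` instead.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def callback_url(galaxy_url, download_url):
    """Return the URL that tells Galaxy the data is ready and where to fetch it."""
    # Keep any query string Galaxy already put into GALAXY_URL.
    separator = "&" if "?" in galaxy_url else "?"
    return galaxy_url + separator + urlencode({"STATUS": "OK", "URL": download_url})


# Placeholder values for illustration only.
galaxy_url = "http://localhost:8080/async/cherrypy_async/42"
download_url = "http://localhost:8090/download?filepath=/tmp/workfile.tmp"

callback = callback_url(galaxy_url, download_url)
sent = parse_qs(urlsplit(callback).query)
```

Encoding the download URL as a parameter value is important here: it contains its own `?` and `=`, which would otherwise be confused with the callback's query string.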

###Step 5. Galaxy requests the file from the given URL

CherryPy has to serve the file; Galaxy downloads it and adds it to the user's history.

#!/usr/bin/env python
# Helper script started by getdata_async (invoked there as generate_data_async.py)
import requests
import random
import string
from argparse import ArgumentParser
from time import sleep

parser = ArgumentParser()
parser.add_argument("-g", "--galaxyurl", required=True)
parser.add_argument("-l", "--length", required=True)
args = parser.parse_args()
#print(args.galaxyurl)

# Simulation of some sophisticated method to generate the data
sleep(5)
# random.sample draws without replacement, so length must not exceed
# len(string.hexdigits), which is 22
data = "".join(random.sample(string.hexdigits, int(args.length)))

# Write the file and send the URL to that file back to Galaxy
with open("workfile.tmp", 'w+') as f:
    f.write(data)

# Note that the handler is "download" which is defined in the CherryPy file.
# The filepath below is specific to the author's machine and has to be adapted.
ans = requests.get(args.galaxyurl, params={"STATUS": "OK", "URL": "http://localhost:8090/download?filepath=/home/thouwaar/Projects/datasources/CherryPy/workfile.tmp"})
#print(ans.text)
import string
import random
import cherrypy
import subprocess
from cherrypy.lib.static import serve_file


class StringGenerator(object):

    # Default behaviour
    @cherrypy.expose
    def index(self):
        return "Hello World!"

    # This method generates the data "on the fly" such that Galaxy can retrieve it immediately
    # for synchronous communication
    @cherrypy.expose
    def generate(self, length=2, **params):
        cherrypy.response.headers['Content-Type'] = 'text/plain'
        return ''.join(random.sample(string.hexdigits, int(length)))

    # The method necessary for the exchange of data with Galaxy synchronously
    @cherrypy.expose
    def getdata(self, sendToGalaxy=0, GALAXY_URL="", hgta_compressType="none", tool_id="none", hgta_outputType="tabular"):
        if int(sendToGalaxy) == 1:
            print("Tool_id: " + tool_id)
            returnString = """<html>
<head></head>
<body>
<form method="get"
"""
            returnString += " action=\"" + GALAXY_URL + "\">"
            returnString += """
<input type="text" value="8" name="length" />
<input type="HIDDEN" value="""
            returnString += "\"" + tool_id + "\""
            # Note the URL parameter, this is where we tell Galaxy to get the data from
            returnString += """ name="tool_id">
<input type="HIDDEN" value="table" name="outputType">
<input type="HIDDEN" value="http://localhost:8090/generate" name="URL">
<button type="submit">Send result to galaxy!</button>
</form>
"""
            returnString += "Sending results to Galaxy at: " + GALAXY_URL
            return returnString
        else:
            return "Just returning results. (GALAXY_URL: " + GALAXY_URL + ")"

    # The method necessary for the exchange of data with Galaxy asynchronously
    @cherrypy.expose
    def getdata_async(self, length=8, sendToGalaxy=0, GALAXY_URL="", hgta_compressType="none", tool_id="none", hgta_outputType="tabular", data_id=-1, outputType="table"):
        # if a data_id is sent by Galaxy we know that the Galaxy URL is the final address
        # where the results should be sent to
        if int(data_id) != -1:
            # we need to fork here and generate the data
            # remembering the Galaxy_url
            subprocess.Popen(["python", "generate_data_async.py", "-g", str(GALAXY_URL), "-l", str(length)])
            # we answer OK to this GET and Galaxy will start continuously checking
            # if we sent the results yet
            return "OK"
        # this handles the initial request by Galaxy
        # like in the synchronous case here we specify hidden parameters
        # Note the lack of the "URL" parameter, this makes it necessary
        # that "getdata_async" handles both the initial request and the second
        # request which sends a data_id
        # ... Maybe this will be fixed/changed ...
        elif int(sendToGalaxy) == 1:
            returnString = """<html>
<head></head>
<body>
<form method="get"
"""
            returnString += "action=\"" + GALAXY_URL + "\">"
            returnString += """
<input type="HIDDEN" value="""
            returnString += "\"" + tool_id + "\""
            returnString += """ name="tool_id">
<input type="text" value="8" name="length" />
<input type="HIDDEN" value="table" name="outputType">
<button type="submit">Send result to galaxy!</button>
</form>
"""
            returnString += "Sending results to Galaxy at: " + GALAXY_URL
            return returnString
        # Not coming from Galaxy
        else:
            return "Just returning results. (GALAXY_URL: " + GALAXY_URL + ")"

    # Function necessary for the download of target data which
    # takes some time to retrieve
    # Note: this serves whatever path the caller names, which is only
    # acceptable for a local demo
    @cherrypy.expose
    def download(self, filepath):
        return serve_file(filepath, "application/x-download", "attachment")


if __name__ == '__main__':
    cherrypy.config.update({'server.socket_port': 8090})
    cherrypy.quickstart(StringGenerator())