TorHou/cherrypy.xml

## cherrypy.xml
<?xml version="1.0"?>
<!--
    If the value of 'URL_method' is 'get', the request will consist of the value of 'URL' coming back in
    the initial response.  If value of 'URL_method' is 'post', any additional params coming back in the
    initial response ( in addition to 'URL' ) will be encoded and appended to URL and a post will be performed.
-->
<tool name="Cherry Py Synchronous" id="cherrypy" tool_type="data_source">
    <description>test</description>
    <command interpreter="python">data_source.py $output $__app__.config.output_size_limit</command>
    <inputs action="http://localhost:8090/getdata" check_values="false" method="post">
        <display>go to cherrypy server $GALAXY_URL</display>
        <param name="GALAXY_URL" type="baseurl" value="/tool_runner" />
        <param name="tool_id" type="hidden" value="cherrypy" />
        <param name="sendToGalaxy" type="hidden" value="1" />
        <param name="hgta_compressType" type="hidden" value="none" />
        <param name="hgta_outputType" type="hidden" value="tabular" />
    </inputs>
    <request_param_translation>
        <request_param galaxy_name="URL_method" remote_name="URL_method" missing="post" />
        <request_param galaxy_name="URL" remote_name="URL" missing="" />
        <request_param galaxy_name="data_type" remote_name="outputType" missing="auto" >
            <value_translation>
                <value galaxy_value="tabular" remote_value="table" />
            </value_translation>
        </request_param>
    </request_param_translation>
    <uihints minwidth="800"/>
    <outputs>
        <data name="output" format="tabular" label="${tool.name}"/>
    </outputs>
    <options sanitize="False" refresh="True"/>
</tool>

## cherrypy_async.xml
<?xml version="1.0"?>
<!--
    If the value of 'URL_method' is 'get', the request will consist of the value of 'URL' coming back in
    the initial response.  If value of 'URL_method' is 'post', any additional params coming back in the
    initial response ( in addition to 'URL' ) will be encoded and appended to URL and a post will be performed.
-->
<tool name="Cherry Py Async" id="cherrypy_async" tool_type="data_source">
    <description>test</description>
    <command interpreter="python">data_source.py $output $__app__.config.output_size_limit</command>
    <inputs action="http://localhost:8090/getdata_async" check_values="false" method="get">
        <display>go to cherrypy server $GALAXY_URL</display>
        <param name="GALAXY_URL" type="baseurl" value="/async/cherrypy_async" />
        <param name="tool_id" type="hidden" value="cherrypy_async" />
        <param name="sendToGalaxy" type="hidden" value="1" />
    </inputs>
    <request_param_translation>
        <request_param galaxy_name="URL_method" remote_name="URL_method" missing="get" />
        <request_param galaxy_name="URL" remote_name="URL" missing="" />
        <request_param galaxy_name="data_type" remote_name="outputType" missing="auto" >
            <value_translation>
                <value galaxy_value="tabular" remote_value="table" />
            </value_translation>
        </request_param>
    </request_param_translation>
    <uihints minwidth="800"/>
    <outputs>
        <data name="output" format="tabular" label="${tool.name}"/>
    </outputs>
    <options sanitize="False" refresh="True"/>
</tool>

## description.md

      
#Overview
This is a small example of how Galaxy interacts with an external data source.
As the external source I chose CherryPy, a Python module that provides a minimalistic web server. CherryPy takes care of some of the work that would be needed when starting from SimpleHTTPServer.
You have to choose whether the data is retrieved synchronously, if it can be provided quickly, or asynchronously, if the server needs some time to collect it. This choice is made in the data_source configuration file and cannot depend on the query (AFAIK).
##Synchronous Communication

###Step 1. Galaxy to CherryPy
In Galaxy, the external data source is implemented as a tool with a specific type, namely data_source. A lot of information on data sources can be found on the Galaxy help pages.
The two main parameters sent from Galaxy to the external data source (EDS) are sendToGalaxy and GALAXY_URL. If sendToGalaxy has the value 1, this tells the EDS that the result is to be sent to a Galaxy instance and not to a normal user. The value of GALAXY_URL tells the EDS where to send the result (or where to send a query). On the Galaxy side, these parameters have to be set in the XML file of the tool.
As you can see in the example (cherrypy.xml), the tool goes to the URL http://localhost:8090/getdata with the arguments ?sendToGalaxy=1&GALAXY_URL=...localhost:8080... and some others.
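
As a rough illustration of the request described above, the following sketch assembles such a URL from the parameters declared in cherrypy.xml. The helper name `build_initial_request` and the Galaxy base address `http://localhost:8080` are assumptions made for this example. Galaxy builds the real request itself and may add further parameters.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_initial_request(action, galaxy_base, tool_id):
    """Assemble the URL the tool opens on the external data source.

    Hypothetical helper: it only mirrors the <param> elements of cherrypy.xml.
    """
    params = {
        "GALAXY_URL": galaxy_base + "/tool_runner",  # where the EDS sends the user back
        "tool_id": tool_id,
        "sendToGalaxy": "1",  # marks the request as coming from Galaxy
    }
    return action + "?" + urlencode(params)


url = build_initial_request(
    "http://localhost:8090/getdata", "http://localhost:8080", "cherrypy"
)
print(url)
# Decoding the query string shows what the getdata handler receives.
print(parse_qs(urlsplit(url).query))
```

Because the value of GALAXY_URL is itself a URL, it has to be percent-encoded, which `urlencode` takes care of.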
###Step 2. CherryPy to Galaxy
The idea is that the server now provides a form specifically for use within Galaxy. This can be an adapted version of a form already provided on the website; it just needs to send the result to Galaxy and add some special parameters. If the results have to be calculated, or if providing them takes some time, there is a way to make Galaxy wait and query a specific URL until the results are ready (I haven't tested this yet).

The getdata function of the CherryPy server distinguishes between a query coming from a "normal" user and one coming from a Galaxy instance.
For Galaxy it provides a form whose action points to the GALAXY_URL and which sets a few (mostly hidden) parameters:

- a non-hidden length parameter (Galaxy will put this parameter in the final query, see step 3)
- a hidden tool_id, which it got from Galaxy
- a hidden output type. You can name the types however you want; the wrapper translates them to a type Galaxy understands
- a hidden URL, which is where Galaxy will send the next/final query (step 3). Again, you can name this parameter whatever you want; the XML wrapper of Galaxy (that is, the author of the wrapper) takes care of the translation
- optionally, the HTTP method with which Galaxy should request the result in the last step. If you don't set it, the author of the wrapper can specify one (see URL_method in cherrypy.xml)
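
The form described above could be generated roughly as follows. This is only a sketch: the function name `render_galaxy_form` is made up for this example, and unlike server.py it escapes the values with `html.escape` before placing them in the markup.

```python
from html import escape


def render_galaxy_form(galaxy_url, tool_id, data_url, output_type="table"):
    """Return an HTML form that submits the parameters listed above to Galaxy."""
    hidden = {"tool_id": tool_id, "outputType": output_type, "URL": data_url}
    # The only visible field: the user chooses the length of the result.
    fields = ['<input type="text" name="length" value="8" />']
    for name, value in hidden.items():
        # escape() keeps quotes or angle brackets in a value from breaking the markup.
        fields.append(
            '<input type="hidden" name="%s" value="%s" />'
            % (escape(name), escape(value))
        )
    return (
        '<form method="get" action="%s">\n' % escape(galaxy_url)
        + "\n".join(fields)
        + '\n<button type="submit">Send result to Galaxy!</button>\n</form>'
    )


form = render_galaxy_form(
    "http://localhost:8080/tool_runner", "cherrypy", "http://localhost:8090/generate"
)
print(form)
```

Escaping matters here because GALAXY_URL and tool_id arrive as request parameters and end up inside attribute values.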

###Step 3. Galaxy to CherryPy
Galaxy will then send the final query back to the CherryPy server. It queries the "generate" method with the length parameter and loads the resulting dataset into the user's history.
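
The final query can be pictured with the sketch below, which follows the comment at the top of cherrypy.xml. It is not Galaxy's actual code: `build_final_request` is a hypothetical helper, and sending the extra parameters as the POST body is only one reading of "encoded and appended to URL". The request object is built but never sent.

```python
from urllib.parse import urlencode, parse_qs
from urllib.request import Request


def build_final_request(form_params, default_method="post"):
    """Build (but do not send) the request for the final query.

    default_method plays the role of missing="post" in cherrypy.xml.
    """
    params = dict(form_params)
    url = params.pop("URL")
    method = params.pop("URL_method", default_method).lower()
    if method == "get":
        # 'get': the request consists of the URL value only.
        return Request(url, method="GET")
    # 'post': the remaining parameters are encoded and sent along.
    return Request(url, data=urlencode(params).encode("ascii"), method="POST")


req = build_final_request(
    {"URL": "http://localhost:8090/generate", "length": "8", "outputType": "table"}
)
print(req.get_method(), req.full_url, parse_qs(req.data.decode("ascii")))
```

CherryPy passes both query-string and body parameters to the handler as keyword arguments, so `generate` receives `length` either way.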
##Asynchronous Communication
There are some differences compared to the synchronous communication.
In step 2, Galaxy ignores the URL parameter; it sends another GET to the same URL as before.
As in step 3, the Galaxy server contacts the CherryPy server again, only this time the GALAXY_URL is different and the parameter data_id is added. It expects the server to answer with a message that ends in "OK". The server should remember this new GALAXY_URL and start retrieving the data.
###Step 4. CherryPy is done with the data
CherryPy has finished preparing the data and has put it in a certain location. Now CherryPy contacts Galaxy at the second GALAXY_URL and sends the URL parameter to indicate where the resource can be downloaded. It also adds "STATUS=OK" to the parameters so that Galaxy knows that everything went fine.
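
The callback in this step carries a URL inside a URL, so the encoding deserves some care. The sketch below builds the callback address without sending it. The function name `build_callback_url`, the example GALAXY_URL and the file path `/tmp/workfile.tmp` are placeholders chosen for this example.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_callback_url(galaxy_url, download_base, filepath):
    """Return the address CherryPy requests to tell Galaxy the data is ready."""
    # Inner URL: where Galaxy can download the finished file.
    download_url = download_base + "?" + urlencode({"filepath": filepath})
    # Outer URL: the GALAXY_URL from the second request plus STATUS and URL.
    separator = "&" if "?" in galaxy_url else "?"
    return galaxy_url + separator + urlencode({"STATUS": "OK", "URL": download_url})


callback = build_callback_url(
    "http://localhost:8080/async/cherrypy_async/1/abc",  # placeholder GALAXY_URL
    "http://localhost:8090/download",
    "/tmp/workfile.tmp",
)
print(callback)
```

Passing a `params` dictionary to `requests.get`, as generate_data_async does, encodes the outer level in the same way.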
###Step 5. Galaxy requests the file from the given URL
CherryPy has to serve the file, and Galaxy will download it and add it to the user's history.
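
The download handler in server.py serves whatever path it is given. If that is a concern, the path could be checked before it is served, for example as in this sketch. The helper `resolve_download` and the directory `/srv/data` are assumptions for this example and not part of the original server.

```python
import os


def resolve_download(base_dir, requested):
    """Return the absolute path of `requested` if it lies inside `base_dir`."""
    base = os.path.realpath(base_dir)
    target = os.path.realpath(os.path.join(base, requested))
    # commonpath() equals base only when target is base itself or lies below it.
    if os.path.commonpath([base, target]) != base:
        raise ValueError("path escapes the download directory")
    return target


print(resolve_download("/srv/data", "workfile.tmp"))
try:
    resolve_download("/srv/data", "../../etc/passwd")
except ValueError as error:
    print("rejected:", error)
```

The returned path could then be handed to `serve_file` in place of the raw filepath parameter.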

  
## generate_data_async
#!/usr/bin/env python
import requests
import random
import string
import urllib
from argparse import ArgumentParser
from time import sleep

parser = ArgumentParser()
parser.add_argument("-g", "--galaxyurl", required=True)
parser.add_argument("-l", "--length", required=True)
args = parser.parse_args()
#print args.galaxyurl

# Simulation of some sophisticated method to generate the data
sleep(5)
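# Note: random.sample draws without repetition, so length must not exceed
# len(string.hexdigits), which is 22
data="".join(random.sample(string.hexdigits, int(args.length)))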

# Write the file and send the URL to that file back to Galaxy
with open("workfile.tmp",'w+') as f:
    f.write(data)
# Note that the handler is "download" which is defined in the CherryPy file
ans = requests.get(args.galaxyurl, params={"STATUS":"OK", "URL":"http://localhost:8090/download?filepath=/home/thouwaar/Projects/datasources/CherryPy/workfile.tmp"})

#print ans.text

## server.py
import string
import random
import cherrypy
import urllib
import os
import requests
import subprocess
from cherrypy.lib.static import serve_file


class StringGenerator(object):

    # Default behaviour
    @cherrypy.expose
    def index(self):
        return "Hello World!"

    # This method generates the data "on the fly" such that Galaxy can retrieve it immediately
    # for synchronous communication
    @cherrypy.expose
    def generate(self, length=2, **params):
        cherrypy.response.headers['Content-Type']= 'text/plain'
        return ''.join(random.sample(string.hexdigits, int(length)))

    # The method necessary for the exchange of data with Galaxy synchronously
    @cherrypy.expose
    def getdata(self, sendToGalaxy=0, GALAXY_URL="", hgta_compressType="none", tool_id="none", hgta_outputType="tabular"):
        if int(sendToGalaxy) == 1:
            print("Tool_id: " + tool_id)
            returnString= """<html>
                      <head></head>
                      <body>
                        <form method="get"
                        """
            returnString += " action=\""+GALAXY_URL+"\">"
            returnString += """
                            <input type="text" value="8" name="length" />
                            <input type="HIDDEN" value="""
            returnString += "\""+tool_id+"\""
            # Note the URL parameter, this is where we tell Galaxy to get the data from
            returnString += """ name="tool_id">
                            <input type="HIDDEN" value="table" name="outputType">
                            <input type="HIDDEN" value="http://localhost:8090/generate" name="URL">
                            <button type="submit">Send result to galaxy!</button>
                        </form>
                        """
            returnString += "Sending results to Galaxy at: " + GALAXY_URL
            return returnString
        else:
            return "Just returning results. (GALAXY_URL: " + GALAXY_URL + ")"

    # The method necessary for the exchange of data with Galaxy asynchronously
    @cherrypy.expose
    def getdata_async(self, length=8, sendToGalaxy=0, GALAXY_URL="", hgta_compressType="none", tool_id="none", hgta_outputType="tabular", data_id=-1, outputType="table"):
        # if a data_id is sent by Galaxy we know that the Galaxy URL is the final address
        # where the results should be sent to
        if int(data_id) != -1:
            # we need to fork here and generate the data
            # remembering the Galaxy_url
            subprocess.Popen(["python", "generate_data_async.py", "-g" ,str(GALAXY_URL) , "-l" , str(length)] )
            # we answer OK to this GET and Galaxy will start continuously checking
            # if we sent the results yet
            return "OK"

        # this handles the initial request by Galaxy
        # like in the synchronous case here we specify hidden parameters
        # Note the lack of the "URL" parameter, this makes it necessary
        # that "getdata_async" handles both the initial request and the second
        # request which sends a data_id
        # ... Maybe this will be fixed/changed ...
        elif int(sendToGalaxy) == 1:
            returnString= """<html>
                      <head></head>
                      <body>
                        <form method="get"
                        """
            returnString += "action=\""+GALAXY_URL+"\">"
            returnString += """
                            <input type="HIDDEN" value="""
            returnString += "\""+tool_id+"\""
            returnString += """ name="tool_id">
                            <input type="text" value="8" name="length" />
                            <input type="HIDDEN" value="table" name="outputType">
                            <button type="submit">Send result to galaxy!</button>
                        </form>
                        """
            returnString += "Sending results to Galaxy at: " + GALAXY_URL
            return returnString
        # Not coming from Galaxy
        else:
            return "Just returning results. (GALAXY_URL: " + GALAXY_URL + ")"

    # Function necessary for the download of target data which
    # takes some time to retrieve
    @cherrypy.expose
    def download(self, filepath):
        return serve_file(filepath, "application/x-download", "attachment")

if __name__ == '__main__':
    cherrypy.config.update({'server.socket_port': 8090 })
    cherrypy.quickstart(StringGenerator())