@yochannah
Last active April 21, 2017 11:04
Aequatus and the Galaxy workflow workshop at BiVi2017

Setup

  • We imported a basic setup from an .ova file, so an entire Galaxy instance runs in a VM on your own machine.
  • We used a two-colour post-it note system to indicate whether you're done (green) or need help (pink).

Intro to galaxy slides.

Galaxy is a workflow system with many configurable tools that preserves the history of your analyses. It includes a Galaxy "toolshed" (https://toolshed.g2.bx.psu.edu/) that lets admins install additional tools. I liked that you can keep an old version of a tool installed alongside a new version!

  • State in a Galaxy workflow is indicated with colour in the workflow history bar on the right.
  • History steps can be hidden if they seem uninteresting.
  • New histories are started manually, not automatically, and can be named to keep analyses clear and separate.
  • A workflow can be re-run on different input datasets.
  • Galaxy has dataset visualisation, including genome browsers, charts, 3D PDB visualisation, etc.
  • Developer info for writing new plugins is here: https://github.com/galaxyproject/training-material/tree/master/Dev-Corner
  • Galaxy can run jobs on clusters and execute large analyses in parallel - it is designed to be scalable.
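
The workshop did everything through the web UI, but the "new named history" and "re-run a workflow on different inputs" points above can also be scripted. Below is a minimal sketch using BioBlend, a third-party Python client for the Galaxy API (not something covered in the session). The server URL, API key, workflow name and data URL in the commented usage are placeholders, and the sketch assumes the workflow takes a single input dataset at step index 0.

```python
def run_workflow_on_url(gi, workflow_name, data_url, history_name):
    """Upload data_url into a fresh history and run workflow_name on it.

    gi is expected to behave like bioblend.galaxy.GalaxyInstance.
    """
    # Start a new, named history so this analysis stays separate.
    history = gi.histories.create_history(name=history_name)

    # Ask Galaxy to fetch the file from the URL into that history.
    upload = gi.tools.put_url(data_url, history["id"])
    dataset_id = upload["outputs"][0]["id"]

    # Look the workflow up by name; take the first match.
    matches = gi.workflows.get_workflows(name=workflow_name)
    if not matches:
        raise LookupError(f"no workflow named {workflow_name!r}")

    # Assumption: the workflow has one input dataset, at step index 0.
    inputs = {"0": {"src": "hda", "id": dataset_id}}
    return gi.workflows.invoke_workflow(
        matches[0]["id"], inputs=inputs, history_id=history["id"]
    )


# Usage against a real server (placeholder values, requires `pip install bioblend`):
#   from bioblend.galaxy import GalaxyInstance
#   gi = GalaxyInstance(url="http://localhost:8080", key="YOUR_API_KEY")
#   run_workflow_on_url(gi, "My workflow", "https://example.org/sample.fasta", "re-run 1")
```

Calling the function again with a different data URL and history name gives a separate history per input, which mirrors the manual "new history, same workflow" routine from the slides.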

Sharing

  • Jupyter, RStudio, others
  • Datasets are easily shared, which is good for creating supplementary materials for publication.

Into the VM

  • "Galaxy tours", a UI tour, showed us how to upload data and add a tool to a workflow.
  • Easy to upload example files from a URL, as typed data, or by direct upload.
  • Workflows can be renamed and edited, as can every step within a workflow.
  • Re-running a tool re-runs the workflow with the new parameters; the re-run has to be started manually.
  • The "delete" button doesn't permanently delete items - it just adds them to a trash bin. The bin will be purged eventually, depending on your Galaxy setup, but this gives you some grace time to recover things if needed.
  • Reproducibility information - metadata about the workflow - is available in the workflow panel on the right by clicking on the "i" icon for an expanded workflow.
  • Galaxy scratchbook lets you view analyses side by side in a windowed mode. Nice.
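
The delete-then-purge behaviour described above is a two-stage (soft) delete. The toy model below is only an illustration of that idea, not Galaxy's actual code; the class and method names are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Toy history item with a two-stage delete (illustrative only)."""
    name: str
    deleted: bool = False  # in the trash bin, still recoverable
    purged: bool = False   # permanently gone, not recoverable

    def delete(self):
        # The "delete" button: only moves the item to the trash.
        self.deleted = True

    def undelete(self):
        # Recovery works during the grace period, i.e. before a purge.
        if self.purged:
            raise ValueError(f"{self.name} has been purged and cannot be recovered")
        self.deleted = False

    def purge(self):
        # In this toy model only trashed items can be purged.
        if not self.deleted:
            raise ValueError(f"{self.name} must be deleted before it can be purged")
        self.purged = True
```

Anything that is deleted but not yet purged can still be brought back; when the purge actually happens is up to the admin of the Galaxy instance.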

GeneSeqToFamily presentation

The Ensembl genetree pipeline uses various tools (BLAST, T-Coffee, others) to generate information about gene families and protein families. It generally requires programming knowledge to use. The GeneSeqToFamily Galaxy tool was created to make identifying gene families easier.

In the VM

We used three sample files with species info, CDS, and JSON, then ran GeneSeqToFamily preparation on the files.

  • We had to change the datatype of the .nhx species file in the right-hand pane using the pencil icon. Just set it to .nhx
  • Now open the GeneSeqToFamily workflow. (Not the prep one this time.)
  • Go to the NCBI BLAST+ BLASTp step and change "max hits to show" to 4. (It won't look like it saved but apparently it did!)
  • Press "run workflow". This took a while on my VM with relatively limited resources.
  • When everything turned green, I went to the final step "Gene align and family aggregator on data", expanded the step, and clicked on the bar chart button to get the Aequatus visualisation.
  • You can also examine the other green data result steps on the right to get info about the data steps that led you here. Provenance!
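
For anyone wondering what the .nhx datatype from the first step refers to: NHX (New Hampshire eXtended) is the Newick tree format with optional `[&&NHX:key=value]` annotations attached to nodes. The sketch below pulls the leaf names and annotations out of such a string with regular expressions. The tree in the usage comment is a made-up toy example, not the workshop's species file, and a real analysis would be better served by a proper tree-parsing library.

```python
import re

# Matches one annotation block such as [&&NHX:S=Homo_sapiens:D=N]
NHX_TAG = re.compile(r"\[&&NHX:([^\]]*)\]")


def nhx_annotations(tree_text):
    """Return one dict of key/value pairs per [&&NHX:...] block, in order."""
    result = []
    for body in NHX_TAG.findall(tree_text):
        pairs = (field.split("=", 1) for field in body.split(":") if "=" in field)
        result.append({key: value for key, value in pairs})
    return result


def leaf_names(tree_text):
    """Return leaf labels, i.e. names that directly follow "(" or ","."""
    plain = NHX_TAG.sub("", tree_text)  # drop annotations, leaving plain Newick
    return re.findall(r"[(,]\s*([^():,;\s]+)", plain)


# Toy usage:
#   tree = "((human:0.1[&&NHX:S=Homo_sapiens],mouse:0.2[&&NHX:S=Mus_musculus]):0.05,zebrafish:0.3);"
#   leaf_names(tree)       -> ['human', 'mouse', 'zebrafish']
#   nhx_annotations(tree)  -> [{'S': 'Homo_sapiens'}, {'S': 'Mus_musculus'}]
```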

Exploring the graph

  • Use the magnifying glass on the left to see the 4 gene families in the results. Unfortunately they're numbered, not named.
  • Exons that are similar are coloured similarly across organisms.
  • It was a bit hard to see what's going on on my MacBook Pro screen - it might be better on a large monitor.