@fomightez
Last active July 16, 2018 17:54
Expanded tweet reply about using InterMine and Binder-served Jupyter notebooks...

This is a follow-up on an exchange started here. My extended reply follows...

Main use

Typically I use code that was adapted from the templates into full-blown scripts to perform steps in the data analysis, with the notebook as a way to run the steps and to document and package everything together.

Usually I am doing this with notebooks that aren't ready to be shared, and so I just upload them to work on them as I develop an analysis.

If the scripts are something I have already shared, I can even just use the %load magic to pull them in directly from elsewhere on GitHub. If I need to run one with command-line parameters, I can write it to the Binder instance, or upload it, and then call it with %run and provide the arguments in the script call.
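As a rough sketch of that pattern: the script below is a hypothetical stand-in (the name summarize_genes.py, its arguments, and the placeholder URL are all made up for illustration) for something already shared on GitHub. The comments at the end show how it might be pulled in and run from a notebook cell.

```python
# summarize_genes.py -- hypothetical stand-in for a script already shared on GitHub
import argparse


def parse_args(argv=None):
    """Parse the command-line arguments that %run passes through to the script."""
    parser = argparse.ArgumentParser(description="Summarize a list of gene identifiers.")
    parser.add_argument("genes", nargs="*", help="gene identifiers to process")
    parser.add_argument("--upper", action="store_true", help="upper-case the identifiers")
    return parser.parse_args(argv)


def main(argv=None):
    """Print a one-line summary and return the processed identifiers."""
    args = parse_args(argv)
    genes = [g.upper() if args.upper else g for g in args.genes]
    print(f"{len(genes)} gene(s): {', '.join(genes)}")
    return genes


if __name__ == "__main__":
    main()

# In a notebook cell on the Binder instance (IPython magics, not plain Python):
#   %load https://raw.githubusercontent.com/<user>/<repo>/master/summarize_genes.py
#   %run summarize_genes.py --upper ura3 his3
```

With %run, the arguments after the script name arrive in sys.argv just as they would on the command line, so the same script works unchanged in both places.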

The one demo I can share right now is a little convoluted and not really an example of what I just described. Plus, it wasn't set up for teaching the 'process'. It was made to generate mock data for documenting a series of useful analysis steps that I had put together in another notebook using real data. That full analysis and the real data aren't ready to be shared.
Those caveats aside, this demo does clearly illustrate using the MyBinder.org system with InterMine. You can open the active notebook via Binder here. Alternatively, you can view the boring, static notebook here.

I have been meaning to move some of my YeastMine-utilizing scripts, as well as others, to feature active notebooks that complement the documentation they have on GitHub. Something along those lines might serve better as a demonstration of using InterMine and Binder-served Jupyter notebooks together.

Other possible uses

  • developing code that uses InterMine

Besides what we discussed, using InterMine via Binder is great for developing the code and functions that will be used. You can easily test out all sorts of things. It is great for interactively adjusting the code you can get directly from the templates at YeastMine.

  • sharing code or analyses that utilize InterMine, either with collaborators or the public

  • training - Using the Binder-system you can share the code to do something via InterMine in a way that empowers others to extend it and adapt it to their specific needs beyond what you can do via the browser interface.
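To illustrate the first item above, here is a minimal sketch of the kind of template-derived code one might adjust interactively in a notebook. It assumes the intermine Python client is installed in the Binder environment. The service URL, the chosen fields, and the helper names are assumptions for illustration, not something taken from my scripts.

```python
# Assumed YeastMine service endpoint; check the code generated by the site for the current one.
YEASTMINE_URL = "https://yeastmine.yeastgenome.org/yeastmine/service"

GENE_FIELDS = ("primaryIdentifier", "symbol", "name")


def build_gene_query(service, symbol):
    """Build a Gene query in the style of InterMine's generated Python code."""
    query = service.new_query("Gene")
    query.add_view(*GENE_FIELDS)
    query.add_constraint("symbol", "=", symbol)
    return query


def rows_to_records(rows, fields=GENE_FIELDS):
    """Turn result rows (anything indexable by field name) into plain dicts."""
    return [{field: row[field] for field in fields} for row in rows]


def fetch_gene(symbol, url=YEASTMINE_URL):
    """Run the query against the service; needs network access and the intermine package."""
    # Imported here so the helpers above can be tried without the package installed.
    from intermine.webservice import Service

    service = Service(url)
    return rows_to_records(build_gene_query(service, symbol).rows())
```

Splitting the query construction from the row handling makes it easy to tweak one piece in a cell and rerun it without touching the rest.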

Advantages

Provides:

  • portability

  • much greater reproducibility

Expanding on that...

Frees you from your local computer because you just need a browser. (Technically, you can even use a tablet or smartphone once things are set up.)

Also, you can be free of dealing with Python versions and installations on your local machine or machines, and know that the work is more reproducible, especially if you lock in certain versions of dependencies.
I don't need to be concerned that something I do is going to mess up something else I had working on my local machine. Or that when I update my operating system on my local machine, stuff is going to stop working.
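One way to lock in versions is to record what is currently installed and put those pins in the requirements.txt file that Binder reads from the repository when it builds the environment. The helper below is only a sketch, and the package names in it are examples.

```python
from importlib import metadata


def pin_lines(packages):
    """Return 'name==version' lines for the packages installed in this environment."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Not installed here; skip it rather than guess a version.
            continue
    return lines


if __name__ == "__main__":
    # Example package names only; list whatever the notebook actually imports.
    print("\n".join(pin_lines(["intermine", "pandas"])))
```

The printed lines can be pasted into requirements.txt so that a later Binder build installs the same versions.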

Plus Jupyter works with other languages, and so I know Binder does other languages too. (I am unsure how many of them overlap with the code InterMine can make.)

Beyond portability, you also get much better reproducibility. It avoids the problem of "Well, it works on my computer."

Possible Speedbumps

It is important to keep in mind that the running instance is ephemeral and there is no warranty and no net. Save back to local often if developing something.

Not for gigabyte-sized files or big, long-running computations.

Not going to meet everyone's security needs. The documentation warns, "You shouldn’t do anything on mybinder.org that you wouldn’t mind sharing with the world!"

Some would argue that Jupyter notebooks, and by extension running them on Binder, aren't the answer for real 'Big' data and don't meet all the needs of open data: see related discussions here and here.

Need to be online for access.

How I came to using this strategy

PythonAnywhere had been my favored place for developing and running scripts that used InterMine/YeastMine.
To be portable, I had been using PythonAnywhere for developing scripts and running them via the command line. However, my free account involves a proxy, and urllib seems to have a problem with proxies right now, so free accounts aren't connecting to InterMine at this time.

Now the reproducibility that the Binder environment provides is making me favor it for certain aspects.

Resources

Introducing Binder 2.0 — share your interactive research environment - November 30, 2017 | eLife

Binder 2.0 announcement by Berkeley Institute for Data Science

Build a binder workshops

Binder group featured on Google Cloud Platform podcast

Nature article:
Make your data dance with interactive visualization tools: TOOLBOX 30 JANUARY 2018 by Jeffrey M. Perkel, features Binder along with several other tools.

A good summary list of resources for getting started with Binder can be found in this Twitter thread.

Examples of Working Binder Implementations

Binder 2.0, a Tech Guide

Binder / mybinder.org Documentation

Gathering place for developers and users

Binder's vision, the technology behind it, and where it's going next from the Binder 2.0 talk @ scipy2018 conference. Slides available here.
