{
"cells": [
{
"cell_type": "markdown",
"id": "0947cf4b",
"metadata": {},
"source": [
"# What is `mybinder`?\n",
"\n",
"\n",
"[`mybinder`] is a service provided by the open source `jupyter` community that generates on-demand ☁️ compute instances from content hosted on common content providers. It replaced the former `tmpnb` service. A binder can support a variety of computational environments with [**Ju**lia][julia], [**Pyt**hon][python] and [**R**][r], or more generally any `Dockerfile`.\n",
"\n",
"> In this post we will talk about the usage of the binder service. Please visit the binder documentation to learn about [common usage patterns in binder](https://mybinder.readthedocs.io/en/latest/using/using.html#common-usage-patterns-in-binder) or more [specific configuration settings](https://mybinder.readthedocs.io/en/latest/using/using.html#common-usage-patterns-in-binder).\n",
"\n",
"[`mybinder`]: #"
]
},
{
"cell_type": "markdown",
"id": "fed01152",
"metadata": {},
"source": [
"## How is `mybinder` used?\n",
"\n",
"`mybinder` turns content from a provider (e.g. GitHub, GitLab, Zenodo) containing [common configuration files]() that define tailored computational environments into an interactive computing session. Visitors to __a binder__ will be able to interact with content using [different interactive computing interfaces][interfaces] (e.g. [`jupyterlab`], [`notebook`], [RStudio]).\n",
"\n",
"\n",
"The binder concept first appeared around 2015 (https://www.youtube.com/watch?v=Or6pMrskYoU)\n",
"and it [moved out of beta in 2018](https://blog.jupyter.org/binderhub-is-out-of-beta-fa2781a229d6). It is now supported by a federation of organizations that loan computational resources to the open web.\n",
"\n",
"[interfaces]: https://mybinder.readthedocs.io/en/latest/using/using.html#choose-from-multiple-user-interfaces\n",
"\n",
"[julia]: #\n",
"[python]: #\n",
"[r]: #\n",
"[`jupyterlab`]: #\n",
"[`notebook`]: #\n",
"[RStudio]: #\n",
"[`mybinder`]: #\n",
"\n",
"In the coming sections, we'll see in the data that there are over 100,000 units of content deployed with `mybinder` that have seeded over 10 million interactive computing sessions. With numbers of this magnitude there are many applications of `mybinder`. Below is a short list of a few popular ways `binder`s help folks:\n",
"* `jupyterlab` uses binder to create previews of each pull request.\n",
"* https://the-turing-way.netlify.app/reproducible-research/binderhub.html\n",
"* projects like `scipy`, `numpy`, and `pandas` provide binders for users to try out the newest versions of their tools.\n",
"* individuals\n",
"* interactive documentation\n",
"* ephemeral classrooms\n",
"* community events\n",
"* demos\n",
"\n",
"With such a low barrier to entry, it is hard to believe that mybinder is a free, open-source service. All these projects needed to get started were:\n",
"- [x] code or content\n",
"- [x] environment configuration"
]
},
{
"cell_type": "markdown",
"id": "2f9a19ff",
"metadata": {},
"source": [
"## Exploring `binderlytics`\n",
"\n",
"One way to discuss how binder is used is through data. The binder team provides the [binderlytics] service, built on [datasette], that allows you to explore binders that have been deployed since the end of 2018. I've downloaded the 4 GB SQLite database that they serve to analyze in this post. In the following sections we'll create a dataframe of binderlytics that we can talk about."
]
},
{
"cell_type": "raw",
"id": "9389e252",
"metadata": {},
"source": [
"<details><summary>Imports and configuration</summary>"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "459fb0a4",
"metadata": {},
"outputs": [],
"source": [
" import pandas, ibis, matplotlib.pyplot\n",
" matplotlib.rcParams.update({\n",
" 'figure.figsize': (12, 6), \"axes.grid\": True,\n",
" })"
]
},
{
"cell_type": "markdown",
"id": "58b1e878",
"metadata": {},
"source": [
"> Disclosure: We help maintain the ibis-framework through Quansight Labs, with funding coming from our open source maintenance agreements."
]
},
{
"cell_type": "raw",
"id": "abba51e6",
"metadata": {},
"source": [
"</details>"
]
},
{
"cell_type": "raw",
"id": "3645d08c",
"metadata": {},
"source": [
"<details open><summary>Query the databases and create a dataframe.</summary>"
]
},
{
"cell_type": "markdown",
"id": "cd303e98",
"metadata": {},
"source": [
"We use `ibis` to construct our query and manipulate the results with `pandas`."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "dbb961de",
"metadata": {},
"outputs": [],
"source": [
" db = __import__(\"ibis\").backends.sqlite.connect(\"Downloads/binder-launches.db\").table(\"binder\")\n",
" db = db.projection([db.timestamp.substr(0, 7).name(\"date\"), db])\n",
" df = db.groupby([db.date, db.provider, db.origin, db.org]).count().execute(1e8)\n",
" df = df.set_index(df.pop(\"date\").add(\"-01\").pipe(pandas.to_datetime))\n",
" df[\"origin\"] = df.origin.fillna(\"\").str.partition(\":\")[0].replace({\"\": None})"
]
},
{
"cell_type": "raw",
"id": "c03410e1",
"metadata": {},
"source": [
"</details>"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "d48cf80c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'The database contains records of 13,946,918 binder deploys'"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
" f\"\"\"The database contains records of {df[\"count\"].sum():,} binder deploys\"\"\""
]
},
{
"cell_type": "markdown",
"id": "764d4a59",
"metadata": {},
"source": [
"## binder `providers`\n",
"\n",
"The `providers` designate where the binder content comes from. mybinder allows authors to deploy their content from 8 different providers. The records indicate that GitHub, GitLab, and Git were the first sources for binders. The other providers were added later.\n",
"\n",
"The earliest appearances of the `providers`, according to the data, are shown in the table below."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "3c60a015",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th>provider</th>\n",
" <th>GitLab</th>\n",
" <th>Git</th>\n",
" <th>GitHub</th>\n",
" <th>Gist</th>\n",
" <th>Zenodo</th>\n",
" <th>Figshare</th>\n",
" <th>Dataverse</th>\n",
" <th>Hydroshare</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>first record</th>\n",
" <td>2018-11-30</td>\n",
" <td>2018-11-30</td>\n",
" <td>2018-11-30</td>\n",
" <td>2019-04-30</td>\n",
" <td>2019-06-30</td>\n",
" <td>2019-09-30</td>\n",
" <td>2019-12-31</td>\n",
" <td>2020-02-29</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"provider GitLab Git GitHub Gist Zenodo \\\n",
"first record 2018-11-30 2018-11-30 2018-11-30 2019-04-30 2019-06-30 \n",
"\n",
"provider Figshare Dataverse Hydroshare \n",
"first record 2019-09-30 2019-12-31 2020-02-29 "
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
" providers = df.groupby([pandas.Grouper(freq=\"1M\"), \"provider\"])[\"count\"].sum().unstack().fillna(0)\n",
" providers = providers[providers.sum().sort_values().index]\n",
" appearances = providers.div(providers).apply(pandas.Series.idxmin).sort_values().to_frame(\"first record\"); appearances.T"
]
},
{
"cell_type": "markdown",
"id": "1d40ace4",
"metadata": {},
"source": [
"we can look at the monthly deploys of binder to see that github is by far the most common provider, and collectively binder is being deployed over half a million times a month."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "c273ffa1",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 864x432 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
" providers.plot.bar(stacked=True);"
]
},
{
"cell_type": "markdown",
"id": "fefd7667",
"metadata": {},
"source": [
"## 🙏 the binder federation"
]
},
{
"cell_type": "markdown",
"id": "c555a44a",
"metadata": {},
"source": [
"cloud resources ain't free, so we need to give thanks to those providing the infrastructure. binder is a federation. there are several organizations that support the binder federation, and together they share the load of providing the world with binders."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "6efb204c",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 864x432 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
" df.groupby([pandas.Grouper(freq=\"1M\"), \"origin\"])[\"count\"].sum().unstack().plot.bar(stacked=True);"
]
},
{
"cell_type": "markdown",
"id": "7fc3c076",
"metadata": {},
"source": [
"## how is binder used"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "128922d5",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'This dataset counts content from 112,920 different repositories.'"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
" F\"This dataset counts content from {db[[db.provider, db.repo]].distinct().count().execute():,} different repositories.\""
]
},
{
"cell_type": "markdown",
"id": "6f37c9d8",
"metadata": {},
"source": [
"with millions of deploys, and over 100,000 content sources, there are diverse applications of binder. below we list some common applications of binders:\n",
"\n"
]
},
{
"cell_type": "markdown",
"id": "bc87cc7d",
"metadata": {},
"source": [
"### i ❤️ binder more than you\n",
"\n",
"A small percentage of the binders deploy content through gist. Personally, gists are my favorite way to share ideas and content with others."
]
},
{
"cell_type": "raw",
"id": "8fd5f623",
"metadata": {},
"source": [
"<details open><summary>I was so excited that we could deploy binders with <code>gist</code> that I can assure you I used it the first month it was available.</summary>"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "c0e33c97",
"metadata": {},
"outputs": [],
"source": [
" assert (\n",
" # Get the month in which the Gist provider first appears in the records\n",
" appearances.loc[\"Gist\", appearances.columns[0]].strftime(\"%Y-%m\") \n",
" == \n",
" # and compare it to the month of my (i.e. tonyfast's) first gist binder.\n",
" db.filter([db.org == \"tonyfast\", db.provider==\"Gist\"]).sort_by(\"date\").head(1).date.execute().iloc[0])"
]
},
{
"cell_type": "raw",
"id": "20767eda",
"metadata": {},
"source": [
"</details>"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "59f3ebad",
"metadata": {},
"outputs": [],
"source": [
" gist = db[db.provider == \"Gist\"][db.spec, db.repo, db.org].distinct().execute(1e6)"
]
},
{
"cell_type": "markdown",
"id": "c76a4ffa",
"metadata": {},
"source": [
"i've made hella binders. when we look at the numbers, for better or worse, i lead the pack in unique binders i've built with github gist. since i frequently work in notebooks, gist provides a means for me to share content with the minimum number of files necessary. usually, i can get away with a notebook and a `requirements.txt` or an `environment.yml`."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "901ddfcb",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>tonyfast</th>\n",
" <th>NehaAkashDeo</th>\n",
" <th>ArthurAraujoBrum</th>\n",
" <th>manics</th>\n",
" <th>jtpio</th>\n",
" <th>bollwyvl</th>\n",
" <th>rsignell-usgs</th>\n",
" <th>anonymous</th>\n",
" <th>parente</th>\n",
" <th>ELC</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>binder on gist</th>\n",
" <td>56</td>\n",
" <td>44</td>\n",
" <td>31</td>\n",
" <td>29</td>\n",
" <td>23</td>\n",
" <td>22</td>\n",
" <td>15</td>\n",
" <td>14</td>\n",
" <td>13</td>\n",
" <td>13</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" tonyfast NehaAkashDeo ArthurAraujoBrum manics jtpio \\\n",
"binder on gist 56 44 31 29 23 \n",
"\n",
" bollwyvl rsignell-usgs anonymous parente ELC \n",
"binder on gist 22 15 14 13 13 "
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
" gist.org.value_counts().sort_values(ascending=False).head(10).to_frame(\"binder on gist\").T"
]
},
{
"cell_type": "markdown",
"id": "6692c8f9",
"metadata": {},
"source": [
"## binders are designed and architected\n",
"\n",
"when we make a binder, we are making a statement about the reproducibility of an idea. at the same time that we state something is reproducible, we have to acknowledge that there may be a time when it is not. the continuous current of software changes in open source may require new technologies or versions, or bring breaking api changes.\n",
"\n",
"the complexity of a binder grows with the requirements of the project, not the content."
]
},
{
"cell_type": "markdown",
"id": "9e92bf13",
"metadata": {},
"source": [
"## making this notebook a binder."
]
},
{
"cell_type": "raw",
"id": "ff685790",
"metadata": {},
"source": [
"<details open><summary>A <code>requirements.txt</code> with the dependencies needed to run this document.</summary>"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "5640351f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting requirements.txt\n"
]
}
],
"source": [
" %%file requirements.txt\n",
" ibis-framework[sqlite]\n",
" pandas\n",
" matplotlib"
]
},
{
"cell_type": "raw",
"id": "b4eb82cf",
"metadata": {},
"source": [
"</details>"
]
},
{
"cell_type": "markdown",
"id": "b44997e7",
"metadata": {},
"source": [
"> My preferred practice for sharing gists is through the official GitHub `gh` CLI."
]
},
{
"cell_type": "markdown",
"id": "5307f041",
"metadata": {},
"source": [
" !gh gist create -p 2021-binder-is-the-best.ipynb requirements.txt"
]
},
{
"cell_type": "markdown",
"id": "d6a51383",
"metadata": {},
"source": [
"The CLI tells me that we now have a gist @ https://gist.github.com/54e95a64de8a959f83f8443710ff53d1"
]
},
{
"cell_type": "markdown",
"id": "e2c734d0",
"metadata": {},
"source": [
"Then we build the url for our binder on the mybinder site, remembering to choose the Gist provider.\n",
"\n",
"https://mybinder.org/v2/gist/tonyfast/54e95a64de8a959f83f8443710ff53d1/HEAD\n",
" \n",
"[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gist/tonyfast/54e95a64de8a959f83f8443710ff53d1/HEAD)\n",
"\n",
"This won't work til i slice off some data."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "ibis",
"language": "python",
"name": "ibis"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
ibis-framework[sqlite]
pandas
matplotlib