@bollwyvl
Last active May 17, 2019 01:17
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# py2pkg"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"ename": "ModuleNotFoundError",
"evalue": "No module named 'antimatter'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mModuleNotFoundError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-1-f44c7554ea01>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0;32mimport\u001b[0m \u001b[0mantimatter\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2\u001b[0m \u001b[0;31m# import antigravity # <- this one is more fun\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mModuleNotFoundError\u001b[0m: No module named 'antimatter'"
]
}
],
"source": [
"import antimatter\n",
"# import antigravity # <- this one is more fun"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If we'd like tools like [`depfinder`](https://github.com/ericdill/depfinder) and [`conda-deps`](https://github.com/cgat-developers/conda-deps) (or even [`repo2docker`](https://github.com/jupyter/repo2docker), perhaps) to be able to work out installable package names from the imports found in some source files, we'll need a big old library mapping import names to package names. But where to get it?"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"TOP_LEVEL = \"top_level.txt\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Standard Python packaging (at least as of the Middle-High `setuptools` dialect) provides `top_level.txt`, a means for communicating all of the top-level _import names_ (packages and modules) included inside of a _distribution_. While many Python distributions contain only one, eponymous, import package, quite a few provide more (e.g. C extensions), some with wildly divergent names. Further, due to _namespace packages_ or plain old re-implementations of APIs (`PIL` vs `pillow`), there might be even more weirdness. Let's find all of them on this computer."
]
},
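{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of what such a file holds: it is plain text with one top-level import name per line. The helper name `parse_top_level` and the sample contents below are made up for illustration, not read from a real package."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def parse_top_level(text):\n",
"    # one import name per line; ignore blank lines and stray whitespace\n",
"    return [line.strip() for line in text.splitlines() if line.strip()]\n",
"\n",
"# hypothetical file contents, for illustration only\n",
"sample = \"plac\\nplac_core\\nplac_ext\\n\"\n",
"parse_top_level(sample)"
]
},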
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"from pathlib import Path"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`conda info` knows _things_ about my computer, like every place I've installed conda packages. Each of them, in turn, might have a `pkgs` directory, which might have some unpacked packages in it.\n",
"\n",
"> This would work for `tar.bz2` files as well, but would be a lot slower, and not as much fun. Perhaps the future [conda format](https://github.com/conda/conda-package-handling) would make this easier, and also more general to other languages."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"raw_info = !conda info --json\n",
"conda_info = json.loads(\"\\n\".join(raw_info))\n",
"pkg_dirs = set(list(map(Path, conda_info[\"pkgs_dirs\"])) + [Path(env, \"pkgs\") for env in conda_info[\"envs\"]])"
]
},
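{
"cell_type": "markdown",
"metadata": {},
"source": [
"Outside of IPython, the `!conda` line won't parse. A rough plain-Python equivalent, assuming `conda` is on the `PATH`, might look like the following; `load_conda_info`, `candidate_pkg_dirs`, and the example paths are invented names for this sketch."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"import subprocess\n",
"from pathlib import Path\n",
"\n",
"def load_conda_info():\n",
"    # run `conda info --json` without IPython's `!` syntax (needs conda on PATH)\n",
"    proc = subprocess.run([\"conda\", \"info\", \"--json\"], stdout=subprocess.PIPE, check=True)\n",
"    return json.loads(proc.stdout.decode(\"utf-8\"))\n",
"\n",
"def candidate_pkg_dirs(info):\n",
"    # every configured package cache, plus a `pkgs` folder per environment\n",
"    dirs = {Path(p) for p in info[\"pkgs_dirs\"]}\n",
"    dirs |= {Path(env, \"pkgs\") for env in info[\"envs\"]}\n",
"    return dirs\n",
"\n",
"# made-up paths, only to show the shape of the result\n",
"example_info = {\"pkgs_dirs\": [\"/opt/conda/pkgs\"], \"envs\": [\"/opt/conda\", \"/opt/conda/envs/demo\"]}\n",
"sorted(candidate_pkg_dirs(example_info))"
]
},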
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This computer _should_ have one or more environments, and so at least one candidate `pkgs` directory, provided it's running with `conda`."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"30"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(pkg_dirs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's see what was in them."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"toppings = {}  # import name -> conda package names that provide it\n",
"eponymous = []  # packages whose import name equals the package name\n",
"for pkg_dir in pkg_dirs:\n",
"    for top in pkg_dir.rglob(TOP_LEVEL):\n",
"        top_txt = top.read_text().strip()\n",
"        if not top_txt:\n",
"            continue\n",
"        # the unpacked package is the first directory below pkg_dir\n",
"        pkg = pkg_dir / top.relative_to(pkg_dir).parts[0]\n",
"        index_json = pkg / \"info\" / \"index.json\"\n",
"        if not index_json.is_file():\n",
"            continue  # not an unpacked conda package\n",
"        name = json.loads(index_json.read_text())[\"name\"]\n",
"        for export in top_txt.splitlines():\n",
"            import_name = export.split(\"/\")[0]\n",
"            if name == import_name:\n",
"                eponymous.append(name)\n",
"            elif name not in toppings.setdefault(import_name, []):\n",
"                toppings[import_name].append(name)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's see what we found! There are likely quite a few packages whose import name is the same as the package name:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"256 eponymous packages\n"
]
}
],
"source": [
"print(len(set(eponymous)), \"eponymous packages\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now for the good stuff:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{\n",
" \"Cython\": [\n",
" \"cython\"\n",
" ],\n",
" \"IPython\": [\n",
" \"ipython\"\n",
" ],\n",
" \"JupyterLibrary\": [\n",
" \"robotframework-jupyterlibrary\"\n",
" ],\n",
" \"OpenSSL\": [\n",
" \"pyopenssl\"\n",
" ],\n",
" \"PIL\": [\n",
" \"pillow\"\n",
" ],\n",
" \"REST\": [\n",
" \"restinstance\"\n",
" ],\n",
" \"SPARQLWrapper\": [\n",
" \"sparqlwrapper\"\n",
" ],\n",
" \"SeleniumLibrary\": [\n",
" \"robotframework-seleniumlibrary\"\n",
" ],\n",
" \"_ast27\": [\n",
" \"typed-ast\"\n",
" ],\n",
" \"_ast3\": [\n",
" \"typed-ast\"\n",
" ],\n",
" \"_cffi_backend\": [\n",
" \"cffi\"\n",
" ],\n",
" \"_constant_time\": [\n",
" \"cryptography\"\n",
" ],\n",
" \"_multiprocess\": [\n",
" \"multiprocess\"\n",
" ],\n",
" \"_openssl\": [\n",
" \"cryptography\"\n",
" ],\n",
" \"_padding\": [\n",
" \"cryptography\"\n",
" ],\n",
" \"_pylief\": [\n",
" \"py-lief\"\n",
" ],\n",
" \"_pyrsistent_version\": [\n",
" \"pyrsistent\"\n",
" ],\n",
" \"_pytest\": [\n",
" \"pytest\"\n",
" ],\n",
" \"_pytest_mock_version\": [\n",
" \"pytest-mock\"\n",
" ],\n",
" \"_ruamel_yaml\": [\n",
" \"ruamel.yaml\"\n",
" ],\n",
" \"anaconda_project\": [\n",
" \"anaconda-project\"\n",
" ],\n",
" \"async_timeout\": [\n",
" \"async-timeout\"\n",
" ],\n",
" \"attr\": [\n",
" \"attrs\"\n",
" ],\n",
" \"benchmark\": [\n",
" \"lime\"\n",
" ],\n",
" \"binstar_client\": [\n",
" \"anaconda-client\"\n",
" ],\n",
" \"blackd\": [\n",
" \"black\"\n",
" ],\n",
" \"blib2to3\": [\n",
" \"black\"\n",
" ],\n",
" \"blis\": [\n",
" \"cython-blis\"\n",
" ],\n",
" \"bs4\": [\n",
" \"beautifulsoup4\"\n",
" ],\n",
" \"chromedriver_binary\": [\n",
" \"python-chromedriver-binary\"\n",
" ],\n",
" \"conda_build\": [\n",
" \"conda-build\"\n",
" ],\n",
" \"conda_env\": [\n",
" \"conda\"\n",
" ],\n",
" \"conda_smithy\": [\n",
" \"conda-smithy\"\n",
" ],\n",
" \"conda_verify\": [\n",
" \"conda-verify\"\n",
" ],\n",
" \"dask\": [\n",
" \"dask-core\"\n",
" ],\n",
" \"dask_glm\": [\n",
" \"dask-glm\"\n",
" ],\n",
" \"dask_ml\": [\n",
" \"dask-ml\"\n",
" ],\n",
" \"dateutil\": [\n",
" \"python-dateutil\"\n",
" ],\n",
" \"dns\": [\n",
" \"dnspython\"\n",
" ],\n",
" \"dot_parser\": [\n",
" \"pydot\"\n",
" ],\n",
" \"easy_install\": [\n",
" \"setuptools\"\n",
" ],\n",
" \"en_vectors_web_lg\": [\n",
" \"spacy-model-en_vectors_web_lg\"\n",
" ],\n",
" \"flex\": [\n",
" \"flex-swagger\"\n",
" ],\n",
" \"git\": [\n",
" \"gitpython\"\n",
" ],\n",
" \"gitdb\": [\n",
" \"gitdb2\"\n",
" ],\n",
" \"github\": [\n",
" \"pygithub\"\n",
" ],\n",
" \"graphql\": [\n",
" \"graphql-core\"\n",
" ],\n",
" \"graphql_relay\": [\n",
" \"graphql-relay\"\n",
" ],\n",
" \"graphviz\": [\n",
" \"python-graphviz\"\n",
" ],\n",
" \"html5lib\": [\n",
" \"bleach\"\n",
" ],\n",
" \"ipykernel_launcher\": [\n",
" \"ipykernel\"\n",
" ],\n",
" \"jinja2_time\": [\n",
" \"jinja2-time\"\n",
" ],\n",
" \"jsonpath_ng\": [\n",
" \"jsonpath-ng\"\n",
" ],\n",
" \"jupyter\": [\n",
" \"jupyter_core\"\n",
" ],\n",
" \"jwt\": [\n",
" \"pyjwt\"\n",
" ],\n",
" \"lazy_object_proxy\": [\n",
" \"lazy-object-proxy\"\n",
" ],\n",
" \"libarchive\": [\n",
" \"python-libarchive-c\"\n",
" ],\n",
" \"libfuturize\": [\n",
" \"future\"\n",
" ],\n",
" \"libpasteurize\": [\n",
" \"future\"\n",
" ],\n",
" \"lief\": [\n",
" \"py-lief\"\n",
" ],\n",
" \"magic\": [\n",
" \"python-magic\"\n",
" ],\n",
" \"matplotlib\": [\n",
" \"matplotlib-base\"\n",
" ],\n",
" \"mdr\": [\n",
" \"scikit-mdr\"\n",
" ],\n",
" \"more_itertools\": [\n",
" \"more-itertools\"\n",
" ],\n",
" \"mpl_toolkits\": [\n",
" \"matplotlib-base\"\n",
" ],\n",
" \"msgpack\": [\n",
" \"msgpack-python\"\n",
" ],\n",
" \"multipart\": [\n",
" \"python-multipart\"\n",
" ],\n",
" \"nose_exclude\": [\n",
" \"nose-exclude\"\n",
" ],\n",
" \"numpy\": [\n",
" \"numpy-base\"\n",
" ],\n",
" \"past\": [\n",
" \"future\"\n",
" ],\n",
" \"pkg_resources\": [\n",
" \"setuptools\"\n",
" ],\n",
" \"plac_core\": [\n",
" \"plac\"\n",
" ],\n",
" \"plac_ext\": [\n",
" \"plac\"\n",
" ],\n",
" \"plac_tk\": [\n",
" \"plac\"\n",
" ],\n",
" \"pvectorc\": [\n",
" \"pyrsistent\"\n",
" ],\n",
" \"pyDOE2\": [\n",
" \"pydoe2\"\n",
" ],\n",
" \"pylab\": [\n",
" \"matplotlib-base\"\n",
" ],\n",
" \"pytest_cov\": [\n",
" \"pytest-cov\"\n",
" ],\n",
" \"pytest_mock\": [\n",
" \"pytest-mock\"\n",
" ],\n",
" \"pywt\": [\n",
" \"pywavelets\"\n",
" ],\n",
" \"pyximport\": [\n",
" \"cython\"\n",
" ],\n",
" \"rdflib_jsonld\": [\n",
" \"rdflib-jsonld\"\n",
" ],\n",
" \"requests_cache\": [\n",
" \"requests-cache\"\n",
" ],\n",
" \"requests_oauthlib\": [\n",
" \"requests-oauthlib\"\n",
" ],\n",
" \"rflint\": [\n",
" \"robotframework-lint\"\n",
" ],\n",
" \"robot\": [\n",
" \"robotframework\"\n",
" ],\n",
" \"ruamel\": [\n",
" \"ruamel.yaml\"\n",
" ],\n",
" \"skimage\": [\n",
" \"scikit-image\"\n",
" ],\n",
" \"sklearn\": [\n",
" \"scikit-learn\"\n",
" ],\n",
" \"smmap\": [\n",
" \"smmap2\"\n",
" ],\n",
" \"socks\": [\n",
" \"pysocks\"\n",
" ],\n",
" \"sockshandler\": [\n",
" \"pysocks\"\n",
" ],\n",
" \"sphinxcontrib\": [\n",
" \"sphinxcontrib-jsmath\",\n",
" \"sphinxcontrib-applehelp\",\n",
" \"sphinxcontrib-htmlhelp\",\n",
" \"sphinxcontrib-restbuilder\",\n",
" \"sphinxcontrib-qthelp\",\n",
" \"sphinxcontrib-httpdomain\",\n",
" \"sphinxcontrib-serializinghtml\",\n",
" \"sphinxcontrib-httpexample\",\n",
" \"sphinxcontrib-devhelp\"\n",
" ],\n",
" \"strict_rfc3339\": [\n",
" \"strict-rfc3339\"\n",
" ],\n",
" \"test\": [\n",
" \"wsproto\"\n",
" ],\n",
" \"test_data\": [\n",
" \"conda\"\n",
" ],\n",
" \"tests\": [\n",
" \"soupsieve\",\n",
" \"ipywebrtc\"\n",
" ],\n",
" \"tlz\": [\n",
" \"toolz\"\n",
" ],\n",
" \"typed_ast\": [\n",
" \"typed-ast\"\n",
" ],\n",
" \"umap\": [\n",
" \"umap-learn\"\n",
" ],\n",
" \"update_checker_test\": [\n",
" \"update_checker\"\n",
" ],\n",
" \"vsts\": [\n",
" \"vsts-python-api\"\n",
" ],\n",
" \"wx\": [\n",
" \"wxpython\"\n",
" ],\n",
" \"yapftests\": [\n",
" \"yapf\"\n",
" ],\n",
" \"zmq\": [\n",
" \"pyzmq\"\n",
" ]\n",
"}\n"
]
}
],
"source": [
"print(json.dumps(toppings, indent=2, sort_keys=True))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"That's kind of a lot! Extending this to **all the packages on conda-forge, anaconda, and bioconda** would probably only take a couple of hours, and the resulting data could be packaged, distributed, and updated (monthly?) so that dependency tools could share the effort."
]
},
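{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough sketch of how a dependency tool might consume such a mapping: take the first component of an imported name and look it up, falling back to the name itself when it is known to be eponymous. `guess_packages` and the sample data are hypothetical (that `requests` is eponymous is assumed here, not taken from the output above)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def guess_packages(import_name, toppings, eponymous):\n",
"    # only the top-level part of a dotted import matters for the lookup\n",
"    top = import_name.split(\".\")[0]\n",
"    candidates = list(toppings.get(top, []))\n",
"    if top in eponymous:\n",
"        candidates.insert(0, top)\n",
"    return candidates\n",
"\n",
"# small hand-picked sample, for illustration only\n",
"sample_toppings = {\"PIL\": [\"pillow\"], \"sklearn\": [\"scikit-learn\"]}\n",
"sample_eponymous = {\"requests\"}\n",
"[guess_packages(n, sample_toppings, sample_eponymous) for n in (\"PIL.Image\", \"sklearn\", \"requests\", \"antimatter\")]"
]
},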
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**All the wheels on PyPI** would take a bit longer, but is probably also worth doing."
]
}
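,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Wheels are zip archives, and those built with `setuptools` generally carry a `top_level.txt` inside their `.dist-info` directory (other build backends may omit it, so treat this as a sketch). The function name `wheel_top_level` and the fake wheel below are invented for the demo."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import io\n",
"import zipfile\n",
"\n",
"def wheel_top_level(wheel_file):\n",
"    # map each *.dist-info directory in the archive to its listed import names\n",
"    found = {}\n",
"    with zipfile.ZipFile(wheel_file) as zf:\n",
"        for member in zf.namelist():\n",
"            if member.endswith(\".dist-info/top_level.txt\"):\n",
"                dist_info = member.split(\"/\")[0]\n",
"                found[dist_info] = zf.read(member).decode(\"utf-8\").split()\n",
"    return found\n",
"\n",
"# build a tiny fake wheel in memory; all names are invented\n",
"fake = io.BytesIO()\n",
"with zipfile.ZipFile(fake, \"w\") as zf:\n",
"    zf.writestr(\"demo_pkg/__init__.py\", \"\")\n",
"    zf.writestr(\"demo_dist-1.0.dist-info/top_level.txt\", \"demo_pkg\\n_demo_ext\\n\")\n",
"fake.seek(0)\n",
"wheel_top_level(fake)"
]
}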
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}