bollwyvl/ccc4d995fd5ab8578af17e8d7f198358
Last active May 17, 2019 01:17
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# py2pkg"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "ename": "ModuleNotFoundError",
     "evalue": "No module named 'antimatter'",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mModuleNotFoundError\u001b[0m Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-1-f44c7554ea01>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0;32mimport\u001b[0m \u001b[0mantimatter\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2\u001b[0m \u001b[0;31m# import antigravity # <- this one is more fun\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;31mModuleNotFoundError\u001b[0m: No module named 'antimatter'"
     ]
    }
   ],
   "source": [
    "import antimatter\n",
    "# import antigravity # <- this one is more fun"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If we'd like tools like [`depfinder`](https://github.com/ericdill/depfinder) and [`conda-deps`](https://github.com/cgat-developers/conda-deps) (or perhaps even [`repo2docker`](https://github.com/jupyter/repo2docker)) to be able to turn the imports found in some source files into installable package names, we'll need a big old library of import-to-package mappings. But where to get it?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "TOP_LEVEL = \"top_level.txt\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Standard Python packaging (at least as of the Middle-High `setuptools` dialect) provides `top_level.txt`, a means for communicating all of the top-level _import packages_ (and modules) included inside a _distribution_. While many Python distributions contain only one, eponymous, package, quite a few provide more (e.g. C extensions), some with wildly divergent names. Further, due to _namespace packages_ or plain old re-implementations of APIs (`PIL` vs `pillow`), there might be even more weirdness. Let's find all of them on this computer."
   ]
  },
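  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For distributions installed with `pip` into the running interpreter, the same files can be read through the standard library. The cell below is a minimal sketch, assuming Python 3.8+ for `importlib.metadata` (newer than the 3.7 kernel this notebook ran on); `pip_toppings` is a hypothetical name. On Python 3.10+, `importlib.metadata.packages_distributions()` builds a similar import-name-to-distribution mapping out of the box."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from importlib.metadata import distributions  # Python 3.8+\n",
    "\n",
    "pip_toppings = {}  # hypothetical name: import name -> [distribution names]\n",
    "for dist in distributions():\n",
    "    # read_text returns None when the file is absent; top_level.txt is a setuptools convention\n",
    "    top_txt = dist.read_text('top_level.txt') or ''\n",
    "    name = dist.metadata['Name']\n",
    "    for export in top_txt.splitlines():\n",
    "        root = export.strip().split('/')[0]\n",
    "        if root and root != name and name not in pip_toppings.setdefault(root, []):\n",
    "            pip_toppings[root].append(name)\n",
    "len(pip_toppings)"
   ]
  },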
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "from collections import defaultdict\n",
    "from pathlib import Path"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`conda info` knows _things_ about my computer, like every place I've installed conda packages. Each of them, in turn, might have a `pkgs` directory containing some unpacked packages.\n",
    "\n",
    "> This would work for `tar.bz2` files as well, but would be a lot slower, and not as much fun. Perhaps the future [conda format](https://github.com/conda/conda-package-handling) would make this easier, and also more general to other languages."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "raw_info = !conda info --json\n",
    "conda_info = json.loads(\"\\n\".join(raw_info))\n",
    "# shared package caches, plus a candidate pkgs/ directory inside each environment\n",
    "pkg_dirs = {Path(d) for d in conda_info[\"pkgs_dirs\"]} | {Path(env, \"pkgs\") for env in conda_info[\"envs\"]}"
   ]
  },
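  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `!conda info --json` line relies on IPython's shell escape. Outside a notebook, a plain-Python equivalent might look like this minimal sketch, assuming a `conda` executable that `subprocess` can launch directly is on `PATH`; `conda_pkg_dirs` is a hypothetical helper name."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "import subprocess\n",
    "from pathlib import Path\n",
    "\n",
    "def conda_pkg_dirs():\n",
    "    # same query as the shell escape above, parsed straight from stdout\n",
    "    out = subprocess.run(['conda', 'info', '--json'], check=True, stdout=subprocess.PIPE).stdout\n",
    "    info = json.loads(out)\n",
    "    # shared package caches, plus a candidate pkgs/ directory inside each environment\n",
    "    return {Path(d) for d in info['pkgs_dirs']} | {Path(env, 'pkgs') for env in info['envs']}\n",
    "\n",
    "len(conda_pkg_dirs())"
   ]
  },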
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This computer _should_ have one or more of these candidate package directories, provided it's running `conda`, though not every `pkgs` path built from an environment will actually exist."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "30"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(pkg_dirs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's see what was in them."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "toppings = defaultdict(list)  # import name -> [conda package names]\n",
    "eponymous = set()  # packages whose import name matches the package name\n",
    "for pkg_dir in pkg_dirs:\n",
    "    for top in pkg_dir.rglob(TOP_LEVEL):\n",
    "        top_txt = top.read_text().strip()\n",
    "        if not top_txt:\n",
    "            continue\n",
    "        # the unpacked package is the child of pkg_dir on the path down to top_level.txt\n",
    "        pkg = top.parents[top.parents.index(pkg_dir) - 1]\n",
    "        index_json = pkg / \"info\" / \"index.json\"\n",
    "        if not index_json.exists():\n",
    "            continue  # not an unpacked conda package\n",
    "        name = json.loads(index_json.read_text())[\"name\"]\n",
    "        for export in top_txt.splitlines():\n",
    "            # entries can be slash-separated paths; keep only the top-level name\n",
    "            root = export.strip().split(\"/\")[0]\n",
    "            if not root:\n",
    "                continue\n",
    "            if name == root:\n",
    "                eponymous.add(name)\n",
    "            elif name not in toppings[root]:\n",
    "                toppings[root].append(name)"
   ]
  },
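  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The least obvious line above is `top.parents[top.parents.index(pkg_dir) - 1]`. `Path.parents` lists ancestors from nearest to farthest, so the entry just before `pkg_dir` is the package directory sitting directly under it. A small illustration on made-up paths (the `somepkg` layout and the `demo_*` names are hypothetical):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pathlib import PurePosixPath\n",
    "\n",
    "demo_dir = PurePosixPath('/opt/conda/pkgs')\n",
    "demo_top = demo_dir / 'somepkg-1.0-py_0' / 'site-packages' / 'somepkg-1.0-py3.7.egg-info' / 'top_level.txt'\n",
    "# parents[0] is the egg-info dir, [1] site-packages, [2] the package dir, [3] demo_dir\n",
    "demo_pkg = demo_top.parents[demo_top.parents.index(demo_dir) - 1]\n",
    "assert demo_pkg == demo_dir / 'somepkg-1.0-py_0'\n",
    "demo_pkg"
   ]
  },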
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's see what we found! Quite a few packages likely export an import name identical to their own:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "256 eponymous packages\n"
     ]
    }
   ],
   "source": [
    "print(len(set(eponymous)), \"eponymous packages\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now for the good stuff:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      " \"Cython\": [\n",
      "  \"cython\"\n",
      " ],\n",
      " \"IPython\": [\n",
      "  \"ipython\"\n",
      " ],\n",
      " \"JupyterLibrary\": [\n",
      "  \"robotframework-jupyterlibrary\"\n",
      " ],\n",
      " \"OpenSSL\": [\n",
      "  \"pyopenssl\"\n",
      " ],\n",
      " \"PIL\": [\n",
      "  \"pillow\"\n",
      " ],\n",
      " \"REST\": [\n",
      "  \"restinstance\"\n",
      " ],\n",
      " \"SPARQLWrapper\": [\n",
      "  \"sparqlwrapper\"\n",
      " ],\n",
      " \"SeleniumLibrary\": [\n",
      "  \"robotframework-seleniumlibrary\"\n",
      " ],\n",
      " \"_ast27\": [\n",
      "  \"typed-ast\"\n",
      " ],\n",
      " \"_ast3\": [\n",
      "  \"typed-ast\"\n",
      " ],\n",
      " \"_cffi_backend\": [\n",
      "  \"cffi\"\n",
      " ],\n",
      " \"_constant_time\": [\n",
      "  \"cryptography\"\n",
      " ],\n",
      " \"_multiprocess\": [\n",
      "  \"multiprocess\"\n",
      " ],\n",
      " \"_openssl\": [\n",
      "  \"cryptography\"\n",
      " ],\n",
      " \"_padding\": [\n",
      "  \"cryptography\"\n",
      " ],\n",
      " \"_pylief\": [\n",
      "  \"py-lief\"\n",
      " ],\n",
      " \"_pyrsistent_version\": [\n",
      "  \"pyrsistent\"\n",
      " ],\n",
      " \"_pytest\": [\n",
      "  \"pytest\"\n",
      " ],\n",
      " \"_pytest_mock_version\": [\n",
      "  \"pytest-mock\"\n",
      " ],\n",
      " \"_ruamel_yaml\": [\n",
      "  \"ruamel.yaml\"\n",
      " ],\n",
      " \"anaconda_project\": [\n",
      "  \"anaconda-project\"\n",
      " ],\n",
      " \"async_timeout\": [\n",
      "  \"async-timeout\"\n",
      " ],\n",
      " \"attr\": [\n",
      "  \"attrs\"\n",
      " ],\n",
      " \"benchmark\": [\n",
      "  \"lime\"\n",
      " ],\n",
      " \"binstar_client\": [\n",
      "  \"anaconda-client\"\n",
      " ],\n",
      " \"blackd\": [\n",
      "  \"black\"\n",
      " ],\n",
      " \"blib2to3\": [\n",
      "  \"black\"\n",
      " ],\n",
      " \"blis\": [\n",
      "  \"cython-blis\"\n",
      " ],\n",
      " \"bs4\": [\n",
      "  \"beautifulsoup4\"\n",
      " ],\n",
      " \"chromedriver_binary\": [\n",
      "  \"python-chromedriver-binary\"\n",
      " ],\n",
      " \"conda_build\": [\n",
      "  \"conda-build\"\n",
      " ],\n",
      " \"conda_env\": [\n",
      "  \"conda\"\n",
      " ],\n",
      " \"conda_smithy\": [\n",
      "  \"conda-smithy\"\n",
      " ],\n",
      " \"conda_verify\": [\n",
      "  \"conda-verify\"\n",
      " ],\n",
      " \"dask\": [\n",
      "  \"dask-core\"\n",
      " ],\n",
      " \"dask_glm\": [\n",
      "  \"dask-glm\"\n",
      " ],\n",
      " \"dask_ml\": [\n",
      "  \"dask-ml\"\n",
      " ],\n",
      " \"dateutil\": [\n",
      "  \"python-dateutil\"\n",
      " ],\n",
      " \"dns\": [\n",
      "  \"dnspython\"\n",
      " ],\n",
      " \"dot_parser\": [\n",
      "  \"pydot\"\n",
      " ],\n",
      " \"easy_install\": [\n",
      "  \"setuptools\"\n",
      " ],\n",
      " \"en_vectors_web_lg\": [\n",
      "  \"spacy-model-en_vectors_web_lg\"\n",
      " ],\n",
      " \"flex\": [\n",
      "  \"flex-swagger\"\n",
      " ],\n",
      " \"git\": [\n",
      "  \"gitpython\"\n",
      " ],\n",
      " \"gitdb\": [\n",
      "  \"gitdb2\"\n",
      " ],\n",
      " \"github\": [\n",
      "  \"pygithub\"\n",
      " ],\n",
      " \"graphql\": [\n",
      "  \"graphql-core\"\n",
      " ],\n",
      " \"graphql_relay\": [\n",
      "  \"graphql-relay\"\n",
      " ],\n",
      " \"graphviz\": [\n",
      "  \"python-graphviz\"\n",
      " ],\n",
      " \"html5lib\": [\n",
      "  \"bleach\"\n",
      " ],\n",
      " \"ipykernel_launcher\": [\n",
      "  \"ipykernel\"\n",
      " ],\n",
      " \"jinja2_time\": [\n",
      "  \"jinja2-time\"\n",
      " ],\n",
      " \"jsonpath_ng\": [\n",
      "  \"jsonpath-ng\"\n",
      " ],\n",
      " \"jupyter\": [\n",
      "  \"jupyter_core\"\n",
      " ],\n",
      " \"jwt\": [\n",
      "  \"pyjwt\"\n",
      " ],\n",
      " \"lazy_object_proxy\": [\n",
      "  \"lazy-object-proxy\"\n",
      " ],\n",
      " \"libarchive\": [\n",
      "  \"python-libarchive-c\"\n",
      " ],\n",
      " \"libfuturize\": [\n",
      "  \"future\"\n",
      " ],\n",
      " \"libpasteurize\": [\n",
      "  \"future\"\n",
      " ],\n",
      " \"lief\": [\n",
      "  \"py-lief\"\n",
      " ],\n",
      " \"magic\": [\n",
      "  \"python-magic\"\n",
      " ],\n",
      " \"matplotlib\": [\n",
      "  \"matplotlib-base\"\n",
      " ],\n",
      " \"mdr\": [\n",
      "  \"scikit-mdr\"\n",
      " ],\n",
      " \"more_itertools\": [\n",
      "  \"more-itertools\"\n",
      " ],\n",
      " \"mpl_toolkits\": [\n",
      "  \"matplotlib-base\"\n",
      " ],\n",
      " \"msgpack\": [\n",
      "  \"msgpack-python\"\n",
      " ],\n",
      " \"multipart\": [\n",
      "  \"python-multipart\"\n",
      " ],\n",
      " \"nose_exclude\": [\n",
      "  \"nose-exclude\"\n",
      " ],\n",
      " \"numpy\": [\n",
      "  \"numpy-base\"\n",
      " ],\n",
      " \"past\": [\n",
      "  \"future\"\n",
      " ],\n",
      " \"pkg_resources\": [\n",
      "  \"setuptools\"\n",
      " ],\n",
      " \"plac_core\": [\n",
      "  \"plac\"\n",
      " ],\n",
      " \"plac_ext\": [\n",
      "  \"plac\"\n",
      " ],\n",
      " \"plac_tk\": [\n",
      "  \"plac\"\n",
      " ],\n",
      " \"pvectorc\": [\n",
      "  \"pyrsistent\"\n",
      " ],\n",
      " \"pyDOE2\": [\n",
      "  \"pydoe2\"\n",
      " ],\n",
      " \"pylab\": [\n",
      "  \"matplotlib-base\"\n",
      " ],\n",
      " \"pytest_cov\": [\n",
      "  \"pytest-cov\"\n",
      " ],\n",
      " \"pytest_mock\": [\n",
      "  \"pytest-mock\"\n",
      " ],\n",
      " \"pywt\": [\n",
      "  \"pywavelets\"\n",
      " ],\n",
      " \"pyximport\": [\n",
      "  \"cython\"\n",
      " ],\n",
      " \"rdflib_jsonld\": [\n",
      "  \"rdflib-jsonld\"\n",
      " ],\n",
      " \"requests_cache\": [\n",
      "  \"requests-cache\"\n",
      " ],\n",
      " \"requests_oauthlib\": [\n",
      "  \"requests-oauthlib\"\n",
      " ],\n",
      " \"rflint\": [\n",
      "  \"robotframework-lint\"\n",
      " ],\n",
      " \"robot\": [\n",
      "  \"robotframework\"\n",
      " ],\n",
      " \"ruamel\": [\n",
      "  \"ruamel.yaml\"\n",
      " ],\n",
      " \"skimage\": [\n",
      "  \"scikit-image\"\n",
      " ],\n",
      " \"sklearn\": [\n",
      "  \"scikit-learn\"\n",
      " ],\n",
      " \"smmap\": [\n",
      "  \"smmap2\"\n",
      " ],\n",
      " \"socks\": [\n",
      "  \"pysocks\"\n",
      " ],\n",
      " \"sockshandler\": [\n",
      "  \"pysocks\"\n",
      " ],\n",
      " \"sphinxcontrib\": [\n",
      "  \"sphinxcontrib-jsmath\",\n",
      "  \"sphinxcontrib-applehelp\",\n",
      "  \"sphinxcontrib-htmlhelp\",\n",
      "  \"sphinxcontrib-restbuilder\",\n",
      "  \"sphinxcontrib-qthelp\",\n",
      "  \"sphinxcontrib-httpdomain\",\n",
      "  \"sphinxcontrib-serializinghtml\",\n",
      "  \"sphinxcontrib-httpexample\",\n",
      "  \"sphinxcontrib-devhelp\"\n",
      " ],\n",
      " \"strict_rfc3339\": [\n",
      "  \"strict-rfc3339\"\n",
      " ],\n",
      " \"test\": [\n",
      "  \"wsproto\"\n",
      " ],\n",
      " \"test_data\": [\n",
      "  \"conda\"\n",
      " ],\n",
      " \"tests\": [\n",
      "  \"soupsieve\",\n",
      "  \"ipywebrtc\"\n",
      " ],\n",
      " \"tlz\": [\n",
      "  \"toolz\"\n",
      " ],\n",
      " \"typed_ast\": [\n",
      "  \"typed-ast\"\n",
      " ],\n",
      " \"umap\": [\n",
      "  \"umap-learn\"\n",
      " ],\n",
      " \"update_checker_test\": [\n",
      "  \"update_checker\"\n",
      " ],\n",
      " \"vsts\": [\n",
      "  \"vsts-python-api\"\n",
      " ],\n",
      " \"wx\": [\n",
      "  \"wxpython\"\n",
      " ],\n",
      " \"yapftests\": [\n",
      "  \"yapf\"\n",
      " ],\n",
      " \"zmq\": [\n",
      "  \"pyzmq\"\n",
      " ]\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "print(json.dumps(toppings, indent=2, sort_keys=True))"
   ]
  },
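  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To close the loop with the tools mentioned at the top, a table like this gets consulted after collecting a file's imports. The cell below is a minimal sketch, not how `depfinder` or `conda-deps` actually work: it uses the standard library `ast` module to find absolute imports, looks each top-level name up in a mapping shaped like `toppings`, and otherwise assumes the package is eponymous. `SAMPLE` is a small excerpt of the output above and `guess_packages` is a hypothetical helper. Note that the full table maps `numpy` to `numpy-base`, so a real tool would still need some judgment about which candidate to install."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import ast\n",
    "\n",
    "# a few entries copied from the output above; pass the full `toppings` for real use\n",
    "SAMPLE = {\n",
    "    'PIL': ['pillow'],\n",
    "    'bs4': ['beautifulsoup4'],\n",
    "    'dateutil': ['python-dateutil'],\n",
    "    'sklearn': ['scikit-learn'],\n",
    "}\n",
    "\n",
    "def guess_packages(source, mapping):\n",
    "    # collect the first component of every absolute import in the source\n",
    "    roots = set()\n",
    "    for node in ast.walk(ast.parse(source)):\n",
    "        if isinstance(node, ast.Import):\n",
    "            roots.update(alias.name.split('.')[0] for alias in node.names)\n",
    "        elif isinstance(node, ast.ImportFrom) and node.module and not node.level:\n",
    "            roots.add(node.module.split('.')[0])\n",
    "    # known import names map to their packages; anything else is assumed eponymous\n",
    "    return {root: mapping.get(root, [root]) for root in sorted(roots)}\n",
    "\n",
    "guess_packages('''\n",
    "import numpy as np\n",
    "from PIL import Image\n",
    "from sklearn.svm import SVC\n",
    "''', SAMPLE)"
   ]
  },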
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "That's kind of a lot! Extending this to **all the packages on conda-forge, anaconda, and bioconda** would probably only take a couple of hours, and the resulting data could be packaged, distributed, and updated (monthly?) so that dependency tools could share the effort."
   ]
  },
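  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "None of those packages would need to be unpacked first: a `.tar.bz2` conda package can be read in place with the standard `tarfile` module, though bzip2 decompression makes it slower. A minimal sketch, assuming the archive stores `info/index.json` the same way the unpacked packages above do; `toppings_from_tarball` is a hypothetical name, and fetching packages from the channels is left out."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "import tarfile\n",
    "\n",
    "def toppings_from_tarball(path):\n",
    "    # return (conda package name, [top-level import names]) without extracting to disk\n",
    "    name, exports = None, []\n",
    "    with tarfile.open(path, 'r:bz2') as tar:\n",
    "        # iterate member by member (rather than getmembers() first) so the stream is read in order\n",
    "        for member in tar:\n",
    "            if member.name == 'info/index.json':\n",
    "                name = json.load(tar.extractfile(member))['name']\n",
    "            elif member.isfile() and member.name.endswith('/top_level.txt'):\n",
    "                text = tar.extractfile(member).read().decode('utf-8')\n",
    "                exports += [line.strip() for line in text.splitlines() if line.strip()]\n",
    "    return name, exports"
   ]
  },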
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**All the wheels on PyPI** would take a bit longer, but would probably also be worth doing."
   ]
  }
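  ,
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Wheels are zip archives, so a similar approach should work there: a setuptools-built wheel carries `top_level.txt` in its `*.dist-info` directory, while wheels built by other tools may not have one at all. A minimal sketch with the standard `zipfile` module; `toppings_from_wheel` is a hypothetical name, and the distribution name comes from the `Name` field of the wheel's `METADATA` file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import zipfile\n",
    "from email.parser import HeaderParser\n",
    "\n",
    "def toppings_from_wheel(path):\n",
    "    # return (distribution name, [top-level import names]) for a .whl file\n",
    "    with zipfile.ZipFile(path) as whl:\n",
    "        # files directly inside the top-level *.dist-info directory\n",
    "        infos = [n for n in whl.namelist() if n.count('/') == 1 and '.dist-info/' in n]\n",
    "        meta = next(n for n in infos if n.endswith('/METADATA'))\n",
    "        name = HeaderParser().parsestr(whl.read(meta).decode('utf-8'))['Name']\n",
    "        tops = [n for n in infos if n.endswith('/top_level.txt')]\n",
    "        text = whl.read(tops[0]).decode('utf-8') if tops else ''\n",
    "    return name, [line.strip() for line in text.splitlines() if line.strip()]"
   ]
  }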
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}