Created February 23, 2020 05:50
Convert Datasette RST changelog to GFM for releases - for https://github.com/simonw/datasette/issues/680
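The notebook below makes relative documentation links absolute by prepending `https://datasette.readthedocs.io/en/latest/` to each `href`. That works for links such as `datasette.html#datasette`, but a fragment-only link such as `#v0-36` ends up pointing at the docs root rather than at the changelog page. A minimal sketch of an alternative, assuming the HTML was scraped from the changelog URL quoted in the notebook: the standard library's `urllib.parse.urljoin` resolves both kinds of link against the page they came from. The `absolutize` helper and the `PAGE_URL` constant are illustrative names, not part of the notebook.

```python
from urllib.parse import urljoin

# Assumption: the changelog HTML was taken from this page (the URL quoted in the notebook).
PAGE_URL = "https://datasette.readthedocs.io/en/latest/changelog.html"


def absolutize(href, page_url=PAGE_URL):
    """Resolve href against the page it appeared on; absolute URLs pass through unchanged."""
    return urljoin(page_url, href)


if __name__ == "__main__":
    for href in [
        "datasette.html#datasette",  # relative link to another docs page
        "#v0-36",  # fragment-only link within the changelog page
        "https://github.com/simonw/datasette/issues/576",  # already absolute
    ]:
        print(absolutize(href))
```

With BeautifulSoup, this could stand in for the string concatenation inside the notebook's loop, for example `a['href'] = absolutize(a['href'])`.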
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Convert Datasette RST changelog to GFM for releases\n", | |
"\n", | |
"For https://github.com/simonw/datasette/issues/680\n", | |
"\n", | |
"Question is: can I automatically take the most recent section from https://datasette.readthedocs.io/en/latest/changelog.html and convert it into Markdown suitable for automatically posting a GitHub release?\n", | |
"\n", | |
"Pandoc looks useful but isn't pure Python. https://pypi.org/project/pypandoc/ includes a prebuilt binary wheel for OS X though, so I'll start by playing with that.\n", | |
"\n", | |
" ~ $ jupyter-venv/bin/pip install pypandoc" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 1, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"from pypandoc.pandoc_download import download_pandoc" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"* Downloading pandoc from https://github.com/jgm/pandoc/releases/download/2.9.2/pandoc-2.9.2-macOS.pkg ...\n", | |
"* Unpacking pandoc-2.9.2-macOS.pkg to tempfolder...\n", | |
"* Copying pandoc to /Users/simonw/Applications/pandoc ...\n", | |
"* Making /Users/simonw/Applications/pandoc/pandoc executeable...\n", | |
"* Copying pandoc-citeproc to /Users/simonw/Applications/pandoc ...\n", | |
"* Making /Users/simonw/Applications/pandoc/pandoc-citeproc executeable...\n", | |
"* Done.\n" | |
] | |
} | |
], | |
"source": [ | |
"download_pandoc()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"import pypandoc" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 9, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"some title\n", | |
"==========\n", | |
"\n" | |
] | |
} | |
], | |
"source": [ | |
"print(pypandoc.convert_text('# some title', 'rst', format='md'))" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 10, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"changelog = '''.. _v0_36:\n", | |
"\n", | |
"0.36 (2020-02-21)\n", | |
"-----------------\n", | |
"\n", | |
"* The ``datasette`` object passed to plugins now has API documentation: :ref:`datasette`. (`#576 <https://github.com/simonw/datasette/issues/576>`__)\n", | |
"* New methods on ``datasette``: ``.add_database()`` and ``.remove_database()`` - :ref:`documentation <datasette_add_database>`. (`#671 <https://github.com/simonw/datasette/issues/671>`__)\n", | |
"* ``prepare_connection()`` plugin hook now takes optional ``datasette`` and ``database`` arguments - :ref:`plugin_hook_prepare_connection`. (`#678 <https://github.com/simonw/datasette/issues/678>`__)\n", | |
"* Added three new plugins and one new conversion tool to the :ref:`ecosystem`.'''" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 11, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
".. _v0_36:\n", | |
"\n", | |
"0.36 (2020-02-21)\n", | |
"-----------------\n", | |
"\n", | |
"* The ``datasette`` object passed to plugins now has API documentation: :ref:`datasette`. (`#576 <https://github.com/simonw/datasette/issues/576>`__)\n", | |
"* New methods on ``datasette``: ``.add_database()`` and ``.remove_database()`` - :ref:`documentation <datasette_add_database>`. (`#671 <https://github.com/simonw/datasette/issues/671>`__)\n", | |
"* ``prepare_connection()`` plugin hook now takes optional ``datasette`` and ``database`` arguments - :ref:`plugin_hook_prepare_connection`. (`#678 <https://github.com/simonw/datasette/issues/678>`__)\n", | |
"* Added three new plugins and one new conversion tool to the :ref:`ecosystem`.\n" | |
] | |
} | |
], | |
"source": [ | |
"print(changelog)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 16, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"# 0.36 (2020-02-21)\n", | |
"\n", | |
" - The `datasette` object passed to plugins now has API documentation:\n", | |
" `datasette`.\n", | |
" ([\\#576](https://github.com/simonw/datasette/issues/576))\n", | |
" - New methods on `datasette`: `.add_database()` and\n", | |
" `.remove_database()` - `documentation <datasette_add_database>`.\n", | |
" ([\\#671](https://github.com/simonw/datasette/issues/671))\n", | |
" - `prepare_connection()` plugin hook now takes optional `datasette`\n", | |
" and `database` arguments - `plugin_hook_prepare_connection`.\n", | |
" ([\\#678](https://github.com/simonw/datasette/issues/678))\n", | |
" - Added three new plugins and one new conversion tool to the\n", | |
" `ecosystem`.\n", | |
"\n" | |
] | |
} | |
], | |
"source": [ | |
"print(pypandoc.convert_text(changelog, 'gfm', format='rst'))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"This isn't a bad start, but it has one big flaw: the `:ref:datasette` references were understandably not resolved, because I didn't provide enough information for that to happen.\n", | |
"\n", | |
"One option: use rendered HTML as input instead of RST." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 26, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"changelog_html = \"\"\"\n", | |
"<div class=\"section\" id=\"v0-36\">\n", | |
"<span id=\"id2\"></span><h2>0.36 (2020-02-21)<a class=\"headerlink\" href=\"#v0-36\" title=\"Permalink to this headline\">¶</a></h2>\n", | |
"<ul class=\"simple\">\n", | |
"<li>The <code class=\"docutils literal notranslate\"><span class=\"pre\">datasette</span></code> object passed to plugins now has API documentation: <a class=\"reference internal\" href=\"datasette.html#datasette\"><span class=\"std std-ref\">Datasette class</span></a>. (<a class=\"reference external\" href=\"https://github.com/simonw/datasette/issues/576\">#576</a>)</li>\n", | |
"<li>New methods on <code class=\"docutils literal notranslate\"><span class=\"pre\">datasette</span></code>: <code class=\"docutils literal notranslate\"><span class=\"pre\">.add_database()</span></code> and <code class=\"docutils literal notranslate\"><span class=\"pre\">.remove_database()</span></code> - <a class=\"reference internal\" href=\"datasette.html#datasette-add-database\"><span class=\"std std-ref\">documentation</span></a>. (<a class=\"reference external\" href=\"https://github.com/simonw/datasette/issues/671\">#671</a>)</li>\n", | |
"<li><code class=\"docutils literal notranslate\"><span class=\"pre\">prepare_connection()</span></code> plugin hook now takes optional <code class=\"docutils literal notranslate\"><span class=\"pre\">datasette</span></code> and <code class=\"docutils literal notranslate\"><span class=\"pre\">database</span></code> arguments - <a class=\"reference internal\" href=\"plugins.html#plugin-hook-prepare-connection\"><span class=\"std std-ref\">prepare_connection(conn, database, datasette)</span></a>. (<a class=\"reference external\" href=\"https://github.com/simonw/datasette/issues/678\">#678</a>)</li>\n", | |
"<li>Added three new plugins and one new conversion tool to the <a class=\"reference internal\" href=\"ecosystem.html#ecosystem\"><span class=\"std std-ref\">The Datasette Ecosystem</span></a>.</li>\n", | |
"</ul>\n", | |
"</div>\n", | |
"\"\"\"" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 27, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"<div id=\"v0-36\" class=\"section\">\n", | |
"\n", | |
"<span id=\"id2\"></span>\n", | |
"\n", | |
"## 0.36 (2020-02-21)[¶](#v0-36 \"Permalink to this headline\")\n", | |
"\n", | |
" - The `datasette` object passed to plugins now has API documentation:\n", | |
" [<span class=\"std std-ref\">Datasette\n", | |
" class</span>](datasette.html#datasette).\n", | |
" ([\\#576](https://github.com/simonw/datasette/issues/576))\n", | |
" - New methods on `datasette`: `.add_database()` and\n", | |
" `.remove_database()` -\n", | |
" [<span class=\"std std-ref\">documentation</span>](datasette.html#datasette-add-database).\n", | |
" ([\\#671](https://github.com/simonw/datasette/issues/671))\n", | |
" - `prepare_connection()` plugin hook now takes optional `datasette`\n", | |
" and `database` arguments -\n", | |
" [<span class=\"std std-ref\">prepare\\_connection(conn, database,\n", | |
" datasette)</span>](plugins.html#plugin-hook-prepare-connection).\n", | |
" ([\\#678](https://github.com/simonw/datasette/issues/678))\n", | |
" - Added three new plugins and one new conversion tool to the\n", | |
" [<span class=\"std std-ref\">The Datasette\n", | |
" Ecosystem</span>](ecosystem.html#ecosystem).\n", | |
"\n", | |
"</div>\n", | |
"\n" | |
] | |
} | |
], | |
"source": [ | |
"print(pypandoc.convert_text(changelog_html, 'gfm', format='html'))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"That's pretty great! A few improvements:\n", | |
"\n", | |
"1. I can strip out the `Permalink to this headline` elements before running the transformation\n", | |
"2. I should resolve the relative path links to full URLs\n", | |
"3. I can probably strip out the `<span>` and `<div>` elements entirely\n", | |
"\n", | |
"It might be possible to do some of this using advanced Pandoc options - https://pandoc.org/MANUAL.html - but I'm going to play it safe and do it with BeautifulSoup instead.\n", | |
"\n", | |
"But first... let's try it against a more complex example that isn't just a single bulleted list." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 29, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"bigger_html = '''<div class=\"section\" id=\"v0-31\">\n", | |
"<span id=\"id9\"></span><h2>0.31 (2019-11-11)<a class=\"headerlink\" href=\"#v0-31\" title=\"Permalink to this headline\">¶</a></h2>\n", | |
"<p>This version adds compatibility with Python 3.8 and breaks compatibility with Python 3.5.</p>\n", | |
"<p>If you are still running Python 3.5 you should stick with <code class=\"docutils literal notranslate\"><span class=\"pre\">0.30.2</span></code>, which you can install like this:</p>\n", | |
"<div class=\"highlight-default notranslate\"><div class=\"highlight\"><pre><span></span><span class=\"n\">pip</span> <span class=\"n\">install</span> <span class=\"n\">datasette</span><span class=\"o\">==</span><span class=\"mf\">0.30</span><span class=\"o\">.</span><span class=\"mi\">2</span>\n", | |
"</pre></div>\n", | |
"</div>\n", | |
"<ul class=\"simple\">\n", | |
"<li>Format SQL button now works with read-only SQL queries - thanks, Tobias Kunze (<a class=\"reference external\" href=\"https://github.com/simonw/datasette/pull/602\">#602</a>)</li>\n", | |
"<li>New <code class=\"docutils literal notranslate\"><span class=\"pre\">?column__notin=x,y,z</span></code> filter for table views (<a class=\"reference external\" href=\"https://github.com/simonw/datasette/issues/614\">#614</a>)</li>\n", | |
"<li>Table view now uses <code class=\"docutils literal notranslate\"><span class=\"pre\">select</span> <span class=\"pre\">col1,</span> <span class=\"pre\">col2,</span> <span class=\"pre\">col3</span></code> instead of <code class=\"docutils literal notranslate\"><span class=\"pre\">select</span> <span class=\"pre\">*</span></code></li>\n", | |
"<li>Database filenames can now contain spaces - thanks, Tobias Kunze (<a class=\"reference external\" href=\"https://github.com/simonw/datasette/pull/590\">#590</a>)</li>\n", | |
"<li>Removed obsolete <code class=\"docutils literal notranslate\"><span class=\"pre\">?_group_count=col</span></code> feature (<a class=\"reference external\" href=\"https://github.com/simonw/datasette/issues/504\">#504</a>)</li>\n", | |
"<li>Improved user interface and documentation for <code class=\"docutils literal notranslate\"><span class=\"pre\">datasette</span> <span class=\"pre\">publish</span> <span class=\"pre\">cloudrun</span></code> (<a class=\"reference external\" href=\"https://github.com/simonw/datasette/issues/608\">#608</a>)</li>\n", | |
"<li>Tables with indexes now show the <code class=\"docutils literal notranslate\"><span class=\"pre\">CREATE</span> <span class=\"pre\">INDEX</span></code> statements on the table page (<a class=\"reference external\" href=\"https://github.com/simonw/datasette/issues/618\">#618</a>)</li>\n", | |
"<li>Current version of <a class=\"reference external\" href=\"https://www.uvicorn.org/\">uvicorn</a> is now shown on <code class=\"docutils literal notranslate\"><span class=\"pre\">/-/versions</span></code></li>\n", | |
"<li>Python 3.8 is now supported! (<a class=\"reference external\" href=\"https://github.com/simonw/datasette/issues/622\">#622</a>)</li>\n", | |
"<li>Python 3.5 is no longer supported.</li>\n", | |
"</ul>\n", | |
"</div>'''" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 30, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"<div id=\"v0-31\" class=\"section\">\n", | |
"\n", | |
"<span id=\"id9\"></span>\n", | |
"\n", | |
"## 0.31 (2019-11-11)[¶](#v0-31 \"Permalink to this headline\")\n", | |
"\n", | |
"This version adds compatibility with Python 3.8 and breaks compatibility\n", | |
"with Python 3.5.\n", | |
"\n", | |
"If you are still running Python 3.5 you should stick with `0.30.2`,\n", | |
"which you can install like this:\n", | |
"\n", | |
"<div class=\"highlight-default notranslate\">\n", | |
"\n", | |
"<div class=\"highlight\">\n", | |
"\n", | |
" pip install datasette==0.30.2\n", | |
"\n", | |
"</div>\n", | |
"\n", | |
"</div>\n", | |
"\n", | |
" - Format SQL button now works with read-only SQL queries - thanks,\n", | |
" Tobias Kunze ([\\#602](https://github.com/simonw/datasette/pull/602))\n", | |
" - New `?column__notin=x,y,z` filter for table views\n", | |
" ([\\#614](https://github.com/simonw/datasette/issues/614))\n", | |
" - Table view now uses `select col1, col2, col3` instead of `select *`\n", | |
" - Database filenames can now contain spaces - thanks, Tobias Kunze\n", | |
" ([\\#590](https://github.com/simonw/datasette/pull/590))\n", | |
" - Removed obsolete `?_group_count=col` feature\n", | |
" ([\\#504](https://github.com/simonw/datasette/issues/504))\n", | |
" - Improved user interface and documentation for `datasette publish\n", | |
" cloudrun` ([\\#608](https://github.com/simonw/datasette/issues/608))\n", | |
" - Tables with indexes now show the `CREATE INDEX` statements on the\n", | |
" table page ([\\#618](https://github.com/simonw/datasette/issues/618))\n", | |
" - Current version of [uvicorn](https://www.uvicorn.org/) is now shown\n", | |
" on `/-/versions`\n", | |
" - Python 3.8 is now supported\\!\n", | |
" ([\\#622](https://github.com/simonw/datasette/issues/622))\n", | |
" - Python 3.5 is no longer supported.\n", | |
"\n", | |
"</div>\n", | |
"\n" | |
] | |
} | |
], | |
"source": [ | |
"print(pypandoc.convert_text(bigger_html, 'gfm', format='html'))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"This looks pretty great rendered! https://gist.github.com/simonw/bec7efcee94b4fe7215ddb95ee6a3ec1#v0-31\n", | |
"\n", | |
"I can strip out the `<h2>` (including the permalink thing) entirely since that will turn into the title of the release.\n", | |
"\n", | |
"So all I really need to do is resolve those links into full URLs." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 31, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"from bs4 import BeautifulSoup as Soup" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 34, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"soup = Soup(changelog_html)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 38, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"[<a class=\"headerlink\" href=\"#v0-36\" title=\"Permalink to this headline\">¶</a>,\n", | |
" <a class=\"reference internal\" href=\"datasette.html#datasette\"><span class=\"std std-ref\">Datasette class</span></a>,\n", | |
" <a class=\"reference internal\" href=\"datasette.html#datasette-add-database\"><span class=\"std std-ref\">documentation</span></a>,\n", | |
" <a class=\"reference internal\" href=\"plugins.html#plugin-hook-prepare-connection\"><span class=\"std std-ref\">prepare_connection(conn, database, datasette)</span></a>,\n", | |
" <a class=\"reference internal\" href=\"ecosystem.html#ecosystem\"><span class=\"std std-ref\">The Datasette Ecosystem</span></a>]" | |
] | |
}, | |
"execution_count": 38, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"relative_links = [\n", | |
" a for a in soup.find_all(\"a\")\n", | |
" if not a['href'].startswith(('https://', 'http://'))\n", | |
"]\n", | |
"relative_links" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 39, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"for a in relative_links:\n", | |
" a['href'] = 'https://datasette.readthedocs.io/en/latest/' + a['href']" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 44, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"<html><body><div class=\"section\" id=\"v0-36\">\n", | |
"<span id=\"id2\"></span><h2>0.36 (2020-02-21)<a class=\"headerlink\" href=\"https://datasette.readthedocs.io/en/latest/#v0-36\" title=\"Permalink to this headline\">¶</a></h2>\n", | |
"<ul class=\"simple\">\n", | |
"<li>The <code class=\"docutils literal notranslate\"><span class=\"pre\">datasette</span></code> object passed to plugins now has API documentation: <a class=\"reference internal\" href=\"https://datasette.readthedocs.io/en/latest/datasette.html#datasette\"><span class=\"std std-ref\">Datasette class</span></a>. (<a class=\"reference external\" href=\"https://github.com/simonw/datasette/issues/576\">#576</a>)</li>\n", | |
"<li>New methods on <code class=\"docutils literal notranslate\"><span class=\"pre\">datasette</span></code>: <code class=\"docutils literal notranslate\"><span class=\"pre\">.add_database()</span></code> and <code class=\"docutils literal notranslate\"><span class=\"pre\">.remove_database()</span></code> - <a class=\"reference internal\" href=\"https://datasette.readthedocs.io/en/latest/datasette.html#datasette-add-database\"><span class=\"std std-ref\">documentation</span></a>. (<a class=\"reference external\" href=\"https://github.com/simonw/datasette/issues/671\">#671</a>)</li>\n", | |
"<li><code class=\"docutils literal notranslate\"><span class=\"pre\">prepare_connection()</span></code> plugin hook now takes optional <code class=\"docutils literal notranslate\"><span class=\"pre\">datasette</span></code> and <code class=\"docutils literal notranslate\"><span class=\"pre\">database</span></code> arguments - <a class=\"reference internal\" href=\"https://datasette.readthedocs.io/en/latest/plugins.html#plugin-hook-prepare-connection\"><span class=\"std std-ref\">prepare_connection(conn, database, datasette)</span></a>. (<a class=\"reference external\" href=\"https://github.com/simonw/datasette/issues/678\">#678</a>)</li>\n", | |
"<li>Added three new plugins and one new conversion tool to the <a class=\"reference internal\" href=\"https://datasette.readthedocs.io/en/latest/ecosystem.html#ecosystem\"><span class=\"std std-ref\">The Datasette Ecosystem</span></a>.</li>\n", | |
"</ul>\n", | |
"</div>\n", | |
"</body></html>\n" | |
] | |
} | |
], | |
"source": [ | |
"print(soup)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 49, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"'0.36 (2020-02-21)¶'" | |
] | |
}, | |
"execution_count": 49, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# Now strip out that h2\n", | |
"h2 = soup.find(\"h2\")\n", | |
"title = h2.text\n", | |
"h2.extract()\n", | |
"title" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 51, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"<html><body><div class=\"section\" id=\"v0-36\">\n", | |
"<span id=\"id2\"></span>\n", | |
"<ul class=\"simple\">\n", | |
"<li>The <code class=\"docutils literal notranslate\"><span class=\"pre\">datasette</span></code> object passed to plugins now has API documentation: <a class=\"reference internal\" href=\"https://datasette.readthedocs.io/en/latest/datasette.html#datasette\"><span class=\"std std-ref\">Datasette class</span></a>. (<a class=\"reference external\" href=\"https://github.com/simonw/datasette/issues/576\">#576</a>)</li>\n", | |
"<li>New methods on <code class=\"docutils literal notranslate\"><span class=\"pre\">datasette</span></code>: <code class=\"docutils literal notranslate\"><span class=\"pre\">.add_database()</span></code> and <code class=\"docutils literal notranslate\"><span class=\"pre\">.remove_database()</span></code> - <a class=\"reference internal\" href=\"https://datasette.readthedocs.io/en/latest/datasette.html#datasette-add-database\"><span class=\"std std-ref\">documentation</span></a>. (<a class=\"reference external\" href=\"https://github.com/simonw/datasette/issues/671\">#671</a>)</li>\n", | |
"<li><code class=\"docutils literal notranslate\"><span class=\"pre\">prepare_connection()</span></code> plugin hook now takes optional <code class=\"docutils literal notranslate\"><span class=\"pre\">datasette</span></code> and <code class=\"docutils literal notranslate\"><span class=\"pre\">database</span></code> arguments - <a class=\"reference internal\" href=\"https://datasette.readthedocs.io/en/latest/plugins.html#plugin-hook-prepare-connection\"><span class=\"std std-ref\">prepare_connection(conn, database, datasette)</span></a>. (<a class=\"reference external\" href=\"https://github.com/simonw/datasette/issues/678\">#678</a>)</li>\n", | |
"<li>Added three new plugins and one new conversion tool to the <a class=\"reference internal\" href=\"https://datasette.readthedocs.io/en/latest/ecosystem.html#ecosystem\"><span class=\"std std-ref\">The Datasette Ecosystem</span></a>.</li>\n", | |
"</ul>\n", | |
"</div>\n", | |
"</body></html>\n" | |
] | |
} | |
], | |
"source": [ | |
"print(soup)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 52, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"<div id=\"v0-36\" class=\"section\">\n", | |
"\n", | |
"<span id=\"id2\"></span>\n", | |
"\n", | |
" - The `datasette` object passed to plugins now has API documentation:\n", | |
" [<span class=\"std std-ref\">Datasette\n", | |
" class</span>](https://datasette.readthedocs.io/en/latest/datasette.html#datasette).\n", | |
" ([\\#576](https://github.com/simonw/datasette/issues/576))\n", | |
" - New methods on `datasette`: `.add_database()` and\n", | |
" `.remove_database()` -\n", | |
" [<span class=\"std std-ref\">documentation</span>](https://datasette.readthedocs.io/en/latest/datasette.html#datasette-add-database).\n", | |
" ([\\#671](https://github.com/simonw/datasette/issues/671))\n", | |
" - `prepare_connection()` plugin hook now takes optional `datasette`\n", | |
" and `database` arguments -\n", | |
" [<span class=\"std std-ref\">prepare\\_connection(conn, database,\n", | |
" datasette)</span>](https://datasette.readthedocs.io/en/latest/plugins.html#plugin-hook-prepare-connection).\n", | |
" ([\\#678](https://github.com/simonw/datasette/issues/678))\n", | |
" - Added three new plugins and one new conversion tool to the\n", | |
" [<span class=\"std std-ref\">The Datasette\n", | |
" Ecosystem</span>](https://datasette.readthedocs.io/en/latest/ecosystem.html#ecosystem).\n", | |
"\n", | |
"</div>\n", | |
"\n" | |
] | |
} | |
], | |
"source": [ | |
"print(pypandoc.convert_text(str(soup), 'gfm', format='html'))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"I think this is good enough: https://gist.github.com/simonw/bec7efcee94b4fe7215ddb95ee6a3ec1#file-final-md\n", | |
"\n", | |
"I wonder if stripping `id=` attributes will get rid of those divs and spans?" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 58, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"for el in soup.find_all():\n", | |
" if \"id\" in el.attrs:\n", | |
" del el.attrs[\"id\"]" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 59, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"<div class=\"section\">\n", | |
"\n", | |
"<span></span>\n", | |
"\n", | |
" - The `datasette` object passed to plugins now has API documentation:\n", | |
" [<span class=\"std std-ref\">Datasette\n", | |
" class</span>](https://datasette.readthedocs.io/en/latest/datasette.html#datasette).\n", | |
" ([\\#576](https://github.com/simonw/datasette/issues/576))\n", | |
" - New methods on `datasette`: `.add_database()` and\n", | |
" `.remove_database()` -\n", | |
" [<span class=\"std std-ref\">documentation</span>](https://datasette.readthedocs.io/en/latest/datasette.html#datasette-add-database).\n", | |
" ([\\#671](https://github.com/simonw/datasette/issues/671))\n", | |
" - `prepare_connection()` plugin hook now takes optional `datasette`\n", | |
" and `database` arguments -\n", | |
" [<span class=\"std std-ref\">prepare\\_connection(conn, database,\n", | |
" datasette)</span>](https://datasette.readthedocs.io/en/latest/plugins.html#plugin-hook-prepare-connection).\n", | |
" ([\\#678](https://github.com/simonw/datasette/issues/678))\n", | |
" - Added three new plugins and one new conversion tool to the\n", | |
" [<span class=\"std std-ref\">The Datasette\n", | |
" Ecosystem</span>](https://datasette.readthedocs.io/en/latest/ecosystem.html#ecosystem).\n", | |
"\n", | |
"</div>\n", | |
"\n" | |
] | |
} | |
], | |
"source": [ | |
"print(pypandoc.convert_text(str(soup), 'gfm', format='html'))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"I think I want to disable the `raw_html` extension, which is enabled by default in `gfm`." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 61, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"ename": "RuntimeError", | |
"evalue": "Pandoc died with exitcode \"6\" during conversion: b'Unknown option --raw_html.\\nTry pandoc --help for more information.\\n'", | |
"output_type": "error", | |
"traceback": [ | |
"---------------------------------------------------------------------------", | |
"RuntimeError Traceback (most recent call last)", | |
"<ipython-input-61-5acb1858a48a> in <module>\n----> 1 print(pypandoc.convert_text(str(soup), 'gfm', format='html', extra_args=[\"--raw_html\"]))\n", | |
"~/jupyter-venv/lib/python3.7/site-packages/pypandoc/__init__.py in convert_text(source, to, format, extra_args, encoding, outputfile, filters)\n 101 source = _as_unicode(source, encoding)\n 102 return _convert_input(source, format, 'string', to, extra_args=extra_args,\n--> 103 outputfile=outputfile, filters=filters)\n 104 \n 105 \n", | |
"~/jupyter-venv/lib/python3.7/site-packages/pypandoc/__init__.py in _convert_input(source, format, input_type, to, extra_args, outputfile, filters)\n 323 if p.returncode != 0:\n 324 raise RuntimeError(\n--> 325 'Pandoc died with exitcode \"%s\" during conversion: %s' % (p.returncode, stderr)\n 326 )\n 327 \n", | |
"RuntimeError: Pandoc died with exitcode \"6\" during conversion: b'Unknown option --raw_html.\\nTry pandoc --help for more information.\\n'" | |
] | |
} | |
], | |
"source": [ | |
"print(pypandoc.convert_text(str(soup), 'gfm', format='html', extra_args=[\"--raw_html\"]))" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 62, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"'/Users/simonw/Applications/pandoc/pandoc'" | |
] | |
}, | |
"execution_count": 62, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"pypandoc.get_pandoc_path()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
" ~ $ /Users/simonw/Applications/pandoc/pandoc --list-extensions=gfm\n", | |
" +all_symbols_escapable\n", | |
" -ascii_identifiers\n", | |
" +auto_identifiers\n", | |
" +autolink_bare_uris\n", | |
" +backtick_code_blocks\n", | |
" -east_asian_line_breaks\n", | |
" +emoji\n", | |
" +fenced_code_blocks\n", | |
" +gfm_auto_identifiers\n", | |
" -hard_line_breaks\n", | |
" +intraword_underscores\n", | |
" +lists_without_preceding_blankline\n", | |
" +pipe_tables\n", | |
" +raw_html\n", | |
" -raw_tex\n", | |
" +shortcut_reference_links\n", | |
" -smart\n", | |
" +space_in_atx_header\n", | |
" +strikeout\n", | |
" +task_lists\n", | |
"\n", | |
"https://pandoc.org/MANUAL.html#extensions\n", | |
"\n", | |
"> An extension can be enabled by adding +EXTENSION to the format name and disabled by adding -EXTENSION. For example, --from markdown_strict+footnotes is strict Markdown with footnotes enabled, while --from markdown-footnotes-pipe_tables is pandoc’s Markdown without footnotes or pipe tables." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 63, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
" - The `datasette` object passed to plugins now has API documentation:\n", | |
" [Datasette\n", | |
" class](https://datasette.readthedocs.io/en/latest/datasette.html#datasette).\n", | |
" ([\\#576](https://github.com/simonw/datasette/issues/576))\n", | |
" - New methods on `datasette`: `.add_database()` and\n", | |
" `.remove_database()` -\n", | |
" [documentation](https://datasette.readthedocs.io/en/latest/datasette.html#datasette-add-database).\n", | |
" ([\\#671](https://github.com/simonw/datasette/issues/671))\n", | |
" - `prepare_connection()` plugin hook now takes optional `datasette`\n", | |
" and `database` arguments - [prepare\\_connection(conn, database,\n", | |
" datasette)](https://datasette.readthedocs.io/en/latest/plugins.html#plugin-hook-prepare-connection).\n", | |
" ([\\#678](https://github.com/simonw/datasette/issues/678))\n", | |
" - Added three new plugins and one new conversion tool to the [The\n", | |
" Datasette\n", | |
" Ecosystem](https://datasette.readthedocs.io/en/latest/ecosystem.html#ecosystem).\n", | |
"\n" | |
] | |
} | |
], | |
"source": [ | |
"print(pypandoc.convert_text(str(soup), 'gfm-raw_html', format='html'))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"https://gist.github.com/simonw/bec7efcee94b4fe7215ddb95ee6a3ec1#file-gfm-raw_html-md\n", | |
"\n", | |
"This is **great**! I'd like to ditch the line wrapping though." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 64, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
" - The `datasette` object passed to plugins now has API documentation: [Datasette class](https://datasette.readthedocs.io/en/latest/datasette.html#datasette). ([\\#576](https://github.com/simonw/datasette/issues/576))\n", | |
" - New methods on `datasette`: `.add_database()` and `.remove_database()` - [documentation](https://datasette.readthedocs.io/en/latest/datasette.html#datasette-add-database). ([\\#671](https://github.com/simonw/datasette/issues/671))\n", | |
" - `prepare_connection()` plugin hook now takes optional `datasette` and `database` arguments - [prepare\\_connection(conn, database, datasette)](https://datasette.readthedocs.io/en/latest/plugins.html#plugin-hook-prepare-connection). ([\\#678](https://github.com/simonw/datasette/issues/678))\n", | |
" - Added three new plugins and one new conversion tool to the [The Datasette Ecosystem](https://datasette.readthedocs.io/en/latest/ecosystem.html#ecosystem).\n", | |
"\n" | |
] | |
} | |
], | |
"source": [ | |
"print(pypandoc.convert_text(str(soup), 'gfm-raw_html', format='html', extra_args=[\"--wrap=none\"]))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Finally, try it again on the larger example." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 65, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"## 0.31 (2019-11-11)[¶](#v0-31 \"Permalink to this headline\")\n", | |
"\n", | |
"This version adds compatibility with Python 3.8 and breaks compatibility with Python 3.5.\n", | |
"\n", | |
"If you are still running Python 3.5 you should stick with `0.30.2`, which you can install like this:\n", | |
"\n", | |
" pip install datasette==0.30.2\n", | |
"\n", | |
" - Format SQL button now works with read-only SQL queries - thanks, Tobias Kunze ([\\#602](https://github.com/simonw/datasette/pull/602))\n", | |
" - New `?column__notin=x,y,z` filter for table views ([\\#614](https://github.com/simonw/datasette/issues/614))\n", | |
" - Table view now uses `select col1, col2, col3` instead of `select *`\n", | |
" - Database filenames can now contain spaces - thanks, Tobias Kunze ([\\#590](https://github.com/simonw/datasette/pull/590))\n", | |
" - Removed obsolete `?_group_count=col` feature ([\\#504](https://github.com/simonw/datasette/issues/504))\n", | |
" - Improved user interface and documentation for `datasette publish cloudrun` ([\\#608](https://github.com/simonw/datasette/issues/608))\n", | |
" - Tables with indexes now show the `CREATE INDEX` statements on the table page ([\\#618](https://github.com/simonw/datasette/issues/618))\n", | |
" - Current version of [uvicorn](https://www.uvicorn.org/) is now shown on `/-/versions`\n", | |
" - Python 3.8 is now supported\\! ([\\#622](https://github.com/simonw/datasette/issues/622))\n", | |
" - Python 3.5 is no longer supported.\n", | |
"\n" | |
] | |
} | |
], | |
"source": [ | |
"print(pypandoc.convert_text(bigger_html, 'gfm-raw_html', format='html', extra_args=[\"--wrap=none\"]))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Nailed it!" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 3", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.7.5" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 2 | |
} |
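
The notebook ends with a single `convert_text` call that disables the `raw_html` extension on the format name and turns off line wrapping. As a rough sketch of how that could be packaged for reuse, the helpers below build the format string following the `+EXTENSION` / `-EXTENSION` rule quoted from the pandoc manual. The names `pandoc_format` and `html_to_release_markdown` are hypothetical and do not appear in the notebook; the second function assumes `pypandoc` and a pandoc binary are installed, as in the cells above.

```python
from typing import Iterable


def pandoc_format(base: str, enable: Iterable[str] = (), disable: Iterable[str] = ()) -> str:
    """Build a pandoc format name such as 'gfm-raw_html'.

    Hypothetical helper: enabled extensions are appended with '+',
    disabled ones with '-', as described in the pandoc manual.
    """
    parts = [base]
    parts.extend("+" + name for name in enable)
    parts.extend("-" + name for name in disable)
    return "".join(parts)


def html_to_release_markdown(html: str) -> str:
    """Convert HTML to unwrapped GFM, mirroring the final notebook cell.

    Assumes pypandoc and a pandoc binary are available.
    """
    import pypandoc  # imported lazily so pandoc_format() works without it

    return pypandoc.convert_text(
        html,
        pandoc_format("gfm", disable=["raw_html"]),
        format="html",
        extra_args=["--wrap=none"],
    )
```

Calling `html_to_release_markdown(bigger_html)` should be equivalent to the last code cell. Keeping the format string in its own function makes it easy to toggle other extensions from the `--list-extensions=gfm` output without touching the conversion call.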