lorinc/german_sentence_deck.ipynb

## german_sentence_deck.ipynb
{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "German_Sentence_Deck.ipynb",
      "provenance": [],
      "collapsed_sections": [],
      "include_colab_link": true
    },
    "kernelspec": {
      "name": "python2",
      "display_name": "Python 2"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/gist/lorinc/edbe6ef72eb6f8259ab6c30b715170e7/german_sentence_deck.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "LiUS5Hq7LzU6",
        "colab_type": "text"
      },
      "source": [
        "# An evidence-based language-acquisition method\n",
        "\n",
        "> *\"The Martians\" were a group of prominent Hungarian scientists of Jewish descent (mostly, but not exclusively, physicists and mathematicians) who emigrated to the United States in the early half of the 20th century. They included, among others, Theodore von Kármán, John von Neumann, Paul Halmos, Eugene Wigner, Edward Teller, George Pólya, John G. Kemeny and Paul Erdős.* -- Wikipedia\n",
        "\n",
        "> The method of language aquisition of these geniuses? They memorized books.\n",
        "\n",
        "## This is an executable script, not an article\n",
        "\n",
        "This is a [Google Colaboratory](https://research.google.com/colaboratory/faq.html) notebook. Upon execution, it generates a deck of cards for German spaced repetition learning.\n",
        "\n",
        "## Resources and tools\n",
        "- 300k+ German / English sentence pairs: [tatoeba.org](tatoeba.org)\n",
        "- German word frequency list: [DeReWo](http://www1.ids-mannheim.de/kl/projekte/methoden/derewo.html) by [Das Leibniz-Institut für Deutsche Sprache](https://www.uni-muenchen.de/)\n",
        "- [Anki](https://apps.ankiweb.net/), an Open Source [spaced repetition](https://en.wikipedia.org/wiki/Spaced_repetition) learning tool\n",
        "- Execution environment: [Google Colaboratory](https://research.google.com/colaboratory/faq.html)\n",
        "- Code repo: https://gist.github.com/lorinc/edbe6ef72eb6f8259ab6c30b715170e7\n",
        "- Notebook rendering: [nbviewer.jupyter.org](https://nbviewer.jupyter.org/gist/lorinc/edbe6ef72eb6f8259ab6c30b715170e7/german_sentence_deck.ipynb)\n",
        "\n",
        "## TL;DR\n",
        "Results of this script is freely and conveniently available.\n",
        "\n",
        "1. Download and install [Anki](https://apps.ankiweb.net/), an Open Source [spaced repetition](https://en.wikipedia.org/wiki/Spaced_repetition) learning tool for Linux, Android and Windows.\n",
        "2. From this tool, get the [shared deck of learning cards](https://ankiweb.net/shared/info/172351159) this script generates.\n",
        "3. **Learn every day while you commute, get fluent.**"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "MjU5ZcS25G27",
        "colab_type": "code",
        "cellView": "both",
        "colab": {}
      },
      "source": [
        "%%capture\n",
        "%%bash\n",
        "\n",
        "# cleaning up residues from past executions and sample data folder\n",
        "\n",
        "rm *\n",
        "rm -rf sample_data\n",
        "\n",
        "# downloading the tatoeba corpus\n",
        "\n",
        "wget -nv http://downloads.tatoeba.org/exports/sentences_detailed.tar.bz2 \\\n",
        "         http://downloads.tatoeba.org/exports/user_languages.tar.bz2 \\\n",
        "         http://downloads.tatoeba.org/exports/links.tar.bz2\n",
        "\n",
        "# downloading a 10k German word frequency list\n",
        "wget -nv http://www1.ids-mannheim.de/fileadmin/kl/derewo/DeReKo-2014-II-MainArchive-STT.100000.freq.7z\n",
        "\n",
        "# 7z is already pre-installed on hosted free Colab\n",
        "7z e DeReKo-2014-II-MainArchive-STT.100000.freq.7z\n",
        "mv DeReKo-2014-II-MainArchive-STT.100000.freq freq.csv\n",
        "\n",
        "# extracting tatoeba corpus\n",
        "\n",
        "tar xvjf sentences_detailed.tar.bz2 \n",
        "tar xvjf user_languages.tar.bz2\n",
        "tar xvjf links.tar.bz2\n",
        "\n",
        "# cleaning up\n",
        "rm *.bz2\n",
        "rm *.7z\n",
        "rm *.readme\n",
        "\n",
        "# show files\n",
        "ls -la"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "UAPeTojXdvu8",
        "colab_type": "code",
        "cellView": "both",
        "colab": {}
      },
      "source": [
        "%%bash\n",
        "\n",
        "# in tatoeba CSVs null is represented as a '\\N' string\n",
        "\n",
        "# selecting reference users, whose translations will be used\n",
        "grep -P '^eng\\t[45]' user_languages.csv > eng_users.csv\n",
        "grep -P '^deu\\t[45]' user_languages.csv > deu_users.csv\n",
        "\n",
        "# German sentences (length, owner, punctuation)\n",
        "awk -F, '\n",
        "  BEGIN {FS=\"\\t\"};\n",
        "  {\n",
        "    if (($2 == \"deu\" &&\n",
        "      $4 != \"\\\\N\" &&\n",
        "      length($3) > 40 &&\n",
        "      length($3) < 100 &&\n",
        "      substr($3, 1, length($3)-1) !~ /[\\.\\?\\!]/))\n",
        "    print $0\n",
        "  } ' sentences_detailed.csv > deu.csv\n",
        "\n",
        "# English sentences (owner)\n",
        "awk -F, '\n",
        "  BEGIN {FS=\"\\t\"};\n",
        "  {\n",
        "    if (($2 == \"eng\" && $4 != \"\\\\N\")) print $0\n",
        "  } ' sentences_detailed.csv > eng.csv"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "tFf8P3bL9TBN",
        "colab_type": "code",
        "cellView": "both",
        "colab": {}
      },
      "source": [
        "# pulling data into Pandas DataFrames\n",
        "\n",
        "import warnings\n",
        "import pandas as pd\n",
        "\n",
        "# suppressing futurewarnings\n",
        "warnings.simplefilter(action='ignore', category=FutureWarning)\n",
        "\n",
        "freq = pd.read_csv(\n",
        "                'freq.csv', \n",
        "                sep='\\t', \n",
        "                header=None,\n",
        "                names=['word', 'lemma', 'POS_tag', 'POS_confidence'])\n",
        "\n",
        "links = pd.read_csv('links.csv',\n",
        "                 delimiter='\\t',\n",
        "                 error_bad_lines=False,\n",
        "                 warn_bad_lines=True,\n",
        "                 index_col=0,\n",
        "                 header=None,\n",
        "                 mangle_dupe_cols=True)\n",
        "\n",
        "eng_sentences = pd.read_csv('eng.csv',\n",
        "                 delimiter='\\t',\n",
        "                 error_bad_lines=False,\n",
        "                 warn_bad_lines=True,\n",
        "                 index_col=0,\n",
        "                 usecols=[0,2,3],\n",
        "                 names=['id', 'text', 'owner'],\n",
        "                 header=None)\n",
        "\n",
        "deu_sentences = pd.read_csv('deu.csv',\n",
        "                 delimiter='\\t',\n",
        "                 error_bad_lines=False,\n",
        "                 warn_bad_lines=True,\n",
        "                 index_col=0,\n",
        "                 usecols=[0,2,3],\n",
        "                 names=['id', 'text', 'owner'],\n",
        "                 header=None)\n",
        "\n",
        "eng_users = pd.read_csv('eng_users.csv',\n",
        "                 delimiter='\\t',\n",
        "                 error_bad_lines=False,\n",
        "                 warn_bad_lines=True,\n",
        "                 index_col=0,\n",
        "                 usecols=[2],\n",
        "                 names=['owner'],\n",
        "                 header=None)\n",
        "\n",
        "deu_users = pd.read_csv('deu_users.csv',\n",
        "                 delimiter='\\t',\n",
        "                 error_bad_lines=False,\n",
        "                 warn_bad_lines=True,\n",
        "                 index_col=0,\n",
        "                 usecols=[2],\n",
        "                 names=['owner'],\n",
        "                 header=None)\n"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "ZPIGC6y1Mht4",
        "colab_type": "code",
        "cellView": "both",
        "colab": {}
      },
      "source": [
        "translations = links.join(deu_sentences, how='right')\\\n",
        "     .dropna().set_index(1)\\\n",
        "     .join(eng_sentences, how='right', lsuffix='_deu', rsuffix='_eng')\\\n",
        "     .dropna().loc[:,['text_deu','text_eng']].reset_index(drop=True)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "gGc-KiSyWFhl",
        "colab_type": "code",
        "cellView": "both",
        "colab": {}
      },
      "source": [
        "# cleaning / trimming down the frequency list\n",
        "# and also diminishing the rarity as it grows\n",
        "\n",
        "bad_POS = ['TRUNC','$(','$,','$.','156259594','XY', 'CARD', 'NE']\n",
        "bad_lemma = ['UNKNOWN', 'unknown']\n",
        "\n",
        "POS_filter = ~freq.POS_tag.isin(bad_POS)\n",
        "lemma_filter = ~freq.lemma.isin(bad_lemma)\n",
        "\n",
        "freq = freq[POS_filter]\n",
        "freq = freq[lemma_filter]\n",
        "\n",
        "freq['log_freq'] = (\n",
        "    pd.np.log(\n",
        "        freq.index\n",
        "        .astype(pd.np.int64)\n",
        "        +1 # there I fixed np.log(0) with a ducktape\n",
        "    )\n",
        "    .astype(pd.np.int)\n",
        ")\n",
        "\n",
        "# preparing the German sentences to be probed against the frequency list\n",
        "word_lists = (\n",
        "    translations['text_deu']\n",
        "    .str.replace(r'[,:]', '')\n",
        "    .str.split()\n",
        ")\n",
        "\n",
        "# left join the frequency list to every single word list\n",
        "# and deriving the median rarity of the words in the sentence\n",
        "translations['rarity'] = (\n",
        "    word_lists\n",
        "      .apply(\n",
        "          lambda word_list: \n",
        "            pd.DataFrame(word_list)\n",
        "              .merge(freq, left_on=0, right_on='word', how='left')['log_freq']\n",
        "              .median()\n",
        "      )\n",
        ")\n",
        "\n",
        "# calculating the complexity value\n",
        "translations['complexity'] = (\n",
        "    translations.text_deu.str.len()\n",
        "    *\n",
        "    translations.rarity\n",
        ")\n",
        "\n",
        "# sorting sentences by complexity, then resetting index\n",
        "# ready to export\n",
        "(\n",
        "    translations\n",
        "      .sort_values('complexity', ascending=True)\n",
        "      .reset_index(drop=True, inplace=True)\n",
        ")"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "MTu70pIhErbd",
        "colab_type": "code",
        "cellView": "both",
        "colab": {}
      },
      "source": [
        "%%capture\n",
        "from google.colab import files\n",
        "\n",
        "url = ('https://translate.google.com/' +\n",
        "       'translate_tts?ie=UTF-8&tl=de-DE&client=tw-ob&q=')\n",
        "\n",
        "# making the German text Google Translate URL compatible\n",
        "translations['audio'] = (url +\n",
        "\n",
        "      translations['text_deu'].str.replace('[ \\'\\\"]', '+') + '+')"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "uC0PjIK_rPV5",
        "colab_type": "code",
        "outputId": "03638916-0f37-44b3-eda5-bf70fd544ba2",
        "cellView": "both",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 1000
        }
      },
      "source": [
        "# taking a look at the results. Remember - grammar does not count into\n",
        "# complexity, only how common the words are in the sentence\n",
        "(translations[['text_deu', 'text_eng', 'complexity']]\n",
        "   .sample(50)\n",
        "   .set_index(['text_deu','text_eng'])\n",
        "   .sort_values('complexity'))"
      ],
      "execution_count": 0,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th></th>\n",
              "      <th>complexity</th>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>text_deu</th>\n",
              "      <th>text_eng</th>\n",
              "      <th></th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>Tom sagte, dass nicht nur er es hasse, das zu tun.</th>\n",
              "      <th>Tom said that he wasn't the only one who hated doing that.</th>\n",
              "      <td>175.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>Das ist nur ein vorübergehender Rückschlag.</th>\n",
              "      <th>This is only a temporary setback.</th>\n",
              "      <td>180.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>Ich weiß, dass ich hier nicht willkommen bin.</th>\n",
              "      <th>I know I'm not welcome here.</th>\n",
              "      <td>184.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>Ich kann diesen Rechner nicht reparieren.</th>\n",
              "      <th>I can't fix this computer.</th>\n",
              "      <td>184.5</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>Ich möchte so weit weg von hier, wie ich kann.</th>\n",
              "      <th>I want to get as far away from here as I can.</th>\n",
              "      <td>188.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>Ich war erfolgreich, weil ich Glück hatte.</th>\n",
              "      <th>The reason I succeeded was because I was lucky.</th>\n",
              "      <td>193.5</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>Ich habe zwei Jahre in Rio de Janeiro gearbeitet.</th>\n",
              "      <th>I worked in Rio de Janeiro for two years.</th>\n",
              "      <td>196.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>Sein Gehalt wurde um zehn Prozent erhöht.</th>\n",
              "      <th>His salary was increased by ten percent.</th>\n",
              "      <td>210.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>Ich sollte wohl besser allein hineingehen.</th>\n",
              "      <th>I think I should go in alone.</th>\n",
              "      <td>210.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>Ich habe es niemandem gesagt, selbst Tom nicht.</th>\n",
              "      <th>I haven't told anyone, not even Tom.</th>\n",
              "      <td>211.5</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>Wir unterhielten uns bei einer Tasse Kaffee.</th>\n",
              "      <th>We talked over a cup of coffee.</th>\n",
              "      <td>220.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>Tom hat seinen Schirm in der Klasse vergessen.</th>\n",
              "      <th>Tom left his umbrella in the classroom.</th>\n",
              "      <td>230.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>Sie hätten mir die Wahrheit sagen sollen.</th>\n",
              "      <th>You should've told me the truth.</th>\n",
              "      <td>231.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>Werden Sie Tom anrufen, oder möchten Sie, dass ich das tu?</th>\n",
              "      <th>Are you going to call Tom or do you want me to?</th>\n",
              "      <td>236.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>Wir können nicht einfach nur herumsitzen und gar nicht tun.</th>\n",
              "      <th>We can't just sit around and do nothing.</th>\n",
              "      <td>240.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>Tom spielt nicht nur Mundharmonika, sondern auch Gitarre.</th>\n",
              "      <th>Not only does Tom play the harmonica, he plays the guitar, too.</th>\n",
              "      <td>256.5</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>Das Mädchen mit den blauen Augen ist Jane.</th>\n",
              "      <th>The girl with blue eyes is Jane.</th>\n",
              "      <td>258.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>Er hat gute Aussichten, gewählt zu werden.</th>\n",
              "      <th>He has good chances of being chosen.</th>\n",
              "      <td>258.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>Wie kann man die Gefahren des Internets meiden?</th>\n",
              "      <th>How can you avoid the dangers of the Internet?</th>\n",
              "      <td>258.5</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>Ich arbeitete den ganzen Tag auf dem Bauernhof.</th>\n",
              "      <th>I worked on the farm all day.</th>\n",
              "      <td>258.5</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>Dies allein reicht schon aus, um uns zu überzeugen.</th>\n",
              "      <th>This alone is enough to convince us.</th>\n",
              "      <td>260.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>Wir sind ein paar Wochen hinter unserem Zeitplan.</th>\n",
              "      <th>We're a few weeks behind schedule.</th>\n",
              "      <td>269.5</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>Hat Tom gesagt, wo Mary hingegangen sein könnte?</th>\n",
              "      <th>Did Tom say where Mary might've gone?</th>\n",
              "      <td>269.5</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>Maria sah so aus, als hätte sie seit Tagen nicht geschlafen.</th>\n",
              "      <th>Mary looked like she hadn't slept in days.</th>\n",
              "      <td>274.5</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>Ich kann nicht glauben, dass es das ist, was Tom wirklich beunruhigt.</th>\n",
              "      <th>I can't believe that's what's really bothering Tom.</th>\n",
              "      <td>276.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>Tom hat seinen Enkelkindern sehr viel Geld hinterlassen.</th>\n",
              "      <th>Tom left his grandchildren a lot of money.</th>\n",
              "      <td>280.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>Nicht weit von meinem Haus gibt es einen Fluss.</th>\n",
              "      <th>There's a river near my house.</th>\n",
              "      <td>282.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>Niemand kann die Tatsache leugnen, dass die Erde rund ist.</th>\n",
              "      <th>No one can deny the fact that the earth is round.</th>\n",
              "      <td>290.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>Tom arbeitet gewöhnlich von neun bis halb sechs.</th>\n",
              "      <th>Tom usually works from nine to five-thirty.</th>\n",
              "      <td>294.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>Wenn du mich nur gefragt hättest, dann hätte ich es getan.</th>\n",
              "      <th>If you'd just asked me, I would've done it.</th>\n",
              "      <td>300.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>Lass uns hier verschwinden, bevor es zu spät ist!</th>\n",
              "      <th>Let's get out of here before it's too late.</th>\n",
              "      <td>300.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>Tom hätte diese Stelle haben können, hätte er sie gewollt.</th>\n",
              "      <th>Tom could've had this job if he'd wanted it.</th>\n",
              "      <td>305.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>Ich habe den ganzen Nachmittag vergeblich gewartet.</th>\n",
              "      <th>I waited all afternoon in vain.</th>\n",
              "      <td>306.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>Ich frage mich, was ich zum Abendessen kochen soll.</th>\n",
              "      <th>I'm wondering what to cook for dinner.</th>\n",
              "      <td>306.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>Du sagtest uns nicht, was er in diesem Brief geschrieben hatte.</th>\n",
              "      <th>You didn't tell us what he had written in this letter.</th>\n",
              "      <td>315.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>Layla nannte der Polizei einen falschen Namen.</th>\n",
              "      <th>Layla gave the police a fake name.</th>\n",
              "      <td>322.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>Ihr glaubt wohl, ich begehe einen Fehler, oder?</th>\n",
              "      <th>You think I'm making a mistake, don't you?</th>\n",
              "      <td>329.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>Was immer er für Fehler haben mag, Geiz gehört nicht dazu.</th>\n",
              "      <th>Whatever faults he may have, meanness is not one of them.</th>\n",
              "      <td>330.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>Es wurden bedeutende Fortschritte erzielt.</th>\n",
              "      <th>Great progress has been made.</th>\n",
              "      <td>336.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>Tom entschuldigte sich dafür, zu spät gekommen zu sein.</th>\n",
              "      <th>Tom excused himself for being late.</th>\n",
              "      <td>342.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>Tom und Maria gingen händchenhaltend den Pfad entlang.</th>\n",
              "      <th>Tom and Mary walked down the path, holding hands.</th>\n",
              "      <td>357.5</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>Der Revolutionsrat kam zusammen, um eine Strategie zu planen.</th>\n",
              "      <th>The revolutionary council met to plan strategy.</th>\n",
              "      <td>366.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>Kennst du irgendwelche finnischen Zungenbrecher?</th>\n",
              "      <th>Do you know any Finnish tongue-twisters?</th>\n",
              "      <td>408.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>Wir hätten vorher anrufen und einen Tisch bestellen sollen.</th>\n",
              "      <th>We should have phoned ahead and reserved a table.</th>\n",
              "      <td>420.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>Niemand wusste, dass Tom ein ehemaliger Gefangener war.</th>\n",
              "      <th>No one knew Tom was an ex-con.</th>\n",
              "      <td>440.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>Ich gebe meinen Hunden jeden Abend zwei Becher Hundefutter.</th>\n",
              "      <th>I feed my dog two cups of dog food every evening.</th>\n",
              "      <td>442.5</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>Solange du dich ruhig verhältst, kannst du in diesem Zimmer bleiben.</th>\n",
              "      <th>As long as you keep quiet, you can stay in this room.</th>\n",
              "      <td>483.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>Wenn man sie Englisch sprechen hört, könnte man annehmen, sie sei Amerikanerin.</th>\n",
              "      <th>If you heard her speak English, you would take her for an American.</th>\n",
              "      <td>486.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>Natürlich ist es nur der Anfang unserer Aufgabe, unsere gemeine Menschlichkeit zu erkennen.</th>\n",
              "      <th>Of course, recognizing our common humanity is only the beginning of our task.</th>\n",
              "      <td>552.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>Die Fahrgäste, die nach Hogwarts fahren, mögen sich bitte auf Bahnsteig neundreiviertel begeben.</th>\n",
              "      <th>Passengers for Hogwarts, please make your way to platform nine and three-quarters.</th>\n",
              "      <td>686.0</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "                                                                                                       complexity\n",
              "text_deu                                           text_eng                                                      \n",
              "Tom sagte, dass nicht nur er es hasse, das zu tun. Tom said that he wasn't the only one who hated ...       175.0\n",
              "Das ist nur ein vorübergehender Rückschlag.        This is only a temporary setback.                        180.0\n",
              "Ich weiß, dass ich hier nicht willkommen bin.      I know I'm not welcome here.                             184.0\n",
              "Ich kann diesen Rechner nicht reparieren.          I can't fix this computer.                               184.5\n",
              "Ich möchte so weit weg von hier, wie ich kann.     I want to get as far away from here as I can.            188.0\n",
              "Ich war erfolgreich, weil ich Glück hatte.         The reason I succeeded was because I was lucky.          193.5\n",
              "Ich habe zwei Jahre in Rio de Janeiro gearbeitet.  I worked in Rio de Janeiro for two years.                196.0\n",
              "Sein Gehalt wurde um zehn Prozent erhöht.          His salary was increased by ten percent.                 210.0\n",
              "Ich sollte wohl besser allein hineingehen.         I think I should go in alone.                            210.0\n",
              "Ich habe es niemandem gesagt, selbst Tom nicht.    I haven't told anyone, not even Tom.                     211.5\n",
              "Wir unterhielten uns bei einer Tasse Kaffee.       We talked over a cup of coffee.                          220.0\n",
              "Tom hat seinen Schirm in der Klasse vergessen.     Tom left his umbrella in the classroom.                  230.0\n",
              "Sie hätten mir die Wahrheit sagen sollen.          You should've told me the truth.                         231.0\n",
              "Werden Sie Tom anrufen, oder möchten Sie, dass ... Are you going to call Tom or do you want me to?          236.0\n",
              "Wir können nicht einfach nur herumsitzen und ga... We can't just sit around and do nothing.                 240.0\n",
              "Tom spielt nicht nur Mundharmonika, sondern auc... Not only does Tom play the harmonica, he plays ...       256.5\n",
              "Das Mädchen mit den blauen Augen ist Jane.         The girl with blue eyes is Jane.                         258.0\n",
              "Er hat gute Aussichten, gewählt zu werden.         He has good chances of being chosen.                     258.0\n",
              "Wie kann man die Gefahren des Internets meiden?    How can you avoid the dangers of the Internet?           258.5\n",
              "Ich arbeitete den ganzen Tag auf dem Bauernhof.    I worked on the farm all day.                            258.5\n",
              "Dies allein reicht schon aus, um uns zu überzeu... This alone is enough to convince us.                     260.0\n",
              "Wir sind ein paar Wochen hinter unserem Zeitplan.  We're a few weeks behind schedule.                       269.5\n",
              "Hat Tom gesagt, wo Mary hingegangen sein könnte?   Did Tom say where Mary might've gone?                    269.5\n",
              "Maria sah so aus, als hätte sie seit Tagen nich... Mary looked like she hadn't slept in days.               274.5\n",
              "Ich kann nicht glauben, dass es das ist, was To... I can't believe that's what's really bothering ...       276.0\n",
              "Tom hat seinen Enkelkindern sehr viel Geld hint... Tom left his grandchildren a lot of money.               280.0\n",
              "Nicht weit von meinem Haus gibt es einen Fluss.    There's a river near my house.                           282.0\n",
              "Niemand kann die Tatsache leugnen, dass die Erd... No one can deny the fact that the earth is round.        290.0\n",
              "Tom arbeitet gewöhnlich von neun bis halb sechs.   Tom usually works from nine to five-thirty.              294.0\n",
              "Wenn du mich nur gefragt hättest, dann hätte ic... If you'd just asked me, I would've done it.              300.0\n",
              "Lass uns hier verschwinden, bevor es zu spät ist!  Let's get out of here before it's too late.              300.0\n",
              "Tom hätte diese Stelle haben können, hätte er s... Tom could've had this job if he'd wanted it.             305.0\n",
              "Ich habe den ganzen Nachmittag vergeblich gewar... I waited all afternoon in vain.                          306.0\n",
              "Ich frage mich, was ich zum Abendessen kochen s... I'm wondering what to cook for dinner.                   306.0\n",
              "Du sagtest uns nicht, was er in diesem Brief ge... You didn't tell us what he had written in this ...       315.0\n",
              "Layla nannte der Polizei einen falschen Namen.     Layla gave the police a fake name.                       322.0\n",
              "Ihr glaubt wohl, ich begehe einen Fehler, oder?    You think I'm making a mistake, don't you?               329.0\n",
              "Was immer er für Fehler haben mag, Geiz gehört ... Whatever faults he may have, meanness is not on...       330.0\n",
              "Es wurden bedeutende Fortschritte erzielt.         Great progress has been made.                            336.0\n",
              "Tom entschuldigte sich dafür, zu spät gekommen ... Tom excused himself for being late.                      342.0\n",
              "Tom und Maria gingen händchenhaltend den Pfad e... Tom and Mary walked down the path, holding hands.        357.5\n",
              "Der Revolutionsrat kam zusammen, um eine Strate... The revolutionary council met to plan strategy.          366.0\n",
              "Kennst du irgendwelche finnischen Zungenbrecher?   Do you know any Finnish tongue-twisters?                 408.0\n",
              "Wir hätten vorher anrufen und einen Tisch beste... We should have phoned ahead and reserved a table.        420.0\n",
              "Niemand wusste, dass Tom ein ehemaliger Gefange... No one knew Tom was an ex-con.                           440.0\n",
              "Ich gebe meinen Hunden jeden Abend zwei Becher ... I feed my dog two cups of dog food every evening.        442.5\n",
              "Solange du dich ruhig verhältst, kannst du in d... As long as you keep quiet, you can stay in this...       483.0\n",
              "Wenn man sie Englisch sprechen hört, könnte man... If you heard her speak English, you would take ...       486.0\n",
              "Natürlich ist es nur der Anfang unserer Aufgabe... Of course, recognizing our common humanity is o...       552.0\n",
              "Die Fahrgäste, die nach Hogwarts fahren, mögen ... Passengers for Hogwarts, please make your way t...       686.0"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 14
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "PJ-MSS1jrSAA",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# generating result and downloading it in chrome\n",
        "(translations[['text_deu', 'text_eng', 'audio', 'complexity']]\n",
        "    .to_csv('export.csv', sep='\\t', encoding='utf-8', header=False))"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "YF3mUW5d1T-q",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# downloading script results to your machine\n",
        "files.download('export.csv')"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "_cI60ol8ICQN",
        "colab_type": "text"
      },
      "source": [
        "### This is how the solution side of the flashcard looks like on my phone\n",
        "\n",
        "<img src=\"https://image.ibb.co/mKj6s9/Screenshot_20180912_203729_Anki_Droid.jpg\" width=\"400\" border=1/>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "eFxrlHBPtP9b",
        "colab_type": "text"
      },
      "source": [
        "If you use ANKI, format your cards something like this to get the text-to-speach play automatically when you flip the card:\n",
        "\n",
        "```html\n",
        "{{FrontSide}}\n",
        "\n",
        "<hr/>\n",
        "\n",
        "{{Back}}\n",
        "\n",
        "<br\\><br\\>\n",
        "\n",
        "{{#Back URL}} \n",
        "<iframe\n",
        "  src=\"{{Back URL}}\"\n",
        "  style=\"border:2px solid black; padding: 25px; width: 340px; height: 120px;\"\n",
        "> {{/Back URL}}\n",
        "```"
      ]
    }
  ]
}