androidfred/advanced_anki_deck_editing.md

## advanced_anki_deck_editing.md

      
    Raw
  

              advanced_anki_deck_editing.md
            
          
    Advanced Anki deck editing made simple (ish)

Anki is a great open source flashcard app that can be used to learn anything.
This Gist is a full end to end example of how to:

export Anki decks from Anki
import Anki decks into MySQL
edit Anki decks using MySQL
export Anki decks from MySQL
import Anki decks back into Anki (including creating new Note Types and Cards)

While this Gist uses Korean language decks as examples, this Gist is not about Korean language decks, but about advanced Anki deck editing applicable to any Anki decks of any domain.
Background

The top two Anki decks for Korean are:

Korean Grammar Sentences by Evita ("the Grammar deck")
Korean Vocabulary by Evita (the "Vocab" deck)

While both are excellent, the Vocab deck doesn't have the audio, sentences or notes found in the Grammar deck.
Grammar deck fields:
Korean (sentence)
Translation
Alternative translation
Audio
Notes

Vocab deck fields:
Korean (word)
English
Hanja
Audio
Number

This Gist shows how an enriched vocab deck with fields from both decks can be created.
Options for editing Anki decks

The Anki Browser

Anki has a built-in browser that can be used to edit decks. It may be very powerful, but can be difficult to understand and use.
SQLite

Anki decks have the file ending .anki2 but are really just SQLite files, so they can be edited using any SQLite client. However, SQLite has limited functionality compared to eg MySQL.
Raw text

Anki decks can be exported to tab separated raw text files which can subsequently be imported into more or less anything.
(In Anki) File | Export. Notes in plain text. Select the deck to be exported. Export. Select location. Save.
The exported file will have a line for each note, and every line will have a tabs separating the fields. Eg, since the fields of the notes in the Grammar deck are Korean, Translation, Alternative Translation, Audio and Notes, the exported Grammar file looks like this:
작은 차를 살래요.	I'm going to buy a compact car.		[sound:eks391.mp3]		dif4
하는 것	doing something	the thing that you do	[sound:eks013.mp3]	This is one way to turn action verbs into nouns. 것 = 거 = thing.	dif4 phrase
먹는 것	eating	the thing that you eat	[sound:eks014.mp3]	-는 turns the verb into an adjective (present tense) and adding 것 makes it a noun	dif4 phrase

MySQL

MySQL is a great option for editing Anki decks. Installing and setting up MySQL is beyond the scope of this Gist, see https://www.mysql.com.
Editing Anki deck using MySQL

Import deck into MySQL

NOTE If you at any point screw up any of the schemas or tables created as part of this Gist, don't worry. The steps in this section are so few you can just drop the schema or table and start over.
In order to import the decks into MySQL, first, a schema must be created.
After logging in to MySQL:
mysql> create schema anki;
Query OK, 1 row affected (0.00 sec)
mysql> use anki;
Database changed
Then, tables for each deck must be created.
NOTE MySQL have different charsets and collations, different combinations of which may or may not be able to handle different languages. If the appropriate charset and collation is not specified, MySQL will default to simpler charsets and collations that are not able to handle eg Korean, which means any Korean imported into such tables will become garbled nonsense characters.
The grammar table will have the same columns as the fields in the grammar deck.
mysql> CREATE TABLE `grammar` (
  `korean` mediumtext COLLATE utf8mb4_unicode_ci,
  `translation` mediumtext COLLATE utf8mb4_unicode_ci,
  `alternative_translation` mediumtext COLLATE utf8mb4_unicode_ci,
  `audio` mediumtext COLLATE utf8mb4_unicode_ci,
  `notes` mediumtext COLLATE utf8mb4_unicode_ci
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
Query OK, 0 rows affected (0.01 sec)


The vocab table will have the same columns as the fields in the vocab deck.
mysql> CREATE TABLE `vocab` (
  `korean` mediumtext COLLATE utf8mb4_unicode_ci,
  `english` mediumtext COLLATE utf8mb4_unicode_ci,
  `hanja` mediumtext COLLATE utf8mb4_unicode_ci,
  `audio` mediumtext COLLATE utf8mb4_unicode_ci,
  `number` mediumtext COLLATE utf8mb4_unicode_ci
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
Query OK, 0 rows affected (0.01 sec)


The vocab_full table will have the same columns as the fields in the vocab deck + columns for the fields in the grammar deck.
mysql> CREATE TABLE `vocab_full` (
  `korean` mediumtext COLLATE utf8mb4_unicode_ci,
  `english` mediumtext COLLATE utf8mb4_unicode_ci,
  `hanja` mediumtext COLLATE utf8mb4_unicode_ci,
  `audio` mediumtext COLLATE utf8mb4_unicode_ci,
  `number` mediumtext COLLATE utf8mb4_unicode_ci,
  `korean_sentence` mediumtext COLLATE utf8mb4_unicode_ci,
  `english_sentence` mediumtext COLLATE utf8mb4_unicode_ci,
  `english_alternative_sentence` mediumtext COLLATE utf8mb4_unicode_ci,
  `audio_sentence` mediumtext COLLATE utf8mb4_unicode_ci,
  `notes_sentence` mediumtext COLLATE utf8mb4_unicode_ci
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
Query OK, 0 rows affected (0.01 sec)

Now that there is a schema with tables in MySQL, the decks can be imported. (NOTE MySQL will infer the desired table by the import file name)
In the terminal:
$ cd <folder containing exported deck>
$ mysqlimport --fields-terminated-by='\t' --columns=korean,translation,alternative_translation,audio,notes --local -u root -p anki grammar.txt
Enter password: 
anki.grammar: Records: 2051  Deleted: 0  Skipped: 0  Warnings: 2050

$ mysqlimport --fields-terminated-by='\t' --columns=korean,english,hanja,audio,number --local -u root -p anki vocab.txt
Enter password: 
anki.vocab: Records: 5270  Deleted: 0  Skipped: 0  Warnings: 9
Now, all the power of MySQL is available to edit the decks.
Enriching the Vocab deck

The basic thing we want to do is pull data from the grammar table into the vocab table. A naive JOIN might look something like this:
mysql> SELECT
        vocab.korean
        ,vocab.english
        ,grammar.korean
        ,grammar.translation
    FROM
        grammar INNER JOIN vocab
            ON grammar.korean LIKE CONCAT( '%' ,vocab.korean ,'%' )
        AND grammar.translation LIKE CONCAT( '%' ,vocab.english ,'%' ) LIMIT 10;
The problem with the above JOIN is it will match eg the vocab.english content "yes" on grammar.translation "yesterday". A better JOIN might look something like this:
mysql> SELECT
        vocab.korean
        ,vocab.english
        ,grammar.korean
        ,grammar.translation
    FROM
        grammar INNER JOIN vocab
            ON grammar.korean regexp CONCAT( '[[:<:]]' ,vocab.korean ,'[[:>:]]' )
        AND grammar.translation regexp CONCAT( '[[:<:]]' ,vocab.english ,'[[:>:]]' ) LIMIT 10;
The above JOIN will respect word endings.
NOTE that there are still many other, potentially much better, JOINs or other operations that could be used to enrich the vocab deck, eg JOINs that use more intelligent text matching than merely single word matching, scripts that that scrape example sentences and audio off Naver or some other source etc. (any suggestions greatly appreciated, please comment!)
However, since even this relatively naive JOIN gives OK results, we will stick to it for now. Run an INSERT: (it will take a while to run, don't worry if it seems like it freezes, it will finish eventually)
mysql> INSERT
    INTO
        vocab_full (
            korean
            ,english
            ,hanja
            ,audio
            ,number
            ,korean_sentence
            ,english_sentence
            ,english_alternative_sentence
            ,audio_sentence
            ,notes_sentence
        ) SELECT
                vocab.korean
                ,vocab.english
                ,vocab.hanja
                ,vocab.audio
                ,vocab.number
                ,grammar.korean
                ,grammar.translation
                ,grammar.alternative_translation
                ,grammar.audio
                ,grammar.notes
            FROM
                grammar INNER JOIN vocab
                    ON grammar.korean regexp CONCAT( '[[:<:]]' ,vocab.korean ,'[[:>:]]' )
                AND grammar.translation regexp CONCAT( '[[:<:]]' ,vocab.english ,'[[:>:]]' );
Query OK, 589 rows affected (32.83 sec)
Records: 589  Duplicates: 0  Warnings: 0
NOTE that we're getting just 589 rows out of this join, many rows which are duplicates:
mysql> SELECT
        korean
        ,english
        ,korean_sentence
    FROM
        vocab_full LIMIT 50;
Query OK, 0 rows affected (0.00 sec)
+-----------+---------+----------------------------------------------------------------------------------------------------+
| korean    | english | korean_sentence                                                                                    |
+-----------+---------+----------------------------------------------------------------------------------------------------+
| 네        | yes     | 네. 맞아요.                                                                                        |
| 네        | yes     | 네, 좋아요.                                                                                        |
| 네        | yes     | 네, 학교에 갑니다.                                                                                 |
| 네        | yes     | 네, 운동을 합니다.                                                                                 |
| 네        | yes     | - 영화를 볼까요?<div>- 네, 좋아요.</div>                                                           |
| 네        | yes     | 네, 영화가 너무 슬퍼서 울고 있어요.                                                                |
| 네        | yes     | 네, 버스로 한 시간쯤 걸려요.                                                                       |
| 네        | yes     | 네, 그래서 음악을 들으면서 공부를 해요.                                                            |
| 네        | yes     | 네, 그래서 샌드위치를 먹으면서 일했어요.                                                           |
| 네        | yes     | 네, 모든 교실에 다 있어요.                                                                         |
| 네        | yes     | 네, 외국 사람이지만 한국말을 잘해요.                                                               |
| 네        | yes     | 네, 그렇지만 자주 안 와서 지하철이 더 편할 거예요.                                                 |
| 네        | yes     | - 요즘도 바빠요?<div>- 네, 계속 바빠요. 좀 쉬었으면 좋겠어요.</div>                                |
| 네        | yes     | - 집에 가도 돼요?<div>- 네, 집에 가도 돼요.</div>                                                  |
| 네        | yes     | - 이 신발 신어 봐도 돼요?<div>- 네, 신어 보세요.</div>                                             |
| 네        | yes     | 오늘 날씨가 춥지요?<div>- 네, 정말 춥네요.</div>                                                   |
| 학교      | school  | 학교 뒤에 높은 산이 있어요.                                                                        |
| 학교      | school  | 학교 앞에서 버스로 갈아타세요.                                                                     |
| 학교      | school  | 학교 옆에 '만나식당'이 괜찮은데 거기 한번 가 보세요.                                               |
| 학교      | school  | 학교 근처로 이사 오는 게 어때요?                                                                   |


This is fine for my purposes as I care more about exposure to sentences than any individual word.
Remove rows with no audio (Optional)

mysql> DELETE
    FROM
        vocab_full
    WHERE
        audio IS NULL
        OR audio_sentence IS NULL;
Query OK, 0 rows affected (0.00 sec)
Turns out there is no need to do this, as there are already no rows without audio.
Export vocab_full to file

In the terminal:
$ mysql -h localhost -e "SELECT * FROM anki.vocab_full" --batch --silent > vocab_full.txt -u root -p

Importing deck back into Anki

Add Note Type for new deck

(In Anki) Tools | Manage Note Types | Add |  <pick a Clone> | OK | <new note type name> | OK will create a new Note Type. Select <new note type> | Fields | Add fields such that fields reflect the vocab_full columns (in the same order)
Import new deck

(In Anki) Create Deck | <new deck name> | Open <new deck>
(In Anki) File | Import | <select file to be imported> | Open will bring up the Import dialogue. Before clicking Import, make sure:

the Type is the newly created Note Type
the Deck is the newly created Deck
the Fields map to each other correctly

then Import.
NOTE While the expected result of the Import is that the notes are imported into the selected Deck, there is a bug in Anki that results in the notes being imported into the "Default" deck. To fix this
(In Anki) Browse | Select the 'Default' deck in the left of the browser | Select all the notes | Change Deck and move the notes to the desired deck.
Fix cards for new deck

(In Anki) Browse | Select the new deck in the right of left browser | Cards
Front Template:
<span style="font-family: Gulim; ">{{Korean}}</span>{{Audio}}

Styling:
.card {
 font-family: "Malgun Gothic";
 //font-family: Gulim;
 font-size: 20px;
 text-align: center;
 color: black;
 background-color: white;
}
.eng { font-family: Arial; }
.eng.title {color: #999999; font-size:80%;}
.notes {/*font-family: Batang;*/ color: #449933; font-size: 80%}

Back Template:
{{FrontSide}}<br>
{{English}}

<hr id=answer>

{{Korean_sentence}}{{Audio_sentence}}<br><br>

<span class="eng title">English:<br></span>
<span class="eng">{{English_sentence}}</span><br><br>
{{#Alternative Translation}}<span class="eng title">Alternative Translation:<br></span>{{/Alternative Translation}}<span class="eng">{{English_alternative_sentence}}</span> <br><br>

<div style='text-align: left; '>
{{#Notes}}<span class= "eng title">Additional Information:<br></span>{{/Notes}}<span class=notes>{{Notes_sentence}}</span></div>

Study

Phew! The new deck should now be ready to start studying.
Miscellaneous


Since the enriched Vocab deck ended up containing only 589 rows, many of which are duplicates, it would be nice to use other sources than the Grammar deck for enriching, eg a script could be written that scrapes example sentences and audio off Naver or some other source.


Other potential improvements? Comment!