Created Jul 10, 2012
mkcards: Turn DVD audio or songs into Anki cards
This toolkit will allow you to turn TV episodes and transcriptions into
high-quality Anki cards. Quite a lot of assembly is required the first
time through, but after that, 90% of your effort will be either (1) looking
at the text or (2) aligning the start and end of each audio clip using a
halfway civilized tool. In other words, it's still going to take time, but
you'll actually spend that time studying the language, not messing around
with a 19-step process.
There's a good chance you can get this working on a Mac or on Linux. Windows
will be a fairly major challenge.
Tools needed:
- The transcode utility or another way to rip audio tracks from a DVD
- The ffmpeg command-line tool
- Ruby 1.9
- TranscriberAG
- Anki
- Optional: Dropbox and AnkiDroid for mobile use
To download TranscriberAG, check out the following sites: (Mac, Windows) (Modern Linux systems)
Rip an audio track from a DVD. Replace $CHAPTER with the episode number
(1, 2, etc.) and $OUTPUT with a file name.
transcode -i /dev/dvd -x dvd -T $CHAPTER,-1 -a 1 -y null,wav -m $OUTPUT.wav
If you wish, you can convert stereo WAV to mono.
ffmpeg -i buffy_209_stereo.wav -ac 1 buffy_209.wav
To preserve your sanity, replace all spaces in the audio file name with "_".
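If you have several files to rename, a few lines of Ruby can do it in bulk. This is only a sketch: the helper names underscore_name and rename_wavs are made up for this example and are not part of mkcards.

```ruby
# Hypothetical helper (not part of mkcards): replace every space in a
# file name with an underscore.
def underscore_name(name)
  name.gsub(' ', '_')
end

# Rename each *.wav file in dir whose name contains spaces.
# Only the file name is changed, never the directory part.
def rename_wavs(dir)
  Dir.glob(File.join(dir, '*.wav')).each do |old_path|
    new_path = File.join(dir, underscore_name(File.basename(old_path)))
    File.rename(old_path, new_path) unless new_path == old_path
  end
end
```

Calling rename_wavs('.') from the directory that holds your WAV files should be enough.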
If you want to work with music instead of DVD audio, just grab your MP3
file and use it in place of the *.wav.
Use TranscriberAG to align your transcription with the sound file. Read the
tutorials and take a few minutes to experiment; it's a very powerful and
efficient tool, designed for professionals, but you need to get used to it.
Note that I currently assign speaker names when transcribing. If you
assign "no speaker" or leave the text of a segment blank, no card will be
made for that segment.
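To make the rule concrete, here is a rough sketch of how one line of an exported STM file breaks down into fields. The sample lines are invented, and the layout (file, channel, speaker, start, end, label, text) is an assumption based on the fields mkcards.rb reads; check it against your own export.

```ruby
# Split one STM line into the fields that matter for a card.
# 'inter_segment_gap' is the speaker value mkcards.rb treats as "no speaker".
def stm_fields(line)
  fields = line.split(' ', 7)   # at most 7 fields, so the text keeps its spaces
  source = fields[0]
  {
    source:  source,
    speaker: fields[2] == 'inter_segment_gap' ? nil : fields[2].sub("#{source}_", ''),
    start:   fields[3],
    stop:    fields[4],
    text:    fields[6].to_s.strip   # nil when the segment has no text
  }
end

# A segment becomes a card only if it has both a speaker and some text.
def makes_card?(seg)
  !seg[:speaker].nil? && !seg[:text].empty?
end

# Made-up sample line, for illustration only.
seg = stm_fields('buffy_209 1 buffy_209_Buffy 12.50 14.00 <o,f0,female> Hello there.')
puts "#{seg[:speaker]} (#{seg[:start]}-#{seg[:stop]}): #{seg[:text]}" if makes_card?(seg)
```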
When done, export your transcript in *.stm format, and invoke 'mkcards' as
./mkcards -c da_vinci_claude.stm da_vinci_claude.mp3
You may need to 'mv mkcards.rb mkcards && chmod +x mkcards' the first time,
or call it as 'ruby mkcards.rb', or something along those lines.
The '-c' here is optional. If you specify it, the volume will be cut in
half. That is generally the right choice for music, which is roughly twice
as loud as television episodes, and it lets you do reps in Anki without
crazy volume shifts.
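For the curious, the sketch below shows roughly how '-c' ends up as ffmpeg arguments. The script passes '-vol 128' (256 being normal volume for that option). As an assumption on my part, newer ffmpeg builds may reject '-vol'; the audio filter '-af volume=0.5' should give the same result there. The method names and sample values are illustrative only.

```ruby
# Extra ffmpeg arguments for halving the volume.
# use_filter selects the volume audio filter instead of the older -vol option.
def volume_args(cut_volume, use_filter = false)
  return [] unless cut_volume
  use_filter ? ['-af', 'volume=0.5'] : ['-vol', '128']
end

# Build the argument list for cutting one clip out of the source audio.
def clip_command(audio_path, start, duration, out_path, extra_args)
  ['ffmpeg', '-i', audio_path, '-ss', start, '-t', duration] + extra_args + [out_path]
end

# Print the command instead of running it, so nothing is executed here.
puts clip_command('song.mp3', '12.25', '2.00', 'media/song-12_25.mp3',
                  volume_args(true, true)).join(' ')
```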
Running mkcards will generate da_vinci_claude.tsv and a media/ directory. Copy all
the MP3 files in the media directory to your Anki media folder, and import
the TSV file.
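The copy step can be scripted too. A minimal sketch, assuming you know where your Anki media folder lives; that location depends on your Anki version and profile, so the destination is passed in rather than guessed. The helper name copy_clips is hypothetical.

```ruby
require 'fileutils'

# Copy every MP3 from the generated media directory into the given
# Anki media folder. Returns the number of clips copied.
def copy_clips(media_dir, anki_media_dir)
  clips = Dir.glob(File.join(media_dir, '*.mp3'))
  FileUtils.cp(clips, anki_media_dir) unless clips.empty?
  clips.length
end
```

Call it as copy_clips('media', '/path/to/your/anki/media/folder'), substituting your real folder.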
You'll want to create a new card model before importing the TSV file. The
model fields I use are:
- Text
- Speaker
- Source
- Start
- Sound
- Note
I usually put "Speaker" and "Sound" on the front of the card, and
everything else on the back.
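Each row of the TSV file is meant to line up with those fields in the same order. The sketch below builds one such row from made-up values; the column order is inferred from the field list above and from the script, so double-check it against your own TSV before importing.

```ruby
# Build one tab-separated row in the order of the model fields.
def tsv_row(card)
  order = %w[Text Speaker Source Start Sound Note]
  order.map { |name| card[name].to_s }.join("\t")
end

# Made-up values for one card. The clip name follows the script's
# "<source>-<start>" pattern, with dots turned into underscores.
card = {
  'Text'    => 'Hello there.',
  'Speaker' => 'Buffy',
  'Source'  => 'buffy_209',
  'Start'   => '12.25',
  'Sound'   => '[sound:buffy_209-12_25.mp3]',
  'Note'    => ''
}
puts tsv_row(card)
```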
#!/usr/bin/env ruby

require 'fileutils'

# Represents one line of an STM file.
class Line
  # Time in seconds to add to the start and end of each clip. This makes
  # it much less time-consuming and error prone to deal with fast
  # conversations and individual lines in songs.
  PADDING = 0.25

  # Given a line of an STM file, parse it into the fields we need.
  def initialize(text)
    fields = text.split(' ', 7)
    puts fields.inspect
    @source = fields[0]
    if fields[2] != 'inter_segment_gap'
      # Strip file-specific speaker prefixes.
      @speaker = fields[2].sub("#{@source}_", '')
    end
    @start = fields[3]
    @end = fields[4]
    # Split apart dialog on "///" in case there are notes. A segment with
    # no text has no seventh field, so treat nil as an empty string.
    @dialog, @note = fields[6].to_s.split(%r{///}, 2).map {|t| process_text(t) }
    @note ||= ''
  end

  # Do we have a speaker field? If not, this is dead space.
  def blank?
    @speaker.nil? || @dialog.nil? || @dialog.empty?
  end

  # What file name should we use for this clip?
  def clip_name
    "#{@source}-#{start}".gsub('.', '_') + ".mp3"
  end

  # Convert this line to tab-separated values for import into Anki.
  def to_tsv
    [@dialog, @speaker, @source, start, "[sound:#{clip_name}]",
     @note].join("\t")
  end

  # Generate a command which we can use to export an MP3 file.
  def export_command(audio_source_path, outdir, cut_volume)
    cmd = ['ffmpeg', '-i', audio_source_path, '-ss', start, '-t', duration]
    if cut_volume
      # 256 is normal volume for ffmpeg's -vol option, so 128 halves it.
      cmd += ['-vol', '128']
    end
    cmd << File.join(outdir, clip_name)
    cmd
  end

  # Parse the file at the specified location into an array of Line objects.
  def self.parse_stm(path)
    lines = []
    File.open(path, 'r') do |f|
      f.each_line do |line|
        # Skip comments and blank lines (each line still ends in "\n",
        # so strip before testing for emptiness).
        next if line =~ /^;/ || line.strip.empty?
        parsed = Line.new(line)
        next if parsed.blank?
        lines << parsed
      end
    end
    lines
  end

  # When does this clip start, in seconds, represented as a string?
  def start
    sprintf("%.2f", @start.to_f - PADDING)
  end

  # How long is this clip in seconds, represented as a string? We use this
  # to invoke ffmpeg.
  def duration
    sprintf("%.2f", (@end.to_f - @start.to_f) + PADDING*2)
  end

  # Split a text field into lines at "//" and clean up whitespace.
  def process_text(text)
    text.split(%r{//}).map {|l| l.strip }.join("<br>")
  end
end

# Parse our command-line args.
cut_volume = false
if ARGV.length >= 1 && ARGV[0] == '-c'
  ARGV.shift # Drop first argument.
  cut_volume = true
end
if ARGV.length != 2
  STDERR.puts "Usage: mkcards [-c] input.stm input.wav"
  STDERR.puts "  -c will cut the volume in half, making songs more like DVDs"
  exit 1
end
stm, audio_source_path = ARGV

# Create our output media directory.
FileUtils.mkdir_p("media")

# Process our STM file and generate our output files.
lines = Line.parse_stm(stm)
File.open(stm.sub(/\.stm$/, '.tsv'), 'w') do |tsv|
  for line in lines
    tsv.puts line.to_tsv
    system(*line.export_command(audio_source_path, "media", cut_volume))
  end
end