As configured in my dotfiles.
start new:
tmux
start new with session name:
from __future__ import print_function
import subprocess
import shutil
import os
import stat
import time
# This script looks extremely defensive, but *should* let you rerun at
# any stage along the way. Also a lot of code repetition due to eventual support
# for "non-blob" install from something besides the magic kk_all_deps.tar.gz
# This extracts png images from the
# packed/pickle'd cifar-100 dataset
# available at http://www.cs.toronto.edu/~kriz/cifar.html
#
# No Rights Reserved/ CC0
# Say thanks @whereismatthi on Twitter if it's useful
#
# probably requires python3
# definitely requires PyPNG: pip3 install pypng
"""
Author: Awni Hannun
This is an example CTC decoder written in Python. The code is
intended to be a simple example and is not designed to be
especially efficient.
The algorithm is a prefix beam search for a model trained
with the CTC loss function.
Sometimes it is really nice to just take a quick look at some data. However, when working on remote computers, it is a bit of a burden to move data files to a local computer to create a plot in something like R. One solution is to use gnuplot and make a quick plot that is rendered in the terminal. It isn't very pretty by default, but it gets the job done quickly and easily. There are also advanced gnuplot capabilities that aren't covered here at all.
gnuplot has its own internal syntax that can be fed in as a script, which I won't get into. Here is the very simplified gnuplot code we'll be using:
set terminal dumb size 120, 30; set autoscale; plot '-' using 1:3 with lines notitle
Let's break this down:
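Roughly: `set terminal dumb size 120, 30` selects gnuplot's ASCII terminal at 120 columns by 30 rows, `set autoscale` lets gnuplot choose the axis ranges, `plot '-'` reads the data inline from the input stream, `using 1:3` takes column 1 as x and column 3 as y, `with lines` connects the points, and `notitle` suppresses the legend entry.

As a minimal sketch of how that command could be driven from a script, the Python below builds the text to send to gnuplot and pipes it in over stdin. The function names `build_gnuplot_input` and `plot_in_terminal` are made up for this illustration, and sending the command and the data together on stdin is one way to do it, not necessarily the way the original note does. It assumes whitespace-separated rows with at least three columns.

```python
import shutil
import subprocess

# The gnuplot command from the text above, unchanged.
GNUPLOT_CMD = (
    "set terminal dumb size 120, 30; set autoscale; "
    "plot '-' using 1:3 with lines notitle"
)


def build_gnuplot_input(rows):
    """Return the text to feed gnuplot: the command, the data rows, then 'e'.

    Hypothetical helper for this sketch. Each row is an iterable of values;
    columns are joined with single spaces.
    """
    lines = [GNUPLOT_CMD]
    for row in rows:
        lines.append(" ".join(str(value) for value in row))
    # A lone 'e' marks the end of inline data read by plot '-'.
    lines.append("e")
    return "\n".join(lines) + "\n"


def plot_in_terminal(rows):
    """Run gnuplot on the rows and return its ASCII plot, or None if gnuplot is missing."""
    if shutil.which("gnuplot") is None:
        return None
    result = subprocess.run(
        ["gnuplot"],
        input=build_gnuplot_input(rows),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Column 2 is a placeholder here; only columns 1 and 3 are plotted.
    demo_rows = [(x, 0, x * x) for x in range(20)]
    plot = plot_in_terminal(demo_rows)
    print(plot if plot is not None else "gnuplot not found on PATH")
```

Keeping the input-building step separate from the subprocess call makes it easy to inspect exactly what gnuplot receives before running it on a remote machine.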
Given that LibriVox contains enough English content for a speech processing corpus, LibriSpeech, to be built from it, I've wondered how much content LibriVox has in languages other than English.
I've downloaded the JSON API contents of LibriVox, separated the audiobooks according to their language, and summed up their lengths, obtaining a language breakdown expressed in spoken time.
This gave results of over 60 thousand hours for English, thousands of hours each for German, Dutch, French, and Spanish, and hundreds of hours for other languages.
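The grouping and summing step might look something like the sketch below. It is not the script that produced the numbers above. It assumes each audiobook record is a dict with a `language` field and a `totaltimesecs` field holding the duration in seconds; treat both field names as assumptions to check against the actual JSON. The function name `hours_by_language` is made up for this example.

```python
from collections import defaultdict


def hours_by_language(books):
    """Sum audiobook durations per language and return hours, largest first.

    Assumes each record has 'language' and 'totaltimesecs' keys
    (assumed field names); missing values count as "unknown" / 0 seconds.
    """
    totals = defaultdict(int)
    for book in books:
        language = book.get("language") or "unknown"
        # Durations may arrive as strings in JSON, so coerce to int.
        seconds = int(book.get("totaltimesecs") or 0)
        totals[language] += seconds
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return {language: seconds / 3600 for language, seconds in ranked}


if __name__ == "__main__":
    # Made-up records, only to show the shape of the input and output.
    sample = [
        {"language": "English", "totaltimesecs": 7200},
        {"language": "German", "totaltimesecs": "3600"},
        {"language": "English", "totaltimesecs": 3600},
    ]
    for language, hours in hours_by_language(sample).items():
        print(f"{language}: {hours:.1f} h")
```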
import cv2
import numpy as np
import pyaudio
import librosa
import librosa.display
import matplotlib.pyplot as plt
import time
rate = 16000
chunk_size = rate // 4
So in the midst of this era of "contextualized" language models named after Sesame Street characters and robots that transform into automobiles, there is this "Toronto Book Corpus" that points to this kinda recently influential paper:
Yukun Zhu, Ryan Kiros, Rich Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, and Sanja Fidler. 2015. "Aligning books and movies: Towards story-like visual explanations by watching movies and reading books." In Proceedings of the IEEE international conference on computer vision, pp. 19-27.
Some might know my personal pet peeve on collecting translation datasets but this BookCorpus has no translations, so why do I even care about it?