@emk
emk / 00_README.md
Last active July 10, 2023 02:33
Anki subs2srs card template, with optional hint and note

To use these templates, you'll need to create a new Anki "note type" with the following fields:

  • Sound
  • Time
  • Source
  • Image
  • Target: line
  • Base: line
  • Target: line before
  • Base: line before
emk / twice-nothing.rs
Last active August 29, 2015 14:07
Make this work, and StreamingIterator will be able to do a lot.
pub trait Nothing<'a> {
    fn nothing(&'a mut self) -> ();
    fn twice_nothing(&'a mut self) -> () {
        { self.nothing(); }
        { self.nothing(); }
    }
}
// This works.
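Why the trait above is rejected: inside `twice_nothing`, `self` has type `&'a mut Self`, and each `self.nothing()` also needs a `&'a mut Self`. The first call therefore reborrows `*self` for all of `'a`, which is as long as `self` itself lives, and the compiler refuses the second call as a second simultaneous mutable borrow. When nothing returned from `nothing` has to outlive the call, one way out is to drop the trait-level lifetime so that each call borrows `self` only briefly. A minimal sketch under that assumption (the `Counter` type is hypothetical):

```rust
/// Variant of `Nothing` without a trait-level lifetime: each method call
/// borrows `self` only for the duration of that call.
pub trait Nothing {
    fn nothing(&mut self);

    fn twice_nothing(&mut self) {
        // Each call takes a fresh, short-lived reborrow of `*self`, so the
        // second call is allowed once the first one has returned.
        self.nothing();
        self.nothing();
    }
}

/// Hypothetical implementor that just counts how often `nothing` runs.
struct Counter {
    calls: u32,
}

impl Nothing for Counter {
    fn nothing(&mut self) {
        self.calls += 1;
    }
}

fn main() {
    let mut c = Counter { calls: 0 };
    c.twice_nothing();
    println!("{}", c.calls); // prints 2
}
```

This does not cover items that borrow from `self`, which is what the lifetime on a streaming iterator is really for.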
emk / generic.rs
Created October 4, 2014 01:00
Hey, at least it's generic!
pub trait StreamingIterator<'a, T> {
    /// Return either the next item in the sequence, or `None` if all items
    /// have been consumed.
    fn next(&'a mut self) -> Option<T>;
    /// Hey, it compiles.
    fn reduce<S>(&'a mut self, init: S, r: |S,T| -> S) -> S {
        let mut sum = init;
        //streaming_for!(v in self, {
        //    sum = r(sum, v);
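In current Rust, generic associated types (stable since Rust 1.65) let each item borrow from the iterator for just one `next` call, so the lifetime no longer has to be a parameter of the trait, and a `reduce` like the one above can call `next` in a loop. A minimal sketch under those assumptions, not the gist's own code; `Words` is a hypothetical type that reuses one internal buffer for every item, which is exactly what an ordinary `Iterator` cannot express:

```rust
/// A "lending" (streaming) iterator: each item may borrow from the iterator
/// itself, so only one item can be alive at a time.
pub trait StreamingIterator {
    type Item<'a>
    where
        Self: 'a;

    /// Return either the next item in the sequence, or `None` if all items
    /// have been consumed.
    fn next(&mut self) -> Option<Self::Item<'_>>;

    /// Fold every item into an accumulator. The closure must accept an item
    /// of any borrow lifetime, because each item dies before the next exists.
    fn reduce<S, F>(&mut self, init: S, mut r: F) -> S
    where
        F: FnMut(S, Self::Item<'_>) -> S,
    {
        let mut sum = init;
        while let Some(v) = self.next() {
            sum = r(sum, v);
        }
        sum
    }
}

/// Hypothetical example: yields whitespace-separated words, copying each one
/// into a single reused buffer instead of allocating a new `String`.
pub struct Words {
    text: String,
    pos: usize,
    buf: String,
}

impl Words {
    pub fn new(text: &str) -> Self {
        Words { text: text.to_string(), pos: 0, buf: String::new() }
    }
}

impl StreamingIterator for Words {
    type Item<'a> = &'a str where Self: 'a;

    fn next(&mut self) -> Option<Self::Item<'_>> {
        let rest = &self.text[self.pos..];
        let trimmed = rest.trim_start();
        if trimmed.is_empty() {
            return None;
        }
        // Byte offsets of the next word within `self.text`.
        let start = self.pos + (rest.len() - trimmed.len());
        let end = trimmed
            .find(char::is_whitespace)
            .map_or(self.text.len(), |i| start + i);
        self.buf.clear();
        self.buf.push_str(&self.text[start..end]);
        self.pos = end;
        Some(self.buf.as_str())
    }
}

fn main() {
    let mut words = Words::new("zero copy  saves a lot");
    // Explicit parameter types keep closure inference simple.
    let total = words.reduce(0usize, |acc: usize, w: &str| acc + w.len());
    println!("{total}"); // prints 17
}
```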
emk / code.rs
Created October 2, 2014 16:42
A tricky ownership issue with fill_buf
impl<'a,T: Buffer+'a> Buffer for ChunkBuffer<'a,T> {
    fn fill_buf<'a>(&'a mut self) -> IoResult<&'a [u8]> {
        if self.buffer.as_slice().contains_slice(self.boundary.as_slice()) {
            // Exit 1: Valid data in our local buffer.
            Ok(self.buffer.as_slice())
        } else if self.buffer.len() > 0 {
            // Exit 2: Add some more data to our local buffer so that it's
            // valid (see invariants for top_up).
            self.top_up()
        } else {
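The pre-1.0 `Buffer` trait has since become `std::io::BufRead`, whose `fill_buf(&mut self) -> io::Result<&[u8]>` uses an elided lifetime. Declaring `fill_buf<'a>` inside an `impl<'a, ...>` block, as above, reuses the name `'a`, which current Rust rejects as lifetime shadowing. A minimal sketch of the same "serve from the local buffer, otherwise top it up" shape on today's `BufRead`; `LocalBuf` is hypothetical and does not reproduce the boundary-matching logic of `ChunkBuffer`:

```rust
use std::io::{self, BufRead, Read};

/// Hypothetical reader that keeps its own local buffer and refills it from
/// `inner` only when it runs dry.
struct LocalBuf<R> {
    inner: R,
    buffer: Vec<u8>,
    pos: usize,
}

impl<R: Read> LocalBuf<R> {
    fn new(inner: R) -> Self {
        LocalBuf { inner, buffer: Vec::new(), pos: 0 }
    }
}

impl<R: Read> BufRead for LocalBuf<R> {
    // No named lifetime: the returned slice borrows `self` for exactly as
    // long as this `&mut self` call, and nothing shadows the impl's names.
    fn fill_buf(&mut self) -> io::Result<&[u8]> {
        if self.pos >= self.buffer.len() {
            // Local buffer exhausted: top it up from `inner`. If `read`
            // fails, `?` returns early and the buffer state is unchanged.
            let mut chunk = [0u8; 4096];
            let n = self.inner.read(&mut chunk)?;
            self.buffer.clear();
            self.buffer.extend_from_slice(&chunk[..n]);
            self.pos = 0;
        }
        // Valid data in our local buffer (an empty slice means EOF).
        Ok(&self.buffer[self.pos..])
    }

    fn consume(&mut self, amt: usize) {
        self.pos = (self.pos + amt).min(self.buffer.len());
    }
}

impl<R: Read> Read for LocalBuf<R> {
    fn read(&mut self, out: &mut [u8]) -> io::Result<usize> {
        let available = self.fill_buf()?;
        let n = available.len().min(out.len());
        out[..n].copy_from_slice(&available[..n]);
        self.consume(n);
        Ok(n)
    }
}

fn main() -> io::Result<()> {
    let reader = LocalBuf::new(&b"first line\nsecond line\n"[..]);
    for line in reader.lines() {
        println!("{}", line?);
    }
    Ok(())
}
```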
emk / NOTES.md
Last active August 29, 2015 14:07
Rust: Zero-copy & zero-allocation saves a lot

Some benchmark results, for:

  1. Setting up a MemReader.
  2. Setting up a MemReader and reading 130 tokens as sentences (with lots of String and Vector allocations).
  3. Reading 1 token, with no copies or allocations.
test conll::memreader_overhead         ... bench:       195 ns/iter (+/- 22) = 22394 MB/s
test conll::sentence_reader_iter_bench ... bench:    269884 ns/iter (+/- 40269) = 16 MB/s
test conll::token_from_str_bench       ... bench:        99 ns/iter (+/- 24) = 313 MB/s
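The MB/s column is derived from the number of bytes a benchmark declares per iteration (libtest's `Bencher::bytes`) and its ns/iter. Assuming 1 MB = 10^6 bytes, throughput is bytes / (ns/iter) × 1000, so the reported figures can be inverted to estimate how many bytes each benchmark declared. A small worked sketch of that arithmetic; the 10^6 convention is an assumption, and the 16 MB/s figure is rounded, so its estimate is coarse:

```rust
/// Throughput in MB/s (assuming 1 MB = 10^6 bytes) for `bytes` processed in
/// `ns_per_iter` nanoseconds: bytes/ns * 10^9 ns/s / 10^6 B/MB.
fn mb_per_s(bytes: f64, ns_per_iter: f64) -> f64 {
    bytes / ns_per_iter * 1000.0
}

/// Inverse of `mb_per_s`: bytes per iteration implied by a reported figure.
fn implied_bytes(mb_s: f64, ns_per_iter: f64) -> f64 {
    mb_s * ns_per_iter / 1000.0
}

fn main() {
    // (benchmark, ns/iter, reported MB/s) copied from the results above.
    let results = [
        ("memreader_overhead", 195.0, 22394.0),
        ("sentence_reader_iter_bench", 269884.0, 16.0),
        ("token_from_str_bench", 99.0, 313.0),
    ];
    for (name, ns, mb_s) in results {
        let bytes = implied_bytes(mb_s, ns);
        println!("{name}: ~{bytes:.0} bytes/iter, {:.0} MB/s", mb_per_s(bytes, ns));
    }
}
```

The implied sizes come out near 4,367, 4,318 and 31 bytes, which is consistent with the first two benchmarks running over the same input of roughly 4.3 KB and the third over a single short token, if the assumption above holds.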
emk / melt2connlx
Created September 29, 2014 13:27
melt2connlx: Convert MElt POS-tagger output into a CoNLL-X file
#!/usr/bin/env ruby
#
# Convert `MElt -tdL` output into *.conllx format. Usage:
#
# melt2conllx < input.melt > output.conllx
#
# Input should be one line per sentence, formatted like:
#
# Durant/P/durant le/DET/le trajet/NC/trajet qui/PROREL/qui
#
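The conversion the Ruby script describes can be sketched in Rust as follows. CoNLL-X has ten tab-separated columns (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL) with `_` for unknown values and a blank line after each sentence. Which columns the original script fills is not shown, so this sketch assumes the MElt tag goes into both CPOSTAG and POSTAG and leaves the rest as `_`; `melt_to_conllx` is a hypothetical name:

```rust
/// Convert one line of `MElt -tdL` output (space-separated `form/POS/lemma`
/// tokens) into CoNLL-X rows. Assumption: the MElt tag fills both CPOSTAG and
/// POSTAG; FEATS, HEAD, DEPREL, PHEAD and PDEPREL are left as `_`.
fn melt_to_conllx(line: &str) -> String {
    let mut out = String::new();
    for (i, token) in line.split_whitespace().enumerate() {
        // Split from the right so a `/` inside the word form is kept;
        // pieces missing from malformed tokens become `_`.
        let mut parts = token.rsplitn(3, '/');
        let lemma = parts.next().unwrap_or("_");
        let pos = parts.next().unwrap_or("_");
        let form = parts.next().unwrap_or("_");
        // ID FORM LEMMA CPOSTAG POSTAG FEATS HEAD DEPREL PHEAD PDEPREL
        out.push_str(&format!(
            "{}\t{}\t{}\t{}\t{}\t_\t_\t_\t_\t_\n",
            i + 1, form, lemma, pos, pos
        ));
    }
    out.push('\n'); // a blank line terminates each sentence
    out
}

fn main() {
    // Example input taken from the script's usage comment.
    print!("{}", melt_to_conllx("Durant/P/durant le/DET/le trajet/NC/trajet qui/PROREL/qui"));
}
```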
emk / map_reduce.rs
Created September 27, 2014 12:08
Need something where both Clone and Copy work
pub struct Emitter<R,K: Hash + Eq + Clone,V: Copy,MR: MapReduce<R,K,V>> {
    results: HashMap<K,V>
}
impl<R,K: Hash + Eq + Clone,V: Copy,MR: MapReduce<R,K,V>> Emitter<R,K,V,MR> {
    fn new() -> Emitter<R,K,V,MR> {
        Emitter{results: HashMap::with_capacity(25000)}
    }
    #[inline]
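Every `Copy` type must also implement `Clone` (`Copy` is a subtrait of `Clone`), so bounding the value type on `Clone` instead of `Copy` accepts both kinds, and for `Copy` types `clone()` is just a bitwise copy. A minimal sketch in current Rust that drops the `R` and `MR` parameters from the snippet above; the `emit` and `get` methods and their merge closure are hypothetical, not the gist's API:

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Simplified emitter: `V: Clone` admits `Copy` types such as `u32` as well
/// as clone-only types such as `String`.
struct Emitter<K: Hash + Eq + Clone, V: Clone> {
    results: HashMap<K, V>,
}

impl<K: Hash + Eq + Clone, V: Clone> Emitter<K, V> {
    fn new() -> Self {
        Emitter { results: HashMap::with_capacity(25000) }
    }

    /// Store `value` under `key`, or merge it into the existing value with
    /// `combine` if the key was already emitted.
    fn emit<F: Fn(&V, &V) -> V>(&mut self, key: &K, value: V, combine: F) {
        match self.results.get_mut(key) {
            Some(existing) => *existing = combine(existing, &value),
            None => {
                self.results.insert(key.clone(), value);
            }
        }
    }

    /// Return an owned copy of the value for `key`; this is where `Clone`
    /// is needed, and it is a plain copy when `V` is `Copy`.
    fn get(&self, key: &K) -> Option<V> {
        self.results.get(key).cloned()
    }
}

fn main() {
    // `u32` is `Copy` (and therefore also `Clone`).
    let mut counts: Emitter<String, u32> = Emitter::new();
    for word in ["le", "de", "le"] {
        counts.emit(&word.to_string(), 1, |a, b| a + b);
    }

    // `String` is `Clone` but not `Copy`.
    let mut joined: Emitter<u8, String> = Emitter::new();
    joined.emit(&1, "a".to_string(), |a, b| format!("{a}{b}"));
    joined.emit(&1, "b".to_string(), |a, b| format!("{a}{b}"));

    println!("{:?} {:?}", counts.get(&"le".to_string()), joined.get(&1));
}
```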
emk / error.txt
Created September 17, 2014 18:54
Cargo errors on Heroku
Updating git repository `https://github.com/reem/rust-unsafe-any.git`
Unable to update https://github.com/reem/rust-unsafe-any.git#75cff194
Caused by:
failed to clone into: /tmp/cargo_ASPp/.cargo/git/db/rust-unsafe-any-3633b05955fd77c7
Caused by:
[16] The SSL certificate is invalid
emk / lexique2-import.sql
Created September 6, 2014 13:19
Import Lexique 3 French vocabulary data into SQLite
-- Using Lexique 3 from http://lexique.org/ with SQLite 3.
--
-- On the command line, to remove the header and extract the first 10 columns:
-- iconv -f ISO-8859-15 -t UTF-8 Lexique380/Bases+Scripts/Lexique380.txt | tail -n+2 | cut -f 1-10 > lexique1-10.txt
-- Set up our original data table. This is pretty raw.
PRAGMA encoding = "UTF-8";
CREATE TABLE lexique (
  ortho TEXT,
  phon TEXT,
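The shell pipeline in the SQL comment drops the header with `tail -n+2` and keeps fields 1 to 10 with `cut -f 1-10`. For doing the same step inside a program, a minimal Rust sketch, assuming the file has already been converted to UTF-8 (the `iconv` step from ISO-8859-15 is not reproduced, since the standard library has no decoder for it); `first_ten_columns` is a hypothetical helper:

```rust
/// Mirrors `tail -n+2 | cut -f 1-10`: skip the header line, then keep at
/// most the first ten tab-separated fields of each remaining line.
fn first_ten_columns(input: &str) -> Vec<Vec<&str>> {
    input
        .lines()
        .skip(1) // header row
        .map(|line| line.split('\t').take(10).collect::<Vec<&str>>())
        .collect()
}

fn main() {
    // Hypothetical sample: a header line plus one data row with 11 fields.
    let sample = "header\nw1\tw2\tw3\tw4\tw5\tw6\tw7\tw8\tw9\tw10\tw11\n";
    for row in first_ten_columns(sample) {
        println!("{}", row.join("\t"));
    }
}
```

Like `cut`, a line with fewer than ten fields (or no tab at all) is passed through unchanged.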
emk / french-lexique-5000-film-corpus.txt
Created September 5, 2014 20:02
French Lexique top 5000 film corpus
-- Generated from Lexique 3. For raw data & Creative Commons License, see: http://lexique.org/
-- Note that this word list has some peculiarities, because it was intended for use by language
-- processing software.
être
avoir
je
de
ne
pas
le