Elia Robyn Lake (Robyn Speer) rspeer

"""
This file contains code that, when run on Python 2.7.5 or earlier, creates
a string that should not exist: u'\Udeadbeef'. That's a single "character"
that's illegal in Python because it's outside the valid Unicode range.
It then uses it to crash various things in the Python standard library and
corrupt a database.
On Python 3... well, this file is full of syntax errors on Python 3. But
if you were to change the print statements and byte literals and stuff:
/* This Rust code scans through the Common Crawl, looking for text that's
 * not English. I suspect I may learn much later that it's terrible,
 * unidiomatic Rust, but it would take me months to learn what good Rust is.
 *
 * We depend on some external libraries:
 *
 * - html5ever: an HTML parser (we only use its low-level tokenizer)
 * - encoding: handles text in all the encodings that WHATWG recognizes
 * - string_cache: interns a bunch of frequently-used strings, like tag names -- necessary to use
 *   the html5ever tokenizer
function _init()
  -- tiles to move per frame
  -- don't make this more than 1
  fstep = 1/8
  -- step counter
  -- it can overflow, that's fine
  step = 0
  trailpos = 0
rspeer / countmerge.awk
Last active June 20, 2018 00:30
Given a sorted file where each line is a key and a count, merge adjacent lines with the same key by adding their counts.
# Given a tab-separated, sorted file where each line is a key and a count,
# merge adjacent lines with the same key by adding their counts.
BEGIN {
    # Initialize the current count.
    # We use the empty string as a sentinel value, indicating that we haven't
    # seen a key yet. We won't output a total for the empty string.
    key = ""
    count = 0
}
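The rest of the awk script is not shown here, so the following is only a minimal Python sketch of the merge the description states: adjacent lines with the same key are collapsed into one line whose count is the sum. It assumes the key is everything before the last tab and that counts are integers. The name `merge_counts` and the sample data are hypothetical. It uses `itertools.groupby` instead of the empty-string sentinel from the `BEGIN` block, so it is a different technique aimed at the same result.

```python
import itertools


def merge_counts(lines):
    """Yield (key, total) for each run of adjacent lines sharing a key.

    Assumes each non-blank line is "key<TAB>count" with an integer count,
    and that the input is already sorted so equal keys are adjacent.
    """
    rows = (
        line.rstrip("\n").rsplit("\t", 1)
        for line in lines
        if line.strip()
    )
    # groupby only merges *adjacent* equal keys, which matches sorted input.
    for key, group in itertools.groupby(rows, key=lambda row: row[0]):
        yield key, sum(int(row[1]) for row in group)


# Hypothetical sample input, for illustration only.
sample = [
    "apple\t2\n",
    "apple\t3\n",
    "banana\t1\n",
    "cherry\t4\n",
    "cherry\t6\n",
]
merged = list(merge_counts(sample))
for key, total in merged:
    print(f"{key}\t{total}")
```

Because only adjacent keys are merged, unsorted input would produce several totals for the same key, which is why the description requires a sorted file.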
>>> from wordfreq import tokenize, word_frequency
>>> tokenize('电影放映机', 'zh')
['电影', '放映', '机']
>>> word_frequency('电影放映机', 'zh')
5.370851923771552e-08
>>> word_frequency('programme', 'en')
5.754399373371567e-05
rspeer / aaaa.html
Created March 14, 2016 20:00
Overflowing the stack of Text.HTML.TagSoup with a straightforward HTML file
<html>
<body>
aaaaaa
aaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaa
rspeer / dominion-rnn-cards.txt
Last active November 12, 2015 07:49
Dominion cards generated with a neural net. See http://forum.dominionstrategy.com/index.php?topic=13415.100.
2renole
$3, Action
Trash this card. If you do, gain a Silver per 5 cards it, and put them into your hand.
3rost
$5, Action, Duration
^ marks the name of the card.
The column with all the @ signs indicates the cost and type. I probably missed some because I was impatiently editing a file I had already.
A = Action, T = Treasure, V = Victory, a = Attack, R = Reaction, v = traVeler, D = Duration, E = Event, r = Ruins.
| indicates a line break, and --- indicates a horizontal line.
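The legend above can be written down as a small lookup table. This is a sketch only: the data file itself is not shown, so how the type letters are laid out in the `@` column is an assumption (here, a run of letters with any other characters ignored). The names `TYPE_CODES`, `decode_types`, and `expand_line_breaks` are hypothetical.

```python
# Type letters as given in the legend; case matters (A vs a, V vs v, R vs r).
TYPE_CODES = {
    "A": "Action",
    "T": "Treasure",
    "V": "Victory",
    "a": "Attack",
    "R": "Reaction",
    "v": "Traveler",
    "D": "Duration",
    "E": "Event",
    "r": "Ruins",
}


def decode_types(codes):
    """Map a run of type letters to type names, skipping unknown characters.

    Assumption: the letters are simply concatenated; padding such as '@'
    is ignored rather than treated as an error.
    """
    return [TYPE_CODES[c] for c in codes if c in TYPE_CODES]


def expand_line_breaks(text):
    """Turn the '|' marker from the legend into real line breaks."""
    return text.replace("|", "\n")


print(decode_types("AD"))
print(expand_line_breaks("first line|second line"))
```

The `---` marker is left untouched here, since how a horizontal line should be rendered depends on the output format.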
>>> import wordfreq, langcodes
>>> def legible_list(lst):
...     return('\N{LEFT-TO-RIGHT MARK}, '.join(lst))
...
>>> for lang in sorted(wordfreq.available_languages()):
...     language_name = langcodes.get(lang).language_name('en')
...     top_ten = legible_list(wordfreq.top_n_list(lang, 10))
...     print('%-3s %-12s %s' % (lang, language_name, top_ten))