Elia Robyn Lake (Robyn Speer) rspeer

## how-to-make-a-racist-ai-without-really-trying.ipynb

      
1 file, 38 forks, 9 comments, 228 stars. Last active December 23, 2023 22:54.
    
## deadbeef_character.py
"""
This file contains code that, when run on Python 2.7.5 or earlier, creates
a string that should not exist: u'\Udeadbeef'. That's a single "character"
that's illegal in Python because it's outside the valid Unicode range.

It then uses it to crash various things in the Python standard library and
corrupt a database.

On Python 3... well, this file is full of syntax errors on Python 3. But
if you were to change the print statements and byte literals and stuff:

## commoncrawl.rs
/* This Rust code scans through the Common Crawl, looking for text that's
 * not English. I suspect I may learn much later that it's terrible,
 * unidiomatic Rust, but it would take me months to learn what good Rust is.
 *
 * We depend on some external libraries:
 *
 *   - html5ever: an HTML parser (we only use its low-level tokenizer)
 *   - encoding: handles text in all the encodings that WHATWG recognizes
 *   - string_cache: interns a bunch of frequently-used strings, like tag names -- necessary to use
 *     the html5ever tokenizer

## snekmaze2.p8
function _init()
 -- tiles to move per frame
 -- don't make this more than 1
 fstep = 1/8

 -- step counter
 -- it can overflow, that's fine
 step = 0
 trailpos = 0


## countmerge.awk
# Given a tab-separated, sorted file where each line is a key and a count,
# merge adjacent lines with the same key by adding their counts.

BEGIN {
    # Initialize the current count.
    # We use the empty string as a sentinel value, indicating that we haven't
    # seen a key yet. We won't output a total for the empty string.
    key = ""
    count = 0
}
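
The merge described in the comments above can be sketched in Python. This is a minimal illustration, not a port of the awk script: the `merge_counts` name and the sample data are hypothetical, and it assumes counts are integers.

```python
import io
from itertools import groupby


def merge_counts(lines):
    """Yield (key, total) for each run of adjacent lines sharing a key.

    Assumes each non-blank line is "key<TAB>count" with an integer count,
    and that the input is already sorted so equal keys are adjacent.
    """
    parsed = (line.rstrip("\n").split("\t") for line in lines if line.strip())
    # groupby only merges *adjacent* equal keys, which matches the
    # "sorted file" precondition stated in the awk comments.
    for key, group in groupby(parsed, key=lambda fields: fields[0]):
        yield key, sum(int(fields[1]) for fields in group)


# Hypothetical sample input, for illustration only.
sample = io.StringIO("apple\t2\napple\t3\nbanana\t1\n")
merged = list(merge_counts(sample))
for key, total in merged:
    print(f"{key}\t{total}")
```

Because only adjacent lines are merged, an unsorted input would produce several totals for the same key rather than one.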

## wordfreq-1.2-example.py
>>> from wordfreq import tokenize, word_frequency
>>> tokenize('电影放映机', 'zh')
['电影', '放映', '机']

>>> word_frequency('电影放映机', 'zh')
5.370851923771552e-08

>>> word_frequency('programme', 'en')
5.754399373371567e-05

## aaaa.html
<html>
<body>
aaaaaa
aaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaa

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

aaaaaaaaa

## dominion-rnn-cards.txt
2renole
$3, Action

Trash this card. If you do, gain a Silver per 5 cards it, and put them into your hand.


3rost
$5, Action, Duration

## description.txt
^ marks the name of the card.
The column with all the @ signs indicates the cost and type. I probably missed some because I was impatiently editing a file I had already.
A = Action, T = Treasure, V = Victory, a = Attack, R = Reaction, v = traVeler, D = Duration, E = Event, r = Ruins.
| indicates a line break, and --- indicates a horizontal line.
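
The letter key above maps naturally onto a lookup table. The sketch below is only an illustration: the `decode_types` helper is hypothetical, and it assumes a card's type codes appear as a run of letters such as `AD`, which the description does not actually confirm.

```python
# Letter codes copied from the key above; the case of each letter matters
# (for example "A" is Action while "a" is Attack).
TYPE_CODES = {
    "A": "Action",
    "T": "Treasure",
    "V": "Victory",
    "a": "Attack",
    "R": "Reaction",
    "v": "Traveler",
    "D": "Duration",
    "E": "Event",
    "r": "Ruins",
}


def decode_types(codes):
    """Expand a string of type letters into type names (hypothetical helper)."""
    return [TYPE_CODES[letter] for letter in codes]


# Assumed example: a card marked with the letters "AD".
print(", ".join(decode_types("AD")))
```

An unknown letter raises `KeyError`, which seems preferable to silently dropping a type when the key admits it may be incomplete.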

## wordfreq-1.2-top-words.py
>>> import wordfreq, langcodes

>>> def legible_list(lst):
...     return('\N{LEFT-TO-RIGHT MARK}, '.join(lst))
...

>>> for lang in sorted(wordfreq.available_languages()):
...     language_name = langcodes.get(lang).language_name('en')
...     top_ten = legible_list(wordfreq.top_n_list(lang, 10))
...     print('%-3s %-12s %s' % (lang, language_name, top_ten))