Ben Hoyt benhoyt

@benhoyt
benhoyt / join.awk
Created November 20, 2018 00:59
AWK program to compare time complexity of joining strings
# AWK program to compare time complexity of joining strings using a
# simple O(N^2) algorithm and a slightly more complex O(N log N) one.
# Join array elements, separated by sep: O(N^2) version
function join1(a, sep, i, s) {
    for (i = 1; i+1 in a; i++) {
        s = s a[i] sep
    }
    if (i in a) {
        s = s a[i]
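The header comment above contrasts a simple quadratic join with an O(N log N) one, but the preview cuts off before the second version. As a rough illustration only, here is a Python sketch of the two strategies. The names `join_quadratic` and `join_divide` are hypothetical, and the divide-and-conquer approach is an assumption about how such a join could work, not a copy of the gist's AWK code.

```python
def join_quadratic(items, sep):
    """Append one piece at a time.

    In the worst case each append copies the whole result built so far,
    which is where the O(N^2) behaviour comes from.
    """
    result = ""
    for index, item in enumerate(items):
        if index > 0:
            result += sep
        result += item
    return result


def join_divide(items, sep, lo=0, hi=None):
    """Join items[lo:hi] by joining each half, then concatenating the halves.

    Each level of recursion copies every character once, and there are
    about log2(N) levels, giving O(N log N) copying overall.
    """
    if hi is None:
        hi = len(items)
    if hi - lo == 0:
        return ""
    if hi - lo == 1:
        return items[lo]
    mid = (lo + hi) // 2
    left = join_divide(items, sep, lo, mid)
    right = join_divide(items, sep, mid, hi)
    return left + sep + right
```

Both functions should return the same string as Python's built-in `sep.join(items)`, which is the better choice in real Python code; the sketch only exists to make the complexity difference concrete.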
benhoyt / edit_distance.py
Created May 8, 2018 14:06
Calculate edit distance with simple (memoized) recursive algorithm
def distance(s, t, cache=None):
    """Return minimum edit distance between s and t, where an edit
    is a character substitution, deletion, or addition.
    """
    if not s:
        return len(t)
    if not t:
        return len(s)
    if cache is None:
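The preview above stops before the caching logic. The following is a separate, self-contained sketch of a memoized recursive edit distance, written under my own assumptions: the name `edit_distance`, the inner helper `solve`, and the choice to key the cache on index pairs are illustrative and may differ from the gist's implementation.

```python
def edit_distance(s, t):
    """Return the minimum number of substitutions, deletions, and
    additions needed to turn s into t (memoized recursion).
    """
    cache = {}  # maps (i, j) to the distance between s[i:] and t[j:]

    def solve(i, j):
        # If one string is exhausted, the rest of the other must be added
        # or deleted character by character.
        if i == len(s):
            return len(t) - j
        if j == len(t):
            return len(s) - i
        key = (i, j)
        if key not in cache:
            if s[i] == t[j]:
                # Matching characters cost nothing; move past both.
                cache[key] = solve(i + 1, j + 1)
            else:
                cache[key] = 1 + min(
                    solve(i + 1, j + 1),  # substitution
                    solve(i + 1, j),      # deletion from s
                    solve(i, j + 1),      # addition to s
                )
        return cache[key]

    return solve(0, 0)
```

Without the cache the recursion revisits the same suffix pairs many times; with it, each `(i, j)` pair is computed once, so the work is bounded by `len(s) * len(t)`. For long inputs the recursion depth could exceed Python's default limit, so an iterative table would be the safer choice there.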
benhoyt / flattenjson.go
Created July 7, 2020 05:06
Little Go tool to flatten JSON input
// Flatten JSON input
//
// Example:
//
// $ echo '{"user":"Ben","ints":[true,false,null],"sub":{"x":1,"y":2}}' | go run flattenjson.go
// _.ints[0] = true
// _.ints[1] = false
// _.ints[2] = null
// _.sub.x = 1
// _.sub.y = 2
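To make the output format in the example above concrete, here is a small Python sketch that produces lines of the same shape. It is not the Go tool itself: the function name `flatten`, the `_` root prefix default, and the decision to visit object keys in sorted order are assumptions chosen to match the sample output shown.

```python
import json


def flatten(value, prefix="_"):
    """Yield 'path = value' lines for every leaf in a decoded JSON value."""
    if isinstance(value, dict):
        # Assumption: keys are visited in sorted order.
        for key in sorted(value):
            yield from flatten(value[key], f"{prefix}.{key}")
    elif isinstance(value, list):
        for index, item in enumerate(value):
            yield from flatten(item, f"{prefix}[{index}]")
    else:
        # Leaves are printed back in JSON syntax (true, null, "Ben", 1).
        yield f"{prefix} = {json.dumps(value)}"


if __name__ == "__main__":
    doc = json.loads('{"user":"Ben","ints":[true,false,null],"sub":{"x":1,"y":2}}')
    for line in flatten(doc):
        print(line)
```

This sketch does not escape keys that contain dots or brackets, and empty objects or arrays produce no output lines; a real tool would need to decide how to handle both.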
benhoyt / markdown.diff
Created October 7, 2020 19:28
Diff to override goldmark's code block output
diff --git a/internal/markdown/markdown.go b/internal/markdown/markdown.go
index a729b9f..94d29c1 100644
--- a/internal/markdown/markdown.go
+++ b/internal/markdown/markdown.go
@@ -13,8 +13,11 @@ import (
"bytes"
"github.com/yuin/goldmark"
+ "github.com/yuin/goldmark/ast"
"github.com/yuin/goldmark/parser"
benhoyt / countwords.fs
Created March 12, 2021 07:33
Forth: print frequencies of unique words in stdin, most frequent first
200 constant max-line
create line max-line allot \ Buffer for read-line
wordlist constant counts \ Hash table of words to count
variable num-uniques 0 num-uniques !
\ Allocate space for new string and copy bytes, return new string.
: copy-string ( addr u -- addr' u )
  dup >r dup allocate throw
  dup >r swap move r> r> ;
benhoyt / python-stdlib.md
Created October 17, 2019 00:08
Overview of (parts of) the Python standard library

I'm going to demo a bunch of Python built-in and stdlib functions. There's a lot to get through, so I'll be going fast, but please stop me and ask questions as we go. The goal is to give you a taste of Python's power and expressiveness if you're not a Python person, or maybe teach you a few new tricks if you already are.

Built-in functions

# enumerate: iterate with index *and* item
>>> strings = ['123', '0', 'x']
>>> for i, s in enumerate(strings):
...     print(f'{i} - {s}')  # f-strings!
benhoyt / mt.py
Created September 21, 2021 03:33
Quick performance test of Python 3.10's "match" vs "if...elif"
"""Quick performance tests comparing "match" with "if...elif".
See:
https://benhoyt.com/writings/python-pattern-matching/
https://news.ycombinator.com/item?id=28601616
# Enum switch with match:
$ python3.10 -m timeit -s 'import mt' -c 'mt.enum_match(mt.FileType.FILE)'
1000000 loops, best of 5: 356 nsec per loop
$ python3.10 -m timeit -s 'import mt' -c 'mt.enum_match(mt.FileType.SYMLINK)'
benhoyt / client.go
Created September 23, 2021 00:12
Benchmark of three ways to do optional fields in Go structs
package client
func IntPtr(n int) *int { return &n }
type FooArgsPtr struct {
    UserID  *int
    User    string
    GroupID *int
    Group   string
}
benhoyt / repeat-while.diff
Last active December 16, 2021 02:22
See how fast we can make direct-threaded code in C (using computed goto)
static void* prog[] = {
- // loop:
&&i_pushvar0, // pushvar i
&&i_pushnum, (void*)100000000, // pushnum 100000000
&&i_jge, (void*)5, // jge end
+ // loop:
&&i_pushvar0, // push i
&&i_addvar1, // addvar s
&&i_incvar0, // incvar i
- &&i_jmp, (void*)((long long)-10), // jmp loop
benhoyt / ngrams.py
Created May 12, 2016 15:34
Print most frequent N-grams in given file
"""Print most frequent N-grams in given file.
Usage: python ngrams.py filename
Problem description: Build a tool which receives a corpus of text,
analyses it and reports the top 10 most frequent bigrams, trigrams,
four-grams (i.e. most frequently occurring two, three and four word
consecutive combinations).
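The problem description above can be sketched in a few lines of Python. This is only an illustration, not the gist's code: the name `top_ngrams`, the `limit` parameter, and the tokenization rule (lowercase runs of letters and apostrophes) are all assumptions.

```python
import re
from collections import Counter


def top_ngrams(text, n, limit=10):
    """Return the `limit` most frequent n-word sequences as (ngram, count) pairs."""
    # Assumed tokenization: lowercase words made of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    # Zipping n staggered copies of the word list yields consecutive n-word windows.
    windows = zip(*(words[i:] for i in range(n)))
    return Counter(" ".join(window) for window in windows).most_common(limit)


if __name__ == "__main__":
    sample = "the cat sat on the mat and the cat sat down"
    for size in (2, 3, 4):
        print(size, top_ngrams(sample, size, limit=3))
```

Using `Counter.most_common` keeps the counting and ranking in one call; for a very large corpus, reading the file in chunks rather than holding all words in memory would be worth considering.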
NOTES