📈
Text ⇨ Understanding

Stuart Axelbrooke soaxelbrooke

@soaxelbrooke
soaxelbrooke / main.py
Created August 7, 2023 01:31
Reading/Querying Parquet Datasets from Self-Hosted S3-Compatible Object Storage with s3fs + PyArrow + Polars
# Having already:
# export AWS_ACCESS_KEY_ID=youraccesskey
# export AWS_SECRET_ACCESS_KEY=yoursecretkey
import pyarrow.dataset as ds
import polars as pl
import s3fs
S3_ENDPOINT = "http://your.s3.endpoint:3900"
@soaxelbrooke
soaxelbrooke / adding-tailscale-to-edgerouter.md
Created January 9, 2023 18:14 — forked from lg/adding-tailscale-to-edgerouter.md
Add Tailscale to an EdgeRouter and survive system upgrades

Adding tailscale to an EdgeRouter (and surviving system upgrades)

I suggest running sudo bash first so that all of these commands run as the root user.

Installing

  1. Download tailscale and put the files in /config/. Find the latest stable or unstable version for your EdgeRouter's processor (e.g. ER4 is mips and ERX is mipsel)
sudo bash    # if you haven't already
@soaxelbrooke
soaxelbrooke / example_file_input.rs
Created August 23, 2020 04:05
Example Handling Input Files With Yew
use yew::events::ChangeData;
use yew::web_sys::File;
use yew::prelude::*;
pub struct MyFileInput {
    link: ComponentLink<Self>,
    file: Option<File>,
}
pub enum Msg {
@soaxelbrooke
soaxelbrooke / systemd-talk.md
Last active February 13, 2020 00:04
Instructions for setting up a systemd service!

systemd Talk

First, let's make ourselves a simple Python web server with Flask:

import os

from flask import Flask

app = Flask(__name__)

PORT = int(os.getenv('FLASK_PORT', 5000))
@soaxelbrooke
soaxelbrooke / ml_utils.py
Last active October 13, 2018 23:34 — forked from zmjjmz/ml_utils.py
regexp match lookup layer
import keras
import tensorflow
import numpy
import re
# Capturing group is important so it can be left padded with space (token splitter)
token_pattern = r"([\w']+|[,\.\?;\-\(\)])"
substitution = r" \1"
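A minimal sketch of how the pattern and substitution above might be used, assuming they are passed to re.sub and the result is then split on whitespace. The helper name tokenize is hypothetical and does not come from the original file.

```python
import re

# Pattern and substitution copied from the snippet above.
token_pattern = r"([\w']+|[,\.\?;\-\(\)])"
substitution = r" \1"


def tokenize(text):
    """Hypothetical helper: left-pad every match with a space, then split on whitespace."""
    return re.sub(token_pattern, substitution, text).split()


tokens = tokenize("Don't stop (now), ok?")
print(tokens)
```

Because the capturing group is re-emitted with a leading space, punctuation that was glued to a word ends up as its own token, while characters outside the pattern (such as "!") stay attached to their neighbour.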
@soaxelbrooke
soaxelbrooke / parallel_word_frequency_count.sh
Last active September 2, 2018 22:16
Counts word frequencies in parallel, then combines the per-split counts.
# Need wf - install with `cargo install wf`
mkdir -p splits wfs
echo 'Splitting file into parts...'
split -a 5 -l 200000 "$1" splits/split
ls splits/ | parallel 'echo "Counting {}..."; cat splits/{} | wf > wfs/{}_wf.txt'
echo 'Combining split counts...'
python -c 'from tqdm import tqdm; from functools import reduce; from glob import glob; from collections import Counter; of = open("wfs.txt", "w"); wf = reduce(lambda a, b: a + b, (Counter(dict((pair[0], int(pair[1])) for pair in (line.strip().split() for line in open(fpath)))) for fpath in tqdm(glob("wfs/*"))), Counter()); [of.write("{} {}\n".format(key, count)) for key, count in sorted(wf.items(), key=lambda p: -p[1])]'
rm -rf wfs splits
echo 'Word frequencies written to wfs.txt.'
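The combining step in the Python one-liner above can be easier to follow when written out. The sketch below assumes each per-split file holds one "word count" pair per line, which is what the one-liner parses. The function names and the in-memory sample data are hypothetical, and unlike the one-liner this version skips malformed lines.

```python
from collections import Counter
from functools import reduce


def parse_counts(lines):
    """Parse 'word count' lines into a Counter, skipping blank or malformed lines."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            counts[parts[0]] += int(parts[1])
    return counts


def combine(partials):
    """Sum per-split Counters into one, like the reduce step in the one-liner."""
    return reduce(lambda a, b: a + b, partials, Counter())


# Illustrative data standing in for two files under wfs/.
split_a = ["the 3", "cat 1"]
split_b = ["the 2", "dog 4"]
total = combine(parse_counts(s) for s in (split_a, split_b))

# Most frequent first, matching the sort used when writing wfs.txt.
ranked = sorted(total.items(), key=lambda p: -p[1])
print(ranked)
```

Adding Counters sums the counts key by key, so the order in which the split files are read does not affect the result.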
@soaxelbrooke
soaxelbrooke / Dockerfile
Created July 10, 2018 00:21
A Dockerfile for building minimal Rust nightly containers
FROM rustlang/rust:nightly AS builder
WORKDIR /app
COPY src/ src/
COPY Cargo.* ./
RUN cargo build --release
FROM debian:stretch-slim
COPY --from=builder /app/target/release/api .
CMD ["./api"]
@soaxelbrooke
soaxelbrooke / wvsqlite.py
Last active March 1, 2023 09:37
Script for converting txt word embedding files to SQLite databases for fast embedding lookup.
#!/usr/bin/env python3.6
"""
Example usage:
$ python3.6 wvsqlite.py glove.840B.300d.txt
Produces an sqlite database with byte strings of floats for each word vector, indexed by
token for fast lookup for vocabs much smaller than the embedding vocab (aka most real vocabs).
Float size can be set via FLOAT_BYTES env var, and can be 4 or 8, and LIMIT can be set to take
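The preview above is cut off, so the following is only a rough sketch of the idea the docstring describes: store each word vector as a packed byte string in SQLite, keyed by token. The table name, column names, helper functions, and sample vectors are all assumptions for illustration, not taken from the original script.

```python
import sqlite3
import struct

FLOAT_FMT = "f"  # 4-byte floats; "d" would give 8-byte floats


def pack_vector(values):
    """Pack a list of floats into a little-endian byte string."""
    return struct.pack(f"<{len(values)}{FLOAT_FMT}", *values)


def unpack_vector(blob):
    """Inverse of pack_vector: recover the list of floats from the bytes."""
    n = len(blob) // struct.calcsize("<" + FLOAT_FMT)
    return list(struct.unpack(f"<{n}{FLOAT_FMT}", blob))


# Hypothetical schema; the PRIMARY KEY provides the token index for fast lookup.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE embeddings (token TEXT PRIMARY KEY, vector BLOB)")

# Illustrative lines in the "token f1 f2 ..." text format.
lines = ["cat 0.5 -1.0 2.0", "dog 1.5 0.25 -0.75"]
rows = []
for line in lines:
    token, *floats = line.split(" ")
    rows.append((token, pack_vector([float(x) for x in floats])))
conn.executemany("INSERT INTO embeddings VALUES (?, ?)", rows)

blob = conn.execute(
    "SELECT vector FROM embeddings WHERE token = ?", ("dog",)
).fetchone()[0]
dog = unpack_vector(blob)
print(dog)
```

Looking up a handful of tokens this way avoids parsing the whole text file when the working vocabulary is much smaller than the embedding vocabulary.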
@soaxelbrooke
soaxelbrooke / word_vectorizer.py
Last active May 16, 2018 06:25
Word embeddings utility class for loading and transforming quickly.
import pandas
import numpy
import csv
from typing import List, Optional
class WordVectorizer:
    def __init__(self, embeddings_path: str, embedding_dim: int, limit=None):
        with open(embeddings_path) as infile:
            # Skip header if this was produced by fasttext, which has metadata on first line
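The preview stops before the loading logic, so here is a standard-library-only sketch of the header-skipping idea mentioned in the comment. It is not the original implementation (which imports pandas and numpy). The function name read_embeddings is hypothetical, and the header check assumes embedding_dim is greater than 1, so that a two-field first line can only be fastText's "vocab_size dim" metadata.

```python
import io


def read_embeddings(infile, embedding_dim, limit=None):
    """Hypothetical reader: return {token: [floats]} from a text embedding file."""
    vectors = {}
    for i, line in enumerate(infile):
        parts = line.rstrip().split(" ")
        # fastText .vec files begin with a "<vocab_size> <dim>" metadata line.
        if i == 0 and len(parts) == 2 and embedding_dim > 1:
            continue
        if limit is not None and len(vectors) >= limit:
            break
        # The last embedding_dim fields are the vector; the rest is the token.
        token = " ".join(parts[:-embedding_dim])
        vectors[token] = [float(x) for x in parts[-embedding_dim:]]
    return vectors


sample = io.StringIO("2 3\ncat 0.5 -1.0 2.0\ndog 1.5 0.25 -0.75\n")
vectors = read_embeddings(sample, embedding_dim=3)
print(vectors)
```

Taking the vector from the end of the line keeps tokens that themselves contain spaces intact.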
@soaxelbrooke
soaxelbrooke / sqlite_command_line_multiple_dot_commands.sh
Last active April 25, 2018 12:01
Execute a multi-line sqlite script without having to create a script file! Works with make!
echo ".mode csv && .header on && .once report.csv && select * from report;" | sed -E -e 's/\s*\&+\s*/\n/g' | sqlite3 database.sqlite
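To see what the sed expression in the one-liner above feeds to sqlite3, here is a small sketch that applies the same substitution in Python. It only reproduces the text transformation; dot commands such as .mode and .once are features of the sqlite3 command-line shell, so the result is meant to be piped to that shell rather than run through Python's sqlite3 module.

```python
import re

script = ".mode csv && .header on && .once report.csv && select * from report;"

# Same substitution as the sed expression: each run of "&" plus the
# whitespace around it becomes a newline, giving one command per line.
commands = re.sub(r"\s*&+\s*", "\n", script).split("\n")

for command in commands:
    print(command)
```

Each resulting line is a separate command, which is what lets the whole script live on a single line of a shell script or Makefile recipe.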