@phiresky
phiresky / exponential-retry.js
Created April 5, 2024 13:43
lemmy exponential retry simulation
function calc(base) {
function retryDelaySeconds(retry) {
return Math.pow(base, retry);
}
function timeToStr(sec) {
sec = Math.round(sec);
if (sec < 60) {
return sec + "s";
}
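The preview above is cut off, so the rest of the simulation is not shown here. As a minimal sketch of the idea only, the delay schedule for a given base could be accumulated like this. The helper name `cumulativeDelays`, the `maxRetries` parameter, and the choice to start counting retries at 0 are assumptions for illustration and are not taken from the gist.

```javascript
// Delay before a given retry, using the same formula as the gist: base^retry.
function retryDelaySeconds(base, retry) {
  return Math.pow(base, retry);
}

// Hypothetical helper: total time elapsed after each retry attempt.
// Assumes retries are numbered from 0, which the gist preview does not confirm.
function cumulativeDelays(base, maxRetries) {
  const totals = [];
  let elapsed = 0;
  for (let retry = 0; retry < maxRetries; retry++) {
    elapsed += retryDelaySeconds(base, retry);
    totals.push(elapsed);
  }
  return totals;
}

// With base 2 the individual delays are 1, 2, 4, 8, 16 seconds.
console.log(cumulativeDelays(2, 5));
```

Comparing the output for a few candidate bases shows how quickly the total wait grows, which is what a simulation like this is presumably meant to explore.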
@phiresky
phiresky / cjk-tokenization.js
Last active September 24, 2021 20:22
CJK Tokenization is hard or impossible to do perfectly, especially if you don't know the language or don't want to load megabytes of dictionaries. Here's a simple solution that gets you most of the way.
// cjk tokenization snippet:
"hello test 德国 をクリックしてください 안녕하세요 세계"
.replace(/(\p{Script=Han}|\p{Script=Hiragana}|\p{Script=Katakana}|\p{Script=Hang})/ug, "$1 ")
// inserts spaces after every CJK character, so a normal tokenizer will pick up each character as
// a separate "word". there doesn't seem to be a smarter alternative to this.
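To see what a whitespace tokenizer would then receive, the replacement can be wrapped in a small function. This is only a sketch: the name `tokenizeCjk` and the whitespace split standing in for "a normal tokenizer" are assumptions, not part of the gist.

```javascript
// Same pattern as the snippet above: match any single Han, Hiragana,
// Katakana, or Hangul character.
const CJK_CHAR =
  /(\p{Script=Han}|\p{Script=Hiragana}|\p{Script=Katakana}|\p{Script=Hang})/gu;

// Hypothetical helper: pad every CJK character with a trailing space, then
// split on whitespace as a stand-in for a real tokenizer.
function tokenizeCjk(text) {
  return text
    .replace(CJK_CHAR, "$1 ")
    .split(/\s+/)
    .filter((token) => token.length > 0);
}

// Latin words stay whole; each CJK character becomes its own token.
console.log(tokenizeCjk("hello test 德国 안녕"));
```

The trade-off is that multi-character words are split apart, so matching happens on single characters rather than on real word boundaries.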
@phiresky
phiresky / recursive-decr-convert-rdiff-to-borg.sh
Created August 29, 2021 10:46
recursively convert a rdiff-backup repository to a borg backup one
set -e
set -x
src=backup
dst=borg
tmpdir=./tmp
t1=$(mktemp)
rdiff-backup --list-increments "$src" > "$t1"
@phiresky
phiresky / fast_dict_space.py
Last active July 15, 2021 10:37
gym space like dict space but faster based on numpy structured array
from typing import OrderedDict
import numpy as np
import numpy.lib.recfunctions as rfn
from gym import spaces
def structured_to_unstructured(arr: np.ndarray):
return rfn.structured_to_unstructured(arr, casting="no")
@phiresky
phiresky / amazon-sort-by-price-per-unit.js
Last active December 10, 2023 18:44
reorders amazon search results to sort by price per weight to find the actually cheapest item (only tested on amazon.de)
// reorders amazon search results to sort by price per weight to find the actually cheapest item
// 1. install https://addons.mozilla.org/en-US/firefox/addon/weautopagerize/
// 2. search for something
// 3. scroll down all the way
// 4. run this script
// "featured by amazon"
for (let crap of document.getElementsByClassName(
"template=FEATURED_ASINS_LIST"
))
@phiresky
phiresky / sql-libs-for-typescript.md
Created June 26, 2020 10:42
SQL libs for typescript

Using SQL databases in a typed language is a pain unless you have great libraries to support you. There are many different libraries for TypeScript, but they all have flaws.

This is a complete overview of SQL libraries for TypeScript. If I'm missing a library, please let me know.

Object-Relational Mappers (ORMs)

In an ORM you declare the schema completely in the host language (TypeScript). The ORM then completely manages synchronization between your objects / classes and the corresponding database tables.

ORMs always have the same issues: once your queries get somewhat complex, you hit the limits of the ORM and cannot express the query without an escape hatch such as raw SQL. You also lose direct control over how queries are executed, and may get surprising performance issues when the ORM generates inefficient SQL in the background.

@phiresky
phiresky / tune.md
Last active September 6, 2024 07:01
SQLite performance tuning

You can scale a SQLite database to multiple gigabytes in size and many concurrent readers by applying the optimizations below.

Run these every time you connect to the db

(some are applied permanently, but others are reset on new connection)

pragma journal_mode = WAL;

Instead of writing directly to the db file, write to a write-ahead log and regularly commit (checkpoint) the changes into the main file. This allows multiple concurrent readers and can significantly improve performance.

@phiresky
phiresky / .gitignore
Last active April 24, 2022 16:59
parity auto-kill script
/node_modules
*.log
@phiresky
phiresky / pdfextract.sh
Created May 29, 2020 19:12
ripgrep pdf text extractor with caching that is much faster than pdfgrep
#!/bin/bash
# usage: `rg --no-line-number --sort-files --pre pdfextract "$@"`
# better and much faster solution: https://github.com/phiresky/ripgrep-all
fname="$1"
cachedir=/tmp/pdfextract
mkdir -p "$cachedir"
<meta charset="utf-8">
<script src="https://unpkg.com/sql.js@1.2.2/dist/sql-asm.js"></script>
<script>
async function go() {
const SQL = await initSqlJs();
const dbres = await fetch("https://rawcdn.githack.com/kotartemiy/newscatcher/b30358cf57c9f8f4a481b51c0a0884a64e0b85b2/newscatcher/data/package_rss.db");