View gist:e96982253136dcd253f8376ded2fd0e5
sqlite> .schema itemAnnotations
CREATE TABLE IF NOT EXISTS "itemAnnotations" (
  itemID INTEGER PRIMARY KEY,
  parentItemID INT NOT NULL,
  type INTEGER NOT NULL,
  authorName TEXT,
  text TEXT,
  comment TEXT,
  color TEXT,
  pageLabel TEXT,
atomotic / epub-search.md
Created November 13, 2021 12:11
indexing epub content into solr
View epub-search.md

indexing epub content into solr

solr schema

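  • option 1: 1 document per chapter, then collapse the chapter hits per book at query time
  • option 2: 1 document per book, with multivalued fields chapter_title and chapter_text, keeping the chapters in order.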
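The two layouts above can be sketched as plain Python dictionaries ready to be serialized as JSON for Solr. This is only an illustration: `chapter_title` and `chapter_text` come from the notes, while `id`, `book_id`, `book_title` and `chapter_no` are hypothetical field names that would have to exist in the schema.

```python
def chapter_docs(book_id, book_title, chapters):
    """Layout 1: one Solr document per chapter.

    `chapters` is a list of (chapter_title, chapter_text) pairs in reading order.
    Field names other than chapter_title/chapter_text are hypothetical.
    """
    docs = []
    for position, (chapter_title, chapter_text) in enumerate(chapters, start=1):
        docs.append({
            "id": f"{book_id}-{position:04d}",  # unique key per chapter
            "book_id": book_id,                 # field to collapse on
            "book_title": book_title,
            "chapter_no": position,
            "chapter_title": chapter_title,
            "chapter_text": chapter_text,
        })
    return docs


def book_doc(book_id, book_title, chapters):
    """Layout 2: one document per book with two parallel multivalued fields."""
    return {
        "id": book_id,
        "book_title": book_title,
        # Both lists are built from the same sequence, so index i in one
        # field corresponds to index i in the other.
        "chapter_title": [title for title, _ in chapters],
        "chapter_text": [text for _, text in chapters],
    }
```

With the first layout, and assuming `book_id` is a single-valued string field, the chapter hits could be collapsed to one result per book with Solr's collapsing query parser, for example `fq={!collapse field=book_id}`.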

text extraction

how to extract structured text from epub
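One possible approach, sketched with the Python standard library only: an EPUB is a ZIP archive whose `META-INF/container.xml` points to a package (OPF) file, and the package's spine lists the content documents in reading order. The sketch below assumes one spine item per chapter, UTF-8 encoded XHTML, and takes the first `h1`/`h2`/`h3` as the chapter title; none of this is confirmed by the notes, and the names `epub_chapters` and `_TextExtractor` are made up for the example.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import unquote

NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
}


class _TextExtractor(HTMLParser):
    """Collect the visible text and the first heading of an XHTML document."""

    HEADINGS = {"h1", "h2", "h3"}
    SKIPPED = {"head", "script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []          # every visible text node
        self.heading_parts = []  # text nodes of the first heading only
        self.heading_done = False
        self._in_heading = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag in self.HEADINGS and not self.heading_done:
            self._in_heading = True

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.HEADINGS and self._in_heading:
            self._in_heading = False
            self.heading_done = True

    def handle_data(self, data):
        if self._skip_depth:
            return
        if self._in_heading:
            self.heading_parts.append(data)
        self.parts.append(data)


def epub_chapters(path):
    """Yield (title, text) for each spine item, in reading order.

    `path` may be a file name or a binary file-like object.
    """
    with zipfile.ZipFile(path) as z:
        container = ET.fromstring(z.read("META-INF/container.xml"))
        opf_path = container.find("c:rootfiles/c:rootfile", NS).attrib["full-path"]
        opf = ET.fromstring(z.read(opf_path))
        base = posixpath.dirname(opf_path)
        manifest = {item.attrib["id"]: item.attrib["href"]
                    for item in opf.findall("opf:manifest/opf:item", NS)}
        for itemref in opf.findall("opf:spine/opf:itemref", NS):
            href = unquote(manifest[itemref.attrib["idref"]])
            name = posixpath.normpath(posixpath.join(base, href))
            parser = _TextExtractor()
            parser.feed(z.read(name).decode("utf-8"))  # assumes UTF-8 content
            parser.close()
            title = " ".join("".join(parser.heading_parts).split())
            # Text nodes are joined with a space, so words split by inline
            # markup (e.g. <b>bo</b>ld) end up separated.
            text = " ".join(" ".join(parser.parts).split())
            yield title, text
```

Calling `list(epub_chapters("book.epub"))` (the file name is a placeholder) would give the ordered pairs that either schema layout needs. Books that split one chapter across several files, or put several chapters in one file, would need the navigation document instead of the spine.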

View docker-compose.yml
version: "3"
services:
  node-exporter:
    image: prom/node-exporter
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - "--path.procfs=/host/proc"
atomotic / readme.md
Created October 3, 2021 10:46
export a static image from Annotorious, with annotation data embedded
View readme.md
View gist:c9e328712c1604a772ca031d67b5d44a
~ ipfs ls /ipns/ipfs-sec.stackexchange.cloudflare-ipfs.com/crypto/
zdj7WawSwGzackrPMpRyE5gB14rrR3CXSML4Cowsfo8RVA48m 261478141 A
zdj7WmdqpgAKtT6bik5FZiUuEBw3ibBE2Jvbf2yDHoCTeZtUR 2369946 -
zdj7WikogGGVBPciUv1hgnecawo7P4E6Rwj44LC6vdibCywTN 276549715 I
zdj7WfHUdRgg4LTr77ZX3tR3fLj7xDDpo8StCFFU4B2ZR43cm 1034 M
zdj7WfKAh1sMoUDZLq13yJzyb7dpHaqSV1p2Ftdx2VUgxMhFe 154 index.html
zdj7WcC5bvaUjBXYXZnsUa7Ghe2rtiSst9JnwpMwDTBWC4N4m 8343 search.html
zdj7WazUKfQCWpKePKDRBFaomsJrNcEu6U9obNP9UhLg6cArN 53013564 _index
atomotic / mastodon-followers.sh
Created August 31, 2018 08:26
get the list of followers of a mastodon user. output in ntriples
View mastodon-followers.sh
#!/usr/bin/env bash
instance="https://digipres.club"
user="raffaele"
# fetch the first page of the followers collection as ActivityStreams JSON
json=$(curl -s -H "Accept: application/activity+json" "$instance/users/$user/followers?page=1")
echo "$json" | jq -r '.orderedItems[]' | xargs -I% echo "<%> <follows> <$instance/users/$user> ."
next=$(echo "$json" | jq -r .next)
while true; do
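The preview stops at the start of the loop, so its body is not shown here. As a rough illustration of the same idea in Python, the sketch below assumes the loop keeps following the `next` link of each followers page until there is none, and that `orderedItems` holds plain actor URLs (as the `jq` filter in the script implies). The function names are invented for the example, and the bare `follows` predicate simply mirrors the script.

```python
import json
import urllib.request


def fetch_json(url):
    """GET a URL as ActivityStreams JSON and decode it."""
    req = urllib.request.Request(url, headers={"Accept": "application/activity+json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def followers_to_ntriples(page, actor_uri, predicate="follows"):
    """Turn one followers page (a dict) into N-Triples-style lines."""
    return [f"<{follower}> <{predicate}> <{actor_uri}> ."
            for follower in page.get("orderedItems", [])]


def crawl(fetch, first_url, actor_uri):
    """Yield triples page by page, following `next` until it is missing.

    `fetch` is any callable mapping a URL to a decoded page, so the
    pagination logic can be exercised without network access.
    """
    url, seen = first_url, set()
    while url and url not in seen:  # `seen` guards against a looping `next`
        seen.add(url)
        page = fetch(url)
        yield from followers_to_ntriples(page, actor_uri)
        url = page.get("next")
```

A run against a live instance would look like `for line in crawl(fetch_json, first_page_url, actor_uri): print(line)`, where `first_page_url` and `actor_uri` are built from the instance and user name as in the shell script.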
View gist:445c3996727ad77db30e15259304a15c
# apt install rustc cargo
# git clone https://github.com/tari/warcdedupe
# cd warcdedupe
# cargo install
# ...
# ./target/debug/warcdedupe -h
WARC deduplicator.
Usage:
warcdedupe [options] [<infile>] [<outfile>]
View readme.md

install solr and create a core (books)

brew install solr
solr start
solr create -c books -d /usr/local/Cellar/solr/7.2.1/example/files/conf

index a pdf

post -c books /tmp/gabriella-giannachi-archive-everything-mapping-the-everyday.pdf
atomotic / brainwashed-2017-poll.txt
Created January 13, 2018 09:07
brainwashed 2017 Readers Poll - The Results
View brainwashed-2017-poll.txt
spotify:album:4nSWX5A4xVomzrOEGDKLQ6 - Slowdive, Slowdive
spotify:album:4JQ2igmQEWUihSRzWgTiCF - Gas, Narkopop
spotify:album:0D8xltlqklXZ1DV7lFyE22 - Drew McDowall, Unnatural Channel
spotify:album:7Hcbzsu4lqRzPakrCnpgb9 - Emptyset, Borders
spotify:album:4y372QHtXp8aJCV7M4YkBv - Lawrence English, Cruel Optimism
spotify:album:5EXqFb0ch5dqP2ncl63XVY - Gnod, Just Say No To The Psycho Right-Wing Capitalist Fascist Industrial Death Machine
spotify:album:4yLRI4kaOy4LhSPZ2sCVbE - Godflesh, Post Self
spotify:album:6B1OkPs0AlG9QsHIxKwrgp - William Basinski, A Shadow In Time
spotify:album:6LDgPsDJlyJ948ARpncN9c - Alessandro Cortini, Avanti
spotify:album:02RHfsgbl7H9lnYXEsTLsA - Wire, Silver / Lead