@hubgit
hubgit / download-all-linked-files.js
Created March 3, 2024 17:26
Download all downloadable links in a single zip archive
const links = document.querySelectorAll('a[download]')
if (links.length === 0) {
  console.log("No downloadable files found")
  return
}
const handle = await showSaveFilePicker({
  suggestedName: 'files.zip',
  types: [{
hubgit / mlx-mixtral-macos.md
Created January 9, 2024 22:45
Run Mixtral-8x7B-Instruct-v0.1 LLM on macOS (Apple Silicon) using MLX
brew install git-lfs 

git clone https://github.com/ml-explore/mlx-examples
cd mlx-examples/llms/mixtral
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1
cd Mixtral-8x7B-Instruct-v0.1
git lfs pull --include "consolidated.*.pt" # ~100GB
git lfs pull --include "tokenizer.model"
hubgit / main.ts
Created December 11, 2023 12:18
Fetch tracks played on the Independent Music Podcast
import { DOMParser, type Element, } from "https://deno.land/x/deno_dom@v0.1.43/deno-dom-wasm.ts";
const parser = new DOMParser()
const fetchDOM = async (url: string) => {
  const response = await fetch(url)
  if (!response.ok) {
    throw new Error('Response was not ok')
  }
  const html = await response.text()
hubgit / chat.ts
Last active May 11, 2023 08:35
Vercel Edge Function for an OpenAI API request
import type { NextRequest } from 'next/server'
import { createParser } from 'eventsource-parser'
export const config = {
  runtime: 'edge',
}
export default async function handler(req: NextRequest) {
  const encoder = new TextEncoder()
  const decoder = new TextDecoder()
hubgit / textract-pdf-tables.sh
Last active June 15, 2023 13:31
Extract tabular data from a PDF to CSV
# brew install awscli
# aws configure
aws s3 cp your-file.pdf s3://your-bucket/your-file.pdf
# https://pypi.org/project/amazon-textract-helper/
# https://github.com/aws-samples/amazon-textract-textractor/tree/master/helper
# pip install amazon-textract-helper
amazon-textract --input-document s3://your-bucket/your-file.pdf --features TABLES --pretty-print TABLES --pretty-print-table-format=csv
# https://aws.amazon.com/blogs/machine-learning/automatically-extract-text-and-structured-data-from-documents-with-amazon-textract/
[...document.querySelectorAll('div,main,body')].forEach(node => {
  node.style.position = 'relative'
  node.style.height = 'auto'
  node.style.overflowY = 'visible'
});
[...document.querySelectorAll('button')].forEach(node => {
  node.remove()
});
get_iplayer --pid m001d2h4 --subtitles --output "m001d2h4"
ffmpeg -i m001d2h4/Only_Connect_Series_18_-_07._Scrummagers_v_Crustaceans_m001d2h4_original.mp4 -vf "subtitles=m001d2h4/Only_Connect_Series_18_-_07._Scrummagers_v_Crustaceans_m001d2h4_original.srt" -ss 17:49 -t 5 -copyts output.mov
hubgit / line-reader-transform-stream.js
Created September 18, 2022 19:32
LineReader TransformStream
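// Declared with const so the function is not created as an implicit global.
const lineReader = () => {
  // Holds the trailing partial line carried over between chunks.
  let buffer = "";
  return new TransformStream({
    transform(chunk, controller) {
      buffer += chunk;
      const parts = buffer.split("\n");
      // Every part except the last is a complete line.
      parts.slice(0, -1).forEach((part) => controller.enqueue(part));
      buffer = parts[parts.length - 1];
    },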
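The preview above is cut off after the `transform` method, so the rest of the gist is not shown here. As a minimal sketch of how such a line-splitting stream could be completed, the version below assumes that any unterminated final line should be emitted when the input ends; the `flush` handler and the `splitLines` helper are illustrative additions, not taken from the original gist.

```javascript
// Hypothetical helper: append a chunk to the carried-over buffer and split on "\n".
// Returns the complete lines plus the trailing partial line to carry forward.
function splitLines(buffer, chunk) {
  const parts = (buffer + chunk).split("\n");
  const rest = parts.pop();
  return { lines: parts, rest };
}

function lineReader() {
  let buffer = "";
  return new TransformStream({
    transform(chunk, controller) {
      const { lines, rest } = splitLines(buffer, chunk);
      lines.forEach((line) => controller.enqueue(line));
      buffer = rest;
    },
    // Assumption: a final line without a trailing newline is still emitted.
    flush(controller) {
      if (buffer !== "") {
        controller.enqueue(buffer);
      }
    },
  });
}
```

It could be used on a fetch response with something like `response.body.pipeThrough(new TextDecoderStream()).pipeThrough(lineReader())`, since the transform expects string chunks rather than bytes.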
hubgit / genbank-to-sqlite.ts
Last active September 5, 2022 22:04
A ReadableStream created from an async iterator which fetches paginated data, piped into a WritableStream which inserts items into an SQLite database.
import { parse } from 'https://deno.land/x/xml@2.0.4/mod.ts'
import { readableStreamFromIterable } from 'https://deno.land/std@0.96.0/io/streams.ts'
import { Database } from 'https://deno.land/x/sqlite3@0.5.2/mod.ts'
import ProgressBar from 'https://deno.land/x/progress@v1.2.7/mod.ts'
let counter = 0
const progress = new ProgressBar({
  title: 'processing:',
  interval: 100,
hubgit / README.md
Last active September 2, 2022 07:05
Processing the Crossref Public Data File

First, download the data files using a BitTorrent client:

aria2c https://academictorrents.com/download/4dcfdf804775f2d92b7a030305fa0350ebef6f3e.torrent

Next, convert the data files to a single newline-delimited JSON file:

deno run process.ts
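
The contents of `process.ts` are not shown in this listing. As a rough sketch of the conversion step, the function below assumes that each decompressed data file parses to an object with an `items` array of records; both that layout and the `toNdjson` name are assumptions for illustration, not the author's script.

```javascript
// Hypothetical helper: turn the text of one decompressed data file into
// newline-delimited JSON, one record per line.
// Assumption: the file parses to { items: [...] }.
function toNdjson(fileText) {
  const { items = [] } = JSON.parse(fileText);
  return items.map((item) => JSON.stringify(item) + "\n").join("");
}
```

Appending the result for each data file to one output file would give a single newline-delimited JSON file, which can then be processed line by line without loading everything into memory.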