David Volquartz Lebech dlebech

dlebech / mysql_to_gcs.php
Created March 12, 2024 14:58
mysqldump-php to Google Cloud Storage with gzip streaming
<?php
# Public Domain CC0 license. https://creativecommons.org/publicdomain/zero/1.0/
# Install requirements:
# composer require ifsnop/mysqldump-php
# composer require google/cloud-storage
require_once __DIR__ . '/vendor/autoload.php';
require_once __DIR__ . '/config.php';
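The PHP source is cut off in this preview. To illustrate what "gzip streaming" means here (compressing the dump chunk by chunk rather than buffering the whole file in memory), the following is a rough Python sketch using only the standard `zlib` module. It is not the gist's PHP code: the `gzip_stream` name and the chunk source are assumptions, and the upload to Google Cloud Storage is left out.

```python
import zlib
from typing import Iterable, Iterator


def gzip_stream(chunks: Iterable[bytes], level: int = 6) -> Iterator[bytes]:
    """Yield gzip-compressed bytes for an iterable of raw dump chunks."""
    # wbits=31 (16 + 15) tells zlib to write a gzip header and trailer
    compressor = zlib.compressobj(level, zlib.DEFLATED, 31)
    for chunk in chunks:
        out = compressor.compress(chunk)
        if out:  # zlib may buffer input and return nothing yet
            yield out
    # Emit any buffered data plus the gzip trailer (CRC32 and length)
    yield compressor.flush()


if __name__ == "__main__":
    # Hypothetical dump lines standing in for mysqldump output
    lines = (f"INSERT INTO t VALUES ({i});\n".encode() for i in range(1000))
    compressed = b"".join(gzip_stream(lines))
    print(len(compressed))
```

Each yielded piece could be written to an upload stream as it is produced, so memory use stays roughly constant regardless of dump size.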
dlebech / fasttext_langdetect_example.py
Last active May 17, 2021 20:45
Quick example of language detection with the small fastText model (less than 1 MB)
# Public Domain CC0 license. https://creativecommons.org/publicdomain/zero/1.0/
# Prepare with: pip install fasttext
# Tested with Python 3.9
import urllib.request
import fasttext
# Download small model (917KB)
# Other options: https://fasttext.cc/docs/en/language-identification.html
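The rest of this snippet is truncated. fastText's `model.predict(text, k=...)` returns a tuple of labels (such as `__label__en`) and an array of probabilities. The small helper below, whose name `parse_predictions` is hypothetical, turns that output into plain `(language_code, probability)` pairs. It is a sketch and does not reproduce the gist's own code.

```python
from typing import List, Sequence, Tuple

# fastText prefixes every predicted label with this string
LABEL_PREFIX = "__label__"


def parse_predictions(labels: Sequence[str], probs: Sequence[float]) -> List[Tuple[str, float]]:
    """Convert fastText (labels, probabilities) output into (code, probability) pairs."""
    pairs = []
    for label, prob in zip(labels, probs):
        # Strip the prefix if present (slicing also works on Python < 3.9)
        code = label[len(LABEL_PREFIX):] if label.startswith(LABEL_PREFIX) else label
        pairs.append((code, float(prob)))
    return pairs
```

Usage, assuming `model` is a loaded fastText model: `parse_predictions(*model.predict("Hej med dig", k=2))`.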
dlebech / imagemagick-oneliners.bash
Created May 2, 2021 20:05
Some useful one-liners for ImageMagick, including resizing, pencil-sketch conversion and a fake miniature effect
# Enable extended glob patterns in zsh or bash:
# zsh
setopt extendedglob
# bash
shopt -s extglob
# Resize all images (in-place) in a folder
# to 1200 pixels on the longest side (aspect ratio is preserved)
mogrify -resize 1200x1200 *.jpg
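In ImageMagick, a `WxH` geometry scales an image to fit inside a W by H box while keeping its aspect ratio. A bare width such as `1200` fixes only the width. The Python sketch below shows the arithmetic behind "longest side" sizing. The `fit_within` function is hypothetical, and ImageMagick's exact rounding may differ by a pixel.

```python
from typing import Tuple


def fit_within(width: int, height: int, box_w: int, box_h: int) -> Tuple[int, int]:
    """Scale (width, height) to fit inside a box_w x box_h box, keeping the aspect ratio."""
    # The smaller ratio is the one that makes the binding side fit exactly
    scale = min(box_w / width, box_h / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    print(fit_within(4000, 3000, 1200, 1200))  # landscape photo
    print(fit_within(3000, 4000, 1200, 1200))  # portrait photo
```

With a width-only geometry, the portrait photo would come out 1200 wide and 1600 tall, so its longest side would exceed 1200.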
dlebech / binomial_prob.sql
Last active August 15, 2022 20:41
Binomial probability calculation function in SQL (BigQuery)
-- Public Domain CC0 license. https://creativecommons.org/publicdomain/zero/1.0/
-- Calculate the probability of k successes in n trials with probability of success p,
-- using the binomial distribution.
-- Calculate the binomial coefficient using the "multiplicative formula"
CREATE OR REPLACE FUNCTION functions.binomial_coef(n INT64, k INT64) AS ((
-- n!/(k!*(n-k)!)
-- Factorials are impractical to compute here,
-- but the "multiplicative formula" described on Wikipedia avoids them:
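The SQL body is truncated in this preview. As a reference for checking such a function, here is a Python sketch of the same multiplicative formula, C(n, k) = product over i = 1..k of (n - k + i) / i, together with the binomial probability C(n, k) p^k (1 - p)^(n - k). This is not the gist's SQL. It uses exact integer arithmetic, while a BigQuery version would likely work with floats.

```python
def binomial_coef(n: int, k: int) -> int:
    """Binomial coefficient via the multiplicative formula (no factorials)."""
    if k < 0 or k > n:
        return 0
    k = min(k, n - k)  # symmetry C(n, k) == C(n, n - k) keeps the loop short
    result = 1
    for i in range(1, k + 1):
        # After this step result == C(n - k + i, i), so the division is exact
        result = result * (n - k + i) // i
    return result


def binomial_prob(n: int, k: int, p: float) -> float:
    """Probability of exactly k successes in n trials with success probability p."""
    return binomial_coef(n, k) * p**k * (1 - p) ** (n - k)


if __name__ == "__main__":
    print(binomial_coef(10, 3))       # 120
    print(binomial_prob(10, 3, 0.5))  # 0.1171875
```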
dlebech / paintings_crawl.py
Created July 26, 2020 16:00
Python Script for downloading and organizing images from The Painting Dataset: https://www.robots.ox.ac.uk/~vgg/data/paintings/
# Public Domain CC0 license. https://creativecommons.org/publicdomain/zero/1.0/
#
# Download images from The Painting Dataset: https://www.robots.ox.ac.uk/~vgg/data/paintings/painting_dataset_2018.xlsx
# The image URLs in the Excel sheet are outdated, but the painting URLs are not,
# so this script re-crawls those images and downloads them locally.
# It works as of July 2020.
#
# Run this first with:
# $ scrapy runspider paintings_crawl.py -o paintings.json
# Images are stored in 'out/raw'
dlebech / ft_extract.py
Last active June 8, 2019 07:11
Extract photos and names of members of Danish parliament
# Public Domain CC0 license. https://creativecommons.org/publicdomain/zero/1.0/
# Run this file first, e.g.:
# $ scrapy runspider ft_extract.py -o members.json
#
# It will probably stop working if the URLs of the contact list change.
# Worked in spring 2019.
import scrapy
import re
from urllib.parse import urlparse, urlunparse
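The preview stops at the imports, so how the script uses `urlparse` and `urlunparse` is not shown. One common scraping task they handle well is normalising a photo URL by dropping its query string and fragment. The helper below, `strip_query`, is purely hypothetical and only illustrates the two functions.

```python
from urllib.parse import urlparse, urlunparse


def strip_query(url: str) -> str:
    """Return the URL without its query string and fragment."""
    parts = urlparse(url)
    # ParseResult is a namedtuple, so _replace returns a modified copy
    return urlunparse(parts._replace(query="", fragment=""))


if __name__ == "__main__":
    print(strip_query("https://example.com/media/photo.jpg?w=200&h=300#top"))
```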
dlebech / tokenizer.js
Last active August 11, 2022 13:34
Keras text tokenizer in JavaScript with minimal functionality
// Public Domain CC0 license. https://creativecommons.org/publicdomain/zero/1.0/
class Tokenizer {
constructor(config = {}) {
this.filters = config.filters || /[\\.,/#!$%^&*;:{}=\-_`~()]/g;
this.lower = typeof config.lower === 'undefined' ? true : config.lower;
// Primary indexing methods. Word to index and index to word.
this.wordIndex = {};
this.indexWord = {};
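Only the constructor is visible here. The Python sketch below covers the core of a Keras-style tokenizer: lowercasing, removing the same filter characters as the regex above, counting words, and assigning indices by descending frequency, starting at 1. It assumes the JavaScript class follows Keras's `Tokenizer` in ranking by frequency. The class itself is illustrative and not a port of the gist.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List

# Same characters as the default filters in the JavaScript constructor
FILTERS = re.compile(r"[\\.,/#!$%^&*;:{}=\-_`~()]")


class Tokenizer:
    def __init__(self, lower: bool = True) -> None:
        self.lower = lower
        self.word_counts: Counter = Counter()
        self.word_index: Dict[str, int] = {}  # word -> index
        self.index_word: Dict[int, str] = {}  # index -> word

    def _split(self, text: str) -> List[str]:
        if self.lower:
            text = text.lower()
        # Replace filtered characters with spaces, then split on whitespace
        return FILTERS.sub(" ", text).split()

    def fit_on_texts(self, texts: Iterable[str]) -> None:
        for text in texts:
            self.word_counts.update(self._split(text))
        # Most frequent word gets index 1; the stable sort keeps first-seen order for ties
        ranked = sorted(self.word_counts, key=lambda w: -self.word_counts[w])
        self.word_index = {w: i for i, w in enumerate(ranked, start=1)}
        self.index_word = {i: w for w, i in self.word_index.items()}

    def texts_to_sequences(self, texts: Iterable[str]) -> List[List[int]]:
        # Unknown words are skipped (no out-of-vocabulary token in this sketch)
        return [[self.word_index[w] for w in self._split(t) if w in self.word_index] for t in texts]
```

For example, fitting on `["Hello world", "hello, everyone!"]` gives `hello` index 1 because it occurs twice.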
dlebech / oneliners_matplotlib.py
Last active March 13, 2020 15:29
Useful Matplotlib one-liners that I always forget
# Matplotlib
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
# Creating a list of colors (e.g. for a bar chart)
# "Blues" is the colormap. It can be any colormap
# https://matplotlib.org/examples/color/colormaps_reference.html
colors = [matplotlib.colors.to_hex(c) for c in plt.cm.Blues(np.linspace(0, 1, len(some_dataframe.index)))]
# Globally adjusting DPI and figure size
matplotlib.rcParams['figure.dpi'] = 100
matplotlib.rcParams['figure.figsize'] = [6.0, 4.0]
dlebech / keras_embedding_onehot.py
Last active June 16, 2018 10:30
Minimal Keras examples for various purposes
# Public Domain CC0 license. https://creativecommons.org/publicdomain/zero/1.0/
# Create a Keras embedding layer with an initial one-hot encoding by using identity initializer
import tensorflow as tf
import numpy as np
# Input sequence of five tokens drawn from a vocabulary of four words
# Let's pretend this is "hello world hello everyone else"
# where hello maps to 1, world to 0, everyone to 2 and else to 3
a = np.array([[1, 0, 1, 2, 3]])
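An embedding layer is essentially a lookup table: index i selects row i of its weight matrix. With an identity initializer, the weight matrix is the identity, so each lookup returns the one-hot vector for that index. The NumPy sketch below reproduces that lookup without TensorFlow, assuming the vocabulary size of 4 from the comment above.

```python
import numpy as np

vocab_size = 4
a = np.array([[1, 0, 1, 2, 3]])  # "hello world hello everyone else"

# Identity weights: row i is the one-hot vector for index i
weights = np.eye(vocab_size)

# Embedding lookup is fancy indexing: shape (batch, sequence_length, vocab_size)
embedded = weights[a]

if __name__ == "__main__":
    print(embedded.shape)  # (1, 5, 4)
    print(embedded[0, 0])  # one-hot vector for "hello" (index 1)
```

The values stay one-hot only until training updates the weights, unless the layer is frozen.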
dlebech / perl.sh
Created May 4, 2017 10:21
Command-line notes
# Convert a Unix timestamp in milliseconds in the first column of a CSV to a date
perl -MPOSIX -pe 's/(^\d+),/strftime("%F,", localtime($1\/1000))/ge' thefile.csv
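For comparison, here is a Python sketch of the same per-line substitution: a leading millisecond timestamp is divided by 1000 and formatted as `YYYY-MM-DD`. Unlike the Perl version, which uses `localtime`, this sketch formats in UTC so the result does not depend on the machine's time zone. The function name is hypothetical.

```python
import re
from datetime import datetime, timezone


def ms_timestamp_to_date(line: str) -> str:
    """Replace a leading millisecond Unix timestamp with a YYYY-MM-DD date (UTC)."""

    def repl(match: "re.Match[str]") -> str:
        seconds = int(match.group(1)) / 1000
        return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%d") + ","

    # ^ anchors to the start of the line, like (^\d+), in the Perl one-liner
    return re.sub(r"^(\d+),", repl, line)


if __name__ == "__main__":
    print(ms_timestamp_to_date("1700000000000,foo"))  # 2023-11-14,foo
```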