Miklós Koren korenmiklos

@tyleransom
tyleransom / linking_with_embeddings.R
Last active January 17, 2024 01:59
Using embeddings to fuzzy match databases
library(tidyverse)
library(openai)
#-------------------------------------------------------------------------------
# Step 1: OpenAI API key
#-------------------------------------------------------------------------------
# Your OpenAI API key should be an environment variable that you set in ~/.Renviron
# ... you should never put your API key directly in your code!
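For readers who are not working in R, the same principle can be sketched in Python: read the key from the environment at run time instead of writing it into the script. The variable name OPENAI_API_KEY and the helper get_api_key are assumptions made for illustration and do not come from the gist.

```python
import os


def get_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    `var_name` is an assumed name; use whatever name you set in your
    own environment. The key itself never appears in the source code.
    """
    key = os.environ.get(var_name, "")
    if not key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key
```

Calling get_api_key() raises a RuntimeError when the variable is unset, so a missing key is noticed before any request is made.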
@larsvilhuber
larsvilhuber / download.do
Last active October 23, 2023 16:01
Conditional Stata download
// this would be the directory to put data
global root : pwd
global data "$root/datadir"
cap mkdir "$data"
// now to the file in question
global nhtsfile "nhts-ascii.zip"
capture confirm file "$data/$nhtsfile"
if _rc != 0 {
// what to do when file does NOT exist
copy https://nhts.ornl.gov/2009/download/Ascii.zip "$data/$nhtsfile"
@seanjtaylor
seanjtaylor / benfords_law.R
Created November 6, 2020 06:46
Benford's Law
library(tidyverse)
library(rvest)
doc <- read_html("https://county.milwaukee.gov/EN/County-Clerk/Off-Nav/Election-Results/Election-Results-Fall-2020")
doc %>%
html_nodes('table') %>%
.[[4]] %>%
html_table() %>%
tail(-1) %>%
mutate(ward = 1:n(),
biden = stringr::str_sub(X3, 1, 1),
/* So how does this work?
I'm using ANSI escape sequences to control the behavior of the terminal while
cat is outputting the text. I deliberately place these control sequences inside
comments so the C++ compiler doesn't try to treat them as code.*/
//
/*The commands in the fake code comment move the cursor to the left edge and
clear out the line, allowing the fake code to take the place of the real code.
And this explanation uses similar commands to wipe itself out too. */
//
#include <cstdio>
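As a rough illustration of the trick described in the comments above, the sketch below builds the same kind of string in Python: a line of text, then a carriage return (cursor to the left edge), then the ANSI "erase entire line" sequence ESC[2K, then the replacement text. The gist itself is C++; the helper names overwrite and rendered are hypothetical, and rendered is only a toy model of a terminal, not a real emulator.

```python
import sys

CR = "\r"               # carriage return: move the cursor to the left edge
ERASE_LINE = "\x1b[2K"  # ANSI CSI 2 K: erase the entire current line


def overwrite(hidden: str, shown: str) -> str:
    """Return text that prints `hidden`, then wipes it and prints `shown`."""
    return hidden + CR + ERASE_LINE + shown


def rendered(text: str) -> str:
    """Toy model of what a terminal leaves visible on one line.

    It only understands CR immediately followed by erase-line, which is
    the one combination this sketch emits.
    """
    return text.split(CR + ERASE_LINE)[-1]


if __name__ == "__main__":
    line = overwrite("int real_code = 1;", "int fake_code = 2;")
    # In an ANSI-capable terminal only the second statement stays visible;
    # a text editor would still show both, plus the raw control bytes.
    sys.stdout.write(line + "\n")
```

Because the control bytes are part of the file, tools that do not interpret ANSI sequences (an editor, a compiler) still see the first statement.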
@seece
seece / dod.md
Created September 13, 2020 22:24
Data-Oriented Design Book Review

Pekka Väänänen, Sep 14 2020


Data-Oriented Design (2018) by Richard Fabian

Computers keep getting faster but the future ain't what it used to be. Instead of higher clock rates we get deeper pipelines, higher latencies, more cores. Programming these systems requires paying attention to how we structure and access our data. In Data-Oriented Design, Richard Fabian (who has worked at Frontier Developments, Rockstar Games, and Team17) presents an approach to reasoning about these issues from a C++ game developer's perspective.

Data-oriented design is about caches and decoupling meaning from data. The former implies laying out your data so that they're compact and predictably accessed. The latter means exposing the raw transforms from one sequence of bits to another. For example, finding the pla

@carlislerainey
carlislerainey / makefile-dag.md
Last active October 12, 2022 10:18
Drawing the Makefile DAG
@DavidWells
DavidWells / aws-lambda-redirect.js
Created June 28, 2018 20:48
How to do a 301 redirect from an AWS lambda function
exports.handler = (event, context, callback) => {
const response = {
statusCode: 301,
headers: {
Location: 'https://google.com',
}
};
return callback(null, response);
}
@martijngastkemper
martijngastkemper / csvkit-example.bash
Last active May 26, 2022 07:54
CSVkit example to convert CSV to SQLite and query the data
# This example requires CSVkit (https://github.com/wireservice/csvkit), a Python toolset with a lot of very cool CSV tools.
# IMPORTANT NOTE: make sure to use a proper CSV file. I had a lot of trouble with a CSV file created by a service with Dutch
# as its locale. Changing it to US solved the problem. Some locales use commas as the decimal separator, so a semicolon is
# then used to separate fields.
# Create table example and load file-a.csv into it
csvsql --db sqlite:///example.db --table example --insert file-a.csv
# Add an extra file to table example
@evanwill
evanwill / gitBash_windows.md
Last active April 26, 2024 03:58
how to add more utilities to git bash for windows, wget, make

How to add more to Git Bash on Windows

Git for Windows comes bundled with the "Git Bash" terminal, which is incredibly handy for Unix-like commands on a Windows machine. It is missing a few standard Linux utilities, but it is easy to add ones that have a Windows binary available.

The basic idea is that C:\Program Files\Git\mingw64\ is your / directory according to Git Bash. Depending on how you installed it, the directory might be different: from the Start menu, right-click the Git Bash icon and open the file location. It might be something like C:\Users\name\AppData\Local\Programs\Git, in which case the mingw64 folder in that directory is your root. You can find it by running pwd -W. If you go to that directory, you will find the typical Linux root folder structure (bin, etc, lib and so on).
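One way to double-check which folder is the root is to look for that folder structure programmatically. The following is a minimal Python sketch, assuming the two install locations mentioned above; the names CANDIDATE_ROOTS, looks_like_root and find_git_bash_root are hypothetical, and the list should be adjusted to match your own machine.

```python
from pathlib import Path
from typing import Iterable, Optional

# Candidate install locations taken from the text above (assumptions:
# your installation may live somewhere else entirely).
CANDIDATE_ROOTS = [
    Path(r"C:\Program Files\Git\mingw64"),
    Path.home() / "AppData" / "Local" / "Programs" / "Git" / "mingw64",
]

# Sub-folders that the text says a Git Bash root should contain.
EXPECTED_DIRS = ("bin", "etc", "lib")


def looks_like_root(path: Path) -> bool:
    """True if `path` contains every expected sub-folder."""
    return all((path / name).is_dir() for name in EXPECTED_DIRS)


def find_git_bash_root(
    candidates: Iterable[Path] = CANDIDATE_ROOTS,
) -> Optional[Path]:
    """Return the first candidate that looks like a root, else None."""
    for candidate in candidates:
        if looks_like_root(candidate):
            return candidate
    return None
```

If find_git_bash_root() returns None, neither assumed location matched, and running pwd -W inside Git Bash remains the more reliable way to locate the root.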

If you are missing a utility, such as wget, track down a binary for Windows and copy the files to the corresponding directories. Sometimes the Windows binaries have funny prefixes, so

from datetime import datetime
import csv
import sys
START_DATE = 'start_date'
END_DATE = 'end_date'
SPELL_ID = 'spell_id'
IMPUTED_END_DATE = 'imputed_end_date'
TOLERANCE = 31 # days
PRIMARY_KEYS = ['frame_id', 'person_id']