@gdmcdonald
gdmcdonald / eff_fuzzy_match.R
Last active February 1, 2022 13:01
Efficient fuzzy match of two data frames by one common string column in R, outputting a list of the matching and non-matching rows
#Efficient fuzzy match of two data frames by one common column
library(dplyr)
library(fuzzyjoin)
library(stringdist)
eff_fuzzy_match <- function(data_frame_A,
                            data_frame_B,
                            by_what,
                            choose_p = 0.1,
                            choose_max_dist = 0.4,
@jganzabal
jganzabal / Nvidia Titan XP + MacBook Pro + Akitio Node + Tensorflow + Keras.md
Last active July 10, 2025 15:43
How to setup Nvidia Titan XP for deep learning on a MacBook Pro with Akitio Node + Tensorflow + Keras
@conormm
conormm / r-to-python-data-wrangling-basics.md
Last active May 3, 2025 19:21
R to Python: Data wrangling with dplyr and pandas

R to Python data wrangling snippets

The dplyr package in R makes data wrangling significantly easier. The beauty of dplyr is that, by design, the options available are limited. Specifically, a set of key verbs forms the core of the package. Using these verbs you can solve a wide range of data problems effectively and in less time. Whilst transitioning to Python I have greatly missed the ease with which I can think through and solve problems using dplyr in R. The purpose of this document is to demonstrate how to execute the key dplyr verbs when manipulating data using Python (with the pandas package).
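
As a rough illustration of the idea, here is a minimal sketch of how some commonly used dplyr verbs might be expressed in pandas. The data frame, its column names, and the choice of verbs shown are assumptions made up for this example; the dplyr equivalents appear only as comments.

```python
import pandas as pd

# Hypothetical toy data, invented purely for illustration.
df = pd.DataFrame({
    "species": ["a", "a", "b", "b"],
    "length": [1.0, 3.0, 2.0, 6.0],
    "width": [0.5, 1.5, 1.0, 2.0],
})

# dplyr: filter(df, length > 1.5)
filtered = df[df["length"] > 1.5]

# dplyr: select(df, species, length)
selected = df[["species", "length"]]

# dplyr: arrange(df, desc(length))
arranged = df.sort_values("length", ascending=False)

# dplyr: mutate(df, ratio = length / width)
mutated = df.assign(ratio=df["length"] / df["width"])

# dplyr: df %>% group_by(species) %>% summarise(mean_length = mean(length))
summary = (
    df.groupby("species", as_index=False)
      .agg(mean_length=("length", "mean"))
)
```

Each step returns a new data frame rather than modifying `df` in place, which keeps the pandas code close in spirit to a dplyr pipeline.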

dplyr is organised around six key verbs:

@brunosan
brunosan / index.md
Last active June 15, 2018 23:57
This is a list inspired by some of our current or potential lines of work at the World Bank Innovation Labs. The “Innovations in Big Data Analytics” program helps to strengthen the World Bank capabilities to effectively use big data in its operational and strategic work.

We are always looking for great Data Scientists. If you can solve any of these [using open software], you'll be heads down helping us from day one. Email us at brunosanchez@worldbank.org

(This list is updated frequently).

1. Nightlights from Satellite

We are building an open stack to process nightly data from satellite and query light output from all known villages. Currently we are doing 20 years of nightly data for 600,000 villages in India.

@darribas
darribas / panel_FE.ipynb
Last active January 10, 2021 18:41
Fixed-Effects panel OLS
@EconometricsBySimulation
EconometricsBySimulation / lmOut
Last active August 13, 2025 16:03
A simple command to grab coefficients, t-stats, p-values, f-stats, etc from a regression and export them as an easy to use spreadsheet.
lmOut <- function(res, file="test.csv", ndigit=3, writecsv=TRUE) {
  # If summary has not been run on the model then run summary
  if (length(grep("summary", class(res)))==0) res <- summary(res)
  co <- res$coefficients
  nvar <- nrow(co)
  ncol <- ncol(co)
  f <- res$fstatistic
  formatter <- function(x) format(round(x,ndigit),nsmall=ndigit)