Germayne germayneng

@germayneng
germayneng / stratifiedCV.r
Created August 11, 2017 06:19 — forked from mrecos/stratifiedCV.r
Stratified K-folds Cross-Validation with Caret
require(caret)
#load some data
data(USArrests)
### Prepare Data (positive observations)
# add a column to be the strata. In this case it is states, it can be sites, or other locations
# the original data has 50 rows, so this adds a state label to 10 consecutive observations
USArrests$state <- c(rep(c("PA","MD","DE","NY","NJ"), each = 5))
# this replaces the existing rownames (states) with a simple numerical index
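The R snippet above builds a strata column so that caret can split folds within each state. The same idea can be sketched in plain Python; `stratified_folds` is a hypothetical helper (not part of the gist) that round-robins each stratum's indices across the folds:

```python
from collections import defaultdict

def stratified_folds(labels, k):
    """Assign observation indices to k folds so that every fold
    receives a proportional share of each label (stratum)."""
    by_label = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_label[lab].append(idx)
    folds = [[] for _ in range(k)]
    for lab, idxs in by_label.items():
        # round-robin within each stratum keeps the folds balanced
        for i, idx in enumerate(idxs):
            folds[i % k].append(idx)
    return folds

# mirror the gist: 25 observations across 5 states
labels = ["PA", "MD", "DE", "NY", "NJ"] * 5
folds = stratified_folds(labels, k=5)
```

With 5 observations per state and k = 5, each fold ends up with exactly one observation from every state.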
@germayneng
germayneng / gist:431f2821c849359fdd697528abf200f2
Last active December 11, 2017 04:57 — forked from conormm/r-to-python-data-wrangling-basics.md
R to Python: Data wrangling with dplyr and pandas (update)
R to python useful data wrangling snippets
The dplyr package in R makes data wrangling significantly easier.
The beauty of dplyr is that, by design, the options available are limited.
Specifically, a set of key verbs form the core of the package.
Using these verbs you can solve a wide range of data problems effectively in a shorter timeframe.
While transitioning to Python I have greatly missed the ease with which I can think through and solve problems using dplyr in R.
The purpose of this document is to demonstrate how to execute the key dplyr verbs when manipulating data using Python (with the pandas package).
dplyr is organised around six key verbs: filter(), select(), mutate(), arrange(), summarise(), and group_by().
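As a quick taste of the mapping this gist works through, each dplyr verb has a close pandas counterpart; the sketch below uses a toy DataFrame (my own example data, not from the gist):

```python
import pandas as pd

df = pd.DataFrame({
    "state": ["PA", "MD", "PA", "NY"],
    "arrests": [106, 300, 120, 254],
})

# filter()    -> boolean indexing (or .query())
high = df[df["arrests"] > 110]
# select()    -> column subsetting
cols = df[["state"]]
# mutate()    -> .assign()
df2 = df.assign(per_100=df["arrests"] / 100)
# arrange()   -> .sort_values()
ordered = df.sort_values("arrests")
# group_by() + summarise() -> .groupby() + aggregation
summary = df.groupby("state", as_index=False)["arrests"].mean()
```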
@germayneng
germayneng / knitr_header.r
Created September 9, 2018 12:14 — forked from cfljam/knitr_header.r
Global Options Chunk for Knitr in RMarkdown Documents
###
### Thanks to Karl Broman http://kbroman.org/knitr_knutshell/pages/Rmarkdown.html
```{r global_options, include=FALSE}
rm(list=ls()) ### To clear namespace
library(knitr)
opts_chunk$set(fig.width=12, fig.height=8, fig.path='Figs/',
               echo=TRUE, warning=FALSE, message=FALSE)
```
@germayneng
germayneng / parallel.py
Created March 11, 2019 04:42 — forked from MInner/parallel.py
Executing jobs in parallel with a nice progress bar: a tqdm wrapper for joblib.Parallel
from tqdm import tqdm_notebook as tqdm
from joblib import Parallel, delayed
import time
import random
def func(x):
    time.sleep(random.randint(1, 10))
    return x
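The gist wraps joblib.Parallel so that tqdm can report completed jobs. The same run-jobs-with-a-progress-readout pattern can be sketched with only the standard library; `run_with_progress` is a hypothetical helper of mine, not the gist's wrapper:

```python
import concurrent.futures as cf
import random
import time

def func(x):
    # stand-in workload, as in the gist (shortened sleep)
    time.sleep(random.random() * 0.05)
    return x

def run_with_progress(xs, workers=4):
    """Run func over xs in a thread pool, printing progress as jobs finish."""
    results = []
    with cf.ThreadPoolExecutor(max_workers=workers) as ex:
        futures = [ex.submit(func, x) for x in xs]
        done = 0
        for fut in cf.as_completed(futures):
            results.append(fut.result())
            done += 1
            print(f"\r{done}/{len(xs)} done", end="")
    print()
    return sorted(results)
```

joblib.Parallel adds process-based backends and batching on top of this; the sketch only shows the progress-reporting shape.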
@germayneng
germayneng / altair_app.py
Created March 15, 2019 02:29 — forked from gschivley/altair_app.py
Altair plot in Plotly Dash
# -*- coding: utf-8 -*-
import dash
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Input, Output
import pandas as pd
import sqlalchemy
import altair as alt
import io
from vega_datasets import data
@germayneng
germayneng / distcorr.py
Created March 15, 2020 12:38 — forked from satra/distcorr.py
Distance Correlation in Python
from scipy.spatial.distance import pdist, squareform
import numpy as np
from numba import jit, float32  # NumbaPro is discontinued; its jit now lives in numba
def distcorr(X, Y):
    """ Compute the distance correlation function
    >>> a = [1,2,3,4,5]
    >>> b = np.array([1,2,9,4,4])
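The preview above is truncated; for reference, the core of distance correlation (Székely et al.) can be written with NumPy alone for 1-D samples. This is my own minimal sketch, not the gist's jit-compiled version:

```python
import numpy as np

def distcorr_np(x, y):
    """Distance correlation of two 1-D samples via
    double-centred pairwise-distance matrices."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    a = np.abs(x[:, None] - x[None, :])   # pairwise distances within x
    b = np.abs(y[:, None] - y[None, :])   # pairwise distances within y
    # double-centre each distance matrix
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    dcov2_xy = (A * B).mean()
    dcov2_xx = (A * A).mean()
    dcov2_yy = (B * B).mean()
    return np.sqrt(dcov2_xy / np.sqrt(dcov2_xx * dcov2_yy))
```

For an exactly linear relationship the statistic is 1; it falls toward 0 as the variables become independent.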
@germayneng
germayneng / aiohttp-example.py
Created April 28, 2020 03:20 — forked from Den1al/aiohttp-example.py
concurrent http requests with aiohttp
# author: @Daniel_Abeles
# date: 18/12/2017
import asyncio
from aiohttp import ClientSession
from timeit import default_timer
import async_timeout
async def fetch_all(urls: list):
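The preview cuts off before the request logic. The concurrency pattern the gist relies on — schedule every request at once and await them together — can be sketched with asyncio alone; the `fetch` coroutine below is a stand-in for an aiohttp `session.get()`, not the gist's implementation:

```python
import asyncio

async def fetch(url: str) -> str:
    # stand-in for an aiohttp request; just yields control briefly
    await asyncio.sleep(0.01)
    return f"body of {url}"

async def fetch_all(urls: list) -> list:
    # gather() schedules all coroutines concurrently and
    # returns their results in the original order
    return await asyncio.gather(*(fetch(u) for u in urls))

bodies = asyncio.run(fetch_all(["https://example.com/a", "https://example.com/b"]))
```

In the real gist, `fetch` would open a shared `ClientSession` and bound each request with `async_timeout`.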
@germayneng
germayneng / BigQueryGeohashEncode.sql
Last active May 27, 2020 12:42 — forked from killerbees/BigQueryGeohashEncode.sql
Big Query STD SQL Gist for Geohash Encode
#standardSQL
CREATE TEMPORARY FUNCTION geohashEncode(latitude FLOAT64, longitude FLOAT64, precision FLOAT64)
RETURNS STRING
LANGUAGE js
AS """
var Geohash = {};
/* (Geohash-specific) Base32 map */
Geohash.base32 = '0123456789bcdefghjkmnpqrstuvwxyz';
lat = Number(latitude);
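The JS UDF is truncated above; the underlying algorithm is the standard geohash bisection, which is compact enough to sketch in Python (my own port, not the gist's UDF):

```python
def geohash_encode(lat, lon, precision=12):
    """Encode a lat/lon pair to a geohash string by alternately
    bisecting the longitude and latitude ranges."""
    base32 = '0123456789bcdefghjkmnpqrstuvwxyz'
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    bits = []
    even = True  # geohash starts with a longitude bit
    while len(bits) < precision * 5:
        if even:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits.append(1); lon_lo = mid
            else:
                bits.append(0); lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits.append(1); lat_lo = mid
            else:
                bits.append(0); lat_hi = mid
        even = not even
    # pack each group of 5 bits into one base32 character
    out = []
    for i in range(0, len(bits), 5):
        idx = 0
        for b in bits[i:i + 5]:
            idx = idx * 2 + b
        out.append(base32[idx])
    return ''.join(out)
```

The classic check: (42.605, -5.603) at precision 5 encodes to "ezs42".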
@germayneng
germayneng / export-pyspark-schema-to-json.py
Created September 23, 2020 04:22 — forked from stefanthoss/export-pyspark-schema-to-json.py
Export/import a PySpark schema to/from a JSON file
import json
from pyspark.sql.types import *
# Define the schema
schema = StructType(
    [StructField("name", StringType(), True), StructField("age", IntegerType(), True)]
)
# Write the schema to disk as JSON
with open("schema.json", "w") as f:
    json.dump(schema.jsonValue(), f)
@germayneng
germayneng / distcorr.py
Created December 8, 2020 06:05 — forked from raphaelvallat/distcorr.py
Distance correlation with permutation test
import numpy as np
import multiprocessing
from joblib import Parallel, delayed
from scipy.spatial.distance import pdist, squareform
def _dcorr(y, n2, A, dcov2_xx):
    """Helper function for distance correlation bootstrapping.
    """
    # Pairwise Euclidean distances
    b = squareform(pdist(y, metric='euclidean'))