Germayne germayneng

@germayneng
germayneng / stratifiedCV.r
Created August 11, 2017 06:19 — forked from mrecos/stratifiedCV.r
Stratified K-folds Cross-Validation with Caret
require(caret)
#load some data
data(USArrests)
### Prepare Data (positive observations)
# add a column to be the strata. In this case it is states, it can be sites, or other locations
# the original data has 50 rows, so this adds a state label to 10 consecutive observations
USArrests$state <- c(rep(c("PA","MD","DE","NY","NJ"), each = 5))
# this replaces the existing rownames (states) with a simple numerical index
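The R snippet above builds a strata column so that caret can split folds within each state. The same idea can be sketched in plain Python; `stratified_folds` is a hypothetical helper (not part of the gist) that round-robins each stratum's indices across the folds:

```python
from collections import defaultdict

def stratified_folds(labels, k):
    """Assign observation indices to k folds so that every fold
    receives a proportional share of each label (stratum)."""
    by_label = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_label[lab].append(idx)
    folds = [[] for _ in range(k)]
    for lab, idxs in by_label.items():
        # round-robin within each stratum keeps the folds balanced
        for i, idx in enumerate(idxs):
            folds[i % k].append(idx)
    return folds

# mirror the gist: 25 observations across 5 states
labels = ["PA", "MD", "DE", "NY", "NJ"] * 5
folds = stratified_folds(labels, k=5)
```

With 5 observations per state and k = 5, each fold ends up with exactly one observation from every state.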
@germayneng
germayneng / gist:431f2821c849359fdd697528abf200f2
Last active December 11, 2017 04:57 — forked from conormm/r-to-python-data-wrangling-basics.md
R to Python: Data wrangling with dplyr and pandas (update)
R to python useful data wrangling snippets
The dplyr package in R makes data wrangling significantly easier.
The beauty of dplyr is that, by design, the options available are limited.
Specifically, a set of key verbs form the core of the package.
Using these verbs you can solve a wide range of data problems effectively in a shorter timeframe.
While transitioning to Python I have greatly missed the ease with which I can think through and solve problems using dplyr in R.
The purpose of this document is to demonstrate how to execute the key dplyr verbs when manipulating data using Python (with the pandas package).
dplyr is organised around six key verbs: filter(), select(), mutate(), arrange(), summarise(), and group_by().
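As a quick taste of the mapping this gist works through, each dplyr verb has a close pandas counterpart; the sketch below uses a toy DataFrame (my own example data, not from the gist):

```python
import pandas as pd

df = pd.DataFrame({
    "state": ["PA", "MD", "PA", "NY"],
    "arrests": [106, 300, 120, 254],
})

# filter()    -> boolean indexing (or .query())
high = df[df["arrests"] > 110]
# select()    -> column subsetting
cols = df[["state"]]
# mutate()    -> .assign()
df2 = df.assign(per_100=df["arrests"] / 100)
# arrange()   -> .sort_values()
ordered = df.sort_values("arrests")
# group_by() + summarise() -> .groupby() + aggregation
summary = df.groupby("state", as_index=False)["arrests"].mean()
```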
@germayneng
germayneng / knitr_header.r
Created September 9, 2018 12:14 — forked from cfljam/knitr_header.r
Global Options Chunk for Knitr in RMarkdown Documents
###
### Thanks to Karl Broman http://kbroman.org/knitr_knutshell/pages/Rmarkdown.html
```{r global_options, include=FALSE}
rm(list=ls()) ### To clear namespace
library(knitr)
opts_chunk$set(fig.width=12, fig.height=8, fig.path='Figs/',
               echo=TRUE, warning=FALSE, message=FALSE)
```
@germayneng
germayneng / parallel.py
Created March 11, 2019 04:42 — forked from MInner/parallel.py
Executing jobs in parallel with a nice progress bar: a tqdm wrapper for joblib.Parallel
from tqdm import tqdm_notebook as tqdm
from joblib import Parallel, delayed
import time
import random
def func(x):
    time.sleep(random.randint(1, 10))
    return x
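The gist wraps joblib.Parallel so that tqdm can report completed jobs. The same run-jobs-with-a-progress-readout pattern can be sketched with only the standard library; `run_with_progress` is a hypothetical helper of mine, not the gist's wrapper:

```python
import concurrent.futures as cf
import random
import time

def func(x):
    # stand-in workload, as in the gist (shortened sleep)
    time.sleep(random.random() * 0.05)
    return x

def run_with_progress(xs, workers=4):
    """Run func over xs in a thread pool, printing progress as jobs finish."""
    results = []
    with cf.ThreadPoolExecutor(max_workers=workers) as ex:
        futures = [ex.submit(func, x) for x in xs]
        done = 0
        for fut in cf.as_completed(futures):
            results.append(fut.result())
            done += 1
            print(f"\r{done}/{len(xs)} done", end="")
    print()
    return sorted(results)
```

joblib.Parallel adds process-based backends and batching on top of this; the sketch only shows the progress-reporting shape.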
@germayneng
germayneng / altair_app.py
Created March 15, 2019 02:29 — forked from gschivley/altair_app.py
Altair plot in Plotly Dash
# -*- coding: utf-8 -*-
import dash
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Input, Output
import pandas as pd
import sqlalchemy
import altair as alt
import io
from vega_datasets import data
@germayneng
germayneng / distcorr.py
Created March 15, 2020 12:38 — forked from satra/distcorr.py
Distance Correlation in Python
from scipy.spatial.distance import pdist, squareform
import numpy as np
from numba import jit, float32  # NumbaPro is discontinued; its jit now lives in numba
def distcorr(X, Y):
    """ Compute the distance correlation function
    >>> a = [1,2,3,4,5]
    >>> b = np.array([1,2,9,4,4])
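The preview above is truncated; for reference, the core of distance correlation (Székely et al.) can be written with NumPy alone for 1-D samples. This is my own minimal sketch, not the gist's jit-compiled version:

```python
import numpy as np

def distcorr_np(x, y):
    """Distance correlation of two 1-D samples via
    double-centred pairwise-distance matrices."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    a = np.abs(x[:, None] - x[None, :])   # pairwise distances within x
    b = np.abs(y[:, None] - y[None, :])   # pairwise distances within y
    # double-centre each distance matrix
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    dcov2_xy = (A * B).mean()
    dcov2_xx = (A * A).mean()
    dcov2_yy = (B * B).mean()
    return np.sqrt(dcov2_xy / np.sqrt(dcov2_xx * dcov2_yy))
```

For an exactly linear relationship the statistic is 1; it falls toward 0 as the variables become independent.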
@germayneng
germayneng / aiohttp-example.py
Created April 28, 2020 03:20 — forked from Den1al/aiohttp-example.py
concurrent http requests with aiohttp
# author: @Daniel_Abeles
# date: 18/12/2017
import asyncio
from aiohttp import ClientSession
from timeit import default_timer
import async_timeout
async def fetch_all(urls: list):
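The preview cuts off before the request logic. The concurrency pattern the gist relies on — schedule every request at once and await them together — can be sketched with asyncio alone; the `fetch` coroutine below is a stand-in for an aiohttp `session.get()`, not the gist's implementation:

```python
import asyncio

async def fetch(url: str) -> str:
    # stand-in for an aiohttp request; just yields control briefly
    await asyncio.sleep(0.01)
    return f"body of {url}"

async def fetch_all(urls: list) -> list:
    # gather() schedules all coroutines concurrently and
    # returns their results in the original order
    return await asyncio.gather(*(fetch(u) for u in urls))

bodies = asyncio.run(fetch_all(["https://example.com/a", "https://example.com/b"]))
```

In the real gist, `fetch` would open a shared `ClientSession` and bound each request with `async_timeout`.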
@germayneng
germayneng / BigQueryGeohashEncode.sql
Last active May 27, 2020 12:42 — forked from killerbees/BigQueryGeohashEncode.sql
Big Query STD SQL Gist for Geohash Encode
#standardSQL
CREATE TEMPORARY FUNCTION geohashEncode(latitude FLOAT64, longitude FLOAT64, precision FLOAT64)
RETURNS STRING
LANGUAGE js
AS """
var Geohash = {};
/* (Geohash-specific) Base32 map */
Geohash.base32 = '0123456789bcdefghjkmnpqrstuvwxyz';
lat = Number(latitude);
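The JS UDF is truncated above; the underlying algorithm is the standard geohash bisection, which is compact enough to sketch in Python (my own port, not the gist's UDF):

```python
def geohash_encode(lat, lon, precision=12):
    """Encode a lat/lon pair to a geohash string by alternately
    bisecting the longitude and latitude ranges."""
    base32 = '0123456789bcdefghjkmnpqrstuvwxyz'
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    bits = []
    even = True  # geohash starts with a longitude bit
    while len(bits) < precision * 5:
        if even:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits.append(1); lon_lo = mid
            else:
                bits.append(0); lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits.append(1); lat_lo = mid
            else:
                bits.append(0); lat_hi = mid
        even = not even
    # pack each group of 5 bits into one base32 character
    out = []
    for i in range(0, len(bits), 5):
        idx = 0
        for b in bits[i:i + 5]:
            idx = idx * 2 + b
        out.append(base32[idx])
    return ''.join(out)
```

The classic check: (42.605, -5.603) at precision 5 encodes to "ezs42".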
@germayneng
germayneng / export-pyspark-schema-to-json.py
Created September 23, 2020 04:22 — forked from stefanthoss/export-pyspark-schema-to-json.py
Export/import a PySpark schema to/from a JSON file
import json
from pyspark.sql.types import *
# Define the schema
schema = StructType(
    [StructField("name", StringType(), True), StructField("age", IntegerType(), True)]
)
# Write the schema to disk as JSON
with open("schema.json", "w") as f:
    json.dump(schema.jsonValue(), f)
@germayneng
germayneng / distcorr.py
Created December 8, 2020 06:05 — forked from raphaelvallat/distcorr.py
Distance correlation with permutation test
import numpy as np
import multiprocessing
from joblib import Parallel, delayed
from scipy.spatial.distance import pdist, squareform
def _dcorr(y, n2, A, dcov2_xx):
    """Helper function for distance correlation bootstrapping.
    """
    # Pairwise Euclidean distances
    b = squareform(pdist(y, metric='euclidean'))