@turingDH
turingDH / Supervised_SKLearn.ipynb
Created March 28, 2019 21:34
SKLearn Supervised Learning
turingDH / sum_square_Pool.py
Created March 26, 2019 00:12
multiprocessing Pool example
## Credit to LucidProgramming https://www.youtube.com/watch?v=u2jTn-Gj2Xw for the walkthrough. My changes weren't terribly significant.
import os # to get core count on machine
import time # to time the duration
from multiprocessing import Pool # to instantiate a Pool of workers to distribute the process across the cores in CPU
def sum_square(number):
    s = 0
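The preview stops after the accumulator is initialised, so the rest of the file is not shown here. A minimal sketch of how a Pool example along these lines might look, assuming `sum_square` adds up the squares of the integers below `number` (an assumption, since the original body is truncated) and using a hypothetical helper `timed_pool_map` that is not part of the gist:

```python
import os
import time
from multiprocessing import Pool


def sum_square(number):
    # Assumed body: sum of i*i for i in 0..number-1 (the original is truncated).
    s = 0
    for i in range(number):
        s += i * i
    return s


def timed_pool_map(numbers, processes=None):
    # Hypothetical helper: map sum_square over `numbers` with a worker pool
    # and return the results together with the elapsed wall-clock time.
    start = time.perf_counter()
    with Pool(processes=processes) as pool:
        results = pool.map(sum_square, numbers)
    return results, time.perf_counter() - start


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this block on import.
    results, elapsed = timed_pool_map(range(1000), processes=os.cpu_count())
    print(f"{len(results)} results in {elapsed:.3f}s (cores reported: {os.cpu_count()})")
```

Comparing the elapsed time against a plain `[sum_square(n) for n in numbers]` loop is one way to see whether the pool helps; for small inputs the cost of starting the processes can outweigh the gain.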
turingDH / gist:80a91745a7d60d4272486c0618a91476
Created November 15, 2018 03:28
Search script files in RStudio; open via file protocol
library(dplyr)
library(DT)
library(purrr)
library(tidyr)
phraseToSearch <- 'this|that'
scriptFiles <-
  bind_rows(
    map_dfc('~/R', list.files, full.names = TRUE) %>% rename(fileNames = V1),
turingDH / sparklyr_cv_pipeline_example.R
Created April 5, 2018 05:02 — forked from eddjberry/sparklyr_cv_pipeline_example.R
An example of creating a Spark pipeline with sparklyr
# Load packages
library(dplyr)
library(sparklyr)
# Set up the connection
sc <- spark_connect(master = "local")
# Create a Spark DataFrame of mtcars
mtcars_sdf <- copy_to(sc, mtcars)
turingDH / gist:54fa4d3a712760ccba15ccb7ebaea8d1
Created March 30, 2018 17:34 — forked from conormm/r-to-python-data-wrangling-basics.md
R to Python: Data wrangling with dplyr and pandas
R to Python: useful data wrangling snippets
The dplyr package in R makes data wrangling significantly easier.
The beauty of dplyr is that, by design, the options available are limited.
Specifically, a set of key verbs form the core of the package.
Using these verbs you can solve a wide range of data problems effectively in a shorter timeframe.
Whilst transitioning to Python I have greatly missed the ease with which I can think through and solve problems using dplyr in R.
The purpose of this document is to demonstrate how to execute the key dplyr verbs when manipulating data using Python (with the pandas package).
dplyr is organised around six key verbs
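The preview ends before the verbs are listed. As a rough sketch, assuming the six verbs meant are the commonly cited `filter`, `select`, `arrange`, `mutate`, `summarise` and `group_by`, the pandas counterparts might look like the following. The small DataFrame and its column names are made up purely for illustration.

```python
import pandas as pd

# Made-up illustrative data; not taken from the original document.
df = pd.DataFrame({
    "species": ["a", "a", "b", "b"],
    "length": [1.0, 2.0, 3.0, 4.0],
    "width": [0.5, 0.4, 0.3, 0.2],
})

# filter(): keep the rows that match a condition
long_rows = df[df["length"] > 1.5]

# select(): keep a subset of the columns
lengths = df[["species", "length"]]

# arrange(): sort the rows by a column
by_width = df.sort_values("width")

# mutate(): add a derived column without modifying df in place
with_area = df.assign(area=df["length"] * df["width"])

# group_by() + summarise(): one summary row per group
mean_length = df.groupby("species", as_index=False).agg(
    mean_length=("length", "mean")
)

print(mean_length)
```

Each step returns a new DataFrame, so the calls can be chained in much the same way dplyr verbs are piped with `%>%`.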
import pandas as pd

# Code points 945-969 are the lowercase Greek letters alpha through omega
# (962 is the word-final sigma).
code_points = list(range(945, 970))
names = ['alpha', 'beta', 'gamma', 'delta', 'epsilon', 'zeta', 'eta', 'theta', 'iota', 'kappa', 'lambda', 'mu', 'nu', 'xi', 'omicron', 'pi', 'rho', 'word-final sigma', 'sigma', 'tau', 'upsilon', 'phi', 'chi', 'psi', 'omega']
df = pd.DataFrame({
    "chr_val": code_points,
    "greek text": names,
    "greek symbol": [chr(code) for code in code_points],
})
print(df)
turingDH / AirQuality_Regression.ipynb
Last active March 30, 2018 00:39
UCI Air Quality = initial EDA
turingDH / LogisticRegressionRanges_R_Screen.png
Last active February 9, 2018 03:52
Logistic Regression: probability vs. odds vs. log odds
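The image itself is not reproduced here. As a small sketch of the relationship the title refers to: a probability p lies in (0, 1), the odds p / (1 - p) lie in (0, ∞), and the log odds ln(p / (1 - p)) lie in (-∞, ∞). The function names below are illustrative and do not come from the gist.

```python
import math


def odds_from_probability(p):
    # Odds: probability of the event divided by probability of its complement.
    return p / (1.0 - p)


def log_odds_from_probability(p):
    # Log odds (logit): natural log of the odds.
    return math.log(odds_from_probability(p))


def probability_from_log_odds(z):
    # Inverse of the logit: the logistic (sigmoid) function.
    return 1.0 / (1.0 + math.exp(-z))


for p in (0.1, 0.5, 0.9):
    print(p, odds_from_probability(p), log_odds_from_probability(p))
```

A probability of 0.5 corresponds to odds of 1 and log odds of 0, which is why the log-odds scale is symmetric around zero.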