Dror Atariah drorata

@drorata
drorata / sparkWordCount.py
Created August 18, 2015 10:31
Sample word count using Spark on a local file. The result is sorted in descending order.
import re
from pyspark import SparkContext

print("-----------------===========================-----------------")
print("-----------------==========Starting=========-----------------")
print("-----------------===========================-----------------")
# Create the Spark context for this application
sc = SparkContext(appName="simple app")
print("-----------------===========================-----------------")
print("-----------------==========Loaded file======-----------------")
print("-----------------===========================-----------------")
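The preview stops right after the `SparkContext` is created. The pipeline the description names (split lines into words, count each word, sort by count in descending order) can be sketched in plain Python so it runs without a cluster. The function name `word_count_desc` is hypothetical, and the comments give the PySpark RDD method each step is assumed to correspond to; this is not the gist's own code.

```python
import re
from collections import defaultdict


def word_count_desc(lines):
    """Hypothetical helper: return (word, count) pairs, most frequent first."""
    # flatMap: one line -> many lowercase words
    # (PySpark: rdd.flatMap(lambda line: re.findall(r"\w+", line.lower())))
    words = [w for line in lines for w in re.findall(r"\w+", line.lower())]
    # map: pair every word with an initial count of 1
    # (PySpark: .map(lambda w: (w, 1)))
    pairs = [(w, 1) for w in words]
    # reduceByKey: sum the counts per word
    # (PySpark: .reduceByKey(lambda a, b: a + b))
    totals = defaultdict(int)
    for word, one in pairs:
        totals[word] += one
    # sortBy with ascending=False: highest count first
    # (PySpark: .sortBy(lambda kv: kv[1], ascending=False))
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


print(word_count_desc(["the cat", "The dog and the cat"]))
```

Python's `sorted` is stable, so words with equal counts keep their first-seen order.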
Cat,Country,Count
A,DE,0.4596065657
B,DE,0
C,US,0.3224789091
A,UK,0.4740651803
B,US,5
C,UK,0.6467712916
A,UK,0.4206986968
B,DE,0.647481787
C,UK,0.7009353881
@drorata
drorata / gist:e58b673fd87edfc92960
Last active January 19, 2016 08:44
Multi-type frequency count
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Assume you have an events log with two event types, `foo` and `bar`. The goal is to obtain a frequency bar plot of the events per some prescribed time interval."
]
},
{
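The notebook preview is cut off here. One way to get per-interval counts for each event type is to group by a time bin and the event type, then unstack the types into columns. This is a minimal pandas sketch, not the notebook's code: the events log, its column names (`timestamp`, `event`) and the helper `event_frequencies` are hypothetical, and the interval is passed as a pandas offset alias such as `"D"` or `"2D"`.

```python
import pandas as pd

# Hypothetical events log: one row per event, with a timestamp and a type.
events = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2016-01-01 08:00", "2016-01-01 09:30", "2016-01-01 17:45",
        "2016-01-02 10:15", "2016-01-02 11:00", "2016-01-03 14:20",
    ]),
    "event": ["foo", "bar", "foo", "foo", "bar", "bar"],
})


def event_frequencies(df, freq):
    """Count events of each type per time bin of width `freq` (e.g. "D", "2D")."""
    return (
        df.groupby([pd.Grouper(key="timestamp", freq=freq), "event"])
        .size()                    # number of events per (bin, type)
        .unstack(fill_value=0)     # one column per event type, 0 if absent
    )


counts = event_frequencies(events, "2D")
print(counts)
```

Assuming matplotlib is installed, `counts.plot(kind="bar")` should then draw one group of bars per interval.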
@drorata
drorata / toree-example
Created March 14, 2016 14:42
Toree, Jupyter and Spark
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Toree on Mac\n",
"\n",
"## Installing\n",
"\n",
@drorata
drorata / gist:cab06bdcc77a9afe5625
Created March 16, 2016 14:13
Frequencies of events per day
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Assume you have an events log with two event types, `foo` and `bar`. The goal is to obtain a frequency bar plot of the events per some prescribed time interval."
]
},
{
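For the per-day case, `pd.crosstab` on the calendar date is a compact way to tabulate counts without writing the groupby by hand. This is a sketch under the same kind of assumptions: the events log and its column names are made up for illustration and are not taken from the notebook.

```python
import pandas as pd

# Hypothetical events log with two event types, "foo" and "bar".
events = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2016-01-01 08:00", "2016-01-01 09:30", "2016-01-01 17:45",
        "2016-01-02 10:15", "2016-01-02 11:00", "2016-01-03 14:20",
    ]),
    "event": ["foo", "bar", "foo", "foo", "bar", "bar"],
})

# Rows: calendar day; columns: event type; cells: number of events.
# Missing (day, type) combinations are filled with 0.
per_day = pd.crosstab(events["timestamp"].dt.date, events["event"])
print(per_day)
```

With matplotlib available, `per_day.plot(kind="bar")` gives the per-day frequency bar plot.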
{
"basics": {
"name": "Dr. Foo Bar",
"label": "Programmer",
"picture": "",
"email": "drorata@gmail.com",
"phone": "(912) 555-4321",
"website": "http://richardhendricks.com",
"summary": "Richard hails from Tulsa. He has earned degrees from the University of Oklahoma and Stanford. (Go Sooners and Cardinals!) Before starting Pied Piper, he worked for Hooli as a part-time software developer. While his work focuses on applied information theory, mostly optimizing lossless compression schema of both the length-limited and adaptive variants, his non-work interests range widely, everything from quantum computing to chaos theory. He could tell you about it, but THAT would NOT be a “length-limited” conversation!",
"location": {
@drorata
drorata / example.py
Created June 24, 2016 14:00
Mean and variance normalization of pandas.DataFrame
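import numpy as np
import pandas as pd
from sklearn import preprocessing

# 50 samples of 4 features drawn from a normal distribution with mean 3, std 1
X = pd.DataFrame(np.random.normal(size=(50, 4), scale=1, loc=3))
print(X.describe())

# Fit the scaler (learns each column's mean and std), then standardize the data
scaler = preprocessing.StandardScaler().fit(X)
print(pd.DataFrame(scaler.transform(X), columns=X.columns).describe())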
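The same normalization can be written directly in pandas, which makes explicit what `StandardScaler` computes: subtract each column's mean and divide by its standard deviation. One detail worth checking in such a comparison is that `StandardScaler` uses the population standard deviation (`ddof=0`), while `DataFrame.std()` defaults to the sample standard deviation (`ddof=1`). A sketch, with a fixed random seed added only for reproducibility:

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(loc=3, scale=1, size=(50, 4)))

# Manual z-score per column; ddof=0 matches StandardScaler's variance estimate.
manual = (X - X.mean()) / X.std(ddof=0)

scaled = preprocessing.StandardScaler().fit_transform(X)

print(np.allclose(manual.to_numpy(), scaled))  # True
```

Using `X.std()` with its default `ddof=1` would give values that differ slightly from the scaler's output.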
In [1]: import numpy as np

In [2]: def myfunc(a, b):
   ...:     if a > b:
   ...:         return a + b
   ...:     else:
   ...:         return float(a) / float(b)
   ...:

In [3]: vfunc = np.vectorize(myfunc)
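A pitfall with this particular `myfunc`: unless `otypes` is given, `np.vectorize` infers the output dtype from a call on the first element. If that first call takes the integer branch (`a + b`), later fractional results from the other branch are truncated. A small sketch showing the effect and the fix with `otypes`:

```python
import numpy as np


def myfunc(a, b):
    if a > b:
        return a + b
    else:
        return float(a) / float(b)


# First element: myfunc(3, 2) returns an integer, so the output dtype
# becomes integer and 1 / 2 = 0.5 is truncated to 0.
vfunc = np.vectorize(myfunc)
print(vfunc([3, 1], 2))

# Declaring the output type up front keeps the fractional result 0.5.
vfunc_float = np.vectorize(myfunc, otypes=[float])
print(vfunc_float([3, 1], 2))
```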
@drorata
drorata / Testing snakebite.ipynb
Created January 4, 2017 15:24
Minimal example with Snakebite
@drorata
drorata / map_vs_flatMap_example.ipynb
Created January 9, 2017 14:15
Compare map vs flatMap
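The notebook itself could not be rendered. The difference it compares can be sketched in plain Python: `map` produces exactly one output element per input element, while `flatMap` lets each input produce zero or more elements and flattens them into a single sequence. The PySpark calls in the comments are the assumed RDD equivalents; the list comprehensions only mimic their semantics locally.

```python
from itertools import chain

lines = ["hello world", "map vs flatMap"]

# map: one output per input, so each line becomes one list of words.
# PySpark: sc.parallelize(lines).map(lambda l: l.split()).collect()
mapped = [line.split() for line in lines]

# flatMap: each input yields several words, flattened into one sequence.
# PySpark: sc.parallelize(lines).flatMap(lambda l: l.split()).collect()
flat_mapped = list(chain.from_iterable(line.split() for line in lines))

print(mapped)       # [['hello', 'world'], ['map', 'vs', 'flatMap']]
print(flat_mapped)  # ['hello', 'world', 'map', 'vs', 'flatMap']
```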