Ravi Shekhar r-shekhar

r-shekhar / assign.py
Created Jun 1, 2017
Assign Taxi Zones Snippet
def assign_taxi_zones(df, lon_var, lat_var, locid_var):
"""Joins DataFrame with Taxi Zones shapefile.
This function takes longitude values provided by `lon_var`, and latitude
values provided by `lat_var` in DataFrame `df`, and performs a spatial join
with the NYC taxi_zones shapefile.
The shapefile path is hard-coded, as this function assumes latitude and
longitude coordinates. It also assumes that latitude=0, longitude=0 cannot
occur as a real datapoint in your dataset, which is reasonable for a dataset
of New York but bad for a global dataset.
Only rows where `df.lon_var` and `df.lat_var` are reasonably near New York,
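The gist is truncated before the join body, but the coordinate-validity check the docstring describes can be sketched in plain Python. The bounding-box values below are illustrative assumptions, not taken from the gist:

```python
# Rough NYC bounding box (illustrative values, not from the original gist)
NYC_BOUNDS = {"lon": (-74.5, -72.8), "lat": (40.0, 41.8)}

def near_new_york(lon, lat):
    """Return True if (lon, lat) is non-null, non-zero, and inside a
    rough New York bounding box, mirroring the docstring's assumptions."""
    if lon is None or lat is None:
        return False
    if lon == 0.0 and lat == 0.0:   # sentinel for a missing GPS fix
        return False
    lon_ok = NYC_BOUNDS["lon"][0] <= lon <= NYC_BOUNDS["lon"][1]
    lat_ok = NYC_BOUNDS["lat"][0] <= lat <= NYC_BOUNDS["lat"][1]
    return lon_ok and lat_ok

near_new_york(-73.98, 40.75)   # Midtown Manhattan → True
near_new_york(0.0, 0.0)        # null-island sentinel → False
```

In the full function, rows passing a filter like this would go on to the spatial join against the taxi_zones shapefile; the rest would get a null location ID.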
r-shekhar / zeppelin_ubuntu.md
Created May 27, 2017 — forked from pratos/zeppelin_ubuntu.md
To Install Zeppelin [Scala and Spark] in Ubuntu 16.04LTS

Install Zeppelin on Ubuntu systems

  • First, install Java, Scala, and Spark

    • Install Java
      sudo apt-add-repository ppa:webupd8team/java
      sudo apt-get update
      sudo apt-get install oracle-java8-installer
      
r-shekhar / spec-file.txt
Created May 25, 2017
Conda Environment Specifications File (Explicit)
# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: linux-64
@EXPLICIT
https://repo.continuum.io/pkgs/free/linux-64/_license-1.1-py35_1.tar.bz2
https://repo.continuum.io/pkgs/free/linux-64/_nb_ext_conf-0.3.0-py35_0.tar.bz2
https://repo.continuum.io/pkgs/free/linux-64/alabaster-0.7.9-py35_0.tar.bz2
https://repo.continuum.io/pkgs/free/linux-64/anaconda-custom-py35_0.tar.bz2
https://repo.continuum.io/pkgs/free/linux-64/anaconda-clean-1.0.0-py35_0.tar.bz2
https://repo.continuum.io/pkgs/free/linux-64/anaconda-client-1.5.1-py35_0.tar.bz2
r-shekhar / run_jupyter.sh
Created Apr 14, 2017
Run Jupyter on a Flintrock provisioned cluster
#!/bin/bash
export spark_master_hostname=$(ec2-metadata | grep local-ipv4 | cut -f2 -d' ')
export memory=2000M
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook --no-browser --port=7777"
pyspark --master "spark://${spark_master_hostname}:7077" \
    --executor-memory "$memory" --driver-memory "$memory"
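The master-hostname line pipes `ec2-metadata` through `grep local-ipv4 | cut -f2 -d' '`. A small Python equivalent makes the parsing explicit; the sample output below is a hypothetical illustration of the tool's `key: value` format:

```python
def extract_local_ipv4(metadata_output):
    """Mimic `ec2-metadata | grep local-ipv4 | cut -f2 -d' '`:
    find the local-ipv4 line and return its second space-separated field."""
    for line in metadata_output.splitlines():
        if line.startswith("local-ipv4"):
            return line.split(" ")[1]
    return None

# Hypothetical ec2-metadata output, for illustration only
sample = "instance-id: i-0abc123\nlocal-ipv4: 172.31.5.10"
extract_local_ipv4(sample)  # → '172.31.5.10'
```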
r-shekhar / commands.sh
Created Apr 14, 2017
Conda setup for Dask
#!/bin/bash
sudo apt update && sudo apt upgrade -y
sudo apt install s3cmd awscli -y
wget https://repo.continuum.io/miniconda/Miniconda3-4.2.12-Linux-x86_64.sh
bash Miniconda*sh -b -p ${HOME}/miniconda3
# Single quotes so $PATH is expanded when .bashrc is sourced, not baked in now
echo 'export PATH=${HOME}/miniconda3/bin:$PATH' >> ~/.bashrc
export PATH=${HOME}/miniconda3/bin:${PATH}
conda install -c conda-forge geopandas python-snappy dask distributed \
fastparquet fiona numba boto3 jupyter seaborn -y
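A quick sanity check one might run after the conda install above, verifying the stack is importable. The package names are copied from the conda command; the helper itself is a hypothetical convenience, not part of the gist:

```python
import importlib.util

def missing_packages(names):
    """Return the subset of `names` with no importable module of that name."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Importable module names from the conda install above
stack = ["geopandas", "dask", "distributed", "fastparquet",
         "fiona", "numba", "boto3", "seaborn"]
missing_packages(stack)  # [] if the environment installed cleanly
```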
r-shekhar / clean_install.md
Created May 15, 2016
Setting up Ubuntu 14.04 for Deep Learning, PySpark, and Climate Science

Setting up an Ubuntu 14.04 clean install for Development with PySpark and Deep Learning

  • Assumes NVIDIA GPU
  • Prefers Ubuntu native packages over Docker for simplicity
sudo apt-get update && sudo apt-get -y upgrade

0. Install Java

r-shekhar / .zshrc
##############################################################################
#super duper shell wildcards. Makes zsh worth using
setopt extendedglob
##############################################################################
#keep history file between sessions
HISTSIZE=1000000
SAVEHIST=1000000
HISTFILE=$HOME/.history
setopt APPEND_HISTORY