@yuanzhaoYZ
yuanzhaoYZ / github_gpg_key.md
Last active March 13, 2020 15:39 — forked from ankurk91/github_gpg_key.md
GitHub: Signing commits using GPG (Ubuntu/Mac)

  • Do you have a GitHub account? If not, create one.
  • Install the required tools:
    • Latest Git client
    • gpg tools
# Ubuntu
sudo apt-get install gpa seahorse
# MacOS with https://brew.sh/
brew install gnupg
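
Once the tools are installed, a minimal signing setup looks like this (with a recent GnuPG; <KEYID> is a placeholder for your own key id):

gpg --full-generate-key                      # generate a new key pair (RSA 4096 is a common choice)
gpg --list-secret-keys --keyid-format LONG   # note the id after sec rsa4096/<KEYID>
git config --global user.signingkey <KEYID>  # tell git which key to sign with
git config --global commit.gpgsign true      # sign every commit by default
gpg --armor --export <KEYID>                 # paste the exported public key into GitHub > Settings > SSH and GPG keys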
@yuanzhaoYZ
yuanzhaoYZ / zeppelin_s3_backend.md
Last active July 9, 2022 00:48
S3 backed notebooks for Zeppelin
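
A minimal sketch of the usual setup, assuming Zeppelin's standard S3NotebookRepo: point notebook storage at S3 in conf/zeppelin-env.sh (bucket and user names below are placeholders):

# conf/zeppelin-env.sh
export ZEPPELIN_NOTEBOOK_S3_BUCKET=your-bucket
export ZEPPELIN_NOTEBOOK_S3_USER=zeppelin
export ZEPPELIN_NOTEBOOK_STORAGE=org.apache.zeppelin.notebook.repo.S3NotebookRepo

Notebooks are then stored under s3://your-bucket/zeppelin/notebook/, so the instance profile or AWS credentials Zeppelin runs with must have access to that bucket.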
@yuanzhaoYZ
yuanzhaoYZ / rllab installation with anaconda.md
Created November 25, 2018 16:43
rllab installation with anaconda (tested on macOS)

Installation

This assumes you already have Anaconda installed on your computer.

conda env remove -n rllab_test -y
cd ~/Downloads
git clone https://github.com/rll/rllab.git
cd rllab
conda env create -n rllab_test -f environment.yml
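
After the environment is created, a quick way to sanity-check the install (the example script path is an assumption about the rllab repo layout):

source activate rllab_test
python examples/trpo_cartpole.py   # runs a short TRPO training loop on CartPole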
@yuanzhaoYZ
yuanzhaoYZ / install_anaconda_jupyter.sh
Created March 26, 2018 22:10
Bash script for installing Anaconda and Jupyter, and linking Jupyter with Spark
# Install Anaconda
wget https://repo.continuum.io/archive/Anaconda3-5.1.0-Linux-x86_64.sh
bash Anaconda3-5.1.0-Linux-x86_64.sh -b -f -p $HOME/anaconda
export PATH="$HOME/anaconda/bin:$PATH"
echo 'export PATH="$HOME/anaconda/bin:$PATH"' >> ~/.bashrc
conda update -y -n base conda
# Install Jupyter
conda create -y -n jupyter python=3.5 jupyter nb_conda
screen -dmS jupyter
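
A common way to finish the "linking Jupyter with Spark" step is to make pyspark launch Jupyter as its driver; a sketch, assuming the jupyter conda env created above (IP and port are placeholders):

source activate jupyter
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook --no-browser --ip=0.0.0.0 --port=8888"
pyspark   # starts a Jupyter server whose notebooks get a ready-made SparkContext (sc)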
@yuanzhaoYZ
yuanzhaoYZ / debug_spark.md
Created September 1, 2017 12:00
Debugging Spark

To connect a debugger to the driver

Append the following to your spark-submit (or gatk-launch) options:

Replace 5005 with a different available port if necessary.

--driver-java-options -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005

This will suspend the driver until it gets a remote connection from IntelliJ.
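For example, a full submit might look like this (class and jar names are placeholders); in IntelliJ, create a "Remote JVM Debug" run configuration pointing at the driver host on port 5005 and start it to release the suspended driver:

spark-submit --master yarn --deploy-mode client \
  --driver-java-options -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005 \
  --class com.example.MyJob my-job.jar

With --deploy-mode cluster the driver runs on an arbitrary cluster node, so client mode (or port-forwarding to the driver node) is usually the easier way to attach.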

@yuanzhaoYZ
yuanzhaoYZ / ubuntu_nic_bonding.md
Created July 29, 2017 05:21
NIC bonding @ Ubuntu 14.04
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet manual
  bond-master bond0

auto eth1
iface eth1 inet manual
  bond-master bond0

# bond0 stanza (typical completion; the mode and addressing here are assumptions, adjust for your network)
auto bond0
iface bond0 inet dhcp
  bond-mode 802.3ad
  bond-miimon 100
  bond-slaves none
@yuanzhaoYZ
yuanzhaoYZ / jinja_template
Created July 26, 2017 21:57
jinja_template
import datetime
from jinja2 import Environment

start = datetime.datetime.strptime("2017-02-01", "%Y-%m-%d")
end = datetime.datetime.strptime("2017-07-24", "%Y-%m-%d")
date_generated = [start + datetime.timedelta(days=x) for x in range(0, (end - start).days + 1)]
template = """spark-submit --master yarn --deploy-mode cluster --class com.xyz.XXXAPP s3://com.xyz/aa-1.5.11-all.jar --input-request-events s3://com.xyz/data/event_{{date_str}}/* --input-geofence-events s3://com.xyz/data2/event_/{{date_str}}/* --output s3://com.xyz/output/{{date_str}}"""

# Render one spark-submit command per day in the range
for d in date_generated:
    print(Environment().from_string(template).render(date_str=d.strftime("%Y-%m-%d")))
@yuanzhaoYZ
yuanzhaoYZ / jupyter_notebook@EMR.md
Last active September 20, 2019 15:37
Run Jupyter Notebook and JupyterHub on Amazon EMR

Jupyter on EMR allows users to save their work on Amazon S3 rather than on local storage on the EMR cluster (master node).

To store notebooks on S3, use:

--notebook-dir <s3://your-bucket/folder/>

To store notebooks in a directory different from the user’s home directory, use:

--notebook-dir <local directory>
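
These flags are arguments to the Jupyter bootstrap action described in the AWS Big Data Blog post this gist follows; a sketch of a cluster launch (the bootstrap script path, release label, instance sizes, and names are assumptions or placeholders):

aws emr create-cluster --name jupyter-cluster --release-label emr-5.14.0 \
  --applications Name=Hadoop Name=Spark \
  --instance-type m4.xlarge --instance-count 3 --use-default-roles \
  --ec2-attributes KeyName=your-key \
  --bootstrap-actions Path=s3://aws-bigdata-blog/artifacts/aws-blog-emr-jupyter/install-jupyter-emr5.sh,Args=[--notebook-dir,s3://your-bucket/folder/]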
@yuanzhaoYZ
yuanzhaoYZ / bitbucket_clone.md
Last active July 5, 2017 12:40
Clone all git repositories from BitBucket
curl -s  -k https://USERNAME:PASSWORD@api.bitbucket.org/1.0/user/repositories | python -c 'import sys, json, os; r = json.loads(sys.stdin.read()); [os.system("git clone %s" % d["resource_uri"].replace("/1.0/repositories","https://USERNAME:PASSWORD@bitbucket.org")+".git") for d in r]'
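
The same thing spelled out, using the v1.0 API from the one-liner (since retired by Bitbucket; substitute USERNAME and PASSWORD):

curl -s -k https://USERNAME:PASSWORD@api.bitbucket.org/1.0/user/repositories > repos.json
python - <<'EOF'
import json, subprocess
for repo in json.load(open("repos.json")):
    url = repo["resource_uri"].replace("/1.0/repositories", "https://USERNAME:PASSWORD@bitbucket.org") + ".git"
    subprocess.check_call(["git", "clone", url])
EOF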

Pyspark

spark-submit

spark-submit --master yarn --deploy-mode cluster --name pyspark_job --driver-memory 2G --driver-cores 2 --executor-memory 12G --executor-cores 5 --num-executors 10 --conf spark.yarn.executor.memoryOverhead=4096 --conf spark.task.maxFailures=36 --conf spark.driver.maxResultSize=0 --conf spark.network.timeout=800s --conf spark.scheduler.listenerbus.eventqueue.size=500000 --conf spark.speculation=true --py-files lib.zip,lib1.zip,lib2.zip spark_test.py

spark_test.py

import pyspark
import sys
from pyspark.sql import SQLContext
# Minimal driver setup matching the spark-submit command above
sc = pyspark.SparkContext(appName="pyspark_job")
sqlContext = SQLContext(sc)