Jon Roosevelt yuanzhaoYZ

## Burrow config
[zookeeper]
hostname=slave1.example.com
hostname=slave2.example.com
hostname=slave3.example.com
port=2181
timeout=6
lock-path=/burrow/notifier

[kafka "XX-prod"]
broker=slave1.example.com

## spark-elastic
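import org.elasticsearch.spark._
import org.apache.spark.sql._
// spark-shell already provides sqlContext; otherwise create it:
//val sqlContext = new SQLContext(sc)
val options = Map("pushdown" -> "true", "es.nodes" -> "host_ip_here", "es.port" -> "9200",
"es.nodes.wan.only" -> "true")
// Read the index as a DataFrame and dump it to JSON files
sqlContext.read.format("es").options(options).load("index_name").write.mode(SaveMode.Overwrite).json("path_to_output")
// Or read the same index as an RDD of (document id, document) pairs
sc.esRDD("index_name",options)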


## wordnet_ES_install.sh
sudo su
mkdir -p /etc/elasticsearch/analysis
cd /etc/elasticsearch/analysis
wget http://wordnetcode.princeton.edu/3.0/WNprolog-3.0.tar.gz
tar xvzf WNprolog-3.0.tar.gz
mv prolog/wn_s.pl .
rm -rf prolog
rm -f WNprolog-3.0.tar.gz
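
A quick sanity check after the download might look like the sketch below. The helper name `count_wordnet_facts` is made up for illustration, and it assumes every synonym fact in `wn_s.pl` sits on its own line starting with `s(`; adjust the pattern if your copy is laid out differently.

```shell
# Hypothetical helper: count synonym facts in a WordNet prolog file.
# Prints the count and returns non-zero if the file is unreadable or has none.
count_wordnet_facts() {
  file="$1"
  [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
  # grep -c exits 1 on zero matches but still prints 0, so ignore its status
  count=$(grep -c '^s(' "$file") || true
  echo "$count"
  [ "$count" -gt 0 ]
}

# Only run the check when the file from the steps above is present
if [ -r /etc/elasticsearch/analysis/wn_s.pl ]; then
  count_wordnet_facts /etc/elasticsearch/analysis/wn_s.pl
fi
```

A zero count, or an error, most likely means the archive did not unpack as expected.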

## ES_MissingFilterWithNestedObjects
Elasticsearch missing filter with nested objects

## jupyter_notebook+spark 2.1(Ubuntu).md

      
Last active March 30, 2017 17:46; forked from tommycarpi/ipython_notebook+spark.md

How to link Apache Spark 2.1.0 with IPython notebook (Ubuntu)

Tested with Python 2.7, Ubuntu 16.04 LTS, Apache Spark 2.1.0 and Hadoop 2.7.
Download Apache Spark & Build it

Download Apache Spark and build it or download the pre-built version.

  
## ipython_notebook+spark(macOS).md

      
Last active March 30, 2017 17:45

How to link Apache Spark 2.1.0 with IPython notebook (Mac OS X)

Tested with Python 2.7, OS X 10.11.3 El Capitan, Apache Spark 2.1.0 and Hadoop 2.7.
Download Apache Spark & Build it

Download Apache Spark and build it or download the pre-built version.

  
## hadoop copy from s3
hadoop distcp -Dmapreduce.map.memory.mb=4096 -Dfs.s3a.access.key=XXX -Dfs.s3a.secret.key=XXXX -m 250 hdfs:///data/* s3a://api-v3-data-sources/output/

## OSRM_NA_12.04 -> 15.04.md

      
Last active April 16, 2017 16:17

OSRM with North America map installation and setup on Ubuntu 12.04 ~ 15.04

Install

sudo su
apt-get update -y
apt-get install -y software-properties-common python-software-properties || true
add-apt-repository -y ppa:ubuntu-toolchain-r/test
apt-get update -y
apt-get install -y zlib1g-dev curl libstdc++-5-dev make binutils libc-dev libgcc-5-dev git
cd /opt
mkdir /opt/osrm


## integrate matlab2017a with jupyter.md

      
Last active April 28, 2017 13:49

anaconda2


Download and install Anaconda from https://www.continuum.io/downloads, then restart Terminal. Or, if you’d prefer not to install the full Anaconda distribution, check out this post.

wget https://repo.continuum.io/archive/Anaconda2-4.3.1-MacOSX-x86_64.sh
bash Anaconda2-4.3.1-MacOSX-x86_64.sh 


In terminal, type

/Users/zeta/anaconda/bin/pip install matlab_kernel


## bundle python lib for pyspark
pip install -t dependencies -r requirements.txt
cd dependencies
zip -r ../dependencies.zip .
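
The resulting archive is typically shipped to the executors with `spark-submit --py-files`. The sketch below only prints the command line so it can be reviewed before running; the helper name `pyspark_submit_cmd` and the driver script `my_job.py` are placeholders, not part of the original notes.

```shell
# Hypothetical helper: print the spark-submit command for a bundled job.
# Arguments: the zip built above, then your driver script.
pyspark_submit_cmd() {
  deps_zip="$1"
  job="$2"
  printf 'spark-submit --py-files %s %s\n' "$deps_zip" "$job"
}

# my_job.py is a placeholder name for your own driver script
pyspark_submit_cmd dependencies.zip my_job.py
```

Because the files were zipped from inside `dependencies`, the packages sit at the root of the archive, which is where Python expects them when the zip is placed on the import path.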