Jia Yu jiayuasu

Project

Description: What does this project do and who does it serve?

Project Setup

How do I, as a developer, start working on the project?

  1. What dependencies does it have (where are they expressed) and how do I install them?
  2. How can I see the project working before I change anything?
# Enable Graphite
*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
*.sink.graphite.host=<graphite host>
*.sink.graphite.port=<graphite port>
*.sink.graphite.period=10
# Enable JVM source for instances master, worker, driver, and executor
master.source.jvm.class=org.apache.spark.metrics.source.JvmSource
worker.source.jvm.class=org.apache.spark.metrics.source.JvmSource
driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource
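This file only takes effect if Spark can find it: by default Spark reads $SPARK_HOME/conf/metrics.properties, and a custom location can be supplied through the spark.metrics.conf property. A minimal Java sketch of wiring that up; the app name and file path are placeholders:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// Tell Spark where the metrics.properties above lives (placeholder path).
SparkConf conf = new SparkConf()
        .setAppName("GraphiteMetricsExample")
        .set("spark.metrics.conf", "/path/to/metrics.properties");
JavaSparkContext sc = new JavaSparkContext(conf);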
@jiayuasu
jiayuasu / GeoSparkAndBabylon.java
Last active April 7, 2017 20:19
Use Choropleth Map to Visualize Spatial Join Query (GeoSpark + Babylon)
/*---------------------------- Step 0: Create GeoSpark Spatial RDDs ----------------------------*/
PointRDD spatialRDD = new PointRDD(sparkContext, PointInputLocation, PointOffset, FileDataSplitter.CSV, false, PointNumPartitions, StorageLevel.MEMORY_ONLY());
PolygonRDD queryRDD = new PolygonRDD(sparkContext, PolygonInputLocation, FileDataSplitter.CSV, false, PolygonNumPartitions, StorageLevel.MEMORY_ONLY());
/*---------------------------- Step 1: Issue GeoSpark Spatial Join Query with Index ----------------------------*/
spatialRDD.spatialPartitioning(GridType.RTREE);
queryRDD.spatialPartitioning(spatialRDD.grids);
spatialRDD.buildIndex(IndexType.RTREE,true);
JavaPairRDD<Polygon,Long> joinResult = JoinQuery.SpatialJoinQueryCountByKey(spatialRDD,queryRDD,true, true);
/*---------------------------- Step 2: Create Babylon Choropleth Map using Twitter dataset ----------------------------*/
ChoroplethMap visualizationOperator = new ChoroplethMap(1000,600,USMainLandBoundary,false);
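The preview cuts off right after the ChoroplethMap is constructed. A hedged sketch of how the remaining steps typically look in Babylon's published examples; the Visualize and SaveAsFile methods, the rasterImage field, and the output path are assumptions that may differ in your Babylon version:

/*---------------------------- Step 3 (sketch): Render the map and write it out ----------------------------*/
// Assumed API: rasterize the per-polygon join counts onto the 1000x600 canvas.
visualizationOperator.Visualize(sparkContext, joinResult);
// Assumed API: write the rendered raster to disk as a PNG; the path is hypothetical.
ImageGenerator imageGenerator = new ImageGenerator();
imageGenerator.SaveAsFile(visualizationOperator.rasterImage, "/tmp/choropleth", ImageType.PNG);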
@jiayuasu
jiayuasu / GeoSpark_Scala_Old_Example.scala
Last active February 3, 2017 22:53
These Scala APIs work for the GeoSpark 0.3.X line
/*---------------------------- GeoSpark 0.3.X or older Scala API usage ----------------------------*/
/*
 * If you are writing a GeoSpark program in the Spark Scala shell, there is no need to declare the SparkContext yourself.
 * If you are writing a self-contained GeoSpark Scala program, declare the SparkContext as follows and
 * stop it at the end of the program.
 */
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
/*---------------------------- Start an example Spatial Join Query using Cartesian Product algorithm ----------------------------*/
val objectRDD = new PointRDD(sc, "/home/SparkUser/Downloads/GeoSpark/src/test/resources/arealm.csv", 0, "csv", 10); /* The 0 means the spatial attribute starts at column 0, and the 10 means 10 RDD partitions */
val rectangleRDD = new RectangleRDD(sc, "/home/SparkUser/Downloads/GeoSpark/src/test/resources/zcta510.csv", 0, "csv"); /* The 0 means the spatial attribute starts at column 0. You might need to "collect" all rectangles into a list and do the Cartesian Product join. */
val resultSize = JoinQuery.SpatialJoinQueryUsingCartesianProduct(objectRDD, rectangleRDD).count();
/*---------------------------- End an example Spatial Join Query using Cartesian Product algorithm ----------------------------*/
CSE 512 Naive Spatial Join Query (should be written in Java)
1. Create a PointRDD objectRDD;
2. Create a RectangleRDD queryWindowRDD;
3. Collect the rectangles from queryWindowRDD into one Java List L;
4. For each rectangle R in L,
   do RangeQuery.SpatialRangeQuery(objectRDD, R, 0);
   End;
5. Collect all results; // "Collect" is a standard function under SparkContext.
6. Parallelize the results to generate an RDD in the format JavaPairRDD<Envelope, HashSet<Point>>; // "Parallelize" is a standard function under SparkContext.
7. Return the result RDD;
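A minimal Java sketch of these seven steps, assuming the GeoSpark 0.3-era API used elsewhere on this page; the input paths are placeholders, and the getRawPointRDD/getRawRectangleRDD accessors are assumptions about that API's raw-RDD getters:

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import com.vividsolutions.jts.geom.Envelope;
import com.vividsolutions.jts.geom.Point;
import org.apache.spark.api.java.JavaPairRDD;
import org.datasyslab.geospark.spatialOperator.RangeQuery;
import org.datasyslab.geospark.spatialRDD.PointRDD;
import org.datasyslab.geospark.spatialRDD.RectangleRDD;
import scala.Tuple2;

// Assumes an existing JavaSparkContext named sc.
// Steps 1-2: load the two inputs (hypothetical paths; 10 partitions for the points).
PointRDD objectRDD = new PointRDD(sc, "/path/to/points.csv", 0, "csv", 10);
RectangleRDD queryWindowRDD = new RectangleRDD(sc, "/path/to/rectangles.csv", 0, "csv");

// Step 3: collect every query window to the driver (naive: no partitioning, no index).
List<Envelope> windows = queryWindowRDD.getRawRectangleRDD().collect(); // assumed accessor

// Steps 4-5: one range query per window, collecting each result set on the driver.
List<Tuple2<Envelope, HashSet<Point>>> pairs = new ArrayList<>();
for (Envelope window : windows) {
    HashSet<Point> hits = new HashSet<>(
        RangeQuery.SpatialRangeQuery(objectRDD, window, 0).getRawPointRDD().collect()); // assumed accessor
    pairs.add(new Tuple2<>(window, hits));
}

// Step 6: parallelize the driver-side pairs back into the required RDD shape.
JavaPairRDD<Envelope, HashSet<Point>> resultRDD = JavaPairRDD.fromJavaRDD(sc.parallelize(pairs));
// Step 7: return resultRDD from your method.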
@jiayuasu
jiayuasu / Babylon-GeoSpark.scala
Last active January 15, 2017 07:01
Babylon scala example
import com.vividsolutions.jts.geom.Envelope;
import java.awt.Color;
import org.datasyslab.geospark.spatialRDD.LineStringRDD;
import org.datasyslab.geospark.spatialRDD.PointRDD;
import org.datasyslab.geospark.spatialRDD.PolygonRDD;
import org.datasyslab.geospark.spatialRDD.RectangleRDD;
import org.datasyslab.geospark.enums.FileDataSplitter;
@jiayuasu
jiayuasu / gcc 5 on ubuntu 14.04
Created January 23, 2017 19:42 — forked from beci/gcc 5 on ubuntu 14.04
use gcc 5.x on ubuntu 14.04
# Add the toolchain PPA that publishes newer GCC builds for 14.04
sudo add-apt-repository ppa:ubuntu-toolchain-r/test
sudo apt-get update
sudo apt-get install gcc-5 g++-5
# Register gcc-5 as the "gcc" alternative (priority 60) and slave g++-5 to it
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-5 60 --slave /usr/bin/g++ g++ /usr/bin/g++-5
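Once installed this way, sudo update-alternatives --config gcc lets you switch interactively between any registered GCC versions.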
@jiayuasu
jiayuasu / GeoSpark_Scala_New_Example.scala
Last active June 1, 2017 18:11
These Scala APIs work for GeoSpark 0.4 or later
/*---------------------------- GeoSpark 0.4 (or later) Scala API usage ----------------------------*/
/*
 * If you are writing a GeoSpark program in the Spark Scala shell, there is no need to declare the SparkContext yourself.
 * If you are writing a self-contained GeoSpark Scala program, declare the SparkContext as follows and
 * stop it at the end of the program.
 */
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
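The preview stops after the imports, just before the declaration that the comment above promises. A minimal Scala sketch of that boilerplate, assuming a standard self-contained Spark application; the app name and master setting are placeholders:

// Placeholder app name and master; configure these for your own cluster.
val conf = new SparkConf().setAppName("GeoSparkApp").setMaster("local[*]")
val sc = new SparkContext(conf)

// ... GeoSpark 0.4+ queries go here ...

sc.stop() // stop the SparkContext at the end of the entire program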
@jiayuasu
jiayuasu / 0_reuse_code.js
Created January 27, 2017 21:52
Here are some things you can do with Gists in GistBox.
// Use Gists to store code you would like to remember later on
console.log(window); // log the "window" object to the console