Jia Yu jiayuasu

Project

Description: What does this project do and who does it serve?

Project Setup

How do I, as a developer, start working on the project?

  1. What dependencies does it have (where are they expressed) and how do I install them?
  2. How can I see the project working before I change anything?
# Enable Graphite
*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
*.sink.graphite.host=<graphite host>
*.sink.graphite.port=<graphite port>
*.sink.graphite.period=10
# Enable JVM source for instances master, worker, driver, and executor
master.source.jvm.class=org.apache.spark.metrics.source.JvmSource
worker.source.jvm.class=org.apache.spark.metrics.source.JvmSource
driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource
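This file only takes effect if Spark can find it: by default Spark reads $SPARK_HOME/conf/metrics.properties, and a custom location can be supplied through the spark.metrics.conf property. A minimal Java sketch of wiring that up; the app name and file path are placeholders:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// Tell Spark where the metrics.properties above lives (placeholder path).
SparkConf conf = new SparkConf()
        .setAppName("GraphiteMetricsExample")
        .set("spark.metrics.conf", "/path/to/metrics.properties");
JavaSparkContext sc = new JavaSparkContext(conf);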
@jiayuasu
jiayuasu / GeoSparkAndBabylon.java
Last active April 7, 2017 20:19
Use Choropleth Map to Visualize Spatial Join Query (GeoSpark + Babylon)
/*---------------------------- Step 0: Create GeoSpark Spatial RDDs ----------------------------*/
PointRDD spatialRDD = new PointRDD(sparkContext, PointInputLocation, PointOffset, FileDataSplitter.CSV, false, PointNumPartitions, StorageLevel.MEMORY_ONLY());
PolygonRDD queryRDD = new PolygonRDD(sparkContext, PolygonInputLocation, FileDataSplitter.CSV, false, PolygonNumPartitions, StorageLevel.MEMORY_ONLY());
/*---------------------------- Step 1: Issue GeoSpark Spatial Join Query with Index ----------------------------*/
spatialRDD.spatialPartitioning(GridType.RTREE);
queryRDD.spatialPartitioning(spatialRDD.grids);
spatialRDD.buildIndex(IndexType.RTREE,true);
JavaPairRDD<Polygon,Long> joinResult = JoinQuery.SpatialJoinQueryCountByKey(spatialRDD,queryRDD,true, true);
/*---------------------------- Step 2: Create Babylon Choropleth Map using Twitter dataset ----------------------------*/
ChoroplethMap visualizationOperator = new ChoroplethMap(1000,600,USMainLandBoundary,false);
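The preview cuts off right after the ChoroplethMap is constructed. A hedged sketch of how the remaining steps typically look in Babylon's published examples; the Visualize and SaveAsFile methods, the rasterImage field, and the output path are assumptions that may differ in your Babylon version:

/*---------------------------- Step 3 (sketch): Render the map and write it out ----------------------------*/
// Assumed API: rasterize the per-polygon join counts onto the 1000x600 canvas.
visualizationOperator.Visualize(sparkContext, joinResult);
// Assumed API: write the rendered raster to disk as a PNG; the path is hypothetical.
ImageGenerator imageGenerator = new ImageGenerator();
imageGenerator.SaveAsFile(visualizationOperator.rasterImage, "/tmp/choropleth", ImageType.PNG);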
@jiayuasu
jiayuasu / GeoSpark_Scala_Old_Example.scala
Last active February 3, 2017 22:53
These Scala APIs work for the GeoSpark 0.3.X line
/*---------------------------- GeoSpark 0.3.X or older Scala API usage ----------------------------*/
/*
 * If you are writing a GeoSpark program in the Spark Scala shell, there is no need to declare the SparkContext yourself.
 * If you are writing a self-contained GeoSpark Scala program, declare the SparkContext as follows and
 * stop it at the end of the program.
 */
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
/*---------------------------- Start an example Spatial Join Query using Cartesian Product algorithm ----------------------------*/
val objectRDD = new PointRDD(sc, "/home/SparkUser/Downloads/GeoSpark/src/test/resources/arealm.csv", 0, "csv", 10); /* The 0 means the spatial attribute starts at column 0, and the 10 means 10 RDD partitions */
val rectangleRDD = new RectangleRDD(sc, "/home/SparkUser/Downloads/GeoSpark/src/test/resources/zcta510.csv", 0, "csv"); /* The 0 means the spatial attribute starts at column 0. You might need to "collect" all rectangles into a list and do the Cartesian Product join. */
val resultSize = JoinQuery.SpatialJoinQueryUsingCartesianProduct(objectRDD, rectangleRDD).count();
/*---------------------------- End an example Spatial Join Query using Cartesian Product algorithm ----------------------------*/
CSE 512 Naive Spatial Join Query (should be written in Java)
1. Create a PointRDD objectRDD;
2. Create a RectangleRDD queryWindowRDD;
3. Collect the rectangles from queryWindowRDD into one Java List L;
4. For each rectangle R in L,
   do RangeQuery.SpatialRangeQuery(objectRDD, R, 0);
   End;
5. Collect all results; // "Collect" is a standard function under SparkContext.
6. Parallelize the results to generate an RDD in the format JavaPairRDD<Envelope, HashSet<Point>>; // "Parallelize" is a standard function under SparkContext.
7. Return the result RDD;
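A minimal Java sketch of these seven steps, assuming the GeoSpark 0.3-era API used elsewhere on this page; the input paths are placeholders, and the getRawPointRDD/getRawRectangleRDD accessors are assumptions about that API's raw-RDD getters:

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import com.vividsolutions.jts.geom.Envelope;
import com.vividsolutions.jts.geom.Point;
import org.apache.spark.api.java.JavaPairRDD;
import org.datasyslab.geospark.spatialOperator.RangeQuery;
import org.datasyslab.geospark.spatialRDD.PointRDD;
import org.datasyslab.geospark.spatialRDD.RectangleRDD;
import scala.Tuple2;

// Assumes an existing JavaSparkContext named sc.
// Steps 1-2: load the two inputs (hypothetical paths; 10 partitions for the points).
PointRDD objectRDD = new PointRDD(sc, "/path/to/points.csv", 0, "csv", 10);
RectangleRDD queryWindowRDD = new RectangleRDD(sc, "/path/to/rectangles.csv", 0, "csv");

// Step 3: collect every query window to the driver (naive: no partitioning, no index).
List<Envelope> windows = queryWindowRDD.getRawRectangleRDD().collect(); // assumed accessor

// Steps 4-5: one range query per window, collecting each result set on the driver.
List<Tuple2<Envelope, HashSet<Point>>> pairs = new ArrayList<>();
for (Envelope window : windows) {
    HashSet<Point> hits = new HashSet<>(
        RangeQuery.SpatialRangeQuery(objectRDD, window, 0).getRawPointRDD().collect()); // assumed accessor
    pairs.add(new Tuple2<>(window, hits));
}

// Step 6: parallelize the driver-side pairs back into the required RDD shape.
JavaPairRDD<Envelope, HashSet<Point>> resultRDD = JavaPairRDD.fromJavaRDD(sc.parallelize(pairs));
// Step 7: return resultRDD from your method.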
@jiayuasu
jiayuasu / Babylon-GeoSpark.scala
Last active January 15, 2017 07:01
Babylon scala example
import com.vividsolutions.jts.geom.Envelope;
import java.awt.Color;
import org.datasyslab.geospark.spatialRDD.LineStringRDD;
import org.datasyslab.geospark.spatialRDD.PointRDD;
import org.datasyslab.geospark.spatialRDD.PolygonRDD;
import org.datasyslab.geospark.spatialRDD.RectangleRDD;
import org.datasyslab.geospark.enums.FileDataSplitter;
@jiayuasu
jiayuasu / gcc 5 on ubuntu 14.04
Created January 23, 2017 19:42 — forked from beci/gcc 5 on ubuntu 14.04
use gcc 5.x on ubuntu 14.04
# Add the toolchain PPA that publishes newer GCC builds for 14.04
sudo add-apt-repository ppa:ubuntu-toolchain-r/test
sudo apt-get update
sudo apt-get install gcc-5 g++-5
# Register gcc-5 as the "gcc" alternative (priority 60) and slave g++-5 to it
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-5 60 --slave /usr/bin/g++ g++ /usr/bin/g++-5
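Once installed this way, sudo update-alternatives --config gcc lets you switch interactively between any registered GCC versions.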
@jiayuasu
jiayuasu / GeoSpark_Scala_New_Example.scala
Last active June 1, 2017 18:11
These Scala APIs work for GeoSpark 0.4 or later
/*---------------------------- GeoSpark 0.4 (or later) Scala API usage ----------------------------*/
/*
 * If you are writing a GeoSpark program in the Spark Scala shell, there is no need to declare the SparkContext yourself.
 * If you are writing a self-contained GeoSpark Scala program, declare the SparkContext as follows and
 * stop it at the end of the program.
 */
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
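The preview stops after the imports, just before the declaration that the comment above promises. A minimal Scala sketch of that boilerplate, assuming a standard self-contained Spark application; the app name and master setting are placeholders:

// Placeholder app name and master; configure these for your own cluster.
val conf = new SparkConf().setAppName("GeoSparkApp").setMaster("local[*]")
val sc = new SparkContext(conf)

// ... GeoSpark 0.4+ queries go here ...

sc.stop() // stop the SparkContext at the end of the entire program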
@jiayuasu
jiayuasu / 0_reuse_code.js
Created January 27, 2017 21:52
Here are some things you can do with Gists in GistBox.
// Use Gists to store code you would like to remember later on
console.log(window); // log the "window" object to the console