@Charmatzis
Last active February 5, 2018 11:06
Use Zeppelin on EMR with geotrellis and geotrellis-spark-sql

Introduction

This is a help page for setting up Zeppelin with geotrellis and geotrellis-spark-sql, so that you can continue with your analysis.

Steps

Prestep

Set properties for EMR cluster

Cluster Size
  Master Memory                        8
  Master Cores                         2
  Number of Worker Nodes               3
  Memory Per Worker Node (GB)          64
  Cores Per Worker Node                16

Spark Size
  Selected Executors Per Node          5
  spark.executor.instances             15
  spark.yarn.executor.memoryOverhead   2048
  spark.executor.memory                10G
  spark.yarn.driver.memoryOverhead     1024
  spark.driver.memory                  6G
  spark.executor.cores                 3
  spark.driver.cores                   1
  spark.default.parallelism            90
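
The Spark values above appear to follow from the cluster size. The sketch below reproduces that arithmetic in plain Scala. The relationships it encodes (parallelism as twice the total executor cores, executor memory plus overhead fitting inside one worker node) are assumptions inferred from the numbers, not something this page states, and the object and field names are illustrative only.

```scala
// Illustrative sizing arithmetic. Names are hypothetical; values are copied from the table above.
object EmrSizing {
  val workerNodes = 3             // Number of Worker Nodes
  val memoryPerWorkerGb = 64      // Memory Per Worker Node (GB)
  val coresPerWorker = 16         // Cores Per Worker Node
  val executorsPerNode = 5        // Selected Executors Per Node
  val executorCores = 3           // spark.executor.cores
  val executorMemoryGb = 10       // spark.executor.memory
  val executorOverheadMb = 2048   // spark.yarn.executor.memoryOverhead (Spark reads this in MiB)

  // 3 worker nodes * 5 executors per node
  val executorInstances: Int = workerNodes * executorsPerNode

  // Cores available to tasks across all executors
  val totalExecutorCores: Int = executorInstances * executorCores

  // Assumption: two tasks per executor core
  val defaultParallelism: Int = 2 * totalExecutorCores

  // Memory and cores requested on each worker node
  val memoryUsedPerNodeGb: Int = executorsPerNode * (executorMemoryGb + executorOverheadMb / 1024)
  val coresUsedPerNode: Int = executorsPerNode * executorCores

  def main(args: Array[String]): Unit = {
    println(s"spark.executor.instances  = $executorInstances")   // 15
    println(s"spark.default.parallelism = $defaultParallelism")  // 90
    println(s"memory used per node (GB) = $memoryUsedPerNodeGb of $memoryPerWorkerGb")
    println(s"cores used per node       = $coresUsedPerNode of $coresPerWorker")
  }
}
```

With these inputs each worker node is asked for 60 of its 64 GB and 15 of its 16 cores, which leaves some headroom for the operating system and YARN daemons.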

First

Go to Interpreters and add the following repositories:

Name: Astrea

Url: https://dl.bintray.com/s22s/maven/

Name: Geotools

Url: http://download.osgeo.org/webdav/geotools/

Name: GeoSolutions

Url: http://maven.geo-solutions.it/

Then edit the %spark interpreter and add the following artifacts:

  1. org.locationtech.geotrellis:geotrellis-spark_2.11:1.1.1
  2. org.locationtech.geotrellis:geotrellis-raster_2.11:1.1.1
  3. org.locationtech.geotrellis:geotrellis-s3_2.11:1.1.1
  4. org.locationtech.geotrellis:geotrellis-vector_2.11:1.1.1
  5. org.locationtech.geotrellis:geotrellis-geotools_2.11:1.1.1
  6. astraea:geotrellis-spark-sql_2.11:0.2.3
  7. com.knockdata:spark-highcharts:0.6.5
  8. org.locationtech.spatial4j:spatial4j:0.6
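
The same repositories and artifacts can also be declared from a notebook paragraph through Zeppelin's dependency loader. This is only a sketch: it assumes the Zeppelin version on the cluster still ships the %dep interpreter (some versions expose it as %spark.dep) and that the paragraph runs before the Spark interpreter has started. This page itself only describes the settings-page route.

```scala
%dep
// Assumption: run this before the first %spark paragraph; restart the interpreter otherwise.
z.reset()

// Repositories, with the same names and URLs as listed above
z.addRepo("Astrea").url("https://dl.bintray.com/s22s/maven/")
z.addRepo("Geotools").url("http://download.osgeo.org/webdav/geotools/")
z.addRepo("GeoSolutions").url("http://maven.geo-solutions.it/")

// Artifacts, with the same coordinates as listed above
z.load("org.locationtech.geotrellis:geotrellis-spark_2.11:1.1.1")
z.load("org.locationtech.geotrellis:geotrellis-raster_2.11:1.1.1")
z.load("org.locationtech.geotrellis:geotrellis-s3_2.11:1.1.1")
z.load("org.locationtech.geotrellis:geotrellis-vector_2.11:1.1.1")
z.load("org.locationtech.geotrellis:geotrellis-geotools_2.11:1.1.1")
z.load("astraea:geotrellis-spark-sql_2.11:0.2.3")
z.load("com.knockdata:spark-highcharts:0.6.5")
z.load("org.locationtech.spatial4j:spatial4j:0.6")
```

Use one route or the other. Declaring the same artifact in both places gains nothing and makes version conflicts harder to trace.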

Save the interpreter settings, then open a notebook.

Second

In the first shell, run the following imports:

%spark
import geotrellis.spark._
import geotrellis.raster._
import geotrellis.vector._

val extent = Extent(0,0,1,1)

The dependencies are loaded lazily, so if this paragraph runs without errors, geotrellis was imported successfully.
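
A slightly stronger check is to use the imported classes, not just import them. The sketch below assumes the geotrellis 1.1.1 artifacts listed earlier are on the interpreter classpath. The tile and its cell values are made up for illustration.

```scala
import geotrellis.raster._
import geotrellis.vector._

// Same extent as in the import paragraph
val extent = Extent(0, 0, 1, 1)
println(s"width=${extent.width}, height=${extent.height}, area=${extent.area}")

// A 2x2 integer tile with made-up cell values, stored row by row
val tile = IntArrayTile(Array(1, 2, 3, 4), 2, 2)
println(s"cols=${tile.cols}, rows=${tile.rows}, top-left=${tile.get(0, 0)}")
```

If either line fails with a missing-class error, the artifact list in the %spark interpreter settings is the first thing to check.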

Third

To test geotrellis-spark-sql, add the following to the next shell:

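// SparkSession comes from Spark; gt is provided by geotrellis-spark-sql
import org.apache.spark.sql.{SparkSession, gt}

// Reuse the session that Zeppelin has already created
val _spark = SparkSession
            .builder().getOrCreate()

import _spark.implicits._
implicit val sc = _spark.sparkContext
implicit val _sql = _spark.sqlContext

// Register geotrellis-spark-sql with the SQL context
gt.gtRegister(_sql)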

geotrellis-spark-sql is then registered with the SQL context.

Fourth

Add the Highcharts scripts:

%angular
<script type="text/javascript">

  // Load Highcharts once; skip the download if it is already on the page.
  // Served over https so it is not blocked as mixed content on an https notebook.
  $(function () {
    if (typeof Highcharts == "undefined") {
      $.getScript("https://code.highcharts.com/highcharts.js")
        .done(function (script, textStatus) {
          console.log("load https://code.highcharts.com/highcharts.js " + textStatus);
        })
        .fail(function (jqxhr, settings, exception) {
          console.log("load https://code.highcharts.com/highcharts.js " + exception);
        });
    } else {
      console.log("highcharts already loaded");
    }
  });
</script>

Then, in a separate paragraph, load the drilldown module:

%angular
<script type="text/javascript">

  $(function () {
    $.getScript("https://code.highcharts.com/modules/drilldown.js")
      .done(function (script, textStatus) {
        console.log("load https://code.highcharts.com/modules/drilldown.js " + textStatus);
      })
      .fail(function (jqxhr, settings, exception) {
        console.log("load https://code.highcharts.com/modules/drilldown.js " + exception);
      });
  });
</script>