@lossyrob
Last active November 1, 2020 12:41
Ingest GeoTIFF into HDFS using GeoTrellis spark (0.10 Snapshot)
### INGEST GEOTIFFS INTO HDFS ###
# Path to the geotrellis-spark assembly JAR. The default works when running from the project root folder, after building the JAR with ./sbt "project spark" assembly
JAR=spark/target/scala-2.10/geotrellis-spark-assembly-0.10.0-SNAPSHOT.jar
# Amount of memory for the driver
DRIVER_MEMORY=3G
# Amount of memory per executor. In local mode everything runs inside the driver JVM, so change DRIVER_MEMORY instead.
EXECUTOR_MEMORY=512M
# MASTER
# For local ingest, options are "local" or "local[K]", where K is the number of worker threads, e.g. "local[8]"
# Otherwise specify the spark master, such as spark://207.184.161.138:7077 or mesos://192.168.1.2:5050
MASTER=local[8]
# Directory with the input tiled GeoTIFFs
INPUT=file:/Users/rob/data/nlcd/clipped_tiles
# Catalog directory on HDFS
CATALOG=hdfs://localhost/catalog
# Name of the layer. This will be used in conjunction with the zoom level to reference the layer (see LayerId)
LAYER_NAME=nlcd
# This defines the destination spatial reference system we want to use
# (in this case, Web Mercator)
CRS=EPSG:3857
# true means we want to pyramid the raster down to lower (coarser) zoom levels,
# so if our input rasters are at a resolution that maps to zoom level 11, pyramiding will also save
# off levels 10, 9, ..., 1.
PYRAMID=true
# true will delete the HDFS data for the layer if it already exists.
CLOBBER=true
# We need to remove some bad signature files from the assembled JAR. This is a workaround until
# these files are excluded as part of the build step.
zip -d "$JAR" META-INF/ECLIPSEF.RSA
zip -d "$JAR" META-INF/ECLIPSEF.SF
# Run the spark submit job
spark-submit \
  --class geotrellis.spark.ingest.HadoopIngestCommand \
  --master "$MASTER" \
  --driver-memory "$DRIVER_MEMORY" \
  --executor-memory "$EXECUTOR_MEMORY" \
  "$JAR" \
  --crs "$CRS" \
  --pyramid "$PYRAMID" \
  --clobber "$CLOBBER" \
  --input "$INPUT" \
  --catalog "$CATALOG" \
  --layerName "$LAYER_NAME"
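
One way to check the values before launching a job is a dry-run wrapper that prints the command instead of executing it. The sketch below is an illustration only and is not part of GeoTrellis or Spark: `run_or_print` and `DRY_RUN` are hypothetical names, and only a few of the flags from the command above are repeated to keep it short.

```shell
# Hypothetical dry-run wrapper; not part of GeoTrellis or Spark.

# Settings copied from the script above (abbreviated).
MASTER='local[8]'
INPUT=file:/Users/rob/data/nlcd/clipped_tiles
CATALOG=hdfs://localhost/catalog
LAYER_NAME=nlcd

# With DRY_RUN=true, print the command one argument per line.
# Otherwise, execute the command unchanged.
run_or_print() {
  if [ "${DRY_RUN:-false}" = "true" ]; then
    printf '%s\n' "$@"
  else
    "$@"
  fi
}

# Abbreviated argument list: pass the full list from the script above
# before setting DRY_RUN=false.
DRY_RUN=true

run_or_print spark-submit \
  --master "$MASTER" \
  --input "$INPUT" \
  --catalog "$CATALOG" \
  --layerName "$LAYER_NAME"
```

Because every expansion is quoted, `local[8]` reaches the command as a single argument and is never treated as a glob pattern by the shell.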