@dacr
Last active May 7, 2023 15:45
Feed elasticsearch with almost 20 years of chicago crimes (using spark). / published by https://github.com/dacr/code-examples-manager #385ba213-e769-499c-92ae-3f63cfb72d15/ebc1b9c37aaafb3304faed44448e29616a08d1e3
// summary : Feed elasticsearch with almost 20 years of chicago crimes (using spark).
// keywords : scala, elasticsearch, feed, chicago, crimes, bigdata, spark
// publish : gist
// authors : David Crosson
// license : Apache NON-AI License Version 2.0 (https://raw.githubusercontent.com/non-ai-licenses/non-ai-licenses/main/NON-AI-APACHE2)
// id : 385ba213-e769-499c-92ae-3f63cfb72d15
// created-on : 2019-11-02T21:23:37Z
// managed-by : https://github.com/dacr/code-examples-manager
// execution : scala 2.12 ammonite script (http://ammonite.io/) - run as follows: 'amm scriptname.sc'
// spark 2.4.4 is only published for scala 2.11 and 2.12; scala 2.13 support arrived later (spark 3.2)
import $ivy.`org.apache.spark::spark-sql:2.4.4`
//import $ivy.`org.elasticsearch::elasticsearch-spark-20:7.3.2` // not yet available for scala 2.12 !!!
import org.apache.spark.sql._
/*
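Fill elasticsearch with ~19 years of chicago crimes data. First download the dataset as crimes.csv
(the URL is quoted because '?' is a glob character in most shells):
`curl -L "https://data.cityofchicago.org/api/views/ijzp-q8t2/rows.csv?accessType=DOWNLOAD" -o crimes.csv`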
*/
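// Local Spark session using all available cores
val spark =
  SparkSession.builder()
    .master("local[*]")
    .getOrCreate()
// Zone-less timestamps (such as those in crimes.csv) are interpreted as Chicago local time
spark.conf.set("spark.sql.session.timeZone", "America/Chicago")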
def sc = spark.sparkContext
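// Sketch (plain java.time, no Spark involved): the session time zone set above is
// what Spark uses by default when parsing timestamp strings that carry no offset,
// such as the ones in crimes.csv. The helper below performs the same interpretation
// by hand; its name and the sample dates are illustrative only.
import java.time.{Instant, LocalDateTime, ZoneId}

val chicagoZone: ZoneId = ZoneId.of("America/Chicago")

// Read a zone-less wall-clock time as Chicago local time and return the UTC instant.
// java.time applies the DST rules: CDT (UTC-5) in summer, CST (UTC-6) in winter.
def chicagoWallClockToInstant(local: LocalDateTime): Instant =
  local.atZone(chicagoZone).toInstant

// 2015-09-05 13:30 in Chicago falls under CDT, i.e. 2015-09-05T18:30:00Z
val sampleInstant: Instant = chicagoWallClockToInstant(LocalDateTime.of(2015, 9, 5, 13, 30))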
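// Load the raw CSV; inferSchema costs one extra pass over the file to guess column types
val crimesCSV =
  spark.read.format("csv")
    .option("sep", ",")
    .option("inferSchema", "true")
    .option("header", "true")
    .option("timestampFormat", "MM/dd/yyyy hh:mm:ss a") // 12-hour clock with AM/PM marker
    .load("crimes.csv")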
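// Sketch (plain java.time, no Spark involved): how a pattern such as
// "MM/dd/yyyy hh:mm:ss a" (month, day, year, 12-hour clock, AM/PM marker) reads a
// date string. Spark 2.4 parses CSV timestamps with a SimpleDateFormat-compatible
// formatter rather than java.time, so edge cases may differ; the helper name and
// the sample strings are illustrative, not taken from the dataset.
import java.time.format.DateTimeFormatter
import java.util.Locale

// Locale.US so that the "a" letter matches the English "AM"/"PM" markers
val crimeDateFormat: DateTimeFormatter =
  DateTimeFormatter.ofPattern("MM/dd/yyyy hh:mm:ss a", Locale.US)

def parseCrimeDate(s: String): java.time.LocalDateTime =
  java.time.LocalDateTime.parse(s, crimeDateFormat)

// With "hh" + "a", "12:00:00 AM" is midnight and "01:30:00 PM" is 13:30
val sampleMidnight: java.time.LocalDateTime = parseCrimeDate("01/01/2001 12:00:00 AM")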
println(crimesCSV.count())
crimesCSV.printSchema()
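// Sketch of one possible workaround, since the elasticsearch-spark connector could not
// be imported here: build a body for Elasticsearch's _bulk endpoint by hand. The format
// is newline-delimited JSON: an action line, then the document line, each terminated by
// a newline. The index name, document ids and field names are hypothetical, every value
// is sent as a JSON string for simplicity, and the HTTP call itself
// (POST /_bulk with Content-Type: application/x-ndjson) is not shown.
// Spark's Dataset.toJSON could also produce the document lines from crimesCSV.

// Minimal JSON string encoder: escapes quotes, backslashes and control characters.
def esJsonString(s: String): String = {
  val sb = new StringBuilder("\"")
  s.foreach {
    case '"'          => sb.append("\\\"")
    case '\\'         => sb.append("\\\\")
    case '\n'         => sb.append("\\n")
    case '\r'         => sb.append("\\r")
    case '\t'         => sb.append("\\t")
    case c if c < ' ' => sb.append("\\").append("u").append("%04x".format(c.toInt))
    case c            => sb.append(c)
  }
  sb.append("\"").toString
}

// One bulk "index" action for a single document: action line + source line.
def esBulkIndexLines(index: String, id: String, fields: Seq[(String, String)]): String = {
  val action = s"""{"index":{"_index":${esJsonString(index)},"_id":${esJsonString(id)}}}"""
  val source = fields
    .map { case (k, v) => esJsonString(k) + ":" + esJsonString(v) }
    .mkString("{", ",", "}")
  action + "\n" + source + "\n"
}

// Concatenate the actions for several (id, fields) documents into one bulk body.
def esBulkPayload(index: String, docs: Seq[(String, Seq[(String, String)])]): String =
  docs.map { case (id, fields) => esBulkIndexLines(index, id, fields) }.mkString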