@jeff303
Last active September 18, 2020 18:02
A simple spark-shell session showing how to read JSON into a DataFrame and how to define a schema in code, as JSON, or as DDL
// some useful imports
import org.apache.spark.sql._
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._
// start with some very simple JSON
val simpleJsonStr = """{"foo": 42, "bar": "baz"}"""
// just read; schema will be inferred
val simpleDf = spark.read.json(Seq(simpleJsonStr).toDS())
// print the DataFrame contents
simpleDf.show()
// print the schema
simpleDf.printSchema()
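// The inferred schema can also be inspected programmatically (a small sketch, not part
// of the original session; the name inferredFooType is illustrative). JSON inference
// reads whole numbers as LongType, which is why an explicit schema is defined next.
val inferredFooType = simpleDf.schema("foo").dataType
println(inferredFooType == LongType)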
// define a simple schema
val simpleSchema = StructType(
  StructField("foo", IntegerType, true) ::
  StructField("bar", StringType, true) ::
  Nil)
// read the same JSON using this schema, under a new name so both DataFrames stay available
val typedDf = spark.read.schema(simpleSchema).json(Seq(simpleJsonStr).toDS())
// print the schema again: foo is now integer rather than the inferred long
typedDf.printSchema()
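// A possible alternative (a sketch, not part of the original session): when the JSON
// already sits in a string column, from_json can apply the same schema to that column.
// The names jsonColDf and parsedDf are illustrative.
val jsonColDf = Seq(simpleJsonStr).toDF("value")
val parsedDf = jsonColDf
  .select(from_json(col("value"), simpleSchema).as("parsed"))
  .select("parsed.*")
parsedDf.printSchema()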
// that same schema can be represented as JSON
val schemaJson = """{
  "type" : "struct",
  "fields" : [ {
    "name" : "foo",
    "type" : "integer",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "bar",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  } ]
}"""
val simpleSchemaFromJson = DataType.fromJson(schemaJson).asInstanceOf[StructType]
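// Going the other way (a sketch; the name schemaAsJson is illustrative): a StructType
// can emit its own JSON via .json (compact) or .prettyJson (indented), and that output
// should parse back to an equal schema.
val schemaAsJson = simpleSchema.prettyJson
println(schemaAsJson)
println(DataType.fromJson(schemaAsJson) == simpleSchema)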
// or built from a Spark DDL string; StructType.fromDDL returns a StructType directly, so no cast is needed
val simpleSchemaFromDDL = StructType.fromDDL("foo INTEGER, bar STRING")
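// A quick check (a sketch): the DDL-derived schema should equal the hand-built one,
// since DDL columns are nullable unless declared otherwise. toDDL goes the other way;
// it assumes Spark 2.4 or later, and its exact formatting may differ between versions.
println(simpleSchemaFromDDL == simpleSchema)
println(simpleSchema.toDDL)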
// create an empty DataFrame (no rows) that uses this schema
val emptyDf = spark.createDataFrame(sc.emptyRDD[Row], simpleSchema)
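// The empty DataFrame still carries the full schema, which can be confirmed
// directly (a sketch, not part of the original session).
emptyDf.printSchema()
println(emptyDf.count())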