Skip to content

Instantly share code, notes, and snippets.

@dyno
Created December 1, 2022 00:12
Show Gist options
  • Save dyno/da9a04207cf9c3215b7204207b0a41b1 to your computer and use it in GitHub Desktop.
Save dyno/da9a04207cf9c3215b7204207b0a41b1 to your computer and use it in GitHub Desktop.
Spark Json
import org.apache.spark.sql.types.{IntegerType, MapType, StringType, StructField, StructType}
import org.apache.spark.sql.{Column, DataFrame, Encoders}
val DataSchema: StructType = StructType(
List(
StructField("A", StringType, nullable = true), // required
StructField("B", StringType, nullable = true), // required
StructField("C", StringType, nullable = true) // required
)
)
val eventStr = """
{"A": 1, "C": "world"}
{"A": 2, "B": "Hello"}
"""
val events = eventStr.split("\n").filter(_.nonEmpty)
val df: DataFrame = spark.createDataset(events)(Encoders.STRING).toDF("data")
df.withColumn("json", from_json(col("data"), DataSchema)).select("json.*")
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment