@yzhong52
Last active June 18, 2022 13:53
Create Spark DataFrame From List[Any]
// Spark 2.1
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}

val spark = SparkSession.builder().master("local").getOrCreate()
// Given a list mixing a string and a double
val values = List("20030100013280", 1.0)
// Create a `Row` from the `Seq` of values
val row = Row.fromSeq(values)
// Create an `RDD[Row]` from the single `Row`
val rdd = spark.sparkContext.makeRDD(List(row))
// Define the schema: one `StructField` per column
val fields = List(
  StructField("First Column", StringType, nullable = false),
  StructField("Second Column", DoubleType, nullable = false)
)
// Create the `DataFrame` from the RDD and the schema
val dataFrame = spark.createDataFrame(rdd, StructType(fields))
// Done! Yay!
dataFrame.show(1)
+--------------+-------------+
| First Column|Second Column|
+--------------+-------------+
|20030100013280| 1.0|
+--------------+-------------+
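When the column types are known at compile time, the same DataFrame can also be built without constructing `Row` and `StructType` by hand, using a `Seq` of tuples and `toDF`. A minimal sketch, assuming the same local `SparkSession` as above (the `Row.fromSeq` approach in the gist is still the one to reach for when the values arrive as a `List[Any]` whose types are only known at runtime):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local").getOrCreate()
import spark.implicits._

// A Seq of tuples carries the column types, so Spark infers the schema
val dataFrame = Seq(("20030100013280", 1.0))
  .toDF("First Column", "Second Column")

dataFrame.show(1)
```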
@hhoughgg

hhoughgg commented Jul 7, 2017

Had to import:

import org.apache.spark.sql._
import org.apache.spark.sql.types._

Thanks for this. Worked perfectly.

@sisgandarli

Thanks for the code snippet. Helped a lot.

@ChinmaySKulkarni

@yzhong52 If I wanted to save millions of rows, what is the best way to extend this?
