@NeerajBhadani
Last active May 25, 2020 09:40
Create Spark DataFrame
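// spark-shell pre-imports both of these; a standalone application needs them explicitly.
import org.apache.spark.sql.functions.collect_list
import spark.implicits._ // assumes a SparkSession named `spark` is in scope

val initial_df = Seq(
  ("x", 4, 1),
  ("x", 6, 2),
  ("z", 7, 3),
  ("a", 3, 4),
  ("z", 5, 2),
  ("x", 7, 3),
  ("x", 9, 7),
  ("z", 1, 8),
  ("z", 4, 9),
  ("z", 7, 4),
  ("a", 8, 5),
  ("a", 5, 2),
  ("a", 3, 8),
  ("x", 2, 7),
  ("z", 1, 9)
).toDF("col1", "col2", "col3")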
// Generate array columns: collect the col2 and col3 values of each col1 group into arrays
val full_df = (initial_df.groupBy("col1")
  .agg(collect_list($"col2").as("array_col1"),
       collect_list($"col3").as("array_col2")))
// Keep only col1 and array_col2
val df = full_df.drop("array_col1")
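
To check what the grouping produces, a minimal standalone sketch might look like the one below. The object name `InspectArrayColumn`, the helper `build`, the local-mode session, and the three-row sample are all assumptions made for illustration; they are not part of the original snippet. No expected output is shown because `collect_list` does not guarantee element order.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{collect_list, size}

// Hypothetical helper object, for illustration only.
object InspectArrayColumn {

  // Builds a small grouped DataFrame with one array column, mirroring the snippet above.
  def build(spark: SparkSession): DataFrame = {
    import spark.implicits._
    // A three-row subset of the data above, enough to show the shape of the result.
    val sample = Seq(("x", 4, 1), ("x", 6, 2), ("z", 7, 3)).toDF("col1", "col2", "col3")
    sample.groupBy("col1").agg(collect_list($"col3").as("array_col2"))
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session; inside spark-shell, `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("inspect-array-column")
      .getOrCreate()
    import spark.implicits._

    val df = build(spark)
    df.printSchema() // array_col2 is reported as an array of integers
    // Add the array length per group and print without truncating the arrays.
    df.withColumn("n", size($"array_col2")).show(false)

    spark.stop()
  }
}
```

In spark-shell, the same check on the `df` built above reduces to `df.printSchema()` followed by `df.show(false)`.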