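// Create Spark DataFrame
// Assumes a SparkSession named `spark` is in scope (as it is in spark-shell).
import org.apache.spark.sql.functions.collect_list
import spark.implicits._   // enables Seq(...).toDF and the $"col" syntax

val initial_df = Seq(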
  ("x", 4, 1),
  ("x", 6, 2),
  ("z", 7, 3),
  ("a", 3, 4),
  ("z", 5, 2),
  ("x", 7, 3),
  ("x", 9, 7),
  ("z", 1, 8),
  ("z", 4, 9),
  ("z", 7, 4),
  ("a", 8, 5),
  ("a", 5, 2),
  ("a", 3, 8),
  ("x", 2, 7),
  ("z", 1, 9)
).toDF("col1", "col2", "col3")
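// Generate array columns: one row per distinct col1 value.
// array_col1 holds that group's col2 values, array_col2 holds its col3 values.
// Note: collect_list does not guarantee element order after the shuffle.
// The outer parentheses let the multi-line expression be pasted into spark-shell.
val full_df = (initial_df.groupBy("col1")
  .agg(collect_list($"col2").as("array_col1"),
       collect_list($"col3").as("array_col2")))

// Keep only col1 and array_col2 (the collected col3 values)
val df = full_df.drop("array_col1")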
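To check what the aggregation should produce without starting Spark, here is a plain-Scala sketch with no Spark dependency that mimics `groupBy("col1")` plus `collect_list` on the same rows, using the standard collections `groupBy`. The names `CollectListSketch`, `rows`, `grouped` and `dropped` are hypothetical, for illustration only. One real difference: `Seq.groupBy` keeps each group's values in input order, whereas Spark's `collect_list` makes no ordering guarantee, so the arrays in `full_df` may come back in a different order.

```scala
// Plain-Scala approximation of groupBy("col1") + collect_list (no Spark needed).
// All names here are illustrative, not part of any Spark API.
object CollectListSketch {
  // Same rows as initial_df: (col1, col2, col3)
  val rows: Seq[(String, Int, Int)] = Seq(
    ("x", 4, 1), ("x", 6, 2), ("z", 7, 3), ("a", 3, 4), ("z", 5, 2),
    ("x", 7, 3), ("x", 9, 7), ("z", 1, 8), ("z", 4, 9), ("z", 7, 4),
    ("a", 8, 5), ("a", 5, 2), ("a", 3, 8), ("x", 2, 7), ("z", 1, 9)
  )

  // col1 -> (col2 values, col3 values), i.e. (array_col1, array_col2).
  // Seq.groupBy preserves input order within each group.
  val grouped: Map[String, (Seq[Int], Seq[Int])] =
    rows.groupBy(_._1).map { case (key, group) =>
      (key, (group.map(_._2), group.map(_._3)))
    }

  // Counterpart of full_df.drop("array_col1"): keep only the col3 lists.
  val dropped: Map[String, Seq[Int]] =
    grouped.map { case (key, (_, col3s)) => (key, col3s) }
}
```

For example, `CollectListSketch.grouped("x")` holds the col2 values `4, 6, 7, 9, 2` and the col3 values `1, 2, 3, 7, 7`, matching the five `"x"` rows in input order.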