@prakashrd
Created March 7, 2019 11:57
spark-joining-datasets
scala> val left = Seq((0), (1)).toDF("id")
left: org.apache.spark.sql.DataFrame = [id: int]
scala> val right = Seq((0, "zero"), (2, "two"), (3, "three"), (0, "four")).toDF("id", "right")
right: org.apache.spark.sql.DataFrame = [id: int, right: string]
scala> left.join(right, "id").show
+---+-----+
| id|right|
+---+-----+
|  0| zero|
|  0| four|
+---+-----+
scala>
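
The output above has two rows for id 0 and none for id 1 because `join(right, "id")` is an inner equi-join: each left row is paired with every right row that has the same id, and rows without a match are dropped. A minimal sketch of that behavior using plain Scala collections (no Spark required) is below. The names `InnerJoinSketch` and `innerJoin` are hypothetical and not part of the original gist, and the sketch keeps input order, which Spark does not guarantee.

```scala
// Hypothetical illustration of inner-join semantics with plain Scala collections.
// This is not Spark code; it only mirrors the result shown in the transcript.
object InnerJoinSketch {
  // Pair each left id with every right row that carries the same id.
  // Left ids without a match (here: 1) produce no output rows.
  def innerJoin(left: Seq[Int], right: Seq[(Int, String)]): Seq[(Int, String)] =
    for {
      l <- left
      (id, label) <- right
      if l == id
    } yield (id, label)

  def main(args: Array[String]): Unit = {
    val left = Seq(0, 1)
    val right = Seq((0, "zero"), (2, "two"), (3, "three"), (0, "four"))
    // Prints (0,zero) and (0,four), one per line.
    innerJoin(left, right).foreach(println)
  }
}
```

Id 0 appears twice on the right side, so it yields two rows. Ids 2 and 3 exist only on the right and id 1 only on the left, so none of them appear in the result.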