Skip to content

Instantly share code, notes, and snippets.

@tilakpatidar
Created March 25, 2018 13:26
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save tilakpatidar/2aa2d1c5f40303db12a358969fd6895a to your computer and use it in GitHub Desktop.
Save tilakpatidar/2aa2d1c5f40303db12a358969fd6895a to your computer and use it in GitHub Desktop.
Finding unique records between ORC file and MySQL rows using Apache Spark
import spark.implicits._
import org.apache.spark.sql.SaveMode
val products = spark.sqlContext.read.format("jdbc").option("driver", "com.mysql.jdbc.Driver").option("dbtable", "products").option("user", "gobblin").option("password", "gobblin").option("url", "jdbc:mysql://localhost/mopar_demo").load()
val newProducts = spark.sqlContext.read.format("orc").load("/Users/tilak/gobblin/mopar-demo/output/org/apache/gobblin/copy/user/tilak/pricing.products_1521799535.csv/20180325023900_append/part.task_PullCsvFromS3_1521945534992_0_0.orc")
val newnewProducts = newProducts.except(products)
val dfWriter = newnewProducts.write.mode(SaveMode.Append)
val connectionProperties = new java.util.Properties()
connectionProperties.setProperty("Driver", "com.mysql.jdbc.Driver")
connectionProperties.setProperty("user", "root")
connectionProperties.setProperty("password", "1")
dfWriter.jdbc("jdbc:mysql://localhost/mopar_demo", "products", connectionProperties)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment