Skip to content

Instantly share code, notes, and snippets.

@neilghosh
Last active August 19, 2018 15:14
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save neilghosh/ccfa348c536444a9a89100b2b8914ab8 to your computer and use it in GitHub Desktop.
Save neilghosh/ccfa348c536444a9a89100b2b8914ab8 to your computer and use it in GitHub Desktop.
Spark PostgreSQL

Install Spark and run master and slaves (workers) in standalone mode.

brew install apache-spark
/usr/local/Cellar/apache-spark/2.3.1/bin/spark-class org.apache.spark.deploy.master.Master
/usr/local/Cellar/apache-spark/2.3.1/bin/spark-class org.apache.spark.deploy.worker.Worker  spark://<MASTER_IP>:7077 -c 1 -m 512M

In PostgreSQL

CREATE TABLE items(
  id VARCHAR (100),
  description VARCHAR (100),
);

Load CSV from file COPY items FROM '/Users//items.csv' DELIMITER ',' CSV HEADER;

Launch Spark Shell

pyspark --conf spark.executor.extraClassPath=/Users/<pathTo>/postgresql-42.2.4.jar  --driver-class-path /Users/<pathTo>/postgresql-42.2.4.jar  --master spark://192.168.1.199:7077 --executor-memory 512m

Connect to existing PostgreSQL

df = spark.read \
    .format("jdbc") \
    .option("driver", "org.postgresql.Driver") \
    .option("url", "jdbc:postgresql:retailme") \
    .option("dbtable", "items") \
    .option("user", "<postgres_user>") \
    .option("password", "") \
    .load()
    
df.count() #This fires the query and displayed the count once can check progress in the Spark UI    

# Join, shows number of records where ids exist sin df1 but not df2

left_join = df1.join(df2, df1.id == df2.id,how='left') # Could also use 'left_outer'
left_join.filter(col('id').isNull()).count()


# Write data to tables
mode = "overwrite"
url = "jdbc:postgresql:retailme"
properties = {"user": "<postgreUser>","password": "","driver": "org.postgresql.Driver"}
df.write.jdbc(url=url, table="items", mode=mode, properties=properties)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment