@royrusso
Last active June 12, 2016 23:49
Load a data source into a DataFrame using the Spark Data Sources API
import os

# The JDBC driver jar must be on the classpath before the SparkContext starts.
os.environ['SPARK_CLASSPATH'] = "/path/to/driver/postgresql-9.3-1103.jdbc41.jar"

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext("local[*]", '<JOBNAME>')
sqlctx = SQLContext(sc)

# SQLContext.load is the Spark 1.3 API; Spark 1.4 deprecated it in favor of sqlctx.read.
df = sqlctx.load(
    source="jdbc",
    url="jdbc:postgresql://<HOST>/<DATABASE>?user=<USERNAME>&password=<PASSWORD>",
    dbtable="<SCHEMA>.<TABLENAME>")
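
On Spark 1.4 and later, the same read can be expressed through the DataFrameReader interface (sqlctx.read). The following is a minimal sketch, not part of the original gist: the helper names jdbc_options and load_table are hypothetical, and the example assumes the PostgreSQL driver class org.postgresql.Driver is on the classpath. It passes the credentials as separate options rather than embedding them in the URL.

```python
def jdbc_options(host, database, schema, table, user, password):
    """Hypothetical helper: build the options for Spark's JDBC data source."""
    return {
        "url": "jdbc:postgresql://{0}/{1}".format(host, database),
        "dbtable": "{0}.{1}".format(schema, table),
        # Credentials go in as connection properties instead of URL parameters.
        "user": user,
        "password": password,
        # Assumed driver class for the PostgreSQL JDBC jar.
        "driver": "org.postgresql.Driver",
    }


def load_table(sqlctx, options):
    """Hypothetical helper: read the table with DataFrameReader (Spark 1.4+)."""
    return sqlctx.read.format("jdbc").options(**options).load()
```

With an existing SQLContext, a call might look like df = load_table(sqlctx, jdbc_options("<HOST>", "<DATABASE>", "<SCHEMA>", "<TABLENAME>", "<USERNAME>", "<PASSWORD>")). Building the options in a plain function keeps the connection details testable without starting Spark.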