@rustyrazorblade
Created December 15, 2015 16:29
ETL
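from pyspark.sql import SQLContext

# Assumes `sc` is an existing SparkContext (e.g. from the pyspark shell) and `hdfs_path` is the target output path.
sql = SQLContext(sc)
# Load the Cassandra table lens.raw_data as a DataFrame via the Cassandra data source.
df = sql.read.format("org.apache.spark.sql.cassandra").load(keyspace="lens", table="raw_data")
# Write the DataFrame out as Parquet files.
df.write.format("parquet").save(hdfs_path)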