@wesslen
Created October 27, 2016 19:38
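from pyspark import SparkContext
from pyspark.sql import SQLContext

# Reuse the shell's SparkContext if one exists; otherwise create one.
sc = SparkContext.getOrCreate()
sqlContext = SQLContext(sc)

jobDir = "/user/rwesslen/tweets/Pres_Tweets/"
jobName = "presTweet"

# Read every JSON file in jobDir into a single DataFrame.
tweets = sqlContext.read.format('json').load(jobDir + "*.json")
# Collapse to one partition so the output is a single part file; fails if the target directory already exists.
tweets.coalesce(1).toJSON().saveAsTextFile(jobDir + jobName)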