import nltk
nltk.download()
## nltk.download() with no arguments opens NLTK's interactive downloader;
## run it from a Python prompt and download the `punkt` data
## Anaconda is recommended, since it bundles NumPy, NLTK, etc.
## http://continuum.io/downloads
## this also requires TextBlob and its PerceptronTagger
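The interactive downloader suits a prompt, but a script can fetch only the data it needs. Below is a minimal sketch. It assumes the resource names `punkt` and `averaged_perceptron_tagger`, which is the data NLTK's perceptron tagger loads in older releases. Newer NLTK versions may expect other names, such as `punkt_tab`. `ensure_nltk_data` is a hypothetical helper and is not part of NLTK. It only calls the real `nltk.data.find` and `nltk.download` functions.

```python
# Hypothetical helper (not part of NLTK): download only the missing data packages.
# Package names and data paths below are assumptions for an older NLTK release.
RESOURCES = {
    "punkt": "tokenizers/punkt",
    "averaged_perceptron_tagger": "taggers/averaged_perceptron_tagger",
}


def ensure_nltk_data(resources=RESOURCES, find=None, download=None):
    """Download each package whose data path cannot be found; return the names fetched."""
    if find is None or download is None:
        import nltk  # imported lazily so the helper can be exercised with stubs
        find = find or nltk.data.find
        download = download or nltk.download
    fetched = []
    for package, path in resources.items():
        try:
            find(path)  # nltk.data.find raises LookupError when the resource is missing
        except LookupError:
            download(package, quiet=True)  # non-interactive download of one package
            fetched.append(package)
    return fetched
```

Calling `ensure_nltk_data()` with no arguments imports NLTK and downloads whatever is missing. The `find` and `download` parameters exist only so the helper can be tested without a network connection.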
# concatenate three part files (part-00001 through part-00003) into "minitweets"
cat rawtweets/part-0000[1-3] > minitweets
# set the log4j level to WARN to reduce console noise during the demo
mv conf/log4j.properties.template conf/log4j.properties
vim conf/log4j.properties  # change the root log level to WARN
# launch the Spark shell REPL
./bin/spark-shell
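The vim edit is manual, but it can also be scripted. Below is a minimal sketch. It assumes the Spark 1.x/2.x template, where the root logger line reads `log4j.rootCategory=INFO, console`. Spark 3.3 and later ship `log4j2.properties.template` with a different syntax, and this helper does not handle that format. `set_root_log_level` is a hypothetical helper written for illustration.

```python
import re
from pathlib import Path

# Captures the "log4j.rootCategory=" prefix; the level word after it gets replaced.
# Assumes the log4j 1.x syntax used by older Spark templates.
ROOT_LINE = re.compile(r"^(log4j\.rootCategory\s*=\s*)\w+", re.MULTILINE)


def set_root_log_level(path, level="WARN"):
    """Replace the root logger level in a log4j 1.x properties file, keeping its appenders."""
    p = Path(path)
    # A function replacement avoids backslash/group escaping issues in `level`.
    new_text, count = ROOT_LINE.subn(lambda m: m.group(1) + level, p.read_text())
    if count == 0:
        raise ValueError(f"no log4j.rootCategory line found in {path}")
    p.write_text(new_text)
    return count
```

For example, `set_root_log_level("conf/log4j.properties")` would replace the vim step, provided it runs after the `mv`.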