Example: add the CSV package:
%dep
z.reset()
z.addRepo("Spark Packages Repo").url("http://dl.bintray.com/spark-packages/maven")
z.load("com.databricks:spark-csv_2.11:1.4.0")
No exception is raised in pyspark.
issue comment
A SQL query will raise java.lang.ClassNotFoundException: com.databricks.spark.csv.CsvRelation$$anonfun$1$$anonfun$2
Solution: call cacheTable after registerTempTable:
df_parquet.registerTempTable("click_parquet")
sqlContext.cacheTable("click_parquet")
Of course, install ipython first: sudo pip-2.7 install ipython
Then start pyspark with IPYTHON=1 pyspark
Example:
IPYTHON=1 pyspark --packages com.databricks:spark-csv_2.11:1.4.0
If using --jars, make sure the jar files exist.
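One way to catch a missing jar before launching pyspark is a small pre-flight check. The sketch below is only an illustration: the helper name build_jars_arg and any paths passed to it are hypothetical, and it assumes the jars are local files given as a comma-separated list to --jars.

```python
import os


def build_jars_arg(jar_paths):
    """Return the comma-separated value for --jars.

    Hypothetical helper: raises IOError if any listed jar file is missing,
    so the problem shows up before pyspark is started.
    """
    missing = [p for p in jar_paths if not os.path.isfile(p)]
    if missing:
        raise IOError("missing jar files: " + ", ".join(missing))
    # --jars takes a comma-separated list of jar paths
    return ",".join(jar_paths)
```

The returned string can then be placed after --jars on the pyspark command line, in the same way --packages is used in the example above.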