@lbustelo
Created March 8, 2016 22:29
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Marking com.databricks:spark-csv_2.10:1.4.0 for download\n",
"Preparing to fetch from:\n",
"-> file:/tmp/.ivy2/\n",
"-> https://repo1.maven.org/maven2\n",
"-> New file at /tmp/.ivy2/https/repo1.maven.org/maven2/org/apache/commons/commons-csv/1.1/commons-csv-1.1.jar\n",
"-> New file at /tmp/.ivy2/https/repo1.maven.org/maven2/com/databricks/spark-csv_2.10/1.4.0/spark-csv_2.10-1.4.0.jar\n",
"-> New file at /tmp/.ivy2/https/repo1.maven.org/maven2/com/univocity/univocity-parsers/1.5.1/univocity-parsers-1.5.1.jar\n"
]
}
],
"source": [
"%AddDeps com.databricks spark-csv_2.10 1.4.0 --transitive"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import org.apache.spark.sql.SQLContext\n",
"\n",
"// Create the SQLContext first, then import its implicits once.\n",
"val sqlContext = new SQLContext(sc)\n",
"import sqlContext.implicits._\n",
"\n",
"// Load cars.csv via spark-csv: first row is the header, column types are inferred.\n",
"val df = sqlContext.read.format(\"com.databricks.spark.csv\").option(\"header\", \"true\").option(\"inferSchema\", \"true\").load(\"cars.csv\")"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"+----+-----+-----+--------------------+-----+\n",
"|year| make|model|             comment|blank|\n",
"+----+-----+-----+--------------------+-----+\n",
"|2012|Tesla|    S|          No comment|     |\n",
"|1997| Ford| E350|Go get one now th...|     |\n",
"|2015|Chevy| Volt|                null| null|\n",
"+----+-----+-----+--------------------+-----+\n",
"\n"
]
}
],
"source": [
"df.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Toree",
"language": "",
"name": "toree"
},
"language_info": {
"name": "scala"
}
},
"nbformat": 4,
"nbformat_minor": 0
}