@hoto17296
Last active August 13, 2018 10:33
Sending queries to Treasure Data from a PySpark Notebook
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"! \\\n",
"TD_SPARK_JAR_URL='https://s3.amazonaws.com/td-spark/td-spark-assembly-latest.jar'; \\\n",
"TD_SPARK_JAR_PATH=\"${SPARK_HOME}/jars/td-spark.jar\"; \\\n",
"[ -f \"${TD_SPARK_JAR_PATH}\" ] || wget \"${TD_SPARK_JAR_URL}\" -O \"${TD_SPARK_JAR_PATH}\""
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"! \\\n",
"TD_API_KEY='TD_API_KEY'; \\\n",
"SPARK_CONF_PATH=\"${SPARK_HOME}/conf/spark-defaults.conf\"; \\\n",
"grep -q '^spark.td.apikey' \"${SPARK_CONF_PATH}\" 2>/dev/null || echo \"spark.td.apikey=${TD_API_KEY}\" >> \"${SPARK_CONF_PATH}\""
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"from pyspark.sql import SparkSession\n",
"\n",
"# getOrCreate() reuses a running session, so re-running this cell does not fail\n",
"spark = SparkSession.builder.getOrCreate()\n",
"sc = spark.sparkContext"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"+----+---------------+--------------------+--------------------+----+--------------------+----+------+----------+\n",
"|user| host| path| referer|code| agent|size|method| time|\n",
"+----+---------------+--------------------+--------------------+----+--------------------+----+------+----------+\n",
"|null| 224.225.147.72|/category/books?f...| /item/games/2018| 200|Mozilla/4.0 (comp...| 135| GET|1412377189|\n",
"|null|136.204.225.125| /category/health| /category/games| 200|Mozilla/5.0 (comp...| 136| GET|1412377179|\n",
"|null| 32.57.104.68| /category/office| /category/computers| 200|Mozilla/5.0 (comp...| 113| GET|1412377169|\n",
"|null| 144.138.86.216| /category/software| -| 200|Mozilla/4.0 (comp...| 135| GET|1412377158|\n",
"|null| 200.69.132.222| /item/software/1166| -| 200|Mozilla/5.0 (Wind...| 47| GET|1412377148|\n",
"|null| 200.81.68.44| /category/cameras|/category/electro...| 200|Mozilla/5.0 (comp...| 81| GET|1412377137|\n",
"|null| 44.90.70.114| /category/office| -| 200|Mozilla/5.0 (Wind...| 68| GET|1412377127|\n",
"|null| 108.75.69.24|/category/electro...| -| 200|Mozilla/5.0 (comp...| 40| GET|1412377117|\n",
"|null| 216.177.45.198| /category/toys| /item/books/3472| 200|Mozilla/5.0 (iPho...| 62| GET|1412377106|\n",
"|null|184.189.183.181| /category/cameras|/item/giftcards/3836| 200|Mozilla/4.0 (comp...| 133| GET|1412377096|\n",
"+----+---------------+--------------------+--------------------+----+--------------------+----+------+----------+\n",
"\n"
]
}
],
"source": [
"df = spark.read.format(\"com.treasuredata.spark\") \\\n",
" .options(sql=\"SELECT * FROM www_access LIMIT 10\", engine=\"presto\") \\\n",
" .load(\"sample_datasets\")\n",
"df.show()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
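The first two cells use shell escapes (`wget`, `grep`) so that re-running the notebook neither re-downloads the td-spark jar nor appends the API key twice. Where a pure-Python variant is more convenient, those two steps can be sketched as follows. This is a minimal sketch that is not part of the original notebook: the function names `ensure_jar` and `ensure_conf_entry` are hypothetical, it assumes the `$SPARK_HOME/jars` and `$SPARK_HOME/conf` layout the notebook uses, and downloading to a temporary `.part` file before renaming is an addition of this sketch. Its key check is also stricter than the notebook's `grep`: a line counts only if it starts with the key followed by whitespace, `=` or `:`.

```python
import os
import re
import urllib.request

# URL copied from the notebook's first cell.
TD_SPARK_JAR_URL = "https://s3.amazonaws.com/td-spark/td-spark-assembly-latest.jar"


def ensure_jar(spark_home: str, url: str = TD_SPARK_JAR_URL) -> str:
    """Download td-spark into $SPARK_HOME/jars/td-spark.jar unless it already exists."""
    jar_path = os.path.join(spark_home, "jars", "td-spark.jar")
    if not os.path.isfile(jar_path):
        # Download under a temporary name and rename afterwards, so an
        # interrupted transfer never leaves a truncated td-spark.jar behind.
        tmp_path = jar_path + ".part"
        urllib.request.urlretrieve(url, tmp_path)
        os.replace(tmp_path, jar_path)
    return jar_path


def ensure_conf_entry(spark_home: str, key: str, value: str) -> bool:
    """Append 'key=value' to spark-defaults.conf unless the key is already set.

    Returns True if a line was appended, False if the key was already present.
    """
    conf_path = os.path.join(spark_home, "conf", "spark-defaults.conf")
    # A line sets the key if it starts with the key followed by whitespace,
    # '=' or ':' (or ends right after it); commented-out lines never match.
    pattern = re.compile(r"\s*" + re.escape(key) + r"(\s|=|:|$)")
    existing = ""
    if os.path.isfile(conf_path):
        with open(conf_path, encoding="utf-8") as f:
            existing = f.read()
        if any(pattern.match(line) for line in existing.splitlines()):
            return False
    os.makedirs(os.path.dirname(conf_path), exist_ok=True)
    with open(conf_path, "a", encoding="utf-8") as f:
        # Start a fresh line if the file does not already end with one.
        if existing and not existing.endswith("\n"):
            f.write("\n")
        f.write(f"{key}={value}\n")
    return True
```

In this setup both helpers would be called before the SparkContext is created, for example `ensure_conf_entry(os.environ["SPARK_HOME"], "spark.td.apikey", "YOUR_TD_API_KEY")`, since the jars directory and spark-defaults.conf are read when the Spark JVM starts and a running session will not pick up the changes.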