@rezapci
Created August 27, 2019 01:47
{
"nbformat_minor": 1,
"cells": [
{
"source": "This is the second assignment for the Coursera course \"Advanced Machine Learning and Signal Processing\".\n\nExecute all cells one after the other. In the last cell, update the email address (the one you use for Coursera) and paste in a submission token, which you obtain from the programming assignment page on Coursera.\n\nPlease fill in the sections labelled \"###YOUR_CODE_GOES_HERE###\".",
"cell_type": "markdown",
"metadata": {}
},
{
"execution_count": 1,
"cell_type": "code",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": "Waiting for a Spark session to start...\nSpark Initialization Done! ApplicationId = app-20190827005931-0000\nKERNEL_ID = 0ec0e13f-b527-4551-9915-cd9099bbdefb\n--2019-08-27 00:59:34-- https://github.com/IBM/coursera/raw/master/coursera_ml/a2.parquet\nResolving github.com (github.com)... 192.30.253.112\nConnecting to github.com (github.com)|192.30.253.112|:443... connected.\nHTTP request sent, awaiting response... 302 Found\nLocation: https://raw.githubusercontent.com/IBM/coursera/master/coursera_ml/a2.parquet [following]\n--2019-08-27 00:59:34-- https://raw.githubusercontent.com/IBM/coursera/master/coursera_ml/a2.parquet\nResolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.48.133\nConnecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.48.133|:443... connected.\nHTTP request sent, awaiting response... 200 OK\nLength: 59032 (58K) [application/octet-stream]\nSaving to: 'a2.parquet'\n\na2.parquet 100%[===================>] 57.65K --.-KB/s in 0.003s \n\n2019-08-27 00:59:34 (17.8 MB/s) - 'a2.parquet' saved [59032/59032]\n\n"
}
],
"source": "!wget https://github.com/IBM/coursera/raw/master/coursera_ml/a2.parquet"
},
{
"source": "Now it\u2019s time to have a look at the recorded sensor data. You should see data similar to the sample below.\n",
"cell_type": "markdown",
"metadata": {}
},
{
"execution_count": 2,
"cell_type": "code",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": "+-----+-----------+-------------------+-------------------+-------------------+\n|CLASS| SENSORID| X| Y| Z|\n+-----+-----------+-------------------+-------------------+-------------------+\n| 0| 26| 380.66434005495194| -139.3470983812975|-247.93697521077704|\n| 0| 29| 104.74324299209692| -32.27421440203938|-25.105013725863852|\n| 0| 8589934658| 118.11469236129976| 45.916682927433534| -87.97203782706572|\n| 0|34359738398| 246.55394030642543|-0.6122810693132044|-398.18662513951506|\n| 0|17179869241|-190.32584900181487| 234.7849657520335|-206.34483804019288|\n| 0|25769803830| 178.62396382387422| -47.07529438881511| 84.38310769821979|\n| 0|25769803831| 85.03128805189493|-4.3024316644854546|-1.1841857567516714|\n| 0|34359738411| 26.786262674736566| -46.33193951911338| 20.880756008396055|\n| 0| 8589934592|-16.203752396859194| 51.080957032176954| -96.80526656416971|\n| 0|25769803852| 47.2048142440404| -78.2950899652916| 181.99604091494786|\n| 0|34359738369| 15.608872398939273| -79.90322809181754| 69.62150711098005|\n| 0| 19|-4.8281721129789315| -67.38050508399905| 221.24876396496404|\n| 0| 54| -98.40725712852762|-19.989364074314732| -302.695196085276|\n| 0|17179869313| 22.835845394816594| 17.1633660118843| 32.877914832011385|\n| 0|34359738454| 84.20178070080324| -32.81572075916947| -48.63517643958031|\n| 0| 0| 56.54732521345129| -7.980106018032676| 95.05162719436447|\n| 0|17179869201| -57.6008655247749| 5.135393798773895| 236.99158698947267|\n| 0|17179869308| -65.59264738389012| -48.92660057215126| -61.58970715383383|\n| 0|25769803790| 34.82337351291005| 9.483542084393937| 197.6066372962772|\n| 0|25769803825| 39.80573823439121|-0.7955236412785212| -79.66652640650325|\n+-----+-----------+-------------------+-------------------+-------------------+\nonly showing top 20 rows\n\n"
}
],
"source": "df=spark.read.load('a2.parquet')\n\ndf.createOrReplaceTempView(\"df\")\nspark.sql(\"SELECT * from df\").show()"
},
{
"source": "Please create a VectorAssembler which consumes columns X, Y and Z and produces a column \u201cfeatures\u201d\n",
"cell_type": "markdown",
"metadata": {}
},
{
"execution_count": 3,
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": "from pyspark.ml.feature import VectorAssembler\nvectorAssembler=VectorAssembler(inputCols=[\"X\", \"Y\", \"Z\"], outputCol=\"features\")"
},
{
"source": "Please instantiate a classifier from the SparkML package and assign it to the classifier variable. Make sure to either\n1.\tRename the \u201cCLASS\u201d column to \u201clabel\u201d or\n2.\tSpecify the label-column correctly to be \u201cCLASS\u201d\n",
"cell_type": "markdown",
"metadata": {}
},
{
"execution_count": 4,
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": "from pyspark.ml.classification import GBTClassifier\n\nclassifier = GBTClassifier(labelCol='CLASS', featuresCol='features', maxIter=10)"
},
{
"source": "Let\u2019s train and evaluate\u2026\n",
"cell_type": "markdown",
"metadata": {}
},
{
"execution_count": 5,
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": "from pyspark.ml import Pipeline\npipeline = Pipeline(stages=[vectorAssembler, classifier])"
},
{
"execution_count": 6,
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": "model = pipeline.fit(df)"
},
{
"execution_count": 7,
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": "prediction = model.transform(df)"
},
{
"execution_count": 8,
"cell_type": "code",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": "+-----+-----------+-------------------+-------------------+-------------------+--------------------+--------------------+--------------------+----------+\n|CLASS| SENSORID| X| Y| Z| features| rawPrediction| probability|prediction|\n+-----+-----------+-------------------+-------------------+-------------------+--------------------+--------------------+--------------------+----------+\n| 0| 26| 380.66434005495194| -139.3470983812975|-247.93697521077704|[380.664340054951...|[1.32589766213010...|[0.93412155816733...| 0.0|\n| 0| 29| 104.74324299209692| -32.27421440203938|-25.105013725863852|[104.743242992096...|[1.32589766213010...|[0.93412155816733...| 0.0|\n| 0| 8589934658| 118.11469236129976| 45.916682927433534| -87.97203782706572|[118.114692361299...|[1.32680169324328...|[0.93423273625928...| 0.0|\n| 0|34359738398| 246.55394030642543|-0.6122810693132044|-398.18662513951506|[246.553940306425...|[1.32711813715030...|[0.93427161142354...| 0.0|\n| 0|17179869241|-190.32584900181487| 234.7849657520335|-206.34483804019288|[-190.32584900181...|[1.32590267922033...|[0.93412217565278...| 0.0|\n| 0|25769803830| 178.62396382387422| -47.07529438881511| 84.38310769821979|[178.623963823874...|[1.32589766213010...|[0.93412155816733...| 0.0|\n| 0|25769803831| 85.03128805189493|-4.3024316644854546|-1.1841857567516714|[85.0312880518949...|[1.32621410603712...|[0.93416049441777...| 0.0|\n| 0|34359738411| 26.786262674736566| -46.33193951911338| 20.880756008396055|[26.7862626747365...|[1.32589766213010...|[0.93412155816733...| 0.0|\n| 0| 8589934592|-16.203752396859194| 51.080957032176954| -96.80526656416971|[-16.203752396859...|[1.32590267922033...|[0.93412217565278...| 0.0|\n| 0|25769803852| 47.2048142440404| -78.2950899652916| 181.99604091494786|[47.2048142440404...|[1.32589955669978...|[0.93412179134479...| 0.0|\n| 0|34359738369| 15.608872398939273| -79.90322809181754| 69.62150711098005|[15.6088723989392...|[1.32590267922033...|[0.93412217565278...| 0.0|\n| 0| 19|-4.8281721129789315| -67.38050508399905| 221.24876396496404|[-4.8281721129789...|[1.32396579485090...|[0.93388339065422...| 0.0|\n| 0| 54| -98.40725712852762|-19.989364074314732| -302.695196085276|[-98.407257128527...|[1.32590267922033...|[0.93412217565278...| 0.0|\n| 0|17179869313| 22.835845394816594| 17.1633660118843| 32.877914832011385|[22.8358453948165...|[1.32590267922033...|[0.93412217565278...| 0.0|\n| 0|34359738454| 84.20178070080324| -32.81572075916947| -48.63517643958031|[84.2017807008032...|[1.32589766213010...|[0.93412155816733...| 0.0|\n| 0| 0| 56.54732521345129| -7.980106018032676| 95.05162719436447|[56.5473252134512...|[1.32589766213010...|[0.93412155816733...| 0.0|\n| 0|17179869201| -57.6008655247749| 5.135393798773895| 236.99158698947267|[-57.600865524774...|[1.33122144395126...|[0.93477377205806...| 0.0|\n| 0|17179869308| -65.59264738389012| -48.92660057215126| -61.58970715383383|[-65.592647383890...|[1.32590267922033...|[0.93412217565278...| 0.0|\n| 0|25769803790| 34.82337351291005| 9.483542084393937| 197.6066372962772|[34.8233735129100...|[1.51228073545265...|[0.95367147883022...| 0.0|\n| 0|25769803825| 39.80573823439121|-0.7955236412785212| -79.66652640650325|[39.8057382343912...|[1.32711813715030...|[0.93427161142354...| 0.0|\n+-----+-----------+-------------------+-------------------+-------------------+--------------------+--------------------+--------------------+----------+\nonly showing top 20 rows\n\n"
}
],
"source": "prediction.show()"
},
{
"execution_count": 9,
"cell_type": "code",
"metadata": {},
"outputs": [
{
"execution_count": 9,
"metadata": {},
"data": {
"text/plain": "0.9986850756081526"
},
"output_type": "execute_result"
}
],
"source": "from pyspark.ml.evaluation import MulticlassClassificationEvaluator\n\n# Accuracy is measured on the same data the model was trained on\nbinEval = MulticlassClassificationEvaluator().setMetricName(\"accuracy\").setPredictionCol(\"prediction\").setLabelCol(\"CLASS\")\n\nbinEval.evaluate(prediction)"
},
{
"source": "If you are happy with the result (I\u2019m happy with > 0.55), please submit your solution to the grader by executing the following cells. Don\u2019t forget to obtain an assignment submission token (secret) from the Coursera grader\u2019s web page and paste it into the \u201csecret\u201d variable below, together with the email address you use for Coursera. (An accuracy of 0.55 means that you are performing better than random guessing.)\n",
"cell_type": "markdown",
"metadata": {}
},
{
"execution_count": 10,
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": "!rm -Rf a2_m2.json"
},
{
"execution_count": 11,
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": "prediction = prediction.repartition(1)\nprediction.write.json('a2_m2.json')"
},
{
"execution_count": 12,
"cell_type": "code",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": "--2019-08-27 01:02:11-- https://raw.githubusercontent.com/IBM/coursera/master/rklib.py\nResolving raw.githubusercontent.com (raw.githubusercontent.com)... 199.232.8.133\nConnecting to raw.githubusercontent.com (raw.githubusercontent.com)|199.232.8.133|:443... connected.\nHTTP request sent, awaiting response... 200 OK\nLength: 2540 (2.5K) [text/plain]\nSaving to: 'rklib.py'\n\nrklib.py 100%[===================>] 2.48K --.-KB/s in 0s \n\n2019-08-27 01:02:11 (57.8 MB/s) - 'rklib.py' saved [2540/2540]\n\n"
}
],
"source": "!rm -f rklib.py\n!wget https://raw.githubusercontent.com/IBM/coursera/master/rklib.py"
},
{
"execution_count": 13,
"cell_type": "code",
"metadata": {},
"outputs": [],
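"source": "import os\nimport zipfile\n\ndef zipdir(path, ziph):\n    # Add every file under path to the open ZipFile handle ziph\n    for root, dirs, files in os.walk(path):\n        for file in files:\n            ziph.write(os.path.join(root, file))\n\n# Spark writes a2_m2.json as a directory of part files, so zip the whole tree\nzipf = zipfile.ZipFile('a2_m2.json.zip', 'w', zipfile.ZIP_DEFLATED)\nzipdir('a2_m2.json', zipf)\nzipf.close()"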
},
{
"execution_count": 14,
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": "!base64 a2_m2.json.zip > a2_m2.json.zip.base64"
},
{
"execution_count": 16,
"cell_type": "code",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": "Submission successful, please check on the coursera grader page for the status\n-------------------------\n{\"elements\":[{\"itemId\":\"LTL4F\",\"id\":\"f_F-qCtuEei_fRLwaVDk3g~LTL4F~eVfXF8hmEemMQArn5bUERg\",\"courseId\":\"f_F-qCtuEei_fRLwaVDk3g\"}],\"paging\":{},\"linked\":{}}\n-------------------------\n"
}
],
"source": "from rklib import submit\nkey = \"J3sDL2J8EeiaXhILFWw2-g\"\npart = \"G4P6f\"\nemail = \"rezapci@msn.com\"\nsecret = \"CEcetftDxBQRKA2t\"\n\nwith open('a2_m2.json.zip.base64', 'r') as myfile:\n    data = myfile.read()\nsubmit(email, secret, key, part, [part], data)"
},
{
"execution_count": null,
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": ""
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.6 with Spark",
"name": "python36",
"language": "python3"
},
"language_info": {
"mimetype": "text/x-python",
"nbconvert_exporter": "python",
"version": "3.6.8",
"name": "python",
"file_extension": ".py",
"pygments_lexer": "ipython3",
"codemirror_mode": {
"version": 3,
"name": "ipython"
}
}
},
"nbformat": 4
}