@vitillo
Last active August 29, 2015 14:20
Telemetry v4 validation
{"nbformat_minor": 0, "cells": [{"source": "### Telemetry v4 validation", "cell_type": "markdown", "metadata": {}}, {"execution_count": 1, "cell_type": "code", "source": "import ujson as json\nimport matplotlib.pyplot as plt\nimport pandas as pd\nimport numpy as np\nimport plotly.plotly as py\n\nfrom moztelemetry import get_pings, get_pings_properties, get_one_ping_per_client\nfrom __future__ import division\n\n%pylab inline", "outputs": [{"output_type": "stream", "name": "stdout", "text": "Populating the interactive namespace from numpy and matplotlib\n"}], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 7, "cell_type": "code", "source": "def get_subset(pings):\n return get_pings_properties(pings, [\"clientId\", \n \"payload/info/sessionId\", \n \"payload/info/subsessionCounter\",\n \"payload/info/reason\"])\n\ndef client_session_id(ping):\n return \"{}:{}\".format(ping[\"clientId\"], ping[\"payload/info/sessionId\"])\n\ndef get_shutdown_ids(pings):\n return pings.filter(lambda p: p.get(\"payload/info/reason\", \"\") == \"shutdown\")\\\n .map(lambda p: client_session_id(p))\n \ndef get_start_ids(pings):\n return pings.filter(lambda p: p.get(\"payload/info/subsessionCounter\", \"\") == 1)\\\n .map(lambda p: client_session_id(p))\n \ndef filter_by_ids(pings, sessionids):\n return pings.filter(lambda p: client_session_id(p) in sessionids)\n\ndef recent_buildid_filter(ping):\n return ping[\"appBuildId\"] >= \"20150423000000\"", "outputs": [], "metadata": {"collapsed": true, "trusted": true}}, {"source": "Get all main pings for a single day:", "cell_type": "markdown", "metadata": {}}, {"execution_count": 8, "cell_type": "code", "source": "date_pings = get_pings(sc, app=\"Firefox\", channel=\"nightly\", submission_date=\"20150426\", schema=\"v4\")", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 9, "cell_type": "code", "source": "main_pings = date_pings.filter(recent_buildid_filter)", "outputs": [], 
"metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 10, "cell_type": "code", "source": "main_subset = get_subset(main_pings)\nmain_subset = main_subset.filter(lambda p: \"clientId\" in p)", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 11, "cell_type": "code", "source": "main_subset.cache()\nmain_subset.count()", "outputs": [{"execution_count": 11, "output_type": "execute_result", "data": {"text/plain": "118003"}, "metadata": {}}], "metadata": {"scrolled": true, "collapsed": false, "trusted": true}}, {"source": "Get only those fragments belonging to sessions that have both a starting fragment and an ending fragment in the selected time range. That should allow us to identify the sessions for which we are *likely* to have all fragments. I am not sure yet how likely that is, though...", "cell_type": "markdown", "metadata": {}}, {"execution_count": 12, "cell_type": "code", "source": "shutdown_ids = set(get_shutdown_ids(main_subset).collect())", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 13, "cell_type": "code", "source": "start_ids = set(get_start_ids(main_subset).collect())", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 14, "cell_type": "code", "source": "complete_ids = shutdown_ids.intersection(start_ids)", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 15, "cell_type": "code", "source": "main_complete_session_fragments = filter_by_ids(main_subset, complete_ids)", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"source": "Let's group fragments by their session id and check for duplicated or missing subsession counters:", "cell_type": "markdown", "metadata": {}}, {"execution_count": 16, "cell_type": "code", "source": "frame = pd.DataFrame(main_complete_session_fragments.collect())", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, 
{"execution_count": 17, "cell_type": "code", "source": "def process_subsessions(sub):\n ss_count = sorted(list(sub[\"payload/info/subsessionCounter\"]))\n valid = len(ss_count) == ss_count[-1]\n return pd.Series({\"subsession_counter\": ss_count, \"n_subsessions\": len(ss_count), \"valid\": valid})\n\ngrouped_frame = frame.groupby([\"clientId\", \"payload/info/sessionId\"]).apply(lambda x: process_subsessions(x))", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"source": "If we consider only sessions that have more than one fragment, what's the proportion of sessions with an incorrect ordering?", "cell_type": "markdown", "metadata": {}}, {"execution_count": 18, "cell_type": "code", "source": "correct = grouped_frame[np.logical_and(grouped_frame[\"valid\"] == True, grouped_frame[\"n_subsessions\"] > 1)]\ninvalid = grouped_frame[grouped_frame[\"valid\"] == False]\nprint \"{:.2f}%\".format(100*len(invalid)/(len(correct) + len(invalid)))", "outputs": [{"output_type": "stream", "name": "stdout", "text": "7.95%\n"}], "metadata": {"collapsed": false, "trusted": true}}, {"source": "Some examples of sessions with incorrect fragment numbering:", "cell_type": "markdown", "metadata": {}}, {"execution_count": 20, "cell_type": "code", "source": "grouped_frame[grouped_frame[\"valid\"] == False].ix[:10]", "outputs": [{"execution_count": 20, "output_type": "execute_result", "data": {"text/plain": " n_subsessions \\\nclientId payload/info/sessionId \n0306e635-8aba-4c17-bf36-0ac36fa190e9 e20f6b52-42ba-47bd-9630-5e9bb1c26642 2 \n045e2666-0a8a-4d76-9ac8-08a55dd256b8 6ee6fab4-7fdd-4430-b434-bc8c3922c691 2 \n05568c9a-9117-468e-8288-991b7bf488d7 7248c23d-7dad-4296-94ba-2730a038f778 2 \n05a02daa-52cf-455e-9469-440be2670f52 3ccfd5e3-989a-4017-875b-cb27da099f4a 2 \n08ad8686-e288-4012-a28e-78d3d5fc6e2a 86334fc8-bcc3-4c38-9786-c2ca99391daf 2 \n08ce21a6-bca8-4542-92d0-148b39e98e79 34e3eceb-ef1f-48cb-a406-da0988922b31 2 \n09aad15a-af2b-4707-b2f5-c0d81b758510 
9e599a6e-f0b3-4892-a7e7-955da54d4a0b 5 \n0a57bc09-9299-4295-8b1e-fb7dc7f2c7e7 058f04de-6043-40d6-a211-e09b88746a4e 3 \n0cc629aa-b65d-403f-9669-7ce91655e98d 490de40f-75fc-40ab-bf22-27fdc798b843 3 \n 677ddb63-221b-4a78-8a75-8ed83af3d9d5 3 \n\n subsession_counter \\\nclientId payload/info/sessionId \n0306e635-8aba-4c17-bf36-0ac36fa190e9 e20f6b52-42ba-47bd-9630-5e9bb1c26642 [1, 1] \n045e2666-0a8a-4d76-9ac8-08a55dd256b8 6ee6fab4-7fdd-4430-b434-bc8c3922c691 [1, 1] \n05568c9a-9117-468e-8288-991b7bf488d7 7248c23d-7dad-4296-94ba-2730a038f778 [1, 1] \n05a02daa-52cf-455e-9469-440be2670f52 3ccfd5e3-989a-4017-875b-cb27da099f4a [1, 1] \n08ad8686-e288-4012-a28e-78d3d5fc6e2a 86334fc8-bcc3-4c38-9786-c2ca99391daf [1, 4] \n08ce21a6-bca8-4542-92d0-148b39e98e79 34e3eceb-ef1f-48cb-a406-da0988922b31 [1, 1] \n09aad15a-af2b-4707-b2f5-c0d81b758510 9e599a6e-f0b3-4892-a7e7-955da54d4a0b [1, 2, 4, 5, 6] \n0a57bc09-9299-4295-8b1e-fb7dc7f2c7e7 058f04de-6043-40d6-a211-e09b88746a4e [1, 2, 7] \n0cc629aa-b65d-403f-9669-7ce91655e98d 490de40f-75fc-40ab-bf22-27fdc798b843 [1, 2, 2] \n 677ddb63-221b-4a78-8a75-8ed83af3d9d5 [1, 1, 2] \n\n valid \nclientId payload/info/sessionId \n0306e635-8aba-4c17-bf36-0ac36fa190e9 e20f6b52-42ba-47bd-9630-5e9bb1c26642 False \n045e2666-0a8a-4d76-9ac8-08a55dd256b8 6ee6fab4-7fdd-4430-b434-bc8c3922c691 False \n05568c9a-9117-468e-8288-991b7bf488d7 7248c23d-7dad-4296-94ba-2730a038f778 False \n05a02daa-52cf-455e-9469-440be2670f52 3ccfd5e3-989a-4017-875b-cb27da099f4a False \n08ad8686-e288-4012-a28e-78d3d5fc6e2a 86334fc8-bcc3-4c38-9786-c2ca99391daf False \n08ce21a6-bca8-4542-92d0-148b39e98e79 34e3eceb-ef1f-48cb-a406-da0988922b31 False \n09aad15a-af2b-4707-b2f5-c0d81b758510 9e599a6e-f0b3-4892-a7e7-955da54d4a0b False \n0a57bc09-9299-4295-8b1e-fb7dc7f2c7e7 058f04de-6043-40d6-a211-e09b88746a4e False \n0cc629aa-b65d-403f-9669-7ce91655e98d 490de40f-75fc-40ab-bf22-27fdc798b843 False \n 677ddb63-221b-4a78-8a75-8ed83af3d9d5 False ", "text/html": "<div 
style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th></th>\n <th>n_subsessions</th>\n <th>subsession_counter</th>\n <th>valid</th>\n </tr>\n <tr>\n <th>clientId</th>\n <th>payload/info/sessionId</th>\n <th></th>\n <th></th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0306e635-8aba-4c17-bf36-0ac36fa190e9</th>\n <th>e20f6b52-42ba-47bd-9630-5e9bb1c26642</th>\n <td> 2</td>\n <td> [1, 1]</td>\n <td> False</td>\n </tr>\n <tr>\n <th>045e2666-0a8a-4d76-9ac8-08a55dd256b8</th>\n <th>6ee6fab4-7fdd-4430-b434-bc8c3922c691</th>\n <td> 2</td>\n <td> [1, 1]</td>\n <td> False</td>\n </tr>\n <tr>\n <th>05568c9a-9117-468e-8288-991b7bf488d7</th>\n <th>7248c23d-7dad-4296-94ba-2730a038f778</th>\n <td> 2</td>\n <td> [1, 1]</td>\n <td> False</td>\n </tr>\n <tr>\n <th>05a02daa-52cf-455e-9469-440be2670f52</th>\n <th>3ccfd5e3-989a-4017-875b-cb27da099f4a</th>\n <td> 2</td>\n <td> [1, 1]</td>\n <td> False</td>\n </tr>\n <tr>\n <th>08ad8686-e288-4012-a28e-78d3d5fc6e2a</th>\n <th>86334fc8-bcc3-4c38-9786-c2ca99391daf</th>\n <td> 2</td>\n <td> [1, 4]</td>\n <td> False</td>\n </tr>\n <tr>\n <th>08ce21a6-bca8-4542-92d0-148b39e98e79</th>\n <th>34e3eceb-ef1f-48cb-a406-da0988922b31</th>\n <td> 2</td>\n <td> [1, 1]</td>\n <td> False</td>\n </tr>\n <tr>\n <th>09aad15a-af2b-4707-b2f5-c0d81b758510</th>\n <th>9e599a6e-f0b3-4892-a7e7-955da54d4a0b</th>\n <td> 5</td>\n <td> [1, 2, 4, 5, 6]</td>\n <td> False</td>\n </tr>\n <tr>\n <th>0a57bc09-9299-4295-8b1e-fb7dc7f2c7e7</th>\n <th>058f04de-6043-40d6-a211-e09b88746a4e</th>\n <td> 3</td>\n <td> [1, 2, 7]</td>\n <td> False</td>\n </tr>\n <tr>\n <th rowspan=\"2\" valign=\"top\">0cc629aa-b65d-403f-9669-7ce91655e98d</th>\n <th>490de40f-75fc-40ab-bf22-27fdc798b843</th>\n <td> 3</td>\n <td> [1, 2, 2]</td>\n <td> False</td>\n </tr>\n <tr>\n <th>677ddb63-221b-4a78-8a75-8ed83af3d9d5</th>\n <td> 3</td>\n <td> [1, 1, 2]</td>\n <td> False</td>\n </tr>\n 
</tbody>\n</table>\n</div>"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": true}}, {"source": "Are there fragments that belong to the above selected sessions before the selected date, i.e. are fragments sent out of order?", "cell_type": "markdown", "metadata": {}}, {"execution_count": 79, "cell_type": "code", "source": "def oo_fragments(date):\n def client_session_id(ping):\n return \"{}:{}\".format(ping[\"clientId\"], ping[\"payload\"][\"info\"][\"sessionId\"])\n \n return get_pings(sc, app=\"Firefox\", channel=\"nightly\", submission_date=date, schema=\"v4\")\\\n .filter(recent_buildid_filter).filter(lambda p: \"clientId\" in p)\\\n .filter(lambda p: client_session_id(p) in complete_ids)", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 80, "cell_type": "code", "source": "frags = oo_fragments(\"20150423\").collect()", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 96, "cell_type": "code", "source": "frags[0][\"payload\"][\"info\"], frags[0][\"clientId\"], frags[0][\"submissionDate\"]", "outputs": [{"execution_count": 96, "output_type": "execute_result", "data": {"text/plain": "({u'addons': u'%7B972ce4c6-7e08-4474-a285-3208198ce6fd%7D:40.0a1',\n u'asyncPluginInit': False,\n u'previousBuildId': u'20150422030206',\n u'previousSubsessionId': u'57fa126f-c598-44b9-baa0-8e59d23971c3',\n u'profileSubsessionCounter': 28,\n u'reason': u'shutdown',\n u'revision': u'https://hg.mozilla.org/mozilla-central/rev/0b202671c9e2',\n u'sessionId': u'a1da662a-4c1d-4d91-b1cf-929055fbc393',\n u'sessionStartDate': u'2015-04-23T00:00:00.0+02:00',\n u'subsessionCounter': 1,\n u'subsessionId': u'a37245ed-b7e9-448c-b686-d64c6132aab2',\n u'subsessionLength': 368,\n u'subsessionStartDate': u'2015-04-23T00:00:00.0+02:00',\n u'timezoneOffset': 120},\n u'a797aa30-c0d2-46ab-a6b0-6246d45c4165',\n u'20150423')"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 86, 
"cell_type": "code", "source": "def filter_by_client_session(ping, client, session):\n return ping.get(\"clientId\", \"\") == client and ping[\"payload\"][\"info\"][\"sessionId\"] == session", "outputs": [], "metadata": {"collapsed": true, "trusted": true}}, {"execution_count": 90, "cell_type": "code", "source": "trace = main_pings.filter(lambda p: filter_by_client_session(p, \n \"a797aa30-c0d2-46ab-a6b0-6246d45c4165\", \n \"a1da662a-4c1d-4d91-b1cf-929055fbc393\")).collect()", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 95, "cell_type": "code", "source": "trace[0][\"payload\"][\"info\"], trace[0][\"submissionDate\"]", "outputs": [{"execution_count": 95, "output_type": "execute_result", "data": {"text/plain": "({u'addons': u'%7B972ce4c6-7e08-4474-a285-3208198ce6fd%7D:40.0a1',\n u'asyncPluginInit': False,\n u'previousBuildId': u'20150422030206',\n u'previousSubsessionId': u'57fa126f-c598-44b9-baa0-8e59d23971c3',\n u'profileSubsessionCounter': 28,\n u'reason': u'shutdown',\n u'revision': u'https://hg.mozilla.org/mozilla-central/rev/0b202671c9e2',\n u'sessionId': u'a1da662a-4c1d-4d91-b1cf-929055fbc393',\n u'sessionStartDate': u'2015-04-23T00:00:00.0+02:00',\n u'subsessionCounter': 1,\n u'subsessionId': u'a37245ed-b7e9-448c-b686-d64c6132aab2',\n u'subsessionLength': 368,\n u'subsessionStartDate': u'2015-04-23T00:00:00.0+02:00',\n u'timezoneOffset': 120},\n u'20150426')"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 98, "cell_type": "code", "source": "json.dumps(frags[0][\"payload\"]) == json.dumps(trace[0][\"payload\"])", "outputs": [{"execution_count": 98, "output_type": "execute_result", "data": {"text/plain": "True"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 101, "cell_type": "code", "source": "trace[1][\"payload\"][\"info\"]", "outputs": [{"execution_count": 101, "output_type": "execute_result", "data": {"text/plain": "{u'addons': 
u'%7B972ce4c6-7e08-4474-a285-3208198ce6fd%7D:40.0a1',\n u'asyncPluginInit': False,\n u'previousBuildId': u'20150422030206',\n u'previousSubsessionId': u'57fa126f-c598-44b9-baa0-8e59d23971c3',\n u'profileSubsessionCounter': 28,\n u'reason': u'shutdown',\n u'revision': u'https://hg.mozilla.org/mozilla-central/rev/0b202671c9e2',\n u'sessionId': u'a1da662a-4c1d-4d91-b1cf-929055fbc393',\n u'sessionStartDate': u'2015-04-23T00:00:00.0+02:00',\n u'subsessionCounter': 1,\n u'subsessionId': u'a37245ed-b7e9-448c-b686-d64c6132aab2',\n u'subsessionLength': 368,\n u'subsessionStartDate': u'2015-04-23T00:00:00.0+02:00',\n u'timezoneOffset': 120}"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 102, "cell_type": "code", "source": "json.dumps(frags[0][\"payload\"]) == json.dumps(trace[1][\"payload\"])", "outputs": [{"execution_count": 102, "output_type": "execute_result", "data": {"text/plain": "True"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": true}}, {"source": "Are fragments that share the same subsession counter exact duplicates?", "cell_type": "markdown", "metadata": {}}, {"execution_count": 27, "cell_type": "code", "source": "trace = main_pings.filter(lambda p: filter_by_client_session(p, \n \"003a4cc1-17cc-44e5-ae7a-721eded7bada\",\n \"1ba3635e-0353-4dd8-a5ee-5d65601fe3e3\"))", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 28, "cell_type": "code", "source": "trace_fragments = trace.collect()", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 33, "cell_type": "code", "source": "json.dumps(trace_fragments[0]) == json.dumps(trace_fragments[1])", "outputs": [{"execution_count": 33, "output_type": "execute_result", "data": {"text/plain": "False"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": true}}, {"source": "The fragments differ, but by how much?", "cell_type": "markdown", "metadata": {}}, {"execution_count": 72, "cell_type": "code", 
"source": "def compare(obj1, obj2): \n for key, value in obj1.iteritems():\n if key not in obj2:\n print \"{} is missing in obj2\".format(key)\n continue\n \n if type(value) is dict:\n compare(value, obj2[key])\n continue\n\n if value != obj2[key]:\n print \"{}: {} vs {}\".format(key, value, obj2[key])\n \n for key in set(obj2.keys()).difference(set(obj1.keys())):\n print \"{} is missing in obj1\".format(key)", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 73, "cell_type": "code", "source": "compare(trace_fragments[0][\"payload\"][\"info\"], trace_fragments[1][\"payload\"][\"info\"])", "outputs": [{"output_type": "stream", "name": "stdout", "text": "reason: shutdown vs aborted-session\nsubsessionLength: 407 vs 360\n"}], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 77, "cell_type": "code", "source": "compare(trace_fragments[0][\"payload\"][\"simpleMeasurements\"], trace_fragments[1][\"payload\"][\"simpleMeasurements\"])", "outputs": [{"output_type": "stream", "name": "stdout", "text": "activeTicks: 81 vs 72\nprofileBeforeChange is missing in obj2\nquitApplication is missing in obj2\nuptime: 7 vs 6\nleft: 28 vs 27\nback-button is missing in obj2\ntotalTime: 418 vs 371\nXPIDB_saves_overlapped is missing in obj2\nXPIDB_saves_late is missing in obj2\nXPIDB_saves_total is missing in obj2\n"}], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": null, "cell_type": "code", "source": "", "outputs": [], "metadata": {"collapsed": true, "trusted": true}}], "nbformat": 4, "metadata": {"kernelspec": {"display_name": "Python 2", "name": "python2", "language": "python"}, "language_info": {"mimetype": "text/x-python", "nbconvert_exporter": "python", "version": "2.7.9", "name": "python", "file_extension": ".py", "pygments_lexer": "ipython2", "codemirror_mode": {"version": 2, "name": "ipython"}}}}
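A note on the validity test in `process_subsessions`: the notebook marks a session as valid when the number of fragments equals the largest `subsessionCounter`. That catches the examples shown in the output table, but a sequence such as `[1, 1, 3]` (one duplicate plus one gap) would appear to pass it. The sketch below is a stricter alternative, not the notebook's method. It is written for Python 3 (the notebook itself runs on Python 2), the function names `notebook_check` and `classify_counters` are hypothetical, and it assumes every counter is a positive integer.

```python
from collections import Counter


def notebook_check(counters):
    """Mirror of the notebook's expression: fragment count == largest counter."""
    ordered = sorted(counters)
    return len(ordered) == ordered[-1]


def classify_counters(counters):
    """Report duplicated and missing subsession counters for one session.

    Assumes counters are positive integers that should run 1, 2, ..., n
    without repeats. Returns the sorted counters, the duplicated values,
    the values missing from 1..max, and an overall validity flag.
    """
    ordered = sorted(counters)
    counts = Counter(ordered)
    # A counter value seen more than once is a duplicate.
    duplicates = sorted(c for c, n in counts.items() if n > 1)
    # Any value in 1..max that never appears is a gap.
    highest = ordered[-1] if ordered else 0
    missing = [c for c in range(1, highest + 1) if c not in counts]
    return {
        "counters": ordered,
        "duplicates": duplicates,
        "missing": missing,
        "valid": bool(ordered) and not duplicates and not missing,
    }
```

For instance, `classify_counters([1, 1, 3])` reports `1` as duplicated and `2` as missing, while `notebook_check([1, 1, 3])` accepts the same sequence. Separating duplicates from gaps also makes it easier to tell resubmitted fragments apart from fragments that never arrived.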