Last active
August 29, 2015 14:20
Telemetry v4 validation
{"nbformat_minor": 0, "cells": [{"source": "### Telemetry v4 validation", "cell_type": "markdown", "metadata": {}}, {"execution_count": 1, "cell_type": "code", "source": "import ujson as json\nimport matplotlib.pyplot as plt\nimport pandas as pd\nimport numpy as np\nimport plotly.plotly as py\n\nfrom moztelemetry import get_pings, get_pings_properties, get_one_ping_per_client\nfrom __future__ import division\n\n%pylab inline", "outputs": [{"output_type": "stream", "name": "stdout", "text": "Populating the interactive namespace from numpy and matplotlib\n"}], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 7, "cell_type": "code", "source": "def get_subset(pings):\n return get_pings_properties(pings, [\"clientId\", \n \"payload/info/sessionId\", \n \"payload/info/subsessionCounter\",\n \"payload/info/reason\"])\n\ndef client_session_id(ping):\n return \"{}:{}\".format(ping[\"clientId\"], ping[\"payload/info/sessionId\"])\n\ndef get_shutdown_ids(pings):\n return pings.filter(lambda p: p.get(\"payload/info/reason\", \"\") == \"shutdown\")\\\n .map(lambda p: client_session_id(p))\n \ndef get_start_ids(pings):\n return pings.filter(lambda p: p.get(\"payload/info/subsessionCounter\", \"\") == 1)\\\n .map(lambda p: client_session_id(p))\n \ndef filter_by_ids(pings, sessionids):\n return pings.filter(lambda p: client_session_id(p) in sessionids)\n\ndef recent_buildid_filter(ping):\n return ping[\"appBuildId\"] >= \"20150423000000\"", "outputs": [], "metadata": {"collapsed": true, "trusted": true}}, {"source": "Get all main pings for a single day:", "cell_type": "markdown", "metadata": {}}, {"execution_count": 8, "cell_type": "code", "source": "date_pings = get_pings(sc, app=\"Firefox\", channel=\"nightly\", submission_date=\"20150426\", schema=\"v4\")", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 9, "cell_type": "code", "source": "main_pings = date_pings.filter(recent_buildid_filter)", "outputs": [], 
"metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 10, "cell_type": "code", "source": "main_subset = get_subset(main_pings)\nmain_subset = main_subset.filter(lambda p: \"clientId\" in p)", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 11, "cell_type": "code", "source": "main_subset.cache()\nmain_subset.count()", "outputs": [{"execution_count": 11, "output_type": "execute_result", "data": {"text/plain": "118003"}, "metadata": {}}], "metadata": {"scrolled": true, "collapsed": false, "trusted": true}}, {"source": "Keep only the fragments belonging to sessions that have both a starting fragment (`subsessionCounter` equal to 1) and an ending fragment (`reason` equal to `shutdown`) in the selected time range. This identifies the sessions for which we are *likely* to have all fragments; how likely has not been quantified yet.", "cell_type": "markdown", "metadata": {}}, {"execution_count": 12, "cell_type": "code", "source": "shutdown_ids = set(get_shutdown_ids(main_subset).collect())", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 13, "cell_type": "code", "source": "start_ids = set(get_start_ids(main_subset).collect())", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 14, "cell_type": "code", "source": "complete_ids = shutdown_ids.intersection(start_ids)", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 15, "cell_type": "code", "source": "main_complete_session_fragments = filter_by_ids(main_subset, complete_ids)", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"source": "Let's group fragments by their session id and check for duplicated or missing subsession counters:", "cell_type": "markdown", "metadata": {}}, {"execution_count": 16, "cell_type": "code", "source": "frame = pd.DataFrame(main_complete_session_fragments.collect())", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, 
{"execution_count": 17, "cell_type": "code", "source": "def process_subsessions(sub):\n ss_count = sorted(list(sub[\"payload/info/subsessionCounter\"]))\n valid = len(ss_count) == ss_count[-1]\n return pd.Series({\"subsession_counter\": ss_count, \"n_subsessions\": len(ss_count), \"valid\": valid})\n\ngrouped_frame = frame.groupby([\"clientId\", \"payload/info/sessionId\"]).apply(lambda x: process_subsessions(x))", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"source": "If we consider only sessions that have more than one fragment, what's the proportion of sessions with an incorrect ordering?", "cell_type": "markdown", "metadata": {}}, {"execution_count": 18, "cell_type": "code", "source": "correct = grouped_frame[np.logical_and(grouped_frame[\"valid\"] == True, grouped_frame[\"n_subsessions\"] > 1)]\ninvalid = grouped_frame[grouped_frame[\"valid\"] == False]\nprint \"{:.2f}%\".format(100*len(invalid)/(len(correct) + len(invalid)))", "outputs": [{"output_type": "stream", "name": "stdout", "text": "7.95%\n"}], "metadata": {"collapsed": false, "trusted": true}}, {"source": "Some examples of sessions with incorrect fragment numbering:", "cell_type": "markdown", "metadata": {}}, {"execution_count": 20, "cell_type": "code", "source": "grouped_frame[grouped_frame[\"valid\"] == False].ix[:10]", "outputs": [{"execution_count": 20, "output_type": "execute_result", "data": {"text/plain": " n_subsessions \\\nclientId payload/info/sessionId \n0306e635-8aba-4c17-bf36-0ac36fa190e9 e20f6b52-42ba-47bd-9630-5e9bb1c26642 2 \n045e2666-0a8a-4d76-9ac8-08a55dd256b8 6ee6fab4-7fdd-4430-b434-bc8c3922c691 2 \n05568c9a-9117-468e-8288-991b7bf488d7 7248c23d-7dad-4296-94ba-2730a038f778 2 \n05a02daa-52cf-455e-9469-440be2670f52 3ccfd5e3-989a-4017-875b-cb27da099f4a 2 \n08ad8686-e288-4012-a28e-78d3d5fc6e2a 86334fc8-bcc3-4c38-9786-c2ca99391daf 2 \n08ce21a6-bca8-4542-92d0-148b39e98e79 34e3eceb-ef1f-48cb-a406-da0988922b31 2 \n09aad15a-af2b-4707-b2f5-c0d81b758510 
9e599a6e-f0b3-4892-a7e7-955da54d4a0b 5 \n0a57bc09-9299-4295-8b1e-fb7dc7f2c7e7 058f04de-6043-40d6-a211-e09b88746a4e 3 \n0cc629aa-b65d-403f-9669-7ce91655e98d 490de40f-75fc-40ab-bf22-27fdc798b843 3 \n 677ddb63-221b-4a78-8a75-8ed83af3d9d5 3 \n\n subsession_counter \\\nclientId payload/info/sessionId \n0306e635-8aba-4c17-bf36-0ac36fa190e9 e20f6b52-42ba-47bd-9630-5e9bb1c26642 [1, 1] \n045e2666-0a8a-4d76-9ac8-08a55dd256b8 6ee6fab4-7fdd-4430-b434-bc8c3922c691 [1, 1] \n05568c9a-9117-468e-8288-991b7bf488d7 7248c23d-7dad-4296-94ba-2730a038f778 [1, 1] \n05a02daa-52cf-455e-9469-440be2670f52 3ccfd5e3-989a-4017-875b-cb27da099f4a [1, 1] \n08ad8686-e288-4012-a28e-78d3d5fc6e2a 86334fc8-bcc3-4c38-9786-c2ca99391daf [1, 4] \n08ce21a6-bca8-4542-92d0-148b39e98e79 34e3eceb-ef1f-48cb-a406-da0988922b31 [1, 1] \n09aad15a-af2b-4707-b2f5-c0d81b758510 9e599a6e-f0b3-4892-a7e7-955da54d4a0b [1, 2, 4, 5, 6] \n0a57bc09-9299-4295-8b1e-fb7dc7f2c7e7 058f04de-6043-40d6-a211-e09b88746a4e [1, 2, 7] \n0cc629aa-b65d-403f-9669-7ce91655e98d 490de40f-75fc-40ab-bf22-27fdc798b843 [1, 2, 2] \n 677ddb63-221b-4a78-8a75-8ed83af3d9d5 [1, 1, 2] \n\n valid \nclientId payload/info/sessionId \n0306e635-8aba-4c17-bf36-0ac36fa190e9 e20f6b52-42ba-47bd-9630-5e9bb1c26642 False \n045e2666-0a8a-4d76-9ac8-08a55dd256b8 6ee6fab4-7fdd-4430-b434-bc8c3922c691 False \n05568c9a-9117-468e-8288-991b7bf488d7 7248c23d-7dad-4296-94ba-2730a038f778 False \n05a02daa-52cf-455e-9469-440be2670f52 3ccfd5e3-989a-4017-875b-cb27da099f4a False \n08ad8686-e288-4012-a28e-78d3d5fc6e2a 86334fc8-bcc3-4c38-9786-c2ca99391daf False \n08ce21a6-bca8-4542-92d0-148b39e98e79 34e3eceb-ef1f-48cb-a406-da0988922b31 False \n09aad15a-af2b-4707-b2f5-c0d81b758510 9e599a6e-f0b3-4892-a7e7-955da54d4a0b False \n0a57bc09-9299-4295-8b1e-fb7dc7f2c7e7 058f04de-6043-40d6-a211-e09b88746a4e False \n0cc629aa-b65d-403f-9669-7ce91655e98d 490de40f-75fc-40ab-bf22-27fdc798b843 False \n 677ddb63-221b-4a78-8a75-8ed83af3d9d5 False ", "text/html": "<div 
style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th></th>\n <th>n_subsessions</th>\n <th>subsession_counter</th>\n <th>valid</th>\n </tr>\n <tr>\n <th>clientId</th>\n <th>payload/info/sessionId</th>\n <th></th>\n <th></th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0306e635-8aba-4c17-bf36-0ac36fa190e9</th>\n <th>e20f6b52-42ba-47bd-9630-5e9bb1c26642</th>\n <td> 2</td>\n <td> [1, 1]</td>\n <td> False</td>\n </tr>\n <tr>\n <th>045e2666-0a8a-4d76-9ac8-08a55dd256b8</th>\n <th>6ee6fab4-7fdd-4430-b434-bc8c3922c691</th>\n <td> 2</td>\n <td> [1, 1]</td>\n <td> False</td>\n </tr>\n <tr>\n <th>05568c9a-9117-468e-8288-991b7bf488d7</th>\n <th>7248c23d-7dad-4296-94ba-2730a038f778</th>\n <td> 2</td>\n <td> [1, 1]</td>\n <td> False</td>\n </tr>\n <tr>\n <th>05a02daa-52cf-455e-9469-440be2670f52</th>\n <th>3ccfd5e3-989a-4017-875b-cb27da099f4a</th>\n <td> 2</td>\n <td> [1, 1]</td>\n <td> False</td>\n </tr>\n <tr>\n <th>08ad8686-e288-4012-a28e-78d3d5fc6e2a</th>\n <th>86334fc8-bcc3-4c38-9786-c2ca99391daf</th>\n <td> 2</td>\n <td> [1, 4]</td>\n <td> False</td>\n </tr>\n <tr>\n <th>08ce21a6-bca8-4542-92d0-148b39e98e79</th>\n <th>34e3eceb-ef1f-48cb-a406-da0988922b31</th>\n <td> 2</td>\n <td> [1, 1]</td>\n <td> False</td>\n </tr>\n <tr>\n <th>09aad15a-af2b-4707-b2f5-c0d81b758510</th>\n <th>9e599a6e-f0b3-4892-a7e7-955da54d4a0b</th>\n <td> 5</td>\n <td> [1, 2, 4, 5, 6]</td>\n <td> False</td>\n </tr>\n <tr>\n <th>0a57bc09-9299-4295-8b1e-fb7dc7f2c7e7</th>\n <th>058f04de-6043-40d6-a211-e09b88746a4e</th>\n <td> 3</td>\n <td> [1, 2, 7]</td>\n <td> False</td>\n </tr>\n <tr>\n <th rowspan=\"2\" valign=\"top\">0cc629aa-b65d-403f-9669-7ce91655e98d</th>\n <th>490de40f-75fc-40ab-bf22-27fdc798b843</th>\n <td> 3</td>\n <td> [1, 2, 2]</td>\n <td> False</td>\n </tr>\n <tr>\n <th>677ddb63-221b-4a78-8a75-8ed83af3d9d5</th>\n <td> 3</td>\n <td> [1, 1, 2]</td>\n <td> False</td>\n </tr>\n 
</tbody>\n</table>\n</div>"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": true}}, {"source": "Were fragments of the sessions selected above also submitted before the selected date, i.e. are fragments sent out of order?", "cell_type": "markdown", "metadata": {}}, {"execution_count": 79, "cell_type": "code", "source": "def oo_fragments(date):\n    def client_session_id(ping):\n        return \"{}:{}\".format(ping[\"clientId\"], ping[\"payload\"][\"info\"][\"sessionId\"])\n\n    return get_pings(sc, app=\"Firefox\", channel=\"nightly\", submission_date=date, schema=\"v4\")\\\n        .filter(recent_buildid_filter).filter(lambda p: \"clientId\" in p)\\\n        .filter(lambda p: client_session_id(p) in complete_ids)", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 80, "cell_type": "code", "source": "frags = oo_fragments(\"20150423\").collect()", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 96, "cell_type": "code", "source": "frags[0][\"payload\"][\"info\"], frags[0][\"clientId\"], frags[0][\"submissionDate\"]", "outputs": [{"execution_count": 96, "output_type": "execute_result", "data": {"text/plain": "({u'addons': u'%7B972ce4c6-7e08-4474-a285-3208198ce6fd%7D:40.0a1',\n u'asyncPluginInit': False,\n u'previousBuildId': u'20150422030206',\n u'previousSubsessionId': u'57fa126f-c598-44b9-baa0-8e59d23971c3',\n u'profileSubsessionCounter': 28,\n u'reason': u'shutdown',\n u'revision': u'https://hg.mozilla.org/mozilla-central/rev/0b202671c9e2',\n u'sessionId': u'a1da662a-4c1d-4d91-b1cf-929055fbc393',\n u'sessionStartDate': u'2015-04-23T00:00:00.0+02:00',\n u'subsessionCounter': 1,\n u'subsessionId': u'a37245ed-b7e9-448c-b686-d64c6132aab2',\n u'subsessionLength': 368,\n u'subsessionStartDate': u'2015-04-23T00:00:00.0+02:00',\n u'timezoneOffset': 120},\n u'a797aa30-c0d2-46ab-a6b0-6246d45c4165',\n u'20150423')"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 86, 
"cell_type": "code", "source": "def filter_by_client_session(ping, client, session):\n return ping.get(\"clientId\", \"\") == client and ping[\"payload\"][\"info\"][\"sessionId\"] == session", "outputs": [], "metadata": {"collapsed": true, "trusted": true}}, {"execution_count": 90, "cell_type": "code", "source": "trace = main_pings.filter(lambda p: filter_by_client_session(p, \n \"a797aa30-c0d2-46ab-a6b0-6246d45c4165\", \n \"a1da662a-4c1d-4d91-b1cf-929055fbc393\")).collect()", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 95, "cell_type": "code", "source": "trace[0][\"payload\"][\"info\"], trace[0][\"submissionDate\"]", "outputs": [{"execution_count": 95, "output_type": "execute_result", "data": {"text/plain": "({u'addons': u'%7B972ce4c6-7e08-4474-a285-3208198ce6fd%7D:40.0a1',\n u'asyncPluginInit': False,\n u'previousBuildId': u'20150422030206',\n u'previousSubsessionId': u'57fa126f-c598-44b9-baa0-8e59d23971c3',\n u'profileSubsessionCounter': 28,\n u'reason': u'shutdown',\n u'revision': u'https://hg.mozilla.org/mozilla-central/rev/0b202671c9e2',\n u'sessionId': u'a1da662a-4c1d-4d91-b1cf-929055fbc393',\n u'sessionStartDate': u'2015-04-23T00:00:00.0+02:00',\n u'subsessionCounter': 1,\n u'subsessionId': u'a37245ed-b7e9-448c-b686-d64c6132aab2',\n u'subsessionLength': 368,\n u'subsessionStartDate': u'2015-04-23T00:00:00.0+02:00',\n u'timezoneOffset': 120},\n u'20150426')"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 98, "cell_type": "code", "source": "json.dumps(frags[0][\"payload\"]) == json.dumps(trace[0][\"payload\"])", "outputs": [{"execution_count": 98, "output_type": "execute_result", "data": {"text/plain": "True"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 101, "cell_type": "code", "source": "trace[1][\"payload\"][\"info\"]", "outputs": [{"execution_count": 101, "output_type": "execute_result", "data": {"text/plain": "{u'addons': 
u'%7B972ce4c6-7e08-4474-a285-3208198ce6fd%7D:40.0a1',\n u'asyncPluginInit': False,\n u'previousBuildId': u'20150422030206',\n u'previousSubsessionId': u'57fa126f-c598-44b9-baa0-8e59d23971c3',\n u'profileSubsessionCounter': 28,\n u'reason': u'shutdown',\n u'revision': u'https://hg.mozilla.org/mozilla-central/rev/0b202671c9e2',\n u'sessionId': u'a1da662a-4c1d-4d91-b1cf-929055fbc393',\n u'sessionStartDate': u'2015-04-23T00:00:00.0+02:00',\n u'subsessionCounter': 1,\n u'subsessionId': u'a37245ed-b7e9-448c-b686-d64c6132aab2',\n u'subsessionLength': 368,\n u'subsessionStartDate': u'2015-04-23T00:00:00.0+02:00',\n u'timezoneOffset': 120}"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 102, "cell_type": "code", "source": "json.dumps(frags[0][\"payload\"]) == json.dumps(trace[1][\"payload\"])", "outputs": [{"execution_count": 102, "output_type": "execute_result", "data": {"text/plain": "True"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": true}}, {"source": "Are fragments that share the same subsession counter exact duplicates?", "cell_type": "markdown", "metadata": {}}, {"execution_count": 27, "cell_type": "code", "source": "trace = main_pings.filter(lambda p: filter_by_client_session(p, \n \"003a4cc1-17cc-44e5-ae7a-721eded7bada\",\n \"1ba3635e-0353-4dd8-a5ee-5d65601fe3e3\"))", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 28, "cell_type": "code", "source": "trace_fragments = trace.collect()", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 33, "cell_type": "code", "source": "json.dumps(trace_fragments[0]) == json.dumps(trace_fragments[1])", "outputs": [{"execution_count": 33, "output_type": "execute_result", "data": {"text/plain": "False"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": true}}, {"source": "The fragments differ, but by how much?", "cell_type": "markdown", "metadata": {}}, {"execution_count": 72, "cell_type": "code", 
"source": "def compare(obj1, obj2):\n    for key, value in obj1.iteritems():\n        if key not in obj2:\n            print \"{} is missing in obj2\".format(key)\n            continue\n\n        if type(value) is dict:\n            compare(value, obj2[key])\n            continue\n\n        if value != obj2[key]:\n            print \"{}: {} vs {}\".format(key, value, obj2[key])\n\n    for key in set(obj2.keys()).difference(set(obj1.keys())):\n        print \"{} is missing in obj1\".format(key)", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 73, "cell_type": "code", "source": "compare(trace_fragments[0][\"payload\"][\"info\"], trace_fragments[1][\"payload\"][\"info\"])", "outputs": [{"output_type": "stream", "name": "stdout", "text": "reason: shutdown vs aborted-session\nsubsessionLength: 407 vs 360\n"}], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 77, "cell_type": "code", "source": "compare(trace_fragments[0][\"payload\"][\"simpleMeasurements\"], trace_fragments[1][\"payload\"][\"simpleMeasurements\"])", "outputs": [{"output_type": "stream", "name": "stdout", "text": "activeTicks: 81 vs 72\nprofileBeforeChange is missing in obj2\nquitApplication is missing in obj2\nuptime: 7 vs 6\nleft: 28 vs 27\nback-button is missing in obj2\ntotalTime: 418 vs 371\nXPIDB_saves_overlapped is missing in obj2\nXPIDB_saves_late is missing in obj2\nXPIDB_saves_total is missing in obj2\n"}], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": null, "cell_type": "code", "source": "", "outputs": [], "metadata": {"collapsed": true, "trusted": true}}], "nbformat": 4, "metadata": {"kernelspec": {"display_name": "Python 2", "name": "python2", "language": "python"}, "language_info": {"mimetype": "text/x-python", "nbconvert_exporter": "python", "version": "2.7.9", "name": "python", "file_extension": ".py", "pygments_lexer": "ipython2", "codemirror_mode": {"version": 2, "name": "ipython"}}}}
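The notebook marks a session as valid when the number of fragments equals the highest `subsessionCounter` it saw. That test catches the examples listed in the table, but a duplicate and a gap can cancel each other out: the counters `[1, 2, 2, 4]` have length 4 and maximum 4. The sketch below is a stricter, pure-Python variant that reports duplicates and gaps separately. It is not part of the original analysis; `classify_counters` and the sample lists are hypothetical names and data chosen for illustration (three of the lists mirror rows of the table above).

```python
def classify_counters(counters):
    """Classify the subsessionCounter values observed for one session.

    Hypothetical helper, not from the original notebook. A session is
    considered valid only if its sorted counters are exactly 1, 2, ..., n.
    """
    ordered = sorted(counters)
    seen = set(ordered)
    # Counters that occur more than once (possible duplicate submissions).
    duplicates = sorted(c for c in seen if ordered.count(c) > 1)
    # Counters between 1 and the highest one that never showed up.
    highest = ordered[-1] if ordered else 0
    missing = [c for c in range(1, highest + 1) if c not in seen]
    return {
        "valid": ordered == list(range(1, len(ordered) + 1)),
        "duplicates": duplicates,
        "missing": missing,
    }


# Illustrative inputs; "masked" is a made-up case that the
# length-equals-maximum test would accept.
examples = {
    "dup": [1, 1, 2],
    "gap": [1, 2, 4, 5, 6],
    "masked": [1, 2, 2, 4],
    "ok": [3, 1, 2],
}
results = {name: classify_counters(c) for name, c in examples.items()}

for name in sorted(results):
    print("{}: {}".format(name, results[name]))
```

Such a helper could be applied per group in place of the `valid` computation in `process_subsessions`, assuming the same grouped frame; whether the extra strictness changes the reported proportion is not something this sketch can say.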