Created September 22, 2015 17:48
Gist: Dexterp37/52247591346eb3c26dac
Find how many aborted-session "main" pings have the same subsessionId as shutdown "main" pings (using clientId sampling)
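The notebook below keeps a ping when `binascii.crc32(clientId) % 100 < 10`, which samples clients rather than individual pings: a given client is always in or always out, so all of its pings travel together. The following is a minimal plain-Python sketch of that idea, assuming Python 3, pings represented as dicts with a `"clientId"` key, and hypothetical synthetic data (no Spark or moztelemetry involved). Because Python 2's `binascii.crc32` returned a signed 32-bit value while Python 3's is always unsigned, the hypothetical helper `crc32_py2` converts back to the signed value so that the bucket assignment should match the Python 2 notebook.

```python
import binascii
import uuid


def crc32_py2(data: bytes) -> int:
    """Return crc32 as a signed 32-bit int, as Python 2's binascii.crc32 did."""
    h = binascii.crc32(data) & 0xFFFFFFFF  # already unsigned on Python 3; mask makes it explicit
    return h - 0x100000000 if h >= 0x80000000 else h


def sample_by_client_id(ping: dict, percent: int = 10) -> bool:
    """Keep a ping iff its client hashes into the first `percent` of 100 buckets."""
    client_id = ping.get("clientId")
    if not client_id:
        return False
    # With a positive divisor, Python's % always lands in [0, 100),
    # even when the signed hash is negative.
    return crc32_py2(client_id.encode("utf-8")) % 100 < percent


# Hypothetical data: 10,000 synthetic clients with 3 pings each.
clients = [str(uuid.UUID(int=i)) for i in range(10_000)]
pings = [{"clientId": c, "seq": n} for c in clients for n in range(3)]
kept = [p for p in pings if sample_by_client_id(p)]
kept_clients = {p["clientId"] for p in kept}
print(len(kept_clients), "of", len(clients), "clients kept,", len(kept), "pings")
```

Hashing the clientId instead of drawing a random number per ping keeps each sampled client's history complete, which matters here because the shutdown and aborted-session pings being compared come from the same client.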
{"nbformat_minor": 0, "cells": [{"source": "# Bug 1196852 - aborted-session pings may not get deleted properly", "cell_type": "markdown", "metadata": {}}, {"source": "We want to gauge how many aborted-session pings share their subsessionId with a shutdown ping. That means checking, server-side:\n * for buildIds between 20150722 and 20150829,\n * how many aborted-session and shutdown pings carry the same subsessionId.", "cell_type": "markdown", "metadata": {}}, {"execution_count": 1, "cell_type": "code", "source": "import binascii\nimport ujson as json\nimport pandas as pd\nimport numpy as np\n\nfrom moztelemetry import get_pings, get_pings_properties, get_one_ping_per_client, get_clients_history", "outputs": [], "metadata": {"scrolled": true, "collapsed": false, "trusted": true}}, {"source": "### Define some handy filtering functions", "cell_type": "markdown", "metadata": {}}, {"execution_count": 2, "cell_type": "code", "source": "def get_interesting_reasons(pings):\n    # Keep only the \"shutdown\" and \"aborted-session\" main pings.\n    return pings.filter(lambda p: p.get(\"payload/info/reason\", \"\") in [\"shutdown\", \"aborted-session\"])\n\ndef sample_by_clientId(ping):\n    # Deterministic per-client sampling: each client is always either in or out,\n    # so roughly 10% of clients are kept together with all of their pings.\n    client_id = ping.get(\"clientId\", None)\n    return bool(client_id) and binascii.crc32(client_id) % 100 < 10", "outputs": [], "metadata": {"collapsed": true, "trusted": true}}, {"source": "### Get the main pings, with the relevant reasons, and filter them by buildId", "cell_type": "markdown", "metadata": {}}, {"source": "Let's fetch Telemetry submissions from Nightly builds between 20150722 and 20150829...", "cell_type": "markdown", "metadata": {}}, {"execution_count": 3, "cell_type": "code", "source": "build_ids = (\"20150722000000\", \"20150829999999\")\nmain_pings = get_pings(sc, app=\"Firefox\", channel=\"nightly\", build_id=build_ids, doc_type=\"main\", schema=\"v4\", fraction=1.0)", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"source": "Sample roughly 10% of clients (by hashing the clientId), keeping all of their pings...", "cell_type": "markdown", "metadata": {}}, {"execution_count": 4, 
"cell_type": "code", "source": "sampled_pings = main_pings.filter(sample_by_clientId)", "outputs": [], "metadata": {"collapsed": true, "trusted": true}}, {"source": "... and extract only the attributes we need from the Telemetry submissions:", "cell_type": "markdown", "metadata": {}}, {"execution_count": 5, "cell_type": "code", "source": "subset = get_pings_properties(sampled_pings, [\"meta/sampleId\",\n \"payload/info/reason\",\n \"payload/info/subsessionId\"])", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"source": "Filter for the reasons we're interested in.", "cell_type": "markdown", "metadata": {}}, {"execution_count": 6, "cell_type": "code", "source": "filteredSubset = get_interesting_reasons(subset)", "outputs": [], "metadata": {"collapsed": true, "trusted": true}}, {"source": "Caching the filtered RDD allows for an iterative, interactive workflow. It is commented out in this run, so every count below recomputes `filteredSubset`:", "cell_type": "markdown", "metadata": {}}, {"execution_count": 7, "cell_type": "code", "source": "#cached = filteredSubset.cache()", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"source": "How many pings are we looking at?", "cell_type": "markdown", "metadata": {}}, {"execution_count": 8, "cell_type": "code", "source": "filteredSubset.count()", "outputs": [{"execution_count": 8, "output_type": "execute_result", "data": {"text/plain": "561858"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": true}}, {"source": "## Count the overlapping subsessionIds", "cell_type": "markdown", "metadata": {}}, {"source": "Let's find the subsessionIds belonging to \"shutdown\" main pings and the ones belonging to \"aborted-session\" main pings.", "cell_type": "markdown", "metadata": {}}, {"execution_count": 9, "cell_type": "code", "source": "shutdownSubsessionIds = filteredSubset.filter(lambda p: p.get(\"payload/info/reason\", \"\") == \"shutdown\").map(lambda p: p[\"payload/info/subsessionId\"])\nabortedSubsessionIds = filteredSubset.filter(lambda p: p.get(\"payload/info/reason\", \"\") == \"aborted-session\").map(lambda p: p[\"payload/info/subsessionId\"])", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"source": "How many of them?", "cell_type": "markdown", "metadata": {}}, {"execution_count": 10, "cell_type": "code", "source": "shutdownSubsessionIds.count()", "outputs": [{"execution_count": 10, "output_type": "execute_result", "data": {"text/plain": "531689"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 11, "cell_type": "code", "source": "abortedSubsessionIds.count()", "outputs": [{"execution_count": 11, "output_type": "execute_result", "data": {"text/plain": "30070"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": true}}, {"source": "Find the subsessionIds shared by aborted-session and shutdown pings. Note that `intersection` drops duplicates, so the result holds distinct subsessionIds rather than individual pings.", "cell_type": "markdown", "metadata": {}}, {"execution_count": 12, "cell_type": "code", "source": "overlappingIds = abortedSubsessionIds.intersection(shutdownSubsessionIds)", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"source": "How many overlapping (broken) aborted-session subsessionIds are there, then?", "cell_type": "markdown", "metadata": {}}, {"execution_count": 13, "cell_type": "code", "source": "overlappingIds.count()", "outputs": [{"execution_count": 13, "output_type": "execute_result", "data": {"text/plain": "142"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": true}}], "nbformat": 4, "metadata": {"kernelspec": {"display_name": "Python 2", "name": "python2", "language": "python"}, "language_info": {"mimetype": "text/x-python", "nbconvert_exporter": "python", "version": "2.7.9", "name": "python", "file_extension": ".py", "pygments_lexer": "ipython2", "codemirror_mode": {"version": 2, "name": "ipython"}}}}
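In the notebook, Spark's `RDD.intersection` returns no duplicate elements even if its inputs contain them, so the final 142 counts distinct subsessionIds shared by the two groups, not necessarily the number of aborted-session pings involved. The small plain-Python sketch below uses hypothetical in-memory dicts shaped like the `get_pings_properties` output instead of Spark RDDs, and shows how the two counts can differ. The function name `overlap_counts` is illustrative only.

```python
from collections import Counter


def overlap_counts(pings):
    """Return (distinct shared subsessionIds, aborted-session pings using a shared id).

    `pings` is an iterable of dicts with "payload/info/reason" and
    "payload/info/subsessionId" keys, like the notebook's filteredSubset rows.
    """
    shutdown_ids = set()
    aborted_ids = Counter()  # subsessionId -> number of aborted-session pings
    for p in pings:
        reason = p.get("payload/info/reason", "")
        sid = p.get("payload/info/subsessionId")
        if reason == "shutdown":
            shutdown_ids.add(sid)
        elif reason == "aborted-session":
            aborted_ids[sid] += 1
    # Set intersection, like RDD.intersection, keeps each shared id once.
    shared = shutdown_ids & set(aborted_ids)
    return len(shared), sum(aborted_ids[sid] for sid in shared)


# Hypothetical data: subsession "b" has one shutdown ping and two aborted-session pings.
sample = [
    {"payload/info/reason": "shutdown", "payload/info/subsessionId": "a"},
    {"payload/info/reason": "shutdown", "payload/info/subsessionId": "b"},
    {"payload/info/reason": "aborted-session", "payload/info/subsessionId": "b"},
    {"payload/info/reason": "aborted-session", "payload/info/subsessionId": "b"},
    {"payload/info/reason": "aborted-session", "payload/info/subsessionId": "c"},
]
print(overlap_counts(sample))  # → (1, 2)
```

If the ping-level figure is the one of interest, the second number is the relevant one; the notebook's `overlappingIds.count()` corresponds to the first.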