Last active
September 23, 2015 16:12
DON'T USE -> WRONG SAMPLING - Find how many aborted-session "main" pings have the same subsessionId as shutdown "main" pings.
{"nbformat_minor": 0, "cells": [{"source": "# Bug 1196852 - aborted-session pings may not get deleted properly", "cell_type": "markdown", "metadata": {}}, {"source": "We want to gauge how many aborted-session pings share a subsessionId with a shutdown ping. That means checking, server-side:\n * for buildids >= 20150722\n * how many aborted-session pings and shutdown pings have the same subsessionId", "cell_type": "markdown", "metadata": {}}, {"execution_count": 1, "cell_type": "code", "source": "import ujson as json\nimport pandas as pd\nimport numpy as np\n\nfrom moztelemetry import get_pings, get_pings_properties, get_one_ping_per_client, get_clients_history", "outputs": [], "metadata": {"scrolled": true, "collapsed": false, "trusted": true}}, {"source": "### Define some handy filtering functions", "cell_type": "markdown", "metadata": {}}, {"execution_count": 2, "cell_type": "code", "source": "def get_interesting_reasons(pings):\n    return pings.filter(lambda p: p.get(\"payload/info/reason\", \"\") in [\"shutdown\", \"aborted-session\"])", "outputs": [], "metadata": {"collapsed": true, "trusted": true}}, {"source": "### Get the main pings, with the relevant reasons, and filter them by buildId", "cell_type": "markdown", "metadata": {}}, {"source": "Let's fetch Telemetry submissions for some recent builds...", "cell_type": "markdown", "metadata": {}}, {"execution_count": 3, "cell_type": "code", "source": "build_ids = (\"20150722000000\", \"20150829999999\")\nmain_pings = get_pings(sc, app=\"Firefox\", channel=\"nightly\", build_id=build_ids, doc_type=\"main\", schema=\"v4\")#, fraction=0.1)", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"source": "... 
and extract only the attributes we need from the Telemetry submissions:", "cell_type": "markdown", "metadata": {}}, {"execution_count": 4, "cell_type": "code", "source": "subset = get_pings_properties(main_pings, [\"payload/info/reason\",\n \"payload/info/subsessionId\"])", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"source": "Filter for the reasons we're interested in.", "cell_type": "markdown", "metadata": {}}, {"execution_count": 5, "cell_type": "code", "source": "filteredSubset = get_interesting_reasons(subset)", "outputs": [], "metadata": {"collapsed": true, "trusted": true}}, {"source": "Caching is fundamental as it allows for an iterative, real-time development workflow:", "cell_type": "markdown", "metadata": {}}, {"execution_count": 6, "cell_type": "code", "source": "cached = filteredSubset.cache()", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"source": "How many pings are we looking at?", "cell_type": "markdown", "metadata": {}}, {"execution_count": 7, "cell_type": "code", "source": "filteredSubset.count()", "outputs": [{"execution_count": 7, "output_type": "execute_result", "data": {"text/plain": "5652017"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": true}}, {"source": "## Count the overlapping subsessionIds", "cell_type": "markdown", "metadata": {}}, {"source": "Let's find the subsessionIds belonging to \"shutdown\" main pings and the ones for \"aborted-session\" main pings.", "cell_type": "markdown", "metadata": {}}, {"execution_count": 8, "cell_type": "code", "source": "shutdownSubsessionIds = cached.filter(lambda p: p.get(\"payload/info/reason\", \"\") == \"shutdown\").map(lambda p: p[\"payload/info/subsessionId\"])\nabortedSubsessionIds = cached.filter(lambda p: p.get(\"payload/info/reason\", \"\") == \"aborted-session\").map(lambda p: p[\"payload/info/subsessionId\"])", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"source": "How many of each are there?", "cell_type": 
"markdown", "metadata": {}}, {"execution_count": 9, "cell_type": "code", "source": "shutdownSubsessionIds.count()", "outputs": [{"execution_count": 9, "output_type": "execute_result", "data": {"text/plain": "5341041"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 10, "cell_type": "code", "source": "abortedSubsessionIds.count()", "outputs": [{"execution_count": 10, "output_type": "execute_result", "data": {"text/plain": "310976"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": true}}, {"source": "Find how many aborted-session pings have the same subsessionId as shutdown pings.", "cell_type": "markdown", "metadata": {}}, {"execution_count": 11, "cell_type": "code", "source": "overlappingIds = abortedSubsessionIds.intersection(shutdownSubsessionIds)", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"source": "How many overlapping (broken) aborted-session pings are there, then?", "cell_type": "markdown", "metadata": {}}, {"execution_count": 12, "cell_type": "code", "source": "overlappingIds.count()", "outputs": [{"execution_count": 12, "output_type": "execute_result", "data": {"text/plain": "1613"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": true}}], "nbformat": 4, "metadata": {"kernelspec": {"display_name": "Python 2", "name": "python2", "language": "python"}, "language_info": {"mimetype": "text/x-python", "nbconvert_exporter": "python", "version": "2.7.9", "name": "python", "file_extension": ".py", "pygments_lexer": "ipython2", "codemirror_mode": {"version": 2, "name": "ipython"}}}}
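The overlap count in the notebook relies on Spark's `RDD.intersection`, which returns each common element only once, so the final figure is a count of distinct subsessionIds rather than of pings. The sketch below reproduces the same logic with plain Python sets so it can be tried without a cluster. The function name `count_overlapping_ids` and the sample records are hypothetical; the dictionary keys mirror the flattened property paths used in the notebook.

```python
def count_overlapping_ids(pings):
    """Count distinct subsessionIds seen with both reasons.

    `pings` is an iterable of dicts keyed by flattened property paths.
    This is an illustrative stand-in for the RDD pipeline, not part of
    moztelemetry.
    """
    shutdown_ids = set()
    aborted_ids = set()
    for ping in pings:
        reason = ping.get("payload/info/reason", "")
        subsession_id = ping.get("payload/info/subsessionId")
        if subsession_id is None:
            continue  # skip pings that carry no subsessionId
        if reason == "shutdown":
            shutdown_ids.add(subsession_id)
        elif reason == "aborted-session":
            aborted_ids.add(subsession_id)
    # Set intersection deduplicates, like RDD.intersection does.
    return len(aborted_ids & shutdown_ids)


# Made-up sample data: only id "a" appears with both reasons.
sample_pings = [
    {"payload/info/reason": "shutdown", "payload/info/subsessionId": "a"},
    {"payload/info/reason": "shutdown", "payload/info/subsessionId": "b"},
    {"payload/info/reason": "aborted-session", "payload/info/subsessionId": "a"},
    {"payload/info/reason": "aborted-session", "payload/info/subsessionId": "a"},
    {"payload/info/reason": "aborted-session", "payload/info/subsessionId": "c"},
    {"payload/info/reason": "environment-change", "payload/info/subsessionId": "a"},
]

print(count_overlapping_ids(sample_pings))
```

Because duplicates collapse, two aborted-session pings sharing one subsessionId with a shutdown ping contribute a single overlap; counting affected pings instead would need a join that keeps multiplicity.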