
@Dexterp37
Created September 22, 2015 17:48
Find how many aborted-session "main" pings share their subsessionId with shutdown "main" pings (using clientId sampling)
{"nbformat_minor": 0, "cells": [{"source": "# Bug 1196852 - aborted-session pings may not get deleted properly", "cell_type": "markdown", "metadata": {}}, {"source": "We want to gauge how many aborted-session pings share their subsessionId with a shutdown ping. That means checking, server-side:\n\n * for buildIds >= 20150722,\n * how many aborted-session and shutdown pings have the same subsessionId.", "cell_type": "markdown", "metadata": {}}, {"execution_count": 1, "cell_type": "code", "source": "import binascii\nimport ujson as json\nimport pandas as pd\nimport numpy as np\n\nfrom moztelemetry import get_pings, get_pings_properties, get_one_ping_per_client, get_clients_history", "outputs": [], "metadata": {"scrolled": true, "collapsed": false, "trusted": true}}, {"source": "### Define some handy filtering functions", "cell_type": "markdown", "metadata": {}}, {"execution_count": 2, "cell_type": "code", "source": "def get_interesting_reasons(pings):\n    # Keep only the \"shutdown\" and \"aborted-session\" main pings.\n    return pings.filter(lambda p: p.get(\"payload/info/reason\", \"\") in [\"shutdown\", \"aborted-session\"])\n\ndef sample_by_clientId(ping):\n    # Deterministic ~10% sample: hash the clientId into one of 100 buckets\n    # and keep buckets 0-9, so a sampled client keeps all of its pings.\n    client_id = ping.get(\"clientId\", None)\n    return client_id and binascii.crc32(client_id) % 100 < 10", "outputs": [], "metadata": {"collapsed": true, "trusted": true}}, {"source": "### Get the main pings, with the relevant reasons, and filter them by buildId", "cell_type": "markdown", "metadata": {}}, {"source": "Let's fetch the \"main\" pings submitted by Nightly builds with buildIds from 20150722 through 20150829...", "cell_type": "markdown", "metadata": {}}, {"execution_count": 3, "cell_type": "code", "source": "build_ids = (\"20150722000000\", \"20150829999999\")\nmain_pings = get_pings(sc, app=\"Firefox\", channel=\"nightly\", build_id=build_ids, doc_type=\"main\", schema=\"v4\", fraction=1.0)", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"source": "Sample about 10% of the clients, bucketing by clientId so that each sampled client keeps all of its pings...", "cell_type": "markdown", "metadata": {}}, {"execution_count": 4, "cell_type": "code", "source": "sampled_pings = main_pings.filter(sample_by_clientId)", "outputs": [], "metadata": {"collapsed": true, "trusted": true}}, {"source": "... and extract only the attributes we need from the Telemetry submissions:", "cell_type": "markdown", "metadata": {}}, {"execution_count": 5, "cell_type": "code", "source": "subset = get_pings_properties(sampled_pings, [\"meta/sampleId\",\n \"payload/info/reason\",\n \"payload/info/subsessionId\"])", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"source": "Keep only the \"shutdown\" and \"aborted-session\" pings.", "cell_type": "markdown", "metadata": {}}, {"execution_count": 6, "cell_type": "code", "source": "filteredSubset = get_interesting_reasons(subset)", "outputs": [], "metadata": {"collapsed": true, "trusted": true}}, {"source": "Caching is essential for an iterative, interactive development workflow, although it is disabled in this run:", "cell_type": "markdown", "metadata": {}}, {"execution_count": 7, "cell_type": "code", "source": "#cached = filteredSubset.cache()", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"source": "How many pings are we looking at?", "cell_type": "markdown", "metadata": {}}, {"execution_count": 8, "cell_type": "code", "source": "filteredSubset.count()", "outputs": [{"execution_count": 8, "output_type": "execute_result", "data": {"text/plain": "561858"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": true}}, {"source": "## Count the overlapping subsessionIds", "cell_type": "markdown", "metadata": {}}, {"source": "Let's collect the subsessionIds of the \"shutdown\" main pings and those of the \"aborted-session\" main pings.", "cell_type": "markdown", "metadata": {}}, {"execution_count": 9, "cell_type": "code", "source": "shutdownSubsessionIds = filteredSubset.filter(lambda p: p.get(\"payload/info/reason\", \"\") == \"shutdown\").map(lambda p: p[\"payload/info/subsessionId\"])\nabortedSubsessionIds = filteredSubset.filter(lambda p: p.get(\"payload/info/reason\", \"\") == \"aborted-session\").map(lambda p: p[\"payload/info/subsessionId\"])", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"source": "How many of them?", "cell_type": "markdown", "metadata": {}}, {"execution_count": 10, "cell_type": "code", "source": "shutdownSubsessionIds.count()", "outputs": [{"execution_count": 10, "output_type": "execute_result", "data": {"text/plain": "531689"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 11, "cell_type": "code", "source": "abortedSubsessionIds.count()", "outputs": [{"execution_count": 11, "output_type": "execute_result", "data": {"text/plain": "30070"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": true}}, {"source": "Find the subsessionIds shared by aborted-session and shutdown pings.", "cell_type": "markdown", "metadata": {}}, {"execution_count": 12, "cell_type": "code", "source": "overlappingIds = abortedSubsessionIds.intersection(shutdownSubsessionIds)", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"source": "How many overlapping (broken) aborted-session subsessionIds are there? Since `intersection` drops duplicates, this counts distinct subsessionIds rather than pings.", "cell_type": "markdown", "metadata": {}}, {"execution_count": 13, "cell_type": "code", "source": "overlappingIds.count()", "outputs": [{"execution_count": 13, "output_type": "execute_result", "data": {"text/plain": "142"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": true}}], "nbformat": 4, "metadata": {"kernelspec": {"display_name": "Python 2", "name": "python2", "language": "python"}, "language_info": {"mimetype": "text/x-python", "nbconvert_exporter": "python", "version": "2.7.9", "name": "python", "file_extension": ".py", "pygments_lexer": "ipython2", "codemirror_mode": {"version": 2, "name": "ipython"}}}}
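The `sample_by_clientId` filter in the notebook samples clients rather than individual pings: the bucket depends only on the CRC32 of the clientId, so every ping from a sampled client is kept together. The following is a minimal standalone sketch of that property in plain Python 3 rather than Spark, using made-up client IDs; the name `sample_by_client_id` and its `percent` parameter are illustrative, not part of moztelemetry. In Python 3, `binascii.crc32` takes bytes and always returns an unsigned value, whereas Python 2 could return a negative one, so the exact clients selected may differ from the notebook's Python 2 run.

```python
import binascii
from collections import defaultdict


def sample_by_client_id(ping, percent=10):
    """Return True if the ping's client falls in the sampled buckets.

    Hypothetical re-implementation of the notebook's sample_by_clientId:
    hash the clientId with CRC32, reduce it to a bucket in 0..99 and keep
    buckets below `percent`.
    """
    client_id = ping.get("clientId")
    if not client_id:
        return False
    bucket = binascii.crc32(client_id.encode("utf-8")) % 100
    return bucket < percent


# Made-up data: 1000 clients, three pings each.
pings = [
    {"clientId": "client-%04d" % i, "seq": n}
    for i in range(1000)
    for n in range(3)
]

sampled = [p for p in pings if sample_by_client_id(p)]

# Group the kept pings by client: each sampled client keeps all 3 pings.
per_client = defaultdict(int)
for p in sampled:
    per_client[p["clientId"]] += 1

print(len(per_client), "of 1000 clients sampled")
print(all(count == 3 for count in per_client.values()))
```

Sampling by a hash of the clientId, rather than by a random draw per ping, keeps each client's shutdown and aborted-session pings in the same sample, which is what makes the later subsessionId comparison meaningful.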
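Spark's `RDD.intersection` returns each common element only once, so the notebook's final count is the number of distinct subsessionIds shared by the two groups, which may be smaller than the number of aborted-session pings carrying them. The following is a minimal pure-Python sketch of that difference; the `(reason, subsessionId)` pairs are invented and stand in for the `filteredSubset` RDD.

```python
from collections import Counter

# Invented (reason, subsessionId) pairs standing in for filteredSubset.
pings = [
    ("shutdown", "s1"),
    ("shutdown", "s2"),
    ("shutdown", "s3"),
    ("aborted-session", "s2"),  # overlaps with a shutdown ping
    ("aborted-session", "s2"),  # a second aborted-session ping with the same ID
    ("aborted-session", "s4"),  # no matching shutdown ping
]

shutdown_ids = [sid for reason, sid in pings if reason == "shutdown"]
aborted_ids = [sid for reason, sid in pings if reason == "aborted-session"]

# Like RDD.intersection, a set intersection yields each shared ID once.
overlapping_ids = set(aborted_ids) & set(shutdown_ids)

# Counting pings instead of distinct IDs requires keeping the duplicates.
aborted_counts = Counter(aborted_ids)
overlapping_pings = sum(aborted_counts[sid] for sid in overlapping_ids)

print(len(overlapping_ids))  # 1 distinct subsessionId
print(overlapping_pings)     # 2 aborted-session pings carry it
```

To count pings rather than distinct IDs in PySpark, one option (untested here) would be a join against the deduplicated shutdown IDs, e.g. `abortedSubsessionIds.map(lambda s: (s, 1)).join(shutdownSubsessionIds.distinct().map(lambda s: (s, 1))).count()`.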