"main" ping size distributions on Nightly
# coding: utf-8
---
title: How big are the incoming "main" pings?
authors:
- georg_fritzsche
tags:
- ping size
- firefox
- main ping
created_at: 2017-05-25
updated_at: 2017-05-25
tldr: How big are the incoming "main" pings currently? Nearly all are under 400 KB.
---
# ### How big are the incoming "main" pings?
# In[1]:
import pandas as pd
import numpy as np
import matplotlib
import json
from matplotlib import pyplot as plt
from moztelemetry.dataset import Dataset
from moztelemetry import get_pings_properties, get_one_ping_per_client
import pylab
get_ipython().magic(u'pylab inline')
# Based on a 10% submission sample, determine what the ping sizes are.
# As we don't have any meta field that tracks the real ping sizes, we estimate them using the serialized JSON string length.
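# A minimal sketch of the size estimate described above, using only the standard library.
# The `sample_ping` dict and both helper names are hypothetical and only illustrate the idea.
# Note that `json.dumps` escapes non-ASCII characters by default, so the character count can
# differ from the UTF-8 byte count of the same document; neither is the size actually sent on the wire.

```python
import json


def estimate_size_chars(ping):
    # Same estimate as in this notebook: length of the default JSON serialization.
    return len(json.dumps(ping))


def estimate_size_bytes(ping):
    # Alternative estimate: UTF-8 byte length with non-ASCII characters left unescaped.
    return len(json.dumps(ping, ensure_ascii=False).encode("utf-8"))


# Hypothetical, heavily simplified ping used only for illustration.
sample_ping = {"type": "main", "payload": {"note": "caf\u00e9"}}

chars = estimate_size_chars(sample_ping)
utf8_bytes = estimate_size_bytes(sample_ping)
print(chars, utf8_bytes)
```

# For mostly-ASCII payloads the two estimates are nearly identical, so the simpler character count is a reasonable proxy.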
# In[2]:
Dataset.from_source("telemetry").schema
# In[3]:
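# `sc` is not defined in this script; it is assumed to be the SparkContext supplied by the notebook environment.
pings = (
    Dataset.from_source("telemetry")
    .where(docType='main')
    .where(submissionDate=lambda x: int(x) >= 20170501 and int(x) < 20170514)
    .where(appUpdateChannel="nightly")
    .records(sc, sample=0.1)
)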
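# The submission date filter above is a half-open range: it includes 2017-05-01 and excludes 2017-05-14.
# A small standalone sketch of the same predicate, with a hypothetical helper name, makes the boundaries explicit.

```python
def in_sample_window(submission_date):
    # Submission dates arrive as "YYYYMMDD" strings, so compare them as integers.
    return 20170501 <= int(submission_date) < 20170514


# Hypothetical inputs covering both boundaries of the window.
checks = {d: in_sample_window(d) for d in ["20170430", "20170501", "20170513", "20170514"]}
print(checks)
```

# The window therefore covers 13 days of Nightly submissions, 2017-05-01 through 2017-05-13.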
# In[4]:
sizes = pings.map(lambda p: len(json.dumps(p)))
size_series = pd.Series(sizes.collect())
# Show the distribution of sizes in KB (serialized length divided by 1024).
# In[5]:
(size_series / 1024).describe(percentiles=[0.25, 0.5, 0.75, 0.9, 0.95, 0.99, 0.999])
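# To check a claim such as "nearly all pings are under 400 KB" directly, the share of pings below a threshold can be computed alongside the percentiles.
# The sketch below uses made-up sizes in place of `size_series`; the values and the names `example_sizes` and `THRESHOLD_KB` are assumptions for illustration, not measured data.

```python
import pandas as pd

# Hypothetical sizes in bytes, standing in for the real size_series.
example_sizes = pd.Series([20 * 1024, 50 * 1024, 80 * 1024, 120 * 1024, 600 * 1024])
example_kb = example_sizes / 1024

THRESHOLD_KB = 400

# The mean of a boolean Series is the fraction of True values.
share_below = (example_kb < THRESHOLD_KB).mean()

# A single high percentile, as also reported by describe(percentiles=[...]).
p99 = example_kb.quantile(0.99)

print(share_below, p99)
```

# Applied to the real data, the same expression would be `((size_series / 1024) < 400).mean()`.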
# In[8]:
size_series.hist(xrot=45)
# In[7]:
size_series.hist(xrot=45, log=True)
# In[ ]: