@chutten
Last active April 3, 2017 21:05
# coding: utf-8
---
title: This is a Knowledge Template Header
authors:
- sally_smarts
- wesley_wisdom
tags:
- startup
- firefox
- example
created_at: 2016-06-29
updated_at: 2016-06-30
tldr: This is a short description of the content and findings of the post.
---
# *NOTE: In the TL;DR, optimize for **clarity** and **comprehensiveness**. The goal is to convey the post with the least amount of friction, especially since IPython/Beaker notebooks require much more scrolling than blog posts. Give the reader a correct understanding of the post's takeaway, and of the points supporting it, without making them strain through paragraphs of prose. Bullet points are great here, but are up to you. Try to avoid academic-paper-style abstracts.*
#
# - A specific title keeps someone browsing posts from finding only vague, similar-sounding titles
# - An itemized, short, and clear TL;DR helps readers understand your content
# - Setting the reader's context with a motivation section helps them judge your choices
# - Visualizations that can stand alone, via legends, labels, and captions, are more understandable and powerful
#
# ### Motivation
# *NOTE: In this section, optimize for **context setting**, as specifically as you can. For instance, this post is generally a set of standards for work in the repo. The specific motivation is to add the least friction to the current workflow while making the work painless to aggregate later.*
#
# The knowledge repo was created to consolidate research work that is currently scattered across emails, blog posts, and presentations, so that people don't have to redo work that has already been done.
# ### Putting Big Bold Headers with Clear Takeaways Will Help Us Aggregate Later
# In[1]:
import pandas as pd
import numpy as np
import matplotlib
from matplotlib import pyplot as plt
from moztelemetry.dataset import Dataset
from moztelemetry import get_pings_properties, get_one_ping_per_client
# The goal of this example is to determine whether Firefox has a similar startup time distribution on all operating systems. Let's start by fetching a 1% sample of Telemetry submissions for a given submission date...
# In[2]:
Dataset.from_source("telemetry").schema
# In[12]:
pings = (Dataset.from_source("telemetry")
         .where(docType='main')
         .where(submissionDate="20170328")
         .records(sc, sample=0.01))
# In[13]:
subset = get_pings_properties(pings, ["clientId",
                                      "payload/addonHistograms"])
# In[14]:
full_count = subset.count()
full_count
# In[15]:
filtered_count = subset.filter(lambda p: p["payload/addonHistograms"] is not None).count()
filtered_count
# In[16]:
1.0 * filtered_count / full_count
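# The ratio above is the share of sampled pings that carry an `addonHistograms` section. As a minimal sketch of the same filter-and-divide logic without a Spark cluster, the block below runs it over a plain list of dicts. The `sample_pings` data and the `fraction_with_addon_histograms` helper are hypothetical, made up for illustration; they are not real Telemetry data or part of moztelemetry.

```python
# Hypothetical stand-ins for the dicts that get_pings_properties yields.
sample_pings = [
    {"clientId": "a", "payload/addonHistograms": {"addon-1": {}}},
    {"clientId": "b", "payload/addonHistograms": None},
    {"clientId": "c", "payload/addonHistograms": {"addon-1": {}, "addon-2": {}}},
    {"clientId": "d", "payload/addonHistograms": None},
]


def fraction_with_addon_histograms(pings):
    """Return the share of pings whose addonHistograms field is present."""
    pings = list(pings)
    if not pings:
        # Avoid dividing by zero on an empty sample.
        return 0.0
    present = [p for p in pings if p["payload/addonHistograms"] is not None]
    # float() keeps the division exact under Python 2 as well as Python 3.
    return float(len(present)) / len(pings)


fraction = fraction_with_addon_histograms(sample_pings)
```

# Guarding the empty case matters on small samples, where a filter can easily leave nothing to divide by.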
# In[17]:
filtered = subset.filter(lambda p: p["payload/addonHistograms"] is not None)
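# In[18]:
# Peek at a couple of pings that carry add-on histograms.
filtered.take(2)
# In[22]:
# Pull out one add-on histogram dict to inspect its structure.
f = filtered.take(1)[0]['payload/addonHistograms']
f
# In[24]:
f.keys()
# In[25]:
# Emit one (key, 1) pair per add-on histogram key in each ping.
addons = filtered.flatMap(lambda p: p['payload/addonHistograms'].keys()).map(lambda key: (key, 1))
# In[23]:
addons.take(1)
# In[26]:
# Count how many sampled pings report each add-on histogram key.
addons.countByKey()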
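# The key-counting step (`flatMap` over each ping's histogram keys, then `countByKey`) can be sketched in plain Python with `collections.Counter`. This is only an illustration of what the Spark pipeline computes, assuming each ping is a dict shaped like the output of `get_pings_properties`. The `sample_pings` data and the `count_addon_keys` helper are hypothetical.

```python
from collections import Counter

# Hypothetical stand-ins for the dicts that get_pings_properties yields.
sample_pings = [
    {"clientId": "a", "payload/addonHistograms": {"addon-1": {}}},
    {"clientId": "b", "payload/addonHistograms": None},
    {"clientId": "c", "payload/addonHistograms": {"addon-1": {}, "addon-2": {}}},
]


def count_addon_keys(pings):
    """Count how many pings report each add-on histogram key."""
    counts = Counter()
    for ping in pings:
        histograms = ping["payload/addonHistograms"]
        if histograms is None:
            # Mirrors the `is not None` filter applied before the flatMap.
            continue
        # Each key appears at most once per ping, so this counts pings per key.
        counts.update(histograms.keys())
    return dict(counts)


key_counts = count_addon_keys(sample_pings)
```

# Because the sample is not reduced to one ping per client, these are counts of pings, not of distinct clients.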