chutten/New+Report.ipynb Secret

## New+Report.ipynb

      
## New+Report.py

# coding: utf-8
---
title: This is a Knowledge Template Header
authors:
- sally_smarts
- wesley_wisdom
tags:
- startup
- firefox
- example
created_at: 2016-06-29
updated_at: 2016-06-30
tldr: This is a short description of the content and findings of the post.
---
# *NOTE: In the TL;DR, optimize for **clarity** and **comprehensiveness**. The goal is to convey the post with the least amount of friction, especially since IPython/Beaker notebooks require much more scrolling than blog posts. Give the reader a correct understanding of the post's takeaway and the points supporting it, without making them strain through paragraphs of prose. Bullet points are great here, but that is up to you. Try to avoid academic-paper-style abstracts.*
#
#  - Having a specific title helps readers browsing posts avoid finding only vague, similar-sounding titles
#  - Having an itemized, short, and clear TL;DR will help readers understand your content
#  - Setting the reader's context with a motivation section helps them understand how to judge your choices
#  - Visualizations that can stand alone, via legends, labels, and captions, are more understandable and powerful
#

# ### Motivation

# *NOTE: In this section, optimize for **context setting**, as specifically as you can. For instance, this post is generally a set of standards for work in the repo. The specific motivation is to add the least friction to the current workflow while making it painless to aggregate the work later.*
#
# The knowledge repo was created to consolidate research work that is currently scattered across emails, blog posts, and presentations, so that people don't redo each other's work.

# ### Putting Big Bold Headers with Clear Takeaways Will Help Us Aggregate Later

# In[1]:

import pandas as pd
import numpy as np
import matplotlib

from matplotlib import pyplot as plt
from moztelemetry.dataset import Dataset
from moztelemetry import get_pings_properties, get_one_ping_per_client


# The goal of this example is to determine if Firefox has a similar startup time distribution on all operating systems. Let's start by fetching a 1% sample (`sample=0.01`) of Telemetry `main` pings for a given submission date...

# In[2]:

Dataset.from_source("telemetry").schema


# In[12]:

pings = (
    Dataset.from_source("telemetry")
    .where(docType='main')
    .where(submissionDate="20170328")
    .records(sc, sample=0.01)
)


# In[13]:

subset = get_pings_properties(pings, ["clientId",
                                      "payload/addonHistograms"])


# In[14]:

full_count = subset.count()
full_count
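
# The `get_pings_properties` call above reduces each ping to the slash-delimited paths requested, and the later cells rely on a missing path coming back as `None`. The sketch below imitates only that path lookup on plain dictionaries. It is not the moztelemetry implementation; `extract_paths` and the example ping are hypothetical names and data made up for illustration.

```python
def extract_paths(ping, paths):
    """Return {path: value} for each slash-delimited path, or None when missing."""
    result = {}
    for path in paths:
        node = ping
        for part in path.split("/"):
            if isinstance(node, dict) and part in node:
                node = node[part]
            else:
                # Any missing segment makes the whole path resolve to None.
                node = None
                break
        result[path] = node
    return result


# Hypothetical ping, shaped only loosely like a real Telemetry ping.
example_ping = {
    "clientId": "abc",
    "payload": {"addonHistograms": {"addon@example.com": {"SOME_HISTOGRAM": {}}}},
}
example_subset = extract_paths(example_ping, ["clientId", "payload/addonHistograms"])
```

# A ping without a `payload` section yields `None` for `"payload/addonHistograms"`, which is the condition the `is not None` filters test for.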


# In[15]:

filtered_count = subset.filter(lambda p: p["payload/addonHistograms"] is not None).count()
filtered_count


# In[16]:

1.0 * filtered_count / full_count
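
# The ratio above is the share of sampled pings that carry any add-on histograms. As a rough local illustration, the same arithmetic can be run on a plain list of records without a Spark context. `histogram_coverage` and the sample records are hypothetical and not part of the notebook.

```python
def histogram_coverage(records, key="payload/addonHistograms"):
    """Fraction of records whose value for `key` is not None."""
    total = 0
    with_key = 0
    for record in records:
        total += 1
        if record.get(key) is not None:
            with_key += 1
    # Guard against an empty sample; float() keeps the division exact on Python 2 too.
    return float(with_key) / total if total else 0.0


# Hypothetical records in the shape the notebook's `subset` is assumed to have.
sample_records = [
    {"clientId": "a", "payload/addonHistograms": {"addon@example.com": {}}},
    {"clientId": "b", "payload/addonHistograms": None},
    {"clientId": "c", "payload/addonHistograms": {"other@example.com": {}}},
    {"clientId": "d", "payload/addonHistograms": {}},
]
coverage = histogram_coverage(sample_records)
```

# Note that an empty dictionary still counts as present, because the notebook's filter tests `is not None` rather than truthiness.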


# In[17]:

filtered = subset.filter(lambda p: p["payload/addonHistograms"] is not None)


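# In[22]:

# Inspect the add-on histograms of a single ping.
f = filtered.take(1)[0]['payload/addonHistograms']
f


# In[24]:

f.keys()


# In[25]:

# Emit one (key, 1) pair for every top-level key of each ping's addonHistograms.
addons = filtered.flatMap(lambda p: p['payload/addonHistograms'].keys()).map(lambda key: (key, 1))


# In[23]:

addons.take(1)


# In[26]:

# Count how many sampled pings report each key.
addons.countByKey()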


# In[18]:

filtered.take(2)
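
# The `flatMap` / `map` / `countByKey` cells above tally how many sampled pings report each top-level key of `payload/addonHistograms`. A minimal local sketch of the same tally, using `collections.Counter` instead of Spark, is shown below. `count_histogram_keys` and the sample records are hypothetical.

```python
from collections import Counter


def count_histogram_keys(records, key="payload/addonHistograms"):
    """Count, per top-level histogram key, how many records contain it."""
    counts = Counter()
    for record in records:
        histograms = record.get(key)
        if histograms is None:
            # Mirrors the `is not None` filter applied before the flatMap.
            continue
        # Dictionary keys are unique, so each record adds at most 1 per key.
        counts.update(histograms.keys())
    return dict(counts)


# Hypothetical records; the add-on IDs are invented for illustration.
sample_records = [
    {"payload/addonHistograms": {"a@example.com": {}, "b@example.com": {}}},
    {"payload/addonHistograms": {"a@example.com": {}}},
    {"payload/addonHistograms": None},
]
key_counts = count_histogram_keys(sample_records)
```

# Because each record contributes a given key at most once, the result counts pings rather than individual histograms, which matches what `countByKey` returns for the `(key, 1)` pairs.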
