@maurodoglio
maurodoglio / add_pyspark_dependency.ipynb
Created February 17, 2017 14:41
Add PySpark dependency
@maurodoglio
maurodoglio / crash_pings.json
Created January 4, 2017 14:18
Crash pings example
This file has been truncated, but you can view the full file.
[
{
"clientId": "5998b748-652b-47d5-8b41-2bf8b244357c",
"id": "ecb6d957-3dd4-47ec-ae3b-56618d11c170",
"environment": {
"profile": {
"resetDate": 16581,
"creationDate": 16319
},
"settings": {
from moztelemetry.dataset import Dataset
# Let's start by selecting the `telemetry` dataset.
# This will load all the metadata about available dimensions and file locations.
dataset = Dataset.from_source('telemetry')
# The list of dimensions is now available on the `schema` attribute.
assert dataset.schema == [
u'submissionDate',
u'sourceName',
#!/usr/bin/pyspark
import logging
from os import environ
from mozaggregator.aggregator import aggregate_metrics
from mozaggregator.db import submit_aggregates
from pyspark import SparkContext, SparkConf
conf = SparkConf().setAppName('telemetry-aggregates')
#!/bin/bash
conda install psycopg2 --yes
git clone https://github.com/maurodoglio/bz2db.git
pip install -r bz2db/requirements.txt
cd bz2db
python update_bugs.py
@maurodoglio
maurodoglio / example.py
Created August 16, 2016 13:33
Retrieve pings by docType and submissionDate
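from moztelemetry.dataset import Dataset
# `sc` is the SparkContext provided by the PySpark shell or notebook.
dataset = Dataset.from_source('telemetry')
filtered_dataset = dataset.where(docType='main',
                                 submissionDate=lambda x: x.startswith('20160701'))
# Load a 10% sample of the matching records.
pings = filtered_dataset.records(sc, sample=0.1)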
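
The `where` call above mixes two kinds of condition: an exact value (`docType='main'`) and a predicate (`submissionDate=lambda ...`). The following is a minimal standalone sketch of how such filters could be evaluated against a record's dimensions. It is not moztelemetry's implementation; the `matches` helper and the sample records are hypothetical and exist only for illustration.

```python
def matches(dimensions, **filters):
    """Return True if every filter accepts the matching dimension value.

    Hypothetical helper: a filter is either a plain value (compared with ==)
    or a callable predicate that receives the dimension value.
    """
    for name, condition in filters.items():
        value = dimensions.get(name)
        if callable(condition):
            # A missing dimension can never satisfy a predicate.
            if value is None or not condition(value):
                return False
        elif value != condition:
            return False
    return True


# Made-up dimension records, not real telemetry data.
records = [
    {"docType": "main", "submissionDate": "20160701"},
    {"docType": "main", "submissionDate": "20160702"},
    {"docType": "crash", "submissionDate": "20160701"},
]

selected = [
    r for r in records
    if matches(r,
               docType="main",
               submissionDate=lambda x: x.startswith("20160701"))
]
print(selected)
```

Accepting a callable for any dimension lets one keyword argument express prefix, range, or set-membership tests without a separate query syntax.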
from moztelemetry import get_pings
pings = get_pings(sc, doc_type='OTHER', submission_date='20160714').filter(lambda x: x['meta']['docType'] == 'sync')
first_ping = pings.take(1)