
@mreid-moz
mreid-moz / computer.py
Created December 23, 2019 15:47
IntcodeComputer from Advent of Code 2019
import logging
import itertools

class IntcodeComputer:
    extra_memory = 100000

    def __init__(self, program, label="Computron"):
        self.halt = False
        # Pad the program with zeroed memory beyond its own length.
        self.program = program + [0] * IntcodeComputer.extra_memory
        self.label = label
        self.offset = 0
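The preview above only shows the constructor. As a minimal sketch of the core loop such a computer runs, assuming only the Day 2 instruction set from Advent of Code 2019 (opcode 1 adds, 2 multiplies, 99 halts, all parameters in position mode), a hypothetical `run_intcode` function could look like this. It is not the author's implementation and ignores later additions such as parameter modes and input/output.

```python
def run_intcode(program: list) -> list:
    """Run a Day 2-style Intcode program and return the final memory."""
    memory = list(program)  # copy so the caller's list is not mutated
    ip = 0  # instruction pointer
    while True:
        opcode = memory[ip]
        if opcode == 99:  # halt
            return memory
        # Both operands and the destination are addresses (position mode).
        a, b, dest = memory[ip + 1:ip + 4]
        if opcode == 1:
            memory[dest] = memory[a] + memory[b]
        elif opcode == 2:
            memory[dest] = memory[a] * memory[b]
        else:
            raise ValueError(f"unknown opcode {opcode} at position {ip}")
        ip += 4  # every supported instruction is four cells long
```

For example, `run_intcode([1, 9, 10, 3, 2, 3, 11, 0, 99, 30, 40, 50])` returns `[3500, 9, 10, 70, 2, 3, 11, 0, 99, 30, 40, 50]`.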
# Usage:
# cd ~/mozilla/github/telemetry-streaming/configs
# for f in *.json; do
#   python make_view.py "$f"
# done
### Use the "simple" path for rocket.
# python make_view.py rocket_android_events_schemas.json simplify
from collections import defaultdict
import json
mreid-moz / Data anomalies.txt
Created August 2, 2019 13:53
Bug 1553247 - Canonical list of known data anomalies
{
  "anomalies": [
    {
      "summary": "Armagaddon data deletion",
      "start_time": "2019-05-04T11:00:00Z",
      "end_time": "2019-05-11T11:00:00Z",
      "info_url": "https://bugzilla.mozilla.org/show_bug.cgi?id=1550787",
      "reference_urls": [
        "https://blog.mozilla.org/blog/2019/05/09/what-we-do-when-things-go-wrong/",
        "https://hacks.mozilla.org/2019/05/technical-details-on-the-recent-firefox-add-on-outage/",
mreid-moz / bq_udf.txt
Created March 15, 2019 14:06
Using a BigQuery udf from the bq cli
$ cat udf.sql
CREATE TEMP FUNCTION foo(v STRING) AS (
(SELECT concat(v, 'foo'))
);
$ cat bq_test.sql
SELECT os, foo(os) FROM telemetry.clients_daily_v6 WHERE submission_date_s3 = '2019-02-10' LIMIT 3
$ cat udf.sql bq_test.sql | bq query
Waiting on bqjob_r167fb1853f68d6fd_000001694a7a7543_1 ... (0s) Current status: DONE
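The shell pipeline works because the temporary function definition and the query arrive at `bq query` as one script, with the `CREATE TEMP FUNCTION` statement first. A minimal Python sketch of the same idea follows; the helper names `build_query_script` and `run_with_udfs` are hypothetical, and it assumes, as the example above shows, that `bq query` reads the script from stdin when no query argument is given.

```python
import subprocess
from pathlib import Path


def build_query_script(udf_sql: str, query_sql: str) -> str:
    """Put the UDF definitions first, then the query, like `cat udf.sql bq_test.sql`."""
    # CREATE TEMP FUNCTION statements must come before the query that uses
    # them in the same script; a single newline keeps the two files apart.
    return udf_sql.rstrip() + "\n" + query_sql.lstrip()


def run_with_udfs(udf_path: str, query_path: str) -> None:
    """Send the combined script to `bq query` on stdin, as the shell pipe does."""
    script = build_query_script(Path(udf_path).read_text(), Path(query_path).read_text())
    subprocess.run(["bq", "query"], input=script, text=True, check=True)
```

Calling `run_with_udfs("udf.sql", "bq_test.sql")` would then mirror `cat udf.sql bq_test.sql | bq query`.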
mreid-moz / dejsonlz4.py
Created July 4, 2018 16:55
Decompress ".jsonlz4" files and print their contents
# Uncompress ".jsonlz4" files. See also:
# https://github.com/avih/dejsonlz4
# Requires lz4:
# $ pip install lz4
import lz4.block
import sys
with open(sys.argv[1], "rb") as fin:
    content = fin.read()
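The snippet stops after reading the file. A ".jsonlz4" (mozlz4) file starts with the 8-byte magic `mozLz40\0`, followed by an LZ4 block whose first 4 bytes hold the decompressed size as a little-endian integer. A sketch of the remaining steps, with hypothetical helper names `split_mozlz4` and `decompress_mozlz4`, could look like this; it relies on `lz4.block.decompress()` reading that size prefix itself, which is its default behaviour.

```python
MOZLZ4_MAGIC = b"mozLz40\0"  # 8-byte header at the start of .jsonlz4 files


def split_mozlz4(content: bytes) -> tuple:
    """Check the magic header and return (declared_size, lz4_block)."""
    if not content.startswith(MOZLZ4_MAGIC):
        raise ValueError("not a .jsonlz4 file: missing mozLz40 header")
    block = content[len(MOZLZ4_MAGIC):]
    # The block begins with the decompressed size as a 4-byte little-endian int.
    declared_size = int.from_bytes(block[:4], "little")
    return declared_size, block


def decompress_mozlz4(content: bytes) -> bytes:
    """Decompress the payload of a .jsonlz4 file (needs `pip install lz4`)."""
    import lz4.block  # imported lazily so split_mozlz4 works without lz4

    _, block = split_mozlz4(content)
    # By default lz4.block.decompress() consumes the 4-byte size prefix.
    return lz4.block.decompress(block)
```

Applied to `content` from the snippet, `decompress_mozlz4(content).decode("utf-8")` would give the JSON text to print.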
mreid-moz / pin_paradox.py
Created June 19, 2018 15:17
Like the birthday paradox, but for banking PINs. How many people should it take before you expect a duplicate PIN?
import random
### Converging on 124.908224 after 125000 iterations
iterations = 0
total_pins = 0
max_pin = 9999
bucket_count = max_pin // 10  # integer division: list repetition needs an int
histogram = [0] * bucket_count
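With `max_pin = 9999` there are n = 10,000 possible PINs (0000 through 9999), and the expectation can also be computed exactly. If T is the number of PINs drawn until one repeats, then E[T] is the sum over k ≥ 0 of P(T > k), and P(T > k) is the probability that the first k draws are all distinct, the product of (n - i)/n for i < k. For n = 10,000 this gives about 126 draws counting the duplicate, or about 125 distinct PINs before it, consistent with the figure noted in the script. The function names below are hypothetical, and the Monte Carlo helper is a compact re-implementation of the idea, not the gist's code.

```python
import random


def expected_draws_until_repeat(n: int) -> float:
    """Exact E[T], where T counts uniform draws from n values up to the first repeat."""
    total = 0.0
    p_distinct = 1.0  # P(T > k): the first k draws are all different
    for k in range(n + 1):
        total += p_distinct
        p_distinct *= (n - k) / n  # chance draw k+1 avoids the k values seen
        if p_distinct == 0.0:
            break
    return total


def simulate_draws_until_repeat(n: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of E[T] using a seeded RNG."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        seen = set()
        while True:
            pin = rng.randrange(n)
            if pin in seen:
                break
            seen.add(pin)
        total += len(seen) + 1  # +1 counts the draw that produced the repeat
    return total / trials
```

As a sanity check, `expected_draws_until_repeat(365)` is about 24.6, the classic expected group size for a shared birthday.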
mreid-moz / test_repartition.scala
Created January 15, 2018 18:15
Test repartitioning behaviour when writing parquet data.
import java.util.UUID.randomUUID
import scala.sys.process._
import com.mozilla.telemetry.utils.getOrCreateSparkSession
import java.util.zip.CRC32
val spark = getOrCreateSparkSession("test")
spark.sparkContext.setLogLevel("WARN")
import spark.implicits._
def getSampleId(clientId: String, modulus: Int) = {
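The body of `getSampleId` is cut off above. Given the `CRC32` import and the `modulus` parameter, a plausible reading (an assumption, not confirmed by the snippet) is a CRC32 checksum of the client id taken modulo `modulus`, which is how Mozilla's telemetry datasets describe `sample_id` (CRC32 of the client id, modulo 100). A Python sketch of that assumption, with the hypothetical name `get_sample_id`:

```python
import zlib


def get_sample_id(client_id: str, modulus: int = 100) -> int:
    """Bucket a client id by CRC32 of its UTF-8 bytes, modulo `modulus`."""
    # zlib.crc32 returns an unsigned 32-bit value in Python 3, matching
    # java.util.zip.CRC32.getValue() on the JVM side.
    return zlib.crc32(client_id.encode("utf-8")) % modulus
```

Because the checksum is deterministic, the same client always lands in the same bucket, which is what makes it usable as a partitioning key when testing repartitioning.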
mreid-moz / make_test_parquet_data.py
Last active September 24, 2017 22:39
Generate sample partitioned parquet data using pyarrow
# Generate a sample dataset with two partitioning fields
# `submission_date_s3` and `sample_id` and one or more
# actual parquet fields.
#
# Arrow and Parquet reference material at
# https://arrow.apache.org/docs/python/parquet.html
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
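A minimal sketch of the generation step described in the comments, assuming one hypothetical data column named `value` alongside the two partitioning fields. `pq.write_to_dataset` with `partition_cols` writes Hive-style `key=value` directories; the helper names `make_rows` and `write_partitioned` are illustrative, not the gist's code.

```python
import itertools


def make_rows(dates, sample_ids, values_per_partition):
    """Build rows covering every (submission_date_s3, sample_id) pair."""
    rows = []
    for date, sample_id in itertools.product(dates, sample_ids):
        for i in range(values_per_partition):
            # "value" is a hypothetical payload column.
            rows.append({"submission_date_s3": date, "sample_id": sample_id, "value": i})
    return rows


def write_partitioned(rows, root_path):
    """Write rows as a parquet dataset partitioned by date and sample_id."""
    # Third-party imports kept local so make_rows() runs without them.
    import pandas as pd
    import pyarrow as pa
    import pyarrow.parquet as pq

    table = pa.Table.from_pandas(pd.DataFrame(rows))
    # Produces directories like submission_date_s3=20170101/sample_id=0/
    pq.write_to_dataset(table, root_path, partition_cols=["submission_date_s3", "sample_id"])
```

For example, `write_partitioned(make_rows(["20170101", "20170102"], range(3), 10), "sample_data")` would create six partition directories under `sample_data`.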
#!/bin/bash
if [[ -z "$bucket" || -z "$date" ]]; then
    echo "Missing arguments!" 1>&2
    exit 1
fi
git clone https://github.com/mreid-moz/telemetry-batch-view.git
cd telemetry-batch-view
git checkout addons_view
mreid-moz / structs.py
Last active September 2, 2016 12:33 — forked from peterbe/structs.scala
StructType([
    StructField("additional_minidumps", ArrayType(StringType(), containsNull = False), nullable = True),
    StructField("addons", ArrayType(StringType(), containsNull = False), nullable = True),
    StructField("addons_checked", BooleanType(), nullable = False),
    StructField("address", StringType(), nullable = True),
    StructField("app_notes", StringType(), nullable = True),
    StructField("build_id", StringType(), nullable = True),
    StructField("classifications", StructType([
        StructField("jit", StructType([
            StructField("category", StringType(), nullable = True),