Mikhail Popov bearloga

bearloga / waxer-pmap.ipynb
Created Oct 22, 2020
Demo of using pmap to fetch metrics from Wikimedia Analytics Query Service API for different combinations of dimensions
bearloga / upgrade_packages.R
Last active Oct 8, 2020
This script re-installs packages after upgrading R (on Linux or macOS), since libraries cannot be reused across minor versions (e.g. when upgrading from 3.2.3 to 3.3.2). It detects whether each package was installed from CRAN or from GitHub/Git and re-installs it with the appropriate function. Usage: `Rscript upgrade_packages.R`
# WMF only:
if (file.exists("/etc/wikimedia-cluster")) {
  message('Detected that this script is being run on a WMF machine ("', Sys.info()["nodename"], '"). Setting proxies...')
  Sys.setenv("http_proxy" = "http://webproxy.eqiad.wmnet:8080")
  Sys.setenv("https_proxy" = "http://webproxy.eqiad.wmnet:8080")
}
# General use:
message("Checking for a personal library...")
if (!dir.exists(Sys.getenv("R_LIBS_USER"))) {
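
The preview above is cut off before the detection step mentioned in the description. As a rough illustration only (this is not the gist's actual code), telling CRAN installs apart from GitHub installs could be sketched as follows. The helper name `package_source` is hypothetical, and the sketch assumes that CRAN installs record `Repository: CRAN` in their DESCRIPTION while installs made with {remotes} or {devtools} record `RemoteType`, `RemoteUsername`, and `RemoteRepo`.

```r
# Hypothetical helper (not from the gist): classify an installed package by the
# fields recorded in its DESCRIPTION file.
# Assumptions: CRAN installs carry `Repository: CRAN`; installs made with
# {remotes}/{devtools} carry `RemoteType`, `RemoteUsername`, and `RemoteRepo`.
package_source <- function(desc) {
  if (identical(desc[["Repository"]], "CRAN")) {
    return(list(source = "cran"))
  }
  if (identical(desc[["RemoteType"]], "github")) {
    # Build the "user/repo" string for a GitHub-hosted package
    repo <- paste(desc[["RemoteUsername"]], desc[["RemoteRepo"]], sep = "/")
    return(list(source = "github", repo = repo))
  }
  list(source = "other")
}

# Look up every installed package and tabulate where each one came from.
installed <- rownames(utils::installed.packages())
sources <- vapply(
  installed,
  function(pkg) package_source(utils::packageDescription(pkg))[["source"]],
  character(1)
)
print(table(sources))
```

A re-install step could then branch on the `source` element, for example calling `install.packages()` for `"cran"` and a GitHub installer for `"github"`. Packages classified as `"other"` (base packages, local installs) would need separate handling.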
bearloga / T261759.ipynb
Last active Sep 29, 2020
Analysis of MediaSearch interleaved A/B Test https://phabricator.wikimedia.org/T261759
bearloga / example.Rmd
Created Jul 29, 2020
knitr engine for sympy code
---
title: "Sympy Engine"
output: html_notebook
editor_options:
  chunk_output_type: inline
---
Assuming the "sympy" knitr engine has been registered:
```{sympy, results='asis'}
bearloga / waxer-demo.ipynb
Created Jul 23, 2020
Demo of using {waxer} R package in a Jupyter Notebook to fetch different Wikipedia languages' pageviews with different access methods
bearloga / engines.Rmd
Last active Mar 11, 2020
Automatically printing the chunk engine in R Markdown
---
title: "Printing chunk engine via hook"
output: github_document
---
```{r setup, include=FALSE}
library(knitr)
opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
print_engine_hook <- function(before, options, envir) {
bearloga / sql-murder-mystery-solution.md
Created Oct 13, 2019
A walkthrough of the solution to SQL Murder Mystery by Northwestern University Knight Lab. Solution by Mikhail Popov (@bearloga)

Solution to SQL Murder Mystery

A walkthrough of the solution to SQL Murder Mystery by Northwestern University Knight Lab. Solution by Mikhail Popov

Prompt

A crime has taken place and the detective needs your help. The detective gave you the crime scene report, but you somehow lost it. You vaguely remember that the crime was a murder that occurred sometime on Jan. 15, 2018 and that it took place in SQL City. Start by retrieving the corresponding crime scene report from the police department’s database.

Witness reports

bearloga / druid-csv-spec_country-all.json
Last active Jun 25, 2019
Druid ingestion spec for gzipped CSV data
{
  "type": "index_hadoop",
  "spec": {
    "ioConfig": {
      "type": "hadoop",
      "inputSpec": {
        "paths": "hdfs://analytics-hadoop/tmp/gsc-all.csv.gz",
        "type": "static"
      }
    },
View logarithmic-time.R
# daily_stats has 5 columns used by this code: date, time_spent_10/25/50/75/90
ggplot(daily_stats) +
  geom_segment(aes(x = date, xend = date, y = time_spent_10, yend = time_spent_90),
               size = 1, color = "#00af89") +
  geom_segment(aes(x = date, xend = date, y = time_spent_25, yend = time_spent_75),
               size = 2, color = "#14866d") +
  # geom_ribbon(aes(x = date, ymin = time_spent_lower, ymax = time_spent_upper), alpha = 0.3) +
  # geom_line(aes(x = date, y = time_spent_middle)) +
  geom_label(