Mikhail Popov bearloga

@bearloga
bearloga / wdqs-cocktails.R
Created April 21, 2017 18:48
Fetches cocktails and their ingredients from Wikidata using SPARQL and Wikidata Query Service
cocktails <- WikidataQueryServiceR::query_wikidata('
SELECT DISTINCT ?cocktailLabel ?ingredientLabel ?instanceOfLabel ?subclassLabel
WHERE
{
?cocktail wdt:P31/wdt:P279* wd:Q134768 .
?cocktail wdt:P186 ?ingredient .
OPTIONAL {
?ingredient wdt:P279 ?subclass .
}
OPTIONAL {
@bearloga
bearloga / mkrproj.sh
Last active January 3, 2018 14:59
A bash shell script that can be used to turn the current directory into an RStudio project, opening the project in RStudio after creating it.
#!/bin/bash
# Usage: mkproj [projectname]
# projectname defaults to name of current directory
template="Version: 1.0\nRestoreWorkspace: Default\nSaveWorkspace: Default\nAlwaysSaveHistory: Default\n\nEnableCodeIndexing: Yes\nUseSpacesForTab: Yes\nNumSpacesForTab: 4\nEncoding: UTF-8\n\nRnwWeave: knitr\nLaTeX: pdfLaTeX"
wd=$(basename "$(pwd)")
if [ -z "$1" ]; then
@bearloga
bearloga / dl2csv.R
Created November 9, 2017 18:15
Some code for converting an HTML description list into an R data.frame that can then be exported as a CSV.
library(rvest)
x <- "<dl>
<dt>Coffee</dt>
<dd>Black hot drink</dd>
<dt>Milk</dt>
<dd>White cold drink</dd>
</dl>"
y <- read_html(x)
@bearloga
bearloga / druid-csv-spec_country-all.json
Last active June 25, 2019 05:48
Druid ingestion spec for gzipped CSV data
{
"type": "index_hadoop",
"spec": {
"ioConfig": {
"type": "hadoop",
"inputSpec": {
"paths": "hdfs://analytics-hadoop/tmp/gsc-all.csv.gz",
"type": "static"
}
},
# daily_stats has 5 columns used by this code: date, time_spent_10/25/50/75/90
ggplot(daily_stats) +
geom_segment(aes(x = date, xend = date, y = time_spent_10, yend = time_spent_90),
size = 1, color = "#00af89") +
geom_segment(aes(x = date, xend = date, y = time_spent_25, yend = time_spent_75),
size = 2, color = "#14866d") +
# geom_ribbon(aes(x = date, ymin = time_spent_lower, ymax = time_spent_upper), alpha = 0.3) +
# geom_line(aes(x = date, y = time_spent_middle)) +
geom_label(
@bearloga
bearloga / sql-murder-mystery-solution.md
Created October 13, 2019 01:42
A walkthrough of the solution to SQL Murder Mystery by Northwestern University Knight Lab. Solution by Mikhail Popov (@bearloga)

Solution to SQL Murder Mystery

A walkthrough of the solution to SQL Murder Mystery by Northwestern University Knight Lab. Solution by Mikhail Popov

Prompt

A crime has taken place and the detective needs your help. The detective gave you the crime scene report, but you somehow lost it. You vaguely remember that the crime was a murder that occurred sometime on Jan. 15, 2018 and that it took place in SQL City. Start by retrieving the corresponding crime scene report from the police department’s database.
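
The first step in the prompt amounts to filtering reports on the three remembered facts: date, crime type, and city. Here is a minimal sketch using Python's built-in sqlite3 module. It assumes a table named crime_scene_report with date, type, description, and city columns, and an integer YYYYMMDD date encoding. The table layout, the placeholder rows, and the find_reports helper are illustrative assumptions, not taken from the walkthrough.

```python
import sqlite3

# Hypothetical stand-in for the police database. The table name, column
# names, integer date encoding, and rows are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE crime_scene_report "
    "(date INTEGER, type TEXT, description TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO crime_scene_report VALUES (?, ?, ?, ?)",
    [
        (20180115, "murder", "placeholder report A", "SQL City"),
        (20180115, "theft", "placeholder report B", "SQL City"),
        (20180115, "murder", "placeholder report C", "Another City"),
        (20171231, "murder", "placeholder report D", "SQL City"),
    ],
)


def find_reports(conn, date, crime_type, city):
    """Return descriptions of reports that match all three known facts."""
    rows = conn.execute(
        "SELECT description FROM crime_scene_report "
        "WHERE date = ? AND type = ? AND city = ?",
        (date, crime_type, city),
    )
    return [row[0] for row in rows]


# Only the row matching date, type, and city together is returned.
reports = find_reports(conn, 20180115, "murder", "SQL City")
print(reports)
```

Filtering on all three conditions at once keeps the result small enough to read directly; loosening any one of them would pull in unrelated reports.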

Witness reports

@bearloga
bearloga / engines.Rmd
Last active March 11, 2020 20:39
Automatically printing chunk engine in R Markdown
---
title: "Printing chunk engine via hook"
output: github_document
---
```{r setup, include=FALSE}
library(knitr)
opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
print_engine_hook <- function(before, options, envir) {
@bearloga
bearloga / waxer-demo.ipynb
Created July 23, 2020 14:15
Demo of using {waxer} R package in a Jupyter Notebook to fetch different Wikipedia languages' pageviews with different access methods