Back in the old days, when many data sets were still small, stem-and-leaf plots were a popular method of representing quantitative data. The example data shown in the text area comes from the cover of John Tukey's Exploratory Data Analysis. The stem-and-leaf plot updates as you change the data. Try adding fractions and negative values. Hover over the leaves to see the original values.
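The idea behind the plot can be sketched in a few lines of Python. This is only a minimal illustration, not the code behind the interactive version: the function name `stem_and_leaf` and the sample values are made up, fractions are simply rounded to whole numbers, the stem is the tens digit, and empty stems are skipped.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return the lines of a simple stem-and-leaf plot (stem = tens, leaf = units)."""
    groups = defaultdict(list)
    for value in values:
        rounded = round(value)  # assumption: fractions are rounded to whole numbers
        stem, leaf = divmod(abs(rounded), 10)
        groups[(rounded < 0, stem)].append(leaf)

    def sort_key(item):
        (negative, stem), _ = item
        # Negative stems come first, largest magnitude at the top.
        return (0, -stem) if negative else (1, stem)

    lines = []
    for (negative, stem), leaves in sorted(groups.items(), key=sort_key):
        label = ("-" if negative else "") + str(stem)
        lines.append(label + " | " + "".join(str(leaf) for leaf in sorted(leaves)))
    return lines


# Hypothetical sample data, not the values from Tukey's cover.
print("\n".join(stem_and_leaf([12, 15, 21, 7, -3, 30.4])))
```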
This interactive visualization demonstrates Stochastic Outlier Selection (SOS) applied to roll call voting data. It was first presented at the NYC Machine Learning meetup on November 21, 2013. SOS is an unsupervised outlier-selection algorithm by J.H.M. Janssens, F. Huszar, E.O. Postma, and H.J. van den Herik (2012). It employs the concept of affinity to quantify the relationship between data points and subsequently computes an outlier probability for each data point. Intuitively, a data point is selected as an outlier when the other data points have insufficient affinity with it.
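To make the affinity intuition concrete, here is a rough Python sketch. It is a simplification rather than the published algorithm: SOS, as I understand it, tunes a separate variance for every data point via a perplexity parameter, whereas this sketch assumes a single fixed `variance` for all points. The function name and the toy points are hypothetical.

```python
import math


def outlier_probabilities(points, variance=1.0):
    """Simplified SOS-style outlier probabilities using one shared variance."""
    n = len(points)
    binding = []
    for i in range(n):
        # Affinity that point i has with point j; a point has no affinity with itself.
        affinities = [
            0.0 if i == j
            else math.exp(-math.dist(points[i], points[j]) ** 2 / (2.0 * variance))
            for j in range(n)
        ]
        total = sum(affinities)
        # Binding probabilities: the affinities of point i normalised to sum to one.
        binding.append([a / total if total > 0 else 0.0 for a in affinities])
    # A point is an outlier when none of the other points binds to it.
    return [
        math.prod(1.0 - binding[j][i] for j in range(n) if j != i)
        for i in range(n)
    ]


# Toy data: four points in a tight square and one point far away.
toy_points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
for point, probability in zip(toy_points, outlier_probabilities(toy_points)):
    print(point, round(probability, 3))
```

In this toy example the far-away point ends up with an outlier probability close to one, because none of the clustered points spends any noticeable affinity on it.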
The data set contains 103 data points (senators) and 172 features (votes). The dissimilarity between the data points is the Euclidean distance. Each circle in the scatter plot represents a senator, whose location is determined by applying the non-linear dimensionality reduction technique [t-SNE](http://homepage.tudelf
#!/bin/bash
# Hacked together by JeroenJanssens.com on 2013-12-10
# Requires: https://github.com/joewalnes/websocketd
# Run: websocketd --devconsole --port 8080 ./chat.sh
echo "Please enter your name:"; read USER
echo "[$(date)] ${USER} joined the chat" >> chat.log
echo "[$(date)] Welcome to the chat ${USER}!"
tail -n 0 -f chat.log --pid=$$ | grep --line-buffered -v "] ${USER}>" &
while read MSG; do echo "[$(date)] ${USER}> ${MSG}" >> chat.log; done
#!/usr/bin/env Rscript
num.words <- as.integer(commandArgs(trailingOnly = TRUE))
f <- file("stdin")
input.lines <- readLines(f)
close(f)
full.text <- tolower(paste(input.lines, collapse = " "))
splits <- gregexpr("\\w+", full.text)
words.all <- (regmatches(full.text, splits)[[1]])
words.unique <- as.data.frame(table(words.all))
words.sorted <- words.unique[order(-words.unique$Freq),]
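For comparison, the same counting logic can be sketched in Python. The preview of the R script stops before its output step, so this sketch only mirrors the part that is visible (lowercase, split on `\w+`, count, sort by frequency); the function name `top_words` is made up for illustration.

```python
import re
from collections import Counter


def top_words(text, num_words):
    """Return the num_words most frequent words as (word, count) pairs."""
    words = re.findall(r"\w+", text.lower())
    return Counter(words).most_common(num_words)


print(top_words("The cat and the hat and the bat", 2))
```

To use it as a command-line filter you could pass `sys.stdin.read()` as the text and `int(sys.argv[1])` as the number of words.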
#!/usr/bin/env python
# The trick is to overwrite the file with spaces till the first newline.
# Only works if the program that reads it ignores empty lines.
import sys
filename = sys.argv[1]
f = open(filename, "r+b")
n = 0
# Count the bytes before the first newline. Compare against bytes (the file
# is opened in binary mode) and also stop at end of file.
while f.read(1) not in (b"", b"\n"):
    n += 1
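The preview above is cut off after the counting loop. Going by the comment at the top of the script, the remaining step would be to seek back to the start and write that many spaces. A self-contained sketch of the whole trick, under that assumption, could look like this (the function name `blank_first_line` and the temporary demo file are hypothetical):

```python
import os
import tempfile


def blank_first_line(filename):
    """Overwrite the first line of a file with spaces, in place.

    Returns the number of bytes that were blanked out.
    """
    with open(filename, "r+b") as f:
        n = 0
        # Count the bytes before the first newline (or end of file).
        while f.read(1) not in (b"", b"\n"):
            n += 1
        # Assumed final step: go back and overwrite those bytes with spaces.
        f.seek(0)
        f.write(b" " * n)
    return n


# Demo on a throwaway file.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.csv")
with open(demo_path, "wb") as demo:
    demo.write(b"header\n1,2\n")
blank_first_line(demo_path)
with open(demo_path, "rb") as demo:
    print(repr(demo.read()))
```

Because the file size never changes, nothing after the first line has to be rewritten, which is what makes this attractive for large files.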
# make sure that you have the R package `ggmap` installed
curl -s http://api.citybik.es/citi-bike-nyc.json > citibikes.json
< citibikes.json jq -r '.[] | [.lat/1000000,.lng/1000000,.bikes] | @csv' | header -a lat,lng,bikes > citibikes.csv
< citibikes.csv Rio -vge 'require(ggmap); qmap("NYC", zoom=14) + geom_point(data=df, aes(x=lng, y=lat, size=bikes))' > citibikes.png
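The `jq` and `header` steps of this pipeline boil down to a small transformation: divide `lat` and `lng` by 1,000,000 and write the three fields as CSV with a header row. A Python sketch of just that step is shown below. The function name `stations_to_csv` and the single sample station are invented for illustration; only the field names come from the `jq` expression.

```python
import csv
import io
import json


def stations_to_csv(json_text):
    """Convert a JSON array of stations to CSV with lat, lng and bikes columns."""
    stations = json.loads(json_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["lat", "lng", "bikes"])
    for station in stations:
        # Coordinates are stored as integers in millionths of a degree.
        writer.writerow([
            station["lat"] / 1000000,
            station["lng"] / 1000000,
            station["bikes"],
        ])
    return out.getvalue()


# Hypothetical sample record, not real API output.
sample = '[{"lat": 40767272, "lng": -73993929, "bikes": 5}]'
print(stations_to_csv(sample))
```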
#' Cache the result of an expression.
#'
#' Use \code{options(cache.path = "...")} to change the cache directory (which
#' is the current working directory by default).
#'
#' @param expr expression to evaluate
#' @param key basename for cache file
#' @param ignore_cache evaluate expression regardless of cache file?
#' @return result of expression or read from cache file
#'
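The body of the R function is not part of this preview, so here is a rough Python sketch of the behaviour the documentation describes: evaluate something once, store the result in a file named after `key`, and read it back on later calls unless `ignore_cache` is set. Everything beyond those documented parameters is an assumption, including the use of `pickle`, the `.pkl` extension, the `cache_path` argument (standing in for the `cache.path` option), and passing a zero-argument function instead of an expression.

```python
import os
import pickle
import tempfile


def cache(compute, key, cache_path=".", ignore_cache=False):
    """Return compute(), reusing a pickled result from cache_path when available."""
    path = os.path.join(cache_path, key + ".pkl")  # assumed file naming scheme
    if not ignore_cache and os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    result = compute()
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return result


# Demo: the second call reads the cache file instead of recomputing.
demo_dir = tempfile.mkdtemp()
print(cache(lambda: sum(range(1000)), "total", cache_path=demo_dir))
print(cache(lambda: sum(range(1000)), "total", cache_path=demo_dir))
```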
If your R script uses sensitive information such as a password, then it's best to keep this in a separate file (and perhaps outside the project's repository). Moreover, if you're giving a live demo using RStudio, then you should avoid putting this sensitive information in your global environment.
One option is to put it in a YAML file, say `.my_project.yaml`, which may look as follows:
---
api_service:
  username: foo
  password: bar123!
## Run RStudio Desktop
$ rstudio-bin
## Attach debugger
$ sudo gdb -p $(pgrep rsession)
(gdb) cont
Continuing.
## In RStudio, try to connect to Aster
> library(TeradataAsterR)
Some `jq` examples translated from https://github.com/jsonlines/guide
`jq` can be downloaded from https://stedolan.github.io/jq/download/.
curl https://raw.githubusercontent.com/jsonlines/guide/master/datagov100.json > data.json
$ < data.json jq '.name' | head -n 6
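For readers without `jq` at hand, the `.name` filter followed by `head -n 6` can be approximated in Python. This sketch assumes that `data.json` holds one JSON object per line (as the JSON Lines format prescribes); `extract_field` is a made-up helper and the sample lines are invented. Unlike `jq`, it yields plain Python values rather than JSON-encoded output.

```python
import json
from itertools import islice


def extract_field(lines, field):
    """Yield the value of `field` for every non-empty JSON line (None if missing)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line).get(field)


# Hypothetical sample lines standing in for data.json.
sample_lines = [
    '{"name": "first-dataset", "id": 1}',
    '{"name": "second-dataset", "id": 2}',
    '{"id": 3}',
]
for name in islice(extract_field(sample_lines, "name"), 6):
    print(name)
```

With the real file you would pass the open file object, for example `extract_field(open("data.json"), "name")`.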