# first foray into R, for data science and visualizations
# treemap of CERN contributions
url <- "https://www.dropbox.com/s/4g7m67xhubizoxe/cern_contributions.csv?dl=1"
CERN <- read.csv(url, header=TRUE, sep=",")
CERN$Contribution <- as.numeric(CERN$Contribution)
library(treemap)
treemap(CERN,
        index = c("Country"),
        vSize = "Contribution",
        palette = "Set1",
library(hexSticker)  # provides sticker()
imgurl <- "/Users/prasanth/logo_swan_letters.png"
sticker(imgurl, package="", s_x=1, s_y=1, s_width=.6, s_height=.518,
h_fill="#ffffff", h_color="#0053A1", h_size=1, filename="/Users/prasanth/hex_logo_swan_letters.png")
imgurl <- "/Users/prasanth/logo_swan_letters.png"
sticker(imgurl, package="", s_x=1, s_y=1, s_width=.6, s_height=.518,
h_fill="#ffffff", h_color="#FB6700", h_size=1, filename="/Users/prasanth/hex_logo_swan_letters_1.png")
imgurl <- "/Users/prasanth/logo_swan_letters.png"
sticker(imgurl, package="", s_x=1, s_y=1, s_width=.6, s_height=.518, url="https://swan.cern.ch", u_x=1.15, u_y=0.15, u_color="#0053A1",
# Determine the active HDFS namenode. This is required because the WebHDFS implementation doesn't redirect to the active namenode.
import os
import json
import xml.etree.ElementTree as ET
try:
    from urllib import urlopen  # Python 2
except ImportError:
    from urllib.request import urlopen  # Python 3
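The snippet above is cut off after its imports, so the author's actual lookup is not shown. As a rough sketch of one way to find the active namenode, the code below reads the `dfs.namenode.http-address*` properties from the contents of an `hdfs-site.xml` file and asks each namenode for its HA state. It assumes the namenodes expose the JMX `NameNodeStatus` bean over HTTP; the function names (`namenode_http_addresses`, `fetch_state`, `active_namenode`) are hypothetical and not part of any library.

```python
import json
import xml.etree.ElementTree as ET

try:
    from urllib import urlopen  # Python 2
except ImportError:
    from urllib.request import urlopen  # Python 3

# Assumption: each namenode serves this JMX bean, whose "State" field
# is "active" or "standby".
JMX_QUERY = "/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus"


def namenode_http_addresses(hdfs_site_xml):
    """Return the host:port values of all dfs.namenode.http-address* properties."""
    root = ET.fromstring(hdfs_site_xml)
    addresses = []
    for prop in root.findall("property"):
        name = prop.findtext("name", default="")
        if name.startswith("dfs.namenode.http-address"):
            addresses.append(prop.findtext("value", default="").strip())
    return addresses


def fetch_state(address):
    """Ask a single namenode for its HA state over HTTP (hypothetical helper)."""
    response = urlopen("http://" + address + JMX_QUERY)
    try:
        payload = json.loads(response.read().decode("utf-8"))
    finally:
        response.close()
    return payload["beans"][0]["State"]


def active_namenode(addresses, get_state=fetch_state):
    """Return the first address whose state is 'active', or None if there is none."""
    for address in addresses:
        try:
            if get_state(address) == "active":
                return address
        except (IOError, KeyError, IndexError, ValueError):
            continue  # unreachable host or unexpected reply: try the next one
    return None
```

Passing `get_state` as a parameter keeps the selection logic testable without a live cluster; in real use the default `fetch_state` would be called against each address.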
From Spark 2.0 onwards, column names are no longer case sensitive in some scenarios. The following example demonstrates this.
**Spark 1.6**
-bash-4.2$ cat /tmp/sample.json
{"test": "first test", "key": "key1"}
{"Test": "second test", "key": "key2"}
scala> val jDF = sqlContext.read.json("/tmp/sample.json")
scala> jDF.printSchema
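The sample file above hinges on two keys that differ only by letter case (`test` and `Test`). Before loading such a file, it can help to check for keys of this kind. The following is a minimal sketch in plain Python rather than Spark; `case_colliding_keys` is a hypothetical helper, and it assumes every non-empty line holds one JSON object.

```python
import json
from collections import defaultdict


def case_colliding_keys(json_lines):
    """Group top-level keys that differ only by letter case."""
    seen = defaultdict(set)
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        for key in json.loads(line):
            seen[key.lower()].add(key)
    # Keep only the lowercase names that map to more than one spelling.
    return {lower: sorted(names) for lower, names in seen.items() if len(names) > 1}


# Same two records as /tmp/sample.json above.
sample = [
    '{"test": "first test", "key": "key1"}',
    '{"Test": "second test", "key": "key2"}',
]
print(case_colliding_keys(sample))
```

For a file on disk, the open file object can be passed directly, since iterating over it yields one line at a time.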
# Build docker image, assumes Dockerfile is in the current directory
docker build -t <name>:<tag> .
# Start a bash shell in a new container created from an image
docker run -it <image> /bin/bash
# Open a bash shell in a running container
docker exec -it <container> /bin/bash
# Tail the docker logs