# first foray into R, for data science and visualizations
# treemap of CERN contributions
url <- ""
CERN <- read.csv(url, header=TRUE, sep=",")
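# convert via character first, so that a column read as a factor yields its values and not its level codes
CERN$Contribution <- as.numeric(as.character(CERN$Contribution))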
palette = "Set1",
imgurl <- "/Users/prasanth/logo_swan_letters.png"
sticker(imgurl, package="", s_x=1, s_y=1, s_width=.6, s_height=.518,
h_fill="#ffffff", h_color="#0053A1", h_size=1, filename="/Users/prasanth/hex_logo_swan_letters.png")
imgurl <- "/Users/prasanth/logo_swan_letters.png"
sticker(imgurl, package="", s_x=1, s_y=1, s_width=.6, s_height=.518,
h_fill="#ffffff", h_color="#FB6700", h_size=1, filename="/Users/prasanth/hex_logo_swan_letters_1.png")
imgurl <- "/Users/prasanth/logo_swan_letters.png"
sticker(imgurl, package="", s_x=1, s_y=1, s_width=.6, s_height=.518, url="", u_x=1.15, u_y=0.15, u_color="#0053A1",
# Determine the active namenode of HDFS. This is required because the webHDFS implementation doesn't redirect to the active namenode.
import os
import json
import xml.etree.ElementTree as ET
try:
    from urllib import urlopen  # Python 2
except ImportError:
    from urllib.request import urlopen  # Python 3
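One way the active namenode could be determined is to ask each candidate namenode for its HA state over its HTTP JMX endpoint and keep the first one that reports `active`. The sketch below assumes the `NameNodeStatus` JMX bean is exposed on the namenode web port and that the candidate `host:port` addresses are already known (for example, read from `hdfs-site.xml`). The function names `namenode_state`, `fetch_jmx` and `find_active_namenode` are hypothetical and not taken from the original snippet.

```python
import json

try:
    from urllib import urlopen  # Python 2
except ImportError:
    from urllib.request import urlopen  # Python 3

# Assumption: the namenode web UI exposes the NameNodeStatus bean at this path.
JMX_QUERY = "/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus"


def namenode_state(jmx_json):
    """Return the 'State' value from a NameNodeStatus JMX response, or None."""
    beans = json.loads(jmx_json).get("beans", [])
    return beans[0].get("State") if beans else None


def fetch_jmx(address):
    """Fetch the raw JMX JSON from a namenode given as 'host:port'."""
    response = urlopen("http://" + address + JMX_QUERY)
    try:
        return response.read().decode("utf-8")
    finally:
        response.close()


def find_active_namenode(addresses, fetch=fetch_jmx):
    """Return the first address whose state is 'active', or None if none is."""
    for address in addresses:
        try:
            if namenode_state(fetch(address)) == "active":
                return address
        except (IOError, ValueError):
            # Unreachable namenode or malformed response: try the next one.
            continue
    return None
```

The `fetch` parameter is injectable so the selection logic can be exercised without a live cluster; with a real cluster, `find_active_namenode(["nn1.example.org:50070", "nn2.example.org:50070"])` would be the typical call, where the host names and port are placeholders.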
From Spark 2.0 onwards, column names are no longer case sensitive in some scenarios. The following example demonstrates this.
**Spark 1.6**
-bash-4.2$ cat /tmp/sample.json
{"test": "first test", "key": "key1"}
{"Test": "second test", "key": "key2"}
scala> val jDF = sqlContext.read.json("/tmp/sample.json")
scala> jDF.printSchema
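To see why the sample file is sensitive to this change, it can help to list the JSON keys that differ only by letter case, since those are the names that case-insensitive resolution can no longer tell apart. The following is a minimal plain-Python sketch and does not use Spark; the function name `case_insensitive_collisions` is hypothetical.

```python
import json
from collections import defaultdict


def case_insensitive_collisions(json_lines):
    """Map each lower-cased key to the distinct spellings seen, keeping only clashes."""
    seen = defaultdict(set)
    for line in json_lines:
        for key in json.loads(line):
            seen[key.lower()].add(key)
    return {lower: sorted(names) for lower, names in seen.items() if len(names) > 1}


sample = [
    '{"test": "first test", "key": "key1"}',
    '{"Test": "second test", "key": "key2"}',
]
print(case_insensitive_collisions(sample))
```

For the two records of `/tmp/sample.json`, only `test` and `Test` collide; `key` is spelled identically in both records and is not reported.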
# Build docker image, assumes Dockerfile is in the current directory
docker build -t <name>:<tag> .
# Start a bash shell in a new container created from an image
docker run -it <image> /bin/bash
# Open a bash shell in a running container
docker exec -it <container> /bin/bash
# Tail the docker logs