Local Spark Development

Infrastructure

Launch a Jupyter Notebook server using Docker and the jupyter/pyspark-notebook image on your local machine.

Copy and paste the command below into your terminal.

docker run -d -v `pwd`:/home/jovyan -p 80:8888 jupyter/pyspark-notebook
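
In the command above, `-d` runs the container in the background, `-v` mounts the current directory at /home/jovyan inside the container, and `-p 80:8888` maps host port 80 to the notebook server's port 8888. The following is a minimal sketch of a wrapper around that same command, assuming you may want a different host port (for example when port 80 is taken or needs elevated privileges). The function name `start_notebook` and the `DRY_RUN` variable are hypothetical and not part of the original gist.

```shell
# Hypothetical helper (not from the original gist): wraps the docker run
# command above so the host port can be changed.
# Usage: start_notebook [HOST_PORT]   (HOST_PORT defaults to 80, as above)
# Set DRY_RUN=1 to print the command instead of running it.
start_notebook() {
  host_port="${1:-80}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Only show what would be executed.
    echo "docker run -d -v $(pwd):/home/jovyan -p ${host_port}:8888 jupyter/pyspark-notebook"
  else
    # Same flags as the original command; quoting guards against spaces in the path.
    docker run -d -v "$(pwd)":/home/jovyan -p "${host_port}:8888" jupyter/pyspark-notebook
  fi
}
```

For example, `DRY_RUN=1 start_notebook 8888` prints the command that would publish the notebook on host port 8888 without starting a container.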
@npearce
npearce / install-docker.md
Last active May 17, 2024 12:03
Amazon Linux 2 - install docker & docker-compose using 'sudo amazon-linux-extras' command

UPDATE (March 2020, thanks @ic): I don't know the exact AMI version, but 'yum install docker' now works on the latest Amazon Linux 2. The instructions below may still be relevant depending on the vintage of the AMI you are using.

Amazon changed the install process in Amazon Linux 2. Docker is no longer installed using 'yum'. See: https://aws.amazon.com/amazon-linux-2/release-notes/

Docker CE Install

sudo amazon-linux-extras install docker
sudo service docker start
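
The update note and the instructions above describe two install paths, depending on the AMI. A minimal sketch of choosing between them, assuming that the presence of the `amazon-linux-extras` command is a reasonable signal for which path applies. The function name `docker_install_cmd` is hypothetical, and it only prints the command rather than running it.

```shell
# Hypothetical helper (not from the original gist): print the install command
# that matches the tools available on this machine.
docker_install_cmd() {
  if command -v amazon-linux-extras >/dev/null 2>&1; then
    # Path described in the instructions above.
    echo "sudo amazon-linux-extras install docker"
  else
    # Path described in the March 2020 update note.
    echo "sudo yum install docker"
  fi
}

docker_install_cmd
```

Review the printed command before running it, then start the service with `sudo service docker start` as shown above.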
@mrthomaskim
mrthomaskim / install-Python-AmazonLinux-20171023.log
Created June 13, 2018 20:44
Amazon Linux AMI, pyenv, virtualenv, Python, ... Hello, World!
### prerequisites
sudo yum groupinstall "Development Tools"
git --version
gcc --version
bash --version
python --version # (system)
sudo yum install -y openssl-devel readline-devel zlib-devel
sudo yum update
### install `pyenv`
@zcaceres
zcaceres / Nested-Routers-Express.md
Last active April 4, 2024 09:44
Child Routers in Express

Nested Routers in Express.js

Express makes it easy to nest routes in your routers, but I always had trouble accessing the request object's .params when I had a long URI with multiple parameters and nested routes.

Let's say you're building routes for a website www.music.com. Music is organized into albums with multiple tracks. Users can click to see a track list. Then they can select a single track and see a sub-page about that specific track.

At our application level, we could first have a Router to handle any requests to our albums.

const express = require('express');
@philosopherdog
philosopherdog / youtube-dl-cheat.txt
Created March 21, 2017 13:08
youtube-dl cheat sheet
# Basic Download:
youtube-dl URL
# Download Playlist, put in folder, and index with order:
youtube-dl -o '%(playlist)s/%(playlist_index)s - %(title)s.%(ext)s' URL
# Download to /$uploader/$date/$title.$ext:
youtube-dl -o '%(uploader)s/%(date)s/%(title)s.%(ext)s' URL
# Download playlist starting from certain video:
@cosmincatalin
cosmincatalin / readme.md
Last active October 27, 2022 11:07
AWS EMR bootstrap to install R packages from CRAN

This bootstrap is useful if you want to deploy SparkR applications that run arbitrary code on the EMR cluster's workers. The R code will need to have its dependencies already installed on each of the workers, and will fail otherwise. This is the case if you use functions such as gapply or dapply.

How to use the bootstrap

  1. First, download the gist to a file and upload it to an S3 bucket of your choice.
  2. Using the AWS EMR Console, create a cluster and choose advanced options.
  3. In Step 3 you can configure your bootstraps. Choose to Configure and add a Custom action