
@gavincampbell
Created March 10, 2021 08:15
Files from a post on running Jupyter/PySpark in VS Code.
// For format details, see https://aka.ms/devcontainer.json. For config options, see the README at:
// https://github.com/microsoft/vscode-dev-containers/tree/v0.163.1/containers/debian
{
    "name": "pyspark",
    "image": "jupyter/pyspark-notebook",

    // Set *default* container-specific settings.json values on container create.
    "settings": {
        "terminal.integrated.shell.linux": "/bin/bash"
    },

    // Add the IDs of extensions you want installed when the container is created.
    "extensions": [],

    // Use 'forwardPorts' to make a list of ports inside the container available locally.
    // 8888 is the Jupyter server, 4040 is the Spark UI.
    "forwardPorts": [8888, 4040],

    // Uncomment to use the Docker CLI from inside the container. See https://aka.ms/vscode-remote/samples/docker-from-docker.
    // "mounts": [ "source=/var/run/docker.sock,target=/var/run/docker.sock,type=bind" ],

    // Uncomment when using a ptrace-based debugger for languages like C++, Go, and Rust.
    // "runArgs": [ "--cap-add=SYS_PTRACE", "--security-opt", "seccomp=unconfined" ],

    // Comment out to connect as root instead. More info: https://aka.ms/vscode-remote/containers/non-root.
    // "jovyan" is the non-root user defined in the jupyter/pyspark-notebook image.
    "remoteUser": "jovyan"
}
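With this configuration in place, the "Remote-Containers: Reopen in Container" command starts VS Code against the jupyter/pyspark-notebook image, with ports 8888 (Jupyter) and 4040 (the Spark UI) forwarded to the host. As a quick sanity check that PySpark is usable from a notebook or terminal inside the container, something along these lines should work (a minimal sketch, not part of the original gist):

from pyspark.sql import SparkSession

# Start (or attach to) the local Spark session inside the container.
spark = SparkSession.builder.getOrCreate()

print(spark.version)                # Spark version bundled with jupyter/pyspark-notebook
print(spark.sparkContext.uiWebUrl)  # Spark UI address; reachable on the host via the forwarded port 4040

spark.stop()

The notebook code from the post, shown next, reads a small quiz-results file and prints the winning row.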
from pyspark.sql import SparkSession
from pyspark.sql.functions import desc

# The jupyter/pyspark-notebook image ships with a local Spark, so no master URL is needed.
spark = SparkSession.builder.getOrCreate()

# Read the line-delimited JSON results, sort by Points descending, and take the top row.
quizresults = spark.read.json('quizresults.json')
winner = quizresults.orderBy(desc("Points")).first()

# Wrap the single winning Row back into a DataFrame so it can be displayed with show().
spark.createDataFrame([winner]).show()
spark.stop()
{"Name":"Cletus Hogg","Points": 37},
{"Name":"Enos Strate","Points": 56},
{"Name":"Roscoe P Coltrane","Points": 68}