Matthew Powers MrPowers

MrPowers / spark_dataframe_to_csv.sc
Last active April 4, 2016 15:49
Writing a Spark DataFrame to a CSV file
tx_cities.coalesce(1).write
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .save(System.getProperty("user.home") + "/Desktop/texas_cities")
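For comparison, the same idea (writing a small DataFrame out as a single CSV file with a header row) in plain pandas; this is a minimal sketch, and the `tx_cities` data here is invented for illustration:

```python
import pandas as pd
from pathlib import Path

# Hypothetical sample data standing in for the tx_cities DataFrame
tx_cities = pd.DataFrame(
    {"city": ["Austin", "Houston"], "population": [964_254, 2_304_580]}
)

out_path = Path("texas_cities.csv")
# to_csv writes the header row by default; index=False drops the row index
tx_cities.to_csv(out_path, index=False)

print(out_path.read_text().splitlines()[0])  # → city,population
```

Unlike Spark, pandas always produces one file, so there is no `coalesce(1)` step.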
MrPowers / spark_dataframe.sc
Last active April 4, 2016 15:50
Creating a Spark DataFrame
// Load a single CSV file into a DataFrame
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load(System.getProperty("user.home") + "/Desktop/cities.csv")

// Load every CSV file in a directory with a glob
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load(System.getProperty("user.home") + "/Desktop/people/*.csv")

// Gzipped CSV files are decompressed transparently
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load(System.getProperty("user.home") + "/Desktop/people/*.gz")

// Set the AWS credentials before reading or writing any s3n:// paths
val accessKeyId = System.getenv("AWS_ACCESS_KEY_ID")
val secretAccessKey = System.getenv("AWS_SECRET_ACCESS_KEY")
sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", accessKeyId)
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", secretAccessKey)

// Load CSV files straight from an S3 bucket
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("s3n://some_bucket/data/states/*.csv")

// coalesce(1) collapses the result into a single output file
df.coalesce(1).write
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .save("s3n://some_bucket/data/states/all_states/")
MrPowers / programming_websites.csv
Last active October 19, 2019 06:41
List of programming websites
website_url,website_type,main_language
news.ycombinator.com,aggregator,
mungingdata.com,blog,spark
m.signalvnoise.com,blog,rails
pgexercises.com,train,postgres
codequizzes.com,train,ruby
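A CSV like this is easy to load and filter with pandas; a minimal sketch, with the gist's rows inlined so the example is self-contained:

```python
import io

import pandas as pd

# Inline copy of the programming_websites.csv rows above
csv_text = """website_url,website_type,main_language
news.ycombinator.com,aggregator,
mungingdata.com,blog,spark
m.signalvnoise.com,blog,rails
pgexercises.com,train,postgres
codequizzes.com,train,ruby
"""

df = pd.read_csv(io.StringIO(csv_text))

# Keep only the training sites
training_sites = df[df["website_type"] == "train"]
print(sorted(training_sites["website_url"]))  # → ['codequizzes.com', 'pgexercises.com']
```

Note that the aggregator row has no `main_language`, which pandas reads as `NaN`.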
MrPowers / person_data.csv
Last active October 19, 2019 06:57
Data for 100 fake people
person_name,person_country
a,China
b,China
c,China
d,China
e,China
f,China
g,China
h,China
i,China
import pathlib
import shutil

import deltalake as dl                 # delta-rs: Delta Lake without a Spark cluster
import pandas as pd
import pyarrow.dataset as ds
from pyspark.sql import SparkSession
from delta import *                    # delta-spark helpers for a Delta-enabled Spark session
import chispa                          # DataFrame equality assertions for PySpark tests