@dvannoy
dvannoy / Baseball example
Last active June 25, 2016 21:11
Commands used for SQL on Hadoop: Getting Started. Run these commands while following along with the slides from the presentation. They may not be perfect, but they should be easier than retyping what appears on the slides.
cd /home/cloudera/Downloads
hadoop fs -ls /
hadoop fs -mkdir -p /data/baseball/team/
hadoop fs -copyFromLocal baseball/team.csv /data/baseball/team/
CREATE EXTERNAL TABLE team (
year string,
league_id string,
team_id string,
DataFrame zoneDF = spark.Read()
    .Option("header", "true")
    .Schema(taxiZoneSchema)
    .Csv(taxiZoneSourcePath);
zoneDF.Write().Format("delta").Mode("overwrite").Save(taxiZonePath);
zoneDF.Show();
val zoneDF = spark.read.option("header","true").schema(taxiZoneSchema).csv(taxiZoneSourcePath)
zoneDF.write.format("delta").mode("overwrite").save(taxiZonePath)
zoneDF.show()
from pyspark.sql.functions import col, desc, regexp_replace, substring, to_date, from_json, explode, expr
from pyspark.sql.types import StructType, StringType
yellow_source_path = "wasbs://nyctlc@azureopendatastorage.blob.core.windows.net/yellow/puYear=2018/puMonth=*/*.parquet"
taxi_zone_source_path = "abfss://demo@datakickstartadls.dfs.core.windows.net/nyctaxi/lookups/taxi_zone_lookup.csv"
taxi_zone_path = "abfss://demo@datakickstartadls.dfs.core.windows.net/nyctaxi/lookups/taxi_zone"
taxi_rate_path = "abfss://demo@datakickstartadls.dfs.core.windows.net/nyctaxi/lookups/taxi_rate_code"
yellow_delta_path = "abfss://demo@datakickstartadls.dfs.core.windows.net/nyctaxi/tripdata/yellow_delta"
// Implicits provide many shortcuts, including conversion from Row into a specific type
import spark.implicits._
// Case class to use as type for each Row
case class VehicleStopRaw(
  stop_id: String, stop_cause: String, service_area: String, subject_race: String,
  subject_sex: String, subject_age: String, timestamp: String, stop_date: String,
  stop_time: String, sd_resident: String, arrested: String, searched: String,
  obtained_consent: String, contraband_found: String, property_seized: String)
dvannoy / ksqldb_overview.md
Last active November 10, 2021 17:44
Notes and examples for ksqlDB