dvannoy / ksqldb_overview.md
Last active November 10, 2021 17:44
Notes and examples for ksqlDB
// Implicits provide many shortcuts, including conversion from Row into a specific type
import spark.implicits._
// Case class to use as type for each Row
case class VehicleStopRaw(
  stop_id: String, stop_cause: String, service_area: String, subject_race: String,
  subject_sex: String, subject_age: String, timestamp: String, stop_date: String,
  stop_time: String, sd_resident: String, arrested: String, searched: String,
  obtained_consent: String, contraband_found: String, property_seized: String)
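With spark.implicits._ in scope, a DataFrame whose columns match the case class fields can be converted straight into a typed Dataset. A minimal sketch of that conversion, assuming a hypothetical vehicleStopsPath pointing at a CSV with these columns:
// All CSV columns are read as strings, so they line up with the String fields above
val vehicleStopsDF = spark.read.option("header", "true").csv(vehicleStopsPath)
// as[VehicleStopRaw] converts each Row into the case class (the path is an assumption)
val vehicleStops = vehicleStopsDF.as[VehicleStopRaw]
vehicleStops.show(5)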
from pyspark.sql.functions import col, desc, regexp_replace, substring, to_date, from_json, explode, expr
from pyspark.sql.types import StructType, StringType
yellow_source_path = "wasbs://nyctlc@azureopendatastorage.blob.core.windows.net/yellow/puYear=2018/puMonth=*/*.parquet"
taxi_zone_source_path = "abfss://demo@datakickstartadls.dfs.core.windows.net/nyctaxi/lookups/taxi_zone_lookup.csv"
taxi_zone_path = "abfss://demo@datakickstartadls.dfs.core.windows.net/nyctaxi/lookups/taxi_zone"
taxi_rate_path = "abfss://demo@datakickstartadls.dfs.core.windows.net/nyctaxi/lookups/taxi_rate_code"
yellow_delta_path = "abfss://demo@datakickstartadls.dfs.core.windows.net/nyctaxi/tripdata/yellow_delta"
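These constants point at the raw yellow trip data and the Delta locations used later. The same read-then-save-as-Delta pattern shown below for the zone lookup applies to the trip data; a minimal Scala sketch, reusing the path strings above as Scala vals (the val names are assumptions):
// Same locations as the Python constants above, expressed as Scala vals
val yellowSourcePath = "wasbs://nyctlc@azureopendatastorage.blob.core.windows.net/yellow/puYear=2018/puMonth=*/*.parquet"
val yellowDeltaPath = "abfss://demo@datakickstartadls.dfs.core.windows.net/nyctaxi/tripdata/yellow_delta"
// Read the 2018 yellow trip parquet files and persist them as a Delta table
val yellowDF = spark.read.parquet(yellowSourcePath)
yellowDF.write.format("delta").mode("overwrite").save(yellowDeltaPath)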
// Read the taxi zone lookup CSV with its header and schema, then save it as a Delta table
val zoneDF = spark.read.option("header","true").schema(taxiZoneSchema).csv(taxiZoneSourcePath)
zoneDF.write.format("delta").mode("overwrite").save(taxiZonePath)
zoneDF.show()
// The same load-and-save using .NET for Spark (C#)
DataFrame zoneDF = spark.Read()
    .Option("header","true")
    .Schema(taxiZoneSchema)
    .Csv(taxiZoneSourcePath);
zoneDF.Write().Format("delta").Mode("overwrite").Save(taxiZonePath);
zoneDF.Show();
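Either version can be verified by reading the Delta output back with the delta format; a minimal Scala sketch, mirroring the Scala snippet above:
// Read the zone lookup back from the Delta location written above
val zoneDeltaDF = spark.read.format("delta").load(taxiZonePath)
zoneDeltaDF.show()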
dvannoy / Baseball example
Last active June 25, 2016 21:11
Commands used for SQL on Hadoop: Getting Started. Run these while following along with the presentation slides; they may not be perfect, but they should be easier than retyping what appears in the slides.
# Copy the baseball team data from the local Downloads folder into HDFS
cd /home/cloudera/Downloads
hadoop fs -ls /
hadoop fs -mkdir -p /data/baseball/team/
hadoop fs -copyFromLocal baseball/team.csv /data/baseball/team/
CREATE EXTERNAL TABLE team (
year string,
league_id string,
team_id string,