This gist collects the code examples from a Spark presentation.

val teams = sqlContext.table("lookup_example_nhl")
  .withColumn("short_name",
    when(
      locate("New York", $"team") === 1,
      regexp_extract($"team", "\\w+$", 0)
    ).when(
      (locate("Devils", $"team") > 0) ||
      (locate("Kings", $"team") > 0) ||
      (locate("Sharks", $"team") > 0) ||
      (locate("Blues", $"team") > 0),
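
The visible part of the snippet above relies on two Spark SQL functions: `locate`, which returns a 1-based position (0 when the substring is absent), and `regexp_extract` with the pattern `\w+$`, which keeps the last word of the team name. The following plain-Scala sketch mimics only that first branch, without Spark. The object name `ShortNameSketch`, the helper names, and the fallback of returning the full name are assumptions made for illustration; the rest of the original `when` chain is cut off and is not reconstructed here.

```scala
object ShortNameSketch {
  // Spark's locate(substr, str) is 1-based and returns 0 when substr is absent.
  // indexOf is 0-based and returns -1, so adding 1 yields the same numbers.
  def locate(substr: String, str: String): Int = str.indexOf(substr) + 1

  // Counterpart of regexp_extract(col, "\\w+$", 0): the trailing run of word
  // characters, or an empty string when nothing matches.
  private val LastWord = "\\w+$".r
  def lastWord(s: String): String = LastWord.findFirstIn(s).getOrElse("")

  // Only the first branch of the original chain is modelled.
  // Returning the unchanged name otherwise is a placeholder (assumption).
  def shortName(team: String): String =
    if (locate("New York", team) == 1) lastWord(team)
    else team
}
```

With these definitions, `ShortNameSketch.shortName("New York Rangers")` yields `"Rangers"`, while a name such as `"Boston Bruins"` passes through unchanged.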

CREATE VIEW IF NOT EXISTS lookup_example_nhl
AS
SELECT *
FROM lookup_example_nhl_ext
WHERE team != 'Team';

sqlContext.table("lookup_example_nhl_ext").limit(2).show()
+-------------+--------+------------------+
|         team|division|        conference|
+-------------+--------+------------------+
|         Team|Division|        Conference|
|Boston Bruins|Atlantic|Eastern Conference|
+-------------+--------+------------------+

select * from lookup_example_nhl_ext limit 5;
OK
Boston Bruins       Atlantic  Eastern Conference
Buffalo Sabres      Atlantic  Eastern Conference
Detroit Red Wings   Atlantic  Eastern Conference
Florida Panthers    Atlantic  Eastern Conference
Montreal Canadiens  Atlantic  Eastern Conference
Time taken: 0.1 seconds, Fetched: 5 row(s)

CREATE EXTERNAL TABLE IF NOT EXISTS lookup_example_nhl_ext(
  team String,
  division String,
  conference String)
COMMENT 'NHL teams'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 'hdfs:///user/<user>/lookup-example/nhl-lookup'

val schedule = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("lookup-example/san-jose-schedule-2016-2017.csv")
  .select(
    to_date(
      unix_timestamp($"START_DATE", "MM/dd/yyyy").cast("timestamp")
    ) as "date",
    when(
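
The `to_date(unix_timestamp(..., "MM/dd/yyyy").cast("timestamp"))` expression above turns a text column such as `START_DATE` into a date. As a rough plain-Scala illustration of that parsing step, the sketch below uses `java.time` with the same pattern string. The object and method names are hypothetical, the sample date in the usage note is made up, and modelling an unparsable value as `None` is an assumption rather than a statement about Spark's behaviour. `DateTimeFormatter` is also stricter than the formatter older Spark versions use, so `MM` and `dd` must be two digits here.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import scala.util.Try

object ScheduleDateSketch {
  // Same pattern string as in the Spark snippet: month/day/four-digit year.
  private val Fmt = DateTimeFormatter.ofPattern("MM/dd/yyyy")

  // Parse a START_DATE-style string; None stands in for "could not parse".
  def parseStartDate(s: String): Option[LocalDate] =
    Try(LocalDate.parse(s, Fmt)).toOption
}
```

For example, `ScheduleDateSketch.parseStartDate("10/12/2016")` gives `Some(2016-10-12)`, and a string that does not fit the pattern gives `None`.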

def combine[T](s: Seq[T]): Seq[Seq[T]] =
  for {
    len <- 1 to s.length
    combinations <- s combinations len
  } yield combinations

println(combine(List('a', 'b', 'c')))
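
To make the behaviour of `combine` concrete, here is the same function wrapped in an object (the name `CombineSketch` is only for illustration) with a small `main` that prints each combination on its own line. For a sequence of n distinct elements it produces every non-empty subset, shortest first, so 2^n - 1 results in total. Note that `combinations` treats equal elements as interchangeable, so inputs with duplicates yield fewer results.

```scala
object CombineSketch {
  // All non-empty combinations of s, grouped by increasing length.
  def combine[T](s: Seq[T]): Seq[Seq[T]] =
    for {
      len <- 1 to s.length              // lengths 1, 2, ..., n
      combinations <- s combinations len // every combination of that length
    } yield combinations

  def main(args: Array[String]): Unit = {
    // Prints: a, b, c, ab, ac, bc, abc (one per line)
    combine(List('a', 'b', 'c')).foreach(c => println(c.mkString))
  }
}
```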

// Reorder df2's columns to match df1's column order before the positional union.
df1.unionAll(df2.select(df1.columns.map(df2(_)): _*))
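
The reason for the `select` in the line above is that `unionAll` matches columns by position, not by name, so the second DataFrame has to be put into the first one's column order beforehand. The toy model below illustrates this without Spark. `Table` is a hypothetical stand-in for a DataFrame, not a Spark class, and its `select` and `unionAll` only imitate the relevant behaviour.

```scala
object UnionByNameSketch {
  // Toy stand-in for a DataFrame: ordered column names plus positional rows.
  final case class Table(columns: Seq[String], rows: Seq[Seq[Any]]) {
    // Like DataFrame.select: project and reorder columns by name.
    def select(names: Seq[String]): Table = {
      val idx = names.map(n => columns.indexOf(n))
      require(!idx.contains(-1), "unknown column requested")
      Table(names, rows.map(r => idx.map(i => r(i))))
    }

    // Like unionAll: purely positional, names are taken from the left side.
    def unionAll(other: Table): Table = Table(columns, rows ++ other.rows)
  }
}
```

Given `t1` with columns `team, division` and `t2` with columns `division, team`, calling `t1.unionAll(t2.select(t1.columns))` keeps each value under the right header, whereas `t1.unionAll(t2)` would silently put divisions in the `team` column.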