@pavlov99
Last active September 17, 2016 08:38
Medium blog: lookup-table-maintenance-in-hive snippets
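// Assumes a Spark 1.x shell where `sqlContext` is already defined and the
// spark-csv package is on the classpath.
import org.apache.spark.sql.functions._
import sqlContext.implicits._

// Load the schedule CSV and derive, for each game, its date and the opposing team.
val schedule = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("lookup-example/san-jose-schedule-2016-2017.csv")
  .select(
    // Parse the START_DATE string (MM/dd/yyyy) into a date column.
    to_date(
      unix_timestamp($"START_DATE", "MM/dd/yyyy").cast("timestamp")
    ) as "date",
    // locate is 1-based, so a result of 1 means SUBJECT starts with "San Jose";
    // in that case the competitor follows " at ", otherwise it precedes it.
    when(
      locate("San Jose", $"SUBJECT") === 1,
      regexp_extract($"SUBJECT", "^San Jose at (.*)$", 1)
    ).otherwise(
      regexp_extract($"SUBJECT", "^(.*) at San Jose$", 1)
    ) as "competitor"
  )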