@cesar1091
Created October 1, 2022 23:16
# customSchema is assumed to be defined earlier (a StructType or DDL string that includes order_date).
orders = spark.read.format("csv").schema(customSchema).load("/public/retail_db/orders/part-00000")
orders.write.format("hive").saveAsTable("orders")
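# 'yyyyMM' formats the calendar year; 'YYYY' is the week-based year, which Spark 3 does not support in date_format.
result = spark.sql("select count(*) as count,date_format(order_date,'yyyyMM') as month from orders group by date_format(order_date, 'yyyyMM')")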
result.write.option("compression","uncompressed").format("parquet").save("/user/vagrant/lab1/pregunta8/resultado")
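
The script reads the CSV with `customSchema`, which the gist never defines. A minimal sketch of what it might look like, written as a DDL string (which `DataFrameReader.schema` accepts in place of a `StructType`) and placed before the read. Only `order_date` is confirmed by the query above; the other column names, the column order, and all types are assumptions based on the usual retail_db `orders` layout and should be checked against the actual file.

```python
# Hypothetical schema for /public/retail_db/orders; only order_date appears in the original script.
# Column names, order, and types are assumptions, not taken from the gist.
customSchema = (
    "order_id INT, "
    "order_date TIMESTAMP, "
    "order_customer_id INT, "
    "order_status STRING"
)
```

Declaring `order_date` as a timestamp lets `date_format(order_date, ...)` operate on it directly; if the column were read as a string, Spark would have to cast it implicitly in the query.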