@cesar1091
Last active October 1, 2022 22:45
from pyspark.sql.types import StructType, StructField, IntegerType, StringType, FloatType

# Explicit schema for the products CSV; every column is nullable.
ProductSchema = StructType([
    StructField("product_id", IntegerType(), True),
    StructField("product_category_id", IntegerType(), True),
    StructField("product_name", StringType(), True),
    StructField("product_description", StringType(), True),
    StructField("product_price", FloatType(), True),
    StructField("product_image", StringType(), True),
])
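# `spark` is the active SparkSession (predefined in the pyspark shell).
# The schema is given explicitly, so the inferSchema option is dropped as redundant.
product = spark.read.format("csv").schema(ProductSchema).load("/public/retail_db/products/part-00000")
product.createOrReplaceTempView("product")
# Highest price per product_id.
result = spark.sql("select product_id, max(product_price) as max_price from product group by product_id")
result.createOrReplaceTempView("result")
# The text writer needs a single string column, so join the fields with '|'.
result2 = spark.sql("select concat(product_id, '|', max_price) as data from result")
# repartition(1) produces a single gzip-compressed output file.
result2.repartition(1).write.option("compression", "gzip").format("text").save("/user/vagrant/lab1/pregunta5/resultado")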