How to trim minutes and seconds from a date field in a PySpark DataFrame
Two different approaches:
Input:  2019-01-31 23:16:28
Output: 2019-01-31 23:00:00
Less efficient (string manipulation; the result is a string column, not a timestamp)
df.withColumn('tpep_pickup_datetime', concat(df.tpep_pickup_datetime.substr(1, 13), lit(':00:00')))
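More efficient than the one above (epoch arithmetic; floor is used because round would push e.g. 23:45:00 up to the next hour)
df.withColumn('tpep_pickup_datetime', (floor(unix_timestamp(col('tpep_pickup_datetime')) / 3600) * 3600).cast('timestamp'))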
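The epoch-seconds arithmetic can be checked in plain Python without a Spark session. This is only a minimal sketch, assuming timestamps are interpreted as UTC. The helper names `truncate_to_hour` and `round_to_hour` are hypothetical and not part of PySpark. It shows that flooring to a multiple of 3600 always yields the start of the current hour, while rounding can jump to the next hour.

```python
from datetime import datetime, timezone

SECONDS_PER_HOUR = 3600
FMT = "%Y-%m-%d %H:%M:%S"


def _to_epoch(ts: str) -> int:
    # Parse the string and treat it as UTC (an assumption for this sketch).
    dt = datetime.strptime(ts, FMT).replace(tzinfo=timezone.utc)
    return int(dt.timestamp())


def _from_epoch(seconds: int) -> str:
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(FMT)


def truncate_to_hour(ts: str) -> str:
    # Floor: integer-divide by 3600, then scale back up.
    epoch = _to_epoch(ts)
    return _from_epoch((epoch // SECONDS_PER_HOUR) * SECONDS_PER_HOUR)


def round_to_hour(ts: str) -> str:
    # Round: moves to the NEXT hour once past the half-hour mark.
    epoch = _to_epoch(ts)
    return _from_epoch(round(epoch / SECONDS_PER_HOUR) * SECONDS_PER_HOUR)


if __name__ == "__main__":
    for sample in ("2019-01-31 23:16:28", "2019-01-31 23:45:00"):
        print(sample, "floor:", truncate_to_hour(sample), "round:", round_to_hour(sample))
```

For the input used here, 2019-01-31 23:16:28, both variants agree, which is why a rounding-based version can look correct on a small sample and still be wrong for timestamps in the second half of an hour.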