@bh1995
Created January 11, 2021 22:20
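# Imports needed by the snippet below (listings_df is assumed to be an already-loaded Spark DataFrame)
from pyspark.sql import functions as f
from pyspark.sql.functions import col, regexp_replace

# clean listings data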
# Strip the dollar sign and thousands separators from price (e.g. '$1,250.00' -> '1250.00'), then cast to float
listings_df2 = listings_df.withColumn('price1', regexp_replace('price', '[$,]', ''))
listings_df2 = listings_df2.withColumn('price1', col('price1').cast('float'))
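# Cast the review score to float so it can be used in arithmetic
listings_df2 = listings_df2.withColumn('review_scores_rating1', col('review_scores_rating').cast('float'))
# listing_value = price per rating point, rounded to 2 decimals; rows with a missing price or rating yield null
listings_df2 = listings_df2.withColumn('listing_value', f.round(col('price1')/col('review_scores_rating1'),2))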
# Register the cleaned DataFrame as a temporary view so it can be queried with Spark SQL
listings_df2.createOrReplaceTempView("listings_df2")
# Make sure all the datatypes are now as expected
listings_df2.printSchema()