@j450h1
Created June 13, 2020 23:22
unbalanced_data
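from pyspark.sql import Window
from pyspark.sql import functions as F  # avoids shadowing Python's built-in sum/round

# Count users per churn label
user_count = user_df.groupby('churn').count()
# Divide each count by the grand total; an empty partitionBy() treats all rows as one window
user_count = user_count.withColumn('percent', F.col('count') / F.sum('count').over(Window.partitionBy()))
# Multiply by 100 and round to two decimals
user_count = user_count.withColumn('percent', F.round(user_count['percent'] * 100, 2))
user_count.orderBy('percent', ascending=False).show()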
+-----+-----+-------+
|churn|count|percent|
+-----+-----+-------+
|    0|  173|  76.89|
|    1|   52|  23.11|
+-----+-----+-------+
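
As a quick sanity check on the table above, the same percentages can be recomputed in plain Python from the two counts. This is only a sketch: the names `counts`, `total`, and `percent` are illustrative and not part of the original snippet.

```python
# Counts copied from the table above (churn label -> number of users).
counts = {0: 173, 1: 52}

# Grand total across both churn labels.
total = sum(counts.values())

# Share of each label as a percentage, rounded to two decimals.
percent = {label: round(100 * n / total, 2) for label, n in counts.items()}

print(percent)  # {0: 76.89, 1: 23.11}
```

Roughly three of every four users fall under label 0, which is the imbalance the file name refers to.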