@oneryalcin
Created September 23, 2019 22:14
6 Sparkify Session Count and Avg Song Count/Session
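from pyspark.sql import functions as F

# `data` is assumed to be the Sparkify event-log DataFrame (one row per event).
# Step 1: one row per (userId, sessionId); the max itemInSession serves as a
#         proxy for the number of items in that session.
# Step 2: per user, count the sessions and average the per-session item count.
user_engagement = (
    data
    .groupBy('userId', 'sessionId')
    .agg(F.max('itemInSession').alias('itemCount'))
    .groupBy('userId')
    .agg(F.count('sessionId').alias('sessionCount'),
         F.avg('itemCount').alias('meanSongCount'))
    .orderBy('userId')
)
user_engagement.show(10)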
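
To sanity-check what the two-step aggregation computes, the same logic can be sketched in plain Python without Spark. The rows below are made-up toy data, and the helper name `user_engagement_py` is hypothetical; neither comes from the Sparkify dataset or the original snippet.

```python
from collections import defaultdict

# Hypothetical toy rows: (userId, sessionId, itemInSession). Not real Sparkify data.
rows = [
    ("u1", 1, 0), ("u1", 1, 1), ("u1", 1, 2),
    ("u1", 2, 0), ("u1", 2, 1),
    ("u2", 7, 0), ("u2", 7, 4),
]

def user_engagement_py(rows):
    """Mirror the Spark query: max item per session, then per-user count and mean."""
    # Step 1: max itemInSession per (userId, sessionId).
    item_count = {}
    for user, session, item in rows:
        key = (user, session)
        item_count[key] = max(item_count.get(key, item), item)

    # Step 2: gather the per-session maxima for each user.
    per_user = defaultdict(list)
    for (user, _session), count in item_count.items():
        per_user[user].append(count)

    # One tuple per user: (userId, sessionCount, meanSongCount), ordered by userId.
    return [(u, len(c), sum(c) / len(c)) for u, c in sorted(per_user.items())]

result = user_engagement_py(rows)
print(result)
```

Note that the averaged value is the largest `itemInSession` seen in each session, so it matches a true song count only to the extent that this index tracks songs played.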