Created
September 23, 2019 22:19
-
-
Save oneryalcin/f4faade2c0210092fc852e698839708a to your computer and use it in GitHub Desktop.
7 Sparkify SQL
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
# Show that we can do the same calculation above using SQL | |
data.createOrReplaceTempView('sparkify') | |
sub_query = """ | |
SELECT | |
userId, | |
sessionId, | |
max(itemInSession) as itemCount | |
FROM | |
sparkify | |
GROUP BY | |
userId, sessionId | |
ORDER BY | |
userId, sessionId | |
""" | |
query = """ | |
SELECT | |
userId, | |
count(userId) as sessionCount, | |
avg(itemCount) as meanSongCount | |
FROM | |
({sub_query}) | |
GROUP BY | |
userId | |
ORDER BY | |
userId | |
""".format(sub_query=sub_query) | |
spark.sql(query).show(10) |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment