@naiborhujosua
Created October 26, 2020 15:07
# Assumes `msd` is an already-loaded Spark DataFrame with userId and songId columns
# Look at the first rows of the data
msd.show()
# Count the number of distinct userIds
user_count = msd.select("userId").distinct().count()
print("Number of users:", user_count)
# Count the number of distinct songIds
song_count = msd.select("songId").distinct().count()
print("Number of songs:", song_count)
@naiborhujosua

Let's get familiar with the Million Songs Echo Nest Taste Profile data subset. For the purposes of this course, we'll just call it the Million Songs dataset, or msd. The snippet above shows the first rows, then counts the distinct users and the distinct songs. Let's also see which songs in this subset have the most plays.
