Spotify: July Top 50 - Cleaning Log

Gist: miafreidenburg/27ed1967ec91c471dedf64d18af485fa
Step | Cleaning Step | Notes
---|---|---
1 | Remove the [, ], and ' characters from the artists column | Used find and replace; the characters were left over from the Python list format of the artists field
2 | Manually inspect artist names and correct them as needed | Corrected the entry for Beyoncé
3 | Remove the index column | The index column was carried over from the Python DataFrame export
4 | Move the rank column from last to first in the column order | Cut and pasted the column from the end of the data set to the front
5 | Rename the track_title, track_id, and track_artists columns | Removed the "track_" prefix from these column names for simplicity
6 | Rename the title column and duplicate it for an abbreviated version | Renamed "title" to "title_long", then copied the "title_long" values into a new "title" column
7 | Manually inspect song titles and abbreviate or standardize them as needed | Main changes: removed redundant artist names, removed show/movie titles, and standardized formatting
8 | Create a column for song duration in seconds | Calculated with a simple formula, then pasted as values; rounded to whole seconds
9 | Create a column for song duration in minutes | Calculated with a simple formula, then pasted as values; rounded to one decimal place
10 | Export the "CLEAN" sheet as a .csv file | Named "july_top_50_clean.csv"
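Step 1 was done with find and replace in a spreadsheet. As an illustration only, the same cleanup can be sketched in Python: `clean_artists_replace` mirrors the find-and-replace step, while `clean_artists_parse` is a swapped-in alternative (not what this log used) that parses the list literal with `ast.literal_eval`. Both function names and the sample cells are hypothetical.

```python
import ast


def clean_artists_replace(value: str) -> str:
    """Mirror the find-and-replace step: delete every '[', ']' and "'" character."""
    return value.translate(str.maketrans("", "", "[]'"))


def clean_artists_parse(value: str) -> str:
    """Alternative (not used in this log): parse the Python list literal.

    Assumes the cell holds the output of str() on a list of names.
    """
    artists = ast.literal_eval(value)
    return ", ".join(artists)


# Made-up cells, as written by str() on a Python list.
plain = "['Artist A', 'Artist B']"
with_apostrophe = str(["DJ O'Neil"])  # Python quotes this name with double quotes

print(clean_artists_replace(plain))            # Artist A, Artist B
print(clean_artists_replace(with_apostrophe))  # "DJ ONeil"
print(clean_artists_parse(with_apostrophe))    # DJ O'Neil
```

The blind delete also strips apostrophes that belong to a name and leaves Python's double quotes behind, which is one reason a manual inspection pass like step 2 is still worthwhile.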
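Steps 3 through 6 only reshape columns, so they can be expressed as a single pass over the rows. The sketch below uses the standard-library `csv` module and rests on two assumptions: the raw file came from pandas with its default unnamed index (which `DataFrame.to_csv` writes under an empty header), and the new `title` column sits directly after `title_long`, an ordering the log does not specify. `restructure` and `RENAMES` are hypothetical names.

```python
import csv
import io

# Steps 5 and 6 combined: track_title -> title -> title_long.
RENAMES = {"track_title": "title_long", "track_id": "id", "track_artists": "artists"}


def restructure(csv_text: str, index_col: str = "") -> list[dict[str, str]]:
    """Apply steps 3-6 to raw CSV text and return the reshaped rows."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row.pop(index_col, None)         # step 3: drop the exported index column
        new = {"rank": row.pop("rank")}  # step 4: rank becomes the first column
        for key, value in row.items():
            new_key = RENAMES.get(key, key)  # steps 5-6: rename per RENAMES
            new[new_key] = value
            if new_key == "title_long":
                # Step 6: duplicate into "title", to be abbreviated by hand in step 7.
                new["title"] = value
        rows.append(new)
    return rows


raw = ",track_title,track_id,track_artists,rank\n0,Song A,abc123,Artist A,1\n"
print(restructure(raw))
# [{'rank': '1', 'title_long': 'Song A', 'title': 'Song A', 'id': 'abc123', 'artists': 'Artist A'}]
```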
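The log does not say which column the duration formulas in steps 8 and 9 read from. Assuming a `duration_ms` value in milliseconds (the unit the Spotify Web API uses for track durations), the two formulas could be sketched as below. `duration_columns` is a hypothetical helper; it rounds half away from zero with the `decimal` module because spreadsheet `ROUND` behaves that way, whereas Python's built-in `round` sends ties to the nearest even number.

```python
from decimal import Decimal, ROUND_HALF_UP


def duration_columns(duration_ms: int) -> tuple[int, float]:
    """Return (seconds rounded to a whole number, minutes rounded to 0.1).

    Assumes the input is a duration in milliseconds.
    """
    seconds = Decimal(duration_ms) / 1000
    # ROUND_HALF_UP rounds ties away from zero, like spreadsheet ROUND.
    whole_seconds = int(seconds.quantize(Decimal("1"), rounding=ROUND_HALF_UP))
    # Minutes are computed from the unrounded seconds, then rounded once.
    minutes = (seconds / 60).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)
    return whole_seconds, float(minutes)


print(duration_columns(200500))  # (201, 3.3); round(200.5) would give 200
```

Deriving minutes from the unrounded seconds avoids compounding two roundings; if the spreadsheet formula divided the already rounded seconds column instead, a few values could differ in the last decimal place.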
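Step 10 was an export from the spreadsheet. If the cleaned rows were held in Python instead, a comparable export could use `csv.DictWriter`. `write_clean_csv` and the sample row are hypothetical; opening the file with `encoding="utf-8"` keeps accented names such as Beyoncé intact.

```python
import csv
from pathlib import Path


def write_clean_csv(rows: list[dict[str, object]], path: str = "july_top_50_clean.csv") -> None:
    """Write rows (all sharing the same keys) to a CSV file with a header row."""
    # newline="" lets the csv module control line endings, as its docs recommend.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)


# A made-up row, written to a scratch file rather than the real output name.
write_clean_csv([{"rank": 1, "title": "Song A", "artists": "Beyoncé"}], "example_clean.csv")
print(Path("example_clean.csv").read_text(encoding="utf-8"))
```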