@miafreidenburg
Created October 6, 2023 00:26
Spotify: July Top 50 - Cleaning Log
| # | Cleaning Steps | Notes |
|---|---|---|
| 1 | Remove [, ], and ' characters in the artists column | Using find and replace; characters are present because of list format in Python |
| 2 | Manually inspect artist names and fix as needed | Beyoncé fixed |
| 3 | Remove index column | Index column present because of DataFrame format in Python |
| 4 | Move rank column from last to first in the column order | Cut/paste from the end of the data set to the front |
| 5 | Rename track_title, track_id, and track_artists columns | Removed "track_" from column names for simplicity |
| 6 | Rename title column and duplicate for abbreviated version | Changed from "title" to "title_long"; copied values from "title_long" to new "title" column |
| 7 | Manually inspect song titles and abbreviate or standardize as needed | Primary changes: removing superfluous artist names, removing show/movie titles, and standardizing formatting |
| 8 | Create column for song duration in seconds | Using a simple formula and copy/pasting as values; rounded to whole seconds |
| 9 | Create column for song duration in minutes | Using a simple formula and copy/pasting as values; rounded to one decimal place |
| 10 | Export "CLEAN" sheet as a .csv file | Named "july_top_50_clean.csv" |
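
The log above describes manual spreadsheet work. As a rough illustration only, steps 1, 3, 4, 5, 8, and 9 could be sketched in Python on a single row. The function name `clean_row`, the source column `duration_ms` (assumed to hold milliseconds), and the output names `duration_sec` and `duration_min` are assumptions; the log does not name the duration columns or state the formula used.

```python
import ast


def clean_row(raw):
    """Apply steps 1, 3, 4, 5, 8, and 9 of the log to one raw row (a dict)."""
    # Step 1: the artists cell is a stringified Python list such as
    # "['Artist A', 'Artist B']"; parse it and join the names.
    artists = ", ".join(ast.literal_eval(raw["track_artists"]))

    # Steps 8-9: "duration_ms" is an ASSUMED source column in milliseconds.
    seconds = round(raw["duration_ms"] / 1000)       # whole seconds
    minutes = round(raw["duration_ms"] / 60000, 1)   # one decimal place

    # Steps 3-5: leave out the index column, put rank first, and drop the
    # "track_" prefix from the remaining column names.
    return {
        "rank": raw["rank"],
        "title": raw["track_title"],
        "id": raw["track_id"],
        "artists": artists,
        "duration_sec": seconds,   # hypothetical column name
        "duration_min": minutes,   # hypothetical column name
    }


# Made-up example row, not taken from the actual data set.
example = {
    "index": 0,
    "track_title": "Example Song",
    "track_id": "abc123",
    "track_artists": "['Artist A', 'Artist B']",
    "duration_ms": 215000,
    "rank": 1,
}
cleaned = clean_row(example)
```

Steps 2, 6, and 7 rely on manual inspection and are left out of the sketch. Note that Python's `round` resolves exact halves to the nearest even number, which can differ from a spreadsheet's rounding on such values.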