Last active September 25, 2020 01:33
Parsing song titles on YouTube
/* eslint-disable */
// YouTube Music Uploader Hall of Shame
// Trying to grok the range of malformed input in song title strings
'BPC335 - Maxime Iko "Concilium"', // wrong order (catalog number before everything else), extra info (catalog number), bad separator `"`, bad extra separator `-`
'"Pollution" by Tom Lehrer', // wrong order (`Artist - Title` reversed), quotes, bad separator `by`
'DIS IZ WHY I\'M HOT (zef remix) - Die Antwoord', // `Artist - Title` reversed, bad case
'Man with no name - Teleport (Original mix). HQ', // bad case, noisy `(Original mix)`, extra info `HQ`, bad extra separator `.`
'Varg — Under Beige Nylon', // uneven spaces, bad separator `—`
'varg - under beige nylon - 46bpm', // bad case, bad extra separator `-`, extra info `46bpm`
'Kangding Ray AMBER DECAY', // no separator, bad case
'Falling in drop C.', // no artist at all, dubious punctuation `.`
'Asa Moto - Playtime - DEEWEE030', // bad extra separator `-`, extra info (catalog number)
'Voodoo People - Quadsep - 1995', // bad extra separator `-`, extra info (year)
'Teste - The Wipe (5am Synaptic) - Plus 8 Records - 1992', // bad extra separator `-`, extra info (label, year)
'Varg | I Did Not Always Appear This Way [Ascetic House 2015]', // bad separator `|`, label and year unseparated inside brackets
'Pig&Dan -The Saint Job San (Lee Van Dowski Remix)', // uneven spaces, incorrect artist spelling
'NATHAN FAKE, THE TURTLE (HARD ISLANDS, 2009)', // bad case, bad separator `,`, extra info (album, year)
'Ambi Sessions 12/11 {Ambient Techno-Tribal-Dub Techno-Meditative}', // no artist, ambiguous date, extra info (genres), bad extra separator `-`
'PILLDRIVER // PITCH HIKER', // bad case, bad separator `//`
'Wu-Tang Clan -- One Blood instrumental', // bad separator `--`, extra info `instrumental` (no parens, wrong case)
'Mobb Deep "Peer Pressure"', // bad separator `"`
// OK these next ones are not so bad
'The Prodigy - Voodoo People ( Parasense Rmx )', // spaces around parens
'Bernstein - Álom (Original Mix)', // uneven spaces, noisy `(Original Mix)`
'Causa - Stages (Forthcoming Artikal Music UK)' // junk info in parens where `Remix` is expected
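
The separator problems catalogued above hint at what a first parsing pass might look like. Below is a minimal sketch, not part of the original gist: `splitTitle` and `SEPARATOR` are hypothetical names, and the code assumes `Artist <separator> Title` order, so reversed titles, comma separators, and catalog-number prefixes still come out wrong or unparsed.

```javascript
// Hypothetical helper, not part of the original gist.
// Assumes `Artist <separator> Title` order; reversed titles are not detected.

// Dashes count as separators only with whitespace on at least one side,
// so hyphenated names such as `Wu-Tang` stay intact. `//` and `|` always split.
const SEPARATOR = /\s+[-–—]+\s*|\s*[-–—]+\s+|\s*\/\/\s*|\s*\|\s*/;

function splitTitle(raw) {
  const text = raw.trim();

  // `"Title" by Artist`
  const byForm = text.match(/^"([^"]+)"\s+by\s+(.+)$/i);
  if (byForm) return { artist: byForm[2], title: byForm[1], extra: [] };

  // `Artist "Title"`
  const quoted = text.match(/^(.+?)\s+"([^"]+)"$/);
  if (quoted) return { artist: quoted[1], title: quoted[2], extra: [] };

  // Everything else: split on a separator, drop empty fragments.
  const parts = text.split(SEPARATOR).map((p) => p.trim()).filter(Boolean);
  if (parts.length < 2) return null; // no usable separator: leave it for a human

  // First two parts are taken as artist and title; the rest is extra info
  // (label, year, bpm, catalog number) that still needs classifying.
  const [artist, title, ...extra] = parts;
  return { artist, title, extra };
}

console.log(splitTitle('Wu-Tang Clan -- One Blood instrumental'));
console.log(splitTitle('Teste - The Wipe (5am Synaptic) - Plus 8 Records - 1992'));
console.log(splitTitle('Kangding Ray AMBER DECAY')); // null: no separator to go on
```

Returning `null` instead of guessing keeps the ambiguous cases (`Kangding Ray AMBER DECAY`, `Falling in drop C.`) visible rather than silently mis-tagged. Case normalization and stripping noise like `(Original mix)` or `HQ` would be separate passes.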

Related: minimaxir/big-list-of-naughty-strings

The Big List of Naughty Strings is a list of strings which have a high probability of causing issues when used as user-input data
