The YouTube Spam Collection v. 1 is a public set of labeled YouTube comments collected for spam research. It comprises five datasets containing a total of 1,956 real, non-encoded messages, each tagged as legitimate (ham) or spam.
The corpus was collected using the YouTube Data API v3.
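Because every message carries a ham or spam tag, a first sanity check on any of the five datasets is to tally the two classes. The following is a minimal sketch, assuming each dataset is a CSV file with a `CONTENT` column for the comment text and a `CLASS` column where `1` means spam and `0` means ham. Those column names and the label encoding are assumptions, and the sample rows are made up for illustration rather than taken from the collection.

```python
import csv
import io
from collections import Counter

# Made-up rows for illustration only; they are not taken from the collection.
# The column names CONTENT and CLASS, and the 1 = spam / 0 = ham encoding,
# are assumptions about the file layout.
SAMPLE_CSV = """CONTENT,CLASS
"Check out my channel for free gift cards!!!",1
"This song never gets old",0
"Subscribe to me and I will subscribe back",1
"""

LABEL_NAMES = {"0": "ham", "1": "spam"}


def count_labels(csv_file):
    """Return a Counter mapping 'ham'/'spam' to the number of rows with that label."""
    reader = csv.DictReader(csv_file)
    return Counter(LABEL_NAMES[row["CLASS"]] for row in reader)


counts = count_labels(io.StringIO(SAMPLE_CSV))
print(dict(counts))
```

To run the same tally on a real file, pass an open file handle instead of the in-memory sample, for example `open(path, newline="", encoding="utf-8")`, where `path` is a hypothetical path to one of the dataset files.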
The samples were extracted from the comments sections of five videos that were among the 10 most viewed on YouTube during the collection period. The table below lists the five datasets, giving for each the YouTube video ID, the number of samples in each class, and the total number of samples.