@XWilliamY
Created July 3, 2020 05:31
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Working With Youtube Comments\n",
"\n",
"## Introduction\n",
"\n",
"A few questions motivated me to start this project. What are people saying about Blackpink, the individual members, and YG? What is the general sentiment, and does this differ for individual members, the entertainment company, and across different languages? \n",
"\n",
"I chose to analyze YouTube comments to answer these questions, and obtained them from Blackpink's latest prerelease single, <i> How You Like That </i>.\n",
"\n",
"# Setup\n",
"\n",
"Before anything, we need to gather a sizable dataset of Youtube comments. For this, I chose to use YouTube's official Data API. I averaged just about 2400 comments per API key, so I used a few to obtain more comments for the dataset. I also chose to ignore the replies of a commentThread, saving instead just the top comment."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Reading in CSV Files\n",
"We'll use pandas to work with our generated csv files, missingno to quickly visualize whether we have any missing values, regex to do some regular expression matching, and numpy to generate indices for a task later."
]
},
{
"cell_type": "code",
"execution_count": 538,
"metadata": {},
"outputs": [],
"source": [
"# but first, import necessary libraries\n",
"import pandas as pd\n",
"import missingno as msno\n",
"import numpy as np\n",
"import emoji\n",
"import regex\n",
"%matplotlib inline"
]
},
{
"cell_type": "code",
"execution_count": 539,
"metadata": {},
"outputs": [],
"source": [
"headers = [\"Comment\", \"Comment ID\", \"Reply Count\", \"Like Count\", \"Viewer Rating\"]\n",
"\n",
"def read_in_csvs(list_filenames):\n",
" return pd.concat([pd.read_csv(filename+\".csv\", names=headers) for filename in list_filenames])\n",
"\n",
"filenames = [\"aya_blackpink_time_3\", \"aya_blackpink_time_4\", \n",
" \"normal_blackpink_time_3\", \"normal_blackpink_time_4\"]\n",
"\n",
"blackpink = read_in_csvs(filenames)"
]
},
{
"cell_type": "code",
"execution_count": 540,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"(3320, 5)"
]
},
"execution_count": 540,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"blackpink.shape"
]
},
{
"cell_type": "code",
"execution_count": 541,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Comment</th>\n",
" <th>Comment ID</th>\n",
" <th>Reply Count</th>\n",
" <th>Like Count</th>\n",
" <th>Viewer Rating</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>곡이 지난곡을 재생산하는느낌은지울수가없네요. 변화가필요합니다.</td>\n",
" <td>UgzZok9hFyZ6KXL7dx54AaABAg</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Yay already 10 M likes</td>\n",
" <td>UgwGx_T_qoH8wfxyACV4AaABAg</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>#RESPECTLISA STOP WITH THE BULLYING</td>\n",
" <td>Ugztt7qshUYp2Eprybh4AaABAg</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>does no one wanna talk about Rosé with balacla...</td>\n",
" <td>UgxJvrhriUipNrhc9Nl4AaABAg</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>I HOPE THE YG CEO LET BLACKPINK AND OTHER ART...</td>\n",
" <td>UgymlRqv2PpCO_oOzYd4AaABAg</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Comment \\\n",
"0 곡이 지난곡을 재생산하는느낌은지울수가없네요. 변화가필요합니다. \n",
"1 Yay already 10 M likes \n",
"2 #RESPECTLISA STOP WITH THE BULLYING \n",
"3 does no one wanna talk about Rosé with balacla... \n",
"4 I HOPE THE YG CEO LET BLACKPINK AND OTHER ART... \n",
"\n",
" Comment ID Reply Count Like Count Viewer Rating \n",
"0 UgzZok9hFyZ6KXL7dx54AaABAg 0 0 NaN \n",
"1 UgwGx_T_qoH8wfxyACV4AaABAg 0 0 NaN \n",
"2 Ugztt7qshUYp2Eprybh4AaABAg 0 0 NaN \n",
"3 UgxJvrhriUipNrhc9Nl4AaABAg 0 0 NaN \n",
"4 UgymlRqv2PpCO_oOzYd4AaABAg 0 0 NaN "
]
},
"execution_count": 541,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"blackpink.head()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}