@XWilliamY
Created July 3, 2020 05:27
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Working With Youtube Comments\n",
"\n",
"## Introduction\n",
"\n",
"A few questions motivated me to start this project. What are people saying about Blackpink, the individual members, and YG? What is the general sentiment, and does this differ for individual members, the entertainment company, and across different languages? \n",
"\n",
"I chose to analyze YouTube comments to answer these questions, and obtained them from Blackpink's latest prerelease single, <i> How You Like That </i>.\n",
"\n",
"# Setup\n",
"\n",
"Before anything, we need to gather a sizable dataset of Youtube comments. For this, I chose to use YouTube's official Data API. I averaged just about 2400 comments per API key, so I used a few to obtain more comments for the dataset. I also chose to ignore the replies of a commentThread, saving instead just the top comment."
]
},
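{
"cell_type": "markdown",
"metadata": {},
"source": [
"The collection step isn't shown in this notebook, so here is only a minimal sketch of what it could look like, assuming the `google-api-python-client` package and the Data API's `commentThreads.list` endpoint. The names `thread_to_row`, `fetch_top_level_comments`, and `max_pages` are illustrative, and the mapping of response fields onto the five CSV columns is an assumption, not the script that produced the CSV files. Requesting only the `snippet` part is what leaves the replies out."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch only: illustrative names, not the original collection script.\n",
"def thread_to_row(item):\n",
"    # Flatten one commentThread resource into the five CSV columns.\n",
"    thread = item[\"snippet\"]\n",
"    top = thread[\"topLevelComment\"]\n",
"    comment = top[\"snippet\"]\n",
"    return [\n",
"        comment[\"textDisplay\"],\n",
"        top[\"id\"],\n",
"        thread[\"totalReplyCount\"],\n",
"        comment[\"likeCount\"],\n",
"        comment.get(\"viewerRating\"),  # None if the field is absent\n",
"    ]\n",
"\n",
"\n",
"def fetch_top_level_comments(youtube, video_id, max_pages=5):\n",
"    # `youtube` is assumed to come from\n",
"    # googleapiclient.discovery.build(\"youtube\", \"v3\", developerKey=...).\n",
"    rows, page_token = [], None\n",
"    for _ in range(max_pages):\n",
"        params = {\n",
"            \"part\": \"snippet\",  # top-level comment only, no replies\n",
"            \"videoId\": video_id,\n",
"            \"maxResults\": 100,\n",
"            \"order\": \"time\",\n",
"            \"textFormat\": \"plainText\",\n",
"        }\n",
"        if page_token:\n",
"            params[\"pageToken\"] = page_token\n",
"        response = youtube.commentThreads().list(**params).execute()\n",
"        rows.extend(thread_to_row(item) for item in response.get(\"items\", []))\n",
"        page_token = response.get(\"nextPageToken\")\n",
"        if not page_token:\n",
"            break\n",
"    return rows"
]
},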
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Reading in CSV Files\n",
"We'll use pandas to work with our generated csv files, missingno to quickly visualize whether we have any missing values, regex to do some regular expression matching, and numpy to generate indices for a task later."
]
},
{
"cell_type": "code",
"execution_count": 538,
"metadata": {},
"outputs": [],
"source": [
"# but first, import necessary libraries\n",
"import pandas as pd\n",
"import missingno as msno\n",
"import numpy as np\n",
"import emoji\n",
"import regex\n",
"%matplotlib inline"
]
},
{
"cell_type": "code",
"execution_count": 539,
"metadata": {},
"outputs": [],
"source": [
"headers = [\"Comment\", \"Comment ID\", \"Reply Count\", \"Like Count\", \"Viewer Rating\"]\n",
"\n",
"def read_in_csvs(list_filenames):\n",
" return pd.concat([pd.read_csv(filename+\".csv\", names=headers) for filename in list_filenames])\n",
"\n",
"filenames = [\"aya_blackpink_time_3\", \"aya_blackpink_time_4\", \n",
" \"normal_blackpink_time_3\", \"normal_blackpink_time_4\"]\n",
"\n",
"blackpink = read_in_csvs(filenames)"
]
},
{
"cell_type": "code",
"execution_count": 540,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"(3320, 5)"
]
},
"execution_count": 540,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"blackpink.shape"
]
},
{
"cell_type": "code",
"execution_count": 541,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Comment</th>\n",
" <th>Comment ID</th>\n",
" <th>Reply Count</th>\n",
" <th>Like Count</th>\n",
" <th>Viewer Rating</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>곡이 지난곡을 재생산하는느낌은지울수가없네요. 변화가필요합니다.</td>\n",
" <td>UgzZok9hFyZ6KXL7dx54AaABAg</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Yay already 10 M likes</td>\n",
" <td>UgwGx_T_qoH8wfxyACV4AaABAg</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>#RESPECTLISA STOP WITH THE BULLYING</td>\n",
" <td>Ugztt7qshUYp2Eprybh4AaABAg</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>does no one wanna talk about Rosé with balacla...</td>\n",
" <td>UgxJvrhriUipNrhc9Nl4AaABAg</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>I HOPE THE YG CEO LET BLACKPINK AND OTHER ART...</td>\n",
" <td>UgymlRqv2PpCO_oOzYd4AaABAg</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Comment \\\n",
"0 곡이 지난곡을 재생산하는느낌은지울수가없네요. 변화가필요합니다. \n",
"1 Yay already 10 M likes \n",
"2 #RESPECTLISA STOP WITH THE BULLYING \n",
"3 does no one wanna talk about Rosé with balacla... \n",
"4 I HOPE THE YG CEO LET BLACKPINK AND OTHER ART... \n",
"\n",
" Comment ID Reply Count Like Count Viewer Rating \n",
"0 UgzZok9hFyZ6KXL7dx54AaABAg 0 0 NaN \n",
"1 UgwGx_T_qoH8wfxyACV4AaABAg 0 0 NaN \n",
"2 Ugztt7qshUYp2Eprybh4AaABAg 0 0 NaN \n",
"3 UgxJvrhriUipNrhc9Nl4AaABAg 0 0 NaN \n",
"4 UgymlRqv2PpCO_oOzYd4AaABAg 0 0 NaN "
]
},
"execution_count": 541,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"blackpink.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I was originally interested in whether or not a user who commented also rated the video, but since so many didn't, for the sake of using up less of my daily quota, I decided to just drop the viewer rating column."
]
},
{
"cell_type": "code",
"execution_count": 507,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Reply Count</th>\n",
" <th>Like Count</th>\n",
" <th>Viewer Rating</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>3320.000000</td>\n",
" <td>3320.000000</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>0.349398</td>\n",
" <td>1.008434</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>1.381091</td>\n",
" <td>1.520946</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>0.000000</td>\n",
" <td>1.000000</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>0.000000</td>\n",
" <td>1.000000</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>31.000000</td>\n",
" <td>33.000000</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Reply Count Like Count Viewer Rating\n",
"count 3320.000000 3320.000000 0.0\n",
"mean 0.349398 1.008434 NaN\n",
"std 1.381091 1.520946 NaN\n",
"min 0.000000 0.000000 NaN\n",
"25% 0.000000 0.000000 NaN\n",
"50% 0.000000 1.000000 NaN\n",
"75% 0.000000 1.000000 NaN\n",
"max 31.000000 33.000000 NaN"
]
},
"execution_count": 507,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"blackpink.describe()"
]
},
{
"cell_type": "code",
"execution_count": 508,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x12ab26610>"
]
},
"execution_count": 508,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAABcQAAAKaCAYAAAAQ6lqHAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAIABJREFUeJzs3XnYbWP9x/H39wzOCYckU2VIEkohDcbQRJn5RRkyVhQRkTnz3BEyRJlD/CLzUCJkSBEikRLxy5yZc5zv74/73qfV7jjO+Oy9n/1+Xde5PM9aaz/dz3W1nrXWZ9339xuZiSRJkiRJkiRJg92QTg9AkiRJkiRJkqSBYCAuSZIkSZIkSeoLBuKSJEmSJEmSpL5gIC5JkiRJkiRJ6gsG4pIkSZIkSZKkvmAgLkmSJEmSJEnqCwbikiRJkiRJkqS+YCAuSZIkSZIkSeoLBuKSJEmSJEnSIBERw9q+j06NRepGBuKSJEmSJEnSIBARkZljI2KmiPgGQGamobj0bwbikiRJkiRJUo+LiKE1/B4CbAEcHRG7gKG41DTszQ+RJEmSJEmS1K1qGP56RMwMbAN8DHgBOCQihmfmga1QPDOzs6OVOis8ByRJkiRJkqTeFhEzAbcDfwVuBh4Bvg58ENg/M/epxxmKq685Q1ySJEmSJEnqfXsCI4BtgQfrjPAbgG8Be0XEq5l5kDPF1e+sIS5JkiRJkiT1vvmBJzLzL60NmXkvMBq4BzggIr5TtxuGq28ZiEuSJEmSJEm97zFggYiYrc4CHwaQmfcAxwEvAgdFxM6dHKTUaQbikiRJkiRJUo+IiDfK834NvAocHBGzZObYKIYA8wKXAWcCW0XEggM0XKnrWENckqQ+ERHDgcWAVYCZgaeAi4FHM/P1To5N0vQREUMyc1ynxyFJkqaNiBhWg+4RwDLAc8DjmfkIcAmwBrB2OTS+AzwLLAwsD5wK/BPYGHgn8ODA/wZS54UlgyRJGvxqx/njgaWAdwOvAbMCfwFOAo7OzFc6N0JJ00pEjGyez/WBeebMfKqDw5IkSVOp1QgzIkYBV1Lu698G3AQcmZkX10kwxwGfA2YC/g7MSQnCPwx8EjgD+Fxm3taBX0PqOEumSJI0yNUb5lsos0BGU26I3w98ghKM7wscEhEjOzZISdNERMwCbB4RX6vfDwNuBDaMiOjo4CRJ0hSLiKE1DB8KXEQpjfJ14DvAW4HjIuILmTkG+BrwVeAE4PfAMcBSmTkW+Aql1vjfBv63kLqDJVMkSRrEahh+J3A/5Yb5L5k5LiJeysxHI+LjwAWUG+bHI+KweqMsqTeNBT4NLB8RswNbUR54L0iXhkqS1JPqzPDX6wSW4cCfgFMy89a6/x5gd+D7dRL5eZTyKZc0fsbiEbErpXziJzLzyQH/RaQuYckUSZIGqVom5W7gGeDTrXIJjaWWrfqDoygzSGcFPpuZf+rcqCVNqea5DVwPLAk8BHwyMx9p7e/sKCVJ0qSIiHmBlxr38EOBXwIrAvdS7u8fbRz/SWBv4L3ANzLzZ419i9V9CwJbZeadA/aLSF3IkimSJA1eHwDmB16kNNL5DzUMH5aZzwObUUqqrDOgI5Q0zdQwfHhd5TErZSn1KGq5lLrf+39JkrpcRCxC6fXzkcbmYcDPgD8AcwNvr8fOAJCZvwT2o8wePz8iPtH6YGbeA+wPrGEYLhmIS5I0aGXmLcDngeWAAyJiubo9W7WEayg+hNJs51FKKC6pxzTO6TF10zbA8sCDwDeBnSNiSC2ZFM3PSJKk7lJXbH41M6+IiGER8bbMfBU4CTiW8tL7zIiYMTNfawvFj6A01byh7Wf+MTP/ObC/idSdDMQlSRpEImLGiFi79X1mXg6sAawM7B8Ry9btGVVmjmvUELR+uNRj6kqP1jk9FCAzr8vMu4B1KS+8tgd2qvsyIuYBvlnrjEuSpC5RS5+RmadExAjgFmDXiJgzM1
8GfgLsCcwD/GYCofhlmfmNWnN8aKd+D6mbGYhLkjRI1JneJwBnRMTmre2ZeSklFF+JMlN8fCjeqiccEWsCL1M61kvqERExtK70mBk4DbgiIn4eEUvWB+TH+Xco/o2IOCAiPkZZcr0+pceAJEnqHuNXcNVZ4U8AOwBfaYTiZwO7UlZ33tgIxYc1f1Bmvj6A45Z6hoG4JEmDRGaOA06h1BXcIyK2aOxrD8WXa+2LiDmB/6E037trIMcsacrVFR6vR8RbgFuBDwLjKP0DLgPWiohZ6/LodYE/Ux6orwSGAivXEio+E0hdrHmOWupIGpwiYvaI+HgtbzYmImaNiEMBMnNV4HxgX/47FN+FUk/8gYgYUfuISHoTYaN5SZIGlxp2HwnMARyYmT9u7Ps8cDHwK2Av4LfA0ZSZop+oDXckdblaJqXVA2AdSmPcrwFPAbNRZosvTakfflFm/isiZgOWBUYCF9YwfZgPz1L3aTTCjfShXRrU6ouuVYAfUe7LTwTuocwMX7VV2jAifgJsAOwD/DAzH4+IkcBWwGrAms4IlyaNgbgkSYPQJIbi1wIvAJ8ClsvM2zswVElTKCJmAr4HjAKez8yvNvbNApwHfIRSP/yizHyu7fNDfXCWuk8NxzYGZs7M4+u2G4E7M3Objg5O0nQREfNSVnHtSHm5fTuwWWY+2rxev0EoPgMwpr5E89ouTQKXR0qSNAhl5o2UBnpPMOHyKZ+nlE9ZHfi4YbjUkxYEtgY2BMa0Ntbl1s9RSiHdCowGNqyzyMbzgVnqWjMC7wJ+EBF7RsS59fszOzssSdNLZj5MmRn+OjA7cFtmPlr3vd5otPkl4BzKSs+dImK2zHytsaLEa7s0CZwhLknSIPYmM8VXAh63TIrUGyZU3iQilgSuo9QO/1JmXla3D6n1wUdRSiQ9WWuQSuoBEfFOSm3g7YB/AUtm5t86OihJ00WjRNJywBcpL8U2A3bPzEMax42/D4iIK4CZgRUsqyRNPmeIS5LUoyJi6JsdM4GZ4ps19l1rGC71jlozfKaIOKix7XZgZWAGYL+I+FTdPq6G4s8DKwKf68igJU2RzPwH5UXXa8CswOatfTbClQaH1rncCLRvzsxvUJpnHgMcFBHfaR1f7wPmqF+vCqzYmhk+wEOXep4XUmmQaS2lkjQ4RcSQiBjaXBIZEXNO7DONUPxRYHREbDwAQ5U0fXwJ+E5EnN7akJm/o5RAWgw4pC0Uj8x8qRWQd2TEkiZJK9RqhFtXUEqcnQDsFRH7wb/P7bbPen5LPaTO9h4XEcMj4p11xddMAJn5EHAUcCwlFN85It5S64z/MiL2r8e1Xn47Q1yaTAZn0iBQZ4nOl5l/bSyh2iAzz+3w0CRNe18AlsrMXQAi4tfAsxGxXmaOeaMPZeaNEbEXsDtw88AMVdJ08HNgTmDf2jhrI4DMvLWWQfoVcGBEjMzMS5oPyZk5riMjlvSm2hrhjQRezswr676H6vY960uuveqs0JHAJpl5kue31Dvq+T42ImYGzgfmAxYBfhER52bmjzLzrxExGkjgMGAd4G31+/1aP8tzX5oyBuLS4LAMcFhEHJ2Z50TE5cDHI+LXmflYpwcnadqoK0DGAt+KiPkps0jmA3aZWBjekpnXRsQtmfnydB6qpGmgvWZ4DcIej4gTKSs9vxsRTCAUv4Xy4HxJJ8YtafLUGZ6tVV97AktFxDPA5Zl5fmY+EBFH1cP3qPcDZwHfAL4SEVdbX1zqDa1VnjUMvwl4EtgReAG4GFgoIkZl5lE1FD8UuA9YG7gb+HoN0/+rr4ikSWdTTWkQiIgFgUOA9YDbKV2pv5CZv+3owCRNc3U22AbAKZQb589l5g2dHZWk6SUiZqI0zLqibfvbgW0odUZPzcwtGvsWBe73QVnqLRFxLrAKcAewFPAycHpm7l73vxf4OrA98E9KjfHVay8BST2ivtQ6GZiHssrj8VoKbTXg73X7gZn5g8ZnRmbmK63Pe42Xpo51xqRBID
MfBHYGngWWBH5sGC4NTvVGeDbgeUoTvR0j4i2dHZWk6aHWCN4buCwiNmzuy8wnKQ/TFwCb1VnjrX33tmaPDeiAJU2WZt3viHgnMC+wVmZ+GngPcCuwSUQcAZCZ91MmwawC7A8sZxgu9Ya2uv9vB2YBTqxh+JmUBtlLA1sAM1JWgW3f+kAjDA/DcGnqOUNc6nF1ieW4iFgGOKBuXhHYPDPP7ODQJE0j9cY3G/99HyUMXxY4ErgS2DQzX+zoQCVNcxHxAUrt//WBL2fm2XV76/r/WeBHwDuAAzJz786NVtKUiIjvA28B5qLMFn2ubp8LOJpyvT83M3fu3CglTa6IGAUEMDQzn2nb9wHgAWB1YDSwGXBtLafyPeCLlL8LX7U3mDTtOWtE6lGtB+FGE43bgbUo9YT3Bk6p4dkZjc8Mn5Q6w5K6R1uTrdlrTdH7axD2COUm+wjg9IjYKDNfiYgZgB0oD88PvcGPltRlWtf25rbMvDsiDqDct59ea4afXf8GBPB+4BrgUuB/B37UkqZGDcU+SXmpdX1mPlfP7WGZ+c+I2A44BlgvImbMzG07OV5JkyYilgYOBt5bvz8AuLCu8CIz767blwKeAX5dw/Bh1L8HwD8oTTclTWOWTJF6UA3IxtWvPxoRSwBzZuYLmXkPZQnlz4AfR8SX6nEjgRMiYreODVzSZGlrsnUYpdHOr4C9ImKmOtPkPGAn4LPAuRGxNnAcZUn1iM6MXNLkqvVAx0XEiIhYOSLWjohPA9Rr+27AhcBpEbF1RLwHWAxYE7gnM8+1TIrU/SJiaPP7Gop9G7gTWCMi/ieLMXUyy+PAdsCfgBUjYs6BH7WkyRERKwDXAa9Sgu2HgOMp5VD+o1wSpQzi+yl1wwHeDcwNnJSZO9aQ/D/+bkiaepZMkXpYRJxKqTU2D/A74OTM/FHd937KTPF1gTMpNYdXBz6embd1ZMCSJlmrPEr9+gzK7LFLgQ9SbpRvBTbMzBciYnZKKHYEZcb4v4B1MvOOjgxe0mRprQSpS6uvolzX56u7LwL2y8zfR8QCwB7AlsDTwFjK7LGP1TB8/N8NSd0tIr4K/CYz76rffwrYD1gY2DozL6jbh9Xzew5ghsz8R8cGLelN1VKmv6Ks7DgkM5+KiHmBA4E1gA9m5sON4z9CKX22AHAD5W/Ac8BHrRUuTT8G4lIPaZZOiIjRlBIpB1NqCW9MCckOzczR9ZiFga8C6wCPAtu0brolda9m2YSIGAH8nHJu/6pu+y6wFXAPsG4NxWegzCb5EPB7H5il3lKb495ImSl2COXF1tKUAPxhSkB2e13x9VnKuf4McHwNy5rllSR1sYhYlhJ8nQ3sn5l/qts/A+wFLAJ8pT0U79R4JU2aiFiEcn9+KbBxZv6rse9zwE+ApTPzgbbPfY7yzL4g8GdgO6/t0vRlIC71oDqDbAfgfuCndYn10pSmWytRbqxHN46fB3g5M5/txHglTZk6M3wUpaHOJnXZdKsE0k7AtsAfqaF4xwYqaYo1Zn9uCBwKfCEzb2nsXxa4ALgxM9d9g5/hA7PUYyJiE+BkSu3//SYQir8H+FZmntO5UUqaHBHxbsrs8Fcpz+tXNSa0LU4Jyi8EXqeUQvxrZv618fmZW/f0vgiTpi9riEs9JiIOpcwa2wp4tDWLtJZB2Y9Sq2yviNi+9ZnMfMwwXOottUboSOBTlJnfL9ftwzPzFeBISq3whYFfRMRMnRqrpEkXEXPV/h8bADQeducFZgeebB6fmb+hzBhfKyJWmtDPNAyXuld77d/aMJPa+H5r4AvA3nVmKZl5FeWe/ilg/4iYufUZSd2thturAAkcDXyusfswyj39BsDXgF8A10XEMRHxlYgY0QjDwzBcmr4MxKXecwdlSfVclAdnImI4QK0XvC/wS+CoiPhapwYpaerU2eD7AKcDiwM71+1jGqH4EcBZwCzUvweSuldEfBT4KXAusGlEfKyx+wnKS7C56rHN5pi/p/QHmHGAhippGm
nMDl09IubIzGyE4qcDm1NC8T0iYrG6/Wrgm8CnM/MFewNIvSMzHwQ+T5kFfkRtlH0ppRTSmpT7+iUo5/7vgG2ATYExjZ/hOS9NZ5ZMkbrYGzXHioj1gO9SHpqXycy/1IBsTN2/NOUm+oDMvG8gxyxp8k2s3EF9ON6JctO8e2YeUrcPr+H4CGDmzHxq4EYsaXLV0idXUuqHXpKZF7c1z50XuASYGVix2QcgItagzDTbJDNvGPjRS5oaEbEN8ANgT+CEzHy67fzfjdJw73jgh5n5h86NVtK0EBHvAS4D3gv8E1ijruoe/5xfV5DMAjxXm2vbHFsaIAbiUpdqa6D5VmA48FqrMUdErENZQj0rsNwEQvERmflqh4YvaRK1nes7AItSGuXek5mH1+3vA3bhDULxzoxc0qSq5/AVlHqhezWu5f/xMiwiNqes9BoLbAI8DswBHAW8CHyyVSpNUm+JiHOBtSnn+ImZ+VSriXZELAVcDcwG/BjYxuu71PtqKH4u8A5gC+DqGnwPrf8d0rqu2w9EGlgG4lIXagvIjgaWoiyxugs4MzN/VPetSalFNhuwbA3Fbb4h9Yi22WH/CywH3A7MQ7lxfhhYKzMfiYj3ArtSbqYPzMy9OjRsSZOoMQNsV2AtYPMJrdxq+1uwGWWV1+LAK5Q6wv8APlFXhQwxFJe618TO0Yg4D1iHUhLth5n5RN2+MrARcDnlhfi9AzVeSdNXRCxIaaY5jHJ9v9oXXlLnGYhLXazOJFkOOJFyAV0E+B9KJ/rv1mPWpCyxXBR4b7NLtaTeEBF7UBrlbgzcVGeL7QQcDnwL+H4N1RYG9qc06FkAeNpllVL3i4hrgGcyc72JHNOcJTY7sDowCngU+HmdSeZLb6mLtU1qWRJ4O+Wl1mOZ+Vjdfh7lBdkRwMnAq5RZ43MA6xuUSb2hdU2elJXZNRS/BBgK7Ea9rg/EOCVNmIG41KUiYgPK7O9NKQHZaxGxCqUb9cnAdq0Lb0R8gdJwb+PM/HOnxixpykTEBcALwLaZ+XxEzEeZKX4JZdn0S41jFwJebD1YS+p+EXEL8EBmbvRGs0drHdHhlJngV05ovw/PUvdqe6l1KvBJ4J1199XAjzPz3Lr/NEpzPYDHgDmBlTPzrgEdtKQp0lgBNitlFfe2mXnJm3xmQeC3wC8yc4OBGKekNzbszQ+R1CELAS8Bd9UwfCHgfEozrh0y89WI+FBm/iEzfxoRl2XmCx0dsaTJFhGjgI8A59UwfGHgZsrD87aZ+VJEfIvScH50Zj7QyfFKmnSNUigvAos2ZpP9VyheZ4C/DTg0Il5sb55pGC51t0YYfgqwCqX3x98pKzz3AfaLiBky84zM/HJEbAS8DwjgNK/vUm9o1P8eSlnN+XD9N1GZ+WBELE5psCmpw4Z0egCS3tAcwCy1C/27gFspAdnXakC2IbBbRMwNYBgudb+IiPZtmfk8cAPw0Yj4DHAT5VzfOjNfrM34lgNGRsTwAR2wpGnlRGAJymoualmk/7gPr99vQGmoafkzqYe0ru8RsQTwCUp5s3Mz88ba++fLQALbRsRiAJl5VmbunZl7GYZLvaOG4W8BVqD0/RkN3D2Jn320EaZL6iADcanD3uCBGMpyquERsR9wJ3Al8JXMfCEi3kmpPfgKpcyCpC5XZ5O0mua1r9C6DvgwcDFwY2ZukJnP1dmiuwCLAedYV1TqLY0a/62X2t+NiK/Vfe1lUxYENgTuoNQNl9SlImJ4RCwaEavBf5zrb6P0+LivvvgaWvf/CtgT+BiwcNvP+q+X5ZK63lnANcCHgDtryD3J+ZqrvqTOMxCXOqgGZK3llQtExCz8u5TR5ZSlV3sCfwS2yMx/1TB8P2B54GBnhkvdr5ZHaDXZ2gM4KyIOjohPA2Tm8cDRlPrBr0fEqhGxJXA8sB6wgQ1zpd5Vz98DKdfzYyNin9o4k4iYISJWAk4DZqasBMvJebCWNHAiYmZKCcOfABdExJKNUPtV4DVgmVoe5f
XGS/DLgGcpZdLGszm21JN2pqzwfBewUUTMOKH+IJK6l001pQ5p1BUlIk6gLLkaCfyMUkfw7loO5RpgFkoZhaeB91Jmi66amXd0ZPCSpkhEnAGsBvyJMiP8YeC4zDyq7t8DWB1YklIy4X5gt8z8Y2dGLGlqtV3vl6c8RK9BmQX+EGVG6euUmqKrZeYYG2hK3an2/fgd8AhwEnAt8ERmjm0ccxPlvN4EuL2e0wG8nxKK75mZpw/02CVNmYk0w54PuACYC9gWuNzVnFLvMBCXOiwijqaUPzmdEnSvBNxOCcF+GxFzAjtQlmONoCy7PiUz7+/MiCVNqrYgbBFKHeG9M/O62lTnB8D8wLGZeXg9bjbKg/T/USaOvdSZ0UuaHK2GmW+wr/m34B3AosCmlBfhDwO3AD9rzSZ9o58jqXMiYgRwBTAO2AJ4uJZFibqqo9Vob3Hg55SyhodRQvB3Ad8EPgssl5kPdea3kDQ5Gs2wZwDeA7wVeCAzn6j75wcuBWainOOXt16CufpD6m4G4tIAa71hbtw8/xi4KDMvrPt3BLanzDzZMTNve6O30pK6V/sMz4hYBvgWsHmr1FFEfAg4ilI7+JjMPGJCn5XUGyJiJmBLyszRVyb3YdhzX+peEbEGpWzht4FfTuz8joiPACcDH6CUUHmUMrFldVd4Sr2h8ZJrFHAR8G5gDsrLroOBSzLzgRqKX0IJxbcHrsrM1zo1bkmTxkBcGkDNB9164RxFqRG8fWbe3jhuO0pw9nDdd0cjQPdts9Tlmi+xIuJgyoySWSg1wten1BClntMfpITi8wM/ysyDOjNqSVOrnu87A6My85WJXbO9nku9JSKOANYBPpiZL07C8UMoK0HmoATi12fm36fvKCVNSxExI3AzpXTpD4B/AatQmt4fC+yTmc/U8ikXUZrmfjozb+zQkCVNIpv1SAOkPvi2wvAfA78BzgEWAuau24cBZOYxwPeAeYDTI+IDrYdmH56l7lbP9VYY/hPKsup5KYH3ysB6WdVj76QssfwXsGFEvK1TY5c01U6mvPDaBSZ+zfZ6LvWGRoPbdwCPTSwMj4ghUSyameMy89TMPDwzzzIMl3pHo1HuFpQ+H9tn5nmZeRVwfd13V2Y+A1DP73WB8ykBuqQuZyAuDYA6W7RVO/RQ4FPADyn1wIcC+0fEO2p9sqEwPhQ/iVKn8PnOjFzS5Gg715cEZqXMJluR0lzrZ8BxEfElGD9DPDLzrrp/jcx8ujOjlzS5Gg/MLY9TGu6tEhEjOzAkSdNe6+XVfcDitRzKhA8sL8SHA1dGxBcHYnCSpr3GS+tFgWHAXwDqPfzFwB6ZeVJEvDUilqifeTAzN61lVoZ2ZOCSJpmBuDQAGrNFP0TpQr1zZu4LfBX4BmUm+AU1FH+9EYofBqxs4x2pNzTO9aMoNQRnBP6QmWMy8zZgL0qjrVMjYqPW52oo/kfPdak3tFZ0tcvM54HRlJdgqw3ooCRNF41g7CZKucMvRMSsE/nIcpSVIn+d3mOTNO20nsHbXnbPBLyemS9GxAbAmcDumXlwXT2yHbBT+98E+4FI3c9AXBogEfEj4AJgCeD3AJk5hlJrbDtK9/kLImKetlD8mQ4NWdKU+yTwZUrd0LGtjZl5L/8OxX8YEVu0yqd0ZpiSJlVEvC8i3gtQV3TNBFwVETtGxOKNQ39LWS69cUTMOIFZ5JJ6UC2VcDzlhffGtdHef4iI2SjX/2eBBwZ2hJKmRn0GfwtwVEQsXDefA3wwIs4DzgJ2y8xD6r7FKPXEHweeG/ABS5oqBuLSwDmPEnp/kDJzBIDMfAW4lBKKzwVcFxFz+1ZZ6g3NsKvRB2BxSrf5xSizRsY/NGfmnyih+A3AIRExy8COWNLkqufwFcD5jYfkJYGXgP2B8yLi9Ij4AKUfwMmUGeJztEojdWLckqa50cBllF4/u9bG2MD4UmmtxpvbZuaTnRmipKmwBuW5vFUa6S5KEL4a8MvMPLS+7P4o5Vo/EtjVa7
3Ue8JJadK0V+sIt0onBOVcGxcRywLXAX8AdsrM6xqfGUFpxLEX8PnMdJml1OUiYmjz5VUtfZKN768BPgbsDvwoM19o7FsYeDEz/zGQY5Y0+eqy6FWBE4CHgM0ys1VPdAlgdWAzSjmF+4CzgUOBM4DtWvcEknpfvX7vBGwN/BO4DZidUmd4FLBhZv6hcyOUNKna7+XrtvMogfjimfl8RCwNfBPYiNJQc2YggFeAT2TmmAn9HEndzUBcmsaaF8O6bDIz89nG/pWBq4BbKM042kPxGWoNUkldrO3F156UFSCvAIdl5qON466l3FT/VyguqXfUUPyTwOnAg8AWwP3NsDsivgasDKxHaYr9J2ClzHy6/YWZpN4WEWsCnwIWoawMuRK4MjMf7ujAJE2S1r18LZPyjsaL7vdRzufrgU3r7O95KCu91wWeAe4Gzq5lVoZl5tg3+J+R1KUMxKVpqPmwGxHHUR6KXwcuzcxdG8etQrnI3kKpQ3Z9J8YraepFxNmUc/1J4B2UEGytzLyxccy1lP4BhwJHZ+aLHRiqpKlUV319ihKKPwBslZn3TeC4lSl/F3YC9szM0QM6UEmTre0+3hdYUh+IiBkoDXDnAbYCfp2ZD0TEPsDGwPaZeflEPu/McKlHWUNcmoYaN9HHUOqMXUF5YP52RJzfqhWcmdcAnwWWAo6vpVQk9YC2muFLAfMBawLLAutTZoReXAMxADJzJcrN9jeAGQZyvJKmXJ0VPl69zl8LbAosBJxcZ5K1jo963K+AI4FfAKvaXFPqbjXUat3HDwVma9s/wefmiBjSOrc9x6Xek5mvAffXb/cADoiIzYGDKJNctmgdGxFD2893w3CpdxmIS9NA8yY5ImYC3gl8MzN3BDYBvk4JwE9vC8XXpTTSfPS/fqikrtN8YK5eo8wM/2NmPgf8CtiR0oDnf9tC8SWBZTPzmYEcs6QpU5dAj4uI4RExf0TMGxEjM3MM8EtKKP5eGqF4s6lWZv4LuJPSXHeEs02l7tRW7vAwyvl9X0ScHRFfAKh/C/4r8M7Mca1z23Nc6n71hVe70cD/AhdQJrAcCJwInAasFxFfhhJ+e75Lg4eBuDSV6k10q47w7MDcwLPA7wBqPfAzgW9Ra482QvGjchmlAAAgAElEQVQrgPkz828dGLqkyVDrDLYemPeOiFOB4yjlx16EcnOcmb8FdqaE4udExGdaPyMzHxr4kUuaXPV8HxsRo4BLKXVEbwGui4jlgCGZeTXlpfdCwEm10V5ztdjMwOLAY534HSRNmsa1/afABsAdwDGUl1nfi4iD6nEGYFKPqzW/Z4yI7SJi/rr5Jko29lZKecNPU0odblT3H9hcDSZpcDAQl6ZCrS/Yuok+CbiOchO9HrBk67gaip9DCcVXBC6sD9kALw/ooCVNtnqut158/ZRyLi9GaaS1ekR8NSKGt46vofhOwP8BJ0bEjB0YtqQp0DrfI2IkcAMwC/AD4GRgKHA5sHVEDKPMJN0EeDfw84iYt/Gj5gJWAbZxZYjU3SJiU+DjwGaU/j77UVZ8vQMY2rzGS+p53wL2B66OiM9n5uPAbsCXgM0z84/AR4CTgNuAh/h3WRVJg4RNNaUp1La88oeUkigXAm+hNOS4knJDfUfjM6MoN9q7AMtk5iMDPW5Jk6fVgb5+PQfwU+A7wH3AzJRAbBSwK6Xb/NjGZ5cEnnZmuNQbWud7XVI9J3Aq8K36cExEzAkcBXweWDUzb6pB2WeBrwDr1NlnUcunzGQTXan71VIpKwOfyMyXImIR4NfANcAWdduimXlvRwcqabJNqEluRKxOeS5fl7Ka+zhgAUod8W0y84Z63LuAR+u9wfhnAkm9z0BcmgJtXehnp7xRvhq4ul4st6S8Uf45sG9bKD4zMCwzn+3A0CVNoYj4ATAHMCuwQescjohZgVsps0h3oS0Ul9RbImIG4PfAWOAZYLXMfKWxfy7KLPGXgVUy89W2+4LmC/P/egiX1FnNc7Sx7UTgo5
m5ZEQsRLmuXwVsmZkvRsRWwPzA4bVniKQeUPuBjK0vuhOYIzP/2di/FWU1yEzAn4ARlFXfB2fmq43jDMOlQcaSKdIUaDz0Hg48AHwBeKzVcCczfwRsCawF7BMRH2p89gXDcKm3RMSCwFKU2WPDGmH4yNo476PAc5QmPJvXUgqSetNclB4A8wFvo6wEGd9Auz5IX1X3z1a3jQ+9m0GbYbjUfRovrLaOiJnq5vuBRSJiE+Bm4BfAVjUMnwdYlbJqxBfeUo+oL7/G1glpp1J6gdwYEcdHxOI15D4Z2Bj4MfAB4BPA7pRr/HiG4dLgYyAuTZ2ngL9TGnC0ArDhAJl5CrA5JRQfHRGLd2SEkqZaZj5IqTd4M7ByRGxft78SESNqKP4RYCSldvhMb/jDJHWViIjm95n5MLAfcDalKebOdXvzYfgFSjBmXWGph7RebEXEbsC3GyWNfkCpFXxa/e/mmflC7QuwP+XF95GZ+VIHhi1pMrV6fdUw/DZKr48bKKUP1wXOpzTRJTNvBw4BVuDfPcEe7MS4JQ0cZ7BJk2hCy54z85CIeA7YBzg3IlbJzIcjYobMfC0zT6tNuQ6nLLuW1OUmtJQaoNYKPpBy7TwgIl7LzBNquYQRmflcRLwHmLMG5JK6XGMpdVDvizNzTGbeGxHH1G271Mx8NPAi8B5gfcoscnuBSF2szgDfHvhpZv6l8WJrBKXxNbUPwCvA3pR7+o8Bu0fEbJQQbSngM5n554Eev6RJ0+jdEVnVF2AHAU9TegH8uR77JHAE8Grj+NeAv0bEZ4Ex9fMTfCaQNDhYQ1yaBG31QIdQGme+0tj2DcoMsqcoDbX+HhHDM3NM3T+rAZnU/drO9U0owddw4M7MPLduX46ylHI54DuZeULdPqJZa1BSd2ud73X22GHA+4AxwN3ArnXfQsC3ga0ptUWfAF6j9AxYPjPHWFdU6l4R8WXgFMoM8CNaTa5rX5B3ZeZajb8FQ4B5KY2zF6Ws+roeODkz7+vMbyBpUkTEHJn5xAS2Xw/cl5lb1e83ppRP2SszD64vzYZk5vPN67nXdmnwc4a49CbaArL9gA9TbpKvj4gLM/OCzDy23kTvCFwYEWvVmeKtUNzmO1KXay2trF+fD6xImQ06V929WWaulpk3RsRBlFB8/4h4S2aONgyXekfbUurfAS9RGmnODXwJ+FxErJqZD9R+IWOBTSgzStcHnquzx4bZRFfqXnW15tspqzWHRMSRtQza7JQXYNS/BVHDr4eAbSJiRsrLr3GGYlJ3i4j3A3dFxEcz87a6bRjl5fXclFIprckupwG715XeMwBHUu4DTmqe65730uBnDXFpItoCsv+lNMp8HrgJWIVSJuUbAJl5NPA9SvOtX0fEO1szxG2qJXW/RrPcQynLpdcH3g/MQ3mQXiEirqnH3gjsC9wHfDMi3tpeh1hS92ospT4WeBxYPzO3zMzPU+qKLkKpGUxmPkCZXXomsBKwfeO67gOz1GUiYlSdGQ5AZh4J7ApsA+wcEXMAzwL/avtc1P/ORlkJOtZQTOpu9bx9Gdg2M2+LiKEA9fx9GvgNsHFEbE1ZLbJ3Zh5SP/7++m/mDgxdUoc5Q1yaiEZA9jVgeWAd4LbMfC0ilqE0zzs6Ip7JzLMy85iIGEHpVD1DxwYuaYrUmv/LAJcDN9d6gtQZoi8C+0TErpl5aGbeWptrPpGZz3Zu1JKmUFAehH9BaZBNRKwPbAvskpnnR8QsmflcZt4TEaOBocC+ETFjZu5mWCZ1l/qiazQwT0Sc3bqOZ+bhNe8+lPIM/FFg7roibBZgxvp5gCeBDSl1xSV1qYj4CLA2JeQ+oT6H3xoRx2TmyfWwMymrPk8E9s/MA+pnP0B52f0KcPTAj15Sp1lDXGqoYdhawNLAZcCNNfweTZkR/nHg1UZtsQ8DJwMvAGtm5jN1+2ytryV1n4mc628HHgB+mJm71FknQ+py6lmAW4E/ZuZ6HRu8pCnWap5Vv54T+BuwR2aOjo
gvAmfV7w+uD9aHAPdm5g/rZxYEDgDWoDTbe8pVYFJ3iYj3AQ9n5ktRGt5f09j3bUoo/jLwa+AfQFJekI2lrPo4ITPvHPiRS5ocEXERJeyet9YAXww4gTKRbZPMPKs2zf0m8FXKi68jgMWAJSgvuT9W+4HYQFPqM84Ql6qIGAVcSpnZPTNwcf36NWA2YI7MfLkeO6wuw/pdRJwFHAyMAp4BMAyXutebnOutpZWrR8QpmXlvRIyrjXWei4h7gTmsGyz1hohYgDITdCRwe2beVbcPocwCvR74bJ0VeiSwByUEp35uCeDOVpCemQ9GxB7ATpn55ED+LpImTasBZkRsB3w/IrbPzGPrvsMj4mXKjNDbgO97Lks961DKpLVdgT3raq7tgX2AM2rIfXpEfB+4k7KKe0vKqrArgH0zc6z39VJ/MhCXgNpU6zbgEUqjvNsys7lM8mZgw4jYATi2Xjhbb5FfBv5JCdMkdbE3O9czc1ydbXI0sF1EjM7M+4GMiLko9cR/j3WDpa4XEcsCp1Neas8GvBYRW2bmWZSqaBkRF1PO988AR9SZ4UMj4j2UYPwF4PR6bCsU/2uHfiVJk+c2ykvvvSOCRih+bL0fOAh4W0Qc3QjRx68ikdT1/kxpiLlGRJyQmY9k5h0RsW/df2o9908HrgKuapVCa/2A+kxvGC71IQNx9b3agfpk4DFgy8z8W90+pFEb9GzgK8C3KXWET6olFN5OabD1t7pdUpd6s3O9EXadEBELAd8CFomI44ERwGrA+4DNrBssdbeIWIFSG/wM4BxgTkpt8DMi4qHMvAHGB2NzUWaGfzAivkIphfJJysqRler13qXUUo/JzJtqMLYPpfZ/MxQ/JCJeo5RPeDkivlNXfxqGSz0iM5+IiCOBC4GVKdd8JhCKv15fhgO81Pp8vff32i71KWuIq+9FxLsoNYSPowbdbfuH1ofhOYBrgXdQllzdByxMWU69QmsZtqTu9Gbnej1mfOgVETsDm1HqDD5OmVW+hXVFpe5Wm15fS5n5/d3MfLFuX54Sjp+YmftHxPDMHFP3bU+ZJb4scDtwN6UsikuppS4VEW8BRmXm429y3IeBvSl1hfdpheJ133bALzPznuk6WEnTTFs/kLcCF1HKl66emf9oHLcE5dxfG/hKo9GmJBmISxGxJuWt8mKZ+ac3OGZYfSieB/gO8CHgrZQH5gMz894BG7CkKTIp53o9LijXx3H1JnsB4HngafsDSN2tnrOXAssAC2fmA20PzjcBN1AekIe3LZseDrw1M59obHNmuNSFImIo5VxfBFiuGYK9wfHNUHyvzDxu+o9S0rQQEUsBCwEXN3p6NSex7EQpgbR2Zl7e9sL7Q8AxlGv+MpZFktQypNMDkLrAEGBM65vaaKtdqzxCUN5Af5Iyi2wzw3CpZ7zpuV7LpySlVMpKwL8y847M/IthuNQTXqHMDP8bcE5EvLMRhg+nlAv8H8oqr2siYoeIWKV+NtvCcJdSS91rCHAqkMCFdRXYG8rM3wH7AdcBx0bE1tN9hJKmWl0JcjJlhddlEbFzvV9vXp+PpVzX9wDIzDGt+/zM/AOlkeZy9XvDcEmAgbgEcA8wFtgKxjfVi+YBjXrBPwa+nJmvZ+ZLLqGWesrknOtHAJt70yz1ltok9wJKD4C5gIsi4m1198+ABSkzxK8HRgKHAVdGxF+A1dt+lue/1KXq7M+fATsCczPpofjuwE+BX0/3QUqaanVG+OeADSmlSw8G/hgRu9SePwCvUXp+fTgi1m59tHWfn5n31/t+8y9J4/kHQYJ/UrpTfzEi1oLyENx+wYyIRSkzy24Z+CFKmgYm51yfAbh14IcoaWpl5muUfgHbUULxyyPiF8AHKCu8Ns3MjYCPU8on7A/cBFzSmRFLmhITONf/IxRvvvSOiIUi4vuUVZ8bZeZ9Az1eSVMmM/8vM38KrABsQOnrsy/wm4jYDXg/cDjwDLBG/Uy2v9huTHyRJGuISzC+ttiNlKVW38
3Mi9v2z0aZMbo88JnMfGjgRylpanmuS/0jImagzCrbn/KwvGpmXtWqH1qXXI9r+4w1w6Ue0zjXj6G8/F47Mx9p7H8PJSz7HLDExPqISOoNEbEBsCrwZeBZ4AxgTkpZtM9m5i87ODxJPcBAXKoiYlXgfOAp4DTgB5R6wysA6wNrAivWOmSSepTnutQ/ImIkJQQ7EniSEpRNtPmepN4TESOA1fh3KL5GZj7WCMM/AyyfmXd0cJiSplL7y+yIWAH4IrAWME/d/J3MPKwT45PUOwzEpYbagf6HwOKUJZUJ/KP++3pm3t3B4UmaRjzXpf7RFpT9H7BOc/aopMGhca4fSznXvwnsULcZhkuDVH35PQvwXWBWSs8ve31JmigDcalNRMwOLAAsAQwFbgYeycynOzkuSdOW57rUP2pQtirwfSCApTPzic6OStK01iifcjjwHuBFYAXDcGnwas0ab5VEq9uGGYpLmhgDcUmSJA16NShbl9KQa31rhUuDU30Btg7wVWD7zLyrw0OSNJ21heHjv5akN2IgLkmSpL4QEcMzc0z92gaa0iBVSygMz8znOz0WSZLUfQzEJUmSJEmSJEl9YUinByBJkiRJkiRJ0kAwEJckSZIkSZIk9YWeD8QjYv2IOCYiro+I5yIiI+LMTo9LkiRJkiRJktRdhnV6ANPAnsCHgBeAR4BFOjscSZIkSZIkSVI36vkZ4sCOwMLALMA2HR6LJEmSJEmSJKlL9fwM8cz8VevriOjkUCRJkiRJkiRJXWwwzBCXJEmSJEmSJOlNGYhLkiRJkiRJkvpCz5dMmRZWWmml7PQYJE1/Rx11FAA77LBDh0ciaXryXJf6h+e71B+OOuoollhiiU4PQ9LAGYw1kbs+e7zmmmvYf//9Oe2005hvvvk6PZw3M1X/H3GGuCRJkiRJkiSpLxiIS5IkSZIkSZL6goG4JEmSJEmSJKkvGIhLkiRJkiRJkvqCgbgkSZIkSZIkqS8M6/QAplZErA2sXb+du/53mYg4tX79ZGbuPOADkyRJkiRJkiR1lZ4PxIElgC+3bVuw/gN4CDAQlyRJkiRJkqQ+1/MlUzLzu5kZE/m3QKfHKEmSJEmSJEnqvJ4PxCVJkiRJkiRJmhQG4pIkSZIkSZKkvmAgLkmSJEmSJEnqCwbikiRJkiRJkqS+YCAuSZIkSZIkSeoLBuKSJEmSJEmSpL5gIC5JkiRJkiRJ6gsG4pIkSZIkSZKkvmAgLkmSJEmSJEnqCwbikiRJkiRJkqS+YCAuSZIkSZIkSeoLBuKSJEmSJEmSpL5gIC5JkiRJkiRJ6gsG4pIkSZIkSZKkvmAgLkmSJEmSJEnqCwbikiRJkiRJkqS+YCAuSZIkSZIkSeoLBuKSJEmSJEmSpL5gIC5JkiRJkiRJ6gsG4pIkSZIkSZKkvmAgLkmSJEmSJEnqCwbikiRJkiRJkqS+YCAuSZIkSZIkSeoLBuKSJEmSJEmSpL5gIC5JkiRJkiRJ6gsG4pIkSZIkSZKkvmAgLkmSJEmSJEnqCwbikiRJkiRJkqS+YCAuSZIkSZIkSeoLBuKSJEmSJEmSpL5gIC5JkiRJkiRJ6gsG4pIkSZIkSZKkvmAgLkmSJEmSJEnqCwbikiRJkiRJkqS+YCAuSZIkSZIkSeoLBuKSJEmSJEmSpL5gIC5JkiRJkiRJ6gsG4pIkSZIkSZKkvmAgLkmSJEmSJEnqCwbikiRJkiRJkqS+YCAuSZIkSZIkSeoLBuKSJEmSJEmSpL5gIC5JkiRJkiRJ6gsG4pIkSZIkSZKkvmAgLkmSJEmSJEnqCwbikiRJkiRJkqS+YCAuSZIkSZIkSeoLBuKSJEmSJEmSpL5gIC5JkiRJkiRJ6gsG4pIkSZIkSZKkvmAgLkmSJEmSJEnqCwbikiRJkiRJkqS+YCAuSZIkSZIkSeoLBuKSJEmSJEmSpL5gIC5JkiRJkiRJ6gsG4pIkSZIkSZKkvmAgLkmSJEmSJEnqCwbikiRJkiRJkqS+YC
AuSZIkSZIkSeoLBuKSJEmSJEmSpL5gIC5JkiRJkiRJ6gsG4pIkSZIkSZKkvmAgLkmSJEmSJEnqCwbikiRJkiRJkqS+YCAuSZIkSZIkSeoLBuKSJEmSJEmSpL5gIC5JkiRJkiRJ6gsG4pIkSZIkSZKkvmAgLkmSJEmSJEnqCwbikiRJkiRJkqS+YCAuSZIkSZIkSeoLBuKSJEmSJEmSpL5gIC5JkiRJkiRJ6gsG4pIkSZIkSZKkvmAgLkmSJEmSJEnqCwbikiRJkiRJkqS+YCAuSZIkSZIkSeoLBuKSJEmSJEmSpL5gIC5JkiRJkiRJ6gsG4pIkSZIkSZKkvmAgLkmSJEmSJEnqCwbikiRJkiRJkqS+YCAuSZIkSZIkSeoLBuKSJEmSJEmSpL5gIC5JkiRJkiRJ6gsG4pIkSZIkSZKkvmAgLkmSJEmSJEnqCwbikiRJkiRJkqS+YCAuSZIkSZIkSeoLBuKSJEmSJEmSpL5gIC5JkiRJkiRJ6gsG4pIkSZIkSZKkvmAgLkmSJEmSJEnqCwbikiRJkiRJkqS+YCAuSZIkSZIkSeoLBuKSJEmSJEmSpL5gIC5JkiRJkiRJ6gsG4pIkSZIkSZKkvmAgLkmSJEmSJEnqCwbikiRJkiRJkqS+YCAuSZIkSZL+v737D/n1rus4/nq7RFJrSdNVullNp8WEiKh0S502pCzCOBL9+KMZxSihn0Tuj6xg1YSIVUiMWFGcomFrw6SyzXNqsRqZxrDVjFg/YMx+bOrJnKetT398vyfu3d7nnPs+Z3i2vR6Pfy6+1/W5vtfnuv58cvG5AKCCIA4AAAAAQAVBHAAAAACACoI4AAAAAAAVBHEAAAAAACoI4gAAAAAAVBDEAQAAAACoIIgDAAAAAFBBEAcAAAAAoIIgDgAAAABABUEcAAAAAIAKgjgAAAAAABUEcQAAAAAAKgjiAAAAAABUEMQBAAAAAKggiAMAAAAAUEEQBwAAAACggiAOAAAAAEAFQRwAAAAAgAqCOAAAAAAAFQRxAAAAAAAqCOIAAAAAAFQQxAEAAAAAqCCIAwAAAABQQRAHAAAAAKCCIA4AAAAAQAVBHAAAAACACoI4AAAAAAAVBHEAAAAAACoI4gAAAAAAVBDEAQAAAACoIIgDAAAAAFBBEAcAAAAAoIIgDgAAAABABUEcAAAAAIAKgjgAAAAAABUEcQAAAAAAKgjiAAAAAABUEMQBAAAAAKggiAMAAAAAUEEQBwAAAACggiAOAAAAAEAFQRwAAAAAgAqCOAAAAAAAFQRxAAAAAAAqCOIAAAAAAFQQxAEAAAAAqCCIAwAAAABQQRAHAAAAAKCCIA4AAAAAQAVBHAAAAACACoI4AAAAAAAVBHEAAAAAACoI4gAAAAAAVBDEAQAAAACoIIgDAAAAAFBBEAcAAAAAoIIgDgAAAABABUEcAAAAAIAKgjgAAAAAABUEcQAAAAAAKgjiAAAAAABUEMQBAAAAAKggiAMAAAAAUEEQBwAAAACggiAOAAAAAEAFQRwAAAAAgAqCOAAAAAAAFQRxAAAAAAAqCOIAAAAAAFQQxAEAAAAAqCCIAwAAAABQQRAHAAAAAKCCIA4AAAAAQAVBHAAAAACACoI4AAAAAAAVBHEAAAAAACoI4gAAAAAAVBDEAQAAAACoIIgDAAAAAFBBEAcAAAAAoIIgDgAAAABABUEcAAAAAIAKgjgAAAAAABUEcQAAAAAAKgjiAAAAAABUEMQBAAAAAKggiAMAAAAAUEEQBwAAAACggiAOAAAAAEAFQRwAAAAAgAqCOAAAAAAAFQRxAAAAAAAqCOIAAAAAAFQQxAEAAAAAqCCIAwAAAABQQRAHAAAAAKCCIA4AAAAAQAVBHAAAAACACoI4AAAAAAAVBHEAAAAAACoI4gAAAAAAVBDEAQAAAACoIIgDAAAAAFBBEAcAAAAAoIIgDgAAAABABUEcAAAAAIAKgjgAAAAAABUEcQAAAAAAKgjiAAAAAABUEMQBAA
AAAKggiAMAAAAAUEEQBwAAAACggiAOAAAAAEAFQRwAAAAAgAqCOAAAAAAAFQRxAAAAAAAqCOIAAAAAAFQQxAEAAAAAqCCIAwAAAABQQRAHAAAAAKCCIA4AAAAAQAVBHAAAAACACoI4AAAAAAAVBHEAAAAAACoI4gAAAAAAVBDEAQAAAACoIIgDAAAAAFBBEAcAAAAAoIIgDgAAAABABUEcAAAAAIAKgjgAAAAAABUEcQAAAAAAKgjiAAAAAABUEMQBAAAAAKggiAMAAAAAUEEQBwAAAACggiAOAAAAAEAFQRwAAAAAgAqCOAAAAAAAFQRxAAAAAAAqCOIAAAAAAFQQxAEAAAAAqCCIAwAAAABQQRAHAAAAAKCCIA4AAAAAQAVBHAAAAACACoI4AAAAAAAVBHEAAAAAACoI4gAAAAAAVBDEAQAAAACoIIgDAAAAAFBBEAcAAAAAoIIgDgAAAABABUEcAAAAAIAKgjgAAAAAABUEcQAAAAAAKgjiAAAAAABUEMQBAAAAAKggiAMAAAAAUEEQBwAAAACggiAOAAAAAEAFQRwAAAAAgAqCOAAAAAAAFQRxAAAAAAAqCOIAAAAAAFQQxAEAAAAAqCCIAwAAAABQQRAHAAAAAKCCIA4AAAAAQAVBHAAAAACACoI4AAAAAAAVBHEAAAAAACoI4gAAAAAAVBDEAQAAAACoIIgDAAAAAFBBEAcAAAAAoIIgDgAAAABABUEcAAAAAIAKgjgAAAAAABUEcQAAAAAAKgjiAAAAAABUEMQBAAAAAKggiAMAAAAAUEEQBwAAAACggiAOAAAAAEAFQRwAAAAAgAqCOAAAAAAAFQRxAAAAAAAqCOIAAAAAAFQQxAEAAAAAqCCIAwAAAABQQRAHAAAAAKCCIA4AAAAAQAVBHAAAAACACoI4AAAAAAAVBHEAAAAAACoI4gAAAAAAVBDEAQAAAACoIIgDAAAAAFBBEAcAAAAAoIIgDgAAAABABUEcAAAAAIAKgjgAAAAAABUEcQAAAAAAKgjiAAAAAABUEMQBAAAAAKggiAMAAAAAUEEQBwAAAACggiAOAAAAAEAFQRwAAAAAgAqCOAAAAAAAFQRxAAAAAAAqCOIAAAAAAFQQxAEAAAAAqCCIAwAAAABQQRAHAAAAAKCCIA4AAAAAQAVBHAAAAACACoI4AAAAAAAVBHEAAAAAACoI4gAAAAAAVBDEAQAAAACoIIgDAAAAAFBBEAcAAAAAoIIgDgAAAABABUEcAAAAAIAKgjgAAAAAABUEcQAAAAAAKgjiAAAAAABUEMQBAAAAAKggiAMAAAAAUEEQBwAAAACggiAOAAAAAEAFQRwAAAAAgAqCOAAAAAAAFQRxAAAAAAAqCOIAAAAAAFQQxAEAAAAAqCCIAwAAAABQQRAHAAAAAKCCIA4AAAAAQAVBHAAAAACACoI4AAAAAAAVBHEAAAAAACoI4gAAAAAAVBDEAQAAAACoIIgDAAAAAFBBEAcAAAAAoIIgDgAAAABABUEcAAAAAIAKgjgAAAAAABUEcQAAAAAAKgjiAAAAAABUEMQBAAAAAKggiAMAAAAAUEEQBwAAAACggiAOAAAAAEAFQRwAAAAAgAqCOAAAAAAAFQRxAAAAAAAqCOIAAAAAAFQQxAEAAAAAqCCIAwAAAABQQRAHAAAAAKCCIA4AAAAAQAVBHAAAAACACoI4AAAAAAAVBHEAAAAAACoI4gAAAAAAVBDEAQAAAACoIIgDAAAAAFBBEAcAAAAAoIIgDgAAAABABUEcAAAAAIAKgjgAAAAAABUEcQAAAAAAKgjiAAAAAABUEMQBAAAAAKggiAMAAAAAUEEQBwAAAACggiAOAAAAAEAFQRwAAAAAgAqCOAAAAAAAFQRxAAAAAAAqCOIAAAAAAFQQxAEAAAAAqCCIAwAAAABQQRAHAAAAAKCCIA4AAAAAQAVBHAAAAACACoI4AAAAAAAVBHEAAAAAACoI4gAAAAAAVBDEAQAAAACoII
gDAAAAAFBBEAcAAAAAoIIgDgAAAABABUEcAAAAAIAKgjgAAAAAABUEcQAAAAAAKgjiAAAAAB9HhS0AAAoWSURBVABUEMQBAAAAAKggiAMAAAAAUEEQBwAAAACggiAOAAAAAEAFQRwAAAAAgAqCOAAAAAAAFQRxAAAAAAAqCOIAAAAAAFQQxAEAAAAAqCCIAwAAAABQQRAHAAAAAKCCIA4AAAAAQAVBHAAAAACACoI4AAAAAAAVBHEAAAAAACoI4gAAAAAAVBDEAQAAAACoIIgDAAAAAFBBEAcAAAAAoIIgDgAAAABABUEcAAAAAIAKgjgAAAAAABUEcQAAAAAAKgjiAAAAAABUEMQBAAAAAKggiAMAAAAAUEEQBwAAAACggiAOAAAAAEAFQRwAAAAAgAqCOAAAAAAAFQRxAAAAAAAqCOIAAAAAAFQQxAEAAAAAqCCIAwAAAABQQRAHAAAAAKCCIA4AAAAAQAVBHAAAAACACoI4AAAAAAAVBHEAAAAAACoI4gAAAAAAVBDEAQAAAACoIIgDAAAAAFBBEAcAAAAAoIIgDgAAAABABUEcAAAAAIAKgjgAAAAAABUEcQAAAAAAKgjiAAAAAABUEMQBAAAAAKggiAMAAAAAUEEQBwAAAACggiAOAAAAAEAFQRwAAAAAgAqCOAAAAAAAFQRxAAAAAAAqCOIAAAAAAFQQxAEAAAAAqCCIAwAAAABQQRAHAAAAAKCCIA4AAAAAQAVBHAAAAACACoI4AAAAAAAVBHEAAAAAACoI4gAAAAAAVBDEAQAAAACoIIgDAAAAAFBBEAcAAAAAoIIgDgAAAABABUEcAAAAAIAKgjgAAAAAABUEcQAAAAAAKgjiAAAAAABUEMQBAAAAAKggiAMAAAAAUEEQBwAAAACggiAOAAAAAEAFQRwAAAAAgAqCOAAAAAAAFQRxAAAAAAAqCOIAAAAAAFQQxAEAAAAAqCCIAwAAAABQQRAHAAAAAKCCIA4AAAAAQAVBHAAAAACACoI4AAAAAAAVBHEAAAAAACoI4gAAAAAAVBDEAQAAAACoIIgDAAAAAFBBEAcAAAAAoIIgDgAAAABABUEcAAAAAIAKgjgAAAAAABUEcQAAAAAAKgjiAAAAAABUEMQBAAAAAKggiAMAAAAAUEEQBwAAAACggiAOAAAAAEAFQRwAAAAAgAqCOAAAAAAAFQRxAAAAAAAqCOIAAAAAAFQQxAEAAAAAqCCIAwAAAABQQRAHAAAAAKCCIA4AAAAAQAVBHAAAAACACoI4AAAAAAAVBHEAAAAAACoI4gAAAAAAVBDEAQAAAACoIIgDAAAAAFBBEAcAAAAAoIIgDgAAAABABUEcAAAAAIAKgjgAAAAAABUEcQAAAAAAKgjiAAAAAABUEMQBAAAAAKggiAMAAAAAUEEQBwAAAACggiAOAAAAAEAFQRwAAAAAgAqCOAAAAAAAFfYdxGfm+pm5Y2b+dWY+OTMPzcwHZ+btM/P5u8ZeNDPvnJm7Z+bBmfnUzDwwM3fOzNUz88w9/v/ymXnHzPzVzPz79pz7Z+bXZuYlp5jXZ8/MT8/MfTPzyMz828zcPDNfdrBHAQAAAADA09lB3hD/4STPSfInSW5IcjjJo0l+Ksk9M3PRjrGXJPnOJB9LcmuSX0jy7iQvTnJTkj+emc/a9f+/l+RHkzyy/e9fTvJAku9J8jcz88rdE5qZZ23n85NJPr6d1+1J3pTk/TPzNQe4PwAAAACAszIzb5uZNTO/cq7n0uIgz3x3lD6Vz11rPbLHxa5Lcm2StyX5/u3uu5I8b631v7vGPjPJe5NcmeRbk9y84/AvJvmttdYDu865Nsl1SW5M8opdl/+RJJcneVeSbztxvZn53WxC/E0z84rd8wAAAAAAeKLNzNcm+b4k95zrubQ46DPf9xvie8XwrRNR+6U7xh7fK0Kvtf4nm1D9uPHbY9fvjuFb1yf5ZJLLdi7NMjOT5Jrtzx/feb211m1J7kzy5Ulec6r7Ag
AAAAA4WzNzfjYrX7wlycPneDr7dvz48Rw9ejRJcsstt+T48ePndkIHcCbP/In4qOY3b7enLfAzc16Sb9zv+K2VzdIsSfLYjv2XJLk4yYfXWvfvcd4fbrev2+d1AAAAAADO1I1J3rXWOnKuJ7Jfx48fz6FDh3LnnXcmSW677bYcOnToqRTFD/zMD7JkSpJkZn4syXOTnJ/kq5JckU3c/vk9xl6Q5K1JJsnzk1yV5CVJfnut9e59XvLNST4nyV+utT66Y//LttsPn+S8f9huL93ndQAAAAAADmxmvjeb7vld53ouB3H48OEcO3bscfuOHTuWw4cP5+qrrz5Hs9qfM33ms9Y66IUeTHLhjl1/lOS711of2WPsy5P83Y5dK5sPbF67XT7ldNf6kiR3J3leklevtf5ix7HvyOZ1+MNrrU+76Zm5Kpv1yt+71nrDfu4NAAAAAOAgZuZlSf48yRVrrfu2+44m+dBa663ncm6nc+WVV96e5PV7HLr9yJEjV32m57NfZ/PMD/yG+FrrC7YXuDDJq7J5M/yDM/NNa60P7Br795uhc16SFyZ5U5KfSXLFzLxxrfXQKW7qBdkse/L8JD+wM4YDAAAAADxJvDLJBUn+dvPZwyTJeUlePTPXJHnOWutT52pyp3LkyJGvP9dzOENn/MwPHMRP2L4R/vsz84Fsli35zSSXnWTsY0n+JckNM/ORJL+TTRjfs9ZvY/j7slkW5QfXWu/cY9jHttvzTzLFE/s/epLjAAAAAABn69Yk79+179ezWdL5Z5M8ZRbkfgo542d+xkH8hLXWP8/MvUm+YmYuWGv9x2lOOfGxy9fudXBmvjDJHUlens2b4XvF8CS5b7s92RrhL91uT7bGOAAAAADAWdl+9/BxL+XOzCeSPLTW+tC5mdXT29k882c8QXP4ou32sX2MfeF2++juAzPzoiR/mk0Mv+YUMTxJ/jGbt84v3a41vts3bLfv28ecAAAAAAB4mttXEJ+ZS2fm05YmmZlnzMx1SV6Q5K611sPb/V+5XTd89/jnJrlh+/M9u469OMmfJbkkyVvWWjeeak5r8zXQX93+fMfM/P+9zMy3JPm6JPdmE9gBAAAAAD4j1lqvfbJ/UPPpZr/PfDZd+TSDZn4oyc9l8+XO+5P8Z5ILk7wmyZcmeTDJ69da927H35rk8iR3ZfMW938nuSibt7Y/b7v/DWut/9pxjfuTfHGSv07yByeZym+stf5pxznPyuYN8Fdls2bMHUkuTvLmbNaJed1a6+7T3iAAAAAAAE97+w3ilyW5JskVSV6UTdT+RDbrc78nyS+ttR7aMf6NSb49yVdnE86fneThJPckuTnJTWutR3dd4/QTSa5cax3ddd6zk/zE9noXJ/l4kqNJ3n4i0AMAAAAAwL6COAAAAAAAPNU9UR/VBAAAAACAJzVBHAAAAACACoI4AAAAAAAVBHEAAAAAACoI4gAAAAAAVBDEAQAAAACoIIgDAAAAAFBBEAcAAAAAoIIgDgAAAABABUEcAAAAAIAK/we4HKcChlJ86wAAAABJRU5ErkJggg==\n",
"text/plain": [
"<Figure size 1800x720 with 2 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"msno.matrix(blackpink)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Nice, the only column with missing data is Viewer Rating. That sizable gap is mostly because I gave up trying to collect whether or not a commenter also liked the video. Since we obtained the comments sorted by time, it makes sense that most comments don't have many replies or likes. I also tried to obtain comments by relevance, but I kept getting the same 18-30 comments, so it didn't seem worthwhile to continue."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Goals\n",
"## Quick Qualitative Analysis\n",
"I was hoping that we could use the other columns to draw some interesting conclusions. For example, were people praising BlackPink more likely to have also liked the video? But because virtually none of the commenters are recorded as having liked the video, we don't have enough data to draw meaningful conclusions. We could also consider whether comments that were against BlackPink had more replies, but again, most of the comments had at most one or two replies. So it seems like most of our analysis will focus on the comments themselves.\n",
"\n",
"## Comments by language\n",
"K-pop has become a global phenomenon, reaching all corners of the globe since its inception. It is prominent in the States, in Brazil, and of course in the countries adjacent to Korea. It would be interesting to see how this is represented in the comments, so the first step is to detect the language of each comment."
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [],
"source": [
"from google.cloud import translate_v2 as translate\n",
"\n",
"# before creating the client, set the environment variable\n",
"# GOOGLE_APPLICATION_CREDENTIALS to the path of your credentials json file\n",
"# for example:\n",
"# export GOOGLE_APPLICATION_CREDENTIALS='../youtubeviewcounts.json'\n",
"translate_client = translate.Client()"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [],
"source": [
"# get supported languages\n",
"languages = translate_client.get_languages()\n",
"available_languages = {}\n",
"for language in languages:\n",
" language_shorthand = language.get('language')\n",
" language_fullname = language.get('name')\n",
" available_languages[language_shorthand] = language_fullname"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {
"collapsed": true
},
"outputs": [
{
"data": {
"text/plain": [
"{'af': 'Afrikaans',\n",
" 'sq': 'Albanian',\n",
" 'am': 'Amharic',\n",
" 'ar': 'Arabic',\n",
" 'hy': 'Armenian',\n",
" 'az': 'Azerbaijani',\n",
" 'eu': 'Basque',\n",
" 'be': 'Belarusian',\n",
" 'bn': 'Bengali',\n",
" 'bs': 'Bosnian',\n",
" 'bg': 'Bulgarian',\n",
" 'ca': 'Catalan',\n",
" 'ceb': 'Cebuano',\n",
" 'ny': 'Chichewa',\n",
" 'zh-CN': 'Chinese (Simplified)',\n",
" 'zh-TW': 'Chinese (Traditional)',\n",
" 'co': 'Corsican',\n",
" 'hr': 'Croatian',\n",
" 'cs': 'Czech',\n",
" 'da': 'Danish',\n",
" 'nl': 'Dutch',\n",
" 'en': 'English',\n",
" 'eo': 'Esperanto',\n",
" 'et': 'Estonian',\n",
" 'tl': 'Filipino',\n",
" 'fi': 'Finnish',\n",
" 'fr': 'French',\n",
" 'fy': 'Frisian',\n",
" 'gl': 'Galician',\n",
" 'ka': 'Georgian',\n",
" 'de': 'German',\n",
" 'el': 'Greek',\n",
" 'gu': 'Gujarati',\n",
" 'ht': 'Haitian Creole',\n",
" 'ha': 'Hausa',\n",
" 'haw': 'Hawaiian',\n",
" 'iw': 'Hebrew',\n",
" 'hi': 'Hindi',\n",
" 'hmn': 'Hmong',\n",
" 'hu': 'Hungarian',\n",
" 'is': 'Icelandic',\n",
" 'ig': 'Igbo',\n",
" 'id': 'Indonesian',\n",
" 'ga': 'Irish',\n",
" 'it': 'Italian',\n",
" 'ja': 'Japanese',\n",
" 'jw': 'Javanese',\n",
" 'kn': 'Kannada',\n",
" 'kk': 'Kazakh',\n",
" 'km': 'Khmer',\n",
" 'rw': 'Kinyarwanda',\n",
" 'ko': 'Korean',\n",
" 'ku': 'Kurdish (Kurmanji)',\n",
" 'ky': 'Kyrgyz',\n",
" 'lo': 'Lao',\n",
" 'la': 'Latin',\n",
" 'lv': 'Latvian',\n",
" 'lt': 'Lithuanian',\n",
" 'lb': 'Luxembourgish',\n",
" 'mk': 'Macedonian',\n",
" 'mg': 'Malagasy',\n",
" 'ms': 'Malay',\n",
" 'ml': 'Malayalam',\n",
" 'mt': 'Maltese',\n",
" 'mi': 'Maori',\n",
" 'mr': 'Marathi',\n",
" 'mn': 'Mongolian',\n",
" 'my': 'Myanmar (Burmese)',\n",
" 'ne': 'Nepali',\n",
" 'no': 'Norwegian',\n",
" 'or': 'Odia (Oriya)',\n",
" 'ps': 'Pashto',\n",
" 'fa': 'Persian',\n",
" 'pl': 'Polish',\n",
" 'pt': 'Portuguese',\n",
" 'pa': 'Punjabi',\n",
" 'ro': 'Romanian',\n",
" 'ru': 'Russian',\n",
" 'sm': 'Samoan',\n",
" 'gd': 'Scots Gaelic',\n",
" 'sr': 'Serbian',\n",
" 'st': 'Sesotho',\n",
" 'sn': 'Shona',\n",
" 'sd': 'Sindhi',\n",
" 'si': 'Sinhala',\n",
" 'sk': 'Slovak',\n",
" 'sl': 'Slovenian',\n",
" 'so': 'Somali',\n",
" 'es': 'Spanish',\n",
" 'su': 'Sundanese',\n",
" 'sw': 'Swahili',\n",
" 'sv': 'Swedish',\n",
" 'tg': 'Tajik',\n",
" 'ta': 'Tamil',\n",
" 'tt': 'Tatar',\n",
" 'te': 'Telugu',\n",
" 'th': 'Thai',\n",
" 'tr': 'Turkish',\n",
" 'tk': 'Turkmen',\n",
" 'uk': 'Ukrainian',\n",
" 'ur': 'Urdu',\n",
" 'ug': 'Uyghur',\n",
" 'uz': 'Uzbek',\n",
" 'vi': 'Vietnamese',\n",
" 'cy': 'Welsh',\n",
" 'xh': 'Xhosa',\n",
" 'yi': 'Yiddish',\n",
" 'yo': 'Yoruba',\n",
" 'zu': 'Zulu',\n",
" 'he': 'Hebrew',\n",
" 'zh': 'Chinese (Simplified)'}"
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# double check available_languages\n",
"available_languages"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"From here, we can choose to continue using Google's official API, or use googletrans instead: https://github.com/ssut/py-googletrans"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'en'"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from googletrans import Translator\n",
"translator = Translator()\n",
"\n",
"translator.translate(\"Hello\").src\n",
"translator.detect(\"Hello\").lang"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"# so either translate_client.translate() or translator.translate()\n",
"\n",
"# translator.translate('안녕하세요.')\n",
"# <Translated src=ko dest=en text=Good evening. pronunciation=Good evening.>\n",
"# translate_client.translate([\"HELLO\", \"GOODBYE\"])\n",
"\"\"\"\n",
"[{'translatedText': 'HELLO', 'detectedSourceLanguage': 'en', 'input': 'HELLO'},\n",
" {'translatedText': 'GOODBYE',\n",
" 'detectedSourceLanguage': 'en',\n",
" 'input': 'GOODBYE'}]\n",
"\"\"\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Comments by Language Cont.\n",
"Now that we have a tool to detect a given language and a tool to translate it, let's count how many different languages are represented in the comments, and then translate the comments into English so we can apply the same analysis to all of them."
]
},
{
"cell_type": "code",
"execution_count": 509,
"metadata": {},
"outputs": [],
"source": [
"comments = blackpink.get(\"Comment\")"
]
},
{
"cell_type": "code",
"execution_count": 510,
"metadata": {},
"outputs": [],
"source": [
"# it doesn't seem like we can feed the pandas column in directly,\n",
"# so convert it to a plain Python list\n",
"list_of_comments = comments.tolist()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Some Non-Language Results From our First Pass"
]
},
{
"cell_type": "code",
"execution_count": 78,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂\n",
"😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂\n"
]
}
],
"source": [
"print(list_of_comments[0])\n",
"print(translator.translate(list_of_comments[0]).text)"
]
},
{
"cell_type": "code",
"execution_count": 79,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"https://www.youtube.com/watch?v=abPiBoRJeQo\n",
"https://www.youtube.com/watch?v=abPiBoRJeQo\n"
]
}
],
"source": [
"print(list_of_comments[1])\n",
"print(translator.translate(list_of_comments[1]).text)"
]
},
{
"cell_type": "code",
"execution_count": 80,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂', 'https://www.youtube.com/watch?v=abPiBoRJeQo', 'https://www.youtube.com/watch?v=abPiBoRJeQo', 'Yess', 'بتس_تنجح_والهيترز_تنبح', 'https://youtu.be/7lhGxyGoOuA', '❤️❤️❤️', 'blackpinkkkkk', '🧐🧐🧐🧐🧐🧐🧐🧐', 'Dejen de escuchar esta basura mejor escuchen more & more de las twice']\n"
]
}
],
"source": [
"print(list_of_comments[:10])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Some Examples of Non-English Language Translations"
]
},
{
"cell_type": "code",
"execution_count": 83,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"بتس_تنجح_والهيترز_تنبح\n",
"ar Pts_a_to_heat_heats_ bark\n"
]
}
],
"source": [
"print(list_of_comments[4])\n",
"translated = translator.translate(list_of_comments[4])\n",
"print(translated.src, translated.text)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Yikes ... But this is definitely a comment about BTS."
]
},
{
"cell_type": "code",
"execution_count": 82,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"es Stop listening to this garbage better listen more & more than twice\n"
]
}
],
"source": [
"list_of_comments[9]\n",
"translated = translator.translate(list_of_comments[9])\n",
"print(translated.src, translated.text)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This one is a little better in terms of translation, but come on, you can stan more than one fandom."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Some Pre-flections\n",
"Obviously, translating these comments into English isn't the best way to handle non-English comments, since we're now at the mercy of Google Translate. But Google Translate does a great job most of the time, so we'll leverage the API to help us with language detection and translation."
]
},
{
"cell_type": "code",
"execution_count": 512,
"metadata": {},
"outputs": [],
"source": [
"# collect the translations in a list; index tracks how many comments have been processed\n",
"all_translated = []\n",
"index = 0"
]
},
{
"cell_type": "code",
"execution_count": 513,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[progress output: one index per line, 0 through 1860]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[progress output: one index per line, 1861 through 2484]\n",
"2485\n",
"2486\n",
"2487\n",
"2488\n",
"2489\n",
"2490\n",
"2491\n",
"2492\n",
"2493\n",
"2494\n",
"2495\n",
"2496\n",
"2497\n",
"2498\n",
"2499\n",
"2500\n",
"2501\n",
"2502\n",
"2503\n",
"2504\n",
"2505\n",
"2506\n",
"2507\n",
"2508\n",
"2509\n",
"2510\n",
"2511\n",
"2512\n",
"2513\n",
"2514\n",
"2515\n",
"2516\n",
"2517\n",
"2518\n",
"2519\n",
"2520\n",
"2521\n",
"2522\n",
"2523\n",
"2524\n",
"2525\n",
"2526\n",
"2527\n",
"2528\n",
"2529\n",
"2530\n",
"2531\n",
"2532\n",
"2533\n",
"2534\n",
"2535\n",
"2536\n",
"2537\n",
"2538\n",
"2539\n",
"2540\n",
"2541\n",
"2542\n",
"2543\n",
"2544\n",
"2545\n",
"2546\n",
"2547\n",
"2548\n",
"2549\n",
"2550\n",
"2551\n",
"2552\n",
"2553\n",
"2554\n",
"2555\n",
"2556\n",
"2557\n",
"2558\n",
"2559\n",
"2560\n",
"2561\n",
"2562\n",
"2563\n",
"2564\n",
"2565\n",
"2566\n",
"2567\n",
"2568\n",
"2569\n",
"2570\n",
"2571\n",
"2572\n",
"2573\n",
"2574\n",
"2575\n",
"2576\n",
"2577\n",
"2578\n",
"2579\n",
"2580\n",
"2581\n",
"2582\n",
"2583\n",
"2584\n",
"2585\n",
"2586\n",
"2587\n",
"2588\n",
"2589\n",
"2590\n",
"2591\n",
"2592\n",
"2593\n",
"2594\n",
"2595\n",
"2596\n",
"2597\n",
"2598\n",
"2599\n",
"2600\n",
"2601\n",
"2602\n",
"2603\n",
"2604\n",
"2605\n",
"2606\n",
"2607\n",
"2608\n",
"2609\n",
"2610\n",
"2611\n",
"2612\n",
"2613\n",
"2614\n",
"2615\n",
"2616\n",
"2617\n",
"2618\n",
"2619\n",
"2620\n",
"2621\n",
"2622\n",
"2623\n",
"2624\n",
"2625\n",
"2626\n",
"2627\n",
"2628\n",
"2629\n",
"2630\n",
"2631\n",
"2632\n",
"2633\n",
"2634\n",
"2635\n",
"2636\n",
"2637\n",
"2638\n",
"2639\n",
"2640\n",
"2641\n",
"2642\n",
"2643\n",
"2644\n",
"2645\n",
"2646\n",
"2647\n",
"2648\n",
"2649\n",
"2650\n",
"2651\n",
"2652\n",
"2653\n",
"2654\n",
"2655\n",
"2656\n",
"2657\n",
"2658\n",
"2659\n",
"2660\n",
"2661\n",
"2662\n",
"2663\n",
"2664\n",
"2665\n",
"2666\n",
"2667\n",
"2668\n",
"2669\n",
"2670\n",
"2671\n",
"2672\n",
"2673\n",
"2674\n",
"2675\n",
"2676\n",
"2677\n",
"2678\n",
"2679\n",
"2680\n",
"2681\n",
"2682\n",
"2683\n",
"2684\n",
"2685\n",
"2686\n",
"2687\n",
"2688\n",
"2689\n",
"2690\n",
"2691\n",
"2692\n",
"2693\n",
"2694\n",
"2695\n",
"2696\n",
"2697\n",
"2698\n",
"2699\n",
"2700\n",
"2701\n",
"2702\n",
"2703\n",
"2704\n",
"2705\n",
"2706\n",
"2707\n",
"2708\n",
"2709\n",
"2710\n",
"2711\n",
"2712\n",
"2713\n",
"2714\n",
"2715\n",
"2716\n",
"2717\n",
"2718\n",
"2719\n",
"2720\n",
"2721\n",
"2722\n",
"2723\n",
"2724\n",
"2725\n",
"2726\n",
"2727\n",
"2728\n",
"2729\n",
"2730\n",
"2731\n",
"2732\n",
"2733\n",
"2734\n",
"2735\n",
"2736\n",
"2737\n",
"2738\n",
"2739\n",
"2740\n",
"2741\n",
"2742\n",
"2743\n",
"2744\n",
"2745\n",
"2746\n",
"2747\n",
"2748\n",
"2749\n",
"2750\n",
"2751\n",
"2752\n",
"2753\n",
"2754\n",
"2755\n",
"2756\n",
"2757\n",
"2758\n",
"2759\n",
"2760\n",
"2761\n",
"2762\n",
"2763\n",
"2764\n",
"2765\n",
"2766\n",
"2767\n",
"2768\n",
"2769\n",
"2770\n",
"2771\n",
"2772\n",
"2773\n",
"2774\n",
"2775\n",
"2776\n",
"2777\n",
"2778\n",
"2779\n",
"2780\n",
"2781\n",
"2782\n",
"2783\n",
"2784\n",
"2785\n",
"2786\n",
"2787\n",
"2788\n",
"2789\n",
"2790\n",
"2791\n",
"2792\n",
"2793\n",
"2794\n",
"2795\n",
"2796\n",
"2797\n",
"2798\n",
"2799\n",
"2800\n",
"2801\n",
"2802\n",
"2803\n",
"2804\n",
"2805\n",
"2806\n",
"2807\n",
"2808\n",
"2809\n",
"2810\n",
"2811\n",
"2812\n",
"2813\n",
"2814\n",
"2815\n",
"2816\n",
"2817\n",
"2818\n",
"2819\n",
"2820\n",
"2821\n",
"2822\n",
"2823\n",
"2824\n",
"2825\n",
"2826\n",
"2827\n",
"2828\n",
"2829\n",
"2830\n",
"2831\n",
"2832\n",
"2833\n",
"2834\n",
"2835\n",
"2836\n",
"2837\n",
"2838\n",
"2839\n",
"2840\n",
"2841\n",
"2842\n",
"2843\n",
"2844\n",
"2845\n",
"2846\n",
"2847\n",
"2848\n",
"2849\n",
"2850\n",
"2851\n",
"2852\n",
"2853\n",
"2854\n",
"2855\n",
"2856\n",
"2857\n",
"2858\n",
"2859\n",
"2860\n",
"2861\n",
"2862\n",
"2863\n",
"2864\n",
"2865\n",
"2866\n",
"2867\n",
"2868\n",
"2869\n",
"2870\n",
"2871\n",
"2872\n",
"2873\n",
"2874\n",
"2875\n",
"2876\n",
"2877\n",
"2878\n",
"2879\n",
"2880\n",
"2881\n",
"2882\n",
"2883\n",
"2884\n",
"2885\n",
"2886\n",
"2887\n",
"2888\n",
"2889\n",
"2890\n",
"2891\n",
"2892\n",
"2893\n",
"2894\n",
"2895\n",
"2896\n",
"2897\n",
"2898\n",
"2899\n",
"2900\n",
"2901\n",
"2902\n",
"2903\n",
"2904\n",
"2905\n",
"2906\n",
"2907\n",
"2908\n",
"2909\n",
"2910\n",
"2911\n",
"2912\n",
"2913\n",
"2914\n",
"2915\n",
"2916\n",
"2917\n",
"2918\n",
"2919\n",
"2920\n",
"2921\n",
"2922\n",
"2923\n",
"2924\n",
"2925\n",
"2926\n",
"2927\n",
"2928\n",
"2929\n",
"2930\n",
"2931\n",
"2932\n",
"2933\n",
"2934\n",
"2935\n",
"2936\n",
"2937\n",
"2938\n",
"2939\n",
"2940\n",
"2941\n",
"2942\n",
"2943\n",
"2944\n",
"2945\n",
"2946\n",
"2947\n",
"2948\n",
"2949\n",
"2950\n",
"2951\n",
"2952\n",
"2953\n",
"2954\n",
"2955\n",
"2956\n",
"2957\n",
"2958\n",
"2959\n",
"2960\n",
"2961\n",
"2962\n",
"2963\n",
"2964\n",
"2965\n",
"2966\n",
"2967\n",
"2968\n",
"2969\n",
"2970\n",
"2971\n",
"2972\n",
"2973\n",
"2974\n",
"2975\n",
"2976\n",
"2977\n",
"2978\n",
"2979\n",
"2980\n",
"2981\n",
"2982\n",
"2983\n",
"2984\n",
"2985\n",
"2986\n",
"2987\n",
"2988\n",
"2989\n",
"2990\n",
"2991\n",
"2992\n",
"2993\n",
"2994\n",
"2995\n",
"2996\n",
"2997\n",
"2998\n",
"2999\n",
"3000\n",
"3001\n",
"3002\n",
"3003\n",
"3004\n",
"3005\n",
"3006\n",
"3007\n",
"3008\n",
"3009\n",
"3010\n",
"3011\n",
"3012\n",
"3013\n",
"3014\n",
"3015\n",
"3016\n",
"3017\n",
"3018\n",
"3019\n",
"3020\n",
"3021\n",
"3022\n",
"3023\n",
"3024\n",
"3025\n",
"3026\n",
"3027\n",
"3028\n",
"3029\n",
"3030\n",
"3031\n",
"3032\n",
"3033\n",
"3034\n",
"3035\n",
"3036\n",
"3037\n",
"3038\n",
"3039\n",
"3040\n",
"3041\n",
"3042\n",
"3043\n",
"3044\n",
"3045\n",
"3046\n",
"3047\n",
"3048\n",
"3049\n",
"3050\n",
"3051\n",
"3052\n",
"3053\n",
"3054\n",
"3055\n",
"3056\n",
"3057\n",
"3058\n",
"3059\n",
"3060\n",
"3061\n",
"3062\n",
"3063\n",
"3064\n",
"3065\n",
"3066\n",
"3067\n",
"3068\n",
"3069\n",
"3070\n",
"3071\n",
"3072\n",
"3073\n",
"3074\n",
"3075\n",
"3076\n",
"3077\n",
"3078\n",
"3079\n",
"3080\n",
"3081\n",
"3082\n",
"3083\n",
"3084\n",
"3085\n",
"3086\n",
"3087\n",
"3088\n",
"3089\n",
"3090\n",
"3091\n",
"3092\n",
"3093\n",
"3094\n",
"3095\n",
"3096\n",
"3097\n",
"3098\n",
"3099\n",
"3100\n",
"3101\n",
"3102\n",
"3103\n",
"3104\n",
"3105\n",
"3106\n",
"3107\n",
"3108\n",
"3109\n",
"3110\n",
"3111\n",
"3112\n",
"3113\n",
"3114\n",
"3115\n",
"3116\n",
"3117\n",
"3118\n",
"3119\n",
"3120\n",
"3121\n",
"3122\n",
"3123\n",
"3124\n",
"3125\n",
"3126\n",
"3127\n",
"3128\n",
"3129\n",
"3130\n",
"3131\n",
"3132\n",
"3133\n",
"3134\n",
"3135\n",
"3136\n",
"3137\n",
"3138\n",
"3139\n",
"3140\n",
"3141\n",
"3142\n",
"3143\n",
"3144\n",
"3145\n",
"3146\n",
"3147\n",
"3148\n",
"3149\n",
"3150\n",
"3151\n",
"3152\n",
"3153\n",
"3154\n",
"3155\n",
"3156\n",
"3157\n",
"3158\n",
"3159\n",
"3160\n",
"3161\n",
"3162\n",
"3163\n",
"3164\n",
"3165\n",
"3166\n",
"3167\n",
"3168\n",
"3169\n",
"3170\n",
"3171\n",
"3172\n",
"3173\n",
"3174\n",
"3175\n",
"3176\n",
"3177\n",
"3178\n",
"3179\n",
"3180\n",
"3181\n",
"3182\n",
"3183\n",
"3184\n",
"3185\n",
"3186\n",
"3187\n",
"3188\n",
"3189\n",
"3190\n",
"3191\n",
"3192\n",
"3193\n",
"3194\n",
"3195\n",
"3196\n",
"3197\n",
"3198\n",
"3199\n",
"3200\n",
"3201\n",
"3202\n",
"3203\n",
"3204\n",
"3205\n",
"3206\n",
"3207\n",
"3208\n",
"3209\n",
"3210\n",
"3211\n",
"3212\n",
"3213\n",
"3214\n",
"3215\n",
"3216\n",
"3217\n",
"3218\n",
"3219\n",
"3220\n",
"3221\n",
"3222\n",
"3223\n",
"3224\n",
"3225\n",
"3226\n",
"3227\n",
"3228\n",
"3229\n",
"3230\n",
"3231\n",
"3232\n",
"3233\n",
"3234\n",
"3235\n",
"3236\n",
"3237\n",
"3238\n",
"3239\n",
"3240\n",
"3241\n",
"3242\n",
"3243\n",
"3244\n",
"3245\n",
"3246\n",
"3247\n",
"3248\n",
"3249\n",
"3250\n",
"3251\n",
"3252\n",
"3253\n",
"3254\n",
"3255\n",
"3256\n",
"3257\n",
"3258\n",
"3259\n",
"3260\n",
"3261\n",
"3262\n",
"3263\n",
"3264\n",
"3265\n",
"3266\n",
"3267\n",
"3268\n",
"3269\n",
"3270\n",
"3271\n",
"3272\n",
"3273\n",
"3274\n",
"3275\n",
"3276\n",
"3277\n",
"3278\n",
"3279\n",
"3280\n",
"3281\n",
"3282\n",
"3283\n",
"3284\n",
"3285\n",
"3286\n",
"3287\n",
"3288\n",
"3289\n",
"3290\n",
"3291\n",
"3292\n",
"3293\n",
"3294\n",
"3295\n",
"3296\n",
"3297\n",
"3298\n",
"3299\n",
"3300\n",
"3301\n",
"3302\n",
"3303\n",
"3304\n",
"3305\n",
"3306\n",
"3307\n",
"3308\n",
"3309\n",
"3310\n",
"3311\n",
"3312\n",
"3313\n",
"3314\n",
"3315\n",
"3316\n",
"3317\n",
"3318\n",
"3319\n"
]
}
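],
"source": [
"# You might need to run this cell several times (e.g., if a request fails partway through).\n",
"# `index` persists between runs, so each re-run resumes where the previous one stopped.\n",
"for comment in list_of_comments[index:]:\n",
"    print(index)\n",
"    all_translated.append(translate_client.translate(comment))\n",
"    index += 1"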
]
},
{
"cell_type": "code",
"execution_count": 515,
"metadata": {},
"outputs": [],
"source": [
"everything_translated = [item.get('translatedText') for item in all_translated]\n",
"everything_detected = [item.get('detectedSourceLanguage') for item in all_translated]"
]
},
{
"cell_type": "code",
"execution_count": 516,
"metadata": {},
"outputs": [],
"source": [
"blackpink['Translations'] = everything_translated\n",
"blackpink['Detected Language'] = everything_detected"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Alternatives to Google Translate\n",
"You can use the `langdetect` library instead, although you'll need to handle emoji-only comments, links, and other text with no real language content yourself. Google Translate, by contrast, labels each text with the most likely language based on its non-emoji characters."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langdetect import detect\n"
]
},
{
"cell_type": "code",
"execution_count": 104,
"metadata": {},
"outputs": [],
"source": [
"lang_detect_comments = []\n",
"for comment in list_of_comments:\n",
"    try:\n",
"        lang_detect_comments.append(detect(comment))\n",
"    except Exception:\n",
"        # detect() raises when it finds no usable text (e.g., emoji-only comments)\n",
"        lang_detect_comments.append('emoji')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Let's see how many unique languages there are"
]
},
{
"cell_type": "code",
"execution_count": 517,
"metadata": {
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Counter({'en': 2382, 'ar': 131, 'es': 124, 'id': 84, 'th': 71, 'vi': 68, 'pt': 55, 'ko': 50, 'tr': 41, 'jw': 36, 'tl': 36, 'ru': 25, 'hi': 18, 'et': 15, 'ja': 14, 'fr': 13, 'ms': 12, 'no': 12, 'sd': 10, 'zh-CN': 8, 'ku': 7, 'da': 7, 'fi': 6, 'su': 6, 'it': 5, 'ro': 4, 'so': 4, 'nl': 4, 'fy': 3, 'bg': 3, 'sr': 3, 'hr': 3, 'sq': 3, 'sk': 3, 'te': 3, 'kn': 3, 'pl': 3, 'mr': 2, 'zh-TW': 2, 'ps': 2, 'sw': 2, 'km': 2, 'bn': 2, 'ny': 2, 'fa': 2, 'az': 2, 'gd': 1, 'gl': 1, 'af': 1, 'yo': 1, 'gu': 1, 'haw': 1, 'sv': 1, 'ga': 1, 'ig': 1, 'sl': 1, 'kk': 1, 'st': 1, 'sn': 1, 'sm': 1, 'lb': 1, 'ht': 1, 'el': 1, 'co': 1, 'zu': 1, 'uk': 1, 'de': 1, 'hu': 1, 'mi': 1, 'mt': 1, 'hmn': 1, 'si': 1, 'eu': 1})\n"
]
}
],
"source": [
"from collections import Counter\n",
"\n",
"unique_langs = Counter(everything_detected)\n",
"print(unique_langs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Qualitative Analysis of Translations"
]
},
{
"cell_type": "code",
"execution_count": 518,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"# https://stackoverflow.com/questions/17071871/how-to-select-rows-from-a-dataframe-based-on-column-values\n",
"def get_results_of_comparison(target_value, source_column, target_column, dataset):\n",
"    # boolean mask: True where source_column equals target_value\n",
"    mask = dataset[source_column].values == target_value\n",
"    pos = np.flatnonzero(mask)  # positional indices of the matching rows\n",
"\n",
"    return mask, pos, dataset[mask][target_column]\n",
"\n",
"# Example usage:\n",
"# mask, indices, column = get_results_of_comparison('vi', 'Detected Language', 'Comment', blackpink)"
]
},
{
"cell_type": "code",
"execution_count": 519,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0 곡이 지난곡을 재생산하는느낌은지울수가없네요. 변화가필요합니다.\n",
"246 와.......3일만에 조회수가 1억 넘었네...역시 월드클래스 블핑👍👍❤️❤️❤️❤️\n",
"354 하..한국..어 댓..어딨어..?\n",
"392 부잣집 딸들\\n(rich daughters)\n",
"545 나는 당신이 그렇게 좋아\n",
"563 나는 BLACPINK 노래를 좋아한다\n",
"569 블랙핑크 너무 좋아 하는데 이번 노래는 쫌 올드한거같아요....ㅜㅜ 특히 삠빠라빠라...\n",
"592 대박대박대박\n",
"900 제발ㅠㅠ 한국인없나요??\n",
"915 한국인 어디있습니까\n",
"928 내가 엄청 홍보하고 다님 ㅎㅎ\n",
"941 얘들아 고생했어 덕분에 좋은 노래 좋은 뮤비 잘 보고 있어... 늘 고맙고 컴백 ...\n",
"1047 이쁘다 멋지다 좋다ㅠㅠㅠㅠ\n",
"1230 한국인 댓글 어디있는거야 ㅠㅠㅠㅠ 블핑체고 🖤💖🖤💖\n",
"1245 2:26 퓨전 한복이다 !!\n",
"1403 노래 개좋네❤❤내 여돌 1순위가 블랙핑크다 더 흥해라🙏🏻\n",
"28 Ang gandaaa ni Jisoo ㅠ.ㅠ\n",
"14 뮤비너무멋집니다\\n노래는 괜적으로 넘 안와닿음 \\n나이들었나봄 ㅠ ㅜ\n",
"24 이런 그룹 또 언제 나온다고 정말,, 대단하다\n",
"40 나는이 새로운 노래를 좋아했습니다. 이제는 제가 가장 좋아하는 노래 중 하나입니다.🖤💘\n",
"79 나는이 노래를 듣는 것을 멈출 수 없다! 블랙 핑크는 항상 완벽 할 것이다!\n",
"169 KST 00:40 148.54M --> 03:10 148.54M... Freezin...\n",
"245 나는 그것을 좋아했다\n",
"269 한국인 .... 분명 블랙핑크는 한국아이돌이란 말이야 ㅠㅠㅠㅠㅠㅠㅠ\n",
"331 지금 하늘 구름 색은 tropical yeah\\n저 태양 빨간빛 네 두 볼 같아\\n...\n",
"433 허, 기본은 특히 급한 수술을 할 때 미쳤다는 것입니다\n",
"616 한국인없냐 한국인한명쯤은올리자 올려!\n",
"669 왜 외국댓에 묻힐것 같냐\n",
"700 ㅋㅋㅋㅋㅋ한국어 자막 보솤ㅋㅋㅋ\n",
"758 댓글 수 실화냐\n",
"834 마지막 의상 너무 이쁘잖어~\n",
"852 외국가수인겨? 왜 한글이 안보이냐 ㅋㅋㅋㅋ 좋아요 1000만 축하합니다~~\n",
"923 한국인 좋아요 박고가\n",
"146 음악과 배경은 훌륭하지만 YG는 회원의 노래 시간과 의상 비율을 균형있게 조정해야합...\n",
"189 불쌍한 지수는 YG에게 미움을 받거나 노래를 자주 부르지 않고 재능을 모두 나타내지...\n",
"281 채널 구독을 도와주세요\n",
"298 제발 장수 하자 블랙핑크 전세계 씹어 먹자 4명다 친하고 10년이상 가자\n",
"334 뮤비너무멋집니다\\n노래는 괜적으로 넘 안와닿음 \\n나이들었나봄 ㅠ ㅜ\n",
"344 이런 그룹 또 언제 나온다고 정말,, 대단하다\n",
"360 나는이 새로운 노래를 좋아했습니다. 이제는 제가 가장 좋아하는 노래 중 하나입니다.🖤💘\n",
"399 나는이 노래를 듣는 것을 멈출 수 없다! 블랙 핑크는 항상 완벽 할 것이다!\n",
"459 리사 랩할때 카디비가 보여서 당황중;;;;;\n",
"510 리사팬들 제니 개인 캠에 제니 리사 백업댄서같다 리사가 이 노래 혼자 살린다 춤 못...\n",
"520 조회수 많다\n",
"538 블랙핑크 사랑해\n",
"553 한국인 찾는다\n",
"632 한국인 손🖐\n",
"689 한국인 어디 업나 하.. 한국인 손??\n",
"791 이틀만에 1억 조회수라니...\\n중동 음악을 차용한 랩부분은 개인적으로 좋지만 무거...\n",
"813 ♡♡♡♡♡♡♡♡블핑최고♡♡♡♡♡♡♡\n",
"Name: Comment, dtype: object\n"
]
}
],
"source": [
"mask, idx, columns = get_results_of_comparison('ko', 'Detected Language', 'Comment', blackpink)\n",
"print(columns)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"With `target_value == 'ko'`, it seems like one or two comments are misclassified; for example, `Ang gandaaa ni Jisoo ㅠ.ㅠ` is Tagalog followed by a Korean-style emoticon.\n",
"With English, we see a few things happening. Links are written in Latin characters, so it makes sense that they would be classified as English. In addition, at least at a glance, comments made up entirely of emoji appear to be labeled English by default. We can test this by stripping all the emojis from each comment and checking whether anything is left: if the stripped text is empty, the comment was all emoji. "
]
},
{
"cell_type": "code",
"execution_count": 521,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Jennie with short ugh I LOVE ITTT!\n"
]
}
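],
"source": [
"def strip_emojis(text):\n",
"    # Note: emoji.UNICODE_EMOJI is the API of older emoji-package releases;\n",
"    # newer releases may need emoji.EMOJI_DATA instead.\n",
"    stripped = []\n",
"    # \\X matches one extended grapheme cluster, so multi-codepoint emojis stay together\n",
"    for grapheme in regex.findall(r'\\X', text):\n",
"        # skip any cluster that contains an emoji code point\n",
"        if any(char in emoji.UNICODE_EMOJI for char in grapheme):\n",
"            continue\n",
"        stripped.append(grapheme)\n",
"    return ''.join(stripped)\n",
"\n",
"print(strip_emojis(\"Jennie with short ugh I LOVE ITTT!😍\"))"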
]
},
{
"cell_type": "code",
"execution_count": 523,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Counter({'en': 412})\n"
]
}
],
"source": [
"all_emoji = [strip_emojis(text) == '' for text in list_of_comments]\n",
"blackpink['All Emoji'] = all_emoji\n",
"mask, pos, columns = get_results_of_comparison(True, 'All Emoji', 'Detected Language', blackpink)\n",
"print(Counter(columns))\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Sure enough, all 412 all-emoji comments are classified as English. This means we need to subtract them from the total count of English comments (2382 - 412 = 1970).\n",
"\n",
"Now let's expand the language codes in 'Detected Language' into the full name of each detected language. We also want to label comments flagged in 'All Emoji' as 'Emoji'."
]
},
{
"cell_type": "code",
"execution_count": 524,
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Comment</th>\n",
" <th>Comment ID</th>\n",
" <th>Reply Count</th>\n",
" <th>Like Count</th>\n",
" <th>Viewer Rating</th>\n",
" <th>Translations</th>\n",
" <th>Detected Language</th>\n",
" <th>All Emoji</th>\n",
" <th>Expanded Language Name</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>곡이 지난곡을 재생산하는느낌은지울수가없네요. 변화가필요합니다.</td>\n",
" <td>UgzZok9hFyZ6KXL7dx54AaABAg</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>I can&amp;#39;t erase the feeling that the song re...</td>\n",
" <td>ko</td>\n",
" <td>False</td>\n",
" <td>Korean</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Yay already 10 M likes</td>\n",
" <td>UgwGx_T_qoH8wfxyACV4AaABAg</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>Yay already 10 M likes</td>\n",
" <td>en</td>\n",
" <td>False</td>\n",
" <td>English</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>#RESPECTLISA STOP WITH THE BULLYING</td>\n",
" <td>Ugztt7qshUYp2Eprybh4AaABAg</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>#RESPECTLISA STOP WITH THE BULLYING</td>\n",
" <td>en</td>\n",
" <td>False</td>\n",
" <td>English</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>does no one wanna talk about Rosé with balacla...</td>\n",
" <td>UgxJvrhriUipNrhc9Nl4AaABAg</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>does no one wanna talk about Rosé with balacla...</td>\n",
" <td>en</td>\n",
" <td>False</td>\n",
" <td>English</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>I HOPE THE YG CEO LET BLACKPINK AND OTHER ART...</td>\n",
" <td>UgymlRqv2PpCO_oOzYd4AaABAg</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>I HOPE THE YG CEO LET BLACKPINK AND OTHER ART...</td>\n",
" <td>en</td>\n",
" <td>False</td>\n",
" <td>English</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Comment \\\n",
"0 곡이 지난곡을 재생산하는느낌은지울수가없네요. 변화가필요합니다. \n",
"1 Yay already 10 M likes \n",
"2 #RESPECTLISA STOP WITH THE BULLYING \n",
"3 does no one wanna talk about Rosé with balacla... \n",
"4 I HOPE THE YG CEO LET BLACKPINK AND OTHER ART... \n",
"\n",
" Comment ID Reply Count Like Count Viewer Rating \\\n",
"0 UgzZok9hFyZ6KXL7dx54AaABAg 0 0 NaN \n",
"1 UgwGx_T_qoH8wfxyACV4AaABAg 0 0 NaN \n",
"2 Ugztt7qshUYp2Eprybh4AaABAg 0 0 NaN \n",
"3 UgxJvrhriUipNrhc9Nl4AaABAg 0 0 NaN \n",
"4 UgymlRqv2PpCO_oOzYd4AaABAg 0 0 NaN \n",
"\n",
" Translations Detected Language \\\n",
"0 I can&#39;t erase the feeling that the song re... ko \n",
"1 Yay already 10 M likes en \n",
"2 #RESPECTLISA STOP WITH THE BULLYING en \n",
"3 does no one wanna talk about Rosé with balacla... en \n",
"4 I HOPE THE YG CEO LET BLACKPINK AND OTHER ART... en \n",
"\n",
" All Emoji Expanded Language Name \n",
"0 False Korean \n",
"1 False English \n",
"2 False English \n",
"3 False English \n",
"4 False English "
]
},
"execution_count": 524,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def expand_lang_shorthand(row):\n",
"    # all-emoji comments get their own label instead of the detected 'en'\n",
"    if row['All Emoji']:\n",
"        return 'Emoji'\n",
"    return available_languages[row['Detected Language']]\n",
"\n",
"blackpink['Expanded Language Name'] = blackpink.apply(expand_lang_shorthand, axis=1)\n",
"\n",
"blackpink.head()"
]
},
{
"cell_type": "code",
"execution_count": 525,
"metadata": {},
"outputs": [],
"source": [
"unique_expanded_langs = Counter(blackpink['Expanded Language Name'])"
]
},
{
"cell_type": "code",
"execution_count": 526,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"Counter({'Korean': 50,\n",
" 'English': 1970,\n",
" 'Portuguese': 55,\n",
" 'Indonesian': 84,\n",
" 'Arabic': 131,\n",
" 'Emoji': 412,\n",
" 'Scots Gaelic': 1,\n",
" 'Frisian': 3,\n",
" 'Vietnamese': 68,\n",
" 'French': 13,\n",
" 'Thai': 71,\n",
" 'Spanish': 124,\n",
" 'Bulgarian': 3,\n",
" 'Javanese': 36,\n",
" 'Serbian': 3,\n",
" 'Marathi': 2,\n",
" 'Turkish': 41,\n",
" 'Russian': 25,\n",
" 'Finnish': 6,\n",
" 'Filipino': 36,\n",
" 'Japanese': 14,\n",
" 'Galician': 1,\n",
" 'Chinese (Traditional)': 2,\n",
" 'Malay': 12,\n",
" 'Sindhi': 10,\n",
" 'Norwegian': 12,\n",
" 'Afrikaans': 1,\n",
" 'Yoruba': 1,\n",
" 'Kurdish (Kurmanji)': 7,\n",
" 'Danish': 7,\n",
" 'Hindi': 18,\n",
" 'Pashto': 2,\n",
" 'Gujarati': 1,\n",
" 'Romanian': 4,\n",
" 'Italian': 5,\n",
" 'Swahili': 2,\n",
" 'Sundanese': 6,\n",
" 'Croatian': 3,\n",
" 'Somali': 4,\n",
" 'Albanian': 3,\n",
" 'Chinese (Simplified)': 8,\n",
" 'Khmer': 2,\n",
" 'Hawaiian': 1,\n",
" 'Bengali': 2,\n",
" 'Swedish': 1,\n",
" 'Irish': 1,\n",
" 'Estonian': 15,\n",
" 'Chichewa': 2,\n",
" 'Slovak': 3,\n",
" 'Persian': 2,\n",
" 'Igbo': 1,\n",
" 'Telugu': 3,\n",
" 'Slovenian': 1,\n",
" 'Kazakh': 1,\n",
" 'Sesotho': 1,\n",
" 'Azerbaijani': 2,\n",
" 'Shona': 1,\n",
" 'Kannada': 3,\n",
" 'Samoan': 1,\n",
" 'Dutch': 4,\n",
" 'Polish': 3,\n",
" 'Luxembourgish': 1,\n",
" 'Haitian Creole': 1,\n",
" 'Greek': 1,\n",
" 'Corsican': 1,\n",
" 'Zulu': 1,\n",
" 'Ukrainian': 1,\n",
" 'German': 1,\n",
" 'Hungarian': 1,\n",
" 'Maori': 1,\n",
" 'Maltese': 1,\n",
" 'Hmong': 1,\n",
" 'Sinhala': 1,\n",
" 'Basque': 1})"
]
},
"execution_count": 526,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"unique_expanded_langs"
]
},
{
"cell_type": "code",
"execution_count": 528,
"metadata": {},
"outputs": [],
"source": [
"original = pd.read_csv('preprocessed.csv', lineterminator='\\n')"
]
},
{
"cell_type": "code",
"execution_count": 529,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(11383, 9)"
]
},
"execution_count": 529,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"original.shape"
]
},
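{
"cell_type": "code",
"execution_count": 530,
"metadata": {},
"outputs": [],
"source": [
"# ignore_index=True gives the combined frame a fresh 0..n-1 index instead of repeated row labels\n",
"new = pd.concat([original, blackpink], ignore_index=True)"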
]
},
{
"cell_type": "code",
"execution_count": 531,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(14703, 9)"
]
},
"execution_count": 531,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"new.shape"
]
},
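{
"cell_type": "code",
"execution_count": 533,
"metadata": {},
"outputs": [],
"source": [
"# index=False keeps the row index out of the file, so the CSV has the same 9 columns as the DataFrame\n",
"new.to_csv(\"more.csv\", index=False)"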
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}