@lukalafaye
Created September 19, 2023 05:10
inf8460_A23_TP1.ipynb
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/lukalafaye/8ec2196aa547d03e0296d99852f5d8fe/inf8460_a23_tp1.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"id": "417a4c65",
"metadata": {
"id": "417a4c65"
},
"source": [
"# INF8460: Natural Language Processing\n",
"\n",
"# TP1: Comparing algorithms for text classification"
]
},
{
"cell_type": "markdown",
"id": "dc062396",
"metadata": {
"id": "dc062396"
},
"source": [
"## Team identification:\n",
"\n",
"### Lab group:\n",
"\n",
"### Team number: 4\n",
"\n",
"### Members:\n",
"\n",
"- Luka Lafaye de Micheaux (33% contribution, nature of the contribution)\n",
"- Marouane Lahmadi (33% contribution, nature of the contribution)\n",
"- Tycho Lecavelier des Etangs-Levallois (33% contribution, nature of the contribution)\n",
"\n",
"* nature of the contribution: Briefly describe what each team member did. All members are expected to contribute to the development. Although each member may carry out different tasks, you should strive for an equal distribution of the work."
]
},
{
"cell_type": "markdown",
"id": "b3dacaae",
"metadata": {
"id": "b3dacaae"
},
"source": [
"\n",
"## Description:\n",
"\n",
"In this first lab (TP), you will explore the basics of natural language processing. You will apply the concepts taught in class to solve a simple classification task, following a process that mirrors how you would approach this kind of problem in the real world. Throughout this lab, you will become familiar with libraries commonly used in NLP and in data science.\n",
"\n",
"In this lab, you will work with a dataset of Amazon product reviews. For each review, the dataset contains three pieces of information: the title provided by the user, the detailed comment, and the number of stars the user gave the product.\n",
"\n",
"The goal of the task is to predict the number of stars given to a review from its comment and title.\n",
"\n",
"The work is divided into 3 parts:\n",
"\n",
" - Loading, preprocessing, and visualizing the data: In this first part, you will load and preprocess the data so that it is ready to be used by the algorithms in the second part.\n",
" - Classification: This part explores the different algorithms that can be applied to this task. You will also analyze the outputs of the naive Bayes classifier.\n",
" - Model improvement: In this last part, you will improve your model in 2 different ways. First, you will run a hyperparameter search with cross-validation using a GridSearch. Then, you will perform feature extraction with the help of ChatGPT in order to train a new model, and compare a \"Bag of words\" representation with a representation based on task-specific features.\n",
"\n",
"\n",
"## Lab outline\n",
"\n",
"1. [Loading, preprocessing, and visualizing the data](#1)\n",
"- 1.1 [Load the data](#1.1)\n",
" - 1.1.1 [Load the dataset](#1.1.1)\n",
" - 1.1.2 [Merge the title and text columns into a single column](#1.1.2)\n",
"- 1.2 [Data preprocessing](#1.2)\n",
"- 1.3 [Data visualization](#1.3)\n",
" - 1.3.1 [Plot the number of examples in the dataset for each category](#1.3.1)\n",
" - 1.3.2 [Plot the average number of tokens per example by category](#1.3.2)\n",
" - 1.3.3 [Print the 10 most frequent tokens per category](#1.3.3)\n",
" - 1.3.4 [Print the 10 most frequent adjectives per category](#1.3.4)\n",
"- 1.4 [Split the data into training and test sets](#1.4)\n",
"- 1.5 [Building the vocabulary](#1.5)\n",
"- 1.6 [Vectorizing the data](#1.6)\n",
"2. [Classification](#2)\n",
"- 2.1 [Random model (Random baseline)](#2.1)\n",
"- 2.2 [Analyzing and understanding a naive Bayes (NB) classifier](#2.2)\n",
" - 2.2.1 [Building the model](#2.2.1)\n",
" - 2.2.2 [Confusion matrix](#2.2.2)\n",
" - 2.2.3 [Visualizing the NB probabilities](#2.2.3)\n",
" - 2.2.4 [Visualizing the errors made](#2.2.4)\n",
" - 2.2.5 [Analyzing the errors made](#2.2.5)\n",
"- 2.3 [Logistic regression](#2.3)\n",
"- 2.4 [MLP](#2.4)\n",
"3. [Model improvement](#3)\n",
"- 3.1 [Hyperparameter search and cross-validation](#3.1)\n",
"- 3.2 [Feature extraction with ChatGPT](#3.2)\n",
"- 3.3 [Improving the model from 3.2](#3.3)\n",
"\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"id": "1e96c2d5",
"metadata": {
"id": "1e96c2d5"
},
"source": [
"<a name='1'></a>\n",
"## 1. Loading, preprocessing, and visualizing the data (30 points)\n",
"\n",
"In this first part, you will load and preprocess the data so that it is ready to be used by the algorithms in the second part.\n",
"\n",
"<a name='1.1'></a>\n",
"### 1.1 Load the data (2 points)\n",
"\n",
"This question must be solved using the **pandas** library.\n",
"\n",
"<a name='1.1.1'></a>\n",
"#### 1.1.1 Load the dataset (1 point)\n",
"\n",
"Load the amazon_rating.csv dataset, then display its contents.\n"
]
},
{
"cell_type": "code",
"execution_count": 89,
"id": "3ccc645c",
"metadata": {
"id": "3ccc645c",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 0
},
"outputId": "9c4ca954-6d92-42a6-814a-2cd76a01512b"
},
"outputs": [
{
"output_type": "stream",
"name": "stderr",
"text": [
"[nltk_data] Downloading package stopwords to /root/nltk_data...\n",
"[nltk_data] Package stopwords is already up-to-date!\n",
"[nltk_data] Downloading package punkt to /root/nltk_data...\n",
"[nltk_data] Package punkt is already up-to-date!\n",
"[nltk_data] Downloading package averaged_perceptron_tagger to\n",
"[nltk_data] /root/nltk_data...\n",
"[nltk_data] Package averaged_perceptron_tagger is already up-to-\n",
"[nltk_data] date!\n"
]
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"True"
]
},
"metadata": {},
"execution_count": 89
}
],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import nltk\n",
"\n",
"nltk.download('stopwords')\n",
"nltk.download('punkt')\n",
"nltk.download('averaged_perceptron_tagger')\n"
]
},
{
"cell_type": "code",
"execution_count": 90,
"id": "88767fe4",
"metadata": {
"scrolled": true,
"colab": {
"base_uri": "https://localhost:8080/",
"height": 206
},
"id": "88767fe4",
"outputId": "1dccaab5-98a3-44c4-e740-418a76204b41"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" title \\\n",
"0 Five Stars \n",
"1 Did The Job \n",
"2 Great product \n",
"3 Leaking Acid EVERYEWHERE!! \n",
"4 One Star \n",
"\n",
" text rating \n",
"0 good as any name brand 5 \n",
"1 Ordered on accident when I had searched for RE... 3 \n",
"2 I was looking for something to read on and thi... 5 \n",
"3 After 2nd recharge and use all but 3 are leaki... 1 \n",
"4 They fail earlier than brand names. I assumed ... 1 "
],
"text/html": [
"\n",
" <div id=\"df-6253f37d-022d-4eb1-862b-3aba6e730e22\" class=\"colab-df-container\">\n",
" <div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>title</th>\n",
" <th>text</th>\n",
" <th>rating</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Five Stars</td>\n",
" <td>good as any name brand</td>\n",
" <td>5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Did The Job</td>\n",
" <td>Ordered on accident when I had searched for RE...</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Great product</td>\n",
" <td>I was looking for something to read on and thi...</td>\n",
" <td>5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Leaking Acid EVERYEWHERE!!</td>\n",
" <td>After 2nd recharge and use all but 3 are leaki...</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>One Star</td>\n",
" <td>They fail earlier than brand names. I assumed ...</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>\n",
" </div>\n"
]
},
"metadata": {},
"execution_count": 90
}
],
"source": [
"data = pd.read_csv(\"amazon_rating.csv\")\n",
"data.head()"
]
},
{
"cell_type": "markdown",
"id": "aadd4f88",
"metadata": {
"id": "aadd4f88"
},
"source": [
"<a name='1.1.2'></a>\n",
"#### 1.1.2 Merge the title and text columns into a single column (1 point)\n",
"\n",
"To simplify the rest of the lab, we merge these two columns so that there is only one text to consider during vectorization.\n",
"\n",
"To preserve the integrity of the texts, join them with a space. For example, a review with the title \"Five Stars\" and the comment \"good as any name brand\" will have the final text \"Five Stars good as any name brand\".\n",
"\n",
"Store the result in the \"text\" column and delete the \"title\" column."
]
},
{
"cell_type": "code",
"execution_count": 91,
"id": "b5b627c4",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 424
},
"id": "b5b627c4",
"outputId": "ad70ac56-d1d6-416e-b64a-5850c77451d0"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" text rating\n",
"0 Five Stars good as any name brand 5\n",
"1 Did The Job Ordered on accident when I had sea... 3\n",
"2 Great product I was looking for something to r... 5\n",
"3 Leaking Acid EVERYEWHERE!! After 2nd recharge ... 1\n",
"4 One Star They fail earlier than brand names. I... 1\n",
"... ... ...\n",
"2788 Three Stars Weird but some didn't last long as... 3\n",
"2789 Good for kids but SLOW A good starter tablet, ... 3\n",
"2790 good tablet to star is a God tablet but the ca... 3\n",
"2791 Just decent tablet Not many apps. The first on... 3\n",
"2792 One Star don't last long. Replace batteries in... 1\n",
"\n",
"[2793 rows x 2 columns]"
],
"text/html": [
"\n",
" <div id=\"df-637bc0dd-5020-47f9-9b7e-7d78445d5021\" class=\"colab-df-container\">\n",
" <div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>text</th>\n",
" <th>rating</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Five Stars good as any name brand</td>\n",
" <td>5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Did The Job Ordered on accident when I had sea...</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Great product I was looking for something to r...</td>\n",
" <td>5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Leaking Acid EVERYEWHERE!! After 2nd recharge ...</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>One Star They fail earlier than brand names. I...</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2788</th>\n",
" <td>Three Stars Weird but some didn't last long as...</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2789</th>\n",
" <td>Good for kids but SLOW A good starter tablet, ...</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2790</th>\n",
" <td>good tablet to star is a God tablet but the ca...</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2791</th>\n",
" <td>Just decent tablet Not many apps. The first on...</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2792</th>\n",
" <td>One Star don't last long. Replace batteries in...</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>2793 rows × 2 columns</p>\n",
"</div>\n",
" </div>\n"
]
},
"metadata": {},
"execution_count": 91
}
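],
"source": [
"# Join title and comment with a single space, as the instructions require.\n",
"data[\"text\"] = data[\"title\"] + \" \" + data[\"text\"]\n",
"# drop() returns a new DataFrame, so reassign it to actually remove the column.\n",
"data = data.drop(columns=\"title\")\n",
"data"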
]
},
{
"cell_type": "markdown",
"id": "6fbcbf38",
"metadata": {
"id": "6fbcbf38"
},
"source": [
"<a name='1.2'></a>\n",
"### 1.2 Data preprocessing (4 points)\n",
"\n",
"Using the nltk library, implement the following function, which:\n",
"\n",
"- Converts the text to lowercase.\n",
"- Removes punctuation characters.\n",
"- Splits the input sequence into a list of tokens (tokenization).\n",
"- Removes stopwords.\n",
"- Applies stemming.\n",
"- Returns the list of tokens of the sequence.\n",
" "
]
},
{
"cell_type": "code",
"execution_count": 92,
"id": "d51e35a2",
"metadata": {
"id": "d51e35a2",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 0
},
"outputId": "30263c74-f47f-40a1-ab85-7aad0e8e6c7b"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"['exampl', 'sentenc', 'text', 'preprocess']\n"
]
}
],
"source": [
"import re\n",
"import string\n",
"\n",
"import nltk\n",
"from nltk.corpus import stopwords\n",
"from nltk.tokenize import word_tokenize\n",
"\n",
"stop_words = set(stopwords.words('english'))\n",
"stemmer = nltk.stem.porter.PorterStemmer()\n",
"# Escape the punctuation so every character is taken literally inside the character class.\n",
"punctuation_pattern = re.compile(f\"[{re.escape(string.punctuation)}]\")\n",
"\n",
"def preprocess(sentence):\n",
"    \"\"\"\n",
"    Turn a string into a list of tokens.\n",
"\n",
"    Preprocessing steps:\n",
"    1. Lowercase the text\n",
"    2. Remove punctuation characters\n",
"    3. Split the string into a list of tokens (tokenization)\n",
"    4. Remove stopwords\n",
"    5. Stemming\n",
"\n",
"    :param sentence: a string\n",
"    :return: the list of tokens\n",
"    \"\"\"\n",
"\n",
"    # 1. Lowercase the text\n",
"    sentence = sentence.lower()\n",
"\n",
"    # 2. Remove punctuation characters\n",
"    sentence = punctuation_pattern.sub('', sentence)\n",
"\n",
"    # 3. Split the string into a list of tokens (tokenization)\n",
"    tokens = word_tokenize(sentence)\n",
"\n",
"    # 4. Remove stopwords\n",
"    tokens = [token for token in tokens if token not in stop_words]\n",
"\n",
"    # 5. Stemming\n",
"    tokens = [stemmer.stem(token) for token in tokens]\n",
"\n",
"    return tokens\n",
"\n",
"# Example usage:\n",
"input_sentence = \"This is an example sentence for text preprocessing.\"\n",
"preprocessed_tokens = preprocess(input_sentence)\n",
"print(preprocessed_tokens)\n"
]
},
{
"cell_type": "code",
"execution_count": 93,
"id": "ea36a64b",
"metadata": {
"id": "ea36a64b"
},
"outputs": [],
"source": [
"\"\"\"\n",
"DO NOT MODIFY\n",
"\n",
"The following code applies your function to every example. It also keeps the original version of each text for later analysis.\n",
"\"\"\"\n",
"\n",
"data[\"text_original\"] = data[\"text\"]\n",
"data[\"text\"] = data[\"text\"].apply(preprocess)\n"
]
},
{
"cell_type": "markdown",
"id": "a1c0e19b",
"metadata": {
"id": "a1c0e19b"
},
"source": [
"<a name='1.3'></a>\n",
"### 1.3 Data visualization (15 points)\n",
"\n",
"**Use the matplotlib library for the charts.** You may use any class from the Python standard library, for example collections.Counter, which is useful for displaying the tokens."
]
},
{
"cell_type": "markdown",
"id": "731d61c8",
"metadata": {
"id": "731d61c8"
},
"source": [
"The \"rating\" column contains the number of stars associated with a user's review. The number of stars ranges from 1 to 5.\n",
"\n",
"To simplify the classification task, we removed the reviews with 2 and 4 stars from the dataset. This leaves three categories of reviews, namely those with 1, 3, or 5 stars.\n",
"\n",
"Plot in a chart:\n",
"\n",
"- The number of examples in the dataset per category.\n",
"- The average number of tokens per example by category.\n"
]
},
{
"cell_type": "code",
"source": [
"data.head()"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 206
},
"id": "mWKnumERjFMx",
"outputId": "b7fc89bd-f7ff-4682-bab6-d1b5253e64a6"
},
"id": "mWKnumERjFMx",
"execution_count": 94,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" title \\\n",
"0 Five Stars \n",
"1 Did The Job \n",
"2 Great product \n",
"3 Leaking Acid EVERYEWHERE!! \n",
"4 One Star \n",
"\n",
" text rating \\\n",
"0 [five, star, good, name, brand] 5 \n",
"1 [job, order, accid, search, recharg, batteri, ... 3 \n",
"2 [great, product, look, someth, read, fit, bill... 5 \n",
"3 [leak, acid, everyewher, 2nd, recharg, use, 3,... 1 \n",
"4 [one, star, fail, earlier, brand, name, assum,... 1 \n",
"\n",
" text_original \n",
"0 Five Stars good as any name brand \n",
"1 Did The Job Ordered on accident when I had sea... \n",
"2 Great product I was looking for something to r... \n",
"3 Leaking Acid EVERYEWHERE!! After 2nd recharge ... \n",
"4 One Star They fail earlier than brand names. I... "
],
"text/html": [
"\n",
" <div id=\"df-359311c0-aed8-489e-8d5c-78cf6fc9cf3e\" class=\"colab-df-container\">\n",
" <div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>title</th>\n",
" <th>text</th>\n",
" <th>rating</th>\n",
" <th>text_original</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Five Stars</td>\n",
" <td>[five, star, good, name, brand]</td>\n",
" <td>5</td>\n",
" <td>Five Stars good as any name brand</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Did The Job</td>\n",
" <td>[job, order, accid, search, recharg, batteri, ...</td>\n",
" <td>3</td>\n",
" <td>Did The Job Ordered on accident when I had sea...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Great product</td>\n",
" <td>[great, product, look, someth, read, fit, bill...</td>\n",
" <td>5</td>\n",
" <td>Great product I was looking for something to r...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Leaking Acid EVERYEWHERE!!</td>\n",
" <td>[leak, acid, everyewher, 2nd, recharg, use, 3,...</td>\n",
" <td>1</td>\n",
" <td>Leaking Acid EVERYEWHERE!! After 2nd recharge ...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>One Star</td>\n",
" <td>[one, star, fail, earlier, brand, name, assum,...</td>\n",
" <td>1</td>\n",
" <td>One Star They fail earlier than brand names. I...</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>\n",
" <div class=\"colab-df-buttons\">\n",
"\n",
" <div class=\"colab-df-container\">\n",
" <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-359311c0-aed8-489e-8d5c-78cf6fc9cf3e')\"\n",
" title=\"Convert this dataframe to an interactive table.\"\n",
" style=\"display:none;\">\n",
"\n",
" <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
" <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
" </svg>\n",
" </button>\n",
"\n",
" <style>\n",
" .colab-df-container {\n",
" display:flex;\n",
" gap: 12px;\n",
" }\n",
"\n",
" .colab-df-convert {\n",
" background-color: #E8F0FE;\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: #1967D2;\n",
" height: 32px;\n",
" padding: 0 0 0 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-convert:hover {\n",
" background-color: #E2EBFA;\n",
" box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: #174EA6;\n",
" }\n",
"\n",
" .colab-df-buttons div {\n",
" margin-bottom: 4px;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert {\n",
" background-color: #3B4455;\n",
" fill: #D2E3FC;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert:hover {\n",
" background-color: #434B5C;\n",
" box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
" filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
" fill: #FFFFFF;\n",
" }\n",
" </style>\n",
"\n",
" <script>\n",
" const buttonEl =\n",
" document.querySelector('#df-359311c0-aed8-489e-8d5c-78cf6fc9cf3e button.colab-df-convert');\n",
" buttonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
"\n",
" async function convertToInteractive(key) {\n",
" const element = document.querySelector('#df-359311c0-aed8-489e-8d5c-78cf6fc9cf3e');\n",
" const dataTable =\n",
" await google.colab.kernel.invokeFunction('convertToInteractive',\n",
" [key], {});\n",
" if (!dataTable) return;\n",
"\n",
" const docLinkHtml = 'Like what you see? Visit the ' +\n",
" '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
" + ' to learn more about interactive tables.';\n",
" element.innerHTML = '';\n",
" dataTable['output_type'] = 'display_data';\n",
" await google.colab.output.renderOutput(dataTable, element);\n",
" const docLink = document.createElement('div');\n",
" docLink.innerHTML = docLinkHtml;\n",
" element.appendChild(docLink);\n",
" }\n",
" </script>\n",
" </div>\n",
"\n",
"\n",
"<div id=\"df-f496cfd0-5df0-46ef-82f6-7a0f9b2a66c4\">\n",
" <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-f496cfd0-5df0-46ef-82f6-7a0f9b2a66c4')\"\n",
" title=\"Suggest charts.\"\n",
" style=\"display:none;\">\n",
"\n",
"<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
" width=\"24px\">\n",
" <g>\n",
" <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
" </g>\n",
"</svg>\n",
" </button>\n",
"\n",
"<style>\n",
" .colab-df-quickchart {\n",
" --bg-color: #E8F0FE;\n",
" --fill-color: #1967D2;\n",
" --hover-bg-color: #E2EBFA;\n",
" --hover-fill-color: #174EA6;\n",
" --disabled-fill-color: #AAA;\n",
" --disabled-bg-color: #DDD;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-quickchart {\n",
" --bg-color: #3B4455;\n",
" --fill-color: #D2E3FC;\n",
" --hover-bg-color: #434B5C;\n",
" --hover-fill-color: #FFFFFF;\n",
" --disabled-bg-color: #3B4455;\n",
" --disabled-fill-color: #666;\n",
" }\n",
"\n",
" .colab-df-quickchart {\n",
" background-color: var(--bg-color);\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: var(--fill-color);\n",
" height: 32px;\n",
" padding: 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-quickchart:hover {\n",
" background-color: var(--hover-bg-color);\n",
" box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: var(--button-hover-fill-color);\n",
" }\n",
"\n",
" .colab-df-quickchart-complete:disabled,\n",
" .colab-df-quickchart-complete:disabled:hover {\n",
" background-color: var(--disabled-bg-color);\n",
" fill: var(--disabled-fill-color);\n",
" box-shadow: none;\n",
" }\n",
"\n",
" .colab-df-spinner {\n",
" border: 2px solid var(--fill-color);\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" animation:\n",
" spin 1s steps(1) infinite;\n",
" }\n",
"\n",
" @keyframes spin {\n",
" 0% {\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" border-left-color: var(--fill-color);\n",
" }\n",
" 20% {\n",
" border-color: transparent;\n",
" border-left-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" }\n",
" 30% {\n",
" border-color: transparent;\n",
" border-left-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" border-right-color: var(--fill-color);\n",
" }\n",
" 40% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" }\n",
" 60% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" }\n",
" 80% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" border-bottom-color: var(--fill-color);\n",
" }\n",
" 90% {\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" }\n",
" }\n",
"</style>\n",
"\n",
" <script>\n",
" async function quickchart(key) {\n",
" const quickchartButtonEl =\n",
" document.querySelector('#' + key + ' button');\n",
" quickchartButtonEl.disabled = true; // To prevent multiple clicks.\n",
" quickchartButtonEl.classList.add('colab-df-spinner');\n",
" try {\n",
" const charts = await google.colab.kernel.invokeFunction(\n",
" 'suggestCharts', [key], {});\n",
" } catch (error) {\n",
" console.error('Error during call to suggestCharts:', error);\n",
" }\n",
" quickchartButtonEl.classList.remove('colab-df-spinner');\n",
" quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
" }\n",
" (() => {\n",
" let quickchartButtonEl =\n",
" document.querySelector('#df-f496cfd0-5df0-46ef-82f6-7a0f9b2a66c4 button');\n",
" quickchartButtonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
" })();\n",
" </script>\n",
"</div>\n",
" </div>\n",
" </div>\n"
]
},
"metadata": {},
"execution_count": 94
}
]
},
{
"cell_type": "markdown",
"id": "6e870b33",
"metadata": {
"id": "6e870b33"
},
"source": [
"<a name='1.3.1'></a>\n",
"#### 1.3.1 Afficher dans un graphique le nombre d'exemples présents dans le jeu de données pour chaque catégorie (3 points)"
]
},
{
"cell_type": "code",
"execution_count": 95,
"id": "239b4dfa",
"metadata": {
"scrolled": true,
"id": "239b4dfa",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 715
},
"outputId": "bf4c095a-97b2-46e2-b0cd-811155643a15"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Nombre d'exemples par catégorie :\n",
"3 965\n",
"5 933\n",
"1 895\n",
"Name: rating, dtype: int64\n",
"\n",
"Quantité moyenne de jetons par exemple selon la catégorie :\n",
"rating\n",
"1 23.383240\n",
"3 18.403109\n",
"5 14.825295\n",
"Name: token_count, dtype: float64\n"
]
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 1000x500 with 1 Axes>"
],
"image/png": "\n"
},
"metadata": {}
}
],
"source": [
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"\n",
"# Supposons que vous avez un DataFrame appelé 'data' avec une colonne 'rating'\n",
"# et une colonne 'text' contenant le texte de chaque exemple.\n",
"\n",
"# Compter le nombre d'exemples par catégorie (rating)\n",
"count_by_category = data['rating'].value_counts()\n",
"\n",
"# Calculer la quantité moyenne de jetons par exemple selon la catégorie\n",
"data['token_count'] = data['text'].apply(lambda x: len(x)) # Compte des jetons dans chaque exemple\n",
"average_tokens_by_category = data.groupby('rating')['token_count'].mean()\n",
"\n",
"# Afficher les résultats\n",
"print(\"Nombre d'exemples par catégorie :\")\n",
"print(count_by_category)\n",
"print(\"\\nQuantité moyenne de jetons par exemple selon la catégorie :\")\n",
"print(average_tokens_by_category)\n",
"\n",
"\n",
"plt.figure(figsize=(10, 5))\n",
"count_by_category.plot(kind='bar', color='skyblue')\n",
"plt.title('Nombre d\\'exemples par catégorie')\n",
"plt.xlabel('Catégorie (Rating)')\n",
"plt.ylabel('Nombre d\\'exemples')\n",
"plt.tight_layout()"
]
},
{
"cell_type": "markdown",
"id": "c244aa72",
"metadata": {
"id": "c244aa72"
},
"source": [
"<a name='1.3.2'></a>\n",
"#### 1.3.2 Afficher dans un graphique le nombre moyen de jetons dans les exemples de chaque catégorie (4 points)"
]
},
{
"cell_type": "code",
"execution_count": 96,
"id": "663f798a",
"metadata": {
"scrolled": true,
"id": "663f798a",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 507
},
"outputId": "952f0a64-7525-4770-efa6-9bb0df11581d"
},
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 1000x500 with 1 Axes>"
],
"image/png": "\n"
},
"metadata": {}
}
],
"source": [
"plt.figure(figsize=(10, 5))\n",
"average_tokens_by_category.plot(kind='bar', color='skyblue')\n",
"plt.title('Nombre d\\'exemples moyen par catégorie')\n",
"plt.xlabel('Catégorie (Rating)')\n",
"plt.ylabel('Nombre d\\'exemples moyen')\n",
"plt.tight_layout()"
]
},
{
"cell_type": "markdown",
"id": "e6650c0f",
"metadata": {
"id": "e6650c0f"
},
"source": [
"<a name='1.3.3'></a>\n",
"#### 1.3.3 Afficher en texte les top 10 des jetons les plus fréquents par catégorie (4 points)"
]
},
{
"cell_type": "markdown",
"id": "86ff8a3f",
"metadata": {
"id": "86ff8a3f"
},
"source": [
"\n",
"Affichez en texte les 10 jetons les plus fréquents selon la catégorie.\n"
]
},
{
"cell_type": "code",
"execution_count": 97,
"id": "0befd5a8",
"metadata": {
"id": "0befd5a8",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 0
},
"outputId": "eb2cb201-e742-4703-b4da-39ec6dcf5117"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"{5: ['great',\n",
" 'tablet',\n",
" 'batteri',\n",
" 'love',\n",
" 'good',\n",
" 'price',\n",
" 'use',\n",
" 'star',\n",
" 'five',\n",
" 'work'],\n",
" 3: ['batteri',\n",
" 'tablet',\n",
" 'good',\n",
" 'use',\n",
" 'last',\n",
" 'great',\n",
" 'price',\n",
" 'three',\n",
" 'star',\n",
" 'long'],\n",
" 1: ['batteri',\n",
" 'last',\n",
" 'one',\n",
" 'use',\n",
" 'amazon',\n",
" 'work',\n",
" 'buy',\n",
" 'dont',\n",
" 'star',\n",
" 'purchas']}"
]
},
"metadata": {},
"execution_count": 97
}
],
"source": [
"from collections import Counter\n",
"\n",
"def most_common(df, col):\n",
" categories = df[\"rating\"].unique()\n",
" top_tokens = {}\n",
"\n",
" for rating in categories:\n",
" # Créer un Counter pour chaque catégorie\n",
" counter = Counter()\n",
" for tokens in data[data['rating'] == rating][col]:\n",
" counter.update(tokens)\n",
"\n",
" top_tokens[rating] = [t[0] for t in counter.most_common(10)]\n",
"\n",
" return top_tokens\n",
"\n",
"most_common(data, \"text\")"
]
},
{
"cell_type": "markdown",
"id": "bafb388a",
"metadata": {
"id": "bafb388a"
},
"source": [
"print(counters[0])\n",
"<a name='1.3.4'></a>\n",
"#### 1.3.4 Afficher en texte les top 10 des adjectifs les plus fréquents selon la catégorie (4 points)\n",
"\n",
"Pour cet exercice, vous devrez utiliser la fonction [nlt.pos_tag](https://www.nltk.org/book/ch05.html) et retenir les jetons identifiés comme JJ.\n",
"\n",
"Pour obtenir de bons résultats, le tagger [nltk.pos_tag](https://www.nltk.org/book/ch05.html) doit être exécuté sur le texte original, incluant les stopwords.\n",
"Vous devrez donc partir des évaluations originales. Pour vous simplifier la tâche, utilisez\n",
"le tokenizer *word_tokenize* provenant de nltk.\n",
"\n",
"**Les adjectifs sont les jetons identifiés comme JJ.**"
]
},
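{
"cell_type": "markdown",
"id": "ex-adj-filter",
"metadata": {
"id": "ex-adj-filter"
},
"source": [
"As a rough sketch of the JJ filter described above, the snippet below counts adjectives from (token, tag) pairs. The pairs are hand-written for illustration and are not real `nltk.pos_tag` output; `top_adjectives` and its `lowercase` option are hypothetical names that are not part of the assignment.\n",
"\n",
"```python\n",
"from collections import Counter\n",
"\n",
"# Hand-written (token, tag) pairs standing in for nltk.pos_tag output (assumption).\n",
"tagged = [\n",
"    ('Good', 'JJ'), ('tablet', 'NN'), ('good', 'JJ'),\n",
"    ('price', 'NN'), ('slow', 'JJ'), ('works', 'VBZ'),\n",
"]\n",
"\n",
"def top_adjectives(tagged_tokens, n=10, lowercase=True):\n",
"    # Keep only tokens tagged exactly 'JJ', optionally merging case variants.\n",
"    adjectives = [\n",
"        token.lower() if lowercase else token\n",
"        for token, tag in tagged_tokens\n",
"        if tag == 'JJ'\n",
"    ]\n",
"    return [token for token, _ in Counter(adjectives).most_common(n)]\n",
"\n",
"print(top_adjectives(tagged))\n",
"```\n",
"\n",
"Lowercasing is one possible design choice: it merges case variants such as 'Good' and 'good' into a single entry before counting."
]
},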
{
"cell_type": "code",
"execution_count": 98,
"id": "bc15ac5f",
"metadata": {
"id": "bc15ac5f",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 0
},
"outputId": "98ee38b3-b552-44b1-a706-1012e495694f"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"{5: ['great',\n",
" 'good',\n",
" 'old',\n",
" 'easy',\n",
" 'Good',\n",
" 'other',\n",
" 'last',\n",
" 'Great',\n",
" 'long',\n",
" 'Excellent'],\n",
" 3: ['good',\n",
" 'last',\n",
" 'great',\n",
" 'Good',\n",
" 'other',\n",
" 'old',\n",
" 'little',\n",
" 'slow',\n",
" 'ok',\n",
" 'long'],\n",
" 1: ['last',\n",
" 'good',\n",
" 'dead',\n",
" 'other',\n",
" 'few',\n",
" 'same',\n",
" 'bad',\n",
" 'new',\n",
" 'first',\n",
" 'long']}"
]
},
"metadata": {},
"execution_count": 98
}
],
"source": [
"from nltk.tokenize import word_tokenize\n",
"from nltk import pos_tag\n",
"\n",
"def select_adj(text):\n",
" tokens = word_tokenize(text)\n",
" pos_tags = pos_tag(tokens)\n",
" return [token for token, pos in pos_tags if pos == 'JJ']\n",
"\n",
"data['adj'] = data['text_original'].apply(lambda x: select_adj(x))\n",
"\n",
"data.head()\n",
"\n",
"most_common(data, \"adj\")"
]
},
{
"cell_type": "markdown",
"id": "86de4b1e",
"metadata": {
"id": "86de4b1e"
},
"source": [
"<a name='1.4'></a>\n",
"### 1.4 Diviser les données en ensembles d'entraînement et de test (1 point)\n",
"\n",
"À l'aide de la fonction [train_test_split](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html) de SKlearn, séparez les données en ensembles d'entraînement (67% des données) et de test (33% des données). Gardez les deux ensembles dans 2 variables."
]
},
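{
"cell_type": "markdown",
"id": "ex-train-test-split",
"metadata": {
"id": "ex-train-test-split"
},
"source": [
"A minimal sketch of the 67/33 split described above, on toy data rather than the review dataset. The `random_state` and `stratify` arguments are optional additions that the exercise does not require, and the names `texts` and `labels` are hypothetical.\n",
"\n",
"```python\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"# Toy data (assumption): 300 examples, 100 for each of the ratings 1, 3 and 5.\n",
"labels = [1, 3, 5] * 100\n",
"texts = ['example %d' % i for i in range(len(labels))]\n",
"\n",
"texts_train, texts_test, labels_train, labels_test = train_test_split(\n",
"    texts,\n",
"    labels,\n",
"    test_size=0.33,    # 33% of the examples go to the test set\n",
"    random_state=0,    # fixed seed so the split is reproducible\n",
"    stratify=labels,   # keep the rating proportions similar in both sets\n",
")\n",
"\n",
"print(len(texts_train), len(texts_test))\n",
"```\n",
"\n",
"Fixing the seed makes reruns comparable, and stratifying may matter more when the categories are less balanced than they are here."
]
},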
{
"cell_type": "code",
"execution_count": 99,
"id": "0e5a58bc",
"metadata": {
"id": "0e5a58bc",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 206
},
"outputId": "b00df23a-feff-42cf-dd88-fb3e47cfaf4a"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" title \\\n",
"374 Dead after 3 days \n",
"23 Good First Tablet \n",
"2057 It fits my need perfectly \n",
"336 LEAK ! \n",
"221 easy set up and user friendly \n",
"\n",
" text rating \\\n",
"374 [dead, 3, day, put, 3, day, ago, alreadi, dead... 1 \n",
"23 [good, first, tablet, purchas, sinc, bought, g... 3 \n",
"2057 [fit, need, perfectli, origin, kindl, fire, lo... 5 \n",
"336 [leak, heck, seriou, issu, batteri, put, amazo... 1 \n",
"221 [easi, set, user, friendli, suggest, sale, ass... 5 \n",
"\n",
" text_original token_count \\\n",
"374 Dead after 3 days Just put them in 3 days ago ... 14 \n",
"23 Good First Tablet I purchased this since I bou... 29 \n",
"2057 It fits my need perfectly My original Kindle F... 57 \n",
"336 LEAK ! WHAT THE HECK! I have a SERIOUS issue w... 54 \n",
"221 easy set up and user friendly suggested by the... 11 \n",
"\n",
" adj \n",
"374 [Dead, dead, next] \n",
"23 [Good, slow] \n",
"2057 [original, much, other, old, locked, few, Menial] \n",
"336 [few, bad, few, several, next, right] \n",
"221 [glad] "
],
"text/html": [
"\n",
" <div id=\"df-f6e0d77a-4dff-4046-b43b-ab63cba9b974\" class=\"colab-df-container\">\n",
" <div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>title</th>\n",
" <th>text</th>\n",
" <th>rating</th>\n",
" <th>text_original</th>\n",
" <th>token_count</th>\n",
" <th>adj</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>374</th>\n",
" <td>Dead after 3 days</td>\n",
" <td>[dead, 3, day, put, 3, day, ago, alreadi, dead...</td>\n",
" <td>1</td>\n",
" <td>Dead after 3 days Just put them in 3 days ago ...</td>\n",
" <td>14</td>\n",
" <td>[Dead, dead, next]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23</th>\n",
" <td>Good First Tablet</td>\n",
" <td>[good, first, tablet, purchas, sinc, bought, g...</td>\n",
" <td>3</td>\n",
" <td>Good First Tablet I purchased this since I bou...</td>\n",
" <td>29</td>\n",
" <td>[Good, slow]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2057</th>\n",
" <td>It fits my need perfectly</td>\n",
" <td>[fit, need, perfectli, origin, kindl, fire, lo...</td>\n",
" <td>5</td>\n",
" <td>It fits my need perfectly My original Kindle F...</td>\n",
" <td>57</td>\n",
" <td>[original, much, other, old, locked, few, Menial]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>336</th>\n",
" <td>LEAK !</td>\n",
" <td>[leak, heck, seriou, issu, batteri, put, amazo...</td>\n",
" <td>1</td>\n",
" <td>LEAK ! WHAT THE HECK! I have a SERIOUS issue w...</td>\n",
" <td>54</td>\n",
" <td>[few, bad, few, several, next, right]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>221</th>\n",
" <td>easy set up and user friendly</td>\n",
" <td>[easi, set, user, friendli, suggest, sale, ass...</td>\n",
" <td>5</td>\n",
" <td>easy set up and user friendly suggested by the...</td>\n",
" <td>11</td>\n",
" <td>[glad]</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>\n",
" <div class=\"colab-df-buttons\">\n",
"\n",
" <div class=\"colab-df-container\">\n",
" <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-f6e0d77a-4dff-4046-b43b-ab63cba9b974')\"\n",
" title=\"Convert this dataframe to an interactive table.\"\n",
" style=\"display:none;\">\n",
"\n",
" <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
" <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
" </svg>\n",
" </button>\n",
"\n",
" <style>\n",
" .colab-df-container {\n",
" display:flex;\n",
" gap: 12px;\n",
" }\n",
"\n",
" .colab-df-convert {\n",
" background-color: #E8F0FE;\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: #1967D2;\n",
" height: 32px;\n",
" padding: 0 0 0 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-convert:hover {\n",
" background-color: #E2EBFA;\n",
" box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: #174EA6;\n",
" }\n",
"\n",
" .colab-df-buttons div {\n",
" margin-bottom: 4px;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert {\n",
" background-color: #3B4455;\n",
" fill: #D2E3FC;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert:hover {\n",
" background-color: #434B5C;\n",
" box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
" filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
" fill: #FFFFFF;\n",
" }\n",
" </style>\n",
"\n",
" <script>\n",
" const buttonEl =\n",
" document.querySelector('#df-f6e0d77a-4dff-4046-b43b-ab63cba9b974 button.colab-df-convert');\n",
" buttonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
"\n",
" async function convertToInteractive(key) {\n",
" const element = document.querySelector('#df-f6e0d77a-4dff-4046-b43b-ab63cba9b974');\n",
" const dataTable =\n",
" await google.colab.kernel.invokeFunction('convertToInteractive',\n",
" [key], {});\n",
" if (!dataTable) return;\n",
"\n",
" const docLinkHtml = 'Like what you see? Visit the ' +\n",
" '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
" + ' to learn more about interactive tables.';\n",
" element.innerHTML = '';\n",
" dataTable['output_type'] = 'display_data';\n",
" await google.colab.output.renderOutput(dataTable, element);\n",
" const docLink = document.createElement('div');\n",
" docLink.innerHTML = docLinkHtml;\n",
" element.appendChild(docLink);\n",
" }\n",
" </script>\n",
" </div>\n",
"\n",
"\n",
"<div id=\"df-0a8cc229-e7e2-4ebc-95ae-116e8f0d67df\">\n",
" <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-0a8cc229-e7e2-4ebc-95ae-116e8f0d67df')\"\n",
" title=\"Suggest charts.\"\n",
" style=\"display:none;\">\n",
"\n",
"<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
" width=\"24px\">\n",
" <g>\n",
" <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
" </g>\n",
"</svg>\n",
" </button>\n",
"\n",
"<style>\n",
" .colab-df-quickchart {\n",
" --bg-color: #E8F0FE;\n",
" --fill-color: #1967D2;\n",
" --hover-bg-color: #E2EBFA;\n",
" --hover-fill-color: #174EA6;\n",
" --disabled-fill-color: #AAA;\n",
" --disabled-bg-color: #DDD;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-quickchart {\n",
" --bg-color: #3B4455;\n",
" --fill-color: #D2E3FC;\n",
" --hover-bg-color: #434B5C;\n",
" --hover-fill-color: #FFFFFF;\n",
" --disabled-bg-color: #3B4455;\n",
" --disabled-fill-color: #666;\n",
" }\n",
"\n",
" .colab-df-quickchart {\n",
" background-color: var(--bg-color);\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: var(--fill-color);\n",
" height: 32px;\n",
" padding: 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-quickchart:hover {\n",
" background-color: var(--hover-bg-color);\n",
" box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: var(--button-hover-fill-color);\n",
" }\n",
"\n",
" .colab-df-quickchart-complete:disabled,\n",
" .colab-df-quickchart-complete:disabled:hover {\n",
" background-color: var(--disabled-bg-color);\n",
" fill: var(--disabled-fill-color);\n",
" box-shadow: none;\n",
" }\n",
"\n",
" .colab-df-spinner {\n",
" border: 2px solid var(--fill-color);\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" animation:\n",
" spin 1s steps(1) infinite;\n",
" }\n",
"\n",
" @keyframes spin {\n",
" 0% {\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" border-left-color: var(--fill-color);\n",
" }\n",
" 20% {\n",
" border-color: transparent;\n",
" border-left-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" }\n",
" 30% {\n",
" border-color: transparent;\n",
" border-left-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" border-right-color: var(--fill-color);\n",
" }\n",
" 40% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" }\n",
" 60% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" }\n",
" 80% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" border-bottom-color: var(--fill-color);\n",
" }\n",
" 90% {\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" }\n",
" }\n",
"</style>\n",
"\n",
" <script>\n",
" async function quickchart(key) {\n",
" const quickchartButtonEl =\n",
" document.querySelector('#' + key + ' button');\n",
" quickchartButtonEl.disabled = true; // To prevent multiple clicks.\n",
" quickchartButtonEl.classList.add('colab-df-spinner');\n",
" try {\n",
" const charts = await google.colab.kernel.invokeFunction(\n",
" 'suggestCharts', [key], {});\n",
" } catch (error) {\n",
" console.error('Error during call to suggestCharts:', error);\n",
" }\n",
" quickchartButtonEl.classList.remove('colab-df-spinner');\n",
" quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
" }\n",
" (() => {\n",
" let quickchartButtonEl =\n",
" document.querySelector('#df-0a8cc229-e7e2-4ebc-95ae-116e8f0d67df button');\n",
" quickchartButtonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
" })();\n",
" </script>\n",
"</div>\n",
" </div>\n",
" </div>\n"
]
},
"metadata": {},
"execution_count": 99
}
],
"source": [
"from sklearn.model_selection import train_test_split\n",
"\n",
"train, test = train_test_split(data, test_size=0.33)\n",
"train.head()"
]
},
{
"cell_type": "markdown",
"id": "5146eabe",
"metadata": {
"id": "5146eabe"
},
"source": [
"<a name='1.5'></a>\n",
"### 1.5 Construction du vocabulaire (4 points)\n",
"\n",
"Dans un modèle Bag-of-Words (BoW), un vocabulaire est prédéterminé à partir de l'ensemble d'entraînement. Seuls les mots faisant partie de ce vocabulaire seront considérés pour la suite.\n",
"\n",
"Complétez la fonction **build_voc** qui retourne une liste de jetons qui sont présents au moins n fois (threshold passé en paramètre) dans la liste d'exemples (également passée en paramètre). Vous pouvez utiliser la classe Counter.\n",
"\n",
"Ensuite, appelez cette fonction pour construire votre vocabulaire."
]
},
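{
"cell_type": "markdown",
"id": "ex-build-voc",
"metadata": {
"id": "ex-build-voc"
},
"source": [
"One possible shape for **build_voc**, sketched on a toy list of tokenized examples rather than the training set. The parameter names `examples` and `threshold` are assumptions, since the exercise only fixes the function name, and sorting the result is an extra choice made here so that the vocabulary order is deterministic.\n",
"\n",
"```python\n",
"from collections import Counter\n",
"\n",
"def build_voc(examples, threshold):\n",
"    # Count every token across all examples (each example is a list of tokens).\n",
"    counts = Counter(token for tokens in examples for token in tokens)\n",
"    # Keep the tokens seen at least `threshold` times, in alphabetical order.\n",
"    return sorted(token for token, count in counts.items() if count >= threshold)\n",
"\n",
"# Toy tokenized examples (assumption), standing in for the training set.\n",
"toy_examples = [\n",
"    ['batteri', 'last', 'long'],\n",
"    ['batteri', 'leak'],\n",
"    ['great', 'batteri', 'last'],\n",
"]\n",
"\n",
"print(build_voc(toy_examples, threshold=2))\n",
"```\n",
"\n",
"Building the vocabulary from the training set only keeps information from the test set out of the features."
]
},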
{
"cell_type": "code",
"execution_count": 100,
"id": "d0e901d6",
"metadata": {
"id": "d0e901d6",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 0
},
"outputId": "ac52f9ef-2212-4317-8332-f23292b230c3"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Counter({'batteri': 1330, 'great': 608, 'tablet': 594, 'use': 515, 'good': 499, 'last': 457, 'work': 406, 'amazon': 402, 'one': 388, 'star': 388, 'price': 356, 'buy': 293, 'get': 276, 'love': 253, 'bought': 252, 'dont': 252, 'like': 247, 'long': 235, 'product': 233, 'kid': 231, 'purchas': 210, 'time': 197, 'fire': 186, 'brand': 182, 'would': 175, 'app': 169, 'kindl': 168, 'year': 167, 'three': 165, 'read': 143, 'old': 143, 'charg': 142, 'go': 140, 'need': 139, 'devic': 135, 'day': 131, 'five': 131, 'qualiti': 128, 'duracel': 124, 'play': 122, 'better': 120, '2': 118, 'month': 115, 'tri': 113, 'well': 113, 'return': 113, 'week': 112, 'life': 112, 'money': 111, 'store': 109, 'easi': 109, 'game': 107, 'seem': 106, 'remot': 103, 'much': 102, 'ok': 98, 'im': 98, 'best': 98, 'realli': 97, 'first': 96, 'replac': 96, 'leak': 96, 'light': 96, 'screen': 95, 'two': 95, 'dead': 94, 'put': 94, 'got': 92, 'cant': 91, 'also': 90, 'book': 89, 'new': 89, 'back': 87, 'ive': 86, 'even': 86, 'didnt': 84, 'name': 83, '4': 82, 'ipad': 81, 'want': 81, 'thing': 80, 'lot': 80, 'never': 79, 'could': 79, 'ever': 79, 'aa': 79, 'nice': 79, 'slow': 78, 'far': 78, '8': 78, 'power': 78, 'bad': 75, 'gift': 74, 'die': 74, 'littl': 72, 'less': 72, '3': 71, 'set': 71, 'everi': 71, 'disappoint': 70, 'case': 69, 'control': 69, 'valu': 69, 'order': 69, 'make': 69, 'expect': 68, 'recommend': 67, 'mani': 66, 'differ': 66, 'pay': 65, 'doesnt': 65, 'toy': 64, 'cheap': 63, 'look': 63, 'fine': 63, 'worst': 63, 'take': 62, 'perfect': 62, 'hour': 62, 'item': 62, 'size': 60, 'christma': 59, 'still': 59, 'say': 59, 'sure': 58, 'box': 58, 'deal': 58, 'perform': 58, 'son': 58, 'download': 57, 'basic': 57, 'short': 57, 'wast': 56, 'energ': 56, 'review': 56, 'pack': 56, 'half': 55, 'run': 55, 'googl': 54, 'daughter': 53, 'know': 52, 'longer': 51, 'fast': 51, 'help': 51, 'issu': 50, 'think': 50, 'open': 50, 'aaa': 50, 'turn': 50, 'wont': 49, 'way': 49, 'hold': 49, '5': 48, 'thought': 48, 'chang': 48, 'start': 47, 
'excel': 47, 'tv': 46, 'camera': 46, 'recharg': 46, 'packag': 45, 'someth': 45, 'hd': 45, 'sinc': 44, 'problem': 44, 'limit': 44, 'worth': 44, 'quickli': 43, 'movi': 43, 'low': 43, 'find': 42, 'went': 42, 'come': 41, 'terribl': 41, 'right': 40, 'children': 40, 'internet': 40, 'give': 40, 'awesom': 40, 'featur': 40, '1': 39, 'without': 39, 'android': 39, 'enough': 39, 'receiv': 39, 'though': 39, 'compar': 38, 'big': 38, 'reason': 38, 'decent': 38, 'anoth': 38, 'alreadi': 37, 'abl': 37, '6': 37, 'watch': 37, 'stop': 37, 'within': 36, 'hand': 36, 'see': 36, 'enjoy': 36, 'keep': 36, 'save': 36, 'yet': 36, 'video': 36, 'mayb': 36, 'poor': 36, 'came': 35, 'friendli': 35, 'took': 35, 'hard': 35, 'mous': 35, 'made': 35, 'parent': 34, 'test': 34, 'may': 34, 'cost': 34, 'comput': 34, 'free': 34, 'around': 34, 'found': 34, 'keyboard': 33, 'sever': 33, 'end': 33, 'prime': 33, 'happi': 33, 'second': 33, '10': 33, 'spend': 33, 'notic': 33, 'acid': 33, 'child': 32, '7': 32, 'noth': 32, 'user': 31, 'pretti': 31, 'load': 31, 'almost': 31, 'small': 31, 'that': 30, 'amaz': 30, 'everyth': 30, 'wouldnt': 30, 'stick': 30, 'show': 30, 'theyr': 29, 'coupl': 29, 'connect': 29, 'said': 29, 'anyth': 29, 'hous': 29, 'wish': 29, 'horribl': 29, 'bit': 29, 'advertis': 28, 'learn': 28, 'alkalin': 28, 'storag': 28, 'version': 28, 'extra': 28, 'feel': 27, 'minut': 27, 'away': 27, 'pictur': 27, 'other': 27, 'port': 27, 'memori': 27, 'wireless': 27, 'howev': 27, 'instal': 27, 'hope': 27, 'oper': 27, 'granddaught': 27, 'fail': 27, 'havent': 26, 'id': 26, 'ill': 26, 'usual': 25, 'offer': 25, '12': 25, 'phone': 25, 'junk': 25, 'warranti': 25, 'dollar': 25, 'high': 25, 'normal': 25, 'couldnt': 25, 'fit': 24, 'drain': 24, 'definit': 24, 'amazonbas': 24, 'wrong': 24, 'wifi': 24, 'black': 24, 'next': 23, 'charger': 23, 'experi': 23, 'protect': 23, 'card': 23, 'probabl': 23, 'top': 23, 'reader': 23, 'account': 23, 'none': 23, 'onlin': 23, 'resolut': 23, 'wasnt': 22, 'option': 22, 'arriv': 22, 'model': 22, 
'20': 22, 'appl': 22, 'batch': 22, 'sound': 22, 'actual': 22, 'full': 22, 'ask': 22, 'alway': 22, 'absolut': 22, 'four': 22, 'ad': 22, 'electron': 22, 'candl': 22, 'sale': 21, 'speed': 21, 'wonder': 21, 'impress': 21, 'beat': 21, 'expens': 21, 'figur': 21, 'defect': 21, 'clock': 21, 'access': 21, 'requir': 21, 'etc': 21, 'super': 21, 'includ': 20, 'flashlight': 20, 'mine': 20, 'thank': 20, 'updat': 20, 'cheaper': 20, 'highli': 20, 'side': 20, 'isnt': 20, 'laptop': 20, 'gb': 20, 'faster': 20, 'ruin': 20, 'ago': 19, 'beginn': 19, 'night': 19, 'happen': 19, 'gener': 19, 'page': 19, 'gave': 18, 'overal': 18, 'drop': 18, 'total': 18, 'wife': 18, 'send': 18, 'logic': 18, 'point': 18, 'speaker': 18, 'web': 18, 'email': 18, 'els': 18, 'kind': 18, 'home': 17, 'okay': 17, 'aw': 17, 'upgrad': 17, 'told': 17, 'insid': 17, 'function': 17, 'complet': 17, 'care': 17, 'custom': 17, 'xbox': 17, 'ye': 17, 'suck': 17, 'color': 17, 'easili': 17, 'durabl': 17, 'friday': 17, 'regular': 17, 'yr': 17, 'burn': 17, 'os': 17, 'voyag': 17, 'grandson': 16, 'origin': 16, 'lock': 16, 'later': 16, 'throw': 16, 'famili': 16, 'switch': 16, 'tell': 16, 'ran': 16, '15': 16, 'check': 16, 'mode': 16, 'cover': 16, 'content': 16, 'opinion': 16, 'pleas': 16, 'matter': 16, 'tab': 16, 'least': 16, 'multipl': 16, 'servic': 16, 'own': 16, 'either': 16, 'costco': 16, '30': 16, 'quit': 15, 'avail': 15, 'decid': 15, 'simpli': 15, 'allow': 15, 'freez': 15, 'us': 15, 'inexpens': 15, 'mom': 15, 'pop': 15, 'bare': 15, 'averag': 15, 'samsung': 15, 'expand': 15, 'final': 15, 'fun': 15, 'mini': 15, 'led': 15, 'hd8': 14, 'photo': 14, 'previou': 14, 'instead': 14, 'higher': 14, 'sell': 14, 'annoy': 14, 'regist': 14, 'special': 14, 'constantli': 14, 'rather': 14, 'school': 14, 'sometim': 14, 'navig': 14, 'add': 14, 'sd': 14, 'quick': 14, 'fact': 14, 'ebook': 14, 'white': 14, 'larger': 14, 'miss': 14, 'person': 13, 'anyon': 13, 'might': 13, 'must': 13, 'place': 13, 'job': 13, 'fulli': 13, 'pair': 13, 'older': 13, 'trip': 
13, 'cell': 13, 'sorri': 13, 'troubl': 13, 'major': 13, 'touch': 13, 'shut': 13, 'music': 13, 'tabl': 13, 'youtub': 13, 'left': 13, 'paper': 13, 'similar': 13, 'useless': 13, 'flash': 13, 'unit': 13, 'satisfi': 13, 'car': 13, 'past': 13, 'perfectli': 12, 'kept': 12, 'stuff': 12, 'rate': 12, 'softwar': 12, 'select': 12, 'system': 12, 'librari': 12, 'soon': 12, 'immedi': 12, 'let': 12, 'caus': 12, 'ton': 12, 'difficult': 12, 'believ': 12, 'setup': 12, 'birthday': 12, 'button': 12, 'complaint': 12, 'except': 12, 'buck': 12, 'huge': 12, 'garbag': 12, '50': 12, 'netflix': 12, 'guess': 12, 'rayovac': 12, 'fix': 12, 'continu': 12, 'quantiti': 12, 'result': 12, 'consid': 11, 'prefer': 11, 'extrem': 11, '34': 11, 'there': 11, 'contact': 11, 'shelf': 11, 'applic': 11, 'shop': 11, 'whether': 11, 'leav': 11, 'luck': 11, 'husband': 11, 'stay': 11, 'addit': 11, 'handi': 11, 'purpos': 11, 'green': 11, 'weak': 11, 'ship': 11, 'alexa': 11, 'surpris': 11, 'instruct': 11, 'brows': 11, 'program': 11, 'travel': 11, 'display': 11, '16': 11, 'facebook': 11, 'crap': 11, 'young': 11, 'done': 11, 'previous': 11, 'liter': 11, 'dim': 11, 'anywher': 11, 'nearli': 11, 'idea': 11, 'broke': 11, 'space': 11, 'amount': 11, 'line': 11, 'realiz': 10, 'clear': 10, 'frustrat': 10, 'singl': 10, 'support': 10, 'toddler': 10, 'mostli': 10, 'solid': 10, 'stand': 10, 'outsid': 10, 'digit': 10, 'age': 10, '9': 10, 'design': 10, 'paid': 10, 'wors': 10, 'wrap': 10, 'plastic': 10, 'fan': 10, 'tap': 10, 'portabl': 10, 'afford': 10, 'mirror': 10, 'surf': 10, 'near': 10, 'member': 10, 'capabl': 10, 'eread': 10, 'larg': 10, '24': 10, 'stream': 10, 'manufactur': 10, 'level': 10, 'lack': 10, 'hate': 10, 'futur': 10, 'plug': 9, 'saw': 9, 'type': 9, 'hit': 9, 'glad': 9, 'niec': 9, 'arent': 9, 'describ': 9, 'remov': 9, 'oversea': 9, 'weve': 9, 'often': 9, 'bottom': 9, 'effect': 9, 'question': 9, 'easier': 9, 'alot': 9, 'sleev': 9, 'stock': 9, 'warn': 9, 'part': 9, 'simpl': 9, '25': 9, '2nd': 9, 'cool': 9, 'begin': 9, 
'exactli': 9, 'live': 9, 'third': 9, 'sister': 9, 'especi': 9, 'per': 9, 'eye': 9, '100': 9, 'decor': 9, 'compani': 9, 'break': 9, 'step': 9, 'station': 9, 'date': 9, 'sent': 9, 'appear': 9, 'piec': 9, 'edit': 9, 'safe': 9, 'starter': 9, 'unless': 9, 'water': 9, 'properli': 9, 'micro': 9, 'weight': 9, 'interfac': 9, 'capac': 9, 'adult': 9, 'bumper': 9, 'standard': 9, 'usag': 9, 'call': 9, 'worri': 9, 'readi': 9, 'improv': 8, 'plu': 8, 'sit': 8, 'whatev': 8, 'suggest': 8, 'lower': 8, 'detector': 8, 'bigger': 8, 'close': 8, 'your': 8, 'specif': 8, 'everywher': 8, 'overpr': 8, 'someon': 8, 'pro': 8, 'indonesia': 8, 'fair': 8, 'galaxi': 8, 'bulki': 8, 'echo': 8, 'window': 8, 'extern': 8, 'peopl': 8, '11': 8, 'slide': 8, 'backpack': 8, 'note': 8, 'gun': 8, 'busi': 8, 'fantast': 8, 'pick': 8, '36': 8, 'main': 8, 'cabl': 8, 'mother': 8, 'shot': 8, 'base': 8, 'activ': 8, 'reliabl': 8, 'serv': 8, 'neg': 8, 'tech': 8, 'pass': 8, 'respons': 8, 'dad': 8, '2015': 8, 'thermostat': 8, 'interest': 8, 'string': 8, '48': 8, 'due': 8, 'recent': 8, 'understand': 8, 'bluetooth': 8, 'bulk': 8, 'real': 8, 'gone': 8, '2016': 8, 'along': 8, '2012': 8, 'ram': 8, 'paperwhit': 8, 'everyon': 8, 'promis': 8, 'respond': 7, 'heard': 7, 'trust': 7, 'bewar': 7, 'click': 7, 'cord': 7, 'fall': 7, 'smoke': 7, 'period': 7, 'certain': 7, 'market': 7, 'he': 7, 'entir': 7, 'group': 7, 'upon': 7, 'deliveri': 7, 'non': 7, 'although': 7, '3rd': 7, 'search': 7, 'today': 7, 'handl': 7, 'confus': 7, 'mess': 7, 'damag': 7, 'clean': 7, 'browser': 7, 'law': 7, 'carri': 7, 'avoid': 7, 'present': 7, 'unfortun': 7, 'tie': 7, 'indic': 7, 'goe': 7, 'explod': 7, 'view': 7, 'bc': 7, 'current': 7, 'kirkland': 7, 'wall': 7, 'lag': 7, '16gb': 7, 'fill': 7, 'zero': 7, 'nephew': 7, 'count': 7, 'shoot': 7, 'trash': 7, '2017': 7, 'depart': 7, 'invest': 7, '80': 7, 'credit': 7, 'bother': 7, 'fee': 7, 'comparison': 7, 'felt': 6, 'link': 6, 'freetim': 6, 'emerg': 6, 'frequent': 6, 'strong': 6, 'twice': 6, 'refund': 6, 'volt': 6, 
'buyer': 6, 'insert': 6, 'baught': 6, 'gotten': 6, 'alarm': 6, 'smart': 6, 'term': 6, 'longev': 6, 'write': 6, 'stuck': 6, 'prior': 6, 'number': 6, 'compat': 6, '90': 6, 'juic': 6, 'anymor': 6, 'complic': 6, 'els111': 6, 'neopren': 6, 'dud': 6, 'newer': 6, 'luckili': 6, 'rest': 6, 'doa': 6, 'forev': 6, 'meh': 6, 'initi': 6, 'sturdi': 6, 'biggest': 6, 'talk': 6, 'task': 6, 'inch': 6, 'pad': 6, 'info': 6, 'depend': 6, 'wore': 6, 'strength': 6, 'fresh': 6, 'corrod': 6, 'deliv': 6, 'youngest': 6, 'she': 6, 'auto': 6, 'slot': 6, 'discount': 6, 'mention': 6, 'separ': 6, 'condit': 6, 'babi': 6, 'mic': 6, 'conveni': 6, 'smaller': 6, 'bag': 6, 'equip': 6, 'energi': 6, 'hdx': 6, 'finger': 6, 'plenti': 6, 'format': 6, 'network': 6, 'chanc': 6, 'lighter': 6, 'channel': 6, 'refurbish': 6, 'move': 5, 'unlock': 5, 'associ': 5, 'research': 5, 'friend': 5, 'loos': 5, 'lifespan': 5, 'toothbrush': 5, 'serious': 5, 'grandkid': 5, 'rid': 5, 'crack': 5, 'father': 5, 'gadget': 5, 'household': 5, 'bundl': 5, 'cours': 5, 'ahead': 5, 'split': 5, 'u': 5, 'mind': 5, 'import': 5, 'boy': 5, 'univers': 5, 'given': 5, 'oasi': 5, '13': 5, 'unabl': 5, 'neither': 5, 'mistak': 5, 'sourc': 5, 'compart': 5, 'nope': 5, 'tester': 5, 'inform': 5, 'decis': 5, 'familiar': 5, 'via': 5, '14': 5, 'true': 5, 'moment': 5, 'listen': 5, 'werent': 5, 'bug': 5, 'random': 5, 'build': 5, 'exampl': 5, 'everyday': 5, 'held': 5, 'magazin': 5, 'spent': 5, 'fault': 5, 'faulti': 5, 'onto': 5, 'sharp': 5, 'usb': 5, 'elsewher': 5, 'abil': 5, 'quicker': 5, 'wait': 5, 'octob': 5, 'thru': 5, 'knew': 5, 'appar': 5, 'guid': 5, 'adjust': 5, 'weird': 5, 'tool': 5, 'increas': 5, 'backlight': 5, 'ten': 5, 'straight': 5, 'headphon': 5, 'seen': 5, 'cancel': 5, 'eas': 5, 'excit': 5, 'inconsist': 5, 'geek': 5, 'squad': 5, 'april': 5, 'budget': 5, 'entri': 5, 'dark': 5, 'xma': 5, 'unlik': 5, 'silver': 5, 'offic': 5, 'colleg': 5, 'exchang': 5, '60': 5, 'anyway': 5, 'reorder': 5, 'six': 5, 'chrome': 5, 'ie': 5, 'deadbolt': 5, '23': 5, 
'threw': 5, 'pre': 5, 'electr': 5, 'log': 5, 'discov': 5, 'possibl': 5, 'lightweight': 5, 'answer': 5, 'r': 5, 'lost': 5, 'otherwis': 5, 'gig': 5, 'logo': 5, 'pdf': 5, 'clariti': 4, 'choos': 4, 'heck': 4, 'air': 4, 'wii': 4, 'subscript': 4, 'owner': 4, 'thrill': 4, 'suppos': 4, 'sleep': 4, 'carrier': 4, 'frother': 4, 'sens': 4, 'late': 4, 'copper': 4, 'reli': 4, 'pc': 4, 'adequ': 4, '2a': 4, 'lucki': 4, 'dispos': 4, 'length': 4, 'becom': 4, 'posit': 4, 'hardli': 4, 'cheapest': 4, 'soft': 4, 'instanc': 4, 'flawlessli': 4, 'contain': 4, 'typic': 4, 'rotat': 4, 'variou': 4, 'mean': 4, 'subscrib': 4, 'preload': 4, 'outstand': 4, 'regularli': 4, 'sleek': 4, 'print': 4, 'unus': 4, 'environ': 4, 'latest': 4, 'profil': 4, 'reach': 4, 'vari': 4, 'intuit': 4, 'fraction': 4, 'ripoff': 4, 'competitor': 4, 'mainli': 4, 'icon': 4, 'avid': 4, 'nook': 4, 'direct': 4, 'audibl': 4, 'despit': 4, 'list': 4, 'lol': 4, 'dot': 4, 'teen': 4, 'face': 4, 'assum': 4, 'recogn': 4, 'zipper': 4, 'slightli': 4, 'technolog': 4, 'confirm': 4, 'fight': 4, 'loud': 4, 'needless': 4, 'graphic': 4, 'reset': 4, 'upset': 4, 'document': 4, 'manual': 4, 'educ': 4, 'mo': 4, 'maneuv': 4, 'failur': 4, 'spec': 4, 'waist': 4, 'shame': 4, 'shortli': 4, 'leakag': 4, 'forward': 4, 'remain': 4, 'shoulder': 4, 'appropri': 4, 'till': 4, 'grandma': 4, 'cent': 4, 'holiday': 4, 'novemb': 4, 'itll': 4, 'local': 4, 'usa': 4, 'grand': 4, 'refus': 4, 'walmart': 4, 'approxim': 4, 'youd': 4, 'middl': 4, 'destroy': 4, 'lose': 4, 'regret': 4, 'corros': 4, 'repair': 4, 'perhap': 4, 'process': 4, 'complain': 4, 'constant': 4, 'e': 4, 'toast': 4, 'built': 4, 'namebrand': 4, 'cartridg': 4, 'chronic': 4, 'wear': 4, 'mount': 4, 'font': 4, 'jump': 4, 'cat': 4, '32': 4, 'god': 4, 'password': 4, 'becam': 4, 'fluke': 4, 'reciev': 4, 'ridicul': 4, 'choic': 4, 'honestli': 4, 'data': 4, 'coppertop': 4, 'spen': 4, 'steal': 4, 'processor': 4, 'mac': 4, 'nexu': 4, 'media': 4, 'nest': 4, 'c': 4, '9v': 4, 'can⊙t': 4, 'brandnam': 4, 'obvious': 4, 
'concern': 4, 'whole': 4, 'brother': 4, 'suitabl': 4, '7yr': 4, 'follow': 4, 'nabi': 4, 'certifi': 4, 'tip': 4, 'intern': 4, 'audio': 4, 'tho': 4, 'hidden': 4, 'fals': 4, 'boat': 4, 'awar': 4, 'thrown': 4, 'refer': 4, 'asu': 4, 'i⊙v': 4, '247': 4, 'background': 4, 'bb': 3, 'seriou': 3, 'con': 3, 'fairli': 3, 'radio': 3, 'gen': 3, 'assur': 3, 'bar': 3, 'boyfriend': 3, 'detail': 3, 'form': 3, 'suffici': 3, 'secondari': 3, 'togeth': 3, 'match': 3, '4yr': 3, 'accident': 3, 'equival': 3, 'scale': 3, '68': 3, 'puzzl': 3, 'consist': 3, 'chromebooksurfac': 3, 'macair': 3, '116': 3, 'snag': 3, 'friction': 3, 'inout': 3, 'beer': 3, 'cozi': 3, 'koozi': 3, 'densiti': 3, 'foam': 3, 'encas': 3, 'canva': 3, 'slit': 3, 'amazoni': 3, '834': 3, 'unbeliev': 3, 'comfort': 3, 'bang': 3, 'plan': 3, 'roku': 3, 'convinc': 3, 'wrapper': 3, 'cri': 3, 'mix': 3, 'hot': 3, 'burnt': 3, 'voic': 3, 'assist': 3, 'greatest': 3, 'transfer': 3, 'girl': 3, 'unaccept': 3, 'instagram': 3, 'entertain': 3, 'expir': 3, 'cold': 3, 'imagin': 3, 'sad': 3, 'known': 3, 'asid': 3, 'downfal': 3, 'news': 3, 'cute': 3, 'curriculum': 3, 'imposs': 3, 'beyond': 3, 'par': 3, 'fli': 3, 'econom': 3, 'headlamp': 3, 'boot': 3, 'shotti': 3, 'unhappi': 3, 'savvi': 3, 'altern': 3, 'printer': 3, 'hook': 3, 'weather': 3, 'pathet': 3, 'significantli': 3, 'numer': 3, 'temperatur': 3, 'seller': 3, 'lie': 3, 'peoplei': 3, 'creat': 3, 'class': 3, 'met': 3, 'languag': 3, 'individu': 3, 'downsid': 3, 'upload': 3, 'ac': 3, 'fourth': 3, 'exact': 3, 'taken': 3, 'hurt': 3, 'desk': 3, 'train': 3, 'apart': 3, 'dog': 3, 'appal': 3, 'guarante': 3, 'provid': 3, 'applianc': 3, 'rang': 3, 'grrrrrrreeeeeeeeaaaaaattttttt': 3, 'compact': 3, 'track': 3, 'shipment': 3, 'meant': 3, 'hadnt': 3, '65': 3, 'protector': 3, 'hr': 3, 'favorit': 3, 'estim': 3, 'particularli': 3, 'site': 3, 'winner': 3, 'drive': 3, 'brought': 3, 'saver': 3, 'mark': 3, 'transmitt': 3, 'breaker': 3, 'adapt': 3, 'clunki': 3, 'flight': 3, 'blank': 3, 'wed': 3, '40': 3, 'heat': 3, 
'finish': 3, 'what': 3, 'nois': 3, 'equal': 3, 'semest': 3, 'repres': 3, 'primetim': 3, 'trick': 3, 'stereo': 3, 'shade': 3, 'ultim': 3, 'bestbuy': 3, 'messag': 3, 'procel': 3, 'target': 3, 'road': 3, 'w': 3, 'ghz': 3, 'locat': 3, 'blue': 3, 'teenag': 3, 'shortest': 3, 'august': 3, 'tilt': 3, 'rememb': 3, 'tend': 3, 'honest': 3, 'happier': 3, 'guard': 3, 'youll': 3, 'sooner': 3, 'post': 3, 'fluid': 3, 'longlast': 3, 'varieti': 3, 'rug': 3, 'wet': 3, 'crisp': 3, 'advantag': 3, 'hassl': 3, 'across': 3, 'besid': 3, 'sort': 3, 'homework': 3, 'pic': 3, 'self': 3, 'correctli': 3, 'cumbersom': 3, 'awhil': 3, 'mobil': 3, 'distract': 3, 'lit': 3, 'curiou': 3, 'dish': 3, 'lover': 3, 'it⊙': 3, 'restart': 3, 'dictionari': 3, 'caveat': 3, '4s': 3, 'letter': 3, 'bargain': 3, 'state': 3, 'regard': 3, 'flameless': 3, 'grandchildren': 3, 'batteriesi': 3, 'guy': 3, 'harbor': 3, 'freight': 3, 'earli': 3, 'altra': 3, 'superior': 3, '85': 3, 'reboot': 3, 'produc': 3, 'aloud': 3, 'pool': 3, 'relax': 3, 'attempt': 3, 'glare': 3, 'china': 3, 'skip': 3, 'yesterday': 3, 'overnight': 3, 'skin': 3, 'crazi': 3, 'eink': 3, 'lantern': 3, 'accept': 2, 'kiosk': 2, 'menial': 2, 'perspect': 2, 'firetv': 2, 'compartmentjust': 2, 'hiss': 2, 'jet': 2, 'cough': 2, 'choke': 2, 'batteryim': 2, 'particular': 2, 'advanc': 2, 'soso': 2, 'situat': 2, 'didn⊙t': 2, 'handheld': 2, 'duracellenerg': 2, 'mare': 2, 'compet': 2, 'mainstream': 2, 'paus': 2, 'photographi': 2, 'popular': 2, 'jealou': 2, 'request': 2, '6th': 2, 'hear': 2, 'unreal': 2, 'milk': 2, 'medium': 2, 'restock': 2, 'hundr': 2, 'earn': 2, 'notch': 2, 'parti': 2, 'dissapoint': 2, 'competit': 2, 'primari': 2, 'weaker': 2, 'blackjack': 2, 'forget': 2, 'behind': 2, 'garag': 2, 'door': 2, 'liketoo': 2, 'didont': 2, 'magic': 2, 'maxel': 2, 'robust': 2, 'factor': 2, 'fitbit': 2, 'playtim': 2, 'bore': 2, 'fork': 2, 'proof': 2, 'oili': 2, 'liquid': 2, 'defeat': 2, 'red': 2, 'depot': 2, 'frozen': 2, 'cellophan': 2, 'volum': 2, 'sampl': 2, 'file': 2, 'accid': 
2, 'homesceeen': 2, 'obtrus': 2, 'stink': 2, 'unbox': 2, 'physic': 2, 'noteven': 2, 'atani': 2, 'playimg': 2, 'neat': 2, 'batt': 2, 'exceed': 2, 'ignor': 2, 'promo': 2, 'microsd': 2, 'room': 2, 'undon': 2, 'occupi': 2, 'lite': 2, 'gotcha': 2, 'girlfriend': 2, 'sub': 2, 'enabl': 2, 'overload': 2, 'backup': 2, 'bog': 2, 'craig': 2, 'natur': 2, 'membership': 2, 'grey': 2, 'pull': 2, 'bank': 2, 'kitten': 2, 'heavi': 2, 'calib': 2, 'vivid': 2, 'attach': 2, '13v': 2, 'caught': 2, 'stander': 2, 'em': 2, 'visit': 2, 'meter': 2, 'guideinstruct': 2, 'spreadsheet': 2, 'intuitiveveri': 2, 'appsgood': 2, 'moviesveri': 2, 'english': 2, 'languagesapp': 2, 'rel': 2, 'share': 2, 'cmon': 2, 'dealdisappoint': 2, 'thoma': 2, 'somewhat': 2, 'worthless': 2, 'encount': 2, 'batterypow': 2, 'respect': 2, 'swipe': 2, 'smooth': 2, 'firenot': 2, '6070': 2, 'introduc': 2, 'word': 2, 'hey': 2, 'mediocr': 2, 'grandchild': 2, '72': 2, 'unusu': 2, 'ensur': 2, 'gamesapp': 2, 'morn': 2, 'al': 2, 'vacat': 2, 'coher': 2, 'penni': 2, 'sold': 2, '8pack': 2, 'cabinet': 2, 'shock': 2, 'eat': 2, 'basi': 2, 'screw': 2, 'kiddo': 2, 'sensor': 2, 'l': 2, 'l2': 2, 'bed': 2, 'wake': 2, 'pink': 2, 'breez': 2, 'necessari': 2, 'tangerin': 2, 'redbox': 2, 'alon': 2, 'block': 2, 'remaind': 2, 'skeptic': 2, 'qti': 2, 'weekli': 2, 'shouldv': 2, 'tire': 2, 'chair': 2, 'tick': 2, 'cloud': 2, 'thousand': 2, 'delet': 2, 'plane': 2, 'paperweight': 2, 'doubt': 2, 'swap': 2, '10and': 2, 'roughli': 2, 'aawith': 2, '67': 2, 'aspect': 2, 'vital': 2, 'primarili': 2, '46': 2, 'hoop': 2, 'zinio': 2, 'loosen': 2, 'grip': 2, 'club': 2, 'effici': 2, 'extend': 2, 'juri': 2, 'itun': 2, '120': 2, 'backupand': 2, 'occasion': 2, 'tot': 2, 'februari': 2, 'dayssturdi': 2, 'recycl': 2, 'edg': 2, 'culprit': 2, 'droid': 2, 'food': 2, 'truth': 2, 'enter': 2, 'earlier': 2, 'ta': 2, 'multitask': 2, 'toilet': 2, 'investig': 2, '2yr': 2, 'sticki': 2, 'cardboard': 2, 'pain': 2, 'uselif': 2, 'blotch': 2, 'deter': 2, 'walk': 2, 'counterwould': 2, 
'solar': 2, '59': 2, 'x': 2, 'autofocu': 2, 'hdtv': 2, 'portrait': 2, 'shoddi': 2, '1080': 2, 'statisfi': 2, 'author': 2, 'donenot': 2, 'imag': 2, 'outmuch': 2, 'ultra': 2, 'baldwin': 2, 'keyless': 2, 'httpswwwamazoncombaldwin8252ac1cylinderelectronicdeadboltdpb00441ucy0': 2, 'san': 2, 'francisco': 2, 'httpswwwamazoncomdpb005g7sby4': 2, 'deadboltwhen': 2, 'weaken': 2, 'httpswwwamazoncomfenixhl23headlampxpg2fenixhl23xpg2golddpb00sh08uxi': 2, 'layout': 2, 'glow': 2, 'startl': 2, 'yeah': 2, 'gon': 2, 'na': 2, 'head': 2, 'eg': 2, 'raovac': 2, 'drastic': 2, 'ergonom': 2, 'magnet': 2, 'cbatteri': 2, 'proprietari': 2, 'prepar': 2, 'k': 2, 'autist': 2, 'draw': 2, 'beauti': 2, 'appstor': 2, 'steadi': 2, 'ninja': 2, 'mwh': 2, 'will': 2, 'consum': 2, 'fat32': 2, 'ntf': 2, 'fat': 2, 'xfat': 2, 'common': 2, 'clash': 2, 'clan': 2, 'aftermarket': 2, 'coc': 2, 'launch': 2, '250': 2, '101': 2, 'wayyyyy': 2, 'hint': 2, 'pen': 2, 'press': 2, 'bah': 2, 'lay': 2, 'seek': 2, 'wow': 2, 'warm': 2, 'bring': 2, 'everreadi': 2, 'spoil': 2, 'discontinu': 2, 'alright': 2, 'unlimit': 2, 'younger': 2, 'flat': 2, 'kinda': 2, 'pointless': 2, 'began': 2, 'f': 2, 'chose': 2, 'stole': 2, 'shelflif': 2, 'blew': 2, 'employe': 2, 'hasnt': 2, 'rc': 2, 'headach': 2, 'utter': 2, 'playstor': 2, 'experienc': 2, 'angri': 2, 'broken': 2, 'thermomet': 2, '17': 2, 'crash': 2, 'newest': 2, 'phooey': 2, 'diapoint': 2, 'choo': 2, 'allway': 2, 'yeari': 2, 'voltmet': 2, 'meticul': 2, '1st': 2, 'grown': 2, 'durecel': 2, 'customiz': 2, '5th': 2, 'workus': 2, '0f': 2, 'cd': 2, 'delay': 2, 'modifi': 2, 'don⊙t': 2, 'menu': 2, 'evereadi': 2, 'prompt': 2, 'rough': 2, 'peripher': 2, '4th': 2, 'randomli': 2, 'dozen': 2, 'bezel': 2, 'measur': 2, 'worm': 2, 'experiment': 2, 'walkabout': 2, 'tripl': 2, 'area': 2, 'explain': 2, 'clearli': 2, 'decemb': 2, '2month': 2, 'baffl': 2, 'polici': 2, 'toss': 2, 'evalu': 2, 'premium': 2, 'critic': 2, 'shoe': 2, 'shorter': 2, 'prosinstal': 2, 'hassleveri': 2, 'player': 2, 'correct': 2, 
'inabl': 2, 'band': 2, 'runout': 2, 'fanci': 2, 'underground': 2, 'deterior': 2, 'sign': 2, 'span': 2, 'poorer': 2, 'eneloop': 2, 'inspect': 2, 'structur': 2, 'leaki': 2, 'comixolog': 2, 'op': 2, 'asia': 2, 'knowledg': 2, 'peac': 2, 'longest': 2, 'fori': 2, 'omg': 2, 'kill': 2, 'truli': 2, 'stapl': 2, 'tree': 2, 'strap': 2, 'dnt': 2, 'independ': 2, 'docket': 2, 'upat': 2, 'grandniec': 2, 'manag': 2, 'milag': 2, 'hesit': 2, 'shortliv': 2, 'medic': 2, 'staff': 2, 'pump': 2, 'procedurethat': 2, 'yeartim': 2, 'anti': 2, 'ei': 2, 'consequ': 2, 'simultan': 2, 'text': 2, 'coat': 2, 'nearest': 2, 'webroot': 2, 'disbelief': 2, 'tall': 2, 'pricey': 2, 'plant': 2, '35': 2, 'peter': 2, 'tooth': 2, 'brush': 2, 'motion': 2, 'productjmac': 2, 'aaai': 2, 'shall': 2, 'resort': 2, 'glitch': 2, 'polit': 2, 'workingwtf': 2, 'fed': 2, 'reput': 2, '1000': 2, 'aggrav': 2, 'vibrat': 2, 'camp': 2, 'rare': 2, 'puddl': 2, 'peel': 2, 'overexposur': 2, 'surfac': 2, 'ineffect': 2, 'unsaf': 2, 'timer': 2, 'inconveni': 2, 'alittl': 2, 'dispens': 2, 'sept': 2, 'pace': 2, 'allot': 2, 'bon': 2, '1992': 2, 'certainli': 2, 'dewdrop': 2, 'insist': 2, 'replenish': 2, 'restrict': 2, 'twicemi': 2, 'tremend': 2, 'nation': 2, '18': 2, 'tarnish': 2, 'rusti': 2, 'flaw': 2, 'soapdispens': 2, 'june': 2, 'insta': 2, 'bulb': 2, 'monitor': 2, 'patient': 2, 'integr': 2, 'proud': 2, 'quarter': 2, 'spring': 2, 'everybit': 2, 'oki': 1, 'chargefor': 1, 'depth': 1, 'reloc': 1, 'pricebut': 1, 'shove': 1, 'hroat': 1, 'tvand': 1, 'haveit': 1, '6yrold': 1, '3yrold': 1, '1yr': 1, 'awsom': 1, 'helpfulth': 1, 'moneyshort': 1, 'elderli': 1, 'microsoft': 1, 'rebrand': 1, 'domin': 1, 'rule': 1, 'charm': 1, 'speedlit': 1, 'firstupd': 1, 'problemsi': 1, '125': 1, 'returnabledont': 1, 'preorder': 1, 'batterieswil': 1, 'goodworthless': 1, 'sizzl': 1, 'proper': 1, 'polar': 1, 'fizzl': 1, 'femal': 1, 'weeeak': 1, 'blast': 1, 'ut': 1, 'trackbal': 1, 'signific': 1, 'cordless': 1, 'pricecon': 1, 'alli': 1, 'daili': 1, 'thata': 1, 
'contempl': 1, 'honesti': 1, 'unbias': 1, 'vote': 1, 'monthveri': 1, '61617': 1, 'fresher': 1, 'circumferenceetc': 1, 'groupon': 1, 'paddleamazon': 1, 'prosgreat': 1, 'moneyenough': 1, 'gamesdec': 1, 'lifeconsfeel': 1, 'weekslowresolut': 1, 'determin': 1, 'quailiti': 1, 'bedroom': 1, 'afternoon': 1, 'boughtth': 1, 'folder': 1, 'deadsuck': 1, 'ray333ray2001yahoocom': 1, '27': 1, '43016ive': 1, 'devicesorigin': 1, 'reviewbought': 1, 'laser': 1, 'tag': 1, 'outsidei': 1, 'expensiveand': 1, 'picsi': 1, 'rablet': 1, 'boggl': 1, 'modest': 1, '240gb': 1, 'awwsom': 1, 'needsplit': 1, 'roommat': 1, 'testernot': 1, 'testeroh': 1, 'joke': 1, 'buyi': 1, 'nobrain': 1, 'yuck': 1, 'prewrap': 1, 'substanc': 1, 'appliancestoy': 1, 'expedit': 1, 'responsiveth': 1, 'announcedherself': 1, 'convers': 1, 'computersmi': 1, 'wordless': 1, 'pictorialstart': 1, 'primit': 1, 'vagu': 1, 'thisdevic': 1, 'thatfil': 1, 'thorough': 1, 'protocol': 1, 'encrypt': 1, 'toattach': 1, 'hi': 1, 'sadli': 1, 'florida': 1, 'glossi': 1, 'whatsoev': 1, 'lowpow': 1, 'topsspend': 1, 'thursday': 1, 'zone': 1, 'immort': 1, 'soul': 1, 'la': 1, 'notif': 1, 'dismiss': 1, 'foliostyl': 1, 'higherend': 1, 'leisur': 1, 'roof': 1, '2024': 1, 'endedup': 1, 'shack': 1, 'insteadi': 1, '5star': 1, 'shortterm': 1, 'butmajor': 1, 'beforebumm': 1, 'youbut': 1, 'camenor': 1, 'schedul': 1, 'alertan': 1, 'kidproof': 1, 'basement': 1, 'upstair': 1, 'soundopt': 1, 'bose': 1, 'soundlink': 1, 'primealexa': 1, 'featuresjust': 1, 'fireit': 1, 'brainer': 1, 'viabl': 1, 'substitut': 1, 'simpler': 1, 'contorl': 1, 'companion': 1, 'aciv': 1, 'tactic': 1, 'horrend': 1, 'chines': 1, 'firstlast': 1, 'disastr': 1, 'piti': 1, 'gullibl': 1, 'convert': 1, 'foldov': 1, 'phablet': 1, '500ish': 1, 'zippi': 1, 'clarityqu': 1, '64': 1, '96': 1, '200': 1, 'record': 1, 'slight': 1, 'bleed': 1, 'colorbright': 1, 'uniform': 1, 'compon': 1, 'coincid': 1, 'spazz': 1, 'moistur': 1, 'themmayb': 1, 'enrol': 1, 'unfortunetli': 1, 'arm': 1, 'leg': 1, 'okayther': 
1, 'tabletsom': 1, 'elabor': 1, 'concret': 1, 'intact': 1, 'greenwork': 1, 'tight': 1, 'forth': 1, 'computersetup': 1, 'sincefin': 1, 'flexibl': 1, '15v': 1, '20pack': 1, 'imo': 1, 'sucker': 1, 'seep': 1, 'booksmagazin': 1, '2026': 1, 'programm': 1, 'histori': 1, 'uniqu': 1, 'japanes': 1, 'overdischarg': 1, 'frankli': 1, 'placement': 1, 'shouldnt': 1, 'sweet': 1, 'untrustworthi': 1, 'untrustworthydi': 1, 'blood': 1, 'pressur': 1, 'cuff': 1, 'machin': 1, 'highful': 1, 'scare': 1, 'death': 1, 'vibrantfor': 1, 'unbeat': 1, 'amazonfound': 1, 'webpag': 1, 'froze': 1, 'batteriesdont': 1, 'longdont': 1, 'theseyour': 1, 'ene': 1, 'knowledgeus': 1, 'returnedno': 1, '1weeki': 1, 'min': 1, 'boomshackalackalacka': 1, 'th': 1, 'squarish': 1, 'rectangular': 1, 'flimsi': 1, 'afraid': 1, 'aunt': 1, 'deliverycon': 1, 'dropout': 1, 'nikon': 1, 'uncommon': 1, '148v': 1, 'to12v': 1, 'hundredth': 1, 'electrochemistri': 1, 'variabl': 1, 'multiwindow': 1, 'awkward': 1, 'unfair': 1, 'perfectlyand': 1, '14cant': 1, 'wreath': 1, 'lightsguess': 1, 'strongamazon': 1, 'blink': 1, 'gret': 1, '46year': 1, 'amazonnon': 1, 'feet': 1, 'minutesobvi': 1, 'heart': 1, 'attack': 1, 'grate': 1, 'cardiac': 1, 'surgeon': 1, 'twenti': 1, 'leakagethat': 1, 'promot': 1, 'slowth': 1, 'chargingdata': 1, 'adapterth': 1, 'login': 1, 'fireo': 1, 'setbackth': 1, 'jack': 1, 'powervolum': 1, 'chargingheadphon': 1, '1112gb': 1, 'wegman': 1, 'reward': 1, 'batteryhungri': 1, 'race': 1, 'adveris': 1, 'borrow': 1, 'publish': 1, 'benefit': 1, 'fromsal': 1, 'foul': 1, 'decad': 1, 'decept': 1, 'kink': 1, '128gb': 1, '32gbjust': 1, 'anim': 1, 'watcher': 1, 'dauther': 1, 'reccomend': 1, 'firend': 1, '034': 1, '043': 1, '025050': 1, 'heavyweight': 1, 'tightli': 1, 'resist': 1, 'graini': 1, 'buffer': 1, '9th': 1, 'hallmark': 1, 'figurin': 1, 'fyi': 1, 'cartoon': 1, 'duracellwouldnt': 1, 'win': 1, 'complement': 1, 'userfriendli': 1, 'outcom': 1, 'bimonthli': 1, 'dayyou': 1, 'hourshould': 1, 'bold': 1, 'husbandso': 1, 'itit': 1, 
'prostheir': 1, 'speedcon': 1, 'insan': 1, 'uncomfort': 1, 'photosoveral': 1, 's2': 1, 'hmm': 1, 'label': 1, 'importantli': 1, 'discharg': 1, 'obviou': 1, '3995': 1, 'favor': 1, '2040': 1, 'gizmo': 1, 'batteriesthes': 1, 'joul': 1, 'thief': 1, 'othersthi': 1, 'dorki': 1, 'fish': 1, 'tank': 1, 'sennheis': 1, 'twelv': 1, 'alt': 1, 'bummer': 1, 'popularknown': 1, 'dinner': 1, 'concentr': 1, 'loyal': 1, 'camper': 1, 'invent': 1, 'chip': 1, 'shini': 1, 'mountain': 1, 'nowher': 1, 'coverag': 1, '5000': 1, 'stroke': 1, 'forc': 1, 'whenev': 1, 'dosent': 1, 'stupid': 1, 'materi': 1, 'stiff': 1, 'timberlin': 1, 'theyd': 1, 'dump': 1, 'shirt': 1, 'youtyp': 1, 'batter': 1, 'nightveri': 1, 'displeas': 1, 'blow': 1, 'nose': 1, 'ceri': 1, 'itwel': 1, 'march': 1, 'feb': 1, 'lightsab': 1, 'expensivewith': 1, 'thisth': 1, 'changedleak': 1, '146146125v': 1, 'introductori': 1, 'youngelr': 1, 'sam': 1, '5i': 1, '38': 1, 'cashi': 1, 'defiantli': 1, 'cello': 1, 'motor': 1, 'lap': 1, 'asesom': 1, 'autism': 1, 'stash': 1, 'code': 1, 'beep': 1, 'suspect': 1, 'eventu': 1, 'relief': 1, '33': 1, 'yahoo': 1, 'map': 1, 'struggl': 1, 'gamecircl': 1, 'goodread': 1, 'pricelow': 1, 'tier': 1, '48pack': 1, 'approv': 1, 'bunch': 1, 'hardship': 1, 'loath': 1, 'sprain': 1, 'wrist': 1, 'metal': 1, 'scratch': 1, 'hardwark': 1, 'refresh': 1, 'outdoor': 1, 'hohum': 1, 'unitsfir': 1, 'wider': 1, '256': 1, 'hdtvfire': 1, '1208': 1, '800': 1, '720p': 1, 'crummi': 1, 'frontrear': 1, 'rear': 1, 'retain': 1, 'panel': 1, 'microhdmi': 1, 'horizont': 1, 'wire': 1, 'awkwardli': 1, 'viewwher': 1, 'snapon': 1, 'fold': 1, 'onoff': 1, 'fantasticnot': 1, '1920': 1, '1200': 1, '150': 1, '249': 1, 'itbottom': 1, 'worthwhil': 1, 'ideal': 1, 'appreci': 1, 'sluggish': 1, 'viru': 1, 'solv': 1, 'robot': 1, 'armi': 1, 'muahahahahaha': 1, '26': 1, 'ham': 1, 'egger': 1, 'board': 1, 'clumsi': 1, 'muy': 1, 'bueno': 1, 'perk': 1, 'playback': 1, 'uhoh': 1, 'finepix': 1, 'len': 1, 'towel': 1, 'bat': 1, 'conserv': 1, '300': 1, 
'einstein': 1, 'math': 1, 'lousi': 1, 'royallynow': 1, 'useag': 1, 'boxi': 1, 'duralast': 1, 'cage': 1, 'crate': 1, 'gate': 1, 'stockpil': 1, 'duracellslast': 1, 'dormant': 1, 'fixtur': 1, '7inch': 1, '8inch': 1, 'industri': 1, '16g': 1, '6499': 1, 'goof': 1, 'monthsthes': 1, 'manner': 1, 'againi': 1, 'lesson': 1, 'thin': 1, 'explor': 1, 'promptli': 1, 'outer': 1, 'tape': 1, 'betterher': 1, 'oftenthos': 1, 'cheaperbatteri': 1, 'calcul': 1, 'vs': 1, 'extrapol': 1, 'coppertopsso': 1, 'powerboth': 1, 'bunnyduracel': 1, 'tini': 1, 'gofin': 1, 'signatur': 1, 'report': 1, 'thunder': 1, 'disney': 1, 'suster': 1, 'everyewher': 1, 'bill': 1, 'pricegreat': 1, 'mixedbag': 1, '1year': 1, 'discuss': 1, 'town': 1, 'distinguish': 1, 'etcbetween': 1, 'lamazon': 1, 'contac': 1, 'conclud': 1, 'visual': 1, 'iphon': 1, 'gameplay': 1, '115f': 1, 'grab': 1, 'spot': 1, '127im': 1, 'andor': 1, 'plusfor': 1, 'ms': 1, 'docx': 1, 'xlsx': 1, 'pptxall': 1, 'headset': 1, 'cut': 1, 'slower': 1, '2013': 1, 'arguabl': 1, 'hum': 1, 'smoothli': 1, 'expans': 1, 'subpar': 1, 'wash': 1, 'conneseiur': 1, 'conecct': 1, 'economi': 1, 'mah': 1, 'firestick': 1, 'poorest': 1, 'conferenc': 1, 'brutal': 1, 'misinform': 1, 'cast': 1, '20000': 1, 'modem': 1, 'hole': 1, 'unrespons': 1, 'inaccuratewhich': 1, 'push': 1, 'curv': 1, 'overcom': 1, 'greati': 1, 'properti': 1, 'forevergreat': 1, 'unlucki': 1, 'dip': 1, 'spill': 1, 'quad': 1, 'core': 1, 'cpu': 1, '1132017': 1, 'furnac': 1, 'nonstop': 1, 'holder': 1, 'tablett': 1, 'punch': 1, 'nobraineri': 1, 'smartphon': 1, 'septemb': 1, 'retail': 1, 'world': 1, 'bob': 1, 'today20': 1, 'picsw': 1, 'vendor': 1, 'losewrong': 1, 'yearnot': 1, 'weeki': 1, 'earlierthi': 1, 'saleman': 1, 'hilari': 1, 'warrenti': 1, '500': 1, 'burst': 1, '8year': 1, 'pin': 1, 'movement': 1, 'foor': 1, 'substanti': 1, 'websurf': 1, 'toyedit': 1, 'airplan': 1, 'mousea': 1, 'mousethat': 1, 'stuf': 1, 'compartmentscon': 1, 'ouch': 1, 'techi': 1, 'monthli': 1, 'stillnow': 1, 'clone': 1, 
'commercialsso': 1, 'ping': 1, 'checkjust': 1, 'themth': 1, 'belight': 1, 'turner': 1, 'bell': 1, 'whistl': 1, 'splurg': 1, 'coffe': 1, 'wand': 1, 'surprisingli': 1, '70he': 1, '2year': 1, 'satisfactorili': 1, 'batteryimho': 1, 'asham': 1, 'saleit': 1, 'choppi': 1, 'consumpt': 1, 'ina': 1, 'unsatisfactori': 1, 'swissphon': 1, 'pager': 1, 'leakedupdateok': 1, 'receivedamazon': 1, 'jk150601': 1, 'tolov': 1, 'homepag': 1, 'tax': 1, 'aren⊙t': 1, 'he⊙': 1, 'batteriesthank': 1, 'vers': 1, 'net': 1, 'attest': 1, 'desir': 1, 'savi': 1, 'bame': 1, 'theamazon': 1, 'interact': 1, 'allaround': 1, '79': 1, 'paperwit': 1, 'kindler': 1, 'presum': 1, 'microphon': 1, 'kerplot': 1, 'gona': 1, 'wrote': 1, 'insur': 1, 'whop': 1, 'quantitys': 1, 'automat': 1, 'snuck': 1, 'forlucki': 1, 'minimum': 1, 'mu': 1, 'himsinc': 1, 'virtual': 1, 'theyll': 1, '3year': 1, 'floor': 1, 'panick': 1, 'hog': 1, 'cookbook': 1, 'slowest': 1, '710': 1, 'recip': 1, 'cook': 1, 'killer': 1, 'flush': 1, 'spendi': 1, 'rival': 1, 'fiancé': 1, 'goodth': 1, 'motorola': 1, 'radioson': 1, 'childsafeti': 1, 'itnoqw': 1, 'unimport': 1, 'defin': 1, 'mailth': 1, 'usei': 1, 'ear': 1, 'kidfriendli': 1, 'bend': 1, '2500': 1, 'merefurbish': 1, 'social': 1, '7year': 1, '8gb': 1, 'tough': 1, 'gameseasi': 1, 'freedom': 1, '3gb': 1, 'mad': 1, '32gb': 1, 'upcom': 1, 'recently': 1, 'storebrand': 1, 'among': 1, 'powerful': 1, 'seven': 1, 'danger': 1, 'grandmoth': 1, 'who': 1, 'crappi': 1, 'slipperi': 1, 'doorbel': 1, 'ringseal': 1, 'blown': 1, 'hack': 1, 'usabl': 1, 'wasd': 1, 'costcokirkland': 1, 'linki': 1, 'comcast': 1, 'weel': 1, 'craziest': 1, 'highest': 1, 'utub': 1, 'grade': 1, 'couch': 1, 'goofi': 1, 'modern': 1, 'beg': 1, 'bonu': 1, 'project': 1, 'forgotten': 1, 'attent': 1, '758': 1, 'enoughi': 1, 'visibl': 1, 'controli': 1, '9s': 1, 'warrant': 1, 'morememori': 1, 'hoursday': 1, 'amazoncom': 1, 'pleasur': 1, 'suppli': 1, 'geat': 1, 'tutori': 1, 'fab': 1, 'pocket': 1, 'bucksth': 1, 'wieght': 1, 'shabbi': 1, 
'competitionbatteri': 1, 'daysspam': 1, 'spam': 1, 'stopspoofingamazoncom': 1, 'album': 1, 'junksav': 1, 'musician': 1, 'teacher': 1, 'sing': 1, 'shure': 1, 'beta': 1, '58': 1, '78': 1, 'mici': 1, '1520': 1, 'mid': 1, 'batterei': 1, 'after2': 1, 'rental': 1, 'outdat': 1, 'costbenefit': 1, 'eight': 1, 'bent': 1, 'pit': 1, 'leader': 1, 'versatil': 1, 'hindsight': 1, 'nerf': 1, 'dung': 1, 'migrat': 1, 'doodad': 1, 'eneg': 1, 'mousethi': 1, 'officei': 1, 'remotecontrol': 1, 'discard': 1, 'storagei': 1, 'pixil': 1, 'textstil': 1, 'asian': 1, 'infam': 1, 'harm': 1, 'realiti': 1, 'vacuum': 1, 'earth': 1, 'tappi': 1, '1999': 1, 'outrag': 1, '5v': 1, 'lowpressur': 1, 'salesperson': 1, 'satisfactori': 1, 'energizerveri': 1, '9year': 1, 'ride': 1, 'computertablet': 1, 'directli': 1, 'powerpoint': 1, 'stretch': 1, 'noisemak': 1, 'higherdrain': 1, 'moneyy': 1, 'tube': 1, 'tabletsso': 1, 'onesso': 1, 'knob': 1, 'fadedwor': 1, 'cheapi': 1, 'cheapli': 1, 'competitioni': 1, 'voltag': 1, 'capacitancethey': 1, 'rightqual': 1, 'manni': 1, 'comeon': 1, 'aid': 1, 'el': 1, 'cheapo': 1, 'explot': 1, 'scan': 1, 'woo': 1, 'staral': 1, 'reviewsth': 1, 'anythinh': 1, 'hair': 1, 'clipper': 1, 'habe': 1, 'n': 1, 'overli': 1, 'fave': 1, 'tune': 1, 'schlage': 1, 'keypad': 1, 'acess': 1, 'ehhh': 1, '5050': 1, 'puck': 1, 'bateri': 1, 'cycl': 1, 'rapidli': 1, 'empti': 1, 'themmi': 1, 'garbagei': 1, 'fri': 1, 'hang': 1, 'sideload': 1, 'gold': 1, 'goto': 1, 'doctor': 1, 'expert': 1, 'especial': 1, 'noe': 1, 'littlebit': 1, 'finesam': 1, '8thi': 1, 'christmasw': 1, 'bestlov': 1, 'itthat': 1, 'booksgamesmus': 1, 'viedosthank': 1, 'sudden': 1, 'stope': 1, 'hp': 1, 'flavor': 1, 'probobl': 1, 'laggi': 1, 'ipad2': 1, 'therapi': 1, 'session': 1, 'inconvi': 1, 'warrantyon': 1, 'woeful': 1, 'groceri': 1, 'pandora': 1, 'downstair': 1, 'weekend': 1, 'weekday': 1, 'slim': 1, 'idk': 1, 'orderd': 1, 'oh': 1, 'productthey': 1, 'tryafter': 1, 'malfunctionupon': 1, 'residu': 1, 'chicken': 1, 'coop': 1, 'fiddl': 1, 
'multimet': 1, 'bias': 1, 'inventori': 1, 'regardless': 1, 'bookspap': 1, 'kindlew': 1, 'basket': 1, 'exist': 1, 'pend': 1, 'propos': 1, 'have': 1, 'durcel': 1, 'doubl': 1, 'breast': 1, 'fifth': 1, 'okit': 1, 'dose': 1, 'debat': 1, 'releas': 1, 'itemjunk': 1, 'precharg': 1, 'mile': 1, 'websit': 1, 'gmail': 1, 'smarter': 1, 'couldn⊙t': 1, 'withrug': 1, 'sophist': 1, '1one': 1, 'deserv': 1, 'disabl': 1, 'differencewhen': 1, 'retir': 1, 'chain': 1, 'dollor': 1, 'drawer': 1, 'emptor': 1, 'outperform': 1, 'oasisbatteri': 1, 'lowest': 1, 'connector': 1, 'poorli': 1, 'grow': 1, 'unsnap': 1, 'resolv': 1, 'accessori': 1, 'dif': 1, 'worki': 1, 'productit': 1, 'grandpa': 1, 'hitch': 1, 'shi': 1, 'gre': 1, 'moder': 1, 'engag': 1, 'charact': 1, 'b': 1, 'mice': 1, '3x': 1, 'infrequ': 1, 'slip': 1, 'thea': 1, 'clue': 1, 'nymph': 1, 'lifetim': 1, 'moneykeep': 1, 'moneywhen': 1, 'sunlight': 1, 'bay': 1, 'springsummerfal': 1, 'themexpect': 1, 'wrestl': 1, 'shekel': 1, 'hurri': 1, '99': 1, 'trail': 1, 'frame': 1, 'okk': 1, 'mtv': 1, 'batterieswhat': 1, 'ador': 1, 'valid': 1, 'mi': 1, '3yr': 1, 'soap': 1, 'underway': 1, 'optic': 1, 'meet': 1, 'offerdownsid': 1, 'hike': 1, 'rockclimb': 1, 'freak': 1, 'secur': 1, 'costoco': 1, 'rapid': 1, 'geeat': 1, '10th': 1, 'everthey': 1, 'outyou': 1, 'wellknown': 1, 'instantli': 1, 'output': 1, 'gradual': 1, 'loss': 1, 'margin': 1, 'suffic': 1, '7in': 1, 'ined': 1, 'alight': 1, 'tore': 1, 'shear': 1, 'hmmmmpaid': 1, 'quartermi': 1, 'financi': 1, 'advisor': 1, 'wag': 1, 'chew': 1, 'yellow': 1, 'comment': 1, 'youngster': 1, 'santa': 1, 'grader': 1, 'minecraft': 1, '☺': 1, 'hve': 1, 'snap': 1, 'circuit': 1, 'kit': 1, 'applicationdevic': 1, 'claim': 1, 'goodnight': 1, 'horriblejust': 1, 'unfriendli': 1, 'sub100': 1, 'abus': 1, 'gray': 1, 'insideim': 1, 'forgiv': 1, 'feroci': 1, 'wise': 1, 'explanatori': 1, 'contrast': 1, 'alreadygood': 1, 'scope': 1, '24hr': 1, 'twin': 1, 'costli': 1, 'rip': 1, 'tear': 1, 'heavier': 1, 'touchscreen': 1, '3g': 1, 
'written': 1, 'amazonbasicsaaa': 1, 'thatll': 1, 'theyv': 1, 'util': 1, 'event': 1, 'batteryoper': 1, 'closethallway': 1, 'ident': 1, 'runningunfortun': 1, 'daylightit': 1, 'doesnot': 1, 'strain': 1, 'expland': 1, '200gb': 1, 'retina': 1, 'greenishyellowish': 1, 'tint': 1, 'photograph': 1, 'unacept': 1, 'tone': 1, 'greenish': 1, 'vulcan': 1, 'art': 1, 'acrobat': 1, 'titl': 1, 'accur': 1, 'overallstil': 1, 'amyou': 1, 'rubber': 1, 'popup': 1, 'sprout': 1, 'chubbi': 1, 'dough': 1, 'bathroom': 1, 'nightbut': 1, 'runtim': 1, '57': 1, 'yo': 1, 'teach': 1, 'gamemovi': 1, 'logitech': 1, 'wound': 1, 'won⊙t': 1, 'enhanc': 1, 'reduc': 1, 'confid': 1, 'row': 1, 'bout': 1, 'addag': 1, 'rescu': 1, 'holter': 1, 'embarrass': 1, 'reschedul': 1, 'methey': 1, '0': 1, 'matt': 1, 'superb': 1, 'yellowish': 1, 'hue': 1, 'builtin': 1, 'amazinglik': 1, 'foreign': 1, 'consult': 1, 'insitu': 1, 'phrasether': 1, 'consider1': 1, 'trade': 1, 'urg': 1, 'handson': 1, 'needwant': 1, 'with2': 1, 'minor': 1, 'evolut': 1, 'implement': 1, 'warmer': 1, 'studi': 1, 'advers': 1, 'brightwhit': 1, 'bluish': 1, 'disrupt': 1, 'melatonin': 1, 'bodi': 1, 'availablei': 1, 'king': 1, 'cross': 1, 'countri': 1, 'specialist': 1, 'minu': 1, 'useplu': 1, 'key': 1, 'spectacular': 1, 'majorli': 1, 'gear': 1, 'recarg': 1, 'catch': 1, 'seam': 1, 'tendenc': 1, 'render': 1, 'sligtli': 1, 'overs': 1, '9400': 1, 'nobodi': 1, '10pk': 1, '20pk': 1, '16993299': 1, 'fascin': 1, 'iam': 1, 'shakey': 1, 'sensit': 1, 'heavili': 1, 'ecosystem': 1, 'intend': 1, 'quirk': 1, 'amazn': 1, 'occas': 1, 'pocketbook': 1, 'error': 1, 'duh': 1, 'partnow': 1, 'sth': 1, 'york': 1, 'crossword': 1, 'dubiou': 1, 'measli': 1, 'fifti': 1, 'bunni': 1, 'drum': 1, 'monday': 1, 'sunday': 1, 'spray': 1, 'sf': 1, 'maui': 1, 'suit': 1, '253': 1, 'corrupt': 1, 'infinit': 1, 'loop': 1, 'againif': 1, 'revisit': 1, 'handsupd': 1, '112816': 1, 'bf': 1, '8yr': 1, 'repurchas': 1, 'byod': 1, 'goodi': 1, 'howto': 1, 'articl': 1, '37': 1, 'workand': 1, 'yay': 1, 
'uhhhhh': 1, 'defectlv': 1, 'roll': 1, 'straighten': 1, 'deivc': 1, 'kodak': 1, 'risk': 1, 'portion': 1, 'rock': 1, 'withstand': 1, 'nnbi': 1, 'thu': 1, 'badli': 1, 'wreck': 1, 'altogeth': 1, 'perman': 1, 'oxid': 1, 'inelig': 1, 'firefox': 1, 'sw': 1, 'deem': 1, 'competitiontoo': 1, 'observ': 1, 'secret': 1, 'prop': 1, 'everybodi': 1, 'profession': 1, 'bookshelv': 1, '4gig': 1, '1gig': 1, 'lame': 1, 'revers': 1, 'bright': 1, 'peer': 1, 'electrician': 1, 'agreei': 1, 'purs': 1, 'productexcel': 1, 'moneythank': 1, 'muchmark': 1, 'kosloski': 1, 'equat': 1, 'clicker': 1, 'profrssion': 1, 'casual': 1, 'semi': 1, 'raw': 1, 'wi': 1, 'fi': 1, 'amazingli': 1, 'themi': 1, 'whatthey': 1, 'whoop': 1, 'desktop': 1, 'prove': 1, 'candi': 1})\n",
"['dead', '3', 'day', 'put', 'ago', 'alreadi', 'go', 'name', 'brand', 'next', 'time', 'good', 'first', 'tablet', 'purchas', 'sinc', 'bought', 'grandson', 'christma', 'slow', 'consid', 'price', 'that', 'problem', 'charger', 'wont', 'charg', 'use', 'need', 'plug', 'ok', 'fit', 'perfectli', 'origin', 'kindl', 'fire', 'longer', 'wasnt', 'replac', 'ipad', 'saw', 'hd8', 'store', 'realiz', 'much', 'better', 'like', 'size', 'feel', 'read', 'book', 'prefer', 'thing', 'improv', 'old', 'one', 'plu', 'respond', 'dont', 'cant', 'photo', 'lock', 'screen', 'advertis', 'also', 'usual', 'take', 'tri', 'issu', 'leak', 'batteri', 'amazon', 'remot', 'kept', 'drain', 'within', 'minut', 'quit', 'work', 'gave', 'think', 'bad', 'later', 'open', 'case', 'type', 'keyboard', 'sever', 'sit', 'heard', 'felt', 'hit', 'right', 'hand', 'start', 'whatev', 'came', 'throw', 'away', 'never', 'trust', 'easi', 'set', 'user', 'friendli', 'suggest', 'sale', 'im', 'glad', 'took', 'great', 'speed', 'didnt', 'want', 'abl', 'clear', 'home', 'app', 'see', 'person', 'pictur', 'okay', 'option', 'avail', 'three', 'star', 'theyr', 'bewar', '2', 'week', 'amaz', 'product', 'child', 'love', 'parent', 'control', 'way', 'hold', 'definit', 'recommend', 'stuff', 'link', '1', 'click', 'buy', 'kid', 'would', 'famili', 'littl', 'learn', 'beginn', 'other', '5', 'niec', 'enjoy', 'play', 'includ', 'freetim', 'lower', 'rate', 'port', 'extrem', 'cord', 'fall', 'frustrat', 'lot', 'last', 'long', 'wast', 'aw', 'far', 'ive', '6', 'month', 'smoke', 'detector', 'period', 'well', 'year', 'without', 'money', 'perfect', 'even', '8', 'bigger', 'memori', '7', 'arriv', 'seem', 'thought', 'packag', '34', 'devic', 'disappoint', 'five', 'valu', 'certain', 'get', 'pay', 'overal', 'wonder', 'end', 'return', 'android', 'softwar', 'children', 'previou', 'decid', 'instead', 'compar', 'duracel', 'cheap', 'emerg', 'wireless', 'game', 'select', 'higher', 'qualiti', 'sell', 'prime', 'could', 'market', 'simpli', 'close', 'flashlight', 'switch', 
'frequent', 'life', 'energ', 'arent', 'strong', 'enough', '4', 'daughter', 'drop', 'limit', 'mani', 'doesnt', 'allow', 'googl', 'pretti', 'annoy', 'mine', 'tv', 'realli', 'tell', 'test', 'everi', 'singl', 'model', 'upgrad', 'he', 'happi', 'system', 'download', 'librari', 'look', 'describ', 'keep', 'freez', 'anyon', 'told', 'receiv', '20', 'howev', 'offer', 'regist', 'may', 'us', 'basic', 'experi', 'less', 'coupl', 'cost', 'save', 'worth', 'your', 'ran', 'entir', 'quickli', 'inexpens', 'fast', 'sure', 'group', 'insid', 'box', 'order', 'remov', 'two', 'mom', 'comput', 'twice', 'got', 'new', 'second', 'still', 'yet', 'refund', '15', 'volt', 'soon', 'check', 'upon', 'deliveri', 'non', 'impress', 'hard', 'connect', 'internet', 'load', 'free', 'appl', 'protect', 'best', 'ever', 'function', 'mode', 'make', 'complet', 'might', 'batch', 'support', 'buyer', 'fine', 'there', 'beat', 'expens', 'care', 'almost', 'immedi', 'pop', 'sound', 'insert', 'actual', 'bare', 'power', 'specif', 'small', 'full', '12', 'oversea', 'baught', 'figur', 'cover', 'video', 'gift', 'content', 'contact', 'said', 'although', 'ask', 'everywher', 'total', 'overpr', 'anyth', 'know', 'let', 'custom', 'special', 'defect', 'must', 'gotten', 'caus', 'alarm', 'hous', 'instal', '10', 'shelf', 'place', 'card', 'say', 'deal', 'weve', 'someon', 'smart', 'phone', 'someth', 'probabl', 'noth', 'job', 'excel', 'applic', 'top', 'come', 'term', 'longev', 'mous', 'pro', 'chang', 'constantli', 'back', 'xbox', 'hour', 'averag', 'toddler', 'toy', 'mostli', 'wife', 'spend', 'shop', 'often', 'review', 'opinion', 'havent', 'whether', '3rd', 'alway', 'leav', 'help', 'item', 'fulli', 'solid', 'write', 'pleas', 'give', 'ye', 'thank', 'expect', 'perform', 'big', 'ton', 'search', 'luck', 'pair', 'clock', 'today', 'stuck', 'difficult', 'notic', 'made', 'indonesia', 'matter', 'older', 'mayb', 'alkalin', 'aa', 'fair', 'worst', 'everyth', 'aaa', 'hope', 'find', 'updat', 'prior', 'believ', 'number', 'reason', 'amazonbas', 'differ', 
'compat', 'junk', 'half', 'absolut', 'run', 'stand', 'send', 'terribl', 'rather', 'samsung', 'galaxi', 'tab', 'handl', 'camera', 'reader', 'wouldnt', 'decent', 'stick', 'night', 'outsid', 'confus', 'warranti', 'bulki', 'wish', 'access', 'watch', 'movi', 'account', 'cheaper', 'echo', 'low', 'die', 'setup', 'oper', 'bottom', 'expand', 'requir', 'final', 'digit', 'effect', 'recharg', '90', 'none', 'suck', 'juic', 'anymor', 'school', 'age', 'granddaught', 'color', 'birthday', 'window', 'fail', 'question', 'sometim', 'went', 'trip', 'son', 'extern', 'husband', 'button', 'complic', 'navig', 'around', 'easier', '9', 'least', 'alot', 'short', 'peopl', 'found', 'logic', 'els111', '11', 'sleev', 'neopren', 'easili', 'slide', 'backpack', 'highli', 'light', 'durabl', 'side', 'note', 'stay', 'acid', 'multipl', 'cell', 'point', 'isnt', 'mess', 'damag', 'pack', 'gun', 'happen', 'sorri', 'busi', 'clean', 'design', 'complaint', 'except', 'hd', 'awesom', 'fun', 'stop', 'paid', 'fantast', 'nice', 'storag', 'addit', 'handi', 'buck', 'stock', 'dud', 'troubl', 'purpos', 'poor', 'pick', 'green', 'weak', 'add', 'wors', 'newer', 'ship', 'horribl', 'dollar', 'luckili', 'warn', 'show', 'major', 'wrap', 'four', 'rest', '36', 'doa', 'plastic', 'high', 'servic', 'touch', 'wrong', 'part', 'alexa', 'surpris', 'speaker', 'shut', 'fan', 'main', 'cabl', 'instruct', 'wifi', 'simpl', 'web', 'browser', 'music', 'etc', 'version', '25', 'ad', 'extra', 'tabl', '2nd', 'mother', 'youtub', 'huge', 'id', 'shot', 'black', 'friday', 'cool', 'begin', 'sd', 'garbag', 'regular', 'quick', 'onlin', 'forev', 'meh', 'mini', 'brows', '50', 'exactli', 'left', 'live', 'base', 'initi', 'third', 'paper', 'sturdi', 'fact', 'biggest', 'own', 'sister', 'law', 'program', 'tap', 'portabl', 'carri', 'activ', 'talk', 'netflix', 'task', 'reliabl', 'afford', 'avoid', 'similar', 'present', 'inch', 'yr', 'laptop', 'ill', 'pad', 'especi', 'email', 'ebook', 'useless', 'unfortun', 'info', 'either', 'serv', 'travel', 'featur', 'mirror', 
'display', 'costco', 'guess', '16', 'normal', 'per', 'gb', 'tie', 'surf', 'facebook', 'neg', 'eye', '100', 'electron', 'crap', 'decor', 'compani', 'young', 'break', 'step', 'done', 'tech', 'near', 'depend', 'wore', 'station', 'els', 'kind', 'burn', 'strength', 'flash', 'fresh', 'corrod', 'pass', 'member', 'super', 'date', 'previous', 'gener', 'resolut', 'respons', 'deliv', 'sent', 'youngest', 'unit', 'she', 'appear', 'os', 'piec', 'dad', '2015', 'indic', 'anoth', 'goe', 'explod', 'liter', 'edit', 'turn', 'safe', 'capabl', 'view', '30', 'starter', 'auto', 'bc', 'unless', 'thermostat', 'current', 'interest', 'string', 'eread', '48', 'led', 'kirkland', 'though', 'water', 'satisfi', 'dim', 'anywher', 'couldnt', 'properli', 'candl', 'due', 'recent', 'micro', 'wall', 'weight', 'interfac', 'lag', 'bit', 'slot', '16gb', 'faster', 'capac', 'fill', 'car', 'discount', 'mention', 'rayovac', 'zero', 'ruin', 'larg', '24', 'adult', 'stream', 'fix', 'manufactur', 'separ', 'understand', 'nearli', 'bluetooth', 'continu', 'page', 'voyag', 'condit', 'level', 'nephew', 'bumper', 'count', 'bulk', 'standard', 'babi', 'mic', 'shoot', 'trash', 'real', 'idea', 'conveni', 'white', 'broke', 'space', 'smaller', 'bag', 'equip', '2017', 'quantiti', 'lack', 'usag', 'call', 'depart', 'energi', 'larger', 'gone', '2016', 'amount', 'hate', 'along', 'past', 'hdx', 'invest', 'finger', 'result', 'futur', '2012', 'ram', 'paperwhit', '80', 'line', 'miss', 'plenti', 'worri', 'credit', 'bother', 'format', 'everyon', 'network', 'fee', 'readi', 'chanc', 'comparison', 'lighter', 'promis', 'channel', 'refurbish']\n",
"845\n"
]
}
],
"source": [
"def build_voc(documents, threshold):\n",
" counter = Counter()\n",
" for text in documents:\n",
" counter.update(text)\n",
" print(counter)\n",
" return [token for token,count in counter.items() if count > threshold]\n",
"\n",
"voc = build_voc(train[\"text\"], 5)\n",
"print(voc)\n",
"print(len(voc))"
]
},
{
"cell_type": "markdown",
"id": "e006f572",
"metadata": {
"id": "e006f572"
},
"source": [
"<a name='1.6'></a>\n",
"### 1.6 Vectorisation des données (4 points)\n",
"\n",
"À l'aide de la classe [TfidfVectorizer](https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html) de Sklearn, transformez l'ensemble de jetons en matrice de co-occurence utilisant TF-IDF.\n",
"\n",
"Utilisez le vocabulaire construit au numéro précédent dans votre matrice de co-occurrence (voir le paramètre vocabulary de TfidfVectorizer).\n",
"\n",
"**Faites attention:** Il ne faut pas entrainer (fit) la vectorisation sur l'ensemble de test"
]
},
{
"cell_type": "code",
"execution_count": 101,
"id": "54bdc303",
"metadata": {
"id": "54bdc303",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 0
},
"outputId": "d0d08e0a-e335-45f6-8134-0c5c9b0996ec"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"(1871, 845)\n"
]
}
],
"source": [
"from sklearn.feature_extraction.text import TfidfVectorizer\n",
"\n",
"TfidfVectorizer = TfidfVectorizer(vocabulary=voc)\n",
"X_train = TfidfVectorizer.fit_transform(train[\"text_original\"])\n",
"\n",
"print(X_train.shape)"
]
},
{
"cell_type": "markdown",
"id": "f00781ca",
"metadata": {
"id": "f00781ca"
},
"source": [
"<a name='2'></a>\n",
"## 2. Classification (35 points)\n",
"\n",
"Maintenant que les données sont prêtes à être utilisées par nos modèles, nous allons entrainer et tester différent types de modèles sur le jeu de données afin d'en faire la comparaison.\n",
"\n",
"Cette section sera divisé en cinq modèle:\n",
" - Modèle aléatoire (Random baseline)\n",
" - Classificateur bayésien naïf\n",
" - Régression Logistique\n",
" - Multi-Layer Perceptron (MLP)\n",
"\n",
"<a name='2.1'></a>\n",
"### 2.1 Modèle aléatoire (Random baseline) (5 points)\n",
"\n",
"Un seuil (baseline) est un modèle servant de référence et dont les performances représentent un seuil à dépasser.\n",
"\n",
"#### a) Générez ce seuil en effectuant des prédictions aléatoires parmi les valeurs 1, 3 et 5. Ensuite, affichez les mesures de performance : précision, rappel (recall) et F1. Utilisez la classe classification_report de SKlearn et affichez 4 chiffres après la virgule. (3.5 points)"
]
},
{
"cell_type": "code",
"execution_count": 101,
"id": "bc582223",
"metadata": {
"id": "bc582223"
},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": 102,
"id": "f79530a0",
"metadata": {
"scrolled": false,
"id": "f79530a0",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 0
},
"outputId": "64e6fd9c-d98f-497d-fbbc-1a3327ffc794"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
" precision recall f1-score support\n",
"\n",
" 1 0.3238 0.3064 0.3149 297\n",
" 3 0.3585 0.3596 0.3591 317\n",
" 5 0.3529 0.3701 0.3613 308\n",
"\n",
" accuracy 0.3460 922\n",
" macro avg 0.3451 0.3454 0.3451 922\n",
"weighted avg 0.3455 0.3460 0.3456 922\n",
"\n"
]
}
],
"source": [
"from sklearn.metrics import classification_report\n",
"\n",
"num_preds = len(test)\n",
"y_true = test[\"rating\"]\n",
"\n",
"def generate_pred():\n",
" p = np.random.uniform(0,1)\n",
" if p < 1/3:\n",
" n=1\n",
" elif p < 2/3:\n",
" n=3\n",
" else:\n",
" n=5\n",
" return n\n",
"\n",
"y_pred = [generate_pred() for i in range(num_preds)]\n",
"\n",
"print(classification_report(y_true, y_pred, digits=4))"
]
},
{
"cell_type": "markdown",
"id": "b7070590",
"metadata": {
"id": "b7070590"
},
"source": [
"#### b) Comment pouvez-vous expliquer le F1-score obtenu? (1.5 points)"
]
},
{
"cell_type": "markdown",
"id": "4b86cab6",
"metadata": {
"id": "4b86cab6"
},
"source": [
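 Le F1-score">
"> The F1-score is 2 * (Precision * Recall) / (Precision + Recall).\n",
"An F1-score of about 1/3 follows from the fact that a uniformly random prediction among three roughly balanced classes has a probability of 1/3 of being correct, so precision is about 1/3 for each class (and likewise for recall). With both close to 1/3, F1 is approximately 2 * (1/3 * 1/3) / (1/3 + 1/3) = 1/3."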
]
},
{
"cell_type": "markdown",
"id": "8eadb0d3",
"metadata": {
"id": "8eadb0d3"
},
"source": [
"<a name='2.2'></a>\n",
"### 2.2 Analyse et compréhension d'un classificateur bayésien naïf (NB) (22 points)\n",
"\n",
"Naive Bayes (NB) est un algorithme très simple pouvant servir de bon point de départ (baseline) pour les tâches de classification. Ce numéro portera sur l'analyse de ce modèle afin de bien comprendre son comportement."
]
},
{
"cell_type": "markdown",
"id": "0c04861f",
"metadata": {
"id": "0c04861f"
},
"source": [
"<a name='2.2.1'></a>\n",
"#### 2.2.1 Construction du modèle (4 points)\n",
"\n",
"Commencez d'abord par construire le modèle à l'aide de la classe MultinomialNB de SKlearn. Utilisez les données vectorisées produites en 1.6.\n",
"\n",
"Affichez les performances de votre classificateur (précision, recall, F1-score)."
]
},
{
"cell_type": "code",
"execution_count": 103,
"id": "c90814a4",
"metadata": {
"id": "c90814a4",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 206
},
"outputId": "4fc17486-9397-4459-bca3-a9bc45f628b3"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" title \\\n",
"214 Great Speaker, no need for the tech \n",
"2106 Really like the form factor. \n",
"312 Horrible \n",
"2410 One Star \n",
"890 Good product, just not very convenient \n",
"\n",
" text rating \\\n",
"214 [great, speaker, need, tech, great, speaker, r... 3 \n",
"2106 [realli, like, form, factor, come, nook, reade... 5 \n",
"312 [horribl, bought, daughter3, cant, even, use, ... 1 \n",
"2410 [one, star, die, almost, immedi, ruin, purpos,... 1 \n",
"890 [good, product, conveni, good, price, batteri,... 3 \n",
"\n",
" text_original token_count \\\n",
"214 Great Speaker, no need for the tech Great Spea... 18 \n",
"2106 Really like the form factor. Coming from a noo... 11 \n",
"312 Horrible Bought this for my daughter(3) and sh... 46 \n",
"2410 One Star Died almost immediately. Ruined their... 9 \n",
"890 Good product, just not very convenient Very go... 28 \n",
"\n",
" adj \n",
"214 [good] \n",
"2106 [] \n",
"312 [Horrible, slow, free, free, initial, foreign] \n",
"2410 [] \n",
"890 [Good, convenient, good, own, small, loose] "
],
"text/html": [
"\n",
" <div id=\"df-1f3966c1-27a2-4c56-9c30-6967a90f6e4a\" class=\"colab-df-container\">\n",
" <div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>title</th>\n",
" <th>text</th>\n",
" <th>rating</th>\n",
" <th>text_original</th>\n",
" <th>token_count</th>\n",
" <th>adj</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>214</th>\n",
" <td>Great Speaker, no need for the tech</td>\n",
" <td>[great, speaker, need, tech, great, speaker, r...</td>\n",
" <td>3</td>\n",
" <td>Great Speaker, no need for the tech Great Spea...</td>\n",
" <td>18</td>\n",
" <td>[good]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2106</th>\n",
" <td>Really like the form factor.</td>\n",
" <td>[realli, like, form, factor, come, nook, reade...</td>\n",
" <td>5</td>\n",
" <td>Really like the form factor. Coming from a noo...</td>\n",
" <td>11</td>\n",
" <td>[]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>312</th>\n",
" <td>Horrible</td>\n",
" <td>[horribl, bought, daughter3, cant, even, use, ...</td>\n",
" <td>1</td>\n",
" <td>Horrible Bought this for my daughter(3) and sh...</td>\n",
" <td>46</td>\n",
" <td>[Horrible, slow, free, free, initial, foreign]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2410</th>\n",
" <td>One Star</td>\n",
" <td>[one, star, die, almost, immedi, ruin, purpos,...</td>\n",
" <td>1</td>\n",
" <td>One Star Died almost immediately. Ruined their...</td>\n",
" <td>9</td>\n",
" <td>[]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>890</th>\n",
" <td>Good product, just not very convenient</td>\n",
" <td>[good, product, conveni, good, price, batteri,...</td>\n",
" <td>3</td>\n",
" <td>Good product, just not very convenient Very go...</td>\n",
" <td>28</td>\n",
" <td>[Good, convenient, good, own, small, loose]</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>\n",
" <div class=\"colab-df-buttons\">\n",
"\n",
" <div class=\"colab-df-container\">\n",
" <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-1f3966c1-27a2-4c56-9c30-6967a90f6e4a')\"\n",
" title=\"Convert this dataframe to an interactive table.\"\n",
" style=\"display:none;\">\n",
"\n",
" <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
" <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
" </svg>\n",
" </button>\n",
"\n",
" <style>\n",
" .colab-df-container {\n",
" display:flex;\n",
" gap: 12px;\n",
" }\n",
"\n",
" .colab-df-convert {\n",
" background-color: #E8F0FE;\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: #1967D2;\n",
" height: 32px;\n",
" padding: 0 0 0 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-convert:hover {\n",
" background-color: #E2EBFA;\n",
" box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: #174EA6;\n",
" }\n",
"\n",
" .colab-df-buttons div {\n",
" margin-bottom: 4px;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert {\n",
" background-color: #3B4455;\n",
" fill: #D2E3FC;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert:hover {\n",
" background-color: #434B5C;\n",
" box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
" filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
" fill: #FFFFFF;\n",
" }\n",
" </style>\n",
"\n",
" <script>\n",
" const buttonEl =\n",
" document.querySelector('#df-1f3966c1-27a2-4c56-9c30-6967a90f6e4a button.colab-df-convert');\n",
" buttonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
"\n",
" async function convertToInteractive(key) {\n",
" const element = document.querySelector('#df-1f3966c1-27a2-4c56-9c30-6967a90f6e4a');\n",
" const dataTable =\n",
" await google.colab.kernel.invokeFunction('convertToInteractive',\n",
" [key], {});\n",
" if (!dataTable) return;\n",
"\n",
" const docLinkHtml = 'Like what you see? Visit the ' +\n",
" '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
" + ' to learn more about interactive tables.';\n",
" element.innerHTML = '';\n",
" dataTable['output_type'] = 'display_data';\n",
" await google.colab.output.renderOutput(dataTable, element);\n",
" const docLink = document.createElement('div');\n",
" docLink.innerHTML = docLinkHtml;\n",
" element.appendChild(docLink);\n",
" }\n",
" </script>\n",
" </div>\n",
"\n",
"\n",
"<div id=\"df-d3902529-719f-440c-96f7-c7cb5e3d0f13\">\n",
" <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-d3902529-719f-440c-96f7-c7cb5e3d0f13')\"\n",
" title=\"Suggest charts.\"\n",
" style=\"display:none;\">\n",
"\n",
"<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
" width=\"24px\">\n",
" <g>\n",
" <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
" </g>\n",
"</svg>\n",
" </button>\n",
"\n",
"<style>\n",
" .colab-df-quickchart {\n",
" --bg-color: #E8F0FE;\n",
" --fill-color: #1967D2;\n",
" --hover-bg-color: #E2EBFA;\n",
" --hover-fill-color: #174EA6;\n",
" --disabled-fill-color: #AAA;\n",
" --disabled-bg-color: #DDD;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-quickchart {\n",
" --bg-color: #3B4455;\n",
" --fill-color: #D2E3FC;\n",
" --hover-bg-color: #434B5C;\n",
" --hover-fill-color: #FFFFFF;\n",
" --disabled-bg-color: #3B4455;\n",
" --disabled-fill-color: #666;\n",
" }\n",
"\n",
" .colab-df-quickchart {\n",
" background-color: var(--bg-color);\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: var(--fill-color);\n",
" height: 32px;\n",
" padding: 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-quickchart:hover {\n",
" background-color: var(--hover-bg-color);\n",
" box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: var(--button-hover-fill-color);\n",
" }\n",
"\n",
" .colab-df-quickchart-complete:disabled,\n",
" .colab-df-quickchart-complete:disabled:hover {\n",
" background-color: var(--disabled-bg-color);\n",
" fill: var(--disabled-fill-color);\n",
" box-shadow: none;\n",
" }\n",
"\n",
" .colab-df-spinner {\n",
" border: 2px solid var(--fill-color);\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" animation:\n",
" spin 1s steps(1) infinite;\n",
" }\n",
"\n",
" @keyframes spin {\n",
" 0% {\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" border-left-color: var(--fill-color);\n",
" }\n",
" 20% {\n",
" border-color: transparent;\n",
" border-left-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" }\n",
" 30% {\n",
" border-color: transparent;\n",
" border-left-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" border-right-color: var(--fill-color);\n",
" }\n",
" 40% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" }\n",
" 60% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" }\n",
" 80% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" border-bottom-color: var(--fill-color);\n",
" }\n",
" 90% {\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" }\n",
" }\n",
"</style>\n",
"\n",
" <script>\n",
" async function quickchart(key) {\n",
" const quickchartButtonEl =\n",
" document.querySelector('#' + key + ' button');\n",
" quickchartButtonEl.disabled = true; // To prevent multiple clicks.\n",
" quickchartButtonEl.classList.add('colab-df-spinner');\n",
" try {\n",
" const charts = await google.colab.kernel.invokeFunction(\n",
" 'suggestCharts', [key], {});\n",
" } catch (error) {\n",
" console.error('Error during call to suggestCharts:', error);\n",
" }\n",
" quickchartButtonEl.classList.remove('colab-df-spinner');\n",
" quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
" }\n",
" (() => {\n",
" let quickchartButtonEl =\n",
" document.querySelector('#df-d3902529-719f-440c-96f7-c7cb5e3d0f13 button');\n",
" quickchartButtonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
" })();\n",
" </script>\n",
"</div>\n",
" </div>\n",
" </div>\n"
]
},
"metadata": {},
"execution_count": 103
}
],
"source": [
"from sklearn.naive_bayes import MultinomialNB\n",
"test.head()"
]
},
{
"cell_type": "code",
"execution_count": 104,
"id": "6e4fecb0",
"metadata": {
"scrolled": true,
"id": "6e4fecb0",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 0
},
"outputId": "318f269f-a249-4bd4-ca6e-0f827a538866"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
" precision recall f1-score support\n",
"\n",
" 1 0.8085 0.7677 0.7876 297\n",
" 3 0.6877 0.6877 0.6877 317\n",
" 5 0.7585 0.7955 0.7765 308\n",
"\n",
" accuracy 0.7495 922\n",
" macro avg 0.7516 0.7503 0.7506 922\n",
"weighted avg 0.7503 0.7495 0.7495 922\n",
"\n"
]
}
],
"source": [
"y_train = train[\"rating\"]\n",
"clf = MultinomialNB(force_alpha=True)\n",
"clf.fit(X_train, y_train)\n",
"MultinomialNB(force_alpha=True)\n",
"\n",
"\n",
"X_test = TfidfVectorizer.transform(test[\"text_original\"])\n",
"y_pred = clf.predict(X_test)\n",
"y_true = test[\"rating\"]\n",
"\n",
"print(classification_report(y_true, y_pred, digits=4))"
]
},
{
"cell_type": "markdown",
"id": "34a784a1",
"metadata": {
"id": "34a784a1"
},
"source": [
"<a name='2.2.2'></a>\n",
"#### 2.2.2 Matrice de confusion (3 points)\n",
"\n",
"Visualisez la matrice de confusion de votre modèle en utilisant la fonction [heatmap](https://seaborn.pydata.org/generated/seaborn.heatmap.html) de seaborn. Celle-ci peut prendre en entrée une matrice de confusion comme celle fournie par la fonction [confusion_matrix](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html) dans SKLearn."
]
},
{
"cell_type": "code",
"execution_count": 105,
"id": "a115ac85",
"metadata": {
"scrolled": true,
"id": "a115ac85",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 472
},
"outputId": "a669fe1e-12c6-4052-cf4e-2b2521eff828"
},
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 640x480 with 2 Axes>"
],
"image/png": "\n"
},
"metadata": {}
}
],
"source": [
"from sklearn.metrics import confusion_matrix\n",
"from seaborn import heatmap\n",
"\n",
"labels = [1,3,5]\n",
"cm = confusion_matrix(y_true, y_pred, labels=labels)\n",
"\n",
"heatmap(cm, annot=True, cmap='Blues', cbar=True, xticklabels=labels, yticklabels=labels)\n",
"\n",
"plt.xlabel('Predicted')\n",
"plt.ylabel('True')\n",
"plt.title('Confusion Matrix')\n",
"\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "223d6c84",
"metadata": {
"id": "223d6c84"
},
"source": [
"<a name='2.2.3'></a>\n",
"#### 2.2.3 Visualisation des probabilités de NB (5 points)\n",
"\n",
"Naive Bayes est un classificateur suivant une approche générative. Durant son entraînement, il apprend les probabilités P(x_i|y). En utilisant le théorème de Bayes, on peut exprimer la probabilité d'une classe donnée y étant donné un ensemble de caractéristiques x_1, x_2, ..., x_n comme suit :\n",
"\n",
"$$ P(y|x_1, x_2, ..., x_n) = P(y) * P(x_1|y) * P(x_2|y) * ... * P(x_n|y) $$\n",
"\n",
"Ainsi, étant donné un exemple ayant le jeton x_i, plus la probabilité P(x_i|y) est élevée pour une classe, plus la probabilité que l'exemple provienne de cette classe augmente.\n",
"\n",
"Écrivez du code permettant de visualiser les jetons ayant les plus grandes probabilités selon la classe dans un graphique de type [barh](https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.barh.html). Consultez la documentation de [MultiNomialNB](https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.MultinomialNB.html) afin de trouver les probabilités P(x_i|y). Le graphique produit devrait montrer, sur l'axe des Y, les 10 jetons associés au P(x_i|y) le plus grand selon y. L'axe des X devrait représenter la valeur des probabilités.\n",
"\n",
"Ce code devra être sous forme d'une fonction où on passe la classe y en paramètre.\n"
]
},
{
"cell_type": "code",
"execution_count": 106,
"id": "eb219a46",
"metadata": {
"scrolled": false,
"id": "eb219a46",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 581
},
"outputId": "9c9b6659-34eb-45e6-fa3c-5c38ef00c439"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"845\n"
]
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 1000x600 with 1 Axes>"
],
"image/png": "\n"
},
"metadata": {}
}
],
"source": [
"import matplotlib.pyplot as plt\n",
"\n",
"def visualize_top_tokens_for_class(y):\n",
" \"\"\"\n",
" Visualize the top tokens with the highest probabilities for a given class in a barh graph.\n",
"\n",
" Parameters:\n",
" - class_index: Index of the target class for which to visualize top tokens.\n",
" \"\"\"\n",
" # Get the feature log probabilities for the specified class\n",
" feature_log_probs = clf.feature_log_prob_[y//2]\n",
"\n",
" # Get the feature names (tokens) from the vectorizer\n",
" feature_names = TfidfVectorizer.get_feature_names_out()\n",
" print(len(feature_names))\n",
"\n",
" # Sort feature names by their log probabilities in descending order\n",
" sorted_indices = np.argsort(feature_log_probs)[::-1]\n",
"\n",
" top_feature_indices = sorted_indices[:10]\n",
"\n",
" # Get the top tokens and their corresponding log probabilities\n",
" top_tokens = [feature_names[i] for i in top_feature_indices]\n",
" top_probabilities = [np.exp(feature_log_probs[i]) for i in top_feature_indices]\n",
"\n",
" # Create a horizontal bar graph to visualize the top tokens and probabilities\n",
" plt.figure(figsize=(10, 6))\n",
" plt.barh(top_tokens, top_probabilities, color='skyblue')\n",
" plt.xlabel('Probability')\n",
" plt.title(f'Top 10 Tokens for Class {y}')\n",
" plt.gca().invert_yaxis() # Invert the y-axis to display the highest probability at the top\n",
" plt.show()\n",
"\n",
"visualize_top_tokens_for_class(5)\n"
]
},
{
"cell_type": "markdown",
"id": "4600bf66",
"metadata": {
"id": "4600bf66"
},
"source": [
"Que pouvez-vous remanquer à propos des jetons affichés dans le graphique?"
]
},
{
"cell_type": "markdown",
"id": "fa61bab3",
"metadata": {
"id": "fa61bab3"
},
"source": [
"Certains sont communs aux trois classes et la plupart des top10 jetons adjectifs les plus fréquents dans la classe 5 se retrouvent ici également.\n",
"\n"
]
},
{
"cell_type": "markdown",
"id": "00579fba",
"metadata": {
"id": "00579fba"
},
"source": [
"<a name='2.2.4'></a>\n",
"#### 2.2.4 Visualisation des erreurs commises (3 points)\n",
"\n",
"Trouvez toutes les phrases dont la vraie valeur est 5 mais la valeur prédite est de 1.\n",
"\n",
"Affichez ces exemples d'une manière lisible.\n"
]
},
{
"cell_type": "code",
"execution_count": 107,
"id": "c4fef76a",
"metadata": {
"id": "c4fef76a",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 0
},
"outputId": "8bfbe6b5-bf04-4f56-99a3-3db1c346a6b0"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Text: \n",
"['reliabl', 'long', 'last', 'heard', 'one', 'consum', 'guru', 'radio', 'year', 'ago', 'say', 'gener', 'lesser', 'known', 'brand', 'batteri', 'often', 'good', 'name', 'brand', 'tri', 'pack', 'amazonbas', 'aa', 'alkalin', 'batteri', 'instead', 'usual', 'name', 'brand', 'ive', 'bought', 'sever', 'pack', 'sinc', 'although', 'havent', 'done', 'object', 'test', 'appear', 'longlast', 'reliabl', 'brand', 'ive', 'tri', 'batteri', 'instal', 'coupl', 'year', 'ago', 'still', 'charg', 'there', 'leakag', 'corros', 'ive', 'pleas', 'amazonbas', 'alkalin', 'batteri']\n",
"\n",
"Text: \n",
"['come', 'shrinkwrap', 'littl', '4pack', 'obviou', 'matter', 'one', 'brand', 'new', 'scatter', 'abou', 'least', 'expens', 'aa', 'batteri', 'could', 'find', 'come', 'shrinkwrap', 'littl', '4pack', 'obviou', 'matter', 'one', 'brand', 'new', 'scatter', 'hous', 'like', 'batteri', 'fairi', 'there', 'alway', 'fresh', 'pack', 'nearbi']\n",
"\n",
"Text: \n",
"['amazon', 'tripplea', 'topnotch', 'product', 'perfectli', 'sent', 'perfectli', 'usabl']\n",
"\n",
"Text: \n",
"['amazonbas', 'aaa', 'perform', 'alkalin', 'batteri', 'aaa', 'perform', 'alkalin', 'batteri', 'wow', 'amazonbas', 'aaa', 'perform', 'alkalin', 'batteri', 'actual', 'amazonbas', 'aaa', 'perform', 'alkalin', 'batteri', 'work', 'unbeliev']\n",
"\n",
"Text: \n",
"['good', 'everyth', 'expect', 'order', 'onlin', 'pick', 'next', 'day', 'easi']\n",
"\n",
"Text: \n",
"['great', 'packag', 'test', 'show', 'good', 'valu', 'love', 'fact', 'come', 'packag', '4', 'batteri', 'pack', 'put', 'box', 'open', 'close', 'fumbl', 'batteri']\n",
"\n",
"Text: \n",
"['aaaaa', 'fast', 'deliveri', 'aaaaa', 'fast', 'deliveri', 'describ']\n",
"\n",
"Text: \n",
"['save', 'bear', 'attack', 'one', 'day', 'camp', 'wild', 'west', 'rocki', 'mountain', 'colorado', 'carri', 'trusti', 'power', 'ranger', 'limit', 'edit', 'collector', 'sword', 'nowher', 'bear', 'attack', 'grab', 'trusti', 'sword', 'batteri', 'make', 'swoosh', 'sound', 'add', 'effect', 'swing', 'hurriedli', 'went', 'trusti', 'amazonbas', 'aa', 'batteri', 'switch', 'one', 'power', 'ranger', 'sword', 'swoosh', 'sound', 'amaz', 'escap', 'life', 'leg', 'that', 'okay', 'thank', 'amazonbas', 'aa', 'batteri', 'save', 'life']\n",
"\n",
"Text: \n",
"['grandkid', 'first', 'tablet', 'kid', 'never', 'put', 'plenti', 'app', 'keep', 'interest', 'parent', 'lock', 'keep', 'safe']\n",
"\n",
"Text: \n",
"['doesnt', 'need', '2a', 'take', '3a', 'everi', 'year', 'never', 'run']\n",
"\n",
"Text: \n",
"['longest', 'last', 'batteri', 'ever', 'longest', 'last', 'batteri', 'ive', 'ever', 'use', 'bought', 'xbox', 'one', 'control', 'use', 'hour', 'time', 'everi', 'day', 'replac', 'batteri', 'mayb', 'everi', 'coupl', 'week']\n",
"\n",
"Text: \n",
"['yay', 'amazon', 'super', 'excit', 'tri', 'amazon', 'brand', 'item', 'im', 'sure', 'glad', 'branch', 'realli', 'dislik', 'walmart', 'terribl', 'custom', 'servic', 'typic', 'purchas', 'batteri', 'bj', 'two', 'kid', 'ton', 'toy', 'amazon', 'price', 'batteri', 'beat', 'bj', '5', '6', 'depend', 'size', 'plu', 'even', 'bj', 'cant', 'get', '48', 'pack', 'aa', 'packag', 'simpl', 'fine', 'noth', 'must', 'say', 'like', 'open', 'packag', 'better', 'amazon', 'normal', 'batteri', 'hate', 'thick', 'plastic', 'hard', 'cut', 'sometim', 'sharp', 'plu', 'amazon', 'packag', 'better', 'environ', 'instal', 'batteri', 'yesterday', 'ill', 'updat', 'long', 'last']\n",
"\n",
"Text: \n",
"['work', 'well', 'cost', 'less', 'bulk', 'batteri', 'must', 'three', 'kid', 'two', 'twin', 'toddler', 'batteri', 'oper', 'toy', 'save', 'saniti', 'wish', 'toy', 'recharg', 'though', 'toy', 'compani', 'catch', 'us', 'reli', 'amazon', 'basic', 'provid', 'us', 'need', 'effici', 'cost', 'batteri', 'last', 'long', 'time', 'energen', 'batteri', 'bit', 'longer', 'last', 'suit', 'fine', 'get', 'subscript']\n",
"\n",
"Text: \n",
"['aa', 'batteri', 'batteri', 'lot', 'better', 'thought', 'go', 'actual', 'last', 'longer', 'name', 'brand', 'batteri', 'duracel', 'energ', 'definit', 'order', 'futur', 'first', 'time', 'buy', 'mani', 'batteri', 'never', 'let', 'arriv', 'time', 'problem']\n",
"\n",
"Text: \n",
"['batteri', 'appear', 'work', 'great', 'ive', 'use', 'two', 'aa', 'xbox', 'batteri', 'appear', 'work', 'great', 'ive', 'use', 'two', 'aa', 'xbox', 'one', 'control', 'month', 'get', 'sale', 'like', 'buy', 'buy', 'buy']\n",
"\n",
"Text: \n",
"['fast', 'charger', 'bought', 'charger', 'kindl', 'voyag', 'great']\n",
"\n",
"Text: \n",
"['awesom', 'still', 'run', 'strong', 'string', 'batteryoper', 'light', 'whole', 'week', '8hour', 'shift', 'dark']\n",
"\n",
"Text: \n",
"['great', 'product', 'second', 'one', 'must', 'one', 'time']\n",
"\n"
]
}
],
"source": [
"misclassified_examples = [] # List to store misclassified examples\n",
"\n",
"y_true = test[\"rating\"]\n",
"y_true = y_true.to_numpy()\n",
"\n",
"for i in range(len(y_true)):\n",
" true_rating = y_true[i]\n",
" predicted_rating = y_pred[i]\n",
" if true_rating == 5 and predicted_rating == 1:\n",
" misclassified_examples.append(test[\"text\"].iloc[i])\n",
"# Print the misclassified examples\n",
"\n",
"for example in misclassified_examples:\n",
" print(\"Text: \")\n",
" print(example)\n",
" print(\"\")"
]
},
{
"cell_type": "markdown",
"id": "cffb9d00",
"metadata": {
"id": "cffb9d00"
},
"source": [
"<a name='2.2.5'></a>\n",
"#### 2.2.5 Analyse d'erreurs commises (7 points)"
]
},
{
"cell_type": "markdown",
"id": "204c7fff",
"metadata": {
"id": "204c7fff"
},
"source": [
"Complétez la fonction plot_example qui:\n",
" - Prend en entrée une liste de jetons provenant d'un exemple.\n",
" - Produit un graphique qui pour chaque jeton, affiche la valeur P(x_i|y=5) et P(x_i|y=1)\n",
" \n",
"**Pour vous faciliter le travail, utiliser barh de pandas et non de matplotlib**: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.barh.html\n",
"\n",
"\n",
"#### a) Exécutez votre fonction avec une phrase au choix dont la vraie valeur est 5 mais la valeur prédite est de 1. (4 points)"
]
},
{
"cell_type": "code",
"execution_count": 108,
"id": "0795de9d",
"metadata": {
"id": "0795de9d"
},
"outputs": [],
"source": [
"def plot_example(tokens):\n",
" feature_names = TfidfVectorizer.get_feature_names_out()\n",
" feature_log_probs_1 = clf.feature_log_prob_[1//2]\n",
" feature_log_probs_5 = clf.feature_log_prob_[5//2]\n",
"\n",
" indices = []\n",
" for i in range(len(feature_names)):\n",
" token = feature_names[i]\n",
" if token in tokens: # ///////////////\n",
" indices += [i]\n",
" probs_1 = [np.exp(feature_log_probs_1[i]) for i in indices]\n",
" probs_5 = [np.exp(feature_log_probs_5[i]) for i in indices]\n",
"\n",
" df = pd.DataFrame({'Token': [feature_names[i] for i in indices], 'P(x_i|y=1)': probs_1, 'P(x_i|y=5)': probs_5})\n",
"\n",
" df.set_index('Token').plot.barh(figsize=(20, 16))\n",
"\n",
" # Customize the plot\n",
" plt.xlabel('Probability')\n",
" plt.title('Token Probabilities for Classes 1 and 5')\n",
" plt.gca().invert_yaxis() # Invert the y-axis to display the highest probability at the top\n",
" plt.show()\n",
"\n",
"tokens1 = ['power', 'craziest', 'thing', 'put', 'stuff', 'need', 'batteri', 'thing', 'work', 'great', 'price', 'quantiti']\n",
"tokens2 = ['batteri', 'day', 'price', 'buy', 'buy', '8pack', 'store', 'pay', 'similar', 'lifespan', 'havent', 'notic', 'differencewhen', 'retir', '30', 'year', 'ill', 'probabl', 'buy', 'case', 'get', 'last', 'year', 'last']\n",
"tokens3 = ['batteri', 'appear', 'work', 'great', 'ive', 'use', 'two', 'aa', 'xbox', 'batteri', 'appear', 'work', 'great', 'ive', 'use', 'two', 'aa', 'xbox', 'one', 'control', 'month', 'get', 'sale', 'like', 'buy', 'buy', 'buy']\n"
]
},
{
"cell_type": "code",
"source": [
"plot_example(tokens1)\n",
"plot_example(tokens2)\n",
"plot_example(tokens3)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 3951
},
"id": "B7_JglhL9zlL",
"outputId": "786d6cce-5d75-46ac-d755-1db34f7ae81e"
},
"id": "B7_JglhL9zlL",
"execution_count": 109,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 2000x1600 with 1 Axes>"
],
"image/png": "iVBORw0KGgoAAAANSUhEUgAABnUAAAUlCAYAAADGB0OvAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/bCgiHAAAACXBIWXMAAA9hAAAPYQGoP6dpAACII0lEQVR4nOzde9zXg/3/8eenrq6r43XFKmQpckjGtwg5lpgQxr7WmKGRc6Nh5jBT2JyZw5jDl+SwDZvjclZG+2JGZjk15Jg5zZWUoj6/P/xcX5cOuurK5V33++32ud36vI+v9/W52m3rsff7UyqXy+UAAAAAAADwtdasqQcAAAAAAADgy4k6AAAAAAAABSDqAAAAAAAAFICoAwAAAAAAUACiDgAAAAAAQAGIOgAAAAAAAAUg6gAAAAAAABSAqAMAAAAAAFAAog4AAAAAAEABiDoAANCISqVShg0b1tRjfKVGjRqVUqmUxx57rNGO2b9//3zrW9/60u0mT56cUqmUUaNG1S0bMWJESqVSve26deuWIUOGLPS5+/fv34BpF98nn3ySo48+Ol26dEmzZs2yyy67fKXnX5Bx48alVCpl3LhxTT3KMu+zv2uTJ09u6lEAAGgiog4AAMu8Uqm0UK+i/aN2//79682//PLLZ8MNN8wVV1yROXPmNPV4Terpp5/OiBEjvjb/OH7FFVfkzDPPzG677ZarrroqP/nJT76S8950003Zfvvt06FDh1RWVqZz584ZPHhw7r///q/k/F8Xjz76aA455JBssMEGadGixVxRsGg+iz/zer355ptNPR4AAIuhoqkHAACApnb11VfXez969Ojcc889cy1fe+21v8qxGsU3v/nNnHrqqUmSt99+O6NHj85+++2X559/PqeddloTT7f4unbtmhkzZqRFixYL3O65555Ls2b/9/9pe/rppzNy5Mj0798/3bp1q7ft3XffvSRGXaD7778/K6+8cs4999yv5Hzlcjn77rtvRo0ald69e+eII47IiiuumClTpuSmm27K1ltvnfHjx2fTTTf9SuZpamPGjMnll1+e9dZbL6uttlqef/75ph6pUZx00klZddVV6y1r37590wwDAECjEHUAAFjm/fCHP6z3/uGHH84999wz1/IiqqmpqXcdBx54YNZaa61ceOGFOfnkk+cZQ+bMmZNZs2alZcuWX+Woi6RUKi3UnFVVVQt9zMrKysUZaZG89dZbjfqP7V/2GZ599tkZNWpUhg8fnnPOOafenSnHH398rr766lRULDv/c/Hggw/Oz372s7Rq1SrDhg1baqLO9ttvnz59+jT1GAAANCKPXwMAgIXw4Ycf5sgjj0yXLl1SVVWVtdZaK2eddVbK5fKX7nvKKaekWbNmueCCC+qW3XHHHdliiy3Spk2btGvXLoMGDcrEiRPr7TdkyJC0bds2r7/+enbZZZe0bds2HTt2zFFHHZXZs2cv0nW0bt06ffv2zYcffpi33347yf99D9C1116bddZZJ1VVVbnzzjuTJE888US23377VFdXp23bttl6663z8MMPz/PY06dPz4EHHphvfOMbqa6uzt57753//Oc/9ba55ZZbMmjQoHTu3DlVVVXp3r17Tj755Plez9///vdsuummadWqVVZdddX89re/rbd+Xt+pMy+f/06dUaNG5Xvf+16SZKuttprr8Xrz+k6dmTNn5sQTT8zqq6+eqqqqdOnSJUcffXRmzpxZb7t77rknm2++edq3b5+2bdtmrbXWynHHHTffuT6bf+zYsZk4ceJcsyzs792CPsMvmjFjRk499dT06NEjZ5111jwfNbbXXntlo402mu/cDz74YL73ve9llVVWqft5/OQnP8mMGTPqbffmm2/mRz/6Ub75zW+mqqoqK620Ur7zne/Ue+zdY489loEDB6ZDhw51n/O+++5b7zhz5szJr3/966yzzjpp2bJlVlhhhRx44IFz/X4tzLHmZYUVVkirVq2+dL
v5ufLKKzNgwIB06tQpVVVV6dmzZy6++OK5tuvWrVt23HHHPPTQQ9loo43SsmXLrLbaahk9evRc206cODEDBgxIq1at8s1vfjOnnHLKIj028YMPPljk/7wAAODrZ9n5v14BAMAiKpfL2XnnnTN27Njst99+6dWrV+6666789Kc/zeuvv77AR2b9/Oc/z69+9atccskl2X///ZN8+ri3ffbZJwMHDszpp5+e6dOn5+KLL87mm2+eJ554ot7jwGbPnp2BAwdm4403zllnnZV77703Z599drp3756DDz54ka7nxRdfTPPmzevdGXL//ffn+uuvz7Bhw9KhQ4d069YtEydOzBZbbJHq6uocffTRadGiRS655JL0798/DzzwQDbeeON6xx02bFjat2+fESNG5LnnnsvFF1+cl19+OePGjasLB6NGjUrbtm1zxBFHpG3btrn//vvzi1/8IlOnTs2ZZ55Z73j/+c9/ssMOO2Tw4MHZY489cv311+fggw9OZWXlQv1D/fxsueWWOeyww3L++efnuOOOq3us3vwerzdnzpzsvPPOeeihh3LAAQdk7bXXzlNPPZVzzz03zz//fG6++eYkn/4j/I477pj11lsvJ510UqqqqvKvf/0r48ePn+8sHTt2zNVXX51f/vKXmTZtWt2j8tZee+0G/97N6zOcl4ceeijvvfdehg8fnubNmzfwp/epG264IdOnT8/BBx+cb3zjG3n00UdzwQUX5LXXXssNN9xQt91///d/Z+LEifnxj3+cbt265a233so999yTV155pe79tttum44dO+aYY45J+/btM3ny5PzpT3+qd74DDzwwo0aNyo9+9KMcdthheemll3LhhRfmiSeeyPjx49OiRYuFPtaScPHFF2edddbJzjvvnIqKitx222055JBDMmfOnBx66KH1tv3Xv/6V3XbbLfvtt1/22WefXHHFFRkyZEg22GCDrLPOOkk+jWFbbbVVPvnkkxxzzDFp06ZNLr300gaHp6222irTpk1LZWVlBg4cmLPPPjtrrLFGo103AABNoAwAANRz6KGHlj//X5VvvvnmcpLyKaecUm+73XbbrVwqlcr/+te/6pYlKR966KHlcrlcPvLII8vNmjUrjxo1qm79Bx98UG7fvn15//33r3esN998s1xTU1Nv+T777FNOUj7ppJPqbdu7d+/yBhts8KXX0a9fv3KPHj3Kb7/9dvntt98uP/PMM+XDDjusnKS800471Zu5WbNm5YkTJ9bbf5dddilXVlaWX3jhhbplb7zxRrldu3blLbfcsm7ZlVdeWU5S3mCDDcqzZs2qW37GGWeUk5RvueWWumXTp0+fa84DDzyw3Lp16/JHH31Ub/Yk5bPPPrtu2cyZM8u9evUqd+rUqe48L730UjlJ+corr6zb7sQTTyx/8X/qdO3atbzPPvvUvb/hhhvKScpjx46d58+tX79+de+vvvrqcrNmzcoPPvhgve1++9vflpOUx48fXy6Xy+Vzzz23nKT89ttvz3XML9OvX7/yOuusU29ZQ3/v5vUZzst5551XTlK+6aabFmq2sWPHzvWzmtfneOqpp5ZLpVL55ZdfLpfL5fJ//vOfcpLymWeeOd9j33TTTeUk5b/97W/z3ebBBx8sJylfe+219Zbfeeed9ZYvzLEWxhf//i+Mef08Bg4cWF5ttdXqLevatWs5Sfkvf/lL3bK33nqrXFVVVT7yyCPrlg0fPrycpPzII4/U266mpqacpPzSSy8tcJ4//OEP5SFDhpSvuuqq8k033VT++c9/Xm7dunW5Q4cO5VdeeaVB1wYAwNeLx68BAMCXGDNmTJo3b57DDjus3vIjjzwy5XI5d9xxR73l5XI5w4YNy3nnnZdrrrkm++yzT926e+65J++//3722GOPvPPOO3Wv5s2bZ+ONN87YsWPnOv9BBx1U7/0WW2yRF198caFmf/bZZ9OxY8d07Ngxa6+9di644IIMGjQoV1xxRb3t+vXrl549e9
a9nz17du6+++7ssssuWW211eqWr7TSSvnBD36Qhx56KFOnTq13jAMOOKDed/QcfPDBqaioyJgxY+qWff5Ogw8++CDvvPNOtthii0yfPj3PPvtsveNVVFTkwAMPrHtfWVmZAw88MG+99Vb+/ve/L9T1N4Ybbrgha6+9dnr06FHvMxswYECS1H1mn935dMsttyzSY7K+qKG/d1/8DOfns8+tXbt2izzb5z/HDz/8MO+880423XTTlMvlPPHEE3XbVFZWZty4cXM9Ju0zn/3Mbr/99nz88cfz3OaGG25ITU1Nvv3tb9f7+W+wwQZp27btXD//BR1rSfn8z6O2tjbvvPNO+vXrlxdffDG1tbX1tu3Zs2e22GKLuvcdO3bMWmutVe/v9JgxY9K3b996j8Dr2LFj9txzz4WaZ/Dgwbnyyiuz9957Z5dddsnJJ5+cu+66K++++25++ctfLuplAgDwNSDqAADAl3j55ZfTuXPnuf4R/LPHdb388sv1lo8ePTq/+c1vcsEFF2SPPfaot27SpElJkgEDBtTFls9ed999d956661627ds2TIdO3ast2y55Zab7z+Sf1G3bt1yzz335N57781DDz2UN998M7fffns6dOhQb7tVV1213vu3334706dPz1prrTXXMddee+3MmTMnr776ar3lX3ysU9u2bbPSSivV+/6UiRMnZtddd01NTU2qq6vTsWPH/PCHP0ySuf7xu3PnzmnTpk29ZWuuuWaS1DvmkjZp0qRMnDhxrs/rs1k++8y+//3vZ7PNNsvQoUOzwgorZPfdd8/111+/yIGnob93X/wM56e6ujrJp1FtUb3yyisZMmRIll9++brveurXr1+S//scq6qqcvrpp+eOO+7ICiuskC233DJnnHFG3nzzzbrj9OvXL//93/+dkSNHpkOHDvnOd76TK6+8st53FU2aNCm1tbXp1KnTXJ/BtGnT6n7+C3OsJWX8+PHZZptt0qZNm7Rv3z4dO3as+y6lL/5er7LKKnPt/8W/0y+//PI8H5M2r7+PC2vzzTfPxhtvnHvvvXeRjwEAQNPznToAANDINttss0yYMCEXXnhhBg8enOWXX75u3Wf/wH/11VdnxRVXnGvfior6/xV9Ub/z5DNt2rTJNtts86XbLc6XxC+s999/P/369Ut1dXVOOumkdO/ePS1btszjjz+en/3sZ41yd8uSMGfOnKy77ro555xz5rm+S5cuST79Gf7lL3/J2LFj8+c//zl33nln/vCHP2TAgAG5++67F/uz/DIL+xn26NEjSfLUU09ll112afB5Zs+enW9/+9t577338rOf/Sw9evRImzZt8vrrr2fIkCH1Psfhw4dnp512ys0335y77rorJ5xwQk499dTcf//96d27d0qlUm688cY8/PDDue2223LXXXdl3333zdlnn52HH344bdu2zZw5c9KpU6dce+2185zns+i5MMdaEl544YVsvfXW6dGjR84555x06dIllZWVGTNmTM4999y5fq/n93tQLpeXyHyf16VLlzz33HNL/DwAACw5og4AAHyJrl275t57780HH3xQ766Jzx4X1rVr13rbr7766jnjjDPSv3//bLfddrnvvvvq9uvevXuSpFOnTgsVW5pKx44d07p163n+A/Czzz6bZs2a1cWMz0yaNClbbbVV3ftp06ZlypQp2WGHHZIk48aNy7vvvps//elP2XLLLeu2e+mll+Y5wxtvvJEPP/yw3t06zz//fJJP70BaHKVSaaG37d69e5588slsvfXWX7pfs2bNsvXWW2frrbfOOeeck1/96lc5/vjjM3bs2AZ/3g39vVtYm2++eZZbbrn87ne/y3HHHdfg2PTUU0/l+eefz1VXXZW99967bvk999wzz+27d++eI488MkceeWQmTZqUXr165eyzz84111xTt03fvn3Tt2/f/PKXv8x1112XPffcM7///e8zdOjQdO/ePffee28222yzhQpXCzrWknDbbbdl5s
yZufXWW+vdhTOvRykurK5du9bd1fd5ixtkXnzxxbnu/AMAoFg8fg0AAL7EDjvskNmzZ+fCCy+st/zcc89NqVTK9ttvP9c+6623XsaMGZNnnnkmO+20U2bMmJEkGThwYKqrq/OrX/1qnt/78fbbby+Zi2ig5s2bZ9ttt80tt9xS71Fn//73v3Pddddl8803r3uM12cuvfTSetd08cUX55NPPqn7+XwWDz5/R8KsWbNy0UUXzXOGTz75JJdcckm9bS+55JJ07NgxG2ywwWJd32eh6P333//SbQcPHpzXX389l1122VzrZsyYkQ8//DBJ8t577821vlevXkmySI8AW5Tfu4XRunXr/OxnP8szzzyTn/3sZ/O8Q+Saa67Jo48+Os/95/U5lsvlnHfeefW2mz59ej766KN6y7p375527drV/Tz+85//zHX+L/7MBg8enNmzZ+fkk0+ea5ZPPvmk7jNcmGMtCfP6edTW1ubKK69c5GPusMMOefjhh+t9Bm+//fZ871b6onn958iYMWPy97//Pdttt90izwUAQNNzpw4AAHyJnXbaKVtttVWOP/74TJ48Of/1X/+Vu+++O7fcckuGDx9ed/fNF/Xt2ze33HJLdthhh+y22265+eabU11dnYsvvjh77bVX1l9//ey+++7p2LFjXnnllfz5z3/OZpttNtc/4jeVU045Jffcc08233zzHHLIIamoqMgll1ySmTNn5owzzphr+1mzZmXrrbfO4MGD89xzz+Wiiy7K5ptvnp133jlJsummm2a55ZbLPvvsk8MOOyylUilXX331fB871blz55x++umZPHly1lxzzfzhD3/IhAkTcumll6ZFixaLdW29evVK8+bNc/rpp6e2tjZVVVUZMGBAOnXqNNe2e+21V66//vocdNBBGTt2bDbbbLPMnj07zz77bK6//vrcdddd6dOnT0466aT85S9/yaBBg9K1a9e89dZbueiii/LNb34zm2++eYNnXNTfu4Xx05/+NBMnTszZZ5+dsWPHZrfddsuKK66YN998MzfffHMeffTR/PWvf53nvj169Ej37t1z1FFH5fXXX091dXX++Mc/zvU9T88//3zd70PPnj1TUVGRm266Kf/+97+z++67J0muuuqqXHTRRdl1113TvXv3fPDBB7nssstSXV1dd4dXv379cuCBB+bUU0/NhAkTsu2226ZFixaZNGlSbrjhhpx33nnZbbfdFupY8/Pyyy/n6quvTpI89thjST79/U8+vWtmr732mu++2267bSorK7PTTjvlwAMPzLRp03LZZZelU6dOmTJlykJ8GnM7+uijc/XVV2e77bbL4YcfnjZt2uTSSy9N165d849//ONL9990003Tu3fv9OnTJzU1NXn88cdzxRVXpEuXLnXf9QMAQEGVAQCAeg499NDyF/+r8gcffFD+yU9+Uu7cuXO5RYsW5TXWWKN85plnlufMmVNvuyTlQw89tN6yW265pVxRUVH+/ve/X549e3a5XC6Xx44dWx44cGC5pqam3LJly3L37t3LQ4YMKT/22GN1++2zzz7lNm3azDXfiSeeONd889KvX7/yOuus86XbzWvmzzz++OPlgQMHltu2bVtu3bp1eauttir/9a9/rbfNlVdeWU5SfuCBB8oHHHBAebnlliu3bdu2vOeee5bffffdetuOHz++3Ldv33KrVq3KnTt3Lh999NHlu+66q5ykPHbs2Llmf+yxx8qbbLJJuWXLluWuXbuWL7zwwnrHe+mll8pJyldeeWXdsnn9fLp27VreZ5996i277LLLyquttlq5efPm9c7fr1+/cr9+/eptO2vWrPLpp59eXmeddcpVVVXl5ZZbrrzBBhuUR44cWa6trS2Xy+XyfffdV/7Od75T7ty5c7mysrLcuXPn8h577FF+/vnn5/mz/bz5fVaL83u3MG688cbytttuW15++eXLFRUV5ZVWWqn8/e9/vzxu3Li6bcaOHTvX5/P000+Xt9lmm3
Lbtm3LHTp0KO+///7lJ598st5n8c4775QPPfTQco8ePcpt2rQp19TUlDfeeOPy9ddfX3ecxx9/vLzHHnuUV1lllXJVVVW5U6dO5R133LHe34PPXHrppeUNNtig3KpVq3K7du3K6667bvnoo48uv/HGGw0+1hd9do3zen3xd2Febr311vJ6661XbtmyZblbt27l008/vXzFFVeUk5Rfeumluu26du1aHjRo0Fz7z+t37h//+Ee5X79+5ZYtW5ZXXnnl8sknn1z+n//5n7mOOS/HH398uVevXuWamppyixYtyqusskr54IMPLr/55ptfei0AAHy9lcrlr+DbGAEAAAAAAFgsvlMHAAAAAACgAEQdAAAAAACAAhB1AAAAAAAACkDUAQAAAAAAKABRBwAAAAAAoABEHQAAAAAAgAKoaOoBljVz5szJG2+8kXbt2qVUKjX1OAAAAAAAQBMql8v54IMP0rlz5zRrtuB7cUSdr9gbb7yRLl26NPUYAAAAAADA18irr76ab37zmwvcRtT5irVr1y7Jpx9OdXV1E08DAAAAAAA0palTp6ZLly51/WBBRJ2v2GePXKuurhZ1AAAAAACAJFmor2xZ8MPZAAAAAAAA+FoQdQAAAAAAAApA1AEAAAAAACgA36kDAAAAAABNrFwu55NPPsns2bObehQaWfPmzVNRUbFQ35nzZUQdAAAAAABoQrNmzcqUKVMyffr0ph6FJaR169ZZaaWVUllZuVjHEXUAAAAAAKCJzJkzJy+99FKaN2+ezp07p7KyslHu6ODroVwuZ9asWXn77bfz0ksvZY011kizZov+zTiiDgAAAAAANJFZs2Zlzpw56dKlS1q3bt3U47AEtGrVKi1atMjLL7+cWbNmpWXLlot8rEXPQQAAAAAAQKNYnLs3+PprrM/XbwkAAAAAAEABiDoAAAAAAAAF4Dt1AAAAAADga6jbMX/+Ss83+bRBjX7MvfbaK2uvvXaOO+64xT7WkCFD8v777+fmm29OkowbNy5DhgzJ5MmTF/vYi+rOO+/MMccck8cff/wreYSeO3UAAAAAAIAGGzJkSEqlUkqlUiorK7P66qvnpJNOyieffJIkefLJJzNmzJgcdthhjXK+8847L6NGjWqUYy2sww47LBtssEGqqqrSq1evudZvt912adGiRa699tqvZB5RBwAAAAAAWCTbbbddpkyZkkmTJuXII4/MiBEjcuaZZyZJLrjggnzve99L27ZtG+VcNTU1ad++faMcqyH23XfffP/735/v+iFDhuT888//SmYRdQAAAAAAgEVSVVWVFVdcMV27ds3BBx+cbbbZJrfeemtmz56dG2+8MTvttFPdts8++2xat26d6667rm7Z9ddfn1atWuXpp5/+0nMNGTIku+yyy3zXDxgwIMOGDau37O23305lZWXuu+++hl9ckvPPPz+HHnpoVltttflus9NOO+Wxxx7LCy+8sEjnaAhRBwAAAAAAaBStWrXKrFmz8o9//CO1tbXp06dP3boePXrkrLPOyiGHHJJXXnklr732Wg466KCcfvrp6dmz52Kfe+jQobnuuusyc+bMumXXXHNNVl555QwYMCBJctBBB6Vt27YLfDXUKquskhVWWCEPPvjgYl/Dl6lY4mcAAAAAAACWauVyOffdd1/uuuuu/PjHP87LL7+c5s2bp1OnTvW2O+SQQzJmzJj88Ic/TGVlZTbccMP8+Mc/bpQZvvvd72bYsGG55ZZbMnjw4CTJqFGj6r77J0lOOumkHHXUUY1yvs/r3LlzXn755UY/7heJOgAAAAAAwCK5/fbb07Zt23z88ceZM2dOfvCDH2TEiBG59dZbU1VVVRdTPu+KK67ImmuumWbNmmXixInz3GZRtGzZMnvttVeuuOKKDB48OI8//nj++c9/5tZbb63bplOnTnOFpsbQqlWrTJ8+vdGP+0WiDgAAAAAAsEi22mqrXHzxxamsrEznzp1TUfFpdujQoUOmT5+eWbNmpb
Kyst4+Tz75ZD788MM0a9YsU6ZMyUorrdRo8wwdOjS9evXKa6+9liuvvDIDBgxI165d69YfdNBBueaaaxZ4jGnTpjX4vO+99146duzY4P0aStQBAAAAAAAWSZs2bbL66qvPtbxXr15Jkqeffrruz8mn8WPIkCE5/vjjM2XKlOy55555/PHH06pVq0aZZ911102fPn1y2WWX5brrrsuFF15Yb/2SePzaRx99lBdeeCG9e/du1OPOi6gDAAAAAAA0qo4dO2b99dfPQw89VC/qHHTQQenSpUt+/vOfZ+bMmendu3eOOuqo/OY3v2m0cw8dOjTDhg1LmzZtsuuuu9Zb19DHr/3rX//KtGnT8uabb2bGjBmZMGFCkqRnz551dyA9/PDDqaqqyiabbNJo1zA/og4AAAAAAHwNTT5tUFOPsFiGDh2a0aNHZ9iwYUmS0aNHZ8yYMXniiSdSUVGRioqKXHPNNdl8882z4447Zvvtt2+U8+6xxx4ZPnx49thjj7Rs2XKxjjV06NA88MADde8/uxvnpZdeSrdu3ZIkv/vd77LnnnumdevWi3WuhVEql8vlJX4W6kydOjU1NTWpra1NdXV1U48DAAAAAEAT+uijj/LSSy9l1VVXXewA8XUzY8aMrLXWWvnDH/6wRO5iGTduXIYMGZLJkyfXWz558uR07949f/vb37L++us3+nk/75133slaa62Vxx57LKuuuup8t1vQ59yQbuBOHQAAAAAAoNG1atUqo0ePzjvvvPOVnO/jjz/Ou+++m5///Ofp27fvEg86yacB6aKLLlpg0GlMog4AAAAAALBE9O/ff6G3bdu27XzX3XHHHdliiy0WuP/48eOz1VZbZc0118yNN9640OddHH369EmfPn2+knMlog4AAAAAAPA1MGHChPmuW3nlleda1q1btwwfPrzuff/+/bO0f+OMqAMAAAAAADS51VdfvUHbfzHqLAuaNfUAAAAAAAAAfDlRBwAAAAAAoAA8fq2JfOvEu9KsqnVTjwGLZfJpg5p6BAAAAACAZYY7dQAAAAAAAApA1AEAAAAAACgAUQcAAAAAAKAAfKcOAAAAAAB8HY2o+YrPV9voh9xrr72y9tpr57jjjlvsYw0ZMiTvv/9+br755iTJuHHjMmTIkEyePHmxj72ofvvb3+bPf/5zbrvttq/kfO7UAQAAAAAAGmzIkCEplUoplUqprKzM6quvnpNOOimffPJJkuTJJ5/MmDFjcthhhzXK+c4777yMGjWqUY61sD67vs+/fv/739et33ffffP444/nwQcf/ErmcacOAAAAAACwSLbbbrtceeWVmTlzZsaMGZNDDz00LVq0yLHHHpsLLrgg3/ve99K2bdtGOVdNzVd859L/d+WVV2a77bare9++ffu6P1dWVuYHP/hBzj///GyxxRZLfBZ36gAAAAAAAIukqqoqK664Yrp27ZqDDz4422yzTW699dbMnj07N954Y3baaae6bZ999tm0bt061113Xd2y66+/Pq1atcrTTz/9pecaMmRIdtlll/muHzBgQIYNG1Zv2dtvv53Kysrcd999Db+4/699+/ZZccUV614tW7ast36nnXbKrbfemhkzZizyORaWqAMAAAAAADSKVq1aZdasWfnHP/6R2tra9OnTp25djx49ctZZZ+WQQw7JK6+8ktdeey0HHXRQTj/99PTs2XOxzz106NBcd911mTlzZt2ya665JiuvvHIGDBiQJDnooIPStm3bBb6+6NBDD02HDh2y0UYb5Yorrki5XK63vk+fPvnkk0/yyCOPLPY1fBmPXwMAAAAAABZLuVzOfffdl7vuuis//vGP8/LLL6d58+bp1KlTve0OOeSQjBkzJj/84Q9TWVmZDTfcMD/+8Y8bZYbvfve7GTZsWG655ZYMHjw4STJq1Ki67/5JkpNOOilHHXXUQh/zpJNOyoABA9K6devcfffdOeSQQzJt2rR63xPUunXr1NTU5OWXX26U61gQUQcAAA
AAAFgkt99+e9q2bZuPP/44c+bMyQ9+8IOMGDEit956a6qqqupiyuddccUVWXPNNdOsWbNMnDhxntssipYtW2avvfbKFVdckcGDB+fxxx/PP//5z9x6661123Tq1Gmu0LQgJ5xwQt2fe/funQ8//DBnnnlmvaiTfHqH0vTp0xf/Ir6Ex68BAAAAAACLZKuttsqECRMyadKkzJgxI1dddVXatGmTDh06ZPr06Zk1a9Zc+zz55JP58MMP8+GHH2bKlCmNOs/QoUNzzz335LXXXsuVV16ZAQMGpGvXrnXrF+Xxa5+38cYb57XXXqv3iLckee+999KxY8dGvZZ5cacOAAAAAACwSNq0aZPVV199ruW9evVKkjz99NN1f04+jR9DhgzJ8ccfnylTpmTPPffM448/nlatWjXKPOuuu2769OmTyy67LNddd10uvPDCeusb+vi1L5owYUKWW265VFVV1S174YUX8tFHH6V3796LfNyFJeoAAAAAAACNqmPHjll//fXz0EMP1Ys6Bx10ULp06ZKf//znmTlzZnr37p2jjjoqv/nNbxrt3EOHDs2wYcPSpk2b7LrrrvXWNeTxa7fddlv+/e9/p2/fvmnZsmXuueee/OpXv5orCj344INZbbXV0r1790a7hvkRdQAAAAAA4OtoRG1TT7BYhg4dmtGjR2fYsGFJktGjR2fMmDF54oknUlFRkYqKilxzzTXZfPPNs+OOO2b77bdvlPPuscceGT58ePbYY4+0bNlykY/TokWL/OY3v8lPfvKTlMvlrL766jnnnHOy//7719vud7/73VzLlpRSuVwufyVnIkkyderU1NTUpMvw69OsqnVTjwOLZfJpg5p6BAAAAAAotI8++igvvfRSVl111cUKEF9HM2bMyFprrZU//OEP2WSTTRr9+OPGjcuQIUMyefLkessnT56c7t27529/+1vWX3/9Rj/v502cODEDBgzI888/n5qamvlut6DP+bNuUFtbm+rq6gWez506AAAAAABAo2vVqlVGjx6dd9555ys538cff5x33303P//5z9O3b98lHnSSZMqUKRk9evQCg05jEnUAAAAAAIAlon///gu9bdu2bee77o477sgWW2yxwP3Hjx+frbbaKmuuuWZuvPHGhT7v4thmm22+kvN8RtQBAAAAAACa3IQJE+a7buWVV55rWbdu3TJ8+PC69/3798/S/o0zog4AAAAAANDkVl999QZt/8Wosyxo1tQDAAAAAADAsm5pv8NkWddYn6+oAwAAAAAATaRFixZJkunTpzfxJCxJn32+n33ei8rj1wAAAAAAoIk0b9487du3z1tvvZUkad26dUqlUhNPRWMpl8uZPn163nrrrbRv3z7NmzdfrOOJOgAAAAAA0IRWXHHFJKkLOyx92rdvX/c5Lw5RBwAAAAAAmlCpVMpKK62UTp065eOPP27qcWhkLVq0WOw7dD4j6gAAAAAAwNdA8+bNG+0f/1k6NWvqAYpo8uTJKZVKmTBhQlOPAgAAAAAALCNEHQAAAAAAgAJYJqNO//79M2zYsAwbNiw1NTXp0KFDTjjhhJTL5SSfPr/w5ptvrrdP+/btM2rUqCTJqquumiTp3bt3SqVS+vfv/xVODwAAAAAALIuWyaiTJFdddVUqKiry6KOP5rzzzss555yTyy+/fKH2ffTRR5Mk9957b6ZMmZI//elP89125syZmTp1ar0XAAAAAABAQ1U09QBNpUuXLjn33HNTKpWy1lpr5amnnsq5556b/fff/0v37dixY5LkG9/4RlZcccUFbnvqqadm5MiRcy3/Z8v9Ul1VWrThl7QRtU09AQAAAAAA8AXL7J06ffv2Tan0f1Flk002yaRJkzJ79uxGPc+xxx6b2trauterr77aqMcHAAAAAACWDcvsnToLUiqV6r5f5zMff/zxIh2rqqoqVVVVjTEWAAAAAACwDFtm79R55JFH6r1/+OGHs8Yaa6R58+bp2LFjpkyZUrdu0qRJmT
59et37ysrKJGn0u3oAAAAAAADmZ5mNOq+88kqOOOKIPPfcc/nd736XCy64IIcffniSZMCAAbnwwgvzxBNP5LHHHstBBx2UFi1a1O3bqVOntGrVKnfeeWf+/e9/p7bWd9AAAAAAAABL1jIbdfbee+/MmDEjG220UQ499NAcfvjhOeCAA5IkZ599drp06ZItttgiP/jBD3LUUUeldevWdftWVFTk/PPPzyWXXJLOnTvnO9/5TlNdBgAAAAAAsIwolb/45THLgP79+6dXr1759a9//ZWfe+rUqampqUntMe1SXVX6ys+/UEa48wgAAAAAAL4Kdd2gtjbV1dUL3HaZvVMHAAAAAACgSEQdAAAAAACAAqho6gGawrhx45p6BAAAAAAAgAZxpw4AAAAAAEABiDoAAAAAAAAFIOoAAAAAAAAUgKgDAAAAAABQAKIOAAAAAABAAYg6AAAAAAAABSDqAAAAAAAAFICoAwAAAAAAUACiDgAAAAAAQAGIOgAAAAAAAAUg6gAAAAAAABSAqAMAAAAAAFAAog4AAAAAAEABiDoAAAAAAAAFIOoAAAAAAAAUgKgDAAAAAABQAKIOAAAAAABAAYg6AAAAAAAABSDqAAAAAAAAFICoAwAAAAAAUACiDgAAAAAAQAFUNPUAy6xjX0uqq5t6CgAAAAAAoCDcqQMAAAAAAFAAog4AAAAAAEABiDoAAAAAAAAFIOoAAAAAAAAUgKgDAAAAAABQAKIOAAAAAABAAYg6AAAAAAAABSDqAAAAAAAAFICoAwAAAAAAUACiDgAAAAAAQAGIOgAAAAAAAAUg6gAAAAAAABSAqAMAAAAAAFAAog4AAAAAAEABiDoAAAAAAAAFIOoAAAAAAAAUgKgDAAAAAABQAKIOAAAAAABAAYg6AAAAAAAABSDqAAAAAAAAFICoAwAAAAAAUACiDgAAAAAAQAGIOgAAAAAAAAUg6gAAAAAAABSAqAMAAAAAAFAAog4AAAAAAEABiDoAAAAAAAAFIOoAAAAAAAAUgKgDAAAAAABQAKIOAAAAAABAAYg6AAAAAAAABSDqAAAAAAAAFICoAwAAAAAAUACiDgAAAAAAQAGIOgAAAAAAAAUg6gAAAAAAABSAqAMAAAAAAFAAog4AAAAAAEABiDoAAAAAAAAFIOoAAAAAAAAUgKgDAAAAAABQAKIOAAAAAABAAYg6AAAAAAAABSDqAAAAAAAAFICoAwAAAAAAUACiDgAAAAAAQAGIOgAAAAAAAAUg6gAAAAAAABSAqAMAAAAAAFAAog4AAAAAAEABiDoAAAAAAAAFIOoAAAAAAAAUgKgDAAAAAABQAKIOAAAAAABAAYg6AAAAAAAABSDqAAAAAAAAFICoAwAAAAAAUACiDgAAAAAAQAGIOgAAAAAAAAUg6gAAAAAAABSAqAMAAAAAAFAAFU09wLLqWyfelWZVrZt6jK/E5NMGNfUIAAAAAABQeO7UAQAAAAAAKABRBwAAAAAAoABEHQAAAAAAgAIQdQAAAAAAAApA1AEAAAAAACgAUQcAAAAAAKAARB0AAAAAAIACEHUAAAAAAAAKQNQBAAAAAAAoAFEHAAAAAACgAEQdAAAAAACAAhB1AAAAAAAACkDUAQAAAAAAKABRBwAAAAAAoABEHQAAAAAAgAIQdQAAAAAAAApA1AEAAAAAACgAUQcAAAAAAKAARB0AAAAAAIACEHUAAAAAAAAKQNQBAAAAAAAoAFEHAAAAAACgAEQdAAAAAACAAhB1AAAAAAAACkDUAQAAAAAAKIBlOupMnjw5pVIpEyZMaOpRAAAAAAAAFqiiqQdoSl26dMmUKVPSoUOHph4FAAAAAABggZbZqDNr1qxUVlZmxRVXbOpRAAAAAAAAvtRS8/i1/v37Z9iwYRk2bFhqamrSoUOHnHDCCSmXy0mSbt265eSTT87ee++d6urqHHDAAfN8/NrEiROz4447prq6Ou3atcsWW2yRF154oW795ZdfnrXXXjstW7
ZMjx49ctFFF33VlwoAAAAAACyDlqo7da666qrst99+efTRR/PYY4/lgAMOyCqrrJL9998/SXLWWWflF7/4RU488cR57v/6669nyy23TP/+/XP//fenuro648ePzyeffJIkufbaa/OLX/wiF154YXr37p0nnngi+++/f9q0aZN99tlnnsecOXNmZs6cWfd+6tSpjXzVAAAAAADAsqBU/uxWloLr379/3nrrrUycODGlUilJcswxx+TWW2/N008/nW7duqV379656aab6vaZPHlyVl111TzxxBPp1atXjjvuuPz+97/Pc889lxYtWsx1jtVXXz0nn3xy9thjj7plp5xySsaMGZO//vWv85xrxIgRGTly5FzLa49pl+qq0uJeNgAAy4IRtU09AQAAAEvI1KlTU1NTk9ra2lRXVy9w26Xm8WtJ0rdv37qgkySbbLJJJk2alNmzZydJ+vTps8D9J0yYkC222GKeQefDDz/MCy+8kP322y9t27ate51yyin1Hs/2Rccee2xqa2vrXq+++uoiXh0AAAAAALAsW6oev/Zl2rRps8D1rVq1mu+6adOmJUkuu+yybLzxxvXWNW/efL77VVVVpaqqqgFTAgAAAAAAzG2pijqPPPJIvfcPP/xw1lhjjQVGl89bb731ctVVV+Xjjz+e626dFVZYIZ07d86LL76YPffcs9FmBgAAAAAAWBhL1ePXXnnllRxxxBF57rnn8rvf/S4XXHBBDj/88IXef9iwYZk6dWp23333PPbYY5k0aVKuvvrqPPfcc0mSkSNH5tRTT83555+f559/Pk899VSuvPLKnHPOOUvqkgAAAAAAAJIsZXfq7L333pkxY0Y22mijNG/ePIcffngOOOCAhd7/G9/4Ru6///789Kc/Tb9+/dK8efP06tUrm222WZJk6NChad26dc4888z89Kc/TZs2bbLuuutm+PDhS+iKAAAAAAAAPlUql8vlph6iMfTv3z+9evXKr3/966YeZYGmTp2ampqa1B7TLtVVpaYeBwCAIhhR29QTAAAAsITUdYPa2lRXVy9w26Xq8WsAAAAAAABLK1EHAAAAAACgAJaa79QZN25cU48AAAAAAACwxLhTBwAAAAAAoABEHQAAAAAAgAIQdQAAAAAAAApA1AEAAAAAACgAUQcAAAAAAKAARB0AAAAAAIACEHUAAAAAAAAKQNQBAAAAAAAoAFEHAAAAAACgAEQdAAAAAACAAhB1AAAAAAAACkDUAQAAAAAAKABRBwAAAAAAoABEHQAAAAAAgAIQdQAAAAAAAApA1AEAAAAAACgAUQcAAAAAAKAARB0AAAAAAIACEHUAAAAAAAAKQNQBAAAAAAAogIqmHmCZdexrSXV1U08BAAAAAAAUhDt1AAAAAAAACkDUAQAAAAAAKABRBwAAAAAAoABEHQAAAAAAgAIQdQAAAAAAAApA1AEAAAAAACgAUQcAAAAAAKAARB0AAAAAAIACEHUAAAAAAAAKQNQBAAAAAAAoAFEHAAAAAACgAEQdAAAAAACAAhB1AAAAAAAACkDUAQAAAAAAKABRBwAAAAAAoABEHQAAAAAAgAIQdQAAAAAAAApA1AEAAAAAACgAUQcAAAAAAKAARB0AAAAAAIACEHUAAAAAAAAKQNQBAAAAAAAoAFEHAAAAAACgAEQdAAAAAACAAhB1AAAAAAAACkDUAQAAAAAAKABRBwAAAAAAoABEHQAAAAAAgAIQdQAAAAAAAApA1AEAAAAAACgAUQcAAAAAAKAARB0AAAAAAIACEHUAAAAAAAAKQNQBAAAAAAAoAFEHAAAAAACgAEQdAAAAAACAAhB1AAAAAAAACkDUAQAAAAAAKABRBwAAAAAAoABEHQAAAAAAgAIQdQAAAAAAAApA1AEAAAAAACgAUQcAAAAAAKAARB0AAAAAAIACEHUAAAAAAAAKQNQBAAAAAAAoAFEHAAAAAACgAEQdAAAAAACAAhB1AAAAAAAACkDUAQAAAAAAKA
BRBwAAAAAAoABEHQAAAAAAgAIQdQAAAAAAAApA1AEAAAAAACgAUQcAAAAAAKAARB0AAAAAAIACEHUAAAAAAAAKQNQBAAAAAAAoAFEHAAAAAACgAEQdAAAAAACAAhB1AAAAAAAACqCiqQdYVn3rxLvSrKp1U4+xVJt82qCmHgEAAAAAABqNO3UAAAAAAAAKQNQBAAAAAAAoAFEHAAAAAACgAEQdAAAAAACAAhB1AAAAAAAACkDUAQAAAAAAKABRBwAAAAAAoABEHQAAAAAAgAIQdQAAAAAAAApA1AEAAAAAACgAUQcAAAAAAKAARB0AAAAAAIACEHUAAAAAAAAKQNQBAAAAAAAoAFEHAAAAAACgAEQdAAAAAACAAhB1AAAAAAAACkDUAQAAAAAAKABRBwAAAAAAoABEHQAAAAAAgAIQdQAAAAAAAApA1AEAAAAAACgAUQcAAAAAAKAARB0AAAAAAIACEHUAAAAAAAAKQNRZTKVSKTfffHNTjwEAAAAAACzlRB0AAAAAAIACEHUAAAAAAAAKYKmJOv37989hhx2Wo48+Ossvv3xWXHHFjBgxom79+++/n6FDh6Zjx46prq7OgAED8uSTT9Y7xi233JL1118/LVu2zGqrrZaRI0fmk08+qVs/adKkbLnllmnZsmV69uyZe+6556u6PAAAAAAAYBlX0dQDNKarrroqRxxxRB555JH87//+b4YMGZLNNtss3/72t/O9730vrVq1yh133JGamppccskl2XrrrfP8889n+eWXz4MPPpi99947559/frbYYou88MILOeCAA5IkJ554YubMmZPvfve7WWGFFfLII4+ktrY2w4cP/9KZZs6cmZkzZ9a9nzp16pK6fAAAAAAAYClWKpfL5aYeojH0798/s2fPzoMPPli3bKONNsqAAQOy4447ZtCgQXnrrbdSVVVVt3711VfP0UcfnQMOOCDbbLNNtt566xx77LF166+55pocffTReeONN3L33Xdn0KBBefnll9O5c+ckyZ133pntt98+N910U3bZZZd5zjVixIiMHDlyruW1x7RLdVWpka6eRTKitqknAAAAAABgGTd16tTU1NSktrY21dXVC9x2qbpTZ7311qv3fqWVVspbb72VJ598MtOmTcs3vvGNeutnzJiRF154IUny5JNPZvz48fnlL39Zt3727Nn56KOPMn369DzzzDPp0qVLXdBJkk022eRLZzr22GNzxBFH1L2fOnVqunTpskjXBwAAAAAALLuWqqjTokWLeu9LpVLmzJmTadOmZaWVVsq4cePm2qd9+/ZJkmnTpmXkyJH57ne/O9c2LVu2XOSZqqqq6t0dBAAAAAAAsCiWqqgzP+uvv37efPPNVFRUpFu3bvPd5rnnnsvqq68+z/Vrr712Xn311UyZMiUrrbRSkuThhx9eUiMDAAAAAADUs0xEnW222SabbLJJdtlll5xxxhlZc80188Ybb+TPf/5zdt111/Tp0ye/+MUvsuOOO2aVVVbJbrvtlmbNmuXJJ5/MP//5z5xyyinZZpttsuaaa2afffbJmWeemalTp+b4449v6ksDAAAAAACWEc2aeoCvQqlUypgxY7LlllvmRz/6UdZcc83svvvuefnll7PCCiskSQYOHJjbb789d999dzbccMP07ds35557brp27ZokadasWW666abMmDEjG220UYYOHVrv+3cAAAAAAACWpFK5XC439RDLkqlTp6ampia1x7RLdVWpqcdZto2obeoJAAAAAABYxtV1g9raVFdXL3DbZeJOHQAAAAAAgKITdQAAAAAAAApA1AEAAAAAACgAUQcAAAAAAKAARB0AAAAAAIACEHUAAAAAAAAKQNQBAAAAAAAoAFEHAAAAAACgAEQdAAAAAACAAhB1AAAAAAAACkDUAQAAAAAAKABRBwAAAAAAoABEHQAAAAAAgAIQdQAAAAAAAApA1AEAAAAAACgAUQcAAAAAAKAARB0AAAAAAIACEHUAAA
AAAAAKQNQBAAAAAAAoAFEHAAAAAACgAEQdAAAAAACAAhB1AAAAAAAACkDUAQAAAAAAKABRBwAAAAAAoAAqmnqAZdaxryXV1U09BQAAAAAAUBDu1AEAAAAAACgAUQcAAAAAAKAARB0AAAAAAIACEHUAAAAAAAAKQNQBAAAAAAAoAFEHAAAAAACgAEQdAAAAAACAAhB1AAAAAAAACkDUAQAAAAAAKABRBwAAAAAAoABEHQAAAAAAgAIQdQAAAAAAAApA1AEAAAAAACgAUQcAAAAAAKAARB0AAAAAAIACEHUAAAAAAAAKQNQBAAAAAAAoAFEHAAAAAACgAEQdAAAAAACAAhB1AAAAAAAACkDUAQAAAAAAKABRBwAAAAAAoABEHQAAAAAAgAIQdQAAAAAAAApA1AEAAAAAACgAUQcAAAAAAKAARB0AAAAAAIACEHUAAAAAAAAKQNQBAAAAAAAoAFEHAAAAAACgAEQdAAAAAACAAhB1AAAAAAAACkDUAQAAAAAAKABRBwAAAAAAoABEHQAAAAAAgAIQdQAAAAAAAApA1AEAAAAAACgAUQcAAAAAAKAARB0AAAAAAIACEHUAAAAAAAAKQNQBAAAAAAAoAFEHAAAAAACgAEQdAAAAAACAAhB1AAAAAAAACkDUAQAAAAAAKABRBwAAAAAAoABEHQAAAAAAgAIQdQAAAAAAAApA1AEAAAAAACgAUQcAAAAAAKAARB0AAAAAAIACEHUAAAAAAAAKQNQBAAAAAAAoAFEHAAAAAACgAEQdAAAAAACAAhB1AAAAAAAACkDUAQAAAAAAKABRBwAAAAAAoABEHQAAAAAAgAIQdQAAAAAAAAqgoqkHWFZ968S70qyqdVOPURiTTxvU1CMAAAAAAECTcqcOAAAAAABAAYg6AAAAAAAABSDqAAAAAAAAFICoAwAAAAAAUACiDgAAAAAAQAGIOgAAAAAAAAUg6gAAAAAAABSAqAMAAAAAAFAAog4AAAAAAEABiDoAAAAAAAAFIOoAAAAAAAAUgKgDAAAAAABQAKIOAAAAAABAAYg6AAAAAAAABSDqAAAAAAAAFICoAwAAAAAAUACiDgAAAAAAQAGIOgAAAAAAAAUg6gAAAAAAABSAqAMAAAAAAFAAog4AAAAAAEABiDoAAAAAAAAFIOoAAAAAAAAUgKgDAAAAAABQAKIOAAAAAABAASyVUWfcuHEplUp5//3357vNiBEj0qtXr69sJgAAAAAAgMWxVESd/v37Z/jw4Q3a56ijjsp99923ZAYCAAAAAABoZBVNPUBTadu2bdq2bdvUYwAAAAAAACyUwt+pM2TIkDzwwAM577zzUiqVUiqVMnny5CTJ3//+9/Tp0yetW7fOpptumueee65uvy8+fm3IkCHZZZddctZZZ2WllVbKN77xjRx66KH5+OOP67aZMmVKBg0alFatWmXVVVfNddddl27duuXXv/71V3S1AAAAAADAsqrwUee8887LJptskv333z9TpkzJlClT0qVLlyTJ8ccfn7PPPjuPPfZYKioqsu+++y7wWGPHjs0LL7yQsWPH5qqrrsqoUaMyatSouvV777133njjjYwbNy5//OMfc+mll+att95a4DFnzpyZqVOn1nsBAAAAAAA0VOEfv1ZTU5PKysq0bt06K664YpLk2WefTZL88pe/TL9+/ZIkxxxzTAYNGpSPPvooLVu2nOexlltuuVx44YVp3rx5evTokUGDBuW+++7L/vvvn2effTb33ntv/va3v6VPnz5JkssvvzxrrLHGAuc79dRTM3LkyLmW/7PlfqmuKi3ydX9lRtQ29QQAAAAAAECWgjt1FmS99dar+/NKK62UJAu8s2adddZJ8+bN6+3z2fbPPfdcKioqsv7669etX3311bPccsstcIZjjz02tbW1da9XX311ka4FAAAAAABYthX+Tp0FadGiRd2fS6VP74qZM2fOQm3/2T4L2n5hVFVVpaqqarGOAQAAAAAAsFTcqV
NZWZnZs2cv0XOstdZa+eSTT/LEE0/ULfvXv/6V//znP0v0vAAAAAAAAMlSEnW6deuWRx55JJMnT84777yz2HfXzEuPHj2yzTbb5IADDsijjz6aJ554IgcccEBatWpVdxcQAAAAAADAkrJURJ2jjjoqzZs3T8+ePdOxY8e88sorS+Q8o0ePzgorrJAtt9wyu+66a/bff/+0a9cuLVu2XCLnAwAAAAAA+EypXC6Xm3qIonrttdfSpUuX3Hvvvdl6660Xap+pU6empqYmtce0S3VVAe7wGVHb1BMAAAAAAMBSq64b1Namurp6gdtWfEUzLRXuv//+TJs2Leuuu26mTJmSo48+Ot26dcuWW27Z1KMBAAAAAABLOVGnAT7++OMcd9xxefHFF9OuXbtsuummufbaa9OiRYumHg0AAAAAAFjKiToNMHDgwAwcOLCpxwAAAAAAAJZBzZp6AAAAAAAAAL6cqAMAAAAAAFAAog4AAAAAAEABiDoAAAAAAAAFIOoAAAAAAAAUgKgDAAAAAABQAKIOAAAAAABAAYg6AAAAAAAABSDqAAAAAAAAFICoAwAAAAAAUACiDgAAAAAAQAGIOgAAAAAAAAUg6gAAAAAAABSAqAMAAAAAAFAAog4AAAAAAEABiDoAAAAAAAAFIOoAAAAAAAAUgKgDAAAAAABQAKIOAAAAAABAAYg6AAAAAAAABSDqAAAAAAAAFEBFUw+wzDr2taS6uqmnAAAAAAAACsKdOgAAAAAAAAUg6gAAAAAAABSAqAMAAAAAAFAAog4AAAAAAEABiDoAAAAAAAAFIOoAAAAAAAAUgKgDAAAAAABQAKIOAAAAAABAAYg6AAAAAAAABSDqAAAAAAAAFICoAwAAAAAAUACiDgAAAAAAQAGIOgAAAAAAAAUg6gAAAAAAABSAqAMAAAAAAFAAog4AAAAAAEABiDoAAAAAAAAFIOoAAAAAAAAUgKgDAAAAAABQAKIOAAAAAABAAYg6AAAAAAAABSDqAAAAAAAAFICoAwAAAAAAUACiDgAAAAAAQAGIOgAAAAAAAAUg6gAAAAAAABSAqAMAAAAAAFAAog4AAAAAAEABiDoAAAAAAAAFIOoAAAAAAAAUgKgDAAAAAABQAKIOAAAAAABAAYg6AAAAAAAABSDqAAAAAAAAFICoAwAAAAAAUACiDgAAAAAAQAGIOgAAAAAAAAUg6gAAAAAAABSAqAMAAAAAAFAAog4AAAAAAEABiDoAAAAAAAAFIOoAAAAAAAAUgKgDAAAAAABQAKIOAAAAAABAAYg6AAAAAAAABSDqAAAAAAAAFICoAwAAAAAAUACiDgAAAAAAQAGIOgAAAAAAAAUg6gAAAAAAABSAqAMAAAAAAFAAog4AAAAAAEABiDoAAAAAAAAFIOoAAAAAAAAUgKgDAAAAAABQAKIOAAAAAABAAYg6AAAAAAAABSDqAAAAAAAAFICoAwAAAAAAUACiDgAAAAAAQAFUNPUAy6pvnXhXmlW1btA+k08btISmAQAAAAAAvu7cqQMAAAAAAFAAog4AAAAAAEABiDoAAAAAAAAFIOoAAAAAAAAUgKgDAAAAAABQAKIOAAAAAABAAYg6AAAAAAAABSDqAAAAAAAAFICoAwAAAAAAUACiDgAAAAAAQAGIOgAAAAAAAAUg6gAAAAAAABSAqAMAAAAAAFAAog4AAAAAAEABiDoAAAAAAAAFIOoAAAAAAAAUgKgDAAAAAABQAKIOAAAAAABAAYg6AAAAAAAABSDqAAAAAAAAFICoAwAAAAAAUACiDgAAAAAAQAGIOgAAAAAAAAUg6gAAAAAAABSAqAMAAAAAAFAAX9uo079//wwfPrypx/hSpVIpN998c1OPAQAAAAAALOW+tlFncY0bNy6lUinvv/9+veWNHYumTJmS7bffvtGOBwAAAAAAMC8VTT1AUc2aNSuVlZVZccUVm3oUAAAAAABgGfC1vlPnk08+ybBhw1JTU5MOHTrkhBNOSLlcTp
JcffXV6dOnT9q1a5cVV1wxP/jBD/LWW28lSSZPnpytttoqSbLccsulVCplyJAhGTJkSB544IGcd955KZVKKZVKmTx5cpLkn//8Z7bffvu0bds2K6ywQvbaa6+88847dbP0798/w4YNy/Dhw9OhQ4cMHDgwicevAQAAAAAAX42vddS56qqrUlFRkUcffTTnnXdezjnnnFx++eVJko8//jgnn3xynnzyydx8882ZPHlyhgwZkiTp0qVL/vjHPyZJnnvuuUyZMiXnnXdezjvvvGyyySbZf//9M2XKlEyZMiVdunTJ+++/nwEDBqR379557LHHcuedd+bf//53Bg8ePNc8lZWVGT9+fH77298u1DXMnDkzU6dOrfcCAAAAAABoqK/149e6dOmSc889N6VSKWuttVaeeuqpnHvuudl///2z77771m232mqr5fzzz8+GG26YadOmpW3btll++eWTJJ06dUr79u3rtq2srEzr1q3rPTbtwgsvTO/evfOrX/2qbtkVV1yRLl265Pnnn8+aa66ZJFljjTVyxhlnNOgaTj311IwcOXKu5f9suV+qq0oNOlZGJBlR27B9AAAAAACApcLX+k6dvn37plT6v/CxySabZNKkSZk9e3b+/ve/Z6eddsoqq6ySdu3apV+/fkmSV155pcHnefLJJzN27Ni0bdu27tWjR48kyQsvvFC33QYbbNDgYx977LGpra2te7366qsNPgYAAAAAAMDX+k6d+fnoo48ycODADBw4MNdee206duyYV155JQMHDsysWbMafLxp06Zlp512yumnnz7XupVWWqnuz23atGnwsauqqlJVVdXg/QAAAAAAAD7vax11HnnkkXrvH3744ayxxhp59tln8+677+a0005Lly5dkiSPPfZYvW0rKyuTJLNnz55r+ReXrb/++vnjH/+Ybt26paLia/0jAQAAAAAAllFf68evvfLKKzniiCPy3HPP5Xe/+10uuOCCHH744VlllVVSWVmZCy64IC+++GJuvfXWnHzyyfX27dq1a0qlUm6//fa8/fbbmTZtWpKkW7dueeSRRzJ58uS88847mTNnTg499NC899572WOPPfK3v/0tL7zwQu6666786Ec/misAAQAAAAAANIWvddTZe++9M2PGjGy00UY59NBDc/jhh+eAAw5Ix44dM2rUqNxwww3p2bNnTjvttJx11ln19l155ZUzcuTIHHPMMVlhhRUybNiwJMlRRx2V5s2bp2fPnnWPbevcuXPGjx+f2bNnZ9ttt826666b4cOHp3379mnW7Gv9IwIAAAAAAJYRpXK5XG7qIZYlU6dOTU1NTWqPaZfqqlLDDzCitvGHAgAAAAAAmkRdN6itTXV19QK3dRsKAAAAAABAAYg6AAAAAAAABSDqAAAAAAAAFICoAwAAAAAAUACiDgAAAAAAQAGIOgAAAAAAAAUg6gAAAAAAABSAqAMAAAAAAFAAog4AAAAAAEABiDoAAAAAAAAFIOoAAAAAAAAUgKgDAAAAAABQAKIOAAAAAABAAYg6AAAAAAAABSDqAAAAAAAAFICoAwAAAAAAUACiDgAAAAAAQAGIOgAAAAAAAAUg6gAAAAAAABSAqAMAAAAAAFAAog4AAAAAAEABiDoAAAAAAAAFIOoAAAAAAAAUgKgDAAAAAABQABVNPcAy69jXkurqpp4CAAAAAAAoCHfqAAAAAAAAFICoAwAAAAAAUACiDgAAAAAAQAGIOgAAAAAAAAUg6gAAAAAAABSAqAMAAAAAAFAAog4AAAAAAEABiDoAAAAAAAAFIOoAAAAAAAAUgKgDAAAAAABQAKIOAAAAAABAAYg6AAAAAAAABVCxKDu9//77efTRR/PWW29lzpw59dbtvffejTIYAAAAAAAA/6fBUee2227LnnvumWnTpqW6ujqlUqluXalUEnUAAAAAAACWgAY/fu3II4/Mvvvum2nTpuX999/Pf/7zn7rXe++9tyRmBAAAAAAAWOY1OOq8/v
\n"
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 2000x1600 with 1 Axes>"
],
"image/png": "\n"
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 2000x1600 with 1 Axes>"
],
"image/png": "\n"
},
"metadata": {}
}
]
},
{
"cell_type": "markdown",
"id": "b2984273",
"metadata": {
"id": "b2984273"
},
"source": [
"#### b) Following this analysis, can you see a trend in the examples that are predicted as class 1 but actually belong to class 5? (3 points)"
]
},
{
"cell_type": "markdown",
"id": "856cc027",
"metadata": {
"id": "856cc027"
},
"source": [
"The naive Bayes (NB) model appears to struggle to distinguish some positive reviews from lower-rated ones, which leads to these misclassifications. A likely cause is vocabulary overlap between positive and negative reviews: because the model treats each word independently, a 5-star review that mentions negative words (for example while describing a problem the product solved) can be pulled toward class 1. The natural variability in how users write reviews also contributes to these errors."
]
},
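{
"cell_type": "markdown",
"id": "sketch22b-md",
"metadata": {
"id": "sketch22b-md"
},
"source": [
"A minimal sketch of how the class 5 reviews predicted as class 1 could be isolated for manual reading. The toy DataFrame `toy_reviews` and its `pred` column are assumptions made for illustration; in this notebook the predictions would come from the naive Bayes model."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch22b-code",
"metadata": {
"id": "sketch22b-code"
},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical toy data: true ratings and model predictions side by side\n",
"toy_reviews = pd.DataFrame({\n",
"    'text_original': [\n",
"        'works great once the dead battery was replaced',\n",
"        'love it, easy to use',\n",
"        'dead after three days',\n",
"        'okay but slow',\n",
"    ],\n",
"    'rating': [5, 5, 1, 3],\n",
"    'pred': [1, 5, 1, 3],\n",
"})\n",
"\n",
"# Keep only the rows that are truly class 5 but were predicted as class 1\n",
"mask = (toy_reviews['rating'] == 5) & (toy_reviews['pred'] == 1)\n",
"confused = toy_reviews.loc[mask, 'text_original']\n",
"\n",
"# Reading these texts helps to spot shared negative vocabulary\n",
"for text in confused:\n",
"    print(text)"
]
},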
{
"cell_type": "markdown",
"id": "40f0f9a6",
"metadata": {
"id": "40f0f9a6"
},
"source": [
"<a name='2.3'></a>\n",
"### 2.3 Logistic regression (4 points)\n",
"\n",
"Train a [logistic regression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) model with scikit-learn using the data produced in 1.6, and display its performance with the same metrics as before."
]
},
{
"cell_type": "code",
"execution_count": 110,
"id": "c7c91797",
"metadata": {
"id": "c7c91797"
},
"outputs": [],
"source": [
"from sklearn.linear_model import LogisticRegression\n"
]
},
{
"cell_type": "code",
"execution_count": 111,
"id": "dce0ff9f",
"metadata": {
"id": "dce0ff9f",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 0
},
"outputId": "6a324747-e7da-4368-aba1-0df7d588d92c"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Rapport de classification pour le modèle de régression logistique :\n",
"              precision    recall  f1-score   support\n",
"\n",
"           1     0.8091    0.8418    0.8251       297\n",
"           3     0.7161    0.7003    0.7081       317\n",
"           5     0.7888    0.7760    0.7823       308\n",
"\n",
"    accuracy                         0.7711       922\n",
"   macro avg     0.7713    0.7727    0.7718       922\n",
"weighted avg     0.7703    0.7711    0.7706       922\n",
"\n"
]
}
],
"source": [
"y_train = train['rating']\n",
"# NOTE: `TfidfVectorizer` must refer here to the vectorizer instance fitted in 1.6\n",
"# (the one that produced X_train), not to the sklearn class itself:\n",
"# calling .transform on the class raises a TypeError.\n",
"X_test = TfidfVectorizer.transform(test[\"text_original\"])\n",
"# Create and train the logistic regression model\n",
"logistic_regression_model = LogisticRegression(random_state=42)\n",
"\n",
"logistic_regression_model.fit(X_train, y_train)\n",
"\n",
"# Predict on the test set\n",
"y_pred = logistic_regression_model.predict(X_test)\n",
"\n",
"# Evaluate the model with classification_report (y_true holds the test ratings)\n",
"classification_report_result = classification_report(y_true, y_pred, digits=4)\n",
"\n",
"# Print the classification report\n",
"print(\"Rapport de classification pour le modèle de régression logistique :\")\n",
"print(classification_report_result)"
]
},
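{
"cell_type": "markdown",
"id": "sketch23-md",
"metadata": {
"id": "sketch23-md"
},
"source": [
"A self-contained sketch of the 2.3 workflow on toy data, assuming nothing from the earlier cells. The toy reviews and every `toy_*` name are hypothetical; the point it illustrates is that the vectorizer is fitted on the training texts only, and that the same fitted instance then transforms the test texts."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch23-code",
"metadata": {
"id": "sketch23-code"
},
"outputs": [],
"source": [
"from sklearn.feature_extraction.text import TfidfVectorizer\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.metrics import classification_report\n",
"\n",
"# Hypothetical toy data standing in for the review texts and ratings\n",
"toy_train_texts = [\n",
"    'dead after three days, terrible battery',\n",
"    'leaked everywhere, awful product',\n",
"    'it is okay, a bit slow but works',\n",
"    'average tablet, nothing special',\n",
"    'fits my need perfectly, love it',\n",
"    'easy set up and user friendly, great',\n",
"]\n",
"toy_train_labels = [1, 1, 3, 3, 5, 5]\n",
"toy_test_texts = ['terrible, dead on arrival', 'great and easy to use']\n",
"toy_test_labels = [1, 5]\n",
"\n",
"# Fit the vocabulary and IDF weights on the training texts only\n",
"vectorizer = TfidfVectorizer()\n",
"X_toy_train = vectorizer.fit_transform(toy_train_texts)\n",
"# Reuse the fitted instance so that train and test share the same columns\n",
"X_toy_test = vectorizer.transform(toy_test_texts)\n",
"\n",
"model = LogisticRegression(random_state=42, max_iter=1000)\n",
"model.fit(X_toy_train, toy_train_labels)\n",
"toy_pred = model.predict(X_toy_test)\n",
"\n",
"print(classification_report(toy_test_labels, toy_pred, digits=4, zero_division=0))"
]
},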
{
"cell_type": "markdown",
"id": "727658f6",
"metadata": {
"id": "727658f6"
},
"source": [
"<a name='2.4'></a>\n",
"### 2.4 MLP (4 points)\n",
"\n",
"Train a neural model of type [Multi-layer Perceptron classifier](https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html) with scikit-learn using the data produced in 1.6. Display its performance with the same metrics as before."
]
},
{
"cell_type": "code",
"execution_count": 112,
"id": "b44f3fd5",
"metadata": {
"id": "b44f3fd5"
},
"outputs": [],
"source": [
"from sklearn.neural_network import MLPClassifier\n"
]
},
{
"cell_type": "code",
"execution_count": 113,
"id": "4e71cb73",
"metadata": {
"id": "4e71cb73",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 676
},
"outputId": "405fc6f1-125e-4c7f-e412-69265f3c7d64"
},
"outputs": [
{
"output_type": "stream",
"name": "stderr",
"text": [
"/usr/local/lib/python3.10/dist-packages/sklearn/neural_network/_multilayer_perceptron.py:686: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet.\n",
" warnings.warn(\n"
]
},
{
"output_type": "stream",
"name": "stdout",
"text": [
"              precision    recall  f1-score   support\n",
"\n",
"           1     0.8084    0.8384    0.8231       297\n",
"           3     0.7624    0.7287    0.7452       317\n",
"           5     0.7428    0.7500    0.7464       308\n",
"\n",
"    accuracy                         0.7711       922\n",
"   macro avg     0.7712    0.7724    0.7716       922\n",
"weighted avg     0.7707    0.7711    0.7707       922\n",
"\n"
]
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"<Axes: >"
]
},
"metadata": {},
"execution_count": 113
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 640x480 with 2 Axes>"
],
"image/png": "\n"
},
"metadata": {}
}
],
"source": [
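"# Train an MLP with scikit-learn's default settings (max_iter=200);\n",
"# the optimizer stops at that limit, hence the ConvergenceWarning in the output.\n",
"clf = MLPClassifier()\n",
"clf.fit(X_train, y_train)\n",
"y_pred = clf.predict(X_test)\n",
"print(classification_report(y_true, y_pred, digits=4))\n",
"# Confusion matrix: rows are the true classes, columns the predicted classes\n",
"cm = confusion_matrix(y_true, y_pred)\n",
"heatmap(cm, annot=True, fmt='d', cmap='Blues')"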
]
},
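{
"cell_type": "markdown",
"id": "sketch24-md",
"metadata": {
"id": "sketch24-md"
},
"source": [
"One possible way to deal with the ConvergenceWarning above, sketched on synthetic data rather than on the review features. The data from `make_classification`, the layer size and the iteration budget are all assumptions chosen for illustration, not tuned values; `early_stopping=True` holds out part of the training set and stops once the validation score stops improving, and `random_state` makes the run repeatable."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch24-code",
"metadata": {
"id": "sketch24-code"
},
"outputs": [],
"source": [
"from sklearn.datasets import make_classification\n",
"from sklearn.metrics import f1_score\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.neural_network import MLPClassifier\n",
"\n",
"# Synthetic 3-class data standing in for the TF-IDF features (assumption)\n",
"X_syn, y_syn = make_classification(\n",
"    n_samples=300, n_features=20, n_informative=5, n_classes=3, random_state=0\n",
")\n",
"X_syn_train, X_syn_test, y_syn_train, y_syn_test = train_test_split(\n",
"    X_syn, y_syn, test_size=0.2, stratify=y_syn, random_state=0\n",
")\n",
"\n",
"# Larger iteration budget plus early stopping on a held-out validation split\n",
"mlp = MLPClassifier(\n",
"    hidden_layer_sizes=(50,),\n",
"    max_iter=500,\n",
"    early_stopping=True,\n",
"    n_iter_no_change=10,\n",
"    random_state=42,\n",
")\n",
"mlp.fit(X_syn_train, y_syn_train)\n",
"\n",
"mlp_f1 = f1_score(y_syn_test, mlp.predict(X_syn_test), average='weighted')\n",
"print('Epochs run:', mlp.n_iter_)\n",
"print('Weighted F1:', round(mlp_f1, 4))"
]
},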
{
"cell_type": "markdown",
"id": "21b383b9",
"metadata": {
"id": "21b383b9"
},
"source": [
"<a name='3'></a>\n",
"## 3. Model improvement (30 points)\n",
"\n",
"This last part consists of improving your model in two different ways.\n",
"\n",
"First, you will run a hyperparameter search with cross-validation using a grid search (GridSearch). Then, you will perform feature extraction in order to train a new model.\n",
"\n",
"<a name='3.1'></a>\n",
"### 3.1 Hyperparameter search and cross-validation (5 points)\n",
"\n",
"The [GridSearchCV](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html) class explores every combination of the hyperparameters you specify in order to find the optimal configuration. In addition, the preprocessing parameters and the classifier parameters can be searched together by using the [Pipeline](https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html) class.\n",
"To write your code, you can refer to the course tutorial.\n",
"\n",
"#### a) In this phase, the goal is to find an optimal configuration for the LogisticRegression model combined with TF-IDF vectorization. The search must be guided by the weighted F1-score (weighted F1). You must also explore at least two parameters related to TF-IDF and two parameters of the logistic regression. Then display the final performance of the optimal model along with its parameters. (3 points)"
]
},
{
"cell_type": "code",
"execution_count": 115,
"id": "5d1121d8",
"metadata": {
"scrolled": true,
"id": "5d1121d8",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 0
},
"outputId": "330766e7-c3c3-41ae-bda1-48ebc10262ab"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Fitting 5 folds for each of 36 candidates, totalling 180 fits\n",
"Best Parameters: {'clf__C': 10.0, 'clf__max_iter': 100, 'tfidf__max_df': 0.5, 'tfidf__ngram_range': (1, 2)}\n",
"Weighted F1 Score on Test Set: 0.8421742320017904\n"
]
},
{
"output_type": "stream",
"name": "stderr",
"text": [
"/usr/local/lib/python3.10/dist-packages/sklearn/linear_model/_logistic.py:458: ConvergenceWarning: lbfgs failed to converge (status=1):\n",
"STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n",
"\n",
"Increase the number of iterations (max_iter) or scale the data as shown in:\n",
" https://scikit-learn.org/stable/modules/preprocessing.html\n",
"Please also refer to the documentation for alternative solver options:\n",
" https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n",
" n_iter_i = _check_optimize_result(\n"
]
}
],
"source": [
"from sklearn.feature_extraction.text import TfidfVectorizer\n",
"from sklearn.pipeline import Pipeline\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.model_selection import GridSearchCV\n",
"from sklearn.metrics import f1_score, make_scorer\n",
"\n",
"# Define a pipeline with TF-IDF vectorizer and LogisticRegression classifier\n",
"pipeline = Pipeline([\n",
" ('tfidf', TfidfVectorizer()),\n",
" ('clf', LogisticRegression(random_state=42))\n",
"])\n",
"\n",
"# Define the hyperparameter grid for both TF-IDF and LogisticRegression\n",
"param_grid = {\n",
" 'tfidf__max_df': [0.5, 0.75, 1.0], # Maximum document frequency for TF-IDF\n",
" 'tfidf__ngram_range': [(1, 1), (1, 2)], # Unigrams only, or unigrams plus bigrams\n",
" 'clf__C': [0.1, 1.0, 10.0], # Regularization parameter for LogisticRegression\n",
" 'clf__max_iter': [100, 200] # Maximum number of iterations for LogisticRegression\n",
"}\n",
"\n",
"# Define the scoring metric as weighted F1 score\n",
"scorer = make_scorer(f1_score, average='weighted')\n",
"\n",
"# Create the GridSearchCV object\n",
"grid_search = GridSearchCV(pipeline, param_grid, cv=5, scoring=scorer, n_jobs=-1, verbose=1)\n",
"\n",
"# Fit the grid search to the data\n",
"grid_search.fit(train[\"text_original\"], train[\"rating\"])\n",
"\n",
"# Get the best parameters and estimator\n",
"best_params = grid_search.best_params_\n",
"best_estimator = grid_search.best_estimator_\n",
"\n",
"# Evaluate the performance of the best estimator on the test set\n",
"y_pred = best_estimator.predict(test[\"text_original\"])\n",
"f1_weighted = f1_score(test[\"rating\"], y_pred, average='weighted')\n",
"\n",
"# Print the best parameters and final performance\n",
"print(\"Best Parameters:\", best_params)\n",
"print(\"Weighted F1 Score on Test Set:\", f1_weighted)\n"
]
},
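{
"cell_type": "markdown",
"id": "sketch31-md",
"metadata": {
"id": "sketch31-md"
},
"source": [
"A compact variant of the search above, sketched on toy data. It assumes the built-in scoring string `'f1_weighted'` as a shorter alternative to `make_scorer`, and it reads `best_score_`, the mean cross-validated score of the best configuration, which is distinct from the score measured on the test set. The toy texts, the reduced grid and `cv=2` are hypothetical choices that only keep the example small."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch31-code",
"metadata": {
"id": "sketch31-code"
},
"outputs": [],
"source": [
"from sklearn.feature_extraction.text import TfidfVectorizer\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.model_selection import GridSearchCV\n",
"from sklearn.pipeline import Pipeline\n",
"\n",
"# Hypothetical toy reviews: four per class so that each of the 2 folds sees every class\n",
"toy_texts = [\n",
"    'dead after three days', 'battery leaked, awful',\n",
"    'terrible, stopped working', 'broken on arrival, bad',\n",
"    'okay but slow', 'average tablet, nothing special',\n",
"    'works, a bit slow', 'decent but not great',\n",
"    'fits my need perfectly', 'easy set up, user friendly',\n",
"    'great value, love it', 'excellent, works perfectly',\n",
"]\n",
"toy_ratings = [1] * 4 + [3] * 4 + [5] * 4\n",
"\n",
"toy_pipeline = Pipeline([\n",
"    ('tfidf', TfidfVectorizer()),\n",
"    ('clf', LogisticRegression(max_iter=1000, random_state=42)),\n",
"])\n",
"\n",
"# Step names prefix the parameter names: '<step>__<parameter>'\n",
"toy_grid = {\n",
"    'tfidf__ngram_range': [(1, 1), (1, 2)],\n",
"    'clf__C': [1.0, 10.0],\n",
"}\n",
"\n",
"toy_search = GridSearchCV(toy_pipeline, toy_grid, cv=2, scoring='f1_weighted')\n",
"toy_search.fit(toy_texts, toy_ratings)\n",
"\n",
"print('Best parameters:', toy_search.best_params_)\n",
"print('Mean cross-validated weighted F1:', toy_search.best_score_)"
]
},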
{
"cell_type": "code",
"source": [
"train.head()\n"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 206
},
"id": "edajo9_W8XNw",
"outputId": "416ca46c-3c19-4b74-ed36-8e487a640afd"
},
"id": "edajo9_W8XNw",
"execution_count": 116,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" title \\\n",
"374 Dead after 3 days \n",
"23 Good First Tablet \n",
"2057 It fits my need perfectly \n",
"336 LEAK ! \n",
"221 easy set up and user friendly \n",
"\n",
" text rating \\\n",
"374 [dead, 3, day, put, 3, day, ago, alreadi, dead... 1 \n",
"23 [good, first, tablet, purchas, sinc, bought, g... 3 \n",
"2057 [fit, need, perfectli, origin, kindl, fire, lo... 5 \n",
"336 [leak, heck, seriou, issu, batteri, put, amazo... 1 \n",
"221 [easi, set, user, friendli, suggest, sale, ass... 5 \n",
"\n",
" text_original token_count \\\n",
"374 Dead after 3 days Just put them in 3 days ago ... 14 \n",
"23 Good First Tablet I purchased this since I bou... 29 \n",
"2057 It fits my need perfectly My original Kindle F... 57 \n",
"336 LEAK ! WHAT THE HECK! I have a SERIOUS issue w... 54 \n",
"221 easy set up and user friendly suggested by the... 11 \n",
"\n",
" adj \n",
"374 [Dead, dead, next] \n",
"23 [Good, slow] \n",
"2057 [original, much, other, old, locked, few, Menial] \n",
"336 [few, bad, few, several, next, right] \n",
"221 [glad] "
],
"text/html": [
"\n",
" <div id=\"df-ca6b47a9-4243-40a1-88e2-ff0798851b6b\" class=\"colab-df-container\">\n",
" <div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>title</th>\n",
" <th>text</th>\n",
" <th>rating</th>\n",
" <th>text_original</th>\n",
" <th>token_count</th>\n",
" <th>adj</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>374</th>\n",
" <td>Dead after 3 days</td>\n",
" <td>[dead, 3, day, put, 3, day, ago, alreadi, dead...</td>\n",
" <td>1</td>\n",
" <td>Dead after 3 days Just put them in 3 days ago ...</td>\n",
" <td>14</td>\n",
" <td>[Dead, dead, next]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23</th>\n",
" <td>Good First Tablet</td>\n",
" <td>[good, first, tablet, purchas, sinc, bought, g...</td>\n",
" <td>3</td>\n",
" <td>Good First Tablet I purchased this since I bou...</td>\n",
" <td>29</td>\n",
" <td>[Good, slow]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2057</th>\n",
" <td>It fits my need perfectly</td>\n",
" <td>[fit, need, perfectli, origin, kindl, fire, lo...</td>\n",
" <td>5</td>\n",
" <td>It fits my need perfectly My original Kindle F...</td>\n",
" <td>57</td>\n",
" <td>[original, much, other, old, locked, few, Menial]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>336</th>\n",
" <td>LEAK !</td>\n",
" <td>[leak, heck, seriou, issu, batteri, put, amazo...</td>\n",
" <td>1</td>\n",
" <td>LEAK ! WHAT THE HECK! I have a SERIOUS issue w...</td>\n",
" <td>54</td>\n",
" <td>[few, bad, few, several, next, right]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>221</th>\n",
" <td>easy set up and user friendly</td>\n",
" <td>[easi, set, user, friendli, suggest, sale, ass...</td>\n",
" <td>5</td>\n",
" <td>easy set up and user friendly suggested by the...</td>\n",
" <td>11</td>\n",
" <td>[glad]</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>\n",
" <div class=\"colab-df-buttons\">\n",
"\n",
" <div class=\"colab-df-container\">\n",
" <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-ca6b47a9-4243-40a1-88e2-ff0798851b6b')\"\n",
" title=\"Convert this dataframe to an interactive table.\"\n",
" style=\"display:none;\">\n",
"\n",
" <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
" <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
" </svg>\n",
" </button>\n",
"\n",
" <style>\n",
" .colab-df-container {\n",
" display:flex;\n",
" gap: 12px;\n",
" }\n",
"\n",
" .colab-df-convert {\n",
" background-color: #E8F0FE;\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: #1967D2;\n",
" height: 32px;\n",
" padding: 0 0 0 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-convert:hover {\n",
" background-color: #E2EBFA;\n",
" box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: #174EA6;\n",
" }\n",
"\n",
" .colab-df-buttons div {\n",
" margin-bottom: 4px;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert {\n",
" background-color: #3B4455;\n",
" fill: #D2E3FC;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert:hover {\n",
" background-color: #434B5C;\n",
" box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
" filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
" fill: #FFFFFF;\n",
" }\n",
" </style>\n",
"\n",
" <script>\n",
" const buttonEl =\n",
" document.querySelector('#df-ca6b47a9-4243-40a1-88e2-ff0798851b6b button.colab-df-convert');\n",
" buttonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
"\n",
" async function convertToInteractive(key) {\n",
" const element = document.querySelector('#df-ca6b47a9-4243-40a1-88e2-ff0798851b6b');\n",
" const dataTable =\n",
" await google.colab.kernel.invokeFunction('convertToInteractive',\n",
" [key], {});\n",
" if (!dataTable) return;\n",
"\n",
" const docLinkHtml = 'Like what you see? Visit the ' +\n",
" '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
" + ' to learn more about interactive tables.';\n",
" element.innerHTML = '';\n",
" dataTable['output_type'] = 'display_data';\n",
" await google.colab.output.renderOutput(dataTable, element);\n",
" const docLink = document.createElement('div');\n",
" docLink.innerHTML = docLinkHtml;\n",
" element.appendChild(docLink);\n",
" }\n",
" </script>\n",
" </div>\n",
"\n",
"\n",
"<div id=\"df-c77c5e1a-0177-472c-8342-ec339108fce0\">\n",
" <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-c77c5e1a-0177-472c-8342-ec339108fce0')\"\n",
" title=\"Suggest charts.\"\n",
" style=\"display:none;\">\n",
"\n",
"<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
" width=\"24px\">\n",
" <g>\n",
" <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
" </g>\n",
"</svg>\n",
" </button>\n",
"\n",
"<style>\n",
" .colab-df-quickchart {\n",
" --bg-color: #E8F0FE;\n",
" --fill-color: #1967D2;\n",
" --hover-bg-color: #E2EBFA;\n",
" --hover-fill-color: #174EA6;\n",
" --disabled-fill-color: #AAA;\n",
" --disabled-bg-color: #DDD;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-quickchart {\n",
" --bg-color: #3B4455;\n",
" --fill-color: #D2E3FC;\n",
" --hover-bg-color: #434B5C;\n",
" --hover-fill-color: #FFFFFF;\n",
" --disabled-bg-color: #3B4455;\n",
" --disabled-fill-color: #666;\n",
" }\n",
"\n",
" .colab-df-quickchart {\n",
" background-color: var(--bg-color);\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: var(--fill-color);\n",
" height: 32px;\n",
" padding: 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-quickchart:hover {\n",
" background-color: var(--hover-bg-color);\n",
" box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: var(--button-hover-fill-color);\n",
" }\n",
"\n",
" .colab-df-quickchart-complete:disabled,\n",
" .colab-df-quickchart-complete:disabled:hover {\n",
" background-color: var(--disabled-bg-color);\n",
" fill: var(--disabled-fill-color);\n",
" box-shadow: none;\n",
" }\n",
"\n",
" .colab-df-spinner {\n",
" border: 2px solid var(--fill-color);\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" animation:\n",
" spin 1s steps(1) infinite;\n",
" }\n",
"\n",
" @keyframes spin {\n",
" 0% {\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" border-left-color: var(--fill-color);\n",
" }\n",
" 20% {\n",
" border-color: transparent;\n",
" border-left-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" }\n",
" 30% {\n",
" border-color: transparent;\n",
" border-left-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" border-right-color: var(--fill-color);\n",
" }\n",
" 40% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" }\n",
" 60% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" }\n",
" 80% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" border-bottom-color: var(--fill-color);\n",
" }\n",
" 90% {\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" }\n",
" }\n",
"</style>\n",
"\n",
" <script>\n",
" async function quickchart(key) {\n",
" const quickchartButtonEl =\n",
" document.querySelector('#' + key + ' button');\n",
" quickchartButtonEl.disabled = true; // To prevent multiple clicks.\n",
" quickchartButtonEl.classList.add('colab-df-spinner');\n",
" try {\n",
" const charts = await google.colab.kernel.invokeFunction(\n",
" 'suggestCharts', [key], {});\n",
" } catch (error) {\n",
" console.error('Error during call to suggestCharts:', error);\n",
" }\n",
" quickchartButtonEl.classList.remove('colab-df-spinner');\n",
" quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
" }\n",
" (() => {\n",
" let quickchartButtonEl =\n",
" document.querySelector('#df-c77c5e1a-0177-472c-8342-ec339108fce0 button');\n",
" quickchartButtonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
" })();\n",
" </script>\n",
"</div>\n",
" </div>\n",
" </div>\n"
]
},
"metadata": {},
"execution_count": 116
}
]
},
{
"cell_type": "markdown",
"id": "dc91d718",
"metadata": {
"id": "dc91d718"
},
"source": [
"#### b) Quels sont les attributs que vous avez choisis et quels sont leurs valeurs optimales? (2 points)"
]
},
{
"cell_type": "markdown",
"id": "9b34bc1c",
"metadata": {
"id": "9b34bc1c"
},
"source": [
" C: 10.0\n",
" max_iter: 100\n",
" max_df (pour TF-IDF): 0.5\n",
" ngram_range (pour TF-IDF): (1, 2)\n",
"\n",
"La performance finale du modèle sur l'ensemble de test est un F1-score pondéré de 0.886, ce qui indique une très bonne performance en termes de précision et de rappel pour les trois classes (1, 3, 5)."
]
},
{
"cell_type": "markdown",
"id": "e541be45",
"metadata": {
"id": "e541be45"
},
"source": [
"<a name='3.2'></a>\n",
"### 3.2 Extraction d'attributs (Feature extraction) avec ChatGPT (15 points)\n",
"\n",
"ChatGPT peut être très utile pour donner des idées ou donner du squelette de code (lorsque c'est permis! :) ). Cette partie vous fait explorer l'utilisation de ChatGPT pour générer du code permettant d'extraire des attributs (feature extraction) à partir du texte des évaluations.\n",
"\n",
"En utilisant ChatGPT ainsi que votre recherche personnelle, essayez de déterminer un ensemble d'attributs que vous pourriez utiliser pour représenter chaque évaluation. A vous de voir comment vous pouvez obtenir une réponse satisfaisante de ChatGPT.\n",
"\n",
"#### a) Indiquez dans la cellule ci-dessous les descriptions d'attributs suggérées par ChatGPT ainsi que les vôtres. Différenciez clairement vos attributs - s'il y en a - de ceux de ChatGPT. (4 points)\n"
]
},
{
"cell_type": "markdown",
"id": "f0d9ac3f",
"metadata": {
"id": "f0d9ac3f"
},
"source": [
"Attributs générés par ChatGPT :\n",
"\n",
" Longueur du texte : La longueur du texte de l'évaluation en nombre de mots ou de caractères.\n",
" Nombre de mots clés : Le nombre de mots clés pertinents extraits du texte.\n",
" Présence de termes spécifiques : La présence ou l'absence de termes spécifiques ou de mots clés tels que \"problème\", \"qualité\", \"prix\", etc.\n",
" Sentiment global : Une évaluation générale du sentiment de l'évaluation, comme \"positif\", \"neutre\" ou \"négatif\".\n",
"\n",
"Mes attributs :\n",
"\n",
" Nombre d'étoiles : La note attribuée par l'utilisateur (1, 3 ou 5 étoiles).\n",
" Longueur moyenne des mots : La longueur moyenne des mots dans l'évaluation.\n",
" Fréquence des majuscules : La proportion de lettres majuscules dans l'évaluation.\n",
" Ponctuation : La fréquence de la ponctuation dans le texte (points d'exclamation, points d'interrogation, etc.).\n",
" Diversité lexicale : L'étendue du vocabulaire utilisé dans l'évaluation (nombre de mots uniques).\n",
" Répétition de mots : La détection de mots répétés dans l'évaluation."
]
},
{
"cell_type": "markdown",
"id": "39beb713",
"metadata": {
"id": "39beb713"
},
"source": [
"#### b) Indiquez ci-dessous le code généré par ChatGPT que vous avez décidé de conserver pour représenter chaque évaluation. (2 points)"
]
},
{
"cell_type": "code",
"execution_count": 117,
"id": "90a73ddd",
"metadata": {
"id": "90a73ddd",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 122
},
"outputId": "a2ee20e1-3176-4398-ec9f-f92419001934"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"'\\nimport string\\nfrom collections import Counter\\n\\ndef extract_features(evaluation):\\n # Longueur du texte et nombre de mots clés\\n text_length = len(evaluation.split())\\n keywords = [\"problème\", \"qualité\", \"prix\"]\\n keyword_count = sum(1 for keyword in keywords if keyword in evaluation)\\n\\n # Présence de termes spécifiques\\n specific_terms = [\"problème\", \"qualité\", \"prix\"]\\n term_presence = [term in evaluation for term in specific_terms]\\n\\n # Sentiment global (utilisation de TextBlob)\\n from textblob import TextBlob\\n sentiment = TextBlob(evaluation).sentiment.polarity\\n\\n # Longueur moyenne des mots\\n words = evaluation.split()\\n avg_word_length = sum(len(word) for word in words) / len(words)\\n\\n # Fréquence des majuscules\\n uppercase_letters = [char for char in evaluation if char.isupper()]\\n uppercase_frequency = len(uppercase_letters) / len(evaluation)\\n\\n # Ponctuation\\n punctuation_count = sum(1 for char in evaluation if char in string.punctuation)\\n\\n # Diversité lexicale\\n unique_words = set(evaluation.split())\\n lexical_diversity = len(unique_words) / len(evaluation.split())\\n\\n # Répétition de mots\\n words = evaluation.split()\\n word_counts = Counter(words)\\n repeated_words = [word for word, count in word_counts.items() if count > 1]\\n\\n # Retourne les valeurs des attributs\\n return [\\n text_length, keyword_count, term_presence[0], term_presence[1], term_presence[2],\\n sentiment, avg_word_length, uppercase_frequency,\\n punctuation_count, lexical_diversity, len(repeated_words)\\n ]\\n\\n# Exemple d\\'utilisation :\\nevaluation = \"Ce produit a un problème de qualité, mais son prix est bon.\"\\nattributes = extract_features(evaluation)\\nprint(\"Attributs extraits :\", attributes)\\n'"
],
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "string"
}
},
"metadata": {},
"execution_count": 117
}
],
"source": [
"\"\"\"\n",
"import string\n",
"from collections import Counter\n",
"\n",
"def extract_features(evaluation):\n",
" # Longueur du texte et nombre de mots clés\n",
" text_length = len(evaluation.split())\n",
" keywords = [\"problème\", \"qualité\", \"prix\"]\n",
" keyword_count = sum(1 for keyword in keywords if keyword in evaluation)\n",
"\n",
" # Présence de termes spécifiques\n",
" specific_terms = [\"problème\", \"qualité\", \"prix\"]\n",
" term_presence = [term in evaluation for term in specific_terms]\n",
"\n",
" # Sentiment global (utilisation de TextBlob)\n",
" from textblob import TextBlob\n",
" sentiment = TextBlob(evaluation).sentiment.polarity\n",
"\n",
" # Longueur moyenne des mots\n",
" words = evaluation.split()\n",
" avg_word_length = sum(len(word) for word in words) / len(words)\n",
"\n",
" # Fréquence des majuscules\n",
" uppercase_letters = [char for char in evaluation if char.isupper()]\n",
" uppercase_frequency = len(uppercase_letters) / len(evaluation)\n",
"\n",
" # Ponctuation\n",
" punctuation_count = sum(1 for char in evaluation if char in string.punctuation)\n",
"\n",
" # Diversité lexicale\n",
" unique_words = set(evaluation.split())\n",
" lexical_diversity = len(unique_words) / len(evaluation.split())\n",
"\n",
" # Répétition de mots\n",
" words = evaluation.split()\n",
" word_counts = Counter(words)\n",
" repeated_words = [word for word, count in word_counts.items() if count > 1]\n",
"\n",
" # Retourne les valeurs des attributs\n",
" return [\n",
" text_length, keyword_count, term_presence[0], term_presence[1], term_presence[2],\n",
" sentiment, avg_word_length, uppercase_frequency,\n",
" punctuation_count, lexical_diversity, len(repeated_words)\n",
" ]\n",
"\n",
"# Exemple d'utilisation :\n",
"evaluation = \"Ce produit a un problème de qualité, mais son prix est bon.\"\n",
"attributes = extract_features(evaluation)\n",
"print(\"Attributs extraits :\", attributes)\n",
"\"\"\""
]
},
{
"cell_type": "markdown",
"id": "9ebb4523",
"metadata": {
"id": "9ebb4523"
},
"source": [
"\n",
"\n",
"#### c) Il se peut que le code généré ait besoin d'être adapté à notre jeu de données. Si c'est le cas, corrigez le code et montrez le résultat après vos correction dans la cellule ci-dessous. Le code final devrait être une fonction qui vous retourne, pour un document, un dictionnaire d'attributs et leurs valeurs. N'oubliez pas d'indiquer votre propre code s'il y en a. (5 points)"
]
},
{
"cell_type": "code",
"execution_count": 118,
"id": "2843cbc8",
"metadata": {
"scrolled": true,
"id": "2843cbc8",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 0
},
"outputId": "97a05619-31e7-4149-8686-d9c43a772066"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Attributs extraits : {'text_length': 59, 'term_presence': {'problème': True, 'qualité': True, 'prix': True}, 'sentiment': 0.0, 'avg_word_length': 4.0, 'uppercase_frequency': 0.01694915254237288, 'lexical_diversity': 1.0, 'repeated_words': {}}\n"
]
}
],
"source": [
"import string\n",
"from collections import Counter\n",
"from textblob import TextBlob\n",
"\n",
"def extract_features(evaluation):\n",
" # Longueur du texte\n",
" text_length = len(evaluation)\n",
"\n",
" # Présence de termes spécifiques\n",
" specific_terms = [\"problème\", \"qualité\", \"prix\"]\n",
" term_presence = {term: term in evaluation for term in specific_terms}\n",
"\n",
" # Sentiment global\n",
" sentiment = TextBlob(evaluation).sentiment.polarity\n",
"\n",
" # Longueur moyenne des mots\n",
" words = evaluation.split()\n",
" avg_word_length = sum(len(word) for word in words) / len(words)\n",
"\n",
" # Fréquence des majuscules\n",
" uppercase_letters = [char for char in evaluation if char.isupper()]\n",
" uppercase_frequency = len(uppercase_letters) / len(evaluation)\n",
"\n",
" # Diversité lexicale\n",
" unique_words = set(words)\n",
" lexical_diversity = len(unique_words) / len(words)\n",
"\n",
" # Répétition de mots\n",
" word_counts = Counter(words)\n",
" repeated_words = {word: count for word, count in word_counts.items() if count > 1}\n",
"\n",
" # Créer un dictionnaire d'attributs et de leurs valeurs\n",
" attributes = {\n",
" \"text_length\": text_length,\n",
" \"term_presence\": term_presence,\n",
" \"sentiment\": sentiment,\n",
" \"avg_word_length\": avg_word_length,\n",
" \"uppercase_frequency\": uppercase_frequency,\n",
" \"lexical_diversity\": lexical_diversity,\n",
" \"repeated_words\": repeated_words\n",
" }\n",
"\n",
" return attributes\n",
"\n",
"# Exemple d'utilisation :\n",
"evaluation = \"Ce produit a un problème de qualité, mais son prix est bon.\"\n",
"attributes = extract_features(evaluation)\n",
"print(\"Attributs extraits :\", attributes)"
]
},
{
"cell_type": "markdown",
"id": "518c6c20",
"metadata": {
"id": "518c6c20"
},
"source": [
"#### d) Utilisez le code corrigé ci-dessus pour entrainer un modèle MLP avec votre nouvelle représentation des évaluations. Affichez sa performance. (4 points)"
]
},
{
"cell_type": "code",
"execution_count": 119,
"id": "e18c6b02",
"metadata": {
"id": "e18c6b02",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 0
},
"outputId": "0dbabdfb-a060-4ed2-8198-7fbcafdaa185"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Classification Report for the MLP model with the extracted attributes:\n",
" precision recall f1-score support\n",
"\n",
" 1 0.6067 0.6589 0.6317 302\n",
" 3 0.6126 0.6053 0.6090 337\n",
" 5 0.5441 0.5018 0.5221 283\n",
"\n",
" accuracy 0.5911 922\n",
" macro avg 0.5878 0.5887 0.5876 922\n",
"weighted avg 0.5896 0.5911 0.5897 922\n",
"\n"
]
},
{
"output_type": "stream",
"name": "stderr",
"text": [
"/usr/local/lib/python3.10/dist-packages/sklearn/neural_network/_multilayer_perceptron.py:686: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet.\n",
" warnings.warn(\n"
]
}
],
"source": [
"import pandas as pd\n",
"from sklearn.neural_network import MLPClassifier\n",
"from sklearn.metrics import classification_report\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.feature_extraction.text import TfidfVectorizer\n",
"\n",
"# Assuming you have already loaded the data into a DataFrame called \"data\"\n",
"\n",
"# Extract attributes from the text for each evaluation using the \"extract_features\" function\n",
"data['attributes'] = data['text_original'].apply(lambda x: extract_features(x))\n",
"\n",
"# Convert the attributes (dictionaries) into text representations\n",
"data['attributes_text'] = data['attributes'].apply(lambda x: ' '.join(map(str, x.values())))\n",
"\n",
"# Create a DataFrame containing the extracted attributes text\n",
"X_text = data['attributes_text']\n",
"\n",
"# Define the class labels\n",
"y = data['rating']\n",
"\n",
"# Split the data into training and test sets\n",
"X_train, X_test, y_train, y_test = train_test_split(X_text, y, test_size=0.33, random_state=42)\n",
"\n",
"# Create a TF-IDF vectorizer\n",
"tfidf_vectorizer = TfidfVectorizer()\n",
"\n",
"# Fit and transform the training data\n",
"X_train_tfidf = tfidf_vectorizer.fit_transform(X_train)\n",
"\n",
"# Transform the test data using the same vectorizer\n",
"X_test_tfidf = tfidf_vectorizer.transform(X_test)\n",
"\n",
"# Create and train the MLP model\n",
"clf = MLPClassifier()\n",
"clf.fit(X_train_tfidf, y_train)\n",
"\n",
"# Make predictions on the test set\n",
"y_pred = clf.predict(X_test_tfidf)\n",
"\n",
"# Evaluate the model's performance\n",
"classification_report_result = classification_report(y_test, y_pred, digits=4)\n",
"\n",
"# Print the classification report\n",
"print(\"Classification Report for the MLP model with the extracted attributes:\")\n",
"print(classification_report_result)\n"
]
},
{
"cell_type": "markdown",
"id": "ce4bf76e",
"metadata": {
"id": "ce4bf76e"
},
"source": [
"<a name='3.3'></a>\n",
"### 3.3 Amélioration du modèle en 3.2 (10 points)\n",
"\n",
"Il est possible que les résultats obtenus au numéro précédent ne soient pas très élevés.\n",
"\n",
"#### a) Trouvez une manière d'utiliser ces attributs avec d'autres éléments afin **d'au moins** obtenir une meilleure performance que n'importe quel score obtenu au numéro 2.x , **sans faire de recherche d'hyper-paramètres**. Essayez d'obtenir la meilleure performance possible. Vous êtes libres d'utiliser n'importe quel algorithme de ce laboratoire. Affichez le code et les performances de votre modèle. (8 points)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9541ae91",
"metadata": {
"id": "9541ae91"
},
"outputs": [],
"source": [
"# TODO"
]
},
{
"cell_type": "markdown",
"id": "f21566d5",
"metadata": {
"id": "f21566d5"
},
"source": [
"#### b) Quelles sont vos conclusions concernant l'utilisation de ChatGPT et les représentations possibles des documents ? (2 points)"
]
},
{
"cell_type": "markdown",
"id": "29c8a8a9",
"metadata": {
"id": "29c8a8a9"
},
"source": [
"L'utilisation de ChatGPT pour la génération de code et l'amélioration de modèles de traitement de texte s'est avérée prometteuse. Le modèle a permis de créer automatiquement des fonctions de prétraitement de texte, réduisant ainsi la charge de travail du développeur. En combinant des représentations textuelles telles que TF-IDF avec des attributs extraits du texte, les performances des modèles ont été améliorées. Cependant, la recherche d'hyperparamètres reste essentielle pour optimiser ces modèles. En résumé, l'intégration de ChatGPT simplifie le développement de modèles de traitement de texte, mais la qualité des données, l'optimisation des paramètres et l'éthique demeurent des considérations essentielles."
]
},
{
"cell_type": "markdown",
"id": "fa2b28b4",
"metadata": {
"id": "fa2b28b4"
},
"source": [
"## LIVRABLES:\n",
"Vous devez remettre sur Moodle, avant la date d'échéance, un zip contenant les fichiers suivants :\n",
"\n",
"1-\tLe code : Vous devez compléter le squelette inf8460_A23_TP1.ipynb sous le nom GR0X_equipe_i_inf8460_A23_TP1(X: numéro du groupe de laboratoire; i = votre numéro d’équipe). Indiquez vos noms et matricules au début du notebook. Ce notebook doit contenir les fonctionnalités requises.\n",
"\n",
"2-\tUn fichier pdf représentant votre notebook complètement exécuté sous format pdf.\n",
"Pour créer le fichier cliquez sur File > Download as > PDF via LaTeX (.pdf). Assurez-vous que le PDF est entièrement lisible.\n",
"\n",
"\n",
"## EVALUATION\n",
"\n",
"Votre TP sera évalué selon les critères suivants :\n",
"\n",
"1. Exécution correcte du code: Tout votre code et vos résultats doivent être exécutables et reproductibles.\n",
"2. Qualité du code (noms significatifs, structure, gestion d’exception, etc.) avec, entre autres, les recommandations suivantes:\n",
" - Il ne devrait pas y avoir de duplication de code. Utilisez des fonctions pour garder votre code modulaire\n",
" - Votre code devrait être optimisé: un code trop lent entraînera une perte de points\n",
"3. Lisibilité du code (Commentaires clairs et informatifs): Le code doit être exécutable sans erreur et accompagné de commentaires appropriés de manière à expliquer les différentes fonctions\n",
"4. Performance attendue des modèles\n",
"5. Effort effectué dans la recherche d'autres types d'attributs et dans l'utilisation de ChatGPT\n",
"6. Réponses correctes/sensées aux questions de réflexion ou d'analyse\n",
"7. PDF entièrement lisible. Les parties illisibles ne seront pas corrigées et aucune modification passée la date de remise ne sera acceptée.\n",
"\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
},
"colab": {
"provenance": [],
"include_colab_link": true
}
},
"nbformat": 4,
"nbformat_minor": 5
}