@ingridstevens
Last active November 24, 2022 18:59
Translation + Sentiment Analysis
{
"cells": [
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"# Import Libraries\n",
"import deepl\n",
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"# Read the input CSV file into a DataFrame\n",
"df = pd.read_csv('translate-testdata.csv')"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"# Create the DeepL translator (DeepL API Free account keys end in \":fx\")\n",
"auth_key = \"x-x-x-x-x:fx\"  # placeholder: replace with your own auth key\n",
"translator = deepl.Translator(auth_key)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"# Translate a single string to US English and return the translated text\n",
"def translate_text(text):\n",
"    result = translator.translate_text(text, target_lang=\"EN-US\")\n",
"    return result.text"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Translate the text"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"# Translate every row of the \"Text\" column into a new \"English\" column\n",
"df['English'] = df['Text'].apply(translate_text)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"# Export the df back to translate-testdata.csv (this overwrites the original input file)\n",
"df.to_csv('translate-testdata.csv', index=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Clean up so we only keep the original text and the English translation"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"# Reduce the dataframe to only the \"English\" column and \"Text\" column\n",
"df = df[['Text', 'English']]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Sentiment Analysis\n",
"\n",
"* Run sentiment polarity analysis\n",
"* Run sentiment emotion analysis"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"# Import TextBlob for lexicon-based sentiment analysis\n",
"from textblob import TextBlob\n"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"# apply the sentiment analysis to the \"English\" column\n",
"df['Sentiment'] = df['English'].apply(lambda x: TextBlob(x).sentiment.polarity)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"# Append one row with a test sentence to the \"Text\" column\n",
"# (DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so use pd.concat)\n",
"df = pd.concat([df, pd.DataFrame({'Text': ['Finding the right product was difficult']})], ignore_index=True)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Now try sentiment analysis on the pre-labeled dataset\n"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
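"# Read the pre-labeled Kaggle data; 'unicode_escape' decodes bytes as Latin-1 and also interprets backslash escapes\n",
"df_test = pd.read_csv('kaggle-test.csv', encoding='unicode_escape')"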
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"# Cast the \"text\" column to str so TextBlob never receives NaN (missing values become the string 'nan')\n",
"df_test['text'] = df_test['text'].astype(str)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### TextBlob: Polarity & Subjectivity\n",
"\n",
"TextBlob's `sentiment` property returns two scores: polarity and subjectivity.\n",
"\n",
"The *polarity* score lies in the range [-1, 1], where:\n",
"* -1 indicates the most negative sentiment, e.g. words such as ‘disgusting’, ‘awful’, ‘pathetic’\n",
"* 1 indicates the most positive sentiment, e.g. words such as ‘excellent’, ‘best’\n",
"\n",
"The *subjectivity* score lies in the range [0, 1] and measures how much personal opinion the text contains:\n",
"* A score close to 1 means the text contains more personal opinion than factual information.\n",
"* A score of 0 indicates a purely factual statement."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"# Run TextBlob once per text and split its (polarity, subjectivity) result into two columns\n",
"sentiments = df_test['text'].apply(lambda x: TextBlob(x).sentiment)\n",
"df_test['polarity_textblob'] = sentiments.apply(lambda s: s.polarity)\n",
"df_test['subjectivity_textblob'] = sentiments.apply(lambda s: s.subjectivity)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"# Make the df smaller by keeping only the text, the labeled sentiment, and the two TextBlob scores\n",
"df_test = df_test[['text', 'sentiment', 'polarity_textblob', 'subjectivity_textblob']]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.10.1 64-bit",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.1"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "aee8b7b246df8f9039afb4144a1f6fd8d2ca17a180786b69acc140d282b71a49"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}
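The notebook hard-codes the DeepL auth key in a cell. That leaks easily once the notebook is shared as a gist. One alternative is to read the key from an environment variable.

This is a minimal sketch. The variable name `DEEPL_AUTH_KEY` and both helper functions are hypothetical. The `:fx` check only reflects that DeepL API Free keys carry that suffix, as the notebook's placeholder does.

```python
import os


def load_deepl_key(var_name: str = "DEEPL_AUTH_KEY") -> str:
    """Return the DeepL auth key stored in the environment variable `var_name`.

    DEEPL_AUTH_KEY is a hypothetical variable name chosen for this sketch.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable to your DeepL auth key")
    return key


def is_free_account_key(key: str) -> bool:
    """DeepL API Free keys end with ':fx', like the notebook's placeholder key."""
    return key.endswith(":fx")
```

In the notebook, the translator cell would then become `translator = deepl.Translator(load_deepl_key())`.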
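In the notebook, `df['Text'].apply(translate_text)` makes one DeepL request per row, even when rows repeat the same text. Each request counts against the account's character quota.

The sketch below calls the translator once per distinct string, then maps the results back in the original order. `translate_unique` and `fake_translate` are hypothetical names. The translator is passed in as a plain callable (in the notebook, that would be the `translate_text` function), so the example runs without a key.

The `deepl` package's `Translator.translate_text` also accepts a list of strings, which is another way to reduce the number of requests.

```python
from typing import Callable, Dict, Iterable, List


def translate_unique(texts: Iterable[str], translate: Callable[[str], str]) -> List[str]:
    """Translate each distinct string once and return results in the original order."""
    texts = list(texts)
    cache: Dict[str, str] = {}
    for text in texts:
        if text not in cache:  # only call the translator for strings not seen yet
            cache[text] = translate(text)
    return [cache[text] for text in texts]


# Hypothetical stand-in for the DeepL call so the example runs offline;
# it records every string it is asked to translate.
calls: List[str] = []


def fake_translate(text: str) -> str:
    calls.append(text)
    return text.upper()


result = translate_unique(["hallo", "danke", "hallo"], fake_translate)
```

In the notebook, `df['English'] = translate_unique(df['Text'], translate_text)` would replace the `apply` call. A plain dict is enough as a cache because the translations only need to live for one run.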
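TextBlob's polarity is a continuous score in [-1, 1]. Comparing it with categorical sentiment labels requires bucketing the score.

The sketch below assumes a symmetric neutral band of ±0.05 around zero and the label names `negative`, `neutral` and `positive`. The threshold is an arbitrary choice for illustration, not a TextBlob default. `polarity_to_label` is a hypothetical helper.

```python
def polarity_to_label(polarity: float, neutral_band: float = 0.05) -> str:
    """Map a TextBlob polarity score in [-1, 1] to a coarse sentiment label.

    The +/-0.05 neutral band is an illustrative assumption; tune it on labeled data.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError(f"polarity must be in [-1, 1], got {polarity}")
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"
```

Usage on the notebook's data might look like `df_test['label_textblob'] = df_test['polarity_textblob'].apply(polarity_to_label)`.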
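The notebook reads the Kaggle file with `encoding='unicode_escape'`. That codec decodes bytes as Latin-1 and also interprets backslash sequences, so a literal `\n` typed inside a text would become a real newline.

The sketch below tries candidate encodings in order and returns the first one that decodes cleanly. The candidate list (`utf-8`, then `cp1252`, then `latin-1`) is an assumption, not something the notebook specifies. Because `latin-1` maps every byte, it acts as a last resort that never fails. `decode_with_fallback` is a hypothetical helper.

```python
from typing import Sequence, Tuple


def decode_with_fallback(
    raw: bytes, encodings: Sequence[str] = ("utf-8", "cp1252", "latin-1")
) -> Tuple[str, str]:
    """Decode `raw` with the first encoding that succeeds; return (text, encoding)."""
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # this candidate cannot decode the bytes; try the next one
    raise ValueError(f"none of {list(encodings)} could decode the input")
```

The detected name could then be passed on, e.g. `pd.read_csv('kaggle-test.csv', encoding=enc)` after `_, enc = decode_with_fallback(open('kaggle-test.csv', 'rb').read())`. That keeps backslashes in the text untouched.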
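The final cell keeps the labeled `sentiment` column next to the TextBlob scores, so the scores can be checked against those labels once polarity is bucketed into categories. The sketch below computes accuracy and confusion counts in plain Python.

It assumes the predicted labels use the same strings as the Kaggle labels. Check the real values with `df_test['sentiment'].unique()` before relying on it. `agreement_report` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def agreement_report(
    gold: Iterable[str], predicted: Iterable[str]
) -> Tuple[float, Dict[Tuple[str, str], int]]:
    """Return (accuracy, counts of (gold, predicted) label pairs)."""
    gold = list(gold)
    predicted = list(predicted)
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("need at least one label")
    # Confusion counts keyed by (gold label, predicted label)
    pairs = Counter(zip(gold, predicted))
    correct = sum(n for (g, p), n in pairs.items() if g == p)
    return correct / len(gold), dict(pairs)
```

Given a Series `predicted` of bucketed labels, `agreement_report(df_test['sentiment'], predicted)` returns the overall agreement. The pair counts show which classes TextBlob confuses most often.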