{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<center>\n",
" <img src=\"https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/Logos/organization_logo/organization_logo.png\" width=\"300\" alt=\"cognitiveclass.ai logo\" />\n",
"</center>\n",
"\n",
"# Watson Speech to Text Translator\n",
"\n",
"Estimated time needed: **25** minutes\n",
"\n",
"## Objectives\n",
"\n",
"After completing this lab you will be able to:\n",
"\n",
"- Create Speech to Text Translator\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Introduction\n",
"\n",
"<p>In this notebook, you will learn to convert an audio file of an English speaker to text using a Speech to Text API. Then you will translate the English version to a Spanish version using a Language Translator API. <b>Note:</b> You must obtain the API keys and enpoints to complete the lab.</p>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<div class=\"alert alert-block alert-info\" style=\"margin-top: 20px\">\n",
"<h2>Table of Contents</h2>\n",
"<ul>\n",
" <li><a href=\"#ref0\">Speech To Text</a></li>\n",
" <li><a href=\"#ref1\">Language Translator</a></li>\n",
" <li><a href=\"#ref2\">Exercise</a></li>\n",
"</ul>\n",
"</div>\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Collecting ibm_watson\n",
"\u001b[?25l Downloading https://files.pythonhosted.org/packages/a2/3c/c2cfb41db546fe98820e89017c892d73991cef61b9c48680191fe703a214/ibm-watson-4.7.1.tar.gz (385kB)\n",
"\u001b[K |████████████████████████████████| 389kB 6.8MB/s eta 0:00:01\n",
"\u001b[?25hCollecting wget\n",
" Downloading https://files.pythonhosted.org/packages/47/6a/62e288da7bcda82b935ff0c6cfe542970f04e29c756b0e147251b2fb251f/wget-3.2.zip\n",
"Requirement already satisfied: requests<3.0,>=2.0 in /home/jupyterlab/conda/envs/python/lib/python3.6/site-packages (from ibm_watson) (2.24.0)\n",
"Requirement already satisfied: python_dateutil>=2.5.3 in /home/jupyterlab/conda/envs/python/lib/python3.6/site-packages (from ibm_watson) (2.8.1)\n",
"Collecting websocket-client==0.48.0 (from ibm_watson)\n",
"\u001b[?25l Downloading https://files.pythonhosted.org/packages/8a/a1/72ef9aa26cfe1a75cee09fc1957e4723add9de098c15719416a1ee89386b/websocket_client-0.48.0-py2.py3-none-any.whl (198kB)\n",
"\u001b[K |████████████████████████████████| 204kB 8.0MB/s eta 0:00:01\n",
"\u001b[?25hCollecting ibm_cloud_sdk_core==1.7.3 (from ibm_watson)\n",
" Downloading https://files.pythonhosted.org/packages/b7/23/aa9ae242f6348a1ed28fca2e6d3e76e043c3db951f9b516e1992518fe2c3/ibm-cloud-sdk-core-1.7.3.tar.gz\n",
"Requirement already satisfied: idna<3,>=2.5 in /home/jupyterlab/conda/envs/python/lib/python3.6/site-packages (from requests<3.0,>=2.0->ibm_watson) (2.10)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /home/jupyterlab/conda/envs/python/lib/python3.6/site-packages (from requests<3.0,>=2.0->ibm_watson) (2020.6.20)\n",
"Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /home/jupyterlab/conda/envs/python/lib/python3.6/site-packages (from requests<3.0,>=2.0->ibm_watson) (1.25.10)\n",
"Requirement already satisfied: chardet<4,>=3.0.2 in /home/jupyterlab/conda/envs/python/lib/python3.6/site-packages (from requests<3.0,>=2.0->ibm_watson) (3.0.4)\n",
"Requirement already satisfied: six>=1.5 in /home/jupyterlab/conda/envs/python/lib/python3.6/site-packages (from python_dateutil>=2.5.3->ibm_watson) (1.15.0)\n",
"Collecting PyJWT>=1.7.1 (from ibm_cloud_sdk_core==1.7.3->ibm_watson)\n",
" Downloading https://files.pythonhosted.org/packages/87/8b/6a9f14b5f781697e51259d81657e6048fd31a113229cf346880bb7545565/PyJWT-1.7.1-py2.py3-none-any.whl\n",
"Building wheels for collected packages: ibm-watson, wget, ibm-cloud-sdk-core\n",
" Building wheel for ibm-watson (setup.py) ... \u001b[?25ldone\n",
"\u001b[?25h Stored in directory: /home/jupyterlab/.cache/pip/wheels/6e/14/69/dbbd573a3bab3bf64984572284f13f174f430038308abdd73c\n",
" Building wheel for wget (setup.py) ... \u001b[?25ldone\n",
"\u001b[?25h Stored in directory: /home/jupyterlab/.cache/pip/wheels/40/15/30/7d8f7cea2902b4db79e3fea550d7d7b85ecb27ef992b618f3f\n",
" Building wheel for ibm-cloud-sdk-core (setup.py) ... \u001b[?25ldone\n",
"\u001b[?25h Stored in directory: /home/jupyterlab/.cache/pip/wheels/34/6e/58/589e0f841c2fae9dad99630d78ddc7a60c5c7663a16a39cdbb\n",
"Successfully built ibm-watson wget ibm-cloud-sdk-core\n",
"Installing collected packages: websocket-client, PyJWT, ibm-cloud-sdk-core, ibm-watson, wget\n",
"Successfully installed PyJWT-1.7.1 ibm-cloud-sdk-core-1.7.3 ibm-watson-4.7.1 websocket-client-0.48.0 wget-3.2\n"
]
}
],
"source": [
"#you will need the following library \n",
"!pip install ibm_watson wget"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h2 id=\"ref0\">Speech to Text</h2>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<p>First we import <code>SpeechToTextV1</code> from <code>ibm_watson</code>.For more information on the API, please click on this <a href=\"https://cloud.ibm.com/apidocs/speech-to-text?code=python\">link</a></p>\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"from ibm_watson import SpeechToTextV1 \n",
"import json\n",
"from ibm_cloud_sdk_core.authenticators import IAMAuthenticator"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<p>The service endpoint is based on the location of the service instance, we store the information in the variable URL. To find out which URL to use, view the service credentials.</p>\n"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"url_s2t = \"https://api.us-south.speech-to-text.watson.cloud.ibm.com/instances/706e8a32-a293-4560-9c16-c1d8035f61f0\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<p>You require an API key, and you can obtain the key on the <a href=\"https://cloud.ibm.com/resources\">Dashboard </a>.</p>\n"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"iam_apikey_s2t = \"LRKAFntI3Najc9aiaGDv7UYQfzno196vp8DmcXyw4ToU\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<p>You create a <a href=\"http://watson-developer-cloud.github.io/python-sdk/v0.25.0/apis/watson_developer_cloud.speech_to_text_v1.html\">Speech To Text Adapter object</a> the parameters are the endpoint and API key.</p>\n"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<ibm_watson.speech_to_text_v1_adapter.SpeechToTextV1Adapter at 0x7fa5ccb1d668>"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"authenticator = IAMAuthenticator(iam_apikey_s2t)\n",
"s2t = SpeechToTextV1(authenticator=authenticator)\n",
"s2t.set_service_url(url_s2t)\n",
"s2t"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<p>Lets download the audio file that we will use to convert into text.</p>\n"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--2020-10-18 11:38:55-- https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/PY0101EN/labs/PolynomialRegressionandPipelines.mp3\n",
"Resolving s3-api.us-geo.objectstorage.softlayer.net (s3-api.us-geo.objectstorage.softlayer.net)... 67.228.254.196\n",
"Connecting to s3-api.us-geo.objectstorage.softlayer.net (s3-api.us-geo.objectstorage.softlayer.net)|67.228.254.196|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 4234179 (4.0M) [audio/mpeg]\n",
"Saving to: ‘PolynomialRegressionandPipelines.mp3’\n",
"\n",
"PolynomialRegressio 100%[===================>] 4.04M 4.92MB/s in 0.8s \n",
"\n",
"2020-10-18 11:38:57 (4.92 MB/s) - ‘PolynomialRegressionandPipelines.mp3’ saved [4234179/4234179]\n",
"\n"
]
}
],
"source": [
"!wget -O PolynomialRegressionandPipelines.mp3 https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/PY0101EN/labs/PolynomialRegressionandPipelines.mp3\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<p>We have the path of the wav file we would like to convert to text</p>\n"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"filename='PolynomialRegressionandPipelines.mp3'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<p>We create the file object <code>wav</code> with the wav file using <code>open</code> ; we set the <code>mode</code> to \"rb\" , this is similar to read mode, but it ensures the file is in binary mode.We use the method <code>recognize</code> to return the recognized text. The parameter audio is the file object <code>wav</code>, the parameter <code>content_type</code> is the format of the audio file.</p>\n"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"with open(filename, mode=\"rb\") as wav:\n",
" response = s2t.recognize(audio=wav, content_type='audio/mp3')"
]
},
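{
"cell_type": "markdown",
"metadata": {},
"source": [
"<p>As a quick sanity check (a sketch, not part of the original lab), we can inspect the HTTP status code and the top-level keys of the response before digging into the transcript. This assumes the <code>get_status_code</code> and <code>get_result</code> accessors of the <code>DetailedResponse</code> returned by <code>recognize</code>.</p>\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: inspect the DetailedResponse returned by recognize().\n",
"# get_status_code() and get_result() are the standard DetailedResponse accessors.\n",
"print(response.get_status_code())          # e.g. 200 on success\n",
"print(list(response.get_result().keys()))  # expect 'result_index' and 'results'"
]
},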
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<p>The attribute result contains a dictionary that includes the translation:</p>\n"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'result_index': 0,\n",
" 'results': [{'final': True,\n",
" 'alternatives': [{'transcript': 'in this video we will cover polynomial regression and pipelines ',\n",
" 'confidence': 0.94}]},\n",
" {'final': True,\n",
" 'alternatives': [{'transcript': \"what do we do when a linear model is not the best fit for our data let's look into another type of regression model the polynomial regression we transform our data into a polynomial then use linear regression to fit the parameters that we will discuss pipelines pipelines are way to simplify your code \",\n",
" 'confidence': 0.9}]},\n",
" {'final': True,\n",
" 'alternatives': [{'transcript': \"polynomial regression is a special case of the general linear regression this method is beneficial for describing curvilinear relationships what is a curvilinear relationship it's what you get by squaring or setting higher order terms of the predictor variables in the model transforming the data the model can be quadratic which means the predictor variable in the model is squared we use a bracket to indicated as an exponent this is the second order polynomial regression with a figure representing the function \",\n",
" 'confidence': 0.95}]},\n",
" {'final': True,\n",
" 'alternatives': [{'transcript': 'the model can be cubic which means the predictor variable is cute this is the third order polynomial regression we see by examining the figure that the function has more variation ',\n",
" 'confidence': 0.95}]},\n",
" {'final': True,\n",
" 'alternatives': [{'transcript': \"there also exists higher order polynomial regressions when a good fit hasn't been achieved by second or third order we can see in figures how much the graphs change when we change the order of the polynomial regression the degree of the regression makes a big difference and can result in a better fit if you pick the right value in all cases the relationship between the variable in the parameter is always linear \",\n",
" 'confidence': 0.91}]},\n",
" {'final': True,\n",
" 'alternatives': [{'transcript': \"let's look at an example from our data we generate a polynomial regression model \",\n",
" 'confidence': 0.89}]},\n",
" {'final': True,\n",
" 'alternatives': [{'transcript': 'in python we do this by using the poly fit function in this example we develop a third order polynomial regression model base we can print out the model symbolic form for the model is given by the following expression ',\n",
" 'confidence': 0.92}]},\n",
" {'final': True,\n",
" 'alternatives': [{'transcript': \"negative one point five five seven X. one cute plus two hundred four point eight X. one squared plus eight thousand nine hundred sixty five X. one plus one point three seven times ten to the power of five we can also have multi dimensional polynomial linear regression the expression can get complicated here are just some of the terms for two dimensional second order polynomial none pies poly fit function cannot perform this type of regression we use the preprocessing librarian scikit learn to create a polynomial feature object the constructor takes the degree of the polynomial as a parameter then we transform the features into a polynomial feature with the fit underscore transform method let's do a more intuitive example \",\n",
" 'confidence': 0.9}]},\n",
" {'final': True,\n",
" 'alternatives': [{'transcript': 'consider the feature shown here applying the method we transform the data we now have a new set of features that are transformed version of our original features as that I mention of the data gets larger we may want to normalize multiple features as scikit learn instead we can use the preprocessing module to simplify many tasks for example we can standardize each feature simultaneously we import standard scaler we train the object fit the scale object then transform the data into a new data frame on a rate X. underscore scale there are more normalization methods available in the pre processing library as well as other transformations we can simplify our code by using a pipeline library there are many steps to getting a prediction for example normalization polynomial transform and linear regression we simplify the process using a pipeline ',\n",
" 'confidence': 0.9}]},\n",
" {'final': True,\n",
" 'alternatives': [{'transcript': 'pipeline sequentially perform a series of transformations the last step carries out a prediction first we import all the modules we need then we import the library pipeline we create a list of topples the first element in the topple contains the name of the estimator model the second element contains model constructor we input the list in the pipeline constructor we now have a pipeline object we can train the pipeline by applying the train method to the pipeline object we can also produce a prediction as well ',\n",
" 'confidence': 0.89}]},\n",
" {'final': True,\n",
" 'alternatives': [{'transcript': 'the method normalizes the data performs a polynomial transform then outputs a prediction ',\n",
" 'confidence': 0.89}]}]}"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"response.result"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/jupyterlab/conda/envs/python/lib/python3.6/site-packages/ipykernel_launcher.py:3: FutureWarning: pandas.io.json.json_normalize is deprecated, use pandas.json_normalize instead\n",
" This is separate from the ipykernel package so we can avoid doing imports until\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>transcript</th>\n",
" <th>confidence</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>in this video we will cover polynomial regress...</td>\n",
" <td>0.94</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>what do we do when a linear model is not the b...</td>\n",
" <td>0.90</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>polynomial regression is a special case of the...</td>\n",
" <td>0.95</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>the model can be cubic which means the predict...</td>\n",
" <td>0.95</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>there also exists higher order polynomial regr...</td>\n",
" <td>0.91</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>let's look at an example from our data we gene...</td>\n",
" <td>0.89</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>in python we do this by using the poly fit fun...</td>\n",
" <td>0.92</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>negative one point five five seven X. one cute...</td>\n",
" <td>0.90</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>consider the feature shown here applying the m...</td>\n",
" <td>0.90</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>pipeline sequentially perform a series of tran...</td>\n",
" <td>0.89</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>the method normalizes the data performs a poly...</td>\n",
" <td>0.89</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" transcript confidence\n",
"0 in this video we will cover polynomial regress... 0.94\n",
"1 what do we do when a linear model is not the b... 0.90\n",
"2 polynomial regression is a special case of the... 0.95\n",
"3 the model can be cubic which means the predict... 0.95\n",
"4 there also exists higher order polynomial regr... 0.91\n",
"5 let's look at an example from our data we gene... 0.89\n",
"6 in python we do this by using the poly fit fun... 0.92\n",
"7 negative one point five five seven X. one cute... 0.90\n",
"8 consider the feature shown here applying the m... 0.90\n",
"9 pipeline sequentially perform a series of tran... 0.89\n",
"10 the method normalizes the data performs a poly... 0.89"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from pandas.io.json import json_normalize\n",
"\n",
"json_normalize(response.result['results'],\"alternatives\")"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<ibm_cloud_sdk_core.detailed_response.DetailedResponse at 0x7fa5ccb1db38>"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"response"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<p>We can obtain the recognized text and assign it to the variable <code>recognized_text</code>:</p>\n"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"data": {
"text/plain": [
"'in this video we will cover polynomial regression and pipelines '"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"recognized_text=response.result['results'][0][\"alternatives\"][0][\"transcript\"]\n",
"recognized_text\n",
"#type(recognized_text)"
]
},
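{
"cell_type": "markdown",
"metadata": {},
"source": [
"<p>The cell above extracts only the first recognized segment. As a sketch (not in the original lab), we can join every segment in <code>response.result['results']</code> into a single transcript string:</p>\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: concatenate the transcript of every result segment into one string.\n",
"# Each segment already ends with a space, so plain concatenation keeps the words separated.\n",
"full_transcript = ''.join(\n",
"    result['alternatives'][0]['transcript']\n",
"    for result in response.result['results'])\n",
"full_transcript[:200]  # preview the first 200 characters"
]
},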
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h2 id=\"ref1\">Language Translator</h2>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<p>First we import <code>LanguageTranslatorV3</code> from ibm_watson. For more information on the API click <a href=\"https://cloud.ibm.com/apidocs/language-translator?code=python\"> here</a></p>\n"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [],
"source": [
"from ibm_watson import LanguageTranslatorV3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<p>The service endpoint is based on the location of the service instance, we store the information in the variable URL. To find out which URL to use, view the service credentials.</p>\n"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"url_lt='https://api.us-south.language-translator.watson.cloud.ibm.com/instances/21332ea1-3d3e-44a7-8de8-194809b142fe'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<p>You require an API key, and you can obtain the key on the <a href=\"https://cloud.ibm.com/resources\">Dashboard</a>.</p>\n"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"apikey_lt='D7UP0qh92eS0WdN48Owu_0LPpGw7ebI2x7paVIDAnvSf'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<p>API requests require a version parameter that takes a date in the format version=YYYY-MM-DD. This lab describes the current version of Language Translator, 2018-05-01</p>\n"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"version_lt='2018-05-01'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<p>we create a Language Translator object <code>language_translator</code>:</p>\n"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"data": {
"text/plain": [
"<ibm_watson.language_translator_v3.LanguageTranslatorV3 at 0x7fa5357a65c0>"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"authenticator = IAMAuthenticator(apikey_lt)\n",
"language_translator = LanguageTranslatorV3(version=version_lt,authenticator=authenticator)\n",
"language_translator.set_service_url(url_lt)\n",
"language_translator"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<p>We can get a Lists the languages that the service can identify.\n",
"The method Returns the language code. For example English (en) to Spanis (es) and name of each language.</p>\n"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/jupyterlab/conda/envs/python/lib/python3.6/site-packages/ipykernel_launcher.py:3: FutureWarning: pandas.io.json.json_normalize is deprecated, use pandas.json_normalize instead\n",
" This is separate from the ipykernel package so we can avoid doing imports until\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>language</th>\n",
" <th>name</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>af</td>\n",
" <td>Afrikaans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>ar</td>\n",
" <td>Arabic</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>az</td>\n",
" <td>Azerbaijani</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>ba</td>\n",
" <td>Bashkir</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>be</td>\n",
" <td>Belarusian</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>71</th>\n",
" <td>uk</td>\n",
" <td>Ukrainian</td>\n",
" </tr>\n",
" <tr>\n",
" <th>72</th>\n",
" <td>ur</td>\n",
" <td>Urdu</td>\n",
" </tr>\n",
" <tr>\n",
" <th>73</th>\n",
" <td>vi</td>\n",
" <td>Vietnamese</td>\n",
" </tr>\n",
" <tr>\n",
" <th>74</th>\n",
" <td>zh</td>\n",
" <td>Simplified Chinese</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75</th>\n",
" <td>zh-TW</td>\n",
" <td>Traditional Chinese</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>76 rows × 2 columns</p>\n",
"</div>"
],
"text/plain": [
" language name\n",
"0 af Afrikaans\n",
"1 ar Arabic\n",
"2 az Azerbaijani\n",
"3 ba Bashkir\n",
"4 be Belarusian\n",
".. ... ...\n",
"71 uk Ukrainian\n",
"72 ur Urdu\n",
"73 vi Vietnamese\n",
"74 zh Simplified Chinese\n",
"75 zh-TW Traditional Chinese\n",
"\n",
"[76 rows x 2 columns]"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from pandas.io.json import json_normalize\n",
"\n",
"json_normalize(language_translator.list_identifiable_languages().get_result(), \"languages\")"
]
},
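{
"cell_type": "markdown",
"metadata": {},
"source": [
"<p>To see which <code>model_id</code> values (such as 'en-es' or 'en-ko') the service supports, we can also list the available translation models. This is a sketch, not part of the original lab; it assumes the <code>list_models</code> method described in the Language Translator API documentation linked above.</p>\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: list the translation models the service offers and show their IDs.\n",
"models = language_translator.list_models().get_result()\n",
"json_normalize(models, 'models')[['model_id', 'source', 'target']]"
]
},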
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<p>We can use the method <code>translate</code> this will translate the text. The parameter text is the text. Model_id is the type of model we would like to use use we use list the language . In this case, we set it to 'en-es' or English to Spanish. We get a Detailed Response object translation_response</p>\n"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"data": {
"text/plain": [
"<ibm_cloud_sdk_core.detailed_response.DetailedResponse at 0x7fa5357ca9b0>"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"translation_response = language_translator.translate(\\\n",
" text=recognized_text, model_id='en-ko') # ko for korean\n",
"translation_response"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<p>The result is a dictionary.</p>\n"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"data": {
"text/plain": [
"{'translations': [{'translation': '이 비디오에서 우리는 다항식 회귀와 파이프라인을 다룰 것이다. '}],\n",
" 'word_count': 10,\n",
" 'character_count': 64}"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"translation=translation_response.get_result()\n",
"translation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<p>We can obtain the actual translation as a string as follows:</p>\n"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'이 비디오에서 우리는 다항식 회귀와 파이프라인을 다룰 것이다. '"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"spanish_translation =translation['translations'][0]['translation']\n",
"spanish_translation "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<p>We can translate back to English</p>\n"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [],
"source": [
"translation_new = language_translator.translate(text=spanish_translation ,model_id='ko-en').get_result()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<p>We can obtain the actual translation as a string as follows:</p>\n"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'In this video, we will deal with polynomial regression and pipelines. '"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"translation_eng=translation_new['translations'][0]['translation']\n",
"translation_eng"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<p>We can convert it to French as well:</p>\n"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [],
"source": [
"French_translation=language_translator.translate(\n",
" text=translation_eng , model_id='en-fr').get_result()"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Dans cette vidéo, nous traiterons de la régression polynomiale et des pipelines. '"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"French_translation['translations'][0]['translation']"
]
},
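{
"cell_type": "markdown",
"metadata": {},
"source": [
"<p>Putting the two services together, here is a sketch (not part of the original lab) of a small helper that transcribes an audio file and translates the first recognized segment. It reuses the <code>s2t</code> and <code>language_translator</code> objects created above; the function name and the default <code>model_id</code> are illustrative choices.</p>\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: combine Speech to Text and Language Translator into one helper.\n",
"# Transcribes an audio file, then translates the first recognized segment.\n",
"def speech_to_translated_text(audio_path, model_id='en-es'):\n",
"    with open(audio_path, mode='rb') as audio:\n",
"        stt_result = s2t.recognize(audio=audio, content_type='audio/mp3').get_result()\n",
"    transcript = stt_result['results'][0]['alternatives'][0]['transcript']\n",
"    translated = language_translator.translate(\n",
"        text=transcript, model_id=model_id).get_result()\n",
"    return translated['translations'][0]['translation']\n",
"\n",
"# Example usage (reuses the mp3 downloaded earlier):\n",
"# speech_to_translated_text('PolynomialRegressionandPipelines.mp3', model_id='en-fr')"
]
},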
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h3>Language Translator</h3>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" <a href=\"https://cloud.ibm.com/catalog/services/watson-studio\"><img src=\"https://ibm.box.com/shared/static/irypdxea2q4th88zu1o1tsd06dya10go.png\" width=\"750\" align=\"center\"></a>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<b>References</b>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[https://cloud.ibm.com/apidocs/speech-to-text?code=python](https://cloud.ibm.com/apidocs/speech-to-text?code=python&cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-PY0101EN-SkillsNetwork-19487395&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[https://cloud.ibm.com/apidocs/language-translator?code=python](https://cloud.ibm.com/apidocs/language-translator?code=python&cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-PY0101EN-SkillsNetwork-19487395&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<hr>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Authors:\n",
"\n",
" [Joseph Santarcangelo](https://www.linkedin.com/in/joseph-s-50398b136?cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-PY0101EN-SkillsNetwork-19487395&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ&cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-PY0101EN-SkillsNetwork-19487395&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ) \n",
"\n",
"Joseph Santarcangelo has a PhD in Electrical Engineering, his research focused on using machine learning, signal processing, and computer vision to determine how videos impact human cognition. Joseph has been working for IBM since he completed his PhD.\n",
"\n",
"## Other Contributor(s)\n",
"\n",
"<a href=\"https://www.linkedin.com/in/fanjiang0619/\">Fan Jiang</a>\n",
"\n",
"## Change Log\n",
"\n",
"| Date (YYYY-MM-DD) | Version | Changed By | Change Description |\n",
"| ----------------- | ------- | ---------- | ---------------------------------- |\n",
"| 2020-08-26 | 2.0 | Lavanya | Moved lab to course repo in GitLab |\n",
"| | | | |\n",
"| | | | |\n",
"\n",
"<hr/>\n",
"\n",
"## <h3 align=\"center\"> © IBM Corporation 2020. All rights reserved. <h3/>\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python",
"language": "python",
"name": "conda-env-python-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.11"
}
},
"nbformat": 4,
"nbformat_minor": 4
}