@OlegJakushkin
Last active January 12, 2021 17:46
Translating LaTeX using Google Translate V3 In Colaboratory
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "Google Translate V3 OJ",
"provenance": [],
"collapsed_sections": [],
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "S5EgSNLvXywU"
},
"source": [
"# Google Cloud Platform - Using Machine Learning APIs ).\n",
"\n",
"This is an upgraded Python revision of [this notebook](https://github.com/GoogleCloudPlatform/training-data-analyst/blob/master/CPB100/lab4c/mlapis.ipynb).\n",
"\n",
"This notebook originally was being processed using DataLab on the Google Cloud Platform. This particular incarnation of the notebook is for running on Google Colaboratory which I am trying out for the first time.\n",
"\n",
"### Security\n",
"\n",
"First things first - we need to authenticate against the Google Cloud APIs.\n",
"\n",
"#### Getting a Google API Credential.\n",
"\n",
"First, visit <a href=\"http://console.cloud.google.com/apis\">API console</a>, choose \"Credentials\" on the left-hand menu. Choose \"Create Credentials\" and generate an API key for your application. You should probably restrict it by IP address to prevent abuse, but for now, just leave that field blank and delete the API key after trying out this demo.\n",
"\n",
"Then, when you have your key, you will enter it in this first executable cell:"
]
},
{
"cell_type": "code",
"metadata": {
"id": "WxMSTfPdfoaO",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "cc8ff8f6-abeb-4bf0-9601-4d6ef3d4c7a4"
},
"source": [
"import getpass\n",
"\n",
"APIKEY = getpass.getpass()"
],
"execution_count": 1,
"outputs": [
{
"output_type": "stream",
"text": [
"··········\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "2w4bHiuYXywg"
},
"source": [
"From the same API console, choose \"Dashboard\" on the left-hand menu and \"Enable API\".\n",
"\n",
"Enable the following APIs for your project (search for them) if they are not already enabled:\n",
"<ol>\n",
"<li> Google Translate API </li>\n",
"<li> Google Cloud Vision API </li>\n",
"<li> Google Natural Language API </li>\n",
"<li> Google Cloud Speech API </li>\n",
"</ol>\n",
"\n",
"Finally, because we are calling the APIs from Python (clients in many other languages are available), let's install the Python package (it's not installed by default on Datalab).\n",
"\n",
"```!pip install --upgrade pip```\n",
"\n",
"```!pip install --upgrade google-api-python-client```\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "QqJHODYey6Pf",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "6733299d-ff22-4685-e6b9-7f3a01ff0957"
},
"source": [
"!pip install --upgrade pip\r\n",
"!pip install --upgrade google-api-python-client"
],
"execution_count": 2,
"outputs": [
{
"output_type": "stream",
"text": [
"Requirement already satisfied: pip in /usr/local/lib/python3.6/dist-packages (20.3.3)\n",
"Requirement already satisfied: google-api-python-client in /usr/local/lib/python3.6/dist-packages (1.12.8)\n",
"Requirement already satisfied: google-auth>=1.16.0 in /usr/local/lib/python3.6/dist-packages (from google-api-python-client) (1.24.0)\n",
"Requirement already satisfied: uritemplate<4dev,>=3.0.0 in /usr/local/lib/python3.6/dist-packages (from google-api-python-client) (3.0.1)\n",
"Requirement already satisfied: httplib2<1dev,>=0.15.0 in /usr/local/lib/python3.6/dist-packages (from google-api-python-client) (0.17.4)\n",
"Requirement already satisfied: six<2dev,>=1.13.0 in /usr/local/lib/python3.6/dist-packages (from google-api-python-client) (1.15.0)\n",
"Requirement already satisfied: google-api-core<2dev,>=1.21.0 in /usr/local/lib/python3.6/dist-packages (from google-api-python-client) (1.24.1)\n",
"Requirement already satisfied: google-auth-httplib2>=0.0.3 in /usr/local/lib/python3.6/dist-packages (from google-api-python-client) (0.0.4)\n",
"Requirement already satisfied: pytz in /usr/local/lib/python3.6/dist-packages (from google-api-core<2dev,>=1.21.0->google-api-python-client) (2018.9)\n",
"Requirement already satisfied: googleapis-common-protos<2.0dev,>=1.6.0 in /usr/local/lib/python3.6/dist-packages (from google-api-core<2dev,>=1.21.0->google-api-python-client) (1.52.0)\n",
"Requirement already satisfied: requests<3.0.0dev,>=2.18.0 in /usr/local/lib/python3.6/dist-packages (from google-api-core<2dev,>=1.21.0->google-api-python-client) (2.23.0)\n",
"Requirement already satisfied: setuptools>=34.0.0 in /usr/local/lib/python3.6/dist-packages (from google-api-core<2dev,>=1.21.0->google-api-python-client) (51.1.1)\n",
"Requirement already satisfied: protobuf>=3.12.0 in /usr/local/lib/python3.6/dist-packages (from google-api-core<2dev,>=1.21.0->google-api-python-client) (3.12.4)\n",
"Requirement already satisfied: pyasn1-modules>=0.2.1 in /usr/local/lib/python3.6/dist-packages (from google-auth>=1.16.0->google-api-python-client) (0.2.8)\n",
"Requirement already satisfied: cachetools<5.0,>=2.0.0 in /usr/local/lib/python3.6/dist-packages (from google-auth>=1.16.0->google-api-python-client) (4.2.0)\n",
"Requirement already satisfied: rsa<5,>=3.1.4 in /usr/local/lib/python3.6/dist-packages (from google-auth>=1.16.0->google-api-python-client) (4.6)\n",
"Requirement already satisfied: pyasn1<0.5.0,>=0.4.6 in /usr/local/lib/python3.6/dist-packages (from pyasn1-modules>=0.2.1->google-auth>=1.16.0->google-api-python-client) (0.4.8)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.6/dist-packages (from requests<3.0.0dev,>=2.18.0->google-api-core<2dev,>=1.21.0->google-api-python-client) (2020.12.5)\n",
"Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.6/dist-packages (from requests<3.0.0dev,>=2.18.0->google-api-core<2dev,>=1.21.0->google-api-python-client) (2.10)\n",
"Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.6/dist-packages (from requests<3.0.0dev,>=2.18.0->google-api-core<2dev,>=1.21.0->google-api-python-client) (1.24.3)\n",
"Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.6/dist-packages (from requests<3.0.0dev,>=2.18.0->google-api-core<2dev,>=1.21.0->google-api-python-client) (3.0.4)\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "iTdwxe76Xywy"
},
"source": [
"## Invoke Translate API\n",
"\n",
"[Google Cloud Translation](https://cloud.google.com/translate/docs/) documentation. I know a lot has gone on here - see [my LSTM notebook](https://nbviewer.jupyter.org/github/jeffreyrnorton/Notebooks_MachineLearning/blob/master/DeepNetsWithKeras_ANN_LSTM.ipynb) where I trained a Seq2Seq LSTM network to do translation on a relatively small vocabulary.\n",
"\n",
"Also note that this is a service. The translation is not happening on the VM running the notebook, but is running as a service. This is where we start seeing the true power of cloud compute!"
]
},
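{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough sketch of what such a service call could look like with `google-api-python-client` (installed above), assuming the v2 `translations().list` method and the `APIKEY` entered earlier. The helper names `extract_translations` and `translate_v2` are made up for this illustration, and the response shape is an assumption based on the v2 API rather than output captured from this notebook.\n",
"\n",
"```python\n",
"def extract_translations(inputs, response):\n",
"    '''Pair each input string with its translatedText from a v2-style response dict.'''\n",
"    return [(text, item['translatedText'])\n",
"            for text, item in zip(inputs, response['translations'])]\n",
"\n",
"\n",
"def translate_v2(api_key, inputs, source='en', target='fr'):\n",
"    '''Call the Translation v2 API; needs network access and a valid API key.'''\n",
"    # Imported lazily so extract_translations works without the client library.\n",
"    from googleapiclient.discovery import build\n",
"    service = build('translate', 'v2', developerKey=api_key)\n",
"    response = service.translations().list(source=source, target=target, q=inputs).execute()\n",
"    return extract_translations(inputs, response)\n",
"\n",
"\n",
"# Offline check with a hand-written response, not real API output.\n",
"fake = {'translations': [{'translatedText': 'bonjour'}]}\n",
"print(extract_translations(['hello'], fake))\n",
"```\n",
"\n",
"With a real key, `translate_v2(APIKEY, ['hello'])` would send the request to the service; nothing is translated on the notebook VM itself."
]
},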
{
"cell_type": "code",
"metadata": {
"id": "HjjvlaPkXyw0"
},
"source": [
""
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "tueBLHcXXyw6"
},
"source": [
"That is really cool - how would a Gallego (a person from Galicia in the Northwest corner of Spain) say it?"
]
},
{
"cell_type": "code",
"metadata": {
"id": "YiUqUjZoXyw8"
},
"source": [
""
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "MVqvz5erXyxE"
},
"source": [
"## Invoke Vision API\n",
"\n",
"The [Vision API](https://cloud.google.com/vision/docs/) can work off an image in Cloud Storage or embedded directly into a POST message. I'll use Cloud Storage and do OCR on this image: <img src=\"https://storage.googleapis.com/cloud-training-demos/vision/sign2.jpg\" width=\"200\" />. \n",
"That photograph is from http://www.publicdomainpictures.net/view-image.php?image=15842."
]
},
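{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of the request body for OCR on an image in Cloud Storage, assuming the Vision v1 `images.annotate` JSON format. The helper name `ocr_request_from_gcs` is hypothetical, and the `gs://` path is inferred from the public URL of the sign image above.\n",
"\n",
"```python\n",
"import json\n",
"\n",
"\n",
"def ocr_request_from_gcs(gcs_uri, max_results=3):\n",
"    '''Build an images.annotate request body for OCR on a Cloud Storage image.'''\n",
"    return {\n",
"        'requests': [{\n",
"            'image': {'source': {'gcsImageUri': gcs_uri}},\n",
"            'features': [{'type': 'TEXT_DETECTION', 'maxResults': max_results}],\n",
"        }]\n",
"    }\n",
"\n",
"\n",
"# gs:// path inferred from the public URL of the sign image (assumption).\n",
"body = ocr_request_from_gcs('gs://cloud-training-demos/vision/sign2.jpg')\n",
"print(json.dumps(body, indent=2))\n",
"```\n",
"\n",
"With the client library, a body like this could be passed to `build('vision', 'v1', developerKey=APIKEY).images().annotate(body=body).execute()`."
]
},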
{
"cell_type": "code",
"metadata": {
"id": "MeyNji74XyxG"
},
"source": [
""
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "xo82qdtIXyxO"
},
"source": [
""
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "Buuu_L0SXyxa"
},
"source": [
"## Translate sign\n",
"\n",
"I don't read Chinese - what does it say. Let's run it through the translator."
]
},
{
"cell_type": "code",
"metadata": {
"id": "3bte0ys8Xyxa"
},
"source": [
""
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "xHsP9QTJXyxg"
},
"source": [
"## More OCR with the Vision API\n",
"OCR intrigues me - it is actually quite difficult to do well and there are engines like Tesseract that aren't too bad. So I want to try the engine with a POST and see if it can extract some English for me, but on a very difficult sign (text with skew):"
]
},
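{
"cell_type": "markdown",
"metadata": {},
"source": [
"For the POST variant, the image bytes are base64-encoded and embedded in the request instead of referenced by a `gs://` path. The sketch below assumes the Vision v1 `image.content` field; `ocr_request_from_bytes` is a hypothetical helper and the bytes are a placeholder, not a real sign photo.\n",
"\n",
"```python\n",
"import base64\n",
"\n",
"\n",
"def ocr_request_from_bytes(image_bytes):\n",
"    '''Build an images.annotate request body with the image embedded as base64.'''\n",
"    encoded = base64.b64encode(image_bytes).decode('ascii')\n",
"    return {\n",
"        'requests': [{\n",
"            'image': {'content': encoded},\n",
"            'features': [{'type': 'TEXT_DETECTION'}],\n",
"        }]\n",
"    }\n",
"\n",
"\n",
"# Placeholder bytes stand in for a real image file read with open(path, 'rb').\n",
"body = ocr_request_from_bytes(b'not a real image')\n",
"print(body['requests'][0]['image']['content'])\n",
"```\n",
"\n",
"Embedding the content makes the request larger, but it works for images that are not stored in a bucket."
]
},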
{
"cell_type": "code",
"metadata": {
"id": "dvUj1VwijpvG"
},
"source": [
""
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "GbTiAkidXyxi"
},
"source": [
""
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "HrDpMknvXyxq"
},
"source": [
"Checking the end of the text with the sign - how did we do? We missed one full line of text - so OCR still remains a difficult text, even for Google!"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "We0OEsrXXyxq"
},
"source": [
"### PDF Document Translation\n",
"\n",
"The OCR above works in general on images. However, there also is a service which operates on PDF documents - as they [say](https://cloud.google.com/vision/docs/ocr), small dense text.\n",
"\n",
"Let's try this page from the writings of everybody's favorite fat king eating turkey legs and screaming out \"Call the Executioner\" - King Henry the VIII:\n",
"![](https://storage.googleapis.com/scoobie_earthquakes/HenryVII_863arabic_0202.jpg)"
]
},
{
"cell_type": "code",
"metadata": {
"id": "wqF7wmn6Xyxu"
},
"source": [
""
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "2WIrguLjXyx0"
},
"source": [
"## Sentiment analysis with Language API\n",
"\n",
"Let's evaluate the sentiment of some famous quotes using [Google Cloud Natural Language API](https://cloud.google.com/natural-language/docs/)."
]
},
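{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a sketch of how a quote could be prepared and its result read, assuming the Natural Language v1 `documents.analyzeSentiment` format, where `score` runs from -1 (negative) to 1 (positive). The helper names and the 0.25 threshold are arbitrary choices for this illustration, and the response below is hand-written.\n",
"\n",
"```python\n",
"def sentiment_request(text):\n",
"    '''Build an analyzeSentiment request body for plain text.'''\n",
"    return {'document': {'type': 'PLAIN_TEXT', 'content': text}}\n",
"\n",
"\n",
"def describe_sentiment(response, threshold=0.25):\n",
"    '''Label a response as negative, neutral, or positive from its score.'''\n",
"    score = response['documentSentiment']['score']\n",
"    if score <= -threshold:\n",
"        return 'negative'\n",
"    if score >= threshold:\n",
"        return 'positive'\n",
"    return 'neutral'\n",
"\n",
"\n",
"# Hand-written response for illustration, not real API output.\n",
"fake = {'documentSentiment': {'score': -0.8, 'magnitude': 0.8}}\n",
"print(describe_sentiment(fake))\n",
"```\n",
"\n",
"The `magnitude` field is ignored here; it measures the overall strength of emotion regardless of sign."
]
},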
{
"cell_type": "code",
"metadata": {
"id": "0Tqux2TOXyx2"
},
"source": [
""
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "860-c_rPXyx6"
},
"source": [
"In a [paper](https://arxiv.org/pdf/1010.3003.pdf) published in 2010 by Bollen et al, it was claimed that there was 87% correlation between tweets and the stock market. In January 2013, the following *false* tweet was sent which [momentarily sent Serepta Therapeutics falling](http://fortune.com/2015/12/07/dataminr-hedge-funds-twitter-data/), but when investors realized the ruse, it quickly recovered. Let's process the [tweet](http://kiddynamitesworld.com/the-sec-needs-to-arrest-some-people/).\n",
"\n",
"\"$SRPT FDA steps in as its 48 weeks results on Eteplirsen results are tainted and have been doctored they believeTrial papers seized by FDA.\""
]
},
{
"cell_type": "code",
"metadata": {
"id": "JKQykWnmXyx8"
},
"source": [
""
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "7LRa6r8aXyyI"
},
"source": [
"And we see that this *is* a very negative statement - no wonder it impacted the market as Forbes relates."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "YIDoG6EpXyyK"
},
"source": [
"<h2> Speech API </h2>\n",
"\n",
"The [Speech API](https://cloud.google.com/speech-to-text/docs/) can work on streaming data, audio content encoded and embedded directly into the POST message, or on a file on Cloud Storage. Pass in this <a href=\"https://storage.googleapis.com/cloud-training-demos/vision/audio.raw\">audio file</a> from Cloud Storage."
]
},
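{
"cell_type": "markdown",
"metadata": {},
"source": [
"A sketch of the request for a file on Cloud Storage, assuming the Speech v1 `speech.recognize` JSON format. The helper names are hypothetical, and the encoding (`LINEAR16`) and 16000 Hz sample rate are assumptions about the audio file, not values confirmed in this notebook.\n",
"\n",
"```python\n",
"def speech_request(gcs_uri, sample_rate_hertz=16000, language_code='en-US'):\n",
"    '''Build a recognize request body for raw audio stored in a bucket.'''\n",
"    return {\n",
"        'config': {\n",
"            'encoding': 'LINEAR16',  # assumed encoding of the raw file\n",
"            'sampleRateHertz': sample_rate_hertz,\n",
"            'languageCode': language_code,\n",
"        },\n",
"        'audio': {'uri': gcs_uri},\n",
"    }\n",
"\n",
"\n",
"def best_transcripts(response):\n",
"    '''Return the first listed transcript of each result in a response dict.'''\n",
"    return [r['alternatives'][0]['transcript'] for r in response.get('results', [])]\n",
"\n",
"\n",
"# gs:// path inferred from the public URL of the audio file (assumption).\n",
"body = speech_request('gs://cloud-training-demos/vision/audio.raw')\n",
"print(body['audio']['uri'])\n",
"```\n",
"\n",
"An empty response (no speech recognized) simply yields an empty list from `best_transcripts`."
]
},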
{
"cell_type": "code",
"metadata": {
"id": "3ufELN3cXyyS"
},
"source": [
""
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "fDSX-f3wXyyY"
},
"source": [
""
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "rxT5FFqaXyye"
},
"source": [
"## Challenge Exercise\n",
"\n",
"Here are a few portraits from the Metropolitan Museum of Art, New York (they are part of a [BigQuery public dataset](https://bigquery.cloud.google.com/dataset/bigquery-public-data:the_met) ):\n",
"\n",
"gs://gcs-public-data--met/14295/0.jpg \n",
"<img src=\"https://raw.githubusercontent.com/jeffreyrnorton/Notebooks_MachineLearning/master/images/14295.jpg\" width=400>\n",
"\n",
"gs://gcs-public-data--met/15091/0.jpg \n",
"<img src=\"https://raw.githubusercontent.com/jeffreyrnorton/Notebooks_MachineLearning/master/images/15091.jpg\" width=400>\n",
"\n",
"(Two given in the original assignment are not publically available and what good is it to tell you unhappy or happy when you can't see the photo?)\n",
"\n",
"Use the Vision API to identify which of these images depict happy people and which ones depict unhappy people.\n",
"\n",
"Hint: You will need to look for joyLikelihood and/or sorrowLikelihood from the response."
]
},
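{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to act on the hint, as a sketch: the Vision API reports likelihoods as enum names, so they can be ranked and compared. The `mood` helper and its labels are made up for this illustration, and the face dictionaries below are hand-written rather than real responses.\n",
"\n",
"```python\n",
"# Likelihood enum names in increasing order of confidence.\n",
"LIKELIHOODS = ['UNKNOWN', 'VERY_UNLIKELY', 'UNLIKELY', 'POSSIBLE', 'LIKELY', 'VERY_LIKELY']\n",
"\n",
"\n",
"def mood(face):\n",
"    '''Compare joyLikelihood and sorrowLikelihood of one face annotation.'''\n",
"    joy = LIKELIHOODS.index(face.get('joyLikelihood', 'UNKNOWN'))\n",
"    sorrow = LIKELIHOODS.index(face.get('sorrowLikelihood', 'UNKNOWN'))\n",
"    if joy > sorrow:\n",
"        return 'happy'\n",
"    if sorrow > joy:\n",
"        return 'unhappy'\n",
"    return 'unclear'\n",
"\n",
"\n",
"# Hand-written annotations for illustration only.\n",
"print(mood({'joyLikelihood': 'VERY_LIKELY', 'sorrowLikelihood': 'VERY_UNLIKELY'}))\n",
"print(mood({'joyLikelihood': 'VERY_UNLIKELY', 'sorrowLikelihood': 'LIKELY'}))\n",
"```\n",
"\n",
"Ties, including two `VERY_UNLIKELY` values for a neutral portrait, come back as `unclear`."
]
},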
{
"cell_type": "code",
"metadata": {
"id": "_6JEstlzXyyi"
},
"source": [
""
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "VpR9BzotXyyu"
},
"source": [
"As a matter of interest - we can crop to the face with the following code:\n",
"```\n",
"xlow = ylow = 100000\n",
"xhigh = yhigh = -1\n",
"for point in responses['responses'][0]['faceAnnotations'][0]['boundingPoly']['vertices']:\n",
" x = point['x']\n",
" y = point['y']\n",
" if x < xlow: xlow = x\n",
" if y < ylow: ylow = y\n",
" if x > xhigh: xhigh = x\n",
" if y > yhigh: yhigh = y\n",
"\n",
"from PIL import Image\n",
"import urllib.request\n",
"\n",
"url = \"https://raw.githubusercontent.com/jeffreyrnorton/Notebooks_MachineLearning/master/images/14295.jpg\"\n",
"response = urllib.request.urlretrieve(url, \"tmp/i.jpg\")\n",
"img = Image.open(\"tmp/i.jpg\")\n",
"img2 = img.crop((xlow, ylow, xhigh, yhigh))\n",
"img2.save(\"tmp/img2.jpg\")\n",
"```\n",
"\n",
"But of course, what we are really interested in are the emotions which we can print out."
]
},
{
"cell_type": "code",
"metadata": {
"id": "nKMCKyeUXyyy"
},
"source": [
""
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "mrMjeMXUXyy2"
},
"source": [
"<img src=\"https://raw.githubusercontent.com/jeffreyrnorton/Notebooks_MachineLearning/master/images/14295.jpg\" width=400>\n",
"\n",
"Now that we have explored this - let's write the code to process the other image very concisely and as a function where we assume we are always getting the image from a Google bucket."
]
},
{
"cell_type": "code",
"metadata": {
"id": "ZiljD_4SXyy4"
},
"source": [
""
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "fnOOtibaXyy6"
},
"source": [
"<img src=\"https://raw.githubusercontent.com/jeffreyrnorton/Notebooks_MachineLearning/master/images/15091.jpg\" width=400>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "TVqCm0C8Xyy6"
},
"source": [
"<h2> Clean up </h2>\n",
"\n",
"Remember to delete the API key by visiting <a href=\"http://console.cloud.google.com/apis\">API console</a>.\n",
"\n",
"If necessary, commit all your notebooks to git.\n",
"\n",
"If you are running Datalab on a Compute Engine VM or delegating to one, remember to stop or shut it down so that you are not charged.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "KMgp1T0FXyy8"
},
"source": [
"Copyright 2018 Google Inc.\n",
"Licensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at\n",
"http://www.apache.org/licenses/LICENSE-2.0\n",
"Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License."
]
},
{
"cell_type": "code",
"metadata": {
"id": "j8Edgoja2oPy",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "8be92a5d-2f76-4a6c-cb63-7760b38504ea"
},
"source": [
"!pip install --upgrade google-cloud-translate"
],
"execution_count": 3,
"outputs": [
{
"output_type": "stream",
"text": [
"Requirement already satisfied: google-cloud-translate in /usr/local/lib/python3.6/dist-packages (3.0.2)\n",
"Requirement already satisfied: google-api-core[grpc]<2.0.0dev,>=1.22.0 in /usr/local/lib/python3.6/dist-packages (from google-cloud-translate) (1.24.1)\n",
"Requirement already satisfied: proto-plus>=0.4.0 in /usr/local/lib/python3.6/dist-packages (from google-cloud-translate) (1.13.0)\n",
"Requirement already satisfied: libcst>=0.2.5 in /usr/local/lib/python3.6/dist-packages (from google-cloud-translate) (0.3.16)\n",
"Requirement already satisfied: google-cloud-core<2.0dev,>=1.1.0 in /usr/local/lib/python3.6/dist-packages (from google-cloud-translate) (1.5.0)\n",
"Requirement already satisfied: googleapis-common-protos<2.0dev,>=1.6.0 in /usr/local/lib/python3.6/dist-packages (from google-api-core[grpc]<2.0.0dev,>=1.22.0->google-cloud-translate) (1.52.0)\n",
"Requirement already satisfied: google-auth<2.0dev,>=1.21.1 in /usr/local/lib/python3.6/dist-packages (from google-api-core[grpc]<2.0.0dev,>=1.22.0->google-cloud-translate) (1.24.0)\n",
"Requirement already satisfied: six>=1.13.0 in /usr/local/lib/python3.6/dist-packages (from google-api-core[grpc]<2.0.0dev,>=1.22.0->google-cloud-translate) (1.15.0)\n",
"Requirement already satisfied: setuptools>=34.0.0 in /usr/local/lib/python3.6/dist-packages (from google-api-core[grpc]<2.0.0dev,>=1.22.0->google-cloud-translate) (51.1.1)\n",
"Requirement already satisfied: pytz in /usr/local/lib/python3.6/dist-packages (from google-api-core[grpc]<2.0.0dev,>=1.22.0->google-cloud-translate) (2018.9)\n",
"Requirement already satisfied: requests<3.0.0dev,>=2.18.0 in /usr/local/lib/python3.6/dist-packages (from google-api-core[grpc]<2.0.0dev,>=1.22.0->google-cloud-translate) (2.23.0)\n",
"Requirement already satisfied: protobuf>=3.12.0 in /usr/local/lib/python3.6/dist-packages (from google-api-core[grpc]<2.0.0dev,>=1.22.0->google-cloud-translate) (3.12.4)\n",
"Requirement already satisfied: grpcio<2.0dev,>=1.29.0 in /usr/local/lib/python3.6/dist-packages (from google-api-core[grpc]<2.0.0dev,>=1.22.0->google-cloud-translate) (1.32.0)\n",
"Requirement already satisfied: rsa<5,>=3.1.4 in /usr/local/lib/python3.6/dist-packages (from google-auth<2.0dev,>=1.21.1->google-api-core[grpc]<2.0.0dev,>=1.22.0->google-cloud-translate) (4.6)\n",
"Requirement already satisfied: pyasn1-modules>=0.2.1 in /usr/local/lib/python3.6/dist-packages (from google-auth<2.0dev,>=1.21.1->google-api-core[grpc]<2.0.0dev,>=1.22.0->google-cloud-translate) (0.2.8)\n",
"Requirement already satisfied: cachetools<5.0,>=2.0.0 in /usr/local/lib/python3.6/dist-packages (from google-auth<2.0dev,>=1.21.1->google-api-core[grpc]<2.0.0dev,>=1.22.0->google-cloud-translate) (4.2.0)\n",
"Requirement already satisfied: typing-extensions>=3.7.4.2 in /usr/local/lib/python3.6/dist-packages (from libcst>=0.2.5->google-cloud-translate) (3.7.4.3)\n",
"Requirement already satisfied: typing-inspect>=0.4.0 in /usr/local/lib/python3.6/dist-packages (from libcst>=0.2.5->google-cloud-translate) (0.6.0)\n",
"Requirement already satisfied: pyyaml>=5.2 in /usr/local/lib/python3.6/dist-packages (from libcst>=0.2.5->google-cloud-translate) (5.3.1)\n",
"Requirement already satisfied: dataclasses>=0.6.0 in /usr/local/lib/python3.6/dist-packages (from libcst>=0.2.5->google-cloud-translate) (0.8)\n",
"Requirement already satisfied: pyasn1<0.5.0,>=0.4.6 in /usr/local/lib/python3.6/dist-packages (from pyasn1-modules>=0.2.1->google-auth<2.0dev,>=1.21.1->google-api-core[grpc]<2.0.0dev,>=1.22.0->google-cloud-translate) (0.4.8)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.6/dist-packages (from requests<3.0.0dev,>=2.18.0->google-api-core[grpc]<2.0.0dev,>=1.22.0->google-cloud-translate) (2020.12.5)\n",
"Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.6/dist-packages (from requests<3.0.0dev,>=2.18.0->google-api-core[grpc]<2.0.0dev,>=1.22.0->google-cloud-translate) (1.24.3)\n",
"Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.6/dist-packages (from requests<3.0.0dev,>=2.18.0->google-api-core[grpc]<2.0.0dev,>=1.22.0->google-cloud-translate) (3.0.4)\n",
"Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.6/dist-packages (from requests<3.0.0dev,>=2.18.0->google-api-core[grpc]<2.0.0dev,>=1.22.0->google-cloud-translate) (2.10)\n",
"Requirement already satisfied: mypy-extensions>=0.3.0 in /usr/local/lib/python3.6/dist-packages (from typing-inspect>=0.4.0->libcst>=0.2.5->google-cloud-translate) (0.4.3)\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "sOWWLzdXjQFF",
"outputId": "dbf8c91a-71cb-4cd6-af77-03c94c1bd924"
},
"source": [
"from google.colab import drive\n",
"drive.mount('/content/gdrive')"
],
"execution_count": 4,
"outputs": [
{
"output_type": "stream",
"text": [
"Mounted at /content/gdrive\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "KVwRUp9-l_dq"
},
"source": [
"import os\r\n",
"os.environ[\"GOOGLE_APPLICATION_CREDENTIALS\"]=\"/content/gdrive/MyDrive/Colab Notebooks/BlazorAuthDemo-221d8f27a133.json\""
],
"execution_count": 13,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "DpySj3XUXyzA",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "025a19ea-66b3-4564-f165-657ee547a52f"
},
"source": [
"import os\r\n",
"from google.cloud import translate\r\n",
"def translate_text(text=\"YOUR_TEXT_TO_TRANSLATE\", filename=\"test\", project_id=\"blazorauthdemo-296322\"):\r\n",
" \"\"\"Translating Text.\"\"\"\r\n",
" print(\"running\")\r\n",
" client = translate.TranslationServiceClient()\r\n",
"\r\n",
" location = \"global\"\r\n",
"\r\n",
" parent = f\"projects/{project_id}/locations/{location}\"\r\n",
"\r\n",
" # Detail on supported types can be found here:\r\n",
" # https://cloud.google.com/translate/docs/supported-formats\r\n",
" response = client.translate_text(\r\n",
" request={\r\n",
" \"parent\": parent,\r\n",
" \"contents\": [text],\r\n",
" \"mime_type\": \"text/plain\", # mime types: text/plain, text/html\r\n",
" \"source_language_code\": \"ru\",\r\n",
" \"target_language_code\": \"en-UK\",\r\n",
" }\r\n",
" )\r\n",
"\r\n",
" # Display the translation for each input text provided\r\n",
" if(len( response.translations ) > 1) :\r\n",
" print(\"Error on: \" + filename)\r\n",
" raise\r\n",
" for translation in response.translations:\r\n",
" return translation.translated_text\r\n",
"\r\n",
"\r\n",
"def split_lined(file_path, lines_per_file=140):\r\n",
" name = os.path.split(file_path)[-1]\r\n",
" base = os.path.split(file_path)[0]\r\n",
" print(base)\r\n",
" smallfile = None\r\n",
" with open(file_path) as bigfile:\r\n",
" for lineno, line in enumerate(bigfile):\r\n",
" if lineno % lines_per_file == 0:\r\n",
" if smallfile:\r\n",
" smallfile.close()\r\n",
" small_filename = base + \"/\" + name + \"_{}.txt\".format(lineno + lines_per_file)\r\n",
" print(small_filename)\r\n",
" smallfile = open(small_filename, \"w\")\r\n",
" smallfile.write(line)\r\n",
" if smallfile:\r\n",
" smallfile.close()\r\n",
" os.remove(file_path)\r\n",
"\r\n",
"#ensure the path is set correctly\r\n",
"!echo $GOOGLE_APPLICATION_CREDENTIALS\r\n",
"out_base = \"/content/gdrive/MyDrive/Colab Notebooks/dis-out/\"\r\n",
"in_base = \"/content/gdrive/MyDrive/Colab Notebooks/dis/\"\r\n",
"\r\n",
"restart_scan = True\r\n",
"iteration = 0\r\n",
"while(restart_scan):\r\n",
" iteration = iteration+1 \r\n",
" for directory, subdirectories, files in os.walk(in_base):\r\n",
" for file in files:\r\n",
" my_string = open(in_base + file).read()\r\n",
" name = os.path.split(file)[-1]\r\n",
" print(str(name) + \" ...\")\r\n",
" try:\r\n",
" translated = translate_text(my_string, name)\r\n",
" with open(out_base + name, \"w\") as text_file:\r\n",
" text_file.write(translated)\r\n",
" os.remove(in_base + file)\r\n",
" except:\r\n",
" print(str(name) + \" failed\")\r\n",
" print(\"splitting it into 140 lines files\")\r\n",
" split_lined(in_base + file)\r\n",
" pass\r\n",
" if len(os.listdir(in_base) ) == 0 or iteration > 1:\r\n",
" restart_scan = False\r\n",
" print(iteration)\r\n"
],
"execution_count": 25,
"outputs": [
{
"output_type": "stream",
"text": [
"/content/gdrive/MyDrive/Colab Notebooks/BlazorAuthDemo-221d8f27a133.json\n",
"appendix.tex ...\n",
"running\n",
"appendix.tex failed\n",
"splitting it into 140 lines files\n",
"/content/gdrive/MyDrive/Colab Notebooks/dis\n",
"/content/gdrive/MyDrive/Colab Notebooks/dis/appendix.tex_140.txt\n",
"/content/gdrive/MyDrive/Colab Notebooks/dis/appendix.tex_280.txt\n",
"/content/gdrive/MyDrive/Colab Notebooks/dis/appendix.tex_420.txt\n",
"/content/gdrive/MyDrive/Colab Notebooks/dis/appendix.tex_560.txt\n",
"/content/gdrive/MyDrive/Colab Notebooks/dis/appendix.tex_700.txt\n",
"part3.tex ...\n",
"running\n",
"part3.tex failed\n",
"splitting it into 140 lines files\n",
"/content/gdrive/MyDrive/Colab Notebooks/dis\n",
"/content/gdrive/MyDrive/Colab Notebooks/dis/part3.tex_140.txt\n",
"/content/gdrive/MyDrive/Colab Notebooks/dis/part3.tex_280.txt\n",
"/content/gdrive/MyDrive/Colab Notebooks/dis/part3.tex_420.txt\n",
"appendix.tex_140.txt ...\n",
"running\n",
"appendix.tex_280.txt ...\n",
"running\n",
"appendix.tex_420.txt ...\n",
"running\n",
"appendix.tex_560.txt ...\n",
"running\n",
"appendix.tex_700.txt ...\n",
"running\n",
"part3.tex_140.txt ...\n",
"running\n",
"part3.tex_280.txt ...\n",
"running\n",
"part3.tex_420.txt ...\n",
"running\n",
"2\n"
],
"name": "stdout"
}
]
},
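{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cell above splits a failed file into 140-line pieces on disk and retries. As an alternative sketch, the text could be chunked in memory by a character budget before any request is sent. This is not what the cell above does; `chunk_lines` is a hypothetical helper, and the 30000-character default is an assumption that should be checked against the current Translation API limits.\n",
"\n",
"```python\n",
"def chunk_lines(text, max_chars=30000):\n",
"    '''Group whole lines into chunks of roughly max_chars characters.\n",
"\n",
"    A single line longer than max_chars is kept whole in its own chunk.\n",
"    '''\n",
"    chunks, current, size = [], [], 0\n",
"    for line in text.splitlines(keepends=True):\n",
"        # Start a new chunk when adding this line would exceed the budget.\n",
"        if current and size + len(line) > max_chars:\n",
"            chunks.append(''.join(current))\n",
"            current, size = [], 0\n",
"        current.append(line)\n",
"        size += len(line)\n",
"    if current:\n",
"        chunks.append(''.join(current))\n",
"    return chunks\n",
"\n",
"\n",
"# Tiny budget so the split is visible on a short sample.\n",
"sample = 'first line' + chr(10) + 'second line' + chr(10) + 'third' + chr(10)\n",
"print(chunk_lines(sample, max_chars=12))\n",
"```\n",
"\n",
"Because lines are never cut, joining the chunks gives back the original text, and each chunk could be passed to `translate_text` in turn."
]
},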
{
"cell_type": "code",
"metadata": {
"id": "MVBkeTmTLUkL"
},
"source": [
"#from google.colab import drive\r\n",
"#drive.mount('/content/gdrive')\r\n",
"\r\n",
"#ensure the file is accessibl\r\n",
"import os\r\n",
"os.environ[\"GOOGLE_APPLICATION_CREDENTIALS\"]=\"/content/gdrive/MyDrive/Colab Notebooks/BlazorAuthDemo-221d8f27a133.json\""
],
"execution_count": null,
"outputs": []
}
]
}