@stefanik12
Created February 8, 2021 12:38

Cloud translation services (and their alternatives) for automated use

When it comes to translation, Google Translate has become an everyday professional tool for most people, and hence it is also the first choice that comes to mind when looking for a service to use in a larger-scale application. However, one quickly finds that there are actually many big corporate providers of translation APIs that support more than a hundred languages.

> Not surprisingly though, providers of paid translation APIs vary in both quality and price.

In this article, I'll share our experience in an overview of the commercially available translation services that you could use in your app or service. In particular, we’ll take a look at Google Translate, Amazon Translator and Microsoft Translator. Additionally, we compare these translation engines to the freely available options: we’ll outline what you would need to create a translation engine yourself, to give you an idea of how deep you have to dive to make it useful. We’ll acknowledge both the benefits and drawbacks of keeping the translation engine in your own hands.

Modern machine translation: how do we translate in 2021

Before picking the right examples for comparison, let’s take a look at what kind of data modern translation systems use to model the translation from one language to another.

The selection of the training data, just like the neural model architecture that each of the services uses, is kept secret by the providers. It’s also quite likely that some services use proprietary data sources that, for example, Google surely has available from its web crawls and other services. So what difference can it make, quality-wise?

Neural language models, which are nowadays widely used by all the major translation services, require a large set of parallel corpora that contain aligned sequences in both the source and the target language. The language model then learns the complicated relation that maps a source sequence to a target sequence.

[image: BART objective: noisy to neat language output] caption: language models are pre-trained for a general understanding of the language via a relatively simple objective, such as language modeling, where the model is trained to predict the next word in a sequence. Better language generation quality can then be achieved with more complicated training objectives, such as denoising, where the model is trained to reorder a randomly shuffled sequence of words back into the original sequence.

caption: parallel corpora contain a large volume of aligned text pairs. The translator is then trained to map the left-hand texts to their right-hand pairs. Such corpora are sometimes created as a side product of verbal transcripts, technical documentation, or subtitles; ParaCrawl is one example. The volume of sequence pairs needed to train a useful neural translation model nowadays starts in the low hundreds of thousands of pairs.

[image: source->target]
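To make the shape of such training data concrete, here is a minimal sketch in Python. The sentence pairs below are made up for illustration only; a real parallel corpus would of course contain hundreds of thousands of such pairs.

```python
# A parallel corpus is, conceptually, just a list of aligned (source, target) pairs.
# These example pairs are hypothetical; real corpora such as OPUS ship millions of them.
parallel_corpus = [
    ("Babička bydlela na samotě u lesa.",
     "Grandmother lived in a secluded spot by the forest."),
    ("Dnešní diskuse se bude zabývat následujícími otázkami.",
     "Today's discussion will address the following questions."),
    ("Vezmi košík a zanes ho babičce.",
     "Take the basket and bring it to grandma."),
]

# A neural translator is trained to map each source sequence to its target sequence:
for source, target in parallel_corpus:
    print(f"{source}  ->  {target}")
```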

Let’s translate!

Once the model is trained, the quality of the actual translation heavily depends on what kind of input text you translate and how well the model understands the domain. In order to get a meaningful and coherent translation of a text from some arbitrary domain, this domain should be represented in the training data. Neural machine translation models made a huge step towards generalization compared to their predecessors, statistical translators, which, for example, Google used until 2018. Still, it is good to be aware that, for example, a translator trained solely on a domain of legal discourse will perform poorly on, say, the medical domain.

> Being aware of the available data sources and their domains, we’ve tried to evaluate the translation on as “harsh” examples as we could come up with.

We pick two samples from rather diverse domains: one from the fairytale of Little Red Riding Hood, with its rather unlikely sequential composition of a storyline, and one meeting transcript generated by a speech-to-text system, with its original flaws.

We also compare the output of our own translation model, trained on the freely available OPUS data sources, which uses the model that we’ll describe below.

In each of the translation outputs, we mark the factual flaws of the translation: these are the parts of the output that contain either misleading or wrong information with respect to the input.

After getting over the tedious registrations, you can try the translators yourself! Here are the links: Google Translate, Amazon Translator, Microsoft Cognitive Translator, Gauss OPUS Translator.
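If you prefer to skip the web consoles and call one of these services programmatically, the snippet below is a minimal sketch of what that looks like with Amazon Translator via the boto3 SDK; it assumes you already have AWS credentials configured in your environment, and the region is just an example value.

```python
# Minimal sketch: translating a short Czech text to English with Amazon Translator.
# Assumes AWS credentials are already configured (e.g. via environment variables).
import boto3

client = boto3.client("translate", region_name="eu-west-1")  # example region

response = client.translate_text(
    Text="Babička bydlela na samotě u lesa.",
    SourceLanguageCode="cs",
    TargetLanguageCode="en",
)
print(response["TranslatedText"])
```

The other providers follow the same pattern: a client object, a single translate call per text, and billing per character of input.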

Translation experiments

In the first experiment, we take a snippet of The Little Red Riding Hood fairytale in two distinct source languages, Czech and Spanish, and evaluate the output in English, so you can compare.

#### 1st Input text: Little Red Riding Hood in Czech

"Žila jednou jedna hodná a milá dívenka. Všichni ji měli velice rádi a ze všech nejvíce maminka s babičkou. Babička jí ušila červený čepeček a podle něj jí začali říkat Červená Karkulka. Babička bydlela na samotě u lesa, kde široko daleko nebyla žádná jiná chaloupka. Babička se tam starala o lesní zvířátka. Jednou v létě maminka napekla bábovku, do košíku přidala láhev vína a řekla Karkulce: „Babička má dneska svátek. Vezmi košík a zanes ho k babičce do chaloupky. Ale jdi rovnou, ať se v lese nezatouláš!“ Karkulka pokývala hlavou a zvesela se vydala na cestu. Cestu si zpříjemnila poskakováním a broukáním si."

  1. Google Translate:

"There once lived a good and nice girl. Everyone loved her very much and mother and grandmother most of all. Grandmother sewed her a red cap and according to him they started calling her Little Red Riding Hood. Grandmother lived alone in a forest where there was no other cottage far and wide. Grandmother took care of the forest animals there. One summer, my mother baked a cake, added a bottle of wine to the basket, and said to Hood, "Grandma has a holiday today. Take the basket and take it to Grandma's cottage. But go straight, so you don't wander in the woods! ” Hood nodded and cheered merrily. She made her trip more enjoyable by bouncing and humming."

  2. Amazon Translator:

"There was one good and nice girl living once. Everyone liked her very much and most of all mum with grandmother. Grandma sewed her red beanie, and according to him they started calling her Little Red Riding Hood. Grandma lived secluded by the woods, where far and wide there was no other house. Grandma took care of the woodland critters there. One summer, Mum baked a candy, added a bottle of wine to her basket and told Riding Hood: “Grandma has a holiday today. Take the basket and take it to Grandma's house. But go straight so you don't stray in the woods! “The Riding Hood boiled his head and went on a journey. She made her journey more enjoyable by hopping and buzzing."

  3. Microsoft Cognitive Translator:

"There was a good, nice girl once. Everyone loved her very much and most of all mum and grandmother. Her grandmother made her a red hat, and according to him, they started calling her Little Red Riding Hood. My grandmother lived alone by the forest, where there was no other cottage far and wide. Grandma took care of the forest animals there. One summer, my mother baked a cake, added a bottle of wine to her basket, and told Little Red: "Grandma**'s having** a holiday today. Take the basket and take it to grandma's cottage. But go straight so you don't wander in the woods!" Little Red nodded her head and made her way cheerfully. She made her journey more pleasant by bouncing and humming."

  4. Gauss Translator - trained on open OPUS data for each language pair

"There was one good girl who lived there. They all loved her very much and most of them were mommy and grandmother made her a red hat and he said they started calling her Red Riding Hood. Grandma lived alone in the woods, where there was no other cottage nearby. Grandma took care of the forest animals there. One summer mommy baked a cake, added a bottle of wine to the basket and said to the Riding Hood: "Grandma has a holiday today. Take the basket and take it to her grandmother in the cottage. But go straight to the forest and don't wander!" She nodded her head and hurried on her way. She made the journey more pleasant by bouncing and humming. "

#### 2nd Input text: English transcript

In the second experiment, we translate a meeting transcript originally written in English. We are aware that speech-to-text systems for English outperform the systems for other languages, hence the performance on transcripts in other languages would perhaps be more biased by the transcript quality.

Again, we mark the factually wrong and misleading parts of the translation, with the help of native speakers. You'll have to take our word for it on this one if you do not speak Czech :)

"The tap project, led by former Secretary of Defense Ash Carter and on the West Coast by annual manual, is an effort to ensure that emerging technologies are both developed and managed in ways that protect humanity. Artificial intelligence is a foundational technology. That s court to many of the emerging technologies that we have tap, study and evaluate. Therefore, we re so pleased to be hosting today s of discussion with such a respected group of speakers from across the country who will introduce shortly. Today s discussion will address the following questions. What are the existing principles guiding the development and use of AI? What are the current gaps in AI governance, and how do we ensure responsible innovation moving forward to give you agree with brief overview of today s event? We ll hear briefly from Secretary Carter providing opening remarks, then Joey Ito, director of the M I T Media lab. Then we ll break into a fireside chat with several experts in the field, enclosed with an open Q and a session before I turn it over."

  1. Google Translate:

"Projekt tap, vedený bývalým ministrem obrany Ashem Carterem a každoročním manuálem na západním pobřeží, je snahou zajistit, aby vznikající technologie byly vyvíjeny a řízeny způsoby, které chrání lidstvo. Umělá inteligence je základní technologie. To je soud pro mnoho nově vznikajících technologií, které využíváme, studujeme a hodnotíme. Proto nás velmi těší, že dnes provádíme diskusi s tak respektovanou skupinou řečníků z celé země, kteří se brzy představí. Dnešní diskuse se bude zabývat následujícími otázkami. Jaké jsou stávající principy, kterými se řídí vývoj a používání AI? Jaké jsou současné mezery ve správě AI a jak zajistíme, aby se odpovědné inovace pohybovaly kupředu a poskytly vám souhlas s krátkým přehledem dnešní události? Krátce uslyšíme od tajemníka Cartera úvodní poznámky, poté Joey Ito, ředitel laboratoře M I T Media. Pak se vloupáme do krbu s několika odborníky v oboru, uzavřeným otevřeným Q a relací, než to otočím. "

  2. Amazon Translator:

"Projekt kohoutku pod vedením bývalého ministra obrany Ashe Cartera a každoroční příručky na západním pobřeží je snahou zajistit, aby vznikající technologie byly vyvíjeny a řízeny způsobem, který chrání lidstvo. Umělá inteligence je základní technologie. To je soud mnoha vznikajících technologií, které máme kohoutek, studovat a hodnotit. Proto jsme rádi, že jsme hostitelem dnešní diskuse s takovou respektovanou skupinou řečníků z celé země, kteří brzy představí. Dnešní diskuse se bude zabývat následujícími otázkami. Jaké jsou stávající zásady, které řídí vývoj a používání umělé inteligence? Jaké jsou současné mezery ve správě umělé inteligence a jak zajistíme, aby odpovědné inovace pokročily, abychom vám poskytli souhlas se stručným přehledem dnešní události? Krátce uslyšíme od ministra Carterové, který předává úvodní poznámky, pak Joey Ito, ředitel laboratoře M I T Media. Pak se vloupneme do chatu u krbu s několika odborníky v terénu, uzavřený s otevřeným Q a relací, než ho otočím. "

  3. Microsoft Cognitive Translator:

"Projekt kohoutku, vedený bývalým ministrem obrany Ashem Carterem a na západním pobřeží každoroční příručkou, je snahou zajistit, aby nově vznikající technologie byly vyvíjeny a řízeny způsobem, který chrání lidstvo. Umělá inteligence je základní technologie. To je soud pro mnoho nově vznikajících technologií, které máme kohoutek, studium a hodnocení. Proto jsme tak rádi, že budeme hostit dnešní diskusi s tak respektovanou skupinou řečníků z celé země, kteří se brzy představí. Dnešní rozprava se bude zabývat následujícími otázkami. Jaké jsou stávající zásady, kterými se řídí vývoj a používání aiti? Jaké jsou současné nedostatky ve správě ai a jak zajistíme, aby se odpovědné inovace posunuly kupředu, abychom vám poskytli stručný přehled dnešního s. Krátce uslyšíme od tajemníka Cartera, který poskytl úvodní poznámky, pak Joeyho Ita, ředitele laboratoře M I T Media. Pak se vloupáme do rozhovoru u krbu s několika odborníky v terénu, uzavřený s otevřeným Q a zasedání, než jsem se obrátit. "

  4. Gauss Translator - trained on open OPUS data for each language pair

"Projekt stepování, vedený bývalým ministrem obrany Ash Carterem a na západním pobřeží každoroční příručkou, je snahou zajistit, aby nové technologie byly vyvíjeny a řízeny způsobem, který ochrání lidstvo. Umělá inteligence je základní technologie. To je soud pro mnoho nově vznikajících technologií, které máme klepnout, studovat a hodnotit. Proto jsme velmi potěšeni, že dnes pořádáme diskusi s takto respektovanou skupinou řečníků z celé země, kteří brzy představí. Dnešní diskuse se bude zabývat následujícími otázkami: Jaké jsou stávající zásady, které vedou k rozvoji a používání umělé inteligence? Jaké jsou současné mezery ve správě AI a jak zajistíme odpovědné inovace vpřed, abychom souhlasili s stručným přehledem dnešní události. Krátce se dozvíme od tajemníka Cartera, který předkládá úvodní poznámky, pak Joeyho Ita, ředitele laboratoře MIT Media. Pak se vloupáme do krbového chatu s několika odborníky v terénu, uzavřeného otevřeným Q a sezením, než ho otočím. "

[image: google translate karkulka] [image: translation karkulka: 1, 2, 3, G] [image: amazon translate: transcript] [image: translation transcript: 1, 2, 3, G]

What does it cost?

Prices for translation APIs are usually billed per number of input characters. Some services, such as Microsoft's, provide discounted bundles for larger volumes of translation. Hence, the comparison table shows the best achievable price for a given number of characters. For usage above 1 billion characters per month, Amazon and Google say they can provide individual pricing on demand.

Amazon and Microsoft allow you to train a custom translator model on your own data, but it comes at an extra price, and the quality of such an approach is initially unknown. You would necessarily need to test the quality of translation after the training yourself, and quite likely you would need to repeat the training multiple times to achieve the desired quality improvement on your data domain.

How much does the custom training cost? In the case of Microsoft, custom translation breaks down to an extra charge of at most $300 for training a model and then $40 per million characters, instead of the original $10. With larger volumes (over 62.5M characters), custom model pricing gets more complicated. With Amazon, the extra cost is projected into the price of translation, which costs $60.00 per million characters instead of the original $15. You would also need to cover the computational resources needed for training in the cloud, but we have not found a more precise description of the process and pricing.

In the table below, we collect the pricing for standard, general translation, as of February 2021. In general, custom translation is 1.8-4 times more expensive.

References (as of February 2021): Microsoft, Amazon, Google.

| Monthly volume       | 100k chars | 1M chars | 5M chars | 10M chars | 100M chars | 500M chars |
|----------------------|------------|----------|----------|-----------|------------|------------|
| Amazon Translator    | Free       | Free     | $75.00   | $150      | $1,500     | $7,500     |
| Google Translate API | Free       | $20      | $100     | $200      | $2,000     | $10,000    |
| Microsoft Cognitive  | Free       | Free     | $50      | $100      | $1,000     | $5,000     |

As you see, the price of translation varies significantly, even though the translation quality seems to be quite comparable.

Here are some examples with character volumes that we estimated based on our own translation deployments (a short sketch of the underlying arithmetic follows the examples):

  • Multilingual chat support service: allows customers to communicate in their mother tongue. A service request message contains ~100-200 characters, and a whole conversation usually contains ~1,200 characters on average. So if you have, say, 10k chat support requests per month and you do not use the translator in any other service, it would cost you $180 per month on Amazon, $120 on Microsoft, or $240 on Google Translate.

    This application is not as critical in terms of translation quality as some others, hence it is worth considering cutting costs by using a cheaper translator that still does the job. Importantly, the translation must be performed in real time, but this did not seem to be a problem in our tests: all translators were able to return a response within 3 seconds, and single-sentence texts usually took only about 1 second.

  • E-shop localization: If you want to extend your customer base to other countries, a good start is to localize your product descriptions into the native languages. We found that product descriptions on e-shops range between 3,000-4,000 characters, mainly depending on the price. So say you have an e-shop offering 5k products that you want to localize into another language, and your goods providers update each product description once every 3 months. This would cost you $100 per month on Amazon, $67 on Microsoft, or $133 on Google Translate.
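The arithmetic behind these two examples is simple enough to sketch in a few lines of Python. The per-million-character rates below are derived from the pricing table above; free tiers and volume discounts are ignored for simplicity, and the e-shop example uses the upper end of the 3,000-4,000 character range.

```python
# Rough monthly cost estimate, using the flat per-million-character rates
# implied by the table above (free tiers and volume discounts ignored).
PRICE_PER_MILLION_CHARS = {"amazon": 15.0, "microsoft": 10.0, "google": 20.0}

def monthly_cost(chars_per_month: float) -> dict:
    """Return the estimated monthly price (USD) for each provider."""
    millions = chars_per_month / 1_000_000
    return {provider: round(rate * millions)
            for provider, rate in PRICE_PER_MILLION_CHARS.items()}

# Chat support: 10k conversations per month, ~1,200 characters each.
print(monthly_cost(10_000 * 1_200))     # {'amazon': 180, 'microsoft': 120, 'google': 240}

# E-shop localization: 5k products, ~4,000 characters each, re-translated every 3 months.
print(monthly_cost(5_000 * 4_000 / 3))  # {'amazon': 100, 'microsoft': 67, 'google': 133}
```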

Deploy your own translator

Clearly, using a standalone translator might have some cost-saving benefits, especially once you aim to use translation in a real-time application such as a chat service, or in a large-volume application, for example document processing. Let’s see what the actual price of such a solution is, on your own hardware or in the cloud. A translation service can be deployed on your own hardware or to a cloud platform such as AWS, just like any other application. Here’s a short review of performance scalability.

Deployment of a service using the base Transformer architecture that we used for translating the aforementioned examples consumes X GBs of RAM. Translation of a single sentence, or a sequence of, say, 20 words, takes X seconds on a single CPU, or X seconds on 32 CPUs. An input sequence of 200 words takes X seconds on a single CPU, or X seconds on 32 CPUs. If the performance is still a concern, you can utilise graphical processing units (GPUs): using a single GPU, translation time shrinks the single-CPU time by a ratio of X, to X seconds.
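To illustrate how little code such a deployment requires, here is a minimal sketch of a translation endpoint built with Flask and a publicly available OPUS-MT checkpoint from HuggingFace. This is a stand-in for our own model, not the exact Gauss setup; the checkpoint name is just an example of a Czech-to-English model.

```python
# Minimal sketch of a self-hosted translation service (Flask + HuggingFace Transformers).
# The OPUS-MT checkpoint below is a public Czech->English model used here as a stand-in.
from flask import Flask, request, jsonify
from transformers import MarianMTModel, MarianTokenizer

MODEL_NAME = "Helsinki-NLP/opus-mt-cs-en"
tokenizer = MarianTokenizer.from_pretrained(MODEL_NAME)
model = MarianMTModel.from_pretrained(MODEL_NAME)

app = Flask(__name__)

@app.route("/translate", methods=["POST"])
def translate():
    texts = request.get_json()["texts"]          # list of source-language strings
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    outputs = model.generate(**batch)            # beam search decoding on CPU (or GPU if moved)
    translations = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    return jsonify({"translations": translations})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
```

Batching several sentences per request is the easiest way to keep per-character costs low on your own hardware, since the model amortises its overhead across the batch.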

Train your own translator

If you are familiar with training neural networks, you could manage to train your own translation model and productise it, for example as an API, or simply as a Python application. Note that “language translation” is not the only thing you can train your translator to do: you can similarly train your model to generate summaries, like [this](link X), or handle essentially any other task where both the input and the output are textual sequences with a known language domain.

We’ll share some tips on how to go about this task, although the approach might vary depending on your language and the specificity of your language domain.

[image: translation training process]

  1. Find some parallel data. OPUS is a good starting point, but it contains some biases that are hard to eliminate without additional data sets. In our experience, for example, Czech target sequences were quite short, which in consequence caused the model to omit parts of longer input sequences during translation.
  2. Pick a model and initialise a vocabulary. A Transformer model is the way to go if you’d like to get your translation to a quality comparable to the cloud services. Use the [HuggingFace library](link X), which can provide you with a base model for your desired source language and a lovely, convenient interface for [training sequence-to-sequence](link X) models. There are [examples](link X) of how to train a “seq2seq” model. Search for models that are good at generating text, such as [mBART](link X). Then, replace the target vocabulary with a [SentencePiece](link X) vocabulary initialised from your target-language sequences (a minimal vocabulary-training sketch follows this list).
  3. Train your model. This is indeed the tricky one, because good performance on a single data set does not mean that the system is production-ready. Transformer models love to seek out heuristic shortcuts, such as the target shortening mentioned above. Such problems can be avoided if you include at least two datasets in your validation monitoring. If the two are domain-orthogonal enough, the training [BLEU](link X) will keep increasing and the loss decreasing, while the model already starts to fail on the other-domain dataset. If your target language data is domain-limited, create a small, artificial parallel corpus that talks about something else than your training domain. Even as little as 50-100 sequence pairs will give you a good clue about how you’re doing (see the training sketch at the end of these steps).
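As a concrete illustration of step 2, here is a minimal sketch of training a SentencePiece vocabulary on target-language text. The file name and vocabulary size are just example values, not our actual configuration.

```python
# Minimal sketch: train a SentencePiece vocabulary on target-language sequences.
# "target_sentences.txt" (one sentence per line) and vocab_size=32000 are example values.
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="target_sentences.txt",
    model_prefix="target_sp",
    vocab_size=32000,
    model_type="unigram",
)

# Load the trained model and tokenize a sample target-language sentence.
sp = spm.SentencePieceProcessor(model_file="target_sp.model")
print(sp.encode("Grandmother lived in a secluded spot by the forest.", out_type=str))
```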

[image: real evaluation loss]

  4. Evaluate your model thoroughly. Often even additional evaluation will not identify some embarrassing flaws of the system. If you manage to generalise them, return to point one and balance the training data set so that the model is exposed to enough samples where it makes an error, so that it is able to correct itself. If you know what the target domain of your system is, actively seek out all the corner cases you can come up with and evaluate your translator on them. Still, it is not a good idea to train and evaluate solely on your target domain, which could be quite specific: that could make your translator very sensitive to input alterations.
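To make step 3 and the two-dataset validation idea more tangible, below is a compact sketch of a HuggingFace Seq2SeqTrainer setup. The dataset variables (`train_data`, `valid_in_domain`, `valid_out_of_domain`) and the checkpoint name are placeholders, not our exact Gauss configuration; the main point is the separate evaluation on an in-domain and an out-of-domain validation set.

```python
# Compact sketch of fine-tuning a seq2seq translation model with HuggingFace Transformers.
# `train_data`, `valid_in_domain` and `valid_out_of_domain` are assumed to be tokenized
# datasets of (input_ids, labels) pairs prepared earlier; the checkpoint is only an example.
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM, DataCollatorForSeq2Seq,
                          Seq2SeqTrainer, Seq2SeqTrainingArguments)

checkpoint = "facebook/mbart-large-cc25"      # example multilingual seq2seq checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

args = Seq2SeqTrainingArguments(
    output_dir="translator",
    per_device_train_batch_size=8,
    num_train_epochs=3,
    evaluation_strategy="steps",
    eval_steps=2000,
    predict_with_generate=True,               # generate full sequences during evaluation;
)                                             # a BLEU compute_metrics fn would plug in here

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train_data,
    eval_dataset=valid_in_domain,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()

# Monitor more than one validation set: an in-domain one and a deliberately different one,
# so that heuristic shortcuts learned on the training domain show up as a gap between the two.
print(trainer.evaluate(valid_in_domain, metric_key_prefix="in_domain"))
print(trainer.evaluate(valid_out_of_domain, metric_key_prefix="out_of_domain"))
```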