@stefanik12
Created February 8, 2021 12:38

Cloud translation services (and their alternatives) for automated use

When it comes to translation, Google Translate has become an everyday professional tool for most people, and hence it is also the first choice that comes to mind when looking for a service to use in a larger-scale application. However, one quickly finds that there are actually many big corporate providers of translation APIs that support more than a hundred languages.

> Not surprisingly though, providers of paid translation APIs vary in both quality and price.

In this article, I'll share our experience in an overview of the commercially available translation services that you could use in your app or service. In particular, we’ll take a look at Google Translate, Amazon Translator and Microsoft Translator. Additionally, we compare these translation engines to the freely available options: we’ll outline what you would need to create a translation engine yourself, to give you an idea of how deep you have to dive to make it useful. We’ll acknowledge both the benefits and drawbacks of keeping the translation engine in your own hands.

Modern machine translation: how do we translate in 2021

Before picking the right examples for comparison, let’s take a look at what kind of data modern translation systems use to model the translation from one language to another.

The selection of the training data, just like the neural model architecture that each of the services uses, is kept secret by the providers. It’s also quite likely that some services use proprietary data sources that, for example, Google surely has available from its web crawls and other services. So what difference can it make, quality-wise?

Neural language models, which are nowadays widely used by all the major translation services, require a large set of parallel corpora that contain aligned sequences in both the source and the target language. The language model then learns the complicated relation that maps a source sequence to a target sequence.

[image: BART objective: noisy to neat language output] caption: language models are pre-trained for a general understanding of the language via a relatively simple objective, such as language modeling, where the model is trained to predict the next word in a sequence. Better language generation quality can then be achieved with more complicated training objectives, such as denoising, where the model is trained to reorder a randomly shuffled sequence of words back into the original sequence.

caption: parallel corpora contain a large volume of aligned text pairs. The translator is then trained to map the left-hand texts to their right-hand pairs. Such corpora are sometimes created as a side product of verbal transcripts, technical documentation, or subtitles; ParaCrawl is one example. The volume of sequence pairs needed to train a useful neural translation model nowadays starts in the low hundreds of thousands of pairs.

[image: source->target]
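To make the shape of such training data concrete, here is a minimal sketch in Python. The sentence pairs below are made up for illustration only; a real parallel corpus would of course contain hundreds of thousands of such pairs.

```python
# A parallel corpus is, conceptually, just a list of aligned (source, target) pairs.
# These example pairs are hypothetical; real corpora such as OPUS ship millions of them.
parallel_corpus = [
    ("Babička bydlela na samotě u lesa.",
     "Grandmother lived in a secluded spot by the forest."),
    ("Dnešní diskuse se bude zabývat následujícími otázkami.",
     "Today's discussion will address the following questions."),
    ("Vezmi košík a zanes ho babičce.",
     "Take the basket and bring it to grandma."),
]

# A neural translator is trained to map each source sequence to its target sequence:
for source, target in parallel_corpus:
    print(f"{source}  ->  {target}")
```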

Let’s translate!

Once the model is trained, the quality of the actual translation heavily depends on what kind of input text you translate and how well the model understands the domain. In order to get a meaningful and coherent translation of a text from some arbitrary domain, this domain should be represented in the training data. Neural machine translation models made a huge step towards generalization compared to their predecessors, statistical translators, which, for example, Google used until 2018. Still, it is good to be aware that, for example, a translator trained solely on a domain of legal discourse will perform poorly on, say, the medical domain.

> Being aware of the available data sources and their domains, we’ve tried to evaluate the translation on as “harsh” examples as we could come up with.

We pick two samples from rather diverse domains: one from the fairytale of Little Red Riding Hood, with its rather unlikely sequential composition of a storyline, and one meeting transcript generated by a speech-to-text system, with its original flaws.

We also compare the output of our own translation model, trained on the freely available OPUS data sources, which uses the model that we’ll describe below.

In each of the translation outputs, we mark the factual flaws of the translation: these are the parts of the output that contain either misleading or wrong information with respect to the input.

After getting over the tedious registrations, you can try the translators yourself! Here are the links: Google Translate, Amazon Translator, Microsoft Cognitive Translator, Gauss OPUS Translator.
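If you prefer to skip the web consoles and call one of these services programmatically, the snippet below is a minimal sketch of what that looks like with Amazon Translator via the boto3 SDK; it assumes you already have AWS credentials configured in your environment, and the region is just an example value.

```python
# Minimal sketch: translating a short Czech text to English with Amazon Translator.
# Assumes AWS credentials are already configured (e.g. via environment variables).
import boto3

client = boto3.client("translate", region_name="eu-west-1")  # example region

response = client.translate_text(
    Text="Babička bydlela na samotě u lesa.",
    SourceLanguageCode="cs",
    TargetLanguageCode="en",
)
print(response["TranslatedText"])
```

The other providers follow the same pattern: a client object, a single translate call per text, and billing per character of input.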

Translation experiments

In the first experiment, we take a snippet of The Little Red Riding Hood fairytale in two distinct source languages, Czech and Spanish, and evaluate the output in English, so you can compare.

#### 1st Input text: Little Red Riding Hood in Czech

"Žila jednou jedna hodná a milá dívenka. Všichni ji měli velice rádi a ze všech nejvíce maminka s babičkou. Babička jí ušila červený čepeček a podle něj jí začali říkat Červená Karkulka. Babička bydlela na samotě u lesa, kde široko daleko nebyla žádná jiná chaloupka. Babička se tam starala o lesní zvířátka. Jednou v létě maminka napekla bábovku, do košíku přidala láhev vína a řekla Karkulce: „Babička má dneska svátek. Vezmi košík a zanes ho k babičce do chaloupky. Ale jdi rovnou, ať se v lese nezatouláš!“ Karkulka pokývala hlavou a zvesela se vydala na cestu. Cestu si zpříjemnila poskakováním a broukáním si."

  1. Google Translate:

"There once lived a good and nice girl. Everyone loved her very much and mother and grandmother most of all. Grandmother sewed her a red cap and according to him they started calling her Little Red Riding Hood. Grandmother lived alone in a forest where there was no other cottage far and wide. Grandmother took care of the forest animals there. One summer, my mother baked a cake, added a bottle of wine to the basket, and said to Hood, "Grandma has a holiday today. Take the basket and take it to Grandma's cottage. But go straight, so you don't wander in the woods! ” Hood nodded and cheered merrily. She made her trip more enjoyable by bouncing and humming."

  2. Amazon Translator:

"There was one good and nice girl living once. Everyone liked her very much and most of all mum with grandmother. Grandma sewed her red beanie, and according to him they started calling her Little Red Riding Hood. Grandma lived secluded by the woods, where far and wide there was no other house. Grandma took care of the woodland critters there. One summer, Mum baked a candy, added a bottle of wine to her basket and told Riding Hood: “Grandma has a holiday today. Take the basket and take it to Grandma's house. But go straight so you don't stray in the woods! “The Riding Hood boiled his head and went on a journey. She made her journey more enjoyable by hopping and buzzing."

  3. Microsoft Cognitive Translator:

"There was a good, nice girl once. Everyone loved her very much and most of all mum and grandmother. Her grandmother made her a red hat, and according to him, they started calling her Little Red Riding Hood. My grandmother lived alone by the forest, where there was no other cottage far and wide. Grandma took care of the forest animals there. One summer, my mother baked a cake, added a bottle of wine to her basket, and told Little Red: "Grandma**'s having** a holiday today. Take the basket and take it to grandma's cottage. But go straight so you don't wander in the woods!" Little Red nodded her head and made her way cheerfully. She made her journey more pleasant by bouncing and humming."

  4. Gauss Translator - trained on open OPUS data for each language pair

"There was one good girl who lived there. They all loved her very much and most of them were mommy and grandmother made her a red hat and he said they started calling her Red Riding Hood. Grandma lived alone in the woods, where there was no other cottage nearby. Grandma took care of the forest animals there. One summer mommy baked a cake, added a bottle of wine to the basket and said to the Riding Hood: "Grandma has a holiday today. Take the basket and take it to her grandmother in the cottage. But go straight to the forest and don't wander!" She nodded her head and hurried on her way. She made the journey more pleasant by bouncing and humming. "

#### 2nd Input text: English transcript

In the second experiment, we translate a meeting transcript originally written in English. We are aware that speech-to-text systems for English outperform the systems for other languages, hence the performance on transcripts in other languages would perhaps be more biased by the transcript quality.

Again, we mark the factually wrong and misleading parts of the translation, with the help of native speakers. You'll have to take our word for it on this one if you do not speak Czech :)

"The tap project, led by former Secretary of Defense Ash Carter and on the West Coast by annual manual, is an effort to ensure that emerging technologies are both developed and managed in ways that protect humanity. Artificial intelligence is a foundational technology. That s court to many of the emerging technologies that we have tap, study and evaluate. Therefore, we re so pleased to be hosting today s of discussion with such a respected group of speakers from across the country who will introduce shortly. Today s discussion will address the following questions. What are the existing principles guiding the development and use of AI? What are the current gaps in AI governance, and how do we ensure responsible innovation moving forward to give you agree with brief overview of today s event? We ll hear briefly from Secretary Carter providing opening remarks, then Joey Ito, director of the M I T Media lab. Then we ll break into a fireside chat with several experts in the field, enclosed with an open Q and a session before I turn it over."

  1. Google Translate:

"Projekt tap, vedený bývalým ministrem obrany Ashem Carterem a každoročním manuálem na západním pobřeží, je snahou zajistit, aby vznikající technologie byly vyvíjeny a řízeny způsoby, které chrání lidstvo. Umělá inteligence je základní technologie. To je soud pro mnoho nově vznikajících technologií, které využíváme, studujeme a hodnotíme. Proto nás velmi těší, že dnes provádíme diskusi s tak respektovanou skupinou řečníků z celé země, kteří se brzy představí. Dnešní diskuse se bude zabývat následujícími otázkami. Jaké jsou stávající principy, kterými se řídí vývoj a používání AI? Jaké jsou současné mezery ve správě AI a jak zajistíme, aby se odpovědné inovace pohybovaly kupředu a poskytly vám souhlas s krátkým přehledem dnešní události? Krátce uslyšíme od tajemníka Cartera úvodní poznámky, poté Joey Ito, ředitel laboratoře M I T Media. Pak se vloupáme do krbu s několika odborníky v oboru, uzavřeným otevřeným Q a relací, než to otočím. "

  2. Amazon Translator:

"Projekt kohoutku pod vedením bývalého ministra obrany Ashe Cartera a každoroční příručky na západním pobřeží je snahou zajistit, aby vznikající technologie byly vyvíjeny a řízeny způsobem, který chrání lidstvo. Umělá inteligence je základní technologie. To je soud mnoha vznikajících technologií, které máme kohoutek, studovat a hodnotit. Proto jsme rádi, že jsme hostitelem dnešní diskuse s takovou respektovanou skupinou řečníků z celé země, kteří brzy představí. Dnešní diskuse se bude zabývat následujícími otázkami. Jaké jsou stávající zásady, které řídí vývoj a používání umělé inteligence? Jaké jsou současné mezery ve správě umělé inteligence a jak zajistíme, aby odpovědné inovace pokročily, abychom vám poskytli souhlas se stručným přehledem dnešní události? Krátce uslyšíme od ministra Carterové, který předává úvodní poznámky, pak Joey Ito, ředitel laboratoře M I T Media. Pak se vloupneme do chatu u krbu s několika odborníky v terénu, uzavřený s otevřeným Q a relací, než ho otočím. "

  3. Microsoft Cognitive Translator:

"Projekt kohoutku, vedený bývalým ministrem obrany Ashem Carterem a na západním pobřeží každoroční příručkou, je snahou zajistit, aby nově vznikající technologie byly vyvíjeny a řízeny způsobem, který chrání lidstvo. Umělá inteligence je základní technologie. To je soud pro mnoho nově vznikajících technologií, které máme kohoutek, studium a hodnocení. Proto jsme tak rádi, že budeme hostit dnešní diskusi s tak respektovanou skupinou řečníků z celé země, kteří se brzy představí. Dnešní rozprava se bude zabývat následujícími otázkami. Jaké jsou stávající zásady, kterými se řídí vývoj a používání aiti? Jaké jsou současné nedostatky ve správě ai a jak zajistíme, aby se odpovědné inovace posunuly kupředu, abychom vám poskytli stručný přehled dnešního s. Krátce uslyšíme od tajemníka Cartera, který poskytl úvodní poznámky, pak Joeyho Ita, ředitele laboratoře M I T Media. Pak se vloupáme do rozhovoru u krbu s několika odborníky v terénu, uzavřený s otevřeným Q a zasedání, než jsem se obrátit. "

  4. Gauss Translator - trained on open OPUS data for each language pair

"Projekt stepování, vedený bývalým ministrem obrany Ash Carterem a na západním pobřeží každoroční příručkou, je snahou zajistit, aby nové technologie byly vyvíjeny a řízeny způsobem, který ochrání lidstvo. Umělá inteligence je základní technologie. To je soud pro mnoho nově vznikajících technologií, které máme klepnout, studovat a hodnotit. Proto jsme velmi potěšeni, že dnes pořádáme diskusi s takto respektovanou skupinou řečníků z celé země, kteří brzy představí. Dnešní diskuse se bude zabývat následujícími otázkami: Jaké jsou stávající zásady, které vedou k rozvoji a používání umělé inteligence? Jaké jsou současné mezery ve správě AI a jak zajistíme odpovědné inovace vpřed, abychom souhlasili s stručným přehledem dnešní události. Krátce se dozvíme od tajemníka Cartera, který předkládá úvodní poznámky, pak Joeyho Ita, ředitele laboratoře MIT Media. Pak se vloupáme do krbového chatu s několika odborníky v terénu, uzavřeného otevřeným Q a sezením, než ho otočím. "

[image: google translate karkulka] [image: translation karkulka: 1, 2, 3, G] [image: amazon translate: transcript] [image: translation transcript: 1, 2, 3, G]

What does it cost?

Prices for translation APIs are usually billed per number of input characters. Some services, such as Microsoft's, provide discounted bundles for larger volumes of translation. Hence, the comparison table shows the best achievable price for a given number of characters. For usage above 1 billion characters per month, Amazon and Google say they can provide individual pricing on demand.

Amazon and Microsoft allow you to train a custom translator model on your own data, but it comes at an extra price, and the quality of such an approach is initially unknown. You would necessarily need to test the quality of translation after the training yourself, and quite likely you would need to repeat the training multiple times to achieve the desired quality improvement on your data domain.

How much does the custom training cost? In the case of Microsoft, custom translation breaks down to an extra charge of at most $300 for training a model and then $40 per million characters, instead of the original $10. With larger volumes (over 62.5M characters), custom model pricing gets more complicated. With Amazon, the extra cost is projected into the price of translation, which costs $60.00 per million characters instead of the original $15. You would also need to cover the computational resources needed for training in the cloud, but we have not found a more precise description of the process and pricing.

In the table below, we collect the pricing for standard, general translation, as of February 2021. In general, custom translation is 1.8-4 times more expensive.

References (as of February 2021): Microsoft, Amazon, Google.

| Monthly volume       | 100k chars | 1M chars | 5M chars | 10M chars | 100M chars | 500M chars |
|----------------------|------------|----------|----------|-----------|------------|------------|
| Amazon Translator    | Free       | Free     | $75.00   | $150      | $1,500     | $7,500     |
| Google Translate API | Free       | $20      | $100     | $200      | $2,000     | $10,000    |
| Microsoft Cognitive  | Free       | Free     | $50      | $100      | $1,000     | $5,000     |

As you see, the price of translation varies significantly, even though the translation quality seems to be quite comparable.

Here are some examples with character volumes that we estimated based on our own translation deployments (a short sketch of the underlying arithmetic follows the examples):

  • Multilingual chat support service: allows customers to communicate in their mother tongue. A service request message contains ~100-200 characters, and a whole conversation usually contains ~1,200 characters on average. So if you have, say, 10k chat support requests per month and you do not use the translator in any other service, it would cost you $180 per month on Amazon, $120 on Microsoft, or $240 on Google Translate.

    This application is not as critical in terms of translation quality as some others, hence it is worth considering cutting costs by using a cheaper translator that still does the job. Importantly, the translation must be performed in real time, but this did not seem to be a problem in our tests: all translators were able to return a response within 3 seconds, and single-sentence texts usually took only about 1 second.

  • E-shop localization: If you want to extend your customer base to other countries, a good start is to localize your product descriptions into the native languages. We found that product descriptions on e-shops range between 3,000-4,000 characters, mainly depending on the price. So say you have an e-shop offering 5k products that you want to localize into another language, and your goods providers update each product description once every 3 months. This would cost you $100 per month on Amazon, $67 on Microsoft, or $133 on Google Translate.
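The arithmetic behind these two examples is simple enough to sketch in a few lines of Python. The per-million-character rates below are derived from the pricing table above; free tiers and volume discounts are ignored for simplicity, and the e-shop example uses the upper end of the 3,000-4,000 character range.

```python
# Rough monthly cost estimate, using the flat per-million-character rates
# implied by the table above (free tiers and volume discounts ignored).
PRICE_PER_MILLION_CHARS = {"amazon": 15.0, "microsoft": 10.0, "google": 20.0}

def monthly_cost(chars_per_month: float) -> dict:
    """Return the estimated monthly price (USD) for each provider."""
    millions = chars_per_month / 1_000_000
    return {provider: round(rate * millions)
            for provider, rate in PRICE_PER_MILLION_CHARS.items()}

# Chat support: 10k conversations per month, ~1,200 characters each.
print(monthly_cost(10_000 * 1_200))     # {'amazon': 180, 'microsoft': 120, 'google': 240}

# E-shop localization: 5k products, ~4,000 characters each, re-translated every 3 months.
print(monthly_cost(5_000 * 4_000 / 3))  # {'amazon': 100, 'microsoft': 67, 'google': 133}
```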

Deploy your own translator

Clearly, using a standalone translator might have some cost-saving benefits, especially once you aim to use translation in a real-time application such as a chat service, or in a large-volume application, for example document processing. Let’s see what the actual price of such a solution is, on your own hardware or in the cloud. A translation service can be deployed on your own hardware or to a cloud platform such as AWS, just like any other application. Here’s a short review of performance scalability.

Deployment of a service using the base Transformer architecture that we used for translating the aforementioned examples consumes X GBs of RAM. Translation of a single sentence, or a sequence of, say, 20 words, takes X seconds on a single CPU, or X seconds on 32 CPUs. An input sequence of 200 words takes X seconds on a single CPU, or X seconds on 32 CPUs. If the performance is still a concern, you can utilise graphical processing units (GPUs): using a single GPU, translation time shrinks the single-CPU time by a ratio of X, to X seconds.
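To illustrate how little code such a deployment requires, here is a minimal sketch of a translation endpoint built with Flask and a publicly available OPUS-MT checkpoint from HuggingFace. This is a stand-in for our own model, not the exact Gauss setup; the checkpoint name is just an example of a Czech-to-English model.

```python
# Minimal sketch of a self-hosted translation service (Flask + HuggingFace Transformers).
# The OPUS-MT checkpoint below is a public Czech->English model used here as a stand-in.
from flask import Flask, request, jsonify
from transformers import MarianMTModel, MarianTokenizer

MODEL_NAME = "Helsinki-NLP/opus-mt-cs-en"
tokenizer = MarianTokenizer.from_pretrained(MODEL_NAME)
model = MarianMTModel.from_pretrained(MODEL_NAME)

app = Flask(__name__)

@app.route("/translate", methods=["POST"])
def translate():
    texts = request.get_json()["texts"]          # list of source-language strings
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    outputs = model.generate(**batch)            # beam search decoding on CPU (or GPU if moved)
    translations = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    return jsonify({"translations": translations})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
```

Batching several sentences per request is the easiest way to keep per-character costs low on your own hardware, since the model amortises its overhead across the batch.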

Train your own translator

If you are familiar with training neural networks, you could manage to train your own translation model and productise it, for example as an API, or simply as a Python application. Note that “language translation” is not the only thing you can train your translator to do: you can similarly train your model to generate summaries, like [this](link X), or handle essentially any other task where both the input and the output are textual sequences with a known language domain.

We’ll share some tips on how to go about this task, although the approach might vary depending on your language and the specificity of your language domain.

[image: translation training process]

  1. Find some parallel data. OPUS is a good starting point, but it contains some biases that are hard to eliminate without additional data sets. In our experience, for example, Czech target sequences were quite short, which in consequence caused the model to omit parts of longer input sequences during translation.
  2. Pick a model and initialise a vocabulary. A Transformer model is the way to go if you’d like to get your translation to a quality comparable to the cloud services. Use the [HuggingFace library](link X), which can provide you with a base model for your desired source language and a lovely, convenient interface for [training sequence-to-sequence](link X) models. There are [examples](link X) of how to train a “seq2seq” model. Search for models that are good at generating text, such as [mBART](link X). Then, replace the target vocabulary with a [SentencePiece](link X) vocabulary initialised from your target-language sequences (a minimal vocabulary-training sketch follows this list).
  3. Train your model. This is indeed the tricky one, because good performance on a single data set does not mean that the system is production-ready. Transformer models love to seek out heuristic shortcuts, such as the target shortening mentioned above. Such problems can be avoided if you include at least two datasets in your validation monitoring. If the two are domain-orthogonal enough, the training [BLEU](link X) will keep increasing and the loss decreasing, while the model already starts to fail on the other-domain dataset. If your target language data is domain-limited, create a small, artificial parallel corpus that talks about something else than your training domain. Even as little as 50-100 sequence pairs will give you a good clue about how you’re doing (see the training sketch at the end of these steps).
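As a concrete illustration of step 2, here is a minimal sketch of training a SentencePiece vocabulary on target-language text. The file name and vocabulary size are just example values, not our actual configuration.

```python
# Minimal sketch: train a SentencePiece vocabulary on target-language sequences.
# "target_sentences.txt" (one sentence per line) and vocab_size=32000 are example values.
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="target_sentences.txt",
    model_prefix="target_sp",
    vocab_size=32000,
    model_type="unigram",
)

# Load the trained model and tokenize a sample target-language sentence.
sp = spm.SentencePieceProcessor(model_file="target_sp.model")
print(sp.encode("Grandmother lived in a secluded spot by the forest.", out_type=str))
```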

[image: real evaluation loss]

  4. Evaluate your model thoroughly. Often even additional evaluation will not identify some embarrassing flaws of the system. If you manage to generalise them, return to point one and balance the training data set so that the model is exposed to enough samples where it makes an error, so that it is able to correct itself. If you know what the target domain of your system is, actively seek out all the corner cases you can come up with and evaluate your translator on them. Still, it is not a good idea to train and evaluate solely on your target domain, which could be quite specific: that could make your translator very sensitive to input alterations.
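To make step 3 and the two-dataset validation idea more tangible, below is a compact sketch of a HuggingFace Seq2SeqTrainer setup. The dataset variables (`train_data`, `valid_in_domain`, `valid_out_of_domain`) and the checkpoint name are placeholders, not our exact Gauss configuration; the main point is the separate evaluation on an in-domain and an out-of-domain validation set.

```python
# Compact sketch of fine-tuning a seq2seq translation model with HuggingFace Transformers.
# `train_data`, `valid_in_domain` and `valid_out_of_domain` are assumed to be tokenized
# datasets of (input_ids, labels) pairs prepared earlier; the checkpoint is only an example.
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM, DataCollatorForSeq2Seq,
                          Seq2SeqTrainer, Seq2SeqTrainingArguments)

checkpoint = "facebook/mbart-large-cc25"      # example multilingual seq2seq checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

args = Seq2SeqTrainingArguments(
    output_dir="translator",
    per_device_train_batch_size=8,
    num_train_epochs=3,
    evaluation_strategy="steps",
    eval_steps=2000,
    predict_with_generate=True,               # generate full sequences during evaluation;
)                                             # a BLEU compute_metrics fn would plug in here

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train_data,
    eval_dataset=valid_in_domain,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()

# Monitor more than one validation set: an in-domain one and a deliberately different one,
# so that heuristic shortcuts learned on the training domain show up as a gap between the two.
print(trainer.evaluate(valid_in_domain, metric_key_prefix="in_domain"))
print(trainer.evaluate(valid_out_of_domain, metric_key_prefix="out_of_domain"))
```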