Last active
November 9, 2022 19:35
Google Translate API demo, tested with Python 3.9.x. It may not work as well with Python 3.10, though I haven't pinned down why; if you follow the guide I referenced, you should be in business. I highly recommend creating a new Python environment before doing any serious development, if you haven't done so already.
One limitation of the TextBlob sentence tokenizer is that it really only works well on Latin-script languages and stumbles a bit on text with multi-byte characters and punctuation, such as Cyrillic, Hanzi (including phono-semantic characters), and other Asian-language UTF-8 strings. This is where spaCy is a better choice, but it takes a bit more planning to set up and run, since you have to load more libraries and handle context more selectively. So, if you have a specific case you want to handle, you should be able to extend the above examples with a switch on the language to use better sentence handling prior to translation.

Here's a good starting point if you want better sentence handling for non-Latin-script languages:
https://spacy.io/api/sentencizer
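The language switch mentioned above could look roughly like the sketch below. The names `LATIN_SCRIPT_LANGS`, `choose_splitter`, and `split_sentences` are hypothetical helpers made up for illustration, and the language set is only a placeholder you would adjust for your own data. It assumes spaCy v3 and TextBlob are installed, and that `spacy.blank()` can build a pipeline for the language code you pass in (some languages need extra tokenizer packages).

```python
from typing import List

# Assumption: language codes you trust TextBlob to split well.
# Placeholder list for illustration only; adjust for your data.
LATIN_SCRIPT_LANGS = {"en", "es", "fr", "de", "it", "pt", "nl"}


def choose_splitter(lang: str) -> str:
    """Pick a sentence splitter name based on the language code."""
    return "textblob" if lang.lower() in LATIN_SCRIPT_LANGS else "spacy"


def split_sentences(text: str, lang: str) -> List[str]:
    """Split text into sentences before sending each one to translation."""
    if choose_splitter(lang) == "textblob":
        # Imported lazily so the spaCy path works without TextBlob installed.
        from textblob import TextBlob

        return [str(sentence) for sentence in TextBlob(text).sentences]

    # Imported lazily so the TextBlob path works without spaCy installed.
    import spacy

    # A blank pipeline has no trained model; the rule-based sentencizer
    # splits on sentence-final punctuation.
    nlp = spacy.blank(lang)
    nlp.add_pipe("sentencizer")
    return [sent.text.strip() for sent in nlp(text).sents]
```

Calling `split_sentences(text, "ru")` would route through the spaCy sentencizer, while `split_sentences(text, "en")` would stay on TextBlob. Building the spaCy pipeline on every call is wasteful, so in real use you would probably cache one pipeline per language.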
An example of using the spaCy sentencizer (on English, at least):