New algorithms and datasets have led to massive improvements in the range and quality of automated translations. Two leading providers, Google and Amazon, each provide hundreds of translation options. As a developer, leveraging these services lets you develop robust international applications by improving testing, copy, and automated systems.
Let's expand upon these use cases, starting with testing. Most digital systems accept some kind of text input, and that input has to be encoded according to a character set. Some character sets, like ASCII and Latin-1 (sometimes called "Extended ASCII"), support only a limited set of languages. If you are using one of these restricted character sets, testing input in Arabic or Chinese will quickly reveal the problem. For global language support, use Unicode, which has characters for nearly every writing system and is usually represented in the UTF-8 encoding. To test that your system reliably handles input from every target market, use a translation service to create reasonable inputs in a variety of languages and alphabets and include these in your automated test suite.
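To make the encoding point concrete, here is a minimal Python sketch that checks which encodings can round-trip a few sample strings. The sample phrases and the `supported_encodings` helper are illustrative assumptions; in a real test suite the inputs would come from a translation service.

```python
# Sample inputs in several scripts. These are hand-picked placeholders;
# in practice they would be generated by a translation service.
SAMPLES = {
    "english": "Hello, world",
    "german": "Grüße",
    "arabic": "مرحبا بالعالم",
    "chinese": "你好，世界",
}


def supported_encodings(text, encodings=("ascii", "latin-1", "utf-8")):
    """Return the encodings that can encode and decode `text` without loss."""
    ok = []
    for name in encodings:
        try:
            if text.encode(name).decode(name) == text:
                ok.append(name)
        except UnicodeEncodeError:
            # The character set has no representation for some character.
            pass
    return ok


for language, text in SAMPLES.items():
    print(language, supported_encodings(text))
```

Only the English sample survives ASCII, the German sample also fits in Latin-1, and the Arabic and Chinese samples need UTF-8.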
Translating your application into another language is perhaps the most obvious use for an online translation service. This is an important facet of localization, but it's not the whole story. Generally, a strong translation of substantial copy requires, at a minimum, a pass by a copyeditor fluent in both languages. That said, translations for one-word buttons, short headers, and other minor pieces of text can come from an online translation service, at least in the development stages.
Finally, there are more advanced applications of translation, where you want to translate text in response to a user's action or other event within your system. For this article, we will consider an automated support email distribution system. Imagine your company employs a global network of support staff for 24-hour email support coverage. Depending on the time of day, you may wish to translate support requests into Spanish, Chinese, Hindi, or any number of other languages, and possibly translate the responses back into the original message's language. Let's discuss the feasibility of such a system and how we might implement it using Amazon Translate or Google Cloud Translate.
Like most AWS features, the easiest way to get started with Amazon Translate is through the console. After logging in to your AWS account, a search for "Translate" will bring you to the Translate home page, which shows your usage metrics. Navigating by the left-hand menu bar should bring you to the real-time translation dashboard.
Using the console, you can translate up to 5,000 characters at a time between 25 languages, with almost every language pair supported. Most customer support emails are fairly short, so your support representative could take emails in unknown languages, paste them into the console, and receive a general translation right away. However, AWS offers better options.
Amazon Translate offers "Custom Terminology", which allows you to influence the outcome of a translation. Custom terminology can be uploaded as a CSV file where the columns represent languages, with the first column representing the input language. You can specify one case-sensitive word per row and its desired representation in any number of supported target languages.
Let's take a look at a quick example. My name, Philip, translates into German as Philipp. To ensure that the spelling remains correct, I uploaded the following as a terminology CSV file.
As we can see, it's fixed! That said, Translate frequently resolves spelling issues by context. For example, "Apple" translates to "Apfel" by itself, but in the sentence "Apple makes computers," AWS translates "Apple" as a proper noun.
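A terminology file of the kind described above could be generated with a few lines of Python. This is only a sketch, not the file used in the example: the `build_terminology_csv` helper is hypothetical, and the header row of language codes (`en`, `de`) is an assumption about the expected layout based on the description of columns representing languages.

```python
import csv
import io


def build_terminology_csv(source_lang, target_langs, terms):
    """Build terminology CSV text.

    `terms` maps each source-language term to a dict of
    {target language code: desired translation}.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Assumed layout: first column is the source language, then one
    # column per target language.
    writer.writerow([source_lang, *target_langs])
    for source_term, translations in terms.items():
        writer.writerow([source_term, *(translations[lang] for lang in target_langs)])
    return buffer.getvalue()


csv_text = build_terminology_csv("en", ["de"], {"Philip": {"de": "Philipp"}})
print(csv_text)
```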
This powerful, customizable translation service is also accessible via an API. The API sends and receives JSON, with the same 5,000 character limit. As a developer, it is up to you to handle splitting and concatenating text across multiple requests if you want to translate more than 5,000 contiguous characters.
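One way to handle the splitting and concatenating is sketched below. The `split_text` and `translate_long_text` helpers are hypothetical, and `translate` is a placeholder for whatever function actually calls the translation API. The sketch counts characters, as the limit is described here, and prefers to break after whitespace so that words stay intact.

```python
def split_text(text, max_chars=5000):
    """Split `text` into chunks of at most `max_chars` characters.

    Breaks after the last space or newline in each window when possible,
    and falls back to a hard cut when a window contains no whitespace.
    """
    if max_chars < 1:
        raise ValueError("max_chars must be positive")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            cut = max(text.rfind(" ", start, end), text.rfind("\n", start, end))
            if cut > start:
                end = cut + 1  # keep the whitespace with the left-hand chunk
        chunks.append(text[start:end])
        start = end
    return chunks


def translate_long_text(text, translate, max_chars=5000):
    """Translate each chunk with the supplied callable and rejoin the results."""
    return "".join(translate(chunk) for chunk in split_text(text, max_chars))


print(split_text("aaa bbb ccc", max_chars=5))
```

Splitting on whitespace alone can still separate a sentence from its context, so breaking on sentence or paragraph boundaries may give better translations. A real service may also trim whitespace at chunk edges, in which case the join step would need to restore it.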
For our hypothetical service, our architecture would read email from SES and trigger a translation. After translating the body of the email into the appropriate language, the system would send the original email and translated content to an available representative. The response would travel the same system in reverse, from representative to translation to email.
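The flow above can be sketched without committing to any provider. Everything in this example is an assumption made for illustration: the shift schedule, the `Ticket` type, and the `translate` callable, which stands in for a call to Amazon Translate or any other service. Receiving and sending email is left out.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical shift schedule: (start UTC hour, end UTC hour, language code).
SHIFTS = [(0, 8, "zh"), (8, 16, "hi"), (16, 24, "es")]

# (text, source language, target language) -> translated text
Translate = Callable[[str, str, str], str]


@dataclass
class Ticket:
    sender: str
    language: str  # language code of the original message
    body: str


def on_duty_language(utc_hour: int) -> str:
    """Pick the language of the support team working at this hour."""
    for start, end, lang in SHIFTS:
        if start <= utc_hour < end:
            return lang
    raise ValueError(f"hour out of range: {utc_hour}")


def to_representative(ticket: Ticket, utc_hour: int, translate: Translate) -> dict:
    """Bundle the original message with a translation for the on-duty team."""
    target = on_duty_language(utc_hour)
    if target == ticket.language:
        translated = ticket.body  # no translation needed
    else:
        translated = translate(ticket.body, ticket.language, target)
    return {"language": target, "original": ticket.body, "translated": translated}


def to_customer(reply: str, rep_language: str, ticket: Ticket, translate: Translate) -> str:
    """Translate the representative's reply back into the customer's language."""
    if rep_language == ticket.language:
        return reply
    return translate(reply, rep_language, ticket.language)


# A fake translator makes the routing visible without calling any API.
fake = lambda text, source, target: f"[{source}->{target}] {text}"
ticket = Ticket("customer@example.com", "de", "Hallo")
print(to_representative(ticket, utc_hour=9, translate=fake))
```

Passing the translator in as a parameter keeps the routing logic testable without network access and makes it straightforward to swap providers later.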
Google Translate is probably the best-known translation service on the market. Unlike Amazon, Google offers a free console for anyone to perform a translation, also limited to 5,000-character chunks. This translation console is also available as a free app that lets you download languages to use offline. Furthermore, the Google Translate app integrates various text-to-speech, speech-to-text, Optical Character Recognition, and Handwriting Recognition features directly into the user interface, whereas Amazon splits these features across separate services.
However, our use case requires automated translations. Google offers its translation technology as part of Google Cloud. To evaluate the service, you'll need a Google Cloud account. Once you have an account, you can follow the Quickstart guide to get started using the API in three steps.
Similar to AWS, you'll need to generate and link credentials before you can use the API to request translations. However, you may encounter usage errors if you try to translate too much, too quickly. Google provides default quotas to limit each account's translation activity. Amazon does not enforce quotas; it will let you translate as much as you want and then send you the bill.
Fortunately, you can adjust these quotas. If, for some reason, you need to translate more than a billion characters in a single day, you can do so if you're willing to pay for it (Google Translate costs 20 dollars per million characters). More frequently, you may want to reduce the quotas to avoid unanticipated charges and keep your spending in check, keeping in mind that whatever system you design will need graceful error handling if it encounters a quota-based error response to its translation request.
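A common way to handle quota-based errors gracefully is to retry with exponential backoff. The sketch below assumes a hypothetical `QuotaExceededError` and a `translate` callable; a real client library raises its own exception types, which you would catch instead.

```python
import time


class QuotaExceededError(Exception):
    """Hypothetical stand-in for a provider's quota or rate-limit error."""


def translate_with_retry(translate, text, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call `translate(text)`, backing off exponentially on quota errors.

    Waits base_delay, then 2x, then 4x, and so on between attempts, and
    re-raises the error once `max_attempts` calls have failed.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return translate(text)
        except QuotaExceededError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; let the caller decide what to do
            sleep(base_delay * 2 ** attempt)
```

Backoff helps with short-lived rate limits. For a daily quota, retrying within seconds will not succeed, so the surrounding system may need to queue the request or route it to a human instead.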
Using the API also enables Glossaries, which work the same as Amazon's Custom Terminologies. If you want more control and can invest in generating your own data, Google offers AutoML Translate. Essentially, you use their platform with your own datasets to train models for specific applications. For example, if our support platform frequently dealt with detailed bug reports with numerous technical phrases and industry-specific jargon, we could train a model to perform better translations within that domain.
Now that we have explored how to use each of the services, let's compare them directly on translation quality, capabilities, cost, and usability.
The most important facet of a translation service is actually translating text correctly. Google and Amazon do not reveal much information about their translation methodology, nor did I find a standard scoring metric for objectively comparing the quality of a large body of translation. Instead, I performed a small survey with a piece of sample text and six native speakers.
Remembering that we are building an automated translation system for a helpdesk, I wrote a typically vague technical support request. In this fictitious request, some kind of caching issue or database error is causing inconsistent behavior in a webapp.
There is a strange error on my user profile page. I tried to update my email, and I confirmed the new email from a message in my inbox. When I look at my profile, I see my new email, but when I look at my profile from a different computer I see the old address. Please fix this.
I translated this into Spanish, German, Chinese, Russian, Persian, and Hindi on both Google Cloud and Amazon Web Services. Six friends, each a native speaker of one of the above languages, did a blind comparison between the two translations in their respective language. The Russian and Spanish speakers both expressed a slight preference for the Google Cloud translation, while the other four found the translations equivalent. While this is far from comprehensive evidence, it suggests to me that the translation quality of the two services is roughly comparable.
Google does distinguish itself by offering over 100 languages to Amazon's 25, with a corresponding increase in language pairs. Both cloud platforms offer the potential of integrating translation with other services, including sentiment analysis, text-to-speech, and speech-to-text. Amazon's Custom Terminology is matched by Google's Glossary, and both provide a web console and API.
Amazon does win on price. Amazon Translate costs 15 dollars per million characters, while Google Cloud Translate costs 20 dollars for the same million characters. Both have a free offering. Amazon gives you 2 million free characters per month as part of the free tier, while Google offers half a million free characters per month, though their translation website and apps are of course free to use.
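To see how the prices and free allowances interact at a given volume, here is a small worked calculation using the figures quoted above. The `monthly_cost` helper and the 10 million character volume are illustrative assumptions, and the sketch assumes the free allowance applies in the month being estimated.

```python
def monthly_cost(characters, price_per_million, free_characters):
    """Estimate one month's cost in dollars after subtracting the free allowance."""
    billable = max(0, characters - free_characters)
    return billable * price_per_million / 1_000_000


volume = 10_000_000  # hypothetical monthly volume in characters

amazon = monthly_cost(volume, price_per_million=15, free_characters=2_000_000)
google = monthly_cost(volume, price_per_million=20, free_characters=500_000)

print(f"Amazon: ${amazon:.2f}")  # Amazon: $120.00
print(f"Google: ${google:.2f}")  # Google: $190.00
```

At this volume the difference is 70 dollars per month. Below 500,000 characters per month, both services would cost nothing under these assumptions.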
My personal recommendation is that if you're already using AWS, use Amazon Translate, and if you're already using Google Cloud, use Google Translate. If you're not using either and want quick, one-off translations, use the free consumer Google Translate. If you want to integrate translation into your application, choose Amazon for its easy setup, lower cost, and integration potential unless you need specific languages that only Google supports. Finally, for complex custom solutions, you can use Google's AutoML Translate platform to build a bespoke model.
Other competitors include IBM Watson, Alibaba Cloud, Baidu Cloud, Microsoft Translator, and several options by smaller companies. Depending on your workflow and existing technology stack, it may be better for you to integrate with one of these alternate providers, but generally Amazon and Google are the leading choices.
Even if you're not using the translation service, I recommend browsing Google's AutoML beginner's guide. The guide asks several questions that are essential to designing any system (phrased for designing a system with automated translation).
The first question that the guide asks is "What is the outcome you’re trying to achieve?" This question should inform your choice of technologies and thus your choice of translation service. It may be that paying human translators or using a combination of automated and human systems will help you reach your objectives. However, if you decide that automated translation is right for your product, you should now have the information that you need to make an informed choice about which service to use.
At the beginning of the article, I mentioned that translation services can help you generate realistic data for testing input fields and other user interactions. However, integrating automated translation into your product requires further testing. Of course, you will need automated tests to ensure that the system is functioning according to specification, but it would also be prudent to have people with the relevant bilingual fluency review a sample of the system's outputs. Finally, you'll want to use a service like WonderProxy to test that this global system is functional and localized worldwide.