Lost in translation: Why Google Translate often gets Yorùbá — and other languages — wrong

Wikimedia User Group Nigeria, October 2018 via Wikimedia Commons CC.BY.2.0.

The English language has dominated online discourse as the “universal” language of communication since the inception of the internet. As of February 2020, over half of the websites on the internet were in English, according to W3Techs.

But as more people who speak other languages get online, a linguistic digital revolution has followed: immediate access to English translations of multiple languages with the click of a button.

Many tech companies have recently put effort into documenting non-English words on the internet, paving the way for the digitization of multiple languages. Google, Yoruba Names, Masakhane MT and ALC are examples of companies and start-ups that have been trying to marry technology with non-English languages.

In late February 2020, after a four-year hiatus on adding new languages, Google announced that it would add five to Google Translate: Kinyarwanda, Uighur, Tatar, Turkmen and Odia.

A man looks perplexed while reading a text online. Photo by Oladimeji Ajegbile, open-source via Pexels.

But have you ever clicked on the translation option and realized that the English translation is, at best, just OK? And at worst, not accurate at all?

This kind of language translation and access work comes with many controversies and difficulties.

Twitter, for example, offers translation of Yorùbá into English via Google Translate wherever it can. Usually, the outcome isn’t totally bad: perhaps a few words are correct.

These challenges arise because tech companies usually source the linguistic data behind their English translations from the internet. This data may work for some languages, but languages like Yorùbá and Ìgbò, two of Nigeria's main languages, are challenging because the accent marks that indicate tone on their words are often missing or wrong in online text.

In response to why it has taken Google four years to add five new languages, a company spokesperson explained:

 Google Translate learns from existing translations found on the web, and when languages don’t have an abundance of web content, it’s been difficult for our system to support them effectively. … However, due to recent advances in our machine learning technology, and active involvement from our Google Translate Community members, we’ve been able to add support for these languages.

Also, most people are not so good with the orthographies, or spelling conventions, of these languages. Because these spelling errors are never flagged as errors, systems that learn from such text struggle to produce good translations.
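To make the tone-mark problem concrete, here is a minimal sketch in Python that uses only the standard `unicodedata` module. It shows what is lost when marks are left off words that appear in this article, and how the same visible word can be stored in two different ways. The function names `strip_marks` and `same_word` are hypothetical, and the sketch does not describe how Google Translate or any other system actually processes text.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Remove combining marks (tone accents and under-dots) from text."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Category "Mn" (nonspacing mark) covers grave, acute and dot-below marks.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def same_word(a: str, b: str) -> bool:
    """Compare two spellings after converting both to the same Unicode form."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Words taken from this article, written with their marks.
for word in ["Yorùbá", "Ìgbò", "ọbabìnrin", "amóhùnmáwòrán"]:
    print(word, "->", strip_marks(word))

# The same visible word can be stored as different code point sequences.
precomposed = "Yor\u00f9b\u00e1"    # ù and á as single code points
decomposed = "Yoru\u0300ba\u0301"   # u + combining grave, a + combining acute
print(precomposed == decomposed)           # raw comparison of the two strings
print(same_word(precomposed, decomposed))  # comparison after normalization
```

Once the marks are stripped, nothing in the text records how a word should be toned, so a system that learns from unmarked web text has less to go on. Normalizing text to one Unicode form before comparing it is a common precaution, since two spellings that look identical on screen may otherwise be treated as different words.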

Most machine translations render some words incorrectly, especially words that are culturally nuanced. For example, the meanings of the Yorùbá words ayaba and ọbabìnrin are situated in a cultural context. Most machines translate both words as “queen.” However, from a traditional-cum-cultural vantage point, the two are different: ọbabìnrin means “queen” in English, while ayaba means “wife of the king.”

Even with these translation complications, technology has helped with the advancement of African languages in digital spaces, spurring the coinage of new words. African languages have grown with the influx of new gadgets like smartphones and tablets, as new words are coined to name these new technological tools and concepts. This process has thus expanded the usage and functionality of these languages.

With the emergence of new technologies, the vocabularies of many African languages have become more sophisticated. For instance, the Yorùbá language has some tech-influenced words such as erọ amúlétutù (“air conditioner”), erọ ìbánisọ̀rọ̀ (“phone”) and erọ ìlọta (“grinder”). Similarly, the Ìgbò language has words such as ekwè nti (“telephone”) and ugbọ̀ àlà (“vehicle”). These societies have named these gadgets after the functions they perform.

In courses on broadcasting and advertising in Yorùbá, students learn that most people call the TV erọ amóhùnmáwòrán. This coinage generates many questions and opinions. Some students argue that video cameras and recorders can also be called erọ amóhùnmáwòrán, based on what they do.

These linguistic challenges in the tech space are healthy for languages: they stimulate critical thinking that advances both linguistics and technology.

In 2019, Google opened its first AI research center in Accra, Ghana, focused on improving “Google Translate's ability to capture African languages more precisely,” according to CNN. Research scientist Moustapha Cisse, who heads Google's AI work in Africa, believes that “a continent with more than 2,000 dialects deserves to be better served.”

Mozilla and BMZ recently announced their cooperation to open up voice technology for African languages. With initiatives like these, there is more to come in the study of African languages.
