There is tremendous interest in Artificial Intelligence (AI) technologies that can interpret human language in various ways. Chatbots, translation software, and intelligent assistants like Siri or Alexa all depend on an understanding of how we use language.
A few weeks ago, the non-profit research company OpenAI published a paper demonstrating their new approach to computational modeling of the English language. The authors give a number of examples of computer-generated texts from their new model that they say are indistinguishable from what a human might produce. OpenAI has decided not to release the model for fear that it might be misused, e.g. to create malicious bots on Twitter or other social media platforms.
As always, this could mean both good news and bad news for indigenous and minority languages. The good news is that these models don't require any specially annotated data to train; they can be built as long as programmers have a sufficiently large collection of plain text in the target language (e.g. from Wikipedia). Once a model is created, it can be used to develop more advanced AI technologies in that language.
A few bits of bad news: first, these models require a huge amount of text to train (many millions or billions of words) and significant computational horsepower. Second, research in this area continues to focus almost exclusively on English, and there are no guarantees that the same models will work as well for languages with more complex morphology. And it is always important to remember that these technologies can be used as easily to facilitate data collection by technology companies as they can to benefit the public interest. Nevertheless, we hope to see more research in this area for under-resourced languages, and eventually AI technologies that support these languages.
Listen to the ‘Last Whispers’ of dying languages
A centuries-old Zen riddle asks: What is the sound of one hand clapping? It is intended to free a person from their regular state of mind, in order to allow another way of thinking.
When multimedia visual artist Lena Herzog looked at the world’s linguistic biodiversity, and took measure of the alarming rate at which languages go extinct, she decided to represent endangered languages in a radically different way.
She combined recordings of nearly extinct or already extinct languages with drone footage of natural sites to produce what she describes as a “45 minute-long immersive oratorio.” The result, called Last Whispers, is a haunting audio-visual journey in black and white through a forest of sounds articulated in Tehuelche, Nivkh, Nahuatl, Warlpiri, Ainu, Koyukon, Nafsan, Ju|’hoan, Surel, Ongota, and Qaqet. When those languages lose their last speakers, listeners find themselves in a dead-silent forest.
Mapping Australia's first languages
First Languages Australia recently relaunched their Gambay online resource, which displays the preferred names of Aboriginal and Torres Strait Islander languages and their respective locations across the country. The word Gambay means “together” in the Butchulla language. Built on Mapbox, an open-source mapping platform, the map allows users to locate language centers that are on the front lines of language revitalization efforts, and it offers better options for embedding digital media from a variety of sources.