With an estimated 100 million speakers, Swahili is the second-most-widely-used language on the African continent, after Arabic. Yet services such as automatic speech recognition (ASR) aren’t commercially available in this language, denying many users with disabilities and those who aren't literate the information they desperately need in their daily lives. This could change very soon though, as academic research and technology startups are converging to provide localized technologies to Swahili speakers.
Internet search on a simple mobile phone
One of these very promising innovations is about to be rolled out in Kenya. Uliza (meaning “ask” in Swahili) is a voice interface that allows users to access information from the Internet using a basic mobile phone.
All users need to do is call in and ask a question in Swahili. Within 15 to 90 minutes, an “answer agent” (an actual person working behind the scenes) responds with a voice answer. At the moment, a “crowd” of around 50 agents handles the queries by transcribing the voice recordings, searching for answers online in multiple languages, translating the information and sending it back to the caller in Swahili.
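The steps an answer agent follows can be pictured as a short pipeline. The sketch below is purely illustrative and is not Uliza's actual software: the names `Stage`, `Query`, `advance` and `within_expected_turnaround` are hypothetical, and only the order of the steps and the 15-to-90-minute window come from the description above.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    """Hypothetical stages, in the order the article describes them."""
    RECEIVED = 1      # caller has recorded a question in Swahili
    TRANSCRIBED = 2   # agent has written down the recording
    ANSWER_FOUND = 3  # agent has searched online, in any language
    TRANSLATED = 4    # answer has been put into Swahili
    DELIVERED = 5     # voice answer has been sent back to the caller


@dataclass
class Query:
    recording_id: str  # hypothetical identifier for the voice recording
    stage: Stage = Stage.RECEIVED
    history: list = field(default_factory=list)

    def advance(self, note: str) -> None:
        """Move the query to the next stage and record what the agent did."""
        if self.stage is Stage.DELIVERED:
            raise ValueError("query has already been delivered")
        self.stage = Stage(self.stage.value + 1)
        self.history.append((self.stage.name, note))


def within_expected_turnaround(minutes: float) -> bool:
    """Check a response time against the 15-to-90-minute window."""
    return 15 <= minutes <= 90


# Example: one query passing through all four agent steps.
query = Query("call-001")
for note in ("transcribed recording", "searched online",
             "translated answer", "sent voice reply"):
    query.advance(note)
```

Keeping a history of each step is one possible way to retain the recording and transcription pairs that such a service accumulates over time.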
During the pilot carried out in the Kenyan capital Nairobi and in Western Kenya, some 600 beta users sent questions about their local representatives, asked for help with Swahili homework, and requested medical information that would be too delicate to bring up in person.
[Image: the words included in the questions most frequently asked by users during Uliza's pilot project, translated from Swahili into English.]
Uliza will solve another problem for its future users: the lack of access to information hosted on the Internet. There are many overlapping reasons for this situation: unaffordable mobile data bundles, distance to the nearest cybercafe, and illiteracy in languages of wider communication, all compounded by a dearth of available content in local languages.
Not a tech problem
Uliza's crowdsourcing model is admittedly labor-intensive, but it has a major advantage: by having human beings handle the transcription and translation, it temporarily bypasses the lack of large voice datasets that typically constrains ASR efforts in African languages, while simultaneously collecting data from real speakers in a variety of accents and dialects. Uliza founder Grant Bridgman plans to use this database of short recordings and transcriptions to build machine learning capability and fully automate the system in the future. Bridgman introduced the concept behind the project in a talk at Tufts University.
A good amount of research has already gone into building automatic speech recognition software for Swahili and other widely spoken African languages, but it’s taking a while for the technology to find its way into people’s hands. In an interview with Global Voices, Bridgman explained:
“The technology exists and all of this is already available for first world languages, now we need to find a commercial model to make it viable for low-resource languages.”
Companies looking to set up helplines for a rural customer base at a reduced cost are potential customers for the initial phase of Uliza’s growth. Eventually, a full service will be implemented, allowing mobile phone users without access to the Internet to find answers to their questions and upload their own voice content. The cost to the user would be minimal, close to the price of an SMS.
Uliza’s model could be viable for other languages with a large enough number of speakers. For the vast majority of the 2,000 languages spoken on the African continent, however, the number of speakers is too small. Solutions might come from a research project led by Preethi Jyothi at the Beckman Institute, where a team of researchers used a probabilistic method to crowdsource transcriptions from non-native speakers. Once fine-tuned, probabilistic transcription could open up the possibility of ASR for less-represented languages, hopefully at a reasonable cost.
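To give a rough feel for what combining several imperfect transcriptions into probabilities might look like, here is a toy sketch. It is not the method used by the Beckman Institute team, whose details are not described here. It simply assumes that several people have typed what they heard, that their attempts are already aligned letter by letter and have the same length, and that a simple vote at each position is a fair estimate. The function names `merge_transcriptions` and `most_likely` are hypothetical.

```python
from collections import Counter


def merge_transcriptions(transcripts):
    """For each position, return a dict mapping symbol -> share of votes.

    Assumption (for illustration only): all transcripts are pre-aligned
    and have the same length.
    """
    if not transcripts:
        return []
    length = len(transcripts[0])
    if any(len(t) != length for t in transcripts):
        raise ValueError("this sketch assumes equal-length transcripts")
    distribution = []
    for position in range(length):
        counts = Counter(t[position] for t in transcripts)
        total = sum(counts.values())
        distribution.append({sym: n / total for sym, n in counts.items()})
    return distribution


def most_likely(distribution):
    """Pick the highest-probability symbol at each position."""
    return "".join(max(probs, key=probs.get) for probs in distribution)


# Four hypothetical attempts at the Swahili word "habari".
attempts = ["habari", "hapari", "habali", "habari"]
merged = merge_transcriptions(attempts)
best_guess = most_likely(merged)
```

The point of keeping the full distribution rather than only the winning letters is that uncertainty about individual sounds is preserved for whatever system uses the transcription next.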