- Rising Voices - https://rising.globalvoices.org -

A New Audio Uploading Tool for Crowdsourced Wiktionary Project in Odia Language

Posted 29 March 2017 1:25 GMT 1 · Written by Rezwan

Categories: Audio, Feature, Languages, Wiki

^[1]

A home recording setup for the Kathabhidhana project for Wiktionary. Image via Subhashish Panigrahi from Wikimedia Commons. CC BY-SA 4.0

Wiktionary, Wikipedia's multilingual sister project, promises a great deal. At present, there are not many open-licensed audio recordings that you can hear or download — especially if your mother tongue is not one of the major languages ^[2]. Wiktionary is already available in multiple languages and in addition to the definitions of the words, many phonetic notations — at least in terms of the International Phonetic Alphabet (IPA) — are available. Now, an Odia-language community project is helping to simplify the process of volunteer contributions to the Odia Wiktionary ^[3] project.

Kathabhidhana, a community project led by Global Voices contributor and Odia Wikipedian Subhashish Panigrahi ^[4], is an open-source solution for recording large batches of words. It then uploads the recordings under open licenses so that they can be used in projects like Wiktionary.

Odia ^[5], one of the state languages in India, is an Indo-Aryan language that is spoken mostly in eastern India by around 40 million native speakers. With over 5,000 years ^[6] of literary heritage, it has been recognized as one of the oldest South Asian languages, and has been given the status of a classical language ^[7] by the Indian government.

But because non-Unicode typing systems are still in use, the language's online presence is lagging behind. To address this, several character encoding converters ^[8], which convert text typed in various non-Unicode encoding systems to Unicode, have been incorporated into Odia Wikipedia ^[9], which now has more than 12,000 entries. The Odia Wiktionary, on the other hand, as a free, online and completely crowdsourced dictionary in the Odia language, is trying to bridge the gap.
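To give a rough idea of what such an encoding converter does, here is a minimal Python sketch. It is not the code used on Odia Wikipedia: the four-entry mapping table is invented purely for illustration (real legacy fonts define hundreds of glyph mappings), and the function name `legacy_to_unicode` is hypothetical. It shows two typical steps, assuming a legacy font that stores Odia glyphs on Latin keys: a character-by-character lookup, and moving the vowel sign "e", which is typed before the consonant in visual order, to its Unicode position after the consonant.

```python
# Hypothetical legacy-font table (illustration only, not a real font mapping).
LEGACY_TO_UNICODE = {
    "K": "\u0b15",  # ODIA LETTER KA
    "c": "\u0b2e",  # ODIA LETTER MA
    "a": "\u0b3e",  # ODIA VOWEL SIGN AA
    "e": "\u0b47",  # ODIA VOWEL SIGN E (drawn to the left of the consonant)
}

PRE_BASE_E = "\u0b47"


def is_odia_consonant(ch):
    """True for code points in the Odia consonant range KA..HA."""
    return "\u0b15" <= ch <= "\u0b39"


def legacy_to_unicode(text, table=LEGACY_TO_UNICODE):
    """Convert legacy-encoded text to Unicode using a lookup table.

    Characters missing from the table are passed through unchanged.
    """
    out = [table.get(ch, ch) for ch in text]

    # Legacy fonts store the vowel sign E before the consonant (visual order);
    # Unicode stores it after the consonant (logical order), so swap the pair.
    i = 0
    while i < len(out) - 1:
        if out[i] == PRE_BASE_E and is_odia_consonant(out[i + 1]):
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2
        else:
            i += 1
    return "".join(out)
```

With this toy table, `legacy_to_unicode("eK")` yields the consonant KA followed by the vowel sign E, which is the order Unicode expects. A production converter also has to handle conjuncts and multi-character glyph sequences, which this sketch leaves out.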

The project draws its inspiration largely from other open-source software ^[10] created by Shrinivasan T ^[11], who used the Python programming language to automate and simplify the process. He also posted a tutorial on YouTube.

Panigrahi started the Kathabhidhana project because the existing method ^[12] was cumbersome: you have to pronounce and record a word, export it in Ogg Vorbis format, and upload it through your account to Wikimedia Commons, the central repository of media files for all Wikimedia projects. Once uploaded, the recording is added to the Wiktionary project. Apart from manually recording pronunciation, there is also an open-source text-to-speech project called Dhvani ^[13] that works for most Indian languages.
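The bookkeeping part of that workflow is the kind of step a script can take over. The Python sketch below is not Kathabhidhana's actual code; it only illustrates, under stated assumptions, how a word list could be turned into a batch of recording jobs before anything is recorded or uploaded. The `RecordingJob` class, the `plan_recordings` function, the `recordings` directory, and the `Or-<word>.ogg` file naming pattern are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class RecordingJob:
    """One word to record, with its target file and upload description."""
    word: str
    path: Path
    description: str


def plan_recordings(words, lang_code="or", out_dir="recordings"):
    """Build one job per unique, non-empty word, preserving input order.

    The file naming pattern is an assumption for illustration only.
    """
    jobs = []
    seen = set()
    for raw in words:
        word = raw.strip()
        if not word or word in seen:
            continue  # skip blank lines and duplicate words
        seen.add(word)
        filename = f"{lang_code.capitalize()}-{word}.ogg"
        jobs.append(
            RecordingJob(
                word=word,
                path=Path(out_dir) / filename,
                description=f"Pronunciation of the word {word}",
            )
        )
    return jobs


if __name__ == "__main__":
    for job in plan_recordings(["ପାଣି", " ପାଣି ", "", "ଘର"]):
        print(job.path, "|", job.description)
```

Each job could then be handed to a recorder and an uploader in turn, so the contributor only has to read the words aloud instead of exporting and naming every file by hand.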

In contrast, having audio recordings of words in Wiktionary helps non-native speakers — as well as people with visual disabilities — listen to the pronunciation of different words. The word library can also be used for several Natural Language Processing ^[14] projects, like building text-to-speech ^[15] and speech-to-speech ^[16] engines.

You can download a copy ^[17] of Kathabhidhana and find all the audio recordings ^[18] made using this software.