The Indian state of Odisha publishes online dictionaries in 21 indigenous languages

Screenshot from the Odisha Virtual Academy Website.

In 2018, the government of the Indian state of Odisha published 21 dictionaries in the state's 21 provincial indigenous languages. The dictionaries were developed in collaboration with native-speaking communities for planned implementation in multilingual primary education programs. The trilingual dictionaries, with indigenous language translations into English and Odia (the official language of Odisha), were uploaded in August 2019 for public use on an online education portal managed by the government.

On October 17, all the dictionaries were relicensed by online education portal Odisha Virtual Academy under a Creative Commons Attribution 4.0 International License.

Sailesh Patnaik, a volunteer Wikipedia editor from the Creative Commons and Wikimedia-affiliated Odia Wikimedians User Group communities, played a major part by working closely with the government departments to make this happen. Patnaik was also involved in earlier efforts to move several government-run social media accounts and websites to Creative Commons licenses.

Eighth Schedule of Indian Constitution

The Eighth Schedule of the Indian Constitution provides the basis for the use of a group of 22 Indian languages in governance, education and cultural promotion. It also provides guidelines for the public services examinations to be conducted in these scheduled languages. The languages in the current list are Assamese, Bengali, Bodo, Dogri, Gujarati, Hindi, Kannada, Kashmiri, Konkani, Maithili, Malayalam, Meitei, Marathi, Nepali, Odia, Punjabi, Sanskrit, Santali, Sindhi, Tamil, Telugu and Urdu. Inclusion in this list also helps the provinces to officially recognize locally spoken languages for education and governance. There are requests to add 38 more languages to this list.

This is not the first time that the Odisha government has allowed its online resources to be shared under a free Creative Commons license. In 2017, the Odisha government made headlines as the first Indian state government to relicense eight of its social media accounts under a free license. This allowed the content to be used on openly-licensed platforms like Wikipedia and its sister projects, such as Wikimedia Commons (multimedia library), Wikisource (online free library) and Wiktionary (online free dictionary). Later, Jnanaranjan Sahu, a Wikimedian from the Odia Wikimedians User Group (OWUG), created a tool to easily migrate openly-licensed images from the relicensed social media handles to Wikimedia Commons so that the images can be used on Wikipedia.

Subhashish Panigrahi from Rising Voices interviewed Ranjana Chopra, who heads the Scheduled Caste and Scheduled Tribe Development, Minorities and Backward Classes Welfare Department of the Odisha government, to learn about this project.

To provide some background on the development of the dictionaries, Chopra explained that the need for bilingual indigenous-language dictionaries to support multilingual education was the driving force behind this work. Because 21 distinct indigenous languages are spoken in the state and grassroots-level workers required trilingual proficiency, those workers faced immense obstacles for lack of such resources. Some of the key organizations involved in the compilation were the Academy of Tribal Languages and Culture (ATLC), the Scheduled Castes and Scheduled Tribes Research and Training Institute (SCSTRTI) and the Special Development Councils (SDC), all government entities of the state. The Museum of Tribal Art and Artifact in Odisha's capital city, Bhubaneswar, remains a resource center for visitors to learn about the indigenous peoples in Odisha and their languages and cultures.

Rising Voices (RV): Many of the languages are not well documented. How did you collaborate with the speaker communities to collect and compile the words? How did existing dictionaries help in this process?

Ranjana Chopra (RC): It is a fact that the indigenous languages have not been properly documented in the [Odisha] state although some sporadic attempts have been made over the years. Indigenous languages are full of dialectical variations. However, despite the variations, there are ‘nucleus’ areas where the “core language” is referred/spoken, although with some mixtures. While preparing bilingual dictionaries and trilingual proficiency modules, resource persons from the various nucleus areas were invited to work on the texts with a well-organized non-overlapping time plan. The ‘nucleus’ area and the relevant resource persons were identified through conducting workshops in respective language localities.

RV: Some of the languages, like Bonda (Remosam), are spoken by as few as 8,000 people. Half of the Bonda community live in remote villages. How do you plan multilingual education in their native languages in general, and in particular, how will you make these books available to them?

RC: As stated earlier, the texts prepared on the indigenous languages are meant to strengthen ongoing interventions in multilingual education including such activities in the Bonda language. Language teachers engaged from the same language speaking communities are supposed to facilitate formal education through their native languages. These language texts (bi-lingual dictionaries and trilingual proficiency modules) would help them in delivering the requirements.

As the Academy of Tribal Languages and Culture was in charge of developing these dictionaries and trilingual proficiency modules, it has already dispatched copies of these resources to relevant authorities and pockets where different development activities are going on. Copies are also provided to native-language teachers and front-line workers of line departments, including the Accredited Social Health Activists (ASHA), who work to create awareness among citizens about health planning and the use of existing health services, and the Anganwadi Workers, who educate people living in rural areas about basic health topics including contraception and nutrition and also provide pre-school education.

The author also reached out to Manoj Kumar Mishra, Secretary of the Odisha government's Electronics and Information Technology Department, about their plans to make the resources available in Unicode. According to Wikipedia, “Unicode is a computing standard that is used for the consistent encoding, representation, and handling of text in most of the world's writing systems.” Before Unicode became available, many scripts, especially in India, relied on legacy encoding standards, which can make web search almost impossible. Text typed in Unicode, by contrast, can be searched and shared universally.
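As a rough illustration of why Unicode text is searchable, the Python sketch below spells the word "Odia" in Odia script in two visually identical ways and shows that a search can match both once the text is normalized. This is only a sketch under stated assumptions: the helper names `normalize` and `contains` and the sample dictionary entry are hypothetical and are not taken from the government's dictionaries or tooling, and the normalization step is a common practice for text search rather than something described by the officials.

```python
import unicodedata

# The word "Odia" in Odia script, typed two different ways.
# One uses the single letter U+0B5C; the other uses the base letter
# U+0B21 followed by the nukta sign U+0B3C. Both look the same on screen.
single_letter_form = "\u0b13\u0b5c\u0b3f\u0b06"
letter_plus_nukta_form = "\u0b13\u0b21\u0b3c\u0b3f\u0b06"


def normalize(text: str) -> str:
    """Return the canonical (NFC) form so equivalent spellings compare equal."""
    return unicodedata.normalize("NFC", text)


def contains(document: str, query: str) -> bool:
    """Search `document` for `query` after normalizing both sides."""
    return normalize(query) in normalize(document)


if __name__ == "__main__":
    # Every character has a standard code point and name on every device.
    for char in letter_plus_nukta_form:
        print(f"U+{ord(char):04X}", unicodedata.name(char))

    # Hypothetical dictionary entry, for illustration only.
    entry = single_letter_form + " : Odia"

    # Raw comparison of the two spellings.
    print(single_letter_form == letter_plus_nukta_form)
    # Normalized search across the two spellings.
    print(contains(entry, letter_plus_nukta_form))
```

Text stored in a legacy font encoding has no such shared code points, so a search engine has nothing standard to match against.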

RV: As you might have noticed already, the content of the dictionaries is readable but not searchable yet. Is there a plan to also release them in Unicode so that they are universally discoverable, irrespective of devices and operating systems?

Manoj Kumar Mishra: We are committed to bringing all the resources available on any of the [Odisha] government websites under Unicode, so that the content would be searchable on the web and will reach every single citizen of the state, residing anywhere. Currently, we are working to improve the infrastructure required to make the text and the Odia font compatible and readable on different devices. Over the coming months, we will work to make the dictionaries searchable on the Internet. We are also exploring adding these files to Odia Wikisource, where the community will convert them to Unicode, so that they automatically become searchable on the web, thereby making the rich treasure of our written heritage accessible to all.

The dictionaries are useful for linguists and ethnographers to further develop resources for languages that do not have a written form. They will also be used to create multimedia content aimed at helping younger generations make more use of the language. India is home to more than 780 languages and approximately 220-250 languages have died over the last 50 years.
