CLC 2014: Coding for Language Communities

This article is republished with permission from the Global Native Networks blog.

Almost 26 years ago, Gayatri Chakravorty Spivak interrogated the politics of representation in her iconic post-colonialist essay, “Can the Subaltern Speak?” I revisited the essay when I learned about an upcoming workshop to equip underrepresented language speakers with tools to ensure their languages are supported by digital technologies. While the “Coding for Language Communities” 2014 Summer School presents an important opportunity to create a more inclusive digital space, can it avoid re-inscribing the Western paradigms of expression and selfhood that govern the Internet? Taking a cue from Spivak, I ask, can the subaltern code? 

It’s no secret that a handful of languages dominate the online space. A paper published last October by András Kornai in the journal PLOS ONE (aptly titled “Digital Language Death”) found that fewer than five percent of the world’s languages are in use online.

Offline, around 7,776 languages are in use today. To determine how many were in use online, Kornai developed a program to crawl top-level Web domains and document the number of words in each language. The results are, according to Kornai, “evidence of a massive die-off caused by the digital divide.”

Faced with these numbers, it’s hard not to conclude that new digital tools perpetrate structural damage on indigenous peoples. While new technologies often confer immense benefits to local communities’ livelihoods, health, and cultural preservation, their interfaces are designed with assumptions about language, intellectual property, and cultural norms of sharing and communication. As such, the Internet — and those developing it — continues the epistemic violence that accelerates the destruction of non-Western ways of seeing the world.

Enter the CIDLeS Summer School 2014: Coding for Language Communities, a conference that aims to give everyone the ability to engage digital spaces in their mother tongues. According to the CIDLeS website, the conference will bring together three groups:

  • Speakers of languages that are currently not supported by language technologies and who want to use their language on electronic devices;

  • Students of linguistics and language-related disciplines interested in learning about software development;

  • Software developers and students of computational sciences who are interested in supporting under-resourced languages by technological means.

The Summer School, which will take place August 11th – 15th within the “Parque Natural das Serras de Aire e Candeeiros” near Minde, Portugal, brings together some heavy-hitters in the growing field of expanding linguistic diversity online. These experts will serve as the mentors for summer school participants.

For example, Kevin Scannell, best known for his Indigenous Tweets project, will help students “turn corpus data into spellcheckers.” Students will learn how to crawl, clean and tokenize data from the web, then generate frequency lists, add morphology, all the way to packaging up Firefox/OpenOffice extensions.
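To give a sense of what the tokenizing and frequency-list steps might involve, here is a minimal Python sketch. It is not the workshop's material or Scannell's actual tooling: the function names, the `min_count` threshold, and the Irish sample sentences are all assumptions made for illustration, and the crawling, cleaning, and morphology steps are left out. The tokenizer also assumes words are separated by spaces or punctuation and written without combining marks, which does not hold for every language.

```python
import re
from collections import Counter

# Illustrative tokenizer: runs of Unicode letters, optionally joined by an
# apostrophe or hyphen. Digits and underscores are excluded.
# Assumption: the language separates words with spaces or punctuation.
TOKEN_RE = re.compile(r"[^\W\d_]+(?:['’\-][^\W\d_]+)*")


def tokenize(text):
    """Return the lowercased word tokens found in text."""
    return [match.group(0).lower() for match in TOKEN_RE.finditer(text)]


def frequency_list(documents):
    """Count how often each token occurs across an iterable of texts."""
    counts = Counter()
    for document in documents:
        counts.update(tokenize(document))
    return counts


def build_wordlist(counts, min_count=2):
    """Keep tokens seen at least min_count times (hypothetical threshold).

    Rare tokens in web text are often typos or foreign words, so a
    cutoff is a crude way to keep them out of the word list.
    """
    return {word for word, count in counts.items() if count >= min_count}


def unknown_words(text, wordlist):
    """Return the tokens in text that are missing from the word list."""
    return [word for word in tokenize(text) if word not in wordlist]


if __name__ == "__main__":
    # Made-up sample corpus standing in for cleaned web text.
    corpus = ["Tá an lá go maith", "Tá an oíche go deas"]
    wordlist = build_wordlist(frequency_list(corpus))
    print(unknown_words("Tá an lá", wordlist))
```

A word list of this kind is only the raw ingredient. A usable spellchecker for a morphologically rich language also needs the morphology step mentioned above, since no corpus will contain every inflected form.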

Bruce Birch, a linguist working on languages of the Iwaidjan family spoken in Arnhem Land, will work on a mobile app for crowdsourced data collection and publishing. Among his ideas are apps for lexical data, phrasebooks, stories and other data from endangered languages. Participants will learn how to develop a database for the content and how to enable users to edit and share their content.

A colleague working as a Software Engineer in San Francisco once told me, “coding will set you free.” Spivak — who ultimately arrived at the conclusion that the subaltern could not speak — might disagree. She argued that even the most benevolent attempts by intellectuals to “give voice to the voiceless” necessarily took place within the context of a broader colonial project. Spivak might make similar claims about even well-intentioned interventions by techno-wizards to carve out spaces of representation for marginalized people online. Using the methods proposed by CLC 2014, can we inject Silicon Valley with counter-discourses that contest dominant digital discourses?

Many popular modes of Internet engagement (email, Facebook, Twitter, etc.) remain fundamentally textual practices; however, some indigenous and minority languages find expression principally through the spoken word (see an earlier article in the context of Inuktitut and Twitter). How can we envision a more inclusive digital space that makes room for both spoken and written languages?

All in all, CLC 2014 strikes me as a practical first step in creating a cadre of minority language speakers who are also equipped to build digital tools that accommodate those languages. And in the face of an impending “massive linguistic die-off,” it might be worth putting away Spivak for a moment to scrutinize some source code.

So, can the subaltern code? I suggest you sign up for CLC 2014 and find out.

Click here to learn more about the Coding for Language Communities Summer School or apply to participate now. For more information please contact
