Languages: Content Aggregation For Underrepresented Voices

This post is part of our coverage of the online dialogue “Using Citizen Media Tools to Promote Under-Represented Languages.”

During the online dialogue (November 16-22, 2011), some interesting issues came up, including the need to aggregate online content in order to preserve less widely spoken languages. There is no easy or reliable way to find content in a given language. We resort to search engines, but what should we search for in the sea of information? Content aggregators can help solve this problem. A content aggregator is a platform or tool that gathers Web content from different online sources for reading or reuse. It can be a blog aggregator, a Twitter aggregator or an aggregator of any other kind of content.

Solana Larsen, Managing Editor of Global Voices Online, spoke about the importance of content for underrepresented languages:

People use languages to communicate something that is useful or entertaining to them. If they don't find this online in one language, then they will just switch to another. Aside from technical barriers, that is probably the main thing that needs to be addressed for underrepresented languages to spread online: There has to be unique, relevant and entertaining content online in underrepresented languages for people to begin naturally communicating in them. Most people will simply not do it out of principle.

Rhodri ap Dyfrig replied to Solana, citing the case of the Welsh language:

Indeed, but so much of the “unique, relevant and entertaining content” that Welsh speakers tend to read is based either around the subject of Wales, Welsh politics, or Welsh language culture.

Rhodri also introduced the Welsh-language tweet aggregator Umap Cymraeg, which he describes in detail in a blog post:

Umap Cymraeg does several things:

  • it collects a database of Twitter users that have used Welsh in their tweets (or have the potential to do so). Manually initially, but then the system begins to add users by itself;
  • it filters the tweets by these users for ones that are in Welsh, then publishes them on the main page;
  • it filters this stream of tweets for popular words and hashtags and creates hourly/daily/weekly/monthly top trending topics lists
  • it filters the links in Welsh language tweets to discover what links are most popular, giving a dynamic top news/shared links chart
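The filtering and ranking steps in the list above can be sketched in a few lines of Python. This is only an illustration, not Umap's actual code: the marker-word check in `looks_welsh` is a hypothetical stand-in for the language-recognition step, and the names `aggregate`, `WELSH_MARKERS` and `SAMPLE_TWEETS`, as well as the sample tweets and example.com links, are invented for this example. The step of building the user database is left out.

```python
import re
from collections import Counter

# Hypothetical marker words standing in for real language recognition.
# A production aggregator would use a proper language-identification model.
WELSH_MARKERS = {"yn", "mae", "yr", "wedi", "ddim", "gyda", "hefyd", "heddiw", "ond"}

HASHTAG_RE = re.compile(r"#\w+")
URL_RE = re.compile(r"https?://\S+")


def looks_welsh(text, min_hits=2):
    """Return True if the text contains at least `min_hits` marker words."""
    cleaned = URL_RE.sub(" ", text)  # ignore words inside links
    words = re.findall(r"[^\W\d_]+", cleaned.lower())
    hits = sum(1 for word in words if word in WELSH_MARKERS)
    return hits >= min_hits


def aggregate(tweets, top_n=3):
    """Keep Welsh-looking tweets, then rank their hashtags and links."""
    welsh = [t for t in tweets if looks_welsh(t["text"])]
    hashtags = Counter()
    links = Counter()
    for tweet in welsh:
        hashtags.update(tag.lower() for tag in HASHTAG_RE.findall(tweet["text"]))
        links.update(URL_RE.findall(tweet["text"]))
    return {
        "tweets": welsh,
        "trending": hashtags.most_common(top_n),
        "top_links": links.most_common(top_n),
    }


# Invented sample data, for illustration only.
SAMPLE_TWEETS = [
    {"user": "a", "text": "Mae hi'n bwrw glaw yn Aberystwyth heddiw #tywydd http://example.com/glaw"},
    {"user": "b", "text": "It is raining in Cardiff today #weather http://example.com/rain"},
    {"user": "c", "text": "Mae'r gêm wedi dechrau #rygbi #tywydd http://example.com/glaw"},
]

if __name__ == "__main__":
    result = aggregate(SAMPLE_TWEETS)
    print(result["trending"])
    print(result["top_links"])
```

Running the same counting over tweets from the last hour, day, week or month would give the separate trending lists the description mentions.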

Screenshot of Umap Cymraeg aggregator top news feed

The Umap site was originally developed as a Basque-language Twitter aggregator and was subsequently implemented for Catalan and Welsh. Rhodri discusses Umap Cymraeg in an interview with Global Voices Online:

I felt that Welsh discussions and tweets can get lost in the sea of English tweets, but through language-recognition algorithms and some clever software, this was a chance to see what the true nature of the Welsh discussion was, and maybe see if any interesting patterns emerged. [..] The site now follows over 2,000 Welsh language tweeters and gives a leaderboard of those who are most active in the language.

Rhodri concludes with:

Whatever the results though, Umap Cymraeg has already shown me that the language is alive online, and in a completely natural way. Although much of the trending topics relate back to Wales, Welsh politics or the language, it is obvious that the language is used to discuss nearly everything that goes on in people’s lives, and in a variety of registers and dialects. I hope Umap goes some way to showing that Welsh is a living language, with no sign of retreating into obscurity any time soon.

Y Blogiadur is the only Welsh-language blog aggregator covering nearly all Welsh-language blogs. Rhodri has also created a tool that filters the best of Welsh-language video online.

We hope to see aggregators for more underrepresented languages in the future.
