Rising Voices note: This article is republished here as a collaboration with Indigenous Tweets. Read the original post here.
Over the last couple of weeks I've created maps showing the Twitter conversations taking place in the Irish, Basque, and Māori languages. The inspiration for this came from an email conversation with Paora Mato from the University of Waikato in Aotearoa, who has co-authored (with Te Taka Keegan) an excellent analysis of the Māori Twitter community based on data from Indigenous Tweets (forthcoming). Since people seemed to enjoy the maps I decided to do similar ones for the other Celtic languages (Welsh, Scottish Gaelic, Manx Gaelic, Cornish, and Breton) which you'll find below.
These maps were all created in more-or-less the same way. I started with the lists of people tweeting in each language from the Indigenous Tweets site – the site includes everyone tweeting in the smaller languages like Breton, Cornish, and Māori, and the top-500 most active users for Irish, Basque, Welsh, etc.
Next, a small percentage of Twitter users have geolocation activated for their tweets, which means that when they tweet from a mobile device, a latitude and longitude are recorded in Twitter's database along with the tweet. These coordinates are then accessible to developers like me through the Twitter API. For users without geolocation activated, I just collected the (self-reported) location from their Twitter profile, canonicalized the placenames, and looked up the lat/longs in a database. For these users, I assumed that all of their tweets were sent from the resulting location. This means, for example, that all tweets from people whose profile location is set to “Dublin”, “Baile Átha Cliath”, “BÁC”, or variants thereof will appear to come from one particular location near the center of the city – whatever's in the database (as it happens, it's the Dublin Spire). This isn't really a problem since I'm only interested in creating maps at the level of countries or continents.
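As a rough illustration of the fallback described above, here is a minimal Python sketch. It is not the code behind these maps (those were rendered in R). The `GAZETTEER` table, the `coords` field name, and the approximate coordinates are assumptions made up for the example.

```python
from typing import Optional, Tuple

LatLon = Tuple[float, float]

# Hypothetical gazetteer: canonical placename -> one representative point.
# Coordinates are approximate and for illustration only.
GAZETTEER = {
    "Dublin": (53.35, -6.26),
    "Galway": (53.27, -9.05),
}


def locate_tweet(tweet: dict, canonical_place: Optional[str]) -> Optional[LatLon]:
    """Return (lat, lon) for a tweet, or None if nothing usable is known.

    Prefers the coordinates attached to the tweet itself (assumed here to
    live under a "coords" key); otherwise falls back to the single point
    stored for the user's canonicalized profile location.
    """
    coords = tweet.get("coords")
    if coords is not None:
        return coords
    if canonical_place is not None:
        return GAZETTEER.get(canonical_place)
    return None
```

With this shape, every tweet from a user without geolocation resolves to the same point, which matches the behavior described for profile locations such as “Dublin”.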
Canonicalizing the placenames takes a bit of manual labor, for a few reasons. First, sometimes people will give their location in their profile as something like “American ex-pat living in Galway”, and the geolocation services I've tried usually fail on strings like this. Second, many people tweeting in indigenous or minority languages give their location in their native language, and for languages like Welsh, Cornish, Māori and so on, these names are often missing from geolocation databases. Finally, there are misspellings and other noise in people's profiles that are best handled manually.
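The automatable part of this canonicalization might look something like the following Python sketch. The `ALIASES` table and both function names are hypothetical, and the approach (strip accents, lowercase, then look for a known name as a whole phrase) is only one plausible way to do it, not necessarily what was done for these maps. Anything that doesn't match is returned as `None` so it can be handled manually.

```python
import unicodedata
from typing import Optional

# Hypothetical alias table: normalized variant -> canonical placename.
ALIASES = {
    "dublin": "Dublin",
    "baile atha cliath": "Dublin",
    "bac": "Dublin",
    "galway": "Galway",
    "gaillimh": "Galway",
    "isle of man": "Isle of Man",
    "ellan vannin": "Isle of Man",
}


def normalize(text: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in no_marks)
    return " ".join(cleaned.lower().split())


def canonicalize(profile_location: str) -> Optional[str]:
    """Map a free-text profile location to a canonical name, or None."""
    key = normalize(profile_location)
    if key in ALIASES:
        return ALIASES[key]
    # Otherwise look for a known name as a whole phrase inside the string,
    # trying longer aliases first so the most specific match wins.
    padded = f" {key} "
    for alias in sorted(ALIASES, key=len, reverse=True):
        if f" {alias} " in padded:
            return ALIASES[alias]
    return None  # unresolved: leave for manual review
```

Stripping accents is lossy and short aliases such as “BÁC” can match unrelated text, so a table like this still needs a human eye on it.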
So at this point I have good coordinates for 50-60% of the users listed on the Indigenous Tweets pages. I then gather all tweets from the database that are in the desired language and in which one user “mentions” another. Whenever I have coordinates for both the sender and the mentioned user, I simply draw an arc of a great circle on the map connecting the two points. I rendered the maps using the statistical package R, which has libraries that make this sort of thing very easy (nice tutorial here, for example).
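For readers who would rather not use R, the two steps above (keeping only mentions where both ends have coordinates, then tracing a great-circle arc between them) can be sketched in Python. This is an illustrative sketch under assumed data shapes, not the code used for these maps: `count_arcs` and `great_circle_points` are hypothetical names, and mentions are assumed to arrive as `(sender, mentioned)` pairs of user names.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Tuple

LatLon = Tuple[float, float]


def count_arcs(mentions: Iterable[Tuple[str, str]],
               coords: Dict[str, LatLon]) -> Counter:
    """Count mentions per (sender point, mentioned point) pair.

    Mentions where either user has no known coordinates are skipped.
    """
    arcs: Counter = Counter()
    for sender, mentioned in mentions:
        if sender in coords and mentioned in coords:
            arcs[(coords[sender], coords[mentioned])] += 1
    return arcs


def great_circle_points(start: LatLon, end: LatLon, n: int = 50) -> List[LatLon]:
    """Return n + 1 (lat, lon) points along the great-circle arc.

    Uses spherical interpolation on a perfect sphere. Antipodal endpoints
    are not handled, since the arc between them is not unique.
    """
    lat1, lon1 = map(math.radians, start)
    lat2, lon2 = map(math.radians, end)
    # Angular distance between the endpoints (haversine formula).
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    d = 2 * math.asin(min(1.0, math.sqrt(h)))
    if d == 0:
        return [start]  # both ends are the same point
    points = []
    for i in range(n + 1):
        f = i / n
        a = math.sin((1 - f) * d) / math.sin(d)
        b = math.sin(f * d) / math.sin(d)
        x = a * math.cos(lat1) * math.cos(lon1) + b * math.cos(lat2) * math.cos(lon2)
        y = a * math.cos(lat1) * math.sin(lon1) + b * math.cos(lat2) * math.sin(lon2)
        z = a * math.sin(lat1) + b * math.sin(lat2)
        points.append((math.degrees(math.atan2(z, math.hypot(x, y))),
                       math.degrees(math.atan2(y, x))))
    return points
```

The points from `great_circle_points` can be handed to any plotting library as a polyline, and the counts from `count_arcs` give the number of tweets along each path.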
It's very common for a large number of conversations to take place between two specific points. For example, there have been 5878 Welsh language tweets sent from Caerdydd that mention a user in Caernarfon, and 1519 Irish language tweets sent from An Cheathrú Rua that mention a user in Baile Átha Cliath. In such cases, I've scaled the brightness of the arcs so that these frequent paths show up more prominently on the maps.
I'm not a linguist or sociolinguist so it's not really my place to draw conclusions about linguistic geography, language vitality, or anything else from these maps. It's best to leave this to members of the language communities themselves, who will have the best understanding of the local situation. That said, I want to address a couple of issues people raised on Twitter after I posted the Irish, Basque and Māori maps.
The most striking thing about the Basque map is how compact it is geographically, especially when compared to the Irish map where we see many conversations between Ireland, North America, continental Europe and even Brazil. In contrast, all of the Basque conversations take place within the Basque Country, roughly speaking. And the Welsh map, which appears here for the first time, looks much more like the Basque map than the Irish one, with just a small percentage of tweets involving a user outside of Wales, most of those to and from London. Does this mean that somehow Irish is a more “international” language than the other two, or that the Irish-speaking diaspora is more engaged with the language? It might, but more careful research would be needed to establish this. My guess is that the Welsh and Basque communities look more compact in part because I'm only displaying the top-500 users in each case. Since these languages have such vibrant communities on Twitter, the bar is set extremely high to make it into the top-500 tweeters: currently, the 500th most active tweeter in Welsh has 1073 tweets in the language and the 500th in Basque has 1958, but for Irish the number is just 176. I expect that users with thousands of tweets in the language are more likely to live in the traditional homeland where the language is still used on a daily basis by the local community.
A word or two regarding the Manx map. Of the six Celtic languages, Manx has the smallest number of users on Twitter and probably the smallest number of speakers also. Several users have “Isle of Man”, “Ellan Vannin” (or variants thereof) as their location (and no more specific location on the island). Because of this, I normalized all locations on the island to a single lat/long, and therefore (disappointingly) the map doesn't show what I expect is actually an interesting network of communication taking place on the island; instead it just shows the conversation pathways between the island and three users off the island.
Finally, a word about privacy. I haven't plotted locations at a granularity finer than a city or town except in cases where users have explicitly activated geolocation for their tweets. And even in those cases, since the maps are at a pretty large scale, it's impossible to pinpoint the exact location of any particular user. That said, not everyone will be so scrupulous with your data, and if the idea of a stranger plotting your movements on a map creeps you out (I think it should), you should deactivate geolocation on your Twitter account (under Settings, go to “Security and Privacy”, and then make sure the box next to “Add location to my tweets” is unchecked). If you don't want anyone to know where you are at all, you can also remove your location from your Twitter profile (Settings → Profile → Location). And if you don't want sites like Indigenous Tweets to have access to your tweets at all, the easiest solution is to make your Tweets private (Settings → Profile, and tick the box next to “Protect my tweets”).