Translating Ushahidi the Ushahidi Way (On crowd-sourced / collaborative translation)

Posted on Mar 20, 2011

At the Innovations Lab Design Center, we have been working on an Ushahidi-based project. We have been hacking it to our own ends (not crisis-related, and only vaguely crowd-sourced) in fun ways I’ll talk about in my next post, but first I wanted to spend some time on an experiment we ran alongside the project: crowd-sourced software translation.

Working in a tri-lingual environment (Kosovo has three official languages: Albanian, Serbian, and English), language and translation are always on one’s mind. So I decided to experiment with a translation service based on the web-based software Pootle, and see what it takes to get software translated into a new language.

In this post, I will talk about our experiences with crowd-sourced/collaborative translation, as well as the technical bits required to set up a similar project yourself.

We have been working with the local free software organization (FLOSSK) ever since the beginning of the Innovations Lab, as both open software and open source culture are deeply important to the Lab. I was lucky enough to get here right as FLOSSK was holding their annual Software Freedom Kosovo (SFK) conference, and FLOSSK members have been working on projects with us and holding some informal PHP classes and mini-workshops around mapping at the Lab. The members of FLOSSK come from diverse backgrounds, but the most active members are high school students with free time but little professional experience. As they had done other free software translation tasks in the past, and were interested in translating Ushahidi when they heard about it, we started to work with them to translate the software. I set up the collaborative translation software Pootle, and volunteers who were visiting the Lab anyway for other activities (mainly a PHP class) dove into the translation.
Pootle is built for collaboration. It has two main tools that helped the volunteers motivate themselves: the percentage of total strings translated so far, and a count of the strings each individual member has translated. With those two numbers to guide them, three volunteers in particular (Shkelqim Ahmeti, Agron Demiraj, and Gent Thaci) poured themselves into the translation, and finished translating some 6000 strings in a few weeks’ time, working for a few hours each.
During the process, they referenced the completion indicators frequently. Shkelqim translated a tremendous amount at home, and Agron sometimes dropped by the Innovations Lab just to do translation work. After a few weeks of on-and-off work, 100% of the Ushahidi software (and most user-facing strings) has been translated into Albanian.
After the translation process was more or less complete, I packaged up the translation files and handed them to our software developer Arbnor Hasani to migrate into the Ushahidi site. Arbnor is an Albanian speaker (and a bit of a perfectionist), and he noticed right away that some of the translations were a bit off-kilter; the translations were not yet ready to be put onto a public site. I am absolutely proud of, and thankful for, the work done by our high school student volunteers, who were never trained in translation (let alone software translation) and were trying this for the first or second time in their lives. However, given the moderate quality of the translated strings, we now have to devise a strategy for revising these translations.
So far, the best idea we have is a process similar to a code review. We plan to go through large portions of the translation done by FLOSSK members in front of them, and provide feedback about quality: whether the strings are good or bad, what improvements they need, etc. After that, we will go through a revision process that I hope the newly “trained” FLOSSK members can participate in. I think this is a good strategy because it gives skill and knowledge back to the volunteers in return for all the hard work they have put in. But I welcome other suggestions and opinions as well.
We used the web-based translation software Pootle for the project. During my time working with One Laptop Per Child several years ago, I had worked with this really cool piece of translation software. It offered a web-based interface for translating strings, the ability to collaboratively review translations, and suggestions based on the corpus of text already translated. Even better, it offered metrics of who translated what. From my time translating software to Nepali, I found that it was fun software to use because I could see how much work I was doing and the impact I was making on a project I cared about [1].
So I googled around a little bit, and spent a couple of hours setting up Pootle for our needs.
For a quick experimental project, the installation is stupidly easy [2]: you go to the Ubuntu software center, search for “Pootle” and hit install. A simple “sudo Pootleserver start” then gets the default server that Pootle comes with up and running on sqlite3 (a file-based database). It’s a web app with fairly good usability after that, and it took me about five minutes to set up the project once I had my .po files ready to go. (See technical bits #1 and #2 at the end of the post if you want to do this yourself.)
This was a cool experiment to run at the UNICEF Innovations Lab Kosovo for several reasons. The first is that it connects the various things we are doing at the Lab: developing software-based solutions for UNICEF counterparts (in particular using Ushahidi to aggregate youth NGOs for the Ministry of Youth Culture and Sports), giving youth more ways to use their creative energy for social good, and building up the free and open source community here in Kosovo. The second is that it reveals interesting challenges in working with non-formally-trained people to build quality products. And the third is that it has given us some insights into the mechanisms of collaboration (progress-tracking has been an important tool, for example).
The experiment has been partially successful already. We have a rough-cut version of the Ushahidi translations in Albanian. I am excited to see how the “code-review” type process for translation review proceeds, and to carry forward similar translation projects with new software (as well as with the same software in other languages: Serbian!).
Feel free to email me. Two pieces require technical chops (which for me meant some Python and Unix tools like sed and gettext): (1) generating (bilingual) .po files [3] from the Ushahidi language files, which are monolingual, and (2) re-integrating the translations into the Ushahidi source.
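A minimal sketch of piece (1), not the actual script used for this project. It assumes each Ushahidi language file is a flat PHP array of 'key' => 'value' pairs inside `$lang = array(...)`; that layout, the sample strings, and the function names `parse_php_strings` and `to_po` are illustrative assumptions. It stores the PHP key in gettext's `msgctxt` field so that two keys with the same English text stay distinct and can be mapped back later.

```python
import re

# ASSUMPTION: a language file is a flat PHP array of 'key' => 'value' pairs.
# These sample strings are made up for illustration.
SAMPLE_EN = """<?php
$lang = array(
    'reports' => 'Reports',
    'submit_report' => 'Submit a Report',
);
"""
SAMPLE_SQ = """<?php
$lang = array(
    'reports' => 'Raporte',
);
"""

# One "'key' => 'value'," pair per line; the value may contain escaped quotes.
PAIR = re.compile(r"""^\s*'([^']+)'\s*=>\s*'((?:[^'\\]|\\.)*)'\s*,?\s*$""")

# Minimal .po header entry declaring the file encoding.
HEADER = 'msgid ""\nmsgstr ""\n"Content-Type: text/plain; charset=UTF-8\\n"\n'


def parse_php_strings(text):
    """Return {key: string} for every flat pair found in a language file."""
    strings = {}
    for line in text.splitlines():
        match = PAIR.match(line)
        if match:
            strings[match.group(1)] = match.group(2).replace("\\'", "'")
    return strings


def po_quote(value):
    """Quote a string the way .po files expect (double quotes, escaped)."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_po(source, target):
    """Pair each source string with its translation (empty if missing)."""
    blocks = []
    for key, english in source.items():
        blocks.append("\n".join([
            "msgctxt " + po_quote(key),
            "msgid " + po_quote(english),
            "msgstr " + po_quote(target.get(key, "")),
        ]))
    return HEADER + "\n" + "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(to_po(parse_php_strings(SAMPLE_EN), parse_php_strings(SAMPLE_SQ)))
```

Strings that have no translation yet come out with an empty msgstr, which is how a translation tool knows they still need work. Nested arrays and multi-line values are not handled here.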
[1] – Nepali kids were getting software, and of course I, like many, felt that it was absolutely essential they be able to understand the interface.


[2] – If you have .po files. For more on .po files, see [3].


[3] – Before you can work with Pootle, you will need to generate .po files. The short version of understanding .po files is this: they are a way to represent text in two different languages, the original as well as the new one. [The long version is available here.] These files also have support for a couple of things useful in translation, like the ability to mark translations “fuzzy” (or “I’m not sure if this is right, but this is a good guess”). On the left below is a screenshot of such a file, viewed using text editing software.
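To make the format concrete, here is a small sketch with three made-up entries: one translated, one marked fuzzy, and one still empty. It sorts them into those three states and derives a completion percentage of the kind a translation tool might display. The sample strings and the names `classify` and `summarize` are illustrative, and the parsing only covers simple single-line entries; it is not how Pootle itself computes its numbers.

```python
# Three illustrative entries: translated, fuzzy, and untranslated.
SAMPLE_PO = '''msgid "Reports"
msgstr "Raporte"

#, fuzzy
msgid "Submit a Report"
msgstr "Dërgo një raport"

msgid "Help"
msgstr ""
'''


def classify(block):
    """Return the state of one entry (a blank-line-separated block)."""
    lines = block.splitlines()
    # A "#," comment line carries flags such as fuzzy.
    if any(line.startswith("#,") and "fuzzy" in line for line in lines):
        return "fuzzy"
    msgstr = next(line for line in lines if line.startswith("msgstr "))
    return "untranslated" if msgstr == 'msgstr ""' else "translated"


def summarize(po_text):
    """Count entries per state."""
    counts = {"translated": 0, "fuzzy": 0, "untranslated": 0}
    for block in po_text.strip().split("\n\n"):
        counts[classify(block)] += 1
    return counts


counts = summarize(SAMPLE_PO)
# Only confirmed translations count toward completion in this sketch.
percent_done = 100 * counts["translated"] // sum(counts.values())
print(counts, percent_done)
```

Whether fuzzy entries should count toward completion is a judgment call; this sketch leaves them out.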




On the right is the format of the translation files Ushahidi uses. It is a set of PHP files that the software can execute directly to look up translated strings. It is a “mono-lingual” file, in that it only represents strings in one language. I had to convert the .php files into .po files, but that wasn’t very hard. For integration, I used polib and a quick Python command or two.
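A rough sketch of the integration direction, going from a translated .po file back to a PHP language file. To stay dependency-free it parses the .po text by hand with the standard library rather than using polib, so it is not the approach described above, only an illustration of the idea. It assumes each entry carries the original PHP key in `msgctxt` and that the target file is a flat `$lang = array(...)`; both are assumptions, as are the sample strings and the names `read_entries` and `to_php`.

```python
import re

# Illustrative input: one good entry, one fuzzy, one untranslated.
SAMPLE_PO = '''msgctxt "reports"
msgid "Reports"
msgstr "Raporte"

#, fuzzy
msgctxt "submit_report"
msgid "Submit a Report"
msgstr "Dërgo një raport"

msgctxt "help"
msgid "Help"
msgstr ""
'''

# Matches simple single-line fields only.
FIELD = re.compile(r'^(msgctxt|msgid|msgstr) "(.*)"$')


def read_entries(po_text):
    """Parse blank-line-separated entries into dictionaries."""
    entries = []
    for block in po_text.strip().split("\n\n"):
        entry = {"fuzzy": False}
        for line in block.splitlines():
            if line.startswith("#,") and "fuzzy" in line:
                entry["fuzzy"] = True
            match = FIELD.match(line)
            if match:
                # Only escaped double quotes are unescaped in this sketch.
                entry[match.group(1)] = match.group(2).replace('\\"', '"')
        entries.append(entry)
    return entries


def to_php(entries):
    """Emit a flat PHP array, skipping fuzzy and untranslated entries."""
    lines = ["<?php", "$lang = array("]
    for entry in entries:
        if entry["fuzzy"] or not entry.get("msgstr") or "msgctxt" not in entry:
            continue
        value = entry["msgstr"].replace("\\", "\\\\").replace("'", "\\'")
        lines.append("    '%s' => '%s'," % (entry["msgctxt"], value))
    lines.append(");")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_php(read_entries(SAMPLE_PO)))
```

Skipping fuzzy entries means the site falls back to whatever it does for missing strings, which seems safer than publishing a guess; a real .po parser such as polib would also handle multi-line strings and the full set of escapes.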