Content Translation has reached a new milestone, having already supported the creation of 500,000 Wikipedia articles. The Language team has been working during the last year to make the tool more solid, and has plans to expand the use of translation to help more communities grow.
Wikipedia users can learn about many topics. However, the number of topics they can access varies widely depending on the language they speak. While English-speaking users can access more than 5 million articles, Bengali speakers have access to 75 thousand articles.
Translating articles into new languages is a practice that can help content propagate more fluently across languages and reduce this language gap. To facilitate this process, we here at the Wikimedia Foundation developed a content translation tool that helps Wikipedia editors translate articles easily. Content Translation simplifies translating Wikipedia articles into different languages by automating many of the tedious steps of the manual translation process.
In early August, Content Translation reached a new milestone: more than half a million articles have been created since the tool was released four years ago, making this a good time to reflect on the impact of the tool and discuss future plans.
A more reliable tool
During the past year, the Language team worked on a new version of the tool. Based on user research and feedback, the plan was to create a more solid version of Content Translation to increase the tool's adoption and use.
For the new version we replaced the default editing surface provided by the browser with Visual Editor, which supports rich wiki content much more reliably. This required rewriting most of the translation tools, and we took the opportunity to review them and provide better guidance for newcomers.
As the new version became more complete, it was gradually exposed more prominently during the year, and it finally replaced the previous version completely without major regressions. During the year more than 149,000 translations were created, a 23% increase compared to the previous year.
We started conversations with different communities to identify the main blockers before the tool could be provided by default and exposed to more users.
Better collaboration between humans and machines
In addition to the number of articles created, we focused on the quality of the content. The new version improved the guidance provided to newcomers. In particular, a new system was created to encourage users to review and edit the initial machine translation, and approaches based on Artificial Intelligence were explored to improve some automatic steps.
Content Translation provides machine translation as initial content for editors to review and improve. The machine translation is provided as a starting point, and translators are highly encouraged to rewrite the content, in order to eliminate errors and make the translation sound more natural.
The new version incorporates new quality control mechanisms for machine translation. The tool now encourages translators to review the initial automatic translations paragraph by paragraph, adds translations published with unmodified content to a tracking category for editors to review, and prevents publishing those that exceed the defined limits. The limits that prevent publishing become stricter for users with previously deleted translations, users who ignore the warnings, and cases where several paragraphs contain unmodified content. In this way, the limits adapt to reduce potential recurrent misuse of the tool.
This system can be customized to address the particular needs of each community, and it proved useful in helping the Indonesian Wikipedia editors reduce the creation of low-quality translations.
In general, our measurements suggest that translations are less likely to be deleted than articles started from scratch. The survival rate for translations seems quite good, even when they are created by newcomers. A recent study shows that a significant percentage of the translations created with the tool survive community review. Although the survival rate is better for experienced users, it is still very good for newcomers (users who created their account during the last 6 months). For example, only 7.5% of translations created by newcomers last June had been deleted a month later.
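The adaptive limits described above can be illustrated with a minimal sketch. It is not the tool's actual implementation: the token-overlap measure, the threshold values, and all names (`TranslatorHistory`, `unmodified_ratio`, `publish_decision`) are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class TranslatorHistory:
    """Hypothetical record of a user's past behaviour with the tool."""
    deleted_translations: int = 0  # translations previously deleted by the community
    ignored_warnings: int = 0      # times the user published despite a warning


def unmodified_ratio(machine: str, edited: str) -> float:
    """Share of machine-translation tokens still present in the edited paragraph.

    A crude stand-in for whatever measure the real tool uses.
    """
    mt_tokens = machine.split()
    if not mt_tokens:
        return 0.0
    edited_tokens = set(edited.split())
    kept = sum(1 for token in mt_tokens if token in edited_tokens)
    return kept / len(mt_tokens)


def publish_decision(paragraphs, history, base_limit=0.95, warn_limit=0.85):
    """Return "block", "publish_and_track", or "publish".

    paragraphs: list of (machine_translation, edited_text) pairs.
    All thresholds are illustrative assumptions, not the tool's real values.
    """
    ratios = [unmodified_ratio(mt, edited) for mt, edited in paragraphs]
    overall = sum(ratios) / len(ratios) if ratios else 0.0
    flagged = sum(1 for ratio in ratios if ratio >= warn_limit)

    # The limit becomes stricter in the three situations the text names.
    limit = base_limit
    if history.deleted_translations > 0:
        limit -= 0.10
    if history.ignored_warnings > 0:
        limit -= 0.05
    if flagged >= 3:
        limit -= 0.05

    if overall > limit:
        return "block"
    if flagged > 0:
        return "publish_and_track"  # published, but listed for editors to review
    return "publish"


nine_of_ten = [("a b c d e f g h i j", "a b c d e f g h i z")]
print(publish_decision(nine_of_ten, TranslatorHistory()))
print(publish_decision(nine_of_ten, TranslatorHistory(deleted_translations=1)))
```

In this sketch the same lightly edited paragraph is published and tracked for a user with a clean history, but blocked for a user with a previously deleted translation, which mirrors how the limits are described as adapting to recurrent misuse. A community could tune the assumed `base_limit` and `warn_limit` values to match its own needs.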
In addition, Artificial Intelligence is becoming more present in the tool to make the initial translations better:
- We extended the machine translation support by integrating Google Translate, which uses Google neural machine translation to provide good translation quality for many language combinations. Google Translate supports 105 languages, 18 of which were not supported by the previously available translation services. All translation services are integrated in a safe way: only Wikipedia content, and no user information, is shared with external services, respecting user privacy.
- In collaboration with the Wikimedia Research team, we explored the use of machine learning approaches to better transfer templates across languages. Templates are used to structure rich content of many kinds (from references to infoboxes). However, templates were not originally created to work well across languages. The new approach helps to find corresponding parameters across languages so that the tool can transfer the information more reliably.
We believe that automation with adequate quality control mechanisms makes it easier for translators to create higher-quality translations.
Translation has already helped many communities to create new content. However, there are still communities with potential to grow by using translation that have not been using the tool as much.
Content Translation’s Boost initiative is aimed at expanding the use of translation to help more communities grow. By enabling new and more visible ways to contribute by using translation, we expect communities to attract new editors, and expand the knowledge available in their languages.
We identified potential for expanding the tool's use to more contexts that can benefit from translation:
- Translation can be used by more wikis. The adoption of Content Translation varies significantly from wiki to wiki, and there are wikis with potential to benefit from using translation more.
- Translation can be used in more ways. Currently, Content Translation focuses on creating new articles on desktop. Supporting new kinds of contribution, such as expanding existing articles with new sections or translating on mobile, would enable more opportunities to contribute.
Over the coming months we will focus on wikis with potential to grow by translation. As a representative set of those wikis we have initially selected Malayalam, Bengali, Tagalog, Javanese, and Mongolian. We’ll be contacting these communities to gauge their interest in the project and learn about their particular needs so that we can support them better. We expect these and similar communities to benefit as a result.
Our specific plans will be heavily influenced by research in the selected communities and their feedback. Please share any feedback about this initiative on the discussion page. We are interested in hearing your ideas on how to help communities grow by using translation.