Wrapping up version one: Structured Data on Commons

As the three-year grant period for building Structured Data on Commons (SDC) comes to a close with the end of 2019, I’d like to share some lists of the past two years’ worth of planning, discussion, building, testing, and releases the team has done with the Commons community.


Data Roundtripping: a new frontier for GLAM-Wiki collaborations

Dancers around the may pole, Oxford, Ohio, 1926. Photo by Frank R. Snyder; Miami University Libraries—Digital collections, no known copyright restrictions

For more than 10 years now, cultural institutions around the world have partnered with Wikimedians to make their collections more visible and to encourage re-use via Wikimedia platforms. Collaborations of this kind, GLAM-Wiki projects (with Galleries, Libraries, Archives and Museums), often use Wikipedia and Wikimedia Commons as platforms. Images of cultural collections are uploaded to Wikimedia’s media repository Wikimedia Commons and are re-used as illustrations in Wikipedia articles.

For several years now, a growing number of GLAM-Wiki partnerships have also worked with Wikidata, the free, multilingual knowledge base of the Wikimedia ecosystem. Cultural institutions and Wikimedians upload data about cultural collections to Wikidata: it provides an accessible way to publish collections data as Linked Open Data, and makes the collection data multilingual, re-usable and discoverable across the web. Since 2019, files on Wikimedia Commons can also be described with multilingual structured data from Wikidata. This will make the (structured) data component of GLAM-Wiki collaborations even more prominent in the future.


Lua support for Structured Data on Commons – pulling data into templates

As the first round of building structured data content for Wikimedia Commons comes to a close, support for the Lua programming language brings structured data front-and-center to file pages.


How can Structured Data on Commons, Wikidata, and Wikisource walk hand in hand? A pilot project with Punjabi Qisse

Punjabi Qisse; Puran Bhagat, Sassi Punnu, Raja Mor Dhuj, Kehar Singh Maut and others. CC BY-SA 4.0 by Satdeep Gill
Selection of books to be digitized, described and transcribed as a part of the SDC pilot project. (Image: Satdeep Gill, CC BY-SA 4.0)
The Wikisource logo by Nicholas Moreau, CC BY-SA 3.0 Unported

I work as a part of the Community Programs (GLAM) team at the Wikimedia Foundation. As part of my work, I support Wikisource, a digital library of public domain and freely licensed texts, which is an important platform for GLAM projects and knowledge exchange in many Wikimedia communities. I have been writing case studies about Wikisource, documenting pain points around it, and prioritizing them with the communities.


How we helped a small art museum to increase the impact of its collections, with Wikimedia projects and structured data

A blog post by Sandra Fauconnier, with contributions by Sam Donvil (PACKED) and Joris Van Donink (Jakob Smitsmuseum). This blog post describes a GLAM pilot project for Structured Data on Wikimedia Commons, executed by PACKED, and mentored by Sandra. We hope this will inform and inspire Wikimedians who want to learn about structured data, and/or (intend to) do similar GLAM-Wiki collaborations!


Introducing ISA – a cool tool for adding structured data on Commons

The ISA tool being announced at WikidataCon 2019 as the coolest multimedia tool of the year by Liam Wyatt (User:Wittylama). Photo: User:Sandra Fauconnier

ISA is a new tool that makes it very easy for anyone, including absolute beginners, to add structured data descriptions in the form of captions and so-called ‘Depicts’ statements to images on Wikimedia Commons. ISA is called a ‘micro-contributions’ tool: when you use ISA, you make many very small edits to Wikimedia Commons in a playful way. We intentionally designed ISA to be multilingual and mobile-first; it has been such a hit that it received a WikidataCon 2019 Award in the Multimedia category last October. And why this name? ‘Isa’ is the chiShona language word for ‘put’ or ‘place’, but it was also chosen because it is an acronym for Information Structured Acceleration or Information Structured Additions.


Structured Data on Commons Part Five – Other Statements

With depicts statements available to make the most basic claims about files on Commons, it was time to make more fully-formed statements. The Structured Data on Commons development team developed and released the first level of support for types of statements other than depicts.

“Other statements” offer expanded data about a file, using Wikidata properties such as creator (P170), location (P276), Commons quality assessment (P6731), license (P275), and more. For an example of depicts plus other statements, here’s a file that is an image of sugar cubes:

Dietmar Rabich / Wikimedia Commons / “Würfelzucker — 2018 — 3564” / CC BY-SA 4.0

This is the representation of the file in structured data, using depicts with qualifiers in combination with other statements:

Structured data for “Würfelzucker — 2018 — 3564”

This information is “machine-readable,” meaning that people can write software to interact with it; soon it will also be possible to query the data, and there is a host of other potential uses. Lucas Werkmeister wrote a separate blog post covering some of the possibilities of Structured Data on Commons. Importantly, all of this information is multilingual as well, whereas previously most data was restricted to English when used in templates and categories.

Taken together, depicts and other statements mean that contributors to Wikimedia Commons can now begin to fully contribute structured data. The development team continues to work on support for data types beyond words, such as geocoordinates and time stamps. Additional support for community tools such as Lua functionality is making progress as well. After this multi-year effort, the partners involved in the project can at last start the work of building a more accessible Commons.

Previously: Part Four – Depicts Statements

Structured Data on Commons Part Four – Depicts Statements

Now that the underlying software for Structured Data on Commons has been put in place, along with Captions helping to demonstrate the software worked, the development team was ready to release the first form of structured statements for Commons: depicts.

Depicts is a statement for representing the concepts or topics present or expressed in a media file. The depicts statement can be considered the most basic example for modeling information about a file.

Wikimedia Commons / CC-BY-SA 4.0 / GFDL

With support for depicts, people searching for specific media files on Commons can begin finding them in a structured, multilingual way. At the time of release, depicts statements can be searched using the keyword haswbstatement. For example, if you wanted to find all instances of depicts (P180) a house cat (Q146), in the search bar you can use: haswbstatement:P180=Q146 and it will return results in all languages.
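The same search can also be run outside the search bar. As a minimal sketch, assuming the standard MediaWiki search API on Commons (`action=query` with `list=search`), the following Python builds the request URL for a depicts search. The helper names `depicts_query` and `search_url` are made up for illustration, and nothing is sent over the network:

```python
from urllib.parse import urlencode

# Assumed endpoint: the regular MediaWiki API of Wikimedia Commons.
COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def depicts_query(item_id, property_id="P180"):
    """Return the search string for files that have the given statement."""
    return f"haswbstatement:{property_id}={item_id}"


def search_url(query, limit=10):
    """Build a search URL restricted to the File namespace (namespace 6)."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "srnamespace": 6,
        "srlimit": limit,
        "format": "json",
    }
    return f"{COMMONS_API}?{urlencode(params)}"


if __name__ == "__main__":
    # depicts (P180) a house cat (Q146)
    print(search_url(depicts_query("Q146")))
```

Fetching the printed URL with any HTTP client should return the matching file pages as JSON.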

After making sure basic depicts support was working, the development team added support for qualifiers. By using qualifiers for depicts, users are able to describe the file more precisely by refining, contextualizing, or expanding the simple statement. For example, the previous statement of depicts (P180) a house cat (Q146) can be refined to depicts (P180) a house cat (Q146) [color: gray (Q42519)], and a search for the refined statement will return only files with statements that match a gray cat. As with basic depicts, this functionality is multilingual and will find results in whatever languages are available.
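A qualified search term can be assembled as a plain string. This sketch assumes the bracket syntax for qualifiers described on the WikibaseCirrusSearch help page, and assumes that “color” corresponds to property P462 on Wikidata; the helper `qualified_statement` is hypothetical:

```python
def qualified_statement(prop, value, qualifier=None):
    """Build a haswbstatement search term, optionally with one qualifier.

    `qualifier` is an optional (property, value) pair. The bracket syntax
    used for it is an assumption based on the search help page.
    """
    term = f"haswbstatement:{prop}={value}"
    if qualifier is not None:
        q_prop, q_value = qualifier
        term += f"[{q_prop}={q_value}]"
    return term


if __name__ == "__main__":
    # depicts (P180) a house cat (Q146) with color (assumed P462) gray (Q42519)
    print(qualified_statement("P180", "Q146", ("P462", "Q42519")))
```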

Wikimedia Commons / CC-BY-SA 4.0 / GFDL

Now that Commons has the most basic modeling for data in a file in place, the development team turned to supporting other types of statements beyond depicts. These other types of statements will be covered in the next part.

Next: Structured Data on Commons Part Five – Other Statements

Previously: Part Three – Multilingual File Captions

Working with Structured Data on Commons: A Status Report

American University SOC students helping National Archive scan files and photos.
Editathon at national archive.-American University COMM535.JPG by Xiaweiyang, CC-BY-SA-3.0.

The beginnings of Structured Data on Commons have been available for a little over half a year now, so let’s take a look at how editors can already work with it, and what more is coming soon. (Disclaimer: though the author is a Wikimedia chapter employee, this post is written in a volunteer capacity only.)

What’s already available

You can, of course, edit the structured data (captions and statements) directly on the file pages. Like any other changes, these edits will show up in the page history, in recent changes, on your watchlist, etc., so other editors can see, inspect, patrol, improve or undo them as usual. This is a great way to get started with Structured Data and get a grasp on how it works.

The Upload Wizard supports structured data as well, and you can set captions on each file before uploading it (and, like with the description, categories, etc., you can copy one file’s captions into remaining files, if you want to use the same caption for a whole batch of uploads), as well as edit each file’s statements.

Another way to add Structured Data is offered by the ISA tool, which is focused on improving the metadata of pictures uploaded as part of “Wiki Loves …” campaigns. It allows participants to add captions in different languages, as well as “depicts” statements, to photos that are part of the campaign (as selected by the campaign coordinator via a category). The coordinator can optionally limit a campaign to only captions or statements if they don’t want to overwhelm their participants or they think that only one of those aspects is necessary.

The Wikipedia Android app also allows you to edit the captions of images embedded in Wikipedia articles. (The iOS app doesn’t seem to have any such feature.)

You can also search the structured data in the regular wiki search, using special search keywords. The full documentation is at mw:Help:Extension:WikibaseCirrusSearch, but the most important keywords are hascaption, incaption and haswbstatement: hascaption:en searches for files that have an English caption, incaption:"search text" searches for “search text” in a file’s captions (and not in its description, categories, etc.), and haswbstatement:P180 searches for files that have a matching statement. All of these can be combined with other search terms as usual – for example, “adoptado hascaption:es -hascaption:fr haswbstatement:P180=Q146” searches for files that depict cats and where the (non-structured) description contains the word «adoptado» (“adopted” in Spanish) which have a caption in Spanish but not in French.
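To show how the keywords compose, here is a small Python sketch that assembles such a search string from its parts. The function `build_search` and its parameters are invented for this illustration; only the keywords themselves come from the search documentation:

```python
def build_search(text="", with_caption=(), without_caption=(), statements=()):
    """Combine free text and structured-data keywords into one search string.

    with_caption / without_caption: language codes for hascaption / -hascaption.
    statements: (property, value) pairs for haswbstatement.
    """
    parts = []
    if text:
        parts.append(text)
    parts += [f"hascaption:{lang}" for lang in with_caption]
    parts += [f"-hascaption:{lang}" for lang in without_caption]
    parts += [f"haswbstatement:{prop}={value}" for prop, value in statements]
    return " ".join(parts)


if __name__ == "__main__":
    # Rebuilds the example search from the paragraph above.
    print(build_search("adoptado", ["es"], ["fr"], [("P180", "Q146")]))
```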

There is also a way to edit the statements of multiple files at once: the user script Add to Commons / Descriptive Claims (AC/DC), written by yours truly, lets you add the same collection of statements (including qualifiers) to a whole set of files. You can use this, for example, to add a suitable “depicts” statement to all the files in a category. (But make sure that all the files actually depict the category subject and are not merely related to it! This wouldn’t work at all for Category:Käthe Kollwitz, for example, because it combines media depicting her with media by her. Sometimes suitable subcategories like Category:Portraits of Käthe Kollwitz exist.)

And finally, if you’re a technical expert you can always use the MediaWiki and Wikibase APIs directly to make any edits you want – for example, User:Multichill did this during the Wikimedia Hackathon 2019 in T223746.
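As a rough idea of what such direct API edits involve, the sketch below only builds the POST parameters and sends nothing. It assumes the Wikibase API modules `wbsetlabel` (captions are stored as labels) and `wbcreateclaim`, and assumes that a file’s entity ID is the letter “M” followed by its page ID. The function names are hypothetical, and a real edit additionally needs a logged-in session and a CSRF token:

```python
import json


def mediainfo_id(page_id):
    """Entity ID of a file: assumed to be 'M' plus the file's page ID."""
    return f"M{page_id}"


def caption_params(page_id, language, caption, token):
    """POST parameters for setting a caption in one language."""
    return {
        "action": "wbsetlabel",
        "id": mediainfo_id(page_id),
        "language": language,
        "value": caption,
        "token": token,
        "format": "json",
    }


def depicts_params(page_id, item_id, token):
    """POST parameters for adding a depicts (P180) statement."""
    value = {"entity-type": "item", "numeric-id": int(item_id.lstrip("Q"))}
    return {
        "action": "wbcreateclaim",
        "entity": mediainfo_id(page_id),
        "property": "P180",
        "snaktype": "value",
        "value": json.dumps(value),
        "token": token,
        "format": "json",
    }


if __name__ == "__main__":
    # 12345 and "TOKEN" are placeholders, not real values.
    print(caption_params(12345, "en", "A gray house cat", "TOKEN"))
    print(depicts_params(12345, "Q146", "TOKEN"))
```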

What’s coming soon

A full-featured SPARQL query service for Structured Data on Commons is in the works (T141602); this basically blows the haswbstatement search keyword mentioned earlier out of the water, letting you search not just for simple “has statement” matches but providing a powerful way to query the whole data graph. For example, this will make it possible to search for files that were taken anywhere within a certain city (without having to mention that city on each file – connections from districts etc. to the surrounding city are already on Wikidata), or files depicting animals within a certain family or order. It will also allow users to query the qualifiers of statements, which is not possible in the regular search either. Regular search will remain the best way to search within the file captions (or traditional descriptions), but fortunately the two can be combined using MWAPI.
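Since the query service is not available yet, the following is only a guess at what such a query might look like, modelled on the Wikidata query service. The prefixes, the federation to Wikidata via `SERVICE`, and the helper name `files_depicting_taxon` are all assumptions; parent taxon is P171 on Wikidata, and Q25265 is assumed here to be the cat family (Felidae):

```python
def files_depicting_taxon(taxon_id, limit=20):
    """Return a SPARQL query string for files depicting anything in a taxon.

    The query shape is speculative: it assumes file statements are exposed
    with the same wdt: predicates as on Wikidata, and that the taxon tree
    is looked up on Wikidata through a federated SERVICE call.
    """
    return f"""PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT ?file ?item WHERE {{
  ?file wdt:P180 ?item .
  SERVICE <https://query.wikidata.org/sparql> {{
    ?item wdt:P171* wd:{taxon_id} .
  }}
}}
LIMIT {limit}"""


if __name__ == "__main__":
    print(files_depicting_taxon("Q25265", limit=5))
```

The `wdt:P171*` path follows parent-taxon links any number of steps, which is the kind of graph traversal the plain haswbstatement keyword cannot express.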

Lua support is also underway; this will make it possible to embed the structured data in the wikitext, usually via templates. For example, {{Location}} could be updated to get the coordinates from the structured data (specifically the property coordinates of the point of view) if they are not specified as a template argument, similar to how on many Wikipedias, {{official website}} gets the official website from Wikidata if it’s not specified as a template argument. Other templates could also automatically categorize images based on their structured data, similar to how {{Wikidata infobox}} already adds some parent categories to category pages based on the information in Wikidata. This will be up for discussion and implementation by the community, of course.
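The fallback described for {{Location}} can be sketched as follows. This is written in Python rather than Lua purely for illustration, and the data shapes are invented: `template_args` stands in for the template’s arguments (with hypothetical `lat` and `lon` keys), and `statements` for the file’s structured data, keyed by property ID (P1259 being coordinates of the point of view):

```python
def resolve_coordinates(template_args, statements):
    """Prefer coordinates given as template arguments, else structured data.

    Returns a (latitude, longitude) tuple, or None if neither source has one.
    """
    lat, lon = template_args.get("lat"), template_args.get("lon")
    if lat is not None and lon is not None:
        return (float(lat), float(lon))
    # Fall back to the first "coordinates of the point of view" value, if any.
    values = statements.get("P1259", [])
    return values[0] if values else None


if __name__ == "__main__":
    structured = {"P1259": [(39.51, -84.74)]}
    print(resolve_coordinates({"lat": "52.5", "lon": "13.4"}, structured))
    print(resolve_coordinates({}, structured))
```

The template argument wins whenever it is present, so existing file pages would keep displaying what they display today.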

We can also expect to see support for Structured Data on Commons in more tools. QuickStatements, the Swiss Army knife for editing Wikidata, will hopefully gain support for editing captions and statements on Commons soon (T181062 – in fact there is some very rudimentary support already, but it’s so fragile that I don’t want to give any guidance on it). This will allow for more fine-grained editing than the AC/DC user script mentioned above, though I hope that AC/DC will remain useful as a more user-friendly tool for a common use-case. Support in the Pywikibot library (T223820) and the Pattypan upload tool (T181057) is also planned. And tools should learn to work better together: PagePile support in VisualFileChange or Cat-a-lot and AC/DC would allow you to select a set of files using the former tools and then add statements to all of them using the latter, by exchanging the selection of files via the PagePile tool.

Structured Data on Commons Part Three – Multilingual File Captions

Wikimedia Commons

Wikimedia Commons holds over fifty million freely-licensed media files. These millions of images, sounds, video, documents, three-dimensional files and more contain a vast amount of information related to the contents of the file and the context for the world around them. As Commons has collected files over the years, the volunteers who curate and maintain the site have developed a system to contain and present this information to the world, using MediaWiki, wikitext, and templates.

A description template is the first and primary way information about a file is shown to users. These templates can be a powerful tool for displaying information about files; descriptions provide meaningful context and information about the work presented. Descriptions can be as long as the user would like, providing wikitext markup and links for others to find out more. Description templates can also hold translations by adding language fields. However, the Structured Data team saw some areas where a feature like captions could improve upon description templates.

A description template that contains a lot of well-organized information, but might not be serving the purpose that a caption can. Template credit Wikimedia Commons, text may be CC-BY-SA 3.0 where applicable.
A caption for the same file based on facts in the description template.

Multilingual captions help share the burden of descriptions by providing a space to describe a file in a way that is standard across all files, easy to translate, and easy to use. Captions do not support wikitext, so no knowledge of how links work is needed in this space; links can still be provided in the more expansive file description. Captions are added during the upload process using the UploadWizard, or they can be added directly on any file page on Commons. The translation feature for captions is a simple interface that requires only a few steps to create and share a caption translation.

Adding other languages to a caption.

The “multilingual” in “multilingual captions” highlights a primary focus of Structured Data features: opening up access to Commons to as many languages as possible beyond its present capabilities. This is enormously beneficial to the Wikimedia movement and the Wikimedia Foundation’s mission of sharing knowledge with the world. In addition to captions, planned future features will support adding “statements” from Wikidata to files, effectively describing them in an organized way that can be accessed by programs and bots to present media. These statements can be multilingual as Wikidata supports translations, which will make statements searchable in any language that has a translation provided.

Next: Part Four – Depicts Statements

Previously: Part Two – Federated Wikibase and Multi-Content Revisions