Working with Structured Data on Commons: A Status Report

Image: American University SOC students helping National Archive scan files and photos. (File:Editathon_at_national_archive.-American_University_COMM535.JPG by Xiaweiyang, CC-BY-SA-3.0.)

The beginnings of Structured Data on Commons have been available for a little over half a year now, so let’s take a look at how editors can already work with it, and what more is coming soon. (Disclaimer: though the author is a Wikimedia chapter employee, this post is written in a volunteer capacity only.)

What’s already available

You can, of course, edit the structured data (captions and statements) directly on the file pages. Like any other changes, these edits will show up in the page history, in recent changes, on your watchlist, etc., so other editors can see, inspect, patrol, improve or undo them as usual. This is a great way to get started with Structured Data and get a grasp on how it works.

The Upload Wizard supports structured data as well, and you can set captions on each file before uploading it (and, like with the description, categories, etc., you can copy one file’s captions into remaining files, if you want to use the same caption for a whole batch of uploads), as well as edit each file’s statements.

Another way to add Structured Data is offered by the ISA tool, which is focused on improving the metadata of pictures uploaded as part of “Wiki Loves …” campaigns. It allows participants to add captions in different languages, as well as “depicts” statements, to photos that are part of the campaign (as selected by the campaign coordinator via a category). The coordinator can optionally limit a campaign to only captions or statements if they don’t want to overwhelm their participants or they think that only one of those aspects is necessary.

The Wikipedia Android app also allows you to edit the captions of images embedded in Wikipedia articles. (The iOS app doesn’t seem to have any such feature.)

You can also search the structured data in the regular wiki search, using special search keywords. The full documentation is at mw:Help:Extension:WikibaseCirrusSearch, but the most important keywords are hascaption, incaption and haswbstatement: hascaption:en searches for files that have an English caption, incaption:"search text" searches for “search text” in a file’s captions (and not in its description, categories, etc.), and haswbstatement:P180 searches for files that have any “depicts” statement, while haswbstatement:P180=Q146 only matches files whose “depicts” statement has that specific value. All of these can be combined with other search terms as usual – for example, “adoptado hascaption:es -hascaption:fr haswbstatement:P180=Q146” searches for files that depict cats, have a caption in Spanish but not in French, and whose (non-structured) description contains the word «adoptado» (“adopted” in Spanish).
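To make the keyword syntax concrete, here is a minimal Python sketch that assembles the example query above and turns it into a request URL for the standard MediaWiki search API (action=query, list=search). The helper names build_search_query and build_search_url are made up for this illustration, and nothing is actually sent over the network.

```python
from urllib.parse import urlencode

COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def build_search_query(text="", has_caption=(), lacks_caption=(), statements=()):
    """Combine free text with the structured-data search keywords."""
    parts = [text] if text else []
    parts += [f"hascaption:{lang}" for lang in has_caption]
    parts += [f"-hascaption:{lang}" for lang in lacks_caption]
    parts += [f"haswbstatement:{s}" for s in statements]
    return " ".join(parts)


def build_search_url(query, limit=10):
    """Build a search URL restricted to the File namespace (namespace 6)."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "srnamespace": 6,
        "srlimit": limit,
        "format": "json",
    }
    return f"{COMMONS_API}?{urlencode(params)}"


query = build_search_query(
    "adoptado",
    has_caption=["es"],
    lacks_caption=["fr"],
    statements=["P180=Q146"],
)
url = build_search_url(query)
print(query)
print(url)
```

Fetching the resulting URL with any HTTP client should return the same hits as typing the query into the search box on Commons.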

There is also a way to edit the statements of multiple files at once: the user script Add to Commons / Descriptive Claims (AC/DC), written by yours truly, lets you add the same collection of statements (including qualifiers) to a whole set of files. You can use this, for example, to add a suitable “depicts” statement to all the files in a category. (But make sure that all the files actually depict the category subject and are not merely related to it! This wouldn’t work at all for Category:Käthe Kollwitz, for example, because it combines media depicting her with media by her. Sometimes suitable subcategories like Category:Portraits of Käthe Kollwitz exist.)

And finally, if you’re a technical expert you can always use the MediaWiki and Wikibase APIs directly to make any edits you want – for example, User:Multichill did this during the Wikimedia Hackathon 2019 in T223746.
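As a rough idea of what “using the APIs directly” involves, the sketch below builds the POST parameters for two edits: setting a caption (captions are stored as labels of the file’s MediaInfo entity, so the wbsetlabel module applies) and adding a “depicts” statement with wbcreateclaim. The MediaInfo entity ID is “M” followed by the file page’s page ID. The function names, the page ID 12345 and the token value are placeholders invented for this example; a real script would first log in, fetch a CSRF token, and then send these parameters as a POST request to the Commons api.php.

```python
import json


def mediainfo_id(page_id):
    """MediaInfo entity IDs are 'M' followed by the file page's page ID."""
    return f"M{page_id}"


def caption_params(page_id, language, caption, csrf_token):
    """POST parameters for setting a caption via the wbsetlabel module."""
    return {
        "action": "wbsetlabel",
        "id": mediainfo_id(page_id),
        "language": language,
        "value": caption,
        "token": csrf_token,
        "format": "json",
    }


def depicts_params(page_id, item_id, csrf_token):
    """POST parameters for adding a depicts (P180) statement via wbcreateclaim."""
    # Item values are passed as a JSON-encoded object with the numeric part of the ID.
    value = {"entity-type": "item", "numeric-id": int(item_id.lstrip("Q"))}
    return {
        "action": "wbcreateclaim",
        "entity": mediainfo_id(page_id),
        "property": "P180",
        "snaktype": "value",
        "value": json.dumps(value),
        "token": csrf_token,
        "format": "json",
    }


# Placeholder page ID and token, for illustration only.
caption = caption_params(12345, "en", "A cat sleeping on a sofa", "PLACEHOLDER_TOKEN")
depicts = depicts_params(12345, "Q146", "PLACEHOLDER_TOKEN")
print(caption)
print(depicts)
```

Keeping the parameter construction separate from the HTTP call makes it easy to inspect or log an edit before it is actually made, which is worth doing for any batch job.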

What’s coming soon

A full-featured SPARQL query service for Structured Data on Commons is in the works (T141602); this basically blows the haswbstatement search keyword mentioned earlier out of the water, letting you search not just for simple “has statement” matches but providing a powerful way to query the whole data graph. For example, this will make it possible to search for files that were taken anywhere within a certain city (without having to mention that city on each file – connections from districts etc. to the surrounding city are already on Wikidata), or files depicting animals within a certain family or order. It will also allow users to query the qualifiers of statements, which is not possible in the regular search either. Regular search will remain the best way to search within the file captions (or traditional descriptions), but fortunately the two can be combined using MWAPI.
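Since the query service is not available yet, the following is only a guess at what such a query might look like, wrapped in a small Python helper. It assumes that files will be linked to items through the usual wdt: “direct” predicates, and that the Wikidata taxonomy (parent taxon, P171) will be reachable from the same query; in practice the latter might well require federation with the Wikidata query service. The function name files_depicting_taxon is invented here, and Q25265 is assumed to be the item for the cat family, so check it before relying on it.

```python
PREFIXES = """\
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
"""


def files_depicting_taxon(taxon_id, limit=100):
    """Return a SPARQL query for files depicting anything within a taxon.

    The property path wdt:P171* follows "parent taxon" zero or more times,
    so the query matches the taxon itself and everything below it.
    """
    return (
        f"{PREFIXES}"
        "SELECT ?file ?item WHERE {\n"
        "  ?file wdt:P180 ?item .\n"
        f"  ?item wdt:P171* wd:{taxon_id} .\n"
        "}\n"
        f"LIMIT {limit}"
    )


# Q25265 is assumed to be the cat family; verify the ID on Wikidata.
sparql = files_depicting_taxon("Q25265", limit=5)
print(sparql)
```

The point of the sketch is the property path: a plain haswbstatement:P180=Q146 search only finds files tagged with that exact item, whereas the path also picks up files tagged with any other member of the family.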

Lua support is also underway; this will make it possible to embed the structured data in the wikitext, usually via templates. For example, {{Location}} could be updated to get the coordinates from the structured data (specifically the property coordinates of the point of view) if they are not specified as a template argument, similar to how on many Wikipedias, {{official website}} gets the official website from Wikidata if it’s not specified as a template argument. Other templates could also automatically categorize images based on their structured data, similar to how {{Wikidata infobox}} already adds some parent categories to category pages based on the information in Wikidata. This will be up for discussion and implementation by the community, of course.
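The fallback logic described for {{Location}} can be sketched independently of the eventual Lua interface. The real implementation would be a Lua module, and its API is not known yet; the Python below only illustrates the “template argument wins, otherwise use the structured data” rule. It assumes the MediaInfo JSON layout, where statements are grouped by property ID under a "statements" key, and uses P1259 (coordinates of the point of view). The function names and the sample entity are made up for this example.

```python
def coordinates_from_statements(mediainfo, property_id="P1259"):
    """Return (latitude, longitude) from the first statement with a value, or None."""
    for statement in mediainfo.get("statements", {}).get(property_id, []):
        snak = statement.get("mainsnak", {})
        # Skip "somevalue" / "novalue" snaks, which carry no coordinates.
        if snak.get("snaktype") == "value":
            value = snak["datavalue"]["value"]
            return value["latitude"], value["longitude"]
    return None


def resolve_location(template_arg, mediainfo):
    """An explicit template argument takes precedence over structured data."""
    if template_arg is not None:
        return template_arg
    return coordinates_from_statements(mediainfo)


# Made-up sample entity, trimmed to the fields used above.
sample = {
    "statements": {
        "P1259": [
            {
                "mainsnak": {
                    "snaktype": "value",
                    "property": "P1259",
                    "datavalue": {
                        "type": "globecoordinate",
                        "value": {"latitude": 52.5, "longitude": 13.4},
                    },
                }
            }
        ]
    }
}

print(resolve_location(None, sample))
print(resolve_location((48.1, 11.6), sample))
```

Letting the template argument override the structured data keeps existing file pages rendering exactly as before, so such a change could be rolled out without touching any wikitext.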

We can also expect to see support for Structured Data on Commons in more tools. QuickStatements, the Swiss Army knife for editing Wikidata, will hopefully gain support for editing captions and statements on Commons soon (T181062 – in fact there is some very rudimentary support already, but it’s so fragile that I don’t want to give any guidance on it). This will allow for more fine-grained editing than the AC/DC user script mentioned above, though I hope that AC/DC will remain useful as a more user-friendly tool for a common use-case. Support in the Pywikibot library (T223820) and the Pattypan upload tool (T181057) is also planned. And tools should learn to work better together: PagePile support in VisualFileChange or Cat-a-lot and in AC/DC would allow you to select a set of files using the former tools and then add statements to all of them using the latter, by exchanging the selection of files via the PagePile tool.


Comments

  1. I agree with Ainali: PagePile is way too complicated when you only want to add a statement to 4-5 files from a category. Ideally, you should be able to select files like in Cat-a-lot.

  2. As for technical support, the wikibase Rust crate already contains (rudimentary) support for MediaInfo entities.

    If we want to use PagePile more, I might have to beef it up a bit…
