The State of Wikimedia Research, 2022–2023

Wikimania, the annual global conference of the Wikimedia movement, took place in Singapore last month. For the first time since 2019, the conference was held in person again. It was attended by over 670 people in person and more than 1,500 remotely.

At the conference, Benjamin Mako Hill, Tilman Bayer, and Miriam Redi presented “The State of Wikimedia Research: 2022–2023”, an overview of scholarship and academic research on Wikipedia and other Wikimedia projects from the last year. This resumed an annual Wikimania tradition that Mako started in 2008 as a graduate student, aiming to provide “a quick tour … of the last year’s academic landscape around Wikimedia and its projects geared at non-academic editors and readers.” With hundreds of research publications every year featuring Wikipedia in their title (and more recently, Wikidata too), it is of course impossible to cover all important research results within one hour. Hence our presentation aimed to identify a set of important themes that attracted researchers’ attention during the past year, and to illustrate each theme with a brief “research postcard” summary of one particular publication. Unfortunately, Miriam was not able to be in Singapore to present.

This year’s presentation focused on seven such research themes:

Theme 1. Generative AI and large language models
The boom in generative AI and LLMs triggered by the release of ChatGPT has affected Wikimedia research deeply. As an example, we highlighted a preprint that used Wikipedia to enhance the factual accuracy of a conversational LLM-based chatbot.

Theme 2. Wikidata as a community
While Wikidata is the subject of over 100 published studies each year, the vast majority of these have been primarily concerned with the project’s content as a database which scientists use to advance research about e.g. the semantic web, knowledge graphs and ontology management. This year also saw several papers studying Wikidata as a community, including a study of how Wikidata contributors use talk pages to coordinate (preprint).

Theme 3. Cross-project collaboration
Beyond Wikipedia and Wikidata, Wikimedia sister projects have attracted comparatively little researcher attention over the years. We highlighted one of the very first research publications in the social sciences that studied Wikimedia Commons, the free media repository, examining how it interconnects with English Wikipedia.

Theme 4. Rules and governance
Research on rules and governance continues to attract researchers’ attention. Here, we featured a new paper by a political scientist that documented important changes in how English Wikipedia’s NPoV (Neutral Point of View) policy has been applied over time, and used this to advance an explanation for political change in general.

Theme 5. Wikipedia as a tool to measure bias
While Wikimedia research has often focused on Wikipedia’s own biases, researchers have also turned to Wikipedia to construct baselines against which to measure and mitigate biases elsewhere. We highlighted an example of Meta’s AI researchers doing this for their Llama 2 large language model.

Theme 6. Measuring Wikipedia’s own content bias
Despite the huge interest in content gaps along dimensions such as race and gender, systematic approaches to measuring them have not been as frequent as one might hope. We featured a paper that advances our understanding in this regard, presents a useful method, and is also one of the first to study differences in intersectional identities.

Theme 7. Critical and humanistic approaches
Although most of the published research work related to Wikipedia is based in the sciences or engineering disciplines, a growing body of humanities scholarship can offer important insights as well. We highlighted a recent humanities paper about the measuring of race and ethnicity gaps on Wikipedia, which focused in particular on gaps in such measurements themselves, placing them into a broader social context.

We invite you to watch the video recording on YouTube or our self-hosted media server, or to peruse the annotated slides from the talk.

Again, this work represents just a tiny fraction of what has been published about Wikipedia in the last year. In particular, we avoided research that was presented elsewhere in Wikimania’s research track.

To keep up to date with the Wikimedia research field throughout the year, consider subscribing to the monthly Wikimedia Research Newsletter and its associated Twitter and Mastodon feeds, which are maintained by Miriam and Tilman.


This post was written by Benjamin Mako Hill and Tilman Bayer.

