FOSSY Wrap-Up: CDSC presents Interactive Session — Let’s Get Real: Putting Research Findings into Practice

Welcome to part 7 of a 7-part series spotlighting presentations from the Science of Community track at FOSSY 23!

In this interactive session, Dr. Benjamin Mako Hill, Dr. Aaron Shaw, and Kaylea Champion hosted a series of conversations with FOSS community members about finding research, putting it to use, and building partnerships between researchers and communities!

This talk was (intentionally!) not recorded, but we’ve synthesized the resources we shared into this wiki page.

FOSSY Wrap-Up: Mariam Guizani on Rules of Engagement: Why and How Companies Participate in OSS

Welcome to part 6 of a 7-part series spotlighting the excellent talks we were fortunate enough to host during the Science of Community track at FOSSY 23!

In this talk, Dr. Guizani shared her work to understand the motivation for companies to participate in open source software development, encompassing the perspective of both small and large firms.

You can watch the talk HERE and learn more about Dr. Guizani HERE.

FOSSY Wrap-Up: Shoji Kajita on Research Data Management Skills Development Leveraged by an Open Source Portfolio

Welcome to part 5 of our 7-part series reviewing all the great talks we were fortunate enough to host during the Science of Community track at this year’s FOSSY.

In this talk, Dr. Kajita introduced us to the work being done as part of Apereo (formerly JA-SIG/Sakai) to create FOSS platforms that serve as academic and administrative infrastructure in higher education. Research data management is a skill that emerging scholars must learn to do modern quantitative research, and this skill can be scaffolded and tracked via the Karuta portfolio tool.

Watch the talk HERE, learn more about Karuta HERE, and learn more about Dr. Kajita HERE.

FOSSY Wrap-Up: Kaylea Champion’s Lightning Talk on Undermaintained Packages

Welcome to part 4 of a 7-part series spotlighting the excellent talks we were fortunate enough to host during the Science of Community track at FOSSY 23!

Kaylea presented on her new research project to identify how packages come to be undermaintained, in particular investigating assumptions that it’s all about “the old stuff” — old packages, old languages. It turns out that’s only part of the story — older packages and software written in older languages do tend to be undermaintained, but old packages in old languages — the tried and true, as it were — do relatively well!

Watch the talk HERE and learn more about Kaylea’s work HERE.

The State of Wikimedia Research, 2022–2023

Wikimania, the annual global conference of the Wikimedia movement, took place in Singapore last month. For the first time since 2019, the conference was held in person again. It was attended by over 670 people in person and more than 1,500 remotely.

At the conference, Benjamin Mako Hill, Tilman Bayer, and Miriam Redi presented “The State of Wikimedia Research: 2022–2023”, an overview of scholarship and academic research on Wikipedia and other Wikimedia projects from the last year. This resumed an annual Wikimania tradition that Mako started back in 2008 as a graduate student, aiming to provide “a quick tour … of the last year’s academic landscape around Wikimedia and its projects geared at non-academic editors and readers.” With hundreds of research publications every year featuring Wikipedia in their title (and more recently, Wikidata too), it is of course impossible to cover all important research results within one hour. Hence our presentation aimed to identify a set of important themes that attracted researchers’ attention during the past year, and to illustrate each theme with a brief “research postcard” summary of one particular publication. Unfortunately, Miriam was not able to be in Singapore to present.

This year’s presentation focused on seven such research themes:

Theme 1. Generative AI and large language models
The boom in generative AI and LLMs triggered by the release of ChatGPT has affected Wikimedia research deeply. As an example, we highlighted a preprint that used Wikipedia to enhance the factual accuracy of a conversational LLM-based chatbot.

Theme 2. Wikidata as a community
While Wikidata is the subject of over 100 published studies each year, the vast majority of these have been primarily concerned with the project’s content as a database that scientists use to advance research about, e.g., the semantic web, knowledge graphs, and ontology management. This year also saw several papers studying Wikidata as a community, including a study of how Wikidata contributors use talk pages to coordinate (preprint).

Theme 3. Cross-project collaboration
Beyond Wikipedia and Wikidata, Wikimedia sister projects have attracted comparatively little researcher attention over the years. We highlighted one of the very first research publications in the social sciences to study Wikimedia Commons, the free media repository, examining how it interconnects with English Wikipedia.

Theme 4. Rules and governance
Research on rules and governance continues to attract researchers’ attention. Here, we featured a new paper by a political scientist that documented important changes in how English Wikipedia’s NPoV (Neutral Point of View) policy has been applied over time, and used this to advance an explanation for political change in general.

Theme 5. Wikipedia as a tool to measure bias
While Wikimedia research has often focused on Wikipedia’s own biases, researchers have also turned to Wikipedia to construct baselines against which to measure and mitigate biases elsewhere. We highlighted an example of Meta’s AI researchers doing this for their Llama 2 large language model.

Theme 6. Measuring Wikipedia’s own content bias
Despite the huge interest in content gaps along dimensions such as race and gender, systematic approaches to measuring them have not been as frequent as one might hope. We featured a paper that advanced our understanding in this regard, presented a useful method, and is also one of the first to study differences in intersectional identities.

Theme 7. Critical and humanistic approaches
Although most of the published research work related to Wikipedia is based in the sciences or engineering disciplines, a growing body of humanities scholarship can offer important insights as well. We highlighted a recent humanities paper about the measuring of race and ethnicity gaps on Wikipedia, which focused in particular on gaps in such measurements themselves, placing them into a broader social context.

We invite you to watch the video recording on YouTube or our self-hosted media server, or to peruse the annotated slides from the talk.

Again, this work represents just a tiny fraction of what has been published about Wikipedia in the last year. In particular, we avoided research that was presented elsewhere in Wikimania’s research track.

To keep up to date with the Wikimedia research field throughout the year, consider subscribing to the monthly Wikimedia Research Newsletter and its associated Twitter and Mastodon feeds which are maintained by Miriam and Tilman.


This post was written by Benjamin Mako Hill and Tilman Bayer.

FOSSY Wrap-Up: Anita Sarma’s Lightning Talk on Inclusion Bugs

Welcome to part 3 of a 7-part series spotlighting the excellent talks we were fortunate enough to host during the Science of Community track at FOSSY 23!

Dr. Anita Sarma gave us an excellent introduction to her and her team’s work on understanding how to make FOSS more inclusive by identifying errors in user interaction design.

Matt Gaughan delivered a rapid introduction to his dataset highlighting the numerous places where the Linux Kernel is using unsafe memory practices.

You can watch the talk HERE and learn more about Dr. Sarma HERE.

FOSSY Wrap-up – Sophia Vargas on Proactive Metrics to Combat Maintainer Burnout

Welcome to part 1 of a 7-part series spotlighting the excellent talks we were fortunate enough to host during the Science of Community track at FOSSY 23!

Sophia Vargas presented ‘Can we combat maintainer burnout with proactive metrics?’ In this talk, Sophia takes us through her extensive investigations across multiple projects to weigh the value of different metrics to anticipate when people might be burning out, including some surprising instances where metrics we might think are helpful really don’t tell us what we think they do.

You can watch the talk HERE and learn more about Sophia’s work HERE.

FOSSY Fun, Finished

The CDSC hosted the Science of Community Track on July 15th at FOSSY this year — it was an awesome day of learning and conversation with a fantastic group of senior scholars, industry partners, students, practitioners, community members, and more! We are so grateful and eager to build on the discussions we began.

If you missed the sessions, watch this space! Most sessions were recorded, and we’ll post links and materials as they’re released.

Special thanks to Molly de Blanc for all the long-distance organizing work; to Shauna Gordon-McKeon for stepping in at the very last minute to help share some closing thoughts on the Science of Community track; and to the FOSSY organizing team for convening such a warm, welcoming inaugural event (indeed, the warmth was palpable as it nearly hit 100° F on Friday and Saturday in Portland).

One tangible result of a free software conference: new laptop stickers!

Meet us at FOSSY!

The Free and Open Source Software Yearly conference (FOSSY) is in less than a week and we will be there!

We will be running the Science of Community track on Saturday July 15.

Two photos. In one is Kaylea Champion, who has purple hair and a blue shirt. In the other are Sejal Khatri, Benjamin Mako Hill, and Aaron Shaw.
Kaylea Champion, and Benjamin Mako Hill and Aaron Shaw with Sejal Khatri (who won’t be at FOSSY)

The Science of Community track is inspired by the CDSC Science of Community Dialogues, which aim to bring together practitioners and researchers to discuss scholarly work that is relevant to the efforts of practitioners. As researchers, we get so much from the communities we work with and study, and we want them to also learn from the research they so generously take part in. While the Dialogues cover a broad range of topics and communities, FOSSY presentations focus on how that work relates to free and open source software communities, projects, and practitioners.

At FOSSY, we will have a number of really amazing researchers presenting their work. We wanted to share some highlights from the schedule.

Sophia Vargas, from Google’s Open Source Programs Office, will be presenting on how metrics can help us understand contributor burnout. Professor Shoji Kajita, from Kyoto University, will discuss research data management for FOSS communities. Mariam Guizani, from Oregon State University, will cover research on the why and how of corporate participation in FOSS. We will additionally have lightning talks by Adam Hyde, Anita Sarma, Shauna Gordon-McKeon, and incoming Northwestern Ph.D. student Matthew Gaughan.

We are really excited about our workshop “Let’s Get Real: Putting Research Findings Into Practice.” This workshop, designed for FOSS contributors and practitioners, will help guide you on how to get the most out of the incredible research on and relevant to FOSS. If you want to learn how to navigate the sheer volume of interesting research work happening or how to understand what it means, this is the session for you! Our workshop will be led by Kaylea Champion and Professors Aaron Shaw and Benjamin Mako Hill. You can read more on our wiki.

Due to scheduling issues, Eriol Fox will be presenting their talk, “Community lead user research and usability in Science and Research OSS: What we learned,” in the Wildcard Track. We recommend going!

We hope to see you at FOSSY. Even if you can’t make it to our sessions, we’ll be at the conference so stop by and say hello!