FOSSY 2024: Submission Deadline Extended!

Worried you didn’t submit your FOSSY proposal on time? Well fear not, the deadline has been extended to Tuesday, June 18th. Submit your proposal today!

Does your work touch open source, communities, technology, or cooperation? Do you want to help bridge the gaps between research and practice? Join us at the Free and Open Source Software Yearly conference (FOSSY) this summer!

We’ll be running the Science of Community track, and are looking for presenters to speak to an audience of FOSS practitioners, developers, community organizers, contributors, and people just generally into and curious about FOSS. 

FOSSY is a low-stress opportunity to talk to people who your work can benefit. For topics, consider presenting implications from past papers, synthesizing work from your field overall, or floating ideas and problems (lightning talks! long talks! short talks!). A full track description and answers to common questions are available on our wiki.

Recording of Thought-leader dialogue: Decentralizing social media

A couple of weeks ago, I moderated a “Thought leader dialogue” panel on “Decentralizing Social Media” co-hosted by the Northwestern Center for Human-Computer Interaction + Design (HCI+D) and the Community Data Science Collective.

The (extraordinary!) panelists were Jaz-Michael King (IFTAS), Christine Lemmer-Webber (Spritely Institute), and Bryan Newbold (Bluesky). The discussion ranged far and wide over some key background on decentralized and federated social media as well as some urgent challenges and opportunities in the space.

The recording of the session is up and you can watch it here (or in the frame below).

Thanks to the panelists, Madison Deyo, and the HCI+D team for making this happen!

Book Review: The Conversational Firm

New hires from the rank-and-file arguing with the CEO in public. Employee-chosen projects and a management team reluctant to say no. Few if any written rules. No offices. Staff arriving and departing when they choose. Messes everywhere. Some companies—especially technology firms—describe their ways of working as remaking the model of the modern business. They describe ways of working that were unthinkable some years ago. But has anything really changed from the organization models of the past, or are these features mostly hype and marketing obscuring the same old bureaucracy and hierarchy?

Although not a Communication scholar, sociologist Catherine J. Turco’s work offers vital insight into how communication structures are reordering relationships, with significant implications for the field and discipline of Communication. In this brief and readable ethnographic study, Turco describes ‘TechCo,’ a social media marketing company, in rich detail. TechCo employees have access to the perks and features familiar to those who study firms in Silicon Valley — hack nights, freedom to experiment, flexible schedules, an open floor plan, a “dogs welcome” policy, free beer, and so on. The company seeks to embody its own industry: positioning itself as open, freewheeling, and engaged, just like the social media platforms it helps its customers to use. Beyond this external branding, the founders have made an explicit goal internally: to create a company that is intentionally more ‘open’ and less hierarchical than traditional firms. How is this goal accomplished—and is it indeed accomplished at all?

Turco’s answer to this question is that these companies accomplish half of this goal. Companies are indeed able to deliberately open their communication, including the disclosure of financial details that in many firms are held exclusively by C-level leadership, as well as allowing for frank, public feedback from rank and file staff to executive leadership. However, they do so while leaving their hierarchical structure for decisionmaking largely intact. Turco argues that staff are satisfied by this arrangement—and in fact prefer to have decisionmaking power left in the hands of executives.

Drawing from theoretical background stretching from Max Weber to Albert Hirschman to Sherry Turkle, Turco elaborates a theory of the conversational firm. In the conversational firm, voice and decision making power are intentionally decoupled. Therefore, these two factors can be analyzed distinctly and in tension with one another. This poses a particular challenge to lines of research which treat voice and authority as intertwined or interchangeable.

Communication scholars may find much to reflect on in her careful articulation of what is meant by and accomplished by the idea of “openness” in a firm, in her exploration of how employee use of social media can both benefit and harm a firm, and in her case study of how efforts to brand and disseminate company culture can be both a marketing boon and an internal headache.

The book opens with conversations with the founders of TechCo and their desire for “radical openness” (p. 2) and anti-bureaucratic approach to structure. Turco describes the company’s experiences with openness and anti-bureaucratic tendencies from a range of perspectives: as reflected in the experiences of an eager young woman who is new to the workforce, as observed in Hack Nights, as visible within the company’s rollicking wiki discussions about everything from financial information to kitchen cleanup duties, and in their grappling with a lack of strict policies (instead, TechCo asks employees to “Use Good Judgment”).

Through the first three chapters, Turco asks what this openness means, and finds that although the founders’ goal is to be transparent and less hierarchical than traditional firms, hierarchy remains and is even desired by employees: instead, what’s truly different about TechCo is its embrace of employee perspectives, and the employees’ trust that the firm will take them into account. Through long-running discussions on the company wiki and chat platforms, town hall meetings and cross-departmental dinners, we see frank conversations unfold and influence the direction of the company. Turco also observes that employees seem to primarily seek to be heard—they don’t have, and often don’t want, decision rights: they want and receive voice rights.

Turco concludes that despite the findings of prior work that bureaucracy is largely indestructible and reproduces itself, openness in communication allows greater freedom for employees, at least bending the bars of what Weber called the iron cage. The book returns to the limitations of anti-bureaucratic approaches throughout the text, with a series of examples in Chapter Six navigating the limitations of this openness: how the company came to have a traditional human resources department despite the founders’ repeated public expressions of distaste for formal HR and concerns about noise, mess, and distraction in open ‘officeless’ seating plans.

In chapter four, Turco turns attention away from TechCo’s internal dialog and to the relationship between TechCo and external audiences—in particular, the absence of a social media policy. Unlike other firms which have strict rules for how employees comport themselves on social media—and the risk that the company faces from public response to employee behavior and disclosures—here again TechCo emphasizes their “Use Good Judgment” guideline. When employees make mistakes that reflect poorly on the company, TechCo’s response is to treat this as a learning opportunity, turning the event into training materials to shape employee understanding of what good judgment looks like (and doesn’t look like).

Chapter five offers a case study of TechCo’s external communication about their company culture. The founders disseminated a ‘manifesto’ that combined both their beliefs about TechCo’s culture and their beliefs about how companies should be organized to succeed in the current era. Although the document received extensive positive attention and served as a recruiting tool, existing employees were troubled by gaps between their experience and the company’s description of its culture. Employees also voiced the irony of a document developed in a top-down way describing a participatory and bottom-up culture. Satisfaction plummets. Over time, however, continuing conversation about the document and making revisions to it seems to allow employees to regain their sense of voice, eventually resolving the crisis.

Published in 2016 from fieldwork that ended in 2013, this account does not allow us to see how the conversational firm fared during recent events that have disrupted the structure, functioning, and culture of organizations—e.g. the isolation of the Covid-19 pandemic, the migration to remote work, and questions about returning to the office.

In elaborating a theory of how firms can be conversational, decoupling decisionmaking power and voice, the book offers a useful framework for scholars examining the future of work and organizations, as well as other topics of enduring interest in Communication: the shifting relationship between firms and publics and the continued blurring of the public and the private in social media. Of key interest is the extent to which Communication theories about voice, the constitutive power of communication, and factors such as concertive control can be applied to these organizations.

Graduate students with an interest in ethnographic methods will find particular value in the blunt personal narratives that comprise an extended methodological appendix. Turco describes the process of gaining access to the company, gathering observations and interview data, and iteratively analyzing her notes and memos, all of which will be familiar to many. However, this section is unique in offering a series of self-critical reflections on the work of organizational ethnography, explicit description of the personal toll the work exacted from her, and the sometimes painful experience of receiving feedback from her subjects as the analysis emerged.

Ultimately, Turco argues that embracing open communication in firms is a transformative way forward. While we in Communication may agree, what remains for us is to investigate what it means: for how we understand voice in organizations and how we assess the role of technology and platforms for communication.

FOSSY 2024: Call for Proposals!

Does your work touch open source, communities, technology, or cooperation? Do you want to help bridge the gaps between research and practice? Join us at FOSSY! The Free and Open Source Software Yearly conference (FOSSY) is back this summer and the call for proposals is open!

We’ll be running the Science of Community track, and are looking for presenters to speak to an audience of FOSS practitioners, developers, community organizers, contributors, and people just generally into and curious about FOSS. 

The Science of Community track is inspired by the CDSC Science of Community Dialogues, which bring together practitioners and researchers to discuss scholarly work that is relevant to the efforts of practitioners. As researchers, we benefit so much from the communities we work with and study and we want them to also learn from the research they so generously take part in. While the Dialogues cover a broad range of topics and communities, FOSSY presentations will focus on how that work relates to free and open source software communities, projects, and practitioners.

FOSSY is a low-stress opportunity to talk to people who your work can benefit. For topics, consider presenting implications from past papers, synthesizing work from your field overall, or floating ideas and problems (lightning talks! long talks! short talks!). A full track description and answers to common questions are available on our wiki.

The CFP deadline is June 14th and uses this form.

Decentralizing Social Media: The challenges and opportunities of federated systems

A Virtual Thought Leader Dialogue on May 23, from 4 – 5:15 p.m. CST. Register here to join.

Based on File:Decentralization.jpg, by Adam Aladdin, CC-BY-SA 3.0

How can we create more trustworthy and accountable social media that support diverse communities? Decentralized social media—systems that allow users to connect and communicate across independent services like Mastodon or Bluesky—offer promising alternatives to centralized commercial platforms like Instagram, TikTok, or X. However, decentralized social media also face urgent design challenges, especially when it comes to content integrity, protecting community trust and safety, and forging collective governance. What happens when there is no central authority to review posts or ban abusive users? How can networks of autonomous communities build and adopt systems to govern effectively? What critical infrastructure can prevent the pervasive harms of existing social media and support the integrity of public discourse?

Join Northwestern’s Center for Human-Computer Interaction + Design (HCI+D) and the Community Data Science Collective (CDSC) for an engaging conversation about the challenges and opportunities of decentralized social media on May 23rd from 4 to 5:15 p.m. CST. This panel features designers, leaders, and researchers involved in federated social media and will address opportunities for effective design and governance in this space.

Panelists include Jaz-Michael King, Bryan Newbold, and Christine Lemmer-Webber. Short presentations will be followed by discussion and Q&A moderated by Aaron Shaw (Northwestern HCI+D, CDSC). 

Moderator: Aaron Shaw, photograph by Nikki Ritcher Photography

Aaron Shaw is Associate Professor of Communication Studies and Sociology (by courtesy) at Northwestern University and a Faculty Associate of the Berkman Klein Center for Internet and Society at Harvard University. He is a co-founder of the Community Data Science Collective. At Northwestern, he is also affiliated with the Center for Human-Computer Interaction + Design (HCI+D), the Institute for Policy Research, the Buffett Institute for Global Affairs, and the Public Affairs Residential College.

Speaker: Christine Lemmer-Webber, Executive Director of Spritely Networked Communities Institute

Christine has devoted her life to advancing user freedom. Realizing that the federated social web was fractured by a variety of incompatible protocols, she co-authored and shepherded ActivityPub‘s standardization. She has also contributed to many other free and open source projects, including co-founding MediaGoblin.

Christine established the open source Spritely Project to solve known problems in existing centralized and decentralized social media platforms and to re-imagine the way we build networked applications – work that now continues here at the institute under her guidance as Executive Director.

Speaker: Jaz-Michael King, Executive Director of IFTAS (Federated Trust & Safety)

An accomplished professional with an extraordinary record of enabling data-driven decisions, developing innovative products, creating new business opportunities, driving strong operational performance, and building high-performing, agile teams.
Highly versatile, with extensive experience in data and technology from a privacy, improvement, and reporting perspective, Jaz has a proven record in building solutions for non-profit programs. 
As Executive Director of IFTAS, Jaz is now focused on independent, open Social Web activities, with the aim of creating #BetterSocialMedia by supporting trust and safety at scale in federated social media networks.

Speaker: Bryan Newbold, Protocol Engineer at Bluesky

Bryan works at Bluesky, a startup company building a federated social media protocol called “atproto”. Until a few months ago he worked at the Internet Archive collecting scientific research datasets and publications, and created scholar.archive.org. And before that he worked on infrastructure at Stripe, attended the Recurse Center in New York City, and built Atomic Magnetometers for a small New Jersey company called Twinleaf.

Over that same time period, Bryan climbed up and down the ladder of abstraction, obtaining an undergraduate degree in physics (at MIT), operating under-ice robots in Antarctica, developing open hardware lab instrumentation for large-scale brain probing (at LeafLabs), cataloging hundreds of millions of electronics components (at Octopart), and improving production service reliability at Stripe (a financial infrastructure start-up).

Bryan is a transplant from the East Coast and enjoys the road biking, large trees, generous salads, used bookstores, and world-class tech non-profits. This will be his third year serving on the Code of Conduct team at DWeb Camp.

Interested in attending? Register here to join!

Adopting “third-party” end-user bots for managing online communities on platforms

A screenshot of the configuration panel for Moderator functions of a popular end-user bot called Dyno, adopted by millions of communities on Discord.

Bots made by end users are crucial to the success of online communities, helping community leaders moderate content as well as manage membership and engagement. But most folks don’t have the resources to develop custom bots and turn to existing bots shared by their peers. For example, on Discord, some especially popular bots are adopted by millions of communities. However, because these bots are ultimately third-party tools — made by neither the platform nor the community leader in question — they still come with several challenges. In particular, community leaders need to develop the right understandings about a bot’s nature, value, and use in order to adopt it into their community’s existing processes and culture.

In organizational research, these “understandings” are sometimes described as technological frames, a concept developed by Orlikowski & Gash (1994) as they studied why technologies became used in unexpected ways in organizational settings. When your technological frames are well-aligned with a tool’s design, you can imagine that it is easier to assess whether that tool will be useful and can be smoothly incorporated into your organization as intended. In the context of online communities, well-aligned frames can not only reduce the labor and time of bot adoption, but also help community leaders anticipate issues that might cause harm to the community. Our new paper looks to communities on Discord and asks: How do community leaders shift their technological frames of third-party bots and leverage them to address community needs?

Emergent social ecosystems around bot adoption

Our study interviewed 16 community leaders on Discord, walking through their experiences adopting third-party bots for their communities. These interviews underscore how community leaders have developed social ecosystems around bots: organic user-to-user networks of resources, aid, and knowledge about bots across communities.

Despite the decentralized arrangement of communities on Discord, users devised and took advantage of formal and informal opportunities to revise their understandings about bots, both supporting and constraining how bots became used. This was particularly important because third-party bots pose heightened uncertainties about their reliability and security, especially for bots used to protect the community from external threats (such as scammers). For example, interviewees laid out concerns about whether a bot developer could be trusted to keep their bot online, to respond to problems users had, and to manage sensitive information. The emergent social ecosystems helped users get recommendations from others, assess the reputation of bot developers, and consider whether the bot was a good fit for them along much more nuanced dimensions (in the case of one interviewee, the values of the bot developer mattered as well). They also created opportunities for people to directly get help in setting up bots and troubleshooting them, such as via engaged discussions with other users who had more experience.

Our findings underscore three core reasons why we should care about these social ecosystems:

  1. Closing gaps in bot-related skills and knowledge. Across interviews, we saw patterns of people leveraging the resources and aid in social ecosystems to move towards using more powerful but complex bots. Ultimately, people with diverse technical backgrounds (including those who stated they had no technical background) were able to adopt and use bots — even bots involving code-like configurations in markdown languages that might normally pose barriers. We suggest that the diffusion of end-user tools on social platforms be matched with efforts to provide bottom-up social scaffoldings that support exploration, learning, and user discussion of those tools.
  2. Changing perceptions of the labor involved in bot adoption. The process of bot adoption as a deeply social one appeared to impact how people saw the labor they invested into it, shifting it into something fun and satisfying. Bot adoption was both collaborative, involving many individuals as a user discovered, evaluated, set up, and fine-tuned bots; and communal, with community members themselves taking part in some of these steps. We suggest that bot adoption can provide one avenue to deepen community engagement by creating new ways of participating and generating meta discussions about the community, as well as the platform.
  3. Shaping the assumptions around third-party tools. Social ecosystems enabled people to cherry-pick functions across bots, enabling creative wiggle room in curating a set of preferred functions. At the same time, people were constrained by social signals about what bots are and can do, why certain bots are worth adopting, and how the bot is used. For example, people often talked about genres of bots even though no such formal categories existed. We suggest that spaces where leaders from different communities interact with one another to discuss strategies and experiences can be impactful settings for further research, intervention, and design ideas.

Ultimately, the social nature of adopting third-party bots in our interviews offers insight into how we can better support the adoption of valuable user-facing tools across online communities. As online harms become more and more technically sophisticated (e.g., the recent rise of AI-generated disinformation), user-made bots that quickly respond to emerging issues will play an important role in managing communities — and will be even more valuable if they can be shared across communities. Further attention to the dynamics that enable tools to be used across communities with diverse norms and goals will be important as the risks that communities face, and the tools available to them, evolve.

Engage with us!

If you have thoughts, ideas, questions, we are always happy to talk – especially if you think there are community-facing resources we can develop from this work. There are a few ways to engage with us:

  • Drop a comment below this post!
  • Check out the full paper, available ✨ open access ✨ in the ACM Digital Library.
  • Come by the talks we’ll be giving:
    • at ICA2024 on Saturday, June 22, 2024 in the “Digital Networks, Platforms, and Organizing” session at 3:00-4:15PM in Coolangatta 4 (Star L3);
    • at CSCW2024 in November; schedule is still forthcoming!
  • Connect with us on social media or via email.

Come see us at CHI 2024!

We’re going to be at CHI! The Community Data Science Collective will be presenting work from group members and affiliates. CHI is taking place in Honolulu, Hawaiʻi from May 11th – 16th.

By Robert Linsdell from St. Andrews, Canada – Flight from Honolulu to Hilo. Over Sand Island and Honolulu (503729), CC BY 2.0

Jeremy Foote (Purdue University) coauthored “How Founder Motivations, Goals, and Actions Influence Early Trajectories of Online Communities” with Sanjay R Kairam. This work will be presented at “Online Communities: Engagement A” on Tuesday, May 14th at 9:45 a.m. You can also read about Jeremy and Sanjay’s work on our blog.

Carolyn Zou (Northwestern University) will be presenting with coauthor Helena Vasconcelos on their work “Validation Without Ground Truth? Methods for Trusts in Generative Simulations” at the CHI workshops HEAL (Human-Centered Evaluation and Auditing of Language Models) and TREW (Trust and Reliance in Evolving Human-AI Workflows). They will be presenting posters at both sessions. Their paper has also been selected as a highlighted paper for HEAL, and they will be giving a presentation on Sunday, May 12th.

Ruijia Cheng (University of Washington) will be presenting their research on “AXNav: Replaying Accessibility Tests from Natural Language” with coauthors Maryam Taeb, Eldon Schoop, Yue Jiang, Amanda Swearngin, and Jeffrey Nichols. This presentation will be taking place at “Universal Accessibility” on Tuesday, May 14th at 4:30 p.m.

CDSC affiliate Nicholas Vincent is receiving the Outstanding Dissertation Award for their research on “Economic Concentration and Dispossessive Data Use: Can HCI Solve Challenges from and to AI?“. Nicholas will also be presenting their papers “Pika: Empowering Non-Programmers to Author Executable Governance Policies in Online Communities” with Leijie Wang, Julija Rukanskaitė, and Amy X. Zhang at “Supporting Communities” on Thursday, May 16th at 11:00 a.m. and “A Canary in the AI Coal Mine: American Jews May Be Disproportionately Harmed by Intellectual Property Dispossession in Large Language Model Training” with Heila Precel, Brent Hecht, and Allison McDonald at “Politics of Data” on Wednesday, May 15th at 2:45 p.m.

Mandi Cai (Northwestern University) received an honorable mention award alongside coauthor Matthew Kay for their paper “Watching the Election Sausage Get Made: How Data Journalists Visualize the Vote Counting Process in U.S. Elections“. Mandi will be presenting this research at “Governance and Public Policies” on Wednesday, May 15th at 12:00 p.m.

Founders’ influence on their new online communities

Hundreds of new subreddits are created every day, but most of them go nowhere, and never receive more than a few posts or comments. On the other hand, some become wildly popular. If we want to figure out what helps some things to get attention, then looking at new and small online communities is a great place to start. Indeed, the whole focus of my dissertation was trying to understand who started new communities, and why. So, I was super excited when Sanjay Kairam at Reddit told me that Reddit was interested in studying founders of new subreddits!

The research that Sanjay and I (but mostly Sanjay!) did was accepted at CHI 2024, a leading conference for human-computer interaction research. The goal of the research is to understand 1) founders’ motivations for starting new subreddits, 2) founders’ goals for their communities, 3) founders’ plans for making their community successful, and 4) how all of these relate to what happens to a community in the first month of its existence. To figure this out, we surveyed nearly 1,000 redditors one week after they created a new subreddit.

Lots of Motivations and Goals

So, what did we learn? First, that founders have diverse motivations, but the most common is interest in the topic. As shown in the figure above, most founders reported being motivated by topic engagement, information exchange, and connecting with others, while self-promotion was much more rare.

When we asked about their goals for the community, founders were split, and each of the options we gave was ranked as a top goal by a good chunk of participants. While there is some nuance between the different versions of success, we grouped them into “quantity-oriented” and “quality-oriented”, and looked at how motivations related to goals. Somewhat unsurprisingly, folks interested in self-promotion had quantity-oriented goals, while those interested in exchanging information were more focused on quality.

Diversity in plans

We then asked founders about what plans they had for building their community, based on recommendations from the online community literature, such as raising awareness, welcoming newcomers, encouraging contributions, and regulating bad behavior. Surprisingly, for each activity, about half of founders said they planned to do it.

Early Community Outcomes

So, how do these motivations, goals, and plans relate to community outcomes? We looked at the first 28 days of each founded subreddit, and counted the number of visitors, number of contributors, and number of subscribers. We then ran regression analyses to assess how well each aspect of motivations, goals, and plans predicted each outcome. High-level results and regression tables are shown below. For each row, when β is positive, that means that the given feature has a positive relationship with the given outcome. The exponentiated rate ratio (RR) column provides a point estimate of the effect size. For example, Self-Promotion has an RR of 1.32, meaning that if a given person’s self-promotion motivation were one unit higher, the model predicts that their community would receive 32% more visitors.
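For readers who want to see the arithmetic behind the RR column, here is a minimal sketch. It assumes a count model with a log link, which the term "rate ratio" suggests but which is not spelled out in this post; the function names are made up for illustration, and the β value is simply backed out from the reported RR of 1.32 rather than taken from the regression tables.

```python
import math


def rate_ratio(beta: float) -> float:
    """Exponentiate a coefficient from a log-link count model (assumed model form)."""
    return math.exp(beta)


def percent_change(rr: float) -> float:
    """Predicted percent change in the outcome for a one-unit increase in the predictor."""
    return (rr - 1.0) * 100.0


# Hypothetical coefficient: derived from the reported RR of 1.32, not from the paper's tables.
beta_self_promotion = math.log(1.32)

rr = rate_ratio(beta_self_promotion)
print(f"RR = {rr:.2f}, predicted change in visitors = {percent_change(rr):+.0f}%")
```

The same conversion works in the other direction: a negative β gives an RR below 1, which reads as a predicted percentage decrease in the outcome.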

A number of motivations predicted each of the outcomes we measured. The only consistently positive predictor was topical interest. Those who started a community because of interest in a topic had more visitors, more contributors, and more subscribers than others. Interestingly, those motivated by self-promotion had more visitors, but fewer contributors and subscribers.

Goals had a less pronounced relationship with outcomes. Those with quality-oriented goals had more contributors but fewer visitors than those with quantity-oriented goals. There was no significant difference in subscribers for founders with different types of goals.

Finally, raising awareness was the strategy most associated with our success metrics, predicting all three of them. Surprisingly, encouraging contributions was associated with more contributors, but fewer visitors. While we don’t know the mechanism for sure, asking for contributions seems to provide a barrier that discourages newcomers from taking interest in a community.

So what?

We think that there are some key takeaways for platform designers and those starting new communities. Sanjay outlined many of them on the Reddit engineering blog, but I’ll recap a few.

First, topical knowledge and passion are important. This isn’t a causal study, so we don’t know the mechanisms for sure, but people who are passionate about a topic may be aware of other communities in the space and are able to find the right niche; they are also probably better at writing the kinds of welcome messages, initial posts, etc. that appeal to people interested in the topic.

Second, our work is yet more evidence that communities require different things at different points in their lifecycle. Founders should probably focus on building awareness at first, and worry less about encouraging contributions or regulating behavior.

Finally, we think there are a lot of opportunities for designers to take diverse motivations and goals seriously. This could include matching people by their motivations for using a community, developing dashboards that capture different aspects of success and community health and quality, etc.

Learn More

If you want to learn more about the paper, you have options!

CDSC welcomes Madison Deyo!

Madison Deyo has recently joined the CDSC as a Program Coordinator and we couldn’t be more thrilled to welcome her to the team!

Madison Deyo headshot.

Madison is based at Northwestern. With the CDSC, Madison’s role includes a mix of event planning and coordination; outreach and communications; and supporting the operations of the group. She also works with the Northwestern Center for Human-Computer Interaction + Design. Madison brings experience working with community-based non-profits in several different capacities.

Madison currently lives in Chicago, and grew up in Wisconsin, where she attended the University of Wisconsin-Madison. There, she received her B.S. in Art (with a focus on illustration) and Communications: Radio-TV-Film. In addition to her position at Northwestern, Madison also works as a freelance artist designing mead labels, tattoos, and occasionally album/EP covers. You can check out her portfolio.

Replication data release for examining how rules and rule-making across Wikipedias evolve over time

Screenshot of the same rule, Neutral Point of View, on five different language editions. Notably, the pages are different because they exist as connected but ultimately separate pages.

While Wikipedia is famous for its encyclopedic content, it may be surprising to realize that a whole other set of pages on Wikipedia help guide and govern the creation of the peer-produced encyclopedia. These pages extensively describe processes, rules, principles, and technical features of creating, coordinating, and organizing on Wikipedia. Because of the success of Wikipedia, these pages have provided valuable insights into how platforms might decentralize and facilitate participation in online governance. However, each language edition of Wikipedia has its own set of such pages governing it, even though all editions are part of the same overarching project. This makes them an under-explored opportunity to understand how governance operates across diverse groups.

In a manuscript published at ICWSM 2022, we present descriptive analyses of rules and rule-making across language editions of Wikipedia, motivated by questions like:

What happens when communities are both relatively autonomous but within a shared system? Given that they’re aligned in key ways, how do their rules and rule-making develop over time? What can patterns in governance work tell us about how communities are converging or diverging over time?

We’ve been very fortunate to share this work with the Wikimedia community since publishing the paper, including through the Wikipedia Signpost and the Wikimedia Research Showcase. At the end of last year, we published the replication data and files on Dataverse after addressing a data processing issue we caught earlier in the year (fortunately, it didn’t affect the results – but it was yet another reminder to quadruple-check one’s data pipeline!). In the spirit of sharing the work more broadly since the Dataverse release, we summarize some of the key aspects of the work here.

Study design

In the project, we examined the five largest language editions of Wikipedia as distinct editing communities: English, German, Spanish, French and Japanese. After manually constructing lists of rules per wiki (resulting in 780 pages), we took advantage of two features on Wikipedia: the revision histories, which log every edit to every page; and the interlanguage links, which connect conceptually equivalent pages across language editions. We then conducted a series of analyses examining comparisons across and relationships between language editions.
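To make the role of interlanguage links more concrete, here is a minimal sketch of how linked rule pages could be grouped into clusters of conceptually equivalent pages. It is an illustration only, not the pipeline from the paper: the page titles, the `ILL_PAIRS` and `RULE_PAGES` toy data, and the `cluster_pages` helper are all hypothetical.

```python
from collections import defaultdict

# Toy data, invented for illustration. Pages are (language, title) tuples.
# Each pair stands for one interlanguage link between two rule pages.
ILL_PAIRS = [
    (("en", "Rule A"), ("de", "Regel A")),
    (("de", "Regel A"), ("fr", "Règle A")),
    (("en", "Rule B"), ("es", "Regla B")),
]
RULE_PAGES = [
    ("en", "Rule A"), ("de", "Regel A"), ("fr", "Règle A"),
    ("en", "Rule B"), ("es", "Regla B"),
    ("ja", "Rule C"),  # no links, so it ends up in a cluster of its own
]


def cluster_pages(pages, pairs):
    """Group pages into clusters connected by interlanguage links (union-find)."""
    parent = {page: page for page in pages}

    def find(page):
        # Follow parent pointers to the cluster root, shortening the path as we go.
        while parent[page] != page:
            parent[page] = parent[parent[page]]
            page = parent[page]
        return page

    for a, b in pairs:
        # Ignore links that point outside the list of rule pages.
        if a in parent and b in parent:
            parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for page in pages:
        clusters[find(page)].add(page)
    return list(clusters.values())


clusters = cluster_pages(RULE_PAGES, ILL_PAIRS)
```

With this toy data, the three "A" pages land in one cluster, the two "B" pages in another, and the unlinked Japanese page stays on its own.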

Shared patterns across communities

Across communities, we observed trends suggesting that rule-making often became less open over time:

Figure 2 from the ICWSM paper
  • Most rules are created early in the language edition community’s life. Over a nearly 20-year period, roughly 50-80% of the rules (depending on the language edition) were created within the first five years!
  • The median edit count to rule pages peaked in early years (between years 3 and 5) before tapering down. The share of revisions dedicated to editing the actual rule text, versus discussing it, also shifted toward discussion across communities. Both patterns suggest that rules across communities have calcified over time.
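As a rough illustration of the first measure, the share of rules created early in an edition's life could be computed from page creation dates along these lines. This is a sketch under stated assumptions, not the paper's code: the dates in `TOY_CREATION_DATES` and `TOY_EDITION_START` are invented, and `share_created_early` is a hypothetical helper.

```python
from datetime import date

# Toy data, invented for illustration: the first-revision date of each rule page
# in one hypothetical language edition, plus that edition's start date.
TOY_EDITION_START = date(2001, 1, 1)
TOY_CREATION_DATES = [
    date(2002, 3, 1),
    date(2003, 6, 1),
    date(2004, 9, 15),
    date(2007, 2, 1),
    date(2010, 5, 1),
]


def share_created_early(creation_dates, edition_start, years=5):
    """Return the fraction of rule pages created within `years` of the edition's start."""
    if not creation_dates:
        raise ValueError("need at least one creation date")
    cutoff_days = years * 365.25  # approximate year length, good enough for a sketch
    early = sum(1 for d in creation_dates if (d - edition_start).days < cutoff_days)
    return early / len(creation_dates)


early_share = share_created_early(TOY_CREATION_DATES, TOY_EDITION_START)
```

Here three of the five toy rules fall inside the first five years, giving a share of 0.6. Running the same function once per language edition would give one number per edition for comparison.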

Put simply, these communities show very similar trends toward formalization in rule-making.

Divergence vs convergence in rules

Wikipedia’s interlanguage link (ILL) feature, as mentioned above, lets us explore how the rules being created and edited in these communities relate to one another. While the trends above highlight similarities in rule-making, the picture of how similar the rule sets themselves are is a bit more complicated.

On one hand, the top panel here shows that over time, all five communities see an increase in the proportion of rules in their rule sets that are unique to them individually. On the other hand, the bottom panel shows that editing efforts concentrate on rules that are more shared across communities.
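One way to picture the "unique to them individually" measure: treat a rule as unique to an edition when its interlanguage-link cluster contains pages from only that one language. The sketch below computes that share for a single snapshot. It is an assumption-laden illustration rather than the paper's exact method, and `TOY_CLUSTERS` and `unique_share_by_language` are hypothetical.

```python
from collections import Counter

# Toy data, invented for illustration. Each cluster is a set of (language, title)
# pages that interlanguage links mark as conceptually equivalent.
TOY_CLUSTERS = [
    {("en", "Rule A"), ("de", "Regel A")},  # shared by two editions
    {("en", "Rule B")},                     # unique to English
    {("de", "Regel C")},                    # unique to German
    {("de", "Regel D")},                    # unique to German
]


def unique_share_by_language(clusters):
    """For each language, return the fraction of its rules that no other language shares."""
    total = Counter()
    unique = Counter()
    for cluster in clusters:
        languages = {lang for lang, _title in cluster}
        for lang, _title in cluster:
            total[lang] += 1
            if len(languages) == 1:
                unique[lang] += 1
    return {lang: unique[lang] / total[lang] for lang in total}


shares = unique_share_by_language(TOY_CLUSTERS)
```

In the toy data, half of the English rules and two thirds of the German rules are unique. Tracking the measure over time, as the top panel does, would mean recomputing it on the clusters that existed at each point in the revision history.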

Altogether, we see that communities sharing goals, technology, and a lot more still develop substantial and sustained institutional variations; but it’s possible that broad, widely shared rules created early may help keep them relatively aligned.

Key takeaways

Investigating governance across groups like Wikipedia is valuable for at least two reasons.

First, an enormous amount of effort has gone into studying governance on English Wikipedia, the largest and oldest language edition, to distill lessons about how we can meaningfully decentralize governance in online spaces. But, as prior work [e.g., 1] shows, language editions are often non-aligned in both the content they produce and how they organize that content. Some of our early-stage work found that this held true for rule pages on the five language editions of Wikipedia explored here. In recent years, the Wikimedia Foundation itself has made several calls to understand dynamics and patterns beyond English Wikipedia. This work is, in part, a response to those calls.

Second, the questions explored in our work highlight a key tension in online governance today. While online communities are relatively autonomous entities, they often exist within social and technical systems that put them in relation with one another – whether directly or not. Effectively addressing concerns about online governance means understanding how distinct spaces online govern in ways that are similar or dissimilar, overlap or conflict, diverge or converge. Because it has an especially decentralized and participatory vision of how to govern itself online, Wikipedia can offer many lessons to this end, such as how patterns of formalization impact success and engagement. Our ongoing work continues in this vein – stay tuned!