Taboo subjects—such as sexuality and mental health—are as important to discuss as they are difficult to raise in conversation. Although many people turn to online resources for information on taboo subjects, censorship and low quality information are common in search results. In work that has just been published at CSCW this week, we present a series of analyses that describe how taboo shapes the process of collaborative knowledge building on English Wikipedia. Our work shows that articles on taboo subjects are much more popular and the subject of more vandalism than articles on non-taboo topics. In surprising news, we also found that they were edited more often and were of higher quality! We also found that contributors to taboo articles did less to hide their identity than we expected.
The first challenge we faced in conducting our study was building a list of Wikipedia articles on taboo topics. This was challenging because while taboo is deeply cultural and can seem natural, our individual perspectives of what is and isn’t taboo is privileged and limited. In building our list, we wanted to avoid relying on our own intuition about what qualifies as taboo. Our approach was to make use of an insight from linguistics: people develop euphemisms as ways to talk about taboos. Think about all the euphemisms we’ve devised for death, or sex, or menstruation, or mental health. Using figurative languages lets us distance ourselves from the pollution of a taboo.
We used this insight to build a new machine learning classifier based on dictionary definitions in English Wiktionary. If a ‘sense’ of a word was tagged as a euphemism, we treated the words in the definition as indicators of taboo. The end result of this analysis is a series of words and phrases that most powerfully differentiate taboo from non-taboo. We then did a simple match between those words and phrases and Wikipedia article titles. We built a comparison sample of articles whose titles are words that, like our taboo articles, appear in Wiktionary definitions.
We used this new dataset to test a series of hypotheses about how taboo shapes collaborative production in Wikipedia. Our initial hypotheses were based on the idea that taboo information is often in high demand but that Wikipedians might be reluctant to associate their names (or usernames) with taboo topics. The result, we argued, would be articles that were in high demand but of low quality. What we found was that taboo articles are thriving on Wikipedia! In summary, we found in comparison to non-taboo articles:
Taboo articles are more popular (as expected).
Taboo articles receive more contributions (contrary to expectations).
Taboo articles receive more low-quality contributions (as expected).
Taboo articles are higher quality (contrary to expectations).
Taboo article contributors are more likely to contribute without an account (as expected), and have less experience (as expected), but that accountholders are more likely to make themselves more identifiable by having a user page, disclosing their gender, and making themselves emailable (all three of these are contrary to expectation!).
For more details, visualizations, statistics, and more, we hope you’ll take a look at our paper. If you are attending CSCW in October 2023, we also hope and come to our CSCW presentation in Minneapolis!
The full citation for the paper is: Champion, Kaylea, and Benjamin Mako Hill. 2023. “Taboo and Collaborative Knowledge Production: Evidence from Wikipedia.” Proceedings of the ACM on Human-Computer Interaction 7 (CSCW2): 299:1-299:25. https://doi.org/10.1145/3610090.
It’s Ph.D. application season and the Community Data Science Collective is recruiting! As always, we are looking for talented people to join our research group. Applying to one of the Ph.D. programs that the CDSC faculty members are affiliated with is a great way to get involved in research on communities, collaboration, and peer production.
Because we know that you may have questions for us that are not answered in this webpage, we will be hosting an open house and Q&A about the CDSC and Ph.D. opportunities on Friday, October 20 at 18:00 UTC (2:00pm US Eastern, 1:00pm US Central, 11:00am US Pacific). You can register online.
This post provides a very brief run-down on the CDSC, the different universities and Ph.D. programs our faculty members are affiliated with, and some general ideas about what we’re looking for when we review Ph.D. applications.
What are these different Ph.D. programs? Why would I choose one over the other?
This year the group includes three faculty principal investigators (PIs) who are actively recruiting PhD students: Aaron Shaw (Northwestern University), Benjamin Mako Hill (University of Washington in Seattle), and Jeremy Foote (Purdue University). Each of these PIs advise Ph.D. students in Ph.D. programs at their respective universities. Our programs are each described below.
Although we often work together on research and serve as co-advisors to students in each others’ projects, each faculty person has specific areas of expertise and interests. The reasons you might choose to apply to one Ph.D. program or to work with a specific faculty member could include factors like your previous training, career goals, and the alignment of your specific research interests with our respective skills.
At the same time, a great thing about the CDSC is that we all collaborate and regularly co-advise students across our respective campuses, so the choice to apply to or attend one program does not prevent you from accessing the expertise of our whole group. But please keep in mind that our different Ph.D. programs have different application deadlines, requirements, and procedures!
Faculty who are actively recruiting this year
If you are interested in applying to any of the programs, we strongly encourage you to reach out the specific faculty in that program before submitting an application.
Jeremy Foote is an Assistant Professor at the Brian Lamb School of Communication at Purdue University. He is affiliated with the Organizational Communication and Media, Technology, and Society programs. Jeremy’s research focuses on how individuals decide when and in what ways to contribute to online communities, how communities change the people who participate in them, and how both of those processes can help us to understand which things become popular and influential. Most of his research is done using data science methods and agent-based simulations.
AaronShaw is an Associate Professor in the Department of Communication Studies at Northwestern. In terms of Ph.D. programs, Aaron’s primary affiliations are with the Media, Technology and Society (MTS) and the Technology and Social Behavior (TSB) Ph.D. programs (please note: the TSB program is a joint degree between Communication and Computer Science). Aaron also has a courtesy appointment in the Sociology Department at Northwestern, but he has not directly supervised any Ph.D. advisees in that department (yet). Aaron’s current projects focus on comparative analysis of the organization of peer production communities and social computing projects, participation inequalities in online communities, and collaborative organizing in pursuit of public goods.
What do you look for in Ph.D. applicants?
There’s no easy or singular answer to this. In general, we look for curious, intelligent people driven to develop original research projects that advance scientific and practical understanding of topics that intersect with any of our collective research interests.
To get an idea of the interests and experiences present in the group, read our respective bios and CVs (follow the links above to our personal websites). Specific skills that we and our students tend to use on a regular basis include consuming and producing social science and/or social computing (human-computer interaction) research; applied statistics and statistical computing, various empirical research methods, social theory and cultural studies, and more.
Formal qualifications that speak to similar skills and show up in your resume, transcripts, or work history are great, but we are much more interested in your capacity to learn, think, write, analyze, and/or code effectively than in your credentials, test scores, grades, or previous affiliations. It’s graduate school and we do not expect you to show up knowing how to do all the things already.
Intellectual creativity, persistence, and a willingness to acquire new skills and problem-solve matter a lot. We think doctoral education is less about executing tasks that someone else hands you and more about learning how to identify a new, important problem; develop an appropriate approach to solving it; and explain all of the above and why it matters so that other people can learn from you in the future. Evidence that you can or at least want to do these things is critical. Indications that you can also play well with others and would make a generous, friendly colleague are really important too.
All of this is to say, we do not have any one trait or skill set we look for in prospective students. We strive to be inclusive along every possible dimension. Each person who has joined our group has contributed unique skills and experiences as well as their own personal interests. We want our future students and colleagues to do the same.
Still not sure whether or how your interests might fit with the group? Still have questions? Still reading and just don’t want to stop? Follow the links above for more information. Feel free to send at least one of us an email. We are happy to try to answer your questions and always eager to chat. You can also join our open house on October 20 at 2:00pm ET (UTC-4).
Welcome to a bonus round of our series spotlighting the excellent talks we were fortunate enough to host during the Science of Community track at FOSSY 23!
Eriol Fox presented their talk, “Community lead user research and usability in Science and Research OSS: What we learned,” (due to scheduling issues, this landed in the Wildcard track, but it was definitely on-topic for Science of Community! Eriol introduced us to their work exploring how scientists and researchers think about open source software, including differences in norms and motivations as well as challenges around the structure of labor. They also brought along copies of their 4 super cool zines from this project!
You can watch the talk HERE and learn more about Eriol’s work HERE.
Welcome to part 7 of a 7-part series spotlighting presentations from the Science of Community track at FOSSY 23!
In this interactive session, Dr. Benjamin Mako Hill, Dr. Aaron Shaw, and Kaylea Champion hosted a series of conversations with FOSS community members about finding research, putting it to use, and building partnerships between researchers and communities!
This talk was (intentionally!) not recorded, but we’ve synthesized the resources we shared into this wiki page.
Welcome to part 5 of our 7-part series reviewing all the great talks we were fortunate enough to host during the Science of Community track at this year’s FOSSY.
In this talk, Dr. Kajita introduced us to the work being done as part of the Apereo (formerly JA-SIG/Sakai) to create FOSS platforms to serve as academic and administrative infrastructure in higher education. Research data management is a skill that emerging scholars must learn to do modern quantitative research — and this skill can be scaffolded and tracked via the Karuta portfolio tool.
Watch the talk HERE, learn more about Karuta HERE, and learn more about Dr. Kajita HERE.
Welcome to part 4 of a 7-part series spotlighting the excellent talks we were fortunate enough to host during the Science of Community track at FOSSY 23!
Kaylea presented on her new research project to identify how packages come to be undermaintained, in particular investigating assumptions that it’s all about “the old stuff” — old packages, old languages. It turns out that’s only part of the story — older packages and software written in older languages do tend to be undermaintained, but old packages in old languages — the tried and true, as it were — do relatively well!
Watch the talk HERE and learn more about Kaylea’s work HERE.
Wikimania, the annual global conference of the Wikimedia movement, took place in Singapore last month. For the first time since 2019, the conference was held in person again. It was attended by over 670 people in-person and more than 1,500 remotely.
At the conference, Benjamin Mako Hill, Tilman Bayer, and Miriam Redi presented “The State of Wikimedia Research: 2022–2023”, an overview of scholarship and academic research on Wikipedia and other Wikimedia projects from the last year. This resumed an annual Wikimania tradition started by Mako back in 2008 as a graduate student, aiming to provide “a quick tour … of the last year’s academic landscape around Wikimedia and its projects geared at non-academic editors and readers.” With hundreds of research publications every year featuring Wikipedia in their title (and more recently, Wikidata too), is it of course impossible to cover all important research results within one hour. Hence our presentation aimed to identify a set of important themes that attracted researchers’ attention during the past year, and illustrate each theme with a brief “research postcard” summary of one particular publication. Unfortunately, Miriam was not able to be in Singapore to present..
This year’s presentation focused on seven such research themes:
Theme 2. Wikidata as a community While Wikidata is the subject of over 100 published studies each year, the vast majority of these have been primarily concerned with the project’s content as a database which scientists use to advance research about e.g. the semantic web, knowledge graphs and ontology management. This year also saw several papers studying Wikidata as a community, including a study of how Wikidata contributors use talk page to coordinate (preprint).
Theme 4. Rules and governance Research on rules and governance continues to attract researchers’ attention. Here, we featured a new paper by a political scientist that documented important changes in how English Wikipedia’s NPoV (Neutral Point of View) policy has been applied over time, and used this to advance an explanation for political change in general.
Theme 6. Measuring Wikipedia’s own content bias Despite the huge interest in content gaps along dimensions such as race and gender, systematic approaches to measuring them have not been as frequent as one might hope. We featured a paper that advanced our understanding in this regard, presented a useful method, and is also one of the first to study differences in intersectional identities.
Theme 7. Critical and humanistic approaches Although most of the published research work related to Wikipedia is based in the sciences or engineering disciplines, a growing body of humanities scholarship can offer important insights as well. We highlighted a recent humanities paper about the measuring of race and ethnicity gaps on Wikipedia, which focused in particular on gaps in such measurements themselves, placing them into a broader social context.
Again, this work represents just a tiny fraction of what has been published about Wikipedia in the last year. In particular, we avoided research that was presented elsewhere in Wikimania’s research track.