New Grant for Studying “Underproduction” in Software Infrastructure

Earlier this year, a team led by Kaylea Champion was announced as the recipient of a generous grant from the Ford and Sloan Foundations to support research into peer produced software infrastructure. Now that the project is moving forward in earnest, we’re thrilled to tell you about it.

In the foreground, the photo depicts a rusted sign with "To rapid transit" and an arrow. The sign is marked with tagging-style graffiti. In the background are rusted iron girders, part of the infrastructure of the L train.
Rapid Transit. Photo by Anthony Doudt, via flickr. CC BY-NC-ND 2.0

The project is motivated by the fact that peer production communities have produced awesome free (both as in freedom and beer) resources: sites like Wikipedia that gather the world’s knowledge, and software like Linux that enables innovation, connection, commerce, and discovery. Over the last two decades, these resources have become key elements of public digital infrastructure that many of us rely on every day. However, some pieces of digital infrastructure we rely on most remain relatively under-resourced, as security vulnerabilities like Heartbleed in OpenSSL reveal. The grant from Ford and Sloan will support a research effort to understand how and why some software packages that are heavily used receive relatively little community support and maintenance.

We’re tackling this challenge by seeking to measure and model patterns of usage, contribution, and quality in a population of free software projects. We’ll then try to identify causes and potential solutions to the challenges of relative underproduction. Throughout, we’ll draw on both insight from the research community and on-the-ground observations from developers and community managers. We aim to create practical guidance that communities and software developers can actually use as well as novel research contributions. Underproduction is, appropriately enough, a challenge that has not gotten much attention from researchers previously, so we’re excited to work on it.

Although Kaylea Champion is leading the project, the team also includes Benjamin Mako Hill, Aaron Shaw, and collective affiliate Morten Warncke-Wang, who did pioneering work on underproduction in Wikipedia.

Community Data Science Collective at ICA 2019 in Washington, DC

Jeremy Foote, Nate TeBlunthuis, Wm Salt Hale, and Mako Hill will all be in Washington DC this week for the International Communication Association’s 2019 annual meeting.

In particular, we will be presenting a new paper from the group led by Sneha Narayan titled “All Talk: How Increasing Interpersonal Communication on Wikis May Not Enhance Productivity.” The talk will be on Monday, May 27 in a session from 9:30 to 10:45 in Washington Hilton LL, Holmead as part of a session organized by the ICA Computational Methods section on “Computational Approaches to Health Communication.”

Additionally, Nate is co-organizing a pre-conference at ICA on “Expanding Computational Communication: Towards a Pipeline for Graduate Students and Early Career Scholars” along with Josephine Lukito (UW Madison) and Frederic Hopp (UC Santa Barbara). The pre-conference will be held at American University on Friday May 24th. As part of that workshop, Nate and Jeremy will be giving a presentation on approaches to the study of organizational communication that use computational methods.

We look forward to sharing our research and socializing with you at ICA! Please be in touch if you’re around and want to meet up!

The Shifting Dynamics of Boys and Girls’ Decision to Share their Creative Programming Projects

Informal online learning communities are one of the most exciting and successful ways to engage young people in technology. As the most successful example of the approach, over 40 million children from around the world have created accounts on the Scratch online community where they learn to code by creating interactive art, games, and stories. However, despite its enormous reach and its focus on inclusiveness, participation in Scratch is not as broad as one would hope. For example, reflecting a trend in the broader computing community, more boys have signed up on the Scratch website than girls.

In a recently published paper, I worked with several colleagues from the Community Data Science Collective to unpack the dynamics of unequal participation by gender in Scratch by looking at whether Scratch users choose to share the projects they create. Our analysis took advantage of the fact that less than a third of projects created in Scratch are ever shared publicly. By never sharing, creators never open themselves to the benefits associated with interaction, feedback, socialization, and learning—all things that research has shown participation in Scratch can support.

Overall, we found that boys on Scratch share their projects at a slightly higher rate than girls. Digging deeper, we found that this overall average hid an important dynamic that emerged over time. The graph below shows the proportion of Scratch projects shared for male and female Scratch users’ 1st created projects, 2nd created projects, 3rd created projects, and so on. It reflects the fact that although girls share less often initially, this trend flips over time. Experienced girls share much more often than boys!

Proportion of projects shared by gender across experience levels, measured as the number of projects created, for 1.1 million Scratch users. Projects created by girls are less likely to be shared than those by boys until about the 9th project is created. The relationship is subsequently reversed.

We unpacked this dynamic using a series of statistical models estimated using data from over 5 million projects by over a million Scratch users. This set of analyses echoed our earlier preliminary finding—while girls were less likely to share initially, more experienced girls shared projects at consistently higher rates than boys. We further found that initial differences in sharing between boys and girls could be explained by controlling for differences in project complexity and in the social connectedness of the project creator.

Another surprising finding is that users who had received more positive peer feedback, at least as measured by receipt of “love its” (similar to “likes” on Facebook), were less likely to share their subsequent projects than users who had received less. This relation was especially strong for boys and for more experienced Scratch users. We speculate that this could be due to a phenomenon known in the music industry as “sophomore album syndrome” or “second album syndrome,” a term used to describe a musician who has had a successful first album but struggles to produce a second because of increased pressure and expectations caused by their previous success.


This blog post and the paper are collaborative work with Benjamin Mako Hill and Sayamindu Dasgupta. You can find more details about our methodology and results in the text of our paper, “Gender, Feedback, and Learners’ Decisions to Share Their Creative Computing Projects” which is freely available and published open access in the Proceedings of the ACM on Human-Computer Interaction 2 (CSCW): 54:1-54:23.

New Research on How Anonymity is Perceived in Open Collaboration

Online anonymity often gets a bad rap and complaints about antisocial behavior from anonymous Internet users are as old as the Internet itself. On the other hand, research has shown that many Internet users seek out anonymity to protect their privacy while contributing things of value. Should people seeking to contribute to open collaboration projects like open source software and citizen science projects be required to give up identifying information in order to participate?

We conducted a two-part study to better understand how open collaboration projects balance the threats of bad behavior with the goal of respecting contributors’ expectations of privacy. First, we interviewed eleven people from five different open collaboration “service providers” to understand what threats they perceive to their projects’ mission and how these threats shape privacy and security decisions when it comes to anonymous contributions. Second, we analyzed discussions about anonymous contributors on publicly available logs of the English language Wikipedia mailing list from 2010 to 2017.

In the interview study, we identified three themes that pervaded discussions of perceived threats. These included threats to:

  1. community norms, such as harassment;
  2. sustaining participation, such as loss of or failure to attract volunteers; and
  3. contribution quality, such as low-quality contributions that drain community resources.

We found that open collaboration providers were most concerned with lowering barriers to participation to attract new contributors. This makes sense given that newbies are the lifeblood of open collaboration communities. We also found that service providers thought of anonymous contributions as a way of offering low barriers to participation, not as a way of helping contributors manage their privacy. They imagined that anonymous contributors who wanted to remain in the community would eventually become full participants by registering for an account and creating an identity on the site. This assumption was evident in policies and technical features of collaboration platforms that barred anonymous contributors from participating in discussions, receiving customized suggestions, or from contributing at all in some circumstances. In our second study of the English language Wikipedia public email listserv, we discovered that the perspectives we encountered in interviews also dominated discussions of anonymity on Wikipedia. In both studies, we found that anonymous contributors were seen as “second-class citizens.”

This is not the way anonymous contributors see themselves. In a study we published two years ago, we interviewed people who sought out privacy when contributing to open collaboration projects. Our subjects expressed fears of being doxed, shot at, or harassed, or of losing their jobs. Some were worried about doing or viewing things online that violated censorship laws in their home country. The difference between the way that anonymity seekers see themselves and the way they are seen by service providers was striking.

One cause of this divergence in perceptions around anonymous contributors uncovered by our new paper is that people who seek out anonymity are not able to participate fully in the process of discussing and articulating norms and policies around anonymous contribution. People whose anonymity needs mean they cannot participate in general cannot participate in the discussions that determine who can participate.

We conclude our paper with the observation that, although social norms have played an important role in HCI research, relying on them as a yardstick for measuring privacy expectations may leave out important minority experiences whose privacy concerns keep them from participating in the first place. In online communities like open collaboration projects, social norms may best reflect the most privileged and central users of a system while ignoring the most vulnerable.


Both this blog post and the paper, Privacy, Anonymity, and Perceived Risk in Open Collaboration: A Study of Service Providers, were written by Nora McDonald, Benjamin Mako Hill, Rachel Greenstadt, and Andrea Forte. The paper will be published in the Proceedings of the 2019 ACM CHI Conference on Human Factors in Computing Systems next week. The paper will be presented at the CHI conference in Glasgow, UK on Wednesday May 8, 2019. The work was supported by the National Science Foundation (awards CNS-1703736 and CNS-1703049).

Exceedingly Reproducible Research: A Proposal

The reproducibility movement in science has sought to increase our confidence in scientific knowledge by having research teams disseminate their data, instruments, and code so that other researchers can reproduce their work. Unfortunately, all approaches to reproducible research to date suffer from the same fundamental flaw: they seek to reproduce the results of previous research while making no effort to reproduce the research process that led to those results. We propose a new method of Exceedingly Reproducible Research (ERR) to close this gap. This blog post will introduce scientists to the error of their ways, and to the ERR of ours.

Even if a replication appears to have succeeded in producing tables and figures that appear identical to those in the original, they differ in that they are providing answers to different questions. An example from our own work illustrates the point.

Rise and Decline on Wikia
Figure 1: Active editors on Wikia wikis over time (taken from TeBlunthuis, Shaw, and Hill 2018)

Figure 1 above shows the average number of contributors (in standardized units) to a series of large wikis drawn from Wikia. It was created to show the life-cycles of large online communities and published in a paper last year.

Rise and Decline on Wikia

Figure 2: Replication of Figure 1 from TeBlunthuis, Shaw, and Hill (2018)

Results from a replication are shown in Figure 2. As you can see, the plots have much in common. However, deeper inspection reveals that the similarity is entirely superficial. Although the dots and lines fall in the same places on the graphs, they fall there for entirely different reasons.

Tilting at windmills in Don Quixote.

Figure 1 reflects a lengthy exploration and refinement of a (mostly) original idea and told us something we did not know. Figure 2 merely tells us that the replication was “successful.” They look similar and may confuse a reader into thinking that they reflect the same thing. But they are as different as night and day. We are like Pierre Menard who reproduced two chapters of Don Quixote word-for-word through his own experiences: the image appears similar but the meaning is completely changed. In that we made no attempt to reproduce the research process, our attempt at replication was doomed before it began.

How Can We Do Better?

Scientific research is not made by code and data; it is made by people. In order to replicate a piece of work, one should reproduce all parts of the research. One must retrace another’s steps, as it were, through the garden of forking paths.

In ERR, researchers must conceive of the idea, design the research project, collect the data, write the code, and interpret the results. ERR involves carrying out every relevant aspect of the research process again, from start to finish. What counts as relevant? Because nobody has attempted ERR before, we cannot know for sure. However, we are reasonably confident that successful ERR will involve taking the same courses as the original scientists, reading the same books and articles, having the same conversations at conferences, conducting the same lab meetings, recruiting the same research subjects, and making the same mistakes.

There are many things that might affect a study indirectly and that, as a result, must also be carried out again. For example, it seems likely that a researcher attempting to ERR must read the same novels, eat the same food, fall prey to the same illnesses, live in the same homes, date and marry the same people, and so on. To ERR, one must have enough information to become the researchers as they engage in the research process from start to finish.

It seems likely that anyone attempting to ERR will be at a major disadvantage when they know that previous research exists. It seems possible that ERR can only be conducted by researchers who never realize that they are engaged in the process of replication at all. By reading this proposal and learning about ERR, it may be difficult to ever carry it out successfully.

Despite these many challenges, ERR has important advantages over traditional approaches to reproducibility. Because they will all be reproduced along the way, ERR requires no replication datasets or code. Of course, to verify that one is “in ERR” will require access to extensive intermediary products. Researchers wanting to support ERR in their own work should provide extensive intermediary products from every stage of the process. Toward that end, the Community Data Science Collective has started creating videos of our lab meetings in the form of PDF flipbooks well suited to deposition in our university’s institutional archives. A single frame is shown in Figure 3. We have released our video_to_pdf tool under a free license which you can use to convert your own MP4 videos to PDF.

Frame from Video
Figure 3: PDF representation of one frame of a lab meeting between three members of the lab, produced using video_to_pdf. The full lab meeting is 25,470 pages (an excerpt is available).

With ERR, reproduction results in work that is as original as the original work. Only by reproducing the original so fully, so totally, and in such rigorous detail will true scientific validation become possible. We do not so much seek to stand on the shoulders of giants, but rather to inhabit the body of the giant. If to reproduce is human, to ERR is divine.

Benjamin Mako Hill is a Research Symbiont!

In exciting news, Benjamin Mako Hill was just announced as a winner of a 2019 Research Symbiont Award. Mako received the second annual General Symbiosis Award which “is given to a scientist working in any field who has shared data beyond the expectations of their field.” The award was announced at a ceremony in Hawaii at the Pacific Symposium on Biocomputing.

The award presentation called out Mako’s work on the preparation of the Scratch research dataset that includes the first five years of longitudinal data from the Scratch online community. Andrés Monroy-Hernández worked with Mako on that project. Mako’s nomination also mentioned his research group’s commitment to the production of replication datasets as well as his work with Aaron Shaw on datasets of redirects and page protection from Wikipedia. Mako was asked to talk about this work in a short video he recorded that was shown at the award ceremony.

Plush salmon with lamprey parasite.
A photo of the award itself: a plush fish complete with a parasitic lamprey.

The Research Symbiont Awards are given annually to recognize “symbiosis” in the form of data sharing. They are a companion award to the Research Parasite Awards which recognize superb examples of secondary data reuse. The award includes money to travel to the Pacific Symposium on Biocomputing (unfortunately, Mako wasn’t able to take advantage of this!) as well as the plush fish with parasitic lamprey shown here.

In addition to the award given to Mako, Dr. Leonardo Collado-Torres was announced as the recipient of the health-specific Early Career Symbiont award for his work on Recount2.

Awards and citations at computing conferences

I’ve heard a surprising “fact” repeated in the CHI and CSCW communities that receiving a best paper award at a conference is uncorrelated with future citations. Although it’s surprising and counterintuitive, it’s a nice thing to think about when you don’t get an award and it’s a nice thing to say to others when you do. I’ve thought it and said it myself.

It also seems to be untrue. When I tried to check the “fact” recently, I found a body of evidence that suggests that computing papers that receive best paper awards are, in fact, cited more often than papers that do not.

The source of the original “fact” seems to be a CHI 2009 study by Christoph Bartneck and Jun Hu titled “Scientometric Analysis of the CHI Proceedings.” Among many other things, the paper presents a null result for a test of a difference in the distribution of citations across best paper awardees, nominees, and a random sample of non-nominees.

Although the award analysis is only a small part of Bartneck and Hu’s paper, at least two papers have subsequently brought more attention, more data, and more sophisticated analyses to the question. In 2015, the question was asked by Jaques Wainer, Michael Eckmann, and Anderson Rocha in their paper “Peer-Selected ‘Best Papers’—Are They Really That ‘Good’?”

Wainer et al. built two datasets: one of papers from 12 computer science conferences with citation data from Scopus and another of papers from 17 different conferences with citation data from Google Scholar. Because of concerns about parametric assumptions, Wainer et al. used a non-parametric rank-based technique to compare awardees to non-awardees. Wainer et al. summarize their results as follows:

The probability that a best paper will receive more citations than a non best paper is 0.72 (95% CI = 0.66, 0.77) for the Scopus data, and 0.78 (95% CI = 0.74, 0.81) for the Scholar data. There are no significant changes in the probabilities for different years. Also, 51% of the best papers are among the top 10% most cited papers in each conference/year, and 64% of them are among the top 20% most cited.
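
To make the quantity in that summary concrete, here is a minimal sketch of one common rank-based statistic of this kind: the "probability of superiority," which is closely related to the Mann-Whitney U statistic. It is not necessarily the exact estimator Wainer et al. used, and the function name and citation counts below are invented purely for illustration.

```python
from itertools import product


def probability_of_superiority(group_a, group_b):
    """Estimate P(a > b) + 0.5 * P(a == b) for a random a from group_a
    and a random b from group_b, by comparing every possible pair."""
    if not group_a or not group_b:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for a, b in product(group_a, group_b):
        if a > b:
            wins += 1.0   # the group_a paper has more citations
        elif a == b:
            wins += 0.5   # ties count as half a win
    return wins / (len(group_a) * len(group_b))


# Hypothetical citation counts, NOT data from any of the papers discussed.
awarded = [30, 12, 45, 8]
not_awarded = [5, 12, 3, 20, 7, 1]
print(probability_of_superiority(awarded, not_awarded))
```

A value of 0.5 would mean that awardees and non-awardees are indistinguishable by this measure. Because only the ordering of citation counts matters, a handful of extremely highly cited papers cannot dominate the result the way they can dominate a mean.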

The question was also recently explored in a different way by Danielle H. Lee in her paper on “Predictive power of conference‐related factors on citation rates of conference papers” published in June 2018.

Lee looked at 43,000 papers from 81 conferences and built a regression model to predict citations. Taking into account a number of controls not considered in previous analyses, Lee finds that the marginal effect of receiving a best paper award on citations is positive, well-estimated, and large.

Why did Bartneck and Hu come to such different conclusions than later work?

Distribution of citations (received by 2009) of CHI papers published between 2004-2007 that were nominated for a best paper award (n=64), received one (n=12), or were part of a random sample of papers that did not (n=76).

My first thought was that perhaps CHI is different than the rest of computing. However, the data from Bartneck and Hu’s 2009 study, conveniently included as a figure in their original paper, show that they did find a higher mean among the award recipients compared to both nominees and non-nominees. The entire distribution of citations among award winners appears to be pushed upwards. Although Bartneck and Hu found an effect, they did not find a statistically significant effect.

Given the more recent work by Wainer et al. and Lee, I’d be willing to venture that the original null finding was a function of the fact that citation counts are a very noisy measure, especially over a 2-5 year post-publication period, and that the Bartneck and Hu dataset was small with only 12 awardees out of 152 papers total. This might have caused problems because the statistical test the authors used was an omnibus test for differences in a three-group sample that was imbalanced heavily toward the two groups (nominees and non-nominees) in which there appears to be little difference. My bet is that the paper’s conclusions on awards are simply an example of how a null effect is not evidence of a non-effect, especially in an underpowered dataset.
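
As a rough illustration of what "underpowered" can mean with group sizes like these, here is a sketch that approximates the power of a simple two-group comparison. It is a deliberate simplification: it uses a two-sided z-test on a standardized mean difference rather than the omnibus three-group test the authors ran. The effect size of 0.5 standard deviations is an assumption chosen for illustration, not an estimate from their data, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_group_power(effect_size, n1, n2, alpha=0.05):
    """Approximate power of a two-sided z-test for a difference in means,
    where effect_size is the true difference in standard deviation units."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    # The expected test statistic grows as the standard error shrinks.
    shift = effect_size / sqrt(1.0 / n1 + 1.0 / n2)
    upper_tail = 1.0 - std_normal.cdf(z_crit - shift)
    lower_tail = std_normal.cdf(-z_crit - shift)
    return upper_tail + lower_tail


# 12 awardees compared with the other 140 papers (assumed effect: 0.5 SD).
print(two_group_power(0.5, 12, 140))
# The same 152 papers split evenly, for comparison.
print(two_group_power(0.5, 76, 76))
```

Under these assumptions, the lopsided split detects the assumed effect well under half the time, while an even split of the same 152 papers would detect it most of the time. The small group sets the standard error almost regardless of how large the other group is.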

Of course, none of this means that award winning papers are better. Despite Wainer et al.’s claim that they are showing that award winning papers are “good,” none of the analyses presented can disentangle the signalling value of an award from differences in underlying paper quality. The packed rooms one routinely finds at best paper sessions at conferences suggest that at least some additional citations received by award winners might be caused by extra exposure caused by the awards themselves. In the future, perhaps people can say something along these lines instead of repeating the “fact” of the non-relationship.


This post was originally posted on Benjamin Mako Hill’s blog Copyrighteous.

Apply to join the Community Data Science Collective!

It’s Ph.D. application season and the Community Data Science Collective is recruiting! As always, we are looking for talented people to join our research group. Applying to one of the Ph.D. programs that Aaron, Mako, and Sayamindu are affiliated with is a great way to do that.

This post provides a very brief run-down on the CDSC, the different universities and Ph.D. programs we’re affiliated with, and what we’re looking for when we review Ph.D. applications. It’s quite close to the deadline for some of our programs, but we hope this post will still be useful to prospective applicants now and in the future.

Community data science collective group photo (April, 2018)
Members of the CDSC and friends assembled for a group meeting at UW in April, 2018. From left to right the people in the picture are: Julia, Charlie, Nate, Aaron, Salt, Sneha, Emilia, Sayamindu (hiding), Kaylea, Jeremy, Mako. Photo credit: Sage Ross (cc-by-sa)

What is the Community Data Science Collective?

The Community Data Science Collective (or CDSC) is a joint research group of (mostly) quantitative social scientists and designers pursuing research about the organization of online communities, peer production, and learning and collaboration in social computing systems. We are based at Northwestern University, the University of Washington, and (most recently!) the University of North Carolina, Chapel Hill. You can read more about us and our work on our research group blog and on the collective’s website/wiki.

What are these different Ph.D. programs? Why would I choose one over the other?

The group currently includes three faculty principal investigators (PIs): Aaron Shaw (Northwestern University), Benjamin Mako Hill (University of Washington in Seattle), and Sayamindu Dasgupta (University of North Carolina at Chapel Hill). The three PIs advise Ph.D. students in multiple Ph.D. programs at their respective universities. Our programs are each described below.

Although we often work together on research and serve as co-advisors to students in each other’s projects, each faculty person has specific areas of expertise and unique interests. The reasons you might choose to apply to one Ph.D. program or to work with a specific faculty member include factors like your previous training, career goals, and the alignment of your specific research interests with our respective skills.

At the same time, a great thing about the CDSC is that we all collaborate and regularly co-advise students across our respective campuses, so the choice to apply to or attend one program does not prevent you from accessing the expertise of our whole group. But please keep in mind that our different Ph.D. programs have different application deadlines, requirements, and procedures!

Ph.D. Advisors

Sayamindu Dasgupta head shot
Sayamindu Dasgupta

Sayamindu Dasgupta is an Assistant Professor in the School of Information and Library Science at UNC Chapel Hill. Sayamindu’s research focus includes data science education for children and informal learning online. This work involves both system building and empirical studies.

Benjamin Mako Hill

Benjamin Mako Hill is an Assistant Professor of Communication at the University of Washington. He is also an Adjunct Assistant Professor at UW’s Department of Human-Centered Design and Engineering (HCDE). Although almost all of Mako’s students are in the Department of Communication, he also advises students in the Department of Computer Science and Engineering and can advise students in HCDE as well—although he typically has no ability to admit students into those programs. Mako’s research focuses on population-level studies of peer production projects, computational social science, and efforts to democratize data science.

Aaron Shaw. (Photo credit: Nikki Ritcher Photography, cc-by-sa)

Aaron Shaw is an Assistant Professor in the Department of Communication Studies at Northwestern. In terms of Ph.D. programs, Aaron’s primary affiliations are with the Media, Technology and Society (MTS) and the Technology and Social Behavior (TSB) Ph.D. programs. Aaron also has a courtesy appointment in the Sociology Department at Northwestern, but he has not directly supervised any Ph.D. advisees in that department (yet). Aaron’s current research projects focus on comparative analysis of the organization of peer production communities and social computing projects, participation inequalities in online communities, and empirical research methods.

What do you look for in Ph.D. applicants?

There’s no easy or singular answer to this. In general, we look for curious, intelligent people driven to develop original research projects that advance scientific and practical understanding of topics that intersect with any of our collective research interests.

To get an idea of the interests and experiences present in the group, read our respective bios and CVs (follow the links above to our personal websites). Specific skills that we and our students tend to use on a regular basis include experience consuming and producing social science and/or social computing (human-computer interaction) research, applied statistics and statistical computing, various empirical research methods, social theory and cultural studies, and more.

Formal qualifications that speak to similar skills and show up in your resume, transcripts, or work history are great, but we are much more interested in your capacity to learn, think, write, analyze, and/or code effectively than in your credentials, test scores, grades, or previous affiliations. It’s graduate school and we do not expect you to show up pre-certified in all the ways or knowing how to do all the things already.

Intellectual creativity, persistence, and a willingness to acquire new skills and problem-solve matter a lot. We think doctoral education is less about executing a task that someone else hands you and more about learning how to identify a new, important problem; develop an appropriate approach to solving it; and explain all of the above and why it matters so that other people can learn from you in the future. Evidence that you can or at least want to do these things is critical. Indications that you can also play well with others and would make a generous, friendly colleague are really important too.

All of this is to say, we do not have any one trait or skill set we look for in prospective students. We strive to be inclusive along every possible dimension. Each person who has joined our group has contributed unique skills and experiences as well as their own personal interests. We want our future students and colleagues to do the same.

Now what?

Still not sure whether or how your interests might fit with the group? Still have questions? Still reading and just don’t want to stop? Follow the links above for more information. Feel free to send at least one of us an email. We are happy to try to answer your questions and always eager to chat.

A proposal to mitigate false discovery in CSCW research

This post was co-authored by Benjamin Mako Hill and Aaron Shaw. We wrote it following a conversation with the CSCW 2018 papers chairs. At their encouragement, we put together this proposal that we plan to bring to the CSCW town hall meeting. Thanks to Karrie Karahalios, Airi Lampinen, Geraldine Fitzpatrick, and Andrés Monroy-Hernández for engaging in the conversation with us and for facilitating the participation of the CSCW community.

False discovery in empirical research

There is growing evidence that an enormous portion of published quantitative research is wrong. In fields where recognition of “false discovery” has prompted systematic re-examinations of published findings, it has led to a replication crisis. For example, a systematic attempt to reproduce influential results in social psychology failed to replicate a majority of them. Another attempt focused on social research in top general science journals and failed to replicate more than a third and found that effect sizes were, on average, overstated by a factor of two.

Quantitative methodologists argue that the high rates of false discovery are, among other reasons, a function of common research practices carried out in good faith. Such practices include accidental or intentional p-hacking where researchers try variations of their analysis until they find significant results; a garden of forking paths where researcher decisions lead to a vast understatement of the number of true “researcher degrees of freedom” in their research designs; the file-drawer problem which leads only statistically significant results to be published; and underpowered studies, which make it so that only overstated effect sizes can be detected.

Graph of the relationship between statistical power and the rates of false discovery. [Taken from this answer on the statistics Q&A site Cross Validated.]

To the degree that much of CSCW and HCI uses the same research methods and approaches as these other social scientific fields, there is every reason to believe that these issues extend to social computing research. Of course, given that replication is exceedingly rare in HCI, HCI researchers will rarely even find out that a result is wrong.
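
The link between statistical power and false discovery can be made concrete with a short calculation. The sketch below is only an illustration: the function name, the 5% significance threshold, and the assumption that 10% of tested hypotheses are true are ours, chosen for the example rather than taken from any of the studies discussed here. It also covers only the power dimension and ignores p-hacking and forking paths, which would push the rate higher.

```python
def false_discovery_rate(power, alpha=0.05, prior_true=0.1):
    """Expected share of statistically significant results that are false positives.

    power      -- probability of detecting an effect that really exists
    alpha      -- significance threshold (false positive rate when there is no effect)
    prior_true -- ASSUMED share of tested hypotheses that are actually true
    """
    false_positives = alpha * (1 - prior_true)  # null is true, test is significant anyway
    true_positives = power * prior_true         # effect is real and the test detects it
    return false_positives / (false_positives + true_positives)


if __name__ == "__main__":
    # Compare an underpowered design with better-powered ones.
    for power in (0.2, 0.5, 0.8):
        print(f"power={power:.1f}  false discovery rate={false_discovery_rate(power):.2f}")
```

Under these assumed values, roughly 69% of significant findings are false when power is 0.2, compared with 36% when power is 0.8. The exact numbers depend entirely on the assumed share of true hypotheses, so they should be read as an illustration of the direction of the relationship and not as an estimate for any field.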

To date, no comprehensive set of solutions to these issues exists. However, scholarly communities can take steps to reduce the threat of false discovery. One set of approaches to doing so involves the introduction of changes to the way quantitative studies are planned, executed, and reviewed. We want to encourage the CSCW community to consider supporting some of these practices.

Among the approaches developed and adopted in other research communities, several involve breaking up research into two distinct stages: a first stage in which research designs are planned, articulated, and recorded; and a second stage in which results are computed following the procedures in the recorded design (documenting any changes). This stage-based process ensures that designs cannot shift in ways that shape findings without some clear acknowledgement that such a shift has occurred. When changes happen, adjustments can sometimes be made in the computation of statistical tests. Readers and reviewers of the work can also have greater awareness of the degree to which the statistical tests accurately reflect the analysis procedures or not and adjust their confidence in the findings accordingly.

Versions of these stage-based research designs were first developed in biomedical randomized controlled trials (RCTs) and are extremely widespread in that domain. For example, pre-registration of research designs is now mandatory for NIH funded RCTs and several journals are reviewing and accepting or rejecting studies based on pre-registered designs before results are known.

A proposal for CSCW

In order to address the challenges posed by false discovery, CSCW could adopt a variety of approaches from other fields that have already begun to do so. These approaches entail more or less radical shifts to the ways in which CSCW research gets done, reviewed, and published.

As a starting point, we want to initiate discussion around one specific proposal that could be suitable for a number of social computing studies and would require relatively little in the way of changes to the research and reviewing processes used in our community.

Drawing from a series of methodological pieces in the social sciences ([1], [2], [3]), we propose a method based on split-sample designs that would be entirely optional for CSCW authors at the time of submission.

Essentially, authors who choose to do so could submit papers that are written (and that will be reviewed and revised) based on one portion of their dataset, with the understanding that the paper would be published using identical analytic methods also applied to a second, previously un-analyzed portion of the dataset. Authors submitting under this framework would choose to have their papers reviewed, revised and resubmitted, and accepted or rejected based on the quality of the research questions, framing, design, execution, and significance of the study overall. The decision would not be based on the statistical significance of final analysis results.

The idea follows from the statistical technique of “cross validation,” in which an analysis is developed on one subset of data (usually called the “training set”) and then replicated on at least one other subset (the “test set”).

To conduct a project using this basic approach, a researcher would:

  • Randomly partition their full dataset into two (or more) pieces.
  • Design, refine, and complete their analysis using only one piece identified as the training sample.
  • Undergo the CSCW review process using the results from this analysis of the training sample.
  • If the submission receives a decision of “Revise and Resubmit,” authors would then make changes to the analysis of the training sample as requested by ACs and reviewers in the current normal way.
  • If the paper is accepted for publication, the authors would then (and only then) run the final version of the analysis using another piece of their data identified as the test sample and publish those results in the paper.
  • We expect that authors would also publish the training set results used during review in the online supplement to their paper uploaded to the ACM Digital Library.
  • Like any other part of a paper’s methodology, the split sample procedure would be documented in appropriate parts of the paper.
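
The partitioning and analysis steps above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not a prescribed implementation: the function names, the fixed random seed, the 50/50 split, the synthetic data, and the stand-in analysis (a simple difference in group means) are all hypothetical. The point is only that the test sample is set aside once, at random, and that the identical analysis function is run on it after acceptance.

```python
import random
import statistics


def split_sample(records, train_fraction=0.5, seed=2018):
    """Randomly partition records into a training sample and a test sample.

    A fixed seed (an arbitrary choice here) makes the partition reproducible,
    so the same test sample can be recovered after the paper is accepted.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def analyze(sample):
    """Hypothetical stand-in analysis: difference in mean outcome between groups."""
    treated = [r["outcome"] for r in sample if r["treated"]]
    control = [r["outcome"] for r in sample if not r["treated"]]
    return statistics.mean(treated) - statistics.mean(control)


if __name__ == "__main__":
    # Synthetic data for illustration only; a real study would load its dataset here.
    rng = random.Random(0)
    records = [
        {"treated": i % 2 == 0, "outcome": rng.gauss(1.0 if i % 2 == 0 else 0.0, 1.0)}
        for i in range(400)
    ]

    train, test = split_sample(records)

    # Design, refine, and review the analysis using the training sample only.
    print("training sample estimate:", analyze(train))

    # Only after acceptance: run the final, unchanged analysis on the test sample.
    print("test sample estimate:", analyze(test))
```

In practice, the test sample would be stored separately and left untouched until the paper is accepted; nothing in the code itself enforces that discipline.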

We are unaware of prior work in social computing that has applied this process. Researchers in data mining, machine learning, and related fields of computer science use cross-validation all the time, but they do so differently in order to solve distinct problems (typically related to model overfitting).

The main benefits of this approach (discussed in much more depth in the references at the beginning of this section) would be:

  • Heightened reliability and reproducibility of the analysis.
  • Reduced risk that findings reflect spurious relationships, p-hacking, researcher or reviewer degrees of freedom, or other pitfalls of statistical inference common in the analysis of behavioral data—i.e., protection against false discovery.
  • A procedural guarantee that the results do not determine the publication (or not) of the work—i.e., protection against publication bias.

The most salient risk from the approach is that results might change when authors run the final analysis on the test set. In the absence of p-hacking and similar issues, such changes will usually be small and will mostly affect the magnitude of effect estimates and their associated standard errors. However, some changes might be more dramatic. Dealing with changes of this sort would be harder for authors and reviewers and would potentially involve something along the lines of the shepherding that some papers receive now.

Let’s talk it over!

This blog post is meant to spark a wider discussion. We hope this can happen during CSCW this year and beyond. We believe the procedure we have proposed would enhance the reliability of our work and is workable in CSCW because it involves narrow changes to the way that quantitative CSCW research and reviewing is usually conducted. We also believe this procedure would serve the long term interests of the HCI and social computing research community. CSCW is a leader in building better models of scientific publishing within HCI through the R&R process, eliminated page limits, the move to PACM, and more. We would like to extend this spirit to issues of reproducibility and publication bias. We are eager to discuss our proposal and welcome suggestions for changes.


[1] Michael L Anderson and Jeremy Magruder. Split-sample strategies for avoiding false discoveries. Technical report, National Bureau of Economic Research, 2017. https://www.nber.org/papers/w23544
[2] Susan Athey and Guido Imbens. Recursive partitioning for heterogeneous causal effects. Proceedings of the National Academy of Sciences, 113(27):7353–7360, 2016. https://doi.org/10.1073/pnas.1510489113
[3] Marcel Fafchamps and Julien Labonne. Using split samples to improve inference on causal effects. Political Analysis, 25(4):465–482, 2017. https://doi.org/10.1017/pan.2017.22

Why organizational culture matters for online communities

Leaders and scholars of online communities tend to think of community growth as the aggregate effect of inexperienced individuals arriving one-by-one. However, there is increasing evidence that growth in many online communities today involves newcomers arriving in groups with previous experience together in other communities. This difference has deep implications for how we think about the process of integrating newcomers. Instead of focusing only on individual socialization into the group culture, we must also understand how to manage mergers of existing groups with distinct cultures. Unfortunately, online community mergers have, to our knowledge, never been studied systematically.

To better understand mergers, I spent six months in 2017 conducting ethnographic participant observation in two World of Warcraft raid guilds planning and undergoing mergers. The results, visible in the attendance plot below, show that the top merger led to a thriving and sustainable community while the bottom merger led to failure and the eventual dissolution of the group. Why did one merger succeed while the other failed? What can managers of other communities learn from these examples?

In my new paper that will be published in the Proceedings of the ACM Conference on Computer-supported Cooperative Work and Social Computing (CSCW) and that I will present in New Jersey next month, my coauthors and I try to answer these questions.

Raid team attendance before and after merging. Guilds were given pseudonyms to protect the identity of the research subjects.

In my research setting, World of Warcraft (WoW), players form organized groups called “guilds” to take on the game’s toughest bosses in virtual dungeons that are called “raids.” Raids can be extremely challenging, and they require a large number of players to be successful. Below is a video demonstrating the kind of communication and coordination needed to be successful as a raid team in WoW.

Because participation in a raid guild requires time, discipline, and emotional investment, raid guilds are constantly losing members and recruiting new ones to resupply their ranks. One common strategy for doing so is arranging formal mergers. My study involved following two such groups as they completed mergers. To collect data for my study, I joined both groups, attended and recorded all activities, took copious field notes, and spent hours interviewing leaders.

Although I did not anticipate the divergent outcomes shown in the figure above when I began, I analyzed my data with an eye toward identifying themes that might point to reasons for the success of one merger and the failure of the other. The answers that emerged from my analysis suggest that the key differences that led one merger to be successful and the other to fail revolved around differences in the ways that the two mergers managed organizational culture. This basic insight is supported by a body of research about organizational culture in firms but seems not to have made it onto the radar of most members or scholars of online communities. My coauthors and I think more attention to the role that organizational culture plays in online communities is essential.

We found evidence of cultural incompatibility in both mergers, and it seems likely that some degree of cultural clash is inevitable in any merger. The most important results of our analysis are three observations we drew about specific things that the successful merger did to effectively manage organizational culture. Drawn from our analysis, these themes point to concrete things that other communities facing mergers, whether formal or informal, can do.

A recent, random example of a guild merger recruitment post found on the WoW forums.

First, when planning mergers, groups can strategically select other groups with similar organizational culture. The successful merger in our study involved a carefully planned process of advertising for a potential merger on forums, testing out group compatibility by participating in “trial” raid activities with potential guilds, and selecting the guild that most closely matched their own group’s culture. In our settings, this process helped prevent conflict from emerging and ensured that there was enough common ground to resolve it when it did.

Second, leaders can plan intentional opportunities to socialize members of the merged or acquired group. The leaders of the successful merger held community-wide social events in the game to help new members learn their community’s norms. They spelled out these norms in a visible list of rules. They even included the new members in both the brainstorming and voting process of changing the guild’s name to reflect that they were a single, new, cohesive unit. The leaders of the failed merger lacked any explicitly stated community rules, and opportunities for socializing the members of the new group were virtually absent. Newcomers from the merged group would only learn community norms when they broke one of the unstated social codes.

The guild leaders in the successful merger documented every successful high end raid boss achievement in a community-wide “Hall of Fame” journal. A screenshot is taken with every guild member who contributed to the achievement and uploaded to a “Hall of Fame” page.

Third and finally, our study suggested that social activities can be used to cultivate solidarity between the two merged groups, leading to increased retention of new members. We found that the successful guild merger organized an additional night of activity that was socially oriented. In doing so, they provided a setting where solidarity between new and existing members could develop and motivate members to stick around and keep playing with each other, even when it gets frustrating.

Our results suggest that by preparing in advance, ensuring some degree of cultural compatibility, and providing opportunities to socialize newcomers and cultivate solidarity, the potential for conflict resulting from mergers can be mitigated. While mergers between firms often occur to make more money or consolidate resources, the experience of the failed merger in our study shows that mergers between online communities put their entire communities at stake. We hope our work can be used by leaders in online communities to successfully manage potential conflict resulting from merging or acquiring members of other groups in a wide range of settings.

Much more detail is available in our paper, which will be published open access and is currently available as a preprint.


Both this blog post and the paper it is based on are collaborative work by Charles Kiene from the University of Washington, Aaron Shaw from Northwestern University, and Benjamin Mako Hill from the University of Washington. We are also thrilled to mention that the paper received a Best Paper Honorable Mention award at CSCW 2018!