Exceedingly Reproducible Research: A Proposal

The reproducibility movement in science has sought to increase our confidence in scientific knowledge by having research teams disseminate their data, instruments, and code so that other researchers can reproduce their work. Unfortunately, all approaches to reproducible research to date suffer from the same fundamental flaw: they seek to reproduce the results of previous research while making no effort to reproduce the research process that led to those results. We propose a new method of Exceedingly Reproducible Research (ERR) to close this gap. This blog post will introduce scientists to the error of their ways, and to the ERR of ours.

Even when a replication succeeds in producing tables and figures that appear identical to those in the original, the two differ in that they provide answers to different questions. An example from our own work illustrates the point.

Figure 1: Active editors on Wikia wikis over time (taken from TeBlunthuis, Shaw, and Hill 2018)

Figure 1 above shows the average number of contributors (in standardized units) to a series of large wikis drawn from Wikia. It was created to show the life-cycles of large online communities and published in a paper last year.

Figure 2: Replication of Figure 1 from TeBlunthuis, Shaw, and Hill (2018)

Results from a replication are shown in Figure 2. As you can see, the plots have much in common. However, deeper inspection reveals that the similarity is entirely superficial. Although the dots and lines fall in the same places on the graphs, they fall there for entirely different reasons.

Tilting at windmills in Don Quixote.

Figure 1 reflects a lengthy exploration and refinement of a (mostly) original idea and told us something we did not know. Figure 2 merely tells us that the replication was “successful.” They look similar and may confuse a reader into thinking that they reflect the same thing. But they are as different as night and day. We are like Pierre Menard, who reproduced two chapters of Don Quixote word-for-word through his own experiences: the words appear the same but the meaning is completely changed. Because we made no attempt to reproduce the research process, our attempt at replication was doomed before it began.

How Can We Do Better?

Scientific research is not made by code and data; it is made by people. To replicate a piece of work, one must reproduce every part of the research. One must retrace another’s steps, as it were, through the garden of forking paths.

In ERR, researchers must conceive of the idea, design the research project, collect the data, write the code, and interpret the results. ERR involves carrying out every relevant aspect of the research process again, from start to finish. What counts as relevant? Because nobody has attempted ERR before, we cannot know for sure. However, we are reasonably confident that successful ERR will involve taking the same courses as the original scientists, reading the same books and articles, having the same conversations at conferences, conducting the same lab meetings, recruiting the same research subjects, and making the same mistakes.

There are many things that might affect a study indirectly and that, as a result, must also be carried out again. For example, it seems likely that a researcher attempting to ERR must read the same novels, eat the same food, fall prey to the same illnesses, live in the same homes, date and marry the same people, and so on. To ERR, one must have enough information to become the researchers as they engage in the research process from start to finish.

It seems likely that anyone attempting to ERR will be at a major disadvantage if they know that previous research exists. It seems possible that ERR can only be conducted by researchers who never realize that they are engaged in the process of replication at all. Indeed, by reading this proposal and learning about ERR, you may have made it difficult to ever carry one out successfully.

Despite these many challenges, ERR has important advantages over traditional approaches to reproducibility. Because the data and code will be reproduced along the way, ERR requires no replication datasets or code. Of course, verifying that one is “in ERR” will require access to extensive intermediary products. Researchers wanting to support ERR in their own work should provide intermediary products from every stage of the process. Toward that end, the Community Data Science Collective has started creating videos of our lab meetings in the form of PDF flipbooks well suited to deposit in our university’s institutional archives. A single frame is shown in Figure 3. We have released our video_to_pdf tool under a free license, which you can use to convert your own MP4 videos to PDF.

Figure 3: PDF representation of one frame of a lab meeting between three members of the lab, produced using video_to_pdf. The full lab meeting is 25,470 pages (an excerpt is available).
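
For readers who would like to make their own flipbooks, the sketch below shows one way such a conversion might work. It is a minimal illustration using OpenCV and Pillow; the function name, the every_nth sampling parameter, and the file names are our own assumptions and not necessarily how the released video_to_pdf tool is implemented.

```python
# Minimal sketch of an MP4-to-PDF conversion (our illustration, not the
# actual video_to_pdf implementation). Requires opencv-python and Pillow.
import cv2
from PIL import Image

def video_to_pdf(video_path, pdf_path, every_nth=1):
    """Render every nth frame of a video as one page of a PDF."""
    capture = cv2.VideoCapture(video_path)
    pages = []
    index = 0
    while True:
        ok, frame = capture.read()
        if not ok:  # end of video
            break
        if index % every_nth == 0:
            # OpenCV stores frames as BGR; Pillow expects RGB.
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            pages.append(Image.fromarray(rgb))
        index += 1
    capture.release()
    if pages:
        # Pillow writes a multi-page PDF when save_all=True.
        pages[0].save(pdf_path, save_all=True, append_images=pages[1:])

video_to_pdf("lab_meeting.mp4", "lab_meeting.pdf", every_nth=30)
```

At typical frame rates, a one-hour meeting yields over 100,000 frames, so a generous every_nth (or a very patient institutional archivist) is advisable.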

With ERR, reproduction results in work that is as original as the original work. Only by reproducing the original so fully, so totally, and in such rigorous detail will true scientific validation become possible. We do not so much seek to stand on the shoulders of giants as to inhabit the body of the giant. If to reproduce is human, to ERR is divine.

Workshop on Casual Inference in Online Communities


The last decade has seen a massive increase in formality and rigor in quantitative and statistical research methodology in the social scientific study of online communities. These changes have led to higher reliability, increased reproducibility, and increased faith that our findings accurately reflect empirical reality. Unfortunately, these advancements have not come without important costs. When high methodological standards make it harder for scientists to know things, we lose the ability to speak about important phenomena and relationships.

There are many studies that simply cannot be done with the highest levels of statistical rigor. Significant social concepts such as race and gender can never truly be randomly assigned. There are relationships that are rare enough that they can never be described with a p-value of less than 0.05. To understand these phenomena, our methodology must be more relaxed. In our rush to celebrate the benefits of rigor and formality, social scientists are not exploring the ways in which more casual forms of statistical inference can be useful.

To discuss these issues and their impact in social computing research, the Community Data Science Collective will be holding the first ever workshop on Casual Inference in Online Communities this coming October in Evanston, Illinois. We hope to announce specific dates soon.

Although our program remains to be finalized, we’re currently planning to organize the workshop around five panels:

Panel 1: Relaxing Assumptions
A large body of work in statistics has critiqued the arbitrary and rigid “p < .05” significance standard and pointed to problems like “p-hacking” that it has caused. But what about the benefits that flow from a standard of evidence that one out of twenty non-effects can satisfy? In this panel, we will discuss some of the benefits of p-value standards that allow researchers to easily reject the null hypothesis that there is no effect.
For example, how does science benefit from researchers’ ability to keep trying models until they find a publishable result? What do we learn when researchers can easily add or drop correlated measures to achieve significance? We will also talk about promising new methods for overcoming high p-values, such as choosing highly informative Bayesian priors that ensure credible intervals far away from 0. We will touch on unconstrained optimization, a new way of fitting models by “guesstimating” parameters.
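
To make that arithmetic concrete, here is a quick simulation of our own (an illustrative aside, not part of the workshop program): with no true effects anywhere, a p < .05 threshold still rewards about one test in twenty, and a researcher who runs twenty independent tests has roughly a two-in-three chance of a “discovery.”

```python
# Quick sketch: how accommodating is p < .05 when the null is always true?
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
trials = 10_000
significant = 0
for _ in range(trials):
    # Two groups drawn from the same distribution, so any "effect" is noise.
    a = rng.normal(size=30)
    b = rng.normal(size=30)
    _, p = stats.ttest_ind(a, b)
    significant += p < 0.05

print(f"False-positive rate: {significant / trials:.3f}")  # close to 0.05

# With 20 independent tests of true nulls, the chance of at least one
# "significant" result is 1 - 0.95**20, roughly 0.64.
print(f"P(at least one discovery in 20 tries): {1 - 0.95**20:.2f}")
```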
Panel 2: Exputation of Missing Data
Missing data is a major problem in social research. The most common ways of addressing missing data are imputation methods. Of course, imputation techniques bring with them assumptions that are hard to understand and often violated. How might types of imputation less grounded in data and theory help? How might we relax assumptions to infer things more casually about data—and with data—that we cannot, and will not, ever have? How can researchers use their beliefs and values to infer data?
Our conversation will focus on exputation, a new approach that allows researchers to use their intuition, beliefs, and desires to imagine new data. We will touch on multiple exputation techniques in which researchers engage in the process repeatedly to narrow in on desired results.
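
In the spirit of the panel, here is a tongue-in-cheek sketch of what multiple exputation might look like; the exputate function and its interface are entirely our own invention.

```python
# A deliberately unserious sketch of "multiple exputation" (our invention):
# missing values are filled from the researcher's desires, and the process
# is repeated until the results narrow in on the answer we wanted all along.
import numpy as np

def exputate(values, desired_mean, rounds=100, rng=None):
    """Fill NaNs with draws centered on the desired result, keeping the
    candidate fill that brings the overall mean closest to that desire."""
    rng = rng or np.random.default_rng()
    values = np.asarray(values, dtype=float)
    missing = np.isnan(values)
    best, best_gap = None, np.inf
    for _ in range(rounds):
        candidate = values.copy()
        candidate[missing] = rng.normal(desired_mean, 1.0, size=missing.sum())
        gap = abs(candidate.mean() - desired_mean)
        if gap < best_gap:
            best, best_gap = candidate, gap
    return best

data = [1.0, np.nan, 2.0, np.nan, np.nan]
print(exputate(data, desired_mean=42.0))  # the data now agree with us
```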
Panel 3: Quasi-Quasi Experiments
Not every study can meet the scientific gold standard of a randomized controlled experiment. Quasi-experimental designs relax certain assumptions and requirements in order to draw similar types of inferences from non-experimental settings. This panel will ask what we might gain if we relaxed things even more.
What might we learn from quasi-quasi experiments, where shocks aren’t quite exogenous (and might not even be that shocking)? We also hope to discuss superficial intelligence, post hoc ergo propter hoc techniques, supernatural experiments, and symbolic matching based on superficial semantic similarities.
Panel 4: Irreproducible Results
Since every researcher and every empirical context is unique, why do we insist that the same study conducted by different researchers should not be? What might be gained from embracing, or even pursuing, irreproducible methods in our research? What might we see if we allow ourselves to be the giants upon whose shoulders we stand?
Panel 5: Research Ethics
[Canceled]

Although we are hardly the first people to talk about casual inference, we believe this will be the first academic meeting on the topic in any field. Please plan to join us if you can!

If you would like to apply to participate, please send a position paper or extended abstract (no more than 1000 words) to casualinference@communitydata.cc. We plan to post a list of the best submissions.


Workshop logo based on the “Hammock” icon by Gan Khoon Lay from the Noun Project.

Introducing the Cannabis Data Science Collective

In 2012, Washington State became one of the first two US states to legalize cannabis for non-medical use. Since then, sales tax revenues from the “green economy” have flooded state coffers, and Washington’s academic institutions have been elevated by that rising tide. The University of Washington (one of our research group’s two institutional homes) is now home to pot-focused initiatives like UW’s Center for Cannabis Research and the UW Law School’s Cannabis Law and Policy Project.

Today, our research group — formerly known as the “Community Data Science Collective” — announces that we too will be raiding that pantry to satisfy our own munchies. Toward that end, we have changed our name to the Cannabis Data Science Collective. We’ll still be the CDSC, but we’re changing our logo to match our new focus.

The CDSC’s new logo!

Our research will leverage our existing expertise in studying the chronic challenges faced by online communities, peer production, and social computing. We plan to blaze ahead on this path to greener pastures.

Although we’re still in the early days of this new research focus, our group has started work on a series of projects related to cannabis, communication, and social computing. The preliminary titles below are a bit half-baked, but they will give you a whiff of what’s to come:

  • Altered state: Mobile device usage on public university campuses before and after marijuana legalization
  • A tale of two edibles: Automated polysemy detection and the stevia/sativa controversy
  • Best buds: Online friendship formation and recreational drug use
  • Bing bong: The effect of legalization on Microsoft’s search results
  • Blunt truths: The effect of the joint probability distribution on community participation
  • Dank memes: The role of viral social media in marijuana legalization
  • Decision trees: The role of deliberation in governance of a marijuana sub-Reddit
  • The effects of cannabis on word usage: An analysis of Wikipedia articles pre/post pot legalization
  • Fully baked: Evidence of the importance of completing institutionalized socialization practices from an online cannabis community
  • Ganja rep: A novel approach to managing identity on the World Weed Web
  • Hashtags: Bottom-up organization of the marijuana-focused Internet public sphere
  • Higher calling: Marijuana use and altruistic behavior online
  • Joint custody: Overcoming territoriality with shared ownership of work products in a collaborative cannabis community
  • Pass the piece: Hardware design and social exchange norms in synchronous marijuana-sharing communities
  • Pipe dreams: Fan fiction and the imagined futures of the marijuana legalization movement
  • Sticky icky: Keeping order with pinned messages on an online marijuana discussion board
  • Turn on, tune in, drop out: Wikipedia participation rates following marijuana legalization
  • Weed and Wikipedia: Marijuana legalization and public goods participation
  • World Weed Web: A look at the global conversation before and after half of the United States decriminalized

We planned to post this announcement about three weeks ago but our efforts were blunted by a series of events outside our control. We figured it was high time to make the announcement today!