Community Data Science Collective at CSCW 2019

A good chunk of the Community Data Science Collective is in Austin, Texas this week for the 2019 ACM Conference on Computer-supported Cooperative Work and Social Computing (CSCW).

The conference marks the official publication of four papers by collective students and faculty. All four papers were published in the journal Proceedings of the ACM on Human-Computer Interaction: CSCW.

Information on the talks as well as links to the papers is available here (collective members are listed in italics):

Additionally, Charlie Kiene is co-organizing a workshop on moderation:

And Sneha Narayan is participating in a panel:

Finally, Mako Hill is chairing a session:

  • Mon, Nov 11 14:30 – 16:00: Wikipedia and Wiki Research

Salt, Kaylea, Charlie, and Regina will all be at the conference, as will affiliate Andrés Monroy-Hernández and tons of our social computing friends. Please come and say “Hello” to any of us, introduce yourself if you don’t already know us, and pick up a CDSC sticker!

New Research on How Anonymity is Perceived in Open Collaboration

Online anonymity often gets a bad rap and complaints about antisocial behavior from anonymous Internet users are as old as the Internet itself. On the other hand, research has shown that many Internet users seek out anonymity to protect their privacy while contributing things of value. Should people seeking to contribute to open collaboration projects like open source software and citizen science projects be required to give up identifying information in order to participate?

We conducted a two-part study to better understand how open collaboration projects balance the threats of bad behavior with the goal of respecting contributors’ expectations of privacy. First, we interviewed eleven people from five different open collaboration “service providers” to understand what threats they perceive to their projects’ mission and how these threats shape privacy and security decisions when it comes to anonymous contributions. Second, we analyzed discussions about anonymous contributors on publicly available logs of the English language Wikipedia mailing list from 2010 to 2017.

In the interview study, we identified three themes that pervaded discussions of perceived threats. These included threats to:

  1. community norms, such as harassment;
  2. sustaining participation, such as loss of or failure to attract volunteers; and
  3. contribution quality, such as low-quality contributions that drain community resources.

We found that open collaboration providers were most concerned with lowering barriers to participation to attract new contributors. This makes sense given that newbies are the lifeblood of open collaboration communities. We also found that service providers thought of anonymous contributions as a way of offering low barriers to participation, not as a way of helping contributors manage their privacy. They imagined that anonymous contributors who wanted to remain in the community would eventually become full participants by registering for an account and creating an identity on the site. This assumption was evident in policies and technical features of collaboration platforms that barred anonymous contributors from participating in discussions, receiving customized suggestions, or from contributing at all in some circumstances. In our second study of the English language Wikipedia public email listserv, we discovered that the perspectives we encountered in interviews also dominated discussions of anonymity on Wikipedia. In both studies, we found that anonymous contributors were seen as “second-class citizens.”

This is not the way anonymous contributors see themselves. In a study we published two years ago, we interviewed people who sought out privacy when contributing to open collaboration projects. Our subjects expressed fears of being doxed, shot at, or harassed, and of losing their jobs. Some were worried about doing or viewing things online that violated censorship laws in their home country. The difference between the way that anonymity seekers see themselves and the way they are seen by service providers was striking.

One cause of this divergence in perceptions around anonymous contributors uncovered by our new paper is that people who seek out anonymity are not able to participate fully in the process of discussing and articulating norms and policies around anonymous contribution. People whose anonymity needs mean they cannot participate in general cannot participate in the discussions that determine who can participate.

We conclude our paper with the observation that, although social norms have played an important role in HCI research, relying on them as a yardstick for measuring privacy expectations may leave out the experiences of important minorities whose privacy concerns keep them from participating in the first place. In online communities like open collaboration projects, social norms may best reflect the most privileged and central users of a system while ignoring the most vulnerable.


Both this blog post and the paper, Privacy, Anonymity, and Perceived Risk in Open Collaboration: A Study of Service Providers, were written by Nora McDonald, Benjamin Mako Hill, Rachel Greenstadt, and Andrea Forte. The paper will be published in the Proceedings of the 2019 ACM CHI Conference on Human Factors in Computing Systems next week and will be presented at the CHI conference in Glasgow, UK on Wednesday May 8, 2019. The work was supported by the National Science Foundation (awards CNS-1703736 and CNS-1703049).

Awards and citations at computing conferences

I’ve heard a surprising “fact” repeated in the CHI and CSCW communities that receiving a best paper award at a conference is uncorrelated with future citations. Although it’s surprising and counterintuitive, it’s a nice thing to think about when you don’t get an award and it’s a nice thing to say to others when you do. I’ve thought it and said it myself.

It also seems to be untrue. When I tried to check the “fact” recently, I found a body of evidence that suggests that computing papers that receive best paper awards are, in fact, cited more often than papers that do not.

The source of the original “fact” seems to be a CHI 2009 study by Christoph Bartneck and Jun Hu titled “Scientometric Analysis of the CHI Proceedings.” Among many other things, the paper presents a null result for a test of a difference in the distribution of citations across best papers awardees, nominees, and a random sample of non-nominees.

Although the award analysis is only a small part of Bartneck and Hu’s paper, at least two papers have subsequently brought more attention, more data, and more sophisticated analyses to the question. In 2015, the question was asked by Jacques Wainer, Michael Eckmann, and Anderson Rocha in their paper “Peer-Selected ‘Best Papers’—Are They Really That ‘Good’?”

Wainer et al. build two datasets: one of papers from 12 computer science conferences with citation data from Scopus and another of papers from 17 different conferences with citation data from Google Scholar. Because of parametric concerns, Wainer et al. used a non-parametric rank-based technique to compare awardees to non-awardees. Wainer et al. summarize their results as follows:

The probability that a best paper will receive more citations than a non best paper is 0.72 (95% CI = 0.66, 0.77) for the Scopus data, and 0.78 (95% CI = 0.74, 0.81) for the Scholar data. There are no significant changes in the probabilities for different years. Also, 51% of the best papers are among the top 10% most cited papers in each conference/year, and 64% of them are among the top 20% most cited.
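
To make the rank-based comparison concrete, here is a minimal sketch in Python (not Wainer et al.’s code) of how such a probability can be computed: the Mann-Whitney U statistic divided by the product of the two group sizes estimates the chance that a randomly chosen awardee out-cites a randomly chosen non-awardee (ties counted as half). The citation counts below are simulated purely for illustration.

```python
# Sketch: common-language effect size from a Mann-Whitney U test.
# The citation counts are simulated; only the technique is the point.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
awardees = rng.negative_binomial(5, 0.10, size=40)       # hypothetical citation counts
non_awardees = rng.negative_binomial(5, 0.15, size=400)  # hypothetical citation counts

u, p = mannwhitneyu(awardees, non_awardees, alternative="greater")
prob = u / (len(awardees) * len(non_awardees))
print(f"P(awardee out-cites non-awardee) = {prob:.2f} (one-sided p = {p:.3f})")
```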

The question was also recently explored in a different way by Danielle H. Lee in her paper on “Predictive power of conference‐related factors on citation rates of conference papers” published in June 2018.

Lee looked at 43,000 papers from 81 conferences and built a regression model to predict citations. Taking into account a number of controls not considered in previous analyses, Lee finds that the marginal effect of receiving a best paper award on citations is positive, well-estimated, and large.
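
Lee’s actual model and controls are more elaborate than this, but the basic idea can be sketched as a count regression with an award indicator plus controls. Everything below (the variable names, the data, and the coefficients) is simulated for illustration only and is not Lee’s specification.

```python
# Sketch: a Poisson regression of citation counts on an award indicator
# plus example controls, using simulated data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 2000
papers = pd.DataFrame({
    "best_paper": rng.binomial(1, 0.05, n),        # hypothetical award indicator
    "conference_size": rng.integers(50, 500, n),   # example control variable
    "years_since_pub": rng.integers(1, 10, n),     # example control variable
})
rate = np.exp(0.5 + 0.6 * papers["best_paper"]
              + 0.002 * papers["conference_size"]
              + 0.15 * papers["years_since_pub"])
papers["citations"] = rng.poisson(rate)

model = smf.poisson(
    "citations ~ best_paper + conference_size + years_since_pub", data=papers
).fit()
# The best_paper coefficient is the (log-scale) marginal effect of an award.
print(model.params["best_paper"])
```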

Why did Bartneck and Hu come to such a different conclusion than later work?

Distribution of citations (received by 2009) of CHI papers published between 2004-2007 that were nominated for a best paper award (n=64), received one (n=12), or were part of a random sample of papers that did not (n=76).

My first thought was that perhaps CHI is different from the rest of computing. However, when I looked at the data from Bartneck and Hu’s 2009 study—conveniently included as a figure in their original paper—I could see that they did find a higher mean among the award recipients compared to both nominees and non-nominees. The entire distribution of citations among award winners appears to be pushed upwards. Although Bartneck and Hu found an effect, they did not find a statistically significant effect.

Given the more recent work by Wainer et al. and Lee, I’d be willing to venture that the original null finding was a function of the fact that citations are a very noisy measure—especially over a 2-5 year post-publication period—and that the Bartneck and Hu dataset was small, with only 12 awardees out of 152 papers total. This might have caused problems because the statistical test the authors used was an omnibus test for differences in a three-group sample that was imbalanced heavily toward the two groups (nominees and non-nominees) in which there appears to be little difference. My bet is that the paper’s conclusion on awards is simply an example of how a null effect is not evidence of a non-effect—especially in an underpowered dataset.
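
To see why a small, imbalanced sample could plausibly produce a null result, here is a quick simulation sketch using Bartneck and Hu’s group sizes (12, 64, and 76), an omnibus test (Kruskal-Wallis, as one example), and an invented effect size. It simply estimates how often such a test detects a real difference of that size; the distributions are made up.

```python
# Sketch: power of a three-group omnibus test with only 12 awardees.
# Effect size and citation distributions are invented for illustration.
import numpy as np
from scipy.stats import kruskal

rng = np.random.default_rng(2)
n_sims, rejections = 2000, 0
for _ in range(n_sims):
    winners = rng.negative_binomial(5, 0.12, 12)    # truly higher mean citations
    nominees = rng.negative_binomial(5, 0.15, 64)
    others = rng.negative_binomial(5, 0.15, 76)
    _, p = kruskal(winners, nominees, others)
    rejections += p < 0.05

print(f"Estimated power at alpha = 0.05: {rejections / n_sims:.2f}")
```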

Of course, none of this means that award winning papers are better. Despite Wainer et al.’s claim that they are showing that award winning papers are “good,” none of the analyses presented can disentangle the signalling value of an award from differences in underlying paper quality. The packed rooms one routinely finds at best paper sessions at conferences suggest that at least some additional citations received by award winners might be caused by extra exposure caused by the awards themselves. In the future, perhaps people can say something along these lines instead of repeating the “fact” of the non-relationship.


This post was originally posted on Benjamin Mako Hill’s blog Copyrighteous.

What we lose when we move from social to market exchange

Couchsurfing and Airbnb are websites that connect people with an extra guest room or couch with random strangers on the Internet who are looking for a place to stay. Although Couchsurfing predates Airbnb by about five years, the two sites are designed to help people do the same basic thing and they work in extremely similar ways. They differ, however, in one crucial respect. On Couchsurfing, the exchange of money in return for hosting is explicitly banned. In other words, Couchsurfing only supports the social exchange of hospitality. On Airbnb, users must use money: the website is a market on which people can buy and sell hospitality.

Comparison of yearly sign-ups of trusted hosts on Couchsurfing and Airbnb. Hosts are “trusted” when they have any form of references or verification in Couchsurfing and at least one review in Airbnb.

The figure above compares the number of people with at least some trust or verification on both Couchsurfing and Airbnb based on when each user signed up. The picture, as I have argued elsewhere, reflects a broader pattern that has occurred on the web over the last 15 years. Increasingly, social-based systems of production and exchange, many like Couchsurfing created during the first decade of the Internet boom, are being supplanted and eclipsed by similar market-based players like Airbnb.

In a paper led by Max Klein that was recently published and will be presented at the ACM Conference on Computer-supported Cooperative Work and Social Computing (CSCW), which will be held in Jersey City in early November 2018, we sought to provide a window into what this change means and what might be at stake. At the core of our research was a set of interviews we conducted with “dual-users” (i.e., users experienced on both Couchsurfing and Airbnb). Analyses of these interviews pointed to three major differences, which we explored quantitatively using public data on the two sites.

First, we found that users felt that hosting on Airbnb requires higher-quality services than hosting on Couchsurfing. For example, we found that people who at some point only hosted on Couchsurfing often said that they did not host on Airbnb because they felt that their homes weren’t of sufficient quality. One participant explained that:

“I always wanted to host on Airbnb but I didn’t actually have a bedroom that I felt would be sufficient for guests who are paying for it.”

Another interviewee said:

“If I were to be paying for it, I’d expect a nice stay. This is why I never Airbnb-hosted before, because recently I couldn’t enable that [kind of hosting].”

We conducted a quantitative analysis of rates of Airbnb and Couchsurfing hosting in different cities in the United States and found that median home prices are positively related to the number of per capita Airbnb hosts and negatively related to the number of per capita Couchsurfing hosts. Our exploratory models predicted that for each $100,000 increase in median house price in a city, there would be about 43.4 more Airbnb hosts per 100,000 citizens and 3.8 fewer hosts on Couchsurfing.
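
For readers curious what this kind of exploratory model looks like, here is an illustrative sketch (not our actual analysis code). The cities and numbers are invented; the point is just the shape of the analysis: per capita host counts regressed on median home price.

```python
# Sketch: city-level OLS of per capita hosts on median home price.
# All data below are invented for illustration.
import pandas as pd
import statsmodels.formula.api as smf

cities = pd.DataFrame({
    "median_price_100k":     [2.5, 4.1, 7.8, 3.2, 6.0],   # price in $100,000s
    "airbnb_hosts_per_100k": [180, 260, 420, 210, 350],
    "cs_hosts_per_100k":     [95, 88, 70, 90, 78],
})

airbnb_fit = smf.ols("airbnb_hosts_per_100k ~ median_price_100k", data=cities).fit()
cs_fit = smf.ols("cs_hosts_per_100k ~ median_price_100k", data=cities).fit()

# Each coefficient is the change in hosts per 100,000 residents associated
# with a $100,000 increase in median home price.
print(airbnb_fit.params["median_price_100k"], cs_fit.params["median_price_100k"])
```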

A second major theme we identified was that, while Couchsurfing emphasizes people, Airbnb places more emphasis on places. One of our participants explained:

“People who go on Airbnb, they are looking for a specific goal, a specific service, expecting the place is going to be clean […] the water isn’t leaking from the sink. I know people who do Couchsurfing even though they could definitely afford to use Airbnb every time they travel, because they want that human experience.”

In a follow-up quantitative analysis of the profile text of hosts on the two websites, conducted with a commonly used text analysis system called LIWC, we found that, compared to Couchsurfing, a lower proportion of words in Airbnb profiles were classified as being about people while a larger proportion of words were classified as being about places.
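
At its core, this kind of analysis measures the share of profile words that fall into dictionary categories. LIWC’s dictionaries are proprietary, so the tiny word lists in the sketch below are stand-ins for illustration only; this is not our analysis pipeline.

```python
# Sketch: LIWC-style category proportions with toy stand-in dictionaries.
import re

CATEGORIES = {
    "people": {"friend", "friends", "people", "guest", "guests", "family", "we", "you"},
    "places": {"apartment", "room", "city", "neighborhood", "home", "space"},
}

def category_proportions(profile_text):
    """Return the share of words in each category for one profile."""
    words = re.findall(r"[a-z']+", profile_text.lower())
    total = len(words) or 1
    return {category: sum(word in vocabulary for word in words) / total
            for category, vocabulary in CATEGORIES.items()}

print(category_proportions("Cozy room in a quiet neighborhood near the city center"))
print(category_proportions("We love hosting people and meeting new friends"))
```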

Finally, our research suggested that although hosts are the powerful parties in exchange on Couchsurfing, social power shifts from hosts to guests on Airbnb. Reflecting a much broader theme in our interviews, one of our participants expressed this concisely, saying:

“On Airbnb the host is trying to attract the guest, whereas on Couchsurfing, it works the other way round. It’s the guest that has to make an effort for the host to accept them.”

Previous research on Airbnb has shown that guests tend to give their hosts lower ratings than vice versa. Sociologists have suggested that this asymmetry in ratings will tend to reflect the direction of underlying social power balances.

Average sentiment score of reviews in Airbnb and Couchsurfing, separated by direction (guest-to-host or host-to-guest). Error bars show the 95% confidence interval.

We both replicated this finding from previous work and found that, as suggested in our interviews, the relationship is reversed on Couchsurfing. As shown in the figure above, we found that Airbnb guests will typically give a less positive review to their host than vice versa, while on Couchsurfing guests will typically give a more positive review to their host than they receive.
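
The comparison in the figure boils down to group means with confidence intervals. Here is a minimal sketch with fabricated sentiment scores (our real scores came from the review text on the two sites), grouping by platform and review direction and computing a normal-approximation 95% interval.

```python
# Sketch: mean review sentiment by platform and direction with 95% CIs.
# Sentiment values are fabricated for illustration.
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
reviews = pd.DataFrame({
    "platform": np.repeat(["airbnb", "couchsurfing"], 200),
    "direction": np.tile(np.repeat(["guest_to_host", "host_to_guest"], 100), 2),
    "sentiment": rng.normal(0.6, 0.2, 400),
})

summary = reviews.groupby(["platform", "direction"])["sentiment"].agg(["mean", "sem"])
summary["ci_low"] = summary["mean"] - 1.96 * summary["sem"]
summary["ci_high"] = summary["mean"] + 1.96 * summary["sem"]
print(summary)
```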

As Internet-based hospitality shifts from social systems to the market, we hope that our paper can point to some of what is changing and some of what is lost. For example, our first result suggests that less wealthy participants may be cut out by market-based platforms. Our second theme suggests a shift toward less human-focused modes of interaction brought on by increased “marketization.” We see the third theme as providing something of a silver lining: shifting power toward guests was seen by some of our participants as a positive change in terms of safety and trust. Travelers in unfamiliar places are often vulnerable, and shifting power toward guests can be helpful.

Although our study is only of Couchsurfing and Airbnb, we believe that the shift away from social exchange and toward markets has broad implications across the sharing economy. We end our paper by speculating a little about the generalizability of our results. I have recently spoken at much more length about the underlying dynamics driving the shift we describe in my recent LibrePlanet keynote address.

More details are available in our paper which we have made available as a preprint on our website. The final version is behind a paywall in the ACM digital library.


This blog post, and paper that it describes, is a collaborative project by Maximilian Klein, Jinhao Zhao, Jiajun Ni, Isaac Johnson, Benjamin Mako Hill, and Haiyi Zhu. Versions of this blog post were posted on several of our personal and institutional websites. Support came from GroupLens Research at the University of Minnesota and the Department of Communication at the University of Washington.

Shannon’s Ghost

I’m spending the 2018-2019 academic year as a fellow at the Center for Advanced Study in the Behavioral Sciences (CASBS) at Stanford.

Claude Shannon on a bicycle.

Every CASBS study is labeled with a list of  “ghosts” who previously occupied the study. This year, I’m spending the year in Study 50 where I’m haunted by an incredible cast that includes many people whose scholarship has influenced and inspired me.

The top part of the list of ghosts in Study #50 at CASBS.

Foremost among this group is Study 50’s third occupant: Claude Shannon.

Shannon’s master’s thesis, written when he was 21 years old and sometimes cited as the most important master’s thesis in history, proved that electrical circuits could encode any relationship expressible in Boolean logic and opened the door to digital computing. Incredibly, this is almost never cited as Shannon’s most important contribution. That came in 1948 when he published a paper titled “A Mathematical Theory of Communication” which effectively created the field of information theory. Less than a decade after its publication, Aleksandr Khinchin (the mathematician behind my favorite mathematical constant) described the paper saying:

Rarely does it happen in mathematics that a new discipline achieves the character of a mature and developed scientific theory in the first investigation devoted to it…So it was with information theory after the work of Shannon.

As someone whose own research is seeking to advance computation and mathematical study of communication, I find it incredibly propitious to be sharing a study with Shannon.

Although I teach in a communication department, I know Shannon from my background in computing. I’ve always found it curious that, despite the fact that Shannon’s 1948 paper is almost certainly the most important single thing ever published with the word “communication” in its title, Shannon is rarely taught in communication curricula and is sometimes completely unknown to communication scholars.

In this regard, I’ve thought a lot about this passage in Robert Craig’s influential article “Communication Theory as a Field,” which argued:

In establishing itself under the banner of communication, the discipline staked an academic claim to the entire field of communication theory and research—a very big claim indeed, since communication had already been widely studied and theorized. Peters writes that communication research became “an intellectual Taiwan-claiming to be all of China when, in fact, it was isolated on a small island” (p. 545). Perhaps the most egregious case involved Shannon’s mathematical theory of information (Shannon & Weaver, 1948), which communication scholars touted as evidence of their field’s potential scientific status even though they had nothing whatever to do with creating it, often poorly understood it, and seldom found any real use for it in their research.

In preparation for moving into Study 50, I read a new biography of Shannon by Jimmy Soni and Rob Goodman and was excited to find that Craig—although accurately describing many communication scholars’ lack of familiarity—almost certainly understated the importance of Shannon to communication scholarship.

For example, the book form of Shannon’s 1948 article was published by the University of Illinois Press at the urging of, and under the editorial supervision of, Wilbur Schramm (one of the founders of modern mass communication scholarship), who was a major proponent of Shannon’s work. Everett Rogers (another giant in communication) devotes a chapter of his “History of Communication Studies”² to Shannon and to tracing his impact in communication. Both Schramm and Rogers built on Shannon in parts of their own work. Shannon has had an enormous impact, it turns out, in several subareas of communication research (e.g., attempts to model communication processes).

Although I find these connections exciting, my own research—like most of the rest of communication—is far from the substance of the technical communication processes at the center of Shannon’s own work. In this sense, it can be a challenge to explain to my colleagues in communication—and to my fellow CASBS fellows—why I’m so excited to be sharing a space with Shannon this year.

Upon reflection, I think it boils down to two reasons:

  1. Shannon’s work is both mathematically beautiful and incredibly useful. His seminal 1948 article points to concrete ways that his theory can be useful in communication engineering including in compression, error correcting codes, and cryptography. Shannon’s focus on research that pushes forward the most basic type of basic research while remaining dedicated to developing solutions to real problems is a rare trait that I want to feature in my own scholarship.
  2. Shannon was incredibly playful. Shannon played games, juggled constantly, and was always seeking to teach others to do so. He tinkered, rode unicycles, built a flame-throwing trumpet, and so on. With Marvin Minsky, he invented the “ultimate machine”—a machine whose only function is to turn itself off—which he kept on his desk.

    A version of Shannon’s “ultimate machine” that is sitting on my desk at CASBS.

I have no misapprehension that I will accomplish anything like Shannon’s greatest intellectual achievements during my year at CASBS. I do hope to be inspired by Shannon’s creativity, focus on impact, and playfulness. In my own little ways, I hope to build something at CASBS that will advance mathematical and computational theory in communication in ways that Shannon might have appreciated.


  1. Incredibly, the year that Shannon was in Study 50, his neighbor in Study 51 was Milton Friedman. Two thoughts: (i) Can you imagine?! (ii) I definitely chose the right study!
  2. Rogers’ book was written, I found out, during his own stint at CASBS. Alas, it was not written in Study 50.

This post was also published on Benjamin Mako Hill’s blog.

Sayamindu Dasgupta Joining the University of North Carolina Faculty

The School of Information and Library Science (SILS) at the University of North Carolina at Chapel Hill announced this week that the Community Data Science Collective’s very own Sayamindu Dasgupta will be joining their faculty as a tenure-track assistant professor. The announcement from SILS has much more detail and makes very clear that UNC is thrilled to have him join their faculty.

UNC has every reason to be excited. Sayamindu has been making our research collective look good for several years. Much of this is obvious in the pile of papers and awards he’s built. In less visible roles, Sayamindu has helped us build infrastructure, mentored graduate and undergraduate students in the group, and has basically just been a joy to have around.

Those of us who work in the Community Data Lab at UW are going to miss having Sayamindu around. Chapel Hill is very, very lucky to have him.

Community Data Science Collective at ICA 2018 in Prague

Jeremy Foote, Nate TeBlunthuis, and Mako Hill are in Prague this week for the  International Communication Association’s 2018 annual meeting.

ICA 2018 (Prague)

The collective has three things on the conference program this year:

  • Fri, May 25, 9:30 to 10:45, Hilton Prague, LL, Vienna: An Agent-Based Model of Online Community Joining as part of the Computational Methods section paper session on “Agent-Based Modeling for Communication Research” — Jeremy Foote (presenting), Benjamin Mako Hill and Nathan TeBlunthuis
  • Fri, May 25, 12:30 to 13:45, Hilton Prague, LL, Congress Hall II – Exhibit Hall/Posters: Revisiting ‘The Rise and Decline’ in a Population of Peer Production Projects as part of the Information Systems section’s poster session “ICA Interactive Paper/Poster Session I” —Nathan TeBlunthuis (presenting), Aaron Shaw, and Benjamin Mako Hill
  • Mon, May 28, 9:30 to 10:45, Hilton Prague, M, Palmovka: Theory Building Beyond Communities: Population-Level Research in the Computational Methods section’s panel on “Communication in the Networked Age: A Discussion of Theory Building through Data-Driven Research” — Benjamin Mako Hill (presenting) and Aaron Shaw

We look forward to sharing our research and socializing with you at ICA! Please be in touch if you’re around and want to meet up!

Open Lab at the University of Washington

If you are at the University of Washington (or not at UW but in Seattle) and are interested in seeing what we’re up to, you can join us for a Community Data Science Collective “open lab” this Friday (April 6th) 3-5pm in our new lab space (CMU 306). Collective members from Northwestern University will be in town as well, so there’s even more reason to come!

The open lab is an opportunity to learn about our research, catch up over snacks and beverages, and pick up a sticker or two. We will have no presentations but several posters describing projects we are working on.

OpenSym 2017 Program Postmortem

The International Symposium on Open Collaboration (OpenSym, formerly WikiSym) is the premier academic venue exclusively focused on scholarly research into open collaboration. OpenSym is an ACM conference which means that, like conferences in computer science, it’s really more like a journal that gets published once a year than it is like most social science conferences. The “journal”, in this case, is called the Proceedings of the International Symposium on Open Collaboration and it consists of final copies of papers which are typically also presented at the conference. Like journal articles, papers that are published in the proceedings are not typically published elsewhere.

Along with Claudia Müller-Birn from the Freie Universität Berlin, I served as the Program Chair for OpenSym 2017. For the social scientists reading this, the role of program chair is similar to being an editor for a journal. My job was not to organize keynotes or logistics at the conference—that is the job of the General Chair. Indeed, in the end I didn’t even attend the conference! Along with Claudia, my role as Program Chair was to recruit submissions, recruit reviewers, coordinate and manage the review process, make final decisions on papers, and ensure that everything makes it into the published proceedings in good shape.

In OpenSym 2017, we made several changes to the way the conference has been run:

  • In previous years, OpenSym had tracks on topics like free/open source software, wikis, open innovation, open education, and so on. In 2017, we used a single track model.
  • Because we eliminated tracks, we also eliminated track-level chairs. Instead, we appointed Associate Chairs or ACs.
  • We eliminated page limits and the distinction between full papers and notes.
  • We allowed authors to write rebuttals before reviews were finalized. Reviewers and ACs were allowed to modify their reviews and decisions based on rebuttals.
  • To assist in assigning papers to ACs and reviewers, we made extensive use of bidding. This means we had to recruit the pool of reviewers before papers were submitted.

Although each of these things has been tried in other conferences, or even piloted within individual tracks in OpenSym, all were new to OpenSym in general.

Overview

Statistics

  • Papers submitted: 44
  • Papers accepted: 20
  • Acceptance rate: 45%
  • Posters submitted: 2
  • Posters presented: 9
  • Associate Chairs: 8
  • PC members: 59
  • Authors: 108
  • Author countries: 20

The program was similar in size to the ones in the last 2-3 years in terms of the number of submissions. OpenSym is a small but mature and stable venue for research on open collaboration. This year was also similar, although slightly more competitive, in terms of the conference acceptance rate (45%—it had been slightly above 50% in previous years).

As in recent years, there were more posters presented than submitted because the PC found that some rejected work, although not ready to be published in the proceedings, was promising and advanced enough to be presented as a poster at the conference. Authors of posters submitted 4-page extended abstracts for their projects which were published in a “Companion to the Proceedings.”

Topics

Over the years, OpenSym has established a clear set of niches. Although we eliminated tracks, we asked authors to choose from a set of categories when submitting their work. These categories are similar to the tracks at OpenSym 2016. Interestingly, a number of authors selected more than one category. This would have led to difficult decisions in the old track-based system.

distribution of papers across topics with breakdown by accept/poster/reject

The figure above shows a breakdown of papers in terms of these categories as well as indicators of how many papers in each group were accepted. Papers in multiple categories are counted multiple times. Research on FLOSS and Wikimedia/Wikipedia continues to make up a sizable chunk of OpenSym’s submissions and publications. That said, these now make up a minority of total submissions. Although Wikipedia and Wikimedia research made up a smaller proportion of the submission pool, it was accepted at a higher rate. Also notable is the fact that 2017 saw an uptick in the number of papers on open innovation. I suspect this was due, at least in part, to the involvement of General Chair Lorraine Morgan (she specializes in that area). Somewhat surprisingly to me, we had a number of submissions about Bitcoin and blockchains. These are natural areas of growth for OpenSym but have never been a big part of work in our community in the past.

Scores and Reviews

As in previous years, review was single blind: reviewers’ identities were hidden but authors’ identities were not. Each paper received between 3 and 4 reviews plus a metareview by the Associate Chair assigned to the paper. All papers received 3 reviews, but ACs were encouraged to call in a 4th reviewer at any point in the process. In addition to the text of the reviews, we used a -3 to +3 scoring system where papers seen as borderline were scored as 0. Reviewers scored papers using full-point increments.

scores for each paper submitted to opensym 2017: average, distribution, etc

The figure above shows scores for each paper submitted. The vertical grey lines reflect the distribution of scores where the minimum and maximum scores for each paper are the ends of the lines. The colored dots show the arithmetic mean for each score (unweighted by reviewer confidence). Colors show whether the papers were accepted, rejected, or presented as a poster. It’s important to keep in mind that two papers were submitted as posters.

Although Associate Chairs made the final decisions on a case-by-case basis, every paper that had an average score of less than 0 (the horizontal orange line) was rejected or presented as a poster, and most (but not all) papers with positive average scores were accepted. Although a positive average score seemed to be a requirement for publication, negative individual scores weren’t necessarily showstoppers. We accepted 6 papers with at least one negative score. We ultimately accepted 20 papers—45% of those submitted.
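
For the curious, the summary behind the figure and the decision pattern above amounts to computing an unweighted mean score per paper and flagging papers whose mean falls below zero. The real review data isn’t public, so the scores in this small sketch are invented.

```python
# Sketch: per-paper score summaries on the -3..+3 scale, with invented data.
import pandas as pd

reviews = pd.DataFrame({
    "paper_id": [1, 1, 1, 2, 2, 2, 2, 3, 3, 3],
    "score":    [2, 1, 0, -1, 0, 1, -2, 3, 2, 2],
})

per_paper = reviews.groupby("paper_id")["score"].agg(["mean", "min", "max"])
# Papers whose unweighted average fell below zero were rejected or became posters.
per_paper["below_zero"] = per_paper["mean"] < 0
print(per_paper)
```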

Rebuttals

This was the first time that OpenSym used a rebuttal or author response process and we are thrilled with how it went. Although rebuttals were entirely optional, almost every team of authors used them! Authors of 40 of our 46 submissions (87%!) submitted rebuttals.

Change in average score after rebuttals:

  • Lower: 6
  • Unchanged: 24
  • Higher: 10

The summary above shows how average scores changed after authors submitted rebuttals. Rebuttals’ effect was typically neutral or positive: most average scores stayed the same, but nearly two times as many average scores increased as decreased in the post-rebuttal period. We hope that this made the process feel more fair for authors and I feel, having read them all, that it led to improvements in the quality of final papers.

Page Lengths

In previous years, OpenSym followed most other venues in computer science by allowing submission of two kinds of papers: full papers which could be up to 10 pages long and short papers which could be up to 4. Following some other conferences, we eliminated page limits altogether. This is the text we used in the OpenSym 2017 CFP:

There is no minimum or maximum length for submitted papers. Rather, reviewers will be instructed to weigh the contribution of a paper relative to its length. Papers should report research thoroughly but succinctly: brevity is a virtue. A typical length of a “long research paper” is 10 pages (formerly the maximum length limit and the limit on OpenSym tracks), but may be shorter if the contribution can be described and supported in fewer pages— shorter, more focused papers (called “short research papers” previously) are encouraged and will be reviewed like any other paper. While we will review papers longer than 10 pages, the contribution must warrant the extra length. Reviewers will be instructed to reject papers whose length is incommensurate with the size of their contribution.

The following graph shows the distribution of page lengths across papers in our final program.

histogram of paper lengths for final accepted papers

In the end, 3 of 20 published papers (15%) were over 10 pages. More surprisingly, 11 of the accepted papers (55%) were below the old 10-page limit. Fears that some have expressed that page limits are the only thing keeping OpenSym from publishing enormous rambling manuscripts seem to be unwarranted—at least so far.

Bidding

Although I won’t post any analysis or graphs, bidding worked well. With only two exceptions, every single assigned review went to someone who had bid “yes” or “maybe” for the paper in question, and the vast majority went to people who had bid “yes.” However, this comes with one major proviso: people who did not bid at all were marked as “maybe” for every single paper.

Given a reviewer pool whose diversity of expertise matches that of your pool of authors, bidding works fantastically. But everybody needs to bid. The only problems with reviewers we had were with people who had failed to bid. It might be that reviewers who don’t bid are less committed to the conference, more overextended, more likely to drop things in general, etc. It might also be that reviewers who fail to bid get poor matches, which causes them to become less interested, willing, or able to do their reviews well and on time.

Having used bidding twice as chair or track-chair, my sense is that bidding is a fantastic thing to incorporate into any conference review process. The major limitations are that you need to build a program committee (PC) before the conference (rather than finding the perfect reviewers for specific papers) and you have to find ways to incentivize or communicate the importance of getting your PC members to bid.
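
EasyChair handled the actual assignments, but the idea of honoring bids can be illustrated with a toy greedy routine like the sketch below: prefer reviewers who bid “yes,” fall back to “maybe,” and balance load. The reviewer names, paper IDs, and bids are all invented.

```python
# Sketch: a toy greedy, bid-respecting review assignment (not EasyChair's algorithm).
from collections import defaultdict

# Invented bids: reviewer -> {paper: "yes" | "maybe"}.
bids = {
    "reviewer_a": {"p1": "yes", "p2": "maybe"},
    "reviewer_b": {"p1": "yes", "p3": "yes"},
    "reviewer_c": {"p2": "yes", "p3": "maybe"},
}
papers = ["p1", "p2", "p3"]
reviews_per_paper = 2

load = defaultdict(int)           # reviews assigned to each reviewer so far
assignment = defaultdict(list)
for paper in papers:
    # Prefer "yes" bids over "maybe" bids, breaking ties by current load.
    candidates = sorted(
        (r for r, b in bids.items() if paper in b),
        key=lambda r: ({"yes": 0, "maybe": 1}[bids[r][paper]], load[r]),
    )
    for reviewer in candidates[:reviews_per_paper]:
        assignment[paper].append(reviewer)
        load[reviewer] += 1

print(dict(assignment))
```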

Conclusions

The final results were a fantastic collection of published papers. Of course, it couldn’t have been possible without the huge collection of conference chairs, associate chairs, program committee members, external reviewers, and staff supporters.

Although we tried quite a lot of new things, my sense is that nothing we changed made things worse and many changes made things smoother or better. Although I’m not directly involved in organizing OpenSym 2018, I am on the OpenSym steering committee. My sense is that most of the changes we made are going to be carried over this year.

Finally, it’s also been announced that OpenSym 2018 will be in Paris on August 22-24. The call for papers should be out soon and the OpenSym 2018 paper deadline has already been announced as March 15, 2018. You should consider submitting! I hope to see you in Paris!

This Analysis

OpenSym used the gratis version of EasyChair to manage the conference, which doesn’t allow chairs to export data. As a result, data used in this postmortem was scraped from EasyChair using two Python scripts. Numbers and graphs were created using a knitr file that combines R visualization and analysis code with markdown to create the HTML directly from the datasets. I’ve made all the code I used to produce this analysis available in this git repository. I hope someone else finds it useful. Because the data contains sensitive information on the review process, I’m not publishing the data.

OpenSym 2017 Program Published

A few hours ago, OpenSym 2017 kicked off in Galway. For those that don’t know, OpenSym is the International Symposium on Open Collaboration (it was called WikiSym until 2014). It’s the premier academic venue focused on research on wikis, open collaboration, and peer production.

This year, Claudia Müller-Birn and I served as co-chairs of the academic program. Acting as program chair for an ACM conference like OpenSym is more like being a journal editor than a conference organizer. Claudia and I drafted and publicized a call for papers, recruited Associate Chairs and members of a program committee who would review papers and make decisions, coordinated reviews and final decisions, elicited author responses, sent tons of email to notify everybody about everything, and dealt with problems as they came up. It was a lot of work! With the schedule set, and the proceedings now online, our job is officially over!

OpenSym reviewed 43 papers this year and accepted 20 giving the conference a 46.5% acceptance rate. This is similar to both the number of submissions and the acceptance rates for previous years.

In addition to papers, we received 3 extended abstracts for posters for the academic program and accepted 1. There were an additional 7 promising papers that were not accepted but whose authors were invited to present posters and who will be doing so at the conference. The authors of posters will have extended abstracts about their posters published in the non-archival companion proceedings.

The list of papers being published and presented at OpenSym includes:

The following extended abstracts for posters will be published in the companion to the proceedings:

There was also a doctoral consortium and a non-academic “industry track” which Claudia and I weren’t involved in coordinating.

As part of running the program, we tried a bunch of new things this year including:

  • A move away from separate tracks back to a single combined model with Associate Chairs.
  • Bidding for papers among both Associate Chairs and normal PC members.
  • An author rebuttal/response period where authors got to respond to reviews and reviewers.
  • An elimination of page limits for papers. This meant that the category of notes also disappeared. Reviewers were instructed to evaluate the degree to which papers’ contributions were commensurate to their length.

I’m working on a longer post that will evaluate these changes. Until then, enjoy Galway if you were lucky enough to be there. If you couldn’t make it, enjoy the proceedings online!

You can learn more about OpenSym from its Wikipedia article or on the OpenSym website. You can find details on the schedule and the program itself at its temporary home on the OpenSym website. I’ll update this page with a link to the ACM Digital Library page when it gets posted.