Update on the COVID-19 Digital Observatory

A few months ago we announced the launch of a COVID-19 Digital Observatory in collaboration with Pushshift and with funding from Protocol Labs. As part of this effort over the last several months, we have aggregated and published public data from multiple online communities and platforms. We’ve also been hard at work adding a series of new data sources that we plan to release in the near future.

Transmission electron microscope image of SARS-CoV-2—also known as 2019-nCoV, the not-so-novel-anymore virus that causes COVID-19 (Source: NIH NIAID via Wikimedia Commons, cc-sa 2.0)

More specifically, we have been gathering Search Engine Results Page (SERP) data on a range of COVID-19-related terms on a daily basis. This SERP data is drawn from both Bing and Google and has grown to encompass nearly 300GB of compressed data from four months of daily search engine results, with both PC and mobile results from nearly 500 different queries each day.

We have also continued to gather and publish revision and pageview data for COVID-related pages on English Wikipedia, which now includes approximately 22GB of highly compressed data (several dozen gigabytes of compressed revision data each day) from nearly 1,800 different articles, a list that has been growing over time.

In addition, we are preparing releases of COVID-related data from Reddit and Twitter. We are almost done with two datasets from Reddit: a first one that includes all posts and comments from COVID-related subreddits, and a second that includes all posts or comments which include any of a set of COVID-related terms.

For the Twitter data, we are working out details of what exactly we will be able to release, but we anticipate including Tweet IDs and metadata for tweets that include COVID-related terms as well as those associated with hashtags and terms we’ve identified in some of the other data collection. We’re also designing a set of random samples of COVID-related Twitter content that will be useful for a range of projects.

In conjunction with these dataset releases, we have published all of the code to create the datasets as well as a few example scripts to help people learn how to load and access the data we’ve collected. We aim to extend these example analysis scripts in the future as more of the data comes online.

We hope you will take a look at the material we have been releasing and find ways to use it, extend it, or suggest improvements! We are always looking for feedback, input, and help. If you have a COVID-related dataset that you’d like us to publish, or if you would like to write code or documentation, please get in touch!

All of the data, code, and other resources are linked from the project homepage. To receive further updates on the digital observatory, you can also subscribe to our low traffic announcement mailing list.

Are Vandals Rational?

Although Wikipedia is the encyclopedia that anybody can edit, not all edits are welcome. Wikipedia is subject to a constant deluge of vandalism. Random people on the Internet are constantly “blanking” Wikipedia articles by deleting their content, replacing the text of articles with random characters, inserting outlandish claims or insults, and so on. Although volunteer editors and bots do an excellent job of quickly reverting the damage, the cost in terms of volunteer time is real.

Why do people spend their time and energy vandalizing web pages? For readers of Wikipedia that encounter a page that has been marred or replaced with nonsense or a slur—and especially for all the Wikipedia contributors who spend their time fighting back the tide of vandalism by checking and reverting bad edits and maintaining the bots and systems that keep order—it’s easy to dismiss vandals as incomprehensible sociopaths.

In a paper I just published in the ACM International Conference on Social Media and Society, I systematically analyzed a dataset of Wikipedia vandalism in an effort to identify different types of Wikipedia vandalism and to explain how each can be seen as “rational” from the point of view of the vandal.

You can see Kaylea present this work via a 5-minute YouTube talk.

Leveraging a dataset we created in some of our other work, the study used a random sample of contributions drawn from four groups that vary in the degree to which the editors in question can be identified by others in Wikipedia: established users with accounts, users with accounts making their first edits, users without accounts, and users of the Tor privacy tool. Tor users were of particular interest to me because the use of Tor offers concrete evidence that a contributor is deliberately seeking privacy. I compared the frequency of vandalism in each group, developed an ontology to categorize it, and tested the relationship between group membership and different types of vandalism.
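
For readers who want a feel for what testing a relationship between group membership and vandalism can look like, here is a minimal sketch in Python. It assumes a chi-square test of independence, which is one common choice and not necessarily the test used in the paper. The function name and the counts in the example table are hypothetical placeholders, not results from the study.

```python
def chi_square_statistic(table):
    """Return (statistic, degrees_of_freedom) for a table of counts.

    `table` is a list of rows, one per group, where each row holds the
    number of edits falling into each category. Assumes every row total
    and every column total is greater than zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count we would expect if group and category were unrelated.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# PLACEHOLDER counts, invented purely for illustration.
# Rows: four hypothetical editor groups; columns: [vandalism, not vandalism].
placeholder_counts = [
    [5, 95],
    [10, 90],
    [20, 80],
    [15, 85],
]
statistic, dof = chi_square_statistic(placeholder_counts)
print(statistic, dof)
```

A larger statistic, relative to the chi-square distribution with the reported degrees of freedom, suggests that the mix of categories differs across groups more than chance alone would explain.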

Vandalism in a university bathroom. [“Whiteboard Revisited.” Quinn Dombrowski. via flickr, CC BY-SA 2.0]

I found that the group that had engaged in the least effort in order to edit—users without accounts—were the most likely to vandalize. Although privacy-seeking Tor contributors were not the most likely to vandalize, vandalism from Tor-based contributors was less likely to be sociable, was more likely to be large scale (i.e. large blocks of text, such as by pasting in the same lines over and over), and more likely to express frustration with the Wikipedia community.

Thinking systematically about why different groups of users might engage in vandalism can help counter vandalism. Potential interventions might change not just the amount, but also the type, of vandalism a community will receive. Tools to detect vandalism may find that the patterns in each category allow for more accurate targeting. Ultimately, viewing vandals as more than irrational sociopaths opens potential avenues for dialogue.


For more details, check out the full paper which is available as a freely accessible preprint. The project would not have been possible without Chau Tran’s work to develop a dataset of contributions from Tor users. This work was supported by the National Science Foundation (Awards CNS-1703736 and CNS-1703049).

Paper Citation: Kaylea Champion. 2020. “Characterizing Online Vandalism: A Rational Choice Perspective.” In International Conference on Social Media and Society (SMSociety’20). Association for Computing Machinery, New York, NY, USA, 47–57. https://doi.org/10.1145/3400806.3400813

Tor users: An untapped resource for Wikipedia?

An image displaying the message that Tor users typically receive when trying to make edits on Wikipedia, stating that the user’s IP address has been identified as a Tor exit node, and that “editing through Tor is blocked to prevent abuse.”

Like everyone else, Internet users who protect their privacy by using the anonymous browsing software Tor are welcome to read Wikipedia. However, when Tor users try to contribute to the self-described “encyclopedia that anybody can edit,” they typically come face-to-face with a notice explaining that their participation is not welcome.

Our new paper—led by Chau Tran at NYU and authored by a group of researchers from the University of Washington, the Community Data Science Collective, Drexel, and New York University—was published and presented this week at the IEEE Symposium on Security & Privacy and provides insight into what Wikipedia might be missing out on by blocking Tor. By comparing contributions from Tor that slip past Wikipedia’s ban to edits made by other types of contributors, we find that Tor users make contributions to Wikipedia that are just as valuable as those made by new and unregistered Wikipedia editors. We also found that Tor users are more likely to engage with certain controversial topics.

One-minute “Trailer” for our paper and talk at the IEEE Symposium on Security & Privacy. Video was produced by Tommy Ferguson at the UW Department of Communication.

To conduct our study, we first identified more than 11,000 Wikipedia edits made by Tor users who were able to bypass Wikipedia’s ban on contributions from Tor between 2007 and 2018. We then used a series of quantitative techniques to evaluate the quality of these contributions. We found that Tor users made contributions that were similar in quality to, and in some senses even better than, contributions made by other users without accounts and newcomers making their first edits.

An image from the study showing the differences in topics edited by Tor users and other Wikipedia users. The image suggests that Tor users are more likely to edit pages discussing topics such as politics, religion, and technology. Other types of users, including IP, First-time, and Registered editors, are more likely to edit pages discussing topics such as music and sports.

We used a range of analytical techniques including direct parsing of article histories, manual inspections of article changes, and a machine learning platform called ORES to analyze contributions. We also used a machine learning technique called topic modeling to analyze Tor users’ areas of interest by checking their edits against clusters of keywords. We found that Tor-based editors are more likely than other users to focus on topics that may be considered controversial, such as politics, technology, and religion.
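
To make the idea of checking edits against clusters of keywords more concrete, here is a minimal sketch in Python. It is a deliberately simplified stand-in: a real topic model learns its word clusters from the text, whereas the clusters below are hand-written and hypothetical. The names `TOPIC_KEYWORDS`, `tokenize`, and `best_topic` are ours for illustration and do not come from the paper's code.

```python
import re
from collections import Counter

# Hypothetical keyword clusters, written by hand for illustration only.
# A real topic model would learn groupings like these from the data.
TOPIC_KEYWORDS = {
    "politics": {"election", "government", "senate", "party"},
    "religion": {"church", "faith", "islam", "bible"},
    "technology": {"software", "internet", "computer", "encryption"},
    "music": {"album", "band", "song", "guitar"},
    "sports": {"league", "goal", "team", "championship"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def best_topic(text, topics=TOPIC_KEYWORDS):
    """Return the topic whose keywords occur most often, or None if none occur."""
    counts = Counter(tokenize(text))
    scores = {
        topic: sum(counts[word] for word in words)
        for topic, words in topics.items()
    }
    topic, score = max(scores.items(), key=lambda item: item[1])
    return topic if score > 0 else None


print(best_topic("The band released a new album and a hit song"))
```

Counting how often each topic is the best match for edits from each group of editors would give a rough version of the comparison described above.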

In a closely connected study led by Kaylea Champion and published several months ago in the Proceedings of the ACM on Human-Computer Interaction (CSCW), we conducted a forensic qualitative analysis of contributions in the same dataset. Our results in that study are described in a separate blog post about that project and paint a complementary picture of Tor users engaged, in large part, in uncontroversial and quotidian types of editing behavior.

Across the two papers, our results are similar to other work that suggests that Tor users are very similar to other internet users. For example, one previous study has shown that Tor users frequently visit websites in the Alexa top one million.

Much of the discourse about anonymity online tends toward extreme claims backed up by very little in the way of empirical evidence or systematic study. Our work is a step toward remedying this gap and has implications for many websites that limit participation by users of anonymous browsing software like Tor. In the future, we hope to conduct similar systematic studies in contexts beyond Wikipedia.

Video of the conference presentation at the IEEE Symposium on Security & Privacy 2020 by Chau Tran.

In terms of Wikipedia’s own policy decisions about anonymous participation, we believe that our paper suggests that the benefits of a “pathway to legitimacy” for Tor contributors to Wikipedia might exceed the potential harm due to the value of their contributions. We are particularly excited about exploring ways to allow contributions from anonymity-seeking users under certain conditions: for example, requiring review prior to changes going live. Of course, these are questions for the Wikipedia community to decide, but it’s a conversation that we hope our research can inform and that we look forward to participating in.


Authors of the paper, “Are anonymity-seekers just like everybody else? An analysis of contributions to Wikipedia from Tor,” include Chau Tran (NYU), Kaylea Champion (UW & CDSC), Andrea Forte (Drexel), Benjamin Mako Hill (UW & CDSC), and Rachel Greenstadt (NYU). The paper was published at the 2020 IEEE Symposium on Security & Privacy between May 18 and 20. Originally to be held in San Francisco, the event was held digitally due to the COVID-19 pandemic. This blog post borrows with permission from this news release by Andrew Laurent at NYU.

Paper Citation: Tran, Chau, Kaylea Champion, Andrea Forte, Benjamin Mako Hill, and Rachel Greenstadt. “Are Anonymity-Seekers Just like Everybody Else? An Analysis of Contributions to Wikipedia from Tor.” In 2020 IEEE Symposium on Security and Privacy (SP), 1:974–90. San Francisco, California: IEEE Computer Society, 2020. https://doi.org/10.1109/SP40000.2020.00053.

The research was funded by the National Science Foundation.

COVID-19 Digital Observatory awarded Open Innovation Grant from Protocol Labs Research

Last week, Protocol Labs Research announced their COVID-19 Open Innovation Grant recipients and we are thrilled to announce that the Community Data Science Collective’s COVID-19 Digital Observatory is among the awarded projects!

Protocol Labs works to improve internet technologies through open source protocols, systems, and tools. The organization initially grew out of efforts to apply blockchain tools to support distributed file sharing infrastructure. Their research group, Protocol Labs Research, created the COVID-19 Open Innovation Grants program “to surface and support open-source projects working on tools to help humanity through present and future pandemics.”

Among the ten projects supported under the program, others aim to develop open source medical devices (such as an origami respirator!), contact tracing infrastructure, device development and testing, and engineering collaboration. We feel grateful and humbled to be in the company of these diverse efforts to apply open collaboration to the response to COVID-19!

In the case of the COVID-19 Digital Observatory, we plan to use the funds provided by the award to build out the resources we have already started to aggregate and release. In particular, we will build additional infrastructure to process and archive data from Reddit and other social media sources as well as search engine results pages (SERPs) for COVID-related queries.

In addition to folks in the collective, the proposal was successful through the efforts of Jason Baumgartner from Pushshift, who is co-leading the observatory work, as well as Marysia Galent, Research Administrator at Northwestern University, whose expert guidance helped make the grant application possible.

What do people do when they edit Wikipedia through Tor?

A paper recently published at CSCW describes the results of a forensic qualitative analysis of contributions made to Wikipedia through the anonymous browsing system Tor. The project was conducted collaboratively with researchers from Drexel, NYU, and the University of Washington and complements a quantitative analysis of the same data we also published to provide a rich qualitative picture of what anonymity-seekers are trying to do when they contribute to Wikipedia. The work also shows how the ability to stay anonymous can play an important role in facilitating certain types of contributions to online knowledge bases like Wikipedia.

Many individuals use Tor to reduce their visibility to widespread internet surveillance.

Media reports often describe how online platforms are tracking us. That said, trying to live our lives online without leaving a trail of our personal information can be difficult because many services can’t be used without an account and systems that protect privacy are often blocked. One popular approach to protecting our privacy online involves using the Tor network. Tor protects users from being identified by their IP address which can be tied to a physical location. However, if you’d like to contribute to Wikipedia using Tor, you’ll run into a problem. Although most IP addresses can edit without an account, Tor users are blocked from editing.

Tor users attempting to contribute to Wikipedia are shown a screen that informs them that they are not allowed to edit Wikipedia.

Other research by my team has shown that Wikipedia’s attempt to block Tor is imperfect and that some people have been able to edit despite the ban. This work also built a dataset of more than 11,000 contributions made to Wikipedia via Tor and used quantitative analysis to show that the contributions of people using Tor were about the same quality as contributions from other new editors and other contributors without accounts. Of course, given the unusual circumstances Tor-based contributors faced, we wondered if a deeper look into the content of their edits might tell us more about their motives and the kinds of contributions they seek to make. I led a qualitative investigation that sought to explore these questions.

Given the challenges of studying anonymity seekers, we designed a novel “forensic” qualitative approach that was inspired by the techniques common in the practice of computer security as well as criminal investigation. We applied this new technique to a sample of 500 different editing sessions and sorted each session into a category based on what the editor seemed to be intending to do.

Most of the contributions we found fell into one of the two following categories:

  • Many contributions were quotidian attempts to add to the encyclopedia. Tor-based editors added facts, they fixed typos, and they updated train schedules. There’s no way to know if these individuals knew that they were just getting lucky in their ability to edit or if they were patiently reloading to evade the ban.
  • Second, we found harassing comments and vandalism. Unwelcome conduct is common in online environments, and sometimes more common when the likelihood of being identified is decreased. Some of the harassing comments we observed were direct responses to being banned as a Tor user.

Although these two categories accounted for most of what we observed, we also found evidence of several other types of contributor intent:

  • We observed activism, as when a contributor tried to bring attention to journalistic accounts of environmental and human rights abuses being committed by a mining company, only to have editors traceable to the mining company repeatedly remove their edits. Another example included an editor trying to diminish the influence of alternative medicine proponents.
  • We also observed quality maintenance activities when editors used Wikipedia’s rules about appropriate sourcing to remove personal websites being cited in conspiracy theories.
  • We saw edit wars with Tor editors participating in a back-and-forth removal and replacement of content as part of a dispute, in some cases countering the work of an experienced Wikipedia editor whom even other experienced editors had gauged to be biased.
  • Finally, we saw Tor-based editors participating in non-article discussions such as investigations of administrator misconduct, and protesting the mistrust of Tor editors by the Wikipedia platform.

An exploratory mapping of our themes in terms of the value a type of contribution represents to the Wikipedia community and the importance of anonymity in facilitating it. Anonymity protecting tools play a critical role in facilitating contributions on the right side of the figure while edits on the left are more likely to occur even when anonymity is impossible. Contributions toward the top reflect valuable forms of participation in Wikipedia while edits on the bottom reflect damage.

In all, these themes led us to reflect on how the risks that individuals face when contributing to online communities are sometimes out of alignment with the risks the communities face by accepting their work. Expressing minoritized perspectives, maintaining community standards even when you may be targeted by the rulebreaker, highlighting injustice or acting as a whistleblower can be very risky for an individual, and may not be possible without privacy protections. Of course, in platforms seeking to support the public good, such knowledge and accountability may be crucial.


This project was conducted by Kaylea Champion, Nora McDonald, Stephanie Bankes, Joseph Zhang, Rachel Greenstadt, Andrea Forte, and Benjamin Mako Hill. This work was supported by the National Science Foundation (awards CNS-1703736 and CNS-1703049) and included the work of two undergraduates supported through an NSF REU supplement.

Paper Citation: Kaylea Champion, Nora McDonald, Stephanie Bankes, Joseph Zhang, Rachel Greenstadt, Andrea Forte, and Benjamin Mako Hill. 2019. A Forensic Qualitative Analysis of Contributions to Wikipedia from Anonymity Seeking Users. Proceedings of the ACM on Human-Computer Interaction. 3, CSCW, Article 53 (November 2019), 26 pages. https://doi.org/10.1145/3359155

Sohyeon Hwang awarded NSF Graduate Research Fellowship

Congratulations to Sohyeon Hwang, who will be awarded a prestigious Graduate Research Fellowship (a.k.a., GRFP) from the U.S. National Science Foundation!

Sohyeon Hwang standing somewhere.

The award will support Sohyeon’s proposed doctoral research on the complexity of governance practices in online communities. This work will focus on the ways communities heterogeneously fill the gap between rules-as-written (de jure) and rules-as-practiced (de facto) to impact the credibility and effectiveness of online governance work. The main components of this project will center around understanding the significance and role of shared (or conversely, localized) rules across communities; the automated tools utilized by these communities; and how users perceive, experience, and practice heterogeneity in online governance practices.

Sohyeon is a first-year Ph.D. student in the Media, Technology & Society Program at Northwestern, advised by Aaron Shaw, and began working with the Community Data Science Collective last summer. She completed her undergraduate degree at Cornell University, where she double-majored in government and information science, focusing on Cold War era politics in the former and data science in the latter.

Sohyeon is currently pursuing graduate coursework, and her ongoing research includes a project comparing governance across several of the largest language editions of Wikipedia as well as work with Dr. Ágnes Horvát developing a project on multi-platform information spread. Recently, she has also taken a lead role in the efforts by CDSC and Pushshift to create a Digital Observatory for COVID-19 information resources.

Launching the COVID-19 Digital Observatory

The Community Data Science Collective, in collaboration with Pushshift and others, is launching a new collaborative project to create a digital observatory for socially produced COVID-19 information. The observatory has already begun the process of collecting and aggregating public data from multiple online communities and platforms. We are publishing reworked versions of these data in forms that are well-documented and more easily analyzable by researchers with a range of skills and computational resources. We hope that these data will facilitate analysis and interventions to improve the quality of socially produced information and public health.

Transmission electron microscope image of SARS-CoV-2—also known as 2019-nCoV, the virus that causes COVID-19 (Source: NIH NIAID via Wikimedia Commons, cc-sa 2.0).

During crises such as the current COVID-19 pandemic, many people turn to the Internet for information, guidance, and help. Much of what they find is socially produced through online forums, social media, and knowledge bases like Wikipedia. The quality of information in these data sources varies enormously and users of these systems may receive information that is incomplete, misleading, or even dangerous. Efforts to improve this are complicated by difficulties in discovering where people are getting information and in coordinating efforts to focus on refining the more important information sources. There are a number of researchers with the skills and knowledge to address these issues, but who may struggle to gather or process social data. The digital observatory facilitates data collection, access, and analysis.

Our initial release includes several datasets, code used to collect the data, and some simple analysis examples. Details are provided on the project page as well as our public GitHub repository. We will continue adding data, code, analysis, documentation, and more. We also welcome collaborators, pull requests, and other contributions to the project.

What’s the goal for this project?

Our hope is that the public datasets and freely licensed tools, techniques, and knowledge created through the digital observatory will allow researchers, practitioners, and public health officials to more efficiently gather, analyze, understand, and act to improve these crucial sources of information during crises. Ultimately this will support ongoing responses to COVID-19 and contribute to future preparedness to respond to crisis events through analyses conducted after the fact.

How do I get access to the digital observatory?

The digital observatory data, code, and other resources will exist in a few locations, all linked from the project homepage. The data we collect, parse, and publish lives at covid19.communitydata.org/datasets. The code to collect, parse, and output those datasets lives in our GitHub repository, which also includes some scripts for getting started with analysis. We will integrate additional data and data collection resources from Pushshift and adjacent projects as we go. For more information, please check out the project page.

Stay up to date!

To receive updates on the digital observatory, please subscribe to our low traffic announcement mailing list. You will be the first to know about new datasets and other resources (and we won’t use or distribute addresses for any other reason).

Jacobs Fellowship to study new frontier in tech education

This post is a repost of Doug Parry’s article on the UW iSchool News website. The project is being driven by Stefania Druga, who is part of the Community Data Science learning team, and Mako. Jason Yip is a friend of the group.

Partners in the AI Literacy Project funded by Jacobs Fellowship (from top left to right): Jason Yip – Assistant Professor iSchool University of Washington, Stefania Druga – Doctoral Student iSchool University of Washington, Benjamin Mako Hill – Assistant Professor Department of Communication University of Washington, Indra Kubicek – CFO at Kids Code Jeunesse, David Moinina Sengeh – Minister Of Basic and Senior Secondary Education at Government of Sierra Leone, Kate Arthur – Founder & CEO at Kids Code Jeunesse, Michael Preston – co-founder CSforALL & Executive Director of Joan Ganz Cooney Center at Sesame Workshop.

A decade ago, teaching kids to code might have seemed far-fetched to some, but now coding curriculum is being widely adopted across the country. Recently researchers have turned their eye to the next wave of technology: artificial intelligence. As AI makes a growing impact on our lives, can kids benefit from learning how it works?

A three-year, $150,000 award from the Jacobs Foundation Research Fellowship Program will help answer that question. The fellowship awarded to Jason Yip, an assistant professor at the University of Washington Information School, will allow a team of researchers to investigate ways to educate kids about AI.

Stefania Druga, a first-year Ph.D. student advised by Yip, is among the researchers spearheading the effort. Druga came to the iSchool after earning her master’s at the Massachusetts Institute of Technology, where she launched Cognimates, a platform that teaches children how to train AI models and interact with them.

Druga’s desire to take Cognimates to the next level brought her to the University of Washington Information School and to her advisor, Yip, whose KidsTeam UW works with children to design technology. KidsTeam treats children as equal partners in the design process, ensuring the technology meets their needs — an approach known as co-design.

At MIT, “I realized there was only so far we could go,” Druga said. “In order for us to imagine what the future interfaces of AI learning for kids would look like, we need to have this longer-term relationship and partnership with kids, and co-design with kids, which is something Jason and the team here have done very well.”

Built on the widely used Scratch programming language, Cognimates is an open-source platform that gives kids the tools to teach computers how to recognize images and text and play games. Druga hopes the next iteration will help children truly understand the concepts behind AI — what is the robot “thinking” and who taught it to think that way? Even if they don’t grow up to be programmers or software engineers, the generation of “AI natives” will need to understand how technology works in order to be critical users.

“It matters as a new literacy,” Druga said, “especially for new generations who are growing up with technologies that become so embedded in things we use on a regular basis.”

Over the course of the fellowship, the research team will work with international partners to develop an AI literacy educational platform and curriculum in multiple languages for use in different settings, in both more- and less-developed parts of the world.

Partners include Kate Arthur, CEO of Kids Code Jeunesse in Montreal; Michael Preston, executive director of the Joan Ganz Cooney Center at Sesame Workshop; David Sengeh, the minister of basic and secondary education for the government of Sierra Leone; and Benjamin Mako Hill, an assistant professor in the UW Department of Communication.

For Yip, the project brings the work of his Ph.D. student together with his work with KidsTeam and other recent research he has conducted on how families interact with AI.

“For me, it’s a proud moment when an advisee has a really cool vision that we can build together as a team,” Yip said. “This is a nice intersection of all of us coming together and thinking about what families need to understand artificial intelligence.”

The Jacobs Foundation fellowship program is open to early- and mid-career researchers from all scholarly disciplines around the world whose work contributes to the development and living conditions of children and youth. It’s highly competitive, with 10-15 fellowships chosen from hundreds of submissions each year.

If you are interested in getting involved with this project or supporting it in any way, you may contact us at cognimates[a]gmail.com.

Further information about this project is available here: http://cognimates.me/research

Reflections on Janet Fulk and Peter Monge

In May 2019, we were invited to give short remarks on the impact of Janet Fulk and Peter Monge at the International Communication Association’s annual meeting as part of a session called “Igniting a TON (Technology, Organizing, and Networks) of Insights: Recognizing the Contributions of Janet Fulk and Peter Monge in Shaping the Future of Communication Research.”

YouTube: Mako Hill @ Janet Fulk and Peter Monge Celebration at ICA 2019

Mako Hill gave a four-minute talk on Janet and Peter’s impact on the work of the Community Data Science Collective. Mako unpacked some of the cryptic acronyms on the CDSC-UW lab’s whiteboard and explained that our group has a home in the academic field of communication, in no small part, because of the pioneering scholarship of Janet and Peter. You can view the talk in WebM or on YouTube.

Modeling the ecological dynamics of online organizations

Do online communities compete with each other over resources or niches? Do they co-evolve in symbiotic or even parasitic relationships? What insights can we gain by applying ecological models of collective behavior to the study of collaborative online groups?

A colorful Pisaster ochraceus (a.k.a., pisaster), a sea star species whose presence or absence can radically alter the ecology of an intertidal community. Our research will adapt theories created to explain the population dynamics of organisms like the pisaster in the context of online communities and human organizations (photo: Multi-Agency Rocky Intertidal Network).


We are delighted to announce that a Community Data Science Collective (CDSC) team led by Nate TeBlunthuis and Jeremy Foote has just started work on a three-year grant from the U.S. National Science Foundation to study the ecological dynamics of online communities! Aaron Shaw and Benjamin Mako Hill are principal investigators for the grant.

The projects supported by the award will extend the study of peer production and online communities by analyzing how aspects of communities’ environments impact their growth, patterns of participation, and survival. The work draws on recent research on various biological systems, organizational ecology, and human-computer interaction (HCI). In general, we adapt these approaches to inform quantitative and computational analysis of populations of peer production communities and other online organizations.

As a major goal, we want to explain the conditions under which certain ecological dynamics emerge versus when they do not. For example, prior work has suggested that communities interact in ways that are both competitive and mutualistic. But what leads some communities to become competitors and others to benefit each other? We aim to understand when these patterns arise. We are also interested in how community leaders might pursue effective strategies for survival given circumstances in the surrounding environment.

The grant promises to support a number of projects within the CDSC. Nate and Jeremy led the proposal writing as well as two key pilot studies that informed the development of the proposal. Other group members are now involved in planning and developing multiple studies under the grant.

The grant was awarded by the NSF Cyber-Human Systems (CHS) program within the Division of Information and Intelligent Systems (IIS), and the award is shared by Northwestern and the University of Washington (award numbers IIS-1910202 and IIS-1908850).

We’ve published the description of the proposal that we submitted to the NSF, although some details will shift as we carry out the project. The best place to stay up-to-date about the work will be to follow the CDSC Twitter account (@ComDataSci) or the CDSC blog.