If you’re attending ACM-CSCW this year, you are warmly invited to join CDSC members during our talks and other scheduled events. CSCW is not only virtual but also spread across multiple weeks, with sessions offered multiple times to accommodate time zones. We hope to see you there — we are eager to discuss our work with you!
CDSC members and affiliates are involved in CSCW beyond these public presentations. Nicholas Vincent, Sohyeon Hwang, and Sneha Narayan are part of the organizing team for the “Ethical Tensions, Norms, and Directions in the Extraction of Online Volunteer Work” workshop, where Molly de Blanc is scheduled to present and Kaylea Champion will be giving a lightning talk. Katherina Kloppenborg and Kaylea Champion are presenting in the Doctoral Consortium.
It’s Ph.D. application season and the Community Data Science Collective is recruiting! As always, we are looking for talented people to join our research group. Applying to one of the Ph.D. programs that the CDSC faculty members are affiliated with is a great way to get involved in research on communities, collaboration, and peer production.
Because we know that you may have questions for us that are not answered on this page, we will be hosting a panel discussion and Q&A about the CDSC and Ph.D. opportunities on October 20 at 7:30pm UTC (3:30pm US Eastern, 2:30pm US Central, 12:30pm US Pacific). You can register online.
This post provides a very brief run-down on the CDSC, the different universities and Ph.D. programs our faculty members are affiliated with, and some general ideas about what we’re looking for when we review Ph.D. applications.
What are these different Ph.D. programs? Why would I choose one over the other?
This year the group includes three faculty principal investigators (PIs) who are actively recruiting Ph.D. students: Aaron Shaw (Northwestern University), Benjamin Mako Hill (University of Washington in Seattle), and Jeremy Foote (Purdue University). Each of these PIs advises Ph.D. students in Ph.D. programs at their respective universities. Our programs are each described below.
Although we often work together on research and serve as co-advisors to students in each others’ projects, each faculty person has specific areas of expertise and interests. The reasons you might choose to apply to one Ph.D. program or to work with a specific faculty member could include factors like your previous training, career goals, and the alignment of your specific research interests with our respective skills.
At the same time, a great thing about the CDSC is that we all collaborate and regularly co-advise students across our respective campuses, so the choice to apply to or attend one program does not prevent you from accessing the expertise of our whole group. But please keep in mind that our different Ph.D. programs have different application deadlines, requirements, and procedures!
Who is actively recruiting this year?
If you are interested in applying to any of the programs, we strongly encourage you to reach out to the specific faculty in that program before submitting an application.
Aaron Shaw is an Associate Professor in the Department of Communication Studies at Northwestern. This year, he’s also the “Scholar in Residence” for King County, Washington. In terms of Ph.D. programs, Aaron’s primary affiliations are with the Media, Technology, and Society (MTS) and the Technology and Social Behavior (TSB) Ph.D. programs (please note: the TSB program is a joint degree between Communication and Computer Science). Aaron also has a courtesy appointment in the Sociology Department at Northwestern, but he has not directly supervised any Ph.D. advisees in that department (yet). Aaron’s current projects focus on comparative analysis of the organization of peer production communities and social computing projects, participation inequalities in online communities, and collaborative organizing in pursuit of public goods.
Jeremy Foote is an Assistant Professor at the Brian Lamb School of Communication at Purdue University. He is affiliated with the Organizational Communication and Media, Technology, and Society programs. Jeremy’s current research focuses on how individuals decide when and in what ways to contribute to online communities, how communities change the people who participate in them, and how both of those processes can help us to understand which things become popular and influential. Most of his research is done using data science methods and agent-based simulations.
What do you look for in Ph.D. applicants?
There’s no easy or singular answer to this. In general, we look for curious, intelligent people driven to develop original research projects that advance scientific and practical understanding of topics that intersect with any of our collective research interests.
To get an idea of the interests and experiences present in the group, read our respective bios and CVs (follow the links above to our personal websites). Specific skills that we and our students tend to use on a regular basis include consuming and producing social science and/or social computing (human-computer interaction) research, applied statistics and statistical computing, various empirical research methods, social theory and cultural studies, and more.
Formal qualifications that speak to similar skills and show up in your resume, transcripts, or work history are great, but we are much more interested in your capacity to learn, think, write, analyze, and/or code effectively than in your credentials, test scores, grades, or previous affiliations. It’s graduate school and we do not expect you to show up knowing how to do all the things already.
Intellectual creativity, persistence, and a willingness to acquire new skills and problem-solve matter a lot. We think doctoral education is less about executing tasks that someone else hands you and more about learning how to identify a new, important problem; develop an appropriate approach to solving it; and explain all of the above and why it matters so that other people can learn from you in the future. Evidence that you can or at least want to do these things is critical. Indications that you can also play well with others and would make a generous, friendly colleague are really important too.
All of this is to say, we do not have any one trait or skill set we look for in prospective students. We strive to be inclusive along every possible dimension. Each person who has joined our group has contributed unique skills and experiences as well as their own personal interests. We want our future students and colleagues to do the same.
Still not sure whether or how your interests might fit with the group? Still have questions? Still reading and just don’t want to stop? Follow the links above for more information. Feel free to send at least one of us an email. We are happy to try to answer your questions and always eager to chat. You can also join our panel discussion on October 20 at 7:30pm UTC (3:30pm US Eastern).
Although we might not notice it, much of the technology we rely on, from cell phones to cloud servers, is fueled by decades of effort by volunteers who create innovative software as well as the organizations necessary to sustain it. Despite this powerful legacy, we are now facing a crisis: not all of these critical components have been sufficiently maintained. Can we detect that an important software component is becoming neglected before major failures occur? Are these neglected packages just a matter of resources — old code and too few contributors — or can we see broader patterns that play a role, such as collaboration and organizational structures? Kaylea Champion has been working to answer these questions in her dissertation. As part of this work, she joined the software community metrics enthusiasts gathered at this year’s CHAOSSCon EU on September 12, 2022, as part of the Open Source Summit.
Kaylea’s presentation shares work in progress about the sources of underproduction, or when highly important packages see low quality development, in open software development. This presentation marks her second time at CHAOSSCon and builds on her work shared at last year’s conference in a lightning talk about detecting underproduction in Debian (see coverage of this work as presented to Debian folks here). Engaging with communities is a key part of this work: when we understand practitioner perspectives on underproduction and its causes, we can do science that supports taking immediate action. If you are interested in measuring the health of your collaborative community, let’s talk!
How should search engines be regulated? What are the implications of the EU Digital Services Act? If you are interested in technology policy, mark your calendar for an upcoming pre-conference virtual event: “Harms and Standards in Content Platform Governance” (October 13, 2022 at 5:00 a.m. PDT, 8:00 a.m. EDT, 2:00 p.m. CEST). As part of the upcoming European Communication Research and Education conference, the Communication Law and Policy section has invited Kaylea Champion to present work she did with Benjamin Mako Hill and University of Washington students Jacinta Harshe, Isabella Brown, and Lucy Bao.
We examine the information landscape as manifested in search results during the Covid-19 pandemic using data we collected as part of the Covid-19 Digital Observatory Project. Our results provide evidence for the powerful ways that search engines shape our information environment: what information gets seen, the sources of that information, the market sectors those sources operate within, and the partisan bias of those results.
This free event is oriented to connecting technology researchers and policymakers, and will include presentations of research from legal, communication, and critical perspectives.
We have been going on Lab Dates and it is pretty cool.
CSCW 2021 introduced Lab Speed Dating wherein labs were matched and given an hour to get to know each other. Sohyeon Hwang organized our first lab date. It was so much fun we decided to go on more in order to meet other groups. I wanted to share a bit about this and our process in case you are interested in trying it out or want to have a meetup with us.
After the initial CSCW Lab Date we made a very long list of other labs we want to meet and have (slowly) been inviting them to come by. We also included individual researchers, people who collaborate in smaller, informal groups, co-authors, and corporate research teams.
The CDSC maintains a “softblock”: a standing block of time reserved for whatever comes up, one-off meetings we need to schedule, and co-working sessions. We use the softblock to schedule lab dates rather than finding a new time for each meeting. (Today I am using the softblock to write this blog post!)
We are pretty open to different structures for our lab dates. So far the ones with full labs have been divided into two parts: 1) everyone introduces themselves as briefly as we can manage and then 2) we break out into small groups for short periods of time to talk. We try to cycle through 2-3 of these breakouts, depending on how many people are in attendance. When meeting with individuals, our guests typically present a piece of work that we workshop, or discuss their interests in a more general sense and we talk with them as a whole group. We are open to other models, but nothing has come up yet.
Lab dates have focused on networking and getting to know other researchers on a professional level. Because we have been attending fewer in-person events, we have had fewer chances to meet new people. Even at events it can be hard to connect with the people you want to meet, and it is very hard (for us) to get everyone from the CDSC in a space together with another group.
If you are interested in going on a lab date with us, you can message me on IRC or email me (details here). We have a lot of open spots for the rest of the quarter and one of them could be yours!
We recently held our second Community Dialogue around the theme of anonymity and privacy. Kaylea Champion presented on the role of anonymity in peer-contribution communities. Dr. Shruti Sannon joined us from the University of Michigan and talked about privacy in the gig economy.
What’s Anonymity Worth (Kaylea Champion)
Anonymity can protect and empower contributors in communities. Anonymity can make people feel safer or actually be safer. For example: Wikipedia editors who are working on controversial pages within contested geographies may be safer when they are able to contribute anonymously. Anonymous contribution is not without problems, as it can also empower trolls, harassers, and other bad actors. For more details, and actions you can take or policies to recommend within your communities, watch the video of Kaylea Champion’s presentation below.
Privacy and Surveillance in the Gig Economy (Dr. Shruti Sannon)
Gig workers can be asked or coerced to give up privacy in exchange for money through the design of the gig platforms they are using or by request of customers. Gig workers also use surveillance tools as a means of protecting themselves — some ride share drivers have cameras in their cars for this purpose. Dr. Sannon shared the broader implications of this situation, and what it can mean outside of the gig economy. To learn more, watch the video below.
Thanks to speakers Kaylea Champion and Shruti Sannon. The vision for this event borrows from the User and Open Innovation workshops organized by Eric von Hippel and colleagues, as well as others. This event and the research presented in it were supported by multiple awards from the National Science Foundation (DGE-1842165; IIS-2045055; IIS-1908850; IIS-1910202), Northwestern University, the University of Washington, and Purdue University.
Systems theory is a broad and multidisciplinary scientific approach that studies how things (molecules or cells or organs or people or companies) interact with each other. It argues that understanding how something works requires understanding its relationships and interdependencies.
For example, if we want to predict whether a new online community will grow, an individual perspective might focus on who the founder is, what software it is running on, how well it is designed, etc. A systems approach would argue that it is at least as important to understand things like how many similar communities there are, how active they are, and whether the platform is growing or shrinking.
In a paper just published in Media and Communication, I (Jeremy) argue that 1) it is particularly important to use a systems lens to study online communities, 2) that online communities provide ideal data for taking these approaches, and 3) that there is already really neat research in this area and there should be more of it.
The role of platforms
So, why is it so important to study online communities as interdependent “systems”? The first reason is that many online communities have a really important interdependence with the platforms that they run on. Platforms like Reddit or Facebook provide the servers and software for millions of communities, which are run mostly independently by the community managers and moderators.
However, this relationship is often ambivalent: the goals and desires of at least some moderators are at odds with those of the platform, and things like community bans from the platform side or protests from the community side are not uncommon. The ways that platform decisions influence communities, and how communities can work together to influence platforms, are inherently systems questions.
Low barriers to entry and exit
A second feature of online communities is the relative ease with which people can join or leave them. Unlike offline groups, which at least require participants to get dressed, do their hair, and show up somewhere, people can participate in an online community literally within seconds of knowing that it exists.
Similarly, people can leave incredibly easily, and most people do. This figure shows the number of comments made per person across 100 randomly selected subreddits (each line represents a subreddit; axes are both log-scaled). In every case, the vast majority of people only commented once while a few people made many comments.
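This heavy-tailed pattern is easy to compute from raw comment data. The sketch below uses made-up author names (not real subreddit data) to show the basic tally behind a figure like this: count comments per person, then count how many people fall at each activity level.

```python
from collections import Counter

# Hypothetical comment authors for a single community; in practice these
# would come from a platform data dump or API crawl.
comment_authors = (
    ["alice"] * 250 + ["bob"] * 40 + ["carol"] * 5
    + [f"user{i}" for i in range(300)]  # 300 people who commented exactly once
)

# Step 1: count comments per person.
comments_per_person = Counter(comment_authors)

# Step 2: tally how many people made exactly k comments.
distribution = Counter(comments_per_person.values())

for k in sorted(distribution):
    print(f"{distribution[k]} people made {k} comment(s)")
```

Plotting `distribution` on log-scaled axes (e.g., with matplotlib) produces the characteristic downward-sloping line: most people at one comment, a handful responsible for most of the activity.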
Finally, it’s often really difficult to draw clear boundaries around where one online community ends and another begins. For example, is all of Wikipedia one “community”? It might make sense to think of a language edition, a WikiProject, or even a single page as a community, and researchers have done all of the above. Even on platforms like Reddit, where there is a clear delineation between communities, there are dependencies, with people and conversations moving across and between communities on similar topics.
In other words, online communities are semi-autonomous, interdependent, contingent organizations, deeply influenced by their environments. Online community scholars have often ignored this larger context, but systems theory gives us a rich set of tools for studying these interdependencies. And online communities are especially well suited to these tools because they provide ideal data.
Data from Online Communities
Systems theory is not new: many of the main concepts were developed in the 1950s and 1960s or earlier. Organizational communication researchers saw how applicable these ideas were, and many researchers proposed treating organizations as systems.
However, it was really tough to get the data needed to do systems-based research. To study a group or organization as a system, you need to know about not only the internal workings of the group, but how it relates to other groups, how it is influenced by and influences its environment, etc. Gathering data about even one group was difficult and expensive; getting the data to study many groups and how they interact with each other over time was impossible.
The internet has entered the chat
Online communities provide the kind of data that these earlier researchers could only have dreamed of. Instead of data about one organization, platforms store data about thousands of organizations. And this is not just high-level data about activity levels or participation: we often have longitudinal, full-text conversations of millions of people as they interact within and move between communities.
In part, this article is a call for researchers to think more explicitly about online communities as systems, and to apply systems theory as a way of understanding how online communities work and how we can design research projects to understand them better. It is also an attempt to highlight strands of research that are already doing this. In the paper, I talk about four: Community Comparisons and Interactions, Individual Trajectories, Cross-level Mechanisms, and Simulating Emergent Behavior. Here, I’ll focus on just two.
The first is what I call “Individual Trajectories”. In this approach, researchers can look at how individual people behave across a platform. One of the neat things about having longitudinal, unobtrusively collected data is that we can identify something interesting about users and go “back in time” to look for differences in earlier behavior. For example, in the plot above, Panciera et al. identified people who became active Wikipedia editors; they then went back and looked at how those editors’ early behavior on the site differed from that of typical editors.
Researchers could and should do more work that looks at how people move between communities, and how communities influence the behavior of their members.
Simulating Emergent Behavior
The second approach is to use simulations to study emergent behaviors. Agent-based modeling software like NetLogo or Mesa allows researchers to create virtual worlds, where computational “agents” act according to theories of how the world works. Many communication theories make predictions about how individual‐level behavior produces higher‐level patterns, often through feedback loops (e.g., the Spiral of Silence theory). If agent-based models don’t produce those patterns, then we know that something about the theory—or its computational representation—is wrong.
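As a toy illustration, here is a minimal spiral-of-silence simulation in plain Python (a sketch rather than a full NetLogo or Mesa model, with invented parameter values): agents voice an opinion only when enough of the voices they overhear agree with them, so an initial minority can be driven progressively silent.

```python
import random

random.seed(42)

N = 200
SAMPLE = 10      # how many voiced opinions each agent overhears per step
MIN_ALLIES = 3   # speak only if at least this many overheard voices agree

# Private opinions: 120 agents hold +1 (majority), 80 hold -1 (minority).
opinions = [1] * 120 + [-1] * 80
speaking = [True] * N  # everyone starts out willing to speak

for step in range(30):
    speakers = [i for i in range(N) if speaking[i]]
    if len(speakers) < SAMPLE:
        break  # too few voices left to sample from
    new_speaking = list(speaking)
    for i in range(N):
        # Each agent overhears a random sample of current speakers and
        # decides whether its own opinion feels safe enough to voice.
        heard = [opinions[j] for j in random.sample(speakers, SAMPLE)]
        new_speaking[i] = heard.count(opinions[i]) >= MIN_ALLIES
    speaking = new_speaking  # synchronous update

minority_voiced = sum(speaking[i] for i in range(N) if opinions[i] == -1)
majority_voiced = sum(speaking[i] for i in range(N) if opinions[i] == 1)
print(f"still speaking: majority {majority_voiced}/120, "
      f"minority {minority_voiced}/80")
```

Because silenced minority agents shrink the minority’s share of voiced opinions, each round makes the remaining minority agents less likely to speak, which is the feedback loop the theory predicts. Varying `SAMPLE` and `MIN_ALLIES` changes whether the minority collapses or persists, exactly the micro-to-macro question agent-based models let us probe.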
Agent-based modeling has received some attention from communication researchers lately, including a wonderful special issue recently published in Communication Methods and Measures; the editorial article makes some great arguments for the promise and benefits of simulations for communication research.
It is a really exciting time to be a computational social scientist, especially one who is interested in online organizations and organizing. We have only scratched the surface of what we can learn from the data pouring down around us, especially when it comes to systems theory questions. Tools, methods, and computational advances are constantly evolving and opening up new avenues of research.
Of course, taking advantage of these data sources and computational advances requires a different set of skills than Communication departments have traditionally focused on, and complicated, large-scale analyses require the use of supercomputers and extensive computational expertise.
However, there are many approaches like agent-based modeling or simple web scraping that can be taught to graduate students in one or two semesters, and open up lots of possibilities for doing this kind of research.
I’d love to talk more about these ideas—please reach out, or if you are coming to ICA, come talk to me!
The International Communication Association (ICA)’s 72nd annual conference is coming up in just a couple of weeks. This year, the conference takes place in Paris and a subset of our collective is flying out to present work in person. We are looking forward to meeting up, talking research, and eating croissants. À bientôt!
ICA takes place from Thursday, May 26th to Monday, May 30th, and we are presenting a total of ten (!!) times. All presentations given by members of the collective are scheduled between Friday and Sunday.
We start off with a presentation by Nathan TeBlunthuis on Friday at 11.00 AM, in Room 351 M (Palais des Congres). In a high-density paper session on Computational Approaches to Online Communities, Nate will present a paper entitled “Dynamics of Ecological Adaptation in Online Communities.”
Later that same day, at 3.30 PM in the Amphitheatre Havana (level 3; Palais des Congres), Carl Colglazier will discuss a paper that he collaborated on with Nick Diakopoulos: “Predictive Models in News Coverage of the COVID-19 Pandemic in the U.S.” This paper session is part of the ICA division Journalism Studies.
On Saturday, Floor Fiers will present in the paper session “Impression Management Online: FabriCATing An Image.” Their project, which they wrote with Nathan Walter, discusses “Comments on Airbnb and the Potential for Racial Bias” at 2.00 PM in Regency 1 (Hyatt).
Shortly after, that same afternoon, you’ll find two of our poster presentations at 5.00 PM in the Exhibit Hall (Havana; Palais des Congres, level 3). In one of them, Jeremy Foote will discuss his take on “a systems approach to studying online communities.”
The other poster, presented at the same time and place, is by Kaylea Champion and Benjamin Mako Hill on “Resisting Taboo in the Collaborative Production of Knowledge: Evidence from Wikipedia.”
Most of our presentations are on the fourth day of the conference. At 9.30 AM, we’ll be presenting in three locations at the same time! First, Floor will discuss their paper “Inequality and Discrimination in the Online Labor Market: a Scoping Review” in Room 311+312 (Palais des Congres). This presentation is part of the paper session “All Things Are Not Equal: CompliCATions From Digital Inequalities.”
Second, Carl will present work on behalf of himself, Aaron Shaw, and Benjamin Mako Hill during a high-density paper session in Room 242A (Palais des Congres). The title of their project is “Extended Abstract: Exhaustive Longitudinal Trace Data From Over 70,000 Wiki.”
Lastly, at the same time in Room 352B (Palais des Congres), Jeremy will present an interview study entitled “What Communication Supports Multifunctional Public Goods in Organizations? Using Agent-Based Modeling to Explore Differential Uses of Enterprise Social Media.” Jeremy’s co-authors on this paper are Jeffrey Treem and Bart van den Hooff.
On Sunday afternoon, at 3.30 PM in Room 311+312 (Palais des Congres), Tiwaladeoluwa Adekunle will talk about a qualitative project she collaborated on with Jeremy, Nate, and Laura Nelson: “Co-Creating Risk Online: Exploring Conceptualizations of COVID-19 Risk in Ideologically Distinct Online Communities.”
We will finish off our ICA 2022 presentations at 5.00 PM in Room 313+314 (Palais des Congres), where Kaylea will present on behalf of Isabella Brown, Lucy Bao, Jacinta Harshe, and Mako. The title of their paper is “Making Sense of Covid-19: Search Results and Information Providers”.
We look forward to sharing our research and connecting with you at ICA!
We’re going to be at CHI! The Community Data Science Collective will be presenting three papers. You can find us there in person in New Orleans, Louisiana, April 30 – May 5. If you’ve ever wanted a super cool CDSC sticker, this is your chance!
More than a billion people visit Wikipedia each month and millions have contributed as volunteers. Although Wikipedia exists in 300+ language editions, more than 90% of Wikipedia language editions have fewer than one hundred thousand articles. Many small editions are in languages spoken by small numbers of people, but the relationship between the size of a Wikipedia language edition and that language’s number of speakers—or even the number of viewers of the Wikipedia language editions—varies enormously. Why do some Wikipedias engage more potential contributors than others? We attempted to answer this question in a study of three Indian language Wikipedias that will be published and presented at the ACM Conference on Human Factors in Computing Systems (CHI 2022).
To conduct our study, we selected three Wikipedia language communities that correspond to the official languages of three neighboring states of India: Marathi (MR) from the state of Maharashtra, Kannada (KN) from the state of Karnataka, and Malayalam (ML) from the state of Kerala (see the map in the right panel of the figure above). While the three projects share goals, technological infrastructure, and a similar set of challenges, Malayalam Wikipedia’s community engaged its language speakers in contributing to Wikipedia at a much higher rate than the others. The graph above (left panel) shows that although MR Wikipedia has twice as many viewers as ML Wikipedia, ML has more than twice as many articles as MR.
Our study focused on identifying differentiating factors between the three Wikipedias that could explain these differences. Through a grounded theory analysis of interviews with 18 community participants from the three projects, we identified two broad explanations of a “positive participation cycle” in Malayalam Wikipedia and a “negative participation cycle” in Marathi and Kannada Wikipedias.
As the first step of our study, we conducted semistructured interviews with active participants of all three projects to understand their personal experiences and motivation; their perceptions of dynamics, challenges, and goals within their primary language community; and their perceptions of the other language Wikipedias.
We found that MR and KN contributors experience more day-to-day barriers to participation than ML, and that these barriers hinder contributors’ day-to-day activity and impede engagement. For example, both MR and KN members reported a large number of content disputes that they felt reduced their desire to contribute.
But why do some Wikipedias like MR or KN have more day-to-day barriers to contribution like content disputes and low social support than others? Our interviews pointed to a series of higher-level explanations. For example, our interviewees reported important differences in the norms and rules used within each community as well as higher levels of territoriality and concentrated power structures in MR and KN.
Once again, though: why do the MR and KN Wikipedias have these issues with territoriality and centralized authority structures? Here we identify a third, even higher-level set of differences in the social and cultural contexts of the three language-speaking communities. For example, MR and KN community members attributed low engagement to broad cultural attitudes toward volunteerism and differences in their language community’s engagement with free software and free culture.
The two flow charts above visualize the explanatory mapping of divergent feedback loops we describe. The top part of the figure illustrates how the relatively supportive macro-level social environment in Kerala led to a larger group of potential contributors to ML as well as a chain reaction of processes that led to a Wikipedia better able to engage potential contributors. The process is an example of a positive feedback cycle. The second, bottom part of the figure shows the parallel, negative feedback cycle that emerged in MR and KN Wikipedias. In these settings, features of the macro-level social environment led to a reliance on a relatively small group of people for community leadership and governance. This led, in turn, to barriers to entry that reduced contributions.
One final difference between the three Wikipedias was the role that paid labor from NGOs played. Because the MR and KN Wikipedias struggled to recruit and engage volunteers, NGOs and foundations deployed financial resources to support the development of content in Marathi and Kannada, but not in ML to the same degree. Our work suggested this tended to further concentrate power among a small group of paid editors in ways that aggravated the meso-level community struggles. This is shown in the red box in the second (bottom) row of the figure.
The results from our study provide a conceptual framework for understanding how the embeddedness of social computing systems within particular social and cultural contexts shape various aspects of the systems. We found that experience with participatory governance and free/open-source software in the Malayalam community supported high engagement of contributors. Counterintuitively, we found that financial resources intended to increase participation in the Marathi and Kannada communities hindered the growth of these communities. Our findings underscore the importance of social and cultural context in the trajectories of peer production communities. These contextual factors help explain patterns of knowledge inequity and engagement on the internet.
Please refer to the preprint of the paper for more details on the study and our design suggestions for localized peer production projects. We’re excited that this paper has been accepted to CHI 2022 and received the Best Paper Honorable Mention Award! It will be published in the Proceedings of the ACM on Human-Computer Interaction and presented at the conference in May. The full citation for this paper is:
Sejal Khatri, Aaron Shaw, Sayamindu Dasgupta, and Benjamin Mako Hill. 2022. The social embeddedness of peer production: A comparative qualitative analysis of three Indian language Wikipedia editions. In CHI Conference on Human Factors in Computing Systems (CHI ’22), April 29-May 5, 2022, New Orleans, LA, USA. ACM, New York, NY, USA, 18 pages. https://doi.org/10.1145/3491102.3501832