Community Data Science Collective

October 22, 2024October 14, 2024

FOSSY 2024 Wrap Up: Darius Kazemi on “Community governance models on small-to-mid-size Mastodon servers”

In the sixth talk of the Science of Community track we organized for FOSSY, independent FOSS researcher Darius Kazemi described the results of an interview study to learn from the moderation teams of decentralized social network servers. One of Darius’ key observations is how much compliance and legally required work running such a server involves.

This is part 6 of an 8-part series sharing highlights from the Science of Community track at FOSSY. Visit the FOSSY site for more bio details and an abstract of the talk.

October 21, 2024

FOSSY 2024 Wrap Up: Bogdan Vasilescu on “Navigating Dependency Abandonment”

In the final talk of the Science of Community track we organized for FOSSY, Computer Science professor and FOSS researcher Dr. Bogdan Vasilescu described his team’s work to understand how developers think about abandoned dependencies. One of the key insights from this work is that abandonment of dependencies is quite common, but that updating a package to remove the abandoned dependencies is very slow — and that one of the factors that drives faster replacement is when projects explicitly announce that they are ending maintenance.

This is part 8 of an 8-part series sharing highlights from the Science of Community track at FOSSY. Visit the FOSSY site for more bio details and an abstract of the talk.

October 21, 2024October 15, 2024

FOSSY 2024 Wrap Up: Kaylea Champion on “Research Says…..Insights on Building, Leading, and Sustaining Open Source”

In the fifth talk of the Science of Community track we organized for FOSSY, Dr. Kaylea Champion described a series of research results on both how to build high-quality FOSS and how to sustain a community alongside it. One of her key insights is that a great community is no guarantee of a high-quality project, and that to serve the public, we need both.

This is part 5 of an 8-part series sharing highlights from the Science of Community track at FOSSY. Visit the FOSSY site for more bio details and an abstract of the talk.

October 19, 2024October 15, 2024

FOSSY 2024 Wrap Up: Paige Cruz on “The Art of Asking”

In the fourth talk of the Science of Community track we organized for FOSSY, principal developer advocate Paige Cruz shared the results of her investigation into how we can all do a better job of asking questions of one another in FOSS communities. One of her key suggestions is to engage with the perspective of those who might answer our question, and to think critically about what details we include and whether they really help others understand and respond. For example, a screenshot of our code can’t be copied and pasted and might be unreadable, but a screenshot of a UI bug might replace a wordy description.

This is part 4 of an 8-part series sharing highlights from the Science of Community track at FOSSY. Visit the FOSSY site for more bio details and an abstract of the talk.

October 18, 2024October 15, 2024

FOSSY 2024 Wrap Up: Ben Ford on “Private Equity companies only want one thing and it’s….”

In the third talk of the Science of Community track we organized for FOSSY, FOSS leader Ben Ford described his experience navigating the changes in his role when the Puppet project’s commercial partner was acquired by a private equity company. One of the essential takeaways from this talk is the different perspective towards community that a FOSS company takes versus a private equity company, and the challenge of communicating value in this context.

This is part 3 of an 8-part series sharing highlights from the Science of Community track at FOSSY. Visit the FOSSY site for more bio details and an abstract of the talk.

October 17, 2024October 15, 2024

FOSSY 2024 Wrap Up: Matthew Gaughan on “How do FOSS projects actually use new README documents?”

In the second talk of the Science of Community track we organized for FOSSY, CDSC PhD student Matthew Gaughan shared his research to understand how communities actually use README and CONTRIBUTING documents. Although guides to FOSS communities often recommend these documents be extensive and used as part of welcoming new contributors, we find that READMEs are often quite preliminary, and that CONTRIBUTING guides are often a reaction to an influx of contributions.

Excerpt from Matt’s presentation. The graph shows model coefficients for longitudinal activity data around governance document introduction for 2200+ FOSS projects packaged in the Debian GNU/Linux distribution.

This is part 2 of an 8-part series sharing highlights from the Science of Community track at FOSSY. Visit the FOSSY site for more bio details and an abstract of the talk.

October 16, 2024October 15, 2024

FOSSY 2024 Wrap Up: Dawn Foster on “From Data to Action: Using Metrics to Improve FOSS Communities”

Back in July, we kicked off the FOSSY conference Science of Community track with a talk from Dr. Dawn Foster. Dr. Foster shared an update on the work of the CHAOSS project to empower communities to use metrics to understand and improve their practices. Their Practitioner Guide Series, coupled with FOSS analytical tools, will help any community get started on their metrics journey.

This is part 1 of an 8-part series sharing highlights from the Science of Community track at FOSSY. Visit the FOSSY site for more bio details and an abstract of the talk.

October 14, 2024

Science of Community Dialogue: Complexities of Online Governance

On September 27th, we held our 9th Science of Community Dialogue with Sohyeon Hwang (Northwestern) and Seth Frey (University of California-Davis) sharing their research and insights on how communities self-govern amidst competing pressures in complex, multi-layered environments.

Sohyeon presented her work on “Trust and Friction: Community Governance and Privacy on Decentralized Social Media”. Her research aimed to answer the questions of “what aspects of community governance do communities use to shape privacy expectations?” and “how does the decentralized nature of that platform aid or undermine those expectations?”

Seth discussed his research on Apache, looking to answer the question “do things run the way they say they run?” His research explored the relationships between rules and regulations, how often these same rules are discussed among the community, and what exactly is being governed.

All in all, it was a wonderful dialogue and we greatly appreciate Sohyeon and Seth taking the time to share their research, as well as all our attendees joining us for a great discussion.

October 9, 2024

CDSC PhD Application Info Session and Q & A – October 18

Thinking about applying to graduate school? Wonder what it’s like to pursue a PhD? Interested in understanding relationships between technology and society? Curious about how to do research on online communities like Reddit, Wikipedia, or GNU/Linux? The Community Data Science Collective is hosting a virtual Q&A session on October 18th at 12pm PT, 2pm CT, 3pm ET for prospective students. This session is scheduled for an hour, to be divided between a larger group session with faculty and then smaller groups with current graduate students. If you would like to attend, register at this link!

This post provides a very brief run-down on the CDSC, the different universities and Ph.D. programs our faculty members are affiliated with, and some general ideas about what we’re looking for when we review Ph.D. applications.

What is the Community Data Science Collective?

The Community Data Science Collective (or CDSC) is a joint research group of (mostly quantitative) empirical social scientists and designers pursuing research about the organization of online communities, peer production, and learning and collaboration in social computing systems. We are based at Northwestern University, the University of Washington, The University of Texas at Austin, Purdue University, and a few other places. You can read more about us and our work on our research group blog and on the collective’s website/wiki.

What are these different Ph.D. programs? Why would I choose one over the other?

This year the group includes four faculty principal investigators (PIs) who are actively recruiting PhD students: Aaron Shaw (Northwestern University), Benjamin Mako Hill (University of Washington in Seattle), Nathan TeBlunthuis (University of Texas at Austin) and Jeremy Foote (Purdue University). Each of these PIs advises Ph.D. students in Ph.D. programs at their respective universities. Our programs are each described below.

Although we often work together on research and serve as co-advisors to students in each other’s projects, each faculty person has specific areas of expertise and interests. The reasons you might choose to apply to one Ph.D. program or to work with a specific faculty member could include factors like your previous training, career goals, and the alignment of your specific research interests with our respective skills.

At the same time, a great thing about the CDSC is that we all collaborate and regularly co-advise students across our respective campuses, so the choice to apply to or attend one program does not prevent you from accessing the expertise of our whole group. But please keep in mind that our different Ph.D. programs have different application deadlines, requirements, and procedures!

Faculty who are actively recruiting this year

If you are interested in applying to any of the programs, we strongly encourage you to reach out to the specific faculty in that program before submitting an application.

Jeremy Foote is an Assistant Professor at the Brian Lamb School of Communication at Purdue University. He is affiliated with the Organizational Communication and Media, Technology, and Society programs. Jeremy’s research focuses on how individuals decide when and in what ways to contribute to online communities, how communities change the people who participate in them, and how both of those processes can help us to understand which things become popular and influential. He and his students use multiple methods, including data science, agent-based modeling, field experiments, and interviews.

Benjamin Mako Hill is an Associate Professor of Communication at the University of Washington. He is also adjunct faculty at UW’s Department of Human-Centered Design and Engineering (HCDE), Computer Science and Engineering (CSE) and Information School. Although many of Mako’s students are in the Department of Communication, he has also advised students in all three other departments. He typically has more limited ability to admit students into those programs on his own, however, and usually does so with a co-advisor in those departments. Mako’s research focuses on population-level studies of peer production projects, computational social science, efforts to democratize data science, and informal learning. Mako has also put together a webpage for prospective graduate students with some useful links and information.

Aaron Shaw is an Associate Professor in the Department of Communication Studies at Northwestern. In terms of Ph.D. programs, Aaron’s primary affiliations are with the Media, Technology and Society (MTS) and the Technology and Social Behavior (TSB) Ph.D. programs (please note: the TSB program is a joint degree between Communication and Computer Science). Aaron also has a courtesy appointment in the Sociology Department at Northwestern, but he has not directly supervised any Ph.D. advisees in that department (yet). Aaron’s current projects focus on comparative analysis of the organization of peer production communities and social computing projects, participation inequalities in online communities, and collaborative organizing in pursuit of public goods.

Nathan TeBlunthuis is an Assistant Professor in the School of Information at the University of Texas at Austin in the area of social informatics. Nathan’s research focuses on analyzing ecosystems of online communities, AI tools in peer production, and methods in computational social science. His current projects continue in these areas and also draw from them all to understand how information sources achieve legitimacy in online communities. He works primarily using computational tools and big data, but also grounds his work in qualitative evidence.

What do you look for in Ph.D. applicants?

There’s no easy or singular answer to this. In general, we look for curious, intelligent people driven to develop original research projects that advance scientific and practical understanding of topics that intersect with any of our collective research interests.

To get an idea of the interests and experiences present in the group, read our respective bios and CVs (follow the links above to our personal websites). Specific skills that we and our students tend to use on a regular basis include consuming and producing social science and/or social computing (human-computer interaction) research, applied statistics and statistical computing, various empirical research methods, social theory and cultural studies, and more.

Formal qualifications that speak to similar skills and show up in your resume, transcripts, or work history are great, but we are much more interested in your capacity to learn, think, write, analyze, and/or code effectively than in your credentials, test scores, grades, or previous affiliations. It’s graduate school and we do not expect you to show up knowing how to do all the things already.

Intellectual creativity, persistence, and a willingness to acquire new skills and problem-solve matter a lot. We think doctoral education is less about executing tasks that someone else hands you and more about learning how to identify a new, important problem; develop an appropriate approach to solving it; and explain all of the above and why it matters so that other people can learn from you in the future. Evidence that you can or at least want to do these things is critical. Indications that you can also play well with others and would make a generous, friendly colleague are really important too.

All of this is to say, we do not have any one trait or skill set we look for in prospective students. We strive to be inclusive along every possible dimension. Each person who has joined our group has contributed unique skills and experiences as well as their own personal interests. We want our future students and colleagues to do the same.

Now what?

Still not sure whether or how your interests might fit with the group? Still have questions? Still reading and just don’t want to stop? Follow the links above for more information. Feel free to send at least one of us an email. We are happy to try to answer your questions and always eager to chat. You can also register for and join our Q&A session on October 18 at 2:00pm CT.

September 26, 2024

Symposium on Online Community Research at Purdue

On September 13th, the Community Data Science Collective led the “Frontiers in Online Community Research Symposium” at Purdue University. We had a number of fantastic presenters and panelists discussing topics from moderating the Fediverse to the role of LLMs in online communities and how different academic disciplines approach online community research.

Eshwar Chandrasekharan (University of Illinois at Urbana-Champaign) joined as our keynote speaker. He presented research that he and his group have been working on, titled “Proactive Approaches to Promote Community Resilience and Foster Desirable Behavior Online”. Eshwar discussed ongoing efforts to combat undesirable online behaviors through research and design that promote resilience and facilitate positive interactions within online conversations and communities.

Prior to Eshwar’s keynote, we had an opening panel and research presentations by CDSC members. For the opening panel, Purdue professors Diana Zulli (Communication) and Marcus Mann (Sociology) joined CDSC faculty Aaron Shaw (Northwestern) and Mako Hill (University of Washington) for an introductory Q&A. The panel discussed what we know about online communities, what new questions we are just starting to answer, and what exciting new methods are being used.

Following the panel, CDSC students Carl Colglazier (Northwestern), Sohyeon Hwang (Northwestern), and Kaylea Champion (University of Washington) gave really wonderful talks on their research. Carl talked about his work on moderation in the Fediverse, and the impact of site-level blocking. Sohyeon provided a number of provocations about community governance in the face of AI-driven changes, while Kaylea discussed her work on underproduction in social systems. They all gave fantastic presentations and inspired great conversations among attendees.

Overall, it was an excellent symposium that we hope helps to push our field forward. Thank you to all who attended and made it such a great event. A special thank you to the CDSC Purdue members for organizing the event and to Thatiany Andrade Nunes for taking photos!