Come meet us at CHI 2022

We’re going to be at CHI! The Community Data Science Collective will be presenting three papers. You can find us there in person in New Orleans, Louisiana, April 30 – May 5. If you’ve ever wanted a super cool CDSC sticker, this is your chance!

Two red street cars going down a tree lined street.
“Streetcars in New Orleans: 2000 series – Perley A. Thomas Car Works 900 Series Replicas” by Flavio~ is marked with CC BY 2.0.

Stefania (Stef) Druga (University of Washington) wrote “Family as a Third Space for AI Literacies: How do children and parents learn about AI together?” with Amy J. Ko and Fee Lia Christoph (University of Michigan). Stef will be presenting at “Interactive Learning Support Systems,” Monday May 2 at 14:15.

Sejal Khatri (University of Washington) received an honorable mention for her work “The Social Embeddedness of Peer Production: A Comparative Qualitative Analysis of Three Indian Language Wikipedia Editions,” co-authored by Sayamindu Dasgupta, Benjamin Mako Hill, and Aaron Shaw. Sejal will be presenting Tuesday May 3 at 14:15 in “Crowdwork and Collaboration.” Sejal, Aaron, Mako, and Sayamindu also have a blog post available.

Ruijia (Regina) Cheng (University of Washington) also received an honorable mention for her paper “How Interest-Driven Content Creation Shapes Opportunities for Informal Learning in Scratch: A Case Study on Novices’ Use of Data Structures,” co-authored by Benjamin Mako Hill and Sayamindu Dasgupta. Regina will be talking about it during the session “Programming and Coding Support” on Wednesday May 4 at 09:00. You can also read about Regina, Mako, and Sayamindu’s work on our blog.

The CDSC logo, which looks a bit like a cloud with four legs, and the text "Community Data Science Collective."
You can have this on a sticker!

Notes from the CDSC Community Dialogue Series

This winter, the Community Data Science Collective launched a Community Dialogues series. These are meetings in which we invite community experts, organizers, and researchers to get together to share their knowledge of community practices and challenges, recent research, and how that research can be applied to support communities. We had our first meeting in February, with presentations from Jeremy Foote and Sohyeon Hwang on small communities and Nate TeBlunthuis and Charlie Kiene on overlapping communities.

Watch the introduction video starring Aaron Shaw!

Six grey and white birds sitting on a fence. A seventh bird is landing to join them.

“Joining the Community” by Infomastern is marked with CC BY-SA 2.0.

What we covered

Here are some quick summaries of the presentations. After the presentations, we formed small groups to discuss how what we learned related to our own experiences and knowledge of communities.

Finding Success in Small Communities

Small communities often stay small, medium stay medium, and big stay big. Meteoric growth is uncommon. User control and content curation improve user experience. Small communities help people define their expectations. Participation in small communities is often very salient and helps participants build group identity, but not personal relationships. Growth doesn’t mean success, and we need to move beyond treating it that way and beyond relying solely on quantitative metrics to judge our success. Being small can be a feature, not a bug!

We built a list of discussion questions collaboratively. It included:

  • Are you actively trying to attract new members to your community? Why or why not?
  • How do you approach scale/size in your community/communities?
  • Do you experience pressure to grow? From where? Towards what end?
  • What kinds of connections do people seek in the community/communities you are a part of?
  • Can you imagine designs/interventions to draw benefits from small communities or sub-communities within larger projects/communities?
  • How to understand/set community members’ expectations regarding community size?
  • “Small communities promote group identity but not interpersonal relationships.” This seems counterintuitive.
  • How do you manage challenges around growth incentives/pressures?

Why People Join Multiple Communities

People join topical clusters of communities, which have more mutualistic relationships than competitive ones. There is a trilemma (like a dilemma) between large audience, specific content, and homophily (like-mindedness). No community can do everything, and it may be better for participants and communities to have multiple, overlapping spaces. This can be more engaging, generative, fulfilling, and productive. People develop portfolios of communities, which can involve many small communities.

Questions we had for each other:

  • Do members of your community also participate in similar communities?
  • What other communities are your members most often involved in?
  • Are they “competing” with you? Or “mutualistic” in some way?
  • In what other ways do they relate to your community?
  • There is a “trilemma” between the largest possible audience, specific content, and homophilous (likeminded/similar folks) community. Where does your community sit inside this trilemma?

Slides and videos

How you can get involved

You can subscribe to our mailing list! We’ll be making announcements about future events there. It will be a low volume mailing list.

Acknowledgements

Thanks to speakers Charlie Kiene, Jeremy Foote, Nate TeBlunthuis, and Sohyeon Hwang! Kaylea Champion was heavily involved in planning and decision making. The vision for the event borrows from the User and Open Innovation workshops organized by Eric von Hippel and colleagues, as well as others. This event and the research presented in it were supported by multiple awards from the National Science Foundation (DGE-1842165; IIS-2045055; IIS-1908850; IIS-1910202), Northwestern University, the University of Washington, and Purdue University.

Session summaries and questions above were created collaboratively by event attendees.

Conferences, Publications, and Congratulations

This year was packed with things we’re excited about and want to celebrate and share. Great things happened to Community Data Science Collective members within our schools and the wider research community.

A smol brown and golden dog in front of a red door. The dog is wearing a pink collar with ladybugs. She also has very judgemental (or excited) eyebrows.
Meet Tubby! Sohyeon adopted Tubby this year.

Academic Successes

Sohyeon Hwang (Northwestern) and Wm Salt Hale (University of Washington) earned their master’s degrees. You can read Salt’s paper, “Resilience in FLOSS,” online.

Charlie Kiene and Regina Cheng completed their comprehensive exams and are now PhD candidates!

Nate TeBlunthuis defended his dissertation and started a post-doctoral fellowship at Northwestern. Jim Maddock defended his dissertation on December 16th.

Congratulations to everyone!

Teaching and Workshop Participation

Floor Fiers and Sohyeon ran a workshop at Computing Everywhere, a Northwestern initiative to help students build computational literacy. Sohyeon and Charlie participated in the Yale SMGI Community Driven Governance Workshop. We also had standout attendance at the Social Computing Systems Summer Camp, with Sneha Narayan, Stefania Druga, Charlie, Regina, Salt, and Sohyeon participating.

Regina was a teaching assistant for senior undergraduate students on their capstone projects. Regina’s mentees won Best Design and Best Engineering awards.

Conference Presentations

Sohyeon and Jeremy Foote presented together at CSCW (Computer Supported Co-operative Work) where they earned a Best Paper Honorable Mention award. Nick Vincent had two presentations at CSCW, one relating to Wikipedia links in search engine results and one on conscious data contribution. Benjamin Mako Hill and Nate presented on algorithmic flagging on Wikipedia.

Salt was interviewed on the FOSS and Crafts podcast. His conference presentations included Linux App Summit, SeaGL, and DebConf. Kaylea Champion spoke at SeaGL and DebConf. Kaylea’s DebConf presentation was on her research on detecting at-risk projects in Debian.

Kaylea and Mako also presented at Software Analysis, Evolution and Reengineering, an IEEE conference.

Emilia Gan, Mako, Regina, and Stef organized the “Imagining Future Design of Tools for Youth Data Literacies” workshop at the 2021 Connected Learning Summit.

Our 2021 publications include:

  • Champion, Kaylea. 2021. “Underproduction: An approach for measuring risk in open source software.” 28th IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). pp. 388-399, doi: 10.1109/SANER50967.2021.00043.
  • Fiers, Floor, Aaron Shaw, and Eszter Hargittai. 2021. “Generous Attitudes and Online Participation.” Journal of Quantitative Description: Digital Media, 1. https://doi.org/10.51685/jqd.2021.008
  • Hill, Benjamin Mako, and Aaron Shaw. 2021. “The hidden costs of requiring accounts: Quasi-experimental evidence from peer production.” Communication Research 48(6): 771-795. https://doi.org/10.1177%2F0093650220910345.
  • Hwang, Sohyeon, and Jeremy Foote. 2021. “Why do people participate in small online communities?” Proceedings of the ACM on Human-Computer Interaction, 5(CSCW2), 462:1-462:25. https://doi.org/10.1145/3479606
  • Shaw, Aaron, and Eszter Hargittai. 2021. “Do the Online Activities of Amazon Mechanical Turk Workers Mirror Those of the General Population? A Comparison of Two Survey Samples.” International Journal of Communication 15: 4383–4398. https://ijoc.org/index.php/ijoc/article/view/16942
  • TeBlunthuis, Nathan, Benjamin Mako Hill, and Aaron Halfaker. 2021. “Effects of Algorithmic Flagging on Fairness: Quasi-experimental Evidence from Wikipedia.” Proc. ACM Hum.-Comput. Interact. 5, CSCW1, Article 56 (April 2021), 27 pages. https://doi.org/10.1145/3449130
  • TeBlunthuis, Nathan. 2021. “Measuring Wikipedia Article Quality in One Dimension.” In Proceedings of the 17th International Symposium on Open Collaboration (OpenSym ’21). Online: ACM Press. https://doi.org/10.1145/3479986.3479991.

Join us for the CDSC PhD Application Q & A

A five by three grid of photos. In each one is a person being a little silly.
From the CDSC Autumn 2020 retreat

Thinking about applying to graduate school? Wonder what it’s like to pursue a PhD? Interested in understanding relationships between technology and society? Curious about how to do research on online communities like Reddit, Wikipedia, or GNU/Linux? The Community Data Science Collective is hosting a Q&A on November 5th at 13:00 ET / 12:00 CT / 10:00 PT for prospective students. This session is scheduled for an hour, to be divided between a larger group session with faculty and then smaller groups with current graduate students.

Please register as soon as possible, and before 10am US Eastern Time (UTC -5) on November 5 at the latest, to receive the link to the session. Register by November 2 if you would like to submit questions ahead of time. (We’ll do our best but might not be able to integrate questions that arrive after November 3rd.)

This is an opportunity for prospective grad students to meet with CDSC faculty, students, and staff. We’ll be there to answer any questions you have about the group, the work we do, your applications to our various programs, and other topics. You can either submit a question ahead of time or ask one during the session.

About the CDSC

We are an interdisciplinary research group spread across Carleton, Northwestern University, Purdue University, and the University of Washington. (Carleton is not accepting graduate students, though the other universities are.) You can read more about PhD opportunities on our blog.

We are mostly quantitative social scientists pursuing research about the organization of online communities, peer production, and learning and collaboration in social computing systems. Our group research blog and publications page can tell you more about our work.

Notes About Attending

We are so excited to meet you! Please RSVP online to let us know if you’re coming. This form also gives you the opportunity to ask a question ahead of time. By doing this, we’ll be able to make sure we get to your questions.

We will post another announcement with attendance information. We will also email attendance details to all registered attendees.

Before the session, you may want to look through our blog, our wiki, and our people pages.

Please register by 10am US Eastern Time (UTC -5) on November 5. To make sure your question(s) are included, register by November 2. Questions submitted after November 3 may not be addressed in the large group session.