2022 Year in Review

One of the fun things about being in a large lab is getting to celebrate everyone’s accomplishments, wins, and the good stuff that happens. Here is a brief-ish overview of some real successes from 2022.

A photo of the CDSC group on some steps with their hands in the air. There are nineteen people in the photo. NINETEEN!
Our 2022 Fall Retreat!

Graduations and New Positions

Our lab gained SIX new grad student members: Kevin Ackermann, Yibin Fang, Ellie Ross, Dyuti Jha, Hazel Chu, and Ryan Funkhouser. Kevin is a first-year graduate student at Northwestern, and Yibin and Ellie are first-year students at the University of Washington. Dyuti, Hazel, and Ryan joined us via Purdue and became Jeremy Foote’s first ever advisees. We had quite a number of undergraduate RAs. We also gained Divya Sikka from Interlake High School.

Nick Vincent became Dr. Nick Vincent, Ph.D. (Northwestern). He will do a postdoc at the University of California, Davis and the University of Washington. Molly de Blanc earned their master’s degree (New York University). Dr. Nate TeBlunthuis joined the University of Michigan as a postdoc, working with Professor Ceren Budak.

Kaylea Champion and Regina Cheng had their dissertation proposals approved, and Floor Fiers finished their qualifying exams and is now a Ph.D. candidate. Carl Colglazier finished his coursework.

Aaron Shaw started an appointment as the Scholar-in-Residence for King County, Washington, as well as Visiting Professor in the Department of Communication at the University of Washington.

Teaching

As expected of faculty, Jeremy Foote, Mako Hill, Sneha Narayan, and Aaron Shaw taught classes. As a class teaching assistant, Kaylea won an Outstanding Teaching Award! Floor also taught a public speaking class. CDSC members also served as teaching assistants, led workshops, and gave guest lectures in classes.

An icon of a silhouette holding a book and a wand, with stars and planets around them. Text reads "Best Teacher in the Universe."
“BEST TEACHER” by mickeymanzzz is licensed under CC BY-SA 2.0.

Presentations

This list is far from complete, but it includes some highlights!

Carl presented “Predictive Models in News Coverage of the COVID-19 Pandemic in the United States” at ICA alongside Nicholas Diakopoulos.

Floor was present at the Eastern Sociological Society (ESS), the Association of Internet Researchers (AoIR), and ICA. They won a top paper award at the National Communication Association (NCA): Walter, N., Suresh, S., Brooks, J. J., Saucier, C., Fiers, F., & Holbert, R. L. (2022, November). The Chaffee Principle: The Most Likely Effect of Communication…Is Further Communication. National Communication Association (NCA) National Convention, New Orleans, LA.

Kaylea had a whopping two papers at ICA, a keynote at the IEEE Symposium on Digital Privacy and Social Media, and presentations at CSCW Doctoral Consortium, a CSCW workshop, and the DUB Doctoral Consortium. She also participated in Aaron Swartz Day, SeaGL, CHAOSSCon, MozFest, and an event at UMASS Boston.

Molly also participated in Aaron Swartz Day, and a workshop at CSCW on volunteer labor and data.

Regina gave presentations to the MakeCode team at Microsoft Research, at the Expertise@scale Salon (Emory University), at the Microsoft Research HCI Seminar, and at CSCW (“Many Destinations, Many Pathways: A Quantitative Analysis of Legitimate Peripheral Participation in Scratch” and “Feedback Exchange and Online Affinity: A Case Study of Online Fanfiction Writers”), among others. She attended CHI and NAACL (with two additional papers). Regina’s paper with Sayamindu Dasgupta and Mako Hill at CHI 2022 (“How Interest-Driven Content Creation Shapes Opportunities for Informal Learning in Scratch: A Case Study on Novices’ Use of Data Structures”) won a Best Paper Honorable Mention Award.

Sohyeon was at GLF as a knowledge steward and presented two posters at the HCI+D Lambert Conference (one with Emily Zou and one with Charlie Kiene, Serene Ong, and Aaron). She also presented at ICWSM, had posters at ICSSI and IC2S2, and organized a workshop at CSCW. In addition to more traditional academic presentations, Sohyeon was on a fireside chat panel hosted by d/arc server, guest lectured at the University of Washington and Northwestern, and met with Discord moderators to talk about heterogeneity in online governance. Sohyeon also won the Half-Bake Off at the CDSC fall retreat.

Public Scholarship

A photo of four people. Two of them are sitting and looking at laptops, while two of them are standing and looking at the laptops thinking. Only one person is smiling.
This image is from 2016

We did a lot of public scholarship this year! In addition to giving presentations, leading workshops, and organizing public-facing events, CDSC also ran the Science of Community Dialogue Series. Presenters from within CDSC include Jeremy Foote, Sohyeon Hwang, Nate TeBlunthuis, Charlie Kiene, Kaylea Champion, Regina Cheng, and Nick Vincent. Guest speakers included Dr. Shruti Sannon, Dr. Denae Ford, and Dr. Amy X. Zhang. To attend future Dialogues, sign up for our low-volume email list!

These events are organized by Molly, with assistance from Aaron and Mako.

Publications

Rather than listing publications here, you can check them out on the wiki.

Announcing the Community Dialogue on Accountable Governance

Join the Community Data Science Collective (CDSC) for our 4th Science of Community Dialogue! This Community Dialogue will take place on January 20 at 10:00 PT (18:00 UTC). This Dialogue focuses on community governance and data. Professor Amy X. Zhang (University of Washington) will join Dr. Nick Vincent (Northwestern University, UC Davis) to cover topics including:

  • how communities can develop accountable governance
  • the distribution of power and decision making in communities
  • how collective action can impact systems
  • data leverage

You can register online.

Full Description:

How can communities develop and understand accountable governance? So many online environments rely on community members in profound ways without being accountable to them in direct ways. In this session, we will explore this topic and its implications for online communities and platforms. 

First, Nick Vincent (Northwestern, UC Davis) will discuss the opportunities for so-called “data leverage” and will highlight the potential to push back on the “data status quo” to build compelling alternatives, including the potential for “data dividends” that allow a broader set of users to economically benefit from their contributions. 

The idea of “data leverage” comes out of a basic, but little discussed fact: Many technologies are highly reliant on content and behavioral traces created by everyday Internet users, and particularly online community members who contribute text, images, code, editorial judgement, rankings, ratings, and more. The technologies that rely on these resources include ubiquitous and familiar tools like search engines as well as new bleeding-edge “Generative AI” systems that produce novel art, prose, code, and more. Because these systems rely on contributions from Internet users, collective action by these users (for instance, withholding content) has the potential to impact system performance and operators. 

Next, Amy Zhang (University of Washington) will discuss how communities can think about their governance and the ways in which the distribution of power and decision-making are encoded into the online community software that communities use. She will then describe a tool called PolicyKit that has been developed with the aim of breaking out of common top-down models for governance in online communities to enable governance models that are more open, transparent, and democratic. PolicyKit works by integrating with a community’s platform(s) of choice for online participation (e.g., Slack, GitHub, Discord, Reddit, OpenCollective), and then provides tools for community members to create a wide range of governance policies and automatically carry out those policies on and across their home platforms. She will conclude with a discussion of specific governance models and how they incorporate legitimacy and accountability in their design.

What is a Dialogue?

The Science of Community Dialogue Series is a series of conversations between researchers, experts, community organizers, and other people who are interested in how communities work, collaborate, and succeed. You can watch this short introduction video with Aaron Shaw.

Community Dialogue: Informal Learning

We had another Science of Community Dialogue! This most recent one was themed around informal learning, talking about communities as informal learning spaces and the sorts of tools and habits communities can adopt to help learners, mentors, and newcomers. We had presentations from Ruijia (Regina) Cheng (University of Washington, CDSC) and Dr. Denae Ford Robinson (Microsoft, University of Washington).

Regina Cheng covered three related research projects and relevant findings:

  • Ruijia Cheng and Benjamin Mako Hill. 2022. “Many Destinations, Many Pathways: A Quantitative Analysis of Legitimate Peripheral Participation in Scratch.” https://doi.org/10.1145/3555106
  • Ruijia Cheng, Sayamindu Dasgupta, and Benjamin Mako Hill. 2022. “How Interest-Driven Content Creation Shapes Opportunities for Informal Learning in Scratch: A Case Study on Novices’ Use of Data Structures.” https://doi.org/10.1145/3491102.3502124
  • Ruijia Cheng and Jenna Frens. 2022. “Feedback Exchange and Online Affinity: A Case Study of Online Fanfiction Writers.” https://doi.org/10.1145/3555127

Participants collaboratively put together three takeaways from Regina Cheng’s presentation.

We often talk about wanting to support “learning” in some general sense, but a critically important question to ask is “learning about what?” Let’s say we want people to learn three things: A, B, and C. The kinds of actions or behaviors that support learning goal A often have no effect on B and C, and sometimes they actively hurt them. We need to be more specific about what we want people to learn because there are tradeoffs.

Social support is wonderful in that users create examples and resources and answer questions. But it also has this narrowing effect. There’s a piling-on effect that makes it easier and easier (and more likely!) to learn the things that folks have learned before and less likely that people learn anything else.

Feedback is not about information transfer; it’s about relationships. To best promote learning, we should create rich, legitimate, inclusive social environments. These are perhaps good things to do anyway.

Dr. Denae Ford Robinson focused on free and open source software (FOSS) communities as a case study of learning communities. She covered theory and needs, and demonstrated tools designed to help with mentorship and the learning process.

Community-driven settings like FOSS (and social-good oriented projects in particular) rely enormously on volunteers and/or people opting into participation in ways that create huge challenges related to promoting project sustainability: the most active participants are overloaded in a way that is a recipe for burnout.

The path to sustainability involves attracting, retaining, and then sustaining contributions and understanding these processes as both (a) part of the lifecycle of a user and (b) part of a set of dynamics and lifecycle within the community (e.g., dynamics of community growth).

Approach 1 involves providing new information to help maintainers understand how things are going in their communities. A lack of insight and easy access to data is a cause of inefficiency and burnout.

Approach 2 involves making specific, structured recommendations to maintainers based on the experience of others in the past to do things like add tags and to shape behavior.

Approach 3 involves automating aspects of identifying and recognizing work (and perhaps other tasks) as a way of promoting newcomer experiences and reducing the load on maintainers for doing that.

This event and some of the research presented in it were supported by multiple awards from the National Science Foundation (DGE-1842165; IIS-2045055; IIS-1908850; IIS-1910202), Northwestern University, the University of Washington, and Purdue University.

How to Network from Home

We have been going on Lab Dates and it is pretty cool.

Five penguins standing in a cold looking environment. It appears as though three of them are chatting with one another and the other two are having their own conversation.
“Caption This Photo” by U.S. Geological Survey is marked with CC0 1.0.

CSCW 2021 introduced Lab Speed Dating wherein labs were matched and given an hour to get to know each other. Sohyeon Hwang organized our first lab date. It was so much fun we decided to go on more in order to meet other groups. I wanted to share a bit about this and our process in case you are interested in trying it out or want to have a meetup with us.

After the initial CSCW Lab Date we made a very long list of other labs we want to meet and have (slowly) been inviting them to come by. We also included individual researchers, people who collaborate in smaller, informal groups, co-authors, and corporate research teams.

We use our “softblock” to schedule meetings, rather than finding a new time for each meeting. The CDSC maintains a softblock, which is a block of time for whatever comes up, one-off meetings we need to schedule, and co-working sessions. (Today I am using the softblock to write this blog post!)

We are pretty open to different structures for our lab dates. So far the ones with full labs have been divided into two parts: 1) everyone introduces themselves as briefly as we can manage and then 2) we break out into small groups for short periods of time to talk. We try to cycle through 2-3 of these breakouts, depending on how many people are in attendance. When meeting with individuals, our guests typically present a piece of work that we workshop or discuss their interests in a more general sense, and we talk about them as a whole group. We are open to other models, but nothing has come up yet.

Blocks have focused on networking and getting to know other researchers on a professional level. Because we have been attending fewer in-person events, we have had fewer chances to meet new people. Even at events it can be hard to connect with the people you want to meet, and it is very hard (for us) to have everyone from the CDSC in a space together with another group.

If you are interested in going on a lab date with us, you can message me on IRC or email me (details here). We have a lot of open spots for the rest of the quarter and one of them could be yours!

Second Community Dialogue on Anonymity and Privacy

We recently held our second Community Dialogue around the theme of anonymity and privacy. Kaylea Champion presented on the role of anonymity in peer-contribution communities. Dr. Shruti Sannon joined us from the University of Michigan and talked about privacy in the gig economy.

What’s Anonymity Worth (Kaylea Champion)

Anonymity can protect and empower contributors in communities. Anonymity can make people feel safer or actually be safer. For example: Wikipedia editors who are working on controversial pages within contested geographies may be safer when they are able to contribute anonymously. Anonymous contribution is not without problems, as it can also empower trolls, harassers, and other bad actors. For more details, and actions you can take or policies to recommend within your communities, watch the video of Kaylea Champion’s presentation below.

Privacy and Surveillance in the Gig Economy (Dr. Shruti Sannon)

Gig workers can be asked or coerced to give up privacy in exchange for money through the design of the gig platforms they are using or by request of customers. Gig workers also use surveillance tools as a means of protecting themselves — some ride share drivers have cameras in their cars for this purpose. Dr. Sannon shared the broader implications of this situation, and what it can mean outside of the gig economy. To learn more, watch the video below.

Join us!

You can subscribe to our mailing list! We’ll be making announcements about future events there. It is a low volume mailing list.

Acknowledgements

Thanks to speakers Kaylea Champion and Shruti Sannon. The vision for this event borrows from the User and Open Innovation workshops organized by Eric von Hippel and colleagues, as well as others. This event and the research presented in it were supported by multiple awards from the National Science Foundation (DGE-1842165; IIS-2045055; IIS-1908850; IIS-1910202), Northwestern University, the University of Washington, and Purdue University.

Come meet us at CHI 2022

We’re going to be at CHI! The Community Data Science Collective will be presenting three papers. You can find us there in person in New Orleans, Louisiana, April 30 – May 5. If you’ve ever wanted a super cool CDSC sticker, this is your chance!

Two red street cars going down a tree lined street.
“Streetcars in New Orleans: 2000 series – Perley A. Thomas Car Works 900 Series Replicas” by Flavio~ is marked with CC BY 2.0.

Stefania (Stef) Druga (University of Washington) wrote “Family as a Third Space for AI Literacies: How do children and parents learn about AI together?” with Amy J. Ko and Fee Lia Christoph (University of Michigan). Stef will be presenting at “Interactive Learning Support Systems,” Monday May 2 at 14:15.

Sejal Khatri (University of Washington) received an honorable mention for her work “The Social Embeddedness of Peer Production: A Comparative Qualitative Analysis of Three Indian Language Wikipedia Editions,” co-authored by Sayamindu Dasgupta, Benjamin Mako Hill, and Aaron Shaw. Sejal will be presenting Tuesday May 3 at 14:15 in “Crowdwork and Collaboration.” Sejal, Aaron, Mako, and Sayamindu also have a blog post available.

Ruijia (Regina) Cheng (University of Washington) also received an honorable mention for her paper “How Interest-Driven Content Creation Shapes Opportunities for Informal Learning in Scratch: A Case Study on Novices’ Use of Data Structures,” co-authored by Benjamin Mako Hill and Sayamindu Dasgupta. Regina will be talking about it during the session “Programming and Coding Support” on Wednesday May 4 at 09:00. You can also read about Ruijia, Mako, and Sayamindu’s work on our blog.

The CDSC logo, which looks a bit like a cloud with four legs, and the text "Community Data Science Collective."
You can have this on a sticker!

Notes from the CDSC Community Dialogue Series

This winter, the Community Data Science Collective launched a Community Dialogues series. These are meetings in which we invite community experts, organizers, and researchers to get together to share their knowledge of community practices and challenges, recent research, and how that research can be applied to support communities. We had our first meeting in February, with presentations from Jeremy Foote and Sohyeon Hwang on small communities and Nate TeBlunthuis and Charlie Kiene on overlapping communities.

Watch the introduction video starring Aaron Shaw!

Six grey and white birds sitting on a fence. A seventh bird is landing to join them.

“Joining the Community” by Infomastern is marked with CC BY-SA 2.0.

What we covered

Here are some quick summaries of the presentations. After the presentations, we formed small groups to discuss how what we learned related to our own experiences and knowledge of communities.

Finding Success in Small Communities

Small communities often stay small, medium stay medium, and big stay big. Meteoric growth is uncommon. User control and content curation improve user experience. Small communities help people define their expectations. Participation in small communities is often very salient and helps participants build group identity, but not personal relationships. Growth doesn’t mean success, and we need to move beyond that and beyond solely using quantitative metrics to judge our success. Being small can be a feature, not a bug!

We built a list of discussion questions collaboratively. It included:

  • Are you actively trying to attract new members to your community? Why or why not?
  • How do you approach scale/size in your community/communities?
  • Do you experience pressure to grow? From where? Towards what end?
  • What kinds of connections do people seek in the community/communities you are a part of?
  • Can you imagine designs/interventions to draw benefits from small communities or sub-communities within larger projects/communities?
  • How to understand/set community members’ expectations regarding community size?
  • “Small communities promote group identity but not interpersonal relationships.” This seems counterintuitive.
  • How do you manage challenges around growth incentives/pressures?

Why People Join Multiple Communities

People join topical clusters of communities, which have more mutualistic relationships than competitive ones. There is a trilemma (like a dilemma) between large audience, specific content, and homophily (like-mindedness). No community can do everything, and it may be better for participants and communities to have multiple, overlapping spaces. This can be more engaging, generative, fulfilling, and productive. People develop portfolios of communities, which can involve many small communities.

Questions we had for each other:

  • Do members of your community also participate in similar communities?
  • What other communities are your members most often involved in?
  • Are they “competing” with you? Or “mutualistic” in some way?
  • In what other ways do they relate to your community?
  • There is a “trilemma” between the largest possible audience, specific content, and homophilous (likeminded/similar folks) community. Where does your community sit inside this trilemma?

Slides and videos

How you can get involved

You can subscribe to our mailing list! We’ll be making announcements about future events there. It will be a low volume mailing list.

Acknowledgements

Thanks to speakers Charlie Kiene, Jeremy Foote, Nate TeBlunthuis, and Sohyeon Hwang! Kaylea Champion was heavily involved in planning and decision making. The vision for the event borrows from the User and Open Innovation workshops organized by Eric von Hippel and colleagues, as well as others. This event and the research presented in it were supported by multiple awards from the National Science Foundation (DGE-1842165; IIS-2045055; IIS-1908850; IIS-1910202), Northwestern University, the University of Washington, and Purdue University.

Session summaries and questions above were created collaboratively by event attendees.

Conferences, Publications, and Congratulations

This year was packed with things we’re excited about and want to celebrate and share. Great things happened to Community Data Science Collective members within our schools and the wider research community.

A smol brown and golden dog in front of a red door. The dog is wearing a pink collar with ladybugs. She also has very judgemental (or excited) eyebrows.
Meet Tubby! Sohyeon adopted Tubby this year.

Academic Successes

Sohyeon Hwang (Northwestern) and Wm Salt Hale (University of Washington) earned their master’s degrees. You can read Salt’s paper, “Resilience in FLOSS,” online.

Charlie Kiene and Regina Cheng completed their comprehensive exams and are now PhD candidates!

Nate TeBlunthuis defended his dissertation and started a post-doctoral fellowship at Northwestern. Jim Maddock defended his dissertation on December 16th.

Congratulations to everyone!

Teaching and Workshop Participation

Floor Fiers and Sohyeon ran a workshop at Computing Everywhere, a Northwestern initiative to help students build computational literacy. Sohyeon and Charlie participated in Yale SMGI Community Driven Governance Workshop. We also had standout attendance at Social Computing Systems Summer Camp, with Sneha Narayan, Stefania Druga, Charlie, Regina, Salt, and Sohyeon participating.

Regina was a teaching assistant for senior undergraduate students on their capstone projects. Regina’s mentees won Best Design and Best Engineering awards.

Conference Presentations

Sohyeon and Jeremy Foote presented together at CSCW (Computer Supported Co-operative Work) where they earned a Best Paper Honorable Mention award. Nick Vincent had two presentations at CSCW, one relating to Wikipedia links in search engine results and one on conscious data contribution. Benjamin Mako Hill and Nate presented on algorithmic flagging on Wikipedia.

Salt was interviewed on the FOSS and Crafts podcast. His conference presentations included Linux App Summit, SeaGL, and DebConf. Kaylea Champion spoke at SeaGL and DebConf. Kaylea’s DebConf presentation was on her research on detecting at-risk projects in Debian.

Kaylea and Mako also presented at Software Analysis, Evolution and Reengineering, an IEEE conference.

Emilia Gan, Mako, Regina, and Stef organized the “Imagining Future Design of Tools for Youth Data Literacies” workshop at the 2021 Connected Learning Summit.

Our 2021 publications include:

  • Champion, Kaylea. 2021. “Underproduction: An approach for measuring risk in open source software.” 28th IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). pp. 388-399, doi: 10.1109/SANER50967.2021.00043.
  • Fiers, Floor, Aaron Shaw, and Eszter Hargittai. 2021. “Generous Attitudes and Online Participation.” Journal of Quantitative Description: Digital Media, 1. https://doi.org/10.51685/jqd.2021.008
  • Hill, Benjamin Mako, and Aaron Shaw. 2021. “The hidden costs of requiring accounts: Quasi-experimental evidence from peer production.” Communication Research 48(6): 771-795. https://doi.org/10.1177%2F0093650220910345.
  • Hwang, Sohyeon and Jeremy Foote. 2021. “Why do people participate in small online communities?” Proceedings of the ACM on Human-Computer Interaction, 5(CSCW2), 462:1-462:25. https://doi.org/10.1145/3479606
  • Shaw, Aaron and Eszter Hargittai. 2021. “Do the Online Activities of Amazon Mechanical Turk Workers Mirror Those of the General Population? A Comparison of Two Survey Samples.” International Journal of Communication 15: 4383–4398. https://ijoc.org/index.php/ijoc/article/view/16942
  • TeBlunthuis, Nathan, Benjamin Mako Hill, and Aaron Halfaker. 2021. “Effects of Algorithmic Flagging on Fairness: Quasi-experimental Evidence from Wikipedia.” Proc. ACM Hum.-Comput. Interact. 5, CSCW1, Article 56 (April 2021), 27 pages. https://doi.org/10.1145/3449130
  • TeBlunthuis, Nathan. 2021. “Measuring Wikipedia Article Quality in One Dimension.” In Proceedings of the 17th International Symposium on Open Collaboration (OpenSym ’21). Online: ACM Press. https://doi.org/10.1145/3479986.3479991.

Join us for the CDSC PhD Application Q & A

A five by three grid of photos. In each one is a person being a little silly.
From the CDSC Autumn 2020 retreat

Thinking about applying to graduate school? Wonder what it’s like to pursue a PhD? Interested in understanding relationships between technology and society? Curious about how to do research on online communities like Reddit, Wikipedia, or GNU/Linux? The Community Data Science Collective is hosting a Q&A on November 5th at 13:00 ET / 12:00 CT / 10:00 PT for prospective students. This session is scheduled for an hour, to be divided between a larger group session with faculty and then smaller groups with current graduate students.

Please register as soon as possible, and no later than 10am US Eastern Time (UTC -5) on November 5, to receive the link to the session. To submit questions ahead of time, please register by November 2. (We’ll do our best but might not be able to integrate questions that arrive after November 3rd.)

This is an opportunity for prospective grad students to meet with CDSC faculty, students, and staff. We’ll be there to answer any questions you have about the group, the work we do, your applications to our various programs, and other topics. You can either submit a question ahead of time or ask one during the session.

About the CDSC

We are an interdisciplinary research group spread across Carleton, Northwestern University, Purdue University, and the University of Washington. (Carleton is not accepting graduate students, though the other universities are.) You can read more about PhD opportunities on our blog.

We are mostly quantitative social scientists pursuing research about the organization of online communities, peer production, and learning and collaboration in social computing systems. Our group research blog and publications page can tell you more about our work.

Notes About Attending

We are so excited to meet you! Please RSVP online to let us know if you’re coming. This form also gives you the opportunity to ask a question ahead of time. By doing this, we’ll be able to make sure we get to your questions.

We will post another announcement with attendance information. We will also email attendance details to all registered attendees.

Before the session, you may want to look through our blog, our wiki, and our people pages.

Please register by 10am US Eastern Time (UTC -5) on November 5 at the latest, and by November 2 in order to make sure your question(s) are included. Questions submitted after November 3 may not be addressed in the large group session.