Notes from the CDSC Community Dialogue Series

This winter, the Community Data Science Collective launched a Community Dialogues series. These are meetings in which we invite community experts, organizers, and researchers to get together to share their knowledge of community practices and challenges, recent research, and how that research can be applied to support communities. We had our first meeting in February, with presentations from Jeremy Foote and Sohyeon Hwang on small communities and Nate TeBlunthuis and Charlie Kiene on overlapping communities.

Watch the introduction video starring Aaron Shaw!

Six grey and white birds sitting on a fence. A seventh bird is landing to join them.

“Joining the Community” by Infomastern is marked with CC BY-SA 2.0.

What we covered

Here are some quick summaries of the presentations. After the presentations, we formed small groups to discuss how what we learned related to our own experiences and knowledge of communities.

Finding Success in Small Communities

Small communities often stay small, medium ones stay medium, and big ones stay big. Meteoric growth is uncommon. User control and content curation improve user experience. Small communities help people define their expectations. Participation in small communities is often very salient and helps participants build group identity, but not personal relationships. Growth doesn’t mean success, and we need to move beyond growth, and beyond relying solely on quantitative metrics, to judge our success. Being small can be a feature, not a bug!

We built a list of discussion questions collaboratively. It included:

  • Are you actively trying to attract new members to your community? Why or why not?
  • How do you approach scale/size in your community/communities?
  • Do you experience pressure to grow? From where? Towards what end?
  • What kinds of connections do people seek in the community/communities you are a part of?
  • Can you imagine designs/interventions to draw benefits from small communities or sub-communities within larger projects/communities?
  • How to understand/set community members’ expectations regarding community size?
  • “Small communities promote group identity but not interpersonal relationships.” This seems counterintuitive.
  • How do you manage challenges around growth incentives/pressures?

Why People Join Multiple Communities

People join topical clusters of communities, which have more mutualistic relationships than competitive ones. There is a trilemma (like a dilemma) between large audience, specific content, and homophily (like-mindedness). No community can do everything, and it may be better for participants and communities to have multiple, overlapping spaces. This can be more engaging, generative, fulfilling, and productive. People develop portfolios of communities, which can involve many small communities.

Questions we had for each other:

  • Do members of your community also participate in similar communities?
  • What other communities are your members most often involved in?
  • Are they “competing” with you? Or “mutualistic” in some way?
  • In what other ways do they relate to your community?
  • There is a “trilemma” between the largest possible audience, specific content, and homophilous (likeminded/similar folks) community. Where does your community sit inside this trilemma?

Slides and videos

How you can get involved

You can subscribe to our mailing list! We’ll be making announcements about future events there. It will be a low volume mailing list.

Acknowledgements

Thanks to speakers Charlie Kiene, Jeremy Foote, Nate TeBlunthuis, and Sohyeon Hwang! Kaylea Champion was heavily involved in planning and decision making. The vision for the event borrows from the User and Open Innovation workshops organized by Eric von Hippel and colleagues, as well as others. This event and the research presented in it were supported by multiple awards from the National Science Foundation (DGE-1842165; IIS-2045055; IIS-1908850; IIS-1910202), Northwestern University, the University of Washington, and Purdue University.

Session summaries and questions above were created collaboratively by event attendees.

Catching up on the Collective’s 2021 PhD Q&A

On November 5, folks from the Community Data Science Collective held an open session to answer questions about our PhD programs. We’ve posted a 30-minute video of the first half of the event on our group YouTube channel that includes an introduction to the group and answers some basic questions that attendees had submitted in advance.

Video of the first 30 minutes of the PhD Q&A.

If you have additional questions, feel free to reach out to the CDSC faculty or to anybody else in the group. Contact information for everybody should be online.

Keep in mind that the first due date (University of Washington Department of Communication MA/PhD program) is November 15, 2021.

The rest of the deadlines at Purdue (Brian Lamb School of Communication), Northwestern (Media, Technology & Society; Technology & Social Behavior), and UW (Human-Centered Design & Engineering; Computer Science & Engineering; Information School) are in December—mostly on December 1st.

Join us for the CDSC PhD Application Q & A

A five by three grid of photos. In each one is a person being a little silly.
From the CDSC Autumn 2020 retreat

Thinking about applying to graduate school? Wonder what it’s like to pursue a PhD? Interested in understanding relationships between technology and society? Curious about how to do research on online communities like Reddit, Wikipedia, or GNU/Linux? The Community Data Science Collective is hosting a Q&A on November 5th at 13:00 ET / 12:00 CT / 10:00 PT for prospective students. This session is scheduled for an hour, to be divided between a larger group session with faculty and then smaller groups with current graduate students.

Please register as soon as possible, and no later than 10am US Eastern Time (UTC -5) on November 5, to receive the link to the session. To submit questions in advance, please register by November 2. (We’ll do our best but might not be able to integrate questions that arrive after November 3rd.)

This is an opportunity for prospective grad students to meet with CDSC faculty, students, and staff. We’ll be there to answer any questions you have about the group, the work we do, your applications to our various programs, and other topics. You can either submit a question ahead of time or ask one during the session.

About the CDSC

We are an interdisciplinary research group spread across Carleton, Northwestern University, Purdue University, and the University of Washington. (Carleton is not accepting graduate students, though the other universities are.) You can read more about PhD opportunities on our blog.

We are mostly quantitative social scientists pursuing research about the organization of online communities, peer production, and learning and collaboration in social computing systems. Our group research blog and publications page can tell you more about our work.

Notes About Attending

We are so excited to meet you! Please RSVP online to let us know if you’re coming. This form also gives you the opportunity to ask a question ahead of time. By doing this, we’ll be able to make sure we get to your questions.

We will post another announcement with attendance information. We will also email attendance details to all registered attendees.

Before the session, you may want to look through our blog, our wiki, and our people pages.

Please register by 10am US Eastern Time (UTC -5) on November 5. To make sure your question(s) are included, register by November 2. Questions submitted after November 3 may not be addressed in the large group session.

Open Lab at the University of Washington

If you are at the University of Washington (or not at UW but in Seattle) and are interested in seeing what we’re up to, you can join us for a Community Data Science Collective “open lab” this Friday (April 6th) 3-5pm in our new lab space (CMU 306). Collective members from Northwestern University will be in town as well, so there’s even more reason to come!

The open lab is an opportunity to learn about our research, catch up over snacks and beverages, and pick up a sticker or two. We will have no presentations but several posters describing projects we are working on.

Introduction to R workshop

I recently taught a two-session workshop introducing R to Kellogg MBA students. I had  a few goals for the workshops:

  1. Convince students of the benefits of using text-based programming for data exploration and analysis
  2. Introduce basic programming concepts (e.g., variables, functions)
  3. Give students a basic understanding of how to do some fundamental data analysis tasks in R: importing, cleaning, visualizing, and modeling

Those are really big goals for only four hours. I decided to use the tidyverse as much as possible and not even teach base R syntax like ‘[,]’, apply, etc. I used the first session to show and explain code using the nycflights13 dataset. For the second session we did a few more examples but mostly worked on exercises using a dataset from Wikia that I created (with help from Mako and Aaron Halfaker’s code and data).

Learning R does have its downsides

Retrospection

Overall, I think that the workshops went pretty well. I think that students definitely have a better understanding and a better set of tools than I did after I had used R for four hours!

That being said, there was plenty of room for improvement. I am scheduled to teach another set of workshops early next year and I’m planning to make a few changes:

  1. Make both of the workshops more hands-on and interactive. I think I’ll divide the topics covered: the first workshop will be on importing, cleaning, and grouping data and the second will be on visualizing and creating inferential models.
  2. Get more help – teaching non-programmers R requires some hand-holding and individual attention. To be successful, I think a workshop like this requires 1 “TA” for every 8-10 students.
  3. Find a more relevant dataset. Although I actually learned a few things about my dataset that will help with my papers that use it, I think it would be better to have a dataset that is as similar as possible to what students will be working with in their careers.
  4. Connect the visualization and regression more directly to a specific analysis problem rather than as syntax-learning exercises.

Reuse this workshop!

I found some pretty good resources already in existence for introducing students to R, but none of them quite fit the scope of what I was looking for. All of the code that I used (as well as some slides for the beginning of class) is on GitHub and GPL-licensed. Please reuse my work and submit pull requests!

Community Data Science Workshops in Spring 2015

The Community Data Science Workshops are a series of project-based workshops being held at the University of Washington for anyone interested in learning how to use programming and data science tools to ask and answer questions about online communities like Wikipedia, Twitter, free and open source software, and civic media.

The workshops are for people with absolutely no previous programming experience and they bring together researchers and academics with participants and leaders in online communities.  The workshops are run entirely by volunteers and are entirely free of charge for participants, generously sponsored by the UW Department of Communication and the eScience Institute. Participants from outside UW are encouraged to apply.

There will be a mandatory evening setup session 6:00-9:00pm on Friday April 10 and three workshops held from 9am-4pm on three Saturdays (April 11 and 25 and May 9). Each Saturday session will involve a period for lecture and technical demonstrations in the morning. This will be followed by a lunch graciously provided by the eScience Institute at UW. The rest of the day will be devoted to group work on programming and data science projects supported by more experienced mentors.

Setup and Programming Tutorial (April 10 evening) — Because we expect to hit the ground running on our first full day, we will meet to help participants get software installed and to work through a self-guided tutorial that will help ensure that everyone has the skills and vocabulary to start programming and learning when we meet the following morning.

Introduction to Programming (April 11) — Programming is an essential tool for data science and is useful for solving many other problems. The goal of this session will be to introduce programming in the Python programming language. Each participant will leave having solved a real problem and will have built their first real programming project.

Importing Data from web APIs (April 25)  — An important step in doing data science is collecting data. The goal of this session will be to teach participants how to get data from the public application programming interfaces (“APIs”) common to many social media and online communities. Although we will use the APIs provided by Wikipedia and Twitter in the session, the principles and techniques are common to many other online communities.
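To give a flavor of what getting data from an API looks like, here is a minimal sketch in Python. It is an illustration only and is not taken from the workshop curriculum. It assumes the public MediaWiki query endpoint on English Wikipedia and its standard `action=query` / `prop=revisions` parameters. The sample response and the usernames in it are made up so that the example runs without a network connection.

```python
import json
from urllib.parse import urlencode

# Assumption: the public MediaWiki API endpoint for English Wikipedia.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_revisions_url(title, limit=5):
    """Build a URL asking for the most recent revisions of one article."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "user|timestamp",  # only ask for who edited and when
        "rvlimit": limit,
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


def extract_editors(payload):
    """Pull the editor usernames out of a parsed JSON response."""
    editors = []
    for page in payload["query"]["pages"].values():
        for revision in page.get("revisions", []):
            editors.append(revision["user"])
    return editors


# Hypothetical, hand-written response in the shape the API returns.
SAMPLE_RESPONSE = json.loads("""
{
  "query": {
    "pages": {
      "12345": {
        "pageid": 12345,
        "title": "Data science",
        "revisions": [
          {"user": "ExampleUserA", "timestamp": "2015-03-01T12:00:00Z"},
          {"user": "ExampleUserB", "timestamp": "2015-02-27T08:30:00Z"}
        ]
      }
    }
  }
}
""")

url = build_revisions_url("Data science")
editors = extract_editors(SAMPLE_RESPONSE)
print(url)
print(editors)
```

In a live session you would fetch the URL (for example with `urllib.request.urlopen`) and pass the parsed JSON to `extract_editors` instead of the canned sample.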

Data Analysis and Visualization (May 9) — The goal of data science is to use data to answer questions. In our final session, we will use the Python skills we learned in the first session and the datasets we’ve created in the second to ask and answer common questions about the activity and health of online communities. We will focus on learning how to generate visualizations, create summary statistics, and test hypotheses.

Our goal is that, after the three workshops, participants will be able to use data to produce numbers, hypothesis tests, tables, and graphical visualizations to answer questions like:

  • Are new contributors in Wikipedia this year sticking around longer or contributing more than people who joined last year?
  • Who are the most active or influential users of a particular Twitter hashtag?
  • Are people who join through a Wikipedia outreach event staying involved? How do they compare to people who decide to join the project outside of the event?
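As a rough sketch of how the first question above might be approached, the Python below compares two cohorts of contributors using only the standard library. Everything in it is hypothetical: the field names (`edits`, `active_days`), the 30-day retention cutoff, and the numbers themselves are invented for illustration and do not come from real Wikipedia data or from the workshop materials.

```python
from statistics import mean

# Assumption: a contributor counts as "retained" if they were still
# active at least this many days after their first edit.
RETENTION_CUTOFF_DAYS = 30


def summarize_cohort(contributors):
    """Return the size, mean edit count, and retention rate of a cohort."""
    retained = [
        c for c in contributors if c["active_days"] >= RETENTION_CUTOFF_DAYS
    ]
    return {
        "n": len(contributors),
        "mean_edits": mean(c["edits"] for c in contributors),
        "retention_rate": len(retained) / len(contributors),
    }


# Made-up cohorts: one record per new contributor.
last_year = [
    {"edits": 2, "active_days": 3},
    {"edits": 10, "active_days": 45},
    {"edits": 4, "active_days": 31},
    {"edits": 8, "active_days": 10},
]
this_year = [
    {"edits": 1, "active_days": 1},
    {"edits": 3, "active_days": 40},
    {"edits": 5, "active_days": 2},
    {"edits": 7, "active_days": 5},
]

summary_last = summarize_cohort(last_year)
summary_this = summarize_cohort(this_year)
print("last year:", summary_last)
print("this year:", summary_this)
```

Summary numbers like these are only a starting point. Deciding whether a difference between cohorts is meaningful is where hypothesis tests come in.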

Earlier versions of the workshops were run in Spring and Fall 2014, and the curricula we used for both are online.

Sign up and Participate!

Participants! If you are interested in learning data science, please fill out our registration form here. The deadline to register is Friday April 3.  We will let participants know if we have room for them by Monday April 6. Space is limited and will depend on how many mentors we can recruit for the sessions.

Interested in being a mentor? If you already have experience with Python, please consider helping out at the sessions as a mentor. Being a mentor will involve working with participants and talking them through the challenges they encounter in programming. No special preparation is required. And we’ll feed you!  Because we want to keep a very high mentor-to-student ratio, recruiting more mentors means we can accept more participants. If you’re interested you can fill out this form or email makohill@uw.edu. Also, thank you, thank you, thank you!

About the Organizers

The workshops are being coordinated and organized by Benjamin Mako Hill, Dharma Dailey, Jonathan Morgan, Ben Lewis, and Tommy Guy, along with a long list of other volunteer mentors. The workshops have been designed with lots of help and inspiration from Shauna Gordon-McKeon and Asheesh Laroia of OpenHatch and lots of inspiration from the Boston Python Workshop.

These workshops are an all-volunteer effort. Fundamentally, we’re doing this because we’re programmers and data scientists who work in online communities and we really believe that the skills you’ll learn in these sessions are important and empowering tools.

The workshops are being supported by the UW Department of Communication and the eScience Institute.

If you have any questions or concerns, please contact Benjamin Mako Hill at makohill@uw.edu.

Photo from the Boston Python Workshop – a similar workshop run in Boston that has inspired and provided a template for the CDSW.

Community Data Science Workshops in November 2014

The Community Data Science Workshops in November 2014 are a series of project-based workshops being held at the University of Washington for anyone interested in learning how to use programming and data science tools to ask and answer questions about online communities like Wikipedia, Twitter, free and open source software, and civic media.

The workshops are for people with absolutely no previous programming experience and they bring together researchers and academics with participants and leaders in online communities.  The workshops are run entirely by volunteers and are entirely free of charge for participants, generously sponsored by the UW Department of Communication and the eScience Institute. Participants from outside UW are encouraged to apply.

There will be a mandatory evening setup session 6:30-9:30pm on Friday November 7 and three workshops held from 9am-4pm on three Saturdays in November (November 8, 15, and 22). Each Saturday session will involve a period for lecture and technical demonstrations in the morning. This will be followed by a lunch graciously provided by the eScience Institute at UW. The rest of the day will be devoted to group work on programming and data science projects supported by more experienced mentors.

Setup and Programming Tutorial (November 7 evening) — Because we expect to hit the ground running on our first full day, we will meet to help participants get software installed and to work through a self-guided tutorial that will help ensure that everyone has the skills and vocabulary to start programming and learning when we meet the following morning.

Introduction to Programming (November 8) — Programming is an essential tool for data science and is useful for solving many other problems. The goal of this session will be to introduce programming in the Python programming language. Each participant will leave having solved a real problem and will have built their first real programming project.

Importing Data from Wikipedia and Twitter APIs (November 15)  — An important step in doing data science is collecting data. The goal of this session will be to teach participants how to get data from the public application programming interfaces (“APIs”) common to many social media and online communities. Although we will use the APIs provided by Wikipedia and Twitter in the session, the principles and techniques are common to many other online communities.

Data Analysis and Visualization (November 22) — The goal of data science is to use data to answer questions. In our final session, we will use the Python skills we learned in the first session and the datasets we’ve created in the second to ask and answer common questions about the activity and health of online communities. We will focus on learning how to generate visualizations, create summary statistics, and test hypotheses.

Our goal is that, after the three workshops, participants will be able to use data to produce numbers, hypothesis tests, tables, and graphical visualizations to answer questions like:

  • Are new contributors in Wikipedia this year sticking around longer or contributing more than people who joined last year?
  • Who are the most active or influential users of a particular Twitter hashtag?
  • Are people who join through a Wikipedia outreach event staying involved? How do they compare to people who decide to join the project outside of the event?

An earlier version of the workshops was run between April and May 2014 and the curriculum we used in the Spring is available online.

Sign up and Participate!

Participants! If you are interested in learning data science, please fill out our registration form here. The deadline to register is Thursday October 30.  We will let participants know if we have room for them by Saturday November 1. Space is limited and will depend on how many mentors we can recruit for the sessions.

Interested in being a mentor? If you already have experience with Python, please consider helping out at the sessions as a mentor. Being a mentor will involve working with participants and talking them through the challenges they encounter in programming. No special preparation is required. And we’ll feed you!  Because we want to keep a very high mentor-to-student ratio, recruiting more mentors means we can accept more participants. If you’re interested,  email makohill@uw.edu. Also, thank you, thank you, thank you!

About the Organizers

The workshops are being coordinated and organized by Benjamin Mako Hill, Frances Hocutt, Jonathan Morgan, and Tommy Guy, along with a long list of other volunteer mentors. The workshops have been designed with lots of help and inspiration from Shauna Gordon-McKeon and Asheesh Laroia of OpenHatch and lots of inspiration from the Boston Python Workshop.

These workshops are an all-volunteer effort. Fundamentally, we’re doing this because we’re programmers and data scientists who work in online communities and we really believe that the skills you’ll learn in these sessions are important and empowering tools.

The workshops are being supported by the UW Department of Communication and the eScience Institute.

If you have any questions or concerns, please contact Benjamin Mako Hill at makohill@uw.edu.

Photo from the Boston Python Workshop – a similar workshop run in Boston that has inspired and provided a template for the CDSW.

Community Data Science Workshop Post-Mortem

[Reposted from Benjamin Mako Hill’s blog Copyrighteous.]

Earlier this year, I helped plan and run the Community Data Science Workshops: a series of three (and a half) day-long workshops designed to help people learn basic programming and data science tools in order to ask and answer questions about online communities like Wikipedia and Twitter. You can read our initial announcement for more about the vision.

The workshops were organized by myself, Jonathan Morgan from the Wikimedia Foundation, long-time Software Carpentry teacher Tommy Guy, and a group of 15 volunteer “mentors” who taught project-based afternoon sessions and worked one-on-one with more than 50 participants. With overwhelming interest, we were ultimately constrained by the number of mentors who volunteered. Unfortunately, this meant that we had to turn away most of the people who applied. Although it was not emphasized in recruiting or used as a selection criterion, a majority of the participants were women.

The workshops were all free of charge and sponsored by the UW Department of Communication, who provided space, and the eScience Institute, who provided food.

The curriculum for all four sessions is online.

The workshops were designed for people with no previous programming experience. Although most of our participants were from the University of Washington, we had non-UW participants from as far away as Vancouver, BC.

Feedback we collected suggests that the sessions were a huge success, that participants learned enormously, and that the workshops filled a real need in the Seattle community. Between workshops, participants organized meet-ups to practice their programming skills.

Most excitingly, just as we based our curriculum for the first session on the Boston Python Workshop’s, others have been building off our curriculum. Elana Hashman, who was a mentor at the CDSW, is coordinating a set of Python Workshops for Beginners with a group at the University of Waterloo and with sponsorship from the Python Software Foundation using curriculum based on ours. I also know of two university classes that are tentatively being planned around the curriculum.

Because a growing number of groups have been contacting us about running their own events based on the CDSW — and because we are currently making plans to run another round of workshops in Seattle late this fall — I coordinated with a number of other mentors to go over participant feedback and to put together a long write-up of our reflections in the form of a post-mortem. Although our emphasis is on things we might do differently, we provide a broad range of information that might be useful to people running a CDSW (e.g., our budget). Please let me know if you are planning to run an event so we can coordinate going forward.

Community Data Science Workshops at UW

Photo from the Boston Python Workshop – a similar workshop run in Boston that has inspired and provided a template for the CDSW.

The Community Data Science Workshops are a series of project-based workshops being held at the University of Washington for anyone interested in learning how to use programming and data science tools to ask and answer questions about online communities like Wikipedia, Twitter, free  and open source software, and civic media.

The workshops are for people with no previous programming experience. The goal is to bring together both researchers and academics as well as participants and leaders in online communities.  The workshops will all be free of charge. Participants from outside UW are encouraged to apply.

There will be three workshops held from 9am-4pm on three Saturdays in April and May. Each session will involve a period for lecture and technical demonstrations in the morning. This will be followed by a lunch graciously provided by the eScience Institute at UW. The rest of the day will be devoted to group work on programming and data science projects supported by more experienced mentors.

Introduction to Programming (April 5) — Programming is an essential tool for data science and is useful for solving many other problems. The goal of this session will be to introduce programming in the Python programming language. Each participant will leave having solved a real problem and will have built their first real program in their group. We will be relying on the curriculum from the Boston Python Workshops. Because we expect to hit the ground running, we will also run a session in the evening of Friday April 4 to help participants get software installed.

Importing Data from Wikipedia and Twitter APIs (May 3) — An important step in doing data science is collecting data. The goal of this session will be to teach participants how to get data from the public application programming interfaces (“APIs”) common to many social media and online communities. Although we will use the APIs provided by Wikipedia and Twitter in the session, the principles and techniques are common to many online communities.

Data Analysis and Visualization (May 31) — The goal of data science is to use data to answer questions. In our final session, we will use the Python skills we learned in the first session and the datasets we’ve created in the second to ask and answer common questions about the activity and health of online communities. We will focus on learning how to generate visualizations, create summary statistics, and test hypotheses.

Our goal is that, after the three workshops, participants will be able to use data to produce numbers, hypothesis tests, tables, and graphical visualizations to answer questions like:

  • Are new contributors to an article in Wikipedia sticking around longer or contributing more than people who joined last year?
  • Who are the most active or influential users of a particular Twitter hashtag?
  • Are people who participated in a Wikipedia outreach event staying involved? How do they compare to people that joined the project outside of the event?

Our first session will be modeled after the Boston Python Workshops, but the curriculum of the later sessions is still in development and will be influenced by the needs of the participants.

Sign up and Participate!

Participants! If you are interested in learning data science, fill out our registration form here. The deadline to register is Wednesday March 26th.  We will let participants know if we have room for them by Saturday March 29th. Space is limited and will depend on how many mentors we can recruit for the sessions.

Interested in being a mentor? If you already have experience with Python, please consider helping out at the sessions as a mentor. Being a mentor will involve working with participants and talking them through the challenges they encounter in programming. No special preparation is required. And we’ll feed you!  Because we want to keep a very high mentor to student ratio, recruiting more mentors means we can accept more participants. If you’re interested,  email makohill@uw.edu. Also, thank you, thank you, thank you!

About the Organizers

The workshops are being coordinated, organized, and led by Benjamin Mako Hill at the University of Washington Department of Communication and Jonathan Morgan at the Wikimedia Foundation. They have been designed with lots of help and inspiration from Shauna Gordon-McKeon and Asheesh Laroia of OpenHatch and lots of inspiration from the Boston Python Workshop.

These workshops are an all-volunteer effort. Fundamentally, we’re doing this because we’re programmers and data scientists that work in online communities and we really believe that the skills you’ll learn in these sessions are important and empowering tools.

The workshops are being supported by the UW Department of Communication and the eScience Institute.

If you have any questions or concerns, contact Benjamin Mako Hill at makohill@uw.edu.
