Mapping the many pathways of learning in online communities

Thousands of widely used online communities are designed to promote learning. Although some rely on formal educational approaches like lesson plans, curriculum, and tests, many of the most successful learning communities online are structured as what scholars call a community of practice (CoP). In CoPs, members mentor and apprentice with each other (both formally and informally) while working toward a common interest or goal. For example, the Scratch online community is a CoP where millions of young people share and collaborate on programming projects.

Despite an enormous amount of attention paid to online CoPs, there is still a lot of disagreement about the best ways to promote learning in them. One source of disagreement stems from the fact that participants in CoPs are learning a number of different kinds of things and designers are often trying to support many types of learning at once. In a new paper that I’ve published—and that I will be presenting at CSCW this week—I conduct quantitative analyses on data from Scratch to show that there is a complex set of learning pathways at play in CoPs like Scratch. Types of participation that are associated with some important kinds of learning are often unrelated to, or even negatively associated with, other important types of learning outcomes. 

The Scratch online community (left) and an example of a programming project in Scratch (right). 

So what exactly are people learning in CoPs?  We dug into the CoP literature and identified three major types of learning outcomes: 

  • Learning about the domain, which refers to learning knowledge and skills for the core tasks necessary for achieving the explicit goal in the community. In Scratch, this is learning to code.
  • Learning about the community, which means the development of identity as a community member, forming relationships, affinities, and a sense of belonging. In Scratch, this involves learning to interact with others users and developing an identity as a community member.
  • Learning about the practice, which means adopting community specific values, such as the style of contribution that will be accepted and appreciated by its members. In Scratch, this means becoming a valued and respected contributor to the community.

So what types of participation might contribute to learning in a CoP?  We identified several different types of newcomers’ participation that may support learning:

  • Contribution to core tasks which involves direct work towards the community’s explicit goal. In Scratch, this often involves making original programming projects.
  • Engagement with practice proxies which involves observing and participating in others’ work practices. In Scratch, this might mean remixing others’ projects by making changes and building on existing code. 
  • Feedback exchange with community members about their contributions. In Scratch, this often involves writing comments on others’ projects.
  • Social bonding with community members. In Scratch, this can involve “friending” others, which allows a user to follow others’ projects and updates.
A visual representation of our study design.

We conducted a quantitative analysis on how the different types of newcomer participation contribute to the different learning outcomes. In other words, we tested for the presence/absence and the direction of the relationships (shown as the orange arrows) between each of the learning outcomes on the top of the figure and each of the types of newcomer participation on the bottom. To conduct these tests, we used data from Scratch to construct a user level dataset with proxy measures for each type of learning and type of newcomer participation as well as a series of important control variables. All the technical details about the measures and models are in the paper. 

Overall, what we found was a series of complex trade-offs that suggest the kinds of things that support one type of learning frequently do not support others. For example, we found that contribution to core tasks as a newcomer is positively associated with learning about the domain in the long term, but negatively associated with learning about the community and its practices. We found that engagement with practice proxies as a newcomer is negatively associated with long-term learning about the domain and the community. Engaging in feedback exchange and social bonding as a newcomer, on the other hand, are positively associated with learning about the community and its practice.

Our findings indicate that there are no easy solutions: different types of newcomer participation provide varying support for different learning outcomes. What is productive for some types of learning outcomes can be unhelpful for others, and vice versa. For example, although social features like feedback mechanisms and systems for creating social bonds may not be a primary focus of many learning systems, they could be implemented to help users develop a sense of belonging in the community and learn about community specific values. At the same time, while contributing to core tasks may help with domain learning, direct contribution may often be too difficult and might discourage newcomers from staying in the community and learn about its values.


The paper and this blog post are collaborative work between Ruijia “Regina” Cheng and Benjamin Mako Hill. The paper is being published this month(open access) in the Proceedings of the ACM on Human-Computer Interaction The full citation for this paper is: Ruijia Cheng and Benjamin Mako Hill. 2022. Many Destinations, Many Pathways: A Quantitative Analysis of Legitimate Peripheral Participation in Scratch. Proc. ACM Hum.-Comput. Interact. 6, CSCW2, Article 381 (November 2022), 26 pages https://doi.org/10.1145/3555106

The paper is also available as an arXiv preprint and in the ACM Digital Library. The paper is being presented several times at the Virtual CSCW conference taking place in November 2022. Both Regina and Mako are happy to answer questions over email, in the comments on this blog post, or at the one remaining presentation slot at the CSCW conference on November 16th at 8-9pm Pacific Time. 

How does Interest-driven Participation Shape Computational Learning in Online Communities?

Online communities are frequently described as promising sites for computing education. Advocates of online communities as contexts for learning argue that they can help novices learn concrete programming skills through self-directed and interest-driven work. Of course, it is not always clear how well this plays out in practice—especially when it comes to learning challenging programming concepts. We sought to understand this process through a mixed-method case study of the Scratch online community that will be published and presented at the ACM Conference on Human Factors in Computing (CHI 2022) in several weeks.

Scratch is the largest online interest-driven programming community for novices. In Scratch, users can create programming projects using the visual-block based Scratch programming language. Scratch users can choose to share their projects—and many do—so that they can be seen, interacted with, and remixed by other Scratch community members. Our study focused on understanding how Scratch users learn to program with data structures (i.e., variables and lists)—a challenging programming concept for novices—by using community-produced learning resources such as discussion threads and curated project examples. Through a qualitative analysis on Scratch forum discussion threads, we identified a social feedback loop where participation in the community raises the visibility of some particular ways of using variables and lists in ways that shaped the nature and diversity of community-produced learning resources. In a follow-up quantitative analysis on a large collection of Scratch projects, we find statistical support for this social process. 

A program made by a stack of Scratch programming blocks. From the top to bottom: ``when clicked,'' ``forever,'' ``if touching Bat? then,'' ``change score by -1.''
A Scratch project code of a score counter in a game. 

As the first step of our study, we collected and qualitatively analyzed 400 discussion threads about variables and lists in the Scratch Q&A forums. Our key finding was that Scratch users use specific, concrete examples to teach each other about variables and lists. These examples are commonly framed in terms of elements in the projects that they are making, often specific to games.

For instance, we observed users teach each other how to make a score counter in a game using variables. In another example, we saw users sharing tips on creating an item inventory in a game using lists. As a result of this focus on specific game elements, user-generated examples and tutorials are often framed in the specifics of these game-making scenarios. For example, a lot of sample Scratch code on variables and lists were from games with popular elements like scores and inventories. While these community-produced learning resources offers valuable concrete examples, not everybody is interested in making games. We some some evidence that users who are not interested in making games involving scores and inventories were less likely to get effective support when they sought to learn about variables. We argue that repeated over time, this dynamic can lead to a social feedback loop where reliance on community-generated resources can place innovative forms of creative coding at a disadvantage compared to historically common forms.

This diagram illustrates the hypothetical social feedback loop that we constructed based on our findings in Study 1. The diagram starts with the box of ``Stage 1'' on the left, and the text that explains Stage 1 says: ``learners create artifacts with Use Case A.'' There is a right-going arrow pointing from Stage 1 to the box of ``Stage 2'' on the right and the text on the arrow says: ``learners turn to the community for help or inspiration.'' The text that explains Study 2 says: ``Community cumulates learning resources framed around Use Case A.'' Above these is a left-going arrow that points from Stage 2 back to Stage 1, forming the loop. The text on the arrow says: ``Subsequent learners get exposed to resources about Use Case A.'' Underneath the entire loop there is an down-going arrow pointing to a box of ``Outcome.'' The text that explains Outcome says: ``Use Case A becomes archetypal. Other innovative use cases become less common.''
Our proposed hypothetical social feedback loop of how community-generated resources may constrain innovative computational participation. 

The graph here is a visualization of the social feedback loop theory that we proposed. Stage 1 suggests that, in an online interest-driven learning community, some specific applications of a concept (“Use Case A”) will be more popular than others. This might be due to random chance or any number of reasons. When seeking community support, learners will tend to ask questions framed specifically around Use Case A and use community resources framed in terms of the same use case. Stage 2 shows the results of this process. As learners receive support, they produce new artifacts with Use Case A that can serve as learning resources for others. Then, learners in the future can use these learning resources, becoming even more likely to create the same specific application. The outcome of the feedback loop is that, as certain applications of a concept become more popular over time, the community’s learning resources are increasingly focused on the same applications.

We tested our social feedback loop theory using 5 years of Scratch data including 241,634 projects created by 75,911 users. We tested both the mechanism and the outcome of the loop from multiple angles in terms of three hypotheses that we believe will be true if our the feedback loop we describe is shaping behavior:

  1. More projects involving variables and lists will be games over time.
  2. The type of project elements that users make with variables and lists (we defined it as the names that they gave to variables and lists) will be more homogenous.
  3. Users who have been exposed to popular variable and list names will be more likely to use those names in their own projects. We found at least some support for all of our hypotheses.

Our results provide broad (if imperfect) support for our social feedback loop theory. For example, the graph below illustrates one of our findings: users who have been exposed to popular list names (solid line) will be more likely to use (in other words, less likely to never use) popular names in their projects, compared to users who have never downloaded projects with popular list names (dashed line). 

This figure is a line plot that illustrates the curves from the survival analysis for lists. The x-axis is ``Number of shared de novo projects w/ list.'' The labels are ``0'', ``10'', ``20'', and ``30'' from left to right. The y-axis is ``Proportion of users who have never used popular variable names.'' The labels are ``0.00'', ``0.25'', ``0.50'', ``0.75'', and ``1.00'' from bottom to top. There are two lines. The dashed line represents users who never downloaded projects with popular variable names. The solid line represents users who has downloaded projects with popular variable names.The solid line starts at 1 on x-axis and approximately 0.75 on the y-axis. The solid line descends in a convex shape and when it reaches 10 on the x-axis, it is at around 0.25 on the y-axis. The line keeps descending, reaches around 0.05 on the y-axis when it is at 25 on the x-axis, and stays at 0.05 for the rest of the x-axis. The dashed line is significantly higher than the solid line and stays above it the entire graph. The dashed line starts at 1 on x-axis and approximately 0.88 on the y-axis. The dashed line descends in a convex shape that is less steep than the solid line, and when it reaches 10 on the x-axis, it is at around 0.50 on the y-axis. The line keeps descending, reaches around 0.24 on the y-axis when it is at 25 on the x-axis, and stays at 0.24 for the rest of the x-axis.
Plots from our cox proportional survival analysis on the difference between users who have previously downloaded projects with popular list names versus those who have never done so. 

The results from our study describe an important trade-off that designers of online communities in computational learning need to be aware of. On the one hand, learners can learn advanced computational concepts by building their own explanation and understanding on specific use cases that are popular in the community. On the other, such learning can be superficial and not conceptual or generalizable: learners’ preference for peer-generated learning resources around specific interests can restrict the exploration of broader and more innovative uses, which can potentially limit sources of inspiration, pose barriers to broadening participation, and confine learners’ understanding of general concepts. We conclude our paper suggesting several design strategies that might be effective in countering this effect.


Please refer to the preprint of the paper for more details on the study and our design suggestions for future online interest-driven learning communities. We’re excited that this paper has been accepted to CHI 2022 and received the Best Paper Honorable Mention Award! It will be published in the Proceedings of the ACM on Human-Computer Interaction and presented at the conference in May. The full citation for this paper is:

Ruijia Cheng, Sayamindu Dasgupta, and Benjamin Mako Hill. 2022. How Interest-Driven Content Creation Shapes Opportunities for Informal Learning in Scratch: A Case Study on Novices’ Use of Data Structures. In CHI Conference on Human Factors in Computing Systems (CHI ’22), April 29-May 5, 2022, New Orleans, LA, USA. ACM, New York, NY, USA, 16 pages. https://doi.org/10.1145/3491102.3502124

If you have any questions about this research, please feel free to reach out to one of the authors: Ruijia “Regina” Cheng, Sayamindu Dasgupta, and Benjamin Mako Hill.