I recently taught a two-session workshop introducing R to Kellogg MBA students. I had a few goals for the workshops:
- Convince students of the benefits of using text-based programming for data exploration and analysis
- Introduce basic programming concepts (e.g., variables, functions)
- Give students a basic understanding of how to do some fundamental data analysis tasks in R: importing, cleaning, visualizing, and modeling
Those are really big goals for only four hours. I decided to use the tidyverse as much as possible and not even teach base R syntax like ‘[,]’, apply, etc. I used the first session to show and explain code using the nycflights13 dataset. For the second session, we did a few more examples but mostly worked on exercises using a dataset from Wikia that I created (with help from Mako and Aaron Halfaker’s code and data).
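To give a sense of why the tidyverse seemed like the friendlier starting point, here is a minimal sketch (not the actual workshop code) that computes the average arrival delay per carrier in the nycflights13 `flights` table, first with dplyr and then with roughly equivalent base R. The variable names `delay_by_carrier`, `arrived`, and `base_means` are my own choices for illustration, and the sketch assumes the dplyr and nycflights13 packages are installed.

```r
library(dplyr)
library(nycflights13)

# Tidyverse style: each step reads as a verb applied to the data.
delay_by_carrier <- flights %>%
  filter(!is.na(arr_delay)) %>%   # drop flights with no recorded arrival delay
  group_by(carrier) %>%
  summarise(
    mean_delay = mean(arr_delay), # average arrival delay in minutes
    n_flights  = n()              # number of flights per carrier
  ) %>%
  arrange(desc(mean_delay))

# Roughly equivalent base R, using `[ , ]` indexing and an apply-family function.
arrived    <- flights[!is.na(flights$arr_delay), ]
base_means <- tapply(arrived$arr_delay, arrived$carrier, mean)
base_means <- sort(base_means, decreasing = TRUE)
```

Both versions produce the same per-carrier means, but the piped version can be read top to bottom as a sequence of steps, which is easier to explain to people who have never programmed before.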
Retrospection
Overall, I think the workshops went pretty well. Students came away with a better understanding and a better set of tools than I had after my own first four hours with R!
That being said, there was plenty of room for improvement. I am scheduled to teach another set of workshops early next year and I’m planning to make a few changes:
- Make both of the workshops more hands-on and interactive. I think I’ll divide the topics covered: the first workshop will be on importing, cleaning, and grouping data and the second will be on visualizing and creating inferential models.
- Get more help – teaching non-programmers R requires some hand-holding and individual attention. To be successful, I think a workshop like this requires 1 “TA” for every 8-10 students.
- Find a more relevant dataset. Although I actually learned a few things about my dataset that will help with my papers that use it, I think it would be better to have a dataset that is as similar as possible to what students will be working with in their careers.
- Connect the visualization and regression material more directly to a specific analysis problem, rather than presenting it as syntax-learning exercises.
Reuse this workshop!
I found some pretty good resources already in existence for introducing students to R, but none of them quite fit the scope of what I was looking for. All of the code that I used (as well as some slides for the beginning of class) is on GitHub and GPL-licensed. Please reuse my work and submit pull requests!