How Adopting Reproducible Practices Can Benefit Data Science Education

Title: Tools and Recommendations for Reproducible Teaching

Author(s) and Year: Mine Dogucu and Mine Çetinkaya-Rundel, 2022.

Journal: Journal of Statistics and Data Science Education (Open Access)

As the fields of statistics and data science have grown, so has awareness of the importance of reproducible research and of the need to address the “replication crisis.” When scientific results cannot be reproduced from the same data and code, confidence in their validity erodes, and it becomes harder for others to build on them and advance scientific knowledge.

But reproducibility matters beyond research. In a recent article in the Journal of Statistics and Data Science Education, Mine Dogucu and Mine Çetinkaya-Rundel propose reproducible teaching practices, arguing that the reproducibility standards instructors apply in their research should extend to how they teach. As the authors state, “our vision for the modern teacher-scholars is having consistent reproducibility practices in how they conduct research, what they teach to students, and how they prepare teaching materials.”

In other words, just as researchers are expected to follow reproducible workflows in their own studies, educators should also model these practices for their students. This includes using computational reproducibility methods like version control, producing documentation of data and software, and following open science practices in the classroom. By doing so, teachers can lead by example for their students.

Figure 1. The authors illustrate a structural template of a directory that resembles a weekly course material folder from one of their statistics and data science classes. The dotted branching lines indicate subfolders and files within folders. Note how the directory’s contents document all data and code, and each aspect of the course is consolidated into its own separate folder for easy access and readability. Image reproduced without modification from Figure 1 of Dogucu and Çetinkaya-Rundel, 2022.
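To make the idea of a consolidated weekly folder concrete, here is a minimal Python sketch that scaffolds one week of course materials. The subfolder names (`slides`, `data`, `code`, `homework`), the `week-01` naming scheme, and the `scaffold_week` helper are hypothetical choices for illustration, not the exact template shown in Figure 1.

```python
import tempfile
from pathlib import Path

# Hypothetical component folders; the template in Figure 1 may use different names.
SUBFOLDERS = ["slides", "data", "code", "homework"]


def scaffold_week(root: Path, week: int) -> Path:
    """Create week-NN/ under root with one subfolder per course component and a README."""
    week_dir = root / f"week-{week:02d}"
    for name in SUBFOLDERS:
        # exist_ok=True makes the script safe to re-run without clobbering existing folders.
        (week_dir / name).mkdir(parents=True, exist_ok=True)
    readme = week_dir / "README.md"
    if not readme.exists():
        # A README is where the week's data sources and code usage get documented.
        readme.write_text(f"# Week {week}\n\nDocument data sources and code here.\n")
    return week_dir


if __name__ == "__main__":
    # Build the layout in a throwaway directory and list what was created.
    with tempfile.TemporaryDirectory() as tmp:
        week_dir = scaffold_week(Path(tmp), 1)
        for path in sorted(week_dir.rglob("*")):
            print(path.relative_to(tmp))
```

Running the same function every term regenerates an identical layout, which is one small way to make the preparation of course materials itself reproducible.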

But the benefits of reproducible teaching practices extend far beyond the classroom. As Dogucu and Çetinkaya-Rundel note, teaching materials are often shared with others, whether it be with teaching assistants, graders, or even other instructors. By adopting reproducible workflows in the preparation and delivery of these materials, teachers can not only make it easier for others to use and adapt their materials, but also increase the impact of their teaching by making it more widely available. As the authors point out, “sharing teaching materials openly can also benefit the instructor by helping them gain name recognition in their field as well as leading to potential collaboration opportunities, which can be especially impactful if they are early career.”

Adopting reproducible teaching practices may involve a learning curve, so the authors recommend introducing them gradually, making incremental changes term by term. They also stress that the next generation of statistics and data science educators should be exposed to reproducible teaching tools during their training, which can prepare them for both the benefits and the challenges of using these tools in their own courses.

Figure 2. The version control system Git detects differences in bootstrap means after R is upgraded from version 3.5.3 to version 4.1.0. In this image, red lines show means computed with the earlier version of R, and green lines show means computed with the later version. A version control system records every change made to a file so that any earlier state (i.e., version) can be recalled later. Image reproduced without modification from Figure 2 of Dogucu and Çetinkaya-Rundel, 2022.
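The caption does not say why the means changed. A plausible explanation is that R 3.6.0 changed the default algorithm `sample()` uses to draw uniform random integers, so the same seed and the same code can produce different bootstrap resamples. The Python sketch below imitates that situation under stated assumptions: it swaps in two of CPython's own index-drawing methods (`random.choices`, which scales `random()` by the population size, and `random.choice`, which uses rejection sampling) as stand-ins for the two R versions, and uses the standard-library `difflib` module to print a Git-style unified diff. The data, the seed, and the helper names `bootstrap_means` and `as_lines` are hypothetical.

```python
import difflib
import random
from statistics import mean

# Hypothetical toy sample; not the data used in the paper.
DATA = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 2.5]


def bootstrap_means(data, n_boot, seed, method):
    """Return n_boot bootstrap means drawn with a fixed seed.

    method="floor" picks indices as floor(random() * n) via random.choices;
    any other value picks them by rejection sampling via random.choice.
    The two methods stand in for two versions of a sampling routine.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(n_boot):
        if method == "floor":
            resample = rng.choices(data, k=len(data))
        else:
            resample = [rng.choice(data) for _ in data]
        means.append(mean(resample))
    return means


def as_lines(means):
    """Format means the way they might be saved to a tracked output file."""
    return [f"{m:.3f}\n" for m in means]


old = as_lines(bootstrap_means(DATA, 5, seed=1, method="floor"))
new = as_lines(bootstrap_means(DATA, 5, seed=1, method="rejection"))

# Lines prefixed "-" come from the old output and "+" from the new one,
# mirroring the red and green lines of a git diff.
diff = difflib.unified_diff(old, new, fromfile="means_old.txt", tofile="means_new.txt")
print("".join(diff) or "No differences")
```

In practice, committing the output file and letting Git compare it with the next run gives the same signal automatically; the point is that a tracked result file turns a silent numerical change into a visible one.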

“We envision modern statistics and data science instructors to adopt the reproducible teaching framework in their own teaching regardless of the nature of the courses that they are teaching,” the authors conclude. Even in courses where reproducibility is not the focus, such as introductory probability, lecturers who adopt a reproducible workflow can benefit both themselves and the broader data science education community.

Overall, the framework proposed by Dogucu and Çetinkaya-Rundel offers practical directions for professors looking to improve the reproducibility of their teaching practices. By adopting open and reproducible tools and sharing materials with the broader academic community, educators can not only enhance the quality of their teaching, but also increase its influence beyond the classroom.

Edited by Alyssa Columbus

Cover image credit: WOCinTechChat.com, used under CC BY 2.0. Figures used under CC BY 4.0.