DOI: 10.1055/a-2407-1272
Teaching Data Science through an Interactive, Hands-On Workshop with Clinically Relevant Case Studies
Abstract
Background In this case report, we describe the development of an innovative workshop to bridge the gap in data science education for practicing clinicians (and particularly nurses). In the workshop, we emphasize the core concepts of machine learning and predictive modeling to increase understanding among clinicians.
Objectives To address health care providers' limited exposure to data science methods and to how these methods can be leveraged and critiqued, this interactive workshop aims to provide clinicians with foundational knowledge in data science, enabling them to contribute effectively to teams focused on improving care quality.
Methods The workshop focuses on meaningful topics for clinicians, such as model performance evaluation, and introduces machine learning through hands-on exercises using free, interactive Python notebooks. Clinical case studies on sepsis recognition and opioid overdose death provide relatable contexts for applying data science concepts.
Results Positive feedback from over 300 participants across various settings highlights the workshop's effectiveness in making complex topics accessible to clinicians.
Conclusion Our approach prioritizes engaging content delivery and practical application over extensive programming instruction, aligning with adult learning principles. This initiative underscores the importance of equipping clinicians with data science knowledge to navigate today's data-driven health care landscape, offering a template for integrating data science education into health care informatics programs or continuing professional development.
Keywords
machine learning - professional training - data analysis - decision support algorithm - forecasting
Background and Significance
Educating clinicians who have a desire to learn and enhance their work through data is an ongoing and evolving process. With technology advancing and data analytics skills becoming increasingly needed, our clinical workforce must possess foundational knowledge and skills to help them critically evaluate whether and how data science applications could inform care. With the current influx and popularity of artificial intelligence (AI), it is important to prepare a workforce that can influence how these tools are integrated into clinical practice. Preparing the next generation of clinicians to understand core concepts of data science requires training that addresses emerging methods for analyzing large amounts of data to understand phenomena and make actionable predictions that can improve clinical outcomes.[1] Because the vast majority of our health care workforce completed academic training before these recent advances, a significant barrier to equipping clinicians with the necessary knowledge is the need to teach data science concepts to clinicians who are actively practicing. Crafting meaningful and engaging learning content for an audience with no background in data science or machine learning (ML) methods requires a carefully executed strategy that aligns with principles of adult learning.[2] To address this need, we created and implemented a data science workshop using the iterative instructional design concepts contained in the ADDIE model (analyze, design, develop, implement, and evaluate) and have offered it to nurses (and other clinical disciplines) at conferences and in academic settings.[3] Leveraging the ADDIE model, we can use formal and informal evaluations to continuously improve the learning activity's content and format to provide an engaging and informative experience for learners.
Objective
Existing data science education and training for clinicians is limited, particularly for nurses. While curricula in nursing schools may include basic statistics or use of common spreadsheet tools (e.g., Microsoft Excel), there are few academic programs that train health care providers in the concepts of ML, predictive modeling, AI, and other data science methods. We sought to develop an interactive experience where nurses who have an interest in informatics could become familiar with key data science concepts and be exposed to introductory scientific computing skills. The intent of this training is not to transform clinicians into data scientists but rather to give them a solid foundation in the field that allows them to ask the right questions as part of a team driving clinical transformation. In this case report, we describe the development of a data science workshop that has been conducted with clinicians, most of whom are nurses, who possess no background in software programming or data science.
Methods
Content
Because the field of data science is quite broad, we wanted to ensure we focused on the topics that would be most meaningful to practicing nurses who are unlikely to routinely build ML models in the real world. To this end, we emphasized evaluation of model performance more than any other topic. Topics included how to calculate and how to evaluate sensitivity, positive predictive value, area under the receiver operating characteristic curve, and F1 score. We also introduced the concepts of the bias/variance trade-off and preventing overfitting.
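As a minimal sketch of the metrics named above (not the workshop's own notebook code), the following Python computes sensitivity, positive predictive value, and F1 score from the four cells of a confusion matrix, and estimates the area under the receiver operating characteristic curve as the fraction of positive/negative pairs that the model ranks correctly. The class name, function names, counts, and scores are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    """Counts of predictions versus true status (all values hypothetical)."""
    tp: int  # flagged by the model and truly positive
    fp: int  # flagged by the model but truly negative
    fn: int  # missed by the model but truly positive
    tn: int  # not flagged and truly negative

    def sensitivity(self) -> float:
        # Of all true cases, what fraction did the model catch?
        return self.tp / (self.tp + self.fn)

    def ppv(self) -> float:
        # Of all alerts, what fraction were true cases?
        return self.tp / (self.tp + self.fp)

    def f1(self) -> float:
        # Harmonic mean of PPV (precision) and sensitivity (recall).
        p, r = self.ppv(), self.sensitivity()
        return 2 * p * r / (p + r)


def auroc(scores, labels):
    """Probability that a random positive case outscores a random negative case.

    Ties count as half a win. This pairwise form is simple but slow for
    large datasets.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


cm = ConfusionMatrix(tp=40, fp=60, fn=10, tn=890)
print(f"sensitivity={cm.sensitivity():.2f}")  # 0.80
print(f"PPV={cm.ppv():.2f}")                  # 0.40
print(f"F1={cm.f1():.2f}")                    # 0.53

example_auroc = auroc([0.9, 0.8, 0.4, 0.3, 0.2], [1, 0, 1, 0, 0])
print(f"AUROC={example_auroc:.2f}")           # 0.83
```

In this invented example the model catches 80% of true cases, yet only 40% of its alerts are correct, which is the kind of trade-off the workshop asks learners to weigh.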
After the in-depth discussion of evaluating model performance, we guided learners to train and tune multiple ML models on the data. For those interested in learning more, particularly about data preprocessing (the most time-consuming aspect of a data science project), we provided an additional notebook for self-study. [Table 1] contains an outline of the content covered in the workshop. While this order of content is reversed from an actual data science project, we found that starting with preprocessing was so tedious and unexciting that many learners lost motivation and became disengaged (see the Lessons Learned column in [Table 2]).
Abbreviations: MSN, Master of Science in Nursing; NKBDS, Nursing Knowledge: Big Data Science; VUSN, Vanderbilt University School of Nursing.
Format
All workshop events have used interactive Python programming “notebooks” ([Fig. 1]) that facilitate learners' ability to read content, take notes, and execute software code simultaneously. Currently, we offer the workshop through a freely available interactive Python notebook platform (specifically, DeepNote, although several other platforms are also available) that allows remote interaction between individuals (similar to a Google Drive document) and does not involve downloading or installing any software. We have offered the synchronous workshop both in person and remotely, and we recommend that learners have at least two monitors available for the remote version so they can watch the facilitator's demonstrations on one monitor and practice on the other. To help learners explore the concepts of performance metrics interactively, we have also provided an Excel file containing a confusion matrix that can be manipulated with new values. As values are updated, the change in performance metrics can be seen instantaneously. Finally, we provide a “Key Takeaways” document (see Supplemental Materials for student version and answer key) to help learners focus on the most essential concepts and have a reference for the future. The most recent DeepNote notebook from February 2024, containing all scripts, data, and supplementary files, is available at http://tinyurl.com/dsw24.
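The kind of "what if" exploration that an editable confusion matrix supports can also be sketched in a notebook cell. The example below is an assumption about one experiment a learner might try and is not taken from the authors' Excel file: it holds sensitivity and specificity fixed and rebuilds the true-positive and false-positive cells for different prevalence values to show how positive predictive value shifts. The function name, the cohort size, and all numbers are hypothetical.

```python
def ppv_at_prevalence(sensitivity, specificity, prevalence, n=10_000):
    """Rebuild the alert cells of a confusion matrix and return the PPV.

    n is a hypothetical cohort size; it cancels out of the final ratio
    but keeps the intermediate counts easy to read.
    """
    with_condition = n * prevalence
    without_condition = n - with_condition
    tp = sensitivity * with_condition           # true alerts
    fp = (1 - specificity) * without_condition  # false alerts
    return tp / (tp + fp)


# Same hypothetical model (80% sensitivity, 90% specificity), three settings.
for prevalence in (0.01, 0.05, 0.20):
    ppv = ppv_at_prevalence(0.80, 0.90, prevalence)
    print(f"prevalence={prevalence:.2f} -> PPV={ppv:.2f}")
```

With these invented inputs, the same model yields a PPV of roughly 0.07 when 1% of patients have the condition and roughly 0.67 when 20% do, so the share of alerts that are correct depends heavily on the population in which the model is used.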


Clinical Case Studies
We developed two case studies to provide relatable clinical contexts: sepsis recognition and opioid overdose death. For sepsis recognition, we created a completely synthetic dataset with known distributions and associations to ensure learners would make the discoveries we intended and that made clinical sense. For opioid overdose death, we created realistic but synthetic data using MITRE's Synthea program.[4]
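To illustrate what a synthetic dataset "with known distributions and associations" can look like, here is a minimal sketch that uses only the Python standard library. It is not the authors' generator: the variable names, prevalence, means, and standard deviations are invented for illustration, and the only point is that the relationship between each variable and the outcome is fixed in advance, so learners are guaranteed to find it.

```python
import random
import statistics


def make_synthetic_patients(n, prevalence=0.10, seed=0):
    """Generate n fake patients whose vitals depend on a known sepsis label.

    Every parameter below is hypothetical and chosen only so that septic
    patients tend to have higher heart rate, temperature, and white blood
    cell count than non-septic patients.
    """
    rng = random.Random(seed)  # seeded so every learner sees the same data
    patients = []
    for _ in range(n):
        sepsis = 1 if rng.random() < prevalence else 0
        patients.append({
            "sepsis": sepsis,
            "heart_rate": rng.gauss(110 if sepsis else 80, 15),
            "temperature_c": rng.gauss(38.5 if sepsis else 37.0, 0.6),
            "wbc": rng.gauss(15 if sepsis else 8, 3),
        })
    return patients


patients = make_synthetic_patients(2000)
septic_hr = statistics.mean(p["heart_rate"] for p in patients if p["sepsis"] == 1)
other_hr = statistics.mean(p["heart_rate"] for p in patients if p["sepsis"] == 0)
print(f"mean heart rate, septic: {septic_hr:.0f}; non-septic: {other_hr:.0f}")
```

Because the association is built in, any model or manual rule that learners construct can be checked against a known ground truth, which is harder to guarantee with real clinical data.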
In preparation for the sepsis case study, we encourage learners to watch two 15-minute videos prior to the workshop. The videos introduce learners to a “sepsis prediction model” (based on an arbitrary point system) and give them an opportunity to hear some concepts (e.g., sensitivity, specificity) before the workshop. During the workshop, the concepts are reinforced as the learners work to build a better sepsis prediction model. First, learners work in small groups to manually modify the 10-variable sepsis model (with variables such as heart rate, white blood cell count, and temperature) by assigning more or fewer points (i.e., weight) to each variable and/or changing the thresholds at which each variable would be assigned a point; we refer to this as Manual Learning. We use this exercise to illustrate that it would be impossible to manually evaluate every possible combination and that the benefit of ML is to leverage the computer for building a better model.
In the opioid overdose case study, a greater breadth of data preprocessing steps is covered, with less depth on model development and validation (compared with the sepsis case study). We used the opioid overdose case study for the first several iterations of the workshop because we wanted to expose learners to all aspects of the prediction-focused data science process. However, we received feedback from workshop participants that when the entire process is condensed into this brief period of time, it was difficult to comprehend the model performance evaluation concepts described at the end. Therefore, we now provide the opioid overdose case study as a postworkshop learning activity for participants who would like to learn more concepts and/or apply the model development and evaluation learnings from the sepsis case study to a new clinical condition.
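The Manual Learning exercise can be sketched as follows. This is an illustrative point system and not the workshop's actual sepsis model: the three variables shown, their thresholds, their point values, and the alert cutoff are hypothetical, as are the numbers of candidate weights and thresholds used to count the possible models.

```python
# Hypothetical rules: variable -> (threshold, points awarded when exceeded).
RULES = {
    "heart_rate": (100, 1),
    "temperature_c": (38.0, 1),
    "wbc": (12.0, 1),
}


def sepsis_score(patient, rules):
    """Add up the points for every variable that exceeds its threshold."""
    return sum(
        points
        for variable, (threshold, points) in rules.items()
        if patient[variable] > threshold
    )


def flag_sepsis(patient, rules, cutoff=2):
    """Raise an alert when the total score reaches the (hypothetical) cutoff."""
    return sepsis_score(patient, rules) >= cutoff


def count_candidate_models(n_variables, weights_per_variable, thresholds_per_variable):
    """Number of distinct point systems if each variable is tuned independently."""
    return (weights_per_variable * thresholds_per_variable) ** n_variables


patient = {"heart_rate": 118, "temperature_c": 38.6, "wbc": 9.0}
print(sepsis_score(patient, RULES))  # 2
print(flag_sepsis(patient, RULES))   # True

# Ten variables, each with 4 possible weights and 5 candidate thresholds.
print(f"{count_candidate_models(10, 4, 5):,}")  # 10,240,000,000,000
```

Even with these modest assumed choices there are more than ten trillion candidate point systems, which is why tuning by hand cannot cover the space and a computer-driven search becomes attractive.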
Evaluation
Feedback from each workshop has been used to iteratively improve the content and delivery. Due to the diversity of evaluation methods used in the multiple formats and audiences ([Table 2]), we are unable to provide longitudinal evaluation data. However, for the workshop offered to graduate nursing informatics students in 2022 through 2024 (30 students), we conducted a survey at the time of the workshop's completion to ask how they might use the information learned and how we can improve next time.
Results
The workshop has been offered to at least 250 attendees, including 4 national scientific meetings with nurses from multiple backgrounds (∼4–6 hours' duration), 4 Master's-level Nursing Informatics cohorts at Vanderbilt University School of Nursing (6–12 hours), and 1 PhD cohort at Vanderbilt University School of Nursing (2 hours).
In the earliest versions of the workshop (starting in 2018), fewer easily accessible notebook platforms were available, and we spent the first 1 to 2 hours of a workshop ensuring everyone could download and install software. The use of a free online platform (starting in 2020) that only requires creating an account without downloading software has significantly improved learners' experiences by reducing the amount of time to prepare the software for active participation.
Overall, the general feedback from all settings and audiences has been positive. Learners share that after participating in the workshop, they now possess a better understanding of the concepts and plan to use the newly gained knowledge in their work settings. Responses by nursing informatics students to the postworkshop evaluation included comments such as, “I find this information immensely valuable and intend to apply it upon returning to work tomorrow, particularly in the context of the ongoing [quality improvement] processes in which I am actively engaged” and “This data workshop helps me gain a lot of clarity about processing in our day-to-day health care information. Now I know how to think critically when assessing a predictive model, understand the pathway of data,” and “Blown away from all the content and can't wait to get back home to further explore.” Overall, the majority of students articulated that they now feel more confident in understanding how predictive models work and more comfortable in knowing the right questions to ask when evaluating effectiveness. There is widespread gratitude for the opportunity to learn these emerging topics in a way that makes sense to nurses and gives them the ability to ask the right questions when assessing the performance and potential benefits of a predictive model being used in their health care systems.
At the completion of the workshops conducted for the nursing informatics students from 2021 to 2024 (n = 40), the feedback again centered on the appreciation of making the concepts understandable and being able to use the knowledge to evaluate models in use or being considered for adoption. The sepsis case study was very well received because learners are asked to “build your own algorithm” based on clinical knowledge, and then we illustrate how ML can simplify the task while also remaining interpretable. This exercise increased engagement because it was meaningful to learners' daily practice while immediately demonstrating the benefit of ML. Additional quantitative descriptive information and accompanying free-text comments from respondents are provided in [Table 3].
Discussion
Our interactive workshop to teach data science knowledge and skills fills a gap in current informatics education. Many informatics students will not become data scientists, but we believe all of them should have a grasp of key data science concepts and, at a minimum, be exposed to how data scientists use scientific computing to elicit information from data. A foundational knowledge of data science empowers nurses in informatics roles to assess clinical decision support systems and other predictive tools embedded within electronic health records, with the ultimate goal of improving patient care and outcomes. With many workshop attendees highlighting the immediate applicability of the material, we are confident that we have fostered a foundation for ongoing data science learning among all our participants. While some learners have been inspired to pursue more advanced data science training, the vast majority become equipped to critically evaluate data science activities and imagine new possibilities for how data science methods could help them solve health care challenges.
Many data science courses for general audiences begin with teaching computer science and programming basics, progress to preprocessing data, and conclude with model development and evaluation.[5] [6] Other than starting with some programming basics so that learners can use the notebooks, our approach mostly reverses this sequence. Our approach aligns with the adult learning principle that learners benefit from understanding the rationale for what they are trying to learn. By starting with model evaluation in the context of how this could influence one's clinical care activities, our learners stay more engaged and willing to explore some of the upstream activities that would precede model development. By making our materials freely available for others to use, we hope to empower continued training in the tools that will drive clinical transformation.
Our approach to teaching data science in this way also has its limitations. While we have provided extensive annotation and description, for the workshop to be fully reproducible, a facilitator with data science knowledge is necessary to assist with more complex questions and occasional troubleshooting. While the material should be relevant and approachable for all health care professionals, we have only facilitated workshops for nurses. Finally, while the platforms we use are currently available for free, we have no guarantee they will remain free; however, our content and approach can be easily adapted to other notebook platforms.
Conclusion
An interactive approach to introduce nurses to data science concepts and skills is needed within health informatics educational programs. We have iteratively developed a hands-on workshop that leverages two clinically relevant case studies to facilitate learners becoming more comfortable discussing data science methods and critiquing others' results. Applying data science concepts to clinical practice is key to enhancing care delivery and ultimately improving patient outcomes. By equipping clinicians and nurses with these skills, we aim to foster a health care environment that is data driven, leading to improved health outcomes and efficient care delivery.
Clinical Relevance Statement
Clinicians should understand fundamental data science concepts in today's health care environment if they hope to improve health and health care within systems or populations. Our interactive, relatively brief workshop leverages adult learning principles to provide direct care nurses with key data science information that fills a gap in current academic programs.
Multiple-Choice Questions
1. Practicing clinicians who want to learn about data science are most likely to pursue which learning format?
   a. Pursuing a Doctor of Philosophy (PhD) degree
   b. Attending a brief workshop with clinical case studies
   c. Shadowing a data scientist for 1 month
   d. Online learning module covering ML theory
Correct Answer: The correct answer is option b. While all these options are opportunities to learn more about data science, (b) is the most likely to be of interest to practicing clinicians. (a) would require years of investment and would likely result in a career change. (c) is a significant time commitment that might not be feasible for most clinicians; additionally, it would require knowing a data scientist who is also a decent educator. (d) focuses on theory when many clinicians prefer to engage in clinically relevant learning activities.
2. There are several datasets commonly used to teach data science methods. Which of the following datasets is most likely to keep clinicians engaged in learning?
   a. Titanic: who will survive?
   b. Credit Card Fraud: which transaction is fraudulent?
   c. DIGITS: which number is present in the image?
   d. Sepsis: which patient will develop sepsis?
Correct Answer: The correct answer is option d. One principle of adult learning is that learners should feel the tasks in which they're participating are relevant to them. Even though the Titanic (a), Credit Card Fraud (b), and DIGITS (c) datasets can be helpful to learn and practice data science skills, many clinicians will not feel the activities are relevant to their work. Therefore, the Sepsis (d) dataset is more likely to facilitate learning for a clinical audience.
Conflict of Interest
None declared.
Acknowledgments
We received support for this work from the Agency for Healthcare Research & Quality (AHRQ) and the Patient-Centered Outcomes Research Institute (PCORI) under Award Number K12HS026395; and the Gordon and Betty Moore Foundation through Grant GBMF9048. The content is solely the responsibility of the authors and does not necessarily represent the official views of AHRQ, PCORI, the U.S. Government, or the Gordon and Betty Moore Foundation.
Protection of Human and Animal Subjects
We received a determination from the Vanderbilt University Medical Center's Institutional Review Board that our evaluation approach did not qualify as research (IRB number: 241036).
References
- 1 Russell RG, Lovett Novak L, Patel M. et al. Competencies for the use of artificial intelligence-based tools by health care professionals. Acad Med 2023; 98 (03) 348-356
- 2 Mukhalalati BA, Taylor A. Adult learning theories in context: a quick guide for healthcare professional educators. J Med Educ Curric Dev 2019; 6: 2382120519840332
- 3 Kulhanek B. Delivering healthcare informatics training. In: Sengstack P, Boicey C. eds. Mastering Informatics. Indianapolis, IN: Sigma Theta Tau International; 2015
- 4 The MITRE Corporation. Synthea 2022. Accessed September 18, 2024 at: https://synthea.mitre.org/about
- 5 Brunner RJ, Kim EJ. Teaching data science. Procedia Comput Sci 2016; 80: 1947-1956
- 6 Lau S, Nolan D, Gonzalez J, Guo PJ. How computer science and statistics instructors approach data science pedagogy differently: three case studies. Proceedings of the 53rd ACM Technical Symposium on Computer Science Education (SIGCSE 2022), Vol. 1. 2022: 29-35
Publication History
Received: 25 March 2024
Accepted: 29 August 2024
Accepted Manuscript online: 30 August 2024
Article published online: 11 December 2024
© 2024. Thieme. All rights reserved.
Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany


