Applying FAIR Principles to Improve Data Searchability of Emergency Department Datasets: A Case Study for HCUP-SEDD
Background There is a recognized need to improve how scholarly data are managed and accessed. The scientific community has proposed the findable, accessible, interoperable, and reusable (FAIR) data principles to address this issue.
Objective The objective of this case study was to develop a system for improving the FAIRness of the Healthcare Cost and Utilization Project's State Emergency Department Databases (HCUP-SEDD) within the context of data catalog availability.
Methods A search tool, EDCat (Emergency Department Catalog), was designed to improve the FAIRness of electronic health databases and was tested on datasets from HCUP-SEDD. Elasticsearch served as the database for EDCat's search engine. Datasets were curated and defined, and the curated metadata included searchable data dictionary elements and Unified Medical Language System (UMLS) concepts. Functionality to standardize search terms using UMLS concepts was added to the user interface.
Results The EDCat system improved the overall FAIRness of HCUP-SEDD by improving the findability of individual datasets and increasing the efficacy of searches for specific data elements and data types.
Discussion The databases considered for this case study were limited in number as few data distributors make the data dictionaries of datasets available. The publication of data dictionaries should be encouraged through the FAIR principles, and further efforts should be made to improve the specificity and measurability of the FAIR principles.
Conclusion In this case study, the distribution of datasets from HCUP-SEDD was made more FAIR through the development of a search tool, EDCat. EDCat will be evaluated and developed further to include datasets from other sources.
Keywords FAIR principles - data dictionary - Unified Medical Language System - findability - accessibility - interoperability - reusability
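The search-term standardization described in the Methods can be illustrated with a minimal sketch. This is not EDCat's implementation: it uses a plain in-memory catalog instead of Elasticsearch, and the synonym table, concept identifiers (`CONCEPT_MI`, `CONCEPT_AGE`), dataset IDs, and variable descriptions are all hypothetical placeholders, not real UMLS identifiers or HCUP-SEDD metadata. The idea shown is only that a free-text query is first mapped to a shared concept, so that synonyms retrieve the same datasets, with a fallback to plain text matching against titles and data dictionary entries.

```python
from dataclasses import dataclass, field

# Hypothetical synonym table: free-text term -> placeholder concept ID.
# A real system would resolve terms against the UMLS Metathesaurus instead.
TERM_TO_CONCEPT = {
    "heart attack": "CONCEPT_MI",
    "myocardial infarction": "CONCEPT_MI",
    "age": "CONCEPT_AGE",
    "patient age": "CONCEPT_AGE",
}


@dataclass
class DatasetRecord:
    """Curated metadata for one dataset (illustrative structure only)."""
    dataset_id: str
    title: str
    variables: dict  # data dictionary: variable name -> description
    concepts: set = field(default_factory=set)  # concept IDs attached by curators


def normalize(term: str) -> str:
    """Map a search term to a concept ID if known, else return it lowercased."""
    key = " ".join(term.lower().split())
    return TERM_TO_CONCEPT.get(key, key)


def search(catalog, term):
    """Return IDs of datasets matching the term by concept or by text."""
    target = normalize(term)
    hits = []
    for rec in catalog:
        if target in rec.concepts:
            hits.append(rec.dataset_id)
            continue
        # Fallback: substring match over title, variable names, descriptions.
        haystack = " ".join(
            [rec.title, *rec.variables.keys(), *rec.variables.values()]
        ).lower()
        if target in haystack:
            hits.append(rec.dataset_id)
    return hits


# Hypothetical catalog entries, for demonstration only.
CATALOG = [
    DatasetRecord(
        "demo-sedd-a",
        "Illustrative ED dataset A",
        {"AGE": "Age in years at admission"},
        {"CONCEPT_AGE"},
    ),
    DatasetRecord(
        "demo-sedd-b",
        "Illustrative ED dataset B",
        {"DX1": "Principal diagnosis code"},
        {"CONCEPT_MI"},
    ),
]
```

With this toy catalog, `search(CATALOG, "Heart Attack")` and `search(CATALOG, "myocardial infarction")` return the same dataset because both terms resolve to one concept, while an unmapped term such as `"diagnosis"` still matches through the data dictionary text.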
Received: 02 June 2019
Accepted: 24 March 2020
Published online: 14 June 2020
© 2020. Thieme. All rights reserved.
Georg Thieme Verlag KG
Stuttgart · New York