DOI: 10.1055/s-0041-1730032
Lessons Learned for Identifying and Annotating Permissions in Clinical Consent Forms
Abstract
Background The lack of machine-interpretable representations of consent permissions precludes development of tools that act upon permissions across information ecosystems, at scale.
Objectives To report the process, results, and lessons learned while annotating permissions in clinical consent forms.
Methods We conducted a retrospective analysis of clinical consent forms. We developed an annotation scheme following the MAMA (Model-Annotate-Model-Annotate) cycle and evaluated interannotator agreement (IAA) using observed agreement (Ao), weighted kappa (κw), and Krippendorff's α.
Results The final dataset included 6,399 sentences from 134 clinical consent forms. Complete agreement was achieved for 5,871 sentences, including 211 positively identified and 5,660 negatively identified as permission-sentences across all three annotators (Ao = 0.944, Krippendorff's α = 0.599). These values reflect moderate to substantial IAA. Although permission-sentences contain a set of common words and structure, disagreements between annotators are largely explained by lexical variability and ambiguity in sentence meaning.
Conclusion Our findings point to the complexity of identifying permission-sentences within the clinical consent forms. We present our results in light of lessons learned, which may serve as a launching point for developing tools for automated permission extraction.
Background and Significance
The informed consent process is woven into the fabric of health care ethics, and documentation of informed consent must be included in patients' records as evidence of express permissions for treatment or clinical procedures.[1] [2] Although there are benefits to eConsent,[3] [4] [5] the reality is that consent forms remain largely paper-based in health care settings.[6] Permissions are typically interpreted through manual review on a case-by-case basis; this presents significant issues in terms of scalability and consistency of interpretation. Consent forms are also largely scanned, limiting the usability and/or transferability of the forms.
Machine-interpretable representations of consent permissions are needed to support development of tools that act upon permissions across information ecosystems at scale. While several machine-interpretable representations of consent have been developed,[3] [7] [8] these efforts are centered on consent for research rather than consent in clinical contexts. Moreover, tools for processing real-world consent forms—a necessary precursor to linking consent form content to machine-interpretable representations—remain underdeveloped.
An annotation scheme serves as a human-readable blueprint to guide manual discovery of a given phenomenon and is a fundamental step toward automation. An iterative approach is often used for development of annotation schemes, starting with an initial guideline and updating it after multiple rounds of small sample annotations.[9]
Objective
This case report presents the process, results, and lessons learned while developing and testing an annotation scheme to identify permission-sentences in clinical consent forms.
Methods
Design
This was a retrospective analysis of clinical consent forms. The principal investigator (PI), a nurse scientist; a trained research assistant (RA), who contributed a health care consumer perspective; and a practicing registered nurse (RN) were involved throughout the analysis and annotation process. A data scientist with experience in text processing supported technical aspects of the study. Institutional review board review was not required because human subjects were not involved; only blank consent forms were collected and analyzed.
Recruitment and Sampling
Consent forms were collected through (1) direct contribution by health care facilities and (2) systematic web searching. The Michigan Health Information Management Association (MHIMA) sent an email to 29 directors of health information management departments requesting direct contribution of clinical consent forms. The systematic web search identified publicly available consent forms using search terms for 200 randomly selected hospitals registered with the Centers for Medicare and Medicaid Services (CMS),[10] 50 randomly selected ambulatory surgical centers participating in the Ambulatory Surgical Center Quality Reporting Program,[11] and 63 health care facilities affiliated with all Clinical and Translational Science Award (CTSA) hubs funded during 2014 to 2018 (1:1 match for CTSA hub to health care facility).[12] All facilities in the sample of clinical consent forms are described in [Supplementary Appendix A] (available in the online version).
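As an illustration of the sampling step, the following is a minimal sketch in Python, assuming the CMS listings have been exported to CSV files (the file names and the "Facility Name" column are hypothetical, and the CTSA matching step is omitted); it is not the procedure actually used in the study.

```python
# Illustrative sketch only; file names and column names are hypothetical.
import pandas as pd

hospitals = pd.read_csv("cms_hospital_general_information.csv")  # CMS hospital listing
asc = pd.read_csv("cms_ambulatory_surgical_centers.csv")         # ASC Quality Reporting participants

# Draw the random facility samples described above (fixed seed for reproducibility).
hospital_sample = hospitals.sample(n=200, random_state=42)
asc_sample = asc.sample(n=50, random_state=42)

# Build web search queries for publicly available consent forms.
queries = [
    f'"{name}" consent form filetype:pdf'
    for name in pd.concat([hospital_sample, asc_sample])["Facility Name"].dropna()
]
```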
Consent Form Management
Directly contributed consent forms were emailed to the PI or the MHIMA contact; one facility allowed the PI to download consent forms from its internal Web site. Consent forms identified through web searching were retrieved from facility Web sites and Google searches. Forms were included if their primary purpose was consent for a clinical care process or procedure. We excluded duplicate forms, forms used for hospital operations or nonclinical purposes, and forms that were written in languages other than English or were not human-readable after conversion to .txt format. The RA created records for facilities and forms in Excel spreadsheets and mapped CMS-assigned metadata (e.g., unique identifiers, name, location, facility type) to each form. [Fig. 1] summarizes the data collection and screening procedures.
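A brief sketch of the metadata mapping, again with hypothetical file and column names rather than the study's actual spreadsheets:

```python
# Illustrative sketch only; file names and column names are hypothetical.
import pandas as pd

forms = pd.read_excel("consent_form_records.xlsx")               # one row per collected form
facilities = pd.read_csv("cms_hospital_general_information.csv")  # CMS facility metadata

# Map CMS metadata (identifier, name, location, facility type) onto each form record.
forms_with_metadata = forms.merge(
    facilities[["Facility ID", "Facility Name", "State", "Hospital Type"]],
    on="Facility ID",
    how="left",
)
```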


Annotation Scheme Development
We followed Pustejovsky and colleagues' MAMA (Model-Annotate-Model-Annotate) cycle for annotation scheme development.[13] We iteratively annotated unique sets of five randomly selected consent forms at a time. The study team met after each round. We manually compared output after each iteration, qualitatively examined themes for differences between annotators, and adapted the annotation scheme and guideline accordingly to clarify its specifications. The annotation scheme was stable after five iterations.
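A minimal sketch of drawing the per-round batches, assuming the converted forms sit in a local folder (the path is hypothetical); each round uses a unique set of five forms, and the loop stops after the five iterations at which our scheme stabilized:

```python
# Illustrative sketch of drawing unique sets of five forms per annotation round.
import random
from pathlib import Path

random.seed(0)
remaining = sorted(str(p) for p in Path("forms").glob("*.txt"))  # hypothetical folder of converted forms

rounds = []
while remaining and len(rounds) < 5:  # the scheme stabilized after five rounds
    batch = random.sample(remaining, k=min(5, len(remaining)))
    rounds.append(batch)
    remaining = [f for f in remaining if f not in batch]
```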
The final annotation scheme and guideline are provided in [Supplementary Appendix B] (available in the online version). A permission-sentence was formally defined as a “statement(s) that, upon signature of the consent form, authorizes any new action or activity that may, must, or must not be done.” This definition enabled discrimination of permission-sentences from those that did not allow or forbid some new action or activity (e.g., descriptions of care, agreements for payment, statements of patients' rights). The tag Positive was used to mark up sentences as permission-sentences (i.e., This is a permission-sentence). The tag Indeterminate (i.e., This might be a permission-sentence) indicated uncertainty due to ambiguous or inconsistent language in the consent form; rather than being forced into a binary decision, annotators could group uncertain sentences under this tag. All remaining sentences were tagged with Negative (i.e., This is not a permission-sentence).
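The three-tag scheme lends itself to a simple machine-readable representation. The following sketch is illustrative only and is not the schema used by the annotation platform:

```python
# Illustrative representation of the three-tag scheme; not the platform's schema.
from dataclasses import dataclass
from enum import Enum

class PermissionTag(Enum):
    POSITIVE = "Positive"            # This is a permission-sentence
    INDETERMINATE = "Indeterminate"  # This might be a permission-sentence
    NEGATIVE = "Negative"            # This is not a permission-sentence

@dataclass
class AnnotatedSentence:
    form_id: str
    sentence: str
    tag: PermissionTag

example = AnnotatedSentence(
    form_id="facility-042_surgical-consent",  # hypothetical identifier
    sentence="I authorize the physician named above to perform the procedure described.",
    tag=PermissionTag.POSITIVE,
)
```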
Annotation and Data Preprocessing
Consent forms were converted from their original formats (.pdf, .doc) to text files (.txt) using document format conversion tools built into Adobe Acrobat DC and MS Word. Permission-sentences were identified and tagged in the text files by three annotators using an open-source annotation platform.[14] Considerable preprocessing was required prior to analysis. We used an open-source software library for natural language processing[15] without any case-specific optimization to parse and generate a list of all sentences for each informed consent form. We enforced standard character encoding (ASCII) by removing all non-ASCII characters. We excluded “sentences” (i.e., text strings) that lacked English alphabet characters, were less than nine characters long, or were less than three words long.
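A minimal sketch of this preprocessing, assuming spaCy's small English model is installed; the exclusion thresholds mirror those stated above, while the function name and structure are our own illustration rather than the study's code.

```python
# Sketch of the sentence extraction and filtering steps described above.
import re
from typing import List

import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

def extract_sentences(text: str) -> List[str]:
    # Enforce ASCII by dropping non-ASCII characters.
    text = text.encode("ascii", errors="ignore").decode("ascii")
    sentences = []
    for sent in nlp(text).sents:
        s = sent.text.strip()
        # Exclude strings lacking English letters, shorter than 9 characters,
        # or shorter than 3 words.
        if not re.search(r"[A-Za-z]", s):
            continue
        if len(s) < 9 or len(s.split()) < 3:
            continue
        sentences.append(s)
    return sentences
```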
Analysis of Interannotator Agreement
We calculated observed or raw agreement (Ao) by summing the count of agreed-upon annotations for all tags and dividing by the count of all sentences. We also calculated weighted kappa (κw) and Krippendorff's α to account for the degree of difference between tags (i.e., there is greater distance between Positive and Negative than between either tag and Indeterminate) and demonstrate interannotator agreement (IAA) beyond what was attributable to chance.[16] All analyses were performed using Python 3.7 or R for Statistical Computing.[17]
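A minimal sketch of these calculations, assuming tags are encoded as ordered integers (Negative = 0, Indeterminate = 1, Positive = 2) so the weighting penalizes Positive/Negative disagreements most; the `krippendorff` package is one way to compute α, though we do not claim it is the exact implementation used in the study.

```python
# Sketch of the agreement calculations; the label vectors are illustrative data.
import numpy as np
from sklearn.metrics import cohen_kappa_score
import krippendorff  # pip install krippendorff

# 0 = Negative, 1 = Indeterminate, 2 = Positive (ordered so distances are meaningful)
pi = np.array([2, 0, 0, 1, 2, 0])
ra = np.array([2, 0, 1, 1, 2, 0])
rn = np.array([2, 0, 0, 2, 2, 0])

# Observed (raw) agreement for a pair of annotators.
a_o = np.mean(pi == ra)

# Weighted kappa for the same pair; linear weights penalize Positive/Negative
# disagreements more than disagreements involving Indeterminate.
k_w = cohen_kappa_score(pi, ra, weights="linear")

# Krippendorff's alpha across all three annotators, treating tags as ordinal.
alpha = krippendorff.alpha(
    reliability_data=np.vstack([pi, ra, rn]),
    level_of_measurement="ordinal",
)
```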
Results
The final dataset included 134 clinical consent forms from 62 health care facilities. The consent form files have been made publicly available.[18] These consent forms include 6,399 total sentences. Complete agreement was achieved for 5,871 sentences, including 211 positively identified and 5,660 negatively identified as permission-sentences across all three annotators (Ao = 0.944, Krippendorff's α = 0.599). Pairwise agreement was highest between PI and RN (κw = 0.655).
Most sentences in the consent forms did not establish new agreements about what could or could not be done. Of the sentences that at least one annotator believed might serve a contractual purpose (n = 739), 28.6% had full agreement across all three annotators and 47.5% were identified by at least two annotators (n = 351). [Table 1] presents IAA measures for all combinations of annotators. [Table 2] depicts the count and proportion of permission-sentences as the threshold for IAA was relaxed.
Abbreviations: CI, confidence interval; PI, principal investigator; RA, research assistant; RN, registered nurse.
Note: IAA is reported using all labels (Positive, Indeterminate, and Negative). Ao is observed agreement and κw is a weighted kappa coefficient, both of which are used to measure IAA between pairs of annotators; Krippendorff's α is used to measure IAA across all three annotators.
Note: Percentages indicate the proportion of identified sentences relative to possible permission-sentences (n = 739) and to the entire corpus (n = 6,399).
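To illustrate how the counts in [Table 2] can be produced as the agreement threshold is relaxed, the following small sketch tallies sentences tagged Positive by at least k annotators; the data shown are illustrative, not the study data.

```python
# Sketch of tallying permission-sentences as the agreement threshold is relaxed.
# `labels` maps a sentence ID to the three annotators' tags (illustrative data).
labels = {
    "s1": ["Positive", "Positive", "Positive"],
    "s2": ["Positive", "Indeterminate", "Negative"],
    "s3": ["Negative", "Negative", "Negative"],
}

def count_at_threshold(labels, k):
    """Count sentences tagged Positive by at least k of the three annotators."""
    return sum(1 for tags in labels.values() if tags.count("Positive") >= k)

for k in (3, 2, 1):
    print(f"Positive by >= {k} annotators: {count_at_threshold(labels, k)}")
```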
We found some consistency in the language used in permission-sentences. [Table 3] lists the top 10 verbs, with their frequency and an example, across the 211 completely agreed-upon permission-sentences. These verbs largely reflect either the act of giving permission (authorize, consent, may, request, agree, give) or otherwise refer to the actor to whom consent is being given or the action being consented to (perform, named, use, receive). This common structure of permission-sentences informed our identification and modeling processes, which are reported elsewhere.[19] However, these common words and structures alone were not enough to discriminate permission-sentences with complete consistency, eliminating the possibility of simple rule-based extraction. Beyond instances of missingness (i.e., one or more annotators did not annotate a permission-sentence), disagreements emerged for several reasons. Example sentences for sources of disagreement are provided in [Table 4].
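A brief sketch of how such verb frequencies can be derived using spaCy part-of-speech tags and lemmas; the example sentences are illustrative and are not drawn from the corpus.

```python
# Sketch of extracting the most frequent verbs from agreed-upon permission-sentences.
from collections import Counter

import spacy

nlp = spacy.load("en_core_web_sm")

agreed_permission_sentences = [
    "I authorize the physician named above to perform the procedure.",
    "I consent to the administration of anesthesia.",
]  # illustrative examples; the study used the 211 fully agreed-upon sentences

verb_counts = Counter(
    token.lemma_.lower()
    for sentence in agreed_permission_sentences
    for token in nlp(sentence)
    if token.pos_ in ("VERB", "AUX")  # include modals such as "may"
)
print(verb_counts.most_common(10))
```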
Discussion
We report on the development and testing of an annotation scheme to identify permission-sentences in clinical consent forms. This is a first step toward developing tools to automate permission extraction and machine interpretation of consent form content. One of the challenges in developing the annotation scheme was the need to generate a definition of a permission-sentence that was stable for use across multiple documents and under review by three annotators. With this definition, we achieved a level of agreement that is encouraging as a foundation for future work by us or others. While it is known that reader comprehension of clinical consent forms is an ongoing challenge,[20] our findings may also point to such complexity and obfuscation within the forms that even two clinicians (PI and RN) had difficulty identifying permission-sentences within the sample of forms.
[Table 5] outlines lessons learned during this study to improve future annotation and machine-interpretability of permission-sentences. It is important to acknowledge that establishing content standards for defining and interpreting the details of permissions within clinical consent forms requires the involvement of those who author forms, those who sign forms, those who review forms, and those who approve and regulate consent forms at federal, state, and organizational levels. The emergence of single institutional review board reviews and efforts to improve procedural inefficiencies of data use agreements may cast a bright light on the need for some standard content about permissions in consent documents. The primary goal of consent forms, however, is to serve as a tool for communication between providers and patients, and secondarily to provide enduring documentation of the agreements between patients and providers regarding actions that, absent a consent form, would not be allowable. A balance must be struck between standardized content and standards for expressing content.
There is a clear need for informatics-based standards in this domain, related to clearly defined structures, language use, and encoding of the meaning of these patient permissions in documents constructed in machine-interpretable formats. A working group within Health Level Seven (HL7) is currently developing a Fast Healthcare Interoperability Resources (FHIR) resource intended to provide interoperability standards addressing three types of consent: privacy consent directives, medical treatment consent directives, and research consent directives.[21] HL7 provides a tremendous opportunity for collaboration between those with “real-world” expertise in consent forms and their use and those developing standards for the exchange, integration, sharing, and retrieval of the information within consent forms. In the contemporary environment of digital documents and electronic health records, standards-based representations of consent, at the appropriate level of granularity, are essential for transparency into the permissions patients authorize when signing consent forms. Future collaborative work should include development and refinement of a gold-standard list of permission-sentences and annotations that can be used for automated or semiautomated annotation requiring extensive programming or training of the tagging system.[15] While our IAA reflected moderate to substantial agreement,[22] these results should be interpreted cautiously. The scheme we propose may not yet be sufficient for information retrieval tasks. We believe that our agreement metrics would be higher with increased clarity and consistency in the language used in consent forms.
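For orientation only, the following is a skeletal Consent instance expressed as a Python dictionary. Field names loosely follow the FHIR R4 Consent resource and may differ from the Release 5 draft cited above, so this should be read as a sketch rather than a conformant example.

```python
# Illustrative only: a skeletal FHIR-style Consent resource as a Python dict.
# Field names and codes may differ across FHIR releases and profiles.
consent_example = {
    "resourceType": "Consent",
    "status": "active",
    "category": [
        {"coding": [{"system": "http://loinc.org", "code": "59284-0"}]}  # illustrative category coding
    ],
    "patient": {"reference": "Patient/example"},  # hypothetical patient reference
    "dateTime": "2021-03-11",
    "provision": {
        "type": "permit"  # the permission granted upon signature
    },
}
```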
This study has several limitations. First, it is not known whether the sample is broadly representative of clinical consent forms, nor whether the web-retrieved consent forms are current. Second, additional levels of error variance were not accounted for because we did not nest permission-sentences by form or facility; it is possible that certain facilities used language that was highly agreed or disagreed upon. Readability of the consent forms was also not assessed. Lastly, the PI, who was both the lead developer of the annotation scheme and the trainer of the RA, also served as an annotator, which may have introduced bias. The annotation scheme should be further refined in future studies or reused by others for similar annotation tasks.
Conclusion
We developed, tested, and shared an annotation scheme for classifying permission-sentences within clinical consent forms that performed with moderate to substantial reliability among three annotators. Our findings point to the complexity of identifying permission-sentences within the clinical consent forms. We present our results in light of lessons learned, which may serve as a launching point for developing tools for automated permission extraction. Future research should examine the understandability of consent permissions across stakeholders, and potentially standardization of clinical consent form structures and content, with emphasis on increasing their understandability by both human and system users.
Clinical Relevance Statement
Informed consent is foundational to respecting patients' autonomy. However, consent information presently relies on human interpretation, which is increasingly problematic as the health information ecosystem grows more interconnected. This study and our lessons learned serve as a launching point for future permission extraction efforts, which are precursors to tools that interpret and act upon permissions at scale.
Multiple Choice Questions
1. Which measure of agreement does not take agreement due to chance or random error into account?
a. Observed agreement (Ao)
b. Weighted kappa (κw)
c. Krippendorff's α
d. Fleiss' κ
Correct Answer: The correct answer is option a. Observed agreement is a simple ratio of all items agreed upon by annotators to all possible items. It does not take random error into account.
2. Who should clinical consent forms be understandable by?
a. Clinicians
b. Patients
c. Compliance officers and lawyers
d. All of the above
Correct Answer: The correct answer is option d. Consent forms must be consistently understood by those obtaining consent (clinicians), those granting consent (patients), and those who develop and oversee the use of consent forms (compliance officers and lawyers).
Conflict of Interest
None declared.
Protection of Human and Animal Subjects
Institutional Review Board review was not required because human subjects were not involved. Only blank consent forms were collected and analyzed.
References
- 1 American Medical Association. Informed Consent. American Medical Association. Published 2020. Accessed January 4, 2020 at: https://www.ama-assn.org/delivering-care/ethics/informed-consent
- 2 Hertzum M. Electronic health records in Danish home care and nursing homes: inadequate documentation of care, medication, and consent. Appl Clin Inform 2021; 12 (01) 27-33
- 3 Chalil Madathil K, Koikkara R, Obeid J. et al. An investigation of the efficacy of electronic consenting interfaces of research permissions management system in a hospital setting. Int J Med Inform 2013; 82 (09) 854-863
- 4 Reeves JJ, Mekeel KL, Waterman RS. et al. Association of electronic surgical consent forms with entry error rates. JAMA Surg 2020; 155 (08) 777-778
- 5 Chen C, Lee P-I, Pain KJ, Delgado D, Cole CL, Campion Jr TR. Replacing paper informed consent with electronic informed consent for research in academic medical centers: a scoping review. AMIA Jt Summits Transl Sci Proc 2020; 2020: 80-88
- 6 Litwin J. Engagement shift: informed consent in the digital era. Appl Clin Trials 2016; 25 (6/7): 26, 28, 30, 32
- 7 Obeid J, Gabriel D, Sanderson I. A biomedical research permissions ontology: cognitive and knowledge representation considerations. Proc Gov Technol Inf Policies 2010; 2010: 9-13
- 8 Lin Y, Harris MR, Manion FJ. et al. Development of a BFO-based Informed Consent Ontology (ICO). In: 5th ICBO Conference Proceedings; Houston, Texas; October 6–10, 2014
- 9 Chapman WW, Dowling JN. Inductive creation of an annotation schema for manually indexing clinical conditions from emergency department reports. J Biomed Inform 2006; 39 (02) 196-208
- 10 Centers for Medicare & Medicaid Services. Hospital general information. Data.gov. Published February 23, 2019. Accessed November 7, 2019 at: https://data.cms.gov/provider-data/dataset/xubh-q36u
- 11 Centers for Medicare & Medicaid Services. Ambulatory surgical quality measures – facility. Data.gov. Published October 31, 2019. Accessed November 7, 2019 at: https://data.cms.gov/provider-data/dataset/wue8-3vwe
- 12 CTSA Program Hubs. National center for advancing translational sciences. Published March 13, 2015. Accessed November 26, 2019 at: https://ncats.nih.gov/ctsa/about/hubs
- 13 Pustejovsky J, Bunt H, Zaenen A. Designing annotation schemes: from theory to model. In: Ide N, Pustejovsky J. eds. Handbook of Linguistic Annotation. Dordrecht: Springer; 2017: 21-72
- 14 Dataturks. Trilldata Technologies Pvt Ltd; 2018. Accessed May 11, 2021 at: https://github.com/DataTurks
- 15 Honnibal M, Montani I. spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing. Published online 2017
- 16 Artstein R. Inter-annotator Agreement. In: Ide N, Pustejovsky J. eds. Handbook of Linguistic Annotation. Dordrecht: Springer; 2017: 297-313
- 17 R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2013. Accessed April 23, 2021 at: http://www.R-project.org/
- 18 Umberfield E, Ford K, Stansbury C, Harris MR. Dataset of Clinical Consent Forms [Data set]. University of Michigan - Deep Blue. Accessed May 11, 2021 at: https://doi.org/10.7302/j17s-qj74
- 19 Umberfield E, Stansbury C, Ford K. et al. Evaluating and Extending the Informed Consent Ontology for Representing Permissions from the Clinical Domain. (Under Review)
- 20 Eltorai AEM, Naqvi SS, Ghanian S. et al. Readability of invasive procedure consent forms. Clin Transl Sci 2015; 8 (06) 830-833
- 21 HL7. 6.2 Resource Consent - Content. HL7 FHIR Release 5, Preview 3. Published 2021. Accessed March 11, 2021 at: http://build.fhir.org/consent.html
- 22 Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977; 33 (01) 159-174
Publication History
Received: 29 December 2020
Accepted: 31 March 2021
Article published online:
23 June 2021
© 2021. Thieme. All rights reserved.
Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany