Development of a Natural Language Processing Tool to Enable Clinical Research in Emergency Medicine (NLP-DeVal)

Status Not yet recruiting

Clinical Trial Summary

The goal of this retrospective cohort study is to develop and validate a language model that can interpret the contents of emergency department electronic medical records and extract information relevant for research purposes. The study population comprises all adult patients who arrived at the participating emergency departments in a three-year period. The main question the study aims to answer is: can the language model interpret the contents of emergency department electronic medical records and extract the requested information accurately enough for it to be used in reliable analyses and predictions? The study is retrospective, and data will be extracted automatically from the electronic medical records.

Clinical Trial Description

BACKGROUND AND RATIONALE FOR THE STUDY

Conducting clinical and quality-of-care assessment research in emergency medicine is as difficult as it is important. It is difficult because the vast number of patients that need to be treated and the chronic shortage of staff make ad hoc data collection impractical. It is important because, in the end, research enables emergency physicians and nurses to base their practice on evidence obtained in their own, unique setting, as opposed to evidence obtained in far-removed contexts, as is commonly the case today.

The only way to bridge the gap between research needs and the availability of robust data is to extract data directly from the electronic health records (EHRs) of emergency departments, avoiding dedicated, time-consuming data collection. This is a difficult task, however, because the most useful information is in free text format (e.g., presence of signs and symptoms, suspected and confirmed diagnosis, anamnesis). Deriving highly consistent data from free text therefore requires a reliable natural language processing (NLP) tool.

Today, large-scale language models are available that can accurately interpret natural language. These models are trained on huge amounts of general knowledge taken mostly from the Internet, however, so their performance in more specialized areas, such as the medical domain, may not be optimal. The present study is part of a larger project called eCREAM (enabling Clinical Research in Emergency and Acute-care Medicine), and aims to develop and validate a language model (called eCREAM_LM) for six languages that can interpret the contents of emergency department EHRs and extract relevant information for research purposes.

METHODS

The study is an observational, multicenter, retrospective, 24-month study. Thirty centers will participate in the study: 13 from Italy, 4 from Poland, 3 each from Greece, Slovakia, Slovenia, and the United Kingdom, and 1 from Switzerland.
The centers will not receive any compensation, but their expenses will be covered by project funds.

Development and validation of the eCREAM_LM model

eCREAM_LM will be developed through training and fine-tuning of the best overall open-source model, in partially parallel phases. Candidate models will be exposed to a huge amount (billions) of medical texts from the scientific literature or other public sources. Simultaneously, the models will also be exposed to a massive amount (millions) of free text notes obtained from medical records in use at participating hospitals. The investigators will then move on to fine-tuning, which will use a large amount (thousands) of clinical notes obtained, once again, from the medical records of participating centers. These notes will be annotated by experienced physicians, who will extract information from the notes to fill in the data items listed in a virtual data collection form (vCRF). The vCRF was created for a related study, whose objective is to predict the hospitalization of patients with dyspnea or transient loss of consciousness, and contains a set of variables useful for that prediction. In the current study, the vCRF will serve as a tool for validating the language model.

Validation of eCREAM_LM will be carried out using a set of 1,000 clinical notes annotated as described above, but not used in the development phase. These notes will be submitted to the eCREAM_LM model with the task of compiling the vCRF. The concordance between the expert physicians and eCREAM_LM in filling in the vCRF will be the final validation measure for eCREAM_LM.

Data collection and anonymization

Each participating hospital will provide free text notes contained in the medical records of 150-300,000 adult patients treated between 2021 and 2023.
Notes referring to different aspects of the same patient (e.g., history, objective examination, test results) will be separated from each other so that it will be impossible to reconstruct the complete profile of the patient. In addition, the notes will be stripped of any reference to the patient (e.g., first name, last name, date of birth) and context (e.g., hospital, date and time of arrival at the center). This process minimizes the likelihood of re-identifying patients and maximizes the protection of their rights.

The likelihood of re-identifying a patient within a database depends on how distinctive the patient's characteristics are relative to those of the other individuals in the database. The likelihood of having unique, and therefore identifiable, patients increases with the amount of information available in the database and decreases with its size. Because all personal and contextual information will be removed from the clinical notes and each note will be separated from the others, each note will report only a few characteristics of the patient. In addition, data collected from hospitals in the same country will be merged so that there is one large database for each language. This reduces to practically zero the probability that any individual can be uniquely identified from the notes.

Finally, to rule out the possibility that the notes will contain information about third parties, such as names and phone numbers of patients' relatives, certified anonymization software, specifically designed to remove personal data from free text, will be installed in each hospital. Once anonymized, the data will be centralized for analysis and will also be uploaded to major European language resource sharing platforms used by the scientific community.

Statistical analysis

In the eCREAM_LM validation, the investigators will assess the concordance between expert emergency physicians and eCREAM_LM itself in filling in the vCRF. The data will refer to a sample of 1,000 notes for each study language.
Concordance will be assessed for each variable of the vCRF using Cohen's κ as a measure of agreement. The eCREAM_LM will be considered valid if Cohen's κ is greater than 0.75.

Sample size

Assuming an excellent agreement (κ=0.80) between eCREAM_LM and the experienced emergency physicians in completing the vCRF, a sample of at least 735 notes will be necessary to achieve sufficient precision to guarantee a good agreement (lower limit of the 95% confidence interval of Cohen's κ greater than 0.75). This number is the maximum sample size obtained under different scenarios involving a different number of categories (2 to 5) for each variable and different marginal distributions of the categories in the sample, including balanced distributions (e.g., 5 categories with 20% of the sample in each category) and very imbalanced distributions (e.g., 5 categories with 1.8%, 7.3%, 16.4%, 29.1% and 45.5% of the sample). Since information of interest may be missing in some notes, the investigators will perform the data validation assessment on 1,000 notes.
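As an illustration of the agreement measure named in the statistical analysis, the following is a minimal Python sketch of Cohen's κ for a single vCRF variable, computed from the physician's annotation and the model's output for the same notes. The function name `cohen_kappa`, the example labels, and the counts are hypothetical and are not taken from the study. The lower confidence limit uses a simple large-sample approximation of the standard error; the registry entry does not state which variance estimator the investigators will use, so this part in particular is an assumption.

```python
from collections import Counter
from math import sqrt


def cohen_kappa(rater_a, rater_b, z=1.96):
    """Return (kappa, approximate lower confidence limit) for two raters.

    rater_a and rater_b are equal-length sequences of category labels,
    one label per note, for a single variable.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set of notes")

    n = len(rater_a)

    # Observed agreement: share of notes where both raters chose the same category.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of the raters' marginal proportions, summed over categories.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if p_e == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical category")

    kappa = (p_o - p_e) / (1 - p_e)

    # Simple large-sample approximation of the standard error (an assumption,
    # not necessarily the estimator used in the study).
    se = sqrt(p_o * (1 - p_o) / (n * (1 - p_e) ** 2))
    return kappa, kappa - z * se


# Hypothetical example: 50 notes, one binary variable.
physician = ["yes"] * 25 + ["no"] * 25
model = ["yes"] * 20 + ["no"] * 5 + ["yes"] * 10 + ["no"] * 15

kappa, lower = cohen_kappa(physician, model)
print(f"kappa = {kappa:.2f}")  # kappa = 0.40
```

In this made-up example the two raters agree on 35 of 50 notes (observed agreement 0.70) against a chance agreement of 0.50, which gives κ = 0.40, well below the 0.75 threshold stated above. In the study the same comparison would be repeated for each variable of the vCRF.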

Clinical Trial Details

NCT number	NCT06240572
Study type	Observational
Source	Mario Negri Institute for Pharmacological Research
Contact	Chiara Pandolfini
Phone	0039 02 39014 253
Email	chiara.pandolfini@marionegri.it
Status	Not yet recruiting
Phase
Start date	June 2024
Completion date	May 2025


See also
Status	Clinical Trial	Phase
Completed	`NCT04021771` - Trial of Simulation-based Mastery Learning to Communicate Diagnostic Uncertainty	N/A
Not yet recruiting	`NCT06372379` - Development of a Multipurpose Dashboard to Monitor the Situation of Emergency Departments
Not yet recruiting	`NCT06354764` - Propensity to Hospitalize Patients From the ED in European Centers.
Completed	`NCT05870137` - Assessing Mixed Reality for Emergency Medical Care Delivery in a Simulated Environment	N/A
Completed	`NCT05073406` - Cognition at Altitude in HEMS - Part II	N/A
Completed	`NCT03457272` - Development and Evaluation of a Patient Safety Model	N/A
Completed	`NCT04138446` - Effects of Acute Hypobaric Hypoxia Exposure on Neurocognitive Performance of Pre-hospital Emergency Service Providers	N/A
Completed	`NCT02661607` - Point of Care Echocardiography Versus Chest Radiography for the Assessment of Central Venous Catheter Placement	N/A
Recruiting	`NCT05937763` - ED Adaptive Staffing Study
Completed	`NCT03848559` - Airway Management With Simulated Microgravity Using a Submerged Model	N/A
Completed	`NCT04328519` - The Charlson Comorbidity Index: Predicting Severity in Emergency Departments
Completed	`NCT03314480` - REDucing Unnecessary Computed Tomography Imaging for MaxillOfacial INjury
Enrolling by invitation	`NCT05809648` - A Study to Assess the Accuracy of Magnetocardiography (MCG) to Diagnose True Ischemia in Patients With Chest Pain in the ED
Completed	`NCT03099915` - Asthma Attack in the Emergency Department : Reasons Of This Attendance
Completed	`NCT04206566` - Pre-hospital Advanced Airway Management Studying Expedited Routines
Completed	`NCT03733158` - Flexible Tip Bougie Catheter Intubation	N/A
Completed	`NCT03420027` - Prehospital and Emergency Feasibility of MACOCHA Score Assessment to Predict Difficult Tracheal Intubation
Recruiting	`NCT03486171` - Tracheal Intubation and Prehospital Emergency Setting
Completed	`NCT00448331` - Facilitated Referral for Children Screening Positive for Mental Illness	Phase 0
Completed	`NCT04460196` - Healthcare Renunciation During the Confinement Period in Connection With the COVID-19 Epidemic in Adult Emergency Departments

Development of a Natural Language Processing Tool to Enable Clinical Research in Emergency Medicine — NLP-DeVal
