Pulmonary Embolism Clinical Trial
Official title:
Data Acquisition Study With Artificial Intelligence and Phenotyping of Patients Who Presented With Acute Pulmonary Embolism
The initial aim is to build and validate artificial intelligence tools (machine learning and Natural Language Processing) to acquire and structure data from medical reports at the Centre Hospitalier Intercommunal de Toulon - la Seyne sur mer (CHITS). This project will build upon work previously done by the Department of Epidemiology, Biostatistics and Health Data (DEBDS) at the Centre Antoine Lacassagne (CAL) in Nice, focusing on breast and thyroid cancers. The idea is to validate the transferability of these tools to another establishment with different pathologies and practitioners, specifically the vascular medicine department at CHITS. Subsequently, the aim will be to identify clinically relevant phenotypes in patients with acute pulmonary embolism. Hierarchical clustering methods combined with unsupervised learning (machine learning) will be used to obtain groups of patients who are homogeneous at diagnosis. Evaluating their prognosis at 6 months (recurrence or chronic thromboembolic pulmonary hypertension), taking into account the first 3 months of anticoagulant treatment, would provide an aid to medical decision-making. This research will include a retrospective part and a prospective part. The retrospective part will include patients who have been admitted to CHITS for acute pulmonary embolism since 2019. The prospective part is planned to include patients with the same characteristics over the years 2024 and 2025. More than 2,500 patients are expected to be included. This research will have no impact on current patient care. Data from consultations and various examinations carried out as part of care will be collected for six months post-diagnosis in order to meet the research objectives.
Context: Artificial Intelligence: NLP, clustering and unsupervised learning: Artificial Intelligence (AI) is a field that combines computer science with data sets, with the aim of enabling a machine to imitate the cognitive abilities of a human being. Machine learning (ML) and its sub-domain deep learning, which uses layers of neurons, are two major sub-domains of AI. The difference lies in how each algorithm is trained. ML uses two distinct methods: supervised learning, which involves training a model on known input and output data to predict future outputs, and unsupervised learning, which involves the discovery of hidden patterns and intrinsic underlying structures in the input data. Natural Language Processing (NLP) is also a subfield of AI, but it generally requires ML to be effective. NLP processes real-world linguistic data to make sense of it in a way that a computer can understand. NLP has two main stages: data pre-processing and algorithm development. Programming languages such as Python or R are widely used for these techniques. The aim of clustering methods is to group a set of individuals into homogeneous classes. Non-hierarchical methods can be used to classify massive data but require the number of classes to be fixed in advance. Hierarchical methods, which are more time-consuming to compute, consist of a series of nested partitions represented by a clustering tree. The optimal number of classes can be determined a posteriori by reading the tree. In the presence of a large number of individuals, it is common to combine non-hierarchical and hierarchical techniques. When classes are not clearly known in advance, clustering methods are used with unsupervised learning (ML) [1]. Datasets are generally divided into three disjoint datasets: training data, used to train the chosen algorithm(s); validation data, used to check the performance of the result; and test data, used only at the end of the process.
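The three-way division described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `split_patients` and the integer patient identifiers are hypothetical, and while the 70% training share appears later in this description, the even 15%/15% split of the remainder is an assumption.

```python
import random

def split_patients(patient_ids, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle patient IDs and cut them into three disjoint lists.

    The fractions are illustrative assumptions; whatever is left after
    the training and validation cuts becomes the test set.
    """
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # fixed seed so the split is reproducible
    n_train = round(len(ids) * train_frac)
    n_val = round(len(ids) * val_frac)
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]
    return train, val, test

# Hypothetical cohort of 1,000 pseudonymized patient identifiers.
train, val, test = split_patients(range(1000))
print(len(train), len(val), len(test))
```

Splitting by patient rather than by report keeps all reports from one patient in the same subset, which avoids leaking information from the training data into the test data.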
Venous thromboembolic disease: Venous thromboembolic disease (VTE) is a common pathology whose incidence is imperfectly known but increases with age, reaching 1% in subjects over 75 years old. In France, it is estimated that over 100,000 people develop VTE every year, and that it is responsible for between 5,000 and 10,000 deaths. Deep vein thrombosis (DVT) and pulmonary embolism (PE) are the two main types of VTE. DVT corresponds to partial or total occlusion of a deep vein by a thrombus, most often localized in the lower limbs. PE is defined as partial or total occlusion of the pulmonary arteries or their branches. The main risk of DVT is the occurrence of PE, which can be life-threatening. Other VTE-specific complications and possible adverse outcomes include thromboembolic recurrence (either DVT or PE), chronic thromboembolic pulmonary hypertension and post-thrombotic syndrome in DVT. Current management of VTE is mainly based on anticoagulant therapy. The duration of treatment varies according to the estimated risk of recurrence if treatment is withdrawn, essentially depending on whether or not there is a prior major risk factor [2]. In the subgroup of PE patients without major risk factors, the risk of recurrence is considered intermediate and varies according to whether the event is a first episode or a recurrence, and whether or not there are obstructive pulmonary sequelae [3]. More recently, the therapeutic strategy has become more complex, with the inclusion of minor risk factors that modulate the duration of treatment without relevant evidence. Moreover, regardless of the duration of treatment, the appropriate dosage of Direct Oral Anticoagulants beyond the sixth month remains uncertain.
Hypotheses: This research is presented along two distinct axes.

AXIS 1: The aim of this work will initially be to develop and validate artificial intelligence tools, using ML and NLP, for acquiring and structuring data from text-based medical reports in the department of vascular medicine at the Centre Hospitalier Intercommunal de Toulon - la Seyne sur mer (CHITS). This project will build upon work previously done by the Department of Epidemiology, Biostatistics and Health Data (DEBDS) at the Centre Antoine Lacassagne (CAL) focusing on breast and thyroid cancers [5,6,7]. The idea is to validate the transferability of these tools to another establishment with different pathologies and practitioners, specifically the vascular medicine department of CHITS. Implementing a method of acquiring structured data using artificial intelligence techniques directly from textual medical reports within our hospital is a challenge. If its performance is proven and the tool is implemented on a permanent, routine basis, it would provide an easily exploitable source of information. The diversity of fields and interests in clinical research in our establishment may make deployment in other departments an achievable goal. For CHITS, this is the first step in the process of building a Health Data Warehouse (HDW).

AXIS 2: Subsequently, the aim will be to use the database to identify clinically relevant phenotypes in patients with acute pulmonary embolism. Hierarchical clustering methods combined with unsupervised learning (machine learning) will be used to obtain groups of patients who are homogeneous at diagnosis. Evaluating their prognosis at 6 months (recurrence or chronic thromboembolic pulmonary hypertension), taking into account the first 3 months of anticoagulant treatment, would provide an aid to medical decision-making.
An analysis of the six-month evolution of homogeneous groups of patients with acute pulmonary embolism, constructed using clustering methods with unsupervised learning, has never been conducted before. This innovative project within a large-scale hospital infrastructure is likely to offer doctors a decision-making aid, and patients a scientifically validated form of therapeutic management.

Material and Methods: This research will include a retrospective part and a prospective part. The retrospective part will include patients who have been admitted to CHITS for acute pulmonary embolism since 2019 (around 1900 patients). The prospective part is planned to include patients with the same characteristics over the years 2024 and 2025 (approximately 765 patients). If individual information is not available for 25% of the patients, or if they object to the processing of their data, a large volume of data on over 2,500 patients could potentially be analysed in this trial. This research will have no impact on current patient care. Data from consultations and various examinations carried out as part of care will be collected for six months post-diagnosis to meet the research objectives.

AXIS 1: The data acquisition method used in this research will be twofold. Data from patients included in clinical research will be collected conventionally using a case report form, entered by a clinical research technician, then centralized and organized in a reference database called "Gold Standard". The second data acquisition technique, using NLP methods, will proceed in several stages, in parallel with the previous approach. First, the extraction of medical reports (MR) in text format will be followed by a pseudonymization stage. The MR dataset will then be prepared for training and validation by removing special characters and identifying segments of interest. Then, the MR will be annotated with BRAT in order to identify the terms that will be used to populate the database.
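To make the annotation step more concrete, the sketch below reads text-bound annotations in BRAT's standoff format (lines of the form `T1<TAB>Label start end<TAB>text`) and turns them into a structured record. It is a minimal illustration, not the study's pipeline: the labels `Diagnosis` and `Treatment`, the example sentence and the function name `parse_brat_entities` are all hypothetical.

```python
from typing import NamedTuple

class Entity(NamedTuple):
    ann_id: str   # e.g. "T1"
    label: str    # annotation type chosen by the annotators
    spans: list   # [(start, end), ...] character offsets into the report
    text: str     # the annotated text itself

def parse_brat_entities(ann_text):
    """Extract text-bound annotations (IDs starting with 'T') from a .ann file."""
    entities = []
    for line in ann_text.splitlines():
        if not line.startswith("T"):
            continue  # skip notes, attributes, relations, events
        ann_id, type_and_offsets, text = line.split("\t", 2)
        label, offsets = type_and_offsets.split(" ", 1)
        # Discontinuous annotations separate their fragments with ';'.
        spans = [tuple(int(x) for x in part.split()) for part in offsets.split(";")]
        entities.append(Entity(ann_id, label, spans, text))
    return entities

# Hypothetical pseudonymized report sentence and its annotation file.
report = "Acute pulmonary embolism treated with apixaban."
ann = (
    "T1\tDiagnosis 0 24\tAcute pulmonary embolism\n"
    "T2\tTreatment 38 46\tapixaban\n"
    "#1\tAnnotatorNotes T1\tconfirmed on CT angiography\n"
)

entities = parse_brat_entities(ann)
for ent in entities:
    start, end = ent.spans[0]
    assert report[start:end] == ent.text  # offsets must match the report text

# One structured row per report, keyed by annotation label.
row = {ent.label: ent.text for ent in entities}
print(row)
```

In practice a label may occur several times in one report, so a real table would need a rule for multiple values; the dictionary here simply keeps the last one.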
Training scripts will be applied to 70% of patients in order to create NLP models. During this training phase, post-processing medical rules will be written in order to translate the information identified by the models into structured data. The script thus finalized is applied to the validation base and its performance is evaluated. After any necessary adjustments, the performance of the final script is evaluated on the test database. Performance will be assessed by comparing the data obtained automatically with the manual Gold Standard database.

AXIS 2: The unsupervised clustering methods used in this study combine hierarchical and non-hierarchical methods. Following the hierarchical ascending clustering, Ward's index is used to determine the number of groups of interest. The centroids of these groups are then used to initialize a partitioning algorithm, such as the k-means algorithm. Once the most medically relevant groups have been determined, their six-month evolution (stable, aggravation or progress) is compared. Factors influencing progression during the first three months of treatment can also be included in a statistical model, depending on their ability to predict aggravation. All these explorations should provide a basis for medical decision-making.
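The two-stage clustering described for AXIS 2 can be illustrated with a small, dependency-free Python sketch: agglomerative clustering with Ward's merge criterion produces centroids, which then seed k-means. Several things here are assumptions made for illustration: the number of groups `k` is passed in by hand rather than read from the tree, the two-dimensional points stand in for standardized patient features, and the function names are hypothetical. The quadratic pair search is only suitable for toy data.

```python
def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def ward_centroids(points, k):
    """Merge clusters with Ward's criterion until k remain; return their centroids."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                # Ward cost: increase in within-cluster sum of squares if a and b merge.
                cost = len(a) * len(b) / (len(a) + len(b)) * sq_dist(mean(a), mean(b))
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [mean(c) for c in clusters]

def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm started from the given centroids; returns (labels, centroids)."""
    centroids = list(centroids)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = mean(members)
    return labels, centroids

# Toy data: three well-separated groups of hypothetical standardized features.
points = [
    (0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
    (5.0, 5.1), (5.2, 4.9), (4.9, 5.0),
    (0.0, 5.0), (0.1, 5.2), (0.2, 4.9),
]
initial = ward_centroids(points, k=3)
labels, centroids = kmeans(points, initial)
print(labels)
```

Seeding k-means with the hierarchical centroids removes the dependence on random starting points, which is one common reason for combining the two families of methods.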
Status | Clinical Trial | Phase
---|---|---
Recruiting | NCT05050617 - Point-of-Care Ultrasound in Predicting Adverse Outcomes in Emergency Department Patients With Acute Pulmonary Embolism |
Terminated | NCT04558125 - Low-Dose Tenecteplase in Covid-19 Diagnosed With Pulmonary Embolism | Phase 4
Not yet recruiting | NCT06017271 - Predictive Value of Epicardial Adipose Tissue for Pulmonary Embolism and Death in Patients With Lung Cancer |
Completed | NCT03915925 - Short-term Clinical Deterioration After Acute Pulmonary Embolism |
Completed | NCT02502396 - Rivaroxaban Utilization for Treatment and Prevention of Thromboembolism in Cancer Patients: Experience at a Comprehensive Cancer Center |
Recruiting | NCT05171075 - A Study Comparing Abelacimab to Dalteparin in the Treatment of Gastrointestinal/Genitourinary Cancer and Associated VTE | Phase 3
Completed | NCT04454554 - Prevalence of Pulmonary Embolism in Patients With Dyspnea on Exertion (PEDIS) |
Completed | NCT03173066 - Ferumoxytol as a Contrast Agent for Pulmonary Magnetic Resonance Angiography | Phase 1
Terminated | NCT03002467 - Impact Analysis of Prognostic Stratification for Pulmonary Embolism | N/A
Completed | NCT02334007 - Extended Low-Molecular Weight Heparin VTE Prophylaxis in Thoracic Surgery | Phase 1/Phase 2
Completed | NCT02611115 - Optimizing Protocols for the Individual Patient in CT Pulmonary Angiography. | N/A
Completed | NCT01975090 - The SENTRY Clinical Study | N/A
Not yet recruiting | NCT01357941 - Need for Antepartum Thromboprophylaxis in Pregnant Women With One Prior Episode of Venous Thromboembolism (VTE) | N/A
Completed | NCT01326507 - Prognostic Value of Heart-type Fatty Acid-Binding Protein (h-FABP) in Acute Pulmonary Embolism | N/A
Completed | NCT00720915 - D-dimer to Select Patients With First Unprovoked Venous Thromboembolism Who Can Have Anticoagulants Stopped at 3 Months | N/A
Completed | NCT00771303 - Ruling Out Pulmonary Embolism During Pregnancy:a Multicenter Outcome Study |
Completed | NCT02476526 - Safety of Low Dose IV Contrast CT Scanning in Chronic Kidney Disease | Phase 4
Completed | NCT00780767 - Angiojet Rheolytic Thrombectomy in Case of Massive Pulmonary Embolism | Phase 2
Completed | NCT00773448 - Screening for Occult Malignancy in Patients With Idiopathic Venous Thromboembolism | N/A
Completed | NCT00244725 - Odiparcil For The Prevention Of Venous Thromboembolism | Phase 2