Clinical Trial Details
— Status: Completed
Administrative data
NCT number | NCT04643665
Other study ID # | 1111111111111
Secondary ID |
Status | Completed
Phase |
First received |
Last updated |
Start date | January 1, 2012
Est. completion date | October 5, 2020
Study information
Verified date | November 2020
Source | Hopital Foch
Contact | n/a
Is FDA regulated | No
Health authority |
Study type | Observational
Clinical Trial Summary
The rapid evolution of lung transplantation management over the past ten years and the new
definition of primary graft dysfunction (PGD) have led to new predictive factors of PGD.
We therefore retrospectively analyzed a single-center database using a machine-learning
method to determine the predictive factors of grade 3 PGD (PGD3), defined as a PaO2/FiO2
ratio < 200 or being under extracorporeal membrane oxygenation (ECMO) at postoperative day 3.
We included all double lung transplantations from 2012 to 2019 and excluded multi-organ
transplantation, cardiopulmonary bypass, and repeated transplantation during the study period
for the same patient. Recipient, donor and intraoperative data were added to a gradient
boosting algorithm step by step, according to the standard stages of transplantation. The
dataset will be split randomly into an 80% training set and a 20% testing set. The
relationship between predictive factors and PGD3 will be represented as SHapley Additive
exPlanations (SHAP) values.
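The PGD3 definition given above can be written as a small labeling rule. This is a minimal sketch only: the function name `is_pgd3` and its arguments are hypothetical, and treating a missing PaO2/FiO2 ratio as an error for patients not on ECMO is an assumption, not something the record specifies.

```python
from typing import Optional


def is_pgd3(pao2_fio2: Optional[float], on_ecmo: bool) -> bool:
    """Label a patient's postoperative day 3 status as PGD3 or not.

    PGD3 is a PaO2/FiO2 ratio below 200, or being under ECMO.
    """
    if on_ecmo:
        # ECMO at day 3 counts as PGD3 regardless of the measured ratio.
        return True
    if pao2_fio2 is None:
        # Assumption: without ECMO, the ratio is needed to decide.
        raise ValueError("PaO2/FiO2 ratio is required when the patient is not on ECMO")
    return pao2_fio2 < 200
```

For example, `is_pgd3(150, False)` and `is_pgd3(None, True)` are both labeled PGD3, while a ratio of exactly 200 without ECMO is not.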
Description:
The standardized anesthetic management has been previously described [18] and is detailed on
the website http://anesthesie-foch.org/protocoles-anesthesie/ ("The Foch lung transplant
anesthesia protocol").
Continuous variables are presented as median and interquartile range (IQR) or mean and 95% CI,
and were compared using the independent-samples t-test or the Mann-Whitney test. Categorical
variables are presented as n (%) and were compared using the Chi-squared test or Fisher's
exact test. We applied a machine learning algorithm to predict primary graft dysfunction
three days ahead in patients undergoing lung transplant surgery. Machine learning is a branch
of artificial intelligence in which computer systems learn from available data and identify
patterns with minimal human intervention. Candidate algorithms were tested on the data, and
performance metrics were used to select the best-performing one. In this study, we used
XGBoost, a gradient boosting algorithm that combines decision trees. Each decision tree
learns from its predecessor and passes the improved function on to the next. The weighted
combination of these trees provides the prediction.
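The idea that each tree learns from its predecessor and that the prediction is a weighted combination of trees can be illustrated with a toy example. The sketch below is not XGBoost: it assumes a single numeric feature, one-split trees (stumps), and a plain squared-error loss, whereas XGBoost uses deeper trees and a regularized objective. All function names are hypothetical.

```python
def fit_stump(x, residuals):
    """Find the one-split tree on a single feature that best fits the residuals."""
    best = None
    # Skip the largest value so that both sides of the split are non-empty.
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean


def stump_output(stump, xi):
    threshold, left_mean, right_mean = stump
    return left_mean if xi <= threshold else right_mean


def fit_boosted_stumps(x, y, rounds=100, eta=0.1):
    """Each new stump is fitted to the errors left by the previous ones."""
    base = sum(y) / len(y)
    predictions = [base] * len(y)
    stumps = []
    for _ in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        # The learning rate eta shrinks the contribution of each stump.
        predictions = [pi + eta * stump_output(stump, xi)
                       for pi, xi in zip(predictions, x)]
    return base, eta, stumps


def predict(model, xi):
    """Final prediction: base value plus the weighted sum of all stumps."""
    base, eta, stumps = model
    return base + eta * sum(stump_output(s, xi) for s in stumps)
```

On a toy dataset such as `x = [1, 2, 3, 4, 5, 6]` with `y = [0, 0, 0, 1, 1, 1]`, the predictions move from the overall mean toward the observed values as rounds accumulate.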
No transformation was applied to numerical variables. Categorical variables were encoded as
integers, without any further pre-processing. In particular, no specific processing was
applied to handle missing data; the default behavior of XGBoost was used, which treats
missing data as a specific modality. During training, missing values are handled like any
other value: at each branch of a tree, whether they go left or right is learned by optimizing
the training objective.
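How a left or right direction for missing values might be learned at a single branch can be sketched as follows. This is an illustration under simplifying assumptions (a fixed threshold, squared error around each side's mean label, ties resolved to the left), not XGBoost's actual split-finding code; `learn_default_direction` is a hypothetical name.

```python
def learn_default_direction(values, labels, threshold):
    """Choose whether missing values (None) go left or right at a fixed split.

    Observed values below the threshold go left, the others go right.
    """
    def split_error(missing_goes_left):
        left, right = [], []
        for value, label in zip(values, labels):
            if value is None:
                goes_left = missing_goes_left
            else:
                goes_left = value < threshold
            (left if goes_left else right).append(label)
        # Squared error of each side around its own mean label.
        error = 0.0
        for side in (left, right):
            if side:
                mean = sum(side) / len(side)
                error += sum((label - mean) ** 2 for label in side)
        return error

    # Keep the direction that fits the labels best.
    return "left" if split_error(True) <= split_error(False) else "right"
```

With values `[1.0, 2.0, None, 8.0, 9.0, None]`, labels `[0, 0, 1, 1, 1, 1]` and a threshold of 5, the missing entries resemble the right-hand group, so the learned direction is "right".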
To reflect the sequential nature of this prediction problem, nine steps were defined to
incrementally take into account the variables acquired at successive stages of the surgery.
Step 1: recipient variables
Step 2: donor variables
Step 3: arrival in the OR
Step 4: after anesthetic induction
Step 5: during first pneumonectomy
Step 6: after first graft implantation
Step 7: second pneumonectomy
Step 8: second graft implantation
Step 9: end-of-surgery status
At each of the nine steps, a cross-validation procedure is used to assess the predictive
performance of a machine learning model (XGBoost). One repetition of the cross-validation
procedure is designed as follows: the dataset of subjects is randomly split into eight
disjoint parts. The performance of the XGBoost model is then evaluated on each of the eight
subsets in turn, while the model is trained on the remaining seven subsets. For each
repetition, the predicted probability of primary graft dysfunction at day 3 is retained for
each subject and used to compute the area under the receiver operating characteristic (ROC)
curve. To evaluate the variability of the predictive performance of the model, this
cross-validation procedure is repeated fifty times, with randomly chosen partitions of the
subjects.
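The repeated cross-validation described above can be sketched in plain Python. The sketch is model-agnostic: `fit_predict` is a hypothetical callable standing in for training XGBoost on seven parts and predicting probabilities on the eighth. Pooling the held-out predictions of one repetition before computing a single AUC is my reading of the text, and the way subjects are dealt into folds is an assumption.

```python
import random


def auc(labels, scores):
    """Area under the ROC curve, as the share of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    # Ties between a positive and a negative score count for one half.
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def repeated_cv_auc(X, y, fit_predict, n_folds=8, n_repeats=50, seed=0):
    """Return one pooled out-of-fold AUC per repetition."""
    rng = random.Random(seed)
    n = len(y)
    aucs = []
    for _ in range(n_repeats):
        order = list(range(n))
        rng.shuffle(order)
        # Deal the shuffled subjects into n_folds disjoint parts.
        folds = [order[i::n_folds] for i in range(n_folds)]
        out_of_fold = [None] * n
        for held_out in folds:
            held = set(held_out)
            train = [i for i in order if i not in held]
            predictions = fit_predict([X[i] for i in train],
                                      [y[i] for i in train],
                                      [X[i] for i in held_out])
            for i, p in zip(held_out, predictions):
                out_of_fold[i] = p
        # Every subject has exactly one held-out prediction per repetition.
        aucs.append(auc(y, out_of_fold))
    return aucs
```

Running this once per surgical step, each time with the variables available up to that step, would give fifty AUC values per step from which the variability of the performance can be summarized.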
For each of the 50 × 8 × 9 combinations (repetitions, partitions, surgical steps), that is
3600 trained models, a conservative approach was adopted: a single set of training parameters
was used throughout. These parameters were chosen to prevent overfitting, given the
relatively small number of subjects compared to the number of variables, especially
categorical variables, which yield many degrees of freedom. Specifically, XGBoost was trained
for 400 rounds (no early stopping), with a maximum depth of 5 for each tree, a minimum child
weight of 3, and a learning rate (eta) of 0.0002. In addition to these conservative
parameters, only 40% of the available columns and 95% of the subjects are sampled for tree
construction at each round. These parameters were kept fixed and chosen to ensure stability
of the results. Small perturbations around these values could yield local performance
improvements, but would not be chosen in practice given the size of the dataset.
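For orientation, the fixed settings above could be expressed with the parameter names of the XGBoost Python package, as in the sketch below. Several points are assumptions rather than statements from the record: the `binary:logistic` objective, and the mapping of "40% of columns" to `colsample_bytree` and "95% of subjects" to `subsample`. The helper `train_fixed` is hypothetical and expects an `xgboost.DMatrix`.

```python
# Number of trained models: repetitions x partitions x surgical steps.
N_REPEATS, N_FOLDS, N_STEPS = 50, 8, 9
N_MODELS = N_REPEATS * N_FOLDS * N_STEPS  # 3600

XGB_PARAMS = {
    "objective": "binary:logistic",  # assumption: binary outcome (PGD3 yes/no)
    "max_depth": 5,                  # maximum depth of each tree
    "min_child_weight": 3,           # minimum child weight
    "eta": 0.0002,                   # learning rate
    "colsample_bytree": 0.4,         # assumption: 40% of columns per tree
    "subsample": 0.95,               # assumption: 95% of subjects per round
}
NUM_BOOST_ROUND = 400                # fixed number of rounds, no early stopping


def train_fixed(dtrain):
    """Train one model with the fixed parameters (requires the xgboost package)."""
    import xgboost as xgb  # imported lazily so the constants above need no dependency
    return xgb.train(XGB_PARAMS, dtrain, num_boost_round=NUM_BOOST_ROUND)
```

Keeping one parameter set for all 3600 fits avoids tuning on a small dataset, at the cost of possibly leaving some performance unused at individual steps.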
To gain insight into which variables carry the most predictive power, we then conducted a
post-hoc analysis with the following methodology. At each surgical step, 400 models were
trained during the repeated cross-validation procedure (50 repetitions × 8 partitions). For
each model, we retain the rank of each variable as given by the variable importance procedure
of XGBoost. The average rank of each variable at each step is then computed by averaging its
ranks over the 400 models. At step 9, variables are ordered by increasing average rank. They
are then added incrementally as inputs to a new cross-validation procedure (repeated 20
times).
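The rank-averaging step can be sketched as below. The input is assumed to be one dictionary of importance scores per trained model, each listing every variable; variables that a model never uses would need an explicit score for this to hold. The function names and the variable names in the usage note are hypothetical.

```python
def ranks_from_importance(importance):
    """Map each variable to its rank within one model (1 = most important)."""
    ordered = sorted(importance, key=importance.get, reverse=True)
    return {name: rank for rank, name in enumerate(ordered, start=1)}


def average_ranks(importances):
    """Average each variable's rank over all models, sorted by increasing average rank."""
    totals = {}
    for importance in importances:
        for name, rank in ranks_from_importance(importance).items():
            totals[name] = totals.get(name, 0) + rank
    n_models = len(importances)
    averages = {name: total / n_models for name, total in totals.items()}
    return sorted(averages.items(), key=lambda item: item[1])
```

With two models scoring three variables as `{"a": 3, "b": 2, "c": 1}` and `{"a": 1, "b": 3, "c": 2}`, variable `b` has the lowest average rank (1.5) and would be the first input added in the incremental cross-validation.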