Clinical Trial Details
Administrative data
NCT number | NCT06266325
Other study ID # | 6138
Secondary ID |
Status | Completed
Phase |
First received |
Last updated |
Start date | April 1, 2010
Est. completion date | December 31, 2022
Study information
Verified date | March 2024
Source | University of Toronto
Contact | n/a
Is FDA regulated | No
Health authority |
Study type | Observational
Clinical Trial Summary
Individuals with dementia and their caregivers are faced with challenging decisions
throughout the course of the disease. These decisions may be about medical care (e.g.,
continuation of routine cancer screening, pursuit of cardiopulmonary resuscitation,
initiation of palliative care services), institutionalization (i.e., transition to a
long-term care facility), or financial planning. These inherently difficult decisions are
made more difficult by prognostic uncertainty. Indeed, life expectancy is challenging to
predict in dementia. Consequently, healthcare providers infrequently discuss prognosis with
individuals with dementia and their families, which compromises their ability to plan for the
future. A lack of prognostic awareness makes it difficult for patients, their caregivers, and
their healthcare providers to make medical decisions that strike the appropriate balance
between prolonging life and preserving its quality. A clinical prediction tool could provide
personalized and accurate estimates of life expectancy in individuals with dementia.
Therefore, similar to the existing clinical
prediction tools on our Project Big Life platform (www.projectbiglife.ca), we seek to create
and to test a statistical model to predict survival, and to implement the model as a
user-friendly, web-based calculator. The calculator will use self-reported sociodemographic,
clinical, cognitive, functional, and nutritional information that is entered by patients,
their caregivers, and/or their healthcare providers to output an estimated life expectancy.
This estimate could inform the shared decision-making process, thereby empowering decisions
that are compatible with a patient's clinical reality and concordant with their life goals.
Description:
Analysis plan
The analysis plan was informed by guidelines for clinical prediction modelling. The plan was
developed after accessing the derivation dataset but before assessing predictor-outcome
associations or fitting any model. There are three key considerations. First, the model,
including the selection of predictors, will be fully pre-specified so that data-driven
variable selection is avoided; this decreases the risk of bias and overfitting. Second,
continuous variables will be specified as restricted cubic splines with knots at fixed
quantiles rather than being categorized. This accommodates potentially non-linear
associations with the outcome and avoids the inefficiency and bias associated with
categorization. Third, emphasis will be placed on assessing the model's calibration, not only
in the validation cohort overall but also in subgroups meaningful to clinicians and
policymakers. Statistical analysis will be performed using SAS Enterprise Guide V.9.4.
The model will be validated temporally: its performance will be evaluated in a temporally
distinct (more recent) cohort of individuals with dementia. This is a more rigorous form of
validation than internal validation, which includes random splitting or resampling
(bootstrapping, cross-validation). Whereas temporal validation evaluates transportability,
internal validation evaluates only reproducibility. The size of the derivation cohort and the
expected number of events therein enable temporal validation without substantially increasing
the risk of overfitting.
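The temporal split can be sketched as a simple partition on the index assessment date. This is a minimal illustration, assuming each record carries a hypothetical `index_date` field; the protocol does not state the date separating the two cohorts, so `SPLIT_DATE` below is a placeholder, not the study's value.

```python
from datetime import date

# HYPOTHETICAL cutoff: the protocol does not state the date that separates
# the derivation (older) cohort from the validation (more recent) cohort.
SPLIT_DATE = date(2018, 1, 1)


def temporal_split(records, split_date=SPLIT_DATE):
    """Split records on the index assessment date.

    Each record is assumed to be a dict with a hypothetical "index_date" key.
    Earlier assessments form the derivation cohort; assessments on or after
    the split date form the temporally distinct validation cohort.
    """
    derivation = [r for r in records if r["index_date"] < split_date]
    validation = [r for r in records if r["index_date"] >= split_date]
    return derivation, validation


# Example usage with two made-up records
records = [
    {"id": 1, "index_date": date(2012, 5, 1)},
    {"id": 2, "index_date": date(2019, 3, 2)},
]
derivation, validation = temporal_split(records)
print([r["id"] for r in derivation], [r["id"] for r in validation])  # [1] [2]
```

Unlike a random split, no one in the validation cohort was assessed before anyone in the derivation cohort, which is what lets the validation speak to transportability over time.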
Predictor variables
The candidate predictor variables were fully pre-specified, such that data-driven variable
selection was avoided. The investigators reviewed variables in the home care databases to
identify predictors. In addition, existing reviews of prognostic models in dementia were
explored. The research team reviewed the variables item by item to determine which to include
in the initial model.
Notably, predictor values from only a single, randomly selected assessment after dementia
diagnosis (the index assessment) will be included in the model. Values from subsequent
assessments will not be included, because the tool is intended to be applied
cross-sectionally, not longitudinally, and such values would not have been known at the time
of the index assessment. The variables in our model will be organized in the following
categories: sociodemographic, clinical (comorbidities, treatment), caregiver-specific,
functional, nutritional, cognitive, psychological/behavioural, home care, healthcare
utilization, and assessment-specific information. The investigators will include interactions
between age and the variables that represent comorbidities, since the association between
these variables and life expectancy may vary with age. A linear term of age, not a restricted
cubic spline thereof, will be used in these interactions.
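The index-assessment rule and the age interactions can be sketched as follows. This is a minimal illustration, not the investigators' SAS code: the field names (`person_id`, `assessment_date`, `diagnosis_date`, `age`) and the seed are hypothetical, and comorbidities are assumed to be coded as 0/1 indicators.

```python
import random
from collections import defaultdict


def select_index_assessments(assessments, seed=2024):
    """Pick one assessment at random per person among those on or after diagnosis.

    Each assessment is assumed to be a dict with hypothetical keys
    "person_id", "assessment_date" and "diagnosis_date" (comparable values,
    e.g. datetime.date objects or ISO-format date strings).
    """
    rng = random.Random(seed)  # fixed seed makes the selection reproducible
    eligible = defaultdict(list)
    for a in assessments:
        if a["assessment_date"] >= a["diagnosis_date"]:
            eligible[a["person_id"]].append(a)
    index = {}
    for person_id in sorted(eligible):
        # Sort first so the result does not depend on input order.
        rows = sorted(eligible[person_id], key=lambda r: r["assessment_date"])
        index[person_id] = rng.choice(rows)
    return index


def age_comorbidity_interactions(record, comorbidities):
    """Linear age (not its spline basis) multiplied by each 0/1 comorbidity flag."""
    return {f"age_x_{c}": record["age"] * record[c] for c in comorbidities}
```

Only the chosen assessment contributes predictor values; later assessments for the same person are simply never looked up, which mirrors how the calculator would be used at a single point in time.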
Outcome variable
The outcome variable will be survival time from the index assessment up to the maximum
follow-up date (December 31, 2022). Mortality will be ascertained from the Registered Persons
Database, which houses a historical listing of all individuals eligible for the Ontario
Health Insurance Plan, including sociodemographic (e.g., age, sex, postal code) and vital
information (e.g., date of death). The investigators have pre-specified survival times of
interest that are compatible with current eligibility guidelines for specialist palliative
care services (i.e., 3, 6, and 12 months).
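The outcome definition amounts to follow-up time with administrative censoring at December 31, 2022. The sketch below assumes dates are available as `datetime.date` values and approximates a month as 365.25/12 days; both the helper names and that month length are assumptions for illustration.

```python
from datetime import date

# Maximum follow-up date stated in the protocol
END_OF_FOLLOW_UP = date(2022, 12, 31)
DAYS_PER_MONTH = 365.25 / 12  # assumption: average month length


def survival_outcome(index_date, death_date=None, end=END_OF_FOLLOW_UP):
    """Return (time in days, event flag) measured from the index assessment.

    Deaths on or before the end of follow-up are events (flag 1); everyone
    else is administratively censored at the end of follow-up (flag 0).
    """
    if death_date is not None and death_date <= end:
        return (death_date - index_date).days, 1
    return (end - index_date).days, 0


def status_at_horizon(time_days, event, months):
    """Death status at a pre-specified horizon (e.g. 3, 6 or 12 months).

    Returns 1 if died within the horizon, 0 if known alive at the horizon,
    and None if censored before the horizon (status unknown).
    """
    horizon = months * DAYS_PER_MONTH
    if event == 1 and time_days <= horizon:
        return 1
    if time_days >= horizon:
        return 0
    return None
```

The `None` case is why survival methods are used rather than a simple binary outcome: people assessed close to the end of follow-up have unknown status at 12 months.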
Model specification
Predictor variables will be explored before assessing predictor-outcome associations or
fitting the model. Continuous variables will be explored using descriptive statistics and
boxplots, and categorical variables using descriptive statistics and frequency distributions.
Any invalid values identified will be corrected, if possible, or otherwise set to missing.
Continuous variables will be specified using restricted cubic splines with knots at fixed
quantiles (e.g., in a 5-knot spline, knots are placed at the 5th, 27.5th, 50th, 72.5th, and
95th percentiles). Categorization of continuous variables will be avoided, since it is
associated with inefficiency and bias and cannot capture non-linear associations with the
outcome. Levels of a categorical variable will not be combined unless a category contains a
very low proportion of total observations. Variables with a high proportion of missing values
or insufficient variation will be excluded. Multicollinearity will be evaluated using variable
clustering (the VARCLUS procedure in SAS). The minimum proportion of variance explained by a
cluster (eigenvalue) will be set to 0.7.
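For readers unfamiliar with restricted cubic splines, the sketch below builds the basis in Harrell's parameterization with knots at the protocol's 5-knot percentiles. It is an illustration in NumPy rather than the SAS implementation the study will use; the scaling by the squared knot range is a common convention, not something the protocol specifies.

```python
import numpy as np

# Knot locations for a 5-knot spline, as percentiles (from the protocol)
FIVE_KNOT_QUANTILES = (0.05, 0.275, 0.50, 0.725, 0.95)


def rcs_knots(x, quantiles=FIVE_KNOT_QUANTILES):
    """Place knots at fixed quantiles of the observed values."""
    return np.quantile(np.asarray(x, dtype=float), quantiles)


def rcs_basis(x, knots):
    """Restricted cubic spline basis (Harrell's parameterization).

    Returns k - 1 columns for k knots: the linear term plus k - 2 non-linear
    terms. Any fitted combination is cubic between knots and linear beyond
    the outer knots.
    """
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    k = len(t)

    def cube_pos(u):
        return np.maximum(u, 0.0) ** 3

    scale = (t[-1] - t[0]) ** 2  # keeps the columns on a scale similar to x
    columns = [x]
    for j in range(k - 2):
        term = (cube_pos(x - t[j])
                - cube_pos(x - t[-2]) * (t[-1] - t[j]) / (t[-1] - t[-2])
                + cube_pos(x - t[-1]) * (t[-2] - t[j]) / (t[-1] - t[-2]))
        columns.append(term / scale)
    return np.column_stack(columns)
```

A 5-knot spline therefore costs 4 degrees of freedom per continuous predictor, which is the quantity that enters the events-per-degree-of-freedom and shrinkage calculations.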
Missing values will be imputed using multiple imputation, provided that the data are judged to
be missing completely at random or missing at random. Despite its simplicity, complete case
analysis will be avoided because of the inefficiency and bias associated with it. The
imputation model will include the outcome variable, predictor variables, and auxiliary
variables (i.e., variables that are not included in the full model but that could inform the
missing value of a variable). The number of imputed datasets will be based on the proportion
of missing values in the dataset. The final model will be estimated in each of the imputed
datasets, and the resulting parameter estimates will be combined using Rubin's rules, which
incorporate the uncertainty associated with imputation into the final parameter estimates.
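Rubin's rules for a single coefficient can be written in a few lines. This sketch uses a normal-approximation 95% interval for the resulting hazard ratio; full Rubin's rules use a t reference distribution with imputation-adjusted degrees of freedom, so the interval here is a simplifying assumption. The function names are illustrative.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine one coefficient across m imputed datasets using Rubin's rules.

    estimates: the coefficient from each imputed dataset
    variances: its squared standard error from each imputed dataset
    Returns the pooled estimate and its total variance.
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)         # average within-imputation variance
    between = variance(estimates)    # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total


def hazard_ratio_ci(beta, var, z=1.96):
    """Hazard ratio with a normal-approximation 95% confidence interval."""
    se = math.sqrt(var)
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)
```

The `(1 + 1/m) * between` term is what carries imputation uncertainty into the standard errors: if every imputed dataset gave the same estimate, the pooled variance would reduce to the average within-imputation variance.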
Model estimation
The model will be estimated in the derivation cohort using Cox proportional hazards regression.
The proportional hazards assumption will be checked visually, by examining plots of Schoenfeld
residuals versus time, and statistically, by adding predictor-by-time interaction terms to the
model. If the assumption is violated for a predictor, the investigators will consider retaining
the corresponding predictor-by-time interaction in the model.
Considering the high ratio of expected events to degrees of freedom and the avoidance of
data-driven variable selection, the risk of overfitting is judged to be low. However, this
will be assessed statistically using the heuristic shrinkage estimator: (model likelihood
ratio chi-square - model degrees of freedom) / model likelihood ratio chi-square. If this
estimate is <0.90, the model will be adjusted for overfitting (e.g., by pursuing a variable
reduction method or by applying shrinkage coefficients to the parameter estimates).
Overfitting will also be assessed visually using the calibration curve.
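The heuristic shrinkage check and the 0.90 decision rule can be expressed directly. The sketch applies the estimate as a uniform shrinkage factor to every coefficient, which is one simple option among the adjustments the plan mentions; the helper names and example numbers are illustrative.

```python
def heuristic_shrinkage(lr_chi2, df):
    """(model LR chi-square - model degrees of freedom) / model LR chi-square."""
    return (lr_chi2 - df) / lr_chi2


def shrink_coefficients(coefs, lr_chi2, df, threshold=0.90):
    """Return (shrinkage factor, coefficients), shrinking only if the factor is below threshold."""
    s = heuristic_shrinkage(lr_chi2, df)
    if s < threshold:
        # Uniform shrinkage: every coefficient is pulled toward 0 by the same factor.
        return s, {name: s * b for name, b in coefs.items()}
    return s, dict(coefs)


# A model with LR chi-square 1000 on 50 degrees of freedom needs no adjustment
print(heuristic_shrinkage(1000, 50))  # 0.95
```

Because the estimate shrinks toward 0 only as the degrees of freedom approach the likelihood ratio chi-square, a large cohort with many events keeps it close to 1 even with many spline terms.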
Since our model is intended to be applied as a manual web-based calculator that could be used
by healthcare providers, caregivers, and patients, the investigators will estimate a reduced
model that seeks to optimize parsimony without a substantial decrease in model performance.
Indeed, the initial model may be too complex, labour-intensive, and time-consuming to be
implemented. The reduced model will be estimated using the stepdown method, whereby the
variable with the lowest Wald chi-square is removed from the model, one at a time, until model
performance reaches the minimally acceptable level. The reduced model will be compared to the
initial model using Akaike's Information Criterion and measures of discrimination and
calibration. The investigators will also consider the least absolute shrinkage and selection
operator (LASSO), which can shrink some regression coefficients to exactly 0, thereby reducing
the model. In addition to statistical means of model reduction, the investigators will
consider the clinical relevance of each variable based on existing literature and content
expertise, as well as the ability of patients and their caregivers to assess and input the
variable.
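The stepdown procedure can be sketched as a loop around a model-fitting routine. Here `fit` is a hypothetical user-supplied callable that refits the Cox model on a list of variables and returns each variable's Wald chi-square together with a chosen performance measure; the stopping rule (stop before performance drops below the minimum) is one reading of the plan, stated as an assumption.

```python
def stepdown(variables, fit, min_performance):
    """Backward stepdown on Wald chi-square.

    fit is a HYPOTHETICAL user-supplied callable: fit(list_of_variables)
    refits the model and returns (wald_chi2_by_variable, performance),
    where performance is any chosen measure such as the c statistic.
    """
    current = list(variables)
    wald, performance = fit(current)
    while len(current) > 1:
        weakest = min(current, key=lambda v: wald[v])
        candidate = [v for v in current if v != weakest]
        cand_wald, cand_performance = fit(candidate)
        if cand_performance < min_performance:
            break  # removing `weakest` would fall below the acceptable level
        current, wald, performance = candidate, cand_wald, cand_performance
    return current, performance
```

The model is refitted after every removal because Wald statistics change once a correlated variable leaves the model; ranking all variables once from the full model would not be equivalent.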
The model will be developed and validated using temporally split samples; however, the final
regression coefficients will be based on the full sample. The final model will have the same
specifications as the derivation model.
Model performance
The model's performance will be assessed in the validation cohort in multiple domains. Overall
performance will be measured by Nagelkerke's R², which reflects the proportion of variability
in the outcome that is explained by the model; historically, clinical prediction tools have had
R² values ranging from 0.2 to 0.3. Discrimination will be measured by the concordance (c)
statistic and visualized using the receiver operating characteristic curve. A c statistic of
0.5 represents no discriminative ability (no better than chance), and 1.0 represents perfect
discriminative ability.
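Both performance measures have short closed forms. The sketch below computes Nagelkerke's R² from the null and fitted log-likelihoods (for a Cox model, the log partial likelihoods), assuming `n` is the number of individuals, and Harrell's c statistic for right-censored data by brute-force pair counting. It is illustrative Python, not the SAS output the study will report.

```python
import math


def nagelkerke_r2(loglik_null, loglik_model, n):
    """Nagelkerke's R^2 from the null and fitted log-likelihoods and sample size n."""
    lr = 2.0 * (loglik_model - loglik_null)  # likelihood ratio chi-square
    cox_snell = 1.0 - math.exp(-lr / n)
    max_cox_snell = 1.0 - math.exp(2.0 * loglik_null / n)
    return cox_snell / max_cox_snell


def harrell_c(times, events, risk_scores):
    """Harrell's concordance statistic for right-censored survival data.

    A pair is usable when the person with the shorter follow-up died; it is
    concordant when that person also had the higher predicted risk. Ties in
    predicted risk count as one half. Pairs with tied times are skipped.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / usable


print(harrell_c([1, 2, 3], [1, 1, 0], [3, 2, 1]))  # 1.0
```

The pair loop is O(n²) and only suitable for illustration; production software uses sorted or tree-based algorithms to compute the same quantity on large cohorts.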
Finally, the model will be assessed in terms of calibration. This will be evaluated visually
using a calibration curve of predicted versus observed mortality, with observed mortality based
on Kaplan-Meier estimates at the pre-specified survival times (3, 6, and 12 months). A
perfectly calibrated model is represented by a 45-degree line with an intercept of 0 and a
slope of 1. The calibration curve shows whether the model systematically over- or
underestimates mortality risk (mean calibration, or calibration-in-the-large) and whether it
provides overly extreme predictions of mortality risk (i.e., underestimates risk in low-risk
individuals and overestimates risk in high-risk individuals), which suggests overfitting. The
mean relative difference between observed and predicted mortality risk will be calculated; a
difference of <20% is considered acceptable when the event rate is ≤5%. In addition, to enable
comparison with other prognostic models in community-dwelling individuals with dementia, the
investigators will calculate the Integrated Calibration Index (ICI), the mean absolute
difference between observed and predicted mortality risk; E50, the median absolute difference;
and E90, the 90th percentile of the absolute difference. Goodness-of-fit will not be measured
using the Hosmer-Lemeshow statistic or its equivalent for Cox proportional hazards models,
because these tests do not quantify the magnitude of miscalibration or show whether
miscalibration is confined to specific ranges of predicted mortality risk.
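Once each individual has a predicted risk and a corresponding observed risk at a given horizon, ICI, E50, and E90 are summaries of the same absolute differences. This sketch assumes the observed risks have already been obtained, typically from a smoothed calibration curve evaluated at each person's predicted risk; that smoothing step is not shown.

```python
import numpy as np


def calibration_metrics(predicted, observed):
    """ICI, E50 and E90 at one horizon.

    predicted: model-predicted mortality risk for each individual
    observed: ASSUMED pre-computed observed risk for each individual,
              e.g. from a smoothed calibration curve
    """
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    abs_diff = np.abs(observed - predicted)
    return {
        "ICI": float(abs_diff.mean()),                # mean absolute difference
        "E50": float(np.median(abs_diff)),            # median absolute difference
        "E90": float(np.quantile(abs_diff, 0.90)),    # 90th percentile
    }
```

Reporting E90 alongside the ICI distinguishes a model that is slightly off for everyone from one that is well calibrated for most people but badly off in a tail of the risk distribution.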
Calibration will also be assessed in groups defined by deciles of predicted mortality risk
(moderate calibration). In addition, subgroups meaningful to clinicians and policymakers (e.g.,
defined by age, sex, comorbidities) will be pre-specified, and calibration will be assessed in
each: a calibration graph will be plotted and the mean relative difference calculated.
Considering that individuals whose randomly selected assessment took place in hospital may
differ systematically from those assessed in the community, the investigators will
specifically assess the model's performance in individuals who underwent an in-hospital
assessment.
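Decile-group calibration compares the mean predicted risk in each group with the Kaplan-Meier estimate of observed mortality at the horizon. The sketch below implements the Kaplan-Meier estimator directly; the relative difference is assumed here to be (predicted - observed) / observed, since the protocol does not spell out its formula.

```python
import numpy as np


def km_mortality(times, events, horizon):
    """Observed mortality at `horizon`: 1 minus the Kaplan-Meier survival estimate."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    survival = 1.0
    for t in np.unique(times[(events == 1) & (times <= horizon)]):
        at_risk = np.sum(times >= t)
        deaths = np.sum((times == t) & (events == 1))
        survival *= 1.0 - deaths / at_risk
    return 1.0 - survival


def grouped_calibration(pred_risk, times, events, horizon, groups=10):
    """Compare mean predicted and Kaplan-Meier observed mortality in risk groups.

    Individuals are sorted by predicted risk and split into `groups` groups of
    (nearly) equal size, i.e. deciles when groups=10. Returns one tuple per
    group: (mean predicted, observed, relative difference).
    """
    pred_risk = np.asarray(pred_risk, dtype=float)
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    order = np.argsort(pred_risk, kind="stable")
    rows = []
    for idx in np.array_split(order, groups):
        predicted = float(pred_risk[idx].mean())
        observed = float(km_mortality(times[idx], events[idx], horizon))
        # ASSUMED definition of the relative difference: (predicted - observed) / observed
        rel_diff = (predicted - observed) / observed if observed > 0 else float("nan")
        rows.append((predicted, observed, rel_diff))
    return rows
```

The same function can be called on each pre-specified subgroup (for example, only the in-hospital assessments) by passing that subset's arrays; averaging the per-group relative differences gives a mean relative difference for that subgroup.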
Model presentation
The final regression model, based on the total sample, will be presented using hazard ratios
and associated 95% confidence intervals. The regression formula will be published online and
will be the basis for web-based implementation. Specifically, the model will be converted into
a publicly accessible web-based manual calculator on www.projectbiglife.ca, which houses
multiple clinical prediction tools developed by our team. The tool could be used not only by
healthcare providers, but also by patients and caregivers, to calculate life expectancy.
Considering this, a team of web developers, web designers, implementation scientists,
patients and caregivers, and clinicians will inform implementation to make the tool
user-friendly and to make its output interpretable. The model interface and output may differ
depending on whether a clinician or a patient/caregiver is using the tool. The investigators
will acknowledge the uncertainty in the tool's output by including interquartile ranges that
transparently reflect prognostic uncertainty.
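Turning a Cox model into a life-expectancy estimate with an interquartile range can be sketched as reading quantiles off the individual's predicted survival curve, S(t | x) = S0(t) ^ exp(lp). This assumes the interquartile range means the 25th to 75th percentiles of the predicted survival-time distribution, and that the linear predictor `lp` is centred in the same way as the baseline survival `S0`; the baseline values in the example are made up, not study results.

```python
import math


def predicted_survival(baseline_times, baseline_survival, linear_predictor):
    """Individual survival curve under proportional hazards: S(t|x) = S0(t) ** exp(lp)."""
    hr = math.exp(linear_predictor)
    return [(t, s0 ** hr) for t, s0 in zip(baseline_times, baseline_survival)]


def survival_time_quantile(curve, p):
    """Earliest time by which a fraction p is predicted to have died (S <= 1 - p).

    Returns None if the curve never drops that low within follow-up.
    """
    for t, s in curve:
        if s <= 1 - p:
            return t
    return None


def life_expectancy_summary(curve):
    """Median predicted survival time with its 25th and 75th percentiles."""
    return {
        "q1": survival_time_quantile(curve, 0.25),
        "median": survival_time_quantile(curve, 0.50),
        "q3": survival_time_quantile(curve, 0.75),
    }


# Made-up baseline survival at 3, 6, 12 and 24 months (not study results)
times = [3, 6, 12, 24]
s0 = [0.9, 0.7, 0.5, 0.2]
print(life_expectancy_summary(predicted_survival(times, s0, 0.0)))
# {'q1': 6, 'median': 12, 'q3': 24}
```

A `None` quantile signals that the estimate lies beyond the available follow-up, which a calculator would need to display as "more than" the last time point rather than as a number.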