Clinical Trial Details
— Status: Completed
Administrative data
NCT number | NCT05119465
Other study ID # | 19400_21052021
Secondary ID |
Status | Completed
Phase |
First received |
Last updated |
Start date | November 1, 2019
Est. completion date | June 30, 2021
Study information
Verified date | May 2023
Source | Aristotle University Of Thessaloniki
Contact | n/a
Is FDA regulated | No
Health authority |
Study type | Observational
Clinical Trial Summary
Since the beginning of the COVID-19 pandemic, 195 million people have been infected and 4.2
million have died from the disease or its complications. Physicians, healthcare scientists and
medical staff continuously try to cope with overloaded hospital admissions while, in
parallel, trying to identify meaningful correlations between disease severity in infected
patients and their symptoms, comorbidities and biomarkers. Artificial Intelligence (AI) and
Machine Learning (ML) have been used recently in many areas related to COVID-19 healthcare.
The main goal is to manage effectively the wide variety of issues related to COVID-19 and its
consequences. The existing applications of ML to COVID-19 healthcare are based on supervised
classification, which requires a labeled training dataset, serving as a reference point for
learning, as well as predefined classes. However, the existing knowledge about COVID-19 and
its consequences is still not solid and the points of common agreement among different
scientific communities are still unclear.
Therefore, this study aimed to follow an unsupervised clustering approach, where prior
knowledge is not required (tabula rasa).
More specifically, 268 hospitalized patients at the First Propaedeutic Department of Internal
Medicine of AHEPA University Hospital of Thessaloniki were assessed in terms of 40 clinical
variables (numerical and categorical), leading to a high-dimensionality dataset.
Dimensionality reduction was performed by applying Principal Component Analysis (PCA) on the
numerical part of the dataset and Multiple Correspondence Analysis (MCA) on the categorical
part of the dataset. Then, the Bayesian Information Criterion (BIC) was applied to Gaussian
Mixture Models (GMM) in order to identify the optimal number of clusters, that is, the number
under which the best grouping of patients occurs.
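The model-selection step described above can be sketched as follows. This is a minimal illustration, not the study's code: it assumes scikit-learn and NumPy, uses synthetic data in place of the patient variables, and covers only the PCA and GMM/BIC parts (MCA is omitted because scikit-learn does not provide it). The function name `select_n_clusters`, the candidate range of 1 to 6 clusters, and the choice of 2 principal components are all assumptions made for the example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture


def select_n_clusters(X, max_k=6, seed=0):
    """Fit one GMM per candidate k and return (best_k, {k: BIC}).

    Hypothetical helper: the candidate range and seed are illustrative.
    """
    bic_by_k = {}
    for k in range(1, max_k + 1):
        gmm = GaussianMixture(n_components=k, n_init=3, random_state=seed)
        gmm.fit(X)
        bic_by_k[k] = gmm.bic(X)  # lower BIC indicates a better trade-off
    best_k = min(bic_by_k, key=bic_by_k.get)
    return best_k, bic_by_k


# Synthetic stand-in for scaled numerical variables: 3 groups in 5 dimensions.
rng = np.random.default_rng(0)
centers = np.array(
    [[0, 0, 0, 0, 0], [8, 8, 0, 0, 0], [0, 8, 8, 0, 0]], dtype=float
)
X = np.vstack([c + rng.normal(size=(100, 5)) for c in centers])

# Dimensionality reduction on the numerical part (2 components is an assumption).
X_reduced = PCA(n_components=2).fit_transform(X)

# Choose the number of clusters by minimum BIC, then assign cluster labels.
best_k, bic_by_k = select_n_clusters(X_reduced)
labels = GaussianMixture(
    n_components=best_k, n_init=3, random_state=0
).fit_predict(X_reduced)
```

In a real analysis, the reduced categorical coordinates from MCA would presumably be combined with the PCA output before fitting the mixture models; how the study combined them is not stated here.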
The proposed methodology identified 4 clusters of patients with similar clinical
characteristics. The analysis revealed a cluster of asymptomatic patients with a mortality
rate of 23.8%.
This striking result forces us to reconsider the relationship between the severity of
COVID-19 clinical symptoms and patient mortality.
Description:
An algorithmic pipeline based on unsupervised machine learning algorithms, which aims to
operate in tandem with physicians and provide additional knowledge for the proper
categorization of COVID-19 infected patients based on their severity, is proposed in this
study. Data from patients hospitalized in our clinic are collected and stored in separate
Microsoft Excel files (.xlsx), which are loaded into memory. A script concatenates them
all into a single dataframe, which is then checked for NaN (Not a Number) values. Because of
the nature of the data, patients with missing information are discarded entirely from the
dataset, since inferring the missing values would introduce bias in this particular
application. Next, we apply data normalization by scaling all numerical variables to the
(0,1) range, so that the range of all numerical variables is the same and any bias towards
certain variables is avoided. A thorough and detailed data collection process was designed in
order to collect information about the patients without disturbing their clinical treatment or
upsetting them in the process.
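
The preprocessing steps described above (concatenation, removal of patients with missing values, and min-max scaling) might look roughly like the sketch below. It assumes pandas and NumPy; the helper name `prepare` and the column names `age`, `crp` and `sex` are hypothetical and are not taken from the study. The frames are built in memory here; in practice each one would be loaded from an .xlsx file, for example with `pandas.read_excel`.

```python
import numpy as np
import pandas as pd


def prepare(frames, numeric_cols):
    """Concatenate frames, drop incomplete rows, min-max scale numeric columns."""
    df = pd.concat(frames, ignore_index=True)
    # Discard any patient (row) with at least one missing value.
    df = df.dropna(axis=0, how="any").reset_index(drop=True)
    col_min = df[numeric_cols].min()
    col_max = df[numeric_cols].max()
    # Guard against division by zero for constant columns.
    span = (col_max - col_min).replace(0, 1)
    df[numeric_cols] = (df[numeric_cols] - col_min) / span
    return df


# Hypothetical example data standing in for two per-batch Excel files.
frame_a = pd.DataFrame(
    {"age": [50.0, 70.0, np.nan], "crp": [10.0, 30.0, 5.0], "sex": ["F", "M", "F"]}
)
frame_b = pd.DataFrame(
    {"age": [60.0, 90.0], "crp": [20.0, 50.0], "sex": ["M", None]}
)

clean = prepare([frame_a, frame_b], numeric_cols=["age", "crp"])
```

Note that the scaling bounds are computed after incomplete rows are removed, so a discarded patient's values do not influence the range of the remaining data.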