Clinical Trial Details
— Status: Recruiting
Administrative data
NCT number | NCT05754606
Other study ID # | 4519
Secondary ID |
Status | Recruiting
Phase |
First received |
Last updated |
Start date | November 1, 2021
Est. completion date | November 1, 2025
Study information
Verified date | February 2023
Source | Fondazione Policlinico Universitario Agostino Gemelli IRCCS
Contact | Maria Raffaella Marchese
Phone | 3391144556
Email | raffaellamarchese[@]gmail.com
Is FDA regulated | No
Health authority |
Study type | Observational
Clinical Trial Summary
The development of Artificial Intelligence (AI), the evolution of voice technology, and progress in audio signal analysis and natural language processing/understanding methods have opened the way to numerous potential applications of voice, such as the identification of vocal biomarkers for diagnosis and classification, or to enhance clinical practice. More recently, research has focused on the role of the voice audio signal as a signature of the pathogenic process.

Dysphonia denotes a negative change in voice production. The overall prevalence of dysphonia is approximately 1%, although actual rates may be higher depending on the population studied and on the definition of the specific voice disorder. Voice health may be assessed through several acoustic parameters. The relationship between voice pathology and acoustic voice features has been clinically established and confirmed both quantitatively and subjectively by speech experts. Automatic systems are designed to determine whether a voice sample belongs to a healthy or a non-healthy subject. The accuracy of the acoustic parameters depends on the features used to estimate them for speech noise identification. Current research on voice is mostly restricted to basic questions, albeit with broad perspectives. The literature on vocal biomarkers of specific vocal fold diseases is anecdotal and limited to functional vocal fold disorders or rare movement disorders of the larynx.

Benign lesions of the vocal folds (BLVF) are the most common cause of dysphonia. Videolaryngostroboscopy is currently the gold standard for the diagnosis of BLVF; however, it is an invasive and expensive procedure. Novel machine learning (ML) algorithms have recently improved the accuracy with which selected features are classified into target variables, compared with more conventional procedures, thanks to their ability to combine and analyze large datasets of voice features. Although most studies focus on detecting a disorder, i.e. differentiating between healthy and non-healthy subjects, the investigators believe that the more important task is frequently the differential diagnosis between two or more diseases. Even though this is a challenging task, moving decision support to this level is of crucial importance. The main aim of this research is to study, develop, and validate ML algorithms that recognize the different BLVF from digital voice recordings.
Description:
The investigators will collect audio recordings of dysphonic participants affected by BLVF. All voice samples will be divided into the following groups according to the endoscopic diagnosis: vocal fold cysts, Reinke's edema, nodules, and polyps. Each participant will be asked to pronounce the word /aiuole/ three times in a row at their usual voice intensity, pitch, and quality. Voices will be acquired using a Shure SM48 microphone (Shure, Evanston, IL, USA) positioned at an angle of 45° and at a distance of 20 cm from the patient's mouth. The microphone input saturation will be fixed at 6/9 of CH1, and the environmental noise will be kept below 30 dB sound pressure level (SPL). The signals will be recorded in ".nvi" format with a Computerized Speech Lab high-definition audio recorder, model 4300B (Kay Elemetrics, Lincoln Park, NJ, USA), at a sampling rate of 50 kHz, and then converted to ".wav" format. Each audio file will be anonymously labelled with the participant's gender and type of BLVF.
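Because the recordings are exported to ".wav" at 50 kHz and labelled with gender and BLVF type, a quick integrity check on the exported files can help before analysis. The sketch below is in Python (the protocol itself uses MATLAB) and assumes 16-bit mono PCM files; the `VoiceSample` record, its gender encoding, and the lesion label strings are hypothetical illustrations, not part of the protocol.

```python
import wave
from dataclasses import dataclass

import numpy as np

EXPECTED_RATE_HZ = 50_000  # sampling rate stated in the protocol
# Hypothetical label strings for the four endoscopic groups.
LESION_TYPES = ("cyst", "reinke_edema", "nodules", "polyp")


@dataclass(frozen=True)
class VoiceSample:
    """Hypothetical anonymised record: only gender and BLVF type are stored."""
    path: str
    gender: str  # hypothetical encoding, e.g. "F" or "M"
    lesion: str  # one of LESION_TYPES

    def __post_init__(self):
        if self.lesion not in LESION_TYPES:
            raise ValueError(f"unknown lesion type: {self.lesion!r}")


def read_wav_mono(source):
    """Read a 16-bit mono PCM WAV file (path or binary file object).

    Returns the samples scaled to [-1, 1) and the sampling rate in Hz.
    """
    with wave.open(source, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        rate = wf.getframerate()
        frames = wf.readframes(wf.getnframes())
    samples = np.frombuffer(frames, dtype="<i2").astype(np.float64) / 32768.0
    return samples, rate


def check_rate(rate):
    """Reject files whose sampling rate differs from the protocol's 50 kHz."""
    if rate != EXPECTED_RATE_HZ:
        raise ValueError(f"expected {EXPECTED_RATE_HZ} Hz, got {rate} Hz")
```

Keeping the label in a small validated record, rather than parsing it ad hoc from file names, makes a mislabelled group fail loudly before any feature is computed.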
Analysis pipeline
All the following analyses will be performed using MATLAB R2019b (The MathWorks, Natick, MA, USA). The analysis pipeline will include signal pre-processing, feature extraction, feature screening, and model implementation.
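The protocol does not specify how the signal is pre-processed and segmented. One plausible sketch in Python with NumPy assumes DC removal, peak normalisation, and fixed-length overlapping frames; the 25 ms frame, 10 ms hop, and Hamming window are assumptions for illustration, not values taken from the study.

```python
import numpy as np


def preprocess(x):
    """Remove the DC offset and peak-normalise the signal (assumed steps)."""
    x = np.asarray(x, dtype=np.float64)
    x = x - x.mean()
    peak = np.max(np.abs(x))
    return x / peak if peak > 0 else x


def frame_signal(x, fs, frame_ms=25.0, hop_ms=10.0):
    """Split x into overlapping, Hamming-windowed frames.

    Returns an array of shape (n_frames, frame_len). Frame and hop
    lengths are assumed defaults, not protocol values.
    """
    frame_len = int(round(fs * frame_ms / 1000.0))
    hop = int(round(fs * hop_ms / 1000.0))
    if len(x) < frame_len:
        raise ValueError("signal shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    # Row i holds sample indices [i * hop, i * hop + frame_len).
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(frame_len)
```

At 50 kHz, a 25 ms frame is 1,250 samples and a 10 ms hop is 500 samples, so one second of audio yields 98 frames.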
Feature extraction
On the segmented signal, 66 different features in the time, frequency, and cepstral domains will be extracted. Then, seven statistical measures will be computed on each extracted feature, namely: mean, standard deviation, skewness, kurtosis, and the 25th, 50th, and 75th percentiles. In addition, jitter, shimmer, and the tilt of the power spectrum will be obtained from the whole, unsegmented signal.
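The seven statistics turn each frame-level feature track into a fixed-length vector (66 features × 7 statistics = 462 values per recording), to which jitter, shimmer, and spectral tilt are appended. The NumPy sketch below assumes the frame-level features are already available as an (n_frames, n_features) matrix, and that pitch periods and cycle peak amplitudes come from a pitch tracker that is not shown. The "local" jitter/shimmer definition (mean absolute consecutive difference over the mean) and the regression-slope definition of spectral tilt are common choices assumed here, not necessarily the study's exact formulas.

```python
import numpy as np


def summarize_features(frame_features):
    """Collapse an (n_frames, n_features) matrix into 7 statistics per feature.

    Output order: mean, std, skewness, kurtosis, 25th, 50th, 75th percentile,
    each block holding one value per feature.
    """
    f = np.asarray(frame_features, dtype=np.float64)
    mu = f.mean(axis=0)
    sd = f.std(axis=0)  # population (ddof=0) standard deviation
    z = (f - mu) / np.where(sd > 0, sd, 1.0)  # guard constant features
    skew = (z ** 3).mean(axis=0)
    kurt = (z ** 4).mean(axis=0)  # Pearson kurtosis (3 for a normal distribution)
    p25, p50, p75 = np.percentile(f, [25, 50, 75], axis=0)
    return np.concatenate([mu, sd, skew, kurt, p25, p50, p75])


def local_perturbation(values):
    """Mean absolute difference of consecutive values, relative to their mean.

    Applied to pitch periods this gives local jitter; applied to the peak
    amplitudes of consecutive cycles it gives local shimmer.
    """
    v = np.asarray(values, dtype=np.float64)
    return np.mean(np.abs(np.diff(v))) / np.mean(v)


def spectral_tilt(x, fs):
    """Slope (dB per kHz) of a straight line fitted to the power spectrum in dB."""
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    keep = (freqs > 0) & (power > 0)  # skip DC and empty bins before the log
    slope, _intercept = np.polyfit(freqs[keep] / 1000.0, 10.0 * np.log10(power[keep]), 1)
    return slope
```

Note that statistic conventions differ between tools (for example sample vs population standard deviation, or percentile interpolation), so values from this sketch need not match MATLAB output exactly.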
Feature screening
Feature screening will be applied using biostatistical analyses on the whole dataset, to reduce the large number of features given as input to the classifier. Two statistical tests will be used to screen the features relevant to the classification task: one-way analysis of variance (ANOVA) when all groups are normally distributed, and the Kruskal-Wallis test otherwise. The normality of each group will be verified with the Kolmogorov-Smirnov test. For all tests, a p-value < 0.05 will be considered statistically significant.
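The screening rule can be sketched per feature in Python with SciPy: test each diagnostic group for normality with a one-sample Kolmogorov-Smirnov test, run one-way ANOVA if every group passes and Kruskal-Wallis otherwise, and keep features with p < 0.05. Standardising each group with its own mean and SD before the KS test is an assumption (estimating the parameters from the data makes the plain KS test conservative; the Lilliefors variant corrects for this), as is skipping constant features.

```python
import numpy as np
from scipy import stats

ALPHA = 0.05  # significance level stated in the protocol


def is_normal(sample, alpha=ALPHA):
    """One-sample KS test of the standardised sample against N(0, 1)."""
    x = np.asarray(sample, dtype=np.float64)
    sd = x.std(ddof=1)
    if not sd > 0:  # also catches NaN for single-element groups
        return False
    z = (x - x.mean()) / sd
    return stats.kstest(z, "norm").pvalue >= alpha


def screen_feature(groups, alpha=ALPHA):
    """Return (test name, p-value, keep?) for one feature split by diagnosis."""
    if all(is_normal(g, alpha) for g in groups):
        name, result = "anova", stats.f_oneway(*groups)
    else:
        name, result = "kruskal-wallis", stats.kruskal(*groups)
    return name, result.pvalue, result.pvalue < alpha


def screen_features(X, y, alpha=ALPHA):
    """Indices of the columns of X that differ significantly across classes y."""
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y)
    labels = np.unique(y)
    selected = []
    for j in range(X.shape[1]):
        column = X[:, j]
        if column.max() == column.min():  # constant feature: nothing to test
            continue
        groups = [column[y == lab] for lab in labels]
        _name, _p, keep = screen_feature(groups, alpha)
        if keep:
            selected.append(j)
    return selected
```

Because screening runs on the whole dataset, as the protocol states, any subsequent performance estimate should be read with the possibility of selection bias in mind.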
Model implementation
A non-linear Support Vector Machine (SVM) with a Gaussian kernel is the algorithm chosen for this research. Classification performance will be measured through accuracy and the average F1-score. Both metrics will be reported for the overall classification performance and for the gender sub-groups.
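A minimal scikit-learn sketch of the classifier and the reported metrics follows. `SVC(kernel="rbf")` implements the Gaussian kernel exp(-gamma·||x - x'||²). The feature standardisation, the default `C` and `gamma`, the stratified 75/25 hold-out split, and reading "average F1-score" as the macro-averaged F1 are all assumptions, since the protocol does not state them.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_model(C=1.0, gamma="scale"):
    """Standardise features, then fit an SVM with a Gaussian (RBF) kernel."""
    return make_pipeline(StandardScaler(), SVC(kernel="rbf", C=C, gamma=gamma))


def metrics(y_true, y_pred):
    """Accuracy and macro-averaged F1 (unweighted mean of per-class F1)."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "f1_macro": f1_score(y_true, y_pred, average="macro"),
    }


def evaluate_by_gender(y_true, y_pred, gender):
    """Overall metrics plus the same metrics for each gender sub-group."""
    y_true, y_pred, gender = (np.asarray(a) for a in (y_true, y_pred, gender))
    report = {"overall": metrics(y_true, y_pred)}
    for g in np.unique(gender):
        mask = gender == g
        report[str(g)] = metrics(y_true[mask], y_pred[mask])
    return report


def train_and_evaluate(X, y, gender, seed=0):
    """Hold out 25% of recordings (stratified by lesion type) for testing."""
    X_tr, X_te, y_tr, y_te, _g_tr, g_te = train_test_split(
        X, y, gender, test_size=0.25, stratify=y, random_state=seed
    )
    model = build_model().fit(X_tr, y_tr)
    return evaluate_by_gender(y_te, model.predict(X_te), g_te)
```

Macro averaging weights each lesion type equally, which matters when the four diagnostic groups are unbalanced; `C` and `gamma` would normally be tuned, for example by cross-validation on the training split only.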