Efficient Feature Selection and Hyperparameter Tuning for Improved Speech Signal-Based Parkinson’s Disease Diagnosis via Machine Learning Techniques
Abstract
Parkinson’s disease (PD) is a progressive neurodegenerative disorder that worsens with age and predominantly affects the elderly. Symptoms of PD include visual hallucinations, depression, autonomic dysfunction, and motor difficulties. Conventional diagnosis often relies on subjective interpretation of movement symptoms, which can be subtle and difficult to assess accurately, potentially leading to misdiagnosis. Recent studies, however, indicate that over 90% of individuals with PD exhibit vocal abnormalities at the onset of the disease. Machine learning (ML) techniques have shown promise in addressing these diagnostic challenges because of their efficiency and reduced error rates when analyzing complex, high-dimensional datasets, particularly those derived from speech signals.
This study investigates 12 machine learning models to develop a robust ML model capable of reliably identifying PD cases: logistic regression (LR), support vector machine (SVM) with linear and RBF kernels, K-nearest neighbor (KNN), naïve Bayes (NB), decision tree (DT), random forest (RF), extra trees (ET), gradient boosting (GbBoost), extreme gradient boosting (XgBoost), AdaBoost, and multi-layer perceptron (MLP). The analysis used a PD voice dataset comprising 756 acoustic samples from 252 participants, including 188 individuals with PD and 64 healthy controls. The dataset included 130 male and 122 female subjects, with age ranges of 33 to 87 years and 41 to 82 years, respectively.
To enhance model performance, the GridSearchCV method was employed for hyperparameter tuning, alongside recursive feature elimination (RFE) and minimum redundancy maximum relevance (mRMR) feature selection techniques. Among the 12 ML models evaluated, the RF model with the RFE-generated feature subset (RFE-50) emerged as the top performer. It achieved an accuracy of 96.46%, a recall of 0.96, a precision of 0.97, an F1-score of 0.96, and an AUC score of 0.998, marking the highest performance metrics reported for this dataset in recent studies.
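The combination of RFE and GridSearchCV described above can be sketched with scikit-learn roughly as follows. This is a minimal illustration under stated assumptions, not the study's actual pipeline: the data are synthetic (generated with `make_classification` to match the sample count and approximate class balance), the parameter grid is hypothetical and deliberately tiny, and the choice of a random forest as the RFE ranking estimator is an assumption, since the abstract does not specify it.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the voice dataset (NOT the real data): 756 rows,
# with roughly a 1:3 class balance, similar to 64 controls vs 188 PD cases.
X, y = make_classification(
    n_samples=756, n_features=100, n_informative=20,
    weights=[0.25, 0.75], random_state=0,
)

pipeline = Pipeline([
    # RFE drops 10 features per round, ranked by forest importances,
    # until 50 remain (mirroring the "RFE-50" subset size).
    ("rfe", RFE(RandomForestClassifier(n_estimators=25, random_state=0),
                n_features_to_select=50, step=10)),
    # Final classifier trained on the selected features only.
    ("rf", RandomForestClassifier(random_state=0)),
])

# Hypothetical grid for illustration; the study's real grid is not given here.
param_grid = {
    "rf__n_estimators": [50, 100],
    "rf__max_depth": [None, 10],
}

search = GridSearchCV(
    pipeline, param_grid, scoring="accuracy",
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
)
search.fit(X, y)

# Number of features retained by RFE in the refit best pipeline.
n_selected = int(search.best_estimator_.named_steps["rfe"].support_.sum())
print("best params:", search.best_params_)
print("features kept:", n_selected)
print("mean CV accuracy:", round(search.best_score_, 3))
```

Placing RFE inside the `Pipeline` means feature selection is re-fit on the training portion of every cross-validation fold, so the held-out fold never influences which features are kept. Whether the study arranged its selection and tuning steps this way is not stated in the abstract.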
Issue: Vol 9 No 1 (2025)
Section: Articles
DOI: https://doi.org/10.18502/htaa.v9i1.17863
Keywords: Medical Diagnosis; Parkinson's disease; Machine Learning; Data preprocessing; Feature selection; GridSearchCV
Rights and permissions: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.