Original scientific article

ADVANCING DIABETES PREDICTION THROUGH MACHINE LEARNING AND DEEP LEARNING MODELS USING PIMA INDIAN AND CLINICAL-BIOLOGICAL DATA

By
Zeeshan Hussain
Jamia Hamdard University, India

Suraiya Parveen
Jamia Hamdard University, India

Ashif Khan
Jamia Hamdard University, India

Ihtiram Raza
Jamia Hamdard University, India

Umnah
Jamia Millia Islamia, India

Abstract

Diabetes mellitus is a major global health problem, and early detection is critical because it reduces complications and enables timely medical intervention. This paper compares the predictive performance of eight machine learning classifiers: Logistic Regression, Support Vector Machine (SVM), Decision Tree, Random Forest, Gradient Boosting, Naive Bayes, k-Nearest Neighbors (k-NN), and an Ensemble model, on the Pima Indian Diabetes dataset and a collection of clinical-biological patient records. Performance was evaluated using Precision, Recall, F1-Score, and the Area Under the ROC Curve (AUC-ROC). The models differed significantly in performance. Among the individual classifiers, SVM (AUC-ROC: 0.8648) and Logistic Regression (AUC-ROC: 0.8638) showed the best discriminative ability. Logistic Regression had the highest Precision (0.7632), indicating fewer false-positive predictions, whereas Decision Tree had the highest Recall (0.7447), indicating greater sensitivity in detecting diabetes cases. The Ensemble model produced the best overall performance (AUC-ROC: 0.8709), suggesting that combining predictions from multiple models increases reliability and generalization. In contrast, k-NN performed worst, owing to its sensitivity to noise and to the number of features. Overall, the results point to the strong potential of linear-margin and ensemble-based models for structured clinical data. Such models could provide a robust foundation for clinical decision support systems and broaden the role of ML-based analytics in early diabetes diagnosis and preventive health care planning.
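
The four evaluation metrics named in the abstract, and the idea of combining predictions from several models, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the abstract does not say how the Ensemble model combines its members, so the probability averaging (soft voting) in `soft_vote` is an assumption, as are the function names, the 0.5 decision threshold, and the toy labels and scores, which are invented for illustration and are not taken from the Pima Indian or clinical-biological data.

```python
from typing import Sequence


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[float, float, float]:
    """Precision, Recall and F1-Score for the positive class (1 = diabetic)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0  # few false positives -> high precision
    recall = tp / (tp + fn) if tp + fn else 0.0     # few missed cases -> high recall
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def soft_vote(model_scores: Sequence[Sequence[float]]) -> list[float]:
    """Average each patient's predicted probability across models (assumed ensemble rule)."""
    return [sum(column) / len(column) for column in zip(*model_scores)]


# Toy data (hypothetical): true labels and predicted probabilities from two models.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores_a = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1, 0.7, 0.3]
scores_b = [0.6, 0.4, 0.7, 0.9, 0.2, 0.3, 0.5, 0.6]

ensemble_scores = soft_vote([scores_a, scores_b])
y_pred = [1 if s >= 0.5 else 0 for s in ensemble_scores]  # 0.5 threshold is an assumption

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"model A AUC-ROC:  {auc_roc(y_true, scores_a):.4f}")
print(f"model B AUC-ROC:  {auc_roc(y_true, scores_b):.4f}")
print(f"ensemble AUC-ROC: {auc_roc(y_true, ensemble_scores):.4f}")
print(f"ensemble Precision={precision:.4f} Recall={recall:.4f} F1={f1:.4f}")
```

On this contrived data the averaged scores rank patients better than either model alone, because the two models make their errors on different patients. Whether the same holds on real clinical data depends on how correlated the member models' errors are.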


This is an open access article distributed under the Creative Commons Attribution Non-Commercial (CC BY-NC) License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
