Mathematical Foundations of Diabetes Forecasting Studies:
A Comparative Analysis of Statistics and ML Models

Pakgohar, Alireza

doi:10.18502/ijdo.v18i2.21648

Volume 18, Issue 2 (6-2026) IJDO 2026, 18(2): 113-124



Pakgohar A. Mathematical Foundations of Diabetes Forecasting Studies: A Comparative Analysis of Statistics and ML Models. IJDO 2026; 18 (2) :113-124
URL: http://ijdo.ssu.ac.ir/article-1-1038-en.html


Alireza Pakgohar ^*

Department of Statistics, Payame Noor University, Tehran, Iran.

Abstract:

This study provides a systematic comparison of the mathematical properties, strengths, and limitations of traditional statistical methods and machine learning models in diabetes forecasting. While classical approaches like logistic regression and ANOVA offer interpretability and simplicity, their reliance on linear assumptions and sensitivity to heteroscedasticity limit their utility in modeling the complex, nonlinear relationships inherent in diabetes data. In contrast, machine learning techniques, including neural networks, random forests, and gradient boosting, excel in capturing high-dimensional interactions and nonlinear dynamics, achieving superior predictive accuracy. However, these gains come at the cost of computational complexity, black-box interpretability challenges, and ethical concerns around algorithmic bias. Through a detailed analysis of mathematical frameworks (e.g., activation functions, regularization, ensemble methods), we demonstrate how hybrid approaches integrating explainable AI (XAI) can bridge the gap between statistical rigor and clinical usability. Our findings highlight the critical trade-offs between model interpretability, predictive power, and scalability, offering actionable insights for optimizing diabetes risk prediction in precision medicine.

Keywords: Statistical modeling, Machine learning, Predictive analytics, Sample size, Big data


Type of Study: Research | Subject: Special
Received: 2026/03/22 | Accepted: 2026/05/17 | Published: 2026/06/01
