Artificial intelligence and management of diabetes mellitus
Welcome to Health Connect, the podcast for health professionals where we will share the latest news and information on science and technology in the medical industry. In this episode we will talk about some of the applications of artificial intelligence in the management of patients with diabetes mellitus.
Did you know that diabetes mellitus is associated with an increased risk of various cancers, especially gastrointestinal cancers and female-specific cancers?1
Today we will present how artificial intelligence can be applied to the management of diabetes, in particular to the prediction of pain in patients with diabetic peripheral neuropathy and to the detection of gestational diabetes.
As is well known, diabetes mellitus is a common, albeit potentially devastating, medical condition that has increased in prevalence over the past few decades to constitute a major public health challenge of the twenty-first century.4 The complications that have traditionally been associated with diabetes include macrovascular conditions, such as coronary heart disease, stroke, and peripheral arterial disease, as well as microvascular conditions, including diabetic kidney disease, retinopathy, and peripheral neuropathy.1
Notably, one serious complication of diabetes is peripheral neuropathy; this damage to the nerves is so common that it can be present in 29 to 49 percent of people with diabetes. This translates to a global prevalence of about 200 million people living with diabetic peripheral neuropathy, and half of them will develop chronic neuropathic pain.2
Painful diabetic peripheral neuropathy, or painful DPN, as we will refer to it from now on, is characterized by chronic pain that is most severe in the feet, but can extend to involve the legs, hands, and arms. It has also been strongly associated with poor quality of life and psychological comorbidities, like depression and anxiety disorders.2
However, the mental health burden resulting from painful DPN remains under-recognized and under-treated. The management of the pain is complicated by several challenges, such as the current under-diagnosis, the inadequate treatment options, and the lack of knowledge about why some patients with neuropathy develop pain and others do not.2
Research on this matter suggests that the pathophysiology is most likely a complex interaction of genetic, environmental and psychological factors. Multiple fundamental neurobiological mechanisms are thought to underlie neuropathic pain, including hyperexcitability, maladaptive structural plasticity and pro-inflammatory processes within both the peripheral and central nervous systems.2
To improve the treatment of this condition and its associated comorbidities, it is essential to develop a better understanding of its pathophysiology and risk factors. While there have been successful predictions of risk for diabetes and its complications, psychological factors have generally not been considered, and previous studies did not treat the distinction between painful and painless neuropathy as important.2
In this context, Dr. Georgios Baskozos and colleagues, from the University of Oxford in the United Kingdom, published a study in the journal BMC Medical Informatics and Decision Making that used diverse machine learning models to predict painful versus painless diabetic neuropathy.2
The researchers used information derived from the DOLORisk project, which contains large datasets whose main characteristic is that the data collected from different centers have been harmonized. This makes it easier to estimate the models’ performance using cross-validation and rigorous external validation in independent datasets. These datasets can be exploited to better understand the risk factors and to build models that could predict the development of painful or painless DPN.2
From these datasets, the researchers used data from a total of 1,230 people with diabetes, predominantly type 2. The main outcome of the study was painful or painless neuropathy. Neuropathy was determined by symptomatology and confirmed by abnormalities in nerve conduction studies or intraepidermal nerve fibre density. Pain, on the other hand, was assessed according to the grading system of the Neuropathic Pain Special Interest Group (NeuPSIG) of the International Association for the Study of Pain. The independent variables used to make the predictions included the presence of anxiety or depression, smoking status, alcohol consumption, age, gender, body mass index, diabetes duration, cholesterol, triglycerides, low- and high-density lipoproteins, and creatinine.2
These data were used to train a series of machine learning models with a set of diverse and well-known algorithms, specifically Random Forest, Adaptive Regression Splines, and the Naive Bayes classifier. For Random Forest and Adaptive Regression Splines, the researchers trained both an unweighted and a weighted version. Theoretically, weighted models should match the probability distribution of the outcome more closely than unweighted ones, and so be better calibrated, but they could also run a higher risk of overfitting the training data and be less generalisable.2
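To make the idea of a weighted model more concrete, here is a minimal sketch of one common weighting scheme, in which each class is weighted inversely to its frequency. This is an assumption for illustration only: the episode does not state how the study defined its weights, and the function name and the toy labels below are hypothetical.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Assumption: inverse-frequency weighting, shown only as an example;
    the weighting actually used in the study is not described here.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Hypothetical toy labels, not study data: 6 painful and 4 painless cases.
labels = ["painful"] * 6 + ["painless"] * 4
weights = balanced_class_weights(labels)
print(weights)
```

With a scheme like this, the less frequent class receives the larger weight, so errors on it count for more during training.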
After training and testing, all the models showed good estimated performance. The area under the precision-recall curve was 0.811 for the unweighted Random Forest and 0.818 for the weighted version. Similarly, for the Adaptive Regression Splines, it was 0.820 for the unweighted model and 0.818 for the weighted model. Finally, the Naive Bayes classifier had an area under the precision-recall curve of 0.812. Together, these results indicate a very good area under the precision-recall curve and a moderate positive relationship between the presence of painful neuropathy and the models’ predictions.2
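For listeners following along with the written notes, the area under the precision-recall curve can be approximated by average precision, which averages the precision obtained each time a true positive is retrieved as cases are ranked by predicted score. The sketch below is a plain Python illustration; the function name and the toy scores are hypothetical, ties between scores are not handled specially, and the exact computation used in the study is not described here.

```python
def average_precision(y_true, scores):
    """Step-wise approximation of the area under the precision-recall curve.

    y_true: list of 0/1 labels (1 = painful neuropathy in this toy example).
    scores: predicted scores, higher means more likely positive.
    """
    total_positives = sum(y_true)
    if total_positives == 0:
        raise ValueError("average precision is undefined without positive cases")
    # Rank cases from the highest to the lowest predicted score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_positives = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            true_positives += 1
            precision_sum += true_positives / rank  # precision at this recall step
    return precision_sum / total_positives

# Hypothetical toy data, not study data.
y_true = [1, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
ap = average_precision(y_true, scores)
print(round(ap, 3))
```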
Next, to understand how the values of the independent variables influenced the model outcome and to provide some interpretability of the machine learning algorithms, the researchers calculated the variable importance for each model. Overall, this analysis revealed that specific variables were consistently among the most powerful predictors of the presence of pain in patients with diabetic neuropathy. These variables included quality of life, personality and psychological traits, age, and glucose control.2
To be more specific, the Adaptive Regression Splines algorithm achieved its best performance when it considered, in descending order of importance, quality of life (measured as the EQ5D index), the extraversion and openness dimensions of the ten-item personality inventory (TIPI), hemoglobin A1c, depression and anxiety t-scores, and age. Interestingly, the EQ5D index and the psychological and personality traits were always amongst the top predictors. In contrast, the Random Forest model ranked body mass index, age and glucose control as highly important, while the modifiable lifestyle factors it used, namely alcohol consumption and smoking, had lower importance.2
Furthermore, the Naive Bayes classifier produced a ranking similar to that of the Random Forest, except that it also ranked alcohol consumption and the experience of traumatic events before the age of 18 as highly important. Notably, gender was not among the most powerful predictors in any model, and the weighted models had feature rankings similar to those of the unweighted ones.2
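One widely used way to estimate variable importance is permutation importance: shuffle the values of a single variable and measure how much the model's accuracy drops. The sketch below assumes this technique purely for illustration, since the episode does not say which importance measure the researchers used. The toy model, the variable names and the data are all hypothetical.

```python
import random

def accuracy(model, rows, labels):
    """Fraction of rows for which the model predicts the true label."""
    return sum(model(row) == label for row, label in zip(rows, labels)) / len(labels)

def permutation_importance(model, rows, labels, feature, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling one feature's values."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    drops = []
    for _ in range(n_repeats):
        shuffled = [row[feature] for row in rows]
        rng.shuffle(shuffled)
        permuted = [dict(row, **{feature: value}) for row, value in zip(rows, shuffled)]
        drops.append(baseline - accuracy(model, permuted, labels))
    return sum(drops) / n_repeats

# Hypothetical toy model: predicts pain (1) when quality of life is low.
def toy_model(row):
    return 1 if row["quality_of_life"] < 0.5 else 0

# Hypothetical toy data, not study data.
rows = [
    {"quality_of_life": q, "noise": n}
    for q, n in [(0.1, 3), (0.2, 7), (0.3, 1), (0.4, 9), (0.6, 2), (0.7, 8), (0.8, 4), (0.9, 6)]
]
labels = [toy_model(row) for row in rows]

importance_qol = permutation_importance(toy_model, rows, labels, "quality_of_life")
importance_noise = permutation_importance(toy_model, rows, labels, "noise")
print(importance_qol, importance_noise)
```

In this toy setup the variable the model ignores has an importance of zero, while shuffling the variable it relies on lowers accuracy.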
As the last step, the researchers performed the validation. This step is particularly important because it can help to identify overfitting, which occurs when a model has learned nuances of the training data but does not generalize to unseen data. Thus, validating a model in an independent dataset is important to avoid overly optimistic estimates of its real-world performance. The validation dataset consisted of a clinical cohort of 295 patients with type 2 diabetes from Tayside, Scotland, who were re-phenotyped for neuropathic pain and related traits so that they could be classified according to its presence and extent.2
In terms of area under the precision-recall curve, all the models performed very well; however, performance was markedly lower in the validation set than in the training set. Nonetheless, the Random Forest and the Adaptive Regression Splines models showed the smallest reduction in performance while still achieving a moderate positive correlation between predicted and observed outcomes.2
In summary, this study drew on one of the largest and most comprehensively phenotyped cohorts of people with diabetic neuropathy, and its machine learning models were able to accurately classify painful or painless neuropathy in an independent population-based dataset. Also, the presence of pain was strongly associated with poorer self-reported quality of life, younger age, poor glucose control, high body mass index and a number of psychological and personality factors.2
Finally, these models can potentially be used either in the clinical context to assist patient stratification based on the risk of developing painful neuropathy if proven to be valid in a prospective study, or in the form of an online calculator that can return broad risk categories based on user input.2
CAPSULE
The traditional complications of diabetes mellitus include stroke, coronary heart disease, heart failure, peripheral neuropathy, retinopathy, diabetic kidney disease, and peripheral vascular disease. Although the burden of disease associated with these traditional complications of diabetes mellitus still exists, the rates of these conditions are declining with improvements in the management of the condition.1
Instead, as people with diabetes live longer, they become susceptible to a different set of complications. For instance, vascular disease no longer accounts for most deaths among people with diabetes mellitus, as it did in previous years. Cancer is now the leading cause of death, especially gastrointestinal and female-specific cancers.1
Other emerging complications include infections, together with their own complications, notably postoperative infections and infections of the respiratory system, such as COVID-19, pneumonia, MERS, SARS and H1N1 influenza. Furthermore, conditions of the liver, in particular non-alcoholic fatty liver disease, non-alcoholic steatohepatitis and fibrosis, are also rising. Likewise, affective disorders are associated with diabetes, especially depression, anxiety and eating disorders. Finally, obstructive sleep apnea and some cognitive conditions, like dementia, are also being recognized as relevant emerging complications of diabetes mellitus.1
END OF CAPSULE
Welcome again. In the previous section, we talked about the application of artificial intelligence to the prediction of pain in patients with diabetic peripheral neuropathy. Now we will talk about how artificial intelligence can also be useful for detecting gestational diabetes.
Gestational diabetes mellitus is generally defined as “glucose intolerance of varying degrees of severity with onset or first recognition during pregnancy”. The risk of developing it is increased by overweight and obesity, and the condition itself can increase the risk of many other maternal and neonatal complications, such as gestational hypertension, polyhydramnios, Cesarean birth, premature delivery, neonatal macrosomia, hypoglycemia and respiratory distress. This condition can also predispose both mother and child to long-term sequelae, including metabolic syndrome and type 2 diabetes, which in turn increase the risk of chronic disease later in life.3
Fortunately, research has demonstrated that interventions initiated early in pregnancy can reduce the rate of gestational diabetes in pregnant women with overweight and obesity. However, applying interventions in every instance can be costly and time-consuming. Therefore, a clinical decision support system based on machine learning could give clinicians a powerful and objective computerized tool to identify women at risk of gestational diabetes, and it would greatly reduce time and cost by allowing targeted intervention.3
With this in mind, the group led by Dr. Catherine Mooney, from University College Dublin in Ireland, published a study in the journal Scientific Reports. They presented their advances in applying machine learning to develop a clinical decision support system that predicts the risk of gestational diabetes in a high-risk group of women with overweight and obesity. The ultimate goal was to identify those who may benefit from prevention strategies early in pregnancy.3
The researchers used a dataset of 484 women with a singleton pregnancy, a body mass index between 25 and 39.9 kg/m², and a gestational age between 10 and 15 weeks. Some of these women were later diagnosed with gestational diabetes at approximately 28 weeks of gestation. The variables used were collected at about 15 weeks of gestation and were considered the baseline features. These features were maternal anthropometry, demographic characteristics, family history, and blood biomarkers.3
Once the data were collected, the first step was preprocessing, which mainly consisted of dropping the features with more than 30 percent missing values and imputing the missing values in the variables that were kept. Participants of non-white ethnic origin were also set aside to form an independent cross-cultural-ethnic test set that would be used later in the study. The resulting dataset, composed of participants of white ethnic origin, was split into a training set with 75 percent of the participants and an independent test set with the remaining 25 percent.3
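The preprocessing steps just described can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the researchers' code: the episode does not say which imputation method was used, so the median is assumed here, and the function names, variable names and toy records are hypothetical.

```python
import random
from statistics import median

def drop_sparse_features(rows, max_missing=0.30):
    """Drop every feature with more than `max_missing` of its values missing (None)."""
    keep = [
        feature
        for feature in rows[0]
        if sum(row[feature] is None for row in rows) / len(rows) <= max_missing
    ]
    return [{feature: row[feature] for feature in keep} for row in rows]

def impute_median(rows):
    """Fill missing values with the feature's median (assumed method)."""
    filled = [dict(row) for row in rows]
    for feature in rows[0]:
        fill_value = median(row[feature] for row in rows if row[feature] is not None)
        for row in filled:
            if row[feature] is None:
                row[feature] = fill_value
    return filled

def split_train_test(rows, train_fraction=0.75, seed=0):
    """Shuffle the rows and split them into training and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical toy records, not study data.
weights_kg = [70, 82, None, 91, 77, 85, 88, 79]
markers = [1.2, None, None, 0.9, None, 1.1, None, 1.0]  # 50 percent missing
rows = [
    {"weight_kg": w, "maternal_age": 30 + i, "sparse_marker": m}
    for i, (w, m) in enumerate(zip(weights_kg, markers))
]

clean = impute_median(drop_sparse_features(rows))
train, test = split_train_test(clean)
print(len(train), len(test))
```

In this toy example the marker with half of its values missing is dropped, the single missing weight is filled in, and the eight records are split into six for training and two for testing.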
Next, the researchers developed three artificial intelligence models for different use cases. The first model, or model one, was designed to be theoretical and feature-agnostic, where all included features were considered candidates and acted as a baseline. In contrast, model two was designed to be more usable and to fit into clinical routine easily, and thus fasting blood biomarkers, like fasting glucose, insulin, C-peptide, and lipid profile were excluded. This is because pregnant women do not normally attend an antenatal visit fasted and fasting blood biomarkers are not routinely assessed. Finally, model three was designed to work in remote settings without a hospital visit, so all features that cannot be recalled or measured outside of a clinical setting were excluded.3
For each of these three models, the researchers adopted several machine learning algorithms, including logistic regression, random forest, support vector machine, adaptive boosting, and extreme gradient boosting for data modeling. Out of all these algorithms, the support vector machine achieved the highest accuracy for the three models and was selected over the others.3
The researchers found that with this algorithm, model one required five features for optimal results: family history of diabetes, weight, white cell count, fasting glucose and insulin. Model two also included five features. Three were the same as for model one, namely family history of diabetes, weight and white cell count, and gestational age and maternal age took the place of glucose and insulin. Finally, model three included four features: gestational age, maternal age, family history of diabetes, and weight.3
When these three models were tested on the independent dataset, model one outperformed models two and three. This can be explained by the exclusion of fasting blood biomarkers from the latter two models, and it indicates that fasting blood biomarkers, especially fasting glucose and insulin, are strong predictors of gestational diabetes.3
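The feature sets of the three models can be written down as a small data structure, which makes it easy to see what they share. The feature lists come from the description above; the identifier names are hypothetical labels chosen for this sketch.

```python
# Features selected for each model, as described in the episode.
MODEL_FEATURES = {
    "model_1": {"family_history_of_diabetes", "weight", "white_cell_count",
                "fasting_glucose", "insulin"},
    "model_2": {"family_history_of_diabetes", "weight", "white_cell_count",
                "gestational_age", "maternal_age"},
    "model_3": {"family_history_of_diabetes", "weight",
                "gestational_age", "maternal_age"},
}

# Features that every model relies on.
shared_by_all = set.intersection(*MODEL_FEATURES.values())

# Fasting biomarkers that only the baseline model uses.
only_in_model_1 = MODEL_FEATURES["model_1"] - MODEL_FEATURES["model_2"]

print(sorted(shared_by_all))
print(sorted(only_in_model_1))
```

Family history of diabetes and weight appear in all three models, while fasting glucose and insulin appear only in model one.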
Next, the researchers tested the three models on the independent cross-cultural-ethnic dataset and compared the results with those from the independent test set already used, which consisted of participants of white ethnic origin. They found that models two and three achieved high specificity but low sensitivity, although overall performance was similar in terms of area under the ROC curve and area under the precision-recall curve.3
In addition, the researchers implemented the developed system as a web server that allows users to submit their data and predicts the probability that the person will develop gestational diabetes, while also providing explanations to help users understand and trust the predictions.3
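Sensitivity is the proportion of women with gestational diabetes that a model correctly flags, and specificity is the proportion of women without it that the model correctly clears. The sketch below shows how a model can combine high specificity with low sensitivity. The function name and the toy labels are hypothetical and are not results from the study.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 is positive."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical toy data: 4 women with gestational diabetes, 6 without.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

sensitivity, specificity = sensitivity_specificity(y_true, y_pred)
print(sensitivity, round(specificity, 3))
```

Here the toy model misses three of the four true cases, so sensitivity is low, while it correctly clears five of the six women without the condition, so specificity is high.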
To better understand the relevance of this research, keep in mind that this is the first machine learning study that specifically targets pregnant women with overweight and obesity for gestational diabetes prediction. Also, an important characteristic is that the researchers carefully took the clinical usability into account in the modeling process, while most models in the literature included data from clinical tests that are not routinely performed in a clinical setting, such as fasting blood tests. Evidently, the inclusion of these data may lead to an increase in the model performance, but at the cost of usability because it makes the models difficult to translate into clinical use.3
Finally, this is also the first machine learning study to investigate potential ethnic or cultural differences in gestational diabetes prediction, since in previous works all or most of the participants were from one ethnic group.3
To conclude, we presented two studies that apply artificial intelligence to the management and detection of diabetes mellitus. In particular, we discussed its potential benefits for the accurate classification of painful or painless diabetic peripheral neuropathy and for the explainable detection of women at risk of gestational diabetes who need targeted intervention during pregnancy.2,3
Thanks for joining us on this episode of Health Connect. Don’t miss out on our next episode. Don't forget to subscribe to discover the latest medical news.
References:
- 1. Tomic D, Shaw JE, Magliano DJ. The burden and risks of emerging complications of diabetes mellitus. Nat Rev Endocrinol. 2022. Available at: https://doi.org/10.1038/s41574-022-00690-7
- 2. Baskozos G, Themistocleous AC, Hebert L, Pascal MMV, John J, Callaghan BC, et al. Classification of painful or painless diabetic peripheral neuropathy and identification of the most powerful predictors using machine learning models in large cross-sectional cohorts. BMC Med Inform Decis Mak. 2022;22:144. Available at: https://doi.org/10.1186/s12911-022-01890-x
- 3. Du Y, Rafferty AR, McAuliffe FM, Wei L, Mooney C. An explainable machine learning-based clinical decision support system for prediction of gestational diabetes mellitus. Sci Rep. 2022;12:1170. Available at: https://doi.org/10.1038/s41598-022-05112-2
Links to all third-party websites are offered as a service to our visitors and do not imply endorsement, indication or recommendation by Health Connect. Linked articles are provided for informational purposes only and are not intended to imply attribution by the author and/or publisher. Health Connect disclaims any responsibility for the content or services of other websites. We recommend that you review the policies and conditions of all websites you choose to access.
NON-2022-14714
NON-2023-2512