Artificial Intelligence in Gynecology: Three Promising Developments
Welcome to Health Connect, the podcast for health professionals where we share the latest news and information on science and technology in the medical field. Today we will discuss three studies that show how artificial intelligence is impacting the field of gynecology. The first addresses a patient-based screening tool for endometriosis; the second aims to predict progression-free survival in ovarian cancer patients; and the third explores a new decision-referral approach for a breast cancer screening algorithm that combines the strengths of the radiologist and artificial intelligence.
Did you know that endometriosis affects approximately 190 million women worldwide? In other words, this inflammatory condition occurs in up to 10 percent of women of reproductive age. Endometriosis, as we know, is characterized by the presence of endometrial-like tissue outside the uterus and presents with heterogeneous gynecological symptoms that have a major systemic impact and repercussions on patients' well-being.1
The differential diagnosis of endometriosis remains difficult, as many symptoms may overlap with those of other common conditions, such as irritable bowel syndrome and interstitial cystitis. Since no noninvasive screening tool currently exists to improve the diagnosis, multiple biomarkers, genomic analyses, questionnaires, and imaging techniques have been proposed as noninvasive screening and triage tests that could replace diagnostic laparoscopy. Yet, none of these tools has been routinely implemented in practice, as none has achieved clinically relevant accuracy so far.1
Furthermore, several problems have been reported with the other diagnostic tools currently available; for example, questionnaires used as triage tests contribute little to the diagnosis. On the other hand, clinical tests such as transvaginal ultrasound can facilitate the diagnosis of deep endometriosis or endometriomas, but are not always an acceptable option, especially for adolescent patients.1
MRI is another option that facilitates diagnosis. Both MRI and transvaginal ultrasound have been shown to be accurate for diagnosing rectal endometriosis and pouch of Douglas obliteration. However, imaging techniques are less accurate for lesions such as endometriosis in the utero-sacral ligament, which is the most frequent location of deep endometriosis, and for detecting peritoneal endometriosis, the earliest stage of the disease.1
In this setting, in 2022, the team of researchers led by Sofiane Bendifallah published their findings on the use of machine learning algorithms in the diagnosis and screening of endometriosis based on key clinical features and patient symptoms. According to the authors, this screening tool could replace the direct visualization of lesions through laparoscopic surgery.1
Let us look at the evidence presented. The training dataset used in this study was constructed from pseudonymized data collected from an open health platform that contained 8,000 records of patients with symptoms suggestive of endometriosis and 500 characteristics covering diagnosis, symptoms, imaging, medical treatment, fertility, surgical treatments, and follow-up.1
From these records, the researchers extracted 1,126 patients with a diagnosis of endometriosis based on the following criteria: prior treatment for endometriosis, clinical examination confirming deep endometriosis, or ultrasound or magnetic resonance imaging (MRI) detecting ovarian, peritoneal, or deep endometriosis. The authors also included 608 controls with at least one symptom suggestive of endometriosis, but without prior treatment or confirmatory diagnosis.1
Both groups showed significant differences in epidemiological characteristics, history of symptoms, and medical therapies. Taking into account suggestions from endometriosis experts among the authors, 16 clinical and symptomatic characteristics that significantly affected the prediction of endometriosis occurrence were selected. The general features considered were familial or personal history, demographic characteristics like age and body mass index, endometriosis phenotype, quality of life, and treatment. The patterns of absenteeism during the last six months were considered to measure the quality of life, while the number of non-hormonal pain treatments was considered to assess the treatment feature.1
In addition, the endometriosis phenotype features that were taken into account were the presence of dysmenorrhea, abdominal or low back pain outside of menstruation, pain suggestive of sciatica or pain during intercourse, painful defecation, urinary pain during menstruation, right shoulder pain near or during menstruation, and blood in stool or urine during menstruation.1
These 16 features were used to train machine learning, deep learning, and ensemble models to develop a screening questionnaire. The models were a random forest classifier, logistic regression, decision tree, eXtreme gradient boosting, and soft and hard voting classifiers.1
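To make the ensemble idea concrete, here is a minimal sketch in Python with scikit-learn of the model families named above. The data, feature count, and hyperparameters are illustrative stand-ins, not the study's actual dataset or configuration; eXtreme gradient boosting (the separate xgboost library) is omitted to keep the sketch self-contained.

```python
# Hypothetical sketch of soft and hard voting ensembles over base classifiers.
# Synthetic data stands in for the 16 clinical/symptomatic features.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Stand-in for 16 features (e.g., dysmenorrhea, absenteeism, BMI).
X, y = make_classification(n_samples=500, n_features=16, random_state=0)

base_models = [
    ("rf", RandomForestClassifier(random_state=0)),
    ("lr", LogisticRegression(max_iter=1000)),
    ("dt", DecisionTreeClassifier(random_state=0)),
]

# Soft voting averages predicted probabilities; hard voting takes a majority vote.
soft_vote = VotingClassifier(estimators=base_models, voting="soft")
hard_vote = VotingClassifier(estimators=base_models, voting="hard")

soft_vote.fit(X, y)
print(soft_vote.predict(X[:5]))
```

The soft variant tends to perform better when the base models produce well-calibrated probabilities, which may explain why the soft voting classifier ranked among the study's most accurate methods.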
Classification metrics from the training set revealed that the machine learning models had a sensitivity ranging from 0.82 to 1.0, a specificity of 0 to 0.8, and an F1 score of 0 to 0.88 in the diagnosis of endometriosis. Next, 100 patients from a prospective cohort study with a surgical diagnosis were used for the validation set; their phenotypic profile differed significantly from that of the patients in the training set, which is particularly relevant for external validation, as it supports the reproducibility and accuracy of the model.1
For the 16 selected features, a sensitivity ranging from 0.91 to 0.95, a specificity of 0.66 to 0.92, and an F1 score ranging from 0.77 to 0.92 were achieved. Among the models used, the soft voting classifier, random forest, and eXtreme gradient boosting were the most accurate methods, with a sensitivity ranging from 95 to 98 percent and a specificity of around 80 percent.1
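For listeners less familiar with these metrics, they all derive from a confusion matrix of true/false positives and negatives. A minimal illustration, with invented counts chosen to land near the figures quoted above:

```python
# Illustrative computation of screening metrics; the counts below are
# invented for demonstration and are not data from the study.
tp, fn = 95, 5    # endometriosis cases: correctly / incorrectly classified
tn, fp = 80, 20   # controls: correctly / incorrectly classified

sensitivity = tp / (tp + fn)          # recall: share of true cases detected
specificity = tn / (tn + fp)          # share of controls correctly ruled out
precision = tp / (tp + fp)
f1 = 2 * precision * sensitivity / (precision + sensitivity)

print(sensitivity, specificity, round(f1, 3))  # 0.95 0.8 0.884
```

Note the trade-off this makes visible: a screening tool can buy high sensitivity (few missed cases) at the cost of specificity (more false positives), which is why the clinical-utility criteria discussed next set thresholds for both.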
According to the requirements proposed by Nisenblat and colleagues in 2016, a noninvasive test that could replace diagnostic laparoscopy is considered "clinically useful" if it achieves a sensitivity of at least 0.94 and a specificity of at least 0.79. Under these criteria, the screening tool presented in this study by Bendifallah and team could be proposed as a screening tool of clinical value to improve the endometriosis patient care pathway.1
It is important to emphasize that, in addition to potentially being a promising screening test for general practitioners, gynecologists, and other frontline healthcare providers, this tool can also be used by the patient. Thus, it can help patients with endometriosis to identify potential symptoms and initiate dialogue with their doctor, increasing health literacy and promoting proactive health-seeking behavior.1
This is pertinent when considering management guidelines offered by bodies such as the National Health Service (NHS) in England, which recommends supporting patients to be involved in decisions about their care, giving them choice and control. Such strategies can improve patient experience and adherence to treatment and medication.1
CAPSULE
In patients with endometriosis, the lack of a diagnosis can lead to undertreatment, continued pain, and prolonged symptom impact with an impaired quality of life. In 2016, Greene and colleagues found that the lag between the onset of endometriosis symptoms and seeking medical care was 4.6 years, and the time from seeking medical care to diagnosis was another 4.7 years. Similarly, a study led by Ballweg in 2004 found that among patients with symptoms suggestive of endometriosis, 61 percent had been told by their healthcare professionals that no problem was found, which contributed to a delay in diagnosis.1
END OF CAPSULE
Welcome back. In the previous segment, we looked at an example of how artificial intelligence can be a crucial tool in providing objective data to improve diagnosis and awareness of endometriosis among healthcare professionals, as well as shared decision-making with the patient.1 Now, we will examine an example of how machine learning can help advance precision medicine in gynecologic oncology, leading to better patient profiling and personalized treatment.2
Published in 2022, the research by Arezzo and colleagues proposes a tool for predicting 12-month progression-free survival in ovarian cancer patients based on a machine learning algorithm applied to gynecologic ultrasound evaluation. This modality for the diagnosis and evaluation of ovarian cancer offers the advantages of being simple, non-invasive, and inexpensive.2
There is a current unmet clinical need for tools that enable accurate and early screening for ovarian cancer, particularly for identifying high-risk patients; hence, a machine learning method was proposed to fulfill those needs.2
Currently, ovarian cancer is the seventh most commonly diagnosed cancer among women worldwide and the second most common gynecological malignancy. However, the absence of adequate screening and diagnostic procedures to detect it at an early stage, as well as the rapidity with which the disease spreads through the peritoneal surface, are important factors in its lethality.2
First, in a retrospective observational study, the authors analyzed a database of 64 consecutive patients diagnosed with epithelial ovarian cancer, which accounts for up to 90% of ovarian cancers. The authors collected demographic features, clinical characteristics, surgical and post-surgical history, histopathology, as well as data about transvaginal and/or transabdominal ultrasound examinations according to the classification proposed by the International Ovarian Tumor Analysis.2
A selection was then made from among these features to determine the set of attributes that would train the machine learning algorithms, including age, parity, menopause, CA-125 value, histotype, FIGO stage (an acronym corresponding to the International Federation of Gynecology and Obstetrics), as well as various ultrasound features such as main lesion diameter, side, echogenicity, color score, main solid component diameter, and presence of carcinosis.2
It is pertinent to mention that in this data set, the mean age of the patients was 54.1 plus-minus 14.9 years at diagnosis, and 43.7 percent of the participants were menopausal. The CA-125 test median value was 828.25 units per milliliter. Four out of the 64 participants had a BRCA1 mutation, another four had a BRCA2 mutation, and two had a BRIP1 mutation.2
In addition to this, the analysis of the ultrasound data showed that 53.1 percent of the participants had a unilateral mass, and the greatest diameter was 113.6 plus-minus 57.6 millimeters. The most common tumor type was multilocular-solid, and the diameter of the largest solid component was 71.1 plus-minus 45.1 millimeters. The most common echogenicity of the cyst fluid was anechoic. Most of these tumors showed intense vascularity on color Doppler examination. On the original ultrasonographic examination, 87.5 percent of the masses were classified as malignant. Ultrasonographic evaluation revealed ascites in 28.1 percent and carcinosis in 31.2 percent, while acoustic shadows were reported in only 12.5 percent.2
Moreover, the histopathologic analysis showed histotypes mostly of high-grade serous carcinoma, and 40.6 percent of the tumors were FIGO stage III, followed by FIGO stage I in 34.4 percent. Finally, 46 of the 64 participants achieved a progression-free survival of 12 months.2
With this selection of features, three machine learning algorithms were trained and validated, these being logistic regression, random forest, and K-nearest neighbors, with five-fold cross-validation.2
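The evaluation scheme just described can be sketched in a few lines of Python with scikit-learn. The data below are synthetic stand-ins for the study's clinical and ultrasound features; the sample size mirrors the 64-patient cohort, but the feature count and scoring choice are assumptions for illustration.

```python
# Sketch of comparing three classifiers with five-fold cross-validation.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the 64-patient cohort and its selected attributes.
X, y = make_classification(n_samples=64, n_features=12, random_state=0)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(random_state=0),
    "k_nearest_neighbors": KNeighborsClassifier(),
}

# Each model is trained and scored on five train/validation splits,
# so every patient appears in a validation fold exactly once.
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```

Cross-validation matters especially here: with only 64 patients, a single train/test split would make the reported accuracy highly dependent on which patients happened to land in the test set.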
The random forest algorithm showed the best performance, with an accuracy of 93.7 percent, precision of 90 percent, recall of 90 percent, and an area under the receiver operating characteristic curve of 0.92.2
While the major weakness of this study is the small size of the sample population, random forest algorithms have shown robust performance with similar or smaller sample sizes. Moreover, because this model is based on a few attributes that are easy to collect in the clinical setting, it could be a promising way to obtain personalized follow-up and patient stratification based on the predicted progression-free survival. This could help clinicians intensify the prescription of instrumental tests in high-risk patients and reduce it for low-risk patients.2
Finally, we will discuss the study led by Leibig and Brehmer, published in 2022, which proposes a model of triage and breast cancer detection with a collaborative approach between artificial intelligence and the radiologist, in contrast to the predominant models in the field that aim to replace the human oversight.3
In a review of the literature, the researchers found that while many algorithms developed for triage can be very effective in reducing workload by focusing on negative predictions to allow radiologists more time to review severe cases, paradoxically, those algorithms may decrease detection sensitivity.3
Other approaches focus on fully automating the identification of suspicious lesions. However, since the prevalence of cancers is often low in screening settings, a high number of false positives is generated. This, of course, requires additional resources for consensus review and diagnostic testing, increasing the radiologists' workload and distracting attention from the true cancers.3
Moreover, a systematic literature review published in 2021 found that, of 35 included studies, 94 percent of artificial intelligence systems for breast cancer screening were less accurate than a single radiologist. On the other hand, the few studies that reported higher accuracy of the independent system had a high risk of bias and low generalizability to the clinical setting.3
Therefore, to develop this decision-referral approach for integrating artificial intelligence into the breast-cancer screening pathway, the researchers used a retrospective dataset consisting of full field mammography images of 4,463 screen-detected cancers and 100,055 follow-up-proven normal studies of asymptomatic women who participated in a German screening program. Population age ranged from 50 to 70 years, and more than 80 percent were assigned a B or C breast density, according to the categories proposed by the American College of Radiology.3
From this cohort, an internal test dataset of 1,670 screen-detected cancers and 19,997 normal mammography exams were derived, as well as an external test dataset of 2,793 screen-detected cancers and 80,058 normal exams.3
These data sets were used to evaluate a deep convolutional neural network-based algorithm that had been trained with labeled mammography images following annotations of radiologic findings and related biopsy information.3
The performance of various configurations of this algorithm was evaluated, either as a stand-alone system for analyzing mammograms, or in a decision-referral approach that leverages the strengths of both the radiologist and the algorithm. Thus, the authors compared the sensitivity and specificity of the algorithm with those of the radiologist's original decisions at the point of screening.3
To explore this decision-referral approach, a scenario was created where the artificial intelligence system classified whether a study was normal or suspicious for cancer, qualifying its level of confidence in the assessment it had given. Both suspicious studies and studies where the algorithm reported low confidence were referred to the radiologist, with no indication of the artificial intelligence's rating, to avoid potentially misleading bias. In addition, a safety net for the prediction of cancer-positive exams was integrated to maintain a high degree of sensitivity for cancer detection, serving as post-hoc decision support to the radiologist.3
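The routing logic just described can be sketched as a simple confidence-based rule. This is a hedged illustration only: the threshold values, the score scale, and the function name are invented here, not the study's calibrated operating points.

```python
# Illustrative decision-referral routing: the model handles exams it is
# confident about and defers uncertain ones to the radiologist.
# Thresholds below are hypothetical, not the study's operating points.
NORMAL_THRESHOLD = 0.05      # below this score: confidently normal
SUSPICIOUS_THRESHOLD = 0.90  # above this score: confidently suspicious

def route_exam(cancer_score: float) -> str:
    """Route one mammography exam based on the model's cancer score."""
    if cancer_score < NORMAL_THRESHOLD:
        return "auto-normal"            # triaged by the algorithm alone
    if cancer_score > SUSPICIOUS_THRESHOLD:
        return "radiologist+safety-net" # referred, with post-hoc decision support
    return "radiologist"                # low confidence: referred unaided

print([route_exam(s) for s in (0.01, 0.5, 0.95)])
# -> ['auto-normal', 'radiologist', 'radiologist+safety-net']
```

The key design choice is that the middle band, where the model is least certain, goes to the radiologist with no indication of the model's rating, which is what preserves an unbiased human read for the hardest cases.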
The published results show that the exemplary configuration of the artificial intelligence system in the stand-alone model was significantly less accurate than the average unaided radiologist on both test data sets. For example, for the operating point that maintained radiologist sensitivity on the validation data set, the standalone system achieved a sensitivity of 84.2 percent and specificity of 89.5 percent for the internal test data set, compared with radiologist sensitivity of 85.7 percent and specificity of 93.4 percent.3
In contrast, the decision-referral approach outperformed the unaided radiologist in both sensitivity and specificity. This approach uses both the exemplary system configuration and the radiologist surveillance, reaching a sensitivity of 89.7 percent and a specificity of 93.8 percent on the internal test data set, representing an improvement in radiologist performance of four percentage points in sensitivity and 0.5 percentage points in specificity. In this scenario, the system automatically classified 60.7 percent of the studies.3
Likewise, on the external test data set there was a similar improvement, with an increase of 2.6 percentage points in sensitivity and one percentage point in specificity, corresponding to a triage yield of 63 percent. This indicates that the safety net managed to detect cancers missed by the first reader (the algorithm) that were only detected by the second reader, the radiologist.3
Importantly, this decision referral model improved the radiologist's ability to detect in-situ and invasive malignant carcinomas, as well as the sensitivity in the different subgroups stratified for masses, calcifications, lesion size, and breast density. In the latter subgroup, the model showed a significantly higher sensitivity for classifications B and C, which represented, as mentioned earlier, the vast majority of the screened patients.3
Finally, the sensitivity of the decision-referral approach was consistent across the eight screening centers included in the study and across three different mammography device manufacturers, demonstrating that this hybrid system is adaptable to heterogeneous screening requirements.3
The realistic algorithm configuration with a decision-referral approach improved radiologists' screening accuracy even in clinically relevant subgroups, allowing workload reduction without removing final human oversight. Thus, this promising study could help enable the safe clinical adoption of artificial intelligence systems for breast cancer screening.3
Thanks for joining us on this episode of Health Connect. Don't miss our next episode, where we will discuss more artificial intelligence developments in other medical fields. Subscribe to our channel to discover the latest medical news.
References:
- 1. Bendifallah S, Puchar A, Suisse S, Delbos L, Poilblanc M, Descamps P, et al. Machine learning algorithms as new screening approach for patients with endometriosis. Sci Rep. 2022;12(1):639. Available at: https://www.nature.com/articles/s41598-021-04637-2
- 2. Arezzo F, Cormio G, La Forgia D, Santarsiero CM, Mongelli M, Lombardi C, et al. A machine learning approach applied to gynecological ultrasound to predict progression-free survival in ovarian cancer patients. Arch Gynecol Obstet. 2022; doi: 10.1007/s00404-022-06578-1. Epub ahead of print. Available at: https://link.springer.com/article/10.1007/s00404-022-06578-1
- 3. Leibig C, Brehmer M, Bunk S, Byng D, Pinker K, Umutlu L. Combining the strengths of radiologists and AI for breast cancer screening: a retrospective analysis. Lancet Digit Health. 2022;4(7):e507-e519. Available at: https://www.thelancet.com/journals/landig/article/PIIS2589-7500(22)00070-X
Links to all third-party sites are offered as a service to our visitors and do not imply endorsement, referral, or recommendation by Health Connect. The linked articles are provided for informational purposes only and are not intended to imply attribution by the author and/or publisher. Health Connect disclaims any responsibility for the content or services of other sites. We recommend that you review the policies and conditions of all sites you choose to access.
NON-2022-15147
NON-2023-2512