Artificial intelligence and fundus eye diseases
Welcome to Health Connect, the podcast for health professionals where we will share the latest news and information on science and technology in the medical industry. In this episode we will talk about how artificial intelligence can help to better diagnose diseases of the ocular fundus, such as glaucoma.
Glaucoma is a leading cause of irreversible blindness in our aging society, with a projected 112 million patients by 2040.2
Today we will present how artificial intelligence can be applied to ophthalmology, particularly in the detection and classification of diseases of the ocular fundus. We will discuss two studies: the first focuses on the diagnosis of up to 39 fundus diseases and conditions, while the second focuses specifically on glaucoma.
As you know, millions of people in the world are affected by ocular fundus diseases like glaucoma, diabetic retinopathy, age-related macular degeneration, retinal vein and artery occlusion, retinal detachment, and fundus tumors. Among these, diabetic retinopathy, macular degeneration, and glaucoma are the most common causes of vision impairment in most populations. More importantly, without accurate diagnosis and timely, appropriate treatment, these fundus diseases can lead to irreversible blurred vision, metamorphopsia, visual field defects, or even blindness.1
Fundus photographs that allow basic detection of these diseases have recently become available and affordable in most parts of the world; they can be taken by non-professional personnel and delivered online to major ophthalmic institutions for follow-up. At the same time, deep learning algorithms for artificial intelligence-assisted diagnosis have already been applied to screen for diseases like diabetic retinopathy, macular degeneration, glaucoma, and papilledema. Even so, these assisted diagnosis systems mostly focus on the detection of a single retinal disease. In clinical practice, screening with a single-disease diagnostic algorithm, such as one trained only for diabetic retinopathy, would fail to recognize other fundus diseases. This is a problem because, in real life, the capability to efficiently detect various types of fundus diseases is needed, especially in remote areas that lack specialized ophthalmologists.1
For this reason, a multi-disease detection system based on fundus images could avoid missed diagnoses and, consequently, delayed treatment. Such an idea was pursued by Ling-Ping Cen, Jie Ji and colleagues, from the Chinese University of Hong Kong and the Information Centre in Guangdong, China. This research team aimed to develop an automatic multi-disease detection platform, based on convolutional neural networks, that can classify 39 types of common fundus diseases and conditions from color fundus images.1
The purpose of the platform was to predict the probability of each disease for every image and to display heatmaps providing explainability in real time. To do so, the authors collected approximately 250 000 fundus images from 7 multi-ethnic datasets from different places, including China and the United States. These images were used to train, validate and test the deep learning platform.
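The episode does not describe how these heatmaps are generated; a common technique for explainability of this kind is gradient-weighted class activation mapping (Grad-CAM). Here is a minimal sketch in Python with PyTorch, using a generic ResNet purely as an assumed stand-in for the platform's actual, unpublished architecture:

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Generic ResNet stand-in; the study's real backbone and weights are not
# reproduced here, so this only illustrates the heatmap mechanism.
model = models.resnet18(weights=None)
model.eval()

feats, grads = {}, {}
model.layer4.register_forward_hook(lambda m, i, o: feats.update(value=o))
model.layer4.register_full_backward_hook(
    lambda m, gi, go: grads.update(value=go[0]))

def gradcam(image, class_idx):
    """Heatmap of the image regions that drove the score for class_idx."""
    logits = model(image)                   # image: (1, 3, H, W)
    model.zero_grad()
    logits[0, class_idx].backward()
    w = grads["value"].mean(dim=(2, 3), keepdim=True)  # pool gradients
    cam = F.relu((w * feats["value"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear",
                        align_corners=False)
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)

x = torch.randn(1, 3, 224, 224)             # stand-in fundus image
heatmap = gradcam(x, class_idx=0)
print(heatmap.shape)                        # (1, 1, 224, 224)
```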
More specifically, the training was done with about 130 000 images, and the validation was done with 129 350 images that the algorithm had not seen during training. These images were fed into a model based on a two-level hierarchical system for the classification of 39 types of diseases and conditions. This two-level hierarchy means that the model first classifies the fundus images into 30 so-called “bigclasses”; images falling into some bigclasses are then further cropped and classified into subclasses. For instance, optic nerve degeneration could be further classified as possible glaucoma or optic atrophy.1
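To make the two-level logic concrete, here is a minimal sketch of the control flow, assuming hypothetical placeholder predictors (the study's actual label set, cropping, and models are not reproduced):

```python
# Bigclasses whose images are cropped and passed to a second-level model;
# the example below follows the optic nerve degeneration case in the text.
SUBCLASSED = {
    "optic nerve degeneration": ["possible glaucoma", "optic atrophy"],
}

def predict_bigclass(image):
    """First-level model: one of the 30 bigclasses (placeholder)."""
    return "optic nerve degeneration", 0.91        # label, probability

def predict_subclass(cropped_image, bigclass):
    """Second-level model for the given bigclass (placeholder)."""
    return SUBCLASSED[bigclass][0], 0.84

def classify(image):
    bigclass, p_big = predict_bigclass(image)
    if bigclass in SUBCLASSED:
        crop = image   # in the study, the relevant region is cropped first
        subclass, p_sub = predict_subclass(crop, bigclass)
        return {"bigclass": (bigclass, p_big), "subclass": (subclass, p_sub)}
    return {"bigclass": (bigclass, p_big)}

print(classify(object()))
```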
The results of the model tested on this initial dataset for the detection of the 30 bigclasses showed a sensitivity of 0.978, a specificity of 0.996, and an area under the curve of 0.9984. Furthermore, the highest F1 scores, the most suitable measurement for evaluating algorithm performance here, were obtained for diseases with obvious features, such as retinal vein occlusion with 0.983, maculopathy with 0.965, silicone oil in the eye with 0.964, and laser spots with 0.967.1
By contrast, the lowest scores were obtained for diseases with ambiguous features, such as posterior serous exudative retinal detachment with 0.829, optic nerve degeneration with 0.852, severe hypertensive retinopathy with 0.829, chorioretinal atrophy or coloboma with 0.861, and preretinal hemorrhage with 0.766. Even for these diseases, sensitivity and specificity remained above 0.942 and 0.979, respectively.1
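For listeners less familiar with these metrics, this is how sensitivity, specificity, and F1 relate to a per-disease (one-vs-rest) confusion matrix; the counts below are invented for illustration:

```python
def metrics(tp, fp, fn, tn):
    sensitivity = tp / (tp + fn)        # recall: diseased eyes caught
    specificity = tn / (tn + fp)        # healthy eyes correctly cleared
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, specificity, f1

sens, spec, f1 = metrics(tp=983, fp=20, fn=15, tn=9000)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} F1={f1:.3f}")
```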
Afterwards, the authors also wanted to evaluate generalization in the detection of multiple diseases and conditions, so the platform was further tested with three heterogeneous datasets collected from hospitals with patients of different ethnicities. The results revealed an overall F1 score of 0.920, and for each disease, sensitivity and specificity were above 0.855 and 0.982, respectively.1
Beyond the multi-ethnic datasets, the authors also evaluated the generalization capabilities of their platform on four other datasets established for single diseases. They tested the trained model on two diabetic retinopathy datasets, Messidor-2 and IDRiD. On the first, they achieved an F1 score of 0.944, while on the second the performance was weaker, with a score of 0.875. Given that lower performance, the authors reviewed the misjudged cases and found that the main cause of false positives was stains on dirty lenses that resembled hemorrhage spots.1
On another test dataset called PALM, containing pathological myopia images, a higher performance was achieved, with an F1 score of 0.974. Finally, on the fourth dataset, called REFUGE, which contains images of optic nerve degeneration, an indicator of possible glaucoma, the performance was moderate, with an F1 score of 0.651. It is worth mentioning that the labels of all images in this dataset were initially confirmed by multiple examinations, including intraocular pressure, optical coherence tomography, and visual field testing. However, in early-stage glaucoma, fundus images show almost no noticeable changes, even when optical coherence tomography can already detect retinal nerve fiber layer thinning. These cases were therefore missed by the platform, which was developed on fundus images only.1
All these results show that, even without specific optimization for the single-disease datasets, the overall performance of the platform was acceptable, indicating generalization capabilities for detecting fundus diseases in heterogeneous images.1
At this point, the authors also wanted to compare the performance of their platform with that of five retinal specialists with more than 10 years of clinical experience. They used 922 images that the platform had not seen before and that included various challenging diseases and conditions. The five retinal specialists were asked to complete the whole test independently, just like the artificial intelligence platform, by selecting class labels for each image without patient information; this was followed by an additional test on a dataset already used in previous tests, but this time with patient information.1
On the whole test without patient information, the retina specialists achieved an F1 score of 0.954, while the artificial intelligence platform obtained 0.964. In other words, the artificial intelligence model was more sensitive than the human experts in detecting multiple diseases.1
For the dataset with patient information, the platform scored 0.961, while the specialists reached 0.960, higher than their score without patient information. From this we can conclude that the performance of the platform was comparable to that of retina specialists with more than 10 years of clinical experience.1
Finally, the authors also wanted to verify the automatic detection efficiency of their platform for fundus diseases in a real-life setting using tele-reading applications in seven primary hospitals located in different parts of China. In this case, all images were classified into the 30 bigclasses, regardless of the image quality score.1
Not surprisingly, the images with low quality scores were mainly classified as blur fundus. Of the 7529 images, from 5159 eyes of 3610 subjects, uploaded by the seven hospitals, 1362 were detected as “blur fundus” with probabilities of 95 percent or higher, and were therefore automatically sent back so the photograph could be repeated.1
In addition, 2105 images were detected as non-referable, meaning the patients had either normal fundus, tessellated fundus, mild non-proliferative diabetic retinopathy, or a large optic cup and thus did not need immediate referral to an ophthalmologist. Consequently, these non-referable images were also sent back directly to their primary hospitals with a recommendation of “follow-up”.1
The remaining roughly 4 000 images were referable and were further checked by the retina specialists. Interestingly, the specialists reclassified 66 of these subjects as non-referable, including four patients with yellow-white spots, one patient with chorioretinal atrophy, and 6 subjects with moderate non-proliferative diabetic retinopathy showing only a few hemorrhage spots; none of these patients required urgent referral.1
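The tele-reading workflow just described amounts to a simple triage rule set. A sketch of that routing logic, where the threshold and the non-referable label set follow the text but the function name and labels-as-strings are our own illustration:

```python
NON_REFERABLE = {
    "normal fundus", "tessellated fundus",
    "mild non-proliferative diabetic retinopathy", "large optic cup",
}

def triage(label, probability):
    """Route one uploaded fundus image per the rules described above."""
    if label == "blur fundus" and probability >= 0.95:
        return "retake photograph"            # sent back to repeat the photo
    if label in NON_REFERABLE:
        return "follow-up at primary hospital"
    return "refer to retina specialist"

print(triage("blur fundus", 0.97))
print(triage("large optic cup", 0.88))
print(triage("retinal vein occlusion", 0.90))
```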
Importantly, the deep learning platform achieved a sensitivity of 0.994 and a specificity of 0.952 for the detection of referable diseases, demonstrating its high efficiency for classifying fundus diseases in the primary hospitals. However, there were 105 referable images that could not be categorized into any of the 39 diseases and conditions because they showed rare conditions or unclear ophthalmic features.1
In summary, as you can see, this study is very relevant because it is the first report to show that almost all common types of fundus diseases can be detected by deep learning algorithms in retinal images with an accuracy level comparable to that of retina specialists.1
CAPSULE
Glaucoma is a chronic optic neuropathy that causes structural damage to the optic nerve fibers, with visible changes in and around the optic disc, and ultimately leads to functional vision loss. Glaucoma is associated with characteristic changes of the optic nerve head, also called the optic disc, which ophthalmologists evaluate during clinical examination and optic disc photo analysis, looking for typical changes such as generalized or focal neuroretinal rim thinning.2
Neuroretinal rim thinning can be quantified in fundus photos by measuring the vertical cup-to-disc ratio, or VCDR. The optic cup is the distinguishable excavation in the central portion of the optic nerve head and it is typically small in normal eyes but increases with neuroretinal rim loss. Therefore, an elevated VCDR is considered suspicious for glaucoma.2
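As a worked example of this ratio, with invented measurements taken along the vertical meridian of a fundus photo:

```python
# Vertical cup-to-disc ratio (VCDR); the diameters below are illustrative.
cup_vertical_diameter_mm = 0.9
disc_vertical_diameter_mm = 1.8

vcdr = cup_vertical_diameter_mm / disc_vertical_diameter_mm
print(f"VCDR = {vcdr:.2f}")   # 0.50; higher ratios are considered suspicious

# As the neuroretinal rim is lost, the cup enlarges while the disc stays
# the same size, so the VCDR rises.
enlarged_cup_mm = 1.4
print(f"VCDR after rim loss = {enlarged_cup_mm / disc_vertical_diameter_mm:.2f}")
```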
END OF CAPSULE
Welcome again. In the previous section, we talked about the diagnosis of different fundus diseases using artificial intelligence. Now we will talk about how some other features of the fundus, apart from the optic nerve head, can be useful for the diagnosis of glaucoma.
To give a bit of context, let’s remember that glaucoma is usually underdiagnosed, and deep learning models on fundus images have already proven successful in overcoming this problem while keeping the false positive rate limited. These models have even reached a sensitivity of 97.6 percent and a specificity of 85 percent, but those results came at the cost of limited insight into the model’s decision process. This decision-making transparency, also called explainability of the convolutional neural network, is crucial to build trust for the future use of deep learning in medical diagnosis.2
For instance, several previous studies have attempted to explain a deep learning model’s decisions in glaucoma classification from fundus images by highlighting relevant regions within the optic nerve head. However, it is currently unknown to what extent the information that color fundus images provide outside the optic nerve head, especially in the peripapillary area, is relevant for glaucoma diagnosis.2
Clinicians tend to focus mainly on the optic disc when diagnosing glaucoma, but retinal nerve fiber layer defects adjacent to the optic nerve head are also known as typical indicators of glaucomatous damage. For the evaluation of these defects, red-free fundus images centered on the papillo-macular area are typically used for optimal visualization, yet these defects only become clinically detectable after roughly 50 percent of the nerve fiber layer has been lost. Deep learning models could therefore leverage subtle changes, like early retinal nerve fiber layer thinning, that human experts cannot detect.2
With this in mind, Ruben Hemelings and colleagues, from the Flemish Institute for Technological Research in Belgium, published a study in the journal Scientific Reports that aimed to analyze the importance of the regions beyond the optic nerve head and to provide objective explainability in the context of glaucoma detection and VCDR estimation.2
To achieve this, they used approximately 24 000 color fundus images of both glaucomatous and healthy eyes, obtained from 6486 individuals. Circular areas of increasing size, centered on the optic disc, were masked out of the images: starting at 10 percent, which removes only part of the optic nerve information, up to 60 percent of the fundus covered, in which the complete optic disc and a large peripapillary area are missing.2
These images were fed to the model to predict the VCDR, a measurement used to detect glaucoma. The largest drop in performance was observed between 20 and 30 percent, the range corresponding to the optic nerve head border. Interestingly, with an extreme circular crop of 60 percent of the image diameter, covering both the optic nerve head and a large peripapillary area, the model still explained 37 percent of the test variance.2
Then, the authors inverted the previous crop policy, giving the convolutional neural network access to an increasing amount of the optic nerve and periphery, from 20 percent onwards. They found that a setup using 30 percent of the image radius, and therefore a fully visible optic nerve head, obtained results as high as a setup with the complete image. Thus, they demonstrated that the convolutional neural network model only requires the intact optic nerve head to estimate the VCDR accurately.2
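A minimal sketch of the two masking policies, assuming a disc-centered image and using NumPy; the 30 percent fraction mirrors the text, while the image size, disc location, and function name are illustrative:

```python
import numpy as np

def circular_mask(h, w, center, radius_fraction):
    """Boolean mask, True inside a circle whose radius is the given
    fraction of the image width."""
    yy, xx = np.ogrid[:h, :w]
    cy, cx = center
    radius = radius_fraction * w
    return (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2

image = np.random.rand(512, 512, 3)      # stand-in fundus photo
disc_center = (256, 256)                 # assume the disc is centered

inside = circular_mask(512, 512, disc_center, radius_fraction=0.30)

cropped = image.copy()
cropped[inside] = 0.0                    # policy 1: hide the ONH region

inverse = image.copy()
inverse[~inside] = 0.0                   # policy 2: keep only the center

print(cropped.shape, inverse.shape)
```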
Next, the researchers used this model to classify glaucoma and non-glaucoma images. For this, they used 13 551 images, and testing the model yielded an area under the curve of 0.94. When the optic nerve head region was cropped out, the performance remained similar up to 20 percent. More interestingly, there was no statistically significant difference in performance for crops from 30 to 60 percent, meaning the model still performed well with 60 percent of the image radius covered.2
On the other hand, testing on the inversely cropped dataset with extreme coverage from the periphery, with only one percent of the fundus image visible, still achieved a significant glaucoma classification, with an area under the curve of 0.67. From 20 percent onwards, the area under the curve overlapped with that of the other cropping policy.2
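The area-under-the-curve figures quoted here come from standard ROC analysis of the model's glaucoma scores. A minimal sketch with scikit-learn, using synthetic labels and scores purely for illustration:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)        # 0 = healthy, 1 = glaucoma
# Scores loosely correlated with the labels, for illustration only.
y_score = 0.6 * y_true + 0.4 * rng.random(1000)

print(f"AUC = {roc_auc_score(y_true, y_score):.2f}")
```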
From these results, the most striking observation is that significant performance in glaucoma detection and VCDR estimation can be achieved without access to the optic nerve head. This is relevant because it answers the clinical question of whether significant glaucomatous features are present outside the optic disc in fundus images, even when there are no visible localized defects in the retinal nerve fiber layer. Both clinicians and automated screening software can therefore focus on the peripapillary area in eyes with conditions that hamper neuroretinal rim assessment.2
In conclusion, these two studies show how retinal fundus images can be used to detect up to 39 types of fundus diseases and conditions,1 how VCDR estimation can support the detection of glaucoma, and how both the optic nerve head and the peripapillary region carry significant diagnostic pixel information.2
These technologies can reliably and efficiently screen for one or several diseases across the whole spectrum of common fundus diseases and conditions, especially in remote areas around the world that lack specialized ophthalmologists.1
Thanks for joining us on this episode of Health Connect. Don’t miss out on our next episode. Discover more medical news and content on Viatris Connect.
References:
- 1. Cen LP, Ji J, Lin JW, Ju ST, Lin HJ, Li TP, et al. Automatic detection of 39 fundus diseases and conditions in retinal photographs using deep neural networks. Nat Commun. [Internet]. 2021. (Accessed on June 08, 2022);12:4828. Available at: https://doi.org/10.1038/s41467-021-25138-w
- 2. Hemelings R, Elen B, Barbosa-Breda J, Blaschko MB, De Boever P, Stalmans I. Deep learning on fundus images detects glaucoma beyond the optic disc. Sci Rep. [Internet]. 2021. (Accessed on June 08, 2022);11:20313. Available at: https://doi.org/10.1038/s41598-021-99605-1
Links to all third-party sites are offered as a service to our visitors and do not imply endorsement, indication, or recommendation by Health Connect. The linked articles are provided for informational purposes only and are not intended to imply attribution by the author and/or publisher. Health Connect disclaims any responsibility for the content or services of other sites. We recommend that you review the policies and conditions of all sites you choose to access.
NON-2022-13125
NON-2023-2512