Interpretation of chest X-rays and lung CT scan images using AI
Welcome to Health Connect, the podcast for health professionals where we will share the latest news and information on science and technology in the medical industry. In this episode we will talk about how artificial intelligence can be applied to imaging techniques in relation to the respiratory system.
Did you know that the clinical application of artificial intelligence to functional lung imaging is currently more an evolving opportunity than a tested reality?1
Today we will present how artificial intelligence can help in the interpretation of images, in particular CT scans and X-rays to better diagnose respiratory conditions.
Although artificial intelligence is still a nascent field in many healthcare domains, initial applications and proof-of-concept studies have shown promising and impactful results in diagnosing different disease conditions using only raw data sources like diagnostic imaging.1
For a functional imaging modality, the structural components of the lung need to be defined, such as lung field, lobar compartments, fissures, and the bronchovascular tree, to locate and quantitate image-based data.1
The quantitative analysis of lung tomographies, also called quantitative CT scan analysis, has been used extensively for more than 20 years and has helped to clarify how lung densities are distributed in different diseases, such as acute respiratory distress syndrome, commonly known as ARDS. ARDS can be detected by determining the change in the non-aerated tissue fraction at two end-expiratory pressure levels. Recent studies have also shown that this analysis can provide valuable information for the respiratory management of COVID-19.2
However, in order to properly analyze a CT scan, a precise segmentation, in other words, the delineation of a structure as a region of interest within the lung, is mandatory for a reliable quantification of lung densities. But the segmentation procedure in many hospitals currently requires considerable manual intervention, as well as time and experienced personnel. These factors have seriously hindered broader adoption of this analysis in clinical practice.2
Because of this, efforts have been made to introduce artificial intelligence based on convolutional neural network architectures for image segmentation, the most notable being the "SegNet" and "U-Net" architectures. In fact, these methods have been applied during the COVID-19 pandemic for lung CT image segmentation.2
In this context, the research group led by Peter Herrmann, from the University Medical Center Göttingen in Germany, published a study in the Frontiers in Physiology journal that aimed to develop a deep learning algorithm able to automatically analyze and segment acutely injured lungs over the full spectrum of ARDS severity. The researchers believed that the successful application of the deep learning method to the segmentation of CT lung images in ARDS would greatly increase the use of quantitative CT scan analysis.2
To do so, the researchers used a CT scan dataset from 100 patients with ARDS who were enrolled in different trials at the Policlinico Hospital in Milan. More specifically, the CTs of these patients were taken during an end-inspiratory pause at 45 cm of water and at end-expiratory pressures of 5 and 15 cm of water.2
Apart from this ARDS group, the researchers added CT scans from 20 COVID-19 patients from the San Paolo hospital in Milan and CT scans from 15 patients with normal lungs from the University Medical Center Göttingen. For all these patients, a total of 15,398 CT slices were manually segmented by experienced intensive care physicians to obtain a reference, or ground truth.2
Afterwards, before being fed to the algorithm, the images were preprocessed, which means converting the gray values from the original 16-bit to 8-bit images. A total of 11,932 images and their manually generated segment coordinates were preprocessed in this way and then loaded into the artificial intelligence software.2
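The exact conversion parameters are not spelled out in the episode; a common way to compress 16-bit CT data into 8 bits is a linear rescale over a fixed Hounsfield-unit window. A minimal sketch, assuming an illustrative window of -1000 to 600 HU (the study's actual bounds may differ):

```python
import numpy as np

def to_8bit(ct_slice, lo=-1000, hi=600):
    """Linearly rescale a 16-bit CT slice (Hounsfield units) to 8-bit.

    `lo` and `hi` define an assumed display window; values outside it
    are clipped. The published preprocessing may use other bounds.
    """
    clipped = np.clip(ct_slice.astype(np.float64), lo, hi)
    scaled = (clipped - lo) / (hi - lo) * 255.0
    return np.round(scaled).astype(np.uint8)

# A synthetic 2x2 slice spanning air (-1000 HU) to dense tissue (600 HU)
demo = np.array([[-1000, -500], [0, 600]], dtype=np.int16)
print(to_8bit(demo))
```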
The artificial intelligence network used by the researchers is based on the U-Net architecture mentioned before, which is characterized by its ability to generate a new, altered image as the output from an input image. In particular, during image processing, the software generates small images from the original large image. These small images no longer resemble the original one, but show certain extracted properties of the image, also called feature maps, such as corners, edges, or structures.2
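To make the idea of a feature map concrete, here is a toy example in plain numpy (illustrative only, not the network used in the study): sliding a single 3x3 vertical-edge kernel over a small step image produces a map that responds only where the intensity changes.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 'valid' 2-D sliding-window product (a deep-learning
    'convolution'), returning one feature map."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Step image: dark on the left, bright on the right
image = np.array([[0, 0, 0, 1, 1, 1]] * 4, dtype=float)
# Kernel that responds to vertical edges (right column minus left column)
vertical_edge = np.array([[-1, 0, 1],
                          [-1, 0, 1],
                          [-1, 0, 1]], dtype=float)
feature_map = conv2d_valid(image, vertical_edge)
print(feature_map)  # nonzero only in the columns straddling the edge
```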
Therefore, the artificial neural network programmed in this manner was trained with the 11,932 CT slice images of the lungs of patients and the associated manually drawn lung segments, or ground truth, and it was tested on 3,466 CT lung slice images from 27 patients. After training and testing, the agreement between manual and artificial intelligence segmentation across all lung scans was 87 percent.2
At the level of individual slices, the mean agreement between manual and artificial intelligence segmentation was 91.3 percent for normal lungs, 85.2 percent for ARDS, and 84.7 percent for COVID-19.2
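The exact agreement metric used by the authors is not reproduced here; as an illustration, the widely used Dice coefficient scores the overlap of two segmentation masks in a few lines:

```python
import numpy as np

def dice_agreement(manual_mask, ai_mask):
    """Dice similarity between two binary segmentation masks
    (a stand-in for the study's agreement metric)."""
    manual = manual_mask.astype(bool)
    ai = ai_mask.astype(bool)
    intersection = np.logical_and(manual, ai).sum()
    denom = manual.sum() + ai.sum()
    if denom == 0:
        return 1.0  # both masks empty: trivially perfect agreement
    return 2.0 * intersection / denom

manual = np.array([[1, 1, 0],
                   [1, 0, 0]])
ai = np.array([[1, 1, 0],
               [0, 0, 0]])
print(dice_agreement(manual, ai))  # 2*2 / (3+2) = 0.8
```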
In general, the researchers found that the agreement between manually and artificial intelligence-segmented lungs followed an inverse U-shape: higher in the central regions of the thorax and lower at the apex or near the pleural recesses. Note that in these regions, the absolute amount of lung tissue is just a small fraction of the entire parenchyma. Interestingly, the agreement in lung volume between ground truth and predicted results reached up to 99 percent in normal lungs. On the other hand, the worst results were obtained in severe ARDS compared with moderate and mild ARDS, with the worst segmentation located mainly in the peripheral zones of the lung slices.2
When evaluating the model at the level of the patient or lung, rather than slices, the quantitative CT scan analysis showed that the overinflated, well-aerated, poorly aerated, and non-aerated tissue fractions were almost identical in the manually and artificial intelligence-segmented images.2
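These tissue fractions are obtained by classifying each segmented voxel according to its Hounsfield-unit value. A sketch using cut-offs conventional in the quantitative-CT literature (an assumption here; the study may use slightly different ranges):

```python
import numpy as np

# Conventional Hounsfield-unit ranges (assumed for illustration)
AERATION_BINS = {
    "overinflated": (-1000, -901),
    "well_aerated": (-900, -501),
    "poorly_aerated": (-500, -101),
    "non_aerated": (-100, 100),
}

def aeration_fractions(hu_values):
    """Fraction of segmented-lung voxels falling in each aeration class."""
    hu = np.asarray(hu_values)
    total = hu.size
    return {name: float(np.sum((hu >= lo) & (hu <= hi))) / total
            for name, (lo, hi) in AERATION_BINS.items()}

# Toy set of voxel densities from a segmented lung (made up)
voxels = np.array([-950, -850, -700, -300, 0, 50])
print(aeration_fractions(voxels))
```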
On the other hand, the researchers assessed recruitability, which is probably the most important variable. They found that when recruitment is expressed as the variation of the percentage of non-aerated tissue, the agreement between the two techniques lies between plus 6.2 and minus 5.5 percent; when expressed as the variation of the percentage of well-aerated tissue, the agreement lies between plus 2.3 and minus 3.3 percent.2
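Under the conventional definition of non-aerated tissue as voxels between -100 and +100 HU (an assumption for illustration), recruitment reduces to the change in the non-aerated fraction between the two pressure levels. A minimal sketch with made-up numbers:

```python
import numpy as np

def non_aerated_fraction(hu):
    """Share of voxels in the assumed non-aerated range (-100 to +100 HU)."""
    hu = np.asarray(hu)
    return float(np.sum((hu >= -100) & (hu <= 100))) / hu.size

def recruitment(hu_low_pressure, hu_high_pressure):
    """Recruitment: drop in the non-aerated fraction from low to high
    end-expiratory pressure."""
    return (non_aerated_fraction(hu_low_pressure)
            - non_aerated_fraction(hu_high_pressure))

# Toy voxel densities: at the higher pressure one collapsed region re-aerates
low = np.array([0, 20, -300, -700])      # 2 of 4 voxels non-aerated
high = np.array([-150, 20, -300, -700])  # 1 of 4 voxels non-aerated
print(recruitment(low, high))  # 0.25
```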
With these results, the researchers demonstrated that automatic lung segmentation performed by a properly trained neural network provides lung contours in close agreement with those obtained by manual segmentation. This is relevant beyond research use, since these data may prove important for clinical diagnosis and respiratory therapy, especially because automation could eliminate the many man-hours currently required for lung segmentation.2
With regard to the detection of ARDS, image segmentation is especially difficult because, in some cases, it is almost impossible to discriminate the edge of the lung parenchyma from a pleural effusion. This is particularly common in ARDS, above all in the most dependent lung regions and in the most severe forms. However, this problem is also present in manual segmentation.2
Therefore, the development of a reliable clinical diagnostic system, able to perform automatic detection and subsequently quantitative analysis of lung tissues immediately after a lung CT scan, seems both conceivable and practicable. Such a tool would have a significant impact on diagnosing and selecting the appropriate therapeutic interventions for each individual patient suffering from severe lung injury, and could eventually become available in clinical practice for monitoring relevant variables.2
CAPSULE
As artificial intelligence algorithms increasingly affect decision-making in society, researchers have raised concerns about algorithms creating or amplifying biases, for example, different performance in disease diagnosis in Black compared with white patients. These biases have been widely studied and addressed in other fields, but artificial intelligence-driven underdiagnosis remains relatively unexplored, especially with chest X-ray images.3
With this in mind, a recent systematic study demonstrated the existence of consistent underdiagnosis across different chest X-ray datasets in the USA. In particular, algorithms trained in all settings still exhibit systematic underdiagnosis biases in under-served subpopulations, such as female patients, Black patients, Hispanic patients, younger patients, and patients of lower socioeconomic status.3
The authors consider the bias to be caused mainly by the method of labeling the data, since many X-ray datasets are not manually labeled but are instead labeled automatically from the medical notes. Another reason could be that clinical care itself is already biased against these populations, so the data taken as ground truth lead to bias amplification, meaning that the model reproduces and magnifies a bias already present in the data.3
These findings demonstrate that some algorithms could escalate existing systemic health inequities if there is no robust audit of performance disparities across subpopulations. As algorithms move from the laboratory to the real world, we must consider the ethical concerns regarding the accessibility of medical treatment for under-served subpopulations and the effective and ethical deployment of these models.3
END OF CAPSULE
Welcome again. In the previous section, we talked about the application of artificial intelligence to the analysis of CT images to better detect ARDS. Now we will talk about how artificial intelligence can also be useful in assisting the physician during the diagnostic process.
So, although much work is focused on the analysis of CT scans, chest radiography is still the most commonly used radiologic examination to screen for chest diseases and monitor patients with thoracic abnormalities, including lung cancer and pneumonia. However, interpreting these images is challenging and prone to misreading.4
Because of this, artificial intelligence solutions for chest radiographs have been designed and have gathered attention because they show excellent performance in detecting malignant pulmonary nodules, tuberculosis, and various abnormalities in experimental datasets. However, in experimental datasets the prevalence of certain diseases may be enriched, which leads to high diagnostic accuracy of artificial intelligence methods that does not generalize across diseases.4
For this reason, cross-sectional studies or cohorts with consecutive patients may help to validate the performance of artificial intelligence solutions for clinical practice in the real world. Unfortunately, there is little research on artificial intelligence augmentation using datasets of this type.4
With this in mind, the research of Kwang Nam Jin and colleagues, from the Boramae Medical Center in Korea, was published in the European Radiology journal. In this publication, the researchers aimed to evaluate a commercial artificial intelligence software on a consecutive diagnostic cohort dataset collected from multiple respiratory outpatient clinics. Furthermore, they also compared the physicians' ability to detect and localize referable thoracic abnormalities with and without the assistance of the artificial intelligence software. The idea behind this is that implementing a commercially available deep learning algorithm would enhance the ability of clinicians to interpret chest radiographs.4
The researchers included 6,006 consecutive patients who visited respiratory outpatient clinics. The most frequent referable abnormal thoracic lesions in these patients were pulmonary nodules or masses, accounting for 28 percent, followed by lung consolidation at 22 percent, and pneumothorax at 0.4 percent. The final diagnoses of these lesions were most commonly pneumonia, tuberculosis, and malignant neoplasm of the bronchus or lung.4
Next, these data were introduced into the artificial intelligence solution, and when the software detected any abnormality, the locations of the lesions were outlined or marked with a color map and the abnormality was scored. In parallel, the radiographs were evaluated by one of three adjudicators using CT scans and medical records to determine the presence of referable thoracic abnormalities, defined as any chest radiographic abnormalities requiring further diagnostic evaluation or management.4
In this case, referable thoracic abnormalities were categorized into intended and non-intended lesions. Intended lesions were classified into three types: nodule or mass, lung consolidation, and pneumothorax; while non-intended lesions were classified into seven types: atelectasis or fibrosis, bronchiectasis, cardiomegaly, diffuse interstitial lung opacities, mediastinal lesions, pleural effusion, and others.4
Taking into consideration the information from all these patients, the results of the artificial intelligence solution were evaluated against the reference standards to measure performance. The researchers found that for the 6,006 chest radiographs, the algorithm achieved an average area under the curve of 0.87, with a sensitivity and a specificity of 0.89 and 0.72, respectively.4 Furthermore, when comparing the performance according to the type of lesion, the accuracy of the artificial intelligence software was higher for intended lesions than for non-intended lesions.4
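For reference, sensitivity, specificity, and the area under the ROC curve can all be computed directly from labels and model scores. The toy data below are made up and are not the study's; only the formulas correspond to the metrics quoted above.

```python
def sensitivity_specificity(labels, preds):
    """Sensitivity and specificity from binary labels and predictions."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def auc(labels, scores):
    """Area under the ROC curve via the rank (Mann-Whitney) statistic:
    the probability that a random abnormal case scores higher than a
    random normal one."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up abnormality scores for six radiographs (not the study's data)
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
preds = [int(s >= 0.5) for s in scores]
print(auc(labels, scores))  # 8 of 9 positive-negative pairs ranked correctly
print(sensitivity_specificity(labels, preds))
```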
After this, the researchers wanted to evaluate the performance of physicians in image classification and lesion localization with and without the assistance of the artificial intelligence software. To achieve this, out of the 6,006 patients, 230 were randomly selected and evaluated by an observer panel of 12 physicians: three thoracic radiologists, three board-certified radiologists, three radiology residents, and three pulmonologists.4
The test was conducted in two sessions with a washout period of 4 weeks to avoid information bias. Each physician independently assessed 116 images with AI assistance and 114 images without assistance during the first session, and vice versa during the second session. In addition to chest radiographs, the physicians were provided with clinical information, including age, sex, and chief concern, to simulate the normal clinical process. With this experiment, the researchers found that the average values of the area under the ROC curve across observer groups were significantly higher for assisted reading than for unaided reading.4
So, the take-home message from this study is that the use of artificial intelligence increased the accuracy of physicians interpreting consecutively collected chest radiographs from respiratory outpatient clinics. In other words, the assistance of artificial intelligence improved physicians' performance in detecting and localizing referable thoracic abnormalities on chest radiographs.4
According to the researchers, this is the first multicenter study to measure physicians’ diagnostic performance with and without an artificial intelligence solution for chest radiographs from consecutive patients.4
In light of the studies reviewed today, we can anticipate that the pulmonary functional imaging community may benefit from this rising activity in data science in the future, as novel approaches using rich data sets are proposed to redefine disease conditions. However, a multidisciplinary approach is essential to introduce artificial intelligence in pulmonary imaging to deliver significant benefits in the coming years.1
Thanks for joining us on this episode of Health Connect. Don’t miss out on our next episode. Don't forget to subscribe to discover the latest medical news.
References:
- 1. San José Estépar R. Artificial intelligence in functional imaging of the lung. Br J Radiol. 2022. Available at: https://doi.org/10.1259/bjr.20210527
- 2. Herrmann P, Busana M, Cressoni M, Lotz J, Moerer O, Saager L, et al. Using Artificial Intelligence for Automatic Segmentation of CT Lung Images in Acute Respiratory Distress Syndrome. Front Physiol. 2021;12:676118. Available at: https://doi.org/10.3389/fphys.2021.676118
- 3. Seyyed-Kalantari L, Zhang H, McDermott MBA, Chen IY, Ghassemi M. Underdiagnosis bias of artificial intelligence algorithms applied to chest radiographs in under-served patient populations. Nat Med. 2021;27:2176-2182. Available at: https://doi.org/10.1038/s41591-021-01595-0
- 4. Jin KN, Kim EY, Kim YJ, Lee GP, Kim H, Oh S, et al. Diagnostic effect of artificial intelligence solution for referable thoracic abnormalities on chest radiography: a multicenter respiratory outpatient diagnostic cohort study. Eur Radiol. 2022;32:3469-3479. Available at: https://doi.org/10.1007/s00330-021-08397-5
Links to all third-party sites are offered as a service to our visitors and do not imply endorsement, indication, or recommendation by Health Connect. The linked articles are provided for informational purposes only and are not intended to imply attribution by the author and/or publisher. Health Connect disclaims any responsibility for the content or services of other sites. We recommend that you review the policies and conditions of all sites you choose to access.
NON-2022-14800
NON-2023-2512