Improving the Diagnosis of Pulmonary Embolism: The Answer Could Be Artificial Intelligence
Welcome to Health Connect, the podcast for health professionals through which we will share the latest news and information on science and technology in the medical field. In this episode, we present three investigations that show the current development of artificial intelligence, or AI, for the detection of pulmonary embolism, a condition that, as we know, is life-threatening and challenging to diagnose.1
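First, we discuss a deep neural network model for the detection of pulmonary embolism from computed tomography angiograms. Next, a machine learning approach aims to improve the screening process using electrocardiogram signals. Finally, we look at a machine learning algorithm that aims to identify hospitalized patients at risk of pulmonary embolism before its clinical detection.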
Did you know that the positive yield for pulmonary embolism from computed tomography angiography studies varies from less than 10 percent to 30 percent?1 For example, in a 2020 multicenter analysis of U.S. medical centers, only 3.1 percent of computed tomography scans were positive for PE.2
Given the lack of specificity that characterizes the symptoms of pulmonary embolism, its confirmatory diagnosis requires the use of various thoracic imaging modalities, most commonly computed tomography pulmonary angiography, abbreviated as CTPA.2 Although it is the gold standard for diagnosis, this method presents several problems. First, pretest probability and exclusion criteria are rarely used in routine clinical practice, and a general increase in the use of CT emergency imaging has been reported.1
Second, accurate interpretation of a CTPA requires expertise and time, so high workloads and the resulting fatigue can cause diagnostic errors, particularly in emergency radiology.1
To address this problem, the research team led by Dr. Heidi Huhtanen developed and evaluated a deep neural network model that could automatically detect pulmonary embolism from CTPA using only weakly labeled training data. This feature is of particular relevance, as the requirement for densely annotated training data has prevented the adoption of several models for the diagnosis of pulmonary embolism in real clinical settings, where creating a large dataset could be unrealistic.1
The need for large amounts of annotated training data is a limiting factor that has been noted in both modern artificial intelligence systems and computer-aided detection models, an earlier technology that, while increasing reader sensitivity, had the significant drawback of a high yield of false positives. These are not only a cause of frustration for radiologists, but may also increase the risk of false diagnoses due to so-called automation bias, that is, the tendency of humans to favor the machine-made decision.1
To develop their model, CTPA scans with readily available three-millimeter axial slices were chosen from a retrospective cohort, giving a total of 608 CTPAs for the training and validation set, and 204 CTPAs for testing. These datasets are smaller than those reported in other similar studies.1
All CTPAs were interpreted visually, manually labeling them as positive if they included even just one unequivocal embolus and negative if there was none. The decision to annotate imaging data by assigning binary labels on slice-based levels, instead of annotations for each distinct embolus, aimed to minimize manual work. Also, as mentioned earlier, it was intended to test whether limited data annotation could be sufficient for a deep learning approach to achieve reasonable results.1
In the training set, the labeling work was performed by a radiology resident instructed by a board-certified radiologist with 14 years of experience. For the test set, all scans and slices were labeled in consensus to achieve a better reference standard. No significant differences were observed between the training and test sets in terms of patient demographics, type of scanner used, or the time period in which the scans were acquired. In addition, no patient appeared in both data sets.1
Next, CT scans were processed as a series of axial slices, combining a 2D convolutional neural network, or CNN, which analyzes individual slices, with a long short-term memory network, or LSTM, which handles the stack of slices as a whole. This combination has previously been shown to be useful when handling weakly annotated CT data.1
Two versions of the model were created. The first, which we will call model A, was pre-trained with a dataset consisting of over 100,000 chest radiographs. The second, or model B, used a dataset of over 14 million natural images for its pre-training. Five-fold cross-validation was used for both models.1
Then, for the development of a training set, a total of 52,752 slices were included, 14 percent of them positive for pulmonary embolism. Likewise, a test set was constructed with a total of 17,778 slices, 16 percent of them positive. Since the ratio of positive to negative slices was very unbalanced, the authors augmented the data. The final training set for the CNN models consisted of approximately 100,000 slices, half negative and half positive.1
The final combination of models A and B produced stack-based and slice-based predictions, and since the LSTM handled the stack-level data, which were already balanced, the augmented data were not used in training this part of the model.1
The results of Huhtanen and colleagues, published in 2022, showed that both models achieved excellent results for pulmonary embolism detection at the stack-based level. Model A achieved 90.2 percent accuracy, 86.6 percent sensitivity, and 93.5 percent specificity, outperforming model B, which achieved 87.3 percent accuracy, 83.5 percent sensitivity, and 90.7 percent specificity.1
At the slice-based level, both models performed well: model A achieved 92.3 percent accuracy, 90.1 percent sensitivity, and 92.7 percent specificity. Model B achieved 90.1 percent accuracy, 90.8 percent sensitivity, and 89.9 percent specificity, so model A was again ahead in accuracy and specificity, while model B was marginally more sensitive.1
Nevertheless, it should be noted that in the slice-based predictions, model A achieved a positive predictive value of only 69.9 percent and model B of 62.3 percent, compared to 92.3 percent and 89.0 percent, respectively, in the stack-based predictions.1
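A low slice-level positive predictive value is largely what one would expect when positive slices are rare. As a rough check, and assuming the quoted sensitivity and specificity apply at the 16 percent slice-level prevalence of the test set, Bayes' rule can be sketched as follows. The function name is illustrative and does not come from the study.

```python
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """PPV via Bayes' rule: true positive calls divided by all positive calls."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Slice-level sensitivity and specificity as quoted in the episode,
# with 16 percent of test slices positive (a rounded figure).
ppv_a = positive_predictive_value(0.901, 0.927, 0.16)
ppv_b = positive_predictive_value(0.908, 0.899, 0.16)
print(f"model A: {ppv_a:.2f}, model B: {ppv_b:.2f}")
```

The results land at roughly 0.70 and 0.63, close to the reported 69.9 and 62.3 percent. The small differences are consistent with rounding in the inputs.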
In conclusion, model A performed as well as or even better than model B, despite being pretrained on a substantially smaller data set. These promising findings indicate that this model could be adopted in small clinical settings with limited resources as an aid in emergency reading settings to reduce errors, as a pre-screening tool that does not require the exact location of the embolism, or even in oncology patients, where a delay of several days in reading CT scans is common.1
Capsule
Although a decrease in mortality attributable to pulmonary embolism has been reported, this condition remains underdiagnosed worldwide. In view of the high morbidity and mortality risk associated with this condition, current clinical prediction models overestimate the need for an additional CTPA study to reduce undetected pulmonary embolism.2
However, this overestimation leads to an over-reliance on CTPA scans, which not only can decrease diagnostic yield and have a large impact on resource utilization but also confers risk to patients due to contrast exposure and large radiation doses. As such, this method could pose an increased risk or even be contraindicated in specific populations. For example, the use of CTPA has selectively increased in older populations, who may have an increased risk of cancer from ionizing radiation.2
End of capsule
Welcome back. In the previous segment, we talked about the need for complementary tools to the gold standard for the diagnosis of pulmonary embolism. We will now present a model based on routinely collected clinical information that could have crucial implications for improving the diagnostic process.2
The research published in 2022 by Dr. Glicksberg and colleagues hypothesizes that routinely collected electrocardiogram waveforms and clinical data can be combined synergistically in a machine learning model to detect pulmonary embolism in patients with moderate to high suspicion of PE.2
The authors used a retrospective cohort of individuals with moderate or high suspicion of pulmonary embolism from five US hospitals serving diverse urban populations. They collected CTPA reports, electrocardiogram parameters, and routine clinical data, including demographics, comorbidities, vital signs such as heart and respiration rates, blood pressure, temperature, and oxygen saturation, as well as relevant laboratory values, such as D-dimer, BNP, and troponin levels.2
In total, the study included 21,183 unique patients associated with 23,793 CTPAs and 320,746 electrocardiograms, of which 10 percent and 12.8 percent were positive for pulmonary embolism, respectively.2
The dataset was split into two groups: 90 percent formed a set for nine-fold cross-validation, covering training, model selection, and model development, and 10 percent was held out for testing, to assess model performance and benchmark it against clinical scores. The split was made at the level of unique patients, who were grouped according to whether or not they had at least one CTPA scan positive for pulmonary embolism.2
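A patient-level split of this kind can be sketched in a few lines. This is a minimal illustration and not the authors' code: the function name, the random seed, and the toy cohort are all assumptions.

```python
import random
from collections import defaultdict


def split_patients(patient_has_pe: dict[str, bool], test_fraction: float = 0.10, seed: int = 0):
    """Split patient IDs into development and test sets, stratified by PE status.

    Splitting by patient (not by scan or ECG) keeps every record of one
    person on the same side of the split.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for patient_id, has_pe in patient_has_pe.items():
        strata[has_pe].append(patient_id)

    dev_ids, test_ids = set(), set()
    for ids in strata.values():
        ids = sorted(ids)  # fixed order before shuffling, for reproducibility
        rng.shuffle(ids)
        n_test = round(len(ids) * test_fraction)
        test_ids.update(ids[:n_test])
        dev_ids.update(ids[n_test:])
    return dev_ids, test_ids


# Hypothetical toy cohort: 1,000 patients, 10 percent with a positive CTPA.
cohort = {f"p{i:04d}": (i % 10 == 0) for i in range(1000)}
dev, test = split_patients(cohort)
print(len(dev), len(test), sum(cohort[p] for p in test))
```

Because the shuffle happens within each stratum, the held-out set keeps the same share of PE-positive patients as the full cohort.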
The data were then labeled using a two-stage approach. First, natural language processing pattern matching identified the reports that were negative for pulmonary embolism. Then, a team of resident physicians noted the presence, chronicity, and vascular location of the embolism in the remaining reports. Electrocardiograms were labeled as positive for pulmonary embolism if they were recorded within 24 hours of a positive CTPA scan, and those taken more than 24 hours after the CTPA were discarded. Similarly, if the CTPA was labeled as negative, the electrocardiogram was labeled negative as well. The rest of the electronic health record data were selected and labeled following the same criteria.2
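The time-window rule for electrocardiograms can be illustrated with a short sketch. It rests on two assumptions that the episode does not spell out: "within 24 hours" is read as an absolute time difference, and the same window is applied to negative scans. The function name is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

WINDOW = timedelta(hours=24)


def label_ecg(ecg_time: datetime, ctpa_time: datetime, ctpa_positive: bool) -> Optional[int]:
    """Return 1 (PE), 0 (no PE), or None (discard) for a single ECG.

    Assumption: the 24-hour window is symmetric around the CTPA, and ECGs
    outside it are dropped regardless of the CTPA result.
    """
    if abs(ecg_time - ctpa_time) > WINDOW:
        return None
    return 1 if ctpa_positive else 0


ctpa = datetime(2020, 1, 1, 12, 0)
print(label_ecg(datetime(2020, 1, 1, 20, 0), ctpa, True))   # 8 hours after a positive scan
print(label_ecg(datetime(2020, 1, 3, 12, 0), ctpa, True))   # 48 hours after, outside the window
print(label_ecg(datetime(2020, 1, 1, 6, 0), ctpa, False))   # 6 hours before a negative scan
```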
A review of the participant population revealed that the mean age was 57.9 years, and the most common comorbidities were hypertension in 35.2 percent and cancer in 23.8 percent. Interestingly, 62.2 percent of the cohort were women, and the racial and ethnic composition was 2.5 percent Asian, 27 percent Black, 14.8 percent Hispanic, 34.2 percent white, and 21.5 percent identified as "other". Since white men tend to be overrepresented in the data used to train such models, this model has the advantage of having been trained on a diverse population. As a result, its performance is comparable across gender and race stratifications.2
Positive CTPAs were more frequent in older patients: the mean age was 60.7 years among those with a positive CTPA, compared to 57.6 years among those with a negative one. Also, the likelihood of a PE-positive CTPA was 14.9 percent in those with a history of deep vein thrombosis or pulmonary embolism, compared to 8.2 percent in those with no history.2
Higher heart rates were also associated with pulmonary embolism, with a mean of 93.1 beats per minute in patients with a positive CTPA versus 90.3 beats per minute in those with a negative one. The same was true of D-dimer levels at admission, at 6.5 versus 2.3 fibrinogen equivalent units per milliliter.2
Finally, among the positive CTPAs, 5.2 percent showed a truncal pulmonary embolism, 21.1 percent a main PE, 28.4 percent a lobar PE, and 45.3 percent a segmental PE.2
Afterwards, the researchers developed three machine learning predictive models: the first approach used a convolutional neural network model, or CNN, to predict pulmonary embolism from raw electrocardiogram waveform data; the second was an electronic health record model, or EHR model, based on extreme gradient boosting using clinical data, namely demographics, comorbidities, labs, and vital signs, as well as electrocardiogram morphology parameters. Finally, the third model was a fusion approach that integrates clinical data and an embedded representation of the electrocardiogram waveform in an extreme gradient boosting framework.2
When discussing the results, the authors indicate that the fusion model obtained an area under the receiver operating characteristic curve, or AUROC, of 0.81 plus or minus 0.01, outperforming both the electrocardiogram and EHR models.2
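For listeners less familiar with the metric, the area under the curve can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The sketch below computes it that way on made-up scores. The function name and the data are illustrative and not taken from the study.

```python
def auroc(labels: list[int], scores: list[float]) -> float:
    """Probability that a random positive case outscores a random negative one.

    Ties count as one half. The pairwise form costs n_pos * n_neg
    comparisons, which is fine for small samples.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical model scores for three PE-positive and four PE-negative cases.
y = [1, 1, 1, 0, 0, 0, 0]
s = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.4]
print(round(auroc(y, s), 3))
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the figures in this episode should be read.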
Furthermore, the performance of these models was compared with that of the usual clinical screening tools in a sample of 100 patients from the test set.2
In this comparison, the fusion model also achieved higher sensitivity and specificity than four commonly evaluated clinical scores, namely the Wells Criteria, the Revised Geneva Score, the Pulmonary Embolism Rule-out Criteria, and the 4-level Pulmonary Embolism Clinical Probability Score. These scores achieved an AUROC ranging from 0.50 to 0.58 and a specificity of up to 0.05, compared with a specificity of 0.18 and an AUROC of 0.84 plus or minus 0.01 for the fusion model.2
However, one of the most interesting findings comes from the successful integration of raw electrocardiogram waveforms to improve outcome prediction, as these could contain information beyond that provided by traditional electrocardiographic measurements. As the authors state, this information, which the deep learning models can detect, would otherwise be imperceptible to the clinician.2
In the words of the researchers, it is possible that this framework is detecting subtle electrocardiographic signs of increased predisposition to thrombus formation; as an example, certain morphological features that represent underlying hypertensive heart disease, or even more acute findings that suggest a manifestation of subclinical pulmonary embolism.2
While we have discussed the diagnostic needs around pulmonary embolism, much remains to be said about the development of predictive tools to enable timely intervention. This is the approach chosen for the study led by Clinical Data Scientist Logan Ryan, who along with a group of researchers developed a machine learning algorithm to identify patients at risk for pulmonary embolism in a hospitalized population prior to the clinical detection of its onset.3
To develop the model, medical and surgical patient data were extracted from the electronic medical records of a large tertiary medical center in the United States. This information included at least one recorded measure of vital signs and at least one laboratory measurement, as well as patient demographics, medication use, and diagnoses. The final cohort consisted of 60,297 patients, 309 of whom experienced a pulmonary embolism while hospitalized.3
Because the risk of pulmonary embolism increases with age, all participants were older than 40 years to minimize false alerts. On average, patients who experienced a pulmonary embolism were more likely to be older, with a history of cancer, venous thromboembolism, or pneumonia.3
Using these data, three algorithms were built, namely logistic regression, neural networks, and extreme gradient boosting. These models were intended to predict the risk of pulmonary embolism at any time during the hospital stay, starting from the first time that vital signs and at least one laboratory measurement were present in the patient's record.3
Let us now present the results. In terms of performance, extreme gradient boosting achieved an AUROC of 0.85, while the neural network and logistic regression models achieved only 0.74 and 0.67, respectively. Sensitivity was held constant at 81 percent across all models. At that sensitivity, the extreme gradient boosting model achieved a superior specificity of 77 percent, compared to the 48 percent and 45 percent obtained by the neural network and logistic regression, respectively. In addition, the extreme gradient boosting model obtained better positive and negative likelihood ratios and diagnostic odds ratios.3
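The likelihood ratios and diagnostic odds ratio follow directly from sensitivity and specificity. As a hedged illustration, the sketch below applies the standard formulas to the rounded figures quoted above. The function name is an assumption, and the values published in the paper may differ slightly because of rounding.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float, float]:
    """Return (positive likelihood ratio, negative likelihood ratio, diagnostic odds ratio)."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg, lr_pos / lr_neg


# Sensitivity and specificity as quoted in the episode (rounded).
models = {
    "extreme gradient boosting": (0.81, 0.77),
    "neural network": (0.81, 0.48),
    "logistic regression": (0.81, 0.45),
}
for name, (sens, spec) in models.items():
    lr_pos, lr_neg, dor = likelihood_ratios(sens, spec)
    print(f"{name}: LR+ {lr_pos:.2f}, LR- {lr_neg:.2f}, DOR {dor:.1f}")
```

With sensitivity fixed, every gain in specificity raises the positive likelihood ratio and lowers the negative one, which is why the model with the highest specificity also leads on the diagnostic odds ratio.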
Of note, a Shapley additive explanations, or SHAP, analysis was used to evaluate the contributions of individual features to the predictions of the best-performing model. The three most important features for generating accurate predictions of pulmonary embolism were a record of recent fracture or major trauma, a history of surgery, and a history of deep vein thrombosis.3
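The idea behind Shapley values is to average each feature's marginal contribution over every order in which the features could be added. The brute-force sketch below does this for a made-up three-feature risk score. It is only an illustration of the concept: the scoring function and its weights are hypothetical, and practical SHAP analyses rely on much faster algorithms than enumerating all orderings.

```python
from itertools import permutations


def shapley_values(features: list[str], value) -> dict[str, float]:
    """Exact Shapley values: mean marginal contribution over all feature orderings."""
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        present = set()
        for f in order:
            before = value(frozenset(present))
            present.add(f)
            totals[f] += value(frozenset(present)) - before
    return {f: t / len(orderings) for f, t in totals.items()}


def toy_risk(present: frozenset) -> float:
    """Hypothetical risk score: each factor adds risk, fracture plus surgery interact."""
    score = 0.02
    score += 0.10 if "recent_fracture" in present else 0.0
    score += 0.06 if "surgery" in present else 0.0
    score += 0.04 if "prior_dvt" in present else 0.0
    if {"recent_fracture", "surgery"} <= present:
        score += 0.02  # interaction term, shared equally between the two features
    return score


names = ["recent_fracture", "surgery", "prior_dvt"]
for name, phi in shapley_values(names, toy_risk).items():
    print(f"{name}: {phi:.3f}")
```

The attributions add up to the difference between the score with all features present and the baseline score, which is the property that makes this kind of ranking of features interpretable.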
As we know, these factors have previously been identified either as provoking or precipitating factors of pulmonary embolism, or as factors that are not causal but are associated with increased risk. Thus, a recent fracture may be a proxy for recent trauma, a relevant provoking factor, while previous DVT is a known risk factor mentioned in the literature.3
Other relevant factors for prediction were urine output, change in urine output, and receipt of a fluid bolus, which may reflect whether an individual is dehydrated. Keep in mind that fluid status impacts hemoconcentration, which has been associated with an increased risk of thromboembolic events. Another important factor is increased weight and, in particular, obesity, which has also been associated with an increased risk of VTE and PE.3
Let us note that this model still has several limitations. To understand what impact it might have on both physicians and patients, it is necessary to evaluate its generalizability in different populations and in a clinical setting. However, the use of this model holds promise for the earlier identification of at-risk patients. This could allow for more intensive follow-up, earlier diagnosis, and timely treatment that could potentially reduce the need for higher risk procedures.3
Thanks for joining us on this episode of Health Connect. Don't miss our next episode, where we will discuss more artificial intelligence developments in other medical fields. Subscribe to our channel to discover the latest medical news.
References:
- 1. Huhtanen H, Nyman M, Mohsen T, Virkki A, Karlsson A, Hirvonen J. Automated detection of pulmonary embolism from CT-angiograms using deep learning. BMC Med Imaging. 2022;22(1):43. Available at: https://doi.org/10.1186/s12880-022-00763-z
- 2. Somani SS, Honarvar H, Narula S, Landi I, Lee S, Khachatoorian Y, et al. Development of a machine learning model using electrocardiogram signals to improve acute pulmonary embolism screening. Eur Heart J Digit Health. 2022; 3(1): 56–66. Available at: https://academic.oup.com/ehjdh/article/3/1/56/6440044
- 3. Ryan L, Maharjan J, Mataraso S, Barnes G, Hoffman J, Mao Q, et al. Predicting pulmonary embolism among hospitalized patients with machine learning algorithms. Pulm Circ. 2022;12(1):e12013. Available at: https://onlinelibrary.wiley.com/doi/full/10.1002/pul2.12013