New AI Model Brings Radiologist-Level Insight to Lung Cancer Detection
A new artificial intelligence (AI) system for analyzing chest X-rays shows promise in making lung cancer detection more transparent and aligned with how radiologists actually think. The model, called XpertXAI, was developed by researchers at the University of Edinburgh and NHS Lothian in the UK.
Lung cancer remains one of the deadliest forms of cancer worldwide, and early detection via chest X-rays is critical for improving patient outcomes. However, the "black box" nature of many AI systems used to analyze medical images has limited clinicians' trust in them and slowed their adoption in clinical settings.
The researchers set out to create an AI model that could not only accurately detect lung abnormalities, but also explain its reasoning in a way that makes sense to medical professionals. Their key innovation was incorporating expert radiologist knowledge directly into the model's architecture.
"We wanted to bridge the gap between how AI 'thinks' and how radiologists actually approach X-ray analysis," said lead researcher Amy Rafferty. "By baking in clinical concepts from the start, we aimed to create explanations that would be immediately meaningful and trustworthy to doctors."
The team's approach, detailed in a new paper on the preprint server arXiv, builds on a machine learning technique called concept bottleneck models. These models split the typical image classification process into two steps: first predicting the presence of human-interpretable "concepts" in an image, then using those concepts to make a final diagnosis.
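To make the idea concrete, the sketch below shows a minimal concept bottleneck model in PyTorch. The encoder, layer sizes, and concept count are placeholders chosen for illustration, not details of XpertXAI's actual architecture.

```python
# Minimal concept bottleneck sketch (illustrative; not XpertXAI's implementation).
# Step 1 predicts human-interpretable concepts; step 2 maps them to a diagnosis.
import torch
import torch.nn as nn

class ConceptBottleneckModel(nn.Module):
    def __init__(self, num_concepts: int, num_classes: int):
        super().__init__()
        # Any image encoder producing a feature vector would do; this one is tiny.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Step 1: one score per concept, e.g. "mass", "nodule", "irregular hilum".
        self.concept_head = nn.Linear(16, num_concepts)
        # Step 2: the diagnosis sees only the concept scores, not raw pixels.
        self.label_head = nn.Linear(num_concepts, num_classes)

    def forward(self, x):
        concept_logits = self.concept_head(self.encoder(x))
        label_logits = self.label_head(torch.sigmoid(concept_logits))
        return concept_logits, label_logits

model = ConceptBottleneckModel(num_concepts=3, num_classes=2)
concepts, diagnosis = model(torch.randn(1, 1, 224, 224))  # dummy X-ray tensor
```

Because the final prediction depends only on the concept scores, each diagnosis can be traced back to the concepts that drove it.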
What sets XpertXAI apart is that its concepts were carefully curated by expert radiologists to reflect how they actually think about chest X-rays. This includes concepts like "mass," "nodule," and "irregular hilum" that are highly relevant for lung cancer detection.
The researchers extracted these expert-defined concepts from radiology reports associated with over 35,000 chest X-rays. They then trained their AI model to first predict the presence of these clinically meaningful concepts before making a final diagnostic prediction.
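One common starting point for deriving such weak concept labels from free-text reports is simple rule-based matching. The snippet below is a hypothetical illustration of that idea, not the authors' actual extraction pipeline, which would also need to handle negation, synonyms, and uncertain phrasing.

```python
# Hypothetical illustration of deriving weak concept labels from report text.
# A production pipeline would also handle negation ("no nodule seen"),
# synonyms, and hedged language in the reports.
CONCEPTS = ["mass", "nodule", "irregular hilum"]

def label_report(report: str) -> dict:
    text = report.lower()
    return {concept: int(concept in text) for concept in CONCEPTS}

print(label_report("Large right upper lobe mass with an irregular hilum."))
# -> {'mass': 1, 'nodule': 0, 'irregular hilum': 1}
```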
To evaluate XpertXAI, the team compared it against several leading "Explainable AI" techniques on a large public dataset of chest X-rays. These included popular methods like LIME, SHAP, and Grad-CAM that attempt to highlight important regions of an image after a model has already made its prediction.
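These post-hoc techniques operate on top of an already trained model. As a rough illustration of the family, the sketch below computes a vanilla gradient saliency map, a simpler relative of Grad-CAM, for a stand-in PyTorch classifier; neither the model nor the specific method is taken from the paper.

```python
import torch
import torch.nn as nn

# Vanilla gradient saliency: score each pixel by how strongly the model's
# top-class output responds to it. The classifier here is a trivial stand-in.
model = nn.Sequential(nn.Flatten(), nn.Linear(224 * 224, 2))
xray = torch.randn(1, 1, 224, 224, requires_grad=True)  # dummy chest X-ray

score = model(xray)[0].max()           # score of the predicted class
score.backward()                       # gradients flow back to the input pixels
saliency = xray.grad.abs().squeeze()   # per-pixel importance map, shape (224, 224)
```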
They also tested against a recent AI system called CXR-LLaVA that uses large language models to generate radiology-style reports from chest X-rays, as well as an unsupervised concept-based model called XCBs.
The results showed that XpertXAI significantly outperformed these other approaches in producing clinically relevant explanations. When comparing the model's concept predictions to the actual content of radiology reports, XpertXAI achieved an F1 score (a measure that balances precision and recall) of 0.842, compared to 0.799 for XCBs and 0.658 for CXR-LLaVA.
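Concretely, the F1 score is the harmonic mean of precision (the fraction of predicted concepts that were correct) and recall (the fraction of actual concepts that were found). The counts in the snippet below are invented purely to show the arithmetic and are not figures from the study.

```python
# F1 = harmonic mean of precision and recall.
def f1_score(tp: int, fp: int, fn: int) -> float:
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Invented counts: 80 correct detections, 10 false alarms, 20 misses.
print(round(f1_score(tp=80, fp=10, fn=20), 3))  # 0.842
```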
"This tells us that XpertXAI is much better at zeroing in on the specific clinical features that radiologists actually care about when reading an X-ray," Rafferty explained.
Perhaps most importantly, XpertXAI's explanations aligned much more closely with expert radiologist judgment compared to other AI methods. The researchers had a consultant radiologist with over 10 years of experience evaluate explanations generated for 60 chest X-rays: 40 showing cancer and 20 healthy.
For the cancerous X-rays, XpertXAI's explanations were rated as fully clinically relevant in the vast majority of cases. The expert noted that XpertXAI consistently highlighted key diagnostic features like masses and nodules that were missed by other AI methods.
In contrast, popular techniques like LIME, SHAP and Grad-CAM frequently received the lowest possible relevance scores from the radiologist. The expert noted these methods often highlighted anatomically implausible regions or missed obvious abnormalities entirely.
"It was striking to see how often the standard AI explanations focused on irrelevant areas of the image or failed to pick up on clear signs of cancer," said Dr. Rishi Ramaesh, the consulting radiologist on the project. "XpertXAI's explanations much more closely matched how I would actually approach analyzing these X-rays."
The CXR-LLaVA system, while able to generate plausible-sounding reports, struggled with consistency. It often produced contradictory findings when given slightly different prompts about the same image. It also had a high rate of false negatives, failing to mention obvious cancerous features in many cases.
The unsupervised XCBs model performed reasonably well, but still fell short of XpertXAI in terms of clinical relevance. It often surfaced concepts that were not actually useful for diagnosis, particularly on healthy X-rays.
"XpertXAI strikes a sweet spot by constraining the model to focus on concepts that we know are clinically meaningful," Rafferty noted. "This expert guidance seems to be key for generating explanations doctors can actually trust and use."
Beyond just producing better explanations, XpertXAI also achieved strong performance on the core task of classifying chest X-rays. Using a decision tree architecture for its final classification step, XpertXAI achieved an F1 score of 0.866 for predicting the presence of lung abnormalities. This outperformed both a standard deep learning model (F1 of 0.740) and the XCBs approach (F1 of 0.792).
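Using a decision tree as the final stage keeps that step readable, since every split refers to a named clinical concept. The scikit-learn sketch below illustrates the idea on synthetic concept scores; it is not the authors' code or data.

```python
# Sketch: fit a decision tree on predicted concept scores (synthetic data).
# Each input feature is a named clinical concept, so the tree's splits
# remain human-readable.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
concept_scores = rng.random((200, 3))              # columns: mass, nodule, irregular hilum
labels = (concept_scores[:, 0] > 0.5).astype(int)  # toy rule standing in for real labels

tree = DecisionTreeClassifier(max_depth=3).fit(concept_scores, labels)
print(export_text(tree, feature_names=["mass", "nodule", "irregular hilum"]))
```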
"It's exciting to see that by incorporating clinical knowledge, we can actually improve the underlying predictive power of the model," said Rafferty. "This suggests there's real value in trying to align AI reasoning with human expert reasoning."
The researchers stress that their current work focuses specifically on chest X-rays and lung cancer detection. However, they believe the core approach of incorporating expert-defined concepts could be valuable across many areas of medical AI.
"Every medical specialty has its own set of clinically relevant concepts and ways of approaching diagnosis," Ramaesh explained. "This framework gives us a way to encode that knowledge directly into AI systems, potentially making them more robust and trustworthy across the board."
The team is now working to expand their approach to other imaging modalities like CT scans and MRIs. They also plan to conduct larger-scale clinical evaluations to further validate the real-world utility of XpertXAI's explanations.
While the results are promising, the researchers caution that more work is needed before such systems could be deployed in actual clinical settings. Careful testing is required to ensure the model performs consistently across diverse patient populations and hospital systems.
There are also important questions to consider around how AI explanations might influence clinical decision making. "We need to be thoughtful about how these kinds of explanations are presented and ensure they enhance rather than unduly bias clinical judgment," Rafferty noted.
Nevertheless, this work represents an important step toward more transparent and clinically grounded AI for medical imaging. By bridging the gap between AI and radiologist reasoning, approaches like XpertXAI could help build the trust needed for broader adoption of AI in healthcare.
"Ultimately, the goal is to create AI assistants that can truly augment clinical expertise," said Ramaesh. "Systems that not only make accurate predictions, but can explain their reasoning in a way that resonates with medical professionals and supports better patient care."
As AI continues to advance in capabilities, incorporating human domain knowledge may prove key to creating systems that doctors can confidently rely on. The XpertXAI project offers a compelling blueprint for how to build that knowledge directly into the foundations of medical AI.