Trustworthy and explainable AI – a key to equitable oncology?

Mitigating research biases by rigorous data analysis and validation can pave the way for a wider adoption of artificial intelligence in cancer care

Artificial intelligence (AI) is often presented as a transformative tool for improving cancer care; however, its performance and reliability are fundamentally constrained by the quality, representativeness, and underlying assumptions of the data and clinical research paradigms on which it is built. The trustworthiness of AI applications is now central to the debate, as it is decisive for the wide adoption of AI-based solutions in clinical practice, as Prof. Alessandra Pedrocchi, from the Politecnico di Milano, Italy, highlighted during her talk at the ESMO Women for Oncology Forum a few months ago. She is currently Professor at the Department of Electronics, Computer Science, and Bioengineering of the Politecnico, and she is one of the founders of the AI for Oncology laboratory at the National Cancer Institute of Milan. One of the projects developed there is I3LUNG, which aims to yield a novel, integrated, AI-assisted data storage and elaboration platform to individualise immunotherapy in non-small cell lung cancer (Clin Lung Cancer. 2023 Jun;24(4):381-387).

Since the very beginning, the focus of your work has been on trustworthy and explainable AI. How has this shaped your approach to collaborating with clinicians in oncology?

Until around five years ago, AI was largely centred on performance, which fuelled competition among experts over who could be the first to achieve a specific outcome, such as better breast cancer prediction through an automated tool. I deliberately rejected this approach in my collaboration proposals, and I instead emphasised that AI should serve as an additional layer of reasoning, just like asking for a second opinion, rather than as an automated process.
Initially, we spoke about explainable AI, focusing our efforts on the interpretability and usability of AI algorithms by clinicians, on understanding why models made specific decisions, and on helping to detect biases introduced during the development process. Explainability is a technical effort, not just storytelling, because to make an algorithm interpretable you need to develop another algorithm. This improves user trust in AI applications, which is decisive for their adoption in healthcare.
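Post-hoc interpretability tools are one common way to build that second layer of reasoning. The following is a minimal, purely illustrative sketch (not the specific approach used in the projects discussed here), using scikit-learn's permutation importance on a public toy dataset to show how an "explainer" is itself an algorithm run on top of the trained model:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Toy example on a public dataset: train a model, then explain it
# with a second, separate procedure (permutation importance).
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# The "explainer" is itself an algorithm applied on top of the model.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for name, score in sorted(zip(X.columns, result.importances_mean),
                          key=lambda pair: -pair[1])[:5]:
    print(f"{name}: {score:.3f}")
```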

You have extensive experience in neuroscience and neurorehabilitation, while oncology has only become an area of interest for you in more recent times. What kind of education and skill set are required for a professional other than a medical doctor to enter a disease area and become a key contributor to multidisciplinary projects?

This is a question I ask myself every day. When I began my journey in oncology, the part of my background most directly related to AI in oncology was my expertise in neural networks, which today form the core of most AI applications in healthcare. However, I embarked on this journey with exceptional oncologists, who were very attentive in teaching me.
A second key aspect was my training: I hold a PhD in biomedical engineering and a master’s degree in electronics, and throughout my research career I have consistently focused on medical applications and the clinical domain. This background has shaped an essential attitude toward working with clinical partners and learning from them continuously.

Understanding the clinical problem, the clinical needs, and even the clinical language requires a specific attitude and mindset that do not come automatically from technical training alone but can be cultivated by collaborating closely with medical professionals. I would describe this as a methodological approach that is transversal in nature and can be developed in one clinical domain, as neurorehabilitation was for me, and then transferred to others.
At our research laboratory, every project, including a PhD thesis, has two appointed supervisors: one clinical expert, such as an oncologist, and one technical expert. In this way, we ensure that the clinical and technical perspectives are always integrated and create an opportunity for mutual learning. No one works in isolation, and this collaborative model is developing professionals who can truly contribute to multidisciplinary projects in AI.

What are the most significant sources of bias currently limiting the real-world deployment and clinical trust of AI-based solutions in oncology?

Biases can arise everywhere. The first essential step is awareness, that is, recognising that biases exist. The second is actively correcting for them whenever possible. Bias can originate in the data, since access to data collections is often unequal, and can also come from the algorithms themselves, which is why specific tests to detect and assess algorithmic bias must be put in place. In addition, bias can emerge from final users and how they use algorithms in practice.

This means that fairness is not something that can be addressed at the end of a project: it must run in parallel throughout the entire development process. Identifying and correcting bias is far easier when it is done early and continuously during the development journey. If biases are only discovered at the final stage, addressing them can require a disproportionately large effort or may no longer be feasible at all.

How can AI research bias be addressed efficiently?

The first essential step is a rigorous exploratory data analysis. The initial dataset used for algorithm development must be fully understood, particularly with respect to so-called protected labels such as age, ethnicity, and comorbidities. While not all imbalances can be corrected, they must at least be identified and acknowledged.
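As a purely illustrative sketch of such an exploratory check (the file and column names below are hypothetical, not taken from the I3LUNG project), one can simply tabulate how each protected label is distributed before any modelling begins:

```python
import pandas as pd

# Hypothetical patient-level table; column names are illustrative only.
cohort = pd.read_csv("cohort.csv")

protected_labels = ["age_group", "ethnicity", "comorbidities"]

# Report the distribution of each protected label so that imbalances
# are at least identified and acknowledged before modelling.
for label in protected_labels:
    print(f"\n--- {label} ---")
    print(cohort[label].value_counts(normalize=True, dropna=False))
```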

Traditionally, algorithm development relies on splitting the dataset into a large training set and a smaller testing set, with the split performed randomly. However, what has become standard practice more recently is to stratify this split by protected labels. This ensures similar distributions of protected labels and other key variables across training and testing datasets, helping to reduce algorithmic bias and performance disparities.
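A minimal sketch of such a stratified split, assuming the hypothetical cohort table above and stratifying on a single protected label (several labels can be combined into one stratification column if needed):

```python
from sklearn.model_selection import train_test_split

# Stratify the random split on a protected label so that its
# distribution is similar in the training and testing sets.
train_df, test_df = train_test_split(
    cohort,
    test_size=0.2,
    stratify=cohort["ethnicity"],
    random_state=42,
)

# Quick check: the proportions should now closely match.
print(train_df["ethnicity"].value_counts(normalize=True))
print(test_df["ethnicity"].value_counts(normalize=True))
```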
In addition, we always include an external validation dataset, typically collected at a different centre. While the type of data collected is consistent in principle, this external validation allows us to assess interoperability and real-world generalisability. Issues such as population shift can arise: when algorithm performance is very high on the training set but noticeably lower on the testing set, and usually even lower on external validation, it often indicates that the model has not been properly optimised. Robustness, generalisability, and fairness are therefore critical considerations, and they depend strongly on dataset size. With very small datasets, testing and external validation sets become too limited to provide meaningful statistical support.
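As an illustrative check of this pattern (the model, feature matrices and labels below are placeholders, not a specific pipeline from the projects mentioned), one can compare the same metric across the three datasets:

```python
from sklearn.metrics import roc_auc_score

# 'model' is a trained classifier; X_*/y_* are placeholder feature
# matrices and outcome labels for the three cohorts.
splits = {
    "training": (X_train, y_train),
    "testing": (X_test, y_test),
    "external validation": (X_external, y_external),  # different centre
}
for name, (X, y) in splits.items():
    auc = roc_auc_score(y, model.predict_proba(X)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")

# A steep drop from training to testing to external validation
# suggests overfitting or a population shift that needs attention.
```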

After training, we further evaluate performance across protected labels, examining differences in accuracy or specificity to determine whether the algorithm systematically underperforms for certain groups. This is a common risk when some populations are underrepresented, as algorithms naturally optimise for overall statistical performance and tend to favour the most represented data.
This challenge becomes even more pronounced as we move toward genomics, multi-omics, and targeted therapies, where patient populations are increasingly small and diseases effectively become rarer. In this context, AI can help by leveraging larger and more flexible models to capture families of related populations.
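To illustrate the per-group evaluation described above, a minimal sketch continuing the hypothetical names used earlier ('y_true' and 'y_pred' are assumed to be the held-out outcomes and model predictions, aligned on the same index as 'test_df'):

```python
import pandas as pd
from sklearn.metrics import accuracy_score, recall_score

# Compute performance separately for each protected group to spot
# systematic underperformance on underrepresented populations.
rows = []
for group, idx in test_df.groupby("ethnicity").groups.items():
    rows.append({
        "group": group,
        "n": len(idx),
        "accuracy": accuracy_score(y_true.loc[idx], y_pred.loc[idx]),
        "sensitivity": recall_score(y_true.loc[idx], y_pred.loc[idx]),
    })
print(pd.DataFrame(rows).sort_values("n"))
```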

Are you optimistic that AI will ever be truly fair, equitable, and inclusive, and truly reflect real-world cancer complexity?

The perfect scenario is always yet to come. I believe that the major revolution we are aiming for is large-scale data sharing; the European Health Data Space, for example, strongly reflects this need. Large-scale initiatives that make high-quality, valid data widely available are the key to improving fairness, equity, inclusiveness, and overall performance: the larger and more diverse the data we can access, the fewer biases we will have.
