In a newly released framework paper, ESMO defines criteria for assessing and implementing AI-based biomarkers in oncology
Artificial intelligence (AI) is increasingly generating novel biomarkers informing cancer diagnosis, prognosis, and treatment selection. However, the implementation of these biomarkers in clinical practice remains limited due to several challenges, including the absence of a conceptual framework and of guidance on how to validate and safely use them.
To address these gaps, the ESMO Precision Oncology Task Force and the ESMO Real World Data and Digital Health Task Force have collaborated to create the ESMO Basic Requirements for AI-based Biomarkers in Oncology (EBAI) (Articles in Press November 18, 2025), aiming to help developers, physicians, regulators, and healthcare institutions fully realise the technology’s potential while keeping associated risks under control.
According to Dr Ben Westphalen of the University Hospital LMU, Munich, Germany, Chair of the ESMO Precision Oncology Task Force, the EBAI effort will also help to increase trust in AI within the oncology community and to promote an open dialogue with regulators.
What are the advantages and potential drawbacks of implementing and using AI-based biomarkers?
AI systems can function as biomarkers because they are able to analyse complex, multidimensional data to predict disease features and clinical outcomes, including treatment responses in patients with cancer. These AI systems can process information and identify patterns that may even be imperceptible to humans, effectively transforming data into actionable clinical insights. I believe these tools can speed up and scale the management of risk in patients with cancer, save money and, in the end, streamline work processes.
A few years ago, pioneering work demonstrated that deep learning could predict microsatellite instability (MSI) status directly from H&E-stained histology slides, with the results then confirmed by a conventional laboratory test (Nat Med. 2019 Jun 3;25(7):1054–1056).
As AI capabilities continue to evolve, it will be critical for oncology professionals to ensure that any AI-based tool entering the clinic is demonstrably equivalent in accuracy and reliability to the established gold-standard test in clinical practice. This is needed to properly manage false-negative cases identified by AI and to avoid the risk of patients going untested and missing out on appropriate biomarker-based treatments. Indeed, one significant risk lies in overreliance on technologies that are not yet scientifically validated or robust. This could result in deviations from standard-of-care procedures, which must be strictly avoided.
The uptake of AI-based biomarkers is still limited. What are the possible reasons behind suboptimal use of AI-related opportunities for precision oncology?
At present, true medical AI solutions remain rare, fragmented, and not widely implemented, at least in Europe. This is largely due to the challenges of integrating such technologies into clinical practice without clear guidelines and frameworks.
From a regulatory perspective, an AI biomarker can be classified as either a medical device or an in vitro diagnostic device, allowing it to be approved and labeled according to European standards. However, implementing these tools in real-world clinical settings is an entirely different challenge. Just as we have done with other laboratory-developed biomarkers that underwent rigorous prospective validation, AI-based biomarkers must also be tested thoroughly to build confidence among clinicians and the broader medical oncology community.
Once this trust is established and these tools are routinely used, they can significantly improve efficiency, reduce costs, and, ultimately, help more patients receive effective treatments.
Imagine, for instance, replacing a biomarker test that costs hundreds of euros with an AI-driven system that uses a simple slide scanner and computational model. This could even be done in a federated manner: a slide scanned anywhere in the world could be analysed remotely, allowing biomarker status to be determined without the need for complex molecular pathology infrastructure. Such an approach could not only save substantial costs but also democratise access to biomarker testing on a global scale.
What are the key validation and performance criteria for AI-based biomarkers that the ESMO Task Force has identified?
Firstly, there must be clarity regarding the ground truth, i.e., the gold standard against which the AI biomarker was tested – which must be clearly defined and transparently reported.
Secondly, there must be clarity on performance, meaning how well the biomarker performs compared to the established standard of care; if it is intended to function as a surrogate biomarker, its performance must be at least equivalent.
Thirdly, there is generalisability: the biomarker should not only function reliably within the controlled environment of a single institution but also across different settings, regardless of the data sources used.
Additional aspects that we recommend addressing are fairness (i.e., the presence and mitigation of biases related to, for example, race, gender or socioeconomic status), explainability, cost considerations and turnaround time.
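As a rough illustration of the performance and generalisability criteria described above, the sketch below compares binary AI biomarker calls with a gold-standard ground truth and reports sensitivity and specificity separately for each site. The function names, the site labels and the toy records are hypothetical and invented for this example; they are not taken from the EBAI paper, and the same grouping could be applied to patient subgroups when looking at fairness.

```python
from collections import defaultdict


def sensitivity_specificity(truth, predicted):
    """Return (sensitivity, specificity) of binary AI calls against the ground truth."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)          # true positives
    fn = sum(1 for t, p in pairs if t and not p)      # false negatives
    tn = sum(1 for t, p in pairs if not t and not p)  # true negatives
    fp = sum(1 for t, p in pairs if not t and p)      # false positives
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


def per_group_performance(records):
    """Group (group, truth, predicted) records and score each group separately."""
    grouped = defaultdict(lambda: ([], []))
    for group, truth, predicted in records:
        grouped[group][0].append(truth)
        grouped[group][1].append(predicted)
    return {g: sensitivity_specificity(t, p) for g, (t, p) in grouped.items()}


# Invented toy data: (site, gold-standard result, AI call). Not real results.
records = [
    ("site_A", True, True), ("site_A", True, True),
    ("site_A", False, False), ("site_A", False, False),
    ("site_B", True, True), ("site_B", True, False),
    ("site_B", False, False), ("site_B", False, True),
]

results = per_group_performance(records)
for site, (sens, spec) in sorted(results.items()):
    print(f"{site}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this toy data the AI calls match the ground truth perfectly at one site and poorly at the other, which is the kind of gap a single-institution evaluation would not reveal.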
The EBAI framework categorises AI-based biomarkers into three distinct classes. To what extent do different characteristics impact the way AI-based biomarkers should be validated and integrated into clinical practice?
Each class comes with its own risks and underlying evidence required for its implementation in clinical practice.
Class A biomarkers are those that automate tedious or repetitive tasks, such as counting cells, and carry relatively low risk. Class B biomarkers, which could be referred to as surrogate biomarkers, use AI tools for screening, enrichment or filtering within larger populations. Here, stronger evidence is required to ensure that the technology can accurately identify true positives and true negatives, demonstrating high sensitivity and specificity.
Class C biomarkers are associated with higher risk. These are novel entities, not based on established biomarkers, and when integrated into clinical care, they must be rigorously evaluated across multiple cohorts. Class C biomarkers are further divided into two subcategories: C1 with a prognostic value, and C2 for predictive use. The latter form the highest-risk category, for which the highest level of evidence is required, ideally generated through randomised clinical trials, as is done for the validation of novel laboratory biomarkers. High-quality real-world data, as emphasised by the ESMO Guidance for Reporting Oncology real-World evidence (ESMO-GROW), can complement and support this evidence but should never replace prospectively collected data.
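The classes described above can be summarised as a small lookup table. The following is only a sketch: the type names (`EbaiClass`, `ClassProfile`) and the `describe` helper are hypothetical, and the role and evidence strings paraphrase this interview rather than quoting the EBAI paper itself.

```python
from dataclasses import dataclass
from enum import Enum


class EbaiClass(Enum):
    A = "A"
    B = "B"
    C1 = "C1"
    C2 = "C2"


@dataclass(frozen=True)
class ClassProfile:
    role: str      # what the biomarker does, paraphrased from the interview
    evidence: str  # evidence expectation, paraphrased from the interview


PROFILES = {
    EbaiClass.A: ClassProfile(
        role="automates tedious or repetitive tasks such as counting cells",
        evidence="relatively low risk; evidence level not detailed in the interview",
    ),
    EbaiClass.B: ClassProfile(
        role="surrogate biomarker used for screening, enrichment or filtering",
        evidence="stronger evidence of high sensitivity and specificity",
    ),
    EbaiClass.C1: ClassProfile(
        role="novel biomarker with prognostic value",
        evidence="rigorous evaluation across multiple cohorts",
    ),
    EbaiClass.C2: ClassProfile(
        role="novel biomarker for predictive use",
        evidence="highest level of evidence, ideally randomised clinical trials",
    ),
}


def describe(cls: EbaiClass) -> str:
    """Return a one-line summary of a class as characterised in the interview."""
    profile = PROFILES[cls]
    return f"Class {cls.value}: {profile.role}; evidence: {profile.evidence}"


for cls in EbaiClass:
    print(describe(cls))
```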
The EBAI framework adds to other recent ESMO initiatives such as ESMO Guidance on the Use of Large Language Models in Clinical Practice (ELCAP) to build consensus and set shared standards to improve uptake and trust in AI-driven cancer care. What are the next steps?
It is important to start an open dialogue about the use of AI in oncology. On the one hand, we must acknowledge the immense potential of AI tools, particularly at a time when Europe is facing a shortage of around 1.2 million healthcare workers. On the other hand, we cannot rush these technologies into clinical practice at the risk of disregarding the rigorous methods that have long guided clinical trials and evidence generation.
That is why ESMO’s commitment to developing frameworks and to building a narrative around AI in oncology is so important; it signals that we are committed to developing a new interdisciplinary field responsibly, together with professionals from different fields, for the ultimate benefit of our patients.
There are still many uncertainties, but they can be overcome through collaboration among developers, regulators, and the entire scientific community. My hope is that this initiative can serve as a starting point for continued dialogue and help establish a lasting collaboration for innovation and patient-centered progress.