Extensive research is focusing on the development of tools that may enhance drug development and personalised oncology in the near future
There has been a recent move towards a more generalist biomedical artificial intelligence (AI) that integrates the capabilities of various models. These latest iterations of biomedical AI have agentic capabilities: they are systems that can act independently while communicating and working with each other in a goal-driven manner, using text as the universal interface. This shift from models that simply read data to models capable of intelligent reasoning to answer medical questions offers the potential to assist in complex tasks across a broad range of applications, from cell biology research to drug discovery and drug design.
Med-PaLM was the first large language model (LLM) to enter the clinical domain, having undergone extensive and broad-ranging benchmarking of its ability to respond successfully to medical questions (Nature. 2023;620:172–180). While Med-PaLM was comparable to clinicians in its ability to correctly retrieve medical knowledge and in the low likelihood of harm if people acted on the generated answers, the model was outperformed by clinicians in other aspects, such as non-medical experts’ evaluation of the helpfulness of generated answers. Med-PaLM then formed the basis for further medical domain-specific training and fine-tuning to create Med-PaLM 2, which surpassed its predecessor in its ability to provide long-form responses to consumer medical questions, and even provided responses preferred by specialist clinicians 65% of the time compared with responses generated by generalist clinicians (Nat Med. 2025;31:943–950). Overall, however, responses provided by specialist clinicians were preferred to Med-PaLM 2’s responses by both specialist and generalist clinicians, and researchers continued to strive towards a model with clinician-level performance. A further advance came with the addition of multimodal capabilities, leading to the development of Med-PaLM M (NEJM AI. 2024;1(3)), which is described as a generalist biomedical AI system capable of interpreting multimodal biomedical data and handling a range of tasks, such as generating reports on chest X-ray findings that were judged accurate by radiologists and clinicians.
Other medicine-specific multimodal models, such as Med-Gemini and Med-Gemma, were developed to interpret diverse medical data, with tasks including classifying medical images, answering questions on 2D and 3D medical images, generating reports, predicting genomic risk and offering differential diagnoses. More recently, further advances have led to the development of TxGemma, a generalist LLM designed to assist in the drug development process by, for example, predicting the properties of a potential therapeutic molecule to ascertain its likely adverse events in clinical trials.
By teaching LLMs the language of biology, Cell2Sentence (C2S) promises to enhance our understanding of cellular heterogeneity by analysing single-cell RNA sequences using its large capacity and multimodality, and to pave the way for the integration of transcriptomic data, natural language and contextual information. It is hoped that LLMs could accelerate the discovery of novel treatments by screening thousands of potential therapies virtually, by simulating perturbation responses instantly, or even by generating new hypotheses and testing these in the laboratory. LLMs therefore have the potential to speed up drug discovery and early-stage drug development and to drastically reduce their costs.
A key emerging opportunity for LLMs in the cancer arena is truly personalised oncology. The prospect of moving beyond simple biomarkers to more comprehensive patient modelling that integrates an individual’s unique genomic profile with their clinical history might offer the ability to tailor treatment strategies on a patient-by-patient basis in the future. However, there are obstacles, not least of which is data fragmentation. While we can train LLMs on enormous datasets comprising millions of samples, we still lack high-quality, paired multimodal biomedical datasets at the scale needed to train these models adequately. Data of this nature tend to be siloed, and it would require a concerted effort among multiple stakeholders to collate datasets of sufficiently high quality. Validation of the quality and safety of LLMs is another challenge that must be addressed before these models can be deployed in a clinical setting.
The LLMs of the future will be smarter than existing models. They will not only be able to use advanced reasoning to respond to our clinical questions and so help support patient management, but will also act as a reliable research partner, helping us to better understand disease and its treatment more quickly and more cost-effectively than is possible today.
Programme details
Azizi S. Biomedical AI agents: From reasoning to discovery. ESMO AI & Digital Oncology Congress 2025