While synthetic real-world data may help overcome current challenges in clinical trials, it also introduces new complexities
Artificial intelligence (AI) is rapidly transforming the landscape of clinical research, and one area of growing interest is the generation of synthetic real-world data (sRWD). By leveraging advanced AI models, researchers can now generate synthetic datasets that closely mirror the statistical properties of actual patient populations, widening access to high-quality data while overcoming longstanding privacy and regulatory barriers (NPJ Digit Med. 2023;6:186).
The promise of AI-generated synthetic cohorts lies in their ability to facilitate data sharing, a critical challenge in both academic- and industry-led research. Real-world datasets from electronic medical records and clinical trial repositories provide the foundation for discovery and hypothesis testing, but access is often hindered by stringent privacy laws and administrative burdens. Synthetic data circumvents these obstacles by generating artificial patient profiles that retain cohort-level fidelity without directly exposing sensitive information. A recent study presented at the ESMO Congress 2025, involving over 19,000 patients with metastatic breast cancer, showed that AI models such as conditional tabular generative adversarial networks (CTGANs) and classification and regression trees (CART) can create synthetic datasets highly faithful to the original populations, achieving strong agreement in survival outcome analyses while quantifying and mitigating re-identification risks (Abstract 3136O).
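To make "cohort-level fidelity" and "re-identification risk" more concrete, the following is a minimal sketch of two common checks, not the method used in the study cited above. It assumes a hypothetical toy cohort with two numeric variables and uses a deliberately simple stand-in generator (independent normal draws per column) in place of CTGAN or CART. Fidelity is summarised by the standardised mean difference per column, and privacy by the distance from each synthetic record to its closest real record. All names and values are illustrative assumptions.

```python
import math
import random
import statistics


def standardized_mean_difference(real, synth):
    """Absolute difference in means, divided by the pooled standard deviation."""
    diff = abs(statistics.mean(real) - statistics.mean(synth))
    pooled_sd = math.sqrt((statistics.variance(real) + statistics.variance(synth)) / 2)
    if pooled_sd == 0:
        return 0.0 if diff == 0 else math.inf
    return diff / pooled_sd


def distance_to_closest_record(synth_rows, real_rows):
    """Euclidean distance from each synthetic row to its nearest real row."""
    return [min(math.dist(s, r) for r in real_rows) for s in synth_rows]


# Hypothetical toy cohort: (age in years, follow-up in months). Not real patient data.
rng = random.Random(0)
real_rows = [(rng.gauss(60, 10), rng.gauss(24, 6)) for _ in range(200)]

# Stand-in generator (NOT CTGAN or CART): sample each column from a fitted normal.
columns = list(zip(*real_rows))
fitted = [(statistics.mean(c), statistics.stdev(c)) for c in columns]
synth_rows = [tuple(rng.gauss(mu, sd) for mu, sd in fitted) for _ in range(200)]

# Fidelity: one standardised mean difference per column (smaller is closer).
smd = [standardized_mean_difference(r, s) for r, s in zip(columns, zip(*synth_rows))]

# Privacy: a distance of 0 means a synthetic row duplicates a real one.
dcr = distance_to_closest_record(synth_rows, real_rows)
exact_copies = sum(d == 0 for d in dcr)

print("SMD per column:", [round(v, 3) for v in smd])
print("Median distance to closest record:", round(statistics.median(dcr), 3))
print("Exact copies of real rows:", exact_copies)
```

In a real evaluation, variables would normally be rescaled before computing distances, and agreement would also be checked on outcomes such as survival curves, as in the study described above. The sketch only illustrates why fidelity and privacy pull in opposite directions: a generator that copies real records scores perfectly on fidelity but produces distances of zero.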
Key applications of synthetic data in clinical research include the development of synthetic control arms and the simulation of trial scenarios (PLOS Digit Health. 2025;4:e0000581). In oncology in particular, where traditional randomised controlled trials can be prohibitively slow or ethically contentious, sRWD enables researchers to generate control cohorts that closely match real patients. This approach can reduce patient burden, speed up recruitment, and provide robust external comparators while maintaining compliance with privacy guidelines.
Further, the creation of synthetic datasets can help mitigate data imbalances and biases, which are common limitations in real-world medical repositories. By generating more diverse and representative cohorts, AI models can be trained to better predict outcomes across different groups, supporting equity and generalisability in clinical decision-making. Regulatory authorities are increasingly engaging with these approaches, recognising the role of sRWD not only in research and development, but also in the evolution of evidence standards for new therapies (Comput Struct Biotechnol J. 2025;28:190–198).
Despite these advances, barriers to widespread adoption persist. The lack of harmonised legal definitions, ongoing technical debates about fidelity versus privacy, and the need for robust validation frameworks all hinder the routine integration of sRWD into research. Clinical trust and regulatory endorsement remain paramount. Progress will require multidisciplinary collaboration among clinicians, data scientists and regulators to establish international standards and transparent evaluation criteria.
AI-generated sRWD stands poised to accelerate clinical research, enabling innovation while safeguarding patient privacy. Responsible, validated adoption of sRWD offers a path towards more agile, collaborative and inclusive science – an imperative as medicine moves further into the digital era.
The first ESMO AI & Digital Oncology Congress 2025, taking place from 12 to 14 November, will provide a dedicated platform for exploring the latest advances in AI and digital technologies in cancer care.