Research unlocks insights into lung cancer evolution from electronic medical records


A domain-specific natural language processing (NLP) pipeline showed strong performance in extracting clinically meaningful information from diverse clinical documents of patients with non-small cell lung cancer.

Electronic medical records (EMRs) are a rich but heterogeneous data source for monitoring cancer trajectories and refining treatment strategies according to tumour evolution; however, their full potential remains constrained by the limited ability of current technologies to interpret unstructured data. Natural language processing (NLP) has emerged as a promising way to extract clinical information from oncologists' narrative documentation, as confirmed by a study published in ESMO Real World Data and Digital Oncology (ESMO Real World Data and Digital Oncology, 2026; Vol 11, 100660).

At the University Hospital of Toulouse, France, a domain-specific NLP pipeline was developed by combining rule-based and machine learning algorithms to identify key variables while preserving contextual attributes, including temporality, data granularity and clinical certainty (for example, whether an event is hypothetical or uncertain). The pipeline was used to analyse 1,028 clinical documents (discharge summaries and external consultation letters) from 120 patients with non-small cell lung cancer treated with oral targeted therapy. Its performance was then assessed by comparing the automatically extracted facts with expert-curated annotations, which served as the reference standard.
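To illustrate what "preserving contextual attributes" can mean in practice, the sketch below shows a toy rule-based tagger that attaches certainty, date and date granularity to each tumour-evolution concept it finds in a sentence. It is not the Toulouse pipeline: the concept lexicon, the uncertainty cue words, the dd/mm/yyyy date format and the names `extract_facts` and `Fact` are all hypothetical assumptions, and a real system would also use machine learning components and wider context than a single sentence.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical lexicon mapping surface forms to normalised tumour-evolution labels.
CONCEPTS = {
    "progression": "PROGRESSION",
    "progressive disease": "PROGRESSION",
    "stable disease": "STABLE",
    "partial response": "RESPONSE",
    "complete response": "RESPONSE",
}
# Hypothetical cue words marking a hypothetical or uncertain mention.
UNCERTAINTY_CUES = ("suspected", "possible", "probable", "cannot exclude", "suggestive of")
# Dates written as dd/mm/yyyy or mm/yyyy (day-first ordering is an assumption).
DATE_RE = re.compile(r"\b(?:(\d{1,2})/)?(\d{1,2})/(\d{4})\b")


@dataclass
class Fact:
    concept: str            # normalised concept label
    certainty: str          # "affirmed" or "uncertain"
    date: Optional[str]     # "YYYY-MM-DD" or "YYYY-MM", None if no date found
    granularity: str        # "day", "month" or "unknown"


def extract_facts(sentence: str) -> List[Fact]:
    """Tag every lexicon concept in one sentence with certainty and date context."""
    text = sentence.lower()
    # Sentence-level scope: any cue word marks all concepts in the sentence as uncertain.
    uncertain = any(cue in text for cue in UNCERTAINTY_CUES)

    # Take the first date in the sentence and record how precise it is.
    match = DATE_RE.search(text)
    if match:
        day, month, year = match.groups()
        if day:
            date, granularity = f"{year}-{int(month):02d}-{int(day):02d}", "day"
        else:
            date, granularity = f"{year}-{int(month):02d}", "month"
    else:
        date, granularity = None, "unknown"

    facts: List[Fact] = []
    seen = set()
    for surface, label in CONCEPTS.items():
        if surface in text and label not in seen:  # one fact per label per sentence
            seen.add(label)
            certainty = "uncertain" if uncertain else "affirmed"
            facts.append(Fact(label, certainty, date, granularity))
    return facts


if __name__ == "__main__":
    print(extract_facts("CT on 12/03/2021 shows suspected progression."))
    print(extract_facts("Partial response confirmed in 06/2022."))
```

Keeping certainty and granularity as separate fields, rather than discarding uncertain or vaguely dated mentions, lets downstream analyses decide how to treat them when reconstructing a patient's timeline.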

The domain-adapted NLP solution achieved an F1-score of 79.7% for tumour evolution concept extraction and 62.0% for temporality alignment. Commenting on these findings, Dr Rodrigo Dienstmann, of Oncoclínicas & Co, Brazil, and Vall d’Hebron Institute of Oncology, Spain, and Editor-in-Chief of the peer-reviewed journal ESMO Real World Data and Digital Oncology, highlights that NLP-based strategies have the potential to support the reconstruction of real-world endpoints, which is critical for optimising cancer care.
“Progress in real-world oncology analytics depends on bridging documentation and data,” he notes. “NLP must not only read what clinicians write but also reconstruct longitudinal trajectories in a way that is scalable, interoperable, and sufficiently robust to inform outcomes research.”
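The F1-score is the harmonic mean of precision (the share of extracted facts that match the reference annotations) and recall (the share of reference facts that were extracted). The sketch below computes it under an exact-match assumption using small, made-up fact sets; the study may have used different matching rules, and the idea of scoring temporality by adding the date to each fact tuple is an illustrative assumption, not the authors' stated protocol.

```python
from typing import Hashable, Set, Tuple


def prf1(predicted: Set[Hashable], gold: Set[Hashable]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 between extracted and reference facts."""
    tp = len(predicted & gold)  # facts present in both sets
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical toy data: (patient, concept, month) tuples, not study data.
gold = {("p1", "PROGRESSION", "2021-03"), ("p1", "RESPONSE", "2020-11"),
        ("p2", "STABLE", "2022-01"), ("p2", "PROGRESSION", "2022-06")}
pred = {("p1", "PROGRESSION", "2021-03"), ("p1", "RESPONSE", "2020-12"),
        ("p2", "STABLE", "2022-01")}

# Concept-level score: ignore the date element of each tuple.
concept_f1 = prf1({f[:2] for f in pred}, {f[:2] for f in gold})[2]
# Temporality-aware score: the date must match as well.
temporal_f1 = prf1(pred, gold)[2]
print(round(concept_f1, 3), round(temporal_f1, 3))
```

In this toy case one response is dated a month late, so it counts as correct at concept level but as an error once dates must also agree, which is one way a temporality score can fall below a concept-extraction score.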
