Publication: Onco-Retriever: Generative Classifier for Retrieval of EHR Records in Oncology

Hrituraj Singh
Apr 20, 2024
1 min read

Quick Summary

Our three Onco-Retriever models consistently outperform baseline models (Ada, Mistral, and PubMedBERT) in precision and recall across 13 oncology concepts, and with lower latency.

Links

arXiv (full PDF)

Abstract

Retrieving information from EHR systems is essential for answering specific questions about patient journeys and improving the delivery of clinical care. Despite this, most EHR systems still rely on keyword-based searches. With the advent of generative large language models (LLMs), retrieving information can lead to better search and summarization capabilities. Such retrievers can also feed retrieval-augmented generation (RAG) pipelines to answer any query. However, retrieving information from real-world clinical data contained within EHR systems to solve several downstream use cases is challenging because query-document support pairs are difficult to create. We provide a blueprint for creating such datasets in an affordable manner using large language models. Our method results in a retriever that is 30-50 F-1 points better than proprietary counterparts such as Ada and Mistral for oncology data elements. We further compare our model, called Onco-Retriever, against fine-tuned PubMedBERT models. We conduct extensive manual evaluation along with latency analysis of the different models and provide a path forward for healthcare organizations to build domain-specific retrievers. All our experiments were conducted on real patient EHR data.
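For readers less familiar with the metrics quoted above, here is a minimal sketch of how precision, recall, and F-1 can be computed for a single retrieval query. It is not the paper's evaluation code: the function name `precision_recall_f1` and the chunk IDs are hypothetical, and the sketch assumes relevance is judged per retrieved chunk as a simple yes/no label.

```python
from typing import Set, Tuple


def precision_recall_f1(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float, float]:
    """Compute set-based precision, recall, and F-1 for one query.

    `retrieved` holds the chunk IDs the retriever returned;
    `relevant` holds the chunk IDs annotated as relevant.
    """
    true_positives = len(retrieved & relevant)
    # Guard against empty sets so the function never divides by zero.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    denominator = precision + recall
    # F-1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Hypothetical example: four relevant chunks, three retrieved, two of them correct.
relevant_chunks = {"note_03", "note_07", "note_12", "note_20"}
retrieved_chunks = {"note_03", "note_07", "note_15"}

p, r, f1 = precision_recall_f1(retrieved_chunks, relevant_chunks)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

On this 0-to-1 scale, a difference of 30 F-1 points corresponds to 0.30.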
