Leveraging Natural Language Processing for Automated Extraction of Electronic Health Records: Enhancing Clinical Research and Patient Care

Sqilline establishing Danny Platform technology and data analytics services in Romania

In the digital era, the shift from paper-based medical records to electronic health records (EHRs) has revolutionized healthcare practice. Digitizing medical records and extracting data from them is of great importance for clinical research, patient recruitment for clinical trials, and the overall improvement of patient care, especially in the age of value-based healthcare.

Challenges in automated extraction of medical text in various languages

A major challenge arises from the fact that over 80% of data within EHRs exists as unstructured text. This creates hurdles for efficient analysis and utilization of the information contained within medical records. Furthermore, most studies have focused on English-language records, leaving numerous linguistic challenges unaddressed for inflected languages with non-Latin alphabets, such as Slavic languages written in Cyrillic. In addition, apart from manual annotation errors, other errors are known to occur in medical records and registries, with error rates as high as 27%.

This blog post explores the deep-learning-based natural language processing algorithms developed by Sqilline, building on our original paper “Clinical Data Extraction and Normalization of Cyrillic Electronic Health Records Via Deep-Learning Natural Language Processing” published by the American Society of Clinical Oncology (ASCO) in 2019.

Sqilline’s research team encountered various difficulties during the analysis and publication process, including the need for rigorous data verification and specialist oncologists’ involvement in reviewing information. While this approach ensured reliable results, it became increasingly time-consuming as the volume of medical records grew. To address this, Sqilline recognized the transformative potential of Natural Language Processing (NLP) and machine learning (ML) algorithms in streamlining data processing and verification while maintaining high-quality standards.

We demonstrated that, using a deep-learning-based NLP algorithm, we can achieve high performance (AUC ≥ 0.98) in the extraction and normalization of biomarker values in EHRs that contain mixed languages and show substantial heterogeneity in the positions and values of the target parameters. “Four years later, we see in practice the effectiveness of our technology and the importance of NLP in other countries with a variety of languages, and Romania is the latest addition to our pipeline. Sqilline’s Danny Platform now automatically extracts, normalizes, structures, and analyzes data from EHRs in major Romanian hospitals,” said Desislava Mihaylova, CEO and Founder of Sqilline.
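For readers less familiar with the AUC metric quoted above, here is a minimal sketch of how the area under the ROC curve can be computed from a model's scores. It is not Sqilline's evaluation code: the function name `roc_auc` and the sample labels and scores are hypothetical and chosen only for illustration. AUC is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, so a value of 0.98 or more means the model ranks almost every positive above almost every negative.

```python
def roc_auc(labels, scores):
    """Return the ROC AUC: the fraction of (positive, negative) pairs
    in which the positive example has the higher score (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical data: 1 = biomarker value correctly present, 0 = absent.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.95, 0.90, 0.40, 0.60, 0.20, 0.10]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise form is quadratic in the number of examples, which is fine for an illustration; production evaluations would typically rely on an established library routine instead.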

Sqilline transforms data processing and enhances quality

The integration of deep-learning-based NLP algorithms into the extraction and analysis of medical records has revolutionized the field, allowing for increased productivity and improved outcomes. “By automating the initial stages of data processing, our deep-learning-based NLP algorithms enable researchers to handle a vast amount of unstructured medical text efficiently. This significantly reduces the burden on healthcare professionals, freeing up their time for more critical tasks,” said Mihail Jekov, CTO of Sqilline.

Moreover, deep-learning-based NLP algorithms facilitate the incorporation of domain-specific knowledge and expertise. Healthcare professionals can “work in tandem” with algorithms, guiding the development of rules and criteria specific to their medical domain, based on treatment durations and therapy effectiveness. This collaborative approach ensures that the automated process aligns with clinical expertise, maintaining high levels of accuracy and quality in the analysis.
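As a rough illustration of what a clinician-defined rule based on treatment duration could look like once the data has been structured, consider the sketch below. Everything in it is an assumption made for the example: the `TherapyRecord` fields, the `flag_short_therapies` helper, and the 21-day threshold are hypothetical and do not describe the Danny Platform's actual rules or data model.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class TherapyRecord:
    """Hypothetical structured record produced after extraction."""
    patient_id: str
    start: date
    end: date


def flag_short_therapies(records, min_days):
    """Return patient IDs whose therapy lasted fewer than min_days days,
    so that a specialist can review them manually."""
    return [r.patient_id for r in records if (r.end - r.start).days < min_days]


# Hypothetical example data; the threshold would be set by a clinician.
records = [
    TherapyRecord("P1", date(2023, 1, 1), date(2023, 1, 10)),  # 9 days
    TherapyRecord("P2", date(2023, 1, 1), date(2023, 4, 1)),   # 90 days
]
flagged = flag_short_therapies(records, min_days=21)
print(flagged)
```

Keeping the threshold as a parameter reflects the division of labor described above: the specialist decides the criterion, and the automated step applies it consistently across all records.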

The rise of large language models (LLMs)

Leveraging the power of well-trained LLMs, NLP and ML algorithms can automate a significant portion of data processing and verification. This integration offers two key advantages. First, it enables faster processing of larger amounts of information, ensuring efficiency and scalability. Second, it makes better use of medical specialists' expertise, allowing them to focus on vital tasks such as defining requirements, finalizing analysis conclusions, and advising on population selection criteria.

“At Sqilline, we are currently discussing the potential medical applications of large language models, along with the challenges they present and future directions. An essential focus of our discussions is to ensure that these models deliver precise and personalized insights that aid in human decision-making and improve patient outcomes,” commented Mihail Jekov.

Data extraction from unstructured data sources is critical for clinical research, for identifying eligible patients for clinical trial enrollment, and for monitoring treatment outcomes in value-based care, not only in oncology but across all fields of medicine. Approaches such as the one we are developing at Sqilline are critical for automating data processing, improving the accuracy of data extracted from multilingual EHRs, and ultimately supporting decision-making in healthcare.
