RELATE: Relation Extraction in Biomedical Abstracts with LLMs and Ontology Constraints
Proceedings of the Fifth Machine Learning for Health Symposium, PMLR 297:1178-1193, 2026.
Abstract
Biomedical knowledge graphs (KGs) are vital for drug discovery and clinical decision support but remain incomplete. Large language models (LLMs) excel at extracting biomedical relations, yet their outputs lack standardization and alignment with ontologies, which limits the integration of free-text extractions into KGs. We introduce RELATE, a three-stage pipeline that maps LLM-extracted relations to standardized ontology predicates, e.g., those of the Biolink Model. The pipeline includes: (1) ontology preprocessing with predicate embeddings, (2) similarity-based retrieval enhanced with SapBERT, and (3) LLM-based reranking with explicit negation handling. This approach transforms relation extraction from free-text outputs into structured, ontology-constrained representations. On the ChemProt benchmark, RELATE achieves 52% exact match and 94% accuracy@10, and on 2,400 HEAL Project abstracts it effectively rejects irrelevant associations (0.4%) and identifies negated assertions. RELATE captures nuanced biomedical relationships while ensuring quality for KG augmentation. By combining vector search with contextual LLM reasoning, RELATE provides a scalable, semantically accurate framework for converting unstructured biomedical literature into standardized KGs.
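The three stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the predicate list is a small illustrative subset, the character-trigram `embed` function is a toy stand-in for a learned encoder such as SapBERT, and `rerank` is a rule-based placeholder for the step where the pipeline would query an LLM. All function names, the negation cue list, and the `min_score` threshold are assumptions made for the example. The `accuracy_at_k` helper shows one common way to compute an accuracy@k metric, which may differ in detail from the paper's evaluation.

```python
import math
import re
from collections import Counter

# Illustrative predicate labels only; a real run would load the full ontology.
PREDICATES = ["treats", "causes", "affects", "interacts with", "related to"]

# Assumed negation cues; the paper's actual negation handling is done by an LLM.
NEGATION = re.compile(
    r"\b(?:does not|did not|do not|not|no|never|fails? to|failed to)\b",
    re.IGNORECASE,
)


def embed(text):
    """Toy embedding: character-trigram counts (stand-in for SapBERT vectors)."""
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_index(predicates):
    """Stage 1: precompute one embedding per ontology predicate."""
    return {label: embed(label) for label in predicates}


def split_negation(relation):
    """Return (negated, relation with negation cues removed)."""
    negated = bool(NEGATION.search(relation))
    core = " ".join(NEGATION.sub(" ", relation).split())
    return negated, core


def retrieve(relation, index, k=10):
    """Stage 2: rank predicates by similarity to the extracted relation."""
    query = embed(relation)
    scored = [(label, cosine(query, vec)) for label, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def rerank(candidates, min_score=0.3):
    """Stage 3 placeholder: keep the top candidate or reject the relation.

    In the described pipeline an LLM would choose among the candidates using
    the sentence context; here a score threshold stands in for that decision.
    """
    if candidates and candidates[0][1] >= min_score:
        return candidates[0][0]
    return None


def map_relation(relation, index, k=10):
    """Map one free-text relation to a predicate, tracking negation separately."""
    negated, core = split_negation(relation)
    candidates = retrieve(core, index, k)
    return {"predicate": rerank(candidates), "negated": negated,
            "candidates": [label for label, _ in candidates]}


def accuracy_at_k(ranked_lists, gold, k):
    """Fraction of examples whose gold predicate appears in the top k candidates."""
    hits = sum(g in ranked[:k] for ranked, g in zip(ranked_lists, gold))
    return hits / len(gold)


INDEX = build_index(PREDICATES)

if __name__ == "__main__":
    print(map_relation("does not treat", INDEX))
    print(map_relation("xyz", INDEX))
```

Separating the negation flag from the predicate lookup keeps the similarity search focused on the relation itself, so a negated assertion still maps to its positive predicate and can be filtered or stored with a negation marker downstream. Returning `None` for low-scoring relations mirrors the idea of rejecting associations that fit no ontology predicate.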