An Integrated Database and Smart Search Tool for Medical Knowledge Extraction from Radiology Teaching Files


Priya Deshpande, Alexander Rasin, Eli Brown, Jacob Furst, Daniela Raicu, Steven Montner, Samuel Armato III
Proceedings of The First Workshop Medical Informatics and Healthcare held with the 23rd SIGKDD Conference on Knowledge Discovery and Data Mining, PMLR 69:10-18, 2017.


Accurate and timely diagnosis is crucial for effective medical treatment. Teaching files are widely used by radiologists as a resource in the diagnostic process and to teach students of radiology. Teaching files contain images, recorded discussion and notes, external references, augmenting annotations, and patient history. Most hospitals maintain an active collection of teaching files for their internal purposes, and many teaching files are also publicly available through online sources; these sources typically provide a basic keyword search interface but little else that can help physicians find the most relevant examples. Other secondary sources (e.g., journals or radiology textbooks) might also be referenced from a teaching file or provide an independent source of information; however, journal and textbook search capabilities, if available, can be very ad hoc and even more limited than those of public teaching file repositories. Therefore, in order to access multiple resources, radiologists need to manually navigate each source and aggregate the search results into a full answer. In this paper, we describe our integration of multiple public data sources into a unified medical resource repository and the design of advanced search features that make it easier to find relevant teaching files as well as journals or textbooks. Our approach supports incorporating diverse public data that can be further combined with a hospital’s in-house teaching files to provide an integrated radiological knowledge repository. We tested our Integrated Radiological Image Search (IRIS) engine using a set of representative queries. Our search engine finds more accurate and relevant results than the search engines available for public data sources. The IRIS engine is tailored to facilitate understanding of natural language queries, including negation statements, synonym terms, adjectives, and different sources of text. In addition, the search engine is designed to allow further integration of an image-based search module for finding visually similar cases.
