Open Access, Proceedings Article

Free-text medical document retrieval via phrase-based vector space model.

TLDR
This work proposes a phrase-based vector space model, in which documents are represented by phrases made up of concepts and word stems, and shows that it yields a 16% increase in retrieval accuracy over the stem-based model.
Abstract
Many information retrieval systems are based on the vector space model (VSM), which represents a document as a vector of index terms. Concepts have been proposed as replacements for word stems as index terms to improve retrieval accuracy. However, past research revealed that such systems did not outperform traditional stem-based systems. Incorporating conceptual similarity derived from knowledge sources has the potential to improve retrieval accuracy, yet the incompleteness of the knowledge sources precludes significant improvement. To remedy this problem, we propose to represent documents using phrases. A phrase consists of multiple concepts and word stems. The similarity between two phrases is jointly determined by their conceptual similarity and their common word stems. Document similarity can in turn be derived from phrase similarities. Using OHSUMED as the test collection and UMLS as the knowledge source, our experimental results show that the phrase-based VSM yields a 16% increase in retrieval accuracy compared to the stem-based model.
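
The abstract does not give the formulas, so the following is only a minimal sketch of the idea, not the authors' method. It assumes that phrase similarity is the larger of the best concept-pair similarity and the Jaccard overlap of word stems, and that document similarity is a cosine-style ratio over weighted phrase pairs. The `Phrase` type, the concept identifiers, and the toy `CONCEPT_SIM` table (standing in for a knowledge source such as UMLS) are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Phrase:
    """A phrase as a set of concepts plus a set of word stems (hypothetical type)."""
    concepts: frozenset
    stems: frozenset


# Toy stand-in for similarities derived from a knowledge source (assumed values).
CONCEPT_SIM = {
    frozenset({"C_heart_attack", "C_myocardial_infarction"}): 0.9,
}


def concept_similarity(a, b):
    """Return 1.0 for identical concepts, else a table lookup (0.0 if unknown)."""
    if a == b:
        return 1.0
    return CONCEPT_SIM.get(frozenset({a, b}), 0.0)


def phrase_similarity(p, q):
    """Combine conceptual similarity with stem overlap (assumed rule: take the max)."""
    conceptual = max(
        (concept_similarity(a, b) for a in p.concepts for b in q.concepts),
        default=0.0,
    )
    union = p.stems | q.stems
    stem_overlap = len(p.stems & q.stems) / len(union) if union else 0.0
    return max(conceptual, stem_overlap)


def document_similarity(x, y):
    """Cosine-style similarity between two {Phrase: weight} mappings."""
    def bilinear(u, v):
        return sum(
            wu * wv * phrase_similarity(p, q)
            for p, wu in u.items()
            for q, wv in v.items()
        )

    denom = math.sqrt(bilinear(x, x) * bilinear(y, y))
    return bilinear(x, y) / denom if denom else 0.0


# Two phrases that share no stems but map to similar concepts.
heart_attack = Phrase(frozenset({"C_heart_attack"}), frozenset({"heart", "attack"}))
infarction = Phrase(
    frozenset({"C_myocardial_infarction"}), frozenset({"myocardi", "infarct"})
)
print(document_similarity({heart_attack: 1.0}, {infarction: 1.0}))  # prints 0.9
```

Under these assumptions, a purely stem-based comparison of the two example phrases would score zero, while the phrase-based comparison recovers their relatedness through the concept table. When the knowledge source has no entry for a pair, the stem overlap still contributes, which reflects the motivation given in the abstract for combining both signals.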

