Open Access Proceedings Article
Free-text medical document retrieval via phrase-based vector space model.
Wenlei Mao, Wesley W. Chu
- pp 489-493
TLDR
This work proposes to represent documents using phrases within a vector space model (VSM), and shows that phrase-based VSM yields a 16% increase in retrieval accuracy compared to the stem-based model.
Abstract:
Many information retrieval systems are based on the vector space model (VSM), which represents a document as a vector of index terms. Concepts have been proposed to replace word stems as the index terms to improve retrieval accuracy. However, past research revealed that such systems did not outperform the traditional stem-based systems. Incorporating conceptual similarity derived from knowledge sources should have the potential to improve retrieval accuracy, yet the incompleteness of the knowledge source precludes significant improvement. To remedy this problem, we propose to represent documents using phrases. A phrase consists of multiple concepts and word stems. The similarity between two phrases is jointly determined by their conceptual similarity and their common word stems. The document similarity can in turn be derived from phrase similarities. Using OHSUMED as a test collection and UMLS as the knowledge source, our experimental results reveal that phrase-based VSM yields a 16% increase in retrieval accuracy compared to the stem-based model.
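The phrase-based idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything here is an assumption: phrase similarity is taken as the maximum of the best concept-to-concept similarity and the Jaccard overlap of stems, and document similarity is a cosine-style normalization over all phrase pairs. The `Phrase` class, the concept identifiers, and the `CONCEPT_SIM` table are hypothetical stand-ins for a knowledge source such as UMLS.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Phrase:
    concepts: frozenset  # concept identifiers (hypothetical IDs, not real UMLS CUIs)
    stems: frozenset     # word stems that make up the phrase


# Toy stand-in for a knowledge source; the pair and its score are invented.
CONCEPT_SIM = {
    frozenset({"C_hypertension", "C_high_blood_pressure"}): 0.9,
}


def concept_similarity(a, b):
    """Similarity of two concepts: 1.0 if identical, else a table lookup."""
    if a == b:
        return 1.0
    return CONCEPT_SIM.get(frozenset({a, b}), 0.0)


def phrase_similarity(p, q):
    """Assumed combination: max of conceptual similarity and stem overlap."""
    conceptual = max(
        (concept_similarity(a, b) for a in p.concepts for b in q.concepts),
        default=0.0,
    )
    union = p.stems | q.stems
    stem_overlap = len(p.stems & q.stems) / len(union) if union else 0.0
    return max(conceptual, stem_overlap)


def document_similarity(d1, d2):
    """Cosine-style similarity where phrase pairs contribute by their similarity.

    Each document is a dict mapping Phrase -> weight.
    """
    def inner(x, y):
        return sum(
            wx * wy * phrase_similarity(p, q)
            for p, wx in x.items()
            for q, wy in y.items()
        )

    norm = sqrt(inner(d1, d1) * inner(d2, d2))
    return inner(d1, d2) / norm if norm else 0.0


hyper = Phrase(frozenset({"C_hypertension"}), frozenset({"hypertens"}))
hbp = Phrase(frozenset({"C_high_blood_pressure"}),
             frozenset({"high", "blood", "pressur"}))
therapy = Phrase(frozenset({"C_therapy"}), frozenset({"therapi"}))

query = {hyper: 1.0, therapy: 1.0}
doc = {hbp: 1.0, therapy: 1.0}
score = document_similarity(query, doc)
```

In this toy setup the query and the document share no stems for the first phrase, so a purely stem-based comparison would credit only the "therapy" match; the assumed concept table lets the "hypertension" and "high blood pressure" phrases contribute as well, giving a score of roughly 0.95.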
Citations
Journal Article
A framework for unifying ontology-based semantic similarity measures
TL;DR: This paper presents a unifying framework that aims to improve the understanding of semantic measures, to highlight their equivalences, and to propose bridges between their theoretical bases; it unifies a large number of state-of-the-art semantic similarity measures through common expressions.
Journal Article
Semantic Similarity from Natural Language and Ontology Analysis
TL;DR: Semantic measures, as discussed by the authors, assess the similarity or relatedness of semantic entities by taking into account their semantics, i.e. their meaning; intuitively, the words tea and coffee, which both refer to stimulating beverages, will be estimated to be more semantically similar than the words toffee (confection) and coffee, even though the latter pair has a higher syntactic similarity.
Journal Article
Measuring Semantic Similarity Between Biomedical Concepts Within Multiple Ontologies
Hisham Al-Mubaid, H.A. Nguyen
TL;DR: A new ontology-structure-based technique for measuring semantic similarity within a single ontology and across multiple ontologies in the biomedical domain, within the framework of the Unified Medical Language System (UMLS).
Journal Article
Knowledge-based vector space model for text clustering
TL;DR: A new similarity measure is defined that combines the edge-counting technique, the average distance, and the position weighting method to compute the similarity of two terms from an ontology hierarchy; this similarity is used to re-weight term frequency in the VSM.
Book Chapter
A comparative study of ontology based term similarity measures on PubMed document clustering
TL;DR: This paper evaluates term re-weighting as an important method for integrating a domain ontology into the clustering process; results on 8 different semantic measures show that no single type of similarity measure significantly outperforms the others.
References
Book
Introduction to Modern Information Retrieval
Gerard Salton, Michael J. McGill
Journal Article
Introduction to WordNet: An On-line Lexical Database
TL;DR: Standard alphabetical procedures for organizing lexical information put together words that are spelled alike and scatter words with similar or related meanings haphazardly through the list.
Journal Article
Efficient string matching: an aid to bibliographic search
TL;DR: A simple, efficient algorithm to locate all occurrences of any of a finite number of keywords in a string of text that has been used to improve the speed of a library bibliographic search program by a factor of 5 to 10.
Development of a Stemming Algorithm
TL;DR: A new version of a context-sensitive, longest-match stemming algorithm for English is proposed; though developed for use in a library information transfer system, it is of general application.
Journal Article
Introduction to the special issue on word sense disambiguation: the state of the art
Nancy Ide, Jean Véronis
TL;DR: The authors review the state of research in this field over the past 50 years and consider the next steps to be taken.