Free-text medical document retrieval via phrase-based vector space model.

Open Access Proceedings Article

- pp 489-493

TLDR

This work proposes representing documents with phrases, each consisting of multiple concepts and word stems, in place of the single index terms of the standard vector space model (VSM), and shows that the phrase-based VSM yields a 16% increase in retrieval accuracy over the stem-based model.

Abstract:

Many information retrieval systems are based on the vector space model (VSM), which represents a document as a vector of index terms. Concepts have been proposed to replace word stems as index terms to improve retrieval accuracy. However, past research revealed that such systems did not outperform traditional stem-based systems. Incorporating conceptual similarity derived from knowledge sources should have the potential to improve retrieval accuracy, yet the incompleteness of the knowledge source precludes significant improvement. To remedy this problem, we propose to represent documents using phrases. A phrase consists of multiple concepts and word stems. The similarity between two phrases is jointly determined by their conceptual similarity and their common word stems. Document similarity can in turn be derived from phrase similarities. Using OHSUMED as a test collection and UMLS as the knowledge source, our experimental results show that the phrase-based VSM yields a 16% increase in retrieval accuracy compared to the stem-based model.
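The abstract states that phrase similarity combines conceptual similarity with shared word stems, and that document similarity is derived from phrase similarities, but it gives no formulas. The following is a minimal Python sketch of that idea under stated assumptions: the concept identifiers, the `CONCEPT_SIM` table, the use of `max` to combine the two signals, and the cosine-style normalization are all hypothetical illustrations, not the paper's actual definitions.

```python
from math import sqrt

# Hypothetical concept-similarity table standing in for a knowledge source
# such as UMLS. The identifiers and the value are invented for illustration.
CONCEPT_SIM = {
    frozenset({"C_fever", "C_hyperthermia"}): 0.8,
}


def concept_similarity(concepts_a, concepts_b):
    """Best pairwise similarity between two concept sets (assumed rule)."""
    return max(
        (1.0 if a == b else CONCEPT_SIM.get(frozenset({a, b}), 0.0)
         for a in concepts_a for b in concepts_b),
        default=0.0,
    )


def stem_overlap(stems_a, stems_b):
    """Cosine similarity of two stem sets treated as binary vectors."""
    if not stems_a or not stems_b:
        return 0.0
    return len(stems_a & stems_b) / sqrt(len(stems_a) * len(stems_b))


def phrase_similarity(p, q):
    """A phrase is a (concepts, stems) pair of frozensets.

    Assumption: take the stronger of the two signals, so that shared stems
    can still match phrases when the knowledge source has no entry.
    """
    return max(concept_similarity(p[0], q[0]), stem_overlap(p[1], q[1]))


def document_similarity(doc_a, doc_b):
    """Documents map phrases to weights; cosine generalized with phrase similarity."""
    def inner(x, y):
        return sum(wx * wy * phrase_similarity(px, py)
                   for px, wx in x.items() for py, wy in y.items())

    denom = sqrt(inner(doc_a, doc_a) * inner(doc_b, doc_b))
    return inner(doc_a, doc_b) / denom if denom else 0.0


# Toy documents: one phrase each, weight 1.0.
fever = (frozenset({"C_fever"}), frozenset({"fever"}))
hyperthermia = (frozenset({"C_hyperthermia"}), frozenset({"hyperthermia"}))
fracture = (frozenset({"C_fracture"}), frozenset({"fractur"}))

query = {fever: 1.0}
doc_related = {hyperthermia: 1.0}
doc_unrelated = {fracture: 1.0}
```

In this toy setup the query shares no stem with `doc_related`, so a purely stem-based comparison would score it zero, while the assumed concept table lets the phrase-based comparison rank it above `doc_unrelated`. Conversely, two phrases with no concepts at all can still match through their common stems, which is the fallback the abstract motivates when the knowledge source is incomplete.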

Citations

Journal Article

A framework for unifying ontology-based semantic similarity measures

Sébastien Harispe, +4 more

- 01 Apr 2014 -

Journal of Biomedical Informatics

TL;DR: This paper presents a unifying framework that aims to improve the understanding of semantic measures, to highlight their equivalences and to propose bridges between their theoretical bases, and unify a large number of state-of-the-art semantic similarity measures through common expressions.

Journal Article

Semantic Similarity from Natural Language and Ontology Analysis

Sébastien Harispe, +3 more

- 08 May 2015 -

Synthesis Lectures on Human Language Tec...

TL;DR: Semantic measures, as discussed by the authors, assess the similarity or relatedness of semantic entities by taking into account their semantics, i.e. their meaning; intuitively, the words tea and coffee, which both refer to stimulating beverages, will be estimated to be more semantically similar than the words toffee (confection) and coffee, even though the latter pair has a higher syntactic similarity.

Journal Article

Measuring Semantic Similarity Between Biomedical Concepts Within Multiple Ontologies

Hisham Al-Mubaid, +1 more

TL;DR: A new ontology-structure-based technique for measuring semantic similarity within a single ontology and across multiple ontologies in the biomedical domain, within the framework of the Unified Medical Language System (UMLS).

Journal Article

Knowledge-based vector space model for text clustering

Liping Jing, +2 more

- 01 Oct 2010 -

Knowledge and Information Systems

TL;DR: A new similarity measure is defined that combines the edge-counting technique, the average distance and the position weighting method to compute the similarity of two terms from an ontology hierarchy to re-weight term frequency in the VSM.

Book Chapter

A comparative study of ontology based term similarity measures on PubMed document clustering

Xiaodan Zhang, +4 more

TL;DR: This paper evaluates term re-weighting as an important method for integrating a domain ontology into the clustering process, and results on 8 different semantic measures show that no single type of similarity measure significantly outperforms the others.

References

Book

Introduction to Modern Information Retrieval

Gerard Salton, +1 more

TL;DR: Reading is a need and a hobby at once, and this condition is the one that will make you feel that you must read.

Journal Article

Introduction to WordNet: An On-line Lexical Database

George A. Miller, +4 more

- 01 Dec 1990 -

International Journal of Lexicography

TL;DR: Standard alphabetical procedures for organizing lexical information put together words that are spelled alike and scatter words with similar or related meanings haphazardly through the list.

Journal Article

Efficient string matching: an aid to bibliographic search

Alfred V. Aho, +1 more

- 01 Jun 1975 -

Communications of The ACM

TL;DR: A simple, efficient algorithm to locate all occurrences of any of a finite number of keywords in a string of text that has been used to improve the speed of a library bibliographic search program by a factor of 5 to 10.

Development of a Stemming Algorithm

Julie Beth Lovins

TL;DR: A new version of a context-sensitive, longest-match stemming algorithm for English is proposed; though developed for use in a library information transfer system, it is of general application.

Journal Article

Introduction to the special issue on word sense disambiguation: the state of the art

Nancy Ide, +1 more

- 01 Mar 1998 -

Computational Linguistics

TL;DR: In this paper, the authors review the state of research in this field over the past 50 years and consider the next steps to be taken.

Related Papers (5)

Introduction to Modern Information Retrieval

A vector space model for automatic indexing

An approach for measuring semantic similarity between words using multiple information sources

Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program

Term Weighting Approaches in Automatic Text Retrieval