Proceedings ArticleDOI

Intelligent encoding of concepts in Web document retrieval

27 Sep 2003 - Vol. 1, pp. 72-77
TL;DR: The main aim of the proposed approach is to improve Web information retrieval effectiveness by overcoming the problems associated with a typical keyword matching retrieval system, through the use of concepts and an intelligent fusion of confidence values.
Abstract: The main aim of the approach presented in this paper is to improve Web information retrieval effectiveness by overcoming the problems associated with a typical keyword-matching retrieval system, through the use of concepts and an intelligent fusion of confidence values. By exploiting the conceptual hierarchy of the WordNet knowledge base (G. Miller, 1995), we show how to effectively encode the conceptual information in a document using the semantic information implied by the words that appear within it. Rather than treating a word as a string made up of a sequence of characters, we consider a word to represent a concept.
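As a rough illustration of the word-as-concept idea, the sketch below maps a word to its WordNet hypernym chain using NLTK. It is a minimal sketch under assumed simplifications (nouns only, first listed sense, first hypernym path), not the encoding scheme actually proposed in the paper.

```python
# Minimal sketch: mapping words to WordNet concept chains via NLTK.
# Assumptions (not from the paper): nouns only, the first listed sense,
# and the first hypernym path. Requires nltk.download('wordnet').
from nltk.corpus import wordnet as wn

def concept_chain(word):
    """Hypernym path (most general -> most specific) for the first
    noun sense of `word`, or [] if the word has no noun sense."""
    synsets = wn.synsets(word, pos=wn.NOUN)
    if not synsets:
        return []
    path = synsets[0].hypernym_paths()[0]
    return [s.name() for s in path]

# Words with shared ancestry yield overlapping concept chains, which is
# what lets a concept-level index match beyond exact keywords:
print(concept_chain("car"))    # ..., 'motor_vehicle.n.01', 'car.n.01'
print(concept_chain("truck"))  # ..., 'motor_vehicle.n.01', 'truck.n.01'
```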
Citations
Journal ArticleDOI
TL;DR: It can be concluded that the newly constructed ELM-based ANSI models can solve the difficulties in tuning the acceleration coefficients of SPSO by the trial-and-error method for predicting the CBR of soils and be further applied to other real-time problems of geotechnical engineering.

57 citations

Journal ArticleDOI
TL;DR: The results of the benchmark experiments confirm that the proposed semantic granularity-based IR model performs significantly better than the similarity-based baseline in both a bio-medical and an agricultural domain, and that the perceived relevance of the documents delivered by the granularity-based IR system is significantly higher than that produced by a popular search engine for a number of domain-specific search tasks.
Abstract: Both similarity-based and popularity-based document ranking functions have been successfully applied to information retrieval (IR) in general. However, the dimension of semantic granularity should also be considered for effective retrieval. In this article, we propose a semantic granularity-based IR model that takes into account the three dimensions, namely similarity, popularity, and semantic granularity, to improve domain-specific search. In particular, a concept-based computational model is developed to estimate the semantic granularity of documents with reference to a domain ontology. Semantic granularity refers to the levels of semantic detail carried by an information item. The results of our benchmark experiments confirm that the proposed semantic granularity-based IR model performs significantly better than the similarity-based baseline in both a bio-medical and an agricultural domain. In addition, a series of user-oriented studies reveal that the proposed document ranking functions resemble the implicit ranking functions exercised by humans. The perceived relevance of the documents delivered by the granularity-based IR system is significantly higher than that produced by a popular search engine for a number of domain-specific search tasks. To the best of our knowledge, this is the first study regarding the application of semantic granularity to enhance domain-specific IR.
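To make the three-dimensional ranking concrete, here is a minimal sketch of a linear combination of similarity, popularity, and semantic granularity scores. The linear form, the weights, and the assumption that all three scores are pre-normalized to [0, 1] are illustrative choices, not the model proposed in the article.

```python
# Minimal sketch of a three-factor ranking function. The linear form
# and weights are illustrative assumptions, not the cited model.
def rank_score(similarity, popularity, granularity,
               w_sim=0.5, w_pop=0.2, w_gran=0.3):
    """Combine three pre-normalized [0, 1] scores into one rank score."""
    return w_sim * similarity + w_pop * popularity + w_gran * granularity

docs = {
    "doc_a": (0.82, 0.40, 0.90),  # similar, niche, fine-grained detail
    "doc_b": (0.85, 0.95, 0.20),  # similar, popular, coarse survey
}
# A query needing detailed content favors doc_a despite lower popularity.
for doc, scores in sorted(docs.items(),
                          key=lambda kv: rank_score(*kv[1]), reverse=True):
    print(doc, round(rank_score(*scores), 3))  # doc_a 0.76, doc_b 0.675
```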

53 citations


Cites methods from "Intelligent encoding of concepts in..."

  • ...To effectively handle such a situation, we have developed the Conceptual Marking Tree (CMT) procedure based on a conceptual encoding technique [Zakos et al. 2003]....

Proceedings ArticleDOI
08 Mar 2007
TL;DR: A system for literature review that uses content-based image retrieval (CBIR) techniques to search for relevant documents using the content of figures in a document along with relevance feedback refinement instead of keyword search guesswork is described.
Abstract: Literature review is a time-consuming burden because it is hard to find relevant articles. Yet literature review is important because it allows researchers to find solutions to their questions/problems from previous work already performed and published by others. It is difficult to wade through documents quickly and assess their quality by only looking at their title, abstract, or even full text. The human visual system allows us to quickly glance at images, infer the main subject of an article, and decide whether we are interested in reading more. In some cases, such as biology articles, figures showing photos of experimental results quickly allow a researcher in the literature review phase to judge the quality of the work by its results. This work describes a system for literature review that uses content-based image retrieval (CBIR) techniques to search for relevant documents using the content of figures in a document, along with relevance feedback refinement, instead of keyword-search guesswork. The long-term goal is to use it as a subsystem in a content-based document retrieval system where the figures and their captions are combined with the document's body text. This paper describes the processing of the documents to extract available raster graphics as well as text with its layout and formatting information intact. The process of matching a figure to its caption using this layout information is then described. While caption-based search is implemented but not yet merged into the system, the figure-caption matching is complete. Two novel modified tf-idf measures being considered to take into account bold/italic text, font size, and document structure as a way to infer text importance, rather than relying on text frequency alone, are detailed mathematically and explained intuitively. CBIR queries where multiple images form the query are issued as separate queries whose results are then merged together.
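As a hedged illustration of weighting term occurrences by their formatting, the sketch below computes a formatting-weighted tf-idf. The multiplier table and the plain weighted sum are assumptions for illustration, not the paper's two actual measures.

```python
# Minimal sketch of a formatting-weighted tf-idf, in the spirit of the
# modified measures described above. The multiplier values and the
# simple weighted sum are illustrative assumptions, not the paper's
# actual formulas.
import math

FORMAT_WEIGHTS = {"plain": 1.0, "italic": 1.5, "bold": 2.0, "heading": 3.0}

def weighted_tf_idf(occurrences, n_docs, doc_freq):
    """occurrences: formatting label of each occurrence of one term in
    one document; n_docs: collection size; doc_freq: number of
    documents containing the term."""
    weighted_tf = sum(FORMAT_WEIGHTS.get(fmt, 1.0) for fmt in occurrences)
    idf = math.log(n_docs / doc_freq)
    return weighted_tf * idf

# "retrieval" occurs once in a heading and twice in plain body text:
# the heading occurrence counts three times as much as a plain one.
print(weighted_tf_idf(["heading", "plain", "plain"], n_docs=1000, doc_freq=50))
```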

13 citations


Cites methods from "Intelligent encoding of concepts in..."

  • ...Thesaurus information has also been utilized to translate words in documents into concept hierarchies [9] in an attempt to understand what a document is about instead of modeling what a document is about by the frequency of occurrence of its terms like tf-idf [10] (term frequency-inverse document frequency) does....


Proceedings ArticleDOI
28 Nov 2006
TL;DR: The representation of the characteristics of multimedia educational resources and the relations between characteristics at different hierarchy levels are studied, and a hierarchical index is built to satisfy all kinds of queries to the educational resources database system.
Abstract: Long-distance education is now a very important teaching method, and its foundation is the construction of educational resources databases. The key task in constructing a resources database is to build an appropriate index. This paper studies the representation of the characteristics of multimedia educational resources and the relations between characteristics at different levels of the hierarchy. A hierarchical index is built to satisfy all kinds of queries to the educational resources database system. A subject ontology is used to obtain a standard annotation of the resources and a standard description of the query requirement, and also to extend the semantics of the query requirement. Finally, mapping rules over the hierarchical characteristics provide a way to represent the semantics of a resource automatically.
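As a minimal sketch of how a subject ontology can extend the semantics of a query requirement, the snippet below expands query terms with broader and related concepts from a toy ontology. The ontology content and the single-level expansion are assumptions for illustration, not the paper's mapping rules.

```python
# Minimal sketch of ontology-based query expansion. The toy subject
# ontology (term -> broader/related terms) and single-level expansion
# are assumptions for illustration, not the paper's mapping rules.
ONTOLOGY = {
    "calculus": {"broader": ["mathematics"],
                 "related": ["derivative", "integral"]},
    "derivative": {"broader": ["calculus"], "related": []},
}

def expand_query(terms):
    """Extend query semantics with broader and related subjects."""
    expanded = set(terms)
    for term in terms:
        node = ONTOLOGY.get(term, {})
        expanded.update(node.get("broader", []))
        expanded.update(node.get("related", []))
    return expanded

# A query for "calculus" also matches resources annotated with broader
# or related subjects from the ontology:
print(expand_query(["calculus"]))
# -> {'calculus', 'mathematics', 'derivative', 'integral'} (set order varies)
```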

Cites methods from "Intelligent encoding of concepts in..."

  • ...[3] developed a conceptual encoding technique and proposed several conceptual indicators....


References
Journal ArticleDOI
TL;DR: WordNet provides a more effective combination of traditional lexicographic information and modern computing, and is an online lexical database designed for use under program control.
Abstract: Because meaningful sentences are composed of meaningful words, any system that hopes to process natural languages as people do must have information about words and their meanings. This information is traditionally provided through dictionaries, and machine-readable dictionaries are now widely available. But dictionary entries evolved for the convenience of human readers, not for machines. WordNet provides a more effective combination of traditional lexicographic information and modern computing. WordNet is an online lexical database designed for use under program control. English nouns, verbs, adjectives, and adverbs are organized into sets of synonyms, each representing a lexicalized concept. Semantic relations link the synonym sets [4].
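A minimal sketch of querying WordNet's synonym sets and semantic relations under program control, here via the NLTK interface (a convenience assumption; the original WordNet ships its own tools):

```python
# Minimal sketch: synonym sets and semantic relations in WordNet via
# NLTK. Requires nltk.download('wordnet'); word and sense choice are
# illustrative.
from nltk.corpus import wordnet as wn

synset = wn.synsets("dog", pos=wn.NOUN)[0]        # first noun sense
print(synset.lemma_names())                       # synonyms in the set
print([h.name() for h in synset.hypernyms()])     # "is-a" relation
print([h.name() for h in synset.hyponyms()][:3])  # more specific concepts
```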

15,068 citations

Book
15 May 1999
TL;DR: The authors present a rigorous and complete textbook for a first course on information retrieval from the computer science (as opposed to a user-centred) perspective, providing an up-to-date, student-oriented treatment of the subject.
Abstract: From the Publisher: This is a rigorous and complete textbook for a first course on information retrieval from the computer science (as opposed to a user-centred) perspective. The advent of the Internet and the enormous increase in the volume of electronically stored information have led to substantial work on IR from the computer science perspective; this book provides an up-to-date, student-oriented treatment of the subject.

9,923 citations

Journal ArticleDOI
TL;DR: An algorithm for suffix stripping is described, which has been implemented as a short, fast program in BCPL, and performs slightly better than a much more elaborate system with which it has been compared.
Abstract: The automatic removal of suffixes from words in English is of particular interest in the field of information retrieval. An algorithm for suffix stripping is described, which has been implemented as a short, fast program in BCPL. Although simple, it performs slightly better than a much more elaborate system with which it has been compared. It effectively works by treating complex suffixes as compounds made up of simple suffixes, and removing the simple suffixes in a number of steps. In each step the removal of the suffix is made to depend upon the form of the remaining stem, which usually involves a measure of its syllable length.
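For a concrete feel of the algorithm's behaviour, the sketch below applies the Porter stemmer implementation shipped with NLTK (the original program was written in BCPL); the word list is illustrative, drawn from examples in Porter's paper.

```python
# Minimal usage sketch of Porter's suffix-stripping algorithm, using
# the implementation shipped with NLTK rather than the original BCPL
# program. Words are examples from Porter's paper.
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
for word in ["connections", "connected", "connecting", "relational"]:
    print(word, "->", stemmer.stem(word))
# The "connect" family conflates to a single stem, and step 2 maps
# "relational" -> "relate" (ATIONAL -> ATE).
```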

7,572 citations

Journal ArticleDOI
TL;DR: An approach based on space density computations is used to choose an optimum indexing vocabulary for a collection of documents, demonstrating the usefulness of the model.
Abstract: In a document retrieval, or other pattern matching environment where stored entities (documents) are compared with each other or with incoming patterns (search requests), it appears that the best indexing (property) space is one where each entity lies as far away from the others as possible; in these circumstances the value of an indexing system may be expressible as a function of the density of the object space; in particular, retrieval performance may correlate inversely with space density. An approach based on space density computations is used to choose an optimum indexing vocabulary for a collection of documents. Typical evaluation results are shown, demonstrating the usefulness of the model.
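The inverse correlation between space density and retrieval performance can be illustrated with a minimal sketch that measures density as the average pairwise cosine similarity of document index vectors; the toy vectors are assumptions for illustration, not Salton's experimental data.

```python
# Minimal sketch of the space-density idea: density is measured here as
# the average pairwise cosine similarity of document index vectors
# (higher density is predicted to hurt retrieval). Toy data only.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def space_density(vectors):
    pairs = [(i, j) for i in range(len(vectors))
             for j in range(i + 1, len(vectors))]
    return sum(cosine(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)

# A discriminating vocabulary (left) spreads documents apart, while a
# vocabulary dominated by one shared term (right) packs them together.
print(space_density([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))  # 0.0
print(space_density([[5, 1, 0], [5, 0, 1], [5, 1, 1]]))  # ~0.97
```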

6,619 citations


"Intelligent encoding of concepts in..." refers methods in this paper

  • ...Non-conceptual encoding generates a standard inverted index using the VSM [11]....


  • Combination    Average Precision
    VSM            0.514
    VSM + CB       0.526
    VSM * CB       0.524


  • ...By using the W1 weighting scheme alone for the concept-based approach and combining it with the VSM, a significant improvement in the average precision is achieved (see the fusion sketch after this list)....


  • ...Experimentation has produced encouraging results that are better than the standard VSM....

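The excerpts above quote additive ("VSM + CB") and multiplicative ("VSM * CB") fusion of vector space model scores with concept-based (CB) confidence values. Below is a minimal sketch of such a fusion, assuming both runs produce per-document scores already normalized to [0, 1]; that normalization assumption is illustrative, not taken from the paper.

```python
# Minimal sketch of the score fusion quoted above ("VSM + CB" and
# "VSM * CB"), assuming per-document scores normalized to [0, 1].
def fuse(vsm_scores, cb_scores, mode="sum"):
    """Fuse two score dictionaries additively or multiplicatively."""
    fused = {}
    for doc in set(vsm_scores) | set(cb_scores):
        v = vsm_scores.get(doc, 0.0)
        c = cb_scores.get(doc, 0.0)
        fused[doc] = v + c if mode == "sum" else v * c
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

vsm = {"d1": 0.90, "d2": 0.40, "d3": 0.10}
cb = {"d1": 0.20, "d2": 0.70, "d3": 0.60}
print(fuse(vsm, cb, mode="sum"))   # additive fusion, as in "VSM + CB"
print(fuse(vsm, cb, mode="prod"))  # multiplicative, as in "VSM * CB"
```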

Journal ArticleDOI
TL;DR: An introduction and survey of probabilistic information retrieval (IR) is given: the probability-ranking principle shows that optimum retrieval quality can be achieved under certain assumptions; a conceptual model for IR along with the corresponding event space clarifies the interpretation of the probabilistic parameters involved.
Abstract: In this paper, an introduction and survey of probabilistic information retrieval (IR) is given. First, the basic concepts of this approach are described: the probability-ranking principle shows that optimum retrieval quality can be achieved under certain assumptions; a conceptual model for IR along with the corresponding event space clarifies the interpretation of the probabilistic parameters involved. For the estimation of these parameters, three different learning strategies are distinguished, namely query-related, document-related and description-related learning. As a representative for each of these strategies, a specific model is described. A new approach regards IR as uncertain inference; here, imaging is used as a new technique for estimating the probabilistic parameters, and probabilistic inference networks support more complex forms of inference. Finally, the more general problems of parameter estimation, query expansion and the development of models for advanced document representations are discussed.
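As one concrete instance of the probabilistic parameter estimation such surveys discuss, here is a minimal sketch of the classic Robertson/Sparck Jones relevance weight with the standard 0.5 smoothing; this formula is general background, not one taken from this particular article.

```python
# Minimal sketch of the Robertson/Sparck Jones relevance weight with
# 0.5 smoothing -- a classic probabilistic term weight, shown as
# background, not a formula from this article.
import math

def rsj_weight(N, n, R, r):
    """N: collection size, n: docs containing the term,
    R: known relevant docs, r: relevant docs containing the term."""
    return math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                    ((n - r + 0.5) * (R - r + 0.5)))

# A term frequent in relevant documents but rare overall receives a
# high positive weight.
print(round(rsj_weight(N=10000, n=50, R=20, r=15), 2))  # ~6.67
```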

358 citations


"Intelligent encoding of concepts in..." refers methods in this paper

  • ...These include Boolean retrieval [2], the vector space model [3], probabilistic models [4], fuzzy set retrieval [5] and neural network models [6]....
