scispace - formally typeset

Showing papers on "Thesaurus (information retrieval)" published in 2005


Journal Article (DOI)
TL;DR: The paper discusses what sustainable development is, in terms of its goals, indicators, values, and practice, and how it can be achieved.
Abstract: (2005). What is Sustainable Development? Goals, Indicators, Values, and Practice. Environment: Science and Policy for Sustainable Development: Vol. 47, No. 3, pp. 8-21.

1,316 citations


Journal Article (DOI)
TL;DR: A simple model for semantic growth is described, in which each new word or concept is connected to an existing network by differentiating the connectivity pattern of an existing node, which generates appropriate small-world statistics and power-law connectivity distributions.

1,224 citations
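The growth rule in the TL;DR above can be illustrated with a small simulation. This is a simplified sketch, not the paper's exact model: choosing the node to differentiate in proportion to its degree, and sampling the new node's links uniformly from that node's neighbourhood, are assumptions made here, and `grow_semantic_network` is a hypothetical name.

```python
import random


def grow_semantic_network(n_nodes: int, m: int, seed: int = 0) -> dict[int, set[int]]:
    """Grow an undirected network in which each new node copies part of an existing node's links."""
    if m < 2 or n_nodes < m:
        raise ValueError("need m >= 2 and n_nodes >= m")
    rng = random.Random(seed)
    # Seed network: m fully connected nodes, so every node starts with degree m - 1.
    adj = {i: {j for j in range(m) if j != i} for i in range(m)}
    for new in range(m, n_nodes):
        nodes = list(adj)
        # Assumption: the node to differentiate is chosen with probability proportional to its degree.
        target = rng.choices(nodes, weights=[len(adj[v]) for v in nodes], k=1)[0]
        # The new node links only inside the target's neighbourhood (the target plus its neighbours),
        # which always holds at least m nodes because no degree ever drops below m - 1.
        neighbourhood = [target] + sorted(adj[target])
        # Assumption: m members of that neighbourhood are sampled uniformly.
        chosen = rng.sample(neighbourhood, m)
        adj[new] = set(chosen)
        for v in chosen:
            adj[v].add(new)
    return adj


if __name__ == "__main__":
    net = grow_semantic_network(2000, m=3)
    degrees = sorted((len(nbrs) for nbrs in net.values()), reverse=True)
    # A heavy-tailed degree distribution shows up as a few hubs far above the median.
    print("max degree:", degrees[0], "median degree:", degrees[len(degrees) // 2])
```

Because well-connected nodes are picked more often, a few hubs tend to emerge; inspecting the full degree distribution and clustering coefficient would be the way to compare this sketch with the small-world and power-law statistics the summary reports.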


01 Jan 2005
TL;DR: What Happened in CLEF 2004?
Abstract: What Happened in CLEF 2004?
I. Ad Hoc Text Retrieval Tracks.- CLEF 2004: Ad Hoc Track Overview and Results Analysis.- Selection and Merging Strategies for Multilingual Information Retrieval.- Using Surface-Syntactic Parser and Deviation from Randomness.- Cross-Language Retrieval Using HAIRCUT at CLEF 2004.- Experiments on Statistical Approaches to Compensate for Limited Linguistic Resources.- Application of Variable Length N-Gram Vectors to Monolingual and Bilingual Information Retrieval.- Integrating New Languages in a Multilingual Search System Based on a Deep Linguistic Analysis.- IR-n r2: Using Normalized Passages.- Using COTS Search Engines and Custom Query Strategies at CLEF.- Report on Thomson Legal and Regulatory Experiments at CLEF-2004.- Effective Translation, Tokenization and Combination for Cross-Lingual Retrieval.- Two-Stage Refinement of Transitive Query Translation with English Disambiguation for Cross-Language Information Retrieval: An Experiment at CLEF 2004.- Dictionary-Based Amharic - English Information Retrieval.- Dynamic Lexica for Query Translation.- SINAI at CLEF 2004: Using Machine Translation Resources with a Mixed 2-Step RSV Merging Algorithm.- Mono- and Crosslingual Retrieval Experiments at the University of Hildesheim.- University of Chicago at CLEF2004: Cross-Language Text and Spoken Document Retrieval.- UB at CLEF2004: Cross Language Information Retrieval Using Statistical Language Models.- MIRACLE's Hybrid Approach to Bilingual and Monolingual Information Retrieval.- Searching a Russian Document Collection Using English, Chinese and Japanese Queries.- Dublin City University at CLEF 2004: Experiments in Monolingual, Bilingual and Multilingual Retrieval.- Finnish, Portuguese and Russian Retrieval with Hummingbird SearchServer™ at CLEF 2004.- Data Fusion for Effective European Monolingual Information Retrieval.- The XLDB Group at CLEF 2004.- The University of Glasgow at CLEF 2004: French Monolingual Information Retrieval with Terrier.
II. Domain-Specific Document Retrieval.- The Domain-Specific Track in CLEF 2004: Overview of the Results and Remarks on the Assessment Process.- University of Hagen at CLEF 2004: Indexing and Translating Concepts for the GIRT Task.- IRIT at CLEF 2004: The English GIRT Task.- Ricoh at CLEF 2004.- GIRT and the Use of Subject Metadata for Retrieval.
III. Interactive Cross-Language Information Retrieval.- iCLEF 2004 Track Overview: Pilot Experiments in Interactive Cross-Language Question Answering.- Interactive Cross-Language Question Answering: Searching Passages Versus Searching Documents.- Improving Interaction with the User in Cross-Language Question Answering Through Relevant Domains and Syntactic Semantic Patterns.- Cooperation, Bookmarking, and Thesaurus in Interactive Bilingual Question Answering.- Summarization Design for Interactive Cross-Language Question Answering.- Interactive and Bilingual Question Answering Using Term Suggestion and Passage Retrieval.
IV. Multiple Language Question Answering.- Overview of the CLEF 2004 Multilingual Question Answering Track.- A Question Answering System for French.- Cross-Language French-English Question Answering Using the DLT System at CLEF 2004.- Experiments on Robust NL Question Interpretation and Multi-layered Document Annotation for a Cross-Language Question/Answering System.- Making Stone Soup: Evaluating a Recall-Oriented Multi-stream Question Answering System for Dutch.- The DIOGENE Question Answering System at CLEF-2004.- Cross-Lingual Question Answering Using Off-the-Shelf Machine Translation.- Bulgarian-English Question Answering: Adaptation of Language Resources.- Answering French Questions in English by Exploiting Results from Several Sources of Information.- Finnish as Source Language in Bilingual Question Answering.- miraQA: Experiments with Learning Answer Context Patterns from the Web.- Question Answering for Spanish Supported by Lexical Context Annotation.- Question Answering Using Sentence Parsing and Semantic Network Matching.- First Evaluation of Esfinge - A Question Answering System for Portuguese.- University of Evora in QA@CLEF-2004.- COLE Experiments at QA@CLEF 2004 Spanish Monolingual Track.- Does English Help Question Answering in Spanish?.- The TALP-QA System for Spanish at CLEF 2004: Structural and Hierarchical Relaxing of Semantic Constraints.- ILC-UniPI Italian QA.- Question Answering Pilot Task at CLEF 2004.- Evaluation of Complex Temporal Questions in CLEF-QA.
V. Cross-Language Retrieval in Image Collections.- The CLEF 2004 Cross-Language Image Retrieval Track.- Caption and Query Translation for Cross-Language Image Retrieval.- Pattern-Based Image Retrieval with Constraints and Preferences on ImageCLEF 2004.- How to Visually Retrieve Images from the St. Andrews Collection Using GIFT.- UNED at ImageCLEF 2004: Detecting Named Entities and Noun Phrases for Automatic Query Expansion and Structuring.- Dublin City University at CLEF 2004: Experiments with the ImageCLEF St. Andrew's Collection.- From Text to Image: Generating Visual Query for Image Retrieval.- Toward Cross-Language and Cross-Media Image Retrieval.- FIRE - Flexible Image Retrieval Engine: ImageCLEF 2004 Evaluation.- MIRACLE Approach to ImageCLEF 2004: Merging Textual and Content-Based Image Retrieval.- Cross-Media Feedback Strategies: Merging Text and Image Information to Improve Image Retrieval.- ImageCLEF 2004: Combining Image and Multi-lingual Search for Medical Image Retrieval.- Multi-modal Information Retrieval Using FINT.- Medical Image Retrieval Using Texture, Locality and Colour.- SMIRE: Similar Medical Image Retrieval Engine.- A Probabilistic Approach to Medical Image Retrieval.- UB at CLEF2004 Cross Language Medical Image Retrieval.- Content-Based Queries on the CasImage Database Within the IRMA Framework.- Comparison and Combination of Textual and Visual Features for Interactive Cross-Language Image Retrieval.- MSU at ImageCLEF: Cross Language and Interactive Image Retrieval.
VI. Cross-Language Spoken Document Retrieval.- CLEF 2004 Cross-Language Spoken Document Retrieval Track.
VII. Issues in CLIR and in Evaluation.- The Key to the First CLEF with Portuguese: Topics, Questions and Answers in CHAVE.- How Do Named Entities Contribute to Retrieval Effectiveness?

201 citations


Journal Article (DOI)
TL;DR: A qualitative analysis of the NCI Thesaurus found many mistakes and inconsistencies with respect to the term-formation principles used, the underlying knowledge representation system, and missing or inappropriately assigned verbal and formal definitions.
Abstract: Objective: The National Cancer Institute Thesaurus is described by its authors as “a biomedical vocabulary that provides consistent, unambiguous codes and definitions for concepts used in cancer research” and which “exhibits ontology-like properties in its construction and use”. We performed a qualitative analysis of the Thesaurus in order to assess its conformity with principles of good practice in terminology and ontology design. Materials and Methods: We used both the on-line browsable version of the Thesaurus and its OWL-representation (version 04.08b, released on August 2, 2004), measuring each in light of the requirements put forward in relevant ISO terminology standards and in light of ontological principles advanced in the recent literature. Results: We found many mistakes and inconsistencies with respect to the term-formation principles used, the underlying knowledge representation system, and missing or inappropriately assigned verbal and formal definitions. Conclusion: Version 04.08b of the NCI Thesaurus suffers from the same broad range of problems that have been observed in other biomedical terminologies. For its further development, we recommend the use of a more principled approach that allows the Thesaurus to be tested not just for internal consistency but also for its degree of correspondence to that part of reality which it is designed to represent.

145 citations


Journal Article (DOI)
TL;DR: Ontylog has proven well suited to constructing large biomedical vocabularies, and the collaboration process described in this paper capitalizes on the Ontylog constructs Kind and Role to facilitate communication between ontologists and domain experts.

122 citations



Journal Article (DOI)
TL;DR: The paper describes Internet functions, including video file generation and real-time control, implemented as MATLAB Web Server (MWS) features for undergraduate courses; they offer a straightforward approach for the developer of the teaching material (the control engineering lecturer) and a low-cost option for the student user.
Abstract: This work describes Internet functions that include video file generation and real-time control as MWS (MATLAB Web server) features, which are useful for undergraduate courses. With these functions, virtual laboratories can be enhanced by using virtual processes, which in turn allow video animations of simulated processes. Furthermore, MWS allows the implementation of remote laboratories operating in batch mode. WinCom or any other suitable software can be used to implement online laboratories. These methodologies provide a straightforward approach for the developer of the teaching material, the control engineering lecturer, and a low-cost option for the student user.

91 citations


Journal Article

88 citations


Proceedings Article (DOI)
Tanveer Syeda-Mahmood, Gauri Shah, Rama Akkiraju, Anca-Andreea Ivan, Richard Goodwin
11 Jul 2005
TL;DR: Combining multiple cues is shown to yield more relevant service matches from a large repository than any single cue alone.
Abstract: In this paper, we explore the use of domain-independent and domain-specific ontologies to find matching service descriptions. The domain-independent relationships are derived using an English thesaurus after tokenization and part-of-speech tagging. The domain-specific ontological similarity is derived by an inference on the semantic annotations associated with Web service descriptions. Matches due to the two cues are combined to determine an overall semantic similarity score. By combining multiple cues, we show that better relevancy results can be obtained for service matches from a large repository, than could be obtained using any one cue alone.

85 citations
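The abstract's two-cue scoring can be pictured with a minimal sketch. Everything concrete here is an assumption: a toy synonym table stands in for the English thesaurus, part-of-speech tagging is skipped, the domain-specific ontological similarity is passed in as a precomputed number in [0, 1], and the cues are merged by a weighted average with a hypothetical weight `alpha`. The paper's actual combination rule may differ.

```python
import re

# Hypothetical stand-in for an English thesaurus: word -> synonyms.
THESAURUS = {
    "buy": {"purchase", "acquire"},
    "purchase": {"buy", "acquire"},
    "car": {"automobile", "vehicle"},
    "automobile": {"car", "vehicle"},
}


def tokens(text: str) -> set[str]:
    # Tokenize into lower-case words; POS tagging is omitted in this sketch.
    return set(re.findall(r"[a-z]+", text.lower()))


def expand(words: set[str]) -> set[str]:
    # Domain-independent cue: add thesaurus synonyms of every word.
    out = set(words)
    for w in words:
        out |= THESAURUS.get(w, set())
    return out


def lexical_similarity(a: str, b: str) -> float:
    # Jaccard overlap of the synonym-expanded token sets.
    ea, eb = expand(tokens(a)), expand(tokens(b))
    union = ea | eb
    return len(ea & eb) / len(union) if union else 0.0


def combined_score(a: str, b: str, ontology_sim: float, alpha: float = 0.5) -> float:
    # Weighted average of the domain-independent and domain-specific cues.
    return alpha * lexical_similarity(a, b) + (1 - alpha) * ontology_sim
```

For example, "buy car" and "purchase automobile" share no surface words but have a lexical similarity of 1.0 after synonym expansion, so with `ontology_sim=0.6` and the default weight the combined score is about 0.8.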




Proceedings Article
01 Jan 2005
TL;DR: The authors demonstrated in TREC2003 that employing the WWW as an all-domain word-association resource, with appropriate filtering, can be successful for this Robust Track objective.
Abstract: There were two sub-tasks in the TREC2004 Robust track: given a set of topics, a) improve the effectiveness of the lowest performing 25%, and b) predict their ranking according to their average precision. For task a), we followed the strategy we introduced last year to improve ad-hoc retrieval by employing the web as an external thesaurus to supplement a given topic description. A new method of probing the web based on a given topic statement, called ‘window rotation’, was tested. For task b), we employed ε-SVR (epsilon support vector regression) to predict the performance of test topics, training on simple features such as document frequencies and query term frequencies. This allows performance prediction without retrieval. Features were also added from a retrieval list in the hope that they might better predict later-stage or web-assisted retrieval. 200 old topics were used for training to predict the ranking of 49 new topics, as well as the whole set of 249. Runs used retrieval lists based on the title only, the description only, and the title-description combination of a topic. Ten runs were submitted, including runs based on initial retrieval only, retrieval with pseudo-relevance feedback, and retrieval with web assistance. Evaluation shows that we achieved very good performance for most of our runs.
2 Robust Track – Improving Low Performing Topics
2.1 Background
We introduced a new strategy of improving ad-hoc retrieval based on web assistance in the Robust Track of TREC2003. In initial retrieval, some queries have low average precision (weak or hard queries) while others return good values (strong or easy queries). The objective of this track is to automatically improve the effectiveness of weak topics, and of others in general. Strong topics can generally be further improved with pseudo-relevance feedback (PRF), but this does not work for weak topics because, for them, initial retrieval does not bring in much useful material for feedback. One may try to enrich weak topic wordings via a thesaurus to improve term variety, thereby enhancing initial retrieval results. However, choosing an available and appropriate thesaurus of the right domain without prior knowledge of a topic is quite a challenge. We demonstrated in TREC2003 that employing the WWW as an all-domain word-association resource with appropriate filtering can be successful for this Robust Track objective.
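The idea of using the web as an external thesaurus for a weak topic can be sketched as follows, assuming a web search has already returned text snippets for the topic (no search API is called here). Counting terms that co-occur with topic words, and keeping only those seen in at least `min_df` snippets, is a stand-in for the authors' filtering; their 'window rotation' probing is not reproduced, and all names are hypothetical.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "and", "or", "in", "to", "for", "is", "on", "with"}


def expansion_terms(topic: str, snippets: list[str], k: int = 5, min_df: int = 2) -> list[str]:
    """Pick candidate expansion terms that co-occur with the topic in web snippets."""
    topic_terms = set(re.findall(r"[a-z]+", topic.lower()))
    df = Counter()
    for snippet in snippets:
        words = set(re.findall(r"[a-z]+", snippet.lower()))
        # Only snippets that mention at least one topic term count as evidence.
        if words & topic_terms:
            df.update(words - topic_terms - STOPWORDS)
    # Filtering: keep terms seen in at least min_df snippets, most frequent first, ties alphabetical.
    ranked = sorted((t for t, c in df.items() if c >= min_df), key=lambda t: (-df[t], t))
    return ranked[:k]


def expand_query(topic: str, snippets: list[str], k: int = 5) -> str:
    # Supplement the original topic wording with the selected terms.
    return " ".join([topic] + expansion_terms(topic, snippets, k))
```

With the topic "ozone depletion" and snippets that mention chlorofluorocarbons and stratospheric ozone, those two terms are added, while a snippet that never mentions the topic contributes nothing.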

Journal Article (DOI)
01 Mar 2005


Journal Article (DOI)
TL;DR: This work presents a framework for automated taxonomy construction that involves generating a cluster hierarchy from a document corpus using statistical clustering and NLP techniques, and extracting a topic hierarchy from this cluster hierarchy.
Abstract: Construction of domain ontologies on the Semantic Web is a human- and resource-intensive process, and efforts to reduce this cost are crucial for the Semantic Web to scale. We present a framework for automated taxonomy construction that involves: (a) generation of a cluster hierarchy from a document corpus using statistical clustering and NLP techniques; (b) extraction of a topic hierarchy from this cluster hierarchy; and (c) assignment of labels to nodes in the topic hierarchy. Metrics for estimating topic hierarchy quality and parameters of an experimentation framework are identified. MEDLINE was the document corpus and the MeSH thesaurus was the gold standard.
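Step (c), assigning labels to nodes in the topic hierarchy, can be sketched in a few lines. This assumes steps (a) and (b) have already produced a tree of document groups. Labelling each node with its most frequent term not already used by an ancestor is a deliberately simple heuristic chosen for illustration, not the paper's labelling method, and `TopicNode` and `assign_labels` are hypothetical names.

```python
import re
from collections import Counter
from dataclasses import dataclass, field

STOPWORDS = {"the", "a", "of", "and", "in", "with", "for", "to", "is"}


@dataclass
class TopicNode:
    docs: list[str]
    children: list["TopicNode"] = field(default_factory=list)
    label: str = ""


def assign_labels(node: TopicNode, used: frozenset[str] = frozenset()) -> None:
    """Label each node with its most frequent term not already used by an ancestor."""
    counts = Counter(
        w
        for doc in node.docs
        for w in re.findall(r"[a-z]+", doc.lower())
        if w not in STOPWORDS and w not in used
    )
    if counts:
        # Ties are broken alphabetically so labels are deterministic.
        node.label = min(counts, key=lambda w: (-counts[w], w))
    for child in node.children:
        # Children may not reuse this node's label (or any ancestor's).
        assign_labels(child, used | {node.label})
```

Excluding ancestor labels keeps a child from simply repeating its parent's topic; a real system would more likely score terms by how much more frequent they are in a node than in its siblings.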


Book
01 Jan 2005
TL;DR: With a variety of concepts and vocabulary used in the psychological literature, search and retrieval of records about specific concepts is virtually impossible without the controlled vocabulary of a thesaurus.
Abstract: With a variety of concepts and vocabulary used in the psychological literature, search and retrieval of records about specific concepts is virtually impossible without the controlled vocabulary of a thesaurus. This controlled vocabulary provides a way of structuring subject matter consistently across users.

Journal Article (DOI)
TL;DR: The study reported here investigated the query expansion behavior of end-users interacting with a thesaurus-enhanced search system on the Web, finding that academic staff chose more narrow and synonymous terms than did postgraduate students, who generally selected broader and related terms.
Abstract: The study reported here investigated the query expansion behavior of end-users interacting with a thesaurus-enhanced search system on the Web. Two groups, namely academic staff and postgraduate students, were recruited into this study. Data were collected from 90 searches performed by 30 users using the OVID interface to the CAB abstracts database. Data-gathering techniques included questionnaires, screen capturing software, and interviews. The results presented here relate to issues of search-topic and search-term characteristics, number and types of expanded queries, usefulness of thesaurus terms, and behavioral differences between academic staff and postgraduate students in their interaction. The key conclusions drawn were that (a) academic staff chose more narrow and synonymous terms than did postgraduate students, who generally selected broader and related terms; (b) topic complexity affected users' interaction with the thesaurus in that complex topics required more query expansion and search term selection; (c) users' prior topic-search experience appeared to have a significant effect on their selection and evaluation of thesaurus terms; (d) in 50% of the searches where additional terms were suggested from the thesaurus, users stated that they had not been aware of the terms at the beginning of the search; this observation was particularly noticeable in the case of postgraduate students.
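A thesaurus-enhanced search system of the kind studied offers broader (BT), narrower (NT), related (RT) and synonymous entry terms (UF, "used for") for a search term. The sketch below, assuming a made-up one-entry thesaurus fragment and a hypothetical OR-query syntax, shows how a user's choice of relation types turns into an expanded query; it does not reproduce the OVID interface or the CAB Thesaurus.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    broader: set[str] = field(default_factory=set)
    narrower: set[str] = field(default_factory=set)
    related: set[str] = field(default_factory=set)
    synonyms: set[str] = field(default_factory=set)


# Hypothetical thesaurus fragment (not taken from the CAB Thesaurus).
THESAURUS = {
    "maize": Entry(broader={"cereals"}, narrower={"sweet corn"},
                   related={"maize silage"}, synonyms={"corn"}),
}


def suggest(term: str) -> dict[str, set[str]]:
    """Return candidate expansion terms grouped by relation type (copies, so callers can't mutate the thesaurus)."""
    e = THESAURUS.get(term.lower())
    if e is None:
        return {}
    return {"BT": set(e.broader), "NT": set(e.narrower),
            "RT": set(e.related), "UF": set(e.synonyms)}


def expand(term: str, relations: set[str]) -> str:
    """Build an OR query from the term plus the terms of the user-selected relation types."""
    groups = suggest(term)
    extra = sorted(t for rel in relations for t in groups.get(rel, set()))
    return " OR ".join([term] + extra)
```

In terms of the study's findings, a user who mostly picks narrower and synonymous terms would call `expand("maize", {"NT", "UF"})`, while one who prefers broader and related terms would choose `{"BT", "RT"}`.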

Journal Article (DOI)
TL;DR: A simple thesaurus-based disambiguation algorithm that can operate with very little training data, enabling gene-symbol disambiguation in massive text mining applications and resolving most ambiguities in the test set with high accuracy.
Abstract: Massive text mining of the biological literature holds great promise of relating disparate information and discovering new knowledge. However, disambiguation of gene symbols is a major bottleneck. We developed a simple thesaurus-based disambiguation algorithm that can operate with very little training data. The thesaurus comprises the information from five human genetic databases and MeSH. The extent of the homonym problem for human gene symbols is shown to be substantial (33% of the genes in our combined thesaurus had one or more ambiguous symbols), not only because one symbol can refer to multiple genes, but also because a gene symbol can have many non-gene meanings. A test set of 52,529 Medline abstracts, containing 690 ambiguous human gene symbols taken from OMIM, was automatically generated. Overall accuracy of the disambiguation algorithm was up to 92.7% on the test set. The ambiguity of human gene symbols is substantial, not only because one symbol may denote multiple genes but particularly because many symbols have other, non-gene meanings. The proposed disambiguation approach resolves most ambiguities in our test set with high accuracy, including the important gene/not a gene decisions. The algorithm is fast and scalable, enabling gene-symbol disambiguation in massive text mining applications.
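The core idea of thesaurus-based disambiguation, comparing the words around a symbol with what the thesaurus knows about each candidate meaning, including non-gene meanings, can be sketched as below. The sense profiles, the word-overlap score and the `min_overlap` threshold are assumptions for illustration; the paper builds its thesaurus from five human genetic databases and MeSH and may score senses quite differently. PSA is used because it is a genuinely ambiguous symbol (prostate-specific antigen, encoded by KLK3, versus polysialic acid).

```python
import re
from typing import Optional

# Hypothetical sense profiles: words that tend to co-occur with each meaning of a symbol.
SENSES = {
    "PSA": {
        "KLK3 (gene)": {"kallikrein", "prostate", "antigen", "klk3", "serum"},
        "not a gene: polysialic acid": {"polysialic", "acid", "ncam", "neural"},
    },
}


def disambiguate(symbol: str, text: str, min_overlap: int = 1) -> Optional[str]:
    """Pick the sense whose profile shares the most words with the text, or None if unsure."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    scores = {sense: len(profile & words) for sense, profile in SENSES.get(symbol, {}).items()}
    if not scores:
        return None  # symbol not in the thesaurus
    # Iterating senses in sorted order makes ties resolve to the alphabetically first sense.
    best = max(sorted(scores), key=scores.get)
    return best if scores[best] >= min_overlap else None
```

Returning `None` below the threshold leaves the gene/not-a-gene decision open rather than guessing, which matters when the output feeds large-scale text mining.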

Patent
25 Apr 2005
TL;DR: A method for managing fixed-area video segments from fixed-area security cameras (601) using a media asset management system (600) includes collecting the fixed-area video segments, associating corresponding geospatial data with each fixed-area video segment, and creating a search thesaurus including search descriptors with cross-references therebetween.
Abstract: A method for managing fixed-area video segments from fixed-area security cameras (601) using a media asset management system (600) includes collecting the fixed-area video segments from the fixed-area security cameras, associating corresponding geospatial data with each fixed-area video segment, and creating a search thesaurus including search descriptors with cross-references therebetween. At least one respective search descriptor from the search thesaurus is associated with each fixed-area video segment. The method further includes storing each fixed-area video segment, its geospatial data and its at least one search descriptor on the media asset management system (600) for later search and retrieval, such as by a security organization. The search descriptors may be geospatial search descriptors that are cross-referenced in a hierarchical relationship.
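The search-thesaurus step can be sketched as a descriptor hierarchy plus a lookup that also matches narrower descriptors. This is a minimal sketch under stated assumptions: descriptors are linked by narrower-term cross-references, a query for a place should also return segments tagged with any place below it, and the class names, fields and sample data are hypothetical rather than taken from the patent claims.

```python
from dataclasses import dataclass, field


@dataclass
class SearchThesaurus:
    # descriptor -> narrower descriptors (hierarchical cross-references)
    narrower: dict[str, set[str]] = field(default_factory=dict)

    def add(self, parent: str, child: str) -> None:
        self.narrower.setdefault(parent, set()).add(child)

    def descendants(self, descriptor: str) -> set[str]:
        """The descriptor plus every descriptor reachable below it (cycle-safe)."""
        seen: set[str] = set()
        stack = [descriptor]
        while stack:
            d = stack.pop()
            if d not in seen:
                seen.add(d)
                stack.extend(self.narrower.get(d, ()))
        return seen


@dataclass
class Segment:
    segment_id: str
    lat: float  # geospatial data stored with the segment
    lon: float
    descriptors: set[str]


def search(segments: list[Segment], thesaurus: SearchThesaurus, descriptor: str) -> list[str]:
    # A segment matches if any of its descriptors falls under the queried descriptor.
    wanted = thesaurus.descendants(descriptor)
    return [s.segment_id for s in segments if s.descriptors & wanted]
```

Storing only the most specific descriptor on each segment and letting the hierarchy supply the broader ones keeps tagging simple while still supporting region-level queries.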


Patent
25 Apr 2005
TL;DR: A method for managing video news segments using a media asset management system (100) includes collecting the video news segments, associating corresponding geospatial data with each video news segment, and creating a search thesaurus including search descriptors with cross-references therebetween.
Abstract: A method for managing video news segments using a media asset management system (100) includes collecting the video news segments, associating corresponding geospatial data with each video news segment, and creating a search thesaurus including search descriptors with cross-references therebetween. At least one respective search descriptor from the search thesaurus is associated with each video news segment. The method further includes storing each video news segment, its geospatial data and its at least one search descriptor on the media asset management system (100) for later search and retrieval, such as by a news broadcasting organization. The search descriptors may be geospatial search descriptors that are cross-referenced in a hierarchical relationship.


01 Jan 2005
TL;DR: A new algorithm for automatically extracting index terms from documents relating to the domain of agriculture using the domain-specific Agrovoc thesaurus developed by the FAO is described.
Abstract: This paper describes a new algorithm for automatically extracting index terms from documents relating to the domain of agriculture. The domain-specific Agrovoc thesaurus developed by the FAO is used both as a controlled vocabulary and as a knowledge base for semantic matching. The automatically assigned terms are evaluated against a manually indexed 200-item sample of the FAO’s document repository, and the performance of the new algorithm is compared with a state-of-the-art system for keyphrase extraction.
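A naive baseline for controlled-vocabulary indexing is to look up every word n-gram of a document in the thesaurus, map non-preferred labels to their preferred descriptors, and rank descriptors by frequency. The sketch below does only that, with a hypothetical vocabulary fragment; it is not the paper's algorithm, which also uses the Agrovoc thesaurus as a knowledge base for semantic matching.

```python
import re
from collections import Counter

# Hypothetical controlled-vocabulary fragment: label -> preferred descriptor.
# Preferred terms map to themselves; non-preferred labels map to their descriptor.
VOCAB = {
    "maize": "maize",
    "corn": "maize",
    "soil erosion": "soil erosion",
    "erosion of soil": "soil erosion",
    "irrigation": "irrigation",
}
MAX_N = max(len(term.split()) for term in VOCAB)


def index_terms(text: str, k: int = 3) -> list[str]:
    """Return the k most frequent descriptors whose labels occur in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    # Check every n-gram up to the longest vocabulary label.
    for n in range(1, MAX_N + 1):
        for i in range(len(words) - n + 1):
            phrase = " ".join(words[i:i + n])
            if phrase in VOCAB:
                counts[VOCAB[phrase]] += 1
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]
```

For the text "Corn yields fell where soil erosion was severe. Irrigation reduced erosion of soil in maize fields.", `index_terms` counts "corn" and "maize" as one descriptor and returns `["maize", "soil erosion", "irrigation"]`.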

Journal Article (DOI)
TL;DR: Constant temporal behavior across types of touch and low compression of the parts of the action (reflected in key-bottom contact times) were hypothesized to be indicators of instrumental quality.
Abstract: This study investigated the temporal behavior of grand piano actions from different manufacturers under different touch conditions and dynamic levels. An experimental setup consisting of accelerome ...


Journal Article (DOI)
TL;DR: A case of rhabdoid papillary meningioma with macroscopic and microscopic cysts, displaying extensive leptomeningeal dissemination after frequent local recurrence is reported.
Abstract: Papillary meningioma is a rare variant of meningioma defined by the presence of a perivascular pseudopapillary pattern. Because papillary meningioma typically displays invasion of the brain, local recurrence, and distant metastasis, it has been graded as a WHO grade III tumor [8]. Rhabdoid meningioma is a relatively newly recognized, distinct WHO grade III tumor that frequently develops on a background of other meningioma subtypes [8]. Perry et al. [9] described the first case of rhabdoid meningioma with papillary architecture resembling ependymoma in a 13-year-old girl. Subsequently, three additional cases of rhabdoid papillary meningioma have been reported [1, 4, 10]. Here, we report a case of rhabdoid papillary meningioma with macroscopic and microscopic cysts, displaying extensive leptomeningeal dissemination after frequent local recurrence. A 12-year-old girl was examined for focal motor seizures of the neck. On neurological examination, she showed double vision. Computed tomography (CT) scan showed a right frontal hypodense cystic lesion with a heterogeneously enhanced nodule and enhancement of the cyst wall (Fig. 1a). At operation, the dura mater showed no evidence of invasion. The solid tumor was located in the superficial part of the cortex and contained calcification. The border with the cerebral parenchyma was partly unclear. Gross total removal of the tumor was achieved. Xanthochromic fluid was obtained from the cyst. The cyst wall was not removed. After a histological diagnosis of anaplastic ependymoma had been established at an outside institution, the patient received a course of radiotherapy totaling 50 Gy to the surgical area. She was discharged from our hospital, without neurological deficit. However, five local recurrences were noted over the following 11 years, and the patient died of diffuse subarachnoid dissemination at the age of 25 years. 
Paraffin sections from the surgical and autopsy materials were stained with hematoxylin and eosin (HE), periodic acid-Schiff (PAS), and silver impregnation for reticulin. Other sections were immunostained using a polyclonal antibody against glial fibrillary acidic protein (GFAP; Dako, Glostrup, Denmark; 1:500), and monoclonal antibodies against vimentin (Dako Cytomation, Carpinteria, CA; 1:50), cytokeratin (AE1/AE3; Dako Cytomation; 1:50), epithelial membrane antigen (EMA; Dako Cytomation; 1:50), desmin (Dako; 1:50), α-smooth muscle actin (Dako; 1:100), synaptophysin (Boehringer, Mannheim, Germany; 1:500), neurofilament (Sanbio, Uden, The Netherlands; 1:100), BAF47/SNF5 (BD Transduction Labs, San Diego, CA; 1:250) [6] and Ki-67 (MIB-1; Dako; 1:50). Microscopic examination of the original tumor showed a sheet-like structure throughout most of the specimen. Mitotic figures and small necrotic foci were occasionally seen. In some areas, however, rhabdoid morphology was defined as sheets of loosely cohesive cells with eccentric nuclei and hyaline, paranuclear inclusions (Fig. 1b). In addition, loss of cellular cohesion led to the focal emergence of papillary architecture (Fig. 1c), accompanied by a dense network of perivascular reticulin fibers (Fig. 1d). Moreover, microcysts of various sizes or ependymal canal-like structures were found. Areas showing transition from the sheet-like structure to the microcystic areas were also evident (Fig. 1e). The wall of the microcysts showed an epithe…
K. Wakabayashi (✉) · F. Mori, Department of Neuropathology, Institute of Brain Science, Hirosaki University School of Medicine, 5 Zaifu-cho, 036-8562 Hirosaki, Japan. E-mail: koichi@cc.hirosaki-u.ac.jp Tel.: +81-172-395130 Fax: +81-172-395132


Patent
25 Apr 2005
TL;DR: A method for managing video segments from an aerial sensor platform (501) using a media asset management system (500) includes collecting the video segments, associating corresponding geospatial data with each video segment, and creating a search thesaurus including search descriptors with cross-references therebetween.
Abstract: A method for managing video segments from an aerial sensor platform (501) using a media asset management system (500) includes collecting the video segments from the aerial sensor platform, associating corresponding geospatial data with each video segment, and creating a search thesaurus including search descriptors with cross-references therebetween. At least one respective search descriptor from the search thesaurus is associated with each video segment. The method further includes storing each video segment, its geospatial data and its at least one search descriptor on the media asset management system (500) for later search and retrieval, such as by a surveillance organization. The search descriptors may be geospatial search descriptors that are cross-referenced in a hierarchical relationship.