Institution

West

About: West is known for research contributions in the topics Document retrieval and Web query classification. The organization has 67 authors who have published 85 publications receiving 3627 citations. The organization is also known as: W.


Papers
Patent
TL;DR: An information retrieval system based on the probabilities that documents meet information needs; the probabilities are iteratively adjusted as documents in samples are scored.

360 citations

Journal Article
I. Dan Melamed
TL;DR: This article presents methods for biasing statistical translation models to reflect bitext properties, and shows how a statistical translation model can take advantage of preexisting knowledge that might be available about particular language pairs.
Abstract: Parallel texts (bitexts) have properties that distinguish them from other kinds of parallel data. First, most words translate to only one other word. Second, bitext correspondence is typically only partial: many words in each text have no clear equivalent in the other text. This article presents methods for biasing statistical translation models to reflect these properties. Evaluation with respect to independent human judgments has confirmed that translation models biased in this fashion are significantly more accurate than a baseline knowledge-free model. This article also shows how a statistical translation model can take advantage of preexisting knowledge that might be available about particular language pairs. Even the simplest kinds of language-specific knowledge, such as the distinction between content words and function words, are shown to reliably boost translation model performance on some tasks. Statistical models that reflect knowledge about the model domain combine the best of both the rationalist and empiricist paradigms.

322 citations

Patent
Howard R. Turtle
08 Sep 1993
TL;DR: A computer-implemented process for creating a search query for an information retrieval system, using a database of stopwords and phrases. Stopwords are removed from a natural language query, the remaining words are stemmed, and phrases found in the database are substituted for the matching sequences of stemmed words, so that the substituted phrases and unsubstituted stemmed words form the search query.
Abstract: A computer implemented process for creating a search query for an information retrieval system in which a database is provided containing a plurality of stopwords and phrases. A natural language input query defines the composition of the text of documents to be identified. Each word of the natural language input query is compared to the database in order to remove stopwords from the query. The remaining words of the input query are stemmed to their basic roots, and the sequence of stemmed words in the list is compared to phrases in the database to identify phrases in the search query. The phrases are substituted for the sequence of stemmed words from the list so that the remaining elements, namely the substituted phrases and unsubstituted stemmed words, form the search query. The completed search query elements are query nodes of a query network used to match representation nodes of a document network of an inference network. The database includes as options a topic and key database for finding numerical keys, and a synonym database for finding synonyms, both of which are employed in the query as query nodes.

310 citations
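
The query-construction steps named in the abstract above (stopword removal, stemming, phrase substitution) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the stopword list, the phrase table, the crude suffix-stripping stemmer, and the names `toy_stem` and `build_query` are hypothetical and are not taken from the patent. The inference network and the optional key and synonym databases mentioned in the abstract are not modeled.

```python
# Illustrative sketch only. The word lists, the toy stemmer, and all names
# here are assumptions for demonstration, not the patented implementation.

# Hypothetical stopword database.
STOPWORDS = {"what", "are", "is", "the", "a", "an", "of", "on", "for", "in"}

# Hypothetical phrase database, keyed by sequences of stemmed words.
PHRASES = {
    ("product", "liability"): "product liability",
}


def toy_stem(word: str) -> str:
    """Strip a few common suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_query(text: str) -> list[str]:
    """Turn a natural language query into a list of query elements."""
    # Step 1: drop every word found in the stopword database.
    words = [w for w in text.lower().split() if w not in STOPWORDS]
    # Step 2: stem the remaining words.
    stems = [toy_stem(w) for w in words]
    # Step 3: replace stem sequences that match a known phrase,
    # preferring the longest match at each position.
    max_len = max((len(p) for p in PHRASES), default=1)
    query, i = [], 0
    while i < len(stems):
        for n in range(min(max_len, len(stems) - i), 1, -1):
            candidate = tuple(stems[i : i + n])
            if candidate in PHRASES:
                query.append(PHRASES[candidate])
                i += n
                break
        else:
            # No phrase starts here; keep the unsubstituted stem.
            query.append(stems[i])
            i += 1
    return query


print(build_query("What are the limits on product liability claims"))
# → ['limit', 'product liability', 'claim']
```

In this sketch the returned list plays the role of the completed search query elements; in the system described by the abstract, those elements would become query nodes matched against a document network.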

Patent
Howard R. Turtle
08 Oct 1991
TL;DR: A computer-implemented process for creating a search query for an information retrieval system, using a database of stopwords and phrases. Stopwords are removed from a natural language query, the remaining words are stemmed, and phrases found in the database are substituted for the matching sequences of stemmed words, so that the substituted phrases and unsubstituted stemmed words form the search query.
Abstract: A computer implemented process for creating a search query for an information retrieval system in which a database is provided containing a plurality of stopwords and phrases. A natural language input query defines the composition of the text of documents to be identified. Each word of the natural language input query is compared to the database in order to remove stopwords from the query. The remaining words of the input query are stemmed to their basic roots, and the sequence of stemmed words in the list is compared to phrases in the database to identify phrases in the search query. The phrases are substituted for the sequence of stemmed words from the list so that the remaining elements, namely the substituted phrases and unsubstituted stemmed words, form the search query. The completed search query elements are query nodes of a query network used to match representation nodes of a document network of an inference network. The database includes as options a topic and key database for finding numerical keys, and a synonym database for finding synonyms, both of which are employed in the query as query nodes.

261 citations

Journal Article
I. Dan Melamed
TL;DR: This article advances the state of the art of bitext mapping by formulating the problem in terms of pattern recognition and presenting the Smooth Injective Map Recognizer (SIMR) algorithm, which has produced bitext maps for over 200 megabytes of French-English bitexts.
Abstract: Texts that are available in two languages (bitexts) are becoming more and more plentiful, both in private data warehouses and on publicly accessible sites on the World Wide Web. As with other kinds of data, the value of bitexts largely depends on the efficacy of the available data mining tools. The first step in extracting useful information from bitexts is to find corresponding words and/or text segment boundaries in their two halves (bitext maps). This article advances the state of the art of bitext mapping by formulating the problem in terms of pattern recognition. From this point of view, the success of a bitext mapping algorithm hinges on how well it performs three tasks: signal generation, noise filtering, and search. The Smooth Injective Map Recognizer (SIMR) algorithm presented here integrates innovative approaches to each of these tasks. Objective evaluation has shown that SIMR's accuracy is consistently high for language pairs as diverse as French/English and Korean/English. If necessary, SIMR's bitext maps can be efficiently converted into segment alignments using the Geometric Segment Alignment (GSA) algorithm, which is also presented here. SIMR has produced bitext maps for over 200 megabytes of French-English bitexts. GSA has converted these maps into alignments. Both the maps and the alignments are available from the Linguistic Data Consortium.

216 citations


Network Information
Related Institutions (5)
University of Lisbon
48.5K papers, 1.1M citations

64% related

The Chinese University of Hong Kong
93.6K papers, 3M citations

64% related

University of Catania
41.1K papers, 1M citations

64% related

University of Bologna
115.1K papers, 3.4M citations

64% related

University of Pisa
73.1K papers, 2.1M citations

64% related

Performance
Metrics
No. of papers from the Institution in previous years
Year  Papers
2021  2
2020  1
2019  3
2018  6
2017  3
2016  2