Institution

West

About: West is known for research contributions in the topics Document retrieval and Web query classification. The organization has 67 authors who have published 85 publications receiving 3627 citations. The organization is also known as: W.

Topics: Document retrieval, Web query classification, Query optimization, Web search query, Query language

Papers

Patent

System of document representation retrieval by successive iterated probability sampling

Howard R. Turtle¹, Gerald J. Morton¹, F. Kinley Larntz¹

West¹

30 Mar 1993-Laboratory Automation & Information Management

TL;DR: An information retrieval system based on the probability that documents meet an information need; the system is iteratively adjusted across successive samples as probabilities are scored for the documents in each sample.

360 citations

Journal Article

Models of translational equivalence among words

I. Dan Melamed¹

West¹

01 Jun 2000-Computational Linguistics

TL;DR: This article presents methods for biasing statistical translation models to reflect bitext properties, and shows how a statistical translation model can take advantage of preexisting knowledge that might be available about particular language pairs.

Abstract: Parallel texts (bitexts) have properties that distinguish them from other kinds of parallel data. First, most words translate to only one other word. Second, bitext correspondence is typically only partial: many words in each text have no clear equivalent in the other text. This article presents methods for biasing statistical translation models to reflect these properties. Evaluation with respect to independent human judgments has confirmed that translation models biased in this fashion are significantly more accurate than a baseline knowledge-free model. This article also shows how a statistical translation model can take advantage of preexisting knowledge that might be available about particular language pairs. Even the simplest kinds of language-specific knowledge, such as the distinction between content words and function words, are shown to reliably boost translation model performance on some tasks. Statistical models that reflect knowledge about the model domain combine the best of both the rationalist and empiricist paradigms.

322 citations
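
The two bitext properties named in the abstract above (most words translate to only one other word, and many words have no clear equivalent) can be illustrated with a greedy one-to-one linking pass over word-pair scores. This is a hypothetical sketch, not the article's models: the function name, the toy scores, and the threshold are all assumptions made for illustration.

```python
def link_one_to_one(scores, threshold):
    """Greedily link each source word to at most one target word.

    scores: dict mapping (source_word, target_word) -> association score
            (hypothetical values; how they are estimated is out of scope).
    threshold: pairs scoring below this stay unlinked, which models
               partial correspondence.
    """
    links = {}
    used_targets = set()
    # Visit candidate pairs from the strongest association to the weakest.
    for (src, tgt), score in sorted(scores.items(), key=lambda kv: -kv[1]):
        if score < threshold:
            break  # every remaining pair is weaker still
        if src in links or tgt in used_targets:
            continue  # enforce the one-to-one bias
        links[src] = tgt
        used_targets.add(tgt)
    return links


# Toy French-English scores, invented for this example.
toy_scores = {
    ("chat", "cat"): 0.9,
    ("chat", "the"): 0.4,
    ("le", "the"): 0.8,
    ("le", "cat"): 0.3,
    ("ne", "not"): 0.1,
}
toy_links = link_one_to_one(toy_scores, threshold=0.2)
```

With these toy scores, "chat" and "le" each receive a single link, while "ne" stays unlinked because its only candidate falls below the threshold.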

Patent

Concept matching of natural language queries with a database of document concepts

Howard R. Turtle¹

West¹

08 Sep 1993

TL;DR: A computer-implemented process for creating a search query for an information retrieval system: stopwords are removed from a natural language input query, the remaining words are stemmed, and sequences of stemmed words that match phrases in a database are replaced by those phrases, so that the substituted phrases and unsubstituted stemmed words form the search query.

Abstract: A computer implemented process for creating a search query for an information retrieval system in which a database is provided containing a plurality of stopwords and phrases. A natural language input query defines the composition of the text of documents to be identified. Each word of the natural language input query is compared to the database in order to remove stopwords from the query. The remaining words of the input query are stemmed to their basic roots, and the sequence of stemmed words in the list is compared to phrases in the database to identify phrases in the search query. The phrases are substituted for the sequence of stemmed words from the list so that the remaining elements, namely the substituted phrases and unsubstituted stemmed words, form the search query. The completed search query elements are query nodes of a query network used to match representation nodes of a document network of an inference network. The database includes as options a topic and key database for finding numerical keys, and a synonym database for finding synonyms, both of which are employed in the query as query nodes.

310 citations
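
The query-construction steps described in the abstract above (stopword removal, stemming, phrase substitution) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the stopword set, the phrase table, the naive suffix-stripping stemmer, and the function names are all hypothetical, and the sketch stops before the inference-network matching step.

```python
import re

# Hypothetical sample data; the patent's actual databases are not given here.
STOPWORDS = {"a", "an", "the", "of", "for", "in", "is", "are", "what", "to"}
# Phrase table keyed by sequences of stemmed words (assumed layout).
PHRASES = {
    ("breach", "contract"): "breach of contract",
    ("bad", "faith"): "bad faith",
}


def stem(word):
    """Naive suffix stripper standing in for a real stemmer (assumption)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_query(text, stopwords, phrases):
    """Return query elements: substituted phrases and leftover stems."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Step 1: drop stopwords. Step 2: stem what remains.
    stems = [stem(t) for t in tokens if t not in stopwords]
    longest = max((len(key) for key in phrases), default=1)
    elements = []
    i = 0
    while i < len(stems):
        # Step 3: try the longest phrase match starting at position i.
        for n in range(min(longest, len(stems) - i), 1, -1):
            key = tuple(stems[i : i + n])
            if key in phrases:
                elements.append(phrases[key])
                i += n
                break
        else:
            # No phrase starts here; keep the stemmed word itself.
            elements.append(stems[i])
            i += 1
    return elements


query = build_query(
    "What are the damages for breaching a contract in bad faith?",
    STOPWORDS,
    PHRASES,
)
```

In this toy run the stems "breach" and "contract" are replaced by one phrase element, as are "bad" and "faith", while "damag" remains as an unsubstituted stem.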

Patent

Method and apparatus for information retrieval from a database by replacing domain specific stemmed phases in a natural language to create a search query

Howard R. Turtle¹

West¹

08 Oct 1991

TL;DR: A computer-implemented process for creating a search query for an information retrieval system: stopwords are removed from a natural language input query, the remaining words are stemmed, and sequences of stemmed words that match phrases in a database are replaced by those phrases, so that the substituted phrases and unsubstituted stemmed words form the search query.

Abstract: A computer implemented process for creating a search query for an information retrieval system in which a database is provided containing a plurality of stopwords and phrases. A natural language input query defines the composition of the text of documents to be identified. Each word of the natural language input query is compared to the database in order to remove stopwords from the query. The remaining words of the input query are stemmed to their basic roots, and the sequence of stemmed words in the list is compared to phrases in the database to identify phrases in the search query. The phrases are substituted for the sequence of stemmed words from the list so that the remaining elements, namely the substituted phrases and unsubstituted stemmed words, form the search query. The completed search query elements are query nodes of a query network used to match representation nodes of a document network of an inference network. The database includes as options a topic and key database for finding numerical keys, and a synonym database for finding synonyms, both of which are employed in the query as query nodes.

261 citations

Journal Article

Bitext maps and alignment via pattern recognition

I. Dan Melamed¹

West¹

01 Mar 1999-Computational Linguistics

TL;DR: This article advances the state of the art of bitext mapping by formulating the problem in terms of pattern recognition and presenting the Smooth Injective Map Recognizer (SIMR) algorithm, which has produced bitext maps for over 200 megabytes of French-English bitexts.

Abstract: Texts that are available in two languages (bitexts) are becoming more and more plentiful, both in private data warehouses and on publicly accessible sites on the World Wide Web. As with other kinds of data, the value of bitexts largely depends on the efficacy of the available data mining tools. The first step in extracting useful information from bitexts is to find corresponding words and/or text segment boundaries in their two halves (bitext maps). This article advances the state of the art of bitext mapping by formulating the problem in terms of pattern recognition. From this point of view, the success of a bitext mapping algorithm hinges on how well it performs three tasks: signal generation, noise filtering, and search. The Smooth Injective Map Recognizer (SIMR) algorithm presented here integrates innovative approaches to each of these tasks. Objective evaluation has shown that SIMR's accuracy is consistently high for language pairs as diverse as French/English and Korean/English. If necessary, SIMR's bitext maps can be efficiently converted into segment alignments using the Geometric Segment Alignment (GSA) algorithm, which is also presented here. SIMR has produced bitext maps for over 200 megabytes of French-English bitexts. GSA has converted these maps into alignments. Both the maps and the alignments are available from the Linguistic Data Consortium.

216 citations

Authors

Showing all 67 results

Name	H-index	Papers	Citations
I. Dan Melamed	26	58	2720
Howard R. Turtle	23	33	4099
Khalid Al-Kofahi	19	38	1135
Jack G. Conrad	16	43	987
D. Kotresha	14	32	678
Isabelle Moulinier	12	24	790
Christopher C. Dozier	11	26	464
Peter Jackson	9	11	251
R Delaney	6	7	161
Craig Runde	5	5	201
Trace Liggett	4	8	113
John F. Ludemann	3	4	207
Paul Thompson	3	3	81
Piyush Kumar Pareek	3	16	31
Clifford L. Pyle	3	4	158