Author

Salim Roukos

Other affiliations: Nuance Communications
Bio: Salim Roukos is an academic researcher at IBM. He has contributed to research topics including parsing and language modeling, has an h-index of 46, and has co-authored 144 publications receiving 24,093 citations. Previous affiliations of Salim Roukos include Nuance Communications.


Papers
Proceedings ArticleDOI
06 Jul 2002
TL;DR: This paper proposes a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, correlates highly with human evaluation, and has little marginal cost per run.
Abstract: Human evaluations of machine translation are extensive but expensive. Human evaluations can take months to finish and involve human labor that can not be reused. We propose a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run. We present this method as an automated understudy to skilled human judges which substitutes for them when there is need for quick or frequent evaluations.

21,126 citations
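The method described above, BLEU, scores a candidate translation by its modified n-gram precision against reference translations, scaled by a brevity penalty. Below is a minimal sentence-level sketch of that scoring, an illustrative reimplementation rather than the authors' code (real BLEU is computed at the corpus level):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, references, max_n=4):
    """Geometric mean of modified n-gram precisions times a brevity penalty."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        # Clip each n-gram count by its maximum count in any single reference.
        max_ref = Counter()
        for ref in references:
            for gram, c in Counter(ngrams(ref, n)).items():
                max_ref[gram] = max(max_ref[gram], c)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        log_precisions.append(math.log(max(clipped, 1e-9) / total))
    # Brevity penalty: penalize candidates shorter than the closest reference.
    ref_len = min((len(r) for r in references),
                  key=lambda rl: (abs(rl - len(candidate)), rl))
    bp = 1.0 if len(candidate) > ref_len else math.exp(1 - ref_len / max(len(candidate), 1))
    return bp * math.exp(sum(log_precisions) / max_n)

print(bleu("the cat sat on the mat".split(),
           ["the cat is on the mat".split()]))
```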

Proceedings ArticleDOI
19 Feb 1991
TL;DR: The problem of quantitatively comparing the performance of different broad-coverage grammars of English has to date resisted solution, as the authors discuss.
Abstract: The problem of quantitatively comparing the performance of different broad-coverage grammars of English has to date resisted solution. Prima facie, known English grammars appear to disagree strongly with each other as to the elements of even the simplest sentences. For instance, the grammars of Steve Abney (Bellcore), Ezra Black (IBM), Dan Flickinger (Hewlett Packard), Claudia Gdaniec (Logos), Ralph Grishman and Tomek Strzalkowski (NYU), Phil Harrison (Boeing), Don Hindle (AT&T), Bob Ingria (BBN), and Mitch Marcus (U. of Pennsylvania) recognize in common only the following constituents, when each grammarian provides the single parse which he/she would ideally want his/her grammar to specify for three sample Brown Corpus sentences:
The famed Yankee Clipper, now retired, has been assisting (as (a batting coach)).
One of those capital-gains ventures, in fact, has saddled him (with Gore Court).
He said this constituted a (very serious) misuse (of the (Criminal court) processes).

434 citations
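One way to make the comparison concrete: reduce each grammar's ideal parse to a set of constituent spans and intersect those sets across grammars. The spans below are hypothetical and the paper's actual methodology is richer, but the sketch shows the core operation:

```python
# Each parse is reduced to a set of (start, end) constituent spans over
# the token sequence; agreement across grammars is the span intersection.
def common_constituents(parses):
    spans = [set(p) for p in parses]
    return set.intersection(*spans)

# Hypothetical spans produced by three grammars for one sentence.
grammar_a = [(0, 3), (4, 6), (0, 6)]
grammar_b = [(0, 3), (2, 6), (0, 6)]
grammar_c = [(0, 3), (4, 6), (5, 6), (0, 6)]

print(sorted(common_constituents([grammar_a, grammar_b, grammar_c])))
# -> [(0, 3), (0, 6)]
```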

Patent
25 Nov 1998
TL;DR: This patent describes a system for conversant interaction that includes a recognizer for receiving and processing input information and outputting a recognized representation of it. A dialog manager coupled to the recognizer holds task-oriented forms for associating user input information with them, and selects an applicable form from the task-oriented forms in response to the input information by scoring the forms relative to each other.
Abstract: A system for conversant interaction includes a recognizer for receiving and processing input information and outputting a recognized representation of the input information. A dialog manager is coupled to the recognizer for receiving the recognized representation of the input information, the dialog manager having task-oriented forms for associating user input information therewith, the dialog manager being capable of selecting an applicable form from the task-oriented forms responsive to the input information by scoring the forms relative to each other. A synthesizer is employed for converting a response generated by the dialog manager to output the response. A program storage device and method are also provided.

397 citations
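The form-selection idea can be sketched as follows: each task-oriented form declares the slots it expects, and the form that best matches the recognized input is selected. The slot names and the overlap score below are illustrative stand-ins, not taken from the patent text:

```python
# Illustrative form scoring: each task-oriented form lists the slots it
# expects; the form matching the most recognized slots is selected.
FORMS = {
    "book_flight": {"origin", "destination", "date"},
    "check_weather": {"location", "date"},
    "set_alarm": {"time"},
}

def select_form(recognized_slots):
    # Score the forms relative to each other by slot overlap (a stand-in
    # for whatever scoring rule the patented system actually uses).
    return max(FORMS, key=lambda name: len(FORMS[name] & recognized_slots))

print(select_form({"destination", "date"}))  # -> book_flight
```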

Proceedings ArticleDOI
M. Epstein, Kishore Papineni, Salim Roukos, T. Ward, S. Della Pietra
07 May 1996
TL;DR: A new approach to natural language understanding (NLU) based on the source-channel paradigm is presented, and it is applied to ARPA's Air Travel Information Service (ATIS) domain.
Abstract: We present a new approach to natural language understanding (NLU) based on the source-channel paradigm, and apply it to ARPA's Air Travel Information Service (ATIS) domain. The model uses techniques similar to those used by IBM in statistical machine translation. The parameters are trained using the exact match algorithm; a hierarchy of models is used to facilitate the bootstrapping of more complex models from simpler models.

316 citations
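Under the source-channel paradigm, understanding is decoding: choose the meaning M that maximizes P(M) · P(W | M) for the observed words W, just as statistical MT decodes a source sentence from its "noisy" translation. A toy decoder under that formulation (the intent labels and probabilities are invented for illustration; the paper trains real channel models):

```python
import math

# Toy source-channel decoding: choose the meaning M maximizing
# P(M) * P(W | M). All numbers below are made up for illustration.
prior = {"LIST_FLIGHTS": 0.6, "LIST_FARES": 0.4}          # P(M)
channel = {                                                # P(word | M)
    "LIST_FLIGHTS": {"show": 0.3, "flights": 0.5, "fares": 0.05},
    "LIST_FARES":   {"show": 0.3, "flights": 0.05, "fares": 0.5},
}

def decode(words):
    def log_posterior(m):
        lp = math.log(prior[m])
        for w in words:
            lp += math.log(channel[m].get(w, 1e-6))  # crude smoothing
        return lp
    return max(prior, key=log_posterior)

print(decode(["show", "flights"]))  # -> LIST_FLIGHTS
```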

Proceedings ArticleDOI
27 Apr 1993
TL;DR: The model described here was trained on five million words of Wall Street Journal text and produced a language model whose perplexity was 12% lower than that of a conventional trigram, as measured on independent data.
Abstract: Ongoing efforts at adaptive statistical language modeling are described. To extract information from the document history, trigger pairs are used as the basic information-bearing elements. To combine statistical evidence from multiple triggers, the principle of maximum entropy (ME) is used. To combine the trigger-based model with the static model, the latter is absorbed into the ME formalism. Given consistent statistical evidence, a unique ME solution is guaranteed to exist, and an iterative algorithm exists which is guaranteed to converge to it. Among the advantages of this approach are its simplicity, generality, and incremental nature. Among its disadvantages are its computational requirements. The model described here was trained on five million words of Wall Street Journal text. It used some 40,000 unigram constraints, 200,000 bigram constraints, 200,000 trigram constraints, and 60,000 trigger constraints. After 13 iterations, it produced a language model whose perplexity was 12% lower than that of a conventional trigram, as measured on independent data.

296 citations
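The trigger idea can be illustrated with a small log-linear model: words seen earlier in the document activate features that raise the probability of related next words, and the distribution is renormalized. A sketch with invented weights (the paper fits its constraints by maximum-entropy training rather than hand-setting them, and absorbs a real trigram model rather than the uniform baseline used here):

```python
import math

# Sketch of a log-linear next-word model with trigger features: a word
# seen earlier in the history ("trigger") boosts a related word ("target").
VOCAB = ["loan", "bank", "river", "the"]
TRIGGERS = {("loan", "bank"): 1.2, ("river", "bank"): 0.9}  # invented weights

def next_word_probs(history, base_logprob):
    scores = {}
    for w in VOCAB:
        s = base_logprob(w)  # static model, absorbed as one more feature
        for h in set(history):
            s += TRIGGERS.get((h, w), 0.0)  # active trigger constraints
        scores[w] = math.exp(s)
    z = sum(scores.values())  # renormalize to a proper distribution
    return {w: v / z for w, v in scores.items()}

uniform = lambda w: math.log(1.0 / len(VOCAB))
print(next_word_probs(["loan", "the"], uniform)["bank"])  # boosted by "loan"
```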


Cited by

Proceedings Article
08 Dec 2014
TL;DR: The authors used a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector.
Abstract: Deep Neural Networks (DNNs) are powerful models that have achieved excellent performance on difficult learning tasks. Although DNNs work well whenever large labeled training sets are available, they cannot be used to map sequences to sequences. In this paper, we present a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure. Our method uses a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector. Our main result is that on an English to French translation task from the WMT-14 dataset, the translations produced by the LSTM achieve a BLEU score of 34.8 on the entire test set, where the LSTM's BLEU score was penalized on out-of-vocabulary words. Additionally, the LSTM did not have difficulty on long sentences. For comparison, a phrase-based SMT system achieves a BLEU score of 33.3 on the same dataset. When we used the LSTM to rerank the 1000 hypotheses produced by the aforementioned SMT system, its BLEU score increases to 36.5, which is close to the previous state of the art. The LSTM also learned sensible phrase and sentence representations that are sensitive to word order and are relatively invariant to the active and the passive voice. Finally, we found that reversing the order of the words in all source sentences (but not target sentences) improved the LSTM's performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier.

12,299 citations
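A minimal single-layer sketch of the encoder-decoder architecture, assuming PyTorch (the paper used deep multilayer LSTMs and beam-search decoding; the dimensions here are illustrative):

```python
import torch
import torch.nn as nn

# One LSTM encodes the (reversed) source into its final state; another
# LSTM decodes the target sequence starting from that state.
class Seq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, dim=64):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True)
        self.decoder = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, tgt_vocab)

    def forward(self, src, tgt):
        # Reverse the source words, the trick the paper found eases optimization.
        src = torch.flip(src, dims=[1])
        _, state = self.encoder(self.src_emb(src))
        dec_out, _ = self.decoder(self.tgt_emb(tgt), state)
        return self.out(dec_out)  # logits over the target vocabulary

model = Seq2Seq(src_vocab=1000, tgt_vocab=1000)
src = torch.randint(0, 1000, (2, 7))   # batch of 2 source sentences
tgt = torch.randint(0, 1000, (2, 5))   # teacher-forced target prefix
print(model(src, tgt).shape)           # torch.Size([2, 5, 1000])
```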

Proceedings Article
25 Jul 2004
TL;DR: Four different ROUGE measures are introduced: ROUGE-N, ROUGE-L, ROUGE-W, and ROUGE-S, which are included in the ROUGE summarization evaluation package, together with their evaluations.
Abstract: ROUGE stands for Recall-Oriented Understudy for Gisting Evaluation. It includes measures to automatically determine the quality of a summary by comparing it to other (ideal) summaries created by humans. The measures count the number of overlapping units such as n-gram, word sequences, and word pairs between the computer-generated summary to be evaluated and the ideal summaries created by humans. This paper introduces four different ROUGE measures: ROUGE-N, ROUGE-L, ROUGE-W, and ROUGE-S included in the ROUGE summarization evaluation package and their evaluations. Three of them have been used in the Document Understanding Conference (DUC) 2004, a large-scale summarization evaluation sponsored by NIST.

9,293 citations
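ROUGE-N, the simplest of these measures, is recall-oriented n-gram overlap: the number of reference n-grams matched by the candidate summary, divided by the total number of reference n-grams. A small sketch, as an illustrative reimplementation rather than the ROUGE package itself:

```python
from collections import Counter

def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, reference, n=2):
    """Matched reference n-grams divided by total reference n-grams."""
    cand = ngram_counts(candidate, n)
    ref = ngram_counts(reference, n)
    overlap = sum(min(c, cand[g]) for g, c in ref.items())
    return overlap / max(sum(ref.values()), 1)

print(rouge_n("the cat was found under the bed".split(),
              "the cat was under the bed".split(), n=2))  # -> 0.8
```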

Proceedings ArticleDOI
17 Aug 2015
TL;DR: A global approach which always attends to all source words and a local one that only looks at a subset of source words at a time are examined, demonstrating the effectiveness of both approaches on the WMT translation tasks between English and German in both directions.
Abstract: An attentional mechanism has lately been used to improve neural machine translation (NMT) by selectively focusing on parts of the source sentence during translation. However, there has been little work exploring useful architectures for attention-based NMT. This paper examines two simple and effective classes of attentional mechanism: a global approach which always attends to all source words and a local one that only looks at a subset of source words at a time. We demonstrate the effectiveness of both approaches on the WMT translation tasks between English and German in both directions. With local attention, we achieve a significant gain of 5.0 BLEU points over non-attentional systems that already incorporate known techniques such as dropout. Our ensemble model using different attention architectures yields a new state-of-the-art result in the WMT'15 English to German translation task with 25.9 BLEU points, an improvement of 1.0 BLEU points over the existing best system backed by NMT and an n-gram reranker.

8,055 citations
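In its simplest "dot" form, the global variant scores every encoder state against the current decoder state and takes the softmax-weighted sum as the context vector. A sketch assuming PyTorch (dimensions are illustrative; the local variant would restrict this to a window of source positions):

```python
import torch
import torch.nn.functional as F

# Global attention with dot-product scoring: attend to all source
# positions at each decoder step.
def global_attention(decoder_state, encoder_states):
    # decoder_state: (batch, dim); encoder_states: (batch, src_len, dim)
    scores = torch.bmm(encoder_states, decoder_state.unsqueeze(2))  # dot scores
    weights = F.softmax(scores.squeeze(2), dim=1)                   # (batch, src_len)
    context = torch.bmm(weights.unsqueeze(1), encoder_states).squeeze(1)
    return context, weights

enc = torch.randn(2, 7, 64)   # 7 source positions, hidden size 64
dec = torch.randn(2, 64)
context, weights = global_attention(dec, enc)
print(context.shape, weights.shape)  # torch.Size([2, 64]) torch.Size([2, 7])
```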