Author

Salim Roukos

Other affiliations: Nuance Communications
Bio: Salim Roukos is an academic researcher at IBM. He has contributed to research topics including parsing and language modeling, has an h-index of 46, and has co-authored 144 publications receiving 24,093 citations. Previous affiliations of Salim Roukos include Nuance Communications.


Papers
Proceedings ArticleDOI
06 Jul 2002
TL;DR: This paper proposes a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, correlates highly with human evaluation, and has little marginal cost per run.
Abstract: Human evaluations of machine translation are extensive but expensive. Human evaluations can take months to finish and involve human labor that can not be reused. We propose a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run. We present this method as an automated understudy to skilled human judges which substitutes for them when there is need for quick or frequent evaluations.

21,126 citations
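The method described above, BLEU, scores a candidate translation by its modified n-gram precision against reference translations, scaled by a brevity penalty. Below is a minimal sentence-level sketch of that scoring, an illustrative reimplementation rather than the authors' code (real BLEU is computed at the corpus level):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, references, max_n=4):
    """Geometric mean of modified n-gram precisions times a brevity penalty."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        # Clip each n-gram count by its maximum count in any single reference.
        max_ref = Counter()
        for ref in references:
            for gram, c in Counter(ngrams(ref, n)).items():
                max_ref[gram] = max(max_ref[gram], c)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        log_precisions.append(math.log(max(clipped, 1e-9) / total))
    # Brevity penalty: penalize candidates shorter than the closest reference.
    ref_len = min((len(r) for r in references),
                  key=lambda rl: (abs(rl - len(candidate)), rl))
    bp = 1.0 if len(candidate) > ref_len else math.exp(1 - ref_len / max(len(candidate), 1))
    return bp * math.exp(sum(log_precisions) / max_n)

print(bleu("the cat sat on the mat".split(),
           ["the cat is on the mat".split()]))
```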

Proceedings ArticleDOI
19 Feb 1991
TL;DR: The problem of quantitatively comparing the performance of different broad-coverage grammars of English has to date resisted solution, as the authors discuss.
Abstract: The problem of quantitatively comparing the performance of different broad-coverage grammars of English has to date resisted solution. Prima facie, known English grammars appear to disagree strongly with each other as to the elements of even the simplest sentences. For instance, the grammars of Steve Abney (Bellcore), Ezra Black (IBM), Dan Flickinger (Hewlett Packard), Claudia Gdaniec (Logos), Ralph Grishman and Tomek Strzalkowski (NYU), Phil Harrison (Boeing), Don Hindle (AT&T), Bob Ingria (BBN), and Mitch Marcus (U. of Pennsylvania) recognize in common only the following constituents, when each grammarian provides the single parse which he/she would ideally want his/her grammar to specify for three sample Brown Corpus sentences:
The famed Yankee Clipper, now retired, has been assisting (as (a batting coach)).
One of those capital-gains ventures, in fact, has saddled him (with Gore Court).
He said this constituted a (very serious) misuse (of the (Criminal court) processes).

434 citations
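One way to make the comparison concrete: reduce each grammar's ideal parse to a set of constituent spans and intersect those sets across grammars. The spans below are hypothetical and the paper's actual methodology is richer, but the sketch shows the core operation:

```python
# Each parse is reduced to a set of (start, end) constituent spans over
# the token sequence; agreement across grammars is the span intersection.
def common_constituents(parses):
    spans = [set(p) for p in parses]
    return set.intersection(*spans)

# Hypothetical spans produced by three grammars for one sentence.
grammar_a = [(0, 3), (4, 6), (0, 6)]
grammar_b = [(0, 3), (2, 6), (0, 6)]
grammar_c = [(0, 3), (4, 6), (5, 6), (0, 6)]

print(sorted(common_constituents([grammar_a, grammar_b, grammar_c])))
# -> [(0, 3), (0, 6)]
```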

Patent
25 Nov 1998
TL;DR: This patent describes a system for conversant interaction that includes a recognizer for receiving and processing input information and outputting a recognized representation of it. A dialog manager coupled to the recognizer holds task-oriented forms for associating user input information with them, and selects an applicable form from the task-oriented forms in response to the input information by scoring the forms relative to each other.
Abstract: A system for conversant interaction includes a recognizer for receiving and processing input information and outputting a recognized representation of the input information. A dialog manager is coupled to the recognizer for receiving the recognized representation of the input information, the dialog manager having task-oriented forms for associating user input information therewith, the dialog manager being capable of selecting an applicable form from the task-oriented forms responsive to the input information by scoring the forms relative to each other. A synthesizer is employed for converting a response generated by the dialog manager to output the response. A program storage device and method are also provided.

397 citations
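The form-selection idea can be sketched as follows: each task-oriented form declares the slots it expects, and the form that best matches the recognized input is selected. The slot names and the overlap score below are illustrative stand-ins, not taken from the patent text:

```python
# Illustrative form scoring: each task-oriented form lists the slots it
# expects; the form matching the most recognized slots is selected.
FORMS = {
    "book_flight": {"origin", "destination", "date"},
    "check_weather": {"location", "date"},
    "set_alarm": {"time"},
}

def select_form(recognized_slots):
    # Score the forms relative to each other by slot overlap (a stand-in
    # for whatever scoring rule the patented system actually uses).
    return max(FORMS, key=lambda name: len(FORMS[name] & recognized_slots))

print(select_form({"destination", "date"}))  # -> book_flight
```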

Proceedings ArticleDOI
M. Epstein, Kishore Papineni, Salim Roukos, T. Ward, S. Della Pietra
07 May 1996
TL;DR: A new approach to natural language understanding (NLU) based on the source-channel paradigm is presented, and it is applied to ARPA's Air Travel Information Service (ATIS) domain.
Abstract: We present a new approach to natural language understanding (NLU) based on the source-channel paradigm, and apply it to ARPA's Air Travel Information Service (ATIS) domain. The model uses techniques similar to those used by IBM in statistical machine translation. The parameters are trained using the exact match algorithm; a hierarchy of models is used to facilitate the bootstrapping of more complex models from simpler models.

316 citations
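Under the source-channel paradigm, understanding is decoding: choose the meaning M that maximizes P(M) · P(W | M) for the observed words W, just as statistical MT decodes a source sentence from its "noisy" translation. A toy decoder under that formulation (the intent labels and probabilities are invented for illustration; the paper trains real channel models):

```python
import math

# Toy source-channel decoding: choose the meaning M maximizing
# P(M) * P(W | M). All numbers below are made up for illustration.
prior = {"LIST_FLIGHTS": 0.6, "LIST_FARES": 0.4}          # P(M)
channel = {                                                # P(word | M)
    "LIST_FLIGHTS": {"show": 0.3, "flights": 0.5, "fares": 0.05},
    "LIST_FARES":   {"show": 0.3, "flights": 0.05, "fares": 0.5},
}

def decode(words):
    def log_posterior(m):
        lp = math.log(prior[m])
        for w in words:
            lp += math.log(channel[m].get(w, 1e-6))  # crude smoothing
        return lp
    return max(prior, key=log_posterior)

print(decode(["show", "flights"]))  # -> LIST_FLIGHTS
```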

Proceedings ArticleDOI
27 Apr 1993
TL;DR: The model described here was trained on five million words of Wall Street Journal text and produced a language model whose perplexity was 12% lower than that of a conventional trigram, as measured on independent data.
Abstract: Ongoing efforts at adaptive statistical language modeling are described. To extract information from the document history, trigger pairs are used as the basic information-bearing elements. To combine statistical evidence from multiple triggers, the principle of maximum entropy (ME) is used. To combine the trigger-based model with the static model, the latter is absorbed into the ME formalism. Given consistent statistical evidence, a unique ME solution is guaranteed to exist, and an iterative algorithm exists which is guaranteed to converge to it. Among the advantages of this approach are its simplicity, generality, and incremental nature. Among its disadvantages are its computational requirements. The model described here was trained on five million words of Wall Street Journal text. It used some 40,000 unigram constraints, 200,000 bigram constraints, 200,000 trigram constraints, and 60,000 trigger constraints. After 13 iterations, it produced a language model whose perplexity was 12% lower than that of a conventional trigram, as measured on independent data.

296 citations
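The trigger idea can be illustrated with a small log-linear model: words seen earlier in the document activate features that raise the probability of related next words, and the distribution is renormalized. A sketch with invented weights (the paper fits its constraints by maximum-entropy training rather than hand-setting them, and absorbs a real trigram model rather than the uniform baseline used here):

```python
import math

# Sketch of a log-linear next-word model with trigger features: a word
# seen earlier in the history ("trigger") boosts a related word ("target").
VOCAB = ["loan", "bank", "river", "the"]
TRIGGERS = {("loan", "bank"): 1.2, ("river", "bank"): 0.9}  # invented weights

def next_word_probs(history, base_logprob):
    scores = {}
    for w in VOCAB:
        s = base_logprob(w)  # static model, absorbed as one more feature
        for h in set(history):
            s += TRIGGERS.get((h, w), 0.0)  # active trigger constraints
        scores[w] = math.exp(s)
    z = sum(scores.values())  # renormalize to a proper distribution
    return {w: v / z for w, v in scores.items()}

uniform = lambda w: math.log(1.0 / len(VOCAB))
print(next_word_probs(["loan", "the"], uniform)["bank"])  # boosted by "loan"
```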


Cited by

Proceedings Article
08 Dec 2014
TL;DR: The authors used a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector.
Abstract: Deep Neural Networks (DNNs) are powerful models that have achieved excellent performance on difficult learning tasks. Although DNNs work well whenever large labeled training sets are available, they cannot be used to map sequences to sequences. In this paper, we present a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure. Our method uses a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector. Our main result is that on an English to French translation task from the WMT-14 dataset, the translations produced by the LSTM achieve a BLEU score of 34.8 on the entire test set, where the LSTM's BLEU score was penalized on out-of-vocabulary words. Additionally, the LSTM did not have difficulty on long sentences. For comparison, a phrase-based SMT system achieves a BLEU score of 33.3 on the same dataset. When we used the LSTM to rerank the 1000 hypotheses produced by the aforementioned SMT system, its BLEU score increases to 36.5, which is close to the previous state of the art. The LSTM also learned sensible phrase and sentence representations that are sensitive to word order and are relatively invariant to the active and the passive voice. Finally, we found that reversing the order of the words in all source sentences (but not target sentences) improved the LSTM's performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier.

12,299 citations
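A minimal single-layer sketch of the encoder-decoder architecture, assuming PyTorch (the paper used deep multilayer LSTMs and beam-search decoding; the dimensions here are illustrative):

```python
import torch
import torch.nn as nn

# One LSTM encodes the (reversed) source into its final state; another
# LSTM decodes the target sequence starting from that state.
class Seq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, dim=64):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True)
        self.decoder = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, tgt_vocab)

    def forward(self, src, tgt):
        # Reverse the source words, the trick the paper found eases optimization.
        src = torch.flip(src, dims=[1])
        _, state = self.encoder(self.src_emb(src))
        dec_out, _ = self.decoder(self.tgt_emb(tgt), state)
        return self.out(dec_out)  # logits over the target vocabulary

model = Seq2Seq(src_vocab=1000, tgt_vocab=1000)
src = torch.randint(0, 1000, (2, 7))   # batch of 2 source sentences
tgt = torch.randint(0, 1000, (2, 5))   # teacher-forced target prefix
print(model(src, tgt).shape)           # torch.Size([2, 5, 1000])
```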

Proceedings Article
25 Jul 2004
TL;DR: Four different ROUGE measures are introduced: ROUGE-N, ROUGE-L, ROUGE-W, and ROUGE-S, which are included in the ROUGE summarization evaluation package, together with their evaluations.
Abstract: ROUGE stands for Recall-Oriented Understudy for Gisting Evaluation. It includes measures to automatically determine the quality of a summary by comparing it to other (ideal) summaries created by humans. The measures count the number of overlapping units such as n-gram, word sequences, and word pairs between the computer-generated summary to be evaluated and the ideal summaries created by humans. This paper introduces four different ROUGE measures: ROUGE-N, ROUGE-L, ROUGE-W, and ROUGE-S included in the ROUGE summarization evaluation package and their evaluations. Three of them have been used in the Document Understanding Conference (DUC) 2004, a large-scale summarization evaluation sponsored by NIST.

9,293 citations
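ROUGE-N, the simplest of these measures, is recall-oriented n-gram overlap: the number of reference n-grams matched by the candidate summary, divided by the total number of reference n-grams. A small sketch, as an illustrative reimplementation rather than the ROUGE package itself:

```python
from collections import Counter

def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, reference, n=2):
    """Matched reference n-grams divided by total reference n-grams."""
    cand = ngram_counts(candidate, n)
    ref = ngram_counts(reference, n)
    overlap = sum(min(c, cand[g]) for g, c in ref.items())
    return overlap / max(sum(ref.values()), 1)

print(rouge_n("the cat was found under the bed".split(),
              "the cat was under the bed".split(), n=2))  # -> 0.8
```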

Proceedings ArticleDOI
17 Aug 2015
TL;DR: A global approach which always attends to all source words and a local one that only looks at a subset of source words at a time are examined, demonstrating the effectiveness of both approaches on the WMT translation tasks between English and German in both directions.
Abstract: An attentional mechanism has lately been used to improve neural machine translation (NMT) by selectively focusing on parts of the source sentence during translation. However, there has been little work exploring useful architectures for attention-based NMT. This paper examines two simple and effective classes of attentional mechanism: a global approach which always attends to all source words and a local one that only looks at a subset of source words at a time. We demonstrate the effectiveness of both approaches on the WMT translation tasks between English and German in both directions. With local attention, we achieve a significant gain of 5.0 BLEU points over non-attentional systems that already incorporate known techniques such as dropout. Our ensemble model using different attention architectures yields a new state-of-the-art result in the WMT'15 English to German translation task with 25.9 BLEU points, an improvement of 1.0 BLEU points over the existing best system backed by NMT and an n-gram reranker.

8,055 citations
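In its simplest "dot" form, the global variant scores every encoder state against the current decoder state and takes the softmax-weighted sum as the context vector. A sketch assuming PyTorch (dimensions are illustrative; the local variant would restrict this to a window of source positions):

```python
import torch
import torch.nn.functional as F

# Global attention with dot-product scoring: attend to all source
# positions at each decoder step.
def global_attention(decoder_state, encoder_states):
    # decoder_state: (batch, dim); encoder_states: (batch, src_len, dim)
    scores = torch.bmm(encoder_states, decoder_state.unsqueeze(2))  # dot scores
    weights = F.softmax(scores.squeeze(2), dim=1)                   # (batch, src_len)
    context = torch.bmm(weights.unsqueeze(1), encoder_states).squeeze(1)
    return context, weights

enc = torch.randn(2, 7, 64)   # 7 source positions, hidden size 64
dec = torch.randn(2, 64)
context, weights = global_attention(dec, enc)
print(context.shape, weights.shape)  # torch.Size([2, 64]) torch.Size([2, 7])
```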