Keith Stevens
Publications - 7
Citations - 5944
Keith Stevens is an academic researcher who has contributed to research on the topics Machine translation and Sentence. The author has an h-index of 4 and has co-authored 5 publications receiving 4859 citations.
Papers
Posted Content
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Łukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason A. Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg S. Corrado, Macduff Hughes, Jeffrey Dean
TL;DR: This paper presents GNMT, Google's Neural Machine Translation system, which attempts to address many of the weaknesses of conventional phrase-based translation systems and provides a good balance between the flexibility of "character"-delimited models and the efficiency of "word"-delimited models.
Proceedings ArticleDOI
Effective Parallel Corpus Mining using Bilingual Sentence Embeddings
Mandy Guo, Qinlan Shen, Yinfei Yang, Heming Ge, Daniel Cer, Gustavo Hernandez Abrego, Keith Stevens, Noah Constant, Yun-Hsuan Sung, Brian Strope, Ray Kurzweil
TL;DR: This paper presents an effective approach for parallel corpus mining using bilingual sentence embeddings, achieved with a novel training method that introduces hard negatives consisting of sentences that are not translations but have some degree of semantic similarity.
Posted Content
Effective Parallel Corpus Mining using Bilingual Sentence Embeddings
Mandy Guo, Qinlan Shen, Yinfei Yang, Heming Ge, Daniel Cer, Gustavo Hernandez Abrego, Keith Stevens, Noah Constant, Yun-Hsuan Sung, Brian Strope, Ray Kurzweil
TL;DR: The embedding models are trained to produce similar representations exclusively for bilingual sentence pairs that are translations of each other using a novel training method that introduces hard negatives consisting of sentences that are not translations but have some degree of semantic similarity.
Journal ArticleDOI
OpenAssistant Conversations - Democratizing Large Language Model Alignment
Andreas Kopf, Yannic Kilcher, Dimitri von Rutte, Sotiris Anagnostidis, Zhi Rui Tam, Keith Stevens, Nguyen Minh Duc, Richárd Nagyfi, Arnav Dantuluri, Andrew M. Maguire, Christoph Schuhmann, A. A. Mattick
TL;DR: OpenAssistant Conversations is a large-scale annotated assistant-style conversation corpus consisting of 161,443 messages distributed across 66,497 conversation trees, in 35 different languages, annotated with 461,292 quality ratings.
Proceedings ArticleDOI
Hierarchical Document Encoder for Parallel Corpus Mining
Mandy Guo, Yinfei Yang, Keith Stevens, Daniel Cer, Heming Ge, Yun-Hsuan Sung, Brian Strope, Ray Kurzweil
TL;DR: The results show that document embeddings derived from sentence-level averaging are surprisingly effective for clean datasets, but suggest that models trained hierarchically at the document level are more effective on noisy data.