Keith Stevens

Publications - 7

Citations - 5944

Keith Stevens is an academic researcher who has contributed to research on the topics of Machine translation and Sentence. The author has an h-index of 4 and has co-authored 5 publications that have received 4,859 citations.

Papers

Posted Content

Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation

Yonghui Wu, +30 more

- 26 Sep 2016 -

arXiv: Computation and Language

TL;DR: GNMT, Google's Neural Machine Translation system, is presented. It attempts to address many of the weaknesses of conventional phrase-based translation systems and provides a good balance between the flexibility of "character"-delimited models and the efficiency of "word"-delimited models.

Proceedings Article

Effective Parallel Corpus Mining using Bilingual Sentence Embeddings

Mandy Guo, +10 more

TL;DR: This paper presents an effective approach to parallel corpus mining using bilingual sentence embeddings, achieved with a novel training method that introduces hard negatives: sentences that are not translations but have some degree of semantic similarity.

Posted Content

Effective Parallel Corpus Mining using Bilingual Sentence Embeddings

Mandy Guo, +10 more

- 31 Jul 2018 -

arXiv: Computation and Language

TL;DR: The embedding models are trained to produce similar representations exclusively for bilingual sentence pairs that are translations of each other using a novel training method that introduces hard negatives consisting of sentences that are not translations but have some degree of semantic similarity.

Journal Article

OpenAssistant Conversations - Democratizing Large Language Model Alignment

Andreas Köpf, +11 more

- 14 Apr 2023 -

arXiv.org

TL;DR: OpenAssistant Conversations is a large-scale, human-annotated, assistant-style conversation corpus consisting of 161,443 messages distributed across 66,497 conversation trees in 35 different languages, annotated with 461,292 quality ratings.

Proceedings Article

Hierarchical Document Encoder for Parallel Corpus Mining

Mandy Guo, +7 more

TL;DR: The results show document embeddings derived from sentence-level averaging are surprisingly effective for clean datasets, but suggest models trained hierarchically at the document-level are more effective on noisy data.