Duygu Ataman

Researcher at New York University

Publications - 22

Citations - 263

Duygu Ataman is an academic researcher at New York University. The author has contributed to research on machine translation and vocabulary. The author has an h-index of 8 and has co-authored 19 publications receiving 206 citations. Previous affiliations of Duygu Ataman include Fondazione Bruno Kessler and the University of Zurich.

Papers

Journal ArticleDOI

Linguistically Motivated Vocabulary Reduction for Neural Machine Translation from Turkish to English

Duygu Ataman, +4 more

- 01 Jun 2017 -

The Prague Bulletin of Mathematical Ling...

TL;DR: This paper proposes a new vocabulary reduction method for NMT that can reduce the vocabulary of a given input corpus at any rate while also considering the morphological properties of the language. It achieves a significant improvement of 2.3 BLEU points over the conventional vocabulary reduction technique, showing that it can provide better accuracy in open-vocabulary translation of morphologically rich languages.

Proceedings ArticleDOI

Compositional Representation of Morphologically-Rich Input for Neural Machine Translation

Duygu Ataman, +1 more

TL;DR: The authors propose to replace the source-language embedding layer of NMT with a bi-directional recurrent neural network that generates compositional representations of the input at any desired level of granularity.

Proceedings Article

An Evaluation of Two Vocabulary Reduction Methods for Neural Machine Translation

Duygu Ataman, +1 more

TL;DR: An extensive evaluation of two unsupervised vocabulary reduction methods in NMT: the well-known byte-pair encoding (BPE) and linguistically motivated vocabulary reduction (LMVR), a segmentation method that also considers morphological properties of subwords.

Posted Content

Compositional Representation of Morphologically-Rich Input for Neural Machine Translation

Duygu Ataman, +1 more

- 05 May 2018 -

arXiv: Computation and Language

Posted Content

Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets

Isaac Caswell, +51 more

- 23 Mar 2021 -

arXiv: Computation and Language

TL;DR: The authors manually audit the quality of 205 language-specific corpora released with five major public datasets (CCAligned, ParaCrawl, WikiMatrix, OSCAR, mC4) and check the correctness of language codes in a sixth (JW300).
