Showing papers by "Kevin Duh" published in 2021

Proceedings Article•DOI•

ORTHROS: non-autoregressive end-to-end speech translation with dual-decoder

Hirofumi Inaguma¹, Yosuke Higuchi², Kevin Duh³, Tatsuya Kawahara¹, Shinji Watanabe³

Kyoto University¹, Waseda University², Johns Hopkins University³

06 Jun 2021

TL;DR: This paper proposed Orthros, a NAR E2E-ST framework in which NAR and autoregressive (AR) decoders are jointly trained on a shared speech encoder. The AR decoder selects the best translation among the length candidates generated by the NAR decoder, which makes a large length beam far more effective at negligible overhead.

Abstract: Fast inference speed is an important goal towards real-world deployment of speech translation (ST) systems. End-to-end (E2E) models based on the encoder-decoder architecture are more suitable for this goal than traditional cascaded systems, but their effectiveness regarding decoding speed has not been explored so far. Inspired by recent progress in non-autoregressive (NAR) methods in text-based translation, which generate target tokens in parallel by eliminating conditional dependencies, we study the problem of NAR decoding for E2E-ST. We propose a novel NAR E2E-ST framework, Orthros, in which both NAR and autoregressive (AR) decoders are jointly trained on the shared speech encoder. The latter is used for selecting better translation among various length candidates generated from the former, which dramatically improves the effectiveness of a large length beam with negligible overhead. We further investigate effective length prediction methods from speech inputs and the impact of vocabulary sizes. Experiments on four benchmarks show the effectiveness of the proposed method in improving inference speed while maintaining competitive translation quality compared to state-of-the-art AR E2E-ST systems.

13 citations

Proceedings Article•

Data and Parameter Scaling Laws for Neural Machine Translation

Mitchell A. Gordon¹, Kevin Duh¹, Jared Kaplan¹

Johns Hopkins University¹

16 May 2021

TL;DR: This paper observed that the development cross-entropy loss of supervised NMT models scales like a power law with the amount of training data and the number of non-embedding parameters in the model, and discussed practical implications of these results, such as predicting the BLEU achieved by large-scale models and the ROI of labeling data in low-resource language pairs.

Abstract: We observe that the development cross-entropy loss of supervised neural machine translation models scales like a power law with the amount of training data and the number of non-embedding parameters in the model. We discuss some practical implications of these results, such as predicting BLEU achieved by large scale models and predicting the ROI of labeling data in low-resource language pairs.
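To make the power-law claim concrete, the following is a minimal sketch of how such a trend could be fitted and extrapolated. It assumes a single-variable form, loss = a * x^(-alpha), fitted by least squares in log-log space. The functional form, the function names, and the synthetic numbers are illustrative assumptions, not the parameterization or data reported in the paper.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**(-alpha), done as a line fit in log-log space."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(xs)
    mx = sum(lx) / n
    my = sum(ly) / n
    slope = (sum((u - mx) * (v - my) for u, v in zip(lx, ly))
             / sum((u - mx) ** 2 for u in lx))
    intercept = my - slope * mx
    return math.exp(intercept), -slope  # (a, alpha)

def predict(a, alpha, x):
    """Extrapolate the fitted curve to a new data size (or parameter count)."""
    return a * x ** (-alpha)

# Synthetic, noise-free points generated from a known power law (hypothetical values).
data_sizes = [1e4, 1e5, 1e6, 1e7]
losses = [50.0 * d ** (-0.25) for d in data_sizes]

a, alpha = fit_power_law(data_sizes, losses)
extrapolated = predict(a, alpha, 1e8)
```

On real measurements the points would be noisy, and a joint fit over both data size and parameter count would need a richer functional form than this one-variable sketch.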

7 citations

Proceedings Article•DOI•

Self-Guided Curriculum Learning for Neural Machine Translation

Lei Zhou, Liang Ding, Kevin Duh, Shinji Watanabe, Ryohei Sasano, Koichi Takeda

10 May 2021

TL;DR: The authors proposed a self-guided curriculum learning strategy that encourages the NMT model to learn from easy to hard on the basis of recovery degrees, and adopted sentence-level BLEU score as the proxy of recovery degree.

Abstract: In supervised learning, a well-trained model should be able to recover ground truth accurately, i.e. the predicted labels are expected to resemble the ground truth labels as much as possible. Inspired by this, we formulate a difficulty criterion based on the recovery degrees of training examples. Motivated by the intuition that after skimming through the training corpus, the neural machine translation (NMT) model “knows” how to schedule a suitable curriculum according to learning difficulty, we propose a self-guided curriculum learning strategy that encourages the NMT model to learn from easy to hard on the basis of recovery degrees. Specifically, we adopt sentence-level BLEU score as the proxy of recovery degree. Experimental results on translation benchmarks including WMT14 English-German and WMT17 Chinese-English demonstrate that our proposed method considerably improves the recovery degree, thus consistently improving the translation performance.
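The easy-to-hard ordering described above can be sketched as follows. This is a hedged illustration only: `sentence_bleu` is a simplified add-one-smoothed BLEU on whitespace tokens, `translate` is a hypothetical stand-in for the model's own predictions after a first pass over the corpus, and treating a higher score as "easier" is an assumption about how recovery degree maps to difficulty.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(hyp, ref, max_n=4):
    """Simplified add-one-smoothed sentence-level BLEU (illustrative, not a standard scorer)."""
    hyp, ref = hyp.split(), ref.split()
    if not hyp:
        return 0.0
    log_prec = 0.0
    for n in range(1, max_n + 1):
        h, r = ngrams(hyp, n), ngrams(ref, n)
        match = sum(min(c, r[g]) for g, c in h.items())
        total = sum(h.values())
        log_prec += math.log((match + 1) / (total + 1)) / max_n
    brevity = min(1.0, math.exp(1 - len(ref) / len(hyp)))
    return brevity * math.exp(log_prec)

def order_easy_to_hard(examples, translate):
    """Sort (source, target) pairs by descending recovery degree."""
    scored = [(sentence_bleu(translate(src), tgt), src, tgt) for src, tgt in examples]
    scored.sort(key=lambda item: -item[0])
    return [(src, tgt) for _, src, tgt in scored]

# Hypothetical model outputs for three training sources.
outputs = {
    "s1": "the cat sat on the mat",
    "s2": "a slow brown dog jumps",
    "s3": "goodbye",
}
examples = [
    ("s3", "hello world again my friend"),
    ("s1", "the cat sat on the mat"),
    ("s2", "a quick brown fox jumps"),
]
curriculum = order_easy_to_hard(examples, outputs.get)
```

The resulting list places the perfectly recovered pair first and the poorly recovered pair last; a training schedule would then expose the model to progressively later portions of this list.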

6 citations

Proceedings Article•DOI•

ESPnet-ST IWSLT 2021 Offline Speech Translation System

Hirofumi Inaguma, Brian Yan, Siddharth Dalmia, Pengcheng Guo, Jiatong Shi, Kevin Duh, Shinji Watanabe

01 Jul 2021

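TL;DR: This paper describes the ESPnet-ST group's IWSLT 2021 offline speech translation submission, which combined multi-referenced sequence-level knowledge distillation (SeqKD), a Conformer encoder with the Multi-Decoder architecture, and improved audio segmentation to reach 31.4 BLEU on the 2-ref of tst2021.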

Abstract: This paper describes the ESPnet-ST group’s IWSLT 2021 submission in the offline speech translation track. This year we made various efforts on training data, architecture, and audio segmentation. On the data side, we investigated sequence-level knowledge distillation (SeqKD) for end-to-end (E2E) speech translation. Specifically, we used multi-referenced SeqKD from multiple teachers trained on different amounts of bitext. On the architecture side, we adopted the Conformer encoder and the Multi-Decoder architecture, which equips dedicated decoders for speech recognition and translation tasks in a unified encoder-decoder model and enables search in both source and target language spaces during inference. We also significantly improved audio segmentation by using the pyannote.audio toolkit and merging multiple short segments for long context modeling. Experimental evaluations showed that each of them contributed to large improvements in translation performance. Our best E2E system combined all the above techniques with model ensembling and achieved 31.4 BLEU on the 2-ref of tst2021 and 21.2 BLEU and 19.3 BLEU on the two single references of tst2021.

5 citations

Posted Content•

Leveraging End-to-End ASR for Endangered Language Documentation: An Empirical Study on Yolóxochitl Mixtec

Jiatong Shi, Jonathan D. Amith, Rey Castillo García, Esteban Guadalupe Sierra, Kevin Duh, Shinji Watanabe

26 Jan 2021-arXiv: Audio and Speech Processing

TL;DR: In this article, an end-to-end ASR system was proposed to overcome the transcription bottleneck and transcriber shortage that hinder endangered language (EL) documentation, together with a novice transcription correction task.

Abstract: "Transcription bottlenecks", created by a shortage of effective human transcribers, are one of the main challenges to endangered language (EL) documentation. Automatic speech recognition (ASR) has been suggested as a tool to overcome such bottlenecks. Following this suggestion, we investigated the effectiveness for EL documentation of end-to-end ASR, which, unlike Hidden Markov Model ASR systems, eschews linguistic resources but is instead more dependent on large-data settings. We open source a Yoloxochitl Mixtec EL corpus. First, we review our method in building an end-to-end ASR system in a way that would be reproducible by the ASR community. We then propose a novice transcription correction task and demonstrate how ASR systems and novice transcribers can work together to improve EL documentation. We believe this combinatory methodology would mitigate the transcription bottleneck and transcriber shortage that hinder EL documentation.

4 citations

Proceedings Article•DOI•

Leveraging End-to-End ASR for Endangered Language Documentation: An Empirical Study on Yolóxochitl Mixtec.

Jiatong Shi¹, Jonathan D. Amith², Rey Castillo García, Esteban Guadalupe Sierra¹, Kevin Duh¹, Shinji Watanabe¹

Johns Hopkins University¹, Gettysburg College²

01 Apr 2021

TL;DR: In this paper, an end-to-end ASR system was proposed to overcome the transcription bottleneck and transcriber shortage that hinder endangered language (EL) documentation in Mexico.

Abstract: “Transcription bottlenecks”, created by a shortage of effective human transcribers (i.e., transcriber shortage), are one of the main challenges to endangered language (EL) documentation. Automatic speech recognition (ASR) has been suggested as a tool to overcome such bottlenecks. Following this suggestion, we investigated the effectiveness for EL documentation of end-to-end ASR, which, unlike Hidden Markov Model ASR systems, eschews linguistic resources but is instead more dependent on large-data settings. We open source a Yoloxochitl Mixtec EL corpus. First, we review our method in building an end-to-end ASR system in a way that would be reproducible by the ASR community. We then propose a novice transcription correction task and demonstrate how ASR systems and novice transcribers can work together to improve EL documentation. We believe this combinatory methodology would mitigate the transcription bottleneck and transcriber shortage that hinder EL documentation.

1 citation

Approaching Sign Language Gloss Translation as a Low-Resource Machine Translation Task.

Xuan Zhang, Kevin Duh

01 Aug 2021

TL;DR: The authors focus on the second-stage gloss translation component, which is challenging due to the scarcity of publicly available parallel data and investigate two popular methods for improving translation quality: hyperparameter search and backtranslation.

Abstract: A cascaded Sign Language Translation system first maps sign videos to gloss annotations and then translates glosses into a spoken language. This work focuses on the second-stage gloss translation component, which is challenging due to the scarcity of publicly available parallel data. We approach gloss translation as a low-resource machine translation task and investigate two popular methods for improving translation quality: hyperparameter search and backtranslation. We discuss the potentials and pitfalls of these methods based on experiments on the RWTH-PHOENIX-Weather 2014T dataset.

Posted Content•

Non-autoregressive End-to-end Speech Translation with Parallel Autoregressive Rescoring.

Hirofumi Inaguma, Yosuke Higuchi, Kevin Duh, Tatsuya Kawahara, Shinji Watanabe

09 Sep 2021-arXiv: Audio and Speech Processing

TL;DR: In this paper, a unified NAR E2E-ST framework called Orthros was proposed, which has an NAR decoder and an auxiliary shallow AR decoder on top of the shared encoder.

Abstract: This article describes an efficient end-to-end speech translation (E2E-ST) framework based on non-autoregressive (NAR) models. End-to-end speech translation models have several advantages over traditional cascade systems such as inference latency reduction. However, conventional AR decoding methods are not fast enough because each token is generated incrementally. NAR models, however, can accelerate the decoding speed by generating multiple tokens in parallel on the basis of the token-wise conditional independence assumption. We propose a unified NAR E2E-ST framework called Orthros, which has an NAR decoder and an auxiliary shallow AR decoder on top of the shared encoder. The auxiliary shallow AR decoder selects the best hypothesis by rescoring multiple candidates generated from the NAR decoder in parallel (parallel AR rescoring). We adopt conditional masked language model (CMLM) and a connectionist temporal classification (CTC)-based model as NAR decoders for Orthros, referred to as Orthros-CMLM and Orthros-CTC, respectively. We also propose two training methods to enhance the CMLM decoder. Experimental evaluations on three benchmark datasets with six language directions demonstrated that Orthros achieved large improvements in translation quality with a very small overhead compared with the baseline NAR model. Moreover, the Conformer encoder architecture enabled large quality improvements, especially for CTC-based models. Orthros-CTC with the Conformer encoder increased decoding speed by 3.63x on CPU with translation quality comparable to that of an AR model.
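The rescoring step described above, in which an AR model picks the best of several NAR candidates, can be sketched roughly as follows. Everything concrete here is an assumption made for illustration: a toy bigram table stands in for the shallow AR decoder, the candidates are hand-written rather than produced by a NAR decoder, and the length-normalized score is one plausible choice, not necessarily the one used in Orthros.

```python
import math

# Hypothetical bigram log-probabilities standing in for the auxiliary AR decoder.
BIGRAM_LOGP = {
    ("<s>", "guten"): math.log(0.6),
    ("guten", "morgen"): math.log(0.7),
    ("morgen", "</s>"): math.log(0.8),
    ("guten", "guten"): math.log(0.01),
}
UNSEEN_LOGP = math.log(1e-3)  # back-off score for bigrams not in the table

def ar_score(tokens):
    """Average per-token log-probability of a full candidate under the toy AR model."""
    seq = ["<s>"] + tokens + ["</s>"]
    total = sum(BIGRAM_LOGP.get(pair, UNSEEN_LOGP) for pair in zip(seq, seq[1:]))
    return total / (len(seq) - 1)

def select_best(candidates):
    """Rescore every candidate and keep the highest-scoring one."""
    return max(candidates, key=ar_score)

# Candidates of different lengths, as a length beam would produce (hand-written here).
candidates = [
    ["guten"],
    ["guten", "morgen"],
    ["guten", "guten", "morgen"],
]
best = select_best(candidates)
```

Because each candidate is already complete, its score does not depend on the other candidates, so a real system can evaluate all of them in one batched forward pass instead of the sequential loop implied by `max` here.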

Posted Content•

An Analysis of Euclidean vs. Graph-Based Framing for Bilingual Lexicon Induction from Word Embedding Spaces.

Kelly Marchisio¹, Youngser Park¹, Ali Saad-Eldin, Anton Alyakin¹, Kevin Duh¹, Carey E. Priebe¹, Philipp Koehn²

Johns Hopkins University¹, Facebook²

26 Sep 2021-arXiv: Computation and Language

TL;DR: The authors compare Euclidean versus graph-based approaches to bilingual lexicon induction under differing data conditions and show that they complement each other when combined; the graph-based framing draws on techniques from the graph matching optimization literature.

Abstract: Much recent work in bilingual lexicon induction (BLI) views word embeddings as vectors in Euclidean space. As such, BLI is typically solved by finding a linear transformation that maps embeddings to a common space. Alternatively, word embeddings may be understood as nodes in a weighted graph. This framing allows us to examine a node's graph neighborhood without assuming a linear transform, and exploits new techniques from the graph matching optimization literature. These contrasting approaches have not been compared in BLI so far. In this work, we study the behavior of Euclidean versus graph-based approaches to BLI under differing data conditions and show that they complement each other when combined. We release our code at https://github.com/kellymarchisio/euc-v-graph-bli.

Proceedings Article•DOI•

Adaptive Mixed Component LDA for Low Resource Topic Modeling.

Suzanna Sia, Kevin Duh¹

Johns Hopkins University¹

01 Apr 2021

TL;DR: The authors explore mixture models which interpolate between the discrete and continuous topic-word distributions that utilise pre-trained embeddings to improve topic coherence in low-resource settings.

Abstract: Probabilistic topic models in low data resource scenarios are faced with less reliable estimates due to sparsity of discrete word co-occurrence counts, and do not have the luxury of retraining word or topic embeddings using neural methods. In this challenging resource constrained setting, we explore mixture models which interpolate between the discrete and continuous topic-word distributions that utilise pre-trained embeddings to improve topic coherence. We introduce an automatic trade-off between the discrete and continuous representations via an adaptive mixture coefficient, which places greater weight on the discrete representation when the corpus statistics are more reliable. The adaptive mixture coefficient takes into account global corpus statistics, and the uncertainty in each topic’s continuous distributions. Our approach outperforms the fully discrete, fully continuous, and static mixture model on topic coherence in low resource settings. We additionally demonstrate the generalisability of our method by extending it to handle multilingual document collections.

Proceedings Article•DOI•

Sequence Models for Computational Etymology of Borrowings

Winston Wu¹, Kevin Duh¹, David Yarowsky¹

Johns Hopkins University¹

01 Aug 2021

Proceedings Article•

An Analysis of Euclidean vs. Graph-Based Framing for Bilingual Lexicon Induction from Word Embedding Spaces.

Kelly Marchisio¹, Youngser Park¹, Ali Saad-Eldin, Anton Alyakin¹, Kevin Duh¹, Carey E. Priebe¹, Philipp Koehn²

Johns Hopkins University¹, Facebook²

01 Nov 2021

TL;DR: The authors compare Euclidean versus graph-based approaches to bilingual lexicon induction under differing data conditions and show that they complement each other when combined; the graph-based framing draws on techniques from the graph matching optimization literature.

Abstract: Much recent work in bilingual lexicon induction (BLI) views word embeddings as vectors in Euclidean space. As such, BLI is typically solved by finding a linear transformation that maps embeddings to a common space. Alternatively, word embeddings may be understood as nodes in a weighted graph. This framing allows us to examine a node’s graph neighborhood without assuming a linear transform, and exploits new techniques from the graph matching optimization literature. These contrasting approaches have not been compared in BLI so far. In this work, we study the behavior of Euclidean versus graph-based approaches to BLI under differing data conditions and show that they complement each other when combined. We release our code at https://github.com/kellymarchisio/euc-v-graph-bli.
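The Euclidean framing mentioned above can be illustrated with a small sketch: source embeddings are mapped into the target space by a linear transformation, and each source word is paired with its nearest target word by cosine similarity. The sketch assumes the map `W` has already been learned (for example from a seed dictionary); the 2-dimensional embeddings, the word lists, and all function names are hypothetical, and the graph-based side of the comparison is not shown.

```python
import math

def apply_map(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def induce_lexicon(src_emb, tgt_emb, W):
    """Pair each source word with its nearest target word after mapping."""
    lexicon = {}
    for word, vec in src_emb.items():
        mapped = apply_map(W, vec)
        lexicon[word] = max(tgt_emb, key=lambda t: cosine(mapped, tgt_emb[t]))
    return lexicon

# Toy setup: the target space is the source space rotated by 90 degrees.
W = [[0.0, -1.0],
     [1.0, 0.0]]
src_emb = {"dog": [1.0, 0.1], "cat": [0.1, 1.0]}
tgt_emb = {"Hund": [-0.1, 1.0], "Katze": [-1.0, 0.1]}

lexicon = induce_lexicon(src_emb, tgt_emb, W)
```

In this toy case the rotation aligns the two spaces exactly, so nearest-neighbor retrieval recovers both pairs; with real embeddings the map is only approximate, which is part of what motivates comparing it against graph-based alternatives.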

Posted Content•

ESPnet-ST IWSLT 2021 Offline Speech Translation System.

Hirofumi Inaguma, Brian Yan, Siddharth Dalmia, Pengcheng Guo, Jiatong Shi, Kevin Duh, Shinji Watanabe

01 Jul 2021-arXiv: Audio and Speech Processing

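TL;DR: This article describes the ESPnet-ST group's IWSLT 2021 offline speech translation submission, which combined multi-referenced sequence-level knowledge distillation (SeqKD), a Conformer encoder with the Multi-Decoder architecture, and improved audio segmentation to reach 31.4 BLEU on the 2-ref of tst2021.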

Abstract: This paper describes the ESPnet-ST group's IWSLT 2021 submission in the offline speech translation track. This year we made various efforts on training data, architecture, and audio segmentation. On the data side, we investigated sequence-level knowledge distillation (SeqKD) for end-to-end (E2E) speech translation. Specifically, we used multi-referenced SeqKD from multiple teachers trained on different amounts of bitext. On the architecture side, we adopted the Conformer encoder and the Multi-Decoder architecture, which equips dedicated decoders for speech recognition and translation tasks in a unified encoder-decoder model and enables search in both source and target language spaces during inference. We also significantly improved audio segmentation by using the pyannote.audio toolkit and merging multiple short segments for long context modeling. Experimental evaluations showed that each of them contributed to large improvements in translation performance. Our best E2E system combined all the above techniques with model ensembling and achieved 31.4 BLEU on the 2-ref of tst2021 and 21.2 BLEU and 19.3 BLEU on the two single references of tst2021.

Posted Content•

Self-Guided Curriculum Learning for Neural Machine Translation

Lei Zhou¹, Liang Ding², Kevin Duh³, Shinji Watanabe, Ryohei Sasano¹, Koichi Takeda¹

Nagoya University¹, University of Sydney², Johns Hopkins University³

10 May 2021-arXiv: Computation and Language

TL;DR: This article proposed a self-guided curriculum strategy for neural machine translation that treats the recovery degree of each training example, approximated by its sentence-level BLEU score, as its learning difficulty.

Abstract: In the field of machine learning, the well-trained model is assumed to be able to recover the training labels, i.e. the synthetic labels predicted by the model should be as close to the ground-truth labels as possible. Inspired by this, we propose a self-guided curriculum strategy to encourage the learning of neural machine translation (NMT) models to follow the above recovery criterion, where we cast the recovery degree of each training example as its learning difficulty. Specifically, we adopt the sentence level BLEU score as the proxy of recovery degree. Different from existing curricula relying on linguistic prior knowledge or third-party language models, our chosen learning difficulty is more suitable to measure the degree of knowledge mastery of the NMT models. Experiments on translation benchmarks, including WMT14 English⇒German and WMT17 Chinese⇒English, demonstrate that our approach can consistently improve translation performance against a strong Transformer baseline.

Machine Translation Believability

Marianna J. Martindale, Kevin Duh, Marine Carpuat

01 Apr 2021

TL;DR: The authors study the relationship of believability to fluency and adequacy by applying traditional MT direct assessment protocols to annotate all three features on the output of neural MT systems. They find that believability is closely related to but distinct from fluency, and initial qualitative analysis suggests that semantic features may explain the difference.

Abstract: Successful Machine Translation (MT) deployment requires understanding not only the intrinsic qualities of MT output, such as fluency and adequacy, but also user perceptions. Users who do not understand the source language respond to MT output based on their perception of the likelihood that the meaning of the MT output matches the meaning of the source text. We refer to this as believability. Output that is not believable may be off-putting to users, but believable MT output with incorrect meaning may mislead them. In this work, we study the relationship of believability to fluency and adequacy by applying traditional MT direct assessment protocols to annotate all three features on the output of neural MT systems. Quantitative analysis of these annotations shows that believability is closely related to but distinct from fluency, and initial qualitative analysis suggests that semantic features may account for the difference.