
Devendra Singh Sachan

Researcher at Carnegie Mellon University

Publications: 31
Citations: 1136

Devendra Singh Sachan is an academic researcher at Carnegie Mellon University. His research spans topics including machine translation and computer science. He has an h-index of 11 and has co-authored 27 publications receiving 800 citations. His previous affiliations include the Indian Institute of Technology Guwahati and the International Institute of Information Technology, Hyderabad.

Papers
Proceedings ArticleDOI

When and Why Are Pre-Trained Word Embeddings Useful for Neural Machine Translation?

TL;DR: Shows that pre-trained word embeddings can be surprisingly effective in neural machine translation, providing gains of up to 20 BLEU points in the most favorable setting.
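
As a concrete illustration of the underlying idea (not the paper's code), the following minimal PyTorch sketch initializes an NMT encoder's embedding layer from pre-trained vectors instead of training it from scratch. All sizes and names are hypothetical, and the random tensor stands in for vectors loaded from, e.g., fastText, aligned with the model vocabulary.

```python
# Sketch: seed an NMT encoder's embeddings with pre-trained word vectors.
import torch
import torch.nn as nn

src_vocab_size, emb_dim = 10000, 300  # hypothetical sizes

# Stand-in for loaded pre-trained vectors; row i = vector for token i.
pretrained = torch.randn(src_vocab_size, emb_dim)

# freeze=False lets the embeddings be fine-tuned during NMT training.
embedding = nn.Embedding.from_pretrained(pretrained, freeze=False)

encoder = nn.LSTM(emb_dim, hidden_size=512, bidirectional=True,
                  batch_first=True)

tokens = torch.randint(0, src_vocab_size, (8, 20))  # a batch of sources
outputs, _ = encoder(embedding(tokens))
```
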
Proceedings Article

Adaptive Methods for Nonconvex Optimization

TL;DR: Shows that increasing minibatch sizes enables convergence of adaptive methods on nonconvex problems, providing a way to circumvent their non-convergence issues, and proposes Yogi, a new adaptive optimization algorithm that controls the increase in the effective learning rate, leading to better empirical performance with similar theoretical convergence guarantees.
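
Yogi differs from Adam only in how the second-moment estimate v is updated: additively and sign-controlled, which bounds how quickly the effective learning rate can change. A minimal PyTorch sketch of one update step follows; the function wrapper and defaults are illustrative (the larger eps follows common Yogi implementations), not a production optimizer.

```python
# Sketch of the Yogi update rule as plain tensor operations.
import torch

def yogi_step(param, grad, m, v, lr=1e-2, beta1=0.9, beta2=0.999, eps=1e-3):
    """One Yogi update; m and v are running first/second-moment estimates."""
    m.mul_(beta1).add_(grad, alpha=1 - beta1)
    g2 = grad * grad
    # Yogi's key change vs. Adam: v moves toward g2 by a bounded, additive
    # step, so the effective learning rate cannot grow abruptly.
    v.add_(-(1 - beta2) * torch.sign(v - g2) * g2)
    param.add_(-lr * m / (v.sqrt() + eps))

# Usage on a toy quadratic: minimize ||w - 1||^2.
w = torch.full((3,), 5.0)
m, v = torch.zeros(3), torch.zeros(3)
for _ in range(1000):
    grad = 2 * (w - 1)
    yogi_step(w, grad, m, v)
```
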
Journal ArticleDOI

Revisiting LSTM Networks for Semi-Supervised Text Classification via Mixed Objective Function

TL;DR: Develops a training strategy that allows even a simple BiLSTM model trained with cross-entropy loss to achieve results competitive with more complex approaches, and shows the generality of the mixed objective function by improving performance on a relation extraction task.
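
As an illustrative sketch only (the specific auxiliary term and weighting here are assumptions, not the paper's exact objective), a mixed objective of this kind combines a supervised cross-entropy loss on labeled data with an auxiliary loss on unlabeled data:

```python
# Sketch of the general mixed-objective pattern for semi-supervised
# text classification: supervised cross-entropy plus (here) entropy
# minimization on unlabeled examples. Terms and lam are illustrative.
import torch
import torch.nn.functional as F

def mixed_loss(logits_labeled, labels, logits_unlabeled, lam=0.1):
    ce = F.cross_entropy(logits_labeled, labels)
    p = F.softmax(logits_unlabeled, dim=-1)
    entropy = -(p * torch.log(p + 1e-8)).sum(dim=-1).mean()
    return ce + lam * entropy
```
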
Proceedings ArticleDOI

Parameter Sharing Methods for Multilingual Self-Attentional Translation Models

TL;DR: Examines parameter sharing techniques that strike a middle ground between full sharing and individual training, focusing on the self-attentional Transformer model, and finds that the full parameter sharing approach increases BLEU scores mainly when the target languages come from a similar language family.
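
As a sketch of one point on the sharing spectrum such work explores (not the paper's exact configuration), the following PyTorch fragment shares a single Transformer encoder across languages while keeping per-target-language decoders; all names and sizes are hypothetical.

```python
# Sketch: shared multilingual encoder, language-specific decoders.
import torch
import torch.nn as nn

d_model, nhead = 512, 8
shared_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead), num_layers=6)

target_langs = ["de", "fr", "cs"]  # hypothetical target languages
decoders = nn.ModuleDict({
    lang: nn.TransformerDecoder(
        nn.TransformerDecoderLayer(d_model, nhead), num_layers=6)
    for lang in target_langs
})

def forward_pass(src_states, tgt_states, lang):
    memory = shared_encoder(src_states)        # shared across languages
    return decoders[lang](tgt_states, memory)  # language-specific

src = torch.randn(10, 2, d_model)  # (seq, batch, d_model)
tgt = torch.randn(7, 2, d_model)
out = forward_pass(src, tgt, "de")
```
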

Effective Use of Bidirectional Language Modeling for Transfer Learning in Biomedical Named Entity Recognition

TL;DR: Trains a bidirectional language model (BiLM) on unlabeled data and transfers its weights to "pretrain" an NER model with the same architecture as the BiLM, resulting in a better parameter initialization of the NER model.
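
A minimal PyTorch sketch of the transfer step, assuming matching encoder architectures; the class, attribute names, and sizes are illustrative, not the paper's code.

```python
# Sketch: initialize an NER model's encoder from a pre-trained BiLM's
# encoder with the same architecture, then fine-tune on labeled NER data.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab=10000, dim=300, hidden=256):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.rnn = nn.LSTM(dim, hidden, bidirectional=True, batch_first=True)
    def forward(self, x):
        out, _ = self.rnn(self.emb(x))
        return out

bilm_encoder = Encoder()  # assume this was trained with an LM objective
ner_encoder = Encoder()   # same architecture, to be "pretrained"

# The transfer: copy the BiLM's parameters into the NER encoder.
ner_encoder.load_state_dict(bilm_encoder.state_dict())

ner_head = nn.Linear(2 * 256, 9)  # e.g., 9 BIO tags; illustrative
tags = ner_head(ner_encoder(torch.randint(0, 10000, (4, 12))))
```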