Open Access Proceedings Article
Combination of Recurrent Neural Networks and Factored Language Models for Code-Switching Language Modeling
Heike Adel, Ngoc Thang Vu, Tanja Schultz +2 more
pp. 206–211
TL;DR: A way to integrate part-of-speech tags (POS) and language information (LID) into these models, which leads to significant improvements in terms of perplexity; it is also shown that recurrent neural networks and factored language models can be combined using linear interpolation to achieve the best performance.
Abstract: In this paper, we investigate the application of recurrent neural network language models (RNNLM) and factored language models (FLM) to the task of language modeling for Code-Switching speech. We present a way to integrate part-of-speech tags (POS) and language information (LID) into these models, which leads to significant improvements in terms of perplexity. Furthermore, a comparison between RNNLMs and FLMs and a detailed analysis of perplexities on the different backoff levels are performed. Finally, we show that recurrent neural networks and factored language models can be combined using linear interpolation to achieve the best performance. The final combined language model provides a 37.8% relative improvement in terms of perplexity on the SEAME development set and a relative improvement of 32.7% on the evaluation set compared to the traditional n-gram language model.
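The combination step the abstract describes, linearly interpolating the per-word probabilities of two language models and scoring the result by perplexity, can be sketched as follows. This is a minimal illustration only: the probability values, the weight `lam`, and the function names are hypothetical and not taken from the paper, where the interpolation weight would be tuned on the development set.

```python
import math

def interpolate(p_rnn, p_flm, lam):
    """Linearly interpolate two per-word probabilities:
    p(w) = lam * p_rnnlm(w) + (1 - lam) * p_flm(w)."""
    return lam * p_rnn + (1.0 - lam) * p_flm

def perplexity(word_probs):
    """Perplexity is the exponential of the average negative log-probability."""
    n = len(word_probs)
    return math.exp(-sum(math.log(p) for p in word_probs) / n)

# Hypothetical per-word probabilities from each model for a short utterance.
p_rnn = [0.20, 0.05, 0.10, 0.30]
p_flm = [0.10, 0.15, 0.08, 0.25]

lam = 0.5  # illustrative interpolation weight; tuned on held-out data in practice
combined = [interpolate(r, f, lam) for r, f in zip(p_rnn, p_flm)]
print(perplexity(combined))
```

Because the interpolation is applied per word, the two models can disagree on which words are easy: the RNNLM may assign higher probability where long-range context helps, the FLM where POS/LID factors and backoff help, and the mixture benefits from both.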
Citations
Journal Article
A primer on neural network models for natural language processing
TL;DR: This tutorial surveys neural network models from the perspective of natural language processing research, in an attempt to bring natural-language researchers up to speed with the neural techniques.
Book
Neural Network Methods in Natural Language Processing
Yoav Goldberg, Graeme Hirst +1 more
TL;DR: Neural networks are a family of powerful machine learning models that have been widely used in natural language processing applications such as machine translation, syntactic parsing, and multi-task learning.
Journal Article
Computational sociolinguistics: A survey
TL;DR: This article aims to provide a comprehensive overview of CL research on sociolinguistic themes, featuring topics such as the relation between language and social identity, language use in social interaction, and multilingual communication.
Journal Article
Modeling Language Variation and Universals: A Survey on Typological Linguistics for Natural Language Processing
Edoardo Maria Ponti, Helen O'Horan, Yevgeni Berzak, Ivan Vulić, Roi Reichart, Thierry Poibeau, Ekaterina Shutova, Anna Korhonen +7 more
TL;DR: It is shown that to date, the use of information in existing typological databases has resulted in consistent but modest improvements in system performance, due to both intrinsic limitations of databases and under-employment of the typological features included in them.
Proceedings Article
Language Modeling for Code-Mixing: The Role of Linguistic Theory based Synthetic Data
Adithya Pratapa, Gayatri Bhat, Monojit Choudhury, Sunayana Sitaram, Sandipan Dandapat, Kalika Bali +5 more
TL;DR: A computational technique for the creation of grammatically valid artificial CM data based on the Equivalence Constraint Theory is presented, and it is shown that when training examples are sampled appropriately from this synthetic data and presented in a certain order, it can significantly reduce the perplexity of an RNN-based language model.
References
Report
Building a large annotated corpus of English: the Penn Treebank
TL;DR: As a result of this grant, the researchers have now published on CD-ROM a corpus of over 4 million words of running text annotated with part-of-speech (POS) tags, which includes a fully hand-parsed version of the classic Brown corpus.
Proceedings Article
Recurrent neural network based language model
TL;DR: Results indicate that it is possible to obtain around a 50% reduction in perplexity by using a mixture of several RNN LMs, compared to a state-of-the-art backoff language model.
Proceedings Article
Feature-rich part-of-speech tagging with a cyclic dependency network
TL;DR: A new part-of-speech tagger is presented that demonstrates the following ideas: explicit use of both preceding and following tag contexts via a dependency network representation, broad use of lexical features, and effective use of priors in conditional log-linear models.
Proceedings Article
Extensions of recurrent neural network language model
TL;DR: Several modifications of the original recurrent neural network language model are presented, showing approaches that lead to more than a 15-times speedup for both training and testing phases, as well as possibilities for reducing the number of parameters in the model.
Journal Article
“Sometimes I'll start a sentence in Spanish Y TERMINO EN ESPAÑOL”: Toward a typology of code-switching
TL;DR: In this article, the authors integrate the results of the ethnographic and attitudinal components of the broader study into a specifically sociolinguistic analysis, focusing on speakers of varying bilingual abilities, and demonstrate how the incorporation of both functional and linguistic factors into a single model is necessary to account for code-switching behavior.