Open Access Proceedings Article

Combination of Recurrent Neural Networks and Factored Language Models for Code-Switching Language Modeling

TLDR
A way to integrate part-of-speech tags (POS) and language information (LID) into these models is presented, leading to significant improvements in terms of perplexity, and it is shown that recurrent neural networks and factored language models can be combined using linear interpolation to achieve the best performance.
Abstract
In this paper, we investigate the application of recurrent neural network language models (RNNLM) and factored language models (FLM) to the task of language modeling for Code-Switching speech. We present a way to integrate part-of-speech tags (POS) and language information (LID) into these models which leads to significant improvements in terms of perplexity. Furthermore, a comparison between RNNLMs and FLMs and a detailed analysis of perplexities on the different backoff levels are performed. Finally, we show that recurrent neural networks and factored language models can be combined using linear interpolation to achieve the best performance. The final combined language model provides a 37.8% relative improvement in terms of perplexity on the SEAME development set and a relative improvement of 32.7% on the evaluation set compared to the traditional n-gram language model.
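As a rough illustration of the linear interpolation mentioned in the abstract, the sketch below combines per-word probabilities from two language models with a fixed weight and computes perplexity. The interpolation weight and probability values are hypothetical placeholders, not figures from the paper.

```python
# Minimal sketch of linear interpolation of two language models.
# The lambda weight and the per-word probabilities below are invented
# for illustration; they are not values reported in the paper.
import math

def interpolate(p_rnnlm: float, p_flm: float, lam: float = 0.5) -> float:
    """Linearly interpolate per-word probabilities from two language models."""
    return lam * p_rnnlm + (1.0 - lam) * p_flm

def perplexity(word_probs: list[float]) -> float:
    """Perplexity = exp of the average negative log-probability per word."""
    avg_nll = -sum(math.log(p) for p in word_probs) / len(word_probs)
    return math.exp(avg_nll)

# Hypothetical per-word probabilities for a short code-switching utterance.
rnnlm_probs = [0.12, 0.05, 0.20, 0.08]
flm_probs = [0.10, 0.07, 0.15, 0.09]

combined = [interpolate(pr, pf) for pr, pf in zip(rnnlm_probs, flm_probs)]
print(f"RNNLM ppl:    {perplexity(rnnlm_probs):.1f}")
print(f"FLM ppl:      {perplexity(flm_probs):.1f}")
print(f"Combined ppl: {perplexity(combined):.1f}")
```

In practice the interpolation weight would be tuned on a development set (for example, the SEAME development set used in the paper) rather than fixed at 0.5.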



Citations
Journal ArticleDOI

A primer on neural network models for natural language processing

TL;DR: This tutorial surveys neural network models from the perspective of natural language processing research, in an attempt to bring natural-language researchers up to speed with the neural techniques.
Book

Neural Network Methods in Natural Language Processing

TL;DR: Neural networks are a family of powerful machine learning models that have been widely used in natural language processing applications such as machine translation, syntactic parsing, and multi-task learning.
Journal ArticleDOI

Computational sociolinguistics: A survey

TL;DR: This article aims to provide a comprehensive overview of CL research on sociolinguistic themes, featuring topics such as the relation between language and social identity, language use in social interaction, and multilingual communication.
Journal ArticleDOI

Modeling Language Variation and Universals: A Survey on Typological Linguistics for Natural Language Processing

TL;DR: It is shown that to date, the use of information in existing typological databases has resulted in consistent but modest improvements in system performance, due to both intrinsic limitations of databases and under-employment of the typological features included in them.
Proceedings ArticleDOI

Language Modeling for Code-Mixing: The Role of Linguistic Theory based Synthetic Data

TL;DR: A computational technique for the creation of grammatically valid artificial code-mixed (CM) data based on the Equivalence Constraint Theory is presented, and it is shown that when training examples are sampled appropriately from this synthetic data and presented in a certain order, they can significantly reduce the perplexity of an RNN-based language model.
References
ReportDOI

Building a large annotated corpus of English: the penn treebank

TL;DR: As a result of this grant, the researchers have now published on CD-ROM a corpus of over 4 million words of running text annotated with part-of-speech (POS) tags, which includes a fully hand-parsed version of the classic Brown corpus.
Proceedings Article

Recurrent neural network based language model

TL;DR: Results indicate that it is possible to obtain around a 50% reduction in perplexity by using a mixture of several RNN LMs, compared to a state-of-the-art backoff language model.
Proceedings ArticleDOI

Feature-rich part-of-speech tagging with a cyclic dependency network

TL;DR: A new part-of-speech tagger is presented that demonstrates the following ideas: explicit use of both preceding and following tag contexts via a dependency network representation, broad use of lexical features, and effective use of priors in conditional log-linear models.
Proceedings ArticleDOI

Extensions of recurrent neural network language model

TL;DR: Several modifications of the original recurrent neural network language model are presented, showing approaches that lead to more than 15 times speedup for both training and testing phases and possibilities how to reduce the amount of parameters in the model.
Journal ArticleDOI

“Sometimes I'll start a sentence in Spanish Y TERMINO EN ESPAÑOL”: Toward a typology of code-switching

Shana Poplack, 1980
TL;DR: In this article, the authors integrate the results of the ethnographic and attitudinal components of the broader study into a specifically sociolinguistic analysis, focusing on speakers of varying bilingual abilities, and demonstrate how the incorporation of both functional and linguistic factors into a single model is necessary to account for code-switching behavior.