SciSpace (formerly Typeset)
Author

Chu-Cheng Lin

Other affiliations: Carnegie Mellon University, National Taiwan University, INESC-ID
Bio: Chu-Cheng Lin is an academic researcher from Johns Hopkins University. The author has contributed to research in topics including autoregressive models and language models. The author has an h-index of 8 and has co-authored 19 publications receiving 322 citations. Previous affiliations of Chu-Cheng Lin include Carnegie Mellon University and National Taiwan University.

Papers
Proceedings ArticleDOI
01 Sep 2015
TL;DR: An extension to the bag-of-words model for learning word representations that take into account both syntactic and semantic properties within language is introduced, by employing an attention model that finds, within the contextual words, those that are relevant for each prediction.
Abstract: We introduce an extension to the bag-of-words model for learning word representations that take into account both syntactic and semantic properties within language. This is done by employing an attention model that finds, within the contextual words, those that are relevant for each prediction. The general intuition of our model is that some words are only relevant for predicting local context (e.g. function words), while other words are more suited for determining global context, such as the topic of the document. Experiments performed on both semantically and syntactically oriented tasks show gains using our model over the existing bag-of-words model. Furthermore, compared to other more sophisticated models, our model scales better as we increase the size of the context.
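
As a rough illustration of the attention mechanism described above, the sketch below (Python/NumPy) scores each context word, softmax-normalizes the scores, and uses the weighted average of the context embeddings as the prediction vector. The parameter shapes and the per-word scalar score are illustrative assumptions, not the authors' exact parameterization.

```python
# Minimal sketch (not the authors' exact architecture): an attention-weighted
# CBOW-style context vector. Each context word gets a scalar relevance score;
# the softmax-normalized scores weight the average used to predict the target.
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 1000, 50

# Hypothetical parameters: input embeddings and a per-word attention score.
embeddings = rng.normal(scale=0.1, size=(vocab_size, dim))
attention_scores = rng.normal(scale=0.1, size=vocab_size)

def context_vector(context_ids):
    """Weighted average of context embeddings, weights given by attention."""
    scores = attention_scores[context_ids]        # (k,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                      # softmax over the window
    return weights @ embeddings[context_ids]      # (dim,)

# Example: build a prediction vector from a 4-word context window.
ctx = np.array([12, 7, 993, 404])
h = context_vector(ctx)
print(h.shape)  # (50,)
```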

156 citations

Proceedings ArticleDOI
01 Jan 2015
TL;DR: This paper shows that word embeddings can also add value to the problem of unsupervised POS induction: in two representative models of POS induction, multinomial distributions over the vocabulary are replaced with multivariate Gaussian distributions over word embeddings, yielding consistent improvements in eight languages.
Abstract: Unsupervised word embeddings have been shown to be valuable as features in supervised learning problems; however, their role in unsupervised problems has been less thoroughly explored. In this paper, we show that embeddings can likewise add value to the problem of unsupervised POS induction. In two representative models of POS induction, we replace multinomial distributions over the vocabulary with multivariate Gaussian distributions over word embeddings and observe consistent improvements in eight languages. We also analyze the effect of various choices while inducing word embeddings on “downstream” POS induction results.
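
The core substitution described above can be sketched as follows: instead of a per-tag multinomial over word types, each induced tag scores a word by the log-density of its embedding under a tag-specific multivariate Gaussian. The sketch below uses SciPy; the tag names, dimensionality, and identity covariances are illustrative assumptions.

```python
# Minimal sketch of the emission-model swap described above: score a word by a
# multivariate Gaussian density over its embedding instead of a multinomial
# probability over the vocabulary. Parameter names here are illustrative only.
import numpy as np
from scipy.stats import multivariate_normal

dim = 50
rng = np.random.default_rng(0)

# One Gaussian per induced tag (mean + covariance; identity used for brevity).
tag_means = {"TAG1": rng.normal(size=dim), "TAG2": rng.normal(size=dim)}
tag_covs = {t: np.eye(dim) for t in tag_means}

def emission_logprob(tag, word_embedding):
    """log p(embedding | tag) under that tag's Gaussian."""
    return multivariate_normal.logpdf(word_embedding,
                                      mean=tag_means[tag],
                                      cov=tag_covs[tag])

w = rng.normal(size=dim)  # embedding of some word
print({t: emission_logprob(t, w) for t in tag_means})
```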

67 citations

Proceedings ArticleDOI
01 Oct 2014
TL;DR: After describing the CRF-based baseline system, the authors discuss three extensions for learning from unlabeled data: semi-supervised learning, word embeddings, and word lists.
Abstract: We describe the CMU submission for the 2014 shared task on language identification in code-switched data. We participated in all four language pairs: Spanish‐English, Mandarin‐English, Nepali‐English, and Modern Standard Arabic‐Arabic dialects. After describing our CRF-based baseline system, we discuss three extensions for learning from unlabeled data: semi-supervised learning, word embeddings, and word lists.
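
For a concrete sense of what such a CRF baseline looks like, here is a minimal token-level language-ID sketch using the third-party sklearn-crfsuite package; the feature set, toy data, and hyperparameters are illustrative assumptions, not the CMU system.

```python
# Hedged sketch of a CRF token-level language-ID baseline in the spirit of the
# system described above (this uses sklearn-crfsuite, not the authors' code).
import sklearn_crfsuite

def token_features(tokens, i):
    """Simple lexical features for token i; a real system would use many more."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "prefix3": tok[:3],
        "suffix3": tok[-3:],
        "is_title": tok.istitle(),
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>",
    }

# Tiny illustrative training set: tokens labeled en / es (code-switched text).
sents = [["I", "love", "tacos", "al", "pastor"]]
labels = [["en", "en", "es", "es", "es"]]

X = [[token_features(s, i) for i in range(len(s))] for s in sents]
y = labels

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit(X, y)
print(crf.predict(X))
```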

23 citations

Posted Content
TL;DR: This paper trains recurrent neural networks on only raw features, using word embeddings to automatically learn meaningful representations; on the same mixed-language Twitter corpus, the system outperforms the best SVM-based systems reported in the EMNLP'14 Code-Switching Workshop by 1% in accuracy, or by 17% in error rate reduction.
Abstract: Mixed-language data is one of the difficult yet less explored domains of natural language processing. Most research in fields like machine translation or sentiment analysis assumes monolingual input. However, people who are capable of using more than one language often communicate using multiple languages at the same time. Sociolinguists believe this "code-switching" phenomenon to be socially motivated, for example to express solidarity or to establish authority. Most past work depends on external tools or resources, such as part-of-speech tagging, dictionary look-up, or named-entity recognizers, to extract rich features for training machine learning models. In this paper, we train recurrent neural networks with only raw features, and use word embeddings to automatically learn meaningful representations. Using the same mixed-language Twitter corpus, our system is able to outperform the best SVM-based systems reported in the EMNLP'14 Code-Switching Workshop by 1% in accuracy, or by 17% in error rate reduction.
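
A minimal sketch of the idea above, in PyTorch: a recurrent tagger that reads raw token indices, learns embeddings, and predicts a language label per token. The layer sizes and label count are illustrative assumptions, not the paper's exact network.

```python
# Illustrative recurrent language-ID tagger over raw token indices with
# learned embeddings (a sketch in the spirit of the paper, not its model).
import torch
import torch.nn as nn

class RNNLangID(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, hidden=128, n_labels=4):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_labels)

    def forward(self, token_ids):              # (batch, seq_len)
        h, _ = self.rnn(self.emb(token_ids))   # (batch, seq_len, 2*hidden)
        return self.out(h)                     # per-token label logits

model = RNNLangID(vocab_size=10_000)
logits = model(torch.randint(0, 10_000, (2, 12)))
print(logits.shape)  # torch.Size([2, 12, 4])
```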

23 citations

Posted Content
TL;DR: It is shown that embeddings can likewise add value to the problem of unsupervised POS induction: in two representative models of POS induction, multinomial distributions over the vocabulary are replaced with multivariate Gaussian distributions over word embeddings.
Abstract: Unsupervised word embeddings have been shown to be valuable as features in supervised learning problems; however, their role in unsupervised problems has been less thoroughly explored. In this paper, we show that embeddings can likewise add value to the problem of unsupervised POS induction. In two representative models of POS induction, we replace multinomial distributions over the vocabulary with multivariate Gaussian distributions over word embeddings and observe consistent improvements in eight languages. We also analyze the effect of various choices while inducing word embeddings on "downstream" POS induction results.

19 citations


Cited by
Christopher M. Bishop
01 Jan 2006
TL;DR: Probability distributions and linear models for regression and classification are covered, along with neural networks, kernel methods, graphical models, approximate inference, sampling methods, and a discussion of combining models in the context of machine learning.
Abstract: Contents: Probability Distributions; Linear Models for Regression; Linear Models for Classification; Neural Networks; Kernel Methods; Sparse Kernel Machines; Graphical Models; Mixture Models and EM; Approximate Inference; Sampling Methods; Continuous Latent Variables; Sequential Data; Combining Models.

10,141 citations

Proceedings ArticleDOI
04 Mar 2016
TL;DR: Paper presented at the 2016 Conference of the North American Chapter of the Association for Computational Linguistics, held in San Diego (CA, USA), June 12-17, 2016.
Abstract: Paper presented at the 2016 Conference of the North American Chapter of the Association for Computational Linguistics, held in San Diego (CA, USA), June 12-17, 2016.

3,960 citations

Proceedings ArticleDOI
01 Jul 2017
TL;DR: Gated self-matching networks for reading comprehension style question answering, which aims to answer questions from a given passage, are presented; at the time of submission, the model held first place on the SQuAD leaderboard for both single and ensemble models.
Abstract: In this paper, we present the gated self-matching networks for reading comprehension style question answering, which aims to answer questions from a given passage. We first match the question and passage with gated attention-based recurrent networks to obtain the question-aware passage representation. Then we propose a self-matching attention mechanism to refine the representation by matching the passage against itself, which effectively encodes information from the whole passage. We finally employ the pointer networks to locate the positions of answers from the passages. We conduct extensive experiments on the SQuAD dataset. The single model achieves 71.3% on the evaluation metrics of exact match on the hidden test set, while the ensemble model further boosts the results to 75.9%. At the time of submission of the paper, our model holds the first place on the SQuAD leaderboard for both single and ensemble model.
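
The gating idea can be sketched as follows in PyTorch: each passage position attends over a memory (the question for the first layer, the passage itself for the self-matching layer), the attended context is concatenated with the passage state, and a sigmoid gate modulates that input before a recurrent layer. The attention scorer, dimensions, and parameter names below are illustrative assumptions, not the paper's exact architecture.

```python
# Hedged sketch of gated attention over a memory, reusable for both the
# question-aware layer (memory = question) and the self-matching layer
# (memory = passage). Not the paper's exact parameterization.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Bilinear(dim, dim, 1)   # simple attention scorer
        self.gate = nn.Linear(2 * dim, 2 * dim)
        self.rnn = nn.GRU(2 * dim, dim, batch_first=True)

    def forward(self, passage, memory):
        # passage: (B, Tp, d), memory: (B, Tm, d)
        B, Tp, d = passage.shape
        Tm = memory.size(1)
        p = passage.unsqueeze(2).expand(B, Tp, Tm, d)
        m = memory.unsqueeze(1).expand(B, Tp, Tm, d)
        scores = self.score(p.reshape(-1, d), m.reshape(-1, d)).view(B, Tp, Tm)
        attn = F.softmax(scores, dim=-1)
        context = attn @ memory                   # (B, Tp, d) attended memory
        x = torch.cat([passage, context], dim=-1) # (B, Tp, 2d)
        x = torch.sigmoid(self.gate(x)) * x       # input gate
        out, _ = self.rnn(x)
        return out

layer = GatedAttention(dim=8)
P, Q = torch.randn(1, 5, 8), torch.randn(1, 3, 8)
print(layer(P, Q).shape)  # question-aware passage representation: (1, 5, 8)
print(layer(P, P).shape)  # self-matching: passage attends over itself
```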

721 citations

Posted Content
TL;DR: A Spatial Memory Network for visual question answering is proposed, which stores neuron activations from different spatial regions of the image in its memory and uses the question to choose relevant regions for computing the answer, a process that constitutes a single hop in the network.
Abstract: We address the problem of Visual Question Answering (VQA), which requires joint image and language understanding to answer a question about a given photograph. Recent approaches have applied deep image captioning methods based on convolutional-recurrent networks to this problem, but have failed to model spatial inference. To remedy this, we propose a model we call the Spatial Memory Network and apply it to the VQA task. Memory networks are recurrent neural networks with an explicit attention mechanism that selects certain parts of the information stored in memory. Our Spatial Memory Network stores neuron activations from different spatial regions of the image in its memory, and uses the question to choose relevant regions for computing the answer, a process of which constitutes a single "hop" in the network. We propose a novel spatial attention architecture that aligns words with image patches in the first hop, and obtain improved results by adding a second attention hop which considers the whole question to choose visual evidence based on the results of the first hop. To better understand the inference process learned by the network, we design synthetic questions that specifically require spatial inference and visualize the attention weights. We evaluate our model on two published visual question answering datasets, DAQUAR [1] and VQA [2], and obtain improved results compared to a strong deep baseline model (iBOWIMG) which concatenates image and question features to predict the answer [3].
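
A single attention "hop" of the kind described above can be sketched in a few lines of PyTorch: the question embedding scores each spatial region, and the softmax-weighted sum of region features serves as the visual evidence. Shapes and the dot-product scorer are illustrative assumptions.

```python
# Hedged sketch of one spatial-attention hop: score image regions against the
# question, softmax the scores, and pool region features by those weights.
import torch
import torch.nn.functional as F

def spatial_attention_hop(region_feats, question_vec):
    """region_feats: (R, d) features for R image regions; question_vec: (d,)."""
    scores = region_feats @ question_vec      # (R,) relevance of each region
    weights = F.softmax(scores, dim=0)        # attention over regions
    attended = weights @ region_feats         # (d,) weighted visual evidence
    return attended, weights

regions = torch.randn(49, 256)   # e.g. a 7x7 grid of CNN activations
question = torch.randn(256)
evidence, attn = spatial_attention_hop(regions, question)
print(evidence.shape, attn.shape)  # torch.Size([256]) torch.Size([49])
```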

585 citations

Proceedings ArticleDOI
01 Aug 2016
TL;DR: This work presents a neural model for efficiently learning a generic context embedding function from large corpora using a bidirectional LSTM, and suggests the resulting representations could be useful in a wide variety of NLP tasks.
Abstract: Context representations are central to various NLP tasks, such as word sense disambiguation, named entity recognition, coreference resolution, and many more. In this work we present a neural model for efficiently learning a generic context embedding function from large corpora, using bidirectional LSTM. With a very simple application of our context representations, we manage to surpass or nearly reach state-of-the-art results on sentence completion, lexical substitution and word sense disambiguation tasks, while substantially outperforming the popular context representation of averaged word embeddings. We release our code and pretrained models, suggesting they could be useful in a wide variety of NLP tasks.
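
The described approach can be sketched loosely in PyTorch: one LSTM reads the words to the left of the target position, another reads the words to the right (reversed), and a small projection combines the two final states into a generic context embedding. Layer sizes and the combination layer are illustrative assumptions, not the released model.

```python
# Loose sketch of a bidirectional-LSTM context embedder: encode the left and
# right contexts of a target slot separately, then project the combined state.
import torch
import torch.nn as nn

class ContextEmbedder(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.fwd = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.bwd = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.mlp = nn.Linear(2 * hidden, hidden)

    def forward(self, left_ids, right_ids):
        # left_ids: words before the target slot; right_ids: words after it,
        # already reversed so this LSTM reads them right-to-left.
        _, (h_fwd, _) = self.fwd(self.emb(left_ids))
        _, (h_bwd, _) = self.bwd(self.emb(right_ids))
        ctx = torch.cat([h_fwd[-1], h_bwd[-1]], dim=-1)
        return torch.tanh(self.mlp(ctx))   # generic context embedding

model = ContextEmbedder(vocab_size=5_000)
left = torch.randint(0, 5_000, (1, 6))
right = torch.randint(0, 5_000, (1, 4))
print(model(left, right).shape)  # torch.Size([1, 128])
```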

528 citations