
Showing papers presented at the International Joint Conference on Natural Language Processing in 2011


Proceedings Article
01 Nov 2011
TL;DR: The authors extracted a large-scale Japanese learners' corpus from the revision log of a language learning SNS and used it as training data for learners' error correction with an SMT approach.
Abstract: We present an attempt to extract a large-scale Japanese learners' corpus from the revision log of a language learning SNS. This corpus is easy to obtain at large scale, covers a wide variety of topics and styles, and can be a great source of knowledge for both language learners and instructors. We also demonstrate that the extracted learners' corpus of Japanese as a second language can be used as training data for learners' error correction using an SMT approach. We evaluate different granularities of tokenization to alleviate the problem of word segmentation errors caused by erroneous input from language learners. Experimental results show that the character-wise model outperforms the word-wise model.
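A minimal sketch of the two tokenization granularities compared above, preparing learner/correction pairs for SMT training; the `segment_words` helper is a hypothetical placeholder for a real Japanese tokenizer such as MeCab, and the example pair is invented:

```python
# Sketch: preparing parallel (learner -> corrected) data at two granularities.

def segment_words(sentence):
    """Hypothetical word segmenter; replace with a real tokenizer (e.g. MeCab)."""
    return sentence.split()  # placeholder for illustration only

def word_wise(sentence):
    # Tokens are words; segmentation errors on noisy learner text propagate here.
    return " ".join(segment_words(sentence))

def char_wise(sentence):
    # Tokens are single characters; no word segmentation is needed at all,
    # which side-steps segmentation errors caused by learner mistakes.
    return " ".join(sentence.replace(" ", ""))

pair = ("私は学校にいきました", "私は学校に行きました")  # (learner, correction)
print(char_wise(pair[0]))
print(char_wise(pair[1]))
# Each line is then fed to a standard phrase-based SMT pipeline as one "sentence".
```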

164 citations


Proceedings Article
01 Nov 2011
TL;DR: It is shown that transductive (cross-domain) learning is an important consideration in building a general-purpose language identification system, and a feature selection method is developed that generalizes across domains.
Abstract: We show that transductive (cross-domain) learning is an important consideration in building a general-purpose language identification system, and develop a feature selection method that generalizes across domains. Our results demonstrate that our method provides improvements in transductive transfer learning for language identification. We provide an implementation of the method and show that our system is faster than popular standalone language identification systems, while maintaining competitive accuracy.

161 citations


Proceedings Article
01 Nov 2011
TL;DR: This paper collects and analyses compositionality judgments for a range of compound nouns using Mechanical Turk, and evaluates two different types of distributional models for compositionality detection – constituent based models and composition function based models.
Abstract: A multiword is compositional if its meaning can be expressed in terms of the meaning of its constituents. In this paper, we collect and analyse compositionality judgments for a range of compound nouns using Mechanical Turk. Unlike existing compositionality datasets, our dataset has judgments on the contribution of constituent words as well as judgments for the phrase as a whole. We use this dataset to study the relation between judgments at the constituent level and those for the whole phrase. We then evaluate two different types of distributional models for compositionality detection – constituent based models and composition function based models. Both types of model show competitive performance, though the composition function based models perform slightly better. In both types, additive models perform better than their multiplicative counterparts.
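The two composition functions contrasted above (additive vs. multiplicative) reduce to a few lines over toy co-occurrence vectors; the vectors and the compound below are illustrative, not the paper's data:

```python
import numpy as np

# Toy distributional vectors for the constituents of a compound, e.g. "swimming pool".
v_mod  = np.array([0.2, 0.7, 0.1, 0.0])   # "swimming"
v_head = np.array([0.3, 0.5, 0.0, 0.2])   # "pool"

additive       = v_mod + v_head            # vector addition
multiplicative = v_mod * v_head            # element-wise (Hadamard) product

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Compositionality can then be scored as the similarity between the composed
# vector and the observed vector of the whole compound (here a made-up vector).
v_compound = np.array([0.25, 0.6, 0.05, 0.1])
print("additive score:      ", cosine(additive, v_compound))
print("multiplicative score:", cosine(multiplicative, v_compound))
```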

145 citations


Proceedings Article
01 Nov 2011
TL;DR: A fully automatic framework for fine-grained sentiment analysis on the subsentence level combining multiple sentiment lexicons and neighborhood as well as discourse relations to overcome the problem of uncertainty in polarity predictions is presented.
Abstract: Sentiment analysis is the problem of determining the polarity of a text with respect to a particular topic. For most applications, however, it is not only necessary to derive the polarity of a text as a whole but also to extract negative and positive utterances on a more fine-grained level. Sentiment analysis systems working on the (sub-)sentence level, however, are difficult to develop since shorter textual segments rarely carry enough information to determine their polarity out of context. In this paper, therefore, we present a fully automatic framework for fine-grained sentiment analysis on the subsentence level combining multiple sentiment lexicons and neighborhood as well as discourse relations to overcome this problem. We use Markov logic to integrate polarity scores from different sentiment lexicons with information about relations between neighboring segments, and evaluate the approach on product reviews. The experiments show that the use of structural features improves the accuracy of polarity predictions, achieving accuracy scores of up to 69%.

139 citations


Proceedings Article
01 Nov 2011
TL;DR: A method for characterizing a research work in terms of its focus, domain of application, and techniques used is presented and it is shown how tracing these aspects over time provides a novel measure of the influence of research communities on each other.
Abstract: We present a method for characterizing a research work in terms of its focus, domain of application, and techniques used. We show how tracing these aspects over time provides a novel measure of the influence of research communities on each other. We extract these characteristics by matching semantic extraction patterns, learned using bootstrapping, to the dependency trees of sentences in an article’s abstract.

111 citations


Proceedings Article
01 Nov 2011
TL;DR: The system illustrates that the classic supervised keyphrase extraction approach – previously used mostly for the scientific genre – can be adapted for opinion-related keyphrases, and the paper provides a comparison of the effectiveness of standard keyphrase extraction features and those of the system designed for the special task of opinion expression mining.
Abstract: In this paper, we introduce a system for extracting keyphrases that express the reasons behind authors' opinions in product reviews. The datasets for two fairly different product review domains, movies and mobile phones, were constructed semi-automatically based on the pros and cons entered by the authors. The system illustrates that the classic supervised keyphrase extraction approach – previously used mostly for the scientific genre – can be adapted for opinion-related keyphrases. Besides adapting the original framework to this special task by defining novel, task-specific features, we also demonstrate an efficient way of representing keyphrase candidates. The paper also provides a comparison of the effectiveness of the standard keyphrase extraction features and those of the system designed for the special task of opinion expression mining.

96 citations


Proceedings Article
01 Nov 2011
TL;DR: This work builds an ensemble-style self-training classification model and achieves better classification performance using only a small amount of training data, which greatly reduces the manual annotation work for this task.
Abstract: Classification of citations into categories such as use, refutation, comparison etc. may have several relevant applications for digital libraries such as paper browsing aids, reading recommendations, qualified citation indexing, or fine-grained impact factor calculation. Most citation classification approaches described so far heavily rely on rule systems and patterns tailored to specific science domains. We focus on a less manual approach by learning domain-insensitive features from textual, physical, and syntactic aspects. Our experiments show the effectiveness of this feature set with various machine learning algorithms on datasets of different sizes. Furthermore, we build an ensemble-style self-training classification model and achieve better classification performance using only a small amount of training data, which greatly reduces the manual annotation work for this task.
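A rough sketch of the ensemble-style self-training loop described above, using off-the-shelf scikit-learn classifiers; the tiny citation snippets, labels, and confidence threshold are invented placeholders, not the authors' setup:

```python
import numpy as np
from scipy.sparse import vstack
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB

# Tiny illustrative data; real experiments use annotated citation contexts.
labeled_texts = ["we use the parser of smith 2005",
                 "our results contradict the claim of jones 2003",
                 "similar to lee 2007 we apply crfs"]
labels        = np.array(["use", "refutation", "comparison"])
unlabeled     = ["we adopt the toolkit described by brown 2009",
                 "unlike kim 2004 our findings differ substantially"]

vec = CountVectorizer().fit(labeled_texts + unlabeled)
X_lab, y_lab = vec.transform(labeled_texts), labels
X_unl = vec.transform(unlabeled)

models = [LogisticRegression(max_iter=1000), MultinomialNB()]
for _ in range(3):                      # a few self-training rounds
    for m in models:
        m.fit(X_lab, y_lab)
    if X_unl.shape[0] == 0:
        break
    preds = np.array([m.predict(X_unl) for m in models])
    conf  = np.mean([m.predict_proba(X_unl).max(axis=1) for m in models], axis=0)
    agree = (preds[0] == preds[1]) & (conf > 0.6)
    if not agree.any():
        break
    # Move confidently agreed-upon examples into the labelled pool.
    X_lab = vstack([X_lab, X_unl[agree]])
    y_lab = np.concatenate([y_lab, preds[0][agree]])
    X_unl = X_unl[~agree]
```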

91 citations


Proceedings Article
01 Nov 2011
TL;DR: A simple yet effective semi-supervised method to improve Chinese word segmentation and POS tagging by introducing novel features derived from large auto-analyzed data to enhance a simple pipelined system.
Abstract: This paper presents a simple yet effective semi-supervised method to improve Chinese word segmentation and POS tagging. We introduce novel features derived from large auto-analyzed data to enhance a simple pipelined system. The auto-analyzed data are generated from unlabeled data by using a baseline system. We evaluate the usefulness of our approach in a series of experiments on the Penn Chinese Treebanks and show that the new features provide substantial performance gains in all experiments. Furthermore, the results of our proposed method are superior to the best reported results in the literature.

91 citations


Proceedings Article
01 Nov 2011
TL;DR: A model that represents word meaning in context by vectors which are modified according to the words in the target’s syntactic context; it outperforms all previous models on a paraphrase ranking task and achieves state-of-the-art performance on word sense disambiguation.
Abstract: We present a model that represents word meaning in context by vectors which are modified according to the words in the target’s syntactic context. Contextualization of a vector is realized by reweighting its components, based on distributional information about the context words. Evaluation on a paraphrase ranking task derived from the SemEval 2007 Lexical Substitution Task shows that our model outperforms all previous models on this task. We show that our model supports a wider range of applications by evaluating it on a word sense disambiguation task. Results show that our model achieves state-of-the-art performance.
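A schematic rendering of the contextualization-by-reweighting idea with plain numpy; the vectors and the normalisation are illustrative assumptions rather than the exact model:

```python
import numpy as np

# A toy 4-dimensional distributional space shared by all vectors.
target  = np.array([0.4, 0.1, 0.3, 0.2])   # e.g. vector for the target word "ball"
context = np.array([0.0, 0.6, 0.1, 0.3])   # e.g. vector for the syntactic context word "dance"

# Reweight each component of the target vector by the (normalised) weight that
# the context word assigns to the same dimension.
weights = context / (context.sum() + 1e-12)
contextualized = target * weights

print(contextualized / (np.linalg.norm(contextualized) + 1e-12))
```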

88 citations


Proceedings Article
08 Nov 2011
TL;DR: It is found that the Wall-Street-Journal-trained statistical parsers have a particular problem with tweets and that a substantial part of this problem is related to POS tagging accuracy.
Abstract: We investigate the problem of parsing the noisy language of social media. We evaluate four Wall-Street-Journal-trained statistical parsers (Berkeley, Brown, Malt and MST) on a new dataset containing 1,000 phrase structure trees for sentences from microblogs (tweets) and discussion forum posts. We compare the four parsers on their ability to produce Stanford dependencies for these Web 2.0 sentences. We find that the parsers have a particular problem with tweets and that a substantial part of this problem is related to POS tagging accuracy. We attempt three retraining experiments involving Malt, Brown and an in-house Berkeley-style parser and obtain a statistically significant improvement for all three parsers.

86 citations


Proceedings Article
01 Nov 2011
TL;DR: This paper proposes a topic model that incorporates category information into the process of discovering latent topics in question content, and combines the topic-based semantic similarity with the translation-based language model in a unified framework for question retrieval.
Abstract: Community-based Question Answering (cQA) is a popular online service where users can ask and answer questions on any topic. This paper is concerned with the problem of question retrieval. Question retrieval in cQA aims to find historical questions that are semantically equivalent or relevant to the queried questions. Although the translation-based language model (Xue et al., 2008) has achieved state-of-the-art performance for question retrieval, it ignores the latent topic information when calculating the semantic similarity between questions. In this paper, we propose a topic model that incorporates category information into the process of discovering the latent topics in the content of questions. We then combine the latent-topic-based semantic similarity with the translation-based language model in a unified framework for question retrieval. Experiments are carried out on a real-world cQA data set from Yahoo! Answers. The results show that our proposed method can significantly improve the question retrieval performance of the translation-based language model.
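At its simplest, combining the two components amounts to a weighted interpolation of their scores when ranking candidate questions; the candidate scores and mixing weight below are invented for illustration:

```python
# Hypothetical per-candidate scores for a queried question.
candidates = {
    "how do i reset my router":      {"translation_lm": 0.42, "topic_sim": 0.70},
    "best router for a small house": {"translation_lm": 0.45, "topic_sim": 0.20},
}

lam = 0.6   # interpolation weight between the two components
ranked = sorted(candidates,
                key=lambda q: lam * candidates[q]["translation_lm"]
                              + (1 - lam) * candidates[q]["topic_sim"],
                reverse=True)
print(ranked[0])   # the candidate whose combined score is highest
```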

Proceedings Article
01 Nov 2011
TL;DR: The first incremental approach to the task of joint POS tagging and dependency parsing, which is built upon a shift-reduce parsing framework with dynamic programming is proposed, achieving the new state-of-the-art performance on this joint task.
Abstract: We address the problem of joint part-of-speech (POS) tagging and dependency parsing in Chinese. In Chinese, some POS tags are often hard to disambiguate without considering longrange syntactic information. Also, the traditional pipeline approach to POS tagging and dependency parsing may suffer from the problem of error propagation. In this paper, we propose the first incremental approach to the task of joint POS tagging and dependency parsing, which is built upon a shift-reduce parsing framework with dynamic programming. Although the incremental approach encounters difficulties with underspecified POS tags of look-ahead words, we overcome this issue by introducing so-called delayed features. Our joint approach achieved substantial improvements over the pipeline and baseline systems in both the POS tagging and dependency parsing tasks, achieving new state-of-the-art performance on this joint task.

Proceedings Article
01 Nov 2011
TL;DR: This paper describes a two-phase method for expanding abbreviations found in informal text using a machine translation system trained at the character level during the first phase, which is much more robust to new abbreviations than a word-level system.
Abstract: This paper describes a two-phase method for expanding abbreviations found in informal text (e.g., email, text messages, chat room conversations) using a machine translation system trained at the character level during the first phase. In this way, the system learns mappings between character-level “phrases” and is much more robust to new abbreviations than a word-level system. We generate translation models that are independent of the way in which the abbreviations are formed and show little degradation compared to when type-dependent models are trained. Our experiments on a large data set show that our proposed system performs well when tested both on isolated abbreviations and, with the incorporation of a second phase utilizing an in-domain language model, in the context of neighboring words.
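A toy approximation of the second phase, choosing among candidate expansions with an in-domain bigram language model over neighbouring words; the candidate lists and counts are invented:

```python
from math import log

# Hypothetical candidate expansions produced by the character-level first phase.
candidates = {"u": ["you", "university"], "2nite": ["tonight"]}

# Toy in-domain counts standing in for a real language model.
bigram  = {("see", "you"): 50, ("see", "university"): 1, ("you", "tonight"): 30}
unigram = {"see": 60, "you": 80, "university": 5, "tonight": 35}

def bigram_logprob(prev, word, alpha=1.0, vocab=len(unigram)):
    # Add-alpha smoothed bigram probability.
    return log((bigram.get((prev, word), 0) + alpha) /
               (unigram.get(prev, 0) + alpha * vocab))

def expand(tokens):
    out = []
    for tok in tokens:
        opts = candidates.get(tok, [tok])
        prev = out[-1] if out else "<s>"
        out.append(max(opts, key=lambda w: bigram_logprob(prev, w)))
    return out

print(expand(["see", "u", "2nite"]))   # -> ['see', 'you', 'tonight']
```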

Proceedings Article
01 Nov 2011
TL;DR: A system to mine information regarding the safety of people in the disaster-stricken area from Twitter, a massive yet highly unorganized information source, was created.
Abstract: This paper describes efforts of NLP researchers to create a system to aid the relief efforts during the 2011 East Japan Earthquake. Specifically, we created a system to mine information regarding the safety of people in the disaster-stricken area from Twitter, a massive yet highly unorganized information source. We describe the large scale collaborative effort to rapidly create robust and effective systems for word segmentation, named entity recognition, and tweet classification. As a result of our efforts, we were able to effectively deliver new information about the safety of over 100 people in the disaster-stricken area to a central repository for safety information.

Proceedings Article
01 Nov 2011
TL;DR: A discriminative model is presented that simultaneously determines an appropriate case frame for a given predicate and its predicate-argument structure and the results of zero anaphora resolution on Web text and the effectiveness of the approach are reported.
Abstract: We present a discriminative model for Japanese zero anaphora resolution that simultaneously determines an appropriate case frame for a given predicate and its predicate-argument structure. Our model is based on a log linear framework, and exploits lexical features obtained from a large raw corpus, as well as non-lexical features obtained from a relatively small annotated corpus. We report the results of zero anaphora resolution on Web text and demonstrate the effectiveness of our approach. In addition, we also investigate the relative importance of each feature for resolving zero anaphora in Web text.

Proceedings Article
01 Nov 2011
TL;DR: A POS-based ensemble model is proposed to efficiently integrate features with different types of POS tags to improve the classification performance of cross-domain sentiment classification.
Abstract: In this paper, we focus on the task of cross-domain sentiment classification. We find that, across different domains, features with some types of part-of-speech (POS) tags are domain-dependent, while others are domain-free. Based on this finding, we propose a POS-based ensemble model to efficiently integrate features with different types of POS tags to improve the classification performance. Weights are trained by stochastic gradient descent (SGD) to optimize the perceptron and minimum classification error (MCE) criteria. Experimental results show that the proposed ensemble model is quite effective for the task of cross-domain sentiment classification.

Proceedings Article
01 Nov 2011
TL;DR: A new automatically aligned resource of Wiktionary and WordNet is proposed that has a very high domain coverage of word senses and an enriched sense representation, including pronunciations, etymologies, translations, etc.
Abstract: To date, no lexical resource can claim to be fully comprehensive or to perform best for every NLP task. This has caused a steep increase in resource alignment research. An important challenge in this area is the alignment of differently represented word senses, which we address in this paper. In particular, we propose a new automatically aligned resource of Wiktionary and WordNet that has (i) very high domain coverage of word senses and (ii) an enriched sense representation, including pronunciations, etymologies, translations, etc. We evaluate our alignment both quantitatively and qualitatively, and explore how it can contribute to practical tasks.

Proceedings Article
01 Nov 2011
TL;DR: This paper takes a data-driven approach to identifying arguments of explicit discourse connectives and designs the argument segmentation task as a cascade of decisions based on conditional random fields (CRFs).
Abstract: Parsing discourse is a challenging natural language processing task. In this paper we take a data-driven approach to identify arguments of explicit discourse connectives. In contrast to previous work we do not make any assumptions on the span of arguments and consider parsing as a token-level sequence labeling task. We design the argument segmentation task as a cascade of decisions based on conditional random fields (CRFs). We train the CRFs on lexical, syntactic and semantic features extracted from the Penn Discourse Treebank and evaluate feature combinations on the commonly used test split. We show that the best combination of features includes syntactic and semantic features. The comparative error analysis investigates the performance variability over connective types and argument positions.
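Casting argument identification as token-level sequence labelling maps naturally onto a standard CRF toolkit; the sketch below uses sklearn-crfsuite with a reduced, made-up feature set and BIO-style Arg1/Arg2 labels, which are assumptions rather than the paper's exact features:

```python
import sklearn_crfsuite  # pip install sklearn-crfsuite

def token_features(tokens, pos_tags, i):
    # A few lexical and syntactic features; real systems add many more
    # (connective distance, constituent path, semantic class, ...).
    return {
        "word": tokens[i].lower(),
        "pos": pos_tags[i],
        "prev_word": tokens[i - 1].lower() if i > 0 else "<s>",
        "next_word": tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>",
        "is_connective": tokens[i].lower() in {"because", "but", "although"},
    }

tokens = ["I", "stayed", "home", "because", "it", "rained"]
pos    = ["PRP", "VBD", "NN", "IN", "PRP", "VBD"]
labels = ["B-Arg1", "I-Arg1", "I-Arg1", "O", "B-Arg2", "I-Arg2"]

X = [[token_features(tokens, pos, i) for i in range(len(tokens))]]
y = [labels]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, y)
print(crf.predict(X))
```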

Proceedings Article
01 Nov 2011
TL;DR: A new test collection is created to evaluate cross-language entity linking performance in twenty-one languages, with experiments that examine issues such as the importance of transliteration, the utility of cross-language information retrieval, and the potential benefit of multilingual named entity recognition.
Abstract: There has been substantial recent interest in aligning mentions of named entities in unstructured texts to knowledge base descriptors, a task commonly called entity linking. This technology is crucial for applications in knowledge discovery and text data mining. This paper presents experiments in the new problem of cross-language entity linking, where documents and named entities are in a different language than that used for the content of the reference knowledge base. We have created a new test collection to evaluate cross-language entity linking performance in twenty-one languages. We present experiments that examine issues such as the importance of transliteration, the utility of cross-language information retrieval, and the potential benefit of multilingual named entity recognition. Our best model achieves performance which is 94% of a strong monolingual baseline.

Proceedings Article
01 Nov 2011
TL;DR: A hierarchical Bayesian model based on latent Dirichlet allocation (LDA) is presented, called subjLDA, for sentence-level subjectivity detection, which automatically identifies whether a given sentence expresses opinion or states facts.
Abstract: This paper presents a hierarchical Bayesian model based on latent Dirichlet allocation (LDA), called subjLDA, for sentence-level subjectivity detection, which automatically identifies whether a given sentence expresses opinion or states facts. In contrast to most of the existing methods relying on either labelled corpora for classifier training or linguistic pattern extraction for subjectivity classification, we view the problem as weakly-supervised generative model learning, where the only input to the model is a small set of domain independent subjectivity lexical clues. A mechanism is introduced to incorporate the prior information about the subjectivity lexical clues into model learning by modifying the Dirichlet priors of topic-word distributions. The subjLDA model has been evaluated on the Multi-Perspective Question Answering (MPQA) dataset and promising results have been observed in the preliminary experiments. We have also explored adding neutral words as prior information for model learning. It was found that while incorporating subjectivity clues bearing positive or negative polarity can achieve a significant performance gain, the prior lexical information from neutral words is less effective.
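The prior-modification mechanism can be illustrated with a small asymmetric prior matrix over a toy vocabulary; the values, clue list, and two-class setup are illustrative only:

```python
import numpy as np

vocab   = ["excellent", "awful", "today", "announced", "love"]
classes = ["subjective", "objective"]          # two "topics" at the top level
clues   = {"excellent": "subjective", "awful": "subjective", "love": "subjective"}

base_beta = 0.01                               # symmetric base prior
boost     = 1.0                                # extra mass for clue words

# beta[k][w]: Dirichlet prior of word w under class k, raised for clue words
# in the subjective class and left symmetric elsewhere.
beta = np.full((len(classes), len(vocab)), base_beta)
for w, cls in clues.items():
    k, j = classes.index(cls), vocab.index(w)
    beta[k, j] += boost

print(beta)
# This prior matrix then replaces the symmetric beta in the sampler,
# nudging clue words toward the subjective word distribution.
```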

Proceedings Article
01 Nov 2011
TL;DR: The results show that selecting relevant senses of the constituent words leads to a better semantic composition of the compound, and dynamic prototypes perform better than static prototypes.
Abstract: Compositional Distributional Semantic methods model the distributional behavior of a compound word by exploiting the distributional behavior of its constituent words. In this setting, a constituent word is typically represented by a feature vector conflating all the senses of that word. However, not all the senses of a constituent word are relevant when composing the semantics of the compound. In this paper, we present two different methods for selecting the relevant senses of constituent words. The first one is based on Word Sense Induction and creates static multi-prototype vectors representing the senses of a constituent word. The second creates a single dynamic prototype vector for each constituent word based on the distributional properties of the other constituents in the compound. We use these prototype vectors for composing the semantics of noun-noun compounds and evaluate on a compositionality-based similarity task. Our results show that: (1) selecting relevant senses of the constituent words leads to a better semantic composition of the compound, and (2) dynamic prototypes perform better than static prototypes.

Proceedings Article
01 Nov 2011
TL;DR: This work presents a machine-learning approach that automatically clusters words in multilingual word lists into cognate sets using a number of diverse word similarity measures and features that encode the degree of affinity between pairs of languages.
Abstract: Word lists have become available for most of the world’s languages, but only a small fraction of such lists contain cognate information. We present a machine-learning approach that automatically clusters words in multilingual word lists into cognate sets. Our method incorporates a number of diverse word similarity measures and features that encode the degree of affinity between pairs of languages. The output of the classification algorithm is then used to generate cognate groups. The results of the experiments on word lists representing several language families demonstrate the utility of the proposed approach.
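A reduced version of the pairwise step feeds string-similarity features (normalised edit distance, common-prefix ratio) and a language-pair affinity score into a binary classifier; the word pairs, affinities, and feature set below are invented stand-ins:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def edit_distance(a, b):
    d = np.zeros((len(a) + 1, len(b) + 1), dtype=int)
    d[:, 0], d[0, :] = np.arange(len(a) + 1), np.arange(len(b) + 1)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d[i, j] = min(d[i-1, j] + 1, d[i, j-1] + 1,
                          d[i-1, j-1] + (a[i-1] != b[j-1]))
    return d[len(a), len(b)]

def common_prefix(a, b):
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def pair_features(w1, w2, lang_affinity):
    longest = max(len(w1), len(w2))
    return [edit_distance(w1, w2) / longest,
            common_prefix(w1, w2) / longest,
            lang_affinity]

# Toy training pairs: (word1, word2, language-pair affinity, is_cognate)
data = [("night", "nacht", 0.8, 1), ("hand", "hand", 0.8, 1),
        ("dog", "hund", 0.8, 0), ("water", "mizu", 0.1, 0)]
X = [pair_features(a, b, aff) for a, b, aff, _ in data]
y = [lab for *_, lab in data]

clf = LogisticRegression().fit(X, y)
print(clf.predict([pair_features("milk", "milch", 0.8)]))
# Pairwise decisions are then grouped (e.g. by clustering) into cognate sets.
```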

Proceedings Article
01 Nov 2011
TL;DR: This paper formalizes the task of finding a knowledge base entry that a given named entity mention refers to, namely entity linking, by identifying the most “important” node among the graph nodes representing the candidate entries by introducing three degree-based measures of graph connectivity.
Abstract: In this paper, we formalize the task of finding a knowledge base entry that a given named entity mention refers to, namely entity linking, by identifying the most “important” node among the graph nodes representing the candidate entries. With the aim of ranking these entities by their “importance”, we introduce three degree-based measures of graph connectivity. Experimental results on the TAC-KBP benchmark data sets show that our graph-based method performs comparably with the state-of-the-art methods. We also show that using the name phrase feature outperforms the commonly used bag-of-words feature for entity linking.
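The ranking-by-connectivity idea can be sketched with networkx: build a graph over candidate entries and context entities, then pick the candidate with the highest degree-based score; the graph, edges, and the single measure used here are illustrative, not the paper's three measures:

```python
import networkx as nx

# Nodes: candidate KB entries for the mention "Jordan" plus entities from the
# surrounding document; edges: illustrative relatedness links.
G = nx.Graph()
G.add_edges_from([
    ("Michael Jordan (basketball)", "Chicago Bulls"),
    ("Michael Jordan (basketball)", "NBA"),
    ("Michael I. Jordan (scientist)", "machine learning"),
    ("Chicago Bulls", "NBA"),
])

candidates = ["Michael Jordan (basketball)", "Michael I. Jordan (scientist)"]
degree_centrality = nx.degree_centrality(G)

best = max(candidates, key=lambda c: degree_centrality.get(c, 0.0))
print(best)   # the more connected candidate wins
```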

Proceedings Article
01 Nov 2011
TL;DR: This work proposes a new subjectivity classification at the segment level that is more appropriate for discourse-based sentiment analysis, automatically distinguishing between subjective non-evaluative and objective segments by using local and global context features.
Abstract: We propose a new subjectivity classification at the segment level that is more appropriate for discourse-based sentiment analysis. Our approach automatically distinguishes between subjective non-evaluative and objective segments, and between implicit and explicit opinions, by using local and global context features.

Proceedings Article
01 Nov 2011
TL;DR: The task of identifying general and specific sentences in news articles is introduced, and it is shown that the specificity levels predicted by the classifier correlate with the intuitive judgement of specificity employed by people when creating summaries.
Abstract: In this paper, we introduce the task of identifying general and specific sentences in news articles. Given the novelty of the task, we explore the feasibility of using existing annotations of discourse relations as training data for a general/specific classifier. The classifier relies on several classes of features that capture lexical and syntactic information, as well as word specificity and polarity. We also validate our results on sentences that were directly judged by multiple annotators to be general or specific. We analyze the annotator agreement on specificity judgements and study the strengths and robustness of features. We also provide a task-based evaluation of our classifier on general and specific summaries written by people. Here we show that the specificity levels predicted by our classifier correlate with the intuitive judgement of specificity employed by people when creating these summaries.

Proceedings Article
01 Nov 2011
TL;DR: This paper describes the structured named entity definition used in this challenge and presents a method to transfer reference annotations to ASR output, used in the Quaero 2010 evaluation of extended named entity annotation on speech transcripts, whose results are given.
Abstract: The evaluation of named entity recognition (NER) methods is an active field of research. This includes the recognition of named entities in speech transcripts. Evaluating NER systems on automatic speech recognition (ASR) output when the human reference annotation was prepared on clean manual transcripts raises difficult alignment issues. These issues are emphasized when named entities are structured, as is the case in the Quaero NER challenge organized in 2010. This paper describes the structured named entity definition used in this challenge and presents a method to transfer reference annotations to ASR output. This method was used in the Quaero 2010 evaluation of extended named entity annotation on speech transcripts, whose results are given in the paper.

Proceedings Article
Shiqi Zhao1, Haifeng Wang1, Chao Li, Ting Liu2, Yi Guan2 
01 Nov 2011
TL;DR: Experimental results show that the precision of the 1-best and 5-best generated questions is 67% and 61%, respectively, outperforming a baseline method that directly retrieves questions for queries with a cQA site's search engine.
Abstract: This paper proposes a method that automatically generates questions from queries for community-based question answering (cQA) services. Our query-to-question generation model is built upon templates induced from search engine query logs. In detail, we first extract pairs of queries and user-clicked questions from query logs, with which we induce question generation templates. Then, when a new query is submitted, we select proper templates for the query and generate questions through template instantiation. We evaluated the method with a set of short queries randomly selected from query logs, and the generated questions were judged by human annotators. Experimental results show that the precision of the 1-best and 5-best generated questions is 67% and 61%, respectively, which outperforms a baseline method that directly retrieves questions for queries with a cQA site's search engine. In addition, the results also suggest that the proposed method can improve the search of cQA archives.
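Template induction and instantiation can be shown in a few lines: a logged (query, clicked-question) pair yields a template by replacing the shared content term with a slot, and a new query fills that slot; the pair and the slot-detection shortcut below are invented:

```python
# Induce a template from a (query, clicked question) pair found in the log.
query, question = "iphone battery life", "How long does the iphone battery last?"
shared = "iphone"                         # shared content term (assumed detected)
template = (query.replace(shared, "[X]"), question.replace(shared, "[X]"))

# Instantiate the template for a new query matching the query-side pattern.
new_query = "kindle battery life"
slot_value = new_query.replace(" battery life", "")   # fills [X]
generated = template[1].replace("[X]", slot_value)
print(generated)   # -> How long does the kindle battery last?
```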

Proceedings Article
01 Nov 2011
TL;DR: It is shown that a hybrid image-text approach can lead to improvements in word relatedness, confirming the applicability of visual cues as a possible orthogonal information source.
Abstract: Traditional approaches to semantic relatedness are often restricted to text-based methods, which typically disregard other multimodal knowledge sources. In this paper, we propose a novel image-based metric to estimate the relatedness of words, and demonstrate the promise of this method through comparative evaluations on three standard datasets. We also show that a hybrid image-text approach can lead to improvements in word relatedness, confirming the applicability of visual cues as a possible orthogonal information source.
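The hybrid combination reduces to interpolating a text-based similarity with an image-based one; the toy vectors and the mixing weight are illustrative assumptions:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Toy representations for the word pair ("car", "automobile").
text_car,  text_auto  = np.array([0.6, 0.1, 0.3]), np.array([0.5, 0.2, 0.3])
image_car, image_auto = np.array([0.4, 0.4, 0.2]), np.array([0.35, 0.45, 0.2])

lam = 0.5   # mixing weight between the two modalities
relatedness = lam * cosine(text_car, text_auto) + (1 - lam) * cosine(image_car, image_auto)
print(round(relatedness, 3))
```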

Proceedings Article
01 Nov 2011
TL;DR: This paper describes a new set of named entities with a multilevel tree structure, in which base entities are combined to define more complex ones, making the NER task more complex than previous tasks, all the more so because noisy data are used for annotation.
Abstract: Named Entity Recognition (NER) is a well-known Natural Language Processing (NLP) task, used as a preliminary processing step to provide a semantic level for more complex tasks. In this paper we describe a new set of named entities with a multilevel tree structure, where base entities are combined to define more complex ones. This definition makes the NER task more complex than previous tasks, all the more so because of the use of noisy data for annotation: transcriptions of French broadcast data. We propose an original and effective system to tackle this new task, combining the strengths of sequence labeling approaches and syntactic parsing through a cascade of different models. Our system was evaluated in the 2011 Quaero named entity detection evaluation campaign and ranked first, with results far better than those of the other participating systems.

Proceedings Article
01 Nov 2011
TL;DR: It is shown that using LDA for word class induction scales better with the number of classes than the Brown algorithm, and that the resulting classes outperform Brown on all three tasks.
Abstract: Word classes automatically induced from distributional evidence have proved useful in many NLP tasks, including Named Entity Recognition, parsing and sentence retrieval. The Brown hard clustering algorithm is commonly used in this scenario. Here we propose to use Latent Dirichlet Allocation in order to induce soft, probabilistic word classes. We compare our approach against Brown in terms of efficiency. We also compare the usefulness of the induced Brown and LDA word classes for the semi-supervised learning of three NLP tasks: fine-grained Named Entity Recognition, Morphological Analysis and semantic Relation Classification. We show that using LDA for word class induction scales better with the number of classes than the Brown algorithm, and that the resulting classes outperform Brown on all three tasks.
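Using LDA for word class induction can be sketched with gensim by treating the bag of context words around each target word type as a pseudo-document and reading topics as soft classes; the pseudo-documents below are a toy stand-in, and gensim is an assumed toolkit choice:

```python
from gensim import corpora, models

# Pseudo-documents: for each target word type, the bag of words observed in its
# contexts (invented here for illustration).
contexts = {
    "paris":  ["capital", "france", "city", "eiffel"],
    "london": ["capital", "england", "city", "thames"],
    "apple":  ["fruit", "eat", "tree", "juice"],
    "pear":   ["fruit", "eat", "tree", "sweet"],
}

dictionary = corpora.Dictionary(contexts.values())
bows = {w: dictionary.doc2bow(ctx) for w, ctx in contexts.items()}

lda = models.LdaModel(list(bows.values()), id2word=dictionary,
                      num_topics=2, passes=20, random_state=0)

# Soft class membership of each target word = topic distribution of its pseudo-document.
for word, bow in bows.items():
    print(word, lda.get_document_topics(bow, minimum_probability=0.0))
```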