
Showing papers by "Hideki Isozaki published in 2008"


Proceedings Article
01 Dec 2008
TL;DR: Provides evidence that using more unlabeled data in semi-supervised learning can improve the performance of Natural Language Processing tasks such as part-of-speech tagging, syntactic chunking, and named entity recognition.
Abstract: This paper provides evidence that the use of more unlabeled data in semi-supervised learning can improve the performance of Natural Language Processing (NLP) tasks, such as part-of-speech tagging, syntactic chunking, and named entity recognition. We first propose a simple yet powerful semi-supervised discriminative model appropriate for handling large scale unlabeled data. Then, we describe experiments performed on widely used test collections, namely, PTB III data, CoNLL’00 and ’03 shared task data for the above three NLP tasks, respectively. We incorporate up to 1G-words (one billion tokens) of unlabeled data, which is the largest amount of unlabeled data ever used for these tasks, to investigate the performance improvement. In addition, our results are superior to the best reported results for all of the above test collections.
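The paper's actual model combines a discriminative learner with generative components estimated from up to a billion tokens of unlabeled text. As a much simpler illustration of how unlabeled corpora can feed a supervised tagger, the sketch below derives log-frequency-bin features from raw tokens; the binning scheme and function name are illustrative, not from the paper.

```python
import math
from collections import Counter

def frequency_bin_features(tokens, num_bins=5):
    """Map each word to a log-frequency bin computed from unlabeled text.
    Such corpus-derived bins can be added as extra features when training
    a supervised tagger or chunker on a small labeled set."""
    counts = Counter(tokens)
    max_log = max(math.log(c + 1) for c in counts.values())
    return {w: min(num_bins - 1, int(num_bins * math.log(c + 1) / max_log))
            for w, c in counts.items()}
```

Frequent words land in high bins and rare words in low bins, so the feature generalizes across words never seen in the labeled data.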

159 citations


Proceedings Article
01 Jan 2008
TL;DR: NAZEQA, a Japanese why-QA system based on the proposed corpus-based approach, clearly outperforms a baseline that uses hand-crafted patterns with a Mean Reciprocal Rank (top-5) of 0.305, making it presumably the best-performing fully implemented why-QA system.
Abstract: This paper proposes a corpus-based approach for answering why-questions. Conventional systems use hand-crafted patterns to extract and evaluate answer candidates. However, such hand-crafted patterns are likely to have low coverage of causal expressions, and it is also difficult to assign suitable weights to the patterns by hand. In our approach, causal expressions are automatically collected from corpora tagged with semantic relations. From the collected expressions, features are created to train an answer candidate ranker that maximizes the QA performance with regard to the corpus of why-questions and answers. NAZEQA, a Japanese why-QA system based on our approach, clearly outperforms a baseline that uses hand-crafted patterns with a Mean Reciprocal Rank (top-5) of 0.305, making it presumably the best-performing fully implemented why-QA system.
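The abstract describes training an answer-candidate ranker over features derived from causal expressions. A minimal sketch of such a ranker, assuming a pairwise perceptron update (the paper does not specify this particular learner; feature names here are invented for illustration):

```python
def score(weights, features):
    """Linear score of a candidate given its feature dict."""
    return sum(weights.get(f, 0.0) * v for f, v in features.items())

def train_ranker(question_data, epochs=10, lr=0.1):
    """Pairwise perceptron: for each question, push the correct answer
    candidate's score above each incorrect candidate's score.
    question_data: list of (positive_features, [negative_features, ...])."""
    weights = {}
    for _ in range(epochs):
        for pos, negs in question_data:
            for neg in negs:
                if score(weights, pos) <= score(weights, neg):
                    for f, v in pos.items():
                        weights[f] = weights.get(f, 0.0) + lr * v
                    for f, v in neg.items():
                        weights[f] = weights.get(f, 0.0) - lr * v
    return weights
```

After training, candidates carrying causal-expression features would score higher than candidates matched only by surface overlap.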

73 citations


Proceedings Article
01 Jan 2008
TL;DR: Presents a classifier design method for multi-label categorization based on model combination and F1-score maximization; experiments confirmed that the method is especially useful for datasets with many combinations of category labels.
Abstract: Text categorization is a fundamental task in natural language processing, and is generally defined as a multi-label categorization problem, where each text document is assigned to one or more categories. We focus on providing good statistical classifiers with a generalization ability for multi-label categorization and present a classifier design method based on model combination and F1-score maximization. In our formulation, we first design multiple models for binary classification per category. Then, we combine these models to maximize the F1-score of a training dataset. Our experimental results confirmed that our proposed method was useful especially for datasets where there were many combinations of category labels.
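The abstract describes combining per-category binary models so as to maximize F1 on the training set. A minimal sketch of the F1-maximization step, assuming per-category scores are already available and simplifying the combination to a single shared decision threshold tuned by grid search (the paper's actual model combination is richer):

```python
def micro_f1(pred, gold):
    """Micro-averaged F1 over documents; pred/gold are lists of label sets."""
    tp = sum(len(p & g) for p, g in zip(pred, gold))
    fp = sum(len(p - g) for p, g in zip(pred, gold))
    fn = sum(len(g - p) for p, g in zip(pred, gold))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def tune_threshold(scores, gold, grid):
    """Pick the decision threshold that maximizes training-set F1.
    scores: per-document dicts mapping category -> binary-model score."""
    best_t, best_f = grid[0], -1.0
    for t in grid:
        pred = [{c for c, s in doc.items() if s >= t} for doc in scores]
        f = micro_f1(pred, gold)
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f
```

Optimizing the threshold for F1 rather than accuracy matters in multi-label settings, where most category decisions are negative.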

44 citations


Proceedings ArticleDOI
01 Dec 2008
TL;DR: This analysis shows that empathic utterances by users are strong indicators of increasing closeness and user satisfaction, and self-disclosure by users increases when users have positive preferences on topics being discussed.
Abstract: To build trust or cultivate long-term relationships with users, conversational systems need to perform social dialogue. To date, research has primarily focused on the overall effect of social dialogue in human-computer interaction, leading to little work on the effects of individual linguistic phenomena within social dialogue. This paper investigates such individual effects through dialogue experiments. Focusing on self-disclosure and empathic utterances (agreement and disagreement), we empirically calculate their contributions to the dialogue quality. Our analysis shows that (1) empathic utterances by users are strong indicators of increasing closeness and user satisfaction, (2) the system's empathic utterances are effective for inducing empathy from users, and (3) self-disclosure by users increases when users have positive preferences on topics being discussed.

35 citations


Journal ArticleDOI
TL;DR: NAZEQA, a Japanese why-QA system based on the approach, clearly outperforms baselines with a Mean Reciprocal Rank (top-5) of 0.223 when sentences are used as answers and an MRR (top-5) of 0.326 when paragraphs are used as answers, making it presumably the best-performing fully implemented why-QA system.
Abstract: This article describes our approach for answering why-questions that we initially introduced at NTCIR-6 QAC-4. The approach automatically acquires causal expression patterns from relation-annotated corpora by abstracting text spans annotated with a causal relation and by mining syntactic patterns that are useful for distinguishing sentences annotated with a causal relation from those annotated with other relations. We use these automatically acquired causal expression patterns to create features to represent answer candidates, and use these features together with other possible features related to causality to train an answer candidate ranker that maximizes the QA performance with regard to the corpus of why-questions and answers. NAZEQA, a Japanese why-QA system based on our approach, clearly outperforms baselines with a Mean Reciprocal Rank (top-5) of 0.223 when sentences are used as answers and with an MRR (top-5) of 0.326 when paragraphs are used as answers, making it presumably the best-performing fully implemented why-QA system. Experimental results also verified the usefulness of the automatically acquired causal expression patterns.
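The evaluation metric used above, Mean Reciprocal Rank restricted to the top five answers, is straightforward to compute. The sketch below shows the standard definition (function name is ours):

```python
def mrr_top5(first_correct_ranks):
    """Mean Reciprocal Rank (top-5) over a set of questions.
    Each entry is the 1-based rank of the first correct answer in the
    system's ranked list, or None if no correct answer appears at all;
    answers ranked below 5 contribute zero."""
    total = sum(1.0 / r for r in first_correct_ranks
                if r is not None and r <= 5)
    return total / len(first_correct_ranks)
```

For example, four questions whose first correct answers appear at ranks 1, 2, (not found), and 4 yield an MRR (top-5) of (1 + 1/2 + 0 + 1/4) / 4.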

26 citations


01 Jan 2008
TL;DR: Designs a multi-label classification system for the NTCIR-7 Patent Mining Task, based on a machine learning approach, that employs one logistic regression model per International Patent Classification (IPC) code to determine the IPC code assignment of research papers.
Abstract: We design a multi-label classification system based on a machine learning approach for the NTCIR-7 Patent Mining Task. In our system, we employ a logistic regression model for each International Patent Classification (IPC) code that determines the IPC code assignment of research papers. The logistic regression models are trained by using patent documents provided by the task organizers. To mitigate the overfitting of the logistic regression models to the patent documents, we design the feature vectors of the patent documents with feature weighting and component selection methods utilizing a research paper set. Using a test collection for the Japanese subtask of the NTCIR-7 Patent Mining Task, we confirmed the effectiveness of our multi-label classification system.
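The abstract's component selection and feature weighting use a research-paper set to keep patent-trained models from overfitting to patent-specific wording. A minimal sketch of that idea, assuming bag-of-words features (the selection rule and weighting here are illustrative stand-ins for the paper's methods):

```python
from collections import Counter

def build_features(patent_tokens, paper_tokens):
    """Component selection: drop terms never seen in the research-paper set.
    Feature weighting: scale each surviving term's patent count by its
    research-paper frequency, favoring vocabulary shared across domains."""
    paper_counts = Counter(paper_tokens)
    return {t: c * paper_counts[t]
            for t, c in Counter(patent_tokens).items()
            if t in paper_counts}
```

Terms that occur only in patents (e.g. claim boilerplate) are filtered out, so the per-IPC-code classifiers rely on vocabulary that transfers to research papers.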

11 citations


01 Jan 2008
TL;DR: Describes NTT SMT System 2008, presented at the patent translation task (PAT-MT) in NTCIR-7, and demonstrates a strong hierarchical phrase-based baseline for the PAT-MT English/Japanese translations.
Abstract: This paper describes NTT SMT System 2008 presented at the patent translation task (PAT-MT) in NTCIR-7. For PAT-MT, we submitted our strong baseline system faithfully following a hierarchical phrase-based statistical machine translation approach [2]. Hierarchical phrase-based SMT is based on a synchronous CFG in which paired source/target rules are synchronously applied starting from the initial symbol. Decoding is realized by CYK-style bottom-up parsing on the source side, with each derivation representing a translation candidate. We demonstrate the strong baseline for the PAT-MT English/Japanese translations.
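The synchronous-CFG mechanism described above rewrites paired source/target rules in lockstep, which lets a single rule reorder material between languages. The toy sketch below applies rules greedily top-down with one nonterminal slot `X1` (a stand-in for real CYK decoding over millions of learned rules; the rule table and romanized Japanese outputs are invented for illustration):

```python
# Each rule pairs a source pattern with a target pattern; X1 marks the
# shared nonterminal slot that is rewritten synchronously on both sides.
RULES = [
    ("give X1 to me", "watashi ni X1 o kudasai"),  # note the reordering
    ("the patent", "tokkyo"),
    ("the document", "bunsho"),
]

def translate(src):
    """Greedy top-down synchronous rule application (toy decoder)."""
    for s_pat, t_pat in RULES:
        if "X1" in s_pat:
            pre, post = s_pat.split("X1")
            if (src.startswith(pre) and src.endswith(post)
                    and len(src) > len(pre) + len(post)):
                inner = translate(src[len(pre):len(src) - len(post)])
                if inner is not None:
                    return t_pat.replace("X1", inner)
        elif src == s_pat:
            return t_pat
    return None
```

Because `X1` sits in different positions on the source and target sides, one derivation captures the long-distance reordering that plain phrase-based models handle poorly.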

2 citations


01 Jan 2008
TL;DR: Describes a new rule-based English question analyzer that extracts English query terms, which are translated into Japanese using translation dictionaries, building on the technologies used in past NTCIR systems for QAC and CLQA.
Abstract: This paper describes our Complex Cross-Lingual Question Answering (CCLQA) system based on the technologies used in our past NTCIR systems for QAC and CLQA. We implemented a new rule-based English question analyzer to extract English query terms, which are translated into Japanese by translation dictionaries. For DEFINITION, BIOGRAPHY, and EVENT questions, we reused our definition module for QAC-4. For RELATIONSHIP questions, we developed a new module based on our why-QA approach for QAC-4. When these modules were not applicable, a simple sentence retriever was used. According to the organizers’ evaluation results, although our EN-JA system performed rather poorly due to the low coverage of the translation dictionaries, our JA-JA system achieved the second best score among the four participants.
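The abstract attributes the EN-JA system's weak performance to dictionary coverage: query terms absent from the translation dictionaries simply cannot reach the Japanese retriever. A minimal sketch of that dictionary-lookup step (the dictionary contents and function name are invented for illustration):

```python
SAMPLE_DICT = {
    "earthquake": ["jishin"],
    "damage": ["higai", "songai"],
}

def translate_terms(english_terms, dictionary):
    """Translate extracted query terms via a bilingual dictionary.
    Terms missing from the dictionary are dropped, which is exactly
    the coverage problem the evaluation results exposed."""
    translated, missed = [], []
    for term in english_terms:
        if term in dictionary:
            translated.extend(dictionary[term])
        else:
            missed.append(term)
    return translated, missed
```

Tracking the `missed` list makes the coverage gap measurable per question, which helps explain why the monolingual JA-JA configuration fared much better.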