
Showing papers by "Hideki Isozaki published in 2003"


Proceedings ArticleDOI
11 Jul 2003
TL;DR: This paper proposes a method for zero pronoun resolution that combines ranking rules with machine learning: the ranking rules are simple and effective, while machine learning can take more factors into account.
Abstract: Anaphora resolution is one of the most important research topics in Natural Language Processing. In English, overt pronouns such as she and definite noun phrases such as the company are anaphors that refer to preceding entities (antecedents). In Japanese, anaphors are often omitted, and these omissions are called zero pronouns. There are two major approaches to zero pronoun resolution: the heuristic approach and the machine learning approach. Since we have to take various factors into consideration, it is difficult to find a good combination of heuristic rules. Therefore, the machine learning approach is attractive, but it requires a large amount of training data. In this paper, we propose a method that combines ranking rules and machine learning. The ranking rules are simple and effective, while machine learning can take more factors into account. From the results of our experiments, this combination gives better performance than either of the two previous approaches.
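The combination described in this abstract might be sketched as follows. This is a minimal illustration, not the paper's actual system: the heuristic rules, the candidate features, and the learned scoring function are all hypothetical stand-ins.

```python
# Sketch: combining heuristic ranking rules with a machine-learned score
# for selecting the antecedent of a zero pronoun. All rules, features,
# and weights here are illustrative assumptions, not the paper's.

def rule_rank(candidate):
    """Heuristic salience rank: lower is better (preferred)."""
    rank = 0
    if not candidate["is_subject"]:   # prefer subjects (a common heuristic)
        rank += 2
    rank += candidate["distance"]     # prefer closer antecedents
    return rank

def ml_score(candidate, weights):
    """Stand-in for a trained classifier's confidence (e.g. an SVM margin)."""
    return sum(weights[f] for f, v in candidate.items()
               if f in weights and v)

def resolve(candidates, weights):
    # The rules produce a coarse ranking; the ML score then decides among
    # the top-ranked candidates, letting it weigh additional factors.
    best_rank = min(rule_rank(c) for c in candidates)
    tied = [c for c in candidates if rule_rank(c) == best_rank]
    return max(tied, key=lambda c: ml_score(c, weights))
```

The design point is that the rules prune the candidate set cheaply, so the learned model only has to discriminate among plausible antecedents rather than all noun phrases.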

43 citations


Proceedings ArticleDOI
06 Apr 2003
TL;DR: An interactive approach for spoken interactive ODQA systems that derives disambiguating queries (DQs) to draw out additional information, which helps to pinpoint the exact answer and to compensate for information lost to recognition errors.
Abstract: Recently, open-domain question answering (ODQA) systems that extract an exact answer from large text corpora based on text input have been intensively investigated. However, the information in the first question input by a user is usually not enough to yield the desired answer, so interactions that collect additional information to accomplish QA are needed. This paper proposes an interactive approach for spoken interactive ODQA systems. When the reliabilities of the answer hypotheses obtained by an ODQA system are low, the system automatically derives disambiguating queries (DQs) that draw out additional information. The additional information elicited by the DQs should help to distinguish the exact answer effectively and to compensate for information lost through recognition errors. In our spoken interactive ODQA system, SPIQA, spoken questions are recognized by an ASR system, and DQs are automatically generated to disambiguate the transcribed questions. We confirmed the appropriateness of the derived DQs by comparing them with manually prepared ones.
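The control flow the abstract describes — answer directly when confident, otherwise ask a disambiguating query — might look like the following sketch. The threshold value and the hypothesis format are assumptions, not SPIQA's actual interface.

```python
# Sketch of the interaction policy: answer when the ODQA system is
# confident, otherwise ask a disambiguating query (DQ) to elicit more
# information. The cutoff and data shapes are illustrative only.

CONFIDENCE_THRESHOLD = 0.6  # assumed reliability cutoff

def next_action(hypotheses):
    """hypotheses: list of (answer, reliability) pairs from the ODQA system."""
    if not hypotheses:
        return ("ask_dq", None)
    answer, reliability = max(hypotheses, key=lambda h: h[1])
    if reliability >= CONFIDENCE_THRESHOLD:
        return ("answer", answer)
    # Low reliability: request additional information from the user.
    return ("ask_dq", answer)
```

In a spoken system the low-reliability branch matters twice over: it covers both genuinely ambiguous questions and questions degraded by ASR errors.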

25 citations


Proceedings ArticleDOI
07 Jul 2003
TL;DR: A spoken interactive ODQA system that derives disambiguating queries (DQs) to draw out additional information, reconstructs the user's initial question by combining it with that additional information, and uses the reconstructed question for answer extraction.
Abstract: We have been investigating an interactive approach to open-domain QA (ODQA) and have constructed a spoken interactive ODQA system, SPIQA. The system derives disambiguating queries (DQs) that draw out additional information. To test the usefulness of the additional information requested by the DQs, the system reconstructs the user's initial question by combining the additional information with the question, and the reconstructed question is then used for answer extraction. Experimental results revealed the potential of the generated DQs.
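The reconstruction step could be as simple as merging the DQ response into the original question; the word-level merge below is only one plausible reading of "combining", not the system's documented method.

```python
# Sketch: reconstructing the user's question by merging the initial
# transcription with the additional information elicited by a DQ.
# Deduplicating shared words is an illustrative assumption.

def reconstruct(question_words, additional_words):
    """Append new content words from the DQ response to the question."""
    merged = list(question_words)
    for w in additional_words:
        if w not in merged:      # avoid duplicating words already present
            merged.append(w)
    return merged
```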

15 citations


Patent
05 Nov 2003
TL;DR: In this paper, given a document set belonging to a certain domain of a document DB, a word-string extracting device extracts word strings, applies a statistical (chi-square) test comparing each word string's occurrence between the documents in that domain and the others, confirms the word strings characteristic of the domain against a threshold, and scores sentences by applying predetermined weights to the confirmed word strings, so that high-scoring sentences can be extracted from the documents belonging to the domain.
Abstract: PROBLEM TO BE SOLVED: To compute a score that takes into account not only individual words but also combinations of words. SOLUTION: When a document set belonging to a certain domain of a document DB 10 is given, a word-string extracting device extracts word strings and performs a statistical (chi-square) test on each word string, comparing its occurrence between the document group included in the given domain and the other documents. Word strings whose statistic exceeds a threshold are confirmed as characteristic of the domain. Scores are then calculated by applying predetermined weights to the confirmed word strings, and sentences with high scores are extracted from the documents belonging to the domain. COPYRIGHT: (C)2005,JPO&NCIPI
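A 2x2 chi-square test between in-domain and out-of-domain document counts is a standard way to select domain-characteristic terms, and is the most plausible reading of the machine-translated patent text; whether the patent's statistic is exactly this is an assumption. A minimal sketch:

```python
# Sketch: a 2x2 chi-square statistic for deciding whether a word string
# is characteristic of a domain, comparing its document frequency inside
# vs. outside the domain. The significance threshold is illustrative.

def chi_square(in_with, in_without, out_with, out_without):
    """2x2 chi-square over (in/out of domain) x (contains/lacks string)."""
    n = in_with + in_without + out_with + out_without
    row1, row2 = in_with + in_without, out_with + out_without
    col1, col2 = in_with + out_with, in_without + out_without
    stat = 0.0
    for obs, r, c in [(in_with, row1, col1), (in_without, row1, col2),
                      (out_with, row2, col1), (out_without, row2, col2)]:
        expected = r * c / n
        stat += (obs - expected) ** 2 / expected
    return stat

THRESHOLD = 3.84  # chi-square critical value at p = 0.05, 1 degree of freedom

def is_characteristic(in_with, in_without, out_with, out_without):
    return chi_square(in_with, in_without, out_with, out_without) > THRESHOLD
```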

5 citations


Journal ArticleDOI
10 Jan 2003
TL;DR: This paper proposes a method for extracting important sentences from a set of topic-related documents using a Support Vector Machine, and shows that it outperforms the Lead and TF·IDF baselines.
Abstract: In recent years, with the spread of the Internet and high-capacity magnetic storage devices, vast quantities of electronic documents have proliferated. Against this background, expectations for document summarization technology have been rising. In particular, if it became possible to summarize a whole set of documents related to a single topic, the burden on human readers could be greatly reduced. In this paper, we therefore propose a method that targets document sets directly related to a specific topic and extracts important sentences using a machine learning technique. As the extraction method, we use a Support Vector Machine, a machine learning technique that has recently attracted attention in natural language processing research. We prepared document sets for 12 topics selected from one year (1999) of the Mainichi Shimbun newspaper, and created three gold-standard data sets by having different annotators manually extract important sentences at summarization rates of 10%, 30%, and 50% of the total number of sentences for each topic. Evaluation experiments on these data sets showed that the sentence extraction accuracy of the proposed method is higher than that of the Lead method and the TF·IDF method. We also found that redundancy reduction, conventionally considered effective for multi-document summarization, is not necessarily effective when sentences are the extraction unit.
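The extraction step can be sketched as scoring each sentence with a linear decision function (w·x + b), as a trained linear SVM would, and keeping the top fraction dictated by the summarization rate. The feature set and weights below are placeholders, not the paper's (which uses a much richer feature set).

```python
# Sketch of SVM-based important-sentence extraction: map each sentence
# to a feature vector, score it with a linear decision function, and
# keep the top `rate` fraction. Features and weights are illustrative.

def features(sentence, position, n_sentences):
    return {
        "position": 1.0 - position / max(n_sentences - 1, 1),  # earlier = higher
        "length": min(len(sentence.split()) / 20.0, 1.0),
        "has_topic_word": 1.0 if "summarization" in sentence.lower() else 0.0,
    }

def svm_score(x, w, b=0.0):
    """Linear decision function; w would come from SVM training."""
    return sum(w[k] * v for k, v in x.items()) + b

def extract(sentences, w, rate=0.5):
    """Keep the top `rate` fraction of sentences, in document order."""
    scored = [(svm_score(features(s, i, len(sentences)), w), i, s)
              for i, s in enumerate(sentences)]
    k = max(1, round(len(sentences) * rate))
    keep = sorted(sorted(scored, reverse=True)[:k], key=lambda t: t[1])
    return [s for _, _, s in keep]
```

The `rate` parameter mirrors the paper's 10%/30%/50% summarization rates: the same trained scorer serves all three, only the cutoff changes.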

2 citations


Proceedings Article
01 Jan 2003
TL;DR: This work developed two scoring methods: a heuristic one that simply counts verbs and their derived words, which are important for specifying the function of a query gene or its product, and one that uses a machine learning technique to score documents.
Abstract: Our system consists of two steps. The first step retrieves documents using a keyword search, and the second step scores each document retrieved in the previous step and creates an output file for the TREC submission. The database provided by TREC consists of more than 500,000 PubMed abstracts, but fewer than 50 documents are relevant for most queries, so applying scoring methods to all 500,000 abstracts would create a lot of noise. In the first step, we therefore refined the document set with a simple keyword search. For the second step, we developed two methods. The first method (Method 1) uses a heuristic scoring system that simply counts the number of verbs and their derived words, which are important for specifying the function of a query gene or its product. The second method (Method 2) uses a machine learning technique to score documents.
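Method 1's heuristic might be sketched as below. The verb-stem list and the prefix-matching shortcut (to catch derived words such as "activation" or "inhibitor") are illustrative assumptions; the actual vocabulary used for the TREC run is not given here.

```python
# Sketch of Method 1: score a retrieved abstract by counting verbs and
# their derived words that describe gene/protein function. The stem
# list and prefix matching are placeholders for the real vocabulary.

FUNCTION_VERB_STEMS = ["activat", "inhibit", "regulat", "encod", "bind"]

def heuristic_score(abstract):
    """Count tokens derived from function-describing verb stems,
    e.g. 'activates', 'activation', 'inhibitor', 'binding'."""
    tokens = abstract.lower().split()
    return sum(1 for t in tokens
               if any(t.startswith(stem) for stem in FUNCTION_VERB_STEMS))
```

Because the first-step keyword search has already shrunk the candidate set, even this very cheap count can rank the survivors without being swamped by irrelevant abstracts.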