scispace - formally typeset
Search or ask a question
Author

Akiko Aizawa

Bio: Akiko Aizawa is an academic researcher from National Institute of Informatics. The author has contributed to research in topics: Computer science & Natural language. The author has an hindex of 26, co-authored 229 publications receiving 3218 citations. Previous affiliations of Akiko Aizawa include Graduate University for Advanced Studies & University of Illinois at Urbana–Champaign.


Papers
More filters
Journal ArticleDOI
TL;DR: The proposed PWI is expressed as a product of the occurrence probabilities of terms and their amounts of information, and corresponds well with the conventional term frequency-inverse document frequency measures that are commonly used in today's information retrieval systems.
Abstract: This paper presents a mathematical definition of the "probability-weighted amount of information" (PWI), a measure of specificity of terms in documents that is based on an information-theoretic view of retrieval events. The proposed PWI is expressed as a product of the occurrence probabilities of terms and their amounts of information, and corresponds well with the conventional term frequency-inverse document frequency measures that are commonly used in today's information retrieval systems. The mathematical definition of the PWI is shown, together with some illustrative examples of the calculation.

1,011 citations

Journal ArticleDOI
TL;DR: New methods for adjusting configuration parameters of genetic algorithms operating in a noisy environment by model the search process as a statistical selection process and derive equations useful for problems related to the scheduling of resources for tests performed in genetic algorithms are developed.
Abstract: In this paper, we develop new methods for adjusting configuration parameters of genetic algorithms operating in a noisy environment. Such methods are related to the scheduling of resources for tests performed in genetic algorithms. Assuming that the population size is given, we address two problems related to the design of efficient scheduling algorithms specifically important in noisy environments. First, we study the durution-scheduling problem that is related to setting dynamically the duration of each generation. Second, we study the sample-allocation problem that entails the adaptive determination of the number of evaluations taken from each candidate in a generation. In our approach, we model the search process as a statistical selection process and derive equations useful for these problems. Our results show that our adaptive procedures improve the performance of genetic algorithms over that of commonly used static ones.

131 citations

Proceedings ArticleDOI
08 Apr 2005
TL;DR: This paper proposes a fast and efficient method for linkage detection that exploits a suffix array structure that enables linkage detection using variable length n-grams and dynamically generates blocks of possibly associated records using ‘blocking keys’ extracted from already known reliable linkages.
Abstract: Record linkage refers to techniques for identifying records associated with the same real-world entities. Record linkage is not only crucial in integrating multi-source databases that have been generated independently, but is also considered to be one of the key issues in integrating heterogeneous Web resources. However, when targeting large-scale data, the cost of enumerating all the possible linkages often becomes impracticably high. Based on this background, this paper proposes a fast and efficient method for linkage detection. The features of the proposed approach are: first, it exploits a suffix array structure that enables linkage detection using variable length n-grams. Second, it dynamically generates blocks of possibly associated records using ‘blocking keys’ extracted from already known reliable linkages. The results from our preliminary experiments where the proposed method was applied to the integration of four bibliographic databases, which scale up to more than 10 million records, are also reported in the paper.

96 citations

01 Jan 2016
TL;DR: This overview paper summarizes the task design, corpora, submitted runs, results, and the approaches used by participating groups of the NTCIR-12 MathIR Task.
Abstract: We present an overview of the NTCIR-12 MathIR Task, dedicated to information access for mathematical content. The MathIR task makes use of two corpora. The first corpus contains excerpts from technical articles in the arXiv, while the second corpus contains English Wikipedia articles. For each corpus, there were two subtasks. Three subtasks contain queries with keywords and formulae (arXiv-main, Wikimain, and arXiv-simto), while the fourth considers isolated formula queries (Wiki-formula). In this overview paper, we summarize the task design, corpora, submitted runs, results, and the approaches used by participating groups.

91 citations

Proceedings ArticleDOI
01 Jan 2017
TL;DR: The authors proposed a framework allowing conditional response generation based on specific attributes, such as genericness and sentiment states, which can be either manually assigned or automatically detected to generate meaningful responses in accordance with the specified attributes.
Abstract: Deep latent variable models have been shown to facilitate the response generation for open-domain dialog systems. However, these latent variables are highly randomized, leading to uncontrollable generated responses. In this paper, we propose a framework allowing conditional response generation based on specific attributes. These attributes can be either manually assigned or automatically detected. Moreover, the dialog states for both speakers are modeled separately in order to reflect personal features. We validate this framework on two different scenarios, where the attribute refers to genericness and sentiment states respectively. The experiment result testified the potential of our model, where meaningful responses can be generated in accordance with the specified attributes.

88 citations


Cited by
More filters
01 Jan 2009

7,241 citations

01 Jan 2014
TL;DR: Using Language部分的�’学模式既不落俗套,又能真正体现新课程标准所倡导的�'学理念,正是年努力探索的问题.
Abstract: 人教版高中英语新课程教材中,语言运用(Using Language)是每个单元必不可少的部分,提供了围绕单元中心话题的听、说、读、写的综合性练习,是单元中心话题的延续和升华.如何设计Using Language部分的教学,使自己的教学模式既不落俗套,又能真正体现新课程标准所倡导的教学理念,正是广大一线英语教师一直努力探索的问题.

2,071 citations

01 Jan 1995
TL;DR: In this paper, the authors propose a method to improve the quality of the data collected by the data collection system. But it is difficult to implement and time consuming and computationally expensive.
Abstract: 本文对国际科学计量学杂志《Scientometrics》1979-1991年的研究论文内容、栏目、作者及国别和编委及国别作了计量分析,揭示出科学计量学研究的重点、活动的中心及发展趋势,说明了学科带头人在发展科学计量学这门新兴学科中的作用。

1,636 citations

Book
01 Dec 2006
TL;DR: Providing an in-depth examination of core text mining and link detection algorithms and operations, this text examines advanced pre-processing techniques, knowledge representation considerations, and visualization approaches.
Abstract: 1. Introduction to text mining 2. Core text mining operations 3. Text mining preprocessing techniques 4. Categorization 5. Clustering 6. Information extraction 7. Probabilistic models for Information extraction 8. Preprocessing applications using probabilistic and hybrid approaches 9. Presentation-layer considerations for browsing and query refinement 10. Visualization approaches 11. Link analysis 12. Text mining applications Appendix Bibliography.

1,628 citations