Author

Koji Eguchi

Bio: Koji Eguchi is an academic researcher from Kobe University. The author has contributed to research in topics: Topic model & Inference. The author has an h-index of 15 and has co-authored 90 publications receiving 889 citations. Previous affiliations of Koji Eguchi include Hitotsubashi University & National Institute of Informatics.


Papers
01 Jan 2002
TL;DR: This paper describes the system of the NTCIR-6 CLIR task and its test collection (document sets, topic sets, and method for relevance judgments), and reviews the CLIR techniques used by participants and the search performance of the runs submitted for evaluation.
Abstract: The purpose of this paper is to overview research efforts at the NTCIR-6 CLIR task, which is a project of large-scale retrieval experiments on cross-lingual information retrieval (CLIR) of Chinese, Japanese, Korean, and English. The project has three sub-tasks, multi-lingual IR (MLIR), bilingual IR (BLIR), and single language IR (SLIR), in which many research groups from ten countries or regions are participating. This paper describes the system of the NTCIR-6 CLIR task and its test collection (document sets, topic sets, and method for relevance judgments), and reviews CLIR techniques used by participants and search performance of runs submitted for evaluation.

134 citations

Proceedings ArticleDOI
22 Jul 2006
TL;DR: This paper proposes several sentiment information retrieval models in the framework of probabilistic language models, assuming that a user both inputs query terms expressing a certain topic and also specifies a sentiment polarity of interest in some manner.
Abstract: Ranking documents or sentences according to both topic and sentiment relevance should serve a critical function in helping users when topics and sentiment polarities of the targeted text are not explicitly given, as is often the case on the web. In this paper, we propose several sentiment information retrieval models in the framework of probabilistic language models, assuming that a user both inputs query terms expressing a certain topic and also specifies a sentiment polarity of interest in some manner. We combine sentiment relevance models and topic relevance models with model parameters estimated from training data, considering the topic dependence of the sentiment. Our experiments prove that our models are effective.

127 citations

01 Dec 2003
TL;DR: The Web Retrieval Task of the Third NTCIR Workshop assessed the retrieval effectiveness of Web search engine systems using a common data set and built a re-usable test collection suitable for evaluating such systems.
Abstract: This paper gives an overview of the Web Retrieval Task that was conducted from 2001 to 2002 at the Third NTCIR Workshop. In the Web Retrieval Task, we attempted to assess the retrieval effectiveness of Web search engine systems using a common data set, and built a re-usable test collection suitable for evaluating Web search engine systems. With these objectives, we constructed 100-gigabyte and 10-gigabyte document data that were mainly gathered from the ‘.jp’ domain. Participants were allowed to access those data only within the ‘Open Laboratory’ located at the National Institute of Informatics. Relevance judgments were performed on the retrieved documents, which were written in Japanese or English, by considering the relationship between the pages referenced by hyperlinks. Some evaluation measures were also applied to individual system results submitted by the participants.

55 citations

Proceedings Article
06 Jan 2007
TL;DR: This work proposes an interactive scheme for clustering document collections, based on any criterion of the user's preference, and demonstrates excellent results on clustering by sentiment, substantially outperforming an SVM trained on a large amount of labeled data.
Abstract: Document clustering is traditionally tackled from the perspective of grouping documents that are topically similar. However, many other criteria for clustering documents can be considered: for example, documents' genre or the author's mood. We propose an interactive scheme for clustering document collections, based on any criterion of the user's preference. The user holds an active position in the clustering process: first, she chooses the types of features suitable to the underlying task, leading to a task-specific document representation. She can then provide examples of features-- if such examples are emerging, e.g., when clustering by the author's sentiment, words like 'perfect', 'mediocre', 'awful' are intuitively good features. The algorithm proceeds iteratively, and the user can fix errors made by the clustering system at the end of each iteration. Such an interactive clustering method demonstrates excellent results on clustering by sentiment, substantially outperforming an SVM trained on a large amount of labeled data. Even if features are not provided because they are not intuitively obvious to the user--e.g., what would be good features for clustering by genre using part-of-speech trigrams?--our multi-modal clustering method performs significantly better than k-means and Latent Dirichlet Allocation (LDA).

47 citations

Proceedings Article
03 Dec 2012
TL;DR: A new topic model is proposed that incorporates a hidden variable to control a pivot language, in an extension of CorrLDA, that is more effective than some other existing multilingual topic models.
Abstract: Topic modeling is a widely used approach to analyzing large text collections. A small number of multilingual topic models have recently been explored to discover latent topics among parallel or comparable documents, such as in Wikipedia. Other topic models that were originally proposed for structured data are also applicable to multilingual documents. Correspondence Latent Dirichlet Allocation (CorrLDA) is one such model; however, it requires a pivot language to be specified in advance. We propose a new topic model, Symmetric Correspondence LDA (SymCorrLDA), that incorporates a hidden variable to control a pivot language, in an extension of CorrLDA. We experimented with two multilingual comparable datasets extracted from Wikipedia and demonstrate that SymCorrLDA is more effective than some other existing multilingual topic models.

37 citations


Cited by
Christopher M. Bishop
01 Jan 2006
TL;DR: The book covers probability distributions, linear models for regression and classification, neural networks, kernel methods, sparse kernel machines, graphical models, mixture models and EM, approximate inference, sampling methods, continuous latent variables, sequential data, and combining models.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

10,141 citations

Book
08 Jul 2008
TL;DR: This survey covers techniques and approaches that promise to directly enable opinion-oriented information-seeking systems and focuses on methods that seek to address the new challenges raised by sentiment-aware applications, as compared to those that are already present in more traditional fact-based analysis.
Abstract: An important part of our information-gathering behavior has always been to find out what other people think. With the growing availability and popularity of opinion-rich resources such as online review sites and personal blogs, new opportunities and challenges arise as people now can, and do, actively use information technologies to seek out and understand the opinions of others. The sudden eruption of activity in the area of opinion mining and sentiment analysis, which deals with the computational treatment of opinion, sentiment, and subjectivity in text, has thus occurred at least in part as a direct response to the surge of interest in new systems that deal directly with opinions as a first-class object. This survey covers techniques and approaches that promise to directly enable opinion-oriented information-seeking systems. Our focus is on methods that seek to address the new challenges raised by sentiment-aware applications, as compared to those that are already present in more traditional fact-based analysis. We include material on summarization of evaluative text and on broader issues regarding privacy, manipulation, and economic impact that the development of opinion-oriented information-access services gives rise to. To facilitate future work, a discussion of available resources, benchmark datasets, and evaluation campaigns is also provided.

7,452 citations

Book
01 May 2012
TL;DR: Sentiment analysis and opinion mining is the field of study that analyzes people's opinions, sentiments, evaluations, attitudes, and emotions from written language; it is one of the most active research areas in natural language processing and is also widely studied in data mining, Web mining, and text mining.
Abstract: Sentiment analysis and opinion mining is the field of study that analyzes people's opinions, sentiments, evaluations, attitudes, and emotions from written language. It is one of the most active research areas in natural language processing and is also widely studied in data mining, Web mining, and text mining. In fact, this research has spread outside of computer science to the management sciences and social sciences due to its importance to business and society as a whole. The growing importance of sentiment analysis coincides with the growth of social media such as reviews, forum discussions, blogs, micro-blogs, Twitter, and social networks. For the first time in human history, we now have a huge volume of opinionated data recorded in digital form for analysis. Sentiment analysis systems are being applied in almost every business and social domain because opinions are central to almost all human activities and are key influencers of our behaviors. Our beliefs and perceptions of reality, and the choices we make, are largely conditioned on how others see and evaluate the world. For this reason, when we need to make a decision we often seek out the opinions of others. This is true not only for individuals but also for organizations. This book is a comprehensive introductory and survey text. It covers all important topics and the latest developments in the field with over 400 references. It is suitable for students, researchers and practitioners who are interested in social media analysis in general and sentiment analysis in particular. Lecturers can readily use it in class for courses on natural language processing, social media analysis, text mining, and data mining. Lecture slides are also available online.

4,515 citations

Journal ArticleDOI
TL;DR: The background and state of the art of big data are reviewed, covering related technologies, the four phases of the big data value chain, and representative applications including enterprise management, Internet of Things, online social networks, medical applications, collective intelligence, and smart grid.
Abstract: In this paper, we review the background and state-of-the-art of big data. We first introduce the general background of big data and review related technologies, such as cloud computing, Internet of Things, data centers, and Hadoop. We then focus on the four phases of the value chain of big data, i.e., data generation, data acquisition, data storage, and data analysis. For each phase, we introduce the general background, discuss the technical challenges, and review the latest advances. We finally examine the several representative applications of big data, including enterprise management, Internet of Things, online social networks, medical applications, collective intelligence, and smart grid. These discussions aim to provide a comprehensive overview and big-picture to readers of this exciting area. This survey is concluded with a discussion of open problems and future directions.

2,303 citations

01 Jan 2010
TL;DR: This chapter focuses on opinion expressions that convey people's positive or negative sentiments, where opinions are subjective expressions that describe people's sentiments, appraisals, or feelings toward entities, events, and their properties.
Abstract: Textual information in the world can be broadly categorized into two main types: facts and opinions. Facts are objective expressions about entities, events and their properties. Opinions are usually subjective expressions that describe people’s sentiments, appraisals or feelings toward entities, events and their properties. The concept of opinion is very broad. In this chapter, we only focus on opinion expressions that convey people’s positive or negative sentiments. Much of the existing research on textual information processing has been focused on mining and retrieval of factual information, e.g., information retrieval, Web search, text classification, text clustering and many other text mining and natural language processing tasks. Little work had been done on the processing of opinions until only recently. Yet, opinions are so important that whenever we need to make a decision we want to hear others’ opinions. This is not only true for individuals but also true for organizations. One of the main reasons for the lack of study on opinions is the fact that there was little opinionated text available before the World Wide Web. Before the Web, when an individual needed to make a decision, he/she typically asked for opinions from friends and families. When an organization wanted to find the opinions or sentiments of the general public about its products and services, it conducted opinion polls, surveys, and focus groups. However, with the Web, especially with the explosive growth of the user-generated content on the Web in the past few years, the world has been transformed. The Web has dramatically changed the way that people express their views and opinions. They can now post reviews of products at merchant sites and express their views on almost anything in Internet forums, discussion groups, and blogs, which are collectively called the user-generated content. This online word-of-mouth behavior represents new and measurable sources of information with many practical applications. Now if one wants to buy a product, he/she is no longer limited to asking his/her friends and families because there are many product reviews on the Web which give opinions of existing users of the product. For a company, it may no longer be necessary to conduct surveys, organize focus groups or employ external consultants in order to find consumer opinions about its products and those of its competitors because the user-generated content on the Web can already give them such information.

1,575 citations