Showing papers by "Katsumi Tanaka" published in 2012

Open Access

Proceedings Article

Structured query suggestion for specialization and parallel movement: effect on search behaviors

Makoto P. Kato¹, Tetsuya Sakai², Katsumi Tanaka¹

16 Apr 2012

TL;DR: SParQS, a new method of presenting query suggestions to the user, is proposed; it is designed to support two popular query reformulation actions, namely specialization and parallel movement.

Abstract: Query suggestion, which enables the user to revise a query with a single click, has become one of the most fundamental features of Web search engines. However, it is often difficult for the user to choose from a list of query suggestions, and to understand the relation between an input query and suggested ones. In this paper, we propose a new method to present query suggestions to the user, which has been designed to help two popular query reformulation actions, namely, specialization (e.g. from "nikon" to "nikon camera") and parallel movement (e.g. from "nikon camera" to "canon camera"). Using a query log collected from a popular commercial Web search engine, our prototype called SParQS classifies query suggestions into automatically generated categories and generates a label for each category. Moreover, SParQS presents some new entities as alternatives to the original query (e.g. "canon" in response to the query "nikon"), together with their query suggestions classified in the same way as the original query's suggestions. We conducted a task-based user study to compare SParQS with a traditional "flat list" query suggestion interface. Our results show that the SParQS interface enables subjects to search more successfully than the flat list case, even though query suggestions presented were exactly the same in the two interfaces. In addition, the subjects found the query suggestions more helpful when they were presented in the SParQS interface rather than in a flat list.

37 citations

Proceedings Article

Search intent estimation from user's eye movements for supporting information seeking

Kazutoshi Umemoto¹, Takehiro Yamamoto¹, Satoshi Nakamura¹, Katsumi Tanaka¹

Kyoto University¹

21 May 2012

TL;DR: A two-stage system that uses the user's eye movements is proposed to meet the increasing demand for obtaining information from the Web efficiently; an experiment with a prototype confirms that, of the four search intent estimation methods compared, nMLT works best.

Abstract: In this paper, we propose a two-stage system using user's eye movements to accommodate the increasing demands to obtain information from the Web in an efficient way. In the first stage the system estimates a user's search intent as a set of weighted terms extracted based on the user's eye movements while browsing Web pages. Then in the second stage, the system shows relevant information to the user by using the estimated intent for re-ranking search results, suggesting intent-based queries, and emphasizing relevant parts of Web pages. The system aims to help users to efficiently obtain what they need by repeating these steps throughout the information seeking process. We proposed four types of search intent estimation methods (MLT, nMLT, DLT and nDLT) considering the relationship among intents, term frequencies and eye movements. As a result of an experiment designed for evaluating the accuracy of each method with a prototype system, we confirmed that the nMLT method works best. In addition, by analyzing the extracted intent terms for eight subjects in the experiment, we found that the system could estimate the unique search intent of each user even if they performed the same search tasks.

17 citations

Proceedings Article

Is Wikipedia too difficult?: comparative analysis of readability of Wikipedia, Simple Wikipedia and Britannica

Adam Jatowt¹, Katsumi Tanaka¹

Kyoto University¹

29 Oct 2012

TL;DR: This paper analyzes readability of Wikipedia, which is a popular source of information for searchers about unknown topics, and uses some new metrics based on words' popularity and their distributions across different document genres and topics.

Abstract: Readability is one of the key factors determining document quality and reader satisfaction. In this paper we analyze the readability of Wikipedia, which is a popular source of information for searchers about unknown topics. Although Wikipedia articles are frequently listed by search engines on top ranks, they are often too difficult for average readers searching for information about difficult queries. We examine the average readability of content in Wikipedia and compare it to that of Simple Wikipedia and Britannica. Next, we investigate the readability of selected categories in Wikipedia. Apart from standard readability measures, we use some new metrics based on words' popularity and their distributions across different document genres and topics.

11 citations

The wisdom of advertisers: Mining subgoals via query clustering

Takehiro Yamamoto¹, Tetsuya Sakai², Mayu Iwata³, Chen Yu², Ji-Rong Wen², Katsumi Tanaka¹

Kyoto University¹, Microsoft², Osaka University³

01 Jan 2012

TL;DR: The method that combines ad impressions from sponsored search data and query co-occurrences from session data outperforms a state-of-the-art query clustering method that relies on document clicks rather than ad impressions in terms of purity, NMI, Rand Index, F1-measure and subgoal recall.

Abstract: This paper tackles the problem of mining subgoals of a given search goal from data. For example, when a searcher wants to travel to London, she may need to accomplish several subtasks such as "book flights," "book a hotel," "find good restaurants" and "decide which sightseeing spots to visit." As another example, if a searcher wants to lose weight, there may exist several alternative solutions such as "do physical exercise," "take diet pills," and "control calorie intake." In this paper, we refer to such subtasks or solutions as subgoals, and propose to utilize sponsored search data for finding subgoals of a given query by means of query clustering. Advertisements (ads) reflect advertisers' tremendous efforts in trying to match a given query with implicit user needs. Moreover, ads are usually associated with a particular action or transaction. We therefore hypothesized that they are useful for subgoal mining. To our knowledge, our work is the first to use sponsored search data for this purpose. Our experimental results show that sponsored search data is a good resource for obtaining related queries and for identifying subgoals via query clustering. In particular, our method that combines ad impressions from sponsored search data and query co-occurrences from session data outperforms a state-of-the-art query clustering method that relies on document clicks rather than ad impressions in terms of purity, NMI, Rand Index, F1-measure and subgoal recall.

11 citations

Proceedings Article

Content-based retrieval for heterogeneous domains: domain adaptation by relative aggregation points

Makoto P. Kato¹, Hiroaki Ohshima¹, Katsumi Tanaka¹

Kyoto University¹

12 Aug 2012

TL;DR: The problem of domain adaptation for content-based retrieval is introduced and a domain adaptation method based on relative aggregation points (RAPs) is proposed, which constructs a new feature space based on RAPs estimated in each domain and bridges the domain difference for improving content-based retrieval in heterogeneous domains.

Abstract: We introduce the problem of domain adaptation for content-based retrieval and propose a domain adaptation method based on relative aggregation points (RAPs). Content-based retrieval including image retrieval and spoken document retrieval enables a user to input examples as a query, and retrieves relevant data based on the similarity to the examples. However, input examples and relevant data can be dissimilar, especially when domains from which the user selects examples and from which the system retrieves data are different. In content-based geographic object retrieval, for example, suppose that a user who lives in Beijing visits Kyoto, Japan, and wants to search for relatively inexpensive restaurants serving popular local dishes by means of a content-based retrieval system. Since such restaurants in Beijing and Kyoto are dissimilar due to the difference in the average cost and areas' popular dishes, it is difficult to find relevant restaurants in Kyoto based on examples selected in Beijing. We propose a solution for this problem by assuming that RAPs in different domains correspond, which may be dissimilar but play the same role. A RAP is defined as the expectation of instances in a domain that are classified into a certain class, e.g. the most expensive restaurant, average restaurant, and restaurant serving the most popular dishes. Our proposed method constructs a new feature space based on RAPs estimated in each domain and bridges the domain difference for improving content-based retrieval in heterogeneous domains. To verify the effectiveness of our proposed method, we evaluated various methods with a test collection developed for content-based geographic object retrieval. Experimental results show that our proposed method achieved significant improvements over baseline methods. Moreover, we observed that the search performance of content-based retrieval in heterogeneous domains was significantly lower than that in homogeneous domains. This finding suggests that relevant data for the same search intent depend on the search context, that is, the location where the user searches and the domain from which the system retrieves data.
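Illustrative note (not from the paper): the definition of a RAP as the expectation of the instances in a class can be sketched as a class-conditional mean. The abstract does not say how the new feature space is built, so the per-coordinate ratio used in `relative_to` below is purely an assumption made for illustration, as are the function names and the toy price data.

```python
def rap(instances, in_class=lambda x: True):
    """Relative aggregation point: mean vector of the instances in a class.

    `in_class` is a hypothetical predicate that selects class members;
    by default every instance counts (an "average restaurant" RAP).
    """
    members = [x for x in instances if in_class(x)]
    if not members:
        raise ValueError("class has no members in this domain")
    dim = len(members[0])
    return tuple(sum(x[i] for x in members) / len(members) for i in range(dim))


def relative_to(x, point):
    """ASSUMED mapping: express x as a per-coordinate ratio to a RAP."""
    return tuple(xi / pi for xi, pi in zip(x, point))


# Toy one-dimensional data: average meal price in local currency (made up).
beijing = [(30.0,), (50.0,), (100.0,)]
kyoto = [(1500.0,), (2500.0,), (5000.0,)]

avg_beijing = rap(beijing)
avg_kyoto = rap(kyoto)

# A "relatively inexpensive" example chosen in Beijing ...
query = relative_to(beijing[0], avg_beijing)

# ... is matched against Kyoto restaurants in the RAP-relative space.
ranked = sorted(kyoto, key=lambda r: abs(relative_to(r, avg_kyoto)[0] - query[0]))
print(ranked[0])
```

In raw prices the Beijing example is far from every Kyoto restaurant, but relative to each domain's own RAP it lines up with the cheapest Kyoto restaurant, which is the intuition behind letting RAPs in different domains correspond.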

10 citations

Proceedings Article

The wisdom of advertisers: mining subgoals via query clustering

Takehiro Yamamoto¹, Tetsuya Sakai², Mayu Iwata³, Chen Yu², Ji-Rong Wen², Katsumi Tanaka¹

Kyoto University¹, Microsoft², Osaka University³

29 Oct 2012

TL;DR: The problem of mining subgoals of a given search goal is tackled by applying query clustering to sponsored search data; experimental results show that the method combining ad impressions from sponsored search data with query co-occurrences from session data outperforms a state-of-the-art query clustering method that relies on document clicks rather than ad impressions, in terms of purity, NMI, Rand Index, F1-measure and subgoal recall.

Abstract: This paper tackles the problem of mining subgoals of a given search goal from data. For example, when a searcher wants to travel to London, she may need to accomplish several subtasks such as "book flights," "book a hotel," "find good restaurants" and "decide which sightseeing spots to visit." As another example, if a searcher wants to lose weight, there may exist several alternative solutions such as "do physical exercise," "take diet pills," and "control calorie intake." In this paper, we refer to such subtasks or solutions as subgoals, and propose to utilize sponsored search data for finding subgoals of a given query by means of query clustering. Advertisements (ads) reflect advertisers' tremendous efforts in trying to match a given query with implicit user needs. Moreover, ads are usually associated with a particular action or transaction. We therefore hypothesized that they are useful for subgoal mining. To our knowledge, our work is the first to use sponsored search data for this purpose. Our experimental results show that sponsored search data is a good resource for obtaining related queries and for identifying subgoals via query clustering. In particular, our method that combines ad impressions from sponsored search data and query co-occurrences from session data outperforms a state-of-the-art query clustering method that relies on document clicks rather than ad impressions in terms of purity, NMI, Rand Index, F1-measure and subgoal recall.
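Illustrative note (not from the paper): purity, the first evaluation measure named in the abstract, is the fraction of items that carry the majority gold label of their cluster. The sketch below uses the standard definition; the function name, the queries, and the subgoal labels are hypothetical toy values.

```python
from collections import Counter


def purity(clusters, gold_labels):
    """Fraction of items whose gold label is the majority label of their cluster."""
    total = sum(len(cluster) for cluster in clusters)
    if total == 0:
        raise ValueError("no items to evaluate")
    majority_sum = 0
    for cluster in clusters:
        if not cluster:
            continue
        counts = Counter(gold_labels[query] for query in cluster)
        # Size of the largest gold class inside this cluster.
        majority_sum += counts.most_common(1)[0][1]
    return majority_sum / total


# Hypothetical gold subgoals for queries related to "travel to London".
gold = {
    "book flights": "transport",
    "cheap flights london": "transport",
    "london hotels": "lodging",
    "book a hotel": "lodging",
    "london restaurants": "food",
}
# Hypothetical output of a query clustering method.
clusters = [
    ["book flights", "cheap flights london", "london hotels"],
    ["book a hotel", "london restaurants"],
]
print(purity(clusters, gold))
```

Purity rewards clusters dominated by a single subgoal but does not penalize splitting one subgoal across many clusters, which is one reason it is usually reported alongside measures such as NMI and the Rand Index.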

10 citations

Journal Article

"Towards more readable web: measuring readability of web pages based on link structure" by Adam Jatowt, Kouichi Akamatsu, Nimit Pattanasri, and Katsumi Tanaka with Ching-man Au Yeung as coordinator

Adam Jatowt¹, Kouichi Akamatsu¹, Nimit Pattanasri¹, Katsumi Tanaka¹

Kyoto University¹

01 Jan 2012 - ACM SIGWEB Newsletter

TL;DR: This research investigates the relationship between links and readability of text extracted from Web pages for two datasets, namely English and Japanese pages and describes a link analysis algorithm for measuring comprehensibility of Web pages based on the TrustRank algorithm originally used for combating Web spam.

Abstract: Although Web search engines have become information gateways to the Internet, search results often contain pages that are difficult to understand for non-expert users, especially when queries contain technical or rare terms. Readability indexes are well-known measures for estimating text comprehensibility. However, readability indexes are not sufficient for evaluating the comprehensibility of Web pages, as they are designed for general purpose texts. In this research, we investigate the relationship between links and readability of text extracted from Web pages for two datasets, namely English and Japanese pages. We then describe a link analysis algorithm for measuring comprehensibility of Web pages based on the TrustRank algorithm originally used for combating Web spam. Lastly, we report results of preliminary studies to measure the correlation between search rank and readability of Web search results.
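Illustrative note (not from the paper): TrustRank is commonly described as PageRank whose teleportation is restricted to a hand-picked seed set. The sketch below implements that general idea only. Seeding with pages already known to be easy to read is an assumption about how it might be adapted to comprehensibility, and the function name, parameters, and toy graph are hypothetical.

```python
def seed_biased_rank(out_links, seeds, damping=0.85, iterations=50):
    """PageRank-style propagation with teleportation restricted to `seeds`.

    `out_links` maps every page to the list of pages it links to; every
    link target must also appear as a key. Scores always sum to 1.
    """
    nodes = list(out_links)
    seed_mass = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(seed_mass)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * seed_mass[n] for n in nodes}
        for n in nodes:
            targets = out_links[n]
            if targets:
                share = damping * score[n] / len(targets)
                for t in targets:
                    nxt[t] += share
            else:
                # Dangling page: hand its mass back to the seed set.
                for s in seeds:
                    nxt[s] += damping * score[n] / len(seeds)
        score = nxt
    return score


# Toy link graph; "easy_seed" stands for a page assumed to be readable.
graph = {"easy_seed": ["a"], "a": ["b"], "b": [], "c": ["b"]}
scores = seed_biased_rank(graph, seeds={"easy_seed"})
print(sorted(scores, key=scores.get, reverse=True))
```

Pages reachable from the seeds inherit a share of their score, while a page such as "c" that no seed leads to stays at zero.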

7 citations

Proceedings Article

Large scale analysis of changes in English vocabulary over recent time

Adam Jatowt¹, Katsumi Tanaka¹

Kyoto University¹

29 Oct 2012

TL;DR: The results of large-scale studies on the usage of words and the evolution of English language vocabulary over the last two centuries are reported to help with understanding its impact on readability and retrieval of historical documents.

Abstract: Recently many historical texts have become digitized and made accessible for search and browsing. As human language is subject to constant evolution, these texts pose varying challenges to current users. In this paper we report the results of large-scale studies on the usage of words and the evolution of English language vocabulary over the last two centuries to help with understanding its impact on readability and retrieval of historical documents. We perform analysis of several lexical factors which may influence accessibility and readability of historical texts based on two large scale lexical corpora: the Corpus of Historical American English and Google Books 1-gram.

7 citations

Proceedings Article

Relative Relevance Feedback in Image Retrieval

Yuki Sugiyama¹, Makoto P. Kato¹, Hiroaki Ohshima¹, Katsumi Tanaka¹

Kyoto University¹

09 Jul 2012

TL;DR: A relative relevance feedback method for image retrieval systems is proposed, which allows users to select relatively relevant and irrelevant items and modifies a query by taking into account the relativity of the user's feedback.

Abstract: We propose a relative relevance feedback method for image retrieval systems. Relevance feedback is an effective method to modify a user's query by selecting relevant and irrelevant items in the search result. However, users cannot always find exactly relevant items in the first few search result pages, especially when the initial query is not specified due to the lack of user's knowledge. Thus, we propose relative relevance feedback in the present paper, which allows users to select relatively relevant and irrelevant items, and modifies a query by taking into account the relativity of user's feedback. Our experimental result shows that the relative relevance feedback outperforms a conventional relevance feedback for image retrieval tasks.

4 citations

Book Chapter

Data management challenges and opportunities in cloud computing

Kyuseok Shim¹, Sang Kyun Cha¹, Lei Chen², Wook-Shin Han³, Divesh Srivastava⁴, Katsumi Tanaka, Hwanjo Yu⁵, Xiaofang Zhou⁶

Seoul National University¹, Hong Kong University of Science and Technology², Kyungpook National University³, AT&T Labs⁴, Pohang University of Science and Technology⁵, University of Queensland⁶

15 Apr 2012

TL;DR: The goal of this panel is to initiate an open discussion within the community on data management challenges and opportunities in cloud computing.

Abstract: Analyzing large data is a challenging problem today, as there is an increasing trend of applications being expected to deal with vast amounts of data that usually do not fit in the main memory of a single machine. For such data-intensive applications, the database research community has started to investigate cloud computing as a cost-effective option to build scalable parallel data management systems which are capable of serving petabytes of data for millions of users. The goal of this panel is to initiate an open discussion within the community on data management challenges and opportunities in cloud computing. Potential topics to be discussed in the panel include: MapReduce framework, shared-nothing architecture, parallel query processing, security, analytical data management, transactional data management and fault tolerance.

4 citations

Proceedings Article

Longitudinal analysis of historical texts' readability

Adam Jatowt¹, Katsumi Tanaka¹

Kyoto University¹

10 Jun 2012

TL;DR: The correlation between the outcomes of different readability measurements and publication dates of prose texts is investigated on the basis of two datasets, the Victorian Women's Writers Project and the Corpus of Late Modern English Texts.

Abstract: Digital libraries often contain historical documents of varying age. The degree to which users can understand their content depends much on their reading difficulty. In this poster paper we report the results of our studies on the readability of historical documents from the viewpoint of present users. We investigate the correlation between the outcomes of different readability measurements and publication dates of prose texts on the basis of two datasets, the Victorian Women's Writers Project and the Corpus of Late Modern English Texts.

Proceedings Article

How Japanese Traditional "Omonpakari" Services are Delivered - A Multidisciplinary Approach

Yoshinori Hara¹, Yutaka Yamauchi¹, Yoshinori Yamakawa², Junya Fujisawa², Hiroaki Ohshima¹, Katsumi Tanaka¹

Kyoto University¹, NTT DATA²

24 Jul 2012

TL;DR: The study based on neuroscience showed that the service brain model could explain the cognition of "Omonpakari" service regardless of customers' gender, knowledge and the social context, and suggest an alternative model of service in which there is a productive tension, or dialectic, between the provider and the customer.

Abstract: In high-quality Japanese services, providers are often said to sense what their customers want from subtle cues and deliver a customized service without explicitly advertising the effort. To understand this subtle service, often called "Omonpakari," we studied a high-end sushi restaurant using a multidisciplinary approach: neuroscience to analyze the cognitive characteristics, ethnomethodology to analyze the interactive structure, and computer science to analyze the social evaluations. The study based on neuroscience showed that the service brain model could explain the cognition of "Omonpakari" service regardless of customers' gender, knowledge and the social context. The ethnomethodological analysis revealed that customers performed a role, complying with cultural norms and behaving like a culturally appropriate customer even if they might not be. The analysis using computer science techniques showed that expertise was the key factor in the evaluation of the services. These findings suggest an alternative model of service in which there is a productive tension, or dialectic, between the provider and the customer.

Book Chapter

Panoramic image search by similarity and adjacency for similar landscape discovery

Meng Zhao¹, Hiroaki Ohshima¹, Katsumi Tanaka¹

Kyoto University¹

28 Nov 2012

TL;DR: A new image search method, called "panoramic image search", is proposed, and its application to similar landscape discovery is shown, using an image ranking method called PanoramaRank: a combination of image similarity and image adjacency, where image similarity is the retrieval score obtained from the classic vocabulary tree based image retrieval framework.

Abstract: In this paper, we propose a new image search method, called "panoramic image search", and show its application to similar landscape discovery. In order to perform the "panoramic image search", we introduce an image ranking method called PanoramaRank: a combination of image similarity and image adjacency, where image similarity is the retrieval score obtained from the classic vocabulary tree based image retrieval framework, and image adjacency is computed using a RANSAC-verified SURF matching process. The proposed notion is to search for images that physically surround the given query image(s). A landscape is a view of an area comprising several geographical features, having a common and meaningful atmosphere. We believe a collection of images is necessary for describing a landscape. Moreover, images in this collection have to be roughly similar and roughly adjacent to each other, directly or indirectly. To discover similar landscapes, we proceed in four steps: (1) images describing the same landscape as the user-selected query image(s) are found by employing PanoramaRank; (2) similar images taken in different locations are retrieved, and those that belong to the same location are treated as an insufficient representation of a landscape similar to the original one; (3) PanoramaRank is applied once more to find a whole landscape for each location separately; (4) a landscape similarity ranking is worked out based on several comparison criteria. As a result, images of landscapes similar to a given landscape image, especially those not present in results based on an individual pairwise measure, can be found. Experimental results and evaluation are also presented.

Book Chapter

On-the-Fly generation of facets as navigation signs for web objects

Yu Kawano¹, Hiroaki Ohshima¹, Katsumi Tanaka¹

Kyoto University¹

15 Apr 2012

TL;DR: A method is proposed to generate facets dynamically to enhance the navigation of objects returned by a web-based search query, and a prototype system is implemented that shows images returned from an image search classified by multiple facets.

Abstract: We propose a method to generate facets dynamically to enhance the navigation of objects returned by a web-based search query. Facets denote axes for classifying a currently viewed object and related objects and are used as navigation signs to indicate their positions. Facets are generated by detecting hypernyms and coordinate terms of expressions that characterize objects. To be effectively used for browsing search results, generated facets are ranked. We implemented a prototype system that shows returned images from an image search classified by multiple facets. The results of an experiment to assess the facets showed that the average precision of correct facets in all queries obtained using our system is up to 82.7% for the top three and up to 77.6% for the top five ranked facets.

Journal Article

Report on the joint WICOW/AIRWeb workshop on web quality (WebQuality 2011)

Carlos Castillo¹, Zoltan Gyongyi², Adam Jatowt³, Katsumi Tanaka³

Yahoo!¹, Google², Kyoto University³

09 Jan 2012

TL;DR: The Joint WICOW/AIRWeb Workshop on Web Quality (WebQuality 2011) was held in conjunction with the 20th International World Wide Web Conference in Hyderabad, India on the 28th March 2011 and this report briefly summarizes the workshop.

Abstract: The Joint WICOW/AIRWeb Workshop on Web Quality (WebQuality 2011) was held in conjunction with the 20th International World Wide Web Conference in Hyderabad, India on the 28th March 2011. Seven full-paper presentations and a keynote talk were delivered in three sessions. This report briefly summarizes the workshop.

Book Chapter

Search intent discovery by structurization of community QA contents

Soungwoong Yoon¹, Adam Jatowt¹, Katsumi Tanaka¹

Kyoto University¹

28 Nov 2012

TL;DR: A process is introduced that analyzes and structurizes the corresponding Community Question-Answer corpus data: finding question-answer pairs (QAs) related to a user's query, extracting keywords from QAs related to the user's intent, transforming QAs into a graph, and generating suggested queries using QA graphs.

Abstract: Web search users often have difficulty formulating keyword queries although their search intent may be clear. Moreover, it is difficult for search engines to guess search intent from queries only. To address these problems, we propose a new method for discovering search intents and for generating suggested queries for a given input Web search query. Specifically, we introduce a process that analyzes and structurizes the corresponding Community Question-Answer corpus data: finding question-answer pairs (QAs) related to a user's query, extracting keywords from QAs related to the user's intent, transforming QAs into a graph, and generating suggested queries using QA graphs.

Proceedings Article

Kcanvas: An application for creative personal knowledge management

Akiko Takahashi¹, Christa Sommerer, Katsumi Tanaka¹

Kyoto University¹

03 Dec 2012

TL;DR: Kcanvas provides a canvas as an intuitive and playful interface so that a user can casually collage what he/she is interested in in his/her daily life and others can also enjoy exploring this canvas as a visual art work.

Abstract: In this paper, we introduce an application called "Kcanvas". It is based on the belief that one's thoughts consist of fragments of knowledge that might be beautiful and enhance one's creativity if they are visualized. Kcanvas provides a canvas as an intuitive and playful interface so that a user can casually collage what he/she is interested in in his/her daily life and others can also enjoy exploring this canvas as a visual art work. We introduce canvases on "Kcanvas" and discuss the possibility of the application.
