Showing papers by "Katsumi Tanaka" published in 2012


Proceedings ArticleDOI
16 Apr 2012
TL;DR: SParQS, a new method for presenting query suggestions to the user, is proposed; it is designed to support two popular query reformulation actions, namely specialization and parallel movement.
Abstract: Query suggestion, which enables the user to revise a query with a single click, has become one of the most fundamental features of Web search engines. However, it is often difficult for the user to choose from a list of query suggestions, and to understand the relation between an input query and suggested ones. In this paper, we propose a new method to present query suggestions to the user, which has been designed to help two popular query reformulation actions, namely, specialization (e.g. from "nikon" to "nikon camera" ) and parallel movement (e.g. from "nikon camera" to "canon camera"). Using a query log collected from a popular commercial Web search engine, our prototype called SParQS classifies query suggestions into automatically generated categories and generates a label for each category. Moreover, SParQS presents some new entities as alternatives to the original query (e.g. "canon" in response to the query "nikon"), together with their query suggestions classified in the same way as the original query's suggestions. We conducted a task-based user study to compare SParQS with a traditional "flat list" query suggestion interface. Our results show that the SParQS interface enables subjects to search more successfully than the flat list case, even though query suggestions presented were exactly the same in the two interfaces. In addition, the subjects found the query suggestions more helpful when they were presented in the SParQS interface rather than in a flat list.

37 citations


Proceedings ArticleDOI
21 May 2012
TL;DR: A two-stage system is proposed that uses the user's eye movements to meet the increasing demand for obtaining information from the Web efficiently; an experiment confirms that the nMLT method works best.
Abstract: In this paper, we propose a two-stage system using user's eye movements to accommodate the increasing demands to obtain information from the Web in an efficient way. In the first stage the system estimates a user's search intent as a set of weighted terms extracted based on the user's eye movements while browsing Web pages. Then in the second stage, the system shows relevant information to the user by using the estimated intent for re-ranking search results, suggesting intent-based queries, and emphasizing relevant parts of Web pages. The system aims to help users to efficiently obtain what they need by repeating these steps throughout the information seeking process. We proposed four types of search intent estimation methods (MLT, nMLT, DLT and nDLT) considering the relationship among intents, term frequencies and eye movements. As a result of an experiment designed for evaluating the accuracy of each method with a prototype system, we confirmed that the nMLT method works best. In addition, by analyzing the extracted intent terms for eight subjects in the experiment, we found that the system could estimate the unique search intent of each user even if they performed the same search tasks.

17 citations


Proceedings ArticleDOI
29 Oct 2012
TL;DR: This paper analyzes readability of Wikipedia, which is a popular source of information for searchers about unknown topics, and uses some new metrics based on words' popularity and their distributions across different document genres and topics.
Abstract: Readability is one of the key factors determining document quality and reader satisfaction. In this paper we analyze the readability of Wikipedia, which is a popular source of information for searchers about unknown topics. Although Wikipedia articles are frequently listed by search engines in top ranks, they are often too difficult for average readers searching for information about difficult queries. We examine the average readability of content in Wikipedia and compare it to that of Simple Wikipedia and Britannica. Next, we investigate the readability of selected categories in Wikipedia. Apart from standard readability measures, we use some new metrics based on words' popularity and their distributions across different document genres and topics.

11 citations
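
The abstract above refers to "standard readability measures" without naming them. As an illustration only, the sketch below computes the Flesch Reading Ease score, one widely used measure of this kind; whether the paper used it is an assumption. The function name and the example counts are hypothetical, and the word, sentence, and syllable counts are taken as inputs rather than derived from text.

```python
def flesch_reading_ease(n_words, n_sentences, n_syllables):
    """Flesch Reading Ease: higher scores indicate easier text.

    Counts are supplied by the caller; how to tokenize text and count
    syllables is deliberately left out of this sketch.
    """
    if n_words <= 0 or n_sentences <= 0:
        raise ValueError("word and sentence counts must be positive")
    words_per_sentence = n_words / n_sentences
    syllables_per_word = n_syllables / n_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Hypothetical counts for two passages of 100 words each.
simple_score = flesch_reading_ease(n_words=100, n_sentences=10, n_syllables=130)
dense_score = flesch_reading_ease(n_words=100, n_sentences=4, n_syllables=180)
```

With these made-up counts the passage with shorter sentences and shorter words receives the higher (easier) score, which is the direction of comparison one would expect between, say, a simplified and a regular encyclopedia article.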


01 Jan 2012
TL;DR: The method that combines ad impressions from sponsored search data and query co-occurrences from session data outperforms a state-of-the-art query clustering method that relies on document clicks rather than ad impressions in terms of purity, NMI, Rand Index, F1-measure and subgoal recall.
Abstract: This paper tackles the problem of mining subgoals of a given search goal from data. For example, when a searcher wants to travel to London, she may need to accomplish several subtasks such as "book flights," "book a hotel," "find good restaurants" and "decide which sightseeing spots to visit." As another example, if a searcher wants to lose weight, there may exist several alternative solutions such as "do physical exercise," "take diet pills," and "control calorie intake." In this paper, we refer to such subtasks or solutions as subgoals, and propose to utilize sponsored search data for finding subgoals of a given query by means of query clustering. Advertisements (ads) reflect advertisers' tremendous efforts in trying to match a given query with implicit user needs. Moreover, ads are usually associated with a particular action or transaction. We therefore hypothesized that they are useful for subgoal mining. To our knowledge, our work is the first to use sponsored search data for this purpose. Our experimental results show that sponsored search data is a good resource for obtaining related queries and for identifying subgoals via query clustering. In particular, our method that combines ad impressions from sponsored search data and query co-occurrences from session data outperforms a state-of-the-art query clustering method that relies on document clicks rather than ad impressions in terms of purity, NMI, Rand Index, F1-measure and subgoal recall.

11 citations


Proceedings ArticleDOI
12 Aug 2012
TL;DR: The problem of domain adaptation for content-based retrieval is introduced and a domain adaptation method based on relative aggregation points (RAPs) is proposed, which constructs a new feature space based on RAPs estimated in each domain and bridges the domain difference for improving content- based retrieval in heterogeneous domains.
Abstract: We introduce the problem of domain adaptation for content-based retrieval and propose a domain adaptation method based on relative aggregation points (RAPs). Content-based retrieval, including image retrieval and spoken document retrieval, enables a user to input examples as a query, and retrieves relevant data based on the similarity to the examples. However, input examples and relevant data can be dissimilar, especially when the domain from which the user selects examples differs from the domain from which the system retrieves data. In content-based geographic object retrieval, for example, suppose that a user who lives in Beijing visits Kyoto, Japan, and wants to search for relatively inexpensive restaurants serving popular local dishes by means of a content-based retrieval system. Since such restaurants in Beijing and Kyoto are dissimilar due to the difference in the average cost and the areas' popular dishes, it is difficult to find relevant restaurants in Kyoto based on examples selected in Beijing. We address this problem by assuming that RAPs in different domains correspond to each other: they may be dissimilar but play the same role. A RAP is defined as the expectation of instances in a domain that are classified into a certain class, e.g. the most expensive restaurant, average restaurant, and restaurant serving the most popular dishes. Our proposed method constructs a new feature space based on RAPs estimated in each domain and bridges the domain difference for improving content-based retrieval in heterogeneous domains. To verify the effectiveness of our proposed method, we evaluated various methods with a test collection developed for content-based geographic object retrieval. Experimental results show that our proposed method achieved significant improvements over baseline methods. Moreover, we observed that the search performance of content-based retrieval in heterogeneous domains was significantly lower than that in homogeneous domains. This finding suggests that relevant data for the same search intent depend on the search context, that is, the location where the user searches and the domain from which the system retrieves data.

10 citations
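
The RAP idea in the abstract above can be illustrated with a toy sketch. Following the abstract, a RAP is estimated here as the mean of the instances of one class within one domain. How the paper builds its new feature space from RAPs is not stated in the abstract, so describing each instance by its Euclidean distance to the RAPs of its own domain is purely an assumption of this sketch, as are the function names, the single "price" feature, and all numbers.

```python
import math


def estimate_raps(instances, labels):
    """Return the mean feature vector (RAP) per class label within one domain."""
    sums, counts = {}, {}
    for vec, label in zip(instances, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for k, value in enumerate(vec):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def to_rap_space(vec, raps, class_order):
    """Describe an instance by its distance to each RAP of its own domain.

    This particular mapping is an assumption, not the paper's definition.
    """
    return [math.dist(vec, raps[c]) for c in class_order]


# Hypothetical one-dimensional "price" features in local units.
classes = ["average", "expensive"]
beijing_raps = estimate_raps(
    [[50.0], [70.0], [200.0], [220.0]],
    ["average", "average", "expensive", "expensive"],
)
kyoto_raps = estimate_raps(
    [[1050.0], [1070.0], [1200.0], [1220.0]],
    ["average", "average", "expensive", "expensive"],
)

query_example = to_rap_space([65.0], beijing_raps, classes)   # chosen in Beijing
candidate = to_rap_space([1065.0], kyoto_raps, classes)       # retrieved in Kyoto
```

In this toy data the raw prices of the Beijing example and the Kyoto candidate are far apart, yet both sit at the same position relative to their own domain's RAPs, which is the kind of correspondence the abstract describes.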


Proceedings ArticleDOI
29 Oct 2012
TL;DR: In this article, the problem of mining subgoals of a given search goal from data was tackled by using sponsored search data for finding sub-goals by means of query clustering, and the experimental results show that the method that combines ad impressions from sponsored search and query co-occurrences from session data outperforms a state-of-the-art method that relies on document clicks rather than ad impressions in terms of purity, NMI, Rand Index, F1 measure and subgoal recall.
Abstract: This paper tackles the problem of mining subgoals of a given search goal from data. For example, when a searcher wants to travel to London, she may need to accomplish several subtasks such as "book flights," "book a hotel," "find good restaurants" and "decide which sightseeing spots to visit." As another example, if a searcher wants to lose weight, there may exist several alternative solutions such as "do physical exercise," "take diet pills," and "control calorie intake." In this paper, we refer to such subtasks or solutions as subgoals, and propose to utilize sponsored search data for finding subgoals of a given query by means of query clustering. Advertisements (ads) reflect advertisers' tremendous efforts in trying to match a given query with implicit user needs. Moreover, ads are usually associated with a particular action or transaction. We therefore hypothesized that they are useful for subgoal mining. To our knowledge, our work is the first to use sponsored search data for this purpose. Our experimental results show that sponsored search data is a good resource for obtaining related queries and for identifying subgoals via query clustering. In particular, our method that combines ad impressions from sponsored search data and query co-occurrences from session data outperforms a state-of-the-art query clustering method that relies on document clicks rather than ad impressions in terms of purity, NMI, Rand Index, F1-measure and subgoal recall.

10 citations
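
Purity, the first of the evaluation measures named in the abstract above, has a simple standard definition: each cluster is credited with the size of its largest gold class, and the credits are summed and divided by the number of items. The sketch below shows only that metric; the function name, the toy cluster assignments, and the gold subgoal labels are hypothetical and are not taken from the paper's data.

```python
from collections import Counter


def purity(cluster_ids, gold_labels):
    """Fraction of items that belong to the majority gold class of their cluster."""
    if len(cluster_ids) != len(gold_labels):
        raise ValueError("inputs must have the same length")
    if not cluster_ids:
        raise ValueError("inputs must be non-empty")
    by_cluster = {}
    for cid, label in zip(cluster_ids, gold_labels):
        by_cluster.setdefault(cid, Counter())[label] += 1
    majority_total = sum(max(c.values()) for c in by_cluster.values())
    return majority_total / len(cluster_ids)


# Hypothetical clustering of six queries related to the goal "travel to London".
clusters = [0, 0, 0, 1, 1, 2]
gold = ["flights", "flights", "hotel", "hotel", "hotel", "restaurants"]
score = purity(clusters, gold)
```

Here one "hotel" query is grouped with the "flights" queries, so five of the six queries agree with their cluster's majority subgoal.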


Journal ArticleDOI
TL;DR: This research investigates the relationship between links and readability of text extracted from Web pages for two datasets, namely English and Japanese pages and describes a link analysis algorithm for measuring comprehensibility of Web pages based on the TrustRank algorithm originally used for combating Web spam.
Abstract: Although Web search engines have become information gateways to the Internet, search results often contain pages that are difficult to understand for non-expert users, especially when queries contain technical or rare terms. Readability indexes are well-known measures for estimating text comprehensibility. However, readability indexes are not sufficient for evaluating the comprehensibility of Web pages, as they are designed for general purpose texts. In this research, we investigate the relationship between links and readability of text extracted from Web pages for two datasets, namely English and Japanese pages. We then describe a link analysis algorithm for measuring comprehensibility of Web pages based on the TrustRank algorithm originally used for combating Web spam. Lastly, we report results of preliminary studies to measure the correlation between search rank and readability of Web search results.

7 citations
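
The abstract above bases its comprehensibility measure on TrustRank but does not spell out the adaptation. The following is a minimal sketch of TrustRank-style propagation itself, that is, PageRank whose teleportation is restricted to a seed set. Treating the seeds as pages already known to be easy to read is an assumption made for illustration, and the page names, damping factor, and iteration count are hypothetical.

```python
def trust_rank(out_links, seeds, damping=0.85, iterations=50):
    """Propagate score from seed pages along out-links (seed-biased PageRank)."""
    nodes = sorted(set(out_links) | {t for ts in out_links.values() for t in ts})
    seeds = set(seeds)
    if not seeds or not seeds <= set(nodes):
        raise ValueError("seeds must be a non-empty subset of the graph's pages")
    seed_mass = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(seed_mass)
    for _ in range(iterations):
        # Teleportation goes only to seed pages.
        nxt = {n: (1.0 - damping) * seed_mass[n] for n in nodes}
        for src in nodes:
            targets = out_links.get(src, [])
            if targets:
                share = damping * score[src] / len(targets)
                for t in targets:
                    nxt[t] += share
            else:
                # Dangling page: hand its mass back to the seed distribution.
                for n in nodes:
                    nxt[n] += damping * score[src] * seed_mass[n]
        score = nxt
    return score


# Hypothetical link graph; "easy" is the only seed page.
graph = {
    "easy": ["intro"],
    "intro": ["easy", "expert"],
    "expert": [],
    "isolated": ["expert"],
}
scores = trust_rank(graph, seeds={"easy"})
```

Pages close to the seed in the link graph end up with higher scores, while a page that no seed-reachable page links to receives none.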


Proceedings ArticleDOI
29 Oct 2012
TL;DR: The results of large-scale studies on the usage of words and the evolution of English language vocabulary over the last two centuries are reported to help with understanding its impact on readability and retrieval of historical documents.
Abstract: Recently, many historical texts have been digitized and made accessible for search and browsing. As human language is subject to constant evolution, these texts pose varying challenges to current users. In this paper we report the results of large-scale studies on the usage of words and the evolution of English language vocabulary over the last two centuries to help with understanding its impact on readability and retrieval of historical documents. We analyze several lexical factors which may influence the accessibility and readability of historical texts, based on two large-scale lexical corpora: the Corpus of Historical American English and Google Books 1-gram.

7 citations


Proceedings ArticleDOI
09 Jul 2012
TL;DR: A relative relevance feedback method for image retrieval systems is proposed, which allows users to select relatively relevant and irrelevant items and modifies a query by taking into account the relativity of the user's feedback.
Abstract: We propose a relative relevance feedback method for image retrieval systems. Relevance feedback is an effective method to modify a user's query by selecting relevant and irrelevant items in the search result. However, users cannot always find exactly relevant items in the first few search result pages, especially when the initial query is not specified due to the lack of user's knowledge. Thus, we propose relative relevance feedback in the present paper, which allows users to select relatively relevant and irrelevant items, and modifies a query by taking into account the relativity of user's feedback. Our experimental result shows that the relative relevance feedback outperforms a conventional relevance feedback for image retrieval tasks.

4 citations
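
For context on the baseline mentioned in the abstract above, conventional relevance feedback is often implemented as a Rocchio-style update: the query vector is moved toward the centroid of the items marked relevant and away from the centroid of the items marked irrelevant. The sketch below shows that classic update only. It is not the paper's relative variant, whose formulation the abstract does not give, and the function name, weights, and two-dimensional feature vectors are illustrative assumptions.

```python
def rocchio_update(query, relevant, irrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return alpha*query + beta*centroid(relevant) - gamma*centroid(irrelevant)."""

    def centroid(vectors):
        # An empty feedback set contributes nothing to the update.
        if not vectors:
            return [0.0] * len(query)
        return [sum(column) / len(vectors) for column in zip(*vectors)]

    rel_c = centroid(relevant)
    irr_c = centroid(irrelevant)
    return [alpha * q + beta * r - gamma * s for q, r, s in zip(query, rel_c, irr_c)]


# Hypothetical 2-D image feature vectors.
new_query = rocchio_update(
    query=[1.0, 0.0],
    relevant=[[0.0, 1.0], [0.0, 3.0]],
    irrelevant=[[2.0, 0.0]],
)
```

The updated query shifts along the second feature, which the relevant items share, and shrinks along the first, where the irrelevant item lies.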


Book ChapterDOI
15 Apr 2012
TL;DR: The goal of this panel is to initiate an open discussion within the community on data management challenges and opportunities in cloud computing.
Abstract: Analyzing large data is a challenging problem today, as applications are increasingly expected to deal with vast amounts of data that usually do not fit in the main memory of a single machine. For such data-intensive applications, the database research community has started to investigate cloud computing as a cost-effective option for building scalable parallel data management systems capable of serving petabytes of data for millions of users. The goal of this panel is to initiate an open discussion within the community on data management challenges and opportunities in cloud computing. Potential topics to be discussed in the panel include: MapReduce framework, shared-nothing architecture, parallel query processing, security, analytical data management, transactional data management and fault tolerance.

4 citations


Proceedings ArticleDOI
10 Jun 2012
TL;DR: The correlation between the outcomes of different readability measurements and publication dates of prose texts is investigated on the basis of two datasets, the Victorian Women's Writers Project and the Corpus of Late Modern English Texts.
Abstract: Digital libraries often contain historical documents of varying age. The degree to which users can understand their content depends much on their reading difficulty. In this poster paper we report the results of our studies on the readability of historical documents from the viewpoint of present users. We investigate the correlation between the outcomes of different readability measurements and publication dates of prose texts on the basis of two datasets, the Victorian Women's Writers Project and the Corpus of Late Modern English Texts.

Proceedings ArticleDOI
24 Jul 2012
TL;DR: The study based on neuroscience showed that the service brain model could explain the cognition of "Omonpakari" service regardless of customers' gender, knowledge and the social context, and suggest an alternative model of service in which there is a productive tension, or dialectic, between the provider and the customer.
Abstract: In high-quality Japanese services, providers are often said to sense what their customers want from subtle cues and deliver a customized service without explicitly advertising the effort. To understand this subtle service, often called "Omonpakari," we studied a high-end sushi restaurant using a multidisciplinary approach: neuroscience to analyze the cognitive characteristics, ethnomethodology to analyze the interactive structure, and computer science to analyze the social evaluations. The study based on neuroscience showed that the service brain model could explain the cognition of "Omonpakari" service regardless of customers' gender, knowledge and the social context. The ethnomethodological analysis revealed that customers performed a role, complying with cultural norms and behaving like a culturally appropriate customer even if they might not be. The analysis using computer science techniques showed that expertise was the key factor in the evaluation of the services. These findings suggest an alternative model of service in which there is a productive tension, or dialectic, between the provider and the customer.

Book ChapterDOI
28 Nov 2012
TL;DR: A new image search method, called "panoramic image search", is proposed, and its application to similar landscape discovery is shown, using an image ranking method called PanoramaRank: a combination of image similarity and image adjacency, where image similarity is the retrieval score obtained from the classic vocabulary tree based image retrieval framework.
Abstract: In this paper, we propose a new image search method, called "panoramic image search", and show its application to similar landscape discovery. To perform panoramic image search, we introduce an image ranking method called PanoramaRank: a combination of image similarity and image adjacency, where image similarity is the retrieval score obtained from the classic vocabulary-tree-based image retrieval framework, and image adjacency is computed using a RANSAC-verified SURF matching process. The proposed notion is to search for images that physically surround the given query image(s). A landscape is a view of an area comprising several geographical features that share a common and meaningful atmosphere. We believe a collection of images is necessary for describing a landscape, and the images in this collection have to be roughly similar and roughly adjacent to each other, directly or indirectly. Similar landscapes are discovered in four steps: (1) images describing the same landscape as the user-selected query image(s) are found by employing PanoramaRank; (2) similar images taken in different locations are retrieved, and those belonging to the same location are treated as an incomplete representation of a landscape similar to the original one; (3) PanoramaRank is applied once more to find the whole landscape for each location separately; (4) a landscape similarity ranking is computed based on several comparison criteria. Moreover, images of landscapes similar to a given landscape image, especially those not present in results based on an individual pairwise measure, can be found. Experimental results and an evaluation are also presented.

Book ChapterDOI
15 Apr 2012
TL;DR: A method is proposed to generate facets dynamically to enhance the navigation of objects returned by a web-based search query, and a prototype system is implemented that shows returned images from an image search classified by multiple facets.
Abstract: We propose a method to generate facets dynamically to enhance the navigation of objects returned by a web-based search query. Facets denote axes for classifying a currently viewed object and related objects and are used as navigation signs to indicate their positions. Facets are generated by detecting hypernyms and coordinate terms of expressions that characterize objects. To be effectively used for browsing search results, generated facets are ranked. We implemented a prototype system that shows returned images from an image search classified by multiple facets. The results of an experiment to assess the facets showed that the average precision of correct facets in all queries obtained using our system is up to 82.7% for the top three and up to 77.6% for the top five ranked facets.

Journal ArticleDOI
09 Jan 2012
TL;DR: The Joint WICOW/AIRWeb Workshop on Web Quality (WebQuality 2011) was held in conjunction with the 20th International World Wide Web Conference in Hyderabad, India on the 28th March 2011 and this report briefly summarizes the workshop.
Abstract: The Joint WICOW/AIRWeb Workshop on Web Quality (WebQuality 2011) was held in conjunction with the 20th International World Wide Web Conference in Hyderabad, India on the 28th March 2011. Seven full-paper presentations and a keynote talk were delivered in three sessions. This report briefly summarizes the workshop.

Book ChapterDOI
28 Nov 2012
TL;DR: A process is introduced that analyzes and structures the corresponding Community Question-Answer corpus data: finding question-answer pairs (QAs) related to a user's query, extracting keywords from QAs related to the user's intent, transforming QAs into a graph, and generating suggested queries using QA graphs.
Abstract: Web search users often struggle to formulate keyword queries even though their search intent may be clear. Moreover, it is difficult for search engines to guess search intent from queries alone. To address these problems, we propose a new method for discovering search intents and for generating suggested queries for a given input Web search query. Specifically, we introduce a process that analyzes and structures the corresponding Community Question-Answer corpus data: finding question-answer pairs (QAs) related to a user's query, extracting keywords from QAs related to the user's intent, transforming QAs into a graph, and generating suggested queries using QA graphs.

Proceedings ArticleDOI
03 Dec 2012
TL;DR: Kcanvas provides a canvas as an intuitive and playful interface so that a user can casually collage what he/she is interested in in his/her daily life and others can also enjoy exploring this canvas as a visual art work.
Abstract: In this paper, we introduce an application called "Kcanvas". It is based on the belief that one's thoughts consist of fragments of knowledge that might be beautiful and enhance one's creativity if they are visualized. Kcanvas provides a canvas as an intuitive and playful interface so that a user can casually collage what he/she is interested in in his/her daily life and others can also enjoy exploring this canvas as a visual art work. We introduce canvases on "Kcanvas" and discuss the possibilities of the application.