Proceedings ArticleDOI

Web Search Personalization by User Profiling

TL;DR: The mathematics behind these 'link analysis algorithms' is analyzed, along with their effective use in e-commerce applications where they could be used for displaying 'personalized information'.
Abstract: The World Wide Web is growing at a rate of about a million pages per day, making it tougher for search engines to extract relevant information for their users. Earlier search engines used simple indexing techniques to search for keywords in websites and gave more weight to pages with a higher frequency of keyword occurrences. This technique was easy to game: authors stuffed meta-tags with popular search terms their pages did not actually cover, which eventually made meta-tags useless to search engines. Another widely used trick was to repeat popular search terms in invisible text (white text on a white background) to fool engines. These abuses called for a set of algorithms that would sort results by an unbiased parameter. The currently employed link analysis algorithms make use of the structure present in 'hyperlinks'; results are sorted and displayed according to a 'popularity index' derived from the pages linking to each result. In this work, we analyze the mathematics behind these 'link analysis algorithms' and their effective use in e-commerce applications, where they could be used for displaying 'personalized information'.
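The 'popularity index' the abstract alludes to is, in PageRank's case, the stationary distribution of a random surfer over the link graph. Below is a minimal, illustrative sketch of PageRank power iteration in Python; the toy graph, damping factor, and tolerance are assumptions for demonstration, not values from the paper.

```python
# Minimal PageRank power iteration on a toy link graph.
# The graph, damping factor, and tolerance are illustrative choices.

def pagerank(links, damping=0.85, tol=1e-8, max_iter=100):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank uniformly over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        if sum(abs(new_rank[p] - rank[p]) for p in pages) < tol:
            return new_rank
        rank = new_rank
    return rank

toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
print(pagerank(toy_web))
```

Each page's score ends up proportional to how much rank flows to it from its in-links, which is exactly the kind of unbiased, structure-derived parameter the abstract calls for.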
Citations

Posted Content
TL;DR: The main objective of this paper is to explore the field of personalization in the context of user profiling and to make researchers aware of user profiling.
Abstract: The personalization of information has taken recommender systems to a very high level. With personalization, these systems can generate user-specific recommendations accurately and efficiently. User profiling supports personalization: information retrieval is tailored to a scenario in which a separate profile is maintained for each individual user. The main objective of this paper is to explore the field of personalization in the context of user profiling and to make researchers aware of user profiling. Various trends, techniques, and applications that serve this objective are discussed in the paper.

55 citations

Journal ArticleDOI
TL;DR: This paper aims at finding, extracting, and integrating keyword-based information from various web sources to generate a structured profile, and performs experiments on the profiled information to generate knowledge from it.
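As a rough illustration of the kind of keyword-based profiling the TL;DR describes, the sketch below aggregates term frequencies from text gathered from a user's (hypothetical) web sources into a structured profile. The tokenizer, stop-word list, and source texts are simplistic assumptions, not the paper's method.

```python
# A hedged sketch of keyword-based profiling: aggregate term frequencies
# from several (hypothetical) web sources into one structured profile.

import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "for"}

def extract_keywords(text):
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 2]

def build_profile(documents, top_k=10):
    """documents: iterable of raw text strings from a user's web sources."""
    counts = Counter()
    for doc in documents:
        counts.update(extract_keywords(doc))
    return dict(counts.most_common(top_k))

sources = [
    "Posts about machine learning and neural networks",
    "A blog on search engines and link analysis",
]
print(build_profile(sources))
```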

18 citations

Proceedings ArticleDOI
18 Mar 2012
TL;DR: A new architecture for user personalization is designed which combines social network data and context data: it aggregates a user's preference data from various social networking services and builds a centralized user profile accessible through public Web services.
Abstract: In recommender systems, social networks are considered as a trusted source for user interests. In addition, user context can enhance users' decision making. In this paper, we design a new architecture for user personalization which combines both social network data and context data. Our system aggregates a user's preference data from various social networking services and then builds a centralized user profile which is accessible through public Web services. We also collect user's contextual information and store it in a central space which is also accessible through public Web services. Based on Service Oriented Architecture, recommender systems can flexibly utilize users' preference information and context to provide more desirable recommendations. We present how our system can integrate both types of data together and how they can be mapped in a meaningful way.
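A minimal sketch of the aggregation step the abstract describes: preference data from several social networking services is merged with contextual data into one centralized profile. The service names, field layout, and additive weighting below are hypothetical, and the public Web-service layer the paper exposes is omitted.

```python
# Merge per-service preference data and contextual data into one
# centralized user profile. Names, fields, and weights are hypothetical.

from collections import defaultdict

def aggregate_preferences(service_feeds):
    """service_feeds: dict mapping service name -> {interest: weight}."""
    profile = defaultdict(float)
    for service, prefs in service_feeds.items():
        for interest, weight in prefs.items():
            profile[interest] += weight
    return dict(profile)

def build_user_profile(user_id, service_feeds, context):
    return {
        "user_id": user_id,
        "preferences": aggregate_preferences(service_feeds),
        "context": context,  # e.g. location, device, time of day
    }

feeds = {
    "social_service_a": {"music": 0.8, "travel": 0.3},
    "social_service_b": {"music": 0.5, "sports": 0.6},
}
profile = build_user_profile("u42", feeds,
                             {"location": "office", "device": "mobile"})
print(profile)
```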

14 citations

Proceedings ArticleDOI
01 Dec 2009
TL;DR: This paper introduces a Personal Search Engine that provides results relevant to the user's interest, relying on the importance of the document's category to the user, the user's interest-based page rank, and the document's degree of relevance to ensure relevant and accurate results.
Abstract: With the tremendous growth of the web and the diversity of its contents, users need specialized, accurate results that depend on their behavior and vary according to their interests. In this paper, we introduce a Personal Search Engine which provides results relevant to the user's interest. Our search engine depends on three factors to ensure relevant and accurate results. The first factor is the degree of importance of the document's category to the user. The second factor is the user's interest page rank, which depends on the user's browsing of the page. The third factor is the degree of relevance of the document.
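One plausible way to combine the three factors into a single ranking score is a weighted linear combination, sketched below. The weights and the normalization assumption are illustrative; the paper does not necessarily combine the factors this way.

```python
# Combine the three factors the abstract lists into one ranking score.
# The linear combination and weights are assumptions, not the paper's formula.

def personal_score(category_importance, interest_page_rank, doc_relevance,
                   weights=(0.3, 0.3, 0.4)):
    """All three factors assumed normalized to [0, 1]."""
    w1, w2, w3 = weights
    return (w1 * category_importance
            + w2 * interest_page_rank
            + w3 * doc_relevance)

docs = [
    {"id": "d1", "cat_imp": 0.9, "int_pr": 0.4, "rel": 0.7},
    {"id": "d2", "cat_imp": 0.5, "int_pr": 0.8, "rel": 0.6},
]
ranked = sorted(
    docs,
    key=lambda d: personal_score(d["cat_imp"], d["int_pr"], d["rel"]),
    reverse=True,
)
print([d["id"] for d in ranked])
```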

9 citations

References
Journal ArticleDOI
01 Apr 1998
TL;DR: This paper provides an in-depth description of Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext and looks at the problem of how to effectively deal with uncontrolled hypertext collections where anyone can publish anything they want.
Abstract: In this paper, we present Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext. Google is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems. The prototype with a full text and hyperlink database of at least 24 million pages is available at http://google.stanford.edu/. To engineer a search engine is a challenging task. Search engines index tens to hundreds of millions of web pages involving a comparable number of distinct terms. They answer tens of millions of queries every day. Despite the importance of large-scale search engines on the web, very little academic research has been done on them. Furthermore, due to rapid advance in technology and web proliferation, creating a web search engine today is very different from three years ago. This paper provides an in-depth description of our large-scale web search engine -- the first such detailed public description we know of to date. Apart from the problems of scaling traditional search techniques to data of this magnitude, there are new technical challenges involved with using the additional information present in hypertext to produce better search results. This paper addresses this question of how to build a practical large-scale system which can exploit the additional information present in hypertext. Also we look at the problem of how to effectively deal with uncontrolled hypertext collections where anyone can publish anything they want.

14,696 citations


Additional excerpts

  • ...Kleinberg's HITS algorithm [8] and Google's PageRank [2, 3, 6, 7] algorithm are eigenvector-based methods....


Journal Article
TL;DR: Google as discussed by the authors is a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext and is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems.

13,327 citations

Journal ArticleDOI
Jon Kleinberg1
TL;DR: This work proposes and tests an algorithmic formulation of the notion of authority, based on the relationship between a set of relevant authoritative pages and the set of "hub pages" that join them together in the link structure; the formulation has connections to the eigenvectors of certain matrices associated with the link graph.
Abstract: The network structure of a hyperlinked environment can be a rich source of information about the content of the environment, provided we have effective means for understanding it. We develop a set of algorithmic tools for extracting information from the link structures of such environments, and report on experiments that demonstrate their effectiveness in a variety of contexts on the World Wide Web. The central issue we address within our framework is the distillation of broad search topics, through the discovery of "authoritative" information sources on such topics. We propose and test an algorithmic formulation of the notion of authority, based on the relationship between a set of relevant authoritative pages and the set of "hub pages" that join them together in the link structure. Our formulation has connections to the eigenvectors of certain matrices associated with the link graph; these connections in turn motivate additional heuristics for link-based analysis.
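The mutually reinforcing hub/authority computation the abstract describes can be sketched in a few lines: authority scores are summed from in-linking hubs, hub scores from out-linked authorities, with normalization each round. The toy graph and iteration count below are illustrative assumptions.

```python
# A minimal sketch of the HITS hub/authority iteration on a toy graph.

import math

def hits(links, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of hub scores of pages linking in.
        for p in pages:
            auth[p] = sum(hub[q] for q in pages if p in links[q])
        # Hub score: sum of authority scores of pages linked to.
        for p in pages:
            hub[p] = sum(auth[q] for q in links[p])
        # Normalize so the scores stay bounded.
        a_norm = math.sqrt(sum(v * v for v in auth.values()))
        h_norm = math.sqrt(sum(v * v for v in hub.values()))
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, auth

toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
hub, auth = hits(toy_web)
print("authorities:", auth)
```

The fixed points of this iteration are the principal eigenvectors of A^T A and A A^T for the adjacency matrix A, which is the eigenvector connection the abstract mentions.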

8,328 citations


"Web Search Personalization by User ..." refers methods in this paper

  • ...Kleinberg's HITS algorithm [8] and Google's PageRank [2, 3, 6, 7] algorithm are eigenvector-based methods....


Journal ArticleDOI
Thomas Hofmann1
TL;DR: This paper proposes to make use of a temperature-controlled version of the Expectation Maximization algorithm for model fitting, which has shown excellent performance in practice and results in a more principled approach with a solid foundation in statistical inference.
Abstract: This paper presents a novel statistical method for factor analysis of binary and count data which is closely related to a technique known as Latent Semantic Analysis. In contrast to the latter method, which stems from linear algebra and performs a Singular Value Decomposition of co-occurrence tables, the proposed technique uses a generative latent class model to perform a probabilistic mixture decomposition. This results in a more principled approach with a solid foundation in statistical inference. More precisely, we propose to make use of a temperature-controlled version of the Expectation Maximization algorithm for model fitting, which has shown excellent performance in practice. Probabilistic Latent Semantic Analysis has many applications, most prominently in information retrieval, natural language processing, machine learning from text, and in related areas. The paper presents perplexity results for different types of text and linguistic data collections and discusses an application in automated document indexing. The experiments indicate substantial and consistent improvements of the probabilistic method over standard Latent Semantic Analysis.
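A compact, illustrative sketch of pLSA fitted with tempered EM follows: the E-step raises the posterior term to a power beta < 1 (the "temperature" control), and the M-step re-estimates P(w|z) and P(z|d) from expected counts. The toy term-count matrix, topic count, and beta value are assumptions, not the paper's setup.

```python
# pLSA via tempered EM on a toy term-count matrix (illustrative only).

import numpy as np

rng = np.random.default_rng(0)

def plsa(counts, n_topics=2, n_iter=50, beta=0.9):
    """counts: (n_docs, n_words) term-count matrix."""
    n_docs, n_words = counts.shape
    p_w_z = rng.random((n_topics, n_words))            # P(w|z)
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    p_z_d = rng.random((n_docs, n_topics))             # P(z|d)
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Tempered E-step: P(z|d,w) proportional to (P(w|z) P(z|d))^beta.
        joint = (p_z_d[:, :, None] * p_w_z[None, :, :]) ** beta   # (d, z, w)
        p_z_dw = joint / joint.sum(axis=1, keepdims=True)
        # M-step: re-estimate both distributions from expected counts.
        weighted = counts[:, None, :] * p_z_dw                     # (d, z, w)
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True)
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    return p_w_z, p_z_d

toy_counts = np.array([[4, 2, 0, 0], [3, 3, 0, 1], [0, 0, 5, 2], [0, 1, 4, 3]])
p_w_z, p_z_d = plsa(toy_counts)
print(np.round(p_w_z, 2))
```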

2,574 citations

Proceedings ArticleDOI
07 May 2002
TL;DR: A set of PageRank vectors are proposed, biased using a set of representative topics, to capture more accurately the notion of importance with respect to a particular topic, and are shown to generate more accurate rankings than with a single, generic PageRank vector.
Abstract: In the original PageRank algorithm for improving the ranking of search-query results, a single PageRank vector is computed, using the link structure of the Web, to capture the relative "importance" of Web pages, independent of any particular search query. To yield more accurate search results, we propose computing a set of PageRank vectors, biased using a set of representative topics, to capture more accurately the notion of importance with respect to a particular topic. By using these (precomputed) biased PageRank vectors to generate query-specific importance scores for pages at query time, we show that we can generate more accurate rankings than with a single, generic PageRank vector. For ordinary keyword search queries, we compute the topic-sensitive PageRank scores for pages satisfying the query using the topic of the query keywords. For searches done in context (e.g., when the search query is performed by highlighting words in a Web page), we compute the topic-sensitive PageRank scores using the topic of the context in which the query appeared.
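The bias enters through the teleportation vector: instead of jumping uniformly to any page, the random surfer jumps only to pages in a topic's representative set, yielding one precomputed PageRank vector per topic. The sketch below illustrates this; the graph, topic set, and damping factor are assumptions for demonstration.

```python
# Topic-biased PageRank: teleportation is concentrated on a topic's pages.
# Graph, topic set, and damping factor are illustrative choices.

def topic_pagerank(links, topic_pages, damping=0.85, n_iter=100):
    pages = list(links)
    n = len(pages)
    # Teleportation distribution concentrated on the topic's pages.
    v = {p: (1.0 / len(topic_pages) if p in topic_pages else 0.0)
         for p in pages}
    rank = {p: 1.0 / n for p in pages}
    for _ in range(n_iter):
        new_rank = {p: (1.0 - damping) * v[p] for p in pages}
        for page, outlinks in links.items():
            targets = outlinks if outlinks else pages  # dangling -> everywhere
            share = damping * rank[page] / len(targets)
            for t in targets:
                new_rank[t] += share
        rank = new_rank
    return rank

toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
# One precomputed vector per representative topic, as the abstract suggests;
# at query time the scores for the query's topic(s) are used.
print(topic_pagerank(toy_web, topic_pages={"A", "B"}))
```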

1,765 citations