Showing papers by "Andrei Z. Broder" published in 2009


Proceedings Article
20 Apr 2009
TL;DR: This paper describes an efficient and effective approach for matching ads against rare queries that were not processed offline, and shows that the approach significantly improves the effectiveness of advertising on rare queries with only a negligible increase in computational cost.
Abstract: Sponsored search systems are tasked with matching queries to relevant advertisements. The current state-of-the-art matching algorithms expand the user's query using a variety of external resources, such as Web search results. While these expansion-based algorithms are highly effective, they are largely inefficient and cannot be applied in real-time. In practice, such algorithms are applied offline to popular queries, with the results of the expensive operations cached for fast access at query time. In this paper, we describe an efficient and effective approach for matching ads against rare queries that were not processed offline. The approach builds an expanded query representation by leveraging offline processing done for related popular queries. Our experimental results show that our approach significantly improves the effectiveness of advertising on rare queries with only a negligible increase in computational cost.

114 citations
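The abstract does not say how related popular queries are found or how their offline expansions are combined. The sketch below fills both gaps with assumptions: term-overlap (Jaccard) similarity to pick the related queries, and a similarity-weighted sum of their cached term vectors. `expand_rare_query` and the cache layout are hypothetical and are not the paper's method.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity between two term sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def expand_rare_query(query, cache, k=2):
    """Build an expanded term-weight vector for a rare query.

    `cache` maps each popular query to the expansion computed for it offline
    (a dict of term -> weight). The rare query inherits the cached expansions
    of its k most similar popular queries, scaled by similarity (assumption).
    """
    q_terms = set(query.lower().split())
    # Rank cached popular queries by term overlap with the rare query.
    scored = sorted(
        ((jaccard(q_terms, set(p.lower().split())), p) for p in cache),
        reverse=True,
    )
    expanded = Counter({t: 1.0 for t in q_terms})  # keep the original terms
    for sim, popular in scored[:k]:
        if sim == 0:
            break  # remaining popular queries share no terms with the query
        for term, weight in cache[popular].items():
            expanded[term] += sim * weight
    return dict(expanded)


cache = {"cheap flights": {"airfare": 1.0, "flights": 0.5}, "hotel deals": {"hotel": 1.0}}
print(expand_rare_query("cheap flights paris", cache))
```

Only the cheap lookups in `cache` happen at query time. The expensive expansion work stays offline, which is the efficiency argument the abstract makes.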


Proceedings Article
20 Apr 2009
TL;DR: This work proposes a simple generative model that embodies two fundamental characteristics of page requests arriving to advertising systems, namely, long-range dependences and similarities and provides theoretical bounds on the gains of similarity caching and demonstrates these gains empirically by fitting the actual data to the model.
Abstract: Motivated by contextual advertising systems and other web applications involving efficiency-accuracy tradeoffs, we study similarity caching. Here, a cache hit is said to occur if the requested item is similar but not necessarily equal to some cached item. We study two objectives that dictate the efficiency-accuracy tradeoff and provide our caching policies for these objectives. By conducting extensive experiments on real data we show similarity caching can significantly improve the efficiency of contextual advertising systems, with minimal impact on accuracy. Inspired by the above, we propose a simple generative model that embodies two fundamental characteristics of page requests arriving to advertising systems, namely, long-range dependences and similarities. We provide theoretical bounds on the gains of similarity caching in this model and demonstrate these gains empirically by fitting the actual data to the model.

85 citations
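Here is a minimal sketch of the similarity-caching idea: a lookup counts as a hit when some cached key is similar enough to the request, even if it is not identical. The following are all assumptions made for illustration: cosine similarity over sparse term vectors, the `threshold` parameter, and LRU eviction. The paper's own policies for its two objectives are not reproduced here.

```python
import math
from collections import OrderedDict


def cosine(u, v):
    """Cosine similarity between two sparse vectors (dict term -> weight)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


class SimilarityCache:
    """LRU cache whose hits require similarity, not equality (hypothetical)."""

    def __init__(self, capacity, threshold):
        self.capacity = capacity
        self.threshold = threshold
        self._entries = OrderedDict()  # entry id -> (key vector, cached value)
        self._next_id = 0

    def get(self, query):
        """Return the value of the most similar key with cosine >= threshold."""
        best_id, best_sim = None, self.threshold
        for entry_id, (key, _) in self._entries.items():
            sim = cosine(query, key)
            if sim >= best_sim:
                best_id, best_sim = entry_id, sim
        if best_id is None:
            return None  # miss: caller must run the expensive computation
        self._entries.move_to_end(best_id)  # mark as most recently used
        return self._entries[best_id][1]

    def put(self, key, value):
        self._entries[self._next_id] = (key, value)
        self._next_id += 1
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used
```

Raising `threshold` toward 1.0 makes the cache behave like an exact-match cache. Lowering it trades accuracy for more hits, which is the efficiency-accuracy tradeoff the abstract describes.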


Journal Article
TL;DR: Empirical evaluation confirms that the proposed methodology yields a considerably higher classification accuracy than previously reported, which will lead to better matching of online ads to rare queries and overall to a better user experience.
Abstract: We propose a methodology for building a robust query classification system that can identify thousands of query classes, while dealing in real time with the query volume of a commercial Web search engine. We use a pseudo relevance feedback technique: given a query, we determine its topic by classifying the Web search results retrieved by the query. Motivated by the needs of search advertising, we primarily focus on rare queries, which are the hardest from the point of view of machine learning, yet in aggregate account for a considerable fraction of search engine traffic. Empirical evaluation confirms that our methodology yields a considerably higher classification accuracy than previously reported. We believe that the proposed methodology will lead to better matching of online ads to rare queries and overall to a better user experience.

61 citations
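The pseudo relevance feedback step works like this: classify the top search results, then aggregate their class scores into a query-level distribution. In the sketch below, `search` and `classify_doc` are hypothetical stand-ins for a search engine and a document classifier, and the 1/rank weighting of results is an assumption.

```python
from collections import defaultdict


def classify_query(query, search, classify_doc, top_n=10):
    """Classify a query by classifying its top Web search results.

    `search(query)` returns a ranked list of result texts and
    `classify_doc(text)` returns a dict class -> score (both hypothetical).
    Each result's scores are weighted by 1/rank, then normalized to sum to 1.
    """
    totals = defaultdict(float)
    for rank, doc in enumerate(search(query)[:top_n], start=1):
        for label, score in classify_doc(doc).items():
            totals[label] += score / rank
    norm = sum(totals.values())
    if norm == 0:
        return {}
    # Highest-scoring class first.
    return {label: s / norm for label, s in sorted(totals.items(), key=lambda kv: -kv[1])}


def search(q):
    return ["review of mirrorless cameras", "travel photography tips"]


def classify_doc(d):
    return {"Electronics": 1.0} if "cameras" in d else {"Travel": 1.0}


print(classify_query("camera", search, classify_doc))
```

The point of the technique is that a short, rare query is hard to classify directly. Its search results, however, are longer documents that an ordinary text classifier handles well.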


Journal Article
TL;DR: In this article, the authors introduce the hiring problem, in which a growing company continuously interviews and decides whether to hire applicants; the problem is similar in spirit to, but quite different from, the well-studied secretary problem.
Abstract: We introduce the hiring problem, in which a growing company continuously interviews and decides whether to hire applicants. This problem is similar in spirit but quite different from the well-studied secretary problem. Like the secretary problem, it captures fundamental aspects of decision making under uncertainty and has many possible applications. We analyze natural strategies of hiring above the current average, considering both the mean and the median averages; we call these Lake Wobegon strategies. Like the hiring problem itself, our strategies are intuitive, simple to describe, and amenable to mathematically and economically significant modifications. We demonstrate several intriguing behaviors of the two strategies. Specifically, we show dramatic differences between hiring above the mean and above the median. We also show that both strategies are intrinsically connected to the lognormal distribution, leading to only very weak concentration results, and the marked importance of the first few hires on the overall outcome.

47 citations
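The two Lake Wobegon strategies are easy to simulate. This sketch makes two illustrative assumptions: applicant quality is drawn i.i.d. from Uniform(0, 1), and the first applicant is always hired. The function name is hypothetical.

```python
import random
import statistics


def lake_wobegon(n_applicants, rule, seed=0):
    """Simulate hiring above the current mean or median of the workforce.

    rule is "mean" or "median". An applicant is hired only if their score
    strictly exceeds that average over everyone hired so far.
    Returns the scores of all hires in hiring order.
    """
    rng = random.Random(seed)
    average = {"mean": statistics.mean, "median": statistics.median}[rule]
    hires = [rng.random()]  # the first applicant is always hired (assumption)
    for _ in range(n_applicants - 1):
        score = rng.random()
        if score > average(hires):
            hires.append(score)
    return hires


for rule in ("mean", "median"):
    h = lake_wobegon(5_000, rule)
    print(rule, len(h), round(statistics.mean(h), 4))
```

Running both rules on the same applicant stream makes it easy to compare how quickly each workforce grows and how its average quality evolves. The abstract reports dramatic differences between the mean and median strategies.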


Proceedings Article
Hila Becker, Andrei Z. Broder, Evgeniy Gabrilovich, Vanja Josifovski, Bo Pang
19 Jul 2009
TL;DR: It is concluded that in the vast majority of cases, the user is shown one of three types of pages, which can be accurately distinguished using automatic text classification.
Abstract: We define and study the process of context transfer in search advertising, which is the transition of a user from the context of Web search to the context of the landing page that follows an ad-click. We conclude that in the vast majority of cases, the user is shown one of three types of pages, which can be accurately distinguished using automatic text classification.

43 citations


Proceedings Article
Hila Becker, Andrei Z. Broder, Evgeniy Gabrilovich, Vanja Josifovski, Bo Pang
02 Nov 2009
TL;DR: This work defines and studies the process of context transfer, that is, the user's transition from Web search to the context of the landing page that follows an ad-click, and shows that in the vast majority of cases the user is shown one of three types of pages, namely, Homepage, Category browse, and Search transfer.
Abstract: Unbeknownst to most users, when a query is submitted to a search engine two distinct searches are performed: the organic or algorithmic search that returns relevant Web pages and related data (maps, images, etc.), and the sponsored search that returns paid advertisements. While an enormous amount of work has been invested in understanding the user interaction with organic search, surprisingly little research has been dedicated to what happens after an ad is clicked, a situation we aim to correct. To this end, we define and study the process of context transfer, that is, the user's transition from Web search to the context of the landing page that follows an ad-click. We conclude that in the vast majority of cases the user is shown one of three types of pages, namely, Homepage (the homepage of the advertiser), Category browse (a browse-able sub-catalog related to the original query), and Search transfer (the search results of the same query re-executed on the target site). We show that these three types of landing pages can be accurately distinguished using automatic text classification. Finally, using such an automatic classifier, we correlate the landing page type with conversion data provided by advertisers, and show that the conversion rate (i.e., users' response rate to ads) varies considerably according to the type. We believe our findings will further the understanding of users' response to search advertising in general, and landing pages in particular, and thus help advertisers improve their Web sites and help search engines select the most suitable ads.

27 citations


Proceedings Article
20 Apr 2009
TL;DR: Experimental results show that the approach can accurately forecast the expected number of impressions of contextual ads in real time; the paper also shows how this method can be used in tools for bid selection and ad evaluation.
Abstract: Contextual advertising (also called content match) refers to the placement of small textual ads within the content of a generic web page. It has become a significant source of revenue for publishers ranging from individual bloggers to major newspapers. At the same time it is an important way for advertisers to reach their intended audience. This reach depends on the total number of exposures of the ad (impressions) and its click-through-rate (CTR) that can be viewed as the probability of an end-user clicking on the ad when shown. These two orthogonal, critical factors are both difficult to estimate and even individually can still be very informative and useful in planning and budgeting advertising campaigns. In this paper, we address the problem of forecasting the number of impressions for new or changed ads in the system. Producing such forecasts, even within large margins of error, is quite challenging: 1) ad selection in contextual advertising is a complicated process based on tens or even hundreds of page and ad features; 2) the publishers' content and traffic vary over time; and 3) the scale of the problem is daunting: over the course of a week it involves billions of impressions, hundreds of millions of distinct pages, hundreds of millions of ads, and varying bids of other competing advertisers. We tackle these complexities by simulating the presence of a given ad with its associated bid over weeks of historical data. We obtain an impression estimate by counting how many times the ad would have been displayed if it were in the system over that period of time. We estimate this count by an efficient two-level search algorithm over the distinct pages in the data set. Experimental results show that our approach can accurately forecast the expected number of impressions of contextual ads in real time. We also show how this method can be used in tools for bid selection and ad evaluation.

23 citations
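The replay idea counts how often a new ad would have been shown on historical traffic. It can be sketched with a brute-force scan over distinct pages. The term-overlap scoring rule, the `slots` parameter and the input layout are all hypothetical. This linear pass stands in for the paper's more efficient two-level search and does not reproduce it.

```python
def forecast_impressions(ad_terms, bid, pages, slots=3):
    """Estimate impressions by replaying historical traffic with a new ad.

    `pages` is a list of (page_terms, views, competitor_scores) tuples, one
    per distinct page in the historical logs. The ad's score on a page is
    bid * (fraction of ad terms found on the page), a hypothetical rule. The
    ad earns all of a page's views if fewer than `slots` competitors outscore it.
    """
    total = 0
    for page_terms, views, competitor_scores in pages:
        overlap = len(ad_terms & page_terms) / max(len(ad_terms), 1)
        score = bid * overlap
        if score == 0:
            continue  # the ad is not eligible for this page
        better = sum(1 for s in competitor_scores if s > score)
        if better < slots:
            total += views
    return total


pages = [
    ({"camera", "lens"}, 100, [0.9, 0.8, 0.7]),
    ({"camera", "review"}, 50, [0.1]),
    ({"travel"}, 1000, []),
]
print(forecast_impressions({"camera", "lens"}, 1.0, pages))
```

Calling the function with several candidate bids gives a rough bid-versus-impressions curve. This is the kind of input the abstract's bid-selection tools would use.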


Patent
Deepak Agarwal, Vanja Josifovski, Andrei Z. Broder, Evgeniy Gabrilovich, Rob Hall
20 Feb 2009
TL;DR: In this patent, the expert statistical models to which a web document belongs, and their associated weightings, are determined based, at least in part, on features detected in the document; a click-through-rate probability for a web advertisement to be placed on that document is then estimated from these models.
Abstract: Methods and systems are provided that may be used to determine a probability of whether a visitor to a web document is likely to click on a web advertisement. An exemplary method may include detecting one or more features in a web document. One or more expert statistical models to which the web document belongs may be determined and associated weightings may be determined based, at least in part, on the one or more features detected. A click-through-rate probability for a web advertisement to be placed on the web document may be estimated based on the one or more expert statistical models.

23 citations
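One way to read the patent's expert statistical models with feature-dependent weightings is as a gated mixture of experts. The sketch below assumes softmax gating and logistic-regression experts. All weights and function names are hypothetical, and this is not the patented model.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_ctr(features, gate_weights, expert_weights):
    """Estimate click probability as a feature-gated mixture of experts.

    `features`: floats detected in the web document / ad pair.
    `gate_weights[k]`: linear weights scoring membership in expert k.
    `expert_weights[k]`: expert k's logistic-regression weight vector.
    """
    def dot(w):
        return sum(wi * xi for wi, xi in zip(w, features))

    gates = softmax([dot(g) for g in gate_weights])   # soft expert membership
    preds = [sigmoid(dot(w)) for w in expert_weights]  # per-expert CTR estimate
    return sum(g * p for g, p in zip(gates, preds))
```

Because the gates sum to 1 and each expert outputs a probability, the mixture is itself a valid probability. A document that clearly belongs to one expert receives essentially that expert's estimate.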


Patent
Vanja Josifovsky, George Hu, Jianchang Jc Mao, Majid Mohazzab, Andrei Z. Broder
30 Nov 2009
TL;DR: In this patent, the content server classifies the primary webpage for content and retrieves persistent relevance information, possibly including a referrer of the primary webpage comprising a URL address of the referring webpage, a listing of other recently visited webpages, a listing of any bid phrases from previously displayed advertisements, and recent click data.
Abstract: Methods for selecting one or more advertisements based on previously captured relevance data to serve to a client system requesting a primary webpage is provided. The client displays a referring webpage having a hyperlink to the primary webpage. Upon selection of the hyperlink, the client sends a request to a content server storing the primary webpage. The content server classifies the primary webpage for content and retrieves persistent relevance information, possibly including a referrer of the primary webpage comprising a URL address of the referring webpage, a listing of other recently visited webpages, a listing of any bid phrases from previously displayed advertisements, and a listing of recent click data. The content server sends the primary webpage to the client, which includes an advertisement server request. The transaction between the content server and the advertisement server includes persistence relevance information to select advertisements to serve to the client.

22 citations


Patent
25 Jun 2009
TL;DR: In this patent, a predictor for determining a degree of relevance between a query rewrite and a search query is provided, where the predictor may receive a search query from a user via a terminal and identify a set of candidate query rewrites associated with the search query.
Abstract: A predictor for determining a degree of relevance between a query rewrite and a search query is provided. The predictor may receive a search query from a user via a terminal and identify a set of candidate query rewrites associated with the search query. The predictor may then extract a set of features from advertisements associated with the query rewrites and the search query and determine a degree of relevance between the advertisements and the search query based on a prediction model. The predictor may then determine the degree of relevance between the rewrites and the search query based on the determined degree of relevance between the advertisements and the search query.

16 citations
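The predictor scores a rewrite indirectly, through the ads that the rewrite retrieves. Here is a minimal sketch that assumes a toy term-overlap model as the ad-relevance predictor and the mean as the aggregation step. Both choices, and the function names, are hypothetical.

```python
from statistics import mean


def ad_relevance(query, ad_text):
    """Hypothetical prediction model: fraction of query terms found in the ad."""
    q = set(query.lower().split())
    return len(q & set(ad_text.lower().split())) / len(q) if q else 0.0


def rank_rewrites(query, rewrites_to_ads):
    """Score each candidate rewrite by the mean relevance of its ads.

    `rewrites_to_ads` maps a rewrite string to the ad texts it retrieves.
    Rewrites that retrieve no ads score 0. Returns (rewrite, score) pairs,
    best first.
    """
    scores = {}
    for rewrite, ads in rewrites_to_ads.items():
        scores[rewrite] = mean(ad_relevance(query, ad) for ad in ads) if ads else 0.0
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


candidates = {
    "discount airfare": ["cheap flights to rome", "cheap flights deals"],
    "flight simulator": ["flight simulator game"],
    "empty": [],
}
print(rank_rewrites("cheap flights", candidates))
```

Judging rewrites by their ads, rather than by the rewrite string alone, reflects what the user would actually see if that rewrite were used to select ads.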


Patent
19 Oct 2009
TL;DR: In this patent, a contextual advertising system selects online advertisements for display on a network location by transforming page content of a page received in a platform over a network into a textual representation.
Abstract: A contextual advertising system selects online advertisements for display on a network location. The system may transform page content of a page received in a platform over a network into a textual representation. In addition, the system may transform received site content of a site into a site signature. The site includes the page. The system then may correct the textual representation utilizing the site signature to produce modified textual representation. The system may utilize the modified textual representation to select an online advertisement. Considering a page in the context of the entire website to which it belongs leads to better understanding and interpretation of the page topic(s) and thus yields more accurate ad matching.

Proceedings Article
09 Feb 2009
TL;DR: This study proposes a robust method for classifying non-English queries into an English taxonomy, using an existing English text classifier and off-the-shelf machine translation systems, and shows that by considering the Web search results in the query's original language as additional sources of information, it can alleviate the effect of erroneous machine translation.
Abstract: The non-English Web is growing at phenomenal speed, but available language processing tools and resources are predominantly English-based. Taxonomies are a case in point: while there are plenty of commercial and non-commercial taxonomies for the English Web, taxonomies for other languages are either not available or of arguable quality. Given that building comprehensive taxonomies for each language is prohibitively expensive, it is natural to ask whether existing English taxonomies can be leveraged, possibly via machine translation, to enable text processing tasks in other languages. Our experimental results confirm that the answer is affirmative with respect to at least one task. In this study we focus on query classification, which is essential for understanding the user intent both in Web search and in online advertising. We propose a robust method for classifying non-English queries into an English taxonomy, using an existing English text classifier and off-the-shelf machine translation systems. In particular, we show that by considering the Web search results in the query's original language as additional sources of information, we can alleviate the effect of erroneous machine translation. Empirical evaluation on query sets in languages as diverse as Chinese and Russian yields very encouraging results; consequently, we believe that our approach is also applicable to many additional languages.

Patent
12 Feb 2009
TL;DR: This patent relates to returning cached object results based at least in part on a non-exact comparison with a query key.
Abstract: The subject matter disclosed herein relates to returning cached object results based at least in part on a non-exact comparison with a query key.

Patent
13 May 2009
TL;DR: This patent presents a method and system for determining related bid terms, which includes accessing a term database to determine a plurality of term pairs, namely terms that were bid on together in a term-bidding operating environment.
Abstract: The present invention provides a method and system for determining related bid terms. The method and system includes accessing a term database to determine a plurality of term pairs, the term pairs being paired terms bidded together in a term bidding operating environment. In the method and system, for each of the plurality of term pairs, the method and system includes determining similarity values for each of the term pairs. The method and system further includes generating a similarity matrix using the determined similarity values. And, the method and system includes generating an output result based on a co-bidded relationship between at least one of the terms and advertising information.