Showing papers by "Andrei Z. Broder" published in 2009

Proceedings Article

Online expansion of rare queries for sponsored search

Andrei Z. Broder¹, Peter Ciccolo¹, Evgeniy Gabrilovich¹, Vanja Josifovski¹, Donald Metzler¹, Lance Riedel¹, Jeffrey Yuan¹•Institutions (1)

Yahoo!¹

20 Apr 2009

TL;DR: This paper describes an efficient and effective approach for matching ads against rare queries that were not processed offline, and shows that the approach significantly improves the effectiveness of advertising on rare queries with only a negligible increase in computational cost.

Abstract: Sponsored search systems are tasked with matching queries to relevant advertisements. The current state-of-the-art matching algorithms expand the user's query using a variety of external resources, such as Web search results. While these expansion-based algorithms are highly effective, they are largely inefficient and cannot be applied in real-time. In practice, such algorithms are applied offline to popular queries, with the results of the expensive operations cached for fast access at query time. In this paper, we describe an efficient and effective approach for matching ads against rare queries that were not processed offline. The approach builds an expanded query representation by leveraging offline processing done for related popular queries. Our experimental results show that our approach significantly improves the effectiveness of advertising on rare queries with only a negligible increase in computational cost.

114 citations
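
The abstract above says the expanded representation of a rare query is built from offline processing already done for related popular queries. A minimal sketch of that idea follows. The cache layout (`cached_expansions`), the word-overlap (Jaccard) notion of "related", and the additive weighting are all assumptions made for illustration; the abstract does not specify how the paper selects or combines related queries.

```python
from collections import defaultdict
from typing import Dict


def expand_rare_query(
    rare_query: str,
    cached_expansions: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Build an expanded term-weight vector for a query with no cached expansion.

    `cached_expansions` maps each popular query to the weighted terms produced
    offline for it (a hypothetical structure). Each popular query contributes
    in proportion to its word overlap (Jaccard) with the rare query.
    """
    rare_terms = set(rare_query.lower().split())
    expanded: Dict[str, float] = defaultdict(float)
    for term in rare_terms:
        expanded[term] += 1.0  # always keep the original query terms
    for popular_query, expansion in cached_expansions.items():
        popular_terms = set(popular_query.lower().split())
        union = rare_terms | popular_terms
        overlap = len(rare_terms & popular_terms) / len(union) if union else 0.0
        if overlap == 0.0:
            continue  # unrelated popular query, contributes nothing
        for term, weight in expansion.items():
            expanded[term] += overlap * weight
    return dict(expanded)
```

Only dictionary lookups and set operations run at query time here, which is the property the abstract emphasizes: the expensive expansion work stays offline.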

Proceedings Article

Nearest-neighbor caching for content-match applications

Sandeep Pandey¹, Andrei Z. Broder¹, Flavio Chierichetti¹, Vanja Josifovski¹, Ravi Kumar¹, Sergei Vassilvitskii¹•Institutions (1)

Yahoo!¹

20 Apr 2009

TL;DR: This work proposes a simple generative model that embodies two fundamental characteristics of page requests arriving to advertising systems, namely long-range dependences and similarities. It provides theoretical bounds on the gains of similarity caching and demonstrates these gains empirically by fitting the actual data to the model.

Abstract: Motivated by contextual advertising systems and other web applications involving efficiency-accuracy tradeoffs, we study similarity caching. Here, a cache hit is said to occur if the requested item is similar but not necessarily equal to some cached item. We study two objectives that dictate the efficiency-accuracy tradeoff and provide our caching policies for these objectives. By conducting extensive experiments on real data we show similarity caching can significantly improve the efficiency of contextual advertising systems, with minimal impact on accuracy. Inspired by the above, we propose a simple generative model that embodies two fundamental characteristics of page requests arriving to advertising systems, namely, long-range dependences and similarities. We provide theoretical bounds on the gains of similarity caching in this model and demonstrate these gains empirically by fitting the actual data to the model.

85 citations
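
The abstract above defines a similarity-cache hit as a request that is similar, but not necessarily equal, to a cached item. The sketch below illustrates that definition only. The Jaccard measure, the fixed `threshold`, the linear scan, and the LRU eviction rule are assumptions chosen for brevity; they are not the caching policies proposed in the paper.

```python
from collections import OrderedDict


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity of two term sets (1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


class SimilarityCache:
    """LRU cache where a lookup hits on any sufficiently similar key."""

    def __init__(self, capacity: int, threshold: float):
        self.capacity = capacity
        self.threshold = threshold
        self._store = OrderedDict()  # key (frozenset of terms) -> cached value

    def get(self, key: frozenset):
        """Return the value of the most similar cached key, or None on a miss."""
        best_key, best_sim = None, self.threshold
        for cached_key in self._store:  # linear scan; fine for a sketch
            sim = jaccard(key, cached_key)
            if sim >= best_sim:
                best_key, best_sim = cached_key, sim
        if best_key is None:
            return None
        self._store.move_to_end(best_key)  # mark as recently used
        return self._store[best_key]

    def put(self, key: frozenset, value) -> None:
        """Insert or refresh an entry, evicting the least recently used one."""
        self._store[key] = value
        self._store.move_to_end(key)
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)
```

Raising `threshold` trades efficiency for accuracy: fewer requests hit the cache, but each hit returns a result computed for a closer match.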

Journal Article

Classifying search queries using the Web as a source of knowledge

Evgeniy Gabrilovich¹, Andrei Z. Broder¹, Marcus Fontoura², Amruta Joshi³, Vanja Josifovski¹, Lance Riedel¹, Tong Zhang⁴•Institutions (4)

Yahoo!¹, Pontifical Catholic University of Rio de Janeiro², University of California, Los Angeles³, Rutgers University⁴

01 Apr 2009-ACM Transactions on The Web

TL;DR: Empirical evaluation confirms that the proposed methodology yields a considerably higher classification accuracy than previously reported, which will lead to better matching of online ads to rare queries and overall to a better user experience.

Abstract: We propose a methodology for building a robust query classification system that can identify thousands of query classes, while dealing in real time with the query volume of a commercial Web search engine. We use a pseudo relevance feedback technique: given a query, we determine its topic by classifying the Web search results retrieved by the query. Motivated by the needs of search advertising, we primarily focus on rare queries, which are the hardest from the point of view of machine learning, yet in aggregate account for a considerable fraction of search engine traffic. Empirical evaluation confirms that our methodology yields a considerably higher classification accuracy than previously reported. We believe that the proposed methodology will lead to better matching of online ads to rare queries and overall to a better user experience.

61 citations
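
The pseudo relevance feedback step described in the abstract above (classify the search results retrieved by a query, then infer the query's topic from them) can be sketched as a simple vote. The `search` and `classify_text` callables are hypothetical stand-ins for a search backend and a document classifier, and unweighted majority voting is an assumption; the abstract does not say how the result classes are aggregated.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple


def classify_query(
    query: str,
    search: Callable[[str], Iterable[str]],
    classify_text: Callable[[str], str],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Label a query by voting over the classes of its search results.

    `search` and `classify_text` are hypothetical stand-ins for a search
    backend and a document classifier; neither is specified in the abstract.
    Returns up to `top_k` (class, vote share) pairs, most voted first.
    """
    votes = Counter(classify_text(doc) for doc in search(query))
    total = sum(votes.values())
    if total == 0:
        return []  # no results retrieved, so no evidence to vote on
    return [(label, count / total) for label, count in votes.most_common(top_k)]
```

The design point is that a short, rare query carries little signal on its own, while its retrieved documents give the classifier far more text to work with.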

Journal Article

The Hiring Problem and Lake Wobegon Strategies

Andrei Z. Broder¹, Adam Kirsch², Ravi Kumar, Michael Mitzenmacher, Eli Upfal³, Sergei Vassilvitskii¹•Institutions (3)

Yahoo!¹, Harvard University², Brown University³

01 Sep 2009-SIAM Journal on Computing

TL;DR: In this article, the authors introduce the hiring problem, in which a growing company continuously interviews and decides whether to hire applicants, which is similar in spirit but quite different from the well-studied secretary problem.

Abstract: We introduce the hiring problem, in which a growing company continuously interviews and decides whether to hire applicants. This problem is similar in spirit but quite different from the well-studied secretary problem. Like the secretary problem, it captures fundamental aspects of decision making under uncertainty and has many possible applications. We analyze natural strategies of hiring above the current average, considering both the mean and the median averages; we call these Lake Wobegon strategies. Like the hiring problem itself, our strategies are intuitive, simple to describe, and amenable to mathematically and economically significant modifications. We demonstrate several intriguing behaviors of the two strategies. Specifically, we show dramatic differences between hiring above the mean and above the median. We also show that both strategies are intrinsically connected to the lognormal distribution, leading to only very weak concentration results, and the marked importance of the first few hires on the overall outcome.

47 citations
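
The two Lake Wobegon strategies named in the abstract above (hire above the current mean, or above the current median) are easy to simulate. The sketch below assumes applicant quality is uniform on [0, 1), that the company starts with a single hire, and that ties are rejected; these modeling choices are illustrative assumptions and may differ from the setup analyzed in the paper.

```python
import random
import statistics


def simulate_hiring(n_applicants: int, strategy: str, seed: int = 0) -> list:
    """Simulate a Lake Wobegon hiring strategy and return the hired scores.

    Assumptions for this sketch: applicant quality is uniform on [0, 1),
    the first applicant is always hired, and an applicant is hired only if
    their score strictly exceeds the current mean (or median) of the staff.
    """
    if strategy not in ("mean", "median"):
        raise ValueError("strategy must be 'mean' or 'median'")
    rng = random.Random(seed)
    hired = [rng.random()]  # bootstrap the company with one hire
    average = statistics.mean if strategy == "mean" else statistics.median
    for _ in range(n_applicants - 1):
        score = rng.random()
        if score > average(hired):
            hired.append(score)
    return hired
```

Running both strategies over several seeds and comparing `len(hired)` and the final average is one way to explore the differences between the mean and median rules, and the influence of the first few hires, that the abstract reports.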

Proceedings Article

Context transfer in search advertising

Hila Becker¹, Andrei Z. Broder², Evgeniy Gabrilovich², Vanja Josifovski², Bo Pang²•Institutions (2)

Columbia University¹, Yahoo!²

19 Jul 2009

TL;DR: It is concluded that in the vast majority of cases, the user is shown one of three types of pages, which can be accurately distinguished using automatic text classification.

Abstract: We define and study the process of context transfer in search advertising, which is the transition of a user from the context of Web search to the context of the landing page that follows an ad-click. We conclude that in the vast majority of cases, the user is shown one of three types of pages, which can be accurately distinguished using automatic text classification.

43 citations

Proceedings Article

What happens after an ad click?: quantifying the impact of landing pages in web advertising

Hila Becker¹, Andrei Z. Broder², Evgeniy Gabrilovich², Vanja Josifovski², Bo Pang²•Institutions (2)

Columbia University¹, Yahoo!²

02 Nov 2009

TL;DR: The process of context transfer is defined and studied, that is, the user's transition from Web search to the context of the landing page that follows an ad-click, and it is shown that in the vast majority of cases the user is shown one of three types of pages, namely, Homepage, Category browse, and Search transfer.

Abstract: Unbeknownst to most users, when a query is submitted to a search engine two distinct searches are performed: the organic or algorithmic search that returns relevant Web pages and related data (maps, images, etc.), and the sponsored search that returns paid advertisements. While an enormous amount of work has been invested in understanding the user interaction with organic search, surprisingly little research has been dedicated to what happens after an ad is clicked, a situation we aim to correct. To this end, we define and study the process of context transfer, that is, the user's transition from Web search to the context of the landing page that follows an ad-click. We conclude that in the vast majority of cases the user is shown one of three types of pages, namely, Homepage (the homepage of the advertiser), Category browse (a browse-able sub-catalog related to the original query), and Search transfer (the search results of the same query re-executed on the target site). We show that these three types of landing pages can be accurately distinguished using automatic text classification. Finally, using such an automatic classifier, we correlate the landing page type with conversion data provided by advertisers, and show that the conversion rate (i.e., users' response rate to ads) varies considerably according to the type. We believe our findings will further the understanding of users' response to search advertising in general, and landing pages in particular, and thus help advertisers improve their Web sites and help search engines select the most suitable ads.

27 citations

Proceedings Article

A search-based method for forecasting ad impression in contextual advertising

Xuerui Wang¹, Andrei Z. Broder², Marcus Fontoura², Vanja Josifovski²•Institutions (2)

University of Massachusetts Amherst¹, Yahoo!²

20 Apr 2009

TL;DR: Experimental results show that the approach can accurately forecast the expected number of impressions of contextual ads in real time; the paper also shows how this method can be used in tools for bid selection and ad evaluation.

Abstract: Contextual advertising (also called content match) refers to the placement of small textual ads within the content of a generic web page. It has become a significant source of revenue for publishers ranging from individual bloggers to major newspapers. At the same time it is an important way for advertisers to reach their intended audience. This reach depends on the total number of exposures of the ad (impressions) and its click-through-rate (CTR) that can be viewed as the probability of an end-user clicking on the ad when shown. These two orthogonal, critical factors are both difficult to estimate and even individually can still be very informative and useful in planning and budgeting advertising campaigns. In this paper, we address the problem of forecasting the number of impressions for new or changed ads in the system. Producing such forecasts, even within large margins of error, is quite challenging: 1) ad selection in contextual advertising is a complicated process based on tens or even hundreds of page and ad features; 2) the publishers' content and traffic vary over time; and 3) the scale of the problem is daunting: over a course of a week it involves billions of impressions, hundreds of millions of distinct pages, hundreds of millions of ads, and varying bids of other competing advertisers. We tackle these complexities by simulating the presence of a given ad with its associated bid over weeks of historical data. We obtain an impression estimate by counting how many times the ad would have been displayed if it were in the system over that period of time. We estimate this count by an efficient two-level search algorithm over the distinct pages in the data set. Experimental results show that our approach can accurately forecast the expected number of impressions of contextual ads in real time. We also show how this method can be used in tools for bid selection and ad evaluation.

23 citations
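
The counting step in the abstract above (how many times would the ad have been displayed had it been in the system over the historical period) can be illustrated with a brute-force replay over distinct pages. This is not the paper's two-level search algorithm, which the abstract does not detail. The `PageRecord` schema, the `relevance * bid` ranking score, and the rule that an ad is shown when it beats the weakest winning score are all hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class PageRecord:
    """One distinct page from the historical log (hypothetical schema)."""
    page_id: str
    views: int                   # how many times the page was served
    lowest_winning_score: float  # weakest ad score that still earned a slot


def forecast_impressions(
    pages: Iterable[PageRecord],
    relevance: Callable[[str], float],
    bid: float,
) -> int:
    """Count the views on which the candidate ad would have been shown.

    Assumes an ad's ranking score is relevance * bid and that the ad is
    displayed whenever this beats the weakest score that won a slot.
    """
    total = 0
    for page in pages:
        if relevance(page.page_id) * bid > page.lowest_winning_score:
            total += page.views
    return total
```

Calling the function for a range of `bid` values gives a rough bid-versus-impressions curve, which is the kind of bid selection tool the abstract mentions.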

Patent

Method and system for quantifying user interactions with web advertisements

Deepak Agarwal¹, Vanja Josifovski, Andrei Z. Broder, Evgeniy Gabrilovich¹, Rob Hall•Institutions (1)

Yahoo!¹

20 Feb 2009

TL;DR: In this article, a click-through-rate probability for a web advertisement to be placed on the web document may be estimated based on the one or more expert statistical models, and associated weightings may be determined based, at least in part, on the features detected.

Abstract: Methods and systems are provided that may be used to determine a probability of whether a visitor to a web document is likely to click on a web advertisement. An exemplary method may include detecting one or more features in a web document. One or more expert statistical models to which the web document belongs may be determined and associated weightings may be determined based, at least in part, on the one or more features detected. A click-through-rate probability for a web advertisement to be placed on the web document may be estimated based on the one or more expert statistical models.

23 citations

Patent

System and method for retargeting advertisements based on previously captured relevance data

Vanja Josifovsky¹, George Hu, Jianchang Jc Mao, Majid Mohazzab¹, Andrei Z. Broder•Institutions (1)

Yahoo!¹

30 Nov 2009

TL;DR: In this paper, the content server classifies the primary webpage for content and retrieves persistent relevance information, possibly including a referrer of the primary webpage comprising a URL address of the referring webpage, a listing of other recently visited webpages, a listing of any bid phrases from previously displayed advertisements, and recent click data.

Abstract: Methods for selecting one or more advertisements based on previously captured relevance data to serve to a client system requesting a primary webpage are provided. The client displays a referring webpage having a hyperlink to the primary webpage. Upon selection of the hyperlink, the client sends a request to a content server storing the primary webpage. The content server classifies the primary webpage for content and retrieves persistent relevance information, possibly including a referrer of the primary webpage comprising a URL address of the referring webpage, a listing of other recently visited webpages, a listing of any bid phrases from previously displayed advertisements, and a listing of recent click data. The content server sends the primary webpage to the client, which includes an advertisement server request. The transaction between the content server and the advertisement server includes persistent relevance information to select advertisements to serve to the client.

22 citations

Patent

Prediction of a degree of relevance between query rewrites and a search query

Evgeniy Gabrilovich¹, Donald Metzler¹, Vanja Josifovski¹, Andrei Z. Broder¹, Vassilis Plachouras¹, Vanessa Murdock¹, Massimiliano Ciaramita¹•Institutions (1)

Yahoo!¹

25 Jun 2009

TL;DR: In this article, a predictor for determining a degree of relevance between a query rewrite and a search query is provided, where the predictor may receive a query from a user via a terminal and identify a set of candidate query rewrites associated with the search query.

Abstract: A predictor for determining a degree of relevance between a query rewrite and a search query is provided. The predictor may receive a search query from a user via a terminal and identify a set of candidate query rewrites associated with the search query. The predictor may then extract a set of features from advertisements associated with the query rewrites and the search query and determine a degree of relevance between the advertisements and the search query based on a prediction model. The predictor may then determine the degree of relevance between the rewrites and the search query based on the determined degree of relevance between the advertisements and the search query.

16 citations

Patent

Term Weighting for Contextual Advertising

Donald Metzler¹, Andrei Z. Broder, Vanja Josifovski, Kishore Papineni¹, Alexander J. Smola, George Mavromatis, Evgeniy Gabrilovich¹•Institutions (1)

Yahoo!¹

19 Oct 2009

TL;DR: In this article, a contextual advertising system selects online advertisements for display on a network location by transforming page content of a page received in a platform over a network into a textual representation.

Abstract: A contextual advertising system selects online advertisements for display on a network location. The system may transform page content of a page received in a platform over a network into a textual representation. In addition, the system may transform received site content of a site into a site signature. The site includes the page. The system then may correct the textual representation utilizing the site signature to produce a modified textual representation. The system may utilize the modified textual representation to select an online advertisement. Considering a page in the context of the entire website to which it belongs leads to better understanding and interpretation of the page topic(s) and thus yields more accurate ad matching.

Proceedings Article

Cross-language query classification using web search for exogenous knowledge

Xuerui Wang¹, Andrei Z. Broder², Evgeniy Gabrilovich², Vanja Josifovski², Bo Pang²•Institutions (2)

University of Massachusetts Amherst¹, Yahoo!²

09 Feb 2009

TL;DR: This study proposes a robust method for classifying non-English queries into an English taxonomy, using an existing English text classifier and off-the-shelf machine translation systems, and shows that by considering the Web search results in the query's original language as additional sources of information, it can alleviate the effect of erroneous machine translation.

Abstract: The non-English Web is growing at phenomenal speed, but available language processing tools and resources are predominantly English-based. Taxonomies are a case in point: while there are plenty of commercial and non-commercial taxonomies for the English Web, taxonomies for other languages are either not available or of arguable quality. Given that building comprehensive taxonomies for each language is prohibitively expensive, it is natural to ask whether existing English taxonomies can be leveraged, possibly via machine translation, to enable text processing tasks in other languages. Our experimental results confirm that the answer is affirmative with respect to at least one task. In this study we focus on query classification, which is essential for understanding the user intent both in Web search and in online advertising. We propose a robust method for classifying non-English queries into an English taxonomy, using an existing English text classifier and off-the-shelf machine translation systems. In particular, we show that by considering the Web search results in the query's original language as additional sources of information, we can alleviate the effect of erroneous machine translation. Empirical evaluation on query sets in languages as diverse as Chinese and Russian yields very encouraging results; consequently, we believe that our approach is also applicable to many additional languages.

Patent

Non-exact cache matching

Andrei Z. Broder¹, Vanja Josifovski¹, Shanmugasundaram Ravikumar¹, Sandeep Pandey¹, Serguei Vassilvitskii¹, Flavio Chierichetti¹•Institutions (1)

Yahoo!¹

12 Feb 2009

TL;DR: The disclosed subject matter relates to returning cached object results based at least in part on a non-exact comparison with a query key.

Abstract: The subject matter disclosed herein relates to returning cached object results based at least in part on a non-exact comparison with a query key.

Patent

Identification of related bid phrases and categories using co-bidding information

Vanja Josifovski¹, Andrei Z. Broder¹, Patrick Pantel¹, Ana-Maria Popescu¹, Evgeniy Gabrilovich¹, William Swei Chang¹•Institutions (1)

Yahoo!¹

13 May 2009

TL;DR: In this paper, a method and system for determining related bid terms is presented, which includes accessing a term database to determine a plurality of term pairs, the term pairs being paired terms bidded together in a term bidding operating environment.

Abstract: The present invention provides a method and system for determining related bid terms. The method and system includes accessing a term database to determine a plurality of term pairs, the term pairs being paired terms bidded together in a term bidding operating environment. In the method and system, for each of the plurality of term pairs, the method and system includes determining similarity values for each of the term pairs. The method and system further includes generating a similarity matrix using the determined similarity values. And, the method and system includes generating an output result based on a co-bidded relationship between at least one of the terms and advertising information.
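
The steps in the abstract above (find term pairs that are bid on together, compute a similarity value per pair, collect the values into a similarity matrix) can be sketched as follows. The input mapping `bids_by_advertiser` and the use of Jaccard similarity over the sets of advertisers bidding on each term are assumptions for illustration; the abstract does not name the similarity measure. The matrix is stored sparsely as a dictionary keyed by alphabetically ordered term pairs.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def cobid_similarity(
    bids_by_advertiser: Dict[str, Set[str]],
) -> Dict[Tuple[str, str], float]:
    """Jaccard similarity for every pair of terms that share an advertiser."""
    # Invert the input: for each term, which advertisers bid on it?
    advertisers_by_term: Dict[str, Set[str]] = {}
    for advertiser, terms in bids_by_advertiser.items():
        for term in terms:
            advertisers_by_term.setdefault(term, set()).add(advertiser)

    similarity: Dict[Tuple[str, str], float] = {}
    for terms in bids_by_advertiser.values():
        for a, b in combinations(sorted(terms), 2):  # co-bidded pairs only
            if (a, b) not in similarity:
                shared = advertisers_by_term[a] & advertisers_by_term[b]
                either = advertisers_by_term[a] | advertisers_by_term[b]
                similarity[(a, b)] = len(shared) / len(either)
    return similarity
```

Pairs that never appear under the same advertiser are simply absent from the result, which keeps the structure small when most terms are unrelated.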
