Institution

Yahoo!

Company•London, United Kingdom

About: Yahoo! is a company based in London, United Kingdom. It is known for research contributions on the topics Population and Web search query. The organization has 26749 authors who have published 29915 publications, which have received 732583 citations. The organization is also known as Yahoo! Inc. and Maudwen-Yahoo! Inc.

Topics: Population, Web search query, Web page, Web query classification, Query expansion

Papers

Proceedings Article

Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms

Lihong Li¹, Wei Chu¹, John Langford¹, Xuanhui Wang¹

Yahoo!¹

09 Feb 2011

TL;DR: The authors introduce a replay methodology for contextual bandit algorithm evaluation that is completely data-driven, easy to adapt to different applications, and provably unbiased.

Abstract: Contextual bandit algorithms have become popular for online recommendation systems such as Digg, Yahoo! Buzz, and news recommendation in general. Offline evaluation of the effectiveness of new algorithms in these applications is critical for protecting online user experiences but very challenging due to their "partial-label" nature. Common practice is to create a simulator which simulates the online environment for the problem at hand and then run an algorithm against this simulator. However, creating the simulator itself is often difficult, and modeling bias is usually unavoidable. In this paper, we introduce a replay methodology for contextual bandit algorithm evaluation. Unlike simulator-based approaches, our method is completely data-driven and very easy to adapt to different applications. More importantly, our method can provide provably unbiased evaluations. Our empirical results on a large-scale news article recommendation dataset collected from Yahoo! Front Page conform well with our theoretical results. Furthermore, comparisons between our offline replay and online bucket evaluation of several contextual bandit algorithms show the accuracy and effectiveness of our offline evaluation method.
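
The abstract does not spell out the replay procedure, so the following is only a minimal sketch of one common reading of it, not the authors' implementation. It assumes the logged actions were chosen uniformly at random and independently of the context; under that assumption, an event is kept only when the evaluated policy picks the same action that was logged. The names `replay_evaluate`, `threshold_policy`, and `toy_log` are hypothetical and exist only for illustration.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# One logged interaction: (context features, action shown, observed reward).
# Assumption: the logged action was drawn uniformly at random.
Event = Tuple[List[float], int, float]


def replay_evaluate(
    select: Callable[[List[float]], int],
    events: Iterable[Event],
    update: Optional[Callable[[List[float], int, float], None]] = None,
) -> Tuple[float, int]:
    """Return (average reward over matched events, number of matched events)."""
    matched = 0
    total_reward = 0.0
    for context, logged_action, reward in events:
        # Skip events where the policy disagrees with the log: the reward
        # for the policy's own choice was never observed ("partial label").
        if select(context) != logged_action:
            continue
        matched += 1
        total_reward += reward
        # A learning algorithm may update itself on matched events only.
        if update is not None:
            update(context, logged_action, reward)
    estimate = total_reward / matched if matched else 0.0
    return estimate, matched


def threshold_policy(context: List[float]) -> int:
    """Hypothetical fixed policy: action 0 for large first feature, else 1."""
    return 0 if context[0] > 0.5 else 1


# Hypothetical toy log, hand-written for illustration (not real data).
toy_log: List[Event] = [
    ([1.0], 0, 1.0),
    ([1.0], 1, 0.0),
    ([0.0], 1, 1.0),
    ([0.0], 0, 0.0),
]

estimate, matched = replay_evaluate(threshold_policy, toy_log)
print(estimate, matched)
```

Only the matched events contribute to the estimate, so a usable log has to be large enough that plenty of events survive the filter.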

412 citations

Proceedings Article

Discovering geographical topics in the twitter stream

Liangjie Hong¹, Amr Ahmed², Siva Gurumurthy³, Alexander J. Smola², Kostas Tsioutsiouliklis³

Lehigh University¹, Yahoo!², Twitter³

16 Apr 2012

TL;DR: The authors present an algorithm that models diversity in tweets based on topical diversity, geographical diversity, and the user's interest distribution, exploiting sparse factorial coding of the attributes to handle a large and diverse set of covariates efficiently.

Abstract: Micro-blogging services have become indispensable communication tools for online users for disseminating breaking news, eyewitness accounts, individual expression, and protest groups. Recently, Twitter, along with other online social networking services such as Foursquare, Gowalla, Facebook and Yelp, has started supporting location services in its messages, either explicitly, by letting users choose their places, or implicitly, by enabling geo-tagging, which associates messages with latitudes and longitudes. This functionality allows researchers to address an exciting set of questions: 1) How is information created and shared across geographical locations, 2) How do spatial and linguistic characteristics of people vary across regions, and 3) How to model human mobility. Although many attempts have been made to tackle these problems, previous methods are either too complicated to implement or so oversimplified that they cannot yield reasonable performance. It is a challenging task to discover topics and identify users' interests from these geo-tagged messages due to the sheer amount of data and the diversity of language variations used on these location sharing services. In this paper we focus on Twitter and present an algorithm that models diversity in tweets based on topical diversity, geographical diversity, and an interest distribution of the user. Furthermore, we take the Markovian nature of a user's location into account. Our model exploits sparse factorial coding of the attributes, thus allowing us to deal with a large and diverse set of covariates efficiently. Our approach is vital for applications such as user profiling, content recommendation and topic tracking. We show high accuracy in location estimation based on our model. Moreover, the algorithm identifies interesting topics based on location and language.

407 citations

Proceedings Article

Spectral clustering for multi-type relational data

Bo Long¹, Zhongfei Zhang¹, Xiaoyun Wu², Philip S. Yu³

Binghamton University¹, Yahoo!², IBM³

25 Jun 2006

TL;DR: A general model, the collective factorization on related matrices, is proposed for multi-type relational data clustering, and a novel algorithm, the spectral relational clustering, is derived to cluster multi-type interrelated data objects simultaneously.

Abstract: Clustering on multi-type relational data has attracted more and more attention in recent years due to its high impact on various important applications, such as Web mining, e-commerce and bioinformatics. However, the research on general multi-type relational data clustering is still limited and preliminary. The contribution of the paper is three-fold. First, we propose a general model, the collective factorization on related matrices, for multi-type relational data clustering. The model is applicable to relational data with various structures. Second, under this model, we derive a novel algorithm, the spectral relational clustering, to cluster multi-type interrelated data objects simultaneously. The algorithm iteratively embeds each type of data object into a low-dimensional space and benefits from the interactions among the hidden structures of different types of data objects. Extensive experiments demonstrate the promise and effectiveness of the proposed algorithm. Third, we show that the existing spectral clustering algorithms can be considered special cases of the proposed model and algorithm. This demonstrates the theoretical generality of the proposed model and algorithm.

406 citations

Proceedings Article

Like like alike: joint friendship and interest propagation in social networks

Shuang-Hong Yang¹, Bo Long², Alexander J. Smola², Narayanan Sadagopan², Zhaohui Zheng², Hongyuan Zha¹

Georgia Institute of Technology¹, Yahoo!²

28 Mar 2011

TL;DR: This paper proposes a framework that exploits homophily to establish an integrated network linking a user to interested services and connecting different users with common interests, upon which both friendship and interests could be efficiently propagated.

Abstract: Targeting interest to match a user with services (e.g. news, products, games, advertisements) and predicting friendship to build connections among users are two fundamental tasks for social network systems. In this paper, we show that the information contained in interest networks (i.e. user-service interactions) and friendship networks (i.e. user-user connections) is highly correlated and mutually helpful. We propose a framework that exploits homophily to establish an integrated network linking a user to interested services and connecting different users with common interests, upon which both friendship and interests could be efficiently propagated. The proposed friendship-interest propagation (FIP) framework devises a factor-based random walk model to explain friendship connections, and simultaneously it uses a coupled latent factor model to uncover interest interactions. We discuss the flexibility of the framework in the choices of loss objectives and regularization penalties and benchmark different variants on the Yahoo! Pulse social networking system. Experiments demonstrate that by coupling friendship with interest, FIP achieves much higher performance on both interest targeting and friendship prediction than systems using only one source of information.

403 citations

Posted Content

A Reductions Approach to Fair Classification

Alekh Agarwal¹, Alina Beygelzimer², Miroslav Dudík¹, John Langford¹, Hanna Wallach¹

Microsoft¹, Yahoo!²

06 Mar 2018 - arXiv: Learning

TL;DR: The key idea is to reduce fair classification to a sequence of cost-sensitive classification problems, whose solutions yield a randomized classifier with the lowest (empirical) error subject to the desired constraints.

Abstract: We present a systematic approach for achieving fairness in a binary classification setting. While we focus on two well-known quantitative definitions of fairness, our approach encompasses many other previously studied definitions as special cases. The key idea is to reduce fair classification to a sequence of cost-sensitive classification problems, whose solutions yield a randomized classifier with the lowest (empirical) error subject to the desired constraints. We introduce two reductions that work for any representation of the cost-sensitive classifier and compare favorably to prior baselines on a variety of data sets, while overcoming several of their disadvantages.

402 citations

Authors

Showing all 26766 results

Name	H-index	Papers	Citations
Ashok Kumar	151	5654	164086
Alexander J. Smola	122	434	110222
Howard I. Maibach	116	1821	60765
Sanjay Jain	103	881	46880
Amirhossein Sahebkar	100	1307	46132
Marc Davis	99	412	50243
Wenjun Zhang	96	976	38530
Jian Xu	94	1366	52057
Fortunato Ciardiello	94	695	47352
Tong Zhang	93	414	36519
Michael E. J. Lean	92	411	30939
Ashish K. Jha	87	503	30020
Xin Zhang	87	1714	40102
Theunis Piersma	86	632	34201
George Varghese	84	253	28598