Institution

Yahoo!

Company · London, United Kingdom

About: Yahoo! is a company based in London, United Kingdom. It is known for research contributions in the topics of Population and Web search query. The organization has 26,749 authors who have published 29,915 publications, receiving 732,583 citations. The organization is also known as: Yahoo! Inc. & Maudwen-Yahoo! Inc.


Papers
Proceedings ArticleDOI
Deepak Agarwal, Bee-Chung Chen
04 Feb 2010
TL;DR: The authors proposed fLDA, a novel matrix factorization method to predict ratings in recommender system applications where a "bag-of-words" representation for item meta-data is natural. Such scenarios are commonplace in web applications like content recommendation, ad targeting, and web search, where items are articles, ads, and web pages respectively. Because of data sparseness, regularization is key to good predictive accuracy.
Abstract: We propose fLDA, a novel matrix factorization method to predict ratings in recommender system applications where a "bag-of-words" representation for item meta-data is natural. Such scenarios are commonplace in web applications like content recommendation, ad targeting, and web search, where items are articles, ads, and web pages respectively. Because of data sparseness, regularization is key to good predictive accuracy. Our method works by regularizing both user and item factors simultaneously through user features and the bag of words associated with each item. Specifically, each word in an item is associated with a discrete latent factor often referred to as the topic of the word; item topics are obtained by averaging topics across all words in an item. Then, a user's rating on an item is modeled as the user's affinity to the item's topics, where user affinity to topics (user factors) and topic assignments to words in items (item factors) are learned jointly in a supervised fashion. To avoid overfitting, user and item factors are regularized through Gaussian linear regression and Latent Dirichlet Allocation (LDA) priors respectively. We show our model is accurate, interpretable, and handles both cold-start and warm-start scenarios seamlessly through a single model. The efficacy of our method is illustrated on benchmark datasets and a new dataset from Yahoo! Buzz, where fLDA provides superior predictive accuracy in cold-start scenarios and is comparable to state-of-the-art methods in warm-start scenarios. As a by-product, fLDA also identifies interesting topics that explain user-item interactions. Our method also generalizes a recently proposed technique called supervised LDA (sLDA) to collaborative filtering applications. While sLDA estimates item topic vectors in a supervised fashion for a single regression, fLDA incorporates multiple regressions (one for each user) in estimating the item factors.
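The rating rule the abstract describes is simple to state. Below is a minimal sketch of it in Python, with toy sizes and random placeholders standing in for everything the paper would actually learn (the topic count, per-word topic assignments, and user affinities are all assumptions, not the paper's implementation):

```python
import numpy as np

K = 5                                    # number of latent topics (assumed)
rng = np.random.default_rng(0)

# Topic assignment for each word in one item (learned via the LDA prior
# in the paper; random here purely for illustration).
word_topics = rng.integers(0, K, size=12)

# Item factor: the item's empirical topic distribution, i.e. the average
# of its words' topic indicators.
item_factor = np.bincount(word_topics, minlength=K) / word_topics.size

# User factor: the user's affinity to each topic (learned jointly in the
# paper, regularized through a Gaussian linear-regression prior).
user_affinity = rng.normal(size=K)

# Predicted rating = user's topic affinity dotted with the item's topic mix.
predicted_rating = user_affinity @ item_factor
print(f"predicted rating: {predicted_rating:.3f}")
```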

296 citations

Patent
01 Nov 2006
TL;DR: In this patent, a GPS coordinate and a search criterion are received from a client device associated with a member of a social network, and a route is determined between the start and end locations and through a location of interest.
Abstract: A device, system, and method are directed towards providing location information from a social network. A GPS coordinate and a search criterion are received from a client device associated with a member of a social network. The social network is searched for another member associated with a location name based on the GPS coordinate and the search criterion. The location name may be a sponsored advertisement. The location name is provided to the client device. A communication may be enabled between the member and the other member. Moreover, a start and end location may also be received. The GPS coordinate and/or search criterion may be associated with either the start or end location. The searched location name is used to determine a location of interest. A route is determined between the start and end location and through the location of interest. The route is provided to the client device.
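A hypothetical sketch of the claimed flow, with every name and type invented for illustration (the patent specifies no code): match members' location names against the search criterion near the given GPS coordinate, then treat a match as the location of interest for routing.

```python
from dataclasses import dataclass

# Invented data model for the claims above; not taken from the patent.
@dataclass
class Member:
    name: str
    location_name: str
    lat: float
    lon: float

def find_location_names(members, gps, criterion, radius_deg=0.1):
    """Search the social network for members whose location name matches
    the criterion and whose location is near the GPS coordinate."""
    lat, lon = gps
    return [
        m.location_name
        for m in members
        if criterion.lower() in m.location_name.lower()
        and abs(m.lat - lat) < radius_deg
        and abs(m.lon - lon) < radius_deg
    ]

members = [Member("alice", "Joe's Coffee", 37.42, -122.08),
           Member("bob", "City Gym", 37.43, -122.07)]
print(find_location_names(members, gps=(37.42, -122.08), criterion="coffee"))
# A full system would then compute a route from the start to the end
# location that passes through the matched location of interest.
```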

296 citations

Proceedings ArticleDOI
02 Nov 2009
TL;DR: Two different distributed methods that generate exact stochastic GBDT models are presented: the first is a MapReduce implementation and the second utilizes MPI on the Hadoop grid environment.
Abstract: Stochastic Gradient Boosted Decision Trees (GBDT) is one of the most widely used learning algorithms in machine learning today. It is adaptable, easy to interpret, and produces highly accurate models. However, most implementations today are computationally expensive and require all training data to be in main memory. As training data becomes ever larger, there is motivation for us to parallelize the GBDT algorithm. Parallelizing decision tree training is intuitive and various approaches have been explored in existing literature. Stochastic boosting, on the other hand, is inherently a sequential process and has not been applied to distributed decision trees. In this work, we present two different distributed methods that generate exact stochastic GBDT models: the first is a MapReduce implementation and the second utilizes MPI on the Hadoop grid environment.
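The abstract does not spell out the algorithms, but distributed tree induction is commonly built on per-partition histogram aggregation. The sketch below shows that generic map/reduce pattern for a single split decision; it is an assumption about the general approach, not the paper's exact method:

```python
from collections import defaultdict

def map_histograms(partition, n_bins=16):
    """Mapper: per-partition histogram of (gradient sum, count) for each
    (feature, bin) pair. Feature values are assumed to lie in [0, 1)."""
    hist = defaultdict(lambda: [0.0, 0])
    for features, gradient in partition:
        for f_idx, value in enumerate(features):
            b = min(int(value * n_bins), n_bins - 1)
            hist[(f_idx, b)][0] += gradient
            hist[(f_idx, b)][1] += 1
    return hist

def reduce_histograms(histograms):
    """Reducer: element-wise sum of the mappers' histograms."""
    total = defaultdict(lambda: [0.0, 0])
    for hist in histograms:
        for key, (g, c) in hist.items():
            total[key][0] += g
            total[key][1] += c
    return total

# Two "workers", each holding its own horizontal slice of the training data.
part_a = [([0.1, 0.9], -1.0), ([0.2, 0.8], -0.5)]
part_b = [([0.7, 0.1], 0.8), ([0.9, 0.2], 1.2)]
merged = reduce_histograms([map_histograms(part_a), map_histograms(part_b)])
# A split finder would scan `merged` per feature to pick the best threshold.
print(sorted(merged.items())[:4])
```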

296 citations

Proceedings ArticleDOI
05 Jul 2008
TL;DR: A highly efficient solver for the particular instance of semidefinite programming that arises in LMNN classification is described; this solver can handle problems with billions of large margin constraints in a few hours.
Abstract: In this paper we study how to improve nearest neighbor classification by learning a Mahalanobis distance metric. We build on a recently proposed framework for distance metric learning known as large margin nearest neighbor (LMNN) classification. Our paper makes three contributions. First, we describe a highly efficient solver for the particular instance of semidefinite programming that arises in LMNN classification; our solver can handle problems with billions of large margin constraints in a few hours. Second, we show how to reduce both training and testing times using metric ball trees; the speedups from ball trees are further magnified by learning low dimensional representations of the input space. Third, we show how to learn different Mahalanobis distance metrics in different parts of the input space. For large data sets, the use of locally adaptive distance metrics leads to even lower error rates.
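For context, the quantity LMNN learns is a Mahalanobis metric d(x, y)² = (x − y)ᵀM(x − y) with M = LᵀL. The sketch below, with L fixed by hand purely for illustration, shows that this equals squared Euclidean distance after mapping inputs through L:

```python
import numpy as np

L = np.array([[2.0, 0.0],      # a hypothetical learned linear map
              [0.5, 1.0]])
M = L.T @ L                    # positive semidefinite by construction

def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis distance (x - y)^T M (x - y)."""
    d = x - y
    return float(d @ M @ d)

x = np.array([1.0, 2.0])
y = np.array([2.0, 0.5])
print(mahalanobis_sq(x, y, M))
print(float(np.sum((L @ x - L @ y) ** 2)))   # same value via the mapped space
```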

295 citations

Proceedings ArticleDOI
06 Jun 2011
TL;DR: This paper studies the ranking algorithm in the random arrivals model, and shows that it has a competitive ratio of at least 0.696, beating the 1-1/e ≈ 0.632 barrier in the adversarial model.
Abstract: In a seminal paper, Karp, Vazirani, and Vazirani show that a simple ranking algorithm achieves a competitive ratio of 1-1/e for the online bipartite matching problem in the standard adversarial model, where the ratio of 1-1/e is also shown to be optimal. Their result also implies that in the random arrivals model defined by Goel and Mehta, where the online nodes arrive in a random order, a simple greedy algorithm achieves a competitive ratio of 1-1/e. In this paper, we study the ranking algorithm in the random arrivals model, and show that it has a competitive ratio of at least 0.696, beating the 1-1/e ≈ 0.632 barrier in the adversarial model. Our result also extends to the i.i.d. distribution model of Feldman et al., removing the assumption that the distribution is known. Our analysis has two main steps. First, we exploit certain dominance and monotonicity properties of the ranking algorithm to derive a family of factor-revealing linear programs (LPs). In particular, by symmetry of the ranking algorithm in the random arrivals model, we have the monotonicity property on both sides of the bipartite graph, giving good "strength" to the LPs. Second, to obtain a good lower bound on the optimal values of all these LPs and hence on the competitive ratio of the algorithm, we introduce the technique of strongly factor-revealing LPs. In particular, we derive a family of modified LPs with similar strength such that the optimal value of any single one of these new LPs is a lower bound on the competitive ratio of the algorithm. This enables us to leverage the power of computer LP solvers to solve for large instances of the new LPs to establish bounds that would otherwise be difficult to attain by human analysis.
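The ranking algorithm itself is one line to state: fix a random permutation (ranking) of the offline vertices, and match each arriving online vertex to its highest-ranked unmatched neighbor. A toy sketch, with the instance made up for illustration:

```python
import random

def ranking_match(offline, arrivals, rng):
    """KVV ranking: lower rank number = higher priority."""
    rank = {v: i for i, v in enumerate(rng.sample(offline, len(offline)))}
    matched = {}                                  # offline vertex -> online vertex
    for u, neighbors in arrivals:                 # online vertices arrive in order
        free = [v for v in neighbors if v not in matched]
        if free:
            matched[min(free, key=rank.get)] = u  # best-ranked free neighbor
    return matched

offline = ["a", "b", "c"]
arrivals = [("u1", ["a", "b"]), ("u2", ["a"]), ("u3", ["b", "c"])]
print(ranking_match(offline, arrivals, random.Random(0)))
# In the random arrivals model the arrival order is itself a uniform random
# permutation; the paper shows ranking then achieves at least 0.696.
```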

295 citations


Authors


Name                   H-index   Papers   Citations
Ashok Kumar            151       5,654    164,086
Alexander J. Smola     122       434      110,222
Howard I. Maibach      116       1,821    60,765
Sanjay Jain            103       881      46,880
Amirhossein Sahebkar   100       1,307    46,132
Marc Davis             99        412      50,243
Wenjun Zhang           96        976      38,530
Jian Xu                94        1,366    52,057
Fortunato Ciardiello   94        695      47,352
Tong Zhang             93        414      36,519
Michael E. J. Lean     92        411      30,939
Ashish K. Jha          87        503      30,020
Xin Zhang              87        1,714    40,102
Theunis Piersma        86        632      34,201
George Varghese        84        253      28,598
Network Information
Related Institutions (5)
University of Toronto: 294.9K papers, 13.5M citations, 85% related
University of California, San Diego: 204.5K papers, 12.3M citations, 85% related
University College London: 210.6K papers, 9.8M citations, 84% related
Cornell University: 235.5K papers, 12.2M citations, 84% related
University of Washington: 305.5K papers, 17.7M citations, 84% related

Performance Metrics
No. of papers from the Institution in previous years
Year    Papers
2023    2
2022    47
2021    1,088
2020    1,074
2019    1,568
2018    1,352