Institution
Yahoo!
Company • London, United Kingdom
About: Yahoo! is a company based in London, United Kingdom. It is known for research contributions in the topics Population & Web search query. The organization has 26749 authors who have published 29915 publications receiving 732583 citations. The organization is also known as Yahoo! Inc. and Maudwen-Yahoo! Inc.
Papers published on a yearly basis
Papers
04 Feb 2010
TL;DR: The authors propose fLDA, a novel matrix factorization method to predict ratings in recommender system applications where a "bag-of-words" representation for item meta-data is natural. Such scenarios are commonplace in web applications like content recommendation, ad targeting, and web search, where items are articles, ads, and web pages respectively. Because of data sparseness, regularization is key to good predictive accuracy.
Abstract: We propose fLDA, a novel matrix factorization method to predict ratings in recommender system applications where a "bag-of-words" representation for item meta-data is natural. Such scenarios are commonplace in web applications like content recommendation, ad targeting, and web search, where items are articles, ads, and web pages respectively. Because of data sparseness, regularization is key to good predictive accuracy. Our method works by regularizing both user and item factors simultaneously through user features and the bag of words associated with each item. Specifically, each word in an item is associated with a discrete latent factor, often referred to as the topic of the word; item topics are obtained by averaging topics across all words in an item. Then, user rating on an item is modeled as the user's affinity to the item's topics, where user affinity to topics (user factors) and topic assignments to words in items (item factors) are learned jointly in a supervised fashion. To avoid overfitting, user and item factors are regularized through Gaussian linear regression and Latent Dirichlet Allocation (LDA) priors, respectively. We show our model is accurate, interpretable, and handles both cold-start and warm-start scenarios seamlessly through a single model. The efficacy of our method is illustrated on benchmark datasets and a new dataset from Yahoo! Buzz, where fLDA provides superior predictive accuracy in cold-start scenarios and is comparable to state-of-the-art methods in warm-start scenarios. As a by-product, fLDA also identifies interesting topics that explain user-item interactions. Our method also generalizes a recently proposed technique called supervised LDA (sLDA) to collaborative filtering applications. While sLDA estimates item topic vectors in a supervised fashion for a single regression, fLDA incorporates multiple regressions (one for each user) in estimating the item factors.
296 citations
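The prediction step the fLDA abstract describes, a user's topic affinity applied to an item's averaged word topics, can be sketched as follows. Everything here (dimensions, vocabulary, factor values, the one-hot topic assignments) is a hypothetical stand-in, not learned fLDA parameters:

```python
import numpy as np

# Hypothetical toy setup; fLDA learns these factors jointly, this only
# sketches the prediction step described in the abstract.
n_topics = 4
vocab = {"sports": 0, "game": 1, "election": 2, "vote": 3}

# Topic assignment per word (item-side factors): here a fixed one-hot per word.
word_topic = np.eye(n_topics)  # word i -> topic i, a stand-in for learned assignments

# An item is a bag of words; its topic vector is the average over its words.
item_words = ["sports", "game", "game"]
z_bar = np.mean([word_topic[vocab[w]] for w in item_words], axis=0)

# A user's affinity to topics (user factors), regularized via user features in fLDA.
user_affinity = np.array([0.9, 0.5, -0.2, 0.1])

# Predicted rating: the user's affinity applied to the item's topic vector.
rating = user_affinity @ z_bar
print(round(float(rating), 3))  # -> 0.633
```

The supervised twist the abstract emphasizes is that the topic assignments themselves are fit to predict ratings, not just to model the text, which is what lets one model cover both cold-start and warm-start items.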
01 Nov 2006
TL;DR: In this paper, a GPS coordinate and a search criterion are received from a client device associated with a member of a social network, and a route is determined between the start and end location and through the location of interest.
Abstract: A device, system, and method are directed towards providing location information from a social network. A GPS coordinate and a search criterion are received from a client device associated with a member of a social network. The social network is searched for another member associated with a location name based on the GPS coordinate and the search criterion. The location name may be a sponsored advertisement. The location name is provided to the client device. A communication may be enabled between the member and the other member. Moreover, a start and end location may also be received. The GPS coordinate and/or search criterion may be associated with either the start or end location. The searched location name is used to determine a location of interest. A route is determined between the start and end location and through the location of interest. The route is provided to the client device.
296 citations
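The core lookup the patent abstract describes (given a member's GPS coordinate and a search criterion, return another member's location name nearby) might be sketched like this. The member records, tags, and the 2 km cutoff are invented for illustration:

```python
import math

# Hypothetical member directory; a real system would query the social graph.
members = [
    {"name": "Ann", "location": "Cafe Nero", "tags": {"coffee"}, "lat": 51.51, "lon": -0.12},
    {"name": "Bob", "location": "Tate Modern", "tags": {"art"}, "lat": 51.50, "lon": -0.10},
]

def nearby(lat, lon, criterion, radius_km=2.0):
    """Return location names of members matching the criterion near (lat, lon)."""
    matches = []
    for m in members:
        # Equirectangular approximation; adequate at city scale.
        dx = (m["lon"] - lon) * 111.32 * math.cos(math.radians(lat))
        dy = (m["lat"] - lat) * 111.32
        if math.hypot(dx, dy) <= radius_km and criterion in m["tags"]:
            matches.append(m["location"])
    return matches

print(nearby(51.505, -0.11, "coffee"))  # -> ['Cafe Nero']
```

The abstract's routing step would then treat the returned location as a waypoint between the received start and end locations.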
02 Nov 2009
TL;DR: Two distributed methods that generate exact stochastic GBDT models are presented: the first is a MapReduce implementation and the second utilizes MPI on the Hadoop grid environment.
Abstract: Stochastic Gradient Boosted Decision Trees (GBDT) is one of the most widely used learning algorithms in machine learning today. It is adaptable, easy to interpret, and produces highly accurate models. However, most implementations today are computationally expensive and require all training data to be in main memory. As training data becomes ever larger, there is motivation for us to parallelize the GBDT algorithm. Parallelizing decision tree training is intuitive, and various approaches have been explored in existing literature. Stochastic boosting, on the other hand, is inherently a sequential process and has not been applied to distributed decision trees. In this work, we present two distributed methods that generate exact stochastic GBDT models: the first is a MapReduce implementation and the second utilizes MPI on the Hadoop grid environment.
296 citations
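The sequential structure the abstract points at, where each weak learner fits the residuals left by all previous ones on a random subsample, can be sketched with depth-1 stumps in place of full decision trees. The data, learning rate, and round count are arbitrary choices for the sketch:

```python
import numpy as np

# Minimal sketch of stochastic gradient boosting with squared loss, using
# one-level "stumps" as weak learners; real GBDT grows full decision trees.
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=200)
y = np.where(X > 5, 3.0, -3.0) + rng.normal(0, 0.1, size=200)

def fit_stump(x, residual):
    # Pick the threshold minimizing squared error of a two-leaf predictor.
    best = None
    for t in np.quantile(x, np.linspace(0.1, 0.9, 9)):
        left, right = residual[x <= t].mean(), residual[x > t].mean()
        err = float(((np.where(x <= t, left, right) - residual) ** 2).sum())
        if best is None or err < best[0]:
            best = (err, t, left, right)
    _, t, left, right = best
    return lambda z: np.where(z <= t, left, right)

pred = np.zeros_like(y)
lr = 0.5
for _ in range(20):
    # "Stochastic": each stump fits residuals on a random subsample.
    idx = rng.choice(len(X), size=len(X) // 2, replace=False)
    stump = fit_stump(X[idx], (y - pred)[idx])
    pred += lr * stump(X)  # sequential dependence: the next residuals need this update

print(round(float(np.abs(y - pred).mean()), 3))
```

The last line of the loop is exactly what makes the process hard to parallelize across boosting rounds: round t+1 cannot start until round t's update is applied, so the paper's methods parallelize the work *within* each tree instead.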
05 Jul 2008
TL;DR: A highly efficient solver for the particular instance of semidefinite programming that arises in LMNN classification is described; this solver can handle problems with billions of large margin constraints in a few hours.
Abstract: In this paper we study how to improve nearest neighbor classification by learning a Mahalanobis distance metric. We build on a recently proposed framework for distance metric learning known as large margin nearest neighbor (LMNN) classification. Our paper makes three contributions. First, we describe a highly efficient solver for the particular instance of semidefinite programming that arises in LMNN classification; our solver can handle problems with billions of large margin constraints in a few hours. Second, we show how to reduce both training and testing times using metric ball trees; the speedups from ball trees are further magnified by learning low dimensional representations of the input space. Third, we show how to learn different Mahalanobis distance metrics in different parts of the input space. For large data sets, the use of locally adaptive distance metrics leads to even lower error rates.
295 citations
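The object LMNN learns is a Mahalanobis metric d(x, y) = (x - y)ᵀ M (x - y) with M positive semidefinite. A minimal sketch of evaluating such a metric, with a hand-picked M rather than one produced by LMNN's semidefinite program:

```python
import numpy as np

# Hypothetical linear map L; parameterizing M = L^T L guarantees that M is
# positive semidefinite, as the Mahalanobis metric requires.
L = np.array([[2.0, 0.0],
              [0.5, 1.0]])
M = L.T @ L

def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis distance (x - y)^T M (x - y)."""
    d = x - y
    return float(d @ M @ d)

x = np.array([1.0, 2.0])
y = np.array([2.0, 1.0])

# Equivalent view: plain squared Euclidean distance after mapping points by L,
# which is what "learning low dimensional representations" in the paper exploits.
assert np.isclose(mahalanobis_sq(x, y, M), np.sum((L @ x - L @ y) ** 2))
print(mahalanobis_sq(x, y, M))  # -> 4.25
```

The L-map view also explains the paper's ball-tree speedup: one can map all points by L once and then run ordinary Euclidean nearest-neighbor search in the transformed space.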
06 Jun 2011
TL;DR: This paper studies the ranking algorithm in the random arrivals model, and shows that it has a competitive ratio of at least 0.696, beating the 1-1/e ≈ 0.632 barrier in the adversarial model.
Abstract: In a seminal paper, Karp, Vazirani, and Vazirani show that a simple ranking algorithm achieves a competitive ratio of 1-1/e for the online bipartite matching problem in the standard adversarial model, where the ratio of 1-1/e is also shown to be optimal. Their result also implies that in the random arrivals model defined by Goel and Mehta, where the online nodes arrive in a random order, a simple greedy algorithm achieves a competitive ratio of 1-1/e. In this paper, we study the ranking algorithm in the random arrivals model, and show that it has a competitive ratio of at least 0.696, beating the 1-1/e ≈ 0.632 barrier in the adversarial model. Our result also extends to the i.i.d. distribution model of Feldman et al., removing the assumption that the distribution is known. Our analysis has two main steps. First, we exploit certain dominance and monotonicity properties of the ranking algorithm to derive a family of factor-revealing linear programs (LPs). In particular, by symmetry of the ranking algorithm in the random arrivals model, we have the monotonicity property on both sides of the bipartite graph, giving good "strength" to the LPs. Second, to obtain a good lower bound on the optimal values of all these LPs and hence on the competitive ratio of the algorithm, we introduce the technique of strongly factor-revealing LPs. In particular, we derive a family of modified LPs with similar strength such that the optimal value of any single one of these new LPs is a lower bound on the competitive ratio of the algorithm. This enables us to leverage the power of computer LP solvers to solve for large instances of the new LPs to establish bounds that would otherwise be difficult to attain by human analysis.
295 citations
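The RANKING algorithm the abstract analyzes is simple to state: offline vertices receive a uniformly random permutation (their ranks) up front, and each arriving online vertex matches its unmatched neighbor of smallest rank. A sketch on a small invented graph:

```python
import random

# Hypothetical bipartite instance: online vertices 1..3, offline vertices a..c.
random.seed(0)
offline = ["a", "b", "c"]
neighbors = {1: ["a", "b"], 2: ["a"], 3: ["b", "c"]}  # online vertex -> its neighbors

# RANKING: fix one random permutation of the offline side before any arrivals.
rank = {v: r for r, v in enumerate(random.sample(offline, len(offline)))}

matched = {}  # online vertex -> offline vertex
for u in [1, 2, 3]:  # arrival order (itself random in the random arrivals model)
    free = [v for v in neighbors[u] if v not in matched.values()]
    if free:
        matched[u] = min(free, key=lambda v: rank[v])

print(len(matched))
```

On this instance the maximum matching has size 3, while RANKING may match only 2 depending on the drawn permutation; the paper's result is that over random arrival orders the expected ratio is at least 0.696 of optimal.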
Authors
Showing all 26766 results
Name | H-index | Papers | Citations |
---|---|---|---|
Ashok Kumar | 151 | 5654 | 164086 |
Alexander J. Smola | 122 | 434 | 110222 |
Howard I. Maibach | 116 | 1821 | 60765 |
Sanjay Jain | 103 | 881 | 46880 |
Amirhossein Sahebkar | 100 | 1307 | 46132 |
Marc Davis | 99 | 412 | 50243 |
Wenjun Zhang | 96 | 976 | 38530 |
Jian Xu | 94 | 1366 | 52057 |
Fortunato Ciardiello | 94 | 695 | 47352 |
Tong Zhang | 93 | 414 | 36519 |
Michael E. J. Lean | 92 | 411 | 30939 |
Ashish K. Jha | 87 | 503 | 30020 |
Xin Zhang | 87 | 1714 | 40102 |
Theunis Piersma | 86 | 632 | 34201 |
George Varghese | 84 | 253 | 28598 |