Institution
Yahoo!
Company • London, United Kingdom
About: Yahoo! is a company based in London, United Kingdom. It is known for research contributions on the topics Population and Web search query. The organization has 26749 authors who have published 29915 publications receiving 732583 citations. The organization is also known as Yahoo! Inc. and Maudwen-Yahoo! Inc.
Papers
TL;DR: In this paper, the authors investigated spatial variation in precipitation by correlation and regression analysis of long-period records and found that there is a strong positive correlation between winter precipitation at stations over the entire region, so that, for practical forecasting of summer runoff in some basins, a single valley-floor precipitation station can be used.
Abstract: Most of the flow in the River Indus from its upper mountain basin is derived from melting snow and glaciers. Climatic variability and change of both precipitation and energy inputs will, therefore, affect rural livelihoods at both a local and a regional scale through effects on summer runoff in the River Indus. Spatial variation in precipitation has been investigated by correlation and regression analysis of long-period records. There is a strong positive correlation between winter precipitation at stations over the entire region, so that, for practical forecasting of summer runoff in some basins, a single valley-floor precipitation station can be used. In contrast, spatial relationships in seasonal precipitation are weaker in summer and sometimes significantly negative between stations north and south of the Himalayan divide. Although analysis of long datasets of precipitation from 1895 shows no significant trend, from 1961–1999 there are statistically significant increases in winter, in summer and in the annual precipitation at several stations. Preliminary analysis has identified a significant positive correlation between the winter North Atlantic Oscillation (NAO) and winter precipitation in the Karakoram and a negative correlation between NAO and summer rainfall at some stations. Keywords: upper Indus basin, climate change, time series analysis, spatial correlation, teleconnections
447 citations
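The inter-station correlation analysis mentioned in the abstract above can be illustrated with a minimal sketch of the Pearson correlation coefficient. The function name `pearson_r` and the two short series in the usage lines are hypothetical and purely illustrative; they are not data or code from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equally long numeric series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Invented winter totals (arbitrary units) for two hypothetical stations.
station_a = [120.0, 95.0, 140.0, 110.0, 160.0]
station_b = [80.0, 70.0, 100.0, 75.0, 115.0]
print(round(pearson_r(station_a, station_b), 3))
```

A value close to +1 for a pair of stations is the kind of relationship that would support using one station as a proxy for the other; values near zero or below would not.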
01 Sep 2011 TL;DR: In this article, a data-based model called credit distribution is proposed to estimate expected influence spread in a social network, which directly leverages available propagation traces to learn how influence flows in the network.
Abstract: Influence maximization is the problem of finding a set of users in a social network, such that by targeting this set, one maximizes the expected spread of influence in the network. Most of the literature on this topic has focused exclusively on the social graph, overlooking historical data, i.e., traces of past action propagations. In this paper, we study influence maximization from a novel data-based perspective. In particular, we introduce a new model, which we call credit distribution, that directly leverages available propagation traces to learn how influence flows in the network and uses this to estimate expected influence spread. Our approach also learns the different levels of influence-ability of users, and it is time-aware in the sense that it takes the temporal nature of influence into account. We show that influence maximization under the credit distribution model is NP-hard and that the function that defines expected spread under our model is submodular. Based on these, we develop an approximation algorithm for solving the influence maximization problem that at once enjoys high accuracy compared to the standard approach, while being several orders of magnitude faster and more scalable.
447 citations
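The abstract above relies on submodularity of the spread function to justify an approximation algorithm. As a rough illustration, the sketch below shows the generic greedy selection commonly used for monotone submodular objectives. It is not the paper's credit distribution model: the `spread` callback, the `reach` table, and the `coverage` function are hypothetical stand-ins (a toy set-coverage objective) chosen only to make the example runnable.

```python
def greedy_seed_selection(candidates, k, spread):
    """Pick up to k seeds, each time adding the one with the largest marginal gain.

    `spread` is any function mapping a list of seeds to a number. For a
    monotone submodular `spread`, this greedy rule is the classic
    approximation strategy.
    """
    seeds = []
    current = spread(seeds)
    for _ in range(k):
        best, best_gain = None, 0.0
        for u in candidates:
            if u in seeds:
                continue
            gain = spread(seeds + [u]) - current
            if gain > best_gain:
                best, best_gain = u, gain
        if best is None:  # no candidate improves the objective
            break
        seeds.append(best)
        current += best_gain
    return seeds


# Toy objective (assumption): each user "reaches" a fixed set of other users.
reach = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1, 7}}


def coverage(seeds):
    covered = set()
    for s in seeds:
        covered |= reach[s]
    return len(covered)


print(greedy_seed_selection(["a", "b", "c", "d"], 2, coverage))  # ['c', 'a']
```

In a data-based setting such as the one the abstract describes, `spread` would be replaced by an estimate learned from propagation traces; the selection loop itself would stay the same.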
21 Apr 2008 TL;DR: This work uses a combination of context- and content-based tools to generate representative sets of images for location-driven features and landmarks, a common search task.
Abstract: Can we leverage the community-contributed collections of rich media on the web to automatically generate representative and diverse views of the world's landmarks? We use a combination of context- and content-based tools to generate representative sets of images for location-driven features and landmarks, a common search task. To do that, we use location and other metadata, as well as tags associated with images, and the images' visual features. We present an approach to extracting tags that represent landmarks. We show how to use unsupervised methods to extract representative views and images for each landmark. This approach can potentially scale to provide better search and representation for landmarks, worldwide. We evaluate the system in the context of image search using a real-life dataset of 110,000 images from the San Francisco area.
444 citations
TL;DR: It is proved that the proposed initialization algorithm k-means|| obtains a nearly optimal solution after a logarithmic number of passes, and experimental evaluation on real-world large-scale data demonstrates that k-means|| outperforms k-means++ in both sequential and parallel settings.
Abstract: Over half a century old and showing no signs of aging, k-means remains one of the most popular data processing algorithms. As is well-known, a proper initialization of k-means is crucial for obtaining a good final solution. The recently proposed k-means++ initialization algorithm achieves this, obtaining an initial set of centers that is provably close to the optimum solution. A major downside of the k-means++ is its inherent sequential nature, which limits its applicability to massive data: one must make k passes over the data to find a good initial set of centers. In this work we show how to drastically reduce the number of passes needed to obtain, in parallel, a good initialization. This is unlike prevailing efforts on parallelizing k-means that have mostly focused on the post-initialization phases of k-means. We prove that our proposed initialization algorithm k-means|| obtains a nearly optimal solution after a logarithmic number of passes, and then show that in practice a constant number of passes suffices. Experimental evaluation on real-world large-scale data demonstrates that k-means|| outperforms k-means++ in both sequential and parallel settings.
438 citations
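The initialization idea in the abstract above (a few oversampling passes instead of k sequential ones) can be sketched as follows. This is a simplified, single-machine illustration under stated assumptions, not the authors' implementation: the function name `kmeans_parallel_init`, the default oversampling factor of `2 * k`, and the fixed `rounds=5` are choices made here for the example. Each pass samples every point independently, which is the step that could be distributed across workers.

```python
import random


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_sq_dist(p, centers):
    return min(sq_dist(p, c) for c in centers)


def kmeans_parallel_init(points, k, oversample=None, rounds=5, seed=0):
    """Sketch of a k-means||-style initialization over a list of tuples."""
    rng = random.Random(seed)
    ell = oversample if oversample is not None else 2 * k  # assumed default
    candidates = [rng.choice(points)]
    for _ in range(rounds):
        d2 = [nearest_sq_dist(p, candidates) for p in points]
        cost = sum(d2)
        if cost == 0:
            break
        # Independent sampling with probability proportional to squared distance.
        sampled = [p for p, d in zip(points, d2)
                   if rng.random() < min(1.0, ell * d / cost)]
        candidates.extend(sampled)
    # Weight each candidate by the number of points closest to it.
    weights = [0] * len(candidates)
    for p in points:
        j = min(range(len(candidates)), key=lambda i: sq_dist(p, candidates[i]))
        weights[j] += 1
    # Reduce the weighted candidates to k centers with weighted k-means++ seeding.
    chosen = [rng.choices(candidates, weights=weights, k=1)[0]]
    while len(chosen) < k:
        scores = [w * nearest_sq_dist(c, chosen)
                  for c, w in zip(candidates, weights)]
        if sum(scores) == 0:  # fewer than k distinct usable candidates
            break
        chosen.append(rng.choices(candidates, weights=scores, k=1)[0])
    return chosen


# Three well-separated synthetic groups of 2-D points (invented data).
pts = [(cx + 0.1 * i, cy + 0.1 * (i % 3))
       for cx, cy in [(0, 0), (100, 0), (0, 100)] for i in range(20)]
print(kmeans_parallel_init(pts, k=3))
```

The returned centers would then be handed to an ordinary k-means (Lloyd) iteration. The number of passes is fixed at a small constant here, in line with the abstract's remark that a constant number suffices in practice.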
17 Oct 2015 TL;DR: This work presents a novel hierarchical recurrent encoder-decoder architecture that makes it possible to account for sequences of previous queries of arbitrary lengths and is sensitive to the order of queries in the context while avoiding data sparsity.
Abstract: Users may strive to formulate an adequate textual query for their information need. Search engines assist the users by presenting query suggestions. To preserve the original search intent, suggestions should be context-aware and account for the previous queries issued by the user. Achieving context awareness is challenging due to data sparsity. We present a novel hierarchical recurrent encoder-decoder architecture that makes it possible to account for sequences of previous queries of arbitrary lengths. As a result, our suggestions are sensitive to the order of queries in the context while avoiding data sparsity. Additionally, our model can suggest for rare, or long-tail, queries. The produced suggestions are synthetic and are sampled one word at a time, using computationally cheap decoding techniques. This is in contrast to current synthetic suggestion models relying upon machine learning pipelines and hand-engineered feature sets. Results show that our model outperforms existing context-aware approaches in a next query prediction setting. In addition to query suggestion, our architecture is general enough to be used in a variety of other applications.
437 citations
Authors
Showing all 26766 results
Name | H-index | Papers | Citations |
---|---|---|---|
Ashok Kumar | 151 | 5654 | 164086 |
Alexander J. Smola | 122 | 434 | 110222 |
Howard I. Maibach | 116 | 1821 | 60765 |
Sanjay Jain | 103 | 881 | 46880 |
Amirhossein Sahebkar | 100 | 1307 | 46132 |
Marc Davis | 99 | 412 | 50243 |
Wenjun Zhang | 96 | 976 | 38530 |
Jian Xu | 94 | 1366 | 52057 |
Fortunato Ciardiello | 94 | 695 | 47352 |
Tong Zhang | 93 | 414 | 36519 |
Michael E. J. Lean | 92 | 411 | 30939 |
Ashish K. Jha | 87 | 503 | 30020 |
Xin Zhang | 87 | 1714 | 40102 |
Theunis Piersma | 86 | 632 | 34201 |
George Varghese | 84 | 253 | 28598 |