Institution

Yahoo!

Company • London, United Kingdom

About: Yahoo! is a company based in London, United Kingdom. It is known for research contributions in the topics Population and Web search query. The organization has 26749 authors who have published 29915 publications, receiving 732583 citations. The organization is also known as Yahoo! Inc. and Maudwen-Yahoo! Inc.


Topics: Population, Web search query, Web page, Web query classification, Query expansion


Papers


Journal Article • DOI

Spatial and temporal variations in precipitation in the Upper Indus Basin, global teleconnections and hydrological implications


D.R. Archer¹, Hayley J. Fowler¹ ² • Institutions (2)

Yahoo!¹, University of Newcastle²

29 Feb 2004 • Hydrology and Earth System Sciences

TL;DR: In this paper, the authors investigated spatial variation in precipitation by correlation and regression analysis of long-period records and found that there is a strong positive correlation between winter precipitation at stations over the entire region, so that, for practical forecasting of summer runoff in some basins, a single valley-floor precipitation station can be used.


Abstract: Most of the flow in the River Indus from its upper mountain basin is derived from melting snow and glaciers. Climatic variability and change in both precipitation and energy inputs will, therefore, affect rural livelihoods at both local and regional scales through their effects on summer runoff in the River Indus. Spatial variation in precipitation has been investigated by correlation and regression analysis of long-period records. There is a strong positive correlation between winter precipitation at stations over the entire region, so that, for practical forecasting of summer runoff in some basins, a single valley-floor precipitation station can be used. In contrast, spatial relationships in seasonal precipitation are weaker in summer and sometimes significantly negative between stations north and south of the Himalayan divide. Although analysis of long precipitation records from 1895 onward shows no significant trend, from 1961 to 1999 there are statistically significant increases in winter, summer, and annual precipitation at several stations. Preliminary analysis has identified a significant positive correlation between the winter North Atlantic Oscillation (NAO) and winter precipitation in the Karakoram, and a negative correlation between the NAO and summer rainfall at some stations.

Keywords: upper Indus basin, climate change, time series analysis, spatial correlation, teleconnections
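The correlation and regression analysis mentioned in this abstract can be illustrated with the textbook Pearson coefficient and an ordinary least-squares line. This is a minimal sketch, not the authors' code; the function names `pearson_r` and `linear_fit`, and the idea of regressing one station's seasonal totals on another's, are illustrative assumptions.

```python
import math
from typing import Sequence, Tuple


def _centered_sums(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float, float, float]:
    """Return (mean_x, mean_y, Sxy, Sxx, Syy) for two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length >= 2")
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return mx, my, sxy, sxx, syy


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two series (e.g. two stations)."""
    _, _, sxy, sxx, syy = _centered_sums(x, y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def linear_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = a + b * x; returns (a, b)."""
    mx, my, sxy, sxx, _ = _centered_sums(x, y)
    if sxx == 0:
        raise ValueError("regression undefined for a constant predictor")
    b = sxy / sxx
    a = my - b * mx
    return a, b
```

With a hypothetical valley-floor series `x` and a basin-wide series `y`, `pearson_r(x, y)` measures how closely the two co-vary, and `linear_fit(x, y)` gives the intercept and slope a simple single-station forecast would use.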


447 citations

Journal Article • DOI

A data-based approach to social influence maximization


Amit Goyal¹, Francesco Bonchi², Laks V. S. Lakshmanan¹ • Institutions (2)

University of British Columbia¹, Yahoo!²

01 Sep 2011

TL;DR: In this article, the authors propose a data-based model, called credit distribution, that directly leverages available propagation traces to learn how influence flows in a social network and uses this to estimate expected influence spread.


Abstract: Influence maximization is the problem of finding a set of users in a social network such that, by targeting this set, one maximizes the expected spread of influence in the network. Most of the literature on this topic has focused exclusively on the social graph, overlooking historical data, i.e., traces of past action propagations. In this paper, we study influence maximization from a novel data-based perspective. In particular, we introduce a new model, which we call credit distribution, that directly leverages available propagation traces to learn how influence flows in the network and uses this to estimate expected influence spread. Our approach also learns the different levels of influenceability of users, and it is time-aware in the sense that it takes the temporal nature of influence into account. We show that influence maximization under the credit distribution model is NP-hard and that the function that defines expected spread under our model is submodular. Based on these results, we develop an approximation algorithm for the influence maximization problem that achieves high accuracy compared with the standard approach while being several orders of magnitude faster and more scalable.
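A common way to exploit a submodular spread function is the greedy algorithm: repeatedly add the user with the largest marginal gain in spread. For a monotone submodular objective, this is guaranteed to reach at least a (1 - 1/e) fraction of the best size-k seed set. The sketch below shows that generic greedy, assuming a monotone submodular `spread`; the coverage-style `coverage_spread` and the `REACH` table are hypothetical stand-ins, and the credit distribution model itself is not implemented here.

```python
from typing import Callable, Dict, FrozenSet, Hashable, List, Sequence, Set


def greedy_seeds(candidates: Sequence[Hashable], k: int,
                 spread: Callable[[FrozenSet[Hashable]], float]) -> List[Hashable]:
    """Greedy seed selection: k times, add the candidate with the largest marginal gain.

    Assumes `spread` is monotone and submodular, in which case the result is
    within a factor (1 - 1/e) of the best seed set of size k."""
    seeds: List[Hashable] = []
    current = spread(frozenset())
    remaining = list(dict.fromkeys(candidates))  # de-duplicate, keep order
    for _ in range(min(k, len(remaining))):
        best = remaining[0]
        best_gain = float("-inf")
        for c in remaining:
            gain = spread(frozenset(seeds + [c])) - current
            if gain > best_gain:  # strict: ties go to the earlier candidate
                best, best_gain = c, gain
        seeds.append(best)
        remaining.remove(best)
        current += best_gain
    return seeds


# Hypothetical stand-in for an influence model: each user "reaches" a fixed set
# of other users, and spread(S) is the number of users reached by S. Coverage
# functions like this are monotone and submodular.
REACH: Dict[str, Set[int]] = {
    "a": {1, 2, 3},
    "b": {3, 4},
    "c": {4, 5, 6, 7},
    "d": {1},
}


def coverage_spread(seed_set: FrozenSet[Hashable]) -> float:
    """Number of distinct users reached by the seed set."""
    covered: Set[int] = set()
    for u in seed_set:
        covered |= REACH[u]
    return float(len(covered))


if __name__ == "__main__":
    print(greedy_seeds(list(REACH), 2, coverage_spread))  # ['c', 'a']
```

The first pick is "c" (reaches 4 users); after that, "a" adds 3 new users while "b" and "d" add only 1 each. Each greedy step calls `spread` once per remaining candidate, which is why faster spread estimation matters at scale.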


447 citations

Proceedings Article • DOI

Generating diverse and representative image search results for landmarks


Lyndon Kennedy¹, Mor Naaman² • Institutions (2)

Columbia University¹, Yahoo!²

21 Apr 2008

TL;DR: This work uses a combination of context- and content-based tools to generate representative sets of images for location-driven features and landmarks, a common search task.


Abstract: Can we leverage the community-contributed collections of rich media on the web to automatically generate representative and diverse views of the world's landmarks? We use a combination of context- and content-based tools to generate representative sets of images for location-driven features and landmarks, a common search task. To do so, we use location and other metadata, as well as tags associated with images, and the images' visual features. We present an approach to extracting tags that represent landmarks. We show how to use unsupervised methods to extract representative views and images for each landmark. This approach can potentially scale to provide better search and representation for landmarks worldwide. We evaluate the system in the context of image search using a real-life dataset of 110,000 images from the San Francisco area.


444 citations

Posted Content

Scalable K-Means++


Bahman Bahmani¹, Benjamin Moseley², Andrea Vattani³, Ravi Kumar⁴, Sergei Vassilvitskii⁴ • Institutions (4)

Stanford University¹, University of Illinois at Urbana–Champaign², University of California, San Diego³, Yahoo!⁴

29 Mar 2012 • arXiv: Databases

TL;DR: It is proved that the proposed initialization algorithm, k-means||, obtains a nearly optimal solution after a logarithmic number of passes, and experimental evaluation on real-world large-scale data demonstrates that k-means|| outperforms k-means++ in both sequential and parallel settings.


Abstract: Over half a century old and showing no signs of aging, k-means remains one of the most popular data processing algorithms. As is well known, a proper initialization of k-means is crucial for obtaining a good final solution. The recently proposed k-means++ initialization algorithm achieves this, obtaining an initial set of centers that is provably close to the optimum solution. A major downside of k-means++ is its inherently sequential nature, which limits its applicability to massive data: one must make k passes over the data to find a good initial set of centers. In this work we show how to drastically reduce the number of passes needed to obtain, in parallel, a good initialization. This is unlike prevailing efforts to parallelize k-means, which have mostly focused on the post-initialization phases of k-means. We prove that our proposed initialization algorithm, k-means||, obtains a nearly optimal solution after a logarithmic number of passes, and then show that in practice a constant number of passes suffices. Experimental evaluation on real-world large-scale data demonstrates that k-means|| outperforms k-means++ in both sequential and parallel settings.
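The point about k-means++ needing k passes is easiest to see in its seeding step. Below is a minimal sketch of standard k-means++ (D²) seeding in plain Python, assuming points are tuples of floats; `kmeanspp_seeds` and `sq_dist` are illustrative names. It is not k-means|| itself, which replaces the k one-at-a-time draws with a small number of parallel oversampling rounds.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_seeds(points: Sequence[Point], k: int, rng: random.Random) -> List[Point]:
    """k-means++ (D^2) seeding.

    The first center is drawn uniformly at random; each later center is drawn
    with probability proportional to its squared distance from the nearest
    center chosen so far. Every draw needs the updated distances, i.e. one more
    pass over the data, so seeding takes k sequential passes. May return fewer
    than k centers if all points coincide with already-chosen centers."""
    if not 0 < k <= len(points):
        raise ValueError("need 0 < k <= len(points)")
    centers = [points[rng.randrange(len(points))]]
    # d2[i]: squared distance from points[i] to its nearest chosen center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            break  # nothing left with positive weight
        # Zero-weight points (already-chosen centers) are never selected.
        idx = rng.choices(range(len(points)), weights=d2)[0]
        centers.append(points[idx])
        # One pass over the data: update each point's nearest-center distance.
        d2 = [min(old, sq_dist(p, points[idx])) for old, p in zip(d2, points)]
    return centers
```

For example, `kmeanspp_seeds([(0.0, 0.0), (0.0, 0.0), (5.0, 5.0)], 2, random.Random(0))` always returns one center from each location: once either location is chosen, the points at that location have weight zero, so the second draw must come from the other one.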


438 citations

Proceedings Article • DOI

A Hierarchical Recurrent Encoder-Decoder for Generative Context-Aware Query Suggestion


Alessandro Sordoni¹, Yoshua Bengio¹, Hossein Vahabi², Christina Lioma³, Jakob Grue Simonsen³, Jian-Yun Nie¹ • Institutions (3)

Université de Montréal¹, Yahoo!², University of Copenhagen³

17 Oct 2015

TL;DR: This work presents a novel hierarchical recurrent encoder-decoder architecture that makes it possible to account for sequences of previous queries of arbitrary length and is sensitive to the order of queries in the context while avoiding data sparsity.


Abstract: Users may strive to formulate an adequate textual query for their information need. Search engines assist users by presenting query suggestions. To preserve the original search intent, suggestions should be context-aware and account for the previous queries issued by the user. Achieving context awareness is challenging due to data sparsity. We present a novel hierarchical recurrent encoder-decoder architecture that makes it possible to account for sequences of previous queries of arbitrary length. As a result, our suggestions are sensitive to the order of queries in the context while avoiding data sparsity. Additionally, our model can produce suggestions for rare, or long-tail, queries. The produced suggestions are synthetic and are sampled one word at a time, using computationally cheap decoding techniques. This is in contrast to current synthetic suggestion models, which rely on machine learning pipelines and hand-engineered feature sets. Results show that our model outperforms existing context-aware approaches in a next-query prediction setting. In addition to query suggestion, our architecture is general enough to be used in a variety of other applications.


437 citations


Authors

Showing all 26766 results

Name	H-index	Papers	Citations
Ashok Kumar	151	5654	164086
Alexander J. Smola	122	434	110222
Howard I. Maibach	116	1821	60765
Sanjay Jain	103	881	46880
Amirhossein Sahebkar	100	1307	46132
Marc Davis	99	412	50243
Wenjun Zhang	96	976	38530
Jian Xu	94	1366	52057
Fortunato Ciardiello	94	695	47352
Tong Zhang	93	414	36519
Michael E. J. Lean	92	411	30939
Ashish K. Jha	87	503	30020
Xin Zhang	87	1714	40102
Theunis Piersma	86	632	34201
George Varghese	84	253	28598