Institution

Yahoo!

Company•London, United Kingdom

About: Yahoo! is a company based in London, United Kingdom. It is known for research contributions in the topics Population and Web search query. The organization has 26749 authors who have published 29915 publications receiving 732583 citations. The organization is also known as Yahoo! Inc. and Maudwen-Yahoo! Inc.

Topics: Population, Web search query, Web page, Web query classification, Query expansion

Papers

Journal Article

Ozone modifies associations between temperature and cardiovascular mortality: analysis of the NMMAPS data.

Cizao Ren¹, Gail M. Williams, Lidia Morawska, Kerrie Mengersen, Shilu Tong•Institutions (1)

Yahoo!¹

01 Apr 2008-Occupational and Environmental Medicine

TL;DR: Ozone positively modified the temperature-CVM associations across the different regions, so it is important to evaluate the modifying role of ozone when estimating temperature-related health impacts and to further investigate the reasons behind the regional variability and the mechanism of the interaction between temperature and ozone.

Abstract: Objectives: Both ambient ozone and temperature are associated with human health. However, few data are available on whether ozone modifies temperature effects. This study aims to explore whether ozone modified associations between maximum temperature and cardiovascular mortality in the USA. Methods: The authors obtained data from the US National Morbidity, Mortality, and Air Pollution Study (NMMAPS) website. They used two time-series Poisson regression models (a response surface model and a stratification model) to examine whether ozone modified associations between maximum temperature and cardiovascular mortality (CVM) in 95 large US communities during 1987–2000 in summer (June to September). Bayesian meta-analysis was used to pool estimates in each community. Results: The response surface model was used to examine the joint effects of temperature and ozone on CVM in summer. Results indicate that ozone positively modified the temperature-CVM associations across the different regions. The stratification model quantified the temperature-CVM associations across different levels of ozone. Results show that in general the higher the ozone concentration, the stronger the temperature-CVM associations across the communities. A 10°C increase in temperature on the same day was associated with an increase in CVM by 1.17% and 8.31% for the lowest and highest level of ozone concentrations in all communities, respectively. Conclusion: Ozone modified temperature effects in different regions in the USA. It is important to evaluate the modifying role of ozone when estimating temperature-related health impacts and to further investigate the reasons behind the regional variability and mechanism for the interaction between temperature and ozone.

170 citations

Journal Article

A Decision Criterion for the Optimal Number of Clusters in Hierarchical Clustering

Yunjae Jung, Haesun Park¹, Ding-Zhu Du², Barry L. Drake³•Institutions (3)

Korea Institute for Advanced Study¹, University of Minnesota², Yahoo!³

01 Jan 2003-Journal of Global Optimization

TL;DR: The clustering gain measure is defined as a measure for clustering optimality, which is based on the squared error sum as a clustering algorithm proceeds and shows good performance producing intuitively reasonable clustering configurations in Euclidean space according to the evidence from experimental results.

Abstract: Clustering has been widely used to partition data into groups so that the degree of association is high among members of the same group and low among members of different groups. Though many effective and efficient clustering algorithms have been developed and deployed, most of them still suffer from the lack of automatic or online decision for optimal number of clusters. In this paper, we define clustering gain as a measure for clustering optimality, which is based on the squared error sum as a clustering algorithm proceeds. When the measure is applied to a hierarchical clustering algorithm, an optimal number of clusters can be found. Our clustering measure shows good performance producing intuitively reasonable clustering configurations in Euclidean space according to the evidence from experimental results. Furthermore, the measure can be utilized to estimate the desired number of clusters for partitional clustering methods as well. Therefore, the clustering gain measure provides a promising technique for achieving a higher level of quality for a wide range of clustering methods.
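
The abstract describes clustering gain only as a measure based on the squared error sum, so the following is a minimal sketch under stated assumptions rather than the authors' implementation. It assumes the gain of a configuration takes the form of the sum over clusters of (n_j - 1) times the squared distance between the cluster centroid and the global centroid, and it uses a simple centroid-linkage agglomerative merge. The function names (`clustering_gain`, `merge_closest`, `best_cluster_count`) are hypothetical, and the assumed formula should be checked against the paper before being relied on.

```python
from typing import List, Sequence

Point = Sequence[float]


def centroid(points: List[Point]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def clustering_gain(clusters: List[List[Point]]) -> float:
    """ASSUMED form of the gain: sum_j (n_j - 1) * ||c_j - c_0||^2,
    where c_j is a cluster centroid and c_0 the global centroid."""
    global_c = centroid([p for c in clusters for p in c])
    return sum((len(c) - 1) * sq_dist(centroid(c), global_c) for c in clusters)


def merge_closest(clusters: List[List[Point]]) -> List[List[Point]]:
    """Merge the two clusters whose centroids are closest (centroid linkage)."""
    best = None
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            d = sq_dist(centroid(clusters[i]), centroid(clusters[j]))
            if best is None or d < best[0]:
                best = (d, i, j)
    _, i, j = best
    merged = clusters[i] + clusters[j]
    return [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]


def best_cluster_count(points: List[Point]) -> int:
    """Run the full merge sequence and return the k with the largest gain."""
    clusters = [[p] for p in points]
    best_k, best_gain = len(clusters), clustering_gain(clusters)
    while len(clusters) > 1:
        clusters = merge_closest(clusters)
        gain = clustering_gain(clusters)
        if gain > best_gain:
            best_k, best_gain = len(clusters), gain
    return best_k
```

Under this assumed form, the gain is zero both when every point is its own cluster and when all points share one cluster, so the maximum falls at an intermediate number of clusters.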

170 citations

Posted Content

Contextual Bandit Algorithms with Supervised Learning Guarantees

Alina Beygelzimer¹, John Langford², Lihong Li², Lev Reyzin³, Robert E. Schapire⁴•Institutions (4)

IBM¹, Yahoo!², Georgia Institute of Technology³, Princeton University⁴

22 Feb 2010-arXiv: Learning

TL;DR: In this paper, the problem of learning in an online, bandit setting where the learner must repeatedly select among $K$ actions, but only receives partial feedback based on its choices is addressed.

Abstract: We address the problem of learning in an online, bandit setting where the learner must repeatedly select among $K$ actions, but only receives partial feedback based on its choices. We establish two new facts: First, using a new algorithm called Exp4.P, we show that it is possible to compete with the best in a set of $N$ experts with probability $1-\delta$ while incurring regret at most $O(\sqrt{KT\ln(N/\delta)})$ over $T$ time steps. The new algorithm is tested empirically in a large-scale, real-world dataset. Second, we give a new algorithm called VE that competes with a possibly infinite set of policies of VC-dimension $d$ while incurring regret at most $O(\sqrt{T(d\ln(T) + \ln (1/\delta))})$ with probability $1-\delta$. These guarantees improve on those of all previous algorithms, whether in a stochastic or adversarial environment, and bring us closer to providing supervised learning type guarantees for the contextual bandit setting.
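
To get a feel for the two regret bounds quoted in the abstract, the sketch below evaluates the expressions inside the big-O for concrete values. It is only an illustration: the constants hidden by the big-O notation are omitted, the function names are hypothetical, and nothing here implements Exp4.P or VE themselves.

```python
import math


def exp4p_regret_scale(K: int, T: int, N: int, delta: float) -> float:
    """sqrt(K * T * ln(N / delta)), the Exp4.P bound without its constant."""
    return math.sqrt(K * T * math.log(N / delta))


def ve_regret_scale(T: int, d: int, delta: float) -> float:
    """sqrt(T * (d * ln(T) + ln(1 / delta))), the VE bound without its constant."""
    return math.sqrt(T * (d * math.log(T) + math.log(1 / delta)))


if __name__ == "__main__":
    # Per-round regret scale shrinks as the horizon T grows.
    for T in (10**2, 10**4, 10**6):
        scale = exp4p_regret_scale(K=10, T=T, N=1000, delta=0.05)
        print(T, scale / T)
```

Because both expressions grow like the square root of T (up to a logarithmic factor for VE), dividing by T shows the average per-round regret scale shrinking as the horizon lengthens.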

170 citations

Proceedings Article

Online aggregation and continuous query support in MapReduce

Tyson Condie¹, Neil Conway¹, Peter Alvaro¹, Joseph M. Hellerstein¹, John Gerth², Justin Talbot², Khaled Elmeleegy³, Russell Sears³•Institutions (3)

University of California, Berkeley¹, Stanford University², Yahoo!³

06 Jun 2010

TL;DR: A modified version of the Hadoop MapReduce framework that supports online aggregation, which allows users to see "early returns" from a job as it is being computed, and can reduce completion times and improve system utilization for batch jobs as well.

Abstract: MapReduce is a popular framework for data-intensive distributed computing of batch jobs. To simplify fault tolerance, the output of each MapReduce task and job is materialized to disk before it is consumed. In this demonstration, we describe a modified MapReduce architecture that allows data to be pipelined between operators. This extends the MapReduce programming model beyond batch processing, and can reduce completion times and improve system utilization for batch jobs as well. We demonstrate a modified version of the Hadoop MapReduce framework that supports online aggregation, which allows users to see "early returns" from a job as it is being computed. Our Hadoop Online Prototype (HOP) also supports continuous queries, which enable MapReduce programs to be written for applications such as event monitoring and stream processing. HOP retains the fault tolerance properties of Hadoop, and can run unmodified user-defined MapReduce programs.
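
The idea of "early returns" can be illustrated with a toy, single-process sketch. This is not the HOP or Hadoop API: the names `map_words` and `online_word_count` and the `snapshot_every` parameter are hypothetical, and the example only shows map output being fed straight into a running aggregate that is reported before the job finishes.

```python
from collections import Counter
from typing import Dict, Iterator, List, Tuple


def map_words(record: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (word, 1) for each word in a record."""
    for word in record.split():
        yield word.lower(), 1


def online_word_count(
    records: List[str], snapshot_every: int
) -> Iterator[Tuple[float, Dict[str, int]]]:
    """Pipe map output directly into a running reduce and yield
    (fraction of input processed, counts so far) at regular intervals."""
    counts: Counter = Counter()
    total = len(records)
    for i, record in enumerate(records, start=1):
        for key, value in map_words(record):
            counts[key] += value
        # Report a snapshot periodically and once more at the end.
        if i % snapshot_every == 0 or i == total:
            yield i / total, dict(counts)


if __name__ == "__main__":
    data = ["a b", "a", "b b", "a c"]
    for progress, snapshot in online_word_count(data, snapshot_every=2):
        print(f"{progress:.0%} processed: {snapshot}")
```

Each snapshot is an exact count over the input seen so far, so a caller can stop early once the partial answer is good enough, which is the behaviour the abstract attributes to online aggregation.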

170 citations

Proceedings Article

Sketching probabilistic data streams

Graham Cormode¹, Minos Garofalakis²•Institutions (2)

AT&T Labs¹, Yahoo!²

11 Jun 2007

TL;DR: These algorithms offer strong randomized estimation guarantees while using only sublinear space in the size of the stream(s), and rely on novel, concise streaming sketch synopses that extend conventional sketching ideas to the probabilistic streams setting.

Abstract: The management of uncertain, probabilistic data has recently emerged as a useful paradigm for dealing with the inherent unreliabilities of several real-world application domains, including data cleaning, information integration, and pervasive, multi-sensor computing. Unlike conventional data sets, a set of probabilistic tuples defines a probability distribution over an exponential number of possible worlds (i.e., "grounded", deterministic databases). This "possible-worlds" interpretation allows for clean query semantics but also raises hard computational problems for probabilistic database query processors. To further complicate matters, in many scenarios (e.g., large-scale process and environmental monitoring using multiple sensor modalities), probabilistic data tuples arrive and need to be processed in a streaming fashion; that is, using limited memory and CPU resources and without the benefit of multiple passes over a static probabilistic database. Such probabilistic data streams raise a host of new research challenges for stream-processing engines that, to date, remain largely unaddressed. In this paper, we propose the first space- and time-efficient algorithms for approximating complex aggregate queries (including the number of distinct values and join/self-join sizes) over probabilistic data streams. Following the possible-worlds semantics, such aggregates essentially define probability distributions over the space of possible aggregation results, and our goal is to characterize such distributions through efficient approximations of their key moments (such as expectation and variance). Our algorithms offer strong randomized estimation guarantees while using only sublinear space in the size of the stream(s), and rely on novel, concise streaming sketch synopses that extend conventional sketching ideas to the probabilistic streams setting. Our experimental results verify the effectiveness of our approach.
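
The possible-worlds view of an aggregate can be made concrete with a deliberately simple case. The sketch below is not the paper's sketching algorithm: it assumes each tuple is a (value, probability) pair that appears independently, and it computes the expectation and variance of the plain tuple count, once in a single streaming pass and once by enumerating every possible world. All names are hypothetical. The aggregates the abstract targets (distinct values, join sizes) are harder, which is where sketch synopses come in.

```python
from itertools import product
from typing import List, Tuple

ProbTuple = Tuple[str, float]  # (value, probability that the tuple exists)


def count_moments(stream: List[ProbTuple]) -> Tuple[float, float]:
    """One pass, constant memory: with independent tuples the count is a
    sum of Bernoulli variables, so E = sum(p) and Var = sum(p * (1 - p))."""
    mean = 0.0
    var = 0.0
    for _, p in stream:
        mean += p
        var += p * (1.0 - p)
    return mean, var


def count_moments_bruteforce(stream: List[ProbTuple]) -> Tuple[float, float]:
    """Enumerate all 2^n possible worlds (exponential; for checking only)."""
    mean = 0.0
    second_moment = 0.0
    for world in product([0, 1], repeat=len(stream)):
        prob = 1.0
        for present, (_, p) in zip(world, stream):
            prob *= p if present else 1.0 - p
        size = sum(world)
        mean += prob * size
        second_moment += prob * size * size
    return mean, second_moment - mean * mean
```

The brute-force version makes the exponential blow-up of possible worlds explicit, while the streaming version shows why moments such as expectation and variance are a tractable way to summarise the resulting distribution.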

170 citations

Authors

Showing all 26766 results

Name	H-index	Papers	Citations
Ashok Kumar	151	5654	164086
Alexander J. Smola	122	434	110222
Howard I. Maibach	116	1821	60765
Sanjay Jain	103	881	46880
Amirhossein Sahebkar	100	1307	46132
Marc Davis	99	412	50243
Wenjun Zhang	96	976	38530
Jian Xu	94	1366	52057
Fortunato Ciardiello	94	695	47352
Tong Zhang	93	414	36519
Michael E. J. Lean	92	411	30939
Ashish K. Jha	87	503	30020
Xin Zhang	87	1714	40102
Theunis Piersma	86	632	34201
George Varghese	84	253	28598