Institution

Yahoo!

Company • London, United Kingdom

About: Yahoo! is a company based in London, United Kingdom. It is known for research contributions on the topics Population and Web search query. The organization has 26,749 authors who have published 29,915 publications receiving 732,583 citations. It is also known as Yahoo! Inc. and Maudwen-Yahoo! Inc.

Topics: Population, Web search query, Web page, Web query classification, Query expansion

Papers published on a yearly basis

Papers

Journal Article•DOI•

The pathophysiology of cigarette smoking and cardiovascular disease: An update

John A. Ambrose¹, Rajat S. Barua²•Institutions (2)

Yahoo!¹, Icahn School of Medicine at Mount Sinai²

19 May 2004-Journal of the American College of Cardiology

TL;DR: Recent experimental and clinical data support the hypothesis that cigarette smoke exposure increases oxidative stress as a potential mechanism for initiating cardiovascular dysfunction.

2,064 citations

Proceedings Article•DOI•

Pig latin: a not-so-foreign language for data processing

Christopher Olston¹, Benjamin Reed¹, Utkarsh Srivastava¹, Ravi Kumar¹, Andrew Tomkins¹•Institutions (1)

Yahoo!¹

09 Jun 2008

TL;DR: A new language called Pig Latin is described, designed to fit in a sweet spot between the declarative style of SQL and the low-level, procedural style of map-reduce. The accompanying system, Pig, is an open-source Apache-incubator project available for general use.

Abstract: There is a growing need for ad-hoc analysis of extremely large data sets, especially at internet companies where innovation critically depends on being able to analyze terabytes of data collected every day. Parallel database products, e.g., Teradata, offer a solution, but are usually prohibitively expensive at this scale. Besides, many of the people who analyze this data are entrenched procedural programmers, who find the declarative, SQL style to be unnatural. The success of the more procedural map-reduce programming model, and its associated scalable implementations on commodity hardware, is evidence of the above. However, the map-reduce paradigm is too low-level and rigid, and leads to a great deal of custom user code that is hard to maintain and reuse. We describe a new language called Pig Latin that we have designed to fit in a sweet spot between the declarative style of SQL, and the low-level, procedural style of map-reduce. The accompanying system, Pig, is fully implemented, and compiles Pig Latin into physical plans that are executed over Hadoop, an open-source, map-reduce implementation. We give a few examples of how engineers at Yahoo! are using Pig to dramatically reduce the time required for the development and execution of their data analysis tasks, compared to using Hadoop directly. We also report on a novel debugging environment that comes integrated with Pig, that can lead to even higher productivity gains. Pig is an open-source, Apache-incubator project, and available for general use.
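
The contrast drawn in this abstract can be illustrated with a small sketch. It is written in plain Python, not in Pig Latin and not against Hadoop, and the records, the threshold, and all function names are hypothetical. It computes the same per-category average twice: once with hand-written map and reduce functions, and once as a short sequence of filter, group, and aggregate steps, which is roughly the step-by-step dataflow style the abstract describes.

```python
from collections import defaultdict

# Hypothetical records: (url, category, pagerank).
urls = [
    ("a.com", "news", 0.9),
    ("b.com", "news", 0.1),
    ("c.com", "sports", 0.5),
    ("d.com", "news", 0.7),
]


# Style 1: map-reduce. The user supplies a mapper and a reducer by hand.
def mapper(record):
    _, category, pagerank = record
    if pagerank > 0.2:  # filtering is buried inside the mapper
        yield category, pagerank


def reducer(category, pageranks):
    return category, sum(pageranks) / len(pageranks)


def run_map_reduce(records):
    groups = defaultdict(list)  # stands in for the shuffle phase
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())


# Style 2: a sequence of named dataflow steps (filter, group, aggregate).
def run_dataflow(records):
    good_urls = [r for r in records if r[2] > 0.2]
    grouped = defaultdict(list)
    for _, category, pagerank in good_urls:
        grouped[category].append(pagerank)
    return {c: sum(p) / len(p) for c, p in grouped.items()}


mr_result = run_map_reduce(urls)
df_result = run_dataflow(urls)
```

Both functions return the same mapping from category to average pagerank. The second form keeps each transformation as a separate, readable step, which is the property the abstract attributes to Pig Latin; the actual language and its compilation to Hadoop jobs are not modeled here.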

2,058 citations

Book•

The Probabilistic Relevance Framework

Stephen Robertson¹, Hugo Zaragoza²•Institutions (2)

Microsoft¹, Yahoo!²

17 Dec 2009

TL;DR: This work presents the PRF from a conceptual point of view, describing the probabilistic modelling assumptions behind the framework and the different ranking algorithms that result from its application: the binary independence model, relevance feedback models, BM25 and BM25F.

Abstract: The Probabilistic Relevance Framework (PRF) is a formal framework for document retrieval, grounded in work done in the 1970s and 1980s, which led to the development of one of the most successful text-retrieval algorithms, BM25. In recent years, research in the PRF has yielded new retrieval models capable of taking into account document meta-data (especially structure and link-graph information). Again, this has led to one of the most successful Web-search and corporate-search algorithms, BM25F. This work presents the PRF from a conceptual point of view, describing the probabilistic modelling assumptions behind the framework and the different ranking algorithms that result from its application: the binary independence model, relevance feedback models, BM25 and BM25F. It also discusses the relation between the PRF and other statistical models for IR, and covers some related topics, such as the use of non-textual features, and parameter optimisation for models with free parameters.
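
For readers unfamiliar with BM25, the following is a minimal sketch of one commonly used form of the scoring function, not the formulation from the book itself. The function name `bm25_scores`, the toy documents, the parameter defaults `k1=1.2` and `b=0.75`, and the smoothed IDF variant are all assumptions chosen for illustration; several IDF variants exist, and BM25F (per-field weighting) is not covered.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against a tokenized `query`."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF that stays positive (an assumed variant).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) with length normalization (b).
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical toy collection, tokenized by whitespace.
docs = [
    "probabilistic relevance framework for document retrieval".split(),
    "resource management for hadoop clusters".split(),
    "bm25 ranking for web search retrieval".split(),
]
scores = bm25_scores("document retrieval".split(), docs)
```

In this toy collection the first document matches both query terms, the third matches one, and the second matches none, so the scores rank them in that order.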

2,037 citations

Proceedings Article•DOI•

Apache Hadoop YARN: yet another resource negotiator

Vinod Kumar Vavilapalli, Arun C. Murthy, Chris Douglas¹, Sharad Agarwal, Mahadev Konar, Robert Evans², Thomas Graves², Jason Lowe², Hitesh Shah, Siddharth Seth, Bikas Saha, Carlo Curino¹, Owen O'Malley, Sanjay Radia, Benjamin Reed³, Eric Baldeschwieler•Institutions (3)

Microsoft¹, Yahoo!², Facebook³

01 Oct 2013

TL;DR: This paper summarizes the design, development, and current state of deployment of YARN, the next generation of Hadoop's compute platform, which decouples the programming model from the resource management infrastructure and delegates many scheduling functions to per-application components.

Abstract: The initial design of Apache Hadoop [1] was tightly focused on running massive, MapReduce jobs to process a web crawl. For increasingly diverse companies, Hadoop has become the data and computational agora: the de facto place where data and computational resources are shared and accessed. This broad adoption and ubiquitous usage has stretched the initial design well beyond its intended target, exposing two key shortcomings: 1) tight coupling of a specific programming model with the resource management infrastructure, forcing developers to abuse the MapReduce programming model, and 2) centralized handling of jobs' control flow, which resulted in endless scalability concerns for the scheduler. In this paper, we summarize the design, development, and current state of deployment of the next generation of Hadoop's compute platform: YARN. The new architecture we introduced decouples the programming model from the resource management infrastructure, and delegates many scheduling functions (e.g., task fault-tolerance) to per-application components. We provide experimental evidence demonstrating the improvements we made, confirm improved efficiency by reporting the experience of running YARN on production environments (including 100% of Yahoo! grids), and confirm the flexibility claims by discussing the porting of several programming frameworks onto YARN, viz. Dryad, Giraph, Hoya, Hadoop MapReduce, REEF, Spark, Storm, Tez.

2,006 citations

Journal Article•DOI•

The amphibian tree of life

Darrel R. Frost¹, Taran Grant¹,², Julián Faivovich¹,², Raoul H. Bain¹, Alexander Haas³, Célio F. B. Haddad⁴, Rafael O. de Sá⁵, Alan Channing⁶, Mark Wilkinson⁷, Stephen C. Donnellan, Christopher J. Raxworthy¹, Jonathan A. Campbell⁸, Boris L. Blotto⁹, Paul E. Moler¹⁰, Robert C. Drewes¹¹, Ronald A. Nussbaum¹², John D. Lynch¹³, David M. Green¹⁴, Ward C. Wheeler¹•Institutions (14)

American Museum of Natural History¹, Columbia University², University of Hamburg³, Sao Paulo State University⁴, University of Richmond⁵, University of the Western Cape⁶, Natural History Museum⁷, University of Texas at Arlington⁸, Yahoo!⁹, Florida Fish and Wildlife Conservation Commission¹⁰, California Academy of Sciences¹¹, University of Michigan¹², National University of Colombia¹³, McGill University¹⁴

01 Jan 2006-Bulletin of the American Museum of Natural History

TL;DR: A new taxonomy of living amphibians is proposed to correct the deficiencies of the old one, based on the largest phylogenetic analysis of living Amphibia so far accomplished, and many subsidiary taxa are demonstrated to be nonmonophyletic.

Abstract: The evidentiary basis of the currently accepted classification of living amphibians is discussed and shown not to warrant the degree of authority conferred on it by use and tradition. A new taxonomy of living amphibians is proposed to correct the deficiencies of the old one. This new taxonomy is based on the largest phylogenetic analysis of living Amphibia so far accomplished. We combined the comparative anatomical character evidence of Haas (2003) with DNA sequences from the mitochondrial transcription unit H1 (12S and 16S ribosomal RNA and tRNAValine genes, ≈ 2,400 bp of mitochondrial sequences) and the nuclear genes histone H3, rhodopsin, tyrosinase, and seven in absentia, and the large ribosomal subunit 28S (≈ 2,300 bp of nuclear sequences; ca. 1.8 million base pairs; x = 3.7 kb/terminal). The dataset includes 532 terminals sampled from 522 species representative of the global diversity of amphibians as well as seven of the closest living relatives of amphibians for outgroup comparisons. The...

1,994 citations

Authors

Showing all 26766 results

Name	H-index	Papers	Citations
Ashok Kumar	151	5654	164086
Alexander J. Smola	122	434	110222
Howard I. Maibach	116	1821	60765
Sanjay Jain	103	881	46880
Amirhossein Sahebkar	100	1307	46132
Marc Davis	99	412	50243
Wenjun Zhang	96	976	38530
Jian Xu	94	1366	52057
Fortunato Ciardiello	94	695	47352
Tong Zhang	93	414	36519
Michael E. J. Lean	92	411	30939
Ashish K. Jha	87	503	30020
Xin Zhang	87	1714	40102
Theunis Piersma	86	632	34201
George Varghese	84	253	28598