scispace - formally typeset
Search or ask a question
Institution

Yahoo!

CompanyLondon, United Kingdom
About: Yahoo! is a company organization based out in London, United Kingdom. It is known for research contribution in the topics: Population & Web search query. The organization has 26749 authors who have published 29915 publications receiving 732583 citations. The organization is also known as: Yahoo! Inc. & Maudwen-Yahoo! Inc.


Papers
More filters
Journal ArticleDOI
TL;DR: Recent experimental and clinical data support the hypothesis that cigarette smoke exposure increases oxidative stress as a potential mechanism for initiating cardiovascular dysfunction.

2,064 citations

Proceedings ArticleDOI
Christopher Olston1, Benjamin Reed1, Utkarsh Srivastava1, Ravi Kumar1, Andrew Tomkins1 
09 Jun 2008
TL;DR: A new language called Pig Latin is described, designed to fit in a sweet spot between the declarative style of SQL, and the low-level, procedural style of map-reduce, which is an open-source, Apache-incubator project, and available for general use.
Abstract: There is a growing need for ad-hoc analysis of extremely large data sets, especially at internet companies where innovation critically depends on being able to analyze terabytes of data collected every day. Parallel database products, e.g., Teradata, offer a solution, but are usually prohibitively expensive at this scale. Besides, many of the people who analyze this data are entrenched procedural programmers, who find the declarative, SQL style to be unnatural. The success of the more procedural map-reduce programming model, and its associated scalable implementations on commodity hardware, is evidence of the above. However, the map-reduce paradigm is too low-level and rigid, and leads to a great deal of custom user code that is hard to maintain, and reuse.We describe a new language called Pig Latin that we have designed to fit in a sweet spot between the declarative style of SQL, and the low-level, procedural style of map-reduce. The accompanying system, Pig, is fully implemented, and compiles Pig Latin into physical plans that are executed over Hadoop, an open-source, map-reduce implementation. We give a few examples of how engineers at Yahoo! are using Pig to dramatically reduce the time required for the development and execution of their data analysis tasks, compared to using Hadoop directly. We also report on a novel debugging environment that comes integrated with Pig, that can lead to even higher productivity gains. Pig is an open-source, Apache-incubator project, and available for general use.

2,058 citations

Book
17 Dec 2009
TL;DR: This work presents the PRF from a conceptual point of view, describing the probabilistic modelling assumptions behind the framework and the different ranking algorithms that result from its application: the binary independence model, relevance feedback models, BM25 and BM25F.
Abstract: The Probabilistic Relevance Framework (PRF) is a formal framework for document retrieval, grounded in work done in the 1970—1980s, which led to the development of one of the most successful text-retrieval algorithms, BM25. In recent years, research in the PRF has yielded new retrieval models capable of taking into account document meta-data (especially structure and link-graph information). Again, this has led to one of the most successful Web-search and corporate-search algorithms, BM25F. This work presents the PRF from a conceptual point of view, describing the probabilistic modelling assumptions behind the framework and the different ranking algorithms that result from its application: the binary independence model, relevance feedback models, BM25 and BM25F. It also discusses the relation between the PRF and other statistical models for IR, and covers some related topics, such as the use of non-textual features, and parameter optimisation for models with free parameters.

2,037 citations

Proceedings ArticleDOI
01 Oct 2013
TL;DR: The design, development, and current state of deployment of the next generation of Hadoop's compute platform: YARN is summarized, which decouples the programming model from the resource management infrastructure, and delegates many scheduling functions to per-application components.
Abstract: The initial design of Apache Hadoop [1] was tightly focused on running massive, MapReduce jobs to process a web crawl. For increasingly diverse companies, Hadoop has become the data and computational agora---the de facto place where data and computational resources are shared and accessed. This broad adoption and ubiquitous usage has stretched the initial design well beyond its intended target, exposing two key shortcomings: 1) tight coupling of a specific programming model with the resource management infrastructure, forcing developers to abuse the MapReduce programming model, and 2) centralized handling of jobs' control flow, which resulted in endless scalability concerns for the scheduler. In this paper, we summarize the design, development, and current state of deployment of the next generation of Hadoop's compute platform: YARN. The new architecture we introduced decouples the programming model from the resource management infrastructure, and delegates many scheduling functions (e.g., task fault-tolerance) to per-application components. We provide experimental evidence demonstrating the improvements we made, confirm improved efficiency by reporting the experience of running YARN on production environments (including 100% of Yahoo! grids), and confirm the flexibility claims by discussing the porting of several programming frameworks onto YARN viz. Dryad, Giraph, Hoya, Hadoop MapReduce, REEF, Spark, Storm, Tez.

2,006 citations

Journal ArticleDOI
TL;DR: A new taxonomy of living amphibians is proposed to correct the deficiencies of the old one, based on the largest phylogenetic analysis of living Amphibia so far accomplished, and many subsidiary taxa are demonstrated to be nonmonophyletic.
Abstract: The evidentiary basis of the currently accepted classification of living amphibians is discussed and shown not to warrant the degree of authority conferred on it by use and tradition. A new taxonomy of living amphibians is proposed to correct the deficiencies of the old one. This new taxonomy is based on the largest phylogenetic analysis of living Amphibia so far accomplished. We combined the comparative anatomical character evidence of Haas (2003) with DNA sequences from the mitochondrial transcription unit H1 (12S and 16S ribosomal RNA and tRNAValine genes, ≈ 2,400 bp of mitochondrial sequences) and the nuclear genes histone H3, rhodopsin, tyrosinase, and seven in absentia, and the large ribosomal subunit 28S (≈ 2,300 bp of nuclear sequences; ca. 1.8 million base pairs; x = 3.7 kb/terminal). The dataset includes 532 terminals sampled from 522 species representative of the global diversity of amphibians as well as seven of the closest living relatives of amphibians for outgroup comparisons. The...

1,994 citations


Authors

Showing all 26766 results

NameH-indexPapersCitations
Ashok Kumar1515654164086
Alexander J. Smola122434110222
Howard I. Maibach116182160765
Sanjay Jain10388146880
Amirhossein Sahebkar100130746132
Marc Davis9941250243
Wenjun Zhang9697638530
Jian Xu94136652057
Fortunato Ciardiello9469547352
Tong Zhang9341436519
Michael E. J. Lean9241130939
Ashish K. Jha8750330020
Xin Zhang87171440102
Theunis Piersma8663234201
George Varghese8425328598
Network Information
Related Institutions (5)
University of Toronto
294.9K papers, 13.5M citations

85% related

University of California, San Diego
204.5K papers, 12.3M citations

85% related

University College London
210.6K papers, 9.8M citations

84% related

Cornell University
235.5K papers, 12.2M citations

84% related

University of Washington
305.5K papers, 17.7M citations

84% related

Performance
Metrics
No. of papers from the Institution in previous years
YearPapers
20232
202247
20211,088
20201,074
20191,568
20181,352