Institution
Yahoo!
Company•London, United Kingdom•
About: Yahoo! is a company organization based out in London, United Kingdom. It is known for research contribution in the topics: Population & Web search query. The organization has 26749 authors who have published 29915 publications receiving 732583 citations. The organization is also known as: Yahoo! Inc. & Maudwen-Yahoo! Inc.
Papers published on a yearly basis
Papers
More filters
••
TL;DR: Recent experimental and clinical data support the hypothesis that cigarette smoke exposure increases oxidative stress as a potential mechanism for initiating cardiovascular dysfunction.
2,064 citations
••
09 Jun 2008TL;DR: A new language called Pig Latin is described, designed to fit in a sweet spot between the declarative style of SQL, and the low-level, procedural style of map-reduce, which is an open-source, Apache-incubator project, and available for general use.
Abstract: There is a growing need for ad-hoc analysis of extremely large data sets, especially at internet companies where innovation critically depends on being able to analyze terabytes of data collected every day. Parallel database products, e.g., Teradata, offer a solution, but are usually prohibitively expensive at this scale. Besides, many of the people who analyze this data are entrenched procedural programmers, who find the declarative, SQL style to be unnatural. The success of the more procedural map-reduce programming model, and its associated scalable implementations on commodity hardware, is evidence of the above. However, the map-reduce paradigm is too low-level and rigid, and leads to a great deal of custom user code that is hard to maintain, and reuse.We describe a new language called Pig Latin that we have designed to fit in a sweet spot between the declarative style of SQL, and the low-level, procedural style of map-reduce. The accompanying system, Pig, is fully implemented, and compiles Pig Latin into physical plans that are executed over Hadoop, an open-source, map-reduce implementation. We give a few examples of how engineers at Yahoo! are using Pig to dramatically reduce the time required for the development and execution of their data analysis tasks, compared to using Hadoop directly. We also report on a novel debugging environment that comes integrated with Pig, that can lead to even higher productivity gains. Pig is an open-source, Apache-incubator project, and available for general use.
2,058 citations
•
17 Dec 2009TL;DR: This work presents the PRF from a conceptual point of view, describing the probabilistic modelling assumptions behind the framework and the different ranking algorithms that result from its application: the binary independence model, relevance feedback models, BM25 and BM25F.
Abstract: The Probabilistic Relevance Framework (PRF) is a formal framework for document retrieval, grounded in work done in the 1970—1980s, which led to the development of one of the most successful text-retrieval algorithms, BM25. In recent years, research in the PRF has yielded new retrieval models capable of taking into account document meta-data (especially structure and link-graph information). Again, this has led to one of the most successful Web-search and corporate-search algorithms, BM25F. This work presents the PRF from a conceptual point of view, describing the probabilistic modelling assumptions behind the framework and the different ranking algorithms that result from its application: the binary independence model, relevance feedback models, BM25 and BM25F. It also discusses the relation between the PRF and other statistical models for IR, and covers some related topics, such as the use of non-textual features, and parameter optimisation for models with free parameters.
2,037 citations
••
01 Oct 2013TL;DR: The design, development, and current state of deployment of the next generation of Hadoop's compute platform: YARN is summarized, which decouples the programming model from the resource management infrastructure, and delegates many scheduling functions to per-application components.
Abstract: The initial design of Apache Hadoop [1] was tightly focused on running massive, MapReduce jobs to process a web crawl. For increasingly diverse companies, Hadoop has become the data and computational agora---the de facto place where data and computational resources are shared and accessed. This broad adoption and ubiquitous usage has stretched the initial design well beyond its intended target, exposing two key shortcomings: 1) tight coupling of a specific programming model with the resource management infrastructure, forcing developers to abuse the MapReduce programming model, and 2) centralized handling of jobs' control flow, which resulted in endless scalability concerns for the scheduler. In this paper, we summarize the design, development, and current state of deployment of the next generation of Hadoop's compute platform: YARN. The new architecture we introduced decouples the programming model from the resource management infrastructure, and delegates many scheduling functions (e.g., task fault-tolerance) to per-application components. We provide experimental evidence demonstrating the improvements we made, confirm improved efficiency by reporting the experience of running YARN on production environments (including 100% of Yahoo! grids), and confirm the flexibility claims by discussing the porting of several programming frameworks onto YARN viz. Dryad, Giraph, Hoya, Hadoop MapReduce, REEF, Spark, Storm, Tez.
2,006 citations
••
American Museum of Natural History1, Columbia University2, University of Hamburg3, Sao Paulo State University4, University of Richmond5, University of the Western Cape6, Natural History Museum7, University of Texas at Arlington8, Yahoo!9, Florida Fish and Wildlife Conservation Commission10, California Academy of Sciences11, University of Michigan12, National University of Colombia13, McGill University14
TL;DR: A new taxonomy of living amphibians is proposed to correct the deficiencies of the old one, based on the largest phylogenetic analysis of living Amphibia so far accomplished, and many subsidiary taxa are demonstrated to be nonmonophyletic.
Abstract: The evidentiary basis of the currently accepted classification of living amphibians is discussed and shown not to warrant the degree of authority conferred on it by use and tradition. A new taxonomy of living amphibians is proposed to correct the deficiencies of the old one. This new taxonomy is based on the largest phylogenetic analysis of living Amphibia so far accomplished. We combined the comparative anatomical character evidence of Haas (2003) with DNA sequences from the mitochondrial transcription unit H1 (12S and 16S ribosomal RNA and tRNAValine genes, ≈ 2,400 bp of mitochondrial sequences) and the nuclear genes histone H3, rhodopsin, tyrosinase, and seven in absentia, and the large ribosomal subunit 28S (≈ 2,300 bp of nuclear sequences; ca. 1.8 million base pairs; x = 3.7 kb/terminal). The dataset includes 532 terminals sampled from 522 species representative of the global diversity of amphibians as well as seven of the closest living relatives of amphibians for outgroup comparisons. The...
1,994 citations
Authors
Showing all 26766 results
Name | H-index | Papers | Citations |
---|---|---|---|
Ashok Kumar | 151 | 5654 | 164086 |
Alexander J. Smola | 122 | 434 | 110222 |
Howard I. Maibach | 116 | 1821 | 60765 |
Sanjay Jain | 103 | 881 | 46880 |
Amirhossein Sahebkar | 100 | 1307 | 46132 |
Marc Davis | 99 | 412 | 50243 |
Wenjun Zhang | 96 | 976 | 38530 |
Jian Xu | 94 | 1366 | 52057 |
Fortunato Ciardiello | 94 | 695 | 47352 |
Tong Zhang | 93 | 414 | 36519 |
Michael E. J. Lean | 92 | 411 | 30939 |
Ashish K. Jha | 87 | 503 | 30020 |
Xin Zhang | 87 | 1714 | 40102 |
Theunis Piersma | 86 | 632 | 34201 |
George Varghese | 84 | 253 | 28598 |