Author
Erich D. Jarvis
Other affiliations: Duke University, City University of New York, Rockefeller University
Bio: Erich D. Jarvis is an academic researcher from Howard Hughes Medical Institute. The author has contributed to research in topics: Genome & Vocal learning. The author has an h-index of 68 and has co-authored 215 publications receiving 21,093 citations. Previous affiliations of Erich D. Jarvis include Duke University & City University of New York.
Topics: Genome, Vocal learning, Medicine, Genomics, Biology
Papers
••
Nippon Telegraph and Telephone1, Yokohama City University2, Keio University3, University of Tsukuba4, University of Queensland5, J. Craig Venter Institute6, National Institutes of Health7, Osaka University8, Novartis9, Boys Town10, Medical Research Council11, Scripps Research Institute12, University of Oregon13, Rockefeller University14, University of Milan15, Discovery Institute16, Harvard University17, University of Tokyo18, University of Edinburgh19, Duke University20, University of Texas Southwestern Medical Center21, Karolinska Institutet22, Cambridge University Hospitals NHS Foundation Trust23, Canberra Hospital24, Hyogo College of Medicine25, Wellcome Trust Sanger Institute26, University of California, San Diego27, University of Bonn28, Washington University in St. Louis29, Massachusetts Institute of Technology30
TL;DR: The present work, completely supported by physical clones, provides the most comprehensive survey of a mammalian transcriptome so far, and is a valuable resource for functional genomics.
Abstract: Only a small proportion of the mouse genome is transcribed into mature messenger RNA transcripts. There is an international collaborative effort to identify all full-length mRNA transcripts from the mouse, and to ensure that each is represented in a physical collection of clones. Here we report the manual annotation of 60,770 full-length mouse complementary DNA sequences. These are clustered into 33,409 'transcriptional units', contributing 90.1% of a newly established mouse transcriptome database. Of these transcriptional units, 4,258 are new protein-coding and 11,665 are new non-coding messages, indicating that non-coding RNA is a major component of the transcriptome. 41% of all transcriptional units showed evidence of alternative splicing. In protein-coding transcripts, 79% of splice variations altered the protein product. Whole-transcriptome analyses resulted in the identification of 2,431 sense-antisense pairs. The present work, completely supported by physical clones, provides the most comprehensive survey of a mammalian transcriptome so far, and is a valuable resource for functional genomics.
1,663 citations
••
Duke University1, University of Texas at Austin2, Heidelberg Institute for Theoretical Studies3, Xi'an Jiaotong University4, Beijing Genomics Institute5, American Museum of Natural History6, New Mexico State University7, University of Sydney8, University of California9, Uppsala University10, University of Copenhagen11, Okinawa Institute of Science and Technology12, University of Georgia13, Griffith University14, Catalan Institution for Research and Advanced Studies15, Oak Ridge National Laboratory16, Joint Institute for Nuclear Research17, Aarhus University18, Washington University in St. Louis19, University of California, Santa Cruz20, Cardiff University21, Kunming Institute of Zoology22, China Agricultural University23, Tulane University24, Louisiana State University25, Copenhagen Zoo26, Federal University of Pará27, Oregon Health & Science University28, Technical University of Denmark29, Canterbury Museum30, Curtin University31, Novosibirsk State University32, Smithsonian Institution33, National University of Singapore34, National Museum of Natural History35, Nova Southeastern University36, Occidental College37, University of Edinburgh38, Harvard University39, University of California, San Francisco40, University of Florida41, University of Illinois at Urbana–Champaign42
TL;DR: A genome-scale phylogenetic analysis of 48 species representing all orders of Neoaves recovered a highly resolved tree that confirms previously controversial sister or close relationships and identifies the first divergence in Neoaves, two groups the authors named Passerea and Columbea.
Abstract: To better determine the history of modern birds, we performed a genome-scale phylogenetic analysis of 48 species representing all orders of Neoaves using phylogenomic methods created to handle genome-scale data. We recovered a highly resolved tree that confirms previously controversial sister or close relationships. We identified the first divergence in Neoaves, two groups we named Passerea and Columbea, representing independent lineages of diverse and convergently evolved land and water bird species. Among Passerea, we infer the common ancestor of core landbirds to have been an apex predator and confirm independent gains of vocal learning. Among Columbea, we identify pigeons and flamingoes as belonging to sister clades. Even with whole genomes, some of the earliest branches in Neoaves proved challenging to resolve, which was best explained by massive protein-coding sequence convergence and high levels of incomplete lineage sorting that occurred during a rapid radiation after the Cretaceous-Paleogene mass extinction event about 66 million years ago.
1,624 citations
••
University of Tennessee Health Science Center1, University of Washington2, Creighton University3, George Mason University4, Semmelweis University5, University of Arkansas System6, University of Murcia7, Neuroscience Research Australia8, University of South Florida9, University of California, Irvine10, University of Auckland11, Johns Hopkins University12, City University of New York13, Ruhr University Bochum14, California State University, Long Beach15, Oregon Health & Science University16, St. John's University17, University of California, Los Angeles18, Bowling Green State University19, Duke University20, Newcastle University21, University of Chicago22
TL;DR: The standard nomenclature that has been used for many telencephalic and related brainstem structures in birds is reviewed, with a rationale for each name change and evidence for any homologies implied by the new names.
Abstract: The standard nomenclature that has been used for many telencephalic and related brainstem structures in birds is based on flawed assumptions of homology to mammals. In particular, the outdated terminology implies that most of the avian telencephalon is a hypertrophied basal ganglia, when it is now clear that most of the avian telencephalon is neurochemically, hodologically, and functionally comparable to the mammalian neocortex, claustrum, and pallial amygdala (all of which derive from the pallial sector of the developing telencephalon). Recognizing that this promotes misunderstanding of the functional organization of avian brains and their evolutionary relationship to mammalian brains, avian brain specialists began discussions to rectify this problem, culminating in the Avian Brain Nomenclature Forum held at Duke University in July 2002, which approved a new terminology for avian telencephalon and some allied brainstem cell groups. Details of this new terminology are presented here, as is a rationale for each name change and evidence for any homologies implied by the new names. Revisions for the brainstem focused on vocal control, catecholaminergic, cholinergic, and basal ganglia-related nuclei. For example, the Forum recognized that the hypoglossal nucleus had been incorrectly identified as the nucleus intermedius in the Karten and Hodos (1967) pigeon brain atlas, and what was identified as the hypoglossal nucleus in that atlas should instead be called the supraspinal nucleus. The locus ceruleus of this and other avian atlases was noted to consist of a caudal noradrenergic part homologous to the mammalian locus coeruleus and a rostral region corresponding to the mammalian A8 dopaminergic cell group. 
The midbrain dopaminergic cell group in birds known as the nucleus tegmenti pedunculopontinus pars compacta was recognized as homologous to the mammalian substantia nigra pars compacta and was renamed accordingly; a group of gamma-aminobutyric acid (GABA)ergic neurons at the lateral edge of this region was identified as homologous to the mammalian substantia nigra pars reticulata and was also renamed accordingly. A field of cholinergic neurons in the rostral avian hindbrain was named the nucleus pedunculopontinus tegmenti, whereas the anterior nucleus of the ansa lenticularis in the avian diencephalon was renamed the subthalamic nucleus, both for their evident mammalian homologues. For the basal (i.e., subpallial) telencephalon, the actual parts of the basal ganglia were given names reflecting their now evident homologues. For example, the lobus parolfactorius and paleostriatum augmentatum were acknowledged to make up the dorsal subdivision of the striatal part of the basal ganglia and were renamed as the medial and lateral striatum. The paleostriatum primitivum was recognized as homologous to the mammalian globus pallidus and renamed as such. Additionally, the rostroventral part of what was called the lobus parolfactorius was acknowledged as comparable to the mammalian nucleus accumbens, which, together with the olfactory tubercle, was noted to be part of the ventral striatum in birds. A ventral pallidum, a basal cholinergic cell group, and medial and lateral bed nuclei of the stria terminalis were also recognized. The dorsal (i.e., pallial) telencephalic regions that had been erroneously named to reflect presumed homology to striatal parts of mammalian basal ganglia were renamed as part of the pallium, using prefixes that retain most established abbreviations, to maintain continuity with the outdated nomenclature. 
We concluded, however, that one-to-one (i.e., discrete) homologies with mammals are still uncertain for most of the telencephalic pallium in birds and thus the new pallial terminology is largely devoid of assumptions of one-to-one homologies with mammals. The sectors of the hyperstriatum composing the Wulst (i.e., the hyperstriatum accessorium, intermedium, and dorsale), the hyperstriatum ventrale, the neostriatum, and the archistriatum have been renamed (respectively) the hyperpallium (hypertrophied pallium), the mesopallium (middle pallium), the nidopallium (nest pallium), and the arcopallium (arched pallium). The posterior part of the archistriatum has been renamed the posterior pallial amygdala, the nucleus taeniae recognized as part of the avian amygdala, and a region inferior to the posterior paleostriatum primitivum included as a subpallial part of the avian amygdala. The names of some of the laminae and fiber tracts were also changed to reflect current understanding of the location of pallial and subpallial sectors of the avian telencephalon. Notably, the lamina medullaris dorsalis has been renamed the pallial-subpallial lamina. We urge all to use this new terminology, because we believe it will promote better communication among neuroscientists. Further information is available at http://avianbrain.org
1,061 citations
••
TL;DR: This work introduces a correction algorithm and assembly strategy that uses short, high-fidelity sequences to correct the error in single-molecule sequences, leading to substantially better assemblies than current sequencing strategies.
Abstract: Single-molecule sequencing instruments can generate multikilobase sequences with the potential to greatly improve genome and transcriptome assembly. However, the error rates of single-molecule reads are high, which has limited their use thus far to resequencing bacteria. To address this limitation, we introduce a correction algorithm and assembly strategy that uses short, high-fidelity sequences to correct the error in single-molecule sequences. We demonstrate the utility of this approach on reads generated by a PacBio RS instrument from phage, prokaryotic and eukaryotic whole genomes, including the previously unsequenced genome of the parrot Melopsittacus undulatus, as well as for RNA-Seq reads of the corn (Zea mays) transcriptome. Our long-read correction achieves >99.9% base-call accuracy, leading to substantially better assemblies than current sequencing strategies: in the best example, the median contig size was quintupled relative to high-coverage, second-generation assemblies. Greater gains are predicted if read lengths continue to increase, including the prospect of single-contig bacterial chromosome assembly.
987 citations
••
University of Copenhagen1, Beijing Genomics Institute2, Royal Veterinary College3, Seoul National University4, University of Nebraska–Lincoln5, University of Porto6, University of South Carolina7, Montclair State University8, Uppsala University9, National University of Singapore10, University of California, Berkeley11, South China University of Technology12, Chinese Academy of Sciences13, Kunming Institute of Zoology14, Howard Hughes Medical Institute15, Aberystwyth University16, University of Kent17, University of California, Riverside18, Mississippi State University19, Austral University of Chile20, Swedish University of Agricultural Sciences21, China Agricultural University22, Cardiff University23, Copenhagen Zoo24, Louisiana State University25, Washington University in St. Louis26, Xi'an Jiaotong University27, University of California, Santa Cruz28, Nova Southeastern University Oceanographic Center29, Smithsonian Conservation Biology Institute30, National Museum of Natural History31, Natural History Museum32, University of California, San Francisco33, Harvard University34, University of Florida35, University of Edinburgh36, New Mexico State University37, Macau University of Science and Technology38, Curtin University39
TL;DR: This work explored bird macroevolution using full genomes from 48 avian species representing all major extant clades to reveal that pan-avian genomic diversity covaries with adaptations to different lifestyles and convergent evolution of traits.
Abstract: Birds are the most species-rich class of tetrapod vertebrates and have wide relevance across many research fields. We explored bird macroevolution using full genomes from 48 avian species representing all major extant clades. The avian genome is principally characterized by its constrained size, which predominantly arose because of lineage-specific erosion of repetitive elements, large segmental deletions, and gene loss. Avian genomes furthermore show a remarkably high degree of evolutionary stasis at the levels of nucleotide sequence, gene synteny, and chromosomal structure. Despite this pattern of conservation, we detected many non-neutral evolutionary changes in protein-coding genes and noncoding regions. These analyses reveal that pan-avian genomic diversity covaries with adaptations to different lifestyles and convergent evolution of traits.
872 citations
Cited by
28 Jul 2005
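TL;DR: PfPMP1 interacts with one or more receptors on infected erythrocytes, dendritic cells, and the placenta, and plays a key role in adhesion and immune evasion.
Abstract: Antigenic variation allows many pathogenic microorganisms to evade the host immune response. Plasmodium falciparum erythrocyte surface protein 1 (PfPMP1), expressed on the surface of infected erythrocytes, interacts with one or more receptors on infected erythrocytes, endothelial cells, dendritic cells, and the placenta, and plays a key role in adhesion and immune evasion. The var gene family encodes about 60 members per haploid genome, and switching transcription among different var gene variants provides the molecular basis for antigenic variation.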
18,940 citations
••
TL;DR: featureCounts is a read summarization program suitable for counting reads generated from either RNA or genomic DNA sequencing experiments; it implements highly efficient chromosome hashing and feature blocking techniques.
Abstract: MOTIVATION: Next-generation sequencing technologies generate millions of short sequence reads, which are usually aligned to a reference genome. In many applications, the key information required for downstream analysis is the number of reads mapping to each genomic feature, for example to each exon or each gene. The process of counting reads is called read summarization. Read summarization is required for a great variety of genomic analyses but has so far received relatively little attention in the literature. RESULTS: We present featureCounts, a read summarization program suitable for counting reads generated from either RNA or genomic DNA sequencing experiments. featureCounts implements highly efficient chromosome hashing and feature blocking techniques. It is considerably faster than existing methods (by an order of magnitude for gene-level summarization) and requires far less computer memory. It works with either single or paired-end reads and provides a wide range of options appropriate for different sequencing applications. AVAILABILITY AND IMPLEMENTATION: featureCounts is available under GNU General Public License as part of the Subread (http://subread.sourceforge.net) or Rsubread (http://www.bioconductor.org) software packages.
14,103 citations
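The read summarization task described in the abstract above (counting the reads that map to each genomic feature) can be illustrated with a minimal Python sketch. This is not featureCounts and does not use its chromosome hashing or feature blocking; it is a naive overlap scan. The function name `count_reads`, the tuple layouts, the coordinates, and the rule that a read overlapping zero or several features is left unassigned are all assumptions made for illustration.

```python
def count_reads(features, reads):
    """Naive read summarization by interval overlap.

    features: list of (name, chrom, start, end), 1-based inclusive (assumed layout)
    reads:    list of (chrom, start, end), 1-based inclusive (assumed layout)
    Returns (per-feature counts, number of unassigned reads).
    """
    counts = {name: 0 for name, _, _, _ in features}
    unassigned = 0
    for chrom, start, end in reads:
        # Collect every feature on the same chromosome that the read overlaps.
        hits = {
            name
            for name, f_chrom, f_start, f_end in features
            if f_chrom == chrom and start <= f_end and end >= f_start
        }
        if len(hits) == 1:
            counts[hits.pop()] += 1
        else:
            # Assumed policy: no overlap or ambiguous overlap is not counted.
            unassigned += 1
    return counts, unassigned


# Hypothetical toy annotation and aligned reads.
features = [
    ("geneA", "chr1", 100, 200),
    ("geneB", "chr1", 150, 300),
    ("geneC", "chr2", 50, 80),
]
reads = [
    ("chr1", 90, 120),   # overlaps geneA only
    ("chr1", 160, 180),  # overlaps geneA and geneB, so ambiguous
    ("chr1", 250, 280),  # overlaps geneB only
    ("chr2", 60, 70),    # overlaps geneC only
    ("chr3", 1, 10),     # no annotated feature on chr3
]
counts, unassigned = count_reads(features, reads)
print(counts, unassigned)
```

The scan compares every read against every feature, which is only workable for toy inputs; avoiding that cost is presumably what the indexing techniques named in the abstract are for.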
•
9,185 citations
••
TL;DR: Phylogenetic analysis of the retrieved rRNA sequence of an uncultured microorganism reveals its closest culturable relatives and may, together with information on the physicochemical conditions of its natural habitat, facilitate more directed cultivation attempts.
9,017 citations
•
24 Aug 2012
TL;DR: This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach, and is suitable for upper-level undergraduates with an introductory-level college math background and beginning graduate students.
Abstract: Today's Web-enabled deluge of electronic data calls for automated methods of data analysis. Machine learning provides these, developing methods that can automatically detect patterns in data and then use the uncovered patterns to predict future data. This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach. The coverage combines breadth and depth, offering necessary background material on such topics as probability, optimization, and linear algebra as well as discussion of recent developments in the field, including conditional random fields, L1 regularization, and deep learning. The book is written in an informal, accessible style, complete with pseudo-code for the most important algorithms. All topics are copiously illustrated with color images and worked examples drawn from such application domains as biology, text processing, computer vision, and robotics. Rather than providing a cookbook of different heuristic methods, the book stresses a principled model-based approach, often using the language of graphical models to specify models in a concise and intuitive way. Almost all the models described have been implemented in a MATLAB software package--PMTK (probabilistic modeling toolkit)--that is freely available online. The book is suitable for upper-level undergraduates with an introductory-level college math background and beginning graduate students.
8,059 citations