Author

Ye Yin

Other affiliations: University of Copenhagen, Beijing Institute of Genomics

Bio: Ye Yin is an academic researcher from Beijing Genomics Institute. The author has contributed to research on topics including genome and whole genome sequencing. The author has an h-index of 28 and has co-authored 40 publications receiving 22,902 citations. Previous affiliations of Ye Yin include University of Copenhagen and Beijing Institute of Genomics.

Topics: Genome, Whole genome sequencing, Population, Domestication, Exome sequencing ...

Papers

Journal Article•DOI•

A global reference for human genetic variation.

Adam Auton¹, Gonçalo R. Abecasis², David Altshuler³, Richard Durbin⁴ +514 more•Institutions (90)

01 Oct 2015-Nature

TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and has reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.

Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.

12,661 citations

A global reference for human genetic variation

Adam Auton, Gonçalo R. Abecasis, David Altshuler, Richard Durbin +476 more

01 Oct 2015

TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations; this paper reports the completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.

3,247 citations

Journal Article•DOI•

Dynamics and Stabilization of the Human Gut Microbiome during the First Year of Life.

Fredrik Bäckhed¹, Fredrik Bäckhed², Josefine Roswall², Yangqing Peng, Qiang Feng¹, Huijue Jia, Petia Kovatcheva-Datchary², Yin Li, Yan Xia, Hailiang Xie, Huanzi Zhong, Muhammad Tanweer Khan², Jianfeng Zhang, Junhua Li, Liang Xiao, Jumana Y. Al-Aama³, Dongya Zhang, Ying Shiuan Lee², Dorota Ewa Kotowska¹, Camilla Colding¹, Valentina Tremaroli², Ye Yin, Stefan Bergman², Xun Xu, Lise Madsen¹, Lise Madsen⁴, Karsten Kristiansen¹, Jovanna Dahlgren², Jun Wang - Show less +25 more•Institutions (4)

University of Copenhagen¹, University of Gothenburg², King Abdulaziz University³, National Institute of Nutrition and Seafood Research⁴

13 May 2015-Cell Host & Microbe

TL;DR: The gut microbiota of infants delivered by C-section showed significantly less resemblance to their mothers and nutrition had a major impact on early microbiota composition and function, with cessation of breast-feeding, rather than introduction of solid food, being required for maturation into an adult-like microbiota.

2,227 citations

Journal Article•DOI•

The genome of the mesopolyploid crop species Brassica rapa

Xiaowu Wang¹, Hanzhong Wang, Jun Wang², Jun Wang³, Jun Wang⁴, Rifei Sun, Jian Wu, Shengyi Liu, Yinqi Bai⁴, Jeong-Hwan Mun⁵, Ian Bancroft⁶, Feng Cheng, Sanwen Huang, Xixiang Li, Wei Hua, Junyi Wang⁴, Xiyin Wang⁷, Xiyin Wang⁸, Michael Freeling⁹, J. Chris Pires¹⁰, Andrew H. Paterson⁷, Boulos Chalhoub, Bo Wang⁴, Alice Hayward¹¹, Alice Hayward¹², Andrew G. Sharpe¹³, Beom-Seok Park⁵, Bernd Weisshaar¹⁴, Binghang Liu⁴, Bo Li⁴, Bo Liu, Chaobo Tong, Chi Song⁴, Chris Duran¹², Chris Duran¹⁵, Chunfang Peng⁴, Geng Chunyu⁴, Chushin Koh¹³, Chuyu Lin⁴, David Edwards¹², David Edwards¹⁵, Desheng Mu⁴, Di Shen, Eleni Soumpourou⁶, Fei Li, Fiona Fraser⁶, Gavin C. Conant¹⁰, Gilles Lassalle¹⁶, Graham J.W. King², Guusje Bonnema¹⁷, Haibao Tang⁹, Haiping Wang, Harry Belcram, Heling Zhou⁴, Hideki Hirakawa, Hiroshi Abe, Hui Guo⁷, Hui Wang, Huizhe Jin⁷, Isobel A. P. Parkin¹⁸, Jacqueline Batley¹¹, Jacqueline Batley¹², Jeong-Sun Kim⁵, Jérémy Just, Jianwen Li⁴, Jiaohui Xu⁴, Jie Deng, Jin A Kim⁵, Jingping Li⁷, Jingyin Yu, Jinling Meng¹⁹, Jinpeng Wang⁸, Jiumeng Min⁴, Julie Poulain²⁰, Katsunori Hatakeyama, Kui Wu⁴, Li Wang⁸, Lu Fang, Martin Trick⁶, Matthew G. Links¹⁸, Meixia Zhao, Mina Jin⁵, Nirala Ramchiary²¹, Nizar Drou²², Paul J. Berkman¹⁵, Paul J. Berkman¹², Qingle Cai⁴, Quanfei Huang⁴, Ruiqiang Li⁴, Satoshi Tabata, Shifeng Cheng⁴, Shu Zhang⁴, Shujiang Zhang, Shunmou Huang, Shusei Sato, Silong Sun, Soo-Jin Kwon⁵, Su-Ryun Choi²¹, Tae-Ho Lee⁷, Wei Fan⁴, Xiang Zhao⁴, Xu Tan⁷, Xun Xu⁴, Yan Wang, Yang Qiu, Ye Yin⁴, Yingrui Li⁴, Yongchen Du, Yongcui Liao, Yong Pyo Lim²¹, Yoshihiro Narusaka, Yupeng Wang⁸, Zhenyi Wang⁸, Zhenyu Li⁴, Zhiwen Wang⁴, Zhiyong Xiong¹⁰, Zhonghua Zhang - Show less +113 more•Institutions (22)

Chinese Academy of Agricultural Sciences¹, Rothamsted Research², University of Copenhagen³, Beijing Institute of Genomics⁴, Rural Development Administration⁵, John Innes Centre⁶, University of Georgia⁷, North China University of Science and Technology⁸, University of California, Berkeley⁹, University of Missouri¹⁰, Australian Research Council¹¹, University of Queensland¹², National Research Council¹³, Bielefeld University¹⁴, Australian Centre for Plant Functional Genomics¹⁵, University of Rennes¹⁶, Wageningen University and Research Centre¹⁷, Agriculture and Agri-Food Canada¹⁸, Huazhong Agricultural University¹⁹, French Alternative Energies and Atomic Energy Commission²⁰, Chungnam National University²¹, Norwich Research Park²²

01 Oct 2011-Nature Genetics

TL;DR: The annotation and analysis of the draft genome sequence of Brassica rapa accession Chiifu-401-42, a Chinese cabbage, are reported, with Arabidopsis thaliana used as an outgroup for investigating the consequences of genome triplication, such as structural and functional evolution.

Abstract: We report the annotation and analysis of the draft genome sequence of Brassica rapa accession Chiifu-401-42, a Chinese cabbage. We modeled 41,174 protein coding genes in the B. rapa genome, which has undergone genome triplication. We used Arabidopsis thaliana as an outgroup for investigating the consequences of genome triplication, such as structural and functional evolution. The extent of gene loss (fractionation) among triplicated genome segments varies, with one of the three copies consistently retaining a disproportionately large fraction of the genes expected to have been present in its ancestor. Variation in the number of members of gene families present in the genome may contribute to the remarkable morphological plasticity of Brassica species. The B. rapa genome sequence provides an important resource for studying the evolution of polyploid genomes and underpins the genetic improvement of Brassica oil and vegetable crops.

1,811 citations

Journal Article•DOI•

The oyster genome reveals stress adaptation and complexity of shell formation

Guofan Zhang¹, Xiaodong Fang, Ximing Guo², Li Li, Ruibang Luo, Fei Xu, Pengcheng Yang, Linlin Zhang, Xiaotong Wang, Haigang Qi, Zhiqiang Xiong, Huayong Que, Yinlong Xie, Peter W. H. Holland³, Jordi Paps³, Yabing Zhu, Fucun Wu, Yuanxin Chen, Jiafeng Wang, Chunfang Peng, Jie Meng, Lan Yang, Jun Liu, Bo Wen, Na Zhang, Zhiyong Huang, Qihui Zhu, Yue Feng, Andrew S. Mount⁴, Dennis Hedgecock⁵, Zhe Xu⁶, Yunjie Liu, Tomislav Domazet-Lošo, Yishuai Du, Xiaoqing Sun, Shoudu Zhang, Binghang Liu, Peizhou Cheng, Xuanting Jiang, Juan Li, Dingding Fan, Wei Wang, Wenjing Fu, Tong Wang, Bo Wang, Jibiao Zhang, Zhiyu Peng, Yingxiang Li, Na Li, Jinpeng Wang, Maoshan Chen, Yan He², Fengji Tan, Xiaorui Song, Qiumei Zheng, Ronglian Huang, Hailong Yang, Du Xuedi, Li Chen, Mei Yang, Patrick M. Gaffney⁷, Shan Wang², Longhai Luo, Zhicai She, Yao Ming, Huang Wen, Shu Zhang, Baoyu Huang, Yong Zhang, Tao Qu, Peixiang Ni, Guoying Miao, Junyi Wang, Qiang Wang, Christian E. W. Steinberg⁸, Haiyan Wang, Ning Li, Lumin Qian², Guojie Zhang, Yingrui Li, Huanming Yang, Xiao Liu, Jian Wang, Ye Yin, Jun Wang⁹ - Show less +81 more•Institutions (9)

Chinese Academy of Sciences¹, Rutgers University², University of Oxford³, Clemson University⁴, University of Southern California⁵, Atlantic Cape Community College⁶, University of Delaware⁷, Humboldt University of Berlin⁸, University of Copenhagen⁹

04 Oct 2012-Nature

TL;DR: The sequencing and assembly of the oyster genome using short reads and a fosmid-pooling strategy are reported, along with transcriptomes of development and stress response and the proteome of the shell; the analyses show that shell formation in molluscs is more complex than currently understood and involves extensive participation of cells and their exosomes.

Abstract: The Pacific oyster Crassostrea gigas belongs to one of the most species-rich but genomically poorly explored phyla, the Mollusca. Here we report the sequencing and assembly of the oyster genome using short reads and a fosmid-pooling strategy, along with transcriptomes of development and stress response and the proteome of the shell. The oyster genome is highly polymorphic and rich in repetitive sequences, with some transposable elements still actively shaping variation. Transcriptome studies reveal an extensive set of genes responding to environmental stress. The expansion of genes coding for heat shock protein 70 and inhibitors of apoptosis is probably central to the oyster's adaptation to sessile life in the highly stressful intertidal zone. Our analyses also show that shell formation in molluscs is more complex than currently understood and involves extensive participation of cells and their exosomes. The oyster genome sequence fills a void in our understanding of the Lophotrochozoa.

1,806 citations

Cited by

Journal Article•DOI•

limma powers differential expression analyses for RNA-sequencing and microarray studies

Matthew E. Ritchie¹, Belinda Phipson², Di Wu³, Yifang Hu¹, Charity W. Law⁴, Wei Shi¹, Gordon K. Smyth¹, Gordon K. Smyth⁵ - Show less +4 more•Institutions (5)

Walter and Eliza Hall Institute of Medical Research¹, Royal Children's Hospital², Harvard University³, University of Zurich⁴, University of Melbourne⁵

20 Apr 2015-Nucleic Acids Research

TL;DR: The philosophy and design of the limma package is reviewed, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.

Abstract: limma is an R/Bioconductor software package that provides an integrated solution for analysing data from gene expression experiments. It contains rich features for handling complex experimental designs and for information borrowing to overcome the problem of small sample sizes. Over the past decade, limma has been a popular choice for gene discovery through differential expression analyses of microarray and high-throughput PCR data. The package contains particularly strong facilities for reading, normalizing and exploring such data. Recently, the capabilities of limma have been significantly expanded in two important directions. First, the package can now perform both differential expression and differential splicing analyses of RNA sequencing (RNA-seq) data. All the downstream analysis tools previously restricted to microarray data are now available for RNA-seq as well. These capabilities allow users to analyse both RNA-seq and microarray data with very similar pipelines. Second, the package is now able to go past the traditional gene-wise expression analyses in a variety of ways, analysing expression profiles in terms of co-regulated sets of genes or in terms of higher-order expression signatures. This provides enhanced possibilities for biological interpretation of gene expression differences. This article reviews the philosophy and design of the limma package, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.

22,147 citations

Journal Article•DOI•

Analysis of protein-coding genetic variation in 60,706 humans

Monkol Lek, Konrad J. Karczewski¹, Konrad J. Karczewski², Eric Vallabh Minikel², Eric Vallabh Minikel¹, Kaitlin E. Samocha, Eric Banks¹, Timothy Fennell¹, Anne H. O’Donnell-Luria¹, Anne H. O’Donnell-Luria², Anne H. O’Donnell-Luria³, James S. Ware, Andrew J. Hill⁴, Andrew J. Hill², Andrew J. Hill¹, Beryl B. Cummings², Beryl B. Cummings¹, Taru Tukiainen², Taru Tukiainen¹, Daniel P. Birnbaum¹, Jack A. Kosmicki, Laramie E. Duncan¹, Laramie E. Duncan², Karol Estrada¹, Karol Estrada², Fengmei Zhao², Fengmei Zhao¹, James Zou¹, Emma Pierce-Hoffman², Emma Pierce-Hoffman¹, Joanne Berghout⁵, David Neil Cooper⁶, Nicole A. Deflaux⁷, Mark A. DePristo¹, Ron Do, Jason Flannick², Jason Flannick¹, Menachem Fromer, Laura D. Gauthier¹, Jackie Goldstein¹, Jackie Goldstein², Namrata Gupta¹, Daniel P. Howrigan², Daniel P. Howrigan¹, Adam Kiezun¹, Mitja I. Kurki¹, Mitja I. Kurki², Ami Levy Moonshine¹, Pradeep Natarajan, Lorena Orozco, Gina M. Peloso², Gina M. Peloso¹, Ryan Poplin¹, Manuel A. Rivas¹, Valentin Ruano-Rubio¹, Samuel A. Rose¹, Douglas M. Ruderfer⁸, Khalid Shakir¹, Peter D. Stenson⁶, Christine Stevens¹, Brett Thomas², Brett Thomas¹, Grace Tiao¹, María Teresa Tusié-Luna, Ben Weisburd¹, Hong-Hee Won⁹, Dongmei Yu, David Altshuler¹, David Altshuler¹⁰, Diego Ardissino, Michael Boehnke¹¹, John Danesh¹², Stacey Donnelly¹, Roberto Elosua, Jose C. Florez², Jose C. Florez¹, Stacey Gabriel¹, Gad Getz², Gad Getz¹, Stephen J. Glatt¹³, Christina M. Hultman¹⁴, Sekar Kathiresan, Markku Laakso¹⁵, Steven A. McCarroll², Steven A. McCarroll¹, Mark I. McCarthy¹⁶, Mark I. McCarthy¹⁷, Dermot P.B. McGovern¹⁸, Ruth McPherson¹⁹, Benjamin M. Neale², Benjamin M. Neale¹, Aarno Palotie, Shaun Purcell⁸, Danish Saleheen²⁰, Jeremiah M. Scharf, Pamela Sklar, Patrick F. Sullivan²¹, Patrick F. Sullivan¹⁴, Jaakko Tuomilehto²², Ming T. Tsuang²³, Hugh Watkins¹⁷, Hugh Watkins¹⁶, James G. Wilson²⁴, Mark J. Daly², Mark J. Daly¹, Daniel G. MacArthur¹, Daniel G. MacArthur² - Show less +103 more•Institutions (24)

Broad Institute¹, Harvard University², Boston Children's Hospital³, University of Washington⁴, University of Arizona⁵, Cardiff University⁶, Google⁷, Icahn School of Medicine at Mount Sinai⁸, Samsung Medical Center⁹, Vertex Pharmaceuticals¹⁰, University of Michigan¹¹, University of Cambridge¹², State University of New York Upstate Medical University¹³, Karolinska Institutet¹⁴, University of Eastern Finland¹⁵, University of Oxford¹⁶, Wellcome Trust Centre for Human Genetics¹⁷, Cedars-Sinai Medical Center¹⁸, University of Ottawa¹⁹, University of Pennsylvania²⁰, University of North Carolina at Chapel Hill²¹, University of Helsinki²², University of California, San Diego²³, University of Mississippi Medical Center²⁴

18 Aug 2016-Nature

TL;DR: The aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC) provides direct evidence for the presence of widespread mutational recurrence.

Abstract: Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.

8,758 citations

Journal Article•DOI•

Salmon provides fast and bias-aware quantification of transcript expression

Rob Patro¹, Geet Duggal, Michael I. Love², Rafael A. Irizarry², Carl Kingsford³ - Show less +1 more•Institutions (3)

Stony Brook University¹, Harvard University², Carnegie Mellon University³

01 Apr 2017-Nature Methods

TL;DR: Salmon is the first transcriptome-wide quantifier to correct for fragment GC-content bias, which substantially improves the accuracy of abundance estimates and the sensitivity of subsequent differential expression analysis.

Abstract: We introduce Salmon, a lightweight method for quantifying transcript abundance from RNA-seq reads. Salmon combines a new dual-phase parallel inference algorithm and feature-rich bias models with an ultra-fast read mapping procedure. It is the first transcriptome-wide quantifier to correct for fragment GC-content bias, which, as we demonstrate here, substantially improves the accuracy of abundance estimates and the sensitivity of subsequent differential expression analysis.

6,095 citations

Journal Article•DOI•

The UK Biobank resource with deep phenotyping and genomic data

Clare Bycroft¹, Colin Freeman¹, Desislava Petkova², Desislava Petkova¹, Gavin Band¹, Lloyd T. Elliott¹, Kevin Sharp¹, Allan Motyer³, Damjan Vukcevic³, Olivier Delaneau⁴, Olivier Delaneau⁵, Jared O'Connell⁶, Adrian Cortes¹, Adrian Cortes⁷, Samantha Welsh, Alan Young¹, Mark Effingham, Gil McVean¹, Stephen Leslie³, Naomi E. Allen¹, Peter Donnelly¹, Jonathan Marchini¹ - Show less +18 more•Institutions (7)

University of Oxford¹, Procter & Gamble², University of Melbourne³, University of Geneva⁴, Swiss Institute of Bioinformatics⁵, Illumina⁶, John Radcliffe Hospital⁷

11 Oct 2018-Nature

TL;DR: Deep phenotype and genome-wide genetic data from 500,000 individuals in the UK Biobank are described, including population structure and relatedness in the cohort, and imputation that increases the number of testable variants to around 96 million.

Abstract: The UK Biobank project is a prospective cohort study with deep genetic and phenotypic data collected on approximately 500,000 individuals from across the United Kingdom, aged between 40 and 69 at recruitment. The open resource is unique in its size and scope. A rich variety of phenotypic and health-related information is available on each participant, including biological measurements, lifestyle indicators, biomarkers in blood and urine, and imaging of the body and brain. Follow-up information is provided by linking health and medical records. Genome-wide genotype data have been collected on all participants, providing many opportunities for the discovery of new genetic associations and the genetic bases of complex traits. Here we describe the centralized analysis of the genetic data, including genotype quality, properties of population structure and relatedness of the genetic data, and efficient phasing and genotype imputation that increases the number of testable variants to around 96 million. Classical allelic variation at 11 human leukocyte antigen genes was imputed, resulting in the recovery of signals with known associations between human leukocyte antigen alleles and many diseases.

4,489 citations

Journal Article•DOI•

Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation

Nuala A. O'Leary¹, Mathew W. Wright¹, J. Rodney Brister¹, Stacy Ciufo¹, Diana Haddad¹, Richard McVeigh¹, Bhanu Rajput¹, Barbara Robbertse¹, Brian Smith-White¹, Danso Ako-adjei¹, Alexander Astashyn¹, Azat Badretdin¹, Yiming Bao¹, Olga Blinkova¹, Vyacheslav Brover¹, Vyacheslav Chetvernin¹, Jinna Choi¹, Eric Cox¹, Olga Ermolaeva¹, Catherine M. Farrell¹, Tamara Goldfarb¹, Tripti Gupta¹, Daniel H. Haft¹, Eneida L. Hatcher¹, Wratko Hlavina¹, Vinita Joardar¹, Vamsi K. Kodali¹, Wenjun Li¹, Donna Maglott¹, Patrick Masterson¹, Kelly M. McGarvey¹, Michael R. Murphy¹, Kathleen O'Neill¹, Shashikant Pujar¹, Sanjida H. Rangwala¹, Daniel Rausch¹, Lillian D. Riddick¹, Conrad L. Schoch¹, Andrei Shkeda¹, Susan S. Storz¹, Hanzhen Sun¹, Françoise Thibaud-Nissen¹, Igor Tolstoy¹, Raymond E. Tully¹, Anjana R. Vatsan¹, Craig Wallin¹, David Webb¹, Wendy Wu¹, Melissa J. Landrum¹, Avi Kimchi¹, Tatiana Tatusova¹, Michael DiCuccio¹, Paul Kitts¹, Terence Murphy¹, Kim D. Pruitt¹ - Show less +51 more•Institutions (1)

National Institutes of Health¹

04 Jan 2016-Nucleic Acids Research

TL;DR: The approach to utilizing available RNA-Seq and other data types in the authors' manual curation process for vertebrate, plant, and other species is summarized, and a new direction for prokaryotic genomes and protein name management is described.

Abstract: The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The RefSeq project leverages the data submitted to the International Nucleotide Sequence Database Collaboration (INSDC) against a combination of computation, manual curation, and collaboration to produce a standard set of stable, non-redundant reference sequences. The RefSeq project augments these reference sequences with current knowledge including publications, functional features and informative nomenclature. The database currently represents sequences from more than 55,000 organisms (>4800 viruses, >40,000 prokaryotes and >10,000 eukaryotes; RefSeq release 71), ranging from a single record to complete genomes. This paper summarizes the current status of the viral, prokaryotic, and eukaryotic branches of the RefSeq project, reports on improvements to data access and details efforts to further expand the taxonomic representation of the collection. We also highlight diverse functional curation initiatives that support multiple uses of RefSeq data including taxonomic validation, genome annotation, comparative genomics, and clinical testing. We summarize our approach to utilizing available RNA-Seq and other data types in our manual curation process for vertebrate, plant, and other species, and describe a new direction for prokaryotic genomes and protein name management.

4,104 citations
