Author

Junyi Wang

Other affiliations: Chinese Academy of Sciences, Beijing Institute of Genomics

Bio: Junyi Wang is an academic researcher at the Beijing Genomics Institute who has contributed to research on genomes and whole genome sequencing. The author has an h-index of 26 and has co-authored 31 publications that have received 12,740 citations. Previous affiliations of Junyi Wang include the Chinese Academy of Sciences and the Beijing Institute of Genomics.

Topics: Genome, Whole genome sequencing, Genome evolution, Gene, Ploidy

Papers

Journal Article•DOI•

The genome of the mesopolyploid crop species Brassica rapa

Xiaowu Wang¹, Hanzhong Wang, Jun Wang², Jun Wang³, Jun Wang⁴, Rifei Sun, Jian Wu, Shengyi Liu, Yinqi Bai³, Jeong-Hwan Mun⁵, Ian Bancroft⁶, Feng Cheng, Sanwen Huang, Xixiang Li, Wei Hua, Junyi Wang³, Xiyin Wang⁷, Xiyin Wang⁸, Michael Freeling⁹, J. Chris Pires¹⁰, Andrew H. Paterson⁷, Boulos Chalhoub, Bo Wang³, Alice Hayward¹¹, Alice Hayward¹², Andrew G. Sharpe¹³, Beom-Seok Park⁵, Bernd Weisshaar¹⁴, Binghang Liu³, Bo Li³, Bo Liu, Chaobo Tong, Chi Song³, Chris Duran¹², Chris Duran¹⁵, Chunfang Peng³, Geng Chunyu³, Chushin Koh¹³, Chuyu Lin³, David Edwards¹⁵, David Edwards¹², Desheng Mu³, Di Shen, Eleni Soumpourou⁶, Fei Li, Fiona Fraser⁶, Gavin C. Conant¹⁰, Gilles Lassalle¹⁶, Graham J.W. King⁴, Guusje Bonnema¹⁷, Haibao Tang⁹, Haiping Wang, Harry Belcram, Heling Zhou³, Hideki Hirakawa, Hiroshi Abe, Hui Guo⁷, Hui Wang, Huizhe Jin⁷, Isobel A. P. Parkin¹⁸, Jacqueline Batley¹¹, Jacqueline Batley¹², Jeong-Sun Kim⁵, Jérémy Just, Jianwen Li³, Jiaohui Xu³, Jie Deng, Jin A Kim⁵, Jingping Li⁷, Jingyin Yu, Jinling Meng¹⁹, Jinpeng Wang⁸, Jiumeng Min³, Julie Poulain²⁰, Katsunori Hatakeyama, Kui Wu³, Li Wang⁸, Lu Fang, Martin Trick⁶, Matthew G. Links¹⁸, Meixia Zhao, Mina Jin⁵, Nirala Ramchiary²¹, Nizar Drou²², Paul J. Berkman¹², Paul J. Berkman¹⁵, Qingle Cai³, Quanfei Huang³, Ruiqiang Li³, Satoshi Tabata, Shifeng Cheng³, Shu Zhang³, Shujiang Zhang, Shunmou Huang, Shusei Sato, Silong Sun, Soo-Jin Kwon⁵, Su-Ryun Choi²¹, Tae-Ho Lee⁷, Wei Fan³, Xiang Zhao³, Xu Tan⁷, Xun Xu³, Yan Wang, Yang Qiu, Ye Yin³, Yingrui Li³, Yongchen Du, Yongcui Liao, Yong Pyo Lim²¹, Yoshihiro Narusaka, Yupeng Wang⁸, Zhenyi Wang⁸, Zhenyu Li³, Zhiwen Wang³, Zhiyong Xiong¹⁰, Zhonghua Zhang - Show less +113 more•Institutions (22)

Civil Aviation Authority of Singapore¹, University of Copenhagen², Beijing Institute of Genomics³, Rothamsted Research⁴, Rural Development Administration⁵, John Innes Centre⁶, University of Georgia⁷, North China University of Science and Technology⁸, University of California, Berkeley⁹, University of Missouri¹⁰, Australian Research Council¹¹, University of Queensland¹², National Research Council¹³, Bielefeld University¹⁴, Australian Centre for Plant Functional Genomics¹⁵, University of Rennes¹⁶, Wageningen University and Research Centre¹⁷, Agriculture and Agri-Food Canada¹⁸, Huazhong Agricultural University¹⁹, French Alternative Energies and Atomic Energy Commission²⁰, Chungnam National University²¹, Norwich Research Park²²

01 Oct 2011-Nature Genetics

TL;DR: The draft genome sequence of Brassica rapa accession Chiifu-401-42, a Chinese cabbage, is annotated and analyzed, with Arabidopsis thaliana used as an outgroup to investigate the consequences of genome triplication, such as structural and functional evolution.

Abstract: We report the annotation and analysis of the draft genome sequence of Brassica rapa accession Chiifu-401-42, a Chinese cabbage. We modeled 41,174 protein coding genes in the B. rapa genome, which has undergone genome triplication. We used Arabidopsis thaliana as an outgroup for investigating the consequences of genome triplication, such as structural and functional evolution. The extent of gene loss (fractionation) among triplicated genome segments varies, with one of the three copies consistently retaining a disproportionately large fraction of the genes expected to have been present in its ancestor. Variation in the number of members of gene families present in the genome may contribute to the remarkable morphological plasticity of Brassica species. The B. rapa genome sequence provides an important resource for studying the evolution of polyploid genomes and underpins the genetic improvement of Brassica oil and vegetable crops.

1,811 citations

Journal Article•DOI•

The oyster genome reveals stress adaptation and complexity of shell formation

Guofan Zhang¹, Xiaodong Fang, Ximing Guo², Li Li, Ruibang Luo, Fei Xu, Pengcheng Yang, Linlin Zhang, Xiaotong Wang, Haigang Qi, Zhiqiang Xiong, Huayong Que, Yinlong Xie, Peter W. H. Holland³, Jordi Paps³, Yabing Zhu, Fucun Wu, Yuanxin Chen, Jiafeng Wang, Chunfang Peng, Jie Meng, Lan Yang, Jun Liu, Bo Wen, Na Zhang, Zhiyong Huang, Qihui Zhu, Yue Feng, Andrew S. Mount⁴, Dennis Hedgecock⁵, Zhe Xu⁶, Yunjie Liu, Tomislav Domazet-Lošo, Yishuai Du, Xiaoqing Sun, Shoudu Zhang, Binghang Liu, Peizhou Cheng, Xuanting Jiang, Juan Li, Dingding Fan, Wei Wang, Wenjing Fu, Tong Wang, Bo Wang, Jibiao Zhang, Zhiyu Peng, Yingxiang Li, Na Li, Jinpeng Wang, Maoshan Chen, Yan He², Fengji Tan, Xiaorui Song, Qiumei Zheng, Ronglian Huang, Hailong Yang, Du Xuedi, Li Chen, Mei Yang, Patrick M. Gaffney⁷, Shan Wang², Longhai Luo, Zhicai She, Yao Ming, Huang Wen, Shu Zhang, Baoyu Huang, Yong Zhang, Tao Qu, Peixiang Ni, Guoying Miao, Junyi Wang, Qiang Wang, Christian E. W. Steinberg⁸, Haiyan Wang, Ning Li, Lumin Qian², Guojie Zhang, Yingrui Li, Huanming Yang, Xiao Liu, Jian Wang, Ye Yin, Jun Wang⁹ - Show less +81 more•Institutions (9)

Chinese Academy of Sciences¹, Rutgers University², University of Oxford³, Clemson University⁴, University of Southern California⁵, Atlantic Cape Community College⁶, University of Delaware⁷, Humboldt University of Berlin⁸, University of Copenhagen⁹

04 Oct 2012-Nature

TL;DR: The sequencing and assembly of the oyster genome using short reads and a fosmid-pooling strategy are reported, along with transcriptomes of development and stress response and the proteome of the shell. The analyses show that shell formation in molluscs is more complex than currently understood and involves extensive participation of cells and their exosomes.

Abstract: The Pacific oyster Crassostrea gigas belongs to one of the most species-rich but genomically poorly explored phyla, the Mollusca. Here we report the sequencing and assembly of the oyster genome using short reads and a fosmid-pooling strategy, along with transcriptomes of development and stress response and the proteome of the shell. The oyster genome is highly polymorphic and rich in repetitive sequences, with some transposable elements still actively shaping variation. Transcriptome studies reveal an extensive set of genes responding to environmental stress. The expansion of genes coding for heat shock protein 70 and inhibitors of apoptosis is probably central to the oyster's adaptation to sessile life in the highly stressful intertidal zone. Our analyses also show that shell formation in molluscs is more complex than currently understood and involves extensive participation of cells and their exosomes. The oyster genome sequence fills a void in our understanding of the Lophotrochozoa.

1,806 citations

Journal Article•DOI•

The sequence and de novo assembly of the giant panda genome

Ruiqiang Li, Wei Fan, Geng Tian¹, Hongmei Zhu, Lin He², Lin He³, Jing Cai⁴, Jing Cai¹, Quanfei Huang, Qingle Cai⁵, Bo Li, Yinqi Bai, Zhihe Zhang⁶, Ya-Ping Zhang⁴, Wen Wang⁴, Jun Li, Fuwen Wei¹, Heng Li⁷, Min Jian, Jianwen Li, Zhaolei Zhang⁸, Rasmus Nielsen⁹, Dawei Li, Wanjun Gu¹⁰, Zhentao Yang, Zhaoling Xuan, Oliver A. Ryder, Frederick C. Leung¹¹, Yan Zhou, Jianjun Cao, Xiao Sun¹⁰, Yonggui Fu¹², Xiaodong Fang, Xiaosen Guo, Bo Wang, Rong Hou⁶, Fujun Shen⁶, Bo Mu, Peixiang Ni, Runmao Lin, Wubin Qian, Guo-Dong Wang⁴, Guo-Dong Wang¹, Chang Yu, Wenhui Nie⁴, Jinhuan Wang⁴, Zhigang Wu, Huiqing Liang, Jiumeng Min⁵, Qi Wu¹, Shifeng Cheng⁵, Jue Ruan¹, Mingwei Wang, Zhongbin Shi, Ming Wen, Binghang Liu, Xiaoli Ren, Huisong Zheng, Dong Dong⁸, Kathleen Cook⁸, Gao Shan, Hao Zhang, Carolin Kosiol¹³, Xueying Xie¹⁰, Zuhong Lu¹⁰, Hancheng Zheng, Yingrui Li¹, Cynthia C. Steiner, Tommy Tsan-Yuk Lam¹¹, Siyuan Lin, Qinghui Zhang, Guoqing Li, Jing Tian, Timing Gong, Hongde Liu¹⁰, Dejin Zhang¹⁰, Lin Fang, Chen Ye, Juanbin Zhang, Wenbo Hu¹², Anlong Xu¹², Yuanyuan Ren, Guojie Zhang¹, Guojie Zhang⁴, Michael William Bruford¹⁴, Qibin Li¹, Lijia Ma¹, Yiran Guo¹, Na An, Yujie Hu¹, Yang Zheng¹, Yongyong Shi³, Zhiqiang Li³, Qing Liu, Yanling Chen, Jing Zhao, Ning Qu⁵, Shancen Zhao, Feng Tian, Xiaoling Wang, Haiyin Wang, Lizhi Xu, Xiao Liu, Tomas Vinar¹⁵, Yajun Wang¹⁶, Tak-Wah Lam¹¹, Siu-Ming Yiu¹¹, Shiping Liu¹⁷, Hemin Zhang, Desheng Li, Yan Huang, Xia Wang, Guohua Yang, Zhi Jiang, Junyi Wang, Nan Qin, Li Li, Jingxiang Li, Lars Bolund, Karsten Kristiansen¹⁸, Gane Ka-Shu Wong¹⁹, Maynard V. Olson²⁰, Xiuqing Zhang, Songgang Li, Huanming Yang, Jing Wang, Jun Wang¹⁸ - Show less +123 more•Institutions (20)

Chinese Academy of Sciences¹, Fudan University², Shanghai Jiao Tong University³, Kunming Institute of Zoology⁴, Shenzhen University⁵, Chengdu Research Base of Giant Panda Breeding⁶, Wellcome Trust⁷, University of Toronto⁸, University of California, Berkeley⁹, Southeast University¹⁰, University of Hong Kong¹¹, Sun Yat-sen University¹², University of Vienna¹³, Cardiff University¹⁴, Comenius University in Bratislava¹⁵, Sichuan University¹⁶, South China University of Technology¹⁷, University of Copenhagen¹⁸, University of Alberta¹⁹, University of Washington²⁰

21 Jan 2010-Nature

TL;DR: Using next-generation sequencing technology alone, a draft sequence of the giant panda genome is generated and assembled. An assessment of panda genes indicates that the bamboo diet might depend more on the gut microbiome than on the panda's own genetic composition.

Abstract: Using next-generation sequencing technology alone, we have successfully generated and assembled a draft sequence of the giant panda genome. The assembled contigs (2.25 gigabases (Gb)) cover approximately 94% of the whole genome, and the remaining gaps (0.05 Gb) seem to contain carnivore-specific repeats and tandem repeats. Comparisons with the dog and human showed that the panda genome has a lower divergence rate. The assessment of panda genes potentially underlying some of its unique traits indicated that its bamboo diet might be more dependent on its gut microbiome than its own genetic composition. We also identified more than 2.7 million heterozygous single nucleotide polymorphisms in the diploid genome. Our data and analyses provide a foundation for promoting mammalian genetic research, and demonstrate the feasibility for using next-generation sequencing technologies for accurate, cost-effective and rapid de novo assembly of large eukaryotic genomes.

1,109 citations

Journal Article•DOI•

The Brassica oleracea genome reveals the asymmetrical evolution of polyploid genomes

Shengyi Liu¹, Yumei Liu, Xinhua Yang, Chaobo Tong¹, David Edwards², Isobel A. P. Parkin³, Meixia Zhao¹, Jianxin Ma⁴, Jingyin Yu¹, Shunmou Huang¹, Xiyin Wang⁵, Junyi Wang, Kun Lu⁶, Zhiyuan Fang, Ian Bancroft⁷, Tae-Jin Yang⁸, Qiong Hu¹, Xinfa Wang¹, Zhen Yue, Haojie Li, Linfeng Yang, Jian Wu, Qing Zhou, Wanxin Wang, Graham J.W. King⁹, J. Chris Pires¹⁰, Changxin Lu, Zhangyan Wu, Perumal Sampath⁸, Zhuo Wang, Hui Guo⁵, Shengkai Pan, Limei Yang, Jiumeng Min, Dong Zhang⁵, Dianchuan Jin, Wanshun Li, Harry Belcram¹¹, Jinxing Tu¹², Mei Guan¹³, Cunkou Qi, Dezhi Du, Jiana Li⁶, Liangcai Jiang, Jacqueline Batley¹⁴, Andrew G. Sharpe¹⁵, Beom Seok Park, Pradeep Ruperao², Feng Cheng, Nomar Espinosa Waminal⁸, Yin Huang, Caihua Dong¹, Li Wang, Jingping Li⁵, Zhiyong Hu¹, Mu Zhuang, Yi Huang¹, Junyan Huang¹, Jiaqin Shi¹, Desheng Mei¹, Jing Liu¹, Tae-Ho Lee⁵, Jinpeng Wang, Huizhe Jin⁵, Zaiyun Li¹², Xun Li¹³, Jiefu Zhang, Lu Xiao, Yongming Zhou¹², Zhongsong Liu¹³, Xuequn Liu¹⁶, Rui Qin¹⁶, Xu Tang⁵, Wenbin Liu, Yupeng Wang⁵, Yangyong Zhang, Jonghoon Lee⁸, Hyun Hee Kim¹⁷, Xun Xu, Xinming Liang, Wei Hua¹, Xiaowu Wang, Jun Wang¹⁸, Boulos Chalhoub¹¹, Andrew H. Paterson⁵ - Show less +81 more•Institutions (18)

Crops Research Institute¹, Australian Centre for Plant Functional Genomics², Agriculture and Agri-Food Canada³, Purdue University⁴, Plant Genome Mapping Laboratory⁵, Southwest University⁶, University of York⁷, Seoul National University⁸, Southern Cross University⁹, University of Missouri¹⁰, Centre national de la recherche scientifique¹¹, Huazhong Agricultural University¹², Hunan Agricultural University¹³, University of Queensland¹⁴, National Research Council¹⁵, Central University, India¹⁶, Sahmyook University¹⁷, King Abdulaziz University¹⁸

23 May 2014-Nature Communications

TL;DR: A draft genome sequence of Brassica oleracea is described, comparing it with that of its sister species B. rapa to reveal numerous chromosome rearrangements and asymmetrical gene loss in duplicated genomic blocks.

Abstract: Polyploidization has provided much genetic variation for plant adaptive evolution, but the mechanisms by which the molecular evolution of polyploid genomes establishes genetic architecture underlying species differentiation are unclear. Brassica is an ideal model to increase knowledge of polyploid evolution. Here we describe a draft genome sequence of Brassica oleracea, comparing it with that of its sister species B. rapa to reveal numerous chromosome rearrangements and asymmetrical gene loss in duplicated genomic blocks, asymmetrical amplification of transposable elements, differential gene co-retention for specific pathways and variation in gene expression, including alternative splicing, among a large number of paralogous and orthologous genes. Genes related to the production of anticancer phytochemicals and morphological variations illustrate consequences of genome duplication and gene divergence, imparting biochemical and morphological variation to B. oleracea. This study provides insights into Brassica genome evolution and will underpin research into the many important crops in this genus.

884 citations

Journal Article•DOI•

The draft genome of a diploid cotton Gossypium raimondii.

Kunbo Wang, Zhiwen Wang, Fuguang Li, Wuwei Ye, Junyi Wang, Guoli Song, Zhen Yue, Lin Cong, Haihong Shang, Shilin Zhu, Changsong Zou, Qin Li¹, Youlu Yuan, Cairui Lu, Hengling Wei, Caiyun Gou, Zheng Zequn, Ye Yin, Xueyan Zhang, Kun Liu, Bo Wang, Chi Song, Nan Shi, Russell J. Kohel², Richard G. Percy², John Z. Yu², Yu-Xian Zhu¹, Jun Wang³, Shuxun Yu

Peking University¹, United States Department of Agriculture², University of Copenhagen³

01 Oct 2012-Nature Genetics

TL;DR: Cotton, and probably Theobroma cacao, are the only sequenced plant species that possess an authentic CDN1 gene family for gossypol biosynthesis, as revealed by phylogenetic analysis.

Abstract: Yuxian Zhu and colleagues report the draft genome of a diploid cotton Gossypium raimondii. This species is a wild South American cotton, whose progenitor is thought to have been the contributor of the D subgenome of the allotetraploid commercial species Gossypium hirsutum and Gossypium barbadense, which account for ~95% of the worldwide cotton crop.

826 citations

Cited by

Journal Article•DOI•

Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation

Nuala A. O'Leary¹, Mathew W. Wright¹, J. Rodney Brister¹, Stacy Ciufo¹, Diana Haddad¹, Richard McVeigh¹, Bhanu Rajput¹, Barbara Robbertse¹, Brian Smith-White¹, Danso Ako-adjei¹, Alexander Astashyn¹, Azat Badretdin¹, Yiming Bao¹, Olga Blinkova¹, Vyacheslav Brover¹, Vyacheslav Chetvernin¹, Jinna Choi¹, Eric Cox¹, Olga Ermolaeva¹, Catherine M. Farrell¹, Tamara Goldfarb¹, Tripti Gupta¹, Daniel H. Haft¹, Eneida L. Hatcher¹, Wratko Hlavina¹, Vinita Joardar¹, Vamsi K. Kodali¹, Wenjun Li¹, Donna Maglott¹, Patrick Masterson¹, Kelly M. McGarvey¹, Michael R. Murphy¹, Kathleen O'Neill¹, Shashikant Pujar¹, Sanjida H. Rangwala¹, Daniel Rausch¹, Lillian D. Riddick¹, Conrad L. Schoch¹, Andrei Shkeda¹, Susan S. Storz¹, Hanzhen Sun¹, Françoise Thibaud-Nissen¹, Igor Tolstoy¹, Raymond E. Tully¹, Anjana R. Vatsan¹, Craig Wallin¹, David Webb¹, Wendy Wu¹, Melissa J. Landrum¹, Avi Kimchi¹, Tatiana Tatusova¹, Michael DiCuccio¹, Paul Kitts¹, Terence Murphy¹, Kim D. Pruitt¹

National Institutes of Health¹

04 Jan 2016-Nucleic Acids Research

TL;DR: The approach to utilizing available RNA-Seq and other data types in the authors' manual curation process for vertebrate, plant, and other species is summarized, and a new direction for prokaryotic genomes and protein name management is described.

Abstract: The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The RefSeq project leverages the data submitted to the International Nucleotide Sequence Database Collaboration (INSDC) against a combination of computation, manual curation, and collaboration to produce a standard set of stable, non-redundant reference sequences. The RefSeq project augments these reference sequences with current knowledge including publications, functional features and informative nomenclature. The database currently represents sequences from more than 55,000 organisms (>4800 viruses, >40,000 prokaryotes and >10,000 eukaryotes; RefSeq release 71), ranging from a single record to complete genomes. This paper summarizes the current status of the viral, prokaryotic, and eukaryotic branches of the RefSeq project, reports on improvements to data access and details efforts to further expand the taxonomic representation of the collection. We also highlight diverse functional curation initiatives that support multiple uses of RefSeq data including taxonomic validation, genome annotation, comparative genomics, and clinical testing. We summarize our approach to utilizing available RNA-Seq and other data types in our manual curation process for vertebrate, plant, and other species, and describe a new direction for prokaryotic genomes and protein name management.

4,104 citations

KEGG (Kyoto Encyclopedia of Genes and Genomes) [in Japanese] (Special issue: The present and future of genomic medicine, basic and clinical) (Databases)

光輝中尾, 實金久

01 Jan 2000

3,536 citations

Journal Article•DOI•

A fast, lock-free approach for efficient parallel counting of occurrences of k-mers

Guillaume Marçais¹, Carl Kingsford¹

University of Maryland, College Park¹

01 Mar 2011-Bioinformatics

TL;DR: This work proposes a new k-mer counting algorithm and associated implementation, called Jellyfish, which is fast and memory efficient, based on a multithreaded, lock-free hash table optimized for counting k-mers up to 31 bases in length.

Abstract: Motivation: Counting the number of occurrences of every k-mer (substring of length k) in a long string is a central subproblem in many applications, including genome assembly, error correction of sequencing reads, fast multiple sequence alignment and repeat detection. Recently, the deep sequence coverage generated by next-generation sequencing technologies has caused the amount of sequence to be processed during a genome project to grow rapidly, and has rendered current k-mer counting tools too slow and memory intensive. At the same time, large multicore computers have become commonplace in research facilities allowing for a new parallel computational paradigm. Results: We propose a new k-mer counting algorithm and associated implementation, called Jellyfish, which is fast and memory efficient. It is based on a multithreaded, lock-free hash table optimized for counting k-mers up to 31 bases in length. Due to their flexibility, suffix arrays have been the data structure of choice for solving many string problems. For the task of k-mer counting, important in many biological applications, Jellyfish offers a much faster and more memory-efficient solution. Availability: The Jellyfish software is written in C++ and is GPL licensed. It is available for download at http://www.cbcb.umd.edu/software/jellyfish. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
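
The k-mer counting problem defined in this abstract can be illustrated with a minimal sketch. This is not Jellyfish's multithreaded, lock-free hash table; it is a naive single-threaded version built on Python's standard `collections.Counter`, and the function name `count_kmers` is an assumption made for illustration only.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every length-k substring of `sequence` (naive, single-threaded).

    Hypothetical helper for illustration; it does not reproduce Jellyfish's
    lock-free hash table or its memory optimizations.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Slide a window of width k over the sequence; a sequence of length n
    # has n - k + 1 windows (none if n < k).
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


counts = count_kmers("ACGTACGTAC", 3)
for kmer, n in sorted(counts.items()):
    print(kmer, n)
```

A dictionary-based counter like this keeps every distinct k-mer as a separate string, which is one reason such an approach would likely become slow and memory hungry at the sequencing depths the abstract describes.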

2,779 citations

Journal Article•DOI•

De novo assembly of human genomes with massively parallel short read sequencing

Ruiqiang Li¹, Hongmei Zhu, Jue Ruan, Wubin Qian, Xiaodong Fang, Zhongbin Shi, Yingrui Li, Shengting Li², Gao Shan, Karsten Kristiansen, Songgang Li, Huanming Yang, Jing Wang, Jun Wang

Beijing Genomics Institute¹, Aarhus University²

01 Feb 2010-Genome Research

TL;DR: The development of this de novo short read assembly method creates new opportunities for building reference sequences and carrying out accurate analyses of unexplored genomes in a cost-effective way.

Abstract: Next-generation massively parallel DNA sequencing technologies provide ultrahigh throughput at a substantially lower unit data cost; however, the data are very short read length sequences, making de novo assembly extremely challenging. Here, we describe a novel method for de novo assembly of large genomes from short read sequences. We successfully assembled both the Asian and African human genome sequences, achieving an N50 contig size of 7.4 and 5.9 kilobases (kb) and scaffold of 446.3 and 61.9 kb, respectively. The development of this de novo short read assembly method creates new opportunities for building reference sequences and carrying out accurate analyses of unexplored genomes in a cost-effective way.
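
The N50 statistic quoted in this abstract is not defined there. Under the commonly used definition (the largest length L such that contigs or scaffolds of length at least L together cover at least half of the total assembled length), it can be sketched as follows. The function name `n50` and the example lengths are assumptions for illustration and are not taken from the paper.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths.

    Assumes the common definition: walk the lengths from longest to
    shortest and report the length at which the running total first
    reaches half of the overall assembled length.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Hypothetical contig lengths (arbitrary units) for illustration only.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

Because N50 is weighted by length, a few long contigs can raise it considerably even when most contigs are short, so it is usually read alongside the total assembly size.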

2,760 citations

Journal Article•DOI•

Scaffolding pre-assembled contigs using SSPACE

Marten Boetzer, Christiaan V. Henkel¹, Hans J. Jansen¹, Derek Butler¹, Walter Pirovano¹

Leiden University¹

01 Feb 2011-Bioinformatics

TL;DR: SSPACE is a new stand-alone scaffolder of pre-assembled contigs that uses paired-read data and offers a short runtime, multiple library input of paired-end and/or mate pair datasets, and possible contig extension with unmapped sequence reads.

Abstract: Summary: De novo assembly tools play a main role in reconstructing genomes from next-generation sequencing (NGS) data and usually yield a number of contigs. Using paired-read sequencing data it is possible to assess the order, distance and orientation of contigs and combine them into so-called scaffolds. Although the latter process is a crucial step in finishing genomes, scaffolding algorithms are often built-in functions in de novo assembly tools and cannot be independently controlled. We here present a new tool, called SSPACE, which is a stand-alone scaffolder of pre-assembled contigs using paired-read data. Main features are: a short runtime, multiple library input of paired-end and/or mate pair datasets and possible contig extension with unmapped sequence reads. SSPACE shows promising results on both prokaryote and eukaryote genomic testsets where the amount of initial contigs was reduced by at least 75%. Availability: www.baseclear.com/bioinformatics-tools/. Contact: walter.pirovano@baseclear.com Supplementary information: Supplementary data are available at Bioinformatics online.

2,165 citations
