Author

Bo Wang

Other affiliations: York Hospital, University of York, Beijing Institute of Genomics
Bio: Bo Wang is an academic researcher from the University of Leicester. The author has contributed to research on the topics of Gaussian processes and regression analysis. The author has an h-index of 21 and has co-authored 50 publications receiving 14,080 citations. Previous affiliations of Bo Wang include York Hospital and the University of York.


Papers
Journal Article
Adam Auton1, Gonçalo R. Abecasis2, David Altshuler3, Richard Durbin4, +514 more; Institutions (90)
01 Oct 2015-Nature
TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.
Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.

12,661 citations

Journal Article
Xiaowu Wang1, Hanzhong Wang, Jun Wang2, Jun Wang3, Jun Wang4, Rifei Sun, Jian Wu, Shengyi Liu, Yinqi Bai3, Jeong-Hwan Mun5, Ian Bancroft6, Feng Cheng, Sanwen Huang, Xixiang Li, Wei Hua, Junyi Wang3, Xiyin Wang7, Xiyin Wang8, Michael Freeling9, J. Chris Pires10, Andrew H. Paterson8, Boulos Chalhoub, Bo Wang3, Alice Hayward11, Alice Hayward12, Andrew G. Sharpe13, Beom-Seok Park5, Bernd Weisshaar14, Binghang Liu3, Bo Li3, Bo Liu, Chaobo Tong, Chi Song3, Chris Duran15, Chris Duran12, Chunfang Peng3, Geng Chunyu3, Chushin Koh13, Chuyu Lin3, David Edwards12, David Edwards15, Desheng Mu3, Di Shen, Eleni Soumpourou6, Fei Li, Fiona Fraser6, Gavin C. Conant10, Gilles Lassalle16, Graham J.W. King4, Guusje Bonnema17, Haibao Tang9, Haiping Wang, Harry Belcram, Heling Zhou3, Hideki Hirakawa, Hiroshi Abe, Hui Guo8, Hui Wang, Huizhe Jin8, Isobel A. P. Parkin18, Jacqueline Batley11, Jacqueline Batley12, Jeong-Sun Kim5, Jérémy Just, Jianwen Li3, Jiaohui Xu3, Jie Deng, Jin A Kim5, Jingping Li8, Jingyin Yu, Jinling Meng19, Jinpeng Wang7, Jiumeng Min3, Julie Poulain20, Katsunori Hatakeyama, Kui Wu3, Li Wang7, Lu Fang, Martin Trick6, Matthew G. Links18, Meixia Zhao, Mina Jin5, Nirala Ramchiary21, Nizar Drou22, Paul J. Berkman12, Paul J. Berkman15, Qingle Cai3, Quanfei Huang3, Ruiqiang Li3, Satoshi Tabata, Shifeng Cheng3, Shu Zhang3, Shujiang Zhang, Shunmou Huang, Shusei Sato, Silong Sun, Soo-Jin Kwon5, Su-Ryun Choi21, Tae-Ho Lee8, Wei Fan3, Xiang Zhao3, Xu Tan8, Xun Xu3, Yan Wang, Yang Qiu, Ye Yin3, Yingrui Li3, Yongchen Du, Yongcui Liao, Yong Pyo Lim21, Yoshihiro Narusaka, Yupeng Wang7, Zhenyi Wang7, Zhenyu Li3, Zhiwen Wang3, Zhiyong Xiong10, Zhonghua Zhang 
TL;DR: The authors report the annotation and analysis of the draft genome sequence of Brassica rapa accession Chiifu-401-42, a Chinese cabbage, and use Arabidopsis thaliana as an outgroup to investigate the consequences of genome triplication, such as structural and functional evolution.
Abstract: We report the annotation and analysis of the draft genome sequence of Brassica rapa accession Chiifu-401-42, a Chinese cabbage. We modeled 41,174 protein coding genes in the B. rapa genome, which has undergone genome triplication. We used Arabidopsis thaliana as an outgroup for investigating the consequences of genome triplication, such as structural and functional evolution. The extent of gene loss (fractionation) among triplicated genome segments varies, with one of the three copies consistently retaining a disproportionately large fraction of the genes expected to have been present in its ancestor. Variation in the number of members of gene families present in the genome may contribute to the remarkable morphological plasticity of Brassica species. The B. rapa genome sequence provides an important resource for studying the evolution of polyploid genomes and underpins the genetic improvement of Brassica oil and vegetable crops.

1,811 citations

Journal Article
TL;DR: A high level of linkage disequilibrium is identified in the soybean genome, suggesting that marker-assisted breeding of soybean will be less challenging than map-based cloning; the data provide a resource for future breeding and quantitative trait analysis.
Abstract: We report a large-scale analysis of the patterns of genome-wide genetic variation in soybeans. We re-sequenced a total of 17 wild and 14 cultivated soybean genomes to an average of approximately ×5 depth and >90% coverage using the Illumina Genome Analyzer II platform. We compared the patterns of genetic variation between wild and cultivated soybeans and identified higher allelic diversity in wild soybeans. We identified a high level of linkage disequilibrium in the soybean genome, suggesting that marker-assisted breeding of soybean will be less challenging than map-based cloning. We report linkage disequilibrium block location and distribution, and we identified a set of 205,614 tag SNPs that may be useful for QTL mapping and association studies. The data here provide a valuable resource for the analysis of wild soybeans and to facilitate future breeding and quantitative trait analysis.

936 citations

Journal Article
TL;DR: Resequencing six elite maize inbred lines, including the parents of the most productive commercial hybrid in China, uncovered more than 1,000,000 SNPs, 30,000 indel polymorphisms and 101 low-sequence-diversity chromosomal intervals in the maize genome.
Abstract: We have resequenced a group of six elite maize inbred lines, including the parents of the most productive commercial hybrid in China. This effort uncovered more than 1,000,000 SNPs, 30,000 indel polymorphisms and 101 low-sequence-diversity chromosomal intervals in the maize genome. We also identified several hundred complete genes that show presence/absence variation among these resequenced lines. We discuss the potential roles of complementation of presence/absence variations and other deleterious mutations in contributing to heterosis. High-density SNP and indel polymorphism markers reported here are expected to be a valuable resource for future genetic studies and the molecular breeding of this important crop.

457 citations

Journal Article
TL;DR: The maize haplotype version 3 (HapMap 3) was built from whole-genome sequencing data from 1218 maize lines, covering predomestication and domesticated Zea mays varieties across the world.
Author(s): Bukowski, Robert; Guo, Xiaosen; Lu, Yanli; Zou, Cheng; He, Bing; Rong, Zhengqin; Wang, Bo; Xu, Dawen; Yang, Bicheng; Xie, Chuanxiao; Fan, Longjiang; Gao, Shibin; Xu, Xun; Zhang, Gengyun; Li, Yingrui; Jiao, Yinping; Doebley, John F; Ross-Ibarra, Jeffrey; Lorant, Anne; Buffalo, Vince; Romay, M Cinta; Buckler, Edward S; Ware, Doreen; Lai, Jinsheng; Sun, Qi; Xu, Yunbi
Abstract: Background: Characterization of genetic variations in maize has been challenging, mainly due to deterioration of collinearity between individual genomes in the species. An international consortium of maize research groups combined resources to develop the maize haplotype version 3 (HapMap 3), built from whole-genome sequencing data from 1218 maize lines, covering predomestication and domesticated Zea mays varieties across the world. Results: A new computational pipeline was set up to process more than 12 trillion bp of sequencing data, and a set of population genetics filters was applied to identify more than 83 million variant sites. Conclusions: We identified polymorphisms in regions where collinearity is largely preserved in the maize species. However, the fact that the B73 genome used as the reference only represents a fraction of all haplotypes is still an important limiting factor.

324 citations


Cited by
Journal Article
Fumio Tajima1
30 Oct 1989-Genetics
TL;DR: It is suggested that the natural selection against large insertion/deletion is so weak that a large amount of variation is maintained in a population.

11,521 citations

Journal Article
Monkol Lek, Konrad J. Karczewski1, Konrad J. Karczewski2, Eric Vallabh Minikel1, Eric Vallabh Minikel2, Kaitlin E. Samocha, Eric Banks1, Timothy Fennell1, Anne H. O’Donnell-Luria1, Anne H. O’Donnell-Luria3, Anne H. O’Donnell-Luria2, James S. Ware, Andrew J. Hill4, Andrew J. Hill2, Andrew J. Hill1, Beryl B. Cummings1, Beryl B. Cummings2, Taru Tukiainen2, Taru Tukiainen1, Daniel P. Birnbaum1, Jack A. Kosmicki, Laramie E. Duncan1, Laramie E. Duncan2, Karol Estrada1, Karol Estrada2, Fengmei Zhao1, Fengmei Zhao2, James Zou1, Emma Pierce-Hoffman1, Emma Pierce-Hoffman2, Joanne Berghout5, David Neil Cooper6, Nicole A. Deflaux7, Mark A. DePristo1, Ron Do, Jason Flannick2, Jason Flannick1, Menachem Fromer, Laura D. Gauthier1, Jackie Goldstein2, Jackie Goldstein1, Namrata Gupta1, Daniel P. Howrigan2, Daniel P. Howrigan1, Adam Kiezun1, Mitja I. Kurki2, Mitja I. Kurki1, Ami Levy Moonshine1, Pradeep Natarajan, Lorena Orozco, Gina M. Peloso2, Gina M. Peloso1, Ryan Poplin1, Manuel A. Rivas1, Valentin Ruano-Rubio1, Samuel A. Rose1, Douglas M. Ruderfer8, Khalid Shakir1, Peter D. Stenson6, Christine Stevens1, Brett Thomas1, Brett Thomas2, Grace Tiao1, María Teresa Tusié-Luna, Ben Weisburd1, Hong-Hee Won9, Dongmei Yu, David Altshuler1, David Altshuler10, Diego Ardissino, Michael Boehnke11, John Danesh12, Stacey Donnelly1, Roberto Elosua, Jose C. Florez2, Jose C. Florez1, Stacey Gabriel1, Gad Getz2, Gad Getz1, Stephen J. Glatt13, Christina M. Hultman14, Sekar Kathiresan, Markku Laakso15, Steven A. McCarroll1, Steven A. McCarroll2, Mark I. McCarthy16, Mark I. McCarthy17, Dermot P.B. McGovern18, Ruth McPherson19, Benjamin M. Neale2, Benjamin M. Neale1, Aarno Palotie, Shaun Purcell8, Danish Saleheen20, Jeremiah M. Scharf, Pamela Sklar, Patrick F. Sullivan14, Patrick F. Sullivan21, Jaakko Tuomilehto22, Ming T. Tsuang23, Hugh Watkins16, Hugh Watkins17, James G. Wilson24, Mark J. Daly1, Mark J. Daly2, Daniel G. MacArthur2, Daniel G. MacArthur1 
18 Aug 2016-Nature
TL;DR: The aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC) provides direct evidence for the presence of widespread mutational recurrence.
Abstract: Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.

8,758 citations

Journal Article
04 May 2011-PLOS ONE
TL;DR: A procedure for constructing GBS libraries based on reducing genome complexity with restriction enzymes (REs) is reported, which is simple, quick, extremely specific, highly reproducible, and may reach important regions of the genome that are inaccessible to sequence capture approaches.
Abstract: Advances in next generation technologies have driven the costs of DNA sequencing down to the point that genotyping-by-sequencing (GBS) is now feasible for high diversity, large genome species. Here, we report a procedure for constructing GBS libraries based on reducing genome complexity with restriction enzymes (REs). This approach is simple, quick, extremely specific, highly reproducible, and may reach important regions of the genome that are inaccessible to sequence capture approaches. By using methylation-sensitive REs, repetitive regions of genomes can be avoided and lower copy regions targeted with two to three fold higher efficiency. This tremendously simplifies computationally challenging alignment problems in species with high levels of genetic diversity. The GBS procedure is demonstrated with maize (IBM) and barley (Oregon Wolfe Barley) recombinant inbred populations where roughly 200,000 and 25,000 sequence tags were mapped, respectively. An advantage in species like barley that lack a complete genome sequence is that a reference map need only be developed around the restriction sites, and this can be done in the process of sample genotyping. In such cases, the consensus of the read clusters across the sequence tagged sites becomes the reference. Alternatively, for kinship analyses in the absence of a reference genome, the sequence tags can simply be treated as dominant markers. Future application of GBS to breeding, conservation, and global species and population surveys may allow plant breeders to conduct genomic selection on a novel germplasm or species without first having to develop any prior molecular tools, or conservation biologists to determine population structure without prior knowledge of the genome or diversity in the species.

5,163 citations

Journal Article
11 Oct 2018-Nature
TL;DR: Deep phenotype and genome-wide genetic data from approximately 500,000 UK Biobank participants are described, including population structure and relatedness in the cohort, and imputation that increases the number of testable variants to around 96 million.
Abstract: The UK Biobank project is a prospective cohort study with deep genetic and phenotypic data collected on approximately 500,000 individuals from across the United Kingdom, aged between 40 and 69 at recruitment. The open resource is unique in its size and scope. A rich variety of phenotypic and health-related information is available on each participant, including biological measurements, lifestyle indicators, biomarkers in blood and urine, and imaging of the body and brain. Follow-up information is provided by linking health and medical records. Genome-wide genotype data have been collected on all participants, providing many opportunities for the discovery of new genetic associations and the genetic bases of complex traits. Here we describe the centralized analysis of the genetic data, including genotype quality, properties of population structure and relatedness of the genetic data, and efficient phasing and genotype imputation that increases the number of testable variants to around 96 million. Classical allelic variation at 11 human leukocyte antigen genes was imputed, resulting in the recovery of signals with known associations between human leukocyte antigen alleles and many diseases.

4,489 citations

Journal Article
TL;DR: This work considers approximate Bayesian inference in a popular subset of structured additive regression models, latent Gaussian models, where the latent field is Gaussian, controlled by a few hyperparameters and with non-Gaussian response variables, and shows that an integrated nested Laplace approximation can directly compute very accurate approximations to the posterior marginals.
Abstract: Structured additive regression models are perhaps the most commonly used class of models in statistical applications. It includes, among others, (generalized) linear models, (generalized) additive models, smoothing spline models, state space models, semiparametric regression, spatial and spatiotemporal models, log-Gaussian Cox processes and geostatistical and geoadditive models. We consider approximate Bayesian inference in a popular subset of structured additive regression models, latent Gaussian models, where the latent field is Gaussian, controlled by a few hyperparameters and with non-Gaussian response variables. The posterior marginals are not available in closed form owing to the non-Gaussian response variables. For such models, Markov chain Monte Carlo methods can be implemented, but they are not without problems, in terms of both convergence and computational time. In some practical applications, the extent of these problems is such that Markov chain Monte Carlo sampling is simply not an appropriate tool for routine analysis. We show that, by using an integrated nested Laplace approximation and its simplified version, we can directly compute very accurate approximations to the posterior marginals. The main benefit of these approximations is computational: where Markov chain Monte Carlo algorithms need hours or days to run, our approximations provide more precise estimates in seconds or minutes. Another advantage with our approach is its generality, which makes it possible to perform Bayesian analysis in an automatic, streamlined way, and to compute model comparison criteria and various predictive measures so that models can be compared and the model under study can be challenged.

4,164 citations