Showing papers by "Yingrui Li" published in 2014


Journal ArticleDOI
Bernhard Misof, Shanlin Liu, Karen Meusemann, Ralph S. Peters, Alexander Donath, Christoph Mayer, Paul B. Frandsen, Jessica L. Ware, Tomas Flouri, Rolf G. Beutel, Oliver Niehuis, Malte Petersen, Fernando Izquierdo-Carrasco, Torsten Wappler, Jes Rust, Andre J. Aberer, Ulrike Aspöck, Horst Aspöck, Daniela Bartel, Alexander Blanke, Simon Berger, Alexander Böhm, Thomas R. Buckley, Brett Calcott, Junqing Chen, Frank Friedrich, Makiko Fukui, Mari Fujita, Carola Greve, Peter Grobe, Shengchang Gu, Ying Huang, Lars S. Jermiin, Akito Y. Kawahara, Lars Krogmann, Martin Kubiak, Robert Lanfear, Harald Letsch, Yiyuan Li, Zhenyu Li, Jiguang Li, Haorong Lu, Ryuichiro Machida, Yuta Mashimo, Pashalia Kapli, Duane D. McKenna, Guanliang Meng, Yasutaka Nakagaki, José Luis Navarrete-Heredia, Michael Ott, Yanxiang Ou, Günther Pass, Lars Podsiadlowski, Hans Pohl, Björn M. von Reumont, Kai Schütte, Kaoru Sekiya, Shota Shimizu, Adam Slipinski, Alexandros Stamatakis, Wenhui Song, Xu Su, Nikolaus U. Szucsich, Meihua Tan, Xuemei Tan, Min Tang, Jingbo Tang, Gerald Timelthaler, Shigekazu Tomizuka, Michelle D. Trautwein, Xiaoli Tong, Toshiki Uchifune, Manfred Walzl, Brian M. Wiegmann, Jeanne Wilbrandt, Benjamin Wipfler, Thomas K. F. Wong, Qiong Wu, Gengxiong Wu, Yinlong Xie, Shenzhou Yang, Qing Yang, David K. Yeates, Kazunori Yoshizawa, Qing Zhang, Rui Zhang, Wenwei Zhang, Yunhui Zhang, Jing Zhao, Chengran Zhou, Lili Zhou, Tanja Ziesmann, Shijie Zou, Yingrui Li, Xun Xu, Yong Zhang, Huanming Yang, Jian Wang, Jun Wang, Karl M. Kjer, Xin Zhou
07 Nov 2014-Science
TL;DR: The phylogeny of all major insect lineages reveals how and when insects diversified and provides a comprehensive reliable scaffold for future comparative analyses of evolutionary innovations among insects.
Abstract: Insects are the most speciose group of animals, but the phylogenetic relationships of many major lineages remain unresolved. We inferred the phylogeny of insects from 1478 protein-coding genes. Phylogenomic analyses of nucleotide and amino acid sequences, with site-specific nucleotide or domain-specific amino acid substitution models, produced statistically robust and congruent results resolving previously controversial phylogenetic relationships. We dated the origin of insects to the Early Ordovician [~479 million years ago (Ma)], of insect flight to the Early Devonian (~406 Ma), of major extant lineages to the Mississippian (~345 Ma), and the major diversification of holometabolous insects to the Early Cretaceous. Our phylogenomic study provides a comprehensive reliable scaffold for future comparative analyses of evolutionary innovations among insects.

1,998 citations


Journal ArticleDOI
01 May 2014-Nature
TL;DR: Genomic analyses suggest that ESCC and head and neck squamous cell carcinoma share some common pathogenic mechanisms, and ESCC development is associated with alcohol drinking, and novel biological markers and tumorigenic pathways that would greatly improve therapeutic strategies for ESCC are explored.
Abstract: Oesophageal cancer is one of the most aggressive cancers and is the sixth leading cause of cancer death worldwide(1). Approximately 70% of global oesophageal cancer cases occur in China, with oesophageal squamous cell carcinoma (ESCC) being the histopathological form in the vast majority of cases (>90%)(2,3). Currently, there are limited clinical approaches for the early diagnosis and treatment of ESCC, resulting in a 10% five-year survival rate for patients. However, the full repertoire of genomic events leading to the pathogenesis of ESCC remains unclear. Here we describe a comprehensive genomic analysis of 158 ESCC cases, as part of the International Cancer Genome Consortium research project. We conducted whole-genome sequencing in 17 ESCC cases and whole-exome sequencing in 71 cases, of which 53 cases, plus an additional 70 ESCC cases not used in the whole-genome and whole-exome sequencing, were subjected to array comparative genomic hybridization analysis. We identified eight significantly mutated genes, of which six are well known tumour-associated genes (TP53, RB1, CDKN2A, PIK3CA, NOTCH1, NFE2L2), and two have not previously been described in ESCC (ADAM29 and FAM135B). Notably, FAM135B is identified as a novel cancer-implicated gene as assayed for its ability to promote malignancy of ESCC cells. Additionally, MIR548K, a microRNA encoded in the amplified 11q13.3-13.4 region, is characterized as a novel oncogene, and functional assays demonstrate that MIR548K enhances malignant phenotypes of ESCC cells. Moreover, we have found that several important histone regulator genes (MLL2 (also called KMT2D), ASH1L, MLL3 (KMT2C), SETD1B, CREBBP and EP300) are frequently altered in ESCC. Pathway assessment reveals that somatic aberrations are mainly involved in the Wnt, cell cycle and Notch pathways. 
Genomic analyses suggest that ESCC and head and neck squamous cell carcinoma share some common pathogenic mechanisms, and ESCC development is associated with alcohol drinking. This study has explored novel biological markers and tumorigenic pathways that would greatly improve therapeutic strategies for ESCC.

853 citations


Journal ArticleDOI
14 Aug 2014-Nature
TL;DR: Re-sequencing the region around EPAS1 in 40 Tibetan and 40 Han individuals finds that this gene has a highly unusual haplotype structure that can only be convincingly explained by introgression of DNA from Denisovan or Denisovan-related individuals into humans.
Abstract: As modern humans migrated out of Africa, they encountered many new environmental conditions, including greater temperature extremes, different pathogens and higher altitudes. These diverse environments are likely to have acted as agents of natural selection and to have led to local adaptations. One of the most celebrated examples in humans is the adaptation of Tibetans to the hypoxic environment of the high-altitude Tibetan plateau. A hypoxia pathway gene, EPAS1, was previously identified as having the most extreme signature of positive selection in Tibetans, and was shown to be associated with differences in haemoglobin concentration at high altitude. Re-sequencing the region around EPAS1 in 40 Tibetan and 40 Han individuals, we find that this gene has a highly unusual haplotype structure that can only be convincingly explained by introgression of DNA from Denisovan or Denisovan-related individuals into humans. Scanning a larger set of worldwide populations, we find that the selected haplotype is only found in Denisovans and in Tibetans, and at very low frequency among Han Chinese. Furthermore, the length of the haplotype, and the fact that it is not found in any other populations, makes it unlikely that the haplotype sharing between Tibetans and Denisovans was caused by incomplete ancestral lineage sorting rather than introgression. Our findings illustrate that admixture with other hominin species has provided genetic variation that helped humans to adapt to new environments.

851 citations


Journal ArticleDOI
TL;DR: The conclusion is that SOAPdenovo-Trans provides higher contiguity, lower redundancy and faster execution, compared with two other popular transcriptome assemblers.
Abstract: Motivation: Transcriptome sequencing has long been the favored method for quickly and inexpensively obtaining a large number of gene sequences from an organism with no reference genome. Owing to the rapid increase in throughputs and decrease in costs of next-generation sequencing, RNA-Seq in particular has become the method of choice. However, the very short reads (e.g. 2 × 90 bp paired ends) from next-generation sequencing make de novo assembly to recover complete or full-length transcript sequences an algorithmic challenge. Results: Here, we present SOAPdenovo-Trans, a de novo transcriptome assembler designed specifically for RNA-Seq. We evaluated its performance on transcriptome datasets from rice and mouse. Using as our benchmarks the known transcripts from these well-annotated genomes (sequenced a decade ago), we assessed how SOAPdenovo-Trans and two other popular transcriptome assemblers handled such practical issues as alternative splicing and variable expression levels. Our conclusion is that SOAPdenovo-Trans provides higher contiguity, lower redundancy and faster execution. Availability and implementation: Source code and user manual are available at http://sourceforge.net/projects/soapdenovotrans/. Contact: xieyl@genomics.cn or bgi-soap@googlegroups.com Supplementary information: Supplementary data are available at Bioinformatics online.

730 citations


Journal ArticleDOI
Laurent C. Francioli, Androniki Menelaou, Sara L. Pulit, Freerk van Dijk, Pier Francesco Palamara, Clara C. Elbers, Pieter B. Neerincx, Kai Ye, Victor Guryev, Wigard P. Kloosterman, Patrick Deelen, Abdel Abdellaoui, Elisabeth M. van Leeuwen, Mannis van Oven, Martijn Vermaat, Mingkun Li, Jeroen F. J. Laros, Lennart C. Karssen, Alexandros Kanterakis, Najaf Amin, Jouke-Jan Hottenga, Eric-Wubbo Lameijer, Mathijs Kattenberg, Martijn Dijkstra, Heorhiy Byelas, Jessica van Setten, Barbera D. C. van Schaik, Jan Bot, Isaac J. Nijman, Ivo Renkens, Tobias Marschall, Alexander Schönhuth, Jayne Y. Hehir-Kwa, Robert E. Handsaker, Paz Polak, Mashaal Sohail, Dana Vuzman, Fereydoun Hormozdiari, David van Enckevort, Hailiang Mei, Vyacheslav Koval, Matthijs Moed, K. Joeri van der Velde, Fernando Rivadeneira, Karol Estrada, Carolina Medina-Gomez, Aaron Isaacs, Steven A. McCarroll, Marian Beekman, Anton J. M. de Craen, H. Eka D. Suchiman, Albert Hofman, Ben A. Oostra, André G. Uitterlinden, Gonneke Willemsen, Mathieu Platteel, Jan H. Veldink, Leonard H. van den Berg, Steven J. Pitts, Shobha Potluri, Purnima Sundar, David R. Cox, Shamil R. Sunyaev, Johan T. den Dunnen, Mark Stoneking, Peter de Knijff, Manfred Kayser, Qibin Li, Yingrui Li, Yuanping Du, Ruoyan Chen, Hongzhi Cao, Ning Li, Sujie Cao, Jun Wang, Jasper A. Bovenberg, Itsik Pe'er, P. Eline Slagboom, Cornelia M. van Duijn, Dorret I. Boomsma, Gert-Jan B. van Ommen, Paul I.W. de Bakker, Morris A. Swertz, Cisca Wijmenga
TL;DR: The Genome of the Netherlands (GoNL) Project is described, in which the whole genomes of 250 Dutch parent-offspring families were sequenced and a haplotype map of 20.4 million single-nucleotide variants and 1.2 million insertions and deletions was constructed.
Abstract: Whole-genome sequencing enables complete characterization of genetic variation, but geographic clustering of rare alleles demands many diverse populations be studied. Here we describe the Genome of the Netherlands (GoNL) Project, in which we sequenced the whole genomes of 250 Dutch parent-offspring families and constructed a haplotype map of 20.4 million single-nucleotide variants and 1.2 million insertions and deletions. The intermediate coverage (∼13×) and trio design enabled extensive characterization of structural variation, including midsize events (30-500 bp) previously poorly catalogued and de novo mutations. We demonstrate that the quality of the haplotypes boosts imputation accuracy in independent samples, especially for lower frequency alleles. Population genetic analyses demonstrate fine-scale structure across the country and support multiple ancient migrations, consistent with historical changes in sea level and flooding. The GoNL Project illustrates how single-population whole-genome sequencing can provide detailed characterization of genetic variation and may guide the design of future population studies.

677 citations


Journal ArticleDOI
TL;DR: It is found that the dosage compensation effect of tandem duplication genes probably contributed to the pungent diversification in pepper, and the Capsicum reference genome provides crucial information for the study of the evolution of not only the pepper genome but also the Solanaceae family.
Abstract: As an economic crop, pepper satisfies people’s spicy taste and has medicinal uses worldwide. To gain a better understanding of Capsicum evolution, domestication, and specialization, we present here the genome sequence of the cultivated pepper Zunla-1 (C. annuum L.) and its wild progenitor Chiltepin (C. annuum var. glabriusculum). We estimate that the pepper genome expanded ∼0.3 Mya (with respect to the genome of other Solanaceae) by a rapid amplification of retrotransposon elements, resulting in a genome comprised of ∼81% repetitive sequences. Approximately 79% of 3.48-Gb scaffolds containing 34,476 protein-coding genes were anchored to chromosomes by a high-density genetic map. Comparison of cultivated and wild pepper genomes with 20 resequencing accessions revealed molecular footprints of artificial selection, providing us with a list of candidate domestication genes. We also found that the dosage compensation effect of tandem duplication genes probably contributed to the pungent diversification in pepper. The Capsicum reference genome provides crucial information for the study of the evolution of not only the pepper genome but also the Solanaceae family, and it will facilitate the establishment of more effective pepper breeding programs.

593 citations


Journal ArticleDOI
TL;DR: A draft 6.5 Gb genome sequence of Locusta migratoria is presented, which is the largest animal genome sequenced so far, and complex regulatory mechanisms involved in microtubule dynamic-mediated synapse plasticity during phase change are revealed.
Abstract: Locusts are one of the world's most destructive agricultural pests and represent a useful model system in entomology. Here we present a draft 6.5 Gb genome sequence of Locusta migratoria, which is the largest animal genome sequenced so far. Our findings indicate that the large genome size of L. migratoria is likely to be because of transposable element proliferation combined with slow rates of loss for these elements. Methylome and transcriptome analyses reveal complex regulatory mechanisms involved in microtubule dynamic-mediated synapse plasticity during phase change. We find significant expansion of gene families associated with energy consumption and detoxification, consistent with long-distance flight capacity and phytophagy. We report hundreds of potential insecticide target genes, including cys-loop ligand-gated ion channels, G-protein-coupled receptors and lethal genes. The L. migratoria genome sequence offers new insights into the biology and sustainable management of this pest species, and will promote its wide use as a model system.

431 citations


Journal ArticleDOI
TL;DR: The Genome of the Netherlands (GoNL), one of the projects within BBMRI-NL, is described, a whole-genome-sequencing project in a representative sample consisting of 250 trio-families from all provinces in the Netherlands, which aims to characterize DNA sequence variation in the Dutch population.
Abstract: Within the Netherlands a national network of biobanks has been established (Biobanking and Biomolecular Research Infrastructure-Netherlands (BBMRI-NL)) as a national node of the European BBMRI. One of the aims of BBMRI-NL is to enrich biobanks with different types of molecular and phenotype data. Here, we describe the Genome of the Netherlands (GoNL), one of the projects within BBMRI-NL. GoNL is a whole-genome-sequencing project in a representative sample consisting of 250 trio-families from all provinces in the Netherlands, which aims to characterize DNA sequence variation in the Dutch population. The parent-offspring trios include adult individuals ranging in age from 19 to 87 years (mean=53 years; SD=16 years) from birth cohorts 1910-1994. Sequencing was done on blood-derived DNA from uncultured cells and accomplished coverage was 14-15x. The family-based design represents a unique resource to assess the frequency of regional variants, accurately reconstruct haplotypes by family-based phasing, characterize short indels and complex structural variants, and establish the rate of de novo mutational events. GoNL will also serve as a reference panel for imputation in the available genome-wide association studies in Dutch and other cohorts to refine association signals and uncover population-specific variants. GoNL will create a catalog of human genetic variation in this sample that is uniquely characterized with respect to micro-geographic location and a wide range of phenotypes. The resource will be made available to the research and medical community to guide the interpretation of sequencing projects. The present paper summarizes the global characteristics of the project.

267 citations


Journal ArticleDOI
TL;DR: The sesame genome will facilitate future research on the evolution of eudicots, as well as the study of lipid biosynthesis and potential genetic improvement of sesame, an important species from the order Lamiales and a high oil crop.
Abstract: Background: Sesame, Sesamum indicum L., is considered the queen of oilseeds for its high oil content and quality, and is grown widely in tropical and subtropical areas as an important source of oil and protein. However, the molecular biology of sesame is largely unexplored. Results: Here, we report a high-quality genome sequence of sesame assembled de novo with a contig N50 of 52.2 kb and a scaffold N50 of 2.1 Mb, containing an estimated 27,148 genes. The results reveal novel, independent whole genome duplication and the absence of the Toll/interleukin-1 receptor domain in resistance genes. Candidate genes and oil biosynthetic pathways contributing to high oil content were discovered by comparative genomic and transcriptomic analyses. These revealed the expansion of type 1 lipid transfer genes by tandem duplication, the contraction of lipid degradation genes, and the differential expression of essential genes in the triacylglycerol biosynthesis pathway, particularly in the early stage of seed development. Resequencing data in 29 sesame accessions from 12 countries suggested that the high genetic diversity of lipid-related genes might be associated with the wide variation in oil content. Additionally, the results shed light on the pivotal stage of seed development, oil accumulation and potential key genes for sesamin production, an important pharmacological constituent of sesame. Conclusions: As an important species from the order Lamiales and a high oil crop, the sesame genome will facilitate future research on the evolution of eudicots, as well as the study of lipid biosynthesis and potential genetic improvement of sesame.

225 citations


Journal ArticleDOI
TL;DR: Single-variant and gene-based association analyses of nonsynonymous SNVs did not identify newly associated genes for psoriasis in the regions subjected to targeted resequencing, which suggests that coding variants in the 1,326 targeted genes contribute only a limited fraction of the overall genetic risk for psoriasis.
Abstract: To explore the contribution of functional coding variants to psoriasis, we analyzed nonsynonymous single-nucleotide variants (SNVs) across the genome by exome sequencing in 781 psoriasis cases and 676 controls and through follow-up validation in 1,326 candidate genes by targeted sequencing in 9,946 psoriasis cases and 9,906 controls from the Chinese population. We discovered two independent missense SNVs in IL23R and GJB2 of low frequency and five common missense SNVs in LCE3D, ERAP1, CARD14 and ZNF816A associated with psoriasis at genome-wide significance. Rare missense SNVs in FUT2 and TARBP1 were also observed with suggestive evidence of association. Single-variant and gene-based association analyses of nonsynonymous SNVs did not identify newly associated genes for psoriasis in the regions subjected to targeted resequencing. This suggests that coding variants in the 1,326 targeted genes contribute only a limited fraction of the overall genetic risk for psoriasis.

191 citations


Journal ArticleDOI
23 May 2014-Science
TL;DR: In this article, the authors performed whole-exome sequencing of 49 blood-tumor pairs and RNA sequencing of 44 tumors from cortisol-producing adrenocortical adenomas (ACAs), adrenocorticotropic hormone-independent macronodular adrenocortical hyperplasias (AIMAHs), and adrenocortical oncocytomas (ADOs) and identified a hotspot in the PRKACA gene with a L205R mutation in 69.2% (27 out of 39) of ACAs, validated in 65.5% of a total of 87 ACAs.
Abstract: Adrenal Cushing's syndrome is caused by excess production of glucocorticoid from adrenocortical tumors and hyperplasias, which leads to metabolic disorders. We performed whole-exome sequencing of 49 blood-tumor pairs and RNA sequencing of 44 tumors from cortisol-producing adrenocortical adenomas (ACAs), adrenocorticotropic hormone-independent macronodular adrenocortical hyperplasias (AIMAHs), and adrenocortical oncocytomas (ADOs). We identified a hotspot in the PRKACA gene with a L205R mutation in 69.2% (27 out of 39) of ACAs and validated it in 65.5% of a total of 87 ACAs. Our data revealed that the activating L205R mutation, which is located in the P+1 loop of the protein kinase A (PKA) catalytic subunit, promoted PKA substrate phosphorylation and target gene expression. Moreover, we discovered the recurrently mutated gene DOT1L in AIMAHs and CLASP2 in ADOs. Collectively, these data highlight potentially functional mutated genes in adrenal Cushing's syndrome.

Journal ArticleDOI
TL;DR: With careful monitoring via whole-genome sequencing it is possible to apply genome editing to human pluripotent cells with minimal impact on genomic mutational load, and a TALEN-HDAdV hybrid vector is developed, which significantly increased gene-correction efficiency in hiPSCs.

Journal ArticleDOI
12 Aug 2014-PLOS ONE
TL;DR: This catalogue of SNPs and indels amongst South Asians provides the first comprehensive map of genetic variation in this major human population, and reveals evidence for selective pressures on genes involved in skin biology, metabolism, infection and immunity.
Abstract: The genetic sequence variation of people from the Indian subcontinent, who comprise one-quarter of the world's population, is not well described. We carried out whole genome sequencing of 168 South Asians, along with whole-exome sequencing of 147 South Asians to provide deeper characterisation of coding regions. We identify 12,962,155 autosomal sequence variants, including 2,946,861 new SNPs and 312,738 novel indels. This catalogue of SNPs and indels amongst South Asians provides the first comprehensive map of genetic variation in this major human population, and reveals evidence for selective pressures on genes involved in skin biology, metabolism, infection and immunity. Our results will accelerate the search for the genetic variants underlying susceptibility to disorders such as type-2 diabetes and cardiovascular disease, which are highly prevalent amongst South Asians.

Journal ArticleDOI
TL;DR: This study provides the first exome-wide evidence at single-cell level supporting that colon cancer could be of a biclonal origin, and suggests that low-prevalence mutations in a cohort may also play important protumorigenic roles at the individual level.
Abstract: Single-cell sequencing is a powerful tool for delineating clonal relationships and identifying key driver genes for personalized cancer management. Here we performed single-cell sequencing analysis of a case of colon cancer. Population genetics analyses identified two independent clones in the tumor cell population. The major tumor clone harbored APC and TP53 mutations as early oncogenic events, whereas the minor clone contained preponderant CDC27 and PABPC1 mutations. The absence of APC and TP53 mutations in the minor clone supports that these two clones were derived from two cellular origins. Examination of somatic mutation allele frequency spectra of an additional 21 whole-tissue exome-sequenced cases revealed the heterogeneity of clonal origins in colon cancer. Next, we identified a mutated gene, SLC12A5, that showed a high frequency of mutation at the single-cell level but exhibited low prevalence at the population level. Functional characterization of mutant SLC12A5 revealed its potential oncogenic effect in colon cancer. Our study provides the first exome-wide evidence at single-cell level supporting that colon cancer could be of a biclonal origin, and suggests that low-prevalence mutations in a cohort may also play important protumorigenic roles at the individual level.

Journal ArticleDOI
TL;DR: Deep sequencing of 42 HCC patients with a combination of whole-genome, exome and transcriptome sequencing identifies the mutational landscape of HCC and finds frequent mutations in TP53, CTNNB1 and AXIN1, and rare but likely functional mutations in BAP1 and IDH1.
Abstract: Background: Hepatocellular carcinoma (HCC) is a heterogeneous disease with a high mortality rate. Recent genomic studies have identified TP53, AXIN1, and CTNNB1 as the most frequently mutated genes. Lower frequency mutations have been reported in ARID1A, ARID2 and JAK1. In addition, hepatitis B virus (HBV) integrations into the human genome have been associated with HCC.

Journal ArticleDOI
TL;DR: This study is the first to identify frequent BAP1 and BRCA pathway alterations in bladder cancer, show TERT promoter alterations are independent of other bladder cancer gene alterations, and show KDM6A loss is a driver of the bladder cancer phenotype.
Abstract: Purpose: Genetic analysis of bladder cancer has revealed a number of frequently altered genes, including frequent alterations of the telomerase (TERT) gene promoter, although few altered genes have been functionally evaluated. Our objective is to characterize alterations observed by exome sequencing and sequencing of the TERT promoter, and to examine the functional relevance of histone lysine (K)–specific demethylase 6A (KDM6A/UTX), a frequently mutated histone demethylase, in bladder cancer. Experimental Design: We analyzed bladder cancer samples from 54 U.S. patients by exome and targeted sequencing and confirmed somatic variants using normal tissue from the same patient. We examined the biologic function of KDM6A using in vivo and in vitro assays. Results: We observed frequent somatic alterations in BRCA1 associated protein-1 (BAP1) in 15% of tumors, including deleterious alterations to the deubiquitinase active site and the nuclear localization signal. BAP1 mutations contribute to a high frequency of tumors with breast cancer (BRCA) DNA repair pathway alterations and were significantly associated with papillary histologic features in tumors. BAP1 and KDM6A mutations significantly co-occurred in tumors. Somatic variants altering the TERT promoter were found in 69% of tumors but were not correlated with alterations in other bladder cancer genes. We examined the function of KDM6A, altered in 24% of tumors, and show depletion in human bladder cancer cells, enhanced in vitro proliferation, in vivo tumor growth, and cell migration. Conclusions: This study is the first to identify frequent BAP1 and BRCA pathway alterations in bladder cancer, show TERT promoter alterations are independent of other bladder cancer gene alterations, and show KDM6A loss is a driver of the bladder cancer phenotype. Clin Cancer Res; 20(18); 4935–48. ©2014 AACR.

Journal ArticleDOI
TL;DR: Across several quality metrics, these budgerigar assemblies are comparable to or better than the chicken and zebra finch genome assemblies built from traditional Sanger sequencing reads, and are sufficient to analyze regions that are difficult to sequence and assemble.
Abstract: Background: Parrots belong to a group of behaviorally advanced vertebrates and have an advanced ability of vocal learning relative to other vocal-learning birds. They can imitate human speech, synchronize their body movements to a rhythmic beat, and understand complex concepts of referential meaning to sounds. However, little is known about the genetics of these traits. Elucidating the genetic bases would require whole genome sequencing and a robust assembly of a parrot genome. Findings: We present a genomic resource for the budgerigar, an Australian Parakeet (Melopsittacus undulatus) – the most widely studied parrot species in neuroscience and behavior. We present genomic sequence data that includes over 300× raw read coverage from multiple sequencing technologies and chromosome optical maps from a single male animal. The reads and optical maps were used to create three hybrid assemblies representing some of the largest genomic scaffolds to date for a bird; two of which were annotated based on similarities to reference sets of non-redundant human, zebra finch and chicken proteins, and budgerigar transcriptome sequence assemblies. The sequence reads for this project were in part generated and used for both the Assemblathon 2 competition and the first de novo assembly of a giga-scale vertebrate genome utilizing PacBio single-molecule sequencing. Conclusions: Across several quality metrics, these budgerigar assemblies are comparable to or better than the chicken and zebra finch genome assemblies built from traditional Sanger sequencing reads, and are sufficient to analyze regions that are difficult to sequence and assemble, including those not yet assembled in prior bird genomes, and promoter regions of genes differentially regulated in vocal learning brain regions. This work provides valuable data and material for genome technology development and for investigating the genomics of complex behavioral traits.

Journal ArticleDOI
TL;DR: The first whole genome resequencing-based analysis identifying genes that likely modulate high altitude adaptation in native Ethiopians residing at 3,500 m above sea level on Bale Plateau or Chennek field in Ethiopia highlights the importance of whole genome sequencing for investigating adaptation by natural selection.
Abstract: Although it has long been proposed that genetic factors contribute to adaptation to high altitude, such factors remain largely unverified. Recent advances in high-throughput sequencing have made it feasible to analyze genome-wide patterns of genetic variation in human populations. Since traditionally such studies surveyed only a small fraction of the genome, interpretation of the results was limited. We report here the results of the first whole genome resequencing-based analysis identifying genes that likely modulate high altitude adaptation in native Ethiopians residing at 3,500 m above sea level on Bale Plateau or Chennek field in Ethiopia. Using cross-population tests of selection, we identify regions with a significant loss of diversity, indicative of a selective sweep. We focus on a 208 kbp gene-rich region on chromosome 19, which is significant in both of the Ethiopian subpopulations sampled. This region contains eight protein-coding genes and spans 135 SNPs. To elucidate its potential role in hypoxia tolerance, we experimentally tested whether individual genes from the region affect hypoxia tolerance in Drosophila. Three genes significantly impact survival rates in low oxygen: cic, an ortholog of human CIC, Hsl, an ortholog of human LIPE, and Paf-AHα, an ortholog of human PAFAH1B3. Our study reveals evolutionarily conserved genes that modulate hypoxia tolerance. In addition, we show that many of our results would likely be unattainable using data from exome sequencing or microarray studies. This highlights the importance of whole genome sequencing for investigating adaptation by natural selection.

Journal ArticleDOI
TL;DR: The results of this study increase the number of confirmed Psoriasis risk loci and provide novel insight into the pathogenesis of psoriasis.
Abstract: In a previous large-scale exome sequencing analysis for psoriasis, we discovered seven common and low-frequency missense variants within six genes with genome-wide significance. Here we describe an in-depth analysis of noncoding variants based on sequencing data (10,727 cases and 10,582 controls) with replication in an independent cohort of Han Chinese individuals consisting of 4,480 cases and 6,521 controls to identify additional psoriasis susceptibility loci. We confirmed four known psoriasis susceptibility loci (IL12B, IFIH1, ERAP1 and RNF114; 2.30 × 10^-20 ≤ P ≤ 2.41 × 10^-7) and identified three new susceptibility loci: 4q24 (NFKB1) at rs1020760 (P = 2.19 × 10^-8), 12p13.3 (CD27-LAG3) at rs758739 (P = 4.08 × 10^-8) and 17q12 (IKZF3) at rs10852936 (P = 1.96 × 10^-8). Two suggestive loci, 3p21.31 and 17q25, are also identified with P < 1.00 × 10^-6. The results of this study increase the number of confirmed psoriasis risk loci and provide novel insight into the pathogenesis of psoriasis.

Bernhard Misof, Shanlin Liu, Karen Meusemann, Ralph S. Peters, Alexander Donath, Christoph Mayer, Paul B. Frandsen, Jessica L. Ware, Tomas Flouri, Rolf G. Beutel, Oliver Niehuis, Malte Petersen, Fernando Izquierdo-Carrasco, Torsten Wappler, Jes Rust, Andre J. Aberer, Ulrike Aspöck, Horst Aspöck, Daniela Bartel, Alexander Blanke, Simon Berger, Alexander Böhm, Thomas R. Buckley, Brett Calcott, Junqing Chen, Frank Friedrich, Makiko Fukui, Mari Fujita, Carola Greve, Peter Grobe, Shengchang Gu, Ying Huang, Lars S. Jermiin, Akito Y. Kawahara, Lars Krogmann, Martin Kubiak, Robert Lanfear, Harald Letsch, Yiyuan Li, Zhenyu Li, Jiguang Li, Haorong Lu, Ryuichiro Machida, Yuta Mashimo, Pashalia Kapli, Duane D. McKenna, Guanliang Meng, Yasutaka Nakagaki, José Luis Navarrete-Heredia, Michael Ott, Yanxiang Ou, Günther Pass, Lars Podsiadlowski, Hans Pohl, Björn M. von Reumont, Kai Schütte, Kaoru Sekiya, Shota Shimizu, Adam Slipinski, Alexandros Stamatakis, Wenhui Song, Xu Su, Nikolaus U. Szucsich, Meihua Tan, Xuemei Tan, Min Tang, Jingbo Tang, Gerald Timelthaler, Shigekazu Tomizuka, Michelle D. Trautwein, Xiaoli Tong, Toshiki Uchifune, Manfred Walzl, Brian M. Wiegmann, Jeanne Wilbrandt, Benjamin Wipfler, Thomas K. F. Wong, Qiong Wu, Gengxiong Wu, Yinlong Xie, Shenzhou Yang, Qing Yang, David K. Yeates, Kazunori Yoshizawa, Qing Zhang, Rui Zhang, Wenwei Zhang, Yunhui Zhang, Jing Zhao, Chengran Zhou, Lili Zhou, Tanja Ziesmann, Shijie Zou, Yingrui Li, Xun Xu, Yong Zhang, Huanming Yang, Jian Wang, Jun Wang, Karl M. Kjer, Xin Zhou 
01 Jan 2014
TL;DR: Misof et al. performed a phylogenomic analysis of protein-coding genes from all major insect orders and close relatives, then used the resolved phylogenetic tree together with fossil analysis to date the origin of insects to ~479 million years ago and to resolve long-controversial subjects in insect phylogeny.
Abstract: Toward an insect evolution resolution. Insects are the most diverse group of animals, with the largest number of species. However, many of the evolutionary relationships between insect species have been controversial and difficult to resolve. Misof et al. performed a phylogenomic analysis of protein-coding genes from all major insect orders and close relatives, resolving the placement of taxa. The authors used this resolved phylogenetic tree together with fossil analysis to date the origin of insects to ~479 million years ago and to resolve long-controversial subjects in insect phylogeny. Science, this issue p. 763. The phylogeny of all major insect lineages reveals how and when insects diversified. Insects are the most speciose group of animals, but the phylogenetic relationships of many major lineages remain unresolved. We inferred the phylogeny of insects from 1478 protein-coding genes. Phylogenomic analyses of nucleotide and amino acid sequences, with site-specific nucleotide or domain-specific amino acid substitution models, produced statistically robust and congruent results resolving previously controversial phylogenetic relationships. We dated the origin of insects to the Early Ordovician [~479 million years ago (Ma)], of insect flight to the Early Devonian (~406 Ma), of major extant lineages to the Mississippian (~345 Ma), and the major diversification of holometabolous insects to the Early Cretaceous. Our phylogenomic study provides a comprehensive reliable scaffold for future comparative analyses of evolutionary innovations among insects.

Journal ArticleDOI
TL;DR: Preservations in genomic profiles from liver primary tumors to metachronous lung metastases indicate that the genomic features during tumorigenesis may be retained during metastasis, which may explain the clinical observation that both primary and metastatic tumors are usually sensitive or resistant to the same systemic treatments.
Abstract: To gain biological insights into lung metastases from hepatocellular carcinoma (HCC), we compared the whole-genome sequencing profiles of primary HCC and paired lung metastases. We used whole-genome sequencing at 33X-43X coverage to profile somatic mutations in primary HCC (HBV+) and metachronous lung metastases (interval > 2 years). In total, 5,027-13,961 and 5,275-12,624 somatic single-nucleotide variants (SNVs) were detected in primary HCC and lung metastases, respectively. Generally, 38.88-78.49% of SNVs detected in metastases were present in primary tumors. We identified 65–221 structural variations (SVs) in primary tumors and 60–232 SVs in metastases. Comparison of these SVs shows very similar and largely overlapping mutated segments between primary and metastatic tumors. Copy number alterations between primary and metastatic pairs were also found to be closely related. Together, these preservations in genomic profiles from liver primary tumors to metachronous lung metastases indicate that the genomic features during tumorigenesis may be retained during metastasis. We found very similar genomic alterations between primary and metastatic tumors, with a few mutations found specifically in lung metastases, which may explain the clinical observation that both primary and metastatic tumors are usually sensitive or resistant to the same systemic treatments.
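As a rough illustration of the "SNVs detected in metastases that were present in primary tumors" figure, the sketch below computes that share as a set overlap. The function name `shared_fraction`, the tuple layout `(chrom, pos, ref, alt)`, and the four toy variants are assumptions made for illustration; they are not the authors' pipeline or data.

```python
def shared_fraction(metastasis_snvs, primary_snvs):
    """Fraction of metastasis SNVs that are also found in the paired primary tumor.

    Each SNV is a hypothetical (chrom, pos, ref, alt) tuple; two calls count as
    the same variant only if all four fields match.
    """
    metastasis = set(metastasis_snvs)
    if not metastasis:
        raise ValueError("metastasis SNV set is empty")
    return len(metastasis & set(primary_snvs)) / len(metastasis)


# Toy, made-up calls for one primary/metastasis pair (not real data).
primary = [("chr1", 100, "A", "G"), ("chr2", 200, "C", "T"), ("chr3", 300, "G", "A")]
metastasis = [("chr1", 100, "A", "G"), ("chr2", 200, "C", "T"),
              ("chr3", 300, "G", "A"), ("chr4", 400, "T", "C")]

fraction = shared_fraction(metastasis, primary)
```

The denominator is the metastasis call set, matching the abstract's wording; using the primary set instead would answer a different question (how much of the primary tumor's profile survives in the metastasis).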

Journal ArticleDOI
TL;DR: It is shown that exome capture of saliva-derived DNA yields sufficient non-human sequences to characterize oral microbial communities, including detection of bacteria linked to oral disease (e.g. Prevotella melaninogenica).
Abstract: Targeted capture of genomic regions reduces sequencing cost while generating higher coverage by allowing biomedical researchers to focus on specific loci of interest, such as exons. Targeted capture also has the potential to facilitate the generation of genomic data from DNA collected via saliva or buccal cells. DNA samples derived from these cell types tend to have a lower human DNA yield, may be degraded from age and/or have contamination from bacteria or other ambient oral microbiota. However, thousands of samples have been previously collected from these cell types, and saliva collection has the advantage that it is non-invasive and appropriate for a wide variety of research. We demonstrate successful enrichment and sequencing of 15 South African KhoeSan exomes and 2 full genomes with samples initially derived from saliva. The expanded exome dataset enables us to characterize genetic diversity free from ascertainment bias for multiple KhoeSan populations, including new exome data from six HGDP Namibian San, revealing substantial population structure across the Kalahari Desert region. Additionally, we discover and independently verify thirty-one previously unknown KIR alleles using methods we developed to accurately map and call the highly polymorphic HLA and KIR loci from exome capture data. Finally, we show that exome capture of saliva-derived DNA yields sufficient non-human sequences to characterize oral microbial communities, including detection of bacteria linked to oral disease (e.g. Prevotella melaninogenica). For comparison, two samples were sequenced using standard full genome library preparation without exome capture and we found no systematic bias of metagenomic information between exome-captured and non-captured data. DNA from human saliva samples, collected and extracted using standard procedures, can be used to successfully sequence high quality human exomes, and metagenomic data can be derived from non-human reads.
We find that individuals from the Kalahari carry a higher oral pathogenic microbial load than samples surveyed in the Human Microbiome Project. Additionally, rare variants present in the exomes suggest strong population structure across different KhoeSan populations.

Journal ArticleDOI
TL;DR: Full mtDNA sequences are mined from an exome capture data set of 2000 Danes, showing that it is possible to get high-quality full-genome sequences of the mitochondrion from this resource and characterising the variation found in the mtDNA sequence in Danes.
Abstract: In this paper, we mine full mtDNA sequences from an exome capture data set of 2000 Danes, showing that it is possible to get high-quality full-genome sequences of the mitochondrion from this resource. The sample includes 1000 individuals with type 2 diabetes and 1000 controls. We characterise the variation found in the mtDNA sequence in Danes and relate the variation to diabetes risk as well as to several blood phenotypes of the controls but find no significant associations. We report 2025 polymorphisms, of which 393 have not been reported previously. These 393 mutations are both very rare and estimated to be caused by very recent mutations but individuals with type 2 diabetes do not possess more of these variants. Population genetics analysis using Bayesian skyline plot shows a recent history of rapid population growth in the Danish population in accordance with the fact that >40% of variable sites are observed as singletons.

Journal ArticleDOI
28 Apr 2014-PLOS ONE
TL;DR: HLA-DRB1 and CD2AP gene were identified to be among the susceptibility genes of KBD, thus supporting the role of the autoimmune response in KBD and the possibility of shared etiology between osteoarthritis, rheumatoid arthritis, and KBD.
Abstract: Objective: To identify and investigate the susceptibility genes of Kashin–Beck disease (KBD) in the Chinese population. Methods: Whole-exome capture and sequencing was used to detect genetic variations in 19 individuals from six families with a high incidence of KBD. A total of 44 polymorphisms from 41 genes were genotyped in 144 cases and 144 controls using MassARRAY under the standard protocol from Sequenom. Association analysis was performed on the data using PLINK 1.07. Results: In the sequencing stage, each sample showed approximately 70-fold coverage, thus covering more than 99% of the target regions. Among the single nucleotide polymorphisms (SNPs) used in the transmission disequilibrium test, 108 had a p-value of <0.01, whereas 1056 had a p-value of <0.05. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis indicates that these SNPs are concentrated in three major pathways: regulation of actin cytoskeleton, focal adhesion, and metabolic pathways. In the validation stage, single-locus analysis revealed that two of these polymorphisms (rs7745040 and rs9275295) in the human leukocyte antigen (HLA)-DRB1 gene and one polymorphism (rs9473132) in the CD2-associated protein (CD2AP) gene have a statistically significant association with KBD. Conclusions: The HLA-DRB1 and CD2AP genes were identified to be among the susceptibility genes of KBD, thus supporting the role of the autoimmune response in KBD and the possibility of shared etiology between osteoarthritis, rheumatoid arthritis, and KBD.

Journal ArticleDOI
09 Jan 2014-PLOS ONE
TL;DR: The analytical strategy developed here will be of great help in fighting against the outbreaks of emerging infectious diseases, by pinpointing the source of pathogens rapidly with genomic epidemiological data and microbial forensics information.
Abstract: Source tracing of pathogens is critical for the control and prevention of infectious diseases. Genome sequencing by high throughput technologies is currently feasible and popular, leading to the burst of deciphered bacterial genome sequences. Utilizing the flooding genomic data for source tracing of pathogens in outbreaks is promising, and challenging as well. Here, we employed Yersinia pestis genomes from a plague outbreak at Xinghai county of China in 2009 as an example, to develop a simple two-step strategy for rapid source tracing of the outbreak. The first step was to define the phylogenetic position of the outbreak strains in a whole species tree, and the next step was to provide a detailed relationship across the outbreak strains and their suspected relatives. Through this strategy, we observed that the Xinghai plague outbreak was caused by Y. pestis that circulated in the local plague focus, where the majority of historical plague epidemics in the Qinghai-Tibet Plateau may originate from. The analytical strategy developed here will be of great help in fighting against the outbreaks of emerging infectious diseases, by pinpointing the source of pathogens rapidly with genomic epidemiological data and microbial forensics information.

Journal ArticleDOI
TL;DR: A correction notice: Fumin Lei is moved from the author list to the acknowledgements, the zebra finch is reassigned from the Paridae to the Estrildidae family, and the abstract and conclusions now state only that ground tit phylogeny was confirmed as not belonging to the Corvidae family.
Abstract: 1. Fumin Lei is no longer listed as an author of this article. Instead, his helpful input is noted in the acknowledgements section. 2. The provisional version of this article mistakenly stated that zebra finch belongs to the Paridae family. We have now corrected this error to reflect the classification of this species to the Estrildidae family. 3. In the abstract of the provisional version of the article we stated that the phylogeny of the ground tit was confirmed as belonging to the Paridae family. We have now re-phrased this sentence to say that ground tit phylogeny was confirmed as not belonging to the Corvidae family. 4. In the conclusions of the provisional version of the article we stated that the phylogeny of the ground tit was confirmed as not belonging to the Corvidae family but to the Paridae family. We have now re-phrased this conclusion to say that ground tit phylogeny was confirmed as not belonging to the Corvidae family.

Patent
27 Nov 2014
TL;DR: A method of gap closing in a nucleotide sequence: reads overlapping the gap-side end of the first contig are collected as a set, a candidate read is selected from the set and checked for extension conflicts against the other reads, and confident candidate reads are connected to the first contig until it overlaps the second contig.
Abstract: Provided is a method of gap closing in a nucleotide sequence. The nucleic acid sequence comprises a first contig at one end of a gap in an unassembled region and a second contig at the other end of the gap. The method comprises: selecting reads that overlap the end of the first contig close to the gap as a set of reads for gap closing; selecting, from this set, the read having the shortest overlap with the first contig as a candidate read; determining whether the set contains reads whose overlap with the first contig is shorter than that of the candidate read, and whether the set contains reads having no overlapping relationship with the candidate read; reporting an extension conflict and marking the candidate read as unconfident if either or both kinds of reads are present; reselecting the candidate read until a confident candidate read is obtained; connecting the confident candidate read to the first contig to form a new first contig; determining whether the end of the new first contig close to the gap overlaps the end of the second contig close to the gap; if it does not, repeating the step of selecting the set of reads for gap closing with the new first contig in place of the first contig; and if it does, connecting the new first contig to the second contig to complete gap closing.
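The extension loop described in this abstract can be sketched in a few lines of Python. This is a simplified reading and not the patented procedure: it keeps the shortest-overlap-first candidate order and treats any read that disagrees with the candidate over their shared positions as an extension conflict, but it omits the separate check on reads with a shorter overlap than the candidate. The names `suffix_prefix_overlap`, `agrees`, `close_gap`, the `min_overlap` and `max_rounds` parameters, and the toy sequence are all assumptions made for illustration.

```python
def suffix_prefix_overlap(left, right, min_len):
    """Longest k >= min_len such that left's last k bases equal right's first k (else 0)."""
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def agrees(contig_len, cand, cand_ov, read, read_ov):
    """True if two reads anchored on the contig end match wherever they overlap."""
    start_c, start_r = contig_len - cand_ov, contig_len - read_ov
    lo = max(start_c, start_r)
    hi = min(start_c + len(cand), start_r + len(read))
    return cand[lo - start_c:hi - start_c] == read[lo - start_r:hi - start_r]


def close_gap(first, second, reads, min_overlap=4, max_rounds=100):
    """Greedily extend `first` with reads until it overlaps `second`; None on failure."""
    contig = first
    for _ in range(max_rounds):
        joined = suffix_prefix_overlap(contig, second, min_overlap)
        if joined:
            return contig + second[joined:]
        # Reads overlapping the gap-side end of the contig that also extend past it.
        pool = []
        for read in reads:
            ov = suffix_prefix_overlap(contig, read, min_overlap)
            if 0 < ov < len(read):
                pool.append((ov, read))
        pool.sort(key=lambda item: item[0])  # shortest overlap is tried first
        # A candidate is "confident" if no other read in the pool contradicts it.
        confident = next(
            ((ov, cand) for ov, cand in pool
             if all(agrees(len(contig), cand, ov, r, r_ov) for r_ov, r in pool)),
            None,
        )
        if confident is None:
            return None  # every candidate hit an extension conflict (or pool is empty)
        ov, cand = confident
        contig += cand[ov:]  # append only the part of the read beyond the contig
    joined = suffix_prefix_overlap(contig, second, min_overlap)
    return contig + second[joined:] if joined else None


# Toy example: a made-up 36-base sequence with a 12-base gap in the middle.
true_seq = "ATGGCGTACGTTAGCCTGAATCGGATCCATGCAAGT"
first, second = true_seq[:12], true_seq[24:]
reads = [true_seq[6:18], true_seq[8:20], true_seq[14:26], true_seq[18:30]]
closed = close_gap(first, second, reads, min_overlap=4)
```

With error-free reads the loop walks across the gap in three extensions and then joins the second contig. Adding a read that overlaps the first contig but continues with different bases leaves no confident candidate, so the sketch gives up rather than guessing; a fuller implementation would also need to handle sequencing errors and repeats.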