Showing papers by "Yingrui Li" published in 2014
••
Commonwealth Scientific and Industrial Research Organisation1, Rutgers University2, Heidelberg Institute for Theoretical Studies3, University of Jena4, University of Bonn5, University of Vienna6, Naturhistorisches Museum7, University of Tsukuba8, Landcare Research9, Johns Hopkins University10, University of Hamburg11, Ehime University12, Florida Museum of Natural History13, Staatliches Museum für Naturkunde Stuttgart14, National Evolutionary Synthesis Center15, Australian National University16, Macquarie University17, American Museum of Natural History18, University of Memphis19, University of Guadalajara20, Bavarian Academy of Sciences and Humanities21, Natural History Museum22, Karlsruhe Institute of Technology23, California Academy of Sciences24, South China Agricultural University25, North Carolina State University26, Hokkaido University27
TL;DR: The phylogeny of all major insect lineages reveals how and when insects diversified and provides a comprehensive reliable scaffold for future comparative analyses of evolutionary innovations among insects.
Abstract: Insects are the most speciose group of animals, but the phylogenetic relationships of many major lineages remain unresolved. We inferred the phylogeny of insects from 1478 protein-coding genes. Phylogenomic analyses of nucleotide and amino acid sequences, with site-specific nucleotide or domain-specific amino acid substitution models, produced statistically robust and congruent results resolving previously controversial phylogenetic relationships. We dated the origin of insects to the Early Ordovician [~479 million years ago (Ma)], of insect flight to the Early Devonian (~406 Ma), of major extant lineages to the Mississippian (~345 Ma), and the major diversification of holometabolous insects to the Early Cretaceous. Our phylogenomic study provides a comprehensive reliable scaffold for future comparative analyses of evolutionary innovations among insects.
1,998 citations
••
TL;DR: Genomic analyses suggest that ESCC and head and neck squamous cell carcinoma share some common pathogenic mechanisms, and ESCC development is associated with alcohol drinking, and novel biological markers and tumorigenic pathways that would greatly improve therapeutic strategies for ESCC are explored.
Abstract: Oesophageal cancer is one of the most aggressive cancers and is the sixth leading cause of cancer death worldwide(1). Approximately 70% of global oesophageal cancer cases occur in China, with oesophageal squamous cell carcinoma (ESCC) being the histopathological form in the vast majority of cases (>90%)(2,3). Currently, there are limited clinical approaches for the early diagnosis and treatment of ESCC, resulting in a 10% five-year survival rate for patients. However, the full repertoire of genomic events leading to the pathogenesis of ESCC remains unclear. Here we describe a comprehensive genomic analysis of 158 ESCC cases, as part of the International Cancer Genome Consortium research project. We conducted whole-genome sequencing in 17 ESCC cases and whole-exome sequencing in 71 cases, of which 53 cases, plus an additional 70 ESCC cases not used in the whole-genome and whole-exome sequencing, were subjected to array comparative genomic hybridization analysis. We identified eight significantly mutated genes, of which six are well known tumour-associated genes (TP53, RB1, CDKN2A, PIK3CA, NOTCH1, NFE2L2), and two have not previously been described in ESCC (ADAM29 and FAM135B). Notably, FAM135B is identified as a novel cancer-implicated gene as assayed for its ability to promote malignancy of ESCC cells. Additionally, MIR548K, a microRNA encoded in the amplified 11q13.3-13.4 region, is characterized as a novel oncogene, and functional assays demonstrate that MIR548K enhances malignant phenotypes of ESCC cells. Moreover, we have found that several important histone regulator genes (MLL2 (also called KMT2D), ASH1L, MLL3 (KMT2C), SETD1B, CREBBP and EP300) are frequently altered in ESCC. Pathway assessment reveals that somatic aberrations are mainly involved in the Wnt, cell cycle and Notch pathways. 
Genomic analyses suggest that ESCC and head and neck squamous cell carcinoma share some common pathogenic mechanisms, and ESCC development is associated with alcohol drinking. This study has explored novel biological markers and tumorigenic pathways that would greatly improve therapeutic strategies for ESCC.
853 citations
••
TL;DR: Re-sequencing the region around EPAS1 in 40 Tibetan and 40 Han individuals finds that this gene has a highly unusual haplotype structure that can only be convincingly explained by introgression of DNA from Denisovan or Denisovan-related individuals into humans.
Abstract: As modern humans migrated out of Africa, they encountered many new environmental conditions, including greater temperature extremes, different pathogens and higher altitudes. These diverse environments are likely to have acted as agents of natural selection and to have led to local adaptations. One of the most celebrated examples in humans is the adaptation of Tibetans to the hypoxic environment of the high-altitude Tibetan plateau. A hypoxia pathway gene, EPAS1, was previously identified as having the most extreme signature of positive selection in Tibetans, and was shown to be associated with differences in haemoglobin concentration at high altitude. Re-sequencing the region around EPAS1 in 40 Tibetan and 40 Han individuals, we find that this gene has a highly unusual haplotype structure that can only be convincingly explained by introgression of DNA from Denisovan or Denisovan-related individuals into humans. Scanning a larger set of worldwide populations, we find that the selected haplotype is only found in Denisovans and in Tibetans, and at very low frequency among Han Chinese. Furthermore, the length of the haplotype, and the fact that it is not found in any other populations, makes it unlikely that the haplotype sharing between Tibetans and Denisovans was caused by incomplete ancestral lineage sorting rather than introgression. Our findings illustrate that admixture with other hominin species has provided genetic variation that helped humans to adapt to new environments.
851 citations
••
TL;DR: The conclusion is that SOAPdenovo-Trans provides higher contiguity, lower redundancy and faster execution, compared with two other popular transcriptome assemblers.
Abstract: Motivation: Transcriptome sequencing has long been the favored method for quickly and inexpensively obtaining a large number of gene sequences from an organism with no reference genome. Owing to the rapid increase in throughputs and decrease in costs of next-generation sequencing, RNA-Seq in particular has become the method of choice. However, the very short reads (e.g. 2 × 90 bp paired ends) from next-generation sequencing make de novo assembly to recover complete or full-length transcript sequences an algorithmic challenge. Results: Here, we present SOAPdenovo-Trans, a de novo transcriptome assembler designed specifically for RNA-Seq. We evaluated its performance on transcriptome datasets from rice and mouse. Using as our benchmarks the known transcripts from these well-annotated genomes (sequenced a decade ago), we assessed how SOAPdenovo-Trans and two other popular transcriptome assemblers handled such practical issues as alternative splicing and variable expression levels. Our conclusion is that SOAPdenovo-Trans provides higher contiguity, lower redundancy and faster execution. Availability and implementation: Source code and user manual are available at http://sourceforge.net/projects/soapdenovotrans/. Contact: xieyl@genomics.cn or bgi-soap@googlegroups.com Supplementary information: Supplementary data are available at Bioinformatics online.
730 citations
••
University of Groningen1, Columbia University2, University of Washington3, Leiden University4, University of Amsterdam5, Erasmus University Rotterdam6, Max Planck Society7, Utrecht University8, Centrum Wiskunde & Informatica9, Radboud University Nijmegen10, Massachusetts Institute of Technology11, Harvard University12, Pfizer13, Beijing Institute of Genomics14, University of Copenhagen15
TL;DR: The Genome of the Netherlands (GoNL) Project is described, in which the whole genomes of 250 Dutch parent-offspring families were sequenced and a haplotype map of 20.4 million single-nucleotide variants and 1.2 million insertions and deletions was constructed.
Abstract: Whole-genome sequencing enables complete characterization of genetic variation, but geographic clustering of rare alleles demands many diverse populations be studied. Here we describe the Genome of the Netherlands (GoNL) Project, in which we sequenced the whole genomes of 250 Dutch parent-offspring families and constructed a haplotype map of 20.4 million single-nucleotide variants and 1.2 million insertions and deletions. The intermediate coverage (∼13×) and trio design enabled extensive characterization of structural variation, including midsize events (30-500 bp) previously poorly catalogued and de novo mutations. We demonstrate that the quality of the haplotypes boosts imputation accuracy in independent samples, especially for lower frequency alleles. Population genetic analyses demonstrate fine-scale structure across the country and support multiple ancient migrations, consistent with historical changes in sea level and flooding. The GoNL Project illustrates how single-population whole-genome sequencing can provide detailed characterization of genetic variation and may guide the design of future population studies.
677 citations
••
TL;DR: It is found that the dosage compensation effect of tandem duplication genes probably contributed to the pungent diversification in pepper, and the Capsicum reference genome provides crucial information for the study of not only the evolution of the pepper genome but also the Solanaceae family.
Abstract: As an economic crop, pepper satisfies people’s spicy taste and has medicinal uses worldwide. To gain a better understanding of Capsicum evolution, domestication, and specialization, we present here the genome sequence of the cultivated pepper Zunla-1 (C. annuum L.) and its wild progenitor Chiltepin (C. annuum var. glabriusculum). We estimate that the pepper genome expanded ∼0.3 Mya (with respect to the genome of other Solanaceae) by a rapid amplification of retrotransposon elements, resulting in a genome comprised of ∼81% repetitive sequences. Approximately 79% of 3.48-Gb scaffolds containing 34,476 protein-coding genes were anchored to chromosomes by a high-density genetic map. Comparison of cultivated and wild pepper genomes with 20 resequencing accessions revealed molecular footprints of artificial selection, providing us with a list of candidate domestication genes. We also found that the dosage compensation effect of tandem duplication genes probably contributed to the pungent diversification in pepper. The Capsicum reference genome provides crucial information for the study of not only the evolution of the pepper genome but also the Solanaceae family, and it will facilitate the establishment of more effective pepper breeding programs.
593 citations
••
TL;DR: A draft 6.5 Gb genome sequence of Locusta migratoria is presented, which is the largest animal genome sequenced so far, and complex regulatory mechanisms involved in microtubule dynamic-mediated synapse plasticity during phase change are revealed.
Abstract: Locusts are one of the world's most destructive agricultural pests and represent a useful model system in entomology. Here we present a draft 6.5 Gb genome sequence of Locusta migratoria, which is the largest animal genome sequenced so far. Our findings indicate that the large genome size of L. migratoria is likely to be because of transposable element proliferation combined with slow rates of loss for these elements. Methylome and transcriptome analyses reveal complex regulatory mechanisms involved in microtubule dynamic-mediated synapse plasticity during phase change. We find significant expansion of gene families associated with energy consumption and detoxification, consistent with long-distance flight capacity and phytophagy. We report hundreds of potential insecticide target genes, including cys-loop ligand-gated ion channels, G-protein-coupled receptors and lethal genes. The L. migratoria genome sequence offers new insights into the biology and sustainable management of this pest species, and will promote its wide use as a model system.
431 citations
••
TL;DR: The Genome of the Netherlands (GoNL), one of the projects within BBMRI-NL, is described, a whole-genome-sequencing project in a representative sample consisting of 250 trio-families from all provinces in the Netherlands, which aims to characterize DNA sequence variation in the Dutch population.
Abstract: Within the Netherlands a national network of biobanks has been established (Biobanking and Biomolecular Research Infrastructure-Netherlands (BBMRI-NL)) as a national node of the European BBMRI. One of the aims of BBMRI-NL is to enrich biobanks with different types of molecular and phenotype data. Here, we describe the Genome of the Netherlands (GoNL), one of the projects within BBMRI-NL. GoNL is a whole-genome-sequencing project in a representative sample consisting of 250 trio-families from all provinces in the Netherlands, which aims to characterize DNA sequence variation in the Dutch population. The parent-offspring trios include adult individuals ranging in age from 19 to 87 years (mean=53 years; SD=16 years) from birth cohorts 1910-1994. Sequencing was done on blood-derived DNA from uncultured cells and accomplished coverage was 14-15x. The family-based design represents a unique resource to assess the frequency of regional variants, accurately reconstruct haplotypes by family-based phasing, characterize short indels and complex structural variants, and establish the rate of de novo mutational events. GoNL will also serve as a reference panel for imputation in the available genome-wide association studies in Dutch and other cohorts to refine association signals and uncover population-specific variants. GoNL will create a catalog of human genetic variation in this sample that is uniquely characterized with respect to micro-geographic location and a wide range of phenotypes. The resource will be made available to the research and medical community to guide the interpretation of sequencing projects. The present paper summarizes the global characteristics of the project.
267 citations
••
TL;DR: The sesame genome will facilitate future research on the evolution of eudicots, as well as the study of lipid biosynthesis and potential genetic improvement of sesame, an important species from the order Lamiales and a high oil crop.
Abstract: Background: Sesame, Sesamum indicum L., is considered the queen of oilseeds for its high oil content and quality, and is grown widely in tropical and subtropical areas as an important source of oil and protein. However, the molecular biology of sesame is largely unexplored. Results: Here, we report a high-quality genome sequence of sesame assembled de novo with a contig N50 of 52.2 kb and a scaffold N50 of 2.1 Mb, containing an estimated 27,148 genes. The results reveal novel, independent whole genome duplication and the absence of the Toll/interleukin-1 receptor domain in resistance genes. Candidate genes and oil biosynthetic pathways contributing to high oil content were discovered by comparative genomic and transcriptomic analyses. These revealed the expansion of type 1 lipid transfer genes by tandem duplication, the contraction of lipid degradation genes, and the differential expression of essential genes in the triacylglycerol biosynthesis pathway, particularly in the early stage of seed development. Resequencing data in 29 sesame accessions from 12 countries suggested that the high genetic diversity of lipid-related genes might be associated with the wide variation in oil content. Additionally, the results shed light on the pivotal stage of seed development, oil accumulation and potential key genes for sesamin production, an important pharmacological constituent of sesame. Conclusions: As an important species from the order Lamiales and a high oil crop, the sesame genome will facilitate future research on the evolution of eudicots, as well as the study of lipid biosynthesis and potential genetic improvement of sesame.
225 citations
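The sesame abstract above reports assembly contiguity as a contig N50 of 52.2 kb and a scaffold N50 of 2.1 Mb. As a reading aid, here is a minimal sketch of how N50 is conventionally computed: the length of the sequence at which the running total, taken from longest to shortest, first reaches half of the total assembly length. The function name `n50` and the toy lengths are hypothetical illustrations, not data or code from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the summed length of all sequences.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)   # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly (arbitrary units), not from the sesame genome.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # total is 300; 80 + 70 = 150 reaches half, so 70
```

A higher N50 means more of the assembly sits in long sequences, which is why the scaffold N50 is much larger than the contig N50 once contigs are linked.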
••
TL;DR: Single-variant and gene-based association analyses of nonsynonymous SNVs did not identify newly associated genes for psoriasis in the regions subjected to targeted resequencing, which suggests that coding variants in the 1,326 targeted genes contribute only a limited fraction of the overall genetic risk for psoriasis.
Abstract: To explore the contribution of functional coding variants to psoriasis, we analyzed nonsynonymous single-nucleotide variants (SNVs) across the genome by exome sequencing in 781 psoriasis cases and 676 controls and through follow-up validation in 1,326 candidate genes by targeted sequencing in 9,946 psoriasis cases and 9,906 controls from the Chinese population. We discovered two independent missense SNVs in IL23R and GJB2 of low frequency and five common missense SNVs in LCE3D, ERAP1, CARD14 and ZNF816A associated with psoriasis at genome-wide significance. Rare missense SNVs in FUT2 and TARBP1 were also observed with suggestive evidence of association. Single-variant and gene-based association analyses of nonsynonymous SNVs did not identify newly associated genes for psoriasis in the regions subjected to targeted resequencing. This suggests that coding variants in the 1,326 targeted genes contribute only a limited fraction of the overall genetic risk for psoriasis.
191 citations
••
TL;DR: In this article, the authors performed whole-exome sequencing of 49 blood-tumor pairs and RNA sequencing of 44 tumors from cortisol-producing adenomas (ACAs), adrenocorticotropic hormone-independent macronodular hyperplasias (AIMAHs), and adrenocortical oncocytomas (ADOs) and identified a hotspot in the PRKACA gene with a L205R mutation in 69.2% (27 out of 39) of ACAs and validated in 65.5% of a total of 87
Abstract: Adrenal Cushing's syndrome is caused by excess production of glucocorticoid from adrenocortical tumors and hyperplasias, which leads to metabolic disorders. We performed whole-exome sequencing of 49 blood-tumor pairs and RNA sequencing of 44 tumors from cortisol-producing adrenocortical adenomas (ACAs), adrenocorticotropic hormone-independent macronodular adrenocortical hyperplasias (AIMAHs), and adrenocortical oncocytomas (ADOs). We identified a hotspot in the PRKACA gene with a L205R mutation in 69.2% (27 out of 39) of ACAs and validated in 65.5% of a total of 87 ACAs. Our data revealed that the activating L205R mutation, which locates in the P+1 loop of the protein kinase A (PKA) catalytic subunit, promoted PKA substrate phosphorylation and target gene expression. Moreover, we discovered the recurrently mutated gene DOT1L in AIMAHs and CLASP2 in ADOs. Collectively, these data highlight potentially functional mutated genes in adrenal Cushing's syndrome.
••
TL;DR: With careful monitoring via whole-genome sequencing it is possible to apply genome editing to human pluripotent cells with minimal impact on genomic mutational load, and a TALEN-HDAdV hybrid vector is developed, which significantly increased gene-correction efficiency in hiPSCs.
••
TL;DR: This catalogue of SNPs and indels amongst South Asians provides the first comprehensive map of genetic variation in this major human population, and reveals evidence for selective pressures on genes involved in skin biology, metabolism, infection and immunity.
Abstract: The genetic sequence variation of people from the Indian subcontinent, who comprise one-quarter of the world's population, is not well described. We carried out whole genome sequencing of 168 South Asians, along with whole-exome sequencing of 147 South Asians to provide deeper characterisation of coding regions. We identify 12,962,155 autosomal sequence variants, including 2,946,861 new SNPs and 312,738 novel indels. This catalogue of SNPs and indels amongst South Asians provides the first comprehensive map of genetic variation in this major human population, and reveals evidence for selective pressures on genes involved in skin biology, metabolism, infection and immunity. Our results will accelerate the search for the genetic variants underlying susceptibility to disorders such as type-2 diabetes and cardiovascular disease, which are highly prevalent amongst South Asians.
••
TL;DR: This study provides the first exome-wide evidence at single-cell level supporting that colon cancer could be of a biclonal origin, and suggests that low-prevalence mutations in a cohort may also play important protumorigenic roles at the individual level.
Abstract: Single-cell sequencing is a powerful tool for delineating clonal relationships and identifying key driver genes for personalized cancer management. Here we performed single-cell sequencing analysis of a case of colon cancer. Population genetics analyses identified two independent clones in the tumor cell population. The major tumor clone harbored APC and TP53 mutations as early oncogenic events, whereas the minor clone contained preponderant CDC27 and PABPC1 mutations. The absence of APC and TP53 mutations in the minor clone supports that these two clones were derived from two cellular origins. Examination of somatic mutation allele frequency spectra of an additional 21 whole-tissue exome-sequenced cases revealed the heterogeneity of clonal origins in colon cancer. Next, we identified a mutated gene SLC12A5 that showed a high frequency of mutation at the single-cell level but exhibited low prevalence at the population level. Functional characterization of mutant SLC12A5 revealed its potential oncogenic effect in colon cancer. Our study provides the first exome-wide evidence at single-cell level supporting that colon cancer could be of a biclonal origin, and suggests that low-prevalence mutations in a cohort may also play important protumorigenic roles at the individual level.
••
TL;DR: Deep-sequence 42 HCC patients with a combination of whole genome, exome and transcriptome sequencing identify the mutational landscape of HCC and find frequent mutations in TP53, CTNNB1 and AXIN1, and rare but likely functional mutations in BAP1 and IDH1.
Abstract: Background
Hepatocellular carcinoma (HCC) is a heterogeneous disease with high mortality rate. Recent genomic studies have identified TP53, AXIN1, and CTNNB1 as the most frequently mutated genes. Lower frequency mutations have been reported in ARID1A, ARID2 and JAK1. In addition, hepatitis B virus (HBV) integrations into the human genome have been associated with HCC.
••
TL;DR: This study is the first to identify frequent BAP1 and BRCA pathway alterations in bladder cancer, show TERT promoter alterations are independent of other bladder cancer gene alterations, and show KDM6A loss is a driver of the bladder cancer phenotype.
Abstract: Purpose: Genetic analysis of bladder cancer has revealed a number of frequently altered genes, including frequent alterations of the telomerase (TERT) gene promoter, although few altered genes have been functionally evaluated. Our objective is to characterize alterations observed by exome sequencing and sequencing of the TERT promoter, and to examine the functional relevance of histone lysine (K)–specific demethylase 6A (KDM6A/UTX), a frequently mutated histone demethylase, in bladder cancer. Experimental Design: We analyzed bladder cancer samples from 54 U.S. patients by exome and targeted sequencing and confirmed somatic variants using normal tissue from the same patient. We examined the biologic function of KDM6A using in vivo and in vitro assays. Results: We observed frequent somatic alterations in BRCA1 associated protein-1 (BAP1) in 15% of tumors, including deleterious alterations to the deubiquitinase active site and the nuclear localization signal. BAP1 mutations contribute to a high frequency of tumors with breast cancer (BRCA) DNA repair pathway alterations and were significantly associated with papillary histologic features in tumors. BAP1 and KDM6A mutations significantly co-occurred in tumors. Somatic variants altering the TERT promoter were found in 69% of tumors but were not correlated with alterations in other bladder cancer genes. We examined the function of KDM6A, altered in 24% of tumors, and show depletion in human bladder cancer cells, enhanced in vitro proliferation, in vivo tumor growth, and cell migration. Conclusions: This study is the first to identify frequent BAP1 and BRCA pathway alterations in bladder cancer, show TERT promoter alterations are independent of other bladder cancer gene alterations, and show KDM6A loss is a driver of the bladder cancer phenotype. Clin Cancer Res; 20(18); 4935–48. ©2014 AACR.
••
TL;DR: Across several quality metrics, these budgerigar assemblies are comparable to or better than the chicken and zebra finch genome assemblies built from traditional Sanger sequencing reads, and are sufficient to analyze regions that are difficult to sequence and assemble.
Abstract: Background: Parrots belong to a group of behaviorally advanced vertebrates and have an advanced ability of vocal learning relative to other vocal-learning birds. They can imitate human speech, synchronize their body movements to a rhythmic beat, and understand complex concepts of referential meaning to sounds. However, little is known about the genetics of these traits. Elucidating the genetic bases would require whole genome sequencing and a robust assembly of a parrot genome. Findings: We present a genomic resource for the budgerigar, an Australian Parakeet (Melopsittacus undulatus) – the most widely studied parrot species in neuroscience and behavior. We present genomic sequence data that includes over 300× raw read coverage from multiple sequencing technologies and chromosome optical maps from a single male animal. The reads and optical maps were used to create three hybrid assemblies representing some of the largest genomic scaffolds to date for a bird; two of which were annotated based on similarities to reference sets of non-redundant human, zebra finch and chicken proteins, and budgerigar transcriptome sequence assemblies. The sequence reads for this project were in part generated and used for both the Assemblathon 2 competition and the first de novo assembly of a giga-scale vertebrate genome utilizing PacBio single-molecule sequencing. Conclusions: Across several quality metrics, these budgerigar assemblies are comparable to or better than the chicken and zebra finch genome assemblies built from traditional Sanger sequencing reads, and are sufficient to analyze regions that are difficult to sequence and assemble, including those not yet assembled in prior bird genomes, and promoter regions of genes differentially regulated in vocal learning brain regions. This work provides valuable data and material for genome technology development and for investigating the genomics of complex behavioral traits.
••
TL;DR: The first whole genome resequencing-based analysis identifying genes that likely modulate high altitude adaptation in native Ethiopians residing at 3,500 m above sea level on Bale Plateau or Chennek field in Ethiopia highlights the importance of whole genome sequencing for investigating adaptation by natural selection.
Abstract: Although it has long been proposed that genetic factors contribute to adaptation to high altitude, such factors remain largely unverified. Recent advances in high-throughput sequencing have made it feasible to analyze genome-wide patterns of genetic variation in human populations. Since traditionally such studies surveyed only a small fraction of the genome, interpretation of the results was limited. We report here the results of the first whole genome resequencing-based analysis identifying genes that likely modulate high altitude adaptation in native Ethiopians residing at 3,500 m above sea level on Bale Plateau or Chennek field in Ethiopia. Using cross-population tests of selection, we identify regions with a significant loss of diversity, indicative of a selective sweep. We focus on a 208 kbp gene-rich region on chromosome 19, which is significant in both of the Ethiopian subpopulations sampled. This region contains eight protein-coding genes and spans 135 SNPs. To elucidate its potential role in hypoxia tolerance, we experimentally tested whether individual genes from the region affect hypoxia tolerance in Drosophila. Three genes significantly impact survival rates in low oxygen: cic, an ortholog of human CIC, Hsl, an ortholog of human LIPE, and Paf-AHα, an ortholog of human PAFAH1B3. Our study reveals evolutionarily conserved genes that modulate hypoxia tolerance. In addition, we show that many of our results would likely be unattainable using data from exome sequencing or microarray studies. This highlights the importance of whole genome sequencing for investigating adaptation by natural selection.
••
TL;DR: The results of this study increase the number of confirmed psoriasis risk loci and provide novel insight into the pathogenesis of psoriasis.
Abstract: In a previous large-scale exome sequencing analysis for psoriasis, we discovered seven common and low-frequency missense variants within six genes with genome-wide significance. Here we describe an in-depth analysis of noncoding variants based on sequencing data (10,727 cases and 10,582 controls) with replication in an independent cohort of Han Chinese individuals consisting of 4,480 cases and 6,521 controls to identify additional psoriasis susceptibility loci. We confirmed four known psoriasis susceptibility loci (IL12B, IFIH1, ERAP1 and RNF114; 2.30 × 10^-20 ≤ P ≤ 2.41 × 10^-7) and identified three new susceptibility loci: 4q24 (NFKB1) at rs1020760 (P = 2.19 × 10^-8), 12p13.3 (CD27-LAG3) at rs758739 (P = 4.08 × 10^-8) and 17q12 (IKZF3) at rs10852936 (P = 1.96 × 10^-8). Two suggestive loci, 3p21.31 and 17q25, are also identified with P < 1.00 × 10^-6. The results of this study increase the number of confirmed psoriasis risk loci and provide novel insight into the pathogenesis of psoriasis.
01 Jan 2014
TL;DR: A phylogenetic analysis of protein-coding genes from all major insect orders and close relatives was performed by Misof et al. as discussed by the authors, who used this resolved phylogenetic tree together with fossil analysis to date the origin of insects to ~479 million years ago and to resolve long-controversial subjects in insect phylogeny.
Abstract: Toward an insect evolution resolution. Insects are the most diverse group of animals, with the largest number of species. However, many of the evolutionary relationships between insect species have been controversial and difficult to resolve. Misof et al. performed a phylogenomic analysis of protein-coding genes from all major insect orders and close relatives, resolving the placement of taxa. The authors used this resolved phylogenetic tree together with fossil analysis to date the origin of insects to ~479 million years ago and to resolve long-controversial subjects in insect phylogeny. Science, this issue p. 763. The phylogeny of all major insect lineages reveals how and when insects diversified. Insects are the most speciose group of animals, but the phylogenetic relationships of many major lineages remain unresolved. We inferred the phylogeny of insects from 1478 protein-coding genes. Phylogenomic analyses of nucleotide and amino acid sequences, with site-specific nucleotide or domain-specific amino acid substitution models, produced statistically robust and congruent results resolving previously controversial phylogenetic relationships. We dated the origin of insects to the Early Ordovician [~479 million years ago (Ma)], of insect flight to the Early Devonian (~406 Ma), of major extant lineages to the Mississippian (~345 Ma), and the major diversification of holometabolous insects to the Early Cretaceous. Our phylogenomic study provides a comprehensive reliable scaffold for future comparative analyses of evolutionary innovations among insects.
••
TL;DR: Preservations in genomic profiles from liver primary tumors to metachronous lung metastases indicate that the genomic features during tumorigenesis may be retained during metastasis, which may explain the clinical observation that both primary and metastatic tumors are usually sensitive or resistant to the same systemic treatments.
Abstract: To gain biological insights into lung metastases from hepatocellular carcinoma (HCC), we compared the whole-genome sequencing profiles of primary HCC and paired lung metastases. We used whole-genome sequencing at 33X-43X coverage to profile somatic mutations in primary HCC (HBV+) and metachronous lung metastases (interval > 2 years). In total, 5,027-13,961 and 5,275-12,624 somatic single-nucleotide variants (SNVs) were detected in primary HCC and lung metastases, respectively. Generally, 38.88-78.49% of SNVs detected in metastases were present in primary tumors. We identified 65–221 structural variations (SVs) in primary tumors and 60–232 SVs in metastases. Comparison of these SVs shows very similar and largely overlapping mutated segments between primary and metastatic tumors. Copy number alterations between primary and metastatic pairs were also found to be closely related. Together, these preservations in genomic profiles from liver primary tumors to metachronous lung metastases indicate that the genomic features during tumorigenesis may be retained during metastasis. We found very similar genomic alterations between primary and metastatic tumors, with a few mutations found specifically in lung metastases, which may explain the clinical observation that both primary and metastatic tumors are usually sensitive or resistant to the same systemic treatments.
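The shared-SNV percentage quoted in this abstract is a set ratio: the SNVs found in a metastasis that are also found in the paired primary tumor, divided by all SNVs found in the metastasis. A minimal sketch of that arithmetic, assuming each SNV is keyed by a hypothetical (chromosome, position, ref, alt) tuple; the function name and the toy variants are illustrative and do not come from the paper:

```python
from typing import Iterable, Tuple

# Hypothetical SNV key: (chromosome, 1-based position, reference base, alternate base)
Snv = Tuple[str, int, str, str]


def shared_fraction(metastasis_snvs: Iterable[Snv], primary_snvs: Iterable[Snv]) -> float:
    """Fraction of metastasis SNVs that are also present in the primary tumor."""
    metastasis = set(metastasis_snvs)
    if not metastasis:
        return 0.0  # no metastasis SNVs, so nothing can be shared
    return len(metastasis & set(primary_snvs)) / len(metastasis)


# Toy data only, not real calls.
primary = [("chr1", 100, "A", "G"), ("chr2", 200, "C", "T"), ("chr3", 300, "G", "A")]
metastasis = [("chr2", 200, "C", "T"), ("chr3", 300, "G", "A"),
              ("chr4", 400, "T", "C"), ("chr5", 500, "A", "T")]
print(f"{shared_fraction(metastasis, primary):.2%}")
```

The denominator is the metastasis call set, matching the abstract's wording ("of SNVs detected in metastases"); swapping the arguments would answer a different question.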
••
TL;DR: It is shown that exome capture of saliva-derived DNA yields sufficient non-human sequences to characterize oral microbial communities, including detection of bacteria linked to oral disease (e.g. Prevotella melaninogenica).
Abstract: Targeted capture of genomic regions reduces sequencing cost while generating higher coverage by allowing biomedical researchers to focus on specific loci of interest, such as exons. Targeted capture also has the potential to facilitate the generation of genomic data from DNA collected via saliva or buccal cells. DNA samples derived from these cell types tend to have a lower human DNA yield, may be degraded from age and/or have contamination from bacteria or other ambient oral microbiota. However, thousands of samples have been previously collected from these cell types, and saliva collection has the advantage that it is non-invasive and appropriate for a wide variety of research. We demonstrate successful enrichment and sequencing of 15 South African KhoeSan exomes and 2 full genomes with samples initially derived from saliva. The expanded exome dataset enables us to characterize genetic diversity free from ascertainment bias for multiple KhoeSan populations, including new exome data from six HGDP Namibian San, revealing substantial population structure across the Kalahari Desert region. Additionally, we discover and independently verify thirty-one previously unknown KIR alleles using methods we developed to accurately map and call the highly polymorphic HLA and KIR loci from exome capture data. Finally, we show that exome capture of saliva-derived DNA yields sufficient non-human sequences to characterize oral microbial communities, including detection of bacteria linked to oral disease (e.g. Prevotella melaninogenica). For comparison, two samples were sequenced using standard full genome library preparation without exome capture and we found no systematic bias of metagenomic information between exome-captured and non-captured data. DNA from human saliva samples, collected and extracted using standard procedures, can be used to successfully sequence high quality human exomes, and metagenomic data can be derived from non-human reads.
We find that individuals from the Kalahari carry a higher oral pathogenic microbial load than samples surveyed in the Human Microbiome Project. Additionally, rare variants present in the exomes suggest strong population structure across different KhoeSan populations.
••
TL;DR: Full mtDNA sequences are mined from an exome capture data set of 2000 Danes, showing that it is possible to get high-quality full-genome sequences of the mitochondrion from this resource and characterising the variation found in the mtDNA sequence in Danes.
Abstract: In this paper, we mine full mtDNA sequences from an exome capture data set of 2000 Danes, showing that it is possible to get high-quality full-genome sequences of the mitochondrion from this resource. The sample includes 1000 individuals with type 2 diabetes and 1000 controls. We characterise the variation found in the mtDNA sequence in Danes and relate the variation to diabetes risk as well as to several blood phenotypes of the controls but find no significant associations. We report 2025 polymorphisms, of which 393 have not been reported previously. These 393 mutations are both very rare and estimated to be caused by very recent mutations but individuals with type 2 diabetes do not possess more of these variants. Population genetics analysis using Bayesian skyline plot shows a recent history of rapid population growth in the Danish population in accordance with the fact that >40% of variable sites are observed as singletons.
••
TL;DR: The HLA-DRB1 and CD2AP genes were identified as susceptibility genes of KBD, supporting a role for the autoimmune response in KBD and the possibility of shared etiology between osteoarthritis, rheumatoid arthritis, and KBD.
Abstract: Objective
To identify and investigate the susceptibility genes of Kashin–Beck disease (KBD) in the Chinese population.
Methods
Whole-exome capture and sequencing was used to detect genetic variations in 19 individuals from six families with a high incidence of KBD. A total of 44 polymorphisms from 41 genes were genotyped in 144 cases and 144 controls using MassARRAY under the standard Sequenom protocol. Association analysis was performed on the data using PLINK 1.07.
Results
In the sequencing stage, each sample showed approximately 70-fold coverage, thus covering more than 99% of the target regions. Among the single nucleotide polymorphisms (SNPs) used in the transmission disequilibrium test, 108 had a p-value of <0.01, whereas 1056 had a p-value of <0.05. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis indicates that these SNPs are concentrated in three major pathways: regulation of actin cytoskeleton, focal adhesion, and metabolic pathways. In the validation stage, single-locus analysis revealed that two of these polymorphisms (rs7745040 and rs9275295) in the human leukocyte antigen (HLA)-DRB1 gene and one polymorphism (rs9473132) in the CD2-associated protein (CD2AP) gene have a statistically significant association with KBD.
Conclusions
The HLA-DRB1 and CD2AP genes were identified as susceptibility genes of KBD, supporting a role for the autoimmune response in KBD and the possibility of shared etiology between osteoarthritis, rheumatoid arthritis, and KBD.
••
TL;DR: The analytical strategy developed here will be of great help in fighting against the outbreaks of emerging infectious diseases, by pinpointing the source of pathogens rapidly with genomic epidemiological data and microbial forensics information.
Abstract: Source tracing of pathogens is critical for the control and prevention of infectious diseases. Genome sequencing by high-throughput technologies is now feasible and popular, leading to a rapid increase in the number of sequenced bacterial genomes. Using this flood of genomic data for source tracing of pathogens in outbreaks is promising but also challenging. Here, we used Yersinia pestis genomes from a plague outbreak in Xinghai county, China, in 2009 as an example to develop a simple two-step strategy for rapid source tracing of the outbreak. The first step was to define the phylogenetic position of the outbreak strains in a whole-species tree, and the second step was to resolve the detailed relationships among the outbreak strains and their suspected relatives. Through this strategy, we observed that the Xinghai plague outbreak was caused by Y. pestis that circulated in the local plague focus, from which the majority of historical plague epidemics in the Qinghai-Tibet Plateau may have originated. The analytical strategy developed here will be of great help in fighting outbreaks of emerging infectious diseases, by pinpointing the source of pathogens rapidly with genomic epidemiological data and microbial forensics information.
••
TL;DR: This correction restates that the ground tit phylogeny was confirmed as not belonging to the Corvidae family, and corrects the classification of the zebra finch from the Paridae family to the Estrildidae family.
Abstract:
1. Fumin Lei is no longer listed as an author of this article. Instead, his helpful input is noted in the acknowledgements section.
2. The provisional version of this article mistakenly stated that zebra finch belongs to the Paridae family. We have now corrected this error to reflect the classification of this species to the Estrildidae family.
3. In the abstract of the provisional version of the article we stated that the phylogeny of the ground tit was confirmed as belonging to the Paridae family. We have now re-phrased this sentence to say that ground tit phylogeny was confirmed as not belonging to the Corvidae family.
4. In the conclusions of the provisional version of the article we stated that the phylogeny of the ground tit was confirmed as not belonging to the Corvidae family but to the Paridae family. We have now re-phrased this conclusion to say that ground tit phylogeny was confirmed as not belonging to the Corvidae family.
•
27 Nov 2014
TL;DR: A method of gap closing in a nucleotide sequence is proposed: reads that overlap the end of the first contig close to the gap are selected as the set of reads for gap closing, the read with the shortest overlap with the first contig is taken as the candidate read, and the candidate is checked for extension conflicts (reads in the set with a shorter overlap, or reads with no overlapping relationship with the candidate) before it is used to extend the first contig toward the second contig.
Abstract: Provided is a method of gap closing in a nucleotide sequence. The nucleic acid sequence comprises a first contig at one end of a gap in an unassembled region and a second contig at the other end of the gap. The method comprises: selecting reads that overlap the end of the first contig close to the gap as a set of reads for gap closing; selecting the read in that set with the shortest overlap with the first contig as a candidate read; determining whether the set contains reads whose overlap with the first contig is shorter than the overlap between the candidate read and the first contig, and whether the set contains reads that have no overlapping relationship with the candidate read; reporting an extension conflict and marking the candidate read as unconfident if either or both kinds of reads are present; reselecting the candidate read until a confident candidate read is obtained, if the candidate read is unconfident; connecting the confident candidate read to the first contig to form a new first contig; determining whether the end of the new first contig close to the gap overlaps the end of the second contig close to the gap; if it does not, repeating the step of selecting the set of reads for gap closing with the first contig replaced by the new first contig; and if it does, connecting the new first contig to the second contig to complete gap closing.
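The loop described in this abstract can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the patented implementation: overlaps are exact suffix/prefix matches, a read has "no overlapping relationship" with the candidate when their extensions past the contig end disagree, and a rejected candidate is discarded before reselection (which makes the "shorter overlap" check implicit). The names `close_gap`, `suffix_prefix_overlap`, `extensions_agree`, `min_overlap` and `max_rounds` are all hypothetical.

```python
from typing import List, Optional


def suffix_prefix_overlap(left: str, right: str, min_len: int) -> int:
    """Length of the longest suffix of `left` equal to a prefix of `right` (0 if below min_len)."""
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def extensions_agree(a: str, b: str) -> bool:
    """Assumed consistency test: two extensions agree over their common length."""
    n = min(len(a), len(b))
    return a[:n] == b[:n]


def close_gap(first: str, second: str, reads: List[str],
              min_overlap: int = 3, max_rounds: int = 1000) -> Optional[str]:
    """Extend `first` read by read until it overlaps `second`; return None on failure."""
    contig = first
    for _ in range(max_rounds):
        # Set of reads for gap closing: reads overlapping the contig end that also extend it.
        pool = []
        for read in reads:
            k = suffix_prefix_overlap(contig, read, min_overlap)
            if 0 < k < len(read):
                pool.append((k, read))
        pool.sort()  # shortest overlap first

        extension = None
        while pool:
            k, candidate = pool[0]
            cand_ext = candidate[k:]
            # Extension conflict: some other read in the set disagrees with the candidate.
            if all(extensions_agree(cand_ext, r[o:]) for o, r in pool[1:]):
                extension = cand_ext  # confident candidate
                break
            pool.pop(0)  # assumption: discard the unconfident candidate, then reselect
        if extension is None:
            return None  # no confident candidate, gap stays open

        contig += extension  # new first contig
        k = suffix_prefix_overlap(contig, second, min_overlap)
        if k:
            return contig + second[k:]  # contigs now overlap: gap closed
    return None


# Toy example (made-up sequence): three reads tile a 9-base gap.
target = "ATGGCGTACCTGAAGTCCGATTAGC"
reads = ["CGTACCTG", "CTGAAGTC", "AGTCCGAT"]
print(close_gap(target[:8], target[17:], reads))
```

Taking the shortest overlap first means each accepted read adds the longest possible extension for its length, which is why the conflict check matters: a single erroneous read with a short overlap would otherwise steer the whole extension.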