Showing papers by "Cold Spring Harbor Laboratory" published in 2016


Journal ArticleDOI
TL;DR: These and other strategies are providing researchers and clinicians a variety of tools to probe genomes in greater depth, leading to an enhanced understanding of how genome sequence variants underlie phenotype and disease.
Abstract: Since the completion of the human genome project in 2003, extraordinary progress has been made in genome sequencing technologies, which has led to a decreased cost per megabase and an increase in the number and diversity of sequenced genomes. An astonishing complexity of genome architecture has been revealed, bringing these sequencing technologies to even greater advancements. Some approaches maximize the number of bases sequenced in the least amount of time, generating a wealth of data that can be used to understand increasingly complex phenotypes. Alternatively, other approaches now aim to sequence longer contiguous pieces of DNA, which are essential for resolving structurally complex regions. These and other strategies are providing researchers and clinicians a variety of tools to probe genomes in greater depth, leading to an enhanced understanding of how genome sequence variants underlie phenotype and disease.

3,096 citations


Journal ArticleDOI
TL;DR: The open-source FALCON and FALCON-Unzip algorithms are introduced to assemble long-read sequencing data into highly accurate, contiguous, and correctly phased diploid genomes.
Abstract: While genome assembly projects have been successful in many haploid and inbred species, the assembly of noninbred or rearranged heterozygous genomes remains a major challenge. To address this challenge, we introduce the open-source FALCON and FALCON-Unzip algorithms (https://github.com/PacificBiosciences/FALCON/) to assemble long-read sequencing data into highly accurate, contiguous, and correctly phased diploid genomes. We generate new reference sequences for heterozygous samples including an F1 hybrid of Arabidopsis thaliana, the widely cultivated Vitis vinifera cv. Cabernet Sauvignon, and the coral fungus Clavicorona pyxidata, samples that have challenged short-read assembly approaches. The FALCON-based assemblies are substantially more contiguous and complete than alternate short- or long-read approaches. The phased diploid assembly enabled the study of haplotype structure and heterozygosities between homologous chromosomes, including the identification of widespread heterozygous structural variation within coding sequences.

1,490 citations


Journal ArticleDOI
TL;DR: The data suggest that induction of NETs by cancer cells is a previously unidentified metastasis-promoting tumor-host interaction and a potential therapeutic target, and treatment with NET-digesting, DNase I–coated nanoparticles markedly reduced lung metastases in mice.
Abstract: Neutrophils, the most abundant type of leukocytes in blood, can form neutrophil extracellular traps (NETs). These are pathogen-trapping structures generated by expulsion of the neutrophil's DNA with associated proteolytic enzymes. NETs produced by infection can promote cancer metastasis. We show that metastatic breast cancer cells can induce neutrophils to form metastasis-supporting NETs in the absence of infection. Using intravital imaging, we observed NET-like structures around metastatic 4T1 cancer cells that had reached the lungs of mice. We also found NETs in clinical samples of triple-negative human breast cancer. The formation of NETs stimulated the invasion and migration of breast cancer cells in vitro. Inhibiting NET formation or digesting NETs with deoxyribonuclease I (DNase I) blocked these processes. Treatment with NET-digesting, DNase I-coated nanoparticles markedly reduced lung metastases in mice. Our data suggest that induction of NETs by cancer cells is a previously unidentified metastasis-promoting tumor-host interaction and a potential therapeutic target.

568 citations


Journal ArticleDOI
TL;DR: This paper provides an update to the previous publications about Ensembl Genomes, with a focus on recent developments, including new analyses and views to represent polyploid genomes and the continued up-scaling of the resource.
Abstract: Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the context of the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of programmatic and interactive interfaces to a rich range of data including reference sequence, gene models, transcriptional data, genetic variation and comparative analysis. This paper provides an update to the previous publications about the resource, with a focus on recent developments. These include the development of new analyses and views to represent polyploid genomes (of which bread wheat is the primary exemplar); and the continued up-scaling of the resource, which now includes over 23 000 bacterial genomes, 400 fungal genomes and 100 protist genomes, in addition to 55 genomes from invertebrate metazoa and 39 genomes from plants. This dramatic increase in the number of included genomes is one part of a broader effort to automate the integration of archival data (genome sequence, but also associated RNA sequence data and variant calls) within the context of reference genomes and make it available through the Ensembl user interfaces.

512 citations


Journal ArticleDOI
TL;DR: The results show that characterization of the maize B73 transcriptome is far from complete, and that maize gene expression is more complex than previously thought.
Abstract: Zea mays is an important genetic model for elucidating transcriptional networks. Uncertainties about the complete structure of mRNA transcripts limit the progress of research in this system. Here, using single-molecule sequencing technology, we produce 111,151 transcripts from 6 tissues capturing ∼70% of the genes annotated in maize RefGen_v3 genome. A large proportion of transcripts (57%) represent novel, sometimes tissue-specific, isoforms of known genes and 3% correspond to novel gene loci. In other cases, the identified transcripts have improved existing gene models. Averaging across all six tissues, 90% of the splice junctions are supported by short reads from matched tissues. In addition, we identified a large number of novel long non-coding RNAs and fusion transcripts and found that DNA methylation plays an important role in generating various isoforms. Our results show that characterization of the maize B73 transcriptome is far from complete, and that maize gene expression is more complex than previously thought.

466 citations


Journal ArticleDOI
TL;DR: It is reported that genetic loss or systemic knockdown of Malat1 using antisense oligonucleotides (ASOs) in the MMTV-PyMT mouse mammary carcinoma model results in slower tumor growth accompanied by significant differentiation into cystic tumors and a reduction in metastasis.
Abstract: Genome-wide analyses have identified thousands of long noncoding RNAs (lncRNAs). Malat1 (metastasis-associated lung adenocarcinoma transcript 1) is among the most abundant lncRNAs whose expression is altered in numerous cancers. Here we report that genetic loss or systemic knockdown of Malat1 using antisense oligonucleotides (ASOs) in the MMTV (mouse mammary tumor virus)-PyMT mouse mammary carcinoma model results in slower tumor growth accompanied by significant differentiation into cystic tumors and a reduction in metastasis. Furthermore, Malat1 loss results in a reduction of branching morphogenesis in MMTV-PyMT- and Her2/neu-amplified tumor organoids, increased cell adhesion, and loss of migration. At the molecular level, Malat1 knockdown results in alterations in gene expression and changes in splicing patterns of genes involved in differentiation and protumorigenic signaling pathways. Together, these data demonstrate for the first time a functional role of Malat1 in regulating critical processes in mammary cancer pathogenesis. Thus, Malat1 represents an exciting therapeutic target, and Malat1 ASOs represent a potential therapy for inhibiting breast cancer progression.

457 citations


Journal ArticleDOI
12 Apr 2016-eLife
TL;DR: A new dimensionality reduction technique, demixed principal component analysis (dPCA), that decomposes population activity into a few components and exposes the dependence of the neural representation on task parameters such as stimuli, decisions, or rewards is demonstrated.
Abstract: Neurons in higher cortical areas, such as the prefrontal cortex, are often tuned to a variety of sensory and motor variables, and are therefore said to display mixed selectivity. This complexity of single neuron responses can obscure what information these areas represent and how it is represented. Here we demonstrate the advantages of a new dimensionality reduction technique, demixed principal component analysis (dPCA), that decomposes population activity into a few components. In addition to systematically capturing the majority of the variance of the data, dPCA also exposes the dependence of the neural representation on task parameters such as stimuli, decisions, or rewards. To illustrate our method we reanalyze population data from four datasets comprising different species, different cortical areas and different experimental tasks. In each case, dPCA provides a concise way of visualizing the data that summarizes the task-dependent features of the population response in a single figure.

443 citations
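
The idea of separating task-dependent parts of a neural response, as described in the dPCA abstract above, can be illustrated with a small sketch. This is not the dPCA algorithm and not the authors' code: it shows only an averaging (marginalization) step for a single neuron, under the assumption that the firing rate is tabulated per stimulus and time bin. The function name `marginalize` and the toy data are hypothetical.

```python
def marginalize(rates):
    """Split rates[stimulus][time] into mean, time, stimulus and interaction parts.

    Assumption: `rates` is a rectangular list of lists holding one neuron's
    trial-averaged firing rate for each (stimulus, time bin) pair.
    """
    n_stim = len(rates)
    n_time = len(rates[0])

    # Grand mean over all conditions and time bins.
    grand = sum(sum(row) for row in rates) / (n_stim * n_time)

    # Time-dependent part: average over stimuli, minus the grand mean.
    time_part = [
        sum(rates[s][t] for s in range(n_stim)) / n_stim - grand
        for t in range(n_time)
    ]

    # Stimulus-dependent part: average over time, minus the grand mean.
    stim_part = [sum(rates[s]) / n_time - grand for s in range(n_stim)]

    # Whatever is left is the stimulus-time interaction.
    interaction = [
        [rates[s][t] - grand - time_part[t] - stim_part[s] for t in range(n_time)]
        for s in range(n_stim)
    ]
    return grand, time_part, stim_part, interaction


# Toy data: two stimuli, two time bins, purely additive structure.
grand, time_part, stim_part, interaction = marginalize([[1.0, 2.0], [3.0, 4.0]])
print(grand, time_part, stim_part, interaction)
```

The four parts sum back to the original rates, so each part can be inspected separately. A full method would apply a decomposition of this kind across the whole population before searching for components.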


Journal ArticleDOI
TL;DR: The state of plant transformation is reviewed and innovations needed to enable genome editing in crops are identified; genome editing is a potential game-changer in crop genetics once plant transformation systems are optimized.
Abstract: Plant transformation has enabled fundamental insights into plant biology and revolutionized commercial agriculture. Unfortunately, for most crops, transformation and regeneration remain arduous even after more than thirty years of technological advances. Genome editing provides new opportunities to enhance crop productivity, but relies on genetic transformation and plant regeneration, which are bottlenecks in the process. Herein we review the state of plant transformation and point to innovations needed to enable genome editing in crops. Plant tissue culture methods need optimization and simplification to improve efficiency and minimize time in culture. Currently, specialized facilities exist for crop transformation. Single cell and robotic techniques should be developed for high throughput genomic screens. Utilization of plant genes involved in developmental reprogramming, wound response, and/or homologous recombination could boost recovery of transformed plants. Engineering universal Agrobacterium strains and recruitment of other microbes, such as Ensifer or Rhizobium, could facilitate delivery of DNA and proteins into plant cells. Synthetic biology should be employed for de novo design of transformation systems. Genome editing is a potential game-changer in crop genetics when plant transformation systems are optimized.

419 citations


Journal ArticleDOI
TL;DR: It is proposed that confidence should be defined as the probability that a decision or a proposition is correct given the evidence, a critical quantity in complex sequential decisions, and that the term certainty should be reserved to refer to the encoding of all other probability distributions over sensory and cognitive variables.
Abstract: When facing uncertainty, adaptive behavioral strategies demand that the brain performs probabilistic computations. In this probabilistic framework, the notion of certainty and confidence would appear to be closely related, so much so that it is tempting to conclude that these two concepts are one and the same. We argue that there are computational reasons to distinguish between these two concepts. Specifically, we propose that confidence should be defined as the probability that a decision or a proposition, overt or covert, is correct given the evidence, a critical quantity in complex sequential decisions. We suggest that the term certainty should be reserved to refer to the encoding of all other probability distributions over sensory and cognitive variables. We also discuss strategies for studying the neural codes for confidence and certainty and argue that clear definitions of neural codes are essential to understanding the relative contributions of various cortical areas to decision making.

386 citations
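
The definition of confidence in the abstract above, the probability that a decision is correct given the evidence, can be made concrete with a small worked sketch. The generative model here is an assumption chosen for illustration and does not come from the paper: two equally likely categories, each producing a scalar observation `x` with Gaussian noise around `+mu` or `-mu`. The function names are hypothetical.

```python
import math


def posterior_positive(x, mu=1.0, sigma=1.0):
    """P(category = +1 | x) under the assumed model.

    Assumptions: equal priors, likelihoods N(+mu, sigma^2) and N(-mu, sigma^2).
    The log-likelihood ratio then reduces to 2 * mu * x / sigma^2.
    """
    return 1.0 / (1.0 + math.exp(-2.0 * mu * x / sigma ** 2))


def decide_with_confidence(x, mu=1.0, sigma=1.0):
    """Pick the more probable category and report P(choice is correct | x)."""
    p_pos = posterior_positive(x, mu, sigma)
    choice = 1 if p_pos >= 0.5 else -1
    confidence = p_pos if choice == 1 else 1.0 - p_pos
    return choice, confidence


# Ambiguous evidence gives confidence 0.5; stronger evidence gives more.
for x in (0.0, 0.5, 2.0, -2.0):
    print(x, decide_with_confidence(x))
```

In this toy model confidence depends only on how far the observation lies from the decision boundary, so it is symmetric for the two choices.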


Journal ArticleDOI
01 Jan 2016-Database
TL;DR: The consistency of the annotation across species and the focus on vertebrates makes Ensembl an ideal system to perform and support vertebrate comparative genomic analyses and it is used to produce reference comparative data and make it freely available.
Abstract: Evolution provides the unifying framework with which to understand biology. The coherent investigation of genic and genomic data often requires comparative genomics analyses based on whole-genome alignments, sets of homologous genes and other relevant datasets in order to evaluate and answer evolutionary-related questions. However, the complexity and computational requirements of producing such data are substantial: this has led to only a small number of reference resources that are used for most comparative analyses. The Ensembl comparative genomics resources are one such reference set that facilitates comprehensive and reproducible analysis of chordate genome data. Ensembl computes pairwise and multiple whole-genome alignments from which large-scale synteny, per-base conservation scores and constrained elements are obtained. Gene alignments are used to define Ensembl Protein Families, GeneTrees and homologies for both protein-coding and non-coding RNA genes. These resources are updated frequently and have a consistent informatics infrastructure and data presentation across all supported species. Specialized web-based visualizations are also available including synteny displays, collapsible gene tree plots, a gene family locator and different alignment views. The Ensembl comparative genomics infrastructure is extensively reused for the analysis of non-vertebrate species by other projects including Ensembl Genomes and Gramene and much of the information here is relevant to these projects. The consistency of the annotation across species and the focus on vertebrates makes Ensembl an ideal system to perform and support vertebrate comparative genomic analyses. We use robust software and pipelines to produce reference comparative data and make it freely available. Database URL: http://www.ensembl.org.

384 citations


Journal ArticleDOI
25 Feb 2016-Nature
TL;DR: It is concluded that in addition to later interbreeding events, the ancestors of Neanderthals from the Altai Mountains and early modern humans met and interbred, possibly in the Near East, many thousands of years earlier than previously thought.
Abstract: It has been shown that Neanderthals contributed genetically to modern humans outside Africa 47,000-65,000 years ago. Here we analyse the genomes of a Neanderthal and a Denisovan from the Altai Mountains in Siberia together with the sequences of chromosome 21 of two Neanderthals from Spain and Croatia. We find that a population that diverged early from other modern humans in Africa contributed genetically to the ancestors of Neanderthals from the Altai Mountains roughly 100,000 years ago. By contrast, we do not detect such a genetic contribution in the Denisovan or the two European Neanderthals. We conclude that in addition to later interbreeding events, the ancestors of Neanderthals from the Altai Mountains and early modern humans met and interbred, possibly in the Near East, many thousands of years earlier than previously thought.

Journal ArticleDOI
TL;DR: Several genes displaying presence/absence variation are annotated with functions related to major agronomic traits, including disease resistance, flowering time, glucosinolate metabolism and vitamin biosynthesis.
Abstract: There is an increasing awareness that as a result of structural variation, a reference sequence representing a genome of a single individual is unable to capture all of the gene repertoire found in the species. A large number of genes affected by presence/absence and copy number variation suggest that it may contribute to phenotypic and agronomic trait diversity. Here we show by analysis of the Brassica oleracea pangenome that nearly 20% of genes are affected by presence/absence variation. Several genes displaying presence/absence variation are annotated with functions related to major agronomic traits, including disease resistance, flowering time, glucosinolate metabolism and vitamin biosynthesis.
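
As a rough illustration of how a presence/absence figure such as the one quoted above might be tallied, the sketch below counts genes that are missing from at least one line. It is a minimal sketch under stated assumptions, not the pipeline used in the study: the input format (a mapping from gene to per-line presence calls), the function name `pav_fraction`, and the gene names are all hypothetical.

```python
def pav_fraction(presence):
    """Return (fraction of genes showing presence/absence variation, their names).

    Assumption: `presence` maps each pangenome gene to a list of booleans,
    one per sequenced line, where True means the gene was called present.
    A gene counts as variable if it is absent from at least one line.
    """
    if not presence:
        raise ValueError("presence table is empty")
    variable = sorted(gene for gene, calls in presence.items() if not all(calls))
    return len(variable) / len(presence), variable


# Hypothetical calls for five genes across three lines.
calls = {
    "geneA": [True, True, True],
    "geneB": [True, False, True],
    "geneC": [True, True, True],
    "geneD": [True, True, True],
    "geneE": [True, True, True],
}
fraction, variable_genes = pav_fraction(calls)
print(fraction, variable_genes)
```

Real presence calls depend on read-mapping coverage thresholds, which this sketch deliberately leaves out.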

Journal ArticleDOI
Yuxiang Jiang, Tal Ronnen Oron, Wyatt T. Clark, Asma R. Bankapur, and 153 more authors (59 institutions)
TL;DR: The second critical assessment of functional annotation (CAFA2), a timed challenge to assess computational methods that automatically assign protein function, evaluated 126 methods from 56 research groups; the top-performing methods in CAFA2 outperformed those from CAFA1.
Abstract: BACKGROUND: A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging. RESULTS: We conducted the second critical assessment of functional annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. We evaluated 126 methods from 56 research groups for their ability to predict biological functions using Gene Ontology and gene-disease associations using Human Phenotype Ontology on a set of 3681 proteins from 18 species. CAFA2 featured expanded analysis compared with CAFA1, with regards to data set size, variety, and assessment metrics. To review progress in the field, the analysis compared the best methods from CAFA1 to those of CAFA2. CONCLUSIONS: The top-performing methods in CAFA2 outperformed those from CAFA1. This increased accuracy can be attributed to a combination of the growing number of experimental annotations and improved methods for function prediction. The assessment also revealed that the definition of top-performing algorithms is ontology specific, that different performance metrics can be used to probe the nature of accurate predictions, and the relative diversity of predictions in the biological process and human phenotype ontologies. While there was methodological improvement between CAFA1 and CAFA2, the interpretation of results and usefulness of individual methods remain context-dependent.

Journal ArticleDOI
TL;DR: This Review discusses how CLAVATA-WUSCHEL (CLV-WUS) signaling coordinates stem cell proliferation with differentiation, highlighting commonalities and differences between CLAVATA-WUSCHEL pathways in different species.
Abstract: Shoot meristems are maintained by pluripotent stem cells that are controlled by CLAVATA-WUSCHEL feedback signaling. This pathway, which coordinates stem cell proliferation with differentiation, was first identified in Arabidopsis, but appears to be conserved in diverse higher plant species. In this Review, we highlight the commonalities and differences between CLAVATA-WUSCHEL pathways in different species, with an emphasis on Arabidopsis, maize, rice and tomato. We focus on stem cell control in shoot meristems, but also briefly discuss the role of these signaling components in root meristems.

Journal ArticleDOI
TL;DR: Findings suggest that recruitment of granulin-expressing inflammatory monocytes plays a key role in PDAC metastasis and may serve as a potential therapeutic target for PDAC liver metastasis.
Abstract: Pancreatic ductal adenocarcinoma (PDAC) is a devastating metastatic disease for which better therapies are urgently needed. Macrophages enhance metastasis in many cancer types; however, the role of macrophages in PDAC liver metastasis remains poorly understood. Here we found that PDAC liver metastasis critically depends on the early recruitment of granulin-secreting inflammatory monocytes to the liver. Mechanistically, we demonstrate that granulin secretion by metastasis-associated macrophages (MAMs) activates resident hepatic stellate cells (hStCs) into myofibroblasts that secrete periostin, resulting in a fibrotic microenvironment that sustains metastatic tumour growth. Disruption of MAM recruitment or genetic depletion of granulin reduced hStC activation and liver metastasis. Interestingly, we found that circulating monocytes and hepatic MAMs in PDAC patients express high levels of granulin. These findings suggest that recruitment of granulin-expressing inflammatory monocytes plays a key role in PDAC metastasis and may serve as a potential therapeutic target for PDAC liver metastasis.

Journal ArticleDOI
TL;DR: It is found that there exist exclusive and orthogonal population-level subspaces dedicated to preparatory and movement computations, which yielded a reorganization in response correlations: the set of neurons with shared response properties changed completely between preparation and movement.
Abstract: Neural populations can change the computation they perform on very short timescales. Although such flexibility is common, the underlying computational strategies at the population level remain unknown. To address this gap, we examined population responses in motor cortex during reach preparation and movement. We found that there exist exclusive and orthogonal population-level subspaces dedicated to preparatory and movement computations. This orthogonality yielded a reorganization in response correlations: the set of neurons with shared response properties changed completely between preparation and movement. Thus, the same neural population acts, at different times, as two separate circuits with very different properties. This finding is not predicted by existing motor cortical models, which predict overlapping preparation-related and movement-related subspaces. Despite orthogonality, responses in the preparatory subspace were lawfully related to subsequent responses in the movement subspace. These results reveal a population-level strategy for performing separate but linked computations.

Journal ArticleDOI
TL;DR: iPlant’s platform permits researchers to easily deposit and share their data and deploy new computational tools and analysis workflows, allowing the broader community to easily use and reuse those data and computational analyses.
Abstract: The iPlant Collaborative provides life science research communities access to comprehensive, scalable, and cohesive computational infrastructure for data management; identity management; collaboration tools; and cloud, high-performance, high-throughput computing. iPlant provides training, learning material, and best practice resources to help all researchers make the best use of their data, expand their computational skill set, and effectively manage their data and computation when working as distributed teams. iPlant’s platform permits researchers to easily deposit and share their data and deploy new computational tools and analysis workflows, allowing the broader community to easily use and reuse those data and computational analyses.

Journal ArticleDOI
TL;DR: It is demonstrated that the genetic underpinning of inherited pancreatic cancer is highly heterogeneous, which has significant implications for the management of patients with familial pancreatic cancer and for the identification of susceptibility genes in other common cancer types.
Abstract: Pancreatic cancer is projected to become the second leading cause of cancer-related death in the United States by 2020. A familial aggregation of pancreatic cancer has been established, but the cause of this aggregation in most families is unknown. To determine the genetic basis of susceptibility in these families, we sequenced the germline genome of 638 familial pancreatic cancer patients. We also sequenced the exomes of 39 familial pancreatic adenocarcinomas. Our analyses support the role of previously identified familial pancreatic cancer susceptibility genes such as BRCA2, CDKN2A and ATM, and identify novel candidate genes harboring rare, deleterious germline variants for further characterization. We also show how somatic point mutations that occur during hematopoiesis can affect the interpretation of genome-wide studies of hereditary traits. Our observations have important implications for the etiology of pancreatic cancer and for the identification of susceptibility genes in other common cancer types.


Journal ArticleDOI
21 Sep 2016-Neuron
TL;DR: It is demonstrated that combinatorial genetic and viral approaches target restricted GABAergic subpopulations and cell types characterized by distinct laminar location, morphology, axonal projection, and electrophysiological properties.

Journal ArticleDOI
TL;DR: The whole-genome sequencing and assembly of inbred derivatives of Petunia hybrida reveal that the Petunia lineage has experienced at least two rounds of hexaploidization, and transcription factors involved in the shift from bee to moth pollination reside in particularly dynamic regions of the genome.
Abstract: Petunia hybrida is a popular bedding plant that has a long history as a genetic model system. We report the whole-genome sequencing and assembly of inbred derivatives of its two wild parents, P. axillaris N and P. inflata S6. The assemblies include 91.3% and 90.2% coverage of their diploid genomes (1.4 Gb; 2n = 14) containing 32,928 and 36,697 protein-coding genes, respectively. The genomes reveal that the Petunia lineage has experienced at least two rounds of hexaploidization: the older gamma event, which is shared with most Eudicots, and a more recent Solanaceae event that is shared with tomato and other solanaceous species. Transcription factors involved in the shift from bee to moth pollination reside in particularly dynamic regions of the genome, which may have been key to the remarkable diversity of floral colour patterns and pollination systems. The high-quality genome sequences will enhance the value of Petunia as a model system for research on unique biological phenomena such as small RNAs, symbiosis, self-incompatibility and circadian rhythms.

Journal ArticleDOI
07 Sep 2016-Neuron
TL;DR: MAPseq harnesses advances in sequencing technology to permit high-throughput interrogation of brain circuits; applied to the locus coeruleus (LC), it shows that individual LC neurons have preferred cortical targets.

Journal ArticleDOI
TL;DR: This study reveals how cells undergoing oncogene-induced senescence acquire a distinctive enhancer landscape that includes formation of super-enhancers adjacent to immune-modulatory genes required for paracrine immune activation.
Abstract: Oncogene-induced senescence is a potent barrier to tumorigenesis that limits cellular expansion following certain oncogenic events. Senescent cells display a repressive chromatin configuration thought to stably silence proliferation-promoting genes, while simultaneously activating an unusual form of immune surveillance involving a secretory program referred to as the senescence-associated secretory phenotype (SASP). Here we demonstrate that senescence also involves a global remodeling of the enhancer landscape with recruitment of the chromatin reader BRD4 to newly activated super-enhancers adjacent to key SASP genes. Transcriptional profiling and functional studies indicate that BRD4 is required for the SASP and downstream paracrine signaling. Consequently, BRD4 inhibition disrupts immune cell mediated targeting and elimination of premalignant senescent cells in vitro and in vivo. Our results identify a critical role for BRD4-bound super-enhancers in senescence immune surveillance and in the proper execution of a tumor-suppressive program.

Journal ArticleDOI
03 Jun 2016-eLife
TL;DR: It is shown that loci controlling adaptive responses to the environment are the most frequent transposition targets observed and the importance of transposition as a recurrent generator of large-effect alleles is demonstrated.
Abstract: Transposable elements (TEs) are powerful motors of genome evolution yet a comprehensive assessment of recent transposition activity at the species level is lacking for most organisms. Here, using genome sequencing data for 211 Arabidopsis thaliana accessions taken from across the globe, we identify thousands of recent transposition events involving half of the 326 TE families annotated in this plant species. We further show that the composition and activity of the 'mobilome' vary extensively between accessions in relation to climate and genetic factors. Moreover, TEs insert equally throughout the genome and are rapidly purged by natural selection from gene-rich regions because they frequently affect genes, in multiple ways. Remarkably, loci controlling adaptive responses to the environment are the most frequent transposition targets observed. These findings demonstrate the pervasive, species-wide impact that a rich mobilome can have and the importance of transposition as a recurrent generator of large-effect alleles.

Journal ArticleDOI
TL;DR: Assemblytics incorporates a unique anchor filtering approach to increase robustness to repetitive elements, and identifies six classes of variants based on their distinct alignment signatures; it can be used to compare aberrant genomes to a reference or to identify differences between related species.
Abstract: Assemblytics is a web app for detecting and analyzing variants from a de novo genome assembly aligned to a reference genome. It incorporates a unique anchor filtering approach to increase robustness to repetitive elements, and identifies six classes of variants based on their distinct alignment signatures. Assemblytics can be applied both to compare aberrant genomes, such as human cancers, to a reference and to identify differences between related species. Multiple interactive visualizations enable in-depth explorations of the genomic distributions of variants. AVAILABILITY AND IMPLEMENTATION: http://assemblytics.com, https://github.com/marianattestad/assemblytics CONTACT: mnattest@cshl.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Journal ArticleDOI
TL;DR: The results suggest that the detection of smaller, often multiple CNVs affecting putative regulatory elements might help explain additional risk of simplex autism.
Abstract: We performed whole-genome sequencing (WGS) of 208 genomes from 53 families affected by simplex autism. For the majority of these families, no copy-number variant (CNV) or candidate de novo gene-disruptive single-nucleotide variant (SNV) had been detected by microarray or whole-exome sequencing (WES). We integrated multiple CNV and SNV analyses and extensive experimental validation to identify additional candidate mutations in eight families. We report that compared to control individuals, probands showed a significant (p = 0.03) enrichment of de novo and private disruptive mutations within fetal CNS DNase I hypersensitive sites (i.e., putative regulatory regions). This effect was only observed within 50 kb of genes that have been previously associated with autism risk, including genes where dosage sensitivity has already been established by recurrent disruptive de novo protein-coding mutations (ARID1B, SCN2A, NR3C2, PRKCA, and DSCAM). In addition, we provide evidence of gene-disruptive CNVs (in DISC1, WNT7A, RBFOX1, and MBD5), as well as smaller de novo CNVs and exon-specific SNVs missed by exome sequencing in neurodevelopmental genes (e.g., CANX, SAE1, and PIK3CA). Our results suggest that the detection of smaller, often multiple CNVs affecting putative regulatory elements might help explain additional risk of simplex autism.

Journal ArticleDOI
TL;DR: Tumor-induced IL-6 impairs the ketogenic response to reduced caloric intake, resulting in a systemic metabolic stress response that blocks anti-cancer immunotherapy.

Journal ArticleDOI
TL;DR: Long-read sequencing of the transcriptome reveals novel spliced genes that are not annotated in GENCODE and are missed by short-read RNA-Seq, implying that improved characterization of genome functional variation may require the use of a range of genomic technologies on diverse human populations.
Abstract: Short-read sequencing has enabled the de novo assembly of several individual human genomes, but with inherent limitations in characterizing repeat elements. Here we sequence a Chinese individual HX1 by single-molecule real-time (SMRT) long-read sequencing, construct a physical map by NanoChannel arrays and generate a de novo assembly of 2.93 Gb (contig N50: 8.3 Mb, scaffold N50: 22.0 Mb, including 39.3 Mb N-bases), together with 206 Mb of alternative haplotypes. The assembly fully or partially fills 274 (28.4%) N-gaps in the reference genome GRCh38. Comparison to GRCh38 reveals 12.8 Mb of HX1-specific sequences, including 4.1 Mb that are not present in previously reported Asian genomes. Furthermore, long-read sequencing of the transcriptome reveals novel spliced genes that are not annotated in GENCODE and are missed by short-read RNA-Seq. Our results imply that improved characterization of genome functional variation may require the use of a range of genomic technologies on diverse human populations.
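
The abstract above reports contig and scaffold N50 values. For readers unfamiliar with the statistic, the sketch below computes N50 under its usual definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The function name `n50` and the example lengths are hypothetical and unrelated to the HX1 data.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Sequences are visited from longest to shortest; N50 is the length at
    which the running total first reaches half of the summed length.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare doubled running total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (arbitrary units), total 20.
print(n50([2, 3, 4, 5, 6]))
```

Because N50 is weighted by length, a handful of very long contigs can raise it substantially even when many short contigs remain.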

Journal ArticleDOI
04 May 2016-Neuron
TL;DR: This work suggests that the human feeling of confidence originates from a mental computation of statistical confidence, which quantitatively accounted for human confidence in the authors' tasks without necessitating heuristic operations.

Posted ContentDOI
03 Jun 2016-bioRxiv
TL;DR: The FALCON-based assemblies were substantially more contiguous and complete than alternate short or long-read approaches, and enabled the study of haplotype structures and heterozygosities between the homologous chromosomes, including identifying widespread heterozygous structural variations within the coding sequences.
Abstract: While genome assembly projects have been successful in a number of haploid or inbred species, one of the current main challenges is assembling non-inbred or rearranged heterozygous genomes. To address this critical need, we introduce the open-source FALCON and FALCON-Unzip algorithms (https://github.com/PacificBiosciences/FALCON/) to assemble Single Molecule Real-Time (SMRT(R)) Sequencing data into highly accurate, contiguous, and correctly phased diploid genomes. We demonstrate the quality of this approach by assembling new reference sequences for three heterozygous samples, including an F1 hybrid of the model species Arabidopsis thaliana, the widely cultivated V. vinifera cv. Cabernet Sauvignon, and the coral fungus Clavicorona pyxidata that have challenged short-read assembly approaches. The FALCON-based assemblies were substantially more contiguous and complete than alternate short or long-read approaches. The phased diploid assembly enabled the study of haplotype structures and heterozygosities between the homologous chromosomes, including identifying widespread heterozygous structural variations within the coding sequences.