Showing papers on "Gene published in 2014"
••
TL;DR: FeatureCounts as discussed by the authors is a read summarization program suitable for counting reads generated from either RNA or genomic DNA sequencing experiments, which implements highly efficient chromosome hashing and feature blocking techniques.
Abstract: MOTIVATION: Next-generation sequencing technologies generate millions of short sequence reads, which are usually aligned to a reference genome. In many applications, the key information required for downstream analysis is the number of reads mapping to each genomic feature, for example to each exon or each gene. The process of counting reads is called read summarization. Read summarization is required for a great variety of genomic analyses but has so far received relatively little attention in the literature. RESULTS: We present featureCounts, a read summarization program suitable for counting reads generated from either RNA or genomic DNA sequencing experiments. featureCounts implements highly efficient chromosome hashing and feature blocking techniques. It is considerably faster than existing methods (by an order of magnitude for gene-level summarization) and requires far less computer memory. It works with either single or paired-end reads and provides a wide range of options appropriate for different sequencing applications. AVAILABILITY AND IMPLEMENTATION: featureCounts is available under GNU General Public License as part of the Subread (http://subread.sourceforge.net) or Rsubread (http://www.bioconductor.org) software packages.
14,103 citations
••
TL;DR: This work shows that lentiviral delivery of a genome-scale CRISPR-Cas9 knockout (GeCKO) library targeting 18,080 genes with 64,751 unique guide sequences enables both negative and positive selection screening in human cells, and observes a high level of consistency between independent guide RNAs targeting the same gene and a high rate of hit confirmation.
Abstract: The simplicity of programming the CRISPR (clustered regularly interspaced short palindromic repeats)–associated nuclease Cas9 to modify specific genomic loci suggests a new way to interrogate gene function on a genome-wide scale. We show that lentiviral delivery of a genome-scale CRISPR-Cas9 knockout (GeCKO) library targeting 18,080 genes with 64,751 unique guide sequences enables both negative and positive selection screening in human cells. First, we used the GeCKO library to identify genes essential for cell viability in cancer and pluripotent stem cells. Next, in a melanoma model, we screened for genes whose loss is involved in resistance to vemurafenib, a therapeutic RAF inhibitor. Our highest-ranking candidates include previously validated genes NF1 and MED12 , as well as novel hits NF2 , CUL3 , TADA2B , and TADA1. We observe a high level of consistency between independent guide RNAs targeting the same gene and a high rate of hit confirmation, demonstrating the promise of genome-scale screening with Cas9.
4,147 citations
••
Eric A. Collisson1, Joshua D. Campbell2, Angela N. Brooks3, Angela N. Brooks2 +315 more•Institutions (41)
TL;DR: In this paper, the authors report molecular profiling of 230 resected lung adnocarcinomas using messenger RNA, microRNA and DNA sequencing integrated with copy number, methylation and proteomic analyses.
Abstract: Adenocarcinoma of the lung is the leading cause of cancer death worldwide. Here we report molecular profiling of 230 resected lung adenocarcinomas using messenger RNA, microRNA and DNA sequencing integrated with copy number, methylation and proteomic analyses. High rates of somatic mutation were seen (mean 8.9 mutations per megabase). Eighteen genes were statistically significantly mutated, including RIT1 activating mutations and newly described loss-of-function MGA mutations which are mutually exclusive with focal MYC amplification. EGFR mutations were more frequent in female patients, whereas mutations in RBM10 were more common in males. Aberrations in NF1, MET, ERBB2 and RIT1 occurred in 13% of cases and were enriched in samples otherwise lacking an activated oncogene, suggesting a driver role for these events in certain tumours. DNA and mRNA sequence from the same tumour highlighted splicing alterations driven by somatic genomic changes, including exon 14 skipping in MET mRNA in 4% of cases. MAPK and PI(3)K pathway activity, when measured at the protein level, was explained by known mutations in only a fraction of cases, suggesting additional, unexplained mechanisms of pathway activation. These data establish a foundation for classification and further investigations of lung adenocarcinoma molecular pathogenesis.
4,104 citations
••
TL;DR: The authors' data provide clues as to how neurons and astrocytes differ in their ability to dynamically regulate glycolytic flux and lactate generation attributable to unique splicing of PKM2, the gene encoding the glycoleytic enzyme pyruvate kinase.
Abstract: The major cell classes of the brain differ in their developmental processes, metabolism, signaling, and function To better understand the functions and interactions of the cell types that comprise these classes, we acutely purified representative populations of neurons, astrocytes, oligodendrocyte precursor cells, newly formed oligodendrocytes, myelinating oligodendrocytes, microglia, endothelial cells, and pericytes from mouse cerebral cortex We generated a transcriptome database for these eight cell types by RNA sequencing and used a sensitive algorithm to detect alternative splicing events in each cell type Bioinformatic analyses identified thousands of new cell type-enriched genes and splicing isoforms that will provide novel markers for cell identification, tools for genetic manipulation, and insights into the biology of the brain For example, our data provide clues as to how neurons and astrocytes differ in their ability to dynamically regulate glycolytic flux and lactate generation attributable to unique splicing of PKM2, the gene encoding the glycolytic enzyme pyruvate kinase This dataset will provide a powerful new resource for understanding the development and function of the brain To ensure the widespread distribution of these datasets, we have created a user-friendly website (http://webstanfordedu/group/barres_lab/brain_rnaseqhtml) that provides a platform for analyzing and comparing transciption and alternative splicing profiles for various cell classes in the brain
3,891 citations
01 Jul 2014
TL;DR: High rates of somatic mutation were seen, including RIT1 activating mutations and newly described loss-of-function MGA mutations which are mutually exclusive with focal MYC amplification, and MAPK and PI(3)K pathway activity was explained by known mutations in only a fraction of cases, suggesting additional, unexplained mechanisms of pathway activation.
Abstract: Adenocarcinoma of the lung is the leading cause of cancer death worldwide. Here we report molecular profiling of 230 resected lung adenocarcinomas using messenger RNA, microRNA and DNA sequencing integrated with copy number, methylation and proteomic analyses. High rates of somatic mutation were seen(mean 8.9 mutations per megabase). Eighteen genes were statistically significantly mutated, including RIT1 activating mutations and newly described loss-of-function MGA mutations which are mutually exclusive with focal MYC amplification. EGFR mutations were more frequent in female patients, whereas mutations in RBM10 were more common in males. Aberrations in NF1, MET, ERBB2 and RIT1 occurred in 13% of cases and were enriched in samples otherwise lacking an activated oncogene, suggesting a driver role for these events in certain tumours. DNA and mRNA sequence from the same tumour highlighted splicing alterations driven by somatic genomic changes, including exon 14 skipping in MET mRNA in 4% of cases. MAPK and PI(3)K pathway activity, when measured at the protein level, was explained by known mutations in only a fraction of cases, suggesting additional, unexplained mechanisms of pathway activation. These data establish a foundation for classification and further investigations of lung adenocarcinoma molecular pathogenesis.
2,847 citations
••
TL;DR: It is found that large-scale genomic analysis can identify nearly all known cancer genes in these cancer types and 33 genes that were not previously known to be significantly mutated in cancer, including genes related to proliferation, apoptosis, genome stability, chromatin regulation, immune evasion, RNA processing and protein homeostasis.
Abstract: Although a few cancer genes are mutated in a high proportion of tumours of a given type (.20%), most are mutated at intermediate frequencies (2–20%). To explore the feasibility of creating a comprehensive catalogue of cancer genes, we analysed somatic point mutations in exome sequences from 4,742 human cancers and their matched normal-tissue samples across 21 cancer types. We found that large-scale genomic analysis can identify nearly all known cancer genes in these tumour types. Our analysis also identified 33 genes that were not previously known to be significantly mutated in cancer, including genes related to proliferation, apoptosis, genome stability, chromatin regulation, immune evasion, RNA processing and protein homeostasis. Down-sampling analysis indicates that larger sample sizes will reveal many more genes mutated at clinically important frequencies. We estimate that near-saturation may be achieved with 600– 5,000 samples per tumour type, depending on background mutation frequency. The results may help to guide the next stage of cancer genomics. Comprehensive knowledge of the genes underlying human cancers is a critical foundation for cancer diagnostics, therapeutics, clinical-trial design and selection of rational combination therapies. It is now possible to use genomic analysis to identify cancer genes in an unbiased fashion, based on the presence of somatic mutations at a rate significantly higher than the expected background level. Systematic studies have revealed many new cancer genes, as well as new classes of cancer genes 1,2 . They have also made clear that, although some cancer genes are mutated at high frequencies, most cancer genes in most patients occur at intermediate frequencies (2–20%) or lower. Accordingly, a complete catalogue of mutations in this frequency class will be essential for recognizing dysregulated pathways and optimal targets for therapeutic intervention. However, recent work suggests major gaps in our knowledge of cancer genes of intermediate frequency. For example, a study of 183 lung adenocarcinomas 3 found that 15% of patients lacked even a single mutation affecting any of the 10 known hallmarks of cancer, and 38% had 3 or fewer such mutations. In this paper, we analysed somatic point mutations (substitutions and small insertion and deletions) in nearly 5,000 human cancers and their matched normal-tissue samples (‘tumour–normal pairs’) across 21 tumour types. The questions that we examine here are: first, whether large-scale genomic analysis across tumour types can reliably identify all known cancer genes; second, whether it will reveal many new candidate cancer genes; and third, how far we are from having a complete catalogue of cancer genes (at least those of intermediate frequency). We used rigorous statistical methods to enumerate candidate cancer genes and then carefully inspected each gene to identify those with strong biological connections to cancer and mutational patterns consistent with the expected function. The analysis reveals nearly all known cancer genes and revealed 33 novel candidates, including genes related to proliferation, apoptosis, genome stability, chromatin regulation, immune evasion, RNA processing and protein homeostasis. Importantly, the data show that the
2,565 citations
••
TL;DR: A quantitative transcriptomics analysis (RNA-Seq) is used to classify the tissue-specific expression of genes across a representative set of all major human organs and tissues and combined this analysis with antibody-based profiling of the same tissues.
2,512 citations
••
TL;DR: In this paper, a pooled, loss-of-function genetic screening approach suitable for both positive and negative selection that uses a genome-scale lentiviral single-guide RNA (sgRNA) library was described.
Abstract: The bacterial clustered regularly interspaced short palindromic repeats (CRISPR)–Cas9 system for genome editing has greatly expanded the toolbox for mammalian genetics, enabling the rapid generation of isogenic cell lines and mice with modified alleles. Here, we describe a pooled, loss-of-function genetic screening approach suitable for both positive and negative selection that uses a genome-scale lentiviral single-guide RNA (sgRNA) library. sgRNA expression cassettes were stably integrated into the genome, which enabled a complex mutant pool to be tracked by massively parallel sequencing. We used a library containing 73,000 sgRNAs to generate knockout collections and performed screens in two human cell lines. A screen for resistance to the nucleotide analog 6-thioguanine identified all expected members of the DNA mismatch repair pathway, whereas another for the DNA topoisomerase II ( TOP2A ) poison etoposide identified TOP2A , as expected, and also cyclin-dependent kinase 6, CDK6. A negative selection screen for essential genes identified numerous gene sets corresponding to fundamental processes. Last, we show that sgRNA efficiency is associated with specific sequence motifs, enabling the prediction of more effective sgRNAs. Collectively, these results establish Cas9/sgRNA screens as a powerful tool for systematic genetic analysis in mammalian cells.
2,487 citations
••
TL;DR: The biological barriers to gene delivery in vivo are introduced and recent advances in material sciences, nanotechnology and nucleic acid chemistry that have yielded promising non-viral delivery systems are discussed, some of which are currently undergoing testing in clinical trials.
Abstract: Gene-based therapy is the intentional modulation of gene expression in specific cells to treat pathological conditions This modulation is accomplished by introducing exogenous nucleic acids such as DNA, mRNA, small interfering RNA (siRNA), microRNA (miRNA) or antisense oligonucleotides Given the large size and the negative charge of these macromolecules, their delivery is typically mediated by carriers or vectors In this Review, we introduce the biological barriers to gene delivery in vivo and discuss recent advances in material sciences, nanotechnology and nucleic acid chemistry that have yielded promising non-viral delivery systems, some of which are currently undergoing testing in clinical trials The diversity of these systems highlights the recent progress of gene-based therapy using non-viral approaches
2,460 citations
••
Icahn School of Medicine at Mount Sinai1, Carnegie Mellon University2, Harvard University3, University of Toronto4, Wellcome Trust Sanger Institute5, University of Pittsburgh6, Nagoya University7, University of Freiburg8, King's College London9, Vanderbilt University10, University of Santiago de Compostela11, King Abdulaziz University12, University of Utah13, Duke University14, Memorial University of Newfoundland15, Trinity College, Dublin16, University of Pennsylvania17, University of Illinois at Chicago18, Boston Children's Hospital19, Columbia University20, German Cancer Research Center21, University College London22, Kaiser Permanente23, Broad Institute24, Cardiff University25, Complutense University of Madrid26, Newcastle University27, Baylor College of Medicine28, University of California, San Francisco29, RWTH Aachen University30, National Health Service31, McMaster University32, Saarland University33, Karolinska Institutet34, National Institutes of Health35, University of Helsinki36, Emory University37
TL;DR: Using exome sequencing, it is shown that analysis of rare coding variation in 3,871 autism cases and 9,937 ancestry-matched or parental controls implicates 22 autosomal genes at a false discovery rate of < 0.05, plus a set of 107 genes strongly enriched for those likely to affect risk (FDR < 0.30).
Abstract: The genetic architecture of autism spectrum disorder involves the interplay of common and rare variants and their impact on hundreds of genes. Using exome sequencing, here we show that analysis of rare coding variation in 3,871 autism cases and 9,937 ancestry-matched or parental controls implicates 22 autosomal genes at a false discovery rate (FDR) < 0.05, plus a set of 107 autosomal genes strongly enriched for those likely to affect risk (FDR < 0.30). These 107 genes, which show unusual evolutionary constraint against mutations, incur de novo loss-of-function mutations in over 5% of autistic subjects. Many of the genes implicated encode proteins for synaptic formation, transcriptional regulation and chromatin-remodelling pathways. These include voltage-gated ion channels regulating the propagation of action potentials, pacemaking and excitability-transcription coupling, as well as histone-modifying enzymes and chromatin remodellers-most prominently those that mediate post-translational lysine methylation/demethylation modifications of histones.
2,228 citations
••
TL;DR: This article identified 697 variants at genome-wide significance that together explained one-fifth of the heritability for adult height, and all common variants together captured 60% of heritability.
Abstract: Using genome-wide data from 253,288 individuals, we identified 697 variants at genome-wide significance that together explained one-fifth of the heritability for adult height. By testing different numbers of variants in independent studies, we show that the most strongly associated ∼2,000, ∼3,700 and ∼9,500 SNPs explained ∼21%, ∼24% and ∼29% of phenotypic variance. Furthermore, all common variants together captured 60% of heritability. The 697 variants clustered in 423 loci were enriched for genes, pathways and tissue types known to be involved in growth and together implicated genes and pathways not highlighted in earlier efforts, such as signaling by fibroblast growth factors, WNT/β-catenin and chondroitin sulfate-related genes. We identified several genes and pathways not previously connected with human skeletal growth, including mTOR, osteoglycin and binding of hyaluronic acid. Our results indicate a genetic architecture for human height that is characterized by a very large but finite number (thousands) of causal variants.
••
Alistair R. R. Forrest, Hideya Kawaji, Michael Rehli1, J Kenneth Baillie2 +277 more•Institutions (63)
TL;DR: For example, the authors mapped transcription start sites (TSSs) and their usage in human and mouse primary cells, cell lines and tissues to produce a comprehensive overview of mammalian gene expression across the human body.
Abstract: Regulated transcription controls the diversity, developmental pathways and spatial organization of the hundreds of cell types that make up a mammal Using single-molecule cDNA sequencing, we mapped transcription start sites (TSSs) and their usage in human and mouse primary cells, cell lines and tissues to produce a comprehensive overview of mammalian gene expression across the human body We find that few genes are truly 'housekeeping', whereas many mammalian promoters are composite entities composed of several closely separated TSSs, with independent cell-type-specific expression profiles TSSs specific to different cell types evolve at different rates, whereas promoters of broadly expressed genes are the most conserved Promoter-based expression analysis reveals key transcription factors defining cell states and links them to binding-site motifs The functions of identified novel transcripts can be predicted by coexpression and sample ontology enrichment analyses The functional annotation of the mammalian genome 5 (FANTOM5) project provides comprehensive expression profiles and functional annotation of mammalian cell-type-specific transcriptomes with wide applications in biomedical research
••
TL;DR: High-resolution multiorgan expression data is generated showing that nearly half of all genes in the mouse genome oscillate with circadian rhythm somewhere in the body, and the majority of best-selling drugs and World Health Organization essential medicines directly target the products of rhythmic genes.
Abstract: To characterize the role of the circadian clock in mouse physiology and behavior, we used RNA-seq and DNA arrays to quantify the transcriptomes of 12 mouse organs over time. We found 43% of all protein coding genes showed circadian rhythms in transcription somewhere in the body, largely in an organ-specific manner. In most organs, we noticed the expression of many oscillating genes peaked during transcriptional “rush hours” preceding dawn and dusk. Looking at the genomic landscape of rhythmic genes, we saw that they clustered together, were longer, and had more spliceforms than nonoscillating genes. Systems-level analysis revealed intricate rhythmic orchestration of gene pathways throughout the body. We also found oscillations in the expression of more than 1,000 known and novel noncoding RNAs (ncRNAs). Supporting their potential role in mediating clock function, ncRNAs conserved between mouse and human showed rhythmic expression in similar proportions as protein coding genes. Importantly, we also found that the majority of best-selling drugs and World Health Organization essential medicines directly target the products of rhythmic genes. Many of these drugs have short half-lives and may benefit from timed dosage. In sum, this study highlights critical, systemic, and surprising roles of the mammalian circadian clock and provides a blueprint for advancement in chronotherapy.
••
TL;DR: The integrated gene catalog (IGC) is established comprising 9,879,896 genes, which includes close-to-complete sets of genes for most gut microbes, which are also of considerably higher quality than in previous catalogs.
Abstract: Many analyses of the human gut microbiome depend on a catalog of reference genes. Existing catalogs for the human gut microbiome are based on samples from single cohorts or on reference genomes or protein sequences, which limits coverage of global microbiome diversity. Here we combined 249 newly sequenced samples of the Metagenomics of the Human Intestinal Tract (MetaHit) project with 1,018 previously sequenced samples to create a cohort from three continents that is at least threefold larger than cohorts used for previous gene catalogs. From this we established the integrated gene catalog (IGC) comprising 9,879,896 genes. The catalog includes close-to-complete sets of genes for most gut microbes, which are also of considerably higher quality than in previous catalogs. Analyses of a group of samples from Chinese and Danish individuals using the catalog revealed country-specific gut microbial signatures. This expanded catalog should facilitate quantitative characterization of metagenomic, metatranscriptomic and metaproteomic data from the gut microbiome to understand its variation across populations in human health and disease.
••
Academy of Sciences of the Czech Republic1, University of Saskatchewan2, Bayer3, Kansas State University4, University of California, Riverside5, Blaise Pascal University6, Kyoto University7, University of Dundee8, Punjab Agricultural University9, Indian Agricultural Research Institute10, University of Delhi11, University of Tsukuba12, Yokohama City University13, National Research Council14, Norwegian University of Life Sciences15, Sainsbury Laboratory16, Leibniz Association17, United States Department of Energy18, James Hutton Institute19, Institut national de la recherche agronomique20, University of Zurich21, Sabancı University22, Murdoch University23
TL;DR: Insight into the genome biology of a polyploid crop provide a springboard for faster gene isolation, rapid genetic marker development, and precise breeding to meet the needs of increasing food demand worldwide.
Abstract: An ordered draft sequence of the 17-gigabase hexaploid bread wheat (Triticum aestivum) genome has been produced by sequencing isolated chromosome arms. We have annotated 124,201 gene loci distributed nearly evenly across the homeologous chromosomes and subgenomes. Comparative gene analysis of wheat subgenomes and extant diploid and tetraploid wheat relatives showed that high sequence similarity and structural conservation are retained, with limited gene loss, after polyploidization. However, across the genomes there was evidence of dynamic gene gain, loss, and duplication since the divergence of the wheat lineages. A high degree of transcriptional autonomy and no global dominance was found for the subgenomes. These insights into the genome biology of a polyploid crop provide a springboard for faster gene isolation, rapid genetic marker development, and precise breeding to meet the needs of increasing food demand worldwide.
••
TL;DR: An online tool for the design of highly active sgRNAs for any gene of interest is provided, including a further optimization of the protospacer-adjacent motif (PAM) of Streptococcus pyogenes Cas9.
Abstract: Components of the prokaryotic clustered, regularly interspaced, short palindromic repeats (CRISPR) loci have recently been repurposed for use in mammalian cells. The CRISPR-associated (Cas)9 can be programmed with a single guide RNA (sgRNA) to generate site-specific DNA breaks, but there are few known rules governing on-target efficacy of this system. We created a pool of sgRNAs, tiling across all possible target sites of a panel of six endogenous mouse and three endogenous human genes and quantitatively assessed their ability to produce null alleles of their target gene by antibody staining and flow cytometry. We discovered sequence features that improved activity, including a further optimization of the protospacer-adjacent motif (PAM) of Streptococcus pyogenes Cas9. The results from 1,841 sgRNAs were used to construct a predictive model of sgRNA activity to improve sgRNA design for gene editing and genetic screens. We provide an online tool for the design of highly active sgRNAs for any gene of interest.
••
University of California, San Diego1, Pennsylvania State University2, Stanford University3, University of Washington4, University of Michigan5, New College of Florida6, Florida State University7, Cold Spring Harbor Laboratory8, California Institute of Technology9, University of Vienna10, Emory University11, Fred Hutchinson Cancer Research Center12, Massachusetts Institute of Technology13, Broad Institute14, University of California, Irvine15, University of California, Santa Cruz16, University of California, San Francisco17, Yale University18, University of Florida19, Johns Hopkins University20, University College London21, University of Oxford22, Cornell University23, Memorial Sloan Kettering Cancer Center24, Harvard University25, University of Iowa26, Yeshiva University27, University of Pennsylvania28, Washington University in St. Louis29, National Institutes of Health30, University of North Carolina at Chapel Hill31
TL;DR: The mouse ENCODE Consortium has mapped transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications and replication domains throughout the mouse genome in diverse cell and tissue types as mentioned in this paper.
Abstract: The laboratory mouse shares the majority of its protein-coding genes with humans, making it the premier model organism in biomedical research, yet the two mammals differ in significant ways To gain greater insights into both shared and species-specific transcriptional and cellular regulatory programs in the mouse, the Mouse ENCODE Consortium has mapped transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications and replication domains throughout the mouse genome in diverse cell and tissue types By comparing with the human genome, we not only confirm substantial conservation in the newly annotated potential functional sequences, but also find a large degree of divergence of sequences involved in transcriptional regulation, chromatin state and higher order chromatin organization Our results illuminate the wide range of evolutionary forces acting on genes and their regulatory regions, and provide a general resource for research into mammalian biology and mechanisms of human diseases
••
TL;DR: A computational pipeline to identifycircRNAs and quantify their relative abundance from RNA-seq data is developed, providing a new framework for future investigation of this intriguing topological isoform while raising doubts regarding a biological function of most circRNAs.
Abstract: Background: The recent reports of two circular RNAs (circRNAs) with strong potential to act as microRNA (miRNA) sponges suggest that circRNAs might play important roles in regulating gene expression. However, the global properties of circRNAs are not well understood. Results: We developed a computational pipeline to identify circRNAs and quantify their relative abundance from RNA-seq data. Applying this pipeline to a large set of non-poly(A)-selected RNA-seq data from the ENCODE project, we annotated 7,112 human circRNAs that were estimated to comprise at least 10% of the transcripts accumulating from their loci. Most circRNAs are expressed in only a few cell types and at low abundance, but they are no more cell-type-specific than are mRNAs with similar overall expression levels. Although most circRNAs overlap protein-coding sequences, ribosome profiling provides no evidence for their translation. We also annotated 635 mouse circRNAs, and although 20% of them are orthologous to human circRNAs, the sequence conservation of these circRNA orthologs is no higher than that of their neighboring linear exons. The previously proposed miR-7 sponge, CDR1as, is one of only two circRNAs with more miRNA sites than expected by chance, with the next best miRNA-sponge candidate deriving from a gene encoding a primate-specific zinc-finger protein, ZNF91. Conclusions: Our results provide a new framework for future investigation of this intriguing topological isoform while raising doubts regarding a biological function of most circRNAs.
••
TL;DR: It is concluded that SAD1 dynamically controls splicing efficiency and splice-site recognition in Arabidopsis, and it is proposed that this may contribute to S AD1-mediated stress tolerance through the metabolism of transcripts expressed from stress-responsive genes.
Abstract: Sm-like proteins are highly conserved proteins that form the core of the U6 ribonucleoprotein and function in several mRNA metabolism processes, including pre-mRNA splicing. Despite their wide occurrence in all eukaryotes, little is known about the roles of Sm-like proteins in the regulation of splicing. Here, through comprehensive transcriptome analyses, we demonstrate that depletion of the Arabidopsis supersensitive to abscisic acid and drought 1 gene (SAD1), which encodes Sm-like protein 5 (LSm5), promotes an inaccurate selection of splice sites that leads to a genome-wide increase in alternative splicing. In contrast, overexpression of SAD1 strengthens the precision of splice-site recognition and globally inhibits alternative splicing. Further, SAD1 modulates the splicing of stress-responsive genes, particularly under salt-stress conditions. Finally, we find that overexpression of SAD1 in Arabidopsis improves salt tolerance in transgenic plants, which correlates with an increase in splicing accuracy and efficiency for stress-responsive genes. We conclude that SAD1 dynamically controls splicing efficiency and splice-site recognition in Arabidopsis, and propose that this may contribute to SAD1-mediated stress tolerance through the metabolism of transcripts expressed from stress-responsive genes. Our study not only provides novel insights into the function of Sm-like proteins in splicing, but also uncovers new means to improve splicing efficiency and to enhance stress tolerance in a higher eukaryote.
••
TL;DR: It is concluded that independent and stochastic allelic transcription generates abundant random monoallelic expression in the mammalian cell.
Abstract: Expression from both alleles is generally observed in analyses of diploid cell populations, but studies addressing allelic expression patterns genome-wide in single cells are lacking. Here, we present global analyses of allelic expression across individual cells of mouse preimplantation embryos of mixed background (CAST/EiJ × C57BL/6J). We discovered abundant (12 to 24%) monoallelic expression of autosomal genes and that expression of the two alleles occurs independently. The monoallelic expression appeared random and dynamic because there was considerable variation among closely related embryonic cells. Similar patterns of monoallelic expression were observed in mature cells. Our allelic expression analysis also demonstrates the de novo inactivation of the paternal X chromosome. We conclude that independent and stochastic allelic transcription generates abundant random monoallelic expression in the mammalian cell.
••
TL;DR: A central role for RNA in human evolution and ontogeny is suggested and the emergence of the previously unsuspected world of regulatory RNA from a historical perspective is reviewed.
Abstract: Discoveries over the past decade portend a paradigm shift in molecular biology. Evidence suggests that RNA is not only functional as a messenger between DNA and protein but also involved in the regulation of genome organization and gene expression, which is increasingly elaborate in complex organisms. Regulatory RNA seems to operate at many levels; in particular, it plays an important part in the epigenetic processes that control differentiation and development. These discoveries suggest a central role for RNA in human evolution and ontogeny. Here, we review the emergence of the previously unsuspected world of regulatory RNA from a historical perspective.
20 Nov 2014
TL;DR: The laboratory mouse shares the majority of its protein-coding genes with humans, making it the premier model organism in biomedical research, yet the two mammals differ in significant ways.
Abstract: © 2014 Macmillan Publishers Limited. All rights reserved.The laboratory mouse shares the majority of its protein-coding genes with humans, making it the premier model organism in biomedical research, yet the two mammals differ in significant ways. To gain
••
TL;DR: A concise database for BLAST using a Bio-Edit interface that can detect AR genetic determinants in bacterial genomes and can rapidly and easily discover putative new AR geneticeterminants is created.
Abstract: ARG-ANNOT (Antibiotic Resistance Gene-ANNOTation) is a new bioinformatic tool that was created to detect existing and putative new antibiotic resistance (AR) genes in bacterial genomes. ARG-ANNOT uses a local BLAST program in Bio-Edit software that allows the user to analyze sequences without a Web interface. All AR genetic determinants were collected from published works and online resources; nucleotide and protein sequences were retrieved from the NCBI GenBank database. After building a database that includes 1,689 antibiotic resistance genes, the software was tested in a blind manner using 100 random sequences selected from the database to verify that the sensitivity and specificity were at 100% even when partial sequences were queried. Notably, BLAST analysis results obtained using the rmtF gene sequence (a new aminoglycoside-modifying enzyme gene sequence that is not included in the database) as a query revealed that the tool was able to link this sequence to short sequences (17 to 40 bp) found in other genes of the rmt family with significant E values. Finally, the analysis of 178 Acinetobacter baumannii and 20 Staphylococcus aureus genomes allowed the detection of a significantly higher number of AR genes than the Resfinder gene analyzer and 11 point mutations in target genes known to be associated with AR. The average time for the analysis of a genome was 3.35 ± 0.13 min. We have created a concise database for BLAST using a Bio-Edit interface that can detect AR genetic determinants in bacterial genomes and can rapidly and easily discover putative new AR genetic determinants.
••
United States Department of Energy1, North Dakota State University2, United States Department of Agriculture3, University of Georgia4, Institut national de la recherche agronomique5, Tennessee State University6, Colorado State University7, University of California, Davis8, Michigan State University9, University of Arizona10, University of Paris-Sud11, University of Nebraska–Lincoln12
TL;DR: 2 independent domestications from genetic pools that diverged before human colonization are confirmed and a set of genes linked with increased leaf and seed size are identified and combined with quantitative trait locus data from Mesoamerican cultivars.
Abstract: Common bean (Phaseolus vulgaris L.) is the most important grain legume for human consumption and has a role in sustainable agriculture owing to its ability to fix atmospheric nitrogen. We assembled 473 Mb of the 587-Mb genome and genetically anchored 98% of this sequence in 11 chromosome-scale pseudomolecules. We compared the genome for the common bean against the soybean genome to find changes in soybean resulting from polyploidy. Using resequencing of 60 wild individuals and 100 landraces from the genetically differentiated Mesoamerican and Andean gene pools, we confirmed 2 independent domestications from genetic pools that diverged before human colonization. Less than 10% of the 74 Mb of sequence putatively involved in domestication was shared by the two domestication events. We identified a set of genes linked with increased leaf and seed size and combined these results with quantitative trait locus data from Mesoamerican cultivars. Genes affected by domestication may be useful for genomics-enabled crop improvement.
••
TL;DR: The results demonstrate the potential for efficient loss-of-function screening using the CRISPR-Cas9 system and identify 27 known and 4 previously unknown genes implicated in these phenotypes.
Abstract: Identification of genes influencing a phenotype of interest is frequently achieved through genetic screening by RNA interference (RNAi) or knockouts. However, RNAi may only achieve partial depletion of gene activity, and knockout-based screens are difficult in diploid mammalian cells. Here we took advantage of the efficiency and high throughput of genome editing based on type II, clustered, regularly interspaced, short palindromic repeats (CRISPR)-CRISPR-associated (Cas) systems to introduce genome-wide targeted mutations in mouse embryonic stem cells (ESCs). We designed 87,897 guide RNAs (gRNAs) targeting 19,150 mouse protein-coding genes and used a lentiviral vector to express these gRNAs in ESCs that constitutively express Cas9. Screening the resulting ESC mutant libraries for resistance to either Clostridium septicum alpha-toxin or 6-thioguanine identified 27 known and 4 previously unknown genes implicated in these phenotypes. Our results demonstrate the potential for efficient loss-of-function screening using the CRISPR-Cas9 system.
••
TL;DR: It is shown here that new genetic information can be introduced site-specifically and with high efficiency by homology-directed repair (HDR) of Cas9-induced site- specific double-strand DNA breaks using timed delivery ofCas9-guide RNA ribonucleoprotein (RNP) complexes.
Abstract: The CRISPR/Cas9 system is a robust genome editing technology that works in human cells, animals and plants based on the RNA-programmed DNA cleaving activity of the Cas9 enzyme. Building on previous work (Jinek et al., 2013), we show here that new genetic information can be introduced site-specifically and with high efficiency by homology-directed repair (HDR) of Cas9-induced site-specific double-strand DNA breaks using timed delivery of Cas9-guide RNA ribonucleoprotein (RNP) complexes. Cas9 RNP-mediated HDR in HEK293T, human primary neonatal fibroblast and human embryonic stem cells was increased dramatically relative to experiments in unsynchronized cells, with rates of HDR up to 38% observed in HEK293T cells. Sequencing of on- and potential off-target sites showed that editing occurred with high fidelity, while cell mortality was minimized. This approach provides a simple and highly effective strategy for enhancing site-specific genome engineering in both transformed and primary human cells.
••
Harvard University1, Yale University2, Broad Institute3, Baylor College of Medicine4, Beth Israel Deaconess Medical Center5, Wellcome Trust Sanger Institute6, Icahn School of Medicine at Mount Sinai7, University of Texas Health Science Center at Houston8, University of Illinois at Chicago9, University of Pennsylvania10, Vanderbilt University11, University of Pittsburgh12, Carnegie Mellon University13
TL;DR: This model is used to identify ∼1,000 genes that are significantly lacking in functional coding variation in non-ASD samples and are enriched for de novo loss-of-function mutations identified in ASD cases, suggesting that the role of de noVO mutations in ASDs might reside in fundamental neurodevelopmental processes.
Abstract: Mark Daly and colleagues present a statistical framework to evaluate the role of de novo mutations in human disease by calibrating a model of de novo mutation rates at the individual gene level. The mutation probabilities defined by their model and list of constrained genes can be used to help identify genetic variants that have a significant role in disease.
••
TL;DR: A two-state model for Cas9 binding and cleavage is proposed, in which a seed match triggers binding but extensive pairing with target DNA is required for cleavage.
Abstract: Bacterial type II CRISPR-Cas9 systems have been widely adapted for RNA-guided genome editing and transcription regulation in eukaryotic cells, yet their in vivo target specificity is poorly understood. Here we mapped genome-wide binding sites of a catalytically inactive Cas9 (dCas9) from Streptococcus pyogenes loaded with single guide RNAs (sgRNAs) in mouse embryonic stem cells (mESCs). Each of the four sgRNAs we tested targets dCas9 to between tens and thousands of genomic sites, frequently characterized by a 5-nucleotide seed region in the sgRNA and an NGG protospacer adjacent motif (PAM). Chromatin inaccessibility decreases dCas9 binding to other sites with matching seed sequences; thus 70% of off-target sites are associated with genes. Targeted sequencing of 295 dCas9 binding sites in mESCs transfected with catalytically active Cas9 identified only one site mutated above background levels. We propose a two-state model for Cas9 binding and cleavage, in which a seed match triggers binding but extensive pairing with target DNA is required for cleavage.
••
TL;DR: FISSEQ is compatible with tissue sections and whole-mount embryos and reduces the limitations of optical resolution and noisy signals on single-molecule detection, and can be used to investigate cellular phenotype, gene regulation, and environment in situ.
Abstract: Understanding the spatial organization of gene expression with single-nucleotide resolution requires localizing the sequences of expressed RNA transcripts within a cell in situ. Here, we describe fluorescent in situ RNA sequencing (FISSEQ), in which stably cross-linked complementary DNA (cDNA) amplicons are sequenced within a biological sample. Using 30-base reads from 8102 genes in situ, we examined RNA expression and localization in human primary fibroblasts with a simulated wound-healing assay. FISSEQ is compatible with tissue sections and whole-mount embryos and reduces the limitations of optical resolution and noisy signals on single-molecule detection. Our platform enables massively parallel detection of genetic elements, including gene transcripts and molecular barcodes, and can be used to investigate cellular phenotype, gene regulation, and environment in situ.
••
Crops Research Institute1, Australian Centre for Plant Functional Genomics2, Agriculture and Agri-Food Canada3, Purdue University4, Plant Genome Mapping Laboratory5, Southwest University6, University of York7, Seoul National University8, Southern Cross University9, University of Missouri10, Centre national de la recherche scientifique11, Huazhong Agricultural University12, Hunan Agricultural University13, University of Queensland14, National Research Council15, Central University, India16, Sahmyook University17, King Abdulaziz University18
TL;DR: A draft genome sequence of Brassica oleracea is described, comparing it with that of its sister species B. rapa to reveal numerous chromosome rearrangements and asymmetrical gene loss in duplicated genomic blocks.
Abstract: Polyploidization has provided much genetic variation for plant adaptive evolution, but the mechanisms by which the molecular evolution of polyploid genomes establishes genetic architecture underlying species differentiation are unclear Brassica is an ideal model to increase knowledge of polyploid evolution Here we describe a draft genome sequence of Brassica oleracea, comparing it with that of its sister species B rapa to reveal numerous chromosome rearrangements and asymmetrical gene loss in duplicated genomic blocks, asymmetrical amplification of transposable elements, differential gene co-retention for specific pathways and variation in gene expression, including alternative splicing, among a large number of paralogous and orthologous genes Genes related to the production of anticancer phytochemicals and morphological variations illustrate consequences of genome duplication and gene divergence, imparting biochemical and morphological variation to B oleracea This study provides insights into Brassica genome evolution and will underpin research into the many important crops in this genus