
Showing papers on "Gene" published in 2014


Journal ArticleDOI
TL;DR: FeatureCounts as discussed by the authors is a read summarization program suitable for counting reads generated from either RNA or genomic DNA sequencing experiments, which implements highly efficient chromosome hashing and feature blocking techniques.
Abstract: MOTIVATION: Next-generation sequencing technologies generate millions of short sequence reads, which are usually aligned to a reference genome. In many applications, the key information required for downstream analysis is the number of reads mapping to each genomic feature, for example to each exon or each gene. The process of counting reads is called read summarization. Read summarization is required for a great variety of genomic analyses but has so far received relatively little attention in the literature. RESULTS: We present featureCounts, a read summarization program suitable for counting reads generated from either RNA or genomic DNA sequencing experiments. featureCounts implements highly efficient chromosome hashing and feature blocking techniques. It is considerably faster than existing methods (by an order of magnitude for gene-level summarization) and requires far less computer memory. It works with either single or paired-end reads and provides a wide range of options appropriate for different sequencing applications. AVAILABILITY AND IMPLEMENTATION: featureCounts is available under GNU General Public License as part of the Subread (http://subread.sourceforge.net) or Rsubread (http://www.bioconductor.org) software packages.

14,103 citations
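The read summarization described in the featureCounts abstract above can be illustrated with a minimal sketch. This is not the featureCounts implementation: the function names, the `bin_size` value, and the rule of discarding reads that overlap more than one gene are assumptions made for illustration. It hashes features by chromosome and groups them into fixed-width blocks so that each read is compared only against nearby features.

```python
from collections import defaultdict


def build_index(features, bin_size=1000):
    """Index features as chrom -> block number -> [(start, end, gene_id)].

    `features` is an iterable of (gene_id, chrom, start, end) with
    inclusive coordinates. `bin_size` is a hypothetical block width.
    """
    index = defaultdict(lambda: defaultdict(list))
    for gene_id, chrom, start, end in features:
        # Register the feature in every block it spans.
        for b in range(start // bin_size, end // bin_size + 1):
            index[chrom][b].append((start, end, gene_id))
    return index


def count_reads(index, reads, bin_size=1000):
    """Count reads per gene; `reads` is an iterable of (chrom, start, end).

    Assumption for this sketch: a read overlapping zero genes or more
    than one gene is not counted.
    """
    counts = defaultdict(int)
    for chrom, start, end in reads:
        blocks = index.get(chrom)  # chromosome hash lookup
        if blocks is None:
            continue
        hits = set()
        for b in range(start // bin_size, end // bin_size + 1):
            for f_start, f_end, gene_id in blocks.get(b, ()):
                # Two inclusive intervals overlap if each starts
                # before the other ends.
                if f_start <= end and start <= f_end:
                    hits.add(gene_id)
        if len(hits) == 1:
            counts[hits.pop()] += 1
    return dict(counts)


features = [
    ("geneA", "chr1", 100, 500),
    ("geneB", "chr1", 450, 900),
    ("geneC", "chr2", 100, 200),
]
reads = [
    ("chr1", 120, 170),   # geneA only
    ("chr1", 460, 480),   # geneA and geneB: ambiguous, skipped
    ("chr1", 600, 650),   # geneB only
    ("chr2", 150, 250),   # geneC only
    ("chr3", 1, 50),      # chromosome without features
]
result = count_reads(build_index(features), reads)
```

The block size trades index memory against the number of interval comparisons per read; the value used here is arbitrary.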


Journal ArticleDOI
03 Jan 2014-Science
TL;DR: This work shows that lentiviral delivery of a genome-scale CRISPR-Cas9 knockout (GeCKO) library targeting 18,080 genes with 64,751 unique guide sequences enables both negative and positive selection screening in human cells, and observes a high level of consistency between independent guide RNAs targeting the same gene and a high rate of hit confirmation.
Abstract: The simplicity of programming the CRISPR (clustered regularly interspaced short palindromic repeats)–associated nuclease Cas9 to modify specific genomic loci suggests a new way to interrogate gene function on a genome-wide scale. We show that lentiviral delivery of a genome-scale CRISPR-Cas9 knockout (GeCKO) library targeting 18,080 genes with 64,751 unique guide sequences enables both negative and positive selection screening in human cells. First, we used the GeCKO library to identify genes essential for cell viability in cancer and pluripotent stem cells. Next, in a melanoma model, we screened for genes whose loss is involved in resistance to vemurafenib, a therapeutic RAF inhibitor. Our highest-ranking candidates include previously validated genes NF1 and MED12 , as well as novel hits NF2 , CUL3 , TADA2B , and TADA1. We observe a high level of consistency between independent guide RNAs targeting the same gene and a high rate of hit confirmation, demonstrating the promise of genome-scale screening with Cas9.

4,147 citations


Journal ArticleDOI
01 Jan 2014-Nature
TL;DR: In this paper, the authors report molecular profiling of 230 resected lung adenocarcinomas using messenger RNA, microRNA and DNA sequencing integrated with copy number, methylation and proteomic analyses.
Abstract: Adenocarcinoma of the lung is the leading cause of cancer death worldwide. Here we report molecular profiling of 230 resected lung adenocarcinomas using messenger RNA, microRNA and DNA sequencing integrated with copy number, methylation and proteomic analyses. High rates of somatic mutation were seen (mean 8.9 mutations per megabase). Eighteen genes were statistically significantly mutated, including RIT1 activating mutations and newly described loss-of-function MGA mutations which are mutually exclusive with focal MYC amplification. EGFR mutations were more frequent in female patients, whereas mutations in RBM10 were more common in males. Aberrations in NF1, MET, ERBB2 and RIT1 occurred in 13% of cases and were enriched in samples otherwise lacking an activated oncogene, suggesting a driver role for these events in certain tumours. DNA and mRNA sequence from the same tumour highlighted splicing alterations driven by somatic genomic changes, including exon 14 skipping in MET mRNA in 4% of cases. MAPK and PI(3)K pathway activity, when measured at the protein level, was explained by known mutations in only a fraction of cases, suggesting additional, unexplained mechanisms of pathway activation. These data establish a foundation for classification and further investigations of lung adenocarcinoma molecular pathogenesis.

4,104 citations


Journal ArticleDOI
TL;DR: The authors' data provide clues as to how neurons and astrocytes differ in their ability to dynamically regulate glycolytic flux and lactate generation attributable to unique splicing of PKM2, the gene encoding the glycolytic enzyme pyruvate kinase.
Abstract: The major cell classes of the brain differ in their developmental processes, metabolism, signaling, and function. To better understand the functions and interactions of the cell types that comprise these classes, we acutely purified representative populations of neurons, astrocytes, oligodendrocyte precursor cells, newly formed oligodendrocytes, myelinating oligodendrocytes, microglia, endothelial cells, and pericytes from mouse cerebral cortex. We generated a transcriptome database for these eight cell types by RNA sequencing and used a sensitive algorithm to detect alternative splicing events in each cell type. Bioinformatic analyses identified thousands of new cell type-enriched genes and splicing isoforms that will provide novel markers for cell identification, tools for genetic manipulation, and insights into the biology of the brain. For example, our data provide clues as to how neurons and astrocytes differ in their ability to dynamically regulate glycolytic flux and lactate generation attributable to unique splicing of PKM2, the gene encoding the glycolytic enzyme pyruvate kinase. This dataset will provide a powerful new resource for understanding the development and function of the brain. To ensure the widespread distribution of these datasets, we have created a user-friendly website (http://web.stanford.edu/group/barres_lab/brain_rnaseq.html) that provides a platform for analyzing and comparing transcription and alternative splicing profiles for various cell classes in the brain.

3,891 citations


01 Jul 2014
TL;DR: High rates of somatic mutation were seen, including RIT1 activating mutations and newly described loss-of-function MGA mutations which are mutually exclusive with focal MYC amplification, and MAPK and PI(3)K pathway activity was explained by known mutations in only a fraction of cases, suggesting additional, unexplained mechanisms of pathway activation.
Abstract: Adenocarcinoma of the lung is the leading cause of cancer death worldwide. Here we report molecular profiling of 230 resected lung adenocarcinomas using messenger RNA, microRNA and DNA sequencing integrated with copy number, methylation and proteomic analyses. High rates of somatic mutation were seen (mean 8.9 mutations per megabase). Eighteen genes were statistically significantly mutated, including RIT1 activating mutations and newly described loss-of-function MGA mutations which are mutually exclusive with focal MYC amplification. EGFR mutations were more frequent in female patients, whereas mutations in RBM10 were more common in males. Aberrations in NF1, MET, ERBB2 and RIT1 occurred in 13% of cases and were enriched in samples otherwise lacking an activated oncogene, suggesting a driver role for these events in certain tumours. DNA and mRNA sequence from the same tumour highlighted splicing alterations driven by somatic genomic changes, including exon 14 skipping in MET mRNA in 4% of cases. MAPK and PI(3)K pathway activity, when measured at the protein level, was explained by known mutations in only a fraction of cases, suggesting additional, unexplained mechanisms of pathway activation. These data establish a foundation for classification and further investigations of lung adenocarcinoma molecular pathogenesis.

2,847 citations


Journal ArticleDOI
23 Jan 2014-Nature
TL;DR: It is found that large-scale genomic analysis can identify nearly all known cancer genes in these cancer types and 33 genes that were not previously known to be significantly mutated in cancer, including genes related to proliferation, apoptosis, genome stability, chromatin regulation, immune evasion, RNA processing and protein homeostasis.
Abstract: Although a few cancer genes are mutated in a high proportion of tumours of a given type (>20%), most are mutated at intermediate frequencies (2–20%). To explore the feasibility of creating a comprehensive catalogue of cancer genes, we analysed somatic point mutations in exome sequences from 4,742 human cancers and their matched normal-tissue samples across 21 cancer types. We found that large-scale genomic analysis can identify nearly all known cancer genes in these tumour types. Our analysis also identified 33 genes that were not previously known to be significantly mutated in cancer, including genes related to proliferation, apoptosis, genome stability, chromatin regulation, immune evasion, RNA processing and protein homeostasis. Down-sampling analysis indicates that larger sample sizes will reveal many more genes mutated at clinically important frequencies. We estimate that near-saturation may be achieved with 600–5,000 samples per tumour type, depending on background mutation frequency. The results may help to guide the next stage of cancer genomics. Comprehensive knowledge of the genes underlying human cancers is a critical foundation for cancer diagnostics, therapeutics, clinical-trial design and selection of rational combination therapies. It is now possible to use genomic analysis to identify cancer genes in an unbiased fashion, based on the presence of somatic mutations at a rate significantly higher than the expected background level. Systematic studies have revealed many new cancer genes, as well as new classes of cancer genes 1,2. They have also made clear that, although some cancer genes are mutated at high frequencies, most cancer genes in most patients occur at intermediate frequencies (2–20%) or lower. Accordingly, a complete catalogue of mutations in this frequency class will be essential for recognizing dysregulated pathways and optimal targets for therapeutic intervention.
However, recent work suggests major gaps in our knowledge of cancer genes of intermediate frequency. For example, a study of 183 lung adenocarcinomas 3 found that 15% of patients lacked even a single mutation affecting any of the 10 known hallmarks of cancer, and 38% had 3 or fewer such mutations. In this paper, we analysed somatic point mutations (substitutions and small insertion and deletions) in nearly 5,000 human cancers and their matched normal-tissue samples (‘tumour–normal pairs’) across 21 tumour types. The questions that we examine here are: first, whether large-scale genomic analysis across tumour types can reliably identify all known cancer genes; second, whether it will reveal many new candidate cancer genes; and third, how far we are from having a complete catalogue of cancer genes (at least those of intermediate frequency). We used rigorous statistical methods to enumerate candidate cancer genes and then carefully inspected each gene to identify those with strong biological connections to cancer and mutational patterns consistent with the expected function. The analysis reveals nearly all known cancer genes and revealed 33 novel candidates, including genes related to proliferation, apoptosis, genome stability, chromatin regulation, immune evasion, RNA processing and protein homeostasis. Importantly, the data show that the

2,565 citations



Journal ArticleDOI
03 Jan 2014-Science
TL;DR: In this paper, a pooled, loss-of-function genetic screening approach suitable for both positive and negative selection that uses a genome-scale lentiviral single-guide RNA (sgRNA) library was described.
Abstract: The bacterial clustered regularly interspaced short palindromic repeats (CRISPR)–Cas9 system for genome editing has greatly expanded the toolbox for mammalian genetics, enabling the rapid generation of isogenic cell lines and mice with modified alleles. Here, we describe a pooled, loss-of-function genetic screening approach suitable for both positive and negative selection that uses a genome-scale lentiviral single-guide RNA (sgRNA) library. sgRNA expression cassettes were stably integrated into the genome, which enabled a complex mutant pool to be tracked by massively parallel sequencing. We used a library containing 73,000 sgRNAs to generate knockout collections and performed screens in two human cell lines. A screen for resistance to the nucleotide analog 6-thioguanine identified all expected members of the DNA mismatch repair pathway, whereas another for the DNA topoisomerase II ( TOP2A ) poison etoposide identified TOP2A , as expected, and also cyclin-dependent kinase 6, CDK6. A negative selection screen for essential genes identified numerous gene sets corresponding to fundamental processes. Last, we show that sgRNA efficiency is associated with specific sequence motifs, enabling the prediction of more effective sgRNAs. Collectively, these results establish Cas9/sgRNA screens as a powerful tool for systematic genetic analysis in mammalian cells.

2,487 citations


Journal ArticleDOI
TL;DR: The biological barriers to gene delivery in vivo are introduced and recent advances in material sciences, nanotechnology and nucleic acid chemistry that have yielded promising non-viral delivery systems are discussed, some of which are currently undergoing testing in clinical trials.
Abstract: Gene-based therapy is the intentional modulation of gene expression in specific cells to treat pathological conditions. This modulation is accomplished by introducing exogenous nucleic acids such as DNA, mRNA, small interfering RNA (siRNA), microRNA (miRNA) or antisense oligonucleotides. Given the large size and the negative charge of these macromolecules, their delivery is typically mediated by carriers or vectors. In this Review, we introduce the biological barriers to gene delivery in vivo and discuss recent advances in material sciences, nanotechnology and nucleic acid chemistry that have yielded promising non-viral delivery systems, some of which are currently undergoing testing in clinical trials. The diversity of these systems highlights the recent progress of gene-based therapy using non-viral approaches.

2,460 citations


Journal ArticleDOI
Silvia De Rubeis1, Xin-Xin He2, Arthur P. Goldberg1, Christopher S. Poultney1, Kaitlin E. Samocha3, A. Ercument Cicek2, Yan Kou1, Li Liu2, Menachem Fromer1, Menachem Fromer3, R. Susan Walker4, Tarjinder Singh5, Lambertus Klei6, Jack A. Kosmicki3, Shih-Chen Fu1, Branko Aleksic7, Monica Biscaldi8, Patrick Bolton9, Jessica M. Brownfeld1, Jinlu Cai1, Nicholas G. Campbell10, Angel Carracedo11, Angel Carracedo12, Maria H. Chahrour3, Andreas G. Chiocchetti, Hilary Coon13, Emily L. Crawford10, Lucy Crooks5, Sarah Curran9, Geraldine Dawson14, Eftichia Duketis, Bridget A. Fernandez15, Louise Gallagher16, Evan T. Geller17, Stephen J. Guter18, R. Sean Hill19, R. Sean Hill3, Iuliana Ionita-Laza20, Patricia Jiménez González, Helena Kilpinen, Sabine M. Klauck21, Alexander Kolevzon1, Irene Lee22, Jing Lei2, Terho Lehtimäki, Chiao-Feng Lin17, Avi Ma'ayan1, Christian R. Marshall4, Alison L. McInnes23, Benjamin M. Neale24, Michael John Owen25, Norio Ozaki7, Mara Parellada26, Jeremy R. Parr27, Shaun Purcell1, Kaija Puura, Deepthi Rajagopalan4, Karola Rehnström5, Abraham Reichenberg1, Aniko Sabo28, Michael Sachse, Stephen Sanders29, Chad M. Schafer2, Martin Schulte-Rüther30, David Skuse22, David Skuse31, Christine Stevens24, Peter Szatmari32, Kristiina Tammimies4, Otto Valladares17, Annette Voran33, Li-San Wang17, Lauren A. Weiss29, A. Jeremy Willsey29, Timothy W. Yu3, Timothy W. Yu19, Ryan K. C. Yuen4, Edwin H. Cook18, Christine M. Freitag, Michael Gill16, Christina M. Hultman34, Thomas Lehner35, Aarno Palotie36, Aarno Palotie3, Aarno Palotie24, Gerard D. Schellenberg17, Pamela Sklar1, Matthew W. State29, James S. Sutcliffe10, Christopher A. Walsh3, Christopher A. Walsh19, Stephen W. Scherer4, Michael E. Zwick37, Jeffrey C. Barrett5, David J. Cutler37, Kathryn Roeder2, Bernie Devlin6, Mark J. Daly3, Mark J. Daly24, Joseph D. Buxbaum1 
13 Nov 2014-Nature
TL;DR: Using exome sequencing, it is shown that analysis of rare coding variation in 3,871 autism cases and 9,937 ancestry-matched or parental controls implicates 22 autosomal genes at a false discovery rate of < 0.05, plus a set of 107 genes strongly enriched for those likely to affect risk (FDR < 0.30).
Abstract: The genetic architecture of autism spectrum disorder involves the interplay of common and rare variants and their impact on hundreds of genes. Using exome sequencing, here we show that analysis of rare coding variation in 3,871 autism cases and 9,937 ancestry-matched or parental controls implicates 22 autosomal genes at a false discovery rate (FDR) < 0.05, plus a set of 107 autosomal genes strongly enriched for those likely to affect risk (FDR < 0.30). These 107 genes, which show unusual evolutionary constraint against mutations, incur de novo loss-of-function mutations in over 5% of autistic subjects. Many of the genes implicated encode proteins for synaptic formation, transcriptional regulation and chromatin-remodelling pathways. These include voltage-gated ion channels regulating the propagation of action potentials, pacemaking and excitability-transcription coupling, as well as histone-modifying enzymes and chromatin remodellers, most prominently those that mediate post-translational lysine methylation/demethylation modifications of histones.

2,228 citations


Journal ArticleDOI
Andrew R. Wood1, Tõnu Esko2, Jian Yang3, Sailaja Vedantam4, +441 more; Institutions (132)
TL;DR: This article identified 697 variants at genome-wide significance that together explained one-fifth of the heritability for adult height, and all common variants together captured 60% of heritability.
Abstract: Using genome-wide data from 253,288 individuals, we identified 697 variants at genome-wide significance that together explained one-fifth of the heritability for adult height. By testing different numbers of variants in independent studies, we show that the most strongly associated ∼2,000, ∼3,700 and ∼9,500 SNPs explained ∼21%, ∼24% and ∼29% of phenotypic variance. Furthermore, all common variants together captured 60% of heritability. The 697 variants clustered in 423 loci were enriched for genes, pathways and tissue types known to be involved in growth and together implicated genes and pathways not highlighted in earlier efforts, such as signaling by fibroblast growth factors, WNT/β-catenin and chondroitin sulfate-related genes. We identified several genes and pathways not previously connected with human skeletal growth, including mTOR, osteoglycin and binding of hyaluronic acid. Our results indicate a genetic architecture for human height that is characterized by a very large but finite number (thousands) of causal variants.

Journal ArticleDOI
27 Mar 2014-Nature
TL;DR: The authors mapped transcription start sites (TSSs) and their usage in human and mouse primary cells, cell lines and tissues to produce a comprehensive overview of mammalian gene expression across the human body.
Abstract: Regulated transcription controls the diversity, developmental pathways and spatial organization of the hundreds of cell types that make up a mammal. Using single-molecule cDNA sequencing, we mapped transcription start sites (TSSs) and their usage in human and mouse primary cells, cell lines and tissues to produce a comprehensive overview of mammalian gene expression across the human body. We find that few genes are truly 'housekeeping', whereas many mammalian promoters are composite entities composed of several closely separated TSSs, with independent cell-type-specific expression profiles. TSSs specific to different cell types evolve at different rates, whereas promoters of broadly expressed genes are the most conserved. Promoter-based expression analysis reveals key transcription factors defining cell states and links them to binding-site motifs. The functions of identified novel transcripts can be predicted by coexpression and sample ontology enrichment analyses. The functional annotation of the mammalian genome 5 (FANTOM5) project provides comprehensive expression profiles and functional annotation of mammalian cell-type-specific transcriptomes with wide applications in biomedical research.

Journal ArticleDOI
TL;DR: High-resolution multiorgan expression data is generated showing that nearly half of all genes in the mouse genome oscillate with circadian rhythm somewhere in the body, and the majority of best-selling drugs and World Health Organization essential medicines directly target the products of rhythmic genes.
Abstract: To characterize the role of the circadian clock in mouse physiology and behavior, we used RNA-seq and DNA arrays to quantify the transcriptomes of 12 mouse organs over time. We found 43% of all protein coding genes showed circadian rhythms in transcription somewhere in the body, largely in an organ-specific manner. In most organs, we noticed the expression of many oscillating genes peaked during transcriptional “rush hours” preceding dawn and dusk. Looking at the genomic landscape of rhythmic genes, we saw that they clustered together, were longer, and had more spliceforms than nonoscillating genes. Systems-level analysis revealed intricate rhythmic orchestration of gene pathways throughout the body. We also found oscillations in the expression of more than 1,000 known and novel noncoding RNAs (ncRNAs). Supporting their potential role in mediating clock function, ncRNAs conserved between mouse and human showed rhythmic expression in similar proportions as protein coding genes. Importantly, we also found that the majority of best-selling drugs and World Health Organization essential medicines directly target the products of rhythmic genes. Many of these drugs have short half-lives and may benefit from timed dosage. In sum, this study highlights critical, systemic, and surprising roles of the mammalian circadian clock and provides a blueprint for advancement in chronotherapy.

Journal ArticleDOI
TL;DR: The integrated gene catalog (IGC), comprising 9,879,896 genes, is established; it includes close-to-complete sets of genes for most gut microbes, which are also of considerably higher quality than in previous catalogs.
Abstract: Many analyses of the human gut microbiome depend on a catalog of reference genes. Existing catalogs for the human gut microbiome are based on samples from single cohorts or on reference genomes or protein sequences, which limits coverage of global microbiome diversity. Here we combined 249 newly sequenced samples of the Metagenomics of the Human Intestinal Tract (MetaHit) project with 1,018 previously sequenced samples to create a cohort from three continents that is at least threefold larger than cohorts used for previous gene catalogs. From this we established the integrated gene catalog (IGC) comprising 9,879,896 genes. The catalog includes close-to-complete sets of genes for most gut microbes, which are also of considerably higher quality than in previous catalogs. Analyses of a group of samples from Chinese and Danish individuals using the catalog revealed country-specific gut microbial signatures. This expanded catalog should facilitate quantitative characterization of metagenomic, metatranscriptomic and metaproteomic data from the gut microbiome to understand its variation across populations in human health and disease.

Journal ArticleDOI
Klaus F. X. Mayer, Jane Rogers, Jaroslav Doležel1, Curtis J. Pozniak2, Kellye Eversole, Catherine Feuillet3, Bikram S. Gill4, Bernd Friebe4, Adam J. Lukaszewski5, Pierre Sourdille6, Takashi R. Endo7, M. Kubaláková1, Jarmila Číhalíková1, Zdeňka Dubská1, Jan Vrána1, Romana Šperková1, Hana Šimková1, Melanie Febrer8, Leah Clissold, Kirsten McLay, Kuldeep Singh9, Parveen Chhuneja9, Nagendra K. Singh10, Jitendra P. Khurana11, Eduard Akhunov4, Frédéric Choulet6, Adriana Alberti, Valérie Barbe, Patrick Wincker, Hiroyuki Kanamori12, Fuminori Kobayashi12, Takeshi Itoh12, Takashi Matsumoto12, Hiroaki Sakai12, Tsuyoshi Tanaka12, Jianzhong Wu12, Yasunari Ogihara13, Hirokazu Handa12, P. Ron Maclachlan2, Andrew G. Sharpe14, Darrin Klassen14, David Edwards, Jacqueline Batley, Odd-Arne Olsen, Simen Rød Sandve15, Sigbjørn Lien15, Burkhard Steuernagel16, Brande B. H. Wulff16, Mario Caccamo, Sarah Ayling, Ricardo H. Ramirez-Gonzalez, Bernardo J. Clavijo, Jonathan M. Wright, Matthias Pfeifer, Manuel Spannagl, Mihaela Martis, Martin Mascher17, Jarrod Chapman18, Jesse Poland4, Uwe Scholz17, Kerrie Barry18, Robbie Waugh19, Daniel S. Rokhsar18, Gary J. Muehlbauer, Nils Stein17, Heidrun Gundlach, Matthias Zytnicki20, Véronique Jamilloux20, Hadi Quesneville20, Thomas Wicker21, Primetta Faccioli, Moreno Colaiacovo, Antonio Michele Stanca, Hikmet Budak22, Luigi Cattivelli, Natasha Glover6, Lise Pingault6, Etienne Paux6, Sapna Sharma, Rudi Appels23, Matthew I. Bellgard23, Brett Chapman23, Thomas Nussbaumer, Kai Christian Bader, Hélène Rimbert, Shichen Wang4, Ron Knox, Andrzej Kilian, Michael Alaux20, Françoise Alfama20, Loïc Couderc20, Nicolas Guilhot6, Claire Viseux20, Mikaël Loaec20, Beat Keller21, Sébastien Praud 
18 Jul 2014-Science
TL;DR: Insights into the genome biology of a polyploid crop provide a springboard for faster gene isolation, rapid genetic marker development, and precise breeding to meet the needs of increasing food demand worldwide.
Abstract: An ordered draft sequence of the 17-gigabase hexaploid bread wheat (Triticum aestivum) genome has been produced by sequencing isolated chromosome arms. We have annotated 124,201 gene loci distributed nearly evenly across the homeologous chromosomes and subgenomes. Comparative gene analysis of wheat subgenomes and extant diploid and tetraploid wheat relatives showed that high sequence similarity and structural conservation are retained, with limited gene loss, after polyploidization. However, across the genomes there was evidence of dynamic gene gain, loss, and duplication since the divergence of the wheat lineages. A high degree of transcriptional autonomy and no global dominance was found for the subgenomes. These insights into the genome biology of a polyploid crop provide a springboard for faster gene isolation, rapid genetic marker development, and precise breeding to meet the needs of increasing food demand worldwide.

Journal ArticleDOI
TL;DR: An online tool for the design of highly active sgRNAs for any gene of interest is provided, including a further optimization of the protospacer-adjacent motif (PAM) of Streptococcus pyogenes Cas9.
Abstract: Components of the prokaryotic clustered, regularly interspaced, short palindromic repeats (CRISPR) loci have recently been repurposed for use in mammalian cells. The CRISPR-associated (Cas)9 can be programmed with a single guide RNA (sgRNA) to generate site-specific DNA breaks, but there are few known rules governing on-target efficacy of this system. We created a pool of sgRNAs, tiling across all possible target sites of a panel of six endogenous mouse and three endogenous human genes and quantitatively assessed their ability to produce null alleles of their target gene by antibody staining and flow cytometry. We discovered sequence features that improved activity, including a further optimization of the protospacer-adjacent motif (PAM) of Streptococcus pyogenes Cas9. The results from 1,841 sgRNAs were used to construct a predictive model of sgRNA activity to improve sgRNA design for gene editing and genetic screens. We provide an online tool for the design of highly active sgRNAs for any gene of interest.

Journal ArticleDOI
Feng Yue1, Feng Yue2, Yong Cheng3, Alessandra Breschi, Jeff Vierstra4, Weisheng Wu5, Weisheng Wu2, Tyrone Ryba6, Tyrone Ryba7, Richard Sandstrom4, Zhihai Ma3, Carrie A. Davis8, Benjamin D. Pope7, Yin Shen1, Dmitri D. Pervouchine, Sarah Djebali, Robert E. Thurman4, Rajinder Kaul4, Eric Rynes4, Anthony Kirilusha9, Georgi K. Marinov9, Brian A. Williams9, Diane Trout9, Henry Amrhein9, Katherine I. Fisher-Aylor9, Igor Antoshechkin9, Gilberto DeSalvo9, Lei Hoon See8, Meagan Fastuca8, Jorg Drenkow8, Chris Zaleski8, Alexander Dobin8, Pablo Prieto, Julien Lagarde, Giovanni Bussotti, Andrea Tanzer10, Olgert Denas11, Kanwei Li11, M. A. Bender4, M. A. Bender12, Miaohua Zhang12, Rachel Byron12, Mark Groudine12, Mark Groudine4, David McCleary1, Long Pham1, Zhen Ye1, Samantha Kuan1, Lee Edsall1, Yi-Chieh Wu13, Matthew D. Rasmussen13, Mukul S. Bansal13, Manolis Kellis14, Manolis Kellis13, Cheryl A. Keller2, Christapher S. Morrissey2, Tejaswini Mishra2, Deepti Jain2, Nergiz Dogan2, Robert S. Harris2, Philip Cayting3, Trupti Kawli3, Alan P. Boyle3, Alan P. Boyle5, Ghia Euskirchen3, Anshul Kundaje3, Shin Lin3, Yiing Lin3, Camden Jansen15, Venkat S. Malladi3, Melissa S. Cline16, Drew T. Erickson3, Vanessa M. Kirkup16, Katrina Learned16, Cricket A. Sloan3, Kate R. Rosenbloom16, Beatriz Lacerda de Sousa17, Kathryn Beal, Miguel Pignatelli, Paul Flicek, Jin Lian18, Tamer Kahveci19, Dongwon Lee20, W. James Kent16, Miguel Santos17, Javier Herrero21, Cedric Notredame, Audra K. Johnson4, Shinny Vong4, Kristen Lee4, Daniel Bates4, Fidencio Neri4, Morgan Diegel4, Theresa K. Canfield4, Peter J. Sabo4, Matthew S. Wilken4, Thomas A. Reh4, Erika Giste4, Anthony Shafer4, Tanya Kutyavin4, Eric Haugen4, Douglas Dunn4, Alex Reynolds4, Shane Neph4, Richard Humbert4, R. Scott Hansen4, Marella F. T. R. de Bruijn22, Licia Selleri23, Alexander Y. Rudensky24, Steven Z. Josefowicz24, Robert M. Samstein24, Evan E. Eichler4, Stuart H. Orkin25, Dana N. 
Levasseur26, Thalia Papayannopoulou4, Kai Hsin Chang4, Arthur I. Skoultchi27, Srikanta Gosh27, Christine M. Disteche4, Piper M. Treuting4, Yanli Wang2, Mitchell J. Weiss, Gerd A. Blobel28, Xiaoyi Cao1, Sheng Zhong1, Ting Wang29, Peter J. Good30, Rebecca F. Lowdon29, Rebecca F. Lowdon30, Leslie B. Adams30, Leslie B. Adams31, Xiao Qiao Zhou30, Michael J. Pazin30, Elise A. Feingold30, Barbara J. Wold9, James Taylor11, Ali Mortazavi15, Sherman M. Weissman18, John A. Stamatoyannopoulos4, Michael Snyder3, Roderic Guigó, Thomas R. Gingeras8, David M. Gilbert7, Ross C. Hardison2, Michael A. Beer20, Bing Ren1 
20 Nov 2014-Nature
TL;DR: The mouse ENCODE Consortium has mapped transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications and replication domains throughout the mouse genome in diverse cell and tissue types as mentioned in this paper.
Abstract: The laboratory mouse shares the majority of its protein-coding genes with humans, making it the premier model organism in biomedical research, yet the two mammals differ in significant ways. To gain greater insights into both shared and species-specific transcriptional and cellular regulatory programs in the mouse, the Mouse ENCODE Consortium has mapped transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications and replication domains throughout the mouse genome in diverse cell and tissue types. By comparing with the human genome, we not only confirm substantial conservation in the newly annotated potential functional sequences, but also find a large degree of divergence of sequences involved in transcriptional regulation, chromatin state and higher order chromatin organization. Our results illuminate the wide range of evolutionary forces acting on genes and their regulatory regions, and provide a general resource for research into mammalian biology and mechanisms of human diseases.

Journal ArticleDOI
TL;DR: A computational pipeline to identify circRNAs and quantify their relative abundance from RNA-seq data is developed, providing a new framework for future investigation of this intriguing topological isoform while raising doubts regarding a biological function of most circRNAs.
Abstract: Background: The recent reports of two circular RNAs (circRNAs) with strong potential to act as microRNA (miRNA) sponges suggest that circRNAs might play important roles in regulating gene expression. However, the global properties of circRNAs are not well understood. Results: We developed a computational pipeline to identify circRNAs and quantify their relative abundance from RNA-seq data. Applying this pipeline to a large set of non-poly(A)-selected RNA-seq data from the ENCODE project, we annotated 7,112 human circRNAs that were estimated to comprise at least 10% of the transcripts accumulating from their loci. Most circRNAs are expressed in only a few cell types and at low abundance, but they are no more cell-type-specific than are mRNAs with similar overall expression levels. Although most circRNAs overlap protein-coding sequences, ribosome profiling provides no evidence for their translation. We also annotated 635 mouse circRNAs, and although 20% of them are orthologous to human circRNAs, the sequence conservation of these circRNA orthologs is no higher than that of their neighboring linear exons. The previously proposed miR-7 sponge, CDR1as, is one of only two circRNAs with more miRNA sites than expected by chance, with the next best miRNA-sponge candidate deriving from a gene encoding a primate-specific zinc-finger protein, ZNF91. Conclusions: Our results provide a new framework for future investigation of this intriguing topological isoform while raising doubts regarding a biological function of most circRNAs.
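The abundance criterion in this abstract (circRNAs comprising at least 10% of the transcripts accumulating from their loci) can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes, purely for illustration, that relative abundance is approximated by the share of back-splice junction reads among all junction reads at a locus. The function names and both read-count inputs are hypothetical.

```python
def circular_fraction(backsplice_reads: int, linear_reads: int) -> float:
    """Share of junction reads at a locus that support the circular isoform.

    Assumption (not from the abstract): abundance is approximated by
    back-splice junction reads versus linear splice-junction reads.
    """
    total = backsplice_reads + linear_reads
    if total == 0:
        # No junction evidence at all: treat the fraction as zero.
        return 0.0
    return backsplice_reads / total


def passes_abundance_filter(backsplice_reads: int, linear_reads: int,
                            min_fraction: float = 0.10) -> bool:
    """Apply the 'at least 10% of transcripts from the locus' style cutoff."""
    return circular_fraction(backsplice_reads, linear_reads) >= min_fraction
```

For example, a locus with 10 back-splice reads and 90 linear reads sits exactly at the cutoff and is retained, while one with 5 and 95 is not.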

Journal ArticleDOI
TL;DR: It is concluded that SAD1 dynamically controls splicing efficiency and splice-site recognition in Arabidopsis, and it is proposed that this may contribute to SAD1-mediated stress tolerance through the metabolism of transcripts expressed from stress-responsive genes.
Abstract: Sm-like proteins are highly conserved proteins that form the core of the U6 ribonucleoprotein and function in several mRNA metabolism processes, including pre-mRNA splicing. Despite their wide occurrence in all eukaryotes, little is known about the roles of Sm-like proteins in the regulation of splicing. Here, through comprehensive transcriptome analyses, we demonstrate that depletion of the Arabidopsis supersensitive to abscisic acid and drought 1 gene (SAD1), which encodes Sm-like protein 5 (LSm5), promotes an inaccurate selection of splice sites that leads to a genome-wide increase in alternative splicing. In contrast, overexpression of SAD1 strengthens the precision of splice-site recognition and globally inhibits alternative splicing. Further, SAD1 modulates the splicing of stress-responsive genes, particularly under salt-stress conditions. Finally, we find that overexpression of SAD1 in Arabidopsis improves salt tolerance in transgenic plants, which correlates with an increase in splicing accuracy and efficiency for stress-responsive genes. We conclude that SAD1 dynamically controls splicing efficiency and splice-site recognition in Arabidopsis, and propose that this may contribute to SAD1-mediated stress tolerance through the metabolism of transcripts expressed from stress-responsive genes. Our study not only provides novel insights into the function of Sm-like proteins in splicing, but also uncovers new means to improve splicing efficiency and to enhance stress tolerance in a higher eukaryote.

Journal ArticleDOI
10 Jan 2014-Science
TL;DR: It is concluded that independent and stochastic allelic transcription generates abundant random monoallelic expression in the mammalian cell.
Abstract: Expression from both alleles is generally observed in analyses of diploid cell populations, but studies addressing allelic expression patterns genome-wide in single cells are lacking. Here, we present global analyses of allelic expression across individual cells of mouse preimplantation embryos of mixed background (CAST/EiJ × C57BL/6J). We discovered abundant (12 to 24%) monoallelic expression of autosomal genes and that expression of the two alleles occurs independently. The monoallelic expression appeared random and dynamic because there was considerable variation among closely related embryonic cells. Similar patterns of monoallelic expression were observed in mature cells. Our allelic expression analysis also demonstrates the de novo inactivation of the paternal X chromosome. We conclude that independent and stochastic allelic transcription generates abundant random monoallelic expression in the mammalian cell.

Journal ArticleDOI
TL;DR: A central role for RNA in human evolution and ontogeny is suggested and the emergence of the previously unsuspected world of regulatory RNA from a historical perspective is reviewed.
Abstract: Discoveries over the past decade portend a paradigm shift in molecular biology. Evidence suggests that RNA is not only functional as a messenger between DNA and protein but also involved in the regulation of genome organization and gene expression, which is increasingly elaborate in complex organisms. Regulatory RNA seems to operate at many levels; in particular, it plays an important part in the epigenetic processes that control differentiation and development. These discoveries suggest a central role for RNA in human evolution and ontogeny. Here, we review the emergence of the previously unsuspected world of regulatory RNA from a historical perspective.

20 Nov 2014
TL;DR: The laboratory mouse shares the majority of its protein-coding genes with humans, making it the premier model organism in biomedical research, yet the two mammals differ in significant ways.
Abstract: The laboratory mouse shares the majority of its protein-coding genes with humans, making it the premier model organism in biomedical research, yet the two mammals differ in significant ways. To gain

Journal ArticleDOI
TL;DR: A concise database for BLAST with a Bio-Edit interface is created that can detect AR genetic determinants in bacterial genomes and can rapidly and easily discover putative new AR genetic determinants.
Abstract: ARG-ANNOT (Antibiotic Resistance Gene-ANNOTation) is a new bioinformatic tool that was created to detect existing and putative new antibiotic resistance (AR) genes in bacterial genomes. ARG-ANNOT uses a local BLAST program in Bio-Edit software that allows the user to analyze sequences without a Web interface. All AR genetic determinants were collected from published works and online resources; nucleotide and protein sequences were retrieved from the NCBI GenBank database. After building a database that includes 1,689 antibiotic resistance genes, the software was tested in a blind manner using 100 random sequences selected from the database to verify that the sensitivity and specificity were at 100% even when partial sequences were queried. Notably, BLAST analysis results obtained using the rmtF gene sequence (a new aminoglycoside-modifying enzyme gene sequence that is not included in the database) as a query revealed that the tool was able to link this sequence to short sequences (17 to 40 bp) found in other genes of the rmt family with significant E values. Finally, the analysis of 178 Acinetobacter baumannii and 20 Staphylococcus aureus genomes allowed the detection of a significantly higher number of AR genes than the Resfinder gene analyzer and 11 point mutations in target genes known to be associated with AR. The average time for the analysis of a genome was 3.35 ± 0.13 min. We have created a concise database for BLAST using a Bio-Edit interface that can detect AR genetic determinants in bacterial genomes and can rapidly and easily discover putative new AR genetic determinants.

Journal ArticleDOI
TL;DR: Two independent domestications from genetic pools that diverged before human colonization are confirmed, and a set of genes linked with increased leaf and seed size is identified and combined with quantitative trait locus data from Mesoamerican cultivars.
Abstract: Common bean (Phaseolus vulgaris L.) is the most important grain legume for human consumption and has a role in sustainable agriculture owing to its ability to fix atmospheric nitrogen. We assembled 473 Mb of the 587-Mb genome and genetically anchored 98% of this sequence in 11 chromosome-scale pseudomolecules. We compared the genome for the common bean against the soybean genome to find changes in soybean resulting from polyploidy. Using resequencing of 60 wild individuals and 100 landraces from the genetically differentiated Mesoamerican and Andean gene pools, we confirmed 2 independent domestications from genetic pools that diverged before human colonization. Less than 10% of the 74 Mb of sequence putatively involved in domestication was shared by the two domestication events. We identified a set of genes linked with increased leaf and seed size and combined these results with quantitative trait locus data from Mesoamerican cultivars. Genes affected by domestication may be useful for genomics-enabled crop improvement.

Journal ArticleDOI
TL;DR: The results demonstrate the potential for efficient loss-of-function screening using the CRISPR-Cas9 system and identify 27 known and 4 previously unknown genes implicated in these phenotypes.
Abstract: Identification of genes influencing a phenotype of interest is frequently achieved through genetic screening by RNA interference (RNAi) or knockouts. However, RNAi may only achieve partial depletion of gene activity, and knockout-based screens are difficult in diploid mammalian cells. Here we took advantage of the efficiency and high throughput of genome editing based on type II, clustered, regularly interspaced, short palindromic repeats (CRISPR)-CRISPR-associated (Cas) systems to introduce genome-wide targeted mutations in mouse embryonic stem cells (ESCs). We designed 87,897 guide RNAs (gRNAs) targeting 19,150 mouse protein-coding genes and used a lentiviral vector to express these gRNAs in ESCs that constitutively express Cas9. Screening the resulting ESC mutant libraries for resistance to either Clostridium septicum alpha-toxin or 6-thioguanine identified 27 known and 4 previously unknown genes implicated in these phenotypes. Our results demonstrate the potential for efficient loss-of-function screening using the CRISPR-Cas9 system.

Journal ArticleDOI
15 Dec 2014-eLife
TL;DR: It is shown here that new genetic information can be introduced site-specifically and with high efficiency by homology-directed repair (HDR) of Cas9-induced site-specific double-strand DNA breaks using timed delivery of Cas9-guide RNA ribonucleoprotein (RNP) complexes.
Abstract: The CRISPR/Cas9 system is a robust genome editing technology that works in human cells, animals and plants based on the RNA-programmed DNA cleaving activity of the Cas9 enzyme. Building on previous work (Jinek et al., 2013), we show here that new genetic information can be introduced site-specifically and with high efficiency by homology-directed repair (HDR) of Cas9-induced site-specific double-strand DNA breaks using timed delivery of Cas9-guide RNA ribonucleoprotein (RNP) complexes. Cas9 RNP-mediated HDR in HEK293T, human primary neonatal fibroblast and human embryonic stem cells was increased dramatically relative to experiments in unsynchronized cells, with rates of HDR up to 38% observed in HEK293T cells. Sequencing of on- and potential off-target sites showed that editing occurred with high fidelity, while cell mortality was minimized. This approach provides a simple and highly effective strategy for enhancing site-specific genome engineering in both transformed and primary human cells.

Journal ArticleDOI
TL;DR: This model is used to identify ∼1,000 genes that are significantly lacking in functional coding variation in non-ASD samples and are enriched for de novo loss-of-function mutations identified in ASD cases, suggesting that the role of de novo mutations in ASDs might reside in fundamental neurodevelopmental processes.
Abstract: Mark Daly and colleagues present a statistical framework to evaluate the role of de novo mutations in human disease by calibrating a model of de novo mutation rates at the individual gene level. The mutation probabilities defined by their model and list of constrained genes can be used to help identify genetic variants that have a significant role in disease.

Journal ArticleDOI
TL;DR: A two-state model for Cas9 binding and cleavage is proposed, in which a seed match triggers binding but extensive pairing with target DNA is required for cleavage.
Abstract: Bacterial type II CRISPR-Cas9 systems have been widely adapted for RNA-guided genome editing and transcription regulation in eukaryotic cells, yet their in vivo target specificity is poorly understood. Here we mapped genome-wide binding sites of a catalytically inactive Cas9 (dCas9) from Streptococcus pyogenes loaded with single guide RNAs (sgRNAs) in mouse embryonic stem cells (mESCs). Each of the four sgRNAs we tested targets dCas9 to between tens and thousands of genomic sites, frequently characterized by a 5-nucleotide seed region in the sgRNA and an NGG protospacer adjacent motif (PAM). Chromatin inaccessibility decreases dCas9 binding to other sites with matching seed sequences; thus 70% of off-target sites are associated with genes. Targeted sequencing of 295 dCas9 binding sites in mESCs transfected with catalytically active Cas9 identified only one site mutated above background levels. We propose a two-state model for Cas9 binding and cleavage, in which a seed match triggers binding but extensive pairing with target DNA is required for cleavage.
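The binding criterion described in this abstract (a 5-nucleotide seed match next to an NGG PAM) can be sketched as a simple sequence scan. This is an illustrative sketch, not the authors' analysis: it assumes the seed is the PAM-proximal (3') end of the guide, searches the forward strand only, and ignores chromatin accessibility. The function name and the example sequences are hypothetical.

```python
import re


def find_seed_pam_sites(genome: str, guide: str, seed_len: int = 5) -> list[int]:
    """Return 0-based forward-strand positions where the seed is followed by NGG.

    Assumption: the seed is the last `seed_len` bases of the guide,
    i.e. the end that lies next to the PAM.
    """
    seed = guide[-seed_len:].upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=" + re.escape(seed) + r"[ACGT]GG)")
    return [m.start() for m in pattern.finditer(genome.upper())]


# Toy example: the seed ACGTC occurs three times, but only two
# occurrences are followed by an NGG motif.
sites = find_seed_pam_sites("TTTACGTCAGGAAACGTCTGGCCACGTCACC",
                            "GGGGGGGGGGGGGGGACGTC")
```

Under the two-state model proposed in the abstract, such seed-plus-PAM hits would be candidate binding sites only; cleavage would additionally require extensive pairing between the rest of the guide and the target DNA, which this scan does not check.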

Journal ArticleDOI
21 Mar 2014-Science
TL;DR: FISSEQ is compatible with tissue sections and whole-mount embryos and reduces the limitations of optical resolution and noisy signals on single-molecule detection, and can be used to investigate cellular phenotype, gene regulation, and environment in situ.
Abstract: Understanding the spatial organization of gene expression with single-nucleotide resolution requires localizing the sequences of expressed RNA transcripts within a cell in situ. Here, we describe fluorescent in situ RNA sequencing (FISSEQ), in which stably cross-linked complementary DNA (cDNA) amplicons are sequenced within a biological sample. Using 30-base reads from 8102 genes in situ, we examined RNA expression and localization in human primary fibroblasts with a simulated wound-healing assay. FISSEQ is compatible with tissue sections and whole-mount embryos and reduces the limitations of optical resolution and noisy signals on single-molecule detection. Our platform enables massively parallel detection of genetic elements, including gene transcripts and molecular barcodes, and can be used to investigate cellular phenotype, gene regulation, and environment in situ.

Journal ArticleDOI
TL;DR: A draft genome sequence of Brassica oleracea is described, comparing it with that of its sister species B. rapa to reveal numerous chromosome rearrangements and asymmetrical gene loss in duplicated genomic blocks.
Abstract: Polyploidization has provided much genetic variation for plant adaptive evolution, but the mechanisms by which the molecular evolution of polyploid genomes establishes genetic architecture underlying species differentiation are unclear. Brassica is an ideal model to increase knowledge of polyploid evolution. Here we describe a draft genome sequence of Brassica oleracea, comparing it with that of its sister species B. rapa to reveal numerous chromosome rearrangements and asymmetrical gene loss in duplicated genomic blocks, asymmetrical amplification of transposable elements, differential gene co-retention for specific pathways and variation in gene expression, including alternative splicing, among a large number of paralogous and orthologous genes. Genes related to the production of anticancer phytochemicals and morphological variations illustrate consequences of genome duplication and gene divergence, imparting biochemical and morphological variation to B. oleracea. This study provides insights into Brassica genome evolution and will underpin research into the many important crops in this genus.