Showing papers by "Barbara J. Wold" published in 2012
••
Cold Spring Harbor Laboratory1, University of California, Irvine2, California Institute of Technology3, Florida State University College of Arts and Sciences4, Yale University5, Wellcome Trust Sanger Institute6, Norwegian University of Science and Technology7, Affymetrix8, University of North Carolina at Chapel Hill9, University of Lausanne10, University of Geneva11, Genome Institute of Singapore12, Stanford University13, Pompeu Fabra University14
TL;DR: Evidence that three-quarters of the human genome is capable of being transcribed is reported, as well as observations about the range and levels of expression, localization, processing fates, regulatory regions and modifications of almost all currently annotated and thousands of previously unannotated RNAs that prompt a redefinition of the concept of a gene.
Abstract: Eukaryotic cells make many types of primary and processed RNAs that are found either in specific subcellular compartments or throughout the cells. A complete catalogue of these RNAs is not yet available and their characteristic subcellular localizations are also poorly understood. Because RNA represents the direct output of the genetic information encoded by genomes and a significant proportion of a cell's regulatory capabilities are focused on its synthesis, processing, transport, modification and translation, the generation of such a catalogue is crucial for understanding genome function. Here we report evidence that three-quarters of the human genome is capable of being transcribed, as well as observations about the range and levels of expression, localization, processing fates, regulatory regions and modifications of almost all currently annotated and thousands of previously unannotated RNAs. These observations, taken together, prompt a redefinition of the concept of a gene.
4,450 citations
01 Sep 2012
TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of human genes and the human genome, and is an expansive resource of functional annotations for biomedical research.
2,767 citations
••
Stanford University1, California Institute of Technology2, Massachusetts Institute of Technology3, Broad Institute4, University of California, Berkeley5, Harvard University6, Yale University7, Duke University8, University of Washington9, University of Texas at Austin10, University of Chicago11, Pennsylvania State University12, Baylor College of Medicine13, National Institutes of Health14, Ontario Institute for Cancer Research15, University of Massachusetts Medical School16, University of Southern California17, University of North Carolina at Chapel Hill18
TL;DR: This work discusses how ChIP quality, assessed in these ways, affects different uses of ChIP-seq data and develops a set of working standards and guidelines for ChIP experiments that are updated routinely.
Abstract: Chromatin immunoprecipitation (ChIP) followed by high-throughput DNA sequencing (ChIP-seq) has become a valuable and widely used approach for mapping the genomic location of transcription-factor binding and histone modifications in living cells. Despite its widespread use, there are considerable differences in how these experiments are conducted, how the results are scored and evaluated for quality, and how the data and metadata are archived for public use. These practices affect the quality and utility of any global ChIP experiment. Through our experience in performing ChIP-seq experiments, the ENCODE and modENCODE consortia have developed a set of working standards and guidelines for ChIP experiments that are updated routinely. The current guidelines address antibody validation, experimental replication, sequencing depth, data and metadata reporting, and data quality assessment. We discuss how ChIP quality, assessed in these ways, affects different uses of ChIP-seq data. All data sets used in the analysis have been deposited for public viewing and downloading at the ENCODE (http://encodeproject.org/ENCODE/) and modENCODE (http://www.modencode.org/) portals.
1,801 citations
••
TL;DR: This study provides insights into transcription regulation by three-dimensional chromatin interactions for both housekeeping and cell-specific genes in human cells through widespread promoter-centered intragenic, extragenic, and intergenic interactions.
1,166 citations
••
University of Washington1, Stanford University2, Pennsylvania State University3, University of California, San Diego4, Cold Spring Harbor Laboratory5, Florida State University6, Fred Hutchinson Cancer Research Center7, Yale University8, California Institute of Technology9, University of Massachusetts Medical School10, Duke University11, Emory University12, Children's Hospital of Philadelphia13, University of California, Irvine14, University of California, Santa Cruz15, National Institutes of Health16
TL;DR: The Mouse ENCODE Consortium is applying the same experimental pipelines developed for human ENCODE to annotate the mouse genome to enable a broad range of mouse genomics efforts.
Abstract: To complement the human Encyclopedia of DNA Elements (ENCODE) project and to enable a broad range of mouse genomics efforts, the Mouse ENCODE Consortium is applying the same experimental pipelines developed for human ENCODE to annotate the mouse genome.
445 citations
••
TL;DR: Five stages spanning the commitment process are probed using RNA-seq and ChIP-seq to track genome-wide shifts in transcription, cohorts of active transcription factor genes, histone modifications at diverse classes of cis-regulatory elements, and binding repertoire of GATA-3 and PU.1, transcription factors with complementary roles in T cell development.
323 citations
••
TL;DR: Since increased densities of microglia in two functionally and anatomically disparate cortical areas are observed, it is suggested that these immune cells are probably denser throughout cerebral cortex in brains of people with autism.
Abstract: We immunocytochemically identified microglia in fronto-insular (FI) and visual cortex (VC) in autopsy brains of well-phenotyped subjects with autism and matched controls, and stereologically quantified the microglial densities. Densities were determined blind to phenotype using an optical fractionator probe. In FI, individuals with autism had significantly more microglia compared to controls (p = 0.02). One such subject had a microglial density in FI within the control range and was also an outlier behaviorally with respect to other subjects with autism. In VC, microglial densities were also significantly greater in individuals with autism versus controls (p = 0.0002). Since we observed increased densities of microglia in two functionally and anatomically disparate cortical areas, we suggest that these immune cells are probably denser throughout cerebral cortex in brains of people with autism.
224 citations
••
Harvard University1, University of Chicago2, Institut national de la recherche agronomique3, Pennsylvania State University4, University of North Carolina at Chapel Hill5, Tongji University6, University of Texas Southwestern Medical Center7, Agency for Science, Technology and Research8, University of California, Berkeley9, California Institute of Technology10
TL;DR: Evaluation of widely used ChIP-seq analysis tools suggests that adjustments or algorithm improvements are required to handle data sets with deep coverage, and a chromatin-state bias was detected: open chromatin regions yielded higher coverage, which led to false positives if not corrected.
Abstract: We evaluated how variations in sequencing depth and other parameters influence interpretation of chromatin immunoprecipitation–sequencing (ChIP-seq) experiments. Using Drosophila melanogaster S2 cells, we generated ChIP-seq data sets for a site-specific transcription factor (Suppressor of Hairy-wing) and a histone modification (H3K36me3). We detected a chromatin-state bias: open chromatin regions yielded higher coverage, which led to false positives if not corrected. This bias had a greater effect on detection specificity than any base-composition bias. Paired-end sequencing revealed that single-end data underestimated ChIP-library complexity at high coverage. Removal of reads originating at the same base reduced false-positives but had little effect on detection sensitivity. Even at mappable-genome coverage depth of ~1 read per base pair, ~1% of the narrow peaks detected on a tiling array were missed by ChIP-seq. Evaluation of widely used ChIP-seq analysis tools suggests that adjustments or algorithm improvements are required to handle data sets with deep coverage.
159 citations
••
TL;DR: In this paper, the authors measured genome-wide differential allelic occupancy of 24 transcription factors and EP300 in a human lymphoblastoid cell line GM12878 and found strong association between TF occupancy and expression within 100 bp of transcription start sites (TSSs), and weak association up to 100 kb from TSSs.
Abstract: A complex interplay between transcription factors (TFs) and the genome regulates transcription. However, connecting variation in genome sequence with variation in TF binding and gene expression is challenging due to environmental differences between individuals and cell types. To address this problem, we measured genome-wide differential allelic occupancy of 24 TFs and EP300 in a human lymphoblastoid cell line GM12878. Overall, 5% of human TF binding sites have an allelic imbalance in occupancy. At many sites, TFs clustered in TF-binding hubs on the same homolog in especially open chromatin. While genetic variation in core TF binding motifs generally resulted in large allelic differences in TF occupancy, most allelic differences in occupancy were subtle and associated with disruption of weak or noncanonical motifs. We also measured genome-wide differential allelic expression of genes with and without heterozygous exonic variants in the same cells. We found that genes with differential allelic expression were overall less expressed both in GM12878 cells and in unrelated human cell lines. Comparing TF occupancy with expression, we found strong association between allelic occupancy and expression within 100 bp of transcription start sites (TSSs), and weak association up to 100 kb from TSSs. Sites of differential allelic occupancy were significantly enriched for variants associated with disease, particularly autoimmune disease, suggesting that allelic differences in TF occupancy give functional insights into intergenic variants associated with disease. Our results have the potential to increase the power and interpretability of association studies by targeting functional intergenic variants in addition to protein coding sequences.
156 citations
••
TL;DR: This study analyzed the long, polyA-selected, unstranded, deeply sequenced RNA-seq data from the ENCODE Project across 14 human cell lines for candidate RNA editing events. A stronger association of editing with specific genes suggests that the editing of the transcript is more important than the editing of any individual site.
Abstract: RNA-seq data can be mined for sequence differences relative to the reference genome to identify both genomic SNPs and RNA editing events. We analyzed the long, polyA-selected, unstranded, deeply sequenced RNA-seq data from the ENCODE Project across 14 human cell lines for candidate RNA editing events. On average, 43% of the RNA sequencing variants that are not in dbSNP and are within gene boundaries are A-to-G(I) RNA editing candidates. The vast majority of A-to-G(I) edits are located in introns and 3′ UTRs, with only 123 located in protein-coding sequence. In contrast, the majority of non–A-to-G variants (60%–80%) map near exon boundaries and have the characteristics of splice-mapping artifacts. After filtering out all candidates with evidence of private genomic variation using genome resequencing or ChIP-seq data, we find that up to 85% of the high-confidence RNA variants are A-to-G(I) editing candidates. Genes with A-to-G(I) edits are enriched in Gene Ontology terms involving cell division, viral defense, and translation. The distribution and character of the remaining non–A-to-G variants closely resemble known SNPs. We find no reproducible A-to-G(I) edits that result in nonsynonymous substitutions in all three lymphoblastoid cell lines in our study, unlike RNA editing in the brain. That only a fraction of sites are reproducibly edited in multiple cell lines, together with the stronger association we find between editing and specific genes, suggests that the editing of the transcript is more important than the editing of any individual site.
155 citations
••
TL;DR: The investigation of a DNA-binding pyrrole-imidazole polyamide targeted to bind the DNA sequence 5′-WGGWWW-3′ with reference to its potency in a subcutaneous xenograft tumor model shows that the molecule is capable of trafficking to the tumor site following subcutaneous injection and modulates transcription of select genes in vivo.
Abstract: Gene regulation by DNA binding small molecules could have important therapeutic applications. This study reports the investigation of a DNA-binding pyrrole-imidazole polyamide targeted to bind the DNA sequence 5′-WGGWWW-3′ with reference to its potency in a subcutaneous xenograft tumor model. The molecule is capable of trafficking to the tumor site following subcutaneous injection and modulates transcription of select genes in vivo. An FITC-labeled analogue of this polyamide can be detected in tumor-derived cells by confocal microscopy. RNA deep sequencing (RNA-seq) of tumor tissue allowed the identification of further affected genes, a representative panel of which was interrogated by quantitative reverse transcription-PCR and correlated with cell culture expression levels.
••
TL;DR: It is concluded that these tissue-specific factors contribute much more broadly to the transcriptional output of muscle tissue than previously thought, offering a partial explanation for widespread HLH-1 occupancy.
Abstract: Two major transcriptional regulators of Caenorhabditis elegans bodywall muscle (BWM) differentiation, hlh-1 and unc-120, are expressed in muscle where they are known to bind and regulate several well-studied muscle-specific genes. Simultaneously mutating both factors profoundly inhibits formation of contractile BWM. These observations were consistent with a simple network model in which the muscle regulatory factors drive tissue-specific transcription by binding selectively near muscle-specific targets to activate them. We tested this model by measuring the number, identity, and tissue-specificity of functional regulatory targets for each factor. Some joint regulatory targets (218) are BWM-specific and enriched for nearby HLH-1 binding. However, contrary to the simple model, the majority of genes regulated by one or both muscle factors are also expressed significantly in non-BWM tissues. We also mapped global factor occupancy by HLH-1, and created a genetic interaction map that identifies hlh-1 collaborating transcription factors. HLH-1 binding did not predict proximate regulatory action overall, despite enrichment for binding among BWM-specific positive regulatory targets of hlh-1. We conclude that these tissue-specific factors contribute much more broadly to the transcriptional output of muscle tissue than previously thought, offering a partial explanation for widespread HLH-1 occupancy. We also identify a novel regulatory connection between the BWM-specific hlh-1 network and the hlh-8/twist nonstriated muscle network. Finally, our results suggest a molecular basis for synthetic lethality in which hlh-1 and unc-120 mutant phenotypes are mutually buffered by joint additive regulation of essential target genes, with additional buffering suggested via newly identified hlh-1 interacting factors.