Open Access, Posted Content

Information Content Differentiates Enhancers From Silencers in Mouse Photoreceptors

TLDR
In this paper, a measure of information content that describes the number and diversity of motifs in a sequence was developed to distinguish enhancers and silencers targeted by the same transcription factors.
Abstract
Enhancers and silencers often depend on the same transcription factors (TFs) and are conflated in genomic assays of TF binding or chromatin state. To identify sequence features that distinguish enhancers and silencers, we assayed massively parallel reporter libraries of genomic sequences targeted by the photoreceptor TF CRX in mouse retinas. Both enhancers and silencers contain more TF motifs than inactive sequences, but relative to silencers, enhancers contain motifs from a more diverse collection of TFs. We developed a measure of information content that describes the number and diversity of motifs in a sequence and found that, while both enhancers and silencers depend on CRX motifs, enhancers have higher information content. The ability of information content to distinguish enhancers and silencers targeted by the same TF illustrates how motif context determines the activity of cis-regulatory sequences.


Frequently Asked Questions (14)
Q1. How many cycles were used to amplify the polylinker libraries?

Polylinker libraries were amplified using primers BC_CRX_Nested_F and BC_CRX_R (Supplementary file 6) for 30 cycles (NEB Q5) at an annealing temperature of 67 °C and then purified with the Monarch PCR kit.

PE2 indexing barcodes were then added by amplifying 2 microliters of the previous reaction with forward primer P1_outer and reverse primers PE2_outer_SIC69 and PE2_outer_SIC70 (Supplementary file 6) for 5 cycles at an annealing temperature of 66 °C followed by 5 cycles with no annealing step (NEB Q5) and then purified with the Monarch PCR kit.

To clone the basal promoter into barcoded oligos without any upstream cis-regulatory sequence, the authors placed the SpeI site next to the EcoRI site, which allowed them to place the promoter between the EcoRI site and the 3’ barcode.

Sequencing reads were filtered to ensure that the barcode sequence perfectly matched the expected sequence (>93% of reads in a sample for the Rho libraries, >86% of reads for the Polylinker libraries).
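A minimal sketch of this kind of perfect-match filter is given below. It assumes the barcode has already been extracted from each read and that "expected" means membership in the set of designed barcodes; the function name `filter_perfect_barcodes` is hypothetical and is not taken from the authors' pipeline.

```python
def filter_perfect_barcodes(barcodes, expected):
    """Keep only barcodes that exactly match one of the expected sequences.

    barcodes: barcode strings, one per sequencing read (already extracted).
    expected: iterable of designed barcode sequences (an assumption here).
    Returns the kept barcodes and the fraction of reads that passed.
    """
    expected = set(expected)
    kept = [bc for bc in barcodes if bc in expected]
    fraction = len(kept) / len(barcodes) if barcodes else 0.0
    return kept, fraction


# Toy example: one of four reads carries a barcode with a mismatch.
kept, fraction = filter_perfect_barcodes(
    ["AAA", "AAT", "CCC", "AAA"], {"AAA", "CCC"}
)
print(kept, fraction)
```

The pass fraction returned here corresponds to the per-sample percentages quoted above.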

After synthesis of their library, the authors discovered that 11% of these sequences do not actually overlap H3K27ac ChIP-seq peaks (110/1004 of the H3K4me3- group and 60/541 of the H3K4me3+ group), but the authors still included them in the analysis because they contain CRX motifs in ATAC-seq peaks.
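As a quick check, the 11% figure follows from pooling the two groups quoted in the answer:

$$
\frac{110 + 60}{1004 + 541} = \frac{170}{1545} \approx 0.11
$$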

Expression levels were approximately log-normally distributed, so the authors computed the log-normal parameters for each sequence and then performed Welch’s t-test.
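One way such a test could be set up is sketched below. It assumes the log-normal parameters are obtained by moment matching from the arithmetic mean and standard deviation of replicate measurements, which the excerpt does not state; the function names are hypothetical. The sketch returns only the t statistic and the Welch-Satterthwaite degrees of freedom. A p-value would additionally need the t-distribution CDF, for example from `scipy.stats.ttest_ind_from_stats` with `equal_var=False`.

```python
import math


def lognormal_params(mean, std):
    """Moment-match a log-normal distribution.

    Given the arithmetic mean and standard deviation on the raw scale,
    return (mu, sigma) of the underlying normal on the natural-log scale.
    """
    sigma2 = math.log(1.0 + (std / mean) ** 2)
    mu = math.log(mean) - sigma2 / 2.0
    return mu, math.sqrt(sigma2)


def welch_t(mu1, sd1, n1, mu2, sd2, n2):
    """Welch's unequal-variance t statistic and degrees of freedom."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t = (mu1 - mu2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Toy comparison of a test sequence against a reference (made-up numbers).
mu_a, sd_a = lognormal_params(mean=4.0, std=1.0)
mu_b, sd_b = lognormal_params(mean=1.0, std=0.5)
print(welch_t(mu_a, sd_a, 3, mu_b, sd_b, 3))
```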

The cis-regulatory logic of Hedgehog gradient responses: key roles for Gli binding affinity, competition, and cooperativity.

For [...] TFs and a [...] bp window (which is the size of their sequences), [...] and [...], meaning that five total motifs for three different TFs specify an approximately unique 164 bp location in a mammalian genome.

There was no basal promoter construct in this library, so instead the authors defined silencers as ChIP-seq peaks that were at least two-fold below the log2 mean of all scrambled sequences.
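The threshold described above might be applied as in the following sketch. It reads "two-fold below the log2 mean" as a log2 activity at least 1 unit below the mean log2 activity of the scrambled sequences, which is an interpretation rather than something the excerpt spells out. The function name `call_silencers` and the toy values are hypothetical.

```python
import math
from statistics import mean


def call_silencers(activity, scrambled_activity, fold=2.0):
    """Return names of sequences at least `fold`-fold below the scrambled baseline.

    activity: dict mapping sequence name to activity on the raw scale.
    scrambled_activity: raw-scale activities of the scrambled controls.
    The baseline is the mean of the log2 scrambled activities (an assumption).
    """
    baseline = mean(math.log2(x) for x in scrambled_activity)
    cutoff = baseline - math.log2(fold)
    return {name for name, value in activity.items() if math.log2(value) <= cutoff}


# Toy data: the baseline is log2 mean = 1.0, so the cutoff is 0.0 in log2 units.
print(call_silencers({"a": 1.0, "b": 2.0, "c": 0.5}, [1.0, 4.0]))
```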

The authors determined which differentially expressed genes are near a member of their library using previously published associations between retinal ATAC-seq peaks and genes [REF Murphy 2019].

The authors wrote custom Python wrappers for gkmSVM to allow for interfacing between the C++ binaries and the rest of their workflow.

Five total motifs for three different TFs can be achieved in two ways: three motifs for A, one for B, and one for C (3-1-1), or two motifs for A, two for B, and one for C (2-2-1).
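The two configurations can be enumerated directly, as in the sketch below, which also computes the Shannon entropy (in bits) of each motif distribution. The helper names `partitions` and `shannon_entropy_bits` are illustrative and are not the authors' code; treating entropy of the motif counts as the quantity of interest is an assumption based on the surrounding answers.

```python
import math


def partitions(total, parts, largest=None):
    """Yield the ways to split `total` motifs among `parts` TFs.

    Every TF gets at least one motif, and each split is reported once as a
    non-increasing tuple such as (3, 1, 1).
    """
    if largest is None:
        largest = total
    if parts == 1:
        if 1 <= total <= largest:
            yield (total,)
        return
    # The first part must leave at least one motif for each remaining TF.
    for first in range(min(largest, total - (parts - 1)), 0, -1):
        for rest in partitions(total - first, parts - 1, first):
            yield (first,) + rest


def shannon_entropy_bits(counts):
    """Shannon entropy, in bits, of the distribution given by motif counts."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts)


for counts in partitions(5, 3):
    print(counts, round(shannon_entropy_bits(counts), 2))
# (3, 1, 1) 1.37
# (2, 2, 1) 1.52
```

The more even 2-2-1 split has the higher entropy, as expected for a more diverse collection of motifs.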

Massively parallel in vivo enhancer assay reveals that highly local features determine the cis-regulatory function of ChIP-seq peaks.

Because each arrangement is equally likely, [...] is also the expected value of [...], and the authors can write the entropy as [...], which is Shannon entropy.
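The symbols in this answer were lost in extraction, so for orientation only, the standard form of Shannon entropy is given here in assumed notation: $N_i$ is the number of motifs for TF $i$, $K$ is the number of TFs, $N = \sum_i N_i$ is the total number of motifs, and $p_i = N_i / N$. These symbols are not necessarily the ones the authors use.

$$
H = -\sum_{i=1}^{K} p_i \log_2 p_i, \qquad p_i = \frac{N_i}{N}
$$

If "arrangement" refers to the distinct orderings of the motifs, then their number is the multinomial coefficient, and Stirling's approximation links its logarithm per motif to the same expression:

$$
\frac{1}{N} \log_2 \frac{N!}{\prod_{i=1}^{K} N_i!} \approx -\sum_{i=1}^{K} p_i \log_2 p_i
$$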