Deconvolving sequence features that discriminate between overlapping regulatory annotations
Citations
Integrative analysis of 111 reference human epigenomes
ChromHMM: automating chromatin-state discovery and characterization
References
A map of the cis-regulatory sequences in the mouse genome
Analysis of the Vertebrate Insulator Protein CTCF-Binding Sites in the Human Genome
DREME: motif discovery in transcription factor ChIP-seq data
Sequence features and chromatin structure around the genomic regions bound by 119 human transcription factors
Global analysis of the insulator binding protein CTCF in chromatin barrier regions reveals demarcation of active and repressive domains
Related Papers
Mocap: Large-scale inference of transcription factor binding sites from chromatin accessibility
Frequently Asked Questions (16)
Q2. How many datasets were used to calculate collective degree?
To calculate collective degree, the authors used 158, 102, and 202 ChIP-seq datasets in the GM12878, H1-hESC, and K562 cell types, respectively.
Q3. What is the likely reason for the non-positive score?
A non-positive score most likely reflects a significant depletion of motif instances at sites annotated with a label, relative to sites annotated with other labels.
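To see why depletion drives a score to zero or below, it helps to look at a simple log-ratio enrichment score. The sketch below is a hypothetical illustration and not SeqUnwinder's own scoring function. The name `log2_enrichment` and the pseudocount are assumptions. When a motif is as frequent or less frequent at a label's sites than at other sites, the log ratio is zero or negative.

```python
import math

def log2_enrichment(hits_label: int, n_label: int,
                    hits_other: int, n_other: int,
                    pseudocount: float = 1.0) -> float:
    """Hypothetical score: log2 ratio of motif frequency at a label's sites vs. other sites.

    Values <= 0 mean the motif is no more frequent (or is depleted) at the
    label's sites than at sites carrying other labels.
    """
    frac_label = (hits_label + pseudocount) / (n_label + 2 * pseudocount)
    frac_other = (hits_other + pseudocount) / (n_other + 2 * pseudocount)
    return math.log2(frac_label / frac_other)

# Motif found at 50 of 1000 label sites vs. 300 of 1000 other sites (depleted).
score = log2_enrichment(50, 1000, 300, 1000)
print(f"{score:.2f}")
```

The pseudocount keeps the ratio finite when a motif is absent from one group entirely.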
Q4. What is the common method of finding a motif in a sequence?
Most popular motif-finding methods use unsupervised machine-learning approaches to discover motifs that are over-represented in ‘foreground’ input sequences relative to a set of ‘background’ sequences (e.g. “bound” vs. “unbound” sequences, respectively) [1,2].
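The foreground/background idea can be shown with a simple k-mer presence count. This is a deliberately minimal stand-in, not an unsupervised learning method and not the algorithm of any specific tool cited above. It ranks each k-mer by how much more often it occurs in foreground sequences than in background sequences. The function names, the pseudocount, and the toy sequences are all hypothetical.

```python
from collections import Counter

def kmer_presence(seqs, k):
    """For each k-mer, count how many sequences contain it at least once."""
    counts = Counter()
    for s in seqs:
        # A set ensures each k-mer is counted once per sequence.
        counts.update({s[i:i + k] for i in range(len(s) - k + 1)})
    return counts

def overrepresented_kmers(foreground, background, k=4, pseudocount=1.0):
    """Rank foreground k-mers by their foreground/background presence ratio."""
    fg, bg = kmer_presence(foreground, k), kmer_presence(background, k)
    n_fg, n_bg = len(foreground), len(background)
    ratios = {
        kmer: ((fg[kmer] + pseudocount) / (n_fg + pseudocount))
              / ((bg[kmer] + pseudocount) / (n_bg + pseudocount))
        for kmer in fg
    }
    return sorted(ratios.items(), key=lambda kv: kv[1], reverse=True)

bound = ["ACGTGACGTG", "TTCACGTGAA", "GCACGTGCTA"]    # toy "bound" set
unbound = ["ATATATATAT", "GGGCCCGGGC", "TTTTAAAACC"]  # toy "unbound" set
print(overrepresented_kmers(bound, unbound, k=6)[:3])
```

Real tools replace the raw ratio with a statistical test and merge related k-mers into motifs. The sketch shows only the basic comparison.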
Q5. How are the weight parameters learned for the labels?
While the k-mer weight parameters for each subclass are learned directly from the data, the weight parameters for the labels are learned exclusively through the regularization constraint.
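Here is one way to picture label weights that are learned only through a regularizer. Suppose, as an assumption, that the constraint ties each subclass's weight vector to the sum of the weight vectors of the labels that define it. With the subclass weights held fixed and a squared-error coupling, the label weights then have a least-squares solution. SeqUnwinder's actual penalty and optimizer may differ (for example, a non-squared norm). The label names, subclasses, and weight values below are hypothetical.

```python
import numpy as np

# Hypothetical setup: 2 labels (A, B) and 3 subclasses defined by label combinations.
labels = ["A", "B"]
subclasses = [("A",), ("B",), ("A", "B")]

# Membership matrix M[s, l] = 1 if subclass s carries label l.
M = np.array([[1.0 if l in s else 0.0 for l in labels] for s in subclasses])

# Subclass k-mer weights (rows: subclasses, columns: k-mers), standing in for
# weights a classifier would learn directly from the data.
W_sub = np.array([[1.0, 0.0, 0.5],
                  [0.0, 2.0, 0.5],
                  [1.0, 2.0, 1.0]])

def label_weights(M, W_sub):
    """Label weights minimizing sum_s ||W_sub[s] - sum_{l in s} W_lab[l]||^2.

    The data never enter this function: label weights are fit only to the
    subclass weights through the coupling term.
    """
    W_lab, *_ = np.linalg.lstsq(M, W_sub, rcond=None)
    return W_lab

W_lab = label_weights(M, W_sub)
print(np.round(W_lab, 3))
```

In this toy case the subclass ("A", "B") weights are exactly the sum of the A and B weights, so the coupling is satisfied with zero residual.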
Q6. How can SeqUnwinder deconvolve sequence features associated with motor neuron programming?
By implicitly accounting for the effects of overlapping annotation labels, SeqUnwinder can deconvolve sequence features associated with motor neuron programming dynamics and ES chromatin status.
Q7. For how many of the examined TFs were IRF and RUNX motifs enriched at GM12878-specific sites?
The authors found IRF and RUNX motifs enriched at GM12878-specific binding sites for 11 and 7 of the 17 examined TFs, respectively.
Q8. What is the advantage of the “hill-finding” approach?
One advantage of the “hill-finding” approach is that it implicitly takes into account positional relationships between high-scoring k-mers on the genome; short stretches that contain multiple high-scoring k-mers will form larger “hills”.
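The hill-forming behavior can be illustrated by summing k-mer weights along a sequence and taking maximal runs above a threshold. This is a minimal sketch under assumptions: the per-base summation, the zero threshold, and the k-mer weights are hypothetical and may not match SeqUnwinder's exact procedure. Because overlapping high-scoring k-mers add up, clustered k-mers produce a taller and wider hill.

```python
def position_scores(seq, weights, k):
    """At each base, sum the weights of every k-mer that overlaps that base."""
    scores = [0.0] * len(seq)
    for i in range(len(seq) - k + 1):
        w = weights.get(seq[i:i + k], 0.0)
        for j in range(i, i + k):
            scores[j] += w
    return scores

def find_hills(scores, threshold=0.0):
    """Return half-open (start, end) intervals of maximal runs with score > threshold."""
    hills, start = [], None
    for i, s in enumerate(scores):
        if s > threshold and start is None:
            start = i
        elif s <= threshold and start is not None:
            hills.append((start, i))
            start = None
    if start is not None:
        hills.append((start, len(scores)))
    return hills

weights = {"CACG": 1.0, "ACGT": 1.0, "CGTG": 1.0}  # hypothetical k-mer weights
seq = "TTTTCACGTGTTTT"
print(find_hills(position_scores(seq, weights, k=4)))
```

Here three overlapping 4-mers merge into a single hill, and the score peaks where all three overlap.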
Q9. What variants of the basic string kernel have been proposed?
Several variants of the basic string kernel (e.g. mismatch kernel [35], di-mismatch kernel [4], wild-card kernel [5,35], and gkm-kernel [36]) have been proposed and shown to substantially improve classifier performance.
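For reference, the basic string kernel that these variants extend can be written as the k-spectrum kernel: the inner product of two sequences' k-mer count vectors. The sketch below assumes that definition, with optional cosine normalization. The function names are illustrative. It does not implement the mismatch, wild-card, or gkm variants, which also credit inexact or gapped k-mer matches.

```python
import math
from collections import Counter

def spectrum(seq, k):
    """Sparse k-mer count vector of a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def spectrum_kernel(x, y, k=3, normalize=True):
    """k-spectrum string kernel: inner product of k-mer count vectors."""
    sx, sy = spectrum(x, k), spectrum(y, k)
    # Counter returns 0 for missing keys, so absent k-mers contribute nothing.
    value = sum(c * sy[kmer] for kmer, c in sx.items())
    if normalize:
        norm = math.sqrt(sum(c * c for c in sx.values())
                         * sum(c * c for c in sy.values()))
        value = value / norm if norm else 0.0
    return value

print(spectrum_kernel("ACGTACGT", "ACGTTCGT"))
```

Normalization puts every sequence's self-similarity at 1, so sequence length does not dominate the comparison.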
Q10. How many DREME runs did the authors perform for the four labels?
Since DREME accepts only two classes as input (a foreground set and a background set), the authors performed four separate DREME runs, one for each of the four labels.
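Each two-class run needs a foreground and a background built from the labeled sites. A minimal sketch of a one-vs-rest split follows. It assumes, without confirmation from the text, that the background for a label is every site not carrying that label. Because labels can overlap, the split tests label membership per site rather than partitioning sites by a single label. The site IDs and label names are hypothetical.

```python
def one_vs_rest_splits(sites, labels):
    """For each label: foreground = sites carrying it, background = sites that do not."""
    splits = {}
    for label in labels:
        fg = [sid for sid, labs in sites if label in labs]
        bg = [sid for sid, labs in sites if label not in labs]
        splits[label] = (fg, bg)
    return splits

# Hypothetical sites, each annotated with a set of (possibly overlapping) labels.
sites = [("s1", {"label_A", "label_C"}), ("s2", {"label_B", "label_D"}),
         ("s3", {"label_A", "label_D"}), ("s4", {"label_B", "label_C"})]
labels = ["label_A", "label_B", "label_C", "label_D"]

splits = one_vs_rest_splits(sites, labels)
for label, (fg, bg) in splits.items():
    print(label, "foreground:", fg, "background:", bg)
```

Each of the four (foreground, background) pairs would feed one run, which gives four runs in total.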
Q11. Which binding sites were removed from the shared set?
Binding sites showing significant differential binding in any of the three possible pairwise comparisons were removed from the shared set.
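With three cell types there are exactly three pairwise comparisons, and a site stays in the shared set only if none of them is significant. The filter below is a minimal sketch of that rule. The `q_values` structure and the 0.01 significance cutoff are assumptions, since the cutoff for this step is not stated here.

```python
from itertools import combinations

cell_types = ["GM12878", "H1-hESC", "K562"]
pairs = list(combinations(cell_types, 2))  # the 3 pairwise comparisons

def filter_shared(shared_sites, q_values, alpha=0.01):
    """Keep shared sites with no significant differential binding in any comparison.

    q_values[site][(a, b)] is a hypothetical per-comparison q-value;
    alpha is an assumed significance cutoff.
    """
    return [site for site in shared_sites
            if all(q_values[site][pair] >= alpha for pair in pairs)]

q = {
    "siteA": {pair: 0.5 for pair in pairs},
    # siteB is differential between GM12878 and K562 only.
    "siteB": {**{pair: 0.5 for pair in pairs}, ("GM12878", "K562"): 0.001},
}
print(filter_shared(["siteA", "siteB"], q))
```

One significant comparison is enough to remove a site, which is what "any of the three" requires.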
Q12. How might SeqUnwinder help predict cell-specific TF binding?
SeqUnwinder’s characterization of cell-specific motif features in collections of DNase-seq datasets may serve as a source of predictive features for efforts that aim to predict cell-specific TF binding from chromatin accessibility data alone [39–41].
Q13. How do the authors restrict the set of k-mer features?
To speed up computation, the authors restrict the unbiased k-mer features to those k-mers present in at least 5% of the hills.
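The 5% rule is a frequency filter on k-mer presence across hill sequences. Below is a minimal sketch, assuming "present" means the k-mer occurs at least once in a hill. The function name and toy inputs are hypothetical.

```python
from collections import Counter

def frequent_kmers(hill_seqs, k, min_frac=0.05):
    """Return k-mers that occur in at least `min_frac` of the hill sequences."""
    if not hill_seqs:
        return set()
    presence = Counter()
    for seq in hill_seqs:
        # Count each k-mer at most once per hill.
        presence.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    n = len(hill_seqs)
    return {kmer for kmer, count in presence.items() if count / n >= min_frac}

hills = ["ACGT"] + ["TTTT"] * 19  # "ACG" appears in exactly 1 of 20 hills (5%)
print(sorted(frequent_kmers(hills, k=3)))
```

Dropping rare k-mers shrinks the feature matrix before training, and that is where the speed-up comes from.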
Q14. What q-value cutoff was used to label sites as “early”?
All sites with significantly greater Isl1/Lhx3 ChIP enrichment at 12h than at 48h (q-value < 0.01) were labeled as “early”.
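The labeling rule combines a direction (more enrichment at 12h than at 48h) with a multiple-testing cutoff. In the sketch below, Benjamini-Hochberg adjustment stands in as the q-value method. This is an assumption, because the text does not say how the q-values were computed. The enrichment values and p-values are hypothetical, and sites that fail the rule are simply left unlabeled.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q

def label_early(enrich_12h, enrich_48h, pvalues, cutoff=0.01):
    """Label a site 'early' if enrichment is higher at 12h than 48h and q < cutoff."""
    q = bh_qvalues(pvalues)
    return ["early" if e12 > e48 and qi < cutoff else None
            for e12, e48, qi in zip(enrich_12h, enrich_48h, q)]

labels = label_early([10, 3, 5, 8], [2, 6, 5, 1], [0.001, 0.002, 0.5, 0.04])
print(labels)
```

The fourth site has higher enrichment at 12h, but its adjusted q-value is above 0.01, so it stays unlabeled.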
Q15. What confound could explain the motifs previously assigned to early or late TF binding behaviors?
The motifs that the authors had previously assigned to early or late TF binding behaviors could instead have been merely associated with ES-active and ES-inactive sites, respectively.
Q16. Was each TF’s cognate motif predictive of its cell-type-specific binding?
The cognate motif was not specifically predictive of cell-type-specific labels for the examined TFs, with the exception of H1-hESC-specific sites for CEBPB, NRSF, and SRF.