Open Access Journal Article

Measuring the reproducibility and quality of Hi-C data

TLDR
This work assesses reproducibility and quality measures for Hi-C data by varying sequencing depth, resolution, and noise levels in data from 13 cell lines, with two biological replicates each, and in 176 simulated matrices, with the aim of identifying low-quality experiments.
Abstract
Hi-C is currently the most widely used assay to investigate the 3D organization of the genome and to study its role in gene regulation, DNA replication, and disease. However, Hi-C experiments are costly to perform and involve multiple complex experimental steps; thus, accurate methods for measuring the quality and reproducibility of Hi-C data are essential to determine whether the output should be used further in a study. Using real and simulated data, we profile the performance of several recently proposed methods for assessing reproducibility of population Hi-C data, including HiCRep, GenomeDISCO, HiC-Spector, and QuASAR-Rep. By explicitly controlling noise and sparsity through simulations, we demonstrate the deficiencies of performing simple correlation analysis on pairs of matrices, and we show that methods developed specifically for Hi-C data produce better measures of reproducibility. We also show how to use established measures, such as the ratio of intra- to interchromosomal interactions, and novel ones, such as QuASAR-QC, to identify low-quality experiments. In this work, we assess reproducibility and quality measures by varying sequencing depth, resolution and noise levels in Hi-C data from 13 cell lines, with two biological replicates each, as well as 176 simulated matrices. Through this extensive validation and benchmarking of Hi-C data, we describe best practices for reproducibility and quality assessment of Hi-C experiments. We make all software publicly available at http://github.com/kundajelab/3DChromatin_ReplicateQC to facilitate adoption in the community.
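As a rough illustration of two quantities named in the abstract, the sketch below computes a ratio of intra- to interchromosomal contacts and a plain Pearson correlation between two contact matrices. It is not taken from the 3DChromatin_ReplicateQC software: the function names, the `(chrom_a, chrom_b, count)` record layout, and the toy numbers are all assumptions made for this example. The Pearson score is the kind of simple correlation analysis the abstract describes as deficient, shown here only as a baseline.

```python
from math import sqrt


def intra_inter_ratio(contacts):
    """Ratio of intrachromosomal to interchromosomal contact counts.

    `contacts` is a hypothetical list of (chrom_a, chrom_b, count) records;
    this layout is an assumption, not a format defined by the paper.
    """
    intra = sum(n for a, b, n in contacts if a == b)
    inter = sum(n for a, b, n in contacts if a != b)
    if inter == 0:
        raise ValueError("no interchromosomal contacts; ratio is undefined")
    return intra / inter


def upper_triangle(matrix):
    """Flatten the upper triangle (diagonal included) of a square matrix."""
    size = len(matrix)
    return [matrix[i][j] for i in range(size) for j in range(i, size)]


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Toy contact records (made-up numbers).
contacts = [
    ("chr1", "chr1", 80),
    ("chr1", "chr2", 10),
    ("chr2", "chr2", 70),
    ("chr2", "chr1", 5),
]
ratio = intra_inter_ratio(contacts)  # 150 intra / 15 inter

# Two toy 3x3 contact matrices; rep2 is rep1 with every count doubled.
rep1 = [[10, 4, 1], [4, 12, 5], [1, 5, 9]]
rep2 = [[20, 8, 2], [8, 24, 10], [2, 10, 18]]
r = pearson(upper_triangle(rep1), upper_triangle(rep2))

print(f"intra/inter ratio: {ratio:.1f}")
print(f"naive Pearson r:   {r:.3f}")
```

Because `rep2` is an exact rescaling of `rep1`, the Pearson score is essentially 1 here. Real replicates differ in noise and sparsity as well as depth, which is the setting the abstract says the Hi-C-specific methods (HiCRep, GenomeDISCO, HiC-Spector, QuASAR-Rep) handle better.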

