Showing papers by "Michael Q. Zhang" published in 2013


Journal Article
23 May 2013-Cell
TL;DR: It is found that promoters that are active in early developmental stages tend to be CG rich and mainly engage H3K27me3 upon silencing in nonexpressing lineages, while promoters for genes expressed preferentially at later stages are often CG poor and primarily employ DNA methylation upon repression.

697 citations


Journal Article
TL;DR: BS-Seeker2, an updated version of BS Seeker, is developed as a full pipeline for mapping bisulfite sequencing data and generating DNA methylomes, and CGmap and ATCGmap file formats for full representations of DNA methylomes are defined.
Abstract: DNA methylation is an important epigenetic modification involved in many biological processes. Bisulfite treatment coupled with high-throughput sequencing provides an effective approach for studying genome-wide DNA methylation at base resolution. Libraries such as whole genome bisulfite sequencing (WGBS) and reduced representation bisulfite sequencing (RRBS) are widely used for generating DNA methylomes, demanding efficient and versatile tools for aligning bisulfite sequencing data. We have developed BS-Seeker2, an updated version of BS Seeker, as a full pipeline for mapping bisulfite sequencing data and generating DNA methylomes. BS-Seeker2 improves mappability over existing aligners by using local alignment. It can also map reads from RRBS libraries by building special indexes with improved efficiency and accuracy. Moreover, BS-Seeker2 provides an additional function for filtering out reads with incomplete bisulfite conversion, which is useful in minimizing the overestimation of DNA methylation levels. We also defined CGmap and ATCGmap file formats for full representations of DNA methylomes, as part of the outputs of the BS-Seeker2 pipeline together with BAM and WIG files. Our performance evaluations show that BS-Seeker2 works efficiently and accurately for both WGBS data and RRBS data. BS-Seeker2 is freely available at http://pellegrini.mcdb.ucla.edu/BS_Seeker2/ and the Galaxy server.
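The idea of filtering reads with incomplete bisulfite conversion can be illustrated with a minimal sketch. It is not BS-Seeker2's actual criterion: the function names, the thresholds `max_fraction` and `min_sites`, and the assumption of a gap-free, same-strand alignment are all hypothetical. The heuristic relies on the general observation that unmethylated cytosines read as T after conversion, so a read that retains most of its non-CpG cytosines is suspicious.

```python
from typing import Tuple


def unconverted_non_cpg(read: str, ref: str) -> Tuple[int, int]:
    """Count (unconverted, total) over non-CpG cytosines of the reference.

    Assumes `read` and `ref` are equal-length, gap-free and on the same
    strand (a simplification made for this illustration only).
    """
    unconverted = 0
    total = 0
    for i, base in enumerate(read):
        if ref[i] != "C":
            continue
        # The context is unknown at the last base, so skip it.
        if i + 1 >= len(ref) or ref[i + 1] == "G":
            continue
        if base == "C":      # cytosine survived bisulfite treatment
            unconverted += 1
            total += 1
        elif base == "T":    # cytosine was converted as expected
            total += 1
    return unconverted, total


def looks_unconverted(read: str, ref: str,
                      max_fraction: float = 0.5, min_sites: int = 3) -> bool:
    """Flag a read whose non-CpG cytosines are mostly unconverted.

    Both thresholds are hypothetical defaults, not values from the paper.
    """
    unconverted, total = unconverted_non_cpg(read, ref)
    return total >= min_sites and unconverted / total > max_fraction
```

A read flagged by `looks_unconverted` would be dropped before methylation levels are computed, which is how such a filter limits overestimation.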

348 citations


Journal Article
22 Aug 2013-Immunity
TL;DR: It is shown that mice lacking the transcription factor Foxo1 in activated CD8+ T cells have defective secondary, but not primary, responses to Listeria monocytogenes infection.

159 citations


Journal Article
TL;DR: OLego, an algorithm specifically designed for de novo mapping of spliced mRNA-Seq reads, achieves high sensitivity of junction detection by strategic searches with small seeds and identified hundreds of novel micro-exons in the mouse transcriptome.
Abstract: A crucial step in analyzing mRNA-Seq data is to accurately and efficiently map hundreds of millions of reads to the reference genome and exon junctions. Here we present OLego, an algorithm specifically designed for de novo mapping of spliced mRNA-Seq reads. OLego adopts a multiple-seed-and-extend scheme, and does not rely on a separate external aligner. It achieves high sensitivity of junction detection by strategic searches with small seeds (~14 nt for mammalian genomes). To improve accuracy and resolve ambiguous mapping at junctions, OLego uses a built-in statistical model to score exon junctions by splice-site strength and intron size. Burrows-Wheeler transform is used in multiple steps of the algorithm to efficiently map seeds, locate junctions and identify small exons. OLego is implemented in C++ with fully multithreaded execution, and allows fast processing of large-scale data. We systematically evaluated the performance of OLego in comparison with published tools using both simulated and real data. OLego demonstrated better sensitivity, higher or comparable accuracy and substantially improved speed. OLego also identified hundreds of novel micro-exons (<30 nt) in the mouse transcriptome, many of which are phylogenetically conserved and can be validated experimentally in vivo. OLego is freely available at http://zhanglab.c2b2.columbia.edu/index.php/OLego.

126 citations


Journal Article
TL;DR: It is demonstrated that both CO2 and temperature alter microRNA expression to affect Arabidopsis growth and development, and that a miR156/157- and miR172-regulated transcriptional network might underlie the onset of early flowering induced by increasing CO2.
Abstract: An increase in the concentration of atmospheric carbon dioxide and warmer temperatures can alter plant growth and development. Here the authors show that these conditions can also elicit significant changes in microRNA expression, including some which might induce early flowering in Arabidopsis.

119 citations


Journal Article
TL;DR: This paper briefly reviews the molecular techniques that generate DNA methylation data and its application to cancer studies, and describes the coverage of the methylome by the most recent version of Infinium HumanMethylation450 BeadChip technology.
Abstract: With the rapid development of genome-wide high-throughput technologies, including expression arrays, SNP arrays and next-generation sequencing platforms, enormous amounts of molecular data have been generated and deposited in the public domain. The application of computational approaches is required to yield biological insights from this enormous, ever-growing resource. A particularly interesting subset of these resources is related to epigenetic regulation, with DNA methylation being the most abundant data type. In this paper, we will focus on the analysis of DNA methylation data and its application to cancer studies. We first briefly review the molecular techniques that generate such data, much of which has been obtained with the use of the most recent version of Infinium HumanMethylation450 BeadChip® technology (Illumina, CA, USA). We describe the coverage of the methylome by this technique. Several examples of data mining are provided. However, it should be understood that reliance on a single aspect of epigenetics has its limitations. In the not too distant future, these defects may be rectified, providing scientists with previously unavailable opportunities to explore in detail the role of epigenetics in cancer and other disease states.

82 citations


Journal Article
TL;DR: In this article, the authors determined all three p16 inactivation mechanisms with the use of multiple methodologies for genomic status, methylation, RNA, and protein expression, and correlated them with EGFR, KRAS, STK11 mutations and smoking status in 40 cell lines and 45 tumor samples of primary non-small-cell lung carcinoma.

70 citations


Journal Article
21 Jun 2013-PLOS ONE
TL;DR: Genome-wide analysis revealed properties of CFS-like recurrent deletions that distinguish them from deletions affecting tumor suppressor genes, including their isolation at specific loci away from other genomic deletion sites, a considerably smaller deletion size, and dispersal throughout the affected locus rather than assembly at a common site of overlap.
Abstract: One of the key questions about genomic alterations in cancer is whether they are functional in the sense of contributing to the selective advantage of tumor cells. The frequency with which an alteration occurs might reflect its ability to increase cancer cell growth, or alternatively, enhanced instability of a locus may increase the frequency with which it is found to be aberrant in tumors, regardless of oncogenic impact. Here we’ve addressed this on a genome-wide scale for cancer-associated focal deletions, which are known to pinpoint both tumor suppressor genes (tumor suppressors) and unstable loci. Based on DNA copy number analysis of over one-thousand human cancers representing ten different tumor types, we observed five loci with focal deletion frequencies above 5%, including the A2BP1 gene at 16p13.3 and the MACROD2 gene at 20p12.1. However, neither RNA expression nor functional studies support a tumor suppressor role for either gene. Further analyses suggest instead that these are sites of increased genomic instability and that they resemble common fragile sites (CFS). Genome-wide analysis revealed properties of CFS-like recurrent deletions that distinguish them from deletions affecting tumor suppressor genes, including their isolation at specific loci away from other genomic deletion sites, a considerably smaller deletion size, and dispersal throughout the affected locus rather than assembly at a common site of overlap. Additionally, CFS-like deletions have less impact on gene expression and are enriched in cell lines compared to primary tumors. We show that loci affected by CFS-like deletions are often distinct from known common fragile sites. Indeed, we find that each tumor tissue type has its own spectrum of CFS-like deletions, and that colon cancers have many more CFS-like deletions than other tumor types. We present simple rules that can pinpoint focal deletions that are not CFS-like and more likely to affect functional tumor suppressors.

38 citations


Journal Article
17 Jun 2013
TL;DR: It is found that the transient increase in CSCs proportion initiated from the purified NSCCs subpopulation cannot be well predicted by the conventional CSC model, implying that the cell state conversion is required especially for the transient dynamics.
Abstract: Cancer stem cell (CSC) theory suggests a cell-lineage structure in tumor cells in which CSCs are capable of giving rise to the other non-stem cancer cells (NSCCs) but not vice versa. However, an alternative scenario of bidirectional interconversions between CSCs and NSCCs was proposed very recently. Here we present a general population model of cancer cells by integrating conventional cell divisions with direct conversions between different cell states, namely, not only can CSCs differentiate into NSCCs by asymmetric cell division, NSCCs can also dedifferentiate into CSCs by cell state conversion. Our theoretical model is validated when applying the model to recent experimental data. It is also found that the transient increase in CSCs proportion initiated from the purified NSCCs subpopulation cannot be well predicted by the conventional CSC model where the conversion from NSCCs to CSCs is forbidden, implying that the cell state conversion is required especially for the transient dynamics. The theoretical analysis also gives the condition such that our general model can be equivalently reduced into a simple Markov chain with only cell state transitions keeping the same cell proportion dynamics.
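The reduced two-state Markov chain mentioned at the end of the abstract can be sketched as follows. This is an illustration only: the per-step transition probabilities are hypothetical, cell divisions are ignored, and the function names are not from the paper. It shows that a purified NSCC population regains CSCs only when the NSCC-to-CSC conversion probability is nonzero.

```python
def step(csc_fraction, csc_to_nscc, nscc_to_csc):
    """Advance the CSC fraction by one step of a two-state Markov chain."""
    nscc_fraction = 1.0 - csc_fraction
    return csc_fraction * (1.0 - csc_to_nscc) + nscc_fraction * nscc_to_csc


def csc_trajectory(start_csc_fraction, csc_to_nscc, nscc_to_csc, steps):
    """Return the CSC fraction at steps 0..steps."""
    fractions = [start_csc_fraction]
    for _ in range(steps):
        fractions.append(step(fractions[-1], csc_to_nscc, nscc_to_csc))
    return fractions


# Purified NSCC population (CSC fraction 0) under two hypothetical settings.
one_way = csc_trajectory(0.0, csc_to_nscc=0.1, nscc_to_csc=0.0, steps=50)
two_way = csc_trajectory(0.0, csc_to_nscc=0.1, nscc_to_csc=0.05, steps=50)
```

With conversion forbidden (`one_way`), the CSC fraction stays at zero. With conversion allowed (`two_way`), it rises toward the stationary value `nscc_to_csc / (csc_to_nscc + nscc_to_csc)`.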

31 citations


Journal Article
05 Sep 2013-PLOS ONE
TL;DR: This work developed FastDMA, which can be used to identify significantly differentially methylated probes, and applied it to three large-scale DNA methylation datasets from The Cancer Genome Atlas (TCGA), finding many differentially methylated genomic sites in different types of cancer.
Abstract: DNA methylation is vital for many essential biological processes and human diseases. Illumina Infinium HumanMethylation450 Beadchip is a recently developed platform studying genome-wide DNA methylation state on more than 480,000 CpG sites and a few CHG sites with high data quality. To analyze the data of this promising platform, we developed FastDMA, which can be used to identify significantly differentially methylated probes. Besides single probe analysis, FastDMA can also do region-based analysis for identifying differentially methylated regions (DMRs). A unified statistical model, analysis of covariance (ANCOVA), is used to achieve all the analyses in FastDMA. We apply FastDMA to three large-scale DNA methylation datasets from The Cancer Genome Atlas (TCGA) and find many differentially methylated genomic sites in different types of cancer. On the testing datasets, FastDMA shows much higher computational efficiency than current tools. FastDMA can benefit the data analyses of large-scale DNA methylation studies with an integrative pipeline and a high computational efficiency. The software is freely available via http://bioinfo.au.tsinghua.edu.cn/software/fastdma/.

22 citations


Journal Article
TL;DR: The current computational approaches for active enhancer prediction are surveyed and future directions are discussed.

Book
01 Nov 2013
TL;DR: Results for three well-studied genome rearrangement problems are sketched under a simplified model that asks to minimize the number of rearrangements and does not take duplicated genes into consideration.
Abstract: Abstractly, a genome with n genes could be represented by a permutation of n numbers, with each number representing a gene. Two genomes from different species with the same set of genes in different orders could be represented by two permutations. The problem of genome rearrangement is to find the minimum number of rearrangement operations needed to transform one permutation into the other. Some rearrangements would change the direction of genes in genomes. To further capture the reality of genome rearrangement, each number in the permutation could be led by a sign, ±, to reflect the direction of a gene in the genome. Table 5.3 shows a toy example. In this example, two genomes are represented by the permutations 3, 5, 4, 2, 1 and 1, 2, 3, 4, 5, and the rearrangement operation is reversal, which cuts out a consecutive segment of the permutation, reverses the direction of this segment by reversing the order and the signs of the numbers in the segment, and then pastes this segment back.

Table 5.3 Example of reversal
0 3 5 4 2 1 6
0 1 2 4 5 3 6
0 1 2 3 5 4 6
0 1 2 3 5 4 6
0 1 2 3 4 5 6

This example also shows two conventions:
1. The leading 0 and ending 6 in each row: They are dummy genes. Their only function is to simplify the discussion so that the first and last numbers in the permutation can be handled easily.
2. The identity permutation in the last row: A simple substitution could replace the problem of transforming permutation π1 into π2 with the problem of transforming a permutation π into the identity permutation, where π comes from substituting every number and sign in π1 by another number and sign, guided by π2.

This simplified model asks to minimize the number of rearrangements. Nature, however, may not follow this parsimony strategy exactly. Even so, the parsimony result provides enough information about the relationship between the two genomes concerned. This model also doesn't take duplicated genes into consideration. If duplicated numbers are allowed, most of the corresponding problems become much harder. Different rearrangements correspond to different problems. Some of them have polynomial algorithms but some of them do not. In the following, we sketch the results of three well-studied problems. Besides the following three operations, there are several others: deletion, insertion, fusion, fission, transreversal, block interchange, etc. Various combinations of these operations were considered too.
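The reversal operation described above can be written as a short sketch. The function name `reversal` and the list encoding (signed integers, with the dummy genes 0 and 6 kept at the ends) are assumptions made for illustration, and all genes in the starting row are taken as positive because the signs of Table 5.3 are not shown.

```python
def reversal(perm, i, j):
    """Reverse perm[i..j] (inclusive) and flip the sign of each element.

    Flipping the sign models the change of gene direction when the segment
    is cut out, turned around and pasted back.
    """
    segment = [-gene for gene in reversed(perm[i:j + 1])]
    return perm[:i] + segment + perm[j + 1:]


# First row of the toy example, with the dummy genes 0 and 6 at the ends.
start = [0, 3, 5, 4, 2, 1, 6]
after_one = reversal(start, 1, 5)
```

Here `after_one` is `[0, -1, -2, -4, -5, -3, 6]`: the order of the inner segment is reversed and every sign is flipped. Applying the same reversal again restores `start`.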

Journal Article
TL;DR: ProteoMirExpress, a web server that integrates proteomic and mRNA expression data to infer miRNA-centered regulatory networks, is presented; combining the two data types yields more comprehensive networks.

Journal Article
TL;DR: It is found that promoters that are active in early developmental stages tend to be CG rich and mainly engage H3K27me3 upon silencing in non-expressing lineages, while promoters for genes expressed preferentially at later stages are often CG poor and employ DNA methylation upon repression.
Abstract: Epigenetic mechanisms have been proposed as crucial for regulating mammalian development, but their precise function is only partially understood. To investigate the epigenetic control of embryonic development, we differentiated human embryonic stem cells into mesendoderm, neural progenitor cells, trophoblast-like cells, and mesenchymal stem cells and systematically characterized DNA methylation, chromatin modifications, and the transcriptome in each lineage. Strikingly, we found that promoters that are active in early developmental stages tend to be CG rich and mainly engage H3K27me3 upon silencing in non-expressing lineages. By contrast, promoters for genes expressed preferentially at later stages are often CG poor and employ DNA methylation upon repression. Interestingly, the early developmental regulatory genes are often located in large genomic domains that are generally devoid of DNA methylation in most lineages, which we termed DNA methylation valleys (DMVs). Our results suggest that distinct epigenetic mechanisms regulate early and late stages of ES cell differentiation.

Journal Article
TL;DR: Computational analyses for further investigating the relation between CSCs and NSCCs through population modeling of cancer cells indicate that cell-state conversions in cancer play an important role in effectively keeping the heterogeneity in the population of cancer cells.
Abstract: Cancer stem cell (CSC) theory suggests a cell-lineage structure in cancer in which CSCs are capable of giving rise to the other non-stem cancer cells (NSCCs) but not vice versa. However, an alternative scenario of bidirectional interconversions between CSCs and NSCCs was proposed in [Gupta PB, et al. (2011) Cell 146: 633-644]. Here we present computational analyses for further investigating the relation between CSCs and NSCCs through population modeling of cancer cells, where not only can CSCs differentiate into NSCCs by asymmetric cell division, NSCCs can also dedifferentiate into CSCs by cell state conversion. By validating our model with recent experimental data, it is shown that the conversion from NSCCs to CSCs explains the transient increase in the proportion of CSCs initiated from the purified NSCCs subpopulation. Our results indicate that cell-state conversions in cancer play important roles in effectively keeping the heterogeneity in the population of cancer cells.

Book Chapter
01 Nov 2013
TL;DR: Genomics began with large-scale sequencing of the human and many model organism genomes around 1990; rapid accumulation of vast genomic data brings a great challenge on how to decipher such massive molecular information.
Abstract: Genomics began with large-scale sequencing of the human and many model organism genomes around 1990; rapid accumulation of vast genomic data brings a great challenge on how to decipher such massive molecular information. Like bioinformatics in general, genome informatics is also data driven; many computational tools can soon become obsolete when new technologies and data types become available. With this in mind, a student who wants to work in this fascinating new field must be able to adapt quickly and to “shoot the moving targets” with the “just-in-time ammunition.”

Journal Article
TL;DR: A set of genes for which the exonic allele-specific methylation patterns are correlated with their allele-specific expression levels is identified, which may shed light on the function of non-coding nucleotide variations, DNA methylation polymorphism and inter-individual differences.
Abstract: Whole-genome DNA methylation sequencing provides both methylation patterns and genetic information. We utilized base resolution methylomes to directly identify allelic linkage of DNA methylation and genomic variants. The paired association was further extended to construct hepitypes by the simultaneous phasing of genotype and methylation. Using this approach, the sequencing reads provide direct statistics of the interdependence between methylcytosines and nucleotide variations; consequently, the detailed patterns of genetic and epigenetic variations can be readily inferred from the data. Moreover, the analysis is not limited to known single nucleotide variants. We demonstrate the utility of our method by identifying methylation sites that were strongly associated with genetic variations in the human genome using H1 and IMR90 methylomes. In addition to imprinted regions and SNV-in-CpG sites, we show numerous cis-regulatory sequence-associated DNA methylation sites. We next extended this strategy to incorporate multiple nucleotide and methylation sites and ranked hepitypes according to the observed frequency. The top-ranked hepitypes indicate that methylated sites are often observed from the same allele. Moreover, we used the informative nucleotide variants in both methylation and expression data of the same cell lines to investigate the extent of allele-specific gene regulation. We identified a set of genes for which the exonic allele-specific methylation patterns are correlated with their allele-specific expression levels. Allele-specific methylation plays an essential role in allelic regulation. Our finding may shed light on the function of non-coding nucleotide variations, DNA methylation polymorphism and inter-individual differences. This approach is applicable to any large-scale differential methylation studies and can integrate various types of high-throughput sequencing data.