
Showing papers by "Jussi Taipale" published in 2017


Journal ArticleDOI
05 May 2017-Science
TL;DR: This work systematically analyzed binding specificities of full-length transcription factors and extended DNA binding domains to unmethylated and CpG-methylated DNA by using methylation-sensitive SELEX (systematic evolution of ligands by exponential enrichment).
Abstract: INTRODUCTION Nearly all cells in the human body share the same primary genome sequence consisting of four nucleotide bases. One of the bases, cytosine, is commonly modified by methylation of its 5 position in CpG dinucleotides (mCpG). Most CpG dinucleotides in the human genome are methylated, but the level of CpG methylation varies with genetic location (promoter versus gene body), whether genes are active versus silenced, and cell type. Research has shown that the maintenance of a particular cellular state after cell division is dependent on faithful transmission of methylated CpGs, as well as inheritance of the mother cells’ repertoire of transcription factors by the daughter cells. These two mechanisms of epigenetic inheritance are linked to each other; the binding of transcription factors can be affected by cytosine methylation, and cytosine methylation can, in turn, be added or removed by proteins that associate with transcription factors. RATIONALE The genetic and epigenetic language, which imparts when and where genes are expressed, is understood at a conceptual level. However, a more detailed understanding is needed of the genomic regulatory mechanism by which methylated cytosines affect transcription factor binding. Because cytosine methylation changes DNA structure, it has the potential to affect binding of all transcription factors. However, a systematic analysis of binding of a large collection of transcription factors to all possible DNA sequences has not previously been conducted. RESULTS To globally characterize the effect of cytosine methylation on transcription factor binding, we systematically analyzed binding specificities of full-length transcription factors and extended DNA binding domains to unmethylated and CpG-methylated DNA by using methylation-sensitive SELEX (systematic evolution of ligands by exponential enrichment). 
We evaluated binding of 542 transcription factors and identified a large number of previously uncharacterized transcription factor recognition motifs. Binding of most major classes of transcription factors, including bHLH, bZIP, and ETS, was inhibited by mCpG. In contrast, transcription factors such as homeodomain, POU, and NFAT proteins preferred to bind methylated DNA. This class of binding was enriched in factors with central roles in embryonic and organismal development. The observed binding preferences were validated using several orthogonal methods, including bisulfite-SELEX and protein-binding microarrays. In addition, the preference of the pluripotency factor OCT4 to bind to a mCpG-containing motif was confirmed by chromatin immunoprecipitation analysis in mouse embryonic stem cells with low or high levels of CpG methylation (due to deficiency in all enzymes that methylate cytosines or contribute to their removal, respectively). Crystal structure analysis of the homeodomain proteins HOXB13, CDX1, CDX2, and LHX4 revealed three key residues that contribute to the preference of this developmentally important family of transcription factors for mCpG. The preference for binding to mCpG was due to direct hydrophobic interactions with the 5-methyl group of methylcytosine. In contrast, inhibition of binding of other transcription factors to methylated sequences was found to be caused by steric hindrance. CONCLUSION Our work constitutes a global analysis of the effect of cytosine methylation on DNA binding specificities of human transcription factors. CpG methylation can influence binding of most transcription factors to DNA—in some cases negatively and in others positively. Our finding that many developmentally important transcription factors prefer to bind to mCpG sites can inform future analyses of the role of DNA methylation on cell differentiation, chromatin reprogramming, and transcriptional regulation.

846 citations


Journal ArticleDOI
13 Apr 2017-Nature
TL;DR: By studying the reprogramming of mouse fibroblasts to neurons, it is found that the pan neuron-specific transcription factor Myt1-like (Myt1l) exerts its pro-neuronal function by direct repression of many different somatic lineage programs except the neuronal program.
Abstract: Normal differentiation and induced reprogramming require the activation of target cell programs and silencing of donor cell programs. In reprogramming, the same factors are often used to reprogram many different donor cell types. As most developmental repressors, such as RE1-silencing transcription factor (REST) and Groucho (also known as TLE), are considered lineage-specific repressors, it remains unclear how identical combinations of transcription factors can silence so many different donor programs. Distinct lineage repressors would have to be induced in different donor cell types. Here, by studying the reprogramming of mouse fibroblasts to neurons, we found that the pan neuron-specific transcription factor Myt1-like (Myt1l) exerts its pro-neuronal function by direct repression of many different somatic lineage programs except the neuronal program. The repressive function of Myt1l is mediated via recruitment of a complex containing Sin3b by binding to a previously uncharacterized N-terminal domain. In agreement with its repressive function, the genomic binding sites of Myt1l are similar in neurons and fibroblasts and are preferentially in an open chromatin configuration. The Notch signalling pathway is repressed by Myt1l through silencing of several members, including Hes1. Acute knockdown of Myt1l in the developing mouse brain mimicked a Notch gain-of-function phenotype, suggesting that Myt1l allows newborn neurons to escape Notch activation during normal development. Depletion of Myt1l in primary postmitotic neurons de-repressed non-neuronal programs and impaired neuronal gene expression and function, indicating that many somatic lineage programs are actively and persistently repressed by Myt1l to maintain neuronal identity. 
It is now tempting to speculate that similar 'many-but-one' lineage repressors exist for other cell fates; such repressors, in combination with lineage-specific activators, would be prime candidates for use in reprogramming additional cell types.

165 citations


Journal ArticleDOI
TL;DR: An overview of the structural basis of the different mechanisms by which TFs can cooperate is presented, focusing on insight from recent functional studies and structural analyses of specific TF-TF-DNA complexes.

159 citations


Journal ArticleDOI
TL;DR: This work resequenced data from previously published HT‐SELEX experiments, the most extensive mammalian TF–DNA binding data available to date, to reveal the nucleotide position‐dependent DNA shape readout in TF‐binding sites and the TF family‐specific position dependence.
Abstract: Transcription factors (TFs) achieve DNA-binding specificity through contacts with functional groups of bases (base readout) and readout of structural properties of the double helix (shape readout). Currently, it remains unclear whether DNA shape readout is utilized by only a few selected TF families, or whether this mechanism is used extensively by most TF families. We resequenced data from previously published HT-SELEX experiments, the most extensive mammalian TF-DNA binding data available to date. Using these data, we demonstrated the contributions of DNA shape readout across diverse TF families and its importance in core motif-flanking regions. Statistical machine-learning models combined with feature-selection techniques helped to reveal the nucleotide position-dependent DNA shape readout in TF-binding sites and the TF family-specific position dependence. Based on these results, we proposed novel DNA shape logos to visualize the DNA shape preferences of TFs. Overall, this work suggests a way of obtaining mechanistic insights into TF-DNA binding without relying on experimentally solved all-atom structures.

95 citations


Journal ArticleDOI
TL;DR: The findings broaden the scope of phenotypes caused by mutations in NFKB1 and suggest that a subset of autoinflammatory diseases, such as Behçet disease, can be caused by rare monogenic variants in genes of the NF-κB pathway.
Abstract: Background The nuclear factor κ light-chain enhancer of activated B cells (NF-κB) signaling pathway is a key regulator of immune responses. Accordingly, mutations in several NF-κB pathway genes cause immunodeficiency. Objective We sought to identify the cause of disease in 3 unrelated Finnish kindreds with variable symptoms of immunodeficiency and autoinflammation. Methods We applied genetic linkage analysis and next-generation sequencing and functional analyses of NFKB1 and its mutated alleles. Results In all affected subjects we detected novel heterozygous variants in NFKB1, encoding for p50/p105. Symptoms in variant carriers differed depending on the mutation. Patients harboring a p.I553M variant presented with antibody deficiency, infection susceptibility, and multiorgan autoimmunity. Patients with a p.H67R substitution had antibody deficiency and experienced autoinflammatory episodes, including aphthae, gastrointestinal disease, febrile attacks, and small-vessel vasculitis characteristic of Behcet disease. Patients with a p.R157X stop-gain experienced hyperinflammatory responses to surgery and showed enhanced inflammasome activation. In functional analyses the p.R157X variant caused proteasome-dependent degradation of both the truncated and wild-type proteins, leading to a dramatic loss of p50/p105. The p.H67R variant reduced nuclear entry of p50 and showed decreased transcriptional activity in luciferase reporter assays. The p.I553M mutation in turn showed no change in p50 function but exhibited reduced p105 phosphorylation and stability. Affinity purification mass spectrometry also demonstrated that both missense variants led to altered protein-protein interactions. Conclusion Our findings broaden the scope of phenotypes caused by mutations in NFKB1 and suggest that a subset of autoinflammatory diseases, such as Behcet disease, can be caused by rare monogenic variants in genes of the NF-κB pathway.

85 citations


Posted ContentDOI
28 Dec 2017-bioRxiv
TL;DR: This work reveals striking differences in TF binding to free and nucleosomal DNA, and uncovers a rich interaction landscape between the TFs and the nucleosome.
Abstract: Nucleosomes cover most of the genome and are thought to be displaced by transcription factors (TFs) in regions that direct gene expression. However, the modes of interaction between TFs and nucleosomal DNA remain largely unknown. Here, we use nucleosome consecutive affinity-purification systematic evolution of ligands by exponential enrichment (NCAP-SELEX) to systematically explore interactions between the nucleosome and 220 TFs representing diverse structural families. Consistent with earlier observations, we find that the vast majority of TFs have less access to nucleosomal DNA than to free DNA. The motifs recovered from TFs bound to nucleosomal and free DNA are generally similar; however, steric hindrance and scaffolding by the nucleosome result in specific positioning and orientation of the motifs. Many TFs preferentially bind close to the end of nucleosomal DNA, or to periodic positions at its solvent-exposed side. TFs often also bind nucleosomal DNA in a particular orientation, because the nucleosome breaks the local rotational symmetry of DNA. Some TFs also specifically interact with DNA located at the dyad position where only one DNA gyre is wound, whereas other TFs prefer sites spanning two DNA gyres and bind specifically to each of them. Our work reveals striking differences in TF binding to free and nucleosomal DNA, and uncovers a rich interaction landscape between the TFs and the nucleosome.

74 citations


Journal ArticleDOI
TL;DR: A large-scale PWAS study is performed to comprehensively characterize the TF-binding landscape associated with CRC, identifying 731 allele-specific TF binding events at 116 CRC risk loci.
Abstract: Genome-wide association studies have identified a great number of non-coding risk variants for colorectal cancer (CRC). To date, the majority of these variants have not been functionally studied. Identification of allele-specific transcription factor (TF) binding is of great importance to understand the regulatory consequences of such variants. A recently developed proteome-wide analysis of disease-associated SNPs (PWAS) enables identification of TF-DNA interactions in an unbiased manner. Here we perform a large-scale PWAS study to comprehensively characterize the TF-binding landscape associated with CRC, identifying 731 allele-specific TF binding events at 116 CRC risk loci. This screen identifies the A-allele of rs1800734 within the promoter region of MLH1 as perturbing the binding of TFAP4 and consequently increasing DCLK3 expression through a long-range interaction, which promotes cancer malignancy through enhancing expression of the genes related to epithelial-to-mesenchymal transition.

55 citations


Journal ArticleDOI
TL;DR: Random Sequence Labels (RSLs) are incorporated into the guide library, which act as unique molecular identifiers (UMIs) to allow massively parallel lineage tracing and lineage dropout screening.
Abstract: Loss-of-function screening by CRISPR/Cas9 gene knockout with pooled, lentiviral guide libraries is a widely applicable method for systematic identification of genes contributing to diverse cellular phenotypes. Here, Random Sequence Labels (RSLs) are incorporated into the guide library, which act as unique molecular identifiers (UMIs) to allow massively parallel lineage tracing and lineage dropout screening. RSLs greatly improve the reproducibility of results by increasing both the precision and the accuracy of screens. They reduce the number of cells needed to reach a set statistical power, or allow a more robust screen using the same number of cells.
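
The lineage-dropout idea in this abstract can be illustrated with a small sketch. This is not the authors' pipeline: it assumes each sequencing read has already been reduced to a hypothetical `(guide, rsl)` pair, treats every distinct RSL seen with a guide as one lineage, and reports the fraction of starting lineages per guide that are absent at the end point. The function names and the `min_reads` threshold are illustrative assumptions.

```python
from collections import defaultdict


def lineages_per_guide(reads, min_reads=1):
    """Group (guide, rsl) reads and return the set of RSL lineages per guide.

    An RSL counts as a lineage only if it is supported by at least
    `min_reads` reads (a hypothetical noise filter).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for guide, rsl in reads:
        counts[guide][rsl] += 1
    return {
        guide: {rsl for rsl, n in rsls.items() if n >= min_reads}
        for guide, rsls in counts.items()
    }


def lineage_dropout(reads_start, reads_end, min_reads=1):
    """Fraction of starting lineages per guide that are missing at the end."""
    start = lineages_per_guide(reads_start, min_reads)
    end = lineages_per_guide(reads_end, min_reads)
    dropout = {}
    for guide, rsls in start.items():
        if not rsls:
            continue  # no lineage passed the filter at the start
        surviving = rsls & end.get(guide, set())
        dropout[guide] = 1.0 - len(surviving) / len(rsls)
    return dropout


# Toy data: guide g1 loses three of four lineages, guide g2 loses none.
start_reads = [("g1", "AAA"), ("g1", "CCC"), ("g1", "GGG"), ("g1", "TTT"),
               ("g2", "AAA"), ("g2", "CCC")]
end_reads = [("g1", "AAA"), ("g2", "AAA"), ("g2", "CCC"), ("g2", "CCC")]
result = lineage_dropout(start_reads, end_reads)
```

Counting lineages rather than raw reads is what lets each RSL act as an internal replicate: one expanded clone cannot mask the loss of the others.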

54 citations


Journal ArticleDOI
06 Jun 2017-eLife
TL;DR: It is shown that a 538 kb deletion of the entire MYC upstream super-enhancer region in mice results in 50% to 80% decrease in Myc expression in multiple tissues, and indicates that targeting the activity of this element is a promising strategy of cancer chemoprevention and therapy.
Abstract: The gene desert upstream of the MYC oncogene on chromosome 8q24 contains susceptibility loci for several major forms of human cancer. The region shows high conservation between human and mouse and contains multiple MYC enhancers that are activated in tumor cells. However, the role of this region in normal development has not been addressed. Here we show that a 538 kb deletion of the entire MYC upstream super-enhancer region in mice results in 50% to 80% decrease in Myc expression in multiple tissues. The mice are viable and show no overt phenotype. However, they are resistant to tumorigenesis, and most normal cells isolated from them grow slowly in culture. These results reveal that only cells whose MYC activity is increased by serum or oncogenic driver mutations depend on the 8q24 super-enhancer region, and indicate that targeting the activity of this element is a promising strategy of cancer chemoprevention and therapy.

52 citations


Journal ArticleDOI
TL;DR: It is found that tumor and normal cells are differentially sensitive to loss of the histone genes transcriptional regulator CASP8AP2, indicating that nucleosome depletion is sensed in normal cells via a DNA-damage-like response that is defective in tumor cells.
Abstract: To identify cell cycle regulators that enable cancer cells to replicate DNA and divide in an unrestricted manner, we performed a parallel genome-wide RNAi screen in normal and cancer cell lines. In addition to many shared regulators, we found that tumor and normal cells are differentially sensitive to loss of the histone genes transcriptional regulator CASP8AP2. In cancer cells, loss of CASP8AP2 leads to a failure to synthesize sufficient amount of histones in the S-phase of the cell cycle, resulting in slowing of individual replication forks. Despite this, DNA replication fails to arrest, and tumor cells progress in an elongated S-phase that lasts several days, finally resulting in death of most of the affected cells. In contrast, depletion of CASP8AP2 in normal cells triggers a response that arrests viable cells in S-phase. The arrest is dependent on p53, and preceded by accumulation of markers of DNA damage, indicating that nucleosome depletion is sensed in normal cells via a DNA-damage -like r...

27 citations


Posted ContentDOI
06 Mar 2017-bioRxiv
TL;DR: Random sequence labels (RSLs) were incorporated into the guide-library to allow massively parallel lineage tracing (MPLT) and true dropout screening of CRISPR/Cas9 knockout genes.
Abstract: Loss of function screening by CRISPR/Cas9 gene knockout with pooled, lentiviral guide libraries is a widely applicable method for systematic identification of genes contributing to diverse cellular phenotypes. Here, random sequence labels (RSLs) were incorporated into the guide-library. RSLs function as internal replicates for robust and reproducible hit calling, and act as unique molecular identifiers (UMIs) to allow massively parallel lineage tracing (MPLT) and true dropout screening.

Journal ArticleDOI
TL;DR: A statistical model for the somatic background indel mutation rate of microsatellites and estimated clonality of mutations determined the most likely MSI target genes to be the aminoadipate-semialdehyde dehydrogenase AASDH and the solute transporter SLC9A8.
Abstract: Approximately 15% of colorectal cancers exhibit microsatellite instability (MSI), which leads to accumulation of large numbers of small insertions and deletions (indels). Genes that provide growth advantage to cells via loss-of-function mutations in microsatellites are called MSI target genes. Several criteria to define these genes have been suggested, one of them being simple mutation frequency. Microsatellite mutation rate, however, depends on the length and nucleotide context of the microsatellite. Therefore, assessing the general impact of mismatch repair deficiency on the likelihood of mutation events is paramount when following this approach. To identify MSI target genes, we developed a statistical model for the somatic background indel mutation rate of microsatellites to assess mutation significance. Exome sequencing data of 24 MSI colorectal cancers revealed indels at 54 million mononucleotide microsatellites of three or more nucleotides in length. The top 105 microsatellites from 71 genes were further analyzed in 93 additional MSI colorectal cancers. Mutation significance and estimated clonality of mutations determined the most likely MSI target genes to be the aminoadipate-semialdehyde dehydrogenase AASDH and the solute transporter SLC9A8. Our findings offer a systematic profiling of the somatic background mutation rate in protein-coding mononucleotide microsatellites, allowing a full cataloging of the true targets of MSI in colorectal cancer. Cancer Res; 77(15); 4078-88. ©2017 AACR.

Journal ArticleDOI
TL;DR: This work characterize the first MED12 5′ end nonsense mutation identified in acute lymphoblastic leukemia and show that it escapes nonsense‐mediated mRNA decay (NMD) by using an alternative translation initiation site.
Abstract: MED12 is a key component of the transcription-regulating Mediator complex. Specific missense and in-frame insertion/deletion mutations in exons 1 and 2 have been identified in uterine leiomyomas, breast tumors, and chronic lymphocytic leukemia. Here, we characterize the first MED12 5' end nonsense mutation (c.97G>T, p.E33X) identified in acute lymphoblastic leukemia and show that it escapes nonsense-mediated mRNA decay (NMD) by using an alternative translation initiation site. The resulting N-terminally truncated protein is unable to enter the nucleus because it lacks the identified nuclear localization signal (NLS). The absence of the NLS prevents the mutant MED12 protein from being recognized by importin-α and subsequently loaded into the nuclear pore complex. Due to this mislocalization, all interactions between the MED12 mutant and other Mediator components are lost. Our findings provide new mechanistic insights into MED12 functions and indicate that somatic nonsense mutations in early exons may avoid NMD.

Posted ContentDOI
25 Aug 2017-bioRxiv
TL;DR: It is reported here that genome editing by CRISPR/Cas9 induces a p53-mediated DNA damage response and cell cycle arrest, and that transient inhibition of p53 prevents this response and increases the rate of homologous recombination more than five-fold.
Abstract: We report here that genome editing by CRISPR/Cas9 induces a p53-mediated DNA damage response and cell cycle arrest. Transient inhibition of p53 prevents this response, and increases the rate of homologous recombination more than five-fold. This provides a way to improve precision genome editing of normal cells, but warrants caution in using CRISPR for human therapies until the mechanism of the activation of p53 is elucidated.

Posted ContentDOI
17 Oct 2017-bioRxiv
TL;DR: A novel massively parallel protein activity assay, Active TF Identification (ATI), is developed that can identify DNA-binding activity of all TFs from any species or tissue type, and finds that a set of TFs binding to only around ten distinct motifs display strong DNA-binding activity in any given cell or tissue type.
Abstract: It is well established that transcription factors (TFs) play crucial roles in determining cell identity, and that a large fraction of all TFs are expressed in most cell types. In order to globally characterize activities of TFs in cells, we have developed a novel massively parallel protein activity assay, Active TF Identification (ATI) that measures DNA-binding activity of all TFs from any species or tissue type. In contrast to previous studies based on mRNA expression or protein abundance, we found that a set of TFs binding to only around ten distinct motifs display strong DNA-binding activity in any given cell or tissue type. Mass spectrometric identification of TFs revealed that within these highly active TFs, there were both housekeeping TFs, which were universally found in all cell types, and specific TFs, which were highly enriched in known factors that determine the fate of the analyzed tissue or cell type. The importance of a small subset of TFs for determining the overall accessible chromatin landscape of a cell suggests that gene regulatory logic may be simpler than what has previously been appreciated.

Posted ContentDOI
07 Nov 2017-bioRxiv
TL;DR: The structures of human HOXB13 and CDX2 bound to their two optimal DNA sequences, CAATAAA and TCGTAAA, revealed that both sites were bound with similar ΔG, but the interaction with the CAA sequence was driven by change in enthalpy, whereas the TCG site was bound with similar affinity due to a smaller loss of entropy.
Abstract: Most transcription factors (TFs) are thought to recognize a single optimal sequence, with mutations to this sequence decreasing binding in a multiplicative manner. However, some TFs display epistasis, with multiple substitutions to their optimal site alleviating each other's effects. These TFs can bind to two distinct sequences that represent two local optima in the Gibbs free energy of binding (ΔG). To determine the molecular mechanism behind this effect, we solved the structures of human HOXB13 and CDX2 proteins bound to their two optimal DNA sequences, CAATAAA and TCGTAAA. Striking differences were observed in the recognition of the distinct part of the sequence. Whereas the CAA trinucleotide was recognized by a network of direct and water-mediated hydrogen bonds, no such interactions were visible in the TCG structures of both TFs, suggesting that the solvent molecules at the interface were disordered. Thermodynamic analyses by isothermal titration calorimetry (ITC) revealed that both sites were bound with similar ΔG. However, the interaction with the CAA sequence was driven by change in enthalpy (ΔH), whereas the TCG site was bound with similar affinity due to a smaller loss of entropy (ΔS). Additional analysis of BARHL2 and MYF5 using ITC confirmed the presence of entropic and enthalpic optima also for these TFs that can recognize two distinct sequences. The common presence of at least two local optima is general to all macromolecular interactions, as ΔG depends on two partially independent variables, ΔH and ΔS, according to the central equation of thermodynamics, ΔG = ΔH - TΔS.
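
To make the relation ΔG = ΔH - TΔS concrete, the sketch below evaluates it for two binding modes. The numbers are invented for illustration and are not measurements from the paper; they are chosen only so that an enthalpy-driven mode and a mode with a smaller entropy loss arrive at the same ΔG. The function names and the 300 K temperature are likewise assumptions.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol*K)


def gibbs_free_energy(delta_h, delta_s, temperature):
    """Return ΔG = ΔH - TΔS, with ΔH in kJ/mol and ΔS in kJ/(mol*K)."""
    return delta_h - temperature * delta_s


def dissociation_constant(delta_g, temperature):
    """Return K_d (in M, 1 M standard state) from a binding ΔG in kJ/mol."""
    return math.exp(delta_g / (R_KJ * temperature))


T = 300.0  # K, illustrative

# Hypothetical enthalpy-driven mode: large favourable ΔH, large entropy loss.
enthalpic = gibbs_free_energy(delta_h=-60.0, delta_s=-0.05, temperature=T)

# Hypothetical entropy-favoured mode: weaker ΔH, smaller entropy loss.
entropic = gibbs_free_energy(delta_h=-51.0, delta_s=-0.02, temperature=T)

kd_enthalpic = dissociation_constant(enthalpic, T)
kd_entropic = dissociation_constant(entropic, T)
```

With these inputs both modes give a ΔG of about -45 kJ/mol, so their dissociation constants match even though the underlying ΔH and ΔS differ. That is the sense in which two sequences can sit at separate optima with similar affinity.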

Proceedings ArticleDOI
01 Jan 2017
TL;DR: A novel seed-driven algorithm SeedHam for PPM learning that decreases contamination from artefact instances of the motif and thereby allows using larger Hamming neighbourhoods, and a novel seed finding rule, based on analysis of the palindromic structure of sequences is proposed.
Abstract: We formulate and analyze a novel seed-driven algorithm SeedHam for PPM learning. To learn a PPM of length ℓ, the algorithm uses the most frequent ℓ-mer of the training data as a seed, and then restricts the learning into the ℓ-mers of training data that belong to a Hamming neighbourhood of the seed. The PPM is constructed from background corrected counts of such ℓ-mers using an algorithm that estimates a product of ℓ categorical distributions from a (non-uniform) Hamming sample. The SeedHam method is intended for PPM learning from large sequence sets (up to hundreds of Mbases) containing enriched motif instances. A variant of the method is introduced that decreases contamination from artefact instances of the motif and thereby allows using larger Hamming neighbourhoods. To partially solve the motif orientation problem in two-stranded DNA we propose a novel seed finding rule, based on analysis of the palindromic structure of sequences. Test experiments are reported, that illustrate the relative strengths of different variants of our methods, and show that our algorithm outperforms two popular earlier methods. Availability and implementation: A C++ implementation of the method is available from https://github.com/jttoivon/seedham/ Contact: jarkko.toivonen@cs.helsinki.fi 1998 ACM Subject Classification I.2.6 Learning, G.2.1 Combinatorics, I.5.1 Models, J.3 Life and Medical Sciences
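
The seed-and-neighbourhood step described in this abstract can be sketched as follows. This is a simplified illustration, not the SeedHam implementation (the authors' C++ code is at the URL above). It takes the most frequent k-mer as the seed and builds column frequencies from k-mers within a Hamming radius, but it swaps in a plain pseudocount for the paper's background correction and its estimator for non-uniform Hamming samples, and it ignores the reverse strand. All function and parameter names are hypothetical.

```python
from collections import Counter

ALPHABET = "ACGT"


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def count_kmers(sequences, k):
    """Count every k-mer made up only of A, C, G and T."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set(ALPHABET):
                counts[kmer] += 1
    return counts


def learn_ppm(sequences, k, radius=1, pseudocount=1.0):
    """Return (seed, ppm); ppm is a list of k dicts mapping base -> probability.

    Simplification: raw neighbourhood counts plus a pseudocount stand in for
    the background-corrected estimation described in the abstract.
    """
    counts = count_kmers(sequences, k)
    if not counts:
        raise ValueError("no valid k-mers found")
    seed, _ = counts.most_common(1)[0]
    columns = [{base: pseudocount for base in ALPHABET} for _ in range(k)]
    for kmer, n in counts.items():
        if hamming(kmer, seed) <= radius:
            for pos, base in enumerate(kmer):
                columns[pos][base] += n
    ppm = []
    for col in columns:
        total = sum(col.values())
        ppm.append({base: col[base] / total for base in ALPHABET})
    return seed, ppm


seed, ppm = learn_ppm(["ACGT", "ACGT", "ACGT", "ACGA"], k=4, radius=1)
```

Restricting the counts to a Hamming ball around the seed is what keeps unrelated k-mers out of the matrix. The cost is that the sample is biased toward the seed, which is the bias the paper's estimator is designed to correct and this sketch does not.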