Author

Kohji Okamura

Bio: Kohji Okamura is an academic researcher from the University of Tokyo. The author has contributed to research on DNA methylation and CpG sites. The author has an h-index of 20 and has co-authored 67 publications that have received 5836 citations. Previous affiliations of Kohji Okamura include Ochanomizu University and The Centre for Applied Genomics.


Papers
Journal Article
23 Nov 2006-Nature
TL;DR: A first-generation CNV map of the human genome is constructed through the study of 270 individuals from four populations with ancestry in Europe, Africa or Asia, underscoring the importance of CNV in genetic diversity and evolution and the utility of this resource for genetic disease studies.
Abstract: Copy number variation (CNV) of DNA sequences is functionally significant but has yet to be fully ascertained. We have constructed a first-generation CNV map of the human genome through the study of 270 individuals from four populations with ancestry in Europe, Africa or Asia (the HapMap collection). DNA from these individuals was screened for CNV using two complementary technologies: single-nucleotide polymorphism (SNP) genotyping arrays, and clone-based comparative genomic hybridization. A total of 1,447 copy number variable regions (CNVRs), which can encompass overlapping or adjacent gains or losses, covering 360 megabases (12% of the genome) were identified in these populations. These CNVRs contained hundreds of genes, disease loci, functional elements and segmental duplications. Notably, the CNVRs encompassed more nucleotide content per genome than SNPs, underscoring the importance of CNV in genetic diversity and evolution. The data obtained delineate linkage disequilibrium patterns for many CNVs, and reveal marked variation in copy number among populations. We also demonstrate the utility of this resource for genetic disease studies.
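
The abstract describes CNVRs as regions that can encompass overlapping or adjacent gains or losses. A minimal sketch of that idea, assuming each CNV call is a `(chrom, start, end)` tuple and that "adjacent" means within a chosen gap; the function name `merge_cnvs` and the `max_gap` parameter are hypothetical and are not taken from the paper:

```python
def merge_cnvs(calls, max_gap=0):
    """Merge CNV calls that overlap, or lie within max_gap bp, into regions.

    calls: iterable of (chrom, start, end) tuples (illustrative format).
    max_gap: assumed adjacency tolerance in base pairs, not a value from the paper.
    """
    regions = []
    # Sorting tuples orders calls by chromosome, then by start coordinate.
    for chrom, start, end in sorted(calls):
        if regions and regions[-1][0] == chrom and start <= regions[-1][2] + max_gap:
            # Extend the current region to cover this call.
            last = regions[-1]
            regions[-1] = (chrom, last[1], max(last[2], end))
        else:
            # Start a new region.
            regions.append((chrom, start, end))
    return regions


calls = [("chr1", 100, 200), ("chr1", 150, 300), ("chr1", 400, 500), ("chr2", 100, 200)]
print(merge_cnvs(calls))
print(merge_cnvs(calls, max_gap=100))
```

With `max_gap=0` only overlapping or touching calls are merged; a larger gap also joins nearby calls, which is one possible reading of "adjacent".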

4,275 citations

Journal Article
TL;DR: Targeted demethylation of CpGs in regulatory regions and demethylation-dependent 1.7- to 50-fold upregulation of associated genes are demonstrated both in cell culture (embryonic stem cells, cancer cell lines, primary neural precursor cells) and in vivo in mouse fetuses.
Abstract: Despite the importance of DNA methylation in health and disease, technologies to readily manipulate methylation of specific sequences for functional analysis and therapeutic purposes are lacking. Here we adapt the previously described dCas9-SunTag for efficient, targeted demethylation of specific DNA loci. The original SunTag consists of ten copies of the GCN4 peptide separated by 5-amino-acid linkers. To achieve efficient recruitment of an anti-GCN4 scFv fused to the ten-eleven (TET) 1 hydroxylase, which induces demethylation, we changed the linker length to 22 amino acids. The system attains demethylation efficiencies >50% in seven out of nine loci tested. Four of these seven loci showed demethylation of >90%. We demonstrate targeted demethylation of CpGs in regulatory regions and demethylation-dependent 1.7- to 50-fold upregulation of associated genes both in cell culture (embryonic stem cells, cancer cell lines, primary neural precursor cells) and in vivo in mouse fetuses.

374 citations

Journal Article
TL;DR: Placental-specific imprinting provides evidence for an inheritable epigenetic state that is independent of DNA methylation and for the existence of a novel imprinting mechanism at these loci.
Abstract: Genomic imprinting is a form of epigenetic regulation that results in the expression of either the maternally or paternally inherited allele of a subset of genes (Ramowitz and Bartolomei 2011). This imprinted expression of transcripts is crucial for normal mammalian development. In humans, loss-of-imprinting of specific loci results in a number of diseases exemplified by the reciprocal growth phenotypes of the Beckwith-Wiedemann and Silver-Russell syndromes, and the behavioral disorders Angelman and Prader-Willi syndromes (Kagami et al. 2008; Buiting 2010; Choufani et al. 2010; Eggermann 2010; Kelsey 2010; Mackay and Temple 2010). In addition, aberrant imprinting also contributes to multigenic disorders associated with various complex traits and cancer (Kong et al. 2009; Monk 2010). Imprinted loci contain differentially methylated regions (DMRs) where cytosine methylation marks one of the parental alleles, providing cis-acting regulatory elements that influence the allelic expression of surrounding genes. Some DMRs acquire their allelic methylation during gametogenesis, when the two parental genomes are separated, resulting from the cooperation of the de novo methyltransferase DNMT3A and its cofactor DNMT3L (Bourc'his et al. 2001; Hata et al. 2002). These primary, or germline imprinted DMRs are stably maintained throughout somatic development, surviving the epigenetic reprogramming at the oocyte-to-embryo transition (Smallwood et al. 2011; Smith et al. 2012). To confirm that an imprinted DMR functions as an imprinting control region (ICR), disruption of the imprinted expression upon genetic deletion of that DMR, either through experimental targeting in mouse or that which occurs spontaneously in humans, is required. A subset of DMRs, known as secondary DMRs, acquire methylation during development and are regulated by nearby germline DMRs in a hierarchical fashion (Coombes et al. 2003; Lopes et al. 2003; Kagami et al. 2010). 
With the advent of large-scale, base-resolution methylation technologies, it is now possible to discriminate allelic methylation dictated by sequence variants from imprinted methylation. Yet our knowledge of the total number of imprinted DMRs in humans, and their developmental dynamics, remains incomplete, hampered by genetic heterogeneity of human samples. Here we present high-resolution mapping of human imprinted methylation. We performed whole-genome-wide bisulfite sequencing (WGBS) on leukocyte-, brain-, liver-, and placenta-derived DNA samples to identify partially methylated regions common to all tissues consistent with imprinted DMRs. We subsequently confirmed the partial methylated states in tissues using high-density methylation microarrays. The parental origin of methylation was determined by comparing microarray data for DNA samples from reciprocal genome-wide uniparental disomy (UPD) samples, in which all chromosomes are inherited from one parent (Lapunzina and Monk 2011), and androgenetic hydatidiform moles, which are created by the fertilization of an oocyte lacking a nucleus by a sperm that endoreduplicates. The use of uniparental disomies and hydatidiform moles meant that our analyses were not subjected to genotype influences, enabling us to characterize all known imprinted DMRs at base-pair resolution and to identify 21 imprinted domains, which we show are absent in mice. Lastly, we extended our analyses to determine the methylation profiles of all imprinted DMRs in sperm, stem cells derived from parthenogenetically activated metaphase-2 oocyte blastocytes (phES) (Mai et al. 2007; Harness et al. 2011), and stem cells (hES) generated from both six-cell blastomeres and the inner cell mass of blastocysts, delineating the extent of embryonic reprogramming that occurs at these loci during human development.

285 citations

Journal Article
TL;DR: The results illustrate the importance of constructing an ethnicity-specific reference genome for identifying rare variants; a Japanese-specific major allele reference genome was constructed, which increased the number of uniquely mapped short reads in the data by 0.045% on average.
Abstract: Whole-genome and -exome resequencing using next-generation sequencers is a powerful approach for identifying genomic variations that are associated with diseases. However, systematic strategies for prioritizing causative variants from many candidates to explain the disease phenotype are still far from being established, because the population-specific frequency spectrum of genetic variation has not been characterized. Here, we have collected exomic genetic variation from 1208 Japanese individuals through a collaborative effort, and aggregated the data into a prevailing catalog. In total, we identified 156 622 previously unreported variants. The majority (88.8%) had allele frequencies lower than 0.5% and were predicted to be functionally deleterious. In addition, we have constructed a Japanese-specific major allele reference genome, by which the number of uniquely mapped short reads in our data increased by 0.045% on average. Our results illustrate the importance of constructing an ethnicity-specific reference genome for identifying rare variants. All the collected data were centralized to a newly developed database to serve as useful resources for exploring pathogenic variations. Public access to the database is available at http://www.genome.med.kyoto-u.ac.jp/SnpDB/.

261 citations

Journal Article
TL;DR: Observations indicate that ZNF384-related fusion genes consist of a distinct subgroup of B-cell precursor acute lymphoblastic leukemia with a characteristic immunophenotype, while the clinical features depend on the functional properties of individual fusion partners.
Abstract: Fusion genes involving ZNF384 have recently been identified in B-cell precursor acute lymphoblastic leukemia, and 7 fusion partners have been reported. We further characterized this type of fusion gene by whole transcriptome sequencing and/or polymerase chain reaction. In addition to previously reported genes, we identified BMP2K as a novel fusion partner for ZNF384. Including the EP300-ZNF384 that we reported recently, the total frequency of ZNF384-related fusion genes was 4.1% in 291 B-cell precursor acute lymphoblastic leukemia patients enrolled in a single clinical trial, and TCF3-ZNF384 was the most recurrent, with a frequency of 2.4%. The characteristic immunophenotype of weak CD10 and aberrant CD13 and/or CD33 expression was revealed to be a common feature of the leukemic cells harboring ZNF384-related fusion genes. The signature gene expression profile in TCF3-ZNF384-positive patients was enriched in hematopoietic stem cell features and related to that of EP300-ZNF384-positive patients, but was significantly distinct from that of TCF3-PBX1-positive and ZNF384-fusion-negative patients. However, clinical features of TCF3-ZNF384-positive patients are markedly different from those of EP300-ZNF384-positive patients, exhibiting higher cell counts and a younger age at presentation. TCF3-ZNF384-positive patients revealed a significantly poorer steroid response and a higher frequency of relapse, and the additional activating mutations in RAS signaling pathway genes were detected by whole exome analysis in some of the cases. Our observations indicate that ZNF384-related fusion genes consist of a distinct subgroup of B-cell precursor acute lymphoblastic leukemia with a characteristic immunophenotype, while the clinical features depend on the functional properties of individual fusion partners.

149 citations


Cited by
Journal Article
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. 
Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

13,246 citations

Journal Article
TL;DR: This work presents Model-based Analysis of ChIP-Seq data, MACS, which analyzes data generated by short read sequencers such as Solexa's Genome Analyzer, and uses a dynamic Poisson distribution to effectively capture local biases in the genome, allowing for more robust predictions.
Abstract: We present Model-based Analysis of ChIP-Seq data, MACS, which analyzes data generated by short read sequencers such as Solexa's Genome Analyzer. MACS empirically models the shift size of ChIP-Seq tags, and uses it to improve the spatial resolution of predicted binding sites. MACS also uses a dynamic Poisson distribution to effectively capture local biases in the genome, allowing for more robust predictions. MACS compares favorably to existing ChIP-Seq peak-finding algorithms, and is freely available.
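
The "dynamic Poisson distribution" idea can be illustrated with a small sketch: rather than scoring every candidate peak against one genome-wide background rate, the expected count is taken as the largest of several local estimates, so regions with locally elevated background are scored more conservatively. This is an illustration under stated assumptions, not the MACS implementation; the function names, the window sizes, and the input format below are hypothetical.

```python
import math


def poisson_sf(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam), via the lower tail."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def local_lambda(genome_rate, control_counts, peak_width):
    """Expected tag count in a candidate peak under a locally adjusted background.

    genome_rate: background tags per bp across the genome (assumed input).
    control_counts: {window_bp: control tags in a window of that size
                     centred on the candidate} (assumed input format).
    """
    estimates = [genome_rate * peak_width]
    for window, count in control_counts.items():
        # Scale each window's count down to the width of the candidate peak.
        estimates.append(count * peak_width / window)
    return max(estimates)


def peak_p_value(chip_count, genome_rate, control_counts, peak_width):
    """Poisson p-value of observing at least chip_count tags in the peak."""
    lam = local_lambda(genome_rate, control_counts, peak_width)
    return poisson_sf(chip_count, lam)


# Same candidate scored with and without a locally enriched control window.
flat = peak_p_value(8, 0.001, {}, 200)
local = peak_p_value(8, 0.001, {1000: 10, 5000: 20}, 200)
print(flat < local)
```

Taking the maximum over estimates means a candidate is called significant only if it stands out against every background scale considered, which is the sense in which local biases are captured.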

13,008 citations

Journal Article
Fumio Tajima
30 Oct 1989-Genomics
TL;DR: It is suggested that the natural selection against large insertion/deletion is so weak that a large amount of variation is maintained in a population.

11,521 citations

Christopher M. Bishop
01 Jan 2006
TL;DR: Probability distributions and linear models for regression and classification are presented, along with a discussion of combining models, in the context of machine learning.
Abstract: Probability Distributions; Linear Models for Regression; Linear Models for Classification; Neural Networks; Kernel Methods; Sparse Kernel Machines; Graphical Models; Mixture Models and EM; Approximate Inference; Sampling Methods; Continuous Latent Variables; Sequential Data; Combining Models.

10,141 citations

Journal Article
18 Oct 2007-Nature
TL;DR: The Phase II HapMap is described, which characterizes over 3.1 million human single nucleotide polymorphisms genotyped in 270 individuals from four geographically diverse populations and includes 25–35% of common SNP variation in the populations surveyed, and increased differentiation at non-synonymous, compared to synonymous, SNPs is demonstrated.
Abstract: We describe the Phase II HapMap, which characterizes over 3.1 million human single nucleotide polymorphisms (SNPs) genotyped in 270 individuals from four geographically diverse populations and includes 25-35% of common SNP variation in the populations surveyed. The map is estimated to capture untyped common variation with an average maximum r² of between 0.9 and 0.96 depending on population. We demonstrate that the current generation of commercial genome-wide genotyping products captures common Phase II SNPs with an average maximum r² of up to 0.8 in African and up to 0.95 in non-African populations, and that potential gains in power in association studies can be obtained through imputation. These data also reveal novel aspects of the structure of linkage disequilibrium. We show that 10-30% of pairs of individuals within a population share at least one region of extended genetic identity arising from recent ancestry and that up to 1% of all common variants are untaggable, primarily because they lie within recombination hotspots. We show that recombination rates vary systematically around genes and between genes of different function. Finally, we demonstrate increased differentiation at non-synonymous, compared to synonymous, SNPs, resulting from systematic differences in the strength or efficacy of natural selection between populations.

4,565 citations