
Showing papers by "Satoru Miyano" published in 2012


Journal ArticleDOI
TL;DR: The whole-genome sequencing analysis of HCCs identified the influence of etiological background on somatic mutation patterns and subsequent carcinogenesis, as well as recurrent mutations in chromatin regulators in HCCs.
Abstract: Hidewaki Nakagawa and colleagues report the whole-genome sequencing of 27 hepatocellular carcinomas. They find that chromatin regulators were mutated in approximately 50% of tumors.

785 citations


Journal ArticleDOI
TL;DR: An extensive genome-wide expression profiling of 226 primary human stage I-II lung adenocarcinomas helps identify patients who may gain the most benefit from adjuvant chemotherapy after surgical resection and further stratify more or less aggressive subgroups of triple-negative lung ADC.
Abstract: Activation of the EGFR, KRAS, and ALK oncogenes defines 3 different pathways of molecular pathogenesis in lung adenocarcinoma. However, many tumors lack activation of any pathway (triple-negative lung adenocarcinomas) posing a challenge for prognosis and treatment. Here, we report an extensive genome-wide expression profiling of 226 primary human stage I-II lung adenocarcinomas that elucidates molecular characteristics of tumors that harbor ALK mutations or that lack EGFR, KRAS, and ALK mutations, that is, triple-negative adenocarcinomas. One hundred and seventy-four genes were selected as being upregulated specifically in 79 lung adenocarcinomas without EGFR and KRAS mutations. Unsupervised clustering using a 174-gene signature, including ALK itself, classified these 2 groups of tumors into ALK-positive cases and 2 distinct groups of triple-negative cases (groups A and B). Notably, group A triple-negative cases had a worse prognosis for relapse and death, compared with cases with EGFR, KRAS, or ALK mutations or group B triple-negative cases. In ALK-positive tumors, 30 genes, including ALK and GRIN2A, were commonly overexpressed, whereas in group A triple-negative cases, 9 genes were commonly overexpressed, including a candidate diagnostic/therapeutic target DEPDC1, that were determined to be critical for predicting a worse prognosis. Our findings are important because they provide a molecular basis of ALK-positive lung adenocarcinomas and triple-negative lung adenocarcinomas and further stratify more or less aggressive subgroups of triple-negative lung ADC, possibly helping identify patients who may gain the most benefit from adjuvant chemotherapy after surgical resection.

621 citations


Journal ArticleDOI
TL;DR: The proposed algorithm first divides genes into subsets, the sizes of which are relatively small, then selects informative smaller subsets of genes from a subset and merges the chosen genes with another gene subset to update the gene subset.
Abstract: Most of the conventional feature selection algorithms have a drawback whereby a weakly ranked gene that could perform well in terms of classification accuracy with an appropriate subset of genes will be left out of the selection. Considering this shortcoming, we propose a feature selection algorithm in gene expression data analysis of sample classifications. The proposed algorithm first divides genes into subsets, the sizes of which are relatively small (roughly of size h), then selects informative smaller subsets of genes (of size r < h) from a subset and merges the chosen genes with another gene subset (of size r) to update the gene subset. We repeat this process until all subsets are merged into one informative subset. We illustrate the effectiveness of the proposed algorithm by analyzing three distinct gene expression data sets. Our method shows promising classification accuracy for all the test data sets. We also show the relevance of the selected genes in terms of their biological functions.
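The divide, select, and merge loop described in this abstract can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not the authors' implementation: the function names, the exhaustive search over size-r subsets, and the caller-supplied `score` function (standing in for a classification-accuracy estimate) are all hypothetical, and the block sizes used in the merge step are simplified.

```python
from itertools import combinations


def best_subset(genes, r, score):
    """Exhaustively pick the size-r subset of `genes` with the highest score.

    `score` is a hypothetical callable that rates a tuple of genes,
    for example by cross-validated classification accuracy.
    """
    if len(genes) <= r:
        return list(genes)
    return list(max(combinations(genes, r), key=score))


def successive_selection(genes, h, r, score):
    """Divide genes into blocks of about h, then repeatedly select r and merge."""
    if not genes:
        return []
    # Step 1: divide the genes into small blocks of (roughly) size h.
    blocks = [genes[i:i + h] for i in range(0, len(genes), h)]
    current = list(blocks[0])
    for block in blocks[1:]:
        # Step 2: keep the most informative r genes of the current subset.
        # Step 3: merge them with the next block to update the subset.
        current = best_subset(current, r, score) + list(block)
    # Final reduction once every block has been merged in.
    return best_subset(current, r, score)
```

Because each step scores subsets rather than single genes, a gene that ranks poorly on its own can still survive if it scores well in combination with others, which is the shortcoming the abstract sets out to address. The exhaustive search is only practical because h is kept small.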

186 citations


Journal ArticleDOI
19 Sep 2012-PLOS ONE
TL;DR: It is re-emphasized that EGF signaling status in cancer cells underlies an aggressive phenotype of cancer cells, which is useful for the selection of early-stage lung adenocarcinoma patients with a poor prognosis.
Abstract: Purpose To identify stage I lung adenocarcinoma patients with a poor prognosis who will benefit from adjuvant therapy.

128 citations


Journal ArticleDOI
TL;DR: A suite of tools for analysing regulatory networks is assembled, and their use with microarray datasets generated in human endothelial cells is illustrated.
Abstract: Gene regulatory networks inferred from RNA abundance data have generated significant interest, but despite this, gene network approaches are used infrequently and often require input from bioinformaticians. We have assembled a suite of tools for analysing regulatory networks, and we illustrate their use with microarray datasets generated in human endothelial cells. We infer a range of regulatory networks, and based on this analysis discuss the strengths and limitations of network inference from RNA abundance data. We welcome contact from researchers interested in using our inference and visualization tools to answer biological questions.

71 citations


Journal ArticleDOI
TL;DR: A null space based feature selection method for gene expression data in terms of supervised classification that discards the redundant genes by applying the information of null space of scatter matrices is proposed.
Abstract: Feature selection is quite an important process in gene expression data analysis. Feature selection methods discard unimportant genes from several thousands of genes for finding important genes or pathways for the target biological phenomenon like cancer. The obtained gene subset is used for statistical analysis for prediction such as survival as well as functional analysis for understanding biological characteristics. In this paper we propose a null space based feature selection method for gene expression data in terms of supervised classification. The proposed method discards the redundant genes by applying the information of null space of scatter matrices. We derive the method theoretically and demonstrate its effectiveness on several DNA gene expression datasets. The method is easy to implement and computationally efficient.

71 citations


Journal ArticleDOI
TL;DR: Hundreds of genes were affected in all tissues in each of the colonized models; however, a gene set enrichment analysis method, MetaGene Profiler, demonstrated that the specific changes of Gene Ontology (GO) categories occurred predominantly in 0WexGF LI, SPF SI, and 5WexGF SPL, respectively.
Abstract: Epidemiological studies have suggested that the encounter with commensal microorganisms during the neonatal period is essential for normal development of the host immune system. Basic research involving gnotobiotic mice has demonstrated that colonization at the age of 5 weeks is too late to reconstitute normal immune function. In this study, we examined the transcriptome profiles of the large intestine (LI), small intestine (SI), liver (LIV), and spleen (SPL) of 3 bacterial colonization models: specific pathogen-free mice (SPF), ex-germ-free mice with bacterial reconstitution at the time of delivery (0WexGF), and ex-germ-free mice with bacterial reconstitution at 5 weeks of age (5WexGF). We compared them with those of germ-free (GF) mice. Hundreds of genes were affected in all tissues in each of the colonized models; however, a gene set enrichment analysis method, MetaGene Profiler (MGP), demonstrated that the specific changes of Gene Ontology (GO) categories occurred predominantly in 0WexGF LI, SPF SI, and 5WexGF SPL, respectively. MGP analysis on signal pathways revealed prominent changes in toll-like receptor (TLR)- and type 1 interferon (IFN)-signaling in LI of 0WexGF and SPF mice, but not 5WexGF mice, while 5WexGF mice showed specific changes in chemokine signaling. RT-PCR analysis of TLR-related genes showed that the expression of interferon regulatory factor 3 (Irf3), a crucial rate-limiting transcription factor in the induction of type 1 IFN, prominently decreased in 0WexGF and SPF mice but not in 5WexGF and GF mice. The present study provides important new information regarding the molecular mechanisms of the so-called "hygiene hypothesis".

48 citations


Journal ArticleDOI
20 Apr 2012-PLOS ONE
TL;DR: In vitro functional genomics analysis of microarray data from A375 melanoma cells treated in vitro with siRNAs identified proliferation-association RNA clusters that are linked to melanoma patient prognosis and are potential prognostic biomarkers and drug targets.
Abstract: Background Our understanding of the molecular pathways that underlie melanoma remains incomplete. Although several published microarray studies of clinical melanomas have provided valuable information, we found only limited concordance between these studies. Therefore, we took an in vitro functional genomics approach to understand melanoma molecular pathways. Methodology/Principal Findings Affymetrix microarray data were generated from A375 melanoma cells treated in vitro with siRNAs against 45 transcription factors and signaling molecules. Analysis of this data using unsupervised hierarchical clustering and Bayesian gene networks identified proliferation-association RNA clusters, which were co-ordinately expressed across the A375 cells and also across melanomas from patients. The abundance in metastatic melanomas of these cellular proliferation clusters and their putative upstream regulators was significantly associated with patient prognosis. An 8-gene classifier derived from gene network hub genes correctly classified the prognosis of 23/26 metastatic melanoma patients in a cross-validation study. Unlike the RNA clusters associated with cellular proliferation described above, co-ordinately expressed RNA clusters associated with immune response were clearly identified across melanoma tumours from patients but not across the siRNA-treated A375 cells, in which immune responses are not active. Three uncharacterised genes, which the gene networks predicted to be upstream of apoptosis- or cellular proliferation-associated RNAs, were found to significantly alter apoptosis and cell number when over-expressed in vitro. Conclusions/Significance This analysis identified co-expression of RNAs that encode functionally-related proteins, in particular, proliferation-associated RNA clusters that are linked to melanoma patient prognosis. 
Our analysis suggests that A375 cells in vitro may be valid models in which to study the gene expression modules that underlie some melanoma biological processes (e.g., proliferation) but not others (e.g., immune response). The gene expression modules identified here, and the RNAs predicted by Bayesian network inference to be upstream of these modules, are potential prognostic biomarkers and drug targets.

34 citations


Journal ArticleDOI
TL;DR: This kind of analysis can be used as a complementary approach for functional clustering, wherein genes would be clustered not solely based on their expression similarity but on their topological proximity built according to the intensity of Granger causality among them.
Abstract: A common approach for time series gene expression data analysis includes the clustering of genes with similar expression patterns throughout time. Clustered gene expression profiles point to the joint contribution of groups of genes to a particular cellular process. However, since genes belong to intricate networks, other features, besides comparable expression patterns, should provide additional information for the identification of functionally similar genes. In this study we perform gene clustering through the identification of Granger causality between and within sets of time series gene expression data. Granger causality is based on the idea that the cause of an event cannot come after its consequence. This kind of analysis can be used as a complementary approach for functional clustering, wherein genes would be clustered not solely based on their expression similarity but on their topological proximity built according to the intensity of Granger causality among them.
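As a rough illustration of the Granger causality idea in this abstract, the sketch below tests whether the past of one series improves the prediction of another. It is a deliberately simplified, pairwise, lag-1 version written in plain Python; the study itself works between and within sets of genes, and the function names and the choice of a single lag are assumptions made only for this example.

```python
def _solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def _rss(columns, target):
    """Residual sum of squares of an ordinary least-squares fit."""
    k = len(columns)
    gram = [[sum(u * v for u, v in zip(columns[i], columns[j]))
             for j in range(k)] for i in range(k)]
    rhs = [sum(u * t for u, t in zip(columns[i], target)) for i in range(k)]
    beta = _solve(gram, rhs)
    return sum((t - sum(beta[i] * columns[i][n] for i in range(k))) ** 2
               for n, t in enumerate(target))


def granger_f(x, y):
    """F statistic for 'x Granger-causes y' using one lag of each series."""
    target = y[1:]
    ones = [1.0] * len(target)
    # Restricted model: y_t explained by its own past only.
    rss_restricted = _rss([ones, y[:-1]], target)
    # Unrestricted model: the past of x is added as a predictor.
    rss_full = _rss([ones, y[:-1], x[:-1]], target)
    dof = len(target) - 3
    return (rss_restricted - rss_full) / (rss_full / dof)
```

A large value of `granger_f(x, y)` suggests that the past of `x` carries information about `y` beyond what the past of `y` already provides. A matrix of such statistics over all gene pairs is one possible input for the topology-based clustering the abstract describes, although the exact construction used in the paper is not given here.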

30 citations


Journal ArticleDOI
TL;DR: The algorithm can perform bulk reduction of features (genes) while maintaining the quality information in the reduced subset of features for discriminative purpose and can be used as a pre-processing step for other feature selection algorithms.
Abstract: We propose a new filter based feature selection algorithm for classification based on DNA microarray gene expression data. It utilizes null space of covariance matrix for feature selection. The algorithm can perform bulk reduction of features (genes) while maintaining the quality information in the reduced subset of features for discriminative purpose. Thus, it can be used as a pre-processing step for other feature selection algorithms. The algorithm does not assume statistical independency among the features. The algorithm shows promising classification accuracy when compared with other existing techniques on several DNA microarray gene expression datasets.

27 citations


Journal ArticleDOI
TL;DR: Computational gene network analysis revealed a novel molecular system that may play an important role in the TNF-induced angiogenesis seen in cancer and rheumatic disease, and suggests that Bayesian network analysis linked to functional annotation may be a powerful tool to provide insight into disease.
Abstract: TNF (Tumor Necrosis Factor-α) induces HUVEC (Human Umbilical Vein Endothelial Cells) to proliferate and form new blood vessels. This TNF-induced angiogenesis plays a key role in cancer and rheumatic disease. However, the molecular system that underlies TNF-induced angiogenesis is largely unknown. We analyzed the gene expression changes stimulated by TNF in HUVEC over a time course using microarrays to reveal the molecular system underlying TNF-induced angiogenesis. Traditional k-means clustering analysis was performed to identify informative temporal gene expression patterns buried in the time course data. Functional enrichment analysis using DAVID was then performed for each cluster. The genes that belonged to informative clusters were then used as the input for gene network analysis using a Bayesian network and nonparametric regression method. Based on this TNF-induced gene network, we searched for sub-networks related to angiogenesis by integrating existing biological knowledge. k-means clustering of the TNF stimulated time course microarray gene expression data, followed by functional enrichment analysis identified three biologically informative clusters related to apoptosis, cellular proliferation and angiogenesis. These three clusters included 648 genes in total, which were used to estimate dynamic Bayesian networks. Based on the estimated TNF-induced gene networks, we hypothesized that a sub-network including IL6 and IL8 inhibits apoptosis and promotes TNF-induced angiogenesis. More particularly, IL6 promotes TNF-induced angiogenesis by inducing NF-κB and IL8, which are strong cell growth factors. Computational gene network analysis revealed a novel molecular system that may play an important role in the TNF-induced angiogenesis seen in cancer and rheumatic disease. This analysis suggests that Bayesian network analysis linked to functional annotation may be a powerful tool to provide insight into disease.

Journal ArticleDOI
TL;DR: A set of univariate filter-based methods using a between-class overlapping criterion are accurate and robust, have biological significance, and are computationally efficient and easy to implement, well suited for biological and clinical discoveries.
Abstract: Feature selection algorithms play a crucial role in identifying and discovering important genes for cancer classification. Feature selection algorithms can be broadly categorized into two main groups: filter-based methods and wrapper-based methods. Filter-based methods have been quite popular in the literature due to their many advantages, including computational efficiency, simplistic architecture, and an intuitively simple means of discovering biological and clinical aspects. However, these methods have limitations, and the classification accuracy of the selected genes is less accurate. In this paper, we propose a set of univariate filter-based methods using a between-class overlapping criterion. The proposed techniques have been compared with many other univariate filter-based methods using an acute leukemia dataset. The following properties have been examined: classification accuracy of the selected individual genes and the gene subsets; redundancy check among selected genes using ridge regression and LASSO methods; similarity and sensitivity analyses; functional analysis; and, stability analysis. A comprehensive experiment shows promising results for our proposed techniques. The univariate filter based methods using between-class overlapping criterion are accurate and robust, have biological significance, and are computationally efficient and easy to implement. Therefore, they are well suited for biological and clinical discoveries.

Journal ArticleDOI
TL;DR: A powerful statistical framework for the identification of driver aberrations, which would be applicable to ever-increasing amounts of cancer genomic data seen in the era of next generation sequencing is proposed.
Abstract: Motivation: In cancer genomes, chromosomal regions harboring cancer genes are often subjected to genomic aberrations like copy number alteration and loss of heterozygosity. Given this, finding recurrent genomic aberrations is considered an apt approach for screening cancer genes. Although several permutation-based tests have been proposed for this purpose, none of them are designed to find recurrent aberrations from the genomic dataset without paired normal sample controls. Their application to unpaired genomic data may lead to false discoveries, because they retrieve pseudo-aberrations that exist in normal genomes as polymorphisms. Results: We develop a new parametric method named parametric aberration recurrence test (PART) to test for the recurrence of genomic aberrations. The introduction of Poisson-binomial statistics allows us to compute small P-values more efficiently and precisely than the previously proposed permutation-based approach. Moreover, we extended PART to cover unpaired data (PART-up) so that there is a statistical basis for analyzing unpaired genomic data. PART-up uses information from unpaired normal sample controls to remove pseudo-aberrations in unpaired genomic data. Using PART-up, we successfully predict recurrent genomic aberrations in cancer cell line samples whose paired normal sample controls are unavailable. This article thus proposes a powerful statistical framework for the identification of driver aberrations, which would be applicable to ever-increasing amounts of cancer genomic data seen in the era of next generation sequencing. Availability: Our implementations of PART and PART-up are available from http://www.hgc.jp/~niiyan/PART/manual.html. Contact: aniida@ims.u-tokyo.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online.
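To make the role of Poisson-binomial statistics concrete, the following sketch computes an exact upper-tail probability for the number of samples carrying an aberration at a locus, given a separate aberration probability for each sample. It is a minimal illustration of the distribution only, not the PART implementation; the function names and the idea of passing per-sample probabilities directly are assumptions of this example.

```python
def poisson_binomial_pmf(probs):
    """Exact PMF of the number of successes among independent Bernoulli trials.

    probs[i] is the success probability of trial i; the trials need not
    share the same probability, which is what distinguishes this from
    the ordinary binomial distribution.
    """
    pmf = [1.0]  # With zero trials, zero successes has probability 1.
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)   # this trial fails
            nxt[k + 1] += mass * p       # this trial succeeds
        pmf = nxt
    return pmf


def upper_tail(probs, k_observed):
    """P(K >= k_observed): a P-value for seeing at least k_observed aberrations."""
    return sum(poisson_binomial_pmf(probs)[k_observed:])
```

Because the tail is computed in closed form rather than estimated by resampling, very small P-values can be obtained without running a correspondingly large number of permutations, which is the advantage the abstract attributes to the parametric approach.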

Journal ArticleDOI
TL;DR: PRD, an online database of PRIs collected from several sources, is introduced; interactions are stored using PSI-MI 2.5, a standard model for describing detailed molecular interactions, with an emphasis on gene level data.
Abstract: Although protein–RNA interactions (PRIs) are involved in various important cellular processes, compiled data on PRIs are still limited. This contrasts with protein–protein interactions, which have been intensively recorded in public databases and subjected to network level analysis. Here, we introduce PRD, an online database of PRIs, dispersed across several sources, including scientific literature. Currently, over 10,000 interactions have been stored in PRD using PSI-MI 2.5, which is a standard model for describing detailed molecular interactions, with an emphasis on gene level data. Users can browse all recorded interactions and execute flexible keyword searches against the database via a web interface. Our database is not only a reference of PRIs, but will also be a valuable resource for studying characteristics of PRI networks. Availability PRD can be freely accessed at http://pri.hgc.jp/

Journal ArticleDOI
TL;DR: Interactions between the genetic background and environmental factors are associated with increased risk for CRC and there is a robust risk of the minor G allele at the 8q24 rs6983267 SNP; however, a major T allele SNP could more clearly reveal a correlation with CRC specifically when DM is present.
Abstract: Background Colorectal cancer (CRC) oncogenesis was considered to be determined by interactions between genetic and environmental factors. Specific interacting factors that influence CRC morbidity have yet to be fully investigated.

Journal ArticleDOI
16 Nov 2012-Blood
TL;DR: This study focused on the RAS protein superfamily of small GTPases and identified somatic recurrent mutations in the F82 residue of Ras-like without CAAX1 (RIT1) gene in 2 patients with chronic myelomonocytic leukemia (CMML) and secondary AML (sAML), respectively, and confirmed the somatic nature of both mutations.


Journal ArticleDOI
TL;DR: This work proposes a statistical method for uncovering gene pathways that characterize cancer heterogeneity; the method can also reverse-engineer gene networks from the identified pathways, enabling the discovery of novel gene-gene associations related to cancer phenotypes.
Abstract: We propose a statistical method for uncovering gene pathways that characterize cancer heterogeneity. To incorporate knowledge of the pathways into the model, we define a set of activities of pathways from microarray gene expression data based on the Sparse Probabilistic Principal Component Analysis (SPPCA). A pathway activity logistic regression model is then formulated for cancer phenotype. To select pathway activities related to binary cancer phenotypes, we use the elastic net for the parameter estimation and derive a model selection criterion for selecting tuning parameters included in the model estimation. Our proposed method can also reverse-engineer gene networks based on the identified multiple pathways that enables us to discover novel gene-gene associations relating with the cancer phenotypes. We illustrate the whole process of the proposed method through the analysis of breast cancer gene expression data.

Journal ArticleDOI
17 Jan 2012
TL;DR: A new statistical approach is proposed that is based on the state space representation of the vector autoregressive model and estimates gene networks on two different conditions in order to identify changes on regulations between the conditions and can identify changes on regulations more accurately than existing methods.
Abstract: In the analysis of effects by cell treatment such as drug dosing, identifying changes on gene network structures between normal and treated cells is a key task. A possible way for identifying the changes is to compare structures of networks estimated from data on normal and treated cells separately. However, this approach usually fails to estimate accurate gene networks due to the limited length of time series data and measurement noise. Thus, approaches that identify changes on regulations by using time series data on both conditions in an efficient manner are demanded. We propose a new statistical approach that is based on the state space representation of the vector autoregressive model and estimates gene networks on two different conditions in order to identify changes on regulations between the conditions. In the mathematical model of our approach, hidden binary variables are newly introduced to indicate the presence of regulations on each condition. The use of the hidden binary variables enables an efficient data usage; data on both conditions are used for commonly existing regulations, while for condition specific regulations corresponding data are only applied. Also, the similarity of networks on two conditions is automatically considered from the design of the potential function for the hidden binary variables. For the estimation of the hidden binary variables, we derive a new variational annealing method that searches the configuration of the binary variables maximizing the marginal likelihood. For the performance evaluation, we use time series data from two topologically similar synthetic networks, and confirm that our proposed approach estimates commonly existing regulations as well as changes on regulations with higher coverage and precision than other existing approaches in almost all the experimental settings. 
For a real data application, our proposed approach is applied to time series data from normal Human lung cells and Human lung cells treated by stimulating EGF-receptors and dosing an anticancer drug termed Gefitinib. In the treated lung cells, a cancer cell condition is simulated by the stimulation of EGF-receptors, but the effect would be counteracted due to the selective inhibition of EGF-receptors by Gefitinib. However, gene expression profiles are actually different between the conditions, and the genes related to the identified changes are considered as possible off-targets of Gefitinib. From the synthetically generated time series data, our proposed approach can identify changes on regulations more accurately than existing methods. By applying the proposed approach to the time series data on normal and treated Human lung cells, candidates of off-target genes of Gefitinib are found. According to the published clinical information, one of the genes can be related to a factor of interstitial pneumonia, which is known as a side effect of Gefitinib.

Journal ArticleDOI
TL;DR: The IRView web interface displays all IR data, including user-uploaded data, on reference sequences so that the positional relationship between IRs can be easily understood and should be useful for analyzing underlying relationships between the proteins behind the PPI networks.
Abstract: Summary: Protein–protein interactions (PPIs) are mediated through specific regions on proteins. Some proteins have two or more protein interacting regions (IRs) and some IRs are competitively used for interactions with different proteins. IRView currently contains data for 3417 IRs in human and mouse proteins. The data were obtained from different sources and combined with annotated region data from InterPro. Information on non-synonymous single nucleotide polymorphism sites and variable regions owing to alternative mRNA splicing is also included. The IRView web interface displays all IR data, including user-uploaded data, on reference sequences so that the positional relationship between IRs can be easily understood. IRView should be useful for analyzing underlying relationships between the proteins behind the PPI networks. Availability: IRView is publicly available on the web at http://ir.hgc.jp/. Contact: nekoneko@ims.u-tokyo.ac.jp

Proceedings Article
09 Jul 2012
TL;DR: Numerical experiments indicate that a profile consisting of the total number of patients is insufficient and a role-specific profile is needed to reconstruct the assumption on the ratio of reproduction numbers in workplaces and in homes from pseudo-observation time-course data generated by a simulation run.
Abstract: Agent-based simulation is one of the approaches that can be applied to simulate the transmission of infectious disease such as influenza within a city. Several types of agents with different behaviours are allocated to a model city and the transmission between city residents is stochastically solved locally. Simulations corresponding to specific intervention measures are carried out. However, due to the large number of parameters in the simulation, which cannot be fully constrained by surveillance evidence and epidemiological knowledge, one sometimes judges candidate intervention measures from simulation results with arbitrarily fixed parameters. In the present study, we have conducted numerical experiments to estimate reproduction numbers (transmissibility parameters) in workplaces and in homes from pseudo-observation time-course data generated by a simulation run. This pseudo-observation is generated under the assumption that transmission in workplaces is more effective than in homes. The ratio of these numbers is considered to affect the response to intervention. Our experiments indicate that a profile consisting of the total number of patients is insufficient; rather, a role-specific profile is needed to reconstruct the assumption on the ratio of reproduction numbers.

Proceedings ArticleDOI
TL;DR: It is demonstrated that abnormal RNA splicing caused by mutations of multiple genes on RNA splicing pathway is a common feature of myelodysplasia and is a major source for protein diversity in higher eukaryotes.
Abstract: MDS are a group of myeloid neoplasms characterized by deregulated blood cell production and a high propensity to AML. Although a number of gene alterations have been implicated in the pathogenesis of MDS, they do not fully explain the pathogenesis of MDS. So, in order to clarify a comprehensive registry of gene mutations in MDS, we performed whole-exome sequencing of 29 cases with MDS and related myeloid neoplasm. A total of 268 somatic mutations or 9.2 mutations per sample were identified. Among these, 9 genes were mutated in more than 2 cases, which not only included a spectrum of known gene targets in MDS, but also affected previously unknown genes that are commonly involved in RNA splicing pathway, including U2AF35, SRSF2 and ZRSR2. Together with additional three (SF3A1, SF3B1 and PRPF40B) found in single cases, 16 (55.2%) of the 29 discovery cases carried a mutation affecting the component of the splicing machinery. To confirm the observation, we examined 9 spliceosome genes for mutations in a large set of myeloid neoplasms. In total, 219 mutations were identified in 209 out of the 582 samples of myeloid neoplasms. RNA splicing pathway mutations were highly specific to myelodysplasia, including 19 of 23 (83%) cases with RARS, 43 of 50 (86%) RCMD-RS, 68 of 155 (44%) other MDS, 48 of 88 (55%) CMML, and 16 of 62 (26%) secondary AML with MDS features with a strong preference of SF3B1 mutations to RARS and RCMD-RS and of SRSF2 to CMML, while they were rare in cases with de novo AML and MPN. Significantly, these mutations occurred in an almost completely mutually exclusive manner among mutated cases, suggesting the importance of deregulated RNA splicing in the pathogenesis of MDS. RNA splicing plays critical roles in differentiation, development, and disease and is a major source for protein diversity in higher eukaryotes.
Splicing pathway mutations in myelodysplasia commonly affected those components of the splicing complex that are engaged in the 3′ splice site recognition, strongly indicating that production of unspliced or aberrantly spliced RNA species is implicated in the pathogenesis of MDS. So, to clarify the effect of these splicing mutations on RNA splicing, we expressed the wild-type and the mutant U2AF35 or SRSF2 in HeLa cells and performed whole transcriptome analysis in these cells. The results of exon array showed that the wild-type U2AF35 promoted RNA splicing correctly, whereas the mutant U2AF35 inhibited this process and caused intronic sequences to remain unspliced. RNA sequencing additionally showed that the number of reads that encompassed the exon/intron junctions was significantly increased in mutant U2AF35-transduced cells. This result means that mutant U2AF35 actually induced impaired 3′-splice site recognition during pre-mRNA processing. In conclusion, our study demonstrated that abnormal RNA splicing caused by mutations of multiple genes on RNA splicing pathway is a common feature of myelodysplasia. Citation Format: {Authors}. {Abstract title} [abstract]. In: Proceedings of the 103rd Annual Meeting of the American Association for Cancer Research; 2012 Mar 31-Apr 4; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2012;72(8 Suppl):Abstract nr 5119. doi:1538-7445.AM2012-5119

Journal ArticleDOI
16 Nov 2012-Blood
TL;DR: Novel germline mutations of RP genes that could be responsible for Diamond-Blackfan anemia are identified, further confirming the concept that the RP genes are common targets of germline mutations in DBA patients and also suggesting the presence of non-RP gene targets for DBA.

Journal ArticleDOI
TL;DR: ChopSticks can generate high-resolution deletion calls of homozygous deletions using information independent of other methods, and it is therefore useful to examine the functional impact of SVs or to infer SV generation mechanisms.
Abstract: Structural variations (SVs) in genomes are commonly observed even in healthy individuals and play key roles in biological functions. To understand their functional impact or to infer molecular mechanisms of SVs, they have to be characterized with the maximum resolution. However, high-resolution analysis is a difficult task because it requires investigation of the complex structures involved in an enormous number of alignments of next-generation sequencing (NGS) reads and genome sequences that contain errors. We propose a new method called ChopSticks that improves the resolution of SV detection for homozygous deletions even when the depth of coverage is low. Conventional methods based on read pairs use only discordant pairs to localize the positions of deletions, where a discordant pair is a read pair whose alignment has an aberrant strand or distance. In contrast, our method exploits concordant reads as well. We theoretically proved that when the depth of coverage approaches zero or infinity, the expected resolution of our method is asymptotically equal to that of methods based only on discordant pairs under double coverage. To confirm the effectiveness of ChopSticks, we conducted computational experiments against both simulated NGS reads and real NGS sequences. The resolution of deletion calls by other methods was significantly improved, thus demonstrating the usefulness of ChopSticks. ChopSticks can generate high-resolution deletion calls of homozygous deletions using information independent of other methods, and it is therefore useful to examine the functional impact of SVs or to infer SV generation mechanisms.

Journal ArticleDOI
16 Nov 2012-Blood
TL;DR: While GATA1 was the only recurrent mutational target in the TAM phase, 8 genes were recurrently mutated in AMKL samples in whole genome/exome sequencing, including NRAS, TP53 and other novel gene targets that had been previously reported to be mutated in other neoplasms.


Journal ArticleDOI
16 Nov 2012-Blood
TL;DR: It is suggested that mutated genes located in CDRs can be pathogenic due to haploinsufficiency of WT genes and heterozygous mutations; among cases with del7/7q, those with wild-type forms of the corresponding genes showed decreased expression.

Proceedings ArticleDOI
04 Oct 2012
TL;DR: This research analyzes "Monshin" and predicts "Sho" which is the name of a disease in Japanese traditional medicine.
Abstract: In Japanese traditional medicine, "Monshin" plays an important role. "Monshin" is a questionnaire that asks about the patient's lifestyle and subjective symptoms. Specialists choose a traditional herbal medicine by using "Monshin". In this research, we analyze "Monshin" and predict "Sho", which is the name of a disease.

Journal ArticleDOI
16 Nov 2012-Blood
TL;DR: The results indicated that a subset of pediatric AML represents a discrete entity that could be discriminated from the adult counterpart, in terms of the spectrum of gene mutations, which is a well-established strategy for obtaining a comprehensive spectrum of protein-coding mutations.

Journal ArticleDOI
16 Nov 2012-Blood
TL;DR: A cohort of 168 patients with MDS who received either azacitidine or decitabine was screened for the presence of somatic mutations, finding mutant CBL and PPFIA2 to be strongly associated with response, whereas mutant U2AF1/2, SF3B1 and PRPF8 were strongly associated with refractoriness.