Showing papers by "Tao Huang" published in 2018


Journal Article
TL;DR: The results showed the distinct mechanisms of the different colorectal cancer subtypes with MSI status and provided the genes that may be the optimal standards to further classify the various molecular subtypes of colorectal cancer with distinct MSI status.
Abstract: Colorectal cancer is the third most common cancer in males and the second most common in females. This disease can be caused by genetic and acquired/environmental factors. Microsatellite instability (MSI) is one of the major mechanisms in colorectal cancer. This mechanism is a specific condition of genetic hypermutability that results from incompetent DNA mismatch repair. MSI has been applied to classify different colorectal cancer subtypes. However, the effects of MSI status on gene expression are largely unknown. In our study, we integrated the gene expression profile and MSI status of all colorectal cancer (CRC) samples from the TCGA database, and then categorized the CRC samples into three subgroups, namely, MSI-stable, MSI-low, and MSI-high, according to the MSI status. We applied a novel computational method based on machine learning and screened the genes specifically expressed for the different colorectal cancer subtypes. The results showed the distinct mechanisms of the different colorectal cancer subtypes with MSI status and provided the genes that may be the optimal standards to further classify the various molecular subtypes of colorectal cancer with distinct MSI status.

74 citations


Journal Article
TL;DR: The cerebellum may play a crucial role in the pathogenesis of SCZ and ITIH4 may be utilized as a clinical biomarker for the diagnosis of SCZ, and these interesting findings may stimulate novel strategies for developing new drugs against SCZ.
Abstract: Schizophrenia (SCZ) is a devastating genetic mental disorder. Identifying the SCZ risk genes in the brain helps in understanding this disease. Thus, we first used the minimum Redundancy-Maximum Relevance (mRMR) approach to integrate the genome-wide sequence analysis results on SCZ and the expression quantitative trait locus (eQTL) data from ten brain tissues to identify the genes related to SCZ. Second, we adopted the variance inflation factor regression algorithm to identify their interacting genes in the brain. Third, using multiple analysis methods, we explored and validated their roles. By means of the aforementioned procedures, we have found that (1) the cerebellum may play a crucial role in the pathogenesis of SCZ and (2) ITIH4 may be utilized as a clinical biomarker for the diagnosis of SCZ. These interesting findings may stimulate novel strategies for developing new drugs against SCZ. It has not escaped our notice that the approach reported here is of use for studying many other genome diseases as well.

61 citations


Journal Article
TL;DR: It is revealed that SNAI1, HMGA2 and VAV2 are the most important genes for TAPVC, which elucidates the possible molecular pathogenesis of this rare congenital birth defect.

56 citations


Journal Article
TL;DR: Investigating tissue expression difference between mRNAs and lncRNAs revealed the heterogeneous expression pattern of lncRNA and mRNA and gave rise to the development of a new tool to identify the potential biological functions of such RNA subgroups.
Abstract: Messenger RNA (mRNA) and long noncoding RNA (lncRNA) are two main subgroups of RNAs participating in transcription regulation. With the development of next generation sequencing, an increasing number of lncRNAs have been identified. Many hidden functions of lncRNAs have also been revealed. However, the differences between lncRNAs and mRNAs are still unclear. For example, we need to determine whether lncRNAs have stronger tissue specificity than mRNAs and which tissues have more lncRNAs expressed. To investigate such tissue expression differences between mRNAs and lncRNAs, we encoded 9339 lncRNAs and 14,294 mRNAs with 71 expression features, including 69 maximum expression features for 69 types of cells, one feature for the maximum expression in all cells, and one expression specificity feature that was measured as Chao-Shen-corrected Shannon's entropy. With advanced feature selection methods, such as maximum relevance minimum redundancy, incremental feature selection, and the random forest algorithm, 13 features presented the dissimilarity of lncRNAs and mRNAs. The 11 cell subtype features indicated which cell types had the largest expression difference between lncRNAs and mRNAs. Such cell subtypes may be potential cell models for lncRNA identification and function investigation. The expression specificity feature suggested that the cell types expressing mRNAs and lncRNAs were different. The maximum expression feature suggested that the maximum expression levels of mRNAs and lncRNAs were different. In addition, a rule learning algorithm, repeated incremental pruning to produce error reduction (RIPPER), was employed to produce effective rules for classifying lncRNAs and mRNAs, which gave competitive results compared with random forest and could give a clearer picture of the different expression patterns between lncRNAs and mRNAs. The results not only revealed the heterogeneous expression pattern of lncRNA and mRNA, but also gave rise to the development of a new tool to identify the potential biological functions of such RNA subgroups.

55 citations
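
The expression specificity feature in the lncRNA/mRNA abstract above is an entropy taken over a gene's expression across cell types. The sketch below computes only the plain (plug-in) Shannon entropy; the Chao-Shen correction named in the abstract is deliberately not reproduced, and the function name and toy expression profiles are hypothetical.

```python
import math

def shannon_entropy(expression):
    """Plug-in Shannon entropy (in bits) of one gene's expression profile.

    `expression` holds non-negative expression values, one per cell type.
    Low entropy means expression is concentrated in few cell types.
    NOTE: this is the uncorrected estimator, not the Chao-Shen version.
    """
    total = sum(expression)
    if total <= 0:
        raise ValueError("profile must contain some positive expression")
    probs = [x / total for x in expression if x > 0]
    return sum(-p * math.log2(p) for p in probs)

# Hypothetical profiles over four cell types.
specific = [8.0, 0.0, 0.0, 0.0]   # expressed in a single cell type
uniform = [2.0, 2.0, 2.0, 2.0]    # expressed equally in all cell types

print(shannon_entropy(specific), shannon_entropy(uniform))
```

A lower value indicates a more cell-type-specific gene, which is the direction in which such a feature would separate specific from broadly expressed transcripts.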


Journal Article
TL;DR: The results not only demonstrate a high classification capacity and subtype‐specific gene expression patterns but also quantitatively reflect the pattern of the gene expression levels across the NSC lineage, providing insight into deciphering the molecular basis of NSC differentiation.
Abstract: Adult neural stem cells (NSCs) are a group of multi-potent, self-renewing progenitor cells that contribute to the generation of new neurons and oligodendrocytes. Three subtypes of NSCs can be isolated based on the stages of the NSC lineage, including quiescent neural stem cells (qNSCs), activated neural stem cells (aNSCs) and neural progenitor cells (NPCs). Although it is widely accepted that these three groups of NSCs play different roles in the development of the nervous system, their molecular signatures are poorly understood. In this study, we applied the Monte-Carlo Feature Selection (MCFS) method to identify the gene expression signatures, which can yield a Matthews correlation coefficient (MCC) value of 0.918 with a support vector machine evaluated by ten-fold cross-validation. In addition, some classification rules yielded by the MCFS program for distinguishing the above three subtypes are reported. Our results not only demonstrate a high classification capacity and subtype-specific gene expression patterns but also quantitatively reflect the pattern of the gene expression levels across the NSC lineage, providing insight into deciphering the molecular basis of NSC differentiation.

55 citations


Journal Article
12 Mar 2018-Genes
TL;DR: This study proposes a novel computational method by incorporating several machine learning algorithms, including Monte Carlo feature selection, random forest, and rough set-based rule learning, to identify genes with significant expression differences between patient-derived tumor xenograft (PDX) and original human tumors.
Abstract: Breast cancer is one of the most common malignancies in women. The patient-derived tumor xenograft (PDX) model is a cutting-edge approach for drug research on breast cancer. However, PDX still exhibits differences from original human tumors, thereby challenging the molecular understanding of tumorigenesis. In particular, gene expression changes after tissues are transplanted from human to mouse model. In this study, we propose a novel computational method by incorporating several machine learning algorithms, including Monte Carlo feature selection (MCFS), random forest (RF), and rough set-based rule learning, to identify genes with significant expression differences between PDX and original human tumors. First, 831 breast tumors, including 657 PDX and 174 human tumors, were collected. Based on MCFS and RF, 32 genes were then identified to be informative for the prediction of PDX and human tumors and can be used to construct a prediction model. The prediction model exhibits a Matthews correlation coefficient value of 0.777. Seven interpretable interactions within the informative genes were detected based on the rough set-based rule learning. Furthermore, the seven interpretable interactions are well supported by previous experimental studies. Our study not only presents a method for identifying informative genes with differential expression but also provides insights into the mechanism through which gene expression changes after being transplanted from human tumor into mouse model. This work would be helpful for research and drug development for breast cancer.

55 citations


Journal Article
TL;DR: The sequences and structures of the RNA molecule were top ranking, implying they can be potential indicators of differences between cirRNAs and other lncRNAs, and an effective classification model to distinguish them was built.
Abstract: As non-coding RNAs, circular RNAs (cirRNAs) and long non-coding RNAs (lncRNAs) have attracted an increasing amount of attention. They have been confirmed to participate in many biological processes, including playing roles in transcriptional regulation, regulating protein-coding genes, and binding to RNA-associated proteins. Until now, the differences between these two types of non-coding RNAs have not been fully uncovered. It is still quite difficult to distinguish cirRNAs from other lncRNAs using simple techniques. In this study, we investigated these two types of non-coding RNAs using several computational methods. The purpose was to extract important factors that could distinguish cirRNAs from other lncRNAs and build an effective classification model to distinguish them. First, we collected cirRNAs, lncRNAs and their representations from a previous study, in which each cirRNA or lncRNA was represented by 188 features derived from its graph representation, sequence and conservation properties. Second, these features were analyzed by the minimum redundancy maximum relevance (mRMR) method. The obtained mRMR feature list, incremental feature selection method and hierarchical extreme learning machine algorithm were employed to build an optimal classification model with sensitivity of 0.703, specificity of 0.850, accuracy of 0.789 and a Matthews correlation coefficient of 0.561. Finally, we analyzed the 16 most important features. Of them, the sequences and structures of the RNA molecule were top ranking, implying they can be potential indicators of differences between cirRNAs and other lncRNAs. Meanwhile, other features of evolutionary conservation and sequence consecution were also important.

49 citations
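
Several abstracts in this list, including the cirRNA/lncRNA one above, rank features with minimum redundancy maximum relevance (mRMR). A minimal greedy sketch for discrete features follows. It assumes the difference form of the criterion (relevance minus mean redundancy), which may differ from the variant the authors used; the function names and toy data are hypothetical.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information (bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

def mrmr_rank(features, labels):
    """Greedily rank feature indices by relevance minus mean redundancy."""
    remaining = list(range(len(features)))
    relevance = {j: mutual_information(features[j], labels) for j in remaining}
    ranked = []
    while remaining:
        def score(j):
            if not ranked:
                return relevance[j]
            redundancy = sum(
                mutual_information(features[j], features[s]) for s in ranked
            ) / len(ranked)
            return relevance[j] - redundancy
        best = max(remaining, key=score)  # ties go to the lowest index
        ranked.append(best)
        remaining.remove(best)
    return ranked

# Toy data: feature 1 duplicates feature 0; feature 2 is weaker but less redundant.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = [
    [0, 0, 0, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 1, 1, 0, 1],
]
print(mrmr_rank(features, labels))
```

In the toy data the duplicated feature is pushed to the end of the list even though it is as relevant as the first pick, which is the behavior that motivates combining mRMR with incremental feature selection.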


Journal Article
12 Apr 2018-Genes
TL;DR: A computational method is presented to distinguish DS patients with AVSD from those without AVSD using the newly proposed self-normalizing neural network (SNN), with features ranked by the Monte Carlo feature selection method.
Abstract: Atrioventricular septal defect (AVSD) is a clinically significant subtype of congenital heart disease (CHD) that severely influences the health of babies during birth and is associated with Down syndrome (DS). Thus, exploring the differences in functional genes in DS samples with and without AVSD is a critical way to investigate the complex association between AVSD and DS. In this study, we present a computational method to distinguish DS patients with AVSD from those without AVSD using the newly proposed self-normalizing neural network (SNN). First, each patient was encoded by using the copy number of probes on chromosome 21. The encoded features were ranked by the reliable Monte Carlo feature selection (MCFS) method to obtain a ranked feature list. Based on this feature list, we used a two-stage incremental feature selection to construct two series of feature subsets and applied SNNs to build classifiers to identify optimal features. Results show that 2737 optimal features were obtained, and the corresponding optimal SNN classifier constructed on the optimal features yielded a Matthews correlation coefficient (MCC) value of 0.748. For comparison, random forest was also used to build classifiers and uncover optimal features. Random forest reached an optimal MCC value of 0.582 when the top 132 features were used. Finally, we analyzed some key features derived from the optimal features in SNNs and found literature support that further reveals their essential roles.

38 citations
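
The two-stage incremental feature selection mentioned in the AVSD abstract above can be pictured as a coarse search over top-k prefixes of the ranked list followed by a fine search around the best coarse point. The sketch below follows that reading, which is an assumption: the abstract does not state step sizes or intervals. `evaluate` stands in for any scoring routine (for example, a cross-validated MCC), and `toy_score` is a hypothetical placeholder.

```python
def two_stage_ifs(ranked_features, evaluate, coarse_step=10):
    """Return (best_k, best_score) over top-k prefixes of a ranked feature list.

    Stage 1 scores a coarse series of prefixes (top 10, top 20, ...).
    Stage 2 scores every prefix size near the best coarse size.
    """
    n = len(ranked_features)

    def best_over(ks):
        scored = [(evaluate(ranked_features[:k]), k) for k in ks]
        best_score, best_k = max(scored, key=lambda pair: pair[0])
        return best_k, best_score

    # Stage 1: coarse series of feature subsets.
    coarse_ks = list(range(coarse_step, n + 1, coarse_step))
    if not coarse_ks or coarse_ks[-1] != n:
        coarse_ks.append(n)
    k1, _ = best_over(coarse_ks)

    # Stage 2: fine series around the best coarse subset size.
    lo = max(1, k1 - coarse_step + 1)
    hi = min(n, k1 + coarse_step - 1)
    return best_over(range(lo, hi + 1))

def toy_score(subset):
    """Hypothetical score that peaks when exactly 7 features are used."""
    return 1.0 - abs(len(subset) - 7) / 20

ranked = [f"f{i}" for i in range(20)]  # hypothetical ranked feature names
print(two_stage_ifs(ranked, toy_score, coarse_step=5))
```

The coarse stage keeps the number of classifier fits manageable when the ranked list contains thousands of features; the fine stage then recovers the exact subset size.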


Journal Article
TL;DR: Two computational methods based on network diffusion algorithms, Laplacian heat diffusion (LHD) and random walk with restart (RWR), were proposed to search for possible tumor suppressor genes in the whole network; some of the obtained genes can be confirmed as novel TSGs according to recent publications, suggesting the utility of the two methods.
Abstract: Extensive studies on tumor suppressor genes (TSGs) are helpful to understand the pathogenesis of cancer and design effective treatments. However, identifying TSGs using traditional experiments is quite difficult and time consuming. Developing computational methods to identify possible TSGs is an alternative. In this study, we proposed two computational methods built on two network diffusion algorithms, Laplacian heat diffusion (LHD) and random walk with restart (RWR), to search for possible genes in the whole network. These two computational methods were the LHD-based and RWR-based methods. To increase the reliability of the putative genes, three strict screening tests followed to filter the genes obtained by these two algorithms. After comparing the putative genes obtained by the two methods, we designated twelve genes (e.g., MAP3K10, RND1, and OTX2) as common genes, 29 genes (e.g., RFC2 and GUCY2F) as genes that were identified only by the LHD-based method, and 128 genes (e.g., SNAI2 and FGF4) as genes that were inferred only by the RWR-based method. Some of the obtained genes can be confirmed as novel TSGs according to recent publications, suggesting the utility of our two proposed methods. In addition, the genes reported in this study were quite different from those reported in a previous one.

34 citations
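
Random walk with restart, used in the TSG abstract above and again in the epigenetic-factor abstract further down, spreads probability from known seed genes across a network. The sketch below is a minimal pure-Python version on an unweighted graph; the restart probability of 0.8, the node names, and the toy path network are all hypothetical and are not taken from either paper.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.8, tol=1e-10, max_iter=1000):
    """Return the steady-state visiting probability of every node.

    `adjacency` maps each node to a list of its neighbors (undirected graph).
    At each step the walker returns to a seed with probability `restart`,
    otherwise it moves to a uniformly chosen neighbor.
    """
    nodes = sorted(adjacency)
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {v: restart * p0[v] for v in nodes}
        for u in nodes:
            nbrs = adjacency[u]
            if not nbrs:
                continue  # isolated nodes pass nothing on in this sketch
            share = (1.0 - restart) * p[u] / len(nbrs)
            for v in nbrs:
                nxt[v] += share
        delta = sum(abs(nxt[v] - p[v]) for v in nodes)
        p = nxt
        if delta < tol:
            break
    return p

# Hypothetical path network g1 - g2 - g3 - g4 - g5 with g1 as the only seed.
network = {
    "g1": ["g2"],
    "g2": ["g1", "g3"],
    "g3": ["g2", "g4"],
    "g4": ["g3", "g5"],
    "g5": ["g4"],
}
scores = random_walk_with_restart(network, seeds={"g1"})
for gene in sorted(scores, key=scores.get, reverse=True):
    print(gene, round(scores[gene], 4))
```

Nodes closer to the seed receive higher scores, so candidates are typically taken from the top of the ranking and then passed through additional filters such as the screening tests the abstract mentions.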


Journal Article
TL;DR: Key biomarker genes, such as IGFBP2, IGF2BP3, PRDX1, NOV, NEFL, HOXA10, GNG12, SPRY4, and BCL11A, were identified, and the underlying rules for classifying the three subtypes were produced by the Johnson reducer algorithm, which revealed the differences among the three subtypes and how they are formed and transformed.
Abstract: As a common brain cancer derived from glial cells, gliomas have three subtypes: glioblastoma, diffuse astrocytoma, and anaplastic astrocytoma. The subtypes have distinctive clinical features but are closely related to each other. A glioblastoma can be derived from the early stage of diffuse astrocytoma, which can be transformed into anaplastic astrocytoma. Due to the complexity of these dynamic processes, single-cell gene expression profiles are extremely helpful to understand what defines these subtypes. We analyzed the single-cell gene expression profiles of 5057 cells of anaplastic astrocytoma tissues, 261 cells of diffuse astrocytoma tissues, and 1023 cells of glioblastoma tissues with advanced machine learning methods. In detail, a powerful feature selection method, Monte Carlo feature selection (MCFS) method, was adopted to analyze the gene expression profiles of cells, resulting in a feature list. Then, the incremental feature selection (IFS) method was applied to the obtained feature list, with the help of support vector machine (SVM), to extract key features (genes) and construct an optimal SVM classifier. Several key biomarker genes, such as IGFBP2, IGF2BP3, PRDX1, NOV, NEFL, HOXA10, GNG12, SPRY4, and BCL11A, were identified. In addition, the underlying rules of classifying the three subtypes were produced by Johnson reducer algorithm. We found that in diffuse astrocytoma, PRDX1 is highly expressed, and in glioblastoma, the expression level of PRDX1 is low. These rules revealed the difference among the three subtypes, and how they are formed and transformed. These genes are not only biomarkers for glioma subtypes, but also drug targets that may switch the clinical features or even reverse the tumor progression.

34 citations


Journal Article
TL;DR: This study provides a novel computational approach which successfully identified 26 potential epigenetic factors, paving the way toward a deeper understanding of the epigenetic mechanism.
Abstract: Epigenetic regulation has long been recognized as a significant factor in various biological processes, such as development, transcriptional regulation, spermatogenesis, and chromosome stabilization. Epigenetic alterations lead to many human diseases, including cancer, depression, autism, and immune system defects. Although efforts have been made to identify epigenetic regulators, it remains a challenge to systematically uncover all the components of epigenetic regulation at the genome level using experimental approaches. Advances in constructing protein-protein interaction (PPI) networks provide an excellent opportunity to identify novel epigenetic factors computationally at the genome level. In this study, we identified potential epigenetic factors by using a computational method that applied the random walk with restart (RWR) algorithm on a PPI network using reported epigenetic factors as seed nodes. False positives were identified by their specific roles in the PPI network or by a low-confidence interaction and a weak functional relationship with epigenetic regulators. After filtering out the false positives, 26 candidate epigenetic factors were finally obtained. According to previous studies, 22 of these are thought to be involved in epigenetic regulation, suggesting the robustness of our method. Our study provides a novel computational approach which successfully identified 26 potential epigenetic factors, paving the way toward a deeper understanding of the epigenetic mechanism.

Journal Article
12 Feb 2018
TL;DR: The prediction method proposed in this study was confirmed to be a powerful tool for recognizing cleavage sites from protein sequences, and a literature review indicated that the optimal features resulting from the dagging algorithm play crucial roles in identifying cleavage sites.
Abstract: The cleavage site of a signal peptide located in the C-region can be recognized by the signal peptidase in eukaryotic and prokaryotic cells, and the signal peptides are typically cleaved off during or after the translocation of the target protein. The identification of cleavage sites remains challenging because of the diverse lengths of signal peptides and the weak conservation of the motif recognized by the signal peptidase. In this study, we applied a fast and accurate computational method to identify cleavage sites in signal peptides based on protein sequences. We collected 2683 protein sequences with experimentally validated N-terminus signal peptides from the newly released UniProt database. A 20 amino acid-length peptide segment flanking the cleavage site was extracted from each protein, and four types of features were used to encode the peptide segment. We applied the synthetic minority oversampling technique, maximum relevance minimum redundancy, and incremental feature selection, together with dagging and random forest algorithms, to identify the optimal features that can lead to the optimal identification of the cleavage sites. The optimal dagging and random forest classifiers constructed on the optimal features yielded Youden's indexes of 0.871 and 0.736, respectively. The sensitivity, specificity, and accuracy yielded by the optimal dagging classifier all exceeded 0.9, which demonstrated the high prediction ability of the optimal dagging classifier. According to a literature review, the optimal features that resulted from the dagging algorithm, predominantly the position-specific scoring matrix and the amino acid factor, play crucial roles in identifying the cleavage sites. The prediction method proposed in this study was confirmed to be a powerful tool for recognizing cleavage sites from protein sequences.
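
The cleavage-site abstract above reports classifier quality as Youden's index, which is sensitivity plus specificity minus one. A minimal sketch follows; the confusion-matrix counts are hypothetical and are not taken from the paper.

```python
def youden_index(tp, fn, tn, fp):
    """Youden's J = sensitivity + specificity - 1, from confusion counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1.0

# Hypothetical counts: 100 true cleavage sites and 100 non-sites.
j = youden_index(tp=90, fn=10, tn=95, fp=5)
print(round(j, 3))
```

Because J rewards both rates equally, a value such as the reported 0.871 is only reachable when sensitivity and specificity are both high, which matches the statement that both exceeded 0.9 for the dagging classifier.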

Journal Article
TL;DR: Compared to other prediction models that use classic machine learning algorithms as prediction engines on the same datasets with their own optimal features, the optimal ELM-based prediction model produced much better results, indicating the superiority of the proposed model for the identification of nitrated tyrosine residues from protein sequences.
Abstract: Background: Accurately recognizing nitrated tyrosine residues from protein sequences would pave the way for understanding the mechanism of nitration and for screening the tyrosine residues in sequences. Results: In this study, we proposed a prediction model that used the extreme learning machine (ELM) algorithm as the prediction engine to identify nitrated tyrosine residues. To encode each tyrosine residue, a sliding window technique was adopted to extract a peptide segment for each tyrosine residue, from which a number of features were extracted. These features were analyzed by a popular feature selection method, the Minimum Redundancy Maximum Relevance (mRMR) method, producing a feature list in which all features were ranked in a rigorous way. Then, the Incremental Feature Selection (IFS) method was utilized to discover the optimal features, on which the optimal ELM-based prediction model was built. This model produced satisfactory results on the training dataset with a Matthews correlation coefficient of 0.757. The model was also evaluated on an independent test dataset that contained only positive samples, yielding a sensitivity of 0.938. Conclusion: Compared to other prediction models that use classic machine learning algorithms as prediction engines on the same datasets with their own optimal features, the optimal ELM-based prediction model produced much better results, indicating the superiority of the proposed model for the identification of nitrated tyrosine residues from protein sequences.
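
The sliding window step in the nitration abstract above extracts a fixed-length peptide segment centered on each tyrosine. A minimal sketch follows; the flank length, the padding character "X" for positions beyond the sequence ends, and the example sequence are all hypothetical choices, since the abstract does not give them.

```python
def tyrosine_windows(sequence, flank=3, pad="X"):
    """Return (1-based position, segment) for every tyrosine (Y) in `sequence`.

    Each segment has length 2 * flank + 1 with the tyrosine in the center.
    Positions that fall outside the sequence are filled with `pad`.
    """
    padded = pad * flank + sequence + pad * flank
    windows = []
    for i, residue in enumerate(sequence):
        if residue == "Y":
            # Residue i of the sequence sits at index i + flank in `padded`.
            windows.append((i + 1, padded[i : i + 2 * flank + 1]))
    return windows

# Hypothetical 7-residue sequence with tyrosines at positions 2 and 5.
segments = tyrosine_windows("MYKAYLQ", flank=2)
print(segments)
```

Each segment would then be turned into numeric features before ranking and classification; that encoding step is outside this sketch.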

Journal Article
TL;DR: The in-depth biological analysis of the 23 biomarkers showed great promise and suggested that mRNA surveillance pathway and multicellular organism growth played important roles in OA.
Abstract: Osteoarthritis (OA) is a complex disease that affects articular joints and may cause disability. The incidence of OA is extremely high. Most elderly people have the symptoms of osteoarthritis. The physiotherapy of OA is time consuming, and the chances of full recovery from OA are minimal. The most effective way of fighting OA is early diagnosis and early intervention. Liquid biopsy has become a popular noninvasive test. To find the blood gene expression signature for OA, we reanalyzed the publicly available blood gene expression profiles of 106 patients with OA and 33 control samples using an automatic computational pipeline based on advanced feature selection methods. Finally, a compact 23-gene set was identified. On the basis of these 23 genes, we constructed a Support Vector Machine (SVM) classifier and evaluated it with leave-one-out cross-validation. Its sensitivity (Sn), specificity (Sp), accuracy (ACC), and Matthews correlation coefficient (MCC) were 0.991, 0.909, 0.971, and 0.920, respectively. Obviously, the performance needed to be validated in an independent large dataset, but the in-depth biological analysis of the 23 biomarkers showed great promise and suggested that the mRNA surveillance pathway and multicellular organism growth played important roles in OA. Our results shed light on OA diagnosis through liquid biopsy.
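
The four metrics in the osteoarthritis abstract above (Sn, Sp, ACC, MCC) all derive from one confusion matrix. The sketch below computes them from counts that are inferred here, not reported in the abstract: with 106 patients and 33 controls, 105 true positives and 30 true negatives are one set of counts that reproduces values of 0.991, 0.909, 0.971, and 0.920 after rounding. The function name is hypothetical.

```python
import math

def binary_metrics(tp, fn, tn, fp):
    """Return (Sn, Sp, ACC, MCC) from binary confusion-matrix counts."""
    sn = tp / (tp + fn)                       # sensitivity
    sp = tn / (tn + fp)                       # specificity
    acc = (tp + tn) / (tp + fn + tn + fp)     # accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sn, sp, acc, mcc

# Inferred (not reported) counts: 106 OA patients, 33 controls.
sn, sp, acc, mcc = binary_metrics(tp=105, fn=1, tn=30, fp=3)
print([round(v, 3) for v in (sn, sp, acc, mcc)])
```

MCC is the most conservative of the four on an imbalanced dataset like this one, because the three misclassified controls weigh more heavily against 33 controls than against all 139 samples.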


Journal Article
07 Sep 2018-Genes
TL;DR: This constructed model was superior to the SVM model using tissue enriched genes and yielded MCC of 0.985 on an independent test dataset, indicating its good generalization ability.
Abstract: Tissue-specific gene expression has long been recognized as a crucial key for understanding tissue development and function. Efforts have been made in the past decade to identify tissue-specific expression profiles, such as the Human Proteome Atlas and FANTOM5. However, these studies mainly focused on “qualitatively tissue-specific expressed genes” which are highly enriched in one or a group of tissues but paid less attention to “quantitatively tissue-specific expressed genes”, which are expressed in all or most tissues but with differential expression levels. In this study, we applied machine learning algorithms to build a computational method for identifying “quantitatively tissue-specific expressed genes” capable of distinguishing 25 human tissues from their expression patterns. Our results uncovered the expression of 432 genes as optimal features for tissue classification, which were obtained with a Matthews Correlation Coefficient (MCC) of more than 0.99 yielded by a support vector machine (SVM). This constructed model was superior to the SVM model using tissue enriched genes and yielded MCC of 0.985 on an independent test dataset, indicating its good generalization ability. These 432 genes were proven to be widely expressed in multiple tissues and a literature review of the top 23 genes found that most of them support their discriminating powers. As a complement to previous studies, our discovery of these quantitatively tissue-specific genes provides insights into the detailed understanding of tissue development and function.

Journal Article
TL;DR: The role of long non-coding RNA in renal cell carcinoma (RCC) tumorigenesis and progression remains largely unknown; LINC01510 functions as a tumor suppressor in RCC tumorigenesis.

Journal Article
TL;DR: Cytogenetically normal acute myeloid leukemia (CN‐AML), which accounted for nearly half of total AML patients, is a highly heterogeneous subset of AML.
Abstract: Introduction: Cytogenetically normal acute myeloid leukemia (CN-AML), which accounts for nearly half of all AML patients, is a highly heterogeneous subset of AML. The specific genetic profile and the ethnic features of CN-AML are worth studying. Methods: Using deep sequencing technology, we detected the mutation pattern of 39 genes in 152 Chinese CN-AML patients and analyzed their clinical features. Results: A total of 503 mutations in 39 genes were identified in 145 (95.4%) patients, with a median of 3 mutations per case. Nine genes (NPM1, CEBPA, DNMT3A, GATA2, NRAS, TET2, FLT3, IDH2, and WT1) were mutated in more than 10% of patients. The functional groups of myeloid transcription factors, activated signaling, and DNA methylation were most affected. The distribution of variant allele frequencies (VAF) of recurrent genes differed among functional groups. High mutation rates of CEBPA and GATA2 together with the low frequency of FLT3-ITD mutation seemed to be the distinct characteristics of Chinese patients. Furthermore, CEBPAbi and GATA2 were found to mutate most in the M2 subtype, while NPM1 and DNMT3A mutated more in M4 and M5. The prognostic analysis identified CEBPAmo mutation as an inferior factor. FLT3-ITD, TP53, DNMT3A, CEBPAmo, and WT1 mutations were selected as high-risk markers to identify the CN-AML patients with poor prognosis. Conclusion: Our study provided valuable information on the ethnic genetic characteristics and the clinical relevance of Chinese CN-AML patients.

Journal Article
TL;DR: In this special issue, the received papers could be generally divided into 3 categories including computational models in identifying key biomarkers, pathways, and network modules associated with cancers and other diseases, and validations of the mechanisms of key biomarker and their applications in tumor diagnosis and treatment, and other studies in predicting tumor evolution, drug-disease association, disease sequence alignment, and so on.
Abstract: Next-Generation Sequencing (NGS) technology, often seen as the foundation of precision medicine, has been successfully applied in oncology diagnostics and immunotherapy. With advances in gene diagnostics and immunotherapy, there may be a chance to control the development of cancers and alleviate the suffering of patients undergoing chemotherapy. To promote the translation of precision medicine from bench to bedside and from application of genetic testing to personalized medicine, new analysis methods for NGS and genetic data need to be developed. For example, the NGS panel is quite different from whole genome sequencing (WGS), focusing on fewer genes or regions but requiring greater precision and efficiency. For complex diseases, such as cancers, the driver genes are usually a cluster of genes in a regulatory network. Graph theories, such as shortest path analysis and random walk algorithms, will help dissect genome-wide interactions into key modules or paths whose dysfunction is associated with disease progression. In this special issue, we have received 32 papers, out of which 19 have been accepted for publication. These papers could be generally divided into 3 categories including (1) computational models in identifying key biomarkers, pathways, and network modules associated with cancers and other diseases, (2) validations of the mechanisms of key biomarkers and their applications in tumor diagnosis and treatment, and (3) other studies in predicting tumor evolution, drug-disease association, disease sequence alignment, and so on.


Posted Content
20 Dec 2018-bioRxiv
TL;DR: The normal aging process is modeled based on multi-omics profiles across tissues, and the computational pipeline aims to model aging self-organizing systems and study the relationship between aging and related diseases (e.g., cancers), thus providing useful indexes of aging-related diseases and improving diagnostic effects for both pre- and pro-gnosis.
Abstract: Aging is a fundamental biological process, where key bio-markers interact with each other and synergistically regulate the aging process. Thus aging dysfunction will induce many disorders. Finding aging markers and re-constructing networks based on multi-omics data (e.g., methylation and transcription) are informative to study the aging process. However, optimizing the model to predict aging has not been performed systematically, although it is critical to identify the potential molecular mechanisms of aging-related diseases. This paper aims to model the aging self-organization system using a series of supervised learning methods, and to study the complex molecular mechanism of aging at the system level: i.e. optimizing the aging network; summarizing interactions between aging markers; accumulating patterns of aging markers within modules; finding order-parameters of the aging self-organization system. In this work, the normal aging process is modeled based on multi-omics profiles across tissues. In addition, the computational pipeline aims to model aging self-organizing systems and study the relationship between aging and related diseases (e.g., cancers), thus providing useful indexes of aging-related diseases and improving diagnostic effects for both pre- and pro-gnosis.