Author

Yanni Zhu

Bio: Yanni Zhu is an academic researcher from the University of Minnesota. The author has contributed to research in topics: Support vector machine & Hinge loss. The author has an h-index of 2 and has co-authored 2 publications receiving 112 citations.

Papers
Journal Article (DOI)
TL;DR: A network-based support vector machine is proposed for binary classification problems, constructing a penalty term from the F∞-norm applied to pairwise gene neighbors with the aim of improving predictive performance and gene selection.
Abstract: The importance of network-based approaches to identifying biological markers for diagnostic classification and prognostic assessment in the context of microarray data has been increasingly recognized. To our knowledge, there have been few, if any, statistical tools that explicitly incorporate the prior information of gene networks into classifier building. The main idea of this paper is to take full advantage of the biological observation that neighboring genes in a network tend to function together in biological processes and to embed this information into a formal statistical framework. We propose a network-based support vector machine for binary classification problems, constructing a penalty term from the F∞-norm applied to pairwise gene neighbors with the aim of improving predictive performance and gene selection. Simulation studies in both low- and high-dimensional data settings, as well as two real microarray applications, indicate that the proposed method is able to identify more clinically relevant genes while maintaining a sparse model with similar or higher prediction accuracy compared with the standard and L1-penalized support vector machines. The proposed network-based support vector machine has the potential to be a practically useful classification tool for microarrays and other high-dimensional data.
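To make the penalty concrete, here is a minimal sketch in Python of how such a penalized hinge-loss objective could be evaluated, assuming the F∞-norm penalty takes the form of a sum over network edges of max(|w_i|, |w_j|); the paper's exact weighting scheme may differ, and the function name, edge-list format, and tuning parameter lam are illustrative only.

    import numpy as np

    def network_svm_objective(w, b, X, y, edges, lam):
        """Hinge loss plus an F-infinity (max) penalty over gene-neighbor pairs.

        w     : (p,) coefficient vector, one entry per gene
        b     : intercept
        X, y  : (n, p) expression matrix and labels in {-1, +1}
        edges : list of (i, j) index pairs, the gene-network neighbors
        lam   : penalty weight (hypothetical tuning parameter)
        """
        margins = y * (X @ w + b)
        hinge = np.maximum(0.0, 1.0 - margins).sum()                 # standard SVM hinge loss
        penalty = sum(max(abs(w[i]), abs(w[j])) for i, j in edges)   # F-infinity norm per edge
        return hinge + lam * penalty

In practice this objective would be minimized over (w, b), for example via a linear-programming reformulation since both terms are piecewise linear, but that step is omitted here.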

113 citations

Journal Article (DOI)
TL;DR: The proposed DGC-SVM uses the hinge loss penalized by a sum of L∞-norms, one applied to each group of genes, yielding an effective classification tool that encourages gene selection along paths to, or clustering around, known disease genes in microarray data.
Abstract: With the availability of genetic pathways or networks and accumulating knowledge of genes with variants predisposing to diseases (disease genes), we propose a disease-gene-centric support vector machine (DGC-SVM) that directly incorporates these two sources of prior information into building microarray-based classifiers for binary classification problems. DGC-SVM aims to detect the genes clustering together and around some key disease genes in a gene network. To achieve this goal, we propose a penalty over suitably defined groups of genes. A hierarchy is imposed on an undirected gene network to facilitate the definition of such gene groups. Our proposed DGC-SVM uses the hinge loss penalized by a sum of L∞-norms, one applied to each group. The simulation studies show that DGC-SVM not only detects more disease genes along pathways than the existing standard SVM and the SVM with an L1-penalty (L1-SVM), but also captures disease genes that potentially affect the outcome only weakly. Two real data applications demonstrate that DGC-SVM improves gene selection with predictive performance comparable to the standard SVM and L1-SVM. The proposed method has the potential to be an effective classification tool that encourages gene selection along paths to, or clustering around, known disease genes for microarray data.
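A minimal sketch of the group-penalized objective described above, assuming the penalty is simply the sum, over predefined gene groups, of the L∞-norm of the corresponding coefficients; the group definitions and parameter names are illustrative rather than the paper's exact formulation.

    import numpy as np

    def dgc_svm_objective(w, b, X, y, groups, lam):
        """Hinge loss plus a sum of group-wise L-infinity penalties.

        groups : list of index arrays, each a group of genes defined around a
                 known disease gene via the network hierarchy (illustrative)
        """
        margins = y * (X @ w + b)
        hinge = np.maximum(0.0, 1.0 - margins).sum()                       # standard SVM hinge loss
        penalty = sum(np.max(np.abs(w[np.asarray(g)])) for g in groups)    # L-infinity norm per group
        return hinge + lam * penalty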

6 citations


Cited by
Journal Article (DOI)
TL;DR: The new concept of dynamical network biomarkers (DNBs) has been developed; unlike traditional static approaches, a DNB can distinguish a predisease state from normal and disease states using even a small number of samples, and therefore has great potential to achieve "real" early diagnosis of complex diseases.
Abstract: Many studies have sought early diagnosis of complex diseases by finding accurate and robust biomarkers specific to the respective diseases. In particular, the recent rapid advance of high-throughput technologies provides unprecedentedly rich information for characterizing disease genotypes and phenotypes in a global and dynamical manner, which significantly accelerates the study of biomarkers from both theoretical and clinical perspectives. Traditionally, molecular biomarkers that distinguish disease samples from normal samples have been widely adopted in clinical practice because the required data are easy to measure. However, many of them suffer from low coverage and high false-positive or false-negative rates, which seriously limits their further clinical application. To overcome these difficulties, network biomarkers (or module biomarkers) have attracted much attention and also achieve better performance, because a network (or subnetwork) is considered a more robust way to characterize diseases than individual molecules. Still, both molecular and network biomarkers mainly distinguish disease samples from normal samples; owing to their static nature, they generally cannot be relied on to identify predisease samples and thus lack the ability to support early diagnosis. Based on nonlinear dynamical theory and complex network theory, the new concept of dynamical network biomarkers (DNBs, or a dynamical network of biomarkers) has been developed; unlike traditional static approaches, a DNB can distinguish a predisease state from normal and disease states using even a small number of samples, and therefore has great potential to achieve "real" early diagnosis of complex diseases. In this paper, we comprehensively review recent advances and developments in molecular biomarkers, network biomarkers, and DNBs in particular, focusing on biomarkers for early diagnosis of complex diseases given a small number of samples and high-throughput data (or big data). Detailed comparisons of the various types of biomarkers and their applications are also discussed.

230 citations

Journal Article (DOI)
TL;DR: A grouped penalty based on the Lγ-norm that smoothes the regression coefficients of the predictors over the network is proposed; the method performs best in variable selection across all simulation set-ups considered.
Abstract: We consider penalized linear regression, especially for "large p, small n" problems, in which the relationships among predictors are described a priori by a network. A class of motivating examples includes modeling a phenotype through gene expression profiles while accounting for the coordinated functioning of genes in the form of biological pathways or networks. To incorporate the prior knowledge that neighboring predictors in a network have similar effect sizes, we propose a grouped penalty based on the Lγ-norm that smoothes the regression coefficients of the predictors over the network. The main feature of the proposed method is its ability to automatically realize grouped variable selection and exploit grouping effects. We also discuss the effects of the choice of γ and of the weights inside the Lγ-norm. Simulation studies demonstrate the superior finite-sample performance of the proposed method compared with the Lasso, the elastic net and a recently proposed network-based method. The new method performs best in variable selection across all simulation set-ups considered. For illustration, the method is applied to a microarray dataset to predict survival times for glioblastoma patients, using a gene expression dataset and a gene network compiled from KEGG pathways.
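For illustration, a rough Python sketch of such an objective, assuming a squared-error loss and a penalty that sums, over network edges, the Lγ-norm of each (optionally weighted) coefficient pair; the paper's exact weighting (for example by node degree) may differ, and all names here are hypothetical.

    import numpy as np

    def grouped_lgamma_objective(beta, X, y, edges, lam, gamma=2.0, weights=None):
        """Least-squares fit plus a grouped L-gamma penalty over network edges."""
        if weights is None:
            weights = np.ones(len(beta))                      # per-predictor weights (e.g. degree-based)
        rss = 0.5 * np.sum((y - X @ beta) ** 2)               # ordinary least-squares term
        penalty = 0.0
        for i, j in edges:
            pair = np.array([beta[i] / weights[i], beta[j] / weights[j]])
            penalty += np.sum(np.abs(pair) ** gamma) ** (1.0 / gamma)   # L-gamma norm of the pair
        return rss + lam * penalty

Because each pair of neighbors enters through a single Lγ-norm, their coefficients tend to be shrunk toward zero together, which is what produces the grouped variable selection described in the abstract.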

127 citations

Journal Article (DOI)
TL;DR: This study develops a novel method that combines particle swarm optimization with a decision tree as the classifier; the method outperforms other popular classifiers on all test datasets and is comparable to SVM on certain specific datasets.
Abstract: In the application of microarray data, how to select a small number of informative genes, from among thousands, that may contribute to the occurrence of cancers is an important issue. Many researchers use various computational intelligence methods to analyze gene expression data. To achieve efficient gene selection from thousands of candidate genes that can contribute to identifying cancers, this study develops a novel method that combines particle swarm optimization with a decision tree as the classifier. This study also compares the performance of the proposed method with other well-known benchmark classification methods (support vector machine, self-organizing map, back-propagation neural network, C4.5 decision tree, Naive Bayes, CART decision tree, and artificial immune recognition system) in experiments on 11 gene expression cancer datasets. Based on statistical analysis, the proposed method outperforms the other popular classifiers on all test datasets and is comparable to SVM on certain specific datasets. Further, housekeeping genes with various expression patterns and tissue-specific genes are identified. These genes provide high discrimination power for cancer classification.
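The abstract describes a wrapper approach; the following Python sketch shows one plausible way to combine a binary particle swarm with a decision-tree fitness function using scikit-learn. The swarm parameters, fitness definition, and update rule are illustrative assumptions, not the settings used in the study.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import cross_val_score

    def pso_gene_selection(X, y, n_particles=20, n_iter=30, seed=0):
        """Binary PSO over gene subsets, scored by a cross-validated decision tree."""
        rng = np.random.default_rng(seed)
        p = X.shape[1]
        pos = rng.integers(0, 2, size=(n_particles, p))       # particle positions: 0/1 gene masks
        vel = rng.normal(0.0, 0.1, size=(n_particles, p))     # particle velocities

        def fitness(mask):
            if mask.sum() == 0:
                return 0.0
            tree = DecisionTreeClassifier(random_state=seed)
            return cross_val_score(tree, X[:, mask.astype(bool)], y, cv=3).mean()

        pbest = pos.copy()                                     # personal best positions
        pbest_fit = np.array([fitness(m) for m in pos])
        gbest = pbest[pbest_fit.argmax()].copy()               # global best position

        for _ in range(n_iter):
            r1, r2 = rng.random((2, n_particles, p))
            vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
            prob = 1.0 / (1.0 + np.exp(-vel))                  # sigmoid: velocity -> bit probability
            pos = (rng.random((n_particles, p)) < prob).astype(int)
            fit = np.array([fitness(m) for m in pos])
            improved = fit > pbest_fit
            pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
            gbest = pbest[pbest_fit.argmax()].copy()

        return gbest.astype(bool), pbest_fit.max()             # selected genes and their CV accuracy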

123 citations

Journal Article (DOI)
TL;DR: A newly developed classifier named Forest Deep Neural Network (fDNN) integrates a deep neural network architecture with a supervised forest feature detector, learning sparse feature representations and feeding them into a neural network to mitigate the overfitting problem.
Abstract: In predictive model development, gene expression data pose the unique challenge that the number of samples (n) is much smaller than the number of features (p). This "n ≪ p" property has prevented the classification of gene expression data from benefiting from deep learning techniques, which have proved powerful under "n > p" scenarios in other application fields, such as image classification. Further, the sparsity of effective features with unknown correlation structures in gene expression profiles brings more challenges for classification tasks. To tackle these problems, we propose a newly developed classifier named the Forest Deep Neural Network (fDNN), which integrates a deep neural network architecture with a supervised forest feature detector. Using this built-in feature detector, the method is able to learn sparse feature representations and feed them into a neural network to mitigate the overfitting problem. Simulation experiments and real data analyses using two RNA-seq expression datasets are conducted to evaluate fDNN's capability. The method is demonstrated to be a useful addition to current predictive models, with better classification performance and more meaningful selected features than ordinary random forests and deep neural networks.
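A rough sketch of the forest-as-feature-detector idea using scikit-learn, assuming each tree's predicted probability for the positive class serves as one learned feature for a small downstream neural network; the actual fDNN architecture, feature encoding, and training procedure may differ, and the layer sizes below are illustrative.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.neural_network import MLPClassifier

    def fit_forest_dnn(X_train, y_train, n_trees=500, seed=0):
        """Fit a supervised forest, then a neural net on its per-tree outputs."""
        forest = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
        forest.fit(X_train, y_train)
        # (n_samples, n_trees): each column is one tree's positive-class probability
        feats = np.column_stack(
            [tree.predict_proba(X_train)[:, 1] for tree in forest.estimators_]
        )
        net = MLPClassifier(hidden_layer_sizes=(64, 16), max_iter=500, random_state=seed)
        net.fit(feats, y_train)
        return forest, net

    def predict_forest_dnn(forest, net, X):
        """Transform new samples through the forest, then classify with the net."""
        feats = np.column_stack(
            [tree.predict_proba(X)[:, 1] for tree in forest.estimators_]
        )
        return net.predict(feats)

The forest compresses the "n ≪ p" input into a low-dimensional, supervised representation, which is what allows the downstream network to be trained without severe overfitting.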

113 citations