Showing papers by "Yi-Ping Phoebe Chen" published in 2009

Open Access

Journal Article•DOI•

Acoustic feature selection for automatic emotion recognition from speech

Jia Rong¹, Gang Li¹, Yi-Ping Phoebe Chen¹•Institutions (1)

01 May 2009-Information Processing and Management

TL;DR: A novel algorithm is presented in this paper, which can be applied to a small data set with a high number of features and outperforms the commonly used Principal Component Analysis (PCA)/Multi-Dimensional Scaling (MDS) methods and the more recently developed ISOMap dimensionality reduction method.

Abstract: Emotional expression and understanding are normal instincts of human beings, but automatic emotion recognition from speech without referring to any language or linguistic information remains an open problem. The limited size of existing emotional data samples and their relatively high dimensionality have outstripped many dimensionality reduction and feature selection algorithms. This paper focuses on the data preprocessing techniques which aim to extract the most effective acoustic features to improve the performance of emotion recognition. A novel algorithm is presented, which can be applied to a small data set with a high number of features. The presented algorithm integrates the advantages of a decision tree method and the random forest ensemble. Experimental results on a series of Chinese emotional speech data sets indicate that the presented algorithm can achieve improved results on emotion recognition, and outperforms the commonly used Principal Component Analysis (PCA)/Multi-Dimensional Scaling (MDS) methods and the more recently developed ISOMap dimensionality reduction method.

198 citations

Journal Article•DOI•

Analysis on relationship between extreme pathways and correlated reaction sets

Yanping Xi¹, Yi-Ping Phoebe Chen², Ming Cao¹, Weirong Wang¹, Fei Wang¹•Institutions (2)

Fudan University¹, Deakin University²

30 Jan 2009-BMC Bioinformatics

TL;DR: Both extreme pathways and correlated reaction sets are derived from the topology information of metabolic networks, and their strong relationship suggests a possible mechanism: as a controllable unit, an extreme pathway is regulated by its corresponding correlated reaction sets, and a correlated reaction set is further regulated by the organism's regulatory network.

Abstract: Constraint-based modeling of reconstructed genome-scale metabolic networks has been successfully applied to several microorganisms. In constraint-based modeling, in order to characterize all allowable phenotypes, network-based pathways, such as extreme pathways and elementary flux modes, are defined. However, as the scale of a metabolic network rises, the number of extreme pathways and elementary flux modes increases exponentially. Uniform random sampling solves this problem to some extent, making it possible to study the contents of the available phenotypes. After uniform random sampling, correlated reaction sets can be identified by the dependencies between reactions derived from sample phenotypes. In this paper, we study the relationship between extreme pathways and correlated reaction sets. Correlated reaction sets are identified for the E. coli core, red blood cell and Saccharomyces cerevisiae metabolic networks, respectively. All extreme pathways are enumerated for the former two metabolic networks. As for the Saccharomyces cerevisiae metabolic network, because of its large scale, we get a set of extreme pathways by sampling the whole extreme pathway space. In most cases, an extreme pathway covers a correlated reaction set in an 'all or none' manner, which means that either all reactions in a correlated reaction set or none are used by a given extreme pathway. In rare cases, besides the 'all or none' manner, a correlated reaction set may be fully covered by a combination of a few extreme pathways with related functions, which may bring redundancy and flexibility to improve the survivability of a cell. In short, extreme pathways show a strong complementary relationship in their usage of reactions in the same correlated reaction set. Both extreme pathways and correlated reaction sets are derived from the topology information of metabolic networks. The strong relationship between correlated reaction sets and extreme pathways suggests a possible mechanism: as a controllable unit, an extreme pathway is regulated by its corresponding correlated reaction sets, and a correlated reaction set is further regulated by the organism's regulatory network.

22 citations
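
The 'all or none' coverage described in the abstract above can be illustrated with a minimal Python sketch. The reaction names, pathway names and the `coverage_mode` helper are hypothetical and only show the set comparison involved; they are not taken from the paper.

```python
def coverage_mode(pathway_reactions, correlated_set):
    """Classify how one extreme pathway uses one correlated reaction set.

    Returns "all" if every reaction of the set is used, "none" if no
    reaction is used, and "partial" otherwise.
    """
    used = set(pathway_reactions) & set(correlated_set)
    if not used:
        return "none"
    if used == set(correlated_set):
        return "all"
    return "partial"


# Hypothetical toy data: three extreme pathways and one correlated reaction set.
extreme_pathways = {
    "EP1": {"R1", "R2", "R3"},
    "EP2": {"R4", "R5"},
    "EP3": {"R1", "R4"},
}
correlated_set = {"R1", "R2"}

modes = {
    name: coverage_mode(reactions, correlated_set)
    for name, reactions in extreme_pathways.items()
}
print(modes)
```

In this toy example EP1 and EP2 follow the 'all or none' pattern, while EP3 uses only part of the set, which corresponds to the rarer case mentioned in the abstract.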

Journal Article•DOI•

Discovery of Structural and Functional Features in RNA Pseudoknots

Qingfeng Chen¹, Yi-Ping Phoebe Chen¹•Institutions (1)

Deakin University¹

01 Jul 2009-IEEE Transactions on Knowledge and Data Engineering

TL;DR: PseudoBase is a database providing structural, functional, and sequence data related to RNA pseudoknots, and a novel framework using quantitative association rule mining to analyze the pseudoknot data is presented.

Abstract: An RNA pseudoknot consists of nonnested double-stranded stems connected by single-stranded loops. There is increasing recognition that RNA pseudoknots are one of the most prevalent RNA structures and fulfill a diverse set of biological roles within cells, and there is an expanding rate of studies into RNA pseudoknotted structures as well as increasing allocation of function. These not only produce valuable structural data but also facilitate an understanding of structural and functional characteristics in RNA molecules. PseudoBase is a database providing structural, functional, and sequence data related to RNA pseudoknots. To capture the features of RNA pseudoknots, we present a novel framework using quantitative association rule mining to analyze the pseudoknot data. The derived rules are classified into specified association groups regarding structure, function, and category of RNA pseudoknots. The discovered association rules assist biologists in filtering out significant knowledge of structure-function and structure-category relationships. A brief biological interpretation of the relationships is presented, and their potential correlations with each other are highlighted.

18 citations

Journal Article•DOI•

Candidate working set strategy based SMO algorithm in support vector machine

Xiaofeng Song¹, Wei-min Chen¹, Yi-Ping Phoebe Chen², Bin Jiang¹•Institutions (2)

Nanjing University of Aeronautics and Astronautics¹, Deakin University²

01 Sep 2009-Information Processing and Management

TL;DR: This new candidate working set (CWS) strategy can select several greatest violating samples from Cache as the iterative working sets for the next several optimizing steps, which can improve the efficiency of the kernel cache usage and reduce the computational cost related to the working set selection.

Abstract: Sequential minimal optimization (SMO) is quite an efficient algorithm for training the support vector machine. The most important step of this algorithm is the selection of the working set, which greatly affects the training speed. The feasible direction strategy for working set selection can decrease the objective function, but it may add to the total computation needed to select the working set in each iteration. In this paper, a new candidate working set (CWS) strategy is presented that considers the cost of working set selection and cache performance. This new strategy can select several greatest violating samples from the cache as the iterative working sets for the next several optimizing steps, which can improve the efficiency of kernel cache usage and reduce the computational cost related to working set selection. The results of theoretical analysis and experiments demonstrate that the proposed method can reduce the training time, especially on large-scale datasets.

17 citations

Journal Article•DOI•

Compensatory ability to null mutation in metabolic networks.

Da Jiang¹, Shuigeng Zhou¹, Yi-Ping Phoebe Chen²•Institutions (2)

Fudan University¹, Deakin University²

01 Jun 2009-Biotechnology and Bioengineering

TL;DR: Analysis of the metabolic networks of more than 800 organisms suggests that reactions with larger impact degrees are likely essential and that universal reactions should also be essential, and shows that the scale-free feature and reaction reversibility contribute to robustness in metabolic networks.

Abstract: Robustness is an inherent property of biological systems. There is still only a limited understanding of how it is accomplished at the cellular or molecular level. To this end, this article analyzes the impact degree of each reaction on others, which is defined as the number of cascading failures of following and/or forward reactions when an initial reaction is deleted. Analysis of the metabolic networks of more than 800 organisms suggests that reactions with larger impact degrees are likely essential and that universal reactions should also be essential. Alternative metabolic pathways compensate for null mutations, which is reflected in the small average impact degrees for all organisms. Interestingly, the average impact degrees of archaea are smaller than those of the other two categories of organisms, eukaryotes and bacteria, indicating that archaea have strong robustness to resist various perturbations during the evolution process. The results show that the scale-free feature and reaction reversibility contribute to robustness in metabolic networks. The optimal growth temperature of an organism also relates to the robust structure of its metabolic network.

17 citations
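
The impact degree defined in the abstract above (the number of cascading failures after one reaction is deleted) can be sketched in Python. This is a simplified illustration under stated assumptions, not the authors' procedure: a reaction is assumed to fail only when one of its substrates is no longer produced by any remaining reaction or supplied externally (forward cascade only), and the network, the `external` set and the function name are hypothetical.

```python
def impact_degree(reactions, deleted, external=frozenset()):
    """Count the reactions that fail after `deleted` is removed.

    reactions: dict mapping reaction name -> (substrates, products).
    external:  metabolites assumed to be always available.
    Simplifying assumption: a reaction fails when any substrate is neither
    external nor produced by a still-active reaction.
    """
    active = set(reactions) - {deleted}
    changed = True
    while changed:
        changed = False
        # Metabolites still available given the currently active reactions.
        produced = set(external)
        for name in active:
            produced |= set(reactions[name][1])
        # Remove one starved reaction, then recompute availability.
        for name in sorted(active):
            if not set(reactions[name][0]) <= produced:
                active.discard(name)
                changed = True
                break
    return len(reactions) - 1 - len(active)


# Hypothetical toy network: R4 is an alternative route from A to C.
network = {
    "R1": ({"A"}, {"B"}),
    "R2": ({"B"}, {"C"}),
    "R3": ({"C"}, {"D"}),
    "R4": ({"A"}, {"C"}),
}
chain = {name: network[name] for name in ("R1", "R2", "R3")}

with_alternative = impact_degree(network, "R1", external={"A"})
without_alternative = impact_degree(chain, "R1", external={"A"})
print(with_alternative, without_alternative)
```

With the alternative route R4 present, deleting R1 only stops R2; in the linear chain the failure propagates to R3 as well, which illustrates how alternative pathways keep impact degrees small.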

Journal Article•DOI•

Brief Communication: Finding rule groups to classify high dimensional gene expression datasets

Jiyuan An¹, Yi-Ping Phoebe Chen²•Institutions (2)

Deakin University¹, Australian Research Council²

01 Feb 2009-Computational Biology and Chemistry

TL;DR: This paper proposes a robust algorithm to find out rule groups to classify gene expression datasets and shows that the rule groups obtained by the algorithm have higher accuracy than that of other classification approaches.

14 citations

Proceedings Article•DOI•

High Functional Coherence in k-Partite Protein Cliques of Protein Interaction Networks

Qian Liu¹, Yi-Ping Phoebe Chen², Jinyan Li¹•Institutions (2)

Nanyang Technological University¹, Deakin University²

01 Nov 2009

TL;DR: The idea of k-partite protein cliques suggests a novel approach to characterizing PPI networks, and may help function prediction for unknown proteins.

Abstract: We introduce a new topological concept called k-partite protein cliques to study protein-protein interaction (PPI) networks. In particular, we examine the functional coherence of proteins in k-partite protein cliques. A k-partite protein clique is a k-partite maximal clique comprising two or more nonoverlapping protein subsets between any two of which full interactions are exhibited. To detect k-partite maximal cliques in PPI networks, we propose to transform PPI networks into induced k-partite graphs with proteins as vertices, where edges only exist among the graph's partites. Then, we present a k-partite maximal clique mining (MaCMik) algorithm to enumerate k-partite maximal cliques from k-partite graphs. Our MaCMik algorithm is applied to a yeast PPI network. We observe that there does exist interesting and unusually high functional coherence in k-partite protein cliques: most proteins in k-partite protein cliques, especially those in the same partites, share the same functions. Therefore, the idea of k-partite protein cliques suggests a novel approach to characterizing PPI networks, and may help function prediction for unknown proteins.

7 citations
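
The defining property of a k-partite protein clique given in the abstract above (two or more nonoverlapping subsets with full interactions between any two of them) can be checked with a short Python sketch. It does not implement the MaCMik algorithm and does not test maximality; the protein names and the `is_k_partite_clique` helper are hypothetical.

```python
from itertools import combinations


def is_k_partite_clique(partites, interactions):
    """Check the k-partite clique property for the given protein subsets.

    partites:     iterable of protein subsets (the candidate partites).
    interactions: iterable of undirected protein pairs (u, v).
    Maximality is NOT checked here.
    """
    edges = {frozenset(pair) for pair in interactions}
    parts = [set(p) for p in partites]
    if len(parts) < 2 or any(not p for p in parts):
        return False
    for a, b in combinations(parts, 2):
        if a & b:  # subsets must not overlap
            return False
        # Every protein in one subset must interact with every protein in the other.
        if any(frozenset((u, v)) not in edges for u in a for v in b):
            return False
    return True


# Hypothetical toy PPI network.
ppi = [("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P2", "P4")]

full = is_k_partite_clique([{"P1", "P2"}, {"P3", "P4"}], ppi)
broken = is_k_partite_clique([{"P1"}, {"P2"}], ppi)
print(full, broken)
```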

Book Chapter•DOI•

Spherical harmonics and distance transform for image representation and retrieval

Atul Sajjanhar¹, Guojun Lu², Dengsheng Zhang², Jingyu Hou¹, Yi-Ping Phoebe Chen¹•Institutions (2)

Deakin University¹, Monash University²

23 Sep 2009

TL;DR: Experimental results show that the performance of the proposed descriptors is significantly better than other methods in the same category.

Abstract: In this paper, we have proposed a method for 2D image retrieval based on object shapes. The method relies on transforming the 2D images into 3D space based on distance transform. Spherical harmonics are obtained for the 3D data and used as descriptors for the underlying 2D images. The proposed method is compared against two existing methods which use spherical harmonics for shape based retrieval of images. MPEG-7 Still Images Content Set is used for performing experiments; this dataset consists of 3621 still images. Experimental results show that the performance of the proposed descriptors is significantly better than other methods in the same category.

7 citations

Proceedings Article•DOI•

Early Breast Cancer Identification: Which Way to Go? Microarray or Image Based Computer Aided Diagnosis!

Jesmin Nahar¹, Kevin S. Tickle¹, A. B. M. Shawkat Ali¹, Yi-Ping Phoebe Chen²•Institutions (2)

Central Queensland University¹, Deakin University²

19 Oct 2009

TL;DR: Results suggest the most effective means of breast cancer identification in the early stage is a hybrid approach.

Abstract: The goal of this research is to develop a computer aided diagnostic (CAD) system that can detect breast cancer in the early stage by using microarray and image data. We verified the performance of six well known classification algorithms with various performance matrices. Although we do not suggest a unique classifier algorithm for a CAD system, we do identify a number of algorithms whose performance is very promising. The algorithms' performance was validated on three image datasets, two of which have been used for the first time in this experiment. Multidimensional image filtering is adopted for the final data extraction. The image data classification performance is compared with that of microarray data. Results suggest the most effective means of breast cancer identification in the early stage is a hybrid approach.

5 citations

Journal Article•

Finding Coverage Using Incremental Attribute Combinations

Jiyuan An, Yi-Ping Phoebe Chen

01 May 2009-International Journal of Innovative Computing Information and Control

TL;DR: An algorithm that adopts incremental feature combinations to effectively find the largest coverage is proposed, and the irrelevant coverage can be pruned away at early stages because potentially large coverage can be found earlier.

Abstract: Coverage is the range that covers only positive samples in attribute (or feature) space. Finding coverage is the kernel problem in induction algorithms because coverage can be used as rules to describe positive samples. To reflect the characteristics of training samples, it is desirable to find large coverage that covers more positive samples. However, it is difficult to find large coverage, because the attribute space usually has very high dimensionality. Many heuristic methods such as ID3, AQ and CN2 have been proposed to find large coverage. A robust algorithm has also been proposed to find the largest coverage, but its time and space complexities are costly when the dimensionality becomes high. To overcome this drawback, this paper proposes an algorithm that adopts incremental feature combinations to effectively find the largest coverage. In this algorithm, the irrelevant coverage can be pruned away at early stages because potentially large coverage can be found earlier. Experiments show that the space and time needed to find the largest coverage have been significantly reduced.

2 citations
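
The notion of coverage in the abstract above (a range in attribute space that covers only positive samples) can be made concrete with a small Python sketch. It assumes axis-aligned numeric ranges and hypothetical sample data, and it does not reproduce the paper's incremental search or pruning; it only shows how adding an attribute to a range can turn it into a valid coverage.

```python
def in_range(sample, ranges):
    """True if the sample lies inside every (low, high) interval in `ranges`."""
    return all(low <= sample[attr] <= high for attr, (low, high) in ranges.items())


def coverage_size(ranges, positives, negatives):
    """Return the number of positive samples covered by `ranges`.

    Returns None when any negative sample falls inside the range, because
    the range is then not a coverage in the sense used above.
    """
    if any(in_range(s, ranges) for s in negatives):
        return None
    return sum(in_range(s, ranges) for s in positives)


# Hypothetical toy samples with two numeric attributes.
positives = [{"x": 1, "y": 2}, {"x": 2, "y": 3}, {"x": 5, "y": 1}]
negatives = [{"x": 4, "y": 2}]

narrow = coverage_size({"x": (1, 2)}, positives, negatives)
too_wide = coverage_size({"x": (1, 5)}, positives, negatives)
combined = coverage_size({"x": (1, 5), "y": (3, 3)}, positives, negatives)
print(narrow, too_wide, combined)
```

Here the range on `x` alone is either small or invalid, while combining it with a constraint on `y` excludes the negative sample again, which is the kind of attribute combination the abstract refers to.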

Proceedings Article•DOI•

Data Pre-processing for More Effective Gene Clustering

Jingyu Hou¹, Yi-Ping Phoebe Chen¹•Institutions (1)

Deakin University¹

24 Apr 2009

TL;DR: An innovative data pre-processing approach to identify noise data in the data sets and eliminate or reduce the impact of the noise data on gene clustering, that makes the clustering results stable across clustering algorithms with different similarity metrics.

Abstract: The high-throughput experimental data from the new gene microarray technology has spurred numerous efforts to find effective ways of processing microarray data for revealing real biological relationships among genes. This work proposes an innovative data pre-processing approach to identify noise data in the data sets and eliminate or reduce the impact of the noise data on gene clustering. With the proposed algorithm, the pre-processed data sets make the clustering results stable across clustering algorithms with different similarity metrics, the important information of genes and features is kept, and the clustering quality is improved. The primary evaluation on real microarray data sets has shown the effectiveness of the proposed algorithm.

Book Chapter•DOI•

Advanced graph mining methods for protein analysis

Yi-Ping Phoebe Chen¹, Jia Rong, Gang Li¹•Institutions (1)

Deakin University¹

01 Sep 2009

TL;DR: This chapter introduces a state-of-the-art data mining technique, graph mining, which is good at defining and discovering interesting structural patterns in graphical data sets, and takes advantage of its expressive power to study protein structures.

Abstract: As one of the primary substances in a living organism, protein defines the character of each cell by interacting with the cellular environment to promote the cell's growth and function [1]. Previous studies on proteomics indicate that the functions of different proteins could be assigned based upon protein structures [2,3]. The knowledge of protein structures gives us an overview of protein fold space and is helpful for understanding the evolutionary principles behind structure. By observing the architectures and topologies of the protein families, biological processes can be investigated more directly with much higher resolution and finer detail. For this reason, the analysis of protein, its structure and its interaction with other materials is emerging as an important problem in bioinformatics. However, the determination of protein structures is experimentally expensive and time consuming, which makes scientists largely dependent on sequence rather than more general structure to infer the function of the protein at the present time. For this reason, data mining technology is introduced into this area to provide more efficient data processing and knowledge discovery approaches. Unlike many data mining applications which lack available data, the protein structure determination problem and its interaction study, on the contrary, could utilize a vast amount of biologically relevant information on protein and its interaction, such as the protein data bank (PDB) [4], the structural classification of proteins (SCOP) databases [5], CATH databases [6], UniProt [7], and others. The difficulty of predicting protein structures, especially 3D structures, and the interactions between proteins as shown in Figure 6.1, lies in the computational complexity of the data.
Although a large number of approaches have been developed to determine protein structures, such as ab initio modelling [8], homology modelling [9] and threading [10], more efficient and reliable methods are still greatly needed. In this chapter, we will introduce a state-of-the-art data mining technique, graph mining, which is good at defining and discovering interesting structural patterns in graphical data sets, and take advantage of its expressive power to study protein structures, including protein structure prediction and comparison, and protein-protein interaction (PPI). The current graph pattern mining methods will be described, and typical algorithms will be presented, together with their applications in protein structure analysis. The rest of the chapter is organized as follows: Section 6.2 gives a brief introduction to the fundamental knowledge of protein, the publicly accessible protein data resources and the current research status of protein analysis; Section 6.3 turns to one of the state-of-the-art data mining methods, graph mining; Section 6.4 surveys several existing works on protein structure analysis using advanced graph mining methods from the recent decade; finally, Section 6.5 gives a conclusion and summarizes potential further work.
