Author
Suohai Fan
Bio: Suohai Fan is an academic researcher at Jinan University. His research spans topics including stochastic games and symmetric graphs. He has an h-index of 12 and has co-authored 38 publications receiving 511 citations.
Papers
TL;DR: The training set produced by the proposed CURE-SMOTE algorithm is closer to the original data distribution because it contains minimal noise, so this feasible and effective algorithm yields better classification results.
Abstract: The random forests algorithm is a widely applicable classifier known for its universality and robustness against overfitting, but it still has drawbacks. To improve its performance, this paper addresses imbalanced-data processing, feature selection, and parameter optimization. We propose the CURE-SMOTE algorithm for the imbalanced-data classification problem. Experiments on imbalanced UCI data reveal that combining Clustering Using Representatives (CURE) with the original synthetic minority oversampling technique (SMOTE) algorithm is effective compared with classification on the original data using random sampling, Borderline-SMOTE1, safe-level SMOTE, C-SMOTE, and k-means-SMOTE. Additionally, a hybrid RF (random forests) algorithm is proposed for feature selection and parameter optimization, using the minimum out-of-bag (OOB) error as its objective function. Simulation results on binary and higher-dimensional data indicate that the proposed hybrid RF algorithms (the hybrid genetic-random forests, hybrid particle swarm-random forests, and hybrid fish swarm-random forests algorithms) achieve the minimum OOB error and show the best generalization ability. The training set produced by CURE-SMOTE is closer to the original data distribution because it contains minimal noise, so it yields better classification results. Moreover, the hybrid algorithms' F-value, G-mean, AUC, and OOB scores surpass those of the original RF algorithm. Hence, the hybrid algorithm provides a new way to perform feature selection and parameter optimization.
140 citations
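The hybrid RF idea above treats the out-of-bag (OOB) error as the objective of a hyperparameter search. A minimal sketch, using plain random search and a synthetic stand-in for a real forest's OOB error (both the search strategy and the toy objective are my assumptions, not the paper's method):

```python
import random

def oob_error(n_trees, max_features):
    # Toy stand-in (assumption) for a real forest's OOB error; in
    # practice, plug in an actual random forest's OOB estimate
    # (e.g. scikit-learn's oob_score_). This surface has a minimum
    # near (100 trees, 4 features).
    return (n_trees - 100) ** 2 / 1e4 + (max_features - 4) ** 2 / 10

def random_search(trials=200, seed=0):
    # Search (n_trees, max_features) for the minimum OOB error;
    # the paper uses GA / PSO / fish-swarm search instead.
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        cand = (rng.randint(10, 300), rng.randint(1, 20))
        err = oob_error(*cand)
        if best is None or err < best[1]:
            best = (cand, err)
    return best

params, err = random_search()
```

The paper's hybrid algorithms replace the random proposals with evolutionary or swarm-based updates, but the objective, minimizing OOB error, is the same.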
TL;DR: A novel multi-swarm particle swarm optimization with dynamic learning strategy (PSO-DLS) is proposed to improve the performance of PSO; statistical comparisons with other algorithms demonstrate its promising effectiveness in solving complex problems.
Abstract: In this paper, we propose a novel multi-swarm particle swarm optimization with dynamic learning strategy (PSO-DLS) to improve the performance of PSO. To promote information exchange among sub-swarms, the particle classification mechanism divides the particles in each sub-swarm into ordinary particles and communication particles with different tasks at each iteration. The ordinary particles focus on exploitation under the guidance of the local best position in their sub-swarm, while the communication particles focus on exploration under the guidance of a united local best position in a new search region, promoting information exchange among sub-swarms. Moreover, the strategy uses a dynamic control mechanism with an increasing parameter p to implement the classification operation, giving ordinary particles an increasing chance of evolving into communication particles during the search. A simple analysis of the search behavior supports its strong effect on maintaining diversity and finding better solutions. Experimental results on 15 function problems from CEC 2015 in 10 and 30 dimensions also demonstrate its promising effectiveness in solving complex problems in statistical comparisons with other algorithms. Computational timings further reflect the lightweight design of PSO-DLS.
106 citations
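The dynamic learning strategy can be sketched as a toy (the sub-swarm sizes, inertia and acceleration coefficients, and the linear growth of p below are my assumptions, not the authors' settings), minimizing the sphere function:

```python
import random

def sphere(x):
    # Standard benchmark objective: f(x) = sum of squares, minimum at 0.
    return sum(v * v for v in x)

def pso_dls(dim=5, swarms=3, size=10, iters=200, seed=1):
    rng = random.Random(seed)
    pos = [[[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(size)]
           for _ in range(swarms)]
    vel = [[[0.0] * dim for _ in range(size)] for _ in range(swarms)]
    pbest = [[p[:] for p in s] for s in pos]          # personal bests
    sbest = [min(s, key=sphere)[:] for s in pos]      # per-sub-swarm bests
    for t in range(iters):
        p = t / iters  # growing chance of acting as a communication particle
        gbest = min(sbest, key=sphere)  # united best across sub-swarms
        for s in range(swarms):
            for i in range(size):
                # Communication particles follow the united best;
                # ordinary particles exploit their sub-swarm's best.
                guide = gbest if rng.random() < p else sbest[s]
                for d in range(dim):
                    vel[s][i][d] = (0.7 * vel[s][i][d]
                                    + 1.5 * rng.random() * (pbest[s][i][d] - pos[s][i][d])
                                    + 1.5 * rng.random() * (guide[d] - pos[s][i][d]))
                    pos[s][i][d] += vel[s][i][d]
                if sphere(pos[s][i]) < sphere(pbest[s][i]):
                    pbest[s][i] = pos[s][i][:]
                    if sphere(pbest[s][i]) < sphere(sbest[s]):
                        sbest[s] = pbest[s][i][:]
    return min(sbest, key=sphere)

best = pso_dls()
```

Early on p is small, so sub-swarms explore independently; as p grows, more particles are pulled toward the united best, trading diversity for convergence.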
TL;DR: The behavior and bounds of the conditional chromatic number of a graph G are investigated; in a conditional coloring, every vertex of degree at least r in G must be adjacent to vertices with at least r different colors.
Abstract: For an integer r > 0, a conditional (k,r)-coloring of a graph G is a proper k-coloring of the vertices of G such that every vertex of degree at least r in G is adjacent to vertices with at least r different colors. The smallest integer k for which a graph G has a conditional (k,r)-coloring is the rth-order conditional chromatic number χ_r(G). In this paper, the behavior and bounds of the conditional chromatic number of a graph G are investigated.
53 citations
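A conditional (k,r)-coloring can be checked by brute force on small graphs; the helper names below are my own, not from the paper. The cycle C5 illustrates how the conditional requirement raises the chromatic number from 3 (for r = 1) to 5 (for r = 2):

```python
from itertools import product

def is_conditional_coloring(adj, coloring, r):
    # adj: adjacency dict {vertex: [neighbors]}; coloring: {vertex: color}.
    for v, nbrs in adj.items():
        if any(coloring[v] == coloring[u] for u in nbrs):
            return False  # not a proper coloring
        if len(nbrs) >= r and len({coloring[u] for u in nbrs}) < r:
            return False  # too few distinct colors in the neighborhood
    return True

def conditional_chromatic_number(adj, r, kmax=8):
    # Smallest k admitting a conditional (k, r)-coloring; exhaustive,
    # so only feasible for tiny graphs.
    verts = sorted(adj)
    for k in range(1, kmax + 1):
        for colors in product(range(k), repeat=len(verts)):
            if is_conditional_coloring(adj, dict(zip(verts, colors)), r):
                return k
    return None

# The 5-cycle: ordinary chromatic number 3, but for r = 2 every vertex's
# two neighbors must get distinct colors, which forces a proper coloring
# of C5 squared (= K5), hence 5 colors.
c5 = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
```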
TL;DR: The dynamic chromatic number χ_2(G) is the smallest integer k for which a graph G has a (k,2)-coloring; the dynamic choice number ch_2(G) is the least integer k such that every list assignment L with |L(v)| = k for all v ∈ V(G) permits an (L,2)-coloring.
Abstract: For integers k, r > 0, a (k,r)-coloring of a graph G is a proper coloring of the vertices of G with k colors such that every vertex v of degree d(v) is adjacent to vertices with at least min{d(v), r} different colors. The dynamic chromatic number, denoted χ_2(G), is the smallest integer k for which a graph G has a (k,2)-coloring. A list assignment L of G is a function that assigns to every vertex v of G a set L(v) of positive integers. For a given list assignment L of G, an (L,r)-coloring of G is a proper coloring c of the vertices such that every vertex v of degree d(v) is adjacent to vertices with at least min{d(v), r} different colors and c(v) ∈ L(v). The dynamic choice number of G, ch_2(G), is the least integer k such that every list assignment L with |L(v)| = k for all v ∈ V(G) permits an (L,2)-coloring. It is known that χ_r(G) ≤ ch_r(G) for any graph G. Using Euler distributions, in this paper we prove the following results, where (2) and (3) are best possible. (1) If G is planar, then ch_2(G) ≤ 6; moreover, ch_2(G) ≤ 5 when Δ(G) ≤ 4. (2) If G is planar, then χ_2(G) ≤ 5. (3) If G is a graph with genus g(G) ≥ 1, then ch_2(G) ≤ (7 + √(1 + 48g(G)))/2.
39 citations
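An (L,2)-coloring can likewise be tested by brute force on tiny graphs; the function below is my own sketch, not the paper's machinery. On K3, lists of size 3 suffice while lists of size 2 do not:

```python
from itertools import product

def has_L2_coloring(adj, lists):
    # adj: adjacency dict; lists: {vertex: list of allowed colors}.
    # Searches for a proper coloring c with c(v) in L(v) in which every
    # vertex of degree >= 2 sees at least 2 colors among its neighbors.
    verts = sorted(adj)
    for choice in product(*(lists[v] for v in verts)):
        c = dict(zip(verts, choice))
        proper = all(c[v] != c[u] for v in adj for u in adj[v])
        if proper and all(len(adj[v]) < 2
                          or len({c[u] for u in adj[v]}) >= 2
                          for v in adj):
            return True
    return False

# The triangle K3: any proper coloring uses 3 colors, and each vertex's
# two neighbors then automatically carry 2 distinct colors.
k3 = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
```

Checking ch_2(G) itself would require iterating over all list assignments of a given size, which this helper makes possible, though only for very small graphs and palettes.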
TL;DR: The memory-based prisoner's dilemma game with conditional selection on networks is investigated; the proposed selection takes historical information into account, evaluating recent performance and selecting neighbors with strong attractiveness.
Abstract: We investigate the memory-based prisoner's dilemma game with conditional selection on networks. The proposed selection takes historical information into account, evaluating recent performance over the memory and selecting neighbors with strong attractiveness: only neighbors whose payoffs exceed the average payoff over the memory length are considered to be on a potential growth path. Simulation results show that memory length M has a dual impact on the spatial interaction: defection benefits from small M, while cooperation is promoted by large M, and cooperators can resist the defectors' invasion as M increases. Analyses of the average payoff and strategy distribution in the spatial game show the effect of the proposed conditional-selection mechanism. These findings may help in understanding cooperative behavior in natural and social systems involving conditional selection based on recent historical performance.
32 citations
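The conditional-selection rule can be sketched in a toy simulation on a ring network; the payoff values, memory length, and update rule below are my own simplifications of the paper's model, not its exact parameters:

```python
import random
from collections import deque

# Prisoner's dilemma payoffs (my choices): temptation, reward,
# punishment, sucker's payoff.
T, R, P, S = 1.3, 1.0, 0.1, 0.0

def payoff(a, b):
    # a, b in {1 (cooperate), 0 (defect)}; payoff to the first player.
    return {(1, 1): R, (1, 0): S, (0, 1): T, (0, 0): P}[(a, b)]

def simulate(n=100, M=5, steps=50, seed=2):
    rng = random.Random(seed)
    strat = [rng.randint(0, 1) for _ in range(n)]    # 1 = cooperate
    hist = [deque(maxlen=M) for _ in range(n)]       # last M payoffs
    for _ in range(steps):
        pay = [sum(payoff(strat[i], strat[(i + d) % n]) for d in (-1, 1))
               for i in range(n)]
        for i in range(n):
            hist[i].append(pay[i])
        avg = [sum(h) / len(h) for h in hist]        # recent performance
        new = strat[:]
        for i in range(n):
            nbrs = [(i - 1) % n, (i + 1) % n]
            mean = sum(avg[j] for j in nbrs + [i]) / 3
            # Conditional selection: imitate only neighbors whose recent
            # average payoff beats the neighborhood average.
            good = [j for j in nbrs if avg[j] > mean]
            if good:
                new[i] = strat[max(good, key=lambda j: avg[j])]
        strat = new
    return sum(strat) / n  # final cooperation fraction

frac = simulate()
```

Varying M in this sketch is one way to probe the paper's claim that small memories favor defection while large memories promote cooperation.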
Cited by
Journal Article
TL;DR: This review details why interactome networks are important to consider in biology, how they can be mapped and integrated with each other, what global properties are starting to emerge from interactome network models, and how these properties may relate to human disease.
Abstract: Complex biological systems and cellular networks may underlie most genotype to phenotype relationships. Here, we review basic concepts in network biology, discussing different types of interactome networks and the insights that can come from analyzing them. We elaborate on why interactome networks are important to consider in biology, how they can be mapped and integrated with each other, what global properties are starting to emerge from interactome network models, and how these properties may relate to human disease.
1,323 citations
01 Jan 2012
TL;DR: This commentary reviews Bushee's study of the influence of institutional investors on myopic R&D investment behavior, which found that institutional investors have a profound influence on firms' investment behavior.
Abstract: As an important force in securities markets, institutional investors have attracted growing attention from both academics and practitioners. This paper reviews "The influence of institutional investors on myopic R&D investment behavior" by Brian Bushee, professor of accounting at the Wharton School of the University of Pennsylvania (hereafter Bushee (1998)), and offers related suggestions and directions for future research.
1,246 citations
TL;DR: The Synthetic Minority Oversampling Technique (SMOTE) preprocessing algorithm is considered the "de facto" standard in the framework of learning from imbalanced data, owing to the simplicity of its design and its robustness when applied to different types of problems.
Abstract: The Synthetic Minority Oversampling Technique (SMOTE) preprocessing algorithm is considered the "de facto" standard in the framework of learning from imbalanced data, owing to the simplicity of its design and its robustness when applied to different types of problems. Since its publication in 2002, SMOTE has proven successful in a variety of applications from several different domains. SMOTE has also inspired several approaches to counter the issue of class imbalance, and has significantly contributed to new supervised learning paradigms, including multilabel classification, incremental learning, semi-supervised learning, and multi-instance learning, among others. It is a standard benchmark for learning from imbalanced data and is featured in a number of software packages, from open source to commercial. In this paper, marking the fifteenth anniversary of SMOTE, we reflect on the SMOTE journey, discuss the current state of affairs with SMOTE and its applications, and identify the next set of challenges in extending SMOTE to Big Data problems.
905 citations
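The core SMOTE step the paper reflects on is easy to sketch in pure Python (reference implementations live in packages such as imbalanced-learn; this toy version is mine): each synthetic sample interpolates between a minority point and one of its k nearest minority neighbors.

```python
import random

def smote(minority, n_new, k=3, seed=0):
    # minority: list of distinct points as tuples of floats.
    rng = random.Random(seed)

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    out = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbors of x (excluding x itself).
        nbrs = sorted((p for p in minority if p is not x),
                      key=lambda p: dist2(x, p))[:k]
        nb = rng.choice(nbrs)
        lam = rng.random()  # random interpolation factor in [0, 1)
        out.append(tuple(a + lam * (b - a) for a, b in zip(x, nb)))
    return out

pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new = smote(pts, 10)
```

Because every synthetic point is a convex combination of two minority points, the samples stay inside the minority region, which is both SMOTE's strength and, near class boundaries, the source of the noise its many variants try to avoid.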
TL;DR: This work presents a simple and effective oversampling method based on k-means clustering and SMOTE (synthetic minority oversampling technique), which avoids the generation of noise and effectively overcomes imbalances between and within classes.
Abstract: Learning from class-imbalanced data continues to be a common and challenging problem in supervised learning, as standard classification algorithms are designed to handle balanced class distributions. While different strategies exist to tackle this problem, methods that generate artificial data to achieve a balanced class distribution are more versatile than modifications to the classification algorithm. Such techniques, called oversamplers, modify the training data, allowing any classifier to be used with class-imbalanced datasets. Many algorithms have been proposed for this task, but most are complex and tend to generate unnecessary noise. This work presents a simple and effective oversampling method based on k-means clustering and SMOTE (synthetic minority oversampling technique), which avoids the generation of noise and effectively overcomes imbalances between and within classes. Empirical results of extensive experiments with 90 datasets show that training data oversampled with the proposed method improves classification results. Moreover, k-means SMOTE consistently outperforms other popular oversampling methods. An implementation is made available in the Python programming language.
463 citations
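The cluster-then-oversample idea can be compressed into a toy sketch (this is my own simplification, not the authors' released implementation): cluster all points, keep clusters dominated by the minority class, and interpolate new minority samples only inside those "safe" clusters, which avoids generating noise near the class boundary.

```python
import random

def kmeans(points, k, iters=10, seed=0):
    # Plain Lloyd's algorithm on tuples of floats; returns the clusters.
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda j: sum((a - b) ** 2
                                      for a, b in zip(p, centers[j])))
            groups[j].append(p)
        centers = [tuple(sum(cs) / len(g) for cs in zip(*g)) if g
                   else centers[j] for j, g in enumerate(groups)]
    return groups

def kmeans_smote(majority, minority, k=2, n_new=10, seed=0):
    rng = random.Random(seed)
    label = {p: 0 for p in majority}
    label.update({p: 1 for p in minority})      # assumes distinct points
    groups = kmeans(list(label), k, seed=seed)
    # "Safe" clusters: more than half minority; keep only their minority
    # members, and require at least 2 points to interpolate between.
    safe = [[p for p in g if label[p] == 1]
            for g in groups if g and sum(label[p] for p in g) / len(g) > 0.5]
    safe = [c for c in safe if len(c) >= 2]
    out = []
    while safe and len(out) < n_new:
        a, b = rng.sample(rng.choice(safe), 2)
        t = rng.random()
        out.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return out

majority = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2)]
minority = ([(10.0 + 0.1 * i, 10.0) for i in range(4)]
            + [(10.0, 10.0 + 0.1 * i) for i in range(1, 5)])
synthetic = kmeans_smote(majority, minority)
```

The full method also weights clusters by their internal sparsity; that refinement is omitted here for brevity.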
01 Jan 1994
TL;DR: For the list object introduced in Chapter 5, it was shown that each data element contains at most one predecessor element and one successor element, so for any given data element or node in the list structure one can speak of a next element and a previous element.
Abstract: For the list object, introduced in Chapter 5, it was shown that each data element contains at most one predecessor element and one successor element. Therefore, for any given data element or node in the list structure, we can talk in terms of a next element and a previous element.
381 citations
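The next/previous relationship the chapter describes maps directly onto a doubly linked node; a minimal sketch (the names are illustrative, not the book's):

```python
class Node:
    # Each node holds at most one predecessor and one successor.
    def __init__(self, value):
        self.value = value
        self.prev = None  # previous element, or None at the head
        self.next = None  # next element, or None at the tail

def push_back(head, value):
    """Append a value at the tail; returns the (possibly new) head."""
    node = Node(value)
    if head is None:
        return node
    cur = head
    while cur.next is not None:
        cur = cur.next
    cur.next = node
    node.prev = cur
    return head

head = None
for v in (1, 2, 3):
    head = push_back(head, v)

# Walk the next links forward, then the prev links backward.
forward, cur = [], head
while cur:
    forward.append(cur.value)
    tail = cur
    cur = cur.next
backward, cur = [], tail
while cur:
    backward.append(cur.value)
    cur = cur.prev
```

The symmetric prev/next pointers are exactly what lets the same list be traversed in either direction.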