Author

Mohammad Hossein Olyaee

Other affiliations: Islamic Azad University
Bio: Mohammad Hossein Olyaee is an academic researcher from the University of Zanjan. The author has contributed to research in the topics Gene regulatory network and Cluster analysis. The author has an h-index of 4 and has co-authored 19 publications receiving 71 citations. Previous affiliations of Mohammad Hossein Olyaee include Islamic Azad University.

Papers
Journal Article
TL;DR: A hybrid method based on the IWSSr method and the Shuffled Frog Leaping Algorithm (SFLA) is proposed to select effective features in a large-scale gene dataset, achieving a more compact set of features along with high accuracy.
Abstract: Feature selection is one of the most significant problems in data classification. Its purpose is to select the smallest number of features that increases accuracy and decreases the cost of classification. In recent years, the appearance of high-dimensional datasets with few samples has caused classification models to over-fit, so feature selection methods that remove redundant and irrelevant features are needed. Although various methods have recently been proposed for selecting an optimal feature subset with high precision, they suffer from problems such as instability, long convergence time, and selection of a semi-optimal solution as the final result. In other words, they have not been able to fully extract the effective features. In this paper, a hybrid method based on the IWSSr method and the Shuffled Frog Leaping Algorithm (SFLA) is proposed to select effective features in a large-scale gene dataset. The proposed algorithm is implemented in two phases: filtering and wrapping. In the filter phase, the Relief method is used to weight features. Then, in the wrapping phase, the SFLA and IWSSr algorithms search for effective features in a feature-rich area. The proposed method is evaluated on several standard gene expression datasets. The experimental results confirm that, compared with similar methods, the proposed approach achieves a more compact set of features along with high accuracy. The source code and testing datasets are available at https://github.com/jimy2020/SFLA_IWSSr-Feature-Selection.
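
The two-phase idea in this abstract (filter, then wrap) can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a basic two-class Relief with Manhattan distance, replaces the SFLA search and the replacement step of IWSSr with a plain incremental wrapper, and scores subsets with leave-one-out 1-NN accuracy. All function names and the toy data are assumptions made for illustration.

```python
def manhattan(a, b, feats):
    """Manhattan distance between rows a and b over the given feature indices."""
    return sum(abs(a[f] - b[f]) for f in feats)

def relief_weights(X, y):
    """Filter phase: basic Relief for two classes, features assumed scaled to [0, 1]."""
    n, d = len(X), len(X[0])
    feats = range(d)
    w = [0.0] * d
    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        hit = min(hits, key=lambda j: manhattan(X[i], X[j], feats))
        miss = min(misses, key=lambda j: manhattan(X[i], X[j], feats))
        for f in feats:
            # Reward features that separate the nearest miss, penalise those that
            # differ from the nearest hit.
            w[f] += (abs(X[i][f] - X[miss][f]) - abs(X[i][f] - X[hit][f])) / n
    return w

def loo_accuracy(X, y, feats):
    """Wrapper score: leave-one-out accuracy of a 1-NN classifier on `feats`."""
    correct = 0
    for i in range(len(X)):
        others = [j for j in range(len(X)) if j != i]
        nearest = min(others, key=lambda j: manhattan(X[i], X[j], feats))
        correct += y[nearest] == y[i]
    return correct / len(X)

def filter_then_wrap(X, y):
    """Rank features by Relief weight, then add each one only if accuracy improves."""
    w = relief_weights(X, y)
    ranked = sorted(range(len(w)), key=lambda f: w[f], reverse=True)
    selected, best = [], 0.0
    for f in ranked:
        acc = loo_accuracy(X, y, selected + [f])
        if acc > best:
            selected, best = selected + [f], acc
    return selected, best

# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
X_toy = [[0.0, 0.5], [0.1, 0.9], [0.2, 0.1], [0.9, 0.4], [1.0, 0.8], [0.8, 0.2]]
y_toy = [0, 0, 0, 1, 1, 1]
selected, accuracy = filter_then_wrap(X_toy, y_toy)
```

On the toy data the wrapper keeps only feature 0, because adding the noisy feature does not raise the leave-one-out accuracy.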

50 citations

Journal Article
TL;DR: A comparison with the results of existing methods shows that the current study's approach provides a satisfactory performance for protein structural class prediction.

32 citations

Journal Article
01 May 2018
TL;DR: An approach based on Breeding Swarm is used to learn BNs; Breeding Swarm is a hybrid GA/PSO that combines the strengths of particle swarm optimization with those of genetic algorithms.
Abstract: Bayesian networks (BNs) are widely used as one of the most effective models in bioinformatics, artificial intelligence, text analysis, medical diagnosis, etc. Learning the structure of BNs from data can be viewed as an optimization problem, and this problem has been proved to be NP-hard. Therefore, heuristic methods can be used as powerful tools to find high-quality networks. In this paper, an approach based on Breeding Swarm is used to learn BNs. Breeding Swarm is a hybrid GA/PSO that combines the strengths of particle swarm optimization with those of genetic algorithms. To assess the proposed method, several real-world and benchmark applications are used. Results show that our method is a clear improvement on the genetic algorithm and particle swarm optimization.
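
To make the GA/PSO hybrid idea concrete, here is a rough sketch of a generic hybrid loop in the spirit of Breeding Swarm, not the paper's Bayesian network learner. It minimises a toy continuous function instead of scoring network structures, and the parameter names (`breed_ratio`, the inertia and acceleration constants) and the arithmetic crossover are assumptions chosen for illustration.

```python
import random

def sphere(x):
    """Toy objective (assumption): sum of squares, minimised at the origin."""
    return sum(v * v for v in x)

def breeding_swarm(objective, dim=3, pop_size=20, iters=50, breed_ratio=0.5, seed=0):
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    vel = [[0.0] * dim for _ in range(pop_size)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(pop_size), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    history = [gbest_val]
    n_breed = int(breed_ratio * pop_size)
    for _ in range(iters):
        # Rank the population; the worst n_breed individuals are replaced by offspring.
        order = sorted(range(pop_size), key=lambda i: objective(pos[i]))
        survivors, discarded = order[:pop_size - n_breed], order[pop_size - n_breed:]
        # PSO step: survivors move using the standard velocity update.
        for i in survivors:
            for d in range(dim):
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * rng.random() * (pbest[i][d] - pos[i][d])
                             + 1.5 * rng.random() * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
        # GA step: arithmetic crossover of two survivors plus Gaussian mutation.
        for i in discarded:
            a, b = rng.sample(survivors, 2)
            alpha = rng.random()
            pos[i] = [alpha * pos[a][d] + (1 - alpha) * pos[b][d] + rng.gauss(0, 0.1)
                      for d in range(dim)]
            vel[i] = [0.0] * dim
            pbest[i], pbest_val[i] = pos[i][:], objective(pos[i])
        # Update personal and global bests.
        for i in range(pop_size):
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
            if val < gbest_val:
                gbest, gbest_val = pos[i][:], val
        history.append(gbest_val)
    return gbest, gbest_val, history

best_x, best_val, history = breeding_swarm(sphere)
```

For structure learning, the position vector would instead encode a candidate network and the objective would be a network scoring function; that encoding is not described in the abstract and is left out here.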

13 citations

Journal Article
TL;DR: It was found that chaotic behavior clearly exists in most haplotype subsequences, and that the average reconstruction rate for all input data is more than 97%, demonstrating that applying this knowledge can effectively improve the reconstruction rate of given haplotypes.
Abstract: Sequence data are deposited in the form of unphased genotypes and it is not possible to directly identify the location of a particular allele on a specific parental chromosome or haplotype. This study employed nonlinear time series modeling approaches to analyze the haplotype sequences obtained from the NGS sequencing method. To evaluate the chaotic behavior of haplotypes, we analyzed their whole sequences, as well as several subsequences from distinct haplotypes, in terms of the SNP distribution on their chromosomes. This analysis utilized chaos game representation (CGR) followed by the application of two different scaling methods. It was found that chaotic behavior clearly exists in most haplotype subsequences. For testing the applicability of the proposed model, the present research determined the alleles in gap positions and positions with low coverage by using chromosome subsequences in which 10% of each subsequence’s alleles are replaced by gaps. After conversion of the subsequences’ CGR into the coordinate series, a Local Projection (LP) method predicted the measure of ambiguous positions in the coordinate series. It was discovered that the average reconstruction rate for all input data is more than 97%, demonstrating that applying this knowledge can effectively improve the reconstruction rate of given haplotypes.
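
The step that converts a sequence into a CGR coordinate series can be sketched as follows. This uses the classic four-corner chaos game for nucleotide sequences; the corner assignment, the starting point, and the choice to skip gap symbols are assumptions, and the paper's encoding of haplotype or SNP data may differ.

```python
# Corner assignment of the unit square (an assumption; other layouts are used too).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

def cgr_points(seq, start=(0.5, 0.5)):
    """Return the CGR coordinate series of `seq`.

    Each symbol moves the current point halfway towards that symbol's corner.
    Symbols without a corner (gaps, ambiguous calls) are skipped.
    """
    x, y = start
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points

series = cgr_points("ACGT")
```

The resulting list of (x, y) pairs is the kind of coordinate series on which a time series predictor, such as the Local Projection method named in the abstract, could then operate.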

7 citations

Journal Article
TL;DR: Results demonstrate that AROHap can be used for the SIH reconstruction problem, as a fast-converging bio-inspired method that improves the initial bi-partitioning of the fragments from the previous step.

6 citations


Cited by
Journal Article
TL;DR: An in-depth review of existing approaches to time series networks, covering their methodological foundations, interpretation and practical considerations with an emphasis on recent developments, and highlighting the fundamental new insights that complex network approaches bring to the field of nonlinear time series analysis.

382 citations

Journal Article
TL;DR: Two assembly strategies—a whole-genome assembly and a regional chromosome assembly—were used, each combining sequence data from Celera and the publicly funded genome effort to determine the genetic sequence of the entire human genome.
Abstract: Purpose of the Study. To determine the genetic sequence of the entire human genome. Study Population. Five normal volunteers: 1 African American, 1 Asian-Chinese, 1 Hispanic-Mexican, and 2 Caucasians. Methods. A 2.91-billion base pair (bp) consensus sequence of the human genome was generated. Two assembly strategies—a whole-genome assembly and a regional chromosome assembly—were used, each combining sequence data from Celera and the publicly funded genome effort. Results. The 2 assembly strategies yielded very similar results that largely agree with independent mapping data. Analysis of the genome sequence revealed 26 588 genes for which there was strong corroborating evidence and an additional ∼12,000 likely …

228 citations

Journal Article
TL;DR: A powerful method to predict protein structural classes for low-similarity sequences is developed on the basis of a very objective and strict benchmark dataset and will provide an important guide to extract valuable information from protein sequences.
Abstract: Protein structural class can provide important clues for understanding protein fold, evolution and function. However, it is still challenging to accurately predict protein structural classes for low-similarity sequences. This paper is devoted to developing a powerful method to predict protein structural classes for low-similarity sequences. On the basis of a very objective and strict benchmark dataset, we first extracted optimal tripeptide compositions (OTC), which were picked out by a feature selection technique, to formulate protein samples. An overall accuracy of 91.1% was achieved in jackknife cross-validation. Subsequently, we investigated the accuracies of three popular features for comparison: position-specific scoring matrix (PSSM), predicted secondary structure information (PSSI) and the average chemical shift (ACS). Finally, to further improve the prediction performance, we examined all combinations of the four kinds of features and achieved the maximum accuracy of 96.7% in jackknife cross-validation by combining OTC with ACS, demonstrating that the model is efficient and powerful. Our study will provide an important guide for extracting valuable information from protein sequences.
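
As an illustration of the raw feature behind OTC, the sketch below computes a plain tripeptide composition: the normalised frequency of each of the 20^3 = 8000 overlapping three-residue windows. The feature selection step that reduces this vector to the optimal subset is not shown, and the function and variable names are assumptions.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
TRIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=3)]
INDEX = {t: i for i, t in enumerate(TRIPEPTIDES)}

def tripeptide_composition(seq):
    """Return an 8000-dimensional vector of overlapping tripeptide frequencies.

    Windows containing non-standard residues are ignored; the frequencies are
    normalised by the number of windows that were counted.
    """
    seq = seq.upper()
    counts = [0] * len(TRIPEPTIDES)
    total = 0
    for i in range(len(seq) - 2):
        idx = INDEX.get(seq[i:i + 3])
        if idx is not None:
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * len(TRIPEPTIDES)
    return [c / total for c in counts]

vector = tripeptide_composition("ACDE")  # two windows: ACD and CDE
```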

175 citations

Journal Article
TL;DR: The extensive experimental and statistical analyses suggest that the proposed hybrid variant of HHO is able to produce efficient search results without additional computational cost.
Abstract: Feature selection, an optimization problem, has become an important pre-processing tool in data mining, which simultaneously aims at minimizing feature-size and maximizing model generalization. Because of the large search space, conventional optimization methods often fail to generate a globally optimal solution. A variety of hybrid techniques merging different search strategies have been proposed in the feature selection literature, but they mostly deal with low-dimensional datasets. In this paper, a hybrid optimization method is proposed for numerical optimization and feature selection, which integrates the sine-cosine algorithm (SCA) into Harris hawks optimization (HHO). The goal of SCA integration is to address ineffective exploration in HHO; moreover, exploitation is enhanced by dynamically adjusting candidate solutions to avoid solution stagnancy in HHO. The proposed method, namely SCHHO, is evaluated by employing the CEC’17 test suite for numerical optimization and sixteen datasets with low and high dimensions exceeding 15000 attributes, and compared with the original SCA and HHO, as well as other well-known optimization methods like the dragonfly algorithm (DA), whale optimization algorithm (WOA), grasshopper optimization algorithm (GOA), grey wolf optimization (GWO), and salp swarm algorithm (SSA), in addition to state-of-the-art methods. Performance of the proposed method is also validated against hybrid methods proposed in recent related literature. The extensive experimental and statistical analyses suggest that the proposed hybrid variant of HHO is able to produce efficient search results without additional computational cost. With increased convergence speed, SCHHO reduced feature-size by up to 87% and achieved accuracy of up to 92%. Motivated by the findings of this study, various potential future directions are also highlighted.
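
For reference, the position update of the standard sine-cosine algorithm is commonly written as below. This is the usual textbook form of SCA, given here as background; how SCHHO embeds it into HHO is not specified in the abstract, and the symbols ($X_i^t$ for the current position, $P_i^t$ for the best position found so far, $a$ for a constant, $T$ for the maximum number of iterations) are notation assumed for this sketch.

```latex
X_i^{t+1} =
\begin{cases}
X_i^{t} + r_1 \sin(r_2)\,\bigl| r_3 P_i^{t} - X_i^{t} \bigr|, & r_4 < 0.5,\\[4pt]
X_i^{t} + r_1 \cos(r_2)\,\bigl| r_3 P_i^{t} - X_i^{t} \bigr|, & r_4 \ge 0.5,
\end{cases}
\qquad
r_1 = a - t\,\frac{a}{T},
```

where $r_2 \in [0, 2\pi]$, $r_3 \in [0, 2]$ and $r_4 \in [0, 1]$ are drawn uniformly at random. The decreasing $r_1$ shifts the search from exploration towards exploitation as $t$ approaches $T$.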

100 citations

Journal Article
TL;DR: The results indicate that the method proposed in this paper can effectively improve the prediction accuracy of protein structural class, which will be a reliable tool for prediction of protein structural class, especially for low-similarity sequences.
Abstract: Prediction of protein structural class plays an important role in protein structure and function analysis, drug design and many other biological applications. Prediction of protein structural class for low-similarity sequences is still a challenging task. Based on the theory of wavelet denoising, this paper presents a novel method of prediction of protein structural class for the first time. Firstly, the features of the protein sequence are extracted by using Chou's pseudo amino acid composition (PseAAC). Then the extracted feature information is denoised by a two-dimensional (2D) wavelet. Finally, the optimal feature vectors are input to a support vector machine (SVM) classifier to predict protein structural classes. We obtained significant predictive results using the jackknife test on three low-similarity protein structural class datasets, 25PDB, 1189 and 640, and compared our method with previous methods. The results indicate that the method proposed in this paper can effectively improve the prediction accuracy of protein structural class, which will be a reliable tool for prediction of protein structural class, especially for low-similarity sequences.

64 citations