
Showing papers in "Journal of AI and Data Mining in 2015"


Journal ArticleDOI
TL;DR: A leave-one-out cross-validation method is used for performance evaluation, and experimental results show that the proposed method for classifying subjects into schizophrenia and control groups using functional magnetic resonance imaging data achieves acceptable accuracy.
Abstract: In this paper, we propose a new method for classifying subjects into schizophrenia and control groups using functional magnetic resonance imaging (fMRI) data. In the preprocessing step, the number of fMRI time points is reduced using principal component analysis (PCA). Then, independent component analysis (ICA) is used for further data analysis; it estimates independent components (ICs) from the PCA results. For feature extraction, the local binary patterns (LBP) technique is applied to the ICs, transforming them into spatial histograms of LBP values. For feature selection, a genetic algorithm (GA) is used to obtain a set of features with large discrimination power. In the next step of feature selection, linear discriminant analysis (LDA) is applied to further extract features that maximize the ratio of between-class to within-class variability. Finally, a test subject is classified into the schizophrenia or control group using a Euclidean distance based classifier and a majority vote method. A leave-one-out cross-validation method is used for performance evaluation. Experimental results show that the proposed method achieves acceptable accuracy.
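The sketch below strings the stages of this pipeline together on synthetic data: PCA over time points, ICA for spatial maps, LBP histograms as features, LDA, and a nearest-class-mean (Euclidean) decision under leave-one-out evaluation. All dimensions are toy values, and the GA feature selection and per-IC majority vote are omitted for brevity.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from skimage.feature import local_binary_pattern

rng = np.random.default_rng(0)

# Toy stand-in for preprocessed fMRI: 20 subjects x 200 time points x (32*32) voxels.
n_subj, n_time, side = 20, 200, 32
data = rng.normal(size=(n_subj, n_time, side * side))
labels = np.array([0] * 10 + [1] * 10)           # 0 = control, 1 = schizophrenia

def subject_features(x):
    """PCA over time points, ICA on the result, LBP histogram per IC map."""
    x_red = PCA(n_components=10).fit_transform(x.T).T        # reduce time dimension
    ics = FastICA(n_components=5, random_state=0).fit_transform(x_red.T).T
    feats = []
    for ic in ics:                                           # each IC is a spatial map
        img = ic.reshape(side, side)
        lbp = local_binary_pattern(img, P=8, R=1, method="uniform")
        hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
        feats.append(hist)
    return np.concatenate(feats)

X = np.array([subject_features(s) for s in data])

# Leave-one-out evaluation: LDA projection, then a nearest-class-mean decision.
correct = 0
for i in range(n_subj):
    mask = np.arange(n_subj) != i
    lda = LinearDiscriminantAnalysis(n_components=1).fit(X[mask], labels[mask])
    z_tr, z_te = lda.transform(X[mask]), lda.transform(X[[i]])
    means = [z_tr[labels[mask] == c].mean(axis=0) for c in (0, 1)]
    pred = int(np.linalg.norm(z_te - means[1]) < np.linalg.norm(z_te - means[0]))
    correct += pred == labels[i]
print(f"LOOCV accuracy on toy data: {correct / n_subj:.2f}")
```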

26 citations


Journal ArticleDOI
TL;DR: Sets of user reviews from two domains, universities and cell phones, were collected, and the results of the two feature extraction methods were compared.
Abstract: Opinion mining deals with the analysis of user reviews to extract their opinions, sentiments, and demands in a specific area, which plays an important role in making major decisions in such areas. In general, opinion mining extracts user reviews at three levels: document, sentence, and feature. Opinion mining at the feature level receives more attention than the other two levels because it analyzes the orientation of different aspects of an area. In this paper, two methods are introduced for feature extraction. The proposed methods consist of four main stages. First, an opinion-mining lexicon for Persian is created; this lexicon is used to determine the orientation of users' reviews. Second, the preprocessing stage includes orthographic unification, tokenization, part-of-speech tagging, and syntactic dependency parsing of the documents. Third, features are extracted using two methods: frequency-based feature extraction and dependency-grammar-based feature extraction. Fourth, the features and polarities of the reviews extracted in the previous stage are refined and the final feature polarities are determined. To assess the suggested techniques, sets of user reviews from two domains, universities and cell phones, were collected and the results of the two methods were compared.

20 citations


Journal ArticleDOI
TL;DR: The developed C4.5 decision tree provides a viable tool for civil engineers to determine the liquefaction potential of soil and is compared with the available artificial neural network (ANN) and relevance vector machine (RVM).
Abstract: The prediction of the liquefaction potential of soil due to an earthquake is an essential task in civil engineering. A decision tree is a tree structure consisting of internal and terminal nodes which process the data to ultimately yield a classification. C4.5 is a well-known algorithm widely used to build decision trees; it includes a pruning process to address the problem of over-fitting. This article examines the capability of the C4.5 decision tree for predicting the seismic liquefaction potential of soil based on Cone Penetration Test (CPT) data. The database contains information about cone resistance (q_c), total vertical stress (σ_0), effective vertical stress (σ'_0), mean grain size (D_50), normalized peak horizontal acceleration at the ground surface (a_max), cyclic stress ratio (τ/σ'_0), and earthquake magnitude (M_w). The overall classification success rate for the entire data set is 98%. The results of the C4.5 decision tree have been compared with available artificial neural network (ANN) and relevance vector machine (RVM) models. The developed C4.5 decision tree provides a viable tool for civil engineers to determine the liquefaction potential of soil.
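As a rough stand-in for the paper's classifier, the sketch below trains scikit-learn's CART tree with an entropy criterion (approximating C4.5's information-gain splitting) and cost-complexity pruning on synthetic CPT-style records; the feature ranges and the labeling rule are invented for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 200
# Synthetic CPT-style records: [qc, sigma0, sigma0_eff, D50, amax, CSR, Mw]
X = np.column_stack([
    rng.uniform(1, 25, n),      # cone resistance qc (MPa)
    rng.uniform(20, 300, n),    # total vertical stress (kPa)
    rng.uniform(10, 200, n),    # effective vertical stress (kPa)
    rng.uniform(0.01, 1.0, n),  # mean grain size D50 (mm)
    rng.uniform(0.05, 0.6, n),  # peak ground acceleration amax (g)
    rng.uniform(0.05, 0.5, n),  # cyclic stress ratio
    rng.uniform(5.5, 8.0, n),   # earthquake magnitude Mw
])
# Toy rule: low cone resistance plus high CSR implies liquefaction (label 1).
y = ((X[:, 0] < 10) & (X[:, 5] > 0.2)).astype(int)

# Entropy splits approximate C4.5's information-gain criterion; cost-complexity
# pruning stands in for C4.5's pruning step against over-fitting.
tree = DecisionTreeClassifier(criterion="entropy", ccp_alpha=0.01, random_state=0)
print("CV accuracy:", cross_val_score(tree, X, y, cv=5).mean())
```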

15 citations


Journal ArticleDOI
TL;DR: Performance of the proposed hashing method is evaluated with an important application in perceptual image hashing scheme: image authentication and experiments are conducted to show that the present method has acceptable robustness against perceptual content-preserving manipulations.
Abstract: Feature extraction is a main step in all perceptual image hashing schemes, in which robust features lead to better perceptual robustness. Simplicity, discriminative power, computational efficiency, and robustness to illumination changes are distinguishing properties of Local Binary Pattern (LBP) features. In this paper, we investigate the use of local binary patterns for perceptual image hashing. For feature extraction, we propose to use both the sign and the magnitude information of local differences, so the algorithm combines gradient-based and LBP-based descriptors. To meet security needs, two secret keys are incorporated in the feature extraction and hash generation steps. The performance of the proposed hashing method is evaluated in an important application of perceptual image hashing: image authentication. Experiments show that the present method has acceptable robustness against perceptual content-preserving manipulations. Moreover, the proposed method can localize the tampered area, a capability that not all hashing schemes provide.
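A minimal sketch of the idea follows: sign-LBP codes and local-difference magnitudes are histogrammed, and a key-seeded random projection quantizes the feature vector to a binary hash. The block layout, histogram sizes, and key handling are assumptions, not the paper's exact scheme.

```python
import numpy as np

def sign_magnitude_lbp_hash(img, key=0):
    """Return a compact binary hash from sign-LBP and magnitude histograms."""
    rng = np.random.default_rng(key)              # secret key seeds the projection
    h, w = img.shape
    c = img[1:-1, 1:-1]
    # Differences to the eight neighbors of each interior pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    diffs = np.stack([img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx] - c
                      for dy, dx in offsets])
    signs = (diffs >= 0).astype(np.uint8)
    codes = sum(signs[k].astype(np.uint16) << k for k in range(8))   # sign LBP
    mags = np.abs(diffs).mean(axis=0)                       # gradient-like term
    feat = np.concatenate([
        np.histogram(codes, bins=32, range=(0, 256), density=True)[0],
        np.histogram(mags, bins=32, range=(0, mags.max() + 1e-9), density=True)[0],
    ])
    # Keyed random projection plus sign quantization gives the binary hash.
    proj = rng.normal(size=(64, feat.size))
    return (proj @ feat > 0).astype(np.uint8)

img = np.random.default_rng(3).integers(0, 256, (64, 64)).astype(float)
print(sign_magnitude_lbp_hash(img, key=42))
```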

15 citations


Journal ArticleDOI
TL;DR: A technique for clustering time series data using a particle swarm optimization (PSO) approach is proposed, and Pearson Correlation Coefficient as one of the most commonly-used distance measures for time series is considered.
Abstract: With the rapid development of information gathering technologies and access to large amounts of data, methods are always required for analyzing data and extracting useful information from large raw datasets, and data mining is an important method for solving this problem. Clustering analysis, the most commonly used function of data mining, has attracted many researchers in computer science. Owing to its many applications, the problem of clustering time series data has become highly popular, and many algorithms have been proposed in this field. Recently, Swarm Intelligence (SI), a family of nature-inspired algorithms, has gained huge popularity in the field of pattern recognition and clustering. In this paper, a technique for clustering time series data using a particle swarm optimization (PSO) approach is proposed, with the Pearson correlation coefficient, one of the most commonly used distance measures for time series, as the distance measure. The proposed technique is able to find (near) optimal cluster centers during the clustering process. To reduce the dimensionality of the search space and improve the performance of the proposed method, a singular value decomposition (SVD) representation of the cluster centers is used. Experimental results over three popular data sets indicate the superiority of the proposed technique over the fuzzy C-means and fuzzy K-medoids clustering techniques.
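The sketch below shows the core of such an approach on toy series: each particle encodes k cluster centers, fitness sums Pearson-correlation distances to the nearest center, and a standard PSO update moves the swarm. The swarm size, inertia, and acceleration coefficients are assumptions, and the SVD center representation is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
series = rng.normal(size=(60, 50)).cumsum(axis=1)       # 60 toy time series
k, n_particles, dim = 3, 15, series.shape[1]

def corr_dist(a, b):
    """Pearson-correlation distance: 0 for perfectly correlated series."""
    a = (a - a.mean(-1, keepdims=True)) / (a.std(-1, keepdims=True) + 1e-12)
    b = (b - b.mean(-1, keepdims=True)) / (b.std(-1, keepdims=True) + 1e-12)
    return 1.0 - (a[:, None, :] * b[None, :, :]).mean(-1)

def fitness(centers):
    return corr_dist(series, centers).min(axis=1).sum()

pos = rng.normal(size=(n_particles, k, dim))
vel = np.zeros_like(pos)
pbest, pbest_f = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[pbest_f.argmin()].copy()

for _ in range(100):           # standard PSO update (w, c1, c2 are assumed values)
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos += vel
    f = np.array([fitness(p) for p in pos])
    better = f < pbest_f
    pbest[better], pbest_f[better] = pos[better], f[better]
    gbest = pbest[pbest_f.argmin()].copy()

labels = corr_dist(series, gbest).argmin(axis=1)
print("cluster sizes:", np.bincount(labels, minlength=k))
```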

11 citations


Journal ArticleDOI
TL;DR: An instance reduction method based on a distance-based decision surface, namely IRDDS (Instance Reduction based on Distance-based Decision Surface), is proposed; it selects the most representative instances, satisfying the two objectives of high accuracy and a high reduction rate.
Abstract: In instance-based learning, a training set is given to a classifier for classifying new instances. In practice, not all information in the training set is useful, so it is convenient to discard irrelevant instances. This process, known as instance reduction, is an important task for classifiers since it can reduce the time needed for classification or training. Instance-based learning methods are often confronted with the difficulty of choosing which instances must be stored for use during an actual test; storing too many instances may result in large memory requirements and slow execution. In this paper, a Distance-based Decision Surface (DDS) is first proposed and used as a separating surface between the classes; an instance reduction method based on the DDS, namely IRDDS (Instance Reduction based on Distance-based Decision Surface), is then proposed. A genetic algorithm guided by the DDS selects a reference set for classification. IRDDS selects the most representative instances, satisfying the two objectives of high accuracy and a high reduction rate. The performance of IRDDS is evaluated on real-world data sets from the UCI repository using 10-fold cross-validation. The experimental results are compared with some state-of-the-art methods and show the superiority of the proposed method in terms of both classification accuracy and reduction percentage.
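The DDS itself is not specified in this abstract, so the sketch below only illustrates the GA wrapper around instance selection: chromosomes are keep/discard masks and fitness trades off 1-NN accuracy against the reduction rate. The fitness weights and GA settings are assumptions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X, y = load_iris(return_X_y=True)
n = len(X)

def fitness(mask):
    if mask.sum() < 3:
        return 0.0
    knn = KNeighborsClassifier(n_neighbors=1).fit(X[mask], y[mask])
    acc = knn.score(X, y)                    # accuracy over all instances
    reduction = 1.0 - mask.mean()            # fraction of instances discarded
    return 0.7 * acc + 0.3 * reduction       # weights are an assumption

pop = rng.random((30, n)) < 0.3              # chromosomes = keep/discard masks
for _ in range(60):
    scores = np.array([fitness(m) for m in pop])
    pop = pop[scores.argsort()[::-1]]        # elitist sort, best first
    children = []
    for _ in range(len(pop) // 2):
        a, b = pop[rng.integers(0, 10, 2)]   # parents from the top 10
        cut = rng.integers(1, n)             # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        children.append(child ^ (rng.random(n) < 0.01))   # bit-flip mutation
    pop[len(pop) // 2:] = children

best = pop[0]
print(f"kept {best.sum()}/{n} instances, fitness {fitness(best):.3f}")
```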

11 citations


Journal ArticleDOI
TL;DR: Simulation results indicated that the proposed method performs better on parameters such as detection accuracy (DA) and false alarm rate (FAR), even with a large set of faulty sensor nodes.
Abstract: Wireless sensor networks (WSNs) consist of a large number of sensor nodes capable of sensing different environmental phenomena and sending the collected data to the base station, or sink. Since sensor nodes are made of cheap components and are deployed in remote and uncontrolled environments, they are prone to failure; thus, maintaining a network's proper functions even when undesired events occur, known as fault tolerance, is necessary, and fault management is essential in these networks. In this paper, a new method is proposed with particular attention to fault tolerance and fault detection in WSNs. The performance of the proposed method was simulated in MATLAB. The method is based on a majority vote, which can detect permanently faulty sensor nodes with high detection accuracy and a low false alarm rate and exclude them from the network. To investigate the efficiency of the new method, it was compared with the Chen, Lee, and hybrid algorithms. Simulation results indicate that the proposed method performs better on parameters such as detection accuracy (DA) and false alarm rate (FAR), even with a large set of faulty sensor nodes.
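The sketch below illustrates the majority-vote idea on a toy field of nodes: a node is flagged as permanently faulty when most of its radio neighbors disagree with its reading. The communication radius, disagreement tolerance, and fault model are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
pos = rng.random((n, 2)) * 100                  # node positions in a 100x100 field
truth = rng.random(n) < 0.1                     # 10% permanently faulty nodes
readings = 25 + rng.normal(0, 0.5, n)           # common phenomenon, e.g. 25 C
readings[truth] += rng.choice([-1, 1], truth.sum()) * rng.uniform(5, 15, truth.sum())

# Neighbors are nodes within the communication radius.
d = np.linalg.norm(pos[:, None] - pos[None, :], axis=-1)
neighbors = (d < 20) & (d > 0)

flagged = np.zeros(n, dtype=bool)
for i in range(n):
    nb = np.flatnonzero(neighbors[i])
    if nb.size == 0:
        continue
    disagree = np.abs(readings[nb] - readings[i]) > 3.0   # tolerance threshold
    flagged[i] = disagree.mean() > 0.5                    # majority vote

da = (flagged & truth).sum() / max(truth.sum(), 1)        # detection accuracy
far = (flagged & ~truth).sum() / max((~truth).sum(), 1)   # false alarm rate
print(f"DA={da:.2f}  FAR={far:.2f}")
```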

9 citations


Journal ArticleDOI
TL;DR: In this article, a supervised feature extraction method based on discriminant analysis (DA) which uses the first principal component (PC1) to weight the scatter matrices was proposed.
Abstract: When the number of training samples is limited, feature reduction plays an important role in the classification of hyperspectral images. In this paper, we propose a supervised feature extraction method based on discriminant analysis (DA) which uses the first principal component (PC1) to weight the scatter matrices. The proposed method, called DA-PC1, copes with the small sample size problem and does not have the limitation of linear discriminant analysis (LDA) on the number of extracted features. In DA-PC1, the dominant structure of the distribution is preserved by PC1 and the class separability is increased by DA. Experimental results show the good performance of DA-PC1 compared to some state-of-the-art feature extraction methods.

9 citations


Journal ArticleDOI
TL;DR: Simulation results prove that the method, based on a graph algorithm for optimum placement of passive harmonic filters in a multi-bus system, is effective and suitable for passive filter planning in a power system.
Abstract: Harmonics in distribution systems have become an important problem due to the increase in nonlinear loads. This paper presents a new approach based on a graph algorithm for the optimum placement of passive harmonic filters in a multi-bus system which suffers from harmonic current sources. The objective is to minimize the network loss, the cost of the filters, and the total harmonic distortion of voltage, while effectively enhancing the voltage profile at each bus. Four types of sub-graph are used as the search space of the optimization. The method handles standard capacitor sizes in planning filters and their associated costs. The objective function is not differentiable, but this does not complicate the solving process. The IEEE 30-bus test system is used for the placement of the passive filters, and simulations have been carried out to show the applicability of the proposed method. Simulation results prove that the method is effective and suitable for passive filter planning in a power system.

7 citations


Journal ArticleDOI
TL;DR: Three popular linear dimensionality reduction methods are evaluated for their effect on the performance of three benchmark anomaly detection algorithms, with the aim of improving the performance and runtime of these algorithms and making them suitable for real-time applications.
Abstract: Anomaly detection (AD) has recently become an important application of hyperspectral image analysis. The goal of these algorithms is to find objects in the image scene which are anomalous in comparison with their surrounding background. One way to improve the performance and runtime of these algorithms is to use dimensionality reduction (DR) techniques. This paper evaluates the effect of three popular linear dimensionality reduction methods on the performance of three benchmark anomaly detection algorithms. Principal Component Analysis (PCA), the Fast Fourier Transform (FFT), and the Discrete Wavelet Transform (DWT), as DR methods, act as a pre-processing step for the AD algorithms. The assessed AD algorithms are Reed-Xiaoli (RX), the kernel-based version of RX (Kernel-RX), and the Dual Window-Based Eigen Separation Transform (DWEST). The AD methods were applied to two hyperspectral datasets acquired by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) and Hyperspectral Mapper (HyMap) sensors. The experiments were evaluated using Receiver Operating Characteristic (ROC) curves, visual investigation, and the runtime of the algorithms. Experimental results show that the DR methods can significantly improve the detection performance of the RX method, while the detection performance of the Kernel-RX and DWEST methods does not change. Moreover, these DR methods significantly reduce the runtime of RX and DWEST, making them suitable for implementation in real-time applications.
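The sketch below pairs a PCA pre-processing step with the classic global RX detector, whose score is the Mahalanobis distance of each pixel to the background statistics. The cube size, number of retained components, and implanted anomaly are toy values.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
rows, cols, bands = 60, 60, 100
cube = rng.normal(size=(rows, cols, bands))
cube[30:33, 30:33, :] += 3.0                    # implant a small anomaly

X = cube.reshape(-1, bands)
Xr = PCA(n_components=10).fit_transform(X)      # DR as a pre-processing step

# RX score = Mahalanobis distance of each pixel to the background statistics.
mu = Xr.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(Xr, rowvar=False))
diff = Xr - mu
rx = np.einsum("ij,jk,ik->i", diff, cov_inv, diff).reshape(rows, cols)
print("peak RX score at:", np.unravel_index(rx.argmax(), rx.shape))
```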

6 citations


Journal ArticleDOI
TL;DR: Evaluation results indicated that the proposed method is more effective than the other methods compared in this paper; the advantage of the proposed algorithm is that it uses several document features and user feedback simultaneously.
Abstract: The main challenge for a search engine is ranking web documents to provide the best response to a user's query. Despite the huge number of results extracted for a query, users examine only a small number of the top results; therefore, placing the most relevant results in the first ranks is of great importance. In this paper, a ranking algorithm based on reinforcement learning and user feedback, called RL3F, is proposed. In the proposed algorithm, the ranking system is considered the agent of the learning system, and selecting documents to display to the user is the agent's action. The reinforcement signal is calculated from the user's clicks on documents. Action values are computed for each feature. In each learning cycle, the documents are sorted for the next query, and documents from the ranked list are selected at random to show to the user. The learning process continues until training is completed. The LETOR3 benchmark is used to evaluate the proposed method. Evaluation results indicated that the proposed method is more effective than the other methods compared in this paper. The advantage of the proposed algorithm is that it uses several document features and user feedback simultaneously.

Journal ArticleDOI
TL;DR: Experimental results demonstrate that the proposed algorithm outperforms SWA and Max-Min1 in terms of maximizing data utility and data accuracy, and it provides better execution time than SWA and Max-Min1 at high scalability in the number of sensitive itemsets and transactions.
Abstract: The data sanitization process is used to promote the sharing of transactional databases among organizations and businesses, and it alleviates the concerns of individuals and organizations regarding the disclosure of sensitive patterns. It transforms the source database into a released database such that counterparts cannot discover the sensitive patterns, so data confidentiality is preserved against association rule mining methods. This process relies heavily on minimizing the impact of sanitization on data utility, i.e., minimizing the number of lost patterns: non-sensitive patterns that can no longer be mined from the sanitized database. This study proposes a data sanitization algorithm that hides sensitive patterns, in the form of frequent itemsets, from the database while controlling the impact of sanitization on data utility by estimating the impact factor of each modification on the non-sensitive itemsets. The proposed algorithm is compared with the Sliding Window size Algorithm (SWA) and Max-Min1 in terms of execution time, data utility, and data accuracy, where data accuracy is defined as the ratio of deleted items to the total support of the sensitive itemsets in the source dataset. Experimental results demonstrate that the proposed algorithm outperforms SWA and Max-Min1 in terms of maximizing data utility and data accuracy, and it provides better execution time at high scalability in the number of sensitive itemsets and transactions.
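To make the hiding step concrete, here is a generic itemset-hiding sketch: items are deleted from supporting transactions until the support of each sensitive itemset falls below the mining threshold. The victim-selection heuristic below is only a placeholder for the paper's impact-factor estimation.

```python
from collections import Counter

transactions = [
    {"a", "b", "c"}, {"a", "b"}, {"b", "c", "d"},
    {"a", "b", "c", "d"}, {"a", "c"}, {"b", "d"},
]
sensitive = {frozenset({"a", "b"})}
min_support = 2

def support(itemset):
    return sum(itemset <= t for t in transactions)

item_freq = Counter(i for t in transactions for i in t)

for s in sensitive:
    while support(s) >= min_support:
        holders = [t for t in transactions if s <= t]
        # Simple heuristic: remove the globally most frequent item of s, on the
        # assumption that its co-occurrences are the most redundant elsewhere.
        victim_item = max(s, key=lambda i: item_freq[i])
        holders[0].discard(victim_item)

print(transactions)
print("support(a,b) after sanitization:", support(frozenset({"a", "b"})))
```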

Journal ArticleDOI
TL;DR: The decoupling of the multi-agent system's global error dynamics facilitates the employment of policy iteration and optimal adaptive control techniques to solve the leader-follower consensus problem under known and unknown dynamics.
Abstract: In this paper, the optimal adaptive leader-follower consensus of linear continuous-time multi-agent systems is considered. The error dynamics of each player depend on its neighbors' information. A detailed analysis of online optimal leader-follower consensus under known and unknown dynamics is presented. The introduced reinforcement-learning-based algorithms learn online the approximate solutions to algebraic Riccati equations. An optimal adaptive control technique is employed to iteratively solve the algebraic Riccati equation based on the online measured error state and input information of each agent, without requiring a priori knowledge of the system matrices. The decoupling of the multi-agent system's global error dynamics facilitates the employment of policy iteration and optimal adaptive control techniques to solve the leader-follower consensus problem under known and unknown dynamics. Simulation results verify the effectiveness of the proposed methods.
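For intuition, the sketch below runs the offline counterpart of such policy iteration, Kleinman's algorithm, which alternates a Lyapunov-equation policy evaluation with a gain update and converges to the algebraic Riccati equation solution; the system matrices and the initial stabilizing gain are toy values.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

A = np.array([[0.0, 1.0], [-1.0, 2.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)

K = np.array([[0.0, 5.0]])                 # any initial stabilizing gain
for _ in range(15):
    Ak = A - B @ K
    # Policy evaluation: solve Ak' P + P Ak = -(Q + K' R K).
    P = solve_continuous_lyapunov(Ak.T, -(Q + K.T @ R @ K))
    # Policy improvement: K = R^{-1} B' P.
    K = np.linalg.solve(R, B.T @ P)

P_exact = solve_continuous_are(A, B, Q, R)
print("policy-iteration P:\n", P)
print("max |P - P_exact| =", np.abs(P - P_exact).max())
```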

Journal ArticleDOI
TL;DR: Similar cluster behaviors showed that results of k-means research on English can be extended to Persian, suggesting that despite many differences between languages, clustering methods may be extendable to other languages.
Abstract: This paper compares clusters of aligned Persian and English texts obtained with the k-means method. Text clustering has many applications in various fields of natural language processing. Much research has been done so far on clustering English documents, which raises the question: are its results extendable to other languages? Since the goal of document clustering is to group documents based on their content, one would expect the answer to be yes; on the other hand, the many differences between languages could make the answer no. This research focuses on k-means, one of the basic and popular document clustering methods. We want to know whether the clusters of aligned Persian and English texts obtained by k-means are similar. To answer this question, the Mizan English-Persian Parallel Corpus was used as the benchmark. After feature extraction using text mining techniques and dimension reduction with PCA, k-means clustering was performed. The morphological differences between English and Persian led to longer feature vectors for Persian, so in almost all experiments the English results were slightly richer than the Persian ones. Aside from these differences, the overall behavior of the Persian and English clusters was similar. This similar behavior shows that results of k-means research on English can be extended to Persian. Finally, there is hope that despite the many differences between languages, clustering methods may be extendable to other languages.
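The pipeline itself is standard and small; a sketch with toy English sentences follows (the aligned Persian side would go through the same TF-IDF, PCA, and k-means steps).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

docs_en = [
    "the cat sat on the mat", "dogs chase cats in the yard",
    "stock markets fell sharply today", "investors sold shares amid fears",
    "the match ended in a draw", "the striker scored two goals",
]
X = TfidfVectorizer().fit_transform(docs_en).toarray()
X = PCA(n_components=3).fit_transform(X)       # dimension reduction before k-means
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(labels)   # aligned Persian texts would go through the same pipeline
```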

Journal ArticleDOI
TL;DR: Experimental results show that limiting the number of neighbors of each AUV can lead to more uniform network topologies with larger coverage, and it is further shown that the proposed algorithm is more efficient in terms of major network parameters such as target area coverage, deployment time, and average distance travelled by the AUVs.
Abstract: Data collection from the seabed by means of underwater wireless sensor networks (UWSNs) has recently attracted considerable attention. Autonomous underwater vehicles (AUVs) are increasingly used as UWSNs in underwater missions. Events and environmental parameters in underwater regions have a stochastic nature. Sensors to observe and report events must cover the target area. A 'topology control algorithm' characterizes how well a sensing field is monitored and how well pairs of sensors are mutually connected in UWSNs. It is prohibitive to use a central controller to guide AUVs' behavior due to ever-changing, unknown environmental conditions, limited bandwidth, and lossy communication media. In this research, a completely decentralized three-dimensional topology control algorithm for AUVs is proposed. It is aimed at achieving maximal coverage of the target area. The algorithm enables AUVs to autonomously decide on and adjust their speed and direction based on information collected from their neighbors. Each AUV selects the best movement at each step by independently executing a Particle Swarm Optimization (PSO) algorithm. In the fitness function, the global average neighborhood degree is used as the upper limit on the number of neighbors of each AUV. Experimental results show that limiting the number of neighbors of each AUV can lead to more uniform network topologies with larger coverage. It is further shown that the proposed algorithm is more efficient in terms of major network parameters such as target area coverage, deployment time, and average distance travelled by the AUVs.

Journal ArticleDOI
TL;DR: This paper introduces a new noise-robust set of MFCC vectors, estimated through the following steps, and uses an MLP neural network to evaluate the performance of the proposed MFCC method and to classify the results.
Abstract: Mel-frequency cepstral coefficients (MFCCs) are the most widely used feature in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in Automatic Speech Recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is applied as pre-processing to the noisy original speech signal. The pre-emphasized speech is segmented into overlapping time frames and windowed by a modified Hamming window. Higher-order autocorrelation coefficients are then extracted, and the lower-order autocorrelation coefficients are eliminated. The result is passed through an FFT block, and the power spectrum of the output is calculated. A Gaussian-shaped filter bank is applied to the result. A logarithm and two compensator blocks, one performing mean subtraction and the other a root operation, are applied, and a DCT transformation is the last step. We use an MLP neural network to evaluate the performance of the proposed MFCC method and to classify the results. Speech recognition experiments on various tasks indicate that the proposed algorithm is more robust than traditional ones in noisy conditions.
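A sketch of the modified feature chain on a single frame follows: pre-emphasis, windowing, dropping low-order autocorrelation lags, power spectrum, a Gaussian-shaped filter bank, the log/mean-subtraction/root compensators, and a DCT. The filter-bank shape, lag cut-off, and root exponent are assumptions.

```python
import numpy as np
from scipy.fftpack import dct

rng = np.random.default_rng(0)
frame = rng.normal(size=400)                        # one 25 ms frame at 16 kHz
frame = np.append(frame[0], frame[1:] - 0.97 * frame[:-1])   # pre-emphasis
frame *= np.hamming(len(frame))                     # (modified) Hamming window

# Keep only higher-order autocorrelation lags to suppress noise-dominated terms.
ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
ac[:30] = 0.0                                       # drop low-order lags (assumed 30)

power = np.abs(np.fft.rfft(ac, n=512)) ** 2         # power spectrum of the result

# Gaussian-shaped filter bank instead of triangular mel filters.
n_filt, n_bins = 20, power.size
centers = np.linspace(0, n_bins - 1, n_filt)
bins = np.arange(n_bins)
fbank = np.exp(-0.5 * ((bins[None, :] - centers[:, None]) / 6.0) ** 2)
energies = fbank @ power

# Compensators: log, mean subtraction, a root block, and finally a DCT.
feats = np.log(energies + 1e-12)
feats -= feats.mean()
feats = np.sign(feats) * np.abs(feats) ** 0.5       # root compensation (assumed 0.5)
mfcc = dct(feats, norm="ortho")[:13]
print(mfcc)
```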

Journal ArticleDOI
TL;DR: A new control method for a three-phase four-wire Unified Power Quality Conditioner (UPQC) is presented to deal with power quality problems under distortional and unbalanced load conditions; the approach is optimized using a self-tuning filter (STF) and requires no load or filter current measurements.
Abstract: This paper presents a new control method for a three-phase four-wire Unified Power Quality Conditioner (UPQC) to deal with power quality problems under distortional and unbalanced load conditions. The proposed control approach combines instantaneous power theory with Synchronous Reference Frame (SRF) theory, is optimized using a self-tuning filter (STF), and requires no load or filter current measurements. In this approach, the load and source voltages are used to generate the reference voltages of the series active power filter (APF), and the source currents are used to generate the reference currents of the shunt APF; therefore, the number of current measurements is reduced and the system performance is improved. The performance of the proposed control system is tested for power factor correction, reduction of the source neutral current, load balancing, and mitigation of current and voltage harmonics in a three-phase four-wire system with distortional and unbalanced loads. Results obtained with the MATLAB/SIMULINK software show the effectiveness of the proposed control technique in comparison with the conventional p-q method.

Journal ArticleDOI
TL;DR: This paper suggests an adaptive gradient descent algorithm with stable learning laws for a modified dynamic neural network (MDNN) and studies the stability of this algorithm.
Abstract: The stability of the learning rate in neural network identifiers and controllers is a challenging issue that attracts many researchers' interest. This paper suggests an adaptive gradient descent algorithm with stable learning laws for a modified dynamic neural network (MDNN) and studies its stability. A stable learning algorithm for the parameters of the MDNN is also proposed. With the proposed method, constraints on the learning rate are obtained. Lyapunov stability theory is applied to study the stability of the proposed algorithm and guarantees the stability of the learning algorithm. In the proposed method, the learning rate can be calculated online, providing an adaptive learning rate for the MDNN structure. Simulation results are given to validate the results.

Journal ArticleDOI
TL;DR: This paper proposes an intermediate structure for the measurement matrix based on random sampling and shows that, in spite of its simplicity, the proposed approach can be competitive with existing methods in terms of reconstruction quality and outperforms them in terms of computation time.
Abstract: The focus of this paper is the compressed sensing problem. Under certain conditions, compressed sensing theory relaxes the Nyquist sampling requirement and allows far fewer samples to be taken. One of the important tasks in this theory is the careful design of the measurement matrix (the sampling operator). Most existing methods in the literature attempt to optimize a randomly initialized matrix with the aim of decreasing the number of required measurements; however, these approaches mainly lead to a sophisticated measurement matrix structure that is very difficult to implement. In this paper, we propose an intermediate structure for the measurement matrix based on random sampling. The main advantage of the proposed block-based technique is its simplicity while still achieving performance comparable to conventional techniques. The experimental results confirm that, in spite of its simplicity, the proposed approach is competitive with existing methods in terms of reconstruction quality and outperforms them in terms of computation time.
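One plausible reading of such an intermediate structure is sketched below: the signal's index range is split into blocks, and within each block a few positions are sampled directly at random, yielding a sparse, easy-to-implement measurement matrix. The block and per-block sampling sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, block, m_per_block = 256, 32, 8        # signal length, block size, samples/block

rows = []
for start in range(0, n, block):
    picks = rng.choice(block, size=m_per_block, replace=False)
    for p in picks:
        row = np.zeros(n)
        row[start + p] = 1.0               # pure random sampling inside the block
        rows.append(row)
Phi = np.array(rows)                       # (64, 256) measurement matrix
print(Phi.shape, "nonzeros per row:", int(Phi.sum(axis=1).max()))

# Measuring a signal is a plain matrix product: y = Phi @ x.
x = np.zeros(n)
x[[10, 100, 200]] = [1.0, -2.0, 0.5]       # sparse test signal
y = Phi @ x
print("measurements:", y.shape)
```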

Journal ArticleDOI
TL;DR: A novel discrete-time model-free control law is proposed that employs an adaptive fuzzy estimator to compensate for uncertainty, including model uncertainty, external disturbances, and discretization error, with the estimator's parameters adapted using a gradient descent algorithm.
Abstract: This paper presents a discrete-time robust control for electrically driven robot manipulators in the task space. A novel discrete-time model-free control law is proposed by employing an adaptive fuzzy estimator for the compensation of the uncertainty including model uncertainty, external disturbances and discretization error. Parameters of the fuzzy estimator are adapted to minimize the estimation error using a gradient descent algorithm. The proposed discrete control is robust against all uncertainties as verified by stability analysis. The proposed robust control law is simulated on a SCARA robot driven by permanent magnet dc motors. Simulation results show the effectiveness of the control approach.

Journal ArticleDOI
TL;DR: The superiority of OFW, in terms of classification accuracy and computation time, over other supervised feature extraction methods is established on three real hyperspectral images in the small sample size situation.
Abstract: Hyperspectral sensors provide a large number of spectral bands. This massive and complex data structure presents a challenge to traditional data processing techniques; therefore, reducing the dimensionality of hyperspectral images without losing important information is a very important issue for the remote sensing community. We propose overlap-based feature weighting (OFW) for supervised feature extraction of hyperspectral data. In the OFW method, the feature vector of each pixel of the hyperspectral image is divided into segments, and the weighted mean of adjacent spectral bands in each segment is computed as an extracted feature. The smaller the overlap between classes in a band, the greater its class discrimination ability; therefore, the inverse of the class overlap in each band (feature) is used as the weight for that band. The superiority of OFW over other supervised feature extraction methods, in terms of classification accuracy and computation time, is established on three real hyperspectral images in the small sample size situation.
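The sketch below follows this recipe directly on synthetic two-class data: a per-band weight is the inverse of a (simple, assumed) class-overlap estimate, and each segment of adjacent bands collapses to its weighted mean.

```python
import numpy as np

rng = np.random.default_rng(0)
n_per_class, bands, n_segments = 50, 40, 8
X = np.vstack([rng.normal(0, 1, (n_per_class, bands)),
               rng.normal(1.5, 1, (n_per_class, bands))])
y = np.array([0] * n_per_class + [1] * n_per_class)

def band_overlap(x0, x1):
    """Overlap of one-sigma intervals of the two classes in a single band."""
    lo0, hi0 = x0.mean() - x0.std(), x0.mean() + x0.std()
    lo1, hi1 = x1.mean() - x1.std(), x1.mean() + x1.std()
    inter = max(0.0, min(hi0, hi1) - max(lo0, lo1))
    return inter / (max(hi0, hi1) - min(lo0, lo1) + 1e-12)

# Weight of a band = inverse of the estimated class overlap in that band.
weights = np.array([1.0 / (band_overlap(X[y == 0, b], X[y == 1, b]) + 1e-3)
                    for b in range(bands)])

# Each segment of adjacent bands collapses to its weighted mean.
segments = np.array_split(np.arange(bands), n_segments)
features = np.column_stack([
    (X[:, seg] * weights[seg]).sum(axis=1) / weights[seg].sum()
    for seg in segments
])
print("extracted feature matrix:", features.shape)   # (100, 8)
```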

Journal ArticleDOI
TL;DR: A mixed methodology is proposed that combines the analysis stage of the ROADMAP methodology with the design stage of the AOR and ASPECS methodologies; actor models, a service model, capability, and programming were added to increase the performance of the proposed methodology.
Abstract: Agent-oriented software engineering (AOSE) is an emerging field in computer science that proposes systematic ideas for the analysis, implementation, and maintenance of multi-agent systems. Despite the various methodologies introduced in agent-oriented software engineering, the main challenge is that each methodology has defects in different aspects. Given these defects, a combined solution named ARA, using ASPECS, ROADMAP, and AOR, is proposed here. The three methodologies were analyzed in a comprehensive analytical framework covering concepts and perceptions, modeling language, process, and pragmatism. Owing to time and resource limitations, sample methodologies were selected for evaluation and integration; the selection was based on the methodologies' usage and their ability to be combined. The evaluation shows that the ROADMAP methodology supports the analysis stage of agent-oriented systems, but its design stage is incomplete because it does not model all agents. On the other hand, since the AOR and ASPECS methodologies support the design stage and inter-agent interactions, a mixed methodology is proposed that combines the analysis stage of ROADMAP with the design stage of AOR and ASPECS. Furthermore, to increase the performance of the proposed methodology, actor models, a service model, capability, and programming were added to it. A case study is used to describe its different phases. The results of this project can pave the way for future agent-oriented methodologies.

Journal ArticleDOI
TL;DR: Core amino acid propensity could be adopted as a novel potential descriptor for the classification of enzymes, based on amino acid propensities at the core, at the surface, and in both parts.
Abstract: The present work was designed to classify and differentiate between dehalogenase enzymes and non-dehalogenases (other hydrolases) using amino acid propensities at the core, at the surface, and in both parts. The datasets were built individually by selecting 3D protein structures available in the PDB (Protein Data Bank). The core amino acids were predicted with the IPFP tool, and their structural propensities were calculated by in-house software, Propensity Calculator, which is available online. All datasets were finally grouped into two categories, dehalogenase and non-dehalogenase, using the Naive Bayes, J-48, Random Forest, k-means clustering, and SMO classification algorithms. Comparing the various classification methods, the proposed tree method (Random Forest) performs best, with a maximum classification accuracy of 98.88% for the core propensity dataset. Therefore, we propose that core amino acid propensity can be adopted as a novel potential descriptor for the classification of enzymes.
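The best-performing step of the comparison reduces to a standard random forest over propensity vectors; a minimal sketch with toy features in place of the real descriptors follows.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n, n_props = 120, 20                        # proteins x 20 amino-acid propensities
X = rng.random((n, n_props))
y = (X[:, :5].mean(axis=1) > 0.5).astype(int)   # 1 = dehalogenase (toy rule)

rf = RandomForestClassifier(n_estimators=200, random_state=0)
print("CV accuracy:", cross_val_score(rf, X, y, cv=5).mean())
```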

Journal ArticleDOI
TL;DR: A two-dimensional model of software security is built with stochastic Petri nets; it makes it possible to investigate and compare different solutions for the target system in the design phase, and it calculates the security prediction from the steady-state probability distribution of the Markov chain.
Abstract: To evaluate and predict component-based software security, a two-dimensional model of software security based on stochastic Petri nets is proposed in this paper. In this approach, the software security is modeled using the graphical presentation ability of Petri nets, and the quantitative prediction is provided by the evaluation capability of stochastic Petri nets and the computing power of Markov chains. Each vulnerable component is modeled by a stochastic Petri net and two parameters: the Successful Attack Probability (SAP) and the Vulnerability Volume of each component with respect to another component. The second parameter, as a second dimension of security evaluation, is a metric added to the model to improve the accuracy of the system security prediction. An isomorphic Markov chain (MC) is obtained from the corresponding SPN model, and the security prediction is calculated from the probability distribution of the MC in the steady state. To identify and trace back the critical points of system security, a sensitivity analysis method is applied by differentiating the security prediction equation. The model makes it possible to investigate and compare different solutions for the target system in the design phase.
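The Markov-chain step at the heart of the prediction is a small linear-algebra computation; the sketch below solves the steady-state distribution from pi Q = 0 for a toy three-state generator matrix (not the paper's model) and reads off a security-style probability.

```python
import numpy as np

# Toy generator matrix of the continuous-time Markov chain the SPN reduces to.
Q = np.array([
    [-0.5, 0.3, 0.2],     # state 0: secure
    [0.1, -0.4, 0.3],     # state 1: component under attack
    [0.0, 0.2, -0.2],     # state 2: compromised
])

# Solve pi Q = 0 subject to sum(pi) = 1 via a stacked least-squares system.
A = np.vstack([Q.T, np.ones(3)])
b = np.array([0.0, 0.0, 0.0, 1.0])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
print("steady-state distribution:", pi)
print("P(compromised) =", pi[2])    # a security-prediction style quantity
```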