
Showing papers by "Albert Y. Zomaya published in 2008"


Journal ArticleDOI
TL;DR: A measure of local information transfer, derived from an existing averaged information-theoretical measure, namely, transfer entropy, is presented, providing the first quantitative evidence for the long-held conjecture that the emergent traveling coherent structures known as particles are the dominant information transfer agents in cellular automata.
Abstract: We present a measure of local information transfer, derived from an existing averaged information-theoretical measure, namely, transfer entropy. Local transfer entropy is used to produce profiles of the information transfer into each spatiotemporal point in a complex system. These spatiotemporal profiles are useful not only as an analytical tool, but also allow explicit investigation of different parameter settings and forms of the transfer entropy metric itself. As an example, local transfer entropy is applied to cellular automata, where it is demonstrated to be a useful method of filtering for coherent structure. More importantly, local transfer entropy provides the first quantitative evidence for the long-held conjecture that the emergent traveling coherent structures known as particles (both gliders and domain walls, which have analogs in many physical processes) are the dominant information transfer agents in cellular automata.
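Transfer entropy over discrete series can be estimated with a simple plug-in (frequency-count) estimator. The sketch below is a minimal illustration for binary series with history length one; it is not the authors' implementation, and the function name is ours.

```python
from collections import Counter
from math import log2

def transfer_entropy(source, dest):
    """Plug-in estimate of transfer entropy T(source -> dest) in bits,
    using a history length of one for both series."""
    triples = Counter()   # (x_next, x_past, y_past)
    pairs_sy = Counter()  # (x_past, y_past)
    pairs_xx = Counter()  # (x_next, x_past)
    singles = Counter()   # x_past
    n = 0
    for t in range(1, len(dest) - 1):
        x_next, x_past, y_past = dest[t + 1], dest[t], source[t]
        triples[(x_next, x_past, y_past)] += 1
        pairs_sy[(x_past, y_past)] += 1
        pairs_xx[(x_next, x_past)] += 1
        singles[x_past] += 1
        n += 1
    te = 0.0
    for (x_next, x_past, y_past), c in triples.items():
        p_joint = c / n
        p_full = c / pairs_sy[(x_past, y_past)]                # p(x_next | x_past, y_past)
        p_past = pairs_xx[(x_next, x_past)] / singles[x_past]  # p(x_next | x_past)
        te += p_joint * log2(p_full / p_past)
    return te
```

The local variant used in the paper keeps the per-sample log-ratio terms instead of averaging them, yielding a value at each spatiotemporal point.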

282 citations


Journal ArticleDOI
TL;DR: The algorithm developed combines the inherent efficiency of the centralized approach and the fault-tolerant nature of the distributed, decentralized approach to solve the grid load-balancing problem.
Abstract: Load balancing is a very important and complex problem in computational grids. A computational grid differs from traditional high-performance computing systems in the heterogeneity of the computing nodes, as well as the communication links that connect the different nodes together. There is a need to develop algorithms that can capture this complexity yet can be easily implemented and used to solve a wide range of load-balancing scenarios. In this paper, we propose a game-theoretic solution to the grid load-balancing problem. The algorithm developed combines the inherent efficiency of the centralized approach and the fault-tolerant nature of the distributed, decentralized approach. We model the grid load-balancing problem as a noncooperative game, whereby the objective is to reach the Nash equilibrium. Experiments were conducted to show the applicability of the proposed approaches. One advantage of our scheme is the relatively low overhead and robust performance against inaccuracies in performance prediction information.

127 citations


Journal ArticleDOI
01 Oct 2008-EPL
TL;DR: A measure of local assortativeness that quantifies the level of assortative mixing for individual nodes in the context of the overall network is introduced, and it is shown that such a measure is useful in analyzing a network's robustness against targeted attacks.
Abstract: The level of assortative mixing of nodes in real-world networks gives important insights into a network's design and functionality, and has been analyzed in detail. However, this network-level measure conveys insufficient information about the local-level structure and motifs present in networks. We introduce a measure of local assortativeness that quantifies the level of assortative mixing for individual nodes in the context of the overall network. We show that such a measure, together with the resultant local assortativeness distributions for the network, is useful in analyzing a network's robustness against targeted attacks. We also study local assortativeness in real-world networks, identifying different phases of network growth, showing that biological and social networks display markedly different local assortativeness distributions from technological networks, and discussing the implications for network design.
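Local assortativeness decomposes the global assortativity coefficient, which for an undirected graph is the Pearson correlation of the degrees found at the two ends of an edge. A minimal sketch of the global measure (a generic illustration of the quantity the local variant apportions among nodes, not the paper's code):

```python
from collections import defaultdict

def degree_assortativity(edges):
    """Pearson correlation of the degrees at the two ends of each edge
    in an undirected graph (Newman's global assortativity coefficient)."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    # each undirected edge contributes both orderings (du, dv) and (dv, du)
    xs, ys = [], []
    for u, v in edges:
        xs += [deg[u], deg[v]]
        ys += [deg[v], deg[u]]
    n = len(xs)
    mean = sum(xs) / n  # identical for xs and ys by symmetry
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return cov / var if var else 0.0
```

A star graph is maximally disassortative (hub linked only to leaves, coefficient -1), while a long path sits in between.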

97 citations


Journal Article
TL;DR: This work uses a recently published framework to characterize the distributed computation in terms of its underlying information dynamics: information storage, information transfer and information modification, and finds maximizations in information storage and coherent information transfer on either side of the critical point.
Abstract: Random Boolean Networks (RBNs) are discrete dynamical systems which have been used to model Gene Regulatory Networks. We investigate the well-known phase transition between ordered and chaotic behavior in RBNs from the perspective of the distributed computation conducted by their nodes. We use a recently published framework to characterize the distributed computation in terms of its underlying information dynamics: information storage, information transfer and information modification. We find maximizations in information storage and coherent information transfer on either side of the critical point, allowing us to explain the phase transition in RBNs in terms of the intrinsic distributed computations they are undertaking.
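The ordered and chaotic regimes of RBNs discussed above can be probed numerically with a Derrida-style experiment: flip one bit of a state and watch how far the difference spreads under the network dynamics. A hedged sketch with illustrative parameter choices of our own (not the information-dynamics framework the paper applies):

```python
import random

def rbn_step(state, inputs, tables):
    """One synchronous update of a random Boolean network."""
    return tuple(tables[i][sum(state[j] << b for b, j in enumerate(inputs[i]))]
                 for i in range(len(state)))

def perturbation_spread(n=30, k=2, steps=5, trials=200, seed=0):
    """Average Hamming distance after `steps` updates between two states
    that initially differ in a single bit -- a Derrida-style probe of
    order (distance shrinks) vs chaos (distance grows)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        inputs = [rng.sample(range(n), k) for _ in range(n)]        # k inputs per node
        tables = [[rng.randint(0, 1) for _ in range(2 ** k)] for _ in range(n)]
        a = tuple(rng.randint(0, 1) for _ in range(n))
        b = list(a); b[0] ^= 1; b = tuple(b)                        # one-bit perturbation
        for _ in range(steps):
            a, b = rbn_step(a, inputs, tables), rbn_step(b, inputs, tables)
        total += sum(x != y for x, y in zip(a, b))
    return total / trials
```

With unbiased random tables, K=2 sits at the critical point; K=1 networks damp the perturbation and K=4 networks amplify it.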

84 citations


Journal ArticleDOI
TL;DR: This paper models the QoS-based grid job allocation problem as a cooperative game and presents the structure of the Nash Bargaining Solution, which represents a Pareto optimal solution to the QoS objective.
Abstract: A grid differs from traditional high-performance computing systems in the heterogeneity of the computing nodes as well as the communication links that connect the different nodes together. In grids there exist users and service providers. The service providers provide the service for jobs that the users generate. Typically, the number of jobs generated by all the users is more than any single provider can handle alone with any acceptable quality of service (QoS). As such, the service providers need to cooperate and allocate jobs among themselves so that each is providing an acceptable QoS to its customers. QoS is of particular concern to service providers as it directly affects customers' satisfaction and loyalty. In this paper, we propose a game-theoretic solution to the QoS-sensitive grid job allocation problem. We model the QoS-based grid job allocation problem as a cooperative game and present the structure of the Nash Bargaining Solution. The proposed algorithm is fair to all users and represents a Pareto optimal solution to the QoS objective. One advantage of our scheme is the relatively low overhead and robust performance against inaccuracies in performance prediction information.

74 citations


Journal ArticleDOI
TL;DR: The proposed IGRN was trained using a PSSM, secondary structure, solvent accessibility information and inter-domain linker index to detect possible domain boundaries for a target sequence and showed superior predictive performance and generalisation ability among the most widely used neural network models.
Abstract: Background: Protein domains present some of the most useful information that can be used to understand protein structure and functions. Recent research on protein domain boundary prediction has been mainly based on widely known machine learning techniques, such as Artificial Neural Networks and Support Vector Machines. In this study, we propose a new machine learning model (IGRN) that can achieve accurate and reliable classification with significantly reduced computation. The IGRN was trained using a PSSM (Position Specific Scoring Matrix), secondary structure, solvent accessibility information and an inter-domain linker index to detect possible domain boundaries for a target sequence.

59 citations


Journal ArticleDOI
TL;DR: The proposed Duplication-based State Transition (DST) method is incorporated into three different metaheuristics: genetic algorithms (GAs), simulated annealing (SA), and artificial immune systems (AIS); experimental results confirm DST's promising impact on the performance of metaheuristics.
Abstract: Much of the recent literature shows a prevalence in the use of metaheuristics for solving a variety of problems in parallel and distributed computing. This is especially true for problems that have a combinatorial nature, such as scheduling and load balancing. Despite numerous efforts, task scheduling remains one of the most challenging problems in heterogeneous computing environments. In this paper, we propose a new state transition scheme, called the Duplication-based State Transition (DST) method, specially designed for metaheuristics that can be used for the task scheduling problem in heterogeneous computing environments. State transition in metaheuristics is a key component that takes charge of generating variants of a given state. The DST method produces a new state by first overlapping randomly generated states with the current state and then refining the resultant state by removing ineffectual tasks. The proposed method is incorporated into three different metaheuristics: genetic algorithms (GAs), simulated annealing (SA), and artificial immune systems (AIS). They are experimentally evaluated and compared with existing algorithms. The experimental results confirm DST's promising impact on the performance of metaheuristics.
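As context for how a metaheuristic explores schedule states, a minimal simulated-annealing scheduler for heterogeneous task scheduling is sketched below. It uses a plain single-task move, not the DST method; all names and parameters are illustrative.

```python
import math
import random

def makespan(assign, etc):
    """assign[t] = machine chosen for task t; etc[t][m] = time of task t on machine m."""
    loads = [0.0] * len(etc[0])
    for t, m in enumerate(assign):
        loads[m] += etc[t][m]
    return max(loads)

def anneal_schedule(etc, iters=5000, t0=10.0, alpha=0.999, seed=1):
    """Minimize makespan with simulated annealing over task-to-machine maps."""
    rng = random.Random(seed)
    n, m = len(etc), len(etc[0])
    cur = [rng.randrange(m) for _ in range(n)]
    cur_cost = makespan(cur, etc)
    best, best_cost = cur[:], cur_cost
    temp = t0
    for _ in range(iters):
        cand = cur[:]
        cand[rng.randrange(n)] = rng.randrange(m)  # state transition: move one task
        c = makespan(cand, etc)
        # accept improvements always, worse states with Boltzmann probability
        if c <= cur_cost or rng.random() < math.exp((cur_cost - c) / temp):
            cur, cur_cost = cand, c
            if c < best_cost:
                best, best_cost = cand[:], c
        temp *= alpha
    return best, best_cost
```

DST would replace the single-task move with the overlap-and-refine transition described in the abstract; the surrounding annealing loop stays the same.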

56 citations


Book ChapterDOI
26 Mar 2008
TL;DR: This study assesses the performance of two different nature inspired algorithms for mobile location management using a recent version of Particle Swarm Optimization based on geometric ideas and shows that the proposed techniques outperform existing methods in the literature.
Abstract: Mobile Location Management (MLM) is an important and complex telecommunication problem found in mobile cellular GSM networks. Basically, this problem consists in optimizing the number and location of paging cells to find the lowest location management cost. There is a need to develop techniques capable of operating with this complexity and used to solve a wide range of location management scenarios. Nature inspired algorithms are useful in this context since they have proved to be able to manage large combinatorial search spaces efficiently. The aim of this study is to assess the performance of two different nature inspired algorithms when tackling this problem. The first technique is a recent version of Particle Swarm Optimization based on geometric ideas. This approach is customized for the MLM problem by using the concept of Hamming spaces. The second algorithm consists of a combination of the Hopfield Neural Network coupled with a Ball Dropping technique. The location management cost of a network is embedded into the parameters of the Hopfield Neural Network. Both algorithms are evaluated and compared using a series of test instances based on realistic scenarios. The results are very encouraging for current applications, and show that the proposed techniques outperform existing methods in the literature.

42 citations



Journal ArticleDOI
TL;DR: In this article, the authors proposed a new machine learning based domain predictor named DomNet that can show a more accurate and stable predictive performance than the existing state-of-the-art models.
Abstract: The accurate and stable prediction of protein domain boundaries is an important avenue for the prediction of protein structure, function, evolution, and design. Recent research on protein domain boundary prediction has been mainly based on widely known machine learning techniques. In this paper, we propose a new machine learning based domain predictor, namely DomNet, that shows a more accurate and stable predictive performance than the existing state-of-the-art models. DomNet is trained using a novel compact domain profile, secondary structure, solvent accessibility information, and an interdomain linker index to detect possible domain boundaries for a target sequence. The performance of the proposed model was compared to nine different machine learning models on the Benchmark_2 dataset in terms of accuracy, sensitivity, specificity, and correlation coefficient. DomNet achieved the best performance, with 71% accuracy for domain boundary identification in multidomain proteins. On the CASP7 benchmark dataset, it again demonstrated superior performance to contemporary domain boundary predictors such as DOMpro, DomPred, DomSSEA, DomCut, and DomainDiscovery.

35 citations


Journal ArticleDOI
TL;DR: The newly proposed methods used in SiteSeek were shown to be useful for the identification of protein phosphorylation sites, as SiteSeek performed much better than widely known predictors on the newly built PS-Benchmark_1 dataset.
Abstract: Post-translational modifications have a substantial influence on the structure and functions of proteins. Post-translational phosphorylation is one of the most common modifications that occur in intracellular proteins. Accurate prediction of protein phosphorylation sites is of great importance for the understanding of diverse cellular signalling processes in both humans and animals. In this study, we propose a new machine learning based protein phosphorylation site predictor, SiteSeek. SiteSeek is trained using a novel compact evolutionary and hydrophobicity profile to detect possible protein phosphorylation sites for a target sequence. The newly proposed method proves to be more accurate and exhibits a much more stable predictive performance than existing phosphorylation site predictors. The performance of the proposed model was compared to nine different machine learning models and four widely known phosphorylation site predictors on the newly proposed PS-Benchmark_1 dataset in terms of accuracy, sensitivity, specificity and correlation coefficient. SiteSeek showed better predictive performance, with 86.6% accuracy, 83.8% sensitivity, 92.5% specificity and a 0.77 correlation coefficient on the four main kinase families (CDK, CK2, PKA, and PKC). The newly proposed methods used in SiteSeek were shown to be useful for the identification of protein phosphorylation sites, as it performed much better than widely known predictors on the newly built PS-Benchmark_1 dataset.
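A hydrophobicity profile of the kind SiteSeek builds on can be illustrated with a sliding-window feature extractor over the standard Kyte-Doolittle scale. This toy sketch stands in for the paper's richer evolutionary-plus-hydrophobicity profile; the window size and zero padding are our own choices.

```python
# Standard Kyte-Doolittle hydropathy values per amino acid
KD = {'A': 1.8, 'R': -4.5, 'N': -3.5, 'D': -3.5, 'C': 2.5, 'Q': -3.5,
      'E': -3.5, 'G': -0.4, 'H': -3.2, 'I': 4.5, 'L': 3.8, 'K': -3.9,
      'M': 1.9, 'F': 2.8, 'P': -1.6, 'S': -0.8, 'T': -0.7, 'W': -0.9,
      'Y': -1.3, 'V': 4.2}

def window_features(seq, pos, w=4):
    """Hydrophobicity feature vector for the residue at `pos`: Kyte-Doolittle
    values over a window of +/- w residues, zero-padded at sequence ends."""
    feats = []
    for i in range(pos - w, pos + w + 1):
        feats.append(KD.get(seq[i], 0.0) if 0 <= i < len(seq) else 0.0)
    return feats
```

Vectors like these, computed around candidate serine/threonine/tyrosine positions, would then feed whatever classifier is being trained.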

Journal Article
TL;DR: The first known application of a direct measure of information transfer, transfer entropy, as a fitness function to evolve a self-organized multi-agent system is reported; coherent traveling information transfer structures are observed in the most fit snakebot in the final generation.
Abstract: Information-driven evolutionary design has been proposed as an efficient method for designing self-organized multi-agent systems. Information transfer is known to be an important component of distributed computation in many complex systems, and indeed it has been suggested that maximization of information transfer can give rise to interesting behavior and induce necessary structure in a system. In this paper, we report the first known application of a direct measure of information transfer, transfer entropy, as a fitness function to evolve a self-organized multi-agent system. The system evolved here is a simulated snake-like modular robot. In the most fit snakebot in the final generation, we observe coherent traveling information transfer structures. These are analogous to gliders in cellular automata, which have been demonstrated to represent the coherent transfer of information across space and time, and play an important role in facilitating distributed computation. These observations provide evidence that using information transfer to drive evolutionary design can produce useful structure in the underlying system.

Proceedings ArticleDOI
19 May 2008
TL;DR: This paper proposes a distributed, non-cooperative game theoretic approach to the data replication problem in grids that directly takes into account the self interest and priorities of the different providers in a grid, and maximizes the utility of each provider individually.
Abstract: Data grids, with their cost-effective nature, have taken on a new level of interest in recent years; the amalgamation of different providers results in increased capacity as well as lower energy costs. As a result, there are efforts worldwide to design more efficient data replication algorithms. The design of replication algorithms for grids is further complicated by the fact that the different sites in a grid system are likely to have different ownerships, each with its own self-interest and priorities. As such, any replication algorithm that simply aims to minimize total job delays is likely to fail in grids. Further, a grid differs from traditional high-performance computing systems in the heterogeneity of the communication links that connect the different nodes together. In this paper, we propose a distributed, non-cooperative game theoretic approach to the data replication problem in grids. Our proposed replication scheme directly takes into account the self-interest and priorities of the different providers in a grid, and maximizes the utility of each provider individually. Experiments were conducted to show the applicability of the proposed approaches. One advantage of our scheme is the relatively low overhead and robust performance against inaccuracies in performance prediction information.

Proceedings ArticleDOI
07 May 2008
TL;DR: A survey on bio-inspired algorithms formerly used to solve the location management problem in two different network schemes: paging cell and location area is presented, providing new insight into the mobility management problem that can influence the design of future networks.
Abstract: This work presents a survey on bio-inspired algorithms formerly used to solve the location management problem in two different network schemes: paging cell and location area. For a better comparison, two test networks, generated by intelligent algorithms to represent real-world traffic, are used in this work. Several conclusions were drawn after comparing their results. The results provide new insight into the mobility management problem that can influence the design of future networks.

Journal IssueDOI
01 Mar 2008
TL;DR: In this approach (HNN-BDT-PC), a combination of the Hopfield Neural Network and the author's Ball Dropping Technique is used to solve the mobile location management problem using the Paging Cells scheme.
Abstract: This paper presents a new approach to solving the mobile location management problem using the Paging Cells (PCs) scheme. In this approach (HNN-BDT-PC), a combination of the Hopfield Neural Network (HNN) and the author's Ball Dropping Technique (BDT) is used to solve the problem. To this end, the location management cost of a network is embedded in the HNN parameters, and by iteration, the mobile network gradually moves toward an optimal state. The approach is inspired by the phenomenon that results from the natural movement of balls when they are dropped onto a non-even plate (a plate with troughs and crests). Each trough of the plate corresponds to a PC, and the network corresponds to the whole plate. The aim is to find optimal PC configuration (i.e., the troughs) of the network. Three main procedures are used in the optimization process; in each optimization cycle, the HNN is launched to move the balls around the plate, and, by analogy, to move the PCs around the network to find the best configuration. Copyright © 2007 John Wiley & Sons, Ltd.

01 Jan 2008
TL;DR: This paper presents a novel approach to solve the Multiple Sequence Alignment (MSA) problem and shows the superiority of the proposed technique even in the case of formidable sequences.

Proceedings ArticleDOI
31 Mar 2008
TL;DR: This approach tries to achieve a double optimization effect from both the replica management and the scheduling phases, while integrating scheduling and data replication to improve the performance of the grid system.
Abstract: In data grid environments data-intensive applications require large amounts of data to execute. Data transfer is a primary cause of job execution delay. In this paper we study smart scheduling integrated with replica management optimization to improve system performance. We study the use of genetic algorithm (GA) for the scheduling phase of data-intensive applications. The schedulers proposed incorporate information about the datasets and their replicas needed by the jobs to be scheduled, and co-schedules the jobs and the datasets to the computation node guaranteeing minimum job execution time. We employ a data grid replica management framework for the optimization phase of the replica distribution. In this approach we try to achieve a double optimization effect from both the replica management and the scheduling phases, while integrating scheduling and data replication to improve the performance of the grid system. We evaluate and compare our genetic algorithm (GA) with a Tabu search (TS) and the de facto max-min based schedulers.
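The idea of co-scheduling jobs with their datasets can be illustrated with a simple min-min style heuristic that counts data staging cost as part of completion time. This is a generic sketch of the problem setting, not the GA scheduler proposed in the paper; names and costs are illustrative.

```python
def data_aware_minmin(jobs, nodes, transfer):
    """Min-min style co-scheduling sketch. Completion time on a node is its
    ready time + execution time + transfer time for the job's dataset
    (0 if a replica is already local).
    jobs: list of (exec_time, dataset); nodes: node ids;
    transfer[(dataset, node)]: cost of staging dataset to node."""
    ready = {n: 0.0 for n in nodes}
    schedule = {}
    pending = set(range(len(jobs)))
    while pending:
        best = None
        for j in pending:
            et, ds = jobs[j]
            for n in nodes:
                ct = ready[n] + et + transfer.get((ds, n), 0.0)
                if best is None or ct < best[0]:
                    best = (ct, j, n)
        ct, j, n = best          # commit the globally earliest-finishing pair
        ready[n] = ct
        schedule[j] = n
        pending.discard(j)
    return schedule, max(ready.values())
```

With replicas placed well, jobs land where their data already lives, which is the double optimization effect the replica management phase aims for.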

Proceedings ArticleDOI
31 Mar 2008
TL;DR: An analytical model for implementing Priority Queueing in a sensor node to calculate the queueing delay is presented; the model is based on the M/D/1 queueing system (a special class of M/G/1 queueing systems).
Abstract: Recent advances in miniaturization and low power design have led to a flurry of activity in wireless sensor networks. However, the introduction of real time communication has created additional challenges in this area. A sensor node spends most of its life routing packets from one node to another until the packet reaches the sink; in other words, it functions as a small router most of the time. Since sensor networks deal with time-critical applications, it is often necessary for communication to meet real time constraints. However, research dealing with providing QoS guarantees for real time traffic in sensor networks is still in its infancy. In this paper, an analytical model for implementing Priority Queueing (PQ) in a sensor node to calculate the queueing delay is presented. The model is based on the M/D/1 queueing system (a special class of M/G/1 queueing systems). Two different classes of traffic are considered, and the exact packet delay for the corresponding classes is calculated. Further, the analytical results are validated through an extensive simulation study.
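For a non-preemptive priority queue of this kind, mean per-class queueing delay follows from the standard M/G/1 priority formulas, with E[S^2] = D^2 for deterministic service time D. A sketch (the class indexing and function name are ours, and this reproduces the textbook formulas rather than the paper's derivation):

```python
def md1_priority_wait(lambdas, d):
    """Mean queueing delay per class in an M/D/1 queue with non-preemptive
    priorities (class 0 highest). lambdas: per-class arrival rates;
    d: deterministic service time. W_k = R / ((1 - s_{k-1}) (1 - s_k)),
    where R = sum(lambda_i * d^2) / 2 and s_k is cumulative load."""
    rhos = [lam * d for lam in lambdas]
    assert sum(rhos) < 1, "queue must be stable"
    r = sum(lam * d * d for lam in lambdas) / 2  # mean residual service work
    waits, sigma = [], 0.0
    for rho in rhos:
        prev = sigma
        sigma += rho
        waits.append(r / ((1 - prev) * (1 - sigma)))
    return waits
```

With a single class this collapses to the plain M/D/1 result W = lambda * d^2 / (2 (1 - rho)).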

Proceedings ArticleDOI
31 Mar 2008
TL;DR: The novel approach to solve the multiple sequence alignment (MSA) problem, inspired by the elastic behavior of a rubber band on a plate with poles, shows the superiority of the proposed technique even in the case of formidable sequences.
Abstract: This paper presents a novel approach to solve the multiple sequence alignment (MSA) problem. The Rubber Band Technique: Index Base (RBT-I), introduced in this paper, is inspired by the elastic behavior of a rubber band (RB) on a plate with poles. RBT-I is an iterative optimization algorithm designed and implemented to find the optimal alignment for a set of input protein sequences. In this technique, the alignment answer of the MSA problem is modeled as a RB, while the answer space is modeled as the plate with several poles resembling locations in the input sequences that are most likely to be correlated and/or biologically related. Fixing the head and tail of the RB at two corners of this plate, the RB is free to bend and finds its best configuration, yielding the best answer for the MSA problem. RBT-I is tested with one of the well-known benchmarks in this field (BAliBASE 2.0). The obtained results show the superiority of the proposed technique even in the case of formidable sequences.
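For reference, the usual objective an MSA optimizer evaluates is the sum-of-pairs score of a candidate alignment. A generic scoring sketch with illustrative match/mismatch/gap weights (not necessarily the scoring used by RBT-I):

```python
def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of a multiple alignment: rows of equal length,
    '-' for gaps. Every column contributes the pairwise score of each
    pair of rows; gap-gap pairs score zero."""
    cols = len(alignment[0])
    score = 0
    for c in range(cols):
        col = [row[c] for row in alignment]
        for i in range(len(col)):
            for j in range(i + 1, len(col)):
                a, b = col[i], col[j]
                if a == '-' and b == '-':
                    continue            # two gaps: no contribution
                elif a == '-' or b == '-':
                    score += gap
                else:
                    score += match if a == b else mismatch
    return score
```

An iterative technique such as RBT-I repeatedly reshapes the alignment and keeps configurations that raise an objective of this kind.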

Proceedings ArticleDOI
05 May 2008
TL;DR: A novel analytical model based on a limited service polling discipline, built on the M/D/1 queueing system and taking into account two different classes of traffic in a sensor node, is presented to provide guaranteed QoS in WSNs.
Abstract: Data gathering in a timely and reliable fashion has been a key concern in wireless sensor networks, particularly in military applications. The introduction of real time communication has created additional challenges in this area with different communication constraints. Since sensor networks represent a new generation of time-critical applications, it is often necessary for communication to meet real time constraints. However, research dealing with providing QoS guarantees for real time traffic in sensor networks is still immature. To provide guaranteed QoS in WSNs, this paper presents a novel analytical model based on a limited service polling discipline. The proposed model implements two queues in a sensor node which are served in round-robin order. The model is based on the M/D/1 queueing system (a special class of M/G/1 queueing systems), which takes into account two different classes of traffic in a sensor node. The exact queueing delay in a sensor node for the corresponding classes is calculated. Further, the analytical results are validated through an extensive simulation study.

Proceedings ArticleDOI
14 Apr 2008
TL;DR: A theoretical framework is presented that employs global statistics learnt from gene expression data to infer different network structural properties of large-scale gene regulatory networks; experimental results show that the developed system is superior to previously published approaches.
Abstract: Considerable attempts have been made to develop models and learning strategies to infer gene networks starting from single connections. However, due to noise and other difficulties that arise from making measurements at the meso and nano levels, these so-called bottom-up approaches have not had much success. The need for methods that use a top-down approach to extract global statistics from expression data has emerged to deal with such difficulties. This paper presents a theoretical framework that employs global statistics learnt from gene expression data to infer different network structural properties of large-scale gene regulatory networks. The framework is inspired by genetic algorithms and designed with the aim of addressing the different weaknesses in existing approaches. Experimental results show that the developed system is superior to previously published approaches.

Book ChapterDOI
01 Jan 2008
TL;DR: This chapter will provide a comprehensive study on fast authentication solutions found in the literature as well as the industry that address this problem, and detail such a solution that explores the use of local trust relationships to foster fast authentication.
Abstract: Wireless local area networks (WLAN) are rapidly becoming a core part of network access. Supporting user mobility, more specifically session continuation across changing network access points, is becoming an integral part of wireless network services. This is because of the popularity of emerging real-time streaming applications that are commonly used while the user is mobile, such as voice-over-IP and Internet radio. However, mobility introduces a new set of problems in wireless environments because of handoffs between network access points (APs). The IEEE 802.11i security standard imposes an authentication delay long enough to hamper real-time applications. This chapter provides a comprehensive study of fast authentication solutions found in the literature as well as in industry that address this problem. These proposals focus on intradomain handoff scenarios where the access points belong to the same administrative domain or provider. Interdomain roaming is also becoming commonplace for wireless access, and fast authentication solutions are needed for these environments managed by independent administrative authorities. We detail such a solution that explores the use of local trust relationships to foster fast authentication.

Book ChapterDOI
26 Mar 2008
TL;DR: This chapter contains sections titled: Introduction; Limitations in Sensor Networks; Sensor Networks and MANETs; Security in Sensor Networks; Cryptography in Sensor Networks; Key Management Schemes; Secure Routing; Summary; Exercises; Bibliography.
Abstract: This chapter contains sections titled: Introduction; Limitations in Sensor Networks; Sensor Networks and MANETs; Security in Sensor Networks; Cryptography in Sensor Networks; Key Management Schemes; Secure Routing; Summary; Exercises; Bibliography.

Proceedings ArticleDOI
14 Apr 2008
TL;DR: The results from the thorough and extensive evaluation study confirm the superior performance of MSMD, and its generic applicability compared with previous approaches that only consider one or the other of the task requirements.
Abstract: This paper addresses the problem of scheduling bag-of-tasks (BoT) applications in grids and presents a novel heuristic, called the most suitable match with danger model support algorithm (MSMD) for these applications. Unlike previous approaches, MSMD is capable of efficiently dealing with BoT applications regardless of whether they are computationally or data intensive, or a mixture of both; this strength of MSMD is achieved by making scheduling decisions based on the suitability of resource-task matches, instead of completion time. MSMD incorporates an artificial danger model - based on the danger model in immunology - which selectively responds to unexpected behaviors of resources and applications, in order to increase fault-tolerance. The results from our thorough and extensive evaluation study confirm the superior performance of MSMD, and its generic applicability compared with previous approaches that only consider one or the other of the task requirements.

Journal ArticleDOI
TL;DR: A solution is proposed for capturing an intruder in two popular interconnection topologies, namely the mesh and the torus, in a new version of the problem where each agent can replicate new agents when needed, i.e., the algorithm starts with a single agent and new agents are created on demand.
Abstract: In this paper, we propose a solution for the problem of capturing an intruder in two popular interconnection topologies, namely the mesh and the torus. A set of agents collaborate to capture a hostile intruder in the network. While the agents can move in the network one hop at a time, the intruder is assumed to be arbitrarily fast, i.e., it can traverse any number of nodes contiguously as long as there are no agents in those nodes. Here we consider a new version of the problem where each agent can replicate new agents when needed, i.e., the algorithm starts with a single agent and new agents are created on demand. We define a new class of algorithms for capturing an intruder. In particular, we propose two different algorithms on mesh and torus networks and later discuss the merits of each algorithm based on some performance criteria.

Proceedings ArticleDOI
30 Oct 2008
TL;DR: The talk will present several scenarios for static and dynamic mobility management instances incorporating a combination of metaheuristics, and show that hybrid approaches are more capable at producing efficient solutions.
Abstract: In order to support a wide range of data transfer and user applications, mobility management becomes a crucial factor when designing infrastructure for wireless mobile networks. Mobility management requests are often initiated either by a mobile terminal movement (crossing a cell boundary) or by deterioration in the quality of a received signal on a currently allocated channel. Due to the anticipated increase in the usage of wireless services in the future, the next generation of mobile networks should be able to support a huge number of users and their bandwidth requirements. The talk will address some of the key algorithmic and computational challenges associated with the mobility management problem. The talk will present several scenarios for static and dynamic mobility management instances incorporating a combination of metaheuristics. The studies show that hybrid approaches are more capable of producing efficient solutions. From a practical standpoint, these approaches have the potential to lead to massive savings in the number of network signal transactions made to locate users. Several hybrid approaches are used with a number of test networks to show their advantages over the currently implemented GSM standards. The results provide new insights into the mobility management problem.

Journal ArticleDOI
TL;DR: DomainDiscovery performed significantly well compared to structure-based methods like structural classification of proteins (SCOP); class, architecture, topology and homologous superfamily (CATH); and domain maker (DOMAK).
Abstract: Wetlaufer introduced the classification of domains into continuous and discontinuous. Continuous domains form from a single-chain segment, and discontinuous domains are composed of two or more chain segments. Richardson identified approximately 100 domains in her review. Her assignment was based on the concepts that a domain would be independently stable and/or could undergo rigid-body-like movements with respect to the entire protein. There are now several instances where structurally similar domains occur in different proteins in the absence of noticeable sequence similarity. Possibly the most notable of such domains is the triose-phosphate isomerase (TIM) barrel. With the increase in the number of known sequences, computer algorithms are required to identify the discontinuous domains of an unknown protein chain in order to determine its structure and function. We have developed a novel algorithm for discontinuous-domain boundary prediction based on a machine learning algorithm and interresidue contact interaction values. We used 415 proteins, including 100 discontinuous-domain chains, for training. No available method is designed solely on a sequence basis for the prediction of discontinuous domains. DomainDiscovery performed significantly well compared to structure-based methods like structural classification of proteins (SCOP); class, architecture, topology and homologous superfamily (CATH); and domain maker (DOMAK).

Proceedings ArticleDOI
14 Apr 2008
TL;DR: A new machine learning model, namely the adaptive locality-effective kernel machine (Adaptive-LEKM), is proposed for protein phosphorylation site prediction; it proves to be more accurate and exhibits a much more stable predictive performance than existing machine learning models.
Abstract: In this study, we propose a new machine learning model, namely the adaptive locality-effective kernel machine (Adaptive-LEKM), for protein phosphorylation site prediction. Adaptive-LEKM proves to be more accurate and exhibits a much more stable predictive performance than existing machine learning models. Adaptive-LEKM is trained using a Position Specific Scoring Matrix (PSSM) to detect possible protein phosphorylation sites for a target sequence. The performance of the proposed model was compared to seven different existing machine learning models on the newly proposed PS-Benchmark_1 dataset in terms of accuracy, sensitivity, specificity and correlation coefficient. Adaptive-LEKM showed better predictive performance, with 82.3% accuracy, 80.1% sensitivity, 84.5% specificity and a 0.65 correlation coefficient, than contemporary machine learning models.