
Showing papers presented at "International Conference on Bioinformatics in 2013"


Proceedings ArticleDOI
22 Sep 2013
TL;DR: An integrative method is developed to identify patterns from multiple experiments simultaneously while taking full advantage of high-resolution data, discovering joint patterns across different assay types, and yields a model which elucidates the relationship between assay observations and functional elements in the genome.
Abstract: Sequence census methods like ChIP-seq now produce an unprecedented amount of genome-anchored data. We have developed an integrative method to identify patterns from multiple experiments simultaneously while taking full advantage of high-resolution data, discovering joint patterns across different assay types. We apply this method to ENCODE chromatin data for the human chronic myeloid leukemia cell line K562, including ChIP-seq data on covalent histone modifications and transcription factor binding, and DNase-seq and FAIRE-seq readouts of open chromatin. In an unsupervised fashion, we identify patterns associated with transcription start sites, gene ends, enhancers, CTCF elements, and repressed regions. The method yields a model which elucidates the relationship between assay observations and functional elements in the genome. This model identifies sequences likely to affect transcription, and we verify these predictions in laboratory experiments. We have made software and an integrative genome browser track freely available (noble.gs.washington.edu/proj/segway/).
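
To illustrate the idea of unsupervised joint labeling of binned, multi-assay signal tracks, here is a toy sketch that clusters per-bin feature vectors with k-means. Segway itself uses a dynamic Bayesian network, not k-means, and the Poisson-simulated signal below is invented; this is only an illustration of assigning one chromatin label per genomic bin from several assays at once.

# Toy analog of unsupervised joint pattern discovery across assays: bin each
# signal track, stack the tracks into per-bin feature vectors, and cluster.
# Segway itself uses a dynamic Bayesian network, not k-means; this only
# illustrates assigning one label per genomic bin from all assays jointly.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_bins, n_assays, n_labels = 1000, 4, 5        # e.g. ChIP-seq, DNase, FAIRE...
signal = rng.poisson(lam=3.0, size=(n_bins, n_assays)).astype(float)

labels = KMeans(n_clusters=n_labels, n_init=10, random_state=0).fit_predict(
    np.log1p(signal))                           # one chromatin label per bin
print("label of first 10 bins:", labels[:10])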

528 citations


Journal ArticleDOI
30 Oct 2013
TL;DR: The goal of this work is to support local governments and public administrations in effective smart city implementation, able to create public value, well-being for citizens, and environmental sustainability in the urban space.
Abstract: In recent years, smart city projects have become more and more popular and widespread all over the world. The continuous increase of city populations and the complexity of city management drive local governments towards the strong use of technologies to support a higher quality of urban spaces and a better offering of public services. The fascination of smart cities, able to link high technology, a green environment, and well-being for citizens, interests all municipalities, regardless of their size, geographical area, or culture. However, the concept of the smart city is far from unambiguous. Several experiences all over the world show that cities define themselves as smart, but the meaning attributed to this word differs each time. The smart city concept has grown out of empirical experience, and a systematic theoretical study of this phenomenon is still lacking. In this paper, the author aims to propose a comprehensive and verified definition of the smart city, based on both a deep literature investigation of smart city studies and a large survey of smart city projects in the international panorama. The goal of this work is not only to provide a clear framework for this interesting and current topic, but also to support local governments and public administrations in effective smart city implementation, able to create public value, well-being for citizens, and environmental sustainability in the urban space.

422 citations


Journal ArticleDOI
27 Mar 2013
TL;DR: In this article, the authors focus on resistance to change, one of the most recurring themes in the change management literature, alongside readiness for change, leadership effectiveness, commitment and participation in change initiatives, and the roles and competencies needed to ensure the success of strategic change.
Abstract: As change management becomes an essential ingredient of organizations’ performance, the body of literature describing successful and unsuccessful change management initiatives continues to expand. Numerous articles and studies provide an insight into the nature of change management and its most common pitfalls. The most recurring themes include resistance to change, readiness for change, leadership effectiveness, employee commitment and participation in change initiatives, and the roles and competencies needed to ensure the success of strategic change. The present article focuses on one of these themes: resistance to change. Understanding of resistance may enable managers to reduce conflict and increase collaboration. To meet these challenges, leaders must be trained and educated to overcome resistance to change. This article points out important types of resistance for organizations to address.

83 citations


Proceedings ArticleDOI
22 Sep 2013
TL;DR: An evolutionary stochastic search algorithm is presented to obtain a discrete representation of the protein energy surface in terms of an ensemble of conformations representing local minima to result in protein conformational search algorithms with high exploration capability.
Abstract: We present an evolutionary stochastic search algorithm to obtain a discrete representation of the protein energy surface in terms of an ensemble of conformations representing local minima. This objective is of primary importance in protein structure modeling, whether the goal is to obtain a broad view of potentially different structural states thermodynamically available to a protein system or to predict a single representative structure of a unique functional native state. In this paper, we focus on the latter setting, and show how approaches from evolutionary computation for effective stochastic search and multi-objective analysis can be combined to result in protein conformational search algorithms with high exploration capability. From a broad computational perspective, the contributions of this paper are on how to balance global and local search of some high-dimensional search space and how to guide the search in the presence of a noisy, inaccurate scoring function. From an application point of view, the contributions are demonstrated in the domain of template-free protein structure prediction on the primary subtask of sampling diverse low-energy decoy conformations of an amino-acid sequence. Comparison with the approach used for decoy sampling in the popular Rosetta protocol on 20 diverse protein sequences shows that the evolutionary algorithm proposed in this paper is able to access lower-energy regions with similar or better proximity to the known native structure.
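
As a rough illustration of the balance between global exploration and local refinement under a noisy, inaccurate scoring function, here is a generic evolutionary-search sketch. The toy quadratic "energy" stands in for a protein scoring function, and the population sizes and mutation scales are invented; this is not the paper's algorithm.

# Generic evolutionary search over a noisy score, illustrating the balance of
# global exploration (large mutations) and local refinement (small mutations).
# The toy "energy" stands in for a protein scoring function.
import numpy as np

rng = np.random.default_rng(0)
def noisy_energy(x):                        # inaccurate scoring function
    return np.sum(x ** 2) + rng.normal(scale=0.5)

dim, pop_size, generations = 10, 20, 100
population = rng.uniform(-5, 5, size=(pop_size, dim))
for _ in range(generations):
    scores = np.array([noisy_energy(x) for x in population])
    parents = population[np.argsort(scores)[:pop_size // 2]]   # keep the best
    offspring_global = parents + rng.normal(scale=1.0, size=parents.shape)
    offspring_local = parents + rng.normal(scale=0.05, size=parents.shape)
    population = np.vstack([parents, offspring_global, offspring_local])

best = population[np.argmin([np.sum(x ** 2) for x in population])]
print("best (noise-free) energy:", round(float(np.sum(best ** 2)), 3))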

51 citations


Proceedings ArticleDOI
22 Sep 2013
TL;DR: This work “breaks down” existing state-of-the-art network alignment methods and finds that a combination of the cost function of one method and the alignment strategy of another method beats the existing methods.
Abstract: Analogous to sequence alignment, network alignment (NA) can be used to transfer biological knowledge across species between conserved network regions. NA faces two algorithmic challenges: 1) which cost function to use to capture "similarities" between nodes in different networks, and 2) which alignment strategy to use to rapidly identify "high-scoring" alignments from all possible alignments. We "break down" existing state-of-the-art methods that use both different cost functions and different alignment strategies to evaluate each combination of their cost functions and alignment strategies. We find that a combination of the cost function of one method and the alignment strategy of another method beats the existing methods. Hence, we propose this combination as a novel, superior NA method. Then, since human aging is hard to study experimentally due to humans' long lifespan, we use NA to transfer aging-related knowledge from well-annotated model species to the poorly annotated human. By doing so, we produce novel human aging-related knowledge, which complements currently available knowledge about aging that has been obtained mainly by sequence alignment. We demonstrate significant similarity between topological and functional properties of our novel predictions and those of known aging-related genes. We are the first to use NA to learn more about aging. This work was published as a full paper in Proceedings of ACM BCB 2013 [2] and an extended journal version was published in IEEE/ACM Transactions on Computational Biology and Bioinformatics [1].
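
The following sketch only illustrates the separation between a node-similarity "cost function" and an "alignment strategy" so they can be mixed and matched. The degree-based similarity and greedy strategy are toy stand-ins, not the specific methods evaluated in the paper, and the two small networks are made up.

# Toy illustration of separating the cost function from the alignment strategy.
import itertools

def degree_similarity(g1, g2):
    """Cost function: score node pairs by how close their degrees are."""
    deg1 = {u: len(vs) for u, vs in g1.items()}
    deg2 = {v: len(ws) for v, ws in g2.items()}
    return {(u, v): 1.0 / (1.0 + abs(deg1[u] - deg2[v]))
            for u, v in itertools.product(g1, g2)}

def greedy_alignment(scores):
    """Alignment strategy: repeatedly pick the best still-unused node pair."""
    aligned, used1, used2 = {}, set(), set()
    for (u, v), s in sorted(scores.items(), key=lambda kv: -kv[1]):
        if u not in used1 and v not in used2:
            aligned[u] = v
            used1.add(u)
            used2.add(v)
    return aligned

# Networks as adjacency dicts (hypothetical toy data).
net1 = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}}
net2 = {"x": {"y", "z"}, "y": {"x"}, "z": {"x"}}
print(greedy_alignment(degree_similarity(net1, net2)))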

50 citations


Proceedings ArticleDOI
22 Sep 2013
TL;DR: A free-living health monitoring system based on simple standalone smart phones, which can accurately compute walking speed, and a new method of computing human body motion to estimate gait speed from the spatio-temporal gait parameters generated by regular phone sensors.
Abstract: Detecting abnormal health is an important issue for mobile health, especially for chronic diseases. We present a free-living health monitoring system based on simple standalone smart phones, which can accurately compute walking speed. This phone app can be used to validate the status of a major chronic condition, Chronic Obstructive Pulmonary Disease (COPD), by estimating the gait speed of actual patients. We first show that smart phone sensors are as accurate for monitoring gait as expensive medical accelerometers. We then propose a new method of computing human body motion to estimate gait speed from the spatio-temporal gait parameters generated by regular phone sensors. The raw sensor data is processed in both the time and frequency domains and pruned by a smoothing algorithm to eliminate noise. After that, eight gait parameters are selected as the input vector of a support vector regression model to estimate gait speed. For trained subjects, the overall root mean square error of absolute gait speed is … We design GaitTrack, a free-living health monitor which runs on Android smart phones and integrates known activity recognition and position adjustment technology. The GaitTrack system enables the phone to be carried normally for health monitoring by transforming carried spatio-temporal motion into stable human body motion, with energy-saving sensor control for continuous tracking. We present validation by monitoring COPD patients during timed walk tests and healthy subjects during free-living walking. We show that COPD patients can be detected by spatio-temporal motion and that the abnormal health status of healthy subjects can be detected by personalized trained models with accuracy >84%.
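
A minimal sketch of the regression step only, assuming scikit-learn: map a vector of gait parameters to walking speed with support vector regression. The feature values and labels below are placeholders; the paper's exact eight parameters, preprocessing, and model settings are not reproduced.

# Sketch of the gait-speed regression step with placeholder data.
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Rows: one walk each; columns: 8 gait parameters (cadence, step time, ...).
X = np.random.rand(50, 8)                              # placeholder features
y = 0.8 + 1.5 * X[:, 0] + 0.1 * np.random.randn(50)    # placeholder speeds (m/s)

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
model.fit(X, y)
print("predicted speed (m/s):", model.predict(X[:1])[0])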

47 citations


Proceedings ArticleDOI
22 Sep 2013
TL;DR: An effective single-parameter approach, SSEtracer, is presented to automatically identify helices and β-sheets from the cryoEM three-dimensional (3D) maps at medium resolutions and a simple mathematical model to represent the β-sheet density is presented.
Abstract: Secondary structure element (SSE) identification from volumetric protein density maps is critical for de novo backbone structure derivation in electron cryo-microscopy (cryoEM). Although multiple methods have been developed to detect SSEs from density maps, accurate detection requires either user intervention or careful adjustment of various parameters. It is still challenging to detect SSEs automatically and accurately from cryoEM density maps at medium resolutions (~5-10Å). A detected β-sheet can be represented either by the voxels of the β-sheet density or by many piecewise polygons composing a rough surface; however, neither is effective in capturing the global surface features of the β-sheet. We present an effective single-parameter approach, SSEtracer, to automatically identify helices and β-sheets from cryoEM three-dimensional (3D) maps at medium resolutions. More importantly, we present a simple mathematical model to represent the β-sheet density. It was tested using eleven cryoEM β-sheets detected by SSEtracer; the RMSE between the density and the model is 1.88Å. The mathematical model can be used for β-strand detection from medium-resolution density maps.

45 citations


Proceedings ArticleDOI
22 Sep 2013
TL;DR: The design and experimentation of Cloud4SNP is presented, a novel Cloud-based bioinformatics tool for the parallel preprocessing and statistical analysis of pharmacogenomics SNP microarray data and shows good speed-up and scalability.
Abstract: Pharmacogenomics studies the impact of patients' genetic variation on drug responses and searches for correlations between gene expression or Single Nucleotide Polymorphisms (SNPs) of a patient's genome and the toxicity or efficacy of a drug. SNP data produced by microarray platforms need to be preprocessed and analyzed in order to find correlations between the presence/absence of SNPs and the toxicity or efficacy of a drug. Due to the large number of samples and the high resolution of the instruments, the data to be analyzed can be very large, requiring high-performance computing. The paper presents the design and experimental evaluation of Cloud4SNP, a novel Cloud-based bioinformatics tool for the parallel preprocessing and statistical analysis of pharmacogenomics SNP microarray data. Experimental evaluation shows good speed-up and scalability. Moreover, availability on the Cloud platform makes it possible to elastically meet the requirements of small as well as very large pharmacogenomics studies.
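
Not the Cloud4SNP code, but a minimal local analog of the statistical step it parallelizes: test each SNP for association with a drug-response label, spreading the tests over worker processes. The simulated genotypes, the 0/1 response coding, and the choice of a chi-square test are assumptions for illustration.

# Per-SNP association tests run in parallel (illustrative, simulated data).
import numpy as np
from multiprocessing import Pool
from scipy.stats import chi2_contingency

rng = np.random.default_rng(0)
n_samples, n_snps = 200, 1000
snps = rng.integers(0, 2, size=(n_samples, n_snps))    # 0/1 presence calls
response = rng.integers(0, 2, size=n_samples)          # toxicity yes/no

def snp_pvalue(j):
    table = np.zeros((2, 2))
    for g, r in zip(snps[:, j], response):
        table[g, r] += 1
    return j, chi2_contingency(table)[1]               # p-value for SNP j

if __name__ == "__main__":
    with Pool() as pool:
        pvalues = dict(pool.map(snp_pvalue, range(n_snps)))
    print(min(pvalues.items(), key=lambda kv: kv[1]))   # most associated SNP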

40 citations


Proceedings ArticleDOI
22 Sep 2013
TL;DR: The search results demonstrate that language use within electronic health records is sufficiently different from general use to warrant domain-specific processing, and top-performing systems each used some sort of vocabulary normalization device specific to the medical domain.
Abstract: The Text REtrieval Conference (TREC) is a series of annual workshops designed to build the infrastructure for large-scale evaluation of search systems and thus improve the state-of-the-art. Each workshop is organized around a set of "tracks", challenge problems that focus effort in particular research areas. The most recent TRECs have contained a Medical Records track whose goal is to enable semantic access to the free-text fields of electronic health records. Such access will enhance clinical care and support the secondary use of health records. The specific search task used in the track was a cohort-finding task. A search request described the criteria for inclusion in a (possible, but not actually planned) clinical study and the systems searched a set of de-identified clinical reports to identify candidates who matched the criteria. As anticipated, the search results demonstrate that language use within electronic health records is sufficiently different from general use to warrant domain-specific processing. Top-performing systems each used some sort of vocabulary normalization device specific to the medical domain to accommodate the array of abbreviations, acronyms, and other informal terminology used to designate medical procedures and findings in the records. The use of negative language is also much more prevalent in health records (e.g., patient denies pain, no fever) and thus requires appropriate handling for good search results.

34 citations


Journal ArticleDOI
31 Dec 2013
TL;DR: Work-family conflict is defined as the experience of mutually incompatible pressures that stem from work and family domains as discussed by the authors, and it has a detrimental effect on the individual, family, organizations, and society at large.
Abstract: This paper discusses the work-family conflict that forms the central construct of the work-family literature, and is defined as the experience of mutually incompatible pressures that stem from work and family domains. Juggling myriad responsibilities within the areas of work and family - two of the most important life domains for most adults - has become increasingly difficult. Consequently, the level of experienced conflict has been rising steadily in the last three decades and has a detrimental effect on the individual, family, organizations, and society at large. On the basis of construct definition, the purpose of this paper is to provide a synthesis of the antecedents and outcomes of the work-family conflict. The authors first analyze two categories of antecedents - individual differences and job/family characteristics. Furthermore, outcomes are classified as variables related to well-being, attitudes, and behaviors. By having a clearer understanding of what causes conflict between work and family roles and by being aware of the detrimental effects that conflict has on individuals and organizations, HR professionals, managers, and representatives of other institutions can work together toward developing initiatives for the better integration of work and family roles.

33 citations


Journal ArticleDOI
22 Oct 2013
TL;DR: The proposed method is shown to be capable of predicting kinase-specific phosphorylation sites on 3D structures and has been implemented as a web server which is freely accessible at http://csb.cse.yzu.edu.tw/PhosK3D/.
Abstract: Protein phosphorylation catalyzed by kinases plays crucial regulatory roles in cellular processes. Given the growth of high-throughput mass spectrometry-based experiments, there is strong motivation to annotate the catalytic kinases responsible for in vivo phosphorylation sites. Thus, a variety of computational methods have been developed for performing large-scale prediction of kinase-specific phosphorylation sites. However, most of the proposed methods rely solely on the local amino acid sequences surrounding the phosphorylation sites. An increasing number of three-dimensional structures make it possible to physically investigate the structural environment of phosphorylation sites. In this work, all of the experimental phosphorylation sites are mapped to protein entries of the Protein Data Bank by sequence identity, resulting in a total of 4508 phosphorylation sites with protein three-dimensional (3D) structures. To identify phosphorylation sites on protein 3D structures, this work incorporates support vector machines (SVMs) with the information of linear motifs and spatial amino acid composition, which is determined for each kinase group by calculating the relative frequencies of the 20 amino acid types within a specific radial distance from the central phosphorylated residue. After cross-validation evaluation, most of the kinase-specific models trained with structural information outperform the models considering only sequence information. Furthermore, evaluation on an independent testing set not included in the training set demonstrates that the proposed method provides performance comparable to other popular tools. The proposed method is shown to be capable of predicting kinase-specific phosphorylation sites on 3D structures and has been implemented as a web server which is freely accessible at http://csb.cse.yzu.edu.tw/PhosK3D/. Because of the difficulty of distinguishing kinase-specific phosphorylation sites with similar sequence motifs, this work also integrates 3D structural information to improve cross-classifying specificity.
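
Here is a hedged sketch of the "spatial amino acid composition" feature described above: the relative frequency of each of the 20 amino acid types within a radial distance of the phosphorylated residue, fed to an SVM. The 10 Å radius, toy coordinates, and labels are assumptions, not the paper's settings.

# Spatial amino acid composition feature + SVM, with toy structures.
import numpy as np
from sklearn.svm import SVC

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def spatial_composition(residues, center_xyz, radius=10.0):
    """residues: list of (one_letter_code, (x, y, z)) for one structure."""
    counts = dict.fromkeys(AMINO_ACIDS, 0)
    for aa, xyz in residues:
        if np.linalg.norm(np.array(xyz) - np.array(center_xyz)) <= radius:
            counts[aa] += 1
    total = max(sum(counts.values()), 1)
    return np.array([counts[aa] / total for aa in AMINO_ACIDS])

# Hypothetical training data: one feature vector per candidate site.
X = np.vstack([spatial_composition([("S", (0, 0, 0)), ("K", (3, 0, 0)),
                                    ("E", (20, 0, 0))], (0, 0, 0)),
               spatial_composition([("S", (0, 0, 0)), ("P", (2, 0, 0)),
                                    ("R", (4, 0, 0))], (0, 0, 0))])
y = [0, 1]   # 1 = site of the kinase group of interest (toy labels)
print(SVC(kernel="rbf").fit(X, y).predict(X))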

Proceedings Article
01 Jan 2013
TL;DR: MIST (Microbial In Silico Typer), a bioinformatics tool for rapidly generating in silico typing data (e.g. MLST, MLVA) from draft bacterial genome assemblies, is presented, allowing the analysis of existing typing methods along with novel typing schemes.
Abstract: Whole-genome sequence (WGS) data can, in principle, resolve bacterial isolates that differ by a single base pair, thus providing the highest level of discriminatory power for epidemiologic subtyping. Nonetheless, because the capability to perform whole-genome sequencing in the context of epidemiological investigations involving priority pathogens has only recently become practical, fewer isolates have WGS data available relative to traditional subtyping methods. It will be important to link these WGS data to data in traditional typing databases such as PulseNet and PubMLST in order to place them into proper historical and epidemiological context, thus enhancing investigative capabilities in response to public health events. We present MIST (Microbial In Silico Typer), a bioinformatics tool for rapidly generating in silico typing data (e.g. MLST, MLVA) from draft bacterial genome assemblies. MIST is highly customizable, allowing the analysis of existing typing methods along with novel typing schemes. Rapid in silico typing provides a link between historical typing data and WGS data, while also providing a framework for the assessment of molecular typing methods based on WGS analysis.


Proceedings ArticleDOI
22 Sep 2013
TL;DR: A hidden Markov model (HMM) is developed and compared its recognition performance against a non-sequential classifier (KNN), showing that knowledge of the sequential nature of activities during eating improves recognition accuracy.
Abstract: Advances in body sensing and mobile health technology have created new opportunities for empowering people to take a more active role in managing their health. Measurements of dietary intake are commonly used for the study and treatment of obesity. However, the most widely used tools rely upon self-report and require considerable manual effort, leading to underreporting of consumption, non-compliance, and discontinued use over the long term. We are investigating the use of wrist-worn accelerometers and gyroscopes to automatically recognize eating gestures. In order to improve recognition accuracy, we studied the sequential dependency of actions during eating. Using a set of four actions (rest, utensiling, bite, drink), we developed a hidden Markov model (HMM) and compared its recognition performance against a non-sequential classifier (KNN). Tested on a dataset of 20 meals, the KNN achieved 71.7% accuracy while the HMM achieved 84.3% accuracy, showing that knowledge of the sequential nature of activities during eating improves recognition accuracy.
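
To show why modeling the sequence of actions helps, here is a minimal hand-rolled Viterbi decoder over the four eating actions (rest, utensiling, bite, drink). The transition and emission probabilities are invented for illustration; the paper trains its HMM from labeled meal data.

# Minimal Viterbi decoding over the four eating actions (illustrative numbers).
import numpy as np

states = ["rest", "utensiling", "bite", "drink"]
trans = np.array([[0.7, 0.2, 0.05, 0.05],       # rows: from-state
                  [0.1, 0.5, 0.35, 0.05],
                  [0.3, 0.4, 0.2, 0.1],
                  [0.6, 0.2, 0.1, 0.1]])
emit = np.array([[0.8, 0.1, 0.05, 0.05],        # P(observed symbol | state)
                 [0.1, 0.7, 0.15, 0.05],
                 [0.1, 0.2, 0.6, 0.1],
                 [0.1, 0.1, 0.1, 0.7]])
start = np.full(4, 0.25)

def viterbi(obs):
    logp = np.log(start) + np.log(emit[:, obs[0]])
    back = []
    for o in obs[1:]:
        scores = logp[:, None] + np.log(trans) + np.log(emit[:, o])
        back.append(scores.argmax(axis=0))      # best predecessor per state
        logp = scores.max(axis=0)
    path = [int(logp.argmax())]
    for bp in reversed(back):
        path.append(int(bp[path[-1]]))
    return [states[s] for s in reversed(path)]

print(viterbi([0, 1, 2, 1, 2, 3, 0]))           # noisy per-frame observations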

Proceedings ArticleDOI
22 Sep 2013
TL;DR: This work presents several new definitions for a hypergraph clustering coefficient that pertain specifically to the biology of interacting proteins, and evaluates the biological meaning of these and previously proposed definitions and test their correlation with protein complexes.
Abstract: Modeling protein interaction data with graphs (networks) is insufficient for some common types of experimentally generated interaction data. For example, in affinity purification experiments, one protein is pulled out of the cell along with other proteins that are bound to it. This data is not intrinsically binary, so we lose information when we model it with a graph, which can only associate pairs of proteins. Hypergraphs, an extension of graphs which allows relationships among sets of arbitrary size, have been proposed to model this type of data. However, there is no consensus for appropriate measures for these "protein interaction hypernetworks" that are meaningful in both their interpretation and in their correspondence to a biological question (e.g., predicting the function of uncharacterized proteins, identifying new biological modules). The clustering coefficient is a measure commonly used in binary networks for biological insights. While multiple analogs of the clustering coefficient have been proposed for hypernetworks, the usefulness of these for generating biological hypotheses has not been established. We present several new definitions for a hypergraph clustering coefficient that pertain specifically to the biology of interacting proteins. We evaluate the biological meaning of these and previously proposed definitions in protein interaction hypernetworks and test their correlation with protein complexes. We conclude that hypergraph analysis offers important advantages over graph measures for non-binary data, and we discuss the clustering coefficient measures that perform best. Our work suggests a paradigm shift is needed to best gain insights from affinity purification assays and other non-binary data.
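
As one simple analog of a clustering coefficient on a hypergraph, the sketch below averages the Jaccard overlap between pairs of hyperedges (e.g., affinity-purification pull-downs) that contain a given protein. This is an illustrative definition and not necessarily one of those proposed in the paper; the pull-down sets are made up.

# Toy hypergraph clustering coefficient for a protein v.
from itertools import combinations

def hyper_clustering_coefficient(hyperedges, v):
    containing = [e - {v} for e in hyperedges if v in e]
    if len(containing) < 2:
        return 0.0
    overlaps = [len(a & b) / len(a | b) if (a | b) else 0.0
                for a, b in combinations(containing, 2)]
    return sum(overlaps) / len(overlaps)

pulldowns = [set("ABC"), set("ABD"), set("AEF")]   # hypothetical pull-downs
print(hyper_clustering_coefficient(pulldowns, "A"))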

Journal ArticleDOI
30 Apr 2013
TL;DR: Simulation results show that depending on network configuration, a substantial increase in stability of network lifetime can be accomplished as compared to PEGASIS-TC.
Abstract: Wireless sensor networks represent a new generation of real-time embedded systems with limited computation, energy, and memory resources that are being used in a wide variety of applications where traditional networking infrastructure is practically infeasible. Legitimate leader selection can drastically improve the lifetime of the sensor network. This paper proposes a fuzzy logic methodology for leader election in the PEGASIS-based protocol PEGASIS-TC [1, 17], based on two descriptors: the residual energy of a node and its proximity to the Base Station. Simulation results show that, depending on the network configuration, a substantial increase in the stability of network lifetime can be accomplished as compared to PEGASIS-TC.
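
A minimal fuzzy-scoring sketch over the two descriptors named above (residual energy and proximity to the base station). The triangular membership functions, the single rule, and the node values are invented for illustration and are not taken from the paper.

# Fuzzy scoring of candidate leaders from residual energy and BS distance.
def tri(x, a, b, c):
    """Triangular membership function."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def leader_fitness(energy_frac, dist_to_bs, max_dist):
    e_high = tri(energy_frac, 0.4, 1.0, 1.6)             # "high residual energy"
    d_near = tri(dist_to_bs / max_dist, -0.6, 0.0, 0.6)  # "near base station"
    # Rule: a node is a good leader if energy is high AND it is near the BS.
    return min(e_high, d_near)

nodes = {"n1": (0.9, 30), "n2": (0.5, 10), "n3": (0.95, 80)}   # (energy, dist)
scores = {n: leader_fitness(e, d, max_dist=100) for n, (e, d) in nodes.items()}
print(max(scores, key=scores.get), scores)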

Journal ArticleDOI
06 Jun 2013
TL;DR: In this article, the authors investigated the relationship of capital structure and financial performance of trading companies which are listed in CSE (Colombo Stock Exchange) from 2007 to 2011, and found that debt ratio is negatively correlated with all financial performance measures [Gross Profit (GP); Net Profit (NP); Return on Equity (ROE) and Earnings Per Share (EPS)].
Abstract: Capital structure choice is an important decision for a firm. It is important not only from a return maximization point of view, but also because this decision has a great impact on a firm's ability to operate successfully in a competitive environment. The ability of companies to meet their stakeholders' needs is tightly related to capital structure, so this consideration cannot be omitted. Capital structure in financial terms means the way a firm finances its assets through a combination of equity, debt, or hybrid securities (Saad, 2010). This study investigates the relationship between capital structure and the financial performance of trading companies listed on the CSE (Colombo Stock Exchange) from 2007 to 2011. The results show that the debt ratio is negatively correlated with all financial performance measures [Gross Profit (GP), Net Profit (NP), Return on Equity (ROE), and Earnings Per Share (EPS)]; similarly, the debt-equity ratio (D/E) is negatively correlated with all financial performance measures except GP, and only the D/E ratio shows a significant relationship with NP. The R2 values of the regressions indicate that 36.6%, 91.6%, 36%, and 11.2% of the observed variability in financial performance is explained by the debt/equity and debt ratios.

Proceedings Article
05 Nov 2013
TL;DR: In this paper, the effect of hybridization on the mechanical properties of kenaf and banana reinforced polyester composites (KBRP) was evaluated experimentally, and the results demonstrate that hybridization plays an important role in improving the mechanical properties of composites.
Abstract: Recently the use of natural fiber reinforced polyester composites in various sectors has increased tremendously. Interest in fiber-reinforced polyester composites (FRPC) is growing rapidly due to their high performance in terms of mechanical properties, significant processing advantages, excellent chemical resistance, low cost, and low density. The development of composite materials based on the reinforcement of two or more fiber types in a matrix leads to the production of laminate composites. In the present investigation, the effect of hybridization on the mechanical properties of kenaf and banana reinforced polyester composite (KBRP) was evaluated experimentally. The main aim of this paper is to review the work carried out using kenaf and banana fiber composites, motivated by the environmental problems and health hazards posed by synthetic fibers during disposal and manufacturing. Reinforcement using kenaf and banana fibers shows potential to replace glass fiber composites. Composites were fabricated using the hand lay-up technique. The results demonstrate that hybridization plays an important role in improving the mechanical properties of composites. The tensile and flexural properties of hybrid composites are markedly improved compared to non-hybrid composites. Water absorption behavior indicated that hybrid composites offer better resistance to water absorption. In addition to the mechanical properties, the processing methods and applications of kenaf and banana fiber composites are also discussed. This work demonstrates the potential of hybrid natural fiber composite materials for use in a number of consumer goods.

Proceedings ArticleDOI
22 Sep 2013
TL;DR: It is shown that the use of a probabilistic graphical model can facilitate effective transfer learning between distinct healthcare data sets by parameter sharing while simultaneously allowing us to construct a network for interpretation use by domain experts and the discovery of disease relationships.
Abstract: With the recent signing of the Affordable Care Act into law, the use of electronic medical data is set to become ubiquitous in the United States. This presents an unprecedented opportunity to use population health data for the benefit of patient-centered outcomes. However, there are two major hurdles to utilizing this wealth of data. First, medical data is not centrally located but is often divided across hospital systems, health exchanges, and physician practices. Second, sharing specific or identifiable information may not be allowed. Moreover, organizations may have a vested interest in keeping their data sets private as they may have been gathered and curated at great cost. We develop an approach to allow the sharing of beneficial information while staying within the bounds of data privacy. We show that the use of a probabilistic graphical model can facilitate effective transfer learning between distinct healthcare data sets by parameter sharing while simultaneously allowing us to construct a network for interpretation use by domain experts and the discovery of disease relationships. Our method utilizes aggregate information from distinct populations to improve the estimation of patient disease risk.

Proceedings ArticleDOI
22 Sep 2013
TL;DR: A general framework for mapping back and forth between genomes is proposed, which employs a new format, MOD, to represent known variants between genomes, and a set of tools that facilitate genome manipulation and mapping.
Abstract: Next generation sequencing techniques have enabled new methods of DNA and RNA quantification. Many of these methods require a step of aligning short reads to some reference genome. If the target organism differs significantly from this reference, alignment errors can lead to significant errors in downstream analysis. Various attempts have been made to integrate known genetic variants into the reference genome so as to construct sample-specific genomes to improve read alignments. However, many hurdles in generating and annotating such genomes remain unsolved. In this paper, we propose a general framework for mapping back and forth between genomes. It employs a new format, MOD, to represent known variants between genomes, and a set of tools that facilitate genome manipulation and mapping. We demonstrate the utility of this framework using three inbred mouse strains. We built pseudogenomes from the mm9 mouse reference genome for three highly divergent mouse strains based on MOD files and used them to map the gene annotations to these new genomes. We observe that a large fraction of genes have their positions or ranges altered. Finally, using RNA-seq and DNA-seq short reads from these strains, we demonstrate that mapping to the new genomes yields a better alignment result than mapping to the standard reference. The MOD files for the 17 mouse strains sequenced in the Wellcome Trust Sanger Institute's Mouse Genomes Project can be found at http://www.csbio.unc.edu/CCstatus/index.py?run=Pseudo. The auxiliary tools (i.e. MODtools and Lapels), written in Python, are available at http://code.google.com/p/modtools/ and http://code.google.com/p/lapels/.
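
In the spirit of mapping annotation coordinates from a reference genome to a strain pseudogenome, here is an illustrative liftover over a sorted list of indel variants. The variant tuples are a stand-in for illustration only and are not the actual MOD file format or the MODtools/Lapels code.

# Illustrative coordinate mapping across indels between two genomes.
def lift_position(pos, indels):
    """indels: sorted list of (ref_pos, length_change); insertions are
    positive, deletions negative. Returns the shifted coordinate."""
    shift = 0
    for ref_pos, delta in indels:
        if ref_pos <= pos:
            shift += delta
        else:
            break
    return pos + shift

# Hypothetical variants between a reference and a strain: a 3 bp insertion
# at position 1,000 and a 2 bp deletion at position 5,000.
variants = [(1_000, +3), (5_000, -2)]
for gene_start in (500, 2_000, 10_000):
    print(gene_start, "->", lift_position(gene_start, variants))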

Proceedings Article
08 Mar 2013
TL;DR: All the algorithms applied to the student data to predict performance perform satisfactorily, but the highest accuracy is observed with the C4.5 algorithm.
Abstract: Nowadays, the amount of data stored in educational databases is increasing rapidly. These databases contain hidden information useful for improving students' performance. Classification of data objects is a data mining and knowledge management technique used to group similar data objects together. There are many classification algorithms available in the literature, but decision trees are the most commonly used because of their ease of implementation and because they are easier to understand than other classification algorithms. The ID3, C4.5, and CART decision tree algorithms have been applied to student data to predict performance. In this paper, each of the algorithms is explained in turn. The performance and results of all the algorithms are compared, and the evaluation is done on existing datasets. All the algorithms show satisfactory performance, but the highest accuracy is observed with the C4.5 algorithm.
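
A workflow sketch on made-up student records, assuming scikit-learn. Note that scikit-learn implements CART; C4.5 and ID3 themselves are not included, so this only illustrates the general train/evaluate loop described above (the entropy criterion is used as a rough analog of ID3/C4.5's information-gain splitting).

# Decision tree classification of (hypothetical) student records.
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Features: attendance %, internal marks, hours of study per day (made up).
X = [[90, 75, 4], [60, 40, 1], [85, 80, 3], [50, 35, 0],
     [70, 65, 2], [95, 90, 5], [55, 45, 1], [80, 70, 3]]
y = ["pass", "fail", "pass", "fail", "pass", "pass", "fail", "pass"]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3).fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, tree.predict(X_te)))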

Proceedings ArticleDOI
22 Sep 2013
TL;DR: This work applies a rigidity-analysis tool to determine the rigid and flexible regions in protein structures, since hinges may not always lie on loops between secondary structure elements, to allow for better accuracy in determining the rotational degrees of freedom of the proteins.
Abstract: We present a geometry-based, sampling-based method to explore conformational pathways in medium and large proteins which undergo large-scale conformational transitions. In past work we developed a coarse-grained geometry-based method that was able to trace large-scale conformational motions in proteins using residues between secondary structure elements as hinges, together with a simple yet effective energy function. In this work we apply a rigidity-analysis tool to determine the rigid and flexible regions in protein structures, since hinges may not always lie on loops between secondary structure elements. This allows for better accuracy in determining the rotational degrees of freedom of the proteins. We use a multi-resolution search scheme, as both C-α and backbone representations are used for sampling the protein conformational paths. Characteristic conformations detected by clustering the paths are converted to full-atom protein structures and minimized to detect interesting intermediate conformations that may correspond to transition states or other events. Our algorithm runs efficiently on proteins of various sizes, and the results agree with experimentally determined intermediate protein structures.

Journal ArticleDOI
16 Nov 2013
TL;DR: In this article, an optimal q-homotopy analysis method (Oq-HAM) is proposed and shown to determine the convergence-control parameter more accurately than the one-step optimal homotopy analysis method.
Abstract: In this paper, an optimal q-homotopy analysis method (Oq-HAM) is proposed. We present some examples to show the reliability and efficiency of the method. It is compared with the one-step optimal homotopy analysis method. The results reveal that the Oq-HAM determines the convergence-control parameter more accurately than the one-step optimal HAM.

Journal ArticleDOI
23 Jun 2013
TL;DR: The Artificial Bee Colony (ABC) algorithm is a stochastic, population-based evolutionary method proposed by Karaboga in the year 2005 as discussed by the authors, which is simple and very flexible when compared to other swarm based algorithms.
Abstract: The Artificial Bee Colony (ABC) algorithm is a stochastic, population-based evolutionary method proposed by Karaboga in 2005. The ABC algorithm is simple and very flexible compared to other swarm-based algorithms. The method has become very popular and is widely used because of its good convergence properties. The intelligent foraging behavior of the honeybee swarm is reproduced in ABC. Numerous ABC algorithms have been developed based on the foraging behavior of honey bees for solving unconstrained and constrained optimization problems. This paper attempts to provide a comprehensive survey of research on ABC. A system of comparisons and descriptions is used to highlight the importance of the ABC algorithm, its enhancements, hybrid approaches, and applications.
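
A minimal sketch of the employed-bee phase of ABC on a toy objective: each food source is perturbed toward or away from a random neighbour in one random dimension and kept only if it improves. The onlooker and scout phases, limit counters, and parameter values are omitted or invented, so this is an illustration rather than a full ABC implementation.

# Employed-bee phase of ABC on the sphere function (illustrative only).
import numpy as np

rng = np.random.default_rng(1)
objective = lambda x: np.sum(x ** 2)            # minimise the sphere function

n_sources, dim, iters = 10, 5, 200
food = rng.uniform(-5, 5, size=(n_sources, dim))
for _ in range(iters):
    for i in range(n_sources):
        k = rng.choice([j for j in range(n_sources) if j != i])
        j = rng.integers(dim)
        candidate = food[i].copy()
        # v_ij = x_ij + phi * (x_ij - x_kj), phi uniform in [-1, 1]
        candidate[j] += rng.uniform(-1, 1) * (food[i, j] - food[k, j])
        if objective(candidate) < objective(food[i]):   # greedy selection
            food[i] = candidate

best = food[np.argmin([objective(x) for x in food])]
print("best value:", objective(best))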

Proceedings ArticleDOI
22 Sep 2013
TL;DR: The results suggest that AnnSim can provide a deeper understanding of the relatedness of concepts and can provide an explanation of potential novel patterns.
Abstract: Linked Open Data has made available a diversity of scientific collections where scientists have annotated entities in the datasets with controlled vocabulary terms (CV terms) from ontologies. These semantic annotations encode scientific knowledge which is captured in annotation datasets. One can mine these datasets to discover relationships and patterns between entities. Determining the relatedness (or similarity) between entities becomes a building block for graph pattern mining, e.g., identifying drug-drug relationships could depend on the similarity of the diseases (conditions) that are associated with each drug. Diverse similarity metrics have been proposed in the literature, e.g., i) string-similarity metrics; ii) path-similarity metrics; iii) topological-similarity metrics; all measure relatedness in a given taxonomy or ontology. In this paper, we consider a novel annotation similarity metric AnnSim that measures the relatedness between two entities in terms of the similarity of their annotations. We model AnnSim as a 1-to-1 maximal weighted bipartite match, and we exploit properties of existing solvers to provide an efficient solution. We empirically study the effectiveness of AnnSim on real-world datasets of genes and their GO annotations, clinical trials, and a human disease benchmark. Our results suggest that AnnSim can provide a deeper understanding of the relatedness of concepts and can provide an explanation of potential novel patterns.
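
A hedged sketch of the matching step only: a 1-to-1 maximal weighted bipartite match over pairwise annotation similarities, solved here with scipy's Hungarian-algorithm solver. The similarity matrix and the final normalization are placeholders, not AnnSim's ontology-based scores or its exact formula.

# 1-to-1 maximal weighted bipartite matching of annotations (toy scores).
import numpy as np
from scipy.optimize import linear_sum_assignment

# Rows: CV terms annotating entity A; columns: CV terms annotating entity B.
sim = np.array([[0.9, 0.2, 0.1],
                [0.3, 0.8, 0.4],
                [0.1, 0.5, 0.7]])

rows, cols = linear_sum_assignment(-sim)          # maximise total similarity
score = sim[rows, cols].sum() / max(sim.shape)    # one possible normalisation
print(list(zip(rows, cols)), round(float(score), 3))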

Journal ArticleDOI
27 Jul 2013
TL;DR: Scrum, like Extreme Programming (XP) and other methodologies, is classified as agile software development, achieved through a set of values, principles, and practices that differ fundamentally from the traditional way of developing software.
Abstract: Scrum, as well as Extreme Programming (XP) and other methodologies, is classified as agile software development. According to Teles [17], this new kind of methodology emerged in America in the 1990s as an attempt to create better systems in less time and more economically than usual. These objectives are achieved through a set of values, principles, and practices that differ fundamentally from the traditional way of developing software.

Journal ArticleDOI
16 Oct 2013
TL;DR: It is demonstrated that a relatively inexpensive solution can be implemented for archival of bioinformatics databases and their rapid re-instantiation should the live databases disappear.
Abstract: Small bioinformatics databases, unlike institutionally funded large databases, are vulnerable to discontinuation and many reported in publications are no longer accessible. This leads to irreproducible scientific work and redundant effort, impeding the pace of scientific progress. We describe a Web-accessible system, available online at http://biodb100.apbionet.org , for archival and future on demand re-instantiation of small databases within minutes. Depositors can rebuild their databases by downloading a Linux live operating system ( http://www.bioslax.com ), preinstalled with bioinformatics and UNIX tools. The database and its dependencies can be compressed into an ".lzm" file for deposition. End-users can search for archived databases and activate them on dynamically re-instantiated BioSlax instances, run as virtual machines over the two popular full virtualization standard cloud-computing platforms, Xen Hypervisor or vSphere. The system is adaptable to increasing demand for disk storage or computational load and allows database developers to use the re-instantiated databases for integration and development of new databases. Herein, we demonstrate that a relatively inexpensive solution can be implemented for archival of bioinformatics databases and their rapid re-instantiation should the live databases disappear.

Proceedings Article
01 Jan 2013
TL;DR: The statistical RqPCRAnalysis tool is user-friendly and should help biologists with no prior training in R programming to analyze their quantitative PCR data.
Abstract: We propose the statistical RqPCRAnalysis tool for quantitative real-time PCR data analysis, which supports the use of several normalization genes as well as biological and technical replicates and provides statistically validated results. The RqPCRAnalysis tool improves on methods developed in the Genorm and qBASE programs. The algorithm was developed in the R language and is freely available. The main contributions of the RqPCRAnalysis tool are: (1) determining the most stable reference genes (REF), i.e. housekeeping genes, across biological and technical replicates; (2) computing the normalization factor based on the REF; (3) computing the normalized expression of the genes of interest (GOI), as well as rescaling the normalized expression across biological replicates; (4) comparing expression levels between samples across biological replicates via tests of statistical significance. In this paper we describe and demonstrate the available statistical functions for practical analysis of quantitative real-time PCR data. Our statistical RqPCRAnalysis tool is user-friendly and should help biologists with no prior training in R programming to analyze their quantitative PCR data.
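
To make the normalization-factor idea concrete, here is a sketch in Python of geometric-mean reference-gene normalization as used in geNorm/qBase-style analyses (the RqPCRAnalysis tool itself is written in R). The amplification efficiency of 2 and the Cq values are assumptions for illustration.

# Geometric-mean normalisation of qPCR relative quantities (toy Cq values).
import numpy as np

def relative_quantity(cq, efficiency=2.0):
    """Convert Cq values to relative quantities against the lowest Cq."""
    cq = np.asarray(cq, dtype=float)
    return efficiency ** (cq.min() - cq)

# Cq values for two reference genes and one gene of interest in 3 samples.
ref1 = relative_quantity([20.1, 20.5, 21.0])
ref2 = relative_quantity([18.0, 18.6, 19.1])
goi = relative_quantity([25.0, 24.0, 26.5])

norm_factor = np.sqrt(ref1 * ref2)       # geometric mean of the two references
print("normalised expression:", goi / norm_factor)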

Journal ArticleDOI
12 Jun 2013
TL;DR: The aim here is to develop a methodology which monitors database transactions on a continuous basis and decides whether the transactions are legitimate or suspicious by combining the multiple pieces of evidence gathered.
Abstract: The information security task of securing enterprise databases from internal and external attacks and violations of mutual policy is an interminable struggle. With the growing number of attacks and frauds, organizations are finding it difficult to meet various regulatory compliance requirements such as SOX, HIPAA, and state privacy laws. The aim here is to develop a methodology which monitors database transactions on a continuous basis and decides whether transactions are legitimate or suspicious by combining multiple pieces of gathered evidence. The suspicious transactions can then be used for forensic analysis to reconstruct the illegal activity carried out in an organization. This can be achieved by incorporating information accountability in the Database Management System. Information accountability means that information usage should be transparent, so that it is possible to determine whether a use is appropriate under a given set of rules. We focus on effective information accountability of data stored in high-performance databases through database forensics, which collects and analyzes database transactions gathered from various sources and artifacts, such as the data cache, log files, and error logs, having volatile or non-volatile characteristics within high-performance databases. The information and multiple pieces of evidence collected are then analyzed using an Extended Dempster-Shafer theory (EDST), which combines such evidence to compute an initial belief for suspected transactions that can be further used for reconstructing the activity in the database forensics process.
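
For orientation, here is Dempster's classical rule of combination for two evidence sources over the frame {legitimate, suspicious}. The paper's Extended Dempster-Shafer theory (EDST) is not reproduced here, and the mass values assigned to the two evidence sources are invented.

# Dempster's rule of combination over {legit, suspicious} (toy mass values).
from itertools import product

def combine(m1, m2):
    """m1, m2: dicts mapping frozenset hypotheses to mass values."""
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb
    return {h: w / (1.0 - conflict) for h, w in combined.items()}

L, S = frozenset({"legit"}), frozenset({"suspicious"})
theta = L | S                                        # total uncertainty
log_evidence = {S: 0.6, L: 0.1, theta: 0.3}          # from log-file analysis
cache_evidence = {S: 0.5, L: 0.2, theta: 0.3}        # from data-cache artifacts
print(combine(log_evidence, cache_evidence))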

Proceedings ArticleDOI
22 Sep 2013
TL;DR: Major age-specific signatures at all levels are found including age- specific hypermethylation in polycomb group protein target genes and the upregulation of angiogenesis-related genes in older GBMs.
Abstract: Age is a powerful predictor of survival in glioblastoma multiforme (GBM), yet the biological basis for the difference in clinical outcome is mostly unknown. Discovering genes and pathways that would explain age-specific survival differences could generate opportunities for novel therapeutics for GBM. Here we have integrated gene expression, exon expression, microRNA expression, copy number alteration, SNP, whole exome sequence, and DNA methylation data sets of a cohort of GBM patients in The Cancer Genome Atlas (TCGA) project to discover age-specific signatures at the transcriptional, genetic, and epigenetic levels, and validated our findings on the REMBRANDT data set. We found major age-specific signatures at all levels, including age-specific hypermethylation in polycomb group protein target genes and the upregulation of angiogenesis-related genes in older GBMs. These age-specific differences in GBM, which are independent of molecular subtypes, may in part explain the preferential effects of anti-angiogenic agents in older GBM and pave the way to a better understanding of the unique biology and clinical behavior of older versus younger GBMs.