
Showing papers presented at "International Conference on Bioinformatics in 2015"


Journal ArticleDOI
22 Jan 2015
TL;DR: In this article, the authors apply the Technology Acceptance Model (TAM) to investigate factors that affect customers' online purchasing behavior, such as product information, price, convenience, and perceived product or service quality.
Abstract: Various studies have examined the effects of different factors on online attitudes and behavior. Applying the Technology Acceptance Model, this study investigates factors that affect customers' online purchasing behavior. In particular, this study examines i) the effects of factors such as product information, price, convenience, and perceived product or service quality on perceived usefulness; ii) the effects of convenience, perceived product or service quality, and the desire to shop without a salesperson on perceived ease of use; iii) the effect of perceived ease of use on perceived usefulness; iv) the effects of perceived ease of use and usefulness on intentions to shop online; and v) the effect of trust on purchase intentions. The data, collected online and offline, were analyzed using factor and regression analysis and structural equation modeling. The results of this study indicate that perceived usefulness, perceived ease of use, and trust had a statistically significant effect on behavioral intention to shop on the Internet.

194 citations


Proceedings ArticleDOI
09 Sep 2015
TL;DR: This paper uses convolutional neural networks to build binary text classifiers and achieves an absolute improvement of over 3% in macro F-score over a set of selected hard-to-classify MeSH terms when compared with the best prior results on a public dataset.
Abstract: Building high accuracy text classifiers is an important task in biomedicine given the wealth of information hidden in unstructured narratives such as research articles and clinical documents. Due to large feature spaces, discriminative approaches such as logistic regression and support vector machines with n-gram and semantic features (e.g., named entities) have traditionally been used for text classification, where additional performance gains are typically made through feature selection and ensemble approaches. In this paper, we demonstrate that a more direct approach using convolutional neural networks (CNNs) outperforms several traditional approaches in biomedical text classification with the specific use-case of assigning medical subject headings (or MeSH terms) to biomedical articles. Trained annotators at the National Library of Medicine (NLM) assign, on average, 13 codes to each biomedical article, thus semantically indexing scientific literature to support NLM's PubMed search system. Recent evidence suggests that effective automated efforts for MeSH term assignment start with binary classifiers for each term. In this paper, we use CNNs to build binary text classifiers and achieve an absolute improvement of over 3% in macro F-score over a set of selected hard-to-classify MeSH terms when compared with the best prior results on a public dataset. Additional experiments on 50 high-frequency terms in the dataset also show improvements with CNNs. Our results indicate the strong potential of CNNs in biomedical text classification tasks.
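The abstract does not spell out the network architecture; as a rough, hedged sketch of a CNN-based binary text classifier of the kind described (one classifier per MeSH term), the following PyTorch code uses illustrative layer sizes and filter widths that are assumptions, not the paper's configuration.

```python
# Minimal sketch of a CNN binary text classifier (one per MeSH term).
# All hyperparameters (vocab size, embedding dim, filter widths) are
# illustrative assumptions, not the paper's actual configuration.
import torch
import torch.nn as nn

class MeshTermCNN(nn.Module):
    def __init__(self, vocab_size=50000, embed_dim=100, num_filters=100,
                 kernel_sizes=(3, 4, 5)):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # One 1-D convolution per filter width over the embedded token sequence.
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes]
        )
        self.fc = nn.Linear(num_filters * len(kernel_sizes), 1)

    def forward(self, token_ids):                    # token_ids: (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)    # (batch, embed_dim, seq_len)
        # Max-pool each feature map over the sequence dimension.
        pooled = [conv(x).relu().max(dim=2).values for conv in self.convs]
        logits = self.fc(torch.cat(pooled, dim=1)).squeeze(1)
        return logits                                # use with BCEWithLogitsLoss

model = MeshTermCNN()
dummy_batch = torch.randint(1, 50000, (8, 256))      # 8 abstracts, 256 tokens each
print(model(dummy_batch).shape)                       # torch.Size([8])
```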

118 citations


Proceedings Article
01 Jan 2015
TL;DR: In this paper, the authors reviewed the supply chain management literature to identify the new directions of this emerging field, capturing increasing concern over sustainability, and focused on creating environmentally friendly supply chains while considering the impact of environmental factors at each stage of the supply chain.
Abstract: Owing to the global importance given to sustainability, many forward-looking organisations are becoming aware of the indispensable need to implement sustainable supply chain management (SSCM) practices by taking into consideration the environmental and social impacts of the supply chain. The paper briefly reviews the supply chain management literature to identify the new directions of this emerging field, capturing increasing concern over sustainability. The paper also focuses on creating environmentally friendly supply chains while considering the impact of environmental factors at each stage of the supply chain. The findings were used as a basis to develop an integrative framework of sustainable supply chain management. The paper concludes by discussing environmental initiatives and the relevance of sustainability in the development of supply chain management.

97 citations


Journal ArticleDOI
09 Dec 2015
TL;DR: A case study indicated that the proposed method could be a feasible means of conducting preliminary analyses of protein O-GlcNAcylation; the method outperforms other O-GlcNAcylation site prediction tools and has been implemented as a web-based system, OGTSite, freely available at http://csb.cse.yzu.edu.tw/OGTSite/.
Abstract: Protein O-GlcNAcylation, involving the β-attachment of a single N-acetylglucosamine (GlcNAc) to the hydroxyl group of serine or threonine residues, is an O-linked glycosylation catalyzed by O-GlcNAc transferase (OGT). Molecular-level investigation of the basis for OGT's substrate specificity should aid in understanding how O-GlcNAc contributes to diverse cellular processes. Given the increasing number of O-GlcNAcylated peptides with site-specific information identified by mass spectrometry (MS)-based proteomics, we were motivated to characterize the substrate site motifs of O-GlcNAc transferases. In this investigation, a non-redundant dataset of 410 experimentally verified O-GlcNAcylation sites was manually extracted from dbOGAP, OGlycBase and UniProtKB. After detection of conserved motifs using maximal dependence decomposition, a profile hidden Markov model (profile HMM) was adopted to learn a first-layer model for each identified OGT substrate motif. A Support Vector Machine (SVM) was then used to generate a second-layer model learned from the output values of the first-layer profile HMMs. The two-layer predictive model was evaluated using five-fold cross-validation, which yielded a sensitivity of 85.4%, a specificity of 84.1%, and an accuracy of 84.7%. Additionally, an independent testing set from PhosphoSitePlus, which is non-homologous to the training data of the predictive model, was used to demonstrate that the proposed method provides a promising accuracy (84.05%) and outperforms other O-GlcNAcylation site prediction tools. A case study indicated that the proposed method could be a feasible means of conducting preliminary analyses of protein O-GlcNAcylation. The method has been implemented as a web-based system, OGTSite, which is freely available at http://csb.cse.yzu.edu.tw/OGTSite/.
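Assuming the first-layer profile-HMM log-odds scores have already been computed for each candidate serine/threonine site (e.g., with an external HMM tool), a minimal sketch of the second-layer SVM with scikit-learn might look like the following; the array names and shapes are hypothetical.

```python
# Sketch of the second-layer SVM trained on first-layer profile-HMM scores.
# `hmm_scores` is assumed to hold, for each candidate S/T site, one log-odds
# score per OGT substrate motif model (the first layer); names are hypothetical.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_sites, n_motifs = 410, 6                           # e.g., 410 sites, 6 motif HMMs
hmm_scores = rng.normal(size=(n_sites, n_motifs))    # placeholder first-layer output
labels = rng.integers(0, 2, size=n_sites)            # 1 = O-GlcNAcylated, 0 = not

second_layer = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
acc = cross_val_score(second_layer, hmm_scores, labels, cv=5).mean()
print(f"5-fold CV accuracy: {acc:.3f}")
```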

46 citations


Proceedings ArticleDOI
09 Sep 2015
TL;DR: This work explored the accuracy and utility of identifying and removing PCR duplicates from HTS data using Super Deduper, a pre-alignment, sequence-read-based technique developed at the University of Idaho that examines and uses only a small portion of each read's sequence in order to identify and remove PCR and/or optical duplicates.
Abstract: Our goal was to explore the accuracy and utility of identifying and removing PCR duplicates from HTS data using Super Deduper. Super Deduper is a pre-alignment, sequence-read-based technique developed at the University of Idaho, which examines and uses only a small portion of each read's sequence in order to identify and remove PCR and/or optical duplicates. Through comparisons with well-known pre- and post-alignment techniques, Super Deduper's parameters were optimized and its performance assessed. The results show that Super Deduper is a viable pre-alignment alternative to post-alignment techniques. Super Deduper is independent of both a reference genome and the choice of alignment application, allowing its use in a greater variety of HTS applications. Super Deduper is an open source application and can be found at https://github.com/dstreett/Super-Deduper.
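Super Deduper's actual key construction and defaults are documented in its repository; purely to illustrate the general idea of keying reads on a small slice of their sequence rather than on alignment coordinates, a simplified single-end FASTQ sketch could be:

```python
# Simplified illustration of sequence-based duplicate removal: reads sharing
# the same short key (a slice of the read sequence) are treated as duplicates
# and only the first occurrence is kept. Offsets/lengths are arbitrary choices,
# not Super Deduper's actual defaults.
def dedupe_fastq(in_path, out_path, start=10, length=25):
    seen = set()
    kept = total = 0
    with open(in_path) as fin, open(out_path, "w") as fout:
        while True:
            record = [fin.readline() for _ in range(4)]   # FASTQ: 4 lines per read
            if not record[0]:
                break
            total += 1
            key = record[1].strip()[start:start + length]
            if key not in seen:
                seen.add(key)
                fout.writelines(record)
                kept += 1
    print(f"kept {kept} of {total} reads")

# dedupe_fastq("sample.fastq", "sample.dedup.fastq")
```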

43 citations


Journal ArticleDOI
22 Jan 2015
TL;DR: In this paper, the authors examine strategies intended to improve employees' morale and highlight specific actions organizations can take to enhance employee engagement and trust in the aftermath of layoffs and significant reorganizations.
Abstract: In this paper we examine strategies intended to improve employees' morale and highlight specific actions organizations can take to enhance employee engagement and trust in the aftermath of layoffs and significant reorganizations.

43 citations


Proceedings ArticleDOI
01 Jan 2015
TL;DR: Single nucleotide polymorphisms (SNPs) and variants (SNVs) are often found in regulatory regions of the human genome, and nucleotide substitutions in promoter and enhancer regions may affect transcription factor binding and alter gene expression regulation.
Abstract: Single nucleotide polymorphisms (SNPs) and variants (SNVs) are often found in regulatory regions of the human genome. Nucleotide substitutions in promoter and enhancer regions may affect transcription factor (TF) binding and alter gene expression regulation. Binding patterns are now known for hundreds of human TFs. Thus one can assess possible functional effects of allele variations or mutations in TF binding sites using these known binding models.
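One common way to make such an assessment is to score both alleles of a variant against the transcription factor's position weight matrix (PWM) and compare the scores; the toy example below shows the computation with made-up matrix values.

```python
# Toy example: score a reference and an alternative allele against a PWM
# (log-odds per base per position) and compare. Matrix values are made up.
import numpy as np

BASES = {"A": 0, "C": 1, "G": 2, "T": 3}
pwm = np.array([            # shape (motif_length, 4): columns A, C, G, T
    [ 1.2, -0.8, -1.0, -0.5],
    [-0.9,  1.1, -0.7, -0.6],
    [-1.0, -0.9,  1.3, -0.8],
    [-0.4, -0.6, -0.9,  1.0],
])

def pwm_score(seq):
    return sum(pwm[i, BASES[b]] for i, b in enumerate(seq))

ref_site, alt_site = "ACGT", "ACGA"     # SNV changes the last position T -> A
delta = pwm_score(alt_site) - pwm_score(ref_site)
print(f"score ref={pwm_score(ref_site):.2f} alt={pwm_score(alt_site):.2f} "
      f"delta={delta:.2f}")             # negative delta suggests weakened binding
```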

38 citations


Proceedings Article
01 Jan 2015
TL;DR: Steganography, the art and science of writing hidden messages in such a way that no one apart from the sender and intended recipient even realizes there is a hidden message, plays an important role in information security.
Abstract: The growth of high-speed computer networks, and of the Internet in particular, has increased the ease of information communication. Ironically, the cause of this development, the use of digitally formatted data, is also a cause for apprehension. In comparison with analog media, digital media offers several distinct advantages such as high quality, easy editing, high-fidelity copying, and compression. However, this advancement in data communication has also heightened the fear of data being snooped on while in transit from sender to receiver. Information security is therefore becoming an inseparable part of data communication. Steganography plays an important role in addressing this need for information security: it is the art and science of writing hidden messages in such a way that no one apart from the sender and intended recipient even realizes there is a hidden message.
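As a concrete, minimal illustration of the idea (not any particular scheme from the paper), the following sketch hides message bits in the least significant bits of cover bytes:

```python
# Minimal least-significant-bit (LSB) steganography over a byte array.
# Each cover byte carries one bit of the secret message; a real system would
# embed in image pixels and add a length header, encryption, etc.
def embed(cover: bytearray, message: bytes) -> bytearray:
    bits = [(byte >> i) & 1 for byte in message for i in range(7, -1, -1)]
    assert len(bits) <= len(cover), "cover too small"
    stego = bytearray(cover)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit        # overwrite the lowest bit
    return stego

def extract(stego: bytearray, n_bytes: int) -> bytes:
    bits = [stego[i] & 1 for i in range(n_bytes * 8)]
    return bytes(
        sum(bit << (7 - j) for j, bit in enumerate(bits[k:k + 8]))
        for k in range(0, len(bits), 8)
    )

cover = bytearray(range(256)) * 2        # pretend pixel data
stego = embed(cover, b"hello")
print(extract(stego, 5))                 # b'hello'
```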

33 citations


Journal ArticleDOI
01 Jan 2015
TL;DR: A simple and effective approach to improve the accuracy of multiple sequence alignment by using a natural measure to estimate the similarity of the input sequences; based on this measure, the input sequences are aligned differently.
Abstract: This paper introduces a simple and effective approach to improve the accuracy of multiple sequence alignment. We use a natural measure to estimate the similarity of the input sequences, and based on this measure, we align the input sequences differently. For example, for inputs with high similarity, we consider the whole sequences and align them globally, while for those with moderately low similarity, we may ignore the flanking regions and align them locally. To test the effectiveness of this approach, we have implemented a multiple sequence alignment tool called GLProbs and compared its performance with about one dozen leading alignment tools on three benchmark alignment databases; GLProbs's alignments have the best scores in almost all tests. We have also evaluated the practicability of GLProbs's alignments by applying the tool to three biological applications, namely phylogenetic tree construction, protein secondary structure prediction, and the detection of high-risk members for cervical cancer in the HPV-E6 family, and the results are very encouraging.
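GLProbs' actual similarity measure and alignment engine are defined in the paper; the sketch below only illustrates the decision logic of choosing an alignment strategy from an estimated similarity level, with the identity estimate and thresholds being arbitrary stand-ins.

```python
# Illustration of similarity-adaptive alignment strategy selection.
# The identity estimate and the 40%/25% thresholds are arbitrary stand-ins
# for GLProbs' actual measure and cut-offs.
from itertools import combinations

def pairwise_identity(a, b):
    n = min(len(a), len(b))
    return sum(a[i] == b[i] for i in range(n)) / n

def choose_strategy(sequences):
    idents = [pairwise_identity(a, b) for a, b in combinations(sequences, 2)]
    avg = sum(idents) / len(idents)
    if avg >= 0.40:
        return avg, "global"       # high similarity: align whole sequences
    elif avg >= 0.25:
        return avg, "global+local refinement"
    return avg, "local"            # low similarity: ignore flanking regions

seqs = ["MKTAYIAKQR", "MKTAYIGKQR", "MKSAYLAKQK"]
print(choose_strategy(seqs))       # (0.733..., 'global')
```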

32 citations


Journal ArticleDOI
10 May 2015
TL;DR: A framework in which to consider penalized regression approaches to variable selection for causal effects is described; it leads to a simple 'impute, then select' class of procedures that is agnostic to the type of imputation algorithm as well as the penalized regression used.
Abstract: A recent topic of much interest in causal inference is model selection. In this article, we describe a framework in which to consider penalized regression approaches to variable selection for causal effects. The framework leads to a simple ‘impute, then select’ class of procedures that is agnostic to the type of imputation algorithm as well as penalized regression used. It also clarifies how model selection involves a multivariate regression model for causal inference problems and that these methods can be applied for identifying subgroups in which treatment effects are homogeneous. Analogies and links with the literature on machine learning methods, missing data, and imputation are drawn. A difference least absolute shrinkage and selection operator algorithm is defined, along with its multiple imputation analogs. The procedures are illustrated using a well-known right-heart catheterization dataset. Copyright © 2015 John Wiley & Sons, Ltd.
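As a hedged illustration of the 'impute, then select' idea (impute each subject's unobserved potential outcome with per-arm outcome models, form individual treatment-effect differences, then run a penalized regression of those differences on covariates), one could write something like the following with scikit-learn; this is a simplified stand-in, not the paper's exact difference-lasso estimator.

```python
# Hedged sketch of an 'impute, then select' procedure for treatment-effect
# modifiers: impute the unobserved potential outcome with outcome models fit
# per arm, then lasso the individual differences on covariates. This is a
# simplified stand-in for the paper's difference lasso, not a reproduction.
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

rng = np.random.default_rng(1)
n, p = 500, 20
X = rng.normal(size=(n, p))
treat = rng.integers(0, 2, size=n)
# Simulated outcome: the treatment effect depends only on the first two covariates.
y = (X @ rng.normal(size=p)
     + treat * (1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1])
     + rng.normal(scale=0.5, size=n))

m1 = LinearRegression().fit(X[treat == 1], y[treat == 1])   # outcome model, treated
m0 = LinearRegression().fit(X[treat == 0], y[treat == 0])   # outcome model, control

# Impute the missing potential outcome for every subject, then take differences.
y1 = np.where(treat == 1, y, m1.predict(X))
y0 = np.where(treat == 0, y, m0.predict(X))
diff = y1 - y0

lasso = LassoCV(cv=5).fit(X, diff)       # select covariates that modify the effect
print("selected:", np.flatnonzero(lasso.coef_ != 0))
```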

31 citations


Proceedings ArticleDOI
09 Sep 2015
TL;DR: This paper proposes a framework to automatically identify and classify regions of bacterial colony images and correspond them across different images from different contexts, and demonstrates that this method outperforms other classical methods on segmentation and classification.
Abstract: Bacterial image segmentation and classification is an important problem because bacterial appearance can vary dramatically based on environmental conditions. Further, newly isolated species may exhibit phenotypes previously unseen. Conventionally, biologists identify bacteria using colony morphology, biochemical properties, or molecular phylogenetic approaches. However, these phylogenetic classification approaches do not provide predictive information about the colony morphology that is expected to result from the growth of these bacteria on agar, or how these morphological phenotypes might vary in the presence of other bacterial species. In this paper, we propose a framework to automatically identify and classify regions of bacterial colony images and correspond them across different images from different contexts. Importantly, this approach does not require prior knowledge of species' appearances. Rather, our method assumes that images contain one or more bacteria from a pool of bacteria, and learns morphological features relevant to distinguishing between bacteria in this pool. Our method first segments the image into regions covering the bacterial colonies, agar, plate, and various border artefacts. To achieve this, we use an unsupervised deep learning technique, Convolutional Deep Belief Network (CDBN). This technique provides a deep representation of small image patches. Using this high-level representation instead of raw pixel intensities, we train a support vector machine (SVM). The trained SVM accurately classifies foreground and background patches. Once the foreground patches are identified, we train a supervised deep learning method, Convolutional Neural Network (CNN), that predicts which bacterial colonies from the pool occur in a query image. Experimental results demonstrate that our method outperforms other classical methods on segmentation and classification.

Journal ArticleDOI
30 Jul 2015
TL;DR: In this paper, the authors present information on the prevalence of unethical behavior, antecedents of unethical behavior, the organizational environment, cognitive moral development, and trends of unethical behavior over a 3-year span.
Abstract: A substantial body of research reveals that unethical behavior continues to be a concern in the workplace. This article presents information on the prevalence of unethical behavior, antecedents of unethical behavior, the organizational environment, cognitive moral development, and trends of unethical behavior over a 3-year span. Findings of earlier studies generally agree that unethical behavior has a negative effect in the workplace. The greatest research effort on this issue has been to continue conducting studies within organizations, identifying whether the issue of unethical behavior is improving or stagnant.

Proceedings Article
01 Jan 2015
TL;DR: The E-commerce market was valued at INR 81,525 crores by the end of December 2014, registering a growth of 53% over the previous year, and is expected to grow at a rate of 33% during 2015 to cross INR 1 lakh crores by the end of 2015.
Abstract: India is one of the fastest growing economies, with a growing middle class and increasing penetration of consumer goods and technology. It has one of the largest and fastest growing internet user populations, with 190 million active internet users as of June 2014, the third largest after China and the US, and it is estimated that India will have the second largest internet population, of over 500 million users, by 2018. There has been an increase in the level of income, priority toward education, and changing lifestyles. This has opened an abundance of investment and growth opportunities for E-Commerce in India. Many websites have been launched in India over the past few years, facilitating the sale of almost everything. The E-Commerce market was valued at INR 81,525 crores by the end of December 2014, registering a growth of 53% over the previous year, and is expected to grow at a rate of 33% during 2015 to cross INR 1 lakh crores by the end of 2015. However, in order to sustain E-Commerce growth in the long term, structural reforms are required.

Proceedings ArticleDOI
09 Sep 2015
TL;DR: This paper proposes two methods for constructing a network of features to be used by a Bayesian Network Augmented Naïve Bayes (BAN) classifier on datasets of aging-related genes, where Gene Ontology terms are used as hierarchically related predictive features.
Abstract: In the context of the classification task of data mining or machine learning, hierarchical feature selection methods exploit hierarchical relationships among features in order to select a subset of features without hierarchical redundancy. Hierarchical feature selection is a new research area in classification research, since nearly all feature selection methods ignore hierarchical relationships among features. This paper proposes two methods for constructing a network of features to be used by a Bayesian Network Augmented Naïve Bayes (BAN) classifier, in datasets of aging-related genes where Gene Ontology (GO) terms are used as hierarchically related predictive features. One of the BAN network construction methods relies on a hierarchical feature selection method to detect and remove hierarchical redundancies among features (GO terms), whilst the other simply uses a conventional, flat feature selection method to select features, without removing the hierarchical redundancies associated with the GO. Both BAN network construction methods may create new edges among nodes (features) in the BAN network that did not exist in the original GO DAG (Directed Acyclic Graph), in order to preserve the generalization-specialization (ancestor-descendant) relationships among selected features. Experiments comparing these two BAN network construction methods, using two different hierarchical feature selection methods and one flat feature selection method, have shown that the best results are obtained by the BAN network construction method using one particular hierarchical feature selection method, namely the method that selects Hierarchical Information-Preserving features (HIP).
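HIP is defined precisely in the hierarchical feature selection literature; as a loose illustration of what removing hierarchical redundancy among GO-term features means, the sketch below drops any selected term that is an ancestor of another selected term, keeping only the most specific terms. The tiny DAG is invented.

```python
# Loose illustration of removing hierarchical redundancy among GO-term features:
# keep only the most specific selected terms (drop any term that is an ancestor
# of another selected term). The tiny DAG is invented for the example.
parents = {                      # child GO term -> set of direct parents
    "GO:C": {"GO:B"},
    "GO:B": {"GO:A"},
    "GO:D": {"GO:A"},
    "GO:A": set(),
}

def ancestors(term):
    result, stack = set(), list(parents.get(term, ()))
    while stack:
        t = stack.pop()
        if t not in result:
            result.add(t)
            stack.extend(parents.get(t, ()))
    return result

def remove_hierarchical_redundancy(selected):
    redundant = set().union(*(ancestors(t) for t in selected))
    return [t for t in selected if t not in redundant]

print(remove_hierarchical_redundancy(["GO:A", "GO:B", "GO:C", "GO:D"]))
# ['GO:C', 'GO:D']  (GO:A and GO:B are ancestors of more specific selections)
```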

Proceedings ArticleDOI
09 Sep 2015
TL;DR: An easy-to-use system, built on top of Apache Spark, for large-scale interactive analysis and prediction of microRNA targets, with superior prediction performance over state-of-the-art protocols.
Abstract: Background: MicroRNAs are small non-coding endogenous RNAs that are responsible for post-transcriptional regulation of genes. Given that large numbers of human genes are targeted by microRNAs, understanding the precise mechanism of microRNA action and accurately mapping their targets is of paramount importance; this will uncover the role of microRNAs in development, differentiation, and disease pathogenesis. However, the current state-of-the-art computational methods for microRNA target prediction suffer from false-positive rates too high to be useful in practice. Results: In this paper, we develop a suite of models for microRNA target prediction, under the banner Avishkar, that have superior prediction performance over the state-of-the-art protocols. Specifically, our final model achieves an average true positive rate of more than 75%, at a false positive rate of 20%, for non-canonical microRNA target sites in humans. This is an improvement of over 150% in the true positive rate for non-canonical sites over the best competing protocol. We are able to achieve such superior performance by representing the thermodynamic and sequence profiles of microRNA-mRNA interaction as curves, introducing a novel metric of seed enrichment to model seed matches as well as all possible non-canonical matches, and learning an ensemble of microRNA family-specific non-linear SVM classifiers. We provide an easy-to-use system, built on top of Apache Spark, for large-scale interactive analysis and prediction of microRNA targets. All operations in our system, namely candidate set generation, feature generation and transformation, training, prediction, and computing performance metrics, are fully distributed and scalable. Availability: All source code and sample data are available at https://bitbucket.org/cellsandmachines/avishkar. We also provide scalable implementations of kernel SVM using Apache Spark, which can be used to solve large-scale non-linear binary classification problems, at https://bitbucket.org/cellsandmachines/kernelsvmspark.
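Avishkar's feature engineering and Spark implementation are in the linked repositories; purely to show the shape of a microRNA-family-specific SVM ensemble, a small scikit-learn sketch with hypothetical feature arrays and family labels could be:

```python
# Sketch of a family-specific ensemble: one non-linear SVM per microRNA family,
# with predictions routed by family. Feature arrays and family labels are
# hypothetical placeholders, not Avishkar's actual thermodynamic/sequence curves.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
families = ["let-7", "miR-21", "miR-155"]
models = {}
for fam in families:
    X_fam = rng.normal(size=(200, 30))          # placeholder feature vectors
    y_fam = rng.integers(0, 2, size=200)        # 1 = functional target site
    models[fam] = SVC(kernel="rbf", probability=True).fit(X_fam, y_fam)

def predict_site(family, features):
    return models[family].predict_proba(features.reshape(1, -1))[0, 1]

print(predict_site("miR-21", rng.normal(size=30)))
```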

Proceedings ArticleDOI
09 Sep 2015
TL;DR: A logical exposition of Dirac 3-polarizer experiment is presented in bipolar quantum geometry (BQG) and potential impact of the findings on computational biology is discussed.
Abstract: A logical exposition of Dirac 3-polarizer experiment is presented in bipolar quantum geometry (BQG). Potential impact of the findings on computational biology is discussed.

Journal ArticleDOI
20 Apr 2015
TL;DR: This study implements an enhanced cloud-based framework for e-learning systems that can be applied wherever there is a need for intensive teaching and learning in higher education institutions, to improve their educational process.
Abstract: Cloud Computing (CC) is an extension of a paradigm where the capabilities of applications are exposed as services. CC has spread dramatically because of the different features that make it the target of most cloud service providers. In cloud computing, data is stored in the storage area and accessed by organizations on demand over the internet. Cloud computing with virtualization removes additional operational expense and capital investment in assets by automating the requested services. Currently, cloud computing has become a technology with the potential for coping with the problems of e-learning. This study, therefore, attempts to explore the possible effects and measures of how educational institutions can benefit from CC. For that purpose, this study implements an enhanced framework for e-learning systems based on the cloud. This enhanced framework can be applied wherever there is a need for intensive teaching and learning in higher education institutions, to improve their educational process. The case study findings on the adoption of the enhanced framework matched the study's expectations: student satisfaction increased significantly compared with the existing system.

Proceedings ArticleDOI
09 Sep 2015
TL;DR: A novel framework for biomedical text mining based on a learning multi-agent system comprising several software agents, which is able to appropriately learn the sentiment score related to specific keywords through parallel and distributed analysis of documents by multiple software agents.
Abstract: Due to the expanding growth of information in the biomedical literature and biomedical databases, researchers and practitioners in the biomedical field require efficient methods of handling and extracting useful information. We present a novel framework for biomedical text mining based on a learning multi-agent system. Our distributed system comprises several software agents, where each agent uses a reinforcement learning method to update the sentiment of a relevant text from a particular set of research articles related to specific keywords. Our system was tested on biomedical research articles from PubMed, where the goal of each agent is to accrue utility by correctly determining the relevant information that is communicated with other agents. Our results on abstracts collected from PubMed related to muscular atrophy, Alzheimer's disease, and diabetes show that our system is able to appropriately learn the sentiment score related to specific keywords through parallel and distributed analysis of the documents by multiple software agents.

Proceedings ArticleDOI
09 Sep 2015
TL;DR: The ORBM model is the first ontology-based deep learning approach in health informatics for human behavior prediction, and experiments conducted on both real and synthetic data from health social networks have shown the effectiveness of the approach.
Abstract: Human behavior prediction is a key component to studying the spread of wellness and healthy behavior in a social network. In this paper, we introduce an ontology-based Restricted Boltzmann Machine (ORBM) model for human behavior prediction in health social networks. We first propose a bottom-up algorithm to learn the user representation from ontologies. Then the user representation is used to incorporate self-motivation, social influences, and environmental events together in a human behavior prediction model, which extends a well-known deep learning method, Restricted Boltzmann Machines (RBMs), so that the interactions among the behavior determinants are naturally simulated through parameters. To the best of our knowledge, the ORBM model is the first ontology-based deep learning approach in health informatics for human behavior prediction. Experiments conducted on both real and synthetic data from health social networks have shown the tremendous effectiveness of our approach compared with conventional methods.

Journal ArticleDOI
31 May 2015
TL;DR: The study shows that efficiency varies among algorithms, which helps to suggest which of them ought to be used to solve a specific variant of the shortest path problem.
Abstract: The technological revolution has played an active role in increasing computing information, growing the computational capabilities of devices, and raising the level of knowledge, abilities, and skills, alongside increasing developments in science and technology. In graphs, shortest path algorithms are used to solve the shortest path problem, which can be the single-pair shortest path problem or the all-pairs shortest path problem. This paper briefly discusses shortest path algorithms such as Dijkstra's algorithm, the Bellman-Ford algorithm, the Floyd-Warshall algorithm, and Johnson's algorithm, and describes previous algorithms for solving the shortest path problem. The goal of this paper is to investigate and compare the impacts of different shortest path algorithms. The study shows that efficiency varies among algorithms, which helps to suggest which of them ought to be used to solve a specific variant of the shortest path problem.
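For readers unfamiliar with the algorithms compared, a compact version of Dijkstra's single-source shortest path algorithm, the first of the four discussed, is shown below on a toy graph.

```python
# Dijkstra's single-source shortest path algorithm with a binary heap.
# Works on graphs with non-negative edge weights (use Bellman-Ford otherwise).
import heapq

def dijkstra(graph, source):
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                       # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

graph = {"A": [("B", 4), ("C", 1)], "C": [("B", 2), ("D", 5)], "B": [("D", 1)]}
print(dijkstra(graph, "A"))   # {'A': 0, 'B': 3, 'C': 1, 'D': 4}
```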

Proceedings ArticleDOI
09 Sep 2015
TL;DR: Nine pipelines are constructed, consisting of nine spliced aligners and one quantifier; it is found that FalseExpNum and FalseFcNum are correlated across pipelines, and that the percentage of reads aligned and ZeroMismatchPercentage may be used to assess the performance of gene expression estimation for all RNA-seq datasets.
Abstract: While numerous RNA-seq data analysis pipelines are available, research has shown that the choice of pipeline influences the results of differentially expressed gene detection and gene expression estimation. Gene expression estimation is a key step in RNA-seq data analysis, since the accuracy of gene expression estimates profoundly affects the subsequent analysis. Generally, gene expression estimation involves sequence alignment and quantification, and accurate gene expression estimation requires accurate alignment. However, the impact of aligners on gene expression estimation remains unclear. We address this need by constructing nine pipelines consisting of nine spliced aligners and one quantifier. We then use simulated data to investigate the impact of aligners on gene expression estimation. To evaluate alignment, we introduce three alignment performance metrics, (1) the percentage of reads aligned, (2) the percentage of reads aligned with zero mismatch (ZeroMismatchPercentage), and (3) the percentage of reads aligned with at most one mismatch (ZeroOneMismatchPercentage). We then evaluate the impact of alignment performance on gene expression estimation using three metrics, (1) gene detection accuracy, (2) the number of genes falsely quantified (FalseExpNum), and (3) the number of genes with falsely estimated fold changes (FalseFcNum). We found that among various pipelines, FalseExpNum and FalseFcNum are correlated. Moreover, FalseExpNum is linearly correlated with the percentage of reads aligned and ZeroMismatchPercentage, and FalseFcNum is linearly correlated with ZeroMismatchPercentage. Because of this correlation, the percentage of reads aligned and ZeroMismatchPercentage may be used to assess the performance of gene expression estimation for all RNA-seq datasets.
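The abstract does not spell out how the metrics are computed; assuming a BAM file in which the aligner records edit distance in the standard NM tag, the three alignment metrics could be approximated roughly as follows with pysam (the file name is a placeholder, and NM also counts indels, so this is only an approximation of mismatch counts).

```python
# Rough computation of the three alignment metrics from a BAM file, assuming
# the aligner writes the standard NM (edit distance) tag. File name is a
# placeholder; reads lacking NM are skipped here for simplicity.
import pysam

total = aligned = zero_mm = le_one_mm = 0
with pysam.AlignmentFile("sample.bam", "rb") as bam:
    for read in bam.fetch(until_eof=True):
        if read.is_secondary or read.is_supplementary:
            continue
        total += 1
        if read.is_unmapped:
            continue
        aligned += 1
        if read.has_tag("NM"):
            nm = read.get_tag("NM")
            zero_mm += nm == 0
            le_one_mm += nm <= 1

print(f"percent aligned:           {100 * aligned / total:.2f}")
print(f"ZeroMismatchPercentage:    {100 * zero_mm / total:.2f}")
print(f"ZeroOneMismatchPercentage: {100 * le_one_mm / total:.2f}")
```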

Journal ArticleDOI
25 Sep 2015
TL;DR: A new architecture and model are proposed for the RSA public key algorithm; the suggested system uses 1024-bit RSA encryption/decryption for restricted systems and uses the multiply-and-square algorithm to perform the modular operations.
Abstract: The RSA cryptographic algorithm is used to encrypt and decrypt messages so that they can be sent securely over a transmission channel such as the internet. The RSA algorithm is a secure, high-quality, public key algorithm. In this paper, a new architecture and model are proposed for the RSA public key algorithm; the suggested system uses 1024-bit RSA encryption/decryption for restricted systems. The system uses the multiply-and-square algorithm to perform the modular operations. The design has been described in VHDL and simulated using the Xilinx ISE 12.2 tool. The architecture has been implemented on reconfigurable FPGA platforms. Implementation on a Xilinx Spartan-3 (device XC3S50, package PG208, speed grade -4) confirms that the proposed architecture has minimal hardware resource usage: only 29% of the chip resources are used for the RSA algorithm design, with a realizable operating clock frequency of 68.573 MHz.
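The multiply-and-square (square-and-multiply) method referred to computes modular exponentiation one exponent bit at a time, which is what makes a 1024-bit RSA datapath tractable; a small software version of the same loop, using the classic toy RSA parameters, is shown below.

```python
# Left-to-right square-and-multiply modular exponentiation, the software analogue
# of the hardware loop used for RSA: one squaring per exponent bit, plus a
# multiply when the bit is 1.
def modexp(base, exponent, modulus):
    result = 1
    for bit in bin(exponent)[2:]:                    # scan exponent bits MSB -> LSB
        result = (result * result) % modulus         # square
        if bit == "1":
            result = (result * base) % modulus       # multiply
    return result

# Tiny RSA round trip with textbook toy parameters (p=61, q=53, n=3233, e=17, d=2753).
n, e, d = 3233, 17, 2753
message = 65
cipher = modexp(message, e, n)
print(cipher, modexp(cipher, d, n))                  # 2790 65
```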

Proceedings ArticleDOI
09 Sep 2015
TL;DR: An integer programming solution to the Persistent-Phylogeny Problem is developed, its efficiency is explored empirically, and the utility of using fast algorithms that recognize galled trees to recognize persistent phylogenies is explored.
Abstract: The Persistent-Phylogeny Model is an extension of the widely studied Perfect-Phylogeny Model, encompassing a broader range of evolutionary phenomena. Biological and algorithmic questions concerning persistent phylogeny have been intensely investigated in recent years. In this paper, we explore two alternative approaches to the persistent-phylogeny problem that grow out of our previous work on perfect phylogeny and on galled trees. We develop an integer programming solution to the Persistent-Phylogeny Problem; empirically explore its efficiency; and empirically explore the utility of using fast algorithms that recognize galled trees to recognize persistent phylogenies. The empirical results identify parameter ranges where persistent phylogenies are galled trees with high frequency, and show that the integer programming approach can efficiently identify persistent phylogenies of much larger size than has previously been reported.

Proceedings ArticleDOI
09 Sep 2015
TL;DR: This study presents the exact computation of the expected value and variance of the number of occurrences of key motifs in probabilistic networks, as well as a specialized sampling approximate method for computing the variance for very large networks.
Abstract: Studying the distribution of motifs in biological networks provides valuable insights about the key functions of these networks. Finding motifs in networks is however a computationally challenging task. This task is further complicated by the fact that inherently, biological networks have uncertain topologies. Such uncertainty is often described using probabilistic network models. In this study we tackle this challenge. We present the exact computation of the expected value and variance of the number of occurrences of key motifs in probabilistic networks, as well as a specialized sampling approximate method for computing the variance for very large networks. Our method is generic, and easily extends to arbitrary motif topologies. Our experiments demonstrate that our method scales to large protein interaction networks as well as synthetically generated networks with different connectivity patterns. Using our method, we identify over-represented motifs in protein-protein interaction networks of five different organisms, as well as in human transcription regulatory networks of different human cells with different lineages.
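By linearity of expectation, the expected number of occurrences of a motif in a probabilistic network is the sum, over all candidate occurrences, of the product of the involved edge probabilities. The toy triangle-counting example below shows that computation; the paper's variance and sampling machinery is not reproduced here.

```python
# Exact expected number of triangles in a small probabilistic network:
# by linearity of expectation, sum over all vertex triples of the product of
# the three edge probabilities (0 if an edge is absent). Toy graph only.
from itertools import combinations

edge_prob = {            # undirected edges with existence probabilities
    ("A", "B"): 0.9, ("B", "C"): 0.8, ("A", "C"): 0.5,
    ("C", "D"): 0.7, ("B", "D"): 0.4,
}

def p(u, v):
    return edge_prob.get((u, v), edge_prob.get((v, u), 0.0))

nodes = {u for e in edge_prob for u in e}
expected_triangles = sum(
    p(a, b) * p(b, c) * p(a, c) for a, b, c in combinations(sorted(nodes), 3)
)
print(expected_triangles)   # 0.9*0.8*0.5 + 0.8*0.7*0.4 = 0.584
```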

Proceedings ArticleDOI
01 Jan 2015
TL;DR: A new classification method is proposed based on L2-regularization of group means and the pooled covariance matrix, accompanied by an efficient algorithm for its computation.
Abstract: This paper is focused on regularized versions of classification analysis and their computation for high-dimensional data. A variety of regularized classification methods has been proposed, and we critically discuss their computational aspects. We formulate several new algorithms for regularized linear discriminant analysis, which exploit a regularized covariance matrix estimator shrunk towards a regular target matrix. Numerical linear algebra considerations are used to propose tailor-made algorithms for specific choices of the target matrix. Further, we propose a new classification method based on L2-regularization of group means and the pooled covariance matrix, and accompany it by an efficient algorithm for its computation.
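The paper's specific estimators and target matrices are its contribution; as a generic illustration of regularized linear discriminant analysis with a pooled covariance shrunk toward a scaled identity target, a numpy sketch could look like the following (the shrinkage weight is arbitrary).

```python
# Generic regularized LDA sketch: shrink the pooled covariance toward an
# identity-like target so it stays invertible when p is large relative to n,
# then classify with the usual linear discriminant rule. Lambda is arbitrary here.
import numpy as np

def fit_rlda(X, y, lam=0.5):
    classes = np.unique(y)
    means = {c: X[y == c].mean(axis=0) for c in classes}
    centered = np.vstack([X[y == c] - means[c] for c in classes])
    pooled = centered.T @ centered / (len(X) - len(classes))
    target = np.trace(pooled) / X.shape[1] * np.eye(X.shape[1])
    sigma = (1 - lam) * pooled + lam * target          # regularized estimator
    precision = np.linalg.inv(sigma)
    return classes, means, precision

def predict_rlda(model, X):
    classes, means, precision = model
    scores = np.column_stack([
        X @ precision @ means[c] - 0.5 * means[c] @ precision @ means[c]
        for c in classes
    ])
    return classes[np.argmax(scores, axis=1)]

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1, (30, 50)), rng.normal(0.7, 1, (30, 50))])
y = np.repeat([0, 1], 30)
model = fit_rlda(X, y)
print((predict_rlda(model, X) == y).mean())    # training accuracy of the sketch
```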

Proceedings Article
01 Jan 2015
TL;DR: In India, around 40% of people lack access to even basic financial services like savings, credit, and insurance facilities, as discussed by the authors; India is second only to China in the number of people excluded from financial facilities.
Abstract: Financial inclusion has been a buzzword for policymakers and governments for a long time. Attempts have been made by policymakers and financial institutions to bring large sections of the rural population within the banking system, having realized that financial inclusion is the essence of sustainable economic growth and development in a country like India. Inclusive growth is impossible without financial inclusion, and financial inclusion is also a must for the economic development of the country: without it we cannot think of economic development, because a large chunk of the total population remains outside the growth process. Though the country's economy is growing at a single-digit rate, the growth is not inclusive, with the economic condition of people in rural areas worsening further. One of the typical reasons for poverty is being financially excluded. Though there are a few people who enjoy all kinds of services, from savings to net banking, around 40% of people in the country still lack access to even basic financial services like savings, credit, and insurance facilities. India is second only to China in the number of people excluded from financial facilities. Even after 68 years of independence, around ten crore households are not connected with banking.

Proceedings ArticleDOI
09 Sep 2015
TL;DR: The experimental High Throughput Sequencing (expHTS) pipeline mitigates these bottlenecks by streaming external application data, speeding analysis, and reducing storage space by eliminating intermediate, un-analyzed files.
Abstract: Analysis time and storage space are critical bottlenecks when analyzing multi-sample, comparative high throughput sequencing experiments. The experimental High Throughput Sequencing (expHTS) pipeline mitigates these bottlenecks by streaming external application data, speeding analysis, and reducing storage space by eliminating intermediate, un-analyzed files. This results in faster run times, less storage space used on the machine, and improved uniformity of sample analysis.

Journal ArticleDOI
17 Mar 2015
TL;DR: In this article, the authors discuss the degree of centralization and decentralization of IT authority and the ability to standardize. But they do not address the impact of standardization on local needs.
Abstract: Information Technology (IT) governance deals with how decision-making authority concerning IT is distributed across the firm. Based upon that distribution of authority, different behaviors and consequent decisions can be observed. A fundamental question is the degree of centralization and decentralization of that authority and the ability to standardize. Standardization often has benefits, but there can be negative repercussions if the standard does not take into account localized needs. The dominant governance mechanism has changed as technology and available tradeoffs have changed.

Journal ArticleDOI
22 Jan 2015
TL;DR: In this paper, the authors report the use of social media tools for virtual teaming for a large Vietnamese bank, Navibank, in order to improve communication and trust relationships among virtual project teams.
Abstract: In recent years, social networking sites have witnessed a significant growth in popularity and membership. These social networking sites like Facebook or LinkedIn use tools that also appeal to businesses. One important area of business application of social networking is to improve communication and trust relationships among virtual project teams (Sarker, Ahuja & Kirkeby, 2011; Anantatmula & Thomas, 2010). In this paper, the authors report the use of social media tools for virtual teaming for a large Vietnamese bank, Navibank. The paper delves into the strategic initiative and expansion into global markets using virtual project teams that communicated mostly through social media tools and services. Navibank (hereafter known as the Bank) is one of the top ten largest commercial banks in Vietnam. In a large project of opening ten branches of the Bank in Cambodia by the end of 2015, social media tools like Skype, e-mail, Facebook, instant messaging and video conference are being used by the Bank teams in Vietnam to communicate and work effectively with the new teams that are being formed in Cambodia.

Proceedings ArticleDOI
09 Sep 2015
TL;DR: This work considers a Bayesian hierarchical model for feature selection in which a prior describes the identity of feature sets as well as their underlying class-conditional distribution under an independence assumption, and results in optimal Bayesian feature filtering.
Abstract: Recent work proposes a Bayesian hierarchical model for feature selection in which a prior describes the identity of feature sets as well as their underlying class-conditional distribution. In this work, we consider this model under an independence assumption on both feature identities, and the underlying class-conditional distribution of each feature. This framework results in optimal Bayesian feature filtering. Closed form solutions, which are applicable to high dimensional data with low computation cost, can be found. In addition, this model may be used to provide feedback on the quality of any feature set via closed-form estimators of the expected number of good features missed and the expected number of bad features selected. Synthetic data simulations depict outstanding performance of the optimal Bayesian feature filter relative to other popular feature selection methods. We also observe robustness with respect to assumptions in the model, particularly the independence assumption, and robustness under non-informative priors on the underlying class-conditional distributions. Furthermore, application of the optimal Bayesian feature filter on gene expression microarray datasets provides a gene list in which markers with known links to the cancer in question are highly ranked.