
Showing papers by "Svetha Venkatesh published in 2019"


Proceedings ArticleDOI
01 Jan 2019
TL;DR: The proposed memory-augmented autoencoder, MemAE, makes no assumptions about the data type and can therefore be applied to a range of tasks; experiments demonstrate its strong generalization and high effectiveness.
Abstract: Deep autoencoders have been used extensively for anomaly detection. Trained on normal data, an autoencoder is expected to produce higher reconstruction error for abnormal inputs than for normal ones, and this error is adopted as a criterion for identifying anomalies. However, this assumption does not always hold in practice. It has been observed that the autoencoder sometimes "generalizes" so well that it also reconstructs anomalies faithfully, leading to missed detections. To mitigate this drawback of autoencoder-based anomaly detectors, we propose to augment the autoencoder with a memory module, yielding an improved model called the memory-augmented autoencoder (MemAE). Given an input, MemAE first obtains an encoding from the encoder and then uses it as a query to retrieve the most relevant memory items for reconstruction. During training, the memory contents are updated and encouraged to represent prototypical elements of the normal data. At test time, the learned memory is fixed, and the reconstruction is obtained from a few selected memory records of the normal data. The reconstruction therefore tends to be close to a normal sample, so the reconstruction errors on anomalies are amplified, aiding detection. MemAE makes no assumptions about the data type and can thus be applied to different tasks. Experiments on various datasets demonstrate the excellent generalization and high effectiveness of the proposed MemAE.
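The memory-addressing step described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the memory size, the hard-shrinkage threshold, and the similarity measure here are all assumed for the example.

```python
import numpy as np

def memory_read(z, memory, shrink_thres=0.2):
    """Attention-based read: use the encoding z as a query over memory items.

    z: (d,) query encoding from the encoder; memory: (n_items, d) learned
    prototypes of normal data. Returns z_hat, a sparse convex combination of
    memory items that is fed to the decoder for reconstruction.
    """
    sim = memory @ z                         # similarity of query to each item
    w = np.exp(sim - sim.max())
    w = w / w.sum()                          # softmax attention weights
    w = np.where(w > shrink_thres, w, 0.0)   # hard shrinkage -> sparse addressing
    w = w / max(w.sum(), 1e-12)              # re-normalise surviving weights
    return w @ memory                        # z_hat built only from normal prototypes

# toy check: a query close to one prototype reads back exactly that prototype
memory = np.eye(4)                           # 4 orthogonal prototype items
z = np.array([0.9, 0.05, 0.03, 0.02])
z_hat = memory_read(z, memory)
```

Because the reconstruction is forced through a few normal prototypes, an anomalous encoding cannot be reproduced exactly, which is what inflates its reconstruction error.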

888 citations


Journal ArticleDOI
06 Sep 2019
TL;DR: Four opportunities for research directed toward clinical relevance are identified: exploring intermediate outcomes and underlying disease mechanisms; focusing on purposes that are likely to be used in clinical practice; anticipating quality and safety barriers to adoption; and exploring the potential for digital personalized medicine arising from the integration of digital phenotyping and digital interventions.
Abstract: The use of data generated passively by personal electronic devices, such as smartphones, to measure human function in health and disease has generated significant research interest. Particularly in psychiatry, objective, continuous quantitation using patients’ own devices may result in clinically useful markers that can be used to refine diagnostic processes, tailor treatment choices, improve condition monitoring for actionable outcomes, such as early signs of relapse, and develop new intervention models. If a principal goal for digital phenotyping is clinical improvement, research needs to attend now to factors that will help or hinder future clinical adoption. We identify four opportunities for research directed toward this goal: exploring intermediate outcomes and underlying disease mechanisms; focusing on purposes that are likely to be used in clinical practice; anticipating quality and safety barriers to adoption; and exploring the potential for digital personalized medicine arising from the integration of digital phenotyping and digital interventions. Clinical relevance also means explicitly addressing consumer needs, preferences, and acceptability as the ultimate users of digital phenotyping interventions. There is a risk that, without such considerations, the potential benefits of digital phenotyping are delayed or not realized because approaches that are feasible for application in healthcare, and the evidence required to support clinical commissioning, are not developed. Practical steps to accelerate this research agenda include the further development of digital phenotyping technology platforms focusing on scalability and equity, establishing shared data repositories and common data standards, and fostering multidisciplinary collaborations between clinical stakeholders (including patients), computer scientists, and researchers.

164 citations


Proceedings ArticleDOI
01 Jan 2019
TL;DR: A new method is proposed to model the normal patterns of human movements in surveillance video for anomaly detection using dynamic skeleton features, decomposing skeletal movements into two sub-components: global body movement and local body posture.
Abstract: Appearance features have been widely used in video anomaly detection even though they contain complex entangled factors. We propose a new method to model the normal patterns of human movements in surveillance video for anomaly detection using dynamic skeleton features. We decompose the skeletal movements into two sub-components: global body movement and local body posture. We model the dynamics and interaction of the coupled features in our novel Message-Passing Encoder-Decoder Recurrent Network. We observed that the decoupled features collaboratively interact in our spatio-temporal model to accurately identify human-related irregular events from surveillance video sequences. Compared to traditional appearance-based models, our method achieves superior outlier detection performance. Our model also offers “open-box” examination and decision explanation made possible by the semantically understandable features and a network architecture supporting interpretability.
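The global/local decomposition described above can be illustrated with a small sketch. The bounding-box normalisation used here is an assumed, simplified stand-in for the paper's exact feature extraction.

```python
import numpy as np

def decompose_skeleton(joints):
    """Split a 2D skeleton into a global component and a local posture.

    joints: (n_joints, 2) array of (x, y) image coordinates.
    Global component: bounding-box centre and scale of the whole body
    (where the person is and how large they appear).
    Local component: joint positions normalised to that box, i.e. the
    pose with location and size factored out.
    """
    lo, hi = joints.min(axis=0), joints.max(axis=0)
    center = (lo + hi) / 2.0
    scale = np.maximum(hi - lo, 1e-12)
    local = (joints - center) / scale    # translation/scale-invariant posture
    return center, scale, local

# a skeleton translated across the frame keeps the same local posture
pose = np.array([[0.0, 0.0], [1.0, 2.0], [2.0, 0.0]])
c1, s1, local1 = decompose_skeleton(pose)
c2, s2, local2 = decompose_skeleton(pose + np.array([5.0, 3.0]))
```

Separating "where the body moves" from "how the body is posed" is what lets the model flag, say, a running person (global anomaly) separately from a falling person (postural anomaly).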

147 citations


Posted Content
TL;DR: In this article, a memory-augmented autoencoder (MemAE) is proposed to improve anomaly detection performance by augmenting the autoencoder with a memory module.
Abstract: Deep autoencoders have been used extensively for anomaly detection. Trained on normal data, an autoencoder is expected to produce higher reconstruction error for abnormal inputs than for normal ones, and this error is adopted as a criterion for identifying anomalies. However, this assumption does not always hold in practice. It has been observed that the autoencoder sometimes "generalizes" so well that it also reconstructs anomalies faithfully, leading to missed detections. To mitigate this drawback of autoencoder-based anomaly detectors, we propose to augment the autoencoder with a memory module, yielding an improved model called the memory-augmented autoencoder (MemAE). Given an input, MemAE first obtains an encoding from the encoder and then uses it as a query to retrieve the most relevant memory items for reconstruction. During training, the memory contents are updated and encouraged to represent prototypical elements of the normal data. At test time, the learned memory is fixed, and the reconstruction is obtained from a few selected memory records of the normal data. The reconstruction therefore tends to be close to a normal sample, so the reconstruction errors on anomalies are amplified, aiding detection. MemAE makes no assumptions about the data type and can thus be applied to different tasks. Experiments on various datasets demonstrate the excellent generalization and high effectiveness of the proposed MemAE.

132 citations


Proceedings ArticleDOI
25 Jul 2019
TL;DR: This work proposes Graph Transformation Policy Network (GTPN), a novel generic method that combines the strengths of graph neural networks and reinforcement learning to learn reactions directly from data with minimal chemical knowledge.
Abstract: We address a fundamental problem in chemistry known as chemical reaction product prediction. Our main insight is that the input reactant and reagent molecules can be jointly represented as a graph, and the process of generating product molecules from reactant molecules can be formulated as a sequence of graph transformations. To this end, we propose the Graph Transformation Policy Network (GTPN), a novel generic method that combines the strengths of graph neural networks and reinforcement learning to learn reactions directly from data with minimal chemical knowledge. Compared to previous methods, GTPN has several appealing properties: it learns end-to-end and makes no assumptions about the length or order of graph transformations. To guide model search effectively through the complex discrete space of sets of bond changes, we extend the standard policy gradient loss with useful constraints. Evaluation results show that GTPN improves top-1 accuracy over the current state-of-the-art method by about 3% on the large USPTO dataset.

125 citations


Posted Content
TL;DR: A new method is proposed to model the normal patterns of human movements in surveillance video for anomaly detection, using dynamic skeleton features and a novel Message-Passing Encoder-Decoder Recurrent Network.
Abstract: Appearance features have been widely used in video anomaly detection even though they contain complex entangled factors. We propose a new method to model the normal patterns of human movements in surveillance video for anomaly detection using dynamic skeleton features. We decompose the skeletal movements into two sub-components: global body movement and local body posture. We model the dynamics and interaction of the coupled features in our novel Message-Passing Encoder-Decoder Recurrent Network. We observed that the decoupled features collaboratively interact in our spatio-temporal model to accurately identify human-related irregular events from surveillance video sequences. Compared to traditional appearance-based models, our method achieves superior outlier detection performance. Our model also offers "open-box" examination and decision explanation made possible by the semantically understandable features and a network architecture supporting interpretability.

118 citations


Posted Content
TL;DR: It is shown that discriminators trained on discrete datasets with the original GAN loss generalize poorly and do not approximate the theoretically optimal discriminator; a zero-centered gradient penalty is proposed to improve the discriminator's generalization by pushing it toward the optimal discriminator.
Abstract: Generative Adversarial Networks (GANs) are one of the most popular tools for learning complex high dimensional distributions. However, generalization properties of GANs have not been well understood. In this paper, we analyze the generalization of GANs in practical settings. We show that discriminators trained on discrete datasets with the original GAN loss have poor generalization capability and do not approximate the theoretically optimal discriminator. We propose a zero-centered gradient penalty for improving the generalization of the discriminator by pushing it toward the optimal discriminator. The penalty guarantees the generalization and convergence of GANs. Experiments on synthetic and large scale datasets verify our theoretical analysis.
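The zero-centered gradient penalty can be illustrated with a toy discriminator. To keep the sketch dependency-free, a linear discriminator D(x) = w · x + b is assumed so that the input gradient ∇x D = w is available in closed form; real implementations compute this gradient by automatic differentiation.

```python
import numpy as np

def zero_centered_gp(w, x):
    """Zero-centered gradient penalty E_x ||grad_x D(x)||^2 for a linear D.

    For D(x) = w . x + b the input gradient is grad_x D = w at every x, so
    the penalty reduces to ||w||^2. Centering at zero (rather than at 1 as
    in WGAN-GP) pushes the discriminator toward a flat, smooth function
    around the data, which is what improves its generalization.
    """
    grads = np.tile(w, (len(x), 1))            # closed-form input gradients
    return np.mean(np.sum(grads ** 2, axis=1))

x = np.random.randn(8, 3)                      # a batch of (real or fake) samples
w_sharp = np.array([10.0, 0.0, 0.0])           # steep discriminator: big penalty
w_smooth = np.array([0.1, 0.0, 0.0])           # nearly flat one: tiny penalty
```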

69 citations


Proceedings Article
01 Jan 2019
TL;DR: This article proposes a zero-centered gradient penalty for improving the generalization of the discriminator by pushing it toward the optimal discriminator, which guarantees generalization and convergence of GANs.
Abstract: Generative Adversarial Networks (GANs) are one of the most popular tools for learning complex high dimensional distributions. However, generalization properties of GANs have not been well understood. In this paper, we analyze the generalization of GANs in practical settings. We show that discriminators trained on discrete datasets with the original GAN loss have poor generalization capability and do not approximate the theoretically optimal discriminator. We propose a zero-centered gradient penalty for improving the generalization of the discriminator by pushing it toward the optimal discriminator. The penalty guarantees the generalization and convergence of GANs. Experiments on synthetic and large scale datasets verify our theoretical analysis.

67 citations


Posted ContentDOI
28 Jun 2019-bioRxiv
TL;DR: GraphDTA represents drugs as graphs and uses graph convolutional networks to learn drug-target binding affinity; it not only predicts affinity better than non-deep-learning models but also outperforms competing deep learning approaches.
Abstract: While the development of new drugs is costly, time-consuming, and often accompanied by safety issues, drug repurposing, in which old drugs with established safety profiles are used for medical conditions other than those they were originally developed for, is an attractive alternative. Understanding how old drugs act on new targets is therefore a crucial part of drug repurposing and has gained much interest. Several statistical and machine learning models have been proposed to estimate drug-target binding affinity, and deep learning approaches have been shown to be among the state-of-the-art methods. However, drugs and targets in these models are commonly represented as 1D strings, despite the fact that molecules are by nature formed by the chemical bonding of atoms. In this work, we propose GraphDTA to capture the structural information of drugs and thereby enhance the predictive power of affinity models. In particular, unlike competing methods, drugs are represented as graphs, and graph convolutional networks are used to learn drug-target binding affinity. We evaluate our method on two benchmark drug-target binding affinity datasets and compare its performance with state-of-the-art models in the field. The results show that our proposed method not only predicts affinity better than non-deep-learning models but also outperforms competing deep learning approaches. This demonstrates the practical advantage of graph-based molecular representations in providing accurate predictions of drug-target binding affinity. Applications may also include any recommendation system where either or both of the user- and product-like sides can be represented as graphs.
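A single graph-convolution layer of the kind GraphDTA builds on can be sketched in NumPy; the normalisation scheme and the toy molecule below are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def graph_conv(A, H, W):
    """One graph-convolution layer: H' = ReLU(D^-1 (A + I) H W).

    A: (n, n) adjacency matrix of the molecular graph (atoms as nodes,
    bonds as edges); H: (n, d_in) atom features; W: (d_in, d_out) weights.
    Adding self-loops and row-normalising averages each atom's features
    with those of its bonded neighbours.
    """
    A_hat = A + np.eye(len(A))
    deg = A_hat.sum(axis=1, keepdims=True)
    return np.maximum(A_hat / deg @ H @ W, 0.0)

# a 3-atom chain (e.g. O-C-O); after one layer the middle atom mixes all three
A = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
H = np.eye(3)                  # one-hot atom features
W = np.eye(3)                  # identity weights, for a transparent example
H1 = graph_conv(A, H, W)
```

Stacking a few such layers lets each atom's representation absorb its chemical neighbourhood, which a flat 1D string encoding cannot express.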

53 citations


Journal ArticleDOI
TL;DR: A novel transfer learning method for Bayesian optimization in which knowledge from an already completed source optimization task is leveraged for the optimization of a target task; theoretical results show that the proposed method converges faster than a generic no-transfer Bayesian optimization method.
Abstract: Experimental optimization is prevalent in many areas of artificial intelligence including machine learning. Conventional methods like grid search and random search can be computationally demanding. Over the recent years, Bayesian optimization has emerged as an efficient technique for global optimization of black-box functions. However, a generic Bayesian optimization algorithm suffers from a “cold start” problem. It may struggle to find promising locations in the initial stages. We propose a novel transfer learning method for Bayesian optimization where we leverage the knowledge from an already completed source optimization task for the optimization of a target task. Assuming both the source and target functions lie in some proximity to each other, we model source data as noisy observations of the target function. The level of noise models the proximity or relatedness between the tasks. We provide a mechanism to compute the noise level from the data to automatically adjust for different relatedness between the source and target tasks. We then analyse the convergence properties of the proposed method using two popular acquisition functions. Our theoretical results show that the proposed method converges faster than a generic no-transfer Bayesian optimization. We demonstrate the effectiveness of our method empirically on the tasks of tuning the hyperparameters of three different machine learning algorithms. In all the experiments, our method outperforms state-of-the-art transfer learning and no-transfer Bayesian optimization methods.
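The core modelling idea, treating source-task observations as noisy evaluations of the target function, can be sketched with a Gaussian process posterior that carries a per-observation noise level. The kernel, length scale, and noise values below are assumptions for illustration.

```python
import numpy as np

def gp_posterior_mean(X, y, noise, x_star, ls=1.0):
    """GP posterior mean with a per-observation noise level.

    Target observations get a small noise; observations imported from the
    source task are treated as noisy evaluations of the target function,
    with the noise level encoding how related the two tasks are
    (larger noise = less related = less influence on the posterior).
    """
    def k(a, b):  # squared-exponential kernel
        d = a[:, None] - b[None, :]
        return np.exp(-0.5 * (d / ls) ** 2)
    K = k(X, X) + np.diag(noise)
    return k(np.atleast_1d(x_star), X) @ np.linalg.solve(K, y)

X = np.array([0.0, 1.0, 2.0])
y = np.array([0.0, 1.0, 0.0])
# middle point trusted (tiny noise) vs treated as loosely related source data
mu_trusted = gp_posterior_mean(X, y, noise=np.array([1e-6, 1e-6, 1e-6]), x_star=1.0)
mu_doubted = gp_posterior_mean(X, y, noise=np.array([1e-6, 4.0, 1e-6]), x_star=1.0)
```

With tiny noise the posterior interpolates the middle observation; raising its noise discounts it, which is exactly how weakly related source data is prevented from misleading the target optimization.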

34 citations


Posted Content
TL;DR: This work proposes a new method that formulates the problem as a multi-armed bandit problem, wherein each category corresponds to an arm with its reward distribution centered around the optimum of the objective function in continuous variables.
Abstract: Many real-world functions are defined over both categorical and category-specific continuous variables and thus cannot be optimized by traditional Bayesian optimization (BO) methods. To optimize such functions, we propose a new method that formulates the problem as a multi-armed bandit problem, wherein each category corresponds to an arm whose reward distribution is centered around the optimum of the objective function over the continuous variables. Our goal is to identify the best arm and the maximizer of the corresponding continuous function simultaneously. Our algorithm uses a Thompson sampling scheme that connects the multi-armed bandit and BO components in a unified framework. We extend our method to batch BO to allow parallel optimization when multiple resources are available. We theoretically analyze our method for convergence and prove sub-linear regret bounds. We perform a variety of experiments: optimization of several benchmark functions, hyper-parameter tuning of a neural network, and automatic selection of the best machine learning model along with its optimal hyper-parameters (a.k.a. automated machine learning). Comparisons with other methods demonstrate the effectiveness of our proposed approach.
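The arm-selection step can be sketched with a simple Gaussian Thompson sampler. The per-arm posterior used here (fixed observation noise, summary statistics per arm) is a simplified stand-in for the paper's scheme, and the inner continuous BO step is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def thompson_pick(stats, obs_noise=1.0):
    """Pick an arm (category) by Thompson sampling.

    stats: per-arm (reward_sum, count). Each arm keeps a Gaussian posterior
    over its mean reward with standard deviation obs_noise / sqrt(count);
    we sample one value per arm and play the argmax. In the full method the
    chosen arm's continuous variables would then be optimized with a BO step.
    """
    samples = [rng.normal(s / n, obs_noise / np.sqrt(n)) for s, n in stats]
    return int(np.argmax(samples))

# arm 0 has mean reward 0, arm 1 has mean reward 10 -> arm 1 is chosen
stats = [(0.0, 100), (1000.0, 100)]
arm = thompson_pick(stats)
```

Sampling from the posterior, rather than playing the empirical best arm, keeps some exploration of under-tried categories while still concentrating on the promising ones.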

Posted Content
TL;DR: This work derives a theoretical bound on the amount of information stored in a RAM-like system and formulate an optimization problem that maximizes the bound, resulting in a solution termed Cached Uniform Writing, which aims to balance between maximizing memorization and forgetting via overwriting mechanisms.
Abstract: Memory-augmented neural networks consisting of a neural controller and an external memory have shown potential in long-term sequential learning. Current RAM-like memory models access memory at every timestep and thus do not effectively leverage the short-term memory held in the controller. We hypothesize that this writing scheme is suboptimal in memory utilization and introduces redundant computation. To validate our hypothesis, we derive a theoretical bound on the amount of information stored in a RAM-like system and formulate an optimization problem that maximizes the bound. The proposed solution, dubbed Uniform Writing, is proved optimal under the assumption of equal timestep contributions. To relax this assumption, we introduce modifications to the original solution, resulting in a method termed Cached Uniform Writing, which aims to balance maximizing memorization against forgetting via overwriting mechanisms. Through an extensive set of experiments, we empirically demonstrate the advantages of our solutions over other recurrent architectures, achieving state-of-the-art results on various sequential modeling tasks.
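The write-scheduling idea can be sketched directly; the rounding used below to place N writes over T timesteps is an illustrative choice.

```python
def write_schedule(T, N):
    """Uniform Writing: choose N evenly spaced write timesteps out of T.

    A vanilla RAM-like model writes to memory at all T timesteps; Uniform
    Writing spends only N writes, spaced T/N apart, and lets the controller's
    short-term state summarise the steps in between. Under the paper's
    equal-contribution assumption this even placement maximises the bound
    on stored information.
    """
    interval = T / N
    return [round((i + 1) * interval) - 1 for i in range(N)]

# 12 timesteps with a budget of 3 writes -> write after steps 3, 7 and 11
schedule = write_schedule(12, 3)
```

The cached variant additionally buffers the skipped timesteps so a write can summarise the whole interval rather than only its final step.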

Proceedings Article
01 Jan 2019
TL;DR: In this paper, a multi-objective Bayesian optimisation algorithm that allows the user to express preference-order constraints on the objectives of the type objective A is more important than objective B is presented.
Abstract: We present a multi-objective Bayesian optimisation algorithm that allows the user to express preference-order constraints on the objectives of the type objective A is more important than objective B. These preferences are defined based on the stability of the obtained solutions with respect to preferred objective functions. Rather than attempting to find a representative subset of the complete Pareto front, our algorithm selects those Pareto-optimal points that satisfy these constraints. We formulate a new acquisition function based on expected improvement in dominated hypervolume (EHI) to ensure that the subset of Pareto front satisfying the constraints is thoroughly explored. The hypervolume calculation is weighted by the probability of a point satisfying the constraints from a gradient Gaussian Process model. We demonstrate our algorithm on both synthetic and real-world problems.

Book ChapterDOI
Phuc Luong1, Sunil Gupta1, Dang Nguyen1, Santu Rana1, Svetha Venkatesh1 
02 Dec 2019
TL;DR: This work proposes a method (named Discrete-BO) that manipulates the exploration of an acquisition function and the length scale of a covariance function, which are two key components of a BO method, to prevent sampling a pre-existing observation.
Abstract: Bayesian Optimization (BO) is an efficient method for optimizing an expensive black-box function with continuous variables. In many cases, however, the function has only discrete variables as inputs, and such functions cannot be optimized by traditional BO methods. A typical approach to optimizing such functions assumes the objective function is defined on a continuous domain, applies a standard BO method, and then rounds the suggested continuous points to the nearest discrete points. This may cause BO to get stuck and repeatedly suggest pre-existing observations. To overcome this problem, we propose a method (named Discrete-BO) that manipulates the exploration of the acquisition function and the length scale of the covariance function, two key components of a BO method, to prevent sampling a pre-existing observation. Our experiments on both synthetic and real-world applications show that the proposed method outperforms state-of-the-art baselines in terms of convergence rate. More importantly, we also provide theoretical analysis to support the correctness of our method.
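The failure mode that motivates Discrete-BO, distinct continuous suggestions rounding onto the same pre-existing observation, is easy to demonstrate:

```python
import numpy as np

def nearest_discrete(x, grid):
    """Round a continuous BO suggestion to the closest allowed discrete value."""
    grid = np.asarray(grid)
    return int(grid[np.abs(grid - x).argmin()])

# four distinct continuous suggestions all collapse onto the same discrete
# point, so vanilla BO re-evaluates x = 10 over and over instead of exploring
grid = [0, 10, 20]
suggestions = [8.7, 9.4, 11.2, 12.9]
rounded = [nearest_discrete(s, grid) for s in suggestions]
```

Discrete-BO breaks this loop by sharpening the acquisition function's exploration and adjusting the covariance length scale so the next suggestion is pushed away from already-sampled discrete points.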

Journal ArticleDOI
Vu Nguyen1, Sunil Gupta1, Santu Rana1, Cheng Li1, Svetha Venkatesh1 
TL;DR: This paper proposes the filtering expansion strategy for Bayesian optimization, which starts from an initial region and gradually expands the search space; an efficient algorithm is developed for this strategy and its regret bound is derived.
Abstract: Bayesian optimization (BO) has recently emerged as a powerful and flexible tool for hyper-parameter tuning and, more generally, for the efficient global optimization of expensive black-box functions. Systems implementing BO have successfully solved difficult problems in automatic design choices and machine learning hyper-parameter tuning. Many recent advances in the methodologies and theories underlying Bayesian optimization have extended the framework to new applications and provided greater insight into the behavior of these algorithms. Still, these established techniques always require a user-defined space in which to perform optimization. This pre-defined space specifies the ranges of hyper-parameter values. In many situations, however, it can be difficult to prescribe such spaces, as prior knowledge is often unavailable. Setting these regions arbitrarily can lead to inefficient optimization: if a space is too large, we can miss the optimum with a limited budget; on the other hand, if a space is too small, it may not contain the optimum at all. The fully unknown search space problem is intractable in practice. Therefore, in this paper, we narrow our focus to the setting of a "weakly specified" search space for Bayesian optimization. By a weakly specified space, we mean that the pre-defined space is placed in a sufficiently good region so that the optimization can expand and reach the optimum; however, this pre-defined space need not include the global optimum. We tackle this problem by proposing the filtering expansion strategy for Bayesian optimization. Our approach starts from the initial region and gradually expands the search space. We develop an efficient algorithm for this strategy and derive its regret bound. These theoretical results are complemented by an extensive set of experiments on benchmark functions and two real-world applications, which demonstrate the benefits of our proposed approach.

Journal ArticleDOI
TL;DR: The Graph Attention model for Multi-Label learning (GAML), a novel graph neural network, is proposed to capture the relations between labels and input subgraphs at various resolution scales.
Abstract: We address the largely open problem of multi-label classification over graphs. Unlike traditional vector input, a graph has rich variable-size substructures which are related to the labels in various ways. We believe that uncovering these relations may hold the key to classification performance and explainability. We introduce the Graph Attention model for Multi-Label learning (GAML), a novel graph neural network that can handle this problem effectively. GAML regards labels as auxiliary nodes and models them in conjunction with the input graph. By applying the neural message passing algorithm and an attention mechanism to both the label nodes and the input nodes iteratively, GAML can capture the relations between the labels and the input subgraphs at various resolution scales. Moreover, our model can take advantage of explicit label dependencies. It also scales linearly with the number of labels and the graph size thanks to our proposed hierarchical attention. We evaluate GAML in an extensive set of experiments with both graph-structured inputs and classical unstructured inputs. The results show that GAML significantly outperforms other competing methods. Importantly, GAML enables intuitive visualizations for better understanding of the label-substructure relations and for explaining the model's behavior.

Posted Content
TL;DR: A new memory to store weights for the controller, analogous to the stored-program memory in modern computer architectures is introduced, creating differentiable machines that can switch programs through time, adapt to variable contexts and thus resemble the Universal Turing Machine.
Abstract: Neural networks powered with external memory simulate computer behaviors. These models, which use the memory to store data for a neural controller, can learn algorithms and other complex tasks. In this paper, we introduce a new memory to store weights for the controller, analogous to the stored-program memory in modern computer architectures. The proposed model, dubbed Neural Stored-program Memory, augments current memory-augmented neural networks, creating differentiable machines that can switch programs through time, adapt to variable contexts and thus resemble the Universal Turing Machine. A wide range of experiments demonstrate that the resulting machines not only excel in classical algorithmic problems, but also have potential for compositional, continual, few-shot learning and question-answering tasks.

Proceedings Article
01 Jan 2019
TL;DR: Cached Uniform Writing is proposed to balance maximizing memorization against forgetting via overwriting mechanisms; the underlying Uniform Writing scheme is proved optimal under the assumption of equal timestep contributions.
Abstract: Memory-augmented neural networks consisting of a neural controller and an external memory have shown potential in long-term sequential learning. Current RAM-like memory models access memory at every timestep and thus do not effectively leverage the short-term memory held in the controller. We hypothesize that this writing scheme is suboptimal in memory utilization and introduces redundant computation. To validate our hypothesis, we derive a theoretical bound on the amount of information stored in a RAM-like system and formulate an optimization problem that maximizes the bound. The proposed solution, dubbed Uniform Writing, is proved optimal under the assumption of equal timestep contributions. To relax this assumption, we introduce modifications to the original solution, resulting in a method termed Cached Uniform Writing, which aims to balance maximizing memorization against forgetting via overwriting mechanisms. Through an extensive set of experiments, we empirically demonstrate the advantages of our solutions over other recurrent architectures, achieving state-of-the-art results on various sequential modeling tasks.

Proceedings Article
01 Jan 2019
TL;DR: This work proposes a systematic volume expansion strategy for Bayesian optimization that guarantees, over iterative expansions of the search space, finding a point whose function value is within epsilon of the objective function's maximum.
Abstract: Applying Bayesian optimization to problems in which the search space is unknown is challenging. To address this problem, we propose a systematic volume expansion strategy for Bayesian optimization. We devise a strategy guaranteeing that, over iterative expansions of the search space, our method can find a point whose function value is within epsilon of the objective function's maximum. Without the need to specify any parameters, our algorithm automatically triggers the minimal required expansion at each iteration. We derive analytic expressions for when to trigger the expansion and by how much to expand. We also provide theoretical analysis showing that our method achieves epsilon-accuracy after a finite number of iterations. We demonstrate our method on both benchmark test functions and machine learning hyper-parameter tuning tasks, and show that it outperforms baselines.
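The expansion trigger can be caricatured with a tiny grid-search stand-in for BO: expand whenever the incumbent optimum lands on the current boundary. The fixed expansion step and grid optimiser below are assumptions for illustration only; the paper derives analytic expressions for when and by how much to expand.

```python
def expand_until_interior(f, lo, hi, step=1.0, max_rounds=50):
    """Grow the search interval until the best point found is not on its boundary.

    A crude stand-in for the expansion trigger: if the incumbent optimum sits
    on the current boundary, the true optimum likely lies outside, so widen
    the interval and re-optimise. A coarse grid search stands in for BO here.
    """
    best = lo
    for _ in range(max_rounds):
        grid = [lo + i * (hi - lo) / 100 for i in range(101)]
        best = max(grid, key=f)
        if lo < best < hi:                  # interior optimum: stop expanding
            return best, (lo, hi)
        lo, hi = lo - step, hi + step       # boundary hit: trigger an expansion
    return best, (lo, hi)

# the optimum of -(x - 4)^2 lies outside the initial space [0, 2]
best, space = expand_until_interior(lambda x: -(x - 4) ** 2, 0.0, 2.0)
```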

Posted Content
TL;DR: In this paper, a systematic volume expansion strategy for Bayesian optimization is proposed to guarantee that, over iterative expansions of the search space, the method can find a point whose function value is within epsilon of the objective function's maximum.
Abstract: Applying Bayesian optimization to problems in which the search space is unknown is challenging. To address this problem, we propose a systematic volume expansion strategy for Bayesian optimization. We devise a strategy guaranteeing that, over iterative expansions of the search space, our method can find a point whose function value is within epsilon of the objective function's maximum. Without the need to specify any parameters, our algorithm automatically triggers the minimal required expansion at each iteration. We derive analytic expressions for when to trigger the expansion and by how much to expand. We also provide theoretical analysis showing that our method achieves epsilon-accuracy after a finite number of iterations. We demonstrate our method on both benchmark test functions and machine learning hyper-parameter tuning tasks, and show that it outperforms baselines.

Posted Content
TL;DR: A cost-aware multi-objective Bayesian optimisation method with non-uniform evaluation cost over objective functions, defining cost-aware constraints over the search space and formulating a new acquisition function, with convergence analysis, that incorporates these constraints while optimising the objective functions.
Abstract: The notion of expense in Bayesian optimisation generally refers to a uniformly expensive cost of function evaluation over the whole search space. In some scenarios, however, the cost of evaluating black-box objective functions is non-uniform, since different inputs from the search space may incur different evaluation costs. We introduce a cost-aware multi-objective Bayesian optimisation with non-uniform evaluation cost over objective functions by defining cost-aware constraints over the search space. The cost-aware constraints are a sorted tuple of indexes that encodes an ordering of the search-space dimensions based on the user's prior knowledge of their cost of usage. We formulate a new multi-objective Bayesian optimisation acquisition function, with a detailed convergence analysis, that incorporates these cost-aware constraints while optimising the objective functions. We demonstrate our algorithm on synthetic and real-world problems in hyperparameter tuning of neural networks and random forests.

Journal ArticleDOI
TL;DR: The architecture of, and design rationale for, a new software platform designed to support the conduct of digital phenotyping research studies, which includes universal support for both iOS and Android devices and privacy-preserving mechanisms which, by default, collect only anonymized participant data are described.
Abstract: In this viewpoint we describe the architecture of, and design rationale for, a new software platform designed to support the conduct of digital phenotyping research studies. These studies seek to collect passive and active sensor signals from participants' smartphones for the purposes of modelling and predicting health outcomes, with a specific focus on mental health. We also highlight features of the current research landscape that recommend the coordinated development of such platforms, including the significant technical and resource costs of development, and we identify specific considerations relevant to the design of platforms for digital phenotyping. In addition, we describe trade-offs relating to data quality and completeness versus the experience for patients and public users who consent to their devices being used to collect data. We summarize distinctive features of the resulting platform, InSTIL (Intelligent Sensing to Inform and Learn), which includes universal (ie, cross-platform) support for both iOS and Android devices and privacy-preserving mechanisms which, by default, collect only anonymized participant data. We conclude with a discussion of recommendations for future work arising from learning during the development of the platform. The development of the InSTIL platform is a key step towards our research vision of a population-scale, international, digital phenotyping bank. With suitable adoption, the platform will aggregate signals from large numbers of participants and large numbers of research studies to support modelling and machine learning analyses focused on the prediction of mental illness onset and disease trajectories.

Journal ArticleDOI
TL;DR: Through pathway analysis, it is found that these sex‐independent biomarkers have substantially different biological roles than the sex‐dependent biomarkers, and that some of these pathways are ubiquitously dysregulated in both postmortem brain and blood.
Abstract: Autism spectrum disorder (ASD) is a markedly heterogeneous condition with a varied phenotypic presentation. Its high concordance among siblings, as well as its clear association with specific genetic disorders, both point to a strong genetic etiology. However, the molecular basis of ASD is still poorly understood, although recent studies point to the existence of sex-specific ASD pathophysiologies and biomarkers. Despite this, little is known about how exactly sex influences the gene expression signatures of ASD probands. In an effort to identify sex-dependent biomarkers and characterize their function, we present an analysis of a single paired-end postmortem brain RNA-Seq data set and a meta-analysis of six blood-based microarray data sets. Here, we identify several genes with sex-dependent dysregulation, and many more with sex-independent dysregulation. Moreover, through pathway analysis, we find that these sex-independent biomarkers have substantially different biological roles than the sex-dependent biomarkers, and that some of these pathways are ubiquitously dysregulated in both postmortem brain and blood. We conclude by synthesizing the discovered biomarker profiles with the extant literature, by highlighting the advantage of studying sex-specific dysregulation directly, and by making a call for new transcriptomic data that comprise large female cohorts.

Journal ArticleDOI
TL;DR: This report proposes using an established surveillance method that detects anomalous samples based on their deviation from a learned normal steady-state structure, creating an anomaly detector for tissue transcriptomes, a "tissue detector," capable of identifying cancer without ever seeing a single cancer example.
Abstract: Since the turn of the century, researchers have sought to diagnose cancer based on gene expression signatures measured from the blood or biopsy as biomarkers. This task, known as classification, is typically solved using a suite of algorithms that learn a mathematical rule capable of discriminating one group ("cases") from another ("controls"). However, discriminatory methods can only identify cancerous samples that resemble those that the algorithm already saw during training. As such, discriminatory methods may be ill-suited for the classification of cancer: because the possibility space of cancer is definitively large, the existence of a one-of-a-kind gene expression signature is likely. Instead, we propose using an established surveillance method that detects anomalous samples based on their deviation from a learned normal steady-state structure. By transferring this method to transcriptomic data, we can create an anomaly detector for tissue transcriptomes, a "tissue detector", that is capable of identifying cancer without ever seeing a single cancer example. As a proof-of-concept, we train a "tissue detector" on normal GTEx samples that can classify TCGA samples with >90% AUC for 3 out of 6 tissues. Importantly, we find that the classification accuracy is improved simply by adding more healthy samples. We conclude this report by emphasizing the conceptual advantages of anomaly detection and by highlighting future directions for this field of study.
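
The "train on normal tissue only" workflow can be sketched with a simple low-rank model: fit PCA to normal samples, then flag test samples whose reconstruction error exceeds a percentile threshold. The paper's detector is more sophisticated; this sketch, with synthetic data, only illustrates the one-class setup in which no cancer example is ever seen during training.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic "normal" transcriptomes: 200 samples lying on a 2-D latent
# structure embedded in 50 genes, plus measurement noise.
latent = rng.normal(size=(200, 2))
basis = rng.normal(size=(2, 50))
normal = latent @ basis + 0.1 * rng.normal(size=(200, 50))

# Fit PCA on normal data only.
mean = normal.mean(axis=0)
u, s, vt = np.linalg.svd(normal - mean, full_matrices=False)
components = vt[:2]  # keep the true latent dimensionality

def recon_error(x):
    """Distance of each sample from the learned normal subspace."""
    z = (x - mean) @ components.T
    return np.linalg.norm((x - mean) - z @ components, axis=1)

threshold = np.percentile(recon_error(normal), 95)

# "Anomalous" samples break the learned steady-state structure.
anomalies = rng.normal(size=(50, 50)) * 3.0
flagged = recon_error(anomalies) > threshold
print(flagged.mean())  # fraction of anomalies detected
```

Because the detector only models what "normal" looks like, any sufficiently novel signature is flagged, which is the conceptual advantage over discriminatory classifiers argued in the abstract.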

Journal ArticleDOI
23 Sep 2019
TL;DR: This work presents the development of a highly hydrophobic material from an amphiphilic polymer through a novel, adaptive artificial intelligence approach using Bayesian optimization; the resulting knowledge gain can be applied to the fabrication process.
Abstract: In materials science, the investigation of a large and complex experimental space is time-consuming and thus may induce bias to exclude potential solutions where little to no knowledge is available. This work presents the development of a highly hydrophobic material from an amphiphilic polymer through a novel, adaptive artificial intelligence approach. The hydrophobicity arises from the random packing of short polymer fibers into paper, a highly entropic, multistep process. Using Bayesian optimization, the algorithm is able to efficiently navigate the parameter space without bias, including areas which a human experimenter would not address. This resulted in additional knowledge gain, which can then be applied to the fabrication process, resulting in a highly hydrophobic material (static water contact angle 135°) from an amphiphilic polymer (contact angle of 90°) through a simple and scalable filtration-based method. This presents a potential pathway for surface modification using the short polymer fibers to create fluorine-free hydrophobic surfaces on a larger scale.

Posted ContentDOI
28 Jun 2019-bioRxiv
TL;DR: A sparse neural encoder-decoder network that predicts metabolite abundances from microbe abundances; on paired data from a cohort of inflammatory bowel disease (IBD) patients, the model outperforms linear univariate and multivariate methods in terms of accuracy, sparsity, and stability.
Abstract: Technological advances in next-generation sequencing (NGS) and chromatographic assays [e.g., liquid chromatography mass spectrometry (LC-MS)] have made it possible to identify thousands of microbe and metabolite species, and to measure their relative abundance. In this paper, we propose a sparse neural encoder-decoder network to predict metabolite abundances from microbe abundances. Using paired data from a cohort of inflammatory bowel disease (IBD) patients, we show that our neural encoder-decoder model outperforms linear univariate and multivariate methods in terms of accuracy, sparsity, and stability. Importantly, we show that our neural encoder-decoder model is not simply a black box designed to maximize predictive accuracy. Rather, the network’s hidden layer (i.e., the latent space, comprised only of sparsely weighted microbe counts) actually captures key microbe-metabolite relationships that are themselves clinically meaningful. Although this hidden layer is learned without any knowledge of the patient’s diagnosis, we show that the learned latent features are structured in a way that predicts IBD and treatment status with high accuracy. By imposing a non-negative weights constraint, the network becomes a directed graph where each downstream node is interpretable as the additive combination of the upstream nodes. Here, the middle layer comprises distinct microbe-metabolite axes that relate key microbial biomarkers with metabolite biomarkers. By pre-processing the microbiome and metabolome data using compositional data analysis methods, we ensure that our proposed multi-omics workflow will generalize to any pair of -omics data. To the best of our knowledge, this work is the first application of neural encoder-decoders for the interpretable integration of multi-omics biological data.
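
The non-negative weights idea above, which makes each hidden node an additive combination of microbes, can be sketched with a tiny linear encoder-decoder trained by projected gradient descent. The dimensions, synthetic data, and plain-SGD training loop are illustrative assumptions, not the published model.

```python
import numpy as np

rng = np.random.default_rng(2)
n, microbes, hidden, metabolites = 300, 20, 3, 5

# Synthetic paired data generated from a non-negative two-layer map.
X = rng.uniform(0, 1, size=(n, microbes))
true_enc = np.abs(rng.normal(scale=0.3, size=(microbes, hidden)))
true_dec = np.abs(rng.normal(scale=0.3, size=(hidden, metabolites)))
Y = X @ true_enc @ true_dec + 0.01 * rng.normal(size=(n, metabolites))

W1 = np.abs(rng.normal(scale=0.1, size=(microbes, hidden)))
W2 = np.abs(rng.normal(scale=0.1, size=(hidden, metabolites)))
lr = 0.01
for _ in range(2000):
    H = X @ W1                           # latent "microbe-metabolite axes"
    err = H @ W2 - Y
    g2 = H.T @ err / n                   # gradient w.r.t. decoder weights
    g1 = X.T @ (err @ W2.T) / n          # gradient w.r.t. encoder weights
    W1 = np.maximum(W1 - lr * g1, 0.0)   # non-negativity projection
    W2 = np.maximum(W2 - lr * g2, 0.0)

mse = float(np.mean((X @ W1 @ W2 - Y) ** 2))
print(mse)
```

With all weights constrained to be non-negative, the trained network is a directed graph in which every downstream node is an additive combination of upstream nodes, which is what makes the latent axes interpretable.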

Posted ContentDOI
29 Jan 2019-bioRxiv
TL;DR: A novel deep learning architecture, called DeepTRIAGE (Deep learning for the TRactable Individualised Analysis of Gene Expression), which not only classifies cancer sub-types with comparable accuracy, but simultaneously assigns each patient their own set of interpretable and individualised biomarker scores.
Abstract: Motivation Breast cancer is a collection of multiple tissue pathologies, each with a distinct molecular signature that correlates with patient prognosis and response to therapy. Accurately differentiating between breast cancer sub-types is an important part of clinical decision-making. Although this problem has already been addressed using machine learning methods that separate tissue samples into distinct groups, there remains unexplained heterogeneity within the established sub-types that cannot be resolved by the commonly used classification algorithms. In this paper, we propose a novel deep learning architecture, called DeepTRIAGE (Deep learning for the TRactable Individualised Analysis of Gene Expression), which not only classifies cancer sub-types with good accuracy, but simultaneously assigns each patient their own set of interpretable and individualised biomarker scores. These personalised scores describe how important each feature is in the classification of any patient, and can be analysed post-hoc to generate new hypotheses about latent heterogeneity. Results We apply the DeepTRIAGE framework to classify the gene expression signatures of luminal A and luminal B breast cancer sub-types, and illustrate its use for genes as well as the GO and KEGG gene sets. Using DeepTRIAGE, we calculate personalised biomarker scores that describe the most important features for classifying an individual patient as luminal A or luminal B. In doing so, DeepTRIAGE simultaneously reveals heterogeneity within the luminal A biomarker scores that significantly associate with tumour stage, placing all luminal samples along a continuum of severity. Availability and implementation The proposed model is implemented in Python using the PyTorch framework. The analysis is done in Python and R. All methods and models are freely available from https://github.com/adham/BiomarkerAttend.
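
The "individualised biomarker score" idea can be sketched as a softmax attention vector computed per sample, so each patient gets their own feature-importance ranking rather than one global ranking. The single attention layer and random parameters below are a drastic simplification of the published architecture, shown only to make the per-patient mechanism concrete.

```python
import numpy as np

rng = np.random.default_rng(3)
genes = 10
W_att = rng.normal(size=(genes, genes))  # toy attention parameters

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def biomarker_scores(x):
    """Per-sample importance: attention weights scaled by expression."""
    attention = softmax(W_att @ x)
    return attention * x

patient_a = rng.uniform(0, 1, size=genes)
patient_b = rng.uniform(0, 1, size=genes)

scores_a = biomarker_scores(patient_a)
scores_b = biomarker_scores(patient_b)

# Two patients get different importance rankings over the same genes,
# which is what enables the post-hoc analysis of latent heterogeneity.
print(np.argsort(scores_a)[::-1][:3], np.argsort(scores_b)[::-1][:3])
```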

Posted Content
TL;DR: A novel method for experimental design that exploits expert 'hunches', demonstrated on designing a new short polymer fiber with a target length and a new three-dimensional porous scaffold with a target porosity, where it converges faster than basic Bayesian optimization not using such 'hunches'.
Abstract: Experimental design is the process of obtaining a product with a target property via experimentation. Bayesian optimization offers a sample-efficient tool for experimental design when experiments are expensive. Often, expert experimenters have 'hunches' about the behavior of the experimental system, offering the potential to further improve efficiency. In this paper, we consider a per-variable monotonic trend in the underlying property, which results in a unimodal trend in those variables for a target value optimization. For example, the sweetness of a candy is monotonic in its sugar content. However, to obtain a target sweetness, the utility of the sugar content becomes a unimodal function, which peaks at the value giving the target sweetness and falls off both ways. We propose a novel method to solve such problems that achieves two main objectives: a) the monotonicity information is used to the fullest extent possible, whilst ensuring that b) the convergence guarantee remains intact. This is achieved by two-stage Gaussian process modeling, where the first stage uses the monotonicity trend to model the underlying property, and the second stage uses 'virtual' samples, drawn from the first, to model the target value optimization function. The process is made theoretically consistent by adding an appropriate adjustment factor in the posterior computation, necessitated by the use of the 'virtual' samples. The proposed method is evaluated through both simulations and real-world experimental design problems: a) a new short polymer fiber with a target length, and b) a new three-dimensional porous scaffold with a target porosity. In all scenarios our method demonstrates faster convergence than basic Bayesian optimization not using such 'hunches'.
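
The sweetness example above can be made concrete: if a property is monotonic in a variable, the utility of hitting a target value of that property is unimodal in the same variable. The toy functions below are illustrative stand-ins, not the paper's Gaussian process model.

```python
import numpy as np

def sweetness(sugar):
    # Monotonically increasing property of the experimental variable.
    return 2.0 * sugar

def utility(sugar, target=6.0):
    # Utility of hitting the target: peaks where sweetness(sugar) == target
    # (here sugar = 3) and falls off both ways, i.e. it is unimodal.
    return -np.abs(sweetness(sugar) - target)

sugar_grid = np.linspace(0.0, 10.0, 1001)
u = utility(sugar_grid)
best = sugar_grid[np.argmax(u)]
print(best)
```

This is why the two-stage modeling helps: the first stage can exploit the simple monotonic structure of the property itself, even though the final optimization target is the harder unimodal utility.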

Posted Content
TL;DR: A multi-objective Bayesian optimisation algorithm that allows the user to express preference-order constraints on the objectives of the type "objective A is more important than objective B" is presented.
Abstract: We present a multi-objective Bayesian optimisation algorithm that allows the user to express preference-order constraints on the objectives of the type "objective A is more important than objective B". These preferences are defined based on the stability of the obtained solutions with respect to preferred objective functions. Rather than attempting to find a representative subset of the complete Pareto front, our algorithm selects those Pareto-optimal points that satisfy these constraints. We formulate a new acquisition function based on expected improvement in dominated hypervolume (EHI) to ensure that the subset of Pareto front satisfying the constraints is thoroughly explored. The hypervolume calculation is weighted by the probability of a point satisfying the constraints from a gradient Gaussian Process model. We demonstrate our algorithm on both synthetic and real-world problems.
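
As a helper sketch for the setting above: the method selects among Pareto-optimal points, which requires computing the Pareto-optimal subset of candidates. Only plain dominance (for maximisation) is shown here; the EHI acquisition, the preference-order constraints, and the gradient Gaussian Process model are beyond this sketch.

```python
import numpy as np

def pareto_mask(F):
    """F: (n_points, n_objectives). True where no other point dominates."""
    n = F.shape[0]
    mask = np.ones(n, dtype=bool)
    for i in range(n):
        # j dominates i if F[j] >= F[i] in every objective and > in at least one.
        dominated_by = np.all(F >= F[i], axis=1) & np.any(F > F[i], axis=1)
        if dominated_by.any():
            mask[i] = False
    return mask

F = np.array([[1.0, 4.0], [2.0, 3.0], [3.0, 1.0], [2.0, 2.0], [0.5, 0.5]])
print(pareto_mask(F))  # first three points are Pareto-optimal
```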

Book ChapterDOI
01 Jan 2019
TL;DR: An optimization procedure is developed that helps a decision tree mimic a black-box model by efficiently retraining the decision tree in a sequential manner, using data labeled by the black-box model.
Abstract: Explaining black-box machine learning models is important for their successful application to many real-world problems. Existing approaches to model explanation either focus on explaining a particular decision instance or are applicable only to specific models. In this paper, we address these limitations by proposing a new model-agnostic mechanism for black-box model explainability. Our approach can be utilised to explain the predictions of any black-box machine learning model. Our work uses interpretable surrogate models (e.g. a decision tree) to extract global rules that describe the predictions of a model. We develop an optimization procedure which helps a decision tree to mimic a black-box model, by efficiently retraining the decision tree in a sequential manner, using the data labeled by the black-box model. We demonstrate the usefulness of our proposed framework using three applications: two classification models, one built using the iris dataset and the other using a synthetic dataset, and a regression model built for the bike sharing dataset.
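
The model-agnostic distillation idea can be sketched as follows: treat any trained model as a labelling oracle, and repeatedly refit an interpretable surrogate on growing batches of oracle-labelled data. To keep the sketch self-contained, a brute-force depth-1 decision stump stands in for a full decision tree, and a simple nonlinear rule stands in for the black box; the sequential loop mirrors the paper's retraining procedure in spirit only.

```python
import numpy as np

rng = np.random.default_rng(4)

def black_box(X):
    # Stand-in for an opaque model: a nonlinear rule on feature 0.
    return (np.sin(X[:, 0]) > 0.5).astype(int)

def fit_stump(X, y):
    """Exhaustively choose the (feature, threshold) pair minimising error."""
    best = (0, 0.0, 1.0)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            err = np.mean((X[:, j] > t).astype(int) != y)
            if err < best[2]:
                best = (j, t, err)
    return best

X = rng.uniform(0, np.pi / 2, size=(50, 3))
for _ in range(5):  # sequential retraining on growing oracle-labelled data
    X = np.vstack([X, rng.uniform(0, np.pi / 2, size=(50, 3))])
    feature, threshold, _ = fit_stump(X, black_box(X))

# Evaluate the surrogate's fidelity to the black box on fresh data.
X_test = rng.uniform(0, np.pi / 2, size=(1000, 3))
fidelity = np.mean((X_test[:, feature] > threshold).astype(int) == black_box(X_test))
print(feature, round(float(fidelity), 3))
```

The resulting `(feature, threshold)` pair is a global, human-readable rule that approximates the black box's behaviour, which is the kind of explanation the abstract targets.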