Showing papers on "Domain knowledge" published in 2020


Journal ArticleDOI
03 Apr 2020
TL;DR: This work proposes a knowledge-enabled language representation model (K-BERT) with knowledge graphs (KGs), in which triples are injected into the sentences as domain knowledge, which significantly outperforms BERT and reveals promising results in twelve NLP tasks.
Abstract: Pre-trained language representation models, such as BERT, capture a general language representation from large-scale corpora, but lack domain-specific knowledge. When reading a domain text, experts make inferences with relevant knowledge. For machines to achieve this capability, we propose a knowledge-enabled language representation model (K-BERT) with knowledge graphs (KGs), in which triples are injected into the sentences as domain knowledge. However, too much knowledge incorporation may divert the sentence from its correct meaning, which is called knowledge noise (KN) issue. To overcome KN, K-BERT introduces soft-position and visible matrix to limit the impact of knowledge. K-BERT can easily inject domain knowledge into the models by being equipped with a KG without pre-training by itself because it is capable of loading model parameters from the pre-trained BERT. Our investigation reveals promising results in twelve NLP tasks. Especially in domain-specific tasks (including finance, law, and medicine), K-BERT significantly outperforms BERT, which demonstrates that K-BERT is an excellent choice for solving the knowledge-driven problems that require experts.

516 citations
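
The soft-position and visible-matrix idea from the K-BERT abstract above can be illustrated with a small sketch. This is a simplified reading rather than the authors' implementation: the function name `build_sentence_tree`, the toy knowledge graph, and the exact visibility rule (trunk tokens see each other, injected tokens see only their own triple and the entity they attach to) are assumptions made for illustration.

```python
def build_sentence_tree(tokens, kg):
    """Flatten a sentence with injected triples (hypothetical helper).

    Returns the flattened tokens, their soft positions, and a boolean
    visible matrix. `kg` maps an entity token to a list of
    (relation, object) pairs.
    """
    flat, soft_pos, branch = [], [], []
    next_branch_id = 0
    for pos, tok in enumerate(tokens):
        anchor = len(flat)          # index of the trunk token in `flat`
        flat.append(tok)
        soft_pos.append(pos)        # trunk tokens keep their sentence position
        branch.append(None)         # None marks a trunk token
        for rel, obj in kg.get(tok, []):
            # Injected tokens continue counting from the entity's position.
            for offset, extra in enumerate((rel, obj), start=1):
                flat.append(extra)
                soft_pos.append(pos + offset)
                branch.append((next_branch_id, anchor))
            next_branch_id += 1

    n = len(flat)
    visible = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            bi, bj = branch[i], branch[j]
            if bi is None and bj is None:
                visible[i][j] = True             # trunk sees trunk
            elif bi is not None and bj is not None:
                visible[i][j] = bi[0] == bj[0]   # same injected triple
            elif bi is None:
                visible[i][j] = bj[1] == i       # entity sees its own branch
            else:
                visible[i][j] = bi[1] == j       # branch sees its entity
    return flat, soft_pos, visible


tokens = ["Tim", "Cook", "visits", "Beijing"]
kg = {"Cook": [("CEO", "Apple")], "Beijing": [("capital", "China")]}
flat, soft_pos, visible = build_sentence_tree(tokens, kg)
```

In this toy case "Apple" is visible to "Cook" but not to "visits", which is one way injected knowledge can be kept from changing the meaning of the rest of the sentence.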


Journal ArticleDOI
TL;DR: In this paper, the authors provide a survey of recent scientific works that incorporate machine learning and the way that explainable machine learning is used in combination with domain knowledge from the application areas.
Abstract: Machine learning methods have been remarkably successful for a wide range of application areas in the extraction of essential information from data. An exciting and relatively recent development is the uptake of machine learning in the natural sciences, where the major goal is to obtain novel scientific insights and discoveries from observational or simulated data. A prerequisite for obtaining a scientific outcome is domain knowledge, which is needed to gain explainability, but also to enhance scientific consistency. In this article, we review explainable machine learning in view of applications in the natural sciences and discuss three core elements that we identified as relevant in this context: transparency, interpretability, and explainability. With respect to these core elements, we provide a survey of recent scientific works that incorporate machine learning and the way that explainable machine learning is used in combination with domain knowledge from the application areas.

493 citations


Proceedings ArticleDOI
27 Jan 2020
TL;DR: It is shown that confidence score can help calibrate people's trust in an AI model, but trust calibration alone is not sufficient to improve AI-assisted decision making, which may also depend on whether the human can bring in enough unique knowledge to complement the AI's errors.
Abstract: Today, AI is being increasingly used to help human experts make decisions in high-stakes scenarios. In these scenarios, full automation is often undesirable, not only due to the significance of the outcome, but also because human experts can draw on their domain knowledge complementary to the model's to ensure task success. We refer to these scenarios as AI-assisted decision making, where the individual strengths of the human and the AI come together to optimize the joint decision outcome. A key to their success is to appropriately calibrate human trust in the AI on a case-by-case basis; knowing when to trust or distrust the AI allows the human expert to appropriately apply their knowledge, improving decision outcomes in cases where the model is likely to perform poorly. This research conducts a case study of AI-assisted decision making in which humans and AI have comparable performance alone, and explores whether features that reveal case-specific model information can calibrate trust and improve the joint performance of the human and AI. Specifically, we study the effect of showing confidence score and local explanation for a particular prediction. Through two human experiments, we show that confidence score can help calibrate people's trust in an AI model, but trust calibration alone is not sufficient to improve AI-assisted decision making, which may also depend on whether the human can bring in enough unique knowledge to complement the AI's errors. We also highlight the problems in using local explanation for AI-assisted decision making scenarios and invite the research community to explore new approaches to explainability for calibrating human trust in AI.

287 citations


Journal ArticleDOI
30 Apr 2020
TL;DR: OpenFermion is an open-source software library written largely in Python under an Apache 2.0 license, aimed at enabling the simulation of fermionic and bosonic models and quantum chemistry problems on quantum hardware.
Abstract: Quantum simulation of chemistry and materials is predicted to be an important application for both near-term and fault-tolerant quantum devices. However, at present, developing and studying algorithms for these problems can be difficult due to the prohibitive amount of domain knowledge required in both the area of chemistry and quantum algorithms. To help bridge this gap and open the field to more researchers, we have developed the OpenFermion software package (www.openfermion.org). OpenFermion is an open-source software library written largely in Python under an Apache 2.0 license, aimed at enabling the simulation of fermionic and bosonic models and quantum chemistry problems on quantum hardware. Beginning with an interface to common electronic structure packages, it simplifies the translation between a molecular specification and a quantum circuit for solving or studying the electronic structure problem on a quantum computer, minimizing the amount of domain expertise required to enter the field. The package is designed to be extensible and robust, maintaining high software standards in documentation and testing. This release paper outlines the key motivations behind design choices in OpenFermion and discusses some basic OpenFermion functionality which we believe will aid the community in the development of better quantum algorithms and tools for this exciting area of research.

258 citations


Posted Content
TL;DR: After fine-tuning the model for goal-directed property optimization with reinforcement learning, GraphAF achieves state-of-the-art performance on both chemical property optimization and constrained property optimization.
Abstract: Molecular graph generation is a fundamental problem for drug discovery and has been attracting growing attention. The problem is challenging since it requires not only generating chemically valid molecular structures but also optimizing their chemical properties in the meantime. Inspired by the recent progress in deep generative models, in this paper we propose a flow-based autoregressive model for graph generation called GraphAF. GraphAF combines the advantages of both autoregressive and flow-based approaches and enjoys: (1) high model flexibility for data density estimation; (2) efficient parallel computation for training; (3) an iterative sampling process, which allows leveraging chemical domain knowledge for valency checking. Experimental results show that GraphAF is able to generate 68% chemically valid molecules even without chemical knowledge rules and 100% valid molecules with chemical rules. The training process of GraphAF is two times faster than the existing state-of-the-art approach GCPN. After fine-tuning the model for goal-directed property optimization with reinforcement learning, GraphAF achieves state-of-the-art performance on both chemical property optimization and constrained property optimization.

218 citations
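
The valency checking mentioned in the GraphAF abstract above relies on the sampling process being iterative: each proposed bond can be tested before it is added. The sketch below shows only that check, under simplifying assumptions (neutral atoms, a small valence table, a fixed list of proposals instead of a learned model). The function names are hypothetical and are not taken from the GraphAF code.

```python
# Usual maximum valences for a few neutral atoms (formal charges ignored).
MAX_VALENCE = {"C": 4, "N": 3, "O": 2, "F": 1}


def used_valence(atom_index, bonds):
    """Sum of bond orders already attached to one atom."""
    return sum(order for a, b, order in bonds if atom_index in (a, b))


def bond_is_valid(atoms, bonds, a, b, order):
    """True if adding the bond keeps both endpoints within their valence."""
    return all(
        used_valence(k, bonds) + order <= MAX_VALENCE[atoms[k]]
        for k in (a, b)
    )


def add_bonds_with_check(atoms, proposals):
    """Accept proposed bonds one at a time, rejecting invalid ones."""
    bonds = []
    for a, b, order in proposals:
        if bond_is_valid(atoms, bonds, a, b, order):
            bonds.append((a, b, order))
    return bonds


atoms = ["C", "O", "O", "F"]
proposals = [(0, 1, 2), (0, 2, 2), (0, 3, 1), (1, 2, 1)]
accepted = add_bonds_with_check(atoms, proposals)
```

Here the two C=O double bonds are accepted, while the C-F and O-O proposals are rejected because they would exceed the valence of carbon and oxygen respectively.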


Proceedings ArticleDOI
30 Apr 2020
TL;DR: This work introduces scientific claim verification, a new task to select abstracts from the research literature containing evidence that supports or refutes a given scientific claim, and to identify rationales justifying each decision.
Abstract: We introduce scientific claim verification, a new task to select abstracts from the research literature containing evidence that SUPPORTS or REFUTES a given scientific claim, and to identify rationales justifying each decision. To study this task, we construct SciFact, a dataset of 1.4K expert-written scientific claims paired with evidence-containing abstracts annotated with labels and rationales. We develop baseline models for SciFact, and demonstrate that simple domain adaptation techniques substantially improve performance compared to models trained on Wikipedia or political news. We show that our system is able to verify claims related to COVID-19 by identifying evidence from the CORD-19 corpus. Our experiments indicate that SciFact will provide a challenging testbed for the development of new systems designed to retrieve and reason over corpora containing specialized domain knowledge. Data and code for this new task are publicly available at https://github.com/allenai/scifact. A leaderboard and COVID-19 fact-checking demo are available at https://scifact.apps.allenai.org.

214 citations


Journal ArticleDOI
TL;DR: A Python package entitled TSFEL, which computes over 60 different features extracted across temporal, statistical and spectral domains, designed to support the process of fast exploratory data analysis and feature extraction on time series with computational cost evaluation.

192 citations
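
To make the three feature domains in the TL;DR above concrete, the following sketch computes one feature from each using only the Python standard library. It does not use or reproduce the TSFEL API; the function names and the choice of features are assumptions for illustration.

```python
import cmath
import math
import statistics


def mean_abs_diff(x):
    """Temporal feature: mean absolute difference of consecutive samples."""
    return sum(abs(b - a) for a, b in zip(x, x[1:])) / (len(x) - 1)


def dominant_frequency(x, fs):
    """Spectral feature: frequency (Hz) of the largest non-DC DFT bin."""
    n = len(x)
    best_k, best_mag = 0, -1.0
    for k in range(1, n // 2 + 1):
        # Naive O(n^2) discrete Fourier transform, fine for short signals.
        coeff = sum(
            v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(x)
        )
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return best_k * fs / n


def extract_features(x, fs):
    """Collect one temporal, one statistical, and one spectral feature."""
    return {
        "mean_abs_diff": mean_abs_diff(x),
        "std": statistics.pstdev(x),
        "dominant_frequency_hz": dominant_frequency(x, fs),
    }


fs = 50  # sampling rate in Hz (illustrative value)
signal = [math.sin(2 * math.pi * 5 * t / fs) for t in range(50)]
features = extract_features(signal, fs)
```

For a 5 Hz sine sampled at 50 Hz over one second, the dominant frequency comes out as 5 Hz and the standard deviation is close to 1/sqrt(2).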


Proceedings ArticleDOI
Mengxue Li, Yi-Ming Zhai, You-Wei Luo, Pengfei Ge, Chuan-Xian Ren
14 Jun 2020
TL;DR: This work proposes an enhanced transport distance (ETD) for UDA, which builds an attention-aware transport distance, which can be viewed as the prediction feedback of the iteratively learned classifier, to measure the domain discrepancy.
Abstract: Unsupervised domain adaptation (UDA) is a representative problem in transfer learning, which aims to improve the classification performance on an unlabeled target domain by exploiting discriminant information from a labeled source domain. The optimal transport model has been used for UDA in the perspective of distribution matching. However, the transport distance cannot reflect the discriminant information from either domain knowledge or category prior. In this work, we propose an enhanced transport distance (ETD) for UDA. This method builds an attention-aware transport distance, which can be viewed as the prediction feedback of the iteratively learned classifier, to measure the domain discrepancy. Further, the Kantorovich potential variable is re-parameterized by deep neural networks to learn the distribution in the latent space. The entropy-based regularization is developed to explore the intrinsic structure of the target domain. The proposed method is optimized alternately in an end-to-end manner. Extensive experiments are conducted on four benchmark datasets to demonstrate the SOTA performance of ETD.

147 citations


Posted Content
TL;DR: A comprehensive review of the leading approaches for combining model-based algorithms with deep learning in a systematic manner is provided, along with concrete guidelines and detailed signal processing oriented examples from recent literature.
Abstract: Signal processing, communications, and control have traditionally relied on classical statistical modeling techniques. Such model-based methods utilize mathematical formulations that represent the underlying physics, prior information and additional domain knowledge. Simple classical models are useful but sensitive to inaccuracies and may lead to poor performance when real systems display complex or dynamic behavior. On the other hand, purely data-driven approaches that are model-agnostic are becoming increasingly popular as datasets become abundant and the power of modern deep learning pipelines increases. Deep neural networks (DNNs) use generic architectures which learn to operate from data, and demonstrate excellent performance, especially for supervised problems. However, DNNs typically require massive amounts of data and immense computational resources, limiting their applicability for some signal processing scenarios. We are interested in hybrid techniques that combine principled mathematical models with data-driven systems to benefit from the advantages of both approaches. Such model-based deep learning methods exploit both partial domain knowledge, via mathematical structures designed for specific problems, as well as learning from limited data. In this article we survey the leading approaches for studying and designing model-based deep learning systems. We divide hybrid model-based/data-driven systems into categories based on their inference mechanism. We provide a comprehensive review of the leading approaches for combining model-based algorithms with deep learning in a systematic manner, along with concrete guidelines and detailed signal processing oriented examples from recent literature. Our aim is to facilitate the design and study of future systems on the intersection of signal processing and machine learning that incorporate the advantages of both domains.

120 citations


Proceedings ArticleDOI
27 Jan 2020
TL;DR: This paper focuses on explaining Doctor AI, a multilabel classifier which takes as input the clinical history of a patient in order to predict the next visit, and shows how exploiting the temporal dimension in the data and the domain knowledge encoded in the medical ontology improves the quality of the mined explanations.
Abstract: Several recent advancements in Machine Learning involve black-box models: algorithms that do not provide human-understandable explanations in support of their decisions. This limitation hampers the fairness, accountability and transparency of these models; the field of eXplainable Artificial Intelligence (XAI) tries to solve this problem by providing human-understandable explanations for black-box models. However, healthcare datasets (and the related learning tasks) often present peculiar features, such as sequential data, multi-label predictions, and links to structured background knowledge. In this paper, we introduce Doctor XAI, a model-agnostic explainability technique able to deal with multi-labeled, sequential, ontology-linked data. We focus on explaining Doctor AI, a multilabel classifier which takes as input the clinical history of a patient in order to predict the next visit. Furthermore, we show how exploiting the temporal dimension in the data and the domain knowledge encoded in the medical ontology improves the quality of the mined explanations.

113 citations


Proceedings ArticleDOI
TL;DR: In this paper, the authors study the effect of showing confidence score and local explanation for a particular prediction for AI-assisted decision making, and show that confidence score can help calibrate people's trust in an AI model, but trust calibration alone is not sufficient to improve the joint performance of the human and AI.
Abstract: Today, AI is being increasingly used to help human experts make decisions in high-stakes scenarios. In these scenarios, full automation is often undesirable, not only due to the significance of the outcome, but also because human experts can draw on their domain knowledge complementary to the model's to ensure task success. We refer to these scenarios as AI-assisted decision making, where the individual strengths of the human and the AI come together to optimize the joint decision outcome. A key to their success is to appropriately calibrate human trust in the AI on a case-by-case basis; knowing when to trust or distrust the AI allows the human expert to appropriately apply their knowledge, improving decision outcomes in cases where the model is likely to perform poorly. This research conducts a case study of AI-assisted decision making in which humans and AI have comparable performance alone, and explores whether features that reveal case-specific model information can calibrate trust and improve the joint performance of the human and AI. Specifically, we study the effect of showing confidence score and local explanation for a particular prediction. Through two human experiments, we show that confidence score can help calibrate people's trust in an AI model, but trust calibration alone is not sufficient to improve AI-assisted decision making, which may also depend on whether the human can bring in enough unique knowledge to complement the AI's errors. We also highlight the problems in using local explanation for AI-assisted decision making scenarios and invite the research community to explore new approaches to explainability for calibrating human trust in AI.

Proceedings Article
30 Apr 2020
TL;DR: Differentiable Digital Signal Processing (DDSP) enables an interpretable and modular approach to generative modeling without sacrificing the benefits of deep learning, and permits manipulation of each separate model component, with applications such as independent control of pitch and loudness.
Abstract: Most generative models of audio directly generate samples in one of two domains: time or frequency. While sufficient to express any signal, these representations are inefficient, as they do not utilize existing knowledge of how sound is generated and perceived. A third approach (vocoders/synthesizers) successfully incorporates strong domain knowledge of signal processing and perception, but has been less actively researched due to limited expressivity and difficulty integrating with modern auto-differentiation-based machine learning methods. In this paper, we introduce the Differentiable Digital Signal Processing (DDSP) library, which enables direct integration of classic signal processing elements with deep learning methods. Focusing on audio synthesis, we achieve high-fidelity generation without the need for large autoregressive models or adversarial losses, demonstrating that DDSP enables utilizing strong inductive biases without losing the expressive power of neural networks. Further, we show that combining interpretable modules permits manipulation of each separate model component, with applications such as independent control of pitch and loudness, realistic extrapolation to pitches not seen during training, blind dereverberation of room acoustics, transfer of extracted room acoustics to new environments, and transformation of timbre between disparate sources. In short, DDSP enables an interpretable and modular approach to generative modeling, without sacrificing the benefits of deep learning. The library will be made available upon paper acceptance and we encourage further contributions from the community and domain experts.

Posted Content
Jiaxi Tang, Rakesh Shivanna, Zhe Zhao, Dong Lin, Anima Singh, Ed H. Chi, Sagar Jain
TL;DR: This paper dissects the effects of knowledge distillation into three main factors: (1) benefits inherited from label smoothing, (2) example re-weighting based on teacher's confidence on ground-truth, and (3) prior knowledge of optimal output (logit) layer geometry.
Abstract: Knowledge Distillation (KD) is a model-agnostic technique to improve model quality while having a fixed capacity budget. It is a commonly used technique for model compression, where a larger capacity teacher model with better quality is used to train a more compact student model with better inference efficiency. Through distillation, one hopes to benefit from student's compactness, without sacrificing too much on model quality. Despite the large success of knowledge distillation, better understanding of how it benefits student model's training dynamics remains under-explored. In this paper, we categorize teacher's knowledge into three hierarchical levels and study its effects on knowledge distillation: (1) knowledge of the `universe', where KD brings a regularization effect through label smoothing; (2) domain knowledge, where teacher injects class relationships prior to student's logit layer geometry; and (3) instance specific knowledge, where teacher rescales student model's per-instance gradients based on its measurement on the event difficulty. Using systematic analyses and extensive empirical studies on both synthetic and real-world datasets, we confirm that the aforementioned three factors play a major role in knowledge distillation. Furthermore, based on our findings, we diagnose some of the failure cases of applying KD from recent studies.
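
For readers unfamiliar with the teacher-student setup described in the abstract above, the sketch below shows a common form of the distillation objective: a weighted sum of the usual cross-entropy on the true label and a temperature-softened KL term between teacher and student. The abstract does not state a loss formula, so this formulation, the `alpha` and `temperature` parameters, and the function names are assumptions, not the authors' exact setup.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, label, alpha=0.5, temperature=2.0):
    """Weighted sum of hard-label cross-entropy and soft-target KL."""
    # Hard term: cross-entropy of the student against the true label.
    hard = -math.log(softmax(student_logits)[label])

    # Soft term: KL(teacher || student) on temperature-softened outputs.
    q_teacher = softmax(teacher_logits, temperature)
    q_student = softmax(student_logits, temperature)
    soft = sum(
        t * math.log(t / s) for t, s in zip(q_teacher, q_student) if t > 0
    )

    # The T^2 factor keeps the soft term's gradient scale comparable.
    return (1 - alpha) * hard + alpha * temperature ** 2 * soft


loss = kd_loss([2.0, 0.5, -1.0], [3.0, 1.0, -2.0], label=0)
```

Raising the temperature flattens the teacher distribution, so more of the teacher's information about relationships between classes reaches the student.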

Journal ArticleDOI
TL;DR: In this article, the authors consider the challenges and opportunities in the retail sector from five perspectives: retail managers, retailing researchers, public-policy makers, investors, and retailing educators.

Proceedings ArticleDOI
23 Apr 2020
TL;DR: As a step towards the goal of automated damage detection, preliminary results are presented from dynamic modelling of beam structures using physics-informed artificial neural networks and a sensing paradigm for non-contact full-field measurements for damage diagnosis is presented.
Abstract: A physics-based approach to structural health monitoring (SHM) has practical shortcomings which restrict its suitability to simple structures under well controlled environments. With the advances in information and sensing technology (sensors and sensor networks), it has become feasible to monitor a large and diverse number of parameters in complex real-world structures either continuously or intermittently by employing large in-situ (wireless) sensor networks. The availability of this historical data has engendered a lot of interest in a data-driven approach as a natural and more viable option for realizing the goal of SHM in such structures. However, the lack of sensor data corresponding to different damage scenarios continues to remain a challenge. Most of the supervised machine-learning/deep-learning techniques, when trained using this inherently limited data, lack robustness and generalizability. Physics-informed learning, which involves the integration of domain knowledge into the learning process, is presented here as a potential remedy to this challenge. As a step towards the goal of automated damage detection (mathematically an inverse problem), preliminary results are presented from dynamic modelling of beam structures using physics-informed artificial neural networks. Forward and inverse problems involving partial differential equations are solved and comparisons reveal a clear superiority of the physics-informed approach over one that is purely data-driven with respect to overfitting and generalization. Other ways of incorporating domain knowledge into the machine learning pipeline are then presented through case studies on various aspects of NDI/SHM (visual inspection, impact diagnosis). Lastly, as the final attribute of an optimal SHM approach, a sensing paradigm for non-contact full-field measurements for damage diagnosis is presented.

Journal ArticleDOI
TL;DR: The proposed automated classification model and LDA-based network analysis method provide a useful approach to enable machine-assisted interpretation of text-based accident narratives and can provide managers with much-needed information and knowledge to improve safety on-site.

Proceedings ArticleDOI
01 Jul 2020
TL;DR: This paper proposes Adaptive Sequence Partitioner with Power-law Attention (ASPPA) to automatically identify each semantic subsequence of POIs and discover their sequential patterns in the user’s check-in sequence.
Abstract: Next Point-of-Interest (POI) recommendation plays an important role in location-based services. State-of-the-art methods learn the POI-level sequential patterns in the user's check-in sequence but ignore the subsequence patterns that often represent the socio-economic activities or coherence of preference of the users. However, it is challenging to integrate the semantic subsequences due to the difficulty to predefine the granularity of the complex but meaningful subsequences. In this paper, we propose Adaptive Sequence Partitioner with Power-law Attention (ASPPA) to automatically identify each semantic subsequence of POIs and discover their sequential patterns. Our model adopts a state-based stacked recurrent neural network to hierarchically learn the latent structures of the user's check-in sequence. We also design a power-law attention mechanism to integrate the domain knowledge in spatial and temporal contexts. Extensive experiments on two real-world datasets demonstrate the effectiveness of our model.

Proceedings ArticleDOI
14 Jun 2020
TL;DR: This paper introduces the RL-scene consistency loss for image translation, which ensures that the translation operation is invariant with respect to the Q-values associated with the image, and incorporates it into unsupervised domain translation to obtain RL-CycleGAN.
Abstract: Deep neural network based reinforcement learning (RL) can learn appropriate visual representations for complex tasks like vision-based robotic grasping without the need for manually engineering or prior learning a perception system. However, data for RL is collected via running an agent in the desired environment, and for applications like robotics, running a robot in the real world may be extremely costly and time consuming. Simulated training offers an appealing alternative, but ensuring that policies trained in simulation can transfer effectively into the real world requires additional machinery. Simulations may not match reality, and typically bridging the simulation-to-reality gap requires domain knowledge and task-specific engineering. We can automate this process by employing generative models to translate simulated images into realistic ones. However, this sort of translation is typically task-agnostic, in that the translated images may not preserve all features that are relevant to the task. In this paper, we introduce the RL-scene consistency loss for image translation, which ensures that the translation operation is invariant with respect to the Q-values associated with the image. This allows us to learn a task-aware translation. Incorporating this loss into unsupervised domain translation, we obtain the RL-CycleGAN, a new approach for simulation-to-real-world transfer for reinforcement learning. In evaluations of RL-CycleGAN on two vision-based robotics grasping tasks, we show that RL-CycleGAN offers a substantial improvement over a number of prior methods for sim-to-real transfer, attaining excellent real-world performance with only a modest number of real-world observations.

Journal ArticleDOI
01 Jul 2020
TL;DR: The experimental results demonstrate that the proposed approach correctly identifies single and recurrent anomalies without any prior knowledge of their characteristics, outperforming by a large margin several competing approaches in accuracy, while being up to orders of magnitude faster.
Abstract: Subsequence anomaly detection in long sequences is an important problem with applications in a wide range of domains. However, the approaches that have been proposed so far in the literature have severe limitations: they either require prior domain knowledge that is used to design the anomaly discovery algorithms, or become cumbersome and expensive to use in situations with recurrent anomalies of the same type. In this work, we address these problems, and propose an unsupervised method suitable for domain agnostic subsequence anomaly detection. Our method, Series2Graph, is based on a graph representation of a novel low-dimensionality embedding of subsequences. Series2Graph needs neither labeled instances (like supervised techniques), nor anomaly-free data (like zero-positive learning techniques), and identifies anomalies of varying lengths. The experimental results, on the largest set of synthetic and real datasets used to date, demonstrate that the proposed approach correctly identifies single and recurrent anomalies without any prior knowledge of their characteristics, outperforming by a large margin several competing approaches in accuracy, while being up to orders of magnitude faster.

Posted Content
TL;DR: A fixed ratio-based mixup is introduced to augment multiple intermediate domains between the source and target domain and gradually transfer domain knowledge from the source to the target domain.
Abstract: Unsupervised domain adaptation (UDA) methods for learning domain invariant representations have achieved remarkable progress. However, few studies have been conducted on the case of large domain discrepancies between a source and a target domain. In this paper, we propose a UDA method that effectively handles such large domain discrepancies. We introduce a fixed ratio-based mixup to augment multiple intermediate domains between the source and target domain. From the augmented domains, we train the source-dominant model and the target-dominant model that have complementary characteristics. Using our confidence-based learning methodologies, e.g., bidirectional matching with high-confidence predictions and self-penalization using low-confidence predictions, the models can learn from each other or from their own results. Through our proposed methods, the models gradually transfer domain knowledge from the source to the target domain. Extensive experiments demonstrate the superiority of our proposed method on three public benchmarks: Office-31, Office-Home, and VisDA-2017.
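
The fixed ratio-based mixup described in the abstract above can be sketched as a convex combination of a source sample and a target sample with a ratio that is fixed in advance rather than sampled. The abstract does not give the ratios, so the values 0.75 and 0.25 below are illustrative assumptions, as is the function name.

```python
def fixed_ratio_mixup(x_source, x_target, ratio):
    """Blend two equal-length feature vectors with a fixed mixing ratio.

    ratio close to 1 gives a source-dominant sample, close to 0 a
    target-dominant one.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    if len(x_source) != len(x_target):
        raise ValueError("inputs must have the same length")
    return [ratio * s + (1.0 - ratio) * t for s, t in zip(x_source, x_target)]


x_s = [1.0, 0.0]   # toy source-domain sample
x_t = [0.0, 1.0]   # toy target-domain sample

source_dominant = fixed_ratio_mixup(x_s, x_t, 0.75)
target_dominant = fixed_ratio_mixup(x_s, x_t, 0.25)
```

Using two fixed ratios yields two intermediate domains, one nearer the source and one nearer the target, which matches the abstract's description of training a source-dominant and a target-dominant model.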

Posted Content
TL;DR: Experiments show that the proposed role-oriented MARL framework (ROMA) can learn specialized, dynamic, and identifiable roles, which help the method push forward the state of the art on the StarCraft II micromanagement benchmark.
Abstract: The role concept provides a useful tool to design and understand complex multi-agent systems, which allows agents with a similar role to share similar behaviors. However, existing role-based methods use prior domain knowledge and predefine role structures and behaviors. In contrast, multi-agent reinforcement learning (MARL) provides flexibility and adaptability, but less efficiency in complex tasks. In this paper, we synergize these two paradigms and propose a role-oriented MARL framework (ROMA). In this framework, roles are emergent, and agents with similar roles tend to share their learning and to be specialized on certain sub-tasks. To this end, we construct a stochastic role embedding space by introducing two novel regularizers and conditioning individual policies on roles. Experiments show that our method can learn specialized, dynamic, and identifiable roles, which help our method push forward the state of the art on the StarCraft II micromanagement benchmark. Demonstrative videos are available at this https URL.

Proceedings ArticleDOI
17 Mar 2020
TL;DR: This paper provides a high-level introduction to meta-learning, which offers a way to automate the selection of an inductive bias, with applications to communication systems.
Abstract: Machine learning methods adapt the parameters of a model, constrained to lie in a given model class, by using a fixed learning procedure based on data or active observations. Adaptation is done on a per-task basis, and retraining is needed when the system configuration changes. The resulting inefficiency in terms of data and training time requirements can be mitigated, if domain knowledge is available, by selecting a suitable model class and learning procedure, collectively known as inductive bias. However, it is generally difficult to encode prior knowledge into an inductive bias, particularly with black-box model classes such as neural networks. Meta-learning provides a way to automatize the selection of an inductive bias. Meta-learning leverages data or active observations from tasks that are expected to be related to future, and a priori unknown, tasks of interest. With a meta-trained inductive bias, training of a machine learning model can be potentially carried out with reduced training data and/or time complexity. This paper provides a high-level introduction to meta-learning with applications to communication systems.

Proceedings ArticleDOI
23 Aug 2020
TL;DR: This paper designs a novel search space tailored for ST-domain which consists of two categories of components: optional convolution operations at each layer to automatically extract multi-range spatio-temporal dependencies and learnable skip connections among layers to dynamically fuse low- and high-level ST-features.
Abstract: Spatio-temporal (ST) prediction (e.g., crowd flow prediction) is of great importance in a wide range of smart city applications, such as urban planning, intelligent transportation, and public safety. Recently, many deep neural network models have been proposed to make accurate predictions. However, manually designing neural networks requires a considerable amount of expert effort and ST domain knowledge. How can a general neural network be constructed automatically for diverse spatio-temporal prediction tasks in cities? In this paper, we study Neural Architecture Search (NAS) for spatio-temporal prediction and propose an efficient spatio-temporal neural architecture search method, entitled AutoST. To the best of our knowledge, the search space is an important human prior for the success of NAS in different applications, while current NAS models concentrate on optimizing the search strategy in a fixed search space. Thus, we design a novel search space tailored for the ST domain, which consists of two categories of components: (i) optional convolution operations at each layer to automatically extract multi-range spatio-temporal dependencies; (ii) learnable skip connections among layers to dynamically fuse low- and high-level ST features. We conduct extensive experiments on four real-world spatio-temporal prediction tasks, including taxi flow and crowd flow, showing that the learned network architectures can significantly improve the performance of representative ST neural network models. Furthermore, our proposed efficient NAS approach searches 8-10x faster than state-of-the-art NAS approaches, demonstrating the efficiency and effectiveness of AutoST.

Posted Content
TL;DR: The RL-CycleGAN, a new approach for simulation-to-real-world transfer for reinforcement learning, is obtained by incorporating the RL-scene consistency loss into unsupervised domain translation, which ensures that the translation operation is invariant with respect to the Q-values associated with the image.
Abstract: Deep neural network based reinforcement learning (RL) can learn appropriate visual representations for complex tasks like vision-based robotic grasping without the need for manually engineering or prior learning a perception system. However, data for RL is collected via running an agent in the desired environment, and for applications like robotics, running a robot in the real world may be extremely costly and time consuming. Simulated training offers an appealing alternative, but ensuring that policies trained in simulation can transfer effectively into the real world requires additional machinery. Simulations may not match reality, and typically bridging the simulation-to-reality gap requires domain knowledge and task-specific engineering. We can automate this process by employing generative models to translate simulated images into realistic ones. However, this sort of translation is typically task-agnostic, in that the translated images may not preserve all features that are relevant to the task. In this paper, we introduce the RL-scene consistency loss for image translation, which ensures that the translation operation is invariant with respect to the Q-values associated with the image. This allows us to learn a task-aware translation. Incorporating this loss into unsupervised domain translation, we obtain RL-CycleGAN, a new approach for simulation-to-real-world transfer for reinforcement learning. In evaluations of RL-CycleGAN on two vision-based robotics grasping tasks, we show that RL-CycleGAN offers a substantial improvement over a number of prior methods for sim-to-real transfer, attaining excellent real-world performance with only a modest number of real-world observations.
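
The invariance idea behind the RL-scene consistency loss can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the linear `q_value` function, the flat-list "images", and the two `translate_*` functions are hypothetical stand-ins, and the loss shown is a simplified single-term version rather than the paper's full formulation. It only shows that a translation which leaves the Q-values unchanged incurs no penalty, while one that alters task-relevant pixels does.

```python
def q_value(image, action, weights):
    """Toy Q-function (assumed): a linear score of the pixels, scaled by the action."""
    return action * sum(w * p for w, p in zip(weights, image))


def consistency_loss(images, actions, translate, weights):
    """Mean squared gap between Q(image, a) and Q(translate(image), a)."""
    total = 0.0
    for image, action in zip(images, actions):
        gap = q_value(image, action, weights) - q_value(translate(image), action, weights)
        total += gap * gap
    return total / len(images)


# Two-pixel "images": pixel 0 is task-relevant, pixel 1 is background.
images = [[0.0, 1.0], [1.0, 0.0]]
actions = [1.0, 2.0]
weights = [1.0, 0.0]  # the toy Q-function ignores the background pixel


def translate_background(image):
    """Hypothetical translator that only restyles the background pixel."""
    return [image[0], image[1] + 0.5]


def translate_object(image):
    """Hypothetical translator that distorts the task-relevant pixel."""
    return [image[0] + 0.5, image[1]]


task_aware = consistency_loss(images, actions, translate_background, weights)
task_agnostic = consistency_loss(images, actions, translate_object, weights)
print(task_aware, task_agnostic)
```

In this toy setting the background-only translation yields zero loss, so a translator trained against such a penalty is pushed toward changes that preserve what the Q-function depends on.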

Journal ArticleDOI
01 Aug 2020
TL;DR: The proposed approach is found to be better than existing machine learning based models which used hand-crafted features and can be easily applied to any kind of review as the features are calculated only from the review text and not from other domain knowledge.
Abstract: Smart cities aim to provide their citizens with an infrastructure that reduces both their time and effort. An example of such an infrastructure is electronic shopping. Electronic shopping has become a hotbed for many customers, as it is easier to judge the quality of a product based on the review information. The purpose of this study is to predict the most helpful online product review, out of the several thousand reviews available for the product, using review representation learning. The prediction is done using a two-layered convolutional neural network model. The review texts are embedded into low-dimensional vectors using a pre-trained model. To learn the best features of the review text, three filters are used to learn tri-gram, four-gram, and five-gram features of the text. The proposed approach is found to be better than existing machine learning based models that use hand-crafted features. The very low value of mean squared error confirms the prediction accuracy of the proposed method. The proposed method can be easily applied to any kind of review, as the features are calculated only from the review text and not from other domain knowledge. The proposed model helps in predicting the helpfulness score of new reviews as soon as they are posted on the product review page.
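
As a rough illustration of what tri-gram, four-gram, and five-gram filters compute, the sketch below slides one filter of each width over a sequence of token embeddings and keeps the strongest response. It is a minimal single-layer sketch under stated assumptions, not the paper's two-layered model: the function names, the random embeddings and filter weights, the ReLU activation, and the max-over-time pooling are all choices made here for illustration.

```python
import random


def conv1d_max(embeddings, kernel):
    """Apply one n-gram filter at every position, then ReLU and max-pool over time."""
    width = len(kernel)
    best = 0.0  # ReLU floor; also the result when the text is shorter than the filter
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        # Dot product between the filter and a window of `width` token vectors.
        activation = sum(
            w * x
            for kernel_row, token_vec in zip(kernel, window)
            for w, x in zip(kernel_row, token_vec)
        )
        best = max(best, activation)
    return best


def ngram_features(embeddings, kernels):
    """One pooled feature per filter (here: widths 3, 4, and 5)."""
    return [conv1d_max(embeddings, kernel) for kernel in kernels]


rng = random.Random(0)
dim = 4  # assumed embedding size
tokens = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(7)]
kernels = [
    [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(width)]
    for width in (3, 4, 5)
]
features = ngram_features(tokens, kernels)
print(len(features))
```

In a trained model the filter weights would be learned and the pooled features would feed a regression layer that outputs the helpfulness score.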

Journal ArticleDOI
TL;DR: Eight deep-learning models, three based on convolutional neural networks and five based on recurrent neural networks, with two types of input structures, i.e., word level and character level, are compared for 13 review datasets and the classification performances are discussed under different perspectives.
Abstract: The purpose of sentiment classification is to determine whether a particular document has a positive or negative nuance. Sentiment classification is extensively used in many business domains to improve products or services by understanding the opinions of customers regarding these products. Deep learning achieves state-of-the-art results in various challenging domains. With the success of deep learning, many studies have proposed deep-learning-based sentiment classification models and achieved better performances compared with conventional machine learning models. However, one practical issue occurring in deep-learning-based sentiment classification is that the best model structure depends on the characteristics of the dataset on which the deep learning model is trained; moreover, it is manually determined based on the domain knowledge of an expert or selected from a grid search of possible candidates. Herein, we present a comparative study of different deep-learning-based sentiment classification model structures to derive meaningful implications for building sentiment classification models. Specifically, eight deep-learning models, three based on convolutional neural networks and five based on recurrent neural networks, with two types of input structures, i.e., word level and character level, are compared for 13 review datasets, and the classification performances are discussed under different perspectives.

Proceedings ArticleDOI
20 Jan 2020
TL;DR: This work proposes (basic) PCS inference for reliability measures on data results, extending statistical inference to a much broader scope as current data science practice entails, and proposes PCS documentation based on R Markdown or Jupyter Notebook to back up human choices made throughout an analysis.
Abstract: Veridical data science extracts reliable and reproducible information from data, with an enriched technical language to communicate and evaluate empirical evidence in the context of human decisions and domain knowledge. Building and expanding on principles of statistics, machine learning, and the sciences, we propose the predictability, computability, and stability (PCS) framework for veridical data science. Our framework is comprised of both a workflow and documentation and aims to provide responsible, reliable, reproducible, and transparent results across the entire data science life cycle. Moreover, we propose the PDR desiderata for interpretable machine learning as part of veridical data science (with PDR standing for predictive accuracy, descriptive accuracy, and relevancy to a human audience and a particular domain problem). The PCS framework will be illustrated through the development of the DeepTune framework for characterizing V4 neurons. DeepTune builds predictive models using DNNs and ridge regression and applies the stability principle to obtain stable interpretations of 18 predictive models. Finally, a general DNN interpretation method based on contextual decomposition (CD) will be discussed with applications to sentiment analysis and cosmological parameter estimation.

Journal ArticleDOI
TL;DR: This paper proposes a relationship extraction method for domain knowledge graph construction based on an improved cross-entropy loss function and obtains upper and lower relationships from structured data in the classification system of network encyclopedia and semi-structured data in the classification labels of web pages.
Abstract: As a semantic knowledge base, a knowledge graph is a powerful tool for managing large-scale knowledge consisting of instances, concepts, and the relationships between them. Because existing domain knowledge graphs cannot obtain relationships in various structures through targeted approaches during construction, which results in insufficient knowledge utilization, this paper proposes a relationship extraction method for domain knowledge graph construction. We obtain upper and lower relationships from structured data in the classification system of network encyclopedia and semi-structured data in the classification labels of web pages, and non-superordinate relationships are extracted from unstructured text through the proposed convolution residual network based on an improved cross-entropy loss function. We verify the effectiveness of the designed method by comparing it with existing relationship extraction methods and by constructing a food domain knowledge graph.

Proceedings ArticleDOI
Dongxu Li, Xin Yu, Chenchen Xu, Lars Petersson, Hongdong Li
14 Jun 2020
TL;DR: In this paper, a novel method that learns domain-invariant visual concepts and fertilizes WSLR models by transferring knowledge of subtitled news sign to them is proposed.
Abstract: Word-level sign language recognition (WSLR) is a fundamental task in sign language interpretation. It requires models to recognize isolated sign words from videos. However, annotating WSLR data needs expert knowledge, thus limiting WSLR dataset acquisition. On the contrary, there are abundant subtitled sign news videos on the internet. Since these videos have no word-level annotation and exhibit a large domain gap from isolated signs, they cannot be directly used for training WSLR models. We observe that despite the existence of a large domain gap, isolated and news signs share the same visual concepts, such as hand gestures and body movements. Motivated by this observation, we propose a novel method that learns domain-invariant visual concepts and fertilizes WSLR models by transferring knowledge of subtitled news sign to them. To this end, we extract news signs using a base WSLR model, and then design a classifier jointly trained on news and isolated signs to coarsely align these two domain features. In order to learn domain-invariant features within each class and suppress domain-specific features, our method further resorts to an external memory to store the class centroids of the aligned news signs. We then design a temporal attention based on the learnt descriptor to improve recognition performance. Experimental results on standard WSLR datasets show that our method outperforms previous state-of-the-art methods significantly. We also demonstrate the effectiveness of our method on automatically localizing signs from sign news, achieving 28.1 for AP@0.5.

Journal ArticleDOI
03 Apr 2020
TL;DR: This work designs ActiveThief – a model extraction framework for deep neural networks that makes use of active learning techniques and unannotated public datasets to perform model extraction, and demonstrates that it is possible to use it to extract deep classifiers trained on a variety of datasets from image and text domains.
Abstract: Machine learning models are increasingly being deployed in practice. Machine Learning as a Service (MLaaS) providers expose such models to queries by third-party developers through application programming interfaces (APIs). Prior work has developed model extraction attacks, in which an attacker extracts an approximation of an MLaaS model by making black-box queries to it. We design ActiveThief – a model extraction framework for deep neural networks that makes use of active learning techniques and unannotated public datasets to perform model extraction. It does not expect strong domain knowledge or access to annotated data on the part of the attacker. We demonstrate that (1) it is possible to use ActiveThief to extract deep classifiers trained on a variety of datasets from image and text domains, while querying the model with as few as 10-30% of samples from public datasets, (2) the resulting model exhibits a higher transferability success rate of adversarial examples than prior work, and (3) the attack evades detection by the state-of-the-art model extraction detection method, PRADA.
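
To make the role of active learning in such an attack concrete, the sketch below shows uncertainty sampling, one common active learning strategy. The abstract does not say which strategies ActiveThief uses, so this is an assumption made for illustration; the function names and the toy probability table are hypothetical. The idea is to spend a limited query budget on the unannotated public samples about which the attacker's current substitute model is least certain.

```python
import math


def entropy(probs):
    """Shannon entropy of a predicted class distribution (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_queries(pool_probs, budget):
    """Return indices of the `budget` pool samples with the highest entropy."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:budget]


# Substitute-model predictions on four unlabeled public samples (toy values).
pool_probs = [
    [0.9, 0.1],
    [0.5, 0.5],
    [0.6, 0.4],
    [1.0, 0.0],
]
chosen = select_queries(pool_probs, budget=2)
print(chosen)  # → [1, 2]
```

In a full extraction loop, the chosen samples would be sent to the victim API, the returned labels added to the training set, and the substitute model retrained before the next selection round.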