Showing papers on "Domain knowledge" published in 2020

Journal Article•DOI•

K-BERT: Enabling Language Representation with Knowledge Graph

Weijie Liu¹, Peng Zhou², Zhe Zhao², Zhiruo Wang³, Qi Ju², Haotang Deng², Ping Wang²

Peking University¹, Tencent², Beijing Normal University³

03 Apr 2020

TL;DR: This work proposes a knowledge-enabled language representation model (K-BERT) with knowledge graphs (KGs), in which triples are injected into the sentences as domain knowledge. K-BERT significantly outperforms BERT on domain-specific tasks and shows promising results across twelve NLP tasks.

Abstract: Pre-trained language representation models, such as BERT, capture a general language representation from large-scale corpora, but lack domain-specific knowledge. When reading a domain text, experts make inferences with relevant knowledge. For machines to achieve this capability, we propose a knowledge-enabled language representation model (K-BERT) with knowledge graphs (KGs), in which triples are injected into the sentences as domain knowledge. However, too much knowledge incorporation may divert the sentence from its correct meaning, which is called knowledge noise (KN) issue. To overcome KN, K-BERT introduces soft-position and visible matrix to limit the impact of knowledge. K-BERT can easily inject domain knowledge into the models by being equipped with a KG without pre-training by itself because it is capable of loading model parameters from the pre-trained BERT. Our investigation reveals promising results in twelve NLP tasks. Especially in domain-specific tasks (including finance, law, and medicine), K-BERT significantly outperforms BERT, which demonstrates that K-BERT is an excellent choice for solving the knowledge-driven problems that require experts.

516 citations

Journal Article•DOI•

Explainable Machine Learning for Scientific Insights and Discoveries

Ribana Roscher¹, Bastian Bohn², Marco F. Duarte³, Jochen Garcke²

University of Osnabrück¹, University of Bonn², University of Massachusetts Amherst³

24 Feb 2020-IEEE Access

TL;DR: In this paper, the authors provide a survey of recent scientific works that incorporate machine learning and the way that explainable machine learning is used in combination with domain knowledge from the application areas.

Abstract: Machine learning methods have been remarkably successful for a wide range of application areas in the extraction of essential information from data. An exciting and relatively recent development is the uptake of machine learning in the natural sciences, where the major goal is to obtain novel scientific insights and discoveries from observational or simulated data. A prerequisite for obtaining a scientific outcome is domain knowledge, which is needed to gain explainability, but also to enhance scientific consistency. In this article, we review explainable machine learning in view of applications in the natural sciences and discuss three core elements that we identified as relevant in this context: transparency, interpretability, and explainability. With respect to these core elements, we provide a survey of recent scientific works that incorporate machine learning and the way that explainable machine learning is used in combination with domain knowledge from the application areas.

493 citations

Proceedings Article•DOI•

Effect of confidence and explanation on accuracy and trust calibration in AI-assisted decision making

Yunfeng Zhang¹, Q. Vera Liao¹, Rachel K. E. Bellamy¹

IBM¹

27 Jan 2020

TL;DR: It is shown that confidence score can help calibrate people's trust in an AI model, but trust calibration alone is not sufficient to improve AI-assisted decision making, which may also depend on whether the human can bring in enough unique knowledge to complement the AI's errors.

Abstract: Today, AI is being increasingly used to help human experts make decisions in high-stakes scenarios. In these scenarios, full automation is often undesirable, not only due to the significance of the outcome, but also because human experts can draw on their domain knowledge complementary to the model's to ensure task success. We refer to these scenarios as AI-assisted decision making, where the individual strengths of the human and the AI come together to optimize the joint decision outcome. A key to their success is to appropriately calibrate human trust in the AI on a case-by-case basis; knowing when to trust or distrust the AI allows the human expert to appropriately apply their knowledge, improving decision outcomes in cases where the model is likely to perform poorly. This research conducts a case study of AI-assisted decision making in which humans and AI have comparable performance alone, and explores whether features that reveal case-specific model information can calibrate trust and improve the joint performance of the human and AI. Specifically, we study the effect of showing confidence score and local explanation for a particular prediction. Through two human experiments, we show that confidence score can help calibrate people's trust in an AI model, but trust calibration alone is not sufficient to improve AI-assisted decision making, which may also depend on whether the human can bring in enough unique knowledge to complement the AI's errors. We also highlight the problems in using local explanation for AI-assisted decision making scenarios and invite the research community to explore new approaches to explainability for calibrating human trust in AI.

287 citations

Journal Article•DOI•

OpenFermion: The Electronic Structure Package for Quantum Computers

Jarrod R. McClean¹,², Nicholas C. Rubin², Kevin J. Sung²,³, Ian D. Kivlichan²,⁴, Xavier Bonet-Monroig⁵,⁶, Yudong Cao⁴, Chengyu Dai³, E. Schuyler Fried⁴, Craig Gidney², Brendan Gimby³, Pranav Gokhale⁷, Thomas Häner⁸, Tarini S. Hardikar⁹, Vojtěch Havlíček¹⁰, Oscar Higgott¹¹, Cupjin Huang³, Josh Izaac, Zhang Jiang²,¹², Xinle Liu², Sam McArdle¹⁰, Matthew Neeley², Thomas E. O'Brien²,⁶, Bryan O'Gorman¹²,¹³, Isil Ozfidan¹⁴, Maxwell D. Radin¹⁵, Jhonathan Romero⁴, Nicolas P. D. Sawaya⁴, Bruno Senjean⁶, Kanav Setia⁹, Sukin Sim⁴, Damian S. Steiger²,⁸, Mark Steudtner¹,⁵,⁶, Qiming Sun¹⁶, Wei Sun², Daochen Wang¹⁷, Fang Zhang³, Ryan Babbush¹,²

Free University of Berlin¹, Google², University of Michigan³, Harvard University⁴, Delft University of Technology⁵, Leiden University⁶, University of Chicago⁷, ETH Zurich⁸, Dartmouth College⁹, University of Oxford¹⁰, University College London¹¹, Ames Research Center¹², University of California, Berkeley¹³, D-Wave Systems¹⁴, University of California, Santa Barbara¹⁵, California Institute of Technology¹⁶, University of Maryland, College Park¹⁷

30 Apr 2020

TL;DR: OpenFermion is an open-source software library, written largely in Python under an Apache 2.0 license, aimed at enabling the simulation of fermionic and bosonic models and quantum chemistry problems on quantum hardware.

Abstract: Quantum simulation of chemistry and materials is predicted to be an important application for both near-term and fault-tolerant quantum devices. However, at present, developing and studying algorithms for these problems can be difficult due to the prohibitive amount of domain knowledge required in both the area of chemistry and quantum algorithms. To help bridge this gap and open the field to more researchers, we have developed the OpenFermion software package (www.openfermion.org). OpenFermion is an open-source software library written largely in Python under an Apache 2.0 license, aimed at enabling the simulation of fermionic and bosonic models and quantum chemistry problems on quantum hardware. Beginning with an interface to common electronic structure packages, it simplifies the translation between a molecular specification and a quantum circuit for solving or studying the electronic structure problem on a quantum computer, minimizing the amount of domain expertise required to enter the field. The package is designed to be extensible and robust, maintaining high software standards in documentation and testing. This release paper outlines the key motivations behind design choices in OpenFermion and discusses some basic OpenFermion functionality which we believe will aid the community in the development of better quantum algorithms and tools for this exciting area of research.

258 citations

Posted Content•

GraphAF: a Flow-based Autoregressive Model for Molecular Graph Generation

Chence Shi¹, Minkai Xu², Zhaocheng Zhu³, Weinan Zhang⁴, Ming Zhang⁵, Jian Tang⁶

Peking University¹, Microsoft², Tsinghua University³, Shanghai Jiao Tong University⁴, Hong Kong Polytechnic University⁵, Canadian Institute for Advanced Research⁶

26 Jan 2020-arXiv: Learning

TL;DR: After fine-tuning the model for goal-directed property optimization with reinforcement learning, GraphAF achieves state-of-the-art performance on both chemical property optimization and constrained property optimization.

Abstract: Molecular graph generation is a fundamental problem for drug discovery and has been attracting growing attention. The problem is challenging since it requires not only generating chemically valid molecular structures but also optimizing their chemical properties in the meantime. Inspired by the recent progress in deep generative models, in this paper we propose a flow-based autoregressive model for graph generation called GraphAF. GraphAF combines the advantages of both autoregressive and flow-based approaches and enjoys: (1) high model flexibility for data density estimation; (2) efficient parallel computation for training; (3) an iterative sampling process, which allows leveraging chemical domain knowledge for valency checking. Experimental results show that GraphAF is able to generate 68% chemically valid molecules even without chemical knowledge rules and 100% valid molecules with chemical rules. The training process of GraphAF is two times faster than the existing state-of-the-art approach GCPN. After fine-tuning the model for goal-directed property optimization with reinforcement learning, GraphAF achieves state-of-the-art performance on both chemical property optimization and constrained property optimization.
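
The valency checking mentioned in the abstract can be illustrated with a minimal sketch. This is not GraphAF's code: the valence table, the function names, and the (atom, atom, bond order) tuple format are all assumptions made for illustration. The idea is that, during step-by-step generation, a proposed bond is accepted only if neither atom would exceed its maximum valence.

```python
# Hypothetical illustration of a valency check during iterative graph generation.
# The table and helper names are assumptions, not taken from GraphAF.
MAX_VALENCE = {"C": 4, "N": 3, "O": 2, "F": 1}  # common valences of neutral atoms


def can_add_bond(atoms, bonds, i, j, order):
    """Return True if a bond of the given order between atoms i and j
    keeps both atoms within their maximum valence."""
    used = [0] * len(atoms)
    for a, b, o in bonds:
        used[a] += o
        used[b] += o
    return (used[i] + order <= MAX_VALENCE[atoms[i]]
            and used[j] + order <= MAX_VALENCE[atoms[j]])


def add_bond_if_valid(atoms, bonds, i, j, order):
    """Append the bond only when it passes the valency check."""
    if can_add_bond(atoms, bonds, i, j, order):
        bonds.append((i, j, order))
        return True
    return False


atoms = ["C", "O", "N"]
bonds = []
print(add_bond_if_valid(atoms, bonds, 0, 1, 2))  # C=O double bond is accepted
print(add_bond_if_valid(atoms, bonds, 1, 2, 1))  # O would exceed valence 2, rejected
```

In a sampler of this kind, a rejected bond would typically be resampled, which is one way a rule-based check could raise the share of chemically valid molecules.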

218 citations

Proceedings Article•DOI•

Fact or fiction: Verifying scientific claims

David Wadden¹, Shanchuan Lin¹, Kyle Lo², Lucy Lu Wang², Madeleine van Zuylen², Arman Cohan², Hannaneh Hajishirzi²

University of Washington¹, Allen Institute for Artificial Intelligence²

30 Apr 2020

TL;DR: This work introduces scientific claim verification, a new task to select abstracts from the research literature containing evidence that supports or refutes a given scientific claim, and to identify rationales justifying each decision.

Abstract: We introduce scientific claim verification, a new task to select abstracts from the research literature containing evidence that SUPPORTS or REFUTES a given scientific claim, and to identify rationales justifying each decision. To study this task, we construct SciFact, a dataset of 1.4K expert-written scientific claims paired with evidence-containing abstracts annotated with labels and rationales. We develop baseline models for SciFact, and demonstrate that simple domain adaptation techniques substantially improve performance compared to models trained on Wikipedia or political news. We show that our system is able to verify claims related to COVID-19 by identifying evidence from the CORD-19 corpus. Our experiments indicate that SciFact will provide a challenging testbed for the development of new systems designed to retrieve and reason over corpora containing specialized domain knowledge. Data and code for this new task are publicly available at https://github.com/allenai/scifact. A leaderboard and COVID-19 fact-checking demo are available at https://scifact.apps.allenai.org.

214 citations

Journal Article•DOI•

TSFEL: Time Series Feature Extraction Library

Marília Barandas, Duarte Folgado, Letícia Fernandes, Sara Santos, Mariana Abreu, Patrícia J. Bota, Hui Liu¹, Tanja Schultz¹, Hugo Gamboa²

University of Bremen¹, Universidade Nova de Lisboa²

01 Jan 2020-SoftwareX

TL;DR: TSFEL is a Python package that computes over 60 different features across the temporal, statistical and spectral domains. It is designed to support fast exploratory data analysis and feature extraction on time series, with computational cost evaluation.

192 citations

Proceedings Article•DOI•

Enhanced Transport Distance for Unsupervised Domain Adaptation

Mengxue Li¹, Yi-Ming Zhai¹, You-Wei Luo¹, Pengfei Ge¹, Chuan-Xian Ren¹

Sun Yat-sen University¹

14 Jun 2020

TL;DR: This work proposes an enhanced transport distance (ETD) for UDA: an attention-aware transport distance, which can be viewed as the prediction feedback of the iteratively learned classifier, is built to measure the domain discrepancy.

Abstract: Unsupervised domain adaptation (UDA) is a representative problem in transfer learning, which aims to improve the classification performance on an unlabeled target domain by exploiting discriminant information from a labeled source domain. The optimal transport model has been used for UDA in the perspective of distribution matching. However, the transport distance cannot reflect the discriminant information from either domain knowledge or category prior. In this work, we propose an enhanced transport distance (ETD) for UDA. This method builds an attention-aware transport distance, which can be viewed as the prediction feedback of the iteratively learned classifier, to measure the domain discrepancy. Further, the Kantorovich potential variable is re-parameterized by deep neural networks to learn the distribution in the latent space. The entropy-based regularization is developed to explore the intrinsic structure of the target domain. The proposed method is optimized alternately in an end-to-end manner. Extensive experiments are conducted on four benchmark datasets to demonstrate the SOTA performance of ETD.

147 citations

Posted Content•

Model-Based Deep Learning

Nir Shlezinger¹, Jay Whang², Yonina C. Eldar³, Alexandros G. Dimakis²

Ben-Gurion University of the Negev¹, University of Texas at Austin², Weizmann Institute of Science³

15 Dec 2020-arXiv: Signal Processing

TL;DR: A comprehensive review of the leading approaches for combining model-based algorithms with deep learning in a systematic manner is provided, along with concrete guidelines and detailed signal processing oriented examples from recent literature.

Abstract: Signal processing, communications, and control have traditionally relied on classical statistical modeling techniques. Such model-based methods utilize mathematical formulations that represent the underlying physics, prior information and additional domain knowledge. Simple classical models are useful but sensitive to inaccuracies and may lead to poor performance when real systems display complex or dynamic behavior. On the other hand, purely data-driven approaches that are model-agnostic are becoming increasingly popular as datasets become abundant and the power of modern deep learning pipelines increases. Deep neural networks (DNNs) use generic architectures which learn to operate from data, and demonstrate excellent performance, especially for supervised problems. However, DNNs typically require massive amounts of data and immense computational resources, limiting their applicability for some signal processing scenarios. We are interested in hybrid techniques that combine principled mathematical models with data-driven systems to benefit from the advantages of both approaches. Such model-based deep learning methods exploit both partial domain knowledge, via mathematical structures designed for specific problems, as well as learning from limited data. In this article we survey the leading approaches for studying and designing model-based deep learning systems. We divide hybrid model-based/data-driven systems into categories based on their inference mechanism. We provide a comprehensive review of the leading approaches for combining model-based algorithms with deep learning in a systematic manner, along with concrete guidelines and detailed signal processing oriented examples from recent literature. Our aim is to facilitate the design and study of future systems on the intersection of signal processing and machine learning that incorporate the advantages of both domains.

120 citations

Proceedings Article•DOI•

Doctor XAI: an ontology-based approach to black-box sequential data classification explanations

Cecilia Panigutti, Alan Perotti¹, Dino Pedreschi²

Institute for Scientific Interchange¹, University of Pisa²

27 Jan 2020

TL;DR: This paper focuses on explaining Doctor AI, a multilabel classifier which takes as input the clinical history of a patient in order to predict the next visit, and shows how exploiting the temporal dimension in the data and the domain knowledge encoded in the medical ontology improves the quality of the mined explanations.

Abstract: Several recent advancements in Machine Learning involve black-box models: algorithms that do not provide human-understandable explanations in support of their decisions. This limitation hampers the fairness, accountability and transparency of these models; the field of eXplainable Artificial Intelligence (XAI) tries to solve this problem by providing human-understandable explanations for black-box models. However, healthcare datasets (and the related learning tasks) often present peculiar features, such as sequential data, multi-label predictions, and links to structured background knowledge. In this paper, we introduce Doctor XAI, a model-agnostic explainability technique able to deal with multi-labeled, sequential, ontology-linked data. We focus on explaining Doctor AI, a multilabel classifier which takes as input the clinical history of a patient in order to predict the next visit. Furthermore, we show how exploiting the temporal dimension in the data and the domain knowledge encoded in the medical ontology improves the quality of the mined explanations.

113 citations

Proceedings Article•DOI•

Effect of Confidence and Explanation on Accuracy and Trust Calibration in AI-Assisted Decision Making

Yunfeng Zhang¹, Q. Vera Liao¹, Rachel K. E. Bellamy¹

IBM¹

07 Jan 2020-arXiv: Artificial Intelligence

TL;DR: In this paper, the authors study the effect of showing confidence score and local explanation for a particular prediction for AI-assisted decision making, and show that confidence score can help calibrate people's trust in an AI model, but trust calibration alone is not sufficient to improve the joint performance of the human and AI.

Abstract: Today, AI is being increasingly used to help human experts make decisions in high-stakes scenarios. In these scenarios, full automation is often undesirable, not only due to the significance of the outcome, but also because human experts can draw on their domain knowledge complementary to the model's to ensure task success. We refer to these scenarios as AI-assisted decision making, where the individual strengths of the human and the AI come together to optimize the joint decision outcome. A key to their success is to appropriately calibrate human trust in the AI on a case-by-case basis; knowing when to trust or distrust the AI allows the human expert to appropriately apply their knowledge, improving decision outcomes in cases where the model is likely to perform poorly. This research conducts a case study of AI-assisted decision making in which humans and AI have comparable performance alone, and explores whether features that reveal case-specific model information can calibrate trust and improve the joint performance of the human and AI. Specifically, we study the effect of showing confidence score and local explanation for a particular prediction. Through two human experiments, we show that confidence score can help calibrate people's trust in an AI model, but trust calibration alone is not sufficient to improve AI-assisted decision making, which may also depend on whether the human can bring in enough unique knowledge to complement the AI's errors. We also highlight the problems in using local explanation for AI-assisted decision making scenarios and invite the research community to explore new approaches to explainability for calibrating human trust in AI.

Proceedings Article•

DDSP: Differentiable Digital Signal Processing

Jesse Engel¹, Lamtharn Hantrakul¹, Chenjie Gu¹, Adam Roberts¹

Google¹

30 Apr 2020

TL;DR: Differentiable Digital Signal Processing (DDSP) is an interpretable and modular approach to generative modeling that retains the benefits of deep learning and permits manipulation of each separate model component, with applications such as independent control of pitch and loudness.

Abstract: Most generative models of audio directly generate samples in one of two domains: time or frequency. While sufficient to express any signal, these representations are inefficient, as they do not utilize existing knowledge of how sound is generated and perceived. A third approach (vocoders/synthesizers) successfully incorporates strong domain knowledge of signal processing and perception, but has been less actively researched due to limited expressivity and difficulty integrating with modern auto-differentiation-based machine learning methods. In this paper, we introduce the Differentiable Digital Signal Processing (DDSP) library, which enables direct integration of classic signal processing elements with deep learning methods. Focusing on audio synthesis, we achieve high-fidelity generation without the need for large autoregressive models or adversarial losses, demonstrating that DDSP enables utilizing strong inductive biases without losing the expressive power of neural networks. Further, we show that combining interpretable modules permits manipulation of each separate model component, with applications such as independent control of pitch and loudness, realistic extrapolation to pitches not seen during training, blind dereverberation of room acoustics, transfer of extracted room acoustics to new environments, and transformation of timbre between disparate sources. In short, DDSP enables an interpretable and modular approach to generative modeling, without sacrificing the benefits of deep learning. The library will be made available upon paper acceptance and we encourage further contributions from the community and domain experts.

Posted Content•

Understanding and Improving Knowledge Distillation

Jiaxi Tang¹, Rakesh Shivanna¹, Zhe Zhao¹, Dong Lin¹, Anima Singh¹, Ed H. Chi¹, Sagar Jain¹

Google¹

10 Feb 2020-arXiv: Learning

TL;DR: This paper dissects the effects of knowledge distillation into three main factors: (1) benefits inherited from label smoothing, (2) example re-weighting based on teacher's confidence on ground-truth, and (3) prior knowledge of optimal output (logit) layer geometry.

Abstract: Knowledge Distillation (KD) is a model-agnostic technique to improve model quality while having a fixed capacity budget. It is a commonly used technique for model compression, where a larger capacity teacher model with better quality is used to train a more compact student model with better inference efficiency. Through distillation, one hopes to benefit from student's compactness, without sacrificing too much on model quality. Despite the large success of knowledge distillation, better understanding of how it benefits student model's training dynamics remains under-explored. In this paper, we categorize teacher's knowledge into three hierarchical levels and study its effects on knowledge distillation: (1) knowledge of the 'universe', where KD brings a regularization effect through label smoothing; (2) domain knowledge, where teacher injects class relationships prior to student's logit layer geometry; and (3) instance specific knowledge, where teacher rescales student model's per-instance gradients based on its measurement on the event difficulty. Using systematic analyses and extensive empirical studies on both synthetic and real-world datasets, we confirm that the aforementioned three factors play a major role in knowledge distillation. Furthermore, based on our findings, we diagnose some of the failure cases of applying KD from recent studies.
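
For readers unfamiliar with the teacher/student setup the abstract describes, the following is a minimal sketch of the widely used temperature-scaled distillation loss. It is the standard textbook formulation, not the analysis carried out in this paper. The function names and the default `temperature` and `alpha` values are assumptions chosen for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of hard-label cross-entropy and a soft-target KL term."""
    # Hard term: cross-entropy of the student against the ground-truth label.
    hard = -math.log(softmax(student_logits)[label])
    # Soft term: KL(teacher || student), both softened by the temperature.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(t * math.log(t / s)
               for t, s in zip(p_teacher, p_student) if t > 0)
    # The T^2 factor keeps the soft-term gradients on a comparable scale.
    return alpha * hard + (1 - alpha) * temperature ** 2 * soft


student = [2.0, 0.5, 0.1]
teacher = [3.0, 1.0, 0.2]
print(distillation_loss(student, teacher, label=0))
```

The softened teacher distribution spreads probability mass over non-target classes, which is loosely why distillation is often compared to label smoothing.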

Journal Article•DOI•

Retailing and retailing research in the age of big data analytics

Marnik G. Dekimpe¹,²

Tilburg University¹, Katholieke Universiteit Leuven²

01 Mar 2020-International Journal of Research in Marketing

TL;DR: In this article, the authors consider the challenges and opportunities in the retail sector from five perspectives: retail managers, retailing researchers, public-policy makers, investors, and retailing educators.

Proceedings Article•DOI•

Machine learning for structural health monitoring: challenges and opportunities

Fuh-Gwo Yuan¹, Sakib Ashraf Zargar¹, Qiuyi Chen¹, Shaohan Wang¹

North Carolina State University¹

23 Apr 2020

TL;DR: As a step towards the goal of automated damage detection, preliminary results are presented from dynamic modelling of beam structures using physics-informed artificial neural networks and a sensing paradigm for non-contact full-field measurements for damage diagnosis is presented.

Abstract: A physics-based approach to structural health monitoring (SHM) has practical shortcomings which restrict its suitability to simple structures under well controlled environments. With the advances in information and sensing technology (sensors and sensor networks), it has become feasible to monitor a large and diverse number of parameters in complex real-world structures either continuously or intermittently by employing large in-situ (wireless) sensor networks. The availability of this historical data has engendered a lot of interest in a data-driven approach as a natural and more viable option for realizing the goal of SHM in such structures. However, the lack of sensor data corresponding to different damage scenarios continues to remain a challenge. Most of the supervised machine-learning/deep-learning techniques, when trained using this inherently limited data, lack robustness and generalizability. Physics-informed learning, which involves the integration of domain knowledge into the learning process, is presented here as a potential remedy to this challenge. As a step towards the goal of automated damage detection (mathematically an inverse problem), preliminary results are presented from dynamic modelling of beam structures using physics-informed artificial neural networks. Forward and inverse problems involving partial differential equations are solved and comparisons reveal a clear superiority of the physics-informed approach over one that is purely data-driven vis-a-vis overfitting/generalization. Other ways of incorporating domain knowledge into the machine learning pipeline are then presented through case-studies on various aspects of NDI/SHM (visual inspection, impact diagnosis). Lastly, as the final attribute of an optimal SHM approach, a sensing paradigm for non-contact full-field measurements for damage diagnosis is presented.

Journal Article•DOI•

Deep learning and network analysis: Classifying and visualizing accident narratives in construction

Botao Zhong¹, Xing Pan¹, Peter E.D. Love², Lieyun Ding¹, Weili Fang¹

Huazhong University of Science and Technology¹, Curtin University²

01 May 2020-Automation in Construction

TL;DR: The proposed automated classification model and LDA-based network analysis method provide a useful approach to enable machine-assisted interpretation of texts-based accident narratives and can provide managers with much-needed information and knowledge to improve safety on-site.

Proceedings Article•DOI•

Discovering subsequence patterns for next POI recommendation

Kangzhi Zhao¹, Yong Zhang¹, Hongzhi Yin², Jin Wang³, Kai Zheng⁴, Xiaofang Zhou², Chunxiao Xing¹

Tsinghua University¹, University of Queensland², University of California, Los Angeles³, University of Electronic Science and Technology of China⁴

01 Jul 2020

TL;DR: This paper proposes Adaptive Sequence Partitioner with Power-law Attention (ASPPA) to automatically identify each semantic subsequence of POIs and discover their sequential patterns in the user’s check-in sequence.

Abstract: Next Point-of-Interest (POI) recommendation plays an important role in location-based services. State-of-the-art methods learn the POI-level sequential patterns in the user's check-in sequence but ignore the subsequence patterns that often represent the socio-economic activities or coherence of preference of the users. However, it is challenging to integrate the semantic subsequences due to the difficulty to predefine the granularity of the complex but meaningful subsequences. In this paper, we propose Adaptive Sequence Partitioner with Power-law Attention (ASPPA) to automatically identify each semantic subsequence of POIs and discover their sequential patterns. Our model adopts a state-based stacked recurrent neural network to hierarchically learn the latent structures of the user's check-in sequence. We also design a power-law attention mechanism to integrate the domain knowledge in spatial and temporal contexts. Extensive experiments on two real-world datasets demonstrate the effectiveness of our model.

Proceedings Article•DOI•

RL-CycleGAN: Reinforcement Learning Aware Simulation-to-Real

Kanishka Rao¹, Christopher K. Harris¹, Alex Irpan¹, Sergey Levine², Julian Ibarz¹, Mohi Khansari

Google¹, University of California, Berkeley²

14 Jun 2020

TL;DR: This paper introduces the RL-scene consistency loss for image translation, which ensures that the translation operation is invariant with respect to the Q-values associated with the image; incorporating this loss into unsupervised domain translation yields RL-CycleGAN.

Abstract: Deep neural network based reinforcement learning (RL) can learn appropriate visual representations for complex tasks like vision-based robotic grasping without the need for manually engineering or prior learning a perception system. However, data for RL is collected via running an agent in the desired environment, and for applications like robotics, running a robot in the real world may be extremely costly and time consuming. Simulated training offers an appealing alternative, but ensuring that policies trained in simulation can transfer effectively into the real world requires additional machinery. Simulations may not match reality, and typically bridging the simulation-to-reality gap requires domain knowledge and task-specific engineering. We can automate this process by employing generative models to translate simulated images into realistic ones. However, this sort of translation is typically task-agnostic, in that the translated images may not preserve all features that are relevant to the task. In this paper, we introduce the RL-scene consistency loss for image translation, which ensures that the translation operation is invariant with respect to the Q-values associated with the image. This allows us to learn a task-aware translation. Incorporating this loss into unsupervised domain translation, we obtain the RL-CycleGAN, a new approach for simulation-to-real-world transfer for reinforcement learning. In evaluations of RL-CycleGAN on two vision-based robotics grasping tasks, we show that RL-CycleGAN offers a substantial improvement over a number of prior methods for sim-to-real transfer, attaining excellent real-world performance with only a modest number of real-world observations.

Journal Article•DOI•

Series2Graph: graph-based subsequence anomaly detection for time series

Paul Boniol¹, Themis Palpanas¹•Institutions (1)

University of Paris¹

01 Jul 2020

TL;DR: The experimental results demonstrate that the proposed approach correctly identifies single and recurrent anomalies without any prior knowledge of their characteristics, outperforming by a large margin several competing approaches in accuracy, while being up to orders of magnitude faster.

Abstract: Subsequence anomaly detection in long sequences is an important problem with applications in a wide range of domains. However, the approaches that have been proposed so far in the literature have severe limitations: they either require prior domain knowledge that is used to design the anomaly discovery algorithms, or become cumbersome and expensive to use in situations with recurrent anomalies of the same type. In this work, we address these problems, and propose an unsupervised method suitable for domain agnostic subsequence anomaly detection. Our method, Series2Graph, is based on a graph representation of a novel low-dimensionality embedding of subsequences. Series2Graph needs neither labeled instances (like supervised techniques), nor anomaly-free data (like zero-positive learning techniques), and identifies anomalies of varying lengths. The experimental results, on the largest set of synthetic and real datasets used to date, demonstrate that the proposed approach correctly identifies single and recurrent anomalies without any prior knowledge of their characteristics, outperforming by a large margin several competing approaches in accuracy, while being up to orders of magnitude faster.

Posted Content•

FixBi: Bridging Domain Spaces for Unsupervised Domain Adaptation.

Jaemin Na¹, Heechul Jung², Hyung Jin Chang³, Wonjun Hwang¹•Institutions (3)

Ajou University¹, Kyungpook National University², University of Birmingham³

18 Nov 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: A fixed ratio-based mixup is introduced to augment multiple intermediate domains between the source and target domain and gradually transfer domain knowledge from the source to the target domain.

Abstract: Unsupervised domain adaptation (UDA) methods for learning domain-invariant representations have achieved remarkable progress. However, few studies have been conducted on the case of large domain discrepancies between a source and a target domain. In this paper, we propose a UDA method that effectively handles such large domain discrepancies. We introduce a fixed ratio-based mixup to augment multiple intermediate domains between the source and target domain. From the augmented domains, we train a source-dominant model and a target-dominant model that have complementary characteristics. Using our confidence-based learning methodologies, e.g., bidirectional matching with high-confidence predictions and self-penalization using low-confidence predictions, the models can learn from each other or from their own results. Through our proposed methods, the models gradually transfer domain knowledge from the source to the target domain. Extensive experiments demonstrate the superiority of our proposed method on three public benchmarks: Office-31, Office-Home, and VisDA-2017.
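
The fixed ratio-based mixup named in this abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function name `fixed_ratio_mixup`, the representation of a sample as a flat list of floats, and the example ratios. The sketch only shows how a fixed ratio places a mixed sample closer to the source or to the target domain.

```python
from typing import List, Sequence


def fixed_ratio_mixup(source: Sequence[float],
                      target: Sequence[float],
                      ratio: float) -> List[float]:
    """Blend one source sample with one target sample at a fixed ratio.

    `ratio` is the weight given to the source sample (hypothetical
    parameter name). A ratio above 0.5 yields a source-dominant sample,
    a ratio below 0.5 yields a target-dominant one.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    if len(source) != len(target):
        raise ValueError("source and target must have the same length")
    return [ratio * s + (1.0 - ratio) * t for s, t in zip(source, target)]


# Two toy samples standing in for a source-domain and a target-domain input.
source_sample = [1.0, 0.0]
target_sample = [0.0, 1.0]

# Illustrative ratios only; the abstract does not state which ratios are used.
source_dominant = fixed_ratio_mixup(source_sample, target_sample, 0.75)
target_dominant = fixed_ratio_mixup(source_sample, target_sample, 0.25)
```

Unlike mixup variants that sample the ratio at random, the ratio here stays constant, so each choice of ratio defines one intermediate domain.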

Posted Content•

ROMA: Multi-Agent Reinforcement Learning with Emergent Roles

Tonghan Wang¹, Heng Dong¹, Victor Lesser², Chongjie Zhang¹•Institutions (2)

Tsinghua University¹, University of Massachusetts Amherst²

18 Mar 2020-arXiv: Multiagent Systems

TL;DR: Experiments show that the proposed role-oriented MARL framework (ROMA) can learn specialized, dynamic, and identifiable roles, which help the method push forward the state of the art on the StarCraft II micromanagement benchmark.

Abstract: The role concept provides a useful tool to design and understand complex multi-agent systems, which allows agents with a similar role to share similar behaviors. However, existing role-based methods use prior domain knowledge and predefine role structures and behaviors. In contrast, multi-agent reinforcement learning (MARL) provides flexibility and adaptability, but less efficiency in complex tasks. In this paper, we synergize these two paradigms and propose a role-oriented MARL framework (ROMA). In this framework, roles are emergent, and agents with similar roles tend to share their learning and to be specialized on certain sub-tasks. To this end, we construct a stochastic role embedding space by introducing two novel regularizers and conditioning individual policies on roles. Experiments show that our method can learn specialized, dynamic, and identifiable roles, which help our method push forward the state of the art on the StarCraft II micromanagement benchmark. Demonstrative videos are available at this https URL.

Proceedings Article•DOI•

From Learning to Meta-Learning: Reduced Training Overhead and Complexity for Communication Systems

Osvaldo Simeone¹, Sangwoo Park², Joonhyuk Kang²•Institutions (2)

King's College London¹, KAIST²

17 Mar 2020

TL;DR: This paper provides a high-level introduction to meta-learning with applications to communication systems, and provides a way to automatize the selection of an inductive bias.

Abstract: Machine learning methods adapt the parameters of a model, constrained to lie in a given model class, by using a fixed learning procedure based on data or active observations. Adaptation is done on a per-task basis, and retraining is needed when the system configuration changes. The resulting inefficiency in terms of data and training time requirements can be mitigated, if domain knowledge is available, by selecting a suitable model class and learning procedure, collectively known as inductive bias. However, it is generally difficult to encode prior knowledge into an inductive bias, particularly with black-box model classes such as neural networks. Meta-learning provides a way to automatize the selection of an inductive bias. Meta-learning leverages data or active observations from tasks that are expected to be related to future, and a priori unknown, tasks of interest. With a meta-trained inductive bias, training of a machine learning model can be potentially carried out with reduced training data and/or time complexity. This paper provides a high-level introduction to meta-learning with applications to communication systems.

Proceedings Article•DOI•

AutoST: Efficient Neural Architecture Search for Spatio-Temporal Prediction

Li Ting, Junbo Zhang¹, Kainan Bao¹, Yuxuan Liang², Yexin Li³, Yu Zheng¹•Institutions (3)

Southwest Jiaotong University¹, National University of Singapore², Hong Kong University of Science and Technology³

23 Aug 2020

TL;DR: This paper designs a novel search space tailored for ST-domain which consists of two categories of components: optional convolution operations at each layer to automatically extract multi-range spatio-temporal dependencies and learnable skip connections among layers to dynamically fuse low- and high-level ST-features.

Abstract: Spatio-temporal (ST) prediction (e.g. crowd flow prediction) is of great importance in a wide range of smart city applications, from urban planning and intelligent transportation to public safety. Recently, many deep neural network models have been proposed to make accurate predictions. However, manually designing neural networks requires substantial expert effort and ST domain knowledge. How can a general neural network be constructed automatically for diverse spatio-temporal prediction tasks in cities? In this paper, we study Neural Architecture Search (NAS) for spatio-temporal prediction and propose an efficient spatio-temporal neural architecture search method, entitled AutoST. To the best of our knowledge, the search space is an important human prior for the success of NAS in different applications, while current NAS models concentrate on optimizing the search strategy in a fixed search space. Thus, we design a novel search space tailored for the ST domain which consists of two categories of components: (i) optional convolution operations at each layer to automatically extract multi-range spatio-temporal dependencies; (ii) learnable skip connections among layers to dynamically fuse low- and high-level ST-features. We conduct extensive experiments on four real-world spatio-temporal prediction tasks, including taxi flow and crowd flow, showing that the learned network architectures can significantly improve the performance of representative ST neural network models. Furthermore, our proposed efficient NAS approach searches 8-10x faster than state-of-the-art NAS approaches, demonstrating the efficiency and effectiveness of AutoST.

Posted Content•

RL-CycleGAN: Reinforcement Learning Aware Simulation-To-Real

Kanishka Rao¹, Harris Christopher K¹, Alex Irpan¹, Sergey Levine², Julian Ibarz¹, Mohi Khansari•Institutions (2)

Google¹, University of California, Berkeley²

16 Jun 2020-arXiv: Robotics

TL;DR: The RL-CycleGAN, a new approach for simulation-to-real-world transfer for reinforcement learning, is obtained by incorporating the RL-scene consistency loss into unsupervised domain translation, which ensures that the translation operation is invariant with respect to the Q-values associated with the image.

Abstract: Deep neural network based reinforcement learning (RL) can learn appropriate visual representations for complex tasks like vision-based robotic grasping without the need for manually engineering or prior learning a perception system. However, data for RL is collected via running an agent in the desired environment, and for applications like robotics, running a robot in the real world may be extremely costly and time consuming. Simulated training offers an appealing alternative, but ensuring that policies trained in simulation can transfer effectively into the real world requires additional machinery. Simulations may not match reality, and typically bridging the simulation-to-reality gap requires domain knowledge and task-specific engineering. We can automate this process by employing generative models to translate simulated images into realistic ones. However, this sort of translation is typically task-agnostic, in that the translated images may not preserve all features that are relevant to the task. In this paper, we introduce the RL-scene consistency loss for image translation, which ensures that the translation operation is invariant with respect to the Q-values associated with the image. This allows us to learn a task-aware translation. Incorporating this loss into unsupervised domain translation, we obtain RL-CycleGAN, a new approach for simulation-to-real-world transfer for reinforcement learning. In evaluations of RL-CycleGAN on two vision-based robotics grasping tasks, we show that RL-CycleGAN offers a substantial improvement over a number of prior methods for sim-to-real transfer, attaining excellent real-world performance with only a modest number of real-world observations.
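
The idea that translation should be invariant with respect to Q-values can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the loss defined in the paper: the function name `scene_consistency_loss`, the use of a mean squared pairwise difference, and the toy Q-values are all hypothetical. The input is assumed to be a list of Q-values that a Q-function assigns to different renderings (for example, simulated and translated) of the same scene and action.

```python
from itertools import combinations
from typing import Sequence


def scene_consistency_loss(q_values: Sequence[float]) -> float:
    """Mean squared difference between every pair of Q-values.

    Each entry of `q_values` is assumed to be the Q-value for one
    rendering of the same underlying scene. The loss is zero exactly
    when all renderings receive the same Q-value.
    """
    pairs = list(combinations(q_values, 2))
    if not pairs:
        # Fewer than two renderings: nothing to compare.
        return 0.0
    return sum((a - b) ** 2 for a, b in pairs) / len(pairs)


# Hypothetical Q-values for an original image and two translated versions.
consistent = scene_consistency_loss([0.5, 0.5, 0.5])
inconsistent = scene_consistency_loss([0.0, 1.0, 2.0])
```

A translation that changes task-relevant features would shift the Q-values apart and raise this penalty, which is the sense in which such a loss makes the translation task-aware.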

Journal Article•DOI•

Predicting the helpfulness score of online reviews using convolutional neural network

Sunil Saumya¹, Jyoti Prakash Singh¹, Yogesh K. Dwivedi²•Institutions (2)

National Institute of Technology, Patna¹, Swansea University²

01 Aug 2020

TL;DR: The proposed approach is found to be better than existing machine learning based models which used hand-crafted features and can be easily applied to any kind of review as the features are calculated only from the review text and not from other domain knowledge.

Abstract: Smart cities aim to provide an infrastructure to their citizens that reduces both their time and effort. An example of such infrastructure is electronic shopping. Electronic shopping has become popular with many customers, as it is easier to judge the quality of a product based on review information. The purpose of this study is to predict the most helpful online product review, out of the several thousand reviews available for the product, using review representation learning. The prediction is done using a two-layered convolutional neural network model. The review texts are embedded into low-dimensional vectors using a pre-trained model. To learn the best features of the review text, three filters are used to learn tri-gram, four-gram, and five-gram features of the text. The proposed approach is found to be better than existing machine learning based models which used hand-crafted features. The very low value of mean squared error confirms the prediction accuracy of the proposed method. The proposed method can be easily applied to any kind of review, as the features are calculated only from the review text and not from other domain knowledge. The proposed model helps in predicting the helpfulness score of new reviews as soon as they are posted on the product review page.
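
How filters of width three, four, and five pick up tri-gram, four-gram, and five-gram features can be shown with a small pure-Python sketch. It is an illustration under stated assumptions rather than the authors' model: the function name `conv_max_pool`, the one-dimensional toy embeddings, the all-ones kernels, and the use of max-over-time pooling are hypothetical choices, and a real model would learn the kernel weights.

```python
from typing import List, Sequence

Vector = Sequence[float]


def conv_max_pool(embeddings: Sequence[Vector],
                  kernel: Sequence[Vector]) -> float:
    """Slide one filter over a sequence of word embeddings.

    `kernel` has one weight vector per position, so a kernel with
    n rows responds to windows of n consecutive words (an n-gram).
    The strongest response over all windows is returned
    (max-over-time pooling, assumed here for illustration).
    """
    width = len(kernel)
    if len(embeddings) < width:
        raise ValueError("text is shorter than the filter width")
    responses: List[float] = []
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        responses.append(sum(
            weight * value
            for kernel_row, word_vector in zip(kernel, window)
            for weight, value in zip(kernel_row, word_vector)
        ))
    return max(responses)


# Five "words", each embedded as a 1-dimensional toy vector.
review = [[1.0], [2.0], [3.0], [4.0], [5.0]]

# One feature per filter width: tri-gram, four-gram, five-gram.
features = [conv_max_pool(review, [[1.0]] * n) for n in (3, 4, 5)]
```

Concatenating one pooled value per filter gives a fixed-length feature vector regardless of review length, which is why such features depend only on the review text.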

Journal Article•DOI•

Comparative Study of Deep Learning-Based Sentiment Classification

Seungwan Seo¹, Czangyeob Kim¹, Haedong Kim², Kyounghyun Mo, Pilsung Kang¹•Institutions (2)

Korea University¹, Pennsylvania State University²

01 Jan 2020-IEEE Access

TL;DR: Eight deep-learning models, three based on convolutional neural networks and five based on recurrent neural networks, with two types of input structures, i.e., word level and character level, are compared for 13 review datasets and the classification performances are discussed under different perspectives.

Abstract: The purpose of sentiment classification is to determine whether a particular document has a positive or negative nuance. Sentiment classification is extensively used in many business domains to improve products or services by understanding the opinions of customers regarding these products. Deep learning achieves state-of-the-art results in various challenging domains. With the success of deep learning, many studies have proposed deep-learning-based sentiment classification models and achieved better performances compared with conventional machine learning models. However, one practical issue occurring in deep-learning-based sentiment classification is that the best model structure depends on the characteristics of the dataset on which the deep learning model is trained; moreover, it is manually determined based on the domain knowledge of an expert or selected from a grid search of possible candidates. Herein, we present a comparative study of different deep-learning-based sentiment classification model structures to derive meaningful implications for building sentiment classification models. Specifically, eight deep-learning models, three based on convolutional neural networks and five based on recurrent neural networks, with two types of input structures, i.e., word level and character level, are compared for 13 review datasets, and the classification performances are discussed under different perspectives.

Proceedings Article•DOI•

Veridical Data Science

Bin Yu¹•Institutions (1)

University of California, Berkeley¹

20 Jan 2020

TL;DR: This work proposes (basic) PCS inference for reliability measures on data results, extending statistical inference to a much broader scope as current data science practice entails, and proposes PCS documentation based on R Markdown or Jupyter Notebook to back up human choices made throughout an analysis.

Abstract: Veridical data science extracts reliable and reproducible information from data, with an enriched technical language to communicate and evaluate empirical evidence in the context of human decisions and domain knowledge. Building and expanding on principles of statistics, machine learning, and the sciences, we propose the predictability, computability, and stability (PCS) framework for veridical data science. Our framework is comprised of both a workflow and documentation and aims to provide responsible, reliable, reproducible, and transparent results across the entire data science life cycle. Moreover, we propose the PDR desiderata for interpretable machine learning as part of veridical data science (with PDR standing for predictive accuracy, descriptive accuracy and relevancy to a human audience and a particular domain problem). The PCS framework will be illustrated through the development of the DeepTune framework for characterizing V4 neurons. DeepTune builds predictive models using DNNs and ridge regression and applies the stability principle to obtain stable interpretations of 18 predictive models. Finally, a general DNN interpretation method based on contextual decomposition (CD) will be discussed with applications to sentiment analysis and cosmological parameter estimation.

Journal Article•DOI•

A relationship extraction method for domain knowledge graph construction

Haoze Yu¹, Haisheng Li¹, Dianhui Mao¹, Qiang Cai¹•Institutions (1)

Beijing Technology and Business University¹

01 Mar 2020-World Wide Web

TL;DR: This paper proposes a relationship extraction method for domain knowledge graph construction based on an improved cross-entropy loss function and obtains upper and lower relationships from structured data in the classification system of network encyclopedia and semi-structured data in the classification labels of web pages.

Abstract: As a semantic knowledge base, a knowledge graph is a powerful tool for managing large-scale knowledge consisting of instances, concepts and the relationships between them. Because existing domain knowledge graphs cannot obtain relationships in various structures through targeted approaches during construction, which results in insufficient knowledge utilization, this paper proposes a relationship extraction method for domain knowledge graph construction. We obtain upper and lower relationships from structured data in the classification system of network encyclopedia and semi-structured data in the classification labels of web pages, and non-superordinate relationships are extracted from unstructured text through the proposed convolution residual network based on an improved cross-entropy loss function. We verify the effectiveness of the designed method by comparing it with existing relationship extraction methods and constructing a food domain knowledge graph.

Proceedings Article•DOI•

Transferring Cross-Domain Knowledge for Video Sign Language Recognition

Dongxu Li¹, Xin Yu¹, Chenchen Xu¹, Lars Petersson¹, Hongdong Li¹•Institutions (1)

Australian National University¹

14 Jun 2020

TL;DR: In this paper, a novel method that learns domain-invariant visual concepts and fertilizes WSLR models by transferring knowledge of subtitled news sign to them is proposed.

Abstract: Word-level sign language recognition (WSLR) is a fundamental task in sign language interpretation. It requires models to recognize isolated sign words from videos. However, annotating WSLR data needs expert knowledge, thus limiting WSLR dataset acquisition. On the contrary, there are abundant subtitled sign news videos on the internet. Since these videos have no word-level annotation and exhibit a large domain gap from isolated signs, they cannot be directly used for training WSLR models. We observe that despite the existence of a large domain gap, isolated and news signs share the same visual concepts, such as hand gestures and body movements. Motivated by this observation, we propose a novel method that learns domain-invariant visual concepts and fertilizes WSLR models by transferring knowledge of subtitled news sign to them. To this end, we extract news signs using a base WSLR model, and then design a classifier jointly trained on news and isolated signs to coarsely align these two domain features. In order to learn domain-invariant features within each class and suppress domain-specific features, our method further resorts to an external memory to store the class centroids of the aligned news signs. We then design a temporal attention based on the learnt descriptor to improve recognition performance. Experimental results on standard WSLR datasets show that our method outperforms previous state-of-the-art methods significantly. We also demonstrate the effectiveness of our method on automatically localizing signs from sign news, achieving 28.1 for AP@0.5.

Journal Article•DOI•

ActiveThief: Model Extraction Using Active Learning and Unannotated Public Data

Soham Pal¹, Yash Gupta¹, Aditya Shukla¹, Aditya Kanade¹, Shirish Shevade¹, Vinod Ganapathy¹•Institutions (1)

Indian Institute of Science¹

03 Apr 2020

TL;DR: This work designs ActiveThief – a model extraction framework for deep neural networks that makes use of active learning techniques and unannotated public datasets to perform model extraction, and demonstrates that it is possible to use it to extract deep classifiers trained on a variety of datasets from image and text domains.

Abstract: Machine learning models are increasingly being deployed in practice. Machine Learning as a Service (MLaaS) providers expose such models to queries by third-party developers through application programming interfaces (APIs). Prior work has developed model extraction attacks, in which an attacker extracts an approximation of an MLaaS model by making black-box queries to it. We design ActiveThief – a model extraction framework for deep neural networks that makes use of active learning techniques and unannotated public datasets to perform model extraction. It does not expect strong domain knowledge or access to annotated data on the part of the attacker. We demonstrate that (1) it is possible to use ActiveThief to extract deep classifiers trained on a variety of datasets from image and text domains, while querying the model with as few as 10-30% of samples from public datasets, (2) the resulting model exhibits a higher transferability success rate of adversarial examples than prior work, and (3) the attack evades detection by the state-of-the-art model extraction detection method, PRADA.
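
The role of active learning in picking which public samples to query can be sketched as follows. The abstract does not say which selection strategy is used, so this sketch assumes entropy-based uncertainty sampling as one common choice; the function names `entropy` and `select_most_uncertain`, the query budget, and the toy probabilities are hypothetical. The probabilities are assumed to come from the attacker's current substitute model on unannotated public data.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_most_uncertain(predictions: Sequence[Sequence[float]],
                          budget: int) -> List[int]:
    """Return the indices of the `budget` samples with the highest entropy.

    Only these samples would then be sent as black-box queries, which is
    how a selection step can reduce the number of queries needed.
    """
    ranked = sorted(range(len(predictions)),
                    key=lambda i: entropy(predictions[i]),
                    reverse=True)
    return ranked[:budget]


# Hypothetical substitute-model predictions for three unlabeled samples.
predictions = [
    [0.9, 0.1],  # confident
    [0.5, 0.5],  # maximally uncertain
    [0.7, 0.3],  # in between
]
to_query = select_most_uncertain(predictions, budget=2)
```

In an extraction loop, the selected samples would be labeled by the target model's responses and added to the substitute model's training set before the next round of selection.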
