
Showing papers on "Unsupervised learning" published in 2020


Proceedings ArticleDOI
Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, Ross Girshick
14 Jun 2020
TL;DR: This article proposes Momentum Contrast (MoCo) for unsupervised visual representation learning, which enables building a large and consistent dictionary on-the-fly that facilitates contrastive learning.
Abstract: We present Momentum Contrast (MoCo) for unsupervised visual representation learning. From a perspective on contrastive learning as dictionary look-up, we build a dynamic dictionary with a queue and a moving-averaged encoder. This enables building a large and consistent dictionary on-the-fly that facilitates contrastive unsupervised learning. MoCo provides competitive results under the common linear protocol on ImageNet classification. More importantly, the representations learned by MoCo transfer well to downstream tasks. MoCo can outperform its supervised pre-training counterpart in 7 detection/segmentation tasks on PASCAL VOC, COCO, and other datasets, sometimes surpassing it by large margins. This suggests that the gap between unsupervised and supervised representation learning has been largely closed in many vision tasks.

4,128 citations
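
The two mechanisms the abstract names, a queue-based dictionary and a moving-averaged key encoder, fit in a few lines of PyTorch. This is a minimal sketch under assumed names (f_q, f_k, queue), not the authors' released implementation; the queue is assumed to hold L2-normalized keys of shape (K, C).

```python
import torch
import torch.nn.functional as F

def momentum_update(f_q, f_k, m=0.999):
    # Key encoder f_k tracks the query encoder f_q as an exponential moving average.
    for p_q, p_k in zip(f_q.parameters(), f_k.parameters()):
        p_k.data.mul_(m).add_(p_q.data, alpha=1.0 - m)

def moco_loss(q, k, queue, tau=0.07):
    # InfoNCE over one positive key per query and a queue of negative keys.
    q, k = F.normalize(q, dim=1), F.normalize(k, dim=1)
    l_pos = torch.einsum("nc,nc->n", q, k).unsqueeze(-1)   # (N, 1)
    l_neg = torch.einsum("nc,kc->nk", q, queue)            # (N, K)
    logits = torch.cat([l_pos, l_neg], dim=1) / tau
    labels = torch.zeros(q.size(0), dtype=torch.long)      # positive sits at index 0
    return F.cross_entropy(logits, labels)

# After each step: enqueue the current batch of keys, dequeue the oldest batch.
```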


Posted Content
TL;DR: With simple modifications to MoCo, this note establishes stronger baselines that outperform SimCLR and do not require large training batches, and hopes this will make state-of-the-art unsupervised learning research more accessible.
Abstract: Contrastive unsupervised learning has recently shown encouraging progress, e.g., in Momentum Contrast (MoCo) and SimCLR. In this note, we verify the effectiveness of two of SimCLR's design improvements by implementing them in the MoCo framework. With simple modifications to MoCo, namely using an MLP projection head and more data augmentation, we establish stronger baselines that outperform SimCLR and do not require large training batches. We hope this will make state-of-the-art unsupervised learning research more accessible. Code will be made public.

1,947 citations
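
Both modifications are concrete enough to sketch. Assuming a ResNet-50 backbone (2048-d features) and standard torchvision transforms, a hedged rendering of the two changes; the exact augmentation parameters below are illustrative, not copied from the note:

```python
import torch.nn as nn
from torchvision import transforms

# 2-layer MLP projection head replacing a single linear layer
projection_head = nn.Sequential(
    nn.Linear(2048, 2048), nn.ReLU(inplace=True), nn.Linear(2048, 128))

# stronger augmentation, with blur added to the usual crop/jitter pipeline
augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.2, 1.0)),
    transforms.ColorJitter(0.4, 0.4, 0.4, 0.1),
    transforms.RandomGrayscale(p=0.2),
    transforms.GaussianBlur(kernel_size=23, sigma=(0.1, 2.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
```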


Posted Content
TL;DR: This paper proposes an online algorithm, SwAV, that takes advantage of contrastive methods without requiring pairwise comparisons to be computed, and uses a swapped prediction mechanism where it predicts the cluster assignment of a view from the representation of another view.
Abstract: Unsupervised image representations have significantly reduced the gap with supervised pretraining, notably with the recent achievements of contrastive learning methods. These contrastive methods typically work online and rely on a large number of explicit pairwise feature comparisons, which is computationally challenging. In this paper, we propose an online algorithm, SwAV, that takes advantage of contrastive methods without requiring pairwise comparisons to be computed. Specifically, our method simultaneously clusters the data while enforcing consistency between cluster assignments produced for different augmentations (or views) of the same image, instead of comparing features directly as in contrastive learning. Simply put, we use a swapped prediction mechanism where we predict the cluster assignment of a view from the representation of another view. Our method can be trained with large and small batches and can scale to unlimited amounts of data. Compared to previous contrastive methods, our method is more memory efficient since it does not require a large memory bank or a special momentum network. In addition, we also propose a new data augmentation strategy, multi-crop, that uses a mix of views with different resolutions in place of two full-resolution views, without increasing the memory or compute requirements much. We validate our findings by achieving 75.3% top-1 accuracy on ImageNet with ResNet-50, as well as surpassing supervised pretraining on all the considered transfer tasks.

1,771 citations
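
A minimal sketch of the swapped prediction loss: each view's features are scored against a set of prototypes, and one view's cluster assignment is predicted from the other view's scores. A plain softmax stands in here for the Sinkhorn equipartition step SwAV actually uses to compute codes; shapes and temperature are illustrative.

```python
import torch
import torch.nn.functional as F

def swapped_prediction_loss(z1, z2, prototypes, tau=0.1):
    # z1, z2: (N, D) L2-normalized features of two views; prototypes: (K, D)
    p1, p2 = z1 @ prototypes.T, z2 @ prototypes.T      # scores (N, K)
    with torch.no_grad():
        # codes would come from a Sinkhorn-Knopp equipartition step;
        # a softmax is only a stand-in in this sketch
        q1, q2 = p1.softmax(dim=1), p2.softmax(dim=1)
    # predict view 2's code from view 1's scores, and vice versa
    return -0.5 * ((q2 * F.log_softmax(p1 / tau, dim=1)).sum(1).mean()
                   + (q1 * F.log_softmax(p2 / tau, dim=1)).sum(1).mean())
```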


Journal ArticleDOI
TL;DR: This survey aims to provide researchers and practitioners new to the field as well as more advanced readers with a solid understanding of the main approaches and algorithms developed over the past two decades, with an emphasis on the most prominent and currently relevant work.
Abstract: Semi-supervised learning is the branch of machine learning concerned with using labelled as well as unlabelled data to perform certain learning tasks. Conceptually situated between supervised and unsupervised learning, it permits harnessing the large amounts of unlabelled data available in many use cases in combination with typically smaller sets of labelled data. In recent years, research in this area has followed the general trends observed in machine learning, with much attention directed at neural network-based models and generative learning. The literature on the topic has also expanded in volume and scope, now encompassing a broad spectrum of theory, algorithms and applications. However, no recent surveys exist to collect and organize this knowledge, impeding the ability of researchers and engineers alike to utilize it. Filling this void, we present an up-to-date overview of semi-supervised learning methods, covering earlier work as well as more recent advances. We focus primarily on semi-supervised classification, where the large majority of semi-supervised learning research takes place. Our survey aims to provide researchers and practitioners new to the field as well as more advanced readers with a solid understanding of the main approaches and algorithms developed over the past two decades, with an emphasis on the most prominent and currently relevant work. Furthermore, we propose a new taxonomy of semi-supervised classification algorithms, which sheds light on the different conceptual and methodological approaches for incorporating unlabelled data into the training process. Lastly, we show how the fundamental assumptions underlying most semi-supervised learning algorithms are closely connected to each other, and how they relate to the well-known semi-supervised clustering assumption.

1,226 citations


Journal ArticleDOI
TL;DR: An unsupervised learning schema is constructed for the k-means algorithm so that it is free of initialization and parameter selection and can simultaneously find an optimal number of clusters.
Abstract: The k-means algorithm is generally the best-known and most widely used clustering method, and various extensions of k-means have been proposed in the literature. Although k-means is an unsupervised approach to clustering in pattern recognition and machine learning, the algorithm and its extensions are always sensitive to initialization and require the number of clusters to be given a priori. In this sense, the k-means algorithm is not exactly an unsupervised clustering method. In this paper, we construct an unsupervised learning schema for the k-means algorithm so that it is free of initialization and parameter selection and can also simultaneously find an optimal number of clusters. That is, we propose a novel unsupervised k-means (U-k-means) clustering algorithm that automatically finds an optimal number of clusters without any initialization or parameter selection. The computational complexity of the proposed U-k-means clustering algorithm is also analyzed. Comparisons between the proposed U-k-means and other existing methods are made. Experimental results and comparisons demonstrate these good aspects of the proposed U-k-means clustering algorithm.

545 citations
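
The abstract does not spell out the U-k-means update rules, so the following is only a stand-in for the stated goal (clustering without choosing k by hand), not the paper's algorithm: ordinary scikit-learn k-means swept over candidate k and scored by silhouette.

```python
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def kmeans_auto_k(X, k_max=10, seed=0):
    # Pick the k in [2, k_max] whose clustering best separates the data.
    best = (None, -1.0, None)                      # (k, score, labels)
    for k in range(2, k_max + 1):
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        score = silhouette_score(X, labels)
        if score > best[1]:
            best = (k, score, labels)
    return best[0], best[2]
```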


Journal ArticleDOI
04 Mar 2020-Nature
TL;DR: It is demonstrated that an image sensor can itself constitute an ANN that simultaneously senses and processes optical images without latency; acting as an artificial neural network, the sensor is trained to classify and encode images with high throughput.
Abstract: Machine vision technology has taken huge leaps in recent years, and is now becoming an integral part of various intelligent systems, including autonomous vehicles and robotics. Usually, visual information is captured by a frame-based camera, converted into a digital format and processed afterwards using a machine-learning algorithm such as an artificial neural network (ANN)1. The large amount of (mostly redundant) data passed through the entire signal chain, however, results in low frame rates and high power consumption. Various visual data preprocessing techniques have thus been developed2-7 to increase the efficiency of the subsequent signal processing in an ANN. Here we demonstrate that an image sensor can itself constitute an ANN that can simultaneously sense and process optical images without latency. Our device is based on a reconfigurable two-dimensional (2D) semiconductor8,9 photodiode10-12 array, and the synaptic weights of the network are stored in a continuously tunable photoresponsivity matrix. We demonstrate both supervised and unsupervised learning and train the sensor to classify and encode images that are optically projected onto the chip with a throughput of 20 million bins per second.

436 citations
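
Stripped of the device physics, the operating principle is a matrix-vector product: the tunable photoresponsivity matrix acts as the weight matrix of a single-layer network applied to the incident light powers. A toy NumPy emulation with illustrative sizes (the actual chip is, of course, analog and trained on-device):

```python
import numpy as np

rng = np.random.default_rng(0)
P = rng.random(9)              # light power hitting each of 9 pixels (toy size)
R = rng.normal(size=(3, 9))    # trained photoresponsivities = synaptic weights
I = R @ P                      # each output line sums photocurrents R[i, j] * P[j]
pred = int(np.argmax(I))       # classifier readout: strongest output current
```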


Journal ArticleDOI
TL;DR: In this article, the authors review the thirty-year history of ML by elaborating on supervised learning, unsupervised learning, reinforcement learning and deep learning and investigate their employment in the compelling applications of wireless networks, including heterogeneous networks, cognitive radios (CR), Internet of Things (IoT), machine to machine networks (M2M), and so on.
Abstract: Future wireless networks have a substantial potential in terms of supporting a broad range of complex compelling applications both in military and civilian fields, where the users are able to enjoy high-rate, low-latency, low-cost and reliable information services. Achieving this ambitious goal requires new radio techniques for adaptive learning and intelligent decision making because of the complex heterogeneous nature of the network structures and wireless services. Machine learning (ML) algorithms have great success in supporting big data analytics, efficient parameter estimation and interactive decision making. Hence, in this article, we review the thirty-year history of ML by elaborating on supervised learning, unsupervised learning, reinforcement learning and deep learning. Furthermore, we investigate their employment in the compelling applications of wireless networks, including heterogeneous networks (HetNets), cognitive radios (CR), Internet of Things (IoT), machine to machine networks (M2M), and so on. This article aims to assist readers in clarifying the motivation and methodology of the various ML algorithms, so as to invoke them for hitherto unexplored services as well as scenarios of future wireless networks.

413 citations


Journal ArticleDOI
TL;DR: A general Contrastive Representation Learning framework is proposed that simplifies and unifies many different contrastive learning methods, and a taxonomy for each of the components of contrastive learning is provided in order to summarise it and distinguish it from other forms of machine learning.
Abstract: Contrastive Learning has recently received interest due to its success in self-supervised representation learning in the computer vision domain. However, the origins of Contrastive Learning date as far back as the 1990s and its development has spanned across many fields and domains including Metric Learning and natural language processing. In this paper, we provide a comprehensive literature review and we propose a general Contrastive Representation Learning framework that simplifies and unifies many different contrastive learning methods. We also provide a taxonomy for each of the components of contrastive learning in order to summarise it and distinguish it from other forms of machine learning. We then discuss the inductive biases which are present in any contrastive learning system and we analyse our framework under different views from various sub-fields of Machine Learning. Examples of how contrastive learning has been applied in computer vision, natural language processing, audio processing, and others, as well as in Reinforcement Learning are also presented. Finally, we discuss the challenges and some of the most promising future research directions ahead.

359 citations


Journal ArticleDOI
31 Oct 2020
TL;DR: Contrastive self-supervised learning, as discussed by the authors, embeds augmented versions of the same sample close to each other while pushing away embeddings from different samples, and the learned representations are used for several downstream tasks.
Abstract: Self-supervised learning has gained popularity because of its ability to avoid the cost of annotating large-scale datasets. It is capable of adopting self-defined pseudolabels as supervision and using the learned representations for several downstream tasks. Specifically, contrastive learning has recently become a dominant component in self-supervised learning for computer vision, natural language processing (NLP), and other domains. It aims at embedding augmented versions of the same sample close to each other while trying to push away embeddings from different samples. This paper provides an extensive review of self-supervised methods that follow the contrastive approach. The work explains commonly used pretext tasks in a contrastive learning setup, followed by different architectures that have been proposed so far. Next, we present a performance comparison of different methods for multiple downstream tasks such as image classification, object detection, and action recognition. Finally, we conclude with the limitations of the current methods and the need for further techniques and future directions to make meaningful progress.

325 citations
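
Most methods the review covers share one loss, a normalized temperature-scaled cross-entropy (NT-Xent, a batched form of InfoNCE) over pairs of augmentations. A minimal generic sketch, not tied to any single paper's implementation:

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, tau=0.5):
    # z1, z2: (N, D) embeddings of two augmented views of the same N samples.
    z = F.normalize(torch.cat([z1, z2]), dim=1)    # (2N, D)
    sim = z @ z.T / tau                            # scaled cosine similarities
    sim.fill_diagonal_(float("-inf"))              # a sample is not its own pair
    n = z1.size(0)
    # row i's positive is its other view: i+n for the first half, i-n for the second
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)
```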


Proceedings ArticleDOI
20 Apr 2020
TL;DR: An unsupervised learning model trained by maximizing GMI between the input and output of a graph neural encoder is developed, which outperforms state-of-the-art unsupervised counterparts and even sometimes exceeds the performance of supervised ones.
Abstract: The richness in the content of various information networks such as social networks and communication networks provides the unprecedented potential for learning high-quality expressive representations without external supervision. This paper investigates how to preserve and extract the abundant information from graph-structured data into embedding space in an unsupervised manner. To this end, we propose a novel concept, Graphical Mutual Information (GMI), to measure the correlation between input graphs and high-level hidden representations. GMI generalizes the idea of conventional mutual information computations from vector space to the graph domain where measuring mutual information from two aspects of node features and topological structure is indispensable. GMI exhibits several benefits: First, it is invariant to the isomorphic transformation of input graphs—an inevitable constraint in many existing graph representation learning algorithms; Besides, it can be efficiently estimated and maximized by current mutual information estimation methods such as MINE; Finally, our theoretical analysis confirms its correctness and rationality. With the aid of GMI, we develop an unsupervised learning model trained by maximizing GMI between the input and output of a graph neural encoder. Extensive experiments on transductive as well as inductive node classification and link prediction demonstrate that our method outperforms state-of-the-art unsupervised counterparts, and even sometimes exceeds the performance of supervised ones.

320 citations
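
The abstract names the estimator family (MINE-style) without giving the GMI objective itself, so the sketch below substitutes a simpler discriminator of the same flavor: score true (input-feature, embedding) pairs above pairs built from shuffled features. The bilinear critic and the shuffling corruption are illustrative assumptions, not the paper's method.

```python
import torch

def mi_discriminator_loss(X, H, W):
    # X: (N, F) node features; H: (N, D) encoder outputs;
    # W: (F, D) learnable bilinear critic weights.
    X_neg = X[torch.randperm(X.size(0))]             # corrupted (negative) pairing
    pos = torch.sigmoid((X @ W * H).sum(dim=1))      # bilinear score x_i^T W h_i
    neg = torch.sigmoid((X_neg @ W * H).sum(dim=1))
    # minimizing this binary cross-entropy tightens a JSD-style MI lower bound
    return -(torch.log(pos + 1e-8) + torch.log(1 - neg + 1e-8)).mean()
```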


Proceedings ArticleDOI
14 Jun 2020
TL;DR: An unsupervised learning approach to anomaly detection is presented that considers the diversity of normal patterns explicitly while lessening the representation capacity of CNNs, addressing the main drawbacks of prior reconstruction-based methods, which neither model this diversity nor prevent CNNs from reconstructing abnormal frames.
Abstract: We address the problem of anomaly detection, that is, detecting anomalous events in a video sequence. Anomaly detection methods based on convolutional neural networks (CNNs) typically leverage proxy tasks, such as reconstructing input video frames, to learn models describing normality without seeing anomalous samples at training time, and quantify the extent of abnormalities using the reconstruction error at test time. The main drawbacks of these approaches are that they do not consider the diversity of normal patterns explicitly, and the powerful representation capacity of CNNs allows to reconstruct abnormal video frames. To address this problem, we present an unsupervised learning approach to anomaly detection that considers the diversity of normal patterns explicitly, while lessening the representation capacity of CNNs. To this end, we propose to use a memory module with a new update scheme where items in the memory record prototypical patterns of normal data. We also present novel feature compactness and separateness losses to train the memory, boosting the discriminative power of both memory items and deeply learned features from normal data. Experimental results on standard benchmarks demonstrate the effectiveness and efficiency of our approach, which outperforms the state of the art.
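
A hedged sketch of the memory read step, assuming cosine-similarity addressing over M prototype items with a softmax; the paper's update scheme and its compactness/separateness losses are omitted, and the temperature is illustrative.

```python
import torch
import torch.nn.functional as F

def memory_read(query, memory, tau=0.1):
    # query: (N, D) encoder features; memory: (M, D) prototype items.
    q = F.normalize(query, dim=1)
    m = F.normalize(memory, dim=1)
    w = F.softmax(q @ m.T / tau, dim=1)   # addressing weights (N, M)
    return w @ memory                     # read vectors fed to the decoder
```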

Proceedings ArticleDOI
23 Aug 2020
TL;DR: A fast and stable method called UnSupervised Anomaly Detection for multivariate time series (USAD), based on adversarially trained autoencoders and capable of learning in an unsupervised way, is proposed.
Abstract: The automatic supervision of IT systems is a current challenge at Orange. Given the size and complexity reached by its IT operations, the number of sensors needed to obtain measurements over time, used to infer normal and abnormal behaviors, has increased dramatically, making traditional expert-based supervision methods slow or prone to errors. In this paper, we propose a fast and stable method called UnSupervised Anomaly Detection for multivariate time series (USAD) based on adversarially trained autoencoders. Its autoencoder architecture makes it capable of learning in an unsupervised way. The use of adversarial training and its architecture allows it to isolate anomalies while providing fast training. We study the properties of our method through experiments on five public datasets, thus demonstrating its robustness, training speed and high anomaly detection performance. Through a feasibility study using Orange's proprietary data we have been able to validate Orange's requirements on scalability, stability, robustness, training speed and high performance.
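
The adversarial scheme can be sketched with a shared encoder E and two decoders D1 and D2: AE1 learns both to reconstruct the window and to fool AE2, while AE2 learns to tell real windows from AE1's reconstructions. The epoch-dependent weighting below follows the usual presentation of USAD, but treat it, and the module shapes, as assumptions rather than the paper's exact code.

```python
import torch.nn.functional as F

def usad_losses(W, E, D1, D2, n):
    # W: (batch, window_dim) flattened time-series windows; n: epoch index >= 1.
    z = E(W)
    w1, w2 = D1(z), D2(z)        # the two autoencoder reconstructions
    w21 = D2(E(w1))              # AE2 applied to AE1's output
    a = 1.0 / n                  # weight shifts from reconstruction to adversarial term
    loss1 = a * F.mse_loss(w1, W) + (1 - a) * F.mse_loss(w21, W)   # AE1: reconstruct, fool AE2
    loss2 = a * F.mse_loss(w2, W) - (1 - a) * F.mse_loss(w21, W)   # AE2: reconstruct, expose AE1
    return loss1, loss2
```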

Proceedings Article
06 Dec 2020
TL;DR: SwAV as discussed by the authors uses a "swapped" prediction mechanism where they predict the cluster assignment of a view from the representation of another view, instead of comparing features directly as in contrastive learning.
Abstract: Unsupervised image representations have significantly reduced the gap with supervised pretraining, notably with the recent achievements of contrastive learning methods. These contrastive methods typically work online and rely on a large number of explicit pairwise feature comparisons, which is computationally challenging. In this paper, we propose an online algorithm, SwAV, that takes advantage of contrastive methods without requiring pairwise comparisons to be computed. Specifically, our method simultaneously clusters the data while enforcing consistency between cluster assignments produced for different augmentations (or "views") of the same image, instead of comparing features directly as in contrastive learning. Simply put, we use a "swapped" prediction mechanism where we predict the cluster assignment of a view from the representation of another view. Our method can be trained with large and small batches and can scale to unlimited amounts of data. Compared to previous contrastive methods, our method is more memory efficient since it does not require a large memory bank or a special momentum network. In addition, we also propose a new data augmentation strategy, multi-crop, that uses a mix of views with different resolutions in place of two full-resolution views, without increasing the memory or compute requirements much. We validate our findings by achieving 75.3% top-1 accuracy on ImageNet with ResNet-50, as well as surpassing supervised pretraining on all the considered transfer tasks.

Journal ArticleDOI
07 Jan 2020-Sensors
TL;DR: This survey reviews some well-known techniques for each approach, gives a taxonomy of their categories, and offers a solid discussion of future directions in terms of techniques to be used for face recognition.
Abstract: Over the past few decades, interest in theories and algorithms for face recognition has been growing rapidly. Video surveillance, criminal identification, building access control, and unmanned and autonomous vehicles are just a few examples of concrete applications that are gaining traction in industry. Various techniques are being developed including local, holistic, and hybrid approaches, which provide a face image description using only a few face image features or the whole facial features. The main contribution of this survey is to review some well-known techniques for each approach and to give the taxonomy of their categories. In the paper, a detailed comparison between these techniques is presented by listing the advantages and the disadvantages of their schemes in terms of robustness, accuracy, complexity, and discrimination. One interesting aspect covered in the paper is the databases used for face recognition. An overview of the most commonly used databases, including those of supervised and unsupervised learning, is given. Numerical results of the most interesting techniques are given along with the context of experiments and challenges handled by these techniques. Finally, a solid discussion is given in the paper about future directions in terms of techniques to be used for face recognition.

Journal ArticleDOI
TL;DR: Wang et al., as discussed by the authors, proposed a new intrusion detection framework based on feature selection and ensemble learning techniques, and this framework exhibits better performance than other related and state-of-the-art approaches under several metrics.
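
The TL;DR names only the two ingredients, so here is a generic stand-in rather than the authors' framework: a scikit-learn pipeline chaining a feature-selection step with an ensemble classifier.

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.ensemble import RandomForestClassifier

ids_model = Pipeline([
    ("select", SelectKBest(mutual_info_classif, k=20)),        # keep informative features
    ("ensemble", RandomForestClassifier(n_estimators=200)),    # ensemble classifier
])
# ids_model.fit(X_train, y_train); ids_model.predict(X_test)
```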

Journal ArticleDOI
TL;DR: A three-step methodological framework called computational grounded theory is proposed, which combines expert human knowledge and hermeneutic skills with the processing power and pattern recognition of computers, producing a more methodologically rigorous but interpretive approach to content analysis.
Abstract: This article proposes a three-step methodological framework called computational grounded theory, which combines expert human knowledge and hermeneutic skills with the processing power and pattern recognition of computers, producing a more methodologically rigorous but interpretive approach to content analysis.

Posted Content
TL;DR: An overview of the different schools of thought and approaches to mitigating (social) biases and increasing fairness in the Machine Learning literature is provided, organising approaches into the widely accepted framework of pre-processing, in-processing, and post-processing methods, subcategorized into a further 11 method areas.
Abstract: As Machine Learning technologies become increasingly used in contexts that affect citizens, companies as well as researchers need to be confident that their application of these methods will not have unexpected social implications, such as bias towards gender, ethnicity, and/or people with disabilities. There is significant literature on approaches to mitigate bias and promote fairness, yet the area is complex and hard to penetrate for newcomers to the domain. This article seeks to provide an overview of the different schools of thought and approaches to mitigating (social) biases and increasing fairness in the Machine Learning literature. It organises approaches into the widely accepted framework of pre-processing, in-processing, and post-processing methods, subcategorizing into a further 11 method areas. Although much of the literature emphasizes binary classification, a discussion of fairness in regression, recommender systems, unsupervised learning, and natural language processing is also provided along with a selection of currently available open source libraries. The article concludes by summarising open challenges articulated as four dilemmas for fairness research.

Proceedings ArticleDOI
25 Jan 2020
TL;DR: PASE+ is proposed, an improved version of PASE that better learns short- and long-term speech dynamics with an efficient combination of recurrent and convolutional networks and learns transferable representations suitable for highly mismatched acoustic conditions.
Abstract: Despite the growing interest in unsupervised learning, extracting meaningful knowledge from unlabelled audio remains an open challenge. To take a step in this direction, we recently proposed a problem-agnostic speech encoder (PASE), that combines a convolutional encoder followed by multiple neural networks, called workers, tasked to solve self-supervised problems (i.e., ones that do not require manual annotations as ground truth). PASE was shown to capture relevant speech information, including speaker voice-print and phonemes. This paper proposes PASE+, an improved version of PASE for robust speech recognition in noisy and reverberant environments. To this end, we employ an online speech distortion module, that contaminates the input signals with a variety of random disturbances. We then propose a revised encoder that better learns short- and long-term speech dynamics with an efficient combination of recurrent and convolutional networks. Finally, we refine the set of workers used in self-supervision to encourage better cooperation. Results on TIMIT, DIRHA and CHiME-5 show that PASE+ significantly outperforms both the previous version of PASE as well as common acoustic features. Interestingly, PASE+ learns transferable representations suitable for highly mismatched acoustic conditions.
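
Structurally, the recipe is one shared encoder feeding several small "worker" heads, each trained on its own self-supervised target, with the total loss being the sum of worker losses. A hedged PyTorch sketch; the simplified convolutional front end, worker shape, and target dimensionality are illustrative, and the online distortion module is omitted.

```python
import torch.nn as nn

class EncoderWithWorkers(nn.Module):
    def __init__(self, dim=256, n_workers=4, target_dim=40):
        super().__init__()
        self.encoder = nn.Sequential(                 # stand-in conv front end
            nn.Conv1d(1, dim, kernel_size=10, stride=5), nn.ReLU())
        self.workers = nn.ModuleList(                 # one small head per task
            nn.Conv1d(dim, target_dim, kernel_size=1) for _ in range(n_workers))

    def forward(self, wav):                           # wav: (batch, 1, samples)
        h = self.encoder(wav)
        # each worker regresses its own target; total loss = sum of worker losses
        return h, [w(h) for w in self.workers]
```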

Journal ArticleDOI
TL;DR: A deep learning framework for the optimization of downlink beamforming is proposed based on convolutional neural networks and exploitation of expert knowledge, such as the uplink-downlink duality and the known structure of optimal solutions, paving the way for fast realization of optimal beamforming in multiuser MISO systems.
Abstract: Beamforming is an effective means to improve the quality of the received signals in multiuser multiple-input-single-output (MISO) systems. Traditionally, finding the optimal beamforming solution relies on iterative algorithms, which introduces high computational delay and is thus not suitable for real-time implementation. In this paper, we propose a deep learning framework for the optimization of downlink beamforming. In particular, the solution is obtained based on convolutional neural networks and exploitation of expert knowledge, such as the uplink-downlink duality and the known structure of optimal solutions. Using this framework, we construct three beamforming neural networks (BNNs) for three typical optimization problems, i.e., the signal-to-interference-plus-noise ratio (SINR) balancing problem, the power minimization problem, and the sum rate maximization problem. For the former two problems the BNNs adopt the supervised learning approach, while for the sum rate maximization problem a hybrid method of supervised and unsupervised learning is employed. Simulation results show that the BNNs can achieve near-optimal solutions to the SINR balancing and power minimization problems, and a performance close to that of the weighted minimum mean squared error algorithm for the sum rate maximization problem, while in all cases enjoy significantly reduced computational complexity. In summary, this work paves the way for fast realization of optimal beamforming in multiuser MISO systems.

Proceedings Article
30 Apr 2020
TL;DR: This paper argues, and provides empirical evidence, that the success of these methods cannot be attributed to the properties of MI alone, and that they strongly depend on the inductive bias in both the choice of feature extractor architectures and the parametrization of the employed MI estimators.
Abstract: Many recent methods for unsupervised or self-supervised representation learning train feature extractors by maximizing an estimate of the mutual information (MI) between different views of the data. This comes with several immediate problems: For example, MI is notoriously hard to estimate, and using it as an objective for representation learning may lead to highly entangled representations due to its invariance under arbitrary invertible transformations. Nevertheless, these methods have been repeatedly shown to excel in practice. In this paper we argue, and provide empirical evidence, that the success of these methods cannot be attributed to the properties of MI alone, and that they strongly depend on the inductive bias in both the choice of feature extractor architectures and the parametrization of the employed MI estimators. Finally, we establish a connection to deep metric learning and argue that this interpretation may be a plausible explanation for the success of the recently introduced methods.
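
One concrete piece of the argument is the batch-size ceiling of the most common estimator: with K samples, the InfoNCE bound can never certify more than log K nats of mutual information, so what is actually learned must owe a great deal to the critic and encoder parametrization. In the standard form (one positive among K candidates):

```latex
\[
  I(X;Y) \;\ge\; \log K \;-\; \mathcal{L}_{\mathrm{InfoNCE}},
  \qquad
  \mathcal{L}_{\mathrm{InfoNCE}}
  \;=\; -\,\mathbb{E}\!\left[\log
      \frac{e^{f(x,y)}}{\sum_{j=1}^{K} e^{f(x,y_j)}}\right].
\]
```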

Posted Content
TL;DR: A novel framework for multivariate time series representation learning based on the transformer encoder architecture, which can offer substantial performance benefits over fully supervised learning on downstream tasks even without leveraging additional unlabeled data, i.e., by reusing the existing data samples.
Abstract: In this work we propose for the first time a transformer-based framework for unsupervised representation learning of multivariate time series. Pre-trained models can be potentially used for downstream tasks such as regression and classification, forecasting and missing value imputation. By evaluating our models on several benchmark datasets for multivariate time series regression and classification, we show that not only does our modeling approach represent the most successful method employing unsupervised learning of multivariate time series presented to date, but also that it exceeds the current state-of-the-art performance of supervised methods; it does so even when the number of training samples is very limited, while offering computational efficiency. Finally, we demonstrate that unsupervised pre-training of our transformer models offers a substantial performance benefit over fully supervised learning, even without leveraging additional unlabeled data, i.e., by reusing the same data samples through the unsupervised objective.
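
A minimal sketch of a masked-value pre-training objective of the kind described, built on PyTorch's stock transformer encoder. The dimensions, the 15% masking rate, and zero-masking are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

d_model, n_vars, seq_len = 64, 8, 100
embed = nn.Linear(n_vars, d_model)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True),
    num_layers=3)
head = nn.Linear(d_model, n_vars)

x = torch.randn(32, seq_len, n_vars)                # (batch, time, variables)
mask = torch.rand(32, seq_len, 1) < 0.15            # hide ~15% of timesteps
x_in = x.masked_fill(mask, 0.0)                     # zero out masked values
pred = head(encoder(embed(x_in)))
loss = ((pred - x)[mask.expand_as(x)] ** 2).mean()  # regress only masked positions
```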

Proceedings Article
30 Apr 2020
TL;DR: In this paper, the authors propose to maximize the information between labels and input data indices, extending standard cross-entropy minimization to an optimal transport problem for unsupervised learning of deep neural networks.
Abstract: Combining clustering and representation learning is one of the most promising approaches for unsupervised learning of deep neural networks. However, doing so naively leads to ill posed learning problems with degenerate solutions. In this paper, we propose a novel and principled learning formulation that addresses these issues. The method is obtained by maximizing the information between labels and input data indices. We show that this criterion extends standard cross-entropy minimization to an optimal transport problem, which we solve efficiently for millions of input images and thousands of labels using a fast variant of the Sinkhorn-Knopp algorithm. The resulting method is able to self-label visual data so as to train highly competitive image representations without manual labels. Compared to the best previous method in this class, namely DeepCluster, our formulation minimizes a single objective function for both representation learning and clustering; it also significantly outperforms DeepCluster in standard benchmarks.
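
The optimal-transport step is short in code: alternate row and column normalizations (Sinkhorn-Knopp) so that pseudo-labels stay close to the network's predictions while spreading mass equally over clusters. A minimal sketch; the regularization strength and iteration count are illustrative, and scores are assumed to be of roughly unit scale.

```python
import torch

def sinkhorn(scores, n_iters=3, eps=0.05):
    # scores: (N, K) model scores; returns soft pseudo-labels, rows summing to 1.
    Q = torch.exp(scores / eps).T        # (K, N)
    Q /= Q.sum()
    K, N = Q.shape
    for _ in range(n_iters):
        Q /= Q.sum(dim=1, keepdim=True); Q /= K   # equal total mass per cluster
        Q /= Q.sum(dim=0, keepdim=True); Q /= N   # unit mass per sample
    return (Q * N).T
```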

Book ChapterDOI
01 Jan 2020
TL;DR: A systematic review of scholarly articles published between 2015 and 2018 addressing or implementing supervised and unsupervised machine learning techniques in different problem-solving paradigms revealed that decision tree, support vector machine, and Naive Bayes algorithms appeared to be the most cited, discussed, and implemented supervised learners.
Abstract: Machine learning is growing as fast as concepts such as Big data and the field of data science in general. The purpose of the systematic review was to analyze scholarly articles that were published between 2015 and 2018 addressing or implementing supervised and unsupervised machine learning techniques in different problem-solving paradigms. Using the elements of PRISMA, the review process identified 84 scholarly articles that had been published in different journals. Of the 84 articles, 6 were published before 2015 despite their metadata indicating that they were published in 2015. The existence of the six articles in the final papers was attributed to errors in indexing. Nonetheless, from the reviewed papers, decision tree, support vector machine, and Naive Bayes algorithms appeared to be the most cited, discussed, and implemented supervised learners. Conversely, k-means, hierarchical clustering, and principal component analysis also emerged as the commonly used unsupervised learners. The review also revealed other commonly used algorithms, including ensembles and reinforcement learners, and future systematic reviews can focus on them because of the developments that machine learning and data science are undergoing at the moment.

Posted Content
TL;DR: This work proposes to use a memory module with a new update scheme where items in the memory record prototypical patterns of normal data, boosting the discriminative power of both memory items and deeply learned features from normal data and lessening the representation capacity of CNNs.
Abstract: We address the problem of anomaly detection, that is, detecting anomalous events in a video sequence. Anomaly detection methods based on convolutional neural networks (CNNs) typically leverage proxy tasks, such as reconstructing input video frames, to learn models describing normality without seeing anomalous samples at training time, and quantify the extent of abnormalities using the reconstruction error at test time. The main drawbacks of these approaches are that they do not consider the diversity of normal patterns explicitly, and the powerful representation capacity of CNNs allows to reconstruct abnormal video frames. To address this problem, we present an unsupervised learning approach to anomaly detection that considers the diversity of normal patterns explicitly, while lessening the representation capacity of CNNs. To this end, we propose to use a memory module with a new update scheme where items in the memory record prototypical patterns of normal data. We also present novel feature compactness and separateness losses to train the memory, boosting the discriminative power of both memory items and deeply learned features from normal data. Experimental results on standard benchmarks demonstrate the effectiveness and efficiency of our approach, which outperforms the state of the art.

Proceedings ArticleDOI
23 Aug 2020
TL;DR: The tutorial will revisit well-known unsupervised learning techniques in deep learning, including autoencoders and generative adversarial networks (GANs), from the perspective of anomaly detection, giving the audience a more grounded perspective on unsupervised deep learning methods.
Abstract: Anomaly detection is an important problem that has been well-studied within diverse research areas and application domains. A robust anomaly detection system identifies rare events and patterns in the absence of labelled data. The identified patterns provide crucial insights about both the fidelity of the data and deviations in the underlying data-generating process. For example, a surveillance system designed to monitor the emergence of new epidemics will use robust anomaly detection methods to separate spurious associations from genuine indicators of an epidemic with minimal lag time. The key concept in anomaly detection is the notion of "robustness", i.e., designing models and representations which are less sensitive to small changes in the underlying data distribution. The canonical example is that the median is more robust than the mean as an estimator. The tutorial will primarily help researchers and developers design deep learning architectures and loss functions where the learnt representation behaves more like the "median" rather than the "mean". The tutorial will revisit well-known unsupervised learning techniques in deep learning, including autoencoders and generative adversarial networks (GANs), from the perspective of anomaly detection. This in turn will give the audience a more grounded perspective on unsupervised deep learning methods. All the methods will be introduced in a hands-on manner to demonstrate how high-level ideas and concepts get translated to practical real code.
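
The baseline recipe the tutorial builds on fits in a few lines: train an autoencoder on normal data only, then score inputs by reconstruction error. A minimal sketch, assuming model is any trained autoencoder:

```python
import torch

@torch.no_grad()
def anomaly_scores(model, x):
    # Per-sample mean squared reconstruction error; larger = more anomalous.
    recon = model(x)
    return ((x - recon) ** 2).flatten(1).mean(dim=1)

# Inputs whose score exceeds a threshold chosen on held-out normal data are flagged.
```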

Journal ArticleDOI
01 Apr 2020
TL;DR: Transfer learning (TL), as discussed by the authors, can be classified into four categories: transductive learning, inductive learning, unsupervised learning, and negative learning, and each category can be organized into four learning types: learning on instances, learning on features, learning on parameters, and learning on relations.
Abstract: Transfer learning (TL) has been successfully applied to many real-world problems that traditional machine learning (ML) cannot handle, such as image processing, speech recognition, and natural language processing (NLP). Commonly, TL tends to address three main problems of traditional machine learning: (1) insufficient labeled data, (2) incompatible computation power, and (3) distribution mismatch. In general, TL can be organized into four categories: transductive learning, inductive learning, unsupervised learning, and negative learning. Furthermore, each category can be organized into four learning types: learning on instances, learning on features, learning on parameters, and learning on relations. This article presents a comprehensive survey on TL. In addition, this article presents the state of the art, current trends, applications, and open challenges.

Journal ArticleDOI
TL;DR: In this article, a deep neural network (DNN) based power control method that aims at solving the non-convex optimization problem of maximizing the sum rate of a fading multi-user interference channel is proposed.
Abstract: A deep neural network (DNN) based power control method that aims at solving the non-convex optimization problem of maximizing the sum rate of a fading multi-user interference channel is proposed. Towards this end, we first present PCNet, which is a multi-layer fully connected neural network that is specifically designed for the power control problem. A key challenge in training a DNN for the power control problem is the lack of ground truth, i.e., the optimal power allocation is unknown. To address this issue, PCNet leverages the unsupervised learning strategy and directly maximizes the sum rate in the training phase. We then present PCNet+, which enhances the generalization capacity of PCNet by incorporating noise power as an input to the network. Observing that a single PCNet(+) does not universally outperform the existing solutions, we further propose ePCNet(+), a network ensemble with multiple PCNets(+) trained independently. Simulation results show that for the standard symmetric K-user Gaussian interference channel, the proposed methods can outperform state-of-the-art power control solutions under a variety of system configurations. Furthermore, the performance improvement of ePCNet comes with a reduced computational complexity.
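
The unsupervised trick is to make the negative sum rate itself the training loss, so no optimal power labels are needed. A hedged sketch with an illustrative fully connected network and channel model, not PCNet's exact architecture:

```python
import torch
import torch.nn as nn

K, sigma2, p_max = 4, 1.0, 1.0
net = nn.Sequential(nn.Linear(K * K, 128), nn.ReLU(),
                    nn.Linear(128, K), nn.Sigmoid())   # powers scaled to [0, p_max]

def neg_sum_rate(G, p):
    # G: (batch, K, K) gains, G[b, i, j] = gain from transmitter j to receiver i.
    signal = torch.diagonal(G, dim1=1, dim2=2) * p           # desired links
    interference = (G * p.unsqueeze(1)).sum(dim=2) - signal  # cross links
    rates = torch.log2(1 + signal / (sigma2 + interference))
    return -rates.sum(dim=1).mean()

G = torch.rand(32, K, K)
loss = neg_sum_rate(G, p_max * net(G.flatten(1)))  # backprop maximizes the sum rate
```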

Journal ArticleDOI
03 Apr 2020
TL;DR: A new unsupervised and unified densely connected network for different types of image fusion tasks, termed FusionDN, which obtains a single model applicable to multiple fusion tasks by applying elastic weight consolidation to avoid forgetting what has been learned from previous tasks when training multiple tasks sequentially.
Abstract: In this paper, we present a new unsupervised and unified densely connected network for different types of image fusion tasks, termed as FusionDN. In our method, the densely connected network is trained to generate the fused image conditioned on source images. Meanwhile, a weight block is applied to obtain two data-driven weights as the retention degrees of features in different source images, which are the measurement of the quality and the amount of information in them. Losses of similarities based on these weights are applied for unsupervised learning. In addition, we obtain a single model applicable to multiple fusion tasks by applying elastic weight consolidation to avoid forgetting what has been learned from previous tasks when training multiple tasks sequentially, rather than train individual models for every fusion task or jointly train tasks roughly. Qualitative and quantitative results demonstrate the advantages of FusionDN compared with state-of-the-art methods in different fusion tasks.
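
The elastic weight consolidation penalty borrowed here is compact: parameters that mattered for earlier fusion tasks (large Fisher information) are anchored near their old values while the network trains on the next task. A minimal sketch with assumed dictionary bookkeeping:

```python
def ewc_penalty(model, fisher, old_params, lam=1.0):
    # fisher / old_params: dicts keyed by parameter name, saved after the previous task.
    loss = 0.0
    for name, p in model.named_parameters():
        loss = loss + (fisher[name] * (p - old_params[name]) ** 2).sum()
    return 0.5 * lam * loss   # added to the current task's fusion loss
```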

Posted Content
TL;DR: A comprehensive taxonomy of representation learning methods for graph-structured data is proposed, aiming to unify several disparate bodies of work and provide a solid foundation for understanding the intuition behind these methods, and enables future research in the area.
Abstract: There has been a surge of recent interest in learning representations for graph-structured data. Graph representation learning methods have generally fallen into three main categories, based on the availability of labeled data. The first, network embedding (such as shallow graph embedding or graph auto-encoders), focuses on learning unsupervised representations of relational structure. The second, graph regularized neural networks, leverages graphs to augment neural network losses with a regularization objective for semi-supervised learning. The third, graph neural networks, aims to learn differentiable functions over discrete topologies with arbitrary structure. However, despite the popularity of these areas there has been surprisingly little work on unifying the three paradigms. Here, we aim to bridge the gap between graph neural networks, network embedding and graph regularization models. We propose a comprehensive taxonomy of representation learning methods for graph-structured data, aiming to unify several disparate bodies of work. Specifically, we propose a Graph Encoder Decoder Model (GRAPHEDM), which generalizes popular algorithms for semi-supervised learning on graphs (e.g. GraphSage, Graph Convolutional Networks, Graph Attention Networks), and unsupervised learning of graph representations (e.g. DeepWalk, node2vec, etc) into a single consistent approach. To illustrate the generality of this approach, we fit over thirty existing methods into this framework. We believe that this unifying view both provides a solid foundation for understanding the intuition behind these methods, and enables future research in the area.

Proceedings ArticleDOI
14 Jun 2020
TL;DR: A novel unsupervised feature representation learning method, Visual Commonsense Region-based Convolutional Neural Network (VC R-CNN), is presented to serve as an improved visual region encoder for high-level tasks such as captioning and VQA, and observes consistent performance boosts across them.
Abstract: We present a novel unsupervised feature representation learning method, Visual Commonsense Region-based Convolutional Neural Network (VC R-CNN), to serve as an improved visual region encoder for high-level tasks such as captioning and VQA. Given a set of detected object regions in an image (e.g., using Faster R-CNN), like any other unsupervised feature learning methods (e.g., word2vec), the proxy training objective of VC R-CNN is to predict the contextual objects of a region. However, they are fundamentally different: the prediction of VC R-CNN is by using causal intervention: P(Y|do(X)), while others are by using the conventional likelihood: P(Y|X). This is also the core reason why VC R-CNN can learn “sense-making” knowledge like chair can be sat on, rather than just “common” co-occurrences such as chair is likely to exist if table is observed. We extensively apply VC R-CNN features in prevailing models of three popular tasks: Image Captioning, VQA, and VCR, and observe consistent performance boosts across them, achieving many new state-of-the-art results.
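
For readers new to the do-notation in the abstract: under the usual backdoor assumptions, intervening differs from conditioning only in how the confounder z (here, roughly, co-occurring object context) is averaged out, by its marginal rather than its conditional:

```latex
\[
  P(Y \mid do(X)) \;=\; \sum_{z} P(Y \mid X, z)\,P(z)
  \qquad\text{vs.}\qquad
  P(Y \mid X) \;=\; \sum_{z} P(Y \mid X, z)\,P(z \mid X).
\]
```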