Showing papers on "Unsupervised learning" published in 2019


Posted Content
Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, Ross Girshick
TL;DR: This article proposes Momentum Contrast (MoCo) for unsupervised visual representation learning, which enables building a large and consistent dictionary on-the-fly that facilitates contrastive learning.
Abstract: We present Momentum Contrast (MoCo) for unsupervised visual representation learning. From a perspective on contrastive learning as dictionary look-up, we build a dynamic dictionary with a queue and a moving-averaged encoder. This enables building a large and consistent dictionary on-the-fly that facilitates contrastive unsupervised learning. MoCo provides competitive results under the common linear protocol on ImageNet classification. More importantly, the representations learned by MoCo transfer well to downstream tasks. MoCo can outperform its supervised pre-training counterpart in 7 detection/segmentation tasks on PASCAL VOC, COCO, and other datasets, sometimes surpassing it by large margins. This suggests that the gap between unsupervised and supervised representation learning has been largely closed in many vision tasks.

4,272 citations
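
The MoCo abstract above describes a dictionary built from a queue and a moving-averaged encoder. The sketch below illustrates those two pieces with plain Python lists, plus a contrastive loss of the InfoNCE form in which one positive key competes against the queued keys. The function names, the momentum value `m`, the `temperature`, and the toy vectors are illustrative assumptions and are not taken from the listing; a real implementation would operate on network weights and encoded image batches.

```python
from collections import deque
import math


def momentum_update(key_params, query_params, m=0.999):
    """Move each key-encoder parameter slowly toward the query encoder's value."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(query, positive_key, queue, temperature=0.07):
    """Contrastive loss: the positive key is scored against every queued key."""
    logits = [dot(query, positive_key) / temperature]
    logits += [dot(query, k) / temperature for k in queue]
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    return log_sum - logits[0]


# Fixed-size FIFO dictionary: appending beyond maxlen drops the oldest key.
queue = deque(maxlen=4)
for key in ([1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0], [0.6, 0.8]):
    queue.append(key)

# Toy "parameters": the key encoder moves 10% of the way to the query encoder.
key_params = momentum_update([0.0, 0.0], [1.0, 1.0], m=0.9)

# A query that matches its positive key yields a lower loss than one that does not.
loss_match = info_nce([1.0, 0.0], [1.0, 0.0], queue)
loss_mismatch = info_nce([1.0, 0.0], [-1.0, 0.0], queue)
```

The bounded `deque` stands in for the queue: the dictionary size is decoupled from the batch size, and the slow momentum update keeps the queued keys roughly consistent with one another.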


Posted Content
TL;DR: Key properties of the multiview contrastive learning approach are analyzed, finding that the contrastive loss outperforms a popular alternative based on cross-view prediction, and that the more views the authors learn from, the better the resulting representation captures underlying scene semantics.
Abstract: Humans view the world through many sensory channels, e.g., the long-wavelength light channel, viewed by the left eye, or the high-frequency vibrations channel, heard by the right ear. Each view is noisy and incomplete, but important factors, such as physics, geometry, and semantics, tend to be shared between all views (e.g., a "dog" can be seen, heard, and felt). We investigate the classic hypothesis that a powerful representation is one that models view-invariant factors. We study this hypothesis under the framework of multiview contrastive learning, where we learn a representation that aims to maximize mutual information between different views of the same scene but is otherwise compact. Our approach scales to any number of views, and is view-agnostic. We analyze key properties of the approach that make it work, finding that the contrastive loss outperforms a popular alternative based on cross-view prediction, and that the more views we learn from, the better the resulting representation captures underlying scene semantics. Our approach achieves state-of-the-art results on image and video unsupervised learning benchmarks. Code is released at: this http URL.

1,453 citations


Proceedings Article
24 May 2019
TL;DR: The authors show that the unsupervised learning of disentangled representations is fundamentally impossible without inductive biases on both the models and the data, and suggest that future work on disentanglement learning should be explicit about the role of inductive bias and (implicit) supervision.
Abstract: The key idea behind the unsupervised learning of disentangled representations is that real-world data is generated by a few explanatory factors of variation which can be recovered by unsupervised learning algorithms. In this paper, we provide a sober look at recent progress in the field and challenge some common assumptions. We first theoretically show that the unsupervised learning of disentangled representations is fundamentally impossible without inductive biases on both the models and the data. Then, we train more than 12000 models covering most prominent methods and evaluation metrics in a reproducible large-scale experimental study on seven different data sets. We observe that while the different methods successfully enforce properties ``encouraged'' by the corresponding losses, well-disentangled models seemingly cannot be identified without supervision. Furthermore, increased disentanglement does not seem to lead to a decreased sample complexity of learning for downstream tasks. Our results suggest that future work on disentanglement learning should be explicit about the role of inductive biases and (implicit) supervision, investigate concrete benefits of enforcing disentanglement of the learned representations, and consider a reproducible experimental setup covering several data sets.

1,013 citations


Journal Article
08 May 2019-Nature
TL;DR: An optical version of a brain-inspired neurosynaptic system, using wavelength division multiplexing techniques, is presented that is capable of supervised and unsupervised learning.
Abstract: Software implementations of brain-inspired computing underlie many important computational tasks, from image processing to speech recognition, artificial intelligence and deep learning applications. Yet, unlike real neural tissue, traditional computing architectures physically separate the core computing functions of memory and processing, making fast, efficient and low-energy computing difficult to achieve. To overcome such limitations, an attractive alternative is to design hardware that mimics neurons and synapses. Such hardware, when connected in networks or neuromorphic systems, processes information in a way more analogous to brains. Here we present an all-optical version of such a neurosynaptic system, capable of supervised and unsupervised learning. We exploit wavelength division multiplexing techniques to implement a scalable circuit architecture for photonic neural networks, successfully demonstrating pattern recognition directly in the optical domain. Such photonic neurosynaptic networks promise access to the high speed and high bandwidth inherent to optical systems, thus enabling the direct processing of optical telecommunication and visual data.

862 citations


Proceedings Article
04 Jun 2019
TL;DR: The proposed Scene Representation Networks (SRNs), a continuous, 3D-structure-aware scene representation that encodes both geometry and appearance, are demonstrated by evaluating them for novel view synthesis, few-shot reconstruction, joint shape and appearance interpolation, and unsupervised discovery of a non-rigid face model.
Abstract: Unsupervised learning with generative models has the potential of discovering rich representations of 3D scenes. While geometric deep learning has explored 3D-structure-aware representations of scene geometry, these models typically require explicit 3D supervision. Emerging neural scene representations can be trained only with posed 2D images, but existing methods ignore the three-dimensional structure of scenes. We propose Scene Representation Networks (SRNs), a continuous, 3D-structure-aware scene representation that encodes both geometry and appearance. SRNs represent scenes as continuous functions that map world coordinates to a feature representation of local scene properties. By formulating the image formation as a differentiable ray-marching algorithm, SRNs can be trained end-to-end from only 2D images and their camera poses, without access to depth or shape. This formulation naturally generalizes across scenes, learning powerful geometry and appearance priors in the process. We demonstrate the potential of SRNs by evaluating them for novel view synthesis, few-shot reconstruction, joint shape and appearance interpolation, and unsupervised discovery of a non-rigid face model.

832 citations


Journal Article
TL;DR: Fast AnoGAN (f-AnoGAN), a generative adversarial network (GAN) based unsupervised learning approach capable of identifying anomalous images and image segments that can serve as imaging biomarker candidates, is presented.

777 citations


Posted Content
29 Apr 2019-bioRxiv
TL;DR: This work uses unsupervised learning to train a deep contextual language model on 86 billion amino acids across 250 million protein sequences spanning evolutionary diversity, enabling state-of-the-art supervised prediction of mutational effect and secondary structure, and improving state-of-the-art features for long-range contact prediction.
Abstract: In the field of artificial intelligence, a combination of scale in data and model capacity enabled by unsupervised learning has led to major advances in representation learning and statistical generation. In biology, the anticipated growth of sequencing promises unprecedented data on natural sequence diversity. Learning the natural distribution of evolutionary protein sequence variation is a logical step toward predictive and generative modeling for biology. To this end we use unsupervised learning to train a deep contextual language model on 86 billion amino acids across 250 million sequences spanning evolutionary diversity. The resulting model maps raw sequences to representations of biological properties without labels or prior domain knowledge. The learned representation space organizes sequences at multiple levels of biological granularity from the biochemical to proteomic levels. Learning recovers information about protein structure: secondary structure and residue-residue contacts can be extracted by linear projections from learned representations. With small amounts of labeled data, the ability to identify tertiary contacts is further improved. Learning on full sequence diversity rather than individual protein families increases recoverable information about secondary structure. We show the networks generalize by adapting them to variant activity prediction from sequences only, with results that are comparable to a state-of-the-art variant predictor that uses evolutionary and structurally derived features.

748 citations


Journal Article
TL;DR: This Review discusses the multiple algorithmic options for clustering scRNA-seq data, including various technical, biological and computational considerations.
Abstract: Single-cell RNA sequencing (scRNA-seq) allows researchers to collect large catalogues detailing the transcriptomes of individual cells. Unsupervised clustering is of central importance for the analysis of these data, as it is used to identify putative cell types. However, there are many challenges involved. We discuss why clustering is a challenging problem from a computational point of view and what aspects of the data make it challenging. We also consider the difficulties related to the biological interpretation and annotation of the identified clusters.

741 citations


Proceedings Article
15 Jun 2019
TL;DR: This study revisits numerous previously proposed self-supervised models, conducts a thorough large-scale study and uncovers multiple crucial insights, notably that standard recipes for CNN design do not always translate to self-supervised representation learning.
Abstract: Unsupervised visual representation learning remains a largely unsolved problem in computer vision research. Among a large body of recently proposed approaches for unsupervised learning of visual representations, a class of self-supervised techniques achieves superior performance on many challenging benchmarks. A large number of the pretext tasks for self-supervised learning have been studied, but other important aspects, such as the choice of convolutional neural networks (CNN), have not received equal attention. Therefore, we revisit numerous previously proposed self-supervised models, conduct a thorough large-scale study and, as a result, uncover multiple crucial insights. We challenge a number of common practices in self-supervised visual representation learning and observe that standard recipes for CNN design do not always translate to self-supervised representation learning. As part of our study, we drastically boost the performance of previously proposed techniques and outperform previously published state-of-the-art results by a large margin. We will release the code for reproducing our experiments when the anonymity requirements are lifted.

730 citations


Journal Article
TL;DR: This Review describes different deep learning techniques and how they can be applied to extract biologically relevant information from large, complex genomic data sets.
Abstract: As a data-driven science, genomics largely utilizes machine learning to capture dependencies in data and derive novel biological hypotheses. However, the ability to extract new insights from the exponentially increasing volume of genomics data requires more expressive machine learning models. By effectively leveraging large data sets, deep learning has transformed fields such as computer vision and natural language processing. Now, it is becoming the method of choice for many genomics modelling tasks, including predicting the impact of genetic variation on gene regulatory mechanisms such as DNA accessibility and splicing. This Review describes different deep learning techniques and how they can be applied to extract biologically relevant information from large, complex genomic data sets.

685 citations


Proceedings Article
15 Jun 2019
TL;DR: This model learns the semantic labels in a supervised fashion, and broadens its understanding of the data by learning from self-supervised signals how to solve a jigsaw puzzle on the same images, which helps the network to learn the concepts of spatial correlation while acting as a regularizer for the classification task.
Abstract: Human adaptability relies crucially on the ability to learn and merge knowledge both from supervised and unsupervised learning: parents point out a few important concepts, but then the children fill in the gaps on their own. This is particularly effective, because supervised learning can never be exhaustive and thus learning autonomously allows the discovery of invariances and regularities that help to generalize. In this paper we propose to apply a similar approach to the task of object recognition across domains: our model learns the semantic labels in a supervised fashion, and broadens its understanding of the data by learning from self-supervised signals how to solve a jigsaw puzzle on the same images. This secondary task helps the network to learn the concepts of spatial correlation while acting as a regularizer for the classification task. Multiple experiments on the PACS, VLCS, Office-Home and digits datasets confirm our intuition and show that this simple method outperforms previous domain generalization and adaptation solutions. An ablation study further illustrates the inner workings of our approach.
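
The jigsaw pretext task mentioned in the abstract above can be illustrated with a small sketch: an image is cut into a grid of tiles, the tiles are reordered by a permutation drawn from a fixed set, and the index of that permutation becomes the self-supervised label. The permutation set, the function names, and the 4x4 toy "image" below are assumptions for illustration only; the listing does not specify grid size or how permutations are chosen.

```python
import random

# Hypothetical fixed set of tile orderings; the pretext label is an index into it.
PERMUTATIONS = [(0, 1, 2, 3), (3, 2, 1, 0), (1, 0, 3, 2), (2, 3, 0, 1)]


def make_jigsaw(image, grid, permutation):
    """Cut a square image (list of rows) into grid x grid tiles and reorder them."""
    tile = len(image) // grid
    tiles = [
        [row[c * tile:(c + 1) * tile] for row in image[r * tile:(r + 1) * tile]]
        for r in range(grid)
        for c in range(grid)
    ]
    shuffled = [tiles[i] for i in permutation]
    # Reassemble the shuffled tiles into a single image, row by row.
    puzzle = []
    for r in range(grid):
        for y in range(tile):
            row = []
            for c in range(grid):
                row.extend(shuffled[r * grid + c][y])
            puzzle.append(row)
    return puzzle


def sample_puzzle(image, grid, rng):
    """Return a shuffled image and the permutation index to be predicted."""
    label = rng.randrange(len(PERMUTATIONS))
    return make_jigsaw(image, grid, PERMUTATIONS[label]), label


image = [[4 * r + c for c in range(4)] for r in range(4)]
puzzle, label = sample_puzzle(image, 2, random.Random(0))
```

In a setup like the one the abstract describes, a network would receive `puzzle` and be trained to predict `label` alongside the ordinary supervised class label, so no extra annotation is needed for the secondary task.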

Journal Article
TL;DR: The review begins by covering fundamental concepts in ML and modern statistics such as the bias-variance tradeoff, overfitting, regularization, generalization, and gradient descent before moving on to more advanced topics in both supervised and unsupervised learning.

Journal Article
03 Jul 2019-Nature
TL;DR: It is shown that materials science knowledge present in the published literature can be efficiently encoded as information-dense word embeddings [11–13] (vector representations of words) without human labelling or supervision, suggesting that latent knowledge regarding future discoveries is to a large extent embedded in past publications.
Abstract: The overwhelming majority of scientific knowledge is published as text, which is difficult to analyse by either traditional statistical analysis or modern machine learning methods. By contrast, the main source of machine-interpretable data for the materials research community has come from structured property databases [1, 2], which encompass only a small fraction of the knowledge present in the research literature. Beyond property values, publications contain valuable knowledge regarding the connections and relationships between data items as interpreted by the authors. To improve the identification and use of this knowledge, several studies have focused on the retrieval of information from scientific literature using supervised natural language processing [3–10], which requires large hand-labelled datasets for training. Here we show that materials science knowledge present in the published literature can be efficiently encoded as information-dense word embeddings [11–13] (vector representations of words) without human labelling or supervision. Without any explicit insertion of chemical knowledge, these embeddings capture complex materials science concepts such as the underlying structure of the periodic table and structure-property relationships in materials. Furthermore, we demonstrate that an unsupervised method can recommend materials for functional applications several years before their discovery. This suggests that latent knowledge regarding future discoveries is to a large extent embedded in past publications. Our findings highlight the possibility of extracting knowledge and relationships from the massive body of scientific literature in a collective manner, and point towards a generalized approach to the mining of scientific literature.

Posted Content
TL;DR: This paper reviews self-supervised learning, a subset of unsupervised learning that aims to learn general image and video features from large-scale unlabeled data without using any human-annotated labels.
Abstract: Large-scale labeled data are generally required to train deep neural networks in order to obtain better performance in visual feature learning from images or videos for computer vision applications. To avoid the extensive cost of collecting and annotating large-scale datasets, self-supervised learning methods, a subset of unsupervised learning methods, have been proposed to learn general image and video features from large-scale unlabeled data without using any human-annotated labels. This paper provides an extensive review of deep learning-based self-supervised general visual feature learning methods from images or videos. First, the motivation, general pipeline, and terminologies of this field are described. Then the common deep neural network architectures that are used for self-supervised learning are summarized. Next, the main components and evaluation metrics of self-supervised learning methods are reviewed, followed by the commonly used image and video datasets and the existing self-supervised visual feature learning methods. Finally, quantitative performance comparisons of the reviewed methods on benchmark datasets are summarized and discussed for both image and video feature learning. The paper concludes with a set of promising future directions for self-supervised visual feature learning.

Proceedings Article
15 Jun 2019
TL;DR: In this article, the authors propose a competitive collaboration framework that facilitates the coordinated training of multiple specialized neural networks to solve complex low-level vision problems, such as single view depth prediction, camera motion estimation, optical flow, and segmentation of a video into the static scene and moving regions.
Abstract: We address the unsupervised learning of several interconnected problems in low-level vision: single view depth prediction, camera motion estimation, optical flow, and segmentation of a video into the static scene and moving regions. Our key insight is that these four fundamental vision problems are coupled through geometric constraints. Consequently, learning to solve them together simplifies the problem because the solutions can reinforce each other. We go beyond previous work by exploiting geometry more explicitly and segmenting the scene into static and moving regions. To that end, we introduce Competitive Collaboration, a framework that facilitates the coordinated training of multiple specialized neural networks to solve complex problems. Competitive Collaboration works much like expectation-maximization, but with neural networks that act as both competitors to explain pixels that correspond to static or moving regions, and as collaborators through a moderator that assigns pixels to be either static or independently moving. Our novel method integrates all these problems in a common framework and simultaneously reasons about the segmentation of the scene into moving objects and the static background, the camera motion, depth of the static scene structure, and the optical flow of moving objects. Our model is trained without any supervision and achieves state-of-the-art performance among joint unsupervised methods on all sub-problems.

Book Chapter
13 Jun 2019
TL;DR: This paper proposes a multiview contrastive learning framework that learns a compact representation maximizing mutual information between different views of the same scene, and that achieves state-of-the-art results on image and video unsupervised learning benchmarks.
Abstract: Humans view the world through many sensory channels, e.g., the long-wavelength light channel, viewed by the left eye, or the high-frequency vibrations channel, heard by the right ear. Each view is noisy and incomplete, but important factors, such as physics, geometry, and semantics, tend to be shared between all views (e.g., a "dog" can be seen, heard, and felt). We investigate the classic hypothesis that a powerful representation is one that models view-invariant factors. We study this hypothesis under the framework of multiview contrastive learning, where we learn a representation that aims to maximize mutual information between different views of the same scene but is otherwise compact. Our approach scales to any number of views, and is view-agnostic. We analyze key properties of the approach that make it work, finding that the contrastive loss outperforms a popular alternative based on cross-view prediction, and that the more views we learn from, the better the resulting representation captures underlying scene semantics. Our approach achieves state-of-the-art results on image and video unsupervised learning benchmarks. Code is released at: this http URL.

Posted Content
TL;DR: The proposed MAD-GAN framework considers the entire variable set concurrently to capture the latent interactions amongst the variables and is effective in reporting anomalies caused by various cyber-intrusions in these complex real-world systems.
Abstract: The prevalence of networked sensors and actuators in many real-world systems such as smart buildings, factories, power plants, and data centers generates substantial amounts of multivariate time series data for these systems. The rich sensor data can be continuously monitored for intrusion events through anomaly detection. However, conventional threshold-based anomaly detection methods are inadequate due to the dynamic complexities of these systems, while supervised machine learning methods are unable to exploit the large amounts of data due to the lack of labeled data. On the other hand, current unsupervised machine learning approaches have not fully exploited the spatial-temporal correlation and other dependencies amongst the multiple variables (sensors/actuators) in the system for detecting anomalies. In this work, we propose an unsupervised multivariate anomaly detection method based on Generative Adversarial Networks (GANs). Instead of treating each data stream independently, our proposed MAD-GAN framework considers the entire variable set concurrently to capture the latent interactions amongst the variables. We also fully exploit both the generator and discriminator produced by the GAN, using a novel anomaly score called DR-score to detect anomalies by discrimination and reconstruction. We have tested our proposed MAD-GAN using two recent datasets collected from real-world CPS: the Secure Water Treatment (SWaT) and the Water Distribution (WADI) datasets. Our experimental results showed that the proposed MAD-GAN is effective in reporting anomalies caused by various cyber-intrusions in these complex real-world systems.

Proceedings Article
29 Mar 2019
TL;DR: In this paper, the authors train an embedding function to maximize a metric of local aggregation, causing similar data instances to move together in the embedding space, while allowing dissimilar instances to separate.
Abstract: Unsupervised approaches to learning in neural networks are of substantial interest for furthering artificial intelligence, both because they would enable the training of networks without the need for large numbers of expensive annotations, and because they would be better models of the kind of general-purpose learning deployed by humans. However, unsupervised networks have long lagged behind the performance of their supervised counterparts, especially in the domain of large-scale visual recognition. Recent developments in training deep convolutional embeddings to maximize non-parametric instance separation and clustering objectives have shown promise in closing this gap. Here, we describe a method that trains an embedding function to maximize a metric of local aggregation, causing similar data instances to move together in the embedding space, while allowing dissimilar instances to separate. This aggregation metric is dynamic, allowing soft clusters of different scales to emerge. We evaluate our procedure on several large-scale visual recognition datasets, achieving state-of-the-art unsupervised transfer learning performance on object recognition in ImageNet, scene recognition in Places 205, and object detection in PASCAL VOC.

Journal Article
TL;DR: It is suggested that for AI to learn from animal brains, it is important to consider that animal behaviour results from brain connectivity specified in the genome through evolution, and not due to unique learning algorithms.
Abstract: Artificial neural networks (ANNs) have undergone a revolution, catalyzed by better supervised learning algorithms. However, in stark contrast to young animals (including humans), training such networks requires enormous numbers of labeled examples, leading to the belief that animals must rely instead mainly on unsupervised learning. Here we argue that most animal behavior is not the result of clever learning algorithms (supervised or unsupervised) but is encoded in the genome. Specifically, animals are born with highly structured brain connectivity, which enables them to learn very rapidly. Because the wiring diagram is far too complex to be specified explicitly in the genome, it must be compressed through a "genomic bottleneck". The genomic bottleneck suggests a path toward ANNs capable of rapid learning.

Journal Article
17 Jul 2019
TL;DR: This work addresses unsupervised learning of scene depth and robot ego-motion where supervision is provided by monocular videos, as cameras are the cheapest, least restrictive and most ubiquitous sensor for robotics.
Abstract: Learning to predict scene depth from RGB inputs is a challenging task both for indoor and outdoor robot navigation. In this work we address unsupervised learning of scene depth and robot ego-motion where supervision is provided by monocular videos, as cameras are the cheapest, least restrictive and most ubiquitous sensor for robotics. Previous work in unsupervised image-to-depth learning has established strong baselines in the domain. We propose a novel approach which produces higher quality results, is able to model moving objects and is shown to transfer across data domains, e.g. from outdoors to indoor scenes. The main idea is to introduce geometric structure in the learning process, by modeling the scene and the individual objects; camera ego-motion and object motions are learned from monocular videos as input. Furthermore an online refinement method is introduced to adapt learning on the fly to unknown domains. The proposed approach outperforms all state-of-the-art approaches, including those that handle motion e.g. through learned flow. Our results are comparable in quality to the ones which used stereo as supervision and significantly improve depth prediction on scenes and datasets which contain a lot of object motion. The approach is of practical relevance, as it allows transfer across environments, by transferring models trained on data collected for robot navigation in urban scenes to indoor navigation settings. The code associated with this paper can be found at https://sites.google.com/view/struct2depth.

Proceedings Article
25 Feb 2019
TL;DR: This work proposes to use a perceptual loss function based on structural similarity which examines inter-dependencies between local image regions, taking into account luminance, contrast and structural information, instead of simply comparing single pixel values.
Abstract: Convolutional autoencoders have emerged as popular methods for unsupervised defect segmentation on image data. Most commonly, this task is performed by thresholding a per-pixel reconstruction error based on an ℓ^p-distance. This procedure, however, leads to large residuals whenever the reconstruction includes slight localization inaccuracies around edges. It also fails to reveal defective regions that have been visually altered when intensity values stay roughly consistent. We show that these problems prevent these approaches from being applied to complex real-world scenarios and that they cannot be easily avoided by employing more elaborate architectures such as variational or feature matching autoencoders. We propose to use a perceptual loss function based on structural similarity that examines inter-dependencies between local image regions, taking into account luminance, contrast, and structural information, instead of simply comparing single pixel values. It achieves significant performance gains on a challenging real-world dataset of nanofibrous materials and a novel dataset of two woven fabrics over state-of-the-art approaches for unsupervised defect segmentation that use per-pixel reconstruction error metrics.
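
To make the structural-similarity idea in the abstract above concrete, the following sketch computes the standard SSIM index between two flattened patches. It is a simplified illustration, not the paper's loss: it uses uniform weights and population statistics over one patch instead of a sliding Gaussian window, and the constants `k1`, `k2`, and `dynamic_range` are the commonly used defaults, assumed here since the listing does not state them.

```python
def ssim(x, y, dynamic_range=1.0, k1=0.01, k2=0.03):
    """Structural similarity between two equally sized, flattened patches."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Small constants keep the ratio stable when means or variances are near zero.
    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2
    luminance = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    contrast_structure = (2 * cov + c2) / (var_x + var_y + c2)
    return luminance * contrast_structure


textured = [0.2, 0.8, 0.2, 0.8]  # patch with visible structure
flat = [0.5, 0.5, 0.5, 0.5]      # same mean intensity, structure removed

same = ssim(textured, textured)
different = ssim(textured, flat)
```

Here the two patches have identical mean intensity, so the luminance term is 1 and the low score comes entirely from the contrast and structure term. A reconstruction loss of the form `1 - ssim(input, reconstruction)` is one way such a measure could be used in place of a per-pixel distance.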

Proceedings Article
15 Jun 2019
TL;DR: The proposed unsupervised tracker achieves the baseline accuracy of fully supervised trackers, which require complete and accurate labels during training, and exhibits a potential in leveraging unlabeled or weakly labeled data to further improve the tracking accuracy.
Abstract: We propose an unsupervised visual tracking method in this paper. Different from existing approaches using extensive annotated data for supervised learning, our CNN model is trained on large-scale unlabeled videos in an unsupervised manner. Our motivation is that a robust tracker should be effective in both the forward and backward predictions (i.e., the tracker can forward localize the target object in successive frames and backtrace to its initial position in the first frame). We build our framework on a Siamese correlation filter network, which is trained using unlabeled raw videos. Meanwhile, we propose a multiple-frame validation method and a cost-sensitive loss to facilitate unsupervised learning. Without bells and whistles, the proposed unsupervised tracker achieves the baseline accuracy of fully supervised trackers, which require complete and accurate labels during training. Furthermore, the unsupervised framework exhibits a potential in leveraging unlabeled or weakly labeled data to further improve the tracking accuracy.
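
The forward and backward prediction idea in the abstract above can be reduced to a toy consistency check: track a point forward through a clip, track it backward again, and measure how far the backtraced position lands from the starting position. The sketch below is a deliberately simplified stand-in that works on 2D positions and per-frame displacements; the paper's tracker operates on correlation filter responses, and every name here is hypothetical.

```python
import math


def track(start, displacements):
    """Apply a sequence of per-frame (dx, dy) predictions to a starting position."""
    x, y = start
    for dx, dy in displacements:
        x += dx
        y += dy
    return (x, y)


def cycle_error(start, forward_steps, backward_steps):
    """Distance between the start and the position reached after going forward then back."""
    end = track(start, forward_steps)
    back = track(end, backward_steps)
    return math.hypot(back[0] - start[0], back[1] - start[1])


start = (10, 20)
forward = [(2, 1), (3, 0)]

# A consistent tracker undoes its own forward motion exactly.
consistent = cycle_error(start, forward, [(-3, 0), (-2, -1)])

# A drifting tracker ends one pixel away from where it started.
drifting = cycle_error(start, forward, [(-3, 0), (-1, -1)])
```

Because the error is computed only from the tracker's own predictions, it needs no bounding-box labels, which is what makes a signal of this kind usable for training on raw video.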

Posted Content
TL;DR: HoloGAN is the first generative model that learns 3D representations from natural images in an entirely unsupervised manner and is shown to be able to generate images with similar or higher visual quality than other generative models.
Abstract: We propose a novel generative adversarial network (GAN) for the task of unsupervised learning of 3D representations from natural images. Most generative models rely on 2D kernels to generate images and make few assumptions about the 3D world. These models therefore tend to create blurry images or artefacts in tasks that require a strong 3D understanding, such as novel-view synthesis. HoloGAN instead learns a 3D representation of the world, and learns to render this representation in a realistic manner. Unlike other GANs, HoloGAN provides explicit control over the pose of generated objects through rigid-body transformations of the learnt 3D features. Our experiments show that using explicit 3D features enables HoloGAN to disentangle 3D pose and identity, which is further decomposed into shape and appearance, while still being able to generate images with similar or higher visual quality than other generative models. HoloGAN can be trained end-to-end from unlabelled 2D images only. Particularly, we do not require pose labels, 3D shapes, or multiple views of the same objects. This shows that HoloGAN is the first generative model that learns 3D representations from natural images in an entirely unsupervised manner.

Posted Content
TL;DR: The proposed novel and principled learning formulation is able to self-label visual data so as to train highly competitive image representations without manual labels and yields the first self-supervised AlexNet that outperforms the supervised Pascal VOC detection baseline.
Abstract: Combining clustering and representation learning is one of the most promising approaches for unsupervised learning of deep neural networks. However, doing so naively leads to ill-posed learning problems with degenerate solutions. In this paper, we propose a novel and principled learning formulation that addresses these issues. The method is obtained by maximizing the information between labels and input data indices. We show that this criterion extends standard cross-entropy minimization to an optimal transport problem, which we solve efficiently for millions of input images and thousands of labels using a fast variant of the Sinkhorn-Knopp algorithm. The resulting method is able to self-label visual data so as to train highly competitive image representations without manual labels. Our method achieves state-of-the-art representation learning performance for AlexNet and ResNet-50 on SVHN, CIFAR-10, CIFAR-100 and ImageNet and yields the first self-supervised AlexNet that outperforms the supervised Pascal VOC detection baseline. Code and models are available.
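To make the role of Sinkhorn-Knopp concrete, here is a minimal sketch of the plain matrix-scaling iteration on a small table of positive scores. It is not the authors' fast variant, and it omits any regularisation or network training; the function names `sinkhorn_knopp` and `assign_labels` and the fixed iteration count are assumptions for illustration. The point it shows is how balancing the assignment (every label receives equal total mass) avoids the degenerate solution in which all samples receive the same label.

```python
def sinkhorn_knopp(scores, n_iters=50):
    """Scale a positive N x K matrix so rows sum to 1/N and columns sum to 1/K."""
    n, k = len(scores), len(scores[0])
    q = [row[:] for row in scores]
    for _ in range(n_iters):
        # Column step: each label receives total mass 1/K.
        col_sums = [sum(q[i][j] for i in range(n)) for j in range(k)]
        q = [[q[i][j] / (col_sums[j] * k) for j in range(k)] for i in range(n)]
        # Row step: each sample carries total mass 1/N.
        q = [[v / (sum(row) * n) for v in row] for row in q]
    return q


def assign_labels(q):
    """Pick the label with the largest mass in each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in q]
```

With scores such as `[[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4]]`, a plain per-row argmax puts every sample in label 0, whereas the balanced matrix spreads the samples across both labels.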

Proceedings ArticleDOI
27 Oct 2019
TL;DR: This work proposes a new unsupervised approach which leverages self-supervision and clustering to capture complementary statistics from large-scale data and validates its approach on 96 million images from YFCC100M, achieving state-of-the-art results among unsupervised methods on standard benchmarks.
Abstract: Pre-training general-purpose visual features with convolutional neural networks without relying on annotations is a challenging and important task. Most recent efforts in unsupervised feature learning have focused on either small or highly curated datasets like ImageNet, whereas using uncurated raw datasets was found to decrease the feature quality when evaluated on a transfer task. Our goal is to bridge the performance gap between unsupervised methods trained on curated data, which are costly to obtain, and massive raw datasets that are easily available. To that effect, we propose a new unsupervised approach which leverages self-supervision and clustering to capture complementary statistics from large-scale data. We validate our approach on 96 million images from YFCC100M, achieving state-of-the-art results among unsupervised methods on standard benchmarks, which confirms the potential of unsupervised learning when only uncurated data are available. We also show that pre-training a supervised VGG-16 with our method achieves 74.9% top-1 classification accuracy on the validation set of ImageNet, which is an improvement of +0.8% over the same network trained from scratch. Our code is available at https://github.com/facebookresearch/DeeperCluster.

Proceedings ArticleDOI
15 Jun 2019
TL;DR: In this paper, self-supervision is used to encourage the GAN discriminator to learn meaningful feature representations that are not forgotten during training, and the approach is scaled to attain an FID of 23.4 on unconditional ImageNet generation.
Abstract: Conditional GANs are at the forefront of natural image synthesis. The main drawback of such models is the necessity for labeled data. In this work we exploit two popular unsupervised learning techniques, adversarial training and self-supervision, and take a step towards bridging the gap between conditional and unconditional GANs. In particular, we allow the networks to collaborate on the task of representation learning, while being adversarial with respect to the classic GAN game. The role of self-supervision is to encourage the discriminator to learn meaningful feature representations which are not forgotten during training. We test empirically both the quality of the learned image representations, and the quality of the synthesized images. Under the same conditions, the self-supervised GAN attains a similar performance to state-of-the-art conditional counterparts. Finally, we show that this approach to fully unsupervised learning can be scaled to attain an FID of 23.4 on unconditional ImageNet generation.

Journal ArticleDOI
TL;DR: A probabilistic generative model is presented and an unsupervised learning-based inference algorithm is derived that uses insights from classical registration methods and makes use of recent developments in convolutional neural networks (CNNs).

Book ChapterDOI
17 Sep 2019
TL;DR: In this article, an unsupervised multivariate anomaly detection method based on Generative Adversarial Networks (GANs), using Long Short-Term Memory Recurrent Neural Networks (LSTM-RNN) as the base models (namely, the generator and discriminator) in the GAN framework, was proposed.
Abstract: Many real-world cyber-physical systems (CPSs) are engineered for mission-critical tasks and usually are prime targets for cyber-attacks. The rich sensor data in CPSs can be continuously monitored for intrusion events through anomaly detection. On one hand, conventional supervised anomaly detection methods are unable to exploit the large amounts of data due to the lack of labelled data. On the other hand, current unsupervised machine learning approaches have not fully exploited the spatial-temporal correlation and other dependencies amongst the multiple variables (sensors/actuators) in the system when detecting anomalies. In this work, we propose an unsupervised multivariate anomaly detection method based on Generative Adversarial Networks (GANs), using Long Short-Term Memory Recurrent Neural Networks (LSTM-RNN) as the base models (namely, the generator and discriminator) in the GAN framework to capture the temporal correlation of time series distributions. Instead of treating each data stream independently, our proposed Multivariate Anomaly Detection with GAN (MAD-GAN) framework considers the entire variable set concurrently to capture the latent interactions amongst the variables. We also fully exploit both the generator and discriminator produced by the GAN, using a novel anomaly score called DR-score to detect anomalies through discrimination and reconstruction. We have tested our proposed MAD-GAN using two recent datasets collected from real-world CPSs: the Secure Water Treatment (SWaT) and the Water Distribution (WADI) datasets. Our experimental results show that the proposed MAD-GAN is effective in reporting anomalies caused by various cyber-attacks inserted in these complex real-world systems.
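The abstract says the DR-score combines discrimination and reconstruction but does not give its formula. The sketch below therefore assumes one plausible form, a weighted sum of a reconstruction residual and the discriminator's estimate that a window is not real; the function names, the `weight` parameter, and the threshold are hypothetical and are not taken from the paper.

```python
def dr_score(residual, disc_real_prob, weight=0.5):
    """Assumed anomaly score: weighted sum of reconstruction and discrimination terms.

    residual       -- reconstruction error of a window, scaled to [0, 1] (assumption)
    disc_real_prob -- discriminator's probability that the window is real
    weight         -- hypothetical trade-off between the two terms
    """
    return weight * residual + (1.0 - weight) * (1.0 - disc_real_prob)


def flag_anomalies(scores, threshold):
    """Return the indices of windows whose score exceeds the threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]
```

A window that the generator reconstructs well and the discriminator accepts as real scores near zero, while a window that fails on both counts scores near one.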

Journal ArticleDOI
TL;DR: This paper surveys current applications of statistical machine learning techniques in machine vision systems, analyses each technique's potential for specific applications, and presents an overview of instructive examples in different agricultural areas.

Journal ArticleDOI
Bin Cao1, Long Zhang1, Li Yun1, Daquan Feng2, Wei Cao 
TL;DR: The basic concept of MEC and main applications are introduced, and existing fundamental works using various ML-based approaches are reviewed, and some potential issues of AI in MEC for future work are discussed.
Abstract: Multi-access edge computing (MEC), which is deployed in the proximity area of the mobile user side as a supplement to the traditional remote cloud center, has been regarded as a promising technique for 5G heterogeneous networks. With the assistance of MEC, mobile users can access computing resources effectively. Also, congestion in the core network can be alleviated by offloading. To adapt to stochastic and constantly varying environments, augmented intelligence (AI) is introduced in MEC for intelligent decision making. For this reason, several recent works have focused on intelligent offloading in MEC to harvest its potential benefits. Therefore, machine learning (ML)-based approaches, including reinforcement learning, supervised/unsupervised learning, deep learning, as well as deep reinforcement learning for AI in MEC have become hot topics. However, many technical challenges remain to be addressed for AI in MEC. In this article, the basic concept of MEC and main applications are introduced, and existing fundamental works using various ML-based approaches are reviewed. Furthermore, some potential issues of AI in MEC for future work are discussed.