
Showing papers on "Autoencoder published in 2020"


Journal ArticleDOI
TL;DR: This article provides a systematic survey of deep learning methods for remote sensing image scene classification, covering more than 160 papers, and discusses the main challenges of the task together with autoencoder-, CNN-, and GAN-based approaches.
Abstract: Remote sensing image scene classification, which aims at labeling remote sensing images with a set of semantic categories based on their contents, has broad applications in a range of fields. Propelled by the powerful feature learning capabilities of deep neural networks, remote sensing image scene classification driven by deep learning has drawn remarkable attention and achieved significant breakthroughs. However, to the best of our knowledge, a comprehensive review of recent achievements regarding deep learning for scene classification of remote sensing images is still lacking. Considering the rapid evolution of this field, this article provides a systematic survey of deep learning methods for remote sensing image scene classification by covering more than 160 papers. To be specific, we discuss the main challenges of remote sensing image scene classification and survey: first, autoencoder-based remote sensing image scene classification methods; second, convolutional neural network-based remote sensing image scene classification methods; and third, generative adversarial network-based remote sensing image scene classification methods. In addition, we introduce the benchmarks used for remote sensing image scene classification and summarize the performance of more than two dozen representative algorithms on three commonly used benchmark datasets. Finally, we discuss the promising opportunities for further research.

450 citations


Journal ArticleDOI
01 Feb 2020
TL;DR: Deep Packet can identify encrypted traffic and also distinguishes between VPN and non-VPN network traffic, and outperforms all of the proposed classification methods on the UNB ISCX VPN-nonVPN dataset.
Abstract: Network traffic classification has become more important with the rapid growth of the Internet and online applications. Numerous studies have been done on this topic which have led to many different approaches. Most of these approaches use predefined features extracted by an expert in order to classify network traffic. In contrast, in this study, we propose a deep learning-based approach which integrates both feature extraction and classification phases into one system. Our proposed scheme, called “Deep Packet,” can handle both traffic characterization in which the network traffic is categorized into major classes (e.g., FTP and P2P) and application identification in which identifying end-user applications (e.g., BitTorrent and Skype) is desired. Contrary to most of the current methods, Deep Packet can identify encrypted traffic and also distinguishes between VPN and non-VPN network traffic. The Deep Packet framework employs two deep neural network structures, namely stacked autoencoder (SAE) and convolutional neural network (CNN), in order to classify network traffic. Our experiments show that the best result is achieved when Deep Packet uses CNN as its classification model, where it achieves a recall of 0.98 in the application identification task and 0.94 in the traffic categorization task. To the best of our knowledge, Deep Packet outperforms all of the proposed classification methods on the UNB ISCX VPN-nonVPN dataset.
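
As a rough illustration of the end-to-end idea (raw packet bytes in, application class out), the following is a minimal 1-D CNN sketch in PyTorch; the packet length of 1500 bytes, the 15 classes, and the layer sizes are illustrative assumptions, not the authors' Deep Packet architecture.

```python
# Minimal sketch (not the authors' exact Deep Packet architecture): a 1-D CNN
# that maps a fixed-length vector of normalized packet bytes to an application
# class. Packet length (1500), class count (15) and layer sizes are assumptions.
import torch
import torch.nn as nn

class PacketCNN(nn.Module):
    def __init__(self, n_classes=15):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveMaxPool1d(64),                 # fixed-size feature map
        )
        self.classifier = nn.Sequential(
            nn.Flatten(), nn.Linear(64 * 64, 256), nn.ReLU(),
            nn.Dropout(0.2), nn.Linear(256, n_classes),
        )

    def forward(self, x):        # x: (batch, 1, packet_len), bytes scaled to [0, 1]
        return self.classifier(self.features(x))

model = PacketCNN()
logits = model(torch.rand(8, 1, 1500))   # dummy batch of 8 "packets"
print(logits.shape)                      # torch.Size([8, 15])
```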

417 citations


Posted Content
Arash Vahdat1, Jan Kautz1
TL;DR: NVAE is the first successful VAE applied to natural images as large as 256×256 pixels and achieves state-of-the-art results among non-autoregressive likelihood-based models on the MNIST, CIFAR-10, CelebA 64, and CelebA HQ datasets and it provides a strong baseline on FFHQ.
Abstract: Normalizing flows, autoregressive models, variational autoencoders (VAEs), and deep energy-based models are among competing likelihood-based frameworks for deep generative learning. Among them, VAEs have the advantage of fast and tractable sampling and easy-to-access encoding networks. However, they are currently outperformed by other models such as normalizing flows and autoregressive models. While the majority of the research in VAEs is focused on the statistical challenges, we explore the orthogonal direction of carefully designing neural architectures for hierarchical VAEs. We propose Nouveau VAE (NVAE), a deep hierarchical VAE built for image generation using depth-wise separable convolutions and batch normalization. NVAE is equipped with a residual parameterization of Normal distributions and its training is stabilized by spectral regularization. We show that NVAE achieves state-of-the-art results among non-autoregressive likelihood-based models on the MNIST, CIFAR-10, CelebA 64, and CelebA HQ datasets and it provides a strong baseline on FFHQ. For example, on CIFAR-10, NVAE pushes the state-of-the-art from 2.98 to 2.91 bits per dimension, and it produces high-quality images on CelebA HQ. To the best of our knowledge, NVAE is the first successful VAE applied to natural images as large as 256×256 pixels. The source code is available at this https URL.
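
For context on the numbers quoted above, the snippet below shows the standard conversion from a per-image negative log-likelihood in nats to bits per dimension; the NLL value used is purely illustrative.

```python
# How "bits per dimension" figures like 2.98 -> 2.91 are computed: divide the
# negative log-likelihood in nats by ln(2) and by the number of data dimensions
# (3 * 32 * 32 for CIFAR-10). The NLL value below is purely illustrative.
import math

def bits_per_dim(nll_nats: float, num_dims: int) -> float:
    return nll_nats / (math.log(2.0) * num_dims)

num_dims = 3 * 32 * 32              # CIFAR-10 image dimensions
nll_nats = 6200.0                   # hypothetical per-image NLL in nats
print(round(bits_per_dim(nll_nats, num_dims), 2))   # ~2.91
```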

391 citations


Journal ArticleDOI
TL;DR: Results demonstrate the promising potential of the deep learning model in forecasting COVID-19 cases and highlight the superior performance of the VAE compared to the other algorithms.
Abstract: The novel coronavirus (COVID-19) has spread significantly over the world and poses new challenges to the research community. Although governments have imposed numerous containment and social distancing measures, the demand on healthcare systems has dramatically increased and the effective management of infected patients has become a challenging problem for hospitals. Thus, accurate short-term forecasting of the number of new contaminated and recovered cases is crucial for optimizing the available resources and arresting or slowing down the progression of such diseases. Recently, deep learning models demonstrated important improvements when handling time-series data in different applications. This paper presents a comparative study of five deep learning methods to forecast the number of new cases and recovered cases. Specifically, simple Recurrent Neural Network (RNN), Long short-term memory (LSTM), Bidirectional LSTM (BiLSTM), Gated recurrent units (GRUs) and Variational AutoEncoder (VAE) algorithms have been applied for global forecasting of COVID-19 cases based on a small volume of data. This study is based on daily confirmed and recovered cases collected from six countries, namely Italy, Spain, France, China, USA, and Australia. Results demonstrate the promising potential of the deep learning model in forecasting COVID-19 cases and highlight the superior performance of the VAE compared to the other algorithms.
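
A minimal sketch of one of the compared baselines, an LSTM one-step-ahead forecaster trained on sliding windows of a toy daily series; the window length, network size, and synthetic data are assumptions, not the paper's configuration.

```python
# Hedged sketch of an LSTM forecaster: predict the next day's value from a
# window of previous days. The synthetic series, window length and layer sizes
# are assumptions, not the paper's settings.
import torch
import torch.nn as nn

class LSTMForecaster(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                 # x: (batch, window, 1)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])   # predict the next value

# Build (window -> next value) training pairs from a toy daily-cases series.
series = torch.cumsum(torch.rand(120), dim=0)          # synthetic, monotone "cases"
window = 7
X = torch.stack([series[i:i + window] for i in range(len(series) - window)]).unsqueeze(-1)
y = series[window:].unsqueeze(-1)

model, loss_fn = LSTMForecaster(), nn.MSELoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()
print(float(loss))
```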

306 citations


Proceedings ArticleDOI
23 Aug 2020
TL;DR: A fast and stable method called UnSupervised Anomaly Detection for multivariate time series (USAD) based on adversarially trained autoencoders capable of learning in an unsupervised way is proposed.
Abstract: The automatic supervision of IT systems is a current challenge at Orange. Given the size and complexity reached by its IT operations, the number of sensors needed to obtain measurements over time, used to infer normal and abnormal behaviors, has increased dramatically making traditional expert-based supervision methods slow or prone to errors. In this paper, we propose a fast and stable method called UnSupervised Anomaly Detection for multivariate time series (USAD) based on adversarially trained autoencoders. Its autoencoder architecture makes it capable of learning in an unsupervised way. The use of adversarial training and its architecture allows it to isolate anomalies while providing fast training. We study the properties of our method through experiments on five public datasets, thus demonstrating its robustness, training speed and high anomaly detection performance. Through a feasibility study using Orange's proprietary data we have been able to validate Orange's requirements on scalability, stability, robustness, training speed and high performance.
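
A hedged sketch of the window-based reconstruction-error scoring that underlies this family of methods; USAD's adversarial training of a second decoder is intentionally omitted, and all sizes are illustrative.

```python
# Hedged sketch of window-based anomaly scoring with an autoencoder: flatten a
# sliding window of a multivariate series, reconstruct it, and use the
# reconstruction error as the anomaly score. USAD's adversarial training of a
# second decoder is intentionally omitted; all sizes here are illustrative.
import torch
import torch.nn as nn

def windows(x, w):                        # x: (time, features) -> (n, w * features)
    return torch.stack([x[i:i + w].reshape(-1) for i in range(len(x) - w + 1)])

series = torch.randn(500, 4)              # toy multivariate series (4 sensors)
W = windows(series, w=10)                 # (491, 40)

ae = nn.Sequential(nn.Linear(40, 16), nn.ReLU(), nn.Linear(16, 40))
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
for _ in range(300):                      # train on (assumed) normal data only
    opt.zero_grad()
    loss = ((ae(W) - W) ** 2).mean()
    loss.backward()
    opt.step()

with torch.no_grad():
    scores = ((ae(W) - W) ** 2).mean(dim=1)        # per-window anomaly score
threshold = torch.quantile(scores, 0.99)           # flag the worst-reconstructed windows
print((scores > threshold).nonzero().squeeze(-1)[:10])
```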

283 citations


Proceedings ArticleDOI
14 Jun 2020
TL;DR: Local Implicit Grid Representations (LIGR) as mentioned in this paper is a 3D shape representation designed for scalability and generality, which can be used to reconstruct 3D objects from partial or noisy data.
Abstract: Shape priors learned from data are commonly used to reconstruct 3D objects from partial or noisy data. Yet no such shape priors are available for indoor scenes, since typical 3D autoencoders cannot handle their scale, complexity, or diversity. In this paper, we introduce Local Implicit Grid Representations, a new 3D shape representation designed for scalability and generality. The motivating idea is that most 3D surfaces share geometric details at some scale -- i.e., at a scale smaller than an entire object and larger than a small patch. We train an autoencoder to learn an embedding of local crops of 3D shapes at that size. Then, we use the decoder as a component in a shape optimization that solves for a set of latent codes on a regular grid of overlapping crops such that an interpolation of the decoded local shapes matches a partial or noisy observation. We demonstrate the value of this proposed approach for 3D surface reconstruction from sparse point observations, showing significantly better results than alternative approaches.
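
A hedged sketch of the decoder-as-prior optimization described above: with a frozen implicit decoder, only a latent code is optimized so that decoded values match a sparse partial observation. The decoder here is an untrained stand-in, whereas the paper assumes a decoder trained on local crops of 3D shapes.

```python
# Hedged illustration: freeze a (pretend-)trained implicit decoder and optimize
# a latent code so that decoded values match sparse observed points. The
# decoder, point values and sizes are illustrative stand-ins.
import torch
import torch.nn as nn

decoder = nn.Sequential(nn.Linear(32 + 3, 64), nn.ReLU(), nn.Linear(64, 1))
for p in decoder.parameters():
    p.requires_grad_(False)               # decoder is frozen; only the code moves

obs_points = torch.randn(200, 3)          # sparse observed 3-D points
obs_values = torch.zeros(200, 1)          # e.g. signed distance 0 on the surface

code = torch.zeros(1, 32, requires_grad=True)
opt = torch.optim.Adam([code], lr=1e-2)
for _ in range(500):
    opt.zero_grad()
    inp = torch.cat([code.expand(len(obs_points), -1), obs_points], dim=1)
    loss = ((decoder(inp) - obs_values) ** 2).mean()
    loss.backward()
    opt.step()
print(float(loss))
```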

255 citations


Journal ArticleDOI
TL;DR: The asymmetric and unsupervised FC-SAE can extract optimal non-linear features from environmental factors successfully, outperforms some conventional machine learning methods, and is promising for LSP.
Abstract: The environmental factors of landslide susceptibility are generally uncorrelated or non-linearly correlated, resulting in the limited prediction performances of conventional machine learning methods for landslide susceptibility prediction (LSP). Deep learning methods can exploit low-level features and high-level representations of information from environmental factors. In this paper, a novel deep learning–based algorithm, the fully connected sparse autoencoder (FC-SAE), is proposed for LSP. The FC-SAE consists of four steps: raw feature dropout in input layers, a sparse feature encoder in hidden layers, sparse feature extraction in output layers, and classification and prediction. The Sinan County of Guizhou Province in China, with a total of 23,195 landslide grid cells (306 recorded landslides) and 23,195 randomly selected non-landslide grid cells, was used as the study case. The frequency ratio values of 27 environmental factors were taken as the input variables of the FC-SAE. All 46,390 landslide and non-landslide grid cells were randomly divided into a training dataset (70%) and a test dataset (30%). By analyzing real landslide/non-landslide data, the performances of the FC-SAE and two other conventional machine learning methods, support vector machine (SVM) and back-propagation neural network (BPNN), were compared. The results show that the prediction rate and total accuracy of the FC-SAE are 0.854 and 85.2%, which are higher than those of the SVM (0.827 and 81.56%) and the BPNN (0.819 and 80.86%), respectively. In conclusion, the asymmetric and unsupervised FC-SAE can extract optimal non-linear features from environmental factors successfully, outperforms some conventional machine learning methods, and is promising for LSP.
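
A minimal sketch of sparse-autoencoder feature extraction with a binary landslide/non-landslide head, trained jointly here for brevity (the paper first trains the autoencoder without labels); the 27 inputs match the abstract's factor count, while layer sizes and the sparsity weight are assumptions.

```python
# Hedged sketch of sparse-autoencoder feature extraction with a binary
# landslide / non-landslide head. Trained jointly here for brevity; 27 inputs
# match the abstract, layer sizes and sparsity weight are assumptions.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(27, 16), nn.ReLU())
decoder = nn.Linear(16, 27)
clf = nn.Linear(16, 1)                       # landslide vs. non-landslide logit

x = torch.rand(256, 27)                      # toy frequency-ratio inputs
y = torch.randint(0, 2, (256, 1)).float()    # toy labels

params = [*encoder.parameters(), *decoder.parameters(), *clf.parameters()]
opt = torch.optim.Adam(params, lr=1e-3)
bce = nn.BCEWithLogitsLoss()
for _ in range(300):
    opt.zero_grad()
    h = encoder(x)
    loss = (((decoder(h) - x) ** 2).mean()   # reconstruction of the input layer
            + 1e-3 * h.abs().mean()          # L1 sparsity on the hidden code
            + bce(clf(h), y))                # supervised classification head
    loss.backward()
    opt.step()
print(float(loss))
```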

233 citations


Proceedings ArticleDOI
20 Apr 2020
TL;DR: Structural Deep Clustering Network (SDCN) as discussed by the authors integrates the structural information into deep clustering by designing a delivery operator to transfer the representations learned by autoencoder to the corresponding GCN layer, and a dual self-supervised mechanism to unify these two different deep neural architectures.
Abstract: Clustering is a fundamental task in data analysis. Recently, deep clustering, which derives inspiration primarily from deep learning approaches, achieves state-of-the-art performance and has attracted considerable attention. Current deep clustering methods usually boost the clustering results by means of the powerful representation ability of deep learning, e.g., autoencoder, suggesting that learning an effective representation for clustering is a crucial requirement. The strength of deep clustering methods is to extract the useful representations from the data itself, rather than the structure of data, which receives scarce attention in representation learning. Motivated by the great success of Graph Convolutional Network (GCN) in encoding the graph structure, we propose a Structural Deep Clustering Network (SDCN) to integrate the structural information into deep clustering. Specifically, we design a delivery operator to transfer the representations learned by autoencoder to the corresponding GCN layer, and a dual self-supervised mechanism to unify these two different deep neural architectures and guide the update of the whole model. In this way, the multiple structures of data, from low-order to high-order, are naturally combined with the multiple representations learned by autoencoder. Furthermore, we theoretically analyze the delivery operator, i.e., with the delivery operator, GCN improves the autoencoder-specific representation as a high-order graph regularization constraint and autoencoder helps alleviate the over-smoothing problem in GCN. Through comprehensive experiments, we demonstrate that our proposed model can consistently perform better than the state-of-the-art techniques.
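
A hedged sketch of the delivery-operator idea: mix the autoencoder's layer representation into the GCN's representation before the next propagation step. The mixing weight and sizes are assumptions, and SDCN's dual self-supervision is not reproduced.

```python
# Hedged sketch of the "delivery operator": at each depth, the input to the next
# GCN propagation is a mixture of the GCN representation and the autoencoder
# representation at the same depth. Mixing weight (0.5) and sizes are assumed.
import torch
import torch.nn as nn

n, d, h = 6, 8, 4
A = torch.eye(n) + (torch.rand(n, n) > 0.7).float()      # toy adjacency + self-loops
A = (A + A.T).clamp(max=1)
deg_inv_sqrt = A.sum(1).pow(-0.5)
A_hat = deg_inv_sqrt.unsqueeze(1) * A * deg_inv_sqrt.unsqueeze(0)   # sym. normalization

X = torch.rand(n, d)
ae_layer = nn.Linear(d, h)                 # one autoencoder encoder layer
gcn_weight = nn.Linear(d, h, bias=False)   # one GCN layer

H = torch.relu(ae_layer(X))                # autoencoder representation
Z = torch.relu(A_hat @ gcn_weight(X))      # plain GCN representation
eps = 0.5
Z_next_input = (1 - eps) * Z + eps * H     # delivery: inject AE features into the GCN
print(Z_next_input.shape)                  # torch.Size([6, 4]); fed to the next GCN layer
```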

230 citations


Proceedings ArticleDOI
14 Jun 2020
Abstract: Autoencoder networks are unsupervised approaches aiming at combining generative and representational properties by learning simultaneously an encoder-generator map. Although studied extensively, the issues of whether they have the same generative power as GANs, or learn disentangled representations, have not been fully addressed. We introduce an autoencoder that tackles these issues jointly, which we call Adversarial Latent Autoencoder (ALAE). It is a general architecture that can leverage recent improvements on GAN training procedures. We designed two autoencoders: one based on an MLP encoder, and another based on a StyleGAN generator, which we call StyleALAE. We verify the disentanglement properties of both architectures. We show that StyleALAE can not only generate 1024x1024 face images with quality comparable to StyleGAN, but at the same resolution can also produce face reconstructions and manipulations based on real images. This makes ALAE the first autoencoder able to compare with, and go beyond the capabilities of, a generator-only type of architecture.

223 citations


Journal ArticleDOI
TL;DR: In this paper, the adversarial training principle is applied to enforce the latent codes to match a prior Gaussian or uniform distribution, which can be used to learn the graph embedding effectively.
Abstract: Graph embedding aims to transform a graph into vectors to facilitate subsequent graph-analytics tasks like link prediction and graph clustering. Most approaches on graph embedding focus on preserving the graph structure or minimizing the reconstruction errors for graph data. They have mostly overlooked the embedding distribution of the latent codes, which unfortunately may lead to inferior representation in many cases. In this article, we present a novel adversarially regularized framework for graph embedding. By employing the graph convolutional network as an encoder, our framework embeds the topological information and node content into a vector representation, from which a graph decoder is further built to reconstruct the input graph. The adversarial training principle is applied to enforce our latent codes to match a prior Gaussian or uniform distribution. Based on this framework, we derive two variants of the adversarial models, the adversarially regularized graph autoencoder (ARGA) and its variational version, the adversarially regularized variational graph autoencoder (ARVGA), to learn the graph embedding effectively. We also exploit other potential variations of ARGA and ARVGA to get a deeper understanding of our designs. Experimental results that compared 12 algorithms for link prediction and 20 algorithms for graph clustering validate our solutions.
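
A minimal sketch of the adversarial regularization described above: an encoder produces node embeddings, an inner-product decoder reconstructs the adjacency matrix, and a discriminator pushes the embeddings toward a Gaussian prior. The single linear encoder stands in for the paper's GCN encoder.

```python
# Minimal sketch of adversarially regularized graph embedding. A single linear
# encoder stands in for the GCN encoder; sizes and loss weights are assumptions.
import torch
import torch.nn as nn

n, d, k = 10, 16, 4
A = (torch.rand(n, n) > 0.7).float()
A = ((A + A.T) > 0).float()                      # toy symmetric adjacency
X = torch.rand(n, d)

enc = nn.Linear(d, k)
disc = nn.Sequential(nn.Linear(k, 16), nn.ReLU(), nn.Linear(16, 1))
bce = nn.BCEWithLogitsLoss()
opt_e = torch.optim.Adam(enc.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-3)

for _ in range(200):
    Z = enc(X)
    # Discriminator step: "real" = prior samples, "fake" = current embeddings.
    opt_d.zero_grad()
    d_loss = (bce(disc(torch.randn(n, k)), torch.ones(n, 1))
              + bce(disc(Z.detach()), torch.zeros(n, 1)))
    d_loss.backward()
    opt_d.step()
    # Encoder step: reconstruct the graph and fool the discriminator.
    opt_e.zero_grad()
    g_loss = bce(Z @ Z.T, A) + 0.1 * bce(disc(Z), torch.ones(n, 1))
    g_loss.backward()
    opt_e.step()
print(float(g_loss))
```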

Journal ArticleDOI
TL;DR: A potentially powerful new method of searching for new physics at the LHC, using autoencoders and unsupervised deep learning, which opens up the exciting possibility of training directly on actual data to discover new physics with no prior expectations or theory prejudice.
Abstract: We introduce a potentially powerful new method of searching for new physics at the LHC, using autoencoders and unsupervised deep learning. The key idea of the autoencoder is that it learns to map "normal" events back to themselves, but fails to reconstruct "anomalous" events that it has never encountered before. The reconstruction error can then be used as an anomaly threshold. We demonstrate the effectiveness of this idea using QCD jets as background and boosted top jets and R-parity violating (RPV) gluino jets as signal. We show that a deep autoencoder can significantly improve signal over background when trained on backgrounds only, or even directly on data which contain a small admixture of signal. Finally, we examine the correlation of the autoencoders with jet mass and show how the jet mass distribution can be stable against cuts in reconstruction loss. This may be important for estimating QCD backgrounds from data. As a test case, we show how one could plausibly discover 400 GeV RPV gluinos using an autoencoder combined with a bump hunt in jet mass. This opens up the exciting possibility of training directly on actual data to discover new physics with no prior expectations or theory prejudice.
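
A minimal sketch of the core recipe: train an autoencoder on background-like events only and flag events whose reconstruction error exceeds a threshold chosen from the background distribution. The 40-dimensional event features and the 99th-percentile cut are illustrative, not the paper's setup.

```python
# Minimal sketch: autoencoder trained on background-like events; events with a
# reconstruction error above a background-derived threshold are tagged as
# anomalous. Feature dimension and percentile cut are illustrative.
import torch
import torch.nn as nn

background = torch.randn(2000, 40)                 # stand-in for QCD-jet features
ae = nn.Sequential(nn.Linear(40, 8), nn.ReLU(), nn.Linear(8, 40))
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
for _ in range(300):
    opt.zero_grad()
    loss = ((ae(background) - background) ** 2).mean()
    loss.backward()
    opt.step()

with torch.no_grad():
    bg_err = ((ae(background) - background) ** 2).mean(dim=1)
    cut = torch.quantile(bg_err, 0.99)             # anomaly threshold from background
    candidate = torch.randn(100, 40) * 3.0         # off-distribution, "signal-like" events
    cand_err = ((ae(candidate) - candidate) ** 2).mean(dim=1)
print(int((cand_err > cut).sum()), "of 100 events tagged as anomalous")
```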

Proceedings Article
30 Apr 2020
TL;DR: It is shown, in a rigorous empirical study, that the proposed regularized deterministic autoencoders are able to generate samples that are comparable to, or better than, those of VAEs and more powerful alternatives when applied to images as well as to structured data such as molecules.
Abstract: Variational Autoencoders (VAEs) provide a theoretically-backed and popular framework for deep generative models. However, learning a VAE from data poses still unanswered theoretical questions and considerable practical challenges. In this work, we propose an alternative framework for generative modeling that is simpler, easier to train, and deterministic, yet has many of the advantages of the VAE. We observe that sampling a stochastic encoder in a Gaussian VAE can be interpreted as simply injecting noise into the input of a deterministic decoder. We investigate how substituting this kind of stochasticity, with other explicit and implicit regularization schemes, can lead to an equally smooth and meaningful latent space without having to force it to conform to an arbitrarily chosen prior. To retrieve a generative mechanism to sample new data points, we introduce an ex-post density estimation step that can be readily applied to the proposed framework as well as existing VAEs, improving their sample quality. We show, in a rigorous empirical study, that the proposed regularized deterministic autoencoders are able to generate samples that are comparable to, or better than, those of VAEs and more powerful alternatives when applied to images as well as to structured data such as molecules.
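
A minimal sketch of the ex-post density estimation step: train a plain deterministic autoencoder, fit a small Gaussian mixture to its latent codes, and sample from the mixture to generate new data. The toy 2-D data and mixture size are illustrative.

```python
# Minimal sketch of ex-post density estimation for a deterministic autoencoder:
# fit a Gaussian mixture to the latent codes and decode samples from it. Toy
# data, layer sizes and the number of mixture components are illustrative.
import torch
import torch.nn as nn
from sklearn.mixture import GaussianMixture

data = torch.randn(1000, 2) @ torch.tensor([[2.0, 0.3], [0.0, 0.5]])   # toy dataset
enc = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 2))
dec = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 2))
opt = torch.optim.Adam([*enc.parameters(), *dec.parameters()], lr=1e-3)
for _ in range(500):
    opt.zero_grad()
    loss = ((dec(enc(data)) - data) ** 2).mean()
    loss.backward()
    opt.step()

with torch.no_grad():
    codes = enc(data).numpy()
gmm = GaussianMixture(n_components=5).fit(codes)        # ex-post density over latents
z_new, _ = gmm.sample(10)
with torch.no_grad():
    samples = dec(torch.tensor(z_new, dtype=torch.float32))
print(samples.shape)                                    # torch.Size([10, 2])
```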

Proceedings ArticleDOI
14 Jun 2020
TL;DR: Experimental results show that the proposed SelfDeblur can achieve notable quantitative gains as well as more visually plausible deblurring results in comparison to state-of-the-art blind deconvolution methods on benchmark datasets and real-world blurry images.
Abstract: Blind deconvolution is a classical yet challenging low-level vision problem with many real-world applications. Traditional maximum a posterior (MAP) based methods rely heavily on fixed and handcrafted priors that certainly are insufficient in characterizing clean images and blur kernels, and usually adopt specially designed alternating minimization to avoid trivial solution. In contrast, existing deep motion deblurring networks learn from massive training images the mapping to clean image or blur kernel, but are limited in handling various complex and large size blur kernels. To connect MAP and deep models, we in this paper present two generative networks for respectively modeling the deep priors of clean image and blur kernel, and propose an unconstrained neural optimization solution to blind deconvolution. In particular, we adopt an asymmetric Autoencoder with skip connections for generating latent clean image, and a fully-connected network (FCN) for generating blur kernel. Moreover, the SoftMax nonlinearity is applied to the output layer of FCN to meet the non-negative and equality constraints. The process of neural optimization can be explained as a kind of ''zero-shot" self-supervised learning of the generative networks, and thus our proposed method is dubbed SelfDeblur. Experimental results show that our SelfDeblur can achieve notable quantitative gains as well as more visually plausible deblurring results in comparison to state-of-the-art blind deconvolution methods on benchmark datasets and real-world blurry images. The source code is publicly available at https://github.com/csdwren/SelfDeblur

Journal ArticleDOI
TL;DR: A novel deep learning network is proposed for quality-relevant feature representation in this article, based on stacked quality-driven autoencoder (SQAE), which is validated on an industrial debutanizer column process.
Abstract: Deep learning is a recently developed feature representation technique for data with complicated structures, which has great potential for soft sensing of industrial processes. However, most deep networks mainly focus on hierarchical feature learning for the raw observed input data. For soft sensor applications, it is important to reduce irrelevant information and extract quality-relevant features from the raw input data for quality prediction. To deal with this problem, a novel deep learning network is proposed for quality-relevant feature representation in this article, which is based on stacked quality-driven autoencoder (SQAE). First, a quality-driven autoencoder (QAE) is designed by exploiting the quality data to guide feature extraction with the constraint that the potential features should largely reconstruct the input layer data and the quality data at the output layer. In this way, quality-relevant features can be captured by QAE. Then, by stacking multiple QAEs to construct the deep SQAE network, SQAE can gradually reduce irrelevant features and learn hierarchical quality-relevant features. Finally, the high-level quality-relevant features can be directly applied for soft sensing of the quality variables. The effectiveness and flexibility of the proposed deep learning model are validated on an industrial debutanizer column process.
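
A minimal sketch of a single quality-driven autoencoder layer, whose hidden code must reconstruct both the input variables and the quality variable; stacking several such layers into the full SQAE is omitted, and sizes and the loss weight are assumptions.

```python
# Minimal sketch of one quality-driven autoencoder layer: the hidden code is
# trained to reconstruct both the input data and the quality data. Stacking
# into SQAE is omitted; sizes, toy data and the loss weight are assumptions.
import torch
import torch.nn as nn

x = torch.rand(512, 20)                 # process measurements
y = x[:, :3].mean(dim=1, keepdim=True)  # toy quality variable derived from x

enc = nn.Sequential(nn.Linear(20, 8), nn.ReLU())
dec_x = nn.Linear(8, 20)                # reconstructs the input layer
dec_y = nn.Linear(8, 1)                 # reconstructs the quality data

opt = torch.optim.Adam([*enc.parameters(), *dec_x.parameters(), *dec_y.parameters()], lr=1e-3)
for _ in range(400):
    opt.zero_grad()
    h = enc(x)
    loss = ((dec_x(h) - x) ** 2).mean() + 1.0 * ((dec_y(h) - y) ** 2).mean()
    loss.backward()
    opt.step()

soft_sensor = nn.Linear(8, 1)           # quality prediction from the learned features
print(soft_sensor(enc(x)).shape)        # torch.Size([512, 1])
```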

Journal ArticleDOI
TL;DR: A novel 3D self-attention convolutional neural network for the LDCT denoising problem and a self-supervised learning scheme to train a domain-specific autoencoder as the perceptual loss function are proposed.
Abstract: Computed tomography (CT) is a widely used screening and diagnostic tool that allows clinicians to obtain a high-resolution, volumetric image of internal structures in a non-invasive manner. Increasingly, efforts have been made to improve the image quality of low-dose CT (LDCT) to reduce the cumulative radiation exposure of patients undergoing routine screening exams. The resurgence of deep learning has yielded a new approach for noise reduction by training a deep multi-layer convolutional neural network (CNN) to map the low-dose to normal-dose CT images. However, CNN-based methods heavily rely on convolutional kernels, which use fixed-size filters to process one local neighborhood within the receptive field at a time. As a result, they are not efficient at retrieving structural information across large regions. In this paper, we propose a novel 3D self-attention convolutional neural network for the LDCT denoising problem. Our 3D self-attention module leverages the 3D volume of CT images to capture a wide range of spatial information both within CT slices and between CT slices. With the help of the 3D self-attention module, CNNs are able to leverage pixels with stronger relationships regardless of their distance and achieve better denoising results. In addition, we propose a self-supervised learning scheme to train a domain-specific autoencoder as the perceptual loss function. We combine these two methods and demonstrate their effectiveness on both CNN-based neural networks and WGAN-based neural networks with comprehensive experiments. Tested on the AAPM-Mayo Clinic Low Dose CT Grand Challenge data set, our experiments demonstrate that the self-attention (SA) module and autoencoder (AE) perceptual loss function can efficiently enhance traditional CNNs and can achieve comparable or better results than the state-of-the-art methods.
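
A minimal sketch of an autoencoder-based perceptual loss: the denoised and reference images are compared in the feature space of a (separately trained) encoder rather than in pixel space. The encoder here is untrained and 2-D for brevity, whereas the paper uses a domain-specific autoencoder on 3-D CT volumes.

```python
# Minimal sketch of an autoencoder-based perceptual loss. The encoder below is
# an untrained 2-D stand-in for a pre-trained, domain-specific AE encoder.
import torch
import torch.nn as nn

encoder = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
)
for p in encoder.parameters():
    p.requires_grad_(False)                              # perceptual loss is fixed

def perceptual_loss(denoised, reference):
    return ((encoder(denoised) - encoder(reference)) ** 2).mean()

denoised = torch.rand(2, 1, 64, 64, requires_grad=True)  # stand-in denoiser output
reference = torch.rand(2, 1, 64, 64)                     # normal-dose reference
loss = perceptual_loss(denoised, reference)
loss.backward()                                          # gradients flow to the denoiser
print(float(loss))
```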

Journal ArticleDOI
TL;DR: A taxonomy of deep learning models in intrusion detection is introduced, and desirable evaluation metrics, namely accuracy, F1-score, and training and inference time, are suggested across all four datasets.

Journal ArticleDOI
TL;DR: A novel recurrent BLS is proposed, with a sparse autoencoder used to extract the features from the input instead of the randomly initialized weights, motivated by the idea of “fine-tuning” in deep learning.
Abstract: The broad learning system (BLS) is an emerging approach for effective and efficient modeling of complex systems. The inputs are transferred and placed in the feature nodes, and then sent into the enhancement nodes for nonlinear transformation. The structure of a BLS can be extended in a wide sense. Incremental learning algorithms are designed for fast learning in broad expansion. Based on the typical BLSs, a novel recurrent BLS (RBLS) is proposed in this paper. The nodes in the enhancement units of the BLS are recurrently connected, for the purpose of capturing the dynamic characteristics of a time series. A sparse autoencoder is used to extract the features from the input instead of the randomly initialized weights. In this way, the RBLS retains the merit of fast computing and fits for processing sequential data. Motivated by the idea of “fine-tuning” in deep learning, the weights in the RBLS can be updated by conjugate gradient methods if the prediction errors are large. We exhibit the merits of our proposed model on several chaotic time series. Experimental results substantiate the effectiveness of the RBLS. For chaotic benchmark datasets, the RBLS achieves very small errors, and for the real-world dataset, the performance is satisfactory.

Journal ArticleDOI
14 Apr 2020
TL;DR: To the best of the knowledge, this is the first practical JSCC scheme that can fully exploit channel output feedback, demonstrating yet another setting in which modern machine learning techniques can enable the design of new and efficient communication methods that surpass the performance of traditional structured coding-based designs.
Abstract: We consider wireless transmission of images in the presence of channel output feedback. From a Shannon theoretic perspective feedback does not improve the asymptotic end-to-end performance, and separate source coding followed by capacity-achieving channel coding, which ignores the feedback signal, achieves the optimal performance. It is well known that separation is not optimal in the practical finite blocklength regime; however, there are no known practical joint source-channel coding (JSCC) schemes that can exploit the feedback signal and surpass the performance of separation-based schemes. Inspired by the recent success of deep learning methods for JSCC, we investigate how noiseless or noisy channel output feedback can be incorporated into the transmission system to improve the reconstruction quality at the receiver. We introduce an autoencoder-based JSCC scheme, which we call DeepJSCC- $f$ , that exploits the channel output feedback, and provides considerable improvements in terms of the end-to-end reconstruction quality for fixed-length transmission, or in terms of the average delay for variable-length transmission. To the best of our knowledge, this is the first practical JSCC scheme that can fully exploit channel output feedback, demonstrating yet another setting in which modern machine learning techniques can enable the design of new and efficient communication methods that surpass the performance of traditional structured coding-based designs.
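
A hedged sketch of autoencoder-based joint source-channel coding over an AWGN channel; the feedback path that defines DeepJSCC-f is omitted, and the image size, code length, and noise level are illustrative.

```python
# Hedged sketch of autoencoder-based joint source-channel coding: encoder maps
# an image to channel symbols, Gaussian noise models the channel, decoder
# reconstructs. The feedback mechanism of DeepJSCC-f is omitted; sizes assumed.
import torch
import torch.nn as nn

enc = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64))    # 64 channel symbols
dec = nn.Sequential(nn.Linear(64, 28 * 28), nn.Sigmoid())

imgs = torch.rand(32, 1, 28, 28)                             # toy image batch
opt = torch.optim.Adam([*enc.parameters(), *dec.parameters()], lr=1e-3)
for _ in range(300):
    opt.zero_grad()
    z = enc(imgs)
    z = z / z.norm(dim=1, keepdim=True) * (64 ** 0.5)        # average-power constraint
    y = z + 0.1 * torch.randn_like(z)                        # AWGN channel
    recon = dec(y).view_as(imgs)
    loss = ((recon - imgs) ** 2).mean()
    loss.backward()
    opt.step()
print(float(loss))
```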

Journal ArticleDOI
TL;DR: The proposed algorithm, based on a deep-denoising autoencoder (DDAE), attenuates random noise in an effective manner and is compared with several benchmark algorithms.
Abstract: Attenuation of seismic random noise is considered an important processing step to enhance the signal-to-noise ratio of seismic data. A new approach is proposed to attenuate random noise based on a deep-denoising autoencoder (DDAE).
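
A minimal sketch of denoising-autoencoder training for random-noise attenuation: the network takes noisy traces as input and clean traces as the target. The synthetic traces and small 1-D architecture are illustrative, not the paper's DDAE configuration.

```python
# Hedged sketch of denoising-autoencoder training: noisy traces in, clean
# traces as the target. Synthetic data and 1-D architecture are illustrative.
import math
import torch
import torch.nn as nn

t = torch.linspace(0, 1, 256)
clean = torch.stack([torch.sin(2 * math.pi * f * t) for f in torch.linspace(5, 15, 64)])
noisy = clean + 0.3 * torch.randn_like(clean)           # traces with random noise

ddae = nn.Sequential(
    nn.Conv1d(1, 16, 9, padding=4), nn.ReLU(),
    nn.Conv1d(16, 16, 9, padding=4), nn.ReLU(),
    nn.Conv1d(16, 1, 9, padding=4),
)
opt = torch.optim.Adam(ddae.parameters(), lr=1e-3)
for _ in range(300):
    opt.zero_grad()
    loss = ((ddae(noisy.unsqueeze(1)) - clean.unsqueeze(1)) ** 2).mean()
    loss.backward()
    opt.step()

denoised = ddae(noisy.unsqueeze(1)).squeeze(1)          # apply to noisy traces
print(float(((denoised - clean) ** 2).mean()))
```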

Book ChapterDOI
23 Aug 2020
TL;DR: Convolutional Adversarial Variational autoencoder with Guided Attention (CAVGA) is proposed, which localizes the anomaly with a convolutional latent variable to preserve the spatial information.
Abstract: Anomaly localization is an important problem in computer vision which involves localizing anomalous regions within images with applications in industrial inspection, surveillance, and medical imaging. This task is challenging due to the small sample size and pixel coverage of the anomaly in real-world scenarios. Most prior works need to use anomalous training images to compute a class-specific threshold to localize anomalies. Without the need of anomalous training images, we propose Convolutional Adversarial Variational autoencoder with Guided Attention (CAVGA), which localizes the anomaly with a convolutional latent variable to preserve the spatial information. In the unsupervised setting, we propose an attention expansion loss where we encourage CAVGA to focus on all normal regions in the image. Furthermore, in the weakly-supervised setting we propose a complementary guided attention loss, where we encourage the attention map to focus on all normal regions while minimizing the attention map corresponding to anomalous regions in the image. CAVGA outperforms the state-of-the-art (SOTA) anomaly localization methods on MVTec Anomaly Detection (MVTAD), modified ShanghaiTech Campus (mSTC) and Large-scale Attention based Glaucoma (LAG) datasets in the unsupervised setting and when using only 2% anomalous images in the weakly-supervised setting. CAVGA also outperforms SOTA anomaly detection methods on the MNIST, CIFAR-10, Fashion-MNIST, MVTAD, mSTC and LAG datasets.

Posted Content
TL;DR: The Swapping Autoencoder is proposed, a deep model designed specifically for image manipulation, rather than random sampling, that can be used to manipulate real input images in various ways, including texture swapping, local and global editing, and latent code vector arithmetic.
Abstract: Deep generative models have become increasingly effective at producing realistic images from randomly sampled seeds, but using such models for controllable manipulation of existing images remains challenging. We propose the Swapping Autoencoder, a deep model designed specifically for image manipulation, rather than random sampling. The key idea is to encode an image with two independent components and enforce that any swapped combination maps to a realistic image. In particular, we encourage the components to represent structure and texture, by enforcing one component to encode co-occurrent patch statistics across different parts of an image. As our method is trained with an encoder, finding the latent codes for a new input image becomes trivial, rather than cumbersome. As a result, it can be used to manipulate real input images in various ways, including texture swapping, local and global editing, and latent code vector arithmetic. Experiments on multiple datasets show that our model produces better results and is substantially more efficient compared to recent generative models.

Journal ArticleDOI
TL;DR: The integrated model of the convolutional neural network (CNN) and recurrent autoencoder is proposed for anomaly detection and empirical results show that the proposed model has better performances on multiple classification metrics and achieves preferable effect on anomaly detection.
Abstract: Internet of Things (IoT) realizes the interconnection of heterogeneous devices by the technology of wireless and mobile communication. The data of target regions are collected by widely distributed sensing devices and transmitted to the processing center for aggregation and analysis as the basis of IoT. The quality of IoT services usually depends on the accuracy and integrity of data. However, due to the adverse environment or device defects, the collected data will be anomalous. Therefore, an effective method of anomaly detection is the crucial issue for guaranteeing service quality. Deep learning is one of the most prominent technologies of recent years, realizing automatic feature extraction from raw data. In this article, the integrated model of the convolutional neural network (CNN) and recurrent autoencoder is proposed for anomaly detection. A simple combination of CNN and autoencoder cannot improve classification performance, especially for time series. Therefore, we utilize the two-stage sliding window in data preprocessing to learn better representations. Based on the characteristics of the Yahoo Webscope S5 dataset, raw time series with anomalous points are extended to fixed-length sequences with normal or anomaly label via the first-stage sliding window. Then, each sequence is transformed into continuous time-dependent subsequences by another smaller sliding window. The preprocessing of the two-stage sliding window can be considered as low-level temporal feature extraction, and we empirically prove that the preprocessing of the two-stage sliding window will be useful for high-level feature extraction in the integrated model. After data preprocessing, spatial and temporal features are extracted in CNN and recurrent autoencoder for the classification in fully connected networks. Empirical results show that the proposed model has better performances on multiple classification metrics and achieves preferable effect on anomaly detection.
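
A minimal sketch of the two-stage sliding-window preprocessing described above: a first window cuts the raw series into fixed-length labeled sequences, and a second, smaller window splits each sequence into overlapping time-dependent subsequences. Window lengths and the toy anomaly positions are assumptions.

```python
# Hedged sketch of two-stage sliding-window preprocessing; window lengths,
# the synthetic series and anomaly positions are illustrative.
import numpy as np

series = np.random.rand(1000)                      # raw univariate time series
anomaly_idx = {120, 485, 870}                      # toy anomalous time steps

def first_stage(x, length=50, step=50):
    seqs, labels = [], []
    for start in range(0, len(x) - length + 1, step):
        seqs.append(x[start:start + length])
        labels.append(int(any(start <= i < start + length for i in anomaly_idx)))
    return np.stack(seqs), np.array(labels)

def second_stage(seqs, sub_len=10):
    # (n, length) -> (n, length - sub_len + 1, sub_len) overlapping subsequences
    return np.stack([[s[i:i + sub_len] for i in range(len(s) - sub_len + 1)] for s in seqs])

seqs, labels = first_stage(series)
subs = second_stage(seqs)
print(seqs.shape, labels.shape, subs.shape)        # (20, 50) (20,) (20, 41, 10)
```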

Journal ArticleDOI
TL;DR: A cervical cancer cell detection and classification system based on convolutional neural networks (CNNs) and an extreme learning machine (ELM)-based classifier that achieved 99.5% accuracy in the detection problem and 91.2% in the classification problem.

Journal ArticleDOI
TL;DR: This paper is the first SLR specifically on the deep learning based RS to summarize and analyze the existing studies based on the best quality research publications and indicated that autoencoder models are the most widely exploited deep learning architectures for RS followed by the Convolutional Neural Networks and the Recurrent Neural Networks.
Abstract: These days, many recommender systems (RS) are utilized for solving the information overload problem in areas such as e-commerce, entertainment, and social media. Although classical methods of RS have achieved remarkable successes in providing item recommendations, they still suffer from many issues such as cold start and data sparsity. With the recent achievements of deep learning in various applications such as Natural Language Processing (NLP) and image processing, more efforts have been made by the researchers to exploit deep learning methods for improving the performance of RS. However, despite the several research works on deep learning based RS, very few secondary studies were conducted in the field. Therefore, this study aims to provide a systematic literature review (SLR) of deep learning based RSs that can guide researchers and practitioners to better understand the new trends and challenges in the field. This paper is the first SLR specifically on the deep learning based RS to summarize and analyze the existing studies based on the best quality research publications. The paper particularly adopts an SLR approach based on the standard guidelines of the SLR designed by Kitchenham, which uses a selection method and provides detailed analysis of the research publications. Several publications were gathered and, after the inclusion/exclusion criteria and the quality assessment, the selected papers were finally used for the review. The results of the review indicated that autoencoder (AE) models are the most widely exploited deep learning architectures for RS, followed by the Convolutional Neural Networks (CNNs) and the Recurrent Neural Networks (RNNs) models. Also, the results showed that MovieLens is the most popularly used dataset for the deep learning-based RS evaluation, followed by the Amazon review datasets. Based on the results, movies and e-commerce are indicated as the most common domains for RS, and precision and Root Mean Squared Error are the most commonly used metrics for evaluating the performance of deep learning based RSs.

Journal ArticleDOI
TL;DR: Automatic Chemical Design is a framework for generating novel molecules with optimized properties that can be applied to solve the challenge of designing complex molecules with novel properties.
Abstract: Automatic Chemical Design is a framework for generating novel molecules with optimized properties. The original scheme, featuring Bayesian optimization over the latent space of a variational autoencoder, suffers from the pathology that it tends to produce invalid molecular structures. First, we demonstrate empirically that this pathology arises when the Bayesian optimization scheme queries latent space points far away from the data on which the variational autoencoder has been trained. Secondly, by reformulating the search procedure as a constrained Bayesian optimization problem, we show that the effects of this pathology can be mitigated, yielding marked improvements in the validity of the generated molecules. We posit that constrained Bayesian optimization is a good approach for solving this kind of training set mismatch in many generative tasks involving Bayesian optimization over the latent space of a variational autoencoder.

Proceedings ArticleDOI
14 Jun 2020
TL;DR: In this paper, an autoencoder is used to learn 3D deformable object categories from raw single-view images, without external supervision, using the fact that many object categories have, at least in principle, a symmetric structure.
Abstract: We propose a method to learn 3D deformable object categories from raw single-view images, without external supervision. The method is based on an autoencoder that factors each input image into depth, albedo, viewpoint and illumination. In order to disentangle these components without supervision, we use the fact that many object categories have, at least in principle, a symmetric structure. We show that reasoning about illumination allows us to exploit the underlying object symmetry even if the appearance is not symmetric due to shading. Furthermore, we model objects that are probably, but not certainly, symmetric by predicting a symmetry probability map, learned end-to-end with the other components of the model. Our experiments show that this method can recover very accurately the 3D shape of human faces, cat faces and cars from single-view images, without any supervision or a prior shape model. On benchmarks, we demonstrate superior accuracy compared to another method that uses supervision at the level of 2D image correspondences.

Posted Content
Bingyi Cao1, Andre Araujo1, Jack Sim1
TL;DR: This work unify global and local features into a single deep model, enabling accurate retrieval with efficient feature extraction, and introduces an autoencoder-based dimensionality reduction technique for local features, which is integrated into the model, improving training efficiency and matching performance.
Abstract: Image retrieval is the problem of searching an image database for items that are similar to a query image. To address this task, two main types of image representations have been studied: global and local image features. In this work, our key contribution is to unify global and local features into a single deep model, enabling accurate retrieval with efficient feature extraction. We refer to the new model as DELG, standing for DEep Local and Global features. We leverage lessons from recent feature learning work and propose a model that combines generalized mean pooling for global features and attentive selection for local features. The entire network can be learned end-to-end by carefully balancing the gradient flow between two heads -- requiring only image-level labels. We also introduce an autoencoder-based dimensionality reduction technique for local features, which is integrated into the model, improving training efficiency and matching performance. Comprehensive experiments show that our model achieves state-of-the-art image retrieval on the Revisited Oxford and Paris datasets, and state-of-the-art single-model instance-level recognition on the Google Landmarks dataset v2. Code and models are available at this https URL .
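
A minimal sketch of autoencoder-based dimensionality reduction for local descriptors: a small bottleneck autoencoder maps high-dimensional local features to compact codes for matching. The 1024-to-128 sizes are illustrative; in DELG this module is trained jointly with the full retrieval model.

```python
# Minimal sketch of autoencoder-based dimensionality reduction for local
# descriptors; the 1024 -> 128 sizes and standalone training are assumptions.
import torch
import torch.nn as nn

local_feats = torch.randn(5000, 1024)            # stand-in local descriptors
enc = nn.Linear(1024, 128)
dec = nn.Linear(128, 1024)
opt = torch.optim.Adam([*enc.parameters(), *dec.parameters()], lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss = ((dec(enc(local_feats)) - local_feats) ** 2).mean()
    loss.backward()
    opt.step()

with torch.no_grad():
    compact = nn.functional.normalize(enc(local_feats), dim=1)   # 128-D codes for matching
print(compact.shape)                                             # torch.Size([5000, 128])
```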

Journal ArticleDOI
TL;DR: This work uses deep reinforcement learning to learn controllers that achieve goal-directed movements in data-driven generative models of human movement using autoregressive conditional variational autoencoders, or Motion VAEs.
Abstract: A fundamental problem in computer animation is that of realizing purposeful and realistic human movement given a sufficiently-rich set of motion capture clips. We learn data-driven generative models of human movement using autoregressive conditional variational autoencoders, or Motion VAEs. The latent variables of the learned autoencoder define the action space for the movement and thereby govern its evolution over time. Planning or control algorithms can then use this action space to generate desired motions. In particular, we use deep reinforcement learning to learn controllers that achieve goal-directed movements. We demonstrate the effectiveness of the approach on multiple tasks. We further evaluate system-design choices and describe the current limitations of Motion VAEs.