
Showing papers on "Standard test image published in 2019"


Proceedings ArticleDOI
20 Jun 2019
TL;DR: This paper utilizes GANs to train a very powerful generator of facial texture in UV space and revisits the original 3D Morphable Models (3DMMs) fitting approaches making use of non-linear optimization to find the optimal latent parameters that best reconstruct the test image but under a new perspective.
Abstract: In the past few years, a lot of work has been done towards reconstructing the 3D facial structure from single images by capitalizing on the power of Deep Convolutional Neural Networks (DCNNs). In the most recent works, differentiable renderers were employed in order to learn the relationship between the facial identity features and the parameters of a 3D morphable model for shape and texture. The texture features either correspond to components of a linear texture space or are learned by auto-encoders directly from in-the-wild images. In all cases, the quality of the facial texture reconstruction of the state-of-the-art methods is still not capable of modeling textures in high fidelity. In this paper, we take a radically different approach and harness the power of Generative Adversarial Networks (GANs) and DCNNs in order to reconstruct the facial texture and shape from single images. That is, we utilize GANs to train a very powerful generator of facial texture in UV space. Then, we revisit the original 3D Morphable Models (3DMMs) fitting approaches making use of non-linear optimization to find the optimal latent parameters that best reconstruct the test image but under a new perspective. We optimize the parameters with the supervision of pretrained deep identity features through our end-to-end differentiable framework. We demonstrate excellent results in photorealistic and identity preserving 3D face reconstructions and achieve for the first time, to the best of our knowledge, facial texture reconstruction with high-frequency details.

291 citations


Journal ArticleDOI
16 May 2019
TL;DR: An approach is proposed that combines automatic features learned by convolutional neural networks (CNN) and handcrafted features computed by the bag-of-visual-words (BOVW) model in order to achieve state-of-the-art results in facial expression recognition (FER).
Abstract: We present an approach that combines automatic features learned by convolutional neural networks (CNN) and handcrafted features computed by the bag-of-visual-words (BOVW) model in order to achieve the state-of-the-art results in facial expression recognition (FER). To obtain automatic features, we experiment with multiple CNN architectures, pre-trained models, and training procedures, e.g., Dense–Sparse–Dense. After fusing the two types of features, we employ a local learning framework to predict the class label for each test image. The local learning framework is based on three steps. First, a k-nearest neighbors model is applied in order to select the nearest training samples for an input test image. Second, a one-versus-all support vector machines (SVM) classifier is trained on the selected training samples. Finally, the SVM classifier is used to predict the class label only for the test image it was trained for. Although we have used local learning in combination with handcrafted features in our previous work, to the best of our knowledge, local learning has never been employed in combination with deep features. The experiments on the 2013 FER Challenge data set, the FER+ data set, and the AffectNet data set demonstrate that our approach achieves the state-of-the-art results. With a top accuracy of 75.42% on the FER 2013, 87.76% on the FER+, 59.58% on the AffectNet eight-way classification, and 63.31% on the AffectNet seven-way classification, we surpass the state-of-the-art methods by more than 1% on all data sets.
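
As a rough illustration of the local learning step described above (select the k nearest training samples for each test image, fit a one-versus-all SVM on just that subset, and use it only for that test image), the following scikit-learn sketch shows the idea; the feature dimensionality, value of k, and SVM settings are illustrative assumptions, not the authors' configuration.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.svm import LinearSVC

def local_learning_predict(train_feats, train_labels, test_feats, k=200):
    """Predict a label for each test sample with a test-specific SVM.

    For every test feature vector, the k nearest training samples are
    selected and a one-vs-all linear SVM is fitted on that local subset
    only; the SVM is then used for this single test sample and discarded.
    """
    nn = NearestNeighbors(n_neighbors=k).fit(train_feats)
    predictions = []
    for x in test_feats:
        _, idx = nn.kneighbors(x.reshape(1, -1))
        local_X = train_feats[idx[0]]
        local_y = train_labels[idx[0]]
        if len(np.unique(local_y)) == 1:
            # All neighbours share one class: no SVM needed.
            predictions.append(local_y[0])
            continue
        clf = LinearSVC(C=1.0).fit(local_X, local_y)  # one-vs-rest by default
        predictions.append(clf.predict(x.reshape(1, -1))[0])
    return np.array(predictions)

# Toy usage with random stand-ins for the fused CNN + BOVW features.
rng = np.random.default_rng(0)
train_feats = rng.normal(size=(1000, 128))
train_labels = rng.integers(0, 7, size=1000)   # e.g. 7 expression classes
test_feats = rng.normal(size=(5, 128))
print(local_learning_predict(train_feats, train_labels, test_feats, k=50))
```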

223 citations


Journal ArticleDOI
TL;DR: A data-driven approach to semantic segmentation of cloud and cloud shadow in single date images based on a modified U-Net convolutional neural network that consistently outperforms Fmask and a traditional Random Forest classifier on a globally distributed multi-sensor test dataset in terms of accuracy, Cohen's Kappa coefficient, Dice coefficient and inference speed.

111 citations


Proceedings ArticleDOI
15 Jun 2019
TL;DR: This work proposes a data-driven learned sky model, which is used for outdoor lighting estimation from a single image, and shows that it can be used to recover plausible illumination, leading to visually pleasant virtual object insertions.
Abstract: We propose a data-driven learned sky model, which we use for outdoor lighting estimation from a single image. As no large-scale dataset of images and their corresponding ground truth illumination is readily available, we use complementary datasets to train our approach, combining the vast diversity of illumination conditions of SUN360 with the radiometrically calibrated and physically accurate Laval HDR sky database. Our key contribution is to provide a holistic view of both lighting modeling and estimation, solving both problems end-to-end. From a test image, our method can directly estimate an HDR environment map of the lighting without relying on analytical lighting models. We demonstrate the versatility and expressivity of our learned sky model and show that it can be used to recover plausible illumination, leading to visually pleasant virtual object insertions. To further evaluate our method, we capture a dataset of HDR 360° panoramas and show through extensive validation that we significantly outperform previous state-of-the-art.

91 citations


Journal ArticleDOI
TL;DR: A novel template establishment is presented and a simple guidance template-based algorithm for strip steel surface defect detection is proposed, which achieves a better average detection rate of 96.2% on a data set including 1500 test images.
Abstract: Automatic defect detection on strip steel surfaces is a challenging task in computer vision, owing to miscellaneous patterns of defects, disturbance of pseudodefects, and random arrangement of gray-level in background. In this paper, a novel template establishment is presented. Further, a simple guidance template-based algorithm for strip steel surface defect detection is proposed. First, a large number of defect-free images are collected to obtain the statistical characteristic of normal textures. Second, for each given test image, the initial template is built according to the statistical characteristic and the size of test image. Then, a sorting operation is applied to the given test image. Further, by updating the initial template, a unique guidance template is generated based on specific intensity distribution of the sorted test image. So far, the background of each test image is approximately reconstructed in the guidance template. Finally, based on pixel-wise detection, the defects can be located accurately by subtraction operation between the guidance template and sorted test image, reverse sorting operation, and adaptive threshold determination. Experimental results show that the proposed method is both efficient and effective. It achieves a better average detection rate of 96.2% on a data set including 1500 test images.
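
A heavily simplified sketch of the template-subtraction idea is given below: a template is built from the statistics of defect-free images, the sorted test image is compared against it, the residual is thresholded adaptively, and the flagged pixels are mapped back through a reverse sort. The template construction and threshold rule here are assumptions and omit the paper's guidance-template update.

```python
import numpy as np

def detect_defects(test_img, defect_free_imgs, k=4.0):
    """Simplified template-based defect detection on one grayscale image."""
    # Sort each row of the reference images to capture the intensity
    # distribution of normal texture (a stand-in for the statistical template).
    template = np.mean([np.sort(img, axis=1) for img in defect_free_imgs], axis=0)

    order = np.argsort(test_img, axis=1)             # sorting operation
    sorted_test = np.take_along_axis(test_img, order, axis=1)

    residual = np.abs(sorted_test.astype(float) - template)
    thresh = residual.mean() + k * residual.std()    # adaptive threshold
    sorted_mask = residual > thresh

    # Reverse sorting: put each flagged pixel back where it came from.
    mask = np.zeros_like(sorted_mask)
    np.put_along_axis(mask, order, sorted_mask, axis=1)
    return mask

rng = np.random.default_rng(1)
refs = [rng.integers(90, 110, size=(64, 64)).astype(float) for _ in range(20)]
test = rng.integers(90, 110, size=(64, 64)).astype(float)
test[30:34, 40:44] = 200                             # synthetic bright defect
print(detect_defects(test, refs).sum(), "pixels flagged")
```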

80 citations


Journal ArticleDOI
TL;DR: A surrogate-assisted method based on convolutional neural networks (CNNs) is proposed to classify retinal OCT images automatically; evaluation on different databases shows that the proposed method is a very promising tool for automatic retinal OCT image classification.
Abstract: Optical Coherence Tomography (OCT) is becoming one of the most important modalities for the noninvasive assessment of retinal eye diseases. As the number of acquired OCT volumes increases, automating the OCT image analysis is becoming increasingly relevant. In this paper, we propose a surrogate-assisted classification method to classify retinal OCT images automatically based on convolutional neural networks (CNNs). Image denoising is first performed to reduce the noise. Thresholding and morphological dilation are applied to extract the masks. The denoised images and the masks are then employed to generate a lot of surrogate images, which are used to train the CNN model. Finally, the prediction for a test image is determined by the average of the outputs from the trained CNN model on the surrogate images. The proposed method has been evaluated on different databases. The results (AUC of 0.9783 in the local database and AUC of 0.9856 in the Duke database) show that the proposed method is a very promising tool for classifying the retinal OCT images automatically.
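
The preprocessing and surrogate-averaging steps can be sketched as follows; the denoising filter, threshold, dilation count, and the way surrogate images are generated are placeholder assumptions, and the CNN is replaced by a dummy callable.

```python
import numpy as np
from scipy import ndimage

def extract_mask(oct_img, dilation_iters=3):
    """Denoise, threshold, and dilate an OCT B-scan to get a coarse mask
    (a simplified stand-in for the paper's preprocessing)."""
    denoised = ndimage.median_filter(oct_img, size=5)     # noise reduction
    mask = denoised > denoised.mean()                     # global threshold
    mask = ndimage.binary_dilation(mask, iterations=dilation_iters)
    return denoised, mask

def predict_from_surrogates(model_predict, surrogates):
    """Final prediction = average of the model outputs on all surrogate
    images generated for one test scan."""
    probs = np.stack([model_predict(s) for s in surrogates])
    return probs.mean(axis=0)

# Toy usage with a dummy "CNN" that outputs random class probabilities.
rng = np.random.default_rng(2)
scan = rng.normal(0.5, 0.2, size=(128, 128))
denoised, mask = extract_mask(scan)
surrogates = [denoised * np.roll(mask, shift, axis=1) for shift in (0, 2, 4)]
dummy_cnn = lambda img: rng.dirichlet(np.ones(3))         # 3 disease classes
print(predict_from_surrogates(dummy_cnn, surrogates))
```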

71 citations


Proceedings ArticleDOI
Lin Zhang1, Lijun Zhang1, Xiao Liu1, Ying Shen1, Shaoming Zhang1, Shengjie Zhao1 
15 Oct 2019
TL;DR: This paper proposes a "zero-shot" scheme for back-lit image restoration, which exploits the power of deep learning, but does not rely on any prior image examples or prior training, and is the first unsupervised CNN-based back-lit image restoration method.
Abstract: How to restore back-lit images still remains a challenging task. State-of-the-art methods in this field are based on supervised learning and thus they are usually restricted to specific training data. In this paper, we propose a "zero-shot" scheme for back-lit image restoration, which exploits the power of deep learning, but does not rely on any prior image examples or prior training. Specifically, we train a small image-specific CNN, namely ExCNet (short for Exposure Correction Network) at test time, to estimate the "S-curve" that best fits the test back-lit image. Once the S-curve is estimated, the test image can be then restored straightforwardly. ExCNet can adapt itself to different settings per image. This makes our approach widely applicable to different shooting scenes and kinds of back-lighting conditions. Statistical studies performed on 1512 real back-lit images demonstrate that our approach can outperform the competitors by a large margin. To the best of our knowledge, our scheme is the first unsupervised CNN-based back-lit image restoration method. To make the results reproducible, the source code is available at https://cslinzhang.github.io/ExCNet/.

70 citations


Journal ArticleDOI
TL;DR: An approach based on deep learning that uses autoencoders to extract discriminative features can detect defects without using any defect samples during training, and it can be applied to different types of defects with minimal customization.

67 citations


Journal ArticleDOI
TL;DR: A global Fourier image reconstruction method to detect and localize small defects in nonperiodical pattern images that is invariant to translation and illumination, and can detect subtle defects as small as 1-pixel wide in a wide variety of nonperiodical patterns found in the electronic industry.
Abstract: For defect detection in nonperiodical pattern images, such as printed circuit boards or integrated circuit dies found in the electronic industry, template matching could be the only applicable method to tackle the problem. The traditional template matching techniques work in the spatial domain and rely on the local pixel information. They are sensitive to geometric and lighting changes, and random product variations. The currently available Fourier-based methods mainly work for plain and periodical texture surfaces. In this paper, we propose a global Fourier image reconstruction method to detect and localize small defects in nonperiodical pattern images. It is based on the comparison of the whole Fourier spectra between the template and the inspection image. It retains only the frequency components associated with the local spatial anomaly. The inverse Fourier transform is then applied to reconstruct the test image, where the local anomaly will be restored and the common pattern will be removed as a uniform surface. The proposed method is invariant to translation and illumination, and can detect subtle defects as small as 1-pixel wide in a wide variety of nonperiodical patterns found in the electronic industry.
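
The spectral-comparison idea (retain only the frequency components where the inspection image differs from the template, then inverse-transform so the common pattern vanishes and the anomaly is restored) can be sketched with NumPy FFTs; the selection rule for "differing" components and the keep ratio are assumptions.

```python
import numpy as np

def fourier_residual(template, inspected, keep_ratio=0.02):
    """Reconstruct only the frequency content that differs between the
    template and the inspected image, so the common pattern is removed
    and the local anomaly is restored in the spatial domain."""
    F_t = np.fft.fft2(template)
    F_i = np.fft.fft2(inspected)
    diff = np.abs(F_i - F_t)

    # Keep only the strongest spectral differences (assumed criterion).
    thresh = np.quantile(diff, 1.0 - keep_ratio)
    keep = diff >= thresh
    residual_spectrum = (F_i - F_t) * keep

    # Back to the spatial domain: common pattern ~ flat, defect restored.
    return np.abs(np.fft.ifft2(residual_spectrum))

rng = np.random.default_rng(3)
template = rng.normal(0.5, 0.05, size=(128, 128))
inspected = template.copy()
inspected[60, 70] += 1.0                    # a 1-pixel-wide synthetic defect
res = fourier_residual(template, inspected)
print("peak residual at", np.unravel_index(res.argmax(), res.shape))
```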

59 citations


Posted Content
TL;DR: This work proposes a completely generic deep pose estimation approach, which does not require the network to have been trained on relevant categories, nor objects in a category to have a canonical pose, and demonstrates that the method boosts performance for supervised category pose estimation on standard benchmarks.
Abstract: Most deep pose estimation methods need to be trained for specific object instances or categories. In this work we propose a completely generic deep pose estimation approach, which does not require the network to have been trained on relevant categories, nor objects in a category to have a canonical pose. We believe this is a crucial step to design robotic systems that can interact with new objects in the wild not belonging to a predefined category. Our main insight is to dynamically condition pose estimation with a representation of the 3D shape of the target object. More precisely, we train a Convolutional Neural Network that takes as input both a test image and a 3D model, and outputs the relative 3D pose of the object in the input image with respect to the 3D model. We demonstrate that our method boosts performances for supervised category pose estimation on standard benchmarks, namely Pascal3D+, ObjectNet3D and Pix3D, on which we provide results superior to the state of the art. More importantly, we show that our network trained on everyday man-made objects from ShapeNet generalizes without any additional training to completely new types of 3D objects by providing results on the LINEMOD dataset as well as on natural entities such as animals from ImageNet.

50 citations


Journal ArticleDOI
TL;DR: This work incorporates the CNN feature of an image into the proposed SEM model using the well-known AlexNet network; the feature is extracted by removing AlexNet's final layer and proves useful in the authors' SEM model.
Abstract: Automatic image annotation (AIA) methods are considered an efficient way to bridge the semantic gap between original images and their semantic information. However, traditional annotation models work well only with finely crafted manual features. To address this problem, we incorporate the CNN feature of an image into our proposed model, which we refer to as SEM, using the well-known AlexNet CNN model. We extract a CNN feature by removing the network's final layer, and this feature proves useful in our SEM model. Additionally, based on experience with traditional KNN models, we propose a model that addresses image tag refinement and assignment simultaneously while maintaining the simplicity of the KNN model. The proposed model groups images with similar features into a semantic neighbor group. Moreover, using a self-defined Bayesian-based model, we distribute the tags belonging to the neighbor group to the test images according to the distance between the test image and its neighbors. Finally, experiments are performed on three typical image datasets, Corel5K, ESP Game, and IAPR TC-12, which verify the effectiveness of the proposed model.
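
Extracting a CNN feature by removing AlexNet's final layer is straightforward with torchvision; the sketch below is generic feature extraction, not the SEM pipeline itself, and the image path in the commented usage line is hypothetical (older torchvision versions use pretrained=True instead of the weights argument).

```python
import torch
import torch.nn as nn
from torchvision import models, transforms
from PIL import Image

# Load a pretrained AlexNet and drop its last fully connected layer so the
# forward pass returns the 4096-D penultimate activation as the image feature.
alexnet = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
alexnet.classifier = nn.Sequential(*list(alexnet.classifier.children())[:-1])
alexnet.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def cnn_feature(image_path):
    """Return the 4096-D AlexNet feature of one image."""
    img = Image.open(image_path).convert("RGB")
    with torch.no_grad():
        feat = alexnet(preprocess(img).unsqueeze(0))
    return feat.squeeze(0).numpy()

# feature = cnn_feature("example.jpg")   # hypothetical path; shape: (4096,)
```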

Journal ArticleDOI
TL;DR: A blind quality assessment method is developed that can effectively and efficiently evaluate the quality of contrast-distorted images without requiring reference information; it is more consistent with subjective evaluation results than state-of-the-art image quality assessment methods and requires lower computational complexity.
Abstract: This paper mainly focuses on developing a blind quality assessment method that can effectively and efficiently evaluate the quality of contrast distorted images without requiring reference information. Through experiments, we discover and validate that the global intensity change is the main characteristic of contrast distorted images and has a close relationship to the perceptual quality. With these observations, two elements are utilized to quantify this characteristic, i.e., the maximum information entropy of intensity values and the Kullback–Leibler (K–L) divergence between the test image’s intensity histogram and the prior one based on the statistical experiment over a great number of high-quality images. To be specific, the entropy represents the valuable information of an image and the K–L divergence reflects the change degree of intensity distribution. In view of these, the proposed method is generated by combining these two elements linearly. Extensive experiments on three publicly available databases demonstrate the superiority of the proposed method. More specifically, it is more consistent with subjective evaluation results than the state-of-the-art image quality assessment methods and requires a lower computational complexity.
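
The two cues (entropy of the intensity histogram and K–L divergence to a prior histogram learned from high-quality images) combined linearly can be sketched in NumPy as below; the bin count, prior construction, and combination weights alpha/beta are illustrative assumptions, not the paper's fitted values.

```python
import numpy as np

def intensity_histogram(img, bins=256):
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    return hist.astype(float) / hist.sum()

def entropy(p, eps=1e-12):
    """Shannon entropy of an intensity distribution (bits)."""
    return float(-np.sum(p * np.log2(p + eps)))

def kl_divergence(p, q, eps=1e-12):
    """K-L divergence D(p || q) between test and prior histograms."""
    return float(np.sum(p * np.log2((p + eps) / (q + eps))))

def contrast_quality(img, prior_hist, alpha=1.0, beta=-1.0):
    """Linear combination of the two cues: higher entropy is rewarded,
    larger deviation from the prior of high-quality images is penalized."""
    p = intensity_histogram(img)
    return alpha * entropy(p) + beta * kl_divergence(p, prior_hist)

# Toy prior built from a synthetic "high-quality" image.
rng = np.random.default_rng(4)
prior = intensity_histogram(rng.integers(0, 256, size=(512, 512)))
low_contrast = rng.integers(100, 156, size=(256, 256))
high_contrast = rng.integers(0, 256, size=(256, 256))
print(contrast_quality(low_contrast, prior), contrast_quality(high_contrast, prior))
```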

Journal ArticleDOI
TL;DR: The proposed morphological operations with arc-shaped SEs can efficiently intensify local defects and remove the tool-mark background in the circular machined surface and can achieve high detection accuracy for various small defects, including scratch, bump and edge burst.

Journal ArticleDOI
TL;DR: A CNN-based segmentation algorithm that, in addition to being highly accurate and fast, is also resilient to variation in the input acquisition, and consistent across a wide range of acquisition protocols is proposed.

Journal ArticleDOI
15 Apr 2019-Entropy
TL;DR: An effective technique based on the Electromagnetic Field Optimization (EFO) algorithm with a fuzzy entropy criterion is proposed; in addition, a novel chaotic strategy is embedded into EFO to develop a new algorithm named CEFO, whose robustness is evaluated against competing algorithms.
Abstract: Multilevel thresholding segmentation of color images is an important technology in various applications which has received more attention in recent years. The process of determining the optimal threshold values in the case of traditional methods is time-consuming. In order to mitigate the above problem, meta-heuristic algorithms have been employed in this field for searching the optima during the past few years. In this paper, an effective technique of Electromagnetic Field Optimization (EFO) algorithm based on a fuzzy entropy criterion is proposed, and in addition, a novel chaotic strategy is embedded into EFO to develop a new algorithm named CEFO. To evaluate the robustness of the proposed algorithm, other competitive algorithms such as Artificial Bee Colony (ABC), Bat Algorithm (BA), Wind Driven Optimization (WDO), and Bird Swarm Algorithm (BSA) are compared using fuzzy entropy as the fitness function. Furthermore, the proposed segmentation method is also compared with the most widely used approaches of Otsu's variance and Kapur's entropy to verify its segmentation accuracy and efficiency. Experiments are conducted on ten Berkeley benchmark images and the simulation results are presented in terms of peak signal to noise ratio (PSNR), mean structural similarity (MSSIM), feature similarity (FSIM), and computational time (CPU Time) at different threshold levels of 4, 6, 8, and 10 for each test image. A series of experiments can significantly demonstrate the superior performance of the proposed technique, which can deal with multilevel thresholding color image segmentation excellently.
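
The evaluation side of this pipeline, applying a candidate set of thresholds to a channel and scoring the quantized result with PSNR, can be sketched as below; the metaheuristic search and the fuzzy-entropy fitness are omitted, and the threshold values are purely illustrative.

```python
import numpy as np

def apply_thresholds(channel, thresholds):
    """Quantize a grayscale/colour channel at the given threshold levels,
    replacing each intensity segment by its mean value."""
    edges = [0] + sorted(int(t) for t in thresholds) + [256]
    out = np.zeros_like(channel, dtype=float)
    for lo, hi in zip(edges[:-1], edges[1:]):
        seg = (channel >= lo) & (channel < hi)
        if seg.any():
            out[seg] = channel[seg].mean()
    return out

def psnr(original, segmented, max_val=255.0):
    mse = np.mean((original.astype(float) - segmented) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)

rng = np.random.default_rng(5)
img = rng.integers(0, 256, size=(256, 256)).astype(float)
# Thresholds that a metaheuristic (e.g. CEFO) might return at level 4.
candidate = [60, 120, 180, 230]
print("PSNR:", psnr(img, apply_thresholds(img, candidate)))
```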

Journal ArticleDOI
TL;DR: Comparative studies with three existing methods confirm that the proposed convolutional capsule network for detecting vehicles from high-resolution remote sensing images effectively performs in detecting vehicles of various conditions.
Abstract: Vehicle detection plays an important role in a variety of traffic-related applications. However, due to the scale and orientation variations and partial occlusions of vehicles, it is still challenging to accurately detect vehicles from remote sensing images. This letter proposes a convolutional capsule network for detecting vehicles from high-resolution remote sensing images. First, a test image is segmented into superpixels to generate meaningful and nonredundant patches. Then, these patches are input to a convolutional capsule network to label them into vehicles or the background. Finally, nonmaximum suppression is adopted to eliminate repetitive detections. Quantitative evaluations on four test data sets show that average completeness, correctness, quality, and F1-measure of 0.93, 0.97, 0.90, and 0.95, respectively, are obtained. Comparative studies with three existing methods confirm that the proposed method effectively performs in detecting vehicles of various conditions.
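
The final nonmaximum suppression step can be sketched as standard greedy IoU-based NMS; the box format, scores, and IoU threshold below are assumptions, not values from the letter.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.3):
    """Greedy non-maximum suppression.

    boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) classifier
    confidences for the 'vehicle' class. Returns indices of kept boxes.
    """
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IoU of the top box with the remaining candidates.
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_thresh]   # drop highly overlapping boxes
    return keep

boxes = np.array([[10, 10, 50, 50], [12, 12, 52, 52], [100, 100, 140, 140]], float)
scores = np.array([0.9, 0.8, 0.95])
print(nms(boxes, scores))   # -> [2, 0]
```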

Journal ArticleDOI
TL;DR: KonIQ-10k, introduced in this paper, is an in-the-wild dataset for image quality assessment (IQA) consisting of 10,073 quality-scored images, the largest IQA dataset to date.
Abstract: Deep learning methods for image quality assessment (IQA) are limited due to the small size of existing datasets. Extensive datasets require substantial resources both for generating publishable content and annotating it accurately. We present a systematic and scalable approach to creating KonIQ-10k, the largest IQA dataset to date, consisting of 10,073 quality scored images. It is the first in-the-wild database aiming for ecological validity, concerning the authenticity of distortions, the diversity of content, and quality-related indicators. Through the use of crowdsourcing, we obtained 1.2 million reliable quality ratings from 1,459 crowd workers, paving the way for more general IQA models. We propose a novel, deep learning model (KonCept512), to show an excellent generalization beyond the test set (0.921 SROCC), to the current state-of-the-art database LIVE-in-the-Wild (0.825 SROCC). The model derives its core performance from the InceptionResNet architecture, being trained at a higher resolution than previous models (512x384). Correlation analysis shows that KonCept512 performs similar to having 9 subjective scores for each test image.
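
The correlations reported here (e.g., 0.921 SROCC) are Spearman rank-order correlations between predicted scores and subjective ratings; computing them is a one-liner with SciPy, sketched below on hypothetical score arrays.

```python
import numpy as np
from scipy.stats import spearmanr, pearsonr

# Hypothetical predicted quality scores and ground-truth MOS values.
rng = np.random.default_rng(6)
mos = rng.uniform(1, 5, size=1000)                 # mean opinion scores
predicted = mos + rng.normal(0, 0.4, size=1000)    # a model's estimates

srocc, _ = spearmanr(predicted, mos)   # rank correlation (reported as SROCC)
plcc, _ = pearsonr(predicted, mos)     # linear correlation, also often reported
print(f"SROCC={srocc:.3f}  PLCC={plcc:.3f}")
```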

Book ChapterDOI
13 Oct 2019
TL;DR: A novel multi-task deep learning framework for simultaneous histopathology image classification and retrieval, leveraging on the classic concept of k-nearest neighbours to improve model interpretability and evaluate the method on colorectal cancer histology slides to show that the confidence estimates are strongly correlated with model performance.
Abstract: Deep neural networks have achieved tremendous success in image recognition, classification and object detection. However, deep learning is often criticised for its lack of transparency and general inability to rationalise its predictions. The issue of poor model interpretability becomes critical in medical applications: a model that is not understood and trusted by physicians is unlikely to be used in daily clinical practice. In this work, we develop a novel multi-task deep learning framework for simultaneous histopathology image classification and retrieval, leveraging on the classic concept of k-nearest neighbours to improve model interpretability. For a test image, we retrieve the most similar images from our training databases. These retrieved nearest neighbours can be used to classify the test image with a confidence score, and provide a human-interpretable explanation of our classification. Our original framework can be built on top of any existing classification network (and therefore benefit from pretrained models), by (i) combining a triplet loss function with a novel triplet sampling strategy to compare distances between samples and (ii) adding a Cauchy hashing loss function to accelerate neighbour searching. We evaluate our method on colorectal cancer histology slides and show that the confidence estimates are strongly correlated with model performance. Nearest neighbours are intuitive and useful for expert evaluation. They give insights into understanding possible model failures, and can support clinical decision making by comparing archived images and patient records with the actual case.
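
The retrieval-based classification step (embed the test image, fetch its k nearest training neighbours, take a majority vote, and report the vote share as a confidence score) can be sketched with scikit-learn; the embeddings below are random placeholders for the learned triplet features, and k and the metric are assumptions.

```python
import numpy as np
from collections import Counter
from sklearn.neighbors import NearestNeighbors

def retrieve_and_classify(train_embed, train_labels, test_embed, k=5):
    """For each test embedding, return the majority label of its k nearest
    training neighbours, the vote share as a confidence score, and the
    neighbour indices (which can be shown to an expert as evidence)."""
    nn = NearestNeighbors(n_neighbors=k, metric="euclidean").fit(train_embed)
    _, idx = nn.kneighbors(test_embed)
    results = []
    for neighbours in idx:
        votes = Counter(train_labels[neighbours])
        label, count = votes.most_common(1)[0]
        results.append((label, count / k, neighbours))
    return results

# Toy usage with random embeddings standing in for the learned features.
rng = np.random.default_rng(7)
train_embed = rng.normal(size=(500, 64))
train_labels = rng.integers(0, 9, size=500)        # e.g. 9 tissue classes
test_embed = rng.normal(size=(2, 64))
for label, conf, nbrs in retrieve_and_classify(train_embed, train_labels, test_embed):
    print(f"predicted class {label} with confidence {conf:.2f}; neighbours {nbrs}")
```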


Journal ArticleDOI
Tao Zhao1, Juan Liu1, Junyi Duan1, Xin Li1, Yongtian Wang1 
TL;DR: In this paper, a gradient-limited random phase addition method is developed to avoid excessively diffusing object information, where an image is segmented into two regions according to its frequency characteristics.

Journal ArticleDOI
TL;DR: A novel blur metric based on Multiscale SVD fusion (M-SVD) fuses different sub-bands of the selected singular values (SVs) in multiscale image windows, which drastically reduces false positives in blur detection and overcomes the difficulty of sharp regions being misjudged as blurred because of their smooth texture.

Proceedings ArticleDOI
01 Oct 2019
TL;DR: A comparative analysis is provided of the most widely used gradation correction models (power, exponential, and logarithmic) applied to an overexposed digital image; these models are capable of automatic adaptation to different brightness scales.
Abstract: The paper provides a comparative analysis of the application of the most widely used gradation correction models (power, exponential, and logarithmic) to an overexposed digital image; these models are capable of automatic adaptation to different brightness scales. It discusses the features of their practical application and sets up an experiment to improve an overexposed photo. In the experiment, the test image is modified using the different gradation correction models with different parameters, which demonstrates the practical value of such modifications, and an image enhancement coefficient is given to provide a comparative analysis of the influence of the input parameters on the final result. Based on the results of the experiment, analysis and recommendations for the practical use of the models are given, which helps to solve applied tasks of digital image quality improvement.
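
The three gradation correction families compared in the paper have simple closed forms; the sketch below applies them to a normalized image, with parameter values chosen only for illustration (they are not the paper's settings).

```python
import numpy as np

def power_correction(img, gamma=2.2):
    """Power-law (gamma) correction; gamma > 1 darkens an overexposed image."""
    return img ** gamma

def log_correction(img, c=1.0):
    """Logarithmic correction; compresses high intensities."""
    return c * np.log1p(img) / np.log(2.0)

def exp_correction(img, alpha=3.0):
    """Exponential-style correction; alpha controls the remapping strength."""
    return (np.exp(alpha * img) - 1.0) / (np.exp(alpha) - 1.0)

rng = np.random.default_rng(8)
bright = np.clip(rng.normal(0.8, 0.1, size=(64, 64)), 0, 1)  # overexposed image
for name, fn in [("power", power_correction), ("log", log_correction),
                 ("exp", exp_correction)]:
    print(f"{name:5s} mean brightness: {fn(bright).mean():.3f}")
```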


Journal ArticleDOI
TL;DR: A multilevel reconstruction-based multitask joint sparse representation method, which can not only restrain the background clutter and noise but also augment the data set, is proposed in this paper.
Abstract: Template-matching-based approaches have been developed for many years in the field of synthetic aperture radar (SAR) automatic target recognition (ATR). However, the performance of template-matching-based approaches is strongly affected by two factors: background clutter and noise and the size of the data set. To solve the problems mentioned above, a multilevel reconstruction-based multitask joint sparse representation method is proposed in this paper. According to the theory of the attributed scattering center (ASC) model, a SAR image exhibits strong point-scatter-like behavior, which can be modeled by scattering centers on the target. As a result, the ASCs can be extracted from SAR images based on the ASC model. Then, ASCs extracted from SAR images are used to reconstruct the SAR target at multilevels based on energy ratio (ER). The multilevel reconstruction is a process of data augmentation, which can not only restrain the background clutter and noise but also augment the data set. Several subdictionaries are designed after multilevel reconstruction according to the label of training samples. Meanwhile, a test image chip is reconstructed into multiple test images. The random projection coefficients associated with multiple reconstructed test images are fed into a multitask joint sparse representation classification framework. The final decision is made in terms of accumulated reconstruction error. Experiments on moving and stationary target acquisition and recognition (MSTAR) data set proved the effectiveness of our method.

Proceedings ArticleDOI
01 Sep 2019
TL;DR: In this paper, weight gradients from backpropagation were used to characterize the representation space learned by deep learning algorithms for perceptual image quality assessment and out-of-distribution classification.
Abstract: In this paper, we utilize weight gradients from backpropagation to characterize the representation space learned by deep learning algorithms. We demonstrate the utility of such gradients in applications including perceptual image quality assessment and out-of-distribution classification. The applications are chosen to validate the effectiveness of gradients as features when the test image distribution is distorted from the train image distribution. In both applications, the proposed gradient based features outperform activation features. In image quality assessment, the proposed approach is compared with other state of the art approaches and is generally the top performing method on TID 2013 and MULTI-LIVE databases in terms of accuracy, consistency, linearity, and monotonic behavior. Finally, we analyze the effect of regularization on gradients using CURE-TSR dataset for out-of-distribution classification.
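
A generic sketch of gradients-as-features is shown below: run a forward pass, form a loss against the model's own argmax prediction (so no label is needed at test time), backpropagate, and flatten the weight gradients of a chosen layer into a feature vector. The model, loss, and layer choice are assumptions and differ from the paper's exact setup.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# A small pretrained classifier stands in for the trained network.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.eval()

def gradient_feature(image_batch):
    """Return a feature vector built from weight gradients of the last layer.

    Distorted or out-of-distribution inputs tend to induce differently
    shaped gradients than clean, in-distribution training data.
    """
    model.zero_grad()
    logits = model(image_batch)
    pseudo_labels = logits.argmax(dim=1)          # self-supervised target
    loss = F.cross_entropy(logits, pseudo_labels)
    loss.backward()
    grads = model.fc.weight.grad                  # gradients of the final layer
    return grads.flatten().detach().clone()

x = torch.randn(1, 3, 224, 224)                   # placeholder test image
feat = gradient_feature(x)
print(feat.shape)                                 # torch.Size([512000]) for resnet18
```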

Journal ArticleDOI
TL;DR: A deep learning model to extract vein features by combining the Convolutional Neural Networks (CNN) model and Long Short-Term Memory (LSTM) model is proposed, which significantly improves the finger-vein verification accuracy.
Abstract: Finger-vein biometrics has been extensively investigated for personal verification. A challenge is that the finger-vein acquisition is affected by many factors, which results in many ambiguous regions in the finger-vein image. Generally, the separability between vein and background is poor in such regions. Despite recent advances in finger-vein pattern segmentation, current solutions still lack the robustness to extract finger-vein features from raw images because they do not take into account the complex spatial dependencies of vein pattern. This paper proposes a deep learning model to extract vein features by combining the Convolutional Neural Networks (CNN) model and Long Short-Term Memory (LSTM) model. Firstly, we automatically assign the label based on a combination of known state of the art handcrafted finger-vein image segmentation techniques, and generate various sequences for each labeled pixel along different directions. Secondly, several Stacked Convolutional Neural Networks and Long Short-Term Memory (SCNN-LSTM) models are independently trained on the resulting sequences. The outputs of various SCNN-LSTMs form a complementary and over-complete representation and are conjointly put into Probabilistic Support Vector Machine (P-SVM) to predict the probability of each pixel of being foreground (i.e., vein pixel) given several sequences centered on it. Thirdly, we propose a supervised encoding scheme to extract the binary vein texture. A threshold is automatically computed by taking into account the maximal separation between the inter-class distance and the intra-class distance. In our approach, the CNN learns robust features for vein texture pattern representation and LSTM stores the complex spatial dependencies of vein patterns. So, the pixels in any region of a test image can then be classified effectively. In addition, the supervised information is employed to encode the vein patterns, so the resulting encoding images contain more discriminating features. The experimental results on one public finger-vein database show that the proposed approach significantly improves the finger-vein verification accuracy.

Patent
22 Jan 2019
TL;DR: In this article, an automatic image annotation method for weakly supervised semantic segmentation is proposed, where the object border and the semantic label are regarded as a kind of weak supervised semantic label of image level.
Abstract: An automatic image annotation method for weakly supervised semantic segmentation. The object border is located by an image object detection method and assigned a semantic label. The object border and the semantic label are regarded as a form of weak, image-level semantic supervision. Using a traditional image segmentation method, the whole object region is segmented out and a segmentation template for training the classification network is generated. The segmentation template is then used as a supervisory signal to train the classification network. Finally, the trained classification network is used to semantically segment the test image. The technical proposal of the invention uses an object detection method to obtain the border and semantic tag of an object in an image, uses a traditional image segmentation method to segment the object region, and combines the result with the semantic tag to serve as a training sample for weakly supervised semantic segmentation. By automatically generating training samples for weakly supervised semantic segmentation, the method avoids the time-consuming and laborious manual labeling of large numbers of images.

Proceedings ArticleDOI
01 Nov 2019
TL;DR: The performance evaluation results demonstrate that the two proposed anomaly detection models based on deep learning can well meet the dual requirements of real-time and accuracy for anomaly detection in the high-speed industrial production scenarios.
Abstract: In the process of industrial production, anomaly detection is the key link to ensure the high quality of the product. This paper deeply studies the method of anomaly detection for industrial products based on deep learning. For the balanced image data set of industrial production products, this paper proposes a supervised anomaly detection model based on YOLOv3. This model constructs the ROI classifier to detect anomaly types. For the unbalanced image data set (only a few anomaly images) of industrial production products, this paper proposes a semi-supervised anomaly detection model based on Fast-AnoGAN. This model is built from normal samples only. It uses the trained WGAN-GP model to generate images, and achieves anomaly detection by monitoring the anomaly score which is obtained by calculating the difference between the generated image and the test image. The two proposed anomaly detection models are evaluated on both balanced and unbalanced data sets from real industrial production scenarios. The performance evaluation results demonstrate that the two proposed anomaly detection models based on deep learning can well meet the dual requirements of real-time performance and accuracy for anomaly detection in high-speed industrial production scenarios.
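
The scoring step of the semi-supervised branch, comparing the test image against the image the generator reconstructs for it, can be sketched as below in AnoGAN style; the generator and feature-matching term are replaced by placeholders, and the weighting and threshold are assumptions.

```python
import numpy as np

def anomaly_score(test_img, reconstructed, lam=0.9):
    """Residual-based anomaly score as used by AnoGAN-style detectors:
    a weighted sum of the pixel-wise reconstruction error and a (here
    simplified) feature-matching error. Higher scores mean 'more anomalous'."""
    residual = np.abs(test_img - reconstructed).mean()
    feature_err = np.abs(test_img.mean() - reconstructed.mean())  # placeholder
    return lam * residual + (1.0 - lam) * feature_err

def is_anomalous(test_img, reconstructed, threshold=0.02):
    return anomaly_score(test_img, reconstructed) > threshold

rng = np.random.default_rng(9)
normal = rng.normal(0.5, 0.02, size=(64, 64))
recon = normal + rng.normal(0, 0.01, size=(64, 64))   # generator output stand-in
defective = normal.copy()
defective[20:30, 20:30] += 0.8                         # synthetic anomaly
print(is_anomalous(normal, recon), is_anomalous(defective, recon))
```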

Journal ArticleDOI
TL;DR: A robust synthetic aperture radar (SAR) automatic target recognition (ATR) method is proposed by combining the global and local filters, especially aiming to improve the recognition performance under various extended operating conditions (EOCs).

Proceedings ArticleDOI
Hu Tian1, Fei Li1
27 May 2019
TL;DR: By exploring similarities between different patches in the whole test image, a novel autoencoder-based fabric defect detection method is proposed and the original encoded latent variable is modified, and the cross-patch similarity is introduced for determining the modification function.
Abstract: Fabric quality inspection plays an important role in the textile industry. As an effective approach to learn data representations, autoencoder has been adopted for defect detection. With the basic idea that the defect area cannot be recovered by the model trained on non-defective image patches, the residual is often used as an indication for defect judgement. However, usually the texture (non-defect) area in a defective patch also cannot be well reconstructed, which makes the pixel-wise detection inaccurate. In this paper, by exploring similarities between different patches in the whole test image, a novel autoencoder-based fabric defect detection method is proposed. In order to maintain the texture area in the reconstructed patch, the original encoded latent variable is modified, and the cross-patch similarity is introduced for determining the modification function. The whole algorithm is conducted in an iterative way, and the detection results will become better and better. Experimental results on the benchmark datasets demonstrate the effectiveness of our proposal.