
Showing papers on "Standard test image" published in 2017


Proceedings ArticleDOI
21 Jul 2017
TL;DR: This paper reviews the first challenge on single image super-resolution (restoration of rich details in a low resolution image) with a focus on the proposed solutions and results, and gauges the state-of-the-art in single image super-resolution.
Abstract: This paper reviews the first challenge on single image super-resolution (restoration of rich details in a low resolution image) with focus on proposed solutions and results. A new DIVerse 2K resolution image dataset (DIV2K) was employed. The challenge had 6 competitions divided into 2 tracks with 3 magnification factors each. Track 1 employed the standard bicubic downscaling setup, while Track 2 had unknown downscaling operators (blur kernel and decimation) but learnable through low- and high-resolution training images. Each competition had ∼100 registered participants and 20 teams competed in the final testing phase. The challenge results gauge the state-of-the-art in single image super-resolution.

1,243 citations
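
Track 1's bicubic setup is simple to reproduce for experimentation; below is a minimal sketch, assuming OpenCV is available and "hr.png" stands in for a DIV2K high-resolution image:

    # Sketch: generate the low-resolution input via standard bicubic downscaling,
    # as in Track 1 of the challenge. "hr.png" is a placeholder path.
    import cv2

    scale = 4                                       # magnification factor (2, 3, or 4)
    hr = cv2.imread("hr.png")                       # high-resolution ground truth
    h, w = hr.shape[:2]
    lr = cv2.resize(hr, (w // scale, h // scale),
                    interpolation=cv2.INTER_CUBIC)  # bicubic downscaling
    cv2.imwrite("lr_x4.png", lr)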


Proceedings ArticleDOI
11 Sep 2017
TL;DR: In this paper, a network is trained that, given a small set of annotated images, produces parameters for a Fully Convolutional Network (FCN), which then performs dense pixel-level prediction on a test image for the new semantic class.
Abstract: Low-shot learning methods for image classification support learning from sparse data. We extend these techniques to support dense semantic image segmentation. Specifically, we train a network that, given a small set of annotated images, produces parameters for a Fully Convolutional Network (FCN). We use this FCN to perform dense pixel-level prediction on a test image for the new semantic class. Our architecture shows a 25% relative meanIoU improvement compared to the best baseline methods for one-shot segmentation on unseen classes in the PASCAL VOC 2012 dataset and is at least 3 times faster.

413 citations


Proceedings ArticleDOI
01 Jul 2017
TL;DR: It is demonstrated that the approach allows the recovery of plausible illumination conditions and enables photorealistic virtual object insertion from a single image and significantly outperforms previous solutions to this problem.
Abstract: We present a CNN-based technique to estimate high-dynamic range outdoor illumination from a single low dynamic range image. To train the CNN, we leverage a large dataset of outdoor panoramas. We fit a low-dimensional physically-based outdoor illumination model to the skies in these panoramas giving us a compact set of parameters (including sun position, atmospheric conditions, and camera parameters). We extract limited field-of-view images from the panoramas, and train a CNN with this large set of input image–output lighting parameter pairs. Given a test image, this network can be used to infer illumination parameters that can, in turn, be used to reconstruct an outdoor illumination environment map. We demonstrate that our approach allows the recovery of plausible illumination conditions and enables photorealistic virtual object insertion from a single image. An extensive evaluation on both the panorama dataset and captured HDR environment maps shows that our technique significantly outperforms previous solutions to this problem.

238 citations


Journal ArticleDOI
TL;DR: It is shown how sparse autoencoders can be leveraged to partition images into tissue sub-types, so that color standardization for each can be performed independently.

181 citations


Journal ArticleDOI
TL;DR: A deep learning model is proposed that extracts vein features using limited a priori knowledge and recovers missing finger-vein patterns in the segmented image.
Abstract: Finger-vein biometrics has been extensively investigated for personal verification. Despite recent advances in finger-vein verification, current solutions completely depend on domain knowledge and still lack the robustness to extract finger-vein features from raw images. This paper proposes a deep learning model to extract and recover vein features using limited a priori knowledge. First, based on a combination of the known state-of-the-art handcrafted finger-vein image segmentation techniques, we automatically identify two regions: a clear region with high separability between finger-vein patterns and background, and an ambiguous region with low separability between them. The first is associated with pixels on which all the above-mentioned segmentation techniques assign the same segmentation label (either foreground or background), while the second corresponds to all the remaining pixels. This scheme is used to automatically discard the ambiguous region and to label the pixels of the clear region as foreground or background. A training data set is constructed based on the patches centered on the labeled pixels. Second, a convolutional neural network (CNN) is trained on the resulting data set to predict the probability of each pixel of being foreground (i.e., vein pixel), given a patch centered on it. The CNN learns what a finger-vein pattern is by learning the difference between vein patterns and background ones. The pixels in any region of a test image can then be classified effectively. Third, we propose another new and original contribution by developing and investigating a fully convolutional network to recover missing finger-vein patterns in the segmented image. The experimental results on two public finger-vein databases show a significant improvement in terms of finger-vein verification accuracy.

170 citations
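
The automatic labeling step in this paper reduces to a consensus over several binary segmentation maps; the numpy sketch below assumes the handcrafted segmenters have already produced 0/1 masks (the mask inputs are illustrative):

    # Sketch: keep pixels where all handcrafted segmenters agree (the "clear"
    # region); everything else is the discarded "ambiguous" region. The input
    # masks are illustrative 0/1 arrays, one per segmentation technique.
    import numpy as np

    def consensus_labels(masks):
        stack = np.stack([m.astype(bool) for m in masks])     # (n_methods, H, W)
        all_fg = stack.all(axis=0)                            # unanimous foreground
        all_bg = (~stack).all(axis=0)                         # unanimous background
        labels = np.full(stack.shape[1:], -1, dtype=np.int8)  # -1 = ambiguous
        labels[all_fg] = 1                                    # clear vein pixels
        labels[all_bg] = 0                                    # clear background
        return labels
    # Training patches for the CNN are then centered on pixels labeled 0 or 1.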


Journal ArticleDOI
TL;DR: Thorough experiments conducted on standard databases show that the proposed novel full-reference IQA framework, codenamed DeepSim, can accurately predict human-perceived image quality and surpasses the previous state of the art.

135 citations


Posted Content
TL;DR: In this paper, a network is trained that, given a small set of annotated images, produces parameters for a Fully Convolutional Network (FCN), which then performs dense pixel-level prediction on a test image for the new semantic class.
Abstract: Low-shot learning methods for image classification support learning from sparse data. We extend these techniques to support dense semantic image segmentation. Specifically, we train a network that, given a small set of annotated images, produces parameters for a Fully Convolutional Network (FCN). We use this FCN to perform dense pixel-level prediction on a test image for the new semantic class. Our architecture shows a 25% relative meanIoU improvement compared to the best baseline methods for one-shot segmentation on unseen classes in the PASCAL VOC 2012 dataset and is at least 3 times faster.

124 citations


Proceedings ArticleDOI
03 Nov 2017
TL;DR: This paper presents an approach for group-level emotion recognition in the Emotion Recognition in the Wild Challenge 2017, based on two types of Convolutional Neural Networks, namely individual facial emotion CNNs and global image based CNNs.
Abstract: This paper presents our approach for group-level emotion recognition in the Emotion Recognition in the Wild Challenge 2017. The task is to classify an image into one of the group emotion categories, such as positive, neutral or negative. Our approach is based on two types of Convolutional Neural Networks (CNNs), namely individual facial emotion CNNs and global image based CNNs. For the individual facial emotion CNNs, we first extract all the faces in an image, and assign the image label to all faces for training. In particular, we utilize a large-margin softmax loss for discriminative learning and we train two CNNs on both aligned and non-aligned faces. For the global image based CNNs, we compare several recent state-of-the-art network structures and data augmentation strategies to boost performance. For a test image, we average the scores from all faces and the image to predict the final group emotion category. We win the challenge with accuracies of 83.9% and 80.9% on the validation set and testing set respectively, which improve the baseline results by about 30%.

72 citations


Proceedings ArticleDOI
01 Aug 2017
TL;DR: A modified LBPH algorithm based on the pixel neighborhood gray median (MLBPH) is proposed, and the results show that the MLBPH algorithm is superior to the LBPH algorithm in recognition rate.
Abstract: The Local Binary Pattern Histogram (LBPH) algorithm is a simple solution to the face recognition problem, which can recognize both frontal and side faces. However, the recognition rate of the LBPH algorithm decreases under illumination diversification, expression variation and attitude deflection. To solve this problem, a modified LBPH algorithm based on the pixel neighborhood gray median (MLBPH) is proposed. The gray value of each pixel is replaced by the median of its neighborhood sampling values, the feature values are then extracted over sub-blocks, and a statistical histogram is established to form the MLBPH feature dictionary, against which the test image is compared to recognize identity. Experiments carried out on the standard FERET face database and a newly created face database show that the MLBPH algorithm is superior to the LBPH algorithm in recognition rate.

62 citations
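
The core MLBPH modification, replacing each pixel's gray value with its neighborhood median before computing local binary patterns, can be sketched with OpenCV and scikit-image (file path and parameters are illustrative):

    # Sketch of the MLBPH idea: median-filter first, then build an LBP histogram.
    import cv2
    import numpy as np
    from skimage.feature import local_binary_pattern

    img = cv2.imread("face.png", cv2.IMREAD_GRAYSCALE)  # placeholder path
    med = cv2.medianBlur(img, 3)                # pixel value -> neighborhood median
    lbp = local_binary_pattern(med, P=8, R=1, method="uniform")
    hist, _ = np.histogram(lbp, bins=int(lbp.max()) + 1, density=True)
    # The full method splits the image into sub-blocks and concatenates per-block
    # histograms into the MLBPH feature dictionary used for matching.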


Journal ArticleDOI
TL;DR: The experimental results demonstrate the superiority of the proposed method over several state-of-the-art methods, especially for practical image splicing, where the noise difference between the original and spliced regions is typically small.
Abstract: Image splicing is one of the most common image tampering operations, where the content of the tampered image usually significantly differs from that of the original one. As a consequence, forensic methods aiming to locate the spliced areas are of great realistic significance. Among these methods, the noise based ones, which utilize the fact that images from different sources tend to have various noise levels, have drawn much attention due to their convenience to implement and the relaxation of some operation specific assumptions. However, the performances of the existing noise based image splicing localization methods are unsatisfactory when the noise difference between the original and spliced regions is relatively small. In this paper, through incorporation of a recently developed noise level estimation algorithm, we propose an effective image splicing localization method. The proposed method performs blockwise noise level estimation of a test image with a principal component analysis (PCA)-based algorithm, and segments the tampered region from the original region by k-means clustering. The experimental results demonstrate the superiority of the proposed method over several state-of-the-art methods, especially for practical image splicing, where the noise difference between the original and spliced regions is typically small.

56 citations
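
The localization pipeline is easy to prototype; the sketch below substitutes a simple Laplacian-residual noise proxy for the paper's PCA-based estimator and keeps the blockwise k-means segmentation (block size and path are illustrative):

    # Sketch: blockwise noise estimate + 2-cluster k-means. A Laplacian-residual
    # statistic stands in for the paper's PCA-based noise level estimator.
    import cv2
    import numpy as np
    from sklearn.cluster import KMeans

    img = cv2.imread("test.png", cv2.IMREAD_GRAYSCALE).astype(np.float64)
    B = 64                                          # block size (illustrative)
    H, W = img.shape
    feats, coords = [], []
    for y in range(0, H - B + 1, B):
        for x in range(0, W - B + 1, B):
            resid = cv2.Laplacian(img[y:y + B, x:x + B], cv2.CV_64F)
            feats.append([resid.std()])             # crude per-block noise level
            coords.append((y, x))
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(np.array(feats))
    mask = np.zeros((H, W), np.uint8)
    for (y, x), lab in zip(coords, labels):
        mask[y:y + B, x:x + B] = 255 * lab          # one cluster = suspected splice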


Proceedings ArticleDOI
Li Linshen, Lin Zhang, Xiyuan Li, Xiao Liu, Ying Shen, Lu Xiong
01 Jul 2017
TL;DR: A learning based parking-slot detection approach is proposed: given a test image, the marking-points are detected first and the valid parking-slots are then inferred; its efficacy and efficiency have been corroborated on the database.
Abstract: Recent years have witnessed a growing interest in developing automatic parking systems in the field of intelligent vehicles. However, how to effectively and efficiently locate parking-slots using a vision-based system is still an unresolved issue. In this paper, we attempt to fill this research gap to some extent and our contributions are twofold. Firstly, to facilitate the study of vision-based parking-slot detection, a large-scale parking-slot image database is established. For each image in this database, the marking-points and parking-slots are carefully labelled. Such a database can serve as a benchmark to design and validate parking-slot detection algorithms. Secondly, a learning based parking-slot detection approach is proposed. With this approach, given a test image, the marking-points are detected first and the valid parking-slots are then inferred. Its efficacy and efficiency have been corroborated on our database. The labeled database and the source codes are publicly available at http://sse.tongji.edu.cn/linzhang/ps/index.htm.

Journal ArticleDOI
TL;DR: The proposed approach is based on a novel Class Adapting Principal Directions (CAPD) concept that allows multiple embeddings of image features into a semantic space, and can generalize the seen CAPDs by estimating seen-unseen diversity, which significantly improves the performance of generalized zero-shot learning.
Abstract: Prevalent techniques in zero-shot learning do not generalize well to other related problem scenarios. Here, we present a unified approach for conventional zero-shot, generalized zero-shot and few-shot learning problems. Our approach is based on a novel Class Adapting Principal Directions (CAPD) concept that allows multiple embeddings of image features into a semantic space. Given an image, our method produces one principal direction for each seen class. Then, it learns how to combine these directions to obtain the principal direction for each unseen class such that the CAPD of the test image is aligned with the semantic embedding of the true class, and opposite to the other classes. This allows efficient and class-adaptive information transfer from seen to unseen classes. In addition, we propose an automatic process for selection of the most useful seen classes for each unseen class to achieve robustness in zero-shot learning. Our method can update the unseen CAPD, taking advantage of a few unseen images, to work in a few-shot learning scenario. Furthermore, our method can generalize the seen CAPDs by estimating seen-unseen diversity, which significantly improves the performance of generalized zero-shot learning. Our extensive evaluations demonstrate that the proposed approach consistently achieves superior performance in zero-shot, generalized zero-shot and few/one-shot learning problems.

Proceedings ArticleDOI
01 Jul 2017
TL;DR: In this paper, the authors propose a novel image set classification technique using linear regression models, where the gallery image sets are interpreted as subspaces of a high dimensional space to avoid the computationally expensive training step.
Abstract: We propose a novel image set classification technique using linear regression models. Downsampled gallery image sets are interpreted as subspaces of a high dimensional space to avoid the computationally expensive training step. We estimate regression models for each test image using the class specific gallery subspaces. Images of the test set are then reconstructed using the regression models. Based on the minimum reconstruction error between the reconstructed and the original images, a weighted voting strategy is used to classify the test set. We performed extensive evaluation on the benchmark UCSD/Honda, CMU Mobo and YouTube Celebrity datasets for face classification, and ETH-80 dataset for object classification. The results demonstrate that by using only a small amount of training data, our technique achieved competitive classification accuracy and superior computational speed compared with the state-of-the-art methods.
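
The classification rule, reconstructing each test image from each class's gallery subspace and voting by minimum residual, reduces to per-class least squares; a numpy sketch under those assumptions (a plain majority vote stands in for the paper's weighted voting):

    # Sketch: class-specific linear regression; classify the test set by minimum
    # reconstruction error. Galleries are matrices of vectorized, downsampled
    # images (one column per gallery image); a plain majority vote replaces the
    # paper's weighted voting.
    import numpy as np

    def classify_set(test_vecs, galleries):
        """test_vecs: (d, m) test images; galleries: dict class -> (d, n_c)."""
        classes, residuals = [], []
        for c, G in galleries.items():
            A, *_ = np.linalg.lstsq(G, test_vecs, rcond=None)  # G @ A ~= test_vecs
            recon = G @ A
            classes.append(c)
            residuals.append(np.linalg.norm(test_vecs - recon, axis=0))
        resid = np.stack(residuals)                 # (n_classes, m)
        winners = resid.argmin(axis=0)              # each test image votes
        return classes[np.bincount(winners, minlength=len(classes)).argmax()]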

Journal ArticleDOI
TL;DR: This study incorporated SSF with a Lab color transformation to reduce over-detection problems associated with the original luminance image and ported four of the most time-consuming processes to the graphics processing unit (GPU) to improve computational efficiency.
Abstract: The use of unmanned aerial vehicles (UAVs) can allow individual tree detection for forest inventories in a cost-effective way. The scale-space filtering (SSF) algorithm is commonly used and has the capability of detecting trees of different crown sizes. In this study, we made two improvements with regard to the existing method and implementations. First, we incorporated SSF with a Lab color transformation to reduce over-detection problems associated with the original luminance image. Second, we ported four of the most time-consuming processes to the graphics processing unit (GPU) to improve computational efficiency. The proposed method was implemented using PyCUDA, which enabled access to NVIDIA's compute unified device architecture (CUDA) through high-level scripting of the Python language. Our experiments were conducted using two images captured by the DJI Phantom 3 Professional and a recent NVIDIA GTX 1080 GPU. The resulting accuracy was high, with an F-measure larger than 0.94. The speedups achieved by our parallel implementation were 44.77 and 28.54 for the first and second test image, respectively. For each 4000 × 3000 image, the total runtime was less than 1 s, which was sufficient for real-time performance and interactive application.
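
The color-space and detection steps can be prototyped on the CPU with scikit-image, where blob_log, a Laplacian-of-Gaussian scale-space detector, stands in for the paper's GPU SSF implementation (path and sigma bounds are illustrative):

    # Sketch: Lab lightness + multi-scale Laplacian-of-Gaussian blob detection as
    # a CPU stand-in for the paper's GPU scale-space filtering (SSF) pipeline.
    from skimage import io, color
    from skimage.feature import blob_log

    rgb = io.imread("uav_scene.png")                # placeholder path
    lightness = color.rgb2lab(rgb)[..., 0] / 100.0  # L channel scaled to [0, 1]
    # Each detected blob approximates one tree crown; the sigma range bounds
    # the expected crown sizes.
    blobs = blob_log(lightness, min_sigma=5, max_sigma=30, num_sigma=10,
                     threshold=0.1)                 # rows: (y, x, sigma)
    print(len(blobs), "candidate crowns")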

Journal ArticleDOI
TL;DR: The objective of this paper is to develop a photo forensics algorithm which can detect any photo manipulation; results showed that the proposed algorithm could successfully identify the modified image as well as the exact location of the modifications.
Abstract: Nowadays, image manipulation is common due to the availability of image processing software, such as Adobe Photoshop or GIMP. The original image captured by a digital camera or smartphone is normally saved in the JPEG format due to its popularity. The JPEG algorithm works on independently compressed image grids of 8x8 pixels. For an unmodified image, all 8x8 grids should have a similar error level. For a resaving operation, each block should degrade at approximately the same rate due to the introduction of a similar amount of error across the entire image. For a modified image, the altered blocks should have a higher error potential compared to the remaining part of the image. The objective of this paper is to develop a photo forensics algorithm which can detect any photo manipulation. The error level analysis (ELA) was further enhanced using vertical and horizontal histograms of the ELA image to pinpoint the exact location of modification. Results showed that our proposed algorithm could successfully identify the modified image as well as show the exact location of modifications.
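
Basic error level analysis is only a resave-and-difference; a minimal Pillow sketch follows (quality level and paths are illustrative), to which the paper adds vertical and horizontal histograms of the ELA image to pinpoint the edit:

    # Sketch: basic ELA -- resave as JPEG at a known quality, amplify the diff.
    from PIL import Image, ImageChops

    orig = Image.open("photo.jpg").convert("RGB")   # placeholder path
    orig.save("resaved.jpg", quality=90)            # controlled resave
    ela = ImageChops.difference(orig, Image.open("resaved.jpg"))
    peak = max(mx for _, mx in ela.getextrema())    # brightest difference
    ela = ela.point(lambda px: px * (255.0 / max(1, peak)))  # stretch for display
    ela.save("ela.png")
    # Column and row sums of this image give the vertical/horizontal histograms
    # the paper uses to pinpoint the modified region.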

Proceedings ArticleDOI
22 Mar 2017
TL;DR: All 14 types of defects are detected and classified into all possible classes using a referential inspection approach, and the results show that the proposed algorithm is suitable for automatic visual inspection of PCBs.
Abstract: Inspection of printed circuit boards (PCBs) has been a crucial process in the electronic manufacturing industry to guarantee product quality and reliability, cut manufacturing cost and increase production. PCB inspection involves detection of defects in the PCB and classification of those defects in order to identify the roots of defects. In this paper, all 14 types of defects are detected and classified into all possible classes using a referential inspection approach. The proposed algorithm is mainly divided into five stages: image registration, pre-processing, image segmentation, defect detection and defect classification. The algorithm is able to perform inspection even when the captured test image is rotated, scaled and translated with respect to the template image, which makes the algorithm rotation, scale and translation invariant. The novelty of the algorithm lies in its robustness to analyze a defect in its different possible appearances and severities. In addition to this, the algorithm takes only 2.528 s to inspect a PCB image. The efficacy of the proposed algorithm is verified by conducting experiments on different PCB images, and the results show that the proposed algorithm is suitable for automatic visual inspection of PCBs.
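
The referential core, aligning the test image to the template before differencing, can be sketched with ORB feature registration (a common choice here, not necessarily the paper's exact registration method; paths and thresholds are illustrative):

    # Sketch: register the test PCB image to the template (rotation/scale/
    # translation invariance via a homography), then difference to expose defects.
    import cv2
    import numpy as np

    tmpl = cv2.imread("template.png", cv2.IMREAD_GRAYSCALE)  # placeholder paths
    test = cv2.imread("test.png", cv2.IMREAD_GRAYSCALE)

    orb = cv2.ORB_create(2000)
    k1, d1 = orb.detectAndCompute(tmpl, None)
    k2, d2 = orb.detectAndCompute(test, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
    src = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    aligned = cv2.warpPerspective(test, H, tmpl.shape[::-1])
    _, diff = cv2.threshold(cv2.absdiff(tmpl, aligned), 50, 255, cv2.THRESH_BINARY)
    # Connected components of `diff` are defect candidates to be classified.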

Journal ArticleDOI
TL;DR: The research properly identified small areas in eggs and compared preprocessing steps, methods, and image-processing results, using the centroid and bounding box to determine the object and the small area of chicken eggs.
Abstract: The research used watermarking techniques to establish image originality. The aims of the research were to properly identify small areas in eggs and to compare preprocessing steps, methods, and image-processing results. The study improves on previous papers by combining all of the methods into a single analysis. Centroid and bounding-box computations were used to determine the object and the small area of chicken eggs, and a segmentation method was used to compare the original image with the watermarked image. Image processing was performed on watermarked image data so as to maintain the authenticity of the images used in the study. Several methods were applied to the watermarked images for the identification of chicken eggs, and segmentation was also deployed to process the images and count the objects. The results showed that the original image and the watermarked image had the same values and that the eggs were recognized; identification achieved 100% for all samples.

Journal ArticleDOI
TL;DR: In this work, a fuzzy pre-classifier is used to complement a set of support vector machines (SVM) to manage the large wood database and classify the wood species efficiently.
Abstract: An automated wood texture recognition system of 48 tropical wood species is presented. For each wood species, 100 macroscopic texture images are captured from different timber logs where 70 images are used for training while 30 images are used for testing. In this work, a fuzzy pre-classifier is used to complement a set of support vector machines (SVM) to manage the large wood database and classify the wood species efficiently. Given a test image, a set of texture pore features is extracted from the image and used as inputs to a fuzzy pre-classifier which assigns it to one of the four broad categories. Then, another set of texture features is extracted from the image and used with the SVM dedicated to the selected category to further classify the test image to a particular wood species. The advantage of dividing the database into four smaller databases is that when a new wood species is added into the system, only the SVM classifier of one of the four databases needs to be retrained instead of those of the entire database. This shortens the training time and emulates the experts’ reasoning when expanding the wood database. The results show that the proposed model is more robust as the size of wood database is increased.

Proceedings ArticleDOI
01 Jul 2017
TL;DR: A multi-class face segmentation algorithm is implemented and a model for each considered pose is trained, achieving competitive results when compared to most recent methods, according to mean absolute error and accuracy metrics.
Abstract: The aim of this work is to explore the usefulness of face semantic segmentation for head pose estimation. We implement a multi-class face segmentation algorithm and we train a model for each considered pose. Given a new test image, the probabilities associated with face parts by the different models are used as the only information for estimating the head orientation. A simple algorithm is proposed to exploit such probabilities in order to predict the pose. The proposed scheme achieves competitive results when compared to most recent methods, according to mean absolute error and accuracy metrics. Moreover, we release and make publicly available a face segmentation dataset consisting of 294 images belonging to 13 different poses, manually labeled into six semantic regions, which we used to train the segmentation models.

Journal ArticleDOI
TL;DR: An AIA system using a Non-negative Matrix Factorization (NMF) framework is presented, which discovers a latent space by factorizing data into a set of non-negative bases and coefficients, and is competitive with current state-of-the-art methods.

Journal ArticleDOI
TL;DR: A novel NR-IQA measure is introduced in which quality-aware statistics are used as perceptual features for the quality prediction, which demonstrates that the proposed technique outperforms the state-of-the-art NR measures.
Abstract: The aim of no-reference image quality assessment (NR-IQA) techniques is to measure the perceptual quality of an image without access to the reference image. In this letter, a novel NR-IQA measure is introduced in which quality-aware statistics are used as perceptual features for the quality prediction. In the method, the distorted image is converted to grayscale and filtered using gradient operators. Then, the speeded-up robust feature (SURF) technique is employed to detect and describe keypoints in obtained images. The SURF interest point detection method is affected by distortions in the filtered image. Therefore, it can be used to reflect the decreased attention of the human visual system caused by image distortions. In the method, statistics are calculated for processed images and their SURF descriptors. Finally, they are mapped into subjective opinion scores using a support vector regression technique. The experimental evaluation conducted on four demanding large benchmark datasets, which contain images corrupted by single and multiple distortions, demonstrates that the proposed technique outperforms the state-of-the-art NR measures.
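
The feature pipeline can be prototyped directly; since SURF is patented and shipped only with opencv-contrib, the sketch below swaps in ORB as the keypoint detector, and the statistics and SVR mapping are illustrative rather than the paper's exact feature set:

    # Sketch of the pipeline: grayscale -> gradient image -> keypoint statistics
    # -> SVR. ORB stands in for SURF (SURF requires opencv-contrib); the feature
    # statistics here are illustrative, not the paper's exact set.
    import cv2
    import numpy as np
    from sklearn.svm import SVR

    def quality_features(path):
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0)
        gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1)
        grad = np.uint8(np.clip(cv2.magnitude(gx, gy), 0, 255))
        kps = cv2.ORB_create().detect(grad, None)   # distortions suppress keypoints
        sizes = [k.size for k in kps] or [0.0]
        return [len(kps), np.mean(sizes), np.std(sizes), grad.mean(), grad.std()]

    # X = [quality_features(p) for p in train_paths]; y = subjective scores
    # model = SVR(kernel="rbf").fit(X, y); model.predict([quality_features(test)])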

Proceedings ArticleDOI
01 Nov 2017
TL;DR: An approach for automatic detection of helmeted and non-helmeted motorcyclists using convolutional neural networks (CNNs), which detects the person class instead of the motorcycle in order to increase the accuracy of helmet detection in the input image.
Abstract: Detection of helmeted and non-helmeted motorcyclists is mandatory nowadays in order to ensure the safety of riders on the road. However, due to many constraints such as poor video quality, occlusion, illumination, and other varying factors it becomes very difficult to detect them accurately. In this paper, we introduce an approach for automatic detection of helmeted and non-helmeted motorcyclists using a convolutional neural network (CNN). During the past several years, advancements in deep learning models have drastically improved the performance of object detection. One such model is YOLOv2 [1], which combines both classification and object detection in a single architecture. Here, we use YOLOv2 at two different stages, one after another, in order to improve the helmet detection accuracy. At the first stage, the YOLOv2 model is used to detect different objects in the test image. Since this model is trained on the COCO dataset, it can detect all classes of the COCO dataset. In the proposed approach, we use detection of the person class instead of the motorcycle in order to increase the accuracy of helmet detection in the input image. The cropped images of detected persons are used as input to the second YOLOv2 stage, which was trained on our dataset of helmeted images. The non-helmeted images are processed further to extract the license plate by using OpenALPR. In the proposed approach, we use two different datasets, i.e., the COCO and helmet datasets. We tested the potential of our approach on different helmeted and non-helmeted images. Experimental results show that the proposed method performs better when compared to other existing approaches, with 94.70% helmet detection accuracy.
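
The two-stage logic is independent of the particular detector; the sketch below uses a placeholder detect() stub, a hypothetical stand-in for YOLOv2 inference rather than a real API, to show the person-crop cascade:

    # Sketch of the two-stage cascade. detect() is a HYPOTHETICAL stub standing in
    # for YOLOv2 inference; it is expected to return (label, confidence, box) tuples.
    def detect(model, image):
        raise NotImplementedError("plug a real detector in here")

    def find_non_helmeted(coco_model, helmet_model, image):
        offenders = []
        # Stage 1: detect persons (rather than motorcycles) with the COCO model.
        for label, conf, (x, y, w, h) in detect(coco_model, image):
            if label != "person":
                continue
            crop = image[y:y + h, x:x + w]          # person crop for stage 2
            # Stage 2: run the helmet-trained model on the crop.
            labels = [l for l, _, _ in detect(helmet_model, crop)]
            if "helmet" not in labels:
                offenders.append((x, y, w, h))      # pass on for plate extraction
        return offenders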

Proceedings ArticleDOI
01 Jan 2017
TL;DR: The main focus of this paper is to recognize whether a given face input corresponds to a registered person in the database, using the Histogram of Oriented Gradients technique on the AT&T database.
Abstract: Face recognition is widely used in computer vision and in many other biometric applications where security is a major concern. The most common problems in recognizing a face arise due to pose variations, different illumination conditions and so on. The main focus of this paper is to recognize whether a given face input corresponds to a registered person in the database. Face recognition is done using the Histogram of Oriented Gradients (HOG) technique on the AT&T database, with the inclusion of a real-time subject to evaluate the performance of the algorithm. The feature vectors generated by the HOG descriptor are used to train Support Vector Machines (SVM), and results are verified against a given test input. The proposed method checks whether a test image in different pose and lighting conditions is matched correctly with the trained images of the facial database. The results of the proposed approach show minimal false positives and improved detection accuracy.
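
The HOG-plus-SVM pipeline is a few lines with scikit-image and scikit-learn; a minimal sketch, assuming the AT&T images are loaded as equal-size grayscale arrays with subject labels:

    # Sketch: HOG descriptors + linear SVM for face identification. X_imgs is a
    # list of equal-size grayscale face arrays, y the corresponding subject IDs.
    from skimage.feature import hog
    from sklearn.svm import SVC

    def hog_features(img):
        return hog(img, orientations=9, pixels_per_cell=(8, 8),
                   cells_per_block=(2, 2), block_norm="L2-Hys")

    def train(X_imgs, y):
        return SVC(kernel="linear").fit([hog_features(i) for i in X_imgs], y)

    def recognize(clf, test_img):
        return clf.predict([hog_features(test_img)])[0]   # predicted subject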

Posted Content
TL;DR: A framework is proposed to analyze predictions in terms of the model's internal features by inspecting information flow through the network; comparing the sets of neurons selected by two metrics suggests a way to investigate the internal attention mechanisms of convolutional neural networks.
Abstract: The predictive power of neural networks often costs model interpretability. Several techniques have been developed for explaining model outputs in terms of input features; however, it is difficult to translate such interpretations into actionable insight. Here, we propose a framework to analyze predictions in terms of the model's internal features by inspecting information flow through the network. Given a trained network and a test image, we select neurons by two metrics, both measured over a set of images created by perturbations to the input image: (1) magnitude of the correlation between the neuron activation and the network output and (2) precision of the neuron activation. We show that the former metric selects neurons that exert large influence over the network output while the latter metric selects neurons that activate on generalizable features. By comparing the sets of neurons selected by these two metrics, our framework suggests a way to investigate the internal attention mechanisms of convolutional neural networks.
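
Given activations collected over input perturbations, both selection metrics are simple statistics; the numpy sketch below uses inverse activation variance as a simplified proxy for the paper's precision measure (array shapes are illustrative):

    # Sketch: score neurons over a set of perturbed inputs by (1) correlation of
    # activation with the network output and (2) a precision proxy (inverse
    # activation variance -- a simplification of the paper's measure).
    import numpy as np

    def neuron_scores(acts, outputs):
        """acts: (n_perturbations, n_neurons); outputs: (n_perturbations,)."""
        a = acts - acts.mean(axis=0)
        o = outputs - outputs.mean()
        denom = acts.std(axis=0) * outputs.std() + 1e-12
        correlation = (a * o[:, None]).mean(axis=0) / denom   # influence on output
        precision = 1.0 / (acts.var(axis=0) + 1e-12)          # stability of firing
        return correlation, precision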

Journal ArticleDOI
01 Jun 2017-Optik
TL;DR: The experimental results show that Canny edge detection based feature extraction achieves a lower root mean square error (RMSE) than the Otsu method for detecting the grain count.

Proceedings ArticleDOI
01 Nov 2017
TL;DR: A technique is proposed to remove visible watermarks automatically using image inpainting: a statistical method detects the watermark region, and an exemplar-based inpainting algorithm that investigates the sparsity of natural image patches removes it.
Abstract: This paper introduces a technique to remove visible watermarks automatically using image inpainting algorithms. The pending images which need watermark removal are assumed to have the same resolution and watermark region, and we show this assumption is reasonable. Our proposed technique includes two basic steps. The first step is detecting the watermark region: we propose a statistical method in which a thresholding algorithm for segmentation operates on the accumulation image, calculated by accumulating the gray-scale maps of the pending images. The second step is removing the watermark using image inpainting algorithms. Since watermarks usually cover large regions, an exemplar-based inpainting algorithm that investigates the sparsity of natural image patches is proposed for this step. Experiments were carried out on a test image set of 889 images downloaded from a shopping website, with a resolution of 800∗800 and the same watermark regions.
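
Both steps map onto OpenCV primitives; the sketch below uses Otsu thresholding on the accumulation image and OpenCV's built-in inpainting as a stand-in for the paper's exemplar-based, sparsity-driven algorithm (paths are illustrative):

    # Sketch: (1) locate the shared watermark region by accumulating gray-scale
    # maps and thresholding (Otsu here); (2) remove it by inpainting. cv2.inpaint
    # stands in for the paper's exemplar-based sparse inpainting.
    import cv2
    import numpy as np

    def watermark_mask(paths):
        acc = None                        # images share resolution and watermark
        for p in paths:
            g = cv2.imread(p, cv2.IMREAD_GRAYSCALE).astype(np.float64)
            acc = g if acc is None else acc + g
        acc = cv2.normalize(acc, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
        _, mask = cv2.threshold(acc, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        return mask

    def remove_watermark(img_bgr, mask):
        return cv2.inpaint(img_bgr, mask, 3, cv2.INPAINT_TELEA)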

Journal ArticleDOI
TL;DR: This correspondence paper proposes a new approach called cascaded elastically progressive model aiming for pixel-wise landmark localization and shows advantages for accurate landmark localization compared with prevailing methods.
Abstract: While recently published face alignment algorithms have mainly focused on occlusion, low image quality, and complex head poses, subtle variances of facial components are often overlooked. In this correspondence paper, we propose a new approach called the cascaded elastically progressive model, aiming for pixel-wise landmark localization. First of all, the elastically progressive model (EPM) is designed to synthesize prior knowledge of the face shape and the appearance of the test image. More specifically, a novel framework referred to as the inherent linear structure (ILS) is explored for capturing the characteristics of the shape, which is more plastic and flexible than the extensively used principal component analysis-based modeling. A locally linear support vector machine (LL-SVM) is used as a local expert for searching candidate feature points. In order to optimally integrate the ILS with the localization results of the LL-SVM, we introduce a Kalman filter (KF) to dynamically estimate the true shape in the sense of least mean square error. Two schemes are utilized based on our modeling of the KF. First, we embed a heuristic line-like search strategy into the framework to guarantee and accelerate convergence. Second, the Kalman gain is manipulated adaptively in accordance with the confidence of the localizers, so that poorly localized points are more subject to the global constraint than well localized ones. To further improve robustness to initializations, two EPMs are cascaded, in which the primary EPM detects the global structure and the secondary EPM captures the details. Validation experiments are conducted on the in-the-wild LFPW and HELEN databases. Our method shows advantages in accurate landmark localization compared with prevailing methods.

Proceedings ArticleDOI
30 Sep 2017
TL;DR: An algorithm that provides a pixel-wise classification of building facades that integrates appearance and layout cues in a single framework and is on par with the reported performance results.
Abstract: We propose an algorithm that provides a pixel-wise classification of building facades. Building facades provide a rich environment for testing semantic segmentation techniques. They come in a variety of styles that reflect both appearance and layout characteristics. On the other hand, they exhibit a degree of stability in the arrangement of structures across different instances. We integrate appearance and layout cues in a single framework. The most likely label based on appearance is obtained through applying state-of-the-art deep convolution networks. This is further optimized through Restricted Boltzmann Machines (RBM), applied on vertical and horizontal scanlines of facade models. Learning the probability distributions of the models via the RBMs is utilized in two settings. Firstly, we use them in learning from pre-seen facade samples, in the traditional training sense. Secondly, we learn from the test image at hand, in a way that allows the transfer of visual knowledge of the scene from correctly classified areas to others. Experimentally, we are on par with the reported performance results. However, we do not explicitly specify any hand-engineered features that are architectural scene dependent, nor do we include any dataset specific heuristics/thresholds.

Patent
23 Jun 2017
TL;DR: In this paper, a radar echo extrapolation method based on a dynamic convolution neural network is proposed, which comprises a step of offline convolutional neural network training which comprises the steps of carrying out data preprocessing on a given training image set to obtain a training sample set.
Abstract: The invention discloses a radar echo extrapolation method based on a dynamic convolution neural network. The method comprises a step of offline convolutional neural network training, which comprises the steps of carrying out data preprocessing on a given training image set to obtain a training sample set, initializing a dynamic convolution neural network model, training the dynamic convolution neural network by using the training sample set, calculating an output value through network forward propagation, and updating network parameters through backward propagation such that the dynamic convolution neural network converges. The method also comprises a step of online radar echo extrapolation, which comprises the steps of converting a test image set into a test sample set through data preprocessing, testing the trained dynamic convolution neural network by using the test sample set, and convolving the last radar echo image of the input image sequence with a probability vector obtained in the network forward propagation to obtain a predicted radar echo extrapolation image.

Journal ArticleDOI
TL;DR: The experimental results demonstrate that the proposed hybrid method has substantial quality improvement, in terms of the CPSNR quality, visual effect, CPSNR-bitrate trade-off, and Bjøntegaard delta PSNR performance, of the reconstructed RGB images when compared with existing chroma subsampling schemes.
Abstract: In this paper, we propose a novel and effective hybrid method, which joins conventional chroma subsampling and distortion-minimization-based luma modification together, to improve the quality of the reconstructed RGB full-color image. Assume the input RGB full-color image has been transformed to a YUV image prior to compression. For each $2\times 2$ UV block, one 4:2:0 subsampling is applied to determine the subsampled U and V components, $U_{s}$ and $V_{s}$. Based on $U_{s}$, $V_{s}$, and the corresponding $2\times 2$ original RGB block, a main theorem is provided to determine the ideally modified $2\times 2$ luma block in constant time such that the color peak signal-to-noise ratio (CPSNR) quality distortion between the original $2\times 2$ RGB block and the reconstructed $2\times 2$ RGB block is minimized in a globally optimal sense. Furthermore, the proposed hybrid method and the delivered theorem are adjusted to tackle digital time delay integration images and Bayer mosaic images, whose Bayer CFA structure has been widely used in modern commercial digital cameras. Based on the IMAX, Kodak, and screen content test image sets, the experimental results demonstrate that in high efficiency video coding, the proposed hybrid method delivers substantial quality improvement, in terms of CPSNR quality, visual effect, CPSNR-bitrate trade-off, and Bjøntegaard delta PSNR performance, of the reconstructed RGB images when compared with existing chroma subsampling schemes.
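
The 4:2:0 step itself, one U and one V sample per $2\times 2$ block, is a small numpy reduction; the sketch below uses block averaging, one common subsampling choice, while the paper's contribution is the subsequent CPSNR-optimal luma modification:

    # Sketch: 4:2:0 chroma subsampling by 2x2 block averaging of a U or V plane.
    # The paper's contribution then modifies each 2x2 luma block, given the
    # subsampled (U_s, V_s), to minimize CPSNR distortion of the RGB block.
    import numpy as np

    def subsample_420(chroma):
        """chroma: (H, W) U or V plane with even H and W -> (H/2, W/2)."""
        h, w = chroma.shape
        blocks = chroma.reshape(h // 2, 2, w // 2, 2).astype(np.float64)
        return blocks.mean(axis=(1, 3))   # one chroma sample per 2x2 block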