
Showing papers in "IEEE Signal Processing Letters in 2018"


Journal ArticleDOI
TL;DR: The authors propose a conceptually simple and intuitive learning objective function, additive margin softmax, for face verification, introducing a margin into the softmax loss that is more intuitive and interpretable than the angular margin.
Abstract: In this letter, we propose a conceptually simple and intuitive learning objective function, i.e., additive margin softmax, for face verification. In general, face verification tasks can be viewed as metric learning problems, even though many face verification models are trained in classification schemes. This becomes possible when a large-margin strategy is introduced into the classification model to encourage intraclass variance minimization. As one alternative, angular softmax has been proposed to incorporate the margin. In this letter, we introduce another kind of margin to the softmax loss function, which is more intuitive and interpretable. Experiments on LFW and MegaFace show that our algorithm performs better when the evaluation criteria are designed for a very low false alarm rate.
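The additive margin simply shifts the target-class cosine before the softmax. A minimal per-sample sketch (the hyperparameter values s and m below are typical choices from the literature, not necessarily the paper's exact settings):

```python
import math

def am_softmax_loss(cosines, target, s=30.0, m=0.35):
    """Additive margin softmax loss for one sample (illustrative sketch).

    cosines: list of cos(theta_j) between the feature and each class weight.
    target:  index of the ground-truth class.
    """
    # Subtract the additive margin m from the target-class cosine only.
    logits = [s * (c - m) if j == target else s * c
              for j, c in enumerate(cosines)]
    # Numerically stable log-sum-exp for the softmax normalizer.
    mx = max(logits)
    log_z = mx + math.log(sum(math.exp(z - mx) for z in logits))
    return log_z - logits[target]
```

With m = 0 this reduces to the ordinary scaled softmax cross-entropy; a positive m forces the target cosine to exceed the others by the margin before the loss becomes small.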

936 citations


Journal ArticleDOI
TL;DR: A three-dimensional attention-based convolutional recurrent neural network is proposed to learn discriminative features for SER, where the Mel-spectrogram with deltas and delta-deltas is used as input.
Abstract: Speech emotion recognition (SER) is a difficult task due to the complexity of emotions. SER performance depends heavily on the effectiveness of the emotional features extracted from speech. However, most emotional features are sensitive to emotionally irrelevant factors, such as the speaker, speaking style, and environment. In this letter, we assume that calculating the deltas and delta-deltas of personalized features not only preserves the effective emotional information but also reduces the influence of emotionally irrelevant factors, thereby reducing misclassification. In addition, SER often suffers from silent frames and emotionally irrelevant frames. Meanwhile, attention mechanisms have exhibited outstanding performance in learning relevant feature representations for specific tasks. Inspired by this, we propose a three-dimensional attention-based convolutional recurrent neural network to learn discriminative features for SER, where the Mel-spectrogram with deltas and delta-deltas is used as input. Experiments on the IEMOCAP and Emo-DB corpora demonstrate the effectiveness of the proposed method, which achieves state-of-the-art performance in terms of unweighted average recall.
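The deltas mentioned above are conventionally computed with the standard regression formula over a small window; a scalar sketch (the window size N = 2 is a common default, not necessarily the letter's configuration):

```python
def deltas(frames, N=2):
    """Delta coefficients via the standard regression formula.

    frames: list of per-frame feature values (scalars here for clarity);
    edge frames are replicated so indices stay in range.
    """
    T = len(frames)
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = []
    for t in range(T):
        acc = 0.0
        for n in range(1, N + 1):
            right = frames[min(t + n, T - 1)]
            left = frames[max(t - n, 0)]
            acc += n * (right - left)
        out.append(acc / denom)
    return out
```

Delta-deltas are simply `deltas(deltas(frames))`; for a linear ramp the interior deltas recover the slope, and for a constant signal they vanish, which is the sense in which they suppress slowly varying, emotion-irrelevant offsets.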

354 citations


Journal ArticleDOI
TL;DR: The authors propose a weakly supervised color transfer method to correct color distortion, which relaxes the need for paired underwater images for training and allows the underwater images to be taken in unknown locations.
Abstract: Underwater vision suffers from severe degradation due to selective attenuation and scattering when light propagates through water. Such degradation not only affects the quality of underwater images but also limits the ability of vision tasks. Different from existing methods that either ignore the wavelength dependence of the attenuation or assume a specific spectral profile, we tackle the color distortion problem of underwater images from a new view. In this letter, we propose a weakly supervised color transfer method to correct color distortion. The proposed method relaxes the need for paired underwater images for training and allows the underwater images to be taken in unknown locations. Inspired by cycle-consistent adversarial networks, we design a multiterm loss function including an adversarial loss, a cycle consistency loss, and a structural similarity index measure loss, which keeps the content and structure of the outputs the same as those of the inputs while making the color similar to images taken without the water. Experiments on underwater images captured under diverse scenes show that our method produces visually pleasing results and even outperforms state-of-the-art methods. Besides, our method can improve the performance of vision tasks.

308 citations


Journal ArticleDOI
TL;DR: Thorough evaluation on the LFW and SCface databases shows that the proposed DCR model achieves consistently and considerably better performance than the state of the art.
Abstract: Face images captured by surveillance cameras are often of low resolution (LR), which adversely affects the performance of their matching with high-resolution (HR) gallery images. Existing methods, including super resolution, coupled mappings (CMs), multidimensional scaling, and convolutional neural networks, yield only modest performance. In this letter, we propose the deep coupled ResNet (DCR) model. It consists of one trunk network and two branch networks. The trunk network, trained by face images of three significantly different resolutions, is used to extract discriminative features robust to the resolution change. The two branch networks, trained by HR images and images of the targeted LR, work as resolution-specific CMs to transform HR and corresponding LR features to a space where their difference is minimized. Model parameters of the branch networks are optimized using our proposed CM loss function, which considers not only the discriminability of the HR and LR features but also the similarity between them. In order to deal with various possible resolutions of probe images, we train multiple pairs of small branch networks while using the same trunk network. Thorough evaluation on the LFW and SCface databases shows that the proposed DCR model achieves consistently and considerably better performance than the state of the art.

202 citations


Journal ArticleDOI
TL;DR: Simulation results demonstrate that the proposed array interpolation-based DOA estimation algorithm achieves improved performance compared to existing coarray-based DOA estimation algorithms in terms of the number of achievable degrees of freedom and estimation accuracy.
Abstract: In this letter, we propose a coprime array interpolation approach to provide off-grid direction-of-arrival (DOA) estimation. Through array interpolation, a uniform linear array (ULA) with the same aperture is generated from the deterministic non-uniform coprime array. Taking the observed correlations calculated from the signals received at the coprime array, a gridless convex optimization problem is formulated to recover all the rows and columns of the unknown correlation matrix entries corresponding to the interpolated sensors. The optimized Hermitian positive semidefinite Toeplitz matrix serves as the covariance matrix of the interpolated ULA, which makes it possible to resolve off-grid sources. Simulation results demonstrate that the proposed array interpolation-based DOA estimation algorithm achieves improved performance compared to existing coarray-based DOA estimation algorithms in terms of the number of achievable degrees of freedom and estimation accuracy.
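The motivation for interpolation is visible in the difference coarray of a coprime array, which contains holes that the virtual ULA must fill. A small sketch (the M = 2, N = 3 sensor placement below follows one common coprime-array convention; conventions vary):

```python
def difference_coarray(positions):
    """All pairwise sensor-position differences (the difference coarray)."""
    return sorted({p - q for p in positions for q in positions})

def holes(positions):
    """Missing lags inside the coarray's span; interpolation must fill these."""
    co = difference_coarray(positions)
    present = set(co)
    return [u for u in range(co[0], co[-1] + 1) if u not in present]

# Coprime array with M = 2, N = 3: union of two uniform subarrays.
coprime = sorted({2 * n for n in range(3)} | {3 * m for m in range(4)})
```

For this geometry the coarray misses lags ±8, which is exactly the kind of gap the interpolated ULA's recovered covariance rows and columns account for.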

185 citations


Journal ArticleDOI
TL;DR: In this article, the authors propose the use of temporal convolution, in the form of time-delay neural network (TDNN) layers, along with unidirectional LSTM layers to limit the latency to 200 ms.
Abstract: Bidirectional long short-term memory (BLSTM) acoustic models provide a significant word error rate reduction compared to their unidirectional counterpart, as they model both the past and future temporal contexts. However, it is nontrivial to deploy bidirectional acoustic models for online speech recognition due to an increase in latency. In this letter, we propose the use of temporal convolution, in the form of time-delay neural network (TDNN) layers, along with unidirectional LSTM layers to limit the latency to 200 ms. This architecture has been shown to outperform the state-of-the-art low frame rate (LFR) BLSTM models. We further improve these LFR BLSTM acoustic models by operating them at higher frame rates at lower layers and show that the proposed model performs similarly to these mixed frame rate BLSTMs. We present results on the Switchboard 300 h LVCSR task and the AMI LVCSR task in three microphone conditions.
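A TDNN layer is simply a temporal convolution over a few frame offsets; a minimal scalar sketch (purely illustrative, not the paper's architecture or layer sizes):

```python
def tdnn_layer(frames, weights, offsets=(-2, 0, 2)):
    """One time-delay (temporal convolution) layer over scalar frames.

    Each output frame is a weighted sum of input frames at the given
    offsets, with edge frames replicated. The bounded right offset is
    what keeps the look-ahead, and hence the latency, finite.
    """
    T = len(frames)
    out = []
    for t in range(T):
        val = 0.0
        for w, d in zip(weights, offsets):
            val += w * frames[min(max(t + d, 0), T - 1)]
        out.append(val)
    return out
```

Stacking such layers widens the temporal context additively while the per-layer look-ahead stays fixed, which is how a TDNN front end bounds latency where a BLSTM cannot.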

181 citations


Journal ArticleDOI
TL;DR: A multi-scale capsule network that is more robust and efficient for feature representation in image classification is proposed, achieving competitive performance on the FashionMNIST and CIFAR10 datasets.
Abstract: The capsule network is a novel architecture to encode the properties and spatial relationships of features in an image, and it shows encouraging results on image classification. However, the original capsule network is not suitable for some classification tasks in which the target objects have complex internal representations. Hence, we propose a multi-scale capsule network that is more robust and efficient for feature representation in image classification. The proposed multi-scale capsule network consists of two stages. In the first stage, structural and semantic information is obtained by multi-scale feature extraction. In the second stage, the hierarchy of features is encoded into multi-dimensional primary capsules. Moreover, we propose an improved dropout to enhance the robustness of the capsule network. Experimental results show that our method achieves competitive performance on the FashionMNIST and CIFAR10 datasets.

181 citations


Journal ArticleDOI
TL;DR: In this article, two iterative algorithms based on Dinkelbach's method and Newton's method are proposed to minimize the offloading delay for nonorthogonal multiple access assisted mobile edge computing (NOMA-MEC).
Abstract: This letter considers the minimization of the offloading delay for nonorthogonal multiple access assisted mobile edge computing (NOMA-MEC). By transforming the delay minimization problem into a fractional programming form, two iterative algorithms based on Dinkelbach's method and Newton's method, respectively, are proposed. The optimality of both methods is proved and their convergence is compared. Furthermore, criteria for choosing among three possible modes, namely orthogonal multiple access, pure NOMA, and hybrid NOMA, for MEC offloading are established.
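Dinkelbach's method for a generic fractional program min f(x)/g(x), g(x) > 0, alternates between solving a parametric subproblem and updating the ratio. A toy sketch (not the NOMA-MEC-specific algorithm; the subproblem solver below is hand-derived for the toy objective):

```python
def dinkelbach(num, den, argmin_sub, lam0=0.0, iters=50):
    """Dinkelbach's method for min num(x)/den(x) with den(x) > 0.

    argmin_sub(lam) must return a minimizer of num(x) - lam * den(x).
    """
    lam = lam0
    for _ in range(iters):
        x = argmin_sub(lam)      # solve the parametric subproblem
        lam = num(x) / den(x)    # update the fractional objective value
    return lam, x

# Toy example: minimize (x^2 + 1) / x over x > 0 (optimum 2 at x = 1).
# The subproblem min x^2 + 1 - lam * x has minimizer x = lam / 2,
# clipped away from zero to stay in the feasible region.
lam_star, x_star = dinkelbach(lambda v: v * v + 1,
                              lambda v: v,
                              lambda lam: max(lam / 2, 1e-6))
```

The iterates converge superlinearly once near the optimum, which is the convergence behavior the letter compares against Newton's method.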

140 citations


Journal ArticleDOI
TL;DR: A new convolutional neural network inspired by the classical BM3D algorithm, dubbed BM3D-Net, is proposed by unrolling the computational pipeline of BM3D into a convolutional neural network structure, with “extraction” and “aggregation” layers to model the block-matching stage in BM3D.
Abstract: Denoising is a fundamental task in image processing with wide applications for enhancing image quality. BM3D is considered an effective baseline for image denoising. Although learning-based methods have been dominant in this area recently, the traditional methods are still valuable to inspire new ideas when combined with learning-based approaches. In this letter, we propose a new convolutional neural network inspired by the classical BM3D algorithm, dubbed BM3D-Net. We unroll the computational pipeline of the BM3D algorithm into a convolutional neural network structure, with “extraction” and “aggregation” layers to model the block-matching stage in BM3D. We apply our network to three denoising tasks: gray-scale image denoising, color image denoising, and depth map denoising. Experiments show that BM3D-Net significantly outperforms the basic BM3D method and achieves competitive results compared with the state of the art on these tasks.

139 citations


Journal ArticleDOI
TL;DR: A deep-CNN that is immediately adjustable to the noise level of the input image is proposed, together with an optimization method that tunes the proportionality coefficients for the soft-shrinkage thresholds for various noise levels simultaneously.
Abstract: The noise level of an image depends on the settings of the imaging device, and these settings can be used to select appropriate parameters for denoising methods. But denoising methods based on deep convolutional neural networks (deep-CNNs) do not have such adjustable parameters. Therefore, a deep-CNN whose training data contain limited levels of noise does not effectively restore images whose noise level differs from the training data. If the range of noise levels in the training data is extended to solve this problem, the maximum performance of the resulting deep-CNN is limited. To resolve this tradeoff, we propose a deep-CNN that is immediately adjustable to the noise level of the input image. We use soft shrinkage as the activation function of our deep-CNN, with thresholds proportional to the noise level given by the user. We also propose an optimization method for the proportionality coefficients of the soft-shrinkage thresholds, which optimizes the coefficients for various noise levels simultaneously. In our experiment on a test set with noise levels from 5 to 50, the proposed method achieved higher PSNR than a conventional method using a single deep-CNN, and PSNR comparable to that of a conventional method using multiple noise-level-specific CNNs.
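The noise-proportional soft shrinkage at the heart of the method can be sketched as follows (the coefficient k here is a hand-set stand-in for the learned proportionality coefficient):

```python
def soft_shrink(x, sigma, k=1.0):
    """Soft shrinkage with a threshold proportional to the noise level.

    t = k * sigma: values within [-t, t] are zeroed, values outside are
    pulled toward zero by t. Scaling t with sigma is what makes a single
    network adjustable to the input's noise level.
    """
    t = k * sigma
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0
```

Because the user-supplied sigma enters only through the threshold, the same trained weights serve every noise level, which is the tradeoff resolution the abstract describes.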

133 citations


Journal ArticleDOI
TL;DR: Two disturbance terms, which account for distortion once auditory masking and threshold effects are factored in, amend the mean square error (MSE) loss function by introducing perceptual criteria based on human psychoacoustics.
Abstract: This letter proposes a perceptual metric for speech quality evaluation, which is suitable, as a loss function, for training deep learning methods. This metric, derived from the perceptual evaluation of speech quality algorithm, is computed on a per-frame basis from the power spectra of the reference and processed speech signals. Two disturbance terms, which account for distortion once auditory masking and threshold effects are factored in, amend the mean square error (MSE) loss function by introducing perceptual criteria based on human psychoacoustics. The proposed loss function is evaluated for noisy speech enhancement with deep neural networks. Experimental results show that our metric achieves significant gains in speech quality (evaluated using an objective metric and a listening test) compared to using MSE or other perceptual-based loss functions from the literature.

Journal ArticleDOI
TL;DR: Analytical fusion rules are provided for the labeled multi-Bernoulli and marginalized $\delta$-generalized labeled multi-Bernoulli families of labeled multiobject densities.
Abstract: This letter proposes analytical expressions for the fusion of certain classes of labeled multiobject densities via Kullback–Leibler averaging. Specifically, we provide analytical fusion rules for the labeled multi-Bernoulli and marginalized $\delta$-generalized labeled multi-Bernoulli families of labeled multiobject densities. Information fusion via Kullback–Leibler averaging ensures immunity to double counting of information and is essential to the development of effective multiagent multiobject estimation.

Journal ArticleDOI
TL;DR: This letter introduces the LOOP binary descriptor (local optimal-oriented pattern) that encodes rotation invariance into the main formulation itself, which makes any post-processing stage for rotation invariance redundant and improves both accuracy and time complexity.
Abstract: This letter introduces the LOOP binary descriptor (local optimal-oriented pattern), which encodes rotation invariance into the main formulation itself. This makes any post-processing stage for rotation invariance redundant and improves both accuracy and time complexity. We consider fine-grained lepidoptera (moth/butterfly) species recognition as the representative problem, since it involves repetition of localized patterns and textures that may be exploited for discrimination. We evaluate the performance of LOOP against its predecessors as well as a few other popular descriptors. Besides experiments on standard benchmarks, we also introduce a new small image dataset of NZ Lepidoptera. LOOP performs as well as or better than previous binary descriptors on all datasets evaluated. The new dataset and demo code of the proposed method are available through the lead author's academic webpage and GitHub.

Journal ArticleDOI
TL;DR: Experiments show that the proposed MSF-CNN method is superior to multiple state-of-the-art plant leaf recognition methods on the MalayaKew Leaf dataset and the LeafSnap Plant Leaf dataset.
Abstract: Plant leaf recognition is a computer vision task used to automatically recognize plant species. It is very challenging owing to the rich morphological variations of plant leaves, such as size, texture, shape, venation, and so on. Most existing plant leaf recognition methods typically normalize all plant leaf images to the same size and recognize them at a single scale, resulting in unsatisfactory performance. In this letter, a multiscale fusion convolutional neural network (MSF-CNN) is proposed for plant leaf recognition at multiple scales. First, an input image is down-sampled into multiple low-resolution images with a series of bilinear interpolation operations. Second, these input images at different scales are fed step by step into the MSF-CNN architecture to learn discriminative features at different depths. At this stage, feature fusion between two different scales is realized by a concatenation operation, which concatenates feature maps learned on different scale images along the channel dimension; along the depth of the MSF-CNN, multiscale images are progressively handled and the corresponding features are fused. Third, the last layer of the MSF-CNN aggregates all discriminative information to obtain the final feature for predicting the plant species of the input image. Experiments show that the proposed MSF-CNN method is superior to multiple state-of-the-art plant leaf recognition methods on the MalayaKew Leaf dataset and the LeafSnap Plant Leaf dataset.

Journal ArticleDOI
TL;DR: A new CNN for steganalysis is designed to better capture embedding artifacts, processing information diversely with a diverse activation module and using a wide structure with parallel subnets built from several filter groups for preprocessing, in order to detect content-adaptive steganographic schemes.
Abstract: Recent steganalytic schemes reveal embedding traces in a promising way by using convolutional neural networks (CNNs). However, further improvements, such as exploring complementary data processing operations and using wider structures, have not been extensively studied so far. In this letter, we design a new CNN in both of these aspects in order to better capture embedding artifacts. Specifically, on the one hand, we propose to process information diversely with a module called the diverse activation module. On the other hand, we build a wide structure with parallel subnets using several filter groups for preprocessing. To accelerate the training process, we pretrain the subnets independently. Extensive experiments show that the proposed method is effective in detecting content-adaptive steganographic schemes.

Journal ArticleDOI
Juntae Kim1, Minsoo Hahn1
TL;DR: This letter improves the use of context information by using an adaptive context attention model (ACAM) with a novel training strategy for effective attention, which weights the most crucial parts of the context for proper classification.
Abstract: Voice activity detection (VAD) classifies incoming signal segments into speech or background noise; its performance is crucial in various speech-related applications. Although speech-signal context is a relevant VAD asset, its usefulness varies in unpredictable noise environments. Therefore, its usage should be adaptively adjustable to the noise type. This letter improves the use of context information by using an adaptive context attention model (ACAM) with a novel training strategy for effective attention, which weights the most crucial parts of the context for proper classification. Experiments in real-world scenarios demonstrate that the proposed ACAM-based VAD outperforms the other baseline VAD methods.

Journal ArticleDOI
TL;DR: A new identification criterion is proposed that guarantees the recovery of the low-rank latent factors in the nonnegative matrix factorization (NMF) generative model, under mild conditions.
Abstract: In this letter, we propose a new identification criterion that guarantees the recovery of the low-rank latent factors in the nonnegative matrix factorization (NMF) generative model under mild conditions. Specifically, under the proposed criterion, the latent factors are identifiable if the rows of one factor are sufficiently scattered over the nonnegative orthant, while no structural assumption is imposed on the other factor except being full rank. This is by far the mildest condition under which the latent factors are provably identifiable from the NMF model.

Journal ArticleDOI
TL;DR: This letter proposes an efficient linear algorithm to solve the nonlinear equations, gives a closed-form positioning and synchronization error analysis, and shows that the proposed method can approach the Cramér–Rao lower bound by both theoretical analysis and simulation.
Abstract: For the purpose of localization and time synchronization of underwater sensor networks, buoys are generally distributed on the sea surface of the area of interest, serving as fixed anchors. However, this method is not economical and has poor scalability. An alternative is to employ an autonomous underwater vehicle (AUV) as a mobile anchor. By receiving the periodic broadcast signals from the AUV, any sensor in the communication range can measure the time of arrival of received packets and obtain a series of nonlinear equations. In this letter, we propose an efficient linear algorithm to solve the nonlinear equations and give a closed-form positioning and synchronization error analysis. Furthermore, we show that the proposed method can approach the Cramér–Rao lower bound by both theoretical analysis and simulation.

Journal ArticleDOI
TL;DR: A paradigm for nonlocal sparsity reinforced deep convolutional neural network denoising using standard pretrained CNNs together with standard nonlocal filters is introduced, exploiting the mutual similarities between groups of patches.
Abstract: We introduce a paradigm for nonlocal sparsity reinforced deep convolutional neural network denoising. It is a combination of a local multiscale denoising by a convolutional neural network (CNN) based denoiser and a nonlocal denoising based on a nonlocal filter (NLF), exploiting the mutual similarities between groups of patches. CNN models are leveraged with noise levels that progressively decrease at every iteration of our framework, while their output is regularized by a nonlocal prior implicit within the NLF. Unlike complicated neural networks that embed the nonlocality prior within the layers of the network, our framework is modular, and it uses standard pretrained CNNs together with standard nonlocal filters. An instance of the proposed framework, called NN3D, is evaluated over large grayscale image datasets showing state-of-the-art performance.

Journal ArticleDOI
TL;DR: The proposed EB-ZCPs have an aperiodic autocorrelation sum (AACS) magnitude of 4 outside the ZCZ region (except for the last time shift, which takes an AACS value of zero).
Abstract: This letter is focused on increasing the zero correlation zone (ZCZ) of even-length binary Z-complementary pairs (EB-ZCPs). To date, the maximum ZCZ ratio (i.e., ZCZ width over the sequence length) for systematically constructed EB-ZCPs is $2/3$ . In this letter, we give a construction of EB-ZCPs with lengths $2^{\alpha +2} 10^\beta 26^\gamma +2$ (where $\alpha$ , $\beta$ , and $\gamma$ are nonnegative integers) and ZCZ widths $3 \times 2^\alpha 10^\beta 26^\gamma +1$ , thus achieving an asymptotic ZCZ ratio of $3/4$ . The proposed EB-ZCPs are constructed via proper insertion of concatenated odd-length binary ZCPs. The ZCZ width is proved by exploiting several newly identified intrinsic structural properties of binary Golay complementary pairs obtained from Turyn's method. The proposed EB-ZCPs have an aperiodic autocorrelation sum (AACS) magnitude of 4 outside the ZCZ region (except for the last time shift, which takes an AACS value of zero).
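The AACS property is easy to check numerically. A sketch using a length-4 binary Golay complementary pair, for which the AACS vanishes at every nonzero shift (the ideal case that Z-complementary pairs relax to a zone):

```python
def acorr(seq, u):
    """Aperiodic autocorrelation of seq at nonnegative shift u."""
    return sum(a * b for a, b in zip(seq, seq[u:]))

def aacs(a, b, u):
    """Aperiodic autocorrelation sum of the pair (a, b) at shift u."""
    return acorr(a, u) + acorr(b, u)

# A classical length-4 binary Golay complementary pair.
a, b = [1, 1, 1, -1], [1, 1, -1, 1]
```

At shift 0 the AACS equals twice the sequence length (here 8); an EB-ZCP only requires it to vanish inside the zero correlation zone, with bounded magnitude outside.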

Journal ArticleDOI
TL;DR: A semisupervised learning road detection method based on generative adversarial networks (GANs) and a weakly supervised learning (WSL) method based on conditional GANs are proposed, which can learn better representations of road areas and leverage the feature distributions of both labeled and unlabeled data.
Abstract: Road detection is a key component of autonomous driving; however, most fully supervised learning road detection methods suffer from either insufficient training data or the high cost of manual annotation. To overcome these problems, we propose a semisupervised learning (SSL) road detection method based on generative adversarial networks (GANs) and a weakly supervised learning (WSL) method based on conditional GANs. Specifically, in our SSL method, the generator generates the road detection results of labeled and unlabeled images, which are then fed into the discriminator, which assigns a label to each input to judge whether it is labeled. Additionally, in the WSL method we add another network to predict the road shapes of input images and use them in both the generator and discriminator to constrain the learning progress. By training under these frameworks, the discriminators can guide a latent annotation process on the unlabeled data; therefore, the networks can learn better representations of road areas and leverage the feature distributions of both labeled and unlabeled data. The experiments are carried out on the KITTI ROAD benchmark, and the results show that our methods achieve state-of-the-art performance.

Journal ArticleDOI
TL;DR: A new robust Kalman filter based on a detect-and-reject idea is developed that outperforms several recent robust solutions with higher computational efficiency and better accuracy.
Abstract: We consider the nonlinear robust filtering problem where the measurements are partially disturbed by outliers. A new robust Kalman filter based on a detect-and-reject idea is developed. To identify and exclude outliers automatically, each measurement is assigned an indicator variable, which is modeled by a beta-Bernoulli prior. The mean-field variational Bayesian method is then utilized to estimate the state of interest as well as the indicator in an iterative manner at each time instant. Simulation results reveal that the proposed algorithm outperforms several recent robust solutions with higher computational efficiency and better accuracy.

Journal ArticleDOI
TL;DR: This letter presents an SR method to obtain light-field images with high quality and geometric consistency via a combined framework of deep convolutional neural networks (CNNs), in which an epipolar plane image enhancement deep CNN restores the geometric consistency of the images.
Abstract: Light-field cameras can capture spatial and angular information of light with a single exposure, but they suffer from low spatial resolution, which limits their performance in practical use. Superresolution (SR) methods have been used to improve the spatial resolution of light-field images, but most of them do not take full advantage of the particular structure of the light field. In this letter, we present an SR method to obtain light-field images with high quality and geometric consistency via a combined framework of deep convolutional neural networks (CNNs). The spatial resolution of subaperture images is enhanced separately by a single-image superresolution deep CNN. Then, an epipolar plane image enhancement deep CNN is proposed to restore the geometric consistency of these images. Experimental results show that our method achieves state-of-the-art performance on both quantitative and qualitative evaluations.

Journal ArticleDOI
TL;DR: This letter proposes a more discriminative MDS method to learn a mapping matrix, which projects the HR images and LR images to a common subspace, and adds an interclass constraint to enlarge the distances of different subjects in the subspace to ensure discriminability.
Abstract: Face images captured by surveillance videos usually have limited resolution. Due to resolution mismatch, it is hard to match high-resolution (HR) faces with low-resolution (LR) faces directly. Recently, multidimensional scaling (MDS) has been employed to solve the problem. In this letter, we propose a more discriminative MDS method to learn a mapping matrix, which projects the HR images and LR images to a common subspace. Our method is discriminative since both interclass distances and intraclass distances are taken into consideration. We add an interclass constraint to enlarge the distances between different subjects in the subspace to ensure discriminability. Besides, we consider not only the relationship of HR–LR images, but also the relationships of HR–HR images and LR–LR images in order to preserve local consistency. Experimental results on the FERET, Multi-PIE, and SCface databases demonstrate the effectiveness of our proposed approach.

Journal ArticleDOI
TL;DR: This letter proposes a novel feature weighting and regularization (FWR) method that utilizes all CSP features to avoid information loss and demonstrates that the proposed FWR method enhances classification accuracy compared to conventional feature selection approaches.
Abstract: Electroencephalography signals have very low spatial resolution, and electrodes capture signals that overlap each other. To extract discriminative features and alleviate the overfitting problem for motor imagery brain-computer interfaces (BCIs), spatial filtering is widely applied, but often only very few common spatial patterns (CSP) are selected as features while all others are ignored. However, using only a few CSP features, though it alleviates the overfitting problem, loses discriminative information, which limits BCI performance. This letter proposes a novel feature weighting and regularization (FWR) method that utilizes all CSP features to avoid information loss. The proposed method can be applied in all CSP-based approaches. Experiments in this letter show the effect of the proposed method applied to the standard CSP and its two extensions, common spatio-spectral patterns and regularized CSP. Results on BCI Competition III Dataset IIIa and BCI Competition IV Dataset IIa demonstrate that the proposed FWR method enhances classification accuracy compared to conventional feature selection approaches.

Journal ArticleDOI
Dongkyu Kim1, Han-Ul Jang1, Seung-Min Mun1, Sunghee Choi1, Heung-Kyu Lee1 
TL;DR: This work presents a median filtering anti-forensic method based on deep convolutional neural networks, which can effectively remove traces from median filtered images and adopts the framework of generative adversarial networks to generate images that follow the underlying statistics of unaltered images, significantly enhancing forensic undetectability.
Abstract: Median filtering is used as an anti-forensic technique to erase the processing history of some image manipulations such as JPEG compression, resampling, etc. Thus, various detectors have been proposed to detect median filtered images. To counter these techniques, several anti-forensic methods have been devised as well. However, restoring a median filtered image is a typical ill-posed problem, and thus it is still difficult to reconstruct an image visually close to the original. It is even harder to make the restored image match the statistical characteristics of the raw image for anti-forensic purposes. To solve this problem, we present a median filtering anti-forensic method based on deep convolutional neural networks, which can effectively remove traces from median filtered images. We adopt the framework of generative adversarial networks to generate images that follow the underlying statistics of unaltered images, significantly enhancing forensic undetectability. Through extensive experiments, we demonstrate that our method successfully deceives existing median filtering forensic techniques.

Journal ArticleDOI
TL;DR: Experimental results show that the proposed filter is more robust to the regularization parameter and can produce visually pleasing output images.
Abstract: Although guided image filtering (GIF) has an excellent edge-preserving property, it is prone to halo artifacts near edges. Weighted GIF and gradient-domain GIF try to address the problem by incorporating an edge-aware weighting into GIF. However, they are very sensitive to the regularization parameter, and the halo artifacts become serious as the regularization parameter increases. Moreover, noise in the background is often amplified because of the fixed amplification factor for the detail layer. In this letter, an effective GIF is proposed for better contrast enhancement. First, the average of the local variances of all pixels is incorporated into the cost function of GIF to preserve edges accurately in the base layer. Second, the amplification factor for the detail layer is calculated in a content-adaptive way to suppress noise while boosting fine details. Experimental results show that the proposed filter is more robust to the regularization parameter and produces visually pleasing output images. Compared to GIF and its related filters, halo artifacts and noise are significantly reduced by the proposed filter.
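The base/detail decomposition with a content-adaptive detail gain can be illustrated with a simplified sketch. A plain box filter stands in for the guided filter, and the variance-driven gain rule is a hypothetical one (flat, noise-prone regions get a gain near 1; textured regions approach the maximum gain k); the letter's actual amplification factor may differ:

```python
import numpy as np

def box_mean(x, r=2):
    """Mean over a (2r+1) x (2r+1) window, edge-padded (brute force)."""
    h, w = x.shape
    p = np.pad(x, r, mode='edge')
    out = np.empty((h, w), dtype=float)
    for i in range(h):
        for j in range(w):
            out[i, j] = p[i:i + 2 * r + 1, j:j + 2 * r + 1].mean()
    return out

def enhance(img, r=2, k=3.0):
    """Content-adaptive detail boosting on a base/detail decomposition."""
    base = box_mean(img, r)
    detail = img - base
    # Local variance decides how much detail amplification each pixel gets.
    local_var = np.maximum(box_mean(img * img, r) - base * base, 0.0)
    gain = 1.0 + (k - 1.0) * local_var / (local_var + local_var.mean() + 1e-8)
    return base + gain * detail
```

With a fixed gain, flat noisy regions would be amplified just as much as real texture; the adaptive rule keeps their gain close to 1.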

Journal ArticleDOI
TL;DR: A novel method for estimating the transmission map by energy minimization is proposed to solve the underconstrained problem of single image dehazing, which combines the dark channel prior with piecewise smoothness.
Abstract: Hazy images have limited visibility and low contrast. The degradation is described by the transmission map, one of the most important quantities estimated in single image dehazing. Transmission map estimation is an underconstrained problem, and many priors have been proposed. Among them, the dark channel prior is widely recognized. However, traditional methods have not fully exploited its power due to improper assumptions or operations, which cause unwanted artifacts. The postrefinement algorithms employed to remove these artifacts in turn undermine the merits of the prior. In this letter, a novel method for estimating the transmission map by energy minimization is proposed to solve this problem. The energy function combines the dark channel prior with piecewise smoothness. The method is compared to the state-of-the-art methods and shows outstanding performance.
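The dark channel prior data term can be sketched as follows; the letter's contribution is to embed it in an energy function with a piecewise-smoothness term, which this minimal sketch (with an assumed patch size and the customary omega = 0.95) omits:

```python
import numpy as np

def dark_channel(img, patch=3):
    """Minimum over color channels, then over a local patch."""
    h, w, _ = img.shape
    mins = img.min(axis=2)
    r = patch // 2
    p = np.pad(mins, r, mode='edge')
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = p[i:i + patch, j:j + patch].min()
    return out

def coarse_transmission(img, A, omega=0.95, patch=3):
    """Data-term estimate t = 1 - omega * dark_channel(I / A),
    where A is the (scalar, for simplicity) atmospheric light."""
    return 1.0 - omega * dark_channel(img / A, patch)
```

A uniformly white (fully hazed) image yields t = 1 − omega everywhere, while a haze-free image with dark pixels yields t close to 1, matching the prior's intuition.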

Journal ArticleDOI
TL;DR: Experiments on real-world Jilin-1 video satellite images and Kaggle Open Source Dataset show that the proposed PECNN outperforms the state-of-the-art methods both in visual effects and quantitative metrics.
Abstract: Deep convolutional neural networks (CNNs) have been extensively applied to image or video processing and analysis tasks. For single-image superresolution (SR) processing, previous CNN-based methods have led to significant improvements, when compared to the shallow learning-based methods. However, these CNN-based algorithms with simply direct or skip connections are not suitable for satellite imagery SR because of complex imaging conditions and unknown degradation process. More importantly, they ignore the extraction and utilization of the structural information in satellite images, which is very unfavorable for video satellite imagery SR with such characteristics as small ground targets, weak textures, and over-compression distortion. To this end, this letter proposes a novel progressively enhanced network for satellite image SR called PECNN, which is composed of a pretraining CNN-based network and an enhanced dense connection network. The pretraining part is used to extract the low-level feature maps and reconstructs a basic high-resolution image from the low-resolution input. In particular, we propose a transition unit to obtain the structural information from the base output. Then, the obtained structural information and the extracted low-level feature maps are transmitted to the enhanced network for further extraction to enforce the feature expression. Finally, a residual image with enhanced fine details obtained from the dense connection network is used to enrich the basic image for the ultimate SR output. Experiments on real-world Jilin-1 video satellite images and Kaggle Open Source Dataset show that the proposed PECNN outperforms the state-of-the-art methods both in visual effects and quantitative metrics. Code is available at https://github.com/kuihua/PECNN

Journal ArticleDOI
TL;DR: In this letter, a novel construction of Z-complementary sequence (ZCS) sets is proposed based on generalized Boolean functions and the constructed ZCS set is optimal since the set size achieves the theoretical upper bound.
Abstract: In this letter, a novel construction of Z-complementary sequence (ZCS) sets is proposed based on generalized Boolean functions. The constructed ZCS set is optimal since the set size achieves the theoretical upper bound. In addition, the proposed construction is a direct construction without the aid of other special sequences. In this letter, the peak-to-average power ratio (PAPR) property of the constructed ZCS sets is also investigated. Furthermore, the set sizes, flock sizes, sequence lengths, and the widths of zero correlation zones of the constructed ZCS sets are all very flexible.
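The defining property being constructed can be verified directly: a complementary set has a zero correlation zone (ZCZ) of width Z if the aperiodic autocorrelations, summed over the set, vanish for all shifts 0 < u < Z. The pure-Python check below demonstrates this property on a classical length-4 Golay pair (the special case where Z equals the sequence length); it illustrates the property only, not the letter's Boolean-function construction:

```python
def acorr(seq, shift):
    """Aperiodic autocorrelation of seq at a nonnegative shift."""
    return sum(seq[i] * seq[i + shift] for i in range(len(seq) - shift))

def zcz_width(seqs):
    """Width of the zero correlation zone of a sequence set: the largest Z
    such that the summed autocorrelations vanish for all 0 < u < Z."""
    n = len(seqs[0])
    for u in range(1, n):
        if sum(acorr(s, u) for s in seqs) != 0:
            return u
    return n

# Golay complementary pair: the summed sidelobes cancel at every shift.
golay_pair = [[1, 1, 1, -1], [1, 1, -1, 1]]
```

For `golay_pair`, `zcz_width` returns 4, the full length; a ZCS set relaxes this to cancellation only within a zone of width Z < N, in exchange for larger and more flexible set sizes.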