
Showing papers in "IEEE Signal Processing Letters in 2022"


Journal ArticleDOI
TL;DR: In this article, a linear prefilter is introduced to whiten the correlated (colored) noise and obtain an unbiased estimate of the filter weights, and a new gradient approach is developed for adaptive filter design based on the fractional-order derivative and a linear filter.
Abstract: Previous work on filter design considers uncorrelated white measurement noise. For more complex correlated noise disturbances, the conventional adaptive filter yields biased estimates. To overcome this problem, we introduce a linear prefilter to whiten the correlated (i.e., colored) noise and obtain an unbiased estimate of the filter weights. Moreover, the design of many adaptive filters focuses on integer-order optimization methods; however, fractional-order-based algorithms show better performance than their integer-order counterparts. Thus, this letter develops a new gradient approach for adaptive filter design based on the fractional-order derivative and a linear filter. Finally, simulation results from a system-identification perspective demonstrate the performance of the proposed algorithms.
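The prewhitening idea can be sketched in a few lines. This is only an illustration under simplifying assumptions, not the authors' algorithm: the noise is AR(1) with a known coefficient `a`, and a plain integer-order LMS update stands in for the fractional-order gradient.

```python
import numpy as np

rng = np.random.default_rng(0)

# Unknown 3-tap FIR system and the AR(1) coefficient of the colored noise.
w_true = np.array([0.8, -0.5, 0.3])
a = 0.7                                   # assumed-known noise-model coefficient

n = 5000
x = rng.standard_normal(n)                # white input signal
v = 0.1 * rng.standard_normal(n)          # white driving noise
e = np.zeros(n)                           # colored AR(1) measurement noise
for k in range(1, n):
    e[k] = a * e[k - 1] + v[k]

d = np.convolve(x, w_true)[:n] + e        # noisy system output

# Prefilter both signals with 1 - a*z^(-1) so the noise term becomes white.
def prewhiten(s):
    out = s.copy()
    out[1:] -= a * s[:-1]
    return out

xf, df = prewhiten(x), prewhiten(d)

# Plain (integer-order) LMS on the prefiltered data.
mu, w = 0.01, np.zeros(3)
for k in range(2, n):
    u = xf[k - 2:k + 1][::-1]             # [xf[k], xf[k-1], xf[k-2]]
    w += mu * (df[k] - w @ u) * u

print(np.round(w, 2))                     # close to w_true
```

Running LMS on the raw (unfiltered) data instead would converge to a biased weight vector, which is exactly the effect the prefilter removes.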

93 citations


Journal ArticleDOI
TL;DR: A thumbnail consistency loss is proposed to guarantee the visual quality of the encrypted image, the quality of the decrypted image is improved via an SSIM loss, and a binary string is adopted as the key instead of the decryption network parameters to facilitate image sharing.
Abstract: To balance the privacy and usability of images in the cloud, Tajik et al. recently designed a thumbnail preserving encryption (TPE) scheme based on sum-preserving encryption; however, its multiple iterations make it inefficient. We use CycleGAN to simulate randomized unary encoding (RUE) and achieve a more efficient TPE. A thumbnail consistency loss is proposed to guarantee the visual quality of the encrypted image, and the quality of the decrypted image is improved via an SSIM loss. Besides, a decryption network with a key is retrained so that it can decrypt cipher images in multiple domains, and a binary string is adopted as the key instead of the decryption network parameters to facilitate image sharing. Simulations verify the effectiveness of the proposed algorithm.

34 citations


Journal ArticleDOI
TL;DR: In this article, a multi-innovation algorithm was proposed for estimating the unknown model parameters and time-delay for an extended version of the nonlinear exponential autoregressive (ExpAR) time-series model.
Abstract: Nonlinear time-series modeling is fundamental to a wide variety of control and prediction problems. This letter focuses on the joint parameter and time-delay estimation for an extended version of the nonlinear exponential autoregressive (ExpAR) time-series model. To address the difficulties posed by the unknown time-delay and improve the estimation accuracy, we first employ the redundant rule to transform the ExpAR model into an augmented identification model. Then we invoke the multi-innovation theory to enhance data utilization and propose a new algorithm that combines stochastic gradient descent with discrete search for estimating the unknown model parameters and time-delay. The simulation results show that by properly adjusting the innovation length, the estimation accuracy of the proposed multi-innovation algorithm can significantly exceed that of the single-innovation algorithm.
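The multi-innovation idea can be sketched on a plain linear regression (the ExpAR structure, the redundant rule, and the time-delay search are omitted here): instead of updating with only the latest innovation, the last p innovations are stacked into a vector and used together.

```python
import numpy as np

rng = np.random.default_rng(1)
theta_true = np.array([1.5, -0.7])

n, p = 3000, 5                       # p = innovation length (p = 1 is classic SG)
X = rng.standard_normal((n, 2))      # regressors
y = X @ theta_true + 0.1 * rng.standard_normal(n)

theta = np.zeros(2)
r = 1.0 + np.sum(X[:p] ** 2)         # cumulative squared-regressor norm
for k in range(p, n):
    r += X[k] @ X[k]
    Phi = X[k - p + 1:k + 1]         # stacked regressors, shape (p, 2)
    E = y[k - p + 1:k + 1] - Phi @ theta   # multi-innovation vector
    theta += Phi.T @ E / r           # update driven by p innovations at once

print(np.round(theta, 2))            # close to theta_true
```

Increasing the innovation length p reuses more past data per step, which is the mechanism behind the accuracy gain over the single-innovation (p = 1) algorithm reported in the abstract.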

26 citations


Journal ArticleDOI
TL;DR: This paper proposes a simple and effective Siamese oriented Region Proposal Network (Siamese-ORPN) for visual tracking that uses oriented RPN on the similarity feature maps to directly generate high-quality oriented proposals in a nearly cost-free manner.
Abstract: Current oriented visual tracking depends on segmentation-driven frameworks, which incur expensive computation and become the bottleneck in practical applications. This paper proposes a simple and effective Siamese oriented Region Proposal Network (Siamese-ORPN) for visual tracking. Specifically, we propose to use an oriented RPN on the similarity feature maps to directly generate high-quality oriented proposals in a nearly cost-free manner. Moreover, a top-down feature fusion network is proposed as the backbone for feature extraction and fusion, which achieves substantial gains from the diversity of visual-semantic hierarchies. Siamese-ORPN runs at 85 fps while achieving leading performance on benchmark datasets, including VOT2018 (44.6% EAO) and VOT2019 (39.6% EAO).

24 citations


Journal ArticleDOI
TL;DR: This work proposes a pure transformer-based framework, called HashFormer, to tackle the deep hashing task; it utilizes a vision transformer (ViT) as its backbone and treats binary codes as intermediate representations for a surrogate task, i.e., image classification.
Abstract: Deep image hashing aims to map an input image to compact binary codes with a deep neural network, enabling efficient image retrieval over large-scale datasets. Due to the explosive growth of modern data, deep hashing has gained growing attention from the research community. Recently, convolutional neural networks like ResNet have dominated deep hashing. Nevertheless, motivated by recent advances in vision transformers, we propose a pure transformer-based framework, called HashFormer, to tackle the deep hashing task. Specifically, we utilize a vision transformer (ViT) as our backbone and treat binary codes as intermediate representations for a surrogate task, i.e., image classification. In addition, we observe that binary codes suitable for classification are sub-optimal for retrieval. To mitigate this problem, we present a novel average precision loss, which enables us to directly optimize the retrieval accuracy. To the best of our knowledge, our work is one of the pioneering works to address deep hashing without convolutional neural networks (CNNs). We perform comprehensive experiments on three widely studied datasets: CIFAR-10, NUS-WIDE and ImageNet. The proposed method demonstrates promising results against existing state-of-the-art works, validating the advantages and merits of HashFormer.
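The retrieval side of deep hashing can be sketched independently of the learned backbone: binarize real-valued features into compact codes, then rank by Hamming distance. The features below are random stand-ins, not HashFormer outputs, and the sign threshold replaces the learned binarization.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for backbone features: 100 database items, 16-D real embeddings.
feats_db = rng.standard_normal((100, 16))
feats_q = feats_db[7].copy()                 # query: an exact copy of item 7

# Binarize with a sign threshold to obtain compact codes (learning omitted).
codes_db = (feats_db > 0).astype(np.uint8)
code_q = (feats_q > 0).astype(np.uint8)

# Rank database items by Hamming distance to the query code.
hamming = np.count_nonzero(codes_db != code_q, axis=1)
ranking = np.argsort(hamming, kind="stable")
print(ranking[0], hamming[ranking[0]])       # best match has distance 0
```

Because Hamming distance reduces to XOR-and-popcount on packed bits, this ranking step is what makes binary codes attractive for large-scale retrieval.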

22 citations


Journal ArticleDOI
TL;DR: A multi-scale temporal transformer (MTT) is proposed for skeleton-based action recognition, together with a task-oriented lateral connection (LaC) that aligns semantic hierarchies; the proposed method achieves state-of-the-art results on three large datasets.
Abstract: In the task of skeleton-based action recognition, long-term temporal dependencies are significant cues for sequential skeleton data. State-of-the-art methods rarely have access to long-term temporal information due to the limitations of their receptive fields. Meanwhile, most recent multi-branch methods only consider different input modalities but ignore the information at various temporal scales. To address these issues, we propose a multi-scale temporal transformer (MTT) for skeleton-based action recognition in this letter. Firstly, the raw skeleton data are embedded by graph convolutional network (GCN) blocks and multi-scale temporal embedding modules (MT-EMs), which are designed as multiple branches to extract features at various temporal scales. Secondly, we introduce transformer encoders (TE) to integrate the embeddings and model long-term temporal patterns. Moreover, we propose a task-oriented lateral connection (LaC) that aligns semantic hierarchies by distributing input embeddings to the downstream transformer encoders according to their semantic levels. Finally, the classification heads aggregate the results from the TEs and predict the action categories. Experiments show the efficiency and universality of the proposed method, which achieves state-of-the-art results on three large datasets: NTU-RGBD 60, NTU-RGBD 120 and Kinetics-Skeleton 400.

21 citations


Journal ArticleDOI
TL;DR: Experimental results on the UrbanSound8K dataset demonstrate that the proposed CNN-RNN architecture achieves better performance than state-of-the-art classification models.
Abstract: Deep neural networks have been widely demonstrated to achieve higher accuracy and distinct advantages over traditional machine learning methods in extracting data features. While convolutional neural networks (CNNs) have shown great success in feature extraction and audio classification, real-time audio depends on previous scenes. Moreover, a main drawback of deep learning algorithms is that they need a huge amount of data to perform well. In this paper, a recurrent neural network (RNN) combined with a CNN is proposed to address this problem. In addition, a Deep Convolutional Generative Adversarial Network (DCGAN) is used for high-quality data augmentation. This data augmentation technique is applied to the UrbanSound8K dataset to improve environmental sound classification. Batch normalization, transfer learning, and three feature representation maps are used to improve the model accuracy. The results show that the images generated by DCGAN have features similar to the original training images, and that the approach can generate spectrograms and improve the classification accuracy. Experimental results on the UrbanSound8K dataset demonstrate that the proposed CNN-RNN architecture achieves better performance than state-of-the-art classification models.

18 citations


Journal ArticleDOI
TL;DR: Zheng et al. propose a simple but effective Transformer-based method for light field image super-resolution (LFT), in which an angular transformer is designed to incorporate complementary information among different views, and a spatial transformer is developed to capture both local and long-range dependencies within each sub-aperture image.
Abstract: Light field (LF) image super-resolution (SR) aims at reconstructing high-resolution LF images from their low-resolution counterparts. Although CNN-based methods have achieved remarkable performance in LF image SR, these methods cannot fully model the non-local properties of the 4D LF data. In this paper, we propose a simple but effective Transformer-based method for LF image SR. In our method, an angular Transformer is designed to incorporate complementary information among different views, and a spatial Transformer is developed to capture both local and long-range dependencies within each sub-aperture image. With the proposed angular and spatial Transformers, the beneficial information in an LF can be fully exploited and the SR performance is boosted. We validate the effectiveness of our angular and spatial Transformers through extensive ablation studies, and compare our method to recent state-of-the-art methods on five public LF datasets. Our method achieves superior SR performance with a small model size and low computational cost. Code is available at https://github.com/ZhengyuLiang24/LFT.

18 citations


Journal ArticleDOI
TL;DR: In this paper, the authors make the most of the fitting advantages of Gaussian and polynomial functions to propose a nonlinear signal model with broader applications, and focus on the parameter estimation of the proposed models in the presence of noise.
Abstract: To extract important information from nonlinear signals, this letter makes the most of the fitting advantages of Gaussian and polynomial functions and proposes a nonlinear signal model with broader applications. We then focus on the parameter estimation of the proposed models in the presence of noise. A stability-factor recursive algorithm is devised based on the increasing noisy data, which makes full use of the information in the nonlinear signals. Applying the hierarchical identification principle, a two-stage recursive algorithm with higher computational efficiency is developed for the nonlinear signals. Simulation results verify the effectiveness of the proposed algorithms in terms of estimation accuracy and prediction performance.

16 citations


Journal ArticleDOI
TL;DR: A hybrid denoising model based on a Transformer Encoder and Convolutional Decoder Network (TECDNet) is proposed, where a Transformer based on novel radial basis function (RBF) attention is used as the encoder to improve the representation capability of the overall model.
Abstract: Transformers typically enjoy larger model capacity but higher computational load than convolutional neural networks (CNNs) in vision tasks. In this letter, the advantages of the two networks are fused for effective and efficient real image denoising. We propose a hybrid denoising model based on a Transformer Encoder and Convolutional Decoder Network (TECDNet). A Transformer with a novel radial basis function (RBF) attention is used as the encoder to improve the representation capability of the overall model. In the decoder, a residual CNN rather than a Transformer is adopted to greatly reduce the computational complexity of the whole denoising network. Extensive experimental results on real images show that TECDNet achieves state-of-the-art denoising performance at relatively low computational cost.

Journal ArticleDOI
TL;DR: A semidefinite relaxation (SDR)-based iterative algorithm is developed, which alternately yields the transmit beamformer at each SN and the corresponding reflection phases at the IRS to achieve the minimum mean-squared error (MSE) parameter estimate at the FC, subject to transmit power and ED signal-to-noise ratio constraints.
Abstract: Wireless sensor networks (WSNs) are vulnerable to eavesdropping as the sensor nodes (SNs) communicate over an open radio channel. Intelligent reflecting surface (IRS) technology can be leveraged for physical layer security in WSNs. In this letter, we propose a joint transmit and reflective beamformer (JTRB) design for secure parameter estimation at the fusion center (FC) in the presence of an eavesdropper (ED) in a WSN. We develop a semidefinite relaxation (SDR)-based iterative algorithm, which alternately yields the transmit beamformer at each SN and the corresponding reflection phases at the IRS, to achieve the minimum mean-squared error (MSE) parameter estimate at the FC, subject to transmit power and ED signal-to-noise ratio constraints. Our simulation results demonstrate robust MSE and security performance of the proposed IRS-based JTRB technique.

Journal ArticleDOI
TL;DR: In this article, a joint optimization of sparse FDAs is proposed to synthesize a decoupled transmit beampattern from the perspective of a spatial-frequency virtual array, where both the spatial and spectral configurations of the FDA are optimized via joint antenna-frequency selection.
Abstract: Beampattern synthesis of frequency diverse arrays (FDAs) has recently attracted increased attention owing to their range-dependent beampattern. The transmit beampattern of a uniform FDA appears S-shaped, which implies coupling in the range-angle domain and thus causes unwanted energy leakage into areas of non-interest. In this work, we propose a joint optimization of sparse FDAs to synthesize a decoupled transmit beampattern from the perspective of a spatial-frequency virtual array. Specifically, both the spatial and spectral configurations of the FDA are optimized via joint antenna-frequency selection. To solve the resultant NP-hard combinatorial optimization problem, we propose an iterative reweighting strategy that transforms the original problem into a convex optimization. Further, we synthesize a time-invariant decoupled beampattern by designing a time-varying unit frequency step. Comparative simulations demonstrate the superior performance of the proposed FDA in terms of the normalized peak sidelobe level (NPSLL) of the transmit beampatterns.

Journal ArticleDOI
TL;DR: In this article, a novel sparse array (SA) structure is proposed based on the maximum inter-element spacing (IES) constraint (MISC) criterion, which has significantly increased uniform degrees of freedom (uDOF) and reduced mutual coupling.
Abstract: A novel sparse array (SA) structure is proposed based on the maximum inter-element spacing (IES) constraint (MISC) criterion. Compared with the traditional MISC array, the proposed SA configurations, termed improved MISC (IMISC), have significantly increased uniform degrees of freedom (uDOF) and reduced mutual coupling. In particular, the IMISC arrays are composed of six uniform linear arrays (ULAs), which can be determined by an IES set. The IES set is constrained by two parameters, namely the maximum IES and the number of sensors. The uDOF of the IMISC arrays is derived and their weight function is analyzed as well. The proposed IMISC arrays have a great advantage in terms of uDOF over existing SAs, while their mutual coupling remains at a low level. Simulations are carried out to demonstrate the advantages of the IMISC arrays.

Journal ArticleDOI
TL;DR: Substantial experiments on prevailing synthetic datasets and real-world videos verify the superior performance of the proposed method over the existing state-of-the-art methods for video deraining.
Abstract: The presence of rain artifacts severely degrades the overall visual quality of a video, and the artifacts tend to overlap with the useful information in the video frames. Such degraded video affects the effectiveness of many automated applications like traffic monitoring, surveillance, etc. As video deraining is a pre-processing step for automated applications, a lightweight deraining module is highly desirable. Therefore, in this paper, a “Progressive Subtractive Recurrent Lightweight Network” is proposed for video deraining. Initially, the Multi-Kernel feature Sharing Residual Block (MKSRB) is designed to learn rain streaks of different sizes, which facilitates their complete removal through progressive subtractions. These MKSRB features are merged with the previous frame's output recurrently to maintain temporal consistency. Further, multi-receptive feature subtraction is performed through the Multi-scale Multi-Receptive Difference Block (MMRDB) to avoid loss of details and extract high-frequency information. Finally, the progressively learned features from the MKSRB and recurrent feature merging are aggregated with the fused MMRDB features, which outputs the rain-free frame. Substantial experiments on prevailing synthetic datasets and real-world videos verify the superior performance of the proposed method over existing state-of-the-art methods for video deraining.

Journal ArticleDOI
TL;DR: A multi-branch topology residual block (MTRB)-based network (MTRBNet) is proposed, which can alleviate training difficulties and use the parameters between neurons more efficiently, achieving superior performance compared with several state-of-the-art methods.
Abstract: The learning-based low-light image enhancement methods have remarkable performance due to their robust feature learning and mapping capabilities. This paper proposes a multi-branch topology residual block (MTRB)-based network (MTRBNet), which can alleviate training difficulties and use the parameters between neurons more efficiently. Compared with the previous residual block, the proposed MTRB increases the width of the network and simultaneously transmits information along the depth and width directions, which can effectively select network nodes to promote the network's learning capacity. Meanwhile, the feature information of neighbor nodes is transferred to each other, thereby maximizing the information flow of the convolution unit. The proposed information connection and feedback mechanism can improve the network's ability to capture global and local features. We analyze the pros and cons of two multi-feature fusion strategies (i.e., addition and concatenation) and three normalization methods on the quantitative results. In addition, we embed our MTRB into a traditional Encoder-Decoder structure to improve the image enhancement results under different low-light imaging conditions. Experiments on the LOL image dataset demonstrate that our MTRBNet achieves superior performance compared with several state-of-the-art methods.

Journal ArticleDOI
TL;DR: In this article, the authors consider the joint active and passive beamforming problem for an IRS-assisted radar, where multiple IRSs are employed to assist the surveillance of multiple targets in cluttered environments.
Abstract: Intelligent reflecting surface (IRS) is a promising technology being considered for future wireless communications due to its ability to control signal propagation. This paper considers the joint active and passive beamforming problem for an IRS-assisted radar, where multiple IRSs are employed to assist the surveillance of multiple targets in cluttered environments. Specifically, we aim to maximize the minimum target illumination power at multiple target locations by jointly optimizing the active beamformer at the radar transmitter and the passive phase-shift matrices at the IRSs, subject to an upper bound on the clutter power at each clutter scatterer. The resulting optimization problem is nonconvex and solved with a sequential optimization procedure along with semidefinite relaxation (SDR). Simulation results show that additional line-of-sight (LOS) paths created by IRSs can substantially improve the radar robustness against target blockage.

Journal ArticleDOI
TL;DR: In this paper, a tensor-based coherent direction-of-arrival (DOA) estimation method was proposed, which avoids the inefficient spatial smoothing by reconstructing a structured covariance tensor from the rank-deficient coherent covariance statistics.
Abstract: Existing tensor-based coherent direction-of-arrival (DOA) estimation methods adopting spatial smoothing to decorrelate the coherent tensor statistics usually lead to a poor decorrelation performance. In this letter, we propose a structured tensor reconstruction method for two-dimensional coherent DOA estimation, which then avoids the inefficient spatial smoothing. In particular, after investigating the structural property of the four-dimensional incoherent covariance tensor, we propose a tensorial Hermitian Toeplitz mapping rule to reconstruct a structured covariance tensor from the rank-deficient coherent covariance tensor statistics. It is theoretically proved that the reconstructed covariance tensor admits a decorrelated canonical polyadic model with a tensorial Hermitian Toeplitz structure, whose decomposition ensures a closed-form coherent DOA estimation. The effectiveness of the proposed method is verified by simulations.
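A reduced, matrix-valued analogue of the Hermitian Toeplitz mapping can illustrate why such reconstruction helps (the paper works with 4-D covariance tensors; this sketch uses an ordinary ULA covariance matrix): averaging the diagonals of the sample covariance restores the source rank that coherence destroys.

```python
import numpy as np

rng = np.random.default_rng(2)
M, N = 8, 200                           # half-wavelength ULA sensors, snapshots
angles = np.deg2rad([-20.0, 30.0])      # two fully coherent sources

steer = lambda th: np.exp(1j * np.pi * np.arange(M) * np.sin(th))
A = np.column_stack([steer(th) for th in angles])

# Coherent sources share one waveform, so the signal covariance is rank one.
s = rng.standard_normal(N) + 1j * rng.standard_normal(N)
S = np.vstack([s, 0.9 * s])
noise = 0.01 * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N)))
X = A @ S + noise
R = X @ X.conj().T / N                  # sample covariance (signal part rank 1)

# Hermitian Toeplitz mapping: average each diagonal of the sample covariance.
t = np.array([np.mean(np.diag(R, k)) for k in range(M)])
R_toep = np.empty((M, M), dtype=complex)
for i in range(M):
    for j in range(M):
        R_toep[i, j] = t[j - i] if j >= i else np.conj(t[i - j])

# The mapping restores the source rank: two dominant eigenvalues reappear.
eig = np.sort(np.linalg.eigvalsh(R_toep))[::-1]
print(np.round(eig[:3], 2))
```

Before the mapping, the second eigenvalue of R sits at the noise floor; after it, two eigenvalues stand clearly above the rest, which is the decorrelation effect the tensorial mapping achieves in the 4-D case.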

Journal ArticleDOI
TL;DR: Wang et al. propose an LSTM-based method that exploits the spatio-temporal correlation of the 3D movements of the point of interest (POI) and auxiliary points (APs) on the same surface of the heart.
Abstract: In robot-assisted cardiac surgery, predicting heart motion can help improve the operation accuracy and safety of surgical robots. Different from the conventional prediction schemes which model the point of interest (POI) with only temporal correlation of past observations, this paper proposes an LSTM-based method by exploiting the spatio-temporal correlation of the 3D movements of POI and auxiliary points (APs) on the same surface of the heart. Three different LSTM models are investigated. The first two models define the POI prediction as a pure time-series forecasting problem based on past POI trajectory, and the third model combines the past observations of POI and new observations of APs to take into consideration the extra spatial correlations for prediction. Experimental comparison studies based on 3D coordinates obtained from real stereo-endoscopic videos demonstrate the superior performance of the proposed spatio-temporal LSTM model.

Journal ArticleDOI
TL;DR: Wang et al. develop a symmetric coprime array (SCA) whose sensor locations satisfy the nesting property, so that it can be used as the dense subarray of a nested array.
Abstract: Recently, sparse arrays such as the nested array and the coprime array have attracted much attention in the field of array signal processing. In this letter, we develop a symmetric coprime array (SCA) whose sensor locations satisfy the nesting property, so it can be used as the dense subarray of a nested array. Based on this observation, we propose a new sparse array named the coprime nested array, which achieves the same number of uniform degrees of freedom (uDOFs) as the prototype nested array, while its mutual coupling effect remains at the same level as that of coprime arrays. Moreover, an improved coprime nested array (ICNA) is proposed by rearranging some sensors in the SCA to the right side of the sparse subarray. The ICNA possesses more uDOFs than existing nested arrays with further reduced mutual coupling. Numerical simulations verify the effectiveness of the proposed configurations.
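The uDOF bookkeeping behind such designs can be sketched generically. This is the standard difference-coarray computation, not the ICNA construction itself; the nested-array geometry below is the textbook two-level example.

```python
import numpy as np

def diff_coarray(pos):
    """All lags (pairwise position differences) generated by an array."""
    pos = np.asarray(pos)
    return np.unique(pos[:, None] - pos[None, :])

def udof(lags):
    """Size of the longest contiguous run of integer lags centered at 0."""
    lag_set = set(int(v) for v in lags)
    u = 0
    while u + 1 in lag_set:
        u += 1
    return 2 * u + 1

ula = np.arange(6)                              # 6-sensor ULA: lags -5..5
nested = np.array([1, 2, 3, 4, 5, 10, 15, 20])  # 2-level nested array (N1 = N2 = 4)

print(udof(diff_coarray(ula)))      # 11
print(udof(diff_coarray(nested)))   # 39 = 2*N2*(N1+1) - 1
```

With 8 physical sensors, the nested geometry yields 39 contiguous lags versus 15 for an 8-sensor ULA, which is why sparse geometries can resolve more sources than sensors.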

Journal ArticleDOI
TL;DR: In this article, a global-local enhancement network (GLE-Net) is proposed to correct the intensity via learning extra information from collected training data, including the following three key steps: first, RSIs are decomposed by the discrete wavelet transformation (DWT) method into the low-frequency component and the detail components.
Abstract: Low-quality remotely sensed images (RSIs) are not beneficial for the analysis of many activities, including agricultural growth, resident migration, and forest fires. Many previous enhancement schemes improve image quality by changing the illumination. However, these approaches often fail at detail and brightness preservation as well as contrast improvement because the information from a single image is limited. To address this issue, an enhancement framework, named global-local enhancement network (GLE-Net), is proposed to correct the intensity by learning extra information from collected training data, with the following three key steps: first, RSIs are decomposed by the discrete wavelet transformation (DWT) method into the low-frequency component and the detail components. Then, the low-frequency component is improved by the global enhancement network while the detail components are enhanced by the local enhancement network in parallel. Finally, the enhanced components are used to produce high-quality images with the inverse DWT (IDWT) method. Quantitative and qualitative experiments on both synthetic and real-world RSIs validate that the proposed GLE-Net performs well at preserving brightness and fine details, and even outperforms the state of the art.
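The DWT split/merge at the heart of this pipeline can be sketched with a one-level 2-D Haar transform. The choice of the Haar wavelet is an assumption for illustration; the abstract does not state which wavelet the authors use.

```python
import numpy as np

def haar_dwt2(img):
    """One-level 2-D Haar DWT: returns (LL, (LH, HL, HH))."""
    a = (img[0::2] + img[1::2]) / 2          # rows: average
    d = (img[0::2] - img[1::2]) / 2          # rows: detail
    ll = (a[:, 0::2] + a[:, 1::2]) / 2       # low-frequency component
    lh = (a[:, 0::2] - a[:, 1::2]) / 2       # detail components...
    hl = (d[:, 0::2] + d[:, 1::2]) / 2
    hh = (d[:, 0::2] - d[:, 1::2]) / 2
    return ll, (lh, hl, hh)

def haar_idwt2(ll, details):
    """Inverse of haar_dwt2 (perfect reconstruction)."""
    lh, hl, hh = details
    a = np.empty((ll.shape[0], ll.shape[1] * 2))
    a[:, 0::2], a[:, 1::2] = ll + lh, ll - lh
    d = np.empty_like(a)
    d[:, 0::2], d[:, 1::2] = hl + hh, hl - hh
    img = np.empty((a.shape[0] * 2, a.shape[1]))
    img[0::2], img[1::2] = a + d, a - d
    return img

img = np.random.default_rng(0).random((8, 8))
ll, details = haar_dwt2(img)
rec = haar_idwt2(ll, details)
print(np.allclose(rec, img))   # True: IDWT exactly inverts DWT
```

In GLE-Net's scheme, the `ll` band would be processed by the global enhancement network and the three detail bands by the local one, before the inverse transform recombines them.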

Journal ArticleDOI
TL;DR: This letter proposes a powerful Vision Transformer-based Generative Adversarial Network (Transformer-GAN) for enhancing low-light images, and demonstrates that the method outperforms state-of-the-art low-light enhancement methods on popular low-light datasets.
Abstract: Images and videos shot in low light are often accompanied by severe degradation, such as color noise, chromatic aberrations and loss of detail. Most existing convolutional neural network (CNN)-based low-light enhancement methods focus on decomposing the image into illumination and reflection parts via the Retinex model, but they often fail to adequately control noise during enhancement and perform poorly in complex lighting environments. In this letter, we propose a powerful Vision Transformer-based Generative Adversarial Network (Transformer-GAN) for enhancing low-light images. Transformer-GAN consists of two subnets: (1) feature extraction is achieved by an iterative multi-branch network in the feature extraction subnet, and (2) enhancement is completed in the image reconstruction subnet. The core innovations of Transformer-GAN are the multi-head multi-covariance self-attention (MHMCA) and the light feature-forward module (LFFM). Experiments demonstrate that our method outperforms state-of-the-art low-light enhancement methods on popular low-light datasets.

Journal ArticleDOI
TL;DR: A hierarchical decoding network based on a Swin Transformer is proposed for red–green–blue and thermal (RGB-T) salient object detection (SOD), and it outperforms 12 state-of-the-art methods on three RGB-T SOD datasets.
Abstract: Although conventional deep convolutional neural networks are effective for contextual semantic segmentation of objects, recent vision transformers can capture global information of an image and are better at capturing semantic associations over longer ranges. In addition, some existing saliency detection methods disregard the guidance of high-level semantic information for low-level features during decoding, and only use layer-by-layer transmission for encoding. Therefore, we propose a hierarchical decoding network based on a Swin Transformer to perform red–green–blue and thermal (RGB-T) salient object detection (SOD). First, a sine–cosine fusion module performs multimodality intersections and exploits complementarity. As a second fusion stage, an advanced semantic information guidance module adjusts high-level semantic information and low-level detailed characteristics. Finally, a global saliency perception module fuses cross-layer information in a top-down path. Comprehensive experiments demonstrate that the proposed network outperforms 12 state-of-the-art methods on three RGB-T SOD datasets.

Journal ArticleDOI
TL;DR: The jointly-optimized S-RRLS (JO-S-RRLS) algorithm is developed, which not only exhibits low misadjustment but also tracks sudden changes of a sparse system well.
Abstract: This paper proposes a unified sparsity-aware robust recursive least-squares (S-RRLS) algorithm for the identification of sparse systems under impulsive noise. The proposed algorithm generalizes multiple algorithms simply by replacing the specified robustness criterion and sparsity-aware penalty. Furthermore, by jointly optimizing the forgetting factor and the sparsity penalty parameter, we develop the jointly-optimized S-RRLS (JO-S-RRLS) algorithm, which not only exhibits low misadjustment but also tracks sudden changes of a sparse system well. Simulations in impulsive noise scenarios demonstrate that the proposed S-RRLS and JO-S-RRLS algorithms outperform existing techniques.
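A minimal sketch of the robust-RLS core can show the role of the robustness criterion: each residual is re-weighted by a Huber-type factor so that impulsive outliers are down-weighted. The sparsity-aware penalty and the joint parameter optimization of JO-S-RRLS are omitted, and the Huber weight is just one instance of the replaceable criterion.

```python
import numpy as np

rng = np.random.default_rng(3)
w_true = np.array([0.5, -1.0, 0.25, 0.0])   # unknown (sparse-ish) system

n, dim = 4000, 4
X = rng.standard_normal((n, dim))
noise = 0.05 * rng.standard_normal(n)
hits = rng.random(n) < 0.02                 # 2% impulsive outliers
noise[hits] += 10 * rng.standard_normal(np.count_nonzero(hits))
y = X @ w_true + noise

lam, delta = 0.995, 0.2                     # forgetting factor, Huber threshold
w, P = np.zeros(dim), 100.0 * np.eye(dim)
for k in range(n):
    x = X[k]
    e = y[k] - w @ x
    q = 1.0 if abs(e) <= delta else delta / abs(e)   # Huber robustness weight
    g = q * (P @ x) / (lam + q * (x @ P @ x))        # weighted RLS gain
    w += g * e
    P = (P - np.outer(g, x) @ P) / lam

print(np.round(w, 2))                       # close to w_true despite impulses
```

With q fixed at 1 this collapses to ordinary exponentially-weighted RLS, whose estimate would be visibly perturbed by the 2% impulses; the residual-dependent weight is what buys the robustness.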

Journal ArticleDOI
TL;DR: This work makes the first attempt to leverage plentiful unlabeled data to conduct self-supervised pre-training for the BIQA task, and demonstrates that the learned pre-trained model can significantly benefit existing learning-based IQA models.
Abstract: Blind image quality assessment (BIQA) has witnessed flourishing progress due to rapid advances in deep learning. The vast majority of prior BIQA methods try to leverage models pre-trained on ImageNet to mitigate the data shortage problem. These well-trained models, however, can be sub-optimal when applied to the BIQA task, which differs considerably from the image classification domain. To address this issue, we make the first attempt to leverage plentiful unlabeled data to conduct self-supervised pre-training for the BIQA task. Based on distorted images generated from high-quality samples using the designed distortion augmentation strategy, the proposed pre-training is implemented as a feature representation prediction task. Specifically, patch-wise feature representations corresponding to a certain grid are integrated to predict the representation of the patch below it. The prediction quality is then evaluated using a contrastive loss to capture quality-aware information for the BIQA task. Experimental results on the KADID-10k and KonIQ-10k databases demonstrate that the learned pre-trained model can significantly benefit existing learning-based IQA models.
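The abstract evaluates prediction quality with a contrastive loss; a minimal InfoNCE-style sketch is shown below, assuming in-batch negatives with the positive pair on the diagonal. The function name and the temperature `tau` are illustrative assumptions, not the paper's exact objective.

```python
import numpy as np

def info_nce(pred, targets, tau=0.07):
    """Minimal InfoNCE-style contrastive loss sketch (assumed form).

    pred:    (B, D) predicted patch representations
    targets: (B, D) target representations; row i is the positive for pred i
    """
    pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    targets = targets / np.linalg.norm(targets, axis=1, keepdims=True)
    logits = pred @ targets.T / tau                 # (B, B) similarities
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))              # positives on the diagonal
```

When predictions match their targets exactly, the loss is near zero; mismatched pairings drive it up, which is what lets the pre-training capture quality-aware structure.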

Journal ArticleDOI
TL;DR: This paper evaluates the proposed neural acoustic-phonetic framework on the RSR2015 database Part III corpus, which consists of random digit strings, and shows that the proposed framework with PAM consistently outperforms the baseline.
Abstract: The traditional acoustic-phonetic approach makes use of both spectral and phonetic information when comparing the voices of speakers. Since phonetic units are not equally informative, the phonetic context of speech plays an important role in speaker verification (SV). In this paper, we propose a neural acoustic-phonetic approach that learns to dynamically assign differentiated weights to spectral features for SV. Such differentiated weights form a phonetic attention mask (PAM). The neural acoustic-phonetic framework consists of two training pipelines, one for SV and another for speech recognition. Through the PAM, we leverage the phonetic information for SV. We evaluate the proposed neural acoustic-phonetic framework on the RSR2015 database Part III corpus, which consists of random digit strings. We show that the proposed framework with PAM consistently outperforms the baseline, with equal error rate reductions of 13.45% and 10.20% for female and male data, respectively.
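The abstract describes assigning differentiated weights to spectral features. As a loose illustration only, the sketch below applies softmax attention weights over frames before pooling into an utterance embedding; the helper name and the softmax form are assumptions, and in the paper the PAM is learned jointly with a speech-recognition pipeline rather than given directly.

```python
import numpy as np

def attentive_pooling(feats, scores):
    """Sketch of attention-weighted pooling, as a phonetic attention
    mask might apply it (assumed simplification).

    feats:  (T, D) frame-level spectral features
    scores: (T,) unnormalized per-frame relevance scores
    Returns a fixed-size utterance embedding.
    """
    w = np.exp(scores - scores.max())
    w = w / w.sum()                      # softmax over frames
    return (w[:, None] * feats).sum(axis=0)
```

Uniform scores reduce this to a plain mean over frames, while a strongly peaked score concentrates the embedding on the most informative frame.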

Journal ArticleDOI
TL;DR: In this paper, a nested U-Net with self-attention and dense connectivity (SADNUNet) is proposed for monaural speech enhancement in the time domain.
Abstract: With the development of deep neural networks, speech enhancement technology has improved vastly. However, commonly used speech enhancement approaches cannot fully leverage contextual information from different scales, which limits performance improvement. To address this problem, we propose a nested U-Net with self-attention and dense connectivity (SADNUNet) for monaural speech enhancement in the time domain. SADNUNet is an encoder-decoder structure with skip connections. In SADNUNet, a multi-scale aggregation block is proposed to explore more contextual information from different scales. In this way, both global and local speech features can be fully exploited to improve speech reconstruction. Furthermore, dense connectivity and self-attention are incorporated into the network for better feature extraction and utterance-level context aggregation. The experimental results demonstrate that the proposed approach achieves performance on par with or better than that of other models in objective speech intelligibility and quality scores.
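As a toy illustration of multi-scale context aggregation on a 1-D signal (not the paper's actual block), the sketch below convolves the input with kernels at several dilation rates and sums the responses, so each output sample mixes local and longer-range context. The function name, the zero-insertion dilation, and the summation fusion are all assumptions.

```python
import numpy as np

def multiscale_aggregate(x, kernels):
    """Illustrative multi-scale aggregation sketch for a 1-D signal.

    x:       (N,) input signal
    kernels: list of (dilation, taps) pairs; responses are summed
    """
    out = np.zeros_like(x, dtype=float)
    for dilation, k in kernels:
        # dilate the kernel by inserting zeros between taps
        dk = np.zeros((len(k) - 1) * dilation + 1)
        dk[::dilation] = k
        out += np.convolve(x, dk, mode="same")
    return out
```

Larger dilations widen the receptive field without adding parameters, which is the usual motivation for mixing scales in time-domain enhancement networks.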

Journal ArticleDOI
TL;DR: This paper argues that insufficient use of the splicing boundary is a main reason for poor accuracy and proposes an Edge-enhanced Transformer (ET) for tampered region localization, which accurately localizes tampered regions at both the pixel and edge levels and outperforms state-of-the-art methods.
Abstract: A key challenge of image splicing detection is how to localize integral tampered regions without false alarms. Although current forgery detection approaches have achieved promising performance, integrality and false alarms are often overlooked. In this paper, we argue that insufficient use of the splicing boundary is a main reason for poor accuracy. To tackle this problem, we propose an Edge-enhanced Transformer (ET) for tampered region localization. Specifically, to capture rich tampering traces, a two-branch edge-aware transformer is built to integrate the splicing edge clues into the forgery localization network, generating forgery features and edge features. Furthermore, we design a feature enhancement module to highlight the artifacts of the edge area in the forgery features and assign weight values to the resulting tensor in the spatial domain for vital signal strengthening and noise suppression. Extensive experimental results on CASIA v1.0, CASIA v2.0 and NC2016 demonstrate that the proposed method can accurately localize tampered regions at both the pixel and edge levels and outperforms state-of-the-art methods.
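The abstract mentions assigning spatial weight values to strengthen edge-area signals and suppress noise. A minimal sketch of one plausible form, sigmoid-gated spatial re-weighting of forgery features by an edge response, is given below; the function name, the sigmoid gate, and the tensor layout are assumptions, not the paper's feature enhancement module.

```python
import numpy as np

def spatial_weighting(feat, edge_map):
    """Illustrative edge-guided spatial re-weighting sketch.

    feat:     (C, H, W) forgery feature tensor
    edge_map: (H, W) edge response; high values mark boundary areas
    """
    w = 1.0 / (1.0 + np.exp(-edge_map))   # sigmoid weights in (0, 1)
    return feat * w[None, :, :]           # broadcast over channels
```

Features at strong edge responses are passed nearly unchanged, while features elsewhere are attenuated, which is the intended strengthen/suppress behavior.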

Journal ArticleDOI
TL;DR: The proposed method remains relatively robust in localization even when prior knowledge of the NLOS links or their distribution is unavailable, and simulations demonstrate that it outperforms other state-of-the-art techniques.
Abstract: This paper addresses the target localization problem using the time-of-arrival (TOA)-based technique under non-line-of-sight (NLOS) environments. To alleviate the adverse effect of the NLOS error on localization, a total least squares framework integrated with a regularization term (RTLS) is utilized, which frees the localization problem from ill-posedness. However, it is challenging to obtain an exact solution to the considered localization problem. In this case, we convert the RTLS problem into a semidefinite program (SDP), and then obtain the solution of the original problem by solving a generalized trust region subproblem (GTRS). The proposed method remains relatively robust in localization even when prior knowledge of the NLOS links or their distribution is unavailable. Simulations demonstrate that the proposed method outperforms other state-of-the-art techniques.
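As background for the TOA setting (and explicitly not the paper's RTLS/SDP/GTRS method), the sketch below shows the classical linearized least-squares baseline: differencing squared-range equations against a reference anchor yields a linear system in the target position.

```python
import numpy as np

def toa_lls(anchors, ranges):
    """Classical linearized least-squares TOA baseline sketch.

    anchors: (N, 2) known anchor positions
    ranges:  (N,) measured anchor-target distances
    For each i > 0: ||x - a_i||^2 = r_i^2 minus the i = 0 equation gives
    2 (a_i - a_0)^T x = r_0^2 - r_i^2 + ||a_i||^2 - ||a_0||^2.
    """
    a0, r0 = anchors[0], ranges[0]
    A = 2.0 * (anchors[1:] - a0)
    b = (r0**2 - ranges[1:]**2
         + np.sum(anchors[1:]**2, axis=1) - np.sum(a0**2))
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x
```

With noise-free line-of-sight ranges this recovers the target exactly; NLOS bias is precisely what breaks such baselines and motivates robust formulations like the paper's RTLS.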

Journal ArticleDOI
TL;DR: Using pairs of a real-world low-light image and a pseudo well-exposed image, a lightweight deep CNN model is trained through knowledge distillation, and experiments demonstrate the effectiveness and practicality of the proposed method on various datasets.
Abstract: Recently, there has been growing attention on deep learning-based low-light image enhancement algorithms. With this interest, various synthetic low-light image datasets have been released publicly. However, real-world low-light and well-exposed image pair datasets are still lacking. In this paper, we propose a real-world low-light image dataset and a practical lightweight low-light image enhancement network. To construct a large-scale real-world low-light dataset, we have not only captured under-exposed images ourselves but also collected under-exposed images from the Internet. Then, we produce a pseudo well-exposed image for each low-light image. Using pairs of real-world low-light and pseudo well-exposed images, we present a lightweight deep CNN model trained through knowledge distillation. Experimental results demonstrate the effectiveness and practicality of the proposed method on various datasets.
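The abstract does not specify the distillation objective, so the sketch below shows one common form only: the lightweight student is trained to imitate a larger teacher's enhanced output while also matching the pseudo well-exposed target. The function name, the L1 terms, and the `alpha` balance are illustrative assumptions.

```python
import numpy as np

def distillation_loss(student_out, teacher_out, pseudo_gt, alpha=0.5):
    """Hypothetical distillation objective sketch for a lightweight student.

    student_out: student network output (image array)
    teacher_out: larger teacher network output ("soft" target)
    pseudo_gt:   pseudo well-exposed image ("hard" target)
    """
    imitation = np.mean(np.abs(student_out - teacher_out))   # mimic teacher
    supervision = np.mean(np.abs(student_out - pseudo_gt))   # match target
    return alpha * imitation + (1.0 - alpha) * supervision
```

The teacher term transfers the capacity of a heavier model into the lightweight student, which is the usual rationale for distillation in resource-constrained enhancement.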