
Showing papers by "Guangming Shi published in 2023"


DOI
TL;DR: In this article, a low specific absorption rate (SAR) and high on-body efficiency tri-band smartwatch antenna is designed utilizing the theory of characteristic modes (TCM) of composite perfect electric conductors and lossy dielectric structures.
Abstract: A low specific absorption rate (SAR) and high on-body efficiency tri-band smartwatch antenna is designed utilizing the theory of characteristic modes (TCM) of composite perfect electric conductors and lossy dielectric structures. The antenna works in the GPS frequency band, WLAN 2.4 GHz/Bluetooth frequency band, and 5G frequency band. After analyzing the characteristic modes of the antenna and human body in the above three frequency bands, the characteristic modes with small modal coupling coefficients (MCCs) are selected and excited. Due to the small mutual coupling between the antenna and the human body, the proposed antenna has good performance on both SAR and on-body efficiency. Its measured 10 g average body SAR is lower than 1.3 W/kg. When it is placed 0 mm above the body, its on-body efficiency is 9.3%–9.88% in the GPS frequency band, 12%–14.2% in the WLAN 2.4 GHz/Bluetooth frequency band, and 17.3%–20.1% in the 5G frequency band.

2 citations


Journal ArticleDOI
TL;DR: Wang et al. proposed a new IAA model based on Theme-Aware Visual Attribute Reasoning (TAVAR), which simulates the process of human perception in image aesthetics by performing bilevel reasoning.
Abstract: People usually assess image aesthetics according to visual attributes, e.g., interesting content, good lighting, and vivid color. Further, the perception of visual attributes depends on the image theme. Therefore, the inherent relationship between visual attributes and image theme is crucial for image aesthetics assessment (IAA), but it has not been comprehensively investigated. With this motivation, this paper presents a new IAA model based on Theme-Aware Visual Attribute Reasoning (TAVAR). The underlying idea is to simulate the process of human perception in image aesthetics by performing bilevel reasoning. Specifically, a visual attribute analysis network and a theme understanding network are first pre-trained to extract aesthetic attribute features and theme features, respectively. Then, the first-level Attribute-Theme Graph (ATG) is built to investigate the coupling relationship between visual attributes and image theme. Further, a flexible aesthetics network is introduced to extract general aesthetic features, based on which we build the second-level Attribute-Aesthetics Graph (AAG) to mine the relationship between theme-aware visual attributes and aesthetic features, producing the final aesthetic prediction. Extensive experiments on four public IAA databases demonstrate the superiority of the proposed TAVAR model over state-of-the-art methods. Furthermore, TAVAR offers better explainability due to the use of visual attributes.
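Graph reasoning of the ATG/AAG kind typically reduces to message passing over a small node graph (attribute nodes, theme nodes, aesthetic-feature nodes). The following is a minimal sketch of one such graph-convolution step under common conventions; the function and its normalization are illustrative assumptions, not TAVAR's published layers.

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    # One graph-convolution step: add self-loops, row-normalize the
    # adjacency so each node averages its neighbors' features, then
    # apply a linear transform and ReLU. Illustrative sketch only.
    a = adj + np.eye(len(adj))                 # self-loops
    a = a / a.sum(axis=1, keepdims=True)       # row-normalize
    return np.maximum(a @ feats @ weight, 0)   # aggregate, transform, ReLU
```

Stacking two such layers over an attribute-theme adjacency would give the "bilevel reasoning" flavor the abstract describes.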

2 citations


Journal ArticleDOI
01 Feb 2023
TL;DR: Wang et al. proposed ECSNet, a lightweight spatio-temporal learning framework over a compact 2D-1T event cloud sequence representation, in which a hierarchical spatial relation module learns robust geometric features and a motion attention module captures long-term temporal context, achieving state-of-the-art performance.
Abstract: Neuromorphic event cameras can efficiently sense the latent geometric structures and motion cues of a scene by generating asynchronous and sparse event signals. Due to the irregular layout of the event signals, how to leverage their plentiful spatio-temporal information for recognition tasks remains a significant challenge. Existing methods tend to treat events as dense image-like or point-series representations. However, they either severely destroy the sparsity of event data or fail to encode robust spatial cues. To fully exploit the inherent sparsity while reconciling the spatio-temporal information, we introduce a compact event representation, namely the 2D-1T event cloud sequence (2D-1T ECS). We couple this representation with a novel lightweight spatio-temporal learning framework (ECSNet) that accommodates both object classification and action recognition tasks. The core of our framework is a hierarchical spatial relation module. Equipped with a specially designed surface-event-based sampling unit and a local event normalization unit to enhance inter-event relation encoding, this module learns robust geometric features from the 2D event clouds. We also propose a motion attention module for efficiently capturing long-term temporal context evolving with the 1T cloud sequence. Empirically, experiments show that our framework achieves performance on par with or better than the state of the art. Importantly, our approach cooperates well with the sparsity of event data without any sophisticated operations, hence leading to low computational costs and prominent inference speeds.

2 citations


Journal ArticleDOI
TL;DR: Zhang et al. proposed a novel image reconstruction method based on the maximum a posteriori (MAP) estimation framework using a learned Gaussian Scale Mixture (GSM) prior.
Abstract: Image reconstruction from partial observations has attracted increasing attention. Conventional image reconstruction methods with hand-crafted priors often fail to recover fine image details due to the poor representation capability of the hand-crafted priors. Deep learning methods attack this problem by directly learning mapping functions between the observations and the target images, and can achieve much better results. However, most powerful deep networks lack transparency and are nontrivial to design heuristically. This paper proposes a novel image reconstruction method based on the Maximum a Posteriori (MAP) estimation framework using a learned Gaussian Scale Mixture (GSM) prior. Unlike existing unfolding methods that only estimate the image means (i.e., the denoising prior) but neglect the variances, we propose characterizing images by GSM models with means and variances learned through a deep network. Furthermore, to learn the long-range dependencies of images, we develop an enhanced variant based on the Swin Transformer for learning the GSM models. All parameters of the MAP estimator and the deep network are jointly optimized through end-to-end training. Extensive simulation and real-data experimental results on spectral compressive imaging and image super-resolution demonstrate that the proposed method outperforms existing state-of-the-art methods.
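For a diagonal observation operator (e.g., masked pixels), the MAP estimate under a Gaussian prior with learned per-pixel mean and variance — the quantities a GSM network would supply — has a simple closed form: a precision-weighted average of observation and prior mean. This is a hedged sketch of that update, not the paper's implementation; all names are illustrative.

```python
import numpy as np

def map_estimate(y, mask, mu, var, noise_var=0.01):
    # y:    noisy observation;  mask: 1 where observed, 0 elsewhere
    # mu:   learned prior mean; var:  learned prior variance (per pixel)
    # The MAP solution weights each source by its precision (1/variance);
    # unobserved pixels fall back entirely on the prior mean.
    prior_prec = 1.0 / var
    obs_prec = mask / noise_var          # zero where unobserved
    return (obs_prec * y + prior_prec * mu) / (obs_prec + prior_prec)
```

Unfolding methods repeat such an update over several stages, with a network refreshing `mu` and `var` between stages.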

1 citation


Journal ArticleDOI
TL;DR: Zhang et al. proposed an anchor-based knowledge embedding (ANKE) approach for generic image aesthetics assessment, which makes predictions based on a universal aesthetic knowledge base.

1 citation


Journal ArticleDOI
TL;DR: In this article, a few-shot learning framework was proposed to recognize human-object interaction (HOI) classes with a few labeled samples by leveraging a meta-learning paradigm where human-object interactions are embedded into compact features for similarity calculation.

1 citation


Journal ArticleDOI
TL;DR: The authors proposed a method of parameter retention and feedforward network parameter distillation to compress N-stacked transformer modules into one module in the fine-tuning stage, which can guide the student network's feature reconstruction in the latent space.
Abstract: Despite the remarkable performance on various Natural Language Processing (NLP) tasks, the parametric complexity of pretrained language models has remained a major obstacle due to limited computational resources in many practical applications. Techniques such as knowledge distillation, network pruning, and quantization have been developed for language model compression. However, it has remained challenging to achieve an optimal tradeoff between model size and inference accuracy. To address this issue, we propose a novel and efficient uncertainty-driven knowledge distillation compression method for transformer-based pretrained language models. Specifically, we design a method of parameter retention and feedforward network parameter distillation to compress N-stacked transformer modules into one module in the fine-tuning stage. A key innovation of our approach is to add an uncertainty estimation module (UEM) to the student network such that it can guide the student network's feature reconstruction in the latent space (similar to the teacher's). Across multiple datasets in the natural language inference tasks of GLUE, we achieve more than 95% of the original BERT's accuracy while using only about 50% of the parameters.
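One standard way an uncertainty module can guide feature reconstruction is heteroscedastic weighting of the distillation loss: dimensions the module is uncertain about contribute less, at the cost of a log-variance penalty. The sketch below is a plausible reading of that idea under common conventions, not the paper's actual UEM.

```python
import numpy as np

def uncertainty_weighted_distill_loss(student_feat, teacher_feat, log_var):
    # Heteroscedastic feature-matching loss: exp(-log_var) downweights
    # the squared error where predicted uncertainty is high, while the
    # +log_var term discourages the student from claiming uncertainty
    # everywhere. With log_var = 0 this reduces to plain MSE.
    sq_err = (student_feat - teacher_feat) ** 2
    return np.mean(np.exp(-log_var) * sq_err + log_var)
```

In training, `log_var` would be predicted per feature dimension by the student's uncertainty head.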

Peer ReviewDOI
TL;DR: In this paper, the authors review the development of SR technology and advocate for a hybrid approach that combines the strengths of model-based and learning-based super-resolution imaging techniques, including geometry-driven, sparsity-based, and gradient-profile priors.
Abstract: In multidimensional signal processing, such as image and video processing, superresolution (SR) imaging is a classical problem. Over the past 25 years, academia and industry have been interested in reconstructing high-resolution (HR) images from their low-resolution (LR) counterparts. We review the development of SR technology in this tutorial article based on the evolution of key insights associated with the prior knowledge or regularization method from analytical representations to data-driven deep models. The coevolution of SR with other technical fields, such as autoregressive modeling, sparse coding, and deep learning, will be highlighted in both model-based and learning-based approaches. Model-based SR includes geometry-driven, sparsity-based, and gradient-profile priors; learning-based SR covers three types of neural network (NN) architectures, namely residual networks (ResNet), generative adversarial networks (GANs), and pretrained models (PTMs). Both model-based and learning-based SR are united by highlighting their limitations from the perspective of model-data mismatch. Our new perspective allows us to maintain a healthy skepticism about current practice and advocate for a hybrid approach that combines the strengths of model-based and learning-based SR. We will also discuss several open challenges, including arbitrary-ratio, reference-based, and domain-specific SR.


Journal ArticleDOI
TL;DR: DSD-PRO is a motion-enhanced UGC-VQA method based on decomposition and recomposition that represents dynamic degradations explicitly, in contrast to existing recurrent methods whose implicit modeling is unclear and difficult to analyze.
Abstract: The prevalence of short-video applications imposes more requirements on video quality assessment (VQA). User-generated content (UGC) videos are captured in an unprofessional environment, thus suffering from various dynamic degradations, such as camera shaking. To cover the dynamic degradations, existing recurrent neural network-based UGC-VQA methods can only provide implicit modeling, which is unclear and difficult to analyze. In this work, we consider explicit motion representation for dynamic degradations, and propose a motion-enhanced UGC-VQA method based on decomposition and recomposition. In the decomposition stage, a dual-stream decomposition module is built, and the VQA task is decomposed into a single-frame-based quality assessment problem and cross-frame-based motion understanding. The dual streams are well grounded in the two-pathway visual system of human perception, and require no extra UGC data thanks to knowledge transfer. Hierarchical features from shallow to deep layers are gathered to narrow the gaps between tasks and domains. In the recomposition stage, a progressive residual aggregation module is built to recompose features from the dual streams. Representations from different layers and pathways interact and are aggregated in a progressive and residual manner, which keeps a good trade-off between representation deficiency and redundancy. Extensive experiments on UGC-VQA databases verify that our method achieves state-of-the-art performance with good generalization capability. The source code will be available at https://github.com/Sissuire/DSD-PRO.

Journal ArticleDOI
TL;DR: In this article, the authors observe that rapid advances in deep learning have inspired a new generation of data-driven computational imaging systems with performance even better than their model-based counterparts, but that the design of learning-based algorithms often lacks transparency, making it difficult to optimize the entire imaging system in a complete manner.
Abstract: Conventional wisdom in model-based computational imaging incorporates physics-based imaging models, noise characteristics, and image priors into a unified Bayesian framework. Rapid advances in deep learning have inspired a new generation of data-driven computational imaging systems with performances even better than those of their model-based counterparts. However, the design of learning-based algorithms for computational imaging often lacks transparency, making it difficult to optimize the entire imaging system in a complete manner.

Journal ArticleDOI
TL;DR: Wang et al. proposed a novel deep learning method called the spatial and conditional domain adaptation network (SCDAN), which aims to adapt intersubject MI EEG data using three innovative structures: a parallel temporal-spatial convolution feature extractor, a spatial discriminator, and a conditional discriminator.
Abstract: An electroencephalogram (EEG)-based motor imagery (MI) brain-computer interface (BCI) builds a direct communication channel between humans and computers by decoding EEG signals. The intersubject decoding ability is crucial for the application of MI-BCI, which implies that a subject can use MI-BCI equipment without recording additional data for training. Physiologically, because of distinctions in imagery method, brain structure, and brain state, the intersubject data distribution of MI EEG data differs. This often leads to a partial or even complete failure of the MI decoding algorithm between subjects. To solve these issues, we propose a novel deep learning method called the spatial and conditional domain adaptation network (SCDAN), which aims to adapt intersubject MI EEG data. In SCDAN, three innovative structures are employed: a parallel temporal-spatial convolution feature extractor, a spatial discriminator, and a conditional discriminator. The feature extractor adopts an improved temporal-spatial convolutional network that has a more reasonable structure and fewer parameters to reduce the risk of intersubject overfitting. The spatial discriminator and conditional discriminator calibrate the training process to help the feature extractor learn an intersubject common feature representation. We evaluate the performance of SCDAN on the GigaScience dataset and the 2a BCI Competition IV dataset using both one-to-one and leave-one-out transfer protocols. For the one-to-one transfer protocol, the classification accuracies of SCDAN improve by 5.70% and 12.43% compared with the baseline method. For the leave-one-out transfer protocol, the improvements are 8.60% and 15.84%, respectively. The results show a significant improvement compared with the baseline and comparison methods.
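Adversarial domain discriminators of the kind SCDAN employs are usually trained through a gradient reversal layer (GRL): identity on the forward pass, negated and scaled gradient on the backward pass, so the feature extractor learns to confuse the discriminator. A minimal sketch of the two passes (illustrative of the standard trick, not the authors' implementation):

```python
import numpy as np

def grl_forward(x):
    # Forward pass: the GRL is a no-op; features flow to the
    # domain discriminator unchanged.
    return x

def grl_backward(grad_out, lam=1.0):
    # Backward pass: the gradient from the discriminator is reversed
    # and scaled by lam before reaching the feature extractor, turning
    # discriminator minimization into extractor-side confusion.
    return -lam * grad_out
```

In an autograd framework, both functions would live inside one custom op so the reversal happens transparently during backpropagation.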

Journal ArticleDOI
01 Jun 2023
TL;DR: In this article, the authors propose a domain generalizable video quality assessment (VQA) method named Dynamic Ensemble of Expert-Knowledge (DEEK), a novel framework that dynamically exploits the expert-knowledge from each source domain to achieve a generalizable ensemble prediction.
Abstract: Despite the impressive progress of supervised methods in quality assessment for in-the-wild videos, models trained on one domain often fail to generalize well to others due to the domain shifts caused by distortion diversity and content variation. Domain-generalizable video quality assessment (VQA) methods that can work across domains remain an open research challenge. Although combining more data following the mixed-domain training strategy can improve generalization performance to a certain extent, this principle ignores the specific knowledge from each source domain, which could potentially be useful for improving unseen-domain generalization. Motivated by this, we propose a domain-generalizable VQA method named Dynamic Ensemble of Expert-Knowledge (DEEK), a novel framework that dynamically exploits the expert knowledge from each source domain to achieve a generalizable ensemble prediction. Specifically, based on multiple experts, each trained to specialize in a particular source domain, we aim to exploit the complementary information provided by the expert knowledge. We train the ensemble by proposing a quality-sensitive InfoNCE loss that regularizes the collaborative training of all experts in a contrastive learning formulation. By dynamically integrating the experts according to their relevance to the target data, the expert knowledge can be leveraged for better generalization. Experiments on five VQA datasets verify that our approach outperforms the state of the art by large margins.
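The dynamic-integration step can be pictured as a relevance-weighted average of per-domain expert predictions. The sketch below illustrates that idea in a few lines; the softmax relevance weighting and all names are assumptions for illustration, not DEEK's actual mechanism.

```python
import numpy as np

def dynamic_ensemble(expert_scores, relevances, temp=1.0):
    # Each source-domain expert emits a quality score; its weight is
    # the softmax of its relevance to the target sample, so experts
    # from domains resembling the target dominate the prediction.
    w = np.exp(np.asarray(relevances) / temp)
    w = w / w.sum()
    return float(np.dot(w, expert_scores))
```

With equal relevances this degenerates to plain averaging, i.e., the mixed-domain baseline the abstract contrasts against.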

Journal ArticleDOI
TL;DR: In this paper, the spatial distance information of objects on the feature map is encoded during the unidirectional pooling process, which effectively alleviates the occlusion of the homogeneous object features.
Abstract: Detecting objects as multiple keypoints is an important approach among anchor-free object detection methods, and corner pooling is an effective feature encoding method for corner positioning. The corners of the bounding box are located by summing the feature maps, which are max-pooled in the x and y directions, respectively, by corner pooling. In the unidirectional max pooling operation, the features of densely arranged objects of the same class are prone to occlusion. To this end, we propose a method named Gradient Corner Pooling. The spatial distance information of objects on the feature map is encoded during the unidirectional pooling process, which effectively alleviates the occlusion of homogeneous object features. Further, the computational complexity of gradient corner pooling is the same as that of traditional corner pooling, hence it can be implemented efficiently. Gradient corner pooling obtains consistent improvements for various keypoint-based methods by directly replacing corner pooling. We verify the gradient corner pooling algorithm on the dataset and in real scenarios, respectively. Networks with gradient corner pooling locate the corner points earlier in the training process and achieve an average accuracy improvement of 0.2%-1.6% on the MS-COCO dataset. Detectors with gradient corner pooling show better angle adaptability for arrayed objects in the actual scene test.
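The baseline being replaced, traditional corner pooling, is just a directional running maximum: each position takes the max over all features to one side. A minimal sketch of the rightward pass (the paper's gradient variant additionally encodes spatial distance during this scan, which is not reproduced here):

```python
import numpy as np

def corner_pool_right(feat):
    # Rightward corner pooling: out[i, j] = max(feat[i, j:]).
    # Computed as a right-to-left running maximum, as in CornerNet.
    out = feat.copy()
    for j in range(feat.shape[1] - 2, -1, -1):
        out[:, j] = np.maximum(out[:, j], out[:, j + 1])
    return out
```

Summing this with the analogous downward pass yields the top-left corner heatmap features; the occlusion issue arises because a large activation masks everything behind it along the scan direction.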

Journal ArticleDOI
TL;DR: In this article, the authors proposed a method for robotic tactile super-resolution enhancement by learning spatio-temporal continuity of contact position and a tactile sensor composed of overlapping air chambers.
Abstract: Human hand has amazing super-resolution ability in sensing the force and position of contact and this ability can be strengthened by practice. Inspired by this, we propose a method for robotic tactile super-resolution enhancement by learning spatiotemporal continuity of contact position and a tactile sensor composed of overlapping air chambers. Each overlapping air chamber is constructed of soft material and seals the barometer inside to mimic adapting receptors of human skin. Each barometer obtains the global receptive field of the contact surface with the pressure propagation in the hyperelastic seal overlapping air chambers. Neural networks with causal convolution are employed to resolve the pressure data sampled by barometers and to predict the contact position. The temporal consistency of spatial position contributes to the accuracy and stability of positioning. We obtain an average super-resolution (SR) factor of over 2500 with only four physical sensing nodes on the rubber surface (0.1 mm in the best case on 38 × 26 mm²), which outperforms the state-of-the-art. The effect of time series length on the location prediction accuracy of causal convolution is quantitatively analyzed in this article. We show that robots can accomplish challenging tasks such as haptic trajectory following, adaptive grasping, and human-robot interaction with the tactile sensor. This research provides new insight into tactile super-resolution sensing and could be beneficial to various applications in the robotics field.
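The "causal convolution" used to resolve the barometer time series means the output at time t depends only on inputs at times <= t, achieved by left-padding with zeros. A self-contained sketch of that building block (illustrative; not the authors' network):

```python
import numpy as np

def causal_conv1d(x, kernel):
    # y[t] = sum_i kernel[i] * x[t - i], with x[t'] = 0 for t' < 0.
    # Left-padding by (len(kernel) - 1) zeros keeps the output causal
    # and the same length as the input.
    k = len(kernel)
    padded = np.concatenate([np.zeros(k - 1), x])
    return np.array([np.dot(padded[t:t + k], kernel[::-1]) for t in range(len(x))])
```

Stacking such layers with increasing dilation gives the long temporal receptive field that makes the contact-position prediction stable over time.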

Journal ArticleDOI
TL;DR: Wang et al. proposed a supervised contrastive learning method based on the fusion of global and local features, named SCFR, to enhance the ability of image expression.
Abstract: With the rapid development of remote sensing sensor technology, the number of remote sensing images (RSIs) has exploded. How to effectively retrieve and manage these massive data has become an urgent problem. At present, content-based image retrieval (CBIR) methods have become mainstream due to their excellent performance. However, most existing retrieval methods only consider the global features of images, which lack the ability to discriminate images with the same semantic information but different visual representations. To alleviate this issue, a supervised contrastive learning method based on the fusion of global and local features, named SCFR, is proposed in this article. First, a fusion module is designed to combine global and local features to enhance the ability of image expression. Second, supervised contrastive learning is introduced into the retrieval task to effectively improve the feature distribution, so that positive sample pairs are close to each other and negative sample pairs are far away from each other in the feature space. Furthermore, to make the distribution of features of the same class more compact, a center contrastive loss is added to the constraints, using class centers that are updated iteratively with the network. Experimental results on three RSI datasets show that our proposed method achieves more effective retrieval performance than state-of-the-art methods. The code and models are available at https://github.com/xdplay17/SCFR.
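The supervised contrastive objective SCFR builds on pulls together embeddings sharing a label and pushes the rest apart. A reference-style sketch of that loss (the center-contrastive term is omitted, and this is not the authors' code):

```python
import numpy as np

def supcon_loss(features, labels, temp=0.1):
    # Supervised contrastive loss over one batch: for each anchor,
    # maximize similarity to same-label samples relative to all others.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T / temp                      # cosine similarities / temperature
    n = len(labels)
    loss = 0.0
    for i in range(n):
        pos = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not pos:
            continue                          # anchor with no positives is skipped
        denom = np.sum([np.exp(sim[i, j]) for j in range(n) if j != i])
        loss -= np.mean([sim[i, j] - np.log(denom) for j in pos])
    return loss / n
```

The center term described in the abstract would add a similar pull toward per-class centroids maintained alongside the network.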

Journal ArticleDOI
TL;DR: Wang et al. proposed a model-guided deep unfolding network for the more challenging and realistic mixed-noise video denoising problem, named DU-MVDnet, which divides video frames into several overlapping groups and progressively integrates these frames into one frame.
Abstract: Existing image and video denoising algorithms have focused on removing homogeneous Gaussian noise. However, this noise-modeling assumption is often too simplistic for the characteristics of real-world noise. Moreover, the design of network architectures in most deep learning-based video denoising methods is heuristic, ignoring valuable domain knowledge. In this paper, we propose a model-guided deep unfolding network for the more challenging and realistic mixed-noise video denoising problem, named DU-MVDnet. First, we develop a novel observation model/likelihood function based on the correlations among adjacent degraded frames. In the framework of Bayesian deep learning, we introduce a deep image denoiser prior and obtain an iterative optimization algorithm based on maximum a posteriori (MAP) estimation. To facilitate end-to-end optimization, the iterative algorithm is transformed into a deep convolutional neural network (DCNN)-based implementation. Furthermore, recognizing the limitations of traditional motion estimation and compensation methods, we propose an efficient multistage recursive fusion strategy to exploit temporal dependencies. Specifically, we divide video frames into several overlapping groups and progressively integrate these frames into one frame. Toward this objective, we implement a multiframe adaptive aggregation operation to integrate feature maps of intragroup frames with those of intergroup frames. Extensive experimental results on different video test datasets demonstrate that the proposed model-guided deep network outperforms current state-of-the-art video denoising algorithms such as FastDVDnet and MAP-VDNet.

Journal ArticleDOI
TL;DR: In this article, a modified Radon inverse Fourier transform (MRIFT) is proposed to estimate velocity via single-parameter searching and achieve the target's coherent integration (CI) in the 2-D frequency domain using the inverse Fourier transform.
Abstract: Within long-time coherent integration (CI) in radar detection, the range walk effect inevitably impacts high-speed targets, which makes coherent detection methods invalid. To address this problem, a coherent detection method, the modified Radon inverse Fourier transform (MRIFT), is developed in this article. The MRIFT can estimate velocity via single-parameter searching and achieve the target's CI in the 2-D frequency domain using the inverse Fourier transform. Compared with traditional coherent detection methods, the MRIFT achieves better detection and range/velocity estimation performance and averts the blind-speed sidelobe effect at a lower computational cost. Simulations demonstrate the effectiveness and efficiency of the proposed MRIFT.
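The baseline MRIFT improves on is plain coherent integration: for a target with constant Doppler, an FFT over slow time concentrates energy into one Doppler bin, raising SNR by roughly the pulse count. A toy sketch of that baseline (MRIFT's range-walk correction is deliberately not attempted here):

```python
import numpy as np

def coherent_integrate(pulses):
    # pulses: complex array of shape (num_pulses, num_range_bins).
    # FFT along slow time (axis 0) coherently sums a constant-Doppler
    # return; the peak magnitude grows linearly with the pulse count.
    return np.abs(np.fft.fft(pulses, axis=0)).max(axis=0)
```

Range walk breaks this picture because the target's energy migrates across range bins over the integration time, which is exactly the coupling MRIFT removes in the 2-D frequency domain.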

Journal ArticleDOI
TL;DR: Wang et al. proposed a joint search-and-training approach to learn a compact network directly from scratch using pruning as a search strategy, which achieves a better balance between efficiency and accuracy and notable advantages over current state-of-the-art pruning methods.
Abstract: Both network pruning and neural architecture search (NAS) can be interpreted as techniques to automate the design and optimization of artificial neural networks. In this paper, we challenge the conventional wisdom of training before pruning by proposing a joint search-and-training approach to learn a compact network directly from scratch. Using pruning as a search strategy, we advocate three new insights for network engineering: 1) formulating adaptive search as a cold-start strategy to find a compact subnetwork on the coarse scale; 2) automatically learning the threshold for network pruning; and 3) offering flexibility to choose between efficiency and robustness. More specifically, we propose an adaptive search algorithm in the cold start by exploiting the randomness and flexibility of filter pruning. The weights associated with the network filters are updated by ThreshNet, a flexible coarse-to-fine pruning method inspired by reinforcement learning. In addition, we introduce a robust pruning strategy leveraging the technique of knowledge distillation through a teacher-student network. Extensive experiments on ResNet and VGGNet show that our proposed method achieves a better balance between efficiency and accuracy, with notable advantages over current state-of-the-art pruning methods on several popular datasets, including CIFAR10, CIFAR100, and ImageNet. The code associated with this paper is available at: https://see.xidian.edu.cn/faculty/wsdong/Projects/AST-NP.htm.
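The core pruning primitive, zeroing out filters whose magnitude falls below a threshold, is easy to state concretely. In the paper the threshold is learned by ThreshNet; the sketch below takes it as a fixed argument and is purely illustrative.

```python
import numpy as np

def prune_by_threshold(weights, threshold):
    # weights: filter bank with filters along axis 0 (e.g., conv
    # weights of shape [out_channels, in_channels, k, k]).
    # Filters whose L1 norm falls below the threshold are zeroed.
    norms = np.abs(weights).sum(axis=tuple(range(1, weights.ndim)))
    mask = norms >= threshold
    return weights * mask.reshape((-1,) + (1,) * (weights.ndim - 1)), mask
```

In the joint search-and-training loop, the surviving mask defines the current subnetwork, and training continues on it rather than on a pre-trained dense model.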

Journal ArticleDOI
TL;DR: In this paper, a new algorithm for compressing reads in FASTQ files, which can be integrated into various genomic compression tools, is presented; while lossless compression of quality values seems to have reached its limits, much room remains for compressing the reads part.
Abstract: Motivation: Despite significant advances in Third-Generation Sequencing (TGS) technologies, Next-Generation Sequencing (NGS) technologies remain dominant in the current sequencing market, owing to NGS's lower error rates and richer analytical software compared with TGS. NGS technologies generate vast amounts of genomic data, including short reads, quality values, and read identifiers. As a result, efficient compression of such data has become a pressing need, leading to extensive research efforts focused on designing FASTQ compressors. Previous research shows that lossless compression of quality values seems to have reached its limits, but there remains much room for compressing the reads part. Results: By investigating the characteristics of the sequencing process, we present a new algorithm for compressing reads in FASTQ files, which can be integrated into various genomic compression tools. We first reviewed the pipeline of reference-based algorithms and identified three key components that heavily impact storage: the matching positions of reads on the reference sequence (refpos), the mismatched positions of bases on reads (mispos), and the reads that fail to match (unmapseq). To reduce their sizes, we conducted a detailed analysis of the distribution of matching positions and sequencing errors and then developed the three modules of AMGC. According to the experimental results, AMGC outperformed the current state-of-the-art methods, achieving an 81.23% gain in compression ratio on average compared with the second-best-performing compressor.
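A standard trick for shrinking a stream like refpos is to exploit its distribution: sorted match positions have small gaps, so delta coding followed by a variable-length integer encoding packs them tightly. The sketch below illustrates that generic idea, not AMGC's actual codec.

```python
def delta_varint_encode(positions):
    # positions: non-decreasing list of match positions on the reference.
    # Each gap (delta) is written as a LEB128-style varint: 7 payload
    # bits per byte, high bit set on all bytes except the last.
    out = bytearray()
    prev = 0
    for p in positions:
        d = p - prev
        prev = p
        while d >= 0x80:
            out.append((d & 0x7F) | 0x80)
            d >>= 7
        out.append(d)
    return bytes(out)
```

Small gaps cost one byte each, so clustered match positions compress far better than raw fixed-width integers.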