scispace - formally typeset

Showing papers on "FLOPS published in 2023"


Journal ArticleDOI
TL;DR: In this paper, the authors proposed a lightweight improved YOLOv5 (You Only Look Once) algorithm for real-time localization and ripeness detection of tomato fruits, which uses a down-sampling convolutional layer in place of the original focus layer.

10 citations


Journal ArticleDOI
TL;DR: In this article, a split depth-wise transpose attention (SDTA) encoder is introduced that splits input tensors into multiple channel groups and uses depth-wise convolution together with self-attention across channel dimensions to increase the receptive field and encode multi-scale features.
Abstract: In the pursuit of achieving ever-increasing accuracy, large and complex neural networks are usually developed. Such models demand high computational resources and therefore cannot be deployed on edge devices. It is of great interest to build resource-efficient general purpose networks due to their usefulness in several application areas. In this work, we strive to effectively combine the strengths of both CNN and Transformer models and propose a new efficient hybrid architecture EdgeNeXt. Specifically in EdgeNeXt, we introduce a split depth-wise transpose attention (SDTA) encoder that splits input tensors into multiple channel groups and utilizes depth-wise convolution along with self-attention across channel dimensions to implicitly increase the receptive field and encode multi-scale features. Our extensive experiments on classification, detection and segmentation tasks reveal the merits of the proposed approach, outperforming state-of-the-art methods with comparatively lower compute requirements. Our EdgeNeXt model with 1.3M parameters achieves 71.2% top-1 accuracy on ImageNet-1K, outperforming MobileViT with an absolute gain of 2.2% with 28% reduction in FLOPs. Further, our EdgeNeXt model with 5.6M parameters achieves 79.4% top-1 accuracy on ImageNet-1K. The code and models are available at https://t.ly/_Vu9 .

5 citations


Journal ArticleDOI
TL;DR: In this article, a pruning algorithm based on the non-dominated sorting genetic algorithm II (NSGA-II) was proposed to find a pruning configuration that balances the detection accuracy and speed of the pruned model.

4 citations


Journal ArticleDOI
TL;DR: Wang et al. replaced the residual network with ShuffleNet, an extremely computation-efficient Convolutional Neural Network (CNN) architecture, and, to help the network focus on the most useful information, inserted a simple but effective attention module called the convolutional block attention module (CBAM).
Abstract: Lip reading has attracted increasing attention recently due to advances in deep learning. However, most research targets English datasets. The study of Chinese lip-reading technology is still in its initial stage. Firstly, in this paper, we expand the naturally distributed word-level Chinese dataset called ‘Databox’ previously built by our laboratory. Secondly, the current state-of-the-art model consists of a residual network and a temporal convolutional network. The residual network leads to excessive computational cost and is not suitable for the on-device applications. In the new model, the residual network is replaced with ShuffleNet, which is an extremely computation-efficient Convolutional Neural Network (CNN) architecture. Thirdly, to help the network focus on the most useful information, we insert a simple but effective attention module called Convolutional Block Attention Module (CBAM) into the ShuffleNet. In our experiment, we compare several model architectures and find that our model achieves a comparable accuracy to the residual network (3.5 GFLOPs) under the computational budget of 1.01 GFLOPs.

3 citations
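
As a quick check on the budget quoted in the abstract above, the gap between 3.5 GFLOPs and 1.01 GFLOPs can be expressed either as a relative reduction or as a cost ratio. This is only an arithmetic sketch; the helper names are illustrative and do not come from the paper.

```python
def relative_reduction(baseline: float, reduced: float) -> float:
    """Return the fraction of the baseline cost that was removed."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - reduced) / baseline


def cost_ratio(baseline: float, reduced: float) -> float:
    """Return how many times cheaper the reduced model is."""
    if reduced <= 0:
        raise ValueError("reduced must be positive")
    return baseline / reduced


# Figures quoted in the abstract: residual network vs. ShuffleNet-based model.
resnet_gflops = 3.5
shufflenet_gflops = 1.01

print(f"relative reduction: {relative_reduction(resnet_gflops, shufflenet_gflops):.1%}")
print(f"cost ratio: {cost_ratio(resnet_gflops, shufflenet_gflops):.2f}x")
```

With these two figures the reduction is roughly 71%, or about 3.5 times fewer operations. The same helpers apply to the "decreased by N times" style of reporting used in other entries on this page.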


Journal ArticleDOI
01 Feb 2023-Entropy
TL;DR: YOLOv5s-G2 improves feature extraction accuracy by incorporating the Global Attention Mechanism (GAM) module, which extracts relevant information for pedestrian target identification tasks and suppresses irrelevant information.
Abstract: Advanced object detection methods always face high algorithmic complexity or low accuracy when used in pedestrian target detection for the autonomous driving system. This paper proposes a lightweight pedestrian detection approach called the YOLOv5s-G2 network to address these issues. We apply Ghost and GhostC3 modules in the YOLOv5s-G2 network to minimize computational cost during feature extraction while keeping the network’s capability of extracting features intact. The YOLOv5s-G2 network improves feature extraction accuracy by incorporating the Global Attention Mechanism (GAM) module. This application can extract relevant information for pedestrian target identification tasks and suppress irrelevant information, improving the unidentified problem of occluded and small targets by replacing the GIoU loss function used in the bounding box regression with the α-CIoU loss function. The YOLOv5s-G2 network is evaluated on the WiderPerson dataset to ensure its efficacy. Our proposed YOLOv5s-G2 network offers a 1.0% increase in detection accuracy and a 13.2% decrease in Floating Point Operations (FLOPs) compared to the existing YOLOv5s network. As a result, the YOLOv5s-G2 network is preferable for pedestrian identification as it is both more lightweight and more accurate.

3 citations


Journal ArticleDOI
TL;DR: Zhang et al. proposed using tandem stitching to create a new disaster classification network, D-Net (Disaster Classification Net), built from the D-Conv, D-Linear, D-model, and D-Layer modules.
Abstract: As we all know, natural disasters have a great impact on people’s lives and properties, and it is very necessary to deal with disaster categories in a timely and effective manner. In light of this, we propose using tandem stitching to create a new disaster classification network, D-Net (Disaster Classification Net), using the D-Conv, D-Linear, D-model, and D-Layer modules. In the experiments, we compared the proposed method with “CNN” and “Transformer” algorithms. Compared to the CNN algorithms, Disaster Classification Net decreased Params by 26–608 times, decreased FLOPs by up to 21 times, and increased Precision by 1.6%–43.5%; compared to the Transformer algorithms, it decreased Params by 23–149 times, decreased FLOPs by 1.7–10 times, and increased Precision by 3.9%–25.9%. Disaster Classification Net thus achieves state-of-the-art (SOTA) results on the disaster dataset. After that, MobileNet_v2 and the CCT network, the best performers among the above models on the classification dataset, were compared with Disaster Classification Net on the fashion_mnist and CIFAR_100 public datasets, respectively, and the results show that Disaster Classification Net can still achieve state-of-the-art classification results. Therefore, our proposed algorithm can be applied not only to disaster tasks, but also to other classification tasks.

2 citations


Journal ArticleDOI
TL;DR: In this paper, the authors focus on inference costs rather than training costs, as the former account for most of the computing effort, solely because of the multiplicative factors, and the latter is usually accompanied with important energy efficiency optimisations.

2 citations


Journal ArticleDOI
TL;DR: In this article, the authors proposed a lightweight and highly accurate bus detection model based on an improved version of the YOLOv5 model, which integrates GhostConv and C3Ghost modules into the network to reduce the number of parameters and floating-point operations (FLOPs).
Abstract: Object detection is crucial for individuals with visual impairment, especially when waiting for a bus. In this study, we propose a lightweight and highly accurate bus detection model based on an improved version of the YOLOv5 model. We propose integrating the GhostConv and C3Ghost Modules into the YOLOv5 network to reduce the number of parameters and floating-point operations (FLOPs), ensuring detection accuracy while reducing the model parameters. Following that, we added the SimSPPF module to replace the SPPF in the YOLOv5 backbone for increased computational efficiency and accurate object detection capabilities. Finally, we developed a Slim scale detection model by modifying the original YOLOv5 structure in order to make the model more efficient and faster, which is critical for real-time object detection applications. According to the experimental results, the Improved-YOLOv5 outperforms the original YOLOv5 in terms of the precision, recall, and mAP@0.5. Further analysis of the model complexity reveals that the Improved-YOLOv5 is more efficient due to fewer FLOPs, with fewer parameters, less memory usage, and faster inference time capabilities. The proposed model is smaller and more feasible to implement in resource-constrained mobile devices and a promising option for bus detection systems.

2 citations


Journal ArticleDOI
TL;DR: The authors propose adaptive sensitivity-based pruning (ASTER), which dynamically adjusts the pruning threshold concurrently with the training process and achieves scaling-invariance by refraining from modifying unpruned filter weights.
Abstract: Filter pruning is advocated for accelerating deep neural networks without dedicated hardware or libraries, while maintaining high prediction accuracy. Several works have cast pruning as a variant of l1-regularized training, which entails two challenges: 1) the l1-norm is not scaling-invariant (i.e., the regularization penalty depends on weight values) and 2) there is no rule for selecting the penalty coefficient to trade off high pruning ratio for low accuracy drop. To address these issues, we propose a lightweight pruning method termed adaptive sensitivity-based pruning (ASTER) which: 1) achieves scaling-invariance by refraining from modifying unpruned filter weights and 2) dynamically adjusts the pruning threshold concurrently with the training process. ASTER computes the sensitivity of the loss to the threshold on the fly (without retraining); this is carried out efficiently by an application of L-BFGS solely on the batch normalization (BN) layers. It then proceeds to adapt the threshold so as to maintain a fine balance between pruning ratio and model capacity. We have conducted extensive experiments on a number of state-of-the-art CNN models on benchmark datasets to illustrate the merits of our approach in terms of both FLOPs reduction and accuracy. For example, on ILSVRC-2012 our method reduces more than 76% FLOPs for ResNet-50 with only 2.0% Top-1 accuracy degradation, while for the MobileNet v2 model it achieves 46.6% FLOPs Drop with a Top-1 Acc. Drop of only 2.77%. Even for a very lightweight classification model like MobileNet v3-small, ASTER saves 16.1% FLOPs with a negligible Top-1 accuracy drop of 0.03%.

2 citations
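
The abstract above describes pruning filters against a threshold without touching the surviving weights. A minimal sketch of that general idea follows. It assumes each filter has a scalar importance score (for example the magnitude of its BN scale factor, which is an assumption here) and uses a fixed threshold; it does not reproduce ASTER's sensitivity-based threshold adaptation, and all names are hypothetical.

```python
from typing import List, Sequence


def keep_mask(scores: Sequence[float], threshold: float) -> List[bool]:
    """Mark a filter as kept when its importance magnitude exceeds the threshold."""
    return [abs(score) > threshold for score in scores]


def pruning_ratio(mask: Sequence[bool]) -> float:
    """Return the fraction of filters that the mask removes."""
    if not mask:
        raise ValueError("mask must not be empty")
    return 1.0 - sum(mask) / len(mask)


def apply_mask(filters: Sequence[object], mask: Sequence[bool]) -> list:
    """Drop pruned filters; surviving filters are returned unmodified."""
    return [f for f, keep in zip(filters, mask) if keep]


# Hypothetical per-filter importance scores for one layer.
scores = [0.9, 0.05, -0.4, 0.01]
mask = keep_mask(scores, threshold=0.1)
print(mask, pruning_ratio(mask))
```

Because the kept filters are passed through unchanged, rescaling the surviving weights is never part of the pruning step, which is the property the abstract refers to as scaling-invariance.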


Journal ArticleDOI
Rui Tang, Hui Sun, Di Liu, Hui Xu, Miao Qi, Jun Kong 
TL;DR: The authors proposed a new detection head with a double residual branch structure to reduce the delay of a decoupled head and improve detection ability for object detection.
Abstract: Object detection has drawn the attention of many researchers due to its wide application in computer vision-related applications. In this paper, a novel model is proposed for object detection. Firstly, a new neck is designed for the proposed detection model, including an efficient SPPNet (Spatial Pyramid Pooling Network), a modified NLNet (Non Local Network) and a lightweight adaptive feature fusion module. Secondly, the detection head with double residual branch structure is presented to reduce the delay of a decoupled head and improve the detection ability. Finally, these improvements are embedded in YOLOX as plug-and-play modules for forming a high-performance detector, EYOLOX (EfficientYOLOX). Extensive experiments demonstrate that the EYOLOX achieves significant improvements, which increases YOLOX-s from 40.5% to 42.2% AP on the MS COCO dataset with a single GPU. Moreover, the performance of the detection of EYOLOX also outperforms YOLOv6 and some SOTA methods with the same number of parameters and GFLOPs. In particular, EYOLOX has only been trained on the COCO-2017 dataset without using any other datasets, and only the pre-training weights of the backbone part are loaded.

2 citations


Journal ArticleDOI
TL;DR: Zhang et al. proposed a small yet discriminative model called STair Network, which can be stacked towards an accurate multi-stage pose estimation system. It is composed of novel basic feature extraction blocks that focus on promoting feature diversity and obtaining rich local representations with fewer parameters, enabling a satisfactory balance between efficiency and performance.
Abstract: In this paper, we focus on tackling the precise keypoint coordinates regression task. Most existing approaches adopt complicated networks with a large number of parameters, leading to a heavy model with poor cost-effectiveness in practice. To overcome this limitation, we develop a small yet discriminative model called STair Network, which can be simply stacked towards an accurate multi-stage pose estimation system. Specifically, to reduce computational cost, STair Network is composed of novel basic feature extraction blocks which focus on promoting feature diversity and obtaining rich local representations with fewer parameters, enabling a satisfactory balance on efficiency and performance. To further improve the performance, we introduce two mechanisms with negligible computational cost, focusing on feature fusion and replenishment. We demonstrate the effectiveness of the STair Network on two standard datasets, e.g., 1-stage STair Network achieves a higher accuracy than HRNet by 5.5% on COCO test dataset with 80% fewer parameters and 68% fewer GFLOPs.

Journal ArticleDOI
TL;DR: In this paper, a 7-layer simple-structured network called "LiteCNN" with only 176 K parameters and 78.47 M floating point operations (FLOPs) was designed.

Journal ArticleDOI
TL;DR: Zhang et al. proposed a novel filter pruning method for deep learning networks that calculates the learned representation median (RM) in the frequency domain (LRMF) and emphasizes the removal of absolutely unimportant filters in the frequency domain.
Abstract: In this article, we propose a novel filter pruning method for deep learning networks by calculating the learned representation median (RM) in frequency domain (LRMF). In contrast to the existing filter pruning methods that remove relatively unimportant filters in the spatial domain, our newly proposed approach emphasizes the removal of absolutely unimportant filters in the frequency domain. Through extensive experiments, we observed that the criterion for “relative unimportance” cannot be generalized well and that the discrete cosine transform (DCT) domain can eliminate redundancy and emphasize low-frequency representation, which is consistent with the human visual system. Based on these important observations, our LRMF calculates the learned RM in the frequency domain and removes its corresponding filter, since it is absolutely unimportant at each layer. Thanks to this, the time-consuming fine-tuning process is not required in LRMF. The results show that LRMF outperforms state-of-the-art pruning methods. For example, with ResNet110 on CIFAR-10, it achieves a 52.3% FLOPs reduction with an improvement of 0.04% in Top-1 accuracy. With VGG16 on CIFAR-100, it reduces FLOPs by 35.9% while increasing accuracy by 0.5%. On ImageNet, ResNet18 and ResNet50 are accelerated by 53.3% and 52.7% with only 1.76% and 0.8% accuracy loss, respectively. The code is based on PyTorch and is available at https://github.com/zhangxin-xd/LRMF .

Journal ArticleDOI
TL;DR: Wang et al. adopt the k-means++ method to cluster filters with similar features hierarchically in each convolutional layer, and use an improved social group optimization (SGO) algorithm to iteratively search and optimize the compression process of the post-clustered structure to find the optimal compressed structure.


Journal ArticleDOI
TL;DR: Huang et al. proposed an efficient hierarchical hybrid vision Transformer (H2Former) for medical image segmentation, which integrates the merits of CNNs, multi-scale channel attention and Transformers.
Abstract: Accurate medical image segmentation is of great significance for computer aided diagnosis. Although methods based on convolutional neural networks (CNNs) have achieved good results, it is weak to model the long-range dependencies, which is very important for segmentation task to build global context dependencies. The Transformers can establish long-range dependencies among pixels by self-attention, providing a supplement to the local convolution. In addition, multi-scale feature fusion and feature selection are crucial for medical image segmentation tasks, which is ignored by Transformers. However, it is challenging to directly apply self-attention to CNNs due to the quadratic computational complexity for high-resolution feature maps. Therefore, to integrate the merits of CNNs, multi-scale channel attention and Transformers, we propose an efficient hierarchical hybrid vision Transformer (H2Former) for medical image segmentation. With these merits, the model can be data-efficient for limited medical data regime. The experimental results show that our approach exceeds previous Transformer, CNNs and hybrid methods on three 2D and two 3D medical image segmentation tasks. Moreover, it keeps computational efficiency in model parameters, FLOPs and inference time. For example, H2Former outperforms TransUNet by 2.29% in IoU score on KVASIR-SEG dataset with 30.77% parameters and 59.23% FLOPs.
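
The quadratic cost of self-attention on high-resolution feature maps, mentioned in the abstract above, can be made concrete with a rough count. The sketch below is a simplification: it counts only the multiply-accumulates (MACs) of the two matrix products in one attention head over N = height × width tokens, ignoring projections and softmax. The function name is illustrative.

```python
def attention_macs(height: int, width: int, dim: int) -> int:
    """Approximate MACs for Q @ K^T and attn @ V in one self-attention head.

    With N = height * width tokens of size `dim`, each product costs
    N * N * dim MACs, so the total grows quadratically in N.
    """
    tokens = height * width
    return 2 * tokens * tokens * dim


small = attention_macs(32, 32, dim=32)
large = attention_macs(64, 64, dim=32)

# Doubling both spatial sides quadruples N and multiplies the cost by 16.
print(small, large, large // small)
```

This scaling is why hybrid designs typically restrict attention to low-resolution stages or reduce the number of tokens before attending.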

Journal ArticleDOI
TL;DR: Zhang et al. proposed a modified You Only Look Once (YOLO) algorithm, PF-YOLOv4-Tiny, which incorporates spatial pyramidal pooling (SPP) and squeeze-and-excitation (SE) visual attention modules to enhance the target localization capability.
Abstract: Infrared target detection models are more required than ever before to be deployed on embedded platforms, which requires models with less memory consumption and better real-time performance while considering accuracy. To address the above challenges, we propose a modified You Only Look Once (YOLO) algorithm PF-YOLOv4-Tiny. The algorithm incorporates spatial pyramidal pooling (SPP) and squeeze-and-excitation (SE) visual attention modules to enhance the target localization capability. The PANet-based-feature pyramid networks (P-FPN) are proposed to transfer semantic information and location information simultaneously to ameliorate detection accuracy. To lighten the network, the standard convolutions other than the backbone network are replaced with depthwise separable convolutions. In post-processing the images, the soft-non-maximum suppression (soft-NMS) algorithm is employed to subside the missed and false detection problems caused by the occlusion between targets. The accuracy of our model can finally reach 61.75%, while the total Params is only 9.3 M and GFLOPs is 11. At the same time, the inference speed reaches 87 FPS on NVIDIA GeForce GTX 1650 Ti, which can meet the requirements of the infrared target detection algorithm for the embedded deployments.
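
To illustrate why replacing standard convolutions with depthwise separable ones lightens a network, the sketch below compares multiply-accumulate (MAC) counts for a single layer using the usual textbook cost model. Bias terms are ignored, the layer sizes are made up for illustration, and none of the numbers come from the paper.

```python
def standard_conv_macs(h_out: int, w_out: int, c_in: int, c_out: int, k: int) -> int:
    """MACs of a standard k x k convolution producing an h_out x w_out map."""
    return h_out * w_out * c_in * c_out * k * k


def depthwise_separable_macs(h_out: int, w_out: int, c_in: int, c_out: int, k: int) -> int:
    """MACs of a k x k depthwise convolution followed by a 1 x 1 pointwise one."""
    depthwise = h_out * w_out * c_in * k * k
    pointwise = h_out * w_out * c_in * c_out
    return depthwise + pointwise


# Illustrative layer: 56 x 56 output, 64 -> 128 channels, 3 x 3 kernel.
std = standard_conv_macs(56, 56, 64, 128, 3)
sep = depthwise_separable_macs(56, 56, 64, 128, 3)

# The ratio equals 1 / c_out + 1 / k**2 regardless of spatial size.
print(std, sep, sep / std)
```

For a 3 × 3 kernel the separable form costs a little over one ninth of the standard convolution when the output channel count is large.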


Journal ArticleDOI
TL;DR: In this article, an in-memory sparse computing system based on a self-rectifying memristor array is proposed, which achieves an overall performance of ~97 to ~11 TOPS/W for 2- to 8-bit sparse computation when processing practical scientific computing tasks.
Abstract: Memristor-enabled in-memory computing provides an unconventional computing paradigm to surpass the energy efficiency of von Neumann computers. Owing to the limitation of the computing mechanism, while the crossbar structure is desirable for dense computation, the system’s energy and area efficiency degrade substantially in performing sparse computation tasks, such as scientific computing. In this work, we report a high-efficiency in-memory sparse computing system based on a self-rectifying memristor array. This system originates from an analog computing mechanism that is motivated by the device’s self-rectifying nature, which can achieve an overall performance of ~97 to ~11 TOPS/W for 2- to 8-bit sparse computation when processing practical scientific computing tasks. Compared to previous in-memory computing system, this work provides over 85 times improvement in energy efficiency with an approximately 340 times reduction in hardware overhead. This work can pave the road toward a highly efficient in-memory computing platform for high-performance computing.

Journal ArticleDOI
TL;DR: In this article, an analytical model is proposed to estimate the store energy in typical operational conditions, under the assumption that switching time variations follow normal distributions, based on measurements of a real chip fabricated with a 40-nm perpendicular MTJ/CMOS hybrid process.
Abstract: While the spin-transfer torque (STT) magnetic tunnel junction (MTJ) is a promising technique for enabling nonvolatile flip-flops (NVFFs) to perform power gating to reduce leakage power without any data losses, the large store energy (the energy to make a store operation) of MTJs needs to be addressed. The nonvolatile cool mega array series is an edge-oriented coarse-grained reconfigurable accelerator that implements an improved MTJ-based NVFF with a verify-and-retryable store method that should ideally reduce the store energy under the presence of the switching time variation originating from the stochastic nature of the MTJs. However, the energy reduction effect of the method has not been formulated or evaluated thoroughly enough to make the best use of the method in actual applications. In this study, we propose an analytical model to estimate the store energy in typical operational conditions under the assumption of switching time variations following the normal distributions based on the measurements of a real chip fabricated with a 40-nm perpendicular MTJ/CMOS hybrid process. In contrast to the tedious measurement on each different condition, the proposed model allows for an instantaneous determination of the best storing method for minimizing the store energy, with an energy reduction of up to 69% compared with a conventional one-time attempt storing method. This model is expected to be used for system-level energy simulations and, ultimately, for design explorations in pursuit of energy-optimized memory.

Journal ArticleDOI
Jingfei Chang, Yang Lu, Ping Xue, Yiqun Xu, Zhen Wei 
TL;DR: In this article, the authors proposed an iterative clustering pruning method named ICP together with knowledge transfer for channels, in which the intermediate and output features of the original network guide the learning of the compressed network after each pruning step, so that network performance recovers quickly before the next pruning operation.
Abstract: Convolutional neural networks (CNNs) have shown excellent performance in numerous computer vision tasks. However, the high computational and memory demands in computer vision tasks prohibit the practical applications of CNNs on edge computing devices. Existing iterative pruning methods suffer from insufficient accuracy recovery after each pruning, which severely affects the importance evaluation of the parameters. Moreover, channel pruning based on the magnitude of parameters often results in performance loss. In this context, we propose an iterative clustering pruning method named ICP together with knowledge transfer for channels. First, channel clustering pruning is performed based on the similarity between feature maps. Then, the intermediate and output features of the original network are applied to guide the learning of the compressed network after each pruning step to quickly recover the network performance and then implement the next pruning operation. Pruning and knowledge transfer are performed alternately to achieve accurate compression of the convolutional network. Finally, we demonstrate the effectiveness of the proposed method on the CIFAR-10, CIFAR-100, and ILSVRC-2012 datasets by pruning VGGNet, ResNet, and GoogLeNet. Our pruning scheme can typically reduce parameters and Floating-point Operations (FLOPs) of the network without harming accuracy significantly. In addition, the ICP was verified to have good practical generalization by compressing the SSD network on the object detection dataset PASCAL VOC.

Journal ArticleDOI
TL;DR: In this paper, a transmission-friendly vision transformer model, TFormer, is proposed for deployment on resource-constrained IoT devices with the assistance of a cloud server; its hybrid layer can obtain multitype and multiscale features with only a few parameters and FLOPs.
Abstract: Deploying high-performance vision transformer (ViT) models on ubiquitous Internet of Things (IoT) devices to provide high-quality vision services will revolutionize the way we live, work, and interact with the world. Due to the contradiction between the limited resources of IoT devices and resource-intensive ViT models, the use of cloud servers to assist ViT model training has become mainstream. However, due to the larger number of parameters and floating-point operations (FLOPs) of the existing ViT models, the model parameters transmitted by cloud servers are large and difficult to run on resource-constrained IoT devices. To this end, this article proposes a transmission-friendly ViT model, TFormer, for deployment on resource-constrained IoT devices with the assistance of a cloud server. The high performance and small number of model parameters and FLOPs of TFormer are attributed to the proposed hybrid layer and the proposed partially connected feed-forward network (PCS-FFN). The hybrid layer consists of nonlearnable modules and a pointwise convolution, which can obtain multitype and multiscale features with only a few parameters and FLOPs to improve the TFormer performance. The PCS-FFN adopts group convolution to reduce the number of parameters. The key idea of this article is to propose TFormer with few model parameters and FLOPs to facilitate applications running on resource-constrained IoT devices to benefit from the high performance of the ViT models. Experimental results on the ImageNet-1K, MS COCO, and ADE20K datasets for image classification, object detection, and semantic segmentation tasks demonstrate that the proposed model outperforms other state-of-the-art models. Specifically, TFormer-S achieves 5% higher accuracy on ImageNet-1K than ResNet18 with 1.4× fewer parameters and FLOPs.
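
The abstract above notes that group convolution reduces the number of parameters. A minimal sketch of the standard weight count shows the effect: with `groups` groups, each output channel sees only `c_in / groups` input channels. Bias terms are omitted, and the layer sizes are illustrative rather than taken from TFormer.

```python
def conv_params(c_in: int, c_out: int, k: int, groups: int = 1) -> int:
    """Weight count of a k x k convolution with the given number of groups."""
    if c_in % groups != 0 or c_out % groups != 0:
        raise ValueError("channel counts must be divisible by groups")
    return (c_in // groups) * c_out * k * k


# Illustrative 1 x 1 layer with 64 input and 64 output channels.
dense = conv_params(64, 64, k=1)
grouped = conv_params(64, 64, k=1, groups=4)

# Using g groups divides the weight count by g.
print(dense, grouped, dense // grouped)
```

The saving is exactly a factor of `groups`, at the price of removing connections between channels that fall into different groups.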

Journal ArticleDOI
TL;DR: In this paper, a variable-hyperparameter visual transformer architecture is proposed for image inpainting, which divides the feature maps into a variable number of multi-scale patches and distributes them to different heads.
Abstract: Image inpainting has shown a great evolution in the reconstruction of damaged regions or holes since the advent of deep neural networks. Recently, transformers have been used in the field of computer vision to capture global information about the image, which cannot be done with convolutional neural networks due to the limitation of their local receptive fields. Therefore, the transformer may be essential to achieve realistic results when damaged regions cover a large part of the image. However, the quadratic computational and memory costs in the self-attention layer have led to its prohibited usage in high-resolution images and restricted devices, especially for image inpainting when the method must deal with large masks. To overcome this problem, we propose a variable-hyperparameter visual transformer architecture that (i) subdivides the feature maps into a variable number of multi-scale patches, (ii) distributes the feature map into a variable number of heads to balance the complexity of the self-attention operation, and (iii) includes a new strategy based on depth-wise convolution to reduce the number of channels of the feature map sent to each transformer block. We conduct experiments on three datasets from the literature. Our experiments show that our method consistently achieved the best results for the FID and LPIPS metrics on the CelebA dataset. We obtained competitive results for Places2 and Paris StreetView datasets compared to state-of-the-art methods. Moreover, our model presents the best performance in terms of model size, number of parameters, and FLOPS. Our qualitative results indicate that our proposed method is capable of reconstructing semantic content, such as parts of human faces.

Journal ArticleDOI
TL;DR: In this article, the existence and type of Calabi-Yau threefolds realised as Kähler-favourable complete intersections in products of projective spaces (CICYs) are studied.

Journal ArticleDOI
TL;DR: Wang et al. proposed a new detection model for feature extraction based on a multi-headed attention mechanism, which is more competent in performing the task of detecting maize tassels.

Posted ContentDOI
20 Jun 2023
TL;DR: In this paper, Hourenke et al. proposed a lightweight IR small target segmentation network (LW-IRSTNet), which combines regular convolutions, depthwise separable convolutions, atrous convolutions and asymmetric convolutions to form a new lightweight encoding and decoding structure.
Abstract: It is a challenging task to separate infrared (IR) small targets from complex backgrounds quickly and accurately. Many kinds of literature have designed various feature fusion modules to further extract IR small target features. Although these designs are slightly helpful to the improvement of IR small target detection accuracy, they will cause a significant increase in network params and FLOPs. To minimize the computational complexity of the network and achieve industrial implementation while ensuring accuracy, we abandon the complex feature fusion modules and combine regular convolutions, depthwise separable convolutions, atrous convolutions, and asymmetric convolutions modules to form a new lightweight encoding and decoding structure, which is called the lightweight IR small target segmentation network (LW-IRSTNet). In addition, we design the post-processing modules, which include an eight-neighborhood clustering algorithm and an online target feature adjustment strategy. The experimental results show that: 1. The segmentation accuracy metrics of LW-IRSTNet are the same as the best results of 14 state-of-the-art comparative baselines; 2. The params and FLOPs of LW-IRSTNet are only 0.16M and 303M, which is much smaller than the comparison baselines; 3. The post-processing module increases human-machine friendliness and improves the robustness of the algorithm in application deployment. Meanwhile, LW-IRSTNet is deployed on both embedded platforms and the website, further expanding its application scope. Through the ONNX framework, NPU acceleration, and CPU multi-threaded resource utilization, the high-performance inference capability and online dynamic threshold adjustment ability of LW-IRSTNet are realized. The source codes are available at https://github.com/kourenke/LW-IRSTNet .


Journal ArticleDOI
TL;DR: In this paper, a CNN compression technique based on the hierarchical Tucker-2 tensor decomposition is proposed, which makes an important contribution to the field of neural network compression based on low-rank approximations.

Journal ArticleDOI
TL;DR: GhostFaceNets are built on Ghost modules, which use a series of inexpensive linear transformations to extract additional feature maps from a set of intrinsic features, allowing for a more comprehensive representation of the underlying information.
Abstract: The development of deep learning-based biometric models that can be deployed on devices with constrained memory and computational resources has proven to be a significant challenge. Previous approaches to this problem have not prioritized the reduction of feature map redundancy, but the introduction of Ghost modules represents a major innovation in this area. Ghost modules use a series of inexpensive linear transformations to extract additional feature maps from a set of intrinsic features, allowing for a more comprehensive representation of the underlying information. GhostNetV1 and GhostNetV2, both of which are based on Ghost modules, serve as the foundation for a group of lightweight face recognition models called GhostFaceNets. GhostNetV2 expands upon the original GhostNetV1 by adding an attention mechanism to capture long-range dependencies. Evaluation of GhostFaceNets using various benchmarks reveals that these models offer superior performance while requiring a computational complexity of approximately 60-275 MFLOPs. This is significantly lower than that of State-Of-The-Art (SOTA) big convolutional neural network (CNN) models, which can require hundreds of millions of FLOPs. GhostFaceNets trained with the ArcFace loss on the refined MS-Celeb-1M dataset demonstrate SOTA performance on all benchmarks. In comparison to previous SOTA mobile CNNs, GhostFaceNets greatly improve efficiency for face verification tasks. The GhostFaceNets entire code will be made available after publication.
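
The saving from Ghost modules described above can be sketched with the cost model commonly given for them: a regular convolution produces only `c_out / s` intrinsic feature maps, and the remaining maps are generated by cheap `d × d` depthwise operations. The parameter names `s` (ratio) and `d` (cheap kernel size) and the layer sizes are assumptions for illustration, not values from the GhostFaceNets paper.

```python
def conv_macs(h: int, w: int, c_in: int, c_out: int, k: int) -> int:
    """MACs of a standard k x k convolution producing an h x w output map."""
    return h * w * c_in * c_out * k * k


def ghost_module_macs(h: int, w: int, c_in: int, c_out: int, k: int, s: int, d: int) -> int:
    """MACs of a Ghost module: a primary conv plus cheap depthwise operations."""
    if c_out % s != 0:
        raise ValueError("c_out must be divisible by the ratio s")
    intrinsic = c_out // s
    primary = conv_macs(h, w, c_in, intrinsic, k)
    # Each intrinsic map yields (s - 1) extra maps via a d x d depthwise filter.
    cheap = h * w * intrinsic * (s - 1) * d * d
    return primary + cheap


# Illustrative layer: 28 x 28 output, 64 -> 128 channels, 3 x 3 kernels, s = 2.
full = conv_macs(28, 28, 64, 128, 3)
ghost = ghost_module_macs(28, 28, 64, 128, 3, s=2, d=3)
print(full, ghost, full / ghost)
```

When `d` equals `k`, the speed-up is `s * c_in / (s + c_in - 1)`, which approaches `s` for layers with many input channels.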

Journal ArticleDOI
TL;DR: In this paper, the authors proposed a more comprehensive criterion to prune filters, further improving the network compactness while preserving good performance; the criterion is more robust against noise than spatial ones, since the critical clues for pruning are more concentrated after DCT.