HR-NAS: Searching Efficient High-Resolution Neural Architectures with Lightweight Transformers
Mingyu Ding, Xiaochen Lian, Linjie Yang, Peng Wang, Xiaojie Jin, Zhiwu Lu, Ping Luo
pp. 2982–2992
TL;DR
HR-NAS adopts a multi-branch architecture that provides convolutional encoding of multiple feature resolutions, together with an efficient fine-grained search strategy that effectively explores the search space and finds optimal architectures under various tasks and computation budgets.
Abstract
High-resolution representations (HR) are essential for dense prediction tasks such as segmentation, detection, and pose estimation. Learning HR representations is typically ignored in previous Neural Architecture Search (NAS) methods that focus on image classification. This work proposes a novel NAS method, called HR-NAS, which is able to find efficient and accurate networks for different tasks, by effectively encoding multiscale contextual information while maintaining high-resolution representations. In HR-NAS, we renovate the NAS search space as well as its searching strategy. To better encode multiscale image contexts in the search space of HR-NAS, we first carefully design a lightweight transformer, whose computational complexity can be dynamically changed with respect to different objective functions and computation budgets. To maintain high-resolution representations of the learned networks, HR-NAS adopts a multi-branch architecture that provides convolutional encoding of multiple feature resolutions, inspired by HRNet [73]. Last, we propose an efficient fine-grained search strategy to train HR-NAS, which effectively explores the search space and finds optimal architectures given various tasks and computation resources. As shown in Fig. 1 (a), HR-NAS is capable of achieving state-of-the-art trade-offs between performance and FLOPs for three dense prediction tasks and an image classification task, given only small computational budgets. For example, HR-NAS surpasses SqueezeNAS [63], which is specially designed for semantic segmentation, while improving efficiency by 45.9%. Code is available at https://github.com/dingmyu/HR-NAS.
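The abstract's key efficiency lever is a transformer whose cost can be dialed to a computation budget. One common way to achieve this (the paper's actual implementation may differ) is to project the H×W spatial positions down to a small number of tokens s before running self-attention, so attention cost scales with s² rather than (HW)². The following is a minimal NumPy sketch of that idea; the random projection and weight matrices stand in for learned layers, and the helper names (`light_attention`, `s`, `d`) are illustrative, not from the paper:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def light_attention(feats, s, d, rng):
    """Single-head attention over s summary tokens instead of all N positions.

    feats: (N, c) flattened feature map; s: token count (the budget knob);
    d: query/key/value dimension. Weights are random stand-ins for learned layers.
    """
    N, c = feats.shape
    # Project N spatial positions down to s tokens (hypothetical learned projector).
    P = rng.standard_normal((s, N)) / np.sqrt(N)
    tokens = P @ feats                        # (s, c)
    Wq, Wk, Wv = (rng.standard_normal((c, d)) / np.sqrt(c) for _ in range(3))
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    attn = softmax(q @ k.T / np.sqrt(d))      # (s, s) — cost grows with s^2, not N^2
    out = attn @ v                            # (s, d)
    # Broadcast token outputs back to all N spatial positions.
    return P.T @ out                          # (N, d)

rng = np.random.default_rng(0)
feats = rng.standard_normal((64 * 64, 32))    # a 64x64 feature map with 32 channels
out = light_attention(feats, s=8, d=16, rng=rng)
print(out.shape)                              # (4096, 16)
```

Under this scheme, shrinking s (e.g. from 8 to 4) quadratically reduces the attention matrix, which is the kind of knob a NAS objective can trade against accuracy for a given FLOPs budget.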
Citations
Proceedings Article
DaViT: Dual Attention Vision Transformers
TL;DR: This work proposes approaching the problem from an orthogonal angle: exploiting self-attention mechanisms with both "spatial tokens" and "channel tokens", and shows that the two forms of self-attention complement each other.
Proceedings Article
TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation
Wenqiang Zhang, Zilong Huang, Guozhong Luo, Tao Chen, Xinggang Wang, Wenyu Liu, Gang Yu, Chunhua Shen
TL;DR: Experimental results demonstrate that the proposed TopFormer method significantly outperforms CNN- and ViT-based networks across several semantic segmentation datasets and achieves a good trade-off between accuracy and latency.
Proceedings Article
Multi-Scale High-Resolution Vision Transformer for Semantic Segmentation
TL;DR: HRViT as discussed by the authors integrates high-resolution multi-branch architectures with vision transformers to learn semantically rich and spatially-precise multi-scale representations by various branch-block co-optimization techniques.
Journal Article
CLFormer: A Lightweight Transformer Based on Convolutional Embedding and Linear Self-Attention With Strong Robustness for Bearing Fault Diagnosis Under Limited Sample Conditions
TL;DR: This work proposes CLFormer, a lightweight transformer based on convolutional embedding and linear self-attention, providing a feasible strategy for fault diagnosis research aimed at practical deployment.
Proceedings Article
AutoSNN: Towards Energy-Efficient Spiking Neural Networks
TL;DR: This work investigates the design choices used in previous studies in terms of accuracy and number of spikes, points out that they are not best suited for SNNs, and proposes AutoSNN, a spike-aware neural architecture search framework with a search space consisting of architectures free of such undesirable design choices.
References
Proceedings Article
Deep Residual Learning for Image Recognition
TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.
Proceedings Article
Attention is All you Need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
TL;DR: This paper proposes the Transformer, a simple network architecture based solely on attention mechanisms, dispensing with recurrence and convolutions entirely, and achieves state-of-the-art performance on English-to-French translation.
Proceedings Article
ImageNet: A large-scale hierarchical image database
TL;DR: A new database called "ImageNet" is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than current image datasets.
Book Chapter
Microsoft COCO: Common Objects in Context
Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, C. Lawrence Zitnick
TL;DR: A new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding by gathering images of complex everyday scenes containing common objects in their natural context.
Proceedings Article
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TL;DR: BERT as mentioned in this paper pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Related Papers
Efficient Neural Architecture Search on Low-Dimensional Data for OCT Image Segmentation
Nils Gessert, Alexander Schlaefer, et al.