Open Access · Proceedings Article

HR-NAS: Searching Efficient High-Resolution Neural Architectures with Lightweight Transformers

TLDR
HR-NAS adopts a multi-branch architecture that provides convolutional encoding of multiple feature resolutions, together with an efficient fine-grained search strategy that effectively explores the search space and finds optimal architectures under various tasks and computation budgets.
Abstract
High-resolution representations (HR) are essential for dense prediction tasks such as segmentation, detection, and pose estimation. Learning HR representations is typically ignored by previous Neural Architecture Search (NAS) methods, which focus on image classification. This work proposes a novel NAS method, called HR-NAS, which finds efficient and accurate networks for different tasks by effectively encoding multiscale contextual information while maintaining high-resolution representations. In HR-NAS, we renovate both the NAS search space and its searching strategy. To better encode multiscale image contexts in the search space of HR-NAS, we first carefully design a lightweight transformer whose computational complexity can be dynamically adjusted with respect to different objective functions and computation budgets. To maintain high-resolution representations in the learned networks, HR-NAS adopts a multi-branch architecture that provides convolutional encoding of multiple feature resolutions, inspired by HRNet [73]. Finally, we propose an efficient fine-grained search strategy to train HR-NAS, which effectively explores the search space and finds optimal architectures given various tasks and computation resources. As shown in Fig. 1(a), HR-NAS achieves state-of-the-art trade-offs between performance and FLOPs for three dense prediction tasks and an image classification task, given only small computational budgets. For example, HR-NAS surpasses SqueezeNAS [63], which is specially designed for semantic segmentation, while improving efficiency by 45.9%. Code is available at https://github.com/dingmyu/HR-NAS.
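To make the abstract's lightweight transformer concrete, here is a minimal PyTorch sketch of the general idea: pool the feature map down to a small, configurable set of tokens, run self-attention over them, and fuse the attended context back at full resolution, so the attention cost is governed by the token count rather than the image size. This is our own illustration under stated assumptions (the class name and the `num_tokens` knob are hypothetical), not the authors' module; see the linked repository for the actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LightweightTransformer(nn.Module):
    """Hypothetical sketch: self-attention over a reduced set of tokens."""

    def __init__(self, channels: int, num_tokens: int = 16, heads: int = 2):
        super().__init__()
        # Illustrative simplifications: num_tokens must be a perfect
        # square, and channels must be divisible by heads.
        self.side = int(num_tokens ** 0.5)
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        s = self.side
        # Pool the map to s*s tokens so attention costs O((s*s)^2 * c).
        tokens = F.adaptive_avg_pool2d(x, (s, s)).flatten(2).transpose(1, 2)
        t = self.norm(tokens)
        out, _ = self.attn(t, t, t)
        # Broadcast the attended context back and fuse it residually.
        ctx = out.transpose(1, 2).reshape(b, c, s, s)
        ctx = F.interpolate(ctx, size=(h, w), mode="bilinear",
                            align_corners=False)
        return x + ctx
```

Because attention runs over `num_tokens` tokens instead of all h*w positions, shrinking `num_tokens` directly reduces FLOPs, which is exactly the kind of knob a search strategy can tune against a computation budget.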



Citations
Proceedings Article

DaViT: Dual Attention Vision Transformers

TL;DR: This work proposes approaching the problem from an orthogonal angle: exploiting self-attention mechanisms with both "spatial tokens" and "channel tokens", and shows that these two self-attentions complement each other.
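As a rough illustration of what "channel tokens" means, the sketch below computes self-attention across channels rather than spatial positions: each channel's flattened spatial map acts as one token, so the attention matrix captures channel-to-channel affinities. This is our own hedged reading of the summary (identity projections, single head), not DaViT's exact formulation.

```python
import torch

def channel_self_attention(x: torch.Tensor) -> torch.Tensor:
    """Toy channel attention; x has shape (batch, channels, pixels)."""
    b, c, n = x.shape
    scale = n ** -0.5                          # scale by token length
    attn = (x @ x.transpose(-2, -1)) * scale   # (b, c, c) channel affinities
    attn = attn.softmax(dim=-1)
    return attn @ x                            # mix information across channels
```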
Proceedings Article

TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation

TL;DR: Experimental results demonstrate that the proposed TopFormer method significantly outperforms CNN- and ViT-based networks across several semantic segmentation datasets and achieves a good trade-off between accuracy and latency.
Proceedings Article

Multi-Scale High-Resolution Vision Transformer for Semantic Segmentation

TL;DR: HRViT integrates high-resolution multi-branch architectures with vision transformers to learn semantically rich and spatially precise multi-scale representations through various branch-block co-optimization techniques.
Journal Article

CLFormer: A Lightweight Transformer Based on Convolutional Embedding and Linear Self-Attention With Strong Robustness for Bearing Fault Diagnosis Under Limited Sample Conditions

TL;DR: This work proposes a lightweight transformer based on convolutional embedding and linear self-attention, called CLFormer, providing a feasible strategy for fault diagnosis aimed at practical deployment.
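For context, linear self-attention replaces the softmax over the full n-by-n score matrix with kernel feature maps on the queries and keys, so K^T V can be aggregated once and the cost becomes linear in sequence length. The sketch below shows one common formulation (elu(x) + 1 feature maps, as in kernelized-attention work); it is an illustration only, and CLFormer's exact variant may differ.

```python
import torch
import torch.nn.functional as F

def linear_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
                     eps: float = 1e-6) -> torch.Tensor:
    """q, k, v: (batch, seq_len, dim). Cost is O(n * dim^2), not O(n^2 * dim)."""
    q = F.elu(q) + 1.0          # positive feature map phi(x) = elu(x) + 1
    k = F.elu(k) + 1.0
    kv = k.transpose(-2, -1) @ v                            # (batch, dim, dim)
    z = q @ k.sum(dim=-2, keepdim=True).transpose(-2, -1)   # normalizer
    return (q @ kv) / (z + eps)
```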
Proceedings Article

AutoSNN: Towards Energy-Efficient Spiking Neural Networks

TL;DR: This work investigates the design choices used in previous studies in terms of accuracy and number of spikes, points out that they are not well suited to SNNs, and proposes AutoSNN, a spike-aware neural architecture search framework whose search space consists of architectures without these undesirable design choices.
References
Proceedings Article

Deep Residual Learning for Image Recognition

TL;DR: In this article, the authors propose a residual learning framework to ease the training of networks substantially deeper than those used previously; an ensemble of these residual nets won 1st place in the ILSVRC 2015 classification task.
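The core idea is compact enough to sketch: the stacked layers learn a residual F(x) that is added to an identity shortcut, so the block only has to model the deviation from identity. Below is a minimal PyTorch basic block for the equal-channel, stride-1 case; our own sketch, not the paper's exact configuration.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic residual block: output = relu(x + F(x))."""

    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(x + self.body(x))   # identity shortcut + residual
```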
Proceedings Article

Attention is All you Need

TL;DR: This paper proposes the Transformer, a simple network architecture based solely on attention mechanisms, dispensing with recurrence and convolutions entirely, and achieves state-of-the-art performance on English-to-French translation.
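The Transformer's core operation is scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V. The helper below is a minimal sketch of that formula (the function name is ours).

```python
import math
import torch

def scaled_dot_product_attention(q: torch.Tensor, k: torch.Tensor,
                                 v: torch.Tensor) -> torch.Tensor:
    """q, k: (..., seq_len, d_k); v: (..., seq_len, d_v)."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)  # scaled dot products
    return scores.softmax(dim=-1) @ v                  # weighted sum of values
```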
Proceedings Article

ImageNet: A large-scale hierarchical image database

TL;DR: A new database called "ImageNet" is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, far larger in scale and diversity, and more accurate, than existing image datasets.
Book Chapter

Microsoft COCO: Common Objects in Context

TL;DR: A new dataset that aims to advance the state of the art in object recognition by placing it in the context of the broader question of scene understanding, gathering images of complex everyday scenes containing common objects in their natural context.
Proceedings Article

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

TL;DR: BERT pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, and can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.