Proceedings ArticleDOI

Accelerate Non-unit Stride Convolutions with Winograd Algorithms

TLDR
In this paper, a universal approach is proposed to construct non-unit stride convolution algorithms for any given stride and filter sizes from Winograd algorithms, which otherwise cannot be applied directly to accelerate non-unit stride convolutions.
Abstract
While computer vision tasks target increasingly challenging scenarios, the need for real-time processing of images rises as well, requiring more efficient methods to accelerate convolutional neural networks. For unit stride convolutions, we use FFT-based methods and Winograd algorithms to compute matrix convolutions, which effectively lower the computing complexity by reducing the number of multiplications. For non-unit stride convolutions, we usually cannot directly apply those algorithms to accelerate the computations. In this work, we propose a novel universal approach to construct non-unit stride convolution algorithms for any given stride and filter sizes from Winograd algorithms. Specifically, we first demonstrate the steps to decompose an arbitrary convolutional kernel and apply the Winograd algorithms separately to compute non-unit stride convolutions. We then present the derivation of this method and a proof by construction to confirm the validity of this approach. Finally, we discuss the minimum number of multiplications and additions necessary for non-unit stride convolutions and evaluate the performance of the decomposed Winograd algorithms. From our analysis of the computational complexity, the new approach can benefit from 1.5x to 3x fewer multiplications. In our experiments on real DNN layers, we acquired around 1.3x speedup (T_old / T_new) of the Winograd algorithms against the conventional convolution algorithm in various experiment settings.
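The decomposition described in the abstract can be sketched in one dimension: a stride-s convolution splits into s interleaved unit-stride sub-convolutions whose results are summed, and each sub-convolution then becomes a candidate for a standard Winograd kernel. The NumPy sketch below is our minimal illustration of that construction, not the paper's implementation; conv1d_valid stands in for a Winograd-accelerated routine such as F(2, 3).

```python
import numpy as np

def conv1d_valid(x, g):
    """Unit-stride 'valid' cross-correlation; stands in for a
    Winograd-accelerated kernel in a real implementation."""
    n = len(x) - len(g) + 1
    return np.array([np.dot(x[i:i + len(g)], g) for i in range(n)])

def strided_conv_decomposed(x, g, s):
    """Stride-s convolution via decomposition: split the input and the
    filter into s interleaved sub-sequences and sum the s unit-stride
    sub-convolutions (each of which is Winograd-friendly)."""
    n_out = (len(x) - len(g)) // s + 1
    y = np.zeros(n_out)
    for p in range(s):                  # phase p = 0 .. s-1
        x_p = x[p::s]                   # every s-th input sample
        g_p = g[p::s]                   # every s-th filter tap
        if len(g_p) == 0:
            continue
        y += conv1d_valid(x_p, g_p)[:n_out]
    return y

# Check against a direct stride-s convolution.
x, g, s = np.random.randn(16), np.random.randn(5), 2
direct = np.array([np.dot(x[i:i + len(g)], g)
                   for i in range(0, len(x) - len(g) + 1, s)])
assert np.allclose(strided_conv_decomposed(x, g, s), direct)
```

The identity behind the sketch is that the stride-s output y[n] = sum_k g[k] x[sn + k] regroups, by k = sm + p, into s unit-stride correlations of the decimated sequences x_p and g_p.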


Citations
Journal ArticleDOI

Evaluating FFT-based algorithms for strided convolutions on ARMv8 architectures

TL;DR: Experimental results with convolutional layers in popular CNNs show that the rearrangement-based method generally exceeds the sampling-based one under the same optimizations in most cases, and that both methods achieve much better performance than GEMM-based ones when the kernels, feature maps, and batch sizes are large.
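The two families of methods named in this TL;DR can be contrasted in a few lines. The sketch below is our 1D illustration using NumPy's FFT, not code from the paper: the sampling-based method computes a unit-stride FFT convolution and discards the outputs a stride-s convolution skips, while the rearrangement-based method splits the input and kernel into s interleaved sub-sequences first, so no work is wasted.

```python
import numpy as np

def fft_conv1d_valid(x, g):
    """Unit-stride 'valid' cross-correlation computed via FFT."""
    n = len(x) + len(g) - 1
    X = np.fft.rfft(x, n)
    G = np.fft.rfft(g[::-1], n)       # reversed taps: correlation via convolution
    full = np.fft.irfft(X * G, n)
    return full[len(g) - 1 : len(x)]

def strided_conv_sampling(x, g, s):
    """Sampling-based: run a full unit-stride FFT convolution,
    then keep every s-th output (the rest is wasted work)."""
    return fft_conv1d_valid(x, g)[::s]

def strided_conv_rearranged(x, g, s):
    """Rearrangement-based: split input and kernel into s interleaved
    sub-sequences, convolve each pair, and sum; nothing is discarded."""
    n_out = (len(x) - len(g)) // s + 1
    y = np.zeros(n_out)
    for p in range(s):
        g_p = g[p::s]
        if len(g_p):
            y += fft_conv1d_valid(x[p::s], g_p)[:n_out]
    return y

x, g, s = np.random.randn(64), np.random.randn(7), 2
assert np.allclose(strided_conv_sampling(x, g, s),
                   strided_conv_rearranged(x, g, s))
```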
Journal ArticleDOI

RiSA: A Reinforced Systolic Array for Depthwise Convolutions and Embedded Tensor Reshaping

TL;DR: RiSA as mentioned in this paper improves the area and energy efficiency for MobileNet-V1 inference by 1.91× and 1.31×, respectively, compared to Eyeriss v2.
Journal ArticleDOI

Low-Cost Online Convolution Checksum Checker

TL;DR: ConvGuard as discussed by the authors utilizes a newly introduced invariance condition of convolution to predict the output checksum using only the pixels at the border of the input image, which reduces the power required for accumulating the input pixels without requiring large buffers to hold intermediate checksum results.
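The TL;DR does not reproduce the invariance condition itself. The following NumPy/SciPy sketch shows a checksum invariance of this kind, as our reconstruction under standard convolution definitions rather than ConvGuard's exact formulation: for a full convolution the output checksum factorizes as sum(y) = sum(x) * sum(k), and for a 'valid' convolution only pixels near the input border deviate from that factorized weight, so the correction involves border pixels alone.

```python
import numpy as np
from scipy.signal import convolve2d

x = np.random.randn(8, 8)   # input image
k = np.random.randn(3, 3)   # convolution kernel
H, W = x.shape
r, c = k.shape

# Full convolution: every input pixel meets every kernel tap exactly
# once, so the output checksum factorizes.
y_full = convolve2d(x, k, mode='full')
assert np.isclose(y_full.sum(), x.sum() * k.sum())

# 'valid' convolution: interior pixels still carry weight k.sum();
# only pixels near the border carry partial kernel sums, so the
# checksum is predictable from the factorized term plus a
# border-only correction.
y_valid = convolve2d(x, k, mode='valid')
weights = convolve2d(np.ones((H - r + 1, W - c + 1)),
                     k[::-1, ::-1], mode='full')   # per-pixel weight map
assert np.isclose(y_valid.sum(), (weights * x).sum())
```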
Journal ArticleDOI

FPGA-based reflection image removal using cognitive neural networks

TL;DR: This paper generalizes the assumptions behind reflection-removal methods, which typically rely on additional information or impose new limitations, with an FPGA implementation aimed at effective energy utilization; the proposed algorithm is shown to be effective compared with current CNN-based approaches.
References
Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Posted Content

Deep Residual Learning for Image Recognition

TL;DR: This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
Journal ArticleDOI

ImageNet classification with deep convolutional neural networks

TL;DR: A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.
Proceedings ArticleDOI

Fully convolutional networks for semantic segmentation

TL;DR: The key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.
Proceedings ArticleDOI

Rethinking the Inception Architecture for Computer Vision

TL;DR: In this article, the authors explore ways to scale up networks that aim at utilizing the added computation as efficiently as possible, through suitably factorized convolutions and aggressive regularization.