Journal ArticleDOI

A High-Performance VLSI Architecture for a Self-Feedback Convolutional Neural Network

TL;DR: This brief studies the problem of developing an area-time efficient VLSI architecture for a novel self-feedback Convolutional Neural Network (CNN) and presents an efficient systolic array architecture for the self-feedback CNN with low on-chip memory requirement.
Abstract: This brief studies the problem of developing an area-time efficient VLSI architecture for a novel self-feedback Convolutional Neural Network (CNN). Self-feedback CNNs offer the promise of high-precision object detection amidst occlusions. However, the size of a typical network required for practical applications presents a challenge for embedded system development. We first present the structure of the self-feedback CNN. We then present an efficient systolic array architecture for the self-feedback CNN with low on-chip memory requirement. The self-feedback CNN has been tested on the KITTI benchmark dataset, and it achieves high accuracy for detecting occluded cyclists and pedestrians. FPGA implementation of the proposed architecture on a Xilinx Virtex-7 XC7VX485T achieves roughly 1.14 Tera Operations per second (TOP/s) at 386 MHz with a 9× reduction in on-chip memory requirement compared to recent CNN architectures.
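
The key technique named in the abstract is the systolic array: operands stream between neighboring processing elements (PEs), so each value fetched from memory is reused across the array, which is what drives the on-chip memory reduction. As a rough illustration only (the paper's actual array dimensions and dataflow are not reproduced here), a minimal Python sketch of an output-stationary systolic matrix multiply:

```python
import numpy as np

def systolic_matmul(A, B):
    """Cycle-by-cycle sketch of an output-stationary systolic array
    computing C = A @ B. PE (i, j) owns output C[i, j]; row i of A
    streams in from the left and column j of B from the top, each
    skewed by one cycle per PE, so matching operands meet at PE (i, j)
    at cycle t = s + i + j. Dimensions and dataflow are illustrative
    assumptions, not the paper's design."""
    m, k = A.shape
    _, n = B.shape
    C = np.zeros((m, n))
    for t in range(m + n + k - 2):        # last operands meet at t = m+n+k-3
        for i in range(m):
            for j in range(n):
                s = t - i - j             # dot-product index arriving now
                if 0 <= s < k:
                    C[i, j] += A[i, s] * B[s, j]
    return C

A, B = np.random.rand(4, 6), np.random.rand(6, 3)
assert np.allclose(systolic_matmul(A, B), A @ B)
```

Convolution maps onto the same array by lowering it to matrix multiplication (im2col) or by streaming feature-map windows, so a single fetch of each weight is reused across many outputs.
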
Citations
Journal ArticleDOI
TL;DR: CASSANN-v2 as mentioned in this paper is a high-performance reconfigurable CNN accelerator architecture that achieves 1 TOPS peak performance at 1 GHz and reaches 98.59% and 90.20% average processing-element utilization on VGG-16 and ResNet-18 inference, respectively.
Abstract: This work proposes a high-performance reconfigurable CNN accelerator architecture, called CASSANN-v2, which can achieve 1 TOPS peak performance at 1 GHz. CASSANN-v2 provides real-time adaptive tuning of the on-chip SRAM memory through parameter configuration, reducing intermediate output data transmission to further exploit the acceleration performance. System simulation results show that CASSANN-v2 exhibits excellent performance on VGG-16 and ResNet-18 inference, with throughputs of 1009.54 GOPS and 923.24 GOPS at 1 GHz, achieving 98.59% and 90.20% average processing-element utilization, respectively. Compared with state-of-the-art accelerators, CASSANN-v2 improves resource utilization by 2.02× on VGG-16 and 2.35× on ResNet-18.

3 citations
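
As a sanity check on the quoted figures: PE utilization is sustained throughput divided by peak throughput, and the numbers are consistent with a true peak of 1024 GOPS (512 MACs × 2 operations × 1 GHz, rounded to "1 TOPS"). The 512-MAC peak is an inference from the arithmetic, not a figure stated above:

```python
# PE utilization = sustained throughput / peak throughput.
# Assumption (inferred, not stated above): the "1 TOPS" peak is
# 512 MAC units x 2 ops (multiply + add) x 1 GHz = 1024 GOPS.
PEAK_GOPS = 512 * 2 * 1.0

for net, gops in [("VGG-16", 1009.54), ("ResNet-18", 923.24)]:
    print(f"{net}: {gops / PEAK_GOPS:.2%}")
# VGG-16:    98.59%  (matches the reported figure)
# ResNet-18: 90.16%  (vs. the reported 90.20%, a rounding-level gap)
```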

References
Proceedings Article
04 Sep 2014
TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Abstract: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.

55,235 citations
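
The depth-from-small-filters idea is concrete in the VGG-16 configuration: thirteen 3×3 convolutional layers plus three fully connected layers give the 16 weight layers the abstract refers to, and stacking n 3×3 convolutions covers a (2n+1)×(2n+1) receptive field with fewer parameters than one large filter. A short sketch (channel counts follow the published configuration D; the helper name is ours):

```python
# VGG-16 configuration D: thirteen 3x3 conv layers plus three fully
# connected layers = 16 weight layers. "M" marks a 2x2 max-pool.
VGG16_CFG = [64, 64, "M",
             128, 128, "M",
             256, 256, 256, "M",
             512, 512, 512, "M",
             512, 512, 512, "M"]

def stacked_3x3_receptive_field(n):
    """n stacked 3x3 convs (stride 1) see a (2n + 1) x (2n + 1) window:
    two layers emulate a 5x5 filter and three emulate 7x7, with fewer
    parameters (e.g. 2 * 9C^2 = 18C^2 weights vs. 25C^2 for one 5x5)."""
    return 2 * n + 1

assert stacked_3x3_receptive_field(2) == 5
assert stacked_3x3_receptive_field(3) == 7
```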


"A High-Performance VLSI Architectur..." refers background or methods in this paper

  • ...We present an area-time efficient systolic array-based architecture for the proposed self-feedback CNN. Prior works on hardware realization of deep networks are largely limited to those for the classical CNN base networks: [17] and [18] for VGG-16, [2] and [19] for DarkNet [20]....


  • ...We compare the base network architecture of the self-feedback CNN with the base network architectures, VGG-16 [2], DarkNet [20], MFFD-B [21] and SqueezeNet [22], of the recent object detectors [3], [20], [21] and [23] respectively....


  • ...In particular, the CNN models have achieved performance comparable to humans in object recognition [2] and detection [3] tasks irrespective of object position, scaling, rotation and lighting variability....


Journal ArticleDOI
TL;DR: The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) as mentioned in this paper is a benchmark in object category classification and detection on hundreds of object categories and millions of images, which has been run annually from 2010 to present, attracting participation from more than fifty institutions.
Abstract: The ImageNet Large Scale Visual Recognition Challenge is a benchmark in object category classification and detection on hundreds of object categories and millions of images. The challenge has been run annually from 2010 to present, attracting participation from more than fifty institutions. This paper describes the creation of this benchmark dataset and the advances in object recognition that have been possible as a result. We discuss the challenges of collecting large-scale ground truth annotation, highlight key breakthroughs in categorical object recognition, provide a detailed analysis of the current state of the field of large-scale image classification and object detection, and compare the state-of-the-art computer vision accuracy with human accuracy. We conclude with lessons learned in the 5 years of the challenge, and propose future directions and improvements.

30,811 citations

Journal ArticleDOI
TL;DR: This work introduces a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals, and further merges RPN and Fast R-CNN into a single network by sharing their convolutional features.
Abstract: State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet [1] and Fast R-CNN [2] have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position. The RPN is trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. We further merge RPN and Fast R-CNN into a single network by sharing their convolutional features; using the recently popular terminology of neural networks with 'attention' mechanisms, the RPN component tells the unified network where to look. For the very deep VGG-16 model [3], our detection system has a frame rate of 5 fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007, 2012, and MS COCO datasets with only 300 proposals per image. In ILSVRC and COCO 2015 competitions, Faster R-CNN and RPN are the foundations of the 1st-place winning entries in several tracks. Code has been made publicly available.

26,458 citations
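
The abstract describes the RPN as a fully convolutional head that predicts objectness scores and box coordinates at every feature-map position. A minimal NumPy sketch of the two sibling prediction heads (the anchor count k = 9 follows the paper; the spatial size and random weights are placeholders, and `features` stands in for the output of the RPN's shared 3×3 convolution):

```python
import numpy as np

# Sketch of the RPN's sibling prediction heads. A 1x1 conv is just a
# per-position matmul, so for k anchors per position the heads emit:
#   - 2k objectness scores (object vs. background)
#   - 4k box-regression offsets
H, W, C, k = 38, 50, 512, 9              # spatial size is illustrative
features = np.random.rand(H, W, C)       # post-3x3-conv features (stand-in)
w_cls = np.random.rand(C, 2 * k)         # 1x1 conv weights, cls head
w_reg = np.random.rand(C, 4 * k)         # 1x1 conv weights, reg head

objectness = features @ w_cls            # (H, W, 2k) scores
box_deltas = features @ w_reg            # (H, W, 4k) offsets
print(objectness.shape, box_deltas.shape)  # (38, 50, 18) (38, 50, 36)
```

Because both heads run at every position of the shared feature map, proposal generation costs only a few extra convolutions, which is the "nearly cost-free" claim in the abstract.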
