Proceedings Article

Understanding the effective receptive field in deep convolutional neural networks

05 Dec 2016 - Vol. 29, pp 4898-4906
TL;DR: The notion of an effective receptive field size is introduced, and it is shown that it both has a Gaussian distribution and only occupies a fraction of the full theoretical receptive field size.
Abstract: We study characteristics of receptive fields of units in deep convolutional networks. The receptive field size is a crucial issue in many visual tasks, as the output must respond to large enough areas in the image to capture information about large objects. We introduce the notion of an effective receptive field size, and show that it both has a Gaussian distribution and only occupies a fraction of the full theoretical receptive field size. We analyze the effective receptive field in several architecture designs, and the effect of sub-sampling, skip connections, dropout and nonlinear activations on it. This leads to suggestions for ways to address its tendency to be too small.
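The measurement behind these claims can be reproduced with a few lines of autograd code. The sketch below is not the authors' released code; under the assumption of a PyTorch model and illustrative input sizes, it places a unit gradient at the centre output unit and averages the resulting input-gradient magnitudes over random inputs, which is the effective-receptive-field measure the abstract describes.

```python
import torch
import torch.nn as nn

# Minimal sketch: accumulate |d y_center / d x| over random inputs to
# estimate the effective receptive field of the centre output unit.
def effective_receptive_field(model, input_size=(1, 3, 224, 224), n_samples=32):
    erf = torch.zeros(input_size[2], input_size[3])
    for _ in range(n_samples):
        x = torch.randn(input_size, requires_grad=True)       # random input
        y = model(x)                                          # feature map (N, C, H, W)
        grad = torch.zeros_like(y)
        grad[:, :, y.shape[2] // 2, y.shape[3] // 2] = 1.0    # unit gradient at the centre unit
        y.backward(grad)
        erf += x.grad.abs().sum(dim=(0, 1))                   # gradient magnitude per input pixel
    return erf / n_samples

# Example: a stack of ten 3x3 convolutions; the measured ERF is roughly
# Gaussian and much smaller than the theoretical receptive field.
model = nn.Sequential(*[nn.Conv2d(3 if i == 0 else 16, 16, 3, padding=1) for i in range(10)])
erf = effective_receptive_field(model)
```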
Citations
Proceedings ArticleDOI
17 Mar 2017
TL;DR: Deformable convolutional networks, as discussed by the authors, augment the spatial sampling locations in their modules with additional offsets learned from the target tasks without additional supervision; the new modules can readily replace their plain counterparts in existing CNNs and can be easily trained end-to-end by standard back-propagation.
Abstract: Convolutional neural networks (CNNs) are inherently limited to model geometric transformations due to the fixed geometric structures in their building modules. In this work, we introduce two new modules to enhance the transformation modeling capability of CNNs, namely, deformable convolution and deformable RoI pooling. Both are based on the idea of augmenting the spatial sampling locations in the modules with additional offsets and learning the offsets from the target tasks, without additional supervision. The new modules can readily replace their plain counterparts in existing CNNs and can be easily trained end-to-end by standard back-propagation, giving rise to deformable convolutional networks. Extensive experiments validate the performance of our approach. For the first time, we show that learning dense spatial transformation in deep CNNs is effective for sophisticated vision tasks such as object detection and semantic segmentation. The code is released at https://github.com/msracver/Deformable-ConvNets.
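As a concrete illustration of the offset mechanism described above, the sketch below pairs a plain convolution that predicts per-location 2D offsets with torchvision's DeformConv2d. Module and variable names are illustrative, not the released Deformable-ConvNets code.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformableBlock(nn.Module):
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        # 2 offset values (dx, dy) per kernel element per output location
        self.offset_conv = nn.Conv2d(in_ch, 2 * k * k, k, padding=k // 2)
        nn.init.zeros_(self.offset_conv.weight)   # start from the regular sampling grid
        nn.init.zeros_(self.offset_conv.bias)
        self.deform_conv = DeformConv2d(in_ch, out_ch, k, padding=k // 2)

    def forward(self, x):
        offsets = self.offset_conv(x)             # learned from the task, no extra supervision
        return self.deform_conv(x, offsets)       # convolution sampled at shifted locations

x = torch.randn(1, 16, 32, 32)
y = DeformableBlock(16, 32)(x)                    # same spatial size, 32 output channels
```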

3,318 citations

Proceedings ArticleDOI
18 Apr 2019
TL;DR: KPConv is a new design of point convolution that operates directly on point clouds without any intermediate representation and outperforms state-of-the-art classification and segmentation approaches on several datasets.
Abstract: We present Kernel Point Convolution (KPConv), a new design of point convolution, i.e. that operates on point clouds without any intermediate representation. The convolution weights of KPConv are located in Euclidean space by kernel points, and applied to the input points close to them. Its capacity to use any number of kernel points gives KPConv more flexibility than fixed grid convolutions. Furthermore, these locations are continuous in space and can be learned by the network. Therefore, KPConv can be extended to deformable convolutions that learn to adapt kernel points to local geometry. Thanks to a regular subsampling strategy, KPConv is also efficient and robust to varying densities. Whether they use deformable KPConv for complex tasks, or rigid KPConv for simpler tasks, our networks outperform state-of-the-art classification and segmentation approaches on several datasets. We also offer ablation studies and visualizations to provide understanding of what has been learned by KPConv and to validate the descriptive power of deformable KPConv.
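The following self-contained sketch illustrates the core idea of a rigid kernel point convolution for a single centre point. It is a simplification for exposition only (the released KPConv uses radius neighborhoods, grid subsampling and carefully initialized kernel point dispositions), and all names and constants here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SimpleKPConv(nn.Module):
    def __init__(self, in_dim, out_dim, n_kernel_pts=15, sigma=0.3):
        super().__init__()
        # Kernel points live in 3D Euclidean space around the centre point;
        # they stay fixed here, which corresponds to the "rigid" variant.
        self.kernel_pts = nn.Parameter(torch.randn(n_kernel_pts, 3) * sigma,
                                       requires_grad=False)
        self.weights = nn.Parameter(torch.randn(n_kernel_pts, in_dim, out_dim) * 0.1)
        self.sigma = sigma

    def forward(self, center, neighbors, feats):
        # center: (3,), neighbors: (N, 3), feats: (N, in_dim)
        rel = neighbors - center                                   # neighbour offsets
        dist = torch.cdist(rel, self.kernel_pts)                   # (N, K) point-to-kernel distances
        influence = torch.clamp(1.0 - dist / self.sigma, min=0.0)  # linear correlation kernel
        weighted = influence.t() @ feats                           # (K, in_dim): features gathered per kernel point
        return torch.einsum('kd,kdo->o', weighted, self.weights)   # sum over kernel points -> (out_dim,)

pts = torch.randn(50, 3) * 0.2
out = SimpleKPConv(8, 16)(torch.zeros(3), pts, torch.randn(50, 8))  # (16,)
```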

1,742 citations


Cites methods from "Understanding the effective receptive field in deep convolutional neural networks"

  • ...To apprehend the differences between the representations learned by rigid and deformable KPConv, we can compute its Effective Receptive Field (ERF) [21] at different locations....

  • ...We use Effective Receptive Field (ERF) [21] and ablation studies to compare rigid KPConv with deformable KPConv....

  • ...The ERF is a measure of the influence that each input point has on the result of a KPConv layer at a particular location....

  • ...We see that rigid KPConv ERF has a relatively consistent range on every type of object, whereas deformable KPConv ERF seems to adapt to the object size....

  • ...As we can see in Figure 7, the ERF varies depending on the object it is centered on....

Proceedings ArticleDOI
15 Jun 2019
TL;DR: This work presents a reformulation of Deformable Convolutional Networks that improves their ability to focus on pertinent image regions through increased modeling power and stronger training, and guides network training via a proposed feature-mimicking scheme that helps the network learn features reflecting the object focus and classification power of R-CNN features.
Abstract: The superior performance of Deformable Convolutional Networks arises from its ability to adapt to the geometric variations of objects. Through an examination of its adaptive behavior, we observe that while the spatial support for its neural features conforms more closely than regular ConvNets to object structure, this support may nevertheless extend well beyond the region of interest, causing features to be influenced by irrelevant image content. To address this problem, we present a reformulation of Deformable ConvNets that improves its ability to focus on pertinent image regions, through increased modeling power and stronger training. The modeling power is enhanced through a more comprehensive integration of deformable convolution within the network, and by introducing a modulation mechanism that expands the scope of deformation modeling. To effectively harness this enriched modeling capability, we guide network training via a proposed feature mimicking scheme that helps the network to learn features that reflect the object focus and classification power of R-CNN features. With the proposed contributions, this new version of Deformable ConvNets yields significant performance gains over the original model and produces leading results on the COCO benchmark for object detection and instance segmentation.
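The modulation mechanism mentioned above can be sketched with torchvision's DeformConv2d, which accepts an optional per-location mask. The block below uses illustrative names rather than the authors' implementation: a single convolution predicts both the sampling offsets and sigmoid-gated modulation scalars that can suppress irrelevant image content.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class ModulatedDeformableBlock(nn.Module):
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        # 2*k*k offset channels (dx, dy) plus k*k modulation channels
        self.offset_mask = nn.Conv2d(in_ch, 3 * k * k, k, padding=k // 2)
        nn.init.zeros_(self.offset_mask.weight)
        nn.init.zeros_(self.offset_mask.bias)
        self.deform_conv = DeformConv2d(in_ch, out_ch, k, padding=k // 2)
        self.k = k

    def forward(self, x):
        out = self.offset_mask(x)
        offsets = out[:, :2 * self.k * self.k]
        mask = torch.sigmoid(out[:, 2 * self.k * self.k:])   # modulation scalars in [0, 1]
        return self.deform_conv(x, offsets, mask)

y = ModulatedDeformableBlock(16, 32)(torch.randn(1, 16, 32, 32))
```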

1,373 citations


Cites background or methods from "Understanding the effective receptive field in deep convolutional neural networks"

  • ...To better understand the behavior of Deformable ConvNets, we visualize the spatial support of network nodes by their effective receptive fields [30], effective sampling locations, and error-bounded saliency regions....

  • ...The differences in these contributions are represented by an effective receptive field, whose values are calculated as the gradient of the node response with respect to intensity perturbations of each image pixel [30]....

  • ...Recent works on effective receptive fields [30] and salient regions [40, 41, 12, 6] reveal that only a small proportion of pixels in the theoretical receptive field contribute significantly to the final network prediction....

Book ChapterDOI
08 Sep 2018
TL;DR: A novel Receptive Field Block (RFB) module is proposed, which takes the relationship between the size and eccentricity of RFs into account to enhance feature discriminability and robustness.
Abstract: Current top-performing object detectors depend on deep CNN backbones, such as ResNet-101 and Inception, benefiting from their powerful feature representations but suffering from high computational costs. Conversely, some lightweight model based detectors fulfil real time processing, while their accuracies are often criticized. In this paper, we explore an alternative to build a fast and accurate detector by strengthening lightweight features using a hand-crafted mechanism. Inspired by the structure of Receptive Fields (RFs) in human visual systems, we propose a novel RF Block (RFB) module, which takes the relationship between the size and eccentricity of RFs into account, to enhance the feature discriminability and robustness. We further assemble RFB to the top of SSD, constructing the RFB Net detector. To evaluate its effectiveness, experiments are conducted on two major benchmarks and the results show that RFB Net is able to reach the performance of advanced very deep detectors while keeping the real-time speed. Code is available at https://github.com/ruinmessi/RFBNet.
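To make the size/eccentricity relation concrete, here is a condensed sketch of an RFB-style block in PyTorch. Channel sizes and branch settings are illustrative assumptions rather than the released RFBNet configuration, but the structure follows the description above: parallel branches pair larger kernels with larger dilation rates, their outputs are concatenated and fused with a shortcut.

```python
import torch
import torch.nn as nn

class SimpleRFB(nn.Module):
    def __init__(self, ch):
        super().__init__()
        def branch(k, d):
            return nn.Sequential(
                nn.Conv2d(ch, ch // 4, 1),                              # bottleneck
                nn.Conv2d(ch // 4, ch // 4, k, padding=k // 2),         # kernel size ~ RF size
                nn.Conv2d(ch // 4, ch // 4, 3, padding=d, dilation=d))  # dilation ~ eccentricity
        self.branches = nn.ModuleList([branch(1, 1), branch(3, 3), branch(5, 5)])
        self.fuse = nn.Conv2d(3 * (ch // 4), ch, 1)

    def forward(self, x):
        out = self.fuse(torch.cat([b(x) for b in self.branches], dim=1))
        return torch.relu(out + x)                                      # residual shortcut

y = SimpleRFB(64)(torch.randn(1, 64, 32, 32))                           # same shape as the input
```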

1,028 citations

Book ChapterDOI
08 Sep 2018
TL;DR: The point-wise spatial attention network (PSANet) is proposed to relax the local neighborhood constraint; it achieves top performance on various competitive scene parsing datasets, including ADE20K, PASCAL VOC 2012 and Cityscapes, demonstrating its effectiveness and generality.
Abstract: We notice information flow in convolutional neural networks is restricted inside local neighborhood regions due to the physical design of convolutional filters, which limits the overall understanding of complex scenes. In this paper, we propose the point-wise spatial attention network (PSANet) to relax the local neighborhood constraint. Each position on the feature map is connected to all the other ones through a self-adaptively learned attention mask. Moreover, information propagation in bi-direction for scene parsing is enabled. Information at other positions can be collected to help the prediction of the current position and vice versa, information at the current position can be distributed to assist the prediction of other ones. Our proposed approach achieves top performance on various competitive scene parsing datasets, including ADE20K, PASCAL VOC 2012 and Cityscapes, demonstrating its effectiveness and generality.
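A drastically simplified sketch of the point-wise idea is given below, assuming PyTorch and a fixed feature-map size. The real PSANet predicts over-complete, position-relative masks in separate collect and distribute branches, so this only illustrates the "every position attends to every other position" connectivity that relaxes the local neighborhood constraint.

```python
import torch
import torch.nn as nn

class SimplePointwiseAttention(nn.Module):
    def __init__(self, ch, h, w):
        super().__init__()
        self.n = h * w
        # For each position, predict one attention weight per other position.
        self.mask_pred = nn.Conv2d(ch, self.n, 1)
        self.proj = nn.Conv2d(ch, ch, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        attn = self.mask_pred(x).view(b, self.n, self.n)   # attn[b, j, i]: weight position i gives to position j
        attn = torch.softmax(attn, dim=1)                   # normalize over source positions j
        feats = self.proj(x).view(b, c, self.n)             # (B, C, HW)
        collected = torch.bmm(feats, attn)                  # each position collects from all positions
        return x + collected.view(b, c, h, w)               # residual fusion with the local features

y = SimplePointwiseAttention(32, 16, 16)(torch.randn(2, 32, 16, 16))
```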

940 citations


Cites background from "Understanding the effective receptive field in deep convolutional neural networks"

  • ...Owing to the design of CNN structures, the receptive field of it is limited to local regions [47,27]....