Learning Hierarchical Features for Scene Labeling
References
Gradient-based learning applied to document recognition
Fast approximate energy minimization via graph cuts
Efficient Graph-Based Image Segmentation
Contour Detection and Hierarchical Image Segmentation
Dimensionality Reduction by Learning an Invariant Mapping
Frequently Asked Questions (18)
Q2. What is the main challenge of scene parsing?
One challenge of scene parsing is that it combines the traditional problems of detection, segmentation, and multi-label recognition in a single process.
Q3. What is the originality of the approach?
The originality of the approach is that the feature vector of the combination of two segments is computed from the feature vectors of the individual segments through a trainable function.
Q4. What is the main idea of scene parsing?
One important step toward understanding an image is to perform full-scene labeling, also known as scene parsing, which consists of labeling every pixel in the image with the category of the object it belongs to.
Q5. What is the aggregated feature vector of each grid cell?
The aggregated feature vector of each grid cell is computed by a component-wise max pooling of the feature vectors centered on all the pixels that fall into the grid cell.
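As a minimal sketch of this aggregation step, the following toy code (shapes and function names are illustrative, not the authors' implementation) max-pools per-pixel feature vectors, component-wise, into the cells of a fixed spatial grid:

```python
import numpy as np

def grid_max_pool(features, grid=(2, 2)):
    """features: (H, W, D) array of per-pixel feature vectors.
    Returns a (grid[0], grid[1], D) array where each cell holds the
    component-wise max over all pixels that fall into that cell."""
    H, W, D = features.shape
    gh, gw = grid
    out = np.empty((gh, gw, D))
    for i in range(gh):
        for j in range(gw):
            # pixels falling into grid cell (i, j)
            cell = features[i * H // gh:(i + 1) * H // gh,
                            j * W // gw:(j + 1) * W // gw]
            out[i, j] = cell.reshape(-1, D).max(axis=0)
    return out

feats = np.random.rand(8, 8, 16)
pooled = grid_max_pool(feats, grid=(2, 2))
print(pooled.shape)  # (2, 2, 16)
```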
Q6. What scales are used to generate feature vectors?
Hence with three scales, each feature vector has multiple fields which encode multiple regions of increasing sizes and decreasing resolutions, centered on the same pixel location.
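The multiscale idea can be sketched as follows; here a trivial box filter stands in for the shared feature extractor, so this is only an illustration of the pyramid-and-concatenate structure, not the authors' ConvNet:

```python
import numpy as np

def box_features(img):
    # toy shared "network": 3x3 local mean as a single-channel feature
    pad = np.pad(img, 1, mode='edge')
    return sum(pad[i:i + img.shape[0], j:j + img.shape[1]]
               for i in range(3) for j in range(3)) / 9.0

def multiscale_features(img, scales=(1, 2, 4)):
    """Apply the same extractor at each scale, upsample the outputs back
    to full resolution, and concatenate them per pixel."""
    H, W = img.shape
    maps = []
    for s in scales:
        small = img[::s, ::s]                            # naive downsampling
        f = box_features(small)
        up = np.repeat(np.repeat(f, s, axis=0), s, axis=1)[:H, :W]
        maps.append(up)
    return np.stack(maps, axis=-1)                       # (H, W, n_scales)

img = np.random.rand(16, 16)
F = multiscale_features(img)
print(F.shape)  # (16, 16, 3)
```

Each pixel thus carries fields computed from progressively larger, lower-resolution neighborhoods centered on the same location.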
Q7. How can the authors parse an image of size 320×240 in less than one second?
Exploiting the parallel structure of this special network, by computing convolutions in parallel, allows us to parse an image of size 320×240 in less than one second on a 4-core Intel i7 laptop.
Q8. What is the first representation of an image patch?
In the first representation, an image patch is seen as a point in R^P, and the authors seek to find a transform f : R^P → R^Q that maps each patch into R^Q, a space where it can be classified linearly.
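A toy illustration of this view, with a stand-in for f (a fixed random projection plus ReLU, where a trained network would be) followed by a linear classifier in R^Q; all names and dimensions here are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
P, Q = 27, 8                        # e.g. a 3x3x3 patch mapped to an 8-dim feature
W_f = rng.standard_normal((Q, P))   # parameters of f (would be learned)
w_c = rng.standard_normal(Q)        # linear classifier acting on f's output

def f(patch):
    """Map a patch (a point in R^P) into R^Q."""
    return np.maximum(W_f @ patch, 0.0)

patch = rng.standard_normal(P)
score = w_c @ f(patch)              # linear decision taken in R^Q
print(f(patch).shape)  # (8,)
```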
Q9. What is the way to improve the performance of a feedforward pixel labeling system?
Relying heavily on a highly accurate feed-forward pixel labeling system, while simplifying the postprocessing module to its bare minimum, cuts down inference time considerably.
Q10. What is the representation of the segmentation component?
Each segmentation component is represented by the set of feature vectors that fall into it: the component is encoded by a spatial grid of aggregated feature vectors.
Q11. What is the way to train a multiscale model?
This multiscale model, in which weights are shared across scales, allows the model to capture long-range interactions, without the penalty of extra parameters to train.
Q12. What is the way to reduce the set of components?
A classical technique to reduce the set of components is to consider a hierarchy of segmentations [33], [1], that can be represented as a tree T .
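A hierarchy of segmentations can be sketched as a tree whose leaves are superpixels and whose internal nodes cover the union of their children; the class below is a hypothetical illustration of that structure, not the representation used in the paper:

```python
class SegNode:
    """A node in a segmentation tree T: either a leaf component
    (a set of pixel ids) or a merge of child components."""
    def __init__(self, pixels=None, children=()):
        self.children = list(children)
        self.pixels = set(pixels) if pixels else set()
        for c in self.children:
            self.pixels |= c.pixels   # a node covers the union of its children

# leaves are superpixels; the root covers the whole image
leaves = [SegNode({0, 1}), SegNode({2, 3}), SegNode({4, 5})]
mid = SegNode(children=leaves[:2])
root = SegNode(children=[mid, leaves[2]])
print(sorted(root.pixels))  # [0, 1, 2, 3, 4, 5]
```

Restricting candidate components to the nodes of such a tree keeps the set of segments to score small while still offering components at many granularities.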
Q13. What is the attention function a used to mask the feature vector map with each component Ck?
The authors define a simple attention function a, used to mask the feature vector map with each component Ck, producing a set of K masked feature vector patterns {F ∩ Ck}, ∀k ∈ {1, …, K}.
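The masking step can be sketched as follows, with each component Ck given as a boolean pixel mask; the function name and shapes are illustrative assumptions:

```python
import numpy as np

def mask_features(F, components):
    """F: (H, W, D) feature vector map; components: list of (H, W) boolean
    masks. Returns one masked copy of F per component: feature vectors
    outside C_k are zeroed out."""
    return [F * c[..., None] for c in components]

H, W, D = 4, 4, 2
F = np.ones((H, W, D))
C1 = np.zeros((H, W), dtype=bool); C1[:2] = True   # top half of the image
C2 = ~C1                                           # bottom half
masked = mask_features(F, [C1, C2])
print(masked[0][:2].sum(), masked[0][2:].sum())  # 16.0 0.0
```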
Q14. How is the model able to accurately locate and delineate objects?
Invariance is usually achieved by using pooling/subsampling layers, which in turn degrades the ability of the model to precisely locate and delineate objects.
Q15. What is the way to analyze a family of segmentations?
It can be used as a solution to the first problem exposed above: assuming the capability of assessing the quality of all the components in this family of segmentations, a system can automatically choose its components so as to produce the best set of predictions.
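One way to picture this selection is a greedy cover: given a score for each candidate component, pick high-scoring, mutually disjoint components until every pixel is covered. This is a hedged illustration of the idea, not the paper's exact optimization procedure:

```python
def greedy_cover(components, scores, n_pixels):
    """components: list of frozensets of pixel ids; scores: per-component
    quality. Greedily selects disjoint components, best-scoring first,
    until all n_pixels pixels are covered."""
    order = sorted(range(len(components)), key=lambda k: -scores[k])
    covered, chosen = set(), []
    for k in order:
        if components[k] & covered:
            continue                 # overlaps an already-chosen component
        chosen.append(k)
        covered |= components[k]
        if len(covered) == n_pixels:
            break
    return chosen

comps = [frozenset({0, 1}), frozenset({2, 3}), frozenset({0, 1, 2, 3})]
scores = [0.9, 0.8, 0.5]
print(greedy_cover(comps, scores, 4))  # [0, 1]
```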
Q16. How do the authors train the classifier c to predict the distribution of classes in the training set?
The authors construct the segmentation collections on the entire training set and, for each segmentation T in the family, train the classifier c to predict the distribution of classes in each component Ck ∈ T, as well as the costs Sk.
Q17. What is the general case of a segmentation tree?
In the simplest case, this family might be a segmentation tree; in the most general case it can be any set of segmentations, for example a collection of superpixels either produced using the same algorithm with different parameter tunings or produced by different algorithms.
Q18. How do the authors get feature vectors in F?
As described in Section 3.1, feature vectors in F are obtained by concatenating the outputs of multiple networks fs, each taking as input a different image in a multiscale pyramid.