Open Access · Journal Article

DeepPrimitive: Image decomposition by layered primitive detection

TL;DR
This paper builds a framework to detect primitives from images in a layered manner by modifying the YOLO network, and uses an RNN with a novel loss function to equip this network with the capability to predict primitives with a variable number of parameters.
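The abstract does not spell out how primitives are encoded for the modified YOLO network. As a rough illustration only, the sketch below shows standard YOLO-style target assignment on an S × S grid, assuming each primitive is summarized by a class index and an axis-aligned bounding box. `Primitive`, `encode_yolo_targets`, and `to_parent_frame` are hypothetical names. The layering helper (re-expressing child primitives in their parent's crop so the same detector could be run again on that crop) is an assumption about what "layered" detection might involve, not the paper's confirmed method.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


@dataclass
class Primitive:
    # Hypothetical primitive record: class index plus an axis-aligned box.
    cls: int
    box: Box


Cell = Optional[Tuple[float, float, float, float, List[float]]]


def encode_yolo_targets(prims: List[Primitive], img_w: float, img_h: float,
                        S: int, num_classes: int) -> List[List[Cell]]:
    """YOLO-style targets: the cell containing a box center is responsible for it.

    Each occupied cell holds (x_off, y_off, w, h, one_hot), where the offsets
    locate the center inside the cell and w, h are normalized by image size.
    """
    grid: List[List[Cell]] = [[None] * S for _ in range(S)]
    cell_w, cell_h = img_w / S, img_h / S
    for p in prims:
        x0, y0, x1, y1 = p.box
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        col = min(int(cx // cell_w), S - 1)
        row = min(int(cy // cell_h), S - 1)
        one_hot = [0.0] * num_classes
        one_hot[p.cls] = 1.0
        # Simplification (as in YOLOv1): one object per cell, later ones are dropped.
        if grid[row][col] is None:
            grid[row][col] = (cx / cell_w - col, cy / cell_h - row,
                              (x1 - x0) / img_w, (y1 - y0) / img_h, one_hot)
    return grid


def to_parent_frame(box: Box, parent: Box) -> Box:
    """Hypothetical layering step: express a child box in its parent's crop."""
    px0, py0, _, _ = parent
    x0, y0, x1, y1 = box
    return (x0 - px0, y0 - py0, x1 - px0, y1 - py0)
```

For example, a 700 × 700 image with S = 7 gives 100-pixel cells, so a box centered at (150, 250) lands in row 2, column 1 with center offsets (0.5, 0.5).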
Abstract
The perception of the visual world through basic building blocks, such as cubes, spheres, and cones, gives human beings a parsimonious understanding of the visual world. Efforts to find primitive-based geometric interpretations of visual data therefore date back to studies of visual media in the 1970s. However, because primitive fitting was difficult before the deep learning era, this research direction faded from the main stage, and the vision community turned primarily to semantic image understanding. In this paper, we revisit the classical problem of building geometric interpretations of images using supervised deep learning tools. We build a framework that detects primitives from images in a layered manner by modifying the YOLO network; an RNN with a novel loss function then equips this network with the capability to predict primitives with a variable number of parameters. We compare our pipeline to traditional and other baseline learning methods, demonstrating that our layered detection model achieves higher accuracy and produces better reconstructions.
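The abstract states that an RNN with a novel loss lets the network predict primitives with a variable number of parameters, but does not give that loss. One common way to handle variable-length outputs, shown here as an assumed stand-in rather than the paper's loss, is to unroll the decoder for a fixed maximum number of steps, have each step emit a parameter value and a stop logit, mask the regression loss beyond the true length, and supervise the stop flag with binary cross-entropy. `variable_length_loss`, `greedy_decode`, `stop_weight`, and the `step` callback are hypothetical; in a real model `step` would wrap a recurrent cell such as a GRU.

```python
import math
from typing import Any, Callable, List, Sequence, Tuple


def softplus(z: float) -> float:
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def bce_with_logit(z: float, y: float) -> float:
    # Binary cross-entropy of sigmoid(z) against a target y in {0, 1}.
    return softplus(z) - y * z


def variable_length_loss(pred_params: Sequence[float], stop_logits: Sequence[float],
                         target: Sequence[float], stop_weight: float = 1.0) -> float:
    """Hypothetical loss: masked MSE on parameters plus BCE on stop flags."""
    n = len(target)
    if not 1 <= n <= len(pred_params) or len(stop_logits) != len(pred_params):
        raise ValueError("need 1 <= len(target) <= number of decoder steps")
    # Regress only the n steps that correspond to real parameters; steps >= n are masked.
    mse = sum((pred_params[t] - target[t]) ** 2 for t in range(n)) / n
    # The stop flag should fire only at the last real parameter.
    bce = sum(bce_with_logit(stop_logits[t], 1.0 if t == n - 1 else 0.0)
              for t in range(n)) / n
    return mse + stop_weight * bce


def greedy_decode(step: Callable[[Any], Tuple[float, float, Any]],
                  state: Any, max_steps: int = 16) -> List[float]:
    """Emit parameters until the stop probability exceeds 0.5 (logit > 0)."""
    params: List[float] = []
    for _ in range(max_steps):
        value, stop_logit, state = step(state)
        params.append(value)
        if stop_logit > 0.0:
            break
    return params
```

At inference, decoding ends at the first step whose stop probability exceeds 0.5, so different primitive types can yield parameter lists of different lengths from the same decoder.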



Citations
Posted Content

ParSeNet: A Parametric Surface Fitting Network for 3D Point Clouds

TL;DR: A novel, end-to-end trainable, deep network called ParSeNet is proposed that decomposes a 3D point cloud into parametric surface patches, including B-spline patches as well as basic geometric primitives, and allows us to represent surfaces with higher fidelity.
Book Chapter

ParSeNet: A Parametric Surface Fitting Network for 3D Point Clouds

TL;DR: ParSeNet decomposes a 3D point cloud into parametric surface patches, including B-spline patches as well as basic geometric primitives, to represent surfaces with higher fidelity.
Proceedings Article

UCSG-Net -- Unsupervised Discovering of Constructive Solid Geometry Tree

TL;DR: UCSG-Net, a model that extracts a CSG parse tree without any supervision, predicts the parameters of primitives and binarizes their SDF representation through a differentiable indicator function; the predicted parse tree representation is shown to be interpretable and usable in CAD software.
Book Chapter

Beyond Fixed Grid: Learning Geometric Image Representation with a Deformable Grid

TL;DR: DefGrid predicts location offsets for the vertices of a 2-dimensional triangular grid such that the edges of the deformed grid align with image boundaries; the resulting representation can be used for unsupervised image partitioning.
References
Proceedings Article

You Only Look Once: Unified, Real-Time Object Detection

TL;DR: Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background, and outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.
Journal Article

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

TL;DR: This work introduces a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, enabling nearly cost-free region proposals; RPN and Fast R-CNN are further merged into a single network by sharing their convolutional features.
Posted Content

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

TL;DR: Faster R-CNN proposes a Region Proposal Network (RPN) to generate high-quality region proposals, which are used by Fast R-CNN for detection.
Proceedings Article

Learning Phrase Representations using RNN Encoder--Decoder for Statistical Machine Translation

TL;DR: In this paper, the encoder and decoder of the RNN Encoder-Decoder model are jointly trained to maximize the conditional probability of a target sequence given a source sequence.
Book Chapter

SSD: Single Shot MultiBox Detector

TL;DR: The approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location, which makes SSD easy to train and straightforward to integrate into systems that require a detection component.