Proceedings ArticleDOI

DANIEL: A Deep Architecture for Automatic Analysis and Retrieval of Building Floor Plans

01 Nov 2017, pp. 420-425

TL;DR: This paper proposes Deep Architecture for fiNdIng alikE Layouts (DANIEL), a novel deep learning framework for retrieving similar floor plan layouts from a repository, and introduces ROBIN, a new complex dataset with three broad categories and 510 real-world floor plans.

Abstract: Automatically finding existing building layouts in a repository helps an architect reuse designs and complete projects on time. In this paper, we propose Deep Architecture for fiNdIng alikE Layouts (DANIEL). Using DANIEL, an architect can search an existing repository of layouts (floor plans) and give accurate recommendations to buyers. DANIEL can also recommend to a property buyer, given a floor plan image, a rank-ordered list of similar layouts. DANIEL is based on the deep learning paradigm and extracts both low- and high-level semantic features from a layout image. The key contributions of the proposed approach are: (i) a novel deep learning framework to retrieve similar floor plan layouts from a repository; (ii) an analysis of the effect of individual deep convolutional neural network layers on the floor plan retrieval task; and (iii) the creation of ROBIN (Repository Of BuildIng plaNs), a new complex dataset with three broad categories and 510 real-world floor plans. We have evaluated DANIEL through extensive experiments on ROBIN and compared our results with eight different state-of-the-art methods to demonstrate DANIEL's effectiveness in challenging scenarios.
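To make the retrieval paradigm concrete, the following is a minimal sketch, not the authors' DANIEL implementation: deep CNN features are extracted from each floor plan image and candidates are ranked by cosine similarity to the query. It assumes PyTorch with a recent torchvision, an ImageNet-pretrained ResNet-18 backbone, and placeholder file paths.

```python
# Hedged sketch: CNN features + cosine-similarity ranking for floor plan retrieval.
# This is NOT the authors' DANIEL implementation; it only illustrates the paradigm.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Pretrained backbone used purely as a feature extractor (assumption: ImageNet weights).
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()          # keep the 512-d pooled features
backbone.eval()

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed(path: str) -> torch.Tensor:
    """L2-normalized feature vector for one floor plan image."""
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    return torch.nn.functional.normalize(backbone(x), dim=1).squeeze(0)

def rank(query_path: str, repo_paths: list[str]) -> list[tuple[str, float]]:
    """Rank repository plans by cosine similarity to the query plan."""
    q = embed(query_path)
    scored = [(p, float(q @ embed(p))) for p in repo_paths]
    return sorted(scored, key=lambda t: t[1], reverse=True)   # most similar first
```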



Citations
Journal ArticleDOI
TL;DR: An examination of the practices of researchers in this field, offering insights into how document analysis systems are built, leads to the conclusion that the tools that are used, and the related issues, have become more and more complex over time.
Abstract: As the use of deep methods becomes widespread in the scientific community, causing major changes in system architecture and in where knowledge acquisition takes place, we report here our insights about how document analysis systems are built. Where does the expertise really lie? In the features, in the decision-making step, in the system design, in the data illustrating the problem to be solved? The examination of the practices of researchers in this field, and their evolution, allows us to conclude that the tools that are used, and the related issues, have become more and more complex over time. Nevertheless, human skill is needed to activate these tools and to imagine new ones.

15 citations

Book ChapterDOI
23 Aug 2020
TL;DR: This work presents a new method for vectorization of technical line drawings, such as floor plans, architectural drawings, and 2D CAD images, that quantitatively and qualitatively outperforms a number of existing techniques on a collection of representative technical drawings.
Abstract: We present a new method for vectorization of technical line drawings, such as floor plans, architectural drawings, and 2D CAD images. Our method includes (1) a deep learning-based cleaning stage to eliminate the background and imperfections in the image and fill in missing parts, (2) a transformer-based network to estimate vector primitives, and (3) optimization procedure to obtain the final primitive configurations. We train the networks on synthetic data, renderings of vector line drawings, and manually vectorized scans of line drawings. Our method quantitatively and qualitatively outperforms a number of existing techniques on a collection of representative technical drawings.
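As a purely structural illustration of the three-stage pipeline described above (cleaning, primitive estimation, optimization), the sketch below wires placeholder stages together over a simple line-primitive type. The stage internals are assumptions standing in for the paper's networks and optimizer, not their implementation.

```python
# Structural sketch only: cleaning -> primitive estimation -> refinement.
# Stage bodies are placeholders, not the paper's models.
from dataclasses import dataclass
import numpy as np

@dataclass
class LinePrimitive:
    x0: float
    y0: float
    x1: float
    y1: float
    width: float

def clean(raster: np.ndarray) -> np.ndarray:
    """Stage 1 (assumed): remove background and noise, fill gaps. Placeholder: binarize."""
    return (raster > 0.5).astype(np.float32)

def estimate_primitives(raster: np.ndarray) -> list[LinePrimitive]:
    """Stage 2 (assumed): predict an initial set of vector primitives.
    Placeholder: a single line spanning the image diagonal."""
    h, w = raster.shape
    return [LinePrimitive(0.0, 0.0, float(w - 1), float(h - 1), width=1.0)]

def refine(primitives: list[LinePrimitive], raster: np.ndarray) -> list[LinePrimitive]:
    """Stage 3 (assumed): optimize primitive parameters against the cleaned raster.
    Placeholder: return the primitives unchanged."""
    return primitives

def vectorize(raster: np.ndarray) -> list[LinePrimitive]:
    cleaned = clean(raster)
    return refine(estimate_primitives(cleaned), cleaned)
```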

14 citations


Additional excerpts

  • ..., SESYD [8], ROBIN [37], and FPLAN-POLY [34])....


Proceedings ArticleDOI
01 Mar 2018
TL;DR: It is demonstrated that the proposed end-to-end framework for first-person-vision-based textual description synthesis of building floor plans gives state-of-the-art performance on challenging, real-world floor plan images.
Abstract: We focus on the synthesis of a textual description from a given building floor plan image, based on the first-person vision perspective. Tasks such as symbol spotting, wall and decor segmentation, and semantic and perceptual segmentation have been performed on floor plans in the past. Here, for the first time, we propose an end-to-end framework for first-person-vision-based textual description synthesis of building floor plans. We demonstrate, qualitatively and quantitatively, that the proposed framework gives state-of-the-art performance on challenging, real-world floor plan images. Potential applications of this work include understanding floor plans, stability analysis of buildings, and retrieval.

7 citations


Cites methods from "DANIEL: A Deep Architecture for Aut..."

  • ...We performed our experiment on the ROBIN dataset proposed in [18] to show the effectiveness of our proposed method....


Journal ArticleDOI
TL;DR: A novel algorithm is proposed to extract high-level semantic features from an architectural floor plan, with fine-grained retrieval performed using a weighted sum of the features so that one feature can be given more preference over others during retrieval.
Abstract: Due to the massive growth of the real estate industry, there is an increase in the number of online platforms designed for finding homes and furnished properties. Instead of descriptive words, query by example is always a preferred method for retrieval. Floor plans are the basic 2D representation giving an idea of the building structure at a particular level. The authors propose a framework for the retrieval of similar architectural floor plans under the query-by-example paradigm. They propose a novel algorithm to extract high-level semantic features from an architectural floor plan. Fine-grained retrieval using a weighted sum of the features is proposed, where a feature can be given more preference over others during retrieval. Experiments were performed on a publicly available dataset containing 510 floor plans and compared with existing state-of-the-art techniques. Their proposed method outperforms the others in both qualitative and quantitative terms.
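The weighted-sum idea can be shown in a few lines. The sketch below is an illustration of the general scheme, not the authors' algorithm: each plan is described by several feature vectors, per-feature similarities are computed independently, and user-chosen weights combine them so one feature type can dominate the ranking. The feature names and weights are hypothetical.

```python
# Hedged sketch of weighted-sum retrieval over multiple feature types.
import numpy as np

def similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def weighted_score(query: dict[str, np.ndarray],
                   candidate: dict[str, np.ndarray],
                   weights: dict[str, float]) -> float:
    """Weighted sum of per-feature similarities; larger weight = more preference."""
    return sum(w * similarity(query[name], candidate[name])
               for name, w in weights.items())

# Example usage with hypothetical feature types and weights:
weights = {"room_layout": 0.5, "decor": 0.3, "boundary_shape": 0.2}
```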

6 citations

Journal ArticleDOI
TL;DR: A critical review of past and present hospital layout modelling techniques discusses their capabilities and limitations, and enables readers to consider ethical values while critiquing the epistemology of computational processes hidden beneath algorithmic outputs.
Abstract: Purpose: This paper reviews an area of interdisciplinary collaboration in the design of healthcare facilities that attempts to optimize hospital space-planning using automated statistical techniques from the discipline of Operations Research (OR). This review articulates Facility Layout Problems (FLPs) as a general class of OR problems. Furthermore, the review highlights limitations of these techniques, which necessitate an ethical and participatory engagement with computerized processes of healthcare architecture. Design/methodology/approach: An in-depth critical review was carried out, which revealed a number of common themes, collectively theorized as metamodeling processes, or models of models, through which various FLP modelling techniques can be challenged and debated in terms of their architectural viability and ethical ramifications. Findings: This review provides a methodological basis for the further evaluation of computational models. It was found that most of the reviewed studies are functionally focused on flow efficiency and, in general, do not consider broader contextual, relational, social, or salutogenic design values. Originality/value: This review is the first on the subject written from an architectural perspective. It can be used by a broad range of readers as its critical review of past and present hospital layout modelling techniques discusses their capabilities and limitations. As such, it also enables them to consider ethical values while critiquing the epistemology of computational processes hidden beneath algorithmic outputs.

6 citations


References
Proceedings Article
03 Dec 2012
TL;DR: State-of-the-art ImageNet classification performance was achieved by a deep convolutional neural network, as discussed by the authors, which consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
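The architecture described above can be written down compactly. The following is a minimal PyTorch sketch of that layout (five convolutional layers, interleaved max-pooling, three fully-connected layers, dropout, 1000-way output); layer sizes follow the commonly cited AlexNet configuration and it is an illustration, not the authors' original code.

```python
# Hedged sketch of an AlexNet-style network in PyTorch.
import torch.nn as nn

class AlexNetSketch(nn.Module):
    def __init__(self, num_classes: int = 1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(inplace=True),
            nn.Dropout(p=0.5), nn.Linear(4096, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),       # final 1000-way output before softmax
        )

    def forward(self, x):
        x = self.features(x)                    # expects 3 x 224 x 224 input
        x = x.flatten(1)                        # 256 x 6 x 6 feature map -> vector
        return self.classifier(x)
```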

73,871 citations

Proceedings ArticleDOI
20 Jun 2005
TL;DR: It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.
Abstract: We study the question of feature sets for robust visual object recognition; adopting linear SVM based human detection as a test case. After reviewing existing edge and gradient based descriptors, we show experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection. We study the influence of each stage of the computation on performance, concluding that fine-scale gradients, fine orientation binning, relatively coarse spatial binning, and high-quality local contrast normalization in overlapping descriptor blocks are all important for good results. The new approach gives near-perfect separation on the original MIT pedestrian database, so we introduce a more challenging dataset containing over 1800 annotated human images with a large range of pose variations and backgrounds.
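For a concrete sense of the descriptor, the sketch below extracts HOG features with scikit-image. The parameter values mirror the general recipe in the abstract (fine orientation binning, small cells, overlapping normalized blocks) but are illustrative defaults, and the input file name is a placeholder; scikit-image availability is assumed.

```python
# Hedged sketch: HOG descriptor extraction with scikit-image.
from skimage import io
from skimage.feature import hog

image = io.imread("person.png", as_gray=True)     # hypothetical input file

descriptor, hog_image = hog(
    image,
    orientations=9,               # fine orientation binning
    pixels_per_cell=(8, 8),       # fine-scale spatial cells
    cells_per_block=(2, 2),       # overlapping blocks for local contrast normalization
    block_norm="L2-Hys",
    visualize=True,
)
# `descriptor` is the flattened feature vector; for detection it would typically
# be fed to a linear SVM trained on person / non-person windows.
```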

28,803 citations


"DANIEL: A Deep Architecture for Aut..." refers methods in this paper

  • ...We have shown the P-R plot for only proposed ROBIN Dataset, as for SESYD dataset our proposed and some of the existing state-of-the-art techniques [2], [3], [15] yielded flat PR curve (Precision value 1 for all Recall values)....


  • ...Hence, when the same features (HOG, SIFT, RLH) are used under a canonical CBIR paradigm, they yield superior results than OASIS (see Tab....


  • ...Several generic image retrieval systems were also proposed in the past [13], [14], where features like Histogram of Oriented Gradients (HOG) [15], Bag of Features (BOF) [16], Local Binary Pattern (LBP) [17] and Run-Length Histogram [18] have been used....


Posted Content
TL;DR: Caffe, as discussed by the authors, is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.
Abstract: Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models. The framework is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures. Caffe fits industry and internet-scale media needs by CUDA GPU computation, processing over 40 million images a day on a single K40 or Titan GPU (≈ 2.5 ms per image). By separating model representation from actual implementation, Caffe allows experimentation and seamless switching among platforms for ease of development and deployment from prototyping machines to cloud environments. Caffe is maintained and developed by the Berkeley Vision and Learning Center (BVLC) with the help of an active community of contributors on GitHub. It powers ongoing research projects, large-scale industrial applications, and startup prototypes in vision, speech, and multimedia.

12,530 citations

Journal ArticleDOI
TL;DR: A novel scale- and rotation-invariant detector and descriptor, coined SURF (Speeded-Up Robust Features), which approximates or even outperforms previously proposed schemes with respect to repeatability, distinctiveness, and robustness, yet can be computed and compared much faster.
Abstract: This article presents a novel scale- and rotation-invariant detector and descriptor, coined SURF (Speeded-Up Robust Features). SURF approximates or even outperforms previously proposed schemes with respect to repeatability, distinctiveness, and robustness, yet can be computed and compared much faster. This is achieved by relying on integral images for image convolutions; by building on the strengths of the leading existing detectors and descriptors (specifically, using a Hessian matrix-based measure for the detector, and a distribution-based descriptor); and by simplifying these methods to the essential. This leads to a combination of novel detection, description, and matching steps. The paper encompasses a detailed description of the detector and descriptor and then explores the effects of the most important parameters. We conclude the article with SURF's application to two challenging, yet converse goals: camera calibration as a special case of image registration, and object recognition. Our experiments underline SURF's usefulness in a broad range of topics in computer vision.
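As a usage illustration, the sketch below detects and matches SURF keypoints with OpenCV's contrib module. Availability of `cv2.xfeatures2d` depends on an opencv-contrib build (and, historically, on patent-related build flags); the image file names are placeholders.

```python
# Hedged sketch: SURF detection, description, and brute-force matching with OpenCV contrib.
import cv2

img1 = cv2.imread("object.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)

surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)   # Hessian-based detector
kp1, des1 = surf.detectAndCompute(img1, None)
kp2, des2 = surf.detectAndCompute(img2, None)

# Match descriptors with a brute-force L2 matcher and keep the best matches first.
matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
print(f"{len(kp1)} / {len(kp2)} keypoints, {len(matches)} cross-checked matches")
```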

11,276 citations


"DANIEL: A Deep Architecture for Aut..." refers background in this paper

  • ...Rotation and translation invariant features for symbol spotting in documents were proposed in [22], and [23]....


Proceedings ArticleDOI
03 Nov 2014
TL;DR: Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.
Abstract: Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models. The framework is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures. Caffe fits industry and internet-scale media needs by CUDA GPU computation, processing over 40 million images a day on a single K40 or Titan GPU (approx 2 ms per image). By separating model representation from actual implementation, Caffe allows experimentation and seamless switching among platforms for ease of development and deployment from prototyping machines to cloud environments.Caffe is maintained and developed by the Berkeley Vision and Learning Center (BVLC) with the help of an active community of contributors on GitHub. It powers ongoing research projects, large-scale industrial applications, and startup prototypes in vision, speech, and multimedia.
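For context on how such a framework is typically used for feature extraction, the sketch below loads a pretrained model with Caffe's Python bindings (pycaffe) and reads an intermediate blob as a feature vector. The prototxt/weights paths and the 'fc7' blob name are placeholders for whatever model is deployed; a working pycaffe installation is assumed.

```python
# Hedged sketch: running a pretrained Caffe model and extracting intermediate features.
import caffe
import numpy as np

caffe.set_mode_cpu()   # or caffe.set_mode_gpu() if a CUDA build is available

net = caffe.Net("deploy.prototxt", "weights.caffemodel", caffe.TEST)

# Preprocess an image into the network's expected input layout (C x H x W, BGR).
transformer = caffe.io.Transformer({"data": net.blobs["data"].data.shape})
transformer.set_transpose("data", (2, 0, 1))
transformer.set_raw_scale("data", 255)            # caffe.io.load_image returns [0, 1]
transformer.set_channel_swap("data", (2, 1, 0))   # RGB -> BGR

image = caffe.io.load_image("example.jpg")        # placeholder input file
net.blobs["data"].data[0] = transformer.preprocess("data", image)

net.forward()
feature = net.blobs["fc7"].data[0].copy()         # intermediate activations as features
print(feature.shape, np.linalg.norm(feature))
```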

9,244 citations


"DANIEL: A Deep Architecture for Aut..." refers background in this paper

  • ...layers is switched (pooling is done before normalization) [28]....
