Proceedings ArticleDOI

DANIEL: A Deep Architecture for Automatic Analysis and Retrieval of Building Floor Plans

TL;DR: This paper proposes Deep Architecture for fiNdIng alikE Layouts (DANIEL), a novel deep learning framework that retrieves similar floor plan layouts from a repository, and introduces ROBIN, a new complex dataset with three broad categories and 510 real-world floor plans.
Abstract: Automatically retrieving existing building layouts from a repository helps an architect ensure reuse of design and timely completion of projects. In this paper, we propose Deep Architecture for fiNdIng alikE Layouts (DANIEL). Using DANIEL, an architect can search the repository of existing project layouts (floor plans) and give accurate recommendations to buyers. Given a floor plan image, DANIEL can also recommend to property buyers a rank-ordered list of similar layouts. DANIEL is based on the deep learning paradigm and extracts both low- and high-level semantic features from a layout image. The key contributions of the proposed approach are: (i) a novel deep learning framework to retrieve similar floor plan layouts from a repository; (ii) an analysis of the effect of individual deep convolutional neural network layers on the floor plan retrieval task; and (iii) the creation of a new complex dataset, ROBIN (Repository Of BuildIng plaNs), having three broad categories with 510 real-world floor plans. We have evaluated DANIEL through extensive experiments on ROBIN and compared our results with eight different state-of-the-art methods to demonstrate DANIEL's effectiveness in challenging scenarios.
Citations
Journal ArticleDOI
TL;DR: A floorplan embedding technique that uses an attributed graph to represent geometric information as well as design semantics and behavioral features of the inhabitants as node and edge attributes; the proposed model is also generative.
Abstract: Floorplans are commonly used to represent the layout of buildings. In computer-aided design (CAD), floorplans are usually represented in the form of hierarchical graph structures. Research on computational techniques that facilitate the design process, such as automated analysis and optimization, often uses simple floorplan representations that ignore the semantics of the space and do not take usage-related analytics into account. We present a floorplan embedding technique that uses an attributed graph to represent the geometric information as well as design semantics and behavioral features of the inhabitants as node and edge attributes. A Long Short-Term Memory (LSTM) Variational Autoencoder (VAE) architecture is proposed and trained to embed the attributed graphs as vectors in a continuous space. A user study is conducted to evaluate the similarity of floorplans retrieved from the embedding space with respect to a given input (e.g., a design layout). The qualitative, quantitative, and user-study evaluations show that our embedding framework produces meaningful and accurate vector representations of floorplans. In addition, our proposed model is generative, and we study and showcase its effectiveness for generating new floorplans. We also release the dataset we have constructed, which, for each floorplan, includes the design-semantics attributes as well as simulation-generated human behavioral features, for further study by the community.
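The attributed-graph representation described above can be illustrated with a small sketch. Everything below (class names, attribute choices, the feature flattening) is an invented stand-in for the paper's actual schema, shown only to make "geometry plus semantics as node and edge attributes" concrete:

```python
from dataclasses import dataclass, field

@dataclass
class Room:
    # Node attributes: geometry plus design semantics plus a behavioral
    # feature (all names here are illustrative, not the paper's schema).
    name: str
    area: float          # square metres
    room_type: str       # e.g. "bedroom", "kitchen"
    occupancy: int = 0   # behavioral feature, e.g. average occupants

@dataclass
class FloorplanGraph:
    rooms: dict = field(default_factory=dict)   # name -> Room
    edges: dict = field(default_factory=dict)   # (a, b) -> edge attributes

    def add_room(self, room):
        self.rooms[room.name] = room

    def connect(self, a, b, door_width=0.9):
        # Edge attribute: e.g. width of the connecting door, in metres.
        self.edges[tuple(sorted((a, b)))] = {"door_width": door_width}

    def feature_vector(self, room_name):
        # Flatten a node into the numeric features an encoder (such as an
        # LSTM-VAE over node sequences) would consume.
        r = self.rooms[room_name]
        degree = sum(room_name in e for e in self.edges)
        return [r.area, r.occupancy, degree]

g = FloorplanGraph()
g.add_room(Room("kitchen", area=12.0, room_type="kitchen", occupancy=2))
g.add_room(Room("living", area=25.0, room_type="living", occupancy=4))
g.connect("kitchen", "living", door_width=1.2)
print(g.feature_vector("living"))  # [25.0, 4, 1]
```

In the paper, sequences of such node feature vectors would be fed to the LSTM-VAE, whose latent vector becomes the floorplan's embedding.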

4 citations


Additional excerpts

  • ...By emerging Convolutional Neural Networks (CNN), In [38] a deep CNN is presented for feature extraction to address the limitation of conventional image processing techniques for extracting features....

    [...]

Journal ArticleDOI
TL;DR: In this paper, the authors proposed two models, description synthesis from image cue (DSIC) and transformer-based description generation (TBDG), for text generation from floor plan images.
Abstract: Image captioning is a widely known problem in the area of AI. Caption generation from floor plan images has applications in indoor path planning, real estate, and providing architectural solutions. Several methods have been explored in the literature for generating captions or semi-structured descriptions from floor plan images. Since a caption alone is insufficient to capture fine-grained details, researchers have also proposed generating descriptive paragraphs from images. However, these descriptions have a rigid structure and lack flexibility, making them difficult to use in real-time scenarios. This paper offers two models, description synthesis from image cue (DSIC) and transformer-based description generation (TBDG), for text generation from floor plan images. Both models take advantage of modern deep neural networks for visual feature extraction and text generation; they differ in how they take input from the floor plan image. The DSIC model takes only visual features automatically extracted by a deep neural network, while the TBDG model learns from textual captions extracted from input floor plan images together with paragraphs. The specific keywords generated in TBDG, and learning them in the context of paragraphs, make it more robust on general floor plan images. Experiments were carried out on a large-scale publicly available dataset and compared with state-of-the-art techniques to show the proposed model's superiority.

4 citations

Journal ArticleDOI
TL;DR: In this article, two intelligent design processes based on healthcare systematic layout planning (HSLP) and generative adversarial network (GAN) are proposed to solve the generation problem of the plane functional layout of the operating departments (ODs) of general hospitals.
Abstract: With the increasing demands of health care, the design of hospital buildings has become increasingly demanding and complicated, while the traditional layout design method for hospitals is labor intensive, time consuming, and prone to errors. With the development of artificial intelligence (AI), intelligent design methods have become possible and are considered suitable for the layout design of hospital buildings. Two intelligent design processes, based on healthcare systematic layout planning (HSLP) and on a generative adversarial network (GAN), are proposed in this paper to address the generation of plane functional layouts for the operating departments (ODs) of general hospitals. The first design method, which is closer to a mathematical model with a traditional optimization algorithm, involves two steps: developing the HSLP model based on conventional systematic layout planning (SLP) theory, identifying the relationships and flows among the various departments/units, and arriving at a preliminary plane layout design; then establishing a mathematical model that optimizes the building layout with a genetic algorithm (GA) to obtain the optimized scheme. The second intelligent design process, based on more than 100 sets of collected OD drawings, includes labelling the corresponding functional layouts of each OD plan and training an image-to-image translation model with conditional adversarial networks (pix2pix), one of the most representative GAN models, on the OD plane layouts. Finally, the functions and features of the results generated by the two methods are analyzed and compared from architectural and algorithmic perspectives. The comparison shows that both the HSLP and GAN models can autonomously generate new OD plane functional layouts. The HSLP layouts have clear functional-area adjacencies and optimization goals, but they are relatively rigid and not specific enough. The GAN outputs are the most innovative layouts with strong applicability, but the dataset imposes strict constraints. The goal of this paper is to help relieve the heavy load on architects in the early design stage and to demonstrate the effectiveness of these intelligent design methods in the field of medical architecture.
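The GA-based optimization step can be sketched in miniature. This toy is not the paper's HSLP model: the units, closeness weights, and one-dimensional corridor below are invented purely to show how a genetic algorithm searches layout permutations for high adjacency fitness:

```python
import random

random.seed(0)

# Toy stand-in: place 5 hypothetical units along a 1-D corridor so that
# strongly related units end up adjacent. Names and weights are invented.
UNITS = ["reception", "prep", "operating", "recovery", "storage"]
# Desired-closeness weights between unit pairs (symmetric).
W = {("prep", "operating"): 5, ("operating", "recovery"): 4,
     ("reception", "prep"): 2, ("storage", "operating"): 1}

def fitness(order):
    # Reward related units for being close together (inverse distance).
    pos = {u: i for i, u in enumerate(order)}
    return sum(w / abs(pos[a] - pos[b]) for (a, b), w in W.items())

def mutate(order):
    # Swap two positions -- the classic permutation mutation.
    i, j = random.sample(range(len(order)), 2)
    child = order[:]
    child[i], child[j] = child[j], child[i]
    return child

# Evolve: keep the fittest half, refill with mutated elites.
pop = [random.sample(UNITS, len(UNITS)) for _ in range(20)]
for _ in range(100):
    pop.sort(key=fitness, reverse=True)
    pop = pop[:10] + [mutate(random.choice(pop[:10])) for _ in range(10)]

best = max(pop, key=fitness)
print(best, round(fitness(best), 2))
```

In the paper, the genome would presumably encode a full 2-D layout and the fitness would reflect the SLP relationship and flow analysis; the select-and-mutate loop is the same idea.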

4 citations

Posted Content
TL;DR: SUGAMAN is the first framework for describing a floor plan and giving direction for obstacle-free movement within a building and can be applied to areas like understanding floor plans of historical monuments, stability analysis of buildings, and retrieval.
Abstract: In this paper, we propose SUGAMAN (Supervised and Unified framework using Grammar and Annotation Model for Access and Navigation). SUGAMAN is a Hindi word meaning "easy passage from one place to another". SUGAMAN synthesizes a textual description from a given floor plan image for the visually impaired, who can navigate an indoor environment using the generated description. With the help of text-reader software, the target user can understand the rooms within the building and the arrangement of furniture in order to navigate. SUGAMAN is the first framework for describing a floor plan and giving directions for obstacle-free movement within a building. We learn 5 classes of room categories from 1355 room image samples under a supervised learning paradigm. These learned annotations are fed into a description synthesis framework to yield a holistic description of a floor plan image. We demonstrate the performance of various supervised classifiers on room learning and provide a comparative analysis of system-generated and human-written descriptions. SUGAMAN gives state-of-the-art performance on challenging, real-world floor plan images. This work can be applied to areas like understanding floor plans of historical monuments, stability analysis of buildings, and retrieval.
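The annotate-then-describe pipeline outlined above (learned room labels feeding a description synthesizer) can be illustrated with a toy template stage. The room data and templates below are invented, and SUGAMAN's real grammar-based synthesis is far richer:

```python
# Toy final stage of an annotate-then-describe pipeline: room annotations
# (which SUGAMAN obtains from supervised classifiers) go in, and a holistic
# textual description comes out. Data and templates are invented.
ROOMS = [
    {"name": "bedroom", "furniture": ["bed", "wardrobe"]},
    {"name": "kitchen", "furniture": ["sink", "stove"]},
]

def describe(rooms):
    # Stitch per-room template sentences into one holistic description.
    lines = [f"This floor plan has {len(rooms)} rooms."]
    for r in rooms:
        items = ", ".join(r["furniture"])
        lines.append(f"The {r['name']} contains: {items}.")
    return " ".join(lines)

print(describe(ROOMS))
# This floor plan has 2 rooms. The bedroom contains: bed, wardrobe.
# The kitchen contains: sink, stove.
```

A grammar-driven synthesizer would additionally order sentences by spatial adjacency so the text doubles as navigation guidance.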

4 citations


Cites background or methods from "DANIEL: A Deep Architecture for Aut..."

  • ...They are: (i) Systems Evaluation SYnthetic Documents (SESYD) [30] (ii) Computer Vision Center Floor Plan (CVC-FP) [31] and (iii) Repository Of BuildIng plaNs (ROBIN) [29]....

    [...]

  • ...Figure 6 shows the 12 decor symbols used in the dataset [29]....

    [...]

Proceedings Article
01 Jan 2019
TL;DR: The approach takes into account different possible actions of the configuration process, such as adding, removing, or (re)assigning of the room type, and is implemented in a distributed CBR framework for support of early conceptual design in architecture.
Abstract: This paper presents the first results of research into AI-based support of the room configuration process during the early design phases in architecture. Room configuration (also: room layout or space layout) is an essential stage of the initial design phase: its results are crucial for the user-friendliness and success of the planned utilization of the architectural object. Our approach takes into account the different possible actions of the configuration process, such as adding, removing, or (re)assigning a room type. Its mode of operation is based on specific process-chain clusters, where each cluster represents a contextual subset of previous configuration steps and provides both a recurrent neural network, trained on that cluster's data only, to suggest the next step, and a case base that is used to determine whether the current process chain belongs to the cluster. The most similar cluster then suggests the next step of the process. The approach is implemented in a distributed CBR framework for the support of early conceptual design in architecture and was evaluated with a large number of process-chain queries to demonstrate its general suitability.
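The cluster-then-suggest idea can be sketched with a minimal example. The clusters, chains, and similarity measure below are invented stand-ins (the paper uses a case base for routing and a per-cluster RNN for the suggestion):

```python
from collections import Counter

# Invented process-chain clusters: each holds past configuration chains.
CLUSTERS = {
    "residential": [
        ["add:bedroom", "add:bath", "add:kitchen"],
        ["add:bedroom", "add:bath", "add:living"],
    ],
    "office": [
        ["add:lobby", "add:meeting", "add:office"],
    ],
}

def overlap(a, b):
    # Crude similarity: number of positions where two chains agree.
    return sum(x == y for x, y in zip(a, b))

def suggest_next(query):
    # Route the query to the cluster whose chains overlap it most (the
    # paper's case base plays this role), then return that cluster's most
    # frequent continuation (the paper's per-cluster RNN plays this role).
    def score(chains):
        return max(overlap(query, c) for c in chains)
    best = max(CLUSTERS.values(), key=score)
    nexts = [c[len(query)] for c in best if len(c) > len(query)]
    return Counter(nexts).most_common(1)[0][0] if nexts else None

print(suggest_next(["add:bedroom", "add:bath"]))  # add:kitchen
```

With the two residential continuations tied, `Counter` returns the first one encountered; a trained RNN would instead weigh the full context of the chain.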

3 citations


Cites background from "DANIEL: A Deep Architecture for Aut..."

  • ...One of the most relevant recently developed approaches uses deep learning to extract semantic features from floor plan images (Sharma et al. 2017)....

    [...]

References
Proceedings Article
03 Dec 2012
TL;DR: A large, deep convolutional neural network achieved state-of-the-art performance on ImageNet classification; it consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
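The "dropout" regularizer mentioned above fits in a few lines. This sketch shows the now-standard inverted variant (rescaling at training time), which differs slightly from the 2012 formulation that rescaled at test time; the numbers are purely illustrative:

```python
import random

def inverted_dropout(activations, p_drop, training=True, rng=random):
    # During training, zero each unit with probability p_drop and scale the
    # survivors by 1/(1 - p_drop) so the expected activation is unchanged;
    # at test time the layer is the identity.
    if not training or p_drop == 0.0:
        return list(activations)
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(42)
out = inverted_dropout([1.0, 2.0, 3.0, 4.0], p_drop=0.5, rng=rng)
# Each surviving value is doubled (1 / (1 - 0.5)); dropped units are 0.0.
print(out)
```

Because co-adapting units cannot rely on each other being present, the network is pushed toward more robust features, which is why dropout reduces overfitting in large fully-connected layers.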

73,978 citations

Proceedings ArticleDOI
20 Jun 2005
TL;DR: It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.
Abstract: We study the question of feature sets for robust visual object recognition; adopting linear SVM based human detection as a test case. After reviewing existing edge and gradient based descriptors, we show experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection. We study the influence of each stage of the computation on performance, concluding that fine-scale gradients, fine orientation binning, relatively coarse spatial binning, and high-quality local contrast normalization in overlapping descriptor blocks are all important for good results. The new approach gives near-perfect separation on the original MIT pedestrian database, so we introduce a more challenging dataset containing over 1800 annotated human images with a large range of pose variations and backgrounds.
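The core of the HOG descriptor, an orientation histogram over one cell, can be sketched briefly. This toy omits block normalization, overlapping blocks, and bin interpolation, all of which the paper shows matter for accuracy; the patch is invented:

```python
import math

def hog_cell(patch, n_bins=9):
    # Orientation histogram for one cell: unsigned gradient orientations
    # (0-180 degrees) are binned, each vote weighted by gradient magnitude.
    h, w = len(patch), len(patch[0])
    hist = [0.0] * n_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]   # central differences
            gy = patch[y + 1][x] - patch[y - 1][x]
            mag = math.hypot(gx, gy)
            ang = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(ang / 180.0 * n_bins) % n_bins] += mag
    return hist

# A vertical edge: gradients point horizontally, so bin 0 (near 0 degrees)
# collects all of the magnitude.
patch = [[0, 0, 9, 9]] * 4
hist = hog_cell(patch)
print(hist)  # [36.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

A full descriptor concatenates such cell histograms over overlapping blocks after local contrast normalization, the stages the paper identifies as important.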

31,952 citations


"DANIEL: A Deep Architecture for Aut..." refers methods in this paper

  • ...We have shown the P-R plot for only proposed ROBIN Dataset, as for SESYD dataset our proposed and some of the existing state-of-the-art techniques [2], [3], [15] yielded flat PR curve (Precision value 1 for all Recall values)....

    [...]

  • ...Hence, when the same features (HOG, SIFT, RLH) are used under a canonical CBIR paradigm, they yield superior results than OASIS (see Tab....

    [...]

  • ...Several generic image retrieval systems were also proposed in the past [13], [14], where features like Histogram of Oriented Gradients (HOG) [15], Bag of Features (BOF) [16], Local Binary Pattern (LBP) [17] and Run-Length Histogram [18] has been used....

    [...]

Posted Content
TL;DR: Caffe as discussed by the authors is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.
Abstract: Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models. The framework is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures. Caffe fits industry and internet-scale media needs by CUDA GPU computation, processing over 40 million images a day on a single K40 or Titan GPU (≈2.5 ms per image). By separating model representation from actual implementation, Caffe allows experimentation and seamless switching among platforms for ease of development and deployment from prototyping machines to cloud environments. Caffe is maintained and developed by the Berkeley Vision and Learning Center (BVLC) with the help of an active community of contributors on GitHub. It powers ongoing research projects, large-scale industrial applications, and startup prototypes in vision, speech, and multimedia.
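The "separating model representation from actual implementation" idea can be illustrated in miniature. Nothing below is Caffe's real API (Caffe uses protobuf prototxt specifications and a C++ engine); this only shows the pattern of a network defined as data and executed by a small interpreter:

```python
# The model is pure data (an invented analogue of a prototxt spec) ...
SPEC = [
    {"type": "scale", "factor": 2.0},
    {"type": "relu"},
    {"type": "bias", "value": -1.0},
]

# ... and the implementation is a separate table mapping layer types to ops,
# so the same spec could run on different backends.
OPS = {
    "scale": lambda layer: (lambda xs: [x * layer["factor"] for x in xs]),
    "relu":  lambda layer: (lambda xs: [max(0.0, x) for x in xs]),
    "bias":  lambda layer: (lambda xs: [x + layer["value"] for x in xs]),
}

def build(spec):
    # Compile the declarative spec into a forward function.
    layers = [OPS[layer["type"]](layer) for layer in spec]
    def forward(xs):
        for f in layers:
            xs = f(xs)
        return xs
    return forward

net = build(SPEC)
print(net([-2.0, 0.5, 3.0]))  # [-1.0, 0.0, 5.0]
```

Swapping the `OPS` table (e.g. for a GPU backend) changes the execution without touching the model definition, which is the portability the abstract describes.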

12,531 citations

Journal ArticleDOI
TL;DR: A novel scale- and rotation-invariant detector and descriptor, coined SURF (Speeded-Up Robust Features), which approximates or even outperforms previously proposed schemes with respect to repeatability, distinctiveness, and robustness, yet can be computed and compared much faster.

12,449 citations


"DANIEL: A Deep Architecture for Aut..." refers background in this paper

  • ...Rotation and translation invariant features for symbol spotting in documents were proposed in [22], and [23]....

    [...]

Proceedings ArticleDOI
03 Nov 2014
TL;DR: Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.
Abstract: Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models. The framework is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures. Caffe fits industry and internet-scale media needs by CUDA GPU computation, processing over 40 million images a day on a single K40 or Titan GPU (approx 2 ms per image). By separating model representation from actual implementation, Caffe allows experimentation and seamless switching among platforms for ease of development and deployment from prototyping machines to cloud environments. Caffe is maintained and developed by the Berkeley Vision and Learning Center (BVLC) with the help of an active community of contributors on GitHub. It powers ongoing research projects, large-scale industrial applications, and startup prototypes in vision, speech, and multimedia.

10,161 citations


"DANIEL: A Deep Architecture for Aut..." refers background in this paper

  • ...layers is switched (pooling is done before normalization) [28]....

    [...]