Proceedings ArticleDOI

ASYSST: A Framework for Synopsis Synthesis Empowering Visually Impaired

TL;DR: This work proposes an end-to-end framework (ASYSST) for textual description synthesis from digitized building floor plans and introduces a novel Bag of Decor feature to learn 5 classes of a room from 1,355 samples under a supervised learning paradigm.
Abstract: In an indoor scenario, the visually impaired do not have information about their surroundings and find it difficult to navigate from room to room. Sensor-based solutions are expensive and may not always be comfortable for the end users. In this paper, we focus on the problem of synthesizing a textual description from a given floor plan image to assist the visually impaired. The textual description, in combination with text-reading software, can aid a visually impaired person while moving inside a building. In this work, for the first time, we propose an end-to-end framework (ASYSST) for textual description synthesis from digitized building floor plans. We have introduced a novel Bag of Decor (BoD) feature to learn 5 classes of a room from 1,355 samples under a supervised learning paradigm. These learned labels are fed into a description synthesis framework to yield a holistic description of a floor plan image. Experimental analysis on a real, publicly available floor plan dataset demonstrates the superiority of our framework.
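The abstract describes the Bag of Decor feature only at a high level. Below is a minimal sketch of what a BoD-style pipeline could look like: a histogram of detected decor symbols per room, fed to a supervised classifier. The vocabulary, symbol names, toy data, and the choice of a linear SVM are illustrative assumptions, not the paper's exact method.

```python
# A minimal sketch of a Bag-of-Decor (BoD) style room classifier, assuming
# decor symbols (e.g. "sink", "bed") have already been detected per room.
# Vocabulary, labels, and classifier choice are illustrative assumptions.
import numpy as np
from sklearn.svm import LinearSVC

DECOR_VOCAB = ["bed", "sink", "bathtub", "sofa", "stove"]  # assumed vocabulary

def bod_feature(symbols):
    """Histogram of detected decor symbols over the fixed vocabulary."""
    hist = np.zeros(len(DECOR_VOCAB))
    for s in symbols:
        if s in DECOR_VOCAB:
            hist[DECOR_VOCAB.index(s)] += 1
    return hist

# Toy training data: (detected symbols, room label)
rooms = [(["bed"], "bedroom"), (["sink", "stove"], "kitchen"),
         (["bathtub", "sink"], "bathroom"), (["sofa"], "living_room"),
         (["sink"], "kitchen")]
X = np.array([bod_feature(s) for s, _ in rooms])
y = [label for _, label in rooms]

clf = LinearSVC().fit(X, y)
print(clf.predict([bod_feature(["stove", "sink"])]))  # likely ['kitchen'] on this toy data
```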
Citations
Proceedings ArticleDOI
01 Sep 2019
TL;DR: An extensive experimental study is presented for tasks such as furniture localization, caption generation, and description generation on the proposed dataset, demonstrating the utility of BRIDGE.
Abstract: In this paper, a large-scale public dataset containing floor plan images and their annotations is presented. The BRIDGE (Building plan Repository for Image Description Generation, and Evaluation) dataset contains more than 13,000 floor plan images and annotations collected from various websites, as well as publicly available floor plan images in the research domain. The images in BRIDGE also have annotations for symbols, region graphs, and paragraph descriptions. The BRIDGE dataset will be useful for symbol spotting, caption and description generation, scene graph synthesis, retrieval, and many other tasks involving building plan parsing. In this paper, we also present an extensive experimental study on the proposed dataset for tasks such as furniture localization, caption generation, and description generation, showing the utility of BRIDGE.

11 citations


Cites methods from "ASYSST: A Framework for Synopsis Sy..."

  • ...In [14], [15], the authors used handcrafted features for identifying decor symbols and room information, and for generating region-wise captions....


  • ...1) Template-based: Paragraph-based descriptions are generated using the technique proposed in [14].... (a hedged sketch of this idea follows below)

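As referenced in the excerpt above, here is a minimal sketch of template-based paragraph synthesis from room labels. The template wording and room/decor names are assumptions for illustration; the cited technique of [14] is not reproduced verbatim.

```python
# Minimal sketch of template-based description synthesis from room labels,
# in the spirit of the approach cited above; template wording is assumed.
def describe(rooms):
    parts = [f"The floor plan has {len(rooms)} rooms."]
    for name, decor in rooms:
        if decor:
            parts.append(f"The {name} contains {', '.join(decor)}.")
        else:
            parts.append(f"The {name} has no detected decor.")
    return " ".join(parts)

print(describe([("kitchen", ["sink", "stove"]), ("bedroom", ["bed"])]))
```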

01 Jan 2017
TL;DR: In this article, the authors present results of a user study into extending the functionality of an existing case-based search engine for similar architectural designs to a flexible, process-oriented, case-based support tool for the architectural conceptualization phase.
Abstract: This paper presents results of a user study into extending the functionality of an existing case-based search engine for similar architectural designs to a flexible, process-oriented, case-based support tool for the architectural conceptualization phase. Based on research examining the target group's (architects') thinking and working processes during the early conceptualization phase (especially during the search for similar architectural references), we identified common features for defining retrieval strategies for a more flexible case-based search for similar building designs within our system. Furthermore, we were also able to infer a definition for implementing these strategies into the early conceptualization process in architecture, that is, to outline a definition for this process as a wrapping structure for a user model. The study was conducted among representatives of the target group (architects, architecture students, and teaching personnel) by means of the paper prototyping method and Business Process Model and Notation (BPMN). The results of this work are intended as a foundation for our upcoming research, but we also think it could be of wider interest for the case-based design research area.

6 citations

Journal ArticleDOI
TL;DR: In this paper, the authors proposed two models, description synthesis from image cue (DSIC) and transformer-based description generation (TBDG), for text generation from floor plan images.
Abstract: Image captioning is a widely known problem in the area of AI. Caption generation from floor plan images has applications in indoor path planning, real estate, and providing architectural solutions. Several methods have been explored in the literature for generating captions or semi-structured descriptions from floor plan images. Since a caption alone is insufficient to capture fine-grained details, researchers have also proposed descriptive paragraphs from images. However, these descriptions have a rigid structure and lack flexibility, making them difficult to use in real-time scenarios. This paper offers two models, description synthesis from image cue (DSIC) and transformer-based description generation (TBDG), for text generation from floor plan images. These two models take advantage of modern deep neural networks for visual feature extraction and text generation. The difference between the models is in the way they take input from the floor plan image. The DSIC model takes only visual features automatically extracted by a deep neural network, while the TBDG model learns textual captions extracted from input floor plan images along with paragraphs. The specific keywords generated in TBDG, understood in combination with paragraphs, make it more robust on general floor plan images. Experiments were carried out on a large-scale publicly available dataset and compared with state-of-the-art techniques to show the proposed model's superiority.

4 citations
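The abstract says DSIC feeds deep visual features into a text generator. Below is a hedged sketch of that general image-to-text shape in PyTorch: a CNN encoder whose feature initializes a recurrent decoder. The ResNet-18 backbone, GRU decoder, and all sizes are assumptions; neither DSIC nor TBDG is reproduced exactly.

```python
# Hedged sketch of a DSIC-style pipeline: a CNN encodes the floor plan image
# and a recurrent decoder generates the description. Architecture choices
# (ResNet-18, GRU, hidden size) are assumptions, not the paper's exact models.
# Assumes torchvision >= 0.13 for the `weights=` argument.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class ImageToText(nn.Module):
    def __init__(self, vocab_size, hidden=256):
        super().__init__()
        cnn = resnet18(weights=None)
        self.encoder = nn.Sequential(*list(cnn.children())[:-1])  # drop fc head
        self.project = nn.Linear(512, hidden)
        self.embed = nn.Embedding(vocab_size, hidden)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, image, tokens):
        feat = self.encoder(image).flatten(1)    # (B, 512) visual feature
        h0 = self.project(feat).unsqueeze(0)     # initial decoder state
        emb = self.embed(tokens)                 # (B, T, hidden) token embeddings
        out, _ = self.decoder(emb, h0)
        return self.out(out)                     # per-step vocabulary logits

model = ImageToText(vocab_size=1000)
logits = model(torch.randn(2, 3, 224, 224), torch.randint(0, 1000, (2, 12)))
print(logits.shape)  # torch.Size([2, 12, 1000])
```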

References
Proceedings ArticleDOI
06 Jul 2002
TL;DR: This paper proposes a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.
Abstract: Human evaluations of machine translation are extensive but expensive. Human evaluations can take months to finish and involve human labor that can not be reused. We propose a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run. We present this method as an automated understudy to skilled human judges which substitutes for them when there is need for quick or frequent evaluations.

21,126 citations
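As a usage note, the BLEU measure described above can be computed with NLTK's implementation. A minimal example on a floor plan style sentence pair follows; the sentences are invented, and NLTK's scorer is an independent reimplementation of the paper's metric.

```python
# Minimal example of scoring a generated floor plan description against a
# human reference with BLEU, via NLTK (an independent implementation).
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "the flat has two bedrooms and one kitchen".split()
candidate = "the flat has two bedrooms and a kitchen".split()

smooth = SmoothingFunction().method1  # avoids zero scores on short texts
score = sentence_bleu([reference], candidate, smoothing_function=smooth)
print(f"BLEU: {score:.3f}")
```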

Journal ArticleDOI
TL;DR: A generalized gray-scale and rotation invariant operator presentation is derived that allows for detecting the "uniform" patterns for any quantization of the angular space and for any spatial resolution, and a method for combining multiple operators for multiresolution analysis is presented.
Abstract: Presents a theoretically very simple, yet efficient, multiresolution approach to gray-scale and rotation invariant texture classification based on local binary patterns and nonparametric discrimination of sample and prototype distributions. The method is based on recognizing that certain local binary patterns, termed "uniform," are fundamental properties of local image texture and their occurrence histogram is proven to be a very powerful texture feature. We derive a generalized gray-scale and rotation invariant operator presentation that allows for detecting the "uniform" patterns for any quantization of the angular space and for any spatial resolution, and present a method for combining multiple operators for multiresolution analysis. The proposed approach is very robust in terms of gray-scale variations since the operator is, by definition, invariant against any monotonic transformation of the gray scale. Another advantage is computational simplicity, as the operator can be realized with a few operations in a small neighborhood and a lookup table. Experimental results demonstrate that good discrimination can be achieved with the occurrence statistics of simple rotation invariant local binary patterns.

14,245 citations
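A minimal sketch of the uniform, rotation-invariant LBP operator described above, using scikit-image's independent implementation; the random test image and the (P, R) = (8, 1) sampling choice are illustrative assumptions.

```python
# Sketch of the uniform LBP texture feature via scikit-image (an independent
# implementation, not the authors' code). The occurrence histogram of the
# pattern codes is the texture descriptor.
import numpy as np
from skimage.feature import local_binary_pattern

image = np.random.randint(0, 256, (64, 64)).astype(np.uint8)  # stand-in texture
P, R = 8, 1  # 8 sampling points on a circle of radius 1

lbp = local_binary_pattern(image, P, R, method="uniform")
# 'uniform' yields P + 2 distinct codes; histogram them into the feature.
hist, _ = np.histogram(lbp, bins=np.arange(P + 3), density=True)
print(hist)
```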

Proceedings ArticleDOI
01 Jan 1988
TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work.
Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for topdown recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.

13,993 citations


"ASYSST: A Framework for Synopsis Sy..." refers methods in this paper

  • ...To delineate room boundaries, we detect doors using scale invariant features [9] and close the gaps in the wall image corresponding to the door locations.... (a hedged sketch of this detection step follows below)

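The excerpt above mentions detecting doors with scale-invariant features. The following sketch uses OpenCV SIFT with ratio-test matching against a door-symbol template; the file names ("floor_plan.png", "door_symbol.png"), the 0.75 ratio threshold, and the template-matching formulation are all assumptions, not the authors' exact procedure.

```python
# Hedged sketch of door detection via SIFT keypoint matching between a door
# symbol template and a floor plan image. File names are assumed placeholders.
import cv2

plan = cv2.imread("floor_plan.png", cv2.IMREAD_GRAYSCALE)       # assumed file
template = cv2.imread("door_symbol.png", cv2.IMREAD_GRAYSCALE)  # assumed file

sift = cv2.SIFT_create()
kp_t, des_t = sift.detectAndCompute(template, None)
kp_p, des_p = sift.detectAndCompute(plan, None)

# Lowe's ratio test to keep only distinctive template-to-plan matches.
matcher = cv2.BFMatcher()
matches = matcher.knnMatch(des_t, des_p, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
door_locations = [kp_p[m.trainIdx].pt for m in good]
print(f"{len(good)} candidate door keypoints")
```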

Proceedings Article
25 Jul 2004
TL;DR: Four different ROUGE measures are introduced: ROUGE-N, ROUGE-L, ROUGE-W, and ROUGE-S, included in the ROUGE summarization evaluation package, together with their evaluations.
Abstract: ROUGE stands for Recall-Oriented Understudy for Gisting Evaluation. It includes measures to automatically determine the quality of a summary by comparing it to other (ideal) summaries created by humans. The measures count the number of overlapping units such as n-gram, word sequences, and word pairs between the computer-generated summary to be evaluated and the ideal summaries created by humans. This paper introduces four different ROUGE measures: ROUGE-N, ROUGE-L, ROUGE-W, and ROUGE-S included in the ROUGE summarization evaluation package and their evaluations. Three of them have been used in the Document Understanding Conference (DUC) 2004, a large-scale summarization evaluation sponsored by NIST.

9,293 citations
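From the definition above (counting overlapping n-grams between a machine-generated summary and a human reference), ROUGE-N recall can be sketched in a few lines of plain Python; this is a didactic reimplementation, not the official ROUGE package.

```python
# Didactic sketch of ROUGE-N recall: clipped n-gram overlap divided by the
# number of n-grams in the reference.
from collections import Counter

def rouge_n_recall(candidate, reference, n=1):
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    cand, ref = ngrams(candidate.split(), n), ngrams(reference.split(), n)
    overlap = sum((cand & ref).values())  # clipped n-gram overlap
    return overlap / max(sum(ref.values()), 1)

print(rouge_n_recall("the flat has two bedrooms", "the flat has three bedrooms"))
# 0.8 -> 4 of the 5 reference unigrams are recovered
```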


"ASYSST: A Framework for Synopsis Sy..." refers methods in this paper

  • ...Since ROUGE-1, ROUGE-2, and ROUGE-3 use unigram, bigram, and trigram comparisons, respectively, the decreasing nature of average precision is natural....


  • ...We have compared the machine-generated description of the floor plan with human-written descriptions using three metrics: ROUGE [12], BLEU [16], and METEOR [6]....


  • ...As the value of n in the n-gram comparison increases, the ROUGE precision score decreases, which is also clear from Tab....


  • ...Table 3 depicts the average recall, average precision, and F-score for ROUGE-1, ROUGE-2, and ROUGE-3....


Journal ArticleDOI
Ming-Kuei Hu
TL;DR: It is shown that recognition of geometrical patterns and alphabetical characters independently of position, size, and orientation can be accomplished, and it is indicated that generalization is possible to include invariance under parallel projection.
Abstract: In this paper a theory of two-dimensional moment invariants for planar geometric figures is presented. A fundamental theorem is established to relate such moment invariants to the well-known algebraic invariants. Complete systems of moment invariants under translation, similitude, and orthogonal transformations are derived. Some moment invariants under general two-dimensional linear transformations are also included. Both theoretical formulation and practical models of visual pattern recognition based upon these moment invariants are discussed. A simple simulation program together with its performance is also presented. It is shown that recognition of geometrical patterns and alphabetical characters independently of position, size, and orientation can be accomplished. It is also indicated that generalization is possible to include invariance under parallel projection.

7,963 citations
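A short example of computing Hu's seven moment invariants with OpenCV follows; in ASYSST's domain such invariants could characterize decor symbols irrespective of position, scale, and orientation. The synthetic circle image and the log-scaling step are illustrative conventions, not part of the original paper.

```python
# Sketch of Hu's seven moment invariants via OpenCV on a synthetic symbol.
import cv2
import numpy as np

img = np.zeros((100, 100), dtype=np.uint8)
cv2.circle(img, (50, 50), 20, 255, -1)  # stand-in binary decor symbol

hu = cv2.HuMoments(cv2.moments(img)).flatten()
# Log-scale the invariants, as their magnitudes span many orders.
hu_log = -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)
print(hu_log)
```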


"ASYSST: A Framework for Synopsis Sy..." refers background or methods in this paper

  • ...Table 1 shows the quantitative comparison between our technique, [4], [10], and our technique using the LBP (local binary pattern) feature [15]....


  • ...Also, in [10], recognition is 0 for many decor items, with low accuracy for others....
