Proceedings ArticleDOI

ASYSST: A Framework for Synopsis Synthesis Empowering Visually Impaired

TL;DR: This work proposes an end-to-end framework (ASYSST) for textual description synthesis from digitized building floor plans and introduces a novel Bag of Decor feature to learn 5 classes of a room from 1,355 samples under a supervised learning paradigm.
Abstract: In an indoor scenario, the visually impaired do not have information about their surroundings and find it difficult to navigate from room to room. Sensor-based solutions are expensive and may not always be comfortable for the end users. In this paper, we focus on the problem of synthesizing a textual description from a given floor plan image to assist the visually impaired. The textual description, in combination with text-reading software, can aid a visually impaired person while moving inside a building. In this work, for the first time, we propose an end-to-end framework (ASYSST) for textual description synthesis from digitized building floor plans. We have introduced a novel Bag of Decor (BoD) feature to learn 5 classes of a room from 1,355 samples under a supervised learning paradigm. These learned labels are fed into a description synthesis framework to yield a holistic description of a floor plan image. Experimental analysis on a real, publicly available floor plan dataset demonstrates the superiority of our framework.
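The abstract describes the Bag of Decor feature only at a high level. Below is a minimal sketch of what a BoD-style pipeline could look like: a histogram of detected decor symbols per room, fed to a supervised classifier. The vocabulary, symbol names, toy data, and the choice of a linear SVM are illustrative assumptions, not the paper's exact method.

```python
# A minimal sketch of a Bag-of-Decor (BoD) style room classifier, assuming
# decor symbols (e.g. "sink", "bed") have already been detected per room.
# Vocabulary, labels, and classifier choice are illustrative assumptions.
import numpy as np
from sklearn.svm import LinearSVC

DECOR_VOCAB = ["bed", "sink", "bathtub", "sofa", "stove"]  # assumed vocabulary

def bod_feature(symbols):
    """Histogram of detected decor symbols over the fixed vocabulary."""
    hist = np.zeros(len(DECOR_VOCAB))
    for s in symbols:
        if s in DECOR_VOCAB:
            hist[DECOR_VOCAB.index(s)] += 1
    return hist

# Toy training data: (detected symbols, room label)
rooms = [(["bed"], "bedroom"), (["sink", "stove"], "kitchen"),
         (["bathtub", "sink"], "bathroom"), (["sofa"], "living_room"),
         (["sink"], "kitchen")]
X = np.array([bod_feature(s) for s, _ in rooms])
y = [label for _, label in rooms]

clf = LinearSVC().fit(X, y)
print(clf.predict([bod_feature(["stove", "sink"])]))  # likely ['kitchen'] on this toy data
```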
Citations
Proceedings ArticleDOI
01 Sep 2019
TL;DR: An extensive experimental study is presented for tasks such as furniture localization, caption generation, and description generation on the proposed dataset, demonstrating the utility of BRIDGE.
Abstract: In this paper, a large-scale public dataset containing floor plan images and their annotations is presented. The BRIDGE (Building plan Repository for Image Description Generation, and Evaluation) dataset contains more than 13,000 floor plan images and annotations collected from various websites, as well as publicly available floor plan images in the research domain. The images in BRIDGE also have annotations for symbols, region graphs, and paragraph descriptions. The BRIDGE dataset will be useful for symbol spotting, caption and description generation, scene graph synthesis, retrieval, and many other tasks involving building plan parsing. In this paper, we also present an extensive experimental study on the proposed dataset for tasks such as furniture localization, caption generation, and description generation, showing the utility of BRIDGE.

11 citations


Cites methods from "ASYSST: A Framework for Synopsis Sy..."

  • ...In [14], [15], the authors used handcrafted features for identifying decor symbols and room information, and for generating region-wise captions....


  • ...1) Template-based: Paragraph-based descriptions are generated using the technique proposed in [14].... (a hedged sketch of this idea follows below)

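As referenced in the excerpt above, here is a minimal sketch of template-based paragraph synthesis from room labels. The template wording and room/decor names are assumptions for illustration; the cited technique of [14] is not reproduced verbatim.

```python
# Minimal sketch of template-based description synthesis from room labels,
# in the spirit of the approach cited above; template wording is assumed.
def describe(rooms):
    parts = [f"The floor plan has {len(rooms)} rooms."]
    for name, decor in rooms:
        if decor:
            parts.append(f"The {name} contains {', '.join(decor)}.")
        else:
            parts.append(f"The {name} has no detected decor.")
    return " ".join(parts)

print(describe([("kitchen", ["sink", "stove"]), ("bedroom", ["bed"])]))
```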

01 Jan 2017
TL;DR: In this article, the authors present results of a user study into extending the functionality of an existing case-based search engine for similar architectural designs to a flexible, process-oriented, case-based support tool for the architectural conceptualization phase.
Abstract: This paper presents results of a user study into extending the functionality of an existing case-based search engine for similar architectural designs to a flexible, process-oriented, case-based support tool for the architectural conceptualization phase. Based on research examining the target group's (architects') thinking and working processes during the early conceptualization phase (especially during the search for similar architectural references), we identified common features for defining retrieval strategies for a more flexible case-based search for similar building designs within our system. Furthermore, we were also able to infer a definition for implementing these strategies into the early conceptualization process in architecture, that is, to outline a definition for this process as a wrapping structure for a user model. The study was conducted among representatives of the target group (architects, architecture students, and teaching personnel) by means of the paper prototyping method and Business Process Model and Notation (BPMN). The results of this work are intended as a foundation for our upcoming research, but we also think it could be of wider interest for the case-based design research area.

6 citations

Journal ArticleDOI
TL;DR: In this paper, the authors proposed two models, description synthesis from image cue (DSIC) and transformer-based description generation (TBDG), for text generation from floor plan images.
Abstract: Image captioning is a widely known problem in the area of AI. Caption generation from floor plan images has applications in indoor path planning, real estate, and providing architectural solutions. Several methods have been explored in the literature for generating captions or semi-structured descriptions from floor plan images. Since a caption alone is insufficient to capture fine-grained details, researchers have also proposed descriptive paragraphs from images. However, these descriptions have a rigid structure and lack flexibility, making them difficult to use in real-time scenarios. This paper offers two models, description synthesis from image cue (DSIC) and transformer-based description generation (TBDG), for text generation from floor plan images. These two models take advantage of modern deep neural networks for visual feature extraction and text generation. The difference between the models is in the way they take input from the floor plan image. The DSIC model takes only visual features automatically extracted by a deep neural network, while the TBDG model learns textual captions extracted from input floor plan images along with paragraphs. The specific keywords generated in TBDG, understood in combination with paragraphs, make it more robust on general floor plan images. Experiments were carried out on a large-scale publicly available dataset and compared with state-of-the-art techniques to show the proposed model's superiority.

4 citations
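The abstract says DSIC feeds deep visual features into a text generator. Below is a hedged sketch of that general image-to-text shape in PyTorch: a CNN encoder whose feature initializes a recurrent decoder. The ResNet-18 backbone, GRU decoder, and all sizes are assumptions; neither DSIC nor TBDG is reproduced exactly.

```python
# Hedged sketch of a DSIC-style pipeline: a CNN encodes the floor plan image
# and a recurrent decoder generates the description. Architecture choices
# (ResNet-18, GRU, hidden size) are assumptions, not the paper's exact models.
# Assumes torchvision >= 0.13 for the `weights=` argument.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class ImageToText(nn.Module):
    def __init__(self, vocab_size, hidden=256):
        super().__init__()
        cnn = resnet18(weights=None)
        self.encoder = nn.Sequential(*list(cnn.children())[:-1])  # drop fc head
        self.project = nn.Linear(512, hidden)
        self.embed = nn.Embedding(vocab_size, hidden)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, image, tokens):
        feat = self.encoder(image).flatten(1)    # (B, 512) visual feature
        h0 = self.project(feat).unsqueeze(0)     # initial decoder state
        emb = self.embed(tokens)                 # (B, T, hidden) token embeddings
        out, _ = self.decoder(emb, h0)
        return self.out(out)                     # per-step vocabulary logits

model = ImageToText(vocab_size=1000)
logits = model(torch.randn(2, 3, 224, 224), torch.randint(0, 1000, (2, 12)))
print(logits.shape)  # torch.Size([2, 12, 1000])
```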

References
Proceedings ArticleDOI
06 Jul 2002
TL;DR: This paper proposes a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.
Abstract: Human evaluations of machine translation are extensive but expensive. Human evaluations can take months to finish and involve human labor that can not be reused. We propose a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run. We present this method as an automated understudy to skilled human judges which substitutes for them when there is need for quick or frequent evaluations.

21,126 citations
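As a usage note, the BLEU measure described above can be computed with NLTK's implementation. A minimal example on a floor plan style sentence pair follows; the sentences are invented, and NLTK's scorer is an independent reimplementation of the paper's metric.

```python
# Minimal example of scoring a generated floor plan description against a
# human reference with BLEU, via NLTK (an independent implementation).
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "the flat has two bedrooms and one kitchen".split()
candidate = "the flat has two bedrooms and a kitchen".split()

smooth = SmoothingFunction().method1  # avoids zero scores on short texts
score = sentence_bleu([reference], candidate, smoothing_function=smooth)
print(f"BLEU: {score:.3f}")
```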

Journal ArticleDOI
TL;DR: A generalized gray-scale and rotation invariant operator presentation is derived that allows for detecting the "uniform" patterns for any quantization of the angular space and for any spatial resolution, and a method for combining multiple operators for multiresolution analysis is presented.
Abstract: Presents a theoretically very simple, yet efficient, multiresolution approach to gray-scale and rotation invariant texture classification based on local binary patterns and nonparametric discrimination of sample and prototype distributions. The method is based on recognizing that certain local binary patterns, termed "uniform," are fundamental properties of local image texture and their occurrence histogram is proven to be a very powerful texture feature. We derive a generalized gray-scale and rotation invariant operator presentation that allows for detecting the "uniform" patterns for any quantization of the angular space and for any spatial resolution, and present a method for combining multiple operators for multiresolution analysis. The proposed approach is very robust in terms of gray-scale variations since the operator is, by definition, invariant against any monotonic transformation of the gray scale. Another advantage is computational simplicity, as the operator can be realized with a few operations in a small neighborhood and a lookup table. Experimental results demonstrate that good discrimination can be achieved with the occurrence statistics of simple rotation invariant local binary patterns.

14,245 citations
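A minimal sketch of the uniform, rotation-invariant LBP operator described above, using scikit-image's independent implementation; the random test image and the (P, R) = (8, 1) sampling choice are illustrative assumptions.

```python
# Sketch of the uniform LBP texture feature via scikit-image (an independent
# implementation, not the authors' code). The occurrence histogram of the
# pattern codes is the texture descriptor.
import numpy as np
from skimage.feature import local_binary_pattern

image = np.random.randint(0, 256, (64, 64)).astype(np.uint8)  # stand-in texture
P, R = 8, 1  # 8 sampling points on a circle of radius 1

lbp = local_binary_pattern(image, P, R, method="uniform")
# 'uniform' yields P + 2 distinct codes; histogram them into the feature.
hist, _ = np.histogram(lbp, bins=np.arange(P + 3), density=True)
print(hist)
```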

Proceedings ArticleDOI
01 Jan 1988
TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work.
Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for topdown recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.

13,993 citations


"ASYSST: A Framework for Synopsis Sy..." refers methods in this paper

  • ...To delineate room boundaries, we detect doors using scale invariant features [9] and close the gaps in the wall image corresponding to the door locations.... (a hedged sketch of this detection step follows below)

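The excerpt above mentions detecting doors with scale-invariant features. The following sketch uses OpenCV SIFT with ratio-test matching against a door-symbol template; the file names ("floor_plan.png", "door_symbol.png"), the 0.75 ratio threshold, and the template-matching formulation are all assumptions, not the authors' exact procedure.

```python
# Hedged sketch of door detection via SIFT keypoint matching between a door
# symbol template and a floor plan image. File names are assumed placeholders.
import cv2

plan = cv2.imread("floor_plan.png", cv2.IMREAD_GRAYSCALE)       # assumed file
template = cv2.imread("door_symbol.png", cv2.IMREAD_GRAYSCALE)  # assumed file

sift = cv2.SIFT_create()
kp_t, des_t = sift.detectAndCompute(template, None)
kp_p, des_p = sift.detectAndCompute(plan, None)

# Lowe's ratio test to keep only distinctive template-to-plan matches.
matcher = cv2.BFMatcher()
matches = matcher.knnMatch(des_t, des_p, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
door_locations = [kp_p[m.trainIdx].pt for m in good]
print(f"{len(good)} candidate door keypoints")
```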

Proceedings Article
25 Jul 2004
TL;DR: Four different ROUGE measures are introduced: ROUGE-N, ROUGE-L, ROUGE-W, and ROUGE-S, included in the ROUGE summarization evaluation package, together with their evaluations.
Abstract: ROUGE stands for Recall-Oriented Understudy for Gisting Evaluation. It includes measures to automatically determine the quality of a summary by comparing it to other (ideal) summaries created by humans. The measures count the number of overlapping units such as n-gram, word sequences, and word pairs between the computer-generated summary to be evaluated and the ideal summaries created by humans. This paper introduces four different ROUGE measures: ROUGE-N, ROUGE-L, ROUGE-W, and ROUGE-S included in the ROUGE summarization evaluation package and their evaluations. Three of them have been used in the Document Understanding Conference (DUC) 2004, a large-scale summarization evaluation sponsored by NIST.

9,293 citations
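From the definition above (counting overlapping n-grams between a machine-generated summary and a human reference), ROUGE-N recall can be sketched in a few lines of plain Python; this is a didactic reimplementation, not the official ROUGE package.

```python
# Didactic sketch of ROUGE-N recall: clipped n-gram overlap divided by the
# number of n-grams in the reference.
from collections import Counter

def rouge_n_recall(candidate, reference, n=1):
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    cand, ref = ngrams(candidate.split(), n), ngrams(reference.split(), n)
    overlap = sum((cand & ref).values())  # clipped n-gram overlap
    return overlap / max(sum(ref.values()), 1)

print(rouge_n_recall("the flat has two bedrooms", "the flat has three bedrooms"))
# 0.8 -> 4 of the 5 reference unigrams are recovered
```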


"ASYSST: A Framework for Synopsis Sy..." refers methods in this paper

  • ...Since ROUGE-1, ROUGE-2, and ROUGE-3 use unigram, bigram, and trigram comparisons, respectively, the decreasing nature of average precision is natural....


  • ...We have compared the machine-generated description of the floor plan with human-written descriptions using three metrics: ROUGE [12], BLEU [16], and METEOR [6]....


  • ...As the value of n in the n-gram comparison increases, the ROUGE precision score decreases, which is also clear from Tab....


  • ...Table 3 depicts the average recall, average precision, and F-score for ROUGE-1, ROUGE-2, and ROUGE-3....


Journal ArticleDOI
Ming-Kuei Hu
TL;DR: It is shown that recognition of geometrical patterns and alphabetical characters independently of position, size, and orientation can be accomplished, and it is indicated that generalization is possible to include invariance under parallel projection.
Abstract: In this paper a theory of two-dimensional moment invariants for planar geometric figures is presented. A fundamental theorem is established to relate such moment invariants to the well-known algebraic invariants. Complete systems of moment invariants under translation, similitude, and orthogonal transformations are derived. Some moment invariants under general two-dimensional linear transformations are also included. Both theoretical formulation and practical models of visual pattern recognition based upon these moment invariants are discussed. A simple simulation program together with its performance is also presented. It is shown that recognition of geometrical patterns and alphabetical characters independently of position, size, and orientation can be accomplished. It is also indicated that generalization is possible to include invariance under parallel projection.

7,963 citations
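A short example of computing Hu's seven moment invariants with OpenCV follows; in ASYSST's domain such invariants could characterize decor symbols irrespective of position, scale, and orientation. The synthetic circle image and the log-scaling step are illustrative conventions, not part of the original paper.

```python
# Sketch of Hu's seven moment invariants via OpenCV on a synthetic symbol.
import cv2
import numpy as np

img = np.zeros((100, 100), dtype=np.uint8)
cv2.circle(img, (50, 50), 20, 255, -1)  # stand-in binary decor symbol

hu = cv2.HuMoments(cv2.moments(img)).flatten()
# Log-scale the invariants, as their magnitudes span many orders.
hu_log = -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)
print(hu_log)
```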


"ASYSST: A Framework for Synopsis Sy..." refers background or methods in this paper

  • ...Table 1 shows the quantitative comparison between our technique, [4], [10], and our technique using the LBP (local binary pattern) feature [15]....


  • ...Also, in [10], recognition is 0 for many decor items, with low accuracy for others....
