Author

Shreya Goyal

Bio: Shreya Goyal is an academic researcher from the Indian Institute of Technology, Jodhpur. The author has contributed to research on topics including floor plan analysis and supervised learning, has an h-index of 4, and has co-authored 11 publications receiving 26 citations.

Papers
Proceedings ArticleDOI
01 Sep 2019
TL;DR: An extensive experimental study is presented for tasks such as furniture localization, caption generation, and description generation on the proposed dataset, demonstrating the utility of BRIDGE.
Abstract: In this paper, a large-scale public dataset containing floor plan images and their annotations is presented. BRIDGE (Building plan Repository for Image Description Generation, and Evaluation) contains more than 13,000 floor plan images and annotations collected from various websites, as well as publicly available floor plan images from the research domain. The images in BRIDGE also have annotations for symbols, region graphs, and paragraph descriptions. The BRIDGE dataset will be useful for symbol spotting, caption and description generation, scene graph synthesis, retrieval, and many other tasks involving building plan parsing. We also present an extensive experimental study for tasks such as furniture localization, caption generation, and description generation on the proposed dataset, demonstrating the utility of BRIDGE.
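As a rough illustration of how a dataset with symbol, region-graph, and description annotations might be consumed, the sketch below defines a hypothetical per-image record and a loader for a hypothetical JSON export. The field names and layout are assumptions for illustration only, not the actual BRIDGE format.

```python
import json
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class SymbolBox:
    label: str                        # e.g. "sofa", "door" (hypothetical label set)
    bbox: Tuple[int, int, int, int]   # (x, y, width, height) in pixels

@dataclass
class FloorPlanAnnotation:
    image_path: str
    symbols: List[SymbolBox]                    # furniture/symbol annotations
    region_graph: List[Tuple[str, str, str]]    # (region, relation, region) triples
    description: str                            # paragraph description

def load_annotations(path: str) -> List[FloorPlanAnnotation]:
    """Load a hypothetical JSON export of floor plan annotations."""
    with open(path) as f:
        records = json.load(f)
    return [
        FloorPlanAnnotation(
            image_path=r["image"],
            symbols=[SymbolBox(s["label"], tuple(s["bbox"])) for s in r["symbols"]],
            region_graph=[tuple(t) for t in r["region_graph"]],
            description=r["description"],
        )
        for r in records
    ]
```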

11 citations

Proceedings ArticleDOI
01 Mar 2018
TL;DR: It is demonstrated that the proposed end-to-end framework for first-person-vision-based textual description synthesis of building floor plans gives state-of-the-art performance on challenging, real-world floor plan images.
Abstract: We focus on the synthesis of textual descriptions from a given building floor plan image based on the first-person vision perspective. Tasks such as symbol spotting, wall and decor segmentation, and semantic and perceptual segmentation have been addressed in the past on floor plans. Here, for the first time, we propose an end-to-end framework for first-person-vision-based textual description synthesis of building floor plans. We demonstrate (qualitatively and quantitatively) that the proposed framework gives state-of-the-art performance on challenging, real-world floor plan images. Potential applications of this work include understanding floor plans, stability analysis of buildings, and retrieval.

8 citations

Journal ArticleDOI
TL;DR: In this paper, the authors propose a framework called SUGAMAN (Supervised and Unified framework using Grammar and Annotation Model for Access and Navigation) for describing a floor plan and giving directions for obstacle-free movement within a building.
Abstract: In this study, the authors propose the framework SUGAMAN (Supervised and Unified framework using Grammar and Annotation Model for Access and Navigation). SUGAMAN is a Hindi word meaning 'easy passage from one place to another'. SUGAMAN synthesises a textual description from a given floor plan image, usable by the visually impaired to navigate by understanding the arrangement of rooms and furniture. It is the first framework for describing a floor plan and giving directions for obstacle-free movement within a building. The model learns five classes of room categories from 1355 room image samples under a supervised learning paradigm. These learned annotations are fed into a description synthesis framework to yield a holistic description of a floor plan image. The authors demonstrate the performance of various supervised classifiers on room learning and provide a comparative analysis of system-generated and human-written descriptions. The contributions of this study include a novel framework for description generation from document images with graphics, a new feature for representing floor plans, text annotations for a publicly available dataset, and an algorithm for door-to-door obstacle-avoidance navigation. This work can be applied to areas such as understanding floor plans, the design of historical monuments, and retrieval.
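The abstract mentions an algorithm for door-to-door, obstacle-free navigation. One common way to frame that step is as a shortest-path search over a graph whose nodes are rooms or doors and whose edges are obstacle-free connections. The sketch below is a minimal breadth-first-search formulation under that assumption; it is illustrative only and is not the SUGAMAN algorithm described in the paper.

```python
from collections import deque

def door_to_door_route(adjacency, start, goal):
    """Breadth-first search over a room/door adjacency graph.

    adjacency: dict mapping a node (room or door id) to the list of nodes
               reachable from it without crossing an obstacle.
    Returns the list of nodes on a shortest route, or None if unreachable.
    """
    frontier = deque([[start]])
    visited = {start}
    while frontier:
        path = frontier.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in adjacency.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None

# Example on a hypothetical floor plan: kitchen -> hallway -> bedroom
plan = {
    "kitchen": ["hallway"],
    "hallway": ["kitchen", "bedroom", "bathroom"],
    "bedroom": ["hallway"],
    "bathroom": ["hallway"],
}
print(door_to_door_route(plan, "kitchen", "bedroom"))  # ['kitchen', 'hallway', 'bedroom']
```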

8 citations

Proceedings ArticleDOI
15 Oct 2018
TL;DR: This work proposes an end-to-end framework (ASYSST) for textual description synthesis from digitized building floor plans and introduces a novel Bag of Decor feature to learn 5 room classes from 1355 samples under a supervised learning paradigm.
Abstract: In an indoor scenario, the visually impaired do not have information about their surroundings and find it difficult to navigate from room to room. Sensor-based solutions are expensive and may not always be comfortable for the end users. In this paper, we focus on the problem of synthesizing a textual description from a given floor plan image to assist the visually impaired. The textual description, in addition to text-reading software, can aid a visually impaired person while moving inside a building. In this work, for the first time, we propose an end-to-end framework (ASYSST) for textual description synthesis from digitized building floor plans. We introduce a novel Bag of Decor (BoD) feature to learn 5 room classes from 1355 samples under a supervised learning paradigm. These learned labels are fed into a description synthesis framework to yield a holistic description of a floor plan image. Experimental analysis on a real, publicly available floor plan dataset demonstrates the superiority of our framework.
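The Bag of Decor idea reads as a bag-of-words-style histogram over the decor/furniture symbols found in a room, fed to a supervised classifier over the room classes. The sketch below shows that general pattern with scikit-learn; the decor vocabulary, the toy training data, and the choice of classifier are assumptions for illustration, not the paper's exact pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical decor vocabulary; in practice it would come from a symbol detector.
DECOR_VOCAB = ["bed", "sofa", "sink", "stove", "bathtub", "table", "toilet"]

def bag_of_decor(detected_symbols):
    """Histogram of detected decor symbols over a fixed vocabulary."""
    hist = np.zeros(len(DECOR_VOCAB))
    for sym in detected_symbols:
        if sym in DECOR_VOCAB:
            hist[DECOR_VOCAB.index(sym)] += 1
    return hist

# Toy training data: (detected symbols per room, room class label)
rooms = [
    (["bed", "table"], "bedroom"),
    (["sofa", "table"], "living_room"),
    (["sink", "stove"], "kitchen"),
    (["bathtub", "sink", "toilet"], "bathroom"),
    (["table", "table"], "dining_room"),
]
X = np.stack([bag_of_decor(symbols) for symbols, _ in rooms])
y = [label for _, label in rooms]

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(clf.predict([bag_of_decor(["stove", "sink"])]))  # likely 'kitchen'
```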

4 citations

Journal ArticleDOI
TL;DR: In this paper, the authors propose two models, description synthesis from image cue (DSIC) and transformer-based description generation (TBDG), for text generation from floor plan images.
Abstract: Image captioning is a widely known problem in the area of AI. Caption generation from floor plan images has applications in indoor path planning, real estate, and providing architectural solutions. Several methods have been explored in the literature for generating captions or semi-structured descriptions from floor plan images. Since a caption alone is insufficient to capture fine-grained details, researchers have also proposed generating descriptive paragraphs from images. However, these descriptions have a rigid structure and lack flexibility, making them difficult to use in real-time scenarios. This paper offers two models, description synthesis from image cue (DSIC) and transformer-based description generation (TBDG), for text generation from floor plan images. Both models take advantage of modern deep neural networks for visual feature extraction and text generation. The difference between the models lies in how they take input from the floor plan image: the DSIC model uses only visual features automatically extracted by a deep neural network, while the TBDG model also learns from textual keywords extracted from input floor plan images together with paragraphs. The specific keywords generated in TBDG, understood in the context of paragraphs, make it more robust to general floor plan images. Experiments were carried out on a large-scale, publicly available dataset and compared with state-of-the-art techniques to show the proposed models' superiority.
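Both models follow the familiar encoder-decoder captioning pattern: a convolutional encoder extracts visual features and a sequence decoder generates text (TBDG additionally conditions on keywords). The PyTorch sketch below shows only that generic pattern; the layer sizes and architecture details are assumptions and do not reproduce DSIC or TBDG.

```python
import torch
import torch.nn as nn

class FloorPlanCaptioner(nn.Module):
    """Generic CNN encoder + transformer decoder for caption generation."""

    def __init__(self, vocab_size, d_model=256, nhead=4, num_layers=2, max_len=64):
        super().__init__()
        # Small convolutional encoder producing a grid of visual features.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, d_model, 3, stride=2, padding=1),
        )
        self.token_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, images, tokens):
        # images: (B, 1, H, W) floor plan rasters; tokens: (B, T) word ids
        feats = self.encoder(images)                  # (B, d_model, h, w)
        memory = feats.flatten(2).transpose(1, 2)     # (B, h*w, d_model)
        T = tokens.size(1)
        pos = torch.arange(T, device=tokens.device)
        tgt = self.token_emb(tokens) + self.pos_emb(pos)
        # Causal mask so each position only attends to earlier tokens.
        mask = torch.triu(
            torch.full((T, T), float("-inf"), device=tokens.device), diagonal=1
        )
        hidden = self.decoder(tgt, memory, tgt_mask=mask)
        return self.out(hidden)                       # (B, T, vocab_size)

# Smoke test with random data.
model = FloorPlanCaptioner(vocab_size=1000)
logits = model(torch.randn(2, 1, 128, 128), torch.randint(0, 1000, (2, 16)))
print(logits.shape)  # torch.Size([2, 16, 1000])
```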

4 citations


Cited by
Journal ArticleDOI
27 Aug 2020-Entropy
TL;DR: An artificial intelligence-based, fully automatic assistive technology is proposed that recognizes different objects and provides auditory feedback to the user in real time, giving the visually impaired person a better understanding of their surroundings.
Abstract: Visually impaired people face numerous difficulties in their daily life, and technological interventions may assist them in meeting these challenges. This paper proposes an artificial intelligence-based, fully automatic assistive technology that recognizes different objects and provides auditory feedback to the user in real time, giving the visually impaired person a better understanding of their surroundings. A deep learning model is trained with multiple images of objects that are highly relevant to the visually impaired person. Training images are augmented and manually annotated to make the trained model more robust. In addition to computer vision-based techniques for object recognition, a distance-measuring sensor is integrated to make the device more comprehensive by recognizing obstacles while the user navigates from one place to another. The auditory information conveyed to the user after scene segmentation and obstacle identification is optimized to deliver more information in less time and to process video frames faster. The average accuracy of the proposed method is 95.19% for object detection and 99.69% for recognition. The time complexity is low, allowing the user to perceive the surrounding scene in real time.

39 citations

Proceedings ArticleDOI
26 Oct 2020
TL;DR: A survey of 106 people with visual impairments is presented, examining the strategies they use to prepare for a journey to unknown buildings, how they orient themselves in unfamiliar buildings, and what materials they use.
Abstract: It is much more difficult for people with visual impairments to plan and carry out a journey to unknown places than it is for sighted people, because in addition to the usual travel arrangements, they also need to know whether the different parts of the travel chain are accessible at all. The need for information is therefore presumably very high, ranging from the accessibility of public transport to that of outdoor and indoor environments. However, to the best of our knowledge, there is no study that examines in depth the requirements of both the planning of a trip and its implementation, looking separately at the special needs of people with low vision and people with blindness. In this paper, we present a survey of 106 people with visual impairments, in which we examine the strategies they use to prepare for a journey to unknown buildings, how they orient themselves in unfamiliar buildings, and what materials they use. Our analysis shows that the requirements of people with blindness and people with low vision differ. The feedback from the participants reveals a large information gap, especially for orientation in buildings, regarding maps, the accessibility of buildings, and supporting systems. In particular, indoor maps are largely unavailable.

21 citations

Journal ArticleDOI
TL;DR: This paper investigates 10 relevant studies combining GSL and EPO, analyses their gaps, and extends the analysis to research on GSL and EPO.

16 citations

Journal ArticleDOI
TL;DR: The article illustrates the entire development of the AR pipeline in architecture, from the conceptual phase to its application, highlighting the specific aspects of each step, in order to provide a general overview for non-experts, deepen the topic, and stimulate a democratization process.
Abstract: Augmented reality (AR) allows the real and digital worlds to converge and overlap in a new way of observation and understanding. The architectural field can significantly benefit from AR applications, due to its systemic complexity in terms of knowledge and process management. Global interest and many research challenges are focused on this field, thanks to the conjunction of technological and algorithmic developments on one side and the massive digitization of built data on the other. A significant quantity of research in the AEC and educational fields describes this state of the art. However, it is a very fragmented domain, in which specific advances or case studies are often described without considering the complexity of the whole development process. The article illustrates the entire development of the AR pipeline in architecture, from the conceptual phase to its application, highlighting the specific aspects of each step. This account aims to provide a general overview for non-experts, deepening the topic and stimulating a democratization process. The informed and extended use of AR in multiple areas of application can lead to a new way forward for environmental understanding, bridging the gap between real and virtual space in an innovative perception of architecture.

15 citations