
Journal ArticleDOI

See ColOr: an extended sensory substitution device for the visually impaired

27 Jul 2014-Journal of Assistive Technologies (Emerald Group Publishing Limited)-Vol. 8, Iss: 2, pp 77-94

TL;DR: It is argued that any visual perception that can be achieved through hearing needs to be reinforced or enhanced by techniques that lie beyond mere visual-to-audio mapping, and that See ColOr is learnable and functional.
Abstract: Purpose – The purpose of this paper is to overcome the limitations of sensory substitution methods (SSDs) to represent high-level or conceptual information involved in vision, which are mainly produced by the biological sensory mismatch between sight and substituting senses, and thus provide the visually impaired with a more practical and functional SSD. Design/methodology/approach – Unlike any other approach, the SSD extends beyond a sensing prototype by integrating computer vision methods to produce reliable knowledge about the physical world (at the lowest cost to the user). Importantly though, the authors do not abandon the typical encoding of low-level features into sound. The paper simply argues that any visual perception that can be achieved through hearing needs to be reinforced or enhanced by techniques that lie beyond mere visual-to-audio mapping (e.g. computer vision, image processing). Findings – Experiments reported in this paper reveal that See ColOr is learnable and functional, and provides easy interaction.
Topics: Sensory substitution (58%), Visual perception (54%)

Summary (3 min read)

Introduction

  • The paper presents an extended Sensory Substitution Device (SSD) known as See ColOr.
  • The authors state that, due to all this sensorial interconnection in the brain, visual-like experiences may be attained through senses other than vision [19], [21].
  • The authors' hypothesis is that, nowadays, this amount of data cannot be supplied efficiently by SSDs unless more advanced methods that lie beyond mere visual-to-audio mapping (e.g. computer or artificial vision techniques) are integrated.

1. STATE OF THE ART

  • Back in the late 1960s, Paul Bach-y-Rita introduced the first SSD prototype [22], known as the TVSS (Tactile Visual Sensory Substitution): black-and-white camera images, instead of going to a TV screen, were sent to a matrix of vibrators in contact with the skin of the back of a blind participant [5].
  • The auditory representation of an image was similar to that used in “The vOICe” [28] with distinct sinusoidal waves for each pixel.

2.1 The local module

  • The local module provides the user with the auditory representation of a row containing 25 points of the central part of the captured image [35].
  • These points are coded into left-right spatialized musical instrument sounds, in order to represent and emphasize the color and location of visual entities in the environment [37], [38], [39].
  • Having access to more than one portion of the image simultaneously would bring many advantages.
  • Note that such a task is unachievable using this local module.
  • To rectify this deficiency, the authors introduced the global perception module, which allows the user to explore several points with the fingers.

2.2 The global module

  • In the global module, the image is made accessible on a tactile tablet that makes it possible for the user to compare different points and explore the scene in a broader context [35], [40].
  • A user may rapidly scan an image sliding one or more fingers on this tablet.
  • The finger movements are intended to mimic those of the eyes.
  • The spatial position of this pixel (left to right) is mapped to hearing by directional virtual sound sources (spatialized sound) [36], [37], [38], [39].
  • The entire image resolution (480x640 using the Kinect camera) is made accessible to make the most of the camera information (a minimal touch-to-sound sketch follows this list).
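
The following minimal sketch illustrates the kind of touch-to-sound lookup this module implies. The function names, the ±45° azimuth range, the tablet-normalization scheme and the 480x640 resolution constant are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of the global module's touch-to-sound lookup.
# Names, the azimuth range and the mapping details are illustrative only.

IMG_H, IMG_W = 480, 640  # full camera resolution exposed on the tablet (assumed)

def touch_to_pixel(touch_x, touch_y, tablet_w, tablet_h):
    """Map a finger position on the tactile tablet to image coordinates."""
    col = int(touch_x / tablet_w * (IMG_W - 1))
    row = int(touch_y / tablet_h * (IMG_H - 1))
    return row, col

def pixel_to_sound(row, col, color_image, depth_image):
    """Return the parameters of the spatialized sound for one explored point."""
    azimuth_deg = (col / (IMG_W - 1)) * 90.0 - 45.0   # left-right virtual source
    return {
        "azimuth_deg": azimuth_deg,          # where the sound appears to come from
        "color": color_image[row][col],      # selects the instrument timbre
        "depth_m": depth_image[row][col],    # selects the sound duration
    }
```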

2.3.1 The alerting module

  • Since the functionalities just mentioned are limited to describing local portions using low-level features such as color and depth, they might fail to reveal cognitive aspects which often determine regions of interest within a picture [33].
  • The purpose of the alerting module is to warn the user whenever a hazardous situation arises from obstacles lying ahead [35].
  • Once the system launches a warning, the user is expected to suspend navigation so as not to bump into the obstacle.
  • This allows blind persons to find a safe, clear path to advance through [40], [41].
  • The authors use image processing methods to this end (an illustrative sketch follows this list).
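
As an illustration of the idea only (not the authors' actual algorithm, region sizes or thresholds), an obstacle warning can be derived from the depth image by checking whether a large enough patch of the central region lies closer than a safety distance:

```python
import numpy as np

def obstacle_ahead(depth_m, warn_dist_m=1.0, min_fraction=0.05):
    """Warn if a sufficiently large patch of the central depth region is too close.
    depth_m: HxW array of depths in metres (0 = missing reading). Thresholds are assumed."""
    h, w = depth_m.shape
    roi = depth_m[h // 3 : 2 * h // 3, w // 3 : 2 * w // 3]  # central region ahead
    valid = roi[roi > 0]                                      # ignore missing readings
    if valid.size == 0:
        return False
    close_fraction = np.count_nonzero(valid < warn_dist_m) / valid.size
    return close_fraction > min_fraction
```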

2.3.2 The recognition module

  • See ColOr also uses computer vision techniques to process higher-level visual features of the images in order to produce acoustic virtual objects.
  • Actually, the authors recognize and then sonify objects that do not intrinsically produce sound, with the purpose of revealing their nature and location.
  • The recognition module is a detecting-and-tracking hybrid method [44] for learning the appearance of natural objects in unconstrained video streams [42],[43],[44], [45].
  • Firstly, there is a training phase to learn the distinct appearance of an object of interest (scale, rotation, perspective, etc.) [44], [46].
  • Afterwards, a visually impaired user who performs exploration with See ColOr is informed in real time about the presence of learned objects, if any (a simplified detect-and-announce sketch follows this list).
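
The sketch below gives a simplified flavour of such a detect-and-announce loop using plain template matching; it is not the hybrid detecting-and-tracking method of [44], and the score threshold and announcement scheme are assumptions for illustration.

```python
import cv2

def find_object(frame_gray, template_gray, score_thresh=0.8):
    """Return (x, y) of the best template match in the frame, or None."""
    scores = cv2.matchTemplate(frame_gray, template_gray, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(scores)
    return max_loc if max_val >= score_thresh else None

def announce(name, x, frame_width):
    """Report the learned object with a rough left/centre/right position,
    which could then be rendered as a spatialized acoustic virtual object."""
    if x < frame_width / 3:
        side = "left"
    elif x > 2 * frame_width / 3:
        side = "right"
    else:
        side = "centre"
    print(f"{name} detected on the {side}")
```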

3.1 Detecting a colored target

  • This study concerns the capacity of blind users to perceive, through the audio feedback, salient points (colored target) in the environment.
  • Participants were asked whether they understood how the system works, and they consented to take part in the experiment.
  • To achieve this goal, the participant is asked to spin the chair looking for the target that emits the sound of red.
  • This means that while spatialized sound leads the user to locate the target, the alerting system prevents him from bumping into it.
  • The authors' goal is to measure the impact of the independent variable “Target Position” TP (TP ∈ [0°, 360°]) on the dependent variables time (t) and crashes (c); a hypothetical analysis sketch follows this list.
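
Purely as an illustration of how the two dependent variables could be summarized per target position (the paper does not publish its analysis code, and the records below are dummy values, not experimental data):

```python
from statistics import mean

# Dummy trial records: target position in degrees, completion time, crash flag.
trials = [
    {"tp_deg": 90, "time_s": 34.0, "crash": 0},
    {"tp_deg": 90, "time_s": 29.5, "crash": 0},
    {"tp_deg": 180, "time_s": 51.0, "crash": 1},
]

def summarize(trials):
    """Group trials by target position and report mean time and crash count."""
    by_tp = {}
    for t in trials:
        by_tp.setdefault(t["tp_deg"], []).append(t)
    return {tp: {"mean_time_s": mean(r["time_s"] for r in rows),
                 "crashes": sum(r["crash"] for r in rows)}
            for tp, rows in by_tp.items()}

print(summarize(trials))  # e.g. {90: {'mean_time_s': 31.75, 'crashes': 0}, 180: {...}}
```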

3.2 Awareness of walls

  • In this experiment the authors evaluate how efficiently See ColOr provides awareness of spatial relations to the blind (i.e. perceive an entity in relation to oneself).
  • A person with spatial awareness understands that as (s)he walks towards a door, the door is becoming closer to his/her own body.
  • This understanding is acquired at a very early age.
  • The goal here is to be aware of the distance to the wall accurately enough to stop in time and not collide, yet close enough to reach out and touch it.
  • In doing so, See ColOr additionally prevents users from having their hands occupied and frees mental attention.

3.3 Finding a person

  • Based on surveys with visually impaired and blind users, the authors in [33] claim that face detection and recognition were suggested as highly desirable features for an assistive device.
  • The authors want to assess the effectiveness of See ColOr in guiding the route that leads a blind individual to meet someone nearby.
  • Visual cues revealing distinguishable features are imperative for achieving the recognition of a face (or person).
  • The target person stands still at each corner of the room, one corner per trial (four trials).
  • The goal here is to show that See ColOr reaches the behavioral criterion of visual substitution [18].

3.4 Grasping objects

  • This study concerns the retrieving of daily objects.
  • Out of the 2.35 minutes that a participant spent (on average) in this test, 1.7 minutes (about 70%) were used up in walking towards the target.
  • Amedi et al. [19] show that blind individuals using an audio-based SSD (without computer vision) need 70 hours of training to start recognizing cartoon-like faces and, even then, it yields no mobility aid.
  • To this end, the authors rather advocate the use of more practical (from the user’s point of view) approaches such as computer-vision-based guidance.
  • Nonetheless, unlike many, See ColOr is a utilitarian prototype (capable of functioning) that substitutes several features of vision at the expense of relatively little user effort.



See ColOr: An extended Sensory Substitution Device.
Audio-based Sensory Substitution Devices (SSDs) perform adequately when sensing and
mapping low-level visual features into sound. Yet, their limitations become apparent when
it comes to represent high-level or conceptual information involved in vision. We introduce
See ColOr as an SSD that senses color and depth to convert them into musical instrument
sounds. In addition, and unlike any other approach, our SSD extends beyond a sensing
prototype, by integrating computer vision methods to produce reliable knowledge about the
physical world (effortlessly for the user). Experiments reported in this article reveal that
our See ColOr SSD is learnable, functional, and provides easy interaction. In moderate
time, participants were able to grasp visual information from the environment out of which
they could derive: spatial awareness, ability to find someone, location of daily objects, and
skill to walk safely avoiding obstacles. Our encouraging results open a door towards
autonomous mobility of the blind.
INTRODUCTION
Multisensory perception theories state that sensory inputs in our brain are never processed
separately [1], [2]. On the contrary, information from different sensory modalities is
integrated by the nervous system enabling coherent perception of the world, which
ultimately leads to meaningful perceptual experiences [3]. As a consequence, for example,
what we see, somehow and somewhat, is always influenced by what we hear, and vice
versa. A remarkable example of this is the McGurk effect [4], which shows experimentally how vision can override hearing. In light of these assumptions, it is plausible to think that a
particular sensory perception could be elicited through a sensory pathway that is not
typically responsible for it. In fact, this idea is central to the neurological behavior that neuroscientists have termed cross-modal transfer [19], [21]: due to all this sensorial interconnection in our brain, visual-like experiences may be attained through senses other than vision [19], [21].
Sensory substitution then, refers to the mapping of stimuli of one sensory modality into
another. This is usually done with the aim of bypassing a defective sense, so that associated
stimuli may flow through a functioning sense [5]. In general, sensory substitution holds that when individuals go blind or deaf, they do not actually lose the ability to see or hear; rather, they become unable to convey external stimuli to the brain. As long as the working of the brain is not affected, in most cases a person who has lost the ability to retrieve data from the eyes can still create subjective images by using data conveyed from other sensory modalities (e.g. the auditory pathway) [5].
Published in Journal of Assistive Technologies, 2014, vol. 8, no. 2, pp. 77-94, which should be cited to refer to this work.
DOI: 10.1108/JAT-08-2013-0025

Sensory substitution may be achieved using invasive methods that collect external signals
and transduce them into electrical impulses that the brain interprets naturally [5],
[23]. Thus, stimulation of the brain without intact sensory organs to relay the information
is possible [5], [6], [7]. However, in this work, we address non-invasive sensory substitution
devices also known as SSDs. These devices use computational interfaces to transmit
sensory information (of a substituted sense) captured by an artificial modality (artificial
sensor) to another human sense (substituting sense) [23]. In other words, they translate sensory information so as to enable perception through a sense other than the one originally responsible for it [8]. This idea largely relies upon the concept of brain plasticity [7], which
denotes the self-adaptation ability of the brain to the deterioration (or absence) of a sense
[7]. This is a natural mechanism that allows people devoid of a sense to somewhat adapt
and compensate through other sensory pathways. For instance, cortical re-mapping or
reorganization happens when the brain is subject to some type of deterioration or injury [9].
Motivation
Vision is a phenomenon that entails both sensation and perception [10]. Sensation is the
low-level (biochemical and neurological) feeling of external visual information as it is registered (sensed) by the eyes. The visual sensation alone does not imply the coherent
conception (or understanding) of external visual objects [10], [11]. Therefore, following a
sensation, perception appears as the mental process that decodes the sensory input
(sensation) to create awareness or understanding of the real-world [10], [11], [12]. In short,
we perceive the world through sensations, though we derive sense from it (vision comes into
being) only when perception takes place [10], [13]. In this work, we argue that current SSDs
have been intended to provide a substitute to sensation, while the perceptual experience
has been left mostly unattended. The underlying problem is that the human visual system is known to be capable of a 4.3x10^6 bits per second (bps) bandwidth [14], [15]. Yet, senses intended as substitutes can hardly reach 10^4 bps at most (i.e. hearing) [14], [15]. In this
light, even though a cross-modal transfer may apply, it is hard for mapping systems to
overcome the large sensory mismatch between vision and other sensory pathways: if
hearing does not even provide enough room to convey visual sensations, actual visual perceptions are therefore very unlikely.
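
To make the mismatch concrete, the two figures cited above differ by more than two orders of magnitude:

\[
\frac{B_{\text{vision}}}{B_{\text{hearing}}} \approx \frac{4.3 \times 10^{6}\ \text{bps}}{10^{4}\ \text{bps}} \approx 430
\]
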
Importantly though, we do not regard visual perception as unattainable through long-term use of current SSDs. It simply implies a tough, long learning process that, in any case,
will yield inaccurate approximations of vision, if at all. Further, we argue that any visual
perception we can achieve through hearing will always need to be reinforced, in order for
the substitution to be: practical, fast, accurate, and let users act as though they were
actually seeing [18]. Visual perception is a holistic phenomenon that emerges from complex
information unlikely to be encoded into hearing (shapes, perspectives, color, position,
distance, texture, luminance, concepts etc.) [10], [11], [12], [3], [16], [17]. In fact, ‘normal’
vision is itself constrained by top-down knowledge that produces the kind of information

that sighted individuals typically achieve without conscious effort [12], [13], [18]. Our
hypothesis is that nowadays, all this amount of data cannot be supplied efficiently in SSDs,
unless we integrate more advanced methods that lie beyond mere visual-to-audio mapping
(e.g. computer or artificial vision techniques). In this spirit, we shall not abandon the
encoding of low-level features into sound for sensory substitution. Rather, we would like to
augment such an approach with the use of computer vision and image processing to deal
with high-level information that usually surpasses the bandwidth of audio. Whether this strategy will lead us to an SSD that is more learnable, practical, easy to interact with and, chiefly, functional is the fundamental research question underlying this work.
1. STATE OF THE ART
Back in the late 1960s, Paul Bach-y-Rita introduced the first SSD [22] prototype known as
TVSS (Tactile Visual Sensory Substitution): black-and-white camera images, instead of
going to a TV screen, were sent to a matrix of vibrators in contact with the skin of the back
of a blind participant [5]. However, first attempts to substitute vision using the auditory
pathway came later on. In 1975, Raymond M. Fish [24] used a sequence of tone bursts to
represent image pixels: vertical locations were determined by the frequency of the tone,
whereas horizontal positions were represented by the ratio of sound amplitudes presented
to each ear of the blind user. Later, in 1977, T. Bower, writing for the New Scientist magazine [25], reported on several new devices that provide blind babies with sound information about their environment using ultrasonic echolocation similar to that of a bat.
In fact, in the 1980s there was a proliferation of these ultrasound-based devices with
improved features, known as ultrasonic pathfinders [26], [27]. One of the most popular examples is “The Sonic Guide” [26], which constantly emits ultrasound and converts the echoes reflected from objects into audible sound to reveal distances.
In 1992, the vOICe [28] was introduced as an SSD to sonify 2D gray-scale images, allowing
its user to “see with sound”. The sonification algorithm presented in this work uses three
pixel parameters: horizontal position, vertical position and brightness to synthesize a
complex audio output describing the whole image (soundscape). Later in 1997, Capelle et al.
[29] proposed the implementation of a crude model of the primary visual system [29]. The
auditory representation of an image was similar to that used in “The vOICe” [28] with
distinct sinusoidal waves for each pixel. Gonzalez-Mora et al. [30] developed in 1999 a
prototype using the spatialization of sound in the three dimensional space [30]. The sound
was perceived as coming from somewhere in front of the user by means of head related
transfer functions (HRTFs). In 2000, Soundview [31] represented a single point of color as sound, scaled by the velocity of haptic exploration of an image on a tablet with no haptic feedback (vibration, temperature, etc.). TheVIBE [32] was later introduced in 2008
as a visual-auditory substitution system that converts video-streaming into auditory
streaming. The sound generated is a weighted summation of sinusoidal sounds produced by
virtual "sources", corresponding each to a "receptive field" in the image [32].

More recently in 2009, the Kromophone [30] and The Shelfscanning [31] were introduced.
The former takes the input from a webcam and chooses the center pixel and maps its color
into several superimposed sounds [30]. As for the latter, it was intended to empower
visually impaired individuals to shop at their own convenience from a shopping list using
computer vision methods such as object recognition, sign reading and text to speech
encoding [31]. Then in 2010, Michał Bujacz [32] presented an algorithm for sonification of
3D scenes including the use of segmented 3D scene images, personalized spatial audio and
musical sound codes. Thus, virtual sound sources originating from various scene elements
were generated [32]. In 2012, the EyeMusic [19] emerged as a tool that provides visual information through a musical auditory experience: in about 2 seconds, colors in a 24x40 pixel image are first segregated into red, green, blue, yellow, white, or black, and each color is then encoded through a timbre (e.g. white = piano, blue = marimba) [19]. Also in 2012,
Ribeiro et al. [33] coined the term auditory augmented reality in reference to the sonification of objects that do not intrinsically produce sound, with the purpose of revealing their location to the blind. They use spatialized (3D) audio to place acoustic virtual objects that create the illusion of originating from real-world coordinates.
It becomes apparent in the literature that experiments on mobility/exploration using SSDs
are rather few. Their usability to substitute actual vision, therefore, remains largely
uncertain in practice. Most of the cited works are intended to translate static images into
sounds. For instance, the vOICe [28], one of the most popular SSDs, takes two seconds to encode a 100x100 image, resulting in a complex sound that also requires time to be fully understood. This makes it hard to use as a real-time mobility tool. Mobility
encompasses three main tasks: understanding the near space global geometry; avoiding
obstacles; and focusing on a specific goal to reach, for instance looking for a specific door, a person, etc.
Moreover, in contrast with the vast majority of works, a key aspect of See ColOr is to
provide a substitute to color for the blind users. Likewise, approaches that account for
depth are still few. To the best of our knowledge, there is no approach other than ours,
attempting to code both color and depth into sound.
More importantly, despite the increased use of computer vision nowadays, its use in SSDs is surprisingly limited. This is curious, since one would think that devices meant to substitute natural vision should be heavily based on artificial vision. It seems impractical to have methods that enable computers to “see” and not to target them to the benefit of the blind in combination with SSDs. In our view, there should be, at least, a marked tendency to use robust and stable computer vision technologies to strengthen the weaknesses of existing electronic aid devices.
2. See ColOr: Seeing Colors with an Orchestra.

See ColOr is a multimode aid (SSD + computer vision) engineered with the following
components (Figure 1): 3D camera (ASUS Xtion PRO LIVE); Bone-phones (AfterShokz);
and 14” laptop. Occasionally, a tactile tablet (iPad) is used to enhance interaction. Users
can switch between a local and a global module as they explore the nearby environment. In
any case, an alerting system and a recognition module are always running in the
background to prevent the user from hitting obstacles, and to inform about the presence of
known persons/objects (Figure 5).
Figure 1: A blind individual wearing See ColOr (photograph used with permission). Note that the tactile tablet was not a subject of study in this paper. Also, the ears of the user remain uncovered thanks to the bone-phones.
2.1 The local module
The local module provides the user with the auditory representation of a row containing 25 points of the central part of the captured image [35] (Figure 3). These points are coded into left-
right spatialized musical instrument sounds, in order to represent and emphasize the color
and location of visual entities in the environment [37], [38], [39]. The key idea is to
represent a pixel as a sound source located at a particular azimuth angle [38]. Moreover,
each emitted sound is assigned to a musical instrument and to a sound duration, based on
the color and the depth of the pixel, respectively (Figure 2). The local module allows the user to explore a scene, as he can move his head to scan it. However, since the user's perception is focused only on a small portion, peripheral vision and global perception become
unattainable. Having access to more than one portion of the image simultaneously would
bring many advantages. For instance, it would be possible to compare several distant
points. Note that such a task is unachievable using this local module. To rectify this deficiency, we introduced the global perception module, which allows the user to explore several points with the fingers.
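
A minimal sketch of this per-point encoding is given below. The colour-to-instrument table, the ±45° azimuth range and the depth-to-duration rule are illustrative assumptions, not the authors' exact mapping.

```python
# Illustrative encoding of the 25 central points of the captured row.
ILLUSTRATIVE_INSTRUMENTS = {
    "red": "oboe", "green": "flute", "blue": "piano",
    "yellow": "violin", "black": "drum", "white": "bells",
}

def encode_row(points):
    """points: list of 25 dicts with 'color' (name) and 'depth_m' keys.
    Returns one (instrument, azimuth, duration) triple per point."""
    sounds = []
    n = len(points)
    for i, p in enumerate(points):
        azimuth_deg = (i / (n - 1)) * 90.0 - 45.0             # left-right placement
        instrument = ILLUSTRATIVE_INSTRUMENTS.get(p["color"], "piano")
        duration_s = min(0.3 + 0.2 * p["depth_m"], 1.5)        # duration grows with depth (illustrative)
        sounds.append((instrument, azimuth_deg, duration_s))
    return sounds
```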

Citations

Journal ArticleDOI
TL;DR: The system uses a Microsoft Kinect sensor as a wearable device, performs face detection, and uses temporal coherence along with a simple biometric procedure to generate a sound associated with the identified person, virtualized at his/her estimated 3-D location.
Abstract: In this paper, we introduce a real-time face recognition (and announcement) system targeted at aiding the blind and low-vision people. The system uses a Microsoft Kinect sensor as a wearable device, performs face detection, and uses temporal coherence along with a simple biometric procedure to generate a sound associated with the identified person, virtualized at his/her estimated 3-D location. Our approach uses a variation of the K-nearest neighbors algorithm over histogram of oriented gradient descriptors dimensionally reduced by principal component analysis. The results show that our approach, on average, outperforms traditional face recognition methods while requiring much less computational resources (memory, processing power, and battery life) when compared with existing techniques in the literature, deeming it suitable for the wearable hardware constraints. We also show the performance of the system in the dark, using depth-only information acquired with Kinect's infrared camera. The validation uses a new dataset available for download, with 600 videos of 30 people, containing variation of illumination, background, and movement patterns. Experiments with existing datasets in the literature are also considered. Finally, we conducted user experience evaluations on both blindfolded and visually impaired users, showing encouraging results.

58 citations


Journal ArticleDOI
Shachar Maidenbaum, Galit Buchs, Sami Abboud, Ori Lavi-Rotbain, et al.
16 Feb 2016-PLOS ONE
TL;DR: Visual-to-audio Sensory-Substitution-Devices can increase accessibility generically by sonifying on-screen content regardless of the specific environment and offer increased accessibility without the use of expensive dedicated peripherals like electrode/vibrator arrays.
Abstract: Graphical virtual environments are currently far from accessible to blind users as their content is mostly visual. This is especially unfortunate as these environments hold great potential for this population for purposes such as safe orientation, education, and entertainment. Previous tools have increased accessibility but there is still a long way to go. Visual-to-audio Sensory-Substitution-Devices (SSDs) can increase accessibility generically by sonifying on-screen content regardless of the specific environment and offer increased accessibility without the use of expensive dedicated peripherals like electrode/vibrator arrays. Using SSDs virtually utilizes similar skills as when using them in the real world, enabling both training on the device and training on environments virtually before real-world visits. This could enable more complex, standardized and autonomous SSD training and new insights into multisensory interaction and the visually-deprived brain. However, whether congenitally blind users, who have never experienced virtual environments, will be able to use this information for successful perception and interaction within them is currently unclear. We tested this using the EyeMusic SSD, which conveys whole-scene visual information, to perform virtual tasks otherwise impossible without vision. Congenitally blind users had to navigate virtual environments and find doors, differentiate between them based on their features (Experiment1:task1) and surroundings (Experiment1:task2) and walk through them; these tasks were accomplished with a 95% and 97% success rate, respectively. We further explored the reactions of congenitally blind users during their first interaction with a more complex virtual environment than in the previous tasks–walking down a virtual street, recognizing different features of houses and trees, navigating to cross-walks, etc. Users reacted enthusiastically and reported feeling immersed within the environment. They highlighted the potential usefulness of such environments for understanding what visual scenes are supposed to look like and their potential for complex training and suggested many future environments they wished to experience.

28 citations


Cites background from "See ColOr: an extended sensory subs..."

  • ...Accordingly, this factor is increasingly being added into other new SSDs as well in various ways [51,52]....

    [...]


Journal ArticleDOI
TL;DR: This work has developed a comprehensive and universal application with a unified, flexible, and adaptable interface to support the different conditions of PWDs and has employed an interactive smart based-location service for establishing a smart university Geographic Information System (GIS) solution.
Abstract: (1) Background: A disabled student or employee in a certain university faces a large number of obstacles in achieving his/her ordinary duties. An interactive smart search and communication application can support the people at the university campus and Science Park in a number of ways. Primarily, it can strengthen their professional network and establish a responsive eco-system. Therefore, the objective of this research work is to design and implement a unified flexible and adaptable interface. This interface supports an intensive search and communication tool across the university. It would benefit everybody on campus, especially the People with Disabilities (PWDs). (2) Methods: In this project, three main contributions are presented: (A) Assistive Technology (AT) software design and implementation (based on user- and technology-centered design); (B) A wireless sensor network employed to track and determine user’s location; and (C) A novel event behavior algorithm and movement direction algorithm used to monitor and predict users’ behavior and intervene with them and their caregivers when required. (3) Results: This work has developed a comprehensive and universal application with a unified, flexible, and adaptable interface to support the different conditions of PWDs. It has employed an interactive smart based-location service for establishing a smart university Geographic Information System (GIS) solution. This GIS solution has been based on tracking location service, mobility, and wireless sensor network technologies. (4) Conclusion: The proposed system empowered inter-disciplinary interaction between management, staff, researchers, and students, including the PWDs. Identifying the needs of the PWDs has led to the determination of the relevant requirements for designing and implementing a unified flexible and adaptable interface suitable for PWDs on the university campus.

9 citations


Journal ArticleDOI
TL;DR: In this work, an AT and smart context-aware solution has been developed to assist the people with disabilities at the workplace through the designing and the implementation of a smart unified interface to guide and assist the PWDs.
Abstract: The developments and improvements in human–computer interaction and information and communication technologies have resulted in innovative assistive technology (AT) solutions for the people with di...

7 citations


Additional excerpts

  • ...Another example based on context-aware that is adaptive to the personalized mobile learning system has been proposed by Gómez et al.80 The people-centric sensing framework for the health care of elderly and disabled people in the smart city, which focused on the context manipulation from the mobile device, emergency response using context base information, and modeling the mobile context sources as services has been implemented by Hussain et al.81 An intelligent agent that combined the nearfield communication technique with context acquisition has been proposed by Chih-Hao Lin82 to support context awareness in internet-of-things environment....

    [...]

  • ...Gomez et al.37 developed a system to provide visually impaired persons a more practical and functional sensory substitution methods (SSDs) device using computer vision and image processing methods called See ColOr....

    [...]


Book ChapterDOI
11 Sep 2017
TL;DR: The indoor 3D pipeline of an assistive system for visually impaired people, whose goal is to scan the environment, extract information of interest and send it to the user through haptics and sounds, is presented.
Abstract: In this paper, we present the indoor 3D pipeline of an assistive system for visually impaired people, whose goal is to scan the environment, extract information of interest and send it to the user through haptics and sounds. The particularities of indoor scenes, containing man-made objects, with many planar faces, led us to the idea of developing the 3D object recognition algorithms around a planar segmentation, based on normal vectors. The 3D pipeline starts with acquiring depth frames from a range camera and synchronized IMU data from an inertial sensor. The pre-processing stage computes normal vectors in the 3D points of the scanned environment and filters them to reduce the noise from the input data. The next stages are planar segmentation and object labeling, which divides the scene into ground, ceiling, walls and generic objects. The whole 3D pipeline works in real-time on a consumer laptop at approximately 15 fps. We describe each step of the pipeline, with the focus on the labeling stage, and present experimental results and ideas for further improvements.

7 citations


References

Journal ArticleDOI
R. E. Kalman

22,129 citations


Journal ArticleDOI
Harry McGurk, John Macdonald
01 Dec 1976-Nature
TL;DR: The study reported here demonstrates a previously unrecognised influence of vision upon speech perception, on being shown a film of a young woman's talking head in which repeated utterances of the syllable [ba] had been dubbed on to lip movements for [ga].
Abstract: MOST verbal communication occurs in contexts where the listener can see the speaker as well as hear him. However, speech perception is normally regarded as a purely auditory process. The study reported here demonstrates a previously unrecognised influence of vision upon speech perception. It stems from an observation that, on being shown a film of a young woman's talking head, in which repeated utterances of the syllable [ba] had been dubbed on to lip movements for [ga], normal adults reported hearing [da]. With the reverse dubbing process, a majority reported hearing [bagba] or [gaba]. When these subjects listened to the soundtrack from the film, without visual input, or when they watched untreated film, they reported the syllables accurately as repetitions of [ba] or [ga]. Subsequent replications confirm the reliability of these findings; they have important implications for the understanding of speech perception.

5,154 citations


Book
Durand R. Begault
01 Jan 1994
Abstract: Technology and applications for the rendering of virtual acoustic spaces are reviewed. Chapter 1 deals with acoustics and psychoacoustics. Chapters 2 and 3 cover cues to spatial hearing and review psychoacoustic literature. Chapter 4 covers signal processing and systems overviews of 3-D sound systems. Chapter 5 covers applications to computer workstations, communication systems, aeronautics and space, and sonic arts. Chapter 6 lists resources. This TM is a reprint of the 1994 book from Academic Press.

960 citations


Journal ArticleDOI
P. B. L. Meijer
TL;DR: Computerized sampling of the system output and subsequent calculation of the approximate inverse (sound-to-image) mapping provided the first convincing experimental evidence for the preservation of visual information in sound representations of complicated images.
Abstract: An experimental system for the conversion of images into sound patterns was designed to provide auditory image representations within some of the known limitations of the human hearing system, possibly as a step towards the development of a vision substitution device for the blind. The application of an invertible (one-to-one) image-to-sound mapping ensures the preservation of visual information. The system implementation involves a pipelined special-purpose computer connected to a standard television camera. A novel design and the use of standard components have made for a low-cost portable prototype conversion system with a power dissipation suitable for battery operation. Computerized sampling of the system output and subsequent calculation of the approximate inverse (sound-to-image) mapping provided the first convincing experimental evidence for the preservation of visual information in sound representations of complicated images.

770 citations


Journal ArticleDOI
William Gaver
TL;DR: It is argued that technical theories must be considered in the context of the uses to which they are put and help the theorist to determine what is a good approximation, the degree of formalization that is justified, the appropriate commingling of qualitative and quantitative techniques, and encourages cumulative progress through the heuristic of divide and conquer.
Abstract: There is growing interest in the use of sound to convey information in computer interfaces. The strategies employed thus far have been based on an understanding of sound that leads to either an arbitrary or metaphorical relation between the sounds used and the data to be represented. In this article, an alternative approach to the use of sound in computer interfaces is outlined, one that emphasizes the role of sound in conveying information about the world to the listener. According to this approach, auditory icons, caricatures of naturally occurring sounds, could be used to provide information about sources of data. Auditory icons provide a natural way to represent dimensional data as well as conceptual objects in a computer system. They allow categorization of data into distinct families, using a single sound. Perhaps the most important advantage of this strategy is that it is based on the way people listen to the world in their everyday lives.

680 citations



Performance Metrics
No. of citations received by the paper in previous years:

Year    Citations
2021    2
2018    1
2017    2
2016    3
2015    2