Book Chapter

Reduction of Computational Cost Using Two-Stage Deep Neural Network for Training for Denoising and Sound Source Identification

Abstract
This paper addresses the reduction of computational cost in training a Deep Neural Network (DNN), in particular for identifying sound sources in highly noise-contaminated recordings from a microphone array embedded in an Unmanned Aerial Vehicle (UAV), with the aim of detecting human voices quickly and over a wide area in disaster situations. End-to-end DNN training is known to achieve high performance because it trains a large, highly non-linear network on a large amount of raw input signals without preprocessing. Its computational cost is high, however, owing to the complexity of the network. We therefore propose two-stage DNN training using two separately trained networks: one for denoising sound sources and one for sound source identification. Because the large network is divided into two smaller networks, the complexity of the networks is expected to decrease, and each network can be built around a model specific to its task, denoising or identification. This results in faster convergence and lower computational cost in DNN training. Preliminary results showed that the proposed two-stage network required only 71% of the training time of end-to-end training while maintaining sound source identification accuracy, using noisy acoustic signals recorded with an 8 ch circular microphone array embedded in a UAV.
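The two-stage training scheme described above can be sketched in a few lines of Python. This is a minimal illustration under strong assumptions, not the authors' implementation: tiny linear models stand in for the two DNNs, the data are synthetic, and every function name, learning rate, and dataset size is hypothetical. Only the 8-channel count is taken from the abstract. Stage 1 is trained to map noisy multichannel input to a clean target, and stage 2 is then trained separately on the output of the frozen stage 1.

```python
import math
import random

random.seed(0)
N_CH = 8  # matches the 8 ch array in the abstract; everything else is assumed


def make_data(n, noise_std=1.0):
    """Synthetic stand-in data: label 1 -> clean value +1, label 0 -> clean value -1."""
    data = []
    for _ in range(n):
        label = random.randint(0, 1)
        clean = 1.0 if label == 1 else -1.0
        noisy = [clean + random.gauss(0.0, noise_std) for _ in range(N_CH)]
        data.append((noisy, clean, label))
    return data


def denoise(w, noisy):
    """Stage-1 model: a weighted sum over channels (stand-in for a denoising DNN)."""
    return sum(wc * xc for wc, xc in zip(w, noisy))


def denoise_mse(w, data):
    return sum((denoise(w, noisy) - clean) ** 2 for noisy, clean, _ in data) / len(data)


def train_denoiser(data, lr=0.02, steps=200):
    """Stage 1: fit channel weights by gradient descent on the squared error."""
    w = [0.0] * N_CH
    for _ in range(steps):
        grad = [0.0] * N_CH
        for noisy, clean, _ in data:
            err = denoise(w, noisy) - clean
            for c in range(N_CH):
                grad[c] += 2.0 * err * noisy[c] / len(data)
        w = [wc - lr * g for wc, g in zip(w, grad)]
    return w


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_identifier(w, data, lr=0.5, steps=200):
    """Stage 2: logistic regression on the output of the frozen stage-1 denoiser."""
    a, b = 0.0, 0.0
    feats = [(denoise(w, noisy), label) for noisy, _, label in data]
    for _ in range(steps):
        ga = gb = 0.0
        for f, label in feats:
            err = sigmoid(a * f + b) - label
            ga += err * f / len(feats)
            gb += err / len(feats)
        a, b = a - lr * ga, b - lr * gb
    return a, b


def accuracy(w, a, b, data):
    hits = sum(
        (sigmoid(a * denoise(w, noisy) + b) >= 0.5) == (label == 1)
        for noisy, _, label in data
    )
    return hits / len(data)


train, test = make_data(200), make_data(200)
w = train_denoiser(train)            # stage 1, trained on (noisy, clean) pairs
a, b = train_identifier(w, train)    # stage 2, trained on denoised features
mse_before = denoise_mse([0.0] * N_CH, test)
mse_after = denoise_mse(w, test)
acc = accuracy(w, a, b, test)
print(f"denoiser MSE: {mse_before:.3f} -> {mse_after:.3f}, identification accuracy: {acc:.3f}")
```

Each stage here optimizes its own small objective, which is the structural point of the proposal. The sketch says nothing about the training-time saving reported in the abstract, which depends on the actual network sizes and data.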

