
Showing papers on "Microphone array published in 2017"


Journal ArticleDOI
TL;DR: It is found that training on different noise environments and different microphones barely affects the ASR performance, especially when several environments are present in the training data: only the number of microphones has a significant impact.

345 citations


Proceedings ArticleDOI
05 Mar 2017
TL;DR: This paper presents an end-to-end training approach for a beamformer-supported multi-channel ASR system, where a neural network which estimates masks for a statistically optimum beamformer is jointly trained with a network for acoustic modeling.
Abstract: This paper presents an end-to-end training approach for a beamformer-supported multi-channel ASR system. A neural network which estimates masks for a statistically optimum beamformer is jointly trained with a network for acoustic modeling. To update its parameters, we propagate the gradients from the acoustic model all the way through feature extraction and the complex-valued beamforming operation. Besides avoiding a mismatch between the front-end and the back-end, this approach also eliminates the need for stereo data, i.e., the parallel availability of clean and noisy versions of the signals. Instead, it can be trained with real noisy multi-channel data only. Also, relying on the signal statistics for beamforming, the approach makes no assumptions on the configuration of the microphone array. We further observe a performance gain through joint training in terms of word error rate in an evaluation of the system on the CHiME 4 dataset.
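
As a rough sketch of the mask-driven beamforming step described above, the following NumPy code builds mask-weighted spatial PSD matrices and applies a GEV (max-SNR) beamformer per frequency bin. The array shapes, function names, and the GEV choice are illustrative assumptions for this sketch, not the paper's exact network or implementation.

```python
# Minimal sketch: mask-based statistically optimum beamforming.
# Y: multichannel STFT tensor of shape (F, T, M); masks: (F, T).
import numpy as np
from scipy.linalg import eigh

def mask_psd(Y, mask):
    """Mask-weighted spatial PSD matrix per frequency bin: (F, M, M)."""
    w = mask / np.maximum(mask.sum(axis=1, keepdims=True), 1e-10)
    return np.einsum('ft,ftm,ftn->fmn', w, Y, Y.conj())

def gev_beamformer(Y, speech_mask, noise_mask):
    F, T, M = Y.shape
    phi_xx = mask_psd(Y, speech_mask)   # speech spatial statistics
    phi_nn = mask_psd(Y, noise_mask)    # noise spatial statistics
    X_hat = np.empty((F, T), dtype=complex)
    for f in range(F):
        # principal generalized eigenvector of (phi_xx, phi_nn)
        _, vecs = eigh(phi_xx[f], phi_nn[f] + 1e-10 * np.eye(M))
        w = vecs[:, -1]
        X_hat[f] = Y[f] @ w.conj()      # beamformer output w^H y
    return X_hat
```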

124 citations


Journal ArticleDOI
TL;DR: A first proof of concept for EEG-informed attended speaker extraction and denoising is provided, showing that AAD-based speaker extraction from microphone array recordings is feasible and robust, even in noisy acoustic environments, and without access to the clean speech signals to perform EEG-based AAD.
Abstract: Objective: We aim to extract and denoise the attended speaker in a noisy two-speaker acoustic scenario, relying on microphone array recordings from a binaural hearing aid, which are complemented with electroencephalography (EEG) recordings to infer the speaker of interest. Methods: In this study, we propose a modular processing flow that first extracts the two speech envelopes from the microphone recordings, then selects the attended speech envelope based on the EEG, and finally uses this envelope to inform a multichannel speech separation and denoising algorithm. Results: Strong suppression of interfering (unattended) speech and background noise is achieved, while the attended speech is preserved. Furthermore, EEG-based auditory attention detection (AAD) is shown to be robust to the use of noisy speech signals. Conclusions: Our results show that AAD-based speaker extraction from microphone array recordings is feasible and robust, even in noisy acoustic environments, and without access to the clean speech signals to perform EEG-based AAD. Significance: Current research on AAD always assumes the availability of the clean speech signals, which limits the applicability in real settings. We have extended this research to detect the attended speaker even when only microphone recordings with noisy speech mixtures are available. This is an enabling ingredient for new brain–computer interfaces and effective filtering schemes in neuro-steered hearing prostheses. Here, we provide a first proof of concept for EEG-informed attended speaker extraction and denoising.
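
To make the envelope-selection step concrete, here is a minimal sketch of one common AAD decision rule: correlate each candidate speech envelope with an envelope reconstructed from the EEG and pick the best match. The EEG decoder producing the reconstructed envelope (typically a pre-trained linear decoder) is assumed and not shown; this is not necessarily the authors' exact criterion.

```python
# Toy sketch of the attention-decoding step, assuming an envelope
# already reconstructed from the EEG by a trained decoder (not shown).
import numpy as np

def decode_attention(eeg_envelope, speech_envelopes):
    # speech_envelopes: (n_speakers, N); eeg_envelope: (N,)
    corrs = [np.corrcoef(eeg_envelope, env)[0, 1]
             for env in speech_envelopes]
    return int(np.argmax(corrs))  # index of the attended speaker
```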

119 citations


Journal ArticleDOI
03 Nov 2017-Sensors
TL;DR: The design and implementation of a UAV-embedded microphone array system for sound source localization in outdoor environments and results confirmed that the SMAS provides highly accurate localization, water resistance, prompt assembly, stable wireless communication, and intuitive information for observers and operators.
Abstract: In search and rescue activities, unmanned aerial vehicles (UAV) should exploit sound information to compensate for poor visual information. This paper describes the design and implementation of a UAV-embedded microphone array system for sound source localization in outdoor environments. Four critical development problems included water-resistance of the microphone array, efficiency in assembling, reliability of wireless communication, and sufficiency of visualization tools for operators. To solve these problems, we developed a spherical microphone array system (SMAS) consisting of a microphone array, a stable wireless network communication system, and intuitive visualization tools. The performance of SMAS was evaluated with simulated data and a demonstration in the field. Results confirmed that the SMAS provides highly accurate localization, water resistance, prompt assembly, stable wireless communication, and intuitive information for observers and operators.

83 citations


Proceedings ArticleDOI
05 Mar 2017
TL;DR: The core of the algorithm estimates a time-frequency mask representing the target speech and uses masking-based beamforming to enhance the corrupted speech; a masking-based post-filter is proposed to further suppress the noise in the beamformer output.
Abstract: We propose a speech enhancement algorithm based on single- and multi-microphone processing techniques. The core of the algorithm estimates a time-frequency mask which represents the target speech and uses masking-based beamforming to enhance the corrupted speech. Specifically, in single-microphone processing, the received signals of a microphone array are treated as individual signals and we estimate a mask for the signal of each microphone using a deep neural network (DNN). With these masks, in multi-microphone processing, we calculate a spatial covariance matrix of the noise and a steering vector for beamforming. In addition, we propose a masking-based post-filter to further suppress the noise in the output of beamforming. Then, the enhanced speech is sent back to the DNN for mask re-estimation. When these steps are iterated a few times, we obtain the final enhanced speech. The proposed algorithm is evaluated as a frontend for automatic speech recognition (ASR) and achieves a 5.05% average word error rate (WER) on the real environment test set of CHiME-3, outperforming the current best algorithm by 13.34%.
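
A hedged sketch of the multi-microphone stage might look like the following: the steering vector is taken as the principal eigenvector of a mask-estimated speech covariance matrix, MVDR weights are formed from the noise covariance, and a mask is applied as a post-filter. The function names and the MVDR/eigenvector choices are illustrative assumptions, not the authors' exact formulation.

```python
# Illustrative sketch of mask-informed MVDR beamforming with a
# masking-based post-filter, for one frequency bin.
import numpy as np

def mvdr_weights(phi_nn, steering):
    # w = Phi_nn^{-1} d / (d^H Phi_nn^{-1} d)
    num = np.linalg.solve(phi_nn, steering)
    return num / (steering.conj() @ num)

def enhance_bin(phi_xx_f, phi_nn_f, Y_f, post_mask_f):
    # Y_f: (T, M) observations at one frequency bin
    _, vecs = np.linalg.eigh(phi_xx_f)   # Hermitian eigendecomposition
    d = vecs[:, -1]                      # steering: principal eigenvector
    w = mvdr_weights(phi_nn_f, d)        # MVDR beamformer
    beamformed = Y_f @ w.conj()          # (T,)
    return post_mask_f * beamformed      # masking-based post-filter
```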

71 citations


Journal ArticleDOI
TL;DR: The problem of localizing and quantifying acoustic sources from a set of acoustic measurements has been addressed, in the last decades, by a huge number of scientists, from different communities.
Abstract: The problem of localizing and quantifying acoustic sources from a set of acoustic measurements has been addressed, in the last decades, by a huge number of scientists, from different communities (s...

63 citations


Journal ArticleDOI
21 Jul 2017
TL;DR: High-resolution-CLEAN-SC takes advantage of the fact that source components can likewise be derived from points at some distance from the peak, as long as these ‘source markers’ are on the main lobe of the point spread function.
Abstract: In this article, a high-resolution extension of CLEAN-SC is proposed: high-resolution-CLEAN-SC. Where CLEAN-SC uses peak sources in ‘dirty maps’ to define so-called source components, high-resolution-CLEAN-SC takes advantage of the fact that source components can likewise be derived from points at some distance from the peak, as long as these ‘source markers’ are on the main lobe of the point spread function. This is very useful when sources are closely spaced together, such that their point spread functions interfere. Then, alternative markers can be sought in which the relative influence by point spread functions of other source locations is minimised. For those markers, the source components agree better with the actual sources, which allows for better estimation of their locations and strengths. This article outlines the theory needed to understand this approach and discusses applications to 2D and 3D microphone array simulations with closely spaced sources. An experimental validation was performed with two closely spaced loudspeakers in an anechoic chamber.

60 citations


Journal ArticleDOI
TL;DR: Object-oriented design based on Python allows for easy-to-use scripting and graphical user interfaces, the practical combination with other data handling and scientific computing libraries, and the possibility to extend the software by implementing new processing methods with minimal effort.

60 citations



Journal ArticleDOI
TL;DR: The accuracy of DOA estimation using PIVs and SSPIVs is compared against the state of the art in simulations including realistic reverberation and noise for single and multiple, stationary and moving sources.
Abstract: Direction of arrival (DOA) estimation is a fundamental problem in acoustic signal processing. It is used in a diverse range of applications, including spatial filtering, speech dereverberation, source separation and diarization. Intensity vector-based DOA estimation is attractive, especially for spherical sensor arrays, because it is computationally efficient. Two such methods are presented that operate on a spherical harmonic decomposition of a sound field observed using a spherical microphone array. The first uses pseudointensity vectors (PIVs) and works well in acoustic environments where only one sound source is active at any time. The second uses subspace pseudointensity vectors (SSPIVs) and is targeted at environments where multiple simultaneous sources and significant levels of reverberation make the problem more challenging. Analytical models are used to quantify the effects of an interfering source, diffuse noise, and sensor noise on PIVs and SSPIVs. The accuracy of DOA estimation using PIVs and SSPIVs is compared against the state of the art in simulations including realistic reverberation and noise for single and multiple, stationary and moving sources. Finally, robust performance of the proposed methods is demonstrated by using speech recordings in a real acoustic environment.
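
The PIV idea admits a very compact sketch: with the zeroth-order (omnidirectional) eigenbeam and the three first-order (dipole) eigenbeams aligned with the Cartesian axes, the active intensity averaged over time-frequency bins points along the propagation direction. Variable names and the averaging strategy below are assumptions; sign conventions vary between formulations.

```python
# Sketch of a pseudointensity-vector DOA estimate from spherical
# harmonic (eigenbeam) signals in the STFT domain, each of shape (F, T).
import numpy as np

def piv_doa(p0, px, py, pz):
    # Pseudointensity at each time-frequency bin: Re{p0* . u}
    I = np.stack([np.real(np.conj(p0) * px),
                  np.real(np.conj(p0) * py),
                  np.real(np.conj(p0) * pz)], axis=-1)  # (F, T, 3)
    v = I.reshape(-1, 3).sum(axis=0)   # average over all bins
    v /= np.linalg.norm(v) + 1e-12     # unit vector; negate if your
                                       # convention points away from source
    az = np.arctan2(v[1], v[0])
    el = np.arcsin(np.clip(v[2], -1.0, 1.0))
    return az, el
```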

56 citations


Patent
17 Jan 2017
TL;DR: An Internet of Things (IoT) sport device includes a light source; sensors including a camera and a microphone array; a processor coupled to the light source and the sensors; and a wireless transceiver coupled to the processor as discussed by the authors.
Abstract: An Internet of Things (IoT) sport device includes a light source; sensors including a camera and a microphone array; a processor coupled to the light source and the sensors; and a wireless transceiver coupled to the processor.

Journal ArticleDOI
TL;DR: In this article, the authors proposed to use the time-frequency processing approach, which formulates a spatial filter that can enhance a target direction based on local direction of arrival estimates at individual time-frequency bins.
Abstract: When a micro aerial vehicle (MAV) captures sounds emitted by a ground or aerial source, its motors and propellers are much closer to the microphone(s) than the sound source, thus leading to extremely low signal-to-noise ratios (SNR), e.g., −15 dB. While microphone-array techniques have been investigated intensively, their application to MAV-based ego-noise reduction has been rarely reported in the literature. To fill this gap, we implement and compare three types of microphone-array algorithms to enhance the target sound captured by an MAV. These algorithms include a recently emerged technique, time-frequency spatial filtering, and two well-known techniques, beamforming and blind source separation. In particular, based on the observation that the target sound and the ego-noise usually have concentrated energy at sparsely isolated time-frequency bins, we propose to use the time-frequency processing approach, which formulates a spatial filter that can enhance a target direction based on local direction of arrival estimates at individual time-frequency bins. By exploiting the time-frequency sparsity of the acoustic signal, this spatial filter works robustly for sound enhancement in the presence of strong ego-noise. We analyze the three techniques in detail and conduct a comparative evaluation with real-recorded MAV sounds. Experimental results show the superiority of blind source separation and time-frequency filtering in low-SNR scenarios.

Journal ArticleDOI
TL;DR: In this paper, a non-contacting measurement technique based on acoustic monitoring is proposed to detect cracks or damage within a structure by observing sound radiation using a single microphone or a b...
Abstract: This article proposes a non-contacting measurement technique based on acoustic monitoring to detect cracks or damage within a structure by observing sound radiation using a single microphone or a b...

Journal ArticleDOI
TL;DR: A multiple sound source localization and counting method based on a relaxed sparsity of speech signals achieves higher accuracy in DOA estimation and source counting than existing techniques, with higher efficiency and lower complexity, making it suitable for real-time applications.
Abstract: In this work, a multiple sound source localization and counting method based on a relaxed sparsity of speech signals is presented. In this paper, a soundfield microphone is adopted to overcome the redundancy and complexity of a microphone array. After establishing an effective measure, the relaxed sparsity of speech signals is investigated. According to this relaxed sparsity, we can obtain an extensive assumption that “single-source” zones always exist among the soundfield microphone signals, which is validated by statistical analysis. Based on “single-source” zone detection, the proposed method jointly estimates the number of active sources and their corresponding DOAs by applying a peak searching approach to the normalized histogram of estimated DOAs. The cross distortions caused by multiple simultaneously occurring sources are resolved by estimating DOAs in these “single-source” zones. The evaluations reveal that the proposed method achieves higher accuracy in DOA estimation and source counting compared with existing techniques. Furthermore, the proposed method has higher efficiency and lower complexity, which makes it suitable for real-time applications.
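
The counting step lends itself to a short illustration: local DOA estimates from detected "single-source" zones are pooled into a normalized histogram, and its peaks give both the source count and the DOAs. The zone detection itself is omitted, and the bin width and peak threshold below are assumed values.

```python
# Illustrative sketch of joint source counting and DOA estimation via
# histogram peak search over per-zone DOA estimates (in degrees).
import numpy as np
from scipy.signal import find_peaks

def count_and_localize(doa_estimates_deg, bin_width=2.0, min_height=0.05):
    bins = np.arange(0.0, 360.0 + bin_width, bin_width)
    hist, edges = np.histogram(doa_estimates_deg, bins=bins)
    hist = hist / max(hist.sum(), 1)           # normalized histogram
    peaks, _ = find_peaks(hist, height=min_height)
    doas = 0.5 * (edges[peaks] + edges[peaks + 1])  # bin centers
    return len(doas), doas                     # (source count, DOA list)
```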

Journal ArticleDOI
TL;DR: Results indicate that the suggested filter is beneficial for restoring the timbral composition of order-truncated binaural signals, while conserving, and even improving, some spatial properties of the signal.
Abstract: The synthesis of binaural signals from spherical microphone array recordings has been recently proposed. The limited spatial resolution of the reproduced signal due to order-limited reproduction has been previously investigated perceptually, showing spatial perception ramifications, such as poor source localization and limited externalization. Furthermore, this spatial order limitation also has a detrimental effect on the frequency content of the signal and its perceived timbre, due to the rapid roll-off at high frequencies. In this paper, the underlying causes of this spectral roll-off are described mathematically and investigated numerically. A digital filter that equalizes the frequency spectrum of a low spatial order signal is introduced and evaluated. A comprehensive listening test was conducted to study the influence of the filter on the perception of the reproduced sound. Results indicate that the suggested filter is beneficial for restoring the timbral composition of order-truncated binaural signals, while conserving, and even improving, some spatial properties of the signal.
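
As a loose illustration of such an equalization filter, the sketch below inverts a given high-frequency roll-off curve with a linear-phase FIR design. The paper derives its target response analytically from the order-truncation model, whereas `rolloff_db` here is simply an assumed curve sampled at normalized frequencies.

```python
# Hypothetical sketch: design an FIR equalizer that inverts a measured
# or modeled roll-off. freqs_norm must run from 0 (DC) to 1 (Nyquist).
import numpy as np
from scipy.signal import firwin2

def equalizer(freqs_norm, rolloff_db, numtaps=257):
    gain = 10.0 ** (-np.asarray(rolloff_db) / 20.0)  # invert the roll-off
    return firwin2(numtaps, freqs_norm, gain)        # linear-phase FIR taps
```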

Proceedings ArticleDOI
01 Mar 2017
TL;DR: This study reports gesture recognition accuracies in the range 64.5–96.9%, based on the number of gestures to be recognized, and shows that ultrasound sensors have the potential to become low power, low computation, and low cost alternatives to existing optical sensors.
Abstract: In this study, we explore the possibility of recognizing hand gestures using ultrasonic depth imaging. The ultrasonic device consists of a single piezoelectric transducer and an 8 - element microphone array. Using carefully designed transmit pulse, and a combination of beamforming, matched filtering, and cross-correlation methods, we construct ultrasound images with depth and intensity pixels. Thereafter, we use a combined Convolutional (CNN) and Long Short-Term Memory (LSTM) network to recognize gestures from the ultrasound images. We report gesture recognition accuracies in the range 64.5–96.9%, based on the number of gestures to be recognized, and show that ultrasound sensors have the potential to become low power, low computation, and low cost alternatives to existing optical sensors.
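
The pulse-echo depth estimate can be sketched with a matched filter: cross-correlate a received channel with the known transmit pulse and convert the lag of the strongest echo into distance. Beamforming and the CNN-LSTM classifier are beyond this snippet, and the speed of sound and variable names are assumptions.

```python
# Toy sketch of matched-filter depth estimation for one microphone channel.
import numpy as np
from scipy.signal import correlate

def echo_depth(received, pulse, fs, c=343.0):
    mf = correlate(received, pulse, mode='full')    # matched filtering
    lag = np.argmax(np.abs(mf)) - (len(pulse) - 1)  # delay in samples
    t = max(lag, 0) / fs                            # round-trip time (s)
    return c * t / 2.0                              # one-way depth (m)
```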

Proceedings ArticleDOI
13 Dec 2017
TL;DR: This paper addresses online outdoor sound source localization using a microphone array embedded in an unmanned aerial vehicle (UAV); two MUSIC-based localization methods are combined to cope with the trade-off between latency and noise robustness, and a water-resistant microphone array and data compression based on the free lossless audio codec, extended to support a 16 ch audio data stream via UDP, are developed.
Abstract: This paper addresses online outdoor sound source localization using a microphone array embedded in an unmanned aerial vehicle (UAV). In addition to sound source localization, sound source enhancement and robust communication method are also described. This system is one instance of deployment of our continuously developing open source software for robot audition called HARK (Honda Research Institute Japan Audition for Robots with Kyoto University). To improve the robustness against outdoor acoustic noise, we propose to combine two sound source localization methods based on MUSIC (multiple signal classification) to cope with trade-off between latency and noise robustness. The standard Eigenvalue decomposition based MUSIC (SEVD-MUSIC) has smaller latency but less noise robustness, whereas the incremental generalized singular value decomposition based MUSIC (iGSVD-MUSIC) has higher noise robustness but larger latency. A UAV operator can use an appropriate method according to the situation. A sound enhancement method called online robust principal component analysis (ORPCA) enables the operator to detect a target sound source more easily. To improve the stability of wireless communication, and robustness of the UAV system against weather changes, we developed data compression based on free lossless audio codec (FLAC) extended to support a 16 ch audio data stream via UDP, and developed a water-resistant microphone array. The resulting system successfully worked in an outdoor search and rescue task in ImPACT Tough Robotics Challenge in November 2016.
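
For orientation, standard eigenvalue-decomposition MUSIC (the SEVD variant mentioned above) can be sketched compactly for one frequency bin: eigendecompose the spatial correlation matrix, keep the noise subspace, and scan candidate steering vectors. The array geometry and steering-vector generation are assumed inputs; this is not HARK's actual implementation.

```python
# Compact SEVD-MUSIC sketch for one frequency bin.
import numpy as np

def sevd_music(R, steering, n_sources):
    # R: (M, M) spatial correlation; steering: (D, M) candidate vectors
    _, vecs = np.linalg.eigh(R)          # eigenvalues in ascending order
    En = vecs[:, :-n_sources]            # noise subspace
    proj = steering.conj() @ En          # a^H E_n for each direction
    denom = np.sum(np.abs(proj) ** 2, axis=1)
    num = np.sum(np.abs(steering) ** 2, axis=1)
    return num / np.maximum(denom, 1e-12)  # MUSIC pseudospectrum over D directions
```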

Proceedings ArticleDOI
05 Mar 2017
TL;DR: This paper proposes a novel approach for sound source tracking that constructively exploits the spatial diversity of a microphone array installed in a moving robot, combining expectation-maximization (EM) and Bayesian approaches.
Abstract: Intuitive spoken dialogues are a prerequisite for human-robot interaction. In many practical situations, robots must be able to identify and focus on sources of interest in the presence of interfering speakers. Techniques such as spatial filtering and blind source separation are therefore often used, but rely on accurate knowledge of the source location. In practice, sound emitted in enclosed environments is subject to reverberation and noise. Hence, sound source localization must be robust both to diffuse noise due to late reverberation and to spurious detections due to early reflections. For improved robustness against reverberation, this paper proposes a novel approach for sound source tracking that constructively exploits the spatial diversity of a microphone array installed in a moving robot. In previous work, we developed speaker localization methods using expectation-maximization (EM) approaches and Bayesian approaches. In this paper, we propose to combine the EM and Bayesian approaches in one framework for improved robustness against reverberation and noise.

Journal ArticleDOI
TL;DR: A Kinect microphone array-based method for the voice-based control of humanoid robot exhibitions through speech and speaker recognition, using a support vector machine, a Gaussian mixture model, and dynamic time warping for speaker verification, speaker identification, and speech recognition.

Journal ArticleDOI
TL;DR: A database of acoustic radiation patterns was recorded, modeled, and analyzed for 41 modern or authentic orchestral musical instruments and can be used both for studying the radiation of musical instruments itself and for the implementation of radiation patterns in room acoustical simulations and auralization in order to obtain a spatial excitation of the room closer to reality.
Abstract: A database of acoustic radiation patterns was recorded, modeled, and analyzed for 41 modern or authentic orchestral musical instruments. The generation of this database included recordings of each instrument over the entire chromatic tone range in an anechoic chamber using a surrounding spherical microphone array. Acoustic source centering was applied in order to align the acoustic center of the sound source to the physical center of the microphone array. The acoustic radiation pattern is generated in the spherical harmonics domain at each harmonic partial of each played tone. An analysis of the acoustic radiation pattern complexity has been performed in terms of the number of excitation points using the centering algorithm. The database can be used both for studying the radiation of musical instruments itself, as well as for the implementation of radiation patterns in room acoustical simulations and auralization in order to obtain a spatial excitation of the room closer to reality.
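
Since the patterns are stored in the spherical harmonics domain, reconstructing a radiation pattern at an arbitrary direction is a simple weighted sum of spherical harmonics. The coefficient layout and normalization below are assumptions for illustration, not the database's actual format.

```python
# Hypothetical sketch: evaluate a radiation pattern from spherical
# harmonic coefficients at a direction (azimuth, colatitude in radians).
from scipy.special import sph_harm

def radiation_pattern(coeffs, azimuth, colatitude):
    # coeffs: dict mapping (n, m) -> complex coefficient c_nm
    return sum(c * sph_harm(m, n, azimuth, colatitude)
               for (n, m), c in coeffs.items())
```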

Journal ArticleDOI
TL;DR: In this article, the authors proposed an acoustic-based diagnosis method for rotating machines, where a microphone array is used to record the acoustic field radiated by the machine and the recorded signals are processed through a fault detection algorithm allowing the identification of the failing part.

Proceedings ArticleDOI
01 Oct 2017
TL;DR: A passive sound localization and classification system is designed and implemented; it can detect the acoustic signature of power tools, and its effectiveness as an early warning system for detecting misuse of machinery is demonstrated.
Abstract: For a wide range of applications in industry, it is sometimes necessary to perform acoustic source localization. In this paper, a passive sound localization and classification system is designed and implemented. Each sensor consists of a microphone array which is used to detect the direction-of-arrival (DoA) of an acoustic signal. Multiple DoA sensors can be combined to form a wireless sensor network. The system can detect the acoustic signature of power tools, and its effectiveness as an early warning system for detecting misuse of machinery is demonstrated. It is shown that the system can detect the DoA of an acoustic signal with an overall mean estimation error of 7° and can correctly classify the signal source with a classification rate of 71.5%.

Proceedings ArticleDOI
05 Mar 2017
TL;DR: An audio-visual fusion algorithm for 3D speaker tracking from a localised multi-modal sensor platform composed of a camera and a small microphone array that outperforms both the individual modalities and a classical approach with fixed parameters in terms of tracking accuracy.
Abstract: We propose an audio-visual fusion algorithm for 3D speaker tracking from a localised multi-modal sensor platform composed of a camera and a small microphone array. After extracting audio-visual cues from individual modalities we fuse them adaptively using their reliability in a particle filter framework. The reliability of the audio signal is measured based on the maximum Global Coherence Field (GCF) peak value at each frame. The visual reliability is based on colour-histogram matching with detection results compared with a reference image in the RGB space. Experiments on the AV16.3 dataset show that the proposed adaptive audio-visual tracker outperforms both the individual modalities and a classical approach with fixed parameters in terms of tracking accuracy.

Journal ArticleDOI
TL;DR: A method for estimating the reliability of microphone array algorithms using Monte Carlo simulations to show not only that the performance of a method depends on the given source distribution, but also that the methods differ in terms of their sensitivity to imperfect input data.

Proceedings ArticleDOI
05 Mar 2017
TL;DR: Experimental results with real-recorded MAV ego-noise show the superiority of the proposed time-frequency processing framework over the state of the art in performing source localization robustly.
Abstract: We address the problem of sound source localization with a microphone array mounted on a micro aerial vehicle (MAV). Due to the noise generated by motors and propellers, this scenario is characterized by extremely low signal-to-noise ratios (SNR). Based on the observation that the energy of MAV sound recordings is usually concentrated at isolated time-frequency bins, we propose a time-frequency processing framework to address this problem. We first estimate the direction of arrival of the sound at individual time-frequency bins. Then we formulate a set of spatially informed filters pointing at candidate directions in the search space. The output of the filtering tends to present high non-Gaussianity when the spatial filter is steered towards the target sound source. Finally, by measuring the non-Gaussianity of the spatial filtering outputs we build a spatial likelihood function from which we estimate the direction of the target sound. Experimental results with real-recorded MAV ego-noise show the superiority of the proposed method over the state of the art in performing source localization robustly.
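
The spatial-likelihood idea can be sketched briefly: measure the non-Gaussianity of each spatially informed filter's output and take the direction that maximizes it. The empirical kurtosis used below is one plausible non-Gaussianity measure; the filters themselves and the exact measure in the paper are not shown here.

```python
# Sketch of a non-Gaussianity-based spatial likelihood over candidate
# directions; filter_outputs: (D, N) output signals, one row per direction.
import numpy as np

def spatial_likelihood(filter_outputs):
    x = np.abs(filter_outputs)
    x = x - x.mean(axis=1, keepdims=True)
    # excess kurtosis: high for spiky (non-Gaussian) target-dominated output
    kurt = (x ** 4).mean(axis=1) / (x ** 2).mean(axis=1) ** 2 - 3.0
    return kurt

# estimated_direction = np.argmax(spatial_likelihood(outputs))
```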

Journal ArticleDOI
TL;DR: In this article, the authors proposed an approach to estimate the 3D positions of acoustic reflectors given room impulse responses (RIRs) by using a single loudspeaker and using the ellipsoid tangent sample consensus (ETSAC) algorithm.
Abstract: Acoustic reflector localization is an important issue in audio signal processing, with direct applications in spatial audio, scene reconstruction, and source separation. Several methods have recently been proposed to estimate the 3-D positions of acoustic reflectors given room impulse responses (RIRs). In this paper, we categorize these methods as “image-source reversion,” which localizes the image source before finding the reflector position, and “direct localization,” which localizes the reflector without intermediate steps. We present five new contributions. First, an onset detector, called the clustered dynamic programming projected phase-slope algorithm, is proposed to automatically extract the time of arrival for early reflections within the RIRs of a compact microphone array. Second, we propose an image-source reversion method that uses the RIRs from a single loudspeaker. It is constructed by combining an image source locator (the image source direction and range (ISDAR) algorithm), and a reflector locator (using the loudspeaker-image bisection (LIB) algorithm). Third, two variants of it, exploiting multiple loudspeakers, are proposed. Fourth, we present a direct localization method, the ellipsoid tangent sample consensus (ETSAC), exploiting ellipsoid properties to localize the reflector. Finally, systematic experiments on simulated and measured RIRs are presented, comparing the proposed methods with the state-of-the-art. ETSAC generates lower errors than the alternative methods across our datasets. Nevertheless, the ISDAR-LIB combination performs well and has a run time 200 times faster than ETSAC.

Journal ArticleDOI
TL;DR: In this article, experiments with three different axial fans, featuring backward-skewed, unskewed, and forward-skewed blades, were conducted in a standardized fan test chamber.
Abstract: Microphone arrays can be used to detect sound sources on rotating machinery. For this study, experiments with three different axial fans, featuring backward-skewed, unskewed, and forward-skewed blades, were conducted in a standardized fan test chamber. The measured data are processed using the virtual rotating array method. Subsequent application of beamforming and deconvolution in the frequency domain allows the localization and quantification of separate sources, as appear at different regions on the blades. Evaluating broadband spectra of the leading and trailing edges of the blades, phenomena governing the acoustic characteristics of the fans at different operating points are identified. This enables a detailed discussion of the influence of the blade design on the radiated noise.

Journal ArticleDOI
Jørgen Hald
TL;DR: A method for removing incoherent noise contamination from the CSM diagonal, formulated as a semidefinite program, which is a convex optimization problem that can be solved very efficiently and with guaranteed convergence properties.
Abstract: Measured cross-spectral matrices (CSMs) from a microphone array will in some cases be contaminated by severe incoherent noise signals in the individual channels. A typical example is flow noise generated in the individual microphones when measuring in a wind tunnel. Assuming stationary signals and performing long-time averaging, the contamination will be concentrated on the CSM diagonal. When the CSM is used for traditional frequency-domain beamforming, diagonal removal (DR) will avoid use of the diagonal. DR is effective at suppressing the contamination effects, but it also has some side effects. With other beamforming algorithms and in connection with acoustic holography, however, the diagonal of the CSM is needed. The present paper describes a method for removal of incoherent noise contamination from the CSM diagonal. The method formulates the problem as a semidefinite program, which is a convex optimization problem that can be solved very efficiently and with guaranteed convergence properties. A first numerical study investigates the question, whether the semidefinite program formulation will provide in all cases the desired output. A second numerical study investigates the limitations introduced by off-diagonal noise contributions due to finite averaging time. The results of that study are backed up by results from a practical measurement.
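
One plausible reading of the semidefinite program (hedged: not necessarily Hald's exact objective or solver setup) is to remove as much nonnegative diagonal "noise" as possible while keeping the denoised CSM positive semidefinite, which a convex modeling tool expresses directly:

```python
# Hedged cvxpy sketch of diagonal denoising as a semidefinite program.
# C: Hermitian CSM (numpy array); requires an SDP-capable solver (e.g. SCS).
import cvxpy as cp
import numpy as np

def denoise_csm(C):
    M = C.shape[0]
    d = cp.Variable(M, nonneg=True)                 # per-channel noise power
    constraints = [cp.Constant(C) - cp.diag(d) >> 0]  # denoised CSM stays PSD
    cp.Problem(cp.Maximize(cp.sum(d)), constraints).solve()
    return C - np.diag(d.value)                     # noise-reduced CSM
```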

Proceedings ArticleDOI
Teng Han, Khalad Hasan, Keisuke Nakamura, Randy Gomez, Pourang Irani
20 Oct 2017
TL;DR: The described algorithm, adopted from the MUltiple SIgnal Classification (MUSIC) technique, enables robust localization and classification of the acoustics when the microphones are required to be placed in close proximity.
Abstract: We present SoundCraft, a smartwatch prototype embedded with a microphone array, that localizes angularly, in azimuth and elevation, acoustic signatures: non-vocal acoustics that are produced using our hands. Acoustic signatures are common in our daily lives, such as when snapping or rubbing our fingers, tapping on objects or even when using an auxiliary object to generate the sound. We demonstrate that we can capture and leverage the spatial location of such naturally occurring acoustics using our prototype. We describe our algorithm, which we adopt from the MUltiple SIgnal Classification (MUSIC) technique [31], that enables robust localization and classification of the acoustics when the microphones are required to be placed at close proximity. SoundCraft enables a rich set of spatial interaction techniques, including quick access to smartwatch content, rapid command invocation, in-situ sketching, and also multi-user around device interaction. Via a series of user studies, we validate SoundCraft's localization and classification capabilities in non-noisy and noisy environments.

Journal ArticleDOI
TL;DR: The proposed spatial cepstrum does not require the positions of the microphones and is robust against the synchronization mismatch of channels, thus ensuring its suitability for use with a distributed microphone array.
Abstract: In this paper, with the aim of using the spatial information obtained from a distributed microphone array employed for acoustic scene analysis, we propose a robust and efficient method, which is called the spatial cepstrum. In our approach, similarly to the cepstrum, which is widely used as a spectral feature, the logarithm of the amplitude in a multichannel observation is converted to a feature vector by a linear orthogonal transformation. This linear orthogonal transformation is generally achieved by principal component analysis (PCA). Moreover, we also show that for a circularly symmetric microphone arrangement with an isotropic sound field, PCA is identical to the inverse discrete Fourier transform and the spatial cepstrum exactly corresponds to the cepstrum. The proposed approach does not require the positions of the microphones and is robust against the synchronization mismatch of channels, thus ensuring its suitability for use with a distributed microphone array. Experimental results obtained using actual environmental sounds verify the validity of our approach even when a smaller feature dimension than the original one is used, which is achieved by dimensionality reduction through PCA. Additionally, experimental results also indicate that the robustness of the proposed method is satisfactory for observations that have a synchronization mismatch between channels.
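
The feature extraction described here reduces to a few lines: take the log amplitude per channel, then project with a PCA-derived orthogonal transform, optionally truncating components for dimensionality reduction. Frame handling and any preprocessing are simplified assumptions in this sketch.

```python
# Minimal sketch of the spatial cepstrum: PCA of per-channel log amplitudes.
import numpy as np

def spatial_cepstrum(amplitudes, n_components=None):
    # amplitudes: (frames, channels) per-channel amplitude features
    logamp = np.log(np.maximum(amplitudes, 1e-12))
    X = logamp - logamp.mean(axis=0, keepdims=True)   # center per channel
    # PCA via eigendecomposition of the channel covariance matrix
    _, vecs = np.linalg.eigh(np.cov(X, rowvar=False))
    V = vecs[:, ::-1]                    # descending variance order
    if n_components is not None:
        V = V[:, :n_components]          # dimensionality reduction
    return X @ V                         # (frames, n_components) features
```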