TL;DR: A family of new algorithms for compressing neural networks (NNs) is presented based on Compressive Sampling (CS) theory, which makes it possible to find a sparse structure for an NN; the designed network is then compressed using CS.
Abstract: Microphone arrays are employed today to locate sound sources in numerous real-time applications such as speech processing in large rooms or acoustic echo cancellation. Signal sources may lie in the near field or far field with respect to the microphones. Current Neural Network (NN)-based source localization approaches assume far-field narrowband sources. One important limitation of these NN-based approaches is striking a balance between computational complexity and network size; an architecture that is too large or too small will hurt performance in terms of generalization and computational cost. In previous work, saliency analysis has been employed to determine the most suitable structure; however, it is time-consuming and its performance is not robust. In this paper, a family of new algorithms for compression of NNs is presented based on Compressive Sampling (CS) theory. The proposed framework makes it possible to find a sparse structure for NNs; the designed neural network is then compressed using CS. The key difference between this algorithm and state-of-the-art techniques is that the mapping is continuously performed using the most effective features, so the proposed method converges quickly. The empirical work demonstrates that the proposed algorithm is an effective alternative to traditional methods in terms of accuracy and computational complexity.
In sound source localization techniques, the location of the source has to be estimated automatically by calculating the direction of the received signal [1].
Feature extraction is the process of selecting the data useful for estimating the Direction of Arrival (DOA).
The key insight is the use of the instantaneous cross-power spectrum at each pair of sensors.
After this step, the authors compressed the neural network designed with these feature vectors.
The next section presents a review of techniques for sound source localization.
II. SOUND SOURCE LOCALIZATION
The far-field assumption holds as long as the distance between the source and the reference microphone is larger than 2D²/λ [2], where D is the microphone array length and λ is the signal wavelength.
So, the time delay of the received signal between the reference microphone and the m-th microphone would be [15]: τ_m = (m − 1) d sin(θ)/c, where d is the spacing between neighboring microphones, θ is the DOA, and c is the speed of sound. Therefore, τ = d sin(θ)/c is the amount of time the signal takes to traverse the distance between any two neighboring microphones (Fig. 1), and r is the distance between the source and the first microphone [15].
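As an illustrative sketch (not code from the paper), the far-field delay model above can be computed as follows; the function name, array spacing, and speed of sound are assumptions for the example:

```python
import numpy as np

def ula_delays(n_mics, d, theta_deg, c=343.0):
    """Far-field time delays (s) of each microphone relative to mic 0
    for a uniform linear array with spacing d (m) and DOA theta (deg)."""
    theta = np.deg2rad(theta_deg)
    m = np.arange(n_mics)          # microphone indices 0..M-1
    return m * d * np.sin(theta) / c

# the delay between any two neighbouring mics is d*sin(theta)/c
delays = ula_delays(4, 0.1, 30.0)
```

With d = 0.1 m and θ = 30°, neighbouring microphones are offset by d·sin(θ)/c ≈ 0.146 ms, and the m-th delay is just m times that.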
III. FEATURE SELECTION
The aim of this section is to compute the feature vectors from the array data and use the MLP (Multi Layer Perceptron) approximation property to map the feature vectors to the corresponding DOA, as shown in Fig. 3 [6] .
The authors summarize their algorithm for computing a real-valued feature vector of length (2(M − 1) + 1)·L, for L dominant frequencies and M sensors, below. Preprocessing algorithm for computing a real-valued feature vector: 1. Calculate the N-point FFT of the signal at each sensor.
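A minimal sketch of such a cross-power-spectrum feature vector, assuming numpy, sensor 0 as the reference, and dominant-frequency selection by spectral power; all names and conventions here are illustrative, not the authors' code:

```python
import numpy as np

def crosspower_features(X, n_freqs):
    """X: (M, N) array of time-domain sensor signals.
    Returns a real feature vector of length n_freqs * (2*(M-1) + 1):
    for each dominant frequency bin, its normalized index plus the
    real and imaginary parts of the cross-power spectrum between the
    reference sensor 0 and each of the other M-1 sensors."""
    M, N = X.shape
    F = np.fft.rfft(X, axis=1)                 # per-sensor spectra
    power = np.abs(F[0]) ** 2                  # reference-sensor power
    bins = np.argsort(power)[-n_freqs:]        # dominant frequency bins
    feats = []
    for k in bins:
        cps = F[0, k] * np.conj(F[1:, k])      # cross-power at bin k
        feats.append(np.concatenate(([k / N], cps.real, cps.imag)))
    return np.concatenate(feats)

v = crosspower_features(np.random.randn(4, 256), n_freqs=10)
```

For M = 4 sensors and 10 dominant frequencies this yields a vector of length 10 · (2·3 + 1) = 70.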
In conclusion, their goal is to design a neural network with the fewest hidden neurons (or weights) that incurs the minimum increase in error, measured by the norm of the difference between the original and compressed network outputs.
This problem is equivalent to finding a weight matrix most of whose rows are zero.
Comparing these equations with (7), the authors conclude that these minimization problems can be written as CS problems.
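To illustrate how such a pruning problem can be cast as a sparse-recovery problem, here is a hedged sketch using greedy Orthogonal Matching Pursuit (one of the solver families mentioned later in this page); the setup, with hidden-layer activations H and target output y, and all names are illustrative rather than the authors' implementation:

```python
import numpy as np

def omp_prune(H, y, k):
    """Greedy (OMP-style) selection of k hidden units: find output-layer
    weights w with at most k nonzeros such that H @ w ~= y, which is the
    sparse-recovery form the pruning step is cast as."""
    n = H.shape[1]
    support, resid = [], y.astype(float).copy()
    for _ in range(k):
        j = int(np.argmax(np.abs(H.T @ resid)))    # most correlated unit
        if j not in support:
            support.append(j)
        sub = H[:, support]
        coef, *_ = np.linalg.lstsq(sub, y, rcond=None)
        resid = y - sub @ coef                      # update residual
    w = np.zeros(n)
    w[support] = coef
    return w

rng = np.random.default_rng(0)
H = rng.standard_normal((100, 20))                  # 100 samples, 20 hidden units
w_true = np.zeros(20); w_true[[3, 7]] = [2.0, -1.5] # only 2 units matter
y = H @ w_true
w_hat = omp_prune(H, y, k=2)                        # keep 2 units, zero the rest
```

The recovered w_hat keeps at most k nonzero output weights while reproducing the network output, which is exactly the "most rows zero" objective stated above.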
VI. RESULTS AND DISCUSSION
As mentioned before, assuming that the received speech signals are modeled with 10 dominant frequencies, the authors trained a two-layer Perceptron neural network with 128 neurons in the hidden layer, using feature vectors obtained with CS from the cross-power spectrum of the received microphone signals.
After computing the network weights, the authors compressed the network with their algorithms.
From these results the authors infer that the CS algorithms are faster than the other algorithms and achieve a smaller error than they do.
Regarding the number of measurement vectors, the algorithm that uses a single measurement vector (SMV) is faster than the one that uses multiple measurement vectors (MMV), but its achieved error is not smaller.
VII. CONCLUSION
In particular, using the pursuit and greedy methods of CS, a compression method for NNs has been presented.
The key difference between their algorithm and previous techniques is that the authors focus on the retained elements of the neural network; as a result, their method converges quickly.
The simulation results demonstrate that their algorithm is an effective alternative to traditional methods in terms of accuracy and computational complexity.
The results revealed that the proposed algorithm decreases computational complexity while improving performance.
TL;DR: It is shown how the DNN-based algorithm significantly outperforms the state-of-the-art approaches evaluated on the DIRHA dataset, providing an average localization error, expressed in terms of Root Mean Square Error (RMSE), equal to 324 mm and 367 mm, respectively, for the Simulated and Real subsets.
Abstract: In the field of human speech capturing systems, a fundamental role is played by source localization algorithms. In this paper, a Speaker Localization algorithm (SLOC) based on Deep Neural Networks (DNN) is evaluated and compared with state-of-the-art approaches. The speaker position in the room under analysis is directly determined by the DNN, making the proposed algorithm fully data-driven. Two different neural network architectures are investigated: the Multi Layer Perceptron (MLP) and Convolutional Neural Networks (CNN). GCC-PHAT (Generalized Cross Correlation-PHAse Transform) patterns, computed from the audio signals captured by the microphones, are used as input features for the DNN. In particular, a multi-room case study is dealt with, where the acoustic scene of each room is influenced by sounds emitted in the other rooms. The algorithm is tested by means of the home-recorded DIRHA dataset, characterized by multiple wall and ceiling microphone signals for each room. In detail, the focus is on the speaker localization task in two distinct neighboring rooms. For comparison, two algorithms proposed in the literature for the addressed applicative context are evaluated: the Crosspower Spectrum Phase Speaker Localization (CSP-SLOC) and the Steered Response Power using the Phase Transform speaker localization (SRP-SLOC). Besides providing an extensive analysis of the proposed method, the article shows how the DNN-based algorithm significantly outperforms the state-of-the-art approaches evaluated on the DIRHA dataset, providing an average localization error, expressed in terms of Root Mean Square Error (RMSE), equal to 324 mm and 367 mm, respectively, for the Simulated and Real subsets.
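The GCC-PHAT features used as DNN input in the abstract above can be sketched with a generic textbook implementation (not the article's code; the FFT length and peak-picking details are assumptions):

```python
import numpy as np

def gcc_phat(x1, x2, fs):
    """GCC-PHAT between two microphone signals: the cross-spectrum is
    whitened by its magnitude so that only phase (delay) information
    survives. Returns the time offset between the signals in seconds
    (the sign depends on which signal leads)."""
    n = len(x1) + len(x2)
    X1, X2 = np.fft.rfft(x1, n), np.fft.rfft(x2, n)
    R = X1 * np.conj(X2)
    R /= np.maximum(np.abs(R), 1e-12)          # PHAT weighting
    cc = np.fft.irfft(R, n)
    lag = int(np.argmax(np.abs(cc)))
    if lag > n // 2:
        lag -= n                               # wrap negative lags
    return lag / fs

fs = 16000
rng = np.random.default_rng(1)
s = rng.standard_normal(4096)
delay = 5                                      # samples
x1, x2 = s[delay:], s[:-delay]                 # x2 lags x1 by 5 samples
tau = gcc_phat(x1, x2, fs)
```

In the DNN localization setting, the whole whitened cross-correlation pattern (not just its peak) is typically stacked per microphone pair and fed to the network.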
TL;DR: A source localization algorithm is proposed based on a sparse Fast Fourier Transform-based feature extraction method and spatial sparsity, which leads to a sparse representation of audio signals and a significant reduction in the dimensionality of the signals.
Abstract: In this paper, we propose a source localization algorithm based on a sparse Fast Fourier Transform (FFT)-based feature extraction method and spatial sparsity. We represent the sound source positions as a sparse vector by discretely segmenting the space with a circular grid. The location vector is related to microphone measurements through a linear equation, which can be estimated at each microphone. For this linear dimensionality reduction, we have utilized a Compressive Sensing (CS) and two-level FFT-based feature extraction method which combines two sets of audio signal features and covers both short-time and long-time properties of the signal. The proposed feature extraction method leads to a sparse representation of audio signals. As a result, a significant reduction in the dimensionality of the signals is achieved. In comparison to the state-of-the-art methods, the proposed method improves the accuracy while the complexity is reduced in some cases.
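A hedged sketch of the grid-based spatial-sparsity formulation described above: candidate positions on a circular grid define a dictionary, and the measurement vector is sparse in it. The narrowband phase-signature model and all parameter values are assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def grid_dictionary(mic_pos, grid_pos, freq, c=343.0):
    """Dictionary A whose j-th column is the array phase signature of a
    source at candidate grid point j, so measurements y ~= A @ x with x
    sparse (nonzeros at the true source positions)."""
    d = np.linalg.norm(mic_pos[:, None, :] - grid_pos[None, :, :], axis=2)
    return np.exp(-2j * np.pi * freq * d / c)

rng = np.random.default_rng(2)
mics = rng.uniform(0, 5, (8, 2))                 # 8 mics in a 5x5 m room
# circular grid of 36 candidate positions around the room centre
ang = np.linspace(0, 2 * np.pi, 36, endpoint=False)
grid = np.stack([2.5 + 2 * np.cos(ang), 2.5 + 2 * np.sin(ang)], axis=1)
A = grid_dictionary(mics, grid, freq=1000.0)
x = np.zeros(36); x[10] = 1.0                    # one active grid point
y = A @ x                                        # noiseless measurements
j_hat = int(np.argmax(np.abs(A.conj().T @ y)))   # simplest sparse decode
```

A full CS solver would replace the final correlation step with an ℓ1 or greedy recovery, but the linear model y = A x and the discretized circular grid are the essential ingredients.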
15 citations
Cites background or methods or result from "A compressive sensing based compres..."
...In the next step, to evaluate the proposed sound source localization system, we compared its performance with those of two previously reported CS-based target localization algorithms, namely DTL [8] and CSNN [9]....
[...]
...The Compressive Sensing-based Neural Network (CSNN) method [9] employs a neural network for the calculation of spectral feature vectors in each microphone....
[...]
...In [9], authors have tried to reduce computational complexity by employing a feature extraction process that selects useful data for estimation of DOA....
[...]
...Comparison between the localization performance of the proposed system, CSNN [9] and DTL algorithm [8] in the case of two sound sources and two microphones....
TL;DR: A new modeling and analysis framework is presented for multipatient positioning in a wireless body area network (WBAN), which exploits the spatial sparsity of patients and a sparse fast Fourier transform (FFT)-based feature extraction mechanism for monitoring patients and reporting movement tracking to a central database server containing patient vital information.
Abstract: Recent achievements in wireless technologies have opened up enormous opportunities for the implementation of ubiquitous health care systems in providing rich contextual information and warning mechanisms against abnormal conditions. This helps with the automatic and remote monitoring/tracking of patients in hospitals and facilitates the supervision of fragile, elderly people in their own domestic environment through automatic systems that handle remote drug delivery. This paper presents a new modeling and analysis framework for multipatient positioning in a wireless body area network (WBAN) which exploits the spatial sparsity of patients and a sparse fast Fourier transform (FFT)-based feature extraction mechanism for monitoring patients and for reporting movement tracking to a central database server containing patient vital information. The main goal of this paper is to achieve a high degree of accuracy and resolution in patient localization with less computational complexity in the implementation, using compressive sensing theory. We represent the patients' positions as a sparse vector obtained by the discrete segmentation of the patient movement space in a circular grid. To estimate this vector, a compressive-sampling-based two-level FFT (CS-2FFT) feature vector is synthesized for each received signal from the biosensors embedded on the patient's body at each grid point. This feature extraction process benefits from the combination of both short-time and long-time properties of the received signals. The robustness of the proposed CS-2FFT-based algorithm in terms of the average positioning error is numerically evaluated using realistic parameters in the IEEE 802.15.6-WBAN standard in the presence of additive white Gaussian noise.
Due to the circular grid pattern and the CS-2FFT feature extraction method, the proposed scheme represents a significant reduction in the computational complexity, while improving the level of the resolution and the localization accuracy when compared to some classical CS-based positioning algorithms.
9 citations
Cites methods from "A compressive sensing based compres..."
...Localization performance of (a) the proposed scheme, (b) DTL algorithm in [10], and (c) CSNN algorithm in [23], for three patients and six receiver nodes....
[...]
...We compare the performance of the CS-2FFT-based scheme with that of two CS-based target localization algorithms, namely DTL [10] and CS-based neural network (CSNN) [23]....
[...]
...EML algorithms in [10], [23], and [24] in the case of three patients and six receiver nodes....
[...]
...pared to other classical positioning algorithms such as the EML, DTL, and CSNN approaches in [10] and [23]....
TL;DR: In this article, the state of the art in marine intelligent electromagnetic detection sensors, systems, and platforms is reviewed for the detection and health monitoring of offshore structures, especially subsea cables and pipelines.
Abstract: This paper introduces the state of the art in marine intelligent electromagnetic detection sensors, systems, and platforms for the detection and health monitoring of offshore structures, especially subsea cables and pipelines. The presented survey is by no means exhaustive, but introduces some notable results and relevant techniques in the detection and inspection of marine structures by electromagnetic monitoring. In particular, this paper presents a review of the main research works focusing on electromagnetic detection techniques and intelligent marine vehicles for subsea cables. The marine electromagnetic exploration techniques are elaborated as active and passive detection based on different electromagnetic field generation and sensing principles. The pulse-induction-based method for subsea ferromagnetic line detection, and transient and controlled-source electromagnetic systems in marine geological fields, are reviewed under the category of active detection approaches. According to the classical applications of subsea cable-like target inspection, the passive detection approach is classified in detail into crisscrossing detection and along-tracking detection, which are characterized by different trajectories of the detection vehicles. Finally, the paper discusses marine electromagnetic detection techniques in general and highlights challenges that need to be addressed in developing detection vehicles with lower electromagnetic noise and advanced autonomy for carrying out detection missions.
TL;DR: A novel sound source localization method based on compressive sensing theory that can directly determine the number of sound sources in one step and successfully estimate the source positions in noisy and reverberant environments is proposed.
Abstract: Sound source localization with less data is a challenging task. To address this problem, a novel sound source localization method based on compressive sensing theory is proposed in this paper. Specifically, a sparsity basis is first constructed for each microphone by shifting the audio signal recorded from one reference microphone. In this manner, the microphones except the reference one are allowed to capture audio signals under a sampling rate far below the Nyquist criterion. Next, the source positions are estimated by solving an ℓ1 minimization based on each frame of audio signals. Finally, a fine localization scheme is presented by fusing the estimated source positions from multiple frames. The proposed method can directly determine the number of sound sources in one step and successfully estimate the source positions in noisy and reverberant environments. Experimental results demonstrate the validity of the proposed method.
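The shifted-reference sparsity basis described above can be sketched as follows; recovery is shown with a simple correlation (matched-filter) decode rather than the paper's ℓ1 solver, and all names and sizes are illustrative:

```python
import numpy as np

def shift_dictionary(ref, max_shift):
    """Columns are circular shifts of the reference-microphone signal;
    another microphone's signal is (ideally) one such shift, so its
    representation in this basis is 1-sparse."""
    return np.stack([np.roll(ref, s) for s in range(max_shift)], axis=1)

rng = np.random.default_rng(3)
ref = rng.standard_normal(512)
D = shift_dictionary(ref, 64)
x2 = np.roll(ref, 17)                       # second mic: shifted reference
keep = rng.choice(512, 96, replace=False)   # sub-Nyquist random samples
y, Phi = x2[keep], D[keep, :]               # compressed measurements
s_hat = int(np.argmax(np.abs(Phi.T @ y)))   # 1-sparse decode of the shift
```

Because the unknown vector has a single nonzero (the shift index), even heavily subsampled measurements identify it; the paper's ℓ1 formulation generalizes this to multiple sources and noisy frames.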
TL;DR: In this paper, the authors consider the model problem of reconstructing an object from incomplete frequency samples and show that, with probability at least 1 − O(N^{−M}), f can be reconstructed exactly as the solution to the ℓ1 minimization problem.
Abstract: This paper considers the model problem of reconstructing an object from incomplete frequency samples. Consider a discrete-time signal f ∈ C^N and a randomly chosen set of frequencies Ω. Is it possible to reconstruct f from the partial knowledge of its Fourier coefficients on the set Ω? A typical result of this paper is as follows. Suppose that f is a superposition of |T| spikes, f(t) = Σ_{τ∈T} f(τ) δ(t − τ), obeying |T| ≤ C_M · (log N)^{−1} · |Ω| for some constant C_M > 0. We do not know the locations of the spikes nor their amplitudes. Then with probability at least 1 − O(N^{−M}), f can be reconstructed exactly as the solution to the ℓ1 minimization problem. In short, exact recovery may be obtained by solving a convex optimization problem. We give numerical values for C_M which depend on the desired probability of success. Our result may be interpreted as a novel kind of nonlinear sampling theorem. In effect, it says that any signal made out of |T| spikes may be recovered by convex programming from almost every set of frequencies of size O(|T| · log N). Moreover, this is nearly optimal in the sense that any method succeeding with probability 1 − O(N^{−M}) would in general require a number of frequency samples at least proportional to |T| · log N. The methodology extends to a variety of other situations and higher dimensions. For example, we show how one can reconstruct a piecewise constant (one- or two-dimensional) object from incomplete frequency samples, provided that the number of jumps (discontinuities) obeys the condition above, by minimizing other convex functionals such as the total variation of f.
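A small numerical illustration of this recovery problem: spikes observed only on a random set of frequencies. The theorem concerns ℓ1 minimization; this sketch substitutes greedy OMP as a lightweight stand-in, so it demonstrates the measurement setup rather than the paper's convex program:

```python
import numpy as np

# f has |T| = 3 spikes; we observe its DFT only on a random frequency
# set Omega and try to recover f exactly from those samples.
rng = np.random.default_rng(4)
N, T = 128, 3
f = np.zeros(N)
f[rng.choice(N, T, replace=False)] = rng.standard_normal(T)
omega = rng.choice(N, 40, replace=False)         # observed frequencies
F = np.fft.fft(np.eye(N), axis=0)[omega, :]      # partial DFT matrix
y = F @ f                                        # incomplete samples

# greedy recovery (stand-in for the l1 program of the paper)
support, resid = [], y.copy()
for _ in range(T):
    support.append(int(np.argmax(np.abs(F.conj().T @ resid))))
    coef, *_ = np.linalg.lstsq(F[:, support], y, rcond=None)
    resid = y - F[:, support] @ coef
f_hat = np.zeros(N)
f_hat[support] = coef.real
```

With 40 of 128 frequencies observed and only 3 spikes, the sparse signal is recovered exactly, matching the flavor of the |Ω| ≳ |T| · log N sampling bound.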
TL;DR: A class of practical and nearly optimal schemes for adapting the size of a neural network by using second-derivative information to make a tradeoff between network complexity and training set error is derived.
Abstract: We have used information-theoretic ideas to derive a class of practical and nearly optimal schemes for adapting the size of a neural network. By removing unimportant weights from a network, several improvements can be expected: better generalization, fewer training examples required, and improved speed of learning and/or classification. The basic idea is to use second-derivative information to make a tradeoff between network complexity and training set error. Experiments confirm the usefulness of the methods on a real-world application.
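The second-derivative tradeoff described above (Optimal Brain Damage) can be sketched generically; the toy weights and Hessian diagonal below are made up for illustration:

```python
import numpy as np

def obd_saliencies(weights, hessian_diag):
    """Optimal Brain Damage saliency of each weight: the estimated
    increase in training error if that weight is set to zero,
    s_i = H_ii * w_i^2 / 2 (diagonal-Hessian approximation)."""
    return 0.5 * hessian_diag * weights ** 2

def prune_lowest(weights, hessian_diag, n_prune):
    """Zero out the n_prune weights with the smallest saliency."""
    s = obd_saliencies(weights, hessian_diag)
    w = weights.copy()
    w[np.argsort(s)[:n_prune]] = 0.0
    return w

w = np.array([0.1, -2.0, 0.5, 0.01])   # toy trained weights
h = np.array([1.0, 0.5, 2.0, 4.0])     # toy diagonal Hessian entries
w_pruned = prune_lowest(w, h, n_prune=2)
```

Note that saliency, not magnitude, decides: a small weight with a large curvature term can outrank a larger weight with flat curvature.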
3,961 citations
"A compressive sensing based compres..." refers methods in this paper
...Several iterative algorithms have been proposed to solve this minimization problem (greedy algorithms such as Orthogonal Matching Pursuit (OMP) or Matching Pursuit (MP), and non-convex local optimization like the FOCUSS algorithm [16])....
[...]
...All of the traditional algorithms, such as Optimal Brain Damage (OBD) [16], Optimal Brain Surgeon (OBS) [17], Magnitude-based pruning (MAG) [18], Skeletonization (SKEL) [6], non-contributing units (NC) [7], and the Extended Fourier Amplitude Sensitivity Test (EFAST) [13], are available in SNNS (CSS1 is the name of the algorithm that uses SMV for sparse representation and CSS2 is another technique that uses MMV for sparse representation)....
TL;DR: Of OBS, Optimal Brain Damage, and magnitude-based methods, only OBS deletes the correct weights from a trained XOR network in every case, and thus yields better generalization on test data.
Abstract: We investigate the use of information from all second-order derivatives of the error function to perform network pruning (i.e., removing unimportant weights from a trained network) in order to improve generalization, simplify networks, reduce hardware or storage requirements, increase the speed of further training, and in some cases enable rule extraction. Our method, Optimal Brain Surgeon (OBS), is significantly better than magnitude-based methods and Optimal Brain Damage [Le Cun, Denker and Solla, 1990], which often remove the wrong weights. OBS permits the pruning of more weights than other methods (for the same error on the training set), and thus yields better generalization on test data. Crucial to OBS is a recursion relation for calculating the inverse Hessian matrix H^{-1} from training data and structural information of the net. OBS permits a 90%, a 76%, and a 62% reduction in weights over backpropagation with weight decay on three benchmark MONK's problems [Thrun et al., 1991]. Of OBS, Optimal Brain Damage, and magnitude-based methods, only OBS deletes the correct weights from a trained XOR network in every case. Finally, whereas Sejnowski and Rosenberg [1987] used 18,000 weights in their NETtalk network, we used OBS to prune a network to just 1560 weights, yielding better generalization.
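One OBS pruning step can be sketched from the standard formulas (a toy 2-weight example with a made-up Hessian, not the authors' implementation): the weight q minimizing L_q = w_q² / (2 [H⁻¹]_qq) is zeroed, and the surviving weights are corrected via H⁻¹:

```python
import numpy as np

def obs_step(w, H_inv):
    """One Optimal Brain Surgeon step: pick the weight q minimizing
    L_q = w_q^2 / (2 [H^-1]_qq), zero it, and adjust the remaining
    weights by delta_w = -(w_q / [H^-1]_qq) * H_inv[:, q]."""
    q = int(np.argmin(w ** 2 / (2.0 * np.diag(H_inv))))
    w_new = w - (w[q] / H_inv[q, q]) * H_inv[:, q]
    return q, w_new

H = np.array([[2.0, 0.5],       # toy (invertible) Hessian
              [0.5, 1.0]])
H_inv = np.linalg.inv(H)
w = np.array([1.0, 0.1])
q, w_new = obs_step(w, H_inv)   # prunes w[1] and retunes w[0]
```

The correction term is what distinguishes OBS from OBD: deleting a weight shifts the remaining ones instead of leaving them fixed.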
1,785 citations
"A compressive sensing based compres..." refers background in this paper
...Neural network based techniques have been proposed to overcome the computational complexity problem by exploiting their massive parallelism [3,4]....
TL;DR: The approach taken by the methods described here is to train a network that is larger than necessary and then remove the parts that are not needed.
Abstract: A rule of thumb for obtaining good generalization in systems trained by examples is that one should use the smallest system that will fit the data. Unfortunately, it usually is not obvious what size is best; a system that is too small will not be able to learn the data while one that is just big enough may learn very slowly and be very sensitive to initial conditions and learning parameters. This paper is a survey of neural network pruning algorithms. The approach taken by the methods described here is to train a network that is larger than necessary and then remove the parts that are not needed.
1,705 citations
"A compressive sensing based compres..." refers background in this paper
...In the sound source localization techniques, location of the source has to be estimated automatically by calculating the direction of the received signal [1]....
TL;DR: A practical iterative algorithm for signal reconstruction is proposed, and potential applications to coding, analog-digital (A/D) conversion, and remote wireless sensing are discussed.
Abstract: Recent results show that a relatively small number of random projections of a signal can contain most of its salient information. It follows that if a signal is compressible in some orthonormal basis, then a very accurate reconstruction can be obtained from random projections. This "compressive sampling" approach is extended here to show that signals can be accurately recovered from random projections contaminated with noise. A practical iterative algorithm for signal reconstruction is proposed, and potential applications to coding, analog-digital (A/D) conversion, and remote wireless sensing are discussed.
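The iterative-reconstruction idea can be illustrated with iterative hard thresholding, a standard iterative CS recovery algorithm (the specific algorithm, step size, and parameters here are assumptions for illustration, not necessarily the scheme proposed in the paper):

```python
import numpy as np

def iht(Phi, y, k, n_iter=100, step=None):
    """Iterative hard thresholding: recover a k-sparse x from noisy
    random projections y = Phi @ x + noise by alternating a gradient
    step with keeping only the k largest-magnitude coefficients."""
    n = Phi.shape[1]
    if step is None:
        step = 1.0 / np.linalg.norm(Phi, 2) ** 2   # 1 / ||Phi||_2^2
    x = np.zeros(n)
    for _ in range(n_iter):
        g = x + step * Phi.T @ (y - Phi @ x)       # gradient step
        x = np.zeros(n)
        top = np.argsort(np.abs(g))[-k:]           # keep k largest
        x[top] = g[top]
    return x

rng = np.random.default_rng(5)
Phi = rng.standard_normal((60, 128)) / np.sqrt(60)  # random projections
x_true = np.zeros(128)
x_true[[5, 40, 90]] = [1.0, -0.8, 0.6]              # 3-sparse signal
y = Phi @ x_true + 0.01 * rng.standard_normal(60)   # noisy measurements
x_hat = iht(Phi, y, k=3)
```

Despite the additive noise and having only 60 projections of a length-128 signal, the sparse support is recovered and the coefficient error stays on the order of the noise level.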