
Showing papers on "Linear predictive coding published in 2018"


Journal ArticleDOI
TL;DR: In this article, the authors study the ability of deep neural networks (DNNs) to restore missing audio content based on its context, i.e., inpaint audio gaps.
Abstract: We study the ability of deep neural networks (DNNs) to restore missing audio content based on its context, i.e., to inpaint audio gaps. We focus on a condition which has not received much attention yet: gaps in the range of tens of milliseconds. We propose a DNN structure that is provided with the signal surrounding the gap in the form of time-frequency (TF) coefficients. Two DNNs, with either complex-valued TF coefficient output or magnitude TF coefficient output, were studied by separately training them on inpainting two types of audio signals (music and musical instruments) having 64-ms long gaps. The magnitude DNN outperformed the complex-valued DNN in terms of signal-to-noise ratios and objective difference grades. Although a reference inpainting obtained through linear predictive coding performed better in both metrics for instruments, it performed worse than the magnitude DNN for music. This demonstrates the potential of the magnitude DNN, in particular for inpainting signals that are more complex than single instrument sounds.
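A minimal sketch of an LPC-based inpainting reference of the kind the abstract mentions, assuming the autocorrelation (Levinson-Durbin) method and forward extrapolation from the context preceding the gap; the AR order, the toy signal, and the 64-ms gap length at 16 kHz are illustrative assumptions, not the paper's exact baseline:

```python
import numpy as np

def levinson_durbin(r, order):
    """LPC coefficients [1, a1, ..., ap] from autocorrelation r."""
    a = np.zeros(order + 1)
    a[0], e = 1.0, r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / e
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        e *= 1.0 - k * k
    return a

def lpc_extrapolate(context, gap_len, order=32):
    """Fit an AR(order) model to `context` and predict the next gap_len samples."""
    r = np.correlate(context, context, mode="full")[len(context) - 1:][:order + 1]
    a = levinson_durbin(r, order)
    buf = list(context[-order:])
    out = []
    for _ in range(gap_len):
        pred = -float(np.dot(a[1:], buf[::-1]))  # x[n] = -sum_k a[k] x[n-k]
        out.append(pred)
        buf = buf[1:] + [pred]
    return np.array(out)

t = np.arange(4096) / 16000.0
tone = np.sin(2 * np.pi * 440 * t)                 # toy "instrument" signal
fill = lpc_extrapolate(tone[:3000], gap_len=1024)  # ~64-ms gap at 16 kHz
```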

36 citations


Proceedings ArticleDOI
23 Mar 2018
TL;DR: This paper proposes to use speech signal identification techniques to recognize infant cry signals, classifying them with a nearest-neighbor approach and a neural-network method; cry recognition for specific infants yielded promising results.
Abstract: The cry signals generated by infants serve as their primary means of communication and can provide insight into their wellbeing. This paper proposes to use speech signal identification techniques to recognize infant cry signals. Advanced signal processing methods are used to analyze the infant cry using audio features in the time and frequency domains, in an attempt to classify each cry to a specific need. The features extracted from the audio feature space include linear predictive coding (LPC), linear predictive cepstral coefficients (LPCC), Bark frequency cepstral coefficients (BFCC), and Mel frequency cepstral coefficients (MFCC). The primary classification techniques used were the nearest-neighbor approach and a neural-network method. The cry recognition of specific infants yielded promising results.
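The feature-plus-classifier pipeline above can be sketched in a few lines; this is a hedged illustration, not the paper's implementation — the file names and labels are hypothetical placeholders, and the BFCC/LPCC features are omitted:

```python
import numpy as np
import librosa
from sklearn.neighbors import KNeighborsClassifier

def cry_features(path, sr=16000, lpc_order=12, n_mfcc=13):
    y, _ = librosa.load(path, sr=sr)
    lpc = librosa.lpc(y, order=lpc_order)[1:]            # drop the leading 1.0
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).mean(axis=1)
    return np.concatenate([lpc, mfcc])                   # fixed-length vector

# Hypothetical (file, label) pairs, e.g. cries annotated with a specific need.
files, labels = ["cry1.wav", "cry2.wav"], ["hunger", "discomfort"]
X = np.stack([cry_features(f) for f in files])
clf = KNeighborsClassifier(n_neighbors=1).fit(X, labels)
```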

25 citations


Posted Content
TL;DR: The verification results indicate that the detection technique can detect most collapsed segments and the subjective evaluations of voice conversion demonstrate that the generation technique significantly improves the speech quality while maintaining the same speaker similarity.
Abstract: In this paper, we propose a technique to alleviate the quality degradation caused by collapsed speech segments sometimes generated by the WaveNet vocoder. The effectiveness of the WaveNet vocoder for generating natural speech from acoustic features has been proved in recent works. However, it sometimes generates very noisy speech with collapsed speech segments when only a limited amount of training data is available or significant acoustic mismatches exist between the training and testing data. Such a limitation on the corpus and limited ability of the model can easily occur in some speech generation applications, such as voice conversion and speech enhancement. To address this problem, we propose a technique to automatically detect collapsed speech segments. Moreover, to refine the detected segments, we also propose a waveform generation technique for WaveNet using a linear predictive coding constraint. Verification and subjective tests are conducted to investigate the effectiveness of the proposed techniques. The verification results indicate that the detection technique can detect most collapsed segments. The subjective evaluations of voice conversion demonstrate that the generation technique significantly improves the speech quality while maintaining the same speaker similarity.

19 citations


Journal ArticleDOI
TL;DR: The presented experiments demonstrate that the proposed randomizations yield uncorrelated signals, that perceptual quality is competitive, and that the complexity of the proposed methods is feasible for practical applications.
Abstract: Efficient coding of speech and audio in a distributed system requires that quantization errors across nodes are uncorrelated. Yet, with conventional methods at low bitrates, quantization levels become increasingly sparse, which does not correspond to the distribution of the input signal and, importantly, also reduces coding efficiency in a distributed system. We have recently proposed a distributed speech and audio codec design, which applies quantization in a randomized domain such that quantization errors are randomly rotated in the output domain. Similar to dithering, this ensures that quantization errors across nodes are uncorrelated and coding efficiency is retained. In this paper, we improve this approach by proposing faster randomization methods, with a computational complexity of $\mathcal{O}(N \log N)$. The presented experiments demonstrate that the proposed randomizations yield uncorrelated signals, that perceptual quality is competitive, and that the complexity of the proposed methods is feasible for practical applications.
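One standard O(N log N) construction for an approximate random rotation is a random sign flip followed by an orthonormal fast transform; the sketch below uses a DCT and is only meant to illustrate the randomized-domain quantization idea, not the codec's actual randomization:

```python
import numpy as np
from scipy.fft import dct, idct

rng = np.random.default_rng(0)

def randomize(x, signs):
    return dct(signs * x, norm="ortho")           # orthonormal, O(N log N)

def derandomize(y, signs):
    return signs * idct(y, norm="ortho")

x = rng.standard_normal(1024)                     # stand-in for a signal frame
signs = rng.choice([-1.0, 1.0], size=x.size)      # node-specific sign pattern
step = 0.5                                        # quantization step size
y = np.round(randomize(x, signs) / step) * step   # uniform quantization
x_hat = derandomize(y, signs)
# The error x - x_hat is approximately randomly rotated in the output domain,
# so errors from nodes with independent sign patterns stay uncorrelated.
```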

18 citations


Proceedings ArticleDOI
02 Sep 2018
TL;DR: In this paper, the authors proposed a technique to alleviate the quality degradation caused by collapsed speech segments sometimes generated by the WaveNet vocoder, and they also proposed a waveform generation technique for WaveNet using a linear predictive coding constraint.
Abstract: In this paper, we propose a technique to alleviate the quality degradation caused by collapsed speech segments sometimes generated by the WaveNet vocoder. The effectiveness of the WaveNet vocoder for generating natural speech from acoustic features has been proved in recent works. However, it sometimes generates very noisy speech with collapsed speech segments when only a limited amount of training data is available or significant acoustic mismatches exist between the training and testing data. Such a limitation on the corpus and limited ability of the model can easily occur in some speech generation applications, such as voice conversion and speech enhancement. To address this problem, we propose a technique to automatically detect collapsed speech segments. Moreover, to refine the detected segments, we also propose a waveform generation technique for WaveNet using a linear predictive coding constraint. Verification and subjective tests are conducted to investigate the effectiveness of the proposed techniques. The verification results indicate that the detection technique can detect most collapsed segments. The subjective evaluations of voice conversion demonstrate that the generation technique significantly improves the speech quality while maintaining the same speaker similarity.
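As a hedged illustration of the detection step, one simple proxy is to flag frames whose short-time energy falls far below an expected energy track (e.g., one derived from the conditioning acoustic features); the paper's actual detector and its LPC-constrained regeneration are more involved:

```python
import numpy as np

def frame_rms(x, frame=400, hop=200):
    n = 1 + max(0, (len(x) - frame) // hop)
    return np.array([np.sqrt(np.mean(x[i * hop:i * hop + frame] ** 2))
                     for i in range(n)])

def collapsed_frames(generated, expected_rms, ratio_db=-12.0):
    """Flag frames whose energy drops far below the expected track."""
    gen_rms = frame_rms(generated)
    m = min(len(gen_rms), len(expected_rms))
    db = 20 * np.log10((gen_rms[:m] + 1e-12) / (expected_rms[:m] + 1e-12))
    return db < ratio_db  # True where the waveform likely collapsed
```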

18 citations


Journal ArticleDOI
TL;DR: This paper provides a comprehensive review of speech emotion databases and of the algorithms available for SER, including hidden Markov models, Gaussian mixture models, vector quantization, artificial neural networks, and deep neural networks.
Abstract: In recent years, there is growing interest in speech emotion recognition (SER) by analyzing input speech. SER can be considered a pattern recognition task which includes feature extraction, a classifier, and a speech emotion database. The objective of this paper is to provide a comprehensive review of the various literature available on SER. Several audio features are available, including linear predictive coding coefficients (LPCC), Mel-frequency cepstral coefficients (MFCC), and Teager energy based features. As for classifiers, many algorithms are available, including the hidden Markov model (HMM), Gaussian mixture model (GMM), vector quantization (VQ), artificial neural networks (ANN), and deep neural networks (DNN). In this paper, we also review various speech emotion databases. Finally, recent related works on SER using DNN are discussed.

18 citations


Proceedings ArticleDOI
01 Oct 2018
TL;DR: Using LSTM to recognize Indonesian speech digits, MFCC feature extraction achieves a better accuracy of 96.58%, compared to 93.79% for LPC feature extraction.
Abstract: This paper presents Indonesian speech digit recognition for decimal numbers (0–9) using Deep Learning Long Short-Term Memory (LSTM). LPC (Linear Predictive Coding) and MFCC (Mel-Frequency Cepstral Coefficients) feature extraction were used as inputs to the LSTM model, and their recognition accuracies were compared. LPC extracts speech features based on pitch, or fundamental frequency, while MFCC extracts features based on the sound spectrum. We used 7990 speech digits, each consisting of 12 LPC coefficients and 12 MFCC coefficients, as training data, while 790 samples were used for classification by the trained LSTM. The results show that, when using LSTM to recognize Indonesian speech digits, MFCC feature extraction achieves a better accuracy of 96.58%, compared to 93.79% for LPC feature extraction.
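A minimal sketch of such a classifier, assuming each utterance is a sequence of 24-dimensional frames (12 LPC + 12 MFCC coefficients) and ten digit classes; the layer sizes and training setup are assumptions, not the paper's configuration:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(None, 24)),  # frames of 12 LPC + 12 MFCC coefficients
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(10, activation="softmax"),  # digits 0-9
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(X_train, y_train, ...) on padded frame sequences would follow.
```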

17 citations


Journal ArticleDOI
01 Sep 2018
TL;DR: This research implements speech recognition to control an arm robot using Linear Predictive Coding and an Adaptive Neuro-Fuzzy Inference System, reporting recognition rates for both trained and untrained speech data.
Abstract: This research shows the implementation of speech recognition to control an arm robot. Speech is identified using Linear Predictive Coding (LPC) and an Adaptive Neuro-Fuzzy Inference System (ANFIS). The LPC method is used to extract features from the speech signal, and the ANFIS method is used to learn the recognition task. The learning data passed to ANFIS consists of 6 features. The identification system is examined using both trained and untrained speech data. The results show a success rate of 88.75% for trained speech data and 78.78% for untrained data. The speech recognition system was applied to control an arm robot based on an Arduino microcontroller.

17 citations


Proceedings ArticleDOI
06 Apr 2018
TL;DR: The paper discusses different speaker modeling techniques such as Vector Quantization (VQ) and the Gaussian Mixture Model (GMM).
Abstract: Speaker Recognition is the process of recognizing the speaker from the individual's speech biometrics. The voice characteristics of every speaker are different and thus can be used to construct a model. This model is later used to recognize an enrolled speaker from the list of available speakers. The paper discusses different speaker modeling techniques like Vector Quantization (VQ), the Gaussian Mixture Model (GMM), Neural Networks (NN), etc. Also, different techniques for extraction of voice characteristics, like Mel Frequency Cepstral Coefficients (MFCC) and Linear Predictive Coding (LPC), are discussed. Further, an in-depth analysis of these surveyed techniques is made to identify their advantages and limitations. Work in the field of Speaker Recognition Systems began in the 1950s and has been evolving since; it has wide applications in the fields of security, forensics, authentication, etc.

15 citations


Journal ArticleDOI
TL;DR: The proposed LWAP descriptor, followed by VLAD encoding, PCA plus LDA feature extraction, and a simple distance-based classifier, yields promising results that are competitive with those obtained by the state-of-the-art convolutional neural networks.
Abstract: Investigating the identity, distribution, and evolution of bird species is important for both biodiversity assessment and environmental conservation. The discrete wavelet transform (DWT) has been widely exploited to extract time–frequency features for acoustic signal analysis. Traditional approaches usually compute statistical measures (e.g., maximum, mean, standard deviation) of the DWT coefficients in each subband independently to yield the feature descriptor, without considering the intersubband correlation. A new acoustic descriptor, called the local wavelet acoustic pattern (LWAP), is proposed to characterize the correlation of the DWT coefficients in different subbands for birdsong recognition. First, we divide a variable-length birdsong segment into a number of fixed-duration texture windows. For each texture window, several LWAP descriptors are extracted. The vector of locally aggregated descriptors (VLAD) is then used to aggregate the set of LWAP descriptors into a single VLAD vector. Finally, principal component analysis (PCA) plus linear discriminant analysis (LDA) are employed to reduce the feature dimensionality for classification purposes. Experiments on two birdsong datasets show that the proposed LWAP descriptor outperforms other local descriptors, including linear predictive coding cepstral coefficients, Mel-frequency cepstral coefficients, perceptual linear prediction cepstral coefficients, chroma features, and prosody features. Furthermore, the proposed LWAP descriptor, followed by VLAD encoding, PCA plus LDA feature extraction, and a simple distance-based classifier, yields promising results that are competitive with those obtained by the state-of-the-art convolutional neural networks.
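The intersubband idea can be illustrated with a few lines: decompose a texture window with the DWT and correlate coefficient magnitudes in adjacent subbands. This is only the motivation behind LWAP, not the descriptor itself; the wavelet, level count, and resampling are assumptions:

```python
import numpy as np
import pywt

x = np.random.randn(4096)                    # stand-in for one texture window
coeffs = pywt.wavedec(x, "db4", level=5)     # [cA5, cD5, cD4, cD3, cD2, cD1]

def subband_corr(coarse, fine):
    # upsample the coarser band to the finer band's length, then correlate
    rep = int(np.ceil(len(fine) / len(coarse)))
    c = np.repeat(coarse, rep)[:len(fine)]
    return np.corrcoef(np.abs(c), np.abs(fine))[0, 1]

corrs = [subband_corr(coeffs[i], coeffs[i + 1])
         for i in range(1, len(coeffs) - 1)]  # adjacent detail-band correlations
```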

13 citations


Proceedings ArticleDOI
01 Oct 2018
TL;DR: This work-in-progress paper describes software that enables online machine learning experiments in an undergraduate DSP course; the software operates in HTML5, embeds several digital signal processing functions, and provides a user-friendly visualization of phoneme recognition tasks.
Abstract: This work-in-progress paper describes software that enables online machine learning experiments in an undergraduate DSP course. The software operates in HTML5 and embeds several digital signal processing functions. It can process natural signals such as speech and can extract various features for machine learning applications. For example, in the case of speech processing, LPC coefficients and formant frequencies can be computed. In this paper, we present speech processing, feature extraction, and clustering of features using the K-means machine learning algorithm. The primary objective is to provide a machine learning experience to undergraduate students. The functions and simulations described provide a user-friendly visualization of phoneme recognition tasks. These tasks make use of the Levinson-Durbin linear prediction and K-means machine learning algorithms. The exercise was assigned as a class project in our undergraduate DSP class. The description of the exercise, along with assessment results, is presented.
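In the spirit of the exercise, the sketch below extracts per-frame LPC features and clusters them with K-means; the audio file is a hypothetical placeholder, and the frame length, LPC order, and cluster count are assumptions:

```python
import numpy as np
import librosa
from sklearn.cluster import KMeans

y, sr = librosa.load("speech.wav", sr=16000)          # hypothetical recording
frames = librosa.util.frame(y, frame_length=512, hop_length=256).T
frames = frames[np.std(frames, axis=1) > 1e-4]        # drop near-silent frames
feats = np.stack([librosa.lpc(np.ascontiguousarray(f), order=10)[1:]
                  for f in frames])                   # per-frame LPC vectors
labels = KMeans(n_clusters=5, n_init=10).fit_predict(feats)  # phoneme-like groups
```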

Journal ArticleDOI
TL;DR: This paper presents a framework for computing low-dimensional representations of speech data based on two assumptions: that speech data represented in high-dimensional data spaces lie on shapes called manifolds that can be used to map speech data to low- dimensional coordinate spaces, and that manifolds underlying speech data are generated from a combination of language-specific lexical, phonological, and phonetic information that is expressed by talkers of a given speech community.

Book ChapterDOI
01 Jan 2018
TL;DR: The analysis finds that cepstral coefficients (CC), paired with neural networks, surpassed all other features in accuracy and robustness.
Abstract: This paper proposes a novel method for the automated classification of musical instruments based on the analysis of the audio signals they generate. The paper studies the effectiveness and efficiency of a number of features, such as Mel frequency cepstral coefficients (MFCC), harmonic pitch class profiles (HPCP), linear predictive coding (LPC) coefficients, spectral centroid, and pitch salience peaks, together with cepstral coefficients (CC), using multiple machine learning algorithms such as artificial neural networks (ANN), K-nearest neighbors (K-NN), support vector machines (SVM), and random forests. The analysis finds that CC, paired with neural networks, surpassed all other features in accuracy and robustness. Multiple datasets have been used in the experiments to remove the possibility of bias. The overall accuracy obtained ranged between 90% and 93%.

Book ChapterDOI
07 Dec 2018
TL;DR: Two signal processing schemes commonly used in speech recognition systems, Linear Predictive Coding and MFCC, are applied to enhance the bone-conducted signal; the comparison shows that a slight improvement in noise reduction is possible, but information lost to bone conduction of speech cannot be retrieved.
Abstract: The air microphone used in communication devices to acquire the speech signal gathers a barely intelligible signal in noisy background conditions. The bone-conducted speech signal appears to be a promising tool to avoid this situation and improve the quality of communication between two users because of its inherent attenuation of high-frequency signals. Though there is no background noise present, the quality of the extracted bone-conducted signal is usually quite low in terms of intelligibility and strength. The reason for this quality degradation can again be attributed to the high-frequency repulsion of bones. To rectify this issue and make the bone-conducted signal useful in communication systems, suitable signal processing schemes need to be developed. This paper applies two signal processing schemes commonly used in speech recognition systems, Linear Predictive Coding (LPC) and Mel Frequency Cepstral Coefficients (MFCC), to enhance the bone-conducted signal and compares them. Results of the analysis show that a slight improvement in noise reduction is possible using the proposed techniques. However, retrieval of information lost to bone conduction of speech cannot be achieved by either technique, and a more robust scheme has to be developed for bone-conducted signal improvement.

Patent
17 Jan 2018
TL;DR: In this patent, a linear predictive coding apparatus is proposed, comprising a linear predictive analysis part 221, an adaptation part 22A that adapts the values of η, and a linear predictive coefficient coding part 224.
Abstract: A linear predictive coding apparatus comprises: a linear predictive analysis part 221 performing linear predictive analysis using a pseudo correlation function signal sequence obtained by performing inverse Fourier transform regarding the η1-th power of the absolute values of the frequency domain sample sequence corresponding to the time-series signal as a power spectrum to obtain coefficients transformable to linear predictive coefficients; an adaptation part 22A adapting values of η for a plurality of candidates for coefficients transformable to linear predictive coefficients stored in a code book stored in a code book storing part 222 and the coefficients transformable to linear predictive coefficients obtained by the linear predictive analysis part 221; and a coding part 224 obtaining a linear predictive coefficient code corresponding to the coefficients transformable to linear predictive coefficients obtained by the linear predictive analysis part 221, using the plurality of candidates for coefficients transformable to linear predictive coefficients and the coefficients transformable to linear predictive coefficients for which the values of η have been adapted.
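The core computation in the claim can be sketched directly: treat the η-th power of the spectral magnitudes as a power spectrum, inverse-transform it into a pseudo-autocorrelation, and solve the resulting normal equations. The values of η and the LPC order below are free parameters, and the code-book adaptation is omitted:

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc_from_eta_spectrum(x, eta=0.6, order=16):
    spec = np.abs(np.fft.rfft(x)) ** eta        # |X(f)|^eta as a "power spectrum"
    r = np.fft.irfft(spec)[:order + 1]          # pseudo correlation function
    a = solve_toeplitz((r[:order], r[:order]), -r[1:order + 1])
    return np.concatenate([[1.0], a])           # coefficients [1, a1, ..., ap]
```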

Proceedings ArticleDOI
27 Aug 2018
TL;DR: A novel method is proposed to classify speech into Voiced/Unvoiced/Silence/Music/Background-noise frames and to find the optimal LPC order for each frame using a neural network; it classifies frames into five categories with very high accuracy.
Abstract: The speech codec, which is an integral part of most communication standards, consists of a Voice Activity Detector (VAD) module followed by an encoder that uses Linear Predictive Coding (LPC). These two modules have a lot of potential for improvements that can yield low bit-rates without compromising quality. VAD is used for detecting voice activity in the input signal, which is an important step in achieving high-efficiency speech coding. LPC analysis of input speech at an optimal order can assure maximum SNR, and thereby perceptual quality, while reducing the transmission bit-rate. This paper proposes a novel method to classify speech into Voiced/Unvoiced/Silence/Music/Background noise (V/UV/S/M/BN) frames and to find the optimal LPC order for each frame using a neural network. The speech sound classifier module classifies frames into the five categories with very high accuracy. Choosing the order predicted by the neural network as the optimal LPC order for voiced frames, while keeping a low order for unvoiced frames, maintains the reconstruction quality and brings down the bit-rate.

Proceedings ArticleDOI
01 Apr 2018
TL;DR: The proposed method, least significant bit management (LSBM), controls the least significant bits of the spectra based on their envelope so that they are represented at fixed bit rates, which bounds the damage caused by bit errors that would otherwise create a mismatch between the spectral envelopes at the encoder and the decoder.
Abstract: We have devised a method for bit assignment of quantized frequency spectra aimed at low-delay, bit-error-robust speech compression. The proposed method, least significant bit management (LSBM), controls the least significant bits of the spectra based on their envelope so that they are represented at fixed bit rates, which bounds the damage caused by bit errors that would otherwise create a mismatch between the spectral envelopes at the encoder and the decoder. In addition, we relate the method to the linear predictive coding scheme and show its performance and robustness in a speech codec through objective and subjective evaluations. The codec based on this method, which achieves bit-error robustness with only 1.5 ms of algorithmic delay, can be useful in situations such as real-time speech communication over non-IP protocols.

Journal ArticleDOI
TL;DR: The simulation results for segmentation of synthetic and real EEG data show that by applying the newly proposed methodology, the specificity and sensitivity of the segmentation are highly improved.
Abstract: In order to analyze non-stationary signals, like the Electroencephalogram (EEG), it is sometimes easier to segment signals into pseudo-stationary segments. In this paper, the cascade of linear predictive coding (LPC) and a non-linear Volterra filter is employed for modeling noise in the EEG signal, and this methodology is applied to the procedure of change-point detection, estimating the number of change-points and their exact locations, which is a powerful way to detect the change-points as precisely as possible. The earlier results are completed by constructing algorithms that use the cascade of LPC and the non-linear Volterra filter to model the relation between the noisy signal and the noise in practical situations. In a Bayesian configuration, the posterior distribution of the change-point sequence is constructed, and a Markov Chain Monte Carlo procedure is then used to sample this posterior distribution. The simulation results for segmentation of synthetic and real EEG data show that by applying the newly proposed methodology, the specificity and sensitivity of the segmentation are highly improved.

Proceedings ArticleDOI
01 May 2018
TL;DR: In this article, the analysis of speech signals based on formant space provides a method of assessing the influence of each formant on a phoneme across gender and different age groups; the analysis is carried out separately for male and female speakers.
Abstract: Acoustic phonetics is the study of the physical properties of sounds and provides means to distinguish one sound from another in quality and quantity. A study of the acoustic characteristics of Kannada begins with the phonemic analysis of the language. A phonetic analysis of Kannada vowels is presented in this paper. The analysis of the speech signal based on formant space provides a method of assessing the influence of each formant on a phoneme across gender and different age groups. PRAAT software is used for the analysis of the speech signals. In this work, Kannada vowel speech signals were recorded from male and female speakers of different age groups, and the formant frequencies of the corresponding vowels were computed. The analysis is carried out separately for male and female speakers. The preliminary analysis of vowel formants shows significant variations across gender and age groups. Similarly, Linear Predictive Coding (LPC) analysis is performed to gain an in-depth understanding of the formants by considering different filter orders. The order of the LPC filter is then estimated using information about the formants obtained with the PRAAT tool.
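A standard way to estimate formants from LPC, consistent with the analysis described above, is to take the angles of the complex roots of the prediction polynomial; the LPC order, the frequency floor, and the file name below are assumptions:

```python
import numpy as np
import librosa

def formants(y, sr, order=12):
    a = librosa.lpc(y, order=order)                      # prediction polynomial
    roots = [r for r in np.roots(a) if np.imag(r) > 0]   # upper half-plane only
    freqs = sorted(np.angle(r) * sr / (2 * np.pi) for r in roots)
    return [f for f in freqs if f > 90]                  # drop near-DC roots

y, sr = librosa.load("kannada_vowel_a.wav", sr=16000)    # hypothetical recording
print(formants(y, sr)[:3])                               # F1, F2, F3 estimates
```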

Proceedings ArticleDOI
01 Aug 2018
TL;DR: This paper reduces the amount of data generated during the signal conversion process by combining linear predictive coding, widely used in speech signal processing, with nonlinear quantization coding; using this method, RF signals can be transmitted efficiently.
Abstract: Recently, HFC (hybrid fiber cable) has been widely used due to the demand for economical and efficient digital services. HFC can be used as an integrated digital-service broadband network access technology, achieving data transmission between different devices and equipment through a photoelectric conversion process. RoIP (Radio over IP) enables two-way communication without changing the original system hardware structure; it is a technology that converts the radio frequency signal generated at a transmitting end into a digital signal and transmits it over an optical IP network. However, the large amount of data generated during the signal conversion process needs to be compressed to achieve efficient transmission. In this paper, we reduce the amount of data by combining linear predictive coding, which is widely used in speech signal processing, with nonlinear quantization coding. Using this method, RF signals can be transmitted efficiently. We measured the compression ratio and the magnitude of the error vector, which represents the degree of signal damage during compression, and compared them with previous experimental results to verify the viability of these schemes.
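A hedged sketch of the combination described above: predict each sample linearly and compand the prediction residual with a nonlinear (mu-law) quantizer. The first-order predictor, mu value, and bit depth are illustrative assumptions, not the paper's codec:

```python
import numpy as np

def mulaw(x, mu=255.0):
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def mulaw_inv(y, mu=255.0):
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(mu)) / mu

def encode_frame(x, a=0.9, bits=6):
    """Quantize the residual e[n] = x[n] - a*x[n-1] on a mu-law grid."""
    e = x - a * np.concatenate([[0.0], x[:-1]])
    levels = 2 ** (bits - 1)
    return np.round(mulaw(np.clip(e, -1, 1)) * levels) / levels

def decode_frame(q, a=0.9):
    e = mulaw_inv(q)
    x = np.zeros_like(e)
    for n in range(len(e)):                      # invert the predictor
        x[n] = e[n] + (a * x[n - 1] if n else 0.0)
    return x
```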

Proceedings ArticleDOI
01 Dec 2018
TL;DR: In this paper, sparsification of the speech signal is done by applying the Discrete Cosine Transform (DCT), the Discrete Fourier Transform (DFT), and Linear Predictive Coding (LPC); compressive sensing provides a novel framework for speech signal compression.
Abstract: The advancements in today’s multimedia applications demand high-quality speech signal transmission as well as storage. The limited availability of bandwidth and storage capacity necessitates the development of better compression techniques for speech signals. Compressive sensing is an emerging technique in signal processing, and it provides a novel framework for speech signal compression. In compressive sensing, a signal can be exactly reconstructed if it is naturally sparse or sparse in some sparsifying basis. In this paper, the sparsification of the speech signal is done by applying the Discrete Cosine Transform (DCT), the Discrete Fourier Transform (DFT), and Linear Predictive Coding (LPC). The sparsified speech signal is compressively sensed to reduce the number of samples. The original signal is reconstructed using different algorithms: Basis Pursuit (BP), l1-regularized least squares (l1-ls), and Orthogonal Matching Pursuit (OMP). The quality of the reconstructed speech signal is quantitatively expressed using different metrics: Mean Square Error (MSE), Segmental Signal-to-Noise Ratio (SSNR), and Perceptual Evaluation of Speech Quality (PESQ). For 60 percent of samples, the MSE obtained using the combination of the DCT sparsifying basis and the BP reconstruction algorithm is 0.00018122; under the same conditions, the SSNR and PESQ are found to be 11.3 dB and 2.574, respectively.
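A toy version of one evaluated combination (DCT sparsifying basis with OMP reconstruction); the signal dimensions, sparsity level, and random measurement matrix are assumptions for illustration:

```python
import numpy as np
from scipy.fft import idct
from sklearn.linear_model import OrthogonalMatchingPursuit

n, m = 256, 154                                  # keep ~60 percent of samples
rng = np.random.default_rng(1)
Psi = idct(np.eye(n), norm="ortho", axis=0)      # inverse-DCT basis: x = Psi @ s
s = np.zeros(n)
s[rng.choice(n, 10, replace=False)] = rng.standard_normal(10)
x = Psi @ s                                      # test signal, sparse in DCT
Phi = rng.standard_normal((m, n)) / np.sqrt(m)   # random measurement matrix
y = Phi @ x                                      # compressed measurements
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=10).fit(Phi @ Psi, y)
x_hat = Psi @ omp.coef_                          # reconstructed signal
print("MSE:", np.mean((x - x_hat) ** 2))
```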

Proceedings ArticleDOI
01 Nov 2018
TL;DR: An efficient low bit-rate speech coding technique to transmit encrypted speech over the Global System for Mobile communications (GSM); results show that the proposed speech coder can be used effectively in low bit-rate applications such as secure voice communications.
Abstract: Speech coding for secure mobile communication is a challenging task, since encrypting the digitized speech signal adds overhead and randomizes the bit stream to a level where recovery of the original signal becomes difficult. Commercially available speech coders are ill-suited to the transmission of encrypted speech, which requires low bit-rate speech coding with preserved speech characteristics so that intelligible speech can be transmitted over bandwidth-limited channels. In this paper, we present an efficient low bit-rate (1.9 kbps) speech coding technique to transmit encrypted speech over the Global System for Mobile communications (GSM). Speech characteristics such as pitch, energy, and Line Spectral Frequencies (LSF) are extracted and preserved before the speech is compressed and encrypted. Empirical results show that the proposed speech coder can be used effectively in low bit-rate applications such as secure voice communications.

Patent
01 May 2018
TL;DR: An inertial data compression method, a server, and a computer-readable storage medium are described; the method includes acquiring the inertial data to be compressed with the aid of the server, performing feature extraction on the data with a preset main component mapping model (which reflects a mapping relationship between the data and the feature value) to obtain a mapping feature value, and performing linear predictive coding on the mapping feature value to get a predictive coding result.
Abstract: The invention discloses an inertial data compression method, a server, and a computer-readable storage medium. The method includes the steps of: acquiring the inertial data to be compressed from an inertial sensor with the aid of the server; performing feature extraction on the inertial data to be compressed with a preset main component mapping model to obtain a mapping feature value, the preset main component mapping model reflecting a mapping relationship between the inertial data and the feature value; performing linear predictive coding on the mapping feature value to obtain a predictive coding result; and performing compression coding on the predictive coding result. Because the features of the inertial data are extracted by the preset main component mapping model, the predictive quality of the coding result is improved; compressing and encoding the predictive coding result reduces the processing delay of the inertial data, improves the compression ratio, reduces the processing and computational complexity, and reduces the compression loss.

Book ChapterDOI
06 Sep 2018
TL;DR: This study addresses the design of sparsifying matrices for electroencephalogram (EEG) signals in the context of compressed sensing, using embedded multicore architectures as powerful IoT edge devices together with energy-efficient signal acquisition and processing techniques.
Abstract: The sensitive domain of healthcare intensifies the shortcomings associated with internet of things (IoT) based remote health monitoring systems in terms of their high energy consumption and big-data issues such as latency and privacy, caused by the continuous stream of raw data. Hence, in the development of their remote elderly monitoring system (REMS), the authors focus on using embedded multicore architectures as powerful IoT edge devices and on energy-efficient signal acquisition and processing techniques to alleviate such limitations. This study addresses the design of sparsifying matrices for electroencephalogram (EEG) signals in the context of compressed sensing. These signals are known to be non-sparse in both the time and standard transform domains. The designed matrices are adapted to the data and are based on autoregressive modeling of the signal and the singular value decomposition (SVD) of the impulse response matrix of the linear predictive coding (LPC) filter. To facilitate hardware implementation and to prolong the life of the wearable node, the measurement matrix is chosen to be binary. The proposed algorithm has been applied to the EEGLab dataset ‘eeglab data set’ with an average normalized mean square error of 0.068.
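A compact sketch of the described dictionary construction, assuming librosa's LPC routine for the autoregressive model; the model order and matrix size are placeholders:

```python
import numpy as np
import librosa
from scipy.signal import lfilter
from scipy.linalg import toeplitz, svd

def sparsifying_basis(eeg, order=10, n=256):
    a = librosa.lpc(eeg.astype(float), order=order)  # AR model of the EEG signal
    impulse = np.zeros(n)
    impulse[0] = 1.0
    h = lfilter([1.0], a, impulse)                   # impulse response of 1/A(z)
    H = toeplitz(h, np.zeros(n))                     # lower-triangular conv matrix
    U, _, _ = svd(H)                                 # SVD of the response matrix
    return U                                         # columns: adapted basis
```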


Proceedings ArticleDOI
24 Jul 2018
TL;DR: This paper demonstrates alternative auditory frequency scales, namely the Bark and Equivalent Rectangular Bandwidth (ERB) scales, which achieved better performance than the Mel scale.
Abstract: With the rapid growth of digital computers, there has been an increasing demand to communicate with machines in an efficient spoken manner. Speech recognition is the process of translating spoken words into readable text. To obtain robust and reliable transcription text from a recognizer, proper feature extraction methods are needed. This paper is concerned with feature extraction approaches for spoken Myanmar digit recognition. In this study, the recognition performance of the Fast Fourier Transform (FFT), Mel Frequency Cepstral Coefficients (MFCC), Linear Predictive Coding (LPC), and Linear Prediction Cepstral Coefficients (LPCC) methods is compared. Even though frequency spacing on the Mel scale is extensively used in Automatic Speech Recognition (ASR), this paper demonstrates alternative auditory frequency scales, namely the Bark and Equivalent Rectangular Bandwidth (ERB) scales, which achieved better performance than the Mel scale. The k-Nearest Neighbor (KNN) algorithm is employed as the classifier, and ten Myanmar-language digits from twelve speakers were collected. The experiments show a best recognition rate of 88.6% using feature extraction based on an ERB-scale band-pass filter.

Journal ArticleDOI
TL;DR: The quality of extracted features improves with the order of linear prediction; optimum performance is obtained for a Linear Predictive Coding order between 20 and 30, and this varies with gender and the statistical characteristics of the speech.
Abstract: Speech coding facilitates speech compression without perceptual loss, but it can result in the elimination or deterioration of both speech and speaker-specific features used for a wide range of applications such as automatic speaker and speech recognition, biometric authentication, and prosody evaluation. The present work investigates the effect of speech coding on the quality of features extracted from codec-compressed speech, including Mel Frequency Cepstral Coefficients, Gammatone Frequency Cepstral Coefficients, Power-Normalized Cepstral Coefficients, Perceptual Linear Prediction Cepstral Coefficients, Rasta-Perceptual Linear Prediction Cepstral Coefficients, Residue Cepstrum Coefficients, and Linear Predictive Coding-derived cepstral coefficients. The codecs selected for this study are G.711, G.729, G.722.2, Enhanced Voice Services, Mixed Excitation Linear Prediction, and three codecs based on the compressive sensing framework. The analysis also covers the variation in the quality of extracted features across the various bit-rates supported by Enhanced Voice Services, G.722.2, and the compressive sensing codecs. A quality analysis of the epochs, fundamental frequency, and formants estimated from codec-compressed speech was also performed. Among the features extracted from the outputs of the selected codecs, the variation introduced by the Mixed Excitation Linear Prediction codec is the least, owing to its unique representation of excitation. For the compressive sensing based codecs, there is a drastic improvement in the quality of extracted features as the bit rate increases, due to the waveform-type coding they employ. For the popular Code Excited Linear Prediction codec, based on the Analysis-by-Synthesis coding paradigm, the impact of the Linear Predictive Coding order on feature extraction is investigated. The quality of extracted features improves with the order of linear prediction, and optimum performance is obtained for a Linear Predictive Coding order between 20 and 30; this varies with gender and the statistical characteristics of the speech. Even though the basic motive of a codec is to compress a single voice source, the performance of the codecs in a multi-speaker environment, the most common environment in the majority of speech processing applications, is also studied. Here, a multi-speaker environment with two speakers is considered, and the quality of the individual speech signals improves with increasing diversity of the mixtures passed through the codecs. The perceptual quality of the individual speech signals extracted from the codec-compressed speech is almost the same for both the Mixed Excitation Linear Prediction and Enhanced Voice Services codecs, but regarding the preservation of features, the Mixed Excitation Linear Prediction codec shows superior performance over the Enhanced Voice Services codec.

Patent
05 Apr 2018
TL;DR: A linear predictive coding apparatus, as discussed by the authors, includes a linear predictive analysis part performing linear predictive analysis using a pseudo correlation function signal sequence obtained by performing inverse Fourier transform regarding the η1-th power of the absolute values of the frequency domain sample sequence corresponding to the time-series signal as a power spectrum.
Abstract: A linear predictive coding apparatus includes: a linear predictive analysis part performing linear predictive analysis using a pseudo correlation function signal sequence obtained by performing inverse Fourier transform regarding the η1-th power of the absolute values of the frequency domain sample sequence corresponding to the time-series signal as a power spectrum to obtain coefficients transformable to linear predictive coefficients; an adaptation part adapting values of η for a plurality of candidates for coefficients transformable to linear predictive coefficients stored in a code book in a code book storing part and the coefficients transformable to linear predictive coefficients obtained by the linear predictive analysis part; and a coding part obtaining a linear predictive coefficient code corresponding to the coefficients transformable to linear predictive coefficients, using the plurality of candidates for coefficients transformable to linear predictive coefficients and the coefficients transformable to linear predictive coefficients for which the values of η have been adapted.

Proceedings ArticleDOI
01 Jul 2018
TL;DR: This paper proposes to implement Vector Quantization (VQ) to obtain representative LPC vectors and aims at implementing a simple speaker verification system for a single person, efficiently realized on an FPGA.
Abstract: Speaker verification is of great importance, especially in the fields of forensics and security. This paper aims at implementing such a system at the hardware level. The system extracts features from fresh voice samples and verifies the speaker by comparing them with those stored in the database. The features used here are the Linear Predictive Coding (LPC) coefficients, which are obtained using the Levinson-Durbin (LD) algorithm. This paper proposes to implement Vector Quantization (VQ) to obtain representative LPC vectors. A simple speaker verification system for a single person is efficiently implemented on an FPGA.
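A software analogue of this pipeline, hedged as an illustration: per-frame LPC vectors, a VQ codebook trained with k-means (standing in for a hardware codebook), and verification by average quantization distortion. The orders, sizes, and thresholds are assumptions:

```python
import numpy as np
import librosa
from sklearn.cluster import KMeans

def lpc_frames(y, order=12, frame=512, hop=256):
    F = librosa.util.frame(y, frame_length=frame, hop_length=hop).T
    F = F[np.std(F, axis=1) > 1e-4]                  # skip near-silent frames
    return np.stack([librosa.lpc(np.ascontiguousarray(f), order=order)[1:]
                     for f in F])

def train_codebook(enroll_audio, n_codes=16):
    return KMeans(n_clusters=n_codes, n_init=10).fit(lpc_frames(enroll_audio))

def avg_distortion(codebook, test_audio):
    X = lpc_frames(test_audio)
    return -codebook.score(X) / len(X)   # mean squared distance to codewords
# Accept the claimed identity if avg_distortion is below a tuned threshold.
```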

Book ChapterDOI
01 Jan 2018
TL;DR: A new algorithm is proposed for selecting an appropriate set of noisy frames for noise identification through random forest classifier using VAD G.729, andMel-frequency cepstral coefficient (MFCC) and linear predictive coding (LPC) are used as feature vectors.
Abstract: Background noise is acoustically added with human speech while communicating with others. Nowadays, many researchers are working on voice/speech activity detection (VAD) in noisy environment. VAD system segregates the frames containing human speech/only noise. Background noise identification has number of applications like speech enhancement, crime investigation. Using background noise identification system, one can identify possible location (street, train, airport, restaurant, babble, car, etc.) during communication. It is useful for security and intelligence personnel for responding quickly by identifying the location of crime. In this paper, using VAD G.729, a new algorithm is proposed for selecting an appropriate set of noisy frames. Mel-frequency cepstral coefficient (MFCC) and linear predictive coding (LPC) are used as feature vectors. These features of selected frames are calculated and passed to the classifier. Using proposed classifier, seven types of noises are classified. Experimentally, it is observed that MFCC is a more suitable feature vector for noise identification through random forest classifier. Here, by selecting appropriate noisy frames through proposed approach accuracy of random forest and SVM classifier increases up to 5 and 3%, respectively. The performance of the random forest classifier is found to be 11% higher than SVM classifier.