Author

M. Tech

Bio: M. Tech is an academic researcher. The author has contributed to research on the topics of speech coding and the discrete sine transform. The author has an h-index of 2 and has co-authored 3 publications receiving 16 citations.

Papers
01 Jan 2013
TL;DR: Discrete wavelet transforms with different wavelet bases (Daubechies and Symlets) are used to reduce the background noise in speech signals.
Abstract: In this paper we introduce enhancement terminology in speech processing. Speech enhancement involves processing speech signals for human listening or as preparation for further processing before listening. The enhancement process aims to improve the overall quality of the speech and to increase its intelligibility, reducing listener fatigue, ambiguity, and so on, depending on the specific application. The wavelet transform plays an important role in signal analysis and is widely used in many applications such as signal detection and denoising. The basic idea behind the project is to estimate the uncorrupted speech from a distorted or noisy speech signal and a sine signal; this is also referred to as speech "denoising". There are various methods to help restore speech from noisy distortions. In this paper, discrete wavelet transforms with different wavelet bases (Daubechies and Symlets) are used to reduce the background noise in speech signals.

10 citations

01 Jan 2013
TL;DR: A comparative study of the Fast Fourier Transform (FFT), Discrete Cosine Transform (DCT), Discrete Sine Transform (DST) and Discrete Cosine Transform-II (DCT-II) is presented.
Abstract: An ECG (electrocardiogram) is a test that measures the electrical activity of the heart. The heart is a muscular organ that beats in rhythm to pump blood through the body. A large amount of signal data needs to be stored and transmitted, so it is necessary to compress the ECG signal data efficiently. In the past decades, many ECG compression methods have been proposed, and these methods can be roughly classified into three categories: direct methods, parameter extraction methods and transform methods. This paper presents a comparative study of the Fast Fourier Transform (FFT), Discrete Cosine Transform (DCT), Discrete Sine Transform (DST) and Discrete Cosine Transform-II (DCT-II). Records selected from the MIT-BIH arrhythmia database are tested. For performance evaluation, the Compression Ratio (CR) and Percent Root Mean Square Difference (PRD) are used.

7 citations
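
The two evaluation measures named in the abstract above, CR and PRD, can be sketched as follows. This is a minimal illustration using the commonly quoted definitions (CR as original size over compressed size, PRD as 100 times the root of the error energy over the signal energy); the abstract does not state the exact formulas the authors used, and the function names and sample values are hypothetical.

```python
import math

def compression_ratio(original_bits, compressed_bits):
    """CR: size of the original data divided by size of the compressed data."""
    return original_bits / compressed_bits

def prd(original, reconstructed):
    """Percent root-mean-square difference between two equal-length signals."""
    error_energy = sum((x - y) ** 2 for x, y in zip(original, reconstructed))
    signal_energy = sum(x ** 2 for x in original)
    return 100.0 * math.sqrt(error_energy / signal_energy)

# Hypothetical samples for illustration only (not MIT-BIH data).
signal = [1.0, 2.0, 3.0, 4.0]
approx = [1.0, 2.0, 3.0, 3.0]
print(compression_ratio(44, 11))
print(round(prd(signal, approx), 2))
```

A higher CR means stronger compression, while a lower PRD means the reconstructed signal is closer to the original, so the two are usually reported together.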

01 Jan 2013
TL;DR: A matching algorithm is proposed to stabilize casual videos directly, without the need to estimate the motion, so that a stable output video is attained without any unanticipated effects during video recording.
Abstract: The poor image quality of many video surveillance cameras effectively renders them useless for the purposes of identifying a person, a license plate, etc. Hence, many researchers study such drawbacks to enhance the quality of casual videos. In this paper, we propose a matching algorithm to stabilize casual videos directly, without the need to estimate the motion. A stable output video will be attained without any unanticipated effects during video recording.

Cited by
Journal ArticleDOI
TL;DR: This paper considers text-independent speaker recognition in the presence of some degradation effects, and the proposed approach shows superiority when compared to the algorithm of R. Togneri and D. Pullella (2011).
Abstract: The speaker recognition revolution has led to the inclusion of speaker recognition modules in several commercial products. Most published algorithms for speaker recognition focus on text-dependent speaker recognition. In contrast, text-independent speaker recognition is more advantageous, as the client can talk freely to the system. In this paper, text-independent speaker recognition is considered in the presence of some degradation effects such as noise and reverberation. Mel-Frequency Cepstral Coefficients (MFCCs), spectrum and log-spectrum are used for feature extraction from the speech signals. These features are processed with the Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) as a classification tool to complete the speaker recognition task. The network learns to recognize the speakers efficiently in a text-independent manner when the recording circumstances are the same. The recognition rate reaches 95.33% using MFCCs, while it increases to 98.7% when using the spectrum or log-spectrum. However, the system has some difficulty recognizing speakers from different recording environments. Hence, different speech enhancement techniques, such as spectral subtraction and wavelet denoising, are used to improve the recognition performance to some extent. The proposed approach shows superiority when compared to the algorithm of R. Togneri and D. Pullella (2011).

34 citations

Journal ArticleDOI
TL;DR: This work studies and implements wavelets as a denoising algorithm for telephonic speech signals, evaluated on the basis of SNR (signal-to-noise ratio) and RMSE (root mean square error).
Abstract: In communication systems and other speech-related systems, background noise is a severe problem. The speech signal gets polluted by noise from the transmission medium and the surroundings. Noise degrades the quality and the intelligibility of the speech signals. Noise is added by various sources such as heavy machines, pumps and vehicles, or by using a radio communication device or a noisy telephone channel. The basic idea behind the project work is to denoise the noisy telephonic speech signal. This work is based on studying and implementing wavelets as a denoising algorithm. The Wavelet Transform (WT) and Wavelet Packet Transform (WPT) implemented for the work are discrete. Haar, Daubechies, Symlet and Coiflet wavelets are implemented for denoising of the telephonic speech signal. Denoising performance on the telephonic speech signal is evaluated on the basis of SNR (signal-to-noise ratio) and RMSE (root mean square error). SNR and RMSE are calculated for both soft thresholding and hard thresholding.

7 citations
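
The soft and hard thresholding rules and the two quality measures mentioned in the abstract above can be sketched as follows. This is a minimal illustration using the textbook definitions, not the authors' implementation; the function names, the threshold value and the sample coefficients are hypothetical, and the wavelet decomposition itself is omitted.

```python
import math

def hard_threshold(coeffs, t):
    """Keep coefficients whose magnitude exceeds t; set the rest to zero."""
    return [c if abs(c) > t else 0.0 for c in coeffs]

def soft_threshold(coeffs, t):
    """Zero small coefficients and shrink the remaining ones toward zero by t."""
    return [math.copysign(abs(c) - t, c) if abs(c) > t else 0.0 for c in coeffs]

def rmse(clean, estimate):
    """Root mean square error between a clean signal and its estimate."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clean, estimate)) / len(clean))

def snr_db(clean, estimate):
    """SNR in dB; assumes the estimate differs from the clean signal."""
    signal_power = sum(a ** 2 for a in clean)
    noise_power = sum((a - b) ** 2 for a, b in zip(clean, estimate))
    return 10.0 * math.log10(signal_power / noise_power)

# Hypothetical detail coefficients and threshold, for illustration only.
coeffs = [3.0, -0.5, 1.5, -2.0]
print(hard_threshold(coeffs, 1.0))  # [3.0, 0.0, 1.5, -2.0]
print(soft_threshold(coeffs, 1.0))  # [2.0, 0.0, 0.5, -1.0]
```

Hard thresholding leaves the surviving coefficients untouched, whereas soft thresholding also reduces their magnitude, which is why the two rules can give different SNR and RMSE values on the same signal.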

Journal ArticleDOI
TL;DR: The results show that this method can be efficiently used for compression of ECG signals from multiple leads and performs better than the techniques based on SVD and Huffman Encoding.
Abstract: An ECG (electrocardiogram) is a test that analyzes the electrical behaviour of the heart. ECG is used in diagnosing most cardiac diseases. A large amount of ECG data from multiple leads needs to be stored and transmitted, which requires compression for effective data storage and retrieval. The proposed work has been developed with Singular Value Decomposition (SVD) followed by Run Length Encoding (RLE), combined with Huffman Encoding (HE) and Arithmetic Encoding (AE) individually. The ECG signal is first preprocessed. SVD is used to factorize the signal into three smaller sets of values, which preserve the significant features of the ECG. Finally, Run Length Encoding combined with Huffman Encoding (RLE-HE) and with Arithmetic Encoding (RLE-AE) are each employed, and the compression performance metrics are compared. The proposed method is evaluated with the PTB Diagnostic database. Performance measures such as Compression Ratio (CR), Percentage Root mean square Difference (PRD) and Signal to Noise Ratio (SNR) of the reconstructed signal are used to evaluate the proposed technique. It is evident that the proposed method performs better than the techniques based on SVD and Huffman Encoding. The results show that this method can be efficiently used for compression of ECG signals from multiple leads.

5 citations
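
The Run Length Encoding stage named in the abstract above can be sketched as follows. This is a generic RLE illustration only: the abstract does not say how the SVD output is quantized before encoding, so the input sequence and the function names here are hypothetical, and the SVD, Huffman and arithmetic coding stages are omitted.

```python
def rle_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([v, 1])       # start a new run
    return [(v, n) for v, n in runs]

def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    out = []
    for v, n in runs:
        out.extend([v] * n)
    return out

# Hypothetical quantized values, for illustration only.
quantized = [0, 0, 0, 5, 5, 0, 0, 0, 0, -2]
encoded = rle_encode(quantized)
print(encoded)  # [(0, 3), (5, 2), (0, 4), (-2, 1)]
```

RLE pays off when the input contains long runs of identical values, and the resulting pairs can then be passed to an entropy coder such as Huffman or arithmetic coding.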

Journal Article
TL;DR: This paper demonstrates the application of the maximal overlap wavelet transform (Modwt) technique in speech signal denoising and reveals that the Modwt based method outperforms conventional threshold methods while providing up to nearly a 24% increase in SNR value.
Abstract: Signal denoising for non-stationary digital signals can be effectively achieved by using the discrete wavelet transform. Selecting a suitable thresholding method is important to minimize the loss of useful signal information. This paper demonstrates the application of the maximal overlap wavelet transform (Modwt) technique in speech signal denoising. The analysis algorithm was run on the MATLAB platform. In this algorithm, different kinds of noisy input speech signals, including environmental background noises such as restaurant, car, street or station, were tested. The noise was filtered from the speech signal by thresholding the wavelet coefficients with the threshold estimation methods known as sqtwolog, modwtsqtwolog, heursure, rigrsure and minimaxi. The performance of the Modwt in the denoising process was evaluated by comparing signal-to-noise ratio (SNR) and mean square error (MSE) results to those of well-known threshold estimation methods. First, the denoising effectiveness of a Modwt based threshold method was tested in different scenarios, and very significant improvements in the denoising process were achieved by the Modwt based scenarios. Next, the influence of the different wavelet families on the Modwt based threshold estimation method was evaluated experimentally. The results revealed that the Modwt based method outperforms conventional threshold methods while providing up to nearly a 24% increase in SNR value.

5 citations
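
Of the threshold estimation rules listed in the abstract above, the sqtwolog rule is the universal threshold sqrt(2 ln N). A minimal sketch follows, under stated assumptions: scaling the threshold by a noise estimate taken from the median absolute value of the detail coefficients divided by 0.6745 is a common convention and is not confirmed by the abstract, and the function name and sample coefficients are hypothetical.

```python
import math
import statistics

def universal_threshold(detail_coeffs):
    """Universal (sqtwolog-style) threshold: sigma * sqrt(2 * ln(N)).

    sigma is estimated from the median absolute coefficient, which is an
    assumption here, not something the abstract specifies.
    """
    n = len(detail_coeffs)
    sigma = statistics.median([abs(c) for c in detail_coeffs]) / 0.6745
    return sigma * math.sqrt(2.0 * math.log(n))

# Hypothetical detail coefficients, chosen so that sigma evaluates to 1.0.
coeffs = [0.6745, -0.6745, 0.6745, -0.6745]
print(universal_threshold(coeffs))
```

The threshold grows slowly with the number of coefficients N, so longer signals are thresholded slightly more aggressively for the same noise level.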

Proceedings ArticleDOI
Zhe Liu, Wenbin Yu, Cailian Chen, Bo Yang, Xinping Guan
27 Jul 2016
TL;DR: An adaptive compression ratio estimation technique for ECG signals is illustrated and a common model of the relationship between compression ratio and sparsity is proposed, which can keep the reconstruction quality stable and improve the compression performance by 18.85% compared with traditional CS-based methods.
Abstract: Real-time electrocardiogram (ECG) monitoring systems have sprung up due to the considerable interest in Wireless Body Area Networks (WBANs). Commonly, the ECG data must be compressed for higher energy efficiency, and Compressive Sensing (CS) has proved to be an effective way to do so. However, for real-time ECG monitoring, the length of the data frame should be strictly limited for short data latency, which unavoidably causes variation in data sparsity and fluctuation in reconstruction quality. Furthermore, the compression ratio is well worth considering together with the corresponding energy cost in WBANs. To balance the reconstruction quality and compression ratio, this paper illustrates an adaptive compression ratio estimation technique for ECG signals and proposes a common model of the relationship between compression ratio and sparsity. The correlated-sparsity compression ratio (CoCR) is defined to reflect the influence of sparsity on compression performance. Moreover, a two-dimensional clustering algorithm is designed to accelerate operation and improve the precision of classification without prior knowledge. Finally, simulation results verify that the proposed method can keep the reconstruction quality stable and improve the compression performance by 18.85% compared with traditional CS-based methods.

3 citations