
Showing papers on "Adaptive Multi-Rate audio codec published in 2017"


Journal ArticleDOI
TL;DR: This paper proposes a framework for detecting double compressed AMR audio based on the stacked autoencoder (SAE) network and the universal background model-Gaussian mixture model (UBM-GMM), and uses the SAE to learn the optimal features automatically from the audio waveforms.
Abstract: The adaptive multi-rate (AMR) audio codec adopted by many portable recording devices is widely used in speech compression. The use of AMR speech recordings as evidence in court is growing. Nowadays, it is easy to tamper with digital speech recordings, which makes audio forensics increasingly important. The detection of double compressed audio is one of the key issues in audio forensics. In this paper, we propose a framework for detecting double compressed AMR audio based on the stacked autoencoder (SAE) network and the universal background model-Gaussian mixture model (UBM-GMM). Instead of hand-crafted features, we use the SAE to learn the optimal features automatically from the audio waveforms. Audio frames are used as network input, and the last hidden layer's output constitutes the features of a single frame. For an audio clip with many frames, the features of all the frames are aggregated and classified by the UBM-GMM. Experimental results show that our method is effective in distinguishing single/double compressed AMR audio and outperforms the existing methods by achieving a detection accuracy of 98% on the TIMIT database. Exhaustive experiments demonstrate the effectiveness and robustness of the proposed method.
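A rough sketch of the frame-level pipeline described above, assuming raw waveform frames as input: a plain autoencoder (rather than a layer-wise pretrained SAE) supplies per-frame features, and a scikit-learn Gaussian mixture stands in for the UBM-GMM aggregation stage. Frame length, layer widths and libraries are illustrative choices, not the authors' configuration.

```python
# Sketch of the pipeline under the assumptions stated above.
import numpy as np
from sklearn.mixture import GaussianMixture
from tensorflow import keras

FRAME_LEN = 320  # 40 ms at 8 kHz (assumed frame size)

def build_autoencoder(frame_len=FRAME_LEN, code_dim=64):
    inp = keras.Input(shape=(frame_len,))
    h = keras.layers.Dense(128, activation="relu")(inp)
    code = keras.layers.Dense(code_dim, activation="relu", name="code")(h)
    out = keras.layers.Dense(frame_len, activation="linear")(code)
    autoencoder = keras.Model(inp, out)
    autoencoder.compile(optimizer="adam", loss="mse")
    encoder = keras.Model(inp, code)           # last hidden layer = frame features
    return autoencoder, encoder

def frame_features(encoder, audio, frame_len=FRAME_LEN):
    n = len(audio) // frame_len
    frames = audio[: n * frame_len].reshape(n, frame_len)
    return encoder.predict(frames, verbose=0)  # one feature vector per frame

def train_background_model(feature_sets, n_components=32):
    # Pool frame features from many training clips into one background GMM.
    ubm = GaussianMixture(n_components=n_components, covariance_type="diag")
    ubm.fit(np.vstack(feature_sets))
    return ubm

def clip_score(ubm, features):
    # Average log-likelihood of a clip's frames; thresholded to separate
    # single from double compression in this simplified stand-in.
    return ubm.score(features)
```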

46 citations


Book ChapterDOI
23 Aug 2017
TL;DR: This paper proposes a novel adaptive audio steganography in the time domain based on advanced audio coding (AAC) and Syndrome-Trellis coding (STC), and shows that the method significantly outperforms conventional ±1 LSB-based steganography in terms of security and audio quality.
Abstract: Most existing audio steganographic methods embed secret messages according to a pseudorandom number generator, so some auditorily sensitive parts of the cover audio, such as mute or near-mute segments, are contaminated, which leads to poor perceptual quality and may introduce detectable artifacts for steganalysis. In this paper, we propose a novel adaptive audio steganography in the time domain based on advanced audio coding (AAC) and Syndrome-Trellis coding (STC). The proposed method first compresses a given wave signal into an AAC file at a high bitrate, and then obtains a residual signal by comparing the signal before and after AAC compression. According to the magnitude and sign of the residual signal, ±1 embedding costs are assigned to the audio samples. Finally, STC is used to create the stego audio. Extensive results evaluated on 10,000 music and 10,000 speech audio clips show that our method can significantly outperform conventional ±1 LSB-based steganography in terms of security and audio quality.
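The cost-assignment idea can be illustrated in a few lines: samples that AAC compression itself altered strongly are treated as cheaper to modify. The eps value, the asymmetry factor, and the omission of the actual STC embedding step are all simplifications of ours, not the authors' exact design.

```python
# Cost assignment sketch: large AAC residual -> low embedding cost.
import numpy as np

def pm1_embedding_costs(original, aac_decoded, eps=1e-3):
    residual = original.astype(np.float64) - aac_decoded.astype(np.float64)
    base = 1.0 / (np.abs(residual) + eps)                     # cheap where AAC already changed much
    cost_plus = np.where(residual >= 0, base, 10.0 * base)    # prefer +1 where AAC rounded up
    cost_minus = np.where(residual < 0, base, 10.0 * base)    # prefer -1 where AAC rounded down
    return cost_plus, cost_minus

# cost_plus/cost_minus would then be passed, together with the secret message,
# to a Syndrome-Trellis encoder that picks the +/-1 changes minimizing total cost.
```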

24 citations


Journal ArticleDOI
TL;DR: Controlled laboratory experiments indicated an improvement in voice quality when mode-set eight was employed, but there are no significant benefits to lowering the mode-sets or deploying dynamic codec rate adaptation.
Abstract: In this paper, we examine the impact of four voice over long term evolution adaptive multi-rate wideband codec mode-sets on coverage at pedestrian and vehicular speeds. Industry-standardized mean opinion scores were used as the metric for voice quality. Controlled laboratory experiments simulating pedestrian speeds indicated that there was an improvement in voice quality when mode-set eight was employed. At vehicular speeds, mode-set eight outperformed the other mode-sets for path losses less than 130 dB; however, all four mode-sets experienced a significant decline in voice quality when the path loss was greater than 130 dB. Based on the current implementations, there are no significant benefits to lowering the mode-sets or deploying dynamic codec rate adaptation.

13 citations


Journal ArticleDOI
TL;DR: The proposed codec, CPCM (Chaotic Pulse Code Modulation), combines encryption with the compression of the voice data, providing the same compression ratio as the PCM codec but with unintelligible content.
Abstract: We propose to incorporate an encryption procedure into the lossy compression of voice data by PCM (Pulse Code Modulation) based on A-law approximation quantization. The proposed codec, CPCM (Chaotic Pulse Code Modulation), combines encryption with the compression of the voice data. This scheme provides the same compression ratio as the PCM codec, but with unintelligible content. Comparisons with many widely used schemes have been made to highlight the proposed method in terms of security and speed. The CPCM codec can be a better alternative to classical compress-then-encrypt methods, which are time and resource consuming and not suitable for real-time secure multimedia transmission.
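One plausible way to realize a "chaotic PCM" of this kind is to XOR the A-law codes with a logistic-map keystream, so encryption happens inside the quantization stage with no rate overhead. The sketch below is an assumption-laden illustration, not the authors' CPCM construction.

```python
# Toy "chaotic PCM": A-law companding followed by an XOR with a chaotic keystream.
import numpy as np

A = 87.6  # standard A-law constant

def alaw_compress(x):
    """A-law companding of samples normalized to [-1, 1]."""
    ax = np.abs(x)
    y = np.where(ax < 1.0 / A,
                 A * ax / (1.0 + np.log(A)),
                 (1.0 + np.log(np.maximum(A * ax, 1e-12))) / (1.0 + np.log(A)))
    return np.sign(x) * y

def logistic_keystream(n, x0=0.613, r=3.99):
    """Chaotic byte stream from the logistic map; x0 plays the role of the key."""
    x, out = x0, np.empty(n, dtype=np.uint8)
    for i in range(n):
        x = r * x * (1.0 - x)
        out[i] = int(x * 256) & 0xFF
    return out

def cpcm_encode(samples, x0=0.613):
    codes = np.round((alaw_compress(samples) * 0.5 + 0.5) * 255).astype(np.uint8)
    return codes ^ logistic_keystream(len(codes), x0)  # same size as plain PCM codes
```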

12 citations


Proceedings ArticleDOI
01 Mar 2017
TL;DR: The method is based on a convolutional neural network applied to audio spectrograms and trained with the output of various lossy audio codecs and bitrates; it shows good performance on a large database and robustness to codec type and resampling.
Abstract: In this paper, we propose a method for detecting marks of lossy compression encoding, such as MP3 or AAC, from PCM audio. The method is based on a convolutional neural network (CNN) applied to audio spectrograms and trained with the output of various lossy audio codecs and bitrates. Our method shows good performance on a large database and robustness to codec type and resampling.
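A minimal sketch of such a detector, assuming log-magnitude STFT patches as input to a small Keras CNN; the STFT parameters, patch size and network depth are placeholders rather than the architecture used in the paper.

```python
# Minimal detector sketch: log-magnitude spectrogram patches fed to a small CNN.
import numpy as np
from scipy.signal import stft
from tensorflow import keras

def log_spectrogram(audio, fs=44100, nperseg=1024, noverlap=512):
    _, _, Z = stft(audio, fs=fs, nperseg=nperseg, noverlap=noverlap)
    return np.log1p(np.abs(Z)).T[..., np.newaxis]  # shape: (time, freq, 1)

def build_cnn(input_shape, n_classes=2):
    # n_classes = 2 for "original PCM vs. previously lossy-compressed"; more
    # classes could distinguish codec types and bitrates.
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        keras.layers.Conv2D(16, 3, activation="relu"),
        keras.layers.MaxPooling2D(),
        keras.layers.Conv2D(32, 3, activation="relu"),
        keras.layers.GlobalAveragePooling2D(),
        keras.layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```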

11 citations


Journal ArticleDOI
TL;DR: An 8K satellite broadcasting experiment was carried out as a final verification test of the 8K broadcasting system, and the fabricated 22.2 ch audio codec was found to be valid.
Abstract: A 22.2 multichannel (22.2 ch) sound system has been adopted as the audio system for 8K Super Hi-Vision (8K). The 22.2 ch sound system is an advanced sound system composed of 24 channels located three-dimensionally in space to envelop listeners in an immersive sound field. NHK has been working on standardizing and developing an 8K broadcasting system via a broadcasting satellite in time for test broadcasting in 2016. For the audio coding scheme, NHK developed a world-first 22.2 ch audio encoding/decoding hardware system (22.2 ch audio codec) capable of real-time encoding/decoding. The fabricated 22.2 ch audio codec is based on MPEG-4 AAC and was assembled into the 8K codec together with the 8K video codec and the multiplexer. The audio quality of the fabricated 22.2 ch audio codec was assessed in an objective evaluation, and the evaluation results revealed the operational bit rates of the fabricated codec. An 8K satellite broadcasting experiment was carried out as a final verification test of the 8K broadcasting system, and the 22.2 ch audio codec was found to be valid.

10 citations


Journal ArticleDOI
TL;DR: This paper will outline the thinking behind the design of the AC-4 codec, explain the different coding tools used, the systemic features included, and give an overview of performance and applications.
Abstract: AC-4 is a state-of-the-art audio codec standardized in ETSI (TS 103 190 and TS 103 190-2) and included in the DVB toolbox (TS 101 154 V2.2.1 and DVB BlueBook A157) and, at the time of writing, is a candidate standard for ATSC 3.0 as per A/342 part 2. AC-4 is an audio codec designed to address the current and future needs of video and audio entertainment services, including broadcast and Internet streaming. As such, it incorporates a number of features beyond the traditional audio coding algorithms, such as capabilities to support immersive and personalized audio, support for advanced loudness management, video-frame synchronous coding, dialog enhancement, etc. This paper will outline the thinking behind the design of the AC-4 codec, explain the different coding tools used, the systemic features included, and give an overview of performance and applications. It further outlines metadata aspects (immersive and personalized, essential for broadcast), metadata carriage, aspects of interchange of immersive programming, as well as immersive playback and rendering.

8 citations


Journal ArticleDOI
TL;DR: The objective of this paper is to investigate the codec's operation, as initial measurements show that the lossless compression performance of the IEEE compressor is better than that of traditional encoders, while its encoding speed is slower and can be further optimized.
Abstract: Audio compression is a method of reducing the space demands and aiding transmission of a source file, and can be categorized as lossy or lossless compression. Lossless audio compression was previously considered a luxury due to limited storage space. However, as storage technology has progressed, lossless audio files can be seen as the only plausible choice for those seeking the ultimate audio quality experience. Commonly used lossless codecs include FLAC, Wavpack, ALAC, Monkey Audio, True Audio, etc. The IEEE Standard for Advanced Audio Coding (IEEE 1857.2) is a new standard approved by IEEE in 2013 that covers both lossy and lossless audio compression tools. A lot of research has been done on this standard, but this paper focuses on whether the IEEE 1857.2 lossless audio codec is a viable alternative to other existing codecs in its current state. The objective of this paper is therefore to investigate the codec's operation, as initial measurements performed by researchers show that the lossless compression performance of the IEEE compressor is better than that of traditional encoders, while its encoding speed is slower and can be further optimized.

8 citations


Patent
19 Oct 2017
TL;DR: A method for enhanced codec control in wireless networks is proposed in which the UE determines all codec bitrates it can support and communicates them before receiving the codec control command.
Abstract: Apparatus and methods are provided for enhanced codec control. In one novel aspect, a method includes receiving a codec control command by a user equipment (UE) in a wireless network, determining if the recommended codec characteristic will be applied to a codec executing on the UE, and adjusting a characteristic of the codec executing on the UE based on the recommended codec characteristic. The UE is connected with a radio access network (RAN) and the codec control command includes a recommended codec characteristic. In another novel aspect, the recommended codec characteristic is a maximum bitrate. In another embodiment, the recommended codec characteristic is a type of codec. In yet another novel aspect, the recommended codec characteristic is a radio resource allocation command. In another novel aspect, the UE determines all available codec bitrates that can be performed by the UE and communicates the bitrates before receiving the codec control command.
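A toy sketch of the control flow claimed for the UE side; the message fields, class names and bitrate values are invented for illustration, since the patent defines no concrete API.

```python
# Toy model of the claimed UE-side control flow (all names are hypothetical).
from dataclasses import dataclass
from typing import Optional

@dataclass
class CodecControlCommand:
    max_bitrate: Optional[int] = None   # recommended maximum bitrate in bit/s
    codec_type: Optional[str] = None    # recommended codec, e.g. "AMR-WB"

class UeCodec:
    def __init__(self, codec_type="AMR", bitrate=12200,
                 supported_bitrates=(4750, 5900, 7950, 12200)):
        self.codec_type = codec_type
        self.bitrate = bitrate
        self.supported_bitrates = supported_bitrates

    def report_capabilities(self):
        # Communicated to the RAN before any codec control command arrives.
        return {"codec": self.codec_type, "bitrates": self.supported_bitrates}

    def apply(self, cmd: CodecControlCommand):
        if cmd.codec_type and cmd.codec_type != self.codec_type:
            self.codec_type = cmd.codec_type          # switch to recommended codec type
        if cmd.max_bitrate is not None:
            allowed = [r for r in self.supported_bitrates if r <= cmd.max_bitrate]
            if allowed:
                self.bitrate = max(allowed)           # highest rate within recommendation
```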

6 citations


Journal ArticleDOI
TL;DR: The testing results show that at the same bit-rate, the quality of the proposed BWE is better than that of the BWE method in Audio Video Standard (AVS) Part 10, while the computational complexity is noticeably reduced.
Abstract: To meet the requirements of low coding bit-rate and low complexity for audio coding in mobile surveillance devices, in this paper we propose an audio Bandwidth Extension (BWE) algorithm based on a hybrid prediction model comprising intra-frame prediction, inter-frame prediction and white-noise prediction. In the algorithm, we use four different prediction modes, namely translation and folding of low-frequency signals, the high-frequency signal of the previous frame, and white noise, to reconstruct high-frequency signals for various types of audio. When a high-frequency frame is encoded, the Signal-to-Noise Ratios (SNR) of all modes are calculated and compared. The prediction mode with the highest SNR is selected as the most accurate one to encode the high-frequency signal, and two indicator bits are used to signal the encoding mode. When the compressed high-frequency frame is decoded, the specific mode is selected based on the indicator bits. The testing results show that at the same bit-rate, the quality of the proposed BWE is better than that of the BWE method in Audio Video Standard (AVS) Part 10, while the computational complexity is noticeably reduced. These advantages help the proposed method meet the requirements of mobile surveillance devices for an audio codec.
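The mode-selection step lends itself to a short sketch: reconstruct the high band with each candidate predictor, keep the one with the highest SNR, and signal the choice with two indicator bits. The candidate reconstructions below are crude stand-ins for the paper's four prediction modes.

```python
# Sketch of the per-frame mode selection described above.
import numpy as np

def snr_db(reference, estimate):
    error = reference - estimate
    return 10.0 * np.log10(np.sum(reference ** 2) / (np.sum(error ** 2) + 1e-12))

def select_bwe_mode(high_band, low_band, prev_high_band):
    candidates = {
        0b00: low_band,                      # translation of the low-frequency signal
        0b01: low_band[::-1],                # fold of the low-frequency signal
        0b10: prev_high_band,                # high-frequency signal of the previous frame
        0b11: np.random.randn(len(high_band)) * np.std(high_band),  # shaped white noise
    }
    indicator_bits, reconstruction = max(candidates.items(),
                                         key=lambda kv: snr_db(high_band, kv[1]))
    return indicator_bits, reconstruction    # two bits + chosen high-band estimate
```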

3 citations



Proceedings ArticleDOI
01 Aug 2017
TL;DR: A non-intrusive codec type and bit-rate detection algorithm is presented that extracts a number of features from a decoded speech signal and models their statistics using a Deep Neural Network (DNN) classifier, along with a method for reducing the computational complexity and improving the robustness of the algorithm by pruning features with low importance and high computational cost using a CART binary tree.
Abstract: We present a non-intrusive codec type and bit-rate detection algorithm that extracts a number of features from a decoded speech signal and models their statistics using a Deep Neural Network (DNN) classifier. We also present a method for reducing the computational complexity and improving the robustness of the algorithm by pruning features that have a low importance and high computational cost using a CART binary tree. The proposed method is tested on a database that includes additive noise and transcoding as well as a real voicemail database. We show that the proposed method has 25% lower complexity than the baseline, 19% higher accuracy in the bitrate detection task and 10% higher accuracy in the CODEC classification experiment.
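The pruning step might look roughly like the sketch below: rank features by a CART importance-to-cost ratio and keep only the cheapest informative ones before training the DNN. The per-feature cost vector and the keep fraction are assumptions used only to illustrate the accuracy/complexity trade-off.

```python
# Sketch of importance/cost-based feature pruning with a CART tree.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def prune_features(X, y, feature_costs, keep_fraction=0.5):
    """X: (clips, features), y: codec/bitrate labels, feature_costs: cost per feature."""
    tree = DecisionTreeClassifier(max_depth=8).fit(X, y)
    score = tree.feature_importances_ / (np.asarray(feature_costs) + 1e-12)
    n_keep = int(keep_fraction * X.shape[1])
    keep = np.sort(np.argsort(score)[::-1][:n_keep])
    return keep  # indices of features worth computing at run time
```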

Proceedings ArticleDOI
05 Mar 2017
TL;DR: This paper introduces eAMR (enhanced-AMR), a novel technique for delivering wideband speech over existing narrowband networks based on the existing AMR (narrowband) codec, which is already widely deployed.
Abstract: This paper introduces eAMR (enhanced-AMR), a novel technique for delivering wideband speech over existing narrowband networks. Instead of using a completely new wideband speech coder which would require new infrastructure, as is the case e.g. for AMR-WB or EVS, eAMR is based on the existing AMR (narrowband) codec, which is already widely deployed. eAMR uses an efficient coding model to represent the high frequencies of the speech signal, and combines it with watermarking technology to hide this data within a normal narrowband AMR bitstream. As a result, eAMR is a wideband codec which is fully compatible with the existing AMR network infrastructure, and therefore can be deployed as a handset-only feature.

Proceedings ArticleDOI
01 Jul 2017
TL;DR: A very low bit-rate speech codec for narrowband speech is presented, combining long-term modeling with the Harmonic plus Noise Model (LT-HNM).
Abstract: This paper presents a very low bit-rate speech codec based on the long-term Harmonic plus Noise Model (LT-HNM). The HNM is known to be efficient for speech signal representation thanks to its use of natural parameters: the fundamental and voicing cut-off frequencies, and the harmonic and noise frequencies. In addition, long-term modeling is particularly efficient in reducing the data size of the model parameters. In this paper we combine both approaches, long-term modeling and HNM, to develop a very low bit-rate coder for narrowband speech. The obtained bit-rates are as low as 2.3 kbps, with an objective listening quality (Perceptual Evaluation of Speech Quality, PESQ) score of 2.3.
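A toy illustration of the HNM's "natural parameters": given a fundamental frequency, sample the frame spectrum at harmonic multiples up to the voicing cut-off. Windowing, f0 estimation and the cut-off value are assumed here and are not taken from the paper.

```python
# Extract harmonic amplitudes below the voicing cut-off for one frame.
import numpy as np

def harmonic_amplitudes(frame, fs, f0, voicing_cutoff):
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    harmonics = np.arange(f0, voicing_cutoff, f0)             # harmonic frequencies
    bins = np.round(harmonics * len(frame) / fs).astype(int)  # nearest FFT bins
    return harmonics, spectrum[bins]                          # frequencies, amplitudes
```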

Proceedings ArticleDOI
07 Jun 2017
TL;DR: A HEVC based spatial-resolution-scaling type of mixed resolution coding model for frame interleaved multiview videos is presented, designed such that the information in intermediate frames of the center and neighboring views is down-sampled while the frames still retain their original size.
Abstract: Studies have shown that mixed resolution based video codecs, also known as asymmetric spatial inter/intra view video codecs, are successful in efficiently coding videos for low bitrate transmission. In this paper a HEVC based spatial-resolution-scaling type of mixed resolution coding model for frame interleaved multiview videos is presented. The proposed codec is designed such that the information in the intermediate frames of the center and neighboring views is down-sampled, while the frames still retain their original size. The codec's reference frame structure is designed to efficiently encode frame interleaved multiview videos using a HEVC based mixed resolution codec. The multiview test video sequences were coded using the proposed codec and the standard MV-HEVC. Results show that the proposed codec gives significantly higher coding performance than the MV-HEVC codec at low bitrates.

Proceedings ArticleDOI
01 May 2017
TL;DR: Experiments on audio data provided by the third National Graduate Smart City Technology and Creative Design Contest show that the algorithm achieves good results, and that at the same code rate the compression ratio of the optimized algorithm increases.
Abstract: AVS-P10 is the first national standard for mobile audio codecs with completely independent intellectual property rights. In view of the explosive growth of network audio data, an algorithm is implemented according to the principles of AVS-P10 coding and based on the AVS-P10 core encoder, and is then optimized with a self-search loop optimization scheme. Finally, experiments were conducted on audio data provided by the third National Graduate Smart City Technology and Creative Design Contest. The experimental results show that the algorithm achieves good results; at the same code rate, the compression ratio of the optimized algorithm increases.

Proceedings ArticleDOI
01 Oct 2017
TL;DR: A procedure is described for designing a speech production model that takes speech samples as input and generates the essential parameters from which decoded speech close to the original can be obtained.
Abstract: In speech signal transmission, there is a large difference between the bit rate requirement and the information rate. Many codecs have been designed for speech coding to decrease the bandwidth requirements. This paper aims to provide the technical aspects of designing the encoder for the 5.90 kbps narrow-band AMR codec mode using MATLAB as the simulation tool. The paper describes a procedure for designing a speech production model that takes speech samples as input and generates the essential parameters from which decoded speech close to the original can be obtained. Beyond AMR, additional codecs such as AMR-WB and Enhanced Voice Services are used for 3G and 4G services, respectively.
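Short-term linear prediction is the core of such a speech production model; the sketch below, in Python rather than the paper's MATLAB, estimates LP coefficients per frame from the autocorrelation. Frame length and LP order follow common narrowband practice, not the paper's exact settings.

```python
# LP analysis via the Yule-Walker equations (simplified CELP front end).
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc_coefficients(frame, order=10):
    """LP coefficients a[1..order] for one speech frame (e.g. 160 samples at 8 kHz)."""
    x = frame * np.hamming(len(frame))
    r = np.correlate(x, x, mode="full")[len(x) - 1:]            # autocorrelation r[0..]
    a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])  # Yule-Walker solve
    return a  # predictor: x[n] is approximated by sum_k a[k-1] * x[n-k]

# In a CELP-style codec such as AMR, these coefficients (converted to line spectral
# pairs), the pitch lag and the fixed-codebook indices are the transmitted parameters.
```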

Proceedings ArticleDOI
20 Apr 2017
TL;DR: The article describes the results of research on selecting those bits of the G.729 codec codebook whose negation has the least influence on the loss of quality and fidelity of the output signal.
Abstract: Network steganography is dedicated in particular to those communication services in which no bridges or nodes carry out unintentional attacks on the steganographic sequence. To set up a hidden communication channel, a method of data encoding and decoding was implemented using the codebooks of the G.729 codec. The G.729 codec is built around the CS-ACELP (Conjugate Structure Algebraic Code Excited Linear Prediction) linear prediction vocoder, and by modifying the binary content of the codebook it is easy to change the binary output stream. The article describes the results of research on selecting those bits of the G.729 codebook whose negation has the least influence on the loss of quality and fidelity of the output signal. The study was performed using subjective and objective listening tests.
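The bit-selection experiment can be sketched as follows: flip one candidate codebook bit at a time, decode, and rank bits by the distortion they introduce; the least harmful bits become the covert channel. The g729_encode/g729_decode callables are hypothetical wrappers around a real G.729 implementation, and mean squared error stands in for the paper's listening tests.

```python
# Rank candidate codebook bits by how much flipping them distorts the decoded signal.
import numpy as np

def flip_bit(bitstream, index):
    """Flip a single bit in a bytes-like encoded frame sequence."""
    data = bytearray(bitstream)
    data[index // 8] ^= 1 << (index % 8)
    return bytes(data)

def rank_codebook_bits(speech, candidate_bits, g729_encode, g729_decode):
    # g729_encode/g729_decode: hypothetical wrappers around a real G.729 codec.
    reference = g729_decode(g729_encode(speech))
    impact = []
    for bit_index in candidate_bits:
        tampered = flip_bit(g729_encode(speech), bit_index)
        distortion = float(np.mean((reference - g729_decode(tampered)) ** 2))
        impact.append((bit_index, distortion))
    return sorted(impact, key=lambda item: item[1])  # least-impact bits first
```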

Patent
Cho Yong Rae
27 Apr 2017
TL;DR: In this paper, a method of designing to transform an a single-format decoder to a multi-format decoding system using a first codec and a second codec, and an electronic apparatus thereof, is presented.
Abstract: Disclosed are a method of transforming a single-format decoder, which performs image processing on an image signal using a first codec, into a multi-format decoder, which performs image processing on an image signal using a second codec, and an electronic apparatus thereof. The method includes processing first information about the first codec and second information about the second codec to produce a first transform coefficient associated with the first codec and a second transform coefficient associated with the second codec, through a predetermined transform method; determining a similarity between the first transform coefficient and the second transform coefficient; and converting the single-format decoder to the multi-format decoder by adding at least one of a shifter, an adder, and a subtractor, based on the determined similarity. Consequently, a reduction in chip size and cost may be achieved.