
Showing papers on "Residual frame" published in 2009


Patent
Nobuyuki Washio1, Shouji Harada1
11 Sep 2009
TL;DR: A speech recognition system includes a feature calculating unit; a sound level calculating unit that calculates an input sound level in each frame; a decoding unit that matches the feature of each frame with an acoustic model and a linguistic model and outputs a recognized word sequence; a start-point detector that determines a start frame of a speech section based on a reference value; an end-point detector that determines an end frame of the speech section; and a reference value updating unit that updates the reference value in accordance with variations in the input sound level, as mentioned in this paper.
Abstract: A speech recognition system includes the following: a feature calculating unit; a sound level calculating unit that calculates an input sound level in each frame; a decoding unit that matches the feature of each frame with an acoustic model and a linguistic model, and outputs a recognized word sequence; a start-point detector that determines a start frame of a speech section based on a reference value; an end-point detector that determines an end frame of the speech section based on a reference value; and a reference value updating unit that updates the reference value in accordance with variations in the input sound level. The start-point detector updates the start frame every time the reference value is updated. The decoding unit starts matching before being notified of the end frame and corrects the matching results every time it is notified of the start frame. The speech recognition system can suppress a delay in response time while performing speech recognition based on a proper speech section.

142 citations


Patent
James Bankoski1, Yaowu Xu1, Paul Wilkins1
10 Sep 2009
TL;DR: In this article, a method for digital video encoding prediction is presented, comprising creating a constructed reference frame using an encoder and compressing a series of source video frames using the constructed reference frame to obtain a bitstream including a compressed digital video signal for a subsequent decoding process.
Abstract: Disclosed herein is a method for digital video encoding prediction comprising creating a constructed reference frame using an encoder and compressing a series of source video frames using the constructed reference frame to obtain a bitstream including a compressed digital video signal for a subsequent decoding process. The constructed reference frame is omitted from the series of digital video frames during the subsequent viewing process.

77 citations


Patent
Min Dai1, Tao Xue1, Chia-Yuan Teng1
29 Jul 2009
TL;DR: In this article, intelligent frame skipping techniques that may be used by an encoding device or a decoding device to facilitate frame skipping in a manner that may help to minimize quality degradation due to the frame skipping are described.
Abstract: This disclosure provides intelligent frame skipping techniques that may be used by an encoding device or a decoding device to facilitate frame skipping in a manner that may help to minimize quality degradation due to the frame skipping. In particular, the described techniques may implement a similarity metric designed to identify good candidate frames for frame skipping. In this manner, noticeable reductions in the video quality caused by frame skipping, as perceived by a viewer of the video sequence, may be reduced relative to conventional frame skipping techniques. The described techniques advantageously operate in a compressed domain.

71 citations


Patent
10 Aug 2009
TL;DR: In this article, a night vision device and method for filtering a series of image frames that depict a moving subject, which improves the signal-to-noise ratio of each image frame, is provided.
Abstract: A night vision device and method for filtering a series of image frames that depict a moving subject, which thereby improves the signal-to-noise ratio of each image frame, is provided. A composite image is formed for each image frame by combining pixel values in a current image frame with pixel values in composite images corresponding to image frames acquired before the current image frame. Additionally, pixel values in image frames acquired subsequent to the acquisition of the current image frame are included when forming the composite image. A bi-directional recursive filter is used to weight the contributions from the previous composite images and subsequent image frames with a decay constant. Motion of the imaging system is optionally compensated for by establishing a moving reference frame and shifting the image frames to account for this motion, thus registering the image frames before filtering the current image frame.

69 citations
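
The bi-directional recursive filtering described above can be illustrated with a minimal two-pass exponential filter in Python; this is a sketch assuming a single fixed decay constant and pre-registered frames, and it omits the patent's motion compensation and composite-image details.

import numpy as np

def bidirectional_recursive_filter(frames, decay=0.8):
    """Temporally filter a (T, H, W) stack of frames with a two-pass
    recursive filter whose memory is controlled by a decay constant."""
    frames = np.asarray(frames, dtype=np.float64)
    forward = np.empty_like(frames)
    backward = np.empty_like(frames)

    # Forward pass: each composite mixes the current frame with the previous
    # composite, so earlier frames contribute with exponentially decayed weight.
    forward[0] = frames[0]
    for t in range(1, len(frames)):
        forward[t] = (1.0 - decay) * frames[t] + decay * forward[t - 1]

    # Backward pass: the same recursion run from the last frame toward the first,
    # so frames acquired after the current one also contribute.
    backward[-1] = frames[-1]
    for t in range(len(frames) - 2, -1, -1):
        backward[t] = (1.0 - decay) * frames[t] + decay * backward[t + 1]

    # Average the two passes so each output frame uses past and future data.
    return 0.5 * (forward + backward)

# Example: filtering ten noisy 64x64 frames.
noisy = np.random.rand(10, 64, 64)
filtered = bidirectional_recursive_filter(noisy, decay=0.8)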


Patent
27 Mar 2009
TL;DR: In this article, a method of processing a video signal is disclosed, which includes receiving prediction information of a macroblock and filter information, predicting a current picture using the prediction information, and applying a filter using the predicted current picture and the filter information.
Abstract: A method of processing a video signal is disclosed. The present invention includes receiving prediction information of a macroblock and filter information, predicting a current picture using the prediction information of the macroblock, and applying a filter using the predicted current picture and the filter information. Accordingly, accuracy of prediction can be enhanced by applying a filter to a frame predicted before a residual for a predicted frame is coded. As the residual is reduced, efficiency of video signal processing can be enhanced.

69 citations


Patent
14 Apr 2009
TL;DR: In this article, the macroblock is encoded based on difference values of residual data, using corresponding residual blocks (residual data corresponding to image difference values) in a past frame and a future frame of the base layer together with a residual block for the macroblock of the enhanced layer.
Abstract: Disclosed is a method for encoding and decoding a video signal. In the procedure of encoding the video signal, when a frame temporally simultaneous with a frame that includes a macroblock of an enhanced layer for which a prediction video will be obtained does not exist in a base layer, the macroblock is encoded based on difference values of residual data, using corresponding residual blocks (residual data corresponding to image difference values) in a past frame and a future frame of the base layer together with a residual block for the macroblock of the enhanced layer. In another embodiment, the macroblock is encoded based on difference values of residual data using corresponding residual blocks in a past frame and a future frame of the enhanced layer and the residual block for the macroblock. Accordingly, a residual prediction mode can be applied to a macroblock of an enhanced layer even if a frame temporally simultaneous with the frame of the enhanced layer does not exist in the base layer, thereby improving coding efficiency.

64 citations


Journal ArticleDOI
TL;DR: Simulation results show that this scheme provides better reconstruction results than existing compressive sensing video acquisition schemes, such as 2-D or 3-D wavelet methods and the minimum total-variation (TV) method.
Abstract: We present a compressive sensing video acquisition scheme that relies on the sparsity properties of video in the spatial domain. In this scheme, the video sequence is represented by a reference frame, followed by the difference of measurement results between each pair of neighboring frames. The video signal is reconstructed by first reconstructing the frame differences using an ℓ1-minimization algorithm, then adding them sequentially to the reference frame. Simulation results on both simulated and real video sequences show that when the spatial changes between neighboring frames are small, this scheme provides better reconstruction results than existing compressive sensing video acquisition schemes, such as 2-D or 3-D wavelet methods and the minimum total-variation (TV) method. This scheme is suitable for compressive sensing acquisition of video sequences with relatively small spatial changes. A method that estimates the amount of spatial change based on the statistical properties of measurement results is also presented.

59 citations
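
A rough sketch of the frame-difference reconstruction idea from the abstract above, assuming random Gaussian measurements and sparsity of the frame difference in the pixel domain; iterative soft-thresholding (ISTA) on an l1-regularized least-squares problem is used here as a stand-in for the paper's unspecified l1-minimization solver, and all parameter values are illustrative.

import numpy as np

def ista(Phi, y, lam=0.02, n_iter=300):
    """Solve min_x 0.5*||Phi x - y||^2 + lam*||x||_1 by iterative
    soft-thresholding, standing in for the paper's l1 solver."""
    L = np.linalg.norm(Phi, 2) ** 2           # Lipschitz constant of the gradient
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ x - y)
        z = x - grad / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # soft threshold
    return x

rng = np.random.default_rng(0)
n, m = 256, 96                                # signal length, measurements per frame
Phi = rng.standard_normal((m, n)) / np.sqrt(m)

# Reference frame plus a sparse change in the next frame.
x_ref = rng.standard_normal(n)
diff_true = np.zeros(n)
diff_true[rng.choice(n, 5, replace=False)] = rng.standard_normal(5)
x_next = x_ref + diff_true

# Only the difference between the two frames' measurement vectors is needed
# to recover the (sparse) frame difference.
y_diff = Phi @ x_next - Phi @ x_ref
diff_hat = ista(Phi, y_diff)

# Reconstructed next frame = reference frame + recovered difference.
x_next_hat = x_ref + diff_hat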


Patent
02 Jul 2009
TL;DR: In this article, a video transcoding system and method employing an improved rate control algorithm is presented, where a plurality of frames in an input video bitstream are received by the system, in which each frame is in a first coding format, and complexity information indicating the complexity of the frame after decoding is obtained.
Abstract: A video transcoding system and method employing an improved rate control algorithm. A plurality of frames in an input video bitstream are received by the system, in which each frame is in a first coding format. Each frame in the input bitstream is decoded, and complexity information indicating the complexity of the frame after decoding is obtained. An estimated number of bits to allocate for the respective frame is calculated. Using a rate estimation model that employs the complexity information for the respective frame, a picture cost for the frame is calculated based on the estimated number of bits allocated to encode the frame and a parameter of the rate estimation model. A target cost for the respective frame is calculated based at least in part on the picture cost and the complexity information for the frame. A quantization parameter (QP) is calculated that, when used to encode the respective frame in a second coding format, would generate an encoded frame having an actual cost approximately equal to the target cost. The respective frame is encoded using the calculated QP, and the frames encoded in the second coding format are provided in an output video bitstream.

58 citations
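
The abstract above does not spell out the rate estimation model, so the following sketch assumes a classic first-order model (bits roughly proportional to complexity divided by the quantization step) together with the approximate H.264 rule that the quantization step doubles every 6 QP; the function name, model gain, and constants are assumptions for illustration only.

import math

def qp_for_target_bits(complexity, target_bits, model_gain=1.0):
    """Pick an H.264 QP whose quantization step should roughly hit the
    target bit budget under a first-order rate model R = gain * X / Qstep."""
    q_step = model_gain * complexity / max(target_bits, 1.0)
    # The H.264 quantization step roughly doubles every 6 QP, with Qstep ~ 0.625 at QP 0.
    qp = round(6.0 * math.log2(max(q_step, 0.625) / 0.625))
    return min(max(qp, 0), 51)                # clamp to the valid H.264 QP range

# Example: for the same bit budget, a frame measured as twice as complex
# is assigned a coarser quantizer (higher QP).
print(qp_for_target_bits(complexity=4.0e5, target_bits=40000))
print(qp_for_target_bits(complexity=8.0e5, target_bits=40000))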


Patent
24 Jun 2009
TL;DR: In this paper, the authors present video encoding and decoding techniques for modified temporal compression based on fragmented references rather than complete reference pictures; these fragments, rather than entire frames, are used as reference pictures for generating predicted frames during the motion compensation process.
Abstract: In general, this disclosure describes techniques for encoding and decoding sequences of video frames using fragmentary reference pictures. The disclosure presents video encoding and decoding techniques for modified temporal compression based on fragmented references rather than complete reference pictures. In a typical sequence of video frames, only a portion (i.e., a tile) of each frame includes moving objects. Moreover, in each frame, the moving objects tend to be confined to specific areas that are common among each frame in the sequence of video frames. As described herein, such common areas of motion are identified. Pictures are then extracted from the identified areas of the video frames. Because these pictures may represent only portions of the frames, this disclosure refers to these pictures as "fragments." It is then these fragments that are used as reference pictures for generating predicted frames during a motion compensation process, rather than the entire frame.

57 citations


Journal ArticleDOI
TL;DR: The proposed spatio-temporal auto regressive model for frame rate upconversion is able to yield the interpolated frames with high performance in terms of both subjective and objective qualities.
Abstract: This paper proposes a spatio-temporal auto regressive (STAR) model for frame rate upconversion. In the STAR model, each pixel in the interpolated frame is approximated as the weighted combination of a sample space including the pixels within its two temporal neighborhoods from the previous and following original frames as well as the available interpolated pixels within its spatial neighborhood in the current to-be-interpolated frame. To derive accurate STAR weights, an iterative self-feedback weight training algorithm is proposed. In each iteration, first the pixels of each training window in the interpolated frames are approximated by the sample space from the previous and following original frames and the to-be-interpolated frame. And then the actual pixels of each training window in the original frame are approximated by the sample space from the previous and following interpolated frames and the current original frame with the same weights. The weights of each training window are calculated by jointly minimizing the distortion between the interpolated frames in the current and previous iterations as well as the distortion between the original frame and its interpolated one. Extensive simulation results demonstrate that the proposed STAR model is able to yield the interpolated frames with high performance in terms of both subjective and objective qualities.

56 citations
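
A heavily simplified sketch of the weight-fitting idea from the abstract above: it keeps only the temporal term of the STAR model, fits a single global weight vector per frame pair by least squares on a triple of original frames, and skips the per-window training and the iterative self-feedback refinement described in the paper. Function names, the neighbourhood radius, and the training setup are illustrative assumptions.

import numpy as np

def fit_star_weights(f_prev, f_mid, f_next, radius=1):
    """Fit linear weights that predict each pixel of the middle frame from its
    (2*radius+1)^2 neighbourhoods in the previous and next frames."""
    H, W = f_mid.shape
    rows, targets = [], []
    for y in range(radius, H - radius):
        for x in range(radius, W - radius):
            p = f_prev[y - radius:y + radius + 1, x - radius:x + radius + 1].ravel()
            n = f_next[y - radius:y + radius + 1, x - radius:x + radius + 1].ravel()
            rows.append(np.concatenate([p, n]))
            targets.append(f_mid[y, x])
    w, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return w

def star_interpolate(f_prev, f_next, w, radius=1):
    """Apply fitted weights to synthesise the frame between f_prev and f_next."""
    H, W = f_prev.shape
    out = 0.5 * (f_prev + f_next)             # simple fallback for border pixels
    for y in range(radius, H - radius):
        for x in range(radius, W - radius):
            p = f_prev[y - radius:y + radius + 1, x - radius:x + radius + 1].ravel()
            n = f_next[y - radius:y + radius + 1, x - radius:x + radius + 1].ravel()
            out[y, x] = np.concatenate([p, n]) @ w
    return out

# Usage: learn the weights on one original frame triple, then interpolate a new pair.
rng = np.random.default_rng(1)
f0, f1, f2 = (rng.random((32, 32)) for _ in range(3))
w = fit_star_weights(f0, f1, f2)
middle = star_interpolate(f0, f2, w)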


Patent
08 Jul 2009
TL;DR: In this article, an audio encoder is adapted for encoding frames of a sampled audio signal to obtain encoded frames, wherein a frame comprises a number of time domain audio samples, comprising a predictive coding analysis stage (110) for determining information on coefficients of a synthesis filter and information on a prediction domain frame based on a frame of audio samples.
Abstract: An audio encoder (100) adapted for encoding frames of a sampled audio signal to obtain encoded frames, wherein a frame comprises a number of time domain audio samples, comprising a predictive coding analysis stage (110) for determining information on coefficients of a synthesis filter and information on a prediction domain frame based on a frame of audio samples. The audio encoder (100) further comprises a frequency domain transformer (120) for transforming a frame of audio samples to the frequency domain to obtain a frame spectrum, and an encoding domain decider (130). Moreover, the audio encoder (100) comprises a controller (140) for determining information on a switching coefficient when the encoding domain decider decides that encoded data of a current frame is based on the information on the coefficients and the information on the prediction domain frame, while encoded data of a previous frame was encoded based on a previous frame spectrum.

Journal ArticleDOI
TL;DR: A low-complexity processing technique and a robust FRUC algorithm are proposed that utilize a translational motion vector model of the first and second order and detect the continuity of these motion vectors, with a spatial smoothness criterion employed to improve the perceptual quality of the interpolated frame.
Abstract: Two challenging situations for video frame rate up-conversion (FRUC) are first identified and analyzed; namely, when the input video has abrupt illumination change and/or a low frame rate. Then, a low-complexity processing technique and a robust FRUC algorithm are proposed to address these two issues. The proposed algorithm utilizes a translational motion vector model of the first and second order and detects the continuity of these motion vectors. Additionally, in order to improve the perceptual quality of the interpolated frame, a spatial smoothness criterion is employed. The superior performance of the proposed algorithm has been tested extensively and representative examples are given in this work.

Patent
Cha Zhang1, Dinei Florencio1
25 Jun 2009
TL;DR: In this paper, a virtual viewpoint is used to determine expected contributions of individual portions of the frames to a synthesized image of the scene from the viewpoint position using the frames, and the frames are transmitted in compressed form via a network to a remote device, which is configured to render the scene using the compressed frames.
Abstract: Multi-view video that is being streamed to a remote device in real time may be encoded. Frames of a real-world scene captured by respective video cameras are received for compression. A virtual viewpoint, positioned relative to the video cameras, is used to determine expected contributions of individual portions of the frames to a synthesized image of the scene from the viewpoint position using the frames. For each frame, compression rates for individual blocks of a frame are computed based on the determined contributions of the individual portions of the frame. The frames are compressed by compressing the blocks of the frames according to their respective determined compression rates. The frames are transmitted in compressed form via a network to a remote device, which is configured to render the scene using the compressed frames.

Patent
26 Jun 2009
TL;DR: In this paper, a method and a device are described for selecting between multiple available filters in an encoder to provide a frame having a low error and distortion rate for each full and sub-pixel position, determining whether to use an alternative filter over the default filter during interpolation.
Abstract: A method and a device are described for selecting between multiple available filters in an encoder to provide a frame having a low error and distortion rate. For each full and sub-pixel position, it is determined whether to use an alternative filter instead of the default filter during interpolation by estimating the rate-distortion gain of using each filter and signaling to the decoder the optimal filter(s) applied to each full and sub-pixel position. In one embodiment, the method includes identifying a reference frame and a current frame, interpolating the reference frame using a default filter to create a default interpolated frame, interpolating the reference frame using an alternative filter to create an alternative interpolated frame, and determining for each sub-pixel position whether to use the default filter or the alternative filter, based on a minimal cost, to generate a final reference frame.

Patent
24 Nov 2009
TL;DR: In this paper, a method is presented for comparing the frame rate of image capture by an image sensor to a frame rate threshold at an image capture device and, when the frame rate is below the threshold, increasing it to at least the threshold before an autofocus operation is performed.
Abstract: In a particular embodiment, a method is disclosed that includes comparing a frame rate of image capture by an image sensor to a frame rate threshold at an image capture device. The method also includes, when the frame rate is less than the frame rate threshold, increasing the frame rate to a modified frame rate that is greater than or equal to the frame rate threshold. The method further includes performing an autofocus operation on an image to be captured at the modified frame rate.

Proceedings ArticleDOI
Ling Shao1, Ling Ji1
25 May 2009
TL;DR: A novel algorithm for key frame extraction based on intra-frame and inter-frame motion histogram analysis is proposed and validated by a large variety of real-life videos.
Abstract: Key frame extraction is an important technique in video summarization, browsing, searching, and understanding. In this paper, a novel algorithm for key frame extraction based on intra-frame and inter-frame motion histogram analysis is proposed. The extracted key frames contain complex motion, are salient with respect to their neighboring frames, and can be used to represent actions and activities in video. The key frames are first initialized by finding peaks in the curve of entropy calculated on motion histograms in each video frame. The peak entropies are then weighted by inter-frame saliency, computed using histogram intersection, to output the final key frames. The effectiveness of the proposed method is validated on a large variety of real-life videos.
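
A compact sketch of the ranking idea in the abstract above: per-frame motion histograms, entropy of each histogram, and a saliency weight from histogram intersection with neighbouring frames. The flow fields, bin count, and the exact way entropy and saliency are combined are assumptions; the paper's intra-/inter-frame histogram construction and peak detection are simplified to a top-k ranking.

import numpy as np

def motion_histogram(flow, bins=16):
    """Histogram of motion-vector orientations weighted by magnitude,
    where `flow` is an (H, W, 2) array of per-pixel motion vectors."""
    mag = np.hypot(flow[..., 0], flow[..., 1]).ravel()
    ang = np.arctan2(flow[..., 1], flow[..., 0]).ravel()
    hist, _ = np.histogram(ang, bins=bins, range=(-np.pi, np.pi), weights=mag)
    return hist / (hist.sum() + 1e-12)

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def histogram_intersection(h1, h2):
    return np.minimum(h1, h2).sum()

def key_frames(flows, top_k=3):
    """Rank frames by motion-histogram entropy, down-weighted by how similar
    each frame's histogram is to its neighbours (histogram intersection)."""
    hists = [motion_histogram(f) for f in flows]
    ents = np.array([entropy(h) for h in hists])
    scores = ents.copy()
    for i in range(len(hists)):
        sims = [histogram_intersection(hists[i], hists[j])
                for j in (i - 1, i + 1) if 0 <= j < len(hists)]
        scores[i] *= 1.0 - np.mean(sims)      # salient = dissimilar to neighbours
    return np.argsort(scores)[::-1][:top_k]

# Example with random flow fields standing in for estimated motion.
rng = np.random.default_rng(2)
flows = [rng.normal(size=(48, 64, 2)) for _ in range(20)]
print(key_frames(flows))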

Patent
23 Feb 2009
TL;DR: In this paper, a density image for each of the first frame (frame i) and the second frame (frame j) is used to obtain the translation between the images and thus image-to-image point correspondence.
Abstract: Method (300) for registration of two or more frames of three dimensional (3D) point cloud data (200-i, 200-j). A density image for each of the first frame (frame i) and the second frame (frame j) is used to obtain the translation between the images and thus image-to-image point correspondence. Correspondence for each adjacent frame is determined using correlation of the 'filtered density' images. The translation vector or vectors are used to perform a coarse registration of the 3D point cloud data in one or more of the XY plane and the Z direction. The method also includes a fine registration process applied to the 3D point cloud data (200-i, 200-j). Corresponding transformations between frames (not just adjacent frames) are accumulated and used in a 'global' optimization routine that seeks to find the best translation, rotation, and scale parameters that satisfy all frame displacements.
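
A sketch of the coarse XY step only, assuming the 'density image' is a plain 2-D histogram of point positions and that the translation is found from the peak of an FFT-based cross-correlation; the filtering of the density images, the Z translation, and the fine and global registration stages of the patent are omitted, and the grid size and extent are illustrative.

import numpy as np

def density_image(points, grid=128, extent=50.0):
    """Project a 3-D point cloud onto the XY plane as a 2-D histogram
    (a 'density image')."""
    h, _, _ = np.histogram2d(points[:, 0], points[:, 1], bins=grid,
                             range=[[-extent, extent], [-extent, extent]])
    return h

def xy_translation(points_i, points_j, grid=128, extent=50.0):
    """Coarse XY translation of frame j relative to frame i, from the peak of
    the FFT-based cross-correlation of the two density images."""
    a = density_image(points_i, grid, extent)
    b = density_image(points_j, grid, extent)
    corr = np.fft.ifft2(np.fft.fft2(b) * np.conj(np.fft.fft2(a))).real
    ix, iy = np.unravel_index(np.argmax(corr), corr.shape)
    # Map wrap-around indices to signed shifts, then to metric units.
    ix = ix - grid if ix > grid // 2 else ix
    iy = iy - grid if iy > grid // 2 else iy
    cell = 2.0 * extent / grid
    return ix * cell, iy * cell

# Example: the second frame is the first one shifted by (3, -2) in XY.
rng = np.random.default_rng(3)
cloud_i = rng.uniform(-40, 40, size=(5000, 3))
cloud_j = cloud_i + np.array([3.0, -2.0, 0.0])
print(xy_translation(cloud_i, cloud_j))       # roughly (3, -2), quantised to the grid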

Patent
06 Mar 2009
TL;DR: In this paper, the authors present techniques for improving the rendering and management of client desktops and their subsequent transmission to the remote client, including merging rendering and encoding functions onto the same chip so that frame data does not need to be transferred, calculating a tile-based checksum to determine which tiles have changed from frame to frame, and dropping tiles waiting to be transmitted when network bandwidth or decode speed limits the transmission and an equivalent tile in a subsequent frame is available to replace them.
Abstract: Example embodiments of the present disclosure provide techniques for improving the rendering and management of client desktops and the subsequent transmission to the remote client. The techniques may minimize the movement of frame data within the server, the amount of data to be compressed, the amount of data transmitted over the network, and the amount of data to be decompressed. Various embodiments are disclosed for merging rendering functions and encoding functions onto the same chip so that frame data does not need to be transferred, calculating a tile-based checksum to determine which tiles have changed from frame to frame, dropping tiles waiting to be transmitted if network bandwidth or decode speed is limiting the transmission and an equivalent tile in a subsequent frame is available to replace them, and transferring the frame buffer into the chip from an external GPU using one of three modes.
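
The tile-based change detection can be sketched as a per-tile checksum comparison; CRC32, the tile size, and the dictionary layout here are assumptions, and collision handling as well as the bandwidth-driven tile dropping are not shown.

import zlib
import numpy as np

def tile_checksums(frame, tile=64):
    """CRC32 checksum for each tile of an (H, W, C) frame on a fixed grid."""
    H, W = frame.shape[:2]
    sums = {}
    for y in range(0, H, tile):
        for x in range(0, W, tile):
            sums[(y, x)] = zlib.crc32(frame[y:y + tile, x:x + tile].tobytes())
    return sums

def changed_tiles(prev_sums, curr_sums):
    """Tiles whose checksum differs from the previous frame are the only ones
    that need to be re-encoded and sent to the remote client."""
    return [pos for pos, c in curr_sums.items() if prev_sums.get(pos) != c]

# Example: change one region of the desktop and detect which tiles it touches.
rng = np.random.default_rng(4)
frame0 = rng.integers(0, 256, size=(256, 384, 3), dtype=np.uint8)
frame1 = frame0.copy()
frame1[100:140, 200:260] = 0                  # simulated screen update
print(changed_tiles(tile_checksums(frame0), tile_checksums(frame1)))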

Patent
22 Apr 2009
TL;DR: In this article, the second image data is adjusted by at least partially compensating for offsets between portions of the first image data with respect to corresponding portions of the second image data to produce adjusted second image data.
Abstract: Systems and methods to selectively combine video frame image data are disclosed. First image data corresponding to a first video frame and second image data corresponding to a second video frame are received from an image sensor. The second image data is adjusted by at least partially compensating for offsets between portions of the first image data with respect to corresponding portions of the second image data to produce adjusted second image data. Combined image data corresponding to a combined video frame is generated by performing a hierarchical combining operation on the first image data and the adjusted second image data.

Patent
01 Aug 2009
TL;DR: In this article, a method and apparatus for decoding an encoded or tagged video frame provides a way, for a receiver, for example, to determine whether the video content is 3D content or 2D content.
Abstract: A method and apparatus for encoding or tagging a video frame provides a way to indicate, to a receiver, for example, whether the video content is 3-D content or 2-D content. A method and apparatus for decoding an encoded or tagged video frame provides a way, for a receiver, for example, to determine whether the video content is 3-D content or 2-D content. 3-D video data may be encoded by replacing lines of at least one video frame with a specific color or pattern. When a decoder detects the presence of the colored or patterned lines in an image frame, it may interpret them as an indicator that 3-D video data is present.

Patent
Lidong Xu1, Yi-Jen Chiu1, Wenhao Zhang1
25 Sep 2009
TL;DR: In this article, a motion estimation (ME) method based on reconstructed reference pictures in a B frame or in a P frame at a video decoder is proposed to obtain a motion vector (MV) for a current input block.
Abstract: Methods and systems to apply motion estimation (ME) based on reconstructed reference pictures in a B frame or in a P frame at a video decoder. For a P frame, projective ME may be performed to obtain a motion vector (MV) for a current input block. In a B frame, both projective ME and mirror ME may be performed to obtain an MV for the current input block. The ME process can be performed on sub-partitions of the input block, which may reduce the prediction error without increasing the amount of MV information in the bitstream. Decoder-side ME can be applied for the prediction of existing inter frame coding modes, and traditional ME or the decoder-side ME can be adaptively selected to predict a coding mode based on a rate distortion optimization (RDO) criterion.

Journal ArticleDOI
TL;DR: This work improves the H.264 rate control scheme using two tools, the incremental proportional-integral-differential (PID) algorithm and frame complexity estimation, and decreases the average standard deviation of video quality by 32.29%.

Patent
04 Jun 2009
TL;DR: In this article, a method for reconstructing an erased speech frame is described, in which a second speech frame is received from a buffer and the erased frame is reconstructed from one or both of the second speech frame and a third speech frame.
Abstract: A method for reconstructing an erased speech frame is described. A second speech frame is received from a buffer. The index position of the second speech frame is greater than the index position of the erased speech frame. The type of packet loss concealment (PLC) method to use is determined based on one or both of the second speech frame and a third speech frame. The index position of the third speech frame is less than the index position of the erased speech frame. The erased speech frame is reconstructed from one or both of the second speech frame and the third speech frame.

Patent
邸佩云, 胡昌启, 元辉, 马彦卓, 常义林 
13 May 2009
TL;DR: In this paper, a method for key frame extraction from video data streams in a video service system is presented: the motion vector of each frame is obtained, a feature vector set is derived from the motion vectors, and frames where the direction or amplitude of the motion vectors changes abruptly are extracted as key frames.
Abstract: A key frame extraction method, a video service apparatus, and a video service system are disclosed, in which the method is applied to extract key frames from the video data stream in the video service system. The method comprises: obtaining the motion vector of each frame in the video data stream and deriving a feature vector set from the motion vectors; determining whether the direction and amplitude of the motion vectors corresponding to the feature vector sets of the two adjacent frames (forward and backward) change; and extracting the key frame using the result of this determination. In this way, frames whose motion changes abruptly can be extracted effectively.

Patent
16 Feb 2009
TL;DR: In this paper, the authors present a method for image processing, comprising receiving a video frame, coding a first portion of the video frame at a different quality than a second portion, based on an optical property.
Abstract: Systems and methods for image processing, comprising receiving a video frame, coding a first portion of the video frame at a different quality than a second portion of the video frame, based on an optical property, and displaying the video frame.

Patent
Masaaki Sasaki1
15 Dec 2009
TL;DR: When a displacement between a reference frame of a plurality of images acquired by continuous image pickup and a target frame is less than a first threshold indicating that such a frame is not likely to be affected by occlusion, smoothing is performed on an object area through a morphological operation.
Abstract: When a displacement between a reference frame of a plurality of images acquired by continuous image pickup and a target frame is less than a first threshold indicating that such a frame is not likely to be affected by occlusion, smoothing is performed on an object area through a morphological operation with a normal process amount. Conversely, when a displacement between the reference frame of the plurality of images acquired by continuous image pickup and a target frame is larger than or equal to the first threshold, smoothing is performed with the process amount of morphological operation being increased with respect to the normal process amount.

Patent
13 Nov 2009
TL;DR: In this article, a realigned second frame is produced by realigning the second frame with the first frame if the frames are determined to be misaligned, and luminance data from the realigned second frame and luminance values from the pixels of the first frame are used to determine if an undesired flicker condition exists.
Abstract: Circuitry, apparatus and methods provide flicker detection and improved image generation for digital cameras that employ image sensors. In one example, circuitry and methods are operative to compare a first captured frame with a second captured frame that may be, for example, sequential and consecutive or non-consecutive if desired, to determine misalignment of scene content between the frames. A realigned second frame is produced by realigning the second frame with the first frame if the frames are determined to be misaligned. Luminance data from the realigned second frame and luminance data from the pixels of the first frame are used to determine if an undesired flicker condition exists. If an undesired flicker condition is detected, exposure time control information is generated for output to the imaging sensor that captured the frame, to reduce flicker. This operation may be done, for example, during a preview mode for a digital camera, or may be performed at any other suitable time.

Patent
26 Nov 2009
TL;DR: In this article, a virtual predicted block is defined within memory to hold the pixel values of a reference block used in motion compensation with respect to a macroblock being reconstructed, which avoids the need to pad the entire reference frame.
Abstract: Methods and systems for decoding motion compensated video. In the decoding process a virtual predicted block is defined within memory to hold the pixel values of a reference block used in motion compensation with respect to a macroblock being reconstructed. If the reference block includes out-of-boundary pixels from the reference frame, the corresponding pixels within the virtual predicted block are padded using the boundary values of the reference frame. This avoids the need to pad the entire reference frame.
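
The on-the-fly padding can be illustrated by clamping the sample coordinates of the motion-compensated block to the frame borders, which reproduces edge-pixel padding without padding the whole reference frame in memory; block size, integer-pel motion, and the function name are illustrative assumptions rather than the patent's virtual-predicted-block implementation.

import numpy as np

def predicted_block(ref, top, left, mv_y, mv_x, size=16):
    """Motion-compensated prediction for a size x size block. Sample coordinates
    that fall outside the reference frame are clamped to the nearest boundary
    pixel, reproducing edge padding on the fly instead of padding the whole
    reference frame in memory."""
    H, W = ref.shape
    ys = np.clip(np.arange(top + mv_y, top + mv_y + size), 0, H - 1)
    xs = np.clip(np.arange(left + mv_x, left + mv_x + size), 0, W - 1)
    return ref[np.ix_(ys, xs)]

# Example: a motion vector pointing past the top-left corner of the frame.
ref = np.arange(64 * 64, dtype=np.uint8).reshape(64, 64)
block = predicted_block(ref, top=0, left=0, mv_y=-5, mv_x=-3)
print(block.shape)                            # (16, 16); out-of-frame samples reuse edge pixels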

Journal ArticleDOI
TL;DR: A lossless frame memory recompression scheme including a lossless pixel compression algorithm, an efficient address table organization method for random accessibility, and a frame memory placement scheme for compressed data to reduce the effective access time of SDRAM by suppressing row switching is proposed.
Abstract: In recent video applications such as MPEG or H.264/AVC, the bandwidth requirement for frame memory has become one of the most critical problems. Compressing pixel data before storing it in off-chip frame memory is required to alleviate this problem. In this paper, we propose a lossless frame memory recompression scheme including 1) a lossless pixel compression algorithm, 2) an efficient address table organization method for random accessibility, and 3) a frame memory placement scheme for compressed data that reduces the effective access time of SDRAM by suppressing row switching. Experimental results show that the proposed method reduces the frame data to 48% of the uncompressed size with an H.264/AVC high profile encoder system, where 6.1 kB of SRAM is required for the address table of full HD video.

Proceedings ArticleDOI
09 Jul 2009
TL;DR: This paper continues research on storage and bandwidth reduction for stereo images using reversible watermarking: by embedding into one frame of the stereo pair the information needed to recover the other frame, the transmission/storage requirements are halved.
Abstract: This paper continues our research on storage and bandwidth reduction for stereo images by using reversible watermarking. By embedding into one frame of the stereo pair the information needed to recover the other frame, the transmission/storage requirements are halved. Furthermore, the content of the image remains available and one of the two images is exactly recovered. The quality of the other frame depends on two features: the embedding bit-rate of the watermarking and the size of the information that needs to be embedded. This paper focuses on the second feature. Instead of a simple residual between the two frames, a disparity compensation scheme is used. The advantage is twofold. First, the quality of the recovered frame is improved. Second, at detection, the disparity frame is immediately available for 3D computation. Experimental results on standard test images are provided.