
Showing papers by "Houqiang Li published in 2013"


Journal ArticleDOI
TL;DR: The proposed JIGSAW+ is able to achieve 5% gain in terms of search performance and is ten times faster.
Abstract: This paper describes a novel multimodal interactive image search system on mobile devices. The system, the Joint search with ImaGe, Speech, And Word Plus (JIGSAW+), takes full advantage of the multimodal input and natural user interactions of mobile devices. It is designed for users who already have pictures in their minds but have no precise descriptions or names to address them. By describing it using speech and then refining the recognized query by interactively composing a visual query using exemplary images, the user can easily find the desired images through a few natural multimodal interactions with his/her mobile device. Compared with our previous work JIGSAW, the algorithm has been significantly improved in three aspects: 1) segmentation-based image representation is adopted to remove the artificial block partitions; 2) relative position checking replaces the fixed position penalty; and 3) inverted index is constructed instead of brute force matching. The proposed JIGSAW+ is able to achieve 5% gain in terms of search performance and is ten times faster.

111 citations


Journal ArticleDOI
TL;DR: This article proposes a novel geometric coding algorithm, to encode the spatial context among local features for large-scale partial-duplicate Web image retrieval, which achieves comparable performance to other state-of-the-art global geometric verification methods, but is more computationally efficient.
Abstract: Most large-scale image retrieval systems are based on the bag-of-visual-words model. However, the traditional bag-of-visual-words model does not capture the geometric context among local features in images well, which plays an important role in image retrieval. In order to fully explore geometric context of all visual words in images, efficient global geometric verification methods have been attracting lots of attention. Unfortunately, existing methods on global geometric verification are either too computationally expensive to ensure real-time response, or cannot handle rotation well. To solve the preceding problems, in this article, we propose a novel geometric coding algorithm, to encode the spatial context among local features for large-scale partial-duplicate Web image retrieval. Our geometric coding consists of geometric square coding and geometric fan coding, which describe the spatial relationships of SIFT features into three geo-maps for global verification to remove geometrically inconsistent SIFT matches. Our approach is not only computationally efficient, but also effective in detecting partial-duplicate images with rotation, scale changes, partial-occlusion, and background clutter. Experiments in partial-duplicate Web image search, using two datasets with one million Web images as distractors, reveal that our approach outperforms the baseline bag-of-visual-words approach, even following a RANSAC verification, in mean average precision. Moreover, our approach achieves comparable performance to other state-of-the-art global geometric verification methods, for example, the spatial coding scheme, but is more computationally efficient.
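The core idea of verifying matches by their spatial context can be illustrated with a toy left/right, above/below consistency check (illustrative only; the paper's geo-maps also include fan coding to handle rotation, which this simple check does not):

```python
import numpy as np

def relative_position_filter(pts_a, pts_b, max_inconsistency=0.5):
    """Keep only matches whose left/right and above/below relations to the
    other matches agree between the two images. pts_a[i] and pts_b[i] are
    the coordinates of the i-th matched keypoint in image A and image B."""
    pts_a = np.asarray(pts_a, float)
    pts_b = np.asarray(pts_b, float)
    n = len(pts_a)
    # Binary geo-maps: is match j to the right of / below match i?
    right_a = pts_a[None, :, 0] > pts_a[:, None, 0]
    right_b = pts_b[None, :, 0] > pts_b[:, None, 0]
    below_a = pts_a[None, :, 1] > pts_a[:, None, 1]
    below_b = pts_b[None, :, 1] > pts_b[:, None, 1]
    disagree = (right_a != right_b) | (below_a != below_b)
    np.fill_diagonal(disagree, False)
    # Fraction of other matches each match is inconsistent with.
    score = disagree.sum(axis=1) / max(n - 1, 1)
    return score <= max_inconsistency
```

A match placed far from where the geometry predicts disagrees with most other matches and is dropped, while translation or scaling of the whole point set leaves the relations intact.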

80 citations


Journal ArticleDOI
TL;DR: This scheme introduces several encoding and in-loop coding tools for depth and texture video coding, such as depth-based texture motion vector prediction, depth-range-based weighted prediction, joint inter-view depth filtering, and gradual view refresh.
Abstract: This paper presents a multiview-video-plus-depth coding scheme, which is compatible with the advanced video coding (H.264/AVC) standard and its multiview video coding (MVC) extension. This scheme introduces several encoding and in-loop coding tools for depth and texture video coding, such as depth-based texture motion vector prediction, depth-range-based weighted prediction, joint inter-view depth filtering, and gradual view refresh. The presented coding scheme is submitted to the 3D video coding (3DV) call for proposals (CfP) of the Moving Picture Experts Group standardization committee. When measured with commonly used objective metrics against the MVC anchor, the proposed scheme provides an average bitrate reduction of 26% and 35% for the 3DV CfP test scenarios with two and three views, respectively. The observed bitrate reduction is similar according to an analysis of the results obtained for the subjective tests on the 3DV CfP submissions.

74 citations


Proceedings ArticleDOI
19 May 2013
TL;DR: A Quantization Parameter (QP) refinement algorithm used in Rate-Distortion Optimization (RDO) process to improve the coding efficiency for High Efficiency Video Coding (HEVC) and the relationship between the QP value and Lagrange multiplier is analyzed.
Abstract: This paper presents a Quantization Parameter (QP) refinement algorithm used in the Rate-Distortion Optimization (RDO) process to improve the coding efficiency of High Efficiency Video Coding (HEVC). To minimize the RD cost, QP is one of the parameters that can be optimized. Usually, multiple-QP optimization can be applied to choose the best QP value. However, this kind of optimization increases the encoding complexity significantly. We first analyze the relationship between the QP value and the Lagrange multiplier, and then apply this relationship in the encoding process of HEVC. The proposed QP refinement method has already been adopted into HM (the HEVC reference software). The experimental results show that the proposed algorithm can achieve about 1.4%~1.9% bit saving for the luma component on average compared with the default anchor of HM-8.0. The bit saving on the chroma components is larger than that of the luma. The proposed method improves the coding efficiency without increasing the processing time.
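The QP/Lagrange-multiplier relationship such methods build on can be sketched with the rule of thumb widely used in H.264/HEVC encoders, lambda = 0.85 * 2^((QP - 12) / 3); this exact form is an assumption here, since the abstract does not state the paper's formula:

```python
import math

def qp_to_lambda(qp):
    # Common H.264/HEVC heuristic linking QP to the Lagrange multiplier.
    return 0.85 * 2.0 ** ((qp - 12) / 3.0)

def lambda_to_qp(lmbda):
    # Inverting the relation yields a refined QP for a target lambda,
    # avoiding a full multiple-QP search at every coding unit.
    return 12 + 3.0 * math.log2(lmbda / 0.85)

qp = 32
lam = qp_to_lambda(qp)
print(round(lambda_to_qp(lam)))  # round-trips to 32
```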

68 citations


Journal ArticleDOI
TL;DR: A novel multi-level frame interpolation scheme by exploiting the interactions among different levels based on their distinct characteristics and intertwined relationships that has superior performance over several classical schemes in both subjective visual quality and objective peak signal-to-noise ratio/structure similarity measurements.
Abstract: This paper proposes a novel multi-level frame interpolation scheme by exploiting the interactions among different levels. The proposed scheme includes three major stages that work at block level, pixel level, and sequence level, respectively. Effective algorithms are designed for each stage, i.e., block-level motion estimation with dropping unreliable motion vectors, pixel-level motion vector-guided partial scale-invariant feature transform flow matching, and sequence-level 3-D total variation regularized completion. Compared to traditional methods that focus mostly at one single level, the proposed scheme manages to recognize and utilize the interactions among the three levels based on their distinct characteristics and intertwined relationships. With a proper exploitation of interactions, unique advantages for each level can be effectively preserved while inherent limitations of a given level can be overcome by utilizing information from other levels. Extensive experiments have confirmed its superior performance over several classical schemes, in both subjective visual quality and objective peak signal-to-noise ratio/structure similarity measurements, and typical artifacts can be significantly reduced.

59 citations


Proceedings ArticleDOI
21 Oct 2013
TL;DR: A novel scale-based region growing algorithm to detect multilingual text in various fonts and sizes against complex backgrounds is presented, and offers insights on efficiently deploying local features in numerous applications, such as visual search.
Abstract: Scene text is widely observed in our daily life and has many important multimedia applications. Unlike document text, scene text usually exhibits large variations in font and language, and suffers from low resolution, occlusions, and complex background. In this paper, we present a novel scale-based region growing algorithm for scene text detection. We first distinguish SIFT features in text regions from those in background by exploring the inter- and intra-statistics of SIFT features. Then scene text regions in images are identified by scale-based region growing, which explores the geometric context of SIFT keypoints in local regions. Our algorithm is very effective in detecting multilingual text in various fonts and sizes against complex backgrounds. In addition, it offers insights on efficiently deploying local features in numerous applications, such as visual search. We evaluate our algorithm on three datasets and achieve the state-of-the-art performance.

28 citations


Journal ArticleDOI
TL;DR: A robust temporal-spatial decomposition (RTSD) model is proposed that treats video frames as a unity from both the temporal and spatial point of view, and demonstrates robustness to noise and certain background variations.
Abstract: In this paper, we propose a robust temporal-spatial decomposition (RTSD) model and discuss its applications in video processing. A video sequence usually possesses high correlations among and within its frames. Fully exploiting the temporal and spatial correlations enables efficient processing and better understanding of the video sequence. Considering that the video sequence typically contains slowly changing background and rapidly changing foreground as well as noise, we propose to decompose the video frames into three parts: the temporal-spatially correlated part, the feature compensation part, and the sparse noise part. Accordingly, the decomposition problem can be formulated as the minimization of a convex function, which consists of a nuclear norm, a total variation (TV)-like norm, and an l1 norm. Since the minimization is nontrivial to handle, we develop a two-stage strategy to solve this decomposition problem, and discuss different alternatives to fulfil each stage of decomposition. The RTSD model treats video frames as a unity from both the temporal and spatial point of view, and demonstrates robustness to noise and certain background variations. Experiments on video denoising and scratch detection applications verify the effectiveness of the proposed RTSD model and the developed algorithms.
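The convex cost described above combines three terms: a nuclear norm on the temporal-spatially correlated part, a TV-like norm on the feature compensation part, and an l1 norm on the sparse noise part. A minimal sketch of evaluating such a cost (the weights and the simple anisotropic TV here are illustrative, not the paper's exact formulation):

```python
import numpy as np

def rtsd_objective(L, F, S, alpha=0.1, beta=0.05):
    """Cost of a candidate decomposition X = L + F + S:
    nuclear norm of L (low-rank, temporal-spatially correlated part),
    anisotropic TV of F (feature compensation part), l1 norm of S
    (sparse noise part). alpha/beta are illustrative weights."""
    nuclear = np.linalg.svd(L, compute_uv=False).sum()
    tv = np.abs(np.diff(F, axis=0)).sum() + np.abs(np.diff(F, axis=1)).sum()
    l1 = np.abs(S).sum()
    return nuclear + alpha * tv + beta * l1
```

Minimizing this objective subject to the data constraint is what the paper's two-stage strategy solves; rank-one L, piecewise-constant F, and sparse S all keep the cost low.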

27 citations


Proceedings ArticleDOI
21 Oct 2013
TL;DR: This paper proposes a graph-based label propagation algorithm that employs neighborhood graph search to find the nearest neighbors on an image similarity graph built up with visual representations from deep neural networks and further aggregates their clicked queries/click counts to get the labels of the new image.
Abstract: Our objective is to estimate the relevance of an image to a query for image search purposes. We address two limitations of the existing image search engines in this paper. First, there is no straightforward way to bridge the gap between semantic textual queries, which reflect users' search intents, and image visual content. Image search engines therefore primarily rely on static and textual features. Visual features are mainly used to identify potentially useful recurrent patterns or relevant training examples for complementing search by image reranking. Second, image rankers are trained on query-image pairs labeled by human experts, making the annotation intellectually expensive and time-consuming. Furthermore, the labels may be subjective when the queries are ambiguous, resulting in difficulty in predicting the search intention. We demonstrate that the aforementioned two problems can be mitigated by exploring the use of click-through data, which can be viewed as the footprints of user searching behavior, as an effective means of understanding queries. The correspondences between an image and a query are determined by whether the image was searched and clicked by users under the query in a commercial image search engine. We therefore hypothesize that the image click counts in response to a query serve as its relevance indications. For each new image, our proposed graph-based label propagation algorithm employs neighborhood graph search to find the nearest neighbors on an image similarity graph built up with visual representations from deep neural networks, and further aggregates their clicked queries/click counts to get the labels of the new image. We conduct experiments on the MSR-Bing Grand Challenge and the results show consistent performance gain over various baselines. In addition, the proposed approach is very efficient, completing annotation of each query-image pair within just 15 milliseconds on a regular PC.
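The aggregation step can be illustrated with a toy nearest-neighbor sketch (the real system uses deep-network features and a precomputed similarity graph; the plain Euclidean search and small dictionaries below are placeholders):

```python
import numpy as np
from collections import Counter

def propagate_click_labels(new_feat, db_feats, db_clicks, k=3):
    """db_clicks[i] maps a clicked query string to its click count for
    database image i. Labels for a new image are obtained by finding its
    k visually nearest database images and summing their click counts
    per query."""
    dists = np.linalg.norm(db_feats - new_feat, axis=1)
    neighbors = np.argsort(dists)[:k]
    agg = Counter()
    for i in neighbors:
        agg.update(db_clicks[i])
    return agg.most_common()  # queries ranked by aggregated clicks
```

The top-ranked queries then act as the new image's labels, with the aggregated click counts as relevance indications.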

22 citations


Proceedings ArticleDOI
19 May 2013
TL;DR: A novel wireless video transmission scheme named HDA-Cast is proposed, which is a hybrid digital-analog (HDA) coding scheme that integrates the advantages of digital coding and analog coding that avoids the "cliff effect" and can be regarded as a kind of wireless scalable video coding (WSVC).
Abstract: In this paper, we propose a novel wireless video transmission scheme named HDA-Cast, which is a hybrid digital-analog (HDA) coding scheme that integrates the advantages of digital coding and analog coding. Relative to most state-of-the-art video transmission methods, it avoids the "cliff effect" provided that the channel quality is within the expected range, gives better fairness among all receivers for multicast, and has strong adaptation to channel variation. The evaluation results show that our HDA-Cast is 3.5-9.6 dB better than SoftCast, an up-to-date analog scheme. Owing to its strong adaptation to channel variation, it can be regarded as a kind of wireless scalable video coding (WSVC).

22 citations


Journal ArticleDOI
TL;DR: This paper investigates the characteristics of blotches and scratches in the space and time domains, and proposes a novel detection method based on two main steps: cartoon-texture decomposition in the space domain and content-defect separation in the time domain.
Abstract: In old video restoration, automatic detection of common defects, e.g., scratches and blotches, has always been emphasized. While prior thoughts mainly focus on detecting blotches and linear, vertical scratches separately, this paper contributes to a more generalized and challenging issue: simultaneous detection of blotches and complex scratches in video, with much less knowledge of them. We investigate the characteristics of blotches and scratches in space and time domain, and propose a novel detection method based on two main steps: cartoon-texture decomposition in the space domain and content-defect separation in the time domain. We then formulate it into convex optimization problems and develop corresponding algorithms. The experiment results demonstrate that the proposed method is of high detection accuracy, verifying the effectiveness of our detection via a video decomposition method.

21 citations


Journal ArticleDOI
17 Oct 2013
TL;DR: A novel approach to mobile visual localization according to a given image (typically associated with a rough GPS position) is presented, capable of providing a complete set of more accurate parameters about the scene geo-context including the real locations of both the mobile user and perhaps more importantly the captured scene, as well as the viewing direction.
Abstract: Mobile applications are becoming increasingly popular. More and more people are using their phones to enjoy ubiquitous location-based services (LBS). The increasing popularity of LBS creates a fundamental problem: mobile localization. Besides traditional localization methods that use GPS or wireless signals, using phone-captured images for localization has drawn significant interest from researchers. Photos contain more scene context information than the embedded sensors, leading to a more precise location description. With the goal being to accurately sense real geographic scene contexts, this article presents a novel approach to mobile visual localization according to a given image (typically associated with a rough GPS position). The proposed approach is capable of providing a complete set of more accurate parameters about the scene geo-context, including the real locations of both the mobile user and, perhaps more importantly, the captured scene, as well as the viewing direction. To make image localization quick and accurate, we investigate various techniques for large-scale image retrieval and 2D-to-3D matching. Specifically, we first generate scene clusters using joint geo-visual clustering, with each scene being represented by a reconstructed 3D model from a set of images. The 3D models are then indexed using a visual vocabulary tree structure. Taking geo-tags of the database images as prior knowledge, a novel location-based codebook weighting scheme is proposed to embed this additional information into the codebook. The discriminative power of the codebook is enhanced, thus leading to better image retrieval performance. The query image is aligned with the models obtained from the image retrieval results, and eventually registered to a real-world map.
We evaluate the effectiveness of our approach using several large-scale datasets, achieving estimation accuracy of a user's location within 13 meters, viewing direction within 12 degrees, and viewing distance within 26 meters. Of particular note is our showcase of three novel applications based on the localization results: (1) an on-the-spot tour guide, (2) collaborative routing, and (3) a sight-seeing guide. The evaluations through user studies demonstrate that these applications are effective in facilitating the ideal rendezvous for mobile users.

Proceedings ArticleDOI
01 Sep 2013
TL;DR: A modified MVC+D coding scheme, where only the base view is coded at the original resolution whereas dependent views are coded at reduced resolution, to enable inter-view prediction.
Abstract: The emerging MVC+D standard specifies the coding of Multiview Video plus Depth (MVD) data for enabling advanced 3D video applications. The MVC+D specifications define the coding of all views of MVD at equal spatial resolution and apply a conventional MVC technique for coding the multiview texture and the depth independently. This paper presents a modified MVC+D coding scheme, where only the base view is coded at the original resolution whereas dependent views are coded at reduced resolution. To enable inter-view prediction, the base view is downsampled within the MVC coding loop to provide a relevant reference for dependent views. At the decoder side, the proposed scheme includes a post-processing step that upsamples the decoded views to their original resolution. The proposed scheme is compared against the original MVC+D scheme, and an average of 4% delta bitrate reduction (dBR) in the coded views and 14.5% dBR in the synthesized views are reported.

Proceedings ArticleDOI
15 Jul 2013
TL;DR: This work proposes an effective and efficient block matching method, Semantic-Spatial Matching (SSM), in which not only the spatial layout but also the semantic content is considered for block matching.
Abstract: Spatial Pyramid Matching (SPM) has been proven a simple but effective extension to bag-of-visual-words image representation for spatial layout information compensation. SPM describes an image at coarse-to-fine scales by partitioning the image into blocks over multiple levels; the features extracted from each block are concatenated into a long vector representation. Based on the assumption that images from the same class have similar spatial configurations, SPM matches the blocks from different images according to their spatial layout, by aligning all blocks from an image in a fixed spatial order. However, target objects may appear at any location in the image with various backgrounds. Therefore, the fixed spatial matching in SPM fails to match similar objects located at different locations. To solve this problem, we propose an effective and efficient block matching method, Semantic-Spatial Matching (SSM), in which not only the spatial layout but also the semantic content is considered for block matching. The experiments on two benchmark image classification datasets demonstrate the effectiveness of SSM.
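The SPM representation that SSM extends can be sketched as follows: partition the image into 1x1, 2x2, 4x4, ... blocks and concatenate per-block visual-word histograms (a standard SPM sketch; SSM would additionally re-order block correspondences by semantic content rather than fixed position, which is not shown here):

```python
import numpy as np

def spm_histogram(points, words, vocab_size, levels=2):
    """Spatial pyramid representation of an image.
    points: Nx2 keypoint coordinates normalized to [0, 1)^2.
    words:  N visual-word ids assigned to those keypoints."""
    feats = []
    for lvl in range(levels + 1):
        cells = 2 ** lvl  # cells x cells grid at this pyramid level
        for cy in range(cells):
            for cx in range(cells):
                in_cell = ((points[:, 0] * cells).astype(int) == cx) & \
                          ((points[:, 1] * cells).astype(int) == cy)
                feats.append(np.bincount(words[in_cell], minlength=vocab_size))
    return np.concatenate(feats)  # fixed block order, as in plain SPM
```

The concatenation order is fixed, which is exactly the limitation the abstract points out: two images of the same object at different positions produce poorly aligned vectors.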

Proceedings ArticleDOI
06 Jul 2013
TL;DR: A novel scheme for video denoising based on improved matrix recovery strategy that attempts to go beyond the conventional approaches that focus on the rank properties of the matrix by making use of a priori knowledge derived from the characteristics of video and noise.
Abstract: This article presents a novel scheme for video denoising based on improved matrix recovery strategy. The proposed scheme attempts to go beyond the conventional approaches that focus on the rank properties of the matrix by making use of a priori knowledge derived from the characteristics of video and noise. In this paper, we will first demonstrate that the conventional approach such as robust PCA (principal component analysis) is not effective when the video is corrupted by the mixture of impulse and Gaussian noises. The impulse noise can be considered sparse in the image domain and can be effectively filtered by matrix recovery. However, the dense Gaussian noise cannot be easily filtered because it is not sparse in either spatial or frequency domain. We shall show that this Gaussian noise corrupted video can be considered sparse in the 3D total variation domain. Based on this, we formulate the problem as a 3D total variation optimization and design an algorithm to solve this convex problem efficiently. Experimental results show that the proposed scheme achieves noticeable improvement over the state-of-the-art algorithm VBM3D [5].
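The 3D total variation the formulation relies on can be sketched directly: the sum of absolute differences along the temporal and both spatial axes of the video volume (an anisotropic form chosen for illustration; the paper's exact formulation and solver are not given in the abstract):

```python
import numpy as np

def tv3d(video):
    """Anisotropic 3D total variation of a video volume shaped (T, H, W).
    Dense Gaussian noise inflates this value, so minimizing it subject to
    a data-fidelity constraint suppresses that noise."""
    dt = np.abs(np.diff(video, axis=0)).sum()  # temporal differences
    dy = np.abs(np.diff(video, axis=1)).sum()  # vertical differences
    dx = np.abs(np.diff(video, axis=2)).sum()  # horizontal differences
    return dt + dy + dx
```

A static, spatially smooth video has near-zero 3D TV, while per-pixel Gaussian noise contributes to every difference term, which is what makes the noise "sparse" in this domain.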

Proceedings ArticleDOI
19 May 2013
TL;DR: An adaptive weighted distortion optimization algorithm is introduced to improve the coding efficiency of the High Efficiency Video Coding (HEVC) and can be applied to other coding schemes such as H.264/MPEG-4 AVC.
Abstract: This paper presents an adaptive weighted distortion optimization algorithm used in the Rate-Distortion Optimization (RDO) process of High Efficiency Video Coding (HEVC). RDO is an important tool to improve the coding efficiency. Usually the distortion weights of different color components are equal or predetermined. In this paper, an adaptive weighted distortion optimization algorithm is introduced to improve the coding efficiency. The distortion weight is estimated according to the previously coded pictures belonging to the same temporal level, such that the encoding complexity is almost unchanged. With the proposed adaptive weighted distortion optimization method, on average about 3.3% and up to 10.6% bit-saving is obtained based on the latest HEVC reference software, HM-8.0, and the corresponding common test conditions. The proposed algorithm can also be applied to other coding schemes such as H.264/MPEG-4 AVC.
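The weighted distortion idea amounts to a small change to the usual RD cost J = D + lambda * R, splitting D per color component (the weight values and their estimation from previously coded pictures are the paper's contribution; the fixed defaults below are only placeholders):

```python
def weighted_rd_cost(dist_y, dist_cb, dist_cr, rate, lam, w_cb=1.0, w_cr=1.0):
    """RD cost with per-component distortion weights:
        J = D_Y + w_Cb * D_Cb + w_Cr * D_Cr + lambda * R
    In the proposed method, w_Cb and w_Cr would be estimated from
    previously coded pictures of the same temporal level."""
    return dist_y + w_cb * dist_cb + w_cr * dist_cr + lam * rate
```

Raising a chroma weight steers mode decisions toward candidates with lower chroma distortion, at essentially no extra encoding cost since the weights are reused across a temporal level.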

Proceedings ArticleDOI
01 Dec 2013
TL;DR: It is proposed to use the same coding approach as taken in SHVC and MV-HEVC for coding of MVD data and an inter-component motion vector prediction (ICP) method is introduced to exploit the redundancy between texture and depth views.
Abstract: The multiview-video-plus-depth (MVD) format has become popular in representing 3D video content in a manner that enables free-viewpoint capability in the decoder side. The scalable (SHVC) and multiview (MV-HEVC) extensions of the High Efficiency Video Coding standard (H.265/HEVC) enable their functionality without additional coding tools and use the same high level syntax. In this paper, it is proposed to use the same coding approach as taken in SHVC and MV-HEVC for coding of MVD data. Furthermore, an inter-component motion vector prediction (ICP) method, which is realized through the temporal motion vector prediction (TMVP) mechanism of H.265/HEVC, is introduced to exploit the redundancy between texture and depth views. The experimental results show that 1.0% and 3.6% bitrate reduction can be achieved by ICP compared to independent coding of texture and depth for synthesized views and depth views, respectively, when the ICP method is applied to a 3-view coding scenario intended for multi-view autostereoscopic displays.

Proceedings ArticleDOI
01 Sep 2013
TL;DR: The experimental results show that the proposed algorithm leads to significant bit saving compared with the default anchor of AVS, and improves the coding efficiency without increasing the processing time.
Abstract: This paper presents a Lagrange multiplier determination method and a Quantization Parameter (QP) refinement algorithm used in the Rate-Distortion Optimization (RDO) process to improve the coding efficiency of the Audio Video Coding Standard (AVS) (IEEE 1857). This paper investigates the Lagrange multiplier setting problem for different kinds of pictures in the encoding process of AVS. Once the Lagrange multiplier is determined, the optimal QP that minimizes the RD (Rate-Distortion) cost can be selected by multiple-QP optimization. However, this kind of optimization increases the encoding time significantly. We therefore analyze the relationship between the QP value and the Lagrange multiplier, and apply this relationship in the encoding process of AVS to refine the predetermined QP value. The experimental results show that the proposed algorithm leads to significant bit saving compared with the default anchor of AVS. The proposed method improves the coding efficiency without increasing the processing time.

Journal ArticleDOI
TL;DR: Simulation results suggest that the proposed subpixel-based downsampling method can achieve sharper down-sampled gray/font images compared with conventional pixel-based and subpixel-based methods, without noticeable color fringing artifacts.
Abstract: In general, subpixel-based downsampling can achieve higher apparent resolution of the down-sampled images on LCD or OLED displays than pixel-based downsampling. With the frequency domain analysis of subpixel-based downsampling, we discover special characteristics of the luma-chroma color transform choice for monochrome images. With these, we model the anti-aliasing filter design for subpixel-based monochrome image downsampling as a human visual system-based optimization problem with a two-term cost function and obtain a closed-form solution. One cost term measures the luminance distortion and the other term measures the chrominance aliasing in our chosen luma-chroma space. Simulation results suggest that the proposed method can achieve sharper down-sampled gray/font images compared with conventional pixel-based and subpixel-based methods, without noticeable color fringing artifacts.

Proceedings ArticleDOI
21 Jul 2013
TL;DR: A noise reduction approach for hyperspectral images (HSIs) is presented, drawing inspiration from low-rank matrix decomposition and the emerging mixed norm to propose a method dealing with various patterns of noise simultaneously.
Abstract: In this paper, a noise reduction approach for hyperspectral images (HSIs) is presented. Due to the assorted noise sources of HSIs, it seems difficult to describe the noise in a concise manner. Commonly, noise reduction algorithms are dedicated to a certain kind of noise, such as random or striping noise. Most of them in addition have somewhat idealized hypotheses. For example, the random noise is white or signal-independent, or the observed scene is spatially homogeneous or quasi-homogeneous. Thus a practically efficient and universal denoising method is preferred. Thanks to the low-rank characteristic of HSI signal, and the structural sparsity of HSI noise, we draw inspiration from low-rank matrix decomposition and the emerging mixed norm, to propose a method dealing with various patterns of noise simultaneously. Both simulated and real data experiments show the effectiveness of the proposed approach.

Proceedings ArticleDOI
19 May 2013
TL;DR: Experimental results on high-resolution remote sensing stereo images demonstrate that the proposed scheme is comparable to JPEG2000 with respect to the compression performance, but with much lower encoding complexity and storage requirement.
Abstract: On-board compression coding of remote sensing images requires a simple encoder and small storage. However, existing schemes such as JPEG2000, which is based on the DWT, have a complex encoder, and many DWT-based algorithms with lower storage requirements have been studied. In this paper, we propose a novel line-based distributed lossless compression scheme for remote sensing stereo images with low storage costs and light complexity at the encoder. All the lines are encoded independently, so that the required storage at the encoder is only one line. In order to achieve low encoding complexity, distributed coding techniques are used to exploit the spatial correlation and inter-view correlation of the stereo images. At the encoder, sub-sampled lines are successively encoded and transmitted. At the decoder, side information is generated from the decoded sub-sampled lines and other previously decoded lines for the first view, while the second view uses the previous lines of the first view to remove the inter-view redundancy. In addition, a line-based adaptive filter is performed to capture the spatial characteristics, and template matching is used to remove the inter-view redundancy. Experimental results on high-resolution remote sensing stereo images demonstrate that the proposed scheme is comparable to JPEG2000 with respect to the compression performance, but with much lower encoding complexity and storage requirement.

Book ChapterDOI
07 Jan 2013
TL;DR: This paper presents a scalable video rewriting system which is featured by the ability to rewrite spatial enhancement layers and region-of-interest (ROI) of enhancement layers, which is suitable for more application scenarios and is more flexible.
Abstract: Scalable Video Coding (SVC), as an extension of H.264/AVC, has been designed to provide an H.264/AVC-compatible base layer and spatial, temporal, and quality enhancement layers. Bit-stream rewriting in the SVC standard makes it possible to convert a quality enhancement layer to an H.264/AVC bit-stream, so that H.264/AVC decoder users can also experience high-quality video content when network conditions and hardware permit. In this paper, we present a scalable video rewriting system which is featured by the ability to rewrite spatial enhancement layers and region-of-interest (ROI) of enhancement layers. Compared to traditional rewriting, the proposed system is suitable for more application scenarios and is more flexible.

Proceedings ArticleDOI
02 Dec 2013
TL;DR: It is proposed to change the inter-view prediction direction periodically together with gradual view refresh to balance the quality difference between views, and experimental results show that up to 9.1% and an average of 3.1% bitrate reduction are achieved with respect to the MPEG Anchor.
Abstract: Asymmetric stereoscopic video coding, where one view has a higher quality than the other, has been researched widely. In order to balance the quality difference between views, it has been suggested to alternate the higher-quality view periodically. However, it has also been found out that inter-view prediction is more effective when the source of prediction is the higher-quality view. In this paper, it is proposed to change the inter-view prediction direction periodically together with gradual view refresh. The proposal was implemented on top of the reference software for the multiview and multiview-plus-depth extensions of the High Efficiency Video Coding (H.265/HEVC) standard. Experimental results show that up to 9.1% and an average of 3.1% bitrate reduction are achieved with respect to the MPEG Anchor.

Proceedings ArticleDOI
19 May 2013
TL;DR: This paper focuses on SVC video packet transmission from Ethernet to 802.11 and proposes an application layer adaptive video packets encapsulation method, which appends the truncated enhancement layer slice of SVC into the idle space of the 802.
Abstract: In recent years, IEEE 802.11 wireless local area networks (WLANs) have been widely deployed. However, robust streaming of video over 802.11 WLANs still faces many challenges, such as bandwidth variation, diversified receivers, and transportation over heterogeneous networks. Scalable Video Coding (SVC), an extension of Advanced Video Coding (H.264/AVC), has been designed for heterogeneous networks. This paper focuses on SVC video packet transmission from Ethernet to 802.11 and proposes an application-layer adaptive video packet encapsulation method, which appends the truncated enhancement layer slice of SVC into the idle space of the 802.11 frame. The proposed method can efficiently utilize the bandwidth of 802.11 and improve the received video quality. The simulation results show that the proposed method achieves an average gain of about 0.7 dB compared with conventional algorithms.

Proceedings ArticleDOI
20 Mar 2013
TL;DR: A novel low bit-rate compression scheme with subpixel-based down-sampling and reconstruction (SPDR) for full color images that offers complete standard compliance, competitive rate-distortion performance, and superior subjective quality.
Abstract: We propose a novel low bit-rate compression scheme with subpixel-based down-sampling and reconstruction (SPDR) for full color images. In the encoder stage, a decoder-dependent multi-channel subpixel-based down-sampling is proposed, which is more effective in retaining high frequency detail than the conventional pixel-based process. The decoder first decompresses the low-resolution image and then up-converts it to the original resolution using an encoder-dependent subpixel-based reconstruction scheme, by jointly considering the subpixel-based down-sampling effect and the compression degradation. Compared to existing algorithms with comparable encoder and decoder complexity, the proposed SPDR offers complete standard compliance, competitive rate-distortion performance, and superior subjective quality.