
Showing papers by "Houqiang Li published in 2010"


Proceedings ArticleDOI
25 Oct 2010
TL;DR: This paper proposes a novel scheme, spatial coding, to encode the spatial relationships among local features in an image, and achieves a 53% improvement in mean average precision and 46% reduction in time cost over the baseline bag-of-words approach.
Abstract: The state-of-the-art image retrieval approaches represent images with a high dimensional vector of visual words by quantizing local features, such as SIFT, in the descriptor space. The geometric clues among visual words in an image are usually ignored or exploited only for full geometric verification, which is computationally expensive. In this paper, we focus on partial-duplicate web image retrieval, and propose a novel scheme, spatial coding, to encode the spatial relationships among local features in an image. Our spatial coding is both efficient and effective at discovering false matches of local features between images, and can greatly improve retrieval performance. Experiments in partial-duplicate web image search, using a database of one million images, reveal that our approach achieves a 53% improvement in mean average precision and a 46% reduction in time cost over the baseline bag-of-words approach.
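To make the scheme concrete, spatial coding can be pictured as binary maps that record, for every pair of matched features, their relative left/right and above/below positions; map rows that disagree between the two images vote against the corresponding match. The sketch below is an illustrative simplification, not the authors' implementation, and the function names and voting rule are hypothetical:

```python
def spatial_maps(points):
    """Binary maps of relative positions: xmap[i][j] = 1 iff feature j lies to
    the right of feature i; ymap[i][j] = 1 iff feature j lies above feature i."""
    n = len(points)
    xmap = [[1 if points[j][0] > points[i][0] else 0 for j in range(n)] for i in range(n)]
    ymap = [[1 if points[j][1] > points[i][1] else 0 for j in range(n)] for i in range(n)]
    return xmap, ymap

def match_votes(points_q, points_c):
    """Per-match count of relative-position disagreements between two images;
    matches accumulating the largest counts are likely false matches."""
    xq, yq = spatial_maps(points_q)
    xc, yc = spatial_maps(points_c)
    n = len(points_q)
    return [sum((xq[i][j] ^ xc[i][j]) + (yq[i][j] ^ yc[i][j]) for j in range(n))
            for i in range(n)]
```

A geometrically consistent pair of matched point sets yields zero votes everywhere; shuffling one point's position raises votes for the affected matches.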

248 citations


Journal ArticleDOI
01 May 2010-Allergy
TL;DR: Opposing roles of IL‐17A and IL‐25 in the regulation of TSLP production in human nasal epithelial cells are studied.
Abstract: To cite this article: Xu G, Zhang L, Wang DY, Xu R, Liu Z, Han DM, Wang XD, Zuo KJ, Li HB. Opposing roles of IL-17A and IL-25 in the regulation of TSLP production in human nasal epithelial cells. Allergy 2010; 65: 581–589. Abstract Background: The importance of IL-17A, IL-17F, and IL-25 in allergic rhinitis (AR), as well as their possible role in the regulation of thymic stromal lymphopoietin (TSLP) production in nasal epithelial cells, is not well understood. Objective: To determine the possible regulatory effects of IL-17A, IL-17F, and IL-25 on TSLP production in the initiation of allergic responses. Methods: The levels of IL-17A, IL-17F, IL-25, and TSLP in nasal lavages of patients with AR were measured using an enzyme-linked immunosorbent assay (ELISA) and compared with those in normal controls. Then, primary human nasal epithelial cells (HNECs) were stimulated with dsRNA (0–75 μg/ml), as well as IL-17A (100 ng/ml), IL-17F (100 ng/ml), and IL-25 (100 ng/ml). The mRNA expression of IL-17A, IL-17F, IL-25, TSLP, as well as the chemokines CCL20, IL-8, and eotaxin, was analyzed using quantitative real-time PCR, and their protein levels in the supernatants of cultured HNECs were determined by ELISA. Results: Both TSLP and IL-17 cytokines were significantly elevated in patients with AR. dsRNA was found to increase the production of IL-17F, IL-25, TSLP, CCL20, and IL-8 in HNECs. Furthermore, IL-25 significantly enhanced dsRNA-induced TSLP production in primary HNECs and was dominant over the inhibitory effect of IL-17A on TSLP regulation. Conclusions: Our study provides the first evidence that both IL-17F and IL-25 can be induced by dsRNA in HNECs. Despite the opposing effects of IL-17A and IL-25 on TSLP regulation in HNECs, IL-25 was dominant over IL-17A, providing a plausible explanation for the simultaneous upregulation of IL-17 cytokines and TSLP in patients with AR.

86 citations


Proceedings ArticleDOI
03 Dec 2010
TL;DR: The joint multiview video plus depth coding (JMVDC) scheme presented in this paper uses inter-view prediction similarly to MVC as well as the inter-layer motion prediction tool of the Scalable Video Coding (SVC) standard to exploit the correlation of the motion in the texture and depth video sequences.
Abstract: Multiview video plus depth (MVD), where per-pixel depth map sequences are associated with multiview texture video, is a promising approach for three-dimensional (3D) video solutions requiring view synthesis. The Multiview Video Coding (MVC) standard has typically been used to compress the MVD representation, resulting in the multiview texture video and the multiview depth video being coded in two separate MVC bitstreams. Such a coding scheme does not, however, utilize the similarities between the motion information of the multiview texture video and the multiview depth video. The joint multiview video plus depth coding (JMVDC) scheme presented in this paper uses inter-view prediction similarly to MVC as well as the inter-layer motion prediction tool of the Scalable Video Coding (SVC) standard to exploit the correlation of the motion in the texture and depth video sequences. The simulation results show that the JMVDC method achieves about a 10% to 20% saving in depth bitrate compared with conventional MVC-based coding of MVD representations.

42 citations


Proceedings ArticleDOI
03 Dec 2010
TL;DR: A novel low complexity yet very efficient rate control algorithm for Scalable Video Coding (SVC) based on the ρ-domain model which can be adaptively estimated by utilizing either temporal or inter-layer information and leads to significant improvement in estimation accuracy.
Abstract: In this paper, we present a novel low complexity yet very efficient rate control algorithm for Scalable Video Coding (SVC) based on the ρ-domain model. Compared with the conventional ρ-domain model for determining the quantization parameter, the proposed algorithm adopts a new linear model to obtain the quantization parameter at frame level. This linear model characterizes the relationship between the percentage of zeros among the quantized coefficients and the quantization step. Since the model considers the Mean Absolute Difference (MAD) of each frame, each frame needs to be encoded only once to control the bit-rate. In particular, the model parameter in the ρ-domain model can be adaptively estimated by utilizing either temporal or inter-layer information, which leads to a significant improvement in estimation accuracy. Experimental results show that the proposed algorithm achieves accurate bit-rates and keeps the maximum mismatch between the actual and target bit-rates within 0.5%. More importantly, the proposed low complexity algorithm obtains a PSNR improvement of up to 0.6 dB compared with the algorithm adopted in the Joint Scalable Video Model (JSVM).
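The scheme rests on the ρ-domain observation that frame bit rate is roughly linear in the fraction ρ of zero quantized coefficients, R = θ·(1 − ρ). The sketch below shows that core step; the linear (1 − ρ) ≈ c·MAD/Qstep mapping and the constant `c` are illustrative assumptions of this sketch, not the paper's calibrated model:

```python
def target_rho(target_bits, theta):
    """rho-domain model R = theta * (1 - rho): solve for the target fraction
    of zero quantized coefficients given a frame bit budget."""
    return max(0.0, min(1.0, 1.0 - target_bits / theta))

def qstep_from_rho(rho, mad, c=1.0):
    """Hypothetical linear model (1 - rho) ~ c * MAD / Qstep, solved for the
    quantization step of the current frame."""
    return c * mad / max(1e-6, 1.0 - rho)
```

In practice θ would be estimated from previously coded frames (temporally or from a lower layer), which is where the paper's adaptive estimation comes in.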

28 citations


Proceedings ArticleDOI
14 Mar 2010
TL;DR: This paper proposes a novel bispace quantization strategy that improves the discriminative power and reduces the ambiguity of visual words in web image search by quantizing local features first in descriptor space and then in orientation space.
Abstract: The state-of-the-art image retrieval approaches represent images with a high dimensional vector of visual words by quantizing local features, such as SIFT, solely in descriptor space. The resulting visual words usually suffer from the dilemma of discrimination and ambiguity. Besides, geometric relationships among visual words are usually ignored or only used for post-processing such as re-ranking. In this paper, to improve the discriminative power and reduce the ambiguity of visual words, we propose a novel bispace quantization strategy. Local features are quantized to visual words first in descriptor space and then in orientation space. Moreover, geometric consistency constraints are embedded into the relevance formulation. Experiments in web image search with a database of one million images show that our approach achieves an improvement of 654% over the baseline bag-of-words approach.
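The bispace idea can be sketched as forming a composite visual word from a descriptor-space cluster id plus an orientation bin, so two features that share a descriptor word but have different dominant orientations no longer collide. In this sketch the cluster assignment is assumed to be precomputed (e.g., by k-means over SIFT descriptors) and the bin count is a hypothetical parameter:

```python
import math

def bispace_word(descriptor_cluster, orientation, n_orient_bins=8):
    """Composite visual word: quantize in descriptor space (cluster id assumed
    given), then refine by the feature's dominant orientation in radians."""
    o = orientation % (2 * math.pi)
    obin = int(o / (2 * math.pi) * n_orient_bins) % n_orient_bins
    return descriptor_cluster * n_orient_bins + obin
```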

13 citations


Proceedings ArticleDOI
14 Mar 2010
TL;DR: A coding and transmission scheme for streaming to clients of different display capabilities that saves transmission bandwidth by sending only those views that are displayed in a receiver and enables viewpoint switching through inter-view-predicted redundant pictures representing different sets of views being switched from and switched to.
Abstract: Interactive selection of the desired viewpoint is one important feature in multiview video streaming. When receivers differ in the number of simultaneously displayed views, it is challenging to optimize the multiview coding structure particularly when it comes to the use of inter-view prediction and anchor picture frequency. This paper presents a coding and transmission scheme for streaming to clients of different display capabilities. The scheme saves transmission bandwidth by sending only those views that are displayed in a receiver. Furthermore, the scheme enables viewpoint switching through inter-view-predicted redundant pictures representing different sets of views being switched from and switched to. When switching happens at a switch point, an appropriate redundant picture is adaptively chosen based on which views were transmitted before and will be transmitted after the switch point. Experimental results show that the proposed scheme improves the rate-distortion (RD) performance compared to simple anchor picture insertion.

12 citations


Proceedings ArticleDOI
19 Jul 2010
TL;DR: A depth-level-adaptive view synthesis algorithm is presented to reduce the amount of artifacts and to improve the quality of the synthesized images.
Abstract: In the multiview video plus depth (MVD) representation for 3D video, a depth map sequence is coded for each view. At the decoder, a view synthesis algorithm is used to generate virtual views from the depth map sequences. Many of the known view synthesis algorithms introduce rendering artifacts, especially at object boundaries. In this paper, a depth-level-adaptive view synthesis algorithm is presented to reduce such artifacts and to improve the quality of the synthesized images. The proposed algorithm introduces awareness of the depth level so that no pixel value in the synthesized image is derived from pixels of more than one depth level. Improvements in the objective quality of the synthesized views were achieved in five out of eight test cases, while the subjective quality of the proposed method was similar to or better than that of the view synthesis method used by the Moving Picture Experts Group (MPEG).

10 citations


Proceedings ArticleDOI
05 Jul 2010
TL;DR: A novel scheme of latent visual context analysis (LVCA) for image re-ranking, which argues that image significance is determined by the visual word context it contains, analyzed through Latent Semantic Analysis (LSA) and a visual word link graph.
Abstract: Currently, Web image search is mostly implemented as text retrieval based on the textual information extracted from the Web page associated with the image. Since the text in the Web page may not match the image content, image search re-ranking is preferable to refine text-based search results. In this paper, we propose a novel scheme of latent visual context analysis (LVCA) for image re-ranking. The latent visual context is explored in both latent semantic context and visual link graphs. We argue that the significance of an image is determined by the context of the visual words it contains, which is analyzed through Latent Semantic Analysis (LSA) and a visual word link graph. With the visual word context information, the image context is explored by analyzing an image link graph, and the significance value of each image can be inferred by VisualRank. In both the visual word link graph and the image link graph, a latent layer is incorporated to effectively discover the visual context. We validate our approach on text-query-based search results returned by Google Image. Experimental results show that our method improves both accuracy and efficiency over the state-of-the-art VisualRank algorithm.
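VisualRank, which the method builds on and is compared against, is essentially PageRank run over a visual-similarity graph: significant images are those strongly linked to other significant images. A minimal power-iteration sketch, where the similarity matrix, damping factor, and iteration count are illustrative rather than the paper's setup:

```python
def visualrank(sim, alpha=0.85, iters=50):
    """PageRank-style power iteration over a visual-similarity matrix:
    r_i = (1 - alpha)/n + alpha * sum_j S[i][j] * r_j, S column-normalized."""
    n = len(sim)
    cols = [sum(sim[i][j] for i in range(n)) for j in range(n)]  # column sums
    r = [1.0 / n] * n
    for _ in range(iters):
        r = [(1 - alpha) / n
             + alpha * sum(sim[i][j] * r[j] / cols[j] for j in range(n) if cols[j] > 0)
             for i in range(n)]
    return r
```

Because the similarity columns are normalized, the ranks stay a probability distribution, and the most-connected image accumulates the highest score.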

9 citations


Book
01 Jan 2010

7 citations


Proceedings ArticleDOI
03 Dec 2010
TL;DR: Experimental results on AVIRIS data demonstrate that the proposed scheme achieves competitive compression performance with respect to other state-of-the-art 3D codecs, with even lower encoding complexity than 2D codecs.
Abstract: In this paper we propose a novel distributed lossless compression scheme for hyperspectral images. All the images/bands are encoded independently, and the spectral correlation is exploited using distributed coding technologies in order to achieve low encoding complexity. At the encoder, sub-sampled images are successively encoded and transmitted. At the decoder, side information is generated from the decoded sub-sampled images and other previously decoded bands. Reference bands are adaptively selected, and sliding-window prediction or k-nearest-neighbor prediction is performed to capture the spatially varying spectral characteristics. Experimental results on AVIRIS data demonstrate that the proposed scheme achieves competitive compression performance with respect to other state-of-the-art 3D codecs, with even lower encoding complexity than 2D codecs.
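The decoder-side prediction can be illustrated with the simplest case: fitting a linear spectral predictor cur ≈ a·ref + b from co-located decoded samples of a reference band. This is a stand-in for the paper's adaptive region-based estimation, with hypothetical function and variable names:

```python
def fit_spectral_predictor(ref_samples, cur_samples):
    """Least-squares fit cur ~ a * ref + b from co-located decoded samples,
    mimicking decoder-side estimation of inter-band spectral correlation."""
    n = len(ref_samples)
    mr = sum(ref_samples) / n
    mc = sum(cur_samples) / n
    var = sum((r - mr) ** 2 for r in ref_samples)
    cov = sum((r - mr) * (c - mc) for r, c in zip(ref_samples, cur_samples))
    a = cov / var if var else 0.0
    return a, mc - a * mr
```

The fitted predictor is then applied to the full-resolution reference band to synthesize side information for decoding the current band.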

7 citations


16 May 2010
TL;DR: Simulations demonstrate that the proposed scheme, with its innovative combination of low complexity encoding, lossless compression, and progressive coding, achieves competitive performance compared with the high-complexity state-of-the-art 3-D DPCM technique.
Abstract: We present in this paper a novel distributed coding scheme for lossless and progressive compression of multispectral images. The main strategy of this new scheme is to exploit data redundancies at the decoder in order to design a lightweight yet very efficient encoder suitable for onboard applications during the acquisition of multispectral images. A sequence of increasing-resolution layers is encoded and transmitted successively until the original image can be losslessly reconstructed from all layers. We assume that the decoder, with abundant resources, is able to perform adaptive region-based predictor estimation to capture spatially varying spectral correlation with the knowledge of lower-resolution layers, and thus generate high-quality side information for decoding the higher-resolution layer. Progressive transmission enables the spectral correlation to be refined successively, resulting in gradually improved decoding performance for higher-resolution layers as more data are decoded. Simulations demonstrate that the proposed scheme, with its innovative combination of low complexity encoding, lossless compression, and progressive coding, achieves competitive performance compared with the high-complexity state-of-the-art 3-D DPCM technique.

Proceedings ArticleDOI
03 Dec 2010
TL;DR: A packetization algorithm is proposed to improve the quality of the reconstructed video stream while the expected packet loss rate remains unchanged compared to conventional packetization, based on appending MGS enhancement layer data into conventionally generated packet payloads until the Maximum Transmission Unit (MTU) is reached.
Abstract: In packet-oriented networks, packet losses occur mainly due to queue overflows in congested network elements. An increase in the packet rate therefore raises the likelihood of congestion and the expected number of lost packets. However, an increase in packet size typically has a negligible impact on the packet loss rate or congestion in wired packet-switched networks. Scalable Video Coding (SVC), an extension of Advanced Video Coding (H.264/AVC), provides different types of scalability, one of which is Medium Grain Quality Scalability (MGS). When MGS is in use, layer representations can be pruned unevenly without affecting the decoding of the remaining bitstream. The article proposes a packetization algorithm to improve the quality of the reconstructed video stream while the expected packet loss rate remains unchanged compared to conventional packetization. The algorithm is based on appending MGS enhancement layer data to conventionally generated packet payloads until the Maximum Transmission Unit (MTU) is reached. The simulation results show that the proposed algorithm provides a 0.3 to 0.5 dB gain in average luma Peak Signal-to-Noise Ratio compared with a conventional packetization method while the packet rate remains unchanged.
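The packetization idea reduces to a simple greedy rule: append MGS enhancement-layer units to existing packet payloads wherever they fit under the MTU, so the packet count (and hence the expected loss rate) is unchanged. A toy sketch with sizes in bytes; the first-fit order and the drop-on-no-fit policy are assumptions of this sketch:

```python
def pack_with_mgs(packet_sizes, mgs_unit_sizes, mtu=1500):
    """Greedily append MGS enhancement units into existing packets without
    exceeding the MTU; units that fit nowhere are dropped (no new packets)."""
    sizes = list(packet_sizes)
    placement = []
    for u in mgs_unit_sizes:
        for i in range(len(sizes)):
            if sizes[i] + u <= mtu:           # first packet with enough room
                sizes[i] += u
                placement.append(i)
                break
        else:
            placement.append(None)            # would need a new packet: drop
    return sizes, placement
```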

Proceedings ArticleDOI
23 Aug 2010
TL;DR: Visual context learning is proposed to discover visual word significance, and a Weighted Set Coverage algorithm is developed to select canonical images containing distinctive visual words.
Abstract: Canonical image selection aims to select a subset of photos that best summarizes a photo collection. In this paper, we define canonical images as those that contain the most important and distinctive visual words. We propose to use visual context learning to discover visual word significance, and we develop a Weighted Set Coverage algorithm to select canonical images containing distinctive visual words. Experiments with web image datasets demonstrate that the canonical images selected by our approach are not only representative of the collected photos, but also exhibit a diverse set of views with minimal redundancy.
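The Weighted Set Coverage selection can be sketched as the standard greedy set-cover heuristic: repeatedly pick the image whose visual words add the most not-yet-covered significance weight. The data layout (image name → set of word ids, word id → weight) is illustrative:

```python
def select_canonical(images, word_weights, k):
    """Greedy weighted set coverage: pick up to k images, each time choosing
    the one whose visual words add the most uncovered weight."""
    covered, chosen = set(), []
    for _ in range(k):
        best, gain = None, 0.0
        for name, words in images.items():
            if name in chosen:
                continue
            g = sum(word_weights.get(w, 0.0) for w in words - covered)
            if g > gain:
                best, gain = name, g
        if best is None:        # nothing adds new weight: stop early
            break
        chosen.append(best)
        covered |= images[best]
    return chosen
```

Because each pick maximizes marginal coverage, the selection naturally favors diverse views over near-duplicates.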

Proceedings ArticleDOI
01 May 2010
TL;DR: An extremely low complexity scheme based on redundant picture information is developed and is able to achieve the desired error resilient scalability of the video bit-stream at the media gateway for heterogeneous networks.
Abstract: Robust transmission of compressed video bit-streams over heterogeneous packet-loss networks is one of the key challenges in contemporary video communication systems. Recent research has focused on enhancing the error resilience of the bit-stream through the deployment of a transcoder between wired and wireless networks. However, conventional error resilient algorithms (e.g., intra refresh), which cascade decoding and encoding processes, usually have a high degree of complexity. We introduce in this paper a completely new concept of error resilient scalability. We develop an extremely low complexity scheme based on redundant picture information. In this scheme, redundant picture information is generated at the encoder and can be applied at a media gateway to determine the redundant quantity of the bit-stream according to the packet loss rate of the access network. By transmitting the compressed bit-stream together with redundant picture information, we are able to achieve the desired error resilient scalability of the video bit-stream at the media gateway for heterogeneous networks. A joint source-channel-distortion rate model is adopted to optimize the generation of redundant picture information under various packet loss rates. Experimental results demonstrate the effectiveness of the proposed scheme in error resilience scalability.

Proceedings ArticleDOI
11 Jul 2010
TL;DR: In this paper, a rate control scheme based on intermediate description is proposed to provide fast rate control for narrow and time-varying transmission channel in scenarios such as video streaming, video sharing and video on demand.
Abstract: Video adaptation has proven to be an efficient technique for dealing with various constraints, such as bandwidth limitations and user requirements, in multimedia applications. However, existing methods, including Scalable Video Coding and transcoding, cannot achieve good performance when bandwidth constraints exist in various scenarios, particularly in real-time applications. In this paper, we propose a novel rate control scheme based on intermediate description. The proposed scheme provides fast rate control over narrow and time-varying transmission channels in scenarios such as video streaming, video sharing, and video on demand. In this scheme, the distribution of Discrete Cosine Transform (DCT) coefficients is modeled by a generalized Gaussian distribution, and the parameters of this model are stored as side information for rate control. With the stored parameter information, the encoder and transcoder can achieve the target bit-rate with low complexity. Furthermore, an initial Quantization Parameter (QP) determination method is presented to calculate a proper QP for the Instantaneous Decoding Refresh (IDR) picture. Experimental results show that, compared with JVT-G012 in H.264, the proposed rate control scheme saves more than 85% of encoding time, obtains the required bit-rate more precisely, and achieves an average performance gain of 0.2 dB.

Proceedings ArticleDOI
11 Jul 2010
TL;DR: A hybrid bit-stream rewriting approach to support both quality and spatial scalability is proposed based on the principle of residue upsampling in transform domain and the computational complexity of the proposed approach is much lower than the conventional scheme of cascading transcoding.
Abstract: Scalable Video Coding (SVC) is an extension of the H.264/AVC standard. The base layer of SVC is compatible with H.264/AVC, while the enhancement layers provide the desired temporal, quality, and/or spatial scalability. Bit-stream rewriting in the SVC standard allows an SVC bit-stream to be converted to an H.264/AVC bit-stream without quality loss and preferably with low computational complexity. However, rewriting is currently supported only for quality scalability, not spatial scalability, which limits its application in many practical scenarios. In this paper, a hybrid bit-stream rewriting approach supporting both quality and spatial scalability is proposed, based on the principle of residue upsampling in the transform domain. The computational complexity of the proposed approach is much lower than that of the conventional cascading transcoding scheme. Extensive experimental results demonstrate that the rate-distortion (RD) loss of the proposed rewritable SVC bit-stream is acceptable compared with a conventional SVC bit-stream, while its RD performance is better than that of simulcast. Furthermore, the RD performance of the H.264/AVC bit-stream rewritten from the rewritable SVC bit-stream is even better than that of the input SVC bit-stream. Compared with the cascading transcoding scheme, the proposed hybrid rewriting achieves 0.8 dB Y-PSNR gains while saving 80% of processing time on average.

Proceedings ArticleDOI
25 Oct 2010
TL;DR: This work proposes a retrieval system based on a novel scheme, spatial coding, to encode the spatial information among local features in an image, which is both efficient and effective to discover false matches of local features between images, and can greatly improve retrieval performance.
Abstract: The state-of-the-art image retrieval approaches represent images with a high dimensional vector of visual words by quantizing local features, such as SIFT, in the descriptor space. The geometric clues among visual words in an image are usually ignored or exploited only for full geometric verification, which is computationally expensive. In recent years, partially duplicated images have become prevalent on the web. In this demo, we focus on partial-duplicate web image retrieval, and propose a retrieval system based on a novel scheme, spatial coding, to encode the spatial information among local features in an image. Our spatial coding is both efficient and effective at discovering false matches of local features between images, and can greatly improve retrieval performance.

Proceedings ArticleDOI
11 Jul 2010
TL;DR: This paper introduces a background color histogram into a MAP formulation and divides the target into hierarchical blocks, each described with its own histogram but tracked as a whole.
Abstract: Mean Shift is popular in object tracking due to its simplicity and efficiency. It finds a local maximum of the similarity measure between the target model and the target candidate, and works well in many situations. However, it suffers from two drawbacks. First, the Mean Shift tracker ignores background knowledge. As a result, it may fail when the background color is similar to that of the target or when the initial target region contains too much background. Second, with a global color histogram as the target model, the Mean Shift tracker omits the geometric structure of the target. Therefore, it may not work in the case of partial occlusion. To solve the first problem, we introduce a background color histogram into a MAP formulation. To address the second problem, we divide the target into hierarchical blocks. These blocks are each described with a histogram but tracked as a whole. The two threads lead to a new algorithm, named MAP spatial pyramid (MAP-SP) Mean Shift. The efficiency of MAP-SP Mean Shift is demonstrated via comparative experiments on both standard and our own video sequences.
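For reference, the classic Mean Shift update that the paper builds on weights each pixel by sqrt(q_u / p_u) for its color bin (target histogram q, candidate histogram p) and moves the window to the weighted mean of pixel positions. This sketch shows that baseline step only, not the MAP or spatial-pyramid extensions:

```python
import math

def mean_shift_step(pixels, center, target_hist, cand_hist):
    """One Mean Shift update: each pixel (x, y, bin) is weighted by
    sqrt(q[bin] / p[bin]); the window center moves to the weighted mean."""
    num_x = num_y = den = 0.0
    for x, y, b in pixels:
        if cand_hist[b] > 0:
            w = math.sqrt(target_hist[b] / cand_hist[b])
            num_x += w * x
            num_y += w * y
            den += w
    return (num_x / den, num_y / den) if den else center
```

Pixels whose bins are over-represented in the candidate relative to the target get low weight, which is exactly the mechanism the background-histogram extension sharpens.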

Proceedings ArticleDOI
11 Jul 2010
TL;DR: The results indicate that when the maximum disparity between the left and right views was relatively small, the presented time-variable camera separation method was imperceptible and a compression gain, the magnitude of which depended on the scene duration, was achieved.
Abstract: This paper presents a hypothesis that stereoscopic perception requires a short adjustment period after a scene change before it is fully effective. A compression method based on this hypothesis is proposed - instead of coding pictures from the left and right views conventionally, a view in the middle of the left and right view is coded for a limited period after a scene change. The coded middle view can be utilized in two alternative ways in rendering. First, it can be rendered as such, which causes an abrupt change from conventional monoscopic video to stereoscopic video. Second, the layered depth video (LDV) coding scheme can be used to associate depth, background texture, and background depth to the middle view, enabling view synthesis and gradual view disparity increase in rendering. Subjective experiments were conducted to evaluate and validate the presented hypothesis and compare the two rendering methods. The results indicate that when the maximum disparity between the left and right views was relatively small, the presented time-variable camera separation method was imperceptible. A compression gain, the magnitude of which depended on the scene duration, was achieved with half of the sequences having a suitable disparity for the presented coding method.