Author

Shipeng Li

Bio: Shipeng Li is an academic researcher from Microsoft. The author has contributed to research in topics including motion compensation and scalable video coding. The author has an h-index of 70 and has co-authored 440 publications receiving 17,207 citations.


Papers
Proceedings ArticleDOI
23 Jun 2013
TL;DR: This paper regards saliency map computation as a regression problem: based on multi-level image segmentation, a supervised learner maps each regional feature vector to a saliency score, and the scores are fused across levels to yield the saliency map.
Abstract: Salient object detection has been attracting a lot of interest, and recently various heuristic computational models have been designed. In this paper, we regard saliency map computation as a regression problem. Our method, which is based on multi-level image segmentation, uses a supervised learning approach to map the regional feature vector to a saliency score, and finally fuses the saliency scores across multiple levels, yielding the saliency map. The contributions are two-fold. One is that we show our approach, which integrates the regional contrast, regional property, and regional backgroundness descriptors to form the master saliency map, is able to produce superior saliency maps to existing algorithms, most of which heuristically combine saliency maps computed from different types of features. The other is that we introduce a new regional feature vector, backgroundness, to characterize the background, which can be regarded as a counterpart of the objectness descriptor [2]. Performance evaluation on several popular benchmark data sets validates that our approach outperforms existing state-of-the-art methods.
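
The pipeline the abstract describes lends itself to a compact sketch. The following is a minimal illustration, assuming scikit-learn's RandomForestRegressor as the supervised learner; the segmentation function, the 93-dimensional feature stub, and the simple average fusion are illustrative assumptions, not the paper's exact choices.

```python
# Minimal sketch of regression-based saliency estimation (assumptions noted
# in the text above): per-region features are stubbed, and fusion is a
# simple average across segmentation levels.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def regional_features(image, regions):
    """Stub for the regional contrast / property / backgroundness
    descriptors; 93 is a placeholder dimensionality."""
    return np.stack([np.random.rand(93) for _ in regions])

def predict_saliency(image, segment_fn, model):
    """Score regions at every segmentation level, then fuse the levels."""
    h, w = image.shape[:2]
    fused = np.zeros((h, w))
    levels = segment_fn(image)      # list of levels; each level is a list of boolean masks
    for regions in levels:
        X = regional_features(image, regions)
        scores = model.predict(X)   # regression: feature vector -> saliency score
        level_map = np.zeros((h, w))
        for mask, s in zip(regions, scores):
            level_map[mask] = s
        fused += level_map
    return fused / len(levels)      # average fusion across levels

# The learner is fit offline on (region features, ground-truth saliency) pairs.
model = RandomForestRegressor(n_estimators=200)
```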

1,057 citations

Journal ArticleDOI
TL;DR: A multimedia-aware cloud is presented, which addresses how a cloud can perform distributed multimedia processing and storage and provide quality-of-service (QoS) provisioning for multimedia services; a media-edge cloud (MEC) architecture is proposed, in which storage, central processing unit (CPU), and graphics processing unit (GPU) clusters are deployed at the edge.
Abstract: This article introduces the principal concepts of multimedia cloud computing and presents a novel framework. We address multimedia cloud computing from multimedia-aware cloud (media cloud) and cloud-aware multimedia (cloud media) perspectives. First, we present a multimedia-aware cloud, which addresses how a cloud can perform distributed multimedia processing and storage and provide quality-of-service (QoS) provisioning for multimedia services. To achieve high QoS for multimedia services, we propose a media-edge cloud (MEC) architecture, in which storage, central processing unit (CPU), and graphics processing unit (GPU) clusters are deployed at the edge to provide distributed parallel processing and QoS adaptation for various types of devices.
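
The edge-first placement idea behind the MEC architecture can be sketched as a simple dispatch rule: serve a task from an edge cluster that matches its resource type and meets its latency budget, and fall back to the central cloud otherwise. All cluster names, latency figures, and the Task fields below are hypothetical.

```python
# Toy dispatch rule for a media-edge cloud: prefer an edge cluster that
# matches the task's resource type and meets its latency budget; otherwise
# fall back to the central cloud. All values here are hypothetical.
from dataclasses import dataclass

@dataclass
class Cluster:
    name: str
    kind: str           # "storage", "cpu", or "gpu"
    rtt_ms: float       # round-trip time to the requesting device

@dataclass
class Task:
    kind: str           # resource type the task needs
    latency_budget_ms: float

def place(task: Task, edge: list[Cluster], central: Cluster) -> Cluster:
    ok = [c for c in edge
          if c.kind == task.kind and c.rtt_ms <= task.latency_budget_ms]
    return min(ok, key=lambda c: c.rtt_ms) if ok else central

edge = [Cluster("mec-gpu-1", "gpu", 8.0), Cluster("mec-cpu-1", "cpu", 6.0)]
core = Cluster("core-dc", "gpu", 60.0)
print(place(Task("gpu", 20.0), edge, core).name)   # -> mec-gpu-1
```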

439 citations

01 Jan 2008
TL;DR: This special issue focuses on event analysis in broad problem domains; across these applications, using both static and temporal information has proven effective for event recognition.
Abstract: Event analysis in videos is a critical task in many applications. Activity recognition that aims to recognize actions from video and in particular abnormal event recognition in surveillance video has received significant attention from the research community. In this special issue, we focus on event analysis in broad problem domains. Event recognition in specific domains, such as highlight detection in sports videos, has attracted much interest in the past decade. Recently, due to the emergence of online video search, the research community has become interested in event content analysis for both broadcast and user-generated videos. For news videos, Large-Scale Concept Ontology for Multimedia (LSCOM) has defined 56 event/activity concepts, covering a broad range of events such as airplane flying, car crash, riot, people marching, and so on. Researchers have also started to investigate event recognition from other video sources, such as education videos and medical videos. For these applications, we have witnessed the effectiveness of using both static and temporal information.

428 citations

Journal ArticleDOI
Feng Wu1, Shipeng Li1, Ya-Qin Zhang1
TL;DR: Experimental results show that the PFGS framework improves coding efficiency by more than 1 dB over the FGS scheme in terms of average PSNR, while keeping all the original properties, such as fine granularity, bandwidth adaptation, and error recovery.
Abstract: A basic framework for efficient scalable video coding, namely progressive fine granularity scalable (PFGS) video coding, is proposed. Similar to the fine granularity scalable (FGS) video coding in MPEG-4, the PFGS framework has all the features of FGS, such as fine-granularity bit-rate scalability, channel adaptation, and error recovery. On the other hand, different from FGS coding, the PFGS framework uses multiple layers of references with increasing quality to make motion prediction more accurate for improved video-coding efficiency. However, using multiple layers of references with different quality also introduces several issues. First, extra frame buffers are needed for storing the multiple reconstructed reference layers, which would increase the memory cost and computational complexity of the PFGS scheme. Based on the basic framework, a simplified and efficient PFGS framework is therefore proposed; it needs only one extra frame buffer while achieving almost the same coding efficiency as the original framework. Second, there might be an undesirable increase and fluctuation of the coefficients to be coded when switching from a low-quality reference to a high-quality one, which could partially offset the advantage of using a high-quality reference. A further improved PFGS scheme eliminates this fluctuation of enhancement-layer coefficients by always using a single high-quality prediction reference for all enhancement layers. Experimental results show that the PFGS framework improves coding efficiency by more than 1 dB over the FGS scheme in terms of average PSNR, while keeping all the original properties, such as fine granularity, bandwidth adaptation, and error recovery. A simple simulation of transmitting PFGS video over a wireless channel further confirms the error robustness of the PFGS scheme, although the advantages of PFGS have not been fully exploited.
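
The prediction structure of the simplified PFGS framework can be illustrated schematically: the base layer is predicted from the base-quality reference, while all enhancement layers share a single high-quality reference, which is why only one extra frame buffer is needed. The sketch below stands in for real motion compensation and bit-plane coding with plain residual quantization; all names and the quantization rule are illustrative.

```python
# Schematic sketch of simplified-PFGS prediction: one base-quality reference
# for the base layer, one shared high-quality reference for every enhancement
# layer (hence a single extra frame buffer). Real motion compensation and
# bit-plane coding are replaced by plain residual quantization.
import numpy as np

def encode_frame(frame, base_ref, hq_ref, n_enh_layers=3, step=16.0):
    # Base layer: coarse quantization of the base-reference residual.
    base = np.round((frame - base_ref) / step) * step
    # Enhancement layers: successively finer refinements of the residual
    # against the single shared high-quality reference.
    enh_residual = frame - hq_ref
    enh_layers, coded = [], np.zeros_like(frame, dtype=float)
    for i in range(n_enh_layers):
        q = step / 2 ** (i + 1)                  # halve the step each layer
        refinement = np.round(enh_residual / q) * q - coded
        enh_layers.append(refinement)
        coded += refinement
    # Decoder: base_ref + base at the lowest rate, or hq_ref + sum of the
    # received enhancement layers at higher rates.
    return base, enh_layers
```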

343 citations

Patent
Guobin Shen1, Shipeng Li1
29 Dec 2005
TL;DR: After a user-instigated search returns results, an intention mining engine collects information from the user's natural responses to those results and uses it to refine the search.
Abstract: After a user-instigated search returns results, an intention mining engine collects information from the natural user responses to the results. This information is used to refine the search.
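
The refinement loop the patent describes can be sketched as re-weighting query terms from implicit feedback on the returned results. The signal (dwell time), thresholds, and weights below are hypothetical stand-ins for whatever the intention mining engine actually collects.

```python
# Hypothetical refinement loop: boost query terms that appear in results the
# user engaged with, penalize terms from results the user skipped.
def refine_query(query_terms, results, feedback):
    """feedback maps a result id to dwell time in seconds (0 = skipped)."""
    weights = dict.fromkeys(query_terms, 1.0)        # keep the original intent
    for r in results:
        delta = 0.1 if feedback.get(r["id"], 0) > 5 else -0.05
        for term in r["title"].lower().split():
            weights[term] = weights.get(term, 0.0) + delta
    # Keep the strongest terms as the refined query.
    return sorted(weights, key=weights.get, reverse=True)[:8]

results = [{"id": 1, "title": "Jaguar the animal"},
           {"id": 2, "title": "Jaguar car dealership"}]
print(refine_query(["jaguar"], results, {1: 30}))    # biases toward the animal sense
```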

255 citations


Cited by
Journal ArticleDOI
TL;DR: An overview of the basic concepts for extending H.264/AVC towards SVC are provided and the basic tools for providing temporal, spatial, and quality scalability are described in detail and experimentally analyzed regarding their efficiency and complexity.
Abstract: With the introduction of the H.264/AVC video coding standard, significant improvements have recently been demonstrated in video compression capability. The Joint Video Team of the ITU-T VCEG and the ISO/IEC MPEG has now also standardized a Scalable Video Coding (SVC) extension of the H.264/AVC standard. SVC enables the transmission and decoding of partial bit streams to provide video services with lower temporal or spatial resolutions or reduced fidelity while retaining a reconstruction quality that is high relative to the rate of the partial bit streams. Hence, SVC provides functionalities such as graceful degradation in lossy transmission environments as well as bit rate, format, and power adaptation. These functionalities provide enhancements to transmission and storage applications. SVC has achieved significant improvements in coding efficiency with an increased degree of supported scalability relative to the scalable profiles of prior video coding standards. This paper provides an overview of the basic concepts for extending H.264/AVC towards SVC. Moreover, the basic tools for providing temporal, spatial, and quality scalability are described in detail and experimentally analyzed regarding their efficiency and complexity.
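
The partial-bitstream extraction that SVC enables can be pictured as filtering NAL units by their layer identifiers. SVC's NAL unit header extension does carry dependency_id (spatial), temporal_id, and quality_id fields; the record type and extraction function below are a conceptual simplification, not a bitstream parser.

```python
# Conceptual SVC bitstream extraction: keep only NAL units whose layer ids
# fall within the target operating point. dependency_id / temporal_id /
# quality_id mirror real SVC header fields, but NalUnit is a simplification,
# not a parsed H.264/AVC structure.
from dataclasses import dataclass

@dataclass
class NalUnit:
    dependency_id: int   # spatial layer
    temporal_id: int     # temporal layer
    quality_id: int      # quality (SNR) layer
    payload: bytes

def extract(stream, max_d, max_t, max_q):
    """Drop enhancement NAL units above the target operating point."""
    return [n for n in stream
            if n.dependency_id <= max_d
            and n.temporal_id <= max_t
            and n.quality_id <= max_q]

# e.g. extract(stream, max_d=0, max_t=1, max_q=0) yields a low-resolution,
# reduced-frame-rate, base-quality substream that is still decodable.
```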

3,592 citations

Proceedings ArticleDOI
07 Dec 2015
TL;DR: As a minor contribution, inspired by recent advances in large-scale image search, an unsupervised Bag-of-Words descriptor is proposed that yields competitive accuracy on the VIPeR, CUHK03, and Market-1501 datasets and scales to the large-scale 500K distractor set.
Abstract: This paper contributes a new high-quality dataset for person re-identification, named "Market-1501". Generally, current datasets: 1) are limited in scale, 2) consist of hand-drawn bounding boxes, which are unavailable under realistic settings, and 3) have only one ground truth and one query image for each identity (closed environment). To tackle these problems, the proposed Market-1501 dataset has three distinguishing features. First, it contains over 32,000 annotated bounding boxes, plus a distractor set of over 500K images, making it the largest person re-id dataset to date. Second, images in the Market-1501 dataset are produced using the Deformable Part Model (DPM) as the pedestrian detector. Third, our dataset is collected in an open system, where each identity has multiple images under each camera. As a minor contribution, inspired by recent advances in large-scale image search, this paper proposes an unsupervised Bag-of-Words descriptor. We view person re-identification as a special task of image search. In experiments, we show that the proposed descriptor yields competitive accuracy on the VIPeR, CUHK03, and Market-1501 datasets, and is scalable on the large-scale 500K dataset.
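
The re-id-as-search idea can be sketched with a plain Bag-of-Words pipeline: quantize local features against a k-means codebook, describe each image by its normalized visual-word histogram, and rank the gallery by similarity. Local feature extraction is stubbed and the codebook size is arbitrary; the paper's actual descriptor includes refinements not shown here.

```python
# Plain Bag-of-Words baseline for re-id-as-search (a simplification of the
# paper's descriptor): k-means codebook, l2-normalized word histogram,
# cosine-similarity ranking. Local feature extraction is stubbed.
import numpy as np
from sklearn.cluster import KMeans

def train_codebook(local_feats, k=350):
    """local_feats: (n_samples, feat_dim) array pooled from training images."""
    return KMeans(n_clusters=k, n_init=10).fit(local_feats)

def bow_descriptor(local_feats, codebook):
    words = codebook.predict(local_feats)            # quantize to visual words
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / (np.linalg.norm(hist) + 1e-12)     # l2 normalization

def rank_gallery(query_desc, gallery_descs):
    sims = gallery_descs @ query_desc                # cosine similarity (unit vectors)
    return np.argsort(-sims)                         # best match first
```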

3,564 citations

Journal ArticleDOI
TL;DR: In this article, a review of deep learning-based object detection frameworks is provided, focusing on typical generic object detection architectures along with some modifications and useful tricks to improve detection performance further.
Abstract: Due to object detection’s close relationship with video analysis and image understanding, it has attracted much research attention in recent years. Traditional object detection methods are built on handcrafted features and shallow trainable architectures; their performance easily stagnates, even when complex ensembles are constructed that combine multiple low-level image features with high-level context from object detectors and scene classifiers. With the rapid development of deep learning, more powerful tools, which are able to learn semantic, high-level, deeper features, have been introduced to address the problems of traditional architectures. These models differ in network architecture, training strategy, and optimization function. In this paper, we provide a review of deep learning-based object detection frameworks. Our review begins with a brief introduction to the history of deep learning and its representative tool, the convolutional neural network. Then, we focus on typical generic object detection architectures along with some modifications and useful tricks to improve detection performance further. As distinct specific detection tasks exhibit different characteristics, we also briefly survey several specific tasks, including salient object detection, face detection, and pedestrian detection. Experimental analyses are also provided to compare various methods and draw some meaningful conclusions. Finally, several promising directions and tasks are provided to serve as guidelines for future work in both object detection and relevant neural network-based learning systems.

3,097 citations