Home
/
Authors
/
Shipeng Li

Author

Shipeng Li

Bio: Shipeng Li is an academic researcher from Microsoft. The author has contributed to research in topics: Motion compensation & Scalable Video Coding. The author has an hindex of 70, co-authored 440 publications receiving 17207 citations.

Papers published on a yearly basis

2020
2017
2016
2015
2014
2013
2012
2011
2010
2009
2008
2007
2006
2005
2004
2003
2002
2001
2000

Papers

PDF

Open Access

More filters

Proceedings Article•DOI•

Salient Object Detection: A Discriminative Regional Feature Integration Approach

[...]

Huaizu Jiang¹, Jingdong Wang², Zejian Yuan¹, Yang Wu³, Nanning Zheng¹, Shipeng Li² - Show less +2 more•Institutions (3)

Xi'an Jiaotong University¹, Microsoft², Kyoto University³

23 Jun 2013

TL;DR: This paper regards saliency map computation as a regression problem, which is based on multi-level image segmentation, and uses the supervised learning approach to map the regional feature vector to a saliency score, and finally fuses the saliency scores across multiple levels, yielding the salency map.

...read moreread less

Abstract: Salient object detection has been attracting a lot of interest, and recently various heuristic computational models have been designed. In this paper, we regard saliency map computation as a regression problem. Our method, which is based on multi-level image segmentation, uses the supervised learning approach to map the regional feature vector to a saliency score, and finally fuses the saliency scores across multiple levels, yielding the saliency map. The contributions lie in two-fold. One is that we show our approach, which integrates the regional contrast, regional property and regional background ness descriptors together to form the master saliency map, is able to produce superior saliency maps to existing algorithms most of which combine saliency maps heuristically computed from different types of features. The other is that we introduce a new regional feature vector, background ness, to characterize the background, which can be regarded as a counterpart of the objectness descriptor [2]. The performance evaluation on several popular benchmark data sets validates that our approach outperforms existing state-of-the-arts.

...read moreread less

1,057 citations

Journal Article•DOI•

Multimedia Cloud Computing

[...]

Wenwu Zhu¹, Chong Luo², Jianfeng Wang³, Shipeng Li³•Institutions (3)

Intel¹, Shanghai Jiao Tong University², Microsoft³

21 Apr 2011-IEEE Signal Processing Magazine

TL;DR: A multimedia-aware cloud is presented, which addresses how a cloud can perform distributed multimedia processing and storage and provide quality of service (QoS) provisioning for multimedia services, and a media-edge cloud (MEC) architecture is proposed, in which storage, central processing unit (CPU), and graphics processing units (GPU) clusters are presented at the edge.

...read moreread less

Abstract: This article introduces the principal concepts of multimedia cloud computing and presents a novel framework. We address multimedia cloud computing from multimedia-aware cloud (media cloud) and cloud-aware multimedia (cloud media) perspectives. First, we present a multimedia-aware cloud, which addresses how a cloud can perform distributed multimedia processing and storage and provide quality of service (QoS) provisioning for multimedia services. To achieve a high QoS for multimedia services, we propose a media-edge cloud (MEC) architecture, in which storage, central processing unit (CPU), and graphics processing unit (GPU) clusters are presented at the edge to provide distributed parallel processing and QoS adaptation for various types of devices.

...read moreread less

439 citations

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY Editor-in-Chief

[...]

Chang Wen Chen, Hamid Gharavi, Thomas Sikora, Ishfaq Ahmad, John F. Arnold, Oscar Au, Mauro Barni, Vincent Bottreau, Jill Macdonald Boyce, Thomson Corp, Jianfei Cai, Homer H. Chen, Shao-Yi Chien, Mary Comer, Paulo Lobato Correia, Ricardo De Queiroz, Pascal Frossard, Toshiaki Fujii, Wen Gao, Richard Green, Yo-Sung Ho, Ebroul Izquierdo, Queen Mary, Rosa C. Lancini, Shipeng Li, Xin Li, Xuelong Li, Antonio Navarro, Sharath Pankanti, Justin Ridge, Yong Rui, Dan Schonfeld, Eckehard Steinbach, Sanghoon Sull, Huifang Sun, Clark N. Taylor, Deepak Turaga, Gene Wen, Thomas Wiegand, Dapeng Oliver Wu, Jar-Ferr Yang, Haoping Yu - Show less +38 more

01 Jan 2008

TL;DR: This special issue, which focuses on event analysis in broad problem domains, has witnessed the effectiveness of using both static and temporal information in event recognition from other video sources.

...read moreread less

Abstract: Event analysis in videos is a critical task in many applications. Activity recognition that aims to recognize actions from video and in particular abnormal event recognition in surveillance video has received significant attention from the research community. In this special issue, we focus on event analysis in broad problem domains. Event recognition in specific domains, such as highlight detection in sports videos, has attracted much interest in the past decade. Recently, due to the emergence of online video search, the research community has become interested in event content analysis for both broadcast and user-generated videos. For news videos, Large-Scale Concept Ontology for Multimedia (LSCOM) has defined 56 event/activity concepts, covering a broad range of events such as airplane flying, car crash, riot, people marching, and so on. Researchers have also started to investigate event recognition from other video sources, such as education videos and medical videos. For these applications, we have witnessed the effectiveness of using both static and temporal information.

...read moreread less

428 citations

Journal Article•DOI•

A framework for efficient progressive fine granularity scalable video coding

[...]

Feng Wu¹, Shipeng Li¹, Ya-Qin Zhang¹•Institutions (1)

Microsoft¹

01 Mar 2001-IEEE Transactions on Circuits and Systems for Video Technology

TL;DR: Experimental results show that the PFGS framework can improve the coding efficiency up to more than 1 dB over the FGS scheme in terms of average PSNR, yet still keeps all the original properties, such as fine granularity, bandwidth adaptation, and error recovery.

...read moreread less

Abstract: A basic framework for efficient scalable video coding, namely progressive fine granularity scalable (PFGS) video coding is proposed. Similar to the fine granularity scalable (PGS) video coding in MPEG-4, the PFGS framework has all the features of FGS, such as fine granularity bit-rate scalability, channel adaptation, and error recovery. On the other hand, different from the PGS coding, the PFGS framework uses multiple layers of references with increasing quality to make motion prediction more accurate for improved video-coding efficiency. However, using multiple layers of references with different quality also introduces several issues. First, extra frame buffers are needed for storing the multiple reconstructed reference layers. This would increase the memory cost and computational complexity of the PFGS scheme. Based on the basic framework, a simplified and efficient PFGS framework is further proposed. The simplified PPGS framework needs only one extra frame buffer with almost the same coding efficiency as in the original framework. Second, there might be undesirable increase and fluctuation of the coefficients to be coded when switching from a low-quality reference to a high-quality one, which could partially offset the advantage of using a high-quality reference. A further improved PFGS scheme can eliminate the fluctuation of enhancement-layer coefficients when switching references by always using only one high-quality prediction reference for all enhancement layers. Experimental results show that the PFGS framework can improve the coding efficiency up to more than 1 dB over the FGS scheme in terms of average PSNR, yet still keeps all the original properties, such as fine granularity, bandwidth adaptation, and error recovery. A simple simulation of transmitting the PFGS video over a wireless channel further confirms the error robustness of the PFGS scheme, although the advantages of PFGS have not been fully exploited.

...read moreread less

343 citations

Patent•

Dynamic search with implicit user intention mining

[...]

Guobin Shen¹, Shipeng Li¹•Institutions (1)

Microsoft¹

29 Dec 2005

TL;DR: In this paper, an intention mining engine collects information from natural user responses to the results of a search and uses this information to refine the search results, which can be used to refine search results.

...read moreread less

Abstract: After a user instigated search returns results, an intention mining engine collects information from the natural user responses to the results. This information is used to refine the search.

...read moreread less

255 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88

Collapse

Cited by

PDF

Open Access

More filters

Journal Article•

Data Mining Practical Machine Learning Tools and Techniques

[...]

อนิรุธ สืบสิงห์

01 Jan 2014-Journal of management science

9,185 citations

Journal Article•DOI•

Overview of the Scalable Video Coding Extension of the H.264/AVC Standard

[...]

Heiko Schwarz¹, Detlev Marpe¹, Thomas Wiegand¹•Institutions (1)

Heinrich Hertz Institute¹

01 Sep 2007-IEEE Transactions on Circuits and Systems for Video Technology

TL;DR: An overview of the basic concepts for extending H.264/AVC towards SVC are provided and the basic tools for providing temporal, spatial, and quality scalability are described in detail and experimentally analyzed regarding their efficiency and complexity.

...read moreread less

Abstract: With the introduction of the H.264/AVC video coding standard, significant improvements have recently been demonstrated in video compression capability. The Joint Video Team of the ITU-T VCEG and the ISO/IEC MPEG has now also standardized a Scalable Video Coding (SVC) extension of the H.264/AVC standard. SVC enables the transmission and decoding of partial bit streams to provide video services with lower temporal or spatial resolutions or reduced fidelity while retaining a reconstruction quality that is high relative to the rate of the partial bit streams. Hence, SVC provides functionalities such as graceful degradation in lossy transmission environments as well as bit rate, format, and power adaptation. These functionalities provide enhancements to transmission and storage applications. SVC has achieved significant improvements in coding efficiency with an increased degree of supported scalability relative to the scalable profiles of prior video coding standards. This paper provides an overview of the basic concepts for extending H.264/AVC towards SVC. Moreover, the basic tools for providing temporal, spatial, and quality scalability are described in detail and experimentally analyzed regarding their efficiency and complexity.

...read moreread less

3,592 citations

Proceedings Article•DOI•

Scalable Person Re-identification: A Benchmark

[...]

Liang Zheng¹, Liang Zheng², Liyue Shen², Lu Tian², Shengjin Wang², Jingdong Wang³, Qi Tian² - Show less +3 more•Institutions (3)

University of Texas at San Antonio¹, Tsinghua University², Microsoft³

07 Dec 2015

TL;DR: A minor contribution, inspired by recent advances in large-scale image search, an unsupervised Bag-of-Words descriptor is proposed that yields competitive accuracy on VIPeR, CUHK03, and Market-1501 datasets, and is scalable on the large- scale 500k dataset.

...read moreread less

Abstract: This paper contributes a new high quality dataset for person re-identification, named "Market-1501". Generally, current datasets: 1) are limited in scale, 2) consist of hand-drawn bboxes, which are unavailable under realistic settings, 3) have only one ground truth and one query image for each identity (close environment). To tackle these problems, the proposed Market-1501 dataset is featured in three aspects. First, it contains over 32,000 annotated bboxes, plus a distractor set of over 500K images, making it the largest person re-id dataset to date. Second, images in Market-1501 dataset are produced using the Deformable Part Model (DPM) as pedestrian detector. Third, our dataset is collected in an open system, where each identity has multiple images under each camera. As a minor contribution, inspired by recent advances in large-scale image search, this paper proposes an unsupervised Bag-of-Words descriptor. We view person re-identification as a special task of image search. In experiment, we show that the proposed descriptor yields competitive accuracy on VIPeR, CUHK03, and Market-1501 datasets, and is scalable on the large-scale 500k dataset.

...read moreread less

3,564 citations

The Scalable Video Coding Extension of the H.264/AVC Standard

[...]

Heiko Schwarz, Mathias Wien

01 Jan 2008

3,357 citations

Journal Article•DOI•

Object Detection With Deep Learning: A Review

[...]

Zhong-Qiu Zhao¹, Peng Zheng¹, Shou-Tao Xu¹, Xindong Wu²•Institutions (2)

Hefei University of Technology¹, University of Louisiana at Lafayette²

28 Jan 2019-IEEE Transactions on Neural Networks

TL;DR: In this article, a review of deep learning-based object detection frameworks is provided, focusing on typical generic object detection architectures along with some modifications and useful tricks to improve detection performance further.

...read moreread less

Abstract: Due to object detection’s close relationship with video analysis and image understanding, it has attracted much research attention in recent years. Traditional object detection methods are built on handcrafted features and shallow trainable architectures. Their performance easily stagnates by constructing complex ensembles that combine multiple low-level image features with high-level context from object detectors and scene classifiers. With the rapid development in deep learning, more powerful tools, which are able to learn semantic, high-level, deeper features, are introduced to address the problems existing in traditional architectures. These models behave differently in network architecture, training strategy, and optimization function. In this paper, we provide a review of deep learning-based object detection frameworks. Our review begins with a brief introduction on the history of deep learning and its representative tool, namely, the convolutional neural network. Then, we focus on typical generic object detection architectures along with some modifications and useful tricks to improve detection performance further. As distinct specific detection tasks exhibit different characteristics, we also briefly survey several specific tasks, including salient object detection, face detection, and pedestrian detection. Experimental analyses are also provided to compare various methods and draw some meaningful conclusions. Finally, several promising directions and tasks are provided to serve as guidelines for future work in both object detection and relevant neural network-based learning systems.

...read moreread less

3,097 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse