Home
/
Authors
/
Yun Zhai

Author

Yun Zhai

Bio: Yun Zhai is an academic researcher from University of Central Florida. The author has contributed to research in topics: Feature (computer vision) & Image segmentation. The author has an hindex of 11, co-authored 26 publications receiving 1433 citations.

Papers

PDF

Open Access

More filters

Proceedings Article•DOI•

Visual attention detection in video sequences using spatiotemporal cues

[...]

Yun Zhai¹, Mubarak Shah¹•Institutions (1)

University of Central Florida¹

23 Oct 2006

TL;DR: The proposed spatiotemporal video attention framework has been applied on over 20 testing video sequences, and attended regions are detected to highlight interesting objects and motions present in the sequences with very high user satisfaction rate.

...read moreread less

Abstract: Human vision system actively seeks interesting regions in images to reduce the search effort in tasks, such as object detection and recognition. Similarly, prominent actions in video sequences are more likely to attract our first sight than their surrounding neighbors. In this paper, we propose a spatiotemporal video attention detection technique for detecting the attended regions that correspond to both interesting objects and actions in video sequences. Both spatial and temporal saliency maps are constructed and further fused in a dynamic fashion to produce the overall spatiotemporal attention model. In the temporal attention model, motion contrast is computed based on the planar motions (homography) between images, which is estimated by applying RANSAC on point correspondences in the scene. To compensate the non-uniformity of spatial distribution of interest-points, spanning areas of motion segments are incorporated in the motion contrast computation. In the spatial attention model, a fast method for computing pixel-level saliency maps has been developed using color histograms of images. A hierarchical spatial attention representation is established to reveal the interesting points in images as well as the interesting regions. Finally, a dynamic fusion technique is applied to combine both the temporal and spatial saliency maps, where temporal attention is dominant over the spatial model when large motion contrast exists, and vice versa. The proposed spatiotemporal attention framework has been applied on over 20 testing video sequences, and attended regions are detected to highlight interesting objects and motions present in the sequences with very high user satisfaction rate.

...read moreread less

983 citations

Journal Article•DOI•

Video scene segmentation using Markov chain Monte Carlo

[...]

Yun Zhai¹, Mubarak Shah¹•Institutions (1)

University of Central Florida¹

01 Aug 2006-IEEE Transactions on Multimedia

TL;DR: A general framework for temporal scene segmentation in various video domains that is able to find the weak boundaries as well as the strong boundaries, i.e., it does not rely on the fixed threshold and can be applied to different video domains.

...read moreread less

Abstract: Videos are composed of many shots that are caused by different camera operations, e.g., on/off operations and switching between cameras. One important goal in video analysis is to group the shots into temporal scenes, such that all the shots in a single scene are related to the same subject, which could be a particular physical setting, an ongoing action or a theme. In this paper, we present a general framework for temporal scene segmentation in various video domains. The proposed method is formulated in a statistical fashion and uses the Markov chain Monte Carlo (MCMC) technique to determine the boundaries between video scenes. In this approach, a set of arbitrary scene boundaries are initialized at random locations and are automatically updated using two types of updates: diffusion and jumps. Diffusion is the process of updating the boundaries between adjacent scenes. Jumps consist of two reversible operations: the merging of two scenes and the splitting of an existing scene. The posterior probability of the target distribution of the number of scenes and their corresponding boundary locations is computed based on the model priors and the data likelihood. The updates of the model parameters are controlled by the hypothesis ratio test in the MCMC process, and the samples are collected to generate the final scene boundaries. The major advantage of the proposed framework is two-fold: 1) it is able to find the weak boundaries as well as the strong boundaries, i.e., it does not rely on the fixed threshold; 2) it can be applied to different video domains. We have tested the proposed method on two video domains: home videos and feature films, and accurate results have been obtained

...read moreread less

125 citations

Journal Article•DOI•

Content based video matching using spatiotemporal volumes

[...]

Arslan Basharat¹, Yun Zhai¹, Mubarak Shah¹•Institutions (1)

University of Central Florida¹

01 Jun 2008-Computer Vision and Image Understanding

TL;DR: This paper presents a novel framework for matching video sequences using the spatiotemporal segmentation of videos that uses interest point trajectories to generate video volumes and employs an Earth Mover's Distance based approach for the comparison of volume features.

...read moreread less

93 citations

Proceedings Article•DOI•

Tracking news stories across different sources

[...]

Yun Zhai¹, Mubarak Shah¹•Institutions (1)

University of Central Florida¹

06 Nov 2005

TL;DR: The proposed semantic linking framework and the story ranking method have been tested on a set of 60 hours open-benchmark TRECVID video data, and very satisfactory results for both tasks have been obtained.

...read moreread less

Abstract: Information linkage is becoming more and more important in this digital age. In this paper, we propose a concept tracking method, which links news stories on the same topic across multiple sources. The semantic linkage between the news stories is reflected in combination of both of their visual content and their spoken language content. Visually, each news story is represented by a set of key-frames with or without detected faces. The facial key-frames are linked based on the analysis of the extended facial regions, and the non-facial key-frames are correlated using the global Affine matching. The language similarity is expressed in terms of the normalized text similarity between the stories' keywords. The output results of the story linking are further used in a story ranking task, which indicate the interesting level of the stories. The proposed semantic linking framework and the story ranking method have been tested on a set of 60 hours open-benchmark TRECVID video data, and very satisfactory results for both tasks have been obtained.

...read moreread less

76 citations

Book Chapter•DOI•

Story segmentation in news videos using visual and text cues

[...]

Yun Zhai¹, Alper Yilmaz¹, Mubarak Shah¹•Institutions (1)

University of Central Florida¹

20 Jul 2005

TL;DR: The proposed framework for segmenting the news programs into different story topics has been tested on a widely used data set provided by NIST, which contains the ground truth of the story boundaries, and competitive evaluation results have been obtained.

...read moreread less

Abstract: In this paper, we present a framework for segmenting the news programs into different story topics. The proposed method utilizes both visual and text information of the video. We represent the news video by a Shot Connectivity Graph (SCG), where the nodes in the graph represent the shots in the video, and the edges between nodes represent the transitions between shots. The cycles in the graph correspond to the story segments in the news program. We first detect the cycles in the graph by finding the anchor persons in the video. This provides us with the coarse segmentation of the news video. The initial segmentation is later refined by the detections of the weather and sporting news, and the merging of similar stories. For the weather detection, the global color information of the images and the motion of the shots are considered. We have used the text obtained from automatic speech recognition (ASR) for detecting the potential sporting shots to form the sport stories. Adjacent stories with similar semantic meanings are further merged based on the visual and text similarities. The proposed framework has been tested on a widely used data set provided by NIST, which contains the ground truth of the story boundaries, and competitive evaluation results have been obtained.

...read moreread less

41 citations

1
2
3
4
…
5
6

Collapse

Cited by

PDF

Open Access

More filters

Proceedings Article•DOI•

Global contrast based salient region detection

[...]

Ming-Ming Cheng¹, Guo-Xin Zhang¹, Niloy J. Mitra², Xiaolei Huang³, Shi-Min Hu¹ - Show less +1 more•Institutions (3)

Tsinghua University¹, King Abdullah University of Science and Technology², Lehigh University³

20 Jun 2011

TL;DR: This work proposes a regional contrast based saliency extraction algorithm, which simultaneously evaluates global contrast differences and spatial coherence, and consistently outperformed existing saliency detection methods.

...read moreread less

Abstract: Automatic estimation of salient object regions across images, without any prior assumption or knowledge of the contents of the corresponding scenes, enhances many computer vision and computer graphics applications. We introduce a regional contrast based salient object detection algorithm, which simultaneously evaluates global contrast differences and spatial weighted coherence scores. The proposed algorithm is simple, efficient, naturally multi-scale, and produces full-resolution, high-quality saliency maps. These saliency maps are further used to initialize a novel iterative version of GrabCut, namely SaliencyCut, for high quality unsupervised salient object segmentation. We extensively evaluated our algorithm using traditional salient object detection datasets, as well as a more challenging Internet image dataset. Our experimental results demonstrate that our algorithm consistently outperforms 15 existing salient object detection and segmentation methods, yielding higher precision and better recall rates. We also show that our algorithm can be used to efficiently extract salient object masks from Internet images, enabling effective sketch-based image retrieval (SBIR) via simple shape comparisons. Despite such noisy internet images, where the saliency regions are ambiguous, our saliency guided image retrieval achieves a superior retrieval rate compared with state-of-the-art SBIR methods, and additionally provides important target object region information.

...read moreread less

3,653 citations

Journal Article•DOI•

Image retrieval: Ideas, influences, and trends of the new age

[...]

Ritendra Datta¹, Dhiraj Joshi¹, Jia Li¹, James Z. Wang¹•Institutions (1)

Pennsylvania State University¹

08 May 2008-ACM Computing Surveys

TL;DR: Almost 300 key theoretical and empirical contributions in the current decade related to image retrieval and automatic image annotation are surveyed, and the spawning of related subfields are discussed, to discuss the adaptation of existing image retrieval techniques to build systems that can be useful in the real world.

...read moreread less

Abstract: We have witnessed great interest and a wealth of promise in content-based image retrieval as an emerging technology. While the last decade laid foundation to such promise, it also paved the way for a large number of new techniques and systems, got many new people involved, and triggered stronger association of weakly related fields. In this article, we survey almost 300 key theoretical and empirical contributions in the current decade related to image retrieval and automatic image annotation, and in the process discuss the spawning of related subfields. We also discuss significant challenges involved in the adaptation of existing image retrieval techniques to build systems that can be useful in the real world. In retrospect of what has been achieved so far, we also conjecture what the future may hold for image retrieval research.

...read moreread less

3,433 citations

Journal Article•DOI•

Learning to Detect a Salient Object

[...]

Tie Liu¹, Zejian Yuan¹, Jian Sun², Jingdong Wang², Nanning Zheng¹, Xiaoou Tang³, Heung-Yeung Shum² - Show less +3 more•Institutions (3)

Xi'an Jiaotong University¹, Microsoft², The Chinese University of Hong Kong³

01 Feb 2011-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: A set of novel features, including multiscale contrast, center-surround histogram, and color spatial distribution, are proposed to describe a salient object locally, regionally, and globally.

...read moreread less

Abstract: In this paper, we study the salient object detection problem for images. We formulate this problem as a binary labeling task where we separate the salient object from the background. We propose a set of novel features, including multiscale contrast, center-surround histogram, and color spatial distribution, to describe a salient object locally, regionally, and globally. A conditional random field is learned to effectively combine these features for salient object detection. Further, we extend the proposed approach to detect a salient object from sequential images by introducing the dynamic salient features. We collected a large image database containing tens of thousands of carefully labeled images by multiple users and a video segment database, and conducted a set of experiments over them to demonstrate the effectiveness of the proposed approach.

...read moreread less

2,319 citations

Proceedings Article•DOI•

Saliency Detection via Graph-Based Manifold Ranking

[...]

Chuan Yang¹, Lihe Zhang¹, Huchuan Lu¹, Xiang Ruan², Ming-Hsuan Yang³ - Show less +1 more•Institutions (3)

Dalian University of Technology¹, Omron², University of California, Merced³

23 Jun 2013

TL;DR: This work considers both foreground and background cues in a different way and ranks the similarity of the image elements with foreground cues or background cues via graph-based manifold ranking, defined based on their relevances to the given seeds or queries.

...read moreread less

Abstract: Most existing bottom-up methods measure the foreground saliency of a pixel or region based on its contrast within a local context or the entire image, whereas a few methods focus on segmenting out background regions and thereby salient objects Instead of considering the contrast between the salient objects and their surrounding regions, we consider both foreground and background cues in a different way We rank the similarity of the image elements (pixels or regions) with foreground cues or background cues via graph-based manifold ranking The saliency of the image elements is defined based on their relevances to the given seeds or queries We represent the image as a close-loop graph with super pixels as nodes These nodes are ranked based on the similarity to background and foreground queries, based on affinity matrices Saliency detection is carried out in a two-stage scheme to extract background regions and foreground salient objects efficiently Experimental results on two large benchmark databases demonstrate the proposed method performs well when against the state-of-the-art methods in terms of accuracy and speed We also create a more difficult benchmark database containing 5,172 images to test the proposed saliency model and make this database publicly available with this paper for further studies in the saliency field

...read moreread less

2,278 citations

Journal Article•DOI•

State-of-the-Art in Visual Attention Modeling

[...]

Ali Borji¹, Laurent Itti¹•Institutions (1)

University of Southern California¹

01 Jan 2013-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: A taxonomy of nearly 65 models of attention provides a critical comparison of approaches, their capabilities, and shortcomings, and addresses several challenging issues with models, including biological plausibility of the computations, correlation with eye movement datasets, bottom-up and top-down dissociation, and constructing meaningful performance measures.

...read moreread less

Abstract: Modeling visual attention-particularly stimulus-driven, saliency-based attention-has been a very active research area over the past 25 years. Many different models of attention are now available which, aside from lending theoretical contributions to other fields, have demonstrated successful applications in computer vision, mobile robotics, and cognitive systems. Here we review, from a computational perspective, the basic concepts of attention implemented in these models. We present a taxonomy of nearly 65 models, which provides a critical comparison of approaches, their capabilities, and shortcomings. In particular, 13 criteria derived from behavioral and computational studies are formulated for qualitative comparison of attention models. Furthermore, we address several challenging issues with models, including biological plausibility of the computations, correlation with eye movement datasets, bottom-up and top-down dissociation, and constructing meaningful performance measures. Finally, we highlight current research trends in attention modeling and provide insights for future.

...read moreread less

1,817 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse