Home
/
Authors
/
Jian Yao

Author

Jian Yao

Other affiliations: The Chinese University of Hong Kong, Idiap Research Institute, Institute for the Protection and Security of the Citizen ...read more

Bio: Jian Yao is an academic researcher from Wuhan University. The author has contributed to research in topics: Artificial intelligence & Pixel. The author has an hindex of 24, co-authored 121 publications receiving 3058 citations. Previous affiliations of Jian Yao include The Chinese University of Hong Kong & Idiap Research Institute.

Papers published on a yearly basis

2023
2022
2021
2020
2019
2018
2017
2016
2015
2014
2013
2012
2011
2010
2009
2008
2007
2006
2005
2004

Papers

PDF

Open Access

More filters

Journal Article•DOI•

A Unified Framework for Street-View Panorama Stitching.

[...]

Li Li¹, Jian Yao¹, Renping Xie¹, Menghan Xia¹, Wei Zhang² - Show less +1 more•Institutions (2)

Wuhan University¹, Shandong University²

22 Dec 2016-Sensors

TL;DR: Experimental results on a large set of challenging street-view panoramic images captured form the real world illustrate that the proposed system is capable of creating high-quality panoramas.

...read moreread less

Abstract: In this paper, we propose a unified framework to generate a pleasant and high-quality street-view panorama by stitching multiple panoramic images captured from the cameras mounted on the mobile platform. Our proposed framework is comprised of four major steps: image warping, color correction, optimal seam line detection and image blending. Since the input images are captured without a precisely common projection center from the scenes with the depth differences with respect to the cameras to different extents, such images cannot be precisely aligned in geometry. Therefore, an efficient image warping method based on the dense optical flow field is proposed to greatly suppress the influence of large geometric misalignment at first. Then, to lessen the influence of photometric inconsistencies caused by the illumination variations and different exposure settings, we propose an efficient color correction algorithm via matching extreme points of histograms to greatly decrease color differences between warped images. After that, the optimal seam lines between adjacent input images are detected via the graph cut energy minimization framework. At last, the Laplacian pyramid blending algorithm is applied to further eliminate the stitching artifacts along the optimal seam lines. Experimental results on a large set of challenging street-view panoramic images captured form the real world illustrate that the proposed system is capable of creating high-quality panoramas.

...read moreread less

863 citations

Journal Article•DOI•

DeepCrack: A deep hierarchical feature learning architecture for crack segmentation

[...]

Yahui Liu¹, Jian Yao¹, Xiaohu Lu¹, Renping Xie¹, Li Li¹ - Show less +1 more•Institutions (1)

Wuhan University¹

21 Apr 2019-Neurocomputing

TL;DR: A deep hierarchical convolutional neural network (CNN) is proposed, called as DeepCrack, to predict pixel-wise crack segmentation in an end-to-end method using both guided filtering and Conditional Random Fields methods to refine the final prediction results.

...read moreread less

356 citations

Journal Article•DOI•

Text-Attentional Convolutional Neural Network for Scene Text Detection

[...]

Tong He¹, Weilin Huang¹, Yu Qiao¹, Jian Yao²•Institutions (2)

Chinese Academy of Sciences¹, Wuhan University²

01 Jun 2016-IEEE Transactions on Image Processing

TL;DR: A new system for scene text detection by proposing a novel text-attentional convolutional neural network (Text-CNN) that particularly focuses on extracting text-related regions and features from the image components and a powerful low-level detector called contrast-enhancement maximally stable extremal regions (MSERs) is developed.

...read moreread less

Abstract: Recent deep learning models have demonstrated strong capabilities for classifying text and non-text components in natural images. They extract a high-level feature globally computed from a whole image component (patch), where the cluttered background information may dominate true text features in the deep representation. This leads to less discriminative power and poorer robustness. In this paper, we present a new system for scene text detection by proposing a novel text-attentional convolutional neural network (Text-CNN) that particularly focuses on extracting text-related regions and features from the image components. We develop a new learning mechanism to train the Text-CNN with multi-level and rich supervised information, including text region mask, character label, and binary text/non-text information. The rich supervision information enables the Text-CNN with a strong capability for discriminating ambiguous texts, and also increases its robustness against complicated background components. The training process is formulated as a multi-task learning problem, where low-level supervised information greatly facilitates the main task of text/non-text classification. In addition, a powerful low-level detector called contrast-enhancement maximally stable extremal regions (MSERs) is developed, which extends the widely used MSERs by enhancing intensity contrast between text patterns and background. This allows it to detect highly challenging text patterns, resulting in a higher recall. Our approach achieved promising results on the ICDAR 2013 data set, with an F-measure of 0.82, substantially improving the state-of-the-art results.

...read moreread less

304 citations

Proceedings Article•DOI•

Multi-Layer Background Subtraction Based on Color and Texture

[...]

Jian Yao¹, Jean-Marc Odobez¹•Institutions (1)

Idiap Research Institute¹

17 Jun 2007

TL;DR: A robust multi-layer background subtraction technique which takes advantages of local texture features represented by local binary patterns (LBP) and photometric invariant color measurements in RGB color space and allows to implicitly smooth detection results over regions of similar intensity and preserve object boundaries.

...read moreread less

Abstract: In this paper, we propose a robust multi-layer background subtraction technique which takes advantages of local texture features represented by local binary patterns (LBP) and photometric invariant color measurements in RGB color space. LBP can work robustly with respective to light variation on rich texture regions but not so efficiently on uniform regions. In the latter case, color information should overcome LBP's limitation. Due to the illumination invariance of both the LBP feature and the selected color feature, the method is able to handle local illumination changes such as cast shadows from moving objects. Due to the use of a simple layer-based strategy, the approach can model moving background pixels with quasi-periodic flickering as well as background scenes which may vary over time due to the addition and removal of long-time stationary objects. Finally, the use of a cross-bilateral filter allows to implicitly smooth detection results over regions of similar intensity and preserve object boundaries. Numerical and qualitative experimental results on both simulated and real data demonstrate the robustness of the proposed method.

...read moreread less

293 citations

Journal Article•DOI•

Text-Attentional Convolutional Neural Networks for Scene Text Detection

[...]

Tong He, Weilin Huang, Yu Qiao, Jian Yao

12 Oct 2015-arXiv: Computer Vision and Pattern Recognition

TL;DR: Zhang et al. as mentioned in this paper proposed a novel Text-Attentional Convolutional Neural Network (Text-CNN) that particularly focuses on extracting text-related regions and features from the image components.

...read moreread less

Abstract: Recent deep learning models have demonstrated strong capabilities for classifying text and non-text components in natural images. They extract a high-level feature computed globally from a whole image component (patch), where the cluttered background information may dominate true text features in the deep representation. This leads to less discriminative power and poorer robustness. In this work, we present a new system for scene text detection by proposing a novel Text-Attentional Convolutional Neural Network (Text-CNN) that particularly focuses on extracting text-related regions and features from the image components. We develop a new learning mechanism to train the Text-CNN with multi-level and rich supervised information, including text region mask, character label, and binary text/nontext information. The rich supervision information enables the Text-CNN with a strong capability for discriminating ambiguous texts, and also increases its robustness against complicated background components. The training process is formulated as a multi-task learning problem, where low-level supervised information greatly facilitates main task of text/non-text classification. In addition, a powerful low-level detector called Contrast- Enhancement Maximally Stable Extremal Regions (CE-MSERs) is developed, which extends the widely-used MSERs by enhancing intensity contrast between text patterns and background. This allows it to detect highly challenging text patterns, resulting in a higher recall. Our approach achieved promising results on the ICDAR 2013 dataset, with a F-measure of 0.82, improving the state-of-the-art results substantially.

...read moreread less

115 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28

Collapse

Cited by

PDF

Open Access

More filters

Book Chapter•DOI•

Performance Measures and a Data Set for Multi-target, Multi-camera Tracking

[...]

Ergys Ristani¹, Francesco Solera², Roger S. Zou¹, Rita Cucchiara², Carlo Tomasi¹ - Show less +1 more•Institutions (2)

Duke University¹, University of Modena and Reggio Emilia²

08 Oct 2016

TL;DR: A new pair of precision-recall measures of performance that treats errors of all types uniformly and emphasizes correct identification over sources of error are presented to help accelerate progress in multi-target, multi-camera tracking systems.

...read moreread less

Abstract: To help accelerate progress in multi-target, multi-camera tracking systems, we present (i) a new pair of precision-recall measures of performance that treats errors of all types uniformly and emphasizes correct identification over sources of error; (ii) the largest fully-annotated and calibrated data set to date with more than 2 million frames of 1080 p, 60 fps video taken by 8 cameras observing more than 2,700 identities over 85 min; and (iii) a reference software system as a comparison baseline. We show that (i) our measures properly account for bottom-line identity match performance in the multi-camera setting; (ii) our data set poses realistic challenges to current trackers; and (iii) the performance of our system is comparable to the state of the art.

...read moreread less

1,775 citations

Обнаружение транспортных средств на изображениях загородных шоссе на основе метода Single shot multibox Detector

[...]

Р Ю Чуйков, Д А Юдин

01 Jan 2017

1,687 citations

Journal Article•DOI•

DAISY: An Efficient Dense Descriptor Applied to Wide-Baseline Stereo

[...]

Engin Tola¹, Vincent Lepetit¹, Pascal Fua¹•Institutions (1)

École Normale Supérieure¹

01 May 2010-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: An EM-based algorithm to compute dense depth and occlusion maps from wide-baseline image pairs using a local image descriptor, DAISY, which is very efficient to compute densely and robust against many photometric and geometric transformations.

...read moreread less

Abstract: In this paper, we introduce a local image descriptor, DAISY, which is very efficient to compute densely. We also present an EM-based algorithm to compute dense depth and occlusion maps from wide-baseline image pairs using this descriptor. This yields much better results in wide-baseline situations than the pixel and correlation-based algorithms that are commonly used in narrow-baseline stereo. Also, using a descriptor makes our algorithm robust against many photometric and geometric transformations. Our descriptor is inspired from earlier ones such as SIFT and GLOH but can be computed much faster for our purposes. Unlike SURF, which can also be computed efficiently at every pixel, it does not introduce artifacts that degrade the matching performance when used densely. It is important to note that our approach is the first algorithm that attempts to estimate dense depth maps from wide-baseline image pairs, and we show that it is a good one at that with many experiments for depth estimation accuracy, occlusion detection, and comparing it against other descriptors on laser-scanned ground truth scenes. We also tested our approach on a variety of indoor and outdoor scenes with different photometric and geometric transformations and our experiments support our claim to being robust against these.

...read moreread less

1,484 citations

Journal Article•DOI•

ASIFT: A New Framework for Fully Affine Invariant Image Comparison

[...]

Jean-Michel Morel¹, Guoshen Yu²•Institutions (2)

École normale supérieure de Cachan¹, Chicago Metropolitan Agency for Planning²

01 Apr 2009-Siam Journal on Imaging Sciences

TL;DR: The proposed affine-SIFT (ASIFT), simulates all image views obtainable by varying the two camera axis orientation parameters, namely, the latitude and the longitude angles, left over by the SIFT method, and will be mathematically proved to be fully affine invariant.

...read moreread less

Abstract: If a physical object has a smooth or piecewise smooth boundary, its images obtained by cameras in varying positions undergo smooth apparent deformations. These deformations are locally well approximated by affine transforms of the image plane. In consequence the solid object recognition problem has often been led back to the computation of affine invariant image local features. Such invariant features could be obtained by normalization methods, but no fully affine normalization method exists for the time being. Even scale invariance is dealt with rigorously only by the scale-invariant feature transform (SIFT) method. By simulating zooms out and normalizing translation and rotation, SIFT is invariant to four out of the six parameters of an affine transform. The method proposed in this paper, affine-SIFT (ASIFT), simulates all image views obtainable by varying the two camera axis orientation parameters, namely, the latitude and the longitude angles, left over by the SIFT method. Then it covers the other four parameters by using the SIFT method itself. The resulting method will be mathematically proved to be fully affine invariant. Against any prognosis, simulating all views depending on the two camera orientation parameters is feasible with no dramatic computational load. A two-resolution scheme further reduces the ASIFT complexity to about twice that of SIFT. A new notion, the transition tilt, measuring the amount of distortion from one view to another, is introduced. While an absolute tilt from a frontal to a slanted view exceeding 6 is rare, much higher transition tilts are common when two slanted views of an object are compared (see Figure hightransitiontiltsillustration). The attainable transition tilt is measured for each affine image comparison method. The new method permits one to reliably identify features that have undergone transition tilts of large magnitude, up to 36 and higher. This fact is substantiated by many experiments which show that ASIFT significantly outperforms the state-of-the-art methods SIFT, maximally stable extremal region (MSER), Harris-affine, and Hessian-affine.

...read moreread less

1,480 citations

Journal Article•DOI•

Arbitrary-Oriented Scene Text Detection via Rotation Proposals

[...]

Jianqi Ma¹, Weiyuan Shao², Hao Ye², Li Wang¹, Hong Wang², Yingbin Zheng², Xiangyang Xue¹ - Show less +3 more•Institutions (2)

Fudan University¹, Chinese Academy of Sciences²

23 Mar 2018-IEEE Transactions on Multimedia

TL;DR: The Rotation Region Proposal Networks are designed to generate inclined proposals with text orientation angle information that are adapted for bounding box regression to make the proposals more accurately fit into the text region in terms of the orientation.

...read moreread less

Abstract: This paper introduces a novel rotation-based framework for arbitrary-oriented text detection in natural scene images. We present the Rotation Region Proposal Networks , which are designed to generate inclined proposals with text orientation angle information. The angle information is then adapted for bounding box regression to make the proposals more accurately fit into the text region in terms of the orientation. The Rotation Region-of-Interest pooling layer is proposed to project arbitrary-oriented proposals to a feature map for a text region classifier. The whole framework is built upon a region-proposal-based architecture, which ensures the computational efficiency of the arbitrary-oriented text detection compared with previous text detection systems. We conduct experiments using the rotation-based framework on three real-world scene text detection datasets and demonstrate its superiority in terms of effectiveness and efficiency over previous approaches.

...read moreread less

1,002 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse