Author

Siwei Ma

Bio: Siwei Ma is an academic researcher from Peking University. The author has contributed to research in topics: Coding tree unit & Motion compensation. The author has an h-index of 44 and has co-authored 464 publications receiving 7,878 citations. Previous affiliations of Siwei Ma include the University of Southern California & Beihang University.


Papers
Posted Content
TL;DR: To maximally exploit the capability of the transformer, the IPT model uses the well-known ImageNet benchmark to generate a large number of corrupted image pairs, and contrastive learning is introduced to adapt the model to different image processing tasks.
Abstract: As the computing power of modern hardware increases rapidly, pre-trained deep learning models (e.g., BERT, GPT-3) learned on large-scale datasets have shown their effectiveness over conventional methods. This progress is mainly attributed to the representation ability of the transformer and its variant architectures. In this paper, we study low-level computer vision tasks (e.g., denoising, super-resolution, and deraining) and develop a new pre-trained model, namely, the image processing transformer (IPT). To maximally exploit the capability of the transformer, we utilize the well-known ImageNet benchmark to generate a large number of corrupted image pairs. The IPT model is trained on these images with multiple heads and multiple tails. In addition, contrastive learning is introduced to adapt the model to different image processing tasks. The pre-trained model can therefore be efficiently employed on the desired task after fine-tuning. With only one pre-trained model, IPT outperforms the current state-of-the-art methods on various low-level benchmarks. Code is available at this https URL and this https URL

631 citations
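The multi-head/multi-tail design described in the abstract is easy to picture in code. Below is a minimal PyTorch sketch; the class name, layer sizes, and module choices are illustrative assumptions, not the paper's actual (much larger, patch-based) configuration:

```python
import torch
import torch.nn as nn

class IPTSketch(nn.Module):
    """Toy multi-head / multi-tail layout: one shared transformer body,
    with a task-specific head and tail per task. All sizes here are
    illustrative placeholders, not the paper's configuration."""

    def __init__(self, tasks=("denoise", "sr", "derain"), dim=64):
        super().__init__()
        # One lightweight head (feature extractor) per task.
        self.heads = nn.ModuleDict({t: nn.Conv2d(3, dim, 3, padding=1) for t in tasks})
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.body = nn.TransformerEncoder(layer, num_layers=4)  # shared body
        # One tail (reconstructor) per task.
        self.tails = nn.ModuleDict({t: nn.Conv2d(dim, 3, 3, padding=1) for t in tasks})

    def forward(self, x, task):
        f = self.heads[task](x)                # task-specific head
        b, c, h, w = f.shape
        tokens = f.flatten(2).transpose(1, 2)  # (B, H*W, C) token sequence
        tokens = self.body(tokens)             # shared transformer body
        f = tokens.transpose(1, 2).reshape(b, c, h, w)
        return self.tails[task](f)             # task-specific tail

# Example: out = IPTSketch()(torch.randn(1, 3, 48, 48), "denoise")
```

Because only the heads and tails are task-specific, a single pre-trained body can be fine-tuned for each downstream task, which is the point the abstract makes.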

Proceedings ArticleDOI
01 Jun 2021
TL;DR: A pre-trained image processing transformer (IPT) model is proposed for denoising, super-resolution, and deraining; it is trained on corrupted image pairs with multiple heads and multiple tails.
Abstract: As the computing power of modern hardware increases rapidly, pre-trained deep learning models (e.g., BERT, GPT-3) learned on large-scale datasets have shown their effectiveness over conventional methods. This progress is mainly attributed to the representation ability of the transformer and its variant architectures. In this paper, we study low-level computer vision tasks (e.g., denoising, super-resolution, and deraining) and develop a new pre-trained model, namely, the image processing transformer (IPT). To maximally exploit the capability of the transformer, we utilize the well-known ImageNet benchmark to generate a large number of corrupted image pairs. The IPT model is trained on these images with multiple heads and multiple tails. In addition, contrastive learning is introduced to adapt the model to different image processing tasks. The pre-trained model can therefore be efficiently employed on the desired task after fine-tuning. With only one pre-trained model, IPT outperforms the current state-of-the-art methods on various low-level benchmarks. Code is available at https://github.com/huawei-noah/Pretrained-IPT and https://gitee.com/mindspore/mindspore/tree/master/model_zoo/research/cv/IPT

416 citations

Proceedings ArticleDOI
13 Mar 2019
TL;DR: This work proposes a simple yet effective regularization term that addresses the mode collapse issue for cGANs: it explicitly maximizes the ratio of the distance between generated images to the distance between the corresponding latent codes, encouraging the generator to explore more minor modes during training.
Abstract: Most conditional generation tasks expect diverse outputs given a single conditional context. However, conditional generative adversarial networks (cGANs) often focus on the prior conditional information and ignore the input noise vectors, which contribute to the output variations. Recent attempts to resolve the mode collapse issue for cGANs are usually task-specific and computationally expensive. In this work, we propose a simple yet effective regularization term to address the mode collapse issue for cGANs. The proposed method explicitly maximizes the ratio of the distance between generated images to the distance between their corresponding latent codes, thus encouraging the generators to explore more minor modes during training. This mode seeking regularization term is readily applicable to various conditional generation tasks without imposing training overhead or modifying the original network structures. We validate the proposed algorithm on three conditional image synthesis tasks, including categorical generation, image-to-image translation, and text-to-image synthesis, with different baseline models. Both qualitative and quantitative results demonstrate the effectiveness of the proposed regularization method for improving diversity without loss of quality.

362 citations
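The regularization term described above reduces to a few lines. The sketch below assumes PyTorch and L1 distances; mode_seeking_loss is a hypothetical helper name, not the paper's code:

```python
import torch

def mode_seeking_loss(fake1, fake2, z1, z2, eps=1e-5):
    """Ratio of the distance between two images generated from the same
    condition to the distance between their latent codes (L1 distances
    assumed here). The generator is trained to maximize this ratio."""
    d_image = torch.mean(torch.abs(fake1 - fake2))  # distance between generated images
    d_latent = torch.mean(torch.abs(z1 - z2))       # distance between latent codes
    return d_image / (d_latent + eps)

# One common way to maximize the ratio with a minimizing optimizer is to
# add its reciprocal to the generator loss:
#   g_loss = g_loss_original + 1.0 / (mode_seeking_loss(f1, f2, z1, z2) + 1e-5)
```

Because the term only compares two generator outputs, it adds no extra networks, which is why the abstract can claim no training overhead or structural changes.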

Journal ArticleDOI
TL;DR: A new rate-distortion (R-D) model is proposed that utilizes the true quantization stepsize, and an improved rate-control scheme for the H.264/AVC encoder is developed based on this new R-D model.
Abstract: In this paper, an efficient rate-control scheme for H.264/AVC video encoding is proposed. Because of the redesigned quantization scheme in H.264/AVC, the relationship between the quantization parameter and the true quantization stepsize is no longer linear. Based on this observation, we propose a new rate-distortion (R-D) model that utilizes the true quantization stepsize, and we then develop an improved rate-control scheme for the H.264/AVC encoder based on this new R-D model. In general, the current R-D optimization (RDO) mode-selection scheme in the H.264/AVC test model makes rate control difficult: rate control usually requires a predetermined set of motion vectors and coding modes to select the quantization parameter, whereas RDO works in the opposite order and requires a predetermined quantization parameter to select motion vectors and coding modes. To tackle this problem, we develop a complexity-adjustable rate-control scheme based on the proposed R-D model. Briefly, the proposed scheme is a one-pass process at the frame level and a partial two-pass process at the macroblock level. Since the number of macroblocks with two-pass processing can be controlled by an encoder parameter, a fully one-pass implementation is a subset of the proposed algorithm. An additional topic discussed in this paper is video buffering. Since a hypothetical reference decoder (HRD) has been defined in H.264/AVC to guarantee that the buffers never overflow or underflow, more accurate rate-allocation schemes are proposed to satisfy these HRD requirements.

341 citations
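The nonlinearity the abstract builds on is concrete: in H.264/AVC the true quantization stepsize doubles for every increase of 6 in the quantization parameter. The sketch below encodes that mapping together with the classic quadratic rate-quantization model, used here only as a stand-in for the paper's R-D model; the coefficients a and b and the helper names are illustrative:

```python
# Qstep for QP 0..5; in H.264/AVC the stepsize then doubles every 6 QP
# levels, so the QP-to-Qstep relationship is exponential, not linear.
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]

def qstep(qp: int) -> float:
    """True quantization stepsize for a given quantization parameter QP."""
    return QSTEP_BASE[qp % 6] * (2 ** (qp // 6))

def rate_estimate(q: float, mad: float, a: float, b: float) -> float:
    """Quadratic rate-quantization model R(Q) = a*MAD/Q + b*MAD/Q^2, with
    MAD predicting frame complexity and a, b fitted from encoded frames."""
    return a * mad / q + b * mad / (q * q)

# Example: qstep(26) == 13.0 and qstep(32) == 26.0, i.e. six QP levels
# apart means exactly double the stepsize.
```

Building the R-D model on qstep(qp) rather than on qp itself is the observation the paper starts from.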

Proceedings ArticleDOI
29 Dec 2011
TL;DR: Experimental results show that the fast intra mode decision scheme provides 20% and 28% time savings on average in the all-intra high-efficiency and low-complexity cases, respectively, with negligible loss of coding efficiency.
Abstract: As the next-generation video coding standard, High Efficiency Video Coding (HEVC) is intended to provide significantly better coding efficiency than all existing video coding standards. To improve the coding efficiency of intra-frame coding, up to 34 intra prediction modes are defined in HEVC. The best mode among these pre-defined intra prediction modes is selected by rate-distortion optimization (RDO) for each block. Testing all directions in the RDO process is very time-consuming. To alleviate the encoder computation load, this paper proposes a new method to reduce the number of candidates in the RDO process. In addition, the direction information of neighboring blocks is fully exploited to speed up intra mode decision. Experimental results show that the proposed scheme provides 20% and 28% time savings in the intra high-efficiency and low-complexity cases, respectively, on average, compared to the default encoding scheme in HM 1.0, with almost the same coding efficiency. This algorithm has been proposed to the HEVC standard and partially adopted into the HEVC test model.

311 citations
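A minimal sketch of the candidate-reduction idea, assuming hypothetical argument names and a hypothetical candidate count (the paper's exact settings and HM integration details are not reproduced here):

```python
def select_rdo_candidates(satd_cost, neighbor_modes, keep=3):
    """Rank all intra modes by a cheap pre-RDO cost (here a dict mapping
    mode -> SATD cost of its prediction residual), keep only the cheapest
    few for full RDO, and always include the neighboring blocks' modes,
    since a block's best direction usually matches its neighbors'."""
    candidates = sorted(satd_cost, key=satd_cost.get)[:keep]
    for mode in neighbor_modes:  # direction info from the left/above blocks
        if mode is not None and mode not in candidates:
            candidates.append(mode)
    return candidates  # full RDO is run only on this short list
```

Shrinking the RDO list from up to 34 modes to a handful is where the reported encoder time savings come from.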


Cited by
Journal ArticleDOI

08 Dec 2001-BMJ
TL;DR: There is, I think, something ethereal about i, the square root of minus one: it seemed an odd beast, an intruder hovering on the edge of reality.
Abstract: There is, I think, something ethereal about i —the square root of minus one. I remember first hearing about it at school. It seemed an odd beast at that time—an intruder hovering on the edge of reality. Usually familiarity dulls this sense of the bizarre, but in the case of i it was the reverse: over the years the sense of its surreal nature intensified. It seemed that it was impossible to write mathematics that described the real world in …

33,785 citations

Christopher M. Bishop
01 Jan 2006
TL;DR: Probability distributions and linear models for regression and classification are covered, along with neural networks, kernel methods, graphical models, mixture models and EM, approximate inference, sampling methods, latent variables, sequential data, and combining models.
Abstract: Contents: Probability Distributions; Linear Models for Regression; Linear Models for Classification; Neural Networks; Kernel Methods; Sparse Kernel Machines; Graphical Models; Mixture Models and EM; Approximate Inference; Sampling Methods; Continuous Latent Variables; Sequential Data; Combining Models.

10,141 citations

Journal ArticleDOI
TL;DR: A novel feature similarity (FSIM) index for full-reference IQA is proposed, based on the fact that the human visual system (HVS) perceives an image mainly according to its low-level features.
Abstract: Image quality assessment (IQA) aims to use computational models to measure image quality consistently with subjective evaluations. The well-known structural similarity index brought IQA from the pixel-based to the structure-based stage. In this paper, a novel feature similarity (FSIM) index for full-reference IQA is proposed, based on the fact that the human visual system (HVS) perceives an image mainly according to its low-level features. Specifically, phase congruency (PC), a dimensionless measure of the significance of a local structure, is used as the primary feature in FSIM. Considering that PC is contrast invariant while contrast information does affect the HVS's perception of image quality, the image gradient magnitude (GM) is employed as the secondary feature. PC and GM play complementary roles in characterizing local image quality. After obtaining the local quality map, we use PC again as a weighting function to derive a single quality score. Extensive experiments on six benchmark IQA databases demonstrate that FSIM achieves much higher consistency with subjective evaluations than state-of-the-art IQA metrics.

4,028 citations
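The pooling step the abstract describes (local PC/GM similarities, then PC-weighted averaging) is compact enough to sketch. The function below assumes the phase-congruency and gradient-magnitude maps are already computed; the stabilizing constants t1 and t2 follow common FSIM implementations and are an assumption here:

```python
import numpy as np

def fsim_pool(pc1, pc2, g1, g2, t1=0.85, t2=160.0):
    """FSIM pooling from precomputed phase-congruency (pc*) and gradient-
    magnitude (g*) maps of the reference and distorted images."""
    s_pc = (2 * pc1 * pc2 + t1) / (pc1 ** 2 + pc2 ** 2 + t1)  # PC similarity
    s_g = (2 * g1 * g2 + t2) / (g1 ** 2 + g2 ** 2 + t2)       # GM similarity
    s_l = s_pc * s_g                    # local quality map
    pc_m = np.maximum(pc1, pc2)         # PC reused as the weighting function
    return float(np.sum(s_l * pc_m) / np.sum(pc_m))
```

Weighting by the maximum PC emphasizes image regions the HVS finds most significant, which is the paper's stated rationale for reusing PC in the pooling stage.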

Journal ArticleDOI
TL;DR: Despite its simplicity, BRISQUE is shown to be statistically better than the full-reference peak signal-to-noise ratio and the structural similarity index, and highly competitive with all present-day distortion-generic NR IQA algorithms.
Abstract: We propose a natural scene statistic-based distortion-generic blind/no-reference (NR) image quality assessment (IQA) model that operates in the spatial domain. The new model, dubbed blind/referenceless image spatial quality evaluator (BRISQUE) does not compute distortion-specific features, such as ringing, blur, or blocking, but instead uses scene statistics of locally normalized luminance coefficients to quantify possible losses of “naturalness” in the image due to the presence of distortions, thereby leading to a holistic measure of quality. The underlying features used derive from the empirical distribution of locally normalized luminances and products of locally normalized luminances under a spatial natural scene statistic model. No transformation to another coordinate frame (DCT, wavelet, etc.) is required, distinguishing it from prior NR IQA approaches. Despite its simplicity, we are able to show that BRISQUE is statistically better than the full-reference peak signal-to-noise ratio and the structural similarity index, and is highly competitive with respect to all present-day distortion-generic NR IQA algorithms. BRISQUE has very low computational complexity, making it well suited for real time applications. BRISQUE features may be used for distortion-identification as well. To illustrate a new practical application of BRISQUE, we describe how a nonblind image denoising algorithm can be augmented with BRISQUE in order to perform blind image denoising. Results show that BRISQUE augmentation leads to performance improvements over state-of-the-art methods. A software release of BRISQUE is available online: http://live.ece.utexas.edu/research/quality/BRISQUE_release.zip for public use and evaluation.

3,780 citations
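The locally normalized luminance coefficients at the heart of BRISQUE (often called MSCN coefficients) are simple to compute. A minimal sketch, assuming a Gaussian local window as in common BRISQUE implementations (the window choice is an assumption, not stated in the abstract):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def mscn(image, sigma=7 / 6, c=1.0):
    """Mean-subtracted contrast-normalized (MSCN) coefficients: each pixel
    minus its local Gaussian-weighted mean, divided by the local standard
    deviation plus a small constant c that avoids division by zero."""
    image = np.asarray(image, dtype=np.float64)
    mu = gaussian_filter(image, sigma)                     # local mean
    var = gaussian_filter(image * image, sigma) - mu * mu  # local variance
    return (image - mu) / (np.sqrt(np.abs(var)) + c)       # normalized luminance
```

For natural images these coefficients follow a predictable statistical distribution; distortions perturb that distribution, and BRISQUE's features quantify the deviation, all in the spatial domain with no DCT or wavelet transform.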