Author

Fitsum A. Reda

Other affiliations: Google, Siemens, Vanderbilt University
Bio: Fitsum A. Reda is an academic researcher at Nvidia. The author has contributed to research on topics including computer science and image segmentation, has an h-index of 13, and has co-authored 39 publications that have received 2,211 citations. Previous affiliations of Fitsum A. Reda include Google and Siemens.

Papers
Book Chapter
Guilin Liu, Fitsum A. Reda, Kevin J. Shih, Ting-Chun Wang, Andrew Tao, Bryan Catanzaro
08 Sep 2018
TL;DR: This work proposes partial convolutions, in which the convolution is masked and renormalized so that it is conditioned only on valid pixels, and outperforms other methods on irregular masks.
Abstract: Existing deep learning based image inpainting methods use a standard convolutional network over the corrupted image, with convolutional filter responses conditioned on both valid pixels and the substitute values in the masked holes (typically the mean value). This often leads to artifacts such as color discrepancy and blurriness. Post-processing is usually used to reduce such artifacts, but it is expensive and may fail. We propose the use of partial convolutions, where the convolution is masked and renormalized to be conditioned on only valid pixels. We further include a mechanism to automatically generate an updated mask for the next layer as part of the forward pass. Our model outperforms other methods for irregular masks. We show qualitative and quantitative comparisons with other methods to validate our approach.
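To make the renormalization concrete, here is a minimal PyTorch sketch of one partial-convolution step, assuming a binary mask with the same shape as the input; the function name partial_conv2d and its signature are illustrative, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def partial_conv2d(x, mask, weight, bias=None, stride=1, padding=1):
    """One partial-convolution step (sketch): convolve only valid pixels
    (mask == 1), renormalize by the valid fraction under each window,
    and return the updated mask for the next layer."""
    # Convolve the masked input so hole pixels contribute nothing.
    out = F.conv2d(x * mask, weight, stride=stride, padding=padding)

    # Count valid input pixels under each sliding window.
    with torch.no_grad():
        ones = torch.ones(1, x.shape[1], weight.shape[2], weight.shape[3],
                          device=x.device, dtype=x.dtype)
        valid = F.conv2d(mask, ones, stride=stride, padding=padding)

    # Renormalize by (window size / valid count); zero all-hole windows.
    window_size = x.shape[1] * weight.shape[2] * weight.shape[3]
    hole = valid == 0
    out = out * (window_size / valid.clamp(min=1.0))
    out = out.masked_fill(hole, 0.0)
    if bias is not None:
        out = out + bias.view(1, -1, 1, 1) * (~hole).to(out.dtype)

    # Mask update: a location becomes valid if any pixel in its window was.
    return out, (~hole).to(mask.dtype)

# Usage: a 64x64 RGB image with a square hole and random toy weights.
x = torch.randn(1, 3, 64, 64)
mask = torch.ones_like(x)
mask[:, :, 16:48, 16:48] = 0
y, new_mask = partial_conv2d(x, mask, torch.randn(32, 3, 3, 3) * 0.1)
```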

1,606 citations

Posted Content
Guilin Liu, Fitsum A. Reda, Kevin J. Shih, Ting-Chun Wang, Andrew Tao, Bryan Catanzaro
TL;DR: In this paper, the convolution is masked and renormalized to be conditioned on only valid pixels, and a mechanism is proposed to automatically generate an updated mask for the next layer as part of the forward pass.

536 citations

Proceedings Article
15 Jun 2019
TL;DR: In this article, a video prediction-based methodology is proposed to scale up training sets by synthesizing new training samples and thereby improve the accuracy of semantic segmentation networks; the approach achieves state-of-the-art mIoUs of 83.5% on Cityscapes and 82.9% on CamVid.
Abstract: Semantic segmentation requires large amounts of pixel-wise annotations to learn accurate models. In this paper, we present a video prediction-based methodology to scale up training sets by synthesizing new training samples in order to improve the accuracy of semantic segmentation networks. We exploit video prediction models' ability to predict future frames in order to also predict future labels. A joint propagation strategy is also proposed to alleviate mis-alignments in synthesized samples. We demonstrate that training segmentation models on datasets augmented by the synthesized samples leads to significant improvements in accuracy. Furthermore, we introduce a novel boundary label relaxation technique that makes training robust to annotation noise and propagation artifacts along object boundaries. Our proposed methods achieve state-of-the-art mIoUs of 83.5% on Cityscapes and 82.9% on CamVid. Our single model, without model ensembles, achieves 72.8% mIoU on the KITTI semantic segmentation test set, which surpasses the winning entry of the ROB challenge 2018.
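The boundary label relaxation admits a compact formulation: along boundaries, the target becomes the union of the classes present in a small neighborhood, and the loss maximizes the summed probability of that union, -log(sum of P(c) over the union). Below is a hedged PyTorch sketch; the helper names and the 3x3 window are illustrative assumptions, not the paper's exact code.

```python
import torch
import torch.nn.functional as F

def union_targets(labels, num_classes, border=3):
    """Multi-hot union targets: max-pool the one-hot labels so each pixel's
    target contains every class in a (border x border) window. Away from
    boundaries this reduces to the usual single hard label."""
    one_hot = F.one_hot(labels, num_classes).permute(0, 3, 1, 2).float()
    return F.max_pool2d(one_hot, kernel_size=border, stride=1,
                        padding=border // 2)

def relaxed_boundary_loss(logits, union, eps=1e-8):
    """loss = -log( sum of P(c) over the classes in the union target )."""
    probs = F.softmax(logits, dim=1)
    union_prob = (probs * union).sum(dim=1).clamp(min=eps)
    return -torch.log(union_prob).mean()

# Usage with Cityscapes-like shapes (19 classes).
labels = torch.randint(0, 19, (2, 128, 128))
logits = torch.randn(2, 19, 128, 128)
loss = relaxed_boundary_loss(logits, union_targets(labels, 19))
```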

294 citations

Book Chapter
08 Sep 2018
TL;DR: The spatially-displaced convolution (SDC) module for video frame prediction inherits the merits of both vector-based and kernel-based approaches while ameliorating their respective disadvantages.
Abstract: We present an approach for high-resolution video frame prediction by conditioning on both past frames and past optical flows. Previous approaches rely on resampling past frames, guided by a learned future optical flow, or on direct generation of pixels. Resampling based on flow is insufficient because it cannot deal with disocclusions. Generative models currently lead to blurry results. Recent approaches synthesize a pixel by convolving input patches with a predicted kernel. However, their memory requirement increases with kernel size. Here, we present a spatially-displaced convolution (SDC) module for video frame prediction. We learn a motion vector and a kernel for each pixel and synthesize a pixel by applying the kernel at a displaced location in the source image, defined by the predicted motion vector. Our approach inherits the merits of both vector-based and kernel-based approaches, while ameliorating their respective disadvantages. We train our model on 428K unlabelled 1080p video game frames. Our approach produces state-of-the-art results, achieving an SSIM score of 0.904 on high-definition YouTube-8M videos and 0.918 on Caltech Pedestrian videos. Our model handles large motion effectively and synthesizes crisp frames with consistent motion.
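The core operation can be sketched directly from that description: a predicted per-pixel motion vector displaces the sampling location, and a predicted per-pixel kernel is applied around it. The PyTorch sketch below loops over kernel taps with grid_sample for readability, a slow stand-in for a fused implementation; all names (sdc_synthesis, flow, kernels) are illustrative.

```python
import torch
import torch.nn.functional as F

def sdc_synthesis(src, flow, kernels, k=5):
    """Spatially-displaced convolution (sketch).
    src:     (N, C, H, W) source frame
    flow:    (N, 2, H, W) per-pixel motion vectors (u, v) in pixels
    kernels: (N, k*k, H, W) per-pixel kernel weights (e.g. softmaxed)"""
    n, c, h, w = src.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys)).float().unsqueeze(0)   # (1, 2, H, W)

    out = torch.zeros_like(src)
    r, idx = k // 2, 0
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            # Displaced sampling position for this kernel tap.
            px = base[:, 0] + flow[:, 0] + dx
            py = base[:, 1] + flow[:, 1] + dy
            # Normalize pixel coordinates to [-1, 1] for grid_sample.
            gx = 2.0 * px / (w - 1) - 1.0
            gy = 2.0 * py / (h - 1) - 1.0
            grid = torch.stack((gx, gy), dim=-1)        # (N, H, W, 2)
            tap = F.grid_sample(src, grid, align_corners=True)
            out = out + kernels[:, idx:idx + 1] * tap
            idx += 1
    return out

# Usage: zero flow and softmax-normalized kernels on a toy frame.
src = torch.randn(1, 3, 64, 64)
flow = torch.zeros(1, 2, 64, 64)
kernels = F.softmax(torch.randn(1, 25, 64, 64), dim=1)
pred = sdc_synthesis(src, flow, kernels)
```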

131 citations

Journal Article
TL;DR: This is the first clinical implementation of a minimally invasive image-guided approach to cochlear implantation that involves drilling a narrow, linear tunnel to the cochlea.
Abstract: OBJECTIVE: The minimally invasive image-guided approach to cochlear implantation (CI) involves drilling a narrow, linear tunnel to the cochlea. Reported herein is the first clinical implementation of this approach.

114 citations


Cited by
Proceedings Article
17 Jun 2020
TL;DR: In this paper, the authors propose to leverage periodic activation functions for implicit neural representations and demonstrate that these networks, dubbed sinusoidal representation networks or Sirens, are ideally suited for representing complex natural signals and their derivatives.
Abstract: Implicitly defined, continuous, differentiable signal representations parameterized by neural networks have emerged as a powerful paradigm, offering many possible benefits over conventional representations. However, current network architectures for such implicit neural representations are incapable of modeling signals with fine detail, and fail to represent a signal's spatial and temporal derivatives, despite the fact that these are essential to many physical signals defined implicitly as the solution to partial differential equations. We propose to leverage periodic activation functions for implicit neural representations and demonstrate that these networks, dubbed sinusoidal representation networks or Sirens, are ideally suited for representing complex natural signals and their derivatives. We analyze Siren activation statistics to propose a principled initialization scheme and demonstrate the representation of images, wavefields, video, sound, and their derivatives. Further, we show how Sirens can be leveraged to solve challenging boundary value problems, such as particular Eikonal equations (yielding signed distance functions), the Poisson equation, and the Helmholtz and wave equations. Lastly, we combine Sirens with hypernetworks to learn priors over the space of Siren functions.
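Concretely, a Siren layer is a linear map followed by sin(w0 * x), with a scaled uniform weight initialization; the paper uses w0 = 30. Here is a minimal PyTorch sketch, with the class name and the small coordinate-to-RGB network as illustrative choices.

```python
import torch
from torch import nn

class SineLayer(nn.Module):
    """One Siren layer: sin(w0 * (Wx + b)), initialized so activations stay
    well-distributed through depth (first layer U(-1/n, 1/n), later layers
    U(-sqrt(6/n)/w0, sqrt(6/n)/w0))."""
    def __init__(self, in_features, out_features, w0=30.0, is_first=False):
        super().__init__()
        self.w0 = w0
        self.linear = nn.Linear(in_features, out_features)
        with torch.no_grad():
            bound = (1.0 / in_features if is_first
                     else (6.0 / in_features) ** 0.5 / w0)
            self.linear.weight.uniform_(-bound, bound)

    def forward(self, x):
        return torch.sin(self.w0 * self.linear(x))

# A small Siren mapping 2-D coordinates to RGB, e.g. to fit a single image.
siren = nn.Sequential(
    SineLayer(2, 256, is_first=True),
    SineLayer(256, 256),
    SineLayer(256, 256),
    nn.Linear(256, 3),
)
coords = torch.rand(1024, 2) * 2 - 1   # coordinates in [-1, 1]
rgb = siren(coords)
```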

1,058 citations

Book Chapter
23 Aug 2020
TL;DR: This paper addresses the semantic segmentation problem with a focus on the context aggregation strategy, presenting a simple yet effective approach, object-contextual representations, which characterizes a pixel by exploiting the representation of the corresponding object class.

Abstract: In this paper, we study the context aggregation problem in semantic segmentation. Motivated by the fact that the label of a pixel is the category of the object that the pixel belongs to, we present a simple yet effective approach, object-contextual representations, characterizing a pixel by exploiting the representation of the corresponding object class. First, we learn object regions under the supervision of the ground-truth segmentation. Second, we compute the object region representation by aggregating the representations of the pixels lying in the object region. Last, we compute the relation between each pixel and each object region, and augment the representation of each pixel with the object-contextual representation, which is a weighted aggregation of all the object region representations. We empirically demonstrate that our method achieves competitive performance on various benchmarks: Cityscapes, ADE20K, LIP, PASCAL-Context and COCO-Stuff. Our submission "HRNet + OCR + SegFix" achieved 1st place on the Cityscapes leaderboard by the ECCV 2020 submission deadline. Code is available at: https://git.io/openseg and https://git.io/HRNet.OCR.
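Those three steps translate almost line-for-line into tensor operations: pool pixel features into per-class object-region vectors, compute pixel-to-region attention, and concatenate the attended context back onto each pixel. The sketch below is a simplified PyTorch illustration that omits the 1x1 transformation layers of the full method; the function name and shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def object_contextual_representation(pixel_feats, region_logits):
    """OCR aggregation (simplified sketch).
    pixel_feats:   (N, C, H, W) pixel representations
    region_logits: (N, K, H, W) coarse soft object regions, one per class"""
    n, c, h, w = pixel_feats.shape
    x = pixel_feats.flatten(2)                        # (N, C, H*W)
    m = F.softmax(region_logits.flatten(2), dim=2)    # normalize over pixels

    # 1) Object-region representations: weighted sums of pixel features.
    regions = torch.einsum("nkp,ncp->nkc", m, x)      # (N, K, C)

    # 2) Pixel-region relation: attention from each pixel to each region.
    sim = torch.einsum("ncp,nkc->npk", x, regions) / c ** 0.5
    attn = F.softmax(sim, dim=2)                      # (N, H*W, K)

    # 3) Object-contextual representation, concatenated with the original.
    ocr = torch.einsum("npk,nkc->ncp", attn, regions).reshape(n, c, h, w)
    return torch.cat([pixel_feats, ocr], dim=1)       # (N, 2C, H, W)

# Usage with toy shapes: 19 classes, 512-channel features.
feats = torch.randn(2, 512, 32, 32)
coarse = torch.randn(2, 19, 32, 32)
augmented = object_contextual_representation(feats, coarse)
```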

952 citations

Proceedings Article
22 Oct 2019
TL;DR: Yu et al. propose a generative image inpainting system to complete images with free-form masks and guidance, based on gated convolutions learned from millions of images without additional labeling effort.
Abstract: We present a generative image inpainting system to complete images with free-form mask and guidance. The system is based on gated convolutions learned from millions of images without additional labelling efforts. The proposed gated convolution solves the issue of vanilla convolution, which treats all input pixels as valid ones, and generalizes partial convolution by providing a learnable dynamic feature selection mechanism for each channel at each spatial location across all layers. Moreover, as free-form masks may appear anywhere in images with any shape, global and local GANs designed for a single rectangular mask are not applicable. Thus, we also present a patch-based GAN loss, named SN-PatchGAN, formed by applying a spectral-normalized discriminator on dense image patches. SN-PatchGAN is simple in formulation, fast and stable in training. Results on automatic image inpainting and user-guided extension demonstrate that our system generates higher-quality and more flexible results than previous methods. Our system helps users quickly remove distracting objects, modify image layouts, clear watermarks and edit faces. Code, demo and models are available at: https://github.com/JiahuiYu/generative_inpainting.
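The gated convolution itself is compact: one convolution produces features, a parallel convolution produces a per-channel, per-location gate in (0, 1), and the two are multiplied, so "validity" becomes a learned soft quantity rather than a hard binary mask. A minimal PyTorch sketch follows; the class name and the ELU feature activation are illustrative choices, not the exact released code.

```python
import torch
import torch.nn.functional as F
from torch import nn

class GatedConv2d(nn.Module):
    """Gated convolution (sketch): out = elu(conv_f(x)) * sigmoid(conv_g(x)),
    i.e. a learnable dynamic feature selection per channel and location."""
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        self.feature = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding)
        self.gate = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding)

    def forward(self, x):
        return F.elu(self.feature(x)) * torch.sigmoid(self.gate(x))

# Typical inpainting input: masked RGB image concatenated with its mask.
layer = GatedConv2d(4, 32)
x = torch.randn(1, 4, 128, 128)      # 3 color channels + 1 mask channel
y = layer(x)                          # (1, 32, 128, 128)
```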

904 citations

Journal Article
TL;DR: This paper provides an extensive review of deep learning-based self-supervised methods for learning general visual features from images and videos, a subset of unsupervised learning that learns image and video features from large-scale unlabeled data without using any human-annotated labels.
Abstract: Large-scale labeled data are generally required to train deep neural networks in order to obtain better performance in visual feature learning from images or videos for computer vision applications. To avoid the extensive cost of collecting and annotating large-scale datasets, self-supervised learning methods, a subset of unsupervised learning methods, have been proposed to learn general image and video features from large-scale unlabeled data without using any human-annotated labels. This paper provides an extensive review of deep learning-based self-supervised general visual feature learning methods from images or videos. First, the motivation, general pipeline, and terminology of this field are described. Then the common deep neural network architectures that are used for self-supervised learning are summarized. Next, the schema and evaluation metrics of self-supervised learning methods are reviewed, followed by the commonly used datasets for images, videos, audio, and 3D data, as well as the existing self-supervised visual feature learning methods. Finally, quantitative performance comparisons of the reviewed methods on benchmark datasets are summarized and discussed for both image and video feature learning, and the paper concludes with a set of promising future directions for self-supervised visual feature learning.

876 citations