Jose M. Alvarez

Other affiliations: Australian National University, Courant Institute of Mathematical Sciences, Commonwealth Scientific and Industrial Research Organisation ...

Bio: Jose M. Alvarez is an academic researcher at Nvidia. The author has contributed to research on topics including object detection and computer science. The author has an h-index of 29 and has co-authored 124 publications that have received 4720 citations. Previous affiliations of Jose M. Alvarez include Australian National University and Courant Institute of Mathematical Sciences.

Papers published on a yearly basis (chart covering 2007 to 2023)

Papers

Journal Article

ERFNet: Efficient Residual Factorized ConvNet for Real-Time Semantic Segmentation

Eduardo Romera¹, Jose M. Alvarez², Luis M. Bergasa¹, Roberto Arroyo¹

University of Alcalá¹, Commonwealth Scientific and Industrial Research Organisation²

01 Jan 2018-IEEE Transactions on Intelligent Transportation Systems

TL;DR: This work proposes a deep architecture that runs in real time while providing accurate semantic segmentation. Its core is a novel layer that uses residual connections and factorized convolutions to remain efficient while retaining remarkable accuracy.

Abstract: Semantic segmentation is a challenging task that addresses most of the perception needs of intelligent vehicles (IVs) in a unified way. Deep neural networks excel at this task, as they can be trained end-to-end to accurately classify multiple object categories in an image at pixel level. However, a good tradeoff between high quality and computational resources is not yet present in the state-of-the-art semantic segmentation approaches, limiting their application in real vehicles. In this paper, we propose a deep architecture that is able to run in real time while providing accurate semantic segmentation. The core of our architecture is a novel layer that uses residual connections and factorized convolutions in order to remain efficient while retaining remarkable accuracy. Our approach is able to run at over 83 FPS on a single Titan X, and 7 FPS on a Jetson TX1 (embedded device). A comprehensive set of experiments on the publicly available Cityscapes data set demonstrates that our system achieves an accuracy that is similar to the state of the art, while being orders of magnitude faster to compute than other architectures that achieve top precision. The resulting tradeoff makes our model an ideal approach for scene understanding in IV applications. The code is publicly available at: https://github.com/Eromera/erfnet

1,134 citations
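
The ERFNet abstract above credits factorized convolutions for the layer's efficiency. As a rough illustration of why factorization saves parameters, the sketch below compares a full 3x3 convolution with a 3x1 convolution followed by a 1x3 convolution. The 3x1/1x3 split, the equal input and output channel counts, and the helper names are assumptions made for this example; the abstract does not spell out the exact factorization.

```python
def conv2d_params(c_in, c_out, kh, kw, bias=True):
    """Count learnable parameters of a 2D convolution with a kh x kw kernel."""
    return c_in * c_out * kh * kw + (c_out if bias else 0)


def full_3x3_params(channels):
    """Parameters of one 3x3 convolution keeping the channel count fixed."""
    return conv2d_params(channels, channels, 3, 3)


def factorized_3x3_params(channels):
    """Parameters of a 3x1 convolution followed by a 1x3 convolution.

    Assumed factorization for illustration only (hypothetical helper).
    """
    return (conv2d_params(channels, channels, 3, 1)
            + conv2d_params(channels, channels, 1, 3))


if __name__ == "__main__":
    channels = 64
    full = full_3x3_params(channels)
    factorized = factorized_3x3_params(channels)
    print(f"full 3x3:   {full} parameters")
    print(f"factorized: {factorized} parameters")
    print(f"ratio:      {factorized / full:.3f}")
```

Under these assumptions the factorized pair needs about two thirds of the parameters of the full kernel while covering the same 3x3 receptive field.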

Posted Content

Invertible Conditional GANs for image editing.

Guim Perarnau, Joost van de Weijer, Bogdan Raducanu, Jose M. Alvarez

19 Nov 2016-arXiv: Computer Vision and Pattern Recognition

TL;DR: This work evaluates encoders that invert the mapping of a cGAN, i.e., that map a real image into a latent space and a conditional representation, which makes it possible to reconstruct and modify real images of faces conditioned on arbitrary attributes.

Abstract: Generative Adversarial Networks (GANs) have recently been shown to successfully approximate complex data distributions. A relevant extension of this model is conditional GANs (cGANs), where the introduction of external information makes it possible to determine specific representations of the generated images. In this work, we evaluate encoders to invert the mapping of a cGAN, i.e., mapping a real image into a latent space and a conditional representation. This makes it possible, for example, to reconstruct and modify real images of faces conditioned on arbitrary attributes. Additionally, we evaluate the design of cGANs. The combination of an encoder with a cGAN, which we call Invertible cGAN (IcGAN), makes it possible to re-generate real images with deterministic complex modifications.

627 citations

Book Chapter

Less is More: Towards Compact CNNs

Hao Zhou¹, Jose M. Alvarez², Fatih Porikli²,³

University of Maryland, College Park¹, Commonwealth Scientific and Industrial Research Organisation², Australian National University³

08 Oct 2016

TL;DR: This work shows that, by incorporating sparse constraints into the objective function, it is possible to decimate the number of neurons during the training stage. As a result, the number of parameters and the memory footprint of the neural network are reduced, which is desirable at test time.

Abstract: To attain a favorable performance on large-scale datasets, convolutional neural networks (CNNs) are usually designed to have very high capacity involving millions of parameters. In this work, we aim at optimizing the number of neurons in a network, thus the number of parameters. We show that, by incorporating sparse constraints into the objective function, it is possible to decimate the number of neurons during the training stage. As a result, the number of parameters and the memory footprint of the neural network are also reduced, which is also desirable at test time. We evaluated our method on several well-known CNN structures including AlexNet and VGG over different datasets including ImageNet. Extensive experimental results demonstrate that our method leads to compact networks. Taking the first fully connected layer as an example, our compact CNN contains only 30% of the original neurons without any degradation of the top-1 classification accuracy.

350 citations

Journal Article

Road Detection Based on Illuminant Invariance

Jose M. Alvarez, Antonio M. López

01 Mar 2011-IEEE Transactions on Intelligent Transportation Systems

TL;DR: In this article, a shadow-invariant feature space combined with a model-based classifier is used to detect the free road surface ahead of the ego-vehicle.

Abstract: By using an onboard camera, it is possible to detect the free road surface ahead of the ego-vehicle. Road detection is of high relevance for autonomous driving, road departure warning, and supporting driver-assistance systems such as vehicle and pedestrian detection. The key for vision-based road detection is the ability to classify image pixels as belonging or not to the road surface. Identifying road pixels is a major challenge due to the intraclass variability caused by lighting conditions. A particularly difficult scenario appears when the road surface has both shadowed and nonshadowed areas. Accordingly, we propose a novel approach to vision-based road detection that is robust to shadows. The novelty of our approach relies on using a shadow-invariant feature space combined with a model-based classifier. The model is built online to improve the adaptability of the algorithm to the current lighting and the presence of other vehicles in the scene. The proposed algorithm works in still images and does not depend on either road shape or temporal restrictions. Quantitative and qualitative experiments on real-world road sequences with heavy traffic and shadows show that the method is robust to shadows and lighting variations. Moreover, the proposed method provides the highest performance when compared with hue-saturation-intensity (HSI)-based algorithms.

327 citations

Posted Content

Learning the Number of Neurons in Deep Networks.

Jose M. Alvarez¹, Mathieu Salzmann²

Australian National University¹, École Polytechnique Fédérale de Lausanne²

19 Nov 2016-arXiv: Computer Vision and Pattern Recognition

TL;DR: This paper proposes to make use of a group sparsity regularizer on the parameters of the network, where each group is defined to act on a single neuron, and shows that this approach can reduce the number of parameters by up to 80% while retaining or even improving the network accuracy.

Abstract: Nowadays, the number of layers and of neurons in each layer of a deep network are typically set manually. While very deep and wide networks have proven effective in general, they come at a high memory and computation cost, thus making them impractical for constrained platforms. These networks, however, are known to have many redundant parameters, and could thus, in principle, be replaced by more compact architectures. In this paper, we introduce an approach to automatically determining the number of neurons in each layer of a deep network during learning. To this end, we propose to make use of structured sparsity during learning. More precisely, we use a group sparsity regularizer on the parameters of the network, where each group is defined to act on a single neuron. Starting from an overcomplete network, we show that our approach can reduce the number of parameters by up to 80% while retaining or even improving the network accuracy.

307 citations
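
The abstract above describes a group sparsity regularizer in which each group covers the parameters of a single neuron. A minimal sketch of that idea follows, assuming a layer stored as one weight row per neuron and an unweighted sum of per-row L2 norms. The function names, the plain-list representation, and the group soft-thresholding step (a common way to optimize this kind of penalty, not confirmed here as the authors' procedure) are all assumptions for illustration.

```python
import math


def row_norm(row):
    """L2 norm of one neuron's weight vector."""
    return math.sqrt(sum(w * w for w in row))


def group_sparsity_penalty(weights, lam):
    """lam times the sum of per-neuron L2 norms (one group per row)."""
    return lam * sum(row_norm(row) for row in weights)


def group_soft_threshold(weights, threshold):
    """Shrink each neuron's weights toward zero as a group.

    Rows whose norm is at most `threshold` become exactly zero, which is
    what lets whole neurons be removed. Assumed optimization step.
    """
    shrunk = []
    for row in weights:
        norm = row_norm(row)
        scale = max(0.0, 1.0 - threshold / norm) if norm > 0 else 0.0
        shrunk.append([scale * w for w in row])
    return shrunk


def active_neurons(weights):
    """Count neurons that still have at least one nonzero weight."""
    return sum(1 for row in weights if any(w != 0.0 for w in row))


if __name__ == "__main__":
    # Hypothetical layer with three neurons and two inputs each.
    layer = [[3.0, 4.0], [0.1, 0.0], [0.0, 0.0]]
    print("penalty:", group_sparsity_penalty(layer, lam=0.5))
    pruned = group_soft_threshold(layer, threshold=1.0)
    print("neurons kept:", active_neurons(pruned), "of", len(layer))
```

Because the penalty acts on whole rows rather than individual weights, the thresholding step zeroes entire neurons at once, and those neurons can then be dropped from the layer.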

Cited by

Journal Article

Study report on "Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks"

杉山拓海

12 Sep 2017-Computers & Graphics

3,940 citations

The PASCAL Visual Object Classes Challenge

Jianguo Zhang

01 Jan 2006

3,012 citations

Journal Article

Brain tumor segmentation with Deep Neural Networks

Mohammad Havaei¹, Axel Davy², David Warde-Farley³, Antoine Biard³, Aaron Courville³, Yoshua Bengio³, Chris Pal⁴, Pierre-Marc Jodoin¹, Hugo Larochelle¹

Université de Sherbrooke¹, École Normale Supérieure², Université de Montréal³, École Polytechnique de Montréal⁴

01 Jan 2017-Medical Image Analysis

TL;DR: A fast and accurate fully automatic method for brain tumor segmentation that is competitive with the state of the art in both accuracy and speed. It introduces a novel cascaded architecture that allows the system to more accurately model local label dependencies.

2,538 citations

Proceedings Article

StarGAN: Unified Generative Adversarial Networks for Multi-domain Image-to-Image Translation

Yunjey Choi¹, Minje Choi¹, Munyoung Kim², Jung-Woo Ha³, Sunghun Kim⁴, Jaegul Choo¹

Korea University¹, The College of New Jersey², Naver Corporation³, Hong Kong University of Science and Technology⁴

18 Jun 2018

TL;DR: StarGAN is a unified model architecture that performs image-to-image translation for multiple domains using only a single model. This leads to superior quality of translated images compared to existing models, as well as the capability of flexibly translating an input image to any desired target domain.

Abstract: Recent studies have shown remarkable success in image-to-image translation for two domains. However, existing approaches have limited scalability and robustness in handling more than two domains, since different models should be built independently for every pair of image domains. To address this limitation, we propose StarGAN, a novel and scalable approach that can perform image-to-image translations for multiple domains using only a single model. Such a unified model architecture of StarGAN allows simultaneous training of multiple datasets with different domains within a single network. This leads to StarGAN's superior quality of translated images compared to existing models as well as the novel capability of flexibly translating an input image to any desired target domain. We empirically demonstrate the effectiveness of our approach on facial attribute transfer and facial expression synthesis tasks.

2,479 citations

Proceedings Article

PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume

Deqing Sun¹, Xiaodong Yang¹, Ming-Yu Liu¹, Jan Kautz¹

Nvidia¹

01 Jun 2018

TL;DR: PWC-Net uses the current optical flow estimate to warp the CNN features of the second image, then builds a cost volume from the warped features and the features of the first image; a CNN processes this cost volume to estimate the optical flow. It achieves state-of-the-art performance on the MPI Sintel final pass and KITTI 2015 benchmarks.

Abstract: We present a compact but effective CNN model for optical flow, called PWC-Net. PWC-Net has been designed according to simple and well-established principles: pyramidal processing, warping, and the use of a cost volume. Cast in a learnable feature pyramid, PWC-Net uses the current optical flow estimate to warp the CNN features of the second image. It then uses the warped features and features of the first image to construct a cost volume, which is processed by a CNN to estimate the optical flow. PWC-Net is 17 times smaller in size and easier to train than the recent FlowNet2 model. Moreover, it outperforms all published optical flow methods on the MPI Sintel final pass and KITTI 2015 benchmarks, running at about 35 fps on Sintel resolution (1024 × 436) images. Our models are available on our project website.

2,231 citations
