Home
/
Authors
/
Philipp Fischer

Author

Philipp Fischer

Other affiliations: Berlin School of Economics and Law

Bio: Philipp Fischer is an academic researcher from University of Freiburg. The author has contributed to research in topics: Supervised learning & Convolutional neural network. The author has an hindex of 12, co-authored 16 publications receiving 51968 citations. Previous affiliations of Philipp Fischer include Berlin School of Economics and Law.

Papers

PDF

Open Access

More filters

Journal Article•DOI•

Discriminative Unsupervised Feature Learning with Exemplar Convolutional Neural Networks

[...]

Alexey Dosovitskiy¹, Philipp Fischer¹, Jost Tobias Springenberg¹, Martin Riedmiller¹, Thomas Brox¹ - Show less +1 more•Institutions (1)

University of Freiburg¹

01 Sep 2016-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: In this article, a set of surrogate classes are formed by applying a variety of transformations to a randomly sampled image patch, and the resulting feature representation is not class specific, but provides robustness to the transformations that have been applied during training.

...read moreread less

Abstract: Deep convolutional networks have proven to be very successful in learning task specific features that allow for unprecedented performance on various computer vision tasks. Training of such networks follows mostly the supervised learning paradigm, where sufficiently many input-output pairs are required for training. Acquisition of large training sets is one of the key challenges, when approaching a new task. In this paper, we aim for generic feature learning and present an approach for training a convolutional network using only unlabeled data. To this end, we train the network to discriminate between a set of surrogate classes. Each surrogate class is formed by applying a variety of transformations to a randomly sampled ‘seed’ image patch. In contrast to supervised network training, the resulting feature representation is not class specific. It rather provides robustness to the transformations that have been applied during training. This generic feature representation allows for classification results that outperform the state of the art for unsupervised learning on several popular datasets (STL-10, CIFAR-10, Caltech-101, Caltech-256). While features learned with our approach cannot compete with class specific features from supervised training on a classification task, we show that they are advantageous on geometric matching problems, where they also outperform the SIFT descriptor.

...read moreread less

702 citations

Posted Content•

Descriptor Matching with Convolutional Neural Networks: a Comparison to SIFT

[...]

Philipp Fischer, Alexey Dosovitskiy, Thomas Brox

22 May 2014-arXiv: Computer Vision and Pattern Recognition

TL;DR: This paper compares features from various layers of convolutional neural nets to standard SIFT descriptors and Surprisingly, convolutionAL neural networks clearly outperform SIFT on descriptor matching.

...read moreread less

Abstract: Latest results indicate that features learned via convolutional neural networks outperform previous descriptors on classification tasks by a large margin. It has been shown that these networks still work well when they are applied to datasets or recognition tasks different from those they were trained on. However, descriptors like SIFT are not only used in recognition but also for many correspondence problems that rely on descriptor matching. In this paper we compare features from various layers of convolutional neural nets to standard SIFT descriptors. We consider a network that was trained on ImageNet and another one that was trained without supervision. Surprisingly, convolutional neural networks clearly outperform SIFT on descriptor matching. This paper has been merged with arXiv:1406.6909

...read moreread less

291 citations

Journal Article•DOI•

A benchmark for comparison of dental radiography analysis algorithms

[...]

Ching-Wei Wang¹, Cheng-Ta Huang¹, Jia-Hong Lee¹, Chung-Hsing Li², Sheng-Wei Chang³, Ming-Jhih Siao³, Tat-Ming Lai, Bulat Ibragimov⁴, Tomaz Vrtovec⁴, Olaf Ronneberger⁵, Philipp Fischer⁵, Timothy F. Cootes⁶, Claudia Lindner⁶ - Show less +9 more•Institutions (6)

National Taiwan University of Science and Technology¹, National Defense Medical Center², Tri-Service General Hospital³, University of Ljubljana⁴, University of Freiburg⁵, University of Manchester⁶

01 Jul 2016-Medical Image Analysis

TL;DR: Based on the quantitative evaluation results, it is believed automatic dental radiography analysis is still a challenging and unsolved problem and the datasets and the evaluation software are made available to the research community, further encouraging future developments in this field.

...read moreread less

246 citations

Journal Article•DOI•

What Makes Good Synthetic Training Data for Learning Disparity and Optical Flow Estimation

[...]

Nikolaus Mayer¹, Eddy Ilg¹, Philipp Fischer¹, Caner Hazirbas², Daniel Cremers², Alexey Dosovitskiy¹, Thomas Brox¹ - Show less +3 more•Institutions (2)

University of Freiburg¹, Technische Universität München²

19 Jan 2018-arXiv: Computer Vision and Pattern Recognition

TL;DR: This paper promotes the use of synthetically generated data for the purpose of training deep networks on visual recognition tasks and suggests multiple ways to generate such data and evaluates the influence of dataset properties on the performance and generalization properties of the resulting networks.

...read moreread less

Abstract: The finding that very large networks can be trained efficiently and reliably has led to a paradigm shift in computer vision from engineered solutions to learning formulations. As a result, the research challenge shifts from devising algorithms to creating suitable and abundant training data for supervised learning. How to efficiently create such training data? The dominant data acquisition method in visual recognition is based on web data and manual annotation. Yet, for many computer vision problems, such as stereo or optical flow estimation, this approach is not feasible because humans cannot manually enter a pixel-accurate flow field. In this paper, we promote the use of synthetically generated data for the purpose of training deep networks on such tasks.We suggest multiple ways to generate such data and evaluate the influence of dataset properties on the performance and generalization properties of the resulting networks. We also demonstrate the benefit of learning schedules that use different types of data at selected stages of the training process.

...read moreread less

158 citations

Journal Article•DOI•

What Makes Good Synthetic Training Data for Learning Disparity and Optical Flow Estimation

[...]

Nikolaus Mayer¹, Eddy Ilg¹, Philipp Fischer¹, Caner Hazirbas², Daniel Cremers², Alexey Dosovitskiy¹, Thomas Brox¹ - Show less +3 more•Institutions (2)

University of Freiburg¹, Technische Universität München²

02 Apr 2018-International Journal of Computer Vision

TL;DR: In this article, the authors promote the use of synthetically generated data for the purpose of training deep networks on such tasks, and suggest multiple ways to generate such data and evaluate the influence of dataset properties on the performance and generalization properties of the resulting networks.

...read moreread less

Abstract: The finding that very large networks can be trained efficiently and reliably has led to a paradigm shift in computer vision from engineered solutions to learning formulations. As a result, the research challenge shifts from devising algorithms to creating suitable and abundant training data for supervised learning. How to efficiently create such training data? The dominant data acquisition method in visual recognition is based on web data and manual annotation. Yet, for many computer vision problems, such as stereo or optical flow estimation, this approach is not feasible because humans cannot manually enter a pixel-accurate flow field. In this paper, we promote the use of synthetically generated data for the purpose of training deep networks on such tasks. We suggest multiple ways to generate such data and evaluate the influence of dataset properties on the performance and generalization properties of the resulting networks. We also demonstrate the benefit of learning schedules that use different types of data at selected stages of the training process.

...read moreread less

84 citations

Cited by

PDF

Open Access

More filters

Proceedings Article•DOI•

Fully convolutional networks for semantic segmentation

[...]

Jonathan Long¹, Evan Shelhamer¹, Trevor Darrell¹•Institutions (1)

University of California, Berkeley¹

07 Jun 2015

TL;DR: The key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.

...read moreread less

Abstract: Convolutional networks are powerful visual models that yield hierarchies of features. We show that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, exceed the state-of-the-art in semantic segmentation. Our key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning. We define and detail the space of fully convolutional networks, explain their application to spatially dense prediction tasks, and draw connections to prior models. We adapt contemporary classification networks (AlexNet [20], the VGG net [31], and GoogLeNet [32]) into fully convolutional networks and transfer their learned representations by fine-tuning [3] to the segmentation task. We then define a skip architecture that combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations. Our fully convolutional network achieves state-of-the-art segmentation of PASCAL VOC (20% relative improvement to 62.2% mean IU on 2012), NYUDv2, and SIFT Flow, while inference takes less than one fifth of a second for a typical image.

...read moreread less

28,225 citations

Proceedings Article•DOI•

Feature Pyramid Networks for Object Detection

[...]

Tsung-Yi Lin¹, Piotr Dollár², Ross Girshick², Kaiming He², Bharath Hariharan², Serge Belongie¹ - Show less +2 more•Institutions (2)

Cornell University¹, Facebook²

21 Jul 2017

TL;DR: This paper exploits the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to construct feature pyramids with marginal extra cost and achieves state-of-the-art single-model results on the COCO detection benchmark without bells and whistles.

...read moreread less

Abstract: Feature pyramids are a basic component in recognition systems for detecting objects at different scales. But pyramid representations have been avoided in recent object detectors that are based on deep convolutional networks, partially because they are slow to compute and memory intensive. In this paper, we exploit the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to construct feature pyramids with marginal extra cost. A top-down architecture with lateral connections is developed for building high-level semantic feature maps at all scales. This architecture, called a Feature Pyramid Network (FPN), shows significant improvement as a generic feature extractor in several applications. Using a basic Faster R-CNN system, our method achieves state-of-the-art single-model results on the COCO detection benchmark without bells and whistles, surpassing all existing single-model entries including those from the COCO 2016 challenge winners. In addition, our method can run at 5 FPS on a GPU and thus is a practical and accurate solution to multi-scale object detection. Code will be made publicly available.

...read moreread less

16,727 citations

Journal Article•DOI•

SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation

[...]

Vijay Badrinarayanan¹, Alex Kendall¹, Roberto Cipolla¹•Institutions (1)

University of Cambridge¹

01 Dec 2017-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: Quantitative assessments show that SegNet provides good performance with competitive inference time and most efficient inference memory-wise as compared to other architectures, including FCN and DeconvNet.

...read moreread less

Abstract: We present a novel and practical deep fully convolutional neural network architecture for semantic pixel-wise segmentation termed SegNet. This core trainable segmentation engine consists of an encoder network, a corresponding decoder network followed by a pixel-wise classification layer. The architecture of the encoder network is topologically identical to the 13 convolutional layers in the VGG16 network [1] . The role of the decoder network is to map the low resolution encoder feature maps to full input resolution feature maps for pixel-wise classification. The novelty of SegNet lies is in the manner in which the decoder upsamples its lower resolution input feature map(s). Specifically, the decoder uses pooling indices computed in the max-pooling step of the corresponding encoder to perform non-linear upsampling. This eliminates the need for learning to upsample. The upsampled maps are sparse and are then convolved with trainable filters to produce dense feature maps. We compare our proposed architecture with the widely adopted FCN [2] and also with the well known DeepLab-LargeFOV [3] , DeconvNet [4] architectures. This comparison reveals the memory versus accuracy trade-off involved in achieving good segmentation performance. SegNet was primarily motivated by scene understanding applications. Hence, it is designed to be efficient both in terms of memory and computational time during inference. It is also significantly smaller in the number of trainable parameters than other competing architectures and can be trained end-to-end using stochastic gradient descent. We also performed a controlled benchmark of SegNet and other architectures on both road scenes and SUN RGB-D indoor scene segmentation tasks. These quantitative assessments show that SegNet provides good performance with competitive inference time and most efficient inference memory-wise as compared to other architectures. We also provide a Caffe implementation of SegNet and a web demo at http://mi.eng.cam.ac.uk/projects/segnet/ .

...read moreread less

13,468 citations

Proceedings Article•DOI•

Image-to-Image Translation with Conditional Adversarial Networks

[...]

Phillip Isola¹, Jun-Yan Zhu¹, Tinghui Zhou¹, Alexei A. Efros¹•Institutions (1)

University of California, Berkeley¹

21 Jul 2017

TL;DR: Conditional adversarial networks are investigated as a general-purpose solution to image-to-image translation problems and it is demonstrated that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks.

...read moreread less

Abstract: We investigate conditional adversarial networks as a general-purpose solution to image-to-image translation problems. These networks not only learn the mapping from input image to output image, but also learn a loss function to train this mapping. This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations. We demonstrate that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks. Moreover, since the release of the pix2pix software associated with this paper, hundreds of twitter users have posted their own artistic experiments using our system. As a community, we no longer hand-engineer our mapping functions, and this work suggests we can achieve reasonable results without handengineering our loss functions either.

...read moreread less

11,958 citations

Posted Content•

Image-to-Image Translation with Conditional Adversarial Networks

[...]

Phillip Isola¹, Jun-Yan Zhu¹, Tinghui Zhou¹, Alexei A. Efros¹•Institutions (1)

University of California, Berkeley¹

21 Nov 2016-arXiv: Computer Vision and Pattern Recognition

TL;DR: Conditional Adversarial Network (CA) as discussed by the authors is a general-purpose solution to image-to-image translation problems, which can be used to synthesize photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks.

...read moreread less

Abstract: We investigate conditional adversarial networks as a general-purpose solution to image-to-image translation problems. These networks not only learn the mapping from input image to output image, but also learn a loss function to train this mapping. This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations. We demonstrate that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks. Indeed, since the release of the pix2pix software associated with this paper, a large number of internet users (many of them artists) have posted their own experiments with our system, further demonstrating its wide applicability and ease of adoption without the need for parameter tweaking. As a community, we no longer hand-engineer our mapping functions, and this work suggests we can achieve reasonable results without hand-engineering our loss functions either.

...read moreread less

11,127 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse