Timo Bolkart

Other affiliations: Ben-Gurion University of the Negev, Saarland University

Bio: Timo Bolkart is an academic researcher from the Max Planck Society. The author has contributed to research on the topics Face (geometry) and Computer science. The author has an h-index of 20 and has co-authored 43 publications that have received 1957 citations. Previous affiliations of Timo Bolkart include Ben-Gurion University of the Negev and Saarland University.

Papers published on a yearly basis (chart omitted; years shown: 2012 to 2023)

Papers

Journal Article

Learning a model of facial shape and expression from 4D scans

Tianye Li¹, Timo Bolkart¹, Michael J. Black¹, Hao Li², Javier Romero

Max Planck Society¹, Institute for Creative Technologies²

20 Nov 2017-ACM Transactions on Graphics

TL;DR: FLAME (Faces Learned with an Articulated Model and Expressions) is low-dimensional but more expressive than the FaceWarehouse model and the Basel Face Model. The paper compares FLAME to these models by fitting all of them to static 3D scans and 4D sequences using the same optimization method.

Abstract: The field of 3D face modeling has a large gap between high-end and low-end methods. At the high end, the best facial animation is indistinguishable from real humans, but this comes at the cost of extensive manual labor. At the low end, face capture from consumer depth sensors relies on 3D face models that are not expressive enough to capture the variability in natural facial shape and expression. We seek a middle ground by learning a facial model from thousands of accurately aligned 3D scans. Our FLAME model (Faces Learned with an Articulated Model and Expressions) is designed to work with existing graphics software and be easy to fit to data. FLAME uses a linear shape space trained from 3800 scans of human heads. FLAME combines this linear shape space with an articulated jaw, neck, and eyeballs, pose-dependent corrective blendshapes, and additional global expression blendshapes. The pose and expression dependent articulations are learned from 4D face sequences in the D3DFACS dataset along with additional 4D sequences. We accurately register a template mesh to the scan sequences and make the D3DFACS registrations available for research purposes. In total the model is trained from over 33,000 scans. FLAME is low-dimensional but more expressive than the FaceWarehouse model and the Basel Face Model. We compare FLAME to these models by fitting them to static 3D scans and 4D sequences using the same optimization method. FLAME is significantly more accurate and is available for research purposes (http://flame.is.tue.mpg.de).

629 citations

Proceedings Article

Expressive Body Capture: 3D Hands, Face, and Body From a Single Image

Georgios Pavlakos¹, Vasileios Choutas², Nima Ghorbani², Timo Bolkart², Ahmed A. A. Osman², Dimitrios Tzionas², Michael J. Black²

University of Pennsylvania¹, Max Planck Society²

15 Jun 2019

TL;DR: This work computes a 3D model of human body pose, hand pose, and facial expression from a single monocular image using SMPL-X, a unified body model trained on thousands of 3D scans.

Abstract: To facilitate the analysis of human actions, interactions and emotions, we compute a 3D model of human body pose, hand pose, and facial expression from a single monocular image. To achieve this, we use thousands of 3D scans to train a new, unified, 3D model of the human body, SMPL-X, that extends SMPL with fully articulated hands and an expressive face. Learning to regress the parameters of SMPL-X directly from images is challenging without paired images and 3D ground truth. Consequently, we follow the approach of SMPLify, which estimates 2D features and then optimizes model parameters to fit the features. We improve on SMPLify in several significant ways: (1) we detect 2D features corresponding to the face, hands, and feet and fit the full SMPL-X model to these; (2) we train a new neural network pose prior using a large MoCap dataset; (3) we define a new interpenetration penalty that is both fast and accurate; (4) we automatically detect gender and the appropriate body models (male, female, or neutral); (5) our PyTorch implementation achieves a speedup of more than 8x over Chumpy. We use the new method, SMPLify-X, to fit SMPL-X to both controlled images and images in the wild. We evaluate 3D accuracy on a new curated dataset comprising 100 images with pseudo ground-truth. This is a step towards automatic expressive human capture from monocular RGB data. The models, code, and data are available for research purposes at https://smpl-x.is.tue.mpg.de.

551 citations

Posted Content

Expressive Body Capture: 3D Hands, Face, and Body from a Single Image

Georgios Pavlakos¹, Vasileios Choutas², Nima Ghorbani², Timo Bolkart², Ahmed A. A. Osman², Dimitrios Tzionas², Michael J. Black²

University of Pennsylvania¹, Max Planck Society²

11 Apr 2019-arXiv: Computer Vision and Pattern Recognition

TL;DR: This work uses the new method, SMPLify-X, to fit SMPL-X to both controlled images and images in the wild, and evaluates 3D accuracy on a new curated dataset comprising 100 images with pseudo ground-truth.

438 citations

Book Chapter

Generating 3D Faces using Convolutional Mesh Autoencoders

Anurag Ranjan¹, Timo Bolkart¹, Soubhik Sanyal¹, Michael J. Black¹

Max Planck Society¹

08 Sep 2018

TL;DR: This work learns a non-linear representation of a face using spectral convolutions on a mesh surface, and introduces mesh sampling operations that enable a hierarchical mesh representation capturing non-linear variations in shape and expression at multiple scales within the model.

Abstract: Learned 3D representations of human faces are useful for computer vision problems such as 3D face tracking and reconstruction from images, as well as graphics applications such as character generation and animation. Traditional models learn a latent representation of a face using linear subspaces or higher-order tensor generalizations. Due to this linearity, they cannot capture extreme deformations and non-linear expressions. To address this, we introduce a versatile model that learns a non-linear representation of a face using spectral convolutions on a mesh surface. We introduce mesh sampling operations that enable a hierarchical mesh representation that captures non-linear variations in shape and expression at multiple scales within the model. In a variational setting, our model samples diverse realistic 3D faces from a multivariate Gaussian distribution. Our training data consists of 20,466 meshes of extreme expressions captured over 12 different subjects. Despite limited training data, our trained model outperforms state-of-the-art face models with 50% lower reconstruction error, while using 75% fewer parameters. We show that replacing the expression space of an existing state-of-the-art face model with our model achieves a lower reconstruction error. Our data, model and code are available at http://coma.is.tue.mpg.de/.

396 citations

Proceedings Article

Learning to Regress 3D Face Shape and Expression From an Image Without 3D Supervision

Soubhik Sanyal¹, Timo Bolkart¹, Haiwen Feng¹, Michael J. Black¹

Max Planck Society¹

15 Jun 2019

TL;DR: RingNet uses a novel loss that encourages the face shape to be similar when the identity is the same and different for different people, and achieves invariance to expression by representing the face using the FLAME model.

Abstract: The estimation of 3D face shape from a single image must be robust to variations in lighting, head pose, expression, facial hair, makeup, and occlusions. Robustness requires a large training set of in-the-wild images, which by construction, lack ground truth 3D shape. To train a network without any 2D-to-3D supervision, we present RingNet, which learns to compute 3D face shape from a single image. Our key observation is that an individual’s face shape is constant across images, regardless of expression, pose, lighting, etc. RingNet leverages multiple images of a person and automatically detected 2D face features. It uses a novel loss that encourages the face shape to be similar when the identity is the same and different for different people. We achieve invariance to expression by representing the face using the FLAME model. Once trained, our method takes a single image and outputs the parameters of FLAME, which can be readily animated. Additionally we create a new database of faces “not quite in-the-wild” (NoW) with 3D head scans and high-resolution images of the subjects in a wide variety of conditions. We evaluate publicly available methods and find that RingNet is more accurate than methods that use 3D supervision. The dataset, model, and results are available for research purposes at http://ringnet.is.tuebingen.mpg.de.

233 citations

Cited by

Journal Article

Dynamic Graph CNN for Learning on Point Clouds

Yue Wang¹, Yongbin Sun¹, Ziwei Liu², Sanjay E. Sarma¹, Michael M. Bronstein³, Justin Solomon¹

Massachusetts Institute of Technology¹, University of California, Berkeley², Imperial College London³

10 Oct 2019-ACM Transactions on Graphics

TL;DR: This work proposes EdgeConv, a new neural network module suitable for CNN-based high-level tasks on point clouds, including classification and segmentation. EdgeConv acts on graphs dynamically computed in each layer of the network.

Abstract: Point clouds provide a flexible geometric representation suitable for countless applications in computer graphics; they also comprise the raw output of most 3D data acquisition devices. While hand-designed features on point clouds have long been proposed in graphics and vision, however, the recent overwhelming success of convolutional neural networks (CNNs) for image analysis suggests the value of adapting insight from CNN to the point cloud world. Point clouds inherently lack topological information, so designing a model to recover topology can enrich the representation power of point clouds. To this end, we propose a new neural network module dubbed EdgeConv suitable for CNN-based high-level tasks on point clouds, including classification and segmentation. EdgeConv acts on graphs dynamically computed in each layer of the network. It is differentiable and can be plugged into existing architectures. Compared to existing modules operating in extrinsic space or treating each point independently, EdgeConv has several appealing properties: It incorporates local neighborhood information; it can be stacked applied to learn global shape properties; and in multi-layer systems affinity in feature space captures semantic characteristics over potentially long distances in the original embedding. We show the performance of our model on standard benchmarks, including ModelNet40, ShapeNetPart, and S3DIS.

3,727 citations

Posted Content

Fast Graph Representation Learning with PyTorch Geometric

Matthias Fey, Jan Eric Lenssen

06 Mar 2019-arXiv: Learning

TL;DR: PyTorch Geometric is introduced, a library for deep learning on irregularly structured input data such as graphs, point clouds and manifolds, built upon PyTorch, and a comprehensive comparative study of the implemented methods in homogeneous evaluation scenarios is performed.

Abstract: We introduce PyTorch Geometric, a library for deep learning on irregularly structured input data such as graphs, point clouds and manifolds, built upon PyTorch. In addition to general graph data structures and processing methods, it contains a variety of recently published methods from the domains of relational learning and 3D data processing. PyTorch Geometric achieves high data throughput by leveraging sparse GPU acceleration, by providing dedicated CUDA kernels and by introducing efficient mini-batch handling for input examples of different size. In this work, we present the library in detail and perform a comprehensive comparative study of the implemented methods in homogeneous evaluation scenarios.

2,308 citations

Proceedings Article

A morphable model for the synthesis of 3D faces

Volker Blanz, Thomas Vetter

01 Jan 1999

2,010 citations

Posted Content

Occupancy Networks: Learning 3D Reconstruction in Function Space

Lars Mescheder¹, Michael Oechsle¹, Michael Niemeyer¹, Sebastian Nowozin², Andreas Geiger¹

University of Tübingen¹, Google²

10 Dec 2018-arXiv: Computer Vision and Pattern Recognition

TL;DR: This paper proposes Occupancy Networks, a new representation for learning-based 3D reconstruction methods that encodes a description of the 3D output at infinite resolution without excessive memory footprint, and validates that the representation can efficiently encode 3D structure and can be inferred from various kinds of input.

Abstract: With the advent of deep neural networks, learning-based approaches for 3D reconstruction have gained popularity. However, unlike for images, in 3D there is no canonical representation which is both computationally and memory efficient yet allows for representing high-resolution geometry of arbitrary topology. Many of the state-of-the-art learning-based 3D reconstruction approaches can hence only represent very coarse 3D geometry or are limited to a restricted domain. In this paper, we propose Occupancy Networks, a new representation for learning-based 3D reconstruction methods. Occupancy networks implicitly represent the 3D surface as the continuous decision boundary of a deep neural network classifier. In contrast to existing approaches, our representation encodes a description of the 3D output at infinite resolution without excessive memory footprint. We validate that our representation can efficiently encode 3D structure and can be inferred from various kinds of input. Our experiments demonstrate competitive results, both qualitatively and quantitatively, for the challenging tasks of 3D reconstruction from single images, noisy point clouds and coarse discrete voxel grids. We believe that occupancy networks will become a useful tool in a wide variety of learning-based 3D tasks.

1,212 citations

Proceedings Article

Occupancy Networks: Learning 3D Reconstruction in Function Space

Lars Mescheder¹, Michael Oechsle¹, Michael Niemeyer¹, Sebastian Nowozin², Andreas Geiger¹

University of Tübingen¹, Google²

15 Jun 2019

TL;DR: This paper proposes Occupancy Networks, a representation for learning-based 3D reconstruction that implicitly represents the 3D surface as the continuous decision boundary of a deep neural network classifier.

1,192 citations
