Home
/
Authors
/
Daniel Vlasic

Author

Daniel Vlasic

Other affiliations: Mitsubishi Electric Research Laboratories, Massachusetts Institute of Technology

Bio: Daniel Vlasic is an academic researcher from Google. The author has contributed to research in topics: Optical flow & Dynamic data. The author has an hindex of 20, co-authored 39 publications receiving 3044 citations. Previous affiliations of Daniel Vlasic include Mitsubishi Electric Research Laboratories & Massachusetts Institute of Technology.

Papers

PDF

Open Access

More filters

Journal Article•DOI•

Face transfer with multilinear models

[...]

Daniel Vlasic¹, Matthew Brand², Hanspeter Pfister², Jovan Popović¹•Institutions (2)

Massachusetts Institute of Technology¹, Mitsubishi Electric Research Laboratories²

01 Jul 2005

TL;DR: Face Transfer is a method for mapping videorecorded performances of one individual to facial animations of another, based on a multilinear model of 3D face meshes that separably parameterizes the space of geometric variations due to different attributes.

...read moreread less

Abstract: Face Transfer is a method for mapping videorecorded performances of one individual to facial animations of another It extracts visemes (speech-related mouth articulations), expressions, and three-dimensional (3D) pose from monocular video or film footage These parameters are then used to generate and drive a detailed 3D textured face mesh for a target identity, which can be seamlessly rendered back into target footage The underlying face model automatically adjusts for how the target performs facial expressions and visemes The performance data can be easily edited to change the visemes, expressions, pose, or even the identity of the target---the attributes are separably controllable This supports a wide variety of video rewrite and puppetry applicationsFace Transfer is based on a multilinear model of 3D face meshes that separably parameterizes the space of geometric variations due to different attributes (eg, identity, expression, and viseme) Separability means that each of these attributes can be independently varied A multilinear model can be estimated from a Cartesian product of examples (identities × expressions × visemes) with techniques from statistical analysis, but only after careful preprocessing of the geometric data set to secure one-to-one correspondence, to minimize cross-coupling artifacts, and to fill in any missing examples Face Transfer offers new solutions to these problems and links the estimated model with a face-tracking algorithm to extract pose, expression, and viseme parameters

...read moreread less

679 citations

Journal Article•DOI•

Articulated mesh animation from multi-view silhouettes

[...]

Daniel Vlasic¹, Ilya Baran¹, Wojciech Matusik², Jovan Popović¹•Institutions (2)

Massachusetts Institute of Technology¹, Adobe Systems²

01 Aug 2008

TL;DR: This work demonstrates a practical software system for capturing details in mesh animations from multi-view video recordings given a stream of synchronized video images that record a human performance from multiple viewpoints and an articulated template of the performer.

...read moreread less

Abstract: Details in mesh animations are difficult to generate but they have great impact on visual quality. In this work, we demonstrate a practical software system for capturing such details from multi-view video recordings. Given a stream of synchronized video images that record a human performance from multiple viewpoints and an articulated template of the performer, our system captures the motion of both the skeleton and the shape. The output mesh animation is enhanced with the details observed in the image silhouettes. For example, a performance in casual loose-fitting clothes will generate mesh animations with flowing garment motions. We accomplish this with a fast pose tracking method followed by nonrigid deformation of the template to fit the silhouettes. The entire process takes less than sixteen seconds per frame and requires no markers or texture cues. Captured meshes are in full correspondence making them readily usable for editing operations including texturing, deformation transfer, and deformation model learning.

...read moreread less

633 citations

Proceedings Article•DOI•

Practical motion capture in everyday surroundings

[...]

Daniel Vlasic¹, Rolf Adelsberger², Giovanni Vannucci², John C. Barnwell², Markus Gross³, Wojciech Matusik², Jovan Popović¹ - Show less +3 more•Institutions (3)

Massachusetts Institute of Technology¹, Mitsubishi Electric Research Laboratories², ETH Zurich³

29 Jul 2007

TL;DR: Experimental results show that even motions that are traditionally difficult to acquire are recorded with ease within their natural settings, and suggest that this system could become a versatile input device for a variety of augmented-reality applications.

...read moreread less

Abstract: Commercial motion-capture systems produce excellent in-studio reconstructions, but offer no comparable solution for acquisition in everyday environments. We present a system for acquiring motions almost anywhere. This wearable system gathers ultrasonic time-of-flight and inertial measurements with a set of inexpensive miniature sensors worn on the garment. After recording, the information is combined using an Extended Kalman Filter to reconstruct joint configurations of a body. Experimental results show that even motions that are traditionally difficult to acquire are recorded with ease within their natural settings. Although our prototype does not reliably recover the global transformation, we show that the resulting motions are visually similar to the original ones, and that the combined acoustic and intertial system reduces the drift commonly observed in purely inertial systems. Our final results suggest that this system could become a versatile input device for a variety of augmented-reality applications.

...read moreread less

352 citations

Journal Article•DOI•

Dynamic shape capture using multi-view photometric stereo

[...]

Daniel Vlasic¹, Pieter Peers², Ilya Baran¹, Paul Debevec², Jovan Popović¹, Szymon Rusinkiewicz³, Wojciech Matusik³ - Show less +3 more•Institutions (3)

Massachusetts Institute of Technology¹, Institute for Creative Technologies², Adobe Systems³

01 Dec 2009

TL;DR: In this article, a system for high-resolution capture of moving 3D geometry is described, where dynamic normal maps from multiple views are captured using active shape-from-shading (photometric stereo), with a large lighting dome providing a series of novel spherical lighting configurations.

...read moreread less

Abstract: We describe a system for high-resolution capture of moving 3D geometry, beginning with dynamic normal maps from multiple views. The normal maps are captured using active shape-from-shading (photometric stereo), with a large lighting dome providing a series of novel spherical lighting configurations. To compensate for low-frequency deformation, we perform multi-view matching and thin-plate spline deformation on the initial surfaces obtained by integrating the normal maps. Next, the corrected meshes are merged into a single mesh using a volumetric method. The final output is a set of meshes, which were impossible to produce with previous methods. The meshes exhibit details on the order of a few millimeters, and represent the performance over human-size working volumes at a temporal resolution of 60Hz.

...read moreread less

290 citations

Proceedings Article•DOI•

Unsupervised Training for 3D Morphable Model Regression

[...]

Kyle Genova¹, Forrester Cole², Aaron Maschinot², Aaron Sarna², Daniel Vlasic², William T. Freeman² - Show less +2 more•Institutions (2)

Princeton University¹, Google²

15 Jun 2018

TL;DR: In this paper, a method for training a regression network from image pixels to 3D morphable model coordinates using only unlabeled photographs is presented. But the training loss is based on features from a facial recognition network, computed on-the-fly by rendering the predicted faces with a differentiable renderer.

...read moreread less

Abstract: We present a method for training a regression network from image pixels to 3D morphable model coordinates using only unlabeled photographs. The training loss is based on features from a facial recognition network, computed on-the-fly by rendering the predicted faces with a differentiable renderer. To make training from features feasible and avoid network fooling effects, we introduce three objectives: a batch distribution loss that encourages the output distribution to match the distribution of the morphable model, a loopback loss that ensures the network can correctly reinterpret its own output, and a multi-view identity loss that compares the features of the predicted 3D face and the input photograph from multiple viewing angles. We train a regression network using these objectives, a set of unlabeled photographs, and the morphable model itself, and demonstrate state-of-the-art results.

...read moreread less

261 citations

1
2
3
4
…
5
6
7
8
9

Collapse

Cited by

PDF

Open Access

More filters

Journal Article•DOI•

Tensor Decompositions and Applications

[...]

Tamara G. Kolda¹, Brett W. Bader¹•Institutions (1)

Sandia National Laboratories¹

01 Aug 2009-Siam Review

TL;DR: This survey provides an overview of higher-order tensor decompositions, their applications, and available software.

...read moreread less

Abstract: This survey provides an overview of higher-order tensor decompositions, their applications, and available software. A tensor is a multidimensional or $N$-way array. Decompositions of higher-order tensors (i.e., $N$-way arrays with $N \geq 3$) have applications in psycho-metrics, chemometrics, signal processing, numerical linear algebra, computer vision, numerical analysis, data mining, neuroscience, graph analysis, and elsewhere. Two particular tensor decompositions can be considered to be higher-order extensions of the matrix singular value decomposition: CANDECOMP/PARAFAC (CP) decomposes a tensor as a sum of rank-one tensors, and the Tucker decomposition is a higher-order form of principal component analysis. There are many other tensor decompositions, including INDSCAL, PARAFAC2, CANDELINC, DEDICOM, and PARATUCK2 as well as nonnegative variants of all of the above. The N-way Toolbox, Tensor Toolbox, and Multilinear Engine are examples of software packages for working with tensors.

...read moreread less

9,227 citations

Posted Content•

NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis

[...]

Ben Mildenhall¹, Pratul P. Srinivasan¹, Matthew Tancik¹, Jonathan T. Barron², Ravi Ramamoorthi³, Ren Ng¹ - Show less +2 more•Institutions (3)

University of California, Berkeley¹, Google², University of California, San Diego³

19 Mar 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: This work describes how to effectively optimize neural radiance fields to render photorealistic novel views of scenes with complicated geometry and appearance, and demonstrates results that outperform prior work on neural rendering and view synthesis.

...read moreread less

Abstract: We present a method that achieves state-of-the-art results for synthesizing novel views of complex scenes by optimizing an underlying continuous volumetric scene function using a sparse set of input views. Our algorithm represents a scene using a fully-connected (non-convolutional) deep network, whose input is a single continuous 5D coordinate (spatial location $(x,y,z)$ and viewing direction $(\theta, \phi)$) and whose output is the volume density and view-dependent emitted radiance at that spatial location. We synthesize views by querying 5D coordinates along camera rays and use classic volume rendering techniques to project the output colors and densities into an image. Because volume rendering is naturally differentiable, the only input required to optimize our representation is a set of images with known camera poses. We describe how to effectively optimize neural radiance fields to render photorealistic novel views of scenes with complicated geometry and appearance, and demonstrate results that outperform prior work on neural rendering and view synthesis. View synthesis results are best viewed as videos, so we urge readers to view our supplementary video for convincing comparisons.

...read moreread less

2,435 citations

Proceedings Article•DOI•

KinectFusion: real-time 3D reconstruction and interaction using a moving depth camera

[...]

Shahram Izadi¹, David Kim¹, Otmar Hilliges¹, David Molyneaux¹, Richard Newcombe², Pushmeet Kohli¹, Jamie Shotton¹, Steve Hodges¹, Dustin Freeman³, Andrew J. Davison², Andrew Fitzgibbon¹ - Show less +7 more•Institutions (3)

Microsoft¹, Imperial College London², University of Toronto³

16 Oct 2011

TL;DR: Novel extensions to the core GPU pipeline demonstrate object segmentation and user interaction directly in front of the sensor, without degrading camera tracking or reconstruction, to enable real-time multi-touch interactions anywhere.

...read moreread less

Abstract: KinectFusion enables a user holding and moving a standard Kinect camera to rapidly create detailed 3D reconstructions of an indoor scene. Only the depth data from Kinect is used to track the 3D pose of the sensor and reconstruct, geometrically precise, 3D models of the physical scene in real-time. The capabilities of KinectFusion, as well as the novel GPU-based pipeline are described in full. Uses of the core system for low-cost handheld scanning, and geometry-aware augmented reality and physics-based interactions are shown. Novel extensions to the core GPU pipeline demonstrate object segmentation and user interaction directly in front of the sensor, without degrading camera tracking or reconstruction. These extensions are used to enable real-time multi-touch interactions anywhere, allowing any planar or non-planar reconstructed physical surface to be appropriated for touch.

...read moreread less

2,373 citations

Journal Article•DOI•

Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments

[...]

Catalin Ionescu¹, Dragos Papava¹, Vlad Olaru¹, Cristian Sminchisescu²•Institutions (2)

Romanian Academy¹, Lund University²

01 Jul 2014-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: A new dataset, Human3.6M, of 3.6 Million accurate 3D Human poses, acquired by recording the performance of 5 female and 6 male subjects, under 4 different viewpoints, is introduced for training realistic human sensing systems and for evaluating the next generation of human pose estimation models and algorithms.

...read moreread less

Abstract: We introduce a new dataset, Human3.6M, of 3.6 Million accurate 3D Human poses, acquired by recording the performance of 5 female and 6 male subjects, under 4 different viewpoints, for training realistic human sensing systems and for evaluating the next generation of human pose estimation models and algorithms. Besides increasing the size of the datasets in the current state-of-the-art by several orders of magnitude, we also aim to complement such datasets with a diverse set of motions and poses encountered as part of typical human activities (taking photos, talking on the phone, posing, greeting, eating, etc.), with additional synchronized image, human motion capture, and time of flight (depth) data, and with accurate 3D body scans of all the subject actors involved. We also provide controlled mixed reality evaluation scenarios where 3D human models are animated using motion capture and inserted using correct 3D geometry, in complex real environments, viewed with moving cameras, and under occlusion. Finally, we provide a set of large-scale statistical models and detailed evaluation baselines for the dataset illustrating its diversity and the scope for improvement by future work in the research community. Our experiments show that our best large-scale model can leverage our full training set to obtain a 20% improvement in performance compared to a training set of the scale of the largest existing public dataset for this problem. Yet the potential for improvement by leveraging higher capacity, more complex models with our large dataset, is substantially vaster and should stimulate future research. The dataset together with code for the associated large-scale learning models, features, visualization tools, as well as the evaluation server, is available online at http://vision.imar.ro/human3.6m .

...read moreread less

2,209 citations

Proceedings Article•

A morphable model for the synthesis of 3D faces

[...]

Matthew Turk

01 Jan 1999

2,010 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse