

A Case Study Evaluation: Perceptually Accurate Textured Surface Models
Greg Ward, Dolby Canada, greg.ward@acm.org
Mashhuda Glencross, University of Manchester, mashhuda@manchester.ac.uk
Figure 1. Left is the depth hallucination method of Glencross et al. [2008]; Right is our improved, three-flash method; Center is a photograph.
ABSTRACT
This paper evaluates a new method for capturing surfaces with
variations in albedo, height, and local orientation using a standard
digital camera with three flash units. Similar to other approaches,
captured areas are assumed to be globally flat and largely diffuse.
Fortunately, this encompasses a wide array of interesting surfaces,
including most materials found in the built environment, e.g.,
masonry, fabrics, floor coverings, and textured paints. We present
a case study of naïve subjects who found that surfaces captured
with our method, when rendered under novel lighting and view
conditions, were statistically indistinguishable from photographs.
This is a significant improvement over previous methods, to
which our results are also compared.
Index Terms—Lighting, shading and textures, Perceptual
validation, Computer vision, Texture.
1 INTRODUCTION
Photographic textures have been applied to geometric models
to enhance realism for decades, and are an integral part of every
modern rendering engine. However, two-dimensional textures
have a tendency to resemble wallpaper at oblique angles, and are
unable to produce realistic silhouettes or change appearance under
different lighting. Displacement mapping or relief mapping
methods [Oliveira 2000] can overcome these limitations, but full
reflectance and geometry model data are difficult to capture from
real surfaces, requiring expensive scanning equipment and
subsequent manual alignment with photographically acquired
textures [Rushmeier et al. 2003; Lensch et al. 2003], a large set of
data, and/or complicated rigs [Dana et al. 1999; Marschner et al.
1999]. Games companies often employ skilled artists to create
texture model data for displacement mapping using 3D modeling
packages, which is a laborious process. Glencross et al.
introduced a simple and inexpensive shape-from-shading
technique for “hallucinating” depth information from a pair of
photographs taken from the same viewpoint, one captured with
diffuse lighting and another taken with a flash [Glencross et al.
2008].
Since the method captures albedo simultaneously, no alignment
steps are needed. Although the authors do not claim absolute
accuracy in terms of reproducing depth values, user studies
showed that subjects found it difficult to distinguish the
plausibility of hallucinated depth relative to ground truth data,
adequately demonstrating the technique’s value for realistic
computer graphics.
In this paper, we ask the question “what level of additional
captured model accuracy will result in synthetic images that are
indistinguishable from photographs?” To answer this question, we
extend the depth hallucination method to include photometrically
measured surface orientation [Woodham 1980]. By adding two additional
flash units to the one employed by Glencross et al. [2008], we are able
to derive accurate surface orientations at most pixels in our
captures. Our validation studies demonstrate that the addition of
measured surface orientation results in no statistically significant
differences in perception between photographs and captured, re-
rendered images.
The entire process has been automated, with capture taking a
few seconds and model extraction less than a minute.
Since the focus of this work is the evaluation of captured
model fidelity and its impact on the visual accuracy of the results,
we begin by first briefly discussing related work, and then give an
overview of the photometric method used. We evaluate the visual
impact of measured surface orientation on computer-generated
imagery through an experimental study. Finally, we conclude by
discussing the limitations and suggesting future directions.
2 RELATED WORK
Besides the aforementioned work of Glencross et al. [2008],
our method is closely related to that of Rushmeier and Bernardini,
who used a comparable multi-source arrangement to recover
surface normal information [Rushmeier and Bernardini 1999].
This is similarly built on the photometric stereo technique of
Woodham [1980]. Rushmeier and Bernardini also employ a
separate shape camera with a structured light source to obtain
large-scale geometry, which they went to considerable effort to
align with the captured texture information. Their system
employed 5 tungsten-halogen sources, so they could dismiss up to
2 lights that were shadowed or caused specular reflection and still
have enough information to recover the surface normal at a pixel.
Ours is not so much an improvement on their method, as a
simplified approach for a different application. Since our goal is
local depth and surface normal variations, we do not require the 3-
D geometry capture equipment or registration software, and our
single-perspective diffuse plus flash images are sufficient for us to
hallucinate depth at each pixel. To avoid specular highlights, we
employ crossed polarizers as suggested by [Glencross et al. 2008],
and interpolate normals over pixels that are shadowed in one or
more captures.
Our technique also bears close resemblance to the material
capture work of Paterson et al. [2005]. Using photometric stereo
in combination with surface normal integration and multiple view
captures, these researchers were able to recover displacement
maps plus inhomogeneous BRDFs over nearly planar sample
surfaces using a simple flash plus camera arrangement. Their
method incorporates a physical calibration frame around the
captured surface to recover camera pose and flash calibration
data. In contrast, our method uses only single-view capture, and
flash/lens calibration is performed in advance, thus avoiding any
restrictions of surface dimensions. Since we do not rely on
surface normal integration to derive height information, our
method is more robust to flash shadowing and irregular or spiky
terrain. Similar to their technique, we assume a nearly planar
surface with primarily diffuse reflection, and capture under
ambient conditions. However, we make no attempt to recover
specular characteristics in our method, which would be difficult
from a single view.
Figure 2. Three-flash capture system mounted on a tripod with a digital
SLR camera.
Multiple flashes have also been used to produce non-
photorealistic imagery. Specifically, Raskar et al. developed a
method for enhancing photographic illustrations exploiting the
shadows cast by multiple flashes [Raskar et al. 2004]. Toler-
Franklin et al. employed photometric stereo to capture surface
normals, then applied these to enhance and annotate photo-based
renderings [Toler-Franklin et al. 2007]. With the additional depth
information our technique provides from the same data, it could
be applied in a similar way to the problem of non-photorealistic
rendering, though that is not our focus.
3 METHOD
Our technique borrows from and improves upon previous
methods by employing a digital camera with three external flash
units. We build on the flash/no-flash depth hallucination method
of Glencross et al. [2008] by capturing two additional flash
images to derive surface normal information and overcome
limitations in their original albedo estimation. Employing three
flashes virtually guarantees that every point on the surface will be
illuminated in at least one image, and for points lit by all three
flashes, we can accurately measure the surface normal as well.
This normal map is used to correct the albedo estimate and further
enhance re-rendering under different lighting conditions.
We begin by describing our three-flash capture system,
followed by a description of the capture process and how the
images are processed into a detailed surface model.
Figure 3. Circuit diagram for our three-flash controller.
3.1 Three-Flash Controller
To automatically sequence each flash, we built the simple
controller circuit shown in Figure 3 to fire each flash in sequence,
followed by a no-flash capture. In our configuration, we cycle the
power to a shoe-mounted flash to force the camera into ambient
exposure mode for the no-flash capture. This avoids having to
touch the camera or control it via a USB tether – a tripod and a
remote release cable are the only additional equipment required.
The hot-shoe flash sync is controlled by the camera, so it fires
while it has power. Therefore, some additional image processing
is required for this set-up, which we explain in Section 3.3, below.
A full cycle is achieved after 4 shutter releases. The first
shutter release fires Flash 1 mounted on the hot-shoe only. The
second shutter release fires Flash 2 as well, and the third shutter
release fires Flashes 1 and 3. After three firings, power is turned
off to Flash 1 mounted on the hot-shoe, thus putting the camera
into ambient exposure mode, and none of the flashes fire. Once
this final no-flash image has been captured, the cycle repeats.
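For concreteness, the firing cycle can be summarized by the following
minimal Python sketch (the names are ours for illustration; the actual
controller is the analog circuit of Figure 3):

    # Minimal sketch of the controller's 4-state firing cycle.
    FIRING_SEQUENCE = [
        {1},      # release 1: hot-shoe Flash 1 only
        {1, 2},   # release 2: Flashes 1 and 2
        {1, 3},   # release 3: Flashes 1 and 3
        set(),    # release 4: Flash 1 powered off -> ambient (no-flash) capture
    ]

    def flashes_for_release(release_count):
        """Return which flash units fire on the given shutter release (1-based)."""
        return FIRING_SEQUENCE[(release_count - 1) % len(FIRING_SEQUENCE)]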
Figure 2 shows our capture system mounted on a tripod. An
amber LED indicates the controller is powered in its initial state,
ready to begin a capture sequence. Linear polarizers are placed
over each flash unit and aligned 90° out-of-phase with a polarizer
filter mounted on the lens in order to reduce specular reflections
as suggested in [Glencross et al. 2008].

3.2 Capture Process
The hot-shoe mounted flash is set to half its maximum output
in manual mode, while the other two flashes are set to maximum.
Since the hot-shoe flash fires every time, setting its output to half
prevents it from drowning out the other flashes when they fire.
Sufficient time is allowed between shutter releases for the flashes
to fully recharge, ensuring that they produce roughly the same
output each time. A cable release is used to avoid any camera
movement, which would make subsequent image processing more
difficult. After the full sequence of 4 images is captured and the
histograms are checked to ensure a good set of exposures, the
capture process is complete.
Figure 4. Diagram of RAW capture processing with dark subtraction
used to obtain three separate flash images.
Figure 5. Our three separate flash images with the no-flash image in the
lower right, all after RAW processing.
3.3 Image Processing
The first stage of our image-processing pipeline converts RAW
captures to 16-bit/channel linear encoded TIFF. Taking
advantage of the dark subtraction feature of dcraw [Coffin], we
eliminate the effect of ambient lighting on our Flash 1 capture by
subtracting the no-flash capture after applying the appropriate
scale factor to account for differences in exposure time. We use
this same trick to separate flash images by subtracting the Flash1-
only capture from the Flash 1+2 and Flash 1+3 captures. Since
Flash 1 also includes the ambient lighting, this takes care of the
whole process for Flashes 2 and 3. This conversion is illustrated
in Figure 4, with results shown in Figure 5.
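The separation amounts to simple arithmetic on the linearized
captures. A minimal NumPy sketch, assuming the four captures have been
loaded as linear floating-point arrays (function and variable names
are ours):

    import numpy as np

    def separate_flashes(flash1, flash12, flash13, noflash, t_flash, t_noflash):
        """Recover per-flash images by dark subtraction.
        t_flash and t_noflash are the respective exposure times."""
        ambient = noflash * (t_flash / t_noflash)   # scale for exposure difference
        f1 = np.clip(flash1 - ambient, 0.0, None)   # Flash 1 alone
        # The Flash 1 capture includes ambient light, so subtracting it
        # from the combined captures removes both at once.
        f2 = np.clip(flash12 - flash1, 0.0, None)   # Flash 2 alone
        f3 = np.clip(flash13 - flash1, 0.0, None)   # Flash 3 alone
        return f1, f2, f3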
The second stage of our image processing applies a calibration
to the flash images to correct for vignetting and other uniformity
issues. Since this correction varies with distance, aperture, lens
and focal length, we capture a set of 50 to 100 reference flash
images of a white, diffuse wall, then interpolate these calibration
images to obtain a more accurate result. This interpolation process
pulls out the six nearest flash triplets from our set and applies a
weighted average to these. We then divide each flash image by its
interpolated calibration image as in [Glencross et al. 2008] in
preparation for the next processing stage.
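A sketch of this correction, assuming inverse-distance weights over
the capture parameters (the weighting scheme is our assumption; the
names are illustrative):

    import numpy as np

    def uniformity_correct(flash_img, refs, query, k=6):
        """Divide a flash image by a calibration image interpolated
        from the k nearest reference captures. `refs` is a list of
        (params, calib_image) pairs, where params encodes distance,
        aperture, and focal length; `query` holds the same parameters
        for the current capture."""
        d = np.array([np.linalg.norm(p - query) for p, _ in refs])
        nearest = np.argsort(d)[:k]
        w = 1.0 / (d[nearest] + 1e-6)   # assumed inverse-distance weighting
        w /= w.sum()
        calib = sum(wi * refs[j][1] for wi, j in zip(w, nearest))
        return flash_img / np.clip(calib, 1e-6, None)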
In the third image processing stage, we simultaneously obtain
local surface orientation (normals) and albedo (reflectance) by
solving the following 3x3 matrix equation at each pixel
illuminated by all three flashes [Rushmeier and Bernardini 1999]:
$V \vec{n} = \vec{i}$    (1)

where:
$V$ = illumination direction matrix
$\vec{n}$ = normal vector times albedo
$\vec{i}$ = adjusted flash pixel values
The adjusted flash pixel values are the corrected luminance values
for each flash capture, multiplied again by the cosine of the incident
angle, which was undone by our flash calibration. We compute the
illumination direction matrix $V$ by subtracting the estimated 3-D
pixel positions given by our lens focal length and focus distance
(recorded in the image metadata) from the known flash positions. We
normalize each of these vectors, thus our measured pixels in $\vec{i}$
are proportional to the dot product of the illumination vectors with
the surface normal, times albedo. Solving for $\vec{n}$ at each pixel,
we take this vector length as our local variation in albedo. In shadow
regions where only two flashes illuminate the surface, a technique
such as [Hernández et al. 2008] could be used to resolve normals via
an integrability constraint. We found that a simple hole-filling
algorithm that averaged the four closest neighbors worked well enough
in shadow regions, thanks to the masking from texture complexity that
hides small artifacts.
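The per-pixel solve can be written compactly with NumPy's batched
linear solver. A minimal sketch of Equation 1 (array names are ours):

    import numpy as np

    def photometric_stereo(V, i1, i2, i3):
        """Solve V n = i (Eq. 1) at every pixel.
        V       : (H, W, 3, 3) illumination direction matrix per pixel
        i1..i3  : (H, W) adjusted flash pixel values
        Returns unit normals (H, W, 3) and albedo (H, W)."""
        i_vec = np.stack([i1, i2, i3], axis=-1)          # (H, W, 3)
        n = np.linalg.solve(V, i_vec[..., None])[..., 0] # (H, W, 3)
        albedo = np.linalg.norm(n, axis=-1)              # vector length = albedo
        normals = n / np.clip(albedo[..., None], 1e-8, None)
        return normals, albedo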
A global scale factor may be applied to ensure an expected
range of albedo values as a final step if necessary. Similarly, we
found that applying a global flattening of the derived surface
normals improves later rendering. This is accomplished by
subtracting a low-frequency (blurred) version of the normal map
from the high-resolution original, providing local detail while
suppressing systematic errors due to imperfect calibration.
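This flattening can be sketched as follows, assuming SciPy's Gaussian
filter; the blur radius and the re-added plane normal are our
assumptions, as the text leaves them unspecified:

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def flatten_normals(normals, sigma=50.0):
        """Subtract a low-frequency (blurred) copy of the normal map,
        keeping local detail while suppressing systematic error."""
        low = np.stack([gaussian_filter(normals[..., c], sigma)
                        for c in range(3)], axis=-1)
        flat = normals - low
        flat[..., 2] += 1.0   # re-add the global plane normal (assumption)
        norm = np.linalg.norm(flat, axis=-1, keepdims=True)
        return flat / np.clip(norm, 1e-8, None)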
The fourth and final stage exactly follows the method laid out
by [Glencross et al. 2008] to hallucinate depth using a multi-scale
model based on the no-flash image divided by the albedo image.
The important differences here are that we have a better estimate
of albedo based on our knowledge of local surface orientation,
and our multiple flashes avoid areas of complete shadow.
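For reference, the input to this final stage is simply the ratio
image (a one-line sketch; the multi-scale depth model itself follows
[Glencross et al. 2008] and is omitted here):

    import numpy as np

    def shading_image(noflash_luminance, albedo):
        """No-flash capture divided by albedo: the shading signal
        that drives the multi-scale depth hallucination."""
        return noflash_luminance / np.clip(albedo, 1e-6, None)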

Figure 6. Left image contains depth hallucinated from a single flash/no-flash pair. Right image shows results of 3-flash system. Center is a photograph.
Figure 7. Comparison of hallucination and rendering methods showing the original diffuse photo, single-flash re-rendered result, three-flash depth result, and
finally the three-flash result with derived normals.
4 RESULTS
4.1 Comparison to Single-flash Method
Figure 7 shows a side-by-side comparison between our three-
flash method and the previous method of Glencross et al. [2008].
The upper-left image shows the original no-flash (diffusely lit)
photograph. The upper-right image shows a rendering under
simulated daylight using depth hallucinated with a single flash
image and this diffuse photo. The lower-left image shows the same
rendering using depth hallucinated from all three flashes, but
without taking advantage of the derived surface normals. The final
image on the lower-right shows the same improved depth map with
derived normal information.
While we expected some slight improvements to the depth
hallucination using three flashes, we found that most of the visible
differences in the result came when we applied the derived surface
normals.

References
[Glencross et al. 2008] A perceptually validated model for surface depth hallucination.
[Rushmeier et al. 2003] Design and Use of an In-Museum System for Artifact Capture.
[Toler-Franklin et al. 2007] Illustration of complex real-world objects using images with normals.