
From the Digitization of Cultural Artifacts to the Web Publishing of Digital 3D Collections: an Automatic Pipeline for Knowledge Sharing

TL;DR: A novel approach is introduced to simplify the production of multimedia content from real objects for the purpose of knowledge sharing, particularly appropriate to the cultural heritage field: a pipeline that covers all steps from the digitization of the objects up to the Web publishing of the resulting digital copies.
Abstract: In this paper, we introduce a novel approach intended to simplify the production of multimedia content from real objects for the purpose of knowledge sharing, which is particularly appropriate to the cultural heritage field. It consists in a pipeline that covers all steps from the digitization of the objects up to the Web publishing of the resulting digital copies. During a first stage, the digitization is performed by a high speed 3D scanner that recovers the object's geometry. A second stage then extracts from the recovered data a color texture as well as a texture of details, in order to enrich the acquired geometry in a more realistic way. Finally, a third stage converts these data so that they are compatible with the recent WebGL paradigm, then providing 3D multimedia content directly exploitable by end-users by means of standard Internet browsers. The pipeline design is centered on automation and speed, so that it can be used by non expert users to produce multimedia content from potentially large object's collections, like it may be the case in cultural heritage. The choice of a high speed scanner is particularly adapted for such a design, since this kind of devices has the advantage of being fast and intuitive. Processing stages that follow the digitization are both completely automatic and "seamless", in the sense that it is not incumbent upon the user to perform tasks manually, nor to use external softwares that generally need additional operations to solve compatibility issues.

Summary (4 min read)

Introduction

  • In the field of cultural heritage (CH), knowledge sharing is one of the most essential aspects of communication between museum institutions, which conserve and care for cultural collections, and the public.
  • To avoid exposing the original artifacts to risks of deterioration, knowledge sharing generally relies on surrogates; multimedia technologies are becoming more and more widespread in the CH field, where these surrogates take the form of digital copies.
  • Moreover, since high-speed scanning systems are often less accurate than more classic digitization technologies, the authors also estimate a normal texture from these data, once again automatically.
  • Hence, the archival and sharing of vast item collections becomes possible and easy even for non-expert users.
  • Section III presents the first two stages of their system, namely the in-hand scanner used for the acquisition, as well as their processing step for generating a digital copy from the acquired data.

A. Real-time 3D scanning

  • An overview of the 3D scanning and stereo reconstruction goes well beyond the scope of this paper.
  • Their main issues are the availability of technology and the problem of aligning data in a very fast way.
  • Among the latter, the most robust approach is based on the use of fast structured-light scanners [1], where a high speed camera and a projector are used to recover the range maps in real-time.
  • The absence of color acquisition in these systems is essentially due to the low resolution of the cameras, and to the difficulty of handling the peculiar illumination provided by the projector.
  • Other systems have been proposed which take into account also the color, but they are not able to achieve real-time performances [6] or to reconstruct the geometry in an accurate way [7].

B. Color acquisition and visualization on 3D models

  • The most flexible approach starts from a set of images acquired either in a second stage with respect to the geometry acquisition, or simultaneously but using different devices.
  • Image-to-geometry registration, which can be solved by automatic [8]–[10] or semiautomatic [11] approaches, is then necessary.
  • Due to the lack of consistency from one image to another, artifacts are visible at the junctions between surface areas receiving color from different images.
  • In particular, Callieri et al. [20] presented a flexible weighting system that can be extended in order to accommodate additional criteria.
  • The first tools aimed at visualizing 3D models in Web pages were based on embedded software components, such as Java applets or ActiveX controls [25].

III. DIGITIZATION AND PROCESSING OF CH ARTIFACTS FOR GENERATING DIGITAL COPIES

  • Cultural heritage has been a privileged field of application for 3D scanning since the beginning of its evolution.
  • This is due to the enormous variety and variability of the types of objects that can be acquired.
  • Moreover, archival and preservation are extremely important issues as well.
  • The acquisition of a large number of objects can be expensive both in terms of hardware and time needed for data processing.
  • Low-cost and hand-held devices usually need the placement of markers on the object, which is hard to do on CH artifacts.

A. Acquisition by in-hand scanning

  • The first stage of their workflow, namely the one producing all data required for the creation of digital copies from cultural artifacts, is based on an in-hand scanner whose hardware configuration is shown in Figure 2.
  • This scanner, like most of the high speed digitization systems, is based on structured light.
  • The scanning can be performed in two different ways.
  • Occlusions are detected by a hue analysis which produces, for each video frame, a map of skin presence probability.
  • These data are then used in the next stage of the workflow in order to produce the color texture and the texture of details, as explained hereafter, in section III-B.

B. Recovery of a diffuse color texture

  • The texturing method extends the work proposed in [20] so as to adapt it to the data flow produced by the scanning system presented above.
  • These criteria have been chosen so as to penalize image regions that are known to lack accuracy, in order to deal with data redundancy from one image to another in a way that ensures seamless color transitions.
  • Since the complete geometric configuration of the scene is known, the authors can use a simple shadow mapping algorithm to estimate shadowed areas, to which a null weight is assigned.
  • During the subsequent passes, highlights are identified by computing the luminosity difference between the texture obtained at the previous pass and the input picture.
  • Each elementary mask contains values in the range ]0, 1], zero being excluded in order to ensure that texturing is guaranteed for every surface point that is visible in at least one picture.

C. Recovery of a texture of details

  • It often leads to a loss of accuracy with respect to traditional scanning devices, thus preventing the acquisition of the finest geometric details.
  • The authors use here a similar approach for extracting a normal map from the video flow produced by the in-hand scanner.
  • This uneven sampling distribution may result in an estimated normal which is reliable along the dominant sampling plane but highly uncertain along the orthogonal direction.
  • To alleviate this problem, the authors propose to analyze the sampling distribution at each point p by performing a PCA on the set of light directions (outlined after this list).
  • By definition, ν2 is the direction along which the sampling is the poorest.
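A generic formulation of this analysis is sketched below. It is written under assumptions: it is not the paper's exact correction formula, and it assumes the light directions are reduced to two components before the PCA, consistent with ν2 denoting the poorest-sampled direction.

```latex
% Sample mean and covariance of the m light directions observed at point p
\bar{l} = \frac{1}{m}\sum_{i=1}^{m} l_i ,\qquad
\Sigma_p = \frac{1}{m}\sum_{i=1}^{m} (l_i - \bar{l})(l_i - \bar{l})^{T}
% Eigen-decomposition with \lambda_1 \ge \lambda_2 and eigenvectors \nu_1, \nu_2:
% \nu_2 is the direction of poorest sampling, and the ratio \lambda_2 / \lambda_1
% measures how isotropic the sampling is; it drives how strongly the estimated
% normal is pulled back toward the mesh normal along \nu_2.
\Sigma_p \,\nu_k = \lambda_k \,\nu_k ,\qquad \lambda_1 \ge \lambda_2
```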

IV. WEB PUBLISHING

  • After geometry and texture images have been processed, the third and last stage of their pipeline optimizes the generated data for network transmission and realtime rendering on standard Web browsers.
  • The optimized version of the 3D model is stored in the server file system and is accessed by a standard HTTP server to serve requests of visualization clients.
  • In the following, the authors describe the steps they use for preparing and storing the data.

A. Data optimization

  • The optimization phase is composed of two sequential steps: geometry partitioning and rendering optimizations.
  • To this end, the authors use a simple greedy method that iteratively adds triangles to a chunk until the maximum number of vertices is reached (a minimal sketch follows this list).
  • One advantage of the indexed triangle mesh representation is that vertices referenced by more than one triangle need to be stored only once.
  • To carry this advantage over from memory occupancy to rendering performance, graphics accelerators have introduced a vertex cache capable of storing data associated with up to 32 vertices, thus allowing the results of a considerable amount of per-vertex calculations to be reused.
  • Even though the problem does not have a polynomial-time solution, several works have been developed [42], [43] that produce a very good approximate solution in a relatively small amount of time.
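A minimal sketch of such a greedy partitioning is given below. It is an assumption of how the loop could be organized, not the authors' implementation; the default vertex budget of 65,536 corresponds to the largest mesh addressable with the 16-bit indices supported by WebGL 1.0 without extensions.

```typescript
// Greedy partitioning of an indexed triangle mesh into chunks whose vertex count
// stays below a fixed budget (e.g. 65,536, the limit of 16-bit indices in WebGL 1.0).
// triangles: flat array of vertex indices, 3 per triangle.
function partition(triangles: Uint32Array, maxVerts = 65_536): number[][] {
  const chunks: number[][] = [];   // each chunk is a list of triangle indices
  let current: number[] = [];
  let verts = new Set<number>();   // vertices referenced by the current chunk

  for (let t = 0; t < triangles.length / 3; t++) {
    const tri = [triangles[3 * t], triangles[3 * t + 1], triangles[3 * t + 2]];
    const added = tri.filter(v => !verts.has(v)).length;
    if (verts.size + added > maxVerts && current.length > 0) {
      // Budget exceeded: close the current chunk and start a new one.
      chunks.push(current);
      current = [];
      verts = new Set<number>();
    }
    current.push(t);
    tri.forEach(v => verts.add(v));
  }
  if (current.length > 0) chunks.push(current);
  // Each chunk is later re-indexed locally and reordered for the vertex cache.
  return chunks;
}
```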

B. Data storage and retrieval

  • One of their goals is to exploit standard and easily available technologies for making the produced models accessible on the Web platform.
  • To this end, the authors decided to use the well-known Apache HTTP server and use the server file system as the storage database.
  • Model data is saved under standard file formats: to store geometry information the authors use the Stanford polygon file format (PLY), which supports multiple vertex attributes and binary encoding, while Portable Network Graphics (PNG) images are used for color and normal textures.
  • Even though those formats are already compact, the authors take advantage of the automatic compression (gzip) applied by the Apache server on data transmission, as well as automatic decompression executed by browsers on data arrival.
  • To access the remote 3D model, visualization clients use JavaScript to issue an HTTP request with a base URL of the form http://example-data-domain.org/modelname/, appending predefined file names to discriminate among geometry and texture files, such as geometry.ply, color.png and normal.png; a minimal retrieval sketch is given below.
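A hypothetical client-side retrieval following this URL scheme is sketched below. It uses the modern fetch API rather than the paper's SpiderGL-based code, and the function and variable names are illustrative only; the gzip transfer encoding negotiated with Apache is decompressed transparently by the browser.

```typescript
// Retrieve the three resources of a published model from its base URL.
// File names follow the convention described in the text: geometry.ply, color.png, normal.png.
async function loadModel(baseUrl: string) {
  const geometry = await (await fetch(baseUrl + "geometry.ply")).arrayBuffer(); // binary PLY
  const color    = await loadImage(baseUrl + "color.png");                      // diffuse texture
  const normal   = await loadImage(baseUrl + "normal.png");                     // detail texture
  return { geometry, color, normal };
}

function loadImage(url: string): Promise<HTMLImageElement> {
  return new Promise((resolve, reject) => {
    const img = new Image();
    img.crossOrigin = "anonymous"; // needed if the data domain differs from the page domain
    img.onload = () => resolve(img);
    img.onerror = reject;
    img.src = url;
  });
}

// Usage, following the base-URL form given in the text:
// loadModel("http://example-data-domain.org/modelname/").then(m => { /* upload to WebGL */ });
```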

V. RESULTS

  • The authors present in this section some results of their pipeline, as well as some implementation details.
  • The objects shown are a sample of the group of artifacts that was used to test the entire system.
  • The reported results and processing times show that an extension to large collections is straightforward.
  • For the texture of details, both matrices (LᵀL) and (LᵀC) of equation 6 can be constructed incrementally by processing input pictures one by one on the GPU and accumulating intermediate results using buffer textures (a sketch of the accumulation follows this list).
  • This makes it possible to obtain the 3D models of several objects within an hour of work.
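The incremental accumulation can be sketched as follows. This is a CPU-side illustration only; the paper performs it on the GPU with buffer textures, and the class and member names below are assumptions.

```typescript
// Incremental accumulation of the 3x3 matrix L^T L and the 3-vector L^T C for one surface
// point, processing one frame at a time.
type Vec3 = [number, number, number];

class NormalAccumulator {
  ltl = [0, 0, 0, 0, 0, 0, 0, 0, 0]; // row-major 3x3, starts at zero
  ltc: Vec3 = [0, 0, 0];

  // l: unit light direction for the current frame, c: observed intensity at the point.
  add(l: Vec3, c: number): void {
    for (let r = 0; r < 3; r++) {
      for (let k = 0; k < 3; k++) this.ltl[3 * r + k] += l[r] * l[k]; // L^T L += l l^T
      this.ltc[r] += l[r] * c;                                        // L^T C += c l
    }
  }
}
// After all frames are accumulated, solving (L^T L) x = (L^T C) gives the scaled normal x.
```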

B. Diffuse color texture reconstruction

  • The authors' texturing results are shown in the bottom row of Figure 6.
  • The most obvious difference that can be noticed is clearly the drastic loss of luminosity that occurs in the case of the naive approach.
  • For the Pot model, the big vertical crack in the white rectangle results from the fact that one portion of the surface was depicted by a much greater number of frames than the adjacent one: this produces an imbalance in the number of summed color contributions, and consequently an abrupt change of color.
  • During this texturing phase, the only parameters that must be set by the user are the numbers of applications of each elementary mask.
  • The set of parameters is thus really small and can be tuned in an easy and intuitive manner.

C. Detail texture reconstruction

  • Figure 8 illustrates the efficiency of their normal correction procedure by showing the normal field computed for the same object with and without correction.
  • As the eigenvalue ratio decreases, the estimated normal is forced to get closer to the original mesh normal along the direction of highest uncertainty.
  • The frames on the right side of these images highlight once again the improvement resulting from the estimated texture of details.
  • Thus, the user does not have to perform an exhaustive measurement on purpose just to satisfy the fitting constraints.
  • As shown at the bottom of each Web page snapshot in Figure 13, the rendering performance is in the order of thousands of frames per second (FPS) for models that range from 50K to 100K triangles.

VI. CONCLUSION

  • The authors presented a complete pipeline for the creation of Web-browsable digital content from real objects, consisting of 3D models enhanced by two textures respectively encoding artifact-free color and fine geometric details.
  • Even though the proposed approach is generic enough to be used in any application for which producing and sharing digital content about real artifacts presents an interest, its three main advantages (namely its ease of use, its high automation and its speed) make it particularly appropriate to cases where huge collections have to be processed.
  • It is not easy to know in advance if enough frames have been acquired so that accurate color or fine geometric detail can be extracted safely.
  • Hence, a “Photo Tourism-like” [46] image navigation could be possible.


From the Digitization of Cultural Artifacts to the
Web Publishing of Digital 3D Collections:
an Automatic Pipeline for Knowledge Sharing
Frédéric Larue, Marco Di Benedetto, Matteo Dellepiane and Roberto Scopigno
ISTI-CNR, Pisa, Italy
Abstract In this paper, we introduce a novel approach
intended to simplify the production of multimedia content
from real objects for the purpose of knowledge sharing,
which is particularly appropriate to the cultural heritage
field. It consists in a pipeline that covers all steps from the
digitization of the objects up to the Web publishing of the
resulting digital copies. During a first stage, the digitization
is performed by a high speed 3D scanner that recovers
the object’s geometry. A second stage then extracts from
the recovered data a color texture as well as a texture of
details, in order to enrich the acquired geometry in a more
realistic way. Finally, a third stage converts these data so
that they are compatible with the recent WebGL paradigm,
then providing 3D multimedia content directly exploitable
by end-users by means of standard Internet browsers.
The pipeline design is centered on automation and speed,
so that it can be used by non expert users to produce multimedia content from potentially large object's collections,
like it may be the case in cultural heritage. The choice of a
high speed scanner is particularly adapted for such a design,
since this kind of devices has the advantage of being fast
and intuitive. Processing stages that follow the digitization
are both completely automatic and “seamless”, in the sense
that it is not incumbent upon the user to perform tasks
manually, nor to use external softwares that generally need
additional operations to solve compatibility issues.
I. INTRODUCTION
In the field of cultural heritage (CH), knowledge shar-
ing is one of the most essential aspects for communication
activities between museal institutions, that conserve and
take care of cultural collections, and the public. Among
other things, these activities include education, research
and study as well as entertainment. All of them are really
precious for the spread of culture. However, the public
is not the only one to benefit from knowledge sharing:
it is important for promotion and advertisement purposes
as well, which are both of a high interest for the museal
institutions themselves regarding visibility, development
and long term sustainability.
In order to preserve the integrity of cultural goods, knowledge sharing generally makes use of surrogates so as to avoid directly exposing the real artifacts to potential risks of deterioration. For this purpose, multimedia
technologies are becoming more and more widespread in
the CH field, where these surrogates are then represented
by digital copies. This popularity can be explained by
at least two reasons. On the one hand, computing tools clearly make data storage, indexation, browsing and sharing easier, thanks to the existing network facilities and to the new Web technologies. On
the other hand, recent advances in 3D scanning give the
possibility to create multimedia content from real arti-
facts, producing faithful digital imprints and avoiding the
tedious and time consuming task of a manual modeling
through CAD softwares.
In particular, new high speed systems like in-hand scanners
present big advantages for CH. Firstly, they are able
to acquire digital copies in a few minutes, which is
really important when the multimedia content must be
produced from huge collections in reasonable times, or
when several fragments must be scanned in order to plan
the restoration of destroyed pieces. Moreover, they can
be manipulated by non-expert users as well, since they
provide an interactive feedback and rely on the temporal
coherency of the high-speed acquisition to get rid of the
traditional alignment problems that generally need to be
solved manually during a tedious post-processing phase.
Despite the availability of these technologies and their
increasing popularity, there is still a lack of global and automatic solutions able to cover the whole processing chain that ranges from content creation to content publishing.
The first weak point of this chain occurs during the
acquisition itself: for CH applications, a good representation of the geometry is not sufficient to produce faithful surrogates, since interactive visualization requires providing synthetic images as close as possible to the real appearance of the depicted object. In that
case, the geometry needs to be paired with an accurate
representation of the surface appearance (color, small
shape details, reflection characteristics). Unfortunately,
commercial scanning systems mostly focused on shape
measurement, putting aside until recently the problem of
recovering quality textures from real objects. This has led
to a lack of efficient and automatic processing tools for
color acquisition and reconstruction.
The second problem is that the tasks of creating and publishing multimedia content are generally completely decoupled from each other. From a practical point of view, this means that different software must be used for each of them. Converting the various inputs/outputs into compatible formats is then necessary, and this generally consists in a manual task that is incumbent upon the user.

Figure 1. Overview of the presented framework, that covers the whole chain from the 3D digitization of real CH artifacts up to the Web publishing
of the resulting digital 3D copies for archiving, browsing and visualization through Internet.
In this paper, we present a complete system that makes it possible to create colored 3D digital copies from existing artifacts and to publish them directly on the Internet, through an interactive visualization based on WebGL technology. This system, outlined in Figure 1, consists of three stages.
During the first one, acquisition is performed directly by
the user in an intuitive manner thanks to an in-hand digi-
tization device performing 3D scanning in real-time. The
data provided by the scanner, as well as some properties
specific to this kind of devices, are then exploited to
automatically produce a diffuse color texture for the 3D
model. This texture is free of the traditional visual artifacts that may appear due to the presence, in the input pictures, of shadows, specular highlights, lighting inconsistencies or calibration inaccuracies. Moreover, since high-speed scanning systems are often less accurate than more classic digitization technologies, we also estimate a normal texture from these data, once again in an automatic manner. This texture
captures the finest geometric details that may be missed
during the 3D acquisition, and can then be used afterwards
to enrich the original geometry during visualization.
Once geometry and texture information have been pro-
cessed, the third and last stage of the production pipeline
performs an optimization phase aimed at producing a
compact and Web-friendly version of the data. The output
of this stage will be used for real-time visualization
on commodity platforms. One of our main goals is the
archival and deployment of digital copies using standard,
well-settled and widely accessible technologies: in this
view, we use a standard Web server as our data provider,
and the WebGL technology to visualize and integrate the
digital copy on standard Web pages.
The contributions proposed in this paper can be summa-
rized as follows:
a complete and almost fully automatic pipeline for
the production of 3D multimedia content for Inter-
net applications, covering a chain ranging from the
digitization of real artifacts to the Web publishing of
the produced digital copies;
a texturing method specifically designed for real-
time scanning systems, that accounts for specific
properties of this kind of devices in order to improve
the acquired 3D model with a good quality color
texture without cracks nor illumination related visual
artifacts, as well as a normal texture capturing the
finest geometric features;
the coupling of intuitive acquisition techniques with
the recent paradigms proposed by WebGL technol-
ogy for Web publishing. Hence, the archival and the
sharing of vast item collections becomes possible and
easy also for non expert users.
The remainder of this paper is organized as follows. Sec-
tion II reviews the related work on software approaches
or complete systems for color acquisition, texture recon-
struction and real-time visualization on the Web platform.
Section III presents the first two stages of our system,
namely the in-hand scanner used for the acquisition, as
well as our processing step for generating a digital copy
from the acquired data. The third and last stage dedicated
to the preparation of the digital copy for Web publishing is
then presented in section IV. Finally, section V shows the
results achieved and section VI draws the conclusions.
II. RELATED WORK
A. Real-time 3D scanning
An overview of 3D scanning and stereo reconstruction goes well beyond the scope of this paper. We will
mainly focus on systems for real-time, in-hand acquisition
of geometry and/or color. Their main issues are the
availability of technology and the problem of aligning
data in a very fast way.
Concerning the first point, 3D acquisition can be based on
stereo techniques or on active optical scanning solutions.
Among the latter, the most robust approach is based on
the use of fast structured-light scanners [1], where a
high speed camera and a projector are used to recover
the range maps in real-time. The alignment problem is
usually solved with smart implementations of the ICP
algorithm [2], [3], where the most difficult aspect to solve
is related to the loop closure during registration.
In the last few years, some in-hand scanning solutions
have been proposed [2], [4], [5]: they essentially differ
on the way projection patterns are handled, and in the
implementation of ICP. None of the proposed systems
takes into account the acquisition of color, although the
one proposed by Weise et al. [5] contains also a color
camera (see next section for a detailed description). This
is essentially due to the low resolution of the cameras,
and to the difficulty of handling the peculiar illumination
provided by the projector. Other systems have been pro-
posed which take into account also the color, but they
are not able to achieve real-time performances [6] or to
reconstruct the geometry in an accurate way [7].
B. Color acquisition and visualization on 3D models
Adding color information to an acquired 3D model is
a complex task. The most flexible approach starts from
a set of images acquired either in a second stage with
respect to the geometry acquisition, or simultaneously but
using different devices. Image-to-geometry registration,
which can be solved by automatic [8]–[10] or semi-
automatic [11] approaches, is then necessary. In our case,
this registration step is not required, because the in-
hand scanning system provides images which are already
aligned to the 3D model.
Once alignment is performed, it is necessary to extract
information about the surface material appearance and
transfer it on the geometry. The most correct way to
represent the material properties of an object is to describe
them through a reflection function (e.g. BRDF), which
attempts to model the observed scattering behavior of a
class of real surfaces. A detailed presentation of its theory
and applications can be found in Dorsey [12]. Unfortu-
nately, state-of-the-art BRDF acquisition approaches rely
on complex and controlled illumination setups, making
them difficult to apply in more general cases, or when
fast or unconstrained acquisition is needed.
A less accurate but more robust solution is the direct
use of images to transfer the color to the 3D model. In
this case, the apparent color value is mapped onto the
digital object’s surface by applying an inverse projection.
In addition to other important issues, there are numerous
difficulties in selecting the correct color when multiple
candidates come from different images.
To solve these problems, a first group of methods selects,
for each surface part, a portion of a representative image
following a specific criterion, in most cases the orthogonality between the surface and the view direction [13],
[14]. However, due to the lack of consistency from one
image to another, artifacts are visible at the junctions
between surface areas receiving color from different im-
ages. They can be partially removed by working on these
junctions [13]–[15].
Another group of methods “blends” all image contri-
butions by assigning a weight to each one or to each
input pixel, and by selecting the final surface color as
the weighted average of the input data, as in Pulli et
al. [16]. The weight is usually a combination of various
quality metrics [17]–[19]. In particular, Callieri et al. [20]
presented a flexible weighting system that can be extended
in order to accommodate additional criteria. These meth-
ods provide better visual results and their implementation
permits very complex datasets to be used, i.e. hundreds
of images and very dense 3D models. Nevertheless,
undesirable ghosting effects may be produced when the
starting set of calibrated images is not perfectly aligned.
This problem can be solved, for example, by applying a
local warping using optical flow [21], [22].
Another issue, which is common to all the cited methods,
is the projection of lighting artifacts on the model, i.e.
shadows, highlights, and peculiar BRDF effects, since the
lighting environment is usually not known in advance. In
order to correct (or to avoid to project) lighting artifacts,
two possible approaches include the estimation of the
lighting environment [23] or the use of easily controllable
lighting setups [24].
C. 3D graphics on the Web platform
Since the birth of the Internet, the content of Web documents has been characterized by several types of media, ranging from plain text to images, audio or video streams. When personal computers started being equipped with fast enough graphics acceleration hardware, 3D content began, in its turn, to play an important role in the multimedia
sphere. The first tools aimed at visualizing 3D models
in Web pages were based on embedded software components, such as Java applets or ActiveX controls [25]. Several proprietary plug-ins and extensions for Web browsers were developed, highlighting the lack of standardization for this new content type. Besides the developer fragmentation that arose from this wide variety of available tools and from their incompatibilities, the burden incumbent upon the user of installing additional software components prevented a wide adoption of online 3D content.
Steps toward a standardization have been taken with the
introduction of the Virtual Reality Modeling Language
(VRML) [26] in 1995 and X3D [27] in 2007. However,
even though they have been well-accepted by the com-
munity, the 3D scene visualization was still delegated to
external software components.
The fundamental change happened in 2009 with the
introduction of the WebGL standard [28], promoted by
the Khronos Group [29]. With minor restrictions related to
security issues, the WebGL API is a one-to-one mapping
of the OpenGL|ES 2.0 specifications [30] in JavaScript.
This implies that modern Web browsers, like Google
Chrome or Mozilla Firefox, are able to natively access
the graphics hardware without needing additional plug-
ins or extensions. WebGL being a low-level API, a series
of higher-level libraries have been developed on top of
it. They differ from each other by the programming
paradigm they use, ranging from scene-graph-based in-
terfaces, like Scene.js [31] and GLGE [32], to procedural
paradigms, like SpiderGL [33] and WebGLU [34].
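As a minimal illustration of this native access (a generic WebGL snippet written for this page, not code taken from the paper or from SpiderGL):

```typescript
// Native 3D access in a WebGL-capable browser: no plug-in, only a <canvas> element.
const canvas = document.createElement("canvas");
const gl = canvas.getContext("webgl") as WebGLRenderingContext | null;

if (gl) {
  // The context exposes the OpenGL|ES 2.0-style API directly from JavaScript.
  gl.clearColor(0.0, 0.0, 0.0, 1.0);
  gl.clear(gl.COLOR_BUFFER_BIT);
} else {
  console.warn("WebGL is not supported by this browser.");
}
```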

Figure 2. The in-hand scanning device used during the first step of the
presented workflow, producing the data flow required for the generation
of cultural artifacts digital copies.
In our pipeline, as it will be shown, we use SpiderGL as
the rendering library for the real-time visualization of the
acquired digital copies.
III. DIGITIZATION AND PROCESSING OF CH
ARTIFACTS FOR GENERATING DIGITAL COPIES
Cultural heritage has been a privileged field of applica-
tion for 3D scanning since the beginning of its evolution.
This is due to the enormous variety and variability of the
types of objects that can be acquired. Moreover, archival
and preservation are extremely important issues as well.
Although 3D scanning can now be considered a “mature” technology, the acquisition of a large number of objects can be expensive both in terms of hardware and time needed for data processing. Very good results can be
achieved by customizing solutions for collections where
objects are almost of the same size and material, but
this can be expensive [35] or hard to extend to generic
cases [36]. Although some low-cost and/or hand-held devices are available, they usually need the placement of markers on the object, which is hard to do on CH artifacts. Conversely, the presented method
uses only an affordable scanning system and does not
make any particular assumption on the measured objects
(except the fact that they are manipulable by hand),
neither for the scanning session itself nor for the post-
processing steps.
This section describes the first two stages of our workflow: how and with which technology real artifacts can
be easily digitized by the user (section III-A) and how
the resulting data are exploited to recover automatically
a color texture (section III-B) and a texture of details
(section III-C) to enrich the 3D model provided by the
acquisition.
A. Acquisition by in-hand scanning
The first stage of our workflow, namely the one produc-
ing all data required for the creation of digital copies from
cultural artifacts, is based on an in-hand scanner whose
hardware configuration is shown in Figure 2. This scanner,
like most of the high speed digitization systems, is based
on structured light. Shape measurement is performed by
phase-shifting, using three different sinusoidal patterns to
establish correspondences (and then to perform optical
triangulation) between the projector and the two black and
white video cameras. The phase unwrapping, namely how
the different signal periods are demodulated, is achieved
by a GPU stereo matching between both cameras (see [3],
[5] for more details). The whole process produces one
range map in 14ms. Simultaneously, a color video flow
is captured by the third camera. During an acquisition,
the only light source in the scene is the scanner projector
itself, for which the position is always perfectly known.
The scanning can be performed in two different ways.
If the object color is neither red nor brown, it can be
done by holding the object directly by hand. In this case,
occlusions are detected by a hue analysis which produces,
for each video frame, a map of skin presence probability.
Otherwise, a black glove must be used. Although much
less convenient for the scanning itself, it makes the
occlusion detection trivial by simply ignoring dark regions
in the input pictures.
Each scanning session then produces a 3D mesh and
a color video flow. For each frame of this video, the
viewpoint and the position of the light (ie. the scanner
projector) are given, as well as the skin probability map
in the case of a digitization performed by hand. These data
are then used in the next stage of the workflow in order
to produce the color texture and the texture of details, as
explained hereafter, in section III-B.
Even if this stage requires the intervention of the user, the
choice of a real-time scanner to perform the digitization is
particularly appropriate for non-expert operators. Indeed,
for reasons already discussed in the introduction, its usage
is particularly easy and intuitive, and does not require
technical knowledge or manual post-processing. Finally,
it should be noted that our method does not work only
with the presented scanner, but can be implemented for
any high speed digitization device that is based on the
same principle.
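To make the phase-shifting principle concrete, the textbook three-step formula is sketched below; the 2π/3 phase steps are an assumption of the standard scheme, not necessarily the exact patterns used by the scanner described above.

```typescript
// Wrapped-phase computation for three-step phase shifting.
// i1, i2, i3 are intensities of the same pixel under patterns shifted by -120°, 0°, +120°.
// Returns the wrapped phase in (-pi, pi]; a separate unwrapping step (stereo matching in the
// paper) resolves the 2*pi ambiguity before triangulation.
function wrappedPhase(i1: number, i2: number, i3: number): number {
  return Math.atan2(Math.sqrt(3) * (i1 - i3), 2 * i2 - i1 - i3);
}

// Example: a pixel whose true phase is 0 gives i1 = i3, so the wrapped phase is 0.
console.log(wrappedPhase(0.5, 1.0, 0.5)); // 0
```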
B. Recovery of a diffuse color texture
Our texturing method extends the work proposed
in [20] so as to adapt it to the data flow produced by the
scanning system presented above. The idea, summarized
in Figure 3, is to weight each input picture by a mask
(typically a gray scale image) which represents a per-pixel
confidence value. The final color of a given surface point
is then computed as the weighted average of all color
contributions coming from the pictures into which this
point is visible. Masks are built by the composition of
multiple elementary masks, which are themselves com-
puted by image processing applied either on the input
image or on a rendering of the mesh performed from the
same viewpoint.
In the original paper, three criteria related to viewing con-
ditions have been considered for the mask computation:
the distance to the camera, the orientation with respect to
the viewpoint, and the proximity to a step discontinuity.
These criteria have been chosen so as to penalize image
regions that are known to lack accuracy, in order to deal with data redundancy from one image to another in a way that ensures seamless color transitions. More details about these masks can be found in [20].

Figure 3. The diffuse color texture is computed as the weighted average of all video frames. Weights are obtained by the composition of multiple elementary masks, each one corresponding to a particular criterion related to viewing, lighting or scanning conditions.

Although sufficient to avoid texture cracks, these masks
cannot handle self projected shadows or specular high-
lights since knowledge about the lighting is necessary.
In our case, both positions of the viewpoint and the light
(projector lamp) are always exactly known. Moreover, the
light moves with the scanner, which means that highlights
and shadows are different for each frame, as well as the
illumination direction. We then define the following three additional masks, which aim at giving more weight to image parts that are free of illumination effects:
Shadows. Since the complete geometric configura-
tion of the scene is known, we can use a simple
shadow mapping algorithm to estimate shadowed
areas, to which a null weight is assigned.
Specular highlights. Conversely to shadows, high-
lights partially depend on the object material, which
is unknown. For this reason, we use a multi-pass
algorithm to detect them. The first pass computes
the object texture without accounting for highlights.
Due to the high data redundancy, the averaging tends
to reduce their visual impact. During the subsequent
passes, highlights are identified by computing the
luminosity difference between the texture obtained
at the previous pass and the input picture. This dif-
ference corresponds to our highlight removal mask.
In practice, only two passes are sufficient.
Projector illumination. This mask aims at avoiding
luminosity loss during the averaging by giving more
influence to surface parts facing the light source. It
corresponds to the dot product between the surface
normal and the line of sight.
We also introduce two other masks to cope with the
occlusions that are inherent to in-hand scanning. Indeed,
if they are ignored, picture regions corresponding to the
operator’s hand may be introduced in the computation,
leading to visible artifacts in the final texture. Thus,
when digitization is performed with the dark glove, an
occlusion mask is simply computed by thresholding pixel
intensities. In the case of a digitization made by hand, the
mask corresponds to the aforementioned skin probability
map produced by the scanner.
Each elementary mask contains values in the range ]0, 1],
zero being excluded in order to ensure that texturing
is guaranteed for every surface point that is visible in
at least one picture. They are multiplied all together
to produce the final mask that selectively weights the
pixels of the corresponding picture. During this operation,
each elementary mask can obviously be applied more
than once. The influence of each criterion can then be
tuned independently, although we empirically determined
default values that work quite well for most cases.
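The weighting scheme can be summarized by the following sketch, which illustrates only the accumulation step; the data layout and names are assumptions, not the authors' implementation.

```typescript
// One contribution of a video frame to a texel: the frame color at the pixel that sees the
// texel, and the elementary mask values (distance, orientation, shadow, highlight, skin, ...)
// evaluated at that pixel. All masks lie in (0, 1], except the shadow mask, which may be 0.
interface Contribution {
  color: [number, number, number];
  masks: number[];
}

// Weighted average over all frames in which the texel's surface point is visible.
function blendTexel(contribs: Contribution[]): [number, number, number] {
  const sum: [number, number, number] = [0, 0, 0];
  let wSum = 0;
  for (const c of contribs) {
    // The final weight is the product of all elementary masks; a mask can be applied
    // more than once to increase its influence (the per-criterion tuning described above).
    const w = c.masks.reduce((p, m) => p * m, 1);
    sum[0] += w * c.color[0];
    sum[1] += w * c.color[1];
    sum[2] += w * c.color[2];
    wSum += w;
  }
  // A texel keeps no color only if it is visible in no frame (or only in shadowed pixels).
  return wSum > 0 ? [sum[0] / wSum, sum[1] / wSum, sum[2] / wSum] : [0, 0, 0];
}
```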
C. Recovery of a texture of details
Despite the fact that in-hand scanning is a really conve-
nient technology, it often leads to a loss of accuracy with
respect to traditional scanning devices, thus preventing the
acquisition of the finest geometric details. Nevertheless,
thanks to the fact that we know the light position for
each video frame, it is possible to partially recover them
by using a photometric stereo approach.
Photometric stereo consists in computing high quality
normal/range maps by taking several photographs from
the same viewpoint but with different illumination direc-
tions [37]–[39], or by moving the object in front of a
camera and a light source that are fixed with respect to
each other [40], [41]. We use here a similar approach for
extracting a normal map from the video flow produced
by the in-hand scanner.
In the following, vectors are assumed to be column vectors. Let {Fᵢ} be the set of frames corresponding to the acquisition sequence. A light position lᵢ, corresponding to the scanner projector's location, is associated to each frame Fᵢ. Assuming that the object surface is Lambertian, the color cᵢ observed at a given surface point p in Fᵢ is
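A standard Lambertian photometric-stereo formulation, consistent with the matrices (LᵀL) and (LᵀC) mentioned in Section V, is sketched below; the albedo a and the stacked matrix notation are assumptions about the paper's equation 6 rather than a reproduction of it.

```latex
% Lambertian model: color observed at surface point p in frame F_i
c_i = a \,( \mathbf{n} \cdot \mathbf{l}_i )
% Stacking the m frames in which p is visible: L is the m x 3 matrix with rows l_i^T,
% C is the m-vector of observed colors; the scaled normal is recovered by least squares.
L\,(a\,\mathbf{n}) = C
\quad\Longrightarrow\quad
a\,\mathbf{n} = (L^{T}L)^{-1} L^{T} C ,
\qquad
\mathbf{n} = \frac{a\,\mathbf{n}}{\lVert a\,\mathbf{n} \rVert}
```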

Citations

  • Journal article (14 citations): a study of sixth-grade students' views of the interactive Virtual Museum of Al Hassa Cultural Heritage in Saudi Arabia, indicating positive attitudes toward virtual museums in cultural heritage education.
  • Journal article (13 citations): a methodology for 3D scanning and processing of large architectural objects, developed and tested on Timurid architecture monuments in Kazakhstan and Uzbekistan.
  • Journal article (12 citations): a local shape descriptor for 3D surfaces, the histogram of spherical orientations (HoSO), used with a bag-of-words approach for the classification and retrieval of archaeological potsherds.
  • Doctoral thesis, EPFL, 2013 (7 citations): content-based image retrieval using shape descriptors, clustering and sparse coding, applied to Maya hieroglyphs.
  • Book chapter, 2014 (7 citations): the Tepalcatl project, statistical methods for the automatic categorization of potsherds from ancient Mexico, including the Teotihuacan and Aztec civilizations.

References

  • Book, 2007 (175 citations): Dorsey, Rushmeier and Sillion, a comprehensive treatment of the digital modeling of material appearance, explaining how models from physics and engineering are combined with observation for computer graphics rendering.
  • Journal article, 2008 (161 citations): an inexpensive system for acquiring images, geometry and normals of small objects such as fragments of wall paintings, requiring minimal supervision and including a 3D matching algorithm that searches for matching fragments.
  • Journal article, 2010 (148 citations): Gal, Wexler, Ofek, Hoppe and Cohen-Or, an automatic method to recover high-resolution texture over an object by mapping detailed photographs onto its surface while minimizing visible seams.
  • Journal article (146 citations): an approach where a multivariate blending function weights all available pixel data with respect to geometric, topological and colorimetric criteria before selective mapping onto the geometry.
  • Journal article (138 citations): experiments with speech recognition, topic segmentation, topic categorization and named entity detection on a large collection of recorded oral histories.

Frequently Asked Questions
Q1. What have the authors contributed in "From the digitization of cultural artifacts to the web publishing of digital 3d collections: an automatic pipeline for knowledge sharing" ?

In this paper, the authors introduce a novel approach intended to simplify the production of multimedia content from real objects for the purpose of knowledge sharing, which is particularly appropriate to the cultural heritage field. The pipeline design is centered on automation and speed, so that it can be used by non expert users to produce multimedia content from potentially large object's collections, like it may be the case in cultural heritage.

Other appealing directions of work could include the possibility to enrich the Web publishing phase, by automatically formatting a Web page based not only on the 3D model, but on other types of data, like text and images.

As expected, the projector illumination mask tends to increase the influence of image regions that correspond to the most illuminated surface parts, which leads to a conservation of luminosity. 

The diffuse color and detail textures recovery can take up to 5-10 minutes, while the data optimization for Web publishing is almost instantaneous. 

The interaction metaphor known as world-in-hand or trackball has been used to facilitate the artifact inspection by using the mouse. 

Tens of high quality 3D models can be made available every day, for any kind of use (archival, study, presentation to the public). 

The indexed triangle mesh representation has the property of saving a significant amount of space for the vast majority of 3D models, for which, on average, a vertex is referenced by six triangles.
