
From the Digitization of Cultural Artifacts to the Web Publishing of Digital 3D Collections: an Automatic Pipeline for Knowledge Sharing

TL;DR: A novel approach is introduced to simplify the production of multimedia content from real objects for the purpose of knowledge sharing, particularly appropriate to the cultural heritage field: a pipeline that covers all steps from the digitization of the objects up to the Web publishing of the resulting digital copies.
Abstract: In this paper, we introduce a novel approach intended to simplify the production of multimedia content from real objects for the purpose of knowledge sharing, which is particularly appropriate to the cultural heritage field. It consists in a pipeline that covers all steps from the digitization of the objects up to the Web publishing of the resulting digital copies. During a first stage, the digitization is performed by a high speed 3D scanner that recovers the object's geometry. A second stage then extracts from the recovered data a color texture as well as a texture of details, in order to enrich the acquired geometry in a more realistic way. Finally, a third stage converts these data so that they are compatible with the recent WebGL paradigm, then providing 3D multimedia content directly exploitable by end-users by means of standard Internet browsers. The pipeline design is centered on automation and speed, so that it can be used by non expert users to produce multimedia content from potentially large object's collections, like it may be the case in cultural heritage. The choice of a high speed scanner is particularly adapted for such a design, since this kind of devices has the advantage of being fast and intuitive. Processing stages that follow the digitization are both completely automatic and "seamless", in the sense that it is not incumbent upon the user to perform tasks manually, nor to use external softwares that generally need additional operations to solve compatibility issues.

Summary (4 min read)

Introduction

  • In the field of cultural heritage (CH), knowledge sharing is one of the most essential aspects of communication between museum institutions, which conserve and care for cultural collections, and the public.
  • To avoid exposing the original artifacts to risks of deterioration, knowledge sharing generally relies on surrogates; multimedia technologies are becoming more and more widespread in the CH field, where these surrogates take the form of digital copies.
  • Moreover, since high-speed scanning systems are often less accurate than more classic digitization technologies, the authors also estimate a normal texture from these data, once again automatically.
  • Hence, the archival and sharing of vast item collections becomes possible and easy even for non-expert users.
  • Section III presents the first two stages of their system, namely the in-hand scanner used for the acquisition, as well as their processing step for generating a digital copy from the acquired data.

A. Real-time 3D scanning

  • An overview of the 3D scanning and stereo reconstruction goes well beyond the scope of this paper.
  • Their main issues are the availability of technology and the problem of aligning data in a very fast way.
  • Among the latter, the most robust approach is based on the use of fast structured-light scanners [1], where a high speed camera and a projector are used to recover the range maps in real-time.
  • The absence of color acquisition in these systems is essentially due to the low resolution of the cameras, and to the difficulty of handling the peculiar illumination provided by the projector.
  • Other systems have been proposed which take into account also the color, but they are not able to achieve real-time performances [6] or to reconstruct the geometry in an accurate way [7].

B. Color acquisition and visualization on 3D models

  • The most flexible approach starts from a set of images acquired either in a second stage with respect to the geometry acquisition, or simultaneously but using different devices.
  • Image-to-geometry registration, which can be solved by automatic [8]–[10] or semiautomatic [11] approaches, is then necessary.
  • Due to the lack of consistency from one image to another, artifacts are visible at the junctions between surface areas receiving color from different images.
  • In particular, Callieri et al. [20] presented a flexible weighting system that can be extended in order to accommodate additional criteria.
  • The first tools aimed at visualizing 3D models in Web pages were based on embedded software components, such as Java applets or ActiveX controls [25].

III. DIGITIZATION AND PROCESSING OF CH ARTIFACTS FOR GENERATING DIGITAL COPIES

  • Cultural heritage has been a privileged field of application for 3D scanning since the beginning of its evolution.
  • This is due to the enormous variety and variability of the types of objects that can be acquired.
  • Moreover, archival and preservation are extremely important issues as well.
  • The acquisition of a large number of objects can be expensive both in terms of hardware and time needed for data processing.
  • Low-cost and hand-held devices usually need the placement of markers on the object, which is hard to do on CH artifacts.

A. Acquisition by in-hand scanning

  • The first stage of their workflow, namely the one producing all data required for the creation of digital copies from cultural artifacts, is based on an in-hand scanner whose hardware configuration is shown in Figure 2.
  • This scanner, like most of the high speed digitization systems, is based on structured light.
  • The scanning can be performed in two different ways.
  • Occlusions are detected by a hue analysis which produces, for each video frame, a map of skin presence probability.
  • These data are then used in the next stage of the workflow in order to produce the color texture and the texture of details, as explained hereafter, in section III-B.

B. Recovery of a diffuse color texture

  • The texturing method extends the work proposed in [20] so as to adapt it to the data flow produced by the scanning system presented above.
  • These criteria have been chosen so as to penalize image regions that are known to lack accuracy, in order to deal with data redundancy from one image to another in a way that ensures seamless color transitions.
  • Since the complete geometric configuration of the scene is known, the authors can use a simple shadow mapping algorithm to estimate shadowed areas, to which a null weight is assigned.
  • During the subsequent passes, highlights are identified by computing the luminosity difference between the texture obtained at the previous pass and the input picture.
  • Each elementary mask contains values in the range ]0, 1], zero being excluded in order to ensure that texturing is guaranteed for every surface point that is visible in at least one picture.

C. Recovery of a texture of details

  • It often leads to a loss of accuracy with respect to traditional scanning devices, thus preventing the acquisition of the finest geometric details.
  • The authors use here a similar approach for extracting a normal map from the video flow produced by the in-hand scanner.
  • This uneven sampling distribution may result in an estimated normal which is reliable along the dominant sampling plane but highly uncertain along the orthogonal direction.
  • To alleviate this problem, the authors propose to analyze the sampling distribution at each point p by performing a PCA on the set of light directions (outlined after this list).
  • By definition, ν2 is the direction along which the sampling is the poorest.
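A generic formulation of this analysis is sketched below. It is written under assumptions: it is not the paper's exact correction formula, and it assumes the light directions are reduced to two components before the PCA, consistent with ν2 denoting the poorest-sampled direction.

```latex
% Sample mean and covariance of the m light directions observed at point p
\bar{l} = \frac{1}{m}\sum_{i=1}^{m} l_i ,\qquad
\Sigma_p = \frac{1}{m}\sum_{i=1}^{m} (l_i - \bar{l})(l_i - \bar{l})^{T}
% Eigen-decomposition with \lambda_1 \ge \lambda_2 and eigenvectors \nu_1, \nu_2:
% \nu_2 is the direction of poorest sampling, and the ratio \lambda_2 / \lambda_1
% measures how isotropic the sampling is; it drives how strongly the estimated
% normal is pulled back toward the mesh normal along \nu_2.
\Sigma_p \,\nu_k = \lambda_k \,\nu_k ,\qquad \lambda_1 \ge \lambda_2
```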

IV. WEB PUBLISHING

  • After geometry and texture images have been processed, the third and last stage of their pipeline optimizes the generated data for network transmission and realtime rendering on standard Web browsers.
  • The optimized version of the 3D model is stored in the server file system and is accessed by a standard HTTP server to serve requests of visualization clients.
  • In the following, the authors describe the steps they use for preparing and storing the data.

A. Data optimization

  • The optimization phase is composed of two sequential steps: geometry partitioning and rendering optimizations.
  • To this end, the authors use a simple greedy method that iteratively adds triangles to a chunk until the maximum number of vertices is reached (a minimal sketch follows this list).
  • One advantage of the indexed triangle mesh representation is that vertices referenced by more than one triangle need to be stored only once.
  • To carry this advantage over from memory occupancy to rendering performance, graphics accelerators have introduced a vertex cache capable of storing data associated with up to 32 vertices, thus allowing the results of a considerable amount of per-vertex calculations to be reused.
  • Even though the problem does not have a polynomial-time solution, several works have been developed [42], [43] that produce a very good approximate solution in a relatively small amount of time.
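A minimal sketch of such a greedy partitioning is given below. It is an assumption of how the loop could be organized, not the authors' implementation; the default vertex budget of 65,536 corresponds to the largest mesh addressable with the 16-bit indices supported by WebGL 1.0 without extensions.

```typescript
// Greedy partitioning of an indexed triangle mesh into chunks whose vertex count
// stays below a fixed budget (e.g. 65,536, the limit of 16-bit indices in WebGL 1.0).
// triangles: flat array of vertex indices, 3 per triangle.
function partition(triangles: Uint32Array, maxVerts = 65_536): number[][] {
  const chunks: number[][] = [];   // each chunk is a list of triangle indices
  let current: number[] = [];
  let verts = new Set<number>();   // vertices referenced by the current chunk

  for (let t = 0; t < triangles.length / 3; t++) {
    const tri = [triangles[3 * t], triangles[3 * t + 1], triangles[3 * t + 2]];
    const added = tri.filter(v => !verts.has(v)).length;
    if (verts.size + added > maxVerts && current.length > 0) {
      // Budget exceeded: close the current chunk and start a new one.
      chunks.push(current);
      current = [];
      verts = new Set<number>();
    }
    current.push(t);
    tri.forEach(v => verts.add(v));
  }
  if (current.length > 0) chunks.push(current);
  // Each chunk is later re-indexed locally and reordered for the vertex cache.
  return chunks;
}
```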

B. Data storage and retrieval

  • One of their goals is to exploit standard and easily available technologies for making the produced models accessible on the Web platform.
  • To this end, the authors decided to use the well-known Apache HTTP server and use the server file system as the storage database.
  • Model data is saved under standard file formats: to store geometry information the authors use the Stanford polygon file format (PLY), which supports multiple vertex attributes and binary encoding, while Portable Network Graphics (PNG) images are used for color and normal textures.
  • Even though those formats are already compact, the authors take advantage of the automatic compression (gzip) applied by the Apache server on data transmission, as well as automatic decompression executed by browsers on data arrival.
  • To access the remote 3D model, visualization clients use JavaScript to issue an HTTP request with a base URL of the form http://example-data-domain.org/modelname/, appending predefined file names to discriminate among geometry and texture files, such as geometry.ply, color.png and normal.png; a minimal retrieval sketch is given below.
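A hypothetical client-side retrieval following this URL scheme is sketched below. It uses the modern fetch API rather than the paper's SpiderGL-based code, and the function and variable names are illustrative only; the gzip transfer encoding negotiated with Apache is decompressed transparently by the browser.

```typescript
// Retrieve the three resources of a published model from its base URL.
// File names follow the convention described in the text: geometry.ply, color.png, normal.png.
async function loadModel(baseUrl: string) {
  const geometry = await (await fetch(baseUrl + "geometry.ply")).arrayBuffer(); // binary PLY
  const color    = await loadImage(baseUrl + "color.png");                      // diffuse texture
  const normal   = await loadImage(baseUrl + "normal.png");                     // detail texture
  return { geometry, color, normal };
}

function loadImage(url: string): Promise<HTMLImageElement> {
  return new Promise((resolve, reject) => {
    const img = new Image();
    img.crossOrigin = "anonymous"; // needed if the data domain differs from the page domain
    img.onload = () => resolve(img);
    img.onerror = reject;
    img.src = url;
  });
}

// Usage, following the base-URL form given in the text:
// loadModel("http://example-data-domain.org/modelname/").then(m => { /* upload to WebGL */ });
```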

V. RESULTS

  • The authors present in this section some results of their pipeline, as well as some implementation details.
  • The objects shown are a sample of the group of artifacts that was used to test the entire system.
  • The reported results and processing times show that an extension to large collections is straightforward.
  • For the texture of details, both matrices (LᵀL) and (LᵀC) of equation 6 can be constructed incrementally by processing input pictures one by one on the GPU and accumulating intermediate results using buffer textures (a sketch of the accumulation follows this list).
  • This makes it possible to obtain the 3D models of several objects within an hour of work.
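The incremental accumulation can be sketched as follows. This is a CPU-side illustration only; the paper performs it on the GPU with buffer textures, and the class and member names below are assumptions.

```typescript
// Incremental accumulation of the 3x3 matrix L^T L and the 3-vector L^T C for one surface
// point, processing one frame at a time.
type Vec3 = [number, number, number];

class NormalAccumulator {
  ltl = [0, 0, 0, 0, 0, 0, 0, 0, 0]; // row-major 3x3, starts at zero
  ltc: Vec3 = [0, 0, 0];

  // l: unit light direction for the current frame, c: observed intensity at the point.
  add(l: Vec3, c: number): void {
    for (let r = 0; r < 3; r++) {
      for (let k = 0; k < 3; k++) this.ltl[3 * r + k] += l[r] * l[k]; // L^T L += l l^T
      this.ltc[r] += l[r] * c;                                        // L^T C += c l
    }
  }
}
// After all frames are accumulated, solving (L^T L) x = (L^T C) gives the scaled normal x.
```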

B. Diffuse color texture reconstruction

  • The authors' texturing results are shown in the bottom row of Figure 6.
  • The most obvious difference that can be noticed is clearly the drastic loss of luminosity that occurs in the case of the naive approach.
  • For the Pot model, the big vertical crack in the white rectangle results from the fact that one portion of the surface was depicted by a much greater number of frames than the adjacent one: this produces an imbalance in the number of summed color contributions, and consequently an abrupt change of color.
  • During this texturing phase, the only parameters that must be set by the user are the numbers of applications of each elementary mask.
  • The set of parameters is thus really small and can be tuned in an easy and intuitive manner.

C. Detail texture reconstruction

  • Figure 8 illustrates the efficiency of their normal correction procedure by showing the normal field computed for the same object with and without correction.
  • As the eigenvalue ratio decreases, the estimated normal is forced to get closer to the original mesh normal along the direction of highest uncertainty.
  • The frames on the right side of these images highlight once again the improvement resulting from the estimated texture of details.
  • Thus, the user does not have to perform an exhaustive measurement on purpose just to satisfy the fitting constraints.
  • As shown at the bottom of each Web page snapshot in Figure 13, the rendering performance is in the order of thousands of frames per second (FPS) for models that range from 50K to 100K triangles.

VI. CONCLUSION

  • The authors presented a complete pipeline for the creation of Web-browsable digital content from real objects, consisting of 3D models enhanced by two textures respectively encoding artifact-free color and fine geometric details.
  • Even though the proposed approach is generic enough to be used in any application for which producing and sharing digital content about real artifacts presents an interest, its three main advantages (namely its ease of use, its high automation and its speed) make it particularly appropriate to cases where huge collections have to be processed.
  • It is not easy to know in advance if enough frames have been acquired so that accurate color or fine geometric detail can be extracted safely.
  • Hence, a “Photo Tourism-like” [46] image navigation could be possible.


From the Digitization of Cultural Artifacts to the
Web Publishing of Digital 3D Collections:
an Automatic Pipeline for Knowledge Sharing
Frédéric Larue, Marco Di Benedetto, Matteo Dellepiane and Roberto Scopigno
ISTI-CNR, Pisa, Italy
Abstract In this paper, we introduce a novel approach
intended to simplify the production of multimedia content
from real objects for the purpose of knowledge sharing,
which is particularly appropriate to the cultural heritage
field. It consists in a pipeline that covers all steps from the
digitization of the objects up to the Web publishing of the
resulting digital copies. During a first stage, the digitization
is performed by a high speed 3D scanner that recovers
the object’s geometry. A second stage then extracts from
the recovered data a color texture as well as a texture of
details, in order to enrich the acquired geometry in a more
realistic way. Finally, a third stage converts these data so
that they are compatible with the recent WebGL paradigm,
then providing 3D multimedia content directly exploitable
by end-users by means of standard Internet browsers.
The pipeline design is centered on automation and speed,
so that it can be used by non expert users to produce multimedia content from potentially large object's collections,
like it may be the case in cultural heritage. The choice of a
high speed scanner is particularly adapted for such a design,
since this kind of devices has the advantage of being fast
and intuitive. Processing stages that follow the digitization
are both completely automatic and “seamless”, in the sense
that it is not incumbent upon the user to perform tasks
manually, nor to use external softwares that generally need
additional operations to solve compatibility issues.
I. INTRODUCTION
In the field of cultural heritage (CH), knowledge shar-
ing is one of the most essential aspects for communication
activities between museal institutions, that conserve and
take care of cultural collections, and the public. Among
other things, these activities include education, research
and study as well as entertainment. All of them are really
precious for the spread of culture. However, the public
is not the only one to benefit from knowledge sharing:
it is important for promotion and advertisement purposes
as well, which are both of a high interest for the museal
institutions themselves regarding visibility, development
and long term sustainability.
In order to preserve the integrity of cultural goods, knowledge sharing generally makes use of surrogates so as to avoid directly exposing the real artifacts to potential risks of deterioration. For this purpose, multimedia
technologies are becoming more and more widespread in
the CH field, where these surrogates are then represented
by digital copies. This popularity can be explained by
at least two reasons. On the one hand, computing tools clearly make data storage, indexation, browsing and sharing easier, thanks to the existing network facilities and to the new Web technologies. On
the other hand, recent advances in 3D scanning give the
possibility to create multimedia content from real arti-
facts, producing faithful digital imprints and avoiding the
tedious and time consuming task of a manual modeling
through CAD softwares.
In particular, new high speed systems like in-hand scanners
present big advantages for CH. Firstly, they are able
to acquire digital copies in a few minutes, which is
really important when the multimedia content must be
produced from huge collections in reasonable times, or
when several fragments must be scanned in order to plan
the restoration of destroyed pieces. Moreover, they can
be manipulated by non-expert users as well, since they
provide an interactive feedback and rely on the temporal
coherency of the high-speed acquisition to get rid of the
traditional alignment problems that generally need to be
solved manually during a tedious post-processing phase.
Despite the availability of these technologies and their
increasing popularity, there is still a lack of global and automatic solutions able to cover the whole processing chain that ranges from content creation to content publishing.
The first weak point of this chain occurs during the
acquisition itself: for CH applications, a good representation of the geometry is not sufficient to produce faithful surrogates, since interactive visualization requires providing synthetic images as close as possible to the real appearance of the depicted object. In that
case, the geometry needs to be paired with an accurate
representation of the surface appearance (color, small
shape details, reflection characteristics). Unfortunately,
commercial scanning systems mostly focused on shape
measurement, putting aside until recently the problem of
recovering quality textures from real objects. This has led
to a lack of efficient and automatic processing tools for
color acquisition and reconstruction.
The second problem is that the tasks of creating and publishing multimedia content are generally completely decoupled from each other. From a practical point of view, this means that different software must be used for each of them. Converting the various inputs/outputs into compatible formats is then necessary, and this generally consists in a manual task that is incumbent upon the user.

Figure 1. Overview of the presented framework, that covers the whole chain from the 3D digitization of real CH artifacts up to the Web publishing
of the resulting digital 3D copies for archiving, browsing and visualization through Internet.
In this paper, we present a complete system that makes it possible to create colored 3D digital copies from existing artifacts and to publish them directly on the Internet, through an interactive visualization based on WebGL technology. This system, outlined in Figure 1, consists of three stages.
During the first one, acquisition is performed directly by
the user in an intuitive manner thanks to an in-hand digi-
tization device performing 3D scanning in real-time. The
data provided by the scanner, as well as some properties
specific to this kind of devices, are then exploited to
automatically produce a diffuse color texture for the 3D
model. This texture is free of the traditional visual artifacts that may appear due to the presence, in the input pictures, of shadows, specular highlights, lighting inconsistencies or calibration inaccuracies. Moreover, since high-speed scanning systems are often less accurate than more classic digitization technologies, we also estimate a normal texture from these data, once again in an automatic manner. This texture
captures the finest geometric details that may be missed
during the 3D acquisition, and can then be used afterwards
to enrich the original geometry during visualization.
Once geometry and texture information have been pro-
cessed, the third and last stage of the production pipeline
performs an optimization phase aimed at producing a
compact and Web-friendly version of the data. The output
of this stage will be used for real-time visualization
on commodity platforms. One of our main goals is the
archival and deployment of digital copies using standard,
well-settled and widely accessible technologies: in this
view, we use a standard Web server as our data provider,
and the WebGL technology to visualize and integrate the
digital copy on standard Web pages.
The contributions proposed in this paper can be summa-
rized as follows:
a complete and almost fully automatic pipeline for
the production of 3D multimedia content for Inter-
net applications, covering a chain ranging from the
digitization of real artifacts to the Web publishing of
the produced digital copies;
a texturing method specifically designed for real-
time scanning systems, that accounts for specific
properties of this kind of devices in order to improve
the acquired 3D model with a good quality color
texture without cracks nor illumination related visual
artifacts, as well as a normal texture capturing the
finest geometric features;
the coupling of intuitive acquisition techniques with
the recent paradigms proposed by WebGL technol-
ogy for Web publishing. Hence, the archival and the
sharing of vast item collections becomes possible and
easy also for non expert users.
The remainder of this paper is organized as follows. Sec-
tion II reviews the related work on software approaches
or complete systems for color acquisition, texture recon-
struction and real-time visualization on the Web platform.
Section III presents the first two stages of our system,
namely the in-hand scanner used for the acquisition, as
well as our processing step for generating a digital copy
from the acquired data. The third and last stage dedicated
to the preparation of the digital copy for Web publishing is
then presented in section IV. Finally, section V shows the
results achieved and section VI draws the conclusions.
II. RELATED WORK
A. Real-time 3D scanning
An overview of 3D scanning and stereo reconstruction goes well beyond the scope of this paper. We will
mainly focus on systems for real-time, in-hand acquisition
of geometry and/or color. Their main issues are the
availability of technology and the problem of aligning
data in a very fast way.
Concerning the first point, 3D acquisition can be based on
stereo techniques or on active optical scanning solutions.
Among the latter, the most robust approach is based on
the use of fast structured-light scanners [1], where a
high speed camera and a projector are used to recover
the range maps in real-time. The alignment problem is
usually solved with smart implementations of the ICP
algorithm [2], [3], where the most difficult aspect to solve
is related to the loop closure during registration.
In the last few years, some in-hand scanning solutions
have been proposed [2], [4], [5]: they essentially differ
on the way projection patterns are handled, and in the
implementation of ICP. None of the proposed systems
takes into account the acquisition of color, although the
one proposed by Weise et al. [5] contains also a color
camera (see next section for a detailed description). This
is essentially due to the low resolution of the cameras,
and to the difficulty of handling the peculiar illumination
provided by the projector. Other systems have been pro-
posed which take into account also the color, but they
are not able to achieve real-time performances [6] or to
reconstruct the geometry in an accurate way [7].
B. Color acquisition and visualization on 3D models
Adding color information to an acquired 3D model is
a complex task. The most flexible approach starts from
a set of images acquired either in a second stage with
respect to the geometry acquisition, or simultaneously but
using different devices. Image-to-geometry registration,
which can be solved by automatic [8]–[10] or semi-
automatic [11] approaches, is then necessary. In our case,
this registration step is not required, because the in-
hand scanning system provides images which are already
aligned to the 3D model.
Once alignment is performed, it is necessary to extract
information about the surface material appearance and
transfer it on the geometry. The most correct way to
represent the material properties of an object is to describe
them through a reflection function (e.g. BRDF), which
attempts to model the observed scattering behavior of a
class of real surfaces. A detailed presentation of its theory
and applications can be found in Dorsey [12]. Unfortu-
nately, state-of-the-art BRDF acquisition approaches rely
on complex and controlled illumination setups, making
them difficult to apply in more general cases, or when
fast or unconstrained acquisition is needed.
A less accurate but more robust solution is the direct
use of images to transfer the color to the 3D model. In
this case, the apparent color value is mapped onto the
digital object’s surface by applying an inverse projection.
In addition to other important issues, there are numerous
difficulties in selecting the correct color when multiple
candidates come from different images.
To solve these problems, a first group of methods selects,
for each surface part, a portion of a representative image
following a specific criterion, in most cases the orthogonality between the surface and the view direction [13],
[14]. However, due to the lack of consistency from one
image to another, artifacts are visible at the junctions
between surface areas receiving color from different im-
ages. They can be partially removed by working on these
junctions [13]–[15].
Another group of methods “blends” all image contri-
butions by assigning a weight to each one or to each
input pixel, and by selecting the final surface color as
the weighted average of the input data, as in Pulli et
al. [16]. The weight is usually a combination of various
quality metrics [17]–[19]. In particular, Callieri et al. [20]
presented a flexible weighting system that can be extended
in order to accommodate additional criteria. These meth-
ods provide better visual results and their implementation
permits very complex datasets to be used, i.e. hundreds
of images and very dense 3D models. Nevertheless,
undesirable ghosting effects may be produced when the
starting set of calibrated images is not perfectly aligned.
This problem can be solved, for example, by applying a
local warping using optical flow [21], [22].
Another issue, which is common to all the cited methods,
is the projection of lighting artifacts on the model, i.e.
shadows, highlights, and peculiar BRDF effects, since the
lighting environment is usually not known in advance. In
order to correct (or to avoid to project) lighting artifacts,
two possible approaches include the estimation of the
lighting environment [23] or the use of easily controllable
lighting setups [24].
C. 3D graphics on the Web platform
Since the birth of the Internet, the content of Web documents has been characterized by several types of media, ranging from plain text to images, audio or video streams. When personal computers started being equipped with fast enough graphics acceleration hardware, 3D content began, in its turn, to play an important role in the multimedia
sphere. The first tools aimed at visualizing 3D models
in Web pages were based on embedded software components, such as Java applets or ActiveX controls [25]. Several proprietary plug-ins and extensions for Web browsers were developed, highlighting the lack of standardization for this new content type. Besides the developer fragmentation that arose from this wide variety of available tools and from their incompatibilities, the burden incumbent upon the user of installing additional software components prevented a wide adoption of online 3D content.
Steps toward a standardization have been taken with the
introduction of the Virtual Reality Modeling Language
(VRML) [26] in 1995 and X3D [27] in 2007. However,
even though they have been well-accepted by the com-
munity, the 3D scene visualization was still delegated to
external software components.
The fundamental change happened in 2009 with the
introduction of the WebGL standard [28], promoted by
the Khronos Group [29]. With minor restrictions related to
security issues, the WebGL API is a one-to-one mapping
of the OpenGL|ES 2.0 specifications [30] in JavaScript.
This implies that modern Web browsers, like Google
Chrome or Mozilla Firefox, are able to natively access
the graphics hardware without needing additional plug-
ins or extensions. WebGL being a low-level API, a series
of higher-level libraries have been developed on top of
it. They differ from each other by the programming
paradigm they use, ranging from scene-graph-based in-
terfaces, like Scene.js [31] and GLGE [32], to procedural
paradigms, like SpiderGL [33] and WebGLU [34].
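As a minimal illustration of this native access (a generic WebGL snippet written for this page, not code taken from the paper or from SpiderGL):

```typescript
// Native 3D access in a WebGL-capable browser: no plug-in, only a <canvas> element.
const canvas = document.createElement("canvas");
const gl = canvas.getContext("webgl") as WebGLRenderingContext | null;

if (gl) {
  // The context exposes the OpenGL|ES 2.0-style API directly from JavaScript.
  gl.clearColor(0.0, 0.0, 0.0, 1.0);
  gl.clear(gl.COLOR_BUFFER_BIT);
} else {
  console.warn("WebGL is not supported by this browser.");
}
```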

Figure 2. The in-hand scanning device used during the first step of the
presented workflow, producing the data flow required for the generation
of cultural artifacts digital copies.
In our pipeline, as it will be shown, we use SpiderGL as
the rendering library for the real-time visualization of the
acquired digital copies.
III. DIGITIZATION AND PROCESSING OF CH
ARTIFACTS FOR GENERATING DIGITAL COPIES
Cultural heritage has been a privileged field of applica-
tion for 3D scanning since the beginning of its evolution.
This is due to the enormous variety and variability of the
types of objects that can be acquired. Moreover, archival
and preservation are extremely important issues as well.
Although 3D scanning can now be considered a “mature” technology, the acquisition of a large number of objects can be expensive both in terms of hardware and time needed for data processing. Very good results can be
achieved by customizing solutions for collections where
objects are almost of the same size and material, but
this can be expensive [35] or hard to extend to generic
cases [36]. Although some low-cost and/or hand-held devices are available, they usually need the placement of markers on the object, which is hard to do on CH artifacts. Conversely, the presented method
uses only an affordable scanning system and does not
make any particular assumption on the measured objects
(except the fact that they are manipulable by hand),
neither for the scanning session itself nor for the post-
processing steps.
This section describes the first two stages of our workflow: how and with which technology real artifacts can
be easily digitized by the user (section III-A) and how
the resulting data are exploited to recover automatically
a color texture (section III-B) and a texture of details
(section III-C) to enrich the 3D model provided by the
acquisition.
A. Acquisition by in-hand scanning
The first stage of our workflow, namely the one produc-
ing all data required for the creation of digital copies from
cultural artifacts, is based on an in-hand scanner whose
hardware configuration is shown in Figure 2. This scanner,
like most of the high speed digitization systems, is based
on structured light. Shape measurement is performed by
phase-shifting, using three different sinusoidal patterns to
establish correspondences (and then to perform optical
triangulation) between the projector and the two black and
white video cameras. The phase unwrapping, namely how
the different signal periods are demodulated, is achieved
by a GPU stereo matching between both cameras (see [3],
[5] for more details). The whole process produces one
range map in 14ms. Simultaneously, a color video flow
is captured by the third camera. During an acquisition,
the only light source in the scene is the scanner projector
itself, for which the position is always perfectly known.
The scanning can be performed in two different ways.
If the object color is neither red nor brown, it can be
done by holding the object directly by hand. In this case,
occlusions are detected by a hue analysis which produces,
for each video frame, a map of skin presence probability.
Otherwise, a black glove must be used. Although much
less convenient for the scanning itself, it makes the
occlusion detection trivial by simply ignoring dark regions
in the input pictures.
Each scanning session then produces a 3D mesh and
a color video flow. For each frame of this video, the
viewpoint and the position of the light (ie. the scanner
projector) are given, as well as the skin probability map
in the case of a digitization performed by hand. These data
are then used in the next stage of the workflow in order
to produce the color texture and the texture of details, as
explained hereafter, in section III-B.
Even if this stage requires the intervention of the user, the
choice of a real-time scanner to perform the digitization is
particularly appropriate for non-expert operators. Indeed,
for reasons already discussed in the introduction, its usage
is particularly easy and intuitive, and does not require
technical knowledge or manual post-processing. Finally,
it should be noted that our method does not work only
with the presented scanner, but can be implemented for
any high speed digitization device that is based on the
same principle.
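To make the phase-shifting principle concrete, the textbook three-step formula is sketched below; the 2π/3 phase steps are an assumption of the standard scheme, not necessarily the exact patterns used by the scanner described above.

```typescript
// Wrapped-phase computation for three-step phase shifting.
// i1, i2, i3 are intensities of the same pixel under patterns shifted by -120°, 0°, +120°.
// Returns the wrapped phase in (-pi, pi]; a separate unwrapping step (stereo matching in the
// paper) resolves the 2*pi ambiguity before triangulation.
function wrappedPhase(i1: number, i2: number, i3: number): number {
  return Math.atan2(Math.sqrt(3) * (i1 - i3), 2 * i2 - i1 - i3);
}

// Example: a pixel whose true phase is 0 gives i1 = i3, so the wrapped phase is 0.
console.log(wrappedPhase(0.5, 1.0, 0.5)); // 0
```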
B. Recovery of a diffuse color texture
Our texturing method extends the work proposed
in [20] so as to adapt it to the data flow produced by the
scanning system presented above. The idea, summarized
in Figure 3, is to weight each input picture by a mask
(typically a gray scale image) which represents a per-pixel
confidence value. The final color of a given surface point
is then computed as the weighted average of all color
contributions coming from the pictures into which this
point is visible. Masks are built by the composition of
multiple elementary masks, which are themselves com-
puted by image processing applied either on the input
image or on a rendering of the mesh performed from the
same viewpoint.
In the original paper, three criteria related to viewing con-
ditions have been considered for the mask computation:
the distance to the camera, the orientation with respect to
the viewpoint, and the proximity to a step discontinuity.
These criteria have been chosen so as to penalize image
regions that are known to lack accuracy, in order to deal with data redundancy from one image to another in a way that ensures seamless color transitions. More details about these masks can be found in [20].

Figure 3. The diffuse color texture is computed as the weighted average of all video frames. Weights are obtained by the composition of multiple elementary masks, each one corresponding to a particular criterion related to viewing, lighting or scanning conditions.

Although sufficient to avoid texture cracks, these masks
cannot handle self projected shadows or specular high-
lights since knowledge about the lighting is necessary.
In our case, both positions of the viewpoint and the light
(projector lamp) are always exactly known. Moreover, the
light moves with the scanner, which means that highlights
and shadows are different for each frame, as well as the
illumination direction. We then define the following three additional masks, which aim at giving more weight to image parts that are free of illumination effects:
Shadows. Since the complete geometric configura-
tion of the scene is known, we can use a simple
shadow mapping algorithm to estimate shadowed
areas, to which a null weight is assigned.
Specular highlights. Conversely to shadows, high-
lights partially depend on the object material, which
is unknown. For this reason, we use a multi-pass
algorithm to detect them. The first pass computes
the object texture without accounting for highlights.
Due to the high data redundancy, the averaging tends
to reduce their visual impact. During the subsequent
passes, highlights are identified by computing the
luminosity difference between the texture obtained
at the previous pass and the input picture. This dif-
ference corresponds to our highlight removal mask.
In practice, only two passes are sufficient.
Projector illumination. This mask aims at avoiding
luminosity loss during the averaging by giving more
influence to surface parts facing the light source. It
corresponds to the dot product between the surface
normal and the line of sight.
We also introduce two other masks to cope with the
occlusions that are inherent to in-hand scanning. Indeed,
if they are ignored, picture regions corresponding to the
operator’s hand may be introduced in the computation,
leading to visible artifacts in the final texture. Thus,
when digitization is performed with the dark glove, an
occlusion mask is simply computed by thresholding pixel
intensities. In the case of a digitization made by hand, the
mask corresponds to the aforementioned skin probability
map produced by the scanner.
Each elementary mask contains values in the range ]0, 1],
zero being excluded in order to ensure that texturing
is guaranteed for every surface point that is visible in
at least one picture. They are multiplied all together
to produce the final mask that selectively weights the
pixels of the corresponding picture. During this operation,
each elementary mask can obviously be applied more
than once. The influence of each criterion can then be
tuned independently, although we empirically determined
default values that work quite well for most cases.
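The weighting scheme can be summarized by the following sketch, which illustrates only the accumulation step; the data layout and names are assumptions, not the authors' implementation.

```typescript
// One contribution of a video frame to a texel: the frame color at the pixel that sees the
// texel, and the elementary mask values (distance, orientation, shadow, highlight, skin, ...)
// evaluated at that pixel. All masks lie in (0, 1], except the shadow mask, which may be 0.
interface Contribution {
  color: [number, number, number];
  masks: number[];
}

// Weighted average over all frames in which the texel's surface point is visible.
function blendTexel(contribs: Contribution[]): [number, number, number] {
  const sum: [number, number, number] = [0, 0, 0];
  let wSum = 0;
  for (const c of contribs) {
    // The final weight is the product of all elementary masks; a mask can be applied
    // more than once to increase its influence (the per-criterion tuning described above).
    const w = c.masks.reduce((p, m) => p * m, 1);
    sum[0] += w * c.color[0];
    sum[1] += w * c.color[1];
    sum[2] += w * c.color[2];
    wSum += w;
  }
  // A texel keeps no color only if it is visible in no frame (or only in shadowed pixels).
  return wSum > 0 ? [sum[0] / wSum, sum[1] / wSum, sum[2] / wSum] : [0, 0, 0];
}
```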
C. Recovery of a texture of details
Despite the fact that in-hand scanning is a really conve-
nient technology, it often leads to a loss of accuracy with
respect to traditional scanning devices, thus preventing the
acquisition of the finest geometric details. Nevertheless,
thanks to the fact that we know the light position for
each video frame, it is possible to partially recover them
by using a photometric stereo approach.
Photometric stereo consists in computing high quality
normal/range maps by taking several photographs from
the same viewpoint but with different illumination direc-
tions [37]–[39], or by moving the object in front of a
camera and a light source that are fixed with respect to
each other [40], [41]. We use here a similar approach for
extracting a normal map from the video flow produced
by the in-hand scanner.
In the following, vectors are assumed to be column vectors. Let {Fᵢ} be the set of frames corresponding to the acquisition sequence. A light position lᵢ, corresponding to the scanner projector's location, is associated to each frame Fᵢ. Assuming that the object surface is Lambertian, the color cᵢ observed at a given surface point p in Fᵢ is
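A standard Lambertian photometric-stereo formulation, consistent with the matrices (LᵀL) and (LᵀC) mentioned in Section V, is sketched below; the albedo a and the stacked matrix notation are assumptions about the paper's equation 6 rather than a reproduction of it.

```latex
% Lambertian model: color observed at surface point p in frame F_i
c_i = a \,( \mathbf{n} \cdot \mathbf{l}_i )
% Stacking the m frames in which p is visible: L is the m x 3 matrix with rows l_i^T,
% C is the m-vector of observed colors; the scaled normal is recovered by least squares.
L\,(a\,\mathbf{n}) = C
\quad\Longrightarrow\quad
a\,\mathbf{n} = (L^{T}L)^{-1} L^{T} C ,
\qquad
\mathbf{n} = \frac{a\,\mathbf{n}}{\lVert a\,\mathbf{n} \rVert}
```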

Citations

  • Journal article (14 citations): a study of sixth-grade students' views of the interactive Virtual Museum of Al Hassa Cultural Heritage in Saudi Arabia, indicating positive attitudes toward virtual museums in cultural heritage education.
  • Journal article (13 citations): a methodology for 3D scanning and processing of large architectural objects, developed and tested on Timurid architecture monuments in Kazakhstan and Uzbekistan.
  • Journal article (12 citations): a local shape descriptor for 3D surfaces, the histogram of spherical orientations (HoSO), used with a bag-of-words approach for the classification and retrieval of archaeological potsherds.
  • Doctoral thesis, EPFL, 2013 (7 citations): content-based image retrieval using shape descriptors, clustering and sparse coding, applied to Maya hieroglyphs.
  • Book chapter, 2014 (7 citations): the Tepalcatl project, statistical methods for the automatic categorization of potsherds from ancient Mexico, including the Teotihuacan and Aztec civilizations.

References

  • Book, 2007 (175 citations): Dorsey, Rushmeier and Sillion, a comprehensive treatment of the digital modeling of material appearance, explaining how models from physics and engineering are combined with observation for computer graphics rendering.
  • Journal article, 2008 (161 citations): an inexpensive system for acquiring images, geometry and normals of small objects such as fragments of wall paintings, requiring minimal supervision and including a 3D matching algorithm that searches for matching fragments.
  • Journal article, 2010 (148 citations): Gal, Wexler, Ofek, Hoppe and Cohen-Or, an automatic method to recover high-resolution texture over an object by mapping detailed photographs onto its surface while minimizing visible seams.
  • Journal article (146 citations): an approach where a multivariate blending function weights all available pixel data with respect to geometric, topological and colorimetric criteria before selective mapping onto the geometry.
  • Journal article (138 citations): experiments with speech recognition, topic segmentation, topic categorization and named entity detection on a large collection of recorded oral histories.

Frequently Asked Questions
Q1. What have the authors contributed in "From the digitization of cultural artifacts to the web publishing of digital 3d collections: an automatic pipeline for knowledge sharing" ?

In this paper, the authors introduce a novel approach intended to simplify the production of multimedia content from real objects for the purpose of knowledge sharing, which is particularly appropriate to the cultural heritage field. The pipeline design is centered on automation and speed, so that it can be used by non expert users to produce multimedia content from potentially large object's collections, like it may be the case in cultural heritage.

Other appealing directions of work could include the possibility to enrich the Web publishing phase, by automatically formatting a Web page based not only on the 3D model, but on other types of data, like text and images.

As expected, the projector illumination mask tends to increase the influence of image regions that correspond to the most illuminated surface parts, which leads to a conservation of luminosity. 

The diffuse color and detail textures recovery can take up to 5-10 minutes, while the data optimization for Web publishing is almost instantaneous. 

The interaction metaphor known as world-in-hand or trackball has been used to facilitate the artifact inspection by using the mouse. 

Tens of high quality 3D models can be made available every day, for any kind of use (archival, study, presentation to the public). 

The indexed triangle mesh representation has the property of saving a significant amount of space for the vast majority of 3D models, for which, on average, a vertex is referenced by six triangles.
