
From the Digitization of Cultural Artifacts to the Web Publishing of Digital 3D Collections: an Automatic Pipeline for Knowledge Sharing

TL;DR: Introduces a novel approach intended to simplify the production of multimedia content from real objects for the purpose of knowledge sharing, particularly appropriate to the cultural heritage field: a pipeline that covers all steps from the digitization of the objects up to the Web publishing of the resulting digital copies.
Abstract: In this paper, we introduce a novel approach intended to simplify the production of multimedia content from real objects for the purpose of knowledge sharing, which is particularly appropriate to the cultural heritage field. It consists of a pipeline that covers all steps from the digitization of the objects up to the Web publishing of the resulting digital copies. During a first stage, the digitization is performed by a high speed 3D scanner that recovers the object's geometry. A second stage then extracts from the recovered data a color texture as well as a texture of details, in order to enrich the acquired geometry in a more realistic way. Finally, a third stage converts these data so that they are compatible with the recent WebGL paradigm, providing 3D multimedia content directly exploitable by end-users by means of standard Internet browsers. The pipeline design is centered on automation and speed, so that it can be used by non-expert users to produce multimedia content from potentially large collections of objects, as may be the case in cultural heritage. The choice of a high speed scanner is particularly adapted to such a design, since this kind of device has the advantage of being fast and intuitive. The processing stages that follow the digitization are both completely automatic and "seamless", in the sense that it is not incumbent upon the user to perform tasks manually, nor to use external software that generally needs additional operations to solve compatibility issues.

Summary (4 min read)

Introduction

  • In the field of cultural heritage (CH), knowledge sharing is one of the most essential aspects of communication between museum institutions, which conserve and take care of cultural collections, and the public.
  • For this purpose, multimedia technologies are becoming more and more widespread in the CH field, where these surrogates are then represented by digital copies.
  • Moreover, inasmuch as high speed scanning systems are often prone to a lack of accuracy with respect to more classic digitization technologies, the authors also estimate a normal texture from these data, once again in an automatic manner.
  • Hence, the archival and sharing of vast item collections becomes possible and easy also for non-expert users.
  • Section III presents the two first stages of their system, namely the in-hand scanner used for the acquisition, as well as their processing step for generating a digital copy from the acquired data.

A. Real-time 3D scanning

  • An overview of the 3D scanning and stereo reconstruction goes well beyond the scope of this paper.
  • Their main issues are the availability of technology and the problem of aligning data in a very fast way.
  • Among the latter, the most robust approach is based on the use of fast structured-light scanners [1], where a high speed camera and a projector are used to recover the range maps in real-time.
  • This is essentially due to the low resolution of the cameras, and to the difficulty of handling the peculiar illumination provided by the projector.
  • Other systems have been proposed which take into account also the color, but they are not able to achieve real-time performances [6] or to reconstruct the geometry in an accurate way [7].

B. Color acquisition and visualization on 3D models

  • The most flexible approach starts from a set of images acquired either in a second stage with respect to the geometry acquisition, or simultaneously but using different devices.
  • Image-to-geometry registration, which can be solved by automatic [8]–[10] or semiautomatic [11] approaches, is then necessary.
  • Due to the lack of consistency from one image to another, artifacts are visible at the junctions between surface areas receiving color from different images.
  • In particular, Callieri et al. [20] presented a flexible weighting system that can be extended in order to accommodate additional criteria.
  • The first tools aimed at visualizing 3D models in Web pages were based on embedded software components, such as Java applets or ActiveX controls [25].

III. DIGITIZATION AND PROCESSING OF CH ARTIFACTS FOR GENERATING DIGITAL COPIES

  • Cultural heritage has been a privileged field of application for 3D scanning since the beginning of its evolution.
  • This is due to the enormous variety and variability of the types of objects that can be acquired.
  • Moreover, archival and preservation are extremely important issues as well.
  • The acquisition of a large number of objects can be expensive both in terms of hardware and time needed for data processing.
  • They usually need the placement of markers on the object, which is hard to do on CH artifacts.

A. Acquisition by in-hand scanning

  • The first stage of their workflow, namely the one producing all data required for the creation of digital copies from cultural artifacts, is based on an in-hand scanner whose hardware configuration is shown in Figure 2.
  • This scanner, like most of the high speed digitization systems, is based on structured light.
  • The scanning can be performed in two different ways.
  • Occlusions are detected by a hue analysis which produces, for each video frame, a map of skin presence probability (a rough hue-based sketch follows this list).
  • These data are then used in the next stage of the workflow in order to produce the color texture and the texture of details, as explained hereafter, in section III-B.
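
A rough illustration of such a hue test is sketched below in Python. This is not the scanner's actual classifier: the nominal skin hue, the Gaussian fall-off and the RGB-to-hue conversion are assumptions chosen only to show how a per-pixel skin presence probability map can be derived from hue.

```python
import numpy as np

def skin_probability(rgb, center_hue=0.05, width=0.06):
    """Per-pixel skin-presence probability from hue (illustrative only;
    the skin hue center and Gaussian width are assumed values).

    rgb : (h, w, 3) float array with values in [0, 1]
    """
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    mx, mn = rgb.max(axis=-1), rgb.min(axis=-1)
    delta = np.maximum(mx - mn, 1e-12)
    # hue in [0, 1), computed as in the usual RGB -> HSV conversion
    hue = np.where(mx == r, ((g - b) / delta) % 6.0,
          np.where(mx == g, (b - r) / delta + 2.0,
                            (r - g) / delta + 4.0)) / 6.0
    # angular distance to the nominal skin hue, wrapped around the hue circle
    d = np.minimum(np.abs(hue - center_hue), 1.0 - np.abs(hue - center_hue))
    return np.exp(-0.5 * (d / width) ** 2)
```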

B. Recovery of a diffuse color texture

  • The texturing method extends the work proposed in [20] so as to adapt it to the data flow produced by the scanning system presented above.
  • These criteria have been chosen so as to penalize image regions that are known to lack accuracy, in order to deal with data redundancy from one image to another in a way that ensures seamless color transitions.
  • Since the complete geometric configuration of the scene is known, the authors can use a simple shadow mapping algorithm to estimate shadowed areas, to which a null weight is assigned.
  • During the subsequent passes, highlights are identified by computing the luminosity difference between the texture obtained at the previous pass and the input picture.
  • Each elementary mask contains values in the range ]0, 1], zero being excluded in order to ensure that texturing is guaranteed for every surface point that is visible in at least one picture.

C. Recovery of a texture of details

  • It often leads to a loss of accuracy with respect to traditional scanning devices, thus preventing the acquisition of the finest geometric details.
  • The authors use here a similar approach for extracting a normal map from the video flow produced by the in-hand scanner.
  • This uneven sampling distribution may result in an estimated normal which is reliable along this plane but really uncertain along the orthogonal direction.
  • To alleviate this problem, the authors propose to analyze the sampling distribution at each point p by performing a PCA on the set of light directions (a minimal sketch of this correction follows this list).
  • By definition, ν2 is the direction along which the sampling is the poorest.
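
The correction described in the last two bullets can be sketched as follows with numpy. This is not the authors' implementation: the blending rule driven by the eigenvalue ratio is an assumption, included only to show how an estimated normal can be pulled toward the coarse mesh normal along the poorest-sampled direction ν2.

```python
import numpy as np

def regularize_normal(n_est, n_mesh, light_dirs, strength=1.0):
    """Pull a photometric-stereo normal toward the scanned mesh normal along
    the direction of poorest light sampling (illustrative sketch; the exact
    weighting used in the paper is not reproduced here).

    n_est      : (3,) normal estimated at point p
    n_mesh     : (3,) normal of the coarse scanned mesh at point p
    light_dirs : (k, 3) unit light directions observed at point p
    """
    # PCA on the light directions: eigen-decomposition of their covariance
    cov = np.cov(light_dirs, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    nu2 = eigvecs[:, 0]                           # poorest-sampled direction
    ratio = eigvals[0] / max(eigvals[-1], 1e-12)  # small ratio = uneven sampling

    # Along nu2, trust the estimate proportionally to the eigenvalue ratio
    # (hypothetical rule); components orthogonal to nu2 are left unchanged.
    alpha = np.clip(strength * ratio, 0.0, 1.0)
    corrected = n_est - (1.0 - alpha) * np.dot(n_est - n_mesh, nu2) * nu2
    return corrected / np.linalg.norm(corrected)
```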

IV. WEB PUBLISHING

  • After geometry and texture images have been processed, the third and last stage of their pipeline optimizes the generated data for network transmission and realtime rendering on standard Web browsers.
  • The optimized version of the 3D model is stored in the server file system and is accessed by a standard HTTP server to serve requests of visualization clients.
  • In the following, the authors describe the steps they use for preparing and storing the data.

A. Data optimization

  • The optimization phase is composed of two sequential steps: geometry partitioning and rendering optimizations.
  • To this end, the authors use a simple greedy method that iteratively adds triangles to a chunk until the maximum number of vertices is reached (see the sketch after this list).
  • One advantage of the indexed triangle mesh representation is that vertices referenced by more than one triangle need to be stored only once.
  • To convey this advantage from memory occupancy to rendering performance, graphics accelerators have introduced a vertex cache capable of storing data associated with up to 32 vertices, thus allowing the results of a considerable amount of per-vertex calculations to be reused.
  • Even though the problem does not have a polynomial-time solution, several works have been developed [42], [43] that produce a very good approximate solution in a relatively small amount of time.
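
A minimal sketch of such a greedy partitioning is given below, assuming the mesh is an indexed triangle list stored in a numpy array. The 65,536-vertex limit (the 16-bit index budget typical of WebGL targets) and the output layout are assumptions for illustration, not the paper's actual implementation; the subsequent vertex-cache reordering step is not shown.

```python
import numpy as np

def partition_mesh(triangles, max_vertices=65536):
    """Greedily split an indexed triangle mesh into chunks that reference at
    most `max_vertices` distinct vertices each (illustrative sketch).

    triangles : (m, 3) integer array of global vertex indices
    returns   : list of (local_to_global, local_triangles) pairs
    """
    chunks, local_index, chunk_tris = [], {}, []
    for tri in triangles:
        new_verts = [int(v) for v in tri if int(v) not in local_index]
        if chunk_tris and len(local_index) + len(new_verts) > max_vertices:
            chunks.append((local_index, chunk_tris))   # current chunk is full
            local_index, chunk_tris = {}, []
        for v in tri:
            local_index.setdefault(int(v), len(local_index))
        chunk_tris.append([local_index[int(v)] for v in tri])
    if chunk_tris:
        chunks.append((local_index, chunk_tris))
    # per chunk: a local->global vertex map and the re-indexed triangles
    return [(np.array(sorted(idx, key=idx.get)), np.array(tris))
            for idx, tris in chunks]
```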

B. Data storage and retrieval

  • One of their goals is to exploit standard and easily available technologies for making the produced models accessible on the Web platform.
  • To this end, the authors decided to use the well-known Apache HTTP server and use the server file system as the storage database.
  • Model data is saved under standard file formats: to store geometry information, the authors use the Stanford polygon file format (PLY), which supports multiple vertex attributes and binary encoding, while Portable Network Graphics (PNG) images are used for color and normal textures.
  • Even though those formats are already compact, the authors take advantage of the automatic compression (gzip) applied by the Apache server on data transmission, as well as automatic decompression executed by browsers on data arrival.
  • To access the remote 3D model, visualization clients use JavaScript to issue an HTTP request with a base URL of the form http://example-data-domain.org/modelname/, appending predefined file names to discriminate among geometry and texture files, such as geometry.ply, color.png and normal.png (a retrieval sketch follows this list).
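
The paper's visualization clients are written in JavaScript on top of SpiderGL; the short Python sketch below only illustrates the URL scheme and the transparent handling of the server-side gzip compression, using the third-party requests library as an assumption.

```python
import requests  # assumed available; the real clients are JavaScript/SpiderGL

BASE_URL = "http://example-data-domain.org/modelname/"  # placeholder base URL from the paper

def fetch_model(base_url=BASE_URL):
    """Fetch the three per-model files served by the Apache HTTP server.
    The HTTP layer decompresses the gzip-encoded responses automatically."""
    files = {}
    for name in ("geometry.ply", "color.png", "normal.png"):
        resp = requests.get(base_url + name)
        resp.raise_for_status()
        files[name] = resp.content  # raw PLY/PNG bytes, already decompressed
    return files
```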

V. RESULTS

  • The authors present in this section some results of their pipeline, as well as some implementation details.
  • The proposed objects are a sample of a group of artifacts which were used to test the entire system.
  • The proposed results and processing times show that an extension to large collections is straightforward.
  • For the texture of details, both matrices (L^T L) and (L^T C) of equation 6 can be constructed incrementally by processing input pictures one by one on the GPU and accumulating intermediate results using buffer textures (see the sketch after this list).
  • This makes it possible to obtain the 3D models of several objects within an hour of work.
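
A CPU analogue of this incremental accumulation is sketched below with numpy. The GPU buffer-texture mechanics are replaced by plain per-pixel arrays, and the Lambertian least-squares formulation is the standard one, assumed here to correspond to the paper's equation 6.

```python
import numpy as np

def accumulate_normal_equations(frames, h, w):
    """Incrementally build, per pixel, the 3x3 matrix (L^T L) and the
    3-vector (L^T C) of the Lambertian system L n = c, processing one
    input picture at a time (CPU sketch of the GPU accumulation).

    frames : iterable of (light_dir, intensity) pairs, where light_dir is an
             (h, w, 3) array of unit light directions and intensity is an
             (h, w) array of observed gray levels
    """
    LtL = np.zeros((h, w, 3, 3))
    LtC = np.zeros((h, w, 3))
    for light_dir, intensity in frames:          # one pass per input picture
        LtL += light_dir[..., :, None] * light_dir[..., None, :]
        LtC += light_dir * intensity[..., None]
    return LtL, LtC

def solve_normals(LtL, LtC):
    """Solve the accumulated per-pixel systems and normalize the result."""
    n = np.linalg.solve(LtL + 1e-9 * np.eye(3), LtC[..., None])[..., 0]
    return n / np.maximum(np.linalg.norm(n, axis=-1, keepdims=True), 1e-12)
```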

B. Diffuse color texture reconstruction

  • The authors' texturing results are shown in the bottom row of Figure 6.
  • The most obvious difference that can be noticed is clearly the drastic loss of luminosity that occurs in the case of the naive approach.
  • For the Pot model, the big vertical crack in the white rectangle results from the fact that one portion of the surface was depicted by a much greater number of frames than the adjacent one: this produces an imbalance among the number of summed color contributions, and the consequent abrupt change of color.
  • During this texturing phase, the only parameters that must be set by the user are the numbers of applications of each elementary mask.
  • The set of parameters is then really small and can be tuned in an easy and intuitive manner.

C. Detail texture reconstruction

  • Figure 8 illustrates the efficiency of their normal correction procedure by showing the normal field computed for the same object with and without correction.
  • As the eigenvalue ratio decreases, the estimated normal is forced to get closer to the original mesh normal along the direction of highest uncertainty.
  • The frames on the right side of these images highlight once again the improvement resulting from the estimated texture of details.
  • Thus, the user does not have to perform an exhaustive measurement on purpose just to satisfy the fitting constraints.
  • As shown at the bottom of each Web page snapshot in Figure 13, the rendering performance is in the order of thousands of frames per second (FPS) for models that range from 50K to 100K triangles.

VI. CONCLUSION

  • The authors presented a complete pipeline for the creation of Web-browsable digital content from real objects, consisting of 3D models enhanced by two textures, respectively encoding artifact-free color and fine geometric details.
  • Even though the proposed approach is generic enough to be used in any application for which producing and sharing digital content about real artifacts is of interest, its three main advantages (namely its ease of use, its high degree of automation and its speed) make it particularly appropriate to cases where huge collections have to be processed.
  • It is not easy to know in advance if enough frames have been acquired so that accurate color or fine geometric detail can be extracted safely.
  • Hence, a “Photo Tourism-like” [46] image navigation could be possible.


From the Digitization of Cultural Artifacts to the Web Publishing of Digital 3D Collections: an Automatic Pipeline for Knowledge Sharing

Frédéric Larue, Marco Di Benedetto, Matteo Dellepiane and Roberto Scopigno
ISTI-CNR, Pisa, Italy
Journal of Multimedia, Vol. 7, No. 2, May 2012, pp. 132-144. doi:10.4304/jmm.7.2.132-144

I. INTRODUCTION

In the field of cultural heritage (CH), knowledge sharing is one of the most essential aspects of communication between museum institutions, which conserve and take care of cultural collections, and the public. Among other things, these activities include education, research and study as well as entertainment, all of which are precious for the spread of culture. The public is not the only party to benefit from knowledge sharing, however: it also matters for promotion and advertisement, which are of high interest to the institutions themselves in terms of visibility, development and long-term sustainability.

In order to preserve the integrity of cultural goods, knowledge sharing generally makes use of surrogates, so as to avoid exposing the real artifacts to potential risks of deterioration. For this purpose, multimedia technologies are becoming more and more widespread in the CH field, where these surrogates are represented by digital copies. This popularity can be explained by at least two reasons. On the one hand, computing tools clearly ease data storage, indexing, browsing and sharing, thanks to existing network facilities and new Web technologies. On the other hand, recent advances in 3D scanning make it possible to create multimedia content from real artifacts, producing faithful digital imprints and avoiding the tedious and time-consuming task of manual modeling with CAD software.

In particular, new high speed systems like in-hand scanners present big advantages for CH. Firstly, they are able to acquire digital copies in a few minutes, which is really important when the multimedia content must be produced from huge collections in reasonable time, or when several fragments must be scanned in order to plan the restoration of destroyed pieces. Moreover, they can be operated by non-expert users as well, since they provide interactive feedback and rely on the temporal coherency of the high-speed acquisition to get rid of the traditional alignment problems that generally need to be solved manually during a tedious post-processing phase. Despite the availability of these technologies and their increasing popularity, there is still a lack of global and automatic solutions covering the whole processing chain that ranges from content creation to content publishing.

The first weak point of this chain occurs during the acquisition itself: for CH applications, a good representation of the geometry is not sufficient to produce faithful surrogates, since interactive visualization requires synthetic images as close as possible to the real appearance of the depicted object. In that case, the geometry needs to be paired with an accurate representation of the surface appearance (color, small shape details, reflection characteristics). Unfortunately, commercial scanning systems have mostly focused on shape measurement, until recently putting aside the problem of recovering quality textures from real objects. This has led to a lack of efficient and automatic processing tools for color acquisition and reconstruction.

The second problem is that the tasks of creating and publishing multimedia content are generally totally uncorrelated: in practice, a different software tool must be used for each of them. Converting the various inputs and outputs in a compatible way is then necessary, and generally consists of a manual task that is incumbent upon the user.

Figure 1. Overview of the presented framework, which covers the whole chain from the 3D digitization of real CH artifacts up to the Web publishing of the resulting digital 3D copies for archiving, browsing and visualization through the Internet.

In this paper, we present a complete system that makes it possible to create colored 3D digital copies from existing artifacts and to publish them directly on the Internet, through an interactive visualization based on WebGL technology. This system, outlined in Figure 1, consists of three stages.

During the first one, acquisition is performed directly by the user in an intuitive manner thanks to an in-hand digitization device performing 3D scanning in real-time. The data provided by the scanner, as well as some properties specific to this kind of device, are then exploited to automatically produce a diffuse color texture for the 3D model. This texture is free of the traditional visual artifacts that may appear due to the presence in the input pictures of shadows, specular highlights, lighting inconsistencies or calibration inaccuracies. Moreover, inasmuch as high speed scanning systems are often less accurate than more classic digitization technologies, we also estimate a normal texture from these data, once again in an automatic manner. This texture captures the finest geometric details that may be missed during the 3D acquisition, and can then be used to enrich the original geometry during visualization.

Once geometry and texture information have been processed, the third and last stage of the production pipeline performs an optimization phase aimed at producing a compact and Web-friendly version of the data. The output of this stage is used for real-time visualization on commodity platforms. One of our main goals is the archival and deployment of digital copies using standard, well-settled and widely accessible technologies: in this view, we use a standard Web server as our data provider, and the WebGL technology to visualize and integrate the digital copy on standard Web pages.

The contributions proposed in this paper can be summarized as follows:

  • a complete and almost fully automatic pipeline for the production of 3D multimedia content for Internet applications, covering a chain ranging from the digitization of real artifacts to the Web publishing of the produced digital copies;
  • a texturing method specifically designed for real-time scanning systems, which accounts for specific properties of this kind of device in order to enrich the acquired 3D model with a good quality color texture, without cracks nor illumination-related visual artifacts, as well as a normal texture capturing the finest geometric features;
  • the coupling of intuitive acquisition techniques with the recent paradigms proposed by WebGL technology for Web publishing. Hence, the archival and sharing of vast item collections becomes possible and easy also for non-expert users.

The remainder of this paper is organized as follows. Section II reviews the related work on software approaches and complete systems for color acquisition, texture reconstruction and real-time visualization on the Web platform. Section III presents the first two stages of our system, namely the in-hand scanner used for the acquisition and our processing step for generating a digital copy from the acquired data. The third and last stage, dedicated to the preparation of the digital copy for Web publishing, is then presented in Section IV. Finally, Section V shows the results achieved and Section VI draws the conclusions.

II. RELATED WORK

A. Real-time 3D scanning

An overview of 3D scanning and stereo reconstruction goes well beyond the scope of this paper. We will mainly focus on systems for real-time, in-hand acquisition of geometry and/or color. Their main issues are the availability of the technology and the problem of aligning data in a very fast way.

Concerning the first point, 3D acquisition can be based on stereo techniques or on active optical scanning solutions. Among the latter, the most robust approach is based on the use of fast structured-light scanners [1], where a high speed camera and a projector are used to recover the range maps in real-time. The alignment problem is usually solved with smart implementations of the ICP algorithm [2], [3], where the most difficult aspect is the loop closure during registration.

In the last few years, some in-hand scanning solutions have been proposed [2], [4], [5]: they essentially differ in the way projection patterns are handled and in the implementation of ICP. None of the proposed systems takes into account the acquisition of color, although the one proposed by Weise et al. [5] also contains a color camera (see the next section for a detailed description). This is essentially due to the low resolution of the cameras, and to the difficulty of handling the peculiar illumination provided by the projector. Other systems have been proposed which also take color into account, but they are not able to achieve real-time performance [6] or to reconstruct the geometry accurately [7].

B. Color acquisition and visualization on 3D models

Adding color information to an acquired 3D model is a complex task. The most flexible approach starts from a set of images acquired either in a second stage with respect to the geometry acquisition, or simultaneously but using different devices. Image-to-geometry registration, which can be solved by automatic [8]-[10] or semi-automatic [11] approaches, is then necessary. In our case, this registration step is not required, because the in-hand scanning system provides images which are already aligned to the 3D model.

Once alignment is performed, it is necessary to extract information about the surface material appearance and transfer it onto the geometry. The most correct way to represent the material properties of an object is to describe them through a reflection function (e.g. a BRDF), which attempts to model the observed scattering behavior of a class of real surfaces. A detailed presentation of its theory and applications can be found in Dorsey [12]. Unfortunately, state-of-the-art BRDF acquisition approaches rely on complex and controlled illumination setups, making them difficult to apply in more general cases, or when fast or unconstrained acquisition is needed.

A less accurate but more robust solution is the direct use of images to transfer the color to the 3D model. In this case, the apparent color value is mapped onto the digital object's surface by applying an inverse projection. In addition to other important issues, there are numerous difficulties in selecting the correct color when multiple candidates come from different images.

To solve these problems, a first group of methods selects, for each surface part, a portion of a representative image following a specific criterion, in most cases the orthogonality between the surface and the view direction [13], [14]. However, due to the lack of consistency from one image to another, artifacts are visible at the junctions between surface areas receiving color from different images. They can be partially removed by working on these junctions [13]-[15].

Another group of methods "blends" all image contributions by assigning a weight to each one or to each input pixel, and by selecting the final surface color as the weighted average of the input data, as in Pulli et al. [16]. The weight is usually a combination of various quality metrics [17]-[19]. In particular, Callieri et al. [20] presented a flexible weighting system that can be extended in order to accommodate additional criteria. These methods provide better visual results and their implementation permits very complex datasets to be used, i.e. hundreds of images and very dense 3D models. Nevertheless, undesirable ghosting effects may be produced when the starting set of calibrated images is not perfectly aligned. This problem can be solved, for example, by applying a local warping using optical flow [21], [22].

Another issue, which is common to all the cited methods, is the projection of lighting artifacts on the model, i.e. shadows, highlights and peculiar BRDF effects, since the lighting environment is usually not known in advance. In order to correct (or to avoid projecting) lighting artifacts, two possible approaches include the estimation of the lighting environment [23] or the use of easily controllable lighting setups [24].

C. 3D graphics on the Web platform

Since the birth of the Internet, the content of Web documents has been characterized by several types of media, ranging from plain text to images, audio or video streams. When personal computers started being equipped with fast enough graphics acceleration hardware, 3D content began in its turn to play an important role in the multimedia sphere. The first tools aimed at visualizing 3D models in Web pages were based on embedded software components, such as Java applets or ActiveX controls [25]. Several proprietary plug-ins and extensions for Web browsers were developed, giving evidence of the lack of standardization for this new content type. Beside the developer fragmentation that arose from this wide variety of available tools and their incompatibilities, the burden incumbent upon the user for the installation of additional software components prevented a wide adoption of online 3D content.

Steps toward standardization were taken with the introduction of the Virtual Reality Modeling Language (VRML) [26] in 1995 and X3D [27] in 2007. However, even though they have been well accepted by the community, 3D scene visualization was still delegated to external software components.

The fundamental change happened in 2009 with the introduction of the WebGL standard [28], promoted by the Khronos Group [29]. With minor restrictions related to security issues, the WebGL API is a one-to-one mapping of the OpenGL|ES 2.0 specifications [30] in JavaScript. This implies that modern Web browsers, like Google Chrome or Mozilla Firefox, are able to natively access the graphics hardware without needing additional plug-ins or extensions. WebGL being a low-level API, a series of higher-level libraries have been developed on top of it. They differ from each other in the programming paradigm they use, ranging from scene-graph-based interfaces, like Scene.js [31] and GLGE [32], to procedural paradigms, like SpiderGL [33] and WebGLU [34].

In our pipeline, as will be shown, we use SpiderGL as the rendering library for the real-time visualization of the acquired digital copies.

Figure 2. The in-hand scanning device used during the first step of the presented workflow, producing the data flow required for the generation of digital copies of cultural artifacts.

III. DIGITIZATION AND PROCESSING OF CH ARTIFACTS FOR GENERATING DIGITAL COPIES

Cultural heritage has been a privileged field of application for 3D scanning since the beginning of its evolution. This is due to the enormous variety and variability of the types of objects that can be acquired. Moreover, archival and preservation are extremely important issues as well. Although 3D scanning can now be considered a "mature" technology, the acquisition of a large number of objects can be expensive both in terms of hardware and of the time needed for data processing. Very good results can be achieved by customizing solutions for collections where objects are almost of the same size and material, but this can be expensive [35] or hard to extend to generic cases [36]. Although some low cost and/or hand-held devices are available, they usually need the placement of markers on the object, which is hard to do on CH artifacts. Conversely, the presented method uses only an affordable scanning system and does not make any particular assumption on the measured objects (except the fact that they can be manipulated by hand), neither for the scanning session itself nor for the post-processing steps.

This section describes the first two stages of our workflow: how and with which technology real artifacts can be easily digitized by the user (section III-A), and how the resulting data are exploited to automatically recover a color texture (section III-B) and a texture of details (section III-C) that enrich the 3D model provided by the acquisition.

A. Acquisition by in-hand scanning

The first stage of our workflow, namely the one producing all data required for the creation of digital copies from cultural artifacts, is based on an in-hand scanner whose hardware configuration is shown in Figure 2. This scanner, like most high speed digitization systems, is based on structured light. Shape measurement is performed by phase-shifting, using three different sinusoidal patterns to establish correspondences (and then to perform optical triangulation) between the projector and the two black and white video cameras; a sketch of the standard three-step phase decode is given at the end of this subsection. The phase unwrapping, namely how the different signal periods are demodulated, is achieved by a GPU stereo matching between both cameras (see [3], [5] for more details). The whole process produces one range map in 14 ms. Simultaneously, a color video flow is captured by the third camera. During an acquisition, the only light source in the scene is the scanner projector itself, whose position is always perfectly known.

The scanning can be performed in two different ways. If the object color is neither red nor brown, it can be done by holding the object directly by hand. In this case, occlusions are detected by a hue analysis which produces, for each video frame, a map of skin presence probability. Otherwise, a black glove must be used. Although much less convenient for the scanning itself, it makes the occlusion detection trivial, by simply ignoring dark regions in the input pictures.

Each scanning session then produces a 3D mesh and a color video flow. For each frame of this video, the viewpoint and the position of the light (i.e. the scanner projector) are given, as well as the skin probability map in the case of a digitization performed by hand. These data are then used in the next stage of the workflow in order to produce the color texture and the texture of details, as explained hereafter in section III-B.

Even if this stage requires the intervention of the user, the choice of a real-time scanner to perform the digitization is particularly appropriate for non-expert operators. Indeed, for reasons already discussed in the introduction, its usage is particularly easy and intuitive, and does not require technical knowledge or manual post-processing. Finally, it must be noticed that our method does not work only with the presented scanner, but can be implemented for any high speed digitization device that is based on the same principle.
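
As background for the phase-shifting step mentioned above, the standard three-step decode for sinusoidal patterns shifted by 120 degrees is sketched below; it is illustrative only and does not reproduce the scanner's actual pattern parameters or its GPU stereo unwrapping.

```python
import numpy as np

def three_step_phase(i1, i2, i3):
    """Wrapped phase from three images of sinusoidal patterns shifted by
    120 degrees (standard three-step phase-shifting; illustrative only).

    i1, i2, i3 : (h, w) float arrays of observed intensities
    returns    : (h, w) wrapped phase in (-pi, pi] and a modulation map that
                 can be thresholded to reject unreliable pixels
    """
    phase = np.arctan2(np.sqrt(3.0) * (i1 - i3), 2.0 * i2 - i1 - i3)
    modulation = np.sqrt(3.0 * (i1 - i3) ** 2 + (2.0 * i2 - i1 - i3) ** 2) / 3.0
    return phase, modulation
```
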
B. Recovery of a diffuse color texture

Our texturing method extends the work proposed in [20] so as to adapt it to the data flow produced by the scanning system presented above. The idea, summarized in Figure 3, is to weight each input picture by a mask (typically a gray scale image) which represents a per-pixel confidence value. The final color of a given surface point is then computed as the weighted average of all color contributions coming from the pictures in which this point is visible. Masks are built by the composition of multiple elementary masks, which are themselves computed by image processing applied either to the input image or to a rendering of the mesh performed from the same viewpoint.

Figure 3. The diffuse color texture is computed as the weighted average of all video frames. Weights are obtained by the composition of multiple elementary masks, each one corresponding to a particular criterion related to viewing, lighting or scanning conditions.

In the original paper, three criteria related to viewing conditions have been considered for the mask computation: the distance to the camera, the orientation with respect to the viewpoint, and the proximity to a step discontinuity. These criteria have been chosen so as to penalize image regions that are known to lack accuracy, in order to deal with data redundancy from one image to another in a way that ensures seamless color transitions. More details about these masks can be found in [20].

Although sufficient to avoid texture cracks, these masks cannot handle self-projected shadows or specular highlights, since knowledge about the lighting is necessary. In our case, the positions of both the viewpoint and the light (the projector lamp) are always exactly known. Moreover, the light moves with the scanner, which means that highlights and shadows are different for each frame, as is the illumination direction. We therefore define the three following additional masks, which aim at giving prevalence to image parts that are free of illumination effects:

  • Shadows. Since the complete geometric configuration of the scene is known, we can use a simple shadow mapping algorithm to estimate shadowed areas, to which a null weight is assigned.
  • Specular highlights. Conversely to shadows, highlights partially depend on the object material, which is unknown. For this reason, we use a multi-pass algorithm to detect them. The first pass computes the object texture without accounting for highlights. Due to the high data redundancy, the averaging tends to reduce their visual impact. During the subsequent passes, highlights are identified by computing the luminosity difference between the texture obtained at the previous pass and the input picture. This difference corresponds to our highlight removal mask. In practice, only two passes are sufficient.
  • Projector illumination. This mask aims at avoiding luminosity loss during the averaging by giving more influence to surface parts facing the light source. It corresponds to the dot product between the surface normal and the line of sight.

We also introduce two other masks to cope with the occlusions that are inherent to in-hand scanning. Indeed, if they are ignored, picture regions corresponding to the operator's hand may be introduced into the computation, leading to visible artifacts in the final texture. Thus, when digitization is performed with the dark glove, an occlusion mask is simply computed by thresholding pixel intensities. In the case of a digitization made by hand, the mask corresponds to the aforementioned skin probability map produced by the scanner.

Each elementary mask contains values in the range ]0, 1], zero being excluded in order to ensure that texturing is guaranteed for every surface point that is visible in at least one picture. They are all multiplied together to produce the final mask that selectively weights the pixels of the corresponding picture. During this operation, each elementary mask can obviously be applied more than once. The influence of each criterion can then be tuned independently, although we empirically determined default values that work quite well for most cases. A minimal sketch of this mask composition and weighted accumulation follows.
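
The numpy sketch below shows the general shape of this weighted accumulation. It is a simplified, hypothetical rendition: the elementary masks are passed in as precomputed per-sample arrays, the exponents stand in for the number of applications of each mask, and the projection of surface points into each frame is abstracted away behind precomputed texel lookups.

```python
import numpy as np

def blend_color_texture(samples, mask_exponents):
    """Weighted-average texturing in the spirit of the masking scheme above.

    samples : list of per-frame dicts with
              'color'  : (n, 3) colors sampled at the n texels seen in the frame
              'texels' : (n,) indices of those texels in the output texture
              'masks'  : dict name -> (n,) elementary mask values in (0, 1]
    mask_exponents : dict name -> int, number of applications of each mask
    returns : (t, 3) blended texture colors and the (t,) accumulated weights
    """
    n_texels = 1 + max(int(s["texels"].max()) for s in samples)
    accum = np.zeros((n_texels, 3))
    weight = np.zeros(n_texels)
    for s in samples:
        w = np.ones(len(s["texels"]))
        for name, exponent in mask_exponents.items():
            w *= s["masks"][name] ** exponent      # compose elementary masks
        np.add.at(accum, s["texels"], s["color"] * w[:, None])
        np.add.at(weight, s["texels"], w)
    return accum / np.maximum(weight[:, None], 1e-12), weight
```
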
C. Recovery of a texture of details

Despite the fact that in-hand scanning is a really convenient technology, it often leads to a loss of accuracy with respect to traditional scanning devices, thus preventing the acquisition of the finest geometric details. Nevertheless, thanks to the fact that we know the light position for each video frame, it is possible to partially recover them by using a photometric stereo approach.

Photometric stereo consists in computing high quality normal/range maps by taking several photographs from the same viewpoint but with different illumination directions [37]-[39], or by moving the object in front of a camera and a light source that are fixed with respect to each other [40], [41]. We use here a similar approach to extract a normal map from the video flow produced by the in-hand scanner.

In the following, vectors are assumed to be column vectors. Let {F_i} be the set of frames corresponding to the acquisition sequence. A light position ℓ_i, corresponding to the scanner projector's location, is associated to each frame F_i. Assuming that the object surface is Lambertian, the color c_i observed at a given surface point p in F_i is proportional to the cosine of the angle between the surface normal at p and the direction from p toward ℓ_i; collecting these observations over all frames yields, for each surface point, an overdetermined linear system that is solved for the normal in the least-squares sense.
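
In the standard Lambertian photometric stereo formulation, which is consistent with the matrices (L^T L) and (L^T C) mentioned in Section V although the paper's exact equation 6 is not reproduced here, each frame contributes one row to a per-point linear system:

$$ c_i = \rho\,\hat{\ell}_i^{\top} n, \qquad L = \begin{pmatrix} \hat{\ell}_1^{\top} \\ \vdots \\ \hat{\ell}_k^{\top} \end{pmatrix}, \quad c = \begin{pmatrix} c_1 \\ \vdots \\ c_k \end{pmatrix}, \qquad (L^{\top}L)\,\tilde{n} = L^{\top} c, \qquad n = \tilde{n} / \lVert \tilde{n} \rVert $$

where $\hat{\ell}_i$ is the unit direction from $p$ toward the light position $\ell_i$, $\rho$ is the unknown albedo folded into the scaled normal $\tilde{n} = \rho\, n$, and the normal equations on the right involve exactly the quantities that can be accumulated frame by frame on the GPU.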
