Journal ArticleDOI

Design, Implementation, and Evaluation of a Point Cloud Codec for Tele-Immersive Video

TL;DR: A subjective study in a state-of-the-art mixed reality system shows that introduced prediction distortions are negligible compared with the original reconstructed point clouds and shows the benefit of reconstructed point cloud video as a representation in the 3D virtual world.
Abstract: We present a generic and real-time time-varying point cloud codec for 3D immersive video. This codec is suitable for mixed reality applications in which 3D point clouds are acquired at a fast rate. In this codec, intra frames are coded progressively in an octree subdivision. To further exploit inter-frame dependencies, we present an inter-prediction algorithm that partitions the octree voxel space in $N \times N \times N$ macroblocks ( $N=8,16,32$ ). The algorithm codes points in these blocks in the predictive frame as a rigid transform applied to the points in the intra-coded frame. The rigid transform is computed using the iterative closest point algorithm and compactly represented in a quaternion quantization scheme. To encode the color attributes, we defined a mapping of color per vertex attributes in the traversed octree to an image grid and use legacy image coding method based on JPEG. As a result, a generic compression framework suitable for real-time 3D tele-immersion is developed. This framework has been optimized to run in real time on commodity hardware for both the encoder and decoder. Objective evaluation shows that a higher rate-distortion performance is achieved compared with available point cloud codecs. A subjective study in a state-of-the-art mixed reality system shows that introduced prediction distortions are negligible compared with the original reconstructed point clouds. In addition, it shows the benefit of reconstructed point cloud video as a representation in the 3D virtual world. The codec is available as open source for integration in immersive and augmented communication applications and serves as a base reference software platform in JTC1/SC29/WG11 (MPEG) for the further development of standardized point-cloud compression solutions.

Summary (2 min read)

Introduction

  • Avalanche photodiodes (APDs) and single-photon avalanche diodes, or Geiger-mode APDs, are widely used in optical telecommunications, imaging, and medical diagnostics, where high sensitivity to light in the visible or near-infrared (NIR) ranges is needed [1].
  • Ge is attractive because it has been possible to develop processes allowing integration in CMOS, and low-quality Ge photodiodes have been demonstrated in telecom circuits [5], [6].

II. DEVICE DESIGN AND FABRICATION

  • Fig. 1 shows a schematic of the fabrication process of the PureGaB Ge-on-Si arrays.
  • First, a 30-nm thermal SiO2 is grown on the Si surface followed by a low-pressure CVD SiO2 layer deposition with a thickness of ∼1 μm.

GEOMETRICAL PARAMETERS OF THE THREE DIFFERENT PHOTODIODE PIXELS

  • Windows were etched on the Si surface by a mask that defined the Ge-on-Si photodiode areas.
  • Since the diodes in the multidiode devices were separated by a 1-μm-wide oxide region, the actual Ge area decreases with the number of diodes while the Ge perimeter increases.
  • Ge deposited on surrounding oxide regions from reaching the windows designed for the deposition of the As-doped Ge islands.
  • Next, 800-nm Al/Si1% was sputtered and then removed over the photosensitive junctions by means of selective plasma etching of the Al layer to the oxide covering the PureGaB.
  • This phenomenon was studied in detail in [7].

III. ELECTRICAL CHARACTERIZATION

  • A continuous-flow cryostat system was used for achieving low temperatures.
  • Vacuum conditions were maintained so that the chip surface remained free from ice and frosting even for temperatures as low as 77 K, achieved with liquid nitrogen cooling.
  • Typical I–V characteristics are shown in Fig. 4 for both room temperature and 77 K measurements of all three device types.
  • The perimeter of the hexa device was 3 times longer than that of the single-diode pixel, i.e., it became less likely that a perfect device would be found.
  • The depletion width in reverse will quickly become larger than the 0.55 μm thickness of the Ge islands and traverse the nonperfect interface with the Si.

IV. OPTICAL CHARACTERIZATION

  • The Ge photodiodes were mounted in the vacuum chamber and cooled to 77 K.
  • The incident optical power on the devices was attenuated, resulting in 9.8, 0.75, and 0.78 μW for the respective laser sources.
  • The diameter of the incident beam spot was 0.3 cm.
  • The photocurrent was measured using a computer-controlled HP Semiconductor Parameter Analyzer model 4145B.
  • Neutral density filters were used to prevent light saturation and pileup.

SUMMARY OF OVERALL BEST PERFORMANCE COMPARED WITH

  • The responsivity R as a function of reverse voltage $V_x$ was calculated as $R(V_x) = (I_{ph,V_x} - I_{dark,V_x})/P$ (2), where the photocurrent $I_{ph,V_x}$ and the dark current $I_{dark,V_x}$ are measured at $V_x$ and P is the incident power at the surface of the pixel.
  • Ge crystal but the leakage current is not yet influenced by impact ionization.
  • The responsivity at 660 nm is generally not reported for Ge photodiodes.
  • This trend is corroborated by Fig. 10, which plots the maximum optical gain as a function of Vbd.

SUMMARY OF APD PERFORMANCE PARAMETERS FOR THE THREE PIXEL TYPES

  • The inner ellipse indicates the σ spread and the outer ellipse the 3σ spread.
  • The mean values of maximum optical gain and breakdown, as well as the breakdown standard deviation, are listed in Table III.
  • Just as the lower spread around high current levels of the reverse I–V characteristics of the hexa devices can be explained by the higher probability of perimeter imperfections, the same effect explains the decrease, with perimeter, in the spread of the maximum-gain-to-breakdown-voltage relationship.

V. CONCLUSION

  • The PureGaB Ge-on-Si photodiodes integrated in 300 × 1 pixel arrays were characterized at cryogenic temperatures for operation as proportional APDs.
  • The main differentiating factor is a very different area and perimeter, while the electrical/optical performance is comparable, with very high optical gain of up to 10⁶ measured at 77 K, where the low-voltage dark current is lower than the measurement limit of 2.5 × 10⁻² μA/cm².
  • Ge area but the largest perimeter with an average increase in the Ge thickness of ∼30% due mainly to V-groove formation.
  • All in all, the PureGaB Ge-on-Si offers a very low-complexity CMOS-compatible means of fabricating uniform arrays of the NIR sensitive photodetectors that are operational in linear, avalanche, and even Geiger modes.




Abstract—We present a generic and real-time time-varying
point cloud codec for 3D immersive video. This codec is suitable
for mixed reality applications where 3D point clouds are
acquired at a fast rate. In this codec, intra frames are coded
progressively in an octree subdivision. To further exploit inter-
frame dependencies, we present an inter-prediction algorithm
that partitions the octree voxel space in $N \times N \times N$
macroblocks ($N=8,16,32$). The algorithm codes points in these
blocks in the predictive frame as a rigid transform applied to the
points in the intra coded frame. The rigid transform is computed
using the iterative closest point algorithm and compactly
represented in a quaternion quantization scheme. To encode the
color attributes, we defined a mapping of color per vertex
attributes in the traversed octree to an image grid and use legacy
image coding method based on JPEG. As a result, a generic
compression framework suitable for real-time 3D tele-immersion
is developed. This framework has been optimized to run in real-
time on commodity hardware for both encoder and decoder.
Objective evaluation shows that a higher rate-distortion (R-D)
performance is achieved compared to available point cloud
codecs. A subjective study in a state-of-the-art mixed reality system
shows that introduced prediction distortions are negligible
compared to the original reconstructed point clouds. In addition,
it shows the benefit of reconstructed point cloud video as a
representation in the 3D Virtual world. The codec is available as
open source for integration in immersive and augmented
communication applications and serves as a base reference
software platform in JTC1/SC29/WG11 (MPEG) for the further
development of standardized point cloud compression solutions.
Index Terms—Data Compression, Video Codecs, Teleconferencing, Virtual Reality, Point Clouds
I. INTRODUCTION
With increasing capability of 3D data acquisition
devices and computational power, it is becoming easier
to reconstruct highly detailed photo realistic point clouds (i.e.
point sampled data) representing naturalistic content such as
persons or moving objects/scenes [1] [2]. 3D point clouds are
a useful representation for 3D video streams in mixed reality
systems. They not only allow free-viewpoint rendering
(for example, based on splat rendering), but can also be
compositely rendered in a synthetic 3D scene, as they provide
full 3D geometry coordinate information. Therefore, this type
of video representation is preferable in mixed reality systems,
such as augmented reality, where a natural scene is combined
with synthetic (authored, e.g., computer graphics) objects, or,
vice versa, immersive virtual rooms, where a synthetic scene
is augmented with a live captured natural 3D video stream
representing a user.
Traditionally, 3D polygon meshes have often been used to
represent 3D object based visual data. However, point clouds
are simpler to acquire than 3D polygon meshes as no
triangulation needs to be computed, and they are more
compact as they do not require the topology/connectivity
information to be stored. Therefore, 3D point clouds are more
suitable for real-time acquisition and communication at a fast
rate. However, realistic reconstructed 3D Point clouds may
contain hundreds of thousands up to millions of points and
compression is critical to achieve efficient and real-time
communication in bandwidth limited networks.
Compression of 3D Point clouds has received significant
attention in recent years [3] [4] [5] [6]. Much work has been
done to efficiently compress single point clouds progressively,
such that lower quality clouds can be obtained from partial bit
streams (i.e. a subset of the original stream). To compare
different solutions, often the compression rate and the
geometric distortion have been evaluated. Sometimes the
algorithmic complexity was analyzed, and in addition schemes
for attribute coding (i.e. colors, normals) have been proposed.
While these approaches are a good starting point, in the
context of immersive, augmented and mixed reality
communication systems, several other additional factors are of
importance.
One important aspect in these systems is real-time
performance for encoding and decoding. Often, in modern
systems, parallel computing that exploits multi-core
architectures available in current computing infrastructures is
utilized. Therefore, the parallelizability becomes important.
Second, as in these systems point cloud sequences are
captured at a fast rate, inter-frame redundancy can be
exploited to achieve a better compression performance via
inter-prediction, which is usually not considered in existing
static point cloud coders.
Furthermore, as tele-immersive codecs are intended for
systems with real users, subjective quality assessment is
needed to assess the performance of the proposed codec in
addition to the more common objective quality assessment.
Further, a codec should be generic, in the sense that it can
compress point cloud sequences coming from different setups
with different geometric properties (i.e. sampling density,
manifoldness of the surface etc.).
With these requirements in mind, we introduce a codec for
time-varying 3D point clouds for augmented and immersive
3D video. The codec is parallelizable, generic, and operates in
real time on commodity hardware. In addition, it exploits
inter-frame redundancies. For evaluation, we propose an
objective quality metric that corresponds to common practice
in video and mesh coding. In addition to objective evaluation
using this metric, subjective evaluation in a realistic mixed
reality system is performed in a user study with 20 users.
The codec is available as open source and currently serves as
the reference software framework for the development of a
point cloud compression technology standard in JTC1/SC29/WG11 (MPEG)¹. To facilitate benchmarking, the
objective quality metrics and file loaders used in this work
have all been included in the package. In addition, the point
cloud test data is available publicly.
The rest of the paper is structured as follows. The remainder of
this section covers the context and related work, and Section II
provides an overview of the codec. In Section III we detail the
lossy attribute/color coding and progressive decoding scheme. In
Section IV the inter-predictive coding algorithm is detailed.
Section V presents the experimental results, including subjective
test results in a mixed reality system, objective rate-distortion
evaluation, and real-time performance assessment.
A. Contributions
In this work we propose a novel compression framework for
progressive coding of time varying point clouds for 3D
immersive and augmented video. The major contributions are:
Generic Compression Framework: the framework can be used to compress point clouds with arbitrary topology (i.e., from different capturing setups or file formats).
Inter-Predictive Point Cloud Coding: correlation between subsequent point clouds in time is exploited to achieve better compression performance.
Efficient Lossy Color Attribute Coding: the framework includes a method for lossy coding of color attributes that takes advantage of the naturalistic source of the data using existing image coding standards.
Progressive Decoding: the codec allows a lower quality point cloud to be reconstructed from a partial bit stream.
Real-Time Implementation: the codec runs in (near) real time on commodity hardware, benefiting from multi-core architectures and a parallel implementation.
B. Augmented and Immersive 3D Video vs FVV and 3DTV
There exist quite a few technologies for coding 3D Video. In
this section we further motivate the additional need for point
cloud compression for immersive and augmented 3D Video.
3D Video often refers to 3D television (3DTV) or free
viewpoint video (FVV). 3DTV creates a depth perception,
while FVV enables arbitrary viewpoint rendering. Existing
video coding standards such as AVC MVV [7] and MVV-D
[8] can support these functionalities via techniques from
(depth) image based rendering (DIBR). Arbitrary views can
be interpolated from decompressed view data using spherical
interpolation and original camera parameter information. This
enables free-viewpoint rendering without having explicit
geometry information available. However, in mixed reality
systems, explicit object geometry information is needed to
facilitate composite rendering and object navigation.
¹ http://wg11.sc29.org/svn/repos/MPEG-04/Part16-Animation_Framework_eXtension_(AFX)/trunk/3Dgraphics/
In such systems, rendering is usually done based on object
geometry such as meshes or 3D point clouds using generic
computer graphics API’s (e.g. OpenGL, Direct3D etc.). This is
in line with common practice in computer games and virtual
worlds. Therefore, to enable true convergence between
naturalistic and synthetic content in immersive and augmented
reality, object based video compression of point clouds and
3D Meshes is a remaining challenge. To illustrate this need
further, we show some examples from a practical tele-
immersive system that we have been working on in the last
four years, the Reverie system [9]. The Reverie system is an
immersive communication system that enables online
interaction in 3D virtual rooms, either represented by a 3D
avatar or 3D object based video stream based on 3D Point
Clouds or Meshes. As can be seen in Fig. 1, each representation
can be rendered compositely in a common 3D space. This 3D
video stream is a segmentation of the real 3D user.
Fig. 2. Low-cost point cloud capturing setup; point clouds are reconstructed from multiple 3D input streams of calibrated Microsoft Kinect devices.
Fig. 3. Example schematic of 3D point cloud reconstruction for immersive video (multi-camera capture → align & fuse using intrinsic camera parameters → reconstruct a segmented 3D point cloud); many variations can be implemented.
Fig. 2
illustrates the low cost 3D point cloud acquisition system
deployed in this system. It uses multiple calibrated Kinect
sensors. Original color plus depth streams (RGB+D) are fused
and the foreground moving object is segmented as a point
cloud. The basic steps of such an algorithm are shown in Fig.
3. The output is a point cloud ready for remote rendering.
This raises the need for efficient 3D point cloud video
compression and transmission.
Alternatively, by running the algorithm in Fig. 3 at the remote
site on decompressed RGB+D data, one could use RGB + D
video coders directly on sensor data. We have done some
experimentation comparing RGB+D coding using MPEG-4
AVC simulcast (QP8-QP48, zero latency, x264 encoder) and
point cloud compression (8-9-10 bits per direction component)
using the point clouds in [2] reconstructed from 5 RGB + D
streams (data available in [10], captured with Microsoft
Kinect). Byte sizes of 132–800 KB per frame resulted from
RGB-D coding, while sizes of 40–265 KB per frame were
obtained using the octree-based point cloud
compression. In addition, the distortion introduced to the point
cloud, which could be unpredictable for low bit rate RGB-D
coding, is bound by the octree resolution in an octree based
point cloud codec. Last, the experiments showed lower
encoding/decoding latencies when using point cloud
compression. As low latency, low bit rate, and bounded distortion
are critical in tele-presence and immersive reality, we
develop compression for time-varying point clouds.
C. Related Work
Point Cloud Compression: There has been some work on
point cloud compression in the past, but most works only aim
at compression of static point clouds, instead of time-varying
point clouds. Such a codec was introduced in [11] based on
octree composition. This codec includes bit-reordering to
reduce the entropy of the occupancy codes that represent
octree subdivisions. This method also includes color coding
based on frequency of occurrence (colorization) and normal
coding based on efficient spherical quantization. A similar
work in [12] used surface approximations to predict
occupancy codes, and an octree structure to encode color
information. Instead, the work in [3] introduced a real-time
octree based codec that can also exploit temporal redundancies
by XOR operations on the octree byte stream. This method can
operate in real time, as the XOR prediction is extremely simple
and fast. A disadvantage of this approach is that by using
XOR, only geometry and not colors can be predicted, and that
the effectiveness is only significant for scenes with limited
movement (which is not the case in our envisioned application
with moving humans). Last, [5] introduced a time-varying
point cloud codec that can predict graph encoded octree
structures between adjacent frames. The method uses spectral
wavelet based features to achieve this, and an encoding of
differences, resulting in a lossless encoding. This method also
includes the color coding method from [6], which defines
small sub-graphs based on the octree of the point cloud. These
subgraphs are then used to efficiently code the colors by
composing them on the eigenvectors of the graph Laplacian.
Mesh Compression: 3D objects are often coded as 3D
Meshes, for which a significant number of compression
methods have been developed. Mesh codecs can be
categorized as progressive, i.e., allowing a lower resolution
rendering from partial bit streams, or single rate (only
decoding at full resolution is available) [13]. For networked
transmission, progressive methods have generally been
preferred, but for 3D immersive Video, single rate can also be
useful, as they introduce less encoder computation and bit-rate
overhead [14]. In particular, some recent work aimed at
compression of object-based immersive 3D video [15], [16],
[17] uses single-rate coding. While these methods are
promising, it seems that methods based on 3D Point clouds
can result in coding with even less overhead and more flexible
progressive rendering capabilities, as the format is simpler to
acquire and process. Last, there have been methods defined in
the international standards for mesh compression [18], [19],
which are greatly beneficial for interoperability between
devices and services. These methods have been mostly
designed for remote rendering, and have low decoder
complexity and a slightly higher encoder complexity. In 3D
immersive and augmented 3D Video coding, having both low
encoder and decoder complexity is important (analogous to
video coding in video conferencing systems compared to
video on demand).
Multi-View Plus Depth Compression: Multi-View plus
depth representation was considered for storing video and
depth maps from multiple cameras [8] [7]. Arbitrary
viewpoints are then rendered by interpolation between
different camera views using techniques from depth image
based rendering (DIBR) to enable free viewpoint. While these
formats can be used to represent the visual 3D scene, they do
not explicitly store 3D object geometries, which are useful for
composite rendering in immersive communications and
augmented reality. Therefore, these formats are not directly
applicable to immersive and augmented 3D object based
video.
Compression of Immersive 3D Video: The specific
design requirements posed by 3D Video for immersive
communications have been addressed before in [20] and [21].
The first work introduces a compression scheme of polygon
based 3D video (in this case a segmentation and meshing in a
color + depth video), and a purely perception based
compression mechanism combined with entropy encoding. In
this work the level of detail (polygon size) is adapted to match
user/application needs. These needs were derived offline in
subjective studies with pre-recorded stimuli. In [21] the
MPEG-4 video codec was used in a smart way together with
camera parameters. Both methods have been developed in a
context that combines the capturing, compression and
networking components towards the specific tele-immersive
configuration. This lack of generality makes it harder to assess
the relevance and performance of the video compression
compared to other methods. In our work we aim to provide a
generic 3D point cloud codec that can be compared with other
codecs for point cloud compression and applied to any
capturing setup that produces 3D point cloud data.

9955
`
4
II. OVERVIEW OF POINT CLOUD CODING SCHEME
We first outline the requirements for point cloud compression,
after which we detail the point cloud coding schematic (Fig. 4).
A. Requirements and Use Cases
3D Video based on point clouds is relevant for augmented and
mixed reality applications, as shown in Fig. 1. In addition, it has
various applications (e.g., to store data used in geographic
information systems and 3D printing applications). We focus
on time-varying point cloud compression for 3D immersive
and augmented video and follow the requirements for point
cloud compression as defined in the MPEG-4 media standard
[22]:
Partial Bit Stream Decoding: it is possible to decode a coarse
point cloud and refine it.
Lossless Compression: the reconstructed data is
mathematically identical to the original.
Lossy Compression: compression with parameter control of
the bit-rate.
Time Variations and Animations: temporal variations, i.e.,
coding of point cloud sequences, should be supported.
Low Encoder and Decoder Complexity: this is not a strict
requirement but desirable for building real-time systems.
B. Schematic Overview
The architecture design of the proposed 3D video codec based on
point clouds combines features from common 3D point cloud
(octree-based) codecs ([11], [12]) and common hybrid video
codecs such as MPEG-4 Part 10 AVC and HEVC (that include
block-based motion compensation). Based on the numbering
in the diagram in Fig. 4, we detail the most important
components in the codec, which also correspond to our main
contributions.
Bounding Box Alignment and filter (1): A specific
algorithm for bounding box computation and alignment has
been developed. This algorithm is applied to the point cloud
before the octree composition (2) is performed. This algorithm
aligns subsequent frames by computing a common expanded
bounding box. This allows a common axis and range
representation between frames, which facilitates inter-
predictive coding and a consistent assignment of the octree
bits. In addition, it includes an outlier filter, as our
experiments have shown that outlier points can reduce both
the effectiveness of the bounding box alignment and of inter
predictive coding, and should be filtered out from the input
clouds. See III.A for all details.
Constructing the Progressive Octree (2): The encoder
recursively subdivides the point cloud aligned bounding box
into eight children. Only non-empty child voxels continue
to be subdivided. This results in an octree data structure,
where the position of each voxel is represented by its cell
center. Its attribute (color) is set to the average of the enclosed
points and needs to be coded separately by the attribute
encoder. Each level in the octree structure can represent a
level of detail (LoD). The final level of detail is specified by
the octree bit settings. This is a common scheme for both
regularizing the unorganized point cloud and for compressing
it. (See III.B for all details).
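For illustration, a minimal Python sketch of this subdivision step is given below; it recursively splits the aligned bounding box and emits one occupancy byte per non-empty cell per level. The function name build_occupancy_codes and every detail beyond the eight-way split are illustrative assumptions, not the reference implementation.

```python
import numpy as np

def build_occupancy_codes(points, lo, hi, depth):
    """Recursively subdivide [lo, hi] and emit one occupancy byte per
    non-empty cell, level by level, as in an octree coder.
    Sketch only; the codec adds entropy coding, attribute handling
    and LoD signalling on top of this."""
    codes = []                                   # occupancy bytes per level
    level = [(points, np.asarray(lo, float), np.asarray(hi, float))]
    for _ in range(depth):
        next_level, level_codes = [], []
        for pts, lo_, hi_ in level:
            mid = 0.5 * (lo_ + hi_)
            byte, children = 0, []
            for child in range(8):
                # bit k of the child index selects the upper half along axis k
                mask = np.ones(len(pts), dtype=bool)
                child_lo, child_hi = lo_.copy(), hi_.copy()
                for axis in range(3):
                    if (child >> axis) & 1:
                        mask &= pts[:, axis] >= mid[axis]
                        child_lo[axis] = mid[axis]
                    else:
                        mask &= pts[:, axis] < mid[axis]
                        child_hi[axis] = mid[axis]
                if mask.any():
                    byte |= (1 << child)
                    children.append((pts[mask], child_lo, child_hi))
            level_codes.append(byte)
            next_level.extend(children)
        codes.append(level_codes)
        level = next_level
    return codes  # to be entropy coded per LoD

# usage: an 8-level octree over a unit cube
pts = np.random.rand(1000, 3)
occupancy = build_occupancy_codes(pts, lo=[0, 0, 0], hi=[1, 1, 1], depth=8)
```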
Coding of the Occupancy Codes (3) (LoD): To code the
octree structure efficiently, the first step is the encoding of
levels of detail (LoDs) as subdivisions of non-empty cells. Contrary to
previous work [11], [12], we develop a position coder that uses
an entropy coder in the corresponding LoDs. In addition, by
avoiding surface approximations as in [12] and occupancy
reordering as in [11], we keep the occupancy code encoder
simple and fast (see III.B for all details). In particular, we set
the number of decodable LoDs to two, thus reducing overhead.
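To get a feel for what such an entropy coder works on, the short sketch below estimates the zeroth-order entropy of the occupancy bytes of one LoD, which lower-bounds the size a static entropy coder can reach for that level. This is only an estimate; the codec's carry-less byte-based range coder is not reproduced here.

```python
import numpy as np

def occupancy_entropy_bits(occupancy_bytes):
    """Zeroth-order entropy (in bits) of the occupancy bytes of one LoD.
    A static entropy coder over this symbol alphabet cannot do better
    than this bound. Sketch only."""
    _, counts = np.unique(np.asarray(occupancy_bytes, dtype=np.uint8),
                          return_counts=True)
    probs = counts / counts.sum()
    return float(-(probs * np.log2(probs)).sum() * len(occupancy_bytes))

# usage with the per-level codes from build_occupancy_codes (see above):
# bits_lod3 = occupancy_entropy_bits(occupancy[3])
```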
Fig. 4 Schematic of time-varying point cloud compression codec

Coding of Color Attributes (4): In order to code the color
attributes in the point cloud in a highly efficient manner, we
integrated methods based on mapping the octree traversal
graph to a JPEG image grid, exploiting correlation between
color attributes in the final LoDs. The rationale of the JPEG
mapping is that, as the color attributes result from natural
inputs, comparable correlation between adjacent pixels/points
exists. By mapping the octree traversal graph to a JPEG grid,
we aim to exploit this correlation in a simple and fast way suitable
for real-time 3D tele-immersion.
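As a rough illustration of the idea (not the exact mapping used in the codec), the sketch below lays per-voxel colors out in octree traversal order on an 8x8-block-aligned image grid and compresses them with a standard JPEG encoder (Pillow). The simple row-major layout, the grid width, and the function names are assumptions for the example.

```python
import io
import math
import numpy as np
from PIL import Image

def jpeg_code_colors(colors_in_traversal_order, width=64, quality=75):
    """Map an (N, 3) uint8 color array, ordered by octree traversal, onto an
    image grid whose dimensions are multiples of 8, then JPEG-encode it.
    Simplified sketch: row-major layout instead of the codec's block mapping."""
    colors = np.asarray(colors_in_traversal_order, dtype=np.uint8)
    n = len(colors)
    height = 8 * math.ceil(n / (width * 8))        # pad to whole 8-pixel rows
    grid = np.zeros((height, width, 3), dtype=np.uint8)
    grid.reshape(-1, 3)[:n] = colors               # fill in traversal order
    buf = io.BytesIO()
    Image.fromarray(grid, mode="RGB").save(buf, format="JPEG", quality=quality)
    return buf.getvalue(), (height, width), n

def jpeg_decode_colors(payload, n):
    """Inverse mapping: decode the JPEG image and read back the first n colors."""
    img = np.asarray(Image.open(io.BytesIO(payload)).convert("RGB"))
    return img.reshape(-1, 3)[:n]
```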
Inter Predictive Frame Coding (5): We introduce a
method for inter-predictive encoding of frames based on
previous inputs that includes both rigid transform estimation
and rigid transform compensation. We first compute a
common bounding box between the octree structures of
consecutive frames, i.e., block (1). Then we find shared larger
macroblocks K (= 4) levels above the final LoD voxel size.
For blocks that are non-empty in both frames, color variance
and the difference in point count are used to decide whether we
compute a rigid transform based on the iterative closest
point algorithm or not. If this is the case and the iterative
closest point algorithm is successful (converges), the computed
rigid transformation can be used as a predictor. The rigid
transform is then stored compactly in a quaternion and
translation quantization scheme (6). This prediction scheme
can typically save up to 30% of the bit rate. This is important to
reduce the data volume at high capture rates.
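The following sketch shows one way such a rigid transform could be packed compactly, uniformly quantizing the unit quaternion components and the translation vector to a few bits each. The exact bit allocation and normalization used by the codec may differ; the names and defaults here are assumptions.

```python
import numpy as np

def quantize_rigid_transform(quat, trans, bbox_range, q_bits=10, t_bits=12):
    """Quantize a unit quaternion (x, y, z, w) and a translation vector to
    integer codes. Quaternion components lie in [-1, 1]; the translation is
    normalized by the (aligned) bounding-box range. Sketch only."""
    quat = np.asarray(quat, float)
    quat = quat / np.linalg.norm(quat)
    if quat[3] < 0:                       # resolve the sign ambiguity q ~ -q
        quat = -quat
    q_levels, t_levels = (1 << q_bits) - 1, (1 << t_bits) - 1
    q_codes = np.round((quat + 1.0) / 2.0 * q_levels).astype(np.uint16)
    t_norm = np.clip(np.asarray(trans, float) / np.asarray(bbox_range, float), -1, 1)
    t_codes = np.round((t_norm + 1.0) / 2.0 * t_levels).astype(np.uint16)
    return q_codes, t_codes

def dequantize_rigid_transform(q_codes, t_codes, bbox_range, q_bits=10, t_bits=12):
    q_levels, t_levels = (1 << q_bits) - 1, (1 << t_bits) - 1
    quat = q_codes / q_levels * 2.0 - 1.0
    quat = quat / np.linalg.norm(quat)    # re-project onto the unit sphere
    trans = (t_codes / t_levels * 2.0 - 1.0) * np.asarray(bbox_range, float)
    return quat, trans
```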
Coder Control (8) and Header Formatting (7): The coder
uses pre-specified codec settings (via a configuration file)
which include the octree bit allocation for (2), macroblock size
and prediction method for (5) and color bit allocation and
mapping mode for (4). In addition, it includes the settings for
the filter and the color coding modes. We use a common header
format and entropy coding based on a static range coder to
further compress these fields.
III. INTRA FRAME CODER
The intra frame coder consists of three stages (1, 2 and 3 in
Fig 4). It first filters outliers and computes a bounding box.
Second, it performs an octree composition of space. Third,
entropy encoding of the resulting occupancy codes is
performed.
A. Bounding Box Normalization and Outlier Filter
The bounding box of a mesh or point cloud frame is typically
computed as a box with a lower corner (x_min,y_min,z_min)
and an upper corner (x_max,y_max,z_max). This bounding
box is used as the root level of the octree. The bounding box
can change from frame to frame as it is defined by these
extrema. As a consequence, the correspondence of octree
voxel coordinates between subsequent frames is lost, which
makes inter-prediction much harder. To mitigate this problem,
Fig 5 illustrates a scheme that aims to reduce bounding box
changes in adjacent frames. The scheme enlarges (expands)
the bounding box with a certain percentage δ, and then, if the
bounding box of the subsequent frame fits this bounding box,
it can be used instead of the original bounding box. As shown
in Fig. 5, the bounding box of an intra frame is expanded to
BB_IE, with lower corner (x_min − bb_exp, y_min − bb_exp, z_min − bb_exp)
and upper corner (x_max + bb_exp, y_max + bb_exp, z_max + bb_exp),
where bb_exp was computed from δ
and the ranges of the original bounding box. Then, the
subsequent P cloud is loaded and, if the bounding box computed
for this frame, BB_P, fits the expanded bounding box BB_IE,
P is normalized on BB_IE. BB_IE is subsequently used as
the octree root and the frame P can be used by the predictive
coding algorithm presented in section IV. Otherwise, the
expanded bounding box is computed as BB_PE and the cloud
P is normalized on BB_PE. In this case, the cloud P is intra
coded instead, as our predictive coding algorithm only works
when the bounding boxes are aligned.
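A minimal sketch of this decision logic follows; the expansion fraction δ, the helper names, and the return convention are assumptions based on the description above, not the reference code.

```python
import numpy as np

def expanded_bbox(points, delta=0.10):
    """Compute a bounding box and expand it by a fraction delta of its range."""
    lo, hi = points.min(axis=0), points.max(axis=0)
    bb_exp = delta * (hi - lo)
    return lo - bb_exp, hi + bb_exp

def fits(inner_lo, inner_hi, outer_lo, outer_hi):
    return bool(np.all(inner_lo >= outer_lo) and np.all(inner_hi <= outer_hi))

def choose_coding_mode(i_cloud, p_cloud, delta=0.10):
    """Return ('P', BB_IE) if the P frame fits the expanded intra bounding box
    (so inter prediction can be used), else ('I', BB_PE) to intra-code P
    against its own expanded box. Sketch of the scheme in Fig. 5."""
    bb_ie = expanded_bbox(i_cloud, delta)
    bb_p_lo, bb_p_hi = p_cloud.min(axis=0), p_cloud.max(axis=0)
    if fits(bb_p_lo, bb_p_hi, *bb_ie):
        return "P", bb_ie
    return "I", expanded_bbox(p_cloud, delta)
```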
The bounding box operation is an important pre-processing
step to enable efficient intra and inter-coding schemes. We
assume that point clouds represent segmented objects
reconstructed from multi-camera recordings and in our
experiments we also work with such data. We discovered that
in some cases the segmentation of the main object (human or
object) is not perfect and several erroneous
background/foreground points exist in the original cloud. As
these points can degrade the bounding box computation and
alignment scheme, we have implemented filters to pre-process
the point cloud. Our method is based on a radius removal
filter. It removes points with fewer than K neighbors within radius R
from the point cloud. This filter removes erroneously
segmented points from the point cloud and improves the
performance of the subsequent codec operations.
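A possible implementation of such a radius removal filter is sketched below using SciPy's k-d tree rather than whatever filter the reference software builds on; min_neighbors and radius correspond to the K and R mentioned above and are tuning parameters.

```python
import numpy as np
from scipy.spatial import cKDTree

def radius_outlier_removal(points, radius=0.05, min_neighbors=4):
    """Keep only points that have at least `min_neighbors` other points
    within `radius`. Removes isolated, erroneously segmented points before
    bounding-box alignment and octree coding. Sketch only."""
    tree = cKDTree(points)
    neighbor_lists = tree.query_ball_point(points, r=radius)
    # each list contains the query point itself, hence the "- 1"
    counts = np.array([len(nbrs) - 1 for nbrs in neighbor_lists])
    return points[counts >= min_neighbors]
```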
Fig. 5. Bounding box alignment scheme.
B. Octree Subdivision and Occupancy Coding
Fig. 6. Octree composition of space.

Citations
Journal ArticleDOI
TL;DR: The main developments and technical aspects of this ongoing standardization effort for compactly representing 3D point clouds, which are the 3D equivalent of the very well-known 2D pixels are introduced.
Abstract: Due to the increased popularity of augmented and virtual reality experiences, the interest in capturing the real world in multiple dimensions and in presenting it to users in an immersible fashion has never been higher. Distributing such representations enables users to freely navigate in multi-sensory 3D media experiences. Unfortunately, such representations require a large amount of data, not feasible for transmission on today’s networks. Efficient compression technologies well adopted in the content chain are in high demand and are key components to democratize augmented and virtual reality applications. Moving Picture Experts Group, as one of the main standardization groups dealing with multimedia, identified the trend and started recently the process of building an open standard for compactly representing 3D point clouds, which are the 3D equivalent of the very well-known 2D pixels. This paper introduces the main developments and technical aspects of this ongoing standardization effort.

470 citations


Cites background or methods from "Design, Implementation, and Evaluat..."

  • ...Previous compression solutions for volumetric visual representations either focused on computer-generated content [1], [4] or suffered from low spatial and temporal compression performance [5], [6] when...

    [...]

  • ...To have a baseline for determining target bitrates and distortions, a recent hybrid octree-image point cloud codec for tele-immersive video [6] was chosen as anchor....

    [...]

  • ...In [6], an extension to this framework was introduced, combining the octree-based codec with a common image codec for color attribute coding....

    [...]

Proceedings ArticleDOI
01 Sep 2019
TL;DR: In this article, the authors proposed a data-driven geometry compression method for static point clouds based on learned convolutional transforms and uniform quantization. And they cast the decoding process as a binary classification of the point cloud occupancy map.
Abstract: Efficient point cloud compression is fundamental to enable the deployment of virtual and mixed reality applications, since the number of points to code can range in the order of millions. In this paper, we present a novel data-driven geometry compression method for static point clouds based on learned convolutional transforms and uniform quantization. We perform joint optimization of both rate and distortion using a trade-off parameter. In addition, we cast the decoding process as a binary classification of the point cloud occupancy map. Our method outperforms the MPEG reference solution in terms of rate-distortion on the Microsoft Voxelized Upper Bodies dataset with 51.5% BDBR savings on average. Moreover, while octree-based methods face exponential diminution of the number of points at low bitrates, our method still produces high resolution outputs even at low bitrates. Code and supplementary material are available at https://github.com/mauriceqch/pcc_geo_cnn.

118 citations

Proceedings ArticleDOI
26 Jul 2019
TL;DR: A comprehensive overview of the 3D-PCC state-of-the-art methods is proposed, including 1D traversal compression, 2D-oriented techniques, which take leverage of existing 2D image/video compression technologies and finally purely 3D approaches, based on a direct analysis of the 3D data.
Abstract: In recent years, 3D point clouds have enjoyed a great popularity for representing both static and dynamic 3D objects. When compared to 3D meshes, they offer the advantage of providing a simpler, denser and more close-to-reality representation. However, point clouds always carry a huge amount of data. For a typical example of a point cloud with 0.7 million points per 3D frame at 30 fps, the point cloud raw video needs a bandwidth around 500MB/s. Thus, efficient compression methods are mandatory for ensuring the storage/transmission of such data, which include both geometry and attribute information. In the last years, the issue of 3D point cloud compression (3D-PCC) has emerged as a new field of research. In addition, an ISO/MPEG standardization process on 3D-PCC is currently on-going. In this paper, a comprehensive overview of the 3D-PCC state-of-the-art methods is proposed. Different families of approaches are identified, described in details and summarized, including 1D traversal compression, 2D-oriented techniques, which take leverage of existing 2D image/video compression technologies and finally purely 3D approaches, based on a direct analysis of the 3D data.

95 citations


Cites background or methods or result from "Design, Implementation, and Evaluat..."

  • ...Here, the rigid-body motion estimation is based on the Iterative Closest Points (ICP) algorithm [Paul J. Besl 1992], which has been also used for similar purpose in [Mekuria et al. 2017] 6⃝....

    [...]

  • ...Authors show that the method achieves significant gains in compression ratio when compared to the JPEGbased compression approach introduced in [Mekuria et al. 2017] 6⃝....

    [...]

  • ...…on the multi-view video coding algorithms introduced in [Merkle et al. 2007], which exploit both temporal and inter-view statistical dependencies under a prediction framework, several coding schemes targeting real-time compression, are proposed in [Mekuria et al. 2017] 6⃝ and [Lien et al. 2009] 7⃝....

    [...]

  • ...Besl 1992], which has been also used for similar purpose in [Mekuria et al. 2017] 6 ⃝....

    [...]

  • ...2007], which exploit both temporal and inter-view statistical dependencies under a prediction framework, several coding schemes targeting real-time compression, are proposed in [Mekuria et al. 2017] 6 ⃝ and [Lien et al....

    [...]

Journal ArticleDOI
TL;DR: This survey focuses on the edge-enabled Metaverse to realize its ultimate vision and explores how blockchain technologies can aid in the interoperable development of the Metaverse, not just in terms of empowering the economic circulation of virtual user-generated content but also to manage physical edge resources in a decentralized, transparent, and immutable manner.
Abstract: Dubbed “the successor to the mobile Internet,” the concept of the Metaverse has grown in popularity. While there exist lite versions of the Metaverse today, they are still far from realizing the full vision of an immersive, embodied, and interoperable Metaverse. Without addressing the issues of implementation from the communication and networking, as well as computation perspectives, the Metaverse is difficult to succeed the Internet, especially in terms of its accessibility to billions of users today. In this survey, we focus on the edge-enabled Metaverse to realize its ultimate vision. We first provide readers with a succinct tutorial of the Metaverse, an introduction to the architecture, as well as current developments. To enable ubiquitous, seamless, and embodied access to the Metaverse, we discuss the communication and networking challenges and survey cutting-edge solutions and concepts that leverage next-generation communication systems for users to immerse as and interact with embodied avatars in the Metaverse. Moreover, given the high computation costs required, e.g., to render 3D virtual worlds and run data-hungry artificial intelligence-driven avatars, we discuss the computation challenges and cloud-edge-end computation framework-driven solutions to realize the Metaverse on resource-constrained edge devices. Next, we explore how blockchain technologies can aid in the interoperable development of the Metaverse, not just in terms of empowering the economic circulation of virtual user-generated content but also to manage physical edge resources in a decentralized, transparent, and immutable manner. Finally, we discuss the future research directions towards realizing the true vision of the edge-enabled Metaverse.

95 citations

Proceedings ArticleDOI
26 May 2020
TL;DR: This paper introduces PCQM, a full-reference objective metric for visual quality assessment of 3D point clouds, an optimally-weighted linear combination of geometry-based and color-based features that outperforms all previous metrics in terms of correlation with mean opinion scores.
Abstract: 3D point clouds constitute an emerging multimedia content, now used in a wide range of applications. The main drawback of this representation is the size of the data since typical point clouds may contain millions of points, usually associated with both geometry and color information. Consequently, a significant amount of work has been devoted to the efficient compression of this representation. Lossy compression leads to a degradation of the data and thus impacts the visual quality of the displayed content. In that context, predicting perceived visual quality computationally is essential for the optimization and evaluation of compression algorithms. In this paper, we introduce PCQM, a full-reference objective metric for visual quality assessment of 3D point clouds. The metric is an optimally-weighted linear combination of geometry-based and color-based features. We evaluate its performance on an open subjective dataset of colored point clouds compressed by several algorithms; the proposed quality assessment approach outperforms all previous metrics in terms of correlation with mean opinion scores.

95 citations


Additional excerpts

  • ...the efficient compression of this representation [1]–[5]....

    [...]

References
Proceedings ArticleDOI
09 May 2011
TL;DR: PCL (Point Cloud Library) is presented, an advanced and extensive approach to the subject of 3D perception that contains state-of-the art algorithms for: filtering, feature estimation, surface reconstruction, registration, model fitting and segmentation.
Abstract: With the advent of new, low-cost 3D sensing hardware such as the Kinect, and continued efforts in advanced point cloud processing, 3D perception gains more and more importance in robotics, as well as other fields. In this paper we present one of our most recent initiatives in the areas of point cloud perception: PCL (Point Cloud Library - http://pointclouds.org). PCL presents an advanced and extensive approach to the subject of 3D perception, and it's meant to provide support for all the common 3D building blocks that applications need. The library contains state-of-the art algorithms for: filtering, feature estimation, surface reconstruction, registration, model fitting and segmentation. PCL is supported by an international community of robotics and perception researchers. We provide a brief walkthrough of PCL including its algorithmic capabilities and implementation strategies.

4,501 citations


"Design, Implementation, and Evaluat..." refers methods in this paper

  • ...The vnn_deg is efficiently computed via a K-d tree in L2 distance norm based on algorithms available in [4]...

    [...]

Journal ArticleDOI
31 Jan 2011
TL;DR: An overview of the algorithmic design used for extending H.264/MPEG-4 AVC towards MVC is provided and a summary of the coding performance achieved by MVC for both stereo- and multiview video is provided.
Abstract: Significant improvements in video compression capability have been demonstrated with the introduction of the H.264/MPEG-4 advanced video coding (AVC) standard. Since developing this standard, the Joint Video Team of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG) has also standardized an extension of that technology that is referred to as multiview video coding (MVC). MVC provides a compact representation for multiple views of a video scene, such as multiple synchronized video cameras. Stereo-paired video for 3-D viewing is an important special case of MVC. The standard enables inter-view prediction to improve compression capability, as well as supporting ordinary temporal and spatial prediction. It also supports backward compatibility with existing legacy systems by structuring the MVC bitstream to include a compatible “base view.” Each other view is encoded at the same picture resolution as the base view. In recognition of its high-quality encoding capability and support for backward compatibility, the stereo high profile of the MVC extension was selected by the Blu-Ray Disc Association as the coding format for 3-D video with high-definition resolution. This paper provides an overview of the algorithmic design used for extending H.264/MPEG-4 AVC towards MVC. The basic approach of MVC for enabling inter-view prediction and view scalability in the context of H.264/MPEG-4 AVC is reviewed. Related supplemental enhancement information (SEI) metadata is also described. Various “frame compatible” approaches for support of stereo-view video as an alternative to MVC are also discussed. A summary of the coding performance achieved by MVC for both stereo- and multiview video is also provided. Future directions and challenges related to 3-D video are also briefly discussed.

683 citations


"Design, Implementation, and Evaluat..." refers methods in this paper

  • ...3) Multiview Plus Depth Compression: Multiview plus depth representation was considered for storing video and depth maps from multiple cameras [7], [8]....

    [...]

  • ...Existing video coding standards, such as Advanced Video Coding (AVC) Multi View Video (MVV) [7] and MVV-D [8], can support these functionalities via techniques from (depth) image-based rendering (DIBR)....

    [...]

Proceedings ArticleDOI
29 Jul 2006
TL;DR: A progressive compression method for point sampled models that is specifically apt at dealing with densely sampled surface geometry and it is demonstrated that additional point attributes, such as color, can be well integrated and efficiently encoded in this framework.
Abstract: In this paper we present a progressive compression method for point sampled models that is specifically apt at dealing with densely sampled surface geometry. The compression is lossless and therefore is also suitable for storing the unfiltered, raw scan data. Our method is based on an octree decomposition of space. The point-cloud is encoded in terms of occupied octree-cells. To compress the octree we employ novel prediction techniques that were specifically designed for point sampled geometry and are based on local surface approximations to achieve high compression rates that outperform previous progressive coders for point-sampled geometry. Moreover we demonstrate that additional point attributes, such as color, which are of great importance for point-sampled geometry, can be well integrated and efficiently encoded in this framework.

406 citations


"Design, Implementation, and Evaluat..." refers background or methods in this paper

  • ...Note that we made comparisons with the available real-time point cloud codec in [3] with octree composition and DPCM coding....

    [...]

  • ...Surprisingly, the JPEG color coding method does not introduce significant subjective distortion, as this method codes at up to ten times lower bitrates compared with the 8-b DPCM color data....

    [...]

  • ...The real-time results for intra encoding for different LoDs and both proposed (JPEG) and original DPCM-based color coding are shown in Fig....

    [...]

  • ...to [11] and [12], we develop a position coder that uses an...

    [...]

  • ...In Section V-E, we will show in the subjective studies that the color quality degradation introduced by the color coding method is negligible even compared with 8-b DPCM-based coding....

    [...]

Journal ArticleDOI
TL;DR: This paper describes an extension of the high efficiency video coding (HEVC) standard for coding of multi-view video and depth data, and develops and integrated a novel encoder control that guarantees that high quality intermediate views can be generated based on the decoded data.
Abstract: This paper describes an extension of the high efficiency video coding (HEVC) standard for coding of multi-view video and depth data. In addition to the known concept of disparity-compensated prediction, inter-view motion parameter, and inter-view residual prediction for coding of the dependent video views are developed and integrated. Furthermore, for depth coding, new intra coding modes, a modified motion compensation and motion vector coding as well as the concept of motion parameter inheritance are part of the HEVC extension. A novel encoder control uses view synthesis optimization, which guarantees that high quality intermediate views can be generated based on the decoded data. The bitstream format supports the extraction of partial bitstreams, so that conventional 2D video, stereo video, and the full multi-view video plus depth format can be decoded from a single bitstream. Objective and subjective results are presented, demonstrating that the proposed approach provides 50% bit rate savings in comparison with HEVC simulcast and 20% in comparison with a straightforward multi-view extension of HEVC without the newly developed coding tools.

365 citations


"Design, Implementation, and Evaluat..." refers methods in this paper

  • ...3) Multiview Plus Depth Compression: Multiview plus depth representation was considered for storing video and depth maps from multiple cameras [7], [8]....

    [...]

  • ...Existing video coding standards, such as Advanced Video Coding (AVC) Multi View Video (MVV) [7] and MVV-D [8], can support these functionalities via techniques from (depth) image-based rendering (DIBR)....

    [...]

Proceedings ArticleDOI
14 May 2012
TL;DR: This work presents a novel lossy compression approach for point cloud streams which exploits spatial and temporal redundancy within the point data and presents a technique for comparing the octree data structures of consecutive point clouds.
Abstract: We present a novel lossy compression approach for point cloud streams which exploits spatial and temporal redundancy within the point data. Our proposed compression framework can handle general point cloud streams of arbitrary and varying size, point order and point density. Furthermore, it allows for controlling coding complexity and coding precision. To compress the point clouds, we perform a spatial decomposition based on octree data structures. Additionally, we present a technique for comparing the octree data structures of consecutive point clouds. By encoding their structural differences, we can successively extend the point clouds at the decoder. In this way, we are able to detect and remove temporal redundancy from the point cloud data stream. Our experimental results show a strong compression performance of a ratio of 14 at 1 mm coordinate precision and up to 40 at a coordinate precision of 9 mm.

341 citations


"Design, Implementation, and Evaluat..." refers background or methods in this paper

  • ...we made comparisons with the available real-time point cloud codec in [3] with octree composition and DPCM coding....

    [...]

  • ...The work in [3] introduced a real-time octree-based codec that can also exploit temporal redundancies by XOR operations on the octree byte stream....

    [...]

  • ...Compression of 3D point clouds has received significant attention in recent years [3]–[6]....

    [...]

  • ...Instead, we follow a modified approach, taken from [3] (based on a carry-less-byte-based range coder), which we applied to the different decodable LoDs....

    [...]

Frequently Asked Questions (14)
Q1. What are the contributions in this paper?

The authors present a generic and real-time time-varying point cloud codec for 3D immersive video. To further exploit inter-frame dependencies, the authors present an inter-prediction algorithm that partitions the octree voxel space in $N \times N \times N$ macroblocks ($N=8,16,32$). This framework has been optimized to run in real time on commodity hardware for both encoder and decoder. A subjective study in a state-of-the-art mixed reality system shows that introduced prediction distortions are negligible compared to the original reconstructed point clouds. The codec is available as open source for integration in immersive and augmented communication applications and serves as a base reference software platform in JTC1/SC29/WG11 (MPEG) for the further development of standardized point cloud compression solutions.

Mesh Compression: 3D objects are often coded as 3D meshes, for which a significant number of compression methods have been developed.

To evaluate the point cloud quality, the authors deploy a full-reference quality metric that combines common practices from 3D mesh and video compression: a PSNR metric based on point-to-point symmetric root mean square distances.
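For context, the sort of metric being described could be computed as in the sketch below: nearest-neighbor point-to-point distances in both directions, a symmetric RMS, and a PSNR relative to a geometry peak value. The peak definition (bounding-box diagonal of the original cloud) is an assumption here, as the exact normalization is not given in this excerpt.

```python
import numpy as np
from scipy.spatial import cKDTree

def point_to_point_psnr(original, degraded):
    """Symmetric point-to-point RMS error and PSNR between two point clouds.
    Peak value taken as the bounding-box diagonal of the original cloud;
    other peak definitions are possible. Illustrative sketch only."""
    d_ab, _ = cKDTree(degraded).query(original)   # original -> degraded
    d_ba, _ = cKDTree(original).query(degraded)   # degraded -> original
    rms = max(np.sqrt(np.mean(d_ab ** 2)), np.sqrt(np.mean(d_ba ** 2)))
    peak = np.linalg.norm(original.max(axis=0) - original.min(axis=0))
    psnr = 10.0 * np.log10(peak ** 2 / rms ** 2)
    return rms, psnr
```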

The hardware used in the experiments is a Dell Precision M6800 PC with an Intel Core i7-4810MQ 2.8 GHz CPU and 16.0 GB of RAM running the 64-bit Windows 7 operating system, a Dell Precision T3210 (Xeon 3.7 GHz), and a custom-built system with an i7 3.2 GHz running Ubuntu Linux.

The main aim of their experiments is to check that the developed codec does not introduce significant extra subjective distortions. 

In addition, the authors have been able to speed up the inter-predictive encoding significantly by parallelizing its execution based on OpenMP for multi-core Intel architectures (we measured up to 20% improvement on the Windows platforms).

The results show that the degradation introduced by the codecs is negligible and that the point cloud has an added value in terms of “feeling together” compared to simple avatar representations, highlighting the importance of point clouds in these applications and the importance of work on compression of point clouds and its standardization for immersive and augmented reality systems. 

In Section V.E the authors show in the subjective studies that the color quality degradation introduced by the color coding method is negligible even compared to 8-bit DPCM-based coding.

In the traversal of macroblocks in M_p (2), operations (5) and especially (6) are the most computationally intensive, but can be executed using parallel computation.

While these methods are promising, it seems that methods based on 3D point clouds can result in coding with even less overhead and more flexible progressive rendering capabilities, as the format is simpler to acquire and process.

The percentage that is shared and converged is shown in red, and relates largely to the bitrate savings (which is over 30% for the Dimitrios dataset that has the largest percentage of shared blocks).

These quality evaluation metrics are well aligned with existing practice in mesh and video compression and are recommended for evaluation of point cloud codecs [25]. 

In 3D immersive and augmented 3D video coding, having both low encoder and decoder complexity is important (analogous to video coding in video conferencing systems compared to video on demand).

As shown in Fig. 5, the bounding box of an intra frame is expanded to BB_IE, with lower corner (x_min − bb_exp, y_min − bb_exp, z_min − bb_exp) and upper corner (x_max + bb_exp, y_max + bb_exp, z_max + bb_exp), where bb_exp was computed from δ and the ranges of the original bounding box.