Journal ArticleDOI

Design, Implementation, and Evaluation of a Point Cloud Codec for Tele-Immersive Video

TL;DR: A subjective study in a state-of-the-art mixed reality system shows that introduced prediction distortions are negligible compared with the original reconstructed point clouds and shows the benefit of reconstructed point cloud video as a representation in the 3D virtual world.
Abstract: We present a generic and real-time time-varying point cloud codec for 3D immersive video. This codec is suitable for mixed reality applications in which 3D point clouds are acquired at a fast rate. In this codec, intra frames are coded progressively in an octree subdivision. To further exploit inter-frame dependencies, we present an inter-prediction algorithm that partitions the octree voxel space in $N \times N \times N$ macroblocks ( $N=8,16,32$ ). The algorithm codes points in these blocks in the predictive frame as a rigid transform applied to the points in the intra-coded frame. The rigid transform is computed using the iterative closest point algorithm and compactly represented in a quaternion quantization scheme. To encode the color attributes, we defined a mapping of color per vertex attributes in the traversed octree to an image grid and use legacy image coding method based on JPEG. As a result, a generic compression framework suitable for real-time 3D tele-immersion is developed. This framework has been optimized to run in real time on commodity hardware for both the encoder and decoder. Objective evaluation shows that a higher rate-distortion performance is achieved compared with available point cloud codecs. A subjective study in a state-of-the-art mixed reality system shows that introduced prediction distortions are negligible compared with the original reconstructed point clouds. In addition, it shows the benefit of reconstructed point cloud video as a representation in the 3D virtual world. The codec is available as open source for integration in immersive and augmented communication applications and serves as a base reference software platform in JTC1/SC29/WG11 (MPEG) for the further development of standardized point-cloud compression solutions.

Summary (2 min read)

Introduction

  • Avalanche photodiodes (APDs) and single-photon avalanche diodes, or Geiger-mode APDs, are widely used in optical telecommunications, imaging, and medical diagnostics, where high sensitivity to light in the visible or near-infrared (NIR) ranges is needed [1].
  • Ge is attractive because it has been possible to develop processes allowing integration in CMOS, and low-quality Ge photodiodes have been demonstrated in telecom circuits [5], [6].

II. DEVICE DESIGN AND FABRICATION

  • Fig. 1 shows a schematic of the fabrication process of the PureGaB Ge-on-Si arrays.
  • First, a 30-nm thermal SiO2 is grown on the Si surface followed by a low-pressure CVD SiO2 layer deposition with a thickness of ∼1 μm.

GEOMETRICAL PARAMETERS OF THE THREE DIFFERENT PHOTODIODE PIXELS

  • Windows were etched on the Si surface by a mask that defined the Ge-on-Si photodiode areas.
  • Since the diodes in the multidiode devices were separated by a 1-μm-wide oxide region, the actual Ge area decreases with the number of diodes while the Ge perimeter increases.
  • Ge deposited on surrounding oxide regions from reaching the windows designed for the deposition of the As-doped Ge islands.
  • Next, 800-nm Al/Si1% was sputtered and then removed over the photosensitive junctions by means of selective plasma etching of the Al layer to the oxide covering the PureGaB.
  • This phenomenon was studied in detail in [7].

III. ELECTRICAL CHARACTERIZATION

  • A continuous-flow cryostat system was used for achieving low temperatures.
  • Vacuum conditions were maintained so that the chip surface remained free from ice and frosting even for temperatures as low as 77 K, achieved with liquid nitrogen cooling.
  • Typical I–V characteristics are shown in Fig. 4 for both room temperature and 77 K measurements of all three device types.
  • The perimeter of the hexa device was 3 times longer than that of the single-diode pixel, i.e., it became less likely that a perfect device would be found.
  • The depletion width in reverse will quickly become larger than the 0.55 μm thickness of the Ge islands and traverse the nonperfect interface with the Si.

IV. OPTICAL CHARACTERIZATION

  • The Ge photodiodes were mounted in the vacuum chamber and cooled to 77 K.
  • The incident optical power on the devices was attenuated, resulting in 9.8, 0.75, and 0.78 μW for the respective laser sources.
  • The diameter of the incident beam spot was 0.3 cm.
  • The photocurrent was measured using a computer-controlled HP Semiconductor Parameter Analyzer model 4145B.
  • Neutral density filters were used to prevent light saturation and pileup.

SUMMARY OF OVERALL BEST PERFORMANCE COMPARED WITH

  • The responsivity R as a function of reverse voltage $V_x$ was calculated as $R(V_x) = (I_{ph,V_x} - I_{dark,V_x})/P$ (2), where the photocurrent $I_{ph,V_x}$ and the dark current $I_{dark,V_x}$ are measured at $V_x$ and P is the incident power at the surface of the pixel.
  • Ge crystal but the leakage current is not yet influenced by impact ionization.
  • The responsivity at 660 nm is generally not reported for Ge photodiodes.
  • This trend is corroborated by Fig. 10, which plots the maximum optical gain as a function of Vbd.

SUMMARY OF APD PERFORMANCE PARAMETERS FOR THE THREE PIXEL TYPES

  • The inner ellipse indicates the σ spread and the outer ellipse the 3σ spread.
  • The mean values of maximum optical gain and breakdown, as well as the breakdown standard deviation, are listed in Table III.
  • Just as the lower spread around high current levels of the reverse I–V characteristics of the hexa devices can be explained by the higher probability of perimeter imperfections, the same effect explains the decrease, with perimeter, in the spread of the maximum-gain-to-breakdown-voltage relationship.

V. CONCLUSION

  • The PureGaB Ge-on-Si photodiodes integrated in 300 × 1 pixel arrays were characterized at cryogenic temperatures for operation as proportional APDs.
  • The main differentiating factor is a very different area and perimeter, while the electrical/optical performance is comparable, with very high optical gain of up to 10⁶ measured at 77 K, where the low-voltage dark current is lower than the measurement limit of 2.5 × 10⁻² μA/cm².
  • Ge area but the largest perimeter with an average increase in the Ge thickness of ∼30% due mainly to V-groove formation.
  • All in all, the PureGaB Ge-on-Si offers a very low-complexity CMOS-compatible means of fabricating uniform arrays of the NIR sensitive photodetectors that are operational in linear, avalanche, and even Geiger modes.




Abstract—We present a generic and real-time time-varying
point cloud codec for 3D immersive video. This codec is suitable
for mixed reality applications where 3D point clouds are
acquired at a fast rate. In this codec, intra frames are coded
progressively in an octree subdivision. To further exploit inter-
frame dependencies, we present an inter-prediction algorithm
that partitions the octree voxel space in $N \times N \times N$
macroblocks ($N=8,16,32$). The algorithm codes points in these
blocks in the predictive frame as a rigid transform applied to the
points in the intra coded frame. The rigid transform is computed
using the iterative closest point algorithm and compactly
represented in a quaternion quantization scheme. To encode the
color attributes, we defined a mapping of color per vertex
attributes in the traversed octree to an image grid and use legacy
image coding method based on JPEG. As a result, a generic
compression framework suitable for real-time 3D tele-immersion
is developed. This framework has been optimized to run in real-
time on commodity hardware for both encoder and decoder.
Objective evaluation shows that a higher rate-distortion (R-D)
performance is achieved compared to available point cloud
codecs. A subjective study in a state-of-the-art mixed reality system
shows that introduced prediction distortions are negligible
compared to the original reconstructed point clouds. In addition,
it shows the benefit of reconstructed point cloud video as a
representation in the 3D Virtual world. The codec is available as
open source for integration in immersive and augmented
communication applications and serves as a base reference
software platform in JTC1/SC29/WG11 (MPEG) for the further
development of standardized point cloud compression solutions.
Index Terms—Data Compression, Video Codecs, Teleconferencing, Virtual Reality, Point Clouds
I. INTRODUCTION
With increasing capability of 3D data acquisition
devices and computational power, it is becoming easier
to reconstruct highly detailed photo realistic point clouds (i.e.
point sampled data) representing naturalistic content such as
persons or moving objects/scenes [1] [2]. 3D point clouds are
a useful representation for 3D video streams in mixed reality
systems. They not only allow free-viewpoint rendering
(for example, based on splat rendering), but can also be
compositely rendered in a synthetic 3D scene, as they provide
full 3D geometry coordinate information. Therefore, this type
of video representation is preferable in mixed reality systems,
such as augmented reality, where a natural scene is combined
with synthetic (authored, e.g., computer graphics) objects, or,
vice versa, immersive virtual rooms, where a synthetic scene
is augmented with a live captured natural 3D video stream
representing a user.
Traditionally, 3D polygon meshes have often been used to
represent 3D object based visual data. However, point clouds
are simpler to acquire than 3D polygon meshes as no
triangulation needs to be computed, and they are more
compact as they do not require the topology/connectivity
information to be stored. Therefore, 3D point clouds are more
suitable for real-time acquisition and communication at a fast
rate. However, realistic reconstructed 3D Point clouds may
contain hundreds of thousands up to millions of points and
compression is critical to achieve efficient and real-time
communication in bandwidth limited networks.
Compression of 3D Point clouds has received significant
attention in recent years [3] [4] [5] [6]. Much work has been
done to efficiently compress single point clouds progressively,
such that lower quality clouds can be obtained from partial bit
streams (i.e. a subset of the original stream). To compare
different solutions, often the compression rate and the
geometric distortion have been evaluated. Sometimes the
algorithmic complexity was analyzed, and in addition schemes
for attribute coding (i.e. colors, normals) have been proposed.
While these approaches are a good starting point, in the
context of immersive, augmented and mixed reality
communication systems, several other additional factors are of
importance.
One important aspect in these systems is real-time
performance for encoding and decoding. Often, in modern
systems, parallel computing that exploits multi-core
architectures available in current computing infrastructures is
utilized. Therefore, the parallelizability becomes important.
Second, as in these systems point cloud sequences are
captured at a fast rate, inter-frame redundancy can be
exploited to achieve a better compression performance via
inter-prediction, which is usually not considered in existing
static point cloud coders.
Furthermore, as tele-immersive codecs are intended for
systems with real users, subjective quality assessment is
needed to assess the performance of the proposed codec in
addition to the more common objective quality assessment.
Further, a codec should be generic, in the sense that it can
compress point cloud sequences coming from different setups
with different geometric properties (i.e. sampling density,
manifoldness of the surface etc.).
With these requirements in mind, we introduce a codec for
time-varying 3D point clouds for augmented and immersive
3D video. The codec is parallelizable, generic, and operates in
real time on commodity hardware. In addition, it exploits
inter-frame redundancies. For evaluation, we propose an
objective quality metric that corresponds to common practice
in video and mesh coding. In addition to objective evaluation
using this metric, subjective evaluation in a realistic mixed
reality system is performed in a user study with 20 users.
The codec is available as open source and currently serves as
the reference software framework for the development of a
point cloud compression technology standard in JTC1/SC29/WG11 (MPEG)¹. To facilitate benchmarking, the
objective quality metrics and file loaders used in this work
have all been included in the package. In addition, the point
cloud test data is available publicly.
The rest of the paper is structured as follows. The remainder of
this section covers the context and related work, and Section II
provides an overview of the codec. In Section III we detail the
lossy attribute/color coding and progressive decoding scheme. In
Section IV the inter-predictive coding algorithm is detailed.
Section V presents the experimental results, including subjective
test results in a mixed reality system, objective rate-distortion
evaluation, and real-time performance assessment.
A. Contributions
In this work we propose a novel compression framework for
progressive coding of time varying point clouds for 3D
immersive and augmented video. The major contributions are:
Generic Compression Framework: the framework can be used to compress point clouds with arbitrary topology (i.e., from different capturing setups or file formats).
Inter-Predictive Point Cloud Coding: correlation between subsequent point clouds in time is exploited to achieve better compression performance.
Efficient Lossy Color Attribute Coding: the framework includes a method for lossy coding of color attributes that takes advantage of the naturalistic source of the data using existing image coding standards.
Progressive Decoding: the codec allows a lower quality point cloud to be reconstructed from a partial bit stream.
Real-Time Implementation: the codec runs in (near) real time on commodity hardware, benefiting from multi-core architectures and a parallel implementation.
B. Augmented and Immersive 3D Video vs FVV and 3DTV
There exist quite a few technologies for coding 3D Video. In
this section we further motivate the additional need for point
cloud compression for immersive and augmented 3D Video.
3D Video often refers to 3D television (3DTV) or free
viewpoint video (FVV). 3DTV creates a depth perception,
while FVV enables arbitrary viewpoint rendering. Existing
video coding standards such as AVC MVV [7] and MVV-D
[8] can support these functionalities via techniques from
(depth) image based rendering (DIBR). Arbitrary views can
be interpolated from decompressed view data using spherical
interpolation and original camera parameter information. This
enables free-viewpoint rendering without having explicit
geometry information available. However, in mixed reality
systems, explicit object geometry information is needed to
facilitate composite rendering and object navigation.
¹ http://wg11.sc29.org/svn/repos/MPEG-04/Part16-Animation_Framework_eXtension_(AFX)/trunk/3Dgraphics/
In such systems, rendering is usually done based on object
geometry such as meshes or 3D point clouds using generic
computer graphics API’s (e.g. OpenGL, Direct3D etc.). This is
in line with common practice in computer games and virtual
worlds. Therefore, to enable true convergence between
naturalistic and synthetic content in immersive and augmented
reality, object based video compression of point clouds and
3D Meshes is a remaining challenge. To illustrate this need
further, we show some examples from a practical tele-
immersive system that we have been working on in the last
four years, the Reverie system [9]. The Reverie system is an
immersive communication system that enables online
interaction in 3D virtual rooms, either represented by a 3D
avatar or 3D object based video stream based on 3D Point
Clouds or Meshes. As can be seen in Fig. 1, each representation
can be rendered compositely in a common 3D space. This 3D
video stream is a segmentation of the real 3D user.
Fig. 2. Low-cost point cloud capturing setup; point clouds are reconstructed from multiple 3D input streams of calibrated Microsoft Kinect devices.
Fig. 3. Example schematic of 3D point cloud reconstruction for immersive video (multi-camera capture → align & fuse using intrinsic camera parameters → reconstruct a segmented 3D point cloud); many variations can be implemented.
Fig. 2
illustrates the low cost 3D point cloud acquisition system
deployed in this system. It uses multiple calibrated Kinect
sensors. Original color plus depth streams (RGB+D) are fused
and the foreground moving object is segmented as a point
cloud. The basic steps of such an algorithm are shown in Fig.
3. The output is a point cloud ready for remote rendering.
This raises the need for efficient 3D point cloud video
compression and transmission.
Alternatively, by running the algorithm in Fig. 3 at the remote
site on decompressed RGB+D data, one could use RGB + D
video coders directly on sensor data. We have done some
experimentation comparing RGB+D coding using MPEG-4
AVC simulcast (QP8-QP48, zero latency, x264 encoder) and
point cloud compression (8-9-10 bits per direction component)
using the point clouds in [2] reconstructed from 5 RGB + D
streams (data available in [10], captured with Microsoft
Kinect). Byte sizes of 132–800 KB per frame resulted from
RGB-D coding, while sizes of 40–265 KB per frame were
obtained using the octree-based point cloud
compression. In addition, the distortion introduced to the point
cloud, which could be unpredictable for low bit rate RGB-D
coding, is bound by the octree resolution in an octree based
point cloud codec. Last, the experiments showed lower
encoding/decoding latencies when using point cloud
compression. As low latency, low bit rate, and bounded distortion
are critical in tele-presence and immersive reality, we
develop compression for time-varying point clouds.
C. Related Work
Point Cloud Compression: There has been some work on
point cloud compression in the past, but most works only aim
at compression of static point clouds, instead of time-varying
point clouds. Such a codec was introduced in [11] based on
octree composition. This codec includes bit-reordering to
reduce the entropy of the occupancy codes that represent
octree subdivisions. This method also includes color coding
based on frequency of occurrence (colorization) and normal
coding based on efficient spherical quantization. A similar
work in [12] used surface approximations to predict
occupancy codes, and an octree structure to encode color
information. Instead, the work in [3] introduced a real-time
octree based codec that can also exploit temporal redundancies
by XOR operations on the octree byte stream. This method can
operate in real time, as the XOR prediction is extremely simple
and fast. A disadvantage of this approach is that by using
XOR, only geometry and not colors can be predicted, and that
the effectiveness is only significant for scenes with limited
movement (which is not the case in our envisioned application
with moving humans). Last, [5] introduced a time-varying
point cloud codec that can predict graph encoded octree
structures between adjacent frames. The method uses spectral
wavelet based features to achieve this, and an encoding of
differences, resulting in a lossless encoding. This method also
includes the color coding method from [6], which defines
small sub-graphs based on the octree of the point cloud. These
subgraphs are then used to efficiently code the colors by
composing them on the eigenvectors of the graph Laplacian.
Mesh Compression: 3D objects are often coded as 3D
Meshes, for which a significant number of compression
methods have been developed. Mesh codecs can be
categorized as progressive, i.e., allowing a lower resolution
rendering from partial bit streams, or single rate (only
decoding at full resolution is available) [13]. For networked
transmission, progressive methods have generally been
preferred, but for 3D immersive Video, single rate can also be
useful, as they introduce less encoder computation and bit-rate
overhead [14]. In particular, some recent work aimed at
compression of object-based immersive 3D video [15], [16],
[17] uses single-rate coding. While these methods are
promising, it seems that methods based on 3D Point clouds
can result in coding with even less overhead and more flexible
progressive rendering capabilities, as the format is simpler to
acquire and process. Last, there have been methods defined in
the international standards for mesh compression [18], [19],
which are greatly beneficial for interoperability between
devices and services. These methods have been mostly
designed for remote rendering, and have low decoder
complexity and a slightly higher encoder complexity. In 3D
immersive and augmented 3D Video coding, having both low
encoder and decoder complexity is important (analogous to
video coding in video conferencing systems compared to
video on demand).
Multi-View Plus Depth Compression: Multi-View plus
depth representation was considered for storing video and
depth maps from multiple cameras [8] [7]. Arbitrary
viewpoints are then rendered by interpolation between
different camera views using techniques from depth image
based rendering (DIBR) to enable free viewpoint. While these
formats can be used to represent the visual 3D scene, they do
not explicitly store 3D object geometries, which are useful for
composite rendering in immersive communications and
augmented reality. Therefore, these formats are not directly
applicable to immersive and augmented 3D object based
video.
Compression of Immersive 3D Video: The specific
design requirements posed by 3D Video for immersive
communications have been addressed before in [20] and [21].
The first work introduces a compression scheme of polygon
based 3D video (in this case a segmentation and meshing in a
color + depth video), and a purely perception based
compression mechanism combined with entropy encoding. In
this work the level of detail (polygon size) is adapted to match
user/application needs. These needs were derived offline in
subjective studies with pre-recorded stimuli. In [21] the
MPEG-4 video codec was used in a smart way together with
camera parameters. Both methods have been developed in a
context that combines the capturing, compression and
networking components towards the specific tele-immersive
configuration. This lack of generality makes it harder to assess
the relevance and performance of the video compression
compared to other methods. In our work we aim to provide a
generic 3D point cloud codec that can be compared with other
codecs for point cloud compression and applied to any
capturing setup that produces 3D point cloud data.

9955
`
4
II. OVERVIEW OF POINT CLOUD CODING SCHEME
We first outline the requirements for point cloud compression,
after which we detail the point cloud coding schematic (Fig. 4).
A. Requirements and Use Cases
3D Video based on point clouds is relevant for augmented and
mixed reality applications, as shown in Fig. 1. In addition, it has
various applications (e.g., to store data used in geographic
information systems and 3D printing applications). We focus
on time-varying point cloud compression for 3D immersive
and augmented video and follow the requirements for point
cloud compression as defined in the MPEG-4 media standard
[22]:
Partial Bit Stream Decoding: it is possible to decode a coarse
point cloud and refine it.
Lossless Compression: the reconstructed data is
mathematically identical to the original.
Lossy Compression: compression with parameter control of
the bit-rate.
Time Variations and Animations: temporal variations, i.e.,
coding of point cloud sequences, should be supported.
Low Encoder and Decoder Complexity: this is not a strict
requirement but desirable for building real-time systems.
B. Schematic Overview
The architecture design of the proposed 3D video codec based on
point clouds combines features from common 3D point cloud
(octree-based) codecs ([11], [12]) and common hybrid video
codecs such as MPEG-4 Part 10 AVC and HEVC (that include
block-based motion compensation). Based on the numbering
in the diagram in Fig. 4, we detail the most important
components in the codec, which also correspond to our main
contributions.
Bounding Box Alignment and filter (1): A specific
algorithm for bounding box computation and alignment has
been developed. This algorithm is applied to the point cloud
before the octree composition (2) is performed. This algorithm
aligns subsequent frames by computing a common expanded
bounding box. This allows a common axis and range
representation between frames, which facilitates inter-
predictive coding and a consistent assignment of the octree
bits. In addition, it includes an outlier filter, as our
experiments have shown that outlier points can reduce both
the effectiveness of the bounding box alignment and of inter
predictive coding, and should be filtered out from the input
clouds. See III.A for all details.
Constructing the Progressive Octree (2): The encoder
recursively subdivides the point cloud aligned bounding box
into eight children. Only non-empty child voxels continue
to be subdivided. This results in an octree data structure,
where the position of each voxel is represented by its cell
center. Its attribute (color) is set to the average of the enclosed
points and needs to be coded separately by the attribute
encoder. Each level in the octree structure can represent a
level of detail (LoD). The final level of detail is specified by
the octree bit settings. This is a common scheme for both
regularizing the unorganized point cloud and for compressing
it. (See III.B for all details).
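For illustration, a minimal Python sketch of this subdivision step is given below; it recursively splits the aligned bounding box and emits one occupancy byte per non-empty cell per level. The function name build_occupancy_codes and every detail beyond the eight-way split are illustrative assumptions, not the reference implementation.

```python
import numpy as np

def build_occupancy_codes(points, lo, hi, depth):
    """Recursively subdivide [lo, hi] and emit one occupancy byte per
    non-empty cell, level by level, as in an octree coder.
    Sketch only; the codec adds entropy coding, attribute handling
    and LoD signalling on top of this."""
    codes = []                                   # occupancy bytes per level
    level = [(points, np.asarray(lo, float), np.asarray(hi, float))]
    for _ in range(depth):
        next_level, level_codes = [], []
        for pts, lo_, hi_ in level:
            mid = 0.5 * (lo_ + hi_)
            byte, children = 0, []
            for child in range(8):
                # bit k of the child index selects the upper half along axis k
                mask = np.ones(len(pts), dtype=bool)
                child_lo, child_hi = lo_.copy(), hi_.copy()
                for axis in range(3):
                    if (child >> axis) & 1:
                        mask &= pts[:, axis] >= mid[axis]
                        child_lo[axis] = mid[axis]
                    else:
                        mask &= pts[:, axis] < mid[axis]
                        child_hi[axis] = mid[axis]
                if mask.any():
                    byte |= (1 << child)
                    children.append((pts[mask], child_lo, child_hi))
            level_codes.append(byte)
            next_level.extend(children)
        codes.append(level_codes)
        level = next_level
    return codes  # to be entropy coded per LoD

# usage: an 8-level octree over a unit cube
pts = np.random.rand(1000, 3)
occupancy = build_occupancy_codes(pts, lo=[0, 0, 0], hi=[1, 1, 1], depth=8)
```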
Coding of the Occupancy Codes (3) (LoD): To code the
octree structure efficiently, the first step is the encoding of
levels of detail (LoDs) as subdivisions of non-empty cells. Contrary to
previous work [11], [12], we develop a position coder that uses
an entropy coder in the corresponding LoDs. In addition, by
avoiding surface approximations as in [12] and occupancy
reordering as in [11], we keep the occupancy code encoder
simple and fast (see III.B for all details). In particular, we set
the number of decodable LoDs to two, thus reducing overhead.
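To get a feel for what such an entropy coder works on, the short sketch below estimates the zeroth-order entropy of the occupancy bytes of one LoD, which lower-bounds the size a static entropy coder can reach for that level. This is only an estimate; the codec's carry-less byte-based range coder is not reproduced here.

```python
import numpy as np

def occupancy_entropy_bits(occupancy_bytes):
    """Zeroth-order entropy (in bits) of the occupancy bytes of one LoD.
    A static entropy coder over this symbol alphabet cannot do better
    than this bound. Sketch only."""
    _, counts = np.unique(np.asarray(occupancy_bytes, dtype=np.uint8),
                          return_counts=True)
    probs = counts / counts.sum()
    return float(-(probs * np.log2(probs)).sum() * len(occupancy_bytes))

# usage with the per-level codes from build_occupancy_codes (see above):
# bits_lod3 = occupancy_entropy_bits(occupancy[3])
```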
Fig. 4 Schematic of time-varying point cloud compression codec

Coding of Color Attributes (4): In order to code the color
attributes in the point cloud in a highly efficient manner, we
integrated methods based on mapping the octree traversal
graph to a JPEG image grid, exploiting correlation between
color attributes in the final LoDs. The rationale of the JPEG
mapping is that, as the color attributes result from natural
inputs, comparable correlation between adjacent pixels/points
exists. By mapping the octree traversal graph to a JPEG grid,
we aim to exploit this correlation in a simple and fast way suitable
for real-time 3D tele-immersion.
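As a rough illustration of the idea (not the exact mapping used in the codec), the sketch below lays per-voxel colors out in octree traversal order on an 8x8-block-aligned image grid and compresses them with a standard JPEG encoder (Pillow). The simple row-major layout, the grid width, and the function names are assumptions for the example.

```python
import io
import math
import numpy as np
from PIL import Image

def jpeg_code_colors(colors_in_traversal_order, width=64, quality=75):
    """Map an (N, 3) uint8 color array, ordered by octree traversal, onto an
    image grid whose dimensions are multiples of 8, then JPEG-encode it.
    Simplified sketch: row-major layout instead of the codec's block mapping."""
    colors = np.asarray(colors_in_traversal_order, dtype=np.uint8)
    n = len(colors)
    height = 8 * math.ceil(n / (width * 8))        # pad to whole 8-pixel rows
    grid = np.zeros((height, width, 3), dtype=np.uint8)
    grid.reshape(-1, 3)[:n] = colors               # fill in traversal order
    buf = io.BytesIO()
    Image.fromarray(grid, mode="RGB").save(buf, format="JPEG", quality=quality)
    return buf.getvalue(), (height, width), n

def jpeg_decode_colors(payload, n):
    """Inverse mapping: decode the JPEG image and read back the first n colors."""
    img = np.asarray(Image.open(io.BytesIO(payload)).convert("RGB"))
    return img.reshape(-1, 3)[:n]
```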
Inter Predictive Frame Coding (5): We introduce a
method for inter-predictive encoding of frames based on
previous inputs that includes both rigid transform estimation
and rigid transform compensation. We first compute a
common bounding box between the octree structures of
consecutive frames, i.e., block (1). Then we find shared larger
macroblocks K (= 4) levels above the final LoD voxel size.
For blocks that are non-empty in both frames, color variance
and the difference in point count are used to decide whether we
compute a rigid transform based on the iterative closest
point algorithm or not. If this is the case and the iterative
closest point algorithm is successful (converges), the computed
rigid transformation can be used as a predictor. The rigid
transform is then stored compactly in a quaternion and
translation quantization scheme (6). This prediction scheme
can typically save up to 30% of the bit rate. This is important to
reduce the data volume at high capture rates.
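The following sketch shows one way such a rigid transform could be packed compactly, uniformly quantizing the unit quaternion components and the translation vector to a few bits each. The exact bit allocation and normalization used by the codec may differ; the names and defaults here are assumptions.

```python
import numpy as np

def quantize_rigid_transform(quat, trans, bbox_range, q_bits=10, t_bits=12):
    """Quantize a unit quaternion (x, y, z, w) and a translation vector to
    integer codes. Quaternion components lie in [-1, 1]; the translation is
    normalized by the (aligned) bounding-box range. Sketch only."""
    quat = np.asarray(quat, float)
    quat = quat / np.linalg.norm(quat)
    if quat[3] < 0:                       # resolve the sign ambiguity q ~ -q
        quat = -quat
    q_levels, t_levels = (1 << q_bits) - 1, (1 << t_bits) - 1
    q_codes = np.round((quat + 1.0) / 2.0 * q_levels).astype(np.uint16)
    t_norm = np.clip(np.asarray(trans, float) / np.asarray(bbox_range, float), -1, 1)
    t_codes = np.round((t_norm + 1.0) / 2.0 * t_levels).astype(np.uint16)
    return q_codes, t_codes

def dequantize_rigid_transform(q_codes, t_codes, bbox_range, q_bits=10, t_bits=12):
    q_levels, t_levels = (1 << q_bits) - 1, (1 << t_bits) - 1
    quat = q_codes / q_levels * 2.0 - 1.0
    quat = quat / np.linalg.norm(quat)    # re-project onto the unit sphere
    trans = (t_codes / t_levels * 2.0 - 1.0) * np.asarray(bbox_range, float)
    return quat, trans
```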
Coder Control (8) and Header Formatting (7): The coder
uses pre-specified codec settings (via a configuration file)
which include the octree bit allocation for (2), macroblock size
and prediction method for (5) and color bit allocation and
mapping mode for (4). In addition, it includes the settings for
the filter and the color coding modes. We use a common header
format and entropy coding based on a static range coder to
further compress these fields.
III. INTRA FRAME CODER
The intra frame coder consists of three stages (1, 2 and 3 in
Fig 4). It first filters outliers and computes a bounding box.
Second, it performs an octree composition of space. Third,
entropy encoding of the resulting occupancy codes is
performed.
A. Bounding Box Normalization and Outlier Filter
The bounding box of a mesh or point cloud frame is typically
computed as a box with a lower corner (x_min,y_min,z_min)
and an upper corner (x_max,y_max,z_max). This bounding
box is used as the root level of the octree. The bounding box
can change from frame to frame as it is defined by these
extrema. As a consequence, the correspondence of octree
voxel coordinates between subsequent frames is lost, which
makes inter-prediction much harder. To mitigate this problem,
Fig 5 illustrates a scheme that aims to reduce bounding box
changes in adjacent frames. The scheme enlarges (expands)
the bounding box with a certain percentage δ, and then, if the
bounding box of the subsequent frame fits this bounding box,
it can be used instead of the original bounding box. As shown
in Fig. 5, the bounding box of an intra frame is expanded to
BB_IE, with lower corner (x_min − bb_exp, y_min − bb_exp, z_min − bb_exp)
and upper corner (x_max + bb_exp, y_max + bb_exp, z_max + bb_exp),
where bb_exp was computed from δ
and the ranges of the original bounding box. Then, the
subsequent P cloud is loaded and, if the bounding box computed
for this frame, BB_P, fits the expanded bounding box BB_IE,
P is normalized on BB_IE. BB_IE is subsequently used as
the octree root and the frame P can be used by the predictive
coding algorithm presented in section IV. Otherwise, the
expanded bounding box is computed as BB_PE and the cloud
P is normalized on BB_PE. In this case, the cloud P is intra
coded instead, as our predictive coding algorithm only works
when the bounding boxes are aligned.
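A minimal sketch of this decision logic follows; the expansion fraction δ, the helper names, and the return convention are assumptions based on the description above, not the reference code.

```python
import numpy as np

def expanded_bbox(points, delta=0.10):
    """Compute a bounding box and expand it by a fraction delta of its range."""
    lo, hi = points.min(axis=0), points.max(axis=0)
    bb_exp = delta * (hi - lo)
    return lo - bb_exp, hi + bb_exp

def fits(inner_lo, inner_hi, outer_lo, outer_hi):
    return bool(np.all(inner_lo >= outer_lo) and np.all(inner_hi <= outer_hi))

def choose_coding_mode(i_cloud, p_cloud, delta=0.10):
    """Return ('P', BB_IE) if the P frame fits the expanded intra bounding box
    (so inter prediction can be used), else ('I', BB_PE) to intra-code P
    against its own expanded box. Sketch of the scheme in Fig. 5."""
    bb_ie = expanded_bbox(i_cloud, delta)
    bb_p_lo, bb_p_hi = p_cloud.min(axis=0), p_cloud.max(axis=0)
    if fits(bb_p_lo, bb_p_hi, *bb_ie):
        return "P", bb_ie
    return "I", expanded_bbox(p_cloud, delta)
```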
The bounding box operation is an important pre-processing
step to enable efficient intra and inter-coding schemes. We
assume that point clouds represent segmented objects
reconstructed from multi-camera recordings and in our
experiments we also work with such data. We discovered that
in some cases the segmentation of the main object (human or
object) is not perfect and several erroneous
background/foreground points exist in the original cloud. As
these points can degrade the bounding box computation and
alignment scheme, we have implemented filters to pre-process
the point cloud. Our method is based on a radius removal
filter. It removes points with fewer than K neighbors within radius R
from the point cloud. This filter removes erroneously
segmented points from the point cloud and improves the
performance of the subsequent codec operations.
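A possible implementation of such a radius removal filter is sketched below using SciPy's k-d tree rather than whatever filter the reference software builds on; min_neighbors and radius correspond to the K and R mentioned above and are tuning parameters.

```python
import numpy as np
from scipy.spatial import cKDTree

def radius_outlier_removal(points, radius=0.05, min_neighbors=4):
    """Keep only points that have at least `min_neighbors` other points
    within `radius`. Removes isolated, erroneously segmented points before
    bounding-box alignment and octree coding. Sketch only."""
    tree = cKDTree(points)
    neighbor_lists = tree.query_ball_point(points, r=radius)
    # each list contains the query point itself, hence the "- 1"
    counts = np.array([len(nbrs) - 1 for nbrs in neighbor_lists])
    return points[counts >= min_neighbors]
```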
Fig. 5. Bounding box alignment scheme.
B. Octree Subdivision and Occupancy Coding
Fig. 6. Octree composition of space.

Citations
Journal ArticleDOI
TL;DR: The main developments and technical aspects of this ongoing standardization effort for compactly representing 3D point clouds, which are the 3D equivalent of the very well-known 2D pixels are introduced.
Abstract: Due to the increased popularity of augmented and virtual reality experiences, the interest in capturing the real world in multiple dimensions and in presenting it to users in an immersible fashion has never been higher. Distributing such representations enables users to freely navigate in multi-sensory 3D media experiences. Unfortunately, such representations require a large amount of data, not feasible for transmission on today’s networks. Efficient compression technologies well adopted in the content chain are in high demand and are key components to democratize augmented and virtual reality applications. Moving Picture Experts Group, as one of the main standardization groups dealing with multimedia, identified the trend and started recently the process of building an open standard for compactly representing 3D point clouds, which are the 3D equivalent of the very well-known 2D pixels. This paper introduces the main developments and technical aspects of this ongoing standardization effort.

470 citations


Cites background or methods from "Design, Implementation, and Evaluat..."

  • ...Previous compression solutions for volumetric visual representations either focused on computer-generated content [1], [4] or suffered from low spatial and temporal compression performance [5], [6] when...

    [...]

  • ...To have a baseline for determining target bitrates and distortions, a recent hybrid octree-image point cloud codec for tele-immersive video [6] was chosen as anchor....

    [...]

  • ...In [6], an extension to this framework was introduced, combining the octree-based codec with a common image codec for color attribute coding....

    [...]

Proceedings ArticleDOI
01 Sep 2019
TL;DR: In this article, the authors proposed a data-driven geometry compression method for static point clouds based on learned convolutional transforms and uniform quantization. And they cast the decoding process as a binary classification of the point cloud occupancy map.
Abstract: Efficient point cloud compression is fundamental to enable the deployment of virtual and mixed reality applications, since the number of points to code can range in the order of millions. In this paper, we present a novel data-driven geometry compression method for static point clouds based on learned convolutional transforms and uniform quantization. We perform joint optimization of both rate and distortion using a trade-off parameter. In addition, we cast the decoding process as a binary classification of the point cloud occupancy map. Our method outperforms the MPEG reference solution in terms of rate-distortion on the Microsoft Voxelized Upper Bodies dataset with 51.5% BDBR savings on average. Moreover, while octree-based methods face exponential diminution of the number of points at low bitrates, our method still produces high resolution outputs even at low bitrates. Code and supplementary material are available at https://github.com/mauriceqch/pcc_geo_cnn.

118 citations

Proceedings ArticleDOI
26 Jul 2019
TL;DR: A comprehensive overview of the 3D-PCC state-of-the-art methods is proposed, including 1D traversal compression, 2D-oriented techniques, which take leverage of existing 2D image/video compression technologies and finally purely 3D approaches, based on a direct analysis of the 3D data.
Abstract: In recent years, 3D point clouds have enjoyed a great popularity for representing both static and dynamic 3D objects. When compared to 3D meshes, they offer the advantage of providing a simpler, denser and more close-to-reality representation. However, point clouds always carry a huge amount of data. For a typical example of a point cloud with 0.7 million points per 3D frame at 30 fps, the point cloud raw video needs a bandwidth around 500MB/s. Thus, efficient compression methods are mandatory for ensuring the storage/transmission of such data, which include both geometry and attribute information. In the last years, the issue of 3D point cloud compression (3D-PCC) has emerged as a new field of research. In addition, an ISO/MPEG standardization process on 3D-PCC is currently on-going. In this paper, a comprehensive overview of the 3D-PCC state-of-the-art methods is proposed. Different families of approaches are identified, described in details and summarized, including 1D traversal compression, 2D-oriented techniques, which take leverage of existing 2D image/video compression technologies and finally purely 3D approaches, based on a direct analysis of the 3D data.

95 citations


Cites background or methods or result from "Design, Implementation, and Evaluat..."

  • ...Here, the rigid-body motion estimation is based on the Iterative Closest Points (ICP) algorithm [Paul J. Besl 1992], which has been also used for similar purpose in [Mekuria et al. 2017] 6⃝....

    [...]

  • ...Authors show that the method achieves significant gains in compression ratio when compared to the JPEGbased compression approach introduced in [Mekuria et al. 2017] 6⃝....

    [...]

  • ...…on the multi-view video coding algorithms introduced in [Merkle et al. 2007], which exploit both temporal and inter-view statistical dependencies under a prediction framework, several coding schemes targeting real-time compression, are proposed in [Mekuria et al. 2017] 6⃝ and [Lien et al. 2009] 7⃝....

    [...]

  • ...Besl 1992], which has been also used for similar purpose in [Mekuria et al. 2017] 6 ⃝....

    [...]

  • ...2007], which exploit both temporal and inter-view statistical dependencies under a prediction framework, several coding schemes targeting real-time compression, are proposed in [Mekuria et al. 2017] 6 ⃝ and [Lien et al....

    [...]

Journal ArticleDOI
TL;DR: This survey focuses on the edge-enabled Metaverse to realize its ultimate vision and explores how blockchain technologies can aid in the interoperable development of the Metaverse, not just in terms of empowering the economic circulation of virtual user-generated content but also to manage physical edge resources in a decentralized, transparent, and immutable manner.
Abstract: Dubbed “the successor to the mobile Internet,” the concept of the Metaverse has grown in popularity. While there exist lite versions of the Metaverse today, they are still far from realizing the full vision of an immersive, embodied, and interoperable Metaverse. Without addressing the issues of implementation from the communication and networking, as well as computation perspectives, the Metaverse is difficult to succeed the Internet, especially in terms of its accessibility to billions of users today. In this survey, we focus on the edge-enabled Metaverse to realize its ultimate vision. We first provide readers with a succinct tutorial of the Metaverse, an introduction to the architecture, as well as current developments. To enable ubiquitous, seamless, and embodied access to the Metaverse, we discuss the communication and networking challenges and survey cutting-edge solutions and concepts that leverage next-generation communication systems for users to immerse as and interact with embodied avatars in the Metaverse. Moreover, given the high computation costs required, e.g., to render 3D virtual worlds and run data-hungry artificial intelligence-driven avatars, we discuss the computation challenges and cloud-edge-end computation framework-driven solutions to realize the Metaverse on resource-constrained edge devices. Next, we explore how blockchain technologies can aid in the interoperable development of the Metaverse, not just in terms of empowering the economic circulation of virtual user-generated content but also to manage physical edge resources in a decentralized, transparent, and immutable manner. Finally, we discuss the future research directions towards realizing the true vision of the edge-enabled Metaverse.

95 citations

Proceedings ArticleDOI
26 May 2020
TL;DR: This paper introduces PCQM, a full-reference objective metric for visual quality assessment of 3D point clouds, an optimally-weighted linear combination of geometry-based and color-based features that outperforms all previous metrics in terms of correlation with mean opinion scores.
Abstract: 3D point clouds constitute an emerging multimedia content, now used in a wide range of applications. The main drawback of this representation is the size of the data since typical point clouds may contain millions of points, usually associated with both geometry and color information. Consequently, a significant amount of work has been devoted to the efficient compression of this representation. Lossy compression leads to a degradation of the data and thus impacts the visual quality of the displayed content. In that context, predicting perceived visual quality computationally is essential for the optimization and evaluation of compression algorithms. In this paper, we introduce PCQM, a full-reference objective metric for visual quality assessment of 3D point clouds. The metric is an optimally-weighted linear combination of geometry-based and color-based features. We evaluate its performance on an open subjective dataset of colored point clouds compressed by several algorithms; the proposed quality assessment approach outperforms all previous metrics in terms of correlation with mean opinion scores.

95 citations


Additional excerpts

  • ...the efficient compression of this representation [1]–[5]....

    [...]

References
Proceedings ArticleDOI
09 May 2011
TL;DR: PCL (Point Cloud Library) is presented, an advanced and extensive approach to the subject of 3D perception that contains state-of-the art algorithms for: filtering, feature estimation, surface reconstruction, registration, model fitting and segmentation.
Abstract: With the advent of new, low-cost 3D sensing hardware such as the Kinect, and continued efforts in advanced point cloud processing, 3D perception gains more and more importance in robotics, as well as other fields. In this paper we present one of our most recent initiatives in the areas of point cloud perception: PCL (Point Cloud Library - http://pointclouds.org). PCL presents an advanced and extensive approach to the subject of 3D perception, and it's meant to provide support for all the common 3D building blocks that applications need. The library contains state-of-the art algorithms for: filtering, feature estimation, surface reconstruction, registration, model fitting and segmentation. PCL is supported by an international community of robotics and perception researchers. We provide a brief walkthrough of PCL including its algorithmic capabilities and implementation strategies.

4,501 citations


"Design, Implementation, and Evaluat..." refers methods in this paper

  • ...The vnn_deg is efficiently computed via a K-d tree in L2 distance norm based on algorithms available in [4]...

    [...]

Journal ArticleDOI
31 Jan 2011
TL;DR: An overview of the algorithmic design used for extending H.264/MPEG-4 AVC towards MVC is provided and a summary of the coding performance achieved by MVC for both stereo- and multiview video is provided.
Abstract: Significant improvements in video compression capability have been demonstrated with the introduction of the H.264/MPEG-4 advanced video coding (AVC) standard. Since developing this standard, the Joint Video Team of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG) has also standardized an extension of that technology that is referred to as multiview video coding (MVC). MVC provides a compact representation for multiple views of a video scene, such as multiple synchronized video cameras. Stereo-paired video for 3-D viewing is an important special case of MVC. The standard enables inter-view prediction to improve compression capability, as well as supporting ordinary temporal and spatial prediction. It also supports backward compatibility with existing legacy systems by structuring the MVC bitstream to include a compatible “base view.” Each other view is encoded at the same picture resolution as the base view. In recognition of its high-quality encoding capability and support for backward compatibility, the stereo high profile of the MVC extension was selected by the Blu-Ray Disc Association as the coding format for 3-D video with high-definition resolution. This paper provides an overview of the algorithmic design used for extending H.264/MPEG-4 AVC towards MVC. The basic approach of MVC for enabling inter-view prediction and view scalability in the context of H.264/MPEG-4 AVC is reviewed. Related supplemental enhancement information (SEI) metadata is also described. Various “frame compatible” approaches for support of stereo-view video as an alternative to MVC are also discussed. A summary of the coding performance achieved by MVC for both stereo- and multiview video is also provided. Future directions and challenges related to 3-D video are also briefly discussed.

683 citations


"Design, Implementation, and Evaluat..." refers methods in this paper

  • ...3) Multiview Plus Depth Compression: Multiview plus depth representation was considered for storing video and depth maps from multiple cameras [7], [8]....

    [...]

  • ...Existing video coding standards, such as Advanced Video Coding (AVC) Multi View Video (MVV) [7] and MVV-D [8], can support these functionalities via techniques from (depth) image-based rendering (DIBR)....

    [...]

Proceedings ArticleDOI
29 Jul 2006
TL;DR: A progressive compression method for point sampled models that is specifically apt at dealing with densely sampled surface geometry and it is demonstrated that additional point attributes, such as color, can be well integrated and efficiently encoded in this framework.
Abstract: In this paper we present a progressive compression method for point sampled models that is specifically apt at dealing with densely sampled surface geometry. The compression is lossless and therefore is also suitable for storing the unfiltered, raw scan data. Our method is based on an octree decomposition of space. The point-cloud is encoded in terms of occupied octree-cells. To compress the octree we employ novel prediction techniques that were specifically designed for point sampled geometry and are based on local surface approximations to achieve high compression rates that outperform previous progressive coders for point-sampled geometry. Moreover we demonstrate that additional point attributes, such as color, which are of great importance for point-sampled geometry, can be well integrated and efficiently encoded in this framework.

406 citations


"Design, Implementation, and Evaluat..." refers background or methods in this paper

  • ...Note that we made comparisons with the available real-time point cloud codec in [3] with octree composition and DPCM coding....

    [...]

  • ...Surprisingly, the JPEG color coding method does not introduce significant subjective distortion, as this method codes at up to ten times lower bitrates compared with the 8-b DPCM color data....

    [...]

  • ...The real-time results for intra encoding for different LoDs and both proposed (JPEG) and original DPCM-based color coding are shown in Fig....

    [...]

  • ...to [11] and [12], we develop a position coder that uses an...

    [...]

  • ...In Section V-E, we will show in the subjective studies that the color quality degradation introduced by the color coding method is negligible even compared with 8-b DPCM-based coding....

    [...]

Journal ArticleDOI
TL;DR: This paper describes an extension of the high efficiency video coding (HEVC) standard for coding of multi-view video and depth data, and develops and integrated a novel encoder control that guarantees that high quality intermediate views can be generated based on the decoded data.
Abstract: This paper describes an extension of the high efficiency video coding (HEVC) standard for coding of multi-view video and depth data. In addition to the known concept of disparity-compensated prediction, inter-view motion parameter, and inter-view residual prediction for coding of the dependent video views are developed and integrated. Furthermore, for depth coding, new intra coding modes, a modified motion compensation and motion vector coding as well as the concept of motion parameter inheritance are part of the HEVC extension. A novel encoder control uses view synthesis optimization, which guarantees that high quality intermediate views can be generated based on the decoded data. The bitstream format supports the extraction of partial bitstreams, so that conventional 2D video, stereo video, and the full multi-view video plus depth format can be decoded from a single bitstream. Objective and subjective results are presented, demonstrating that the proposed approach provides 50% bit rate savings in comparison with HEVC simulcast and 20% in comparison with a straightforward multi-view extension of HEVC without the newly developed coding tools.

365 citations


"Design, Implementation, and Evaluat..." refers methods in this paper

  • ...3) Multiview Plus Depth Compression: Multiview plus depth representation was considered for storing video and depth maps from multiple cameras [7], [8]....

    [...]

  • ...Existing video coding standards, such as Advanced Video Coding (AVC) Multi View Video (MVV) [7] and MVV-D [8], can support these functionalities via techniques from (depth) image-based rendering (DIBR)....

    [...]

Proceedings ArticleDOI
14 May 2012
TL;DR: This work presents a novel lossy compression approach for point cloud streams which exploits spatial and temporal redundancy within the point data and presents a technique for comparing the octree data structures of consecutive point clouds.
Abstract: We present a novel lossy compression approach for point cloud streams which exploits spatial and temporal redundancy within the point data. Our proposed compression framework can handle general point cloud streams of arbitrary and varying size, point order and point density. Furthermore, it allows for controlling coding complexity and coding precision. To compress the point clouds, we perform a spatial decomposition based on octree data structures. Additionally, we present a technique for comparing the octree data structures of consecutive point clouds. By encoding their structural differences, we can successively extend the point clouds at the decoder. In this way, we are able to detect and remove temporal redundancy from the point cloud data stream. Our experimental results show a strong compression performance of a ratio of 14 at 1 mm coordinate precision and up to 40 at a coordinate precision of 9 mm.

341 citations


"Design, Implementation, and Evaluat..." refers background or methods in this paper

  • ...we made comparisons with the available real-time point cloud codec in [3] with octree composition and DPCM coding....

    [...]

  • ...The work in [3] introduced a real-time octree-based codec that can also exploit temporal redundancies by XOR operations on the octree byte stream....

    [...]

  • ...Compression of 3D point clouds has received significant attention in recent years [3]–[6]....

    [...]

  • ...Instead, we follow a modified approach, taken from [3] (based on a carry-less-byte-based range coder), which we applied to the different decodable LoDs....

    [...]

Frequently Asked Questions (14)
Q1. What are the contributions in this paper?

The authors present a generic and real-time time-varying point cloud codec for 3D immersive video. To further exploit inter-frame dependencies, the authors present an inter-prediction algorithm that partitions the octree voxel space in $N \times N \times N$ macroblocks ($N=8,16,32$). This framework has been optimized to run in real time on commodity hardware for both encoder and decoder. A subjective study in a state-of-the-art mixed reality system shows that introduced prediction distortions are negligible compared to the original reconstructed point clouds. The codec is available as open source for integration in immersive and augmented communication applications and serves as a base reference software platform in JTC1/SC29/WG11 (MPEG) for the further development of standardized point cloud compression solutions.

Mesh Compression: 3D objects are often coded as 3D meshes, for which a significant number of compression methods have been developed.

To evaluate the point cloud quality, the authors deploy a full-reference quality metric that combines common practices from 3D mesh and video compression: a PSNR metric based on point-to-point symmetric root mean square distances.
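For context, the sort of metric being described could be computed as in the sketch below: nearest-neighbor point-to-point distances in both directions, a symmetric RMS, and a PSNR relative to a geometry peak value. The peak definition (bounding-box diagonal of the original cloud) is an assumption here, as the exact normalization is not given in this excerpt.

```python
import numpy as np
from scipy.spatial import cKDTree

def point_to_point_psnr(original, degraded):
    """Symmetric point-to-point RMS error and PSNR between two point clouds.
    Peak value taken as the bounding-box diagonal of the original cloud;
    other peak definitions are possible. Illustrative sketch only."""
    d_ab, _ = cKDTree(degraded).query(original)   # original -> degraded
    d_ba, _ = cKDTree(original).query(degraded)   # degraded -> original
    rms = max(np.sqrt(np.mean(d_ab ** 2)), np.sqrt(np.mean(d_ba ** 2)))
    peak = np.linalg.norm(original.max(axis=0) - original.min(axis=0))
    psnr = 10.0 * np.log10(peak ** 2 / rms ** 2)
    return rms, psnr
```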

The hardware used in the experiments is a Dell Precision M6800 PC with an Intel Core i7-4810MQ 2.8 GHz CPU and 16.0 GB of RAM running the 64-bit Windows 7 operating system, a Dell Precision T3210 (Xeon 3.7 GHz), and a custom-built system with an i7 3.2 GHz running Ubuntu Linux.

The main aim of their experiments is to check that the developed codec does not introduce significant extra subjective distortions. 

In addition, the authors have been able to speed up the inter-predictive encoding significantly by parallelizing its execution based on OpenMP for multi-core Intel architectures (we measured up to 20% improvement on the Windows platforms).

The results show that the degradation introduced by the codecs is negligible and that the point cloud has an added value in terms of “feeling together” compared to simple avatar representations, highlighting the importance of point clouds in these applications and the importance of work on compression of point clouds and its standardization for immersive and augmented reality systems. 

In Section V.E the authors show in the subjective studies that the color quality degradation introduced by the color coding method is negligible even compared to 8-bit DPCM-based coding.

In the traversal of macroblocks in M_p (2), operations (5) and especially (6) are the most computationally intensive, but can be executed using parallel computation.

While these methods are promising, it seems that methods based on 3D point clouds can result in coding with even less overhead and more flexible progressive rendering capabilities, as the format is simpler to acquire and process.

The percentage that is shared and converged is shown in red, and relates largely to the bitrate savings (which is over 30% for the Dimitrios dataset that has the largest percentage of shared blocks).

These quality evaluation metrics are well aligned with existing practice in mesh and video compression and are recommended for evaluation of point cloud codecs [25]. 

In 3D immersive and augmented 3D video coding, having both low encoder and decoder complexity is important (analogous to video coding in video conferencing systems compared to video on demand).

As shown in Fig. 5, the bounding box of an intra frame is expanded to BB_IE, with lower corner (x_min − bb_exp, y_min − bb_exp, z_min − bb_exp) and upper corner (x_max + bb_exp, y_max + bb_exp, z_max + bb_exp), where bb_exp was computed from δ and the ranges of the original bounding box.