TOWARDS A POINT CLOUD STRUCTURAL SIMILARITY METRIC
Evangelos Alexiou and Touradj Ebrahimi
Multimedia Signal Processing Group (MMSPG)
École Polytechnique Fédérale de Lausanne (EPFL)
Emails: FirstName.LastName@epfl.ch
ABSTRACT
Point cloud is a 3D image representation that has recently
emerged as a viable approach for advanced content modality
in modern communication systems. In view of its wide adop-
tion, quality evaluation metrics are essential. In this paper, we
propose and assess a family of statistical dispersion measure-
ments for the prediction of perceptual degradations. The em-
ployed features characterize local distributions of point cloud
attributes reflecting topology and color. After associating lo-
cal regions between a reference and a distorted model, the cor-
responding feature values are compared. The visual quality
of a distorted model is then predicted by error pooling across
individual quality scores obtained per region. The extracted
features aim at capturing local changes, similarly to the well-
known Structural Similarity Index. Benchmarking results us-
ing available datasets reveal best-performing attributes and
features, under different neighborhood sizes. Finally, point
cloud voxelization is examined as part of the process, improv-
ing the prediction accuracy under certain conditions.
Index Terms: Point cloud, objective quality metric, visual quality assessment.
1. INTRODUCTION
Objective quality evaluation is a research area that aims at
investigating and proposing algorithms that are able to pre-
dict the visual quality of content representations, typically as
perceived by human end-users. This research field is impact-
ful on several tasks that are related to information and com-
munication systems. For instance, having access to accurate
predictions of quality for contents after encoding or trans-
mission can greatly assist in improving user experience by
updating corresponding configurations of the underlying sys-
tems. Moreover, the benchmarking of new solutions can be
facilitated by using well-performing objective quality metrics
instead of subjective evaluation experiments. The latter are
considered to reveal ground truth scores of visual quality; yet, they are costly and cumbersome, as well as limited in terms of large scale realization and ad-hoc implementation.

(This work has been conducted in the framework of the Swiss National Foundation for Scientific Research project Advanced Visual Representation and Coding in Augmented and Virtual Reality (FN 178854).)
The development of algorithms to accurately predict the
level of distortion introduced in content representations un-
der realistic types of degradations (e.g., compression, noise)
has been at the center of attention for the research commu-
nity for many years. The initial focus was naturally drawn
by conventional images, where it was early understood that
naive implementations of error quantification on a pixel-by-pixel basis, e.g., Mean Square Error (MSE), did not correlate well with human judgements. As a consequence, efforts
were concentrated on approaches that consider characteris-
tics of the human visual system. These, in principle, can be
categorized as bottom-up, and top-down. The former denote
theoretical approaches that aim at measuring perceived errors
in a content, whereas the latter signify engineering solutions
that aim at capturing properties of human visual perception.
Objective quality metrics can also be clustered based on the
availability of the original version of the content at run-time
as full-reference, reduced-reference and no-reference metrics.
In the field of three-dimensional imaging, top-down, full-
reference approaches are the most common. Although largely
explored in the case of polygonal meshes, objective quality
evaluation for point clouds still remains a widely open prob-
lem. This type of media content has recently drawn a signif-
icant amount of interest due to the flexibility and efficiency
in acquisition, processing and rendering. In fact, the MPEG
standardization body is releasing its first point cloud coding
standard, which will facilitate compatibility across devices,
while the relevant efforts have triggered research on the devel-
opment of more efficient encoding solutions. However, there
is only a limited number of available objective quality assess-
ment methods so far with reported weaknesses [1, 2], which
urges the demand for better-performing algorithms.
In this study we aim to shed light on the effectiveness of a
wide range of features that we construct from explicit and/or
implicit information that is carried in a point cloud model.
The adopted features are estimators of statistical dispersion
in local neighborhoods, computed using several related for-
mulas. As part of the study, the impact of the neighborhood
size on the obtained quality scores is analysed, based on stan-
dardized benchmarking indexes. Moreover, the impact of a
978-1-7281-1485-9/20/$31.00 © 2020 IEEE

voxelization step prior to the computation of features is inves-
tigated, obtaining promising results that lead us to introduce the concept of “multi-scale” approaches in point clouds.
The performance evaluation is conducted using two available
datasets with diverse characteristics. This study can be re-
garded as an exploration of the applicability of the Structural
Similarity (SSIM) index [3] in a higher dimensional, irregular
space (volumetric content), incorporating not only color, but
also topological coherence among local regions.
2. RELATED WORK
Polygonal mesh representation has been the prevailing 3D
modality in computer graphics. Thus, a substantial amount
of related work has preceded and is reported in the liter-
ature. In particular, existing objective quality metrics for
meshes can be divided in two categories, namely, “image-
based” and “model-based” [4]. The first essentially exploits
the development of high-performing solutions from 2D imag-
ing, which are applied on a set of representative views of the
model. The second category relies on geometric errors, dihe-
dral angles [5], curvature statistics [6], and roughness mea-
surements [7], among others. The reader can refer to [8] for
an excellent review study on this topic.
Point cloud objective quality metrics can also be clustered as “image-based” and “model-based” approaches, similarly to their mesh modelling counterparts. The idea of converting point clouds to meshes prior to the application of relevant algorithms was quickly discarded, as this additional processing step is commonly lossy. As with mesh models, “image-based” metrics rely on 2D imaging algorithms applied on model views [9, 10]. Although able to capture geometry, color and rendering distortions, this type of metric is governed by the selection of viewpoints (i.e., camera po-
sition and distance), the rendering mechanism to consume the
content [1], as well as environmental and lighting conditions
set on the virtual scene, which affect the perception of colors.
The most common “model-based” approaches so far as-
sess geometry and depend on Euclidean distances (point-to-
point), or projected errors along normal vectors (point-to-
plane) [11]. Algorithms based on similarity of local surface
approximations (plane-to-plane) [12] and curvature statistics
(PC-MSDM) [13] have been introduced, showing advances
in predicting the perceived distortions, especially in colorless
point cloud datasets. In [14], measurements of color errors
based on MSE and PSNR, applied either on the RGB or the
YCbCr color space, have been proposed. More recently, the
first attempts to combine geometry and color features have
been reported in [15, 16]. In [15], the evaluation of geome-
try distortions is based on curvature statistics, whereas sev-
eral color features based on lightness, chroma and hue are
examined, showing that lightness comparison and structure
perform better. In [16], color statistics such as histograms are
used to predict the texture distortion of point cloud contents,
and a linear combination of geometry and color measures is proposed to obtain a global indicator of distortion.

Fig. 1: Feature extraction. (An input point cloud C passes through neighborhood formulation, quantity computation, and feature extraction, producing the features F_C.)

Fig. 2: Structural similarity score computation. (Features F_X and F_Y, extracted from the neighborhoods of X and Y, are compared through their relative difference and pooled by error pooling into the score S_Y.)
In this work, we extend previous efforts by generalizing the feature extraction process, while also exploring a higher di-
mensional feature space to obtain visual quality scores; that
is, we include information from additional point cloud at-
tributes, such as geometry and normal vectors, and evaluate
their efficiency. Moreover, we propose voxelization of mod-
els to improve prediction accuracy.
3. POINT CLOUD STRUCTURAL SIMILARITY
To measure structural similarity, we construct features that
quantify statistical dispersion of quantities that characterize
local topology and appearance of a point cloud. To capture lo-
cal properties, neighborhoods are formed around every point
of a model. Quantities to reflect local properties are computed
from point cloud attributes, which are either present (e.g., ge-
ometry), or can be estimated in case of absence (e.g., normal
vectors). In this study, four types of attributes are explored:
(i) geometry, (ii) normal vectors, (iii) curvature values, and
(iv) colors; yet, the same working principle can be easily ex-
tended to any other attribute.
The geometry-related quantities, in our case, are com-
puted based on Euclidean distances between a point of focus
and each point belonging to its neighborhood, and are em-
ployed to assess the coherence of the local geometric struc-
ture. The normal-related quantities are obtained by comput-
ing the angular similarity between the normal vector of a par-
ticular point and each neighbor, in order to examine the uni-
formity of the shape of the local surface. For the same pur-
pose, the curvature values are used. Finally, the color-related
quantities consist of luminance values that are employed to
estimate the local contrast, similarly to SSIM [3].
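As an illustration, the geometry- and normal-related quantities above can be sketched in a few lines of Python. This is a hypothetical helper, not the authors' MATLAB prototype; in particular, the exact angular similarity formula is an assumption here, following the orientation-invariant form used in plane-to-plane comparisons [12]:

```python
import math

def euclidean_distances(p, neighbors):
    # Geometry-related quantities: distances from the point of focus p
    # to each point of its neighborhood.
    return [math.dist(p, q) for q in neighbors]

def angular_similarity(n1, n2):
    # Normal-related quantity between two unit normal vectors; the
    # absolute dot product makes the measure orientation-invariant.
    dot = abs(sum(a * b for a, b in zip(n1, n2)))
    theta = math.acos(min(1.0, dot))      # angle in [0, pi/2]
    return 1.0 - 2.0 * theta / math.pi    # 1 for parallel, 0 for orthogonal
```

Dispersion statistics are then applied over such per-neighborhood populations of values.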
The features are extracted per neighborhood after ap-
plying dispersion statistics on the aforementioned quantities.
Statistical dispersion measurements are often utilized to es-
timate scale parameters; that is, population parameters that
indicate the spread of a distribution. Several such estima-
tors exist. Yet, in this study, the following are adopted: median (m_A), variance (σ²_A), mean absolute deviation (µAD_A), median absolute deviation (mAD_A), coefficient of variation (COV_A), and quartile coefficient of dispersion (QCD_A), using Equations 1-4 for the last four metrics, respectively

µAD_A = E(|A − µ_A|)   (1)

mAD_A = E(|A − m_A|)   (2)

COV_A = σ_A / µ_A   (3)

QCD_A = (Q_A(3) − Q_A(1)) / (Q_A(3) + Q_A(1))   (4)

where µ_A indicates the mean, σ_A the standard deviation, and Q_A(i) denotes the i-th quartile of a data set A.
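A minimal stdlib-only Python sketch of the four dispersion estimators of Equations 1-4 (function names are ours; the authors' prototype is in MATLAB, and the population standard deviation is an assumption of this sketch):

```python
from statistics import mean, median, pstdev, quantiles

def mu_ad(a):
    # Eq. (1): mean absolute deviation about the mean.
    mu = mean(a)
    return mean(abs(x - mu) for x in a)

def m_ad(a):
    # Eq. (2): mean absolute deviation about the median.
    m = median(a)
    return mean(abs(x - m) for x in a)

def cov(a):
    # Eq. (3): coefficient of variation (population std / mean).
    return pstdev(a) / mean(a)

def qcd(a):
    # Eq. (4): quartile coefficient of dispersion, from the first and
    # third quartiles (statistics.quantiles, default 'exclusive' method).
    q1, _, q3 = quantiles(a, n=4)
    return (q3 - q1) / (q3 + q1)
```

Each of these, applied over a per-neighborhood population of attribute quantities, yields one feature value.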
A schematic diagram of the feature extraction process, as explained above, is illustrated in Figure 1, with C denoting an input point cloud and F_C indicating the output features. Note that each combination of dispersion estimator and attribute quantity leads to a different F_C. For simplicity, the reference to features henceforth implies a particular combination of these parameters, without limiting the generality.
The perceptual quality prediction is based on the similar-
ity of feature values that are extracted from a reference X and
a point cloud under evaluation Y , as presented in Figure 2.
For this purpose, each neighborhood of Y is associated with a
neighborhood of X, by identifying for every point p of Y its
nearest point q in X. Then, the similarity is measured as the
relative difference between the corresponding feature values,
using Equation 5
S_Y(p) = |F_X(q) − F_Y(p)| / (max{|F_X(q)|, |F_Y(p)|} + ε)   (5)

with ε expressing an arbitrarily small number to avoid undefined operations; here, we set ε equal to the machine rounding error for floating point numbers. A total similarity score S_Y for the model under evaluation Y is estimated through error pooling across all N_p points, based on Equation 6

S_Y = (1/N_p) Σ_{p=1}^{N_p} S_Y(p)^k   (6)

with k = {1, 2}, denoting the mean and MSE, respectively.
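Equations 5 and 6 can be sketched as follows in plain Python, with brute-force nearest-neighbor association. This is a hypothetical simplification of the authors' MATLAB prototype, which would rely on an efficient spatial index rather than a linear scan:

```python
import math
import sys

def structural_similarity_score(pts_x, feat_x, pts_y, feat_y, k=1):
    # For every point p of Y, find its nearest point q in X, compare the
    # corresponding feature values with the relative difference of Eq. (5),
    # and pool the per-point scores according to Eq. (6).
    eps = sys.float_info.epsilon  # machine rounding error, as in the paper
    total = 0.0
    for p, f_p in zip(pts_y, feat_y):
        q = min(range(len(pts_x)), key=lambda i: math.dist(p, pts_x[i]))
        f_q = feat_x[q]
        s = abs(f_q - f_p) / (max(abs(f_q), abs(f_p)) + eps)  # Eq. (5)
        total += s ** k
    return total / len(pts_y)  # Eq. (6); k=1 gives the mean, k=2 the MSE
```

Swapping the roles of X and Y yields the two asymmetric scores discussed in Section 4.2.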
4. VALIDATION METHODOLOGY
4.1. Datasets
Two datasets of static voxelized point clouds under com-
pression artifacts from state-of-the-art codecs are used [1, 2]
for performance evaluation. Hereafter, we refer to them as
MPCQA and IRPC, accordingly.

¹ A prototype MATLAB implementation is made available online at https://www.epfl.ch/labs/mmspg/pointssim/.

Fig. 3: Model pre-processing. With dashed lines, the step introduced for measurements at different scales is indicated.

The first dataset consists of 8 point clouds whose geometry and color were compressed at 6 levels using five codecs, namely, V-PCC and the four G-PCC test model variations. The point clouds were evaluated
in an interactive platform side-by-side, using point-based ren-
dering with adaptive point size.
The second dataset consists of 6 point clouds whose ge-
ometry was compressed using three codecs, namely, V-PCC,
G-PCC (TriSoup module) and PCL, at three degradation lev-
els. The point clouds were evaluated in three different ses-
sions, with one of them being relevant to this study; that is, a fixed-size point-based rendering with color information ob-
tained from the original models after a re-coloring step. The
point clouds were evaluated in a passive scenario, using video
sequences of both the reference and the distorted models,
which were shown one after the other.
Both datasets consist of point clouds with diverse charac-
teristics, resulting from the different nature of the represented
models and the acquisition technologies that were employed.
Moreover, the wide span of encoding schemes that were used
led to different types of artifacts, making them representative
and suitable candidates for benchmarking purposes.
4.2. Structural similarity computation
Prior to feature extraction, a pre-processing methodology out-
lined in Figure 3 is proposed and followed in this study. In
particular, point fusion is recommended in order to enable
identification of duplicated coordinates in a point cloud. In
our implementation, the redundant locations are discarded
and the corresponding color values are blended. This step
prevents enlisting the same location in neighborhood formu-
lation more than once (Figure 1), and eliminates unnecessary
correspondences in neighborhood association (Figure 2).
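The fusion step can be sketched as follows (a hypothetical Python helper, assuming exactly comparable, e.g. integer, coordinates):

```python
from collections import defaultdict

def fuse_points(points, colors):
    # Discard duplicated coordinates and blend (average) the colors of
    # all points that share the same location.
    groups = defaultdict(list)
    for p, c in zip(points, colors):
        groups[tuple(p)].append(c)
    fused_pts, fused_cols = [], []
    for p, cs in groups.items():
        fused_pts.append(p)
        fused_cols.append(tuple(sum(ch) / len(cs) for ch in zip(*cs)))
    return fused_pts, fused_cols
```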
High-quality surface approximations are essential to ben-
efit from curvature- and normal-based features. To estimate
relevant attributes, we adopt the algorithm described in [15]
for quadric surface fitting. In this context, the k-nearest neigh-
bors of each point are initially identified (k = 12 in our case).
A Principal Component Analysis (PCA) is then applied to provide an orthonormal basis and a linear approximation of the local surface, which passes through the centroid of the neighborhood. A least-squares error quadratic fitting function is
computed across the normal of the plane, after transferring
the origin of the orthonormal basis from the centroid to the
transformed point of focus. The normal vector in this new co-
ordinate system is obtained by simply computing the gradient
of the locally fitted quadric surface at that point. Then, the
inverse transform brings the estimated normal vector back to
the original coordinate system. Moreover, the mean curvature
Fig. 4: MPCQA dataset. PCC (thick bars) and SROCC (thin bars) are grouped per metric. In every group, the neighborhood size is 6, 12, 24 and 48, from left to right. Panels: (a) geometry-based features, (b) normal-based features, (c) curvature-based features, (d) color-based features.

Fig. 5: IRPC dataset, rcolor session. PCC (thick bars) and SROCC (thin bars) are grouped per metric. In every group, the neighborhood size is 6, 12, 24 and 48, from left to right. Panels: (a) geometry-based features, (b) normal-based features, (c) curvature-based features, (d) color-based features.
value at the point of focus is computed from the coefficients
of the fitted quadric surface, as described in [15].
The feature extraction process described in Section 3 is subsequently performed. With regard to the neighborhood for-
mulation (Figure 1), two approaches are common, namely, the
k-nearest neighbor, and the range search. In the first case, the
set is extended until the specified number of points is reached,
whereas in the second case, the set consists of points whose
distance is smaller than the specified range. Thus, in the for-
mer case, the range of the set is adaptive in terms of size and
the number of points is fixed, whereas in the latter case the
range is fixed and the number of points can vary. In our implementation, we follow the k-nearest neighbor approach in order to fix the population of quantities (distributions) over which the features are computed. To examine the impact of the neighborhood size on the prediction accuracy, in our analysis k takes
values from the set {6, 12, 24, 48}.
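The two neighborhood formulations can be contrasted with a brute-force sketch (hypothetical helpers; a real implementation would rely on a k-d tree or similar spatial index):

```python
import math

def knn_neighborhood(points, index, k):
    # k-nearest-neighbor formulation: the number of points is fixed (k),
    # while the spatial extent of the neighborhood adapts to the density.
    center = points[index]
    others = sorted((i for i in range(len(points)) if i != index),
                    key=lambda i: math.dist(center, points[i]))
    return others[:k]

def range_neighborhood(points, index, radius):
    # Range-search formulation: the spatial extent is fixed (radius),
    # while the number of points varies with the density.
    center = points[index]
    return [i for i in range(len(points))
            if i != index and math.dist(center, points[i]) <= radius]
```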
A structural similarity score for a model under evaluation
is computed based on relative differences of feature values
using Equation 6, after following the methodology described
in Section 3, and illustrated in Figure 2. In our analysis, the
same procedure is repeated using both the original and the
distorted models as a reference, resulting in quality scores
that are referred to as asymmetric-original and asymmetric-
distorted, respectively. The symmetric error is also computed
as the minimum out of the two asymmetric scores, leading to
a total of 144 quality predictors per neighborhood size.
Finally, we explore the prediction potentials of structural
similarity measurements obtained from scaled point clouds.
For this purpose, a voxelization step is introduced in our pre-
processing pipeline, as depicted in Figure 3. Voxelization is
realized by quantizing the coordinates of a model and by color
blending between points that fall in the same voxel. The res-
olution of the voxel grid is defined by a target voxel depth.
In our implementation, no clipping is applied on coordinates
lying outside of the grid, to avoid introducing extra loss.
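Voxelization as described above can be sketched in Python. This is a hypothetical simplification: it assumes non-negative integer input coordinates already expressed at a source bit-depth, and shifts them down to the target voxel depth:

```python
from collections import defaultdict

def voxelize(points, colors, source_depth, target_depth):
    # Quantize integer coordinates from a 2^source_depth grid down to a
    # 2^target_depth grid, and blend the colors of points that fall in
    # the same voxel. No clipping is applied to out-of-grid coordinates.
    shift = max(0, source_depth - target_depth)
    buckets = defaultdict(list)
    for p, c in zip(points, colors):
        v = tuple(x >> shift for x in p)
        buckets[v].append(c)
    out_pts = list(buckets)
    out_cols = [tuple(sum(ch) / len(cs) for ch in zip(*cs))
                for cs in buckets.values()]
    return out_pts, out_cols
```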
4.3. Benchmarking
To evaluate how well an objective metric is able to estimate
perceptual quality, Mean Opinion Scores (MOS) computed
from ratings of subjects that participate in an experiment are
required and serve as ground truth. The objective quality
scores are typically benchmarked after application of a regres-
sion model. In our case, the logistic function is used following
the Recommendation ITU-T P.1401 [17]. The Pearson linear
correlation coefficient (PCC), the Spearman rank order corre-
lation coefficient (SROCC), and the Root-Mean-Square Error
(RMSE) are computed to conclude on the linearity, mono-
tonicity, and accuracy of objective predictors, respectively.
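The three benchmarking indexes can be computed with stdlib-only helpers (a sketch: the SROCC version below assumes no tied values, and the logistic regression step of ITU-T P.1401 is omitted):

```python
from statistics import mean

def pcc(a, b):
    # Pearson linear correlation coefficient.
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den

def srocc(a, b):
    # Spearman rank order correlation: PCC applied on ranks (no ties assumed).
    def ranks(v):
        order = sorted(range(len(v)), key=v.__getitem__)
        r = [0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    return pcc(ranks(a), ranks(b))

def rmse(a, b):
    # Root-Mean-Square Error between predicted and ground-truth scores.
    return mean((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
```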
5. EXPERIMENTAL RESULTS
In Figures 4 and 5, the benchmarking results are provided
for both testing datasets. Each figure indicates all the dis-
persion estimators (i.e., metrics) that were employed, per at-
tribute. In the provided plots, the thick bars correspond to
the PCC index, while the thin bars indicate the SROCC. They
are grouped per metric, indicated on the x-axis, and in each
group, the four selected neighborhoods are displayed in an in-
creasing order. The reported results correspond to predictions
based on asymmetric-distorted scores, which were found to perform slightly better with respect to the alternatives.
In Figure 4, the performance of the metrics is presented
for the MPCQA dataset. It is evident that color-based features outperform the rest, achieving high scores, with the best being a PCC of 0.928 and SROCC of 0.920, for σ² and k = 12.
We observe that the median is the worst-performing metric, although it still achieves good results. In principle, the neigh-
borhood size is not critical, albeit better performance is ob-
tained for the majority of the metrics in mid-ranges (i.e., k
equal to 12 or 24). The curvature-based features are the sec-
ond best-performing solutions in this dataset. For the disper-
sion estimators that work best, namely, σ², µAD and mAD,
the number of neighbors k is also not crucial. Regarding
geometry-based features, they are rather unstable with respect
to the local region size, while the majority of normal-based
features tend to improve as the neighborhoods are expanding.
In Figure 5, similar plots are employed to present the per-
formance of the metrics in the IRPC dataset. The color-based
features are found again to be the most accurate predictors.
However, their performance is notably deteriorated with re-
spect to the MPCQA dataset, achieving a maximum of 0.792
PCC and 0.643 SROCC using COV with k = 12. This perfor-
mance decrease can be explained by the fact that the color is
not directly degraded in this dataset. Nonetheless, distortions
are inherently added from point re-positioning and downsam-
pling due to geometry encoding. The second best option is
given by the geometry-based features, in regard to the PCC
index. However, the low SROCC values indicate that the pre-
dictions are not very reliable. The majority of features that
capture uniformity of surface shape perform very poorly, with
the exception of some metrics, namely, σ², µAD and mAD,
applied on curvature values. The general poor performance
can be explained by the fact that (a) the dataset consists of
several rather noisy point clouds, and (b) the original color
values used for the decompressed models act as distractors.
Our results are in accordance with the findings reported in [15].
5.1. Towards multi-scale structural similarity
The performance of structural similarity measurements is finally investigated after point cloud scaling. The latter is implemented through voxelization, which enables color smoothing and regular down-sampling of geometry, only for models whose original geometric resolution is higher than the target voxel depth.² Moreover, color blur is introduced, reducing blocking artifacts, simulating visual inspection from far distances. Note that if the original voxel depth is smaller than the target, the color distribution remains unaltered, while the topology is up-scaled without impacting the number of points.

² This essentially applies when the effective voxel depth of a model is larger than the target; "effective" implies no prior up-scaling.

Fig. 6: Performance of color-based features after voxelization. PCC (thick bars) and SROCC (thin bars) are grouped per metric. In every group, the leftmost bar corresponds to no voxelization, whereas the rest of the bars correspond to voxel depths obtained after decreasing the lowest reference resolution of the dataset by 0, 1, 2 and 3, from left to right. Panels: (a) MPCQA dataset with k = 12, (b) IRPC dataset with k = 24.
In Figure 6a, the performance of color-based features is
presented in the MPCQA dataset, after voxelization using bit-
depths equal and below the lowest resolution among original
models (i.e., this dataset consists of 9-bit and 10-bit models).
The best results are obtained with voxel depth equal to 9, with
a maximum PCC of 0.929 and SROCC of 0.936 using σ² with
k = 12. It can be observed that as the target voxel depth further decreases, the performance of the metrics deteriorates. This can be explained by the increasing levels of blurring
artifacts that appear at lower bit-depths, which are enhanced
for models with severe color compression distortions. For this
demonstration, k was chosen equal to 12, while very similar
trends are obtained for other neighborhood sizes.
The benefits of model voxelization are more evident in our next example. In Figure 6b, the performance of color-
based features is demonstrated for the IRPC dataset, follow-
ing the same rationale (i.e., this dataset consists of 10-bit and
12-bit models). Based on our results, a remarkable perfor-
mance increase is observed, with a maximum of 0.893 for
PCC and 0.832 for SROCC at 8-bit voxel depth using mAD
with k = 24. Similar tendencies are noted for other k values.
This outcome is explained by the fact that geometry degradations exhibited in blocks, as well as the resolution of the blocks over which color blending is performed, affect the output color values. In other terms, through voxelization, geometry degradations are reflected in the output color values.
In general, voxelization can be seen as a way to reduce
cross-content resolution, potentially providing a more suit-
able scale for objective quality predictors. However, the identification of appropriate voxel depths for point cloud models or datasets is not explored in this study.
In Table 1, the best-performing metrics reported in the literature and attained in this study are summarized. For the latter, results including and excluding model voxelization are handled separately, and are presented following the notation: (attribute, metric, neighborhood, voxel depth).
References
- Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Transactions on Image Processing: introduces the SSIM index, which assesses quality from the degradation of structural information.
- “Geometric distortion metrics for point cloud compression”: proposes the intrinsic resolution of point clouds as a normalizer to convert mean square errors to PSNR values.
- “A Multiscale Metric for 3D Mesh Visual Quality Assessment”: a full-reference 3D mesh quality metric that compares meshes with arbitrary connectivity or sampling density and produces a visual distortion map.
- “Watermarked 3-D Mesh Quality Assessment”: objective metrics derived from surface roughness measures, assessed on the perceptual impact of 3D watermarking.
- “Point Cloud Quality Assessment Metric Based on Angular Similarity”: a metric shown to predict the visual quality of point clouds under realistic distortions, such as octree-based compression, better than the state of the art.