A No-Reference Metric for Perceived Ringing
Artifacts in Images
Hantao Liu, Student Member, IEEE, Nick Klomp, and Ingrid Heynderickx
Abstract—A novel no-reference metric that can automatically
quantify ringing annoyance in compressed images is presented.
In the first step a recently proposed ringing region detection
method extracts the regions which are likely to be impaired by
ringing artifacts. To quantify ringing annoyance in these detected
regions, the visibility of ringing artifacts is estimated, and is
compared to the activity of the corresponding local background.
The local annoyance score calculated for each individual ringing
region is averaged over all ringing regions to yield a ringing
annoyance score for the whole image. A psychovisual experiment
is carried out to measure ringing annoyance subjectively and to
validate the proposed metric. The performance of our metric is
compared to existing alternatives in the literature and is shown
to be highly consistent with subjective data.
Index Terms—Human vision model, image quality assessment,
objective metric, ringing artifact annoyance.
I. Introduction
Objective metrics aim to automatically provide a quantitative
measure for image quality aspects, and to eventually serve as a
computational alternative to expensive image quality assessments
by human observers. They are
of fundamental importance to a broad range of applications,
such as the optimization of digital imaging systems, bench-
marking of image and video coding, and quality monitoring
and control in displays [1]. They are generally classified into
full-reference (FR) metrics and no-reference (NR) metrics,
depending on the use of the original image or video. FR
metrics are based on measuring the similarity or fidelity
between the distorted image and its original version, which
is considered as a distortion-free reference. The most widely
used FR metrics are mean squared error and peak signal-to-
noise ratio. These metrics, however, have long been criticized
for their poor correlation with perceived image quality [1].
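For reference, both measures are simple to compute; a minimal sketch for 8-bit images (illustrative only, not part of any metric discussed here):

```python
import numpy as np

def mse(ref: np.ndarray, dist: np.ndarray) -> float:
    # Mean squared error between a reference and a distorted image.
    return float(np.mean((ref.astype(np.float64) - dist.astype(np.float64)) ** 2))

def psnr(ref: np.ndarray, dist: np.ndarray, peak: float = 255.0) -> float:
    # Peak signal-to-noise ratio in dB; peak = 255 for 8-bit images.
    m = mse(ref, dist)
    return float("inf") if m == 0.0 else 10.0 * np.log10(peak ** 2 / m)
```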
A lot of research effort has been devoted to the development of FR metrics that reflect the way human beings perceive image quality [2]. Improved alternatives of FR metrics include the structural similarity index [3] and the visual information fidelity index [4]. Since FR metrics require access to the original, which is not always available in real-world applications, they are usually employed as tools for in-lab testing of image and video processing algorithms. NR metrics, instead, are more practical because the quality prediction is based on the distorted image only. However, designing NR metrics is still an academic challenge, mainly due to the limited understanding of the human visual system (HVS).

Manuscript received March 12, 2009; revised June 23, 2009 and August 7, 2009. First version published November 3, 2009; current version published April 2, 2010. This paper was recommended by Associate Editor X. Li.
H. Liu and N. Klomp are with the Department of Mediamatics, Delft University of Technology, Delft 2628 CD, The Netherlands (e-mail: hantao.liu@tudelft.nl; n.c.r.klomp@tudelft.nl).
I. Heynderickx is with Philips Research Laboratories, Eindhoven 5656 AE, The Netherlands. She is also with Delft University of Technology, Delft 2628 CD, The Netherlands (e-mail: ingrid.heynderickx@philips.com).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TCSVT.2009.2035848
In recent decades, considerable progress has been made on
the development of NR metrics, and some successful methods
have been reported in the literature [5]–[19]. In [5], natural scene
statistics are used to blindly measure the quality of images
compressed by JPEG2000. The approach in [5] relies on the
assumption that typical natural images exhibit strong statistical
regularities, and therefore, reside in a tiny area of the space
containing all possible images. Based on this assumption it
quantifies image quality by detecting variations in statistical
image features in the wavelet domain. In [6] and [7], NR
image quality assessment is formulated as a machine learning
problem, in which the HVS is treated as a black box whose
input–output relationship, such as the one between image
characteristics and the quality rating, is to be learned. After
appropriate training with subjective data, these models proved
to be able to consistently predict the perceived quality of JPEG
compressed or otherwise distorted images.
A large number of NR metrics, proposed, e.g., in [8]–[19],
are based on directly measuring a specific type of artifact
created by a specific image distortion process, such as blur
caused by acquisition systems, sensor noise, and compression
artifacts. In such a scenario, the design of the NR metric can
make use of the specific characteristics of the artifact, and
therefore, generally predicts perceived quality degradation
more reliably [1]. Fortunately, in many practical applications,
the distortion processes involved are known, and thus,
the design of specific NR metrics turns out to be much more
realistic and useful. They can, for example, be combined to
predict the overall perceived quality. Various examples of this
approach are given in the literature. A blockiness metric (see, e.g.,
[8]–[11]) can be combined with a flatness metric (see, e.g.,
[12] and [13]) to evaluate the quality of images or video after
block-based compression. A ringing metric and a blur metric
are often combined to assess the image quality of wavelet-
based compression (see, e.g., [14]–[16]). In [17] and [18], mul-
tiple artifact metrics are adopted to predict the overall quality
of still images or video. In addition to assessing the overall
image quality, these specific artifact metrics individually are
beneficial for optimizing real-time digital imaging systems
[20]–[22]. In the video chain of current television (TV) sets,
various NR metrics, which quantify the quality of the incoming
video based on the occurrence of individual artifacts, are used
to adapt the parameter settings of the video enhancement algo-
rithms accordingly (see, e.g., [23] and [24]). To optimize the
performance of both applications mentioned above, reliably
modeling specific types of artifacts has clear added value.
With the widespread use of compression, research on NR
metrics has mainly been dedicated to compression artifacts and
transmission errors [25]. In particular, the blocking artifact,
which is one of the most annoying artifacts introduced by
block-based compression algorithms [26], such as JPEG or
MPEG/H.263, has received a lot of attention. Another compression artifact, especially
visible at relatively high-bit rates of block-based compression
[21], [26], but also in wavelet compression [27], is ringing.
Unlike the blocking artifact, whose spatial location is very
regular and thus easily predictable, the location of ringing is
edge dependent, and as such also image content dependent.
This makes the task of quantifying ringing annoyance much
more difficult. In this paper, we present our recent efforts to
develop a NR ringing metric, validate its performance using
a subjective study of ringing annoyance in JPEG compressed
images, and compare its performance against existing ringing
metrics. Before discussing our approach (Section III) and its
performance (Sections IV and V), a more extended explanation
of the occurrence and visibility of ringing, and an overview
of existing ringing metrics are given in Section II.
II. Background
A. Perceived Ringing Artifacts
1) Physical Structure: Current image and video coding
techniques are based on lossy data compression, which con-
tains an inherent irreversible information loss. This loss is
due to coarse quantization of the image’s representation in
the frequency domain. The loss within a certain spectral band
of the signal in the transform domain reveals itself most
prominently at those spatial locations where the contribution
from this spectral band to the overall signal power is significant
(see [26], [27], and [38]). Since the high-frequency compo-
nents play a significant role in the representation of an edge,
coarse quantization in this frequency range (i.e., truncation
of the high-frequency transform coefficients) consequently
results in apparent irregularities around edges in the spatial
domain, which are usually referred to as ringing artifacts.
More specifically, ringing artifacts manifest themselves in the
form of ripples or oscillations around high-contrast edges in
compressed images. They can range from imperceptible to
very annoying, depending on the data source, target bit rate, or
underlying compression scheme [38]. As an example, Fig. 1
illustrates ringing artifacts induced by JPEG compression on
a natural image.
Fig. 1. Illustration of ringing artifacts. (a) Natural image compressed with JPEG (MATLAB's imwrite function with "quality" of 30). (b) Gray-scale intensity profile along one row of the compressed image [indicated by the solid double arrowhead line in (a)]. Dashed lines "e1," "e2," and "e3" indicate the position of the sharp intensity transitions (i.e., edges) along that arrow. Ringing can be perceived as fluctuations in the gray-scale values around the edges at "e1," "e2," and "e3," while the image content here should be uniform.
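The underlying mechanism is easy to reproduce: truncating the high-frequency transform coefficients of a step edge reintroduces it with ripples. A minimal 1-D sketch, with plain truncation standing in for coarse quantization:

```python
import numpy as np
from scipy.fft import dct, idct

# A 1-D step edge, analogous to the sharp transitions marked in Fig. 1(b).
signal = np.concatenate([np.full(32, 50.0), np.full(32, 200.0)])

coeffs = dct(signal, norm="ortho")
coeffs[16:] = 0.0  # drop high-frequency coefficients (stand-in for coarse quantization)
reconstructed = idct(coeffs, norm="ortho")

# The reconstruction now oscillates around the step: ringing.
print(np.round(reconstructed[24:40], 1))
```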
The occurrence of ringing spreads out to a finite region
surrounding the edges, depending on the specific implementa-
tion of the coding technique. For example, in discrete cosine
transform (DCT) coding, ringing appears outwards from the
edge up to the encompassing block’s boundary [26]. An
example of how to calculate the extent of the ringing region
in a particular codec is given in [38]. In addition to the edge
location dependency, the behavior of ringing also depends on
the strength of the edges. It is found in [14], [29], and [38]
that, over a wide range of compression ratios, the variance
of the ringing artifacts is proportional to the contrast of the
associated edge. These important findings have great potential
in the design of a reliable ringing metric, and therefore, are
explicitly adopted in our algorithm.
2) Masking of the HVS: Taking into account the way
the HVS perceives artifacts, while removing perceptual re-
dundancies, can be greatly beneficial for matching objective
artifact measurement to the human perception of artifacts
[39]. Masking designates the reduction in the visibility of one
stimulus due to the simultaneous presence of another, and
it is strongest when both stimuli have the same or similar
frequency, orientation, and location [41]. It is basically due to
the limitations in sensitivity of a certain cell or neuron at the
retina in relation to the activity of its surrounding cells and
neurons. There are two fundamental visual masking effects
highly relevant to the perception of ringing artifacts [28]–[31].
The first one is luminance masking, which refers to the effect
that the visibility of a distortion (such as ringing) is maximum
for medium background intensity, and it is reduced when the
distortion occurs against a very low or very high intensity
background [40]. This masking phenomenon happens because
of the brightness sensitivity of the HVS, where the average
brightness of the surrounding background alters the visibility
threshold of a distortion [42]. The second masking effect
is texture masking, which refers to the observation that a
distortion (such as ringing) is more visible in homogeneous
areas than in textured or detailed areas [40]. In textured
image regions, small variations in the texture are masked by
the macro properties of genuine high-frequency details, and
therefore, are not perceived by the HVS [38]. The effect
of luminance and texture masking on ringing artifacts is
illustrated in Figs. 2 and 3, respectively.
Fig. 2. Example of luminance masking on ringing artifacts. (a) Image patch compressed with JPEG (MATLAB's imwrite function with "quality" of 30). (b) Pixel intensity profile along one row of the compressed image patch [indicated by the solid double arrowhead line in (a)]. The original image includes two adjacent parts with different gray-scale levels (i.e., 5 for "a1" and 127 for "a2"). Note that although both sides of a step edge exhibit ringing artifacts, the visibility of ringing differs.
Fig. 3. Example of texture masking on ringing artifacts. (a) Image patch extracted from a JPEG compressed image of bit rate 0.59 bits per pixel (b/p). (b) Pixel intensity profile along one row of the compressed image patch [indicated by the solid double arrowhead line in (a)]. Dashed line "e" indicates the object boundary edge. Note that although both sides of the edge at "e" exhibit ringing artifacts, the visibility of ringing differs.
B. Existing Ringing Metrics
Until recently, only a limited amount of research effort was
devoted to the development of a ringing metric. Some of these
metrics are FR, others NR. An FR approach presented in [14]
starts by finding important edges in the original image (noise
and insignificant edges are removed by applying a threshold to
the Sobel gradient image), and then measures ringing around
each edge by calculating the difference between the processed
image and the reference. Since this metric needs the original
image, it has its limitations, e.g., for the application in a TV
chain. The NR ringing metric proposed in [17] performs an
anisotropic diffusion on the image and measures the noise
spectrum filtered out by the anisotropic diffusion process. The
basic idea behind this metric is that due to the effectiveness
of anisotropic diffusion on deringing, the artifacts would be
mostly assimilated into the spectrum of the filtered noise. The
NR ringing metric described in [16] identifies the ringing
regions around strong edges in the compressed image, and
defines ringing as the ratio of the activity in middle-low
over middle-high frequencies in these ringing regions. An
obvious shortcoming of the metrics defined in [14], [16], and
[17] is the absence of masking, typically occurring in the
HVS, with the consequence that these metrics do not always
reflect perceived ringing. Typical masking characteristics, such
as luminance and texture masking, are explicitly considered
in the metrics defined in [28] and [29], in which ringing
regions are no longer simply assumed to surround all strong
edges in an image, but are determined by a model of the
HVS. Including an HVS model in an objective metric might
improve its accuracy, but often is computationally intensive
for real-time applications. For example, the HVS model used
in the metric presented in [28] largely depends on a parameter
estimation procedure, which requires a number of calculations
to achieve an optimal selection. The model described in [29] is
based on a computationally heavy clustering scheme, including
both color clustering and texture clustering. From a practical
point of view, it is highly desirable to reduce the complexity
of the HVS-based metric without compromising its overall
performance.
The essential idea behind most of the existing metrics
mentioned so far (see, e.g., in [14], [16], and [28]) is that
they consist of a two-step approach. The first step identifies
the spatial location, where perceived ringing occurs, and the
second step quantifies the visibility or annoyance of ringing
in the detected regions. This approach intrinsically avoids the
estimation of ringing in irrelevant regions in an image, thus
making the quantification of ringing annoyance more reliable,
and the calculation more efficient. Additionally, a local de-
termination of the artifact metric provides a spatially varying
quality degradation profile within an image, which is useful
in, e.g., video chain optimization as mentioned in Section I.
Since ringing occurs near sharp edges, where it is not visually
masked by local texture or luminance, the detection of ringing
regions largely relies on an edge detection method followed by
an HVS model. Existing methods (see, e.g., [14], [16], [28],
and [29]) usually employ an ordinary edge detector, where a
threshold is applied to the gradient image to capture strong
edges. Depending on the choice of the threshold, this runs
the risk of omitting obvious ringing regions near nondetected
edges (e.g., in case of a high threshold) or of increasing
the computational cost by modeling the rather complex HVS
near irrelevant edges (e.g., in case of a low threshold). This
implies that to ensure a reliable detection of perceived ringing
while maintaining low complexity for real-time applications,
an efficient approach for both detecting relevant edges and
modeling the HVS is needed. Quantification of the annoyance
of ringing in the detected areas can be easily achieved by
calculating the signal difference between the ringing regions
and their corresponding reference, as used in the FR approach
described in [14]. However, for a NR ringing metric, the
quantification of ringing becomes more challenging mainly
due to the lack of a reference. Metrics in the literature (such as
in [16] and [28]) estimate the visibility of ringing artifacts
from the local variance in intensity around each pixel within
the detected ringing regions, and average these local variances
over all ringing regions to obtain an overall annoyance score.
This approach, however, has limited reliability, since it does
not include background texture in the ringing regions, which
might affect ringing visibility.
To validate the performance of a ringing metric, its predicted
quality degradation should be evaluated against subjectively
perceived image quality. To prove whether a ringing metric is
robust against different compression levels and different image
content, the correlation between its objective predictions and
subjective ringing ratings must be calculated. Unfortunately,
only the performance of the metric reported in [14] is evaluated
against subjective data of perceived ringing. For all other
metrics (such as the ones in [15]–[17] and [28]) nothing can
be concluded with respect to their performance in predicting
perceived ringing. Since we had no access to the data used
in [14] for our metric evaluation, we performed our own
subjective experiment.¹
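Once per-image subjective ratings are available, such a robustness check reduces to computing correlation coefficients; a sketch with made-up numbers (not data from the experiment reported here):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical values: one metric prediction and one mean subjective
# ringing-annoyance rating per test image.
objective = np.array([0.12, 0.35, 0.48, 0.55, 0.71, 0.90])
subjective = np.array([1.1, 2.4, 3.0, 3.2, 4.1, 4.6])

plcc, _ = pearsonr(objective, subjective)    # linear correlation
srocc, _ = spearmanr(objective, subjective)  # rank-order correlation
print(f"PLCC = {plcc:.3f}, SROCC = {srocc:.3f}")
```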
In this paper, we propose a NR ringing metric based on
the same two-step approach mentioned above. For the first
step, we rely on our ringing region detection method (see [30]
and [31]), the performance of which in terms of extracting
regions with perceived ringing has been shown to be promising
[31]. Therefore, we consider this part of the metric readily
applicable for the second step, in which the ringing annoyance
is quantified. To quantify ringing annoyance, we consider each
detected ringing region as a perceptual element, in which the
local visibility of ringing artifacts is estimated. The contrast
in activity between each ringing region and its corresponding
background is calculated as the local annoyance score, which
is then averaged over all ringing regions to yield an overall
ringing annoyance score. It should be noted that the proposed
metric is built upon the luminance component of images only
in order to reduce the computational load. The performance
of the NR metric is evaluated against subjective ringing
annoyance in JPEG compression.
III. Proposed NR Ringing Metric
A. Perceived Ringing Region Detection
For the design of our ringing region detection method (see
[30] and [31]), we explicitly exploited the specific physical
structure of ringing artifacts and some properties of the
HVS. The overall proposed algorithm is schematically shown
in Fig. 4, which mainly consists of two processing steps:
1) extraction of edges relevant for ringing, which results in
a perceptual edge map (PEM), and 2) detection of perceived
ringing regions, which yields a computational ringing region
(CRR) map. This method is already described in more detail
in [30] and [31], and is only briefly repeated here.
Fig. 4. Schematic overview of the proposed ringing region detection method. In the PEM, each perceptually relevant LS is labeled in a different color. In the CRR map, the white areas indicate the detected perceived ringing regions, and the spatial location of these regions is illustrated in a separate image by green areas.
¹The data collected from this experiment are available to the image quality assessment community on the website http://mmi.tudelft.nl/ingrid/ringing.html.
To extract the most relevant edges for the purpose of ringing
detection, an advanced edge detector is used. It adopts a
bilateral filter [32] to largely smooth “irrelevant edges” (i.e.,
in textured areas), while the position of the “relevant edges”
(e.g., contours of objects) is retained. Subsequently, a Canny
edge detector [33] is applied to the filtered image to obtain the
"relevant edges." The detected edges are combined into line
segments [hereafter referred to as line segments (LSs)], which
are defined as elements of connected edge pixels. These LSs
are constructed over the Canny edge map by a simple grouping
process, including skeletonizing, edge linking, noise removal,
and LS labeling. Fig. 4 shows the extracted PEM, which is
formed by a set of these LSs. It clearly illustrates the selection
of the edges more relevant for ringing (i.e., the contours of the
leopard) in combination with the avoidance of the irrelevant
edges (i.e., the texture in the skin of the leopard).
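A minimal sketch of this PEM extraction with OpenCV, assuming illustrative filter and threshold parameters rather than the values tuned in [30] and [31]:

```python
import cv2

gray = cv2.imread("leopard.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical input file

# Bilateral filter: smooths texture ("irrelevant edges") while the position
# of object contours ("relevant edges") is retained.
smoothed = cv2.bilateralFilter(gray, d=9, sigmaColor=50, sigmaSpace=5)

# Canny on the filtered image keeps mainly the ringing-relevant edges.
edges = cv2.Canny(smoothed, threshold1=50, threshold2=150)

# Crude stand-in for the LS grouping step: label connected edge pixels
# (the actual method also skeletonizes, links edges, and removes noise).
num_ls, ls_labels = cv2.connectedComponents(edges)
```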
To select the edges around which ringing is actually perceived,
each LS of the PEM is examined individually for the occurrence
of perceived ringing. To this end, the region around
a LS is divided into three zones: 1) the edge region (i.e.,
EdReg); 2) the detection region (i.e., DeReg); and 3) the
feature extraction region (i.e., FeXReg). First, the level of
texture or detail is estimated from the FeXReg, and those parts
of the DeReg, in which the visibility of ringing is masked by
texture, are discarded. Subsequently, the average luminance in
each remaining part of the DeReg is calculated and those parts
with a value above or below a certain threshold are discarded.
In this way, only those regions around each LS, in which
ringing is visible, are extracted, and then accumulated in the
CRR map as illustrated in Fig. 4.
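A sketch of this visibility test for a single part of a DeReg; the threshold values below are placeholders, not the calibrated ones from [30] and [31]:

```python
import numpy as np

TEXTURE_THR = 30.0            # assumed activity level above which texture masks ringing
LUM_LOW, LUM_HIGH = 20, 235   # assumed bounds: very dark/bright backgrounds mask ringing

def ringing_visible(dereg_part: np.ndarray, fexreg_part: np.ndarray) -> bool:
    # Texture masking: discard the part if its local background is textured.
    if np.var(fexreg_part) > TEXTURE_THR:
        return False
    # Luminance masking: discard the part if its mean luminance is extreme.
    mean_lum = float(dereg_part.mean())
    return LUM_LOW <= mean_lum <= LUM_HIGH
```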
B. Ringing Annoyance Estimation
The CRR map indicates the spatial location of perceived
ringing, but it does not give any information yet on how
annoying the ringing artifacts in the detected region are. To
quantify ringing annoyance, we first split up the detected
region in the CRR map into so-called ringing objects (ROs).
Fig. 5 illustrates the definition of an RO. It starts from the
LSs of the PEM, shown in Fig. 4. Each LS is considered
to be split up into a set of connected components (i.e., objects)
depending on the local level of texture and averaged luminance
in its DeReg (as defined in [30] and [31]). Then, by using the
model of the HVS, the visibility of ringing in each object
is determined. By removing the objects, in which ringing is
invisible due to masking, the remaining objects are defined as
ROs. As an example, illustrated in Fig. 5(b), LS1 of the
PEM in Fig. 4 is split up into two ROs, while LS2 remains
one RO. Some of the LSs, e.g., LS5, LS6, LS8, and LS9,
do not result in an RO, since no visible ringing is detected
around them based on the HVS. So, each RO intrinsically
is a single cluster resulting from the application of the human
vision model to the LSs of the PEM. Hence, the definition of
an RO fully relies on the local image content, and as such, is
independent of scaling or cropping the image. Once the ROs
are defined [as illustrated in Fig. 5(c)], a ringing annoyance
score (RAS) is calculated for each of them, and the overall
annoyance score for the image is simply the mean of the RAS
over all ROs.
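The final pooling step is then a simple average; a sketch, where `ringing_objects` and `compute_ras` are placeholder names for the RO list of the CRR map and the local scoring developed in the following subsections:

```python
import numpy as np

def overall_ringing_annoyance(ringing_objects, compute_ras) -> float:
    # Image-level score: the mean of the local RAS values over all ROs.
    scores = [compute_ras(ro) for ro in ringing_objects]
    # Returning 0.0 for an image without detected ROs is an assumption here.
    return float(np.mean(scores)) if scores else 0.0
```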
Fig. 5. Illustration of the definition of an RO. (a) Original JPEG image and
two (out of ten) of its detected LSs (i.e., LS1 and LS2 of the PEM in Fig. 4).
(b) Implementation of the human vision model to LS1 and LS2, resulting in
two separate ROs for LS1 and one RO for LS2. (c) All detected ROs as a
result of applying the human vision model to the whole PEM (i.e., ten LSs);
they are indicated with different colors.
Fig. 6. Illustration of region assignment. (a) RO [see “RO3” in Fig. 5(b)]
with its corresponding LS and feature extraction region (FeXReg).
(b) Corresponding edge of LS covered by the dilated RO is assigned as the
Sub-LS. (c) Corresponding region of FeXReg covered by the dilated RO is
assigned as the Sub-FeXReg. (d) Results of region assignment.
The approach taken to quantify perceived ringing is inspired
by the basic idea used in the FR metric [14], and is accom-
plished by the following two steps: 1) calculating the activity
of each RO; and 2) comparing that activity to the activity in
the neighboring background to which the RO belongs.
1) Region Assignment: To implement the two steps men-
tioned above, we first assign two relevant components to each
RO in the CRR map: 1) the edge corresponding to each
LS (i.e., referred to as Sub-LS), which is used to determine
whether a pixel in the RO is a visible ringing pixel, and
2) the corresponding FeXReg region (i.e., referred to as Sub-
FeXReg), which is employed as the reference for the RO. The
FeXReg is located far away from the LS, and thus unlikely
to be impaired by ringing artifacts. This region assignment is
implemented by thickening an RO with a dilation operation.
The corresponding LS and FeXReg which are covered by
the RO during the dilation process are referred to as the
Sub-LS and Sub-FeXReg, respectively. Fig. 6 illustrates this
procedure. A specific RO (i.e., “RO3” in the CRR map of
Fig. 5) with its corresponding LS and FeXReg are shown
in Fig. 6(a). When dilating the RO with a square structuring
element of 5 pixels width (e.g., for an image of 256 × 384
(height × width) pixels), the region of LS which is covered by
the expanded RO is assigned as the Sub-LS [i.e., the yellow
region in Fig. 6(b)]. The Sub-FeXReg [i.e., the purple region
in Fig. 6(c)] is assigned in the same way by dilating the
RO with a square structuring element of 9 pixels width. The
resulting Sub-LS and Sub-FeXReg are shown in Fig. 6(d). It
is noted that the size of the structuring element should be
linearly scaled with the image size. The region assignment
mentioned above is performed for each RO in the CRR map
to eventually obtain a list of coordinates, which indicates the
spatial location of each individual RO and its corresponding
Sub-LS and Sub-FeXReg. Fig. 7 indicates the format of such a
resulting list of coordinates. This way of working intrinsically
facilitates the subsequent local analysis and processing of
image characteristics.
Fig. 7. Illustration of the list of coordinates as the result of region assignment (the total number of ROs in the CRR map is n).
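A sketch of this assignment with binary masks and morphological dilation, using the 5- and 9-pixel square structuring elements quoted above for a 256 × 384 image:

```python
import numpy as np
from scipy.ndimage import binary_dilation

def assign_regions(ro_mask, ls_mask, fexreg_mask, ls_width=5, fex_width=9):
    # All inputs are boolean masks. Thicken the RO and intersect with the
    # LS map: pixels of the LS covered by the dilated RO form the Sub-LS.
    sub_ls = ls_mask & binary_dilation(ro_mask, np.ones((ls_width, ls_width), bool))
    # The same procedure with a wider structuring element yields the Sub-FeXReg.
    sub_fexreg = fexreg_mask & binary_dilation(ro_mask, np.ones((fex_width, fex_width), bool))
    # Both widths should be scaled linearly with the image size.
    return sub_ls, sub_fexreg
```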
2) Local Visibility of Ringing Pixels: Since ringing man-
ifests itself in the form of artificial oscillations in the spatial
domain, its local behavior can be reasonably described as the
intensity variance of pixels in the neighborhood [28], [29]. In
this paper, determining whether a pixel in an RO is a visible
ringing pixel is based on calculating the local variance (LV)
in intensity in its 3 × 3 neighborhood, which is formulated as
$$\mathrm{LV}(i,j)=\frac{1}{9}\sum_{k=i-1}^{i+1}\sum_{l=j-1}^{j+1}\left[I(k,l)-\frac{1}{9}\sum_{k=i-1}^{i+1}\sum_{l=j-1}^{j+1}I(k,l)\right]^{2},\quad (i,j)\in\mathrm{Coord}\{\mathrm{RO}_{n}\}\tag{1}$$

where $\mathrm{LV}(i,j)$ denotes the local variance computed over a 3 × 3 template, centered at pixel $(i,j)$ having an intensity $I(i,j)$, within the $n$th ringing object (i.e., $\mathrm{RO}_{n}$).
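Equation (1) is a standard 3 × 3 local variance and can be evaluated for the whole image at once before reading out the RO coordinates; a sketch:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_variance(intensity: np.ndarray, size: int = 3) -> np.ndarray:
    # Var = E[I^2] - E[I]^2 over a size x size window, equivalent to (1).
    img = intensity.astype(np.float64)
    mean = uniform_filter(img, size)
    mean_sq = uniform_filter(img * img, size)
    return np.maximum(mean_sq - mean ** 2, 0.0)  # clip tiny negative round-off

# Read out only at the coordinates of the nth RO, e.g.:
# lv_values = local_variance(image)[ro_rows, ro_cols]
```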
The LV only yields an accurate result in case the RO is
originally smooth around the edge; otherwise, the LV can be
high due to the activity of a textured or edge pixel.
One would expect that the issue of considering texture as
ringing is efficiently avoided by the application of a texture
masking model in the ringing region detection phase (see
[30] and [31]). However, we observed that the dilation
operation used in the human vision model may misclassify
certain edge or texture components into an RO. In addition,
there might be pixels in the RO exhibiting no or a very small
intensity variance in their neighborhoods, which means they
are not impaired by ringing artifacts (e.g., in higher bit rate
compression). This implies that an RO still possibly contains
spurious ringing pixels, which manifest themselves either as
“noisy pixels” (i.e., misclassified edge or texture pixels) or
as “unimpaired pixels” (i.e., pixels with a very low variance
in intensity in the neighborhood). Fig. 8 gives an example
of the image content underneath a detected RO (i.e., “RO2”
as illustrated in Fig. 5), where noisy pixels and unimpaired
pixels coexist with real ringing pixels. Calculating the LV
References

Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004.

J. Canny, "A computational approach to edge detection," IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-8, no. 6, pp. 679–698, Nov. 1986.

C. Tomasi and R. Manduchi, "Bilateral filtering for gray and color images," in Proc. IEEE Int. Conf. Computer Vision (ICCV), 1998, pp. 839–846.

H. R. Sheikh and A. C. Bovik, "Image information and visual quality," IEEE Trans. Image Process., vol. 15, no. 2, pp. 430–444, Feb. 2006.

D. S. Taubman and M. W. Marcellin, JPEG2000: Image Compression Fundamentals, Standards and Practice. Boston, MA: Kluwer, 2002.