Journal Article

State of the "Art": A Taxonomy of Artistic Stylization Techniques for Images and Video

Jan Eric Kyprianidis¹, John Collomosse², Tinghuai Wang², Tobias Isenberg³

University of Potsdam¹, University of Surrey², French Institute for Research in Computer Science and Automation³

01 May 2013, IEEE Transactions on Visualization and Computer Graphics (IEEE), Vol. 19, Iss. 5, pp. 866-885

TL;DR: This paper presents a taxonomy of the 2D NPR algorithms developed over the past two decades, structured according to the design characteristics and behavior of each technique, and describes a chronology of development from the semiautomatic paint systems of the early nineties through to the automated painterly rendering systems of the late nineties driven by image gradient analysis.


Abstract: This paper surveys the field of nonphotorealistic rendering (NPR), focusing on techniques for transforming 2D input (images and video) into artistically stylized renderings. We first present a taxonomy of the 2D NPR algorithms developed over the past two decades, structured according to the design characteristics and behavior of each technique. We then describe a chronology of development from the semiautomatic paint systems of the early nineties, through to the automated painterly rendering systems of the late nineties driven by image gradient analysis. Two complementary trends in the NPR literature are then addressed, with reference to our taxonomy. First, the fusion of higher level computer vision and NPR, illustrating the trends toward scene analysis to drive artistic abstraction and diversity of style. Second, the evolution of local processing approaches toward edge-aware filtering for real-time stylization of images and video. The survey then concludes with a discussion of open challenges for 2D NPR identified in recent NPR symposia, including topics such as user and aesthetic evaluation.


Summary (10 min read)


1 INTRODUCTION

As the advent of photography stimulated artistic diversity in the late 19th century, so did the successes of photorealistic computer graphics in the early nineties motivate alternative techniques for rendering in non-photorealistic styles.
It is this latter category of artistic rendering (AR) that forms the subject of this survey; specifically, techniques focusing on artistic stylization of two-dimensional content (photographs and video) to which the authors refer as image-based artistic rendering (IB-AR).
Today, IB-AR has diversified into a highly cross-disciplinary activity, which builds upon computer vision (CV), perceptual modeling, human-computer interaction (HCI), and computer graphics.

Late 1980s Advances in media emulation

The chronology runs from the semi-automated SBR systems of the early nineties to increasingly automated systems drawing upon image processing.
Later, the aesthetic gamut is enhanced through more sophisticated computer vision and edge-aware filtering.
Next, the authors describe how the early convergence of computer graphics and image processing developed, enabling IB-AR to draw increasingly upon the more sophisticated image analysis offered by contemporary computer vision algorithms (Sec. 4).
One consequence of the increasingly sophisticated interpretation or 'parsing' of the image was a divergence from SBR to alternative forms of rendering primitives: the use of regions and tiles which, in turn, unlocked greater diversity in the gamut of styles available to IB-AR.

2 TAXONOMY OF IB-AR TECHNIQUES

Early prototype IB-AR systems followed the SBR paradigm and synthesized artistic renderings by incrementally compositing virtual brush strokes whose color, orientation, scale, and ordering were derived from semi- [47] or fully automated processes [55], [90], [151].
The problems of media emulation and stroke placement may be considered decoupled.
The curved spline strokes placed by Hertzmann's [55] algorithm could be rendered by sweeping various brush models along their trajectories, to emulate thick oil paint, crayon, charcoal, or pastel, to name but a few different media.
The authors also survey nonlinear filters that introduce an anisotropy that conveys the impression of stroke placement.
Accordingly, their taxonomy avoids the categorization of IB-AR purely in terms of media (painterly, sketch, cartoon shading) and instead clusters the space of IB-AR algorithms by the elementary rendering primitive or stylization mechanism employed.

2.1 Stroke-based Rendering (SBR)

SBR algorithms cover a 2D canvas with atomic rendering primitives according to some process or desired end goal, designed to simulate a particular style.
In many SBR algorithms these primitives are the eponymous virtual brush stroke, but the definition of SBR has diversified to primitives including tiles, stipples and hatch marks [58].

2.1.1 Brush Stroke Techniques

The most prevalent forms of IB-AR are perhaps SBR algorithms using either short dabs of paint or long curved brush strokes as rendering primitives.
The process of covering the canvas can be categorized broadly as local or global.
Local approaches typically drive stroke placement decisions based on the pixels in the spatial neighborhood of the stroke; this can be explicit in the algorithm (e.g., image moments within a window [140], [151]) or implicit due to a prior convolution (e.g., Sobel edges).
Global approaches have applied various optimization strategies, from snake relaxation [56] to evolutionary algorithms [17] and Monte-Carlo optimization [150].
In the parallel SBR branch of semi-automated (i.e., user-assisted) algorithms, the low/high-level distinction is again mirrored, with early techniques relying on image filters to orient brush strokes [47] and later work (predating automated measures for emphasis) using gaze trackers to directly harness the perceptual measures inherent in the human visual system [131].

2.1.2 Mosaicking, Tiling and Stippling

A further sub-category of SBR aims to approximate the image using a medium other than colored pixels or paint, packing image regions with a multitude of atomic rendering primitives.
The techniques approximate the image content by (i) stippling, the distribution of small points, often for the purpose of tonal depiction; (ii) hatching, the use of line patterns or curves for the same purpose; or (iii) mosaicking, the packing of small tiles.
Stippling IB-AR techniques are closely related to digital half-toning and dithering algorithms that locally approximate regions using dot patterns, either with the sole goal of representing a local brightness or with an additional artistic intent [114].
This culminated most recently in techniques designed to emphasize image structure [118], following the trend toward perceptual analysis in SBR.
Aside from dedicated image-based hatching approaches [129], some techniques grow labyrinthine patterns using space-filling curves [26] or reaction-diffusion processes [123] that adapt to the intensity of the image.

2.2 Region-based Techniques

Much as SBR in the 1990s relied increasingly on low-level image processing (e.g., intensity gradient, moments, optical flow), a trend post-2000 was the emergence of mid-level computer vision in IB-AR.
For images, the authors categorize region-based approaches into those considering the arrangement of rendering primitives (e.g., strokes) within the interiors of regions and those manipulating shape, form, and composition of regions.
Both methodologies have seen applications to IB-AR for the purpose of cartooning or otherwise stylizing the appearance of video.
This frames the problem of IB-AR as one of automated rotoscoping.
Finally, when considering regions, it is possible to track and analyze the motion of objects.

2.3 Example-based Rendering

Most IB-AR algorithms encode a set of heuristics, typically emulating artistic practice with the goal of faithfully depicting a prescribed style.
A complementary approach to IB-AR, example-based rendering (EBR), pioneered by Hertzmann et al. [59], learns the mapping between an exemplar pair: a source image and an artist's rendering of that image.
Color EBR typically performs a piecewise mapping between the color histograms of two images to effect a nonphotorealistic recoloring.
Often there is only weak enforcement of spatial coherence in the color mapping process.
The corresponding patch from the exemplar artistic image is then pasted into place in the output rendering.

2.4 Image Processing and Filtering

Many image processing filters have been explored for IB-AR but few have been recognized so far to produce interesting results from an artistic point of view.
Among the filtering approaches to IB-AR, the authors distinguish two major categories depending on the domain the techniques operate on.
Most approaches that have been derived from classical image processing techniques fall into this category.
The authors adopt the usual distinction between first and second order derivative methods.
Techniques based on the bilateral filter fall into the anisotropic diffusion category since the bilateral filter can be interpreted as a fast filter-based approximation of anisotropic diffusion.

3 CLASSICAL STYLIZATION ALGORITHMS

IB-AR arguably began to gain momentum in the early 1990s with semi-automated paint systems such as Haeberli's [47] that enabled photos to be transformed into impressionist-like paintings with minimal labor.
People would click on a photo, each click prompting the generation of a virtual brush stroke of the underlying image color.
The introduction of noise prior to this process results in stroke color variation reminiscent of an impressionist painting.
Haeberli's system [47] was motivated by a desire to enrich digital painting by automating the color selection process (reducing the 'time to palette').
This concept of stroke-based rendering (SBR) [58] underpins almost all of the IB-AR work developed in the nineties, which sought increasingly to automate and to enhance the sophistication of stroke placement.

3.1 Local Algorithms for Stroke Placement

Haeberli's framework [47] automates the selection of stroke color, and for non-circular brush strokes can also decide stroke orientation by painting strokes orthogonal to the intensity gradient in the source image.
The system relies upon the user to determine the order in which strokes are painted.
Adapting Haeberli's framework [47] to randomly assign stroke size and order leads to loss of salient detail [48], which later motivated the use of image processing operators for stroke placement (Fig. 3).
The size and sequencing of stroke overpainting is crucial to producing results with an acceptable aesthetic and without any loss of salient detail.
With this solution strokes can be painted at sizes disproportionate to the features they represent.
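The orientation rule described above (strokes painted orthogonal to the intensity gradient) can be sketched in a few lines. This is a minimal illustration rather than Haeberli's implementation: the 3x3 Sobel kernels, the grayscale list-of-lists image, and the function names are assumptions made for the example.

```python
import math

# Sobel kernels for horizontal (x) and vertical (y) intensity derivatives.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def sobel_gradient(image, row, col):
    """Return (gx, gy) at an interior pixel of a 2D list of intensities."""
    gx = gy = 0.0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            value = image[row + dr][col + dc]
            gx += SOBEL_X[dr + 1][dc + 1] * value
            gy += SOBEL_Y[dr + 1][dc + 1] * value
    return gx, gy

def stroke_angle(image, row, col):
    """Angle in [0, pi) of a stroke laid orthogonal to the gradient.

    In flat regions the gradient is zero and the angle is arbitrary.
    """
    gx, gy = sobel_gradient(image, row, col)
    gradient_angle = math.atan2(gy, gx)
    return (gradient_angle + math.pi / 2) % math.pi

# A vertical edge: dark on the left, bright on the right.
image = [[0, 0, 1, 1]] * 4
angle = stroke_angle(image, 1, 1)  # the stroke runs along the edge
```

On this toy image the gradient points across the edge, so the stroke direction is rotated by 90 degrees and follows the edge instead of crossing it.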

3.1.1 Early Pen-and-Ink Hatching Algorithms

Early semi-automated systems for rendering in pen-and-ink and cross-hatched styles follow a similar pattern of development.
Salisbury et al. [129] developed a semi-automated hatching system that oriented textures according to the underlying image gradients, much as Haeberli's system oriented brush strokes [47].
A multi-scale extension of the system [128] offered aesthetic improvements in the viewing of hatching patterns at multiple scales.
Region-based editing and manipulation of the underlying image gradient was later introduced by Salisbury et al. [130], enabling discontinuities and swirl effects to be manually introduced, improving tool expressiveness.

3.1.2 Early Painterly Rendering Algorithms

The first automatic solution to IB-AR described in full detail within the literature was a painterly rendering tool proposed by Litwinowicz [90].
Strokes are rectangular, and oriented using Sobel gradients as done previously [47], [48].
The clipping process results in crisp edges that guard against strokes from unimportant regions over-painting more important regions.
Later, Hays and Essa [52] adopted a similar technique for interpolation in their video painting algorithm.
A notable exception is Treavett and Chen's [151] adoption of image moments computed local to each pixel.

3.2 Local Coarse-to-fine IB-AR Algorithms

Constant-sized rectangular strokes, even after clipping, generate an artificial regularity that could degrade the resulting aesthetic.
Each scale of the pyramid corresponds to a layer in the painting; the coarsest scale is painted as the first layer with large strokes.
The latter enables texturing or bump-mapping of the stroke to produce a convincing oil-paint effect [57] .
Multi-resolution rendering image analysis was adopted soon afterwards by half-toning algorithms that similarly distributed rendering primitives (stipples or short lines) across a low-pass pyramid [147].

3.3 Video Stylization

The extension of SBR to video stylization is non-trivial, since independent per-frame rendering of the image sequence will result in a distracting flickering or scintillation in the animation.
It is, therefore, desirable that: 1) the motion of brush strokes both matches the motion of the underlying video content, and 2) the animated sequence is flicker free.
In the case of overly dense regions, strokes are deleted at random until the density reaches acceptable levels.
When a reliable optical flow estimate is available, the technique performs well.
In the late nineties real-time optical flow was impractical, leading Hertzmann and Perlin [60] to present an alternative video painting technique amenable to interactive rendering: a "Living Painting."

4 VISION FOR STYLIZATION

An increasing reliance upon local image processing techniques (predominantly the Sobel gradient operator) was instrumental in transforming the interactive IB-AR systems of the early nineties into fully automatic rendering systems.
Continuing this move towards deeper image analysis, post-nineties work relied increasingly upon higher-level computer vision to guide artistic rendering.
This trend began with the adoption of mid-level computer vision methods, specifically the use of image segmentation.

4.1 Perceptual Measures for Stylization

DeCarlo and Santella [29] were among the first to apply image segmentation in IB-AR.
Images were segmented using a variant of mean-shift [23], [101] at multiple downsampled resolutions.
This hierarchical representation enabled an image to be rendered in a highly abstract form (using coarse regions from the top of the pyramid), or for certain regions to be locally decomposed into finer grain regions by descending the hierarchy.
Such techniques scaled strokes in inverse proportion to edge magnitude and so conserved all fine (i.e., high-frequency) detail in the painting unless interactively down-weighted, e.g., using manually specified masks [56].
While DeCarlo and Santella harnessed the power of the human visual system to generate their importance maps, a number of IB-AR algorithms were developed using fully automated measures of salience to drive emphasis in renderings.

4.2 Artistic Rendering as a Global Optimization

Most IB-AR algorithms in the late 1990s treated painting as a local process: pixels in the image are examined in turn and strokes placed according to various heuristics.
Each stroke is placed according to information in its local spatial neighborhood only.
By contrast, global approaches to IB-AR iteratively optimize the position of rendering elements (e.g., brush strokes or stipples) to minimize some objective function defined to describe the 'optimality' according to one or more heuristics.
It was not until a decade later that the first algorithmic solution was described for painterly rendering [56].

4.2.1 Global Approaches to SBR: Brush-based

Hertzmann [56] extended his local curved stroke painterly algorithm [55] by treating each stroke as an active contour or snake.
A snake is a piecewise curve, whose control points are iteratively updated to minimize an energy function.
In Hertzmann's optimization [56], a single painting is created from the source photograph and iteratively updated to converge toward an aesthetic ideal.
The weights ω₁ to ω₄ control the influence of each quality attribute and are determined empirically.
A similar model of stroke redundancy was presented in the global approach of Szirányi et al. [150], using a Markov Chain Monte Carlo (MCMC) optimization.
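To make the role of the four weights concrete, one can assume the painting energy is a weighted sum of four quality terms. This summary does not name the individual terms, so the E_i below are placeholders, and the additive form itself is an assumption made for illustration.

```latex
E(P) = \sum_{i=1}^{4} \omega_i \, E_i(P), \qquad P^{*} = \arg\min_{P} E(P)
```

Here P denotes the painting (the set of strokes), and raising a weight ω_i makes the optimizer favor the corresponding quality attribute more strongly.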

4.2.2 Global Approaches to SBR: Tonal Depiction

A purely tonal IB-AR depiction is achieved using stippling.
In practice, however, random noise can only partially remove artifacts.
By contrast, most stippling algorithms seek to minimize such artifacts, and in this sense such techniques are related to image-based hatching approaches [115], [130] that also take structure into account in placing marks.
Kim et al. [74] generate stipple dot distributions with the same statistical properties as those created by artists.
The authors now describe this area of example-based rendering in greater detail.

4.3.1 Texture by Analogy

The majority of artistic EBR algorithms focus on the transfer of artistic texture, and borrow from the nonparametric patch-based methods used for texture synthesis and photo in-painting.
Such methods (e.g., due to Efros et al. [35], [36]) in-fill from the edges of 'holes' in an image, iteratively copying patches from elsewhere in the image that share similarity with adjacent texture.
PCA is used to reduce the dimensionality of the search, which can be time-consuming for ANN over large dimensions (patch sizes).
Only the normalized luminance channel is considered.
Video EBR is challenging due to the problem of constraining patch choice to satisfy not only local and global spatial coherence terms but also temporal coherence.

4.3.2 Color Transfer

Manipulating color tone can affect the mood of an artistically rendered image, and forms a useful addition to the IB-AR toolbox.
Early approaches model the histogram as unimodal, equalizing the mean and variance of the source and target image (either as three 1D per-channel operations [126] or in 3D space [107]).
More sophisticated approaches adapt to edges by considering image gradients [165] or perform matching of the histogram at multiple scales [124] .
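The per-channel variant of the early mean-and-variance approach can be sketched as follows. This is a minimal sketch under stated assumptions: the source pixels are recolored to match the target's statistics, the color space is left unspecified (pixels are plain 3-tuples), and the function names are hypothetical rather than taken from [126].

```python
from statistics import mean, pstdev

def transfer_channel(source, target):
    """Shift and scale `source` so its mean and std match those of `target`."""
    src_mean, src_std = mean(source), pstdev(source)
    tgt_mean, tgt_std = mean(target), pstdev(target)
    # A constant source channel has no spread to rescale.
    scale = tgt_std / src_std if src_std > 0 else 0.0
    return [(v - src_mean) * scale + tgt_mean for v in source]

def transfer_color(source_pixels, target_pixels):
    """Apply the 1D transfer independently to each of the three channels."""
    src_channels = list(zip(*source_pixels))
    tgt_channels = list(zip(*target_pixels))
    out_channels = [transfer_channel(s, t)
                    for s, t in zip(src_channels, tgt_channels)]
    return [tuple(px) for px in zip(*out_channels)]

source = [(10.0, 20.0, 30.0), (20.0, 40.0, 60.0), (30.0, 60.0, 90.0)]
target = [(100.0, 5.0, 0.0), (120.0, 5.0, 10.0), (140.0, 5.0, 20.0)]
result = transfer_color(source, target)
```

Because each channel is handled separately, correlations between channels are ignored, which is one reason the choice of color space matters for this family of methods.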

4.4 Region-based IB-AR Algorithms

Initially proposed by DeCarlo and Santella [29] as a mechanism for interactive abstraction of photographs (Sec. 4.1), image segmentation has become a cornerstone of many automatic IB-AR algorithms that make rendering decisions based on mid-level structure parsed from the image.
The ability to harness structural representations of image content led to greater diversity of style (unlocking styles such as stained glass rendering or compositional artwork such as pseudo-Cubism).
Arguably, aesthetics were also open for improvement as style and emphasis could be controlled at a higher level (e.g., regions) rather than in response to low-level features.

4.4.1 Region Painting and Texturing

The earliest region-based IB-AR algorithms focused on painterly rendering and were essentially SBR algorithms that used the shape of the region rather than an image gradient field (as common in pre-2000 SBR) to guide the placement of strokes [41], [77].
Shugrina et al. [141] filled region interiors with brush strokes aligned with the principal axis but placed brush strokes on the region boundaries for outlines.
The systems described so far only make use of the color and gradient information within regions.
The classification drives the type of stroke placed, based on a pre-digitized database of stroke textures from real brushwork mapped to each texture category.
Variants of flat shading using only black and white were presented by Xu and Kaplan [167] and sought to depict the underlying image tone whilst discouraging connected regions of similar tone.

4.4.2 Deformation and Composition

Song et al. [144] classify regions into one of several canonical shapes and replace regions with those shapes to create a simplified shape rendering resembling a paper cut out.
Region deformation was also employed to warp regions into superquadric shapes reminiscent of Cubist renderings [16].
This work also re-arranges the position of regions in order to create abstract compositions; arguably styles such as Cubism could not be generated without region-based analysis.
Shape simplification was also explored by Mi et al. [103] through decomposition into parts rather than substitution with simpler shapes [144].

4.5 Region Tiling and Packing Algorithms

A considerable volume of IB-AR literature addresses the arrangement of a multitude of small tiles (from regular shapes to irregular pictograms) to form artistic representations.
These mosaicking algorithms are typically phrased as optimization problems seeking to maximize coverage of a 2D region, whilst minimizing tile overlap.
The tile placement is content-aware, penalizing solutions that misalign tiles to cross edges in the image.
A spatial coherence term is often introduced to encourage smoothly varying scale and orientation over the tiled region.

4.5.1 Photo and Video Mosaics

The rectilinear tiling of small image thumbnails to approximate a larger image (so-called photomosaics) was among the earliest forms of synthetic mosaic, inspired by early physical macro-artwork such as Dali's Abraham Lincoln.
Thumbnails are often chosen to have a semantic connection to the larger image being created, as in Dali's work.
The IB-AR literature describes optimized search strategies for expedited rendering of photomosaics [6] as well as alternative optimization strategies such as evolutionary search [14] .
Klein et al. [76] extended photomosaics to video, updating elements of the mosaic to approximate video content whilst penalizing frequent changes of a given tile to prevent flicker.
Work approximating images with irregular tiles (e.g., jigsaw image mosaics [73]) can be considered extensions of photomosaicking.
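The basic photomosaic idea, choosing for each image block the thumbnail that best approximates it, can be sketched with an exhaustive search. This is an illustrative baseline, not the optimized or evolutionary search strategies of [6] or [14]; each thumbnail is reduced to a single mean intensity, and the tile names are hypothetical.

```python
def block_mean(image, top, left, size):
    """Mean intensity of the size x size block whose corner is (top, left)."""
    values = [image[r][c]
              for r in range(top, top + size)
              for c in range(left, left + size)]
    return sum(values) / len(values)

def photomosaic(image, tile_means, size):
    """For each block, pick the tile whose mean intensity is closest."""
    rows, cols = len(image) // size, len(image[0]) // size
    layout = []
    for br in range(rows):
        row = []
        for bc in range(cols):
            target = block_mean(image, br * size, bc * size, size)
            row.append(min(tile_means,
                           key=lambda name: abs(tile_means[name] - target)))
        layout.append(row)
    return layout

image = [
    [0.1, 0.1, 0.9, 0.9],
    [0.1, 0.1, 0.9, 0.9],
    [0.5, 0.5, 0.1, 0.1],
    [0.5, 0.5, 0.1, 0.1],
]
# Hypothetical thumbnail library, summarized by mean intensity only.
tile_means = {"night_sky": 0.05, "sand": 0.55, "snow": 0.95}
layout = photomosaic(image, tile_means, size=2)
```

A video extension in the spirit of Klein et al. [76] would add a cost for changing a block's tile between frames; that term is omitted here.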

4.5.2 Voronoi Methods

The earliest mosaic-like renderings relied on Voronoi diagrams constructed from points randomly seeded over the image [47].
Dobashi et al. [32] modified this approach to iteratively relax the position of the Voronoi seeds to better approximate the image using a mean-squared error (MSE) between the source and rendered image.
Faustino et al. [39] place regular tiles instead of relying on Voronoi segments but guide tile placement using Voronoi regions.
The tiles are scaled in proportion to image size to preserve detail.
Grundland et al. [46] form Voronoi segments according to both edge strength and image intensity.
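The idea of relaxing Voronoi seed positions to reduce the MSE between source and rendering can be sketched as below. This is not the update rule of Dobashi et al. [32]: simple hill climbing (random seed moves, kept only when the error drops) is swapped in as a stand-in, and the grayscale toy image, seed list, and function names are assumptions for the example.

```python
import random

def render_voronoi(image, seeds):
    """Color every pixel with the mean intensity of its nearest-seed cell."""
    height, width = len(image), len(image[0])
    # Label each pixel with the index of its nearest seed (row, col).
    labels = [[min(range(len(seeds)),
                   key=lambda i: (r - seeds[i][0]) ** 2 + (c - seeds[i][1]) ** 2)
               for c in range(width)] for r in range(height)]
    sums = [0.0] * len(seeds)
    counts = [0] * len(seeds)
    for r in range(height):
        for c in range(width):
            sums[labels[r][c]] += image[r][c]
            counts[labels[r][c]] += 1
    means = [s / n if n else 0.0 for s, n in zip(sums, counts)]
    return [[means[labels[r][c]] for c in range(width)] for r in range(height)]

def mse(a, b):
    """Mean-squared error between two equally sized grayscale images."""
    n = len(a) * len(a[0])
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n

def relax_seeds(image, seeds, iterations=200, step=1, rng=None):
    """Hill climbing: keep a random seed move only if it lowers the MSE."""
    rng = rng or random.Random(0)
    height, width = len(image), len(image[0])
    seeds = list(seeds)
    best = mse(image, render_voronoi(image, seeds))
    for _ in range(iterations):
        i = rng.randrange(len(seeds))
        r, c = seeds[i]
        moved = (min(max(r + rng.randint(-step, step), 0), height - 1),
                 min(max(c + rng.randint(-step, step), 0), width - 1))
        trial = seeds[:i] + [moved] + seeds[i + 1:]
        error = mse(image, render_voronoi(image, trial))
        if error < best:
            seeds, best = trial, error
    return seeds, best

# Toy image: left half dark, right half bright.
image = [[0.0] * 4 + [1.0] * 4 for _ in range(6)]
initial = [(1, 1), (4, 2), (2, 5)]
initial_error = mse(image, render_voronoi(image, initial))
relaxed, final_error = relax_seeds(image, initial)
```

Since a move is accepted only when it lowers the error, the final MSE can never exceed the initial one; how close it gets to the optimum depends on the seed count and the number of iterations.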

4.5.3 Packing and Tessellation methods

Hausner et al. [51] were the first to address irregular tile shapes through an energy minimization scheme for shape packing.
Kim et al.'s [73] jigsaw image mosaics (JIM) extended this approach using an active contour based optimization scheme to minimize the energy function to allow moderate tile deformation.
Branch and bound heuristics are used to improve search efficiency (Fig. 9(d)).
The work follows up on an earlier specific case of irregular tiling: calligraphic (text) packing [166].
Hurtut et al. [63] combined the principles of texture modeling and mosaicking to learn statistical distributions of tiles.

4.6 Computer Vision for Video Stylization

A major goal in video stylization is temporal coherence, which requires video to exhibit minimal flicker and the rendering primitives (e.g., strokes) to move with the underlying video content.
Early algorithms for 2D video stylization were based on per-pixel analysis using optical flow and frame differencing (Sec. 3.3).
Temporal incoherence is common in such algorithms [60], [90] since stroke placement decisions are being made on a spatially (per-pixel) and temporally (per-frame) local basis.
Higher-level analysis of visual structure, e.g., through computer vision, can lead to improved coherence.
The authors now survey two categories of post-nineties algorithms: techniques based on optical flow and segmentation-based methods.

4.6.1 Visual Stylization through Optical Flow

Approaches that employ optical flow to stylize video were revisited by Hays and Essa [52] .
To mitigate temporal incoherence arising from flow estimates, strokes were categorized as weak or strong; the latter in edge areas where gradients are higher.
Park and Yoon [122] adopted a similar strong-weak categorization.
Blended texture patches were moved not only forward but also backward in time using a bi-directional estimate of optical flow.
This mitigated the cumulative errors inherent in the forward propagation strategies of prior approaches.

4.6.2 Visual Stylization through Segmentation

Segmentation is now a common component in IB-AR, and by leveraging a similar mid-level representation for video, the consistent motion of strokes within an object can be enforced.
These benefits come at the cost of generality; not all objects are amenable to segmentation (e.g., smoke or water).
Regions were associated over time using a space-time region adjacency graph that pruned sporadic association to improve stability.
Painterly and cartoon effects were demonstrated by filling regions with strokes and textures that deform coherently with the boundary.
In the system of Kagaya et al. [66], the video is first segmented into spatio-temporally coherent regions.

4.6.3 Motion Stylization

Video analysis at the region level enables not only consistent rendering within objects, but also facilitates the analysis of object motion.
Automated methods to generate speed-lines in video require camera motion compensation, as the camera typically pans to track objects.
This can be approximated by estimating inter-frame homographies.
Chenney et al. [13] presented early work automatically deforming objects to emphasize motion.
Other distortions warping the object according to velocity or acceleration emphasized drag or inertia.

5 IMAGE PROCESSING AND FILTERING

Many of the techniques described in the previous sections are infeasible for real-time rendering and cannot be trivially adapted for multi-core CPUs or GPUs.
Image processing techniques performing local filtering operations provide an interesting alternative since parallelization and GPU implementations are straightforward in most cases.
Moreover, a number of filtering techniques have been shown to perform with reasonable temporal coherence when processed frame by frame.
These advantages, however, come at the expense of style diversity afforded by higher-level interpretation of content.

5.1 Bilateral Filter and Difference of Gaussians

A fully automatic pipeline for the stylization of cartoon renderings based on images and videos was first proposed in the seminal work by Winnemöller et al. [164].
After the conversion to CIELab, the input is iteratively abstracted using the bilateral filter.
Furthermore, iterative filtering may blur edges resulting in a washed-out appearance (Fig. 12(d)).
The next section discusses a further popular approach.

5.3 Diffusion and Shock Filter

Osher and Rudin [112] as well as Weickert [160] recognized the artistic merit of shock filtered imagery, but the work of Kang and Lee [68] was the first to apply diffusion in combination with shock filtering for IB-AR.
It also creates blurred edges, leading Kang and Lee [68] to perform de-blurring with a shock filter after some mean curvature flow (MCF) iterations, which helps to preserve edges.
Diffusion that deviates from the local image structure (Fig. 12(f)). MCF and its constrained variant contract isophote curves to points.
For this reason, important image features must be protected by a user-defined mask.
A further limitation is that the technique is not stable against small changes in the input and, therefore, not suitable for per-frame video processing.

5.4 Morphological Filtering

Mathematical morphology (MM) provides a set-theoretic approach to image analysis and processing.
For grayscale images, dilation is equivalent to a maximum filter and erosion corresponds to a minimum filter.
Morphological smoothing is applied in Bousseau et al.'s [9], [10] work on watercolor rendering and in Bangham et al.'s [5] oil paintings to simplify input images and videos before rendering.
Because opening and closing are dual, this is equivalent to inverting the output of morphological smoothing applied to the inverted image.
Then, for every pixel the probability of the pixel's value belonging to a certain cluster is defined.
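The grayscale operators mentioned above (dilation as a maximum filter, erosion as a minimum filter) and the duality of opening and closing can be checked on a small 1D signal. This is a minimal sketch assuming a flat structuring element of fixed radius and 8-bit intensities; the function names are illustrative and not taken from any of the cited systems.

```python
def erode(signal, radius=1):
    """Grayscale erosion with a flat window: minimum over the neighborhood."""
    n = len(signal)
    return [min(signal[max(i - radius, 0):min(i + radius, n - 1) + 1])
            for i in range(n)]

def dilate(signal, radius=1):
    """Grayscale dilation with a flat window: maximum over the neighborhood."""
    n = len(signal)
    return [max(signal[max(i - radius, 0):min(i + radius, n - 1) + 1])
            for i in range(n)]

def opening(signal, radius=1):
    """Erosion then dilation: removes bright details narrower than the window."""
    return dilate(erode(signal, radius), radius)

def closing(signal, radius=1):
    """Dilation then erosion: removes dark details narrower than the window."""
    return erode(dilate(signal, radius), radius)

def invert(signal, max_value=255):
    """Photographic negative of an 8-bit signal."""
    return [max_value - v for v in signal]

# One narrow bright spike (200) and one wider bright plateau (90).
signal = [10, 10, 200, 10, 10, 90, 90, 90, 10]
opened = opening(signal)
closed_via_duality = invert(opening(invert(signal)))
```

The opening removes the one-sample spike while leaving the three-sample plateau intact, and closing the signal directly gives the same result as opening its negative and inverting back.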

5.5 Gradient Domain Techniques

In recent years, gradient domain methods have become very popular in computer vision and computer graphics [3] .
The basic idea behind such methods is to construct a gradient field representing the result.
Using scale-space analysis, they extracted a multi-scale Canny edge representation with lifetime and best scale information, which is used to define the gradient field and allows for image operations such as detail removal and shape abstraction.
Besides being computationally expensive, this technique is also known not to create temporally coherent output for video.
Bhat et al. [8] have presented a robust optimization framework that allows for the specification of zero-order (pixel value) and first-order (gradient value) constraints over space and time.
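The basic gradient domain recipe (build a gradient field, edit it, then reconstruct the result) is easiest to see in 1D, where reconstruction reduces to a running sum. This is a simplified sketch: in 2D the reconstruction step generally requires solving a Poisson equation, and the threshold-based edit below is only an illustrative stand-in for detail removal, not the method of any cited paper.

```python
from itertools import accumulate

def gradient(signal):
    """Forward differences: g[i] = s[i + 1] - s[i]."""
    return [b - a for a, b in zip(signal, signal[1:])]

def suppress_small(grad, threshold):
    """Zero out weak gradients (fine detail) and keep strong ones (edges)."""
    return [g if abs(g) >= threshold else 0.0 for g in grad]

def integrate(grad, start):
    """Rebuild a signal from its gradient field, anchored at `start`."""
    return list(accumulate(grad, initial=start))

# Two noisy plateaus separated by one strong edge.
signal = [0.10, 0.12, 0.09, 0.11, 0.80, 0.82, 0.79, 0.81]
edited = suppress_small(gradient(signal), threshold=0.2)
result = integrate(edited, start=signal[0])
```

After editing, only the strong edge survives in the gradient field, so the reconstruction is two flat plateaus joined by a single step.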

6 FUTURE CHALLENGES

Over the past two decades, IB-AR has delivered many high-quality expressive rendering algorithms and interactive systems.
As the field gathered momentum, researchers sought to identify the key emerging challenges.
The terms artistic rendering and artistic stylization are also in common parlance, whilst illustrative visualization is used for approaches addressing Salesin's third challenge.
DeCarlo and Stone's discussion focused on visual explanations: that IB-AR can enhance communication by simplification through structural abstraction.

6.1 Evaluation

Almost one decade since Salesin's panel discussion of this problem, few papers present structured methodologies for evaluation.
Evaluation work more closely aligned with Salesin's visual communication challenge was proposed by Gooch et al. [43] and Winnemöller et al. [164] in their portrait abstractions.
Methodologies have been developed to evaluate specific aspects of IB-AR such as visual interest [132] and stippling aesthetics [96] .
No gold standard methodology has emerged for NPR evaluation.

6.2 Interaction

Rather than passing the artistic Turing test, the frequently stated motivation of contemporary IB-AR work is to retain human creativity and to deliver useful tools and new artistic media.
This trend also reflects the limitations of contemporary computer vision and shows that, by carefully designing minimal but well-placed interaction, a high-quality automated visual effects workflow can result.
Addressing this is especially important if, as Gooch et al. [40] suggest, IB-AR's priority is to develop new artistic media and tools.
Collaboration with end-users is essential in closing this cycle.
Connections could be forged with research communities studying computational creativity and evolutionary art.

6.3 Technical directions

The technical direction of algorithmic research in IB-AR is challenging to predict over the longer term but may follow several established mid-term trends.
Willats and Durand [162] clearly differentiate between such renderings and current IB-AR when writing about the distinction between spatial and depictive systems.
By contrast, video stylization approaches based on computer vision can perform more aggressive abstraction through mid-level scene parsing (e.g., segmentation) at the cost of generality.
There is a tendency for complex image processing decisions to become less stable in the presence of noise.
Overall aesthetics are heavily influenced by media realism, especially in the emulation of traditional artistic styles.


Figures (12)

Fig. 3. Adapting Haeberli’s framework [47] to randomly assign stroke size and order leads to loss of salient detail [48], which later motivated the use of image processing operators for stroke placement.

Fig. 8. Image analogies extend patch based texture in-filling techniques to match between a source image and its artistic rendering. Images are rendered in analogous styles using the learned mapping.

Fig. 11. Generalized pipeline for creating cartoon-like effects by local filtering [70], [82], [164]. After the conversion to CIELab, the input is iteratively abstracted using the bilateral filter. First, 1–2 bilateral filter iterations suppress noise, and outlines are extracted from the intermediate result using a DoG filter. Further iterations of the bilateral filter are performed, with luminance quantization applied afterwards. DoG edges and the output of the luminance quantization are then composited, followed by optional sharpening by warping and smoothing of the edges.

Fig. 4. Illustrating the coarse to fine rendering and curved path tracing components of Hertzmann’s painting algorithm [55].

Fig. 7. Stippling examples: (a) Secord’s [137] approach (2000 dots); (b) Kopf et al.’s [78] non-repetitive stippling (7700 dots).

Fig. 10. Video stylization driven by coherent segmentation. (a) 3D approach. (b) 2D+t approach. (c) Cartoon and painterly styles arising from rotoscoping the coherent regions [157]. (d) Example of coherent painterly rendering [157].

Fig. 1. Chronology of IB-AR development. From the semi-automated SBR systems of the early nineties, to increasingly automated systems drawing upon image processing. Later the aesthetic gamut is enhanced through more sophisticated computer vision and edge-aware filtering. Recently attention returns to user interaction, raising new questions around the evaluation of aesthetics and usability.

Fig. 9. Mosaicking algorithms aim to tightly pack tiles without overlap. (a) A greedy approach to tiling [6]. (b) Tiling by global graph-cut based optimization [92]. (c) Mosaicking with regular [51] and (d)–(e) irregular overlapping tiles [73], [110].

Fig. 5. Interactive abstraction system of DeCarlo and Santella [29]. Images are segmented in a scale-space pyramid (top-left). The viewer’s gaze is drawn to particular image regions; these regions are locally decomposed into finer segments traversing the pyramid (bottom). Outlines are smoothed and superimposed to delineate region boundaries (top-right).

Fig. 6. Global optimization algorithms for painting. (a) Curved β−spline stroke paintings produced by Hertzmann’s greedy algorithm [55] and global optimization [56]. Note the improvements in precision for all edge detail. (b)–(c) Global optimization for painting using GA [17] guided by a salience field (manually damped in region B). Note the difference in emphasis between the non-salient shrubbery and the salient sign detail; enlargement for region A compares against source (upper) and Litwinowicz’s [90] method (lower).

Fig. 12. Left: Different results that were all created with the generalized cartoon pipeline. (a) Thresholded output of the separable implementation of the flow-based DoG [82]. (b) Flow-based DoG with XDoG thresholding [163]. (c) Cartoon-style abstraction generated with bilateral and flow-based DoG filter [70], [82]. Right: A selection of popular image abstraction techniques. (d) Bilateral filter (4 iterations) [121]. (e) Anisotropic Kuwahara filter [84]. (f) Shape-simplifying image abstraction [68]. (g) Coherence-enhancing filtering [83].


HAL Id: hal-00781502

https://hal.inria.fr/hal-00781502

Submitted on 19 Jul 2013


State of the “Art”: A Taxonomy of Artistic Stylization Techniques for Images and Video

Jan Eric Kyprianidis, John Collomosse, Tinghuai Wang, Tobias Isenberg

To cite this version:

Jan Eric Kyprianidis, John Collomosse, Tinghuai Wang, Tobias Isenberg. State of the “Art”: A Taxonomy of Artistic Stylization Techniques for Images and Video. IEEE Transactions on Visualization and Computer Graphics, Institute of Electrical and Electronics Engineers, 2013, 19 (5), pp.866-885. 10.1109/TVCG.2012.160. hal-00781502.

IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. 19, NO. 5, MAY 2013 (AUTHORS’ VERSION) 1


Abstract: This paper surveys the field of non-photorealistic rendering (NPR), focusing on techniques for transforming 2D input (images and video) into artistically stylized renderings. We first present a taxonomy of the 2D NPR algorithms developed over the past two decades, structured according to the design characteristics and behavior of each technique. We then describe a chronology of development from the semi-automatic paint systems of the early nineties, through to the automated painterly rendering systems of the late nineties driven by image gradient analysis. Two complementary trends in the NPR literature are then addressed, with reference to our taxonomy. First, the fusion of higher level computer vision and NPR, illustrating the trends toward scene analysis to drive artistic abstraction and diversity of style. Second, the evolution of local processing approaches toward edge-aware filtering for real-time stylization of images and video. The survey then concludes with a discussion of open challenges for 2D NPR identified in recent NPR symposia, including topics such as user and aesthetic evaluation.

Index Terms: Image and Video Stylization, Non-photorealistic Rendering (NPR), Artistic Rendering.

1 INTRODUCTION

As the advent of photography stimulated artistic diversity in the late 19th century, so did the successes of photorealistic computer graphics in the early nineties motivate alternative techniques for rendering in non-photorealistic styles. Two decades later, the field of non-photorealistic rendering (NPR) has expanded into a vibrant area of research covering a plethora of expressive rendering styles for visual communication: exploded diagrams [88], false color [124], [126], and artistic styles such as painterly [10], [168] and constrained palette rendering [106], [167]. It is this latter category of artistic rendering (AR) that forms the subject of this survey; specifically, techniques focusing on artistic stylization of two-dimensional content (photographs and video) to which we refer as image-based artistic rendering (IB-AR).

IB-AR’s origins reach back to seminal works exploring the emulation of traditional artistic media and styles [25], [47], [55], [90], [130]. Today, IB-AR has diversified into a highly cross-disciplinary activity, which builds upon computer vision (CV), perceptual modeling, human computer interaction (HCI), and computer graphics. Many classic IB-AR problems have been found to closely relate to long-standing problems in computer graphics or computer vision; for example, video cartooning [21], [156] and its relationship to video matting and automated rotoscoping [2]. In many cases computer graphics problems have benefited from or motivated entirely new computer vision research. Similarly, the goal of much IB-AR research, that of producing a creative or artistic tool, demands a careful, user-led HCI design process.

• J. E. Kyprianidis is with the Computer Graphics Systems Group of the Hasso-Plattner-Institut, University of Potsdam, Germany.

• T. Wang and J. Collomosse are with the Centre for Vision, Speech and Signal Processing, University of Surrey, UK.

• T. Isenberg is with the University of Groningen’s Johan Bernoulli Institute, the Netherlands, and with DIGITEO/CNRS/INRIA, France.

This is the authors’ version of the work. The definitive version was published in IEEE Transactions on Visualization and Computer Graphics, Vol. 19, No. 5, pp. 866–885, 2013. doi: 10.1109/TVCG.2012.160.

Despite several years of discipline convergence and the resulting improvements in aesthetic quality and diversity, there have been few surveys of the IB-AR literature in the past decade. Common references for IB-AR are the texts of Gooch and Gooch [42] and Strothotte and Schlechtweg [148], both of which surveyed pre-2000 techniques (Sec. 3). The majority of other survey material takes the form of conference tutorials; yet these primarily focus upon illustrative visualization [95] or NPR for 3D graphics and games [100]. This survey follows up a recent tutorial [18] by some of the authors at Eurographics 2011, prior to which the most recent major conference tutorials on the topic were by Hertzmann et al. [95] in 2003 and Green et al. [44] in 1999. Also, a number of web-based curated bibliographies are available via Reynolds [127] (to 2004), Schlechtweg [133] (to 2007), and Stavrakis [145].

This article delivers a comprehensive view of the IB-AR landscape, covering classical and contemporary techniques while offering two perspectives. First, we provide an up-to-date taxonomy of IB-AR techniques in which algorithms are grouped according to the family of techniques used (e.g., nonlinear filters, region segmentation) or design characteristics (e.g., local greedy, or global optimization approaches to rendering).

Second, we present IB-AR’s development in chronological order, from the early nineties to the modern day (c. 2011), to reflect the contemporaneous development of techniques clustered together in our taxonomy; for example local methods, followed later by global methods. We first document ‘classical’ (pre-2000) IB-AR and so introduce the key concepts and algorithms that continue to underpin and influence more contemporary methods (Sec. 3). These classical algorithms focused on the stroke-based rendering (SBR) paradigm [47], [58] with increasing levels of automation and sophistication in stroke placement and driven by low-level image processing (typically the Sobel operator).

Fig. 1. Chronology of IB-AR development. From the semi-automated SBR systems of the early nineties, to increasingly automated systems drawing upon image processing. Later the aesthetic gamut is enhanced through more sophisticated computer vision and edge-aware filtering. Recently attention returns to user interaction, raising new questions around the evaluation of aesthetics and usability. Timeline entries (1980–2010): late 1980s advances in media emulation (Strassmann’86); semi-automatic painting systems (Haeberli’90); video painting (Litwinowicz’97); fully automatic painting (Treavett’97, Hertzmann’98); perceptual UI and segmentation (DeCarlo’02); space-time video (Wang’04, Collomosse’05); automatic perceptual (Collomosse’05); GPU-based image processing (Winnemöller’06, Kang’07/’09, Kyprianidis’08/’09); user evaluation (Isenberg’06); NPAR 2010 Grand Challenges.

Next, we describe how the early convergence of computer graphics and image processing developed, enabling IB-AR to draw increasingly upon the more sophisticated image analysis offered by contemporary computer vision algorithms (Sec. 4). One consequence of the increasingly sophisticated interpretation or ‘parsing’ of the image was a divergence from SBR to alternative forms of rendering primitives: the use of regions and tiles which, in turn, unlocked greater diversity in the gamut of styles available to IB-AR. In line with the trend toward more complex image analysis, we also observe IB-AR to be defined increasingly as a goal-directed task, drawing upon global optimization rather than local approaches. Although these goals were initially defined at the low level of image artifacts (e.g., image gradient), the description of these goals later evolved to include higher level concepts such as perceptual salience measures and even emotional or ‘affective’ contexts.

In parallel with the trend toward more sophisticated scene analysis, IB-AR benefited from the emerging popularity of anisotropic and edge-preserving forms of filters in computer graphics (Sec. 5). On the one hand, such operations lacked high-level image ‘understanding’, limiting their artistic gamut to painterly, sketchy, and cartoon styles. On the other hand, their simplicity led to real-time speeds on GPU hardware, making them practical for video processing and applicable to footage (e.g., water, smoke, fur) that is otherwise challenging to parse using vision methods such as segmentation.

Concluding, we catalog a number of challenges that remain outstanding in IB-AR (Sec. 6).

2 TAXONOMY OF IB-AR TECHNIQUES

Early prototype IB-AR systems followed the SBR paradigm and synthesized artistic renderings by incrementally compositing virtual brush strokes whose color, orientation, scale, and ordering were derived from semi- [47] or fully automated processes [55], [90], [151]. The aesthetics of the output generated by a SBR algorithm is, therefore, a function of both the media simulation applied to render each brush stroke and the process by which strokes are positioned and their attributes are set (referred to hereafter as the stroke placement algorithm). Although sometimes described simultaneously in early IB-AR papers, the problems of media emulation and stroke placement may be considered de-coupled. The curved spline strokes placed by Hertzmann’s [55] algorithm could be rendered by sweeping various brush models along their trajectories, to emulate thick oil paint, crayon, charcoal, or pastel, to name but a few different media. It is, therefore, not surprising that IB-AR has evolved in parallel with increasingly sophisticated media emulation models; from simple simulations of hairy brushes [146] to full multi-layered models of pigment diffusion and bi-directional transfer between brush and canvas [25].

A detailed exposition of media simulation warrants a survey in its own right, but in this work we focus only on the problem of stroke placement, or more generally, the placement of artistic rendering primitives (regions, strokes, stipples, tiles). We also survey nonlinear filters that introduce an anisotropy that conveys the impression of stroke placement. Accordingly, our taxonomy avoids the categorization of IB-AR purely in terms of media (painterly, sketch, cartoon shading) and instead clusters the space of IB-AR algorithms by the elementary rendering primitive or stylization mechanism employed. We then expand the lower branches of the taxonomy by considering similarities in the nature of the algorithm; local approaches vs. global arrangement strategies, or approaches that address the rendering of outlines vs. the interior of image regions.

2.1 Stroke-based Rendering (SBR)

SBR algorithms cover a 2D canvas with atomic rendering primitives according to some process or desired end goal, designed to simulate a particular style. In many SBR algorithms these primitives are the eponymous virtual brush stroke, but the definition of SBR has diversified to primitives including tiles, stipples and hatch marks [58].

2.1.1 Brush Stroke Techniques

Fig. 2. Taxonomy of IB-AR techniques. Branches and their references, in reading order:

Stroke-based Rendering for Image Approximation

Brush Stroke Techniques > Local > User Interaction > Low Level: Haeberli’90 [47], Salisbury’96 [128], Salisbury’97 [130], Curtis’97 [25], Gooch’04 [43], Grubert’08 [45], Lin’10 [89], Kagaya’11 [66], O’Donovan’11 [108]

Brush Stroke Techniques > Local > User Interaction > Perceptual Measure: Santella’02 [131]

Brush Stroke Techniques > Local > Automatic > Low Level > Image: Haggerty’91 [48], Treavett’97 [151], Salisbury’97 [130], Hertzmann’98 [55], Shiraishi’00 [140], Szirányi’00 [150], Wen’06 [161]

Brush Stroke Techniques > Local > Automatic > Low Level > Video: Litwinowicz’97 [90], Hertzmann’00 [60], Kovacs’02 [79], Hays’04 [52], Park’07 [122], Lu’10 [94]

Brush Stroke Techniques > Local > Automatic > Perceptual Measure: Collomosse’02 [15], Collomosse’05 [17], Shugrina’06 [141], Colton’08 [22]

Brush Stroke Techniques > Global > User-guided Emphasis: Hertzmann’01 [56], Tresset’05 [152]

Brush Stroke Techniques > Global > Automatic Emphasis: Szirányi’01 [150], Collomosse’05 [21]

Mosaicking & Tiling > Still: Hausner’01 [51], Kim’02 [73], Dobashi’02 [32], Elber’03 [37], Di Blasi’05 [31], Faustino’05 [39], Schlechtweg’05 [134], Orchard’08 [110], Xu’07 [166], Xu’08 [167], Hurtut’09 [63]

Mosaicking & Tiling > Animated: Klein’02 [76], Smith’05 [142], Dalal’06 [27], Kang’11 [67]

Tonal Depiction > Stippling > Local > Single Resolution: Ulichney’87 [153], Ostromoukhov’93 [113], Ostromoukhov’94 [117], Ostromoukhov’99 [114], Ostromoukhov’99 [116]

Tonal Depiction > Stippling > Local > Multiple Resolution: Streit’98 [147]

Tonal Depiction > Stippling > Global > Spatial Constraint: Deussen’00 [30], Secord’02 [137], Hiller’03 [61], Schlechtweg’05 [134], Kopf’06 [78], Mould’07 [105], Vanderhaeghe’07 [154]

Tonal Depiction > Stippling > Global > Structure and Spatial Constraint: Kim’08 [72], Pang’08 [118], Kim’09 [74], Martin’11 [98], Li’11 [87]

Tonal Depiction > Hatching and Line Art: Salisbury’94 [129], Dafner’00 [26], Pedersen’06 [123], Pang’08 [118], Mi’09 [103], Inglis’11 [64]

Region-based Techniques

Image > Fill: Gooch’02 [41], Mould’03 [104], O’Donovan’06 [109], Setlur’06 [139], Shugrina’06 [141], Xu’08 [167]

Image > Form/Shape: Salisbury’96 [128], Salisbury’97 [130], Gooch’04 [43], Grubert’08 [45], Song’08 [144]

Image > Composition: Collomosse’03 [16], Hall’07 [49]

Image > Hierarchical: DeCarlo’02 [29], Bangham’03 [5], Mould’08 [106], Zeng’09 [168], Zhao’10 [169]

Video > Appearance > 2D+t: Agarwala’02 [1], Collomosse’03 [20], Agarwala’04 [2], Collomosse’05 [21], Bousseau’06 [9], Bousseau’07 [10], Wang’10 [157], Kagaya’11 [66], O’Donovan’11 [108], Wang’04 [156], Lin’10 [89]

Video > Motion Stylization: Collomosse’03 [19], Smith’05 [142], Liu’05 [91], Wang’06 [155]

Example-based Techniques

Color: Reinhard’01 [126], Neumann’05 [107], Xiao’09 [165], Pouli’11 [124]

Texture: Hertzmann’01 [59], Ashikhmin’03 [4], Hashimoto’03 [50], Kim’09 [74], Lee’10 [86], Martin’11 [98], Zhao’11 [170]

Image Processing and Filtering

Spatial Domain > Outlines > First Derivative: Orzan’07 [111]

Spatial Domain > Outlines > Second Derivative: Gooch’04 [43], Winnemöller’06 [164], Kang’07 [69], Kyprianidis’08 [82], Kang’09 [70], Winnemöller’11 [163]

Spatial Domain > Content > Anisotropic Diffusion: Winnemöller’06 [164], Kang’07 [69], Kang’08 [68], Kyprianidis’08 [82], Kang’09 [70], Kyprianidis’11 [83]

Spatial Domain > Content > Local Statistics: Papari’07 [119], Kyprianidis’09 [84], Kyprianidis’11 [81]

Spatial Domain > Content > Morphological Filtering: Bousseau’06 [9], Bousseau’07 [10], Papari’09 [120], Criminisi’10 [24]

Gradient Domain: Orzan’07 [111], Bhat’10 [8]

The most prevalent form of IB-AR are perhaps SBR algorithms using either short dabs of paint, or long curved brush strokes as rendering primitives. The process of covering the canvas can be categorized broadly as local or global. Local approaches typically drive stroke placement decisions based on the pixels in the spatial neighborhood of the stroke; this can be explicit in the algorithm (e.g., image moments within a window [140], [151]) or implicit due to a prior convolution (e.g., Sobel edges). An alteration to the image would thus affect only strokes in the locality. Global methods optimize the placement of all strokes to minimize some objective function. Various strategies have been applied from snake relaxation [56], to evolutionary algorithms [17], and Monte-Carlo optimization [150]. In all cases the desired objective relates to retention of detail, for example, encouraging maximal retention of visual detail [56], [150] using low-level operators (e.g., Sobel gradient) or higher-level measures such as image salience to retain only perceptually important detail [17].
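
The local case can be illustrated with a small sketch in which a stroke's attributes depend only on a 3×3 neighborhood filtered with the Sobel operator. The function names, the returned dictionary, and the choice to orient the stroke perpendicular to the gradient (that is, along the edge) are assumptions of this example rather than the rule of any specific cited algorithm.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_at(img, x, y):
    """Sobel gradient (gx, gy) at an interior pixel of a 2D intensity grid."""
    gx = gy = 0.0
    for j in range(3):
        for i in range(3):
            p = img[y + j - 1][x + i - 1]
            gx += SOBEL_X[j][i] * p
            gy += SOBEL_Y[j][i] * p
    return gx, gy


def stroke_for_pixel(img, x, y, length=4.0):
    """Derive stroke attributes from the local neighborhood only.

    Valid for interior pixels (1 <= x < width - 1, 1 <= y < height - 1).
    """
    gx, gy = sobel_at(img, x, y)
    return {
        "x": x,
        "y": y,
        # Rotate the gradient direction by 90 degrees so the stroke
        # follows the edge instead of crossing it.
        "angle": math.atan2(gy, gx) + math.pi / 2,
        "strength": math.hypot(gx, gy),
        "length": length,
    }


# A vertical intensity edge: dark columns on the left, bright on the right.
image = [[0, 0, 10, 10] for _ in range(4)]
stroke = stroke_for_pixel(image, 1, 1)
```

Because only the 3×3 window is read, changing a distant pixel leaves this stroke untouched, which is the locality property described above. A global method would instead score all strokes jointly against an objective.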

On the more heavily populated ‘local’ branch of the SBR taxonomy, we partition algorithms into user-assisted and automatic processes, the former typically pre-dating the latter, pointing to a trend toward automation post-nineties. The mechanism behind the automation can, as with ‘global’ SBR, be divided into lower- and higher-level analysis according to the definition of the ‘importance’ field that guides the emphasis of features in the artwork. In the parallel SBR branch of semi-automated (i.e., user-assisted) algorithms, the low/high-level distinction is again mirrored; with early techniques relying on image filters to orient brush strokes [47] and later work, pre-dating automated measures for emphasis, using gaze trackers to directly harness the perceptual measures inherent in the human visual system [131]. In some recent automated algorithms, stroke placement is influenced by even higher-level contextual parameters such as emotion and mood [22], [141]. Most recently, there has been a trend back toward interaction, producing semi-automated tools for painterly video that enable keyframing of the fields used to arrange strokes [66], [89], [108].

For automatic techniques, a clear distinction can be made between those operating over images versus video content. Video extensions of SBR are non-trivial as strokes must not scintillate (flicker) and their motion must match the underlying video content. In the SBR branch of the taxonomy this problem has largely been addressed (though by no means solved) using optical flow. Elsewhere, nonlinear filters and segmentation have been applied.

2.1.2 Mosaicking, Tiling and Stippling

A further sub-category of SBR aims to approximate the image using a medium other than colored pixels or paint, packing image regions with a multitude of atomic rendering primitives. The techniques approximate the image content by (i) stippling, the distribution of small points (stipples) often for the purpose of tonal depiction; (ii) hatching, the use of line patterns or curves for the same; or (iii) mosaicking algorithms that pack small tiles together.

Stippling IB-AR techniques are closely related to digital half-toning and dithering algorithms that locally approximate regions using dot patterns, either with the sole goal of representing a local brightness or with an additional artistic intent [114]. Many early half-toning techniques developed heuristically informed greedy strategies for populating regions with stipples to avoid artifacts due to aliasing. Such techniques operate at either single or multiple scales, placing dots using local decision making. This culminated most recently in techniques designed to emphasize image structure [118], following the trend toward perceptual analysis in SBR. In contrast to half-toning, stippling does not simply decide whether to use a black or a white pixel on a regular grid but tries to place larger dots, with the shared goal to represent the brightness and to (typically) avoid visible patterns. Early stippling used a number of brush-based techniques [30]. However, much as local SBR painterly approaches evolved into global relaxation approaches, so image stippling began to adopt a more global strategy for stipple placement. Recently, goals in stippling are to capture and replicate aspects of the stippling style of artists [74], [98] or to be able to reproduce non-repetitive patterns [78]. A smaller subset of IB-AR explored the approximation of images using lines and curves. Aside from dedicated image-based hatching approaches [129], some techniques grow labyrinthian patterns using space-filling curves [26] or reaction diffusion processes [123] that adapt to the intensity of the image.
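
One way to picture a global strategy for stipple placement is a density-weighted relaxation, in which every stipple repeatedly moves to the weighted centroid of the pixels closest to it. The sketch below is a brute-force toy on a tiny grid; the function name `relax_stipples`, the density convention (larger means darker), and the fixed iteration count are assumptions of this example, and it is not presented as the procedure of any particular cited paper.

```python
def relax_stipples(density, points, iters=20):
    """Weighted Lloyd-style relaxation on a small density grid.

    density: 2D list of non-negative weights (larger = darker tone).
    points:  list of (x, y) stipple positions in pixel coordinates.
    """
    height, width = len(density), len(density[0])
    pts = list(points)
    for _ in range(iters):
        # Per stipple: weighted sum of x, weighted sum of y, total weight.
        sums = [[0.0, 0.0, 0.0] for _ in pts]
        for y in range(height):
            for x in range(width):
                d = density[y][x]
                if d <= 0:
                    continue
                # Assign the pixel to its nearest stipple (discrete Voronoi cell).
                k = min(
                    range(len(pts)),
                    key=lambda i: (pts[i][0] - x) ** 2 + (pts[i][1] - y) ** 2,
                )
                sums[k][0] += d * x
                sums[k][1] += d * y
                sums[k][2] += d
        # Move each stipple to the weighted centroid of its cell;
        # stipples that own no weighted pixels stay where they are.
        pts = [
            (sx / sw, sy / sw) if sw > 0 else p
            for (sx, sy, sw), p in zip(sums, pts)
        ]
    return pts


# Tone only in the two left-hand columns; both stipples start on the right.
tone = [[1, 1, 0, 0] for _ in range(4)]
stipples = relax_stipples(tone, [(3.0, 0.0), (3.0, 3.0)])
```

Each stipple's final position depends on where all the others sit, since the cells partition the whole image. This is the sense in which the placement is global rather than decided pixel by pixel.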

Artistic mosaicking algorithms are closely related to packing problems, and so are approached almost universally as global optimization problems. While packing strategies vary widely, they can be categorized into those obeying purely spatial or spatio-temporal constraints. The latter are especially challenging since a balance must be maintained between a faithful approximation of frame content and the introduction of flicker (temporal incoherence) due to frequent update of the tile or glyph chosen to represent a particular spatial region.

2.2 Region-based Techniques

Much as SBR in the 1990s relied increasingly on low-level image processing (e.g., intensity gradient, moments, optical flow), a trend post-2000 was the emergence of mid-level computer vision in IB-AR. Segmentation is frequently incorporated as a step toward parsing image structure, enabling the adaptation of rendering according to the content in regions. In some techniques, SBR algorithms are applied to render the interiors of regions independently [41], [141], [157]. However, the use of regions as rendering primitives in their own right has also given rise to additional styles including cartoon ‘flat’ shading [21], [156], new materials such as stained glass [104], [139], felt [109], and even emulation of abstract artistic styles [16].

For images, we categorize region-based approaches into those considering the arrangement of rendering primitives (e.g., strokes) within the interiors of regions and those manipulating shape, form, and composition of regions. A further category explores techniques based on image pyramids. Various interactive techniques (human gaze-trackers [29], importance maps [5]) are used to browse a region containment hierarchy constructed by segmenting successively lower resolution versions of the source image. An image can be rendered at a high level of abstraction by drawing only coarse large regions near the top of the hierarchy, or particular regions can be rendered in greater detail at lower levels. This enables local control over the level of detail. Such methods were among the first region-based IB-AR algorithms and are significant by being among the first to consider perceptual importance.
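
The hierarchy traversal described above can be sketched as a simple recursive selection: a region is drawn as a single coarse shape unless its importance is high enough to justify descending to its finer child segments. The `Region` class, the scalar `importance` field, and the threshold rule are hypothetical simplifications for illustration; the cited systems derive importance from gaze data or importance maps.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    name: str
    importance: float  # hypothetical score, e.g. derived from gaze data
    children: list = field(default_factory=list)


def select_regions(node, threshold):
    """Return the names of the regions to draw.

    Descend into finer child segments only where the region's importance
    exceeds the threshold; otherwise draw the coarse region itself.
    """
    if node.children and node.importance > threshold:
        selected = []
        for child in node.children:
            selected.extend(select_regions(child, threshold))
        return selected
    return [node.name]


# A tiny containment hierarchy: the face is important, the sky is not.
hierarchy = Region("image", 1.0, [
    Region("sky", 0.1, [Region("sky_a", 0.1), Region("sky_b", 0.1)]),
    Region("face", 0.9, [Region("eyes", 0.95), Region("mouth", 0.8)]),
])
chosen = select_regions(hierarchy, 0.5)
```

Raising the threshold collapses the result toward a few coarse regions, and lowering it exposes more detail, which gives the local control over level of detail mentioned in the text.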

The consideration of regions in IB-AR has also benefited video stylization, offering an alternative to SBR techniques dependent on optical flow. Video segmentation is a well-studied problem in computer vision and is broadly separated into two categories: techniques that segment frames independently and associate regions over time (2D+t) and those segmenting video as a spatio-temporal volume (3D). Both methodologies have seen applications to IB-AR for the purpose of cartooning or otherwise stylizing the appearance of video. All techniques share the observation that once video has been coherently segmented into regions (a non-trivial problem), the problem of hatching, sketching, or painting with temporal coherence can be solved by attaching strokes to a rigid [21] or deforming [2] region. This frames the problem of IB-AR as one of automated rotoscoping.

Finally, when considering regions, it is possible to track and analyze the motion of objects. This gives rise to a complementary form of video stylization, that of artistically manipulating object motion.

2.3 Example-based Rendering

Most IB-AR algorithms encode a set of heuristics, typically emulating artistic practice with the goal of faithfully depicting a prescribed style. A complementary approach to IB-AR, example-based rendering pioneered by Hertzmann et al. [59], learns the mapping between an exemplar pair: a source image and an artist’s rendering of that image. The learned mapping can then be applied to render arbitrary images in the exemplar style.

Example-based rendering (EBR) can be categorized as performing either texture or color transfer. Color EBR typically performs a piecewise mapping between the color histograms of two images to effect a non-photorealistic recoloring. Often there is only weak enforcement of spatial coherence in the color mapping process. By contrast, texture-based EBR shares similarities with patch-based texture in-filling techniques [35], [36], which seek to fill holes in images by searching for visually similar patches elsewhere in the image. However, in the case of EBR the patches are not matched within the source image to be rendered but instead within the exemplar source image. The corresponding patch from the exemplar artistic image is then pasted into place in the output rendering. As with texture in-filling, a careful balance must be maintained between fidelity of the patch matching and the spatial coherence in the rendering.
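
As a rough illustration of a histogram-driven color mapping, the sketch below applies classic cumulative-histogram matching to a single channel. This is one standard way to build such a mapping and is used here as an assumption; it is not claimed to be the specific method of the cited color transfer papers. The function name `match_histogram` and the integer value range are likewise choices made for this example.

```python
def match_histogram(source, reference, levels=256):
    """Remap one channel of `source` so that its value distribution
    follows that of `reference`.

    Both inputs are flat, non-empty lists of integers in [0, levels).
    """
    def cumulative(values):
        hist = [0] * levels
        for v in values:
            hist[v] += 1
        running, out = 0, []
        for count in hist:
            running += count
            out.append(running / len(values))
        return out

    src_cdf = cumulative(source)
    ref_cdf = cumulative(reference)

    # For each source level, find the first reference level whose
    # cumulative frequency reaches the source's cumulative frequency.
    mapping, j = [], 0
    for i in range(levels):
        while j < levels - 1 and ref_cdf[j] < src_cdf[i]:
            j += 1
        mapping.append(j)
    return [mapping[v] for v in source]


# Dark source values are pushed toward the brighter exemplar palette.
recolored = match_histogram([0, 0, 1, 1, 2, 2, 3, 3],
                            [10, 10, 20, 20, 30, 30, 40, 40])
```

The mapping is a lookup table over values, so pixel positions are ignored entirely. That is consistent with the weak enforcement of spatial coherence noted above for color transfer.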

2.4 Image Processing and Filtering

Many image processing filters have been explored for IB-AR but few have been recognized so far to produce interesting results from an artistic point of view. This is probably because these filters are often concerned with the restoration and recovery of photorealistic imagery. By contrast, IB-AR generally aims for simplification.


Frequently Asked Questions (16)

Q1. What are the contributions in "State of the “Art”: A Taxonomy of Artistic Stylization Techniques for Images and Video"?

This paper surveys the field of non-photorealistic rendering (NPR), focusing on techniques for transforming 2D input (images and video) into artistically stylized renderings. The authors first present a taxonomy of the 2D NPR algorithms developed over the past two decades, structured according to the design characteristics and behavior of each technique. The authors then describe a chronology of development from the semi-automatic paint systems of the early nineties, through to the automated painterly rendering systems of the late nineties driven by image gradient analysis.

Q2. What have the authors stated for future works in "State of the “Art”: A Taxonomy of Artistic Stylization Techniques for Images and Video"?

They express the view that (6) remains the most promising direction; that NPR should “not just imitate and emulate styles of the past but create styles for the future.” They also observe that Salesin’s research questions regarding definitions of aesthetics and the artistic Turing test should be given equal weight in terms of new artistic styles emerging as a consequence of NPR. Further positions regarding directions for NPR were presented at NPAR 2010 by DeCarlo and Stone [28] and Hertzmann [54].

Q3. What are the main approaches to such example-based rendering?

There are two main approaches to such example-based rendering (EBR): methods seeking to perform texture transfer (typically performed by modulating the luminance channel) and those focusing on color transfer leaving texture constant.
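As a rough illustration of the color-transfer branch, the sketch below shifts and scales one color channel so that its mean and standard deviation match those of an exemplar. Statistic matching is only one simple form of color transfer and is an assumption here, not a method named in the answer above; `transfer_channel` and the sample values are hypothetical.

```python
import statistics

def transfer_channel(source, exemplar):
    """Match the mean and spread of one channel to an exemplar channel.

    Assumption: a global mean/standard-deviation match stands in for
    'color transfer'. The relative ordering of pixel values (the texture)
    is left unchanged because the mapping is a positive linear rescale.
    """
    mu_s, mu_e = statistics.mean(source), statistics.mean(exemplar)
    sd_s, sd_e = statistics.pstdev(source), statistics.pstdev(exemplar)
    scale = sd_e / sd_s if sd_s > 0 else 1.0
    return [(v - mu_s) * scale + mu_e for v in source]

# Hypothetical single-channel pixel values.
source = [10.0, 20.0, 30.0, 40.0]
exemplar = [100.0, 120.0, 140.0, 160.0]
result = transfer_channel(source, exemplar)
```

Applying the same function to each channel of an image would recolor it toward the exemplar while leaving its structure untouched.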

Q4. What is the effect of the bilateral filter on low-contrast images?

The bilateral filter smooths low-contrast regions while preserving high-contrast edges, but may fail for high-contrast images where either no abstraction is performed or salient visual features may be removed.
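The behavior described above can be seen on a one-dimensional signal. The following is a minimal sketch of a bilateral filter in plain Python; the function name, window radius, and the two sigma values are illustrative assumptions rather than settings taken from the paper.

```python
import math

def bilateral_1d(signal, radius=2, sigma_s=2.0, sigma_r=0.1):
    """Edge-preserving smoothing of a 1D signal with values in [0, 1]."""
    n = len(signal)
    out = []
    for i in range(n):
        total = 0.0
        weight_sum = 0.0
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            # Spatial weight: nearby samples count more.
            w_s = math.exp(-((i - j) ** 2) / (2 * sigma_s ** 2))
            # Range weight: samples with similar intensity count more.
            w_r = math.exp(-((signal[i] - signal[j]) ** 2) / (2 * sigma_r ** 2))
            total += w_s * w_r * signal[j]
            weight_sum += w_s * w_r
        out.append(total / weight_sum)
    return out

# Low-contrast ripple on each side of one high-contrast step.
signal = [0.10, 0.14, 0.10, 0.14, 0.10, 0.90, 0.94, 0.90, 0.94, 0.90]
smoothed = bilateral_1d(signal)
```

With these values the ripple on each side is flattened while the step between the fifth and sixth samples stays sharp, because the range weight across the step is close to zero. If every neighboring difference is large relative to `sigma_r`, almost no smoothing takes place, which corresponds to the failure case for high-contrast images.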

Q5. What is the meaning of the term "Extensions of photomosaicking"?

Work approximating images with irregular tiles (e.g., jigsaw image mosaics [73]) can be considered extensions of photomosaicking.

Q6. What is the common method of applying morphological smoothing to watercolor paintings?

Since watercolor paintings typically have light colors, Bousseau et al. [10] proposed to swap the order of the morphological operators and apply closing followed by opening.

Q7. How many man-hours of manual correction to optical flow fields were required to produce the short painterly scenes in the movie?

Green et al. [44] report that over 1000 man-hours of manual correction to optical flow fields were required to produce the short painterly scenes in the movie.

Q8. What is the way to preserve the visual richness of color photographs?

Qu et al. [125], for example, preserve the visual richness of color photographs by applying a range of stippling and related bitonal techniques to different regions in the image.

Q9. What is morphological smoothing?

These are related to order-statistics filters, and applying opening and closing in sequence results in a smoothing operation that is often referred to as morphological smoothing.
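A minimal sketch of morphological smoothing on a 1D grayscale signal follows, using a flat window as the structuring element. The function names and the `closing_first` flag are illustrative assumptions; the flag selects the swapped order (closing followed by opening) attributed above to Bousseau et al. [10].

```python
def erode(signal, radius=1):
    """Grayscale erosion: minimum over a sliding window."""
    n = len(signal)
    return [min(signal[max(0, i - radius):min(n, i + radius + 1)]) for i in range(n)]

def dilate(signal, radius=1):
    """Grayscale dilation: maximum over a sliding window."""
    n = len(signal)
    return [max(signal[max(0, i - radius):min(n, i + radius + 1)]) for i in range(n)]

def opening(signal, radius=1):
    """Erosion then dilation: removes small bright details."""
    return dilate(erode(signal, radius), radius)

def closing(signal, radius=1):
    """Dilation then erosion: removes small dark details."""
    return erode(dilate(signal, radius), radius)

def morphological_smoothing(signal, radius=1, closing_first=False):
    """Apply opening and closing in sequence, in either order."""
    if closing_first:
        return opening(closing(signal, radius), radius)
    return closing(opening(signal, radius), radius)

# One bright spike (9) and one dark spike (1) on a flat background.
signal = [5, 5, 9, 5, 5, 1, 5, 5]
opened = opening(signal)
closed = closing(signal)
smoothed = morphological_smoothing(signal)
smoothed_swapped = morphological_smoothing(signal, closing_first=True)
```

On this toy signal both orders remove both spikes. On real images the two orders generally differ, since whichever operator runs first decides whether bright or dark details are removed first.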

Q10. What role does image segmentation play in IB-AR algorithms?

Initially proposed by DeCarlo and Santella [29] as a mechanism for interactive abstraction of photographs (Sec. 4.1), image segmentation has become a cornerstone of many automatic IB-AR algorithms that make rendering decisions based on mid-level structure parsed from the image.

Q11. What are the main approaches to the transfer of artistic texture?

The majority of artistic EBR algorithms focus on the transfer of artistic texture, and borrow from the nonparametric patch-based methods used for texture synthesis and photo in-painting.

Q12. What are the different types of techniques used to browse a region containment hierarchy?

Various interactive techniques (human gaze-trackers [29], importance maps [5]) are used to browse a region containment hierarchy constructed by segmenting successively lower resolution versions of the source image.

Q13. What is the motivation of contemporary IB-AR work?

Although a few IB-AR systems of the early nineties cited their motivation as emulating the artist (i.e., passing the artistic Turing test), the frequently stated motivation of contemporary IB-AR work is to retain human creativity and to deliver useful tools and new artistic media.

Q14. Which processing steps were not present?

Also not present were the iterative application of the DoG filter [69] and the final smoothing pass to further reduce aliasing of edges.

Q15. What is the technique used to create a sequence of spline control points?

Given a starting or seed pixel, a sequence of spline control points is generated by iteratively hopping between pixels normal to the direction of the image gradient (Fig. 4).
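The hopping procedure can be sketched as follows. This is a simplified illustration that assumes a grayscale image stored as a list of rows; `trace_stroke`, the step length, and the stopping rules are hypothetical choices, not the parameters of the algorithm discussed in the paper.

```python
import math

def gradient(image, x, y):
    """Central-difference gradient at integer pixel (x, y), clamped at borders."""
    h, w = len(image), len(image[0])
    x0, x1 = max(x - 1, 0), min(x + 1, w - 1)
    y0, y1 = max(y - 1, 0), min(y + 1, h - 1)
    gx = (image[y][x1] - image[y][x0]) / 2.0
    gy = (image[y1][x] - image[y0][x]) / 2.0
    return gx, gy

def trace_stroke(image, seed, step=2.0, max_points=5):
    """Generate control points by hopping normal to the image gradient."""
    h, w = len(image), len(image[0])
    x, y = float(seed[0]), float(seed[1])
    points = [(x, y)]
    prev_dx, prev_dy = 0.0, 0.0
    for _ in range(max_points - 1):
        gx, gy = gradient(image, int(round(x)), int(round(y)))
        mag = math.hypot(gx, gy)
        if mag == 0.0:
            break  # flat region: no preferred direction
        # Rotate the unit gradient by 90 degrees to get the stroke direction.
        dx, dy = -gy / mag, gx / mag
        # Keep heading the same way as the previous hop.
        if dx * prev_dx + dy * prev_dy < 0:
            dx, dy = -dx, -dy
        x, y = x + step * dx, y + step * dy
        if not (0 <= x <= w - 1 and 0 <= y <= h - 1):
            break  # the stroke left the canvas
        points.append((x, y))
        prev_dx, prev_dy = dx, dy
    return points

# Horizontal intensity ramp: the gradient points along x everywhere.
image = [[float(x) for x in range(9)] for _ in range(9)]
points = trace_stroke(image, (4, 1))
```

On the ramp image the gradient points along the x axis, so the control points run straight down the column of the seed pixel, that is, along a line of constant intensity.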

Q16. What is the definition of a high-quality painting?

A high-quality painting is deemed to be one that matches the source image as closely as possible, using a minimal number of strokes but covering the maximum area of canvas in paint.
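One way to make this definition concrete is a weighted score in which lower is better. The sketch below is a hypothetical formulation: the three weights, the squared-difference mismatch term, and the use of `None` for unpainted pixels are all assumptions made for illustration, not the energy function used in the surveyed work.

```python
def painting_energy(source, canvas, num_strokes,
                    w_match=1.0, w_strokes=0.1, w_empty=1.0):
    """Score a candidate painting; lower is better.

    canvas holds painted intensities, or None where the canvas is bare.
    """
    # How far the painted pixels are from the source image.
    mismatch = sum((s - c) ** 2 for s, c in zip(source, canvas) if c is not None)
    # How much of the canvas is still unpainted.
    empty = sum(1 for c in canvas if c is None)
    return w_match * mismatch + w_strokes * num_strokes + w_empty * empty

source = [0.2, 0.2, 0.8, 0.8]
sparse = [0.2, 0.2, None, None]  # one stroke, half the canvas bare
full = [0.2, 0.2, 0.8, 0.8]      # two strokes, fully covered

energy_sparse = painting_energy(source, sparse, num_strokes=1)
energy_full = painting_energy(source, full, num_strokes=2)
```

With these weights the fully covered painting scores better even though it uses one more stroke, which reflects the trade-off between the three criteria in the definition.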
