Journal Article

State of the "Art": A Taxonomy of Artistic Stylization Techniques for Images and Video

TL;DR: This paper presents a taxonomy of the 2D NPR algorithms developed over the past two decades, structured according to the design characteristics and behavior of each technique, and describes a chronology of development from the semiautomatic paint systems of the early nineties through to the automated painterly rendering systems of the late nineties driven by image gradient analysis.
Abstract: This paper surveys the field of nonphotorealistic rendering (NPR), focusing on techniques for transforming 2D input (images and video) into artistically stylized renderings. We first present a taxonomy of the 2D NPR algorithms developed over the past two decades, structured according to the design characteristics and behavior of each technique. We then describe a chronology of development from the semiautomatic paint systems of the early nineties, through to the automated painterly rendering systems of the late nineties driven by image gradient analysis. Two complementary trends in the NPR literature are then addressed, with reference to our taxonomy. First, the fusion of higher level computer vision and NPR, illustrating the trends toward scene analysis to drive artistic abstraction and diversity of style. Second, the evolution of local processing approaches toward edge-aware filtering for real-time stylization of images and video. The survey then concludes with a discussion of open challenges for 2D NPR identified in recent NPR symposia, including topics such as user and aesthetic evaluation.

1 INTRODUCTION

  • As the advent of photography stimulated artistic diversity in the late 19th century, so did the successes of photorealistic computer graphics in the early nineties motivate alternative techniques for rendering in non-photorealistic styles.
  • It is this latter category of artistic rendering (AR) that forms the subject of this survey; specifically, techniques focusing on artistic stylization of two-dimensional content (photographs and video) to which the authors refer as image-based artistic rendering (IB-AR).
  • Today, IB-AR has diversified into a highly cross-disciplinary activity, which builds upon computer vision (CV), perceptual modeling, human computer interaction (HCI), and computer graphics.

Late 1980s Advances in media emulation

  • Development runs from the semi-automated SBR systems of the early nineties to increasingly automated systems drawing upon image processing.
  • Later the aesthetic gamut is enhanced through more sophisticated computer vision and edge-aware filtering.
  • Next, the authors describe how the early convergence of computer graphics and image processing developed, enabling IB-AR to draw increasingly upon the more sophisticated image analysis offered by contemporary computer vision algorithms (Sec. 4).
  • One consequence of the increasingly sophisticated interpretation or 'parsing' of the image was a divergence from SBR to alternative forms of rendering primitives: the use of regions and tiles which, in turn, unlocked greater diversity in the gamut of styles available to IB-AR.

2 TAXONOMY OF IB-AR TECHNIQUES

  • Early prototype IB-AR systems followed the SBR paradigm and synthesized artistic renderings by incrementally compositing virtual brush strokes whose color, orientation, scale, and ordering were derived from semi- [47] or fully automated processes [55] , [90] , [151] .
  • The problems of media emulation and stroke placement may be considered de-coupled.
  • The curved spline strokes placed by Hertzmann's [55] algorithm could be rendered by sweeping various brush models along their trajectories, to emulate thick oil paint, crayon, charcoal, or pastel, to name but a few different media.
  • The authors also survey nonlinear filters that introduce an anisotropy that conveys the impression of stroke placement.
  • Accordingly, their taxonomy avoids the categorization of IB-AR purely in terms of media (painterly, sketch, cartoon shading) and instead clusters the space of IB-AR algorithms by the elementary rendering primitive or stylization mechanism employed.

2.1 Stroke-based Rendering (SBR)

  • SBR algorithms cover a 2D canvas with atomic rendering primitives according to some process or desired end goal, designed to simulate a particular style.
  • In many SBR algorithms these primitives are the eponymous virtual brush stroke, but the definition of SBR has diversified to primitives including tiles, stipples and hatch marks [58] .

2.1.1 Brush Stroke Techniques

  • The most prevalent IB-AR algorithms are perhaps SBR algorithms that use either short dabs of paint or long curved brush strokes as rendering primitives.
  • The process of covering the canvas can be categorized broadly as local or global.
  • Local approaches typically drive stroke placement decisions based on the pixels in the spatial neighborhood of the stroke; this can be explicit in the algorithm (e.g., image moments within a window [140], [151]) or implicit due to a prior convolution (e.g., Sobel edges).
  • Global approaches instead optimize stroke placement over the whole canvas, using strategies ranging from snake relaxation [56] to evolutionary algorithms [17] and Monte-Carlo optimization [150].
  • In the parallel SBR branch of semi-automated (i.e., user-assisted) algorithms, the low/high-level distinction is again mirrored: early techniques relied on image filters to orient brush strokes [47], whereas later work, predating automated measures for emphasis, used gaze trackers to directly harness the perceptual measures inherent in the human visual system [131].

2.1.2 Mosaicking, Tiling and Stippling

  • A further sub-category of SBR aims to approximate the image using a medium other than colored pixels or paint, packing image regions with a multitude of atomic rendering primitives.
  • The techniques approximate the image content by either (i) stippling, the distribution of small points often for the purpose of tonal depiction; (ii) hatching, the use of line patterns or curves for the same; and (iii) mosaicking algorithms that pack small tiles together.
  • Stippling IB-AR techniques are closely related to digital half-toning and dithering algorithms that locally approximate regions using dot patterns, either with the sole goal of representing a local brightness or with an additional artistic intent [114] .
  • This culminated most recently in techniques designed to emphasize image structure [118] , following the trend toward perceptual analysis in SBR.
  • Aside from dedicated image-based hatching approaches [129], some techniques grow labyrinthine patterns using space-filling curves [26] or reaction-diffusion processes [123] that adapt to the intensity of the image.
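As a concrete point of reference for the half-toning connection above, the sketch below implements classic Floyd-Steinberg error diffusion. It is a standard dithering algorithm, not any specific IB-AR technique cited in this survey; it assumes a grayscale image given as a list of rows with values in [0, 1] and a fixed threshold of 0.5. Artistic stippling methods replace this fixed pixel grid with freely placed dots.

```python
def floyd_steinberg(img):
    """Binarize a grayscale image (values in [0, 1]) by Floyd-Steinberg
    error diffusion. Returns a new image of 0s and 1s."""
    h, w = len(img), len(img[0])
    buf = [row[:] for row in img]          # working copy that accumulates error
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            old = buf[y][x]
            new = 1 if old >= 0.5 else 0
            out[y][x] = new
            err = old - new
            # Push the quantization error onto not-yet-visited neighbors.
            if x + 1 < w:
                buf[y][x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    buf[y + 1][x - 1] += err * 3 / 16
                buf[y + 1][x] += err * 5 / 16
                if x + 1 < w:
                    buf[y + 1][x + 1] += err * 1 / 16
    return out
```

Because the error is propagated rather than discarded, a uniform 50% gray comes out with roughly half of its pixels set, which is exactly the "local brightness" goal mentioned above.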

2.2 Region-based Techniques

  • Much as SBR in the 1990s relied increasingly on low-level image processing (e.g., intensity gradient, moments, optical flow), a trend post-2000 was the emergence of mid-level computer vision in IB-AR.
  • For images, the authors categorize region-based approaches into those considering the arrangement of rendering primitives (e. g., strokes) within the interiors of regions and those manipulating shape, form, and composition of regions.
  • Both methodologies have seen applications to IB-AR for the purpose of cartooning or otherwise stylizing the appearance of video.
  • This frames the problem of IB-AR as one of automated rotoscoping.
  • Finally, when considering regions, it is possible to track and analyze the motion of objects.

2.3 Example-based Rendering

  • Most IB-AR algorithms encode a set of heuristics, typically emulating artistic practice with the goal of faithfully depicting a prescribed style.
  • A complementary approach to IB-AR, example-based rendering (EBR), pioneered by Hertzmann et al. [59], learns the mapping between an exemplar pair: a source image and an artist's rendering of that image.
  • Color EBR typically performs a piecewise mapping between the color histograms of two images to effect a nonphotorealistic recoloring.
  • Often there is only weak enforcement of spatial coherence in the color mapping process.
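  • In texture-based EBR, each neighborhood of the input image is matched to the most similar neighborhood in the exemplar source image; the corresponding patch from the exemplar artistic image is then pasted into place in the output rendering.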

2.4 Image Processing and Filtering

  • Many image processing filters have been explored for IB-AR but few have been recognized so far to produce interesting results from an artistic point of view.
  • Among the filtering approaches to IB-AR, the authors distinguish two major categories depending on the domain the techniques operate on.
  • Most approaches that have been derived from classical image processing techniques fall into this category.
  • The authors adopt the usual distinction between first and second order derivative methods.
  • Techniques based on the bilateral filter fall into the anisotropic diffusion category, since the bilateral filter can be interpreted as a fast filter-based approximation of anisotropic diffusion.

3 CLASSICAL STYLIZATION ALGORITHMS

  • IB-AR arguably began to gain momentum in the early 1990s with semi-automated paint systems such as Haeberli's [47] that enabled photos to be transformed into impressionist-like paintings with minimal labor.
  • People would click on a photo, each click prompting the generation of a virtual brush stroke of the underlying image color.
  • The introduction of noise prior to this process results in stroke color variation reminiscent of an impressionist painting.
  • Haeberli's system [47] was motivated by a desire to enrich digital painting by automating the color selection process (reducing the 'time to palette').
  • This concept of stroke-based rendering (SBR) [58] underpins almost all of the IB-AR work developed in the nineties, which sought increasingly to automate and to enhance the sophistication of stroke placement.

3.1 Local Algorithms for Stroke Placement

  • Haeberli's framework [47] automates the selection of stroke color, and for non-circular brush strokes can also decide stroke orientation by painting strokes orthogonal to the intensity gradient in the source image.
  • The system relies upon the user to determine the order and size of strokes (Fig. 3).
  • Adapting Haeberli's framework [47] to randomly assign stroke size and order leads to loss of salient detail [48] and motivated later the use of image processing operators for stroke placement.
  • The size and sequencing of stroke overpainting is crucial to producing results with an acceptable aesthetic and without any loss of salient detail.
  • With this solution strokes can be painted at sizes disproportionate to the features they represent.
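The gradient-driven orientation described in this section can be sketched as follows. This is a minimal illustration, not Haeberli's or Litwinowicz's code: it assumes a grayscale image stored as a list of rows, clamps coordinates at the border, and uses a hypothetical `place_strokes` helper that drops one stroke per grid cell, taking its color from the source pixel and its angle orthogonal to the Sobel gradient.

```python
import math

# Sobel kernels for the horizontal (x) and vertical (y) derivatives.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_at(img, x, y):
    """Return (gx, gy) at pixel (x, y), clamping coordinates at the border."""
    h, w = len(img), len(img[0])
    gx = gy = 0.0
    for j in range(3):
        for i in range(3):
            v = img[min(max(y + j - 1, 0), h - 1)][min(max(x + i - 1, 0), w - 1)]
            gx += SOBEL_X[j][i] * v
            gy += SOBEL_Y[j][i] * v
    return gx, gy


def stroke_angle(img, x, y):
    """Stroke direction orthogonal to the intensity gradient, i.e. along the edge.
    In flat regions the gradient is zero and the angle is arbitrary (pi/2 here)."""
    gx, gy = sobel_at(img, x, y)
    return math.atan2(gy, gx) + math.pi / 2


def place_strokes(img, spacing):
    """Hypothetical helper: one stroke per grid cell, colored from the source pixel."""
    return [{"x": x, "y": y, "angle": stroke_angle(img, x, y), "color": img[y][x]}
            for y in range(0, len(img), spacing)
            for x in range(0, len(img[0]), spacing)]
```

For a vertical step edge the gradient points horizontally, so the stroke angle at the edge is pi/2 and strokes run along the edge rather than across it.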

3.1.1 Early Pen-and-Ink Hatching Algorithms

  • Early semi-automated systems for rendering in pen-and-ink and cross-hatched styles follow a similar pattern of development.
  • Salisbury et al. [129] developed a semi-automated hatching system that oriented textures according to the underlying image gradients, much as Haeberli oriented brush strokes [47].
  • A multi-scale extension of the system [128] offered aesthetic improvements in the viewing of hatching patterns at multiple scales.
  • Region-based editing and manipulation of the underlying image gradient was later introduced by Salisbury et al. [130], enabling discontinuities and swirl effects to be introduced manually and improving tool expressiveness.

3.1.2 Early Painterly Rendering Algorithms

  • The first automatic solution to IB-AR described in full detail within the literature was a painterly rendering tool proposed by Litwinowicz [90] .
  • Strokes are rectangular, and oriented using Sobel gradients as done previously [47] , [48] .
  • Clipping strokes at image edges yields crisp boundaries and prevents strokes from unimportant regions over-painting more important ones.
  • Later, Hays and Essa [52] adopted a similar technique for interpolation in their video painting algorithm.
  • A notable exception is Treavett and Chen's [151] adoption of image moments computed local to each pixel.

3.2 Local Coarse-to-fine IB-AR Algorithms

  • Constant-sized rectangular strokes, even after clipping, generate an artificial regularity that could degrade the resulting aesthetic.
  • Hertzmann's algorithm [55] instead paints with strokes of multiple sizes over a low-pass image pyramid: each scale of the pyramid corresponds to a layer in the painting, and the coarsest scale is painted first with large strokes.
  • The latter enables texturing or bump-mapping of the stroke to produce a convincing oil-paint effect [57] .
  • Multi-resolution image analysis was adopted soon afterwards by half-toning algorithms that similarly distributed rendering primitives (stipples or short lines) across a low-pass pyramid [147].
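A much-simplified sketch of coarse-to-fine layered painting follows. It is an illustration under stated assumptions, not Hertzmann's algorithm: square dabs stand in for curved spline strokes, a box blur stands in for the Gaussian pyramid, and the names `paint_layers`, `radii` and `threshold` are hypothetical. Each radius is one layer; finer layers only repaint grid cells whose mean error against the blurred reference exceeds the threshold, so large strokes survive in flat areas and small strokes are spent on detail.

```python
def box_blur(img, r):
    """Mean filter of radius r, clipped at the border (a stand-in for a Gaussian)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[yy][xx] for yy in range(max(0, y - r), min(h, y + r + 1))
                    for xx in range(max(0, x - r), min(w, x + r + 1))]
            out[y][x] = sum(vals) / len(vals)
    return out


def paint_layers(src, radii=(4, 2, 1), threshold=0.05):
    """Coarse-to-fine painting with square dabs on a grayscale image."""
    h, w = len(src), len(src[0])
    canvas = [[None] * w for _ in range(h)]          # None marks unpainted pixels
    for r in sorted(radii, reverse=True):            # largest brush first
        ref = box_blur(src, r)                       # reference image for this layer
        dabs = []
        for cy in range(0, h, r):
            for cx in range(0, w, r):
                cell = [(x, y) for y in range(cy, min(h, cy + r))
                        for x in range(cx, min(w, cx + r))]
                err = [float("inf") if canvas[y][x] is None
                       else abs(canvas[y][x] - ref[y][x]) for x, y in cell]
                if sum(err) / len(err) > threshold:
                    # Center the dab on the worst pixel of the cell.
                    bx, by = cell[max(range(len(cell)), key=err.__getitem__)]
                    dabs.append((bx, by, ref[by][bx]))
        for bx, by, val in dabs:                     # composite this layer
            for y in range(max(0, by - r), min(h, by + r + 1)):
                for x in range(max(0, bx - r), min(w, bx + r + 1)):
                    canvas[y][x] = val
    return canvas
```

Painting a step edge with `radii=(4, 2, 1)` reproduces it far more closely than the single coarse layer `radii=(4,)`, while a flat image is covered entirely by the first layer.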

3.3 Video Stylization

  • The extension of SBR to video stylization is non-trivial, since independent per-frame rendering of the image sequence will result in a distracting flickering or scintillation in the animation.
  • It is, therefore, desirable that: 1) the motion of brush strokes both matches the motion of the underlying video content, and 2) the animated sequence is flicker free.
  • In the case of overly dense regions, strokes are deleted at random until the density reaches acceptable levels.
  • When a reliable optical flow estimate is available, the technique performs well.
  • In the late nineties real-time optical flow was impractical, leading Hertzmann and Perlin [60] to present an alternative video painting technique amenable to interactive rendering: a "Living Painting."
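The per-frame update described in this section (moving strokes with the optical flow and pruning overly dense regions) might look roughly like the following. This is a hypothetical sketch: it assumes a dense flow field stored as `flow[y][x] = (dx, dy)`, measures density on a coarse grid, and fills empty cells with a new stroke at the cell center. That filling rule is our simplification of the sparse-region case, which the summary does not describe.

```python
import random


def update_strokes(strokes, flow, width, height, cell=4, max_per_cell=2, seed=0):
    """Advect stroke anchors by a dense flow field, then regulate density.

    strokes: list of (x, y) float positions in pixel units.
    flow: flow[y][x] = (dx, dy), motion from this frame to the next (assumed format).
    """
    rng = random.Random(seed)
    moved = []
    for x, y in strokes:
        dx, dy = flow[int(y)][int(x)]
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height:   # strokes leaving the frame are dropped
            moved.append((nx, ny))

    # Bucket strokes by grid cell to measure local density.
    cells = {}
    for sx, sy in moved:
        cells.setdefault((int(sx) // cell, int(sy) // cell), []).append((sx, sy))

    kept = []
    for key in sorted(cells):
        group = cells[key]
        rng.shuffle(group)
        kept.extend(group[:max_per_cell])          # delete surplus strokes at random

    # Assumed rule for sparse areas: seed a new stroke in every empty cell.
    for cy in range(0, height, cell):
        for cx in range(0, width, cell):
            if (cx // cell, cy // cell) not in cells:
                kept.append((min(cx + cell / 2, width - 1), min(cy + cell / 2, height - 1)))
    return kept
```

The quality of this update depends entirely on the flow estimate, which is why the text notes that the technique performs well only when reliable optical flow is available.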

4 VISION FOR STYLIZATION

  • An increasing reliance upon local image processing techniques (predominantly the Sobel gradient operator) was instrumental in transforming the interactive IB-AR systems of the early nineties into fully automatic rendering systems.
  • Continuing this trend towards deeper image analysis, a major trend post-nineties was the tendency to rely increasingly upon higher-level computer vision to guide artistic rendering.
  • This trend began with the adoption of mid-level computer vision methods, specifically the use of image segmentation.

4.1 Perceptual Measures for Stylization

  • DeCarlo and Santella [29] were among the first to apply image segmentation in IB-AR.
  • Images were segmented using a variant of mean-shift [23] , [101] at multiple downsampled resolutions.
  • This hierarchical representation enabled an image to be rendered in a highly abstract form (using coarse regions from the top of the pyramid), or for certain regions to be locally decomposed into finer grain regions by descending the hierarchy.
  • Earlier painterly algorithms scaled strokes in inverse proportion to edge magnitude and so conserved all fine (i.e., high-frequency) detail in the painting unless it was interactively down-weighted, e.g., using manually specified masks [56].
  • While DeCarlo and Santella harnessed the power of the human visual system to generate their importance maps, a number of IB-AR algorithms were developed using fully automated measures of salience to drive emphasis in renderings.

4.2 Artistic Rendering as a Global Optimization

  • Most IB-AR algorithms in the late 1990s treated painting as a local process: pixels in the image are examined in turn and strokes placed according to various heuristics.
  • Each stroke is placed according to information in its local spatial neighborhood only.
  • By contrast, global approaches to IB-AR iteratively optimize the position of rendering elements (e.g. brush strokes, or stipples) to minimize some objective function defined to describe the 'optimality' according to one or more heuristics.
  • It was not until a decade later that the first algorithmic solution was described for painterly rendering [56] .

4.2.1 Global Approaches to SBR: Brush-based

  • Hertzmann [56] extended his local curved stroke painterly algorithm [55] by treating each stroke as an active contour or snake.
  • A snake is a piecewise curve, whose control points are iteratively updated to minimize an energy function.
  • In Hertzmann's optimization [56] , a single painting is created from the source photograph and iteratively updated to converge toward an aesthetic ideal.
  • The weights ω1 to ω4 control the influence of each quality attribute and are determined empirically.
  • A similar model of stroke redundancy was presented in the global approach of Szirányi et al. [150], using Markov chain Monte Carlo (MCMC) optimization.
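For concreteness, one plausible form of the weighted energy referred to above is given below. The four terms are an assumption based on our reading of Hertzmann's relaxation formulation rather than this summary: P is the painting, I the source photograph, the first term penalizes color deviation, the second total stroke area, the third the number of strokes, and the fourth the number of unpainted pixels.

```latex
E(P) = \omega_1 \sum_{(x,y)} \lVert P(x,y) - I(x,y) \rVert
     + \omega_2 \sum_{S \in P} \mathrm{Area}(S)
     + \omega_3 \, N_{\mathrm{strokes}}(P)
     + \omega_4 \, N_{\mathrm{empty}}(P)
```

In a relaxation scheme of this kind, stroke control points are perturbed and a change is kept only if it lowers E(P), so the weights trade fidelity to the photograph against economy of strokes.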

4.2.2 Global Approaches to SBR: Tonal Depiction

  • A purely tonal IB-AR depiction is achieved using stippling.
  • In practice, however, random noise can only partially remove artifacts.
  • By contrast, most stippling algorithms seek to minimize such artifacts, and in this sense such techniques are related to image-based hatching approaches [115], [130] that also take structure into account in placing marks.
  • Kim et al. [74] generate stipple dot distributions with the same statistical properties as those created by artists.
  • The authors now describe this area of example-based rendering in greater detail.

4.3.1 Texture by Analogy

  • The majority of artistic EBR algorithms focus on the transfer of artistic texture, and borrow from the nonparametric patch-based methods used for texture synthesis and photo in-painting.
  • Such methods (e.g., due to Efros et al. [35], [36]) in-fill 'holes' in an image from their edges, iteratively copying patches from elsewhere in the image that share similarity with adjacent texture.
  • PCA is used to reduce the dimensionality of the patch descriptors, since approximate nearest-neighbor (ANN) search can be time-consuming in high dimensions (i.e., for large patch sizes).
  • Only the normalized luminance channel is considered.
  • Video EBR is challenging due to the problem of constraining patch choice to satisfy not only local and global spatial coherence terms but also temporal coherence.
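The core matching step of texture by analogy can be sketched by brute force, as below. This is a deliberately naive illustration of the A : A' :: B : B' idea on single-channel (luminance) images; it omits the multi-scale pyramid, the coherence term, the use of already-synthesized B' neighborhoods, and the PCA/ANN acceleration mentioned above. The function names and the window radius `r` are hypothetical.

```python
def neighborhood(img, x, y, r):
    """Flattened (2r+1)x(2r+1) window around (x, y), clamped at the border."""
    h, w = len(img), len(img[0])
    return [img[min(max(y + j, 0), h - 1)][min(max(x + i, 0), w - 1)]
            for j in range(-r, r + 1) for i in range(-r, r + 1)]


def ssd(p, q):
    """Sum of squared differences between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def analogy(a, a_prime, b, r=1):
    """Brute-force analogy A : A' :: B : B' on luminance images: for every
    pixel of B, find the pixel of A with the closest neighborhood and copy
    the corresponding value of the artistic exemplar A'."""
    feats_a = [((x, y), neighborhood(a, x, y, r))
               for y in range(len(a)) for x in range(len(a[0]))]
    b_prime = []
    for y in range(len(b)):
        row = []
        for x in range(len(b[0])):
            fb = neighborhood(b, x, y, r)
            (ax, ay), _ = min(feats_a, key=lambda item: ssd(item[1], fb))
            row.append(a_prime[ay][ax])   # copy the stylized value from A'
        b_prime.append(row)
    return b_prime
```

The exhaustive search costs O(|A| x |B|) window comparisons, which is precisely why practical systems reduce descriptor dimensionality and use approximate nearest-neighbor search.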

4.3.2 Color Transfer

  • Manipulating color tone can affect the mood of an artistically rendered image, and forms a useful addition to the IB-AR toolbox.
  • Early approaches model the histogram as unimodal, equalizing the mean and variance of the source and target image (either as three 1D per-channel operations [126] or in 3D space [107] ).
  • More sophisticated approaches adapt to edges by considering image gradients [165] or perform matching of the histogram at multiple scales [124] .
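The unimodal, per-channel variant described above reduces to matching the mean and standard deviation of each channel. The sketch below assumes pixels given as 3-tuples that have already been converted into a decorrelated color space (such as lαβ); that conversion is not shown, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def transfer_channel(src, ref):
    """Shift and scale src values so their mean and std match those of ref."""
    ms, ss = mean(src), pstdev(src)
    mr, sr = mean(ref), pstdev(ref)
    scale = sr / ss if ss > 0 else 0.0   # a constant channel just takes ref's mean
    return [(v - ms) * scale + mr for v in src]


def color_transfer(src_pixels, ref_pixels):
    """Per-channel statistics transfer on lists of (c0, c1, c2) tuples."""
    channels = [transfer_channel([p[c] for p in src_pixels],
                                 [p[c] for p in ref_pixels])
                for c in range(3)]
    return list(zip(*channels))
```

Treating the three channels independently is only reasonable when they are roughly decorrelated, which motivates the choice of color space and, in the 3D variant, matching the full covariance instead.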

4.4 Region-based IB-AR Algorithms

  • Initially proposed by DeCarlo and Santella [29] as a mechanism for interactive abstraction of photographs (Sec. 4.1), image segmentation has become a cornerstone of many automatic IB-AR algorithms that make rendering decisions based on mid-level structure parsed from the image.
  • The ability to harness structural representations of image content led to greater diversity of style (unlocking styles such as stained glass rendering or compositional artwork such as pseudo-Cubism).
  • Arguably, aesthetics were also open for improvement as style and emphasis could be controlled at a higher level (e.g., regions) rather than in response to low-level features.

4.4.1 Region Painting and Texturing

  • The earliest region-based IB-AR algorithms focused on painterly rendering and were essentially SBR algorithms that used the shape of the region rather than an image gradient field (as common in pre-2000 SBR) to guide the placement of strokes [41] , [77] .
  • Shugrina et al. [141] filled region interiors with brush strokes aligned with the principal axis but placed brush strokes on the region boundaries for outlines.
  • The systems described so far only make use of the color and gradient information within regions.
  • The classification drives the type of stroke placed, based on a pre-digitized database of stroke textures from real brushwork mapped to each texture category.
  • Variants of flat shading using only black and white were presented by Xu and Kaplan [167] and sought to depict the underlying image tone whilst discouraging connected regions of similar tone.

4.4.2 Deformation and Composition

  • Song et al. [144] classify regions into one of several canonical shapes and replace regions with those shapes to create a simplified shape rendering resembling a paper cut out.
  • Region deformation was also employed to warp regions into superquadric shapes reminiscent of Cubist renderings [16] .
  • This work also re-arranges the position of regions in order to create abstract compositions; arguably styles such as Cubism could not be generated without region-based analysis.
  • Shape simplification was also explored by Mi et al. [103] through decomposition into parts rather than substitution with simpler shapes [144] .

4.5 Region Tiling and Packing Algorithms

  • A considerable volume of IB-AR literature addresses the arrangement of a multitude of small tiles (from regular shapes to irregular pictograms) to form artistic representations.
  • These mosaicking algorithms are typically phrased as optimization problems seeking to maximize coverage of a 2D region, whilst minimizing tile overlap.
  • The tile placement is content-aware, penalizing solutions that misalign tiles to cross edges in the image.
  • A spatial coherence term is often introduced to encourage smoothly varying scale and orientation over the tiled region.

4.5.1 Photo and Video Mosaics

  • The rectilinear tiling of small image thumbnails to approximate a larger image (so-called photomosaics) was among the earliest forms of synthetic mosaic, inspired by early physical macro-artwork such as Dali's Abraham Lincoln.
  • Thumbnails are often chosen to have a semantic connection to the larger image being created, as in Dali's work.
  • The IB-AR literature describes optimized search strategies for expedited rendering of photomosaics [6] as well as alternative optimization strategies such as evolutionary search [14] .
  • Klein et al. [76] extended photomosaics to video, updating elements of the mosaic to approximate video content whilst penalizing frequent changes of a given tile to prevent flicker.
  • Work approximating images with irregular tiles (e.g., jigsaw image mosaics [73]) can be considered an extension of photomosaicking.
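A minimal photomosaic tile selector, assuming that each target cell and each thumbnail is summarized by its mean RGB color, might look like this. The `reuse_penalty` parameter is a hypothetical stand-in for the anti-repetition and anti-flicker terms discussed above, not a reconstruction of any cited optimizer.

```python
def mean_color(tile):
    """Average (r, g, b) of a tile given as a list of pixel tuples."""
    n = len(tile)
    return tuple(sum(p[c] for p in tile) / n for c in range(3))


def photomosaic(target_cells, thumbnails, reuse_penalty=0.0):
    """Pick one thumbnail index per target cell by minimizing the squared RGB
    distance between mean colors plus a penalty on repeated use."""
    thumb_means = [mean_color(t) for t in thumbnails]
    uses = [0] * len(thumbnails)
    choice = []
    for cell in target_cells:
        m = mean_color(cell)
        best = min(range(len(thumbnails)),
                   key=lambda k: sum((a - b) ** 2 for a, b in zip(thumb_means[k], m))
                   + reuse_penalty * uses[k])
        uses[best] += 1
        choice.append(best)
    return choice
```

This greedy, cell-by-cell choice is the simplest possible search; the optimized and evolutionary strategies cited above exist because global constraints (repetition, coherence over time) make greedy selection suboptimal.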

4.5.2 Voronoi Methods

  • The earliest mosaic-like renderings relied on Voronoi diagrams constructed from points randomly seeded over the image [47] .
  • Dobashi et al. [32] modified this approach to iteratively relax the position of the Voronoi seeds to better approximate the image using a mean-squared error (MSE) between the source and rendered image.
  • Faustino et al. [39] place regular tiles instead of relying on Voronoi segments but guide tile placement using Voronoi regions.
  • The tiles are scaled in proportion to image size to preserve detail.
  • Grundland et al. [46] form Voronoi segments according to both edge strength and image intensity.
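The sketch below renders a discrete Voronoi mosaic and performs one relaxation step. Note that it swaps in an unweighted Lloyd update (moving each seed to the centroid of its cell) rather than Dobashi et al.'s MSE-driven relaxation; the `mse` helper only reports the approximation error that such a method would minimize. Images are grayscale lists of rows, and all names are hypothetical.

```python
def assign(seeds, w, h):
    """Discrete Voronoi diagram: label every pixel with the index of its nearest seed."""
    return [[min(range(len(seeds)),
                 key=lambda k: (seeds[k][0] - x) ** 2 + (seeds[k][1] - y) ** 2)
             for x in range(w)] for y in range(h)]


def render(img, labels, n):
    """Fill each of the n cells with the mean intensity of the pixels it covers."""
    sums, counts = [0.0] * n, [0] * n
    for y, row in enumerate(labels):
        for x, k in enumerate(row):
            sums[k] += img[y][x]
            counts[k] += 1
    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return [[means[k] for k in row] for row in labels]


def mse(a, b):
    """Mean squared error between two equally sized grayscale images."""
    diffs = [(p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def lloyd_step(seeds, labels):
    """Move each seed to the centroid of its cell; empty cells keep their seed."""
    n = len(seeds)
    sx, sy, cnt = [0.0] * n, [0.0] * n, [0] * n
    for y, row in enumerate(labels):
        for x, k in enumerate(row):
            sx[k] += x
            sy[k] += y
            cnt[k] += 1
    return [(sx[k] / cnt[k], sy[k] / cnt[k]) if cnt[k] else seeds[k] for k in range(n)]
```

Alternating `assign` and `lloyd_step` spreads the seeds evenly; an error-driven variant would instead accept seed moves that lower `mse(render(...), img)`.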

4.5.3 Packing and Tessellation methods

  • Hausner et al. [51] were the first to address irregular tile shapes through an energy minimization scheme for shape packing.
  • Kim et al.'s [73] jigsaw image mosaics (JIM) extended this approach using an active contour based optimization scheme to minimize the energy function to allow moderate tile deformation.
  • Branch-and-bound heuristics are used to improve search efficiency (Fig. 9(d)).
  • The work follows up on an earlier specific case of irregular tiling: calligraphic (text) packing [166] .
  • Hurtut et al. [63] combined the principles of texture modeling and mosaicking to learn statistical distributions of tiles.

4.6 Computer Vision for Video Stylization

  • A major goal in video stylization is temporal coherence; requiring video to exhibit minimal flicker and the rendering primitives (e. g., strokes) to move with the underlying video content.
  • Early algorithms for 2D video stylization were based on per-pixel analysis using optical flow and frame differencing (Sec. 3.3).
  • Temporal incoherence is common in such algorithms [60] , [90] since stroke placement decisions are being made on a spatially (per-pixel) and temporally (per-frame) local basis.
  • Higher-level analysis of visual structure, e.g., through computer vision, can lead to improved coherence.
  • The authors now survey two categories of post-nineties algorithms: techniques based on optical flow and segmentation-based methods.

4.6.1 Visual Stylization through Optical Flow

  • Approaches that employ optical flow to stylize video were revisited by Hays and Essa [52] .
  • To mitigate temporal incoherence arising from flow estimates, strokes were categorized as weak or strong, the latter lying in edge areas where gradients are higher.
  • Park and Yoon [122] adopted a similar strong-weak categorization.
  • Blended texture patches were moved not only forward but also backward in time using a bi-directional estimate of optical flow.
  • This mitigated the cumulative errors inherent in the forward propagation strategies of prior approaches.

4.6.2 Visual Stylization through Segmentation

  • Segmentation is now a common component in IB-AR, and by leveraging a similar mid-level representation for video, the consistent motion of strokes within an object can be enforced.
  • These benefits come at the cost of generality; not all objects are amenable to segmentation (e.g., smoke or water).
  • Regions were associated over time using a space-time region adjacency graph that pruned sporadic association to improve stability.
  • Painterly and cartoon effects were demonstrated by filling regions with strokes and textures that deform coherently with the boundary.
  • In the system of Kagaya et al. [66] , the video is first segmented into spatial-temporal coherent regions.

4.6.3 Motion Stylization

  • Video analysis at the region level enables not only consistent rendering within objects, but also facilitates the analysis of object motion.
  • Automated methods to generate speed-lines in video require camera motion compensation, as the camera typically pans to track objects.
  • This can be approximated by estimating inter-frame homographies.
  • Chenney et al. [13] presented early work automatically deforming objects to emphasize motion.
  • Other distortions warping the object according to velocity or acceleration emphasized drag or inertia.

5 IMAGE PROCESSING AND FILTERING

  • Many of the techniques described in the previous sections are infeasible for real-time rendering and cannot be trivially adapted for multi-core CPUs or GPUs.
  • Image processing techniques performing local filtering operations provide an interesting alternative since parallelization and GPU implementations are straightforward in most cases.
  • Moreover, a number of filtering techniques have been shown to perform with reasonable temporal coherence when processed frame by frame.
  • These advantages, however, come at the expense of style diversity afforded by higher-level interpretation of content.

5.1 Bilateral Filter and Difference of Gaussians

  • A fully automatic pipeline for the stylization of cartoon renderings based on images and videos was first proposed in the seminal work by Winnemöller et al. [164] .
  • After the conversion to CIELab, the input is iteratively abstracted using the bilateral filter.
  • Furthermore, iterative filtering may blur edges, resulting in a washed-out appearance (Fig. 12(d)).
  • The next section discusses a further popular approach.
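Returning to the pipeline of Section 5.1, its two ingredients (iterated bilateral filtering for abstraction and a difference-of-Gaussians edge map with soft thresholding) can be sketched as below. The sketch works on a single grayscale channel (e.g., CIELab L scaled to [0, 1]) stored as a list of rows. The parameter values (`sigma_d`, `sigma_r`, `tau=0.98`, the sqrt(1.6) ratio between the two Gaussians, `phi`) are illustrative assumptions; luminance quantization and GPU-friendly separable filtering are omitted, and edges are computed from the input rather than the abstracted image for simplicity.

```python
import math


def smooth(img, sigma_d, sigma_r=None):
    """Gaussian blur with spatial sigma_d; if sigma_r is given, neighbors are
    also weighted by intensity similarity, which makes it a bilateral filter."""
    h, w = len(img), len(img[0])
    r = int(math.ceil(2 * sigma_d))
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            num = den = 0.0
            for yy in range(max(0, y - r), min(h, y + r + 1)):
                for xx in range(max(0, x - r), min(w, x + r + 1)):
                    wgt = math.exp(-((xx - x) ** 2 + (yy - y) ** 2) / (2 * sigma_d ** 2))
                    if sigma_r is not None:
                        wgt *= math.exp(-(img[yy][xx] - img[y][x]) ** 2 / (2 * sigma_r ** 2))
                    num += wgt * img[yy][xx]
                    den += wgt
            out[y][x] = num / den
    return out


def abstract(img, iterations=2, sigma_d=2.0, sigma_r=0.1):
    """Iterated bilateral filtering: flattens low-contrast areas, keeps strong edges."""
    for _ in range(iterations):
        img = smooth(img, sigma_d, sigma_r)
    return img


def dog_edges(img, sigma_e=1.0, tau=0.98, phi=10.0):
    """Soft-thresholded DoG: 1.0 (white) where D > 0, darker lines where D < 0."""
    g1 = smooth(img, sigma_e)
    g2 = smooth(img, math.sqrt(1.6) * sigma_e)
    return [[1.0 if a - tau * b > 0 else 1.0 + math.tanh(phi * (a - tau * b))
             for a, b in zip(r1, r2)] for r1, r2 in zip(g1, g2)]


def cartoon(img):
    """Combine abstraction and edge lines by multiplication (values in [0, 1])."""
    smoothed, edges = abstract(img), dog_edges(img)
    return [[a * e for a, e in zip(r_a, r_e)] for r_a, r_e in zip(smoothed, edges)]
```

With a small range sigma, weights across a strong step edge are negligible, so the bilateral filter leaves the step intact while a plain Gaussian would blur it; repeated iterations are what eventually wash out weaker edges.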

5.3 Diffusion and Shock Filter

  • Osher and Rudin [112] as well as Weickert [160] recognized the artistic merit of shock filtered imagery, but the work of Kang and Lee [68] was the first to apply diffusion in combination with shock filtering for IB-AR.
  • Mean curvature flow (MCF) also creates blurred edges, leading Kang and Lee [68] to perform de-blurring with a shock filter after some MCF iterations, which helps to preserve edges.
  • Left unconstrained, MCF can also produce diffusion that deviates from the local image structure (Fig. 12(f)). Moreover, MCF and its constrained variant contract isophote curves to points.
  • For this reason, important image features must be protected by a user-defined mask.
  • A further limitation is that the technique is not stable against small changes in the input and, therefore, not suitable for per-frame video processing.
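The edge-sharpening role of the shock filter can be illustrated with a common discrete min/max formulation: where the Laplacian is negative a pixel takes the maximum of its 3x3 neighborhood (dilation), and where it is positive the minimum (erosion). This is a sketch under that assumption, not the PDE scheme of Osher and Rudin [112] or Kang and Lee's [68] combined MCF and shock filtering pipeline; the function names are hypothetical.

```python
def laplacian(img, x, y):
    """5-point Laplacian with border clamping."""
    h, w = len(img), len(img[0])

    def at(xx, yy):
        return img[min(max(yy, 0), h - 1)][min(max(xx, 0), w - 1)]

    return at(x + 1, y) + at(x - 1, y) + at(x, y + 1) + at(x, y - 1) - 4 * at(x, y)


def shock_step(img):
    """One iteration of a discrete shock filter (min/max form)."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[yy][xx] for yy in range(max(0, y - 1), min(h, y + 2))
                      for xx in range(max(0, x - 1), min(w, x + 2))]
            lap = laplacian(img, x, y)
            if lap < 0:        # bright side of an edge: dilate
                row.append(max(window))
            elif lap > 0:      # dark side of an edge: erode
                row.append(min(window))
            else:              # flat region or inflection point: unchanged
                row.append(img[y][x])
        out.append(row)
    return out
```

A blurred ramp such as 0, 0, 0.25, 0.75, 1, 1 snaps to a sharp 0/1 step in one iteration, which is the de-blurring effect described above.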

5.4 Morphological Filtering

  • Mathematical morphology (MM) provides a set-theoretic approach to image analysis and processing.
  • For grayscale images, dilation is equivalent to a maximum filter and erosion corresponds to a minimum filter.
  • Morphological smoothing is applied in Bousseau et al.'s [9], [10] work on watercolor rendering and in Bangham et al.'s [5] oil paintings to simplify input images and videos before rendering.
  • Because opening and closing are dual, this is equivalent to inverting the output of morphological smoothing applied to the inverted image.
  • Then, for every pixel the probability of the pixel's value belonging to a certain cluster is defined.
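Morphological smoothing as described above can be sketched with rank filters, using the grayscale identities stated earlier (dilation is a maximum filter, erosion a minimum filter). The sketch assumes a square (2r+1)x(2r+1) structuring element clipped at the image border; defining smoothing as an opening followed by a closing is one common convention, and the helper names are hypothetical.

```python
def _rank_filter(img, r, pick):
    """Apply pick (min or max) over a (2r+1)x(2r+1) window clipped to the image."""
    h, w = len(img), len(img[0])
    return [[pick(img[yy][xx]
                  for yy in range(max(0, y - r), min(h, y + r + 1))
                  for xx in range(max(0, x - r), min(w, x + r + 1)))
             for x in range(w)] for y in range(h)]


def dilate(img, r=1):
    return _rank_filter(img, r, max)   # grayscale dilation = maximum filter


def erode(img, r=1):
    return _rank_filter(img, r, min)   # grayscale erosion = minimum filter


def opening(img, r=1):
    return dilate(erode(img, r), r)    # removes bright details smaller than the element


def closing(img, r=1):
    return erode(dilate(img, r), r)    # removes dark details smaller than the element


def morph_smooth(img, r=1):
    """Opening followed by closing: suppresses small bright, then small dark, details."""
    return closing(opening(img, r), r)
```

Because the maximum of inverted values is the inverted minimum, opening an image equals inverting the closing of the inverted image; this is the duality the text relies on.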

5.5 Gradient Domain Techniques

  • In recent years, gradient domain methods have become very popular in computer vision and computer graphics [3] .
  • The basic idea behind such methods is to construct a gradient field representing the result.
  • Using scale-space analysis, they extracted a multi-scale Canny edge representation with lifetime and best scale information, which is used to define the gradient field and allows for image operations such as detail removal and shape abstraction.
  • Besides being computationally expensive, this technique is also known not to create temporally coherent output for video.
  • Bhat et al. [8] have presented a robust optimization framework that allows for the specification of zero-order (pixel value) and first-order (gradient value) constraints over space and time.
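The basic gradient-domain idea (construct a target gradient field, then solve for the image whose gradients best match it) can be sketched as a Poisson reconstruction. This is a minimal illustration, not the multi-scale Canny method or Bhat et al.'s framework: gradients are forward differences, the image border is held fixed, the hypothetical "editing" step simply zeroes weak gradients, and the Poisson equation is solved with plain Gauss-Seidel sweeps.

```python
import math


def gradients(img):
    """Forward differences; the last column of gx and last row of gy are zero."""
    h, w = len(img), len(img[0])
    gx = [[img[y][x + 1] - img[y][x] if x + 1 < w else 0.0 for x in range(w)] for y in range(h)]
    gy = [[img[y + 1][x] - img[y][x] if y + 1 < h else 0.0 for x in range(w)] for y in range(h)]
    return gx, gy


def suppress_small(gx, gy, t):
    """Crude detail removal: zero every gradient whose magnitude is below t."""
    keep = [[math.hypot(a, b) >= t for a, b in zip(ra, rb)] for ra, rb in zip(gx, gy)]
    sx = [[a if k else 0.0 for a, k in zip(ra, rk)] for ra, rk in zip(gx, keep)]
    sy = [[b if k else 0.0 for b, k in zip(rb, rk)] for rb, rk in zip(gy, keep)]
    return sx, sy


def poisson(gx, gy, boundary, iters=2000):
    """Solve lap(u) = div(g) inside the image with u = boundary on the border.
    The backward-difference divergence matches the forward-difference
    gradients, so unmodified gradients reproduce the original image."""
    h, w = len(boundary), len(boundary[0])

    def on_border(x, y):
        return x in (0, w - 1) or y in (0, h - 1)

    u = [[boundary[y][x] if on_border(x, y) else 0.0 for x in range(w)] for y in range(h)]
    div = [[(gx[y][x] - (gx[y][x - 1] if x > 0 else 0.0))
            + (gy[y][x] - (gy[y - 1][x] if y > 0 else 0.0)) for x in range(w)]
           for y in range(h)]
    for _ in range(iters):
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                u[y][x] = (u[y][x - 1] + u[y][x + 1] + u[y - 1][x] + u[y + 1][x]
                           - div[y][x]) / 4
    return u
```

Gauss-Seidel is used only for brevity; it converges slowly on large images, which is one reason the text describes gradient-domain stylization as computationally expensive and practical systems use multigrid or FFT-based solvers.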

6 FUTURE CHALLENGES

  • Over the past two decades, IB-AR has delivered many high-quality expressive rendering algorithms and interactive systems.
  • As the field gathered momentum, researchers sought to identify the key emerging challenges.
  • The terms artistic rendering and artistic stylization are also in common parlance, whilst illustrative visualization is used for approaches addressing Salesin's third challenge.
  • DeCarlo and Stone's discussion focused on visual explanations: IB-AR can enhance communication by simplification through structural abstraction.

6.1 Evaluation

  • Almost a decade after Salesin's panel discussion of this problem, few papers present structured methodologies for evaluation.
  • Evaluation work more closely aligned with Salesin's visual communication challenge was proposed by Gooch et al. [43] and Winnemöller et al. [164] in their portrait abstractions.
  • Methodologies have been developed to evaluate specific aspects of IB-AR such as visual interest [132] and stippling aesthetics [96] .
  • No gold standard methodology has emerged for NPR evaluation.

6.2 Interaction

  • Rather than passing the artistic Turing test, the frequently stated motivation of contemporary IB-AR work is to retain human creativity and to deliver useful tools and new artistic media.
  • This trend also reflects the limitations of contemporary computer vision and shows that, by carefully designing minimal but well-placed interaction, a high-quality automated visual effects workflow can result.
  • Addressing this is especially important if, as Gooch et al. [40] suggest, IB-AR's priority is to develop new artistic media and tools.
  • Collaboration with end-users is essential in closing this cycle.
  • Connections could be forged with research communities studying computational creativity and evolutionary art.

6.3 Technical directions

  • The technical direction of algorithmic research in IB-AR is difficult to predict in the long term, but it may follow several established mid-term trends.
  • Willats and Durand [162] clearly differentiate between such renderings and current IB-AR when writing about the distinction between spatial and depictive systems.
  • By contrast, video stylization approaches based on computer vision can perform more aggressive abstraction through mid-level scene parsing (e. g., segmentation) at the cost of generality.
  • There is a tendency for complex image processing decisions to become less stable in the presence of noise.
  • Overall aesthetics are heavily influenced by media realism, especially in the emulation of traditional artistic styles.


HAL Id: hal-00781502
https://hal.inria.fr/hal-00781502
Submitted on 19 Jul 2013
State of the ‘Art’: A Taxonomy of Artistic Stylization Techniques for Images and Video
Jan Eric Kyprianidis, John Collomosse, Tinghuai Wang, Tobias Isenberg
To cite this version:
Jan Eric Kyprianidis, John Collomosse, Tinghuai Wang, Tobias Isenberg. State of the ‘Art’: A Taxonomy of Artistic Stylization Techniques for Images and Video. IEEE Transactions on Visualization and Computer Graphics, Institute of Electrical and Electronics Engineers, 2013, 19 (5), pp. 866-885. doi: 10.1109/TVCG.2012.160. hal-00781502.

IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. 19, NO. 5, MAY 2013 (AUTHORS’ VERSION) 1
State of the ‘Art’: A Taxonomy of Artistic Stylization Techniques for Images and Video
Jan Eric Kyprianidis, John Collomosse, Tinghuai Wang, and Tobias Isenberg
Abstract. This paper surveys the field of non-photorealistic rendering (NPR), focusing on techniques for transforming 2D input (images and video) into artistically stylized renderings. We first present a taxonomy of the 2D NPR algorithms developed over the past two decades, structured according to the design characteristics and behavior of each technique. We then describe a chronology of development from the semi-automatic paint systems of the early nineties, through to the automated painterly rendering systems of the late nineties driven by image gradient analysis. Two complementary trends in the NPR literature are then addressed, with reference to our taxonomy. First, the fusion of higher level computer vision and NPR, illustrating the trends toward scene analysis to drive artistic abstraction and diversity of style. Second, the evolution of local processing approaches toward edge-aware filtering for real-time stylization of images and video. The survey then concludes with a discussion of open challenges for 2D NPR identified in recent NPR symposia, including topics such as user and aesthetic evaluation.
Index Terms: Image and Video Stylization, Non-photorealistic Rendering (NPR), Artistic Rendering.
1 INTRODUCTION
As the advent of photography stimulated artistic diversity in the late 19th century, so did the successes of photorealistic computer graphics in the early nineties motivate alternative techniques for rendering in non-photorealistic styles. Two decades later, the field of non-photorealistic rendering (NPR) has expanded into a vibrant area of research covering a plethora of expressive rendering styles for visual communication: exploded diagrams [88], false color [124], [126], and artistic styles such as painterly [10], [168] and constrained palette rendering [106], [167]. It is this latter category of artistic rendering (AR) that forms the subject of this survey; specifically, techniques focusing on artistic stylization of two-dimensional content (photographs and video), which we refer to as image-based artistic rendering (IB-AR).
IB-AR’s origins reach back to seminal works exploring the emulation of traditional artistic media and styles [25], [47], [55], [90], [130]. Today, IB-AR has diversified into a highly cross-disciplinary activity, which builds upon computer vision (CV), perceptual modeling, human computer interaction (HCI), and computer graphics. Many classic IB-AR problems have been found to closely relate to long-standing problems in computer graphics or computer vision; for example, video cartooning [21], [156] and its relationship to video matting and automated rotoscoping [2]. In many cases computer graphics problems have benefited from or motivated entirely new computer vision research. Similarly, the goal of much IB-AR research, that of producing a creative or artistic tool, demands a careful, user-led HCI design process.
J. E. Kyprianidis is with the Computer Graphics Systems Group of the Hasso-Plattner-Institut, University of Potsdam, Germany.
T. Wang and J. Collomosse are with the Centre for Vision, Speech and Signal Processing, University of Surrey, UK.
T. Isenberg is with the University of Groningen’s Johan Bernoulli Institute, the Netherlands, and with DIGITEO/CNRS/INRIA, France.
This is the authors’ version of the work. The definitive version was published in IEEE Transactions on Visualization and Computer Graphics, Vol. 19, No. 5, pp. 866–885, 2013. doi: 10.1109/TVCG.2012.160.
Despite several years of disciplinary convergence and the resulting improvements in aesthetic quality and diversity, there have been few surveys of the IB-AR literature in the past decade. Common references for IB-AR are the texts of Gooch and Gooch [42] and Strothotte and Schlechtweg [148], both of which surveyed pre-2000 techniques (Sec. 3). The majority of other survey material takes the form of conference tutorials, yet these primarily focus upon illustrative visualization [95] or NPR for 3D graphics and games [100]. This survey follows up a recent tutorial [18] by some of the authors at Eurographics 2011, prior to which the most recent major conference tutorials on the topic were by Hertzmann et al. [95] in 2003 and Green et al. [44] in 1999. In addition, curated web-based bibliographies are available from Reynolds [127] (to 2004), Schlechtweg [133] (to 2007), and Stavrakis [145].
This article delivers a comprehensive view of the IB-AR landscape, covering classical and contemporary techniques while offering two perspectives. First, we provide an up-to-date taxonomy of IB-AR techniques in which algorithms are grouped according to the family of techniques used (e. g., nonlinear filters, region segmentation) or design characteristics (e. g., local greedy or global optimization approaches to rendering).
Second, we present IB-AR’s development in chronological order, from the early nineties to the modern day (c. 2011), to reflect the contemporaneous development of techniques clustered together in our taxonomy; for example, local methods followed later by global methods. We first document ‘classical’ (pre-2000) IB-AR and so introduce the key concepts and algorithms that continue to underpin and influence more contemporary methods (Sec. 3). These classical algorithms focused on the stroke-based rendering (SBR) paradigm [47], [58] with increasing levels of automation and sophistication in stroke placement, driven by low-level image processing (typically the Sobel operator).
[Figure: timeline of IB-AR milestones, 1980 to 2010: late 1980s advances in media emulation (Strassmann’86); semi-automatic painting systems (Haeberli’90); video painting (Litwinowicz’97); fully automatic painting (Treavett’97, Hertzmann’98); perceptual UI and segmentation (DeCarlo’02); space-time video (Wang’04, Collomosse’05); automatic perceptual methods (Collomosse’05); GPU-based image processing (Winnemöller’06, Kang’07/’09, Kyprianidis’08/’09); user evaluation (Isenberg’06); NPAR 2010 Grand Challenges.]
Fig. 1. Chronology of IB-AR development, from the semi-automated SBR systems of the early nineties to increasingly automated systems drawing upon image processing. Later, the aesthetic gamut is enhanced through more sophisticated computer vision and edge-aware filtering. Recently, attention has returned to user interaction, raising new questions around the evaluation of aesthetics and usability.
Next, we describe how the early convergence of computer graphics and image processing developed, enabling IB-AR to draw increasingly upon the more sophisticated image analysis offered by contemporary computer vision algorithms (Sec. 4). One consequence of the increasingly sophisticated interpretation or ‘parsing’ of the image was a divergence from SBR to alternative forms of rendering primitives: the use of regions and tiles, which, in turn, unlocked greater diversity in the gamut of styles available to IB-AR. In line with the trend toward more complex image analysis, we also observe IB-AR to be defined increasingly as a goal-directed task, drawing upon global optimization rather than local approaches. Although these goals were initially defined at the low level of image artifacts (e. g., image gradient), the description of these goals later evolved to include higher level concepts such as perceptual salience measures and even emotional or ‘affective’ contexts.
In parallel with the trend toward more sophisticated scene analysis, IB-AR benefited from the emerging popularity of anisotropic and edge-preserving filters in computer graphics (Sec. 5). On the one hand, such operations lacked high-level image ‘understanding’, limiting their artistic gamut to painterly, sketchy, and cartoon styles. On the other hand, their simplicity led to real-time speeds on GPU hardware, making them practical for video processing and applicable to footage (e. g., water, smoke, fur) that is otherwise challenging to parse using vision methods such as segmentation.
Finally, we catalog a number of challenges that remain outstanding in IB-AR (Sec. 6).
2 TAXONOMY OF IB-AR TECHNIQUES
Early prototype IB-AR systems followed the SBR paradigm and synthesized artistic renderings by incrementally compositing virtual brush strokes whose color, orientation, scale, and ordering were derived from semi- [47] or fully automated processes [55], [90], [151]. The aesthetics of the output generated by an SBR algorithm are, therefore, a function of both the media simulation applied to render each brush stroke and the process by which strokes are positioned and their attributes are set (referred to hereafter as the stroke placement algorithm). Although sometimes described together in early IB-AR papers, the problems of media emulation and stroke placement may be considered decoupled. The curved spline strokes placed by Hertzmann’s [55] algorithm could be rendered by sweeping various brush models along their trajectories to emulate thick oil paint, crayon, charcoal, or pastel, to name but a few different media. It is, therefore, not surprising that IB-AR has evolved in parallel with increasingly sophisticated media emulation models, from simple simulations of hairy brushes [146] to full multi-layered models of pigment diffusion and bi-directional transfer between brush and canvas [25].
A detailed exposition of media simulation warrants a survey in its own right, so in this work we focus only on the problem of stroke placement or, more generally, the placement of artistic rendering primitives (regions, strokes, stipples, tiles). We also survey nonlinear filters that introduce an anisotropy conveying the impression of stroke placement. Accordingly, our taxonomy avoids categorizing IB-AR purely in terms of media (painterly, sketch, cartoon shading) and instead clusters the space of IB-AR algorithms by the elementary rendering primitive or stylization mechanism employed. We then expand the lower branches of the taxonomy by considering similarities in the nature of the algorithm: local approaches vs. global arrangement strategies, or approaches that address the rendering of outlines vs. the interior of image regions.
2.1 Stroke-based Rendering (SBR)
SBR algorithms cover a 2D canvas with atomic rendering primitives according to some process or desired end goal, designed to simulate a particular style. In many SBR algorithms these primitives are the eponymous virtual brush strokes, but the definition of SBR has broadened to include primitives such as tiles, stipples, and hatch marks [58].
2.1.1 Brush Stroke Techniques
Perhaps the most prevalent form of IB-AR is the family of SBR algorithms that use either short dabs of paint or long curved brush strokes as rendering primitives. The process of covering the canvas can be categorized broadly as local or global. Local approaches typically drive stroke placement decisions based on the pixels in the spatial neighborhood of the stroke; this can be explicit in the algorithm (e. g., image moments within a window [140], [151]) or implicit due to a prior convolution (e. g., Sobel edges). An alteration to the image would thus affect only strokes in the locality. Global methods optimize the placement of all strokes to minimize some objective function. Various strategies have been applied, from snake relaxation [56], to evolutionary algorithms [17], and Monte-Carlo optimization [150]. In all cases the desired objective relates to retention of detail, for example, encouraging maximal retention of visual detail [56], [150] using low-level operators (e. g., Sobel gradient) or higher-level measures such as image salience to retain only perceptually important detail [17].
[Figure: tree diagram assigning the surveyed papers (first author and year) to four branches: Stroke-based Rendering (brush stroke techniques, split into local methods with user interaction or automatic low-level and perceptual measures for images and video, and global methods with user-guided or automatic emphasis; mosaicking and tiling of still or animated content; tonal depiction by single- or multi-resolution local stippling, global stippling under spatial or structure-and-spatial constraints, and hatching and line art), Region-based Techniques (image fill, form/shape, composition, and hierarchical methods; video appearance in 2D+t or 3D, and motion stylization), Example-based Techniques (color and texture), and Image Processing and Filtering (spatial-domain outlines from first or second derivatives and content filters using anisotropic diffusion, local statistics, or morphological filtering; gradient-domain methods).]
Fig. 2. Taxonomy of IB-AR techniques.
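To make the ‘local’ branch concrete, the sketch below places strokes on a regular grid and orients each one perpendicular to the Sobel gradient, so that strokes follow edges. This is a minimal illustration in the spirit of the gradient-driven local algorithms cited above, not a reproduction of any of them; the grid layout, the (x, y, angle, color) stroke tuple, and the names `sobel` and `place_strokes` are our own assumptions, and stroke length, size, and rendering are omitted.

```python
import numpy as np

def sobel(img):
    """Sobel derivatives (gx, gy) of a grayscale image, with edge padding."""
    p = np.pad(img, 1, mode="edge")
    # Horizontal derivative: right column minus left column, rows weighted 1-2-1.
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    # Vertical derivative: bottom row minus top row, columns weighted 1-2-1.
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    return gx, gy

def place_strokes(img, spacing=4):
    """Local, greedy stroke placement on a regular grid.

    Each stroke takes its color from the pixel under its center and is
    oriented perpendicular to the local gradient, i.e. along the edge.
    Only the 3x3 neighborhood seen by the Sobel kernel influences it.
    Returns (x, y, angle in [0, pi), color) tuples.
    """
    gx, gy = sobel(img)
    strokes = []
    for y in range(spacing // 2, img.shape[0], spacing):
        for x in range(spacing // 2, img.shape[1], spacing):
            angle = np.arctan2(gy[y, x], gx[y, x]) + np.pi / 2
            strokes.append((x, y, float(angle % np.pi), float(img[y, x])))
    return strokes

img = np.zeros((16, 16))
img[:, 7:] = 1.0                      # vertical step edge between columns 6 and 7
strokes = place_strokes(img, spacing=4)
print(len(strokes))                   # 16 strokes on a 4 x 4 grid
x, y, angle, color = next(s for s in strokes if s[0] == 6)
print(round(float(np.degrees(angle)), 6))  # 90.0: the stroke runs along the edge
```

Because each decision depends only on a small neighborhood, changing one part of the image changes only nearby strokes, which is exactly the property that distinguishes local from global placement.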
On the more heavily populated ‘local’ branch of the SBR taxonomy, we partition algorithms into user-assisted and automatic processes, the former typically pre-dating the latter, which points to a trend toward automation after the nineties. As with ‘global’ SBR, the mechanism behind the automation can be divided into lower- and higher-level analysis according to the definition of the ‘importance’ field that guides the emphasis of features in the artwork. In the parallel SBR branch of semi-automated (i. e., user-assisted) algorithms, the low/high-level distinction is again mirrored: early techniques relied on image filters to orient brush strokes [47], and later work, pre-dating automated measures for emphasis, used gaze trackers to directly harness the perceptual measures inherent in the human visual system [131]. In some recent automated algorithms, stroke placement is influenced by even higher-level contextual parameters such as emotion and mood [22], [141]. Most recently, there has been a trend back toward interaction, producing semi-automated tools for painterly video that enable keyframing of the fields used to arrange strokes [66], [89], [108].
For automatic techniques, a clear distinction can be made between those operating on images and those operating on video content. Video extensions of SBR are non-trivial, as strokes must not scintillate (flicker) and their motion must match the underlying video content. In the SBR branch of the taxonomy, this problem has largely been addressed, though by no means solved, using optical flow. Elsewhere, nonlinear filters and segmentation have been applied.
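The core of optical-flow-driven video SBR is moving each stroke with the estimated motion between frames. The sketch below advects stroke anchor points through a dense flow field using bilinear sampling and flags strokes that leave the frame. It assumes the flow field has already been computed by some optical-flow estimator (not shown), and the names `bilinear` and `advect_strokes` are hypothetical; the stroke insertion, deletion, and flicker handling that real systems need are deliberately left out.

```python
import numpy as np

def bilinear(field, x, y):
    """Sample a 2D array (at least 2 x 2) at fractional coordinates,
    clamped to the border."""
    h, w = field.shape
    x = np.clip(x, 0, w - 1)
    y = np.clip(y, 0, h - 1)
    x0 = np.minimum(np.floor(x).astype(int), w - 2)
    y0 = np.minimum(np.floor(y).astype(int), h - 2)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * field[y0, x0] + fx * field[y0, x0 + 1]
    bottom = (1 - fx) * field[y0 + 1, x0] + fx * field[y0 + 1, x0 + 1]
    return (1 - fy) * top + fy * bottom

def advect_strokes(positions, flow_u, flow_v):
    """Move stroke anchors (N x 2 array of x, y) by a dense flow field.

    Returns the moved positions and a mask of strokes still inside the frame;
    a full system would also add strokes where the canvas becomes uncovered.
    """
    x, y = positions[:, 0], positions[:, 1]
    nx = x + bilinear(flow_u, x, y)
    ny = y + bilinear(flow_v, x, y)
    h, w = flow_u.shape
    inside = (nx >= 0) & (nx <= w - 1) & (ny >= 0) & (ny <= h - 1)
    return np.stack([nx, ny], axis=1), inside

h, w = 10, 10
ys, xs = np.mgrid[0:h, 0:w].astype(float)
flow_u = 0.1 * xs                   # horizontal motion growing to the right
flow_v = np.full((h, w), -0.5)      # uniform upward motion
moved, inside = advect_strokes(np.array([[2.5, 3.0], [9.0, 0.2]]), flow_u, flow_v)
print(moved.round(3).tolist())      # [[2.75, 2.5], [9.9, -0.3]]
print(inside.tolist())              # [True, False]
```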
2.1.2 Mosaicking, Tiling and Stippling
A further sub-category of SBR aims to approximate the image using a medium other than colored pixels or paint, packing image regions with a multitude of atomic rendering primitives. These techniques approximate the image content by (i) stippling, the distribution of small points (stipples), often for the purpose of tonal depiction; (ii) hatching, the use of line patterns or curves for the same purpose; or (iii) mosaicking, in which small tiles are packed together.
Stippling IB-AR techniques are closely related to digital half-toning and dithering algorithms that locally approximate regions using dot patterns, either with the sole goal of representing a local brightness or with an additional artistic intent [114]. Many early half-toning techniques developed heuristically informed greedy strategies for populating regions with stipples to avoid artifacts due to aliasing. Such techniques operate at either single or multiple scales, placing dots using local decision making. This culminated most recently in techniques designed to emphasize image structure [118], following the trend toward perceptual analysis in SBR. In contrast to half-toning, stippling does not simply decide whether to use a black or a white pixel on a regular grid but tries to place larger dots, with the shared goal of representing the brightness and (typically) avoiding visible patterns. Early stippling used a number of brush-based techniques [30]. However, much as local SBR painterly approaches evolved into global relaxation approaches, image stippling began to adopt a more global strategy for stipple placement. Recent goals in stippling are to capture and replicate aspects of the stippling style of artists [74], [98] or to reproduce non-repetitive patterns [78]. A smaller subset of IB-AR explored the approximation of images using lines and curves. Aside from dedicated image-based hatching approaches [129], some techniques grow labyrinthine patterns using space-filling curves [26] or reaction-diffusion processes [123] that adapt to the intensity of the image.
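A common way to realize the global, relaxation-based stipple placement described above is density-weighted Lloyd relaxation, as in weighted Voronoi stippling (Secord [137] appears in the global branch of the taxonomy). The sketch below is our own simplified discrete version, not Secord's implementation: it treats darkness as density, assigns every pixel to its nearest stipple by brute force, and moves each stipple to the weighted centroid of its cell. The function name and parameters are hypothetical.

```python
import numpy as np

def weighted_lloyd_stipples(img, n_points=64, iterations=20, seed=0):
    """Global stipple placement by density-weighted Lloyd relaxation.

    img: grayscale image in [0, 1]; darker pixels get more weight
    (density = 1 - intensity). Returns an (n_points, 2) array of (x, y).
    """
    rng = np.random.default_rng(seed)
    h, w = img.shape
    density = 1.0 - np.clip(img, 0.0, 1.0)
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
    weights = density.ravel()
    # Initialize by sampling distinct pixels in proportion to density.
    start = rng.choice(len(pix), size=n_points, replace=False, p=weights / weights.sum())
    pts = pix[start].copy()
    for _ in range(iterations):
        # Discrete Voronoi diagram: each pixel belongs to its nearest stipple.
        d2 = ((pix[:, None, :] - pts[None, :, :]) ** 2).sum(axis=2)
        owner = d2.argmin(axis=1)
        for k in range(n_points):
            cell = owner == k
            wk = weights[cell]
            if wk.sum() > 0:  # empty or zero-density cells keep their stipple
                pts[k] = (pix[cell] * wk[:, None]).sum(axis=0) / wk.sum()
    return pts

img = np.ones((32, 32))
img[:, :16] = 0.0                     # dark left half, white right half
pts = weighted_lloyd_stipples(img, n_points=40, iterations=10)
print(pts.shape, bool((pts[:, 0] < 16).all()))  # (40, 2) True
```

Every stipple's position depends on all others through the Voronoi cells, which is what makes this strategy global rather than a local greedy one.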
Artistic mosaicking algorithms are closely related to packing problems and so are approached almost universally as global optimization problems. While packing strategies vary widely, they can be categorized into those obeying purely spatial or spatio-temporal constraints. The latter are especially challenging, since a balance must be maintained between a faithful approximation of frame content and the introduction of flicker (temporal incoherence) due to frequent updates of the tile or glyph chosen to represent a particular spatial region.
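The fidelity versus flicker trade-off can be seen in a toy tile-selection rule. The sketch below chooses, for each region in each frame, the tile whose color is closest, but keeps the previous frame's tile unless a different one is better by more than a switching cost. This greedy hysteresis rule is our own illustration of the trade-off, not an algorithm from the cited mosaicking papers, and `choose_tiles` and `switch_cost` are hypothetical names.

```python
import numpy as np

def choose_tiles(region_colors, tile_colors, switch_cost=0.1):
    """Pick a tile per region per frame, trading fidelity against flicker.

    region_colors: (frames, regions, 3) mean color of each region per frame.
    tile_colors:   (tiles, 3) representative color of each tile.
    A region keeps its previous tile unless another tile is better by more
    than switch_cost (in squared RGB distance). Returns (frames, regions) ints.
    """
    frames, regions, _ = region_colors.shape
    choice = np.zeros((frames, regions), dtype=int)
    rows = np.arange(regions)
    for f in range(frames):
        # Fidelity cost of every tile for every region in this frame.
        cost = ((region_colors[f][:, None, :] - tile_colors[None, :, :]) ** 2).sum(axis=2)
        best = cost.argmin(axis=1)
        if f == 0:
            choice[f] = best
            continue
        prev = choice[f - 1]
        switch = cost[rows, best] + switch_cost < cost[rows, prev]
        choice[f] = np.where(switch, best, prev)
    return choice

tiles = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])   # black and white tiles
grey = [0.49, 0.51, 0.49, 0.9]                          # one region over 4 frames
regions = np.array([[[g, g, g]] for g in grey])        # shape (4, 1, 3)
print(choose_tiles(regions, tiles, switch_cost=0.0)[:, 0].tolist())  # [0, 1, 0, 1]
print(choose_tiles(regions, tiles, switch_cost=0.1)[:, 0].tolist())  # [0, 0, 0, 1]
```

With no switching cost the tile flickers on every small color fluctuation; with a modest cost it stays stable until the content genuinely changes.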
2.2 Region-based Techniques
Much as SBR in the 1990s relied increasingly on low-level image processing (e. g., intensity gradient, moments, optical flow), a trend post-2000 was the emergence of mid-level computer vision in IB-AR. Segmentation is frequently incorporated as a step toward parsing image structure, enabling the rendering to adapt to the content of each region. In some techniques, SBR algorithms are applied to render the interiors of regions independently [41], [141], [157]. However, the use of regions as rendering primitives in their own right has also given rise to additional styles, including cartoon ‘flat’ shading [21], [156], new materials such as stained glass [104], [139] and felt [109], and even emulation of abstract artistic styles [16].
For images, we categorize region-based approaches into those considering the arrangement of rendering primitives (e. g., strokes) within the interiors of regions and those manipulating the shape, form, and composition of regions. A further category explores techniques based on image pyramids. Various interactive techniques (human gaze trackers [29], importance maps [5]) are used to browse a region containment hierarchy constructed by segmenting successively lower resolution versions of the source image. An image can be rendered at a high level of abstraction by drawing only the coarse, large regions near the top of the hierarchy, or particular regions can be rendered in greater detail at lower levels. This enables local control over the level of detail. Such methods were among the first region-based IB-AR algorithms and are significant as some of the first to consider perceptual importance.
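Local control over the level of detail can be sketched with a much simpler hierarchy than a segmentation-based one. In the example below, square blocks of a mean pyramid stand in for the regions of a containment hierarchy, and an importance map in [0, 1] (standing in for gaze data or a painted importance map) selects, per pixel, how deep in the hierarchy to draw from. The block pyramid, the linear importance-to-level mapping, and the names `block_pyramid` and `render_with_importance` are our own assumptions; the cited methods segment each pyramid level into regions instead.

```python
import numpy as np

def block_pyramid(img, levels):
    """Level 0 is the image; level k replaces each 2^k x 2^k block by its mean.
    Image height and width must be divisible by 2^(levels - 1)."""
    h, w = img.shape
    out = [img.astype(float)]
    for k in range(1, levels):
        s = 2 ** k
        blocks = img.reshape(h // s, s, w // s, s).mean(axis=(1, 3))
        out.append(np.kron(blocks, np.ones((s, s))))  # upsample back to h x w
    return out

def render_with_importance(img, importance, levels=4):
    """Per-pixel level of detail: importance 1 draws from the finest level,
    importance 0 from the coarsest, with a linear mapping in between."""
    stack = np.stack(block_pyramid(img, levels))
    level = np.rint((1.0 - np.clip(importance, 0, 1)) * (levels - 1)).astype(int)
    return np.take_along_axis(stack, level[None], axis=0)[0]

rng = np.random.default_rng(0)
img = rng.random((16, 16))
importance = np.zeros((16, 16))
importance[4:12, 4:12] = 1.0            # keep full detail in the center only
out = render_with_importance(img, importance)
print(np.allclose(out[4:12, 4:12], img[4:12, 4:12]))  # True
```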
The consideration of regions in IB-AR has also benefited video stylization, offering an alternative to SBR techniques dependent on optical flow. Video segmentation is a well-studied problem in computer vision and is broadly separated into two categories: techniques that segment frames independently and associate regions over time (2D+t), and those segmenting video as a spatio-temporal (x, y, t) volume (3D). Both methodologies have seen applications to IB-AR for the purpose of cartooning or otherwise stylizing the appearance of video. All techniques share the observation that once video has been coherently segmented into regions (a non-trivial problem), the problem of hatching, sketching, or painting with temporal coherence can be solved by attaching strokes to a rigid [21] or deforming [2] region. This frames the problem of IB-AR as one of automated rotoscoping. Finally, when considering regions, it is possible to track and analyze the motion of objects. This gives rise to a complementary form of video stylization: that of artistically manipulating object motion.
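Attaching strokes to a rigidly moving region reduces, in its simplest form, to estimating the region's rotation and translation between frames and applying the same transform to the strokes. The sketch below fits that transform with the Kabsch (orthogonal Procrustes) method, assuming corresponding region points in consecutive frames are already known (for example, from region tracking); this assumption and the names `fit_rigid` and `carry_strokes` are ours, not a description of how [21] performs the step.

```python
import numpy as np

def fit_rigid(src, dst):
    """Least-squares rotation R and translation t with dst ~ src @ R.T + t
    (Kabsch method). src and dst are corresponding (N, 2) point sets."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)                 # 2 x 2 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # guard against reflections
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    t = cd - cs @ R.T
    return R, t

def carry_strokes(strokes, region_prev, region_next):
    """Move stroke anchor points rigidly with the region they are attached to."""
    R, t = fit_rigid(region_prev, region_next)
    return strokes @ R.T + t

rng = np.random.default_rng(2)
region_prev = rng.random((30, 2)) * 50           # tracked region points, frame t
theta = np.radians(20.0)
R_true = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
t_true = np.array([4.0, -1.5])
region_next = region_prev @ R_true.T + t_true    # same points, frame t + 1
strokes = np.array([[10.0, 12.0], [25.0, 30.0]])  # stroke anchors inside the region
print(np.allclose(carry_strokes(strokes, region_prev, region_next),
                  strokes @ R_true.T + t_true))  # True
```

A deforming region, as in the rotoscoping-style approach cited above, would need a non-rigid warp instead of a single rotation and translation.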
2.3 Example-based Rendering
Most IB-AR algorithms encode a set of heuristics, typically emulating artistic practice with the goal of faithfully depicting a prescribed style. A complementary approach to IB-AR, example-based rendering, pioneered by Hertzmann et al. [59], learns the mapping between an exemplar pair: a source image and an artist’s rendering of that image. The learned mapping can then be applied to render arbitrary images in the exemplar style.
Example-based rendering (EBR) can be categorized as performing either texture or color transfer. Color EBR typically performs a piecewise mapping between the color histograms of two images to effect a non-photorealistic recoloring. Often there is only weak enforcement of spatial coherence in the color mapping process. By contrast, texture-based EBR shares similarities with patch-based texture in-filling techniques [35], [36], which seek to fill holes in images by searching for visually similar patches elsewhere in the image. However, in the case of EBR the patches are not matched within the source image to be rendered but instead within the exemplar source image. The corresponding patch from the exemplar artistic image is then pasted into place in the output rendering. As with texture in-filling, a careful balance must be maintained between fidelity of the patch matching and the spatial coherence in the rendering.
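The patch-matching step of texture-based EBR can be sketched as a brute-force nearest-neighbor search. For each pixel of the new image B, the code below finds the most similar patch (sum of squared differences) in the exemplar source A and copies the co-located pixel of the artist's rendering A′. This deliberately drops the coherence term, multi-scale pyramid, and richer features of Hertzmann et al.'s image analogies, so it illustrates only the fidelity side of the balance described above; grayscale input and the names `patches` and `render_by_example` are our own assumptions.

```python
import numpy as np

def patches(img, r):
    """All (2r+1) x (2r+1) patches of an edge-padded grayscale image, one per row."""
    p = np.pad(img, r, mode="edge")
    h, w = img.shape
    k = 2 * r + 1
    return np.stack([p[y:y + k, x:x + k].ravel() for y in range(h) for x in range(w)])

def render_by_example(a, a_styled, b, r=1):
    """For each pixel of b, find the most similar patch in exemplar a (SSD)
    and copy the co-located pixel of the artist's rendering a_styled."""
    pa, pb = patches(a, r), patches(b, r)
    d2 = ((pb[:, None, :] - pa[None, :, :]) ** 2).sum(axis=2)  # brute-force SSD
    best = d2.argmin(axis=1)
    return a_styled.ravel()[best].reshape(b.shape)

rng = np.random.default_rng(3)
a = rng.random((12, 12))          # exemplar source image
a_styled = rng.random((12, 12))   # stand-in for the artist's rendering of a
out = render_by_example(a, a_styled, a)
print(np.allclose(out, a_styled))  # True: each patch of a matches itself
```

Applied to a different image, this per-pixel choice tends to produce noisy, incoherent output, which is why practical EBR methods add a term rewarding neighboring pixels that copy from neighboring exemplar locations.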
2.4 Image Processing and Filtering
Many image processing filters have been explored for IB-AR, but few have so far been recognized as producing interesting results from an artistic point of view. This is probably because these filters are often concerned with the restoration and recovery of photorealistic imagery. By contrast, IB-AR generally aims for simplification.
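The bilateral filter is a standard example of an edge-preserving filter that simplifies rather than restores. The brute-force sketch below smooths small intensity variations while leaving strong edges intact; the window radius and the spatial and range sigmas are arbitrary assumptions, and interactive IB-AR systems use much faster approximations than this direct double loop over offsets.

```python
import numpy as np

def bilateral(img, radius=3, sigma_s=2.0, sigma_r=0.1):
    """Brute-force bilateral filter for a grayscale image in [0, 1].

    Each output pixel is a weighted mean of its neighbors; the weights fall
    off with spatial distance (sigma_s) and with intensity difference
    (sigma_r), so strong edges are preserved while small variations are smoothed.
    """
    h, w = img.shape
    p = np.pad(img, radius, mode="edge")
    num = np.zeros_like(img, dtype=float)
    den = np.zeros_like(img, dtype=float)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = p[radius + dy:radius + dy + h, radius + dx:radius + dx + w]
            wgt = np.exp(-(dx * dx + dy * dy) / (2 * sigma_s ** 2)
                         - (shifted - img) ** 2 / (2 * sigma_r ** 2))
            num += wgt * shifted
            den += wgt
    return num / den

rng = np.random.default_rng(4)
step = np.zeros((16, 16))
step[:, 8:] = 1.0
noisy = step + rng.normal(0.0, 0.02, step.shape)
out = bilateral(noisy)
# Noise is reduced, but the step edge keeps its two distinct levels.
print(out[:, :8].std() < noisy[:, :8].std(), abs(out[:, 8:].mean() - 1.0) < 0.05)  # True True
```

A plain Gaussian blur with the same spatial sigma would also reduce the noise but would smear the step edge; the range term is what keeps pixels on opposite sides of the edge from averaging together.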

Citations
Proceedings ArticleDOI
27 Jun 2016
TL;DR: A Neural Algorithm of Artistic Style is introduced that can separate and recombine the image content and style of natural images and provide new insights into the deep image representations learned by Convolutional Neural Networks and demonstrate their potential for high level image synthesis and manipulation.
Abstract: Rendering the semantic content of an image in different styles is a difficult image processing task. Arguably, a major limiting factor for previous approaches has been the lack of image representations that explicitly represent semantic information and, thus, allow to separate image content from style. Here we use image representations derived from Convolutional Neural Networks optimised for object recognition, which make high level image information explicit. We introduce A Neural Algorithm of Artistic Style that can separate and recombine the image content and style of natural images. The algorithm allows us to produce new images of high perceptual quality that combine the content of an arbitrary photograph with the appearance of numerous wellknown artworks. Our results provide new insights into the deep image representations learned by Convolutional Neural Networks and demonstrate their potential for high level image synthesis and manipulation.

4,888 citations


Cites background from "State of the "Art”: A Taxono..."

  • ...For a recent review of the field we refer the reader to [21]....


Proceedings ArticleDOI
01 Oct 2017
TL;DR: In this paper, adaptive instance normalization (AdaIN) is proposed to align the mean and variance of the content features with those of the style features, which enables arbitrary style transfer in real-time.
Abstract: Gatys et al. recently introduced a neural algorithm that renders a content image in the style of another image, achieving so-called style transfer. However, their framework requires a slow iterative optimization process, which limits its practical application. Fast approximations with feed-forward neural networks have been proposed to speed up neural style transfer. Unfortunately, the speed improvement comes at a cost: the network is usually tied to a fixed set of styles and cannot adapt to arbitrary new styles. In this paper, we present a simple yet effective approach that for the first time enables arbitrary style transfer in real-time. At the heart of our method is a novel adaptive instance normalization (AdaIN) layer that aligns the mean and variance of the content features with those of the style features. Our method achieves speed comparable to the fastest existing approach, without the restriction to a pre-defined set of styles. In addition, our approach allows flexible user controls such as content-style trade-off, style interpolation, color & spatial controls, all using a single feed-forward neural network.

2,266 citations

Posted Content
TL;DR: This paper presents a simple yet effective approach that for the first time enables arbitrary style transfer in real-time, comparable to the fastest existing approach, without the restriction to a pre-defined set of styles.
Abstract: Gatys et al. recently introduced a neural algorithm that renders a content image in the style of another image, achieving so-called style transfer. However, their framework requires a slow iterative optimization process, which limits its practical application. Fast approximations with feed-forward neural networks have been proposed to speed up neural style transfer. Unfortunately, the speed improvement comes at a cost: the network is usually tied to a fixed set of styles and cannot adapt to arbitrary new styles. In this paper, we present a simple yet effective approach that for the first time enables arbitrary style transfer in real-time. At the heart of our method is a novel adaptive instance normalization (AdaIN) layer that aligns the mean and variance of the content features with those of the style features. Our method achieves speed comparable to the fastest existing approach, without the restriction to a pre-defined set of styles. In addition, our approach allows flexible user controls such as content-style trade-off, style interpolation, color & spatial controls, all using a single feed-forward neural network.

1,286 citations


Cites background from "State of the "Art”: A Taxono..."

  • ...des abundant user controls at runtime, without any modification to the training process. 2. Related Work Style transfer. The problem of style transfer has its origin from non-photo-realistic rendering [28], and is closely related to texture synthesis and transfer [13,12,14]. Some early approaches include histogram matching on linear filter responses [19] and non-parametric sampling [12,15]. These method...


Posted Content
TL;DR: This work introduces an artificial system based on a Deep Neural Network that creates artistic images of high perceptual quality and offers a path forward to an algorithmic understanding of how humans create and perceive artistic imagery.
Abstract: In fine art, especially painting, humans have mastered the skill to create unique visual experiences through composing a complex interplay between the content and style of an image. Thus far the algorithmic basis of this process is unknown and there exists no artificial system with similar capabilities. However, in other key areas of visual perception such as object and face recognition near-human performance was recently demonstrated by a class of biologically inspired vision models called Deep Neural Networks. Here we introduce an artificial system based on a Deep Neural Network that creates artistic images of high perceptual quality. The system uses neural representations to separate and recombine content and style of arbitrary images, providing a neural algorithm for the creation of artistic images. Moreover, in light of the striking similarities between performance-optimised artificial neural networks and biological vision, our work offers a path forward to an algorithmic understanding of how humans create and perceive artistic imagery.

1,019 citations
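The abstract does not say how "style" is represented. In Gatys et al.'s formulation it is captured by correlations between feature channels (Gram matrices) of network activations. The sketch below assumes precomputed feature maps shaped (C, H, W) and normalizes by the number of spatial positions, a normalization of our own choosing; it illustrates why such a representation discards spatial layout.

```python
import numpy as np


def gram_matrix(features: np.ndarray) -> np.ndarray:
    """Channel-correlation (Gram) matrix of a (C, H, W) feature map."""
    c, h, w = features.shape
    flat = features.reshape(c, h * w)
    return flat @ flat.T / (h * w)  # (C, C); assumption: averaged over positions


def style_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Mean squared difference between the Gram matrices of two feature maps."""
    return float(np.mean((gram_matrix(a) - gram_matrix(b)) ** 2))


rng = np.random.default_rng(1)
feats = rng.normal(size=(4, 5, 6))
# Shuffle spatial positions: the layout changes, the Gram matrix does not.
perm = rng.permutation(5 * 6)
shuffled = feats.reshape(4, -1)[:, perm].reshape(4, 5, 6)
print(style_distance(feats, shuffled))  # ~0 up to floating-point error
```

Image synthesis then iteratively optimizes an image so that its Gram matrices approach those of the style image while its raw feature maps approach those of the content image; that iterative optimization is the slow step the AdaIN work above aims to avoid.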

Proceedings Article
01 Jan 2017
TL;DR: In this article, adaptive instance normalization (AdaIN) is proposed to align the mean and variance of the content features with those of the style features, which enables arbitrary style transfer in real-time.

972 citations

References
Journal Article
TL;DR: This work uses snakes for interactive interpretation, in which user-imposed constraint forces guide the snake near features of interest, and uses scale-space continuation to enlarge the capture region surrounding a feature.
Abstract: A snake is an energy-minimizing spline guided by external constraint forces and influenced by image forces that pull it toward features such as lines and edges. Snakes are active contour models: they lock onto nearby edges, localizing them accurately. Scale-space continuation can be used to enlarge the capture region surrounding a feature. Snakes provide a unified account of a number of visual problems, including detection of edges, lines, and subjective contours; motion tracking; and stereo matching. We have used snakes successfully for interactive interpretation, in which user-imposed constraint forces guide the snake near features of interest.

18,095 citations


"State of the "Art”: A Taxono..." refers background in this paper

  • ...the original snakes algorithm [71] moves the curve incrementally closer to an edge over time by minimizing the distance between the curve and edges in the image....

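A snake's behaviour can be illustrated with a small semi-implicit iteration. This is a sketch under several assumptions of ours: a closed contour of discrete points, elasticity `alpha` and rigidity `beta` as internal-energy weights, an external energy equal to the negative squared gradient magnitude of a Gaussian-blurred image (so the contour is pulled toward edges), a force gain `kappa`, and nearest-pixel sampling of the force. The blur widens the capture region, loosely in the spirit of the scale-space continuation mentioned in the abstract.

```python
import numpy as np


def gaussian_blur(img, sigma):
    """Separable Gaussian blur with edge padding (output has img's shape)."""
    r = int(3 * sigma)
    k = np.exp(-np.arange(-r, r + 1) ** 2 / (2 * sigma ** 2))
    k /= k.sum()
    p = np.pad(img, r, mode="edge")
    p = np.apply_along_axis(lambda row: np.convolve(row, k, mode="valid"), 1, p)
    return np.apply_along_axis(lambda col: np.convolve(col, k, mode="valid"), 0, p)


def snake(img, xs, ys, alpha=0.1, beta=0.01, gamma=1.0, kappa=50.0, sigma=3.0, n_iter=1000):
    """Evolve a closed contour (xs, ys) toward strong edges of a grayscale image.

    Parameter values are illustrative assumptions, not taken from the paper.
    """
    n = len(xs)
    # Internal-energy matrix A for a closed contour (cyclic pentadiagonal).
    row = np.zeros(n)
    row[[0, 1, 2, -2, -1]] = [2 * alpha + 6 * beta, -alpha - 4 * beta, beta, beta, -alpha - 4 * beta]
    A = np.array([np.roll(row, i) for i in range(n)])
    step = np.linalg.inv(A + gamma * np.eye(n))
    # External force: gradient of the edge map |grad(G * I)|^2, pointing toward edges.
    gy, gx = np.gradient(gaussian_blur(img.astype(float), sigma))
    fy, fx = np.gradient(gx ** 2 + gy ** 2)
    h, w = img.shape
    xs, ys = np.asarray(xs, float).copy(), np.asarray(ys, float).copy()
    for _ in range(n_iter):
        xi = np.clip(np.rint(xs).astype(int), 0, w - 1)  # nearest-pixel sampling
        yi = np.clip(np.rint(ys).astype(int), 0, h - 1)
        # Semi-implicit step: (A + gamma I) x_t = gamma x_{t-1} + kappa F(x_{t-1})
        xs = step @ (gamma * xs + kappa * fx[yi, xi])
        ys = step @ (gamma * ys + kappa * fy[yi, xi])
    return xs, ys


yy, xx = np.mgrid[0:100, 0:100]
disk = ((xx - 50) ** 2 + (yy - 50) ** 2 <= 20 ** 2).astype(float)  # edge at radius 20
t = np.linspace(0, 2 * np.pi, 60, endpoint=False)
sx, sy = snake(disk, 50 + 28 * np.cos(t), 50 + 28 * np.sin(t))
print(np.hypot(sx - 50, sy - 50).mean())  # settles near the disk boundary
```

The elastic term alone would shrink the circle toward its centre; it is the edge force that halts it at the boundary, which is the "locking on" behaviour the abstract describes.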

Journal Article
S. P. Lloyd
TL;DR: In this article, the author derives, for any finite number of quanta, necessary conditions that the quanta and associated quantization intervals of an optimum finite quantization scheme must satisfy for the average quantization noise power to be a minimum.
Abstract: It has long been realized that in pulse-code modulation (PCM), with a given ensemble of signals to handle, the quantum values should be spaced more closely in the voltage regions where the signal amplitude is more likely to fall. It has been shown by Panter and Dite that, in the limit as the number of quanta becomes infinite, the asymptotic fractional density of quanta per unit voltage should vary as the one-third power of the probability density per unit voltage of signal amplitudes. In this paper the corresponding result for any finite number of quanta is derived; that is, necessary conditions are found that the quanta and associated quantization intervals of an optimum finite quantization scheme must satisfy. The optimization criterion used is that the average quantization noise power be a minimum. It is shown that the result obtained here goes over into the Panter and Dite result as the number of quanta becomes large. The optimum quantization schemes for $2^{b}$ quanta, $b = 1, 2, \ldots, 7$, are given numerically for Gaussian and for Laplacian distributions of signal amplitudes.

11,872 citations
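Lloyd's two necessary conditions (each decision threshold lies midway between neighbouring quanta; each quantum sits at the centroid of its interval) suggest a simple fixed-point iteration. The sketch below applies it to empirical samples rather than to an analytic density, an assumption made for brevity; the quantile initialization and stopping rule are also our choices.

```python
import numpy as np


def lloyd_quantizer(samples, n_levels, n_iter=200):
    """Alternate Lloyd's two optimality conditions on empirical samples.

    Returns (levels, thresholds): the quanta and the decision thresholds between them.
    """
    samples = np.asarray(samples, dtype=float)
    # Assumption: start from evenly spaced sample quantiles.
    levels = np.quantile(samples, (np.arange(n_levels) + 0.5) / n_levels)
    for _ in range(n_iter):
        # Condition 1: each threshold lies midway between neighbouring quanta.
        thresholds = (levels[:-1] + levels[1:]) / 2
        cell = np.searchsorted(thresholds, samples)  # interval index of each sample
        # Condition 2: each quantum is the centroid (mean) of its interval.
        new_levels = np.array([samples[cell == k].mean() if np.any(cell == k) else levels[k]
                               for k in range(n_levels)])
        converged = np.allclose(new_levels, levels, atol=1e-9)
        levels = new_levels
        if converged:
            break
    return levels, (levels[:-1] + levels[1:]) / 2


rng = np.random.default_rng(0)
levels, thresholds = lloyd_quantizer(rng.normal(size=200_000), n_levels=2)
print(levels)  # close to -0.798 and +0.798, i.e. +/- sqrt(2/pi) for a unit Gaussian
```

With one bit ($b = 1$) and a unit Gaussian, the optimum puts the threshold at zero and each quantum at the conditional mean of its half-line, $\pm\sqrt{2/\pi}$.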

Journal Article
TL;DR: The paper proves the convergence of a recursive mean shift procedure to the nearest stationary point of the underlying density function and, thus, its utility in detecting the modes of the density.
Abstract: A general non-parametric technique is proposed for the analysis of a complex multimodal feature space and to delineate arbitrarily shaped clusters in it. The basic computational module of the technique is an old pattern recognition procedure: the mean shift. For discrete data, we prove the convergence of a recursive mean shift procedure to the nearest stationary point of the underlying density function and, thus, its utility in detecting the modes of the density. The relation of the mean shift procedure to the Nadaraya-Watson estimator from kernel regression and the robust M-estimators of location is also established. Algorithms for two low-level vision tasks, discontinuity-preserving smoothing and image segmentation, are described as applications. In these algorithms, the only user-set parameter is the resolution of the analysis, and either gray-level or color images are accepted as input. Extensive experimental results illustrate their excellent performance.

11,727 citations
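The mean shift procedure can be sketched as repeatedly moving an estimate to the kernel-weighted mean of the data around it until it stops moving. We assume a Gaussian kernel, for which the weights are themselves Gaussian; the bandwidth plays the role of the "resolution of the analysis" mentioned in the abstract. This is an illustrative sketch, not the authors' implementation.

```python
import numpy as np


def mean_shift_mode(points, start, bandwidth, tol=1e-6, max_iter=500):
    """Follow the mean shift vector from `start` to a mode of the density of `points`."""
    x = np.asarray(start, dtype=float)
    for _ in range(max_iter):
        # Assumption: Gaussian kernel weights of all points relative to the estimate.
        w = np.exp(-np.sum((points - x) ** 2, axis=1) / (2 * bandwidth ** 2))
        new_x = w @ points / w.sum()  # kernel-weighted mean
        if np.linalg.norm(new_x - x) < tol:
            return new_x
        x = new_x
    return x


rng = np.random.default_rng(0)
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.5, size=(200, 2))
blob_b = rng.normal(loc=(10.0, 10.0), scale=0.5, size=(200, 2))
data = np.vstack([blob_a, blob_b])
print(mean_shift_mode(data, start=(1.0, 1.0), bandwidth=1.0))  # near (0, 0)
print(mean_shift_mode(data, start=(9.0, 9.0), bandwidth=1.0))  # near (10, 10)
```

Clustering then amounts to running this from every data point and grouping points whose trajectories end at the same mode; for segmentation the points would be per-pixel feature vectors (for example, position plus color).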

S. P. Lloyd
01 Jan 1982
TL;DR: For any finite number of quanta, necessary conditions are derived that the quanta and associated quantization intervals of an optimum finite quantization scheme must satisfy.

9,602 citations


"State of the "Art”: A Taxono..." refers methods in this paper

  • ...The earliest stipple techniques achieved this goal using Lloyd’s method [93], [99], which computes the Voronoi diagram of a point distribution and...


  • ...One issue with Lloyd’s method [93], [99] is that it introduces a regularity of stipple dot placement, leading...

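The Lloyd-style stippling described in these excerpts can be approximated on a pixel grid: assign every pixel to its nearest stipple (a discrete Voronoi diagram), move each stipple to the darkness-weighted centroid of its cell, and repeat. The sketch below is our own discretization, assuming a grayscale image in [0, 1] in which darker pixels should attract more dots; its brute-force nearest-neighbour search only suits small images.

```python
import numpy as np


def lloyd_stipple(image, n_points, n_iter=20, seed=0):
    """Weighted Lloyd relaxation of stipple positions (x, y) over a grayscale image."""
    h, w = image.shape
    density = (1.0 - image).ravel()  # assumption: darker pixels carry more weight
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)
    rng = np.random.default_rng(seed)
    # Initialise by sampling distinct pixels with probability proportional to darkness.
    pts = pix[rng.choice(len(pix), size=n_points, replace=False, p=density / density.sum())]
    for _ in range(n_iter):
        # Discrete Voronoi diagram: index of the nearest stipple for every pixel.
        d2 = ((pix[:, None, :] - pts[None, :, :]) ** 2).sum(axis=2)
        owner = d2.argmin(axis=1)
        for k in range(n_points):
            wk = density[owner == k]
            if wk.sum() > 0:  # empty or all-white cells keep their position
                pts[k] = wk @ pix[owner == k] / wk.sum()
    return pts


img = np.ones((32, 32))
img[:, :16] = 0.0  # left half black, right half white
dots = lloyd_stipple(img, n_points=40)
print(dots[:, 0].max())  # every stipple stays in the dark half (x < 16)
```

After a few iterations the dots spread out evenly within the dark region, which is exactly the regularity of placement that the second excerpt identifies as a drawback of Lloyd's method.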

Proceedings Article
20 Sep 1999
TL;DR: A non-parametric method for texture synthesis that aims at preserving as much local structure as possible and produces good results for a wide variety of synthetic and real-world textures.
Abstract: A non-parametric method for texture synthesis is proposed. The texture synthesis process grows a new image outward from an initial seed, one pixel at a time. A Markov random field model is assumed, and the conditional distribution of a pixel given all its neighbors synthesized so far is estimated by querying the sample image and finding all similar neighborhoods. The degree of randomness is controlled by a single perceptually intuitive parameter. The method aims at preserving as much local structure as possible and produces good results for a wide variety of synthetic and real-world textures.

2,972 citations


"State of the "Art”: A Taxono..." refers background in this paper

  • ...By contrast, texture-based EBR shares similarities with patch-based texture in-filling techniques [35], [36], which seek to fill holes in images by searching for visually similar patches elsewhere in the image....


  • ...[35], [36]) in-fill from the edges of ‘holes’ in an image—iteratively copying patches from elsewhere in the image that share similarity with adjacent texture....

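The pixel-at-a-time synthesis described in this abstract can be sketched for grayscale images as follows. Assumptions of ours: a square window `win`, a Gaussian weighting of the window (sigma = win / 6.4), a 3x3 seed copied from the sample, filling first the unfilled pixel with the most filled neighbours, and sampling uniformly among sample windows whose distance is within a factor (1 + `eps`) of the best match. The brute-force search makes this suitable only for tiny images.

```python
import numpy as np


def synthesize_texture(sample, out_shape, win=5, eps=0.1, seed=0):
    """Grow a grayscale texture pixel by pixel from `sample` (non-parametric sketch)."""
    rng = np.random.default_rng(seed)
    half = win // 2
    sh, sw = sample.shape
    oh, ow = out_shape
    # Every full win x win window of the sample, flattened, plus its centre value.
    patches = np.array([sample[i:i + win, j:j + win].ravel()
                        for i in range(sh - win + 1) for j in range(sw - win + 1)])
    centres = patches[:, (win * win) // 2]
    offs = np.arange(win) - half
    # Assumption: Gaussian window weights with sigma = win / 6.4.
    gauss = np.exp(-(offs[:, None] ** 2 + offs[None, :] ** 2) / (2 * (win / 6.4) ** 2)).ravel()
    # Padded canvas so each output pixel has a full window; padding is never filled.
    canvas = np.zeros((oh + 2 * half, ow + 2 * half))
    filled = np.zeros(canvas.shape, dtype=bool)
    si, sj = rng.integers(0, sh - 2), rng.integers(0, sw - 2)
    ci, cj = half + oh // 2 - 1, half + ow // 2 - 1
    canvas[ci:ci + 3, cj:cj + 3] = sample[si:si + 3, sj:sj + 3]  # 3x3 seed
    filled[ci:ci + 3, cj:cj + 3] = True
    todo = {(i, j) for i in range(half, half + oh) for j in range(half, half + ow)} - {
        (i, j) for i in range(ci, ci + 3) for j in range(cj, cj + 3)}
    while todo:
        # Grow outward: pick the unfilled pixel whose window holds the most filled pixels.
        i, j = max(todo, key=lambda p: filled[p[0] - half:p[0] + half + 1,
                                               p[1] - half:p[1] + half + 1].sum())
        nb = canvas[i - half:i + half + 1, j - half:j + half + 1].ravel()
        mask = filled[i - half:i + half + 1, j - half:j + half + 1].ravel() * gauss
        dist = ((patches - nb) ** 2 * mask).sum(axis=1) / mask.sum()
        # Sample among windows whose distance is within (1 + eps) of the best match.
        matches = np.flatnonzero(dist <= dist.min() * (1 + eps))
        canvas[i, j] = centres[rng.choice(matches)]
        filled[i, j] = True
        todo.remove((i, j))
    return canvas[half:half + oh, half:half + ow]


checker = (np.indices((12, 12)).sum(axis=0) % 2).astype(float)
result = synthesize_texture(checker, (10, 10))
print(result)  # a checkerboard grown from the 3x3 seed
```

Only the already-synthesized part of each neighbourhood is compared, which is how the method estimates the conditional distribution of a pixel "given all its neighbors synthesized so far".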

Frequently Asked Questions (16)
Q1. What are the contributions of "State of the “Art”: A Taxonomy of Artistic Stylization Techniques for Images and Video"?

This paper surveys the field of non-photorealistic rendering (NPR), focusing on techniques for transforming 2D input (images and video) into artistically stylized renderings. The authors first present a taxonomy of the 2D NPR algorithms developed over the past two decades, structured according to the design characteristics and behavior of each technique. The authors then describe a chronology of development from the semi-automatic paint systems of the early nineties, through to the automated painterly rendering systems of the late nineties driven by image gradient analysis. 

They express the view that (6) remains the most promising direction: that NPR should “not just imitate and emulate styles of the past but create styles for the future.” They also observe that Salesin’s research questions regarding definitions of aesthetics and the artistic Turing test should be given equal weight in terms of new artistic styles emerging as a consequence of NPR. Further positions regarding directions for NPR were presented at NPAR 2010 by DeCarlo and Stone [28] and Hertzmann [54]. 

There are two main approaches to such example-based rendering (EBR): methods seeking to perform texture transfer (typically performed by modulating the luminance channel) and those focusing on color transfer leaving texture constant. 
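As an illustration of color transfer that leaves texture (luminance) untouched, the sketch below keeps the source image's luminance and matches only the mean and standard deviation of two chroma channels to a target image, in the spirit of statistics-based color transfer. The luminance weights are the ITU-R BT.601 luma coefficients; the simple chroma channels (B - Y, R - Y) and the per-channel statistics matching are our assumptions, not a method taken from the survey.

```python
import numpy as np

LUMA = np.array([0.299, 0.587, 0.114])  # BT.601 weights for R, G, B


def to_ych(rgb):
    """Split an (H, W, 3) RGB image into luminance Y and chroma (B - Y, R - Y)."""
    y = rgb @ LUMA
    return y, np.stack([rgb[..., 2] - y, rgb[..., 0] - y], axis=-1)


def from_ych(y, chroma):
    """Exact inverse of to_ych."""
    b = chroma[..., 0] + y
    r = chroma[..., 1] + y
    g = (y - LUMA[0] * r - LUMA[2] * b) / LUMA[1]
    return np.stack([r, g, b], axis=-1)


def transfer_chroma(source, target, eps=1e-8):
    """Keep the luminance (texture) of `source`; match its chroma statistics to `target`."""
    y_src, c_src = to_ych(source)
    _, c_tgt = to_ych(target)
    mu_s, sd_s = c_src.mean(axis=(0, 1)), c_src.std(axis=(0, 1))
    mu_t, sd_t = c_tgt.mean(axis=(0, 1)), c_tgt.std(axis=(0, 1))
    c_new = (c_src - mu_s) / (sd_s + eps) * sd_t + mu_t
    return from_ych(y_src, c_new)


rng = np.random.default_rng(0)
photo = rng.uniform(size=(16, 16, 3))
palette = rng.uniform(0.2, 0.4, size=(8, 8, 3)) * np.array([1.0, 0.5, 0.2])  # warm target
out = transfer_chroma(photo, palette)
```

Results may fall outside [0, 1] and would need clipping before display; that step is omitted to keep the luminance-preservation property exact.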

The bilateral filter smooths low-contrast regions while preserving high-contrast edges, but it may fail for high-contrast images, where either no abstraction is performed or salient visual features are removed. 
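A brute-force bilateral filter makes this trade-off concrete: each output pixel is a weighted mean of its neighbours, with weights that fall off both with spatial distance (`sigma_s`) and with intensity difference (`sigma_r`). The sketch assumes a grayscale image in [0, 1] and a window radius of about 2 `sigma_s`, both choices of ours. Because `sigma_r` acts as a single global contrast threshold, a value that protects edges in a high-contrast image may also leave textured regions unsmoothed, which is one way to read the failure mode described above.

```python
import numpy as np


def bilateral_filter(img, sigma_s=2.0, sigma_r=0.1):
    """Edge-preserving smoothing of a 2D grayscale image (brute force)."""
    r = int(np.ceil(2 * sigma_s))  # assumption: window radius of ~2 sigma_s
    h, w = img.shape
    padded = np.pad(img, r, mode="edge")
    num = np.zeros((h, w))
    den = np.zeros((h, w))
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            nb = padded[r + dy:r + dy + h, r + dx:r + dx + w]  # neighbour at offset (dy, dx)
            spatial = np.exp(-(dx * dx + dy * dy) / (2 * sigma_s ** 2))
            range_w = np.exp(-((nb - img) ** 2) / (2 * sigma_r ** 2))  # intensity weight
            num += spatial * range_w * nb
            den += spatial * range_w
    return num / den


gen = np.random.default_rng(0)
step = np.zeros((32, 32))
step[:, 16:] = 1.0
noisy = step + gen.normal(scale=0.02, size=step.shape)
smooth = bilateral_filter(noisy)
print(smooth[:, :16].std() < noisy[:, :16].std())        # noise reduced in the flat region
print(smooth[:, 15].max() < 0.1, smooth[:, 16].min() > 0.9)  # the edge stays sharp
```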

Work approximating images with irregular tiles (e.g., jigsaw image mosaics [73]) can be considered an extension of photomosaicking. 

Since watercolor paintings typically have light colors, Bousseau et al. [10] proposed swapping the order of the morphological operators, applying closing followed by opening. 

Green et al. [44] report that over 1000 man-hours of manual correction to optical flow fields were required to produce the short painterly scenes in the movie. 

Qu et al. [125], for example, preserve the visual richness of color photographs by applying a range of stippling and related bitonal techniques to different regions in the image. 

Morphological operators such as erosion and dilation are related to order-statistics filters, and applying opening and closing in sequence results in a smoothing operation that is often referred to as morphological smoothing. 
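Grayscale erosion and dilation with a flat square structuring element are the minimum and maximum order-statistics filters over a window, which makes the operators above easy to sketch. The code assumes a 2D grayscale array and an odd square window of our choosing. `morphological_smoothing` applies opening then closing, while `watercolor_order` applies closing then opening, the order attributed to Bousseau et al. [10] elsewhere on this page.

```python
import numpy as np


def rank_filter(img, size, rank):
    """Order-statistics filter: the `rank`-th smallest value in each size x size window."""
    r = size // 2
    h, w = img.shape
    padded = np.pad(img, r, mode="edge")
    windows = np.stack([padded[dy:dy + h, dx:dx + w]
                        for dy in range(size) for dx in range(size)])
    return np.sort(windows, axis=0)[rank]


def erode(img, size=3):
    return rank_filter(img, size, 0)  # minimum filter


def dilate(img, size=3):
    return rank_filter(img, size, size * size - 1)  # maximum filter


def opening(img, size=3):
    return dilate(erode(img, size), size)  # removes small bright details


def closing(img, size=3):
    return erode(dilate(img, size), size)  # removes small dark details


def morphological_smoothing(img, size=3):
    return closing(opening(img, size), size)


def watercolor_order(img, size=3):
    return opening(closing(img, size), size)


dark_speck = np.ones((9, 9))
dark_speck[4, 4] = 0.0
bright_speck = np.zeros((9, 9))
bright_speck[4, 4] = 1.0
print(watercolor_order(dark_speck).min())    # 1.0: the closing fills the dark speck
print(watercolor_order(bright_speck).max())  # 0.0: the opening removes the bright speck
```

The median filter is the same `rank_filter` with the middle rank (`size * size // 2`), which is the sense in which these operators belong to the order-statistics family.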

Initially proposed by DeCarlo and Santella [29] as a mechanism for interactive abstraction of photographs (Sec. 4.1), image segmentation has become a cornerstone of many automatic IB-AR algorithms that make rendering decisions based on mid-level structure parsed from the image. 

The majority of artistic EBR algorithms focus on the transfer of artistic texture, and borrow from the nonparametric patch-based methods used for texture synthesis and photo in-painting. 

Various interactive techniques (human gaze-trackers [29], importance maps [5]) are used to browse a region containment hierarchy constructed by segmenting successively lower resolution versions of the source image. 

Although a few IB-AR systems of the early nineties cited their motivation as emulating the artist (i.e., passing the artistic Turing test), the frequently stated motivation of contemporary IB-AR work is to retain human creativity and to deliver useful tools and new artistic media. 

Also not present were the iterative application of the DoG filter [69] and the final smoothing pass to further reduce aliasing of edges. 
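For reference, a single pass of DoG-based line extraction can be sketched as follows: subtract a wider Gaussian blur from a narrower one and map negative responses to dark lines with a soft threshold, in the style of the DoG edges used in real-time abstraction pipelines. The parameter names and values (`sigma`, `k = 1.6`, `tau`, `phi`) are assumptions of ours, and neither the iterative application nor the final anti-aliasing pass mentioned above is shown.

```python
import numpy as np


def gaussian_blur(img, sigma):
    """Separable Gaussian blur (kernel radius 3 sigma) with edge padding."""
    r = max(1, int(3 * sigma))
    k = np.exp(-np.arange(-r, r + 1) ** 2 / (2 * sigma ** 2))
    k /= k.sum()
    p = np.pad(img, r, mode="edge")
    p = np.apply_along_axis(lambda row: np.convolve(row, k, mode="valid"), 1, p)
    return np.apply_along_axis(lambda col: np.convolve(col, k, mode="valid"), 0, p)


def dog_lines(img, sigma=1.0, k=1.6, tau=0.98, phi=20.0):
    """Soft-thresholded difference of Gaussians: 1 = background, lower values = line."""
    d = gaussian_blur(img, sigma) - tau * gaussian_blur(img, k * sigma)
    return np.where(d > 0, 1.0, 1.0 + np.tanh(phi * d))


step = np.zeros((20, 32))
step[:, 16:] = 1.0
lines = dog_lines(step)
print(lines[:, :5].min(), lines.min())  # 1.0 away from the edge, a dark line beside it
```

The line appears on the dark side of the step, where the wider blur leaks more brightness than the narrower one and the difference turns negative.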

Given a starting or seed pixel, a sequence of spline control points is generated by iteratively hopping between pixels normal to the direction of the image gradient (Fig. 4). 
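The stroke-growing step described here can be sketched as follows. This is a simplified reading in the spirit of Hertzmann-style curved brush strokes, not the survey's code; we assume a grayscale reference image, nearest-pixel gradient lookup, a step length equal to the brush radius, an optional direction filter `fc`, and a hypothetical stopping rule based on a colour tolerance `color_tol`.

```python
import numpy as np


def make_spline_stroke(ref, x0, y0, radius, min_len=2, max_len=16, color_tol=0.1, fc=1.0):
    """Return spline control points (x, y) that follow isophotes of `ref` from (x0, y0)."""
    gy_img, gx_img = np.gradient(ref.astype(float))
    h, w = ref.shape
    stroke_color = ref[int(round(y0)), int(round(x0))]
    x, y = float(x0), float(y0)
    pts = [(x, y)]
    last_dx, last_dy = 0.0, 0.0
    for i in range(1, max_len + 1):
        xi, yi = int(round(x)), int(round(y))
        # Assumption: stop once the local colour departs too far from the stroke colour.
        if i > min_len and abs(ref[yi, xi] - stroke_color) > color_tol:
            break
        gx, gy = gx_img[yi, xi], gy_img[yi, xi]
        if np.hypot(gx, gy) == 0:
            break  # flat region: no defined direction
        dx, dy = -gy, gx  # step normal to the gradient
        if last_dx * dx + last_dy * dy < 0:
            dx, dy = -dx, -dy  # keep the stroke from doubling back
        dx, dy = fc * dx + (1 - fc) * last_dx, fc * dy + (1 - fc) * last_dy
        norm = np.hypot(dx, dy)
        if norm == 0:
            break
        dx, dy = dx / norm, dy / norm
        x, y = x + radius * dx, y + radius * dy
        if not (0 <= round(x) < w and 0 <= round(y) < h):
            break
        pts.append((x, y))
        last_dx, last_dy = dx, dy
    return pts


ramp = np.tile(np.arange(40, dtype=float), (40, 1))  # intensity increases with x
pts = make_spline_stroke(ramp, x0=10, y0=10, radius=2.0, max_len=8)
print(pts)  # x stays at 10.0 while y advances in steps of 2
```

On a horizontal ramp the gradient points along x, so the stroke travels vertically along a line of constant intensity. Rendering the control points as a thick spline is omitted.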

A high-quality painting is deemed to be one that matches the source image as closely as possible, using a minimal number of strokes but covering the maximum area of canvas in paint.
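The criterion in this paragraph can be written as a weighted energy to be minimized. The sketch below is one possible formulation under our own assumptions: squared per-pixel error for the match term, a constant cost per stroke, and a cost per uncovered canvas pixel, with hypothetical weights `w_app`, `w_nstr` and `w_cov`. An optimizer would then add, remove or modify strokes to lower it.

```python
import numpy as np


def painting_energy(painting, source, covered, n_strokes,
                    w_app=1.0, w_nstr=1.0, w_cov=1.0):
    """Lower is better: match the source, use few strokes, leave little canvas bare.

    painting, source: arrays of the same shape; covered: boolean mask of painted pixels.
    """
    e_app = np.sum((painting - source) ** 2)  # appearance: squared colour error
    e_nstr = n_strokes                        # economy: number of strokes
    e_cov = np.count_nonzero(~covered)        # coverage: pixels left unpainted
    return float(w_app * e_app + w_nstr * e_nstr + w_cov * e_cov)


src = np.full((4, 4), 0.5)
full = np.ones((4, 4), dtype=bool)
print(painting_energy(src.copy(), src, full, n_strokes=3))  # 3.0: perfect match, full cover
```

The relative weights encode the trade-off: raising `w_nstr` favours fewer, larger strokes, while raising `w_cov` penalizes bare canvas more heavily.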