Book Chapter DOI

IIIT-CFW: A Benchmark Database of Cartoon Faces in the Wild

TL;DR: This database contains 8,928 annotated images of cartoon faces of 100 public figures and will be useful for research on a spectrum of problems associated with cartoon understanding.
Abstract: In this paper, we introduce the cartoon faces in the wild (IIIT-CFW) database and its associated problems. The database contains 8,928 annotated images of cartoon faces of 100 public figures. It will be useful for research on a spectrum of problems associated with cartoon understanding. To our knowledge, no comparably realistic and large database of cartoon faces is available in the literature.
Citations
Posted Content
TL;DR: This article reviews the recent literature on object detection with deep CNN, in a comprehensive way, and provides an in-depth view of these recent advances.
Abstract: Object detection, the computer vision task of detecting instances of objects of a certain class (e.g., 'car', 'plane') in images, has attracted a lot of attention from the community during the last five years. This strong interest can be explained not only by the importance of this task for many applications but also by the phenomenal advances in this area since the arrival of deep convolutional neural networks (DCNN). This article reviews the recent literature on object detection with deep CNNs in a comprehensive way and provides an in-depth view of these recent advances. The survey covers not only the typical architectures (SSD, YOLO, Faster-RCNN) but also discusses the challenges currently faced by the community and shows how the problem of object detection can be extended. It also reviews the public datasets and associated state-of-the-art algorithms.

98 citations

Proceedings Article
09 Mar 2017
TL;DR: A new caricature dataset is built with the objective of facilitating research in caricature recognition, and a framework for caricature face recognition is presented to provide a thorough analysis of the challenges of caricature recognition.
Abstract: Studying caricature recognition is fundamentally important to the understanding of face perception. However, little research has been conducted in the computer vision community, largely due to the shortage of suitable datasets. In this paper, a new caricature dataset is built with the objective of facilitating research in caricature recognition. All the caricatures and face images were collected from the Web. Compared with two existing datasets, this dataset is much more challenging, with a much greater number of available images, more artistic styles and larger intra-personal variations. Evaluation protocols are also offered, together with their baseline performances on the dataset, to allow fair comparisons. Besides, a framework for caricature face recognition is presented to provide a thorough analysis of the challenges of caricature recognition. By analyzing the challenges, the goal is to highlight problems worth further investigation. Additionally, based on the evaluation protocols and the framework, baseline performances of various state-of-the-art algorithms are provided. The conclusion is that there is still large room for performance improvement and the analyzed problems need further investigation.

55 citations


Cites background from "IIIT-CFW: A Benchmark Database of Cartoon Faces in the Wild"

  • ...[23] has more images, but only from 100 subjects....

  • ...Mishra [23] 100 subjects Caricature: 8,928 images Face: 1,000 images Cartoon recognition...

  • ...In fact, there are only four publicly available datasets [1, 6, 16, 23] that are related to caricature recognition, shown in Table 1....

Journal ArticleDOI
TL;DR: A novel universal face photo-sketch style transfer method that does not need any image from the source domain for training and flexibly leverages a convolutional neural network representation with hand-crafted features in an optimal way is presented.
Abstract: Face photo-sketch style transfer aims to convert a representation of a face from the photo (or sketch) domain to the sketch (respectively, photo) domain while preserving the character of the subject. It has wide-ranging applications in law enforcement, forensic investigation and digital entertainment. However, conventional face photo-sketch synthesis methods usually require training images from both the source domain and the target domain, and are limited in that they cannot be applied to universal conditions where collecting training images in the source domain that match the style of the test image is impractical. This problem entails two major challenges: 1) designing an effective and robust domain translation model for the universal situation in which images of the source domain needed for training are unavailable, and 2) preserving the facial character while performing a transfer to the style of an entire image collection in the target domain. To this end, we present a novel universal face photo-sketch style transfer method that does not need any image from the source domain for training. The regression relationship between an input test image and the entire training image collection in the target domain is inferred via a deep domain translation framework, in which a domain-wise adaption term and a local consistency adaption term are developed. To improve the robustness of the style transfer process, we propose a multiview domain translation method that flexibly leverages a convolutional neural network representation with hand-crafted features in an optimal way. Qualitative and quantitative comparisons are provided for universal unconstrained conditions of unavailable training images from the source domain, demonstrating the effectiveness and superiority of our method for universal face photo-sketch style transfer.

28 citations


Cites background from "IIIT-CFW: A Benchmark Database of Cartoon Faces in the Wild"

  • ...sketches from three caricature datasets [58]–[60], and group them into three collections of sketch styles based on the...

Proceedings ArticleDOI
Yi Zheng, Yifan Zhao, Mengyuan Ren, He Yan, Xiangju Lu, Junhui Liu, Jia Li
12 Oct 2020
TL;DR: This work presents a new challenging benchmark dataset, consisting of 389,678 images of 5,013 cartoon characters annotated with identity, bounding box, pose, and other auxiliary attributes, and proposes a multi-task domain adaptation approach that jointly utilizes human and cartoon domain knowledge with three discriminative regularizations.
Abstract: Recent years have witnessed increasing attention to cartoon media, powered by the strong demands of industrial applications. As the first step towards understanding this media, cartoon face recognition is a crucial but under-explored task with few proposed datasets. In this work, we first present a new challenging benchmark dataset consisting of 389,678 images of 5,013 cartoon characters annotated with identity, bounding box, pose, and other auxiliary attributes. The dataset, named iCartoonFace, is currently the largest-scale and most richly annotated of its kind, and spans multiple challenging conditions in image recognition, including near-duplications, occlusions, and appearance changes. In addition, we provide two types of annotations for cartoon media, i.e., face recognition and face detection, with the help of a semi-automatic labeling algorithm. To further investigate this challenging dataset, we propose a multi-task domain adaptation approach that jointly utilizes human and cartoon domain knowledge with three discriminative regularizations. We then perform a benchmark analysis of the proposed dataset and verify the superiority of the proposed approach on the cartoon face recognition task. The dataset is available at https://iqiyi.cn/icartoonface.

25 citations


Cites background from "IIIT-CFW: A Benchmark Database of Cartoon Faces in the Wild"

  • ...To answer the aforementioned question, two natural problems raise our concerns: 1) what is the desirable need in cartoon dataset? 2) what is the relationship between human faces and cartoon ones? In this less-explored recognition of virtual media, few datasets [2, 7, 14, 21] have been proposed for specific purposes, which can be roughly grouped into two categories....

  • ...WebCaricature [14] Caricature recognition 12,016 252 ✗ ✓ Facial Landmark IIIT-CFW [21] Caricature recognition 8,928 100 ✗ ✗ Pose (1D), Age Manga109 [7] Cartoon detection&retrival 21,142 Unified ✓ ✗...

  • ...IIIT-CFW can be used for a wide spectrum of problems due to the fact that it contains detailed annotations such as type of cartoon, pose, expression, age group, etc. WebCaricature [14] is a large photograph-caricature dataset consisting of 6,042 caricatures and 5,974 photographs from 252 persons collected from the web....

  • ...IIIT-CFW [21] established a challenging dataset of 8,928 annotated unconstrained cartoon faces of 100 international celebrities....

  • ...For example, IIIT-CFW [21] contains 8,928 annotated cartoon faces of 100 international celebrities....

Journal ArticleDOI
TL;DR: Zheng et al. designed a dual-pathway model with one coarse discriminator and one fine discriminator to capture global structure together with local statistics during translation; the model can also be applied to other high-level image-to-image translation tasks.

24 citations

References
Journal ArticleDOI
TL;DR: In this paper, a face detection framework is described that is capable of processing images extremely rapidly while achieving high detection rates; implemented on a conventional desktop, it runs at 15 frames per second.
Abstract: This paper describes a face detection framework that is capable of processing images extremely rapidly while achieving high detection rates. There are three key contributions. The first is the introduction of a new image representation called the “Integral Image” which allows the features used by our detector to be computed very quickly. The second is a simple and efficient classifier which is built using the AdaBoost learning algorithm (Freund and Schapire, 1995) to select a small number of critical visual features from a very large set of potential features. The third contribution is a method for combining classifiers in a “cascade” which allows background regions of the image to be quickly discarded while spending more computation on promising face-like regions. A set of experiments in the domain of face detection is presented. The system yields face detection performance comparable to the best previous systems (Sung and Poggio, 1998; Rowley et al., 1998; Schneiderman and Kanade, 2000; Roth et al., 2000). Implemented on a conventional desktop, face detection proceeds at 15 frames per second.

13,037 citations
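The "Integral Image" representation described in the abstract can be sketched in a few lines. The snippet below is an illustration in NumPy (the helper names are mine, not from the original paper): one linear-time pass builds the table, after which any rectangle sum costs four lookups via inclusion-exclusion, which is what makes Haar-like feature evaluation so fast.

```python
import numpy as np

def integral_image(img):
    # Cumulative sums along both axes give the integral image:
    # ii[y, x] = sum of img[0:y+1, 0:x+1].
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, bottom, right):
    # Sum of img[top:bottom+1, left:right+1] by inclusion-exclusion:
    # four table lookups, independent of rectangle size.
    total = ii[bottom, right]
    if top > 0:
        total -= ii[top - 1, right]
    if left > 0:
        total -= ii[bottom, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return total

img = np.arange(16, dtype=np.int64).reshape(4, 4)
ii = integral_image(img)
assert rect_sum(ii, 1, 1, 2, 2) == img[1:3, 1:3].sum()
```

Haar-like features are differences of such rectangle sums, so each feature also evaluates in constant time regardless of scale.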

Journal ArticleDOI
TL;DR: A face recognition algorithm which is insensitive to large variation in lighting direction and facial expression is developed, based on Fisher's linear discriminant and produces well separated classes in a low-dimensional subspace, even under severe variations in lighting and facial expressions.
Abstract: We develop a face recognition algorithm which is insensitive to large variation in lighting direction and facial expression. Taking a pattern classification approach, we consider each pixel in an image as a coordinate in a high-dimensional space. We take advantage of the observation that the images of a particular face, under varying illumination but fixed pose, lie in a 3D linear subspace of the high dimensional image space-if the face is a Lambertian surface without shadowing. However, since faces are not truly Lambertian surfaces and do indeed produce self-shadowing, images will deviate from this linear subspace. Rather than explicitly modeling this deviation, we linearly project the image into a subspace in a manner which discounts those regions of the face with large deviation. Our projection method is based on Fisher's linear discriminant and produces well separated classes in a low-dimensional subspace, even under severe variation in lighting and facial expressions. The eigenface technique, another method based on linearly projecting the image space to a low dimensional subspace, has similar computational requirements. Yet, extensive experimental results demonstrate that the proposed "Fisherface" method has error rates that are lower than those of the eigenface technique for tests on the Harvard and Yale face databases.

11,674 citations
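The projection at the heart of the Fisherface method is Fisher's linear discriminant. As a minimal sketch (synthetic two-class data in 2D, not the paper's full pipeline of image pixels with a PCA step), the discriminant direction maximizes the ratio of between-class to within-class scatter, which for two classes reduces to w ∝ Sw⁻¹(m1 − m2):

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.normal([0, 0], 0.5, (50, 2))   # class 1 samples
X2 = rng.normal([3, 3], 0.5, (50, 2))   # class 2 samples

m1, m2 = X1.mean(0), X2.mean(0)
# Within-class scatter: sum of per-class (unnormalised) covariances.
Sw = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
# Fisher direction: maximises between/within scatter ratio.
w = np.linalg.solve(Sw, m1 - m2)

# Projections onto w should separate the two classes cleanly.
p1, p2 = X1 @ w, X2 @ w
assert max(p1) < min(p2) or max(p2) < min(p1)
```

In the actual Fisherface method the samples are vectorized face images and a PCA step first reduces dimensionality so that the within-class scatter matrix is invertible; the discriminant is then computed in that subspace.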

Proceedings ArticleDOI
07 Jun 2015
TL;DR: A system that directly learns a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity, and achieves state-of-the-art face recognition performance using only 128 bytes per face.
Abstract: Despite significant recent advances in the field of face recognition [10, 14, 15, 17], implementing face verification and recognition efficiently at scale presents serious challenges to current approaches. In this paper we present a system, called FaceNet, that directly learns a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity. Once this space has been produced, tasks such as face recognition, verification and clustering can be easily implemented using standard techniques with FaceNet embeddings as feature vectors.

8,289 citations


Additional excerpts

  • ...The Google and Facebook databases contain 200 million and 4.4 million faces of 8 million and 4,030 subjects, respectively....

  • ...Since cartoon... [the excerpt continues with Table 1 of the paper, reconstructed here; R/G/P/E/A denote variation types and size is given as (# people, # images)]

    Databases | R | G | P | E | A | Size (# people, # images) | Target applications
    ORL [11] | X | X | X | X | X | (40, 400) | P1
    Yale [12] | X | X | × | X | X | (15, 165) | P1
    Indian Face [13] | × | X | X | X | X | (40, 440) | P1
    LFW [14] | X | X | X | × | X | (5749, 13233) | P1, P2
    Texas 3D [15][16] | X | X | × | X | X | (105, 1149) | P1
    FEI [17] | × | X | X | X | X | (200, 2800) | P1
    PubFig83 [18] | X | X | X | X | X | (100, 8300) | P2
    Celeb Faces [19] | X | X | X | X | X | (5346, 87628) | P1, P2
    IMFDB [20] | × | X | X | X | X | (100, 34512) | P1, P2
    LWF [22] | X | X | X | X | X | (1500, 8500) | P1, P2
    FaceScrub [21] | X | X | X | X | X | (530, 106863) | P1, P3
    Facebook [5] | - | - | - | - | - | (200M, 8M) | *
    Google [4] | - | - | - | - | - | (4.4M, 4030M) | *
    VGG [6] | X | X | X | X | X | (2363, 2.6M) | P1, P2
    The IIIT-CFW | X | X | X | X | X | (100, 8927) | P1, P2, P3, P4, P5, P6, P7

  • ...These face databases introduced by Google and Facebook are the largest in size....

  • ...These real faces are harvested from Google image search and can be used as query for photo2cartoon retrieval and database for cartoon2photo retrieval....

  • ...The CFW is harvested from Google image search....
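Because FaceNet embeddings place faces on a hypersphere where distance tracks similarity, verification reduces to thresholding a Euclidean distance. The sketch below illustrates this with random stand-in vectors (not a real network's output; the 1.1 threshold is chosen here for illustration, not taken from the paper):

```python
import numpy as np

def l2_normalize(v):
    # Project an embedding onto the unit hypersphere.
    return v / np.linalg.norm(v)

def same_person(emb_a, emb_b, threshold=1.1):
    # Two faces "match" when their normalised embeddings are close.
    return np.linalg.norm(l2_normalize(emb_a) - l2_normalize(emb_b)) < threshold

rng = np.random.default_rng(42)
anchor = rng.normal(size=128)                          # 128-D, as in the paper
positive = anchor + rng.normal(scale=0.05, size=128)   # same face, small perturbation
negative = rng.normal(size=128)                        # unrelated face

assert same_person(anchor, positive)
assert not same_person(anchor, negative)
```

Clustering and recognition follow the same pattern: once embeddings exist, standard techniques (k-NN, k-means) operate directly on the distances.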

Journal ArticleDOI
TL;DR: In this paper, the authors provide an up-to-date critical survey of still-and video-based face recognition research, and provide some insights into the studies of machine recognition of faces.
Abstract: As one of the most successful applications of image analysis and understanding, face recognition has recently received significant attention, especially during the past several years. At least two reasons account for this trend: the first is the wide range of commercial and law enforcement applications, and the second is the availability of feasible technologies after 30 years of research. Even though current machine recognition systems have reached a certain level of maturity, their success is limited by the conditions imposed by many real applications. For example, recognition of face images acquired in an outdoor environment with changes in illumination and/or pose remains a largely unsolved problem. In other words, current systems are still far away from the capability of the human perception system.This paper provides an up-to-date critical survey of still- and video-based face recognition research. There are two underlying motivations for us to write this survey paper: the first is to provide an up-to-date review of the existing literature, and the second is to offer some insights into the studies of machine recognition of faces. To provide a comprehensive survey, we not only categorize existing recognition techniques but also present detailed descriptions of representative methods within each category. In addition, relevant topics such as psychophysical studies, system evaluation, and issues of illumination and pose variation are covered.

6,384 citations

Proceedings ArticleDOI
23 Jun 2014
TL;DR: This work revisits both the alignment step and the representation step by employing explicit 3D face modeling in order to apply a piecewise affine transformation, and derive a face representation from a nine-layer deep neural network.
Abstract: In modern face recognition, the conventional pipeline consists of four stages: detect => align => represent => classify. We revisit both the alignment step and the representation step by employing explicit 3D face modeling in order to apply a piecewise affine transformation, and derive a face representation from a nine-layer deep neural network. This deep network involves more than 120 million parameters using several locally connected layers without weight sharing, rather than the standard convolutional layers. Thus we trained it on the largest facial dataset to-date, an identity labeled dataset of four million facial images belonging to more than 4,000 identities. The learned representations coupling the accurate model-based alignment with the large facial database generalize remarkably well to faces in unconstrained environments, even with a simple classifier. Our method reaches an accuracy of 97.35% on the Labeled Faces in the Wild (LFW) dataset, reducing the error of the current state of the art by more than 27%, closely approaching human-level performance.

6,132 citations


Additional excerpts

  • ...[This excerpt reproduces Table 1 of the paper, the face-database comparison already quoted in full under the FaceNet reference above.]...
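The four-stage pipeline named in the DeepFace abstract (detect => align => represent => classify) can be sketched as a composition of functions. The stage bodies below are toy stand-ins, not DeepFace's actual detector, 3D alignment, or network:

```python
from typing import Callable

def run_pipeline(image,
                 detect: Callable, align: Callable,
                 represent: Callable, classify: Callable):
    faces = detect(image)                          # 1. find face regions
    aligned = [align(f) for f in faces]            # 2. warp to canonical pose
    embeddings = [represent(a) for a in aligned]   # 3. compute representation
    return [classify(e) for e in embeddings]       # 4. identity decision

# Toy stand-ins so the sketch runs end to end.
labels = run_pipeline(
    image="img",
    detect=lambda img: ["face0", "face1"],
    align=lambda f: f.upper(),
    represent=lambda a: len(a),
    classify=lambda e: f"id_{e}",
)
```

DeepFace's contribution sits in stages 2 and 3: a 3D-model-based piecewise affine alignment and a nine-layer network with locally connected layers, slotted into this same overall structure.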