Author

Shengchuan Zhang

Other affiliations: Xidian University
Bio: Shengchuan Zhang is an academic researcher from Xiamen University. The author has contributed to research on topics including sketch synthesis and computer science. The author has an h-index of 11 and has co-authored 30 publications receiving 438 citations. Previous affiliations of Shengchuan Zhang include Xidian University.

Papers
Proceedings ArticleDOI
Yunhan Shen, Rongrong Ji, Shengchuan Zhang, Wangmeng Zuo, Yan Wang
18 Jun 2018
TL;DR: This paper proposes a novel generative adversarial learning paradigm for weakly supervised object detection that outperforms all existing schemes in terms of detection accuracy and adapts a structural similarity loss in combination with an adversarial loss into the training objective, which solves the challenge that the bounding boxes produced by the surrogator may not well capture their ground truth.
Abstract: Weakly supervised object detection has attracted extensive research efforts in recent years. Without the need of annotating bounding boxes, the existing methods usually follow a two/multi-stage pipeline with an online compulsive stage to extract object proposals, which is an order of magnitude slower than fast fully supervised object detectors such as SSD [31] and YOLO [34]. In this paper, we speed up online weakly supervised object detectors by orders of magnitude by proposing a novel generative adversarial learning paradigm. In the proposed paradigm, the generator is a one-stage object detector that generates bounding boxes from images. To guide the learning of the object-level generator, a surrogator is introduced to mine high-quality bounding boxes for training. We further adapt a structural similarity loss in combination with an adversarial loss into the training objective, which solves the challenge that the bounding boxes produced by the surrogator may not well capture their ground truth. Our one-stage detector outperforms all existing schemes in terms of detection accuracy, running at 118 frames per second, which is up to 438× faster than the state-of-the-art weakly supervised detectors [8, 30, 15, 27, 45]. The code will be made publicly available soon.
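
Below is a minimal PyTorch sketch of the adversarial training idea the abstract describes, not the authors' released code: a toy one-stage "generator" regresses boxes, surrogator-mined pseudo boxes act as the "real" samples for a discriminator, and an L1 term stands in for the paper's structural similarity loss. All modules, shapes, and hyperparameters here are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class BoxGenerator(nn.Module):          # stand-in one-stage detector
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(3, 16, 3, 2, 1), nn.ReLU(),
                                      nn.AdaptiveAvgPool2d(1))
        self.head = nn.Linear(16, 4)    # one (x, y, w, h) box per image
    def forward(self, img):
        return torch.sigmoid(self.head(self.backbone(img).flatten(1)))

class BoxDiscriminator(nn.Module):      # scores how plausible a box looks
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 1))
    def forward(self, box):
        return self.net(box)

gen, disc = BoxGenerator(), BoxDiscriminator()
opt_g = torch.optim.Adam(gen.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)

imgs = torch.rand(8, 3, 64, 64)         # toy batch
pseudo_boxes = torch.rand(8, 4)         # mined by the surrogator (assumed given)

# discriminator step: pseudo boxes are "real", generated boxes are "fake"
fake = gen(imgs).detach()
loss_d = F.binary_cross_entropy_with_logits(disc(pseudo_boxes), torch.ones(8, 1)) + \
         F.binary_cross_entropy_with_logits(disc(fake), torch.zeros(8, 1))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# generator step: fool the discriminator while staying structurally close to
# the surrogator's boxes (an L1 term stands in for the structural loss)
boxes = gen(imgs)
loss_g = F.binary_cross_entropy_with_logits(disc(boxes), torch.ones(8, 1)) + \
         F.l1_loss(boxes, pseudo_boxes)
opt_g.zero_grad(); loss_g.backward(); opt_g.step()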

86 citations

Journal ArticleDOI
Shengchuan Zhang, Xinbo Gao, Nannan Wang, Jie Li, Mingjin Zhang
TL;DR: This paper presents a novel method that combines both the similarity between different image patches and prior knowledge to synthesize face sketches and outperforms several state-of-the-arts in terms of perceptual and objective metrics.
Abstract: Face sketch synthesis has wide applications in digital entertainment and law enforcement. Although there is much research on face sketch synthesis, most existing algorithms cannot handle nonfacial factors, such as hair style, hairpins, and glasses, if these factors are excluded from the training set. In addition, previous methods only work under well-controlled conditions and fail on images whose backgrounds and sizes differ from the training set. To this end, this paper presents a novel method that combines both the similarity between different image patches and prior knowledge to synthesize face sketches. Given training photo-sketch pairs, the proposed method learns a photo patch feature dictionary from the training photo patches and replaces the photo patches with their sparse coefficients during the search process. For a test photo patch, we first obtain its sparse coefficient via the learnt dictionary and then search for its nearest neighbors (candidate patches) among all training photo patches using sparse coefficients. After purifying the nearest neighbors with prior knowledge, the final sketch corresponding to the test photo is obtained by Bayesian inference. The contributions of this paper are as follows: 1) we relax the nearest neighbor search area from a local region to the whole image without much additional time cost and 2) our method can produce nonfacial factors that are not contained in the training set, is robust against image backgrounds, and can even ignore the alignment and size of test photos. Our experimental results show that the proposed method outperforms several state-of-the-art methods in terms of perceptual and objective metrics.
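
A small illustrative sketch of the search step described above, using scikit-learn's dictionary learning in place of the paper's exact formulation: patches are encoded as sparse coefficients and nearest neighbors are found in coefficient space over the whole training set. Patch size, dictionary size, and sparsity level are assumptions chosen for brevity.

import numpy as np
from sklearn.decomposition import DictionaryLearning, sparse_encode

rng = np.random.default_rng(0)
train_patches = rng.random((500, 64))        # 500 flattened 8x8 photo patches

dico = DictionaryLearning(n_components=32, transform_algorithm='omp',
                          transform_n_nonzero_coefs=5, random_state=0)
train_codes = dico.fit_transform(train_patches)   # sparse coefficients

test_patch = rng.random((1, 64))
test_code = sparse_encode(test_patch, dico.components_,
                          algorithm='omp', n_nonzero_coefs=5)

# nearest neighbors over the WHOLE training set, but in cheap coefficient space
dists = np.linalg.norm(train_codes - test_code, axis=1)
candidates = np.argsort(dists)[:10]   # candidate patches for Bayesian inference
print(candidates)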

80 citations

Proceedings ArticleDOI
02 Mar 2021
TL;DR: This paper proposes Hierarchical Style Disentanglement (HiSD), which organizes labels into independent tags, exclusive attributes, and disentangled styles from top to bottom.
Abstract: Recently, image-to-image translation has made significant progress in achieving both multi-label (i.e., translation conditioned on different labels) and multi-style (i.e., generation with diverse styles) tasks. However, due to the unexplored independence and exclusiveness in the labels, existing endeavors are defeated by involving uncontrolled manipulations to the translation results. In this paper, we propose Hierarchical Style Disentanglement (HiSD) to address this issue. Specifically, we organize the labels into a hierarchical tree structure, in which independent tags, exclusive attributes, and disentangled styles are allocated from top to bottom. Correspondingly, a new translation process is designed to adapt the above structure, in which the styles are identified for controllable translations. Both qualitative and quantitative results on the CelebA-HQ dataset verify the ability of the proposed HiSD. The code has been released at https://github.com/imlixinyang/HiSD.
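
The following toy sketch illustrates the hierarchy the abstract describes, with independent tags at the top, mutually exclusive attributes beneath each tag, and a sampled style code per translation. The Translator module is an assumed stand-in for shape only, not HiSD's released architecture.

import torch
import torch.nn as nn

hierarchy = {
    "bangs":   ["with", "without"],      # tag -> exclusive attributes
    "glasses": ["with", "without"],
    "hair":    ["black", "blond", "brown"],
}

class Translator(nn.Module):
    """Edits only the feature dimensions tied to one tag, conditioned on a style."""
    def __init__(self, feat_dim=64, style_dim=8):
        super().__init__()
        self.edit = nn.Linear(feat_dim + style_dim, feat_dim)
    def forward(self, feat, style):
        return feat + self.edit(torch.cat([feat, style], dim=1))  # residual edit

feat = torch.randn(1, 64)                # encoded image feature
translators = {tag: Translator() for tag in hierarchy}

# translate the "hair" tag with a diverse style without touching other tags
style = torch.randn(1, 8)                # sampled style -> multi-style outputs
feat = translators["hair"](feat, style)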

67 citations

Journal ArticleDOI
TL;DR: This paper presents a novel face sketch synthesis method by multidomain adversarial learning (termed MDAL), which overcomes the defects of blurs and deformations toward high-quality synthesis.
Abstract: Given a training set of face photo-sketch pairs, face sketch synthesis targets learning a mapping from the photo domain to the sketch domain. Despite the exciting progress made in the literature, it remains an open problem to synthesize high-quality sketches against blurs and deformations. Recent advances in generative adversarial training provide a new insight into face sketch synthesis, from which perspective the existing synthesis pipelines can be fundamentally revisited. In this paper, we present a novel face sketch synthesis method based on multidomain adversarial learning (termed MDAL), which overcomes the defects of blurs and deformations toward high-quality synthesis. The principle of our scheme relies on the concept of “interpretation through synthesis.” In particular, we first interpret face photographs in the photo domain and face sketches in the sketch domain by reconstructing each via adversarial learning. We define the intermediate products of the reconstruction process as latent variables, which form a latent domain. Second, via adversarial learning, we make the distributions of the latent variables indistinguishable between the reconstruction process of the face photograph and that of the face sketch. Finally, given an input face photograph, the latent variable obtained by reconstructing it is used to synthesize the corresponding sketch. Quantitative comparisons with state-of-the-art methods demonstrate the superiority of the proposed MDAL method.
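
A minimal sketch, under assumed shapes and architectures, of the latent-domain alignment idea: two autoencoders reconstruct photos and sketches while a discriminator pushes their latent codes to be indistinguishable; at test time a photo's latent code is decoded by the sketch decoder. Optimizer steps are omitted for brevity.

import torch
import torch.nn as nn
import torch.nn.functional as F

enc_p, dec_p = nn.Linear(256, 32), nn.Linear(32, 256)   # photo autoencoder
enc_s, dec_s = nn.Linear(256, 32), nn.Linear(32, 256)   # sketch autoencoder
disc = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 1))

photos, sketches = torch.rand(8, 256), torch.rand(8, 256)
z_p, z_s = enc_p(photos), enc_s(sketches)

# reconstruction in each domain ("interpretation through synthesis")
loss_rec = F.mse_loss(dec_p(z_p), photos) + F.mse_loss(dec_s(z_s), sketches)

# adversarial term pushing the two latent distributions together
loss_adv = F.binary_cross_entropy_with_logits(disc(z_p), torch.ones(8, 1)) + \
           F.binary_cross_entropy_with_logits(disc(z_s), torch.zeros(8, 1))

# test time: encode a photo, then decode with the SKETCH decoder
synthesized_sketch = dec_s(enc_p(photos))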

59 citations

Journal ArticleDOI
TL;DR: This work proposes a robust face sketch style synthesis method, which can convert photos to arbitrarily stylistic sketches based on only one corresponding template sketch, and introduces a multi-feature-based optimization model.
Abstract: Heterogeneous image conversion is a critical issue in many computer vision tasks, among which example-based face sketch style synthesis provides a convenient way to make artistic effects for photos. However, existing face sketch style synthesis methods generate stylistic sketches depending on many photo-sketch pairs. This requirement limits the generalization ability of these methods to produce arbitrarily stylistic sketches. To handle this drawback, we propose a robust face sketch style synthesis method, which can convert photos to arbitrarily stylistic sketches based on only one corresponding template sketch. In the proposed method, a sparse representation-based greedy search strategy is first applied to estimate an initial sketch. Then, multi-scale features and Euclidean distance are employed to select candidate image patches from the initial estimated sketch and the template sketch. To further refine the obtained candidate image patches, a multi-feature-based optimization model is introduced. Finally, by assembling the refined candidate image patches, the completed face sketch is obtained. To further enhance the quality of synthesized sketches, a cascaded regression strategy is adopted. Experimental results on several commonly used face sketch databases and celebrity photos demonstrate that the proposed method outperforms state-of-the-art face sketch synthesis methods.
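
A toy numpy sketch of the candidate-selection step: each patch of the initial estimate is matched to the closest patch of the single template sketch by Euclidean distance. The patch extraction, sizes, and single-feature matching are simplifying assumptions; the paper uses multi-scale features and a further multi-feature optimization model.

import numpy as np

rng = np.random.default_rng(1)
initial = rng.random((32, 32))            # initial sketch estimate
template = rng.random((32, 32))           # the one template sketch

def patches(img, size=8, stride=8):
    out = [img[i:i+size, j:j+size].ravel()
           for i in range(0, img.shape[0] - size + 1, stride)
           for j in range(0, img.shape[1] - size + 1, stride)]
    return np.stack(out)

init_p, temp_p = patches(initial), patches(template)

# for each patch of the initial estimate, pick the closest template patch
result = np.empty_like(init_p)
for k, p in enumerate(init_p):
    result[k] = temp_p[np.argmin(np.linalg.norm(temp_p - p, axis=1))]
# `result` holds the candidate patches to be refined and re-assembled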

56 citations


Cited by

Posted Content
TL;DR: This paper extensively reviews 400+ papers on object detection in light of its technical evolution, spanning over a quarter-century (from the 1990s to 2019), and makes an in-depth analysis of their challenges as well as technical improvements in recent years.
Abstract: Object detection, as one of the most fundamental and challenging problems in computer vision, has received great attention in recent years. Its development in the past two decades can be regarded as an epitome of computer vision history. If we think of today's object detection as a technical aesthetics under the power of deep learning, then turning back the clock 20 years we would witness the wisdom of the cold weapon era. This paper extensively reviews 400+ papers on object detection in light of its technical evolution, spanning over a quarter-century (from the 1990s to 2019). A number of topics are covered in this paper, including the milestone detectors in history, detection datasets, metrics, fundamental building blocks of the detection system, speed-up techniques, and the recent state-of-the-art detection methods. This paper also reviews some important detection applications, such as pedestrian detection, face detection, and text detection, and makes an in-depth analysis of their challenges as well as technical improvements in recent years.

802 citations

Proceedings ArticleDOI
14 Jun 2020
TL;DR: A simple but effective framework for COD, termed Search Identification Network (SINet), which outperforms various state-of-the-art object detection baselines on all datasets tested, making it a robust, general framework that can help facilitate future research in COD.
Abstract: We present a comprehensive study on a new task named camouflaged object detection (COD), which aims to identify objects that are “seamlessly” embedded in their surroundings. The high intrinsic similarities between the target object and the background make COD far more challenging than the traditional object detection task. To address this issue, we elaborately collect a novel dataset, called COD10K, which comprises 10,000 images covering camouflaged objects in various natural scenes, over 78 object categories. All the images are densely annotated with category, bounding-box, object-/instance-level, and matting-level labels. This dataset could serve as a catalyst for progressing many vision tasks, e.g., localization, segmentation, and alpha-matting, etc. In addition, we develop a simple but effective framework for COD, termed Search Identification Network (SINet). Without any bells and whistles, SINet outperforms various state-of-the-art object detection baselines on all datasets tested, making it a robust, general framework that can help facilitate future research in COD. Finally, we conduct a large-scale COD study, evaluating 13 cutting-edge models, providing some interesting findings, and showing several potential applications. Our research offers the community an opportunity to explore more in this new field. The code will be available at https://github.com/DengPingFan/SINet/.
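
A rough sketch, with assumed shapes, of the two-step idea behind SINet's name: a "search" module produces a coarse location map, and an "identification" module refines it into the final camouflaged-object mask. Neither toy module reflects the released architecture.

import torch
import torch.nn as nn

class Search(nn.Module):                  # coarse localization of the target
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(8, 1, 1))
    def forward(self, x):
        return torch.sigmoid(self.net(x))  # coarse location map

class Identify(nn.Module):                # refine the coarse map into a mask
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(4, 8, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(8, 1, 1))
    def forward(self, x, coarse):
        return torch.sigmoid(self.net(torch.cat([x, coarse], dim=1)))

img = torch.rand(1, 3, 64, 64)
coarse = Search()(img)
mask = Identify()(img, coarse)            # refined segmentation map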

289 citations