Journal ArticleDOI

Sketch-based manga retrieval using Manga109 dataset

01 Oct 2017 · Multimedia Tools and Applications (Springer US) · Vol. 76, Iss. 20, pp. 21811-21838
TL;DR: A manga-specific image retrieval system that consists of efficient margin labeling, edge orientation histogram feature description with screen tone removal, and approximate nearest-neighbor search using product quantization is proposed.
Abstract: Manga (Japanese comics) are popular worldwide. However, current e-manga archives offer very limited search support, i.e., keyword-based search by title or author. To make the manga search experience more intuitive, efficient, and enjoyable, we propose a manga-specific image retrieval system. The proposed system consists of efficient margin labeling, edge orientation histogram feature description with screen tone removal, and approximate nearest-neighbor search using product quantization. For querying, the system provides a sketch-based interface. Based on the interface, two interactive reranking schemes are presented: relevance feedback and query retouch. For evaluation, we built a novel dataset of manga images, Manga109, which consists of 109 comic books of 21,142 pages drawn by professional manga artists. To the best of our knowledge, Manga109 is currently the biggest dataset of manga images available for research. Experimental results showed that the proposed framework is efficient and scalable (70 ms from 21,142 pages using a single computer with 204 MB RAM).
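
The two retrieval ingredients named above, edge orientation histogram (EOH) description and product-quantization (PQ) based approximate nearest-neighbor search, can be sketched roughly as follows. This is a generic NumPy illustration under assumed parameters (cell grid, number of sub-vectors, codebook size), not the authors' implementation; margin labeling, screen tone removal, and the reranking schemes are omitted.

```python
import numpy as np

def edge_orientation_histogram(img, grid=(4, 4), n_bins=8):
    """Concatenated per-cell histograms of gradient orientation, weighted by magnitude."""
    gy, gx = np.gradient(img.astype(np.float64))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)        # unsigned orientation in [0, pi)
    h, w = img.shape
    feats = []
    for i in range(grid[0]):
        for j in range(grid[1]):
            ys = slice(i * h // grid[0], (i + 1) * h // grid[0])
            xs = slice(j * w // grid[1], (j + 1) * w // grid[1])
            hist, _ = np.histogram(ang[ys, xs], bins=n_bins, range=(0, np.pi),
                                   weights=mag[ys, xs])
            feats.append(hist)
    f = np.concatenate(feats)
    return f / (np.linalg.norm(f) + 1e-12)

class ProductQuantizer:
    """Toy PQ index: split vectors into sub-vectors, k-means each sub-space, keep byte codes."""
    def __init__(self, n_sub=8, n_centroids=256, n_iter=20, seed=0):
        self.M, self.K, self.n_iter = n_sub, n_centroids, n_iter
        self.rng = np.random.default_rng(seed)

    def fit(self, X):                               # X: (N, D); assumes D divisible by n_sub
        self.d = X.shape[1] // self.M
        self.codebooks = []
        for m in range(self.M):
            sub = X[:, m * self.d:(m + 1) * self.d]
            C = sub[self.rng.choice(len(sub), self.K, replace=len(sub) < self.K)]
            for _ in range(self.n_iter):            # plain Lloyd iterations per sub-space
                assign = np.argmin(((sub[:, None] - C[None]) ** 2).sum(-1), axis=1)
                for k in range(self.K):
                    if np.any(assign == k):
                        C[k] = sub[assign == k].mean(axis=0)
            self.codebooks.append(C)
        return self

    def encode(self, X):
        codes = np.empty((len(X), self.M), dtype=np.uint8)
        for m, C in enumerate(self.codebooks):
            sub = X[:, m * self.d:(m + 1) * self.d]
            codes[:, m] = np.argmin(((sub[:, None] - C[None]) ** 2).sum(-1), axis=1)
        return codes

    def search(self, q, codes, topk=5):
        # asymmetric distance: one query-to-centroid lookup table per sub-space,
        # then every database item is scored by table lookups alone
        lut = np.stack([((C - q[m * self.d:(m + 1) * self.d]) ** 2).sum(axis=1)
                        for m, C in enumerate(self.codebooks)])
        dists = lut[np.arange(self.M), codes].sum(axis=1)
        return np.argsort(dists)[:topk]

# toy usage: index 1,000 random descriptors and query with one of them
X = np.random.rand(1000, 128)
pq = ProductQuantizer(n_sub=8, n_centroids=32, n_iter=10).fit(X)
print(pq.search(X[0], pq.encode(X), topk=3))        # item 0 should typically rank first
```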


Citations
Proceedings ArticleDOI
18 Jun 2018
TL;DR: This paper proposes a residual dense block (RDB) to extract abundant local features via densely connected convolutional layers and uses global feature fusion to jointly and adaptively learn global hierarchical features in a holistic way.
Abstract: A very deep convolutional neural network (CNN) has recently achieved great success for image super-resolution (SR) and offered hierarchical features as well. However, most deep CNN based SR models do not make full use of the hierarchical features from the original low-resolution (LR) images, thereby achieving relatively-low performance. In this paper, we propose a novel residual dense network (RDN) to address this problem in image SR. We fully exploit the hierarchical features from all the convolutional layers. Specifically, we propose residual dense block (RDB) to extract abundant local features via dense connected convolutional layers. RDB further allows direct connections from the state of preceding RDB to all the layers of current RDB, leading to a contiguous memory (CM) mechanism. Local feature fusion in RDB is then used to adaptively learn more effective features from preceding and current local features and stabilizes the training of wider network. After fully obtaining dense local features, we use global feature fusion to jointly and adaptively learn global hierarchical features in a holistic way. Experiments on benchmark datasets with different degradation models show that our RDN achieves favorable performance against state-of-the-art methods.
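
As a rough PyTorch sketch of the residual dense block (RDB) with local feature fusion described in this abstract: each convolution sees the block input plus all preceding layers' outputs, a 1x1 convolution fuses the concatenation, and the result is added back to the block input. Channel widths and depth are illustrative assumptions, and the global feature fusion across blocks is omitted.

```python
import torch
import torch.nn as nn

class ResidualDenseBlock(nn.Module):
    """One RDB: dense connections, 1x1 local feature fusion, local residual learning."""
    def __init__(self, channels=64, growth=32, n_layers=4):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(channels + i * growth, growth, 3, padding=1) for i in range(n_layers)
        )
        self.fuse = nn.Conv2d(channels + n_layers * growth, channels, 1)   # local feature fusion

    def forward(self, x):
        feats = [x]
        for conv in self.convs:
            feats.append(torch.relu(conv(torch.cat(feats, dim=1))))        # dense connections
        return x + self.fuse(torch.cat(feats, dim=1))                      # local residual learning

# e.g. ResidualDenseBlock()(torch.randn(1, 64, 32, 32)) keeps the (1, 64, 32, 32) shape.
```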

2,860 citations


Additional excerpts

  • ...For testing, we use five standard benchmark datasets: Set5 [1], Set14 [33], B100 [18], Urban100 [8], and Manga109 [19]....


  • ...For testing, we use five standard benchmark datasets: Set5 [1], Set14 [32], B100 [17], Urban100 [8], and Manga109 [18]....


  • ...Table 3 shows the average PSNR and SSIM results on Set5, Set14, B100, Urban100, and Manga109 with scaling factor ×3....


Posted Content
Yulun Zhang, Kunpeng Li, Kai Li, Lichen Wang, Bineng Zhong, Yun Fu
TL;DR: This work proposes a residual in residual (RIR) structure to form very deep network, which consists of several residual groups with long skip connections, and proposes a channel attention mechanism to adaptively rescale channel-wise features by considering interdependencies among channels.
Abstract: Convolutional neural network (CNN) depth is of crucial importance for image super-resolution (SR). However, we observe that deeper networks for image SR are more difficult to train. The low-resolution inputs and features contain abundant low-frequency information, which is treated equally across channels, hence hindering the representational ability of CNNs. To solve these problems, we propose the very deep residual channel attention networks (RCAN). Specifically, we propose a residual in residual (RIR) structure to form very deep network, which consists of several residual groups with long skip connections. Each residual group contains some residual blocks with short skip connections. Meanwhile, RIR allows abundant low-frequency information to be bypassed through multiple skip connections, making the main network focus on learning high-frequency information. Furthermore, we propose a channel attention mechanism to adaptively rescale channel-wise features by considering interdependencies among channels. Extensive experiments show that our RCAN achieves better accuracy and visual improvements against state-of-the-art methods.
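
A rough PyTorch sketch of the channel attention mechanism summarized above: a channel-wise global statistic is squeezed through a small bottleneck and used to rescale the feature channels. The channel count and reduction ratio are illustrative assumptions; the surrounding residual-in-residual structure is only hinted at in the closing comment.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels=64, reduction=16):
        super().__init__()
        self.body = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                       # global average pooling per channel
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                                  # per-channel scaling factors
        )

    def forward(self, x):
        return x * self.body(x)                            # adaptively rescale channel-wise features

# A residual channel attention block would wrap this as y = x + CA(conv2(relu(conv1(x)))),
# and residual groups of such blocks sit inside the long-skip residual-in-residual structure.
```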

2,025 citations


Additional excerpts

  • ...For Urban100 and Manga109, the PSNR gains of RCAN over EDSR are 0.49 dB and 0.55 dB. EDSR has much larger number of parameters (43 M) than ours (16 M), but our RCAN obtains much better performance....


  • ...For testing, we use five standard benchmark datasets: Set5 [36], Set14 [37], B100 [38], Urban100 [22], and Manga109 [39]....


Book ChapterDOI
Yulun Zhang, Kunpeng Li, Kai Li, Lichen Wang, Bineng Zhong, Yun Fu
08 Sep 2018
TL;DR: Very deep residual channel attention networks (RCAN) as mentioned in this paper proposes a residual in residual (RIR) structure to form a very deep network, which consists of several residual groups with long skip connections. Each residual group contains some residual blocks with short skip connections.
Abstract: Convolutional neural network (CNN) depth is of crucial importance for image super-resolution (SR). However, we observe that deeper networks for image SR are more difficult to train. The low-resolution inputs and features contain abundant low-frequency information, which is treated equally across channels, hence hindering the representational ability of CNNs. To solve these problems, we propose the very deep residual channel attention networks (RCAN). Specifically, we propose a residual in residual (RIR) structure to form very deep network, which consists of several residual groups with long skip connections. Each residual group contains some residual blocks with short skip connections. Meanwhile, RIR allows abundant low-frequency information to be bypassed through multiple skip connections, making the main network focus on learning high-frequency information. Furthermore, we propose a channel attention mechanism to adaptively rescale channel-wise features by considering interdependencies among channels. Extensive experiments show that our RCAN achieves better accuracy and visual improvements against state-of-the-art methods.

1,991 citations

Proceedings ArticleDOI
12 Apr 2017
TL;DR: In this paper, the Laplacian pyramid super-resolution network (LapSRN) is proposed to progressively reconstruct the sub-band residuals of high-resolution images.
Abstract: Convolutional neural networks have recently demonstrated high-quality reconstruction for single-image super-resolution. In this paper, we propose the Laplacian Pyramid Super-Resolution Network (LapSRN) to progressively reconstruct the sub-band residuals of high-resolution images. At each pyramid level, our model takes coarse-resolution feature maps as input, predicts the high-frequency residuals, and uses transposed convolutions for upsampling to the finer level. Our method does not require the bicubic interpolation as the pre-processing step and thus dramatically reduces the computational complexity. We train the proposed LapSRN with deep supervision using a robust Charbonnier loss function and achieve high-quality reconstruction. Furthermore, our network generates multi-scale predictions in one feed-forward pass through the progressive reconstruction, thereby facilitates resource-aware applications. Extensive quantitative and qualitative evaluations on benchmark datasets show that the proposed algorithm performs favorably against the state-of-the-art methods in terms of speed and accuracy.
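
Two pieces of this description lend themselves to a compact sketch: the robust Charbonnier loss and one pyramid level that upsamples features with a transposed convolution and predicts a high-frequency residual. This is a schematic PyTorch illustration with assumed channel counts and kernel sizes, not the released LapSRN code.

```python
import torch
import torch.nn as nn

def charbonnier_loss(pred, target, eps=1e-3):
    # smooth, outlier-robust approximation of the L1 loss
    return torch.sqrt((pred - target) ** 2 + eps ** 2).mean()

class PyramidLevel(nn.Module):
    """One level: upsample features 2x, predict a residual, add it to the upsampled image."""
    def __init__(self, channels=64):
        super().__init__()
        self.up_feat = nn.ConvTranspose2d(channels, channels, 4, stride=2, padding=1)
        self.to_residual = nn.Conv2d(channels, 1, 3, padding=1)
        self.up_img = nn.ConvTranspose2d(1, 1, 4, stride=2, padding=1)

    def forward(self, feat, img):
        feat = torch.relu(self.up_feat(feat))
        img = self.up_img(img) + self.to_residual(feat)    # coarse upsampling + sub-band residual
        return feat, img                                   # both are passed on to the next level
```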

1,651 citations

Posted Content
TL;DR: This paper proposes the Laplacian Pyramid Super-Resolution Network (LapSRN) to progressively reconstruct the sub-band residuals of high-resolution images and generates multi-scale predictions in one feed-forward pass through the progressive reconstruction, thereby facilitating resource-aware applications.
Abstract: Convolutional neural networks have recently demonstrated high-quality reconstruction for single-image super-resolution. In this paper, we propose the Laplacian Pyramid Super-Resolution Network (LapSRN) to progressively reconstruct the sub-band residuals of high-resolution images. At each pyramid level, our model takes coarse-resolution feature maps as input, predicts the high-frequency residuals, and uses transposed convolutions for upsampling to the finer level. Our method does not require the bicubic interpolation as the pre-processing step and thus dramatically reduces the computational complexity. We train the proposed LapSRN with deep supervision using a robust Charbonnier loss function and achieve high-quality reconstruction. Furthermore, our network generates multi-scale predictions in one feed-forward pass through the progressive reconstruction, thereby facilitates resource-aware applications. Extensive quantitative and qualitative evaluations on benchmark datasets show that the proposed algorithm performs favorably against the state-of-the-art methods in terms of speed and accuracy.

1,417 citations


Additional excerpts

  • ...Among these datasets, SET5, SET14 and BSDS100 consist of natural scenes; URBAN100 contains challenging urban scenes images with details in different frequency bands; and MANGA109 is a dataset of Japanese manga....


  • ...Quantitative comparison on the five benchmark datasets; each cell reports PSNR / SSIM / IFC:

    Algorithm | Scale | SET5 | SET14 | BSDS100 | URBAN100 | MANGA109
    Bicubic | 2 | 33.65 / 0.930 / 6.166 | 30.34 / 0.870 / 6.126 | 29.56 / 0.844 / 5.695 | 26.88 / 0.841 / 6.319 | 30.84 / 0.935 / 6.214
    A+ [30] | 2 | 36.54 / 0.954 / 8.715 | 32.40 / 0.906 / 8.201 | 31.22 / 0.887 / 7.464 | 29.23 / 0.894 / 8.440 | 35.33 / 0.967 / 8.906
    SRCNN [7] | 2 | 36.65 / 0.954 / 8.165 | 32.29 / 0.903 / 7.829 | 31.36 / 0.888 / 7.242 | 29.52 / 0.895 / 8.092 | 35.72 / 0.968 / 8.471
    FSRCNN [8] | 2 | 36.99 / 0.955 / 8.200 | 32.73 / 0.909 / 7.843 | 31.51 / 0.891 / 7.180 | 29.87 / 0.901 / 8.131 | 36.62 / 0.971 / 8.587
    SelfExSR [15] | 2 | 36.49 / 0.954 / 8.391 | 32.44 / 0.906 / 8.014 | 31.18 / 0.886 / 7.239 | 29.54 / 0.897 / 8.414 | 35.78 / 0.968 / 8.721
    RFL [26] | 2 | 36.55 / 0.954 / 8.006 | 32.36 / 0.905 / 7.684 | 31.16 / 0.885 / 6.930 | 29.13 / 0.891 / 7.840 | 35.08 / 0.966 / 8.921
    SCN [33] | 2 | 36.52 / 0.953 / 7.358 | 32.42 / 0.904 / 7.085 | 31.24 / 0.884 / 6.500 | 29.50 / 0.896 / 7.324 | 35.47 / 0.966 / 7.601
    VDSR [17] | 2 | 37.53 / 0.958 / 8.190 | 32.97 / 0.913 / 7.878 | 31.90 / 0.896 / 7.169 | 30.77 / 0.914 / 8.270 | 37.16 / 0.974 / 9.120
    DRCN [18] | 2 | 37.63 / 0.959 / 8.326 | 32.98 / 0.913 / 8.025 | 31.85 / 0.894 / 7.220 | 30.76 / 0.913 / 8.527 | 37.57 / 0.973 / 9.541
    LapSRN (ours 2×) | 2 | 37.52 / 0.959 / 9.010 | 33.08 / 0.913 / 8.505 | 31.80 / 0.895 / 7.715 | 30.41 / 0.910 / 8.907 | 37.27 / 0.974 / 9.481
    LapSRN (ours 8×) | 2 | 37.25 / 0.957 / 8.527 | 32.96 / 0.910 / 8.140 | 31.68 / 0.892 / 7.430 | 30.25 / 0.907 / 8.564 | 36.73 / 0.972 / 8.933
    Bicubic | 4 | 28.42 / 0.810 / 2.337 | 26.10 / 0.704 / 2.246 | 25.96 / 0.669 / 1.993 | 23.15 / 0.659 / 2.386 | 24.92 / 0.789 / 2.289
    A+ [30] | 4 | 30.30 / 0.859 / 3.260 | 27.43 / 0.752 / 2.961 | 26.82 / 0.710 / 2.564 | 24.34 / 0.720 / 3.218 | 27.02 / 0.850 / 3.177
    SRCNN [7] | 4 | 30.49 / 0.862 / 2.997 | 27.61 / 0.754 / 2.767 | 26.91 / 0.712 / 2.412 | 24.53 / 0.724 / 2.992 | 27.66 / 0.858 / 3.045
    FSRCNN [8] | 4 | 30.71 / 0.865 / 2.994 | 27.70 / 0.756 / 2.723 | 26.97 / 0.714 / 2.370 | 24.61 / 0.727 / 2.916 | 27.89 / 0.859 / 2.950
    SelfExSR [15] | 4 | 30.33 / 0.861 / 3.249 | 27.54 / 0.756 / 2.952 | 26.84 / 0.712 / 2.512 | 24.82 / 0.740 / 3.381 | 27.82 / 0.865 / 3.358
    RFL [26] | 4 | 30.15 / 0.853 / 3.135 | 27.33 / 0.748 / 2.853 | 26.75 / 0.707 / 2.455 | 24.20 / 0.711 / 3.000 | 26.80 / 0.840 / 3.055
    SCN [33] | 4 | 30.39 / 0.862 / 2.911 | 27.48 / 0.751 / 2.651 | 26.87 / 0.710 / 2.309 | 24.52 / 0.725 / 2.861 | 27.39 / 0.856 / 2.889
    VDSR [17] | 4 | 31.35 / 0.882 / 3.496 | 28.03 / 0.770 / 3.071 | 27.29 / 0.726 / 2.627 | 25.18 / 0.753 / 3.405 | 28.82 / 0.886 / 3.664
    DRCN [18] | 4 | 31.53 / 0.884 / 3.502 | 28.04 / 0.770 / 3.066 | 27.24 / 0.724 / 2.587 | 25.14 / 0.752 / 3.412 | 28.97 / 0.886 / 3.674
    LapSRN (ours 4×) | 4 | 31.54 / 0.885 / 3.559 | 28.19 / 0.772 / 3.147 | 27.32 / 0.728 / 2.677 | 25.21 / 0.756 / 3.530 | 29.09 / 0.890 / 3.729
    LapSRN (ours 8×) | 4 | 31.33 / 0.881 / 3.491 | 28.06 / 0.768 / 3.100 | 27.22 / 0.724 / 2.660 | 25.02 / 0.747 / 3.426 | 28.68 / 0.882 / 3.595
    Bicubic | 8 | 24.39 / 0.657 / 0.836 | 23.19 / 0.568 / 0.784 | 23.67 / 0.547 / 0.646 | 20.74 / 0.515 / 0.858 | 21.47 / 0.649 / 0.810
    A+ [30] | 8 | 25.52 / 0.692 / 1.077 | 23.98 / 0.597 / 0.983 | 24.20 / 0.568 / 0.797 | 21.37 / 0.545 / 1.092 | 22.39 / 0.680 / 1.056
    SRCNN [7] | 8 | 25.33 / 0.689 / 0.938 | 23.85 / 0.593 / 0.865 | 24.13 / 0.565 / 0.705 | 21.29 / 0.543 / 0.947 | 22.37 / 0.682 / 0.940
    FSRCNN [8] | 8 | 25.41 / 0.682 / 0.989 | 23.93 / 0.592 / 0.928 | 24.21 / 0.567 / 0.772 | 21.32 / 0.537 / 0.986 | 22.39 / 0.672 / 0.977
    SelfExSR [15] | 8 | 25.52 / 0.704 / 1.131 | 24.02 / 0.603 / 1.001 | 24.18 / 0.568 / 0.774 | 21.81 / 0.576 / 1.283 | 22.99 / 0.718 / 1.244
    RFL [26] | 8 | 25.36 / 0.677 / 0.985 | 23.88 / 0.588 / 0.910 | 24.13 / 0.562 / 0.741 | 21.27 / 0.535 / 0.978 | 22.27 / 0.668 / 0.968
    SCN [33] | 8 | 25.59 / 0.705 / 1.063 | 24.11 / 0.605 / 0.967 | 24.30 / 0.573 / 0.777 | 21.52 / 0.559 / 1.074 | 22.68 / 0.700 / 1.073
    VDSR [17] | 8 | 25.72 / 0.711 / 1.123 | 24.21 / 0.609 / 1.016 | 24.37 / 0.576 / 0.816 | 21.54 / 0.560 / 1.119 | 22.83 / 0.707 / 1.138
    LapSRN (ours 8×) | 8 | 26.14 / 0.738 / 1.302 | 24.44 / 0.623 / 1.134 | 24.54 / 0.586 / 0.893 | 21.81 / 0.581 / 1.288 | 23.39 / 0.735 / 1.352

    For 8× SR, we re-train the model of A+, SRCNN, FSRCNN, RFL and VDSR using the publicly available code....


  • ...We carry out extensive experiments using 5 datasets: SET5 [2], SET14 [39], BSDS100 [1], URBAN100 [15] and MANGA109 [23]....


  • ...In Figure 4, we show visual comparisons on URBAN100, BSDS100 and MANGA109 with a scale factor of 4×....


References
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
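
The matching recipe this abstract describes (detect keypoints, describe them, find nearest-neighbor matches, keep only the distinctive ones) is commonly written with OpenCV roughly as below; this assumes an OpenCV build that ships SIFT, and it omits the Hough-transform clustering and least-squares pose verification stages.

```python
import cv2

def sift_matches(img1, img2, ratio=0.75):
    """Return ratio-test-filtered SIFT matches between two grayscale images."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)
    knn = cv2.BFMatcher().knnMatch(des1, des2, k=2)    # two nearest neighbors per descriptor
    # Lowe's ratio test: keep a match only if it is clearly better than the runner-up
    return [m for m, n in knn if m.distance < ratio * n.distance]
```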

46,906 citations

Proceedings ArticleDOI
20 Jun 2005
TL;DR: It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.
Abstract: We study the question of feature sets for robust visual object recognition; adopting linear SVM based human detection as a test case. After reviewing existing edge and gradient based descriptors, we show experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection. We study the influence of each stage of the computation on performance, concluding that fine-scale gradients, fine orientation binning, relatively coarse spatial binning, and high-quality local contrast normalization in overlapping descriptor blocks are all important for good results. The new approach gives near-perfect separation on the original MIT pedestrian database, so we introduce a more challenging dataset containing over 1800 annotated human images with a large range of pose variations and backgrounds.
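
For reference, a HOG descriptor of the kind evaluated in this paper can be computed with scikit-image roughly as follows; the window size and the cell, block, and orientation settings shown are common choices, not necessarily the exact configuration used by the authors.

```python
import numpy as np
from skimage.feature import hog

window = np.random.rand(128, 64)          # stand-in for a 128x64 detection window
features = hog(window,
               orientations=9,            # fine orientation binning
               pixels_per_cell=(8, 8),    # relatively coarse spatial binning
               cells_per_block=(2, 2),    # overlapping blocks with local contrast normalization
               block_norm='L2-Hys')
# `features` would then be fed to a linear SVM trained for human detection.
```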

31,952 citations


"Sketch-based manga retrieval using ..." refers methods in this paper

  • ...BoF using a histogram of oriented gradients [17] was proposed [5], [6], [18]....


Journal ArticleDOI
TL;DR: The state of the art in evaluated methods for both classification and detection is reviewed, including whether the methods are statistically different, what they are learning from the images, and what the methods find easy or confusing.
Abstract: The Pascal Visual Object Classes (VOC) challenge is a benchmark in visual object category recognition and detection, providing the vision and machine learning communities with a standard dataset of images and annotation, and standard evaluation procedures. Organised annually from 2005 to present, the challenge and its associated dataset has become accepted as the benchmark for object detection. This paper describes the dataset and evaluation procedure. We review the state-of-the-art in evaluated methods for both classification and detection, analyse whether the methods are statistically different, what they are learning from the images (e.g. the object or its context), and what the methods find easy or confuse. The paper concludes with lessons learnt in the three year history of the challenge, and proposes directions for future improvement and extension.
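
The evaluation procedure referenced in the excerpts below is the PASCAL overlap criterion: a detection counts as correct when the intersection-over-union of the predicted and ground-truth boxes exceeds a threshold (0.5 by convention). A minimal sketch, with boxes given as (x1, y1, x2, y2):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def is_correct_detection(pred_box, gt_box, threshold=0.5):
    return iou(pred_box, gt_box) >= threshold
```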

15,935 citations


"Sketch-based manga retrieval using ..." refers background or methods in this paper

  • ...The setup is similar to that for image detection evaluation [17]....


  • ...Evaluation criteria For evaluation, we employed a standard PASCAL overlap criterion [17]....


  • ..., the PASCAL VOC datasets [17] for image recognition in the 2000s, and ImageNet [51] for recent rapid progress in deep architecture....


Journal ArticleDOI
TL;DR: A generalized gray-scale and rotation invariant operator presentation is derived that allows for detecting the "uniform" patterns for any quantization of the angular space and for any spatial resolution, and a method for combining multiple operators for multiresolution analysis is presented.
Abstract: Presents a theoretically very simple, yet efficient, multiresolution approach to gray-scale and rotation invariant texture classification based on local binary patterns and nonparametric discrimination of sample and prototype distributions. The method is based on recognizing that certain local binary patterns, termed "uniform," are fundamental properties of local image texture and their occurrence histogram is proven to be a very powerful texture feature. We derive a generalized gray-scale and rotation invariant operator presentation that allows for detecting the "uniform" patterns for any quantization of the angular space and for any spatial resolution and presents a method for combining multiple operators for multiresolution analysis. The proposed approach is very robust in terms of gray-scale variations since the operator is, by definition, invariant against any monotonic transformation of the gray scale. Another advantage is computational simplicity as the operator can be realized with a few operations in a small neighborhood and a lookup table. Experimental results demonstrate that good discrimination can be achieved with the occurrence statistics of simple rotation invariant local binary patterns.
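
A short usage sketch of the rotation-invariant "uniform" LBP texture descriptor described above, using scikit-image; the (P, R) = (8, 1) neighborhood is the common base setting, and the histogram of pattern codes is what would serve as the texture feature.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(gray, P=8, R=1):
    """Normalized histogram of rotation-invariant uniform LBP codes."""
    lbp = local_binary_pattern(gray, P, R, method='uniform')   # codes in 0 .. P+1
    hist, _ = np.histogram(lbp, bins=np.arange(P + 3), density=True)
    return hist
```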

14,245 citations


"Sketch-based manga retrieval using ..." refers background in this paper

  • ...For example, texture-based features such as Local Binary Pattern (LBP) [42] are not effective for manga because manga images do not contain texture information (Fig....


Proceedings ArticleDOI
17 Jun 2006
TL;DR: This paper presents a method for recognizing scene categories based on approximate global geometric correspondence that exceeds the state of the art on the Caltech-101 database and achieves high accuracy on a large database of fifteen natural scene categories.
Abstract: This paper presents a method for recognizing scene categories based on approximate global geometric correspondence. This technique works by partitioning the image into increasingly fine sub-regions and computing histograms of local features found inside each sub-region. The resulting "spatial pyramid" is a simple and computationally efficient extension of an orderless bag-of-features image representation, and it shows significantly improved performance on challenging scene categorization tasks. Specifically, our proposed method exceeds the state of the art on the Caltech-101 database and achieves high accuracy on a large database of fifteen natural scene categories. The spatial pyramid framework also offers insights into the success of several recently proposed image descriptions, including Torralba’s "gist" and Lowe’s SIFT descriptors.
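
A minimal sketch of the spatial pyramid representation described above: histograms of quantized local features ("visual words") are computed over increasingly fine grids and concatenated. The vocabulary size and number of levels are illustrative assumptions, and the per-level weighting of the pyramid match kernel is omitted for brevity.

```python
import numpy as np

def spatial_pyramid(points, words, image_size, vocab=200, levels=3):
    """points: (N, 2) feature locations as (x, y); words: (N,) visual-word ids in [0, vocab)."""
    h, w = image_size
    feats = []
    for level in range(levels):
        cells = 2 ** level                       # 1x1, 2x2, 4x4, ... sub-regions
        for i in range(cells):
            for j in range(cells):
                in_cell = ((points[:, 1] * cells // h == i) &
                           (points[:, 0] * cells // w == j))
                feats.append(np.bincount(words[in_cell], minlength=vocab))
    f = np.concatenate(feats).astype(np.float64)
    return f / (f.sum() + 1e-12)                 # one long, normalized histogram
```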

8,736 citations


"Sketch-based manga retrieval using ..." refers methods in this paper

  • ...This kind of task has been tackled in spatial verification-based reranking methods [25], [26], [27], [28], [29], [30], [31]....
