scispace - formally typeset
Search or ask a question
Author

Yanming Guo

Bio: Yanming Guo is an academic researcher from National University of Defense Technology. The author has contributed to research in topics: Feature (computer vision) & Convolutional neural network. The author has an hindex of 13, co-authored 37 publications receiving 1867 citations. Previous affiliations of Yanming Guo include Leiden University & Zhejiang University.

Papers
More filters
Journal ArticleDOI
TL;DR: The state-of-the-art in deep learning algorithms in computer vision is reviewed by highlighting the contributions and challenges from over 210 recent research papers, and the future trends and challenges in designing and training deep neural networks are summarized.

1,733 citations

Journal ArticleDOI
TL;DR: The field of semantic segmentation as pertaining to deep convolutional neural networks is reviewed and comprehensive coverage of the top approaches is provided and the strengths, weaknesses and major challenges are summarized.
Abstract: During the long history of computer vision, one of the grand challenges has been semantic segmentation which is the ability to segment an unknown image into different parts and objects (e.g., beach, ocean, sun, dog, swimmer). Furthermore, segmentation is even deeper than object recognition because recognition is not necessary for segmentation. Specifically, humans can perform image segmentation without even knowing what the objects are (for example, in satellite imagery or medical X-ray scans, there may be several objects which are unknown, but they can still be segmented within the image typically for further investigation). Performing segmentation without knowing the exact identity of all objects in the scene is an important part of our visual understanding process which can give us a powerful model to understand the world and also be used to improve or augment existing computer vision techniques. Herein this work, we review the field of semantic segmentation as pertaining to deep convolutional neural networks. We provide comprehensive coverage of the top approaches and summarize the strengths, weaknesses and major challenges.

451 citations

Proceedings ArticleDOI
01 Oct 2017
TL;DR: This work introduces a novel bridge between the modality-specific representations by creating a co-embedding space based on a recurrent residual fusion (RRF) block that adapts the recurrent mechanism to residual learning, so that it can recursively improve feature embeddings while retaining the shared parameters.
Abstract: A major challenge in matching between vision and language is that they typically have completely different features and representations. In this work, we introduce a novel bridge between the modality-specific representations by creating a co-embedding space based on a recurrent residual fusion (RRF) block. Specifically, RRF adapts the recurrent mechanism to residual learning, so that it can recursively improve feature embeddings while retaining the shared parameters. Then, a fusion module is used to integrate the intermediate recurrent outputs and generates a more powerful representation. In the matching network, RRF acts as a feature enhancement component to gather visual and textual representations into a more discriminative embedding space where it allows to narrow the crossmodal gap between vision and language. Moreover, we employ a bi-rank loss function to enforce separability of the two modalities in the embedding space. In the experiments, we evaluate the proposed RRF-Net using two multi-modal datasets where it achieves state-of-the-art results.

148 citations

Journal ArticleDOI
TL;DR: A high performance network based on the CNN-RNN paradigm is built which outperforms the original CNN and also the current state-of-the-art and is built on top of any CNN architecture which is primarily designed for leaf-level classification.
Abstract: Objects are often organized in a semantic hierarchy of categories, where fine-level categories are grouped into coarse-level categories according to their semantic relations. While previous works usually only classify objects into the leaf categories, we argue that generating hierarchical labels can actually describe how the leaf categories evolved from higher level coarse-grained categories, thus can provide a better understanding of the objects. In this paper, we propose to utilize the CNN-RNN framework to address the hierarchical image classification task. CNN allows us to obtain discriminative features for the input images, and RNN enables us to jointly optimize the classification of coarse and fine labels. This framework can not only generate hierarchical labels for images, but also improve the traditional leaf-level classification performance due to incorporating the hierarchical information. Moreover, this framework can be built on top of any CNN architecture which is primarily designed for leaf-level classification. Accordingly, we build a high performance network based on the CNN-RNN paradigm which outperforms the original CNN (wider-ResNet) and also the current state-of-the-art. In addition, we investigate how to utilize the CNN-RNN framework to improve the fine category classification when a fraction of the training data is only annotated with coarse labels. Experimental results demonstrate that CNN-RNN can use the coarse-labeled training data to improve the classification of fine categories, and in some cases it even surpasses the performance achieved by fully annotated training data. This reveals that, CNN-RNN can alleviate the challenge of specialized and expensive annotation of fine labels.

102 citations

Journal ArticleDOI
TL;DR: Polycrystalline ZnO : Cu-based film photodetectors with extended detection waveband (UV and visible light) were fabricated using facile colloidal chemistry and a post-annealing process to understand this complex photoconduction behaviour.
Abstract: Polycrystalline ZnO : Cu-based film photodetectors with extended detection waveband (UV and visible light) were fabricated using facile colloidal chemistry and a post-annealing process. The obtained detectors are highly sensitive to visible light and can realize the response switch between UV and visible light. A native and extrinsic trap cooperatively controlled space charge limited (SCL) transport mechanism is proposed to understand this complex photoconduction behaviour.

51 citations


Cited by
More filters
Journal ArticleDOI
TL;DR: This review, which focuses on the application of CNNs to image classification tasks, covers their development, from their predecessors up to recent state-of-the-art deep learning systems.
Abstract: Convolutional neural networks CNNs have been applied to visual tasks since the late 1980s. However, despite a few scattered applications, they were dormant until the mid-2000s when developments in computing power and the advent of large amounts of labeled data, supplemented by improved algorithms, contributed to their advancement and brought them to the forefront of a neural network renaissance that has seen rapid progression since 2012. In this review, which focuses on the application of CNNs to image classification tasks, we cover their development, from their predecessors up to recent state-of-the-art deep learning systems. Along the way, we analyze 1 their early successes, 2 their role in the deep learning renaissance, 3 selected symbolic works that have contributed to their recent popularity, and 4 several improvement attempts by reviewing contributions and challenges of over 300 publications. We also introduce some of their current trends and remaining challenges.

2,366 citations

Proceedings ArticleDOI
01 Aug 2017
TL;DR: All the elements and important issues related to CNN, and how these elements work, are explained and defined and the parameters that effect CNN efficiency are state.
Abstract: The term Deep Learning or Deep Neural Network refers to Artificial Neural Networks (ANN) with multi layers. Over the last few decades, it has been considered to be one of the most powerful tools, and has become very popular in the literature as it is able to handle a huge amount of data. The interest in having deeper hidden layers has recently begun to surpass classical methods performance in different fields; especially in pattern recognition. One of the most popular deep neural networks is the Convolutional Neural Network (CNN). It take this name from mathematical linear operation between matrixes called convolution. CNN have multiple layers; including convolutional layer, non-linearity layer, pooling layer and fully-connected layer. The convolutional and fully-connected layers have parameters but pooling and non-linearity layers don't have parameters. The CNN has an excellent performance in machine learning problems. Specially the applications that deal with image data, such as largest image classification data set (Image Net), computer vision, and in natural language processing (NLP) and the results achieved were very amazing. In this paper we will explain and define all the elements and important issues related to CNN, and how these elements work. In addition, we will also state the parameters that effect CNN efficiency. This paper assumes that the readers have adequate knowledge about both machine learning and artificial neural network.

2,338 citations