scispace - formally typeset
Search or ask a question
Author

Jiuxiang Gu

Bio: Jiuxiang Gu is an academic researcher from Adobe Systems. The author has contributed to research in topics: Computer science & Closed captioning. The author has an hindex of 12, co-authored 35 publications receiving 3599 citations. Previous affiliations of Jiuxiang Gu include Nanyang Technological University.

Papers published on a yearly basis

Papers
More filters
Journal ArticleDOI
TL;DR: A broad survey of the recent advances in convolutional neural networks can be found in this article, where the authors discuss the improvements of CNN on different aspects, namely, layer design, activation function, loss function, regularization, optimization and fast computation.

3,125 citations

Posted Content
TL;DR: This paper details the improvements of CNN on different aspects, including layer design, activation function, loss function, regularization, optimization and fast computation, and introduces various applications of convolutional neural networks in computer vision, speech and natural language processing.
Abstract: In the last few years, deep learning has led to very good performance on a variety of problems, such as visual recognition, speech recognition and natural language processing. Among different types of deep neural networks, convolutional neural networks have been most extensively studied. Leveraging on the rapid growth in the amount of the annotated data and the great improvements in the strengths of graphics processor units, the research on convolutional neural networks has been emerged swiftly and achieved state-of-the-art results on various tasks. In this paper, we provide a broad survey of the recent advances in convolutional neural networks. We detailize the improvements of CNN on different aspects, including layer design, activation function, loss function, regularization, optimization and fast computation. Besides, we also introduce various applications of convolutional neural networks in computer vision, speech and natural language processing.

1,302 citations

Proceedings ArticleDOI
18 Jun 2018
TL;DR: This work proposes to incorporate generative processes into the cross-modal feature embedding, through which it is able to learn not only the global abstract features but also the local grounded features of image-text pairs.
Abstract: Textual-visual cross-modal retrieval has been a hot research topic in both computer vision and natural language processing communities. Learning appropriate representations for multi-modal data is crucial for the cross-modal retrieval performance. Unlike existing image-text retrieval approaches that embed image-text pairs as single feature vectors in a common representational space, we propose to incorporate generative processes into the cross-modal feature embedding, through which we are able to learn not only the global abstract features but also the local grounded features. Extensive experiments show that our framework can well match images and sentences with complex content, and achieve the state-of-the-art cross-modal retrieval results on MSCOCO dataset.

393 citations

Proceedings ArticleDOI
15 Jun 2019
TL;DR: This paper proposes a novel scene graph generation algorithm with external knowledge and image reconstruction loss to overcome dataset issues, and extracts commonsense knowledge from the external knowledge base to refine object and phrase features for improving generalizability inscene graph generation.
Abstract: Scene graph generation has received growing attention with the advancements in image understanding tasks such as object detection, attributes and relationship prediction, etc. However, existing datasets are biased in terms of object and relationship labels, or often come with noisy and missing annotations, which makes the development of a reliable scene graph prediction model very challenging. In this paper, we propose a novel scene graph generation algorithm with external knowledge and image reconstruction loss to overcome these dataset issues. In particular, we extract commonsense knowledge from the external knowledge base to refine object and phrase features for improving generalizability in scene graph generation. To address the bias of noisy object annotations, we introduce an auxiliary image reconstruction path to regularize the scene graph generation network. Extensive experiments show that our framework can generate better scene graphs, achieving the state-of-the-art performance on two benchmark datasets: Visual Relationship Detection and Visual Genome datasets.

271 citations

Proceedings ArticleDOI
01 Oct 2017
TL;DR: This paper introduces a language CNN model which is suitable for statistical language modeling tasks and shows competitive performance in image captioning, and is competitive with the state-of-the-art methods.
Abstract: Language models based on recurrent neural networks have dominated recent image caption generation tasks. In this paper, we introduce a language CNN model which is suitable for statistical language modeling tasks and shows competitive performance in image captioning. In contrast to previous models which predict next word based on one previous word and hidden state, our language CNN is fed with all the previous words and can model the long-range dependencies in history words, which are critical for image captioning. The effectiveness of our approach is validated on two datasets: Flickr30K and MS COCO. Our extensive experimental results show that our method outperforms the vanilla recurrent neural network based language models and is competitive with the state-of-the-art methods.

116 citations


Cited by
More filters
Journal ArticleDOI
TL;DR: This paper reviews the major deep learning concepts pertinent to medical image analysis and summarizes over 300 contributions to the field, most of which appeared in the last year, to survey the use of deep learning for image classification, object detection, segmentation, registration, and other tasks.

8,730 citations

Journal ArticleDOI
TL;DR: A broad survey of the recent advances in convolutional neural networks can be found in this article, where the authors discuss the improvements of CNN on different aspects, namely, layer design, activation function, loss function, regularization, optimization and fast computation.

3,125 citations

Journal ArticleDOI
TL;DR: This review, which focuses on the application of CNNs to image classification tasks, covers their development, from their predecessors up to recent state-of-the-art deep learning systems.
Abstract: Convolutional neural networks CNNs have been applied to visual tasks since the late 1980s. However, despite a few scattered applications, they were dormant until the mid-2000s when developments in computing power and the advent of large amounts of labeled data, supplemented by improved algorithms, contributed to their advancement and brought them to the forefront of a neural network renaissance that has seen rapid progression since 2012. In this review, which focuses on the application of CNNs to image classification tasks, we cover their development, from their predecessors up to recent state-of-the-art deep learning systems. Along the way, we analyze 1 their early successes, 2 their role in the deep learning renaissance, 3 selected symbolic works that have contributed to their recent popularity, and 4 several improvement attempts by reviewing contributions and challenges of over 300 publications. We also introduce some of their current trends and remaining challenges.

2,366 citations

Journal ArticleDOI
TL;DR: A comprehensive survey of the recent achievements in this field brought about by deep learning techniques, covering many aspects of generic object detection: detection frameworks, object feature representation, object proposal generation, context modeling, training strategies, and evaluation metrics.
Abstract: Object detection, one of the most fundamental and challenging problems in computer vision, seeks to locate object instances from a large number of predefined categories in natural images. Deep learning techniques have emerged as a powerful strategy for learning feature representations directly from data and have led to remarkable breakthroughs in the field of generic object detection. Given this period of rapid evolution, the goal of this paper is to provide a comprehensive survey of the recent achievements in this field brought about by deep learning techniques. More than 300 research contributions are included in this survey, covering many aspects of generic object detection: detection frameworks, object feature representation, object proposal generation, context modeling, training strategies, and evaluation metrics. We finish the survey by identifying promising directions for future research.

1,897 citations