Showing papers by "Yangqing Jia" published in 2014

Posted Content

Caffe: Convolutional Architecture for Fast Feature Embedding

Yangqing Jia¹, Evan Shelhamer², Jeff Donahue², Sergey Karayev², Jonathan Long², Ross Girshick², Sergio Guadarrama², Trevor Darrell²

Google¹, University of California, Berkeley²

20 Jun 2014-arXiv: Computer Vision and Pattern Recognition

TL;DR: Caffe is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.

Abstract: Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models. The framework is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures. Caffe fits industry and internet-scale media needs by CUDA GPU computation, processing over 40 million images a day on a single K40 or Titan GPU ($\approx$ 2.5 ms per image). By separating model representation from actual implementation, Caffe allows experimentation and seamless switching among platforms for ease of development and deployment from prototyping machines to cloud environments. Caffe is maintained and developed by the Berkeley Vision and Learning Center (BVLC) with the help of an active community of contributors on GitHub. It powers ongoing research projects, large-scale industrial applications, and startup prototypes in vision, speech, and multimedia.
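
As a rough sanity check on the throughput figure in the abstract above, the daily image count and the per-image time can be converted into one another. This is a back-of-the-envelope sketch, not part of Caffe; the function names `ms_per_image` and `images_per_day` are hypothetical and chosen only for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in one day


def ms_per_image(images_per_day_value: float) -> float:
    """Average milliseconds available per image at a given daily throughput."""
    return SECONDS_PER_DAY / images_per_day_value * 1000.0


def images_per_day(ms_per_image_value: float) -> float:
    """Images processed in one day at a given average time per image."""
    return SECONDS_PER_DAY * 1000.0 / ms_per_image_value


# 40 million images per day leaves a little over 2 ms per image.
print(round(ms_per_image(40e6), 2))         # 2.16
# Exactly 2.5 ms per image corresponds to about 34.6 million images per day.
print(round(images_per_day(2.5) / 1e6, 2))  # 34.56
```

The two quoted numbers agree only approximately, so the per-image time is best read as a rounded figure.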

12,531 citations

Proceedings Article

Caffe: Convolutional Architecture for Fast Feature Embedding

Yangqing Jia¹, Evan Shelhamer², Jeff Donahue², Sergey Karayev², Jonathan Long², Ross Girshick², Sergio Guadarrama², Trevor Darrell²

Google¹, University of California, Berkeley²

03 Nov 2014

TL;DR: Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.

Abstract: Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models. The framework is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures. Caffe fits industry and internet-scale media needs by CUDA GPU computation, processing over 40 million images a day on a single K40 or Titan GPU (approx 2 ms per image). By separating model representation from actual implementation, Caffe allows experimentation and seamless switching among platforms for ease of development and deployment from prototyping machines to cloud environments. Caffe is maintained and developed by the Berkeley Vision and Learning Center (BVLC) with the help of an active community of contributors on GitHub. It powers ongoing research projects, large-scale industrial applications, and startup prototypes in vision, speech, and multimedia.

10,161 citations

Proceedings Article

DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition

Jeff Donahue¹, Yangqing Jia¹, Oriol Vinyals¹, Judy Hoffman¹, Ning Zhang¹, Eric Tzeng¹, Trevor Darrell¹

University of California, Berkeley¹

21 Jun 2014

TL;DR: DeCAF is an open-source implementation of deep convolutional activation features, released with all associated network parameters so that vision researchers can experiment with deep representations across a range of visual concept learning paradigms.

Abstract: We evaluate whether features extracted from the activation of a deep convolutional network trained in a fully supervised fashion on a large, fixed set of object recognition tasks can be repurposed to novel generic tasks. Our generic tasks may differ significantly from the originally trained tasks and there may be insufficient labeled or unlabeled data to conventionally train or adapt a deep architecture to the new tasks. We investigate and visualize the semantic clustering of deep convolutional features with respect to a variety of such tasks, including scene recognition, domain adaptation, and fine-grained recognition challenges. We compare the efficacy of relying on various network levels to define a fixed feature, and report novel results that significantly outperform the state-of-the-art on several important vision challenges. We are releasing DeCAF, an open-source implementation of these deep convolutional activation features, along with all associated network parameters, to enable vision researchers to conduct experimentation with deep representations across a range of visual concept learning paradigms.

3,760 citations

Posted Content

Going Deeper with Convolutions

Christian Szegedy¹, Wei Liu², Yangqing Jia¹, Pierre Sermanet¹, Scott Reed³, Dragomir Anguelov¹, Dumitru Erhan¹, Vincent Vanhoucke¹, Andrew Rabinovich

Google¹, University of North Carolina at Chapel Hill², University of Michigan³

17 Sep 2014-arXiv: Computer Vision and Pattern Recognition

TL;DR: A deep convolutional neural network architecture codenamed Inception is proposed that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).

Abstract: We propose a deep convolutional neural network architecture codenamed "Inception", which was responsible for setting the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC 2014). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. This was achieved by a carefully crafted design that allows for increasing the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC 2014 is called GoogLeNet, a 22 layers deep network, the quality of which is assessed in the context of classification and detection.

2,567 citations

Book Chapter

Large-Scale Object Classification Using Label Relation Graphs

Jia Deng¹,², Nan Ding², Yangqing Jia², Andrea Frome², Kevin Murphy², Samy Bengio², Yuan Li², Hartmut Neven², Hartwig Adam²

University of Michigan¹, Google²

06 Sep 2014

TL;DR: A new model that encodes flexible relations between labels is developed, and a probabilistic classification model based on HEX graphs is proposed that can significantly improve object classification by exploiting the label relations.

Abstract: In this paper we study how to perform object classification in a principled way that exploits the rich structure of real world labels. We develop a new model that allows encoding of flexible relations between labels. We introduce Hierarchy and Exclusion (HEX) graphs, a new formalism that captures semantic relations between any two labels applied to the same object: mutual exclusion, overlap and subsumption. We then provide rigorous theoretical analysis that illustrates properties of HEX graphs such as consistency, equivalence, and computational implications of the graph structure. Next, we propose a probabilistic classification model based on HEX graphs and show that it enjoys a number of desirable properties. Finally, we evaluate our method using a large-scale benchmark. Empirical results demonstrate that our model can significantly improve object classification by exploiting the label relations.
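
To make the three label relations in the abstract above concrete, here is a minimal sketch of checking which binary label assignments are consistent with a small HEX graph. It assumes the usual reading of the formalism: a hierarchy edge (parent, child) forbids the child being on while the parent is off, an exclusion edge forbids both labels being on, and labels with no edge between them may overlap freely. The function names, the toy labels, and the brute-force enumeration are illustrative assumptions, not the paper's inference algorithm.

```python
from itertools import product


def is_legal(assignment, hierarchy_edges, exclusion_edges):
    """Return True if a 0/1 label assignment respects every edge of the graph."""
    # Subsumption: a child label cannot be on while its parent is off.
    for parent, child in hierarchy_edges:
        if assignment[child] == 1 and assignment[parent] == 0:
            return False
    # Mutual exclusion: the two labels cannot both be on.
    for a, b in exclusion_edges:
        if assignment[a] == 1 and assignment[b] == 1:
            return False
    return True


def legal_assignments(labels, hierarchy_edges, exclusion_edges):
    """Enumerate all legal assignments by brute force (fine for toy graphs only)."""
    legal = []
    for bits in product([0, 1], repeat=len(labels)):
        assignment = dict(zip(labels, bits))
        if is_legal(assignment, hierarchy_edges, exclusion_edges):
            legal.append(assignment)
    return legal


# Hypothetical toy graph: "animal" subsumes "dog" and "cat", "dog" subsumes
# "puppy", and "dog" and "cat" are mutually exclusive.
labels = ["animal", "dog", "cat", "puppy"]
hierarchy = [("animal", "dog"), ("animal", "cat"), ("dog", "puppy")]
exclusion = [("dog", "cat")]

print(len(legal_assignments(labels, hierarchy, exclusion)))  # 5
```

Of the 16 possible assignments over four labels, only 5 survive the constraints, which illustrates how the graph structure restricts the label space a classifier has to consider.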

448 citations

Proceedings Article

Deep Convolutional Ranking for Multilabel Image Annotation

Yunchao Gong¹, Yangqing Jia², Thomas Leung², Alexander Toshev², Sergey Ioffe²

University of North Carolina at Chapel Hill¹, Google²

01 Jan 2014

TL;DR: It is shown that a significant performance gain can be obtained by combining convolutional architectures with approximate top-$k$ ranking objectives, as they naturally fit the multilabel tagging problem.

Abstract: Multilabel image annotation is one of the most important challenges in computer vision with many real-world applications. While existing work usually uses conventional visual features for multilabel annotation, features based on Deep Neural Networks have shown potential to significantly boost performance. In this work, we propose to leverage the advantage of such features and analyze key components that lead to better performance. Specifically, we show that a significant performance gain could be obtained by combining convolutional architectures with approximate top-$k$ ranking objectives, as they naturally fit the multilabel tagging problem. Our experiments on the NUS-WIDE dataset outperform the conventional visual features by about 10%, obtaining the best reported performance in the literature.

129 citations

Patent

Ranking approach to train deep neural nets for multilabel image annotation

Yunchao Gong¹, King Hong Thomas Leung¹, Alexander Toshev¹, Sergey Ioffe¹, Yangqing Jia¹

Google¹

28 Jul 2014

TL;DR: A ranking approach to training deep neural networks for multilabel image annotation is presented, in which the network error compares the scores of positive and negative labels for each training example and accounts for the semantic distance between them.

Abstract: Systems and techniques are provided for a ranking approach to train deep neural nets for multilabel image annotation. Label scores may be received for labels determined by a neural network for training examples. Each label may be a positive label or a negative label for the training example. An error of the neural network may be determined based on a comparison, for each of the training examples, of the label scores for positive labels and negative labels for the training example and a semantic distance between each positive label and each negative label for the training example. Updated weights may be determined for the neural network based on a gradient of the determined error of the neural network. The updated weights may be applied to the neural network to train the neural network.
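
The error described in the abstract above compares positive and negative label scores and takes their semantic distance into account. One simple way to picture such an error is a pairwise hinge loss whose margin grows with the semantic distance. The sketch below is only an illustration under that assumption: the abstract does not state the functional form, and the function name, the `base_margin` parameter, the scores, and the distances are all hypothetical.

```python
def pairwise_ranking_error(scores, positives, negatives, distance, base_margin=1.0):
    """Sum of hinge penalties over all (positive, negative) label pairs.

    ASSUMPTION: the required margin between a positive and a negative label is
    proportional to their semantic distance; this form is illustrative only.
    """
    total = 0.0
    for p in positives:
        for n in negatives:
            margin = base_margin * distance[(p, n)]
            # Penalize the pair when the positive label does not outscore
            # the negative label by at least the required margin.
            total += max(0.0, margin - (scores[p] - scores[n]))
    return total


# Hypothetical scores for one training example whose only positive label is "dog".
scores = {"dog": 1.0, "cat": 1.5, "car": -1.0}
distance = {("dog", "cat"): 0.5, ("dog", "car"): 2.0}

print(pairwise_ranking_error(scores, ["dog"], ["cat", "car"], distance))  # 1.0
```

Here the pair (dog, cat) contributes 1.0 because the negative label outscores the positive one, while (dog, car) contributes nothing because its margin of 2.0 is already met. In training, the gradient of such an error with respect to the network weights would drive the weight update the abstract describes.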

18 citations

Journal Article

Regularized Tree Partitioning and Its Application to Unsupervised Image Segmentation

Jingdong Wang¹, Huaizu Jiang², Yangqing Jia³, Xian-Sheng Hua¹, Changshui Zhang⁴, Long Quan⁵

Microsoft¹, Xi'an Jiaotong University², University of California, Berkeley³, Tsinghua University⁴, Hong Kong University of Science and Technology⁵

01 Apr 2014-IEEE Transactions on Image Processing

TL;DR: To demonstrate the effectiveness of the proposed regularized tree partitioning approaches, their application to image segmentation over the Berkeley image segmentation data set is shown, and qualitative and quantitative comparisons with state-of-the-art methods are presented.

Abstract: In this paper, we propose regularized tree partitioning approaches. We study normalized cut (NCut) and average cut (ACut) criteria over a tree, forming two approaches: 1) normalized tree partitioning (NTP) and 2) average tree partitioning (ATP). We give the properties that result in an efficient algorithm for NTP and ATP. In addition, we present the relations between the solutions of NTP and ATP over the maximum weight spanning tree of a graph and NCut and ACut over this graph. To demonstrate the effectiveness of the proposed approaches, we show their application to image segmentation over the Berkeley image segmentation data set and present qualitative and quantitative comparisons with state-of-the-art methods.

17 citations