scispace - formally typeset
Author

Yi Liu

Other affiliations: Chinese Academy of Sciences
Bio: Yi Liu is an academic researcher from Peking University. The author has contributed to research in topics: Computer science & Convolutional neural network. The author has an h-index of 10 and has co-authored 25 publications receiving 948 citations. Previous affiliations of Yi Liu include the Chinese Academy of Sciences.

Papers
Proceedings ArticleDOI
23 Jun 2008
TL;DR: A novel human detection system for personal albums based on the LBP (local binary pattern) descriptor is developed, and carefully designed experiments demonstrate the superiority of LBP over other traditional features for human detection.
Abstract: In recent years, local-pattern-based object detection and recognition have attracted increasing interest in the computer vision research community. However, to the best of our knowledge no previous work has focused on utilizing local patterns for the task of human detection. In this paper we develop a novel human detection system for personal albums based on the LBP (local binary pattern) descriptor. First, we review the existing gradient-based local features widely used in human detection, analyze their limitations, and argue that LBP is more discriminative. Second, the original LBP descriptor does not suit the human detection problem well due to its high complexity and lack of semantic consistency, so we propose two variants of LBP: Semantic-LBP and Fourier-LBP. Carefully designed experiments demonstrate the superiority of LBP over other traditional features for human detection. In particular, we adopt a random ensemble algorithm for a fairer comparison between different descriptors. All experiments are conducted on the INRIA human database.
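As a rough sketch of the kind of feature this line of work builds on (not the paper's Semantic-LBP or Fourier-LBP variants), the snippet below computes a block-wise LBP histogram descriptor for a grayscale detection window using scikit-image; the cell size and uniform-pattern binning are illustrative assumptions.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_window_descriptor(window, P=8, R=1, cell=(16, 16)):
    """Concatenate per-cell LBP histograms for one grayscale detection window."""
    lbp = local_binary_pattern(window, P, R, method="uniform")
    n_bins = P + 2  # uniform patterns plus one catch-all non-uniform bin
    h, w = window.shape
    feats = []
    for y in range(0, h - cell[0] + 1, cell[0]):
        for x in range(0, w - cell[1] + 1, cell[1]):
            patch = lbp[y:y + cell[0], x:x + cell[1]]
            hist, _ = np.histogram(patch, bins=n_bins, range=(0, n_bins), density=True)
            feats.append(hist)
    return np.concatenate(feats)

# Example: a 128x64 pedestrian-sized window yields one fixed-length vector
# that could be fed to a linear classifier.
window = (np.random.rand(128, 64) * 255).astype(np.uint8)
descriptor = lbp_window_descriptor(window)
```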

319 citations

Journal ArticleDOI
08 Dec 2017-Science
TL;DR: This work introduces the recursive cortical network (RCN), a probabilistic generative model for vision in which message-passing-based inference handles recognition, segmentation, and reasoning in a unified manner, and which outperforms deep neural networks on a variety of benchmarks while being orders of magnitude more data-efficient.
Abstract: INTRODUCTION: Compositionality, generalization, and learning from a few examples are among the hallmarks of human intelligence. CAPTCHAs (Completely Automated Public Turing tests to tell Computers and Humans Apart), images used by websites to block automated interactions, are examples of problems that are easy for people but difficult for computers. CAPTCHAs add clutter and crowd letters together to create a chicken-and-egg problem for algorithmic classifiers: the classifiers work well for characters that have been segmented out, but segmenting requires an understanding of the characters, which may be rendered in a combinatorial number of ways. CAPTCHAs also demonstrate human data efficiency: a recent deep-learning approach for parsing one specific CAPTCHA style required millions of labeled examples, whereas humans solve new styles without explicit training. By drawing inspiration from systems neuroscience, we introduce the recursive cortical network (RCN), a probabilistic generative model for vision in which message-passing-based inference handles recognition, segmentation, and reasoning in a unified manner. RCN learns with very little training data and fundamentally breaks the defense of modern text-based CAPTCHAs by generatively segmenting characters. In addition, RCN outperforms deep neural networks on a variety of benchmarks while being orders of magnitude more data-efficient.
RATIONALE: Modern deep neural networks resemble the feed-forward hierarchy of simple and complex cells in the neocortex. Neuroscience has postulated computational roles for lateral and feedback connections, segregated contour and surface representations, and border-ownership coding observed in the visual cortex, yet these features are not commonly used by deep neural nets. We hypothesized that systematically incorporating these findings into a new model could lead to higher data efficiency and generalization. Structured probabilistic models provide a natural framework for incorporating prior knowledge, and belief propagation (BP) is an inference algorithm that can match the cortical computational speed. The representational choices in RCN were determined by investigating the computational underpinnings of neuroscience data under the constraint that accurate inference should be possible using BP.
RESULTS: RCN was effective in breaking a wide variety of CAPTCHAs with very little training data and without using CAPTCHA-specific heuristics. By comparison, a convolutional neural network required a 50,000-fold larger training set and was less robust to perturbations of the input. Similar results are shown on one- and few-shot MNIST (modified National Institute of Standards and Technology handwritten digit data set) classification, where RCN was significantly more robust to clutter introduced during testing. As a generative model, RCN outperformed neural network models when tested on noisy and cluttered examples and generated realistic samples from one-shot training of handwritten characters. RCN also proved to be effective at an occlusion reasoning task that required identifying the precise relationships between characters at multiple points of overlap. On a standard benchmark for parsing text in natural scenes, RCN outperformed state-of-the-art deep-learning methods while requiring 300-fold less training data.
CONCLUSION: Our work demonstrates that structured probabilistic models that incorporate inductive biases from neuroscience can lead to robust, generalizable machine learning models that learn with high data efficiency. In addition, our model's effectiveness in breaking text-based CAPTCHAs with very little training data suggests that websites should seek more robust mechanisms for detecting automated interactions.
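RCN itself is an elaborate generative model; purely as a toy, hedged illustration of the belief-propagation (sum-product) inference the abstract refers to, the sketch below computes exact marginals on a small chain-structured model. It is not the paper's architecture.

```python
import numpy as np

def chain_bp_marginals(unary, pairwise):
    """Sum-product message passing on a chain: unary (N, K), shared pairwise (K, K)."""
    N, K = unary.shape
    fwd = np.ones((N, K))  # messages flowing left-to-right
    bwd = np.ones((N, K))  # messages flowing right-to-left
    for i in range(1, N):
        fwd[i] = (unary[i - 1] * fwd[i - 1]) @ pairwise
        fwd[i] /= fwd[i].sum()  # normalize for numerical stability
    for i in range(N - 2, -1, -1):
        bwd[i] = pairwise @ (unary[i + 1] * bwd[i + 1])
        bwd[i] /= bwd[i].sum()
    beliefs = unary * fwd * bwd
    return beliefs / beliefs.sum(axis=1, keepdims=True)

# Three binary variables whose pairwise potential prefers agreement.
unary = np.array([[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]])
pairwise = np.array([[0.8, 0.2], [0.2, 0.8]])
print(chain_bp_marginals(unary, pairwise))
```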

239 citations

Proceedings ArticleDOI
17 Jun 2006
TL;DR: An efficient new method for online partial shape retrieval over large 3D shape repositories, using a Monte Carlo sampling strategy and a partial shape dissimilarity measure to rank shapes according to their distances to the input query.
Abstract: This paper develops an efficient new method for 3D partial shape retrieval. First, a Monte Carlo sampling strategy is employed to extract local shape signatures from each 3D model. After vector quantization, these features are represented using a bag-of-words model. The main contributions of this paper are threefold: 1) a partial shape dissimilarity measure is proposed to rank shapes according to their distances to the input query, without using any time-consuming alignment procedure; 2) by applying a probabilistic text analysis technique, a highly compact representation, "Shape Topics", and accompanying algorithms are developed for efficient 3D partial shape retrieval, and the mapping from "Shape Topics" to object categories is established using multi-class SVMs; and 3) a method for evaluating the performance of partial shape retrieval is proposed and tested. To the best of our knowledge, very few existing methods are able to perform online partial shape retrieval well for large 3D shape repositories. Our experimental results validate the efficacy and effectiveness of our novel approach.
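The paper's "Shape Topics" representation and dissimilarity measure are more involved, but the vector-quantization / bag-of-words step it builds on can be sketched as below; the random vectors here are only stand-ins for the Monte Carlo-sampled local shape signatures.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(all_signatures, n_words=64, seed=0):
    """Quantize pooled local signatures from a training corpus into 'shape words'."""
    return KMeans(n_clusters=n_words, n_init=10, random_state=seed).fit(all_signatures)

def bag_of_words(signatures, codebook):
    """Normalized histogram of shape-word occurrences for a single 3D model."""
    words = codebook.predict(signatures)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / hist.sum()

# Toy usage: random vectors stand in for sampled local shape signatures.
rng = np.random.default_rng(0)
corpus = rng.normal(size=(2000, 32))   # signatures pooled over many models
codebook = build_codebook(corpus)
query_hist = bag_of_words(rng.normal(size=(300, 32)), codebook)
```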

124 citations

Proceedings ArticleDOI
01 Jul 2017
TL;DR: A method for augmenting and training CNNs so that their learned features are compositional, which encourages networks to form representations that disentangle objects from their surroundings and from each other, thereby promoting better generalization.
Abstract: Convolutional neural networks (CNNs) have shown great success in computer vision, approaching human-level performance when trained for specific tasks via application-specific loss functions. In this paper, we propose a method for augmenting and training CNNs so that their learned features are compositional. It encourages networks to form representations that disentangle objects from their surroundings and from each other, thereby promoting better generalization. Our method is agnostic to the specific details of the underlying CNN to which it is applied and can in principle be used with any CNN. As we show in our experiments, the learned representations lead to feature activations that are more localized and improve performance over non-compositional baselines in object recognition tasks.
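The abstract does not spell out the augmentation itself; the sketch below is only one generic, assumed illustration of pushing activations to stay on the object, via a toy regularizer that penalizes feature-map energy outside a supplied object mask. It is not the paper's actual loss.

```python
import torch
import torch.nn.functional as F

def outside_mask_penalty(feature_map, object_mask):
    """feature_map: (B, C, H, W); object_mask: (B, 1, h, w) with values in {0, 1}."""
    # Resize the mask to the feature resolution and penalize off-object activation mass.
    mask = F.interpolate(object_mask.float(), size=feature_map.shape[-2:], mode="nearest")
    return (feature_map.abs() * (1.0 - mask)).mean()

# Added to the task loss with a small weight, e.g.
#   loss = task_loss + 0.1 * outside_mask_penalty(features, masks)
```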

61 citations

Proceedings ArticleDOI
01 Jul 2018
TL;DR: In this paper, a multi-sentiment-resource enhanced attention network (MEAN) was proposed to integrate three kinds of sentiment linguistic knowledge (i.e., sentiment lexicon, negation words, and intensity words) into the deep neural network via attention mechanisms.
Abstract: Deep learning approaches for sentiment classification do not fully exploit sentiment linguistic knowledge. In this paper, we propose a Multi-sentiment-resource Enhanced Attention Network (MEAN) to alleviate the problem by integrating three kinds of sentiment linguistic knowledge (i.e., sentiment lexicon, negation words, and intensity words) into the deep neural network via attention mechanisms. By using various types of sentiment resources, MEAN utilizes sentiment-relevant information from different representation sub-spaces, which makes it more effective at capturing the overall semantics of the sentiment, negation, and intensity words for sentiment prediction. The experimental results demonstrate that MEAN has robust superiority over strong competitors.
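A minimal, hedged sketch of the attention idea (not the exact MEAN architecture): embeddings of words from one sentiment resource, e.g. a lexicon, are weighted by their relevance to a sentence representation and pooled into a resource-aware context vector.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def resource_attention(sentence_vec, resource_embeds):
    """sentence_vec: (d,); resource_embeds: (n_words, d), e.g. sentiment-lexicon entries."""
    scores = resource_embeds @ sentence_vec   # dot-product relevance of each resource word
    weights = softmax(scores)
    return weights @ resource_embeds          # attention-weighted summary vector

# Toy usage: 5 lexicon words with 64-dimensional embeddings.
rng = np.random.default_rng(0)
context = resource_attention(rng.normal(size=64), rng.normal(size=(5, 64)))
```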

58 citations


Cited by
Proceedings ArticleDOI
01 Sep 2009
TL;DR: By combining Histograms of Oriented Gradients (HOG) and Local Binary Pattern (LBP) as the feature set, this work proposes a novel human detection approach capable of handling partial occlusion and achieves the best human detection performance on the INRIA dataset.
Abstract: By combining Histograms of Oriented Gradients (HOG) and Local Binary Pattern (LBP) as the feature set, we propose a novel human detection approach capable of handling partial occlusion. Two kinds of detectors, i.e., a global detector for whole scanning windows and part detectors for local regions, are learned from the training data using a linear SVM. For each ambiguous scanning window, we construct an occlusion likelihood map by using the response of each block of the HOG feature to the global detector. The occlusion likelihood map is then segmented by the mean-shift approach. The segmented portion of the window with a majority of negative responses is inferred as an occluded region. If partial occlusion is indicated with high likelihood in a certain scanning window, part detectors are applied on the unoccluded regions to achieve the final classification on the current scanning window. With the help of the augmented HOG-LBP feature and the global-part occlusion handling method, we achieve detection rates of 91.3% at FPPW = 10^-6, 94.7% at FPPW = 10^-5, and 97.9% at FPPW = 10^-4 on the INRIA dataset, which, to the best of our knowledge, is the best human detection performance on the INRIA dataset. The global-part occlusion handling method is further validated using synthesized occlusion data constructed from the INRIA and PASCAL datasets.
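A hedged sketch of the augmented HOG-LBP feature with a linear SVM, using scikit-image and scikit-learn; the paper's occlusion-likelihood map, mean-shift segmentation, and part detectors are omitted, and the block/cell parameters are illustrative assumptions.

```python
import numpy as np
from skimage.feature import hog, local_binary_pattern
from sklearn.svm import LinearSVC

def hog_lbp_feature(window, P=8, R=1):
    """Concatenate HOG and a global uniform-LBP histogram for one grayscale window."""
    hog_vec = hog(window, orientations=9, pixels_per_cell=(8, 8),
                  cells_per_block=(2, 2), feature_vector=True)
    lbp = local_binary_pattern(window, P, R, method="uniform")
    lbp_hist, _ = np.histogram(lbp, bins=P + 2, range=(0, P + 2), density=True)
    return np.concatenate([hog_vec, lbp_hist])

# Training on labelled 128x64 windows (positives = pedestrians), e.g.:
#   X = np.stack([hog_lbp_feature(w) for w in windows])
#   clf = LinearSVC(C=0.01).fit(X, labels)
```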

1,838 citations

Posted Content
TL;DR: This work discusses core RL elements, including value function, in particular, Deep Q-Network (DQN), policy, reward, model, planning, and exploration, and important mechanisms for RL, including attention and memory, unsupervised learning, transfer learning, multi-agent RL, hierarchical RL, and learning to learn.
Abstract: We give an overview of recent exciting achievements of deep reinforcement learning (RL). We discuss six core elements, six important mechanisms, and twelve applications. We start with the background of machine learning, deep learning, and reinforcement learning. Next, we discuss core RL elements, including value function, in particular Deep Q-Network (DQN), policy, reward, model, planning, and exploration. After that, we discuss important mechanisms for RL, including attention and memory, unsupervised learning, transfer learning, multi-agent RL, hierarchical RL, and learning to learn. Then we discuss various applications of RL, including games, in particular AlphaGo, robotics, natural language processing (including dialogue systems, machine translation, and text generation), computer vision, neural architecture design, business management, finance, healthcare, Industry 4.0, smart grid, intelligent transportation systems, and computer systems. We mention topics not yet reviewed and list a collection of RL resources. After presenting a brief summary, we close with discussions. Please see Deep Reinforcement Learning, arXiv:1810.06339, for a significant update.
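The survey covers DQN among the core elements; as a minimal illustration of the underlying idea, the sketch below is a tabular Q-learning update (DQN replaces the table with a neural network plus experience replay and a target network).

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One temporal-difference update of the action-value table Q[s, a]."""
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

# Toy usage: 4 states, 2 actions, one observed transition.
Q = np.zeros((4, 2))
Q = q_learning_update(Q, s=0, a=1, r=1.0, s_next=2)
```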

935 citations

Journal ArticleDOI
TL;DR: This article uses multiscale diffusion heat kernels as “geometric words” to construct compact and informative shape descriptors by means of the “bag of features” approach, and shows that shapes can be efficiently represented as binary codes.
Abstract: The computer vision and pattern recognition communities have recently witnessed a surge of feature-based methods in object recognition and image retrieval applications. These methods allow representing images as collections of “visual words” and treat them using text search approaches following the “bag of features” paradigm. In this article, we explore analogous approaches in the 3D world applied to the problem of nonrigid shape retrieval in large databases. Using multiscale diffusion heat kernels as “geometric words,” we construct compact and informative shape descriptors by means of the “bag of features” approach. We also show that considering pairs of “geometric words” (“geometric expressions”) allows creating spatially sensitive bags of features with better discriminative power. Finally, adopting metric learning approaches, we show that shapes can be efficiently represented as binary codes. Our approach achieves state-of-the-art results on the SHREC 2010 large-scale shape retrieval benchmark.
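A hedged sketch of the "geometric words" ingredient: heat kernel signatures computed from a Laplacian eigendecomposition, which would then be vector-quantized into a bag-of-features histogram (quantization and metric learning omitted). The random matrix below is only a stand-in for a real mesh Laplacian.

```python
import numpy as np

def heat_kernel_signature(eigenvalues, eigenvectors, times):
    """HKS(x, t) = sum_i exp(-lambda_i * t) * phi_i(x)^2 for each vertex x and scale t."""
    # eigenvalues: (k,); eigenvectors: (n_vertices, k); times: (m,)
    decay = np.exp(-np.outer(times, eigenvalues))   # (m, k)
    return (eigenvectors ** 2) @ decay.T            # (n_vertices, m)

# Toy usage: a random symmetric matrix stands in for a mesh Laplacian
# (a real Laplacian is positive semi-definite, hence the abs()).
L = np.random.rand(50, 50); L = (L + L.T) / 2
evals, evecs = np.linalg.eigh(L)
hks = heat_kernel_signature(np.abs(evals)[:10], evecs[:, :10],
                            times=np.geomspace(0.1, 10.0, 6))
```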

894 citations

Journal ArticleDOI
TL;DR: A survey of content-based 3D shape retrieval methods, in which the authors evaluate methods with respect to several requirements of content-based shape retrieval, such as shape representation requirements, properties of dissimilarity measures, efficiency, discrimination abilities, ability to perform partial matching, robustness, and necessity of pose normalization.
Abstract: Recent developments in techniques for modeling, digitizing and visualizing 3D shapes have led to an explosion in the number of available 3D models on the Internet and in domain-specific databases. This has led to the development of 3D shape retrieval systems that, given a query object, retrieve similar 3D objects. For visualization, 3D shapes are often represented as a surface, in particular polygonal meshes, for example in VRML format. Often these models contain holes, have intersecting polygons, are not manifold, and do not enclose a volume unambiguously. In contrast, 3D volume models, such as solid models produced by CAD systems or voxel models, enclose a volume properly. This paper surveys the literature on methods for content-based 3D retrieval, taking into account applicability to surface models as well as to volume models. The methods are evaluated with respect to several requirements of content-based 3D shape retrieval, such as: (1) shape representation requirements, (2) properties of dissimilarity measures, (3) efficiency, (4) discrimination abilities, (5) ability to perform partial matching, (6) robustness, and (7) necessity of pose normalization. Finally, the advantages and limitations of the several approaches to content-based 3D shape retrieval are discussed.

857 citations

Posted Content
TL;DR: Ten concerns for deep learning are presented, and it is suggested that deep learning must be supplemented by other techniques if the field is to reach artificial general intelligence.
Abstract: Although deep learning has historical roots going back decades, neither the term "deep learning" nor the approach was popular just over five years ago, when the field was reignited by papers such as Krizhevsky, Sutskever and Hinton's now classic (2012) deep network model of Imagenet. What has the field discovered in the five subsequent years? Against a background of considerable progress in areas such as speech recognition, image recognition, and game playing, and considerable enthusiasm in the popular press, I present ten concerns for deep learning, and suggest that deep learning must be supplemented by other techniques if we are to reach artificial general intelligence.

779 citations