scispace - formally typeset

Showing papers on "Conditional random field" published in 2014


Posted Content
TL;DR: This work brings together methods from DCNNs and probabilistic graphical models for addressing the task of pixel-level classification by combining the responses at the final DCNN layer with a fully connected Conditional Random Field (CRF).
Abstract: Deep Convolutional Neural Networks (DCNNs) have recently shown state of the art performance in high level vision tasks, such as image classification and object detection. This work brings together methods from DCNNs and probabilistic graphical models for addressing the task of pixel-level classification (also called "semantic image segmentation"). We show that responses at the final layer of DCNNs are not sufficiently localized for accurate object segmentation. This is due to the very invariance properties that make DCNNs good for high level tasks. We overcome this poor localization property of deep networks by combining the responses at the final DCNN layer with a fully connected Conditional Random Field (CRF). Qualitatively, our "DeepLab" system is able to localize segment boundaries at a level of accuracy which is beyond previous methods. Quantitatively, our method sets a new state of the art on the PASCAL VOC-2012 semantic image segmentation task, reaching 71.6% IOU accuracy in the test set. We show how these results can be obtained efficiently: careful network re-purposing and a novel application of the 'hole' algorithm from the wavelet community allow dense computation of neural net responses at 8 frames per second on a modern GPU.

3,389 citations
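The 'hole' algorithm mentioned in the DeepLab abstract is what is commonly called atrous (dilated) convolution: the filter taps are spaced `rate` samples apart, so the receptive field grows without adding parameters or downsampling. Below is a minimal 1D NumPy sketch for illustration only, assuming 'valid' boundary handling; `atrous_conv1d` is a hypothetical name and this is not the authors' implementation.

```python
import numpy as np


def atrous_conv1d(x, w, rate):
    """1D 'valid' atrous (dilated) cross-correlation (illustrative sketch).

    y[i] = sum_k w[k] * x[i + rate * k]
    The effective receptive field is (len(w) - 1) * rate + 1 samples,
    while the number of filter weights stays len(w).
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    if rate < 1:
        raise ValueError("rate must be >= 1")
    span = (len(w) - 1) * rate + 1      # effective receptive field
    n_out = len(x) - span + 1           # number of 'valid' output positions
    if n_out <= 0:
        return np.zeros(0)
    y = np.zeros(n_out)
    for k, wk in enumerate(w):
        # Tap k reads the input shifted by rate * k samples.
        y += wk * x[k * rate : k * rate + n_out]
    return y
```

With `rate=1` this reduces to ordinary cross-correlation (`np.correlate(x, w, mode="valid")`); with `rate=r` it matches correlating against the kernel with `r - 1` zeros ("holes") inserted between taps. For example, `atrous_conv1d(np.arange(10), [1, 1, 1], 2)` sums `x[i] + x[i+2] + x[i+4]`, giving 6, 9, 12, 15, 18, 21.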


Posted Content
TL;DR: A deep structured learning scheme which learns the unary and pairwise potentials of continuous CRF in a unified deep CNN framework and can be used for depth estimation of general scenes without geometric priors or any extra injected information.
Abstract: We consider the problem of depth estimation from a single monocular image in this work. It is a challenging task as no reliable depth cues are available, e.g., stereo correspondences, motions, etc. Previous efforts have focused on exploiting geometric priors or additional sources of information, all using hand-crafted features. Recently, there is mounting evidence that features from deep convolutional neural networks (CNN) are setting new records for various vision applications. On the other hand, considering the continuous characteristic of the depth values, depth estimation can be naturally formulated as a continuous conditional random field (CRF) learning problem. Therefore, in this paper we present a deep convolutional neural field model for estimating depths from a single image, aiming to jointly explore the capacity of deep CNN and continuous CRF. Specifically, we propose a deep structured learning scheme which learns the unary and pairwise potentials of continuous CRF in a unified deep CNN framework. The proposed method can be used for depth estimation of general scenes without geometric priors or any extra injected information. In our case, the integral of the partition function can be calculated analytically, so we can exactly solve the log-likelihood optimization. Moreover, solving the MAP problem for predicting depths of a new image is highly efficient as closed-form solutions exist. We experimentally demonstrate that the proposed method outperforms state-of-the-art depth estimation methods on both indoor and outdoor scene datasets.

643 citations
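The abstract notes that MAP prediction has a closed form. As a sketch, assuming a quadratic energy of the form E(y) = sum_p (y_p - z_p)^2 + 1/2 sum_{p,q} R_pq (y_p - y_q)^2, with unary depth regressions z and symmetric non-negative pairwise weights R (the authors' exact parameterization may differ), setting the gradient to zero gives the linear system (I + D - R) y = z, where D is diagonal with D_pp = sum_q R_pq. The helper name below is hypothetical.

```python
import numpy as np


def continuous_crf_map(z, R):
    """MAP estimate for an assumed quadratic-energy continuous CRF (sketch).

    Assumed energy:
        E(y) = sum_p (y_p - z_p)^2 + 0.5 * sum_{p,q} R[p, q] * (y_p - y_q)^2
    z: unary predictions (e.g. CNN-regressed depths), shape (N,)
    R: symmetric, non-negative pairwise weights with zero diagonal, shape (N, N)
    Setting dE/dy = 0 gives (I + D - R) y = z with D[p, p] = sum_q R[p, q].
    """
    z = np.asarray(z, dtype=float)
    R = np.asarray(R, dtype=float)
    if not np.allclose(R, R.T):
        raise ValueError("R must be symmetric")
    D = np.diag(R.sum(axis=1))
    # D - R is a graph Laplacian (positive semi-definite), so A is positive definite.
    A = np.eye(len(z)) + D - R
    return np.linalg.solve(A, z)
```

For z = [0, 2] with a single edge of weight 1, the solution is [2/3, 4/3]: the pairwise term pulls neighbouring depths together while preserving their sum.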


Book
11 Nov 2014
TL;DR: This book summarizes recent advances in the field of automatic speech recognition with a focus on discriminative and hierarchical models and presents insights and theoretical foundations of a series of recent models such as conditional random field, semi-Markov and hidden conditional random field, deep neural network, deep belief network, and deep stacking models for sequential learning.
Abstract: This book summarizes recent advances in the field of automatic speech recognition with a focus on discriminative and hierarchical models. This will be the first automatic speech recognition book to include comprehensive coverage of recent developments such as conditional random field and deep learning techniques. It presents insights and theoretical foundations of a series of recent models such as conditional random field, semi-Markov and hidden conditional random field, deep neural network, deep belief network, and deep stacking models for sequential learning. It also discusses practical considerations of using these models in both acoustic and language modeling for continuous speech recognition.

520 citations


Journal Article
TL;DR: This work integrates a Random Forest classifier into a Conditional Random Field framework, a flexible approach for obtaining a reliable classification result even in complex urban scenes, and investigates the relevance of different features for the LiDAR points as well as for the interaction of neighbouring points.
Abstract: In this work we address the task of the contextual classification of an airborne LiDAR point cloud. For that purpose, we integrate a Random Forest classifier into a Conditional Random Field (CRF) framework. It is a flexible approach for obtaining a reliable classification result even in complex urban scenes. In this way, we benefit from the consideration of context on the one hand and from the opportunity to use a large amount of features on the other hand. Considering the interactions in our experiments increases the overall accuracy by 2%, though a larger improvement becomes apparent in the completeness and correctness of some of the seven classes discerned in our experiments. We compare the Random Forest approach to linear models for the computation of unary and pairwise potentials of the CRF, and investigate the relevance of different features for the LiDAR points as well as for the interaction of neighbouring points. In a second step, building objects are detected based on the classified point cloud. For that purpose, the CRF probabilities for the classes are plugged into a Markov Random Field as unary potentials, in which the pairwise potentials are based on a Potts model. The 2D binary building object masks are extracted and evaluated by the benchmark ISPRS Test Project on Urban Classification and 3D Building Reconstruction. The evaluation shows that the main buildings (larger than 50 m²) can be detected very reliably with a correctness larger than 96% and a completeness of 100%.

455 citations
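In the building-detection step, the CRF class probabilities serve as unary potentials of a Markov Random Field with Potts pairwise potentials. The abstract does not say which optimizer is used, so the sketch below substitutes iterated conditional modes (ICM), a simple greedy method; graph-cut based optimizers are a common, stronger choice for Potts energies. The function names, the edge-list graph representation, and the smoothness weight `beta` are illustrative assumptions.

```python
import numpy as np

EPS = 1e-12  # guards against log(0)


def potts_energy(labels, unary_prob, edges, beta):
    """E = sum_i -log p_i(l_i) + beta * sum_{(i,j) in edges} [l_i != l_j]."""
    labels = np.asarray(labels)
    p = np.asarray(unary_prob, dtype=float)
    data = -np.log(p[np.arange(len(labels)), labels] + EPS).sum()
    smooth = sum(beta for i, j in edges if labels[i] != labels[j])
    return data + smooth


def icm(unary_prob, edges, beta, n_iter=10):
    """Greedy ICM minimisation of the Potts energy above (hypothetical helper)."""
    p = np.asarray(unary_prob, dtype=float)
    n, k = p.shape
    cost = -np.log(p + EPS)                  # unary potentials from class probabilities
    nbrs = [[] for _ in range(n)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    labels = cost.argmin(axis=1)             # start from the per-node best unary label
    for _ in range(n_iter):
        changed = False
        for i in range(n):
            local = cost[i].copy()
            for j in nbrs[i]:
                # Potts penalty for disagreeing with each neighbour's current label
                local += beta * (np.arange(k) != labels[j])
            best = int(local.argmin())
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break                            # local minimum reached
    return labels
```

On a three-node chain whose middle node weakly prefers a different class than its two confident neighbours, a positive `beta` flips the middle label so that all three agree, which lowers the total energy.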


Posted Content
TL;DR: A new, large-scale, open dataset of materials in the wild, the Materials in Context Database (MINC), is introduced, and convolutional neural networks are trained for two tasks: classifying materials from patches, and simultaneous material recognition and segmentation in full images.
Abstract: Recognizing materials in real-world images is a challenging task. Real-world materials have rich surface texture, geometry, lighting conditions, and clutter, which combine to make the problem particularly difficult. In this paper, we introduce a new, large-scale, open dataset of materials in the wild, the Materials in Context Database (MINC), and combine this dataset with deep learning to achieve material recognition and segmentation of images in the wild. MINC is an order of magnitude larger than previous material databases, while being more diverse and well-sampled across its 23 categories. Using MINC, we train convolutional neural networks (CNNs) for two tasks: classifying materials from patches, and simultaneous material recognition and segmentation in full images. For patch-based classification on MINC we found that the best performing CNN architectures can achieve 85.2% mean class accuracy. We convert these trained CNN classifiers into an efficient fully convolutional framework combined with a fully connected conditional random field (CRF) to predict the material at every pixel in an image, achieving 73.1% mean class accuracy. Our experiments demonstrate that having a large, well-sampled dataset such as MINC is crucial for real-world material recognition and segmentation.

365 citations


Proceedings Article
Kaisheng Yao, Baolin Peng, Yu Zhang, Dong Yu, Geoffrey Zweig, Yangyang Shi
01 Dec 2014
TL;DR: This paper investigates using long short-term memory (LSTM) neural networks, which contain input, output and forgetting gates and are more advanced than simple RNN, for the word labeling task and proposes a regression model on top of the LSTM un-normalized scores to explicitly model output-label dependence.
Abstract: Neural network based approaches have recently produced record-setting performances in natural language understanding tasks such as word labeling. In the word labeling task, a tagger is used to assign a label to each word in an input sequence. Specifically, simple recurrent neural networks (RNNs) and convolutional neural networks (CNNs) have been shown to significantly outperform the previous state of the art, conditional random fields (CRFs). This paper investigates using long short-term memory (LSTM) neural networks, which contain input, output and forgetting gates and are more advanced than simple RNN, for the word labeling task. To explicitly model output-label dependence, we propose a regression model on top of the LSTM un-normalized scores. We also propose to apply deep LSTM to the task. We investigated the relative importance of each gate in the LSTM by setting other gates to a constant and only learning particular gates. Experiments on the ATIS dataset validated the effectiveness of the proposed models.

350 citations


Journal Article
TL;DR: A joint model of three core tasks in the entity analysis stack: coreference resolution, named entity recognition, and entity linking, which achieves state-of-the-art results for all three tasks.
Abstract: We present a joint model of three core tasks in the entity analysis stack: coreference resolution (within-document clustering), named entity recognition (coarse semantic typing), and entity linking (matching to Wikipedia entities). Our model is formally a structured conditional random field. Unary factors encode local features from strong baselines for each task. We then add binary and ternary factors to capture cross-task interactions, such as the constraint that coreferent mentions have the same semantic type. On the ACE 2005 and OntoNotes datasets, we achieve state-of-the-art results for all three tasks. Moreover, joint modeling improves performance on each task over strong independent baselines.

284 citations


Book Chapter
06 Sep 2014
TL;DR: Improved 3D structure and temporally consistent semantic segmentation for difficult, large scale, forward moving monocular image sequences is demonstrated.
Abstract: We present an approach for joint inference of 3D scene structure and semantic labeling for monocular video. Starting with a monocular image stream, our framework produces a 3D volumetric semantic + occupancy map, which is much more useful than a series of 2D semantic label images or a sparse point cloud produced by traditional semantic segmentation and Structure from Motion (SfM) pipelines respectively. We derive a Conditional Random Field (CRF) model defined in the 3D space that jointly infers the semantic category and occupancy for each voxel. Such joint inference in the 3D CRF paves the way for more informed priors and constraints, which is otherwise not possible if solved separately in their traditional frameworks. We make use of class-specific semantic cues that constrain the 3D structure in areas where multiview constraints are weak. Our model comprises higher-order factors, which help when the depth is unobservable. We also make use of class-specific semantic cues either to reduce the degree of such higher-order factors or to approximately model them with unaries if possible. We demonstrate improved 3D structure and temporally consistent semantic segmentation for difficult, large scale, forward moving monocular image sequences.

282 citations


Proceedings Article
01 Oct 2014
TL;DR: A new dataset is described, which contains Facebook posts and comments that exhibit code mixing between Bengali, English and Hindi, and it is found that the dictionary-based approach is surpassed by supervised classification and sequence labelling, and that it is important to take contextual clues into consideration.
Abstract: In social media communication, multilingual speakers often switch between languages, and, in such an environment, automatic language identification becomes both a necessary and challenging task. In this paper, we describe our work in progress on the problem of automatic language identification for the language of social media. We describe a new dataset that we are in the process of creating, which contains Facebook posts and comments that exhibit code mixing between Bengali, English and Hindi. We also present some preliminary word-level language identification experiments using this dataset. Different techniques are employed, including a simple unsupervised dictionary-based approach, supervised word-level classification with and without contextual clues, and sequence labelling using Conditional Random Fields. We find that the dictionary-based approach is surpassed by supervised classification and sequence labelling, and that it is important to take contextual clues into consideration.

273 citations


Proceedings Article
15 Apr 2014
TL;DR: MapCraft is presented, a novel, robust and responsive technique that is extremely computationally efficient, does not require training in different sites, and tracks well even when presented with very noisy sensor data, enabling a new era of location-aware applications to be developed.
Abstract: Indoor tracking and navigation is a fundamental need for pervasive and context-aware smartphone applications. Although indoor maps are becoming increasingly available, there is no practical and reliable indoor map matching solution available at present. We present MapCraft, a novel, robust and responsive technique that is extremely computationally efficient (running in under 10 ms on an Android smartphone), does not require training in different sites, and tracks well even when presented with very noisy sensor data. Key to our approach is expressing the tracking problem as a conditional random field (CRF), a technique which has had great success in areas such as natural language processing, but has yet to be considered for indoor tracking. Unlike directed graphical models like Hidden Markov Models, CRFs capture arbitrary constraints that express how well observations support state transitions, given map constraints. Extensive experiments in multiple sites show how MapCraft outperforms state-of-the-art approaches, demonstrating excellent tracking error and accurate reconstruction of tortuous trajectories with zero training effort. As proof of its robustness, we also demonstrate how it is able to accurately track the position of a user from accelerometer and magnetometer measurements only (i.e. gyro- and WiFi-free). We believe that such an energy-efficient approach will enable always-on background localisation, enabling a new era of location-aware applications to be developed.

157 citations


Journal Article
TL;DR: The authors' evaluation on the independent test set showed that most feature types were beneficial to Chinese NER systems, although the improvements were limited, and the system achieved the highest performance by combining word segmentation and section information, indicating that these two feature types complement each other.

Proceedings Article
Kaisheng Yao, Baolin Peng, Geoffrey Zweig, Dong Yu, Xiaolong Li, Feng Gao
04 May 2014
TL;DR: This paper shows that the performance of an RNN tagger can be significantly improved by incorporating elements of the CRF model; specifically, the explicit modeling of output-label dependencies with transition features, its global sequence-level objective function, and offline decoding.
Abstract: Recurrent neural networks (RNNs) have recently produced record setting performance in language modeling and word-labeling tasks. In the word-labeling task, the RNN is used analogously to the more traditional conditional random field (CRF) to assign a label to each word in an input sequence, and has been shown to significantly outperform CRFs. In contrast to CRFs, RNNs operate in an online fashion to assign labels as soon as a word is seen, rather than after seeing the whole word sequence. In this paper, we show that the performance of an RNN tagger can be significantly improved by incorporating elements of the CRF model; specifically, the explicit modeling of output-label dependencies with transition features, its global sequence-level objective function, and offline decoding. We term the resulting model a “recurrent conditional random field” and demonstrate its effectiveness on the ATIS travel domain dataset and a variety of web-search language understanding datasets.
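The recurrent CRF adds explicit transition features and whole-sequence (offline) decoding on top of per-word RNN scores. Decoding such a linear-chain model is typically done with the Viterbi algorithm. The sketch below assumes additive log-domain scores, with `emissions` standing in for the RNN outputs and `transitions` for learned label-transition scores; both names, and the function itself, are hypothetical rather than taken from the paper.

```python
import numpy as np


def viterbi(emissions, transitions):
    """Highest-scoring label sequence for a linear-chain model (sketch).

    emissions:   (T, K) per-position label scores (e.g. RNN outputs)
    transitions: (K, K) score for moving from label a (row) to label b (column)
    Returns (best_path, best_score).
    """
    emissions = np.asarray(emissions, dtype=float)
    transitions = np.asarray(transitions, dtype=float)
    T, K = emissions.shape
    score = emissions[0].copy()               # best score of paths ending in each label
    backptr = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        # cand[a, b]: best path ending in a at t-1, then a -> b, plus emission of b at t
        cand = score[:, None] + transitions + emissions[t][None, :]
        backptr[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):              # follow back-pointers to recover the path
        path.append(int(backptr[t, path[-1]]))
    path.reverse()
    return path, float(score.max())
```

With zero transition scores the decoder simply picks the best label at each position; a strongly negative off-diagonal transition score instead favours keeping the same label across the sequence, which is the kind of output-label dependence the transition features are meant to capture.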

Book Chapter
14 Sep 2014
TL;DR: This work presents a novel method for blood vessel segmentation in fundus images based on a discriminatively trained, fully connected conditional random field model with more expressive potentials, and employs recent results enabling extremely fast inference in a fully connected model.
Abstract: In this work, we present a novel method for blood vessel segmentation in fundus images based on a discriminatively trained, fully connected conditional random field model. Retinal image analysis is greatly aided by blood vessel segmentation, as the vessel structure may be considered either a key source of signal, e.g. in the diagnosis of diabetic retinopathy, or a nuisance, e.g. in the analysis of pigment epithelium or choroid related abnormalities. Blood vessel segmentation in fundus images has been considered extensively in the literature, but remains a challenge largely because the desired structures are thin and elongated, a setting in which standard segmentation priors such as a Potts model or total variation perform particularly poorly. In this work, we overcome this difficulty using a discriminatively trained conditional random field model with more expressive potentials. In particular, we employ recent results enabling extremely fast inference in a fully connected model. We find that this rich but computationally efficient model family, combined with principled discriminative training based on a structured output support vector machine, yields a fully automated system that achieves results statistically indistinguishable from an expert human annotator. Implementation details are available at http://pages.saclay.inria.fr/matthew.blaschko/projects/retina/.
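For intuition about inference in a fully connected CRF, here is a naive mean-field sketch. It assumes a single Gaussian kernel on per-pixel feature vectors with a Potts label compatibility, a common fully connected formulation; the abstract does not detail the authors' potentials. It also costs O(N²) per iteration, unlike the fast inference the abstract refers to. The function name and the parameters `w` and `theta` are hypothetical.

```python
import numpy as np


def _softmax(e):
    e = e - e.max(axis=1, keepdims=True)       # numerical stability
    p = np.exp(e)
    return p / p.sum(axis=1, keepdims=True)


def dense_crf_mean_field(unary, feats, w=1.0, theta=1.0, n_iter=5):
    """Naive mean-field for a fully connected CRF with a Potts model (sketch).

    unary: (N, L) unary energies, e.g. -log of classifier probabilities
    feats: (N, d) per-pixel features, e.g. position and colour
    Pairwise energy: w * exp(-||f_i - f_j||^2 / (2 theta^2)) * [l_i != l_j]
    """
    unary = np.asarray(unary, dtype=float)
    feats = np.asarray(feats, dtype=float)
    sq = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(axis=-1)
    K = w * np.exp(-sq / (2.0 * theta ** 2))   # dense N x N affinity matrix
    np.fill_diagonal(K, 0.0)                   # no self-interaction
    Q = _softmax(-unary)                       # initialise from the unaries
    for _ in range(n_iter):
        # Potts compatibility: sum_{l'} [l != l'] Q_j(l') = 1 - Q_j(l)
        pairwise = K @ (1.0 - Q)
        Q = _softmax(-unary - pairwise)
    return Q
```

Fast implementations replace the dense `K @ (1 - Q)` product with approximate high-dimensional Gaussian filtering, which is what makes inference over every pixel pair tractable at image scale.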

Proceedings Article
11 Aug 2014
TL;DR: This paper describes the system used in the Aspect Based Sentiment Analysis Task 4 at SemEval-2014, which consists of a Conditional Random Field based classifier for Aspect Term Extraction (ATE) and a linear classifier for Aspect Term Polarity Classification (ATP).
Abstract: This paper describes our system used in the Aspect Based Sentiment Analysis Task 4 at SemEval-2014. Our system consists of two components to address two of the subtasks respectively: a Conditional Random Field (CRF) based classifier for Aspect Term Extraction (ATE) and a linear classifier for Aspect Term Polarity Classification (ATP). For the ATE subtask, we implement a variety of lexicon, syntactic and semantic features, as well as cluster features induced from unlabeled data. Our system achieves state-of-the-art performance in ATE, ranking 1st (among 28 submissions) and 2nd (among 27 submissions) for the restaurant and laptop domains respectively.

Proceedings Article
23 Jun 2014
TL;DR: This work presents a practical framework to automatically detect shadows in real world scenes from a single photograph using multiple convolutional deep neural networks (ConvNets) and learns features at the super-pixel level and along the object boundaries.
Abstract: We present a practical framework to automatically detect shadows in real world scenes from a single photograph. Previous works on shadow detection put a lot of effort into designing shadow-variant and shadow-invariant hand-crafted features. In contrast, our framework automatically learns the most relevant features in a supervised manner using multiple convolutional deep neural networks (ConvNets). The 7-layer network architecture of each ConvNet consists of alternating convolution and sub-sampling layers. The proposed framework learns features at the super-pixel level and along the object boundaries. In both cases, features are extracted using a context aware window centered at interest points. The predicted posteriors based on the learned features are fed to a conditional random field model to generate smooth shadow contours. Our proposed framework consistently performed better than the state-of-the-art on all major shadow databases collected under a variety of conditions.

Journal Article
01 Jan 2014
TL;DR: A robust segmentation method using model-aware affinity demonstrates performance comparable to other state-of-the-art algorithms for brain tumor MRI scans.
Abstract: Detection and segmentation of a brain tumor such as glioblastoma multiforme (GBM) in magnetic resonance (MR) images are often challenging due to its intrinsically heterogeneous signal characteristics. A robust segmentation method for brain tumor MRI scans was developed and tested. Simple thresholds and statistical methods are unable to adequately segment the various elements of the GBM, such as local contrast enhancement, necrosis, and edema. Most voxel-based methods cannot achieve satisfactory results in larger data sets, and the methods based on generative or discriminative models have intrinsic limitations during application, such as small sample set learning and transfer. A new method was developed to overcome these challenges. Multimodal MR images were segmented into superpixels using algorithms to alleviate the sampling issue and to improve the sample representativeness. Next, features were extracted from the superpixels using multi-level Gabor wavelet filters. Based on the features, a support vector machine (SVM) model and an affinity metric model for tumors were trained to overcome the limitations of previous generative models. Based on the output of the SVM and spatial affinity models, conditional random field theory was applied to segment the tumor in a maximum a posteriori fashion given the smoothness prior defined by our affinity model. Finally, labeling noise was removed using "structural knowledge" such as the symmetrical and continuous characteristics of the tumor in the spatial domain. The system was evaluated with 20 GBM cases and the BraTS challenge data set. Dice coefficients were computed, and the results were highly consistent with those reported by Zikic et al. (MICCAI 2012, Lecture notes in computer science. vol 7512, pp 369–376, 2012). A brain tumor segmentation method using model-aware affinity demonstrates performance comparable to other state-of-the-art algorithms.
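The evaluation above reports Dice coefficients. For two binary masks A and B, Dice = 2|A ∩ B| / (|A| + |B|). A minimal NumPy sketch follows; the function name, and the convention of returning 1.0 when both masks are empty, are assumptions.

```python
import numpy as np


def dice_coefficient(pred, truth):
    """Dice overlap 2|A ∩ B| / (|A| + |B|) between two binary masks (sketch)."""
    a = np.asarray(pred, dtype=bool)
    b = np.asarray(truth, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return 2.0 * np.logical_and(a, b).sum() / denom
```

A value of 1.0 means the predicted and reference tumor masks coincide exactly; 0.0 means they do not overlap at all.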

Journal Article
TL;DR: The entity recognition results for the individual entities Disorder and Finding show that it is meaningful to separate the general category Medical Problem into these two more granular entity types, e.g. for knowledge mining of co-morbidity relations and disorder-finding relations.

Journal Article
TL;DR: This paper makes two contributions: a new model, the associative hierarchical random field (AHRF), together with a novel algorithm for its optimization; and the application of this model to the problem of semantic segmentation.
Abstract: This paper makes two contributions: the first is the proposal of a new model, the associative hierarchical random field (AHRF), and a novel algorithm for its optimization; the second is the application of this model to the problem of semantic segmentation. Most methods for semantic segmentation are formulated as a labeling problem for variables that might correspond to either pixels or segments such as superpixels. It is well known that the generation of superpixel segmentations is not unique. This has motivated many researchers to use multiple superpixel segmentations for problems such as semantic segmentation or single view reconstruction. These superpixels have not yet been combined in a principled manner; this is a difficult problem, as they may overlap or be nested in such a way that the segmentations form a segmentation tree. Our new hierarchical random field model allows information from all of the multiple segmentations to contribute to a global energy. MAP inference in this model can be performed efficiently using powerful graph cut based move making algorithms. Our framework generalizes much of the previous work based on pixels or segments, and the resulting labelings can be viewed either as a detailed segmentation at the pixel level or, at the other extreme, as a segment selector that pieces together a solution like a jigsaw, selecting the best segments from different segmentations as pieces. We evaluate its performance on some of the most challenging data sets for object class segmentation, and show that this ability to perform inference using multiple overlapping segmentations leads to state-of-the-art results.

Proceedings Article
01 Jun 2014
TL;DR: A novel context-aware method for analyzing sentiment at the level of individual sentences is proposed that encodes intuitive lexical and discourse knowledge as expressive constraints and integrates them into the learning of conditional random field models via posterior regularization.
Abstract: This paper proposes a novel context-aware method for analyzing sentiment at the level of individual sentences. Most existing machine learning approaches suffer from limitations in the modeling of complex linguistic structures across sentences and often fail to capture nonlocal contextual cues that are important for sentiment interpretation. In contrast, our approach allows structured modeling of sentiment while taking into account both local and global contextual information. Specifically, we encode intuitive lexical and discourse knowledge as expressive constraints and integrate them into the learning of conditional random field models via posterior regularization. The context-aware constraints provide additional power to the CRF model and can guide semi-supervised learning when labeled data is limited. Experiments on standard product review datasets show that our method outperforms the state-of-the-art methods in both the supervised and semi-supervised settings.

Journal Article
TL;DR: A robust hand parsing scheme to extract a high-level description of the hand from the depth image is presented and a Superpixel-Markov Random Field (SMRF) parsing scheme is proposed to enforce the spatial smoothness and the label co-occurrence prior to remove the misclassified regions.
Abstract: Hand pose tracking and gesture recognition are useful for human-computer interaction, while a major problem is the lack of discriminative features for compact hand representation. We present a robust hand parsing scheme to extract a high-level description of the hand from the depth image. A novel distance-adaptive selection method is proposed to get more discriminative depth-context features. Besides, we propose a Superpixel-Markov Random Field (SMRF) parsing scheme to enforce the spatial smoothness and the label co-occurrence prior to remove the misclassified regions. Compared to pixel-level filtering, the SMRF scheme is more suitable to model the misclassified regions. By fusing the temporal constraints, its performance can be further improved. Overall, the proposed hand parsing scheme is accurate and efficient. Tests on a synthesized dataset show that it gives much higher accuracy for single-frame parsing and enhanced robustness for continuous sequence parsing compared to benchmarks. Tests on real-world depth images of the hand and human body show our method's robustness to complex hand configurations and its generalization power to different kinds of articulated objects.

Book Chapter
01 Nov 2014
TL;DR: This paper frames the problem of clothing parsing as one of inference in a pose-aware Conditional Random Field which exploits appearance, figure/ground segmentation, shape and location priors for each garment as well as similarities between segments, and symmetries between different human body parts.
Abstract: In this paper we tackle the problem of clothing parsing: our goal is to segment and classify different garments a person is wearing. We frame the problem as one of inference in a pose-aware Conditional Random Field (CRF) which exploits appearance, figure/ground segmentation, shape and location priors for each garment as well as similarities between segments, and symmetries between different human body parts. We demonstrate the effectiveness of our approach on the Fashionista dataset [1] and show that we can obtain a significant improvement over the state-of-the-art.

Proceedings Article
23 Jul 2014
TL;DR: The experimental results demonstrate the superiority of the label enhancement model in terms of both prediction performance and running time compared to state-of-the-art multi-label learning methods.
Abstract: In this paper, we present a novel probabilistic label enhancement model to tackle the multi-label image classification problem. Recognizing multiple objects in images is a challenging problem due to label sparsity, appearance variations of the objects and occlusions. We propose to tackle these difficulties from a novel perspective by constructing auxiliary labels in the output space. Our idea is to exploit label combinations to enrich the label space and improve the label identification capacity in the original label space. In particular, we identify a set of informative label combination pairs by constructing a tree-structured graph in the label space using the maximum spanning tree algorithm, which naturally forms a conditional random field. We then use the produced label pairs as auxiliary new labels to augment the original labels and perform piecewise training under the framework of conditional random fields. In the test phase, max-product message passing is used to perform efficient inference on the tree graph, which integrates the augmented label pair classifiers and the standard individual binary classifiers for multi-label prediction. We evaluate the proposed approach on several image classification datasets. The experimental results demonstrate the superiority of our label enhancement model in terms of both prediction performance and running time compared to state-of-the-art multi-label learning methods.
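The abstract says informative label pairs are selected by building a maximum spanning tree over the label space. As a hedged sketch, assuming a symmetric matrix of pairwise label-dependency weights (the abstract does not say which statistic the authors use; a co-occurrence or mutual-information score would be one plausible choice), Prim's algorithm finds such a tree. The function name is hypothetical.

```python
import numpy as np


def maximum_spanning_tree(W):
    """Prim's algorithm on a dense symmetric weight matrix (sketch).

    W[i, j] is an assumed dependency score between labels i and j.
    Returns a list of (i, j) edges with i < j forming a maximum-weight
    spanning tree over all labels.
    """
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True                    # grow the tree from label 0
    best = W[0].copy()                   # heaviest known link from each node to the tree
    parent = np.zeros(n, dtype=int)      # tree node providing that link
    edges = []
    for _ in range(n - 1):
        cand = np.where(in_tree, -np.inf, best)
        v = int(cand.argmax())           # heaviest edge leaving the current tree
        u = int(parent[v])
        edges.append((min(u, v), max(u, v)))
        in_tree[v] = True
        # The new node may offer heavier links to nodes still outside the tree.
        improve = (~in_tree) & (W[v] > best)
        best[improve] = W[v][improve]
        parent[improve] = v
    return edges
```

Each returned edge is a candidate label pair; in the scheme the abstract describes, such pairs become auxiliary labels, and the resulting tree is the graph on which max-product message passing runs at test time.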

Journal Article
TL;DR: PyStruct aims at providing a general purpose implementation of standard structured prediction methods, both for practitioners and as a baseline for researchers, written in Python and adapts paradigms and types from the scientific Python community for seamless integration with other projects.
Abstract: Structured prediction methods have become a central tool for many machine learning applications. While more and more algorithms are developed, only very few implementations are available. PyStruct aims at providing a general purpose implementation of standard structured prediction methods, both for practitioners and as a baseline for researchers. It is written in Python and adapts paradigms and types from the scientific Python community for seamless integration with other projects.

Journal Article
Hee-Deok Yang
24 Dec 2014, Sensors
TL;DR: This research uses 3D depth information from hand motions, generated from Microsoft's Kinect sensor, and applies a hierarchical conditional random field (CRF) to detect candidate segments of signs from the hand motions.
Abstract: Sign language is a visual language used by deaf people. One difficulty of sign language recognition is that sign instances vary in both motion and shape in three-dimensional (3D) space. In this research, we use 3D depth information from hand motions, generated from Microsoft's Kinect sensor, and apply a hierarchical conditional random field (CRF) that recognizes hand signs from the hand motions. The proposed method uses a hierarchical CRF to detect candidate segments of signs using hand motions, and then a BoostMap embedding method to verify the hand shapes of the segmented signs. Experiments demonstrated that the proposed method could recognize signs from signed sentence data at a rate of 90.4%.

Book ChapterDOI
06 Sep 2014
TL;DR: This paper proposes a new formulation of the human pose estimation problem, a binary Conditional Random Field model designed to detect human body parts of articulated people in single images.
Abstract: This paper proposes a new formulation of the human pose estimation problem. We present the Fields of Parts model, a binary Conditional Random Field model designed to detect human body parts of articulated people in single images.

Journal ArticleDOI
TL;DR: A hybrid object-oriented CRF classification framework for HSR imagery, namely CRF + OO, is proposed to address the problems of segmentation scale choice and oversmoothed pairwise CRF results, and it shows competitive quantitative and qualitative performance compared with other state-of-the-art classification algorithms.
Abstract: High spatial resolution (HSR) remote sensing imagery provides abundant geometric and detailed information, which is important for classification. To make full use of the spatial contextual information, object-oriented classification and pairwise conditional random fields (CRFs) are widely used. However, the choice of segmentation scale is a challenging problem in object-oriented classification, and the classification result of a pairwise CRF always has an oversmoothed appearance. In this paper, a hybrid object-oriented CRF classification framework for HSR imagery, namely CRF + OO, is proposed to address these problems by integrating object-oriented classification and CRF classification. In CRF + OO, a probabilistic pixel classification is first performed, and then the classification results of two CRF models with different potential functions are used to obtain the segmentation map by a connected-component labeling algorithm. As a result, an object-level classification fusion scheme can be used, which integrates the object-oriented classifications using a majority voting strategy at the object level to obtain the final classification result. The experimental results using two multispectral HSR images (QuickBird and IKONOS) and a hyperspectral HSR image (HYDICE) demonstrate that the proposed classification framework achieves competitive quantitative and qualitative performance for HSR image classification when compared with other state-of-the-art classification algorithms.
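A minimal sketch of the segmentation-and-voting step in plain Python. It assumes, hypothetically (the abstract does not give the exact rule), that two pixels belong to the same object when they are 4-connected and both CRF maps assign them the same pair of labels, and that each object's final class is the majority vote of the probabilistic pixel classification inside it.

```python
from collections import Counter, deque


def connected_components(label_map):
    """4-connected components of pixels sharing the same label value (BFS flood fill)."""
    h, w = len(label_map), len(label_map[0])
    comp = [[-1] * w for _ in range(h)]
    next_id = 0
    for r in range(h):
        for c in range(w):
            if comp[r][c] != -1:
                continue
            comp[r][c] = next_id
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w and comp[ny][nx] == -1
                            and label_map[ny][nx] == label_map[y][x]):
                        comp[ny][nx] = next_id
                        queue.append((ny, nx))
            next_id += 1
    return comp, next_id


def crf_oo_fusion(crf_a, crf_b, pixel_labels):
    """Hypothetical object-level fusion: segment on (crf_a, crf_b) label pairs,
    then give every object the majority label of pixel_labels inside it."""
    paired = [list(zip(ra, rb)) for ra, rb in zip(crf_a, crf_b)]
    comp, n_objects = connected_components(paired)
    votes = [Counter() for _ in range(n_objects)]
    for r, row in enumerate(comp):
        for c, obj in enumerate(row):
            votes[obj][pixel_labels[r][c]] += 1
    winner = [v.most_common(1)[0][0] for v in votes]
    return [[winner[obj] for obj in row] for row in comp]
```

Segmenting on the label pair means an object boundary appears wherever either CRF map changes class, which is one simple way to let the two differently smoothed results jointly define the segmentation scale.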

Journal ArticleDOI
TL;DR: A segment-based probabilistic approach to robustly recognize continuous sign language sentences using a two-layer conditional random field model, and a novel decoding scheme for the semi-Markov CRF used in the two-layer CRF.
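The listing does not describe the paper's novel decoding scheme. For reference only, the sketch below shows the standard semi-Markov Viterbi recurrence that segment-level CRF decoding builds on: every candidate segment [s, t) of length up to `max_len` is scored as a unit, so boundaries and labels are chosen jointly. The scoring callbacks and labels are hypothetical stand-ins for trained CRF potentials.

```python
def semi_markov_viterbi(n, labels, seg_score, trans_score, max_len):
    """Best labelled segmentation of positions 0..n-1.

    seg_score(s, t, y): score of one segment covering [s, t) with label y.
    trans_score(prev_y, y): score between consecutive segments (prev_y is None
    for the first segment).
    """
    NEG = float("-inf")
    # best[t][y]: best score of segmenting [0, t) with the last segment labelled y.
    best = [dict.fromkeys(labels, NEG) for _ in range(n + 1)]
    back = [dict() for _ in range(n + 1)]
    for t in range(1, n + 1):
        for y in labels:
            for d in range(1, min(max_len, t) + 1):
                s = t - d
                prev_options = [None] if s == 0 else labels
                for yp in prev_options:
                    prev_score = 0.0 if yp is None else best[s][yp]
                    if prev_score == NEG:
                        continue
                    cand = prev_score + trans_score(yp, y) + seg_score(s, t, y)
                    if cand > best[t][y]:
                        best[t][y] = cand
                        back[t][y] = (s, yp)
    # Backtrack from the best final label.
    y = max(labels, key=lambda lab: best[n][lab])
    score = best[n][y]
    segments, t = [], n
    while t > 0:
        s, yp = back[t][y]
        segments.append((s, t, y))
        t, y = s, yp
    return score, segments[::-1]
```

The cost is O(n · max_len · |labels|²) score evaluations, which is why bounding the segment length matters for long signed sentences.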

Proceedings ArticleDOI
29 Sep 2014
TL;DR: This work presents a structured learning approach to semantic annotation of RGB-D images, and finds that the conditional random field approach improves upon previous work, setting a new state-of-the-art for the dataset.
Abstract: We present a structured learning approach to semantic annotation of RGB-D images. Our method learns to reason about spatial relations of objects and fuses low-level class predictions into a consistent interpretation of a scene. Our model incorporates color, depth and 3D scene features, on which an energy function is learned to directly optimize object class prediction using the loss-based maximum-margin principle of structural support vector machines. We evaluate our approach on the NYU V2 dataset of indoor scenes, a challenging dataset covering a wide variety of scene layouts and object classes. We hard-code much less information about the scene layout into our model than previous approaches, and instead learn object relations directly from the data. We find that our conditional random field approach improves upon previous work, setting a new state-of-the-art for the dataset.
I. INTRODUCTION
For robots to perform varied tasks in unstructured environments, understanding their surroundings is essential. We formulate the problem of semantic annotation of maps as a dense labeling of RGB-D images into semantic classes. Dense labeling of measured surfaces allows detailed reasoning about the scene. In this work, we propose the use of random forests combined with conditional random fields (CRF) to perform robust estimation of structure classes in RGB-D images. The CRF is learned using a structural support vector machine, allowing it to integrate the noisy categorization produced by a pixel-based random forest into a consistent interpretation of the scene. We thereby extend the success of learned CRF models for semantic segmentation in RGB images to the domain of 3D scenes. Our emphasis lies on exploiting the additional depth and 3D information in all processing steps, while relying on learning to create a model that is adjusted to the properties of the sensor input and environment. 
Our approach starts with a random forest, providing a noisy local estimate of semantic classes based on color and depth information. These estimates are grouped together using a superpixel approach, for which we extend previous superpixel algorithms from the RGB to the RGB-D domain. We then build a geometric model of the scene, based on the neighborhood graph of superpixels. We use this graph not only to capture spatial relations in the 2D plane of the image, but also to model object distances and surface angles in 3D, using a point cloud generated from the RGB-D image. The process is illustrated in Figure 1.
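The superpixel neighborhood graph and its 3D edge attributes can be sketched as follows. This illustration rests on assumptions not stated in the source: superpixels arrive as a 2D label map, each superpixel already has its back-projected 3D points and an estimated surface normal, and the edge features are simply the distance between 3D centroids and the angle between normals. All names are hypothetical.

```python
import math


def neighbor_pairs(sp_map):
    """Adjacent superpixel pairs (i < j) read off a 2D superpixel label map."""
    pairs = set()
    h, w = len(sp_map), len(sp_map[0])
    for r in range(h):
        for c in range(w):
            # Checking only the right and lower neighbours visits each pixel edge once.
            for nr, nc in ((r, c + 1), (r + 1, c)):
                if nr < h and nc < w and sp_map[nr][nc] != sp_map[r][c]:
                    a, b = sp_map[r][c], sp_map[nr][nc]
                    pairs.add((min(a, b), max(a, b)))
    return sorted(pairs)


def centroid(points):
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def normal_angle_deg(n1, n2):
    dot = sum(a * b for a, b in zip(n1, n2))
    norms = math.sqrt(sum(a * a for a in n1)) * math.sqrt(sum(b * b for b in n2))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norms))))


def edge_features(sp_map, points, normals):
    """3D pairwise features for every edge of the superpixel neighbourhood graph."""
    feats = {}
    for i, j in neighbor_pairs(sp_map):
        feats[(i, j)] = {
            "distance_3d": math.dist(centroid(points[i]), centroid(points[j])),
            "normal_angle_deg": normal_angle_deg(normals[i], normals[j]),
        }
    return feats
```

Two superpixels that touch in the image but lie far apart in depth (e.g. a table edge against a distant wall) get a large `distance_3d`, which is exactly the cue a purely 2D neighbourhood graph would miss. `math.dist` requires Python 3.8 or later.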

Book ChapterDOI
05 Dec 2014
TL;DR: A rule-based emotion cause detection method is developed which uses 25 manually compiled rules, and two machine learning based cause detection methods are developed: a classification-based method using support vector machines and a sequence labeling based method using a conditional random fields model.
Abstract: Identifying the cause of an emotion is a new challenge for researchers in natural language processing. Currently, there are no existing works on emotion cause detection from Chinese micro-blogging (Weibo) text. In this study, an emotion cause annotated corpus is first designed and developed by annotating the emotion cause expressions in Chinese Weibo text. To date, a corpus of emotion cause annotations for 1,333 Chinese Weibo posts has been constructed; it is the largest available resource in this research area. Based on observations of this corpus, the characteristics of emotion cause expressions are identified. Accordingly, a rule-based emotion cause detection method is developed which uses 25 manually compiled rules. Furthermore, two machine learning based cause detection methods are developed: a classification-based method using support vector machines and a sequence labeling based method using a conditional random fields model. The experimental results show that the rule-based method achieves a 68.30% accuracy rate. Furthermore, the method based on the conditional random fields model achieves 77.57% accuracy, which is 37.45% higher than the reference baseline method. These results show the effectiveness of our proposed emotion cause detection method.
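Sequence labeling with a linear-chain CRF ultimately reduces to Viterbi decoding over per-token tag scores. The sketch below assumes a hypothetical BIO tag set (`O`, `B`, `I` marking emotion-cause spans) and hand-set emission and transition scores; in a trained model these would come from learned feature weights, which the abstract does not describe.

```python
def viterbi(emissions, transitions, tags):
    """Highest-scoring tag sequence for a linear-chain CRF.

    emissions[t][tag]: score of `tag` at token t.
    transitions[(prev, cur)]: score of moving from prev to cur (missing pairs score 0).
    """
    best = [{y: emissions[0][y] for y in tags}]
    back = [{}]
    for t in range(1, len(emissions)):
        best.append({})
        back.append({})
        for y in tags:
            # Best predecessor tag for ending at token t with tag y.
            prev = max(tags, key=lambda p: best[t - 1][p] + transitions.get((p, y), 0.0))
            best[t][y] = best[t - 1][prev] + transitions.get((prev, y), 0.0) + emissions[t][y]
            back[t][y] = prev
    # Follow back-pointers from the best final tag.
    y = max(tags, key=lambda lab: best[-1][lab])
    path = [y]
    for t in range(len(emissions) - 1, 0, -1):
        y = back[t][y]
        path.append(y)
    return path[::-1]
```

A large negative score on the `("O", "I")` transition is a simple way to forbid a cause span from starting with `I`; the decoder then prefers `B` before `I` even when token-level evidence alone is ambiguous.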

Proceedings ArticleDOI
12 Jul 2014
TL;DR: This work presents an algorithm that produces hierarchical labelings of a scene, following is-part-of and is-type-of relationships, based on a Conditional Random Field that relates pixel-wise and pair-wise observations to labels.
Abstract: Semantic labeling of RGB-D scenes is very important in enabling robots to perform mobile manipulation tasks, but different tasks may require entirely different sets of labels. For example, when navigating to an object, we may need only a single label denoting its class, but to manipulate it, we might need to identify individual parts. In this work, we present an algorithm that produces hierarchical labelings of a scene, following is-part-of and is-type-of relationships. Our model is based on a Conditional Random Field that relates pixel-wise and pair-wise observations to labels. We encode hierarchical labeling constraints into the model while keeping inference tractable. Our model thus predicts different specificities in labeling based on its confidence: if it is not sure whether an object is Pepsi or Sprite, it will predict soda rather than making an arbitrary choice. In extensive experiments, both offline on standard datasets as well as in online robotic experiments, we show that our model outperforms other state-of-the-art methods in labeling performance as well as in success rate for robotic tasks.
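The paper builds the hierarchy constraints into CRF inference itself. As a much simpler post-hoc illustration of the "predict soda rather than guess" behavior, and not the authors' method, the sketch below sums leaf-class probabilities up a hypothetical is-type-of tree and returns the most specific node whose aggregated probability clears an assumed threshold.

```python
def aggregate(leaf_probs, parent):
    """Propagate leaf probabilities to every ancestor (parent maps child -> parent)."""
    totals = dict(leaf_probs)
    for leaf, p in leaf_probs.items():
        node = parent.get(leaf)
        while node is not None:
            totals[node] = totals.get(node, 0.0) + p
            node = parent.get(node)
    return totals


def depth(node, parent):
    d = 0
    while node in parent:
        node = parent[node]
        d += 1
    return d


def most_specific_confident(leaf_probs, parent, threshold=0.7):
    """Deepest node with aggregated probability >= threshold (ties: higher probability)."""
    totals = aggregate(leaf_probs, parent)
    confident = [node for node, p in totals.items() if p >= threshold]
    return max(confident, key=lambda node: (depth(node, parent), totals[node]), default=None)


# Hypothetical hierarchy: pepsi, sprite -> soda -> drink; juice -> drink.
HIERARCHY = {"pepsi": "soda", "sprite": "soda", "soda": "drink", "juice": "drink"}
```

With `{"pepsi": 0.45, "sprite": 0.45, "juice": 0.10}` neither brand reaches 0.7 but `soda` does (0.9), so the function answers `soda`; with `pepsi` at 0.8 it commits to `pepsi`.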