
Showing papers on "Conditional random field" published in 2017


Proceedings ArticleDOI
21 Jul 2017
TL;DR: This paper develops a weakly supervised learning method for saliency detection using image-level tags only, which outperforms unsupervised methods by a large margin and achieves performance comparable or even superior to fully supervised counterparts.
Abstract: Deep Neural Networks (DNNs) have substantially improved the state-of-the-art in salient object detection. However, training DNNs requires costly pixel-level annotations. In this paper, we leverage the observation that image-level tags provide important cues of foreground salient objects, and develop a weakly supervised learning method for saliency detection using image-level tags only. The Foreground Inference Network (FIN) is introduced for this challenging task. In the first stage of our training method, FIN is jointly trained with a fully convolutional network (FCN) for image-level tag prediction. A global smooth pooling layer is proposed, enabling FCN to assign object category tags to corresponding object regions, while FIN is capable of capturing all potential foreground regions with the predicted saliency maps. In the second stage, FIN is fine-tuned with its predicted saliency maps as ground truth. For refinement of ground truth, an iterative Conditional Random Field is developed to enforce spatial label consistency and further boost performance. Our method alleviates annotation efforts and allows the usage of existing large-scale training sets with image-level tags. Our model runs at 60 FPS, outperforms unsupervised methods by a large margin, and achieves performance comparable or even superior to fully supervised counterparts.

909 citations
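
The second training stage above turns the network's own saliency maps into pseudo ground truth and uses an iterative CRF to enforce spatial label consistency. A minimal mean-field sketch of that refinement idea follows; the 4-neighbourhood Potts model, the weights, and the helper name are illustrative assumptions, not the paper's exact CRF.

```python
import numpy as np

def refine_saliency(prob, n_iters=5, w_pair=0.8):
    """Mean-field sketch: binary Potts CRF on a 4-neighbourhood that
    encourages neighbouring pixels to share the same saliency label.
    prob: [H, W] foreground probabilities from the network."""
    eps = 1e-8
    # Log-unaries for {background, foreground}, shape [2, H, W].
    log_p = np.stack([np.log(1.0 - prob + eps), np.log(prob + eps)])
    q = np.exp(log_p)  # initialise marginals with the network output
    for _ in range(n_iters):
        # Sum of neighbour marginals per label (np.roll wraps at the image
        # borders; acceptable for a sketch).
        msg = sum(np.roll(q, s, axis=ax) for ax in (1, 2) for s in (1, -1))
        logits = log_p + w_pair * msg          # unary + attractive Potts message
        logits -= logits.max(axis=0, keepdims=True)
        q = np.exp(logits)
        q /= q.sum(axis=0, keepdims=True)
    return q[1]  # refined foreground probability

# Stage-2 self-training (sketch): refined maps become the pseudo labels.
# pseudo_gt = (refine_saliency(model_prediction) > 0.5).astype(np.float32)
```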


Proceedings ArticleDOI
20 Oct 2017
TL;DR: SEGCloud as discussed by the authors combines the advantages of NNs, trilinear interpolation (TI) and fully connected CRF (FC-CRF) to obtain 3D point-level segmentation.
Abstract: 3D semantic scene labeling is fundamental to agents operating in the real world. In particular, labeling raw 3D point sets from sensors provides fine-grained semantics. Recent works leverage the capabilities of Neural Networks (NNs), but are limited to coarse voxel predictions and do not explicitly enforce global consistency. We present SEGCloud, an end-to-end framework to obtain 3D point-level segmentation that combines the advantages of NNs, trilinear interpolation (TI) and fully connected Conditional Random Fields (FC-CRF). Coarse voxel predictions from a 3D Fully Convolutional NN are transferred back to the raw 3D points via trilinear interpolation. Then the FC-CRF enforces global consistency and provides fine-grained semantics on the points. We implement the latter as a differentiable Recurrent NN to allow joint optimization. We evaluate the framework on two indoor and two outdoor 3D datasets (NYU V2, S3DIS, KITTI, Semantic3D.net), and show performance comparable or superior to the state-of-the-art on all datasets.

603 citations
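
The voxel-to-point transfer that SEGCloud performs before the FC-CRF is plain trilinear interpolation; below is a sketch under assumed grid conventions (origin, voxel size, scores stored at voxel centres), which may differ from the paper's implementation.

```python
import numpy as np

def trilinear_transfer(voxel_logits, points, origin, voxel_size):
    """Blend each point's class scores from its 8 surrounding voxel centres.
    voxel_logits: [X, Y, Z, C] coarse scores; points: [N, 3] raw coordinates;
    origin, voxel_size: assumed grid conventions (illustrative)."""
    g = (points - origin) / voxel_size - 0.5   # continuous voxel-centre coords
    g0 = np.floor(g).astype(int)
    frac = g - g0
    X, Y, Z, C = voxel_logits.shape
    out = np.zeros((len(points), C))
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                idx = np.clip(g0 + [dx, dy, dz], 0, [X - 1, Y - 1, Z - 1])
                w = (np.where(dx, frac[:, 0], 1 - frac[:, 0])
                     * np.where(dy, frac[:, 1], 1 - frac[:, 1])
                     * np.where(dz, frac[:, 2], 1 - frac[:, 2]))
                out += w[:, None] * voxel_logits[idx[:, 0], idx[:, 1], idx[:, 2]]
    return out  # [N, C] point-level scores, the unaries for the FC-CRF
```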


Proceedings ArticleDOI
21 Jul 2017
TL;DR: Zhang et al. as mentioned in this paper adopted stacked hourglass networks to generate attention maps from features at multiple resolutions with various semantics, and designed Hourglass Residual Units (HRUs) to increase the receptive field of the network.
Abstract: In this paper, we propose to incorporate convolutional neural networks with a multi-context attention mechanism into an end-to-end framework for human pose estimation. We adopt stacked hourglass networks to generate attention maps from features at multiple resolutions with various semantics. The Conditional Random Field (CRF) is utilized to model the correlations among neighboring regions in the attention map. We further combine the holistic attention model, which focuses on the global consistency of the full human body, and the body part attention model, which focuses on detailed descriptions of different body parts. Hence our model has the ability to focus on different granularities, from local salient regions to globally semantically consistent spaces. Additionally, we design novel Hourglass Residual Units (HRUs) to increase the receptive field of the network. These units are extensions of residual units with a side branch incorporating filters with larger receptive fields, hence features with various scales are learned and combined within the HRUs. The effectiveness of the proposed multi-context attention mechanism and the hourglass residual units is evaluated on two widely used human pose estimation benchmarks. Our approach outperforms all existing methods on both benchmarks over all the body parts. Code has been made publicly available.

543 citations
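
The Hourglass Residual Unit can be pictured as a standard bottleneck residual unit plus a side branch with a larger effective receptive field. The sketch below uses a downsample-conv-upsample side branch and assumed channel sizes; it illustrates the idea rather than reproducing the paper's exact filters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HourglassResidualUnit(nn.Module):
    """Identity + bottleneck residual branch + wide-context side branch."""

    def __init__(self, channels):
        super().__init__()
        mid = channels // 2
        self.main = nn.Sequential(              # 1x1 -> 3x3 -> 1x1 bottleneck
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, mid, 1),
            nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, 3, padding=1),
            nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, 1),
        )
        self.side = nn.Sequential(              # half resolution widens the RF
            nn.MaxPool2d(2),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        side = F.interpolate(self.side(x), size=x.shape[-2:], mode='nearest')
        return x + self.main(x) + side
```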


Journal ArticleDOI
TL;DR: Results suggest that this method for blood vessel segmentation in fundus images, based on a discriminatively trained fully connected conditional random field model, is suitable for the task of segmenting elongated structures, a feature that can be exploited in other medical and biological applications.
Abstract: Goal: In this work, we present an extensive description and evaluation of our method for blood vessel segmentation in fundus images based on a discriminatively trained fully connected conditional random field model. Methods: Standard segmentation priors such as a Potts model or total variation usually fail when dealing with thin and elongated structures. We overcome this difficulty by using a conditional random field model with more expressive potentials, taking advantage of recent results enabling inference of fully connected models almost in real time. Parameters of the method are learned automatically using a structured output support vector machine, a supervised technique widely used for structured prediction in a number of machine learning applications. Results: Our method, trained with state-of-the-art features, is evaluated both quantitatively and qualitatively on four publicly available datasets: DRIVE, STARE, CHASEDB1, and HRF. Additionally, a quantitative comparison with respect to other strategies is included. Conclusion: The experimental results show that this approach outperforms other techniques when evaluated in terms of sensitivity, F1-score, G-mean, and Matthews correlation coefficient. Additionally, it was observed that the fully connected model is able to better distinguish the desired structures than the local neighborhood-based approach. Significance: Results suggest that this method is suitable for the task of segmenting elongated structures, a feature that can be exploited in other medical and biological applications.

429 citations
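
For reference, fully connected CRFs with expressive pairwise potentials are usually written in the dense-CRF form below (Krähenbühl and Koltun); the paper's exact potentials and the SSVM-learned parameters may differ, so treat this as the generic template rather than the authors' model.

```latex
E(\mathbf{y}) = \sum_i \psi_u(y_i)
  + \sum_{i<j} \mu(y_i, y_j) \sum_m w^{(m)}
    \exp\!\Big( -\tfrac{1}{2} (\mathbf{f}_i - \mathbf{f}_j)^{\top}
                \Lambda^{(m)} (\mathbf{f}_i - \mathbf{f}_j) \Big)
```

Here \psi_u is the unary cost from per-pixel vessel features, \mu is a label compatibility function, and \mathbf{f}_i collects features such as position and intensity; the kernel weights w^{(m)} (and optionally \mu and \Lambda^{(m)}) are what the structured output SVM learns.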


Proceedings ArticleDOI
21 Jul 2017
TL;DR: In this article, a deep model which fuses complementary information derived from multiple CNN side outputs is proposed, where the fusion is obtained by means of continuous Conditional Random Fields (CRFs).
Abstract: This paper addresses the problem of depth estimation from a single still image. Inspired by recent works on multi-scale convolutional neural networks (CNN), we propose a deep model which fuses complementary information derived from multiple CNN side outputs. Different from previous methods, the integration is obtained by means of continuous Conditional Random Fields (CRFs). In particular, we propose two different variations, one based on a cascade of multiple CRFs, the other on a unified graphical model. By designing a novel CNN implementation of mean-field updates for continuous CRFs, we show that both proposed models can be regarded as sequential deep networks and that training can be performed end-to-end. Through extensive experimental evaluation we demonstrate the effectiveness of the proposed approach and establish new state of the art results on publicly available datasets.

400 citations
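
Since depth labels are continuous, the mean-field updates for a Gaussian CRF come in closed form. The generic template below, with r_i the CNN's depth estimate at pixel i and w_{ij} learned affinities, shows why such updates can be unrolled as layers of a sequential deep network; the paper's exact energy may differ.

```latex
E(\mathbf{d}) = \sum_i (d_i - r_i)^2 + \sum_{i<j} w_{ij}\,(d_i - d_j)^2,
\qquad
\mu_i \leftarrow \frac{r_i + \sum_{j \neq i} w_{ij}\,\mu_j}{1 + \sum_{j \neq i} w_{ij}}
```

Each update is just a weighted average of the unary estimate and the neighbouring means, i.e. a linear filtering step that a CNN layer can implement.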


Posted Content
TL;DR: This paper proposes to incorporate convolutional neural networks with a multi-context attention mechanism into an end-to-end framework for human pose estimation and designs novel Hourglass Residual Units (HRUs) to increase the receptive field of the network.
Abstract: In this paper, we propose to incorporate convolutional neural networks with a multi-context attention mechanism into an end-to-end framework for human pose estimation. We adopt stacked hourglass networks to generate attention maps from features at multiple resolutions with various semantics. The Conditional Random Field (CRF) is utilized to model the correlations among neighboring regions in the attention map. We further combine the holistic attention model, which focuses on the global consistency of the full human body, and the body part attention model, which focuses on the detailed description of different body parts. Hence our model has the ability to focus on different granularities, from local salient regions to globally semantically consistent spaces. Additionally, we design novel Hourglass Residual Units (HRUs) to increase the receptive field of the network. These units are extensions of residual units with a side branch incorporating filters with larger receptive fields, hence features with various scales are learned and combined within the HRUs. The effectiveness of the proposed multi-context attention mechanism and the hourglass residual units is evaluated on two widely used human pose estimation benchmarks. Our approach outperforms all existing methods on both benchmarks over all the body parts.

383 citations


Proceedings ArticleDOI
01 Jul 2017
TL;DR: This paper introduces a deep architecture for segmenting 3D objects into their labeled semantic parts that significantly outperforms the existing state-of-the-art methods in the currently largest segmentation benchmark (ShapeNet).
Abstract: This paper introduces a deep architecture for segmenting 3D objects into their labeled semantic parts. Our architecture combines image-based Fully Convolutional Networks (FCNs) and surface-based Conditional Random Fields (CRFs) to yield coherent segmentations of 3D shapes. The image-based FCNs are used for efficient view-based reasoning about 3D object parts. Through a special projection layer, FCN outputs are effectively aggregated across multiple views and scales, then are projected onto the 3D object surfaces. Finally, a surface-based CRF combines the projected outputs with geometric consistency cues to yield coherent segmentations. The whole architecture (multi-view FCNs and CRF) is trained end-to-end. Our approach significantly outperforms the existing state-of-the-art methods in the currently largest segmentation benchmark (ShapeNet). Finally, we demonstrate promising segmentation results on noisy 3D shapes acquired from consumer-grade depth cameras.

357 citations
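
The projection layer's aggregation step, pooling per-view FCN outputs onto the surface before the CRF, can be sketched as follows; the data layout and the choice of max-pooling across views are assumptions for illustration.

```python
import numpy as np

def aggregate_views(view_maps, uv, visible):
    """Per-point part scores max-pooled over the views that see each point.
    view_maps: [V, H, W, C] per-view FCN scores; uv: [V, N, 2] pixel coords
    (x, y) of N surface points; visible: [V, N] visibility mask."""
    V, N = visible.shape
    C = view_maps.shape[-1]
    scores = np.full((N, C), -np.inf)
    for v in range(V):
        vis = visible[v]
        px = view_maps[v, uv[v, vis, 1], uv[v, vis, 0]]  # gather rows=y, cols=x
        scores[vis] = np.maximum(scores[vis], px)
    scores[np.isinf(scores).all(axis=1)] = 0.0  # points seen in no view
    return scores  # [N, C] unaries for the surface-based CRF
```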


Journal ArticleDOI
TL;DR: DeepNAT is an end-to-end learning-based approach to brain segmentation that jointly learns an abstract feature representation and a multi-class classification, and the results show that DeepNAT compares favorably to state-of-the-art methods.

347 citations


Journal ArticleDOI
TL;DR: DeepCut as discussed by the authors is a method to obtain pixelwise object segmentations given an image dataset labelled with weak annotations, in our case bounding boxes, by training a neural network classifier from bounding box annotations.
Abstract: In this paper, we propose DeepCut, a method to obtain pixelwise object segmentations given an image dataset labelled with weak annotations, in our case bounding boxes. It extends the approach of the well-known GrabCut [1] method to include machine learning by training a neural network classifier from bounding box annotations. We formulate the problem as an energy minimisation problem over a densely-connected conditional random field and iteratively update the training targets to obtain pixelwise object segmentations. Additionally, we propose variants of the DeepCut method and compare those to a naive approach to CNN training under weak supervision. We test its applicability to solve brain and lung segmentation problems on a challenging fetal magnetic resonance dataset and obtain encouraging results in terms of accuracy.

320 citations
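
DeepCut's iterative target updates follow a GrabCut-style loop; the sketch below conveys the structure only, with `model` (fit/predict) and `crf_refine` as hypothetical placeholders rather than the authors' code.

```python
import numpy as np

def bbox_mask(shape, bb):
    """Binary mask that is 1 inside bounding box bb = (y0, x0, y1, x1)."""
    m = np.zeros(shape, dtype=np.float32)
    m[bb[0]:bb[2], bb[1]:bb[3]] = 1.0
    return m

def deepcut_loop(model, crf_refine, images, bboxes, n_rounds=5):
    """crf_refine(image, prob) should return a CRF-smoothed foreground map."""
    # Initial targets: everything inside the box counts as foreground.
    targets = [bbox_mask(img.shape[:2], bb) for img, bb in zip(images, bboxes)]
    for _ in range(n_rounds):
        model.fit(images, targets)                      # train on current pseudo-labels
        for k, (img, bb) in enumerate(zip(images, bboxes)):
            prob = crf_refine(img, model.predict(img))  # CRF energy-minimisation step
            # Re-threshold and clamp pixels outside the box to background.
            targets[k] = (prob > 0.5) * bbox_mask(img.shape[:2], bb)
    return model, targets
```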


Posted Content
TL;DR: Structured Attention Networks as mentioned in this paper extend the basic attention procedure by incorporating richer structural distributions, encoded using graphical models, within deep networks, and outperform baseline attention models on a variety of synthetic and real tasks.
Abstract: Attention networks have proven to be an effective approach for embedding categorical inference within a deep neural network. However, for many tasks we may want to model richer structural dependencies without abandoning end-to-end training. In this work, we experiment with incorporating richer structural distributions, encoded using graphical models, within deep networks. We show that these structured attention networks are simple extensions of the basic attention procedure, and that they allow for extending attention beyond the standard soft-selection approach, such as attending to partial segmentations or to subtrees. We experiment with two different classes of structured attention networks: a linear-chain conditional random field and a graph-based parsing model, and describe how these models can be practically implemented as neural network layers. Experiments show that this approach is effective for incorporating structural biases, and structured attention networks outperform baseline attention models on a variety of synthetic and real tasks: tree transduction, neural machine translation, question answering, and natural language inference. We further find that models trained in this way learn interesting unsupervised hidden representations that generalize simple attention.

287 citations
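
In the linear-chain case, the attention weights become the CRF's position-wise marginals rather than a softmax, computed with forward-backward. A minimal log-space sketch (generic shapes, illustrative only):

```python
import numpy as np

def crf_marginals(unary, trans):
    """Marginals p(z_t = k) of a linear-chain CRF via forward-backward.
    unary: [T, K] per-position scores; trans: [K, K] transition scores."""
    T, K = unary.shape
    alpha, beta = np.zeros((T, K)), np.zeros((T, K))
    alpha[0] = unary[0]
    for t in range(1, T):           # forward pass (log-space)
        alpha[t] = unary[t] + np.logaddexp.reduce(
            alpha[t - 1][:, None] + trans, axis=0)
    for t in range(T - 2, -1, -1):  # backward pass
        beta[t] = np.logaddexp.reduce(
            trans + (unary[t + 1] + beta[t + 1])[None, :], axis=1)
    log_Z = np.logaddexp.reduce(alpha[-1])
    return np.exp(alpha + beta - log_Z)  # [T, K] structured attention weights
```

Every operation here is differentiable, which is what lets the recursion run inside a network layer and be trained end-to-end.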


Journal ArticleDOI
TL;DR: This paper proposes a novel top-down saliency model that jointly learns a Conditional Random Field (CRF) and a discriminative dictionary, and develops a max-margin approach to train the dictionary under the supervision of the CRF and, meanwhile, the CRF on top of sparse coding.
Abstract: Top-down visual saliency is an important module of visual attention. In this work, we propose a novel top-down saliency model that jointly learns a Conditional Random Field (CRF) and a visual dictionary. The proposed model incorporates a layered structure from top to bottom: CRF, sparse coding and image patches. With sparse coding as an intermediate layer, CRF is learned in a feature-adaptive manner; meanwhile with CRF as the output layer, the dictionary is learned under structured supervision. For efficient and effective joint learning, we develop a max-margin approach via a stochastic gradient descent algorithm. Experimental results on the Graz-02 and PASCAL VOC datasets show that our model performs favorably against state-of-the-art top-down saliency methods for target object localization. In addition, the dictionary update significantly improves the performance of our model. We demonstrate the merits of the proposed top-down saliency model by applying it to prioritizing object proposals for detection and predicting human fixations.

Journal ArticleDOI
TL;DR: The results reported in this paper provide the first systematic and uniform evaluation of surgical activity recognition techniques on the benchmark database, a public dataset that is created to support comparative research benchmarking.
Abstract: Objective: State-of-the-art techniques for surgical data analysis report promising results for automated skill assessment and action recognition. The contributions of many of these techniques, however, are limited to study-specific data and validation metrics, making assessment of progress across the field extremely challenging. Methods: In this paper, we address two major problems for surgical data analysis: first, lack of uniform-shared datasets and benchmarks, and second, lack of consistent validation processes. We address the former by presenting the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS), a public dataset that we have created to support comparative research benchmarking. JIGSAWS contains synchronized video and kinematic data from multiple performances of robotic surgical tasks by operators of varying skill. We address the latter by presenting a well-documented evaluation methodology and reporting results for six techniques for automated segmentation and classification of time-series data on JIGSAWS. These techniques comprise four temporal approaches for joint segmentation and classification: hidden Markov model (HMM), sparse HMM, Markov semi-Markov conditional random field, and skip-chain conditional random field; and two feature-based ones that aim to classify fixed segments: bag of spatiotemporal features and linear dynamical systems. Results: Most methods recognize gesture activities with approximately 80% overall accuracy under both leave-one-super-trial-out and leave-one-user-out cross-validation settings. Conclusion: Current methods show promising results on this shared dataset, but room for significant progress remains, particularly for consistent prediction of gesture activities across different surgeons. Significance: The results reported in this paper provide the first systematic and uniform evaluation of surgical activity recognition techniques on the benchmark database.

Proceedings Article
03 Feb 2017
TL;DR: This work shows that structured attention networks are simple extensions of the basic attention procedure, and that they allow for extending attention beyond the standard soft-selection approach, such as attending to partial segmentations or to subtrees.
Abstract: Attention networks have proven to be an effective approach for embedding categorical inference within a deep neural network. However, for many tasks we may want to model richer structural dependencies without abandoning end-to-end training. In this work, we experiment with incorporating richer structural distributions, encoded using graphical models, within deep networks. We show that these structured attention networks are simple extensions of the basic attention procedure, and that they allow for extending attention beyond the standard soft-selection approach, such as attending to partial segmentations or to subtrees. We experiment with two different classes of structured attention networks: a linear-chain conditional random field and a graph-based parsing model, and describe how these models can be practically implemented as neural network layers. Experiments show that this approach is effective for incorporating structural biases, and structured attention networks outperform baseline attention models on a variety of synthetic and real tasks: tree transduction, neural machine translation, question answering, and natural language inference. We further find that models trained in this way learn interesting unsupervised hidden representations that generalize simple attention.

Posted Content
TL;DR: This paper revisits the problem of purely unsupervised image segmentation and proposes a novel deep architecture for this problem by concatenating two fully convolutional networks together into an autoencoder: one for encoding and one for decoding.
Abstract: While significant attention has been recently focused on designing supervised deep semantic segmentation algorithms for vision tasks, there are many domains in which sufficient supervised pixel-level labels are difficult to obtain. In this paper, we revisit the problem of purely unsupervised image segmentation and propose a novel deep architecture for this problem. We borrow recent ideas from supervised semantic segmentation methods, in particular by concatenating two fully convolutional networks together into an autoencoder: one for encoding and one for decoding. The encoding layer produces a k-way pixelwise prediction, and both the reconstruction error of the autoencoder as well as the normalized cut produced by the encoder are jointly minimized during training. When combined with suitable postprocessing involving conditional random field smoothing and hierarchical segmentation, our resulting algorithm achieves impressive results on the benchmark Berkeley Segmentation Data Set, outperforming a number of competing methods.

Journal ArticleDOI
TL;DR: This paper introduces a novel neural network architecture that benefits from both word- and character-level representations automatically, by using a combination of bidirectional long short-term memory (LSTM) and conditional random field (CRF), eliminating the need for most feature engineering tasks.
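
A minimal sketch of the word-level core of such a BiLSTM-CRF tagger is below; character-level features, batching, and start/stop transitions are omitted, and all sizes are illustrative rather than the paper's configuration.

```python
import torch
import torch.nn as nn

class BiLSTMCRF(nn.Module):
    def __init__(self, vocab, emb_dim, hidden, n_tags):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.proj = nn.Linear(2 * hidden, n_tags)               # emission scores
        self.trans = nn.Parameter(torch.zeros(n_tags, n_tags))  # CRF transitions

    def nll(self, words, tags):
        """Negative log-likelihood of one sequence: forward-algorithm
        partition function minus the gold path score."""
        h, _ = self.lstm(self.emb(words.unsqueeze(0)))
        emit = self.proj(h).squeeze(0)                           # [T, K]
        gold = emit[torch.arange(len(tags)), tags].sum() \
             + self.trans[tags[:-1], tags[1:]].sum()
        alpha = emit[0]
        for t in range(1, emit.size(0)):                         # log-space forward
            alpha = emit[t] + torch.logsumexp(alpha.unsqueeze(1) + self.trans, dim=0)
        return torch.logsumexp(alpha, dim=0) - gold
```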

Journal ArticleDOI
TL;DR: A recurrent neural network model that labels words in an input sequence with ADR membership tags is developed; it reduces manual data-labeling requirements and is scalable to large social media datasets.

Proceedings ArticleDOI
01 Oct 2017
TL;DR: This work introduces scGAN, a novel extension of conditional Generative Adversarial Networks (GAN) tailored for the challenging problem of shadow detection in images, and introduces an additional sensitivity parameter to the generator of a conditional GAN.
Abstract: We introduce scGAN, a novel extension of conditional Generative Adversarial Networks (GAN) tailored for the challenging problem of shadow detection in images. Previous methods for shadow detection focus on learning the local appearance of shadow regions, while using limited local context reasoning in the form of pairwise potentials in a Conditional Random Field. In contrast, the proposed adversarial approach is able to model higher level relationships and global scene characteristics. We train a shadow detector that corresponds to the generator of a conditional GAN, and augment its shadow accuracy by combining the typical GAN loss with a data loss term. Due to the unbalanced distribution of the shadow labels, we use weighted cross entropy. With the standard GAN architecture, properly setting the weight for the cross entropy would require training multiple GANs, a computationally expensive grid-search procedure. In scGAN, we introduce an additional sensitivity parameter w to the generator. The proposed approach effectively parameterizes the loss of the trained detector. The resulting shadow detector is a single network that can generate shadow maps corresponding to different sensitivity levels, obviating the need for multiple models and a costly training procedure. We evaluate our method on the large-scale SBU and UCF shadow datasets, and observe up to 17% error reduction with respect to the previous state-of-the-art method.
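
The data-loss ingredient, class-weighted cross entropy for the unbalanced shadow labels, might look like the sketch below; how the sensitivity value additionally conditions the generator is abstracted away here.

```python
import torch
import torch.nn.functional as F

def weighted_shadow_loss(pred_logits, target, w):
    """w > 1 up-weights the rarer shadow pixels (target == 1); in scGAN a
    sensitivity parameter playing this role is also fed to the generator,
    so a single network covers a range of sensitivities at test time."""
    weights = torch.where(target > 0.5,
                          torch.full_like(target, w),
                          torch.ones_like(target))
    return F.binary_cross_entropy_with_logits(pred_logits, target, weight=weights)
```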

Book ChapterDOI
13 Sep 2017
TL;DR: A novel approach using Conditional Random Fields that learns from the sequential dynamics of social media posts is compared with the current state-of-the-art rumour detection system, as well as other baselines, and the results provide evidence for the generalisability of the classifier.
Abstract: Tools that are able to detect unverified information posted on social media during a news event can help to avoid the spread of rumours that turn out to be false. In this paper we compare a novel approach using Conditional Random Fields that learns from the sequential dynamics of social media posts with the current state-of-the-art rumour detection system, as well as other baselines. In contrast to existing work, our classifier does not need to observe tweets querying the stance of a post to deem it a rumour but, instead, exploits context learned during the event. Our classifier has improved precision and recall over the state-of-the-art classifier that relies on querying tweets, as well as outperforming our best baseline. Moreover, the results provide evidence for the generalisability of our classifier.
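
Fitting a linear-chain CRF over an event's timeline of posts could look like the sketch below, e.g. with the sklearn-crfsuite package; the toy data and feature function are illustrative stand-ins for the paper's feature set.

```python
import sklearn_crfsuite

def post_features(post):
    # Illustrative per-post features, not the paper's.
    return {
        'has_question_mark': '?' in post['text'],
        'word_count': len(post['text'].split()),
        'user_verified': post['user_verified'],
    }

# Each event is a sequence of posts; labels mark rumour vs. non-rumour.
events = [
    [{'text': 'Explosion reported downtown?', 'user_verified': False, 'label': 'rumour'},
     {'text': 'Police confirm an incident, details to follow.', 'user_verified': True, 'label': 'non-rumour'}],
]
X = [[post_features(p) for p in event] for event in events]
y = [[p['label'] for p in event] for event in events]

crf = sklearn_crfsuite.CRF(algorithm='lbfgs', c1=0.1, c2=0.1, max_iterations=100)
crf.fit(X, y)
print(crf.predict(X))  # per-post labels, using the sequential context
```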

Proceedings ArticleDOI
01 Jul 2017
TL;DR: In this paper, a hybrid CNN+CRF model is proposed for stereo estimation, which combines the advantages of both CNNs and CRFs in a unified approach, and achieves state-of-the-art performance.
Abstract: We propose a novel and principled hybrid CNN+CRF model for stereo estimation. Our model allows us to exploit the advantages of both convolutional neural networks (CNNs) and conditional random fields (CRFs) in a unified approach. The CNNs compute expressive features for matching and distinctive color edges, which in turn are used to compute the unary and binary costs of the CRF. For inference, we apply a recently proposed highly parallel dual block descent algorithm which only needs a small fixed number of iterations to compute a high-quality approximate minimizer. As the main contribution of the paper, we propose a theoretically sound method based on the structured output support vector machine (SSVM) to train the hybrid CNN+CRF model on large-scale data end-to-end. Our trained models perform very well despite the fact that we are using shallow CNNs and do not apply any kind of post-processing to the final output of the CRF. We evaluate our combined models on challenging stereo benchmarks such as Middlebury 2014 and KITTI 2015 and also investigate the performance of each individual component.
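
The SSVM objective used for end-to-end training amounts to a margin-rescaled structured hinge loss. The generic sketch below approximates loss-augmented inference with a candidate set, an illustrative simplification of what the paper does with proper inference.

```python
import torch

def ssvm_loss(score_fn, y_gold, candidates):
    """Structured hinge loss: score_fn maps a labelling (e.g. a disparity
    map) to a scalar CRF score; Hamming distance serves as the margin."""
    def task_loss(y):
        return (y != y_gold).float().mean()
    gold_score = score_fn(y_gold)
    # Best loss-augmented competitor among the candidate labellings.
    aug = torch.stack([score_fn(y) + task_loss(y) for y in candidates])
    return torch.clamp(aug.max() - gold_score, min=0.0)
```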

Journal ArticleDOI
TL;DR: A hybrid system is developed that achieves the highest micro F1-scores under the "token", "strict" and "binary token" criteria respectively, ranking first in the 2016 CEGS N-GRID NLP challenge and outperforming other state-of-the-art systems.

Journal ArticleDOI
TL;DR: A cascaded framework of feature selection and classifier ensemble using particle swarm optimization (PSO) for aspect-based sentiment analysis is presented, using three classifiers, namely Maximum Entropy, Conditional Random Field and Support Vector Machine.
Abstract: In this paper we present a cascaded framework of feature selection and classifier ensemble using particle swarm optimization (PSO) for aspect-based sentiment analysis. Aspect-based sentiment analysis is performed in two steps, viz. aspect term extraction and sentiment classification. The pruned, compact set of features performs better than the baseline model that makes use of the complete set of features for aspect term extraction and sentiment classification. We further construct an ensemble based on PSO, and put it in cascade after the feature selection module. We use the features that are identified based on the properties of different classifiers and domains. As base learning algorithms we use three classifiers, namely Maximum Entropy (ME), Conditional Random Field (CRF) and Support Vector Machine (SVM). Experiments for aspect term extraction and sentiment analysis on two different kinds of domains show the effectiveness of our proposed approach.
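
A binary PSO for feature selection can be sketched as below; the inertia and acceleration constants are common defaults, and `fitness` stands in for training and validating a tagger (e.g. the CRF) on the selected feature subset.

```python
import numpy as np

def pso_feature_selection(fitness, n_feats, n_particles=20, n_iters=50, seed=0):
    """Each particle is a 0/1 mask over features; fitness(mask) should
    return e.g. validation F1 of a model trained on the selected features."""
    rng = np.random.default_rng(seed)
    pos = rng.integers(0, 2, (n_particles, n_feats)).astype(float)
    vel = np.zeros((n_particles, n_feats))
    pbest, pbest_f = pos.copy(), np.array([fitness(p) for p in pos])
    gbest = pbest[pbest_f.argmax()].copy()
    for _ in range(n_iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        prob = 1.0 / (1.0 + np.exp(-vel))          # sigmoid transfer function
        pos = (rng.random(pos.shape) < prob).astype(float)
        f = np.array([fitness(p) for p in pos])
        better = f > pbest_f
        pbest[better], pbest_f[better] = pos[better], f[better]
        gbest = pbest[pbest_f.argmax()].copy()
    return gbest.astype(bool)  # selected feature mask
```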

Journal ArticleDOI
TL;DR: This model integrates the information from the two sensors in a probabilistic way, making good use of both, and can be optimized efficiently with graph cuts to extract road areas.

Journal ArticleDOI
TL;DR: This paper comprehensively investigates the performance of LSTM (long short-term memory), a representative variant of RNN, on clinical entity recognition and protected health information recognition, and shows that LSTM outperforms traditional machine learning methods that suffer from fussy feature engineering.
Abstract: Entity recognition is one of the most primary steps for text analysis and has long attracted considerable attention from researchers. In the clinical domain, various types of entities, such as clinical entities and protected health information (PHI), widely exist in clinical texts. Recognizing these entities has become a hot topic in clinical natural language processing (NLP), and a large number of traditional machine learning methods, such as support vector machine and conditional random field, have been deployed to recognize entities from clinical texts in the past few years. In recent years, recurrent neural network (RNN), one of the deep learning methods that has shown great potential on many problems including named entity recognition, has also been gradually used for entity recognition from clinical texts. In this paper, we comprehensively investigate the performance of LSTM (long short-term memory), a representative variant of RNN, on clinical entity recognition and protected health information recognition. The LSTM model consists of three layers: input layer – generates the representation of each word of a sentence; LSTM layer – outputs another word representation sequence that captures the context information of each word in this sentence; inference layer – makes tagging decisions according to the output of the LSTM layer, that is, outputting a label sequence. Experiments conducted on corpora of the 2010, 2012 and 2014 i2b2 NLP challenges show that LSTM achieves the highest micro-average F1-scores of 85.81% on the 2010 i2b2 medical concept extraction, 92.29% on the 2012 i2b2 clinical event detection, and 94.37% on the 2014 i2b2 de-identification, which is considerably competitive with other state-of-the-art systems. LSTM, which requires no hand-crafted features, has great potential for entity recognition from clinical texts. It outperforms traditional machine learning methods that suffer from fussy feature engineering. A possible future direction is how to integrate knowledge bases widely existing in the clinical domain into LSTM, which is part of our future work. Moreover, how to use LSTM to recognize entities in specific formats is also another possible future direction.

Proceedings ArticleDOI
01 Sep 2017
TL;DR: A novel multi-task approach is proposed that employs a more general secondary task of Named Entity (NE) segmentation together with the primary task of fine-grained NE categorization; the multi-task neural network learns higher order feature representations from word and character sequences along with basic Part-of-Speech tags and gazetteer information.
Abstract: Named Entity Recognition for social media data is challenging because of its inherent noisiness. In addition to improper grammatical structures, it contains spelling inconsistencies and numerous informal abbreviations. We propose a novel multi-task approach by employing a more general secondary task of Named Entity (NE) segmentation together with the primary task of fine-grained NE categorization. The multi-task neural network architecture learns higher order feature representations from word and character sequences along with basic Part-of-Speech tags and gazetteer information. This neural network acts as a feature extractor to feed a Conditional Random Fields classifier. We were able to obtain the first position in the 3rd Workshop on Noisy User-generated Text (WNUT-2017) with a 41.86% entity F1-score and a 40.24% surface F1-score.

Proceedings ArticleDOI
21 Jul 2017
TL;DR: In this article, a fully-connected conditional random field (CRF) was proposed to generate a small number of pose-hypotheses from a single RGB-D image.
Abstract: This paper addresses the task of estimating the 6D-pose of a known 3D object from a single RGB-D image. Most modern approaches solve this task in three steps: i) compute local features, ii) generate a pool of pose-hypotheses, iii) select and refine a pose from the pool. This work focuses on the second step. While all existing approaches generate the hypotheses pool via local reasoning, e.g. RANSAC or Hough-Voting, we are the first to show that global reasoning is beneficial at this stage. In particular, we formulate a novel fully-connected Conditional Random Field (CRF) that outputs a very small number of pose-hypotheses. Despite the potential functions of the CRF being non-Gaussian, we give a new, efficient two-step optimization procedure, with some guarantees for optimality. We utilize our global hypotheses generation procedure to produce results that exceed state-of-the-art for the challenging Occluded Object Dataset.

Proceedings ArticleDOI
01 Jul 2017
TL;DR: This paper addresses the problem of weakly supervised semantic image segmentation with a novel deep architecture which fuses three distinct computation processes toward semantic segmentation, and formulates a unified end-to-end learning of all components of the deep architecture.
Abstract: This paper addresses the problem of weakly supervised semantic image segmentation. Our goal is to label every pixel in a new image, given only image-level object labels associated with training images. Our problem statement differs from common semantic segmentation, where pixel-wise annotations are typically assumed available in training. We specify a novel deep architecture which fuses three distinct computation processes toward semantic segmentation – namely, (i) the bottom-up computation of neural activations in a CNN for the image-level prediction of object classes, (ii) the top-down estimation of conditional likelihoods of the CNN's activations given the predicted objects, resulting in probabilistic attention maps per object class, and (iii) the lateral attention-message passing from neighboring neurons at the same CNN layer. The fusion of (i)-(iii) is realized via a conditional random field, formulated as a recurrent network, aimed at generating a smooth and boundary-preserving segmentation. Unlike existing work, we formulate a unified end-to-end learning of all components of our deep architecture. Evaluation on the benchmark PASCAL VOC 2012 dataset demonstrates that we outperform reasonable weakly supervised baselines and state-of-the-art approaches.

Proceedings ArticleDOI
01 Sep 2017
TL;DR: A new method to detect object affordances in real-world scenes using deep Convolutional Neural Networks, an object detector and dense Conditional Random Fields is presented, along with a grasping method that is robust to noisy data.
Abstract: We present a new method to detect object affordances in real-world scenes using deep Convolutional Neural Networks (CNN), an object detector and dense Conditional Random Fields (CRF). Our system first trains an object detector to generate bounding box candidates from the images. A deep CNN is then used to learn the depth features from these bounding boxes. Finally, these feature maps are post-processed with dense CRF to improve the prediction along class boundaries. The experimental results on our new challenging dataset show that the proposed approach outperforms recent state-of-the-art methods by a substantial margin. Furthermore, from the detected affordances we introduce a grasping method that is robust to noisy data. We demonstrate the effectiveness of our framework on the full-size humanoid robot WALK-MAN using different objects in real-world scenarios.

Journal ArticleDOI
TL;DR: A recurrent neural network architecture can be successfully used for BNER without any manual feature engineering, and experimental results show that domain-specific pre-trained word embeddings and character-level representation can improve the performance of the LSTM-RNN models.
Abstract: Biomedical named entity recognition (BNER) is a crucial initial step of information extraction in the biomedical domain. The task is typically modeled as a sequence labeling problem. Various machine learning algorithms, such as Conditional Random Fields (CRFs), have been successfully used for this task. However, these state-of-the-art BNER systems largely depend on hand-crafted features. We present a recurrent neural network (RNN) framework based on word embeddings and character representation. On top of the neural network architecture, we use a CRF layer to jointly decode labels for the whole sentence. In our approach, contextual information from both directions and long-range dependencies in the sequence, which is useful for this task, can be well modeled by bidirectional variation and long short-term memory (LSTM) units, respectively. Although our models use word embeddings and character embeddings as the only features, the bidirectional LSTM-RNN (BLSTM-RNN) model achieves state-of-the-art performance: 86.55% F1 on the BioCreative II gene mention (GM) corpus and 73.79% F1 on the JNLPBA 2004 corpus. Our neural network architecture can be successfully used for BNER without any manual feature engineering. Experimental results show that domain-specific pre-trained word embeddings and character-level representation can improve the performance of the LSTM-RNN models. On the GM corpus, we achieve performance comparable with other systems using complex hand-crafted features. Considering the JNLPBA corpus, our model achieves the best results, outperforming the previously top performing systems. The source code of our method is freely available under GPL at https://github.com/lvchen1989/BNER .

Proceedings ArticleDOI
02 Jul 2017
TL;DR: This work presents a page segmentation algorithm that incorporates state-of-the-art deep learning methods for segmenting three types of document elements: text blocks, tables, and figures, and proposes a conditional random field (CRF) that uses features output from the semantic segmentation and contour networks to improve upon the semantic segmentation network output.
Abstract: Page segmentation and table detection play an important role in understanding the structure of documents. We present a page segmentation algorithm that incorporates state-of-the-art deep learning methods for segmenting three types of document elements: text blocks, tables, and figures. We propose a multi-scale, multi-task fully convolutional neural network (FCN) for the tasks of semantic page segmentation and element contour detection. The semantic segmentation network accurately predicts the probability at each pixel of the three element classes. The contour detection network accurately predicts instance level "edges" around each element occurrence. We propose a conditional random field (CRF) that uses features output from the semantic segmentation and contour networks to improve upon the semantic segmentation network output. Given the semantic segmentation output, we also extract individual table instances from the page using some heuristic rules and a verification network to remove false positives. We show that although we only consider a page image as input, we produce comparable results with other methods that rely on PDF file information and heuristics and hand-crafted features tailored to specific types of documents. Our approach learns the representative features for page segmentation from real and synthetic training data, and produces good results on real documents. The learning-based property makes it a more general method than existing methods in terms of document types and element appearances. For example, our method reliably detects sparsely lined tables which are hard for rule-based or heuristic methods.

Proceedings ArticleDOI
21 Jul 2017
TL;DR: Two methods to detect and localize image manipulations based on a combination of resampling features and deep learning are proposed, both of which are effective in detecting and localizing digital image forgeries.
Abstract: Resampling is an important signature of manipulated images. In this paper, we propose two methods to detect and localize image manipulations based on a combination of resampling features and deep learning. In the first method, the Radon transform of resampling features is computed on overlapping image patches. Deep learning classifiers and a Gaussian conditional random field model are then used to create a heatmap. Tampered regions are located using a Random Walker segmentation method. In the second method, resampling features computed on overlapping image patches are passed through a long short-term memory (LSTM) based network for classification and localization. We compare the detection/localization performance of both methods. Our experimental results show that both techniques are effective in detecting and localizing digital image forgeries.
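
The first method's feature step, Radon transforms of resampling features over overlapping patches, can be sketched as follows; `feat_map` is assumed to be a 2-D per-pixel resampling-feature array, and the patch, stride, and angle counts are illustrative.

```python
import numpy as np
from skimage.transform import radon

def radon_patch_features(feat_map, patch=64, stride=32, n_angles=10):
    """One Radon-based descriptor per overlapping patch; the rows would
    feed the downstream classifier (or, in the second method, an LSTM)."""
    theta = np.linspace(0.0, 180.0, n_angles, endpoint=False)
    H, W = feat_map.shape
    feats = []
    for y in range(0, H - patch + 1, stride):
        for x in range(0, W - patch + 1, stride):
            sino = radon(feat_map[y:y + patch, x:x + patch],
                         theta=theta, circle=False)
            feats.append(sino.mean(axis=0))       # average projection profile
    return np.array(feats)
```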