
Showing papers on "Conditional random field" published in 2007


01 Jan 2007
TL;DR: Modeling the joint distribution p(y, x) requires modeling p(x), which can include complex dependencies; conditional random fields avoid this by directly modeling the conditional distribution p(y|x), which is sufficient for classification.
Abstract: Relational data has two characteristics: first, statistical dependencies exist between the entities we wish to model, and second, each entity often has a rich set of features that can aid classification. For example, when classifying Web documents, the page's text provides much information about the class label, but hyperlinks define a relationship between pages that can improve classification [Taskar et al., 2002]. Graphical models are a natural formalism for exploiting the dependence structure among entities. Traditionally, graphical models have been used to represent the joint probability distribution p(y, x), where the variables y represent the attributes of the entities that we wish to predict, and the input variables x represent our observed knowledge about the entities. But modeling the joint distribution can lead to difficulties when using the rich local features that can occur in relational data, because it requires modeling the distribution p(x), which can include complex dependencies. Modeling these dependencies among inputs can lead to intractable models, but ignoring them can lead to reduced performance. A solution to this problem is to directly model the conditional distribution p(y|x), which is sufficient for classification. This is the approach taken by conditional random fields [Lafferty et al., 2001]. A conditional random field is simply a conditional distribution p(y|x) with an associated graphical structure. Because the model is

977 citations
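
To make the idea of "a conditional distribution p(y|x) with an associated graphical structure" concrete, here is a minimal sketch of a linear-chain CRF in Python. It is not taken from the chapter: the labels, observation symbols, and weights (LABELS, EMIT, TRANS) are hypothetical toy values, and the helper names are illustrative. The sketch scores a label sequence, computes the normalizer Z(x) with the forward recursion, and divides the two, so only p(y|x) is modeled and p(x) never appears.

```python
import itertools
import math

# Hypothetical toy model (an assumption for illustration): two labels, two symbols.
LABELS = ["A", "B"]
EMIT = {"A": {"u": 1.0, "v": -0.5}, "B": {"u": -0.5, "v": 1.0}}   # label-observation weights
TRANS = {"A": {"A": 0.5, "B": -0.2}, "B": {"A": -0.2, "B": 0.5}}  # label-label weights


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def score(y, x):
    """Unnormalized log-score of label sequence y given observations x."""
    s = sum(EMIT[label][obs] for label, obs in zip(y, x))
    s += sum(TRANS[a][b] for a, b in zip(y, y[1:]))
    return s


def log_partition(x):
    """log Z(x) via the forward recursion; cost grows as len(x) * |LABELS|^2."""
    alpha = {b: EMIT[b][x[0]] for b in LABELS}
    for obs in x[1:]:
        alpha = {
            b: EMIT[b][obs] + logsumexp([alpha[a] + TRANS[a][b] for a in LABELS])
            for b in LABELS
        }
    return logsumexp(list(alpha.values()))


def conditional_prob(y, x):
    """p(y | x) = exp(score(y, x)) / Z(x)."""
    return math.exp(score(y, x) - log_partition(x))


if __name__ == "__main__":
    x = ["u", "v", "v"]
    for y in itertools.product(LABELS, repeat=len(x)):
        print(y, round(conditional_prob(y, x), 4))
```

Because the normalizer depends on x, the probabilities of all label sequences for a fixed observation sequence sum to one, which the loop at the bottom lets you check by hand.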


Proceedings ArticleDOI
17 Jun 2007
TL;DR: The authors construct a large number of stereo datasets with ground-truth disparities, use a subset of them to learn the parameters of conditional random fields (CRFs), and present experimental results illustrating the potential of this approach for automatically learning the parameters of models with richer structure than standard hand-tuned MRF models.
Abstract: State-of-the-art stereo vision algorithms utilize color changes as important cues for object boundaries. Most methods impose heuristic restrictions or priors on disparities, for example by modulating local smoothness costs with intensity gradients. In this paper we seek to replace such heuristics with explicit probabilistic models of disparities and intensities learned from real images. We have constructed a large number of stereo datasets with ground-truth disparities, and we use a subset of these datasets to learn the parameters of conditional random fields (CRFs). We present experimental results illustrating the potential of our approach for automatically learning the parameters of models with richer structure than standard hand-tuned MRF models.

893 citations


Proceedings ArticleDOI
26 Dec 2007
TL;DR: This work proposes to incorporate semantic object context as a post-processing step into any off-the-shelf object categorization model using a conditional random field (CRF) framework, which maximizes object label agreement according to contextual relevance.
Abstract: In the task of visual object categorization, semantic context can play the very important role of reducing ambiguity in objects' visual appearance. In this work we propose to incorporate semantic object context as a post-processing step into any off-the-shelf object categorization model. Using a conditional random field (CRF) framework, our approach maximizes object label agreement according to contextual relevance. We compare two sources of context: one learned from training data and another queried from Google Sets. The overall performance of the proposed framework is evaluated on the PASCAL and MSRC datasets. Our findings conclude that incorporating context into object categorization greatly improves categorization accuracy.

740 citations


Journal ArticleDOI
TL;DR: A discriminative latent variable model is presented for classification problems in structured domains where inputs can be represented by a graph of local observations; its hidden-state conditional random field framework learns a set of latent variables conditioned on local features.
Abstract: We present a discriminative latent variable model for classification problems in structured domains where inputs can be represented by a graph of local observations. A hidden-state conditional random field framework learns a set of latent variables conditioned on local features. Observations need not be independent and may overlap in space and time.

578 citations


Proceedings ArticleDOI
01 Dec 2007
TL;DR: BANNER is an open-source, executable survey of advances in biomedical named entity recognition, intended to serve as a benchmark for the field; it is designed to maximize domain independence by not employing brittle semantic features or rule-based processing steps.
Abstract: There has been an increasing amount of research on biomedical named entity recognition, the most basic text extraction problem, resulting in significant progress by different research teams around the world. This has created a need for a freely-available, open source system implementing the advances described in the literature. In this paper we present BANNER, an open-source, executable survey of advances in biomedical named entity recognition, intended to serve as a benchmark for the field. BANNER is implemented in Java as a machine-learning system based on conditional random fields and includes a wide survey of the best techniques recently described in the literature. It is designed to maximize domain independence by not employing brittle semantic features or rule-based processing steps, and achieves significantly better performance than existing baseline systems. It is therefore useful to developers as an extensible NER implementation, to researchers as a standard for comparing innovative techniques, and to biologists requiring the ability to find novel entities in large amounts of text.

524 citations


Journal ArticleDOI
TL;DR: The proposed system is able to robustly estimate a person’s activities using a model that is trained from data collected by other persons, and shows significant improvements over existing techniques.
Abstract: Learning patterns of human behavior from sensor data is extremely important for high-level activity inference. This paper describes how to extract a person's activities and significant places from traces of GPS data. The system uses hierarchically structured conditional random fields to generate a consistent model of a person's activities and places. In contrast to existing techniques, this approach takes the high-level context into account in order to detect the significant places of a person. Experiments show significant improvements over existing techniques. Furthermore, they indicate that the proposed system is able to robustly estimate a person's activities using a model that is trained from data collected by other persons.

481 citations


Proceedings ArticleDOI
17 Jun 2007
TL;DR: A discriminative framework for simultaneous sequence segmentation and labeling is developed that can capture both intrinsic and extrinsic class dynamics; it incorporates hidden state variables that model the sub-structure of a class sequence and learn dynamics between class labels.
Abstract: Many problems in vision involve the prediction of a class label for each frame in an unsegmented sequence. In this paper, we develop a discriminative framework for simultaneous sequence segmentation and labeling which can capture both intrinsic and extrinsic class dynamics. Our approach incorporates hidden state variables which model the sub-structure of a class sequence and learn dynamics between class labels. Each class label has a disjoint set of associated hidden states, which enables efficient training and inference in our model. We evaluated our method on the task of recognizing human gestures from unsegmented video streams and performed experiments on three different datasets of head and eye gestures. Our results demonstrate that our model compares favorably to Support Vector Machines, Hidden Markov Models, and Conditional Random Fields on visual gesture recognition tasks.

424 citations


Proceedings ArticleDOI
14 May 2007
TL;DR: It is found that the discriminatively trained CRF performs as well as or better than an HMM even when the model features do not violate the independence assumptions of the HMM, and it is confirmed that, when features depend on observations from many time steps, CRFs are robust against any degradation in performance.
Abstract: Activity recognition is a key component for creating intelligent, multi-agent systems. Intrinsically, activity recognition is a temporal classification problem. In this paper, we compare two models for temporal classification: hidden Markov models (HMMs), which have long been applied to the activity recognition problem, and conditional random fields (CRFs). CRFs are discriminative models for labeling sequences. They condition on the entire observation sequence, which avoids the need for independence assumptions between observations. Conditioning on the observations vastly expands the set of features that can be incorporated into the model without violating its assumptions. Using data from a simulated robot tag domain, chosen because it is multi-agent and produces complex interactions between observations, we explore the differences in performance between the discriminatively trained CRF and the generative HMM. Additionally, we examine the effect of incorporating features which violate independence assumptions between observations; such features are typically necessary for high classification accuracy. We find that the discriminatively trained CRF performs as well as or better than an HMM even when the model features do not violate the independence assumptions of the HMM. In cases where features depend on observations from many time steps, we confirm that CRFs are robust against any degradation in performance.

377 citations


Proceedings ArticleDOI
27 Aug 2007
TL;DR: Generative and discriminative approaches to modeling sentence segmentation and concept labeling are studied, and it is shown how non-local non-lexical features (e.g. a-priori knowledge) can be modeled with CRFs, the best-performing algorithm across tasks.
Abstract: Spoken Language Understanding (SLU) for conversational systems (SDS) aims at extracting concepts and their relations from spontaneous speech. Previous approaches to SLU have modeled concept relations as stochastic semantic networks, ranging from generative to discriminative approaches. As the complexity of spoken dialog systems increases, SLU needs to perform understanding based on a richer set of features, such as a-priori knowledge, long dependency, dialog history, and system belief. This paper studies generative and discriminative approaches to modeling sentence segmentation and concept labeling. We evaluate algorithms based on Finite State Transducers (FST) as well as discriminative algorithms based on a Support Vector Machine sequence classifier and Conditional Random Fields (CRF). We compare them in terms of concept accuracy, generalization and robustness to annotation ambiguities. We also show how non-local non-lexical features (e.g. a-priori knowledge) can be modeled with CRF, which is the best-performing algorithm across tasks. The evaluation is carried out on two SLU tasks of different complexity, namely the ATIS and MEDIA corpora.

352 citations


Proceedings Article
06 Jan 2007
TL;DR: A Conditional Random Fields (CRF) based framework for extractive document summarization is presented that keeps the merits of supervised and unsupervised approaches while avoiding their disadvantages, and that can take the outcomes of previous methods as features and seamlessly integrate them.
Abstract: Many methods, including supervised and unsupervised algorithms, have been developed for extractive document summarization. Most supervised methods consider the summarization task as a two-class classification problem and classify each sentence individually without leveraging the relationship among sentences. The unsupervised methods use heuristic rules to select the most informative sentences into a summary directly, which are hard to generalize. In this paper, we present a Conditional Random Fields (CRF) based framework to keep the merits of the above two kinds of approaches while avoiding their disadvantages. What is more, the proposed framework can take the outcomes of previous methods as features and seamlessly integrate them. The key idea of our approach is to treat the summarization task as a sequence labeling problem. In this view, each document is a sequence of sentences and the summarization procedure labels the sentences by 1 and 0. The label of a sentence depends on the assignment of labels of others. We compared our proposed approach with eight existing methods on an open benchmark data set. The results show that our approach can improve the performance by more than 7.1% and 12.1% over the best supervised baseline and unsupervised baseline respectively in terms of two popular metrics F1 and ROUGE-2. Detailed analysis of the improvement is presented as well.

349 citations
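
The sequence-labeling view described in the abstract above (each sentence gets label 1 or 0, and a sentence's label depends on the labels of others) can be sketched with Viterbi decoding over a chain. This is an illustrative sketch, not the authors' implementation: the scores in UNARY and PAIRWISE are made-up numbers, and the redundancy penalty on two adjacent summary sentences is an assumption chosen only to show the pairwise term changing the result.

```python
def viterbi(unary, pairwise):
    """Return the highest-scoring label sequence for a chain model.

    unary[t][k]    : score of giving sentence t the label k
    pairwise[j][k] : score of label j being followed by label k
    """
    n_labels = len(unary[0])
    best = list(unary[0])          # best[k]: best score of any prefix ending in k
    back = []                      # back-pointers, one list per later position
    for scores in unary[1:]:
        prev_best, step, best = best, [], []
        for k in range(n_labels):
            j = max(range(n_labels), key=lambda j: prev_best[j] + pairwise[j][k])
            step.append(j)
            best.append(prev_best[j] + pairwise[j][k] + scores[k])
        back.append(step)
    k = max(range(n_labels), key=lambda k: best[k])
    path = [k]
    for step in reversed(back):    # follow back-pointers from the last sentence
        k = step[k]
        path.append(k)
    return path[::-1]


# Hypothetical scores for a four-sentence document; label 1 = "in summary".
UNARY = [[0.0, 2.0], [0.4, 0.5], [1.0, -1.0], [0.0, 1.5]]
# Assumed penalty of -1.0 when two adjacent sentences are both selected.
PAIRWISE = [[0.0, 0.0], [0.0, -1.0]]

if __name__ == "__main__":
    print(viterbi(UNARY, PAIRWISE))
```

Labeling each sentence independently by its larger unary score would select the second sentence as well; with the pairwise penalty the decoder drops it, which is the kind of interaction between sentence labels that per-sentence classification cannot express.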


Journal ArticleDOI
TL;DR: The variance reduction method known in experimental design circles as ‘A-optimality’ is re-derived for the logistic regression model, and comparisons against different variations of the most widely used heuristic schemes are run to discover which methods work best for different classes of problems and why.
Abstract: Which active learning methods can we expect to yield good performance in learning binary and multi-category logistic regression classifiers? Addressing this question is a natural first step in providing robust solutions for active learning across a wide variety of exponential models including maximum entropy, generalized linear, log-linear, and conditional random field models. For the logistic regression model we re-derive the variance reduction method known in experimental design circles as 'A-optimality.' We then run comparisons against different variations of the most widely used heuristic schemes: query by committee and uncertainty sampling, to discover which methods work best for different classes of problems and why. We find that among the strategies tested, the experimental design methods are most likely to match or beat a random sample baseline. The heuristic alternatives produced mixed results, with an uncertainty sampling variant called margin sampling and a derivative method called QBB-MM providing the most promising performance at very low computational cost. Computational running times of the experimental design methods were a bottleneck to the evaluations. Meanwhile, evaluation of the heuristic methods led to an accumulation of negative results. We explore alternative evaluation design parameters to test whether these negative results are merely an artifact of settings where experimental design methods can be applied. The results demonstrate a need for improved active learning methods that will provide reliable performance at a reasonable computational cost.
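
The abstract names margin sampling without defining it. In its usual form, the learner queries the unlabeled examples whose two most probable classes are closest in predicted probability. The sketch below assumes that standard definition; the function name, the batch_size parameter, and the example probabilities are illustrative and not taken from the paper.

```python
def margin_sampling(probabilities, batch_size=1):
    """Pick the examples with the smallest gap between their top two class probabilities.

    probabilities : list of per-example class-probability lists from the current model
    batch_size    : number of example indices to return for labeling
    """
    def margin(p):
        top, second = sorted(p, reverse=True)[:2]
        return top - second

    ranked = sorted(range(len(probabilities)), key=lambda i: margin(probabilities[i]))
    return ranked[:batch_size]


if __name__ == "__main__":
    # Hypothetical model outputs for four unlabeled examples and three classes.
    pool = [
        [0.90, 0.05, 0.05],   # confident prediction, large margin
        [0.40, 0.35, 0.25],
        [0.50, 0.10, 0.40],
        [0.34, 0.33, 0.33],   # nearly uniform, smallest margin
    ]
    print(margin_sampling(pool, batch_size=2))
```

The cost per example is a sort over the class probabilities, which is consistent with the abstract's remark that the heuristic schemes run at very low computational cost compared with the experimental design methods.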

01 Jan 2007
TL;DR: This chapter contains sections titled: Introduction, Graphical Models, Linear-Chain Conditional Random Fields, CRFs in General, Skip-Chain CRFs, Conclusion, Acknowledgments, References.
Abstract: This chapter contains sections titled: Introduction, Graphical Models, Linear-Chain Conditional Random Fields, CRFs in General, Skip-Chain CRFs, Conclusion, Acknowledgments, References

Journal Article
TL;DR: On a natural-language chunking task, it is shown that a DCRF performs better than a series of linear-chain CRFs, achieving comparable performance using only half the training data.
Abstract: In sequence modeling, we often wish to represent complex interaction between labels, such as when performing multiple, cascaded labeling tasks on the same sequence, or when long-range dependencies exist. We present dynamic conditional random fields (DCRFs), a generalization of linear-chain conditional random fields (CRFs) in which each time slice contains a set of state variables and edges, a distributed state representation as in dynamic Bayesian networks (DBNs), and parameters are tied across slices. Since exact inference can be intractable in such models, we perform approximate inference using several schedules for belief propagation, including tree-based reparameterization (TRP). On a natural-language chunking task, we show that a DCRF performs better than a series of linear-chain CRFs, achieving comparable performance using only half the training data. In addition to maximum conditional likelihood, we present two alternative approaches for training DCRFs: marginal likelihood training, for when we are primarily interested in predicting only a subset of the variables, and cascaded training, for when we have a distinct data set for each state variable, as in transfer learning. We evaluate marginal training and cascaded training on both synthetic data and real-world text data, finding that marginal training can improve accuracy when uncertainty exists over the latent variables, and that for transfer learning, a DCRF trained in a cascaded fashion performs better than a linear-chain CRF that predicts the final task directly.

Proceedings ArticleDOI
26 Dec 2007
TL;DR: This work shows that convolutional networks can be used as a general method for low-level image processing and suggests that high model complexity is the single most important factor for good performance.
Abstract: Convolutional networks have achieved a great deal of success in high-level vision problems such as object recognition. Here we show that they can also be used as a general method for low-level image processing. As an example of our approach, convolutional networks are trained using gradient learning to solve the problem of restoring noisy or degraded images. For our training data, we have used electron microscopic images of neural circuitry with ground truth restorations provided by human experts. On this dataset, Markov random field (MRF), conditional random field (CRF), and anisotropic diffusion algorithms perform about the same as simple thresholding, but superior performance is obtained with a convolutional network containing over 34,000 adjustable parameters. When restored by this convolutional network, the images are clean enough to be used for segmentation, whereas the other approaches fail in this respect. We do not believe that convolutional networks are fundamentally superior to MRFs as a representation for image processing algorithms. On the contrary, the two approaches are closely related. But in practice, it is possible to train complex convolutional networks, while even simple MRF models are hindered by problems with Bayesian learning and inference procedures. Our results suggest that high model complexity is the single most important factor for good performance, and this is possible with convolutional networks.

Proceedings ArticleDOI
17 Jun 2007
TL;DR: Experimental results on two recent datasets show that the proposed framework not only accurately recognizes human activities with temporal, intra- and inter-person variations, but is also considerably robust to noise and other factors such as partial occlusion and irregularities in motion styles.
Abstract: In this paper, we describe a probabilistic framework for recognizing human activities in monocular video based on simple silhouette observations. The methodology combines kernel principal component analysis (KPCA) based feature extraction and factorial conditional random field (FCRF) based motion modeling. Silhouette data is represented more compactly by nonlinear dimensionality reduction that explores the underlying structure of the articulated action space and preserves explicit temporal orders in projection trajectories of motions. FCRF models temporal sequences in multiple interacting ways, thus increasing joint accuracy by information sharing, with the ideal advantages of discriminative models over generative ones (e.g., relaxing independence assumption between observations and the ability to effectively incorporate both overlapping features and long-range dependencies). The experimental results on two recent datasets have shown that the proposed framework not only accurately recognizes human activities with temporal, intra- and inter-person variations, but is also considerably robust to noise and other factors such as partial occlusion and irregularities in motion styles.

Proceedings ArticleDOI
17 Jun 2007
TL;DR: A probabilistic model for learning rich, distributed representations of image transformations that develops domain specific motion features, in the form of fields of locally transformed edge filters, and can fantasize new transformations on previously unseen images.
Abstract: We describe a probabilistic model for learning rich, distributed representations of image transformations. The basic model is defined as a gated conditional random field that is trained to predict transformations of its inputs using a factorial set of latent variables. Inference in the model consists in extracting the transformation, given a pair of images, and can be performed exactly and efficiently. We show that, when trained on natural videos, the model develops domain specific motion features, in the form of fields of locally transformed edge filters. When trained on affine, or more general, transformations of still images, the model develops codes for these transformations, and can subsequently perform recognition tasks that are invariant under these transformations. It can also fantasize new transformations on previously unseen images. We describe several variations of the basic model and provide experimental results that demonstrate its applicability to a variety of tasks.

Proceedings ArticleDOI
20 Jun 2007
TL;DR: This paper surveys the current state-of-the-art models for structured learning problems, including Hidden Markov Model (HMM), Conditional Random Fields (CRF), Averaged Perceptron (AP), Structured SVMs (SVMstruct), Max Margin Markov Networks (M3N), and an integration of search and learning algorithm (SEARN).
Abstract: In this paper, we survey the current state-of-the-art models for structured learning problems, including Hidden Markov Model (HMM), Conditional Random Fields (CRF), Averaged Perceptron (AP), Structured SVMs (SVMstruct), Max Margin Markov Networks (M3N), and an integration of search and learning algorithm (SEARN). With all due tuning efforts of various parameters of each model, on the data sets we have applied the models to, we found that SVMstruct enjoys better performance compared with the others. In addition, we also propose a new method which we call the Structured Learning Ensemble (SLE) to combine these structured learning models. Empirical results show that our SLE algorithm provides more accurate solutions compared with the best results of the individual models.

Proceedings ArticleDOI
20 Jun 2007
TL;DR: On several benchmark NLP data sets, piecewise pseudolikelihood has better accuracy than standard pseudolikelihood, and is in many cases nearly equivalent to maximum likelihood, with five to ten times less training time than batch CRF training.
Abstract: Discriminative training of graphical models can be expensive if the variables have large cardinality, even if the graphical structure is tractable. In such cases, pseudolikelihood is an attractive alternative, because its running time is linear in the variable cardinality, but on some data its accuracy can be poor. Piecewise training (Sutton & McCallum, 2005) can have better accuracy but does not scale as well in the variable cardinality. In this paper, we introduce piecewise pseudolikelihood, which retains the computational efficiency of pseudolikelihood but can have much better accuracy. On several benchmark NLP data sets, piecewise pseudolikelihood has better accuracy than standard pseudolikelihood, and in many cases nearly equivalent to maximum likelihood, with five to ten times less training time than batch CRF training.
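
As background for the abstract's claim that pseudolikelihood runs in time linear in the variable cardinality, the sketch below computes the standard (not piecewise) log pseudolikelihood of a labeled chain: each variable is normalized on its own, with its neighbors held at their observed labels. The chain structure, the function name, and the unary/pairwise score tables are assumptions made for illustration; the paper's piecewise variant is not implemented here.

```python
import math


def log_pseudolikelihood(y, unary, pairwise):
    """Sum over positions t of log p(y_t | neighbors of y_t, x) for a chain model.

    y              : observed label indices, one per position
    unary[t][k]    : score of label k at position t
    pairwise[j][k] : score of label j followed by label k
    """
    n_labels = len(pairwise)
    total = 0.0
    for t in range(len(y)):
        scores = []
        for k in range(n_labels):          # one pass over the labels: linear cost
            s = unary[t][k]
            if t > 0:
                s += pairwise[y[t - 1]][k]  # left neighbor fixed at its observed label
            if t + 1 < len(y):
                s += pairwise[k][y[t + 1]]  # right neighbor fixed at its observed label
            scores.append(s)
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += scores[y[t]] - log_z
    return total


if __name__ == "__main__":
    # Hypothetical scores for a three-position chain with two labels.
    unary = [[1.0, 0.0], [0.0, 0.5], [0.2, 0.3]]
    pairwise = [[0.5, -0.5], [-0.5, 0.5]]
    print(log_pseudolikelihood([0, 1, 1], unary, pairwise))
```

Each local normalizer sums over the labels of a single variable, whereas the forward recursion used for full maximum likelihood on a chain sums over pairs of labels at every step; that difference is the source of the scaling advantage the abstract describes.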

Patent
05 Dec 2007
TL;DR: The authors applied both full text and complex feature analyses to sentences of a product review and used a CRF framework to enhance sentiment classification for each segment of a complex sentence to improve sentiment prediction.
Abstract: A sentiment classifier is described. In one implementation, a system applies both full text and complex feature analyses to sentences of a product review. Each analysis is weighted prior to linear combination into a final sentiment prediction. A full text model and a complex features model can be trained separately offline to support online full text analysis and complex features analysis. Complex features include opinion indicators, negation patterns, sentiment-specific sections of the product review, user ratings, sequence of text chunks, and sentence types and lengths. A Conditional Random Field (CRF) framework provides enhanced sentiment classification for each segment of a complex sentence to enhance sentiment prediction.

Proceedings ArticleDOI
02 Nov 2007
TL;DR: This paper investigates the emotion classification of web blog corpora using support vector machine (SVM) and conditional random field (CRF) machine learning techniques and shows that CRF classifiers outperform SVM classifiers.
Abstract: In this paper, we investigate the emotion classification of web blog corpora using support vector machine (SVM) and conditional random field (CRF) machine learning techniques. The emotion classifiers are trained at the sentence level and applied to the document level. Our methods also determine an emotion category by taking the context of a sentence into account. Experiments show that CRF classifiers outperform SVM classifiers. When applying emotion classification to a blog at the document level, the emotion of the last sentence in a document plays an important role in determining the overall emotion.

Journal ArticleDOI
TL;DR: Experiments on a wide range of images show that the ensemble model produces higher detection accuracy than a single CRF and is also competitive with recent results in urban area detection.
Abstract: Because of complex building composition and imaging conditions, urban areas show versatile characteristics in remote sensing optical images. This suggests that multiple features should be utilized to characterize urban areas. On the other hand, since levels of development in neighboring areas are not statistically independent, the features of each urban area site depend on those of neighboring sites. In this paper, we present a multiple conditional random fields (CRFs) ensemble model to incorporate multiple features and learn their contextual information. This model involves two aspects: one is to use a CRF as the base classifier to automatically generate a set of CRFs by changing input features, and the other is to integrate the set of CRFs by defining a conditional distribution. The model has some distinct merits: each CRF component models a kind of feature, so that the ensemble model can learn different aspects of training data. Moreover, it lets the ensemble model search in a wide solution space. The ensemble model can also avoid the well-known overfitting problem of a single CRF, i.e., many features may introduce redundant, irrelevant information and have a counterproductive effect. Experiments on a wide range of images show that our ensemble model produces higher detection accuracy than a single CRF and is also competitive with recent results in urban area detection.

Proceedings ArticleDOI
17 Jun 2007
TL;DR: This paper shows how to train a Gaussian conditional random field (GCRF) model that overcomes this weakness and can outperform the non-convex field of experts model on the task of denoising images.
Abstract: Markov random field (MRF) models are a popular tool for vision and image processing. Gaussian MRF models are particularly convenient to work with because they can be implemented using matrix and linear algebra routines. However, recent research has focused on discrete-valued and non-convex MRF models because Gaussian models tend to over-smooth images and blur edges. In this paper, we show how to train a Gaussian conditional random field (GCRF) model that overcomes this weakness and can outperform the non-convex field of experts model on the task of denoising images. A key advantage of the GCRF model is that the parameters of the model can be optimized efficiently on relatively large images. The competitive performance of the GCRF model and the ease of optimizing its parameters make the GCRF model an attractive option for vision and image processing applications.

Proceedings Article
06 Jan 2007
TL;DR: This paper investigates a novel approach to the first step in Web NER: locating complex named entities in Web text and shows that named entities can be viewed as a species of multiword units, which can be detected by accumulating n-gram statistics over the Web corpus.
Abstract: Named Entity Recognition (NER) is the task of locating and classifying names in text. In previous work, NER was limited to a small number of pre-defined entity classes (e.g., people, locations, and organizations). However, NER on the Web is a far more challenging problem. Complex names (e.g., film or book titles) can be very difficult to pick out precisely from text. Further, the Web contains a wide variety of entity classes, which are not known in advance. Thus, hand-tagging examples of each entity class is impractical. This paper investigates a novel approach to the first step in Web NER: locating complex named entities in Web text. Our key observation is that named entities can be viewed as a species of multiword units, which can be detected by accumulating n-gram statistics over the Web corpus. We show that this statistical method's F1 score is 50% higher than that of supervised techniques including Conditional Random Fields (CRFs) and Conditional Markov Models (CMMs) when applied to complex names. The method also outperforms CMMs and CRFs by 117% on entity classes absent from the training data. Finally, our method outperforms a semi-supervised CRF by 73%.

Proceedings ArticleDOI
17 Jun 2007
TL;DR: A novel motion representation, "motons", inspired by research in object recognition is introduced, and learning the segmentation likelihood from the spatial context of motion is proposed, with the learning efficiently performed by Random Forests.
Abstract: This paper presents an algorithm for the automatic segmentation of monocular videos into foreground and background layers. Correct segmentations are produced even in the presence of large background motion with nearly stationary foreground. There are three key contributions. The first is the introduction of a novel motion representation, "motons", inspired by research in object recognition. Second, we propose learning the segmentation likelihood from the spatial context of motion. The learning is efficiently performed by Random Forests. The third contribution is a general taxonomy of tree-based classifiers, which facilitates theoretical and experimental comparisons of several known classification algorithms, as well as spawning new ones. Diverse visual cues such as motion, motion context, colour, contrast and spatial priors are fused together by means of a conditional random field (CRF) model. Segmentation is then achieved by binary min-cut. Our algorithm requires no initialization. Experiments on many video-chat type sequences demonstrate the effectiveness of our algorithm in a variety of scenes. The segmentation results are comparable to those obtained by stereo systems.

Proceedings Article
03 Dec 2007
TL;DR: This work introduces a CRF based scene labeling model that incorporates both local features and features aggregated over the whole image or large sections of it and shows that effective models can be learned from fragmentary labelings and that incorporating top-down aggregate features significantly improves the segmentations.
Abstract: Conditional Random Fields (CRFs) are an effective tool for a variety of different data segmentation and labeling tasks including visual scene interpretation, which seeks to partition images into their constituent semantic-level regions and assign appropriate class labels to each region. For accurate labeling it is important to capture the global context of the image as well as local information. We introduce a CRF based scene labeling model that incorporates both local features and features aggregated over the whole image or large sections of it. Secondly, traditional CRF learning requires fully labeled datasets which can be costly and troublesome to produce. We introduce a method for learning CRFs from datasets with many unlabeled nodes by marginalizing out the unknown labels so that the log-likelihood of the known ones can be maximized by gradient ascent. Loopy Belief Propagation is used to approximate the marginals needed for the gradient and log-likelihood calculations and the Bethe free-energy approximation to the log-likelihood is monitored to control the step size. Our experimental results show that effective models can be learned from fragmentary labelings and that incorporating top-down aggregate features significantly improves the segmentations. The resulting segmentations are compared to the state-of-the-art on three different image datasets.
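The idea of maximizing the log-likelihood of the known labels while marginalizing out the unknown ones can be sketched on a toy model. The code below is an assumption-laden illustration, not the paper's method: it uses a four-node binary chain with two hypothetical features and exhaustive enumeration in place of Loopy Belief Propagation. The gradient is the difference between feature expectations with the observed labels clamped and feature expectations with all labels free.

```python
import itertools
import math


def features(y, x):
    """Two illustrative features: a unary term and a label-agreement term."""
    unary = sum(xi * yi for xi, yi in zip(x, y))
    pairwise = sum(1.0 for a, b in zip(y, y[1:]) if a == b)
    return (unary, pairwise)


def _log_z_and_expectations(w, x, observed):
    """Enumerate all labelings consistent with `observed` (index -> label)."""
    scores, feats = [], []
    for y in itertools.product((0, 1), repeat=len(x)):
        if any(y[i] != label for i, label in observed.items()):
            continue
        f = features(y, x)
        feats.append(f)
        scores.append(sum(wk * fk for wk, fk in zip(w, f)))
    m = max(scores)  # log-sum-exp for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    probs = [math.exp(s - log_z) for s in scores]
    expected = tuple(
        sum(p * f[k] for p, f in zip(probs, feats)) for k in range(len(w))
    )
    return log_z, expected


def partial_log_likelihood(w, x, observed):
    """Return log p(y_observed | x) and its gradient with respect to w."""
    log_z_clamped, e_clamped = _log_z_and_expectations(w, x, observed)
    log_z_free, e_free = _log_z_and_expectations(w, x, {})
    grad = tuple(c - f for c, f in zip(e_clamped, e_free))
    return log_z_clamped - log_z_free, grad


# Nodes 0 and 3 are labeled; nodes 1 and 2 are marginalized out.
ll, grad = partial_log_likelihood((0.3, -0.2), (1.0, -0.5, 2.0, 0.1), {0: 1, 3: 0})
```

Enumeration is exponential in the number of nodes, which is why a real image-sized model needs an approximate inference routine for the two sets of expectations.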

Proceedings ArticleDOI
28 Oct 2007
TL;DR: Experiments show that the accuracy of expert finding can be significantly improved by using the proposed methods, and the proposed name disambiguation method performs better than the baseline method using unsupervised learning.
Abstract: This paper addresses the extraction of an academic researcher social network. By researcher social network extraction, we aim at finding, extracting, and fusing the 'semantic'-based profiling information of a researcher from the Web. Previously, social network extraction was often undertaken separately in an ad-hoc fashion. This paper first gives a formalization of the entire problem. Specifically, it identifies the 'relevant documents' from the Web by a classifier. It then proposes a unified approach to perform the researcher profiling using conditional random fields (CRF). It integrates publications from the existing bibliography datasets. In the integration, it proposes a constraints-based probabilistic model for name disambiguation. Experimental results on an online system show that the unified approach to researcher profiling significantly outperforms the baseline methods of using rule learning or classification. Experimental results also indicate that our method for name disambiguation performs better than the baseline method using unsupervised learning. The methods have been applied to expert finding. Experiments show that the accuracy of expert finding can be significantly improved by using the proposed methods.

Book ChapterDOI
17 Sep 2007
TL;DR: This work exploits unlabeled data from the target domain to train a model that maximizes likelihood over the training sample while minimizing the distance between the training and target distribution.
Abstract: The goal in domain adaptation is to train a model using labeled data sampled from a domain different from the target domain on which the model will be deployed. We exploit unlabeled data from the target domain to train a model that maximizes likelihood over the training sample while minimizing the distance between the training and target distribution. Our focus is conditional probability models used for predicting a label structure y given input x based on features defined jointly over x and y. We propose practical measures of divergence between the two domains based on which we penalize features with large divergence, while improving the effectiveness of other less deviant correlated features. Empirical evaluation on several real-life information extraction tasks using Conditional Random Fields (CRFs) shows that our method of domain adaptation leads to a significant reduction in error.

Book ChapterDOI
27 Aug 2007
TL;DR: The experimental results demonstrate that the proposed blinking-based liveness detection method for human face using Conditional Random Fields is promising, and outperforms the cascaded Adaboost method and HMM method.
Abstract: This paper presents a blinking-based liveness detection method for human faces using Conditional Random Fields (CRFs). Our method needs only a web camera for capturing video clips. The blinking cue is a passive action and does not require the user to respond to any hint, such as speaking or moving the face. We model blinking activity by CRFs, which accommodate long-range contextual dependencies among the observation sequence. The experimental results demonstrate that the proposed method is promising, and outperforms the cascaded Adaboost method and HMM method.

Proceedings Article
03 Dec 2007
TL;DR: This paper derives an efficient gradient-based method for learning Gaussian regularization priors with multiple hyperparameters for log-linear models, a class of structured prediction probabilistic models which includes conditional random fields (CRFs).
Abstract: In problems where input features have varying amounts of noise, using distinct regularization hyperparameters for different features provides an effective means of managing model complexity. While regularizers for neural networks and support vector machines often rely on multiple hyperparameters, regularizers for structured prediction models (used in tasks such as sequence labeling or parsing) typically rely only on a single shared hyperparameter for all features. In this paper, we consider the problem of choosing regularization hyperparameters for log-linear models, a class of structured prediction probabilistic models which includes conditional random fields (CRFs). Using an implicit differentiation trick, we derive an efficient gradient-based method for learning Gaussian regularization priors with multiple hyperparameters. In both simulations and the real-world task of computational RNA secondary structure prediction, we find that multiple hyperparameter learning can provide a significant boost in accuracy compared to using only a single regularization hyperparameter.
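The implicit differentiation step mentioned above can be written out in a generic form. The notation below is assumed for illustration and may differ from the paper's own parameterization: $L_t$ is the regularized training loss with one precision $\lambda_j$ per weight $w_j$, $L_h$ is a holdout loss, and $H$ is the Hessian of $L_t$ at the optimum.

```latex
\begin{align*}
L_t(w, \lambda) &= -\sum_{(x, y) \in \mathcal{T}} \log p(y \mid x; w)
                   + \frac{1}{2} \sum_j \lambda_j w_j^2,
  \qquad w^*(\lambda) = \arg\min_w L_t(w, \lambda) \\[4pt]
% Differentiate the stationarity condition \nabla_w L_t(w^*(\lambda), \lambda) = 0
% with respect to \lambda_j:
0 &= H \, \frac{\partial w^*}{\partial \lambda_j} + w_j^* e_j,
  \qquad H = \nabla_w^2 L_t(w^*, \lambda) \\[4pt]
\frac{\partial w^*}{\partial \lambda_j} &= -\, w_j^* \, H^{-1} e_j \\[4pt]
\frac{\partial L_h(w^*)}{\partial \lambda_j}
  &= \nabla_w L_h(w^*)^{\top} \frac{\partial w^*}{\partial \lambda_j}
   = -\, w_j^* \left[ H^{-1} \nabla_w L_h(w^*) \right]_j
\end{align*}
```

Here $e_j$ is the $j$-th standard basis vector. The last line shows that a single linear solve with $H$ yields the holdout gradient for every hyperparameter at once, which is what makes tuning many $\lambda_j$ by gradient descent practical.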