scispace - formally typeset
Search or ask a question
Author

João Graça

Bio: João Graça is an academic researcher from INESC-ID. The author has contributed to research in topics: Phrase & Machine translation. The author has an hindex of 15, co-authored 32 publications receiving 1289 citations. Previous affiliations of João Graça include Technical University of Lisbon & Carnegie Mellon University.

Papers
More filters
Journal Article
TL;DR: This work presents an efficient algorithm for learning with posterior regularization and illustrates its versatility on a diverse set of structural constraints such as bijectivity, symmetry and group sparsity in several large scale experiments, including multi-view learning, cross-lingual dependency grammar induction, unsupervised part-of-speech induction, and bitext word alignment.
Abstract: We present posterior regularization, a probabilistic framework for structured, weakly supervised learning. Our framework efficiently incorporates indirect supervision via constraints on posterior distributions of probabilistic models with latent variables. Posterior regularization separates model complexity from the complexity of structural constraints it is desired to satisfy. By directly imposing decomposable regularization on the posterior moments of latent variables during learning, we retain the computational efficiency of the unconstrained model while ensuring desired constraints hold in expectation. We present an efficient algorithm for learning with posterior regularization and illustrate its versatility on a diverse set of structural constraints such as bijectivity, symmetry and group sparsity in several large scale experiments, including multi-view learning, cross-lingual dependency grammar induction, unsupervised part-of-speech induction, and bitext word alignment.

570 citations

Proceedings Article
12 Jul 2012
TL;DR: This paper shows that it is possible to build POS-taggers exceeding state-of-the-art bilingual methods by using simple hidden Markov models and a freely available and naturally growing resource, the Wiktionary.
Abstract: Despite significant recent work, purely unsupervised techniques for part-of-speech (POS) tagging have not achieved useful accuracies required by many language processing tasks Use of parallel text between resource-rich and resource-poor languages is one source of weak supervision that significantly improves accuracy However, parallel text is not always available and techniques for using it require multiple complex algorithmic steps In this paper we show that we can build POS-taggers exceeding state-of-the-art bilingual methods by using simple hidden Markov models and a freely available and naturally growing resource, the Wiktionary Across eight languages for which we have labeled data to evaluate results, we achieve accuracy that significantly exceeds best unsupervised and parallel text methods We achieve highest accuracy reported for several languages and show that our approach yields better out-of-domain taggers than those trained using fully supervised Penn Treebank

106 citations

Proceedings Article
01 Jun 2007
TL;DR: The error analysis for this task suggests that a primary source of error is differences in annotation guidelines between treebanks, and suspicions are supported by the observation that no team was able to improve target domain performance substantially over a state of the art baseline.
Abstract: We describe some challenges of adaptation in the 2007 CoNLL Shared Task on Domain Adaptation. Our error analysis for this task suggests that a primary source of error is differences in annotation guidelines between treebanks. Our suspicions are supported by the observation that no team was able to improve target domain performance substantially over a state of the art baseline.

87 citations

Proceedings Article
01 Jun 2008
TL;DR: This work proposes and extensively evaluates a simple method for using alignment models to produce alignments better-suited for phrase-based MT systems, and shows significant gains in end-to-end translation systems for six languages pairs used in recent MT competitions.
Abstract: Automatic word alignment is a key step in training statistical machine translation systems. Despite much recent work on word alignment methods, alignment accuracy increases often produce little or no improvements in machine translation quality. In this work we analyze a recently proposed agreement-constrained EM algorithm for unsupervised alignment models. We attempt to tease apart the effects that this simple but effective modification has on alignment precision and recall trade-offs, and how rare and common words are affected across several language pairs. We propose and extensively evaluate a simple method for using alignment models to produce alignments better-suited for phrase-based MT systems, and show significant gains (as measured by BLEU score) in end-to-end translation systems for six languages pairs used in recent MT competitions.

75 citations

Posted Content
TL;DR: In this article, a probabilistic multi-view learning algorithm is proposed for structured and unstructured problems and easily generalizes to partial agreement scenarios, where instances can be factored into multiple views, each of which is nearly sufficent in determining the correct labels.
Abstract: In many machine learning problems, labeled training data is limited but unlabeled data is ample. Some of these problems have instances that can be factored into multiple views, each of which is nearly sufficent in determining the correct labels. In this paper we present a new algorithm for probabilistic multi-view learning which uses the idea of stochastic agreement between views as regularization. Our algorithm works on structured and unstructured problems and easily generalizes to partial agreement scenarios. For the full agreement case, our algorithm minimizes the Bhattacharyya distance between the models of each view, and performs better than CoBoosting and two-view Perceptron on several flat and structured classification problems.

61 citations


Cited by
More filters
Journal ArticleDOI
TL;DR: This survey paper tackles a comprehensive overview of the last update in this field of sentiment analysis with sophisticated categorizations of a large number of recent articles and the illustration of the recent trend of research in the sentiment analysis and its related areas.

2,152 citations

Posted Content
TL;DR: Conditional Random Fields (CRFs) as discussed by the authors are a popular probabilistic method for structured prediction and have seen wide application in natural language processing, computer vision, and bioinformatics.
Abstract: Often we wish to predict a large number of variables that depend on each other as well as on other observed variables. Structured prediction methods are essentially a combination of classification and graphical modeling, combining the ability of graphical models to compactly model multivariate data with the ability of classification methods to perform prediction using large sets of input features. This tutorial describes conditional random fields, a popular probabilistic method for structured prediction. CRFs have seen wide application in natural language processing, computer vision, and bioinformatics. We describe methods for inference and parameter estimation for CRFs, including practical issues for implementing large scale CRFs. We do not assume previous knowledge of graphical modeling, so this tutorial is intended to be useful to practitioners in a wide variety of fields.

785 citations

Journal ArticleDOI
TL;DR: This paper reviews theories developed to understand the properties and behaviors of multi-view learning and gives a taxonomy of approaches according to the machine learning mechanisms involved and the fashions in which multiple views are exploited.
Abstract: Multi-view learning or learning with multiple distinct feature sets is a rapidly growing direction in machine learning with well theoretical underpinnings and great practical success. This paper reviews theories developed to understand the properties and behaviors of multi-view learning and gives a taxonomy of approaches according to the machine learning mechanisms involved and the fashions in which multiple views are exploited. This survey aims to provide an insightful organization of current developments in the field of multi-view learning, identify their limitations, and give suggestions for further research. One feature of this survey is that we attempt to point out specific open problems which can hopefully be useful to promote the research of multi-view machine learning.

782 citations

Proceedings ArticleDOI
07 Dec 2015
TL;DR: This work proposes Constrained CNN (CCNN), a method which uses a novel loss function to optimize for any set of linear constraints on the output space of a CNN, and demonstrates the generality of this new learning framework.
Abstract: We present an approach to learn a dense pixel-wise labeling from image-level tags. Each image-level tag imposes constraints on the output labeling of a Convolutional Neural Network (CNN) classifier. We propose Constrained CNN (CCNN), a method which uses a novel loss function to optimize for any set of linear constraints on the output space (i.e. predicted label distribution) of a CNN. Our loss formulation is easy to optimize and can be incorporated directly into standard stochastic gradient descent optimization. The key idea is to phrase the training objective as a biconvex optimization for linear models, which we then relax to nonlinear deep networks. Extensive experiments demonstrate the generality of our new learning framework. The constrained loss yields state-of-the-art results on weakly supervised semantic image segmentation. We further demonstrate that adding slightly more supervision can greatly improve the performance of the learning algorithm.

649 citations

Book
10 Aug 2012
TL;DR: This survey describes conditional random fields, a popular probabilistic method for structured prediction, and describes methods for inference and parameter estimation for CRFs, including practical issues for implementing large-scale CRFs.
Abstract: Many tasks involve predicting a large number of variables that depend on each other as well as on other observed variables. Structured prediction methods are essentially a combination of classification and graphical modeling. They combine the ability of graphical models to compactly model multivariate data with the ability of classification methods to perform prediction using large sets of input features. This survey describes conditional random fields, a popular probabilistic method for structured prediction. CRFs have seen wide application in many areas, including natural language processing, computer vision, and bioinformatics. We describe methods for inference and parameter estimation for CRFs, including practical issues for implementing large-scale CRFs. We do not assume previous knowledge of graphical modeling, so this survey is intended to be useful to practitioners in a wide variety of fields.

627 citations