Institution

INESC-ID

Nonprofit, Lisbon, Portugal
About: INESC-ID is a nonprofit organization based in Lisbon, Portugal. It is known for research contributions in the topics: Computer science & Context (language use). The organization has 932 authors who have published 2618 publications receiving 37658 citations.


Papers
Book Chapter
08 Oct 2003
TL;DR: A comprehensive comparison of the performance of a number of text categorization methods in two different data sets is presented, in particular, the Vector and Latent Semantic Analysis (LSA) methods, a classifier based on Support Vector Machines (SVM) and the k-Nearest Neighbor variations of the Vector and LSA models.
Abstract: In this paper we present a comprehensive comparison of the performance of a number of text categorization methods in two different data sets. In particular, we evaluate the Vector and Latent Semantic Analysis (LSA) methods, a classifier based on Support Vector Machines (SVM) and the k-Nearest Neighbor variations of the Vector and LSA models.
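The kNN variation of the Vector model compared above can be sketched in a few lines of plain Python. This is an illustrative toy (raw term frequencies, cosine similarity, majority vote), not the paper's implementation or data:

```python
import math
from collections import Counter

def tf_vector(text):
    """Term-frequency vector for a document (the basic Vector model)."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0

def knn_classify(query, labeled_docs, k=3):
    """Label a query by majority vote among its k nearest training documents."""
    q = tf_vector(query)
    ranked = sorted(labeled_docs, key=lambda d: cosine(q, tf_vector(d[0])),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

train = [("the match ended in a draw", "sports"),
         ("the team won the final match", "sports"),
         ("stocks fell as markets closed", "finance"),
         ("the market rallied on earnings", "finance")]
print(knn_classify("the team lost the match", train, k=3))  # sports
```

The LSA variant evaluated in the paper replaces these raw term vectors with their projection onto a low-rank latent space before the same nearest-neighbor vote.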

61 citations

Posted Content
TL;DR: In this article, a probabilistic multi-view learning algorithm is proposed for problems whose instances can be factored into multiple views, each of which is nearly sufficient to determine the correct labels; it works on structured and unstructured problems and generalizes easily to partial agreement scenarios.
Abstract: In many machine learning problems, labeled training data is limited but unlabeled data is ample. Some of these problems have instances that can be factored into multiple views, each of which is nearly sufficient in determining the correct labels. In this paper we present a new algorithm for probabilistic multi-view learning which uses the idea of stochastic agreement between views as regularization. Our algorithm works on structured and unstructured problems and easily generalizes to partial agreement scenarios. For the full agreement case, our algorithm minimizes the Bhattacharyya distance between the models of each view, and performs better than CoBoosting and two-view Perceptron on several flat and structured classification problems.
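The full-agreement regularizer minimizes the Bhattacharyya distance between the two views' label distributions. For discrete distributions this quantity has a closed form, sketched below (a minimal illustration, not the paper's training code):

```python
import math

def bhattacharyya_distance(p, q):
    """D_B(p, q) = -ln(sum_i sqrt(p_i * q_i)) for discrete distributions
    p and q over the same label set."""
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))  # Bhattacharyya coefficient
    return -math.log(bc)

# Two views that agree perfectly incur zero penalty.
print(bhattacharyya_distance([0.5, 0.5], [0.5, 0.5]))
# Views that disagree are pushed back toward agreement by a positive penalty.
print(bhattacharyya_distance([0.9, 0.1], [0.1, 0.9]))
```

Minimizing this distance during training is what encodes the "stochastic agreement between views" regularization described in the abstract.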

61 citations

Proceedings Article
11 Jul 2010
TL;DR: This work investigates sparsity-inducing penalties on the posterior distributions of parent-child POS tag pairs in the posterior regularization (PR) framework of Graca et al. (2007) and shows that its approach improves on several other state-of-the-art techniques.
Abstract: A strong inductive bias is essential in unsupervised grammar induction. We explore a particular sparsity bias in dependency grammars that encourages a small number of unique dependency types. Specifically, we investigate sparsity-inducing penalties on the posterior distributions of parent-child POS tag pairs in the posterior regularization (PR) framework of Graca et al. (2007). In experiments with 12 languages, we achieve substantial gains over the standard expectation maximization (EM) baseline, with average improvement in attachment accuracy of 6.3%. Further, our method outperforms models based on a standard Bayesian sparsity-inducing prior by an average of 4.9%. On English in particular, we show that our approach improves on several other state-of-the-art techniques.
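The sparsity penalty on parent-child POS tag pairs can be illustrated with an L1/L-infinity style norm: sum, over tag pairs, of the largest posterior any single candidate edge assigns to that pair. A small value means few distinct dependency types are ever used with confidence. The numbers below are hypothetical, purely for illustration:

```python
def l1_linf_penalty(edge_posteriors):
    """L1/L-infinity sparsity penalty: for each (parent POS, child POS) pair,
    take the maximum posterior over all candidate edges, then sum the maxima.
    Penalizing this sum encourages few unique dependency types."""
    maxima = {}
    for edge in edge_posteriors:            # one posterior dict per candidate edge
        for pair, prob in edge.items():     # pair = (parent POS, child POS)
            maxima[pair] = max(maxima.get(pair, 0.0), prob)
    return sum(maxima.values())

# Toy posteriors over two candidate edges (hypothetical values).
edges = [{("VERB", "NOUN"): 0.9, ("NOUN", "VERB"): 0.1},
         {("VERB", "NOUN"): 0.7, ("VERB", "ADV"): 0.3}]
print(l1_linf_penalty(edges))  # 0.9 + 0.1 + 0.3 = 1.3
```

In the posterior regularization framework, a penalty of this shape constrains the E-step posteriors rather than the model parameters, which is what distinguishes it from the Bayesian sparsity prior it is compared against.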

60 citations

Journal Article
TL;DR: This paper focuses on the tasks of recovering capitalization and punctuation marks from texts without that information, such as spoken transcripts produced by automatic speech recognition systems, performing both with the same discriminative approach, based on maximum entropy and suitable for on-the-fly usage.
Abstract: This paper focuses on the tasks of recovering capitalization and punctuation marks from texts without that information, such as spoken transcripts produced by automatic speech recognition systems. These two practical rich transcription tasks were performed using the same discriminative approach, based on maximum entropy, suitable for on-the-fly usage. Reported experiments were conducted over both Portuguese and English broadcast news data. Both force-aligned and automatic transcripts were used, allowing us to measure the impact of the speech recognition errors. Capitalized words and named entities are intrinsically related, and are influenced by time variation effects. For that reason, the so-called language dynamics have been addressed for the capitalization task. Language adaptation results indicate, for both languages, that capitalization performance is affected by the temporal distance between the training and testing data. As regards the punctuation task, this paper covers the three most frequent punctuation marks: full stop, comma, and question mark. Different methods were explored for improving the baseline results for full stop and comma. The first uses punctuation information extracted from large written corpora. The second applies different levels of linguistic structure, including lexical, prosodic, and speaker-related features. Comma detection improved significantly with the first method, indicating that it depends more on lexical features. The second method provided even better results, for both languages and both punctuation marks, with the best results achieved for the full stop. As for question marks, there is a small gain, but the differences are not very significant, due to the relatively small number of question marks in the corpora.
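A maximum-entropy capitalization decision reduces to a logistic model over lexical features of each token. The sketch below uses a tiny, hypothetical feature set and hand-picked weights purely to illustrate the shape of such a classifier; it is not the paper's feature set or model:

```python
import math

def features(tokens, i):
    """A small illustrative lexical feature set for token i (hypothetical)."""
    return {
        "word=" + tokens[i]: 1.0,
        "sentence_initial": 1.0 if i == 0 else 0.0,
        "prev=" + (tokens[i - 1] if i > 0 else "<s>"): 1.0,
    }

def p_capitalized(tokens, i, weights):
    """Maximum-entropy (logistic) probability that token i is capitalized."""
    score = sum(weights.get(f, 0.0) * v for f, v in features(tokens, i).items())
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical weights a trained model might assign.
w = {"sentence_initial": 2.0, "word=lisbon": 3.0, "prev=.": 1.5}
asr = "he moved to lisbon last year".split()
print(p_capitalized(asr, 0, w))   # sentence-initial token: high probability
print(p_capitalized(asr, 3, w))   # "lisbon": high probability
print(p_capitalized(asr, 4, w))   # "last": 0.5, no evidence either way
```

Because each token is scored independently from local features, a model of this shape can run on-the-fly over a streaming transcript, which is the property the abstract highlights.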

60 citations

Journal ArticleDOI
TL;DR: This paper identifies where existing Distributed Transactional Memory platforms still fail to meet the requirements of the cloud and of its users, and points out several open research problems whose solution is deemed essential to materialize the Cloud-TM vision.
Abstract: One of the main challenges to harness the potential of Cloud computing is the design of programming models that simplify the development of large-scale parallel applications and that allow ordinary programmers to take full advantage of the computing power and the storage provided by the Cloud, both of which are made available, on demand, in a pay-only-for-what-you-use pricing model. In this paper, we discuss the use of the Transactional Memory programming model in the context of the cloud computing paradigm, which we refer to as Cloud-TM. We identify where existing Distributed Transactional Memory platforms still fail to meet the requirements of the cloud and of its users, and we point out several open research problems whose solution we deem essential to materialize the Cloud-TM vision.
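A distributed platform is well beyond a sketch, but the core Transactional Memory programming model the paper builds on, optimistic execution with validation at commit and retry on conflict, can be illustrated in a few lines of Python. This is a toy local STM, not Cloud-TM's implementation:

```python
import threading

class TVar:
    """A transactional variable: a value plus a version counter."""
    def __init__(self, value):
        self.value, self.version = value, 0

_commit_lock = threading.Lock()  # coarse global commit lock keeps the sketch simple

def atomic(fn, *tvars):
    """Run fn over the current values optimistically; commit only if none of
    the read TVars changed underneath, otherwise retry with fresh values."""
    while True:
        snapshot = [(v.value, v.version) for v in tvars]
        result = fn(*(val for val, _ in snapshot))
        with _commit_lock:
            if all(v.version == ver for v, (_, ver) in zip(tvars, snapshot)):
                for v, new in zip(tvars, result):
                    v.value, v.version = new, v.version + 1
                return result
        # conflict: another transaction committed first; loop and retry

# A transfer between two accounts executes as one indivisible step.
a, b = TVar(100), TVar(0)
atomic(lambda x, y: (x - 30, y + 30), a, b)
print(a.value, b.value)  # 70 30
```

The research problems the paper raises start where this toy stops: replicating such transactions across cloud nodes, tolerating failures, and keeping commit validation cheap at scale.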

60 citations


Authors

Showing all 967 results

Name | H-index | Papers | Citations
João Carvalho | 126 | 1278 | 77017
Jaime G. Carbonell | 72 | 496 | 31267
Chris Dyer | 71 | 240 | 32739
Joao P. S. Catalao | 68 | 1039 | 19348
Muhammad Bilal | 63 | 720 | 14720
Alan W. Black | 61 | 413 | 19215
João Paulo Teixeira | 60 | 636 | 19663
Bhiksha Raj | 51 | 359 | 13064
Joao Marques-Silva | 48 | 289 | 9374
Paulo Flores | 48 | 321 | 7617
Ana Paiva | 47 | 472 | 9626
Miadreza Shafie-khah | 47 | 450 | 8086
Susana Cardoso | 44 | 400 | 7068
Mark J. Bentum | 42 | 226 | 8347
Joaquim Jorge | 41 | 290 | 6366
Network Information
Related Institutions (5)
Carnegie Mellon University | 104.3K papers, 5.9M citations | 88% related
Eindhoven University of Technology | 52.9K papers, 1.5M citations | 88% related
Microsoft | 86.9K papers, 4.1M citations | 88% related
Vienna University of Technology | 49.3K papers, 1.3M citations | 86% related

Performance Metrics
No. of papers from the Institution in previous years
Year | Papers
2023 | 11
2022 | 52
2021 | 96
2020 | 131
2019 | 133
2018 | 126