Institution
INESC-ID
Nonprofit • Lisbon, Portugal
About: INESC-ID is a nonprofit organization based in Lisbon, Portugal. It is known for its research contributions in the topics: Computer science & Context (language use). The organization has 932 authors who have published 2618 publications, receiving 37658 citations.
Topics: Computer science, Context (language use), Field-programmable gate array, Control theory, Adaptive control
Papers published on a yearly basis
Papers
08 Oct 2003
TL;DR: A comprehensive comparison of the performance of a number of text categorization methods in two different data sets is presented, in particular, the Vector and Latent Semantic Analysis (LSA) methods, a classifier based on Support Vector Machines (SVM) and the k-Nearest Neighbor variations of the Vector and LSA models.
Abstract: In this paper we present a comprehensive comparison of the performance of a number of text categorization methods in two different data sets. In particular, we evaluate the Vector and Latent Semantic Analysis (LSA) methods, a classifier based on Support Vector Machines (SVM) and the k-Nearest Neighbor variations of the Vector and LSA models.
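The Vector-model kNN variation evaluated in the paper can be illustrated with plain TF-IDF vectors and cosine similarity. This is a minimal sketch, not the authors' implementation; all function names and the toy corpus are our own:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    # docs: list of token lists. Returns one sparse TF-IDF vector (a dict)
    # per document, using raw term frequency and log inverse document frequency.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        vecs.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vecs

def cosine(a, b):
    # Cosine similarity between two sparse vectors represented as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_classify(query_vec, train_vecs, labels, k=3):
    # Rank training documents by cosine similarity to the query and take a
    # majority vote over the labels of the k nearest neighbors.
    ranked = sorted(range(len(train_vecs)),
                    key=lambda i: cosine(query_vec, train_vecs[i]),
                    reverse=True)
    return Counter(labels[i] for i in ranked[:k]).most_common(1)[0][0]
```

In the pure Vector model the query is compared against every training document in TF-IDF space; the LSA variant would first project these vectors onto a lower-dimensional latent space before the same kNN vote.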
61 citations
TL;DR: In this article, a probabilistic multi-view learning algorithm is proposed for structured and unstructured problems and easily generalizes to partial agreement scenarios, where instances can be factored into multiple views, each of which is nearly sufficient in determining the correct labels.
Abstract: In many machine learning problems, labeled training data is limited but unlabeled data is ample. Some of these problems have instances that can be factored into multiple views, each of which is nearly sufficient in determining the correct labels. In this paper we present a new algorithm for probabilistic multi-view learning which uses the idea of stochastic agreement between views as regularization. Our algorithm works on structured and unstructured problems and easily generalizes to partial agreement scenarios. For the full agreement case, our algorithm minimizes the Bhattacharyya distance between the models of each view, and performs better than CoBoosting and two-view Perceptron on several flat and structured classification problems.
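The Bhattacharyya distance minimized in the full-agreement case has a simple closed form for discrete label distributions. A minimal sketch (function and variable names are illustrative, not from the paper):

```python
import math

def bhattacharyya_distance(p, q):
    # p, q: dicts mapping label -> probability (each summing to 1).
    # The Bhattacharyya coefficient BC = sum_y sqrt(p(y) * q(y)) equals 1
    # when the distributions agree exactly, so the distance -ln(BC) is 0
    # at full agreement and grows as the two views diverge.
    bc = sum(math.sqrt(prob * q.get(label, 0.0)) for label, prob in p.items())
    return -math.log(bc)
```

Minimizing this quantity between the label distributions predicted by the two views is what pushes them toward agreement, which is the regularization idea described in the abstract.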
61 citations
11 Jul 2010
TL;DR: This work investigates sparsity-inducing penalties on the posterior distributions of parent-child POS tag pairs in the posterior regularization (PR) framework of Graca et al. (2007) and shows that its approach improves on several other state-of-the-art techniques.
Abstract: A strong inductive bias is essential in unsupervised grammar induction. We explore a particular sparsity bias in dependency grammars that encourages a small number of unique dependency types. Specifically, we investigate sparsity-inducing penalties on the posterior distributions of parent-child POS tag pairs in the posterior regularization (PR) framework of Graca et al. (2007). In experiments with 12 languages, we achieve substantial gains over the standard expectation maximization (EM) baseline, with average improvement in attachment accuracy of 6.3%. Further, our method outperforms models based on a standard Bayesian sparsity-inducing prior by an average of 4.9%. On English in particular, we show that our approach improves on several other state-of-the-art techniques.
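A sparsity penalty of the kind described here can be expressed as an ℓ1/ℓ∞ norm over edge posteriors grouped by parent-child tag pair. The toy sketch below shows only how such a penalty is computed (the grouping and names are illustrative; the actual PR optimization is more involved):

```python
def l1_linf_penalty(edge_posteriors):
    # edge_posteriors: dict mapping a (parent_tag, child_tag) pair to the
    # list of posterior probabilities that individual edges in the corpus
    # assign to that pair. Taking the max within each group (L-infinity)
    # and summing across groups (L1) means a tag pair contributes its full
    # weight as soon as any edge uses it, so minimizing the penalty drives
    # whole tag pairs to zero, i.e. encourages few unique dependency types.
    return sum(max(probs) for probs in edge_posteriors.values())
```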
60 citations
TL;DR: This paper focuses on the tasks of recovering capitalization and punctuation marks from texts without that information, such as spoken transcripts, produced by automatic speech recognition systems, using the same discriminative approach, based on maximum entropy, suitable for on-the-fly usage.
Abstract: This paper focuses on the tasks of recovering capitalization and punctuation marks from texts without that information, such as spoken transcripts produced by automatic speech recognition systems. These two practical rich transcription tasks were performed using the same discriminative approach, based on maximum entropy, suitable for on-the-fly usage. Reported experiments were conducted over both Portuguese and English broadcast news data. Both force-aligned and automatic transcripts were used, allowing us to measure the impact of the speech recognition errors. Capitalized words and named entities are intrinsically related, and are influenced by time variation effects. For that reason, the so-called language dynamics have been addressed for the capitalization task. Language adaptation results indicate, for both languages, that the capitalization performance is affected by the temporal distance between the training and testing data. Regarding the punctuation task, this paper covers the three most frequent punctuation marks: full stop, comma, and question mark. Different methods were explored for improving the baseline results for full stop and comma. The first uses punctuation information extracted from large written corpora. The second applies different levels of linguistic structure, including lexical, prosodic, and speaker-related features. Comma detection improved significantly with the first method, indicating that it depends more on lexical features. The second method provided even better results, for both languages and both punctuation marks, with the best results achieved mainly for full stop. As for question marks, there is a small gain, but the differences are not very significant, due to the relatively small number of question marks in the corpora.
60 citations
TL;DR: This paper identifies where existing Distributed Transactional Memory platforms still fail to meet the requirements of the cloud and of its users, and points out several open research problems whose solution is deemed essential to materialize the Cloud-TM vision.
Abstract: One of the main challenges to harness the potential of Cloud computing is the design of programming models that simplify the development of large-scale parallel applications and that allow ordinary programmers to take full advantage of the computing power and the storage provided by the Cloud, both of which are made available, on demand, in a pay-only-for-what-you-use pricing model. In this paper, we discuss the use of the Transactional Memory programming model in the context of the cloud computing paradigm, which we refer to as Cloud-TM. We identify where existing Distributed Transactional Memory platforms still fail to meet the requirements of the cloud and of its users, and we point out several open research problems whose solution we deem essential to materialize the Cloud-TM vision.
60 citations
Authors
| Name | H-index | Papers | Citations |
|---|---|---|---|
| João Carvalho | 126 | 1278 | 77017 |
| Jaime G. Carbonell | 72 | 496 | 31267 |
| Chris Dyer | 71 | 240 | 32739 |
| Joao P. S. Catalao | 68 | 1039 | 19348 |
| Muhammad Bilal | 63 | 720 | 14720 |
| Alan W. Black | 61 | 413 | 19215 |
| João Paulo Teixeira | 60 | 636 | 19663 |
| Bhiksha Raj | 51 | 359 | 13064 |
| Joao Marques-Silva | 48 | 289 | 9374 |
| Paulo Flores | 48 | 321 | 7617 |
| Ana Paiva | 47 | 472 | 9626 |
| Miadreza Shafie-khah | 47 | 450 | 8086 |
| Susana Cardoso | 44 | 400 | 7068 |
| Mark J. Bentum | 42 | 226 | 8347 |
| Joaquim Jorge | 41 | 290 | 6366 |