Institution

INESC-ID

Nonprofit, Lisbon, Portugal
About: INESC-ID is a nonprofit organization based in Lisbon, Portugal. It is known for research contributions in the topics: Computer science & Context (language use). The organization has 932 authors who have published 2618 publications receiving 37658 citations.


Papers
Book Chapter
08 Oct 2003
TL;DR: A comprehensive comparison of the performance of a number of text categorization methods in two different data sets is presented, in particular, the Vector and Latent Semantic Analysis (LSA) methods, a classifier based on Support Vector Machines (SVM) and the k-Nearest Neighbor variations of the Vector and LSA models.
Abstract: In this paper we present a comprehensive comparison of the performance of a number of text categorization methods in two different data sets. In particular, we evaluate the Vector and Latent Semantic Analysis (LSA) methods, a classifier based on Support Vector Machines (SVM) and the k-Nearest Neighbor variations of the Vector and LSA models.
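The kNN variation of the Vector model compared above can be sketched in a few lines of plain Python. This is an illustrative toy (raw term frequencies, cosine similarity, majority vote), not the paper's implementation or data:

```python
import math
from collections import Counter

def tf_vector(text):
    """Term-frequency vector for a document (the basic Vector model)."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0

def knn_classify(query, labeled_docs, k=3):
    """Label a query by majority vote among its k nearest training documents."""
    q = tf_vector(query)
    ranked = sorted(labeled_docs, key=lambda d: cosine(q, tf_vector(d[0])),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

train = [("the match ended in a draw", "sports"),
         ("the team won the final match", "sports"),
         ("stocks fell as markets closed", "finance"),
         ("the market rallied on earnings", "finance")]
print(knn_classify("the team lost the match", train, k=3))  # sports
```

The LSA variant evaluated in the paper replaces these raw term vectors with their projection onto a low-rank latent space before the same nearest-neighbor vote.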

61 citations

Posted Content
TL;DR: In this article, a probabilistic multi-view learning algorithm is proposed for problems whose instances can be factored into multiple views, each of which is nearly sufficient to determine the correct labels; it works on structured and unstructured problems and generalizes easily to partial agreement scenarios.
Abstract: In many machine learning problems, labeled training data is limited but unlabeled data is ample. Some of these problems have instances that can be factored into multiple views, each of which is nearly sufficient in determining the correct labels. In this paper we present a new algorithm for probabilistic multi-view learning which uses the idea of stochastic agreement between views as regularization. Our algorithm works on structured and unstructured problems and easily generalizes to partial agreement scenarios. For the full agreement case, our algorithm minimizes the Bhattacharyya distance between the models of each view, and performs better than CoBoosting and two-view Perceptron on several flat and structured classification problems.
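The full-agreement regularizer minimizes the Bhattacharyya distance between the two views' label distributions. For discrete distributions this quantity has a closed form, sketched below (a minimal illustration, not the paper's training code):

```python
import math

def bhattacharyya_distance(p, q):
    """D_B(p, q) = -ln(sum_i sqrt(p_i * q_i)) for discrete distributions
    p and q over the same label set."""
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))  # Bhattacharyya coefficient
    return -math.log(bc)

# Two views that agree perfectly incur zero penalty.
print(bhattacharyya_distance([0.5, 0.5], [0.5, 0.5]))
# Views that disagree are pushed back toward agreement by a positive penalty.
print(bhattacharyya_distance([0.9, 0.1], [0.1, 0.9]))
```

Minimizing this distance during training is what encodes the "stochastic agreement between views" regularization described in the abstract.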

61 citations

Proceedings Article
11 Jul 2010
TL;DR: This work investigates sparsity-inducing penalties on the posterior distributions of parent-child POS tag pairs in the posterior regularization (PR) framework of Graca et al. (2007) and shows that its approach improves on several other state-of-the-art techniques.
Abstract: A strong inductive bias is essential in unsupervised grammar induction. We explore a particular sparsity bias in dependency grammars that encourages a small number of unique dependency types. Specifically, we investigate sparsity-inducing penalties on the posterior distributions of parent-child POS tag pairs in the posterior regularization (PR) framework of Graca et al. (2007). In experiments with 12 languages, we achieve substantial gains over the standard expectation maximization (EM) baseline, with average improvement in attachment accuracy of 6.3%. Further, our method outperforms models based on a standard Bayesian sparsity-inducing prior by an average of 4.9%. On English in particular, we show that our approach improves on several other state-of-the-art techniques.
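The sparsity penalty on parent-child POS tag pairs can be illustrated with an L1/L-infinity style norm: sum, over tag pairs, of the largest posterior any single candidate edge assigns to that pair. A small value means few distinct dependency types are ever used with confidence. The numbers below are hypothetical, purely for illustration:

```python
def l1_linf_penalty(edge_posteriors):
    """L1/L-infinity sparsity penalty: for each (parent POS, child POS) pair,
    take the maximum posterior over all candidate edges, then sum the maxima.
    Penalizing this sum encourages few unique dependency types."""
    maxima = {}
    for edge in edge_posteriors:            # one posterior dict per candidate edge
        for pair, prob in edge.items():     # pair = (parent POS, child POS)
            maxima[pair] = max(maxima.get(pair, 0.0), prob)
    return sum(maxima.values())

# Toy posteriors over two candidate edges (hypothetical values).
edges = [{("VERB", "NOUN"): 0.9, ("NOUN", "VERB"): 0.1},
         {("VERB", "NOUN"): 0.7, ("VERB", "ADV"): 0.3}]
print(l1_linf_penalty(edges))  # 0.9 + 0.1 + 0.3 = 1.3
```

In the posterior regularization framework, a penalty of this shape constrains the E-step posteriors rather than the model parameters, which is what distinguishes it from the Bayesian sparsity prior it is compared against.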

60 citations

Journal Article
TL;DR: This paper focuses on the tasks of recovering capitalization and punctuation marks from texts without that information, such as spoken transcripts produced by automatic speech recognition systems, performing both with the same discriminative approach, based on maximum entropy and suitable for on-the-fly usage.
Abstract: This paper focuses on the tasks of recovering capitalization and punctuation marks from texts without that information, such as spoken transcripts produced by automatic speech recognition systems. These two practical rich transcription tasks were performed using the same discriminative approach, based on maximum entropy, suitable for on-the-fly usage. Reported experiments were conducted over both Portuguese and English broadcast news data. Both force-aligned and automatic transcripts were used, allowing us to measure the impact of the speech recognition errors. Capitalized words and named entities are intrinsically related, and are influenced by time variation effects. For that reason, the so-called language dynamics have been addressed for the capitalization task. Language adaptation results indicate, for both languages, that capitalization performance is affected by the temporal distance between the training and testing data. As regards the punctuation task, this paper covers the three most frequent punctuation marks: full stop, comma, and question mark. Different methods were explored for improving the baseline results for full stop and comma. The first uses punctuation information extracted from large written corpora. The second applies different levels of linguistic structure, including lexical, prosodic, and speaker-related features. Comma detection improved significantly with the first method, indicating that it depends more on lexical features. The second method provided even better results, for both languages and both punctuation marks, with the best results achieved for the full stop. As for question marks, there is a small gain, but the differences are not very significant, due to the relatively small number of question marks in the corpora.
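A maximum-entropy capitalization decision reduces to a logistic model over lexical features of each token. The sketch below uses a tiny, hypothetical feature set and hand-picked weights purely to illustrate the shape of such a classifier; it is not the paper's feature set or model:

```python
import math

def features(tokens, i):
    """A small illustrative lexical feature set for token i (hypothetical)."""
    return {
        "word=" + tokens[i]: 1.0,
        "sentence_initial": 1.0 if i == 0 else 0.0,
        "prev=" + (tokens[i - 1] if i > 0 else "<s>"): 1.0,
    }

def p_capitalized(tokens, i, weights):
    """Maximum-entropy (logistic) probability that token i is capitalized."""
    score = sum(weights.get(f, 0.0) * v for f, v in features(tokens, i).items())
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical weights a trained model might assign.
w = {"sentence_initial": 2.0, "word=lisbon": 3.0, "prev=.": 1.5}
asr = "he moved to lisbon last year".split()
print(p_capitalized(asr, 0, w))   # sentence-initial token: high probability
print(p_capitalized(asr, 3, w))   # "lisbon": high probability
print(p_capitalized(asr, 4, w))   # "last": 0.5, no evidence either way
```

Because each token is scored independently from local features, a model of this shape can run on-the-fly over a streaming transcript, which is the property the abstract highlights.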

60 citations

Journal ArticleDOI
TL;DR: This paper identifies where existing Distributed Transactional Memory platforms still fail to meet the requirements of the cloud and of its users, and points out several open research problems whose solution is deemed essential to materialize the Cloud-TM vision.
Abstract: One of the main challenges to harness the potential of Cloud computing is the design of programming models that simplify the development of large-scale parallel applications and that allow ordinary programmers to take full advantage of the computing power and the storage provided by the Cloud, both of which are made available, on demand, in a pay-only-for-what-you-use pricing model. In this paper, we discuss the use of the Transactional Memory programming model in the context of the cloud computing paradigm, which we refer to as Cloud-TM. We identify where existing Distributed Transactional Memory platforms still fail to meet the requirements of the cloud and of its users, and we point out several open research problems whose solution we deem essential to materialize the Cloud-TM vision.
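A distributed platform is well beyond a sketch, but the core Transactional Memory programming model the paper builds on, optimistic execution with validation at commit and retry on conflict, can be illustrated in a few lines of Python. This is a toy local STM, not Cloud-TM's implementation:

```python
import threading

class TVar:
    """A transactional variable: a value plus a version counter."""
    def __init__(self, value):
        self.value, self.version = value, 0

_commit_lock = threading.Lock()  # coarse global commit lock keeps the sketch simple

def atomic(fn, *tvars):
    """Run fn over the current values optimistically; commit only if none of
    the read TVars changed underneath, otherwise retry with fresh values."""
    while True:
        snapshot = [(v.value, v.version) for v in tvars]
        result = fn(*(val for val, _ in snapshot))
        with _commit_lock:
            if all(v.version == ver for v, (_, ver) in zip(tvars, snapshot)):
                for v, new in zip(tvars, result):
                    v.value, v.version = new, v.version + 1
                return result
        # conflict: another transaction committed first; loop and retry

# A transfer between two accounts executes as one indivisible step.
a, b = TVar(100), TVar(0)
atomic(lambda x, y: (x - 30, y + 30), a, b)
print(a.value, b.value)  # 70 30
```

The research problems the paper raises start where this toy stops: replicating such transactions across cloud nodes, tolerating failures, and keeping commit validation cheap at scale.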

60 citations


Authors

Showing all 967 results

Name | H-index | Papers | Citations
João Carvalho | 126 | 1278 | 77017
Jaime G. Carbonell | 72 | 496 | 31267
Chris Dyer | 71 | 240 | 32739
Joao P. S. Catalao | 68 | 1039 | 19348
Muhammad Bilal | 63 | 720 | 14720
Alan W. Black | 61 | 413 | 19215
João Paulo Teixeira | 60 | 636 | 19663
Bhiksha Raj | 51 | 359 | 13064
Joao Marques-Silva | 48 | 289 | 9374
Paulo Flores | 48 | 321 | 7617
Ana Paiva | 47 | 472 | 9626
Miadreza Shafie-khah | 47 | 450 | 8086
Susana Cardoso | 44 | 400 | 7068
Mark J. Bentum | 42 | 226 | 8347
Joaquim Jorge | 41 | 290 | 6366
Network Information
Related Institutions (5)
Carnegie Mellon University | 104.3K papers, 5.9M citations | 88% related
Eindhoven University of Technology | 52.9K papers, 1.5M citations | 88% related
Microsoft | 86.9K papers, 4.1M citations | 88% related
Vienna University of Technology | 49.3K papers, 1.3M citations | 86% related

Performance Metrics
No. of papers from the Institution in previous years
Year | Papers
2023 | 11
2022 | 52
2021 | 96
2020 | 131
2019 | 133
2018 | 126