Showing papers by "INESC-ID published in 2010"

PDF

Open Access

Journal Article•

Posterior Regularization for Structured Latent Variable Models

[...]

Kuzman Ganchev¹, João Graça², Jennifer Gillenwater², Ben Taskar¹•Institutions (2)

01 Mar 2010-Journal of Machine Learning Research

TL;DR: This work presents an efficient algorithm for learning with posterior regularization and illustrates its versatility on a diverse set of structural constraints such as bijectivity, symmetry and group sparsity in several large scale experiments, including multi-view learning, cross-lingual dependency grammar induction, unsupervised part-of-speech induction, and bitext word alignment.

...read moreread less

Abstract: We present posterior regularization, a probabilistic framework for structured, weakly supervised learning. Our framework efficiently incorporates indirect supervision via constraints on posterior distributions of probabilistic models with latent variables. Posterior regularization separates model complexity from the complexity of structural constraints it is desired to satisfy. By directly imposing decomposable regularization on the posterior moments of latent variables during learning, we retain the computational efficiency of the unconstrained model while ensuring desired constraints hold in expectation. We present an efficient algorithm for learning with posterior regularization and illustrate its versatility on a diverse set of structural constraints such as bijectivity, symmetry and group sparsity in several large scale experiments, including multi-view learning, cross-lingual dependency grammar induction, unsupervised part-of-speech induction, and bitext word alignment.

...read moreread less

570 citations

Proceedings Article•DOI•

A comparison of ray pointing techniques for very large displays

[...]

Ricardo Jota¹, Miguel A. Nacenta², Joaquim Jorge¹, Sheelagh Carpendale², Saul Greenberg² - Show less +1 more•Institutions (2)

INESC-ID¹, University of Calgary²

31 May 2010

TL;DR: This work tests four ray pointing variants on a wall display, and shows that techniques based on 'rotational control' perform better for targeting tasks, and techniques with low parallax are best for tracing tasks.

...read moreread less

Abstract: Ray-pointing techniques are often advocated as a way for people to interact with very large displays from several meters away. We are interested in two factors that can affect ray pointing: the particular technique's control type, and parallax. Consequently, we tested four ray pointing variants on a wall display that covers a large part of the user's field of view. Tasks included horizontal and vertical targeting, and tracing. Our results show that (a) techniques based on 'rotational control' perform better for targeting tasks, and (b) techniques with low parallax are best for tracing tasks. We also show that a Fitts's law analysis based on angles (as opposed to linear distances) better approximates people's ray pointing performance.

...read moreread less

119 citations

Journal Article•DOI•

Identification of Regulatory Modules in Time Series Gene Expression Data Using a Linear Time Biclustering Algorithm

[...]

Sara C. Madeira¹, Miguel C. Teixeira, Isabel Sá-Correia, Arlindo L. Oliveira²•Institutions (2)

University of Beira Interior¹, INESC-ID²

01 Jan 2010-IEEE/ACM Transactions on Computational Biology and Bioinformatics

TL;DR: Results obtained using the transcriptomic expression patterns occurring in Saccharomyces cerevisiae in response to heat stress show the ability of the proposed methodology to extract relevant information compatible with documented biological knowledge but also the utility of using this algorithm in the study of other environmental stresses and of regulatory modules in general.

...read moreread less

Abstract: Although most biclustering formulations are NP-hard, in time series expression data analysis, it is reasonable to restrict the problem to the identification of maximal biclusters with contiguous columns, which correspond to coherent expression patterns shared by a group of genes in consecutive time points. This restriction leads to a tractable problem. We propose an algorithm that finds and reports all maximal contiguous column coherent biclusters in time linear in the size of the expression matrix. The linear time complexity of CCC-Biclustering relies on the use of a discretized matrix and efficient string processing techniques based on suffix trees. We also propose a method for ranking biclusters based on their statistical significance and a methodology for filtering highly overlapping and, therefore, redundant biclusters. We report results in synthetic and real data showing the effectiveness of the approach and its relevance in the discovery of regulatory modules. Results obtained using the transcriptomic expression patterns occurring in Saccharomyces cerevisiae in response to heat stress show not only the ability of the proposed methodology to extract relevant information compatible with documented biological knowledge but also the utility of using this algorithm in the study of other environmental stresses and of regulatory modules in general.

...read moreread less

107 citations

Journal Article•DOI•

Candidate gene prioritization by network analysis of differential expression using machine learning approaches

[...]

Daniela Nitsch¹, Joana P. Gonçalves², Joana P. Gonçalves³, Joana P. Gonçalves¹, Fabian Ojeda¹, Bart De Moor¹, Yves Moreau¹ - Show less +3 more•Institutions (3)

Katholieke Universiteit Leuven¹, Instituto Superior Técnico², INESC-ID³

14 Sep 2010-BMC Bioinformatics

TL;DR: This study has proposed three strategies scoring disease candidate genes relying on network-based machine learning approaches, such as kernel ridge regression, heat kernel, and Arnoldi kernel approximation, and found that the best results were obtained using the Heat Kernel Diffusion Ranking.

...read moreread less

Abstract: Discovering novel disease genes is still challenging for diseases for which no prior knowledge - such as known disease genes or disease-related pathways - is available. Performing genetic studies frequently results in large lists of candidate genes of which only few can be followed up for further investigation. We have recently developed a computational method for constitutional genetic disorders that identifies the most promising candidate genes by replacing prior knowledge by experimental data of differential gene expression between affected and healthy individuals. To improve the performance of our prioritization strategy, we have extended our previous work by applying different machine learning approaches that identify promising candidate genes by determining whether a gene is surrounded by highly differentially expressed genes in a functional association or protein-protein interaction network. We have proposed three strategies scoring disease candidate genes relying on network-based machine learning approaches, such as kernel ridge regression, heat kernel, and Arnoldi kernel approximation. For comparison purposes, a local measure based on the expression of the direct neighbors is also computed. We have benchmarked these strategies on 40 publicly available knockout experiments in mice, and performance was assessed against results obtained using a standard procedure in genetics that ranks candidate genes based solely on their differential expression levels (Simple Expression Ranking). Our results showed that our four strategies could outperform this standard procedure and that the best results were obtained using the Heat Kernel Diffusion Ranking leading to an average ranking position of 8 out of 100 genes, an AUC value of 92.3% and an error reduction of 52.8% relative to the standard procedure approach which ranked the knockout gene on average at position 17 with an AUC value of 83.7%. In this study we could identify promising candidate genes using network based machine learning approaches even if no knowledge is available about the disease or phenotype.

...read moreread less

107 citations

Journal Article•DOI•

Virtual learning intervention to reduce bullying victimization in primary school : a controlled trial

[...]

Maria Sapouna¹, Dieter Wolke¹, Natalie Vannini², Scott Watson³, Sarah Woods³, Wolfgang Schneider², Sibylle Enz⁴, Lynne Hall⁵, Ana Paiva⁶, Elisabeth André⁷, Kerstin Dautenhahn³, Ruth Aylett⁸ - Show less +8 more•Institutions (8)

University of Warwick¹, University of Würzburg², University of Hertfordshire³, University of Bamberg⁴, University of Sunderland⁵, INESC-ID⁶, Augsburg College⁷, Heriot-Watt University⁸

01 Jan 2010-Journal of Child Psychology and Psychiatry

TL;DR: A virtual learning intervention designed to help children experience effective strategies for dealing with bullying had a short-term effect on escaping victimization for a priori identified victims, and ashort-term overall prevention effect for UK children.

...read moreread less

Abstract: Background: Anti-bullying interventions to date have shown limited success in reducing victimization and have rarely been evaluated using a controlled trial design. This study examined the effects of the FearNot! anti-bullying virtual learning intervention on escaping victimization, and reducing overall victimization rates among primary school students using a nonrandomized controlled trial design. The program was designed to enhance the coping skills of children who are known to be, or are likely to be, victimized. Methods: One thousand, one hundred twenty-nine children (mean age 8.9 years) in 27 primary schools across the UK and Germany were assigned to the FearNot! intervention or the waiting control condition. The program consisted of three sessions, each lasting approximately 30 minutes over a three-week period. The participants were assessed on self-report measures of victimization before and one and four weeks after the intervention or the normal curriculum period. Results: In the combined sample, baseline victims in the intervention group were more likely to escape victimization at the first follow-up compared with baseline victims in the control group (adjusted RR, 1.41; 95% CI, 1.02–1.81). A dose–response relationship between the amount of active interaction with the virtual victims and escaping victimization was found (adjusted OR, 1.09; 95% CI, 1.003–1.18). Subsample analyses found a significant effect on escaping victimization only to hold for UK children (adjusted RR, 1.90; CI, 1.23–2.57). UK children in the intervention group experienced decreased victimization rates at the first follow-up compared with controls, even after adjusting for baseline victimization, gender and age (adjusted RR, .60; 95% CI, .36–.93). Conclusions: A virtual learning intervention designed to help children experience effective strategies for dealing with bullying had a short-term effect on escaping victimization for a priori identified victims, and a short-term overall prevention effect for UK children. Keywords: Anti-bullying intervention, victimization, virtual learning, controlled trial. Abbreviation: FearNot!: Fun with Empathic Agents to achieve Novel Outcomes in Teaching. School bullying, defined as intentional and repeated aggression towards weaker peers, is a widespread phenomenon that is most prevalent among primary school children (Olweus, 1993). In particular, bullying victimization is associated with behavior and school adjustment problems, high levels of depression and anxiety, and poor physical health (Arse

...read moreread less

98 citations

Journal Article•DOI•

Search algorithms for the multiple constant multiplications problem: Exact and approximate

[...]

Levent Aksoy¹, Ece Olcay Gunes¹, Paulo Flores²•Institutions (2)

Istanbul Technical University¹, INESC-ID²

01 Aug 2010-Microprocessors and Microsystems

TL;DR: The main contribution of this article is the proposal of an exact depth-first search algorithm that, using lower and upper bound values of the search space for the MCM problem instance, finds the minimum solution consuming less computational resources than the previously proposed exact breadth- first search algorithm.

...read moreread less

97 citations

Journal Article•DOI•

Challenges and trends in the development of a magnetoresistive biochip portable platform

[...]

Verónica C. Martins¹, J. Germano², Filipe A. Cardoso¹, Joana Loureiro¹, Susana Cardoso, Leonel Sousa¹, Leonel Sousa², Moisés Piedade¹, Moisés Piedade², Luís P. Fonseca¹, Paulo P. Freitas¹ - Show less +7 more•Institutions (2)

Instituto Superior Técnico¹, INESC-ID²

01 May 2010-Journal of Magnetism and Magnetic Materials

TL;DR: In this article, the magnetoresistive (MR) biochip concept has emerged a decade ago and since then considerable achievements were made in the field of bio-analytical assays.

...read moreread less

71 citations

Proceedings Article•DOI•

Age and gender classification using fusion of acoustic and prosodic features.

[...]

Hugo Meinedo¹, Isabel Trancoso¹•Institutions (1)

INESC-ID¹

26 Sep 2010

TL;DR: The L2F Age classification system and the Gender classification system are composed respectively by the fusion of four and six individual sub-systems trained with short and long term acoustic and prosodic features, different classification strategies and using four different speech corpora.

...read moreread less

Abstract: This paper presents a description of the INESC-ID Spoken Language Systems Laboratory (L2F) Age and Gender classification system submitted to the INTERSPEECH 2010 Paralinguistic Challenge. The L2F Age classification system and the Gender classification system are composed respectively by the fusion of four and six individual sub-systems trained with short and long term acoustic and prosodic features, different classification strategies (GMM-UBM, MLP and SVM) and using four different speech corpora. The best results obtained by the calibration and linear logistic regression fusion back-end show an absolute improvement of 4.1% on the unweighted accuracy value for the Age and 5.8% for the Gender when compared to the competition baseline systems in the development set.

...read moreread less

65 citations

Proceedings Article•

Sparsity in Dependency Grammar Induction

[...]

Jennifer Gillenwater¹, Kuzman Ganchev¹, João Graça², Fernando Pereira³, Ben Taskar¹ - Show less +1 more•Institutions (3)

University of Pennsylvania¹, INESC-ID², Google³

11 Jul 2010

TL;DR: This work investigates sparsity-inducing penalties on the posterior distributions of parent-child POS tag pairs in the posterior regularization (PR) framework of Graca et al. (2007) and shows that its approach improves on several other state-of-the-art techniques.

...read moreread less

Abstract: A strong inductive bias is essential in unsupervised grammar induction. We explore a particular sparsity bias in dependency grammars that encourages a small number of unique dependency types. Specifically, we investigate sparsity-inducing penalties on the posterior distributions of parent-child POS tag pairs in the posterior regularization (PR) framework of Graca et al. (2007). In experiments with 12 languages, we achieve substantial gains over the standard expectation maximization (EM) baseline, with average improvement in attachment accuracy of 6.3%. Further, our method outperforms models based on a standard Bayesian sparsity-inducing prior by an average of 4.9%. On English in particular, we show that our approach improves on several other state-of-the-art techniques.

...read moreread less

60 citations

Journal Article•DOI•

Cloud-TM: harnessing the cloud with distributed transactional memories

[...]

Paolo Romano¹, Luís Rodrigues¹, Nuno Carvalho¹, João Cachopo¹•Institutions (1)

INESC-ID¹

14 Apr 2010-Operating Systems Review

TL;DR: This paper identifies where existing Distributed Transactional Memory platforms still fail to meet the requirements of the cloud and of its users, and points several open research problems whose solution is deemed as essential to materialize the Cloud-TM vision.

...read moreread less

Abstract: One of the main challenges to harness the potential of Cloud computing is the design of programming models that simplify the development of large-scale parallel applications and that allow ordinary programmers to take full advantage of the computing power and the storage provided by the Cloud, both of which made available, on demand, in a pay-only-forwhat-you-use pricing model.In this paper, we discuss the use of the Transactional Memory programming model in the context of the cloud computing paradigm, which we refer to as Cloud-TM. We identify where existing Distributed Transactional Memory platforms still fail to meet the requirements of the cloud and of its users, and we point several open research problems whose solution we deem as essential to materialize the Cloud-TM vision.

...read moreread less

60 citations

Proceedings Article•DOI•

Analytical modeling of lock-based concurrency control with arbitrary transaction data access patterns

[...]

Pierangelo Di Sanzo, Roberto Palmieri, Bruno Ciciani, Francesco Quaglia, Paolo Romano¹ - Show less +1 more•Institutions (1)

INESC-ID¹

28 Jan 2010

TL;DR: This article presents an accurate analytical model of 2PL concurrency control, which overcomes several limitations of preexisting analytical results and captures relevant features of realistic data access patterns, by taking into account access distributions that depend on transactions' execution phases.

...read moreread less

Abstract: Nowadays the 2-Phase-Locking (2PL) concurrency control algorithm still plays a core rule in the construction of transactional systems (e.g. database systems and transactional memories). Hence, any technique allowing accurate analysis and prediction of the performance of 2PL based systems can be of wide interest and applicability. In this article we present an accurate analytical model of 2PL concurrency control, which overcomes several limitations of preexisting analytical results. In particular our model captures relevant features of realistic data access patterns, by taking into account access distributions that depend on transactions' execution phases. Also, our model provides significantly more accurate performance predictions in heavy contention scenarios, where the number of transactions enqueued due to conflicting lock requests is expected to be non-minimal. The accuracy of our model has been verified against simulation results based on both synthetic data access patterns and patterns derived from the TPC-C benchmark.

...read moreread less

Book•DOI•

Sketch-based Interfaces and Modeling

[...]

Joaquim Jorge¹, Faramarz Samavati²•Institutions (2)

INESC-ID¹, University of Calgary²

16 Dec 2010

TL;DR: This unique text/reference bridges the two complementary research areas of user interaction, and graphical modeling and construction (sketch-based modeling), and discusses the state of the art of this rapidly evolving field of sketch-based interfaces and modeling.

...read moreread less

Abstract: The field of sketch-based interfaces and modeling (SBIM) is concerned with developing methods and techniques to enable users to interact with a computer through sketching - a simple, yet highly expressive medium SBIM blends concepts from computer graphics, human-computer interaction, artificial intelligence, and machine learning Recent improvements in hardware, coupled with new machine learning techniques for more accurate recognition, and more robust depth inferencing techniques for sketch-based modeling, have resulted in an explosion of both sketch-based interfaces and pen-based computing devices Presenting the first coherent, unified overview of SBIM, this unique text/reference bridges the two complementary research areas of user interaction (sketch-based interfaces), and graphical modeling and construction (sketch-based modeling) The book discusses the state of the art of this rapidly evolving field, with contributions from an international selection of experts Also covered are sketch-based systems that allow the user to manipulate and edit existing data - from text, images, 3D shapes, and video - as opposed to modeling from scratch Topics and features: reviews pen/stylus interfaces to graphical applications that avoid reliance on user interface modes; describes systems for diagrammatic sketch recognition, mathematical sketching, and sketch-based retrieval of vector drawings; examines pen-based user interfaces for engineering and educational applications; presents a set of techniques for sketch recognition that rely strictly on spatial information; introduces the Teddy system; a pioneering sketching interface for designing free-form 3D models; investigates a range of advanced sketch-based systems for modeling and designing 3D objects, including complex contours, clothing, and hair-styles; explores methods for modeling from just a single sketch or using only a few strokes This text is an essential resource for researchers, practitioners and graduate students involved in human-factors and user interfaces, interactive computer graphics, and intelligent user interfaces and AI

...read moreread less

The L2F Broadcast News Speech Recognition System

[...]

Hugo Meinedo, Alberto Abad, Thomas Pellegrini, Isabel Trancoso¹, Inesc-Id Lisboa¹ - Show less +1 more•Institutions (1)

INESC-ID¹

01 Jan 2010

TL;DR: This work describes and evaluates the automatic speech recognition systems developed for two Iberian languages, European Portuguese and Spanish and also for Brazilian Portuguese, African Portuguese and English, and their efforts to port it to other languages and to other varieties of Portuguese, namely those spoken in the South American and African continents.

...read moreread less

Abstract: Broadcast news play an important role in our lives provid-ing access to news, information and entertainment. The ex-istence of an automatic transcription is an important mediumthat not only can provide subtitles for inclusion of people withspecial needs or be an advantage on noisy and populated envi-ronments, but also because it enables data search and retrievecapabilities over the multimedia streams. In this work we willdescribe and evaluate the automatic speech recognition systemsdeveloped for two Iberian languages, European Portuguese andSpanish and also for Brazilian Portuguese, African Portugueseand English. The developed systems are fully automatic andcapable to subtitling in real-time Broadcast News stream with avery small delay.Index Terms: Speech Recognition, Broadcast News, Iberianlanguages, Accent, Online processing 1. Introduction The Broadcast News (BN) processing system developed at theSpoken Language Systems Lab of INESC-ID integrates sev-eral core technologies, in a pipeline architecture: jingle detec-tion, audio segmentation, automatic speech recognition, punc-tuation, capitalization, topic segmentation/indexation, summa-rization, and translation. The ﬁrst modules of this system wereoptimized for on-line performance, given their deployment inthe fully automatic speech recognition subtitling system that isrunning on the main news shows of the public TV channel inPortugal (RTP), since March 2008.To our knowledge, the majority of subtitling systems de-scribed in the literature rely on speech-to-text alignment ratherthan full automatic speech recognition [1]. Re-speakers alsoare commonly used to simplify the original speech, and speechrecognition engines are adapted to the captioner voice [2].This paper concerns the third module in the pipeline -speech recognition, emphasizing the most recent improvements,and our efforts to port it to other languages (English and Span-ish), and to other varieties of Portuguese, namely those spokenin the South American and African continents.The development of a system for a new language is a chal-lenging task due to the need of new acoustic training data, vo-cabulary deﬁnition, lexicon generation and language model es-timation [3].The paper starts with a description of the main modules ofour recognition engine, emphasizing the two language indepen-dent components - feature extraction and decoder. The nextthree sections are devoted to the three varieties of Portuguesecovered by our system: the original one (European Portuguese,henceforth designated as EP), Brazilian Portuguese (BP), andAfrican Portuguese (AP). The porting efforts for the other twolanguages (European Spanish and American English) are de-scribed in Sections 6 and 7, respectively. For each of these sec-tions, we shall detail the corpora, vocabulary, and lexical andlanguage model generation, ending with performance results.The ﬁnal section discusses the main advantages and shortcom-ings of these systems, namely in what concerns real time closecaptioning applications.

...read moreread less

Book Chapter•DOI•

A Data Mining Approach for the Detection of High-Risk Breast Cancer Groups

[...]

Orlando Anunciação, Bruno Gomes, Susana Vinga¹, Jorge Gaspar, Arlindo L. Oliveira, José Rueff - Show less +2 more•Institutions (1)

INESC-ID¹

01 Jan 2010

TL;DR: It is shown that it is possible to find statistically significant associations with breast cancer by deriving a decision tree and selecting the best leaf, and permutation tests were used.

...read moreread less

Abstract: It is widely agreed that complex diseases are typically caused by the joint effects of multiple instead of a single genetic variation. These genetic variations may show very little effect individually but strong effect if they occur jointly, a phenomenon known as epistasis or multilocus interaction. In this work, we explore the applicability of decision trees to this problem. A case-control study was performed, composed of 164 controls and 94 cases with 32 SNPs available from the BRCA1, BRCA2 and TP53 genes. There was also information about tobacco and alcohol consumption. We used a Decision Tree to find a group with high-susceptibility of suffering from breast cancer. Our goal was to find one or more leaves with a high percentage of cases and small percentage of controls. To statistically validate the association found, permutation tests were used. We found a high-risk breast cancer group composed of 13 cases and only 1 control, with a Fisher Exact Test value of 9.7×10− 6. After running 10000 permutation tests we obtained a p-value of 0.017. These results show that it is possible to find statistically significant associations with breast cancer by deriving a decision tree and selecting the best leaf.

...read moreread less

Proceedings Article•DOI•

On Computing Backbones of Propositional Theories

[...]

Joao Marques-Silva¹, Mikoláš Janota², Inês Lynce²•Institutions (2)

University College Dublin¹, INESC-ID²

04 Aug 2010

TL;DR: This paper investigates algorithms for computing backbones of propositional theories, emphasizing the integration of these algorithms with modern SAT solvers and the experimental results indicate that propositional theory can have large backbones, often representing a significant percentage of the total number of variables.

...read moreread less

Abstract: Backbones of propositional theories are literals that are true in every model. Backbones have been used for characterizing the hardness of decision and optimization problems. Moreover, backbones find other applications. For example, backbones are often identified during product configuration. Backbones can also improve the efficiency of solving computational problems related with propositional theories. These include model enumeration, minimal model computation and prime implicant computation. This paper investigates algorithms for computing backbones of propositional theories, emphasizing the integration of these algorithms with modern SAT solvers. Experimental results, obtained on representative problem instances, indicate that the proposed algorithms are effective in practice and can be used for computing the backbones of large propositional theories. In addition, the experimental results indicate that propositional theories can have large backbones, often representing a significant percentage of the total number of variables.

...read moreread less

Book Chapter•DOI•

On the expressive power of primitives for compensation handling

[...]

Ivan Lanese¹, Cátia Vaz², Carla Ferreira³•Institutions (3)

University of Bologna¹, INESC-ID², Universidade Nova de Lisboa³

20 Mar 2010

TL;DR: This work defines an encoding of parallel recovery into static recovery enjoying nice compositionality properties, showing that the two approaches have the same expressive power.

...read moreread less

Abstract: Modern software systems have frequently to face unexpected events, reacting so to reach a consistent state. In the field of concurrent and mobile systems (e.g., for web services) the problem is usually tackled using long running transactions and compensations: activities programmed to recover partial executions of long running transactions. We compare the expressive power of different approaches to the specification of those compensations. We consider (i) static recovery, where the compensation is statically defined together with the transaction, (ii) parallel recovery, where the compensation is dynamically built as parallel composition of compensation elements and (iii) general dynamic recovery, where more refined ways of composing compensation elements are provided. We define an encoding of parallel recovery into static recovery enjoying nice compositionality properties, showing that the two approaches have the same expressive power. We also show that no such encoding of general dynamic recovery into static recovery is possible, i.e. general dynamic recovery is strictly more expressive.

...read moreread less

Proceedings Article•DOI•

Asynchronous lease-based replication of software transactional memory

[...]

Nuno Carvalho¹, Paolo Romano¹, Luís Rodrigues¹•Institutions (1)

INESC-ID¹

29 Nov 2010

TL;DR: Asynchronous Lease Certification (ALC), an innovative STM replication scheme that exploits the notion of asynchronous lease to reduce the replica coordination overhead and shelter transactions from repeated abortions due to conflicts originated on remote nodes is presented.

...read moreread less

Abstract: Software Transactional Memory (STM) systems have emerged as a powerful middleware paradigm for parallel programming. At current date, however, the problem of how to leverage replication to enhance dependability and scalability of STMs is still largely unexplored. In this paper we present Asynchronous Lease Certification (ALC), an innovative STM replication scheme that exploits the notion of asynchronous lease to reduce the replica coordination overhead and shelter transactions from repeated abortions due to conflicts originated on remote nodes. These features allow ALC to achieve up to a tenfold reduction of the commit latency phase in scenarios of low contention when compared with state of the art fault-tolerant replication schemes, and to boost the throughput of longruning transactions by a 4x factor in high conflict scenarios.

...read moreread less

Proceedings Article•DOI•

Apt-pbo: solving the software dependency problem using pseudo-boolean optimization

[...]

Paulo Trezentos¹, Inês Lynce², Arlindo L. Oliveira²•Institutions (2)

ISCTE – University Institute of Lisbon¹, INESC-ID²

20 Sep 2010

TL;DR: This work introduces the "apt-pbo" tool, the first publicly available tool that solves dependencies in a complete and optimal way and devising a way for solving dependencies according to available packages and user preferences.

...read moreread less

Abstract: The installation of software packages depends on the correct resolution of dependencies and conflicts between packages. This problem is NP-complete and, as expected, is a hard task. Moreover, today's technology still does not address this problem in an acceptable way. This paper introduces a new approach to solving the software dependency problem in a Linux environment, devising a way for solving dependencies according to available packages and user preferences. This work introduces the "apt-pbo" tool, the first publicly available tool that solves dependencies in a complete and optimal way.

...read moreread less

Proceedings Article•DOI•

AGGRO: Boosting STM Replication via Aggressively Optimistic Transaction Processing

[...]

Roberto Palmieri¹, Francesco Quaglia¹, Paolo Romano²•Institutions (2)

Sapienza University of Rome¹, INESC-ID²

15 Jul 2010

TL;DR: AGGRO as discussed by the authors is an innovative Optimistic Atomic Broadcast-based (OAB) active replication protocol that aims at maximizing the overlap between communication and processing through a novel AGGRessively Optimistic concurrency control scheme.

...read moreread less

Abstract: Software Transactional Memories (STMs) are emerging as a potentially disruptive programming model. In this paper we are address the issue of how to enhance dependability of STM systems via replication. In particular we present AGGRO, an innovative Optimistic Atomic Broadcast-based (OAB) active replication protocol that aims at maximizing the overlap between communication and processing through a novel AGGRessively Optimistic concurrency control scheme. The key idea underlying AGGRO is to propagate dependencies across uncommitted transactions in a controlled manner, namely according to a serialization order compliant with the optimistic message delivery order provided by the OAB service. Another relevant distinguishing feature of AGGRO is of not requiring a-priori knowledge about read/write sets of transactions, but rather to detect and handle conflicts dynamically, i.e. as soon (and only if) they materialize. Based on a detailed simulation study we show the striking performance gains achievable by AGGRO (up to 6x increase of the maximum sustainable throughput, and 75% response time reduction) compared to literature approaches for active replication of transactional systems.

...read moreread less

Journal Article•DOI•

Learning tractable word alignment models with complex constraints

[...]

João Graça¹, Kuzman Ganchev², Ben Taskar²•Institutions (2)

INESC-ID¹, University of Pennsylvania²

01 Sep 2010-Computational Linguistics

TL;DR: This article uses the Posterior Regularization framework to incorporate complex constraints into probabilistic models during learning without changing the efficiency of the underlying model, and presents an efficient learning algorithm for incorporating approximate bijectivity and symmetry constraints.

...read moreread less

Abstract: Word-level alignment of bilingual text is a critical resource for a growing variety of tasks. Probabilistic models for word alignment present a fundamental trade-off between richness of captured constraints and correlations versus efficiency and tractability of inference. In this article, we use the Posterior Regularization framework (Graca, Ganchev, and Taskar 2007) to incorporate complex constraints into probabilistic models during learning without changing the efficiency of the underlying model. We focus on the simple and tractable hidden Markov model, and present an efficient learning algorithm for incorporating approximate bijectivity and symmetry constraints. Models estimated with these constraints produce a significant boost in performance as measured by both precision and recall of manually annotated alignments for six language pairs. We also report experiments on two different tasks where word alignments are required: phrase-based machine translation and syntax transfer, and show promising improvements over standard methods.

...read moreread less

Proceedings Article•DOI•

Low-sensitivity to process variations aging sensor for automotive safety-critical applications

[...]

J. C. Vazquez¹, Victor Champac², Adriel Ziesemer¹, Ricardo Reis³, Isabel C. Teixeira¹, Márcia de Lourdes Bezerra dos Santos¹, João Paulo Teixeira¹ - Show less +3 more•Institutions (3)

INESC-ID¹, National Institute of Astrophysics, Optics and Electronics², Universidade Federal do Rio Grande do Sul³

19 Apr 2010

TL;DR: In this paper, circuit failure prediction by timing degradation is used to monitor semiconductor aging, which is a safety-critical problem in the automotive market and on-chip, on-line aging monitoring is proposed for safe operation.

...read moreread less

Abstract: In this paper, circuit failure prediction by timing degradation is used to monitor semiconductor aging, which is a safety-critical problem in the automotive market. Reliability and variability issues are worsening with device scaling down. For safe operation, we propose on-chip, on-line aging monitoring. A novel aging sensor (to be selectively inserted in key locations in the design and to be activated from time to time) is proposed. The aging sensor is a programmable delay sensor, allowing decision-making for several degrees of severity in the aging process. It detects abnormal delays, regardless of their origin. Hence, it can uncover “normal” aging (namely, due to NBTI) and delay faults due to physical defects activated by long circuit operation. The proposed aging sensor has been optimized to exhibit low sensitivity to PVT (Process, power supply Voltage and Temperature) variations. Moreover, the area overhead of the new architecture is significantly less than the one of other aging sensors presented in the literature. Simulation results with a 65 nm sensor design are presented, ascertaining its usefulness and its low sensitivity, in particular to process variations.

...read moreread less

Proceedings Article•DOI•

Predictive error detection by on-line aging monitoring

[...]

J. C. Vazquez¹, Victor Champac², Adriel Ziesemer¹, Ricardo Reis³, Jorge Semião¹, Isabel C. Teixeira¹, Márcia de Lourdes Bezerra dos Santos¹, João Paulo Teixeira¹ - Show less +4 more•Institutions (3)

INESC-ID¹, National Institute of Astrophysics, Optics and Electronics², Universidade Federal do Rio Grande do Sul³

05 Jul 2010

TL;DR: A predictive error detection methodology, based on monitoring of long-term performance degradation of semiconductor systems, and a programmable aging sensor that is optimized to exhibit low sensitivity to PVT variations.

...read moreread less

Abstract: The purpose of this paper is to present a predictive error detection methodology, based on monitoring of long-term performance degradation of semiconductor systems. Delay variation is used to sense timing degradation due to aging (namely, due to NBTI), or to physical defects activated by long lifetime operation, which may occur in safety-critical systems (automotive, health, space). Error is prevented by detecting critical paths abnormal (but not fatal) propagation delays. A monitoring procedure and a programmable aging sensor are proposed. The sensor is selectively inserted in key locations in the design and can be activated either on user's requirement, or at pre-defined situations (e.g., at power-up). The sensor is optimized to exhibit low sensitivity to PVT (Process, power supply Voltage and Temperature) variations. Sensor limitations are analysed. A new sensor architecture and a sensor insertion algorithm are proposed. Simulation results are presented with a ST 65 nm sensor design.

...read moreread less

Proceedings Article•DOI•

Parallel decimal multipliers using binary multipliers

[...]

Mário P. Véstias¹, Horácio C. Neto¹•Institutions (1)

INESC-ID¹

24 Mar 2010

TL;DR: This paper analyzes the tradeoffs involved in the design of a parallel decimal multiplier, for decimal operands with 8 and 16 digits, using existent coarse-grained embedded binary arithmetic blocks and indicates that the proposed parallel multipliers are very competitive when compared to decimal multipliers implemented with direct manipulation of BCD numbers.

...read moreread less

Abstract: Human-centric applications, like financial and commercial, depend on decimal arithmetic since the results must match exactly those obtained by human calculations. The IEEE-754 2008 standard for floating point arithmetic has definitely recognized the importance of decimal for computer arithmetic. A number of hardware approaches have already been proposed for decimal arithmetic operations, including addition, subtraction, multiplication and division. However, few efforts have been done to develop decimal IP cores able to take advantage of the binary multipliers available in most reconfigurable computing architectures. In this paper, we analyze the tradeoffs involved in the design of a parallel decimal multiplier, for decimal operands with 8 and 16 digits, using existent coarse-grained embedded binary arithmetic blocks. The proposed circuits were implemented in a Xilinx Virtex 4 FPGA. The results indicate that the proposed parallel multipliers are very competitive when compared to decimal multipliers implemented with direct manipulation of BCD numbers.

...read moreread less

Proceedings Article•DOI•

Leveraging parallel nesting in transactional memory

[...]

João Barreto¹, Aleksandar Dragojevic², Paulo Ferreira¹, Rachid Guerraoui², Michal Kapalka² - Show less +1 more•Institutions (2)

INESC-ID¹, École Polytechnique Fédérale de Lausanne²

09 Jan 2010

TL;DR: This paper addresses the intrinsic difficulty behind the support for parallel nesting in transactional memory, and proposes a novel solution that is the first practical solution to meet the lowest theoretical upper bound known for the problem.

...read moreread less

Abstract: Exploiting the emerging reality of affordable multi-core architectures goes through providing programmers with simple abstractions that would enable them to easily turn their sequential programs into concurrent ones that expose as much parallelism as possible. While transactional memory promises to make concurrent programming easy to a wide programmer community, current implementations either disallow nested transactions to run in parallel or do not scale to arbitrary parallel nesting depths. This is an important obstacle to the central goal of transactional memory, as programmers can only start parallel threads in restricted parts of their code.This paper addresses the intrinsic difficulty behind the support for parallel nesting in transactional memory, and proposes a novel solution that, to the best of our knowledge, is the first practical solution to meet the lowest theoretical upper bound known for the problem.Using a synthetic workload configured to test parallel transactions on a multi-core machine, a practical implementation of our algorithm yields substantial speed-ups (up to 22x with 33 threads) relatively to serial nesting, and shows that the time to start and commit transactions, as well as to detect conflicts, is independent of nesting depth.

...read moreread less

Proceedings Article•DOI•

A Machine Learning Approach to Performance Prediction of Total Order Broadcast Protocols

[...]

Maria Couceiro¹, Paolo Romano¹, Luís Rodrigues¹•Institutions (1)

INESC-ID¹

27 Sep 2010

TL;DR: This paper assesses the accuracy and efficiency of alternative machine learning methods including neural networks, support vector machines, and decision tree-based regression models, and proposes two heuristics for the feature selection phase that allow to reduce the execution time of the TOB protocols.

...read moreread less

Abstract: Total Order Broadcast (TOB) is a fundamental building block at the core of a number of strongly consistent, fault-tolerant replication schemes. While it is widely known that the performance of existing TOB algorithms varies greatly depending on the workload and deployment scenarios, the problem of how to forecast their performance in realistic settings is, at current date, still largely unexplored. In this paper we address this problem by exploring the possibility of leveraging on machine learning techniques for building, in a fully decentralized fashion, performance models of TOB protocols. Based on an extensive experimental study considering heterogeneous workloads and multiple TOB protocols, we assess the accuracy and efficiency of alternative machine learning methods including neural networks, support vector machines, and decision tree-based regression models. We propose two heuristics for the feature selection phase, that allow to reduce its execution time up to two orders of magnitude incurring in a very limited loss of prediction accuracy.

...read moreread less

Proceedings Article•DOI•

Improving ASR error detection with non-decoder based features.

[...]

Thomas Pellegrini¹, Isabel Trancoso¹•Institutions (1)

INESC-ID¹

26 Sep 2010

TL;DR: This study reports error detection experiments in large vocabulary automatic speech recognition (ASR) systems, by using statistical classifiers, and explored new features gathered from other knowledge sources than the decoder itself: a binary feature that compares outputs from two different ASR systems (word by word), a feature based on the number of hits of the hypothesized bigrams, obtained by queries entered into a very popular Web search engine.

...read moreread less

Abstract: This study reports error detection experiments in large vocabulary automatic speech recognition (ASR) systems, by using statistical classifiers We explored new features gathered from other knowledge sources than the decoder itself: a binary feature that compares outputs from two different ASR systems (word by word), a feature based on the number of hits of the hypothesized bigrams, obtained by queries entered into a very popular Web search engine, and finally a feature related to automatically infered topics at sentence and word levels Experiments were conducted on a European Portuguese broadcast news corpus The combination of baseline decoder-based features and two of these additional features led to significant improvements, from 1387% to 1216% classification error rate (CER) with a maximum entropy model, and from 1401% to 1239% CER with linear-chain conditional random fields, comparing to a baseline using only decoder-based features

...read moreread less

Book Chapter•DOI•

On the use of audio events for improving video scene segmentation

[...]

Panagiotis Sidiropoulos, Vasileios Mezaris, Ioannis Kompatsiaris, Hugo Meinedo¹, Miguel Bugalho¹, Isabel Trancoso¹ - Show less +2 more•Institutions (1)

INESC-ID¹

12 Apr 2010

TL;DR: This work deals with the problem of automatic temporal segmentation of a video into elementary semantic units known as scenes and builds upon a recently proposed audio-visual scene segmentation approach that involves the construction of multiple scene transition graphs that separately exploit information coming from different modalities.

...read moreread less

Abstract: This work deals with the problem of automatic temporal segmentation of a video into elementary semantic units known as scenes. Its novelty lies in the use of high-level audio information in the form of audio events for the improvement of scene segmentation performance. More specifically, the proposed technique is built upon a recently proposed audio-visual scene segmentation approach that involves the construction of multiple scene transition graphs (STGs) that separately exploit information coming from different modalities. In the extension of the latter approach presented in this work, audio event detection results are introduced to the definition of an audio-based scene transition graph, while a visual-based scene transition graph is also defined independently. The results of these two types of STGs are subsequently combined. The application of the proposed technique to broadcast videos demonstrates the usefulness of audio events for scene segmentation.

...read moreread less

Proceedings Article•

Developing a Deep Linguistic Databank Supporting a Collection of Treebanks: the CINTIL DeepGramBank

[...]

António Branco¹, Francisco Costa¹, João Ricardo Silva¹, Sara Botelho Silveira¹, Sergio Castro¹, Mariana Avelãs¹, Clara Pinto¹, João Graça² - Show less +4 more•Institutions (2)

University of Lisbon¹, INESC-ID²

01 May 2010

TL;DR: This paper reports on the design features, the development conditions and the methodological options of a deep linguistic databank, the CINTIL DeepGramBank, and how such corpus permits to straightforwardly obtain a whole range of past generation annotated corpora, current generation treebanks, and next generation databanks.

...read moreread less

Abstract: Corpora of sentences annotated with grammatical information have been deployed by extending the basic lexical and morphological data with increasingly complex information, such as phrase constituency, syntactic functions, semantic roles, etc. As these corpora grow in size and the linguistic information to be encoded reaches higher levels of sophistication, the utilization of annotation tools and, above all, supporting computational grammars appear no longer as a matter of convenience but of necessity. In this paper, we report on the design features, the development conditions and the methodological options of a deep linguistic databank, the CINTIL DeepGramBank. In this corpus, sentences are annotated with fully fledged linguistically informed grammatical representations that are produced by a deep linguistic processing grammar, thus consistently integrating morphological, syntactic and semantic information. We also report on how such corpus permits to straightforwardly obtain a whole range of past generation annotated corpora (POS, NER and morphology), current generation treebanks (constituency treebanks, dependency banks, propbanks) and next generation databanks (logical form banks) simply by means of a very residual selection/extraction effort to get the appropriate ""views"" exposing the relevant layers of information.

...read moreread less

Proceedings Article•DOI•

Programmable aging sensor for automotive safety-critical applications

[...]

J. C. Vazquez¹, Victor Champac², Isabel C. Teixeira³, Marcelino B. Santos³, João Paulo Teixeira³ - Show less +1 more•Institutions (3)

INESC-ID¹, National Institute of Astrophysics, Optics and Electronics², Instituto Superior Técnico³

08 Mar 2010

TL;DR: The purpose of this paper is to present a novel programmable nanometer aging sensor that allows several levels of circuit failure prediction and exhibits low sensitivity to PVT (Process, power supply Voltage and Temperature) variations.

...read moreread less

Abstract: Electronic systems for safety-critical automotive applications must operate for many years in harsh environments. Reliability issues are worsening with device scaling down, while performance and quality requirements are increasing. One of the key reliability issues is long-term performance degradation due to aging. For safe operation, aging monitoring should be performed on chip, namely using built-in aging sensors (activated from time to time). The purpose of this paper is to present a novel programmable nanometer aging sensor. The proposed aging sensor allows several levels of circuit failure prediction and exhibits low sensitivity to PVT (Process, power supply Voltage and Temperature) variations. Simulation results with a 65 nm sensor design are presented, that ascertain the usefulness of the proposed solution.

...read moreread less

Proceedings Article•DOI•

Context dependent modelling approaches for hybrid speech recognizers.

[...]

Alberto Abad¹, Thomas Pellegrini¹, Isabel Trancoso¹, João Paulo Neto¹•Institutions (1)

INESC-ID¹

26 Sep 2010

TL;DR: Most common triphone state clustering procedures for Gaussian models are compared and applied to a connectionist speech recognizer and developed systems with clustered context-dependent triphones show above 20% relative word error rate reduction compared to a baseline hybrid system.

...read moreread less

Abstract: Speech recognition based on connectionist approaches is one of the most successful alternatives to widespread Gaussian systems. One of the main claims against hybrid recognizers is the increased complexity for context-dependent phone modeling, which is a key aspect in medium to large size vocabulary tasks. In this paper, we investigate the use of context-dependent triphone models in a connectionist speech recognizer. Thus, most common triphone state clustering procedures for Gaussian models are compared and applied to our hybrid recognizer. The developed systems with clustered context-dependent triphones show above 20% relative word error rate reduction compared to a baseline hybrid system in two selected WSJ evaluation test sets. Additionally, the recent porting efforts of the proposed context modelling approaches to a LVCSR system for English Broadcast News transcription are reported. Index Terms: speech recognition, context modeling, connectionist system

...read moreread less

Collapse