Institution

INESC-ID

Nonprofit, Lisbon, Portugal
About: INESC-ID is a nonprofit organization based in Lisbon, Portugal. It is known for research contributions in the topics: Computer science & Context (language use). The organization has 932 authors who have published 2618 publications receiving 37658 citations.


Papers
Proceedings Article
05 Jun 2013
TL;DR: A new Application-Specific Instruction-set Processor (ASIP) architecture for biological sequence alignment is proposed in this manuscript, which achieves high processing throughput by exploiting both fine- and coarse-grained parallelism.
Abstract: A new Application-Specific Instruction-set Processor (ASIP) architecture for biological sequence alignment is proposed in this manuscript. This architecture achieves high processing throughput by exploiting both fine- and coarse-grained parallelism. The former is achieved by extending the Instruction Set Architecture (ISA) of a synthesizable processor to include multiple specialized SIMD instructions that implement vector-vector and vector-scalar arithmetic, logic, load/store and control operations. Coarse-grained parallelism is achieved by using multiple cores to cooperatively align multiple sequences in a shared-memory architecture, comprising proper hardware-specific synchronization mechanisms. To ease programming, a compilation framework based on an adaptation of the GCC back-end was also implemented. The proposed system was prototyped and evaluated on a Xilinx Virtex-7 FPGA, achieving a 200 MHz working frequency. A sequential and a state-of-the-art SIMD implementation of the Smith-Waterman algorithm were programmed on both the proposed ASIP and an Intel Core i7 processor. When comparing the achieved speedups, it was observed that the proposed ISA achieves a 40x speedup, which contrasts with the 11x speedup provided by SSE2 on the Intel Core i7 processor. The scalability of the multi-core system was also evaluated and shown to be almost linear in the number of cores.

13 citations
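
For readers unfamiliar with the Smith-Waterman algorithm that the ASIP paper above uses as its benchmark, the underlying recurrence can be sketched in plain scalar Python. This is an illustrative sketch only, not the paper's sequential, SIMD, or ASIP implementation; the function name and the scoring values (match +2, mismatch -1, linear gap -1) are assumptions chosen for the example.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local-alignment score of strings a and b.

    Scoring values are illustrative assumptions (linear gap penalty).
    Only the previous row of the score matrix is kept in memory.
    """
    prev = [0] * (len(b) + 1)   # row i-1 of the score matrix
    best = 0
    for ca in a:
        curr = [0]              # column 0 is always 0
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            cell = max(0, diag, up, left)  # 0 restarts the local alignment
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best
```

Each cell depends only on its upper, left, and upper-left neighbours, so cells on the same anti-diagonal are independent of one another. That independence is what vectorized implementations typically exploit, although the specific vectorization used in the paper is not described in the abstract.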

Proceedings Article
01 Aug 2019
TL;DR: A comprehensive study reassessing the effects of combining Dynamic Slicing with Spectrum-based Fault Localization finds that the DS-SFL combination is practical and effective, and encourages new SFL techniques to be evaluated against that optimization.
Abstract: Several approaches have been proposed to reduce debugging costs through automated software fault diagnosis. Dynamic Slicing (DS) and Spectrum-based Fault Localization (SFL) are popular fault diagnosis techniques and are normally seen as complementary. This paper reports on a comprehensive study to reassess the effects of combining DS with SFL. With this combination, components that are often involved in failing but seldom in passing test runs could be located and their suspiciousness reduced. Results show that the DS-SFL combination, coined Tandem-FL, improves diagnostic accuracy by up to 73.7% (13.4% on average). Furthermore, results indicate that the risk of missing faulty statements, which is a key limitation of DS, is not high: DS misses faulty statements in 9% of the 260 cases. To sum up, we found that the DS-SFL combination was practical and effective, and we encourage new SFL techniques to be evaluated against that optimization.

13 citations
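
As a rough illustration of the SFL half of the study above, the sketch below ranks components by how strongly their execution correlates with failing tests. It assumes the Ochiai similarity coefficient, a common choice in the SFL literature that the abstract does not name, and it does not model the dynamic-slicing step or the Tandem-FL combination. The function name and the input layout are hypothetical.

```python
import math

def ochiai_scores(spectra, outcomes):
    """Compute an Ochiai suspiciousness score per component.

    spectra:  list of sets; spectra[i] holds the components test i executed.
    outcomes: list of bools; outcomes[i] is True when test i failed.
    (Both the layout and the use of Ochiai are assumptions for illustration.)
    """
    total_failed = sum(outcomes)
    components = set().union(*spectra)
    scores = {}
    for c in components:
        # ef / ep: failing / passing tests that executed component c
        ef = sum(1 for cov, failed in zip(spectra, outcomes) if failed and c in cov)
        ep = sum(1 for cov, failed in zip(spectra, outcomes) if not failed and c in cov)
        denom = math.sqrt(total_failed * (ef + ep))
        scores[c] = ef / denom if denom else 0.0
    return scores
```

A component executed by every failing test and by no passing test receives the maximum score of 1.0, while components that also appear in passing runs are ranked lower.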

Proceedings Article
11 May 2008
TL;DR: A Rayleigh fading mobile-to-mobile channel simulator based on a modified Karhunen-Loève orthogonal expansion of a complex Gaussian fading process that demonstrates a good agreement with the theory and a slight improvement relative to the IFFT method.
Abstract: This paper presents a Rayleigh fading mobile-to-mobile channel simulator based on a modified Karhunen-Loève orthogonal expansion of a complex Gaussian fading process. The method is similar to the well-known IFFT method but with a different frequency mask. The simulation accuracy is assessed by the computation of power margins (for the autocorrelation) and of Kullback-Leibler divergence (for the envelope probability density function) of the simulated fading process. The results demonstrate a good agreement with the theory and a slight improvement relative to the IFFT method.

13 citations

Proceedings Article
29 Jun 2010
TL;DR: The resulting implementation of FastICA, an ICA algorithm, on a multicore GPU achieved an overall speedup of 55 for estimating 256 independent components, each with 1000 samples, relative to an implementation on a general-purpose processor running at 2 GHz.
Abstract: Several problems in the signal processing field require generating suitable representations of data. One possible form of representation is given by independent component analysis (ICA). The computation of these representations can be quite expensive, especially if large data sizes are used. Over the last few years, graphics processing units (GPUs) have emerged as inexpensive general-purpose computation accelerators. This paper presents an implementation of FastICA, an ICA algorithm, on a multicore GPU. The resulting implementation achieved an overall speedup of 55 for estimating 256 independent components, each with 1000 samples, relative to an implementation on a general-purpose processor running at 2 GHz.

13 citations

Proceedings Article
Nuno Diegues, Paolo Romano
30 Sep 2013
TL;DR: Bumper can boost performance up to 3x in conflict-intensive workloads, while imposing negligible overheads in uncontended scenarios, and is integrated with SCORe, a recent, highly-scalable genuine partial replication protocol.
Abstract: This paper addresses the issue of maximizing the efficiency and scalability of distributed transactional platforms by introducing Bumper, a set of innovative techniques to minimize transaction aborts in high-contention scenarios. At its core, Bumper relies on two key ideas: (1) sparing update transactions from spurious aborts when they access concurrently updated data, by attempting to serialize them in the past via a novel distributed concurrency control scheme that we call Distributed Time-Warping (DTW), and (2) avoiding aborts due to contention hot spots (that cannot be tackled by DTW) via a novel programming abstraction, called delayed actions, which makes it possible to efficiently serialize, in an abort-free fashion, the execution of conflict-prone data manipulations. The techniques used in Bumper can be applied to a wide variety of transactional replication protocols to enhance their performance in contention-intensive workloads. In this paper we show how they can be integrated with SCORe, a recent, highly scalable genuine partial replication protocol. By means of an extensive evaluation using well-known benchmarks and a cluster of 160 nodes, we show that Bumper can boost performance by up to 3x in conflict-intensive workloads, while imposing negligible (2.5%) overheads in uncontended scenarios.

13 citations


Authors

Showing all 967 results

Name                    H-index   Papers   Citations
João Carvalho           126       1278     77017
Jaime G. Carbonell      72        496      31267
Chris Dyer              71        240      32739
Joao P. S. Catalao      68        1039     19348
Muhammad Bilal          63        720      14720
Alan W. Black           61        413      19215
João Paulo Teixeira     60        636      19663
Bhiksha Raj             51        359      13064
Joao Marques-Silva      48        289      9374
Paulo Flores            48        321      7617
Ana Paiva               47        472      9626
Miadreza Shafie-khah    47        450      8086
Susana Cardoso          44        400      7068
Mark J. Bentum          42        226      8347
Joaquim Jorge           41        290      6366

Network Information
Related Institutions (5)
Carnegie Mellon University
104.3K papers, 5.9M citations

88% related

Eindhoven University of Technology
52.9K papers, 1.5M citations

88% related

Microsoft
86.9K papers, 4.1M citations

88% related

Vienna University of Technology
49.3K papers, 1.3M citations

86% related

Performance
Metrics
No. of papers from the Institution in previous years
Year    Papers
2023    11
2022    52
2021    96
2020    131
2019    133
2018    126