Institution
INESC-ID
Nonprofit • Lisbon, Portugal
About: INESC-ID is a nonprofit organization based in Lisbon, Portugal. It is known for research contributions in the topics Computer science and Context (language use). The organization has 932 authors who have published 2618 publications receiving 37658 citations.
Topics: Computer science, Context (language use), Field-programmable gate array, Control theory, Adaptive control
Papers
••
05 Jun 2013 TL;DR: A new Application-Specific Instruction-set Processor (ASIP) architecture for biological sequence alignment is proposed in this manuscript, which achieves high processing throughput by exploiting both fine- and coarse-grained parallelism.
Abstract: A new Application-Specific Instruction-set Processor (ASIP) architecture for biological sequence alignment is proposed in this manuscript. This architecture achieves high processing throughput by exploiting both fine- and coarse-grained parallelism. The former is achieved by extending the Instruction Set Architecture (ISA) of a synthesizable processor to include multiple specialized SIMD instructions that implement vector-vector and vector-scalar arithmetic, logic, load/store and control operations. Coarse-grained parallelism is achieved by using multiple cores to cooperatively align multiple sequences in a shared memory architecture, comprising proper hardware-specific synchronization mechanisms. To ease programming, a compilation framework based on an adaptation of the GCC back-end was also implemented. The proposed system was prototyped and evaluated on a Xilinx Virtex-7 FPGA, achieving a 200 MHz working frequency. A sequential and a state-of-the-art SIMD implementation of the Smith-Waterman algorithm were programmed on both the proposed ASIP and an Intel Core i7 processor. When comparing the achieved speedups, it was observed that the proposed ISA achieves a 40x speedup, which contrasts with the 11x speedup provided by SSE2 on the Intel Core i7 processor. The scalability of the multi-core system was also evaluated and proved to scale almost linearly with the number of cores.
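For readers unfamiliar with the Smith-Waterman algorithm named in the abstract, the following is a minimal scalar sketch of its scoring recurrence. It is not the paper's ASIP or SIMD code; the function name, the linear gap penalty, and the scoring values (`match`, `mismatch`, `gap`) are illustrative assumptions.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score of sequences a and b.

    Assumed scoring scheme (not from the paper): linear gap penalty,
    fixed match/mismatch scores.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for ca in a:
        curr = [0]  # column 0 is always 0 for local alignment
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # Clamping at 0 lets an alignment restart anywhere (local alignment).
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("GGACGTCC", "TTACGTAA"))
```

Each cell depends only on its left, upper, and upper-left neighbours, so cells along an anti-diagonal are independent of one another; this is the kind of fine-grained parallelism that SIMD implementations typically exploit.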
13 citations
••
01 Aug 2019 TL;DR: A comprehensive study to reassess the effects of combining Dynamic Slicing with Spectrum-based Fault Localization finds that the DS-SFL combination is practical and effective, and encourages new SFL techniques to be evaluated against that optimization.
Abstract: Several approaches have been proposed to reduce debugging costs through automated software fault diagnosis. Dynamic Slicing (DS) and Spectrum-based Fault Localization (SFL) are popular fault diagnosis techniques and are normally seen as complementary. This paper reports on a comprehensive study to reassess the effects of combining DS with SFL. With this combination, components that are often involved in failing but seldom in passing test runs could be located and their suspiciousness reduced. Results show that the DS-SFL combination, coined Tandem-FL, improves the diagnostic accuracy by up to 73.7% (13.4% on average). Furthermore, results indicate that the risk of missing faulty statements, which is a key limitation of DS, is not high: DS misses faulty statements in 9% of the 260 cases. To sum up, we found that the DS-SFL combination was practical and effective, and we encourage new SFL techniques to be evaluated against that optimization.
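As a rough illustration of how SFL ranks components and how a dynamic slice might narrow that ranking, here is a minimal sketch. The abstract does not specify a similarity coefficient or the exact combination procedure: the Ochiai coefficient, the function names, and the slice-filtering step are all assumptions made for this example, not the paper's Tandem-FL implementation.

```python
import math


def ochiai_ranking(coverage, outcomes):
    """Rank components by Ochiai suspiciousness (assumed coefficient).

    coverage: list of sets, the components executed by each test run.
    outcomes: list of bools, True if the corresponding test passed.
    """
    total_failed = sum(1 for passed in outcomes if not passed)
    scores = {}
    for comp in set().union(*coverage):
        runs = [passed for cov, passed in zip(coverage, outcomes) if comp in cov]
        failed_hits = sum(1 for passed in runs if not passed)
        denom = math.sqrt(total_failed * len(runs))
        scores[comp] = failed_hits / denom if denom else 0.0
    # Most suspicious first; ties broken by name for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def restrict_to_slice(ranking, dynamic_slice):
    """Hypothetical DS step: keep only components in the failing run's slice."""
    return [(comp, score) for comp, score in ranking if comp in dynamic_slice]


if __name__ == "__main__":
    coverage = [{"a", "b"}, {"a", "c"}, {"c"}]
    outcomes = [False, True, True]  # only the first run fails
    ranking = ochiai_ranking(coverage, outcomes)
    print(ranking)
    print(restrict_to_slice(ranking, {"b", "c"}))
```

Filtering by the slice removes components that were executed but could not have influenced the failure, which is also where the risk mentioned in the abstract comes from: a faulty statement absent from the slice would be dropped.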
13 citations
••
11 May 2008 TL;DR: A Rayleigh fading mobile-to-mobile channel simulator based on a modified Karhunen-Loeve orthogonal expansion of a complex Gaussian fading process that demonstrates a good agreement with the theory and a slight improvement relative to the IFFT method.
Abstract: This paper presents a Rayleigh fading mobile-to-mobile channel simulator based on a modified Karhunen-Loeve orthogonal expansion of a complex Gaussian fading process. The method is similar to the well-known IFFT method but with a different frequency mask. The simulation accuracy is assessed by the computation of power margins (for the autocorrelation) and of Kullback-Leibler divergence (for the envelope probability density function) of the simulated fading process. The results demonstrate a good agreement with the theory and a slight improvement relative to the IFFT method.
13 citations
••
29 Jun 2010 TL;DR: The resulting implementation of FastICA, an ICA algorithm, on a multicore GPU achieved an overall speedup of 55 for estimating 256 independent components, each with 1000 samples, relative to an implementation on a general-purpose processor running at 2 GHz.
Abstract: Several problems in the signal processing field require generating suitable representations of data. One possible form of representation is given by independent component analysis (ICA). The computation of these representations can be quite expensive, especially if large data sizes are used. Over the last few years, graphics processing units (GPUs) have emerged as inexpensive general-purpose computation accelerators. This paper presents an implementation of FastICA, an ICA algorithm, on a multicore GPU. The resulting implementation achieved an overall speedup of 55 for estimating 256 independent components, each with 1000 samples, relative to an implementation on a general-purpose processor running at 2 GHz.
13 citations
••
30 Sep 2013 TL;DR: Bumper can boost performance up to 3x in conflict-intensive workloads, while imposing negligible overheads in uncontended scenarios, and is integrated with SCORe, a recent, highly scalable genuine partial replication protocol.
Abstract: This paper addresses the issue of maximizing the efficiency and scalability of distributed transactional platforms by introducing Bumper, a set of innovative techniques to minimize aborts of transactions in high-contention scenarios. At its core, Bumper relies on two key ideas: (1) sparing update transactions from spurious aborts when they access concurrently updated data, by attempting to serialize them in the past via a novel distributed concurrency control scheme that we call Distributed Time-Warping (DTW), and (2) avoiding aborts due to contention hot spots (that cannot be tackled by DTW) via a novel programming abstraction, called delayed actions, which makes it possible to efficiently serialize, in an abort-free fashion, the execution of conflict-prone data manipulations. The techniques used in Bumper can be applied to a wide variety of transactional replication protocols to enhance their performance in contention-intensive workloads. In this paper we show how they can be integrated with SCORe, a recent, highly scalable genuine partial replication protocol. By means of an extensive evaluation using well-known benchmarks and a cluster of 160 nodes, we show that Bumper can boost performance up to 3x in conflict-intensive workloads, while imposing negligible (2.5%) overheads in uncontended scenarios.
13 citations
Authors
Showing all 967 results
Name | H-index | Papers | Citations |
---|---|---|---|
João Carvalho | 126 | 1278 | 77017 |
Jaime G. Carbonell | 72 | 496 | 31267 |
Chris Dyer | 71 | 240 | 32739 |
Joao P. S. Catalao | 68 | 1039 | 19348 |
Muhammad Bilal | 63 | 720 | 14720 |
Alan W. Black | 61 | 413 | 19215 |
João Paulo Teixeira | 60 | 636 | 19663 |
Bhiksha Raj | 51 | 359 | 13064 |
Joao Marques-Silva | 48 | 289 | 9374 |
Paulo Flores | 48 | 321 | 7617 |
Ana Paiva | 47 | 472 | 9626 |
Miadreza Shafie-khah | 47 | 450 | 8086 |
Susana Cardoso | 44 | 400 | 7068 |
Mark J. Bentum | 42 | 226 | 8347 |
Joaquim Jorge | 41 | 290 | 6366 |