Showing papers in "Microprocessors and Microsystems in 2012"


Journal Article
TL;DR: It is found that SpMV is very sensitive to the application of reordering techniques on GPUs, and in most of the cases, reordered matrices outperform the original ones, showing noticeable speedups up to 2.6x.

47 citations


Journal Article
TL;DR: It is established that, like mesh and BFT, MoT can also be applied in designing NoC-based systems; the limitations of MoT and other tree-based topologies for NoC design in current technology are also discussed.

45 citations


Journal Article
TL;DR: New two-step methods of FSM synthesis for PAL-based CPLDs are presented, aiming at area and speed optimization.

35 citations


Journal Article
TL;DR: A hardware implementation of scan-matching genetic SLAM (SMG-SLAM) using a field programmable gate array (FPGA) is presented, showing that it is up to 14.83 times faster than the software algorithm without significant loss in accuracy.

35 citations


Journal Article
TL;DR: This paper presents a novel architecture for implementing multi-layer perceptron (MLP) neural networks on field programmable gate arrays (FPGA) that allows variable degrees of parallelism in order to achieve the best balance between performance and FPGA resources usage.

30 citations


Journal Article
TL;DR: Results show that the presented fault detection technique enhances observability and thus error detection abilities in microprocessor-based systems without requiring modifications on the core architecture.

25 citations


Journal Article
TL;DR: A complete fast low-cost stereo vision system that performs stereo image rectification with tangential and radial distortion removal, computes dense disparity maps using the Sum of Absolute Differences as the dissimilarity metric, and exploits a novel injective consistency check purpose-designed for eliminating unreliable disparity values is proposed.

24 citations
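
For readers unfamiliar with the Sum of Absolute Differences metric named in the summary above, the following is a minimal sketch on a single, already rectified scanline. It is not the paper's system: the function name `sad_disparity` and the parameters `half_window` and `max_disp` are hypothetical, and rectification, distortion removal and the consistency check are all omitted.

```python
def sad_disparity(left, right, x, half_window=1, max_disp=4):
    """Return the disparity at column x of one scanline that minimizes SAD.

    left and right are sequences of pixel intensities from the same row of
    a rectified stereo pair (an assumption of this sketch).
    """
    best_d, best_cost = 0, None
    lo, hi = x - half_window, x + half_window
    for d in range(max_disp + 1):
        # Skip candidates whose window would fall outside either scanline.
        if lo - d < 0 or hi >= len(left):
            continue
        # Sum of absolute differences between the left window and the
        # right window shifted by the candidate disparity d.
        cost = sum(abs(left[i] - right[i - d]) for i in range(lo, hi + 1))
        if best_cost is None or cost < best_cost:
            best_d, best_cost = d, cost
    return best_d
```

If no candidate window fits inside the scanline, the sketch falls back to a disparity of 0; a real system would flag such pixels as invalid instead.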


Journal Article
TL;DR: This paper deals with the design and FPGA implementation of a receiver for OFDM-based WLAN, particularized for the IEEE 802.11a/g standards, designed to keep the PER loss below 0.5 dB at PER = 10^-2 with 64-QAM and error correction.

22 citations


Journal Article
TL;DR: A new deterministic routing algorithm (called TRANC) is proposed that uses only one virtual channel per physical channel in torus NoCs and an algorithmic mapping that enables extracting TRANC-based routing algorithms from existing routing algorithms, which can be both deterministic and adaptive.

20 citations


Journal Article
TL;DR: This work reviews the different CUDA architectures, including Fermi, and optimizes a set of algorithms for each using widely known optimization techniques, to guide developers on the right path towards efficient code optimization.

20 citations


Journal Article
TL;DR: This is the first work to exchange congestion information locally and globally to improve network utilization and transmission quality and the experimental results showed significant improvement in transfer latency, network throughput and power efficiency with moderate hardware cost overhead.

Journal Article
TL;DR: A pheromone tracking strategy is proposed in this paper in order to reduce communication energy in the adaptive tree-based multicast routing method.

Journal Article
TL;DR: A semi-automatic code generation process where the arithmetic operator is identified and generated and its pipeline information is used to reschedule the initial program execution in order to keep the operator's pipeline as "busy" as possible, while minimizing memory access.

Journal Article
TL;DR: A novel multi-objective formulation to consider the thermal and performance constraints in the optimization approach, an efficient Mixed Integer Linear Programming representation of the floorplanning model, and a smooth integration of the MILP model with an accurate thermal modelling of the architecture are proposed.

Journal Article
TL;DR: A hardware architecture is developed that significantly accelerates the execution performance of the PSO algorithm and is shown to be modular, flexible and reusable for solving different optimization problems.

Journal Article
TL;DR: An evolvable algorithm has been developed providing the ability to generate the netlist of the requested CNN in any desired dimension through a very simple procedure, which greatly simplifies the network design process, without the requirement of any relative design knowledge.

Journal Article
TL;DR: This paper presents an object-oriented approach to cope with the HW/SW integration problem in SoCs using the Object-Oriented Communication Engine, a system-level middleware particularly designed for SoCs which provides a high-level and homogeneous view of the system components based on the Distributed Object paradigm.

Journal Article
TL;DR: These techniques exploit pixel equality and similarity in a video frame by performing a small number of comparisons among pixels used in prediction equations before the intra prediction process to reduce the amount of computations performed by H.264 intra prediction hardware.

Journal Article
TL;DR: The performance analysis and comparison of 2x4 network on chip (NoC) topologies shows that the 2D Torus topology can achieve higher throughput and lower average network latency while occupying fewer resources.

Journal Article
TL;DR: Low-power multi-rate decoder hardware for the low density parity check (LDPC) codes used in the IEEE 802.11n wireless Local Area Network standard is presented, and two novel techniques, sub-matrix reordering and differential shifting, are proposed for reducing the power consumption.

Journal Article
TL;DR: This work presents a new heterogeneous tree-based ASIF, a modified form of heterogeneous FPGA which is designed to explore the solution space between FPGAs and ASICs, and results show that, on average, the best ASIF generation technique gives 70% area gain when compared to an equivalent FPGA architecture.

Journal Article
TL;DR: A distributed fault-tolerant routing methodology for mesh networks is proposed which supports both static and dynamic fault models, and unlike most previous methods that support dynamic fault models, the presented method is able to tolerate any number of faults with any shape of fault region without disabling healthy nodes.

Journal Article
TL;DR: A novel hardware task model and an optimal 2D surface partitioning strategy for managing a partially run time reconfigurable hardware resource are proposed and an online real time operating system scheduler that supports true hardware multitasking is presented.

Journal Article
TL;DR: A new hybrid Solid-State Disk (SSD) architecture combining Phase-change Memory (PRAM) and NAND Flash memory is proposed to achieve high performance; experimental results show up to 140% performance improvement in write-intensive workloads without endurance problems in PRAM.

Journal Article
TL;DR: Two novel architectures are proposed for multi-modulus adders that support the most common moduli cases in RNS channels, that is, modulo 2^n-1, 2^n and 2^n+1.
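
As a point of reference for the three moduli named in the summary above, the following is a minimal software sketch of addition in each channel. It only illustrates the arithmetic, not the paper's hardware architectures; the function name `mod_add` and the string selector `kind` are hypothetical.

```python
def mod_add(a: int, b: int, n: int, kind: str) -> int:
    """Add a and b in one of the three RNS channels: 2^n-1, 2^n or 2^n+1.

    kind is a hypothetical selector: "2^n-1", "2^n" or "2^n+1".
    Operands are assumed to be already reduced, i.e. 0 <= a, b < modulus.
    """
    moduli = {"2^n-1": (1 << n) - 1, "2^n": 1 << n, "2^n+1": (1 << n) + 1}
    m = moduli[kind]
    if not (0 <= a < m and 0 <= b < m):
        raise ValueError("operands must be reduced modulo the channel modulus")
    s = a + b
    # Because both operands are below m, s < 2*m and a single conditional
    # subtraction is enough to reduce the sum.
    return s - m if s >= m else s
```

For n = 3 the three channel moduli are 7, 8 and 9, so `mod_add(6, 5, 3, "2^n-1")` reduces 11 modulo 7.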

Journal Article
TL;DR: This manuscript presents a new, flexible and scalable hardware accelerator architecture to speedup the implementation of the frequently used Smith-Waterman algorithm, and shows that the proposed approach allows the processing of larger DNA sequences in memory restricted environments.
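
For context on the Smith-Waterman algorithm mentioned in the summary above, the following is a minimal software sketch of the local-alignment score computation. It is not the paper's accelerator: the scoring values (`match`, `mismatch`, and a linear `gap` penalty) are assumptions chosen for illustration, and keeping only two rows of the score matrix is simply a common way to bound memory when just the score is needed.

```python
def smith_waterman_score(s: str, t: str, match: int = 2,
                         mismatch: int = -1, gap: int = -1) -> int:
    """Return the best local-alignment score between s and t."""
    prev = [0] * (len(t) + 1)  # previous row of the score matrix
    best = 0
    for i in range(1, len(s) + 1):
        curr = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            # Score for aligning s[i-1] with t[j-1].
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

This version returns only the score; recovering the alignment itself would require a traceback over the full matrix or a divide-and-conquer scheme.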

Journal Article
TL;DR: An adaptive image compression system on FPGA is presented in which an optimized memory architecture, parallel processing and optimized task scheduling reduce the evolution time while maintaining the quality of compression with respect to existing implementations.

Journal Article
TL;DR: A VLSI-DSP based real-time solution for Digital Scan Conversion (DSC) and speckle reduced imaging (SRI) in ultrasonography (USG) is proposed, and a new interpolation algorithm is proposed to reduce Moire artifacts.

Journal Article
TL;DR: This work proposes, for the first time, a hardware co-processor called UWJSP (Uncertain data Window Join Special co-Processor) for implementation that can achieve an order of magnitude improvement over a software implementation.

Journal Article
TL;DR: This paper adopts the bitmask-based scheme and replaces some of its dictionary entries to achieve greatly reduced power consumption while maintaining a competitive compression ratio.