
Showing papers in "ACM Transactions on Embedded Computing Systems" in 2016


Journal ArticleDOI
TL;DR: A survey of energy-aware scheduling algorithms proposed for real-time systems, showing how the proposed solutions evolved to address the evolution of the platform's features and needs from the middle 1990s until today.
Abstract: This article presents a survey of energy-aware scheduling algorithms proposed for real-time systems. The analysis presents the main results starting from the middle 1990s until today, showing how the proposed solutions evolved to address the evolution of the platform's features and needs. The survey first presents a taxonomy to classify the existing approaches for uniprocessor systems, distinguishing them according to the technology exploited for reducing energy consumption, that is, Dynamic Voltage and Frequency Scaling (DVFS), Dynamic Power Management (DPM), or both. Then, the survey discusses the approaches proposed in the literature to deal with the additional problems related to the evolution of computing platforms toward multicore architectures.

147 citations
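A minimal sketch of the classic static DVFS rule that recurs throughout the surveyed uniprocessor literature (the taskset and frequency levels below are invented for illustration, not taken from the paper): under EDF, a periodic taskset remains schedulable at the lowest frequency whose scaled utilization stays at or below one.

```python
# Hedged sketch of static voltage/frequency scaling under EDF, the classic
# starting point of the surveyed DVFS work. Task parameters and frequency
# levels are illustrative.

def static_dvfs_frequency(tasks, f_max, f_levels):
    """tasks: list of (wcet_at_fmax, period); f_levels: sorted ascending."""
    u = sum(c / t for c, t in tasks)          # utilization at f_max
    for f in f_levels:
        if u * f_max / f <= 1.0:              # scaled utilization still <= 1
            return f                          # lowest feasible frequency
    return f_max

print(static_dvfs_frequency([(1.0, 8.0), (2.0, 10.0)], 1000.0,
                            [400.0, 600.0, 800.0, 1000.0]))  # -> 400.0
```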


Journal ArticleDOI
TL;DR: A System-on-Chip perspective is used to show how the CyberPhysical System- on-Chip (CPSoC) exemplar platform achieves self-awareness through a combination of cross-layer sensing, actuation, self-aware adaptations, and online learning.
Abstract: Embedded systems must address a multitude of potentially conflicting design constraints such as resiliency, energy, heat, cost, performance, security, etc., all in the face of highly dynamic operational behaviors and environmental conditions. By incorporating elements of intelligence, the hope is that the resulting “smart” embedded systems will function correctly and within desired constraints in spite of highly dynamic changes in the applications and the environment, as well as in the underlying software/hardware platforms. Since terms related to “smartness” (e.g., self-awareness, self-adaptivity, and autonomy) have been used loosely in many software and hardware computing contexts, we first present a taxonomy of “self-x” terms and use this taxonomy to relate major “smart” software and hardware computing efforts. A major attribute for smart embedded systems is the notion of self-awareness that enables an embedded system to monitor its own state and behavior, as well as the external environment, so as to adapt intelligently. Toward this end, we use a System-on-Chip perspective to show how the CyberPhysical System-on-Chip (CPSoC) exemplar platform achieves self-awareness through a combination of cross-layer sensing, actuation, self-aware adaptations, and online learning. We conclude with some thoughts on open challenges and research directions.

82 citations


Journal ArticleDOI
TL;DR: This article focuses on Intrusion Detection techniques and proposes a new attack-defense game model to detect malicious nodes using a repeated game approach and analyzes and proves the existence of pure Nash equilibrium and mixed Nash equilibrium.
Abstract: Cyber-Physical Embedded Systems (CPESs) are distributed embedded systems integrated with various actuators and sensors. When it comes to the issue of CPES security, the most significant problem is the security of Embedded Sensor Networks (ESNs). With the continuous growth of ESNs, the security of transferring data from sensors to their destinations has become an important research area. Due to the limitations in power, storage, and processing capabilities, existing security mechanisms for wired or wireless networks cannot apply directly to ESNs. Meanwhile, ESNs are likely to be attacked by different kinds of attacks in industrial scenarios. Therefore, there is a need to develop new techniques or modify the current security mechanisms to overcome these problems. In this article, we focus on Intrusion Detection (ID) techniques and propose a new attack-defense game model to detect malicious nodes using a repeated game approach. As a direct consequence of the game model, attackers and defenders make different strategies to achieve optimal payoffs. Importantly, error detection and missing detection are taken into consideration in Intrusion Detection Systems (IDSs), where a game tree model is introduced to solve this problem. In addition, we analyze and prove the existence of pure Nash equilibrium and mixed Nash equilibrium. Simulations show that the proposed model can both reduce energy consumption by up to 50% compared with the existing All Monitor (AM) model and improve the detection rate by up to 10% to 15% compared with the existing Cluster Head (CH) monitor model.

75 citations
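For readers unfamiliar with the game-theoretic machinery, a mixed Nash equilibrium of a 2×2 attack-defense game can be computed from the indifference conditions. This is a generic textbook sketch, not the paper's repeated game or game tree model, and the payoff matrices are invented.

```python
# Hedged sketch: interior mixed Nash equilibrium of a 2x2 attack-defense game
# via the indifference principle. Rows: attacker (attack / no attack);
# columns: defender (monitor / idle). Payoffs are invented for illustration.

def mixed_nash_2x2(A, D):
    """A, D: 2x2 attacker/defender payoff matrices. Returns (p, q) with
    p = Pr[attack], q = Pr[monitor], assuming an interior equilibrium."""
    # Defender mixes q so that the attacker is indifferent between rows.
    q = (A[1][1] - A[0][1]) / (A[0][0] - A[0][1] - A[1][0] + A[1][1])
    # Attacker mixes p so that the defender is indifferent between columns.
    p = (D[1][1] - D[1][0]) / (D[0][0] - D[0][1] - D[1][0] + D[1][1])
    return p, q

A = [[-1.0, 2.0], [0.0, 0.0]]   # attacker: caught / undetected / no attack
D = [[1.0, -2.0], [-0.5, 0.0]]  # defender: catch / miss / wasted / idle
print(mixed_nash_2x2(A, D))     # -> (~0.143, ~0.667)
```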


Journal ArticleDOI
TL;DR: This work proposes two novel, zero-power timekeepers that use remanence decay to measure the time elapsed between power failures, enabling hourglass-like timers that give intermittently powered sensing devices a persistent sense of time.
Abstract: Sensing platforms are becoming batteryless to enable the vision of the Internet of Things, where trillions of devices collect data, interact with each other, and interact with people. However, these batteryless sensing platforms—that rely purely on energy harvesting—are rarely able to maintain a sense of time after a power failure. This makes working with sensor data that is time sensitive especially difficult. We propose two novel, zero-power timekeepers that use remanence decay to measure the time elapsed between power failures. Our approaches compute the elapsed time from the amount of decay of a capacitive device, either on-chip Static Random-Access Memory (SRAM) or a dedicated capacitor. This enables hourglass-like timers that give intermittently powered sensing devices a persistent sense of time. Our evaluation shows that applications using either timekeeper can keep time accurately through power failures as long as 45s with low overhead.

71 citations
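A minimal sketch of the timekeeping idea, assuming an offline-built calibration curve from decay fraction to elapsed time (the calibration points below are made up; the paper derives its mapping from SRAM or capacitor remanence measurements):

```python
# Hedged sketch: on reboot, measure how far the capacitive element decayed
# while unpowered, then invert a calibration curve to estimate the off-time.
# Calibration points are illustrative, not measured.

from bisect import bisect_left

CALIBRATION = [(0.00, 0.0), (0.10, 5.0), (0.30, 15.0),
               (0.60, 30.0), (0.90, 45.0)]   # (decay_fraction, seconds)

def elapsed_time(decay_fraction):
    xs = [d for d, _ in CALIBRATION]
    i = bisect_left(xs, decay_fraction)
    if i == 0:
        return CALIBRATION[0][1]
    if i == len(xs):
        return CALIBRATION[-1][1]            # timer saturates (~45 s here)
    (d0, t0), (d1, t1) = CALIBRATION[i - 1], CALIBRATION[i]
    return t0 + (t1 - t0) * (decay_fraction - d0) / (d1 - d0)  # interpolate

print(elapsed_time(0.45))  # -> 22.5
```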


Journal ArticleDOI
TL;DR: New security issues related to CUDA, which is the most widespread platform for GPU computing, are reported on and details and proofs-of-concept are provided about novel vulnerabilities to which CUDA architectures are subject.
Abstract: Graphics processing units (GPUs) are increasingly common on desktops, servers, and embedded platforms. In this article, we report on new security issues related to CUDA, which is the most widespread platform for GPU computing. In particular, details and proofs-of-concept are provided about novel vulnerabilities to which CUDA architectures are subject. We show how such vulnerabilities can be exploited to cause severe information leakage. As a case study, we experimentally show how to exploit one of these vulnerabilities on a GPU implementation of the AES encryption algorithm. Finally, we also suggest software patches and alternative approaches to tackle the presented vulnerabilities.

71 citations


Journal ArticleDOI
TL;DR: A precise and resilient sensor fusion algorithm is developed that combines the data received from all sensors by taking into account their specified precisions, and it is shown that the precision of the algorithm using history is never worse than the no-history one, while the benefits may be significant.
Abstract: This article focuses on the design of safe and attack-resilient Cyber-Physical Systems (CPS) equipped with multiple sensors measuring the same physical variable. A malicious attacker may be able to disrupt system performance through compromising a subset of these sensors. Consequently, we develop a precise and resilient sensor fusion algorithm that combines the data received from all sensors by taking into account their specified precisions. In particular, we note that in the presence of a shared bus, in which messages are broadcast to all nodes in the network, the attacker’s impact depends on what sensors he has seen before sending the corrupted measurements. Therefore, we explore the effects of communication schedules on the performance of sensor fusion and provide theoretical and experimental results advocating for the use of the Ascending schedule, which orders sensor transmissions according to their precision starting from the most precise. In addition, to improve the accuracy of the sensor fusion algorithm, we consider the dynamics of the system in order to incorporate past measurements at the current time. Possible ways of mapping sensor measurement history are investigated in the article and are compared in terms of the confidence in the final output of the sensor fusion. We show that the precision of the algorithm using history is never worse than the no-history one, while the benefits may be significant. Furthermore, we utilize the complementary properties of the two methods and show that their combination results in a more precise and resilient algorithm. Finally, we validate our approach in simulation and experiments on a real unmanned ground robot.

65 citations
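A minimal sketch of the interval-fusion core (a Marzullo-style sweep, consistent with the setting described but not necessarily the paper's exact algorithm): each sensor reports an interval of width given by its precision, and with at most f compromised sensors, any point covered by at least n − f intervals is consistent with some correct sensor.

```python
# Hedged sketch of fault-tolerant interval fusion: return the smallest
# interval containing every point covered by at least n - f sensor intervals.
# The intervals below are illustrative; the last one plays a corrupted sensor.

def fuse_intervals(intervals, f):
    """intervals: list of (lo, hi); f: max number of corrupted sensors."""
    events = sorted([(lo, +1) for lo, _ in intervals] +
                    [(hi, -1) for _, hi in intervals])
    need, depth, lo, hi = len(intervals) - f, 0, None, None
    for x, delta in events:
        depth += delta
        if delta > 0 and depth == need and lo is None:
            lo = x                 # first point reaching n - f coverage
        if delta < 0 and depth == need - 1:
            hi = x                 # latest point dropping below n - f
    return lo, hi

print(fuse_intervals([(0.0, 2.0), (0.5, 2.5), (1.0, 3.0), (9.0, 9.5)], f=1))
# -> (1.0, 2.0): the outlier (9.0, 9.5) cannot drag the fused interval
```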


Journal ArticleDOI
TL;DR: A Dynamic Key-Length-Based Security Framework (DLSeF) based on a shared key derived from synchronized prime numbers; the key is dynamically updated at short intervals to thwart potential attacks and ensure end-to-end security.
Abstract: Applications in risk-critical domains such as emergency management and industrial control systems need near-real-time stream data processing in large-scale sensing networks. The key problem is how to ensure online end-to-end security (e.g., confidentiality, integrity, and authenticity) of data streams for such applications. We refer to this as an online security verification problem. Existing data security solutions cannot be applied in such applications as they cannot deal with data streams with high-volume and high-velocity data in real time. They introduce a significant buffering delay during security verification, resulting in a requirement for a large buffer size for the stream processing server. To address this problem, we propose a Dynamic Key-Length-Based Security Framework (DLSeF) based on a shared key derived from synchronized prime numbers; the key is dynamically updated at short intervals to thwart potential attacks and ensure end-to-end security. Theoretical analyses and experimental results of the DLSeF framework show that it can significantly improve the efficiency of processing stream data by reducing the security verification time and buffer usage without compromising security.

65 citations
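A loosely hedged sketch of the key-rotation pattern, deliberately generic rather than the paper's synchronized-prime construction: both endpoints derive a short-lived key from a pre-shared secret and a synchronized interval counter, so the key changes at short intervals without any key-exchange traffic.

```python
# Hedged sketch of interval-based key rotation from a synchronized counter.
# A generic stand-in, NOT the DLSeF construction; the shared secret and
# interval length are illustrative.

import hashlib, hmac, time

SHARED_SECRET = b"pre-shared secret"   # established once, out of band
INTERVAL = 5.0                         # seconds between key updates

def current_key(now=None):
    epoch = int((time.time() if now is None else now) // INTERVAL)
    return hmac.new(SHARED_SECRET, str(epoch).encode(),
                    hashlib.sha256).digest()  # both ends derive the same key
```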


Journal ArticleDOI
TL;DR: A reinforcement learning-based runtime manager is proposed that guarantees application-specific performance requirements and controls the POSIX thread allocation and voltage/frequency scaling for energy-efficient thermal management, enabling finer control of temperature in an energy-efficient manner while simultaneously addressing scalability, which is a crucial aspect for multi-/many-core embedded systems.
Abstract: Modern embedded systems execute applications, which interact with the operating system and hardware differently depending on the type of workload. These cross-layer interactions result in wide variations of the chip-wide thermal profile. In this article, a reinforcement learning-based runtime manager is proposed that guarantees application-specific performance requirements and controls the POSIX thread allocation and voltage/frequency scaling for energy-efficient thermal management. This controls three thermal aspects: peak temperature, average temperature, and thermal cycling. Contrary to existing learning-based runtime approaches that optimize energy and temperature individually, the proposed runtime manager is the first approach to combine the two objectives, simultaneously addressing all three thermal aspects. However, determining thread allocation and core frequencies to optimize energy and temperature is an NP-hard problem. This leads to exponential growth in the learning table (significant memory overhead) and a corresponding increase in the exploration time to learn the most appropriate thread allocation and core frequency for a particular application workload. To confine the learning space and to minimize the learning cost, the proposed runtime manager is implemented in a two-stage hierarchy: a heuristic-based thread allocation at a longer time interval to improve thermal cycling, followed by a learning-based hardware frequency selection at a much finer interval to improve average temperature, peak temperature, and energy consumption. This enables finer control of temperature in an energy-efficient manner while simultaneously addressing scalability, which is a crucial aspect for multi-/many-core embedded systems. The proposed hierarchical runtime manager is implemented for Linux running on NVIDIA's Tegra SoC, featuring four ARM Cortex-A15 cores. Experiments conducted with a range of embedded and CPU-intensive applications demonstrate that the proposed runtime manager not only reduces energy consumption by an average of 15% with respect to Linux but also improves all the thermal aspects—average temperature by 14°C, peak temperature by 16°C, and thermal cycling by 54%.

53 citations
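A minimal sketch of the learning stage (tabular Q-learning for frequency selection; the state encoding, action set, and reward shape are illustrative, not the paper's exact formulation):

```python
# Hedged sketch: a tabular Q-learning agent selects a DVFS level for the
# observed (workload, temperature) state and is rewarded for low energy and
# low temperature. Hyperparameters are textbook defaults.

import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
N_LEVELS = 4                                   # number of DVFS steps
Q = defaultdict(lambda: [0.0] * N_LEVELS)

def choose_level(state):
    if random.random() < EPSILON:              # occasional exploration
        return random.randrange(N_LEVELS)
    qs = Q[state]
    return qs.index(max(qs))                   # exploit best known level

def update(state, action, reward, next_state):
    best_next = max(Q[next_state])
    Q[state][action] += ALPHA * (reward + GAMMA * best_next - Q[state][action])
```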


Journal ArticleDOI
TL;DR: A combined online/offline approach, which uses aspects of the two earlier methods along with a real-time reachability computation, also maintains safety, but with significantly less conservatism.
Abstract: The Simplex architecture ensures the safe use of an unverifiable complex/smart controller by using it in conjunction with a verified safety controller and verified supervisory controller (switching logic). This architecture enables the safe use of smart, high-performance, untrusted, and complex control algorithms to enable autonomy without requiring the smart controllers to be formally verified or certified. Simplex incorporates a supervisory controller that will take over control from the unverified complex/smart controller if it misbehaves, switching to the safety controller. The supervisory controller should (1) guarantee that the system never enters an unsafe state (safety), but should also (2) use the complex/smart controller as much as possible (minimize conservatism). The problem of precisely and correctly defining the switching logic of the supervisory controller has previously been considered either using a control-theoretic optimization approach or through an offline hybrid-systems reachability computation. In this work, we show that a combined online/offline approach that uses aspects of the two earlier methods, along with a real-time reachability computation, also maintains safety, but with significantly less conservatism, allowing the complex controller to be used more frequently. We demonstrate the advantages of this unified approach on a saturated inverted pendulum system, in which the verifiable region of attraction is over twice as large compared to the earlier approach. Additionally, to validate the claim that the real-time reachability approach may be implemented on embedded platforms, we have ported and conducted embedded hardware studies using both ARM processors and Atmel AVR microcontrollers. This is the first ever demonstration of a hybrid-systems reachability computation in real time on actual embedded platforms, which required addressing significant technical challenges.

53 citations
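A minimal sketch of the switching logic (the reachability routine, safe-set test, and controllers are stand-ins for the paper's components, not its implementation):

```python
# Hedged sketch of Simplex supervision with online reachability: keep the
# complex controller only while the states reachable under its command
# provably stay inside the safe set. reach_set() stands in for the paper's
# budgeted real-time reachability computation.

def supervisory_step(x, complex_ctrl, safety_ctrl, reach_set, is_safe, horizon):
    u = complex_ctrl(x)
    reachable = reach_set(x, u, horizon)  # over-approximation of future states
    if all(is_safe(s) for s in reachable):
        return u                          # complex controller provably safe
    return safety_ctrl(x)                 # otherwise fall back to safety
```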


Journal ArticleDOI
TL;DR: The security analysis shows that the proposed instances of the Warbler family not only pass the cryptographic statistical tests recommended by the EPC C1 Gen2 standard and NIST but are also resistant to cryptanalytic attacks such as algebraic attacks, cube attacks, time-memory-data tradeoff attacks, and Mihaljević et al.'s attacks.
Abstract: With the advent of ubiquitous computing and the Internet of Things (IoT), the security and privacy issues for various smart devices such as radio-frequency identification (RFID) tags and wireless sensor nodes are receiving increased attention from academia and industry. A number of lightweight cryptographic primitives have been proposed to provide security services for resource-constrained smart devices. As one of the core primitives, a cryptographically secure pseudorandom number generator (PRNG) plays an important role for lightweight embedded applications. Most existing PRNGs proposed for smart devices employ true random number generators as a component, which generally incur significant power consumption and gate count in hardware. In this article, we present the Warbler family, a new pseudorandom number generator family based on nonlinear feedback shift registers (NLFSRs) with desirable randomness properties. The design of the Warbler family is based on the combination of modified de Bruijn blocks together with a nonlinear feedback Welch-Gong (WG) sequence generator, which enables us to precisely characterize the randomness properties and to flexibly adjust the security level of the resulting PRNG. Some criteria for selecting parameters of the Warbler family are proposed to offer the maximum level of security. Two instances of the Warbler family are also described, which feature two different security levels and are dedicated to EPC C1 Gen2 RFID tags and wireless sensor nodes, respectively. The security analysis shows that the proposed instances not only can pass the cryptographic statistical tests recommended by the EPC C1 Gen2 standard and NIST but also are resistant to cryptanalytic attacks such as algebraic attacks, cube attacks, time-memory-data tradeoff attacks, Mihaljević et al.'s attacks, and weak-internal-state and fault injection attacks. Our ASIC implementations using a 65nm CMOS process demonstrate that the proposed two lightweight instances of the Warbler family can achieve good performance in terms of speed and area and provide ideal solutions for securing low-cost smart devices.

47 citations
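For intuition about the underlying primitive, here is a toy nonlinear feedback shift register (NLFSR); the register length, taps, and feedback function are arbitrary and bear no relation to Warbler's actual design.

```python
# Toy NLFSR sketch, NOT Warbler: shift the state each clock, feed back a
# nonlinear Boolean function of selected taps (note the AND term), and emit
# the outgoing bit.

def nlfsr_bits(state, n):
    """state: list of 0/1 bits; yields n output bits."""
    s = list(state)
    for _ in range(n):
        out = s[0]
        fb = s[0] ^ s[3] ^ (s[5] & s[7])   # nonlinearity from the AND tap
        s = s[1:] + [fb]                   # shift, insert feedback bit
        yield out

print(list(nlfsr_bits([1, 0, 1, 1, 0, 0, 1, 0], 16)))
```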


Journal ArticleDOI
TL;DR: This article proposes and benchmarks fault diagnosis methods for this post-quantum cryptography variant through case studies for hash functions and presents simulation and implementation results (through application-specific integrated circuit evaluations) to show the applicability of the presented schemes.
Abstract: Symmetric-key cryptography can resist the potential post-quantum attacks expected with the not-so-faraway advent of quantum computing power. Hash-based, code-based, lattice-based, and multivariate-quadratic equations are all other potential candidates, the merit of which is that they are believed to resist both classical and quantum computers, and applying “Shor’s algorithm”—the quantum-computer discrete-logarithm algorithm that breaks classical schemes—to them is infeasible. In this article, we propose, assess, and benchmark reliable constructions for stateless hash-based signatures. Such architectures are believed to be one of the prominent post-quantum schemes, offering security proofs relative to plausible properties of the hash function; however, it is well known that their confidentiality does not guarantee reliable architectures in the presence of natural and malicious faults. We propose and benchmark fault diagnosis methods for this post-quantum cryptography variant through case studies for hash functions and present simulation and implementation results (through application-specific integrated circuit evaluations) to show the applicability of the presented schemes. The proposed approaches make such hash-based constructions more reliable against natural faults, help protect them against malicious faults, and can be tailored to the available resources and to different reliability objectives.
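A minimal sketch of recomputation-based fault detection, one common pattern in this design space (the paper's diagnosis schemes are more refined, e.g., recomputing with transformed operands; this is not its exact construction):

```python
# Hedged sketch: time-redundant recomputation flags a transient fault when
# the two hash passes disagree. Real schemes recompute with shifted or
# swapped operands to also catch permanent faults.

import hashlib

def hash_with_fault_check(message: bytes) -> bytes:
    first = hashlib.sha256(message).digest()
    second = hashlib.sha256(message).digest()   # redundant recomputation
    if first != second:                         # a fault corrupted one pass
        raise RuntimeError("fault detected: hash recomputation mismatch")
    return first
```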

Journal ArticleDOI
TL;DR: A hierarchical Petri net-based model and social flow are presented to extend the control flow and formally describe the social interactions of multiple users, respectively, and the system-level optimization for CPSS can be achieved by the improved design flow.
Abstract: The design of cyber-physical-social systems (CPSS) is a novel and challenging research field because it emphasizes the deep fusion of cyberspace, physical space, and social space. In this article, we extend our previously proposed system-level design framework [Zeng et al. 2015] to tailor it to the needs of social scenarios with multiple users. A hierarchical Petri net-based model and social flow are presented to extend the control flow and formally describe the social interactions of multiple users, respectively. By using the extended model, the system-level optimization for CPSS can be achieved by the improved design flow. Specifically, object emplacement and user satisfaction are further extended into the social environment. The maximal power estimation algorithm is also improved by leveraging the extended intermediate representation model. Finally, we use a smart office case to demonstrate the feasibility and effectiveness of our improved design approach for multiple users.

Journal ArticleDOI
TL;DR: A Socioecological Service Discovery model for advanced service discovery in M2M communication networks whose devices can self-adapt and self-organize in real time, providing higher flexibility and adaptability and achieving better performance than existing methods in terms of both the number of discovered services and the number of discovered services per message.
Abstract: The new development of embedded systems has the potential to revolutionize our lives and will have a significant impact on future Internet of Things (IoT) systems if required services can be automatically discovered and accessed at runtime in Machine-to-Machine (M2M) communication networks. Performing timely service discovery in a dynamic IoT environment is a crucial task for devices. In this article, we propose a Socioecological Service Discovery (SESD) model for advanced service discovery in M2M communication networks. In the SESD network, each device can perform advanced service search to dynamically resolve complex enquiries, and devices autonomously support and cooperate with each other to quickly discover and self-configure any services available in M2M communication networks, delivering a real-time capability. The proposed model has been systematically evaluated and simulated in a dynamic M2M environment. The experimental results show that SESD devices can self-adapt and self-organize in real time, providing higher flexibility and adaptability, and achieve better performance than existing methods in terms of the number of discovered services as well as better efficiency in terms of the number of discovered services per message.

Journal ArticleDOI
TL;DR: An efficient, multidimensional, big data analytical architecture based on the fusion model is proposed, together with a four-layered network architecture that fulfills its basic requirements, and is shown to efficiently extract various features from the massive volume of satellite data.
Abstract: Machine-to-Machine (M2M) communication is increasingly becoming a world-wide network of interconnected devices, uniquely addressable via standard communication protocols. The prevalence of M2M is bound to generate a massive volume of heterogeneous, multisource, dynamic, and sparse data, which leads a system toward major computational challenges, such as analysis, aggregation, and storage. Moreover, a critical problem is how to efficiently extract useful information from this massive volume of data. Hence, to ensure adequate analysis quality, diverse and voluminous data need to be aggregated and fused, and it is imperative to enhance the computational efficiency of fusing and analyzing such data. To address these issues, this article proposes an efficient, multidimensional, big data analytical architecture based on the fusion model. The basic concept involves the division of magnitudes (attributes): big datasets with complex magnitudes are transformed into smaller data subsets using the five levels of the fusion model, which can then be easily processed by the Hadoop Processing Server; this formalizes the feature extraction problem for earth observatory, social networking, or networking applications. Moreover, a four-layered network architecture is also proposed that fulfills the basic requirements of the analytical architecture. The proposed fusion-model algorithms are implemented on a single-node Hadoop setup (Ubuntu 14.04 LTS, Core i5, 3.2GHz processor, 4GB of memory) to assess their feasibility and efficiency. The results show that the proposed system architecture efficiently extracts various features (such as land and sea) from the massive volume of satellite data.

Journal ArticleDOI
TL;DR: This article tackles the problem of scheduling tasks on heterogeneous multicore embedded systems with the constraints of time and resources for minimizing the total cost, while considering the communication overhead, and proposes several heuristic techniques to address the problem.
Abstract: Cost savings are critical in modern heterogeneous computing systems, especially in embedded systems. Task scheduling plays an important role in cost savings. In this article, we tackle the problem of scheduling tasks on heterogeneous multicore embedded systems with the constraints of time and resources for minimizing the total cost, while considering the communication overhead. This problem is NP-hard, and we propose several heuristic techniques—ISGG, RLD, and RLDG—to address it. Experimental results show that the proposed algorithms significantly outperform the existing approaches in terms of cost savings.

Journal ArticleDOI
TL;DR: Clover is presented, a compiler-directed soft error detection and recovery scheme for lightweight soft error resilience that leverages a novel selective instruction duplication technique called tail-DMR (dual modular redundancy) that provides a region-level error containment.
Abstract: This article presents Clover, a compiler-directed soft error detection and recovery scheme for lightweight soft error resilience. The compiler carefully generates soft-error-tolerant code based on idempotent processing without explicit checkpoints. During program execution, Clover relies on a small number of acoustic wave detectors deployed in the processor to identify soft errors by sensing the wave made by a particle strike. To cope with DUEs (detected unrecoverable errors) caused by the sensing latency of error detection, Clover leverages a novel selective instruction duplication technique called tail-DMR (dual modular redundancy) that provides a region-level error containment. Once a soft error is detected by either the sensors or the tail-DMR, Clover takes care of the error as in the case of exception handling. To recover from the error, Clover simply redirects program control to the beginning of the code region where the error is detected. The experimental results demonstrate that the average runtime overhead is only 26%, which is a 75% reduction compared to that of the state-of-the-art soft error resilience technique. In addition, this article evaluates an alternative technique called tail-wait, comparing it to Clover. According to the evaluation with the different processor configurations and the various error detection latencies, Clover turns out to be a superior technique, achieving a 1.06× to 3.49× speedup over tail-wait.
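A minimal sketch of the recovery contract (the detection machinery, i.e., acoustic sensors plus tail-DMR, is abstracted into a single callback; this is not Clover's implementation):

```python
# Hedged sketch of idempotent-region recovery: because each compiler-formed
# region is idempotent, a contained soft error is handled by simply
# re-executing the region from its start, with no checkpoint state to restore.

def run_region(region_fn, inputs, error_detected):
    while True:
        result = region_fn(*inputs)      # region must be idempotent
        if not error_detected():         # sensors/tail-DMR saw no error
            return result
        # error detected and contained within this region: re-execute it
```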

Journal ArticleDOI
TL;DR: Results show that the crosstalk noise can be significantly reduced by adopting the main rationale of the work, thereby allowing higher network scalability and exhibiting encouraging improvements over application-oblivious architectures.
Abstract: Optical networks-on-chip (NoCs) provide a promising answer to address the increasing requirements of ultra-high bandwidth and extremely low power consumption. Designing a photonic interconnect, however, involves a number of challenges that have no equivalent in the electronic domain, particularly the crosstalk noise, which affects the signal-to-noise ratio (SNR) possibly resulting in an inoperable architecture and hence constraining the network scalability. In this article, we point out the implications of application-driven task mapping on crosstalk effects. We motivate the main rationale of our work and provide a formalization of the problem. Then we propose a class of algorithms that automatically map the application tasks onto a generic mesh-based photonic NoC architecture such that the worst-case crosstalk is minimized. We also present a purpose-built experimental setup used for evaluating several architectural solutions in terms of crosstalk noise and SNR. The setup is used to collect extensive results from several real-world applications and case studies. The collected results show that the crosstalk noise can be significantly reduced by adopting our approach, thereby allowing higher network scalability, and can exhibit encouraging improvements over application-oblivious architectures.

Journal ArticleDOI
TL;DR: Stable Greedy is presented, which identifies page-accurate data hotness using block-level information and jointly considers block space utilization and block stability for victim selection, while also considering flash wear leveling for SSD lifetime enhancement at both the block level and the channel level.
Abstract: Commodity solid state drives (SSDs) have recently begun adopting powerful controllers for multichannel flash management at the page level. However, many of these models still use primitive garbage-collection algorithms, because previous approaches are subject to poor scalability with high-capacity flash memory. This study presents Stable Greedy for garbage collection in page-mapping multichannel SSDs. Stable Greedy identifies page-accurate data hotness using block-level information, and jointly considers block space utilization and block stability for victim selection. Its design considers flash wear leveling for SSD lifetime enhancement at the block level as well as at the channel level. Stable Greedy runs in constant time and requires limited RAM space. The simulation results revealed that Stable Greedy outperformed previous methods considerably under various workloads and multichannel architectures. Stable Greedy was successfully implemented on the OpenSSD platform, and the actual performance measurements were consistent with the simulation results.
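A minimal sketch of the victim-selection idea (the scoring combination is illustrative; Stable Greedy's actual metric and bookkeeping differ in detail):

```python
# Hedged sketch: prefer victim blocks that are lightly utilized (few valid
# pages to copy out) and stable (not written recently, so the remaining valid
# pages are unlikely to be invalidated soon anyway).

def pick_victim(blocks, now):
    """blocks: dicts with 'valid_pages', 'total_pages', 'last_write_time'."""
    def score(b):
        utilization = b['valid_pages'] / b['total_pages']   # copy cost
        age = now - b['last_write_time']                    # stability proxy
        return (1.0 - utilization) * age    # low utilization + high age wins
    return max(blocks, key=score)
```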

Journal ArticleDOI
TL;DR: It is proved that a GPS-inspired fluid scheduling scheme is thermally optimal when context switch/preemption overhead is ignored, and necessary and sufficient conditions are derived for thermal feasibility of periodic tasksets on a unicore system.
Abstract: With the growing need to address the thermal issues in modern processing platforms, various performance throttling schemes have been proposed in the literature (DVFS, clock gating, and so on) to manage temperature. In real-time systems, such methods are often unacceptable, as they can result in potentially catastrophic deadline misses. As a result, real-time scheduling research has recently focused on developing algorithms that meet the compute deadline while satisfying power and thermal constraints. Basic bounds that can determine if a set of tasks can be scheduled or not were established in the 1970s based on computation utilization. Similar results for thermal bounds have not been forthcoming. In this article, we address the problem of thermal constraint schedulability of tasks and derive necessary and sufficient conditions for thermal feasibility of periodic tasksets on a unicore system. We prove that a GPS-inspired fluid scheduling scheme is thermally optimal when context switch/preemption overhead is ignored. Extension of sufficient conditions to a nonfluid model is still an open problem. We also extend some of the results to a multicore processing environment. We demonstrate the efficacy of our results through extensive simulations. We also evaluate the proposed concepts on a hardware testbed.
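For background, the usual starting point of such analyses is a first-order lumped RC thermal model (a hedged sketch; the paper's exact model may differ):

```latex
% First-order lumped RC thermal model (standard background, hedged):
\frac{dT(t)}{dt} \;=\; \frac{P(t)}{C} \;-\; \frac{T(t) - T_{\mathrm{amb}}}{RC},
\qquad
T_{\infty} \;=\; T_{\mathrm{amb}} + R\,\bar{P}
```

Under such a model, any schedule holding the long-run average power below (T_max − T_amb)/R keeps the steady-state temperature below T_max, which is the intuition for why fluid (GPS-like) schedules, which smooth power over time, are natural candidates for thermal optimality.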

Journal ArticleDOI
TL;DR: A solution to the integrated problem of Through-Silicon Via placement and mapping of cores to the routers in a three-dimensional mesh-based Network-on-Chip (NoC) system is proposed; the results obtained are better than many contemporary approaches and close to the theoretical situation in which all routers are 3D in nature.
Abstract: This article proposes a solution to the integrated problem of Through-Silicon Via (TSV) placement and mapping of cores to the routers in a three-dimensional mesh-based Network-on-Chip (NoC) system. TSV geometry restricts their number in three-dimensional (3D) ICs. As a result, only about 25% of routers in a 3D NoC can possess vertical connections. Mapping plays an important role in evolving good system solutions in such a situation. TSVs have been placed in close coordination with the application mapping process. The integrated problem was first solved using the exact method of Integer Linear Programming (ILP). Next, a solution was obtained via a Particle Swarm Optimization (PSO) formulation. Several augmentations to the basic PSO strategy have been proposed to generate good-quality solutions. The results obtained are better than many of the contemporary approaches and close to the theoretical situation in which all routers are 3D in nature.
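A minimal sketch of the core PSO update used in such formulations (coefficients are textbook defaults, and particle positions would be decoded into a TSV placement plus core mapping; none of this reflects the paper's augmentations):

```python
# Hedged sketch of one particle-swarm step: velocities pull each particle
# toward its personal best and the swarm's global best solution.

import random

W, C1, C2 = 0.7, 1.5, 1.5    # inertia, cognitive, social coefficients

def pso_step(position, velocity, personal_best, global_best):
    new_x, new_v = [], []
    for x, v, pb, gb in zip(position, velocity, personal_best, global_best):
        r1, r2 = random.random(), random.random()
        v = W * v + C1 * r1 * (pb - x) + C2 * r2 * (gb - x)
        new_v.append(v)
        new_x.append(x + v)  # continuous position, later decoded to a mapping
    return new_x, new_v
```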

Journal ArticleDOI
TL;DR: The EC project parMERASA investigated timing-analyzable parallel hard real-time applications running on a predictable multicore processor, developing a pattern-supported parallelization approach based on timing-analyzable parallel design patterns to ease sequential-to-parallel program transformation.
Abstract: The EC project parMERASA (Multicore Execution of Parallelized Hard Real-Time Applications Supporting Analyzability) investigated timing-analyzable parallel hard real-time applications running on a predictable multicore processor. A pattern-supported parallelization approach was developed to ease sequential-to-parallel program transformation based on parallel design patterns that are timing analyzable. The parallelization approach was applied to parallelize the following industrial hard real-time programs: 3D path planning and stereo navigation algorithms (Honeywell International s.r.o.), control algorithm for a dynamic compaction machine (BAUER Maschinen GmbH), and a diesel engine management system (DENSO AUTOMOTIVE Deutschland GmbH). This article focuses on the parallelization approach, experiences during parallelization with the applications, and quantitative results reached by simulation, by static WCET analysis with the OTAWA tool, and by measurement-based WCET analysis with the RapiTime tool.

Journal ArticleDOI
TL;DR: It is demonstrated that the small-world-network-enabled wireless NoC (WiNoC) is the most suitable platform for executing the considered graph applications, and achieves an average of 38% and 18% full-system Energy Delay Product savings compared to wireline-mesh and high-radix NoCs, respectively.
Abstract: With its applicability spanning numerous data-driven fields, the implementation of graph analytics on multicore platforms is gaining momentum. One of the most important components of a multicore chip is its communication backbone. Due to inherent irregularities in data movements manifested by graph-based applications, it is essential to design efficient on-chip interconnection architectures for multicore chips performing graph analytics. In this article, we present a detailed analysis of the traffic patterns generated by graph-based applications when mapped to multicore chips. Based on this analysis, we explore the design-space for the Network-on-Chip (NoC) architecture to enable an efficient implementation of graph analytics. We principally consider three types of NoC architectures, viz., traditional mesh, small-world, and high-radix networks. We demonstrate that the small-world-network-enabled wireless NoC (WiNoC) is the most suitable platform for executing the considered graph applications. The WiNoC achieves an average of 38% and 18% full-system Energy Delay Product savings compared to wireline-mesh and high-radix NoCs, respectively.

Journal ArticleDOI
TL;DR: This special issue, entitled "Innovative Design Methods for Smart Embedded Systems," tackles challenges by providing both machine-learning techniques and application-specific optimization solutions that guarantee that the application of smart innovations meets the imposed requirements and constraints.
Abstract: Nowadays, smart systems are becoming more relevant in a large number of critical sectors, like energy management in public spaces, healthcare, automotive, safety and security. Compared to classical embedded systems, a distinctive aspect of these systems is their smartness, which is the ability to learn from previous experience and to seamlessly react to the surrounding environment. However, this tight interaction with the physical environment implies a high level of heterogeneity in the hardware architecture. At the same time, application scenarios are becoming more complex, since an increasing amount of computation is constrained by tight performance, cost and safety requirements. Novel and comprehensive methodologies are thus required to ease the development of next-generation smart systems, with the goals of reducing design costs and time-to-market, improving the smartness of the architectures, and analyzing the dynamic behavior of the resulting systems and their reactivity to unpredictable events. This special issue, entitled "Innovative Design Methods for Smart Embedded Systems," tackles such challenges by providing both machine-learning techniques and application-specific optimization solutions that guarantee that the application of smart innovations meets the imposed requirements and constraints.

Journal ArticleDOI
TL;DR: This work presents a novel write buffer algorithm that exploits temporal and spatial correlations among buffer pages and is shown to outperform existing methods by up to 134%.
Abstract: Advanced solid-state disks (SSDs) have been equipped with page-mapping flash translation layers and multichannel architectures. The SSDs employ a RAM-based write buffer, which delays write requests for reducing write traffic, reorders requests for mitigating garbage-collection overhead, and produces parallel page writes for improving channel time utilization. This work presents a novel write buffer algorithm that exploits temporal and spatial correlations among buffer pages. The write buffer groups temporally or spatially correlated buffer pages and then writes the grouped pages to the same flash block. In this way, when the correlated page data are updated in the future, flash blocks will receive bulk page invalidations and become good candidates for garbage collection. With multichannel architectures, the write buffer adaptively disperses read-most sequential data over channels for high page-level parallelism of sequential reads, while clustering write-most sequential data in the same channel for a reduced cost of garbage collection. We evaluated the proposed method and previously proposed buffer algorithms. Our method was shown to outperform the existing methods by up to 134%. We also implemented our buffer design on the OpenSSD platform; the time and space overheads of our design were reported to be very low.
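A minimal sketch of the spatial-grouping half of the idea (thresholds are illustrative; the paper's algorithm also tracks temporal correlation and channel assignment):

```python
# Hedged sketch: flush spatially correlated dirty pages (nearly consecutive
# logical page numbers) to the same flash block, so a later sequential update
# invalidates a whole block at once, which is cheap to garbage-collect.

def group_spatially(buffer_lpns, max_gap=1, group_size=4):
    """buffer_lpns: logical page numbers of dirty buffer pages."""
    groups, current = [], []
    for lpn in sorted(buffer_lpns):
        if current and (lpn - current[-1] > max_gap
                        or len(current) == group_size):
            groups.append(current)      # close the group -> one block write
            current = []
        current.append(lpn)
    if current:
        groups.append(current)
    return groups

print(group_spatially([7, 3, 8, 100, 2, 101]))  # [[2, 3], [7, 8], [100, 101]]
```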

Journal ArticleDOI
TL;DR: This article exploits the concept of application sensitivity, which reflects the user’s attention on each application, and devises a user-centric scheduler and governor that allocate computing resources to applications according to their sensitivity.
Abstract: Mobile applications will become progressively more complicated and diverse. Heterogeneous computing architectures like big.LITTLE are a hardware solution that allows mobile devices to combine computing performance and energy efficiency. However, software solutions that conform to the paradigm of conventional fair scheduling and governing are not applicable to mobile systems, thereby degrading user experience or reducing energy efficiency. In this article, we exploit the concept of application sensitivity, which reflects the user’s attention on each application, and devise a user-centric scheduler and governor that allocate computing resources to applications according to their sensitivity. Furthermore, we integrate our design into the Android operating system. The results of experiments conducted on a commercial big.LITTLE smartphone with real-world mobile apps demonstrate that the proposed design can achieve significant gains in energy efficiency while improving the quality of user experience.

Journal ArticleDOI
TL;DR: An algorithm to enumerate certain sets of delay constraints for the widely studied Arbiter PUF (APUF) circuit is presented; it is then demonstrated how these delay constraints can be utilized to expand the set of known Challenge-Response Pairs (CRPs), thus facilitating model-building attacks.
Abstract: Physically Unclonable Function (PUF) circuits are often vulnerable to mathematical model-building attacks. We theoretically quantify the advantage provided to an adversary by any training dataset expansion technique along the lines of security analysis of cryptographic hash functions. We present an algorithm to enumerate certain sets of delay constraints for the widely studied Arbiter PUF (APUF) circuit, then demonstrate how these delay constraints can be utilized to expand the set of known Challenge-Response Pairs (CRPs), thus facilitating model-building attacks. We provide experimental results for Field Programmable Gate Array (FPGA)-based APUFs to establish the effectiveness of the proposed attack.
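For context, the model such attacks build is the standard additive delay model of an n-stage APUF: the response is the sign of a linear function of the challenge's parity feature vector. The weights below stand for what an attacker would learn from known CRPs; values and dimensions are illustrative.

```python
# Hedged sketch of the standard APUF additive delay model targeted by
# model-building attacks.

def apuf_features(challenge):
    """challenge: list of 0/1 bits; returns the n+1 parity features."""
    phi = []
    for i in range(len(challenge)):
        prod = 1
        for c in challenge[i:]:
            prod *= 1 - 2 * c          # map bit {0,1} to {+1,-1}, multiply
        phi.append(prod)
    phi.append(1)                      # bias feature
    return phi

def apuf_response(weights, challenge):
    delay = sum(w * f for w, f in zip(weights, apuf_features(challenge)))
    return 1 if delay > 0 else 0       # arbiter outputs the sign
```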

Journal ArticleDOI
TL;DR: The design, algorithm, and implementation of a novel framework called CURA for quality-retaining power saving on mobile OLED displays are introduced, and the results demonstrate that CURA can significantly reduce OLED power consumption while retaining the visual quality of images.
Abstract: Organic Light-Emitting Diode (OLED) technology is regarded as a promising alternative for mobile displays. In this article, we introduce the design, algorithm, and implementation of a novel framework called CURA for quality-retaining power saving on mobile OLED displays. First, we link human visual attention to OLED power saving and model the OLED image scaling optimization problem. The objective is to minimize the power required to display an image without adversely impacting the user’s visual experience. Then, we present the algorithm used to solve the modeled problem, and prove its optimality even without an accurate power model. Finally, based on the framework, we implement two practical applications on a commercial OLED mobile tablet. The results of experiments conducted on the tablet with real images demonstrate that CURA can significantly reduce OLED power consumption while retaining the visual quality of images.
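A minimal sketch of the per-pixel OLED power model that underlies this line of work (channel weights and the static term are illustrative, not CURA's measured model):

```python
# Hedged sketch: OLED display power is roughly a weighted sum of subpixel
# intensities (blue typically costs the most) plus a static term; dimming or
# scaling an image therefore translates directly into power savings.

W_R, W_G, W_B, P_STATIC = 0.6, 0.4, 1.0, 50.0

def oled_image_power(pixels):
    """pixels: iterable of (r, g, b) with each channel in [0, 1]."""
    dynamic = sum(W_R * r + W_G * g + W_B * b for r, g, b in pixels)
    return P_STATIC + dynamic
```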

Journal ArticleDOI
TL;DR: A novel approach based on the Logic-Based Benders decomposition principle is introduced that is able to find an optimal solution within a time limit of 2 hours in the vast majority of performed experiments, where the pure ILP model fails.
Abstract: The development of efficient methods for mapping applications on heterogeneous multicore platforms is a key issue in the field of embedded systems. In this article, a novel approach based on the Logic-Based Benders decomposition principle is introduced for mapping complex applications on these platforms, aiming at optimizing their execution time. To provide optimal solutions for this problem in a short time, a new hybrid model that combines Integer Linear Programming (ILP) and Constraint Programming (CP) models is introduced. Also, to reduce the complexity of the model and its solution time, a set of novel techniques for generating additional constraints called Benders cuts is proposed. An extensive set of experiments has been performed in which synthetic applications described by Directed Acyclic Graphs (DAGs) were mapped to a number of heterogeneous multicore platforms. Moreover, experiments with DAGs that correspond to two real-life applications have also been performed. Based on the experimental results, it is proven that the proposed approach outperforms the pure ILP model in terms of the solution time and quality of the solution. Specifically, the proposed approach is able to find an optimal solution within a time limit of 2 hours in the vast majority of performed experiments, while the pure ILP model fails. Also, for the cases where both methods fail to find an optimal solution within the time limit, the solution of the proposed approach is systematically better than the solution of the ILP model.
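A minimal sketch of the Logic-Based Benders loop (solve_master and schedule stand in for the paper's ILP and CP models; the cut-generation techniques are the paper's contribution and are not reproduced here):

```python
# Hedged sketch: the ILP master assigns tasks to cores; the CP subproblem
# tries to schedule that assignment; each failure yields a Benders cut that
# excludes the assignment (ideally, many similar ones) from the master.

def benders_loop(tasks, cores, solve_master, schedule, max_iters=1000):
    cuts = []
    for _ in range(max_iters):
        assignment = solve_master(tasks, cores, cuts)   # ILP master problem
        feasible, result = schedule(assignment)         # CP subproblem
        if feasible:
            return assignment, result                   # proven optimal
        cuts.append(result)                             # result is a cut
    raise RuntimeError("iteration limit reached")
```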

Journal ArticleDOI
TL;DR: A new technique to minimize the memory footprints of Digital Signal Processing applications specified with Synchronous Dataflow (SDF) graphs and implemented on shared-memory Multiprocessor System-on-Chip (MPSoCs) is introduced.
Abstract: This article introduces a new technique to minimize the memory footprints of Digital Signal Processing (DSP) applications specified with Synchronous Dataflow (SDF) graphs and implemented on shared-memory Multiprocessor Systems-on-Chip (MPSoCs). In addition to the SDF specification, which captures data dependencies between coarse-grained tasks called actors, the proposed technique relies on two optional inputs abstracting the internal data dependencies of actors: annotations of the ports of actors, and script-based specifications of merging opportunities between input and output buffers of actors. Experimental results on a set of applications show a reduction of the memory footprint by 48% compared to state-of-the-art minimization techniques.
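As background (standard SDF bookkeeping rather than the paper's technique), the naive per-edge buffer bound that such minimization improves on follows from the repetition vector; a sketch under the usual consistency condition:

```python
# Hedged sketch: on a consistent SDF edge, producer and consumer rates
# balance over one graph iteration, and the naive single-edge buffer bound is
# the number of tokens produced per iteration. Merging techniques then shrink
# the total footprint well below the sum of such bounds.

from math import gcd

def edge_buffer_bound(prod_rate, cons_rate):
    lcm = prod_rate * cons_rate // gcd(prod_rate, cons_rate)
    producer_firings = lcm // prod_rate    # repetition count per iteration
    return prod_rate * producer_firings    # tokens buffered in the worst case

print(edge_buffer_bound(3, 2))  # -> 6 tokens
```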

Journal ArticleDOI
TL;DR: A streaming architecture implemented on Field-Programmable Gate Arrays (FPGAs) is presented to accelerate real-time ECG signal analysis and diagnosis in a pipelined and parallel way, along with a hardware-oriented data-mining algorithm named Bit_Q_Apriori.
Abstract: Telemedicine provides health care services at a distance using information and communication technologies and is intended as a solution to the challenges faced by current health care systems: a growing population, increased demands from patients, and shortages in human resources. Recent advances in telemedicine, especially in wearable electrocardiogram (ECG) monitors, call for more intelligent and efficient automatic ECG analysis and diagnostic systems. We present a streaming architecture implemented on Field-Programmable Gate Arrays (FPGAs) to accelerate real-time ECG signal analysis and diagnosis in a pipelined and parallel way. Association-rule mining is employed to generate early diagnostic results by matching ECG features with generated association rules. To improve processing performance, we propose a hardware-oriented data-mining algorithm named Bit_Q_Apriori. The corresponding hardware implementation shows good scalability and outperforms other hardware designs in terms of performance, throughput, and hardware cost.
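A minimal sketch of the Apriori principle that Bit_Q_Apriori implements in hardware (toy transactions stand in for discretized ECG features; the bit-level optimizations are the paper's contribution and are not shown):

```python
# Hedged sketch of plain Apriori: keep itemsets meeting minimum support and
# extend only those, since any superset of an infrequent set is infrequent.

from itertools import combinations

def apriori(transactions, min_support):
    items = {i for t in transactions for i in t}
    frequent = []
    k_sets = [frozenset([i]) for i in sorted(items)]
    while k_sets:
        counts = {s: sum(1 for t in transactions if s <= t) for s in k_sets}
        kept = [s for s, c in counts.items() if c >= min_support]
        frequent += kept
        # join frequent k-sets into candidate (k+1)-sets
        k_sets = sorted({a | b for a, b in combinations(kept, 2)
                         if len(a | b) == len(a) + 1}, key=sorted)
    return frequent

print(apriori([frozenset("ab"), frozenset("abc"), frozenset("ac")], 2))
```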