
Showing papers by "Ulrich Rückert" published in 2013


Journal ArticleDOI
TL;DR: An energy-efficient SoC with 32 b subthreshold RISC processor cores, 32 kB conventional cache memory, and 9T ultra-low voltage SRAM based on a flexible and extensible architecture that provides dynamic voltage and frequency scaling (DVFS) combined with an adaptive supply voltage generation for dynamic PVT compensation is fabricated.
Abstract: An energy-efficient SoC with 32 b subthreshold RISC processor cores, 32 kB conventional cache memory, and 9T ultra-low voltage (ULV) SRAM based on a flexible and extensible architecture was fabricated on a 2.7 mm² test chip in 65 nm low power CMOS. The processor cores are based on a custom standard cell library that was designed using a multiobjective approach to optimize noise margins, switching energy, and propagation delay simultaneously. The cores operate over a supply voltage range from 200 mV (best samples) to 1.2 V with clock frequencies from 10 kHz to 94 MHz at room temperature. The lowest energy consumption per cycle of 9.94 pJ is observed at 325 mV and 133 kHz. A 2 kb ULV SRAM macro achieves minimum energy per operation at averages of 321 mV (0.030 σ/μ), 567 fJ (0.037 σ/μ), and 730 kHz (0.184 σ/μ), for an equal number of 32 b read/write operations. The off-chip performance and power management subsystem provides dynamic voltage and frequency scaling (DVFS) combined with an adaptive supply voltage generation for dynamic PVT compensation.
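The minimum-energy point quoted above (9.94 pJ per cycle at 133 kHz) implies an average core power, which the abstract does not state explicitly. A minimal sketch of that arithmetic follows, assuming the per-cycle energy is spent once every clock period; the function name is illustrative and not taken from the paper.

```python
def average_power_watts(energy_per_cycle_joules: float, clock_hz: float) -> float:
    """Average power, assuming one cycle's energy is spent every clock period."""
    return energy_per_cycle_joules * clock_hz


# Figures from the abstract: 9.94 pJ per cycle at a 133 kHz clock.
core_power = average_power_watts(9.94e-12, 133e3)
print(f"{core_power * 1e6:.2f} uW")  # roughly 1.32 uW
```

Under that assumption, the core draws on the order of a microwatt at its minimum-energy point.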

87 citations


Journal ArticleDOI
TL;DR: A scalable FPGA-based hardware accelerator for the emulation of Self-Organizing Feature Maps (SOMs) is compared with a multi-threaded software implementation on a state-of-the-art multi-core microprocessor.

34 citations


Proceedings ArticleDOI
01 Nov 2013
TL;DR: The results of a design-space exploration for a classification algorithm with respect to the inherent parallelism of the CoreVA CPU are presented, showing which hardware and algorithm configurations perform best with respect to classification accuracy, runtime and energy consumption.
Abstract: In this paper we present the results of a design-space exploration for a classification algorithm with respect to the inherent parallelism of the CoreVA CPU. The CoreVA is a configurable VLIW processor which has been mainly designed for energy-constrained applications. Energy-efficient signal processing is essential for real-time applications on wireless body sensors (WBSs). Using a velocity-estimation algorithm for a runner as an example, we show which hardware and algorithm configurations perform best with respect to classification accuracy, runtime and energy consumption. We obtained 9 Pareto-optimal configurations out of 504 simulations. The highest classification accuracy of 93.4% requires 34687 clock cycles and has an energy consumption of 1.559 μJ. The lowest energy requirement of 0.015 μJ per classification is observed with a Pareto-optimal configuration at 76.3% accuracy. The three-issue VLIW configuration shows the best results with respect to the area-energy trade-off.
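Selecting Pareto-optimal configurations, as described above, can be illustrated with a small filter. The sketch below is not the authors' tooling: it considers only two objectives (accuracy and energy), and only the first two data points come from the abstract. The two `hypothetical_*` entries are invented purely to show a dominated point and a trade-off point.

```python
from typing import NamedTuple


class Config(NamedTuple):
    name: str
    accuracy_pct: float  # higher is better
    energy_uj: float     # lower is better


def dominates(a: Config, b: Config) -> bool:
    """True if a is no worse than b in both objectives and strictly better in one."""
    no_worse = a.accuracy_pct >= b.accuracy_pct and a.energy_uj <= b.energy_uj
    strictly_better = a.accuracy_pct > b.accuracy_pct or a.energy_uj < b.energy_uj
    return no_worse and strictly_better


def pareto_front(configs: list[Config]) -> list[Config]:
    """Keep every configuration that no other configuration dominates."""
    return [c for c in configs if not any(dominates(o, c) for o in configs)]


configs = [
    Config("best_accuracy", 93.4, 1.559),   # from the abstract
    Config("lowest_energy", 76.3, 0.015),   # from the abstract
    Config("hypothetical_a", 90.0, 2.000),  # invented: dominated by best_accuracy
    Config("hypothetical_b", 85.0, 0.500),  # invented: a genuine trade-off point
]
front = pareto_front(configs)
print([c.name for c in front])
```

The pairwise check is quadratic in the number of configurations, which is unproblematic for a few hundred simulation results.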

7 citations


Proceedings ArticleDOI
01 Sep 2013
TL;DR: This paper presents an innovative approach towards evaluation of navigation algorithms, which does not need physical robots and position sensors to be present at the experimenter's site, but relies on a special remote internet-accessible physical testbed, the “Teleworkbench”, which can be used in order to evaluate as well as uniformly cross-compare algorithms with no need of spending money on hardware or simulation software.
Abstract: Experimental evaluation of navigation algorithms requires physical robots as well as position sensing devices. The common alternative is to use simulations to run the experiments. However, simulation often does not provide an accurate prediction of real-world behavior. Therefore, in this paper, we present an innovative approach towards evaluation of navigation algorithms, which does not need physical robots and position sensors to be present at the experimenter's site, but relies on a special remote internet-accessible physical testbed, the “Teleworkbench”, which can be used in order to evaluate as well as uniformly cross-compare algorithms with no need of spending money on hardware or simulation software. More specifically, in this paper we use the Teleworkbench to evaluate three different path planning algorithms and compare the results with simulation. Different metrics are proposed, such as the path execution time, smoothness and path clearance deviations. Our results clearly illustrate the superiority of the Teleworkbench as an evaluation platform in comparison to simulation, which does not provide an accurate prediction of actual physical performance, and thus illustrate both the viability as well as the power of our novel approach.

6 citations


Journal ArticleDOI
TL;DR: The adaptation of these application-specific bypass configurations allows for a reduction of the critical path by 26%.
Abstract: The diversity of today's mobile applications requires embedded processor cores with a high resource efficiency; that is, the devices should provide high performance at low area requirements and power consumption. The fine-grained parallelism supported by multiple functional units of VLIW architectures offers a high throughput at reasonably low clock frequencies compared to single-core RISC processors. To efficiently utilize the processor pipeline, common system architectures have to cope with data hazards due to data dependencies between consecutive operations. On the one hand, such hazards can be resolved by complex forwarding circuits (i.e., a pipeline bypass) which forward intermediate results to a subsequent instruction. On the other hand, the pipeline bypass can strongly affect or even dominate the total resource requirements and degrade the maximum clock frequency. In this work the CoreVA VLIW architecture is used for the development and the analysis of application-specific bypass configurations. It is shown that many paths of a comprehensive bypass system are rarely used and may not be required for certain applications. For this reason, several strategies have been implemented to enhance the efficiency of the total system by introducing application-specific bypass configurations. The configuration can be carried out statically by only implementing required paths or at runtime by dynamically reconfiguring the hardware. An algorithm is proposed which derives an optimized configuration by iteratively disabling single bypass paths. The adaptation of these application-specific bypass configurations allows for a reduction of the critical path by 26%. As a result, the execution time and energy requirements could be reduced by up to 21.5%. Using Dynamic Frequency Scaling (DFS) and dynamic deactivation/reactivation of bypass paths allows for a runtime reconfiguration of the bypass system. This ensures the highest efficiency while processing varying applications.
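The abstract only states that the configuration is derived by iteratively disabling single bypass paths; it gives no further detail. The following is therefore a generic greedy sketch of that idea, not the published algorithm. The path names, the toy cost model (more enabled paths lengthen the clock period, disabled paths add stall cycles) and all numbers are assumptions made up for illustration.

```python
from typing import Callable, FrozenSet


def prune_bypass(paths: FrozenSet[str],
                 cost: Callable[[FrozenSet[str]], float]) -> FrozenSet[str]:
    """Greedily disable one bypass path at a time while the cost keeps dropping."""
    enabled = paths
    best = cost(enabled)
    while enabled:
        # Evaluate every configuration that differs by one disabled path.
        candidates = [(cost(enabled - {p}), p) for p in sorted(enabled)]
        cand_cost, cand_path = min(candidates)
        if cand_cost >= best:
            break  # no single removal improves the cost any further
        enabled, best = enabled - {cand_path}, cand_cost
    return enabled


# Hypothetical toy model: (clock-period penalty while enabled, stall cycles if disabled).
PATHS = {"ex_to_ex": (0.2, 500), "mem_to_ex": (0.2, 50), "wb_to_ex": (0.1, 10)}


def execution_time(enabled: FrozenSet[str]) -> float:
    """Toy cost: cycle count times clock period, in arbitrary units."""
    period = 1.0 + sum(PATHS[p][0] for p in enabled)
    cycles = 1000 + sum(stall for p, (_, stall) in PATHS.items() if p not in enabled)
    return cycles * period


result = prune_bypass(frozenset(PATHS), execution_time)
print(sorted(result))
```

In this toy model only the frequently used path survives, because removing it would add more stall cycles than the shorter clock period saves. A real flow would replace `execution_time` with synthesis and simulation results for each candidate configuration.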

5 citations


Proceedings ArticleDOI
19 Dec 2013
TL;DR: A framework for the exploration of the design space of resource-efficient signal processing suitable for embedded processors is presented and a velocity estimation algorithm for an athlete is used to show which configurations of the algorithm perform best with respect to classification accuracy and runtime.
Abstract: Miniaturised wireless body sensors equipped with low-power microcontrollers are used in various energy-constrained applications. The signal-processing algorithms often have to run in real time on a low computational and memory budget. In this paper we present a framework for the exploration of the design space of resource-efficient signal processing suitable for embedded processors. Using a velocity estimation algorithm for an athlete, we show which configurations of the algorithm perform best with respect to classification accuracy and runtime. Altering the sampling frequency, the feature combination, the classifier (Artificial Neural Network (ANN), Decision Tree (DT)), or the classifier's parametrisation, we obtained 15 Pareto-optimal configurations out of 1008 simulations. The highest classification accuracy of 93.92% was obtained using an ANN, and required 22422 clock cycles per classification. The lowest cycle count of 204 was obtained with a DT configuration which resulted in 84.66% accuracy.

5 citations


Book ChapterDOI
11 Feb 2013
TL;DR: A biometric method for identifying athletes based on information extracted from the gait style and the electrocardiographic (ECG) waveform that combines both sources of information to allow identification despite severe intra-subject variations in the gait patterns.
Abstract: We propose a biometric method for identifying athletes based on information extracted from the gait style and the electrocardiographic (ECG) waveform. The required signals are recorded within a non-clinical acquisition setup using a wireless body sensor attached to a chest strap with integrated textile electrodes. Our method combines both sources of information to allow identification despite severe intra-subject variations in the gait patterns (walking and jogging) and motion-related artefacts in the ECG patterns. For identification we use features extracted in time and frequency domain and a standard classifier. Within a treadmill experiment with 22 subjects we obtained an accuracy of 98.1 % for velocities from 3 to 9 km/h. On a second data set consisting of 9 subjects and two sessions of recording, our method achieved 93.8 % despite variations in the patterns due to reapplying the body sensor and an increased velocity (up to 11 km/h).

4 citations


Proceedings Article
01 Jan 2013
TL;DR: This paper proposes to identify humans using features extracted in time and frequency domain and a standard classifier to identify a human despite severe motion related artefacts in the electrocardiograph and variations in the gait patterns.
Abstract: In this paper we propose a biometric method for identifying humans during walking and jogging. We use acceleration and electrocardiographic measurements recorded with a wireless body sensor attached to a chest strap. Our method does not require a particular acquisition setup. Information on the gait style and on the physiology is combined to identify a human despite severe motion-related artefacts in the electrocardiograph and variations in the gait patterns. We propose to identify humans using features extracted in time and frequency domain and a standard classifier. With the collected data of 22 subjects on a treadmill at velocities from 3 to 9 km/h we obtained an accuracy of 98.1 %. The sensitivity of the identification ranged from 94.6 to 99.5 % for the different subjects and the specificity was higher than 99.7 %.

4 citations


Proceedings ArticleDOI
30 Jun 2013
TL;DR: In this article, the impact of laser phase noise on coherent optical DFT-spread OFDM with spectral shaping was investigated and a new spectral shaping scheme was proposed to improve the system performance.
Abstract: We experimentally investigated the impact of laser phase noise on coherent optical DFT-spread OFDM with spectral shaping. We also propose a new spectral shaping to improve the system performance.

3 citations


01 Jan 2013
TL;DR: In this paper, a new hardware platform that combines reconfigurable processors and FPGAs, prototyped with the RAPTOR environment, is proposed; the partial reconfigurability of today's field-programmable gate arrays (FPGAs) allows some of the logic blocks and their interconnect structure to be changed during operation.
Abstract: A key goal in the development of microelectronic circuits is the efficient use of the given resources of area, time, and energy. Microprocessors offer a high degree of flexibility thanks to their programmability. Compared with application-specific integrated circuits (ASICs), however, their sequential program execution limits their performance. ASICs, in turn, deliver higher performance but are less flexible, since they cannot be modified after fabrication. For many applications, dynamically reconfigurable hardware offers a good compromise between microprocessors and ASICs. The partial reconfigurability of today's field-programmable gate arrays (FPGAs) allows some of the logic blocks and their interconnect structure to be changed during operation. Hardware functions can be loaded onto the FPGA at runtime and removed once processing is complete, freeing the resources for future hardware functions. For intelligent technical systems, a combination of the approaches described above is useful in many cases. We therefore present a new hardware platform that combines reconfigurable processors and FPGAs in one system environment. Using the modular prototyping environment RAPTOR, the concepts are both implemented as prototypes and tested and evaluated in real technical systems.

3 citations


Proceedings ArticleDOI
17 Mar 2013
TL;DR: Simulations and experimental results confirm that the proposed method outperforms the widely used method of Schmidl & Cox.
Abstract: We present a simple and efficient method for CO-OFDM frame synchronization and IQ component aligning. A training sequence is used. Simulations and experimental results confirm that our proposed method outperforms the widely used method of Schmidl & Cox.
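For context, the Schmidl & Cox baseline named above detects a training symbol made of two identical halves of length L by correlating the halves: P(d) = Σ r*(d+m) r(d+m+L), R(d) = Σ |r(d+m+L)|², and the timing metric is M(d) = |P(d)|² / R(d)². The sketch below implements only this baseline metric, not the method proposed in the paper. The test signal is a noise-free toy built from unit-magnitude random phasors, which is an assumption made for illustration and not a real OFDM waveform.

```python
import cmath
import random


def schmidl_cox_metric(r, half_len):
    """Timing metric M(d) = |P(d)|^2 / R(d)^2 for every valid offset d."""
    metric = []
    for d in range(len(r) - 2 * half_len + 1):
        first = r[d:d + half_len]
        second = r[d + half_len:d + 2 * half_len]
        # P(d): correlation between the two candidate halves.
        p = sum(a.conjugate() * b for a, b in zip(first, second))
        # R(d): energy of the second half, used for normalisation.
        energy = sum(abs(b) ** 2 for b in second)
        metric.append(abs(p) ** 2 / energy ** 2 if energy > 0 else 0.0)
    return metric


def unit_phasors(rng, n, amplitude=1.0):
    """n samples with fixed magnitude and uniformly random phase (toy signal)."""
    return [amplitude * cmath.exp(1j * rng.uniform(0.0, 2.0 * cmath.pi))
            for _ in range(n)]


rng = random.Random(0)
half_len = 32
half = unit_phasors(rng, half_len)
prefix = unit_phasors(rng, 50, amplitude=0.1)  # weak samples before the frame
payload = unit_phasors(rng, 100)
signal = prefix + half + half + payload        # training symbol starts at index 50

metric = schmidl_cox_metric(signal, half_len)
estimated_start = max(range(len(metric)), key=metric.__getitem__)
print(estimated_start)
```

With a cyclic prefix and channel noise, the metric develops a plateau instead of a sharp peak. That plateau is the usual motivation for improved synchronization schemes.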

Proceedings ArticleDOI
30 Jun 2013
TL;DR: In this paper, joint chromatic dispersion and phase noise compensation by a pilot-based method is presented; the 16-QAM modulated transmission reaches the FEC-limit for a fiber length of 960 km at 28 Gs/s and a laser linewidth of 200 kHz.
Abstract: Joint chromatic dispersion and phase noise compensation by the pilot-based method is presented. The 16-QAM modulated transmission reaches the FEC-limit for a fiber length of 960 km at 28 Gs/s and a laser linewidth of 200 kHz.

Proceedings ArticleDOI
30 Jun 2013
TL;DR: In this article, an adaptive blind CD estimation algorithm in the frequency domain is analyzed at reduced sampling rates for 112 Gb/s PDM-QPSK and 224 Gb/s PDM-16-QAM systems.
Abstract: An adaptive blind CD estimation algorithm in the frequency domain is analyzed at reduced sampling rates. 112 Gb/s PDM-QPSK and 224 Gb/s PDM-16-QAM systems are simulated to evaluate the performance of our resource-efficient CD estimation algorithm.