Proceedings Article

Energy-aware cooling for hot-water cooled supercomputers

09 Mar 2015, pp. 1353-1358
TL;DR: This work introduces an analytical model, based on lumped parameters, that can effectively describe the cooling components and their dynamics and can be used for analysis and control purposes. It then designs an energy-optimal control strategy capable of minimizing pump and chiller power consumption while meeting the supercomputer's cooling requirements.
Abstract: Hot-water liquid cooling is a key technology for future green supercomputers, as it maximizes cooling efficiency and energy reuse. However, the cooling system is still responsible for a significant percentage of the power consumption of modern HPC systems. Standard liquid-cooling control designs rely on rules based on worst-case scenarios, or on CFD simulations of portions of the system, which cannot account for all the real working conditions of a supercomputer (workload and ambient temperature). In this work we first introduce an analytical model, based on lumped parameters, that can effectively describe the cooling components and their dynamics and can be used for analysis and control purposes. We then use it to design an energy-optimal control strategy capable of minimizing pump and chiller power consumption while meeting the supercomputer's cooling requirements. We validate the method with simulation tests, using data from a real HPC cooling system, and compare the results with state-of-the-art commercial cooling control strategies.
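To make the pump-versus-chiller trade-off concrete, here is a minimal steady-state sketch. It is not the authors' lumped-parameter model. It assumes a single loop governed by the energy balance P_IT = m_dot * c_p * (T_out - T_in), a pump whose power grows with the cube of the flow (affinity law), free cooling that can cool the return water to at best ambient plus a fixed approach (dry-cooler fan power ignored), and a chiller with constant COP that removes the rest. All names and numbers (`LoopParams`, `k_pump`, `approach`, the 100 kW load, the 25 °C comparison setpoint) are hypothetical.

```python
from dataclasses import dataclass

CP_WATER = 4186.0  # J/(kg*K), specific heat capacity of liquid water


@dataclass
class LoopParams:
    # Hypothetical placeholder values, not data from the paper.
    k_pump: float = 50.0      # W/(kg/s)^3, cubic pump-affinity coefficient
    cop_chiller: float = 4.0  # constant chiller coefficient of performance
    approach: float = 3.0     # K, free-cooling (dry cooler) approach above ambient
    t_out_max: float = 60.0   # degC, highest allowed outlet (return) temperature


def required_flow(p_it: float, t_in: float, prm: LoopParams) -> float:
    """Smallest mass flow (kg/s) keeping T_out <= t_out_max, from the
    steady-state balance P_IT = m_dot * c_p * (T_out - T_in)."""
    return p_it / (CP_WATER * (prm.t_out_max - t_in))


def cooling_power(p_it: float, t_in: float, t_amb: float, prm: LoopParams) -> float:
    """Pump + chiller electrical power (W) at the minimum flow for t_in."""
    m_dot = required_flow(p_it, t_in, prm)
    t_out = t_in + p_it / (m_dot * CP_WATER)  # equals t_out_max at minimum flow
    p_pump = prm.k_pump * m_dot ** 3
    # Free cooling can bring the return water down to t_amb + approach at best;
    # the chiller must remove the remaining heat between that point and t_in.
    t_free = max(t_amb + prm.approach, t_in)
    q_chiller = m_dot * CP_WATER * (min(t_free, t_out) - t_in)
    return p_pump + q_chiller / prm.cop_chiller


def best_setpoint(p_it: float, t_amb: float, prm: LoopParams) -> tuple[float, float]:
    """Brute-force search over inlet setpoints from 20 degC to t_out_max - 5 degC."""
    n = int((prm.t_out_max - 25.0) / 0.5) + 1
    grid = [20.0 + 0.5 * k for k in range(n)]
    return min(((t, cooling_power(p_it, t, t_amb, prm)) for t in grid),
               key=lambda pair: pair[1])


prm = LoopParams()
t_best, p_best = best_setpoint(100e3, 30.0, prm)  # 100 kW IT load, 30 degC outside
p_fixed = cooling_power(100e3, 25.0, 30.0, prm)   # fixed cold setpoint for comparison
```

With these numbers the search settles on the warmest inlet temperature that free cooling alone can still deliver (t_amb + approach = 33 °C): a colder setpoint brings in the chiller, while a warmer one only raises the pump flow. Running at minimum flow is optimal here because extra flow adds pump power without relaxing any constraint.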
Citations
Proceedings Article
27 Mar 2017
TL;DR: In this article, the authors leverage scalable, lightweight and flexible IoT technologies, such as the MQTT protocol, to build a highly scalable HPC monitoring infrastructure able to handle the massive sensor data produced by next-gen HPC components.
Abstract: Exascale computing represents the next leap in the HPC race. Reaching this level of performance poses several engineering challenges, such as energy consumption, equipment cooling, reliability, and massive parallelism. Model-based optimization is an essential tool in the design and control of energy-efficient, reliable, and thermally constrained systems. However, in the Exascale domain, model-learning techniques tailored to a specific supercomputer require real measurements and must therefore handle and analyze a massive amount of data coming from the HPC monitoring infrastructure. This rapidly becomes a “big data” problem. The common approach, in which measurements are first stored in large databases and then processed, is no longer affordable because of increasing storage costs and the lack of real-time support. Cloud-based machine learning techniques instead aim to build online models using real-time approaches such as “stream processing” and “in-memory” computing, which avoid storage costs and enable fast data processing. Moreover, the fast delivery and adaptation of the models to rapid data variations make the decision stage of the optimization loop more effective and reliable. In this paper we leverage scalable, lightweight, and flexible IoT technologies, such as the MQTT protocol, to build a highly scalable HPC monitoring infrastructure able to handle the massive sensor data produced by next-generation HPC components. We then show how state-of-the-art tools for big data computing and analysis, such as Apache Spark, can be used to manage the huge amount of data delivered by the monitoring layer and to build adaptive models in real time using online machine learning techniques.
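The “in-memory”, store-nothing idea behind online model learning can be sketched with a linear model updated one sample at a time by stochastic gradient descent. This is an illustration, not the paper's Spark pipeline: the generator `sensor_stream` is a hypothetical stand-in for messages arriving from an MQTT subscription, and the linear power-versus-load relation (100 W idle plus 150 W at full load) is invented for the example.

```python
import random


class OnlineLinearModel:
    """Linear regression updated one sample at a time with plain SGD,
    so no measurement history has to be stored."""

    def __init__(self, n_features: int, lr: float = 0.05):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        return self.b + sum(wi * xi for wi, xi in zip(self.w, x))

    def update(self, x, y):
        err = self.predict(x) - y  # derivative of 0.5 * err**2 w.r.t. the prediction
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err
        return err


def sensor_stream(n: int, seed: int = 0):
    """Hypothetical stand-in for an MQTT subscription: yields (features, target)
    pairs where node power (W) depends linearly on CPU load in [0, 1]."""
    rng = random.Random(seed)
    for _ in range(n):
        load = rng.random()
        yield [load], 100.0 + 150.0 * load


model = OnlineLinearModel(n_features=1)
for x, y in sensor_stream(10_000):
    model.update(x, y)  # each sample is used once and then discarded
```

Because each update touches only the current sample, memory use is constant in the stream length, and the model keeps adapting if the underlying relation drifts.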

56 citations


Cites methods from "Energy-aware cooling for hot-water ..."

  • ...The main usage of the monitoring data is oriented to the HPC facility management such as HW diagnostic for failure prevention, resource availability, capacity utilization, cooling systems, energy consumption, billing and application performance [5], [15], [16]....


Book Chapter
31 Aug 2015
TL;DR: This paper proposes two novel approaches for power capped workload dispatching and demonstrates them on a real-life high-performance machine: the Eurora supercomputer hosted at CINECA computing center in Bologna.
Abstract: Power consumption is a key factor in modern ICT infrastructure, especially in the expanding world of High Performance Computing, Cloud Computing, and Big Data. Such consumption is bound to become an even greater issue as supercomputers are envisioned to enter the Exascale by 2020, provided that they achieve an order-of-magnitude gain in energy efficiency. An important component of many strategies devised to decrease energy usage is "power capping", i.e., the ability to constrain the system's power consumption within a given power budget. In this paper we propose two novel approaches for power-capped workload dispatching and demonstrate them on a real-life high-performance machine: the Eurora supercomputer hosted at the CINECA computing center in Bologna. Power capping is not a feature of the commercial Portable Batch System (PBS) dispatcher currently in use on Eurora. The first method is based on a heuristic technique, while the second relies on a hybrid strategy that combines constraint programming (CP) with a heuristic approach. Both systems are evaluated and compared on simulated job traces.
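As a point of reference for what "power-capped dispatching" means, the sketch below is a simple greedy list scheduler, not either of the paper's methods: at each integer time step it starts queued jobs in arrival order whenever they fit under the remaining power budget, letting smaller jobs backfill around larger ones. It assumes each job's power draw and duration are known in advance; the `Job` fields and the four-job example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    power: float   # W drawn while running (assumed known in advance)
    duration: int  # time steps


def dispatch(jobs, power_cap):
    """Greedy power-capped list scheduling. Returns {job name: start time}."""
    if any(j.power > power_cap for j in jobs):
        raise ValueError("a job exceeds the power cap on its own")
    queue = list(jobs)
    running = []  # (end_time, power) of jobs currently executing
    starts = {}
    t = 0
    while queue:
        running = [(end, p) for end, p in running if end > t]  # retire finished jobs
        used = sum(p for _, p in running)
        for job in list(queue):  # iterate over a copy so we can remove from queue
            if used + job.power <= power_cap:
                starts[job.name] = t
                running.append((t + job.duration, job.power))
                used += job.power
                queue.remove(job)
        t += 1
    return starts


jobs = [Job("A", 200, 2), Job("B", 150, 1), Job("C", 100, 3), Job("D", 50, 1)]
starts = dispatch(jobs, power_cap=300)
```

Here A and C start immediately (300 W in total), and B and D wait until A finishes at t = 2. A CP-based dispatcher would instead search over many start-time assignments for a better schedule, at a higher computational cost.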

27 citations

Journal Article
TL;DR: In this article, a cross-validated Gaussian process regression (GPR) model, trained on experimental measurements, is used to estimate the measured quantities at positions where no experimental data are available.
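A minimal pure-Python sketch of the idea, assuming a zero-mean GP with a squared-exponential kernel and choosing the length scale by leave-one-out cross-validation over a few candidates. The probe positions, the sine-shaped "measurements", and the candidate length scales are hypothetical; the cited article's kernel and validation scheme are not stated here.

```python
import math


def rbf(a, b, length):
    """Squared-exponential kernel with unit signal variance."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(A):
    """Lower-triangular L with A = L L^T (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L


def cho_solve(L, b):
    """Solve (L L^T) x = b by forward then back substitution."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_mean(xs, ys, x_new, length, noise=1e-6):
    """Posterior mean of a zero-mean GP at x_new (noise adds diagonal jitter)."""
    K = [[rbf(a, b, length) + (noise if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    alpha = cho_solve(cholesky(K), ys)
    return sum(rbf(x_new, xi, length) * ai for xi, ai in zip(xs, alpha))


def loo_error(xs, ys, length):
    """Leave-one-out mean squared error for one length-scale candidate."""
    errs = []
    for i in range(len(xs)):
        pred = gp_mean(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], xs[i], length)
        errs.append((pred - ys[i]) ** 2)
    return sum(errs) / len(errs)


# Hypothetical readings at five probe positions.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [math.sin(x) for x in xs]
best_len = min([0.5, 1.0, 2.0], key=lambda ell: loo_error(xs, ys, ell))
estimate = gp_mean(xs, ys, 2.5, best_len)  # a position with no measurement
```

Cross-validation here only selects a hyperparameter; a fuller treatment would also tune the signal and noise variances, for example by maximizing the marginal likelihood.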

17 citations

Proceedings Article
23 Jun 2020
TL;DR: This paper proposes Wintermute, a novel generic framework to enable online ODA on large-scale HPC installations, based on a set of logical abstractions to ease the configuration of models at a large scale and maximize code re-use.
Abstract: As we approach the exascale era, the size and complexity of HPC systems continue to increase, raising concerns about their manageability and sustainability. For this reason, more and more HPC centers are experimenting with fine-grained monitoring coupled with Operational Data Analytics (ODA) to optimize the efficiency and effectiveness of system operations. However, while monitoring is a common reality in HPC, there is no well-stated and comprehensive list of requirements, nor matching frameworks, to support holistic and online ODA. This leads to insular ad-hoc solutions, each addressing only specific aspects of the problem. In this paper we propose Wintermute, a novel generic framework to enable online ODA on large-scale HPC installations. Its design is based on the results of a literature survey of common operational requirements. We implement Wintermute on top of the holistic DCDB monitoring system, offering a large variety of configuration options to accommodate the varying requirements of ODA applications. Moreover, Wintermute is based on a set of logical abstractions to ease the configuration of models at a large scale and maximize code re-use. We highlight Wintermute's flexibility through a series of practical case studies, each targeting a different aspect of the management of HPC systems, and then demonstrate the small resource footprint of our implementation.

16 citations


Cites background from "Energy-aware cooling for hot-water ..."

  • ...cooling or facility network), as well as adapting to environmental changes [17]–[20]....


Proceedings Article
18 Jul 2016
TL;DR: This paper evaluates how the performance of two widely used network synchronization protocols, the Network Time Protocol and IEEE 1588, scales on a state-of-the-art embedded platform, the BeagleBone Black board.
Abstract: Solutions for accurate and fine-grained monitoring are at the basis of the growth of future large-scale green high performance computing (HPC) infrastructures. The capability of these systems to adapt to specific application requirements relies on sensing several distributed physical parameters and correlating them with application phases and states. Meeting such requirements thus enables better use of resources, higher throughput, and higher energy efficiency. As the capability of drawing such correlations relies on synchronization across a network of nodes and measuring devices, synchronization protocols become a critical component. Novel low-cost embedded devices are starting to include hardware support for network synchronization protocols to achieve high-resolution timing accuracy. These devices are promising for monitoring the physical parameters of HPC infrastructures. In this paper we evaluate how the performance of two widely used network synchronization protocols, the Network Time Protocol and IEEE 1588, scales on a state-of-the-art embedded platform, the BeagleBone Black board.
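Both protocols estimate clock offset from a two-way timestamp exchange. The sketch below uses the NTP labeling of the four timestamps; IEEE 1588's delay request-response mechanism relies on the same algebra with the timestamps labeled differently. It assumes the forward and return path delays are equal; any asymmetry shows up as an offset error of half the difference. The example numbers are hypothetical.

```python
def offset_and_delay(t1, t2, t3, t4):
    """Two-way time transfer (NTP-style timestamp labels).

    t1: request sent     (client clock)
    t2: request received (server clock)
    t3: reply sent       (server clock)
    t4: reply received   (client clock)
    Assumes symmetric one-way path delays.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2  # server clock minus client clock
    delay = (t4 - t1) - (t3 - t2)         # round trip minus server hold time
    return offset, delay


# Server clock runs 5 ms ahead, one-way delay 2 ms, server holds 1 ms.
offset, delay = offset_and_delay(100, 107, 108, 105)
```

The result is an offset of 5 ms and a round-trip delay of 4 ms, matching the scenario described in the comment.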

11 citations


Cites background from "Energy-aware cooling for hot-water ..."

  • ...Works in [9][10] show that this cost can be reduced when advanced cooling control policies, based on extensive monitoring, are in place....


References
Book
17 Oct 1996
TL;DR: A fluid mechanics textbook covering fluid statics, the Bernoulli equation, fluid kinematics, control-volume and differential analysis, dimensional analysis, viscous flow in pipes, flow over immersed bodies, open-channel flow, and turbomachines.
Abstract: Fluid Statics. Elementary Fluid Dynamics: The Bernoulli Equation. Fluid Kinematics. Finite Control Volume Analysis. Differential Analysis of Fluid Flow. Similitude, Dimensional Analysis, and Modeling. Viscous Flow in Pipes. Flow Over Immersed Bodies. Open-Channel Flow. Turbomachines. Appendices. Answers. Index.

322 citations


"Energy-aware cooling for hot-water ..." refers background in this paper

  • ...Young transportation theorem [19], [20]), the overall system dynamics can be expressed as...


Proceedings Article
07 Nov 2010
TL;DR: 3D-ICE, a compact transient thermal model (CTTM) for the thermal simulation of 3D ICs with multiple inter-tier microchannel liquid cooling, is presented, which offers significant speed-up over a typical commercial computational fluid dynamics simulation tool while preserving accuracy.
Abstract: Three-dimensional stacked integrated circuits (3D ICs) are extremely attractive for overcoming the barriers in interconnect scaling, offering an opportunity to continue the CMOS performance trends for the next decade. However, from a thermal perspective, vertical integration of high-performance ICs in the form of 3D stacks is highly demanding, since the effective areal heat dissipation increases with the number of dies (with hotspot heat fluxes up to 250 W/cm²), generating high chip temperatures. In this context, inter-tier integrated microchannel cooling is a promising and scalable solution for high heat flux removal. A robust design of a 3D IC and its subsequent thermal management depend heavily upon accurate modeling of the effects of liquid cooling on the thermal behavior of the IC during the early stages of design. In this paper we present 3D-ICE, a compact transient thermal model (CTTM) for the thermal simulation of 3D ICs with multiple inter-tier microchannel liquid cooling. The proposed model is compatible with existing thermal CAD tools for ICs and offers a significant speed-up (up to 975×) over a typical commercial computational fluid dynamics simulation tool while preserving accuracy (i.e., a maximum temperature error of 3.4%). In addition, a thermal simulator has been built on 3D-ICE that can run in parallel on multicore architectures, offering further savings in simulation time and demonstrating efficient parallelization of the proposed approach.
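Compact thermal models of this kind represent the stack as a network of thermal resistances and capacitances and integrate it in time. The sketch below is a deliberately reduced version, not 3D-ICE's actual discretization (which uses a 3-D grid of cells including microchannel cells): a 1-D RC ladder of layers, heated at the bottom die and cooled at the top through a constant convective conductance to coolant at a fixed temperature, integrated with forward Euler. All parameter values are hypothetical; forward Euler is stable only while `dt` is small relative to the ratio of cell capacitance to conductance.

```python
def simulate_stack(n_cells, power, t_cool, c_cell, g_link, g_conv, dt, steps):
    """Forward-Euler integration of a 1-D thermal RC ladder.

    Cell 0 is the heat-dissipating die; cell n_cells-1 exchanges heat with the
    coolant through g_conv. Units: W, degC, J/K, W/K, s.
    """
    temps = [t_cool] * n_cells  # start in equilibrium with the coolant
    for _ in range(steps):
        heat_in = [0.0] * n_cells  # net heat flow into each cell (W)
        heat_in[0] += power
        for i in range(n_cells - 1):  # conduction between neighbouring layers
            q = g_link * (temps[i] - temps[i + 1])
            heat_in[i] -= q
            heat_in[i + 1] += q
        heat_in[-1] -= g_conv * (temps[-1] - t_cool)  # convection into coolant
        temps = [t + dt * h / c_cell for t, h in zip(temps, heat_in)]
    return temps


final = simulate_stack(n_cells=3, power=10.0, t_cool=30.0, c_cell=0.5,
                       g_link=2.0, g_conv=1.0, dt=0.01, steps=20_000)
```

At steady state every resistance in series carries the full 10 W, so the layers settle at 30 + 10/1 = 40 °C, 40 + 10/2 = 45 °C and 50 °C, which gives a quick sanity check on the transient solver.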

296 citations


"Energy-aware cooling for hot-water ..." refers methods in this paper

  • ...If the focus is put at chip level optimization, numerical CFD methods [10], or complex nonlinear identification tools [11], can be used....


Book
01 Jan 2005
TL;DR: A reference text covering thermodynamics, fluid mechanics, heat transfer, and two-phase flow and heat transfer, together with applications and engineering mathematics.
Abstract: Thermodynamics. Fluid Mechanics. Heat Transfer. Two-Phase Flow and Heat Transfer. Applications. Engineering Mathematics.

86 citations


"Energy-aware cooling for hot-water ..." refers background in this paper

  • ...Young transportation theorem [19], [20]), the overall system dynamics can be expressed as...


Journal Article
TL;DR: It is shown that the problem of processor speed control subject to thermal constraints for the environment is a convex optimization problem, and an efficient infeasible-start primal-dual interior-point method for solving the problem is presented.
Abstract: We consider the problem of adjusting speeds of multiple computer processors, sharing the same thermal environment, such as a chip or multichip package. We assume that the speed of each processor (and associated variables such as power supply voltage) can be controlled, and we model the dissipated power of a processor as a positive and strictly increasing convex function of the speed. We show that the problem of processor speed control subject to thermal constraints for the environment is a convex optimization problem. We present an efficient infeasible-start primal-dual interior-point method for solving the problem. We also present a distributed method, using dual decomposition. Both of these approaches can be interpreted as nonlinear static control laws, which adjust the processor speeds based on the measured temperatures in the system. We give numerical examples to illustrate performance of the algorithms.
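The dual-decomposition idea can be sketched on a toy instance. Assumptions (not taken from the paper): the objective is to maximize total speed, power is p = s³ (convex and increasing, as the abstract requires), temperatures follow a linear model T = T_amb + G p, and multipliers are updated by projected gradient ascent with a fixed step. Given the multipliers, each processor solves its own one-dimensional problem, max s_i - c_i s_i³, in closed form; the multipliers then rise on violated temperature limits and fall otherwise. In a real controller the temperatures would be measured rather than computed from G. The matrix, limits, and step size are hypothetical.

```python
import math


def speed_control(G, t_amb, t_max, s_min, s_max, step=2e-4, iters=5000):
    """Dual decomposition for
        maximize sum(s)  s.t.  t_amb + G @ (s**3) <= t_max,  s_min <= s <= s_max.
    G[j][i]: temperature rise at location j per watt dissipated by processor i."""
    m, n = len(G), len(G[0])  # m thermal constraints, n processors
    lam = [0.0] * m           # one price per thermal constraint
    s, temps = [s_max] * n, [t_amb] * m
    for _ in range(iters):
        # Local step: processor i maximizes s_i - c_i * s_i**3 on its own.
        for i in range(n):
            c = sum(lam[j] * G[j][i] for j in range(m))
            s_opt = math.inf if c == 0.0 else 1.0 / math.sqrt(3.0 * c)
            s[i] = min(s_max, max(s_min, s_opt))
        temps = [t_amb + sum(G[j][i] * s[i] ** 3 for i in range(n)) for j in range(m)]
        # Coordination step: projected gradient ascent on the dual variables.
        lam = [max(0.0, lam_j + step * (t - t_max)) for lam_j, t in zip(lam, temps)]
    return s, temps


G = [[1.0, 0.5], [0.5, 1.0]]
speeds, temps = speed_control(G, t_amb=25.0, t_max=75.0, s_min=0.5, s_max=4.0)
```

In this symmetric example both constraints are active at the optimum, so each processor settles at s = (100/3)^(1/3) ≈ 3.218 and both temperatures converge to the 75 °C limit. The step size matters: too large a step makes the prices oscillate instead of converging.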

63 citations

Journal Article
TL;DR: A novel design-time/run-time thermal management strategy for improving energy efficiency in 3-D MPSoCs through liquid cooling management and dynamic voltage and frequency scaling (DVFS).
Abstract: 3-D stacked systems reduce communication delay in multiprocessor systems-on-chip (MPSoCs) and enable heterogeneous integration of cores, memories, sensors, and RF devices. However, vertical integration of layers exacerbates temperature-induced problems such as reliability degradation. Liquid cooling is a highly efficient solution to overcome the accelerated thermal problems in 3-D architectures; however, it brings new challenges in modeling and run-time management for such 3-D MPSoCs with multitier liquid cooling. This paper proposes a novel design-time/run-time thermal management strategy. The design-time phase involves a rigorous thermal impact analysis of various thermal control variables. We then use this analysis to design a run-time fuzzy controller for improving energy efficiency in 3-D MPSoCs through liquid cooling management and dynamic voltage and frequency scaling (DVFS). The fuzzy controller adjusts the liquid flow rate dynamically to match the cooling demand of the chip, preventing overcooling and maintaining a stable thermal profile. The DVFS decisions increase chip-level energy savings and help balance the temperature across the system. Our controller is used in conjunction with temperature-aware load balancing and dynamic power management strategies. Experimental results on 2-tier and 4-tier 3-D MPSoCs show that our strategy prevents the system from exceeding the given threshold temperature. At the same time, we reduce cooling energy by up to 63% and system-level energy by up to 21% compared with statically setting the flow rate to handle worst-case temperatures.
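The flow-rate side of such a fuzzy controller can be illustrated with a tiny rule base. This is a generic sketch, not the paper's controller: it uses one input (peak chip temperature), three fuzzy sets (low, medium, high) with hypothetical breakpoints at 60, 70 and 80 °C, three rules mapping them to hypothetical flow settings of 20%, 60% and 100%, and weighted-average (zero-order Sugeno) defuzzification. The DVFS part of the paper's strategy is not modeled.

```python
def tri(x, a, b, c):
    """Triangular membership: 0 at a, rising to 1 at b, back to 0 at c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)


def flow_rate(temp_c):
    """Map peak chip temperature (degC) to a flow setting in [0.2, 1.0]."""
    low = max(0.0, min(1.0, (70.0 - temp_c) / 10.0))   # left shoulder
    medium = tri(temp_c, 60.0, 70.0, 80.0)
    high = max(0.0, min(1.0, (temp_c - 70.0) / 10.0))  # right shoulder
    # Rules: low -> 20 % flow, medium -> 60 %, high -> 100 %.
    weights = [low, medium, high]
    outputs = [0.2, 0.6, 1.0]
    return sum(w * o for w, o in zip(weights, outputs)) / sum(weights)


settings = {t: flow_rate(t) for t in (50, 65, 75, 90)}
```

Between the breakpoints the output blends neighbouring rules linearly (for example 0.4 at 65 °C and 0.8 at 75 °C). This smooth blending is what lets such a controller avoid running the pump at the worst-case flow when the chip is cool.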

56 citations


"Energy-aware cooling for hot-water ..." refers methods in this paper

  • ...In [1], [2], [3], [4] energy efficient cooling control solutions, adjusting the liquid flow rate, are explored for novel technologies, based on 3D MPSoCs with inter-tier liquid cooling systems....
