Author

Sanjay Kale

Other affiliations: University of Tennessee
Bio: Sanjay Kale is an academic researcher from the University of Illinois at Urbana–Champaign. The author has contributed to research in the topics Petascale computing & Resilience (network). The author has an h-index of 2 and has co-authored 2 publications receiving 968 citations. Previous affiliations of Sanjay Kale include the University of Tennessee.

Papers
Journal ArticleDOI
01 Feb 2011
TL;DR: The work of the community to prepare for the challenges of exascale computing is described, ultimately combining their efforts in a coordinated International Exascale Software Project.
Abstract: Over the last 20 years, the open-source community has provided more and more software on which the world’s high-performance computing systems depend for performance and productivity. The community has invested millions of dollars and years of effort to build key components. However, although the investments in these separate software elements have been tremendously valuable, a great deal of productivity has also been lost because of the lack of planning, coordination, and key integration of technologies necessary to make them work together smoothly and efficiently, both within individual petascale systems and between different systems. It seems clear that this completely uncoordinated development model will not provide the software needed to support the unprecedented parallelism required for peta-/exascale computation on millions of cores, or the flexibility required to exploit new hardware models and features, such as transactional memory, speculative execution, and graphics processing units. This report describes the work of the community to prepare for the challenges of exascale computing, ultimately combining their efforts in a coordinated International Exascale Software Project.

736 citations

Journal ArticleDOI
06 Apr 2014
TL;DR: This paper surveys what the community has learned in the past five years and summarizes the research problems still considered critical by the HPC community.
Abstract: Resilience is a major roadblock for HPC executions on future exascale systems. These systems will typically gather millions of CPU cores running up to a billion threads. Projections from current large systems and technology evolution predict errors will happen in exascale systems many times per day. These errors will propagate and generate various kinds of malfunctions, from simple process crashes to result corruptions.The past five years have seen extraordinary technical progress in many domains related to exascale resilience. Several technical options, initially considered inapplicable or unrealistic in the HPC context, have demonstrated surprising successes. Despite this progress, the exascale resilience problem is not solved, and the community is still facing the difficult challenge of ensuring that exascale applications complete and generate correct results while running on unstable systems. Since 2009, many workshops, studies, and reports have improved the definition of the resilience problem and provided refined recommendations. Some projections made during the previous decades and some priorities established from these projections need to be revised. This paper surveys what the community has learned in the past five years and summarizes the research problems still considered critical by the HPC community.
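
The projection that errors will occur many times per day follows from simple failure-rate arithmetic: with (approximately) independent node failures, the system-level mean time between failures (MTBF) shrinks in proportion to the node count. The back-of-the-envelope sketch below, in Python, uses an assumed per-node MTBF and illustrative node counts for exposition; these are not figures from the paper:

# System MTBF under independent, exponentially distributed node failures:
# failure rates add across nodes, so system MTBF ~= per-node MTBF / node count.
HOURS_PER_YEAR = 8766

node_mtbf_years = 10  # assumed per-node MTBF (illustrative)
node_mtbf_hours = node_mtbf_years * HOURS_PER_YEAR

for nodes in (10_000, 100_000, 1_000_000):
    system_mtbf_hours = node_mtbf_hours / nodes
    failures_per_day = 24 / system_mtbf_hours
    print(f"{nodes:>9} nodes: system MTBF ~ {system_mtbf_hours:.2f} h "
          f"(~{failures_per_day:.1f} failures/day)")

Even with an optimistic ten-year per-node MTBF, a million-node machine under these assumptions fails roughly every five minutes, which is why checkpointing and the other resilience techniques surveyed here dominate the discussion.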

275 citations


Cited by
Journal ArticleDOI
TL;DR: This work reviews the recent status of methodologies and techniques related to the construction of digital twins, mostly from a modeling perspective, providing detailed coverage of the current challenges and enabling technologies along with recommendations and reflections for various stakeholders.
Abstract: Digital twin can be defined as a virtual representation of a physical asset enabled through data and simulators for real-time prediction, optimization, monitoring, controlling, and improved decision making. Recent advances in computational pipelines, multiphysics solvers, artificial intelligence, big data cybernetics, data processing and management tools bring the promise of digital twins and their impact on society closer to reality. Digital twinning is now an important and emerging trend in many applications. Also referred to as a computational megamodel, device shadow, mirrored system, avatar or a synchronized virtual prototype, there can be no doubt that a digital twin plays a transformative role not only in how we design and operate cyber-physical intelligent systems, but also in how we advance the modularity of multi-disciplinary systems to tackle fundamental barriers not addressed by the current, evolutionary modeling practices. In this work, we review the recent status of methodologies and techniques related to the construction of digital twins mostly from a modeling perspective. Our aim is to provide a detailed coverage of the current challenges and enabling technologies along with recommendations and reflections for various stakeholders.
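
As a concrete illustration of the synchronize-predict-correct cycle the abstract describes, the sketch below implements a toy digital twin in Python: a simulator advances the twin's state, noisy telemetry corrects it, and the corrected estimate drives a control decision. The names, the cooling model, and the simple blending correction are hypothetical simplifications for exposition, not methodology from the paper:

import random

def simulate_step(temp: float, dt: float) -> float:
    """Physics-based predictor: simple Newtonian cooling toward 20 C."""
    return temp + dt * (-0.1 * (temp - 20.0))

def read_sensor(true_temp: float) -> float:
    """Stand-in for real telemetry: true asset state plus Gaussian noise."""
    return true_temp + random.gauss(0.0, 0.5)

asset_temp = 80.0  # the "physical" asset (simulated here for self-containment)
twin_temp = 70.0   # the twin starts out of sync with the asset
gain = 0.3         # blending weight for the measurement correction

for step in range(30):
    asset_temp = simulate_step(asset_temp, dt=1.0)   # the world evolves
    twin_temp = simulate_step(twin_temp, dt=1.0)     # the twin predicts
    measurement = read_sensor(asset_temp)            # data arrives
    twin_temp += gain * (measurement - twin_temp)    # the twin corrects
    if twin_temp > 50.0:                             # the twin drives decisions
        print(f"step {step:2d}: twin estimates {twin_temp:.1f} C -> keep throttling")

In a real deployment the blending step would typically be a Kalman or Bayesian filter and the predictor a multiphysics solver, but the synchronize-predict-correct-decide structure is the same.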

660 citations

Journal ArticleDOI
01 May 2014
TL;DR: This paper presents a report produced by the workshop ‘Addressing failures in exascale computing’, held in Park City, Utah, 4–11 August 2012, which summarizes and builds on the workshop’s discussions on resilience.
Abstract: We present here a report produced by a workshop on 'Addressing failures in exascale computing' held in Park City, Utah, 4-11 August 2012. The charter of this workshop was to establish a common taxonomy about resilience across all the levels in a computing system, discuss existing knowledge on resilience across the various hardware and software layers of an exascale system, and build on those results, examining potential solutions from both a hardware and software perspective and focusing on a combined approach. The workshop brought together participants with expertise in applications, system software, and hardware; they came from industry, government, and academia, and their interests ranged from theory to implementation. The combination allowed broad and comprehensive discussions and led to this document, which summarizes and builds on those discussions.

406 citations

Journal ArticleDOI
TL;DR: This work argues that accelerating scientific discovery and engineering innovation, and fostering new ideas in science and engineering, require unifying the traditionally separated fields of high-performance computing and big data analytics.
Abstract: Scientific discovery and engineering innovation require unifying traditionally separated high-performance computing and big data analytics.

373 citations

Journal ArticleDOI
TL;DR: New opportunities in materials design enabled by the availability of big data in imaging and data analytics approaches, including their limitations, in material systems of practical interest are discussed.
Abstract: Harnessing big data, deep data, and smart data from state-of-the-art imaging might accelerate the design and realization of advanced functional materials. Here we discuss new opportunities in materials design enabled by the availability of big data in imaging and data analytics approaches, including their limitations, in material systems of practical interest. We specifically focus on how these tools might help realize new discoveries in a timely manner. Such methodologies are particularly appropriate to explore in light of continued improvements in atomistic imaging, modelling and data analytics methods.

295 citations

Journal ArticleDOI
01 Feb 2013
TL;DR: This study considers multiphysics applications from algorithmic and architectural perspectives, where “algorithmic” includes both mathematical analysis and computational complexity, and “architectural” includes both software and hardware environments.
Abstract: We consider multiphysics applications from algorithmic and architectural perspectives, where “algorithmic” includes both mathematical analysis and computational complexity, and “architectural” includes both software and hardware environments. Many diverse multiphysics applications can be reduced, en route to their computational simulation, to a common algebraic coupling paradigm. Mathematical analysis of multiphysics coupling in this form is not always practical for realistic applications, but model problems representative of applications discussed herein can provide insight. A variety of software frameworks for multiphysics applications have been constructed and refined within disciplinary communities and executed on leading-edge computer systems. We examine several of these, expose some commonalities among them, and attempt to extrapolate best practices to future systems. From our study, we summarize challenges and forecast opportunities.
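
The “common algebraic coupling paradigm” the abstract refers to can be written generically as a pair of coupled residual systems; the LaTeX below is a standard illustration consistent with the abstract, not an equation quoted from the paper:

% Two physics components with fields u_1, u_2, coupled through their residuals:
\begin{aligned}
F_1(u_1, u_2) &= 0, \\
F_2(u_1, u_2) &= 0.
\end{aligned}
% A monolithic (fully coupled) Newton step solves for both fields at once:
\begin{pmatrix}
\partial F_1/\partial u_1 & \partial F_1/\partial u_2 \\
\partial F_2/\partial u_1 & \partial F_2/\partial u_2
\end{pmatrix}
\begin{pmatrix} \delta u_1 \\ \delta u_2 \end{pmatrix}
= -\begin{pmatrix} F_1 \\ F_2 \end{pmatrix}.

A monolithic scheme solves this Newton system for both fields at once, whereas operator splitting alternates: solve F_1 for u_1 with u_2 frozen, then F_2 for u_2 with the updated u_1, iterating to convergence. The trade-off between the two is exactly where the paper’s algorithmic and architectural perspectives meet.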

278 citations