Chokchai Leangsuksun
Researcher at Louisiana Tech University
Publications - 60
Citations - 1098
Chokchai Leangsuksun is an academic researcher at Louisiana Tech University. The author has contributed to research on fault tolerance and high availability. The author has an h-index of 19 and has co-authored 60 publications receiving 1043 citations. Previous affiliations of Chokchai Leangsuksun include Kent State University.
Papers
Proceedings Article
An optimal checkpoint/restart model for a large scale high performance computing system
TL;DR: This work presents a reliability-aware method for an optimal checkpoint/restart strategy that can handle a varying checkpoint interval and different failure distributions, and aims to address the fault tolerance challenge, especially in large-scale HPC systems.
Journal Article
ASC: an associative-computing paradigm
TL;DR: A parallel programming paradigm called ASC (ASsociative Computing), designed for a wide range of computing engines, that incorporates data parallelism at the base level, so that programmers do not have to specify low-level sequential tasks such as sorting, looping and parallelization.
Proceedings Article
Reliability-Aware Approach: An Incremental Checkpoint/Restart Model in HPC Environments
TL;DR: This work builds a model that aims to reduce full-checkpoint overhead by performing a set of incremental checkpoints between two consecutive full checkpoints, and gives a method to find the number of those incremental checkpoints.
Proceedings Article
A Framework for Proactive Fault Tolerance
Geoffroy Vallée, Christian Engelmann, Anand Tikotekar, Thomas Naughton, K. Charoenpornwattana, Chokchai Leangsuksun, S.L. Scott +6 more
TL;DR: This document presents a proactive fault tolerance framework that can use different reactive fault tolerance mechanisms, i.e., migration and pause/un-pause, and allows the implementation of new proactive fault tolerance policies thanks to a modular architecture.
Proceedings Article
Availability modeling and analysis on high performance cluster computing systems
TL;DR: This paper proposes a single framework that coordinates event monitoring, filtering, data analysis, and dynamic availability modeling, and presents a sample analysis of real-time event logs from a 512-node cluster from Lawrence Livermore National Laboratory.