A component-based framework for certification of components in a cloud of HPC services
TL;DR: A Verification-as-a-Service (VaaS) framework for component certification on HPC Shelf is presented, aimed at providing higher confidence that components of parallel computing systems of HPC Shelf behave as expected according to one or more requirements expressed in their contracts.
Abstract: HPC Shelf is a proposal of a cloud computing platform to provide component-oriented services for High Performance Computing (HPC) applications. This paper presents a Verification-as-a-Service (VaaS) framework for component certification on HPC Shelf. Certification is aimed at providing higher confidence that components of parallel computing systems of HPC Shelf behave as expected according to one or more requirements expressed in their contracts. To this end, new abstractions are introduced, starting with certifier components. They are designed to inspect other components and verify them for different types of functional, non-functional and behavioral requirements. The certification framework is naturally based on parallel computing techniques to speed up verification tasks.
Summary
- HPC Shelf is a cloud computing platform aimed at addressing domain-specific, computationally intensive problems typically emerging from computational science and engineering domains.
- In HPC Shelf, applications must be able to identify and combine components to form parallel computing systems.
- Through the proposed framework, components called certifiers may use a set of different certification tools to certify that the components of parallel computing systems meet a certain set of requirements.
- The case studies used to demonstrate the proposed certification framework are particularly focused on functional and behavioral requirements that can be verified through automated verification methods and tools, such as theorem provers and model checkers.
- From this assessment, a number of outstanding features and contributions have been identified in favor of the certification framework of HPC Shelf.
2. HPC Shelf
- HPC Shelf is a cloud computing platform that provides HPC services for providers of domain-specific applications.
- An application is a problem-solving environment through which specialist users, the end users of HPC Shelf, specify problems and obtain computational solutions for them.
- It is assumed that these solutions are computationally intensive, thus demanding the use of large-scale parallel computing infrastructure, i.e. comprising multiple parallel computing platforms engaged in a single computational task.
- Applications generate computational solutions as component-oriented parallel computing systems.
- To do so, these components comply with Hash, a parallel component model whose components may exploit parallel processing in distributed-memory parallel computing platforms.
2.1. Component kinds of parallel computing systems
- Component platforms that comply with the Hash component model distinguish components according to a set of component kinds.
- Action bindings connect a set of action ports belonging to computation and connector components.
- It may be programmed by using a general-purpose programming language (currently, C#) or SAFeSWL (SAFe Scientific Workflow Language), an XML-based orchestration language designed for activating the computational tasks of the solution components in a prescribed order.
- In a MapReduce parallel computing system, mappers, reducers, splitters, and shufflers may be connected through bindings between their compatible collect_pairs and feed_pairs ports.
- In turn, the connectors have only the read_chunk/finish_chunk and chunk_ready action names, distributed in their facets.
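The binding rule above, where a feed_pairs port may only be connected to a compatible collect_pairs port, can be sketched as follows. This is a minimal illustrative model: the `Port` class and `compatible` function are hypothetical names, not the actual HPC Shelf API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Port:
    owner: str       # component that holds the port, e.g. "splitter"
    name: str        # "feed_pairs" or "collect_pairs"
    pair_type: str   # key/value type carried, e.g. "(int, str)"

def compatible(feed: Port, collect: Port) -> bool:
    # A feed_pairs port may be bound to a collect_pairs port
    # that carries the same key/value pair type.
    return (feed.name == "feed_pairs"
            and collect.name == "collect_pairs"
            and feed.pair_type == collect.pair_type)

# A splitter feeds pairs that a mapper collects:
splitter_out = Port("splitter", "feed_pairs", "(int, str)")
mapper_in = Port("mapper", "collect_pairs", "(int, str)")
```

Under this sketch, a binding is only established when the port roles and pair types match, mirroring the compatibility requirement stated above.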
- The following stakeholders interact with HPC Shelf: the specialists (end users) use applications to specify problems through a domain-specific interface.
- They do not deal directly with components, which are hidden behind the domain-specific abstractions of the application interface.
- The providers create and deploy applications, by designing their high-level interfaces and by programming the generation of parallel computing systems.
- For that, they are experts in parallel computer architectures and parallel programming.
- Through contextual contracts, they may specify the architectural features of the virtual platforms they support.
- The multilayer cloud architecture of HPC Shelf for servicing applications comprises the three elements in Fig. 5: Frontend, Core and Backend.
- The Frontend is SAFe (Shelf Application Framework), a collection of classes and design patterns used by providers to build applications.
- It supports SAFeSWL as a language for specifying parallel computing systems.
- In turn, using the orchestration subset, the provider may program its workflow, by specifying the order in which action names must be activated.
- Applications access the services of the Core for resolving contextual contracts and deploying the selected components on virtual platforms.
2.5. Contextual contracts
- Since both abstract components named Mapper and Reducer are derived from MRComputation, function is used to specify the custom map and reduce functions that they will execute in particular MapReduce computations.
- A contextual contract is an abstract component whose context parameters have particular execution context assumptions associated to each one of them.
- In the classification phase, the list of candidate system components is ordered taking into account the best fulfillment of the contract requirements and the resource allocation policies of HPC Shelf.
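The classification phase described above can be sketched as a ranking over candidate components. This is an illustrative simplification, not the actual resolution code: the `score` heuristic and candidate dictionaries are hypothetical, and the real procedure also weighs HPC Shelf's resource allocation policies.

```python
def score(candidate: dict, contract: dict) -> int:
    """Number of contract requirements the candidate fulfils exactly."""
    return sum(1 for k, v in contract.items() if candidate.get(k) == v)

def classify(candidates: list, contract: dict) -> list:
    """Order candidates by decreasing contract fulfilment (best first)."""
    return sorted(candidates, key=lambda c: score(c, contract), reverse=True)

# Hypothetical contract and candidate descriptions:
contract = {"function": "word_count", "language": "C#"}
candidates = [
    {"name": "MapperA", "function": "word_count", "language": "C"},
    {"name": "MapperB", "function": "word_count", "language": "C#"},
]
ranking = classify(candidates, contract)   # MapperB fulfils both requirements
```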
3. The certification framework
- For the purpose of leveraging component certification in HPC Shelf, a certification framework is introduced in this section.
- It encompasses a set of component kinds, composition rules and design patterns.
- They provide an environment where certification tools can be encapsulated into components to provide some level of assurance to application providers and component developers that components of parallel computing systems meet a predetermined set of requirements prior to their instantiation.
- Each certifier associated with a certifiable component may impose its own set of obligations on compatible certifiable components, such as the use of certain programming languages, design patterns, code conventions, annotations, etc.
- The service interface determines which kind of ad hoc properties are supported and how they are specified.
3.1. Parallel certification systems
- Certifier components are implemented as parallel certification systems, comprising the following architectural elements, as depicted in Fig. 6: A set of tactical components; A certification-workflow component that orchestrates the tactical ones; A set of bindings, connecting the tactical components to the certification-workflow component.
- The certification-workflow component performs a certification procedure on the certifiable components connected to the certifier.
- Parallel certification systems are analogous to parallel computing systems, but aimed at certification purposes.
- In turn, tactical components play the role of solution components.
- For this reason, they must be seen as special kinds of virtual platforms on which the proof infrastructure is installed and ready to run verification tasks.
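The orchestration just described, where a certification-workflow component drives a set of tactical components and each verification task ends conclusively or inconclusively, can be sketched as below. All names (`run_tactical`, `certify`, the lambda stand-ins) are illustrative, not the framework's actual interfaces.

```python
from concurrent.futures import ThreadPoolExecutor

def run_tactical(prove) -> str:
    # Activating 'perform' on a tactical component eventually yields
    # either 'conclusive' or 'inconclusive'.
    try:
        return "conclusive" if prove() else "inconclusive"
    except Exception:
        return "inconclusive"

def certify(tacticals: dict) -> bool:
    # The certifiable component passes only if every verification task
    # run by the tactical components terminates conclusively.
    with ThreadPoolExecutor() as pool:
        verdicts = list(pool.map(run_tactical, tacticals.values()))
    return all(v == "conclusive" for v in verdicts)

# Stand-ins for real verification subroutines:
tasks = {
    "deadlock_absence": lambda: True,
    "protocol_obedience": lambda: True,
}
```

The thread pool stands in for the parallelism of the certification system: tactical components run on their own virtual platforms and may verify different properties simultaneously.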
3.2. Tactical components
- As stated earlier, a tactical component encapsulates a certification infrastructure comprising one or more certification tools.
- An action port with the action names perform, conclusive and inconclusive, where the latter two are mutually exclusive alternatives.
- When the certification subroutine terminates, either conclusive or inconclusive may be activated by the tactical component.
- In the current implementation, the contextual signature of Tactical, the component type from which specific tactical components are derived, is similar to that of EnvironmentPortCertification.
3.3. Certifier components
- In the orchestration code, an activation of the action certify will instantiate a parallel certification system for each certifier, which will certify the certifiable component in parallel.
- Each one may be reused to certify all certifiable components associated with the same certifier, when their certify actions are activated.
- After the certification procedure, a certifiable component is considered certified if all default, contractual, component and ad hoc properties have been checked by the certifier.
- Regarding the action ports, the certification-workflow component has ports to be connected to the action ports of its tactical components.
- The units may run on different data partitions and synchronize by exchanging messages in a discipline akin to the MPI programming model.
4.1. Tactical components for C4
- The verification of computation components may resort to two different classes of methods and tools.
- The first class is based on deductive program verification, which partially automates axiomatic program verification based on some variant of the Floyd-Hoare logic.
- The alternative approach explores the space of reachable states of a system through model checking.
4.1.1. Deductive tactical components
- Tactical components for deductive verification require the target component programs to be annotated with assertions in the style of the Floyd-Hoare logic or its extensions, namely: separation logic, for mutable data structures; Owicki-Gries reasoning, for shared-memory parallel programs; and Apt's reasoning, for distributed-memory components.
- Actually, only ParTypes can verify C/MPI programs, annotated in the syntax of VCC, against a high-level communication protocol stored by the certifier as an ad hoc property.
- In such a case, they are equipped with reasonably complex interfaces for editing, searching and choosing suitable proof procedures and heuristics.
- Using this approach, the application may either automatically interact with the tactical component or require some intervention of the specialist user to proceed with the verification subroutine.
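The annotation style such deductive tools consume can be illustrated by the following sketch, transliterated into executable Python assertions. Real tools check pre/postconditions and loop invariants statically (e.g. via verification conditions discharged to provers), not at run time; the function and its contracts here are purely illustrative.

```python
def integer_sqrt(n: int) -> int:
    assert n >= 0                                # precondition
    r = 0
    while (r + 1) * (r + 1) <= n:
        assert r * r <= n                        # loop invariant
        r += 1
    assert r * r <= n < (r + 1) * (r + 1)        # postcondition
    return r
```

A deductive tactical component would attempt to prove that the postcondition follows from the precondition and the invariant for every execution, rather than checking individual runs.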
4.1.2. Model checking tactical components
- Model checking provides a powerful alternative to deductive verification tools to establish properties of MPI programs.
- In the context of the certification framework discussed in this article, the following tools were explored: ISP (In-situ Partial Order) and CIVL (Concurrency Intermediate Verification Language).
- Both verify a fixed, although sufficiently expressive, set of safety properties.
- The former handles deadlock absence, assertion violations, MPI object leaks, and communication races (i.e. unexpected communication matches) in components written in C, C++ or C#, carrying MPI/OpenMP directives.
- CIVL, in turn, is also able to establish functional equivalence between programs and to discharge verification conditions to the provers Z3, CVC3 and CVC4.
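The state-space exploration underlying such model checkers can be sketched in miniature: enumerate the reachable joint states of two processes whose blocking sends fire only together with a matching receive, and flag any stuck, unfinished state as a deadlock. This toy rendezvous semantics and the `check_deadlock` function are assumptions for illustration, far simpler than what ISP or CIVL actually implement for MPI.

```python
from collections import deque

def check_deadlock(prog0, prog1):
    """BFS over the joint state space of two processes; returns a
    deadlocked state (stuck before both programs finish) or None."""
    def enabled(s):
        pc0, pc1 = s
        moves = []
        if pc0 < len(prog0) and pc1 < len(prog1):
            # rendezvous: a send fires together with the peer's recv
            if prog0[pc0] == "send" and prog1[pc1] == "recv":
                moves.append((pc0 + 1, pc1 + 1))
            if prog1[pc1] == "send" and prog0[pc0] == "recv":
                moves.append((pc0 + 1, pc1 + 1))
        return moves

    seen, queue = {(0, 0)}, deque([(0, 0)])
    while queue:
        s = queue.popleft()
        succ = enabled(s)
        finished = s == (len(prog0), len(prog1))
        if not succ and not finished:
            return s                  # no enabled transition: deadlock
        for t in succ:
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return None
```

Two head-to-head blocking sends deadlock immediately, while reordering one process's operations avoids it; this is exactly the class of communication error ISP detects in real MPI components.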
4.2. Contextual contracts and architecture
- Thus, C4 certifiers may prescribe the host programming language on which the computation component is written, as well as the message passing library for communication between the units of the computation component.
- Also, certifiers can determine whether or not the ad hoc properties are supported.
- The tactical components of C4MPIComplex are ISP, JWFA and CZ, whose abstract components restrict the bounds of the context parameters of Tactical to define the interface types through which they talk to the certification-workflow component.
- Like most scientific workflow management systems, such as Askalon, BPEL Sedna, Kepler, Pegasus, Taverna and Triana, SAFeSWL usually represents workflows as components and execution dependencies among them, adopting abstract descriptions of components and abstracting away from the computing platforms on which they run.
- At an appropriate time of the workflow execution, a resolution procedure may be triggered for discovering an appropriate component implementation, making it relevant to ensure that the activation of computational actions of components is made after their effective resolution.
5.2. Architecture and contextual contracts
- SWC2 prescribes two default properties: Deadlock absence; Obedience to the protocol in which lifecycle actions must be activated, for each component, presented in Section 2.1.
- It supports all the default properties prescribed for SWC2 certifiers, as well as ad hoc properties.
- There are two concrete components of mCRL2Certifier.
- mCRL2Certifier certifiers may be able to exploit parallelism by initiating different verification tasks on distinct processing nodes of the tactical component.
- For simplifying formulas, mCRL2 allows the use of regular expressions over the set of actions as possible labels of both necessity and eventuality modalities.
5.3.1. The translation process
- The translation process follows directly the operational rules (Fig. 10) defined for an abstract grammar of a formal specification of the orchestration subset of SAFeSWL (Fig. 9).
- Rule finish indicates that an asynchronously activated action can actually occur, having its handle registered in F and emitting the action (a, h) to the system.
- Rules select-left and select-right indicate the need for the creation of mCRL2 processes that control the state (enabled/disabled) of actions.
- Rules repeat, continue and break indicate, respectively, the need for the creation of an mCRL2 process that manages a repetition task in order to detect the need for a new iteration, the return to the beginning of the iteration, or the end of the iteration.
- The first part of the conjunction states that a deploy may not be performed before a resolve.
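The lifecycle-ordering property just stated (a deploy may not occur before a resolve, and so on through instantiate and run) can be illustrated as a check over concrete workflow traces. This is only a sketch of the property's meaning: SWC2 actually verifies it on the mCRL2 model via modal mu-calculus formulas, not by inspecting traces, and `obeys_lifecycle` is a hypothetical name.

```python
LIFECYCLE = ["resolve", "deploy", "instantiate", "run"]

def obeys_lifecycle(trace):
    """trace: list of (component, action) events. Each action may occur
    only after its predecessor in LIFECYCLE has occurred for that
    component (e.g. no deploy before the corresponding resolve)."""
    done = {}                          # component -> last completed rank
    for comp, action in trace:
        rank = LIFECYCLE.index(action)
        if rank != done.get(comp, -1) + 1:
            return False               # action out of order
        done[comp] = rank
    return True
```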
6. Case studies
- Three case studies demonstrate the certification framework of HPC Shelf, as well as the use of C4 and SWC2 certifiers.
- The workflow maintains three versions of each image, which overlap the center of the Pleiades cluster, each corresponding to a different color band: red, infrared and blue.
- Fig. 12(a) presents the architecture of a single subworkflow.
- To do this, the application provider must configure certification bindings between these component instances and one or more instances of C4MPISimple.
- Fig. 15 reports execution times for this certification case study, varying the number of processing nodes and cores per node involved in the execution of the tactical component ISP.
6.2.1. The non-iterative system with three stages
- They are the intermediate stages of a pipeline.
- An intermediary is required for communication between reducer_2 and application, since they are placed on distinct virtual platforms.
- The workflow of the non-iterative system initially performs the lifecycle action activation sequence (resolve, deploy, instantiate, and run) for all components, because, in a pipeline pattern, they are required from the beginning of the computation.
- Then, it does the same for the computations and connectors that will be placed on these virtual platforms.
- After all the iterations are terminated, the parallel activation completes and all components are released.
6.2.2. The iterative system with a single stage
- In the iterative workflow (Fig. 17), the single stage consists of a shuffler and a pair of parallel reducers.
- In turn, before the next iterations, the following (abridged) orchestration code is executed to enable the collector facets that receive pairs from the reducers and disable the collector facet that receives pairs from source: <parallel> <sequence> <invoke port="task_shuffle_collector_active_status_0" action="CHANGE_STATUS_BEGIN…
- Mappers and reducers receive chunks of input pairs in the first iteration (read_chunk/finish_chunk activation) and process them (invocation to the mapping or reduction function) when the action perform is activated.
- The former describes precedences of execution between two distinct components or component actions.
- Fig. 18 depicts the average certification times for both workflows by varying the number of units (processing nodes) of the tactical component from 1 to 16.
6.3. Parallel sorting
- Parallel sorting is often used in HPC systems when dealing with huge amounts of data.
- The contextual signature of Sorting declares a set of context parameters that may guide the choice of a sorting component that implements the supposedly best algorithm according to the contextual contract.
- Sorting_place states whether internal or external sorting must be employed. The remaining parameters are described below.
- Contrariwise, it may employ a noncomparison-based algorithm.
- In turn, number_nodes, multicore, and accelerator_type are so-called platform parameters, since they describe properties of the underlying parallel computing platform that must be taken into account in the component implementation.
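How such context parameters might guide implementation selection can be sketched as follows. The parameter names echo the Sorting contract described above, but the selection logic and the component names other than QuickSortImpl and MergeSortImpl (e.g. ExternalMergeSortImpl, RadixSortImpl) are hypothetical.

```python
def choose_sorting(contract: dict) -> str:
    """Pick a Sorting implementation from contract arguments (sketch)."""
    if contract.get("sorting_place") == "external":
        return "ExternalMergeSortImpl"       # hypothetical component
    if contract.get("comparison_based", True):
        # e.g. prefer Mergesort when stability is required
        return "MergeSortImpl" if contract.get("stable") else "QuickSortImpl"
    return "RadixSortImpl"                   # hypothetical non-comparison sort

# A hypothetical contract for in-memory, comparison-based sorting:
contract = {"sorting_place": "internal", "comparison_based": True,
            "number_nodes": 4, "multicore": True}
```

In HPC Shelf, this decision is made by the resolution procedure over contextual contracts rather than by ad hoc code, but the flavor of the mapping from contract arguments to concrete components is the same.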
6.3.1. Certifying parallel sorting components
- Let QuickSortImpl and MergeSortImpl be two concrete components of Sorting that implement parallel versions of the well-known Quicksort and Mergesort algorithms, respectively.
- Also, virtual platforms containing 2 processing nodes have been chosen for all tactical components.
- For both QuickSortImpl and MergeSortImpl, all default properties of C4MPIComplex have been proved successfully.
- The parallel times calculated for this case study make it possible to conclude that, in general, the smallest times occur for tactical components with a single unit running on a processing node with many cores.
- The case studies with Montage, MapReduce and Integer Sorting are primarily aimed at demonstrating the feasibility of certifying components of distinct kinds using the certification framework of HPC Shelf.
- Since all the certification processes involved in the case studies completed successfully, the experiments demonstrate this feasibility.
- Therefore, it is important to emphasize that the experiments whose results are evaluated in this article do not have the ambition to constitute a definitive validation study of the certification framework of HPC Shelf.
- The case studies have also shown how the inherent parallelism supported by the certification framework, using the parallel computing infrastructure of HPC Shelf itself, may be used to accelerate certification tasks, even if the underlying certification tools have not been developed with parallelism in mind, which is the case of the theorem provers and model checkers used in the experiments.
- To reinforce this expectation, it is worth noting that, although the current implementation of the certification framework is not optimized for performance, the certification times achieved in the experiments, varying between 20 seconds and 12 minutes, are not significantly influenced by implementation overheads.
7.1. Certification of software components
- The certification of software components has been an active research area in component-based software engineering (CBSE) since the 1990s [3–5,7].
- From the current literature, the authors define the certification of software components as the study and application of methods and techniques intended to provide a well-defined level of confidence that the components of a system meet a given set of requirements.
- The literature does not mention other proposals of general-purpose certification artifacts in the context of CBHPC research, which could be directly compared to the certification framework of HPC Shelf.
- In such applications, incorrect results and execution failures may cause unsustainable increases in project costs and schedules.
7.2. Verification-as-a-service (VaaS)
- As pointed out earlier, the kind of certification focused on this paper is the verification of functional and behavioral properties of components of parallel computing systems in a cloud environment through formal methods, automated by deductive and model-checking tools.
- The authors have systematically searched for related work on VaaS in the most comprehensive databases of scientific literature in computer science: IEEE, Scopus, ACM and Science Direct, applying the search string “(platform OR framework) AND service AND formal AND verification AND component AND cloud” to title, abstract and keywords fields.
- Most of the discarded papers do not propose platforms or frameworks for the intended purpose.
- The column Total 1 represents the number of distinct papers found, i.e. after removing redundancies.
- The following sections (7.2.1 and 7.2.2) describe the papers classified in these two groups, respectively.
7.2.1. Verification of cloud administration concerns
- Evangelidis et al. propose a probabilistic verification scheme aimed at dynamically evaluating auto-scaling policies of IaaS and PaaS virtual machines in Amazon EC2 and Microsoft Azure.
- For that, it applies a Markov model implemented in the PRISM model checker.
- Zhou et al. propose a formal framework for resource provisioning as a service.
- Di Cosmo et al. propose the Aeolus component model.
- The architecture and methodology for enabling SDV to operate in Azure, as well as the results of SDV on single drivers and driver suites using various configurations of the cloud relative to a local machine are reported.
7.2.2. Verification of functional requirements
- Nezhad et al. propose COOL, a framework for provider-side design of cloud solutions based on formal methods and model-driven engineering.
- Klai and Ochi address the problem of abstracting and verifying the correctness of integrating service-based business processes (SBPs).
- Skowyra et al. present Verificare, a verification platform for applications based on Software-Defined Networks (SDN).
- Three layers compose the framework: a graphical layer, which uses sequence diagrams for system modeling; a formal specification layer, which uses π-calculus to formalize the UML sequence diagram; and a verification layer, in which π-calculus processes are verified by NuSMV.
- It parallelizes symbolic execution, a popular model checking technique, to run on large shared-nothing clusters of computers, such as Amazon EC2.
- In comparison with the related work described above, the certification framework of HPC Shelf has the following distinguishing characteristics:
- It is a general-purpose framework that can be used for automatic certification of a wide range of requirements, including both functional and non-functional, while other works address a particular requirement.
- For that, it may employ the same parallel computing infrastructure where certifiable components perform their tasks.
- In turn, certifier selection is a component developer responsibility, using contextual contracts.
- Also, when designing certifier components, certification authorities may provide high-level interfaces to facilitate the interaction of application providers and component developers with the underlying verification tools.