Journal Article

Performability evaluation of multipurpose multiprocessor systems: the "separation of concerns" approach

01 Feb 2003 - IEEE Transactions on Computers (IEEE Computer Society) - Vol. 52, Iss. 2, pp. 223-236
TL;DR: This work provides a modeling framework for evaluating performability measures of Multipurpose, Multiprocessor Systems (MMSs) and evaluates comprehensive measures defined with respect to the end-user service requirements and specific measures in relation to the distributed shared memory paradigm.
Abstract: The aim of our work is to provide a modeling framework for evaluating performability measures of Multipurpose, Multiprocessor Systems (MMSs). The originality of our approach is in the explicit separation between the architectural and environmental concerns of a system. The overall dependability model, based on stochastic reward nets, is composed of 1) an architectural model describing the behavior of system hardware and software components, 2) a service-level model, and 3) a maintenance policy model. The two latter models are related to the system utilization environment. The results can be used for supporting the manufacturer design choices as well as the potential end-user configuration selection. We illustrate the approach on a particular family of MMSs under investigation by a system manufacturer for Internet and e-commerce applications. As the systems are scalable, we consider two architectures: a reference one composed of 16 processors and an extended one with 20 processors. Then, we use the obtained results to evaluate the performability of a clustered system composed of four reference systems. We evaluate comprehensive measures defined with respect to the end-user service requirements and specific measures in relation to the distributed shared memory paradigm.

Summary

1. Introduction

  • Computer systems are becoming more and more complex.
  • Even though particular applications may need the design of proprietary systems to achieve the required functions, economic reasons promote the use of support systems available on the market.
  • The considered MMS family is scalable, most of its components are COTS and it features a distributed shared memory.
  • It is thus important to take the utilizations targeted by the manufacturer into account when evaluating system performability.
  • In the following, Section 2 describes the authors' modeling approach and presents current approaches to model construction.

2. Modeling Approach

  • The model built for evaluating system dependability and performance is always a tradeoff between faithfulness (i.e., correct representation of the real system's behavior) and tractability (i.e., ability to solve the model equations to obtain the measures) [19].
  • If the system to analyze is too large, a detailed model representing its in-depth behavior may be faithful but might be non-tractable because of its size.
  • The use of an appropriate modeling technique such as Generalized Stochastic Petri Nets (GSPNs) [1] offers good support.
  • This is one of the aims of this paper, in which modeling is based on Stochastic Reward Nets (SRNs) [13], which are obtained from GSPNs by assigning reward rates to tangible markings.
  • In the rest of this section, the authors present their modeling approach and outline some other modeling approaches.
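To make the idea of "reward rates assigned to tangible markings" concrete, here is a minimal sketch rather than the authors' model. It assumes a hypothetical three-state birth-death Markov chain (nominal, degraded, down) with made-up failure and repair rates, computes its steady-state probabilities in closed form, and weights them by hypothetical reward rates. All names and numbers are illustrative assumptions.

```python
def birth_death_steady_state(fail_rates, repair_rates):
    """Steady-state probabilities of a birth-death chain.

    fail_rates[i] is the rate from state i to i+1,
    repair_rates[i] is the rate from state i+1 back to i.
    """
    weights = [1.0]
    for lam, mu in zip(fail_rates, repair_rates):
        # Detailed balance: pi[i+1] * mu = pi[i] * lam
        weights.append(weights[-1] * lam / mu)
    total = sum(weights)
    return [w / total for w in weights]


def expected_reward_rate(probabilities, reward_rates):
    """Expected steady-state reward: sum over states of reward * probability."""
    return sum(p * r for p, r in zip(probabilities, reward_rates))


# Hypothetical rates (per hour) and rewards, chosen only for illustration.
pi = birth_death_steady_state(fail_rates=[1e-3, 1e-3], repair_rates=[0.1, 0.5])
rewards = [1.0, 0.5, 0.0]  # nominal, degraded, down
reward = expected_reward_rate(pi, rewards)
print(pi, reward)
```

A real SRN tool derives the chain from the net's reachability graph; the closed form used here only works because the example chain is birth-death.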

2.1 Separation of Concerns Approach

  • Even though it is possible to model a multipurpose system without considering the end-user requirement in terms of expected services, the results are more accurate if the environmental aspects are incorporated.
  • Hence, their modeling approach takes into account explicitly the environmental context together with the architectural concerns.
  • The latter incorporates several kinds of information.
  • For their purpose, the authors consider two models corresponding to the service-level and maintenance policy models.

2.1.1 Architectural Model

  • The architectural model describes the behavior of system components (including all basic hardware and software components as well as fault tolerance mechanisms) as resulting from elementary failures, repairs and interactions regardless of system utilization environment.
  • The set of failure mode places is grouped into a separate layer, referred to as the failure mode layer.
  • In [24], the authors have defined rules for efficiently constructing the component model.
  • The failure mode layer is used to facilitate the interface between the architectural and environmental models.
  • Also, the information that the component model requires from the service-level and maintenance policy models is available in the failure mode layer.

2.1.2 Service-Level Model

  • The system service levels are defined with respect to the different service accomplishment and degradation state classes accepted by the considered end-user application.
  • The service levels are thus partitions of the architecture failure modes.
  • In the service level model, places represent the various service levels as seen by the end-user.
  • One and only one place can be marked at any time (including the initial marking) indicating the system current service level.
  • Indeed, the service-level model is not autonomous.
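The statement that service levels are partitions of the architecture failure modes can be sketched as follows. The failure mode names come from the summary (OK, B, D0, DD, D1, D2, D3, C); the grouping into levels ES, mD, MD and NS is a hypothetical assumption, since the actual grouping per user is given in tables of the paper that are not reproduced here.

```python
FAILURE_MODES = {"OK", "B", "D0", "DD", "D1", "D2", "D3", "C"}

# Hypothetical grouping for one end-user; not taken from the paper's tables.
PARTITION_EXAMPLE = {
    "ES": {"OK", "B"},
    "mD": {"D0", "DD"},
    "MD": {"D1", "D2", "D3"},
    "NS": {"C"},
}


def is_partition(partition, universe):
    """True if the classes are non-empty, pairwise disjoint and cover the universe."""
    seen = set()
    for members in partition.values():
        if not members or seen & members:
            return False
        seen |= members
    return seen == universe


def service_level(partition, mode):
    """Return the single service level that contains the given failure mode."""
    for level, members in partition.items():
        if mode in members:
            return level
    raise KeyError(mode)


print(is_partition(PARTITION_EXAMPLE, FAILURE_MODES))
print(service_level(PARTITION_EXAMPLE, "D1"))
```

Because the classes form a partition, exactly one service level corresponds to any failure mode, which mirrors the rule that one and only one service-level place is marked at any time.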

2.1.3 Maintenance Policy Model

  • The maintenance policy model describes the maintenance types (e.g., immediate, delayed or programmed), repair resources availability and might include (re)booting procedures, according to the service level and/or the architectural failure mode.
  • The interconnection between the service-level and maintenance policy models represents, for example, calls for maintenance or notification of maintenance end.

2.1.4 Benefits of our Modeling Approach

  • Even though modeling is performed from a system manufacturer point of view, the authors incorporate explicitly the end-user point of view in order to evaluate comprehensive performability measures that are especially relevant to the end-user, in addition to specific measures that may be too detailed for the end-user but of prime interest for the system manufacturer.
  • Indeed, the same model could be used to evaluate comprehensive and specific measures.
  • The separation of concerns property allows reuse of the architectural model, when the same architecture is to be used in different environments.
  • Obviously, changing the architectural model will require changing its links with the service-level and maintenance models.
  • As these links are clearly identified and formalized, the changes can be performed more easily than when the various models are not separated.

2.2 Other Modeling Approaches

  • The authors' work is related to performability modeling of multipurpose multiprocessor systems.
  • More generally, concerning computer system dependability and performance modeling, the main difficulty for model construction results from the large number of states of the associated model.
  • Techniques for dealing with this largeness can be grouped into two categories: largeness tolerance and largeness avoidance techniques [32].
  • The basic idea is to construct small sub-models that can be processed in isolation.

3.1 Reference and Extended Systems

  • The extended architecture features a fifth node used as a spare.
  • Since the four basic nodes of the extended architecture are used in the same manner as in the reference architecture and the spare node replaces the first failed node, the remainder of the section will focus on the reference architecture.

3.1.1 Reference Architecture

  • As depicted in Figure 2, the nodes are connected through an interconnection network composed of two redundant rings.
  • To ensure high availability, all nodes are connected to a centralized redundant diagnostic equipment (DEq).
  • Its role is to log and analyze all error events, and to initiate and control system reboots.
  • The MMS has two external RAIDs (Redundant Array of Independent Disks), ED1 and ED2, with on-line maintenance.
  • Nevertheless, if a node or its interconnection controller fails, the RAID can be used by the other node.

3.1.2 Fault tolerance facilities

  • Reboot with all on-line tests (complete reboot) or with a reduced test set (fast reboot).
  • This assumption can be easily modified for those failures needing system reconfiguration without or with a fast reboot.
  • A node is available as long as at least one processor, one memory bank, one power supply unit, and the local and PCI buses with their controllers are available; otherwise, the node is considered as lost and is isolated.
  • For modeling purposes, the authors have associated a coverage factor with each fault tolerance mechanism.
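The node availability rule above can be written as a simple predicate. This is only a sketch: the `NodeState` type and its field names are hypothetical, each bus and its controller are lumped into one boolean, and the number of power supply units used in the example is an assumption.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    """Hypothetical snapshot of the resources still working in one node."""
    processors_up: int
    memory_banks_up: int
    power_supplies_up: int
    local_bus_up: bool  # local bus together with its controller (assumption)
    pci_bus_up: bool    # PCI bus together with its controller (assumption)


def node_available(node: NodeState) -> bool:
    """A node stays available while every required resource kind is present."""
    return (
        node.processors_up >= 1
        and node.memory_banks_up >= 1
        and node.power_supplies_up >= 1
        and node.local_bus_up
        and node.pci_bus_up
    )


# One processor and one memory bank left: degraded, but still available.
degraded = NodeState(1, 1, 1, True, True)
# All memory banks lost: the node is considered lost and is isolated.
lost = NodeState(4, 0, 2, True, True)
print(node_available(degraded), node_available(lost))
```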

3.1.3 Architectural Failure Modes

  • Given the large number of components and the varied impact of their failures on system state and performance, a multitude of failure modes can be defined, related to the nature of the failed components.
  • The authors have carried out a Preliminary Risk Analysis to investigate the consequences of all component failures, individually and in combination.
  • The seven identified failure modes are given in Table 1, among which D0, DD, D1, D2 and D3 correspond to performance degradation states and C to system total failure.
  • B gathers the set of states in which one or more components among the redundant components are lost, but the processing capacity of the architecture is not impaired.
  • Two additional states are added for modeling purposes: the nominal state (OK) and the reboot state (Reb).

3.1.4 Distributed Shared Memory

  • The system features a distributed shared memory, composed of the four RAMs of the nodes (i.e., the sixteen memory banks of the system).
  • As the overall memory is directory based using cache coherent non-uniform memory access (CC-NUMA), the access time to a remote bank is longer, which reduces system performance.
  • A node can thus be maintained available as long as it has at least one memory bank available among the 4 local banks (when the four memory banks are lost, the node is considered as unavailable).
  • When several memory banks are lost, the overall workload is reorganized.
  • These specific failure modes are identified explicitly to allow the evaluation of the impact of memory failure on system performability.

3.1.5 End-Users and Service Levels

  • The above MMS properties and the failure modes are user-independent.
  • Between B and C, several service degradation levels can be defined according to end-user needs and to the architecture failure modes.
  • User Y needs the entire capacity of the system and does not accept any service degradation.
  • The states belonging to the same service degradation level are grouped into a state class.
  • These classes are denoted respectively: SES, SmD, SMD and SNS.

3.1.6 Maintenance

  • The system supplier usually proposes a maintenance contract.
  • In order not to suspend system activities after each failure and call for a paying maintenance, two maintenance policies are defined.
  • Immediate maintenance applies to the states where no service is available (state class SNS); even then, some time is needed for the maintenance team to arrive on site.
  • Delayed maintenance applies to the other states, in which the service is degraded.
  • Different delays may be allocated to different service levels.

3.2 Clustered Systems

  • The manufacturer plans to use the reference architecture as a building block in more complex clustered systems.
  • The clusters have similar failure modes as the ones defined for the reference architecture.
  • The clustered system failure modes are defined in Table 3.
  • Also, the authors defined an OKc state, in which all clusters are in their OK state, and a Rebc state, which represents the simultaneous reboot of all the clusters.

4. Dependability and Performability Measures

  • As stated earlier, the authors consider comprehensive and specific measures.
  • Comprehensive measures are defined according to service levels.
  • Specific measures concern specific features of the system.
  • They concern mainly the distributed shared memory.

4.1 Comprehensive Measures

  • The authors define four common dependability measures for X and Y: two conventional measures, denoted A for availability and UA for unavailability, and two measures refining the unavailability, UANS for the unavailability due to one of the SNS class state failure modes and UAReb for the unavailability due to the reboot procedure.
  • Furthermore, the availability measure for X is refined into three dependability levels corresponding to the three service levels (ES, mD and MD).
  • Table 5 presents the defined measures for X and Y according to the partitions defined in Section 3.1.5 (where e(t) denotes the system state at time t).
  • Since the system continues to operate in the presence of component failures, with performance degradation, performability measures are of prime interest as emphasized in [2], [21] and [31] for practical cases.
  • The authors assume here that all states in a given service level have the same reward rate.
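The measures above can be illustrated with a small sketch, assuming the steady-state probabilities of the state classes are already known. The probabilities and reward rates below are hypothetical values, not results from the paper, and the key "Reb" is an assumed label for the reboot state.

```python
# Hypothetical steady-state probabilities of the state classes for user X.
class_probability = {
    "SES": 0.9990,   # entire service
    "SmD": 0.0006,   # minor degradation
    "SMD": 0.0002,   # major degradation
    "SNS": 0.00015,  # no service
    "Reb": 0.00005,  # reboot in progress
}

# One reward rate per service level (all states of a level share it).
reward_rate = {"SES": 1.0, "SmD": 0.8, "SMD": 0.5, "SNS": 0.0, "Reb": 0.0}


def dependability_measures(p):
    """Availability and its complement, with the unavailability split in two."""
    availability = p["SES"] + p["SmD"] + p["SMD"]
    ua_ns = p["SNS"]
    ua_reb = p["Reb"]
    return {"A": availability, "UA": ua_ns + ua_reb, "UANS": ua_ns, "UAReb": ua_reb}


def expected_reward_rate(p, r):
    """Performability as the probability-weighted sum of class reward rates."""
    return sum(p[c] * r[c] for c in p)


measures = dependability_measures(class_probability)
performability = expected_reward_rate(class_probability, reward_rate)
print(measures, performability)
```

With reward rates of at most 1 in the available classes, the expected reward rate can never exceed the availability, which is why the two are reported side by side.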

5. Dependability Modeling

  • The dependability model heavily depends on the evaluated measures.
  • In their case, for each system the authors have built the same model of the selected measures.
  • The extended system architectural model differs by an additional node sub-model and the associated switching procedure taking into account the switching coverage factor.
  • To give an idea about model complexity, the GSPN of the reference system has 183 places, 376 transitions and 82 tokens (in the initial marking) for Xr (182 places and 351 transitions for Yr).
  • The authors illustrate the approach on the GSPN of the four processors of a node in Figure 5 that shows only the main states and transitions.

5.1 Model Description

  • The upper part gives a simplified view of the service-level and maintenance models of Xr. Places on the left side represent the four service levels and places on the right side represent the maintenance states: Pmain (call for delayed maintenance), PReb (reboot state) and PM (repairman availability).
  • The lower part of the figure gives the architectural model with the failure mode layer, for Xr and Yr. For the sake of clarity, not all transitions are shown.
  • Note that these conditions depend on the marking of place PNS in the service-level model and places PM and PReb in the maintenance policy model.
  • Indeed, these places are the only places that are used to enable firing of transitions in the component model directly, without going through the failure mode layer (as stated in Section 2.1.1).
  • The service level has to be updated accordingly.

5.2 Interactions Between the Models

  • The authors show how the architectural model communicates with service-level and maintenance policy models.
  • The failure of the last processor (transition Tphl) leads to a major service degradation level (because the node becomes unavailable).
  • Pps represents the number of processors with a software failure waiting for reboot and Pph those with a hardware failure waiting for repair.
  • The failure is thus recorded in the failure mode layer to update the service-level model.
  • The firing of tmDMD1 changes the current service level from mD to MD and puts a token in PD1w meaning that the failure has been recorded and is waiting for repair.
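The firing of tmDMD1 can be sketched with a toy Petri net marking. Only the names tmDMD1, PD1w and PNS appear in the summary; the places PES, PmD, PMD and PD1, as well as the arcs of the transition, are hypothetical assumptions made so that the example is self-contained.

```python
def enabled(marking, inputs):
    """A transition is enabled when every input place holds enough tokens."""
    return all(marking.get(place, 0) >= n for place, n in inputs.items())


def fire(marking, inputs, outputs):
    """Return the marking reached by firing the transition once."""
    if not enabled(marking, inputs):
        raise ValueError("transition is not enabled")
    new_marking = dict(marking)
    for place, n in inputs.items():
        new_marking[place] -= n
    for place, n in outputs.items():
        new_marking[place] = new_marking.get(place, 0) + n
    return new_marking


# Hypothetical arcs of tmDMD1: it consumes the current level (PmD) and a
# failure recorded in the failure mode layer (PD1, an assumed place name),
# then marks the new level (PMD) and the "waiting for repair" place PD1w.
TMDMD1_INPUTS = {"PmD": 1, "PD1": 1}
TMDMD1_OUTPUTS = {"PMD": 1, "PD1w": 1}

before = {"PES": 0, "PmD": 1, "PMD": 0, "PNS": 0, "PD1": 1, "PD1w": 0}
after = fire(before, TMDMD1_INPUTS, TMDMD1_OUTPUTS)
print(after)
```

In this sketch exactly one service-level place stays marked before and after the firing, in line with the rule stated for the service-level model.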

6. Results

  • Based on the models of the reference architecture and the extended architecture, as well as on the nominal parameter values, the dependability measures presented in Section 4 have been evaluated and several sensitivity analyses have been carried out.
  • The nominal values of the model parameters (failure rates, repair time, maintenance delay and arrival times and coverage factors) have been either provided by the system manufacturer or assigned according to their experience.
  • Sensitivity analyses have been performed to evaluate their relative influence on dependability and performability measures.
  • For all tables, the bold lines represent the results for nominal parameter values.
  • Finally, the authors present measures related to the clustered system.

6.1 User X

  • The immediate maintenance time is equal to the sum of the maintenance arrival time and the repair time (cf. Section 3.1.6).
  • Table 6 shows that Lm, the cumulative time per year spent in SmD, is not sensitive to the repair time; while LM, the cumulative time per year spent in SMD, and the unavailability are affected.
  • Periodic maintenance could be more expensive than on-request maintenance if the time between two maintenance interventions (i.e., its period) is too short (to improve system availability).
  • Table 10 shows that system dependability is affected by the value of this failure rate.
  • Let us assume that the performance index of a given state class represents the percentage of system processing capacity in this class.

6.2 User Y

  • The availability results according to the repair time are summarized in Table 13.
  • Considering the nominal values, the MTTF is 7175h and the MDT due to SNS is 3h09.
  • This means that immediate maintenance is called on average a little bit more than once a year and that the reboot time of 24 min corresponds to a system reboot after maintenance: the system does not exercise reboots in SES as all failures are tolerated without system reconfiguration.
  • Sensitivity analyses with respect to processor hardware failures and to RCC failures give results similar to those obtained for Xr.
  • Spare node: considering the nominal values of the parameters, the time during which the spare is used is 93h08.
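The claim that immediate maintenance is called "a little bit more than once a year" can be checked from the quoted MTTF and MDT. The sketch below assumes a simple alternating cycle of one up period (MTTF) followed by one down period (MDT) and a year of 8760 hours; both are assumptions of this illustration rather than statements from the paper.

```python
HOURS_PER_YEAR = 8760.0  # assumption: non-leap year

mttf_hours = 7175.0           # MTTF quoted for user Y
mdt_hours = 3.0 + 9.0 / 60.0  # MDT due to SNS, 3h09 expressed in hours


def visits_per_year(mttf, mdt):
    """Average number of up/down cycles completed per year."""
    return HOURS_PER_YEAR / (mttf + mdt)


calls = visits_per_year(mttf_hours, mdt_hours)
print(round(calls, 2))
```

Under these assumptions the result is about 1.2 calls per year, consistent with the wording of the summary.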

6.3 Tradeoff Availability - Performance

  • Columns 3 and 4 of Table 15 report the expected reward rate and availability of the four systems for the nominal values of the parameters (for Y, the expected reward rate is equal to the availability).
  • They show that the reference architecture provides better availability for user X and better performance for Y: there is thus a tradeoff between system performance and availability.
  • On the other hand, the extended architecture provides better availability and better performance for Y.
  • The difference between X and Y availability is not significant.
  • This result is sensitive to the coverage factor c (the probability of successful switching from the failing node to the spare one), as shown in columns 5 and 6 that give the same measures for a lower coverage factor (respectively 0.95 and 0.85).

6.4 Specific Measures

  • Table 16 gives the expected reward rate at steady state for various performance index values qi associated with the sixteen specific failure modes Memi, selected hypothetically as no experimental values were available when the authors made the study.
  • In the three chosen cases, the authors assumed that the memory service is considered as unavailable when the unavailable memory banks reach 50% of the overall distributed shared memory (i.e., more than 7 unavailable memory banks).
  • If the memory mean access time without any memory loss is t0, it becomes ti (> t0) for Memi.
  • Table 16 shows that the expected reward rate at steady state is slightly influenced by the performance index values: the decrease is 0.9% between the best case (case 1) and the worst case (case 3).
  • This decrease is not important for a degradable highly available system (with A = 99.98% and UA less than 2 hours per year).
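Two figures quoted above can be verified with simple arithmetic: an availability of 99.98% corresponds to less than 2 hours of unavailability per year, and 50% of the sixteen memory banks means that the threshold is reached once more than 7 banks are unavailable. The sketch assumes a year of 8760 hours; the variable names are illustrative.

```python
HOURS_PER_YEAR = 8760.0  # assumption: non-leap year

availability = 0.9998
downtime_hours_per_year = (1.0 - availability) * HOURS_PER_YEAR

total_banks = 16
# Smallest number of lost banks that reaches 50% of the shared memory.
threshold_banks = total_banks // 2

print(downtime_hours_per_year, threshold_banks)
```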

6.5 Clustered Systems

  • As mentioned in Section 3.2, the authors have considered a clustered system based on the studied reference architecture.
  • Since the clustered systems are composed of four reference architectures, it is possible, under some independence assumptions, to compute the clustered systems performability by combining measures of the reference architecture.
  • Nevertheless, more complex assumptions can be used.
  • For the defined measures, Table 18 shows the impact of PCR, the probability that the common resources (mainly the interconnection ring) are unavailable and PRC, the probability that the ring controller is down.
  • For the two latter, the lowest unavailability (obtained for the reference system with a repair time of 1 hour) is 58 min / year to be compared to about 5 min / year for the clustered system.

6.6 Summary of Results

  • User Y: system unavailability is mainly due to the maintenance time.
  • The No Service states are reached, on average, once a year and the system is rebooted once a year (following system maintenance).
  • The addition of a spare considerably reduces system unavailability.
  • The availability of clustered systems is very high even compared to the availability of the extended architecture.
  • Indeed, the clustered system is to be used for applications requiring very high-availability.

7. Conclusions

  • This paper was devoted to the brief presentation of a performability modeling approach and its application to a particular family of multipurpose, multiprocessor systems (MMS).
  • Its originality is in the separation between the architectural and the environmental concerns.
  • The results obtained for this MMS can be classified into two categories: those supporting the manufacturer choices and those that will support the potential end-users choices.
  • Of course, these results are not independent and have to be used together.
  • From the end-user perspective, the results concern the selection of the maintenance policy (delayed or immediate maintenance) and the choice between the reference architecture and the extended one, and more generally between all available solutions.


HAL Id: hal-01977511
https://hal.laas.fr/hal-01977511
Submitted on 10 Jan 2019
Performability Evaluation of Multipurpose Multiprocessor Systems: The "Separation of Concerns" Approach
Mourad Rabah, Karama Kanoun
To cite this version:
Mourad Rabah, Karama Kanoun. Performability Evaluation of Multipurpose Multiprocessor Systems: The "Separation of Concerns" Approach. IEEE Transactions on Computers, Institute of Electrical and Electronics Engineers, 2003, 52 (2), 10.1109/TC.2003.1176988. hal-01977511

Accepted for publication in IEEE Transactions on Computers
Special Issue on Reliable Distributed Systems
Performability Evaluation of Multipurpose Multiprocessor Systems:
The “Separation of Concerns” Approach
Mourad Rabah and Karama Kanoun
LAAS-CNRS 7, Avenue du Colonel Roche 31077 Toulouse Cedex 4 France
Tel: +33/561 33 6235, Fax: +33/561 33 6411
kanoun@laas.fr
Abstract
The aim of our work is to provide a modeling framework for evaluating performability measures of
Multipurpose, Multiprocessor Systems (MMSs). The originality of our approach is in the explicit
separation between the architectural and environmental concerns of a system. The overall dependability
model, based on Stochastic Reward Nets, is composed of i) an architectural model describing the
behavior of system hardware and software components, ii) a service-level model and iii) a maintenance
policy model. The two latter models are related to the system utilization environment. The results can be
used for supporting the manufacturer design choices as well as the potential end-user configuration
selection. We illustrate the approach on a particular family of MMSs under investigation by a system
manufacturer for Internet and e-commerce applications. As the systems are scalable, we consider two
architectures: a reference one composed of sixteen processors and an extended one with twenty
processors. Then, we use the obtained results to evaluate the performability of a clustered system
composed of four reference systems. We evaluate comprehensive measures defined with respect to the
end-user service requirements and specific measures in relation to the Distributed Shared Memory
paradigm.
Index terms. Dependability and Performability Evaluation, Stochastic Reward Nets, Modular Modeling,
Multipurpose Multiprocessors Systems, Distributed Shared Memory, Clustered Systems.

1. Introduction
Computer systems are becoming more and more complex. Even though particular applications
may need the design of proprietary systems to achieve the required functions, economic reasons
promote the use of support systems available on the market. This situation gave rise to a category of
systems that are referred to as multipurpose systems in this paper. Such systems are to some
extent application-independent support systems. Many system providers have developed generic
support systems that can be used in several application domains, for example for instrumentation
and control or Web related applications.
Actually, most of the time such systems are composed of Commercial Off-The-Shelf components
(COTS). However, despite their wide use, COTS components are far from being highly dependable.
As a consequence, a careful design, development and validation of COTS-based systems are needed
to provide dependability and high-performance. Indeed, the end-users of a given system want at the
same time high dependability and performance, ease of use and low cost. Furthermore, low
dependability or low performance of a system will ruin the manufacturer's reputation. Hence the
importance of the evaluation of combined dependability and performance measures referred to as
performability measures [20].
Our work was motivated by the desire of a system manufacturer to evaluate the performability of
a family of Multipurpose Multiprocessor Systems (MMSs) under development, intended for
Internet and e-commerce applications. The considered MMS family is scalable, most of its
components are COTS and it features a distributed shared memory. A reference architecture
composed of 16 processors grouped into 4 nodes has been defined. It can be used for various
applications requiring different performance and/or dependability levels. Based on this reference
architecture, a whole family can be designed for applications necessitating either higher
performance or higher dependability or both. In particular, up to 8 reference architectures can be
grouped to form a clustered system. Our first aim was to provide a framework for modeling the
performability of different systems of this family. However, the approach presented in this paper is
more general and can be used beyond this particular family of systems.
Even though an MMS is application-independent, usually the system manufacturer targets some
particular classes of utilization. He expects to provide high dependability and performance, at least
for these targeted utilizations. It is thus important to take them into account for evaluating system
performability.
The originality of our approach is in the explicit separation between the architectural and the
environmental concerns of a system. In this way, we can clearly analyze the impact of the
architectural choices on system performability while the end-user context is taken into account
explicitly. Moreover, the results can be used for supporting the manufacturer design choices as well
as the potential end-user configuration selection.
This approach allows us to consider two sets of performability measures:
· Comprehensive measures, defined with respect to service accomplishment levels that are directly
related to the end-user expectations.
· Specific measures, related to specific features of the system, which are of particular interest to the
system manufacturer for tuning the support system architecture, for example to study the impact of
using a distributed shared memory.
This modeling approach is presented briefly in this paper and applied to two systems pertaining to
the MMS family defined by the system manufacturer who initiated this work. The results presented
illustrate the kind of outcomes that can be obtained from the MMS performability evaluation. They
show the impact of architectural solutions as well as the impact of different utilization environments
on system performability. This paper elaborates on our previous work reported in [25, 26] ([25] was
dedicated to the specific measures related to the distributed shared memory, while [26] has not
addressed the clustered system and the specific measures).
In the following, Section 2 describes our modeling approach and presents current approaches to
model construction. Section 3 introduces the systems under consideration and Section 4 defines

Citations
Proceedings Article
28 Apr 2010
TL;DR: This paper proposes an online reliability monitoring approach, which combines static reliability modeling and dynamic analysis to periodically evaluate system reliability trend during operation.
Abstract: Reliability is one of the major concerns for software engineers. The increasing size of software systems and their inherent complexity - which is essentially related to the intricate interdependencies among many heterogeneous components - pose serious difficulties to its assessment and assurance. The actual system runtime behavior is difficult to forecast during the development phase, and just relying upon sound design and testing techniques is often not sufficient to deliver highly reliable systems. In order to guarantee high reliability, system behavior needs to be monitored at runtime and its reliability needs to be periodically estimated during operation, taking into account both structural/static and behavioral/dynamic information. In this paper, we propose an online reliability monitoring approach, which combines static reliability modeling and dynamic analysis to periodically evaluate system reliability trend during operation. Its usage is illustrated by a prototype implementation and a case study.

23 citations

Journal Article
C. Constantinescu
TL;DR: The main trends in dependability of semiconductor devices are discussed, and a candidate architecture for a fault-tolerant microprocessor is presented to mitigate the impact of the higher rates of occurrence of transient & intermittent faults.
Abstract: High dependability has become a paramount requirement for computing systems, as they are increasingly used in business & life critical applications. Advances in the design & manufacturing of semiconductor devices have increased the performance of computing systems at a dazzling pace. However, smaller transistor dimensions, lower power voltages, and higher operating frequencies have negatively impacted dependability by increasing the probability of occurrence of transient & intermittent faults. This paper discusses the main trends in dependability of semiconductor devices, and presents a candidate architecture for a fault-tolerant microprocessor. Dependability of the processor is analyzed, and the advantages provided by fault tolerance are underscored. The effect of the higher rates of occurrence of the transient & intermittent faults on a typical microprocessor is evaluated with the aid of GSPN modeling. Dependability analysis shows that a five-times increase of the rate of occurrence of the transients leads to about five times lower MTBF, if no error recovery mechanisms are employed. Significantly lower processor availability is also observed. The fault-tolerant processor is devised to mitigate the impact of the higher transient & intermittent fault rates. The processor is based on core redundancy & state checkpointing, and supports three levels of error recovery. First, recovery from a saved state (SSRC) is attempted. The second level consists of a retry (SSRR), and is activated when the first level of recovery fails. Processor reset, followed by reintegration under the operating system control (RB), is the third level of recovery. Dependability analysis, based on GSPN, shows that fault-tolerance features of the processor preserve the MTBF, even if the rate of the transient faults nearly doubles. In terms of availability, a four-times increase of the rate of occurrence of the transients is compensated. The effect of intermittent faults is also analyzed. A five-times increase of the failure rate of the intermittent faults may lower MTBF by 31% to 33%. MTBF decreases even more, by 45% to 67%, if bursts of errors are considered. Intermittent faults have a negative impact on availability as well. Maintaining the dependability of complex integrated circuits at the level available today is becoming a challenge as semiconductor integration continues at a fast pace. Fault avoidance techniques, mainly based on process technology & circuit design, will not be able to fully mitigate the impact of higher rates of occurrence of transient & intermittent faults. As a result, fault-tolerant features, specific to custom designed components today, ought to be employed by COTS circuits in the future. Enhanced concurrent error detection & correction, self checking circuits, space & time redundancy, triplication, and voting all need to be integrated into semiconductor devices in general, and microprocessors in particular, in order to improve fault & error handling.

22 citations


Cites methods from "Performability evaluation of multip..."

  • ...Readers can find additional details about these modeling techniques & application examples in [4], [7], [14], [36]....

    [...]

Journal ArticleDOI
TL;DR: This paper addresses the lack of formalized composition patterns for reusable submodels by formally defining the concept of model templates, defining a specification language for model templates, defining an automated instantiation and composition algorithm, and applying the approach to a case study of a large-scale distributed system.
Abstract: Dependability and performance analysis of modern systems is facing great challenges: their scale is growing, they are becoming massively distributed, interconnected, and evolving. Such complexity makes model-based assessment a difficult and time-consuming task. For the evaluation of large systems, reusable submodels are typically adopted as an effective way to address the complexity and to improve the maintainability of models. When using state-based models, a common approach is to define libraries of generic submodels, and then compose concrete instances by state sharing, following predefined “patterns” that depend on the class of systems being modeled. However, such composition patterns are rarely formalized, or not even documented at all. In this paper, we address this problem using a model-driven approach, which combines a language to specify reusable submodels and composition patterns, and an automated composition algorithm. Clearly defining libraries of reusable submodels, together with patterns for their composition, allows complex models to be automatically assembled, based on a high-level description of the scenario to be evaluated. This paper provides a solution to this problem focusing on: formally defining the concept of model templates, defining a specification language for model templates, defining an automated instantiation and composition algorithm, and applying the approach to a case study of a large-scale distributed system.

14 citations

Proceedings ArticleDOI
13 May 2014
TL;DR: This paper describes the workflow for the automated generation of large performability models, and introduces the TMDL language, a DSL to concretely support the workflow, and details their implementation within the Eclipse modeling platform.
Abstract: Dependability and performance analysis of modern systems is facing great challenges: their scale is growing, they are becoming massively distributed, interconnected, and evolving. Such complexity makes model-based assessment a difficult and time-consuming task. For the evaluation of large systems, reusable submodels are typically adopted as an effective way to address the complexity and improve the maintainability of models. Approaches based on Stochastic Petri Nets often compose submodels by state-sharing, following predefined "patterns", depending on the scenario of interest. However, such composition patterns are typically not formalized. Clearly defining libraries of reusable submodels, together with valid patterns for their composition, would allow complex models to be automatically assembled, based on a high-level description of the scenario to be evaluated. The contribution of this paper to this problem is twofold: on one hand we describe our workflow for the automated generation of large performability models, on the other hand we introduce the TMDL language, a DSL to concretely support the workflow. After introducing the approach and the language, we detail their implementation within the Eclipse modeling platform, and briefly show its usage through an example.

13 citations


Cites background from "Performability evaluation of multip..."

  • ...In such approaches (e.g., see [10, 20, 31, 37]) the overall analysis model is built out of well-defined submodels addressing specific aspects of the systems, which are then composed following predefined rules based on the actual scenario to be represented....

    [...]

Journal Article
TL;DR: This paper summarizes the state of knowledge and ongoing research on methods and techniques for resilience evaluation, taking into account the resilience-scaling challenges and properties related to the ubiquitous computerized systems.
Abstract: This paper summarizes the state of knowledge and ongoing research on methods and techniques for resilience evaluation, taking into account the resilience-scaling challenges and properties related to the ubiquitous computerized systems. We mainly focus on quantitative evaluation approaches and, in particular, on model-based evaluation techniques that are commonly used to evaluate and compare, from the dependability point of view, different architecture alternatives at the design stage. We outline some of the main modeling techniques aiming at mastering the largeness of analytical dependability models at the construction level. Actually, addressing the model largeness problem is important with respect to the investigation of the scalability of current techniques to meet the complexity challenges of ubiquitous systems. Finally we present two case studies in which some of the presented techniques are applied for modeling web services and General Packet Radio Service (GPRS) mobile telephone networks, as prominent examples of large and evolving systems.

13 citations


Cites background from "Performability evaluation of multip..."

  • ...Contrarily to the join/replicate formalism that requires the use of a special operation, the graph composition detects all the symmetries exposed at the composition level and uses them to reduce the underlying state space....

    [...]

References
Journal ArticleDOI
TL;DR: It is shown that GSPNs are equivalent to continuous-time stochastic processes, and solution methods for the derivation of the steady-state probability distribution are presented.
Abstract: Generalized stochastic Petri nets (GSPNs) are presented and are applied to the performance evaluation of multiprocessor systems. GSPNs are derived from standard Petri nets by partitioning the set of transitions into two subsets comprising timed and immediate transitions. An exponentially distributed random firing time is associated with each timed transition, whereas immediate transitions fire in zero time. It is shown that GSPNs are equivalent to continuous-time stochastic processes, and solution methods for the derivation of the steady-state probability distribution are presented. Examples of application of GSPN models to the performance evaluation of multiprocessor systems show the usefulness and the effectiveness of this modeling tool.

1,394 citations
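
As a small illustration of the steady-state solution mentioned in the abstract above, the following sketch solves the balance equations pi * Q = 0 with sum(pi) = 1 for the continuous-time Markov chain that would underlie a tiny net containing only timed transitions. The three-state repairable two-processor model, the rates `lam` and `mu`, and the helper name `steady_state` are assumptions chosen for illustration; none of them come from the cited paper.

```python
from fractions import Fraction


def steady_state(Q):
    """Solve pi * Q = 0 with sum(pi) = 1 for an irreducible CTMC generator Q."""
    n = len(Q)
    # Row i of A is column i of Q, i.e. the balance equation for state i.
    A = [[Fraction(Q[j][i]) for j in range(n)] for i in range(n)]
    # The balance equations are linearly dependent, so replace the last
    # one with the normalization condition sum(pi) = 1.
    A[-1] = [Fraction(1)] * n
    b = [Fraction(0)] * (n - 1) + [Fraction(1)]
    # Gauss-Jordan elimination in exact rational arithmetic.
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        inv = 1 / A[col][col]
        A[col] = [a * inv for a in A[col]]
        b[col] *= inv
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
                b[r] -= f * b[col]
    return b


# Hypothetical model: states are (2 up, 1 up, 0 up) processors,
# each processor fails at rate lam, a single repairman repairs at rate mu.
lam = Fraction(1, 100)
mu = Fraction(1)
Q = [
    [-2 * lam, 2 * lam, 0],
    [mu, -(lam + mu), lam],
    [0, mu, -mu],
]
pi = steady_state(Q)
availability = pi[0] + pi[1]  # probability that at least one processor is up
print(pi, availability)
```

Exact fractions are used here only to keep the toy example free of rounding; a model of realistic size would normally be solved numerically by a dedicated tool.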

Proceedings ArticleDOI
11 Dec 1989
TL;DR: SPNP, a powerful GSPN package that allows the modeling of complex system behaviors, is presented and is compared with two other SPN-based packages, GreatSPN and METASAN.
Abstract: SPNP, a powerful GSPN package that allows the modeling of complex system behaviors, is presented. Advanced constructs are available in SPNP such as marking-dependent arc multiplicities, enabling functions, arrays of places or transitions, and subnets; the full expressive power of the C programming language is also available to increase the flexibility of the net description. Sophisticated steady-state and transient solvers are available including cumulative and up-to-absorption measures. The user is not limited to a predefined set of measures; detailed expressions reflecting exactly the measures sought can be easily specified. The authors conclude by comparing SPNP with two other SPN-based packages, GreatSPN and METASAN.

619 citations


"Performability evaluation of multip..." refers background in this paper

  • ...Also, numerous evaluation tools using GSPNs and their off-springs have been developed (e.g., SPNP [12], SURF-2 [5], SHARPE [28], UltraSAN [29] and DEEM [7])....

    [...]

Journal ArticleDOI
TL;DR: Any SRN containing one or more of the nearly-independent structures, commonly encountered in practice, can be analyzed using the decomposition approach presented, and this technique is applied to the analysis of a flexible manufacturing system.

280 citations


"Performability evaluation of multip..." refers methods in this paper

  • ...This is one of the aims of this paper in which modeling is based on Stochastic Reward Nets (SRNs) [13], that are obtained from GSPNs by assigning reward rates to tangible markings....

    [...]
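
The relation described in the excerpt above can be written compactly. As a hedged sketch in standard notation (the symbols are assumptions, not taken from the paper): let $\mathcal{T}$ be the set of tangible markings, $r_i$ the reward rate assigned to marking $i$, and $\pi_i$ its steady-state probability. The expected steady-state reward rate is then

```latex
E[R] = \sum_{i \in \mathcal{T}} r_i \, \pi_i
```

With $r_i = 1$ for operational markings and $r_i = 0$ otherwise, $E[R]$ reduces to the steady-state availability; other reward assignments yield performability measures.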

Journal ArticleDOI
TL;DR: This paper presents an approach for avoiding the large state space problem and uses a hierarchical modeling technique for analyzing complex reliability models that allows the flexibility of Markov models where necessary and retains the efficiency of combinatorial solution where possible.
Abstract: Combinatorial models such as fault trees and reliability block diagrams are efficient for model specification and often efficient in their evaluation. But it is difficult, if not impossible, to allow for dependencies (such as repair dependency and near-coincident-fault type dependency), transient and intermittent faults, standby systems with warm spares, and so on. Markov models can capture such important system behavior, but the size of a Markov model can grow exponentially with the number of components in this system. This paper presents an approach for avoiding the large state space problem. The approach uses a hierarchical modeling technique for analyzing complex reliability models. It allows the flexibility of Markov models where necessary and retains the efficiency of combinatorial solution where possible. Based on this approach a computer program called SHARPE (Symbolic Hierarchical Automated Reliability and Performance Evaluator) has been written. The hierarchical modeling technique provides a very flexible mechanism for using decomposition and aggregation to model large systems; it allows for both combinatorial and Markov or semi-Markov submodels, and can analyze each model to produce a distribution function. The choice of the number of levels of models and the model types at each level is left up to the modeler. Component distribution functions can be any exponential polynomial whose range is between zero and one. Examples show how combinations of models can be used to evaluate the reliability and availability of large systems using SHARPE.

277 citations


"Performability evaluation of multip..." refers background in this paper

  • ...Also, numerous evaluation tools using GSPNs and their off-springs have been developed (e.g., SPNP [12], SURF-2 [5], SHARPE [28], UltraSAN [29] and DEEM [7])....

    [...]

Journal ArticleDOI
TL;DR: The design of UltraSAN reflects its two main purposes: to facilitate the evaluation of realistic computer systems and networks, and to provide a test-bed for investigating new modeling techniques.

195 citations


"Performability evaluation of multip..." refers background in this paper

  • ...Also, numerous evaluation tools using GSPNs and their off-springs have been developed (e.g., SPNP [12], SURF-2 [5], SHARPE [28], UltraSAN [29] and DEEM [7])....

    [...]

Frequently Asked Questions (2)
Q1. What are the contributions mentioned in the paper "Performability evaluation of multipurpose multiprocessor systems: the 'separation of concerns' approach"?

The aim of their work is to provide a modeling framework for evaluating performability measures of Multipurpose, Multiprocessor Systems (MMSs). The originality of their approach is in the explicit separation between the architectural and environmental concerns of a system. As the systems are scalable, the authors consider two architectures: a reference one composed of sixteen processors and an extended one with twenty processors. Then, the authors use the obtained results to evaluate the performability of a clustered system composed of four reference systems. The results can be used for supporting the manufacturer design choices as well as the potential end-user configuration selection.

Another important point of interest concerns the exploitation by the end-user of the various degradation possibilities offered by the architecture.