
Showing papers in "IEEE Transactions on Software Engineering in 1983"


Journal ArticleDOI
A.J. Albrecht1, J.E. Gaffney
TL;DR: In this paper, the equivalence is demonstrated between Albrecht's external input/output data flow representative of a program (the "function points" metric) and Halstead's [2] "software science" or "software linguistics" model of a program, as well as the "soft content" variation of Halstead's model suggested by Gaffney [7].
Abstract: One of the most important problems faced by software developers and users is the prediction of the size of a programming system and its development effort. As an alternative to "size," one might deal with a measure of the "function" that the software is to perform. Albrecht [1] has developed a methodology to estimate the amount of the "function" the software is to perform, in terms of the data it is to use (absorb) and to generate (produce). The "function" is quantified as "function points," essentially, a weighted sum of the numbers of "inputs," "outputs," "master files," and "inquiries" provided to, or generated by, the software. This paper demonstrates the equivalence between Albrecht's external input/output data flow representative of a program (the "function points" metric) and Halstead's [2] "software science" or "software linguistics" model of a program as well as the "soft content" variation of Halstead's model suggested by Gaffney [7].
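To make the weighted-sum definition concrete, the sketch below computes an unadjusted function point total from component counts. The component names and weights are illustrative assumptions for this example, not Albrecht's calibrated values or complexity adjustments.

```python
# Illustrative function point count as a weighted sum of external data flows.
# The weights below are assumed for the example only.
ILLUSTRATIVE_WEIGHTS = {
    "inputs": 4,         # external inputs absorbed by the software
    "outputs": 5,        # external outputs produced by the software
    "master_files": 10,  # logical master files referenced or maintained
    "inquiries": 4,      # inquiries answered directly
}

def function_points(counts: dict) -> int:
    """Unadjusted weighted sum over the counted components."""
    return sum(w * counts.get(name, 0) for name, w in ILLUSTRATIVE_WEIGHTS.items())

# Example: 20 inputs, 30 outputs, 5 master files, 10 inquiries.
print(function_points({"inputs": 20, "outputs": 30, "master_files": 5, "inquiries": 10}))
# 4*20 + 5*30 + 10*5 + 4*10 = 320
```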

1,560 citations


Journal ArticleDOI
TL;DR: In this article, a new approach, involving version vectors and origin points, is presented and shown to detect single file, multiple copy mutual inconsistency effectively, which has been used in the design of LOCUS, a local network operating system at UCLA.
Abstract: Many distributed systems are now being developed to provide users with convenient access to data via some kind of communications network. In many cases it is desirable to keep the system functioning even when it is partitioned by network failures. A serious problem in this context is how one can support redundant copies of resources such as files (for the sake of reliability) while simultaneously monitoring their mutual consistency (the equality of multiple copies). This is difficult since network failures can lead to inconsistency, and disrupt attempts at maintaining consistency. In fact, even the detection of inconsistent copies is a nontrivial problem. Naive methods either 1) compare the multiple copies entirely or 2) perform simple tests which will diagnose some consistent copies as inconsistent. Here a new approach, involving version vectors and origin points, is presented and shown to detect single file, multiple copy mutual inconsistency effectively. The approach has been used in the design of LOCUS, a local network operating system at UCLA.
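The core consistency check can be sketched in a few lines: each copy of a file carries a vector of per-site update counts, and two copies conflict exactly when neither vector dominates the other. This is a minimal illustration of the version-vector idea; origin points, reconciliation, and the LOCUS integration are omitted.

```python
# Minimal version-vector comparison: a copy dominates another if it has seen
# at least as many updates from every site; if neither dominates, the copies
# were updated independently (e.g., on both sides of a partition) and conflict.

def dominates(a: dict, b: dict) -> bool:
    return all(a.get(s, 0) >= b.get(s, 0) for s in set(a) | set(b))

def compare(a: dict, b: dict) -> str:
    if dominates(a, b) and dominates(b, a):
        return "identical"
    if dominates(a, b):
        return "a is newer"   # b can simply be brought up to date from a
    if dominates(b, a):
        return "b is newer"
    return "conflict"         # mutual inconsistency detected

print(compare({"siteA": 2, "siteB": 1}, {"siteA": 2, "siteB": 0}))  # a is newer
print(compare({"siteA": 3, "siteB": 0}, {"siteA": 2, "siteB": 1}))  # conflict
```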

432 citations


Journal ArticleDOI
TL;DR: The presented approach aims to exercise use-definition chains that appear in the program by checking liveness of every definition of a variable at the point(s) of its possible use.
Abstract: Some properties of a program data flow can be used to guide program testing. The presented approach aims to exercise use-definition chains that appear in the program. Two such data oriented testing strategies are proposed; the first involves checking liveness of every definition of a variable at the point(s) of its possible use; the second deals with liveness of vectors of variables treated as arguments to an instruction or program block. Reliability of these strategies is discussed with respect to a program containing an error.
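The first strategy can be illustrated on a toy program representation: enumerate every definition-use pair and check whether a candidate test path reaches the use while the definition is still live (not overwritten). The statement encoding and the path below are hypothetical; a real tool would derive them from control-flow and data-flow analysis.

```python
# Each statement id maps to (variable defined or None, set of variables used).
STMTS = {
    1: ("x", set()),        # x := input()
    2: ("y", {"x"}),        # y := x + 1
    3: ("x", set()),        # x := 0   (redefinition of x)
    4: (None, {"x", "y"}),  # print(x, y)
}

def du_pairs(stmts):
    """All (def_stmt, use_stmt, var) chains a test set could be asked to cover."""
    return [(d, u, var)
            for d, (var, _) in stmts.items() if var
            for u, (_, uses) in stmts.items() if var in uses and u != d]

def covers(path, d, u, var):
    """True if the path reaches u after d with no intervening redefinition of var."""
    if d not in path or u not in path:
        return False
    i, j = path.index(d), path.index(u)
    return i < j and all(STMTS[s][0] != var for s in path[i + 1:j])

path = [1, 2, 3, 4]
for d, u, var in du_pairs(STMTS):
    print(f"def of {var} at {d}, use at {u}: covered={covers(path, d, u, var)}")
# The definition of x at statement 1 is not live at statement 4 on this path;
# exercising that chain requires a path that bypasses the redefinition at 3.
```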

356 citations


Journal ArticleDOI
TL;DR: A formal model for atomic commit protocols for a distributed database system is introduced and is used to prove existence results about resilient protocols for site failures that do not partition the network and then for partitioned networks.
Abstract: A formal model for atomic commit protocols for a distributed database system is introduced. The model is used to prove existence results about resilient protocols for site failures that do not partition the network and then for partitioned networks. For site failures, a pessimistic recovery technique, called independent recovery, is introduced and the class of failures for which resilient protocols exist is identified. For partitioned networks, two cases are studied: the pessimistic case in which messages are lost, and the optimistic case in which no messages are lost. In all cases, fundamental limitations on the resiliency of protocols are derived.

301 citations


Journal ArticleDOI
TL;DR: A new algorithm (Algorithm GENERAL) is presented to derive processing strategies for arbitrarily complex queries to minimize the response time and the total time for distributed queries.
Abstract: The efficiency of processing strategies for queries in a distributed database is critical for system performance. Methods are studied to minimize the response time and the total time for distributed queries. A new algorithm (Algorithm GENERAL) is presented to derive processing strategies for arbitrarily complex queries. Three versions of the algorithm are given: one for minimizing response time and two for minimizing total time. The algorithm is shown to provide optimal solutions under certain conditions.

266 citations


Journal ArticleDOI
TL;DR: This paper defines software safety and describes a technique called software fault tree analysis which can be used to analyze a design as to its safety and has been applied to a program which controls the flight and telemetry for a University of California spacecraft.
Abstract: With the increased use of software controls in critical realtime applications, a new dimension has been introduced into software reliability: the "cost" of errors. The problems of safety have become critical as these applications have increasingly included areas where the consequences of failure are serious and may involve grave dangers to human life and property. This paper defines software safety and describes a technique called software fault tree analysis which can be used to analyze a design as to its safety. The technique has been applied to a program which controls the flight and telemetry for a University of California spacecraft. A critical failure scenario was detected by the technique which had not been revealed during substantial testing of the program. Parts of this analysis are presented as an example of the use of the technique and the results are discussed.

243 citations


Journal ArticleDOI
TL;DR: In this article, the notion of a "time-driven" system is introduced which is formalized using a Petri net model augmented with timing information, and several subclasses of time-driven systems are defined with increasing levels of complexity.
Abstract: A methodology for the statement of timing requirements is presented for a class of embedded computer systems. The notion of a "time-driven" system is introduced which is formalized using a Petri net model augmented with timing information. Several subclasses of time-driven systems are defined with increasing levels of complexity. By deriving the conditions under which the Petri net model can be proven to be safe in the presence of time, timing requirements for modules in the system can be obtained. Analytical techniques are developed for proving safeness in the presence of time for the net constructions used in the defined subclasses of time-driven systems.

205 citations


Journal ArticleDOI
TL;DR: A tree structure is presented for storing points from a normed space whose norm is effectively computable, together with an algorithm for finding the nearest point in the tree to a given query point.
Abstract: In this paper we present a tree structure for storing points from a normed space whose norm is effectively computable. We then give an algorithm for finding the nearest point from the tree to a given query point. Our algorithm searches the tree and uses the triangle inequality to eliminate searching of the entirety of some branches of the tree whenever a certain predicate is satisfied. Our data structure uses O(n) storage. Empirical data which we have gathered suggest that the expected complexity for preprocessing and the search time are, respectively, O(n log n) and O(log n).
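The pruning idea can be shown without the tree: precompute every stored point's distance to one reference point, and at query time use the triangle inequality to skip points that cannot beat the best distance found so far. This flat variant is only an illustration of the elimination predicate; the paper's contribution is organizing the points into a tree so whole branches are skipped at once.

```python
import math

def euclid(a, b):
    return math.dist(a, b)  # any effectively computable norm-induced metric works

class PrunedIndex:
    def __init__(self, points, ref):
        self.ref = ref
        self.entries = [(p, euclid(p, ref)) for p in points]  # precomputed d(p, ref)

    def nearest(self, q):
        dq = euclid(q, self.ref)
        best, best_d = None, float("inf")
        for p, dp in self.entries:
            # Triangle inequality: d(p, q) >= |d(p, ref) - d(q, ref)|,
            # so p cannot improve on the current best when that bound is too large.
            if abs(dp - dq) >= best_d:
                continue
            d = euclid(p, q)
            if d < best_d:
                best, best_d = p, d
        return best, best_d

idx = PrunedIndex([(0, 0), (1, 1), (5, 5), (9, 9)], ref=(0, 0))
print(idx.nearest((4.6, 4.9)))  # ((5, 5), ...)
```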

162 citations


Journal ArticleDOI
TL;DR: The optimal distribution of a database schema over a number of sites in a distributed network is considered and the design is driven by user-supplied information about data distribution.
Abstract: The optimal distribution of a database schema over a number of sites in a distributed network is considered. The database is modeled in terms of objects (relations or record sets) and links (predefined joins or CODASYL sets). The design is driven by user-supplied information about data distribution. The inputs required by the optimization model are: 1) cardinality and size information about objects and links, 2) a set of candidate horizontal partitions of relations into fragments and the allocations of the fragments, and 3) the specification of all important transactions, their frequencies, and their sites of origin.

139 citations


Journal ArticleDOI
TL;DR: The theory of software science was developed by the late M. H. Halstead of Purdue University during the early 1970's and drew widespread attention from the computer science community.
Abstract: The theory of software science was developed by the late M. H. Halstead of Purdue University during the early 1970's. It was first presented in unified form in the monograph Elements of Software Science published by Elsevier North-Holland in 1977. Since it claimed to apply scientific methods to the very complex and important problem of software production, and since experimental evidence supplied by Halstead and others seemed to support the theory, it drew widespread attention from the computer science community.

137 citations


Journal ArticleDOI
TL;DR: The Software Engineering Laboratory has analyzed the Software Science metrics, cyclomatic complexity, and various standard program measures for their relation to 1) effort, 2) development errors, 3) one another.
Abstract: The desire to predict the effort in developing or explain the quality of software has led to the proposal of several metrics in the literature. As a step toward validating these metrics, the Software Engineering Laboratory has analyzed the Software Science metrics, cyclomatic complexity, and various standard program measures for their relation to 1) effort (including design through acceptance testing), 2) development errors (both discrete and weighted according to the amount of time to locate and fix), and 3) one another. The data investigated are collected from a production Fortran environment and examined across several projects at once, within individual projects, and by individual programmers across projects, with three effort reporting accuracy checks demonstrating the need to validate a database. When the data come from individual programmers or certain validated projects, the metrics' correlations with actual effort seem to be strongest. For modules developed entirely by individual programmers, the validity ratios induce a statistically significant ordering of several of the metrics' correlations. When comparing the strongest correlations, neither Software Science's E metric, cyclomatic complexity, nor source lines of code appears to relate convincingly better with effort than the others.
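For readers unfamiliar with the metrics being compared, the standard Software Science quantities are computed from operator/operand counts as below, and cyclomatic complexity for a single-entry, single-exit routine is one more than its number of binary decisions. The token counts in the example are hypothetical; how a tool classifies tokens is a counting convention.

```python
import math

def halstead(n1, n2, N1, N2):
    """n1/n2: distinct operators/operands; N1/N2: total operator/operand occurrences."""
    n = n1 + n2                 # vocabulary
    N = N1 + N2                 # length
    V = N * math.log2(n)        # volume
    D = (n1 / 2) * (N2 / n2)    # difficulty
    E = D * V                   # effort (the E metric referred to above)
    return {"vocabulary": n, "length": N, "volume": V, "difficulty": D, "effort": E}

def cyclomatic(binary_decisions):
    return binary_decisions + 1

print(halstead(n1=12, n2=7, N1=27, N2=15))
print(cyclomatic(binary_decisions=4))  # v(G) = 5
```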

Journal ArticleDOI
TL;DR: The study suggests that individual differences have a large effect on the significance of results where many individuals are used and when an individual is isolated, better results are obtainable.
Abstract: A family of syntactic complexity metrics is defined that generates several metrics commonly occurring in the literature. The paper uses the family to answer some questions about the relationship of these metrics to error-proneness and to each other. Two derived metrics are applied: slope, which measures the relative skills of programmers at handling a given level of complexity, and r square, which is indirectly related to the consistency of performance of the programmer or team. The study suggests that individual differences have a large effect on the significance of results where many individuals are used. When an individual is isolated, better results are obtainable. The metrics can also be used to differentiate between projects on which a methodology was used and those on which it was not.
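On hypothetical data, the two derived metrics amount to a least-squares fit of error counts against complexity for one programmer or team: the fitted slope plays the role of the skill measure and r square the consistency measure. The numbers below are made up purely to show the computation.

```python
import numpy as np

complexity = np.array([5.0, 8.0, 12.0, 20.0, 33.0])  # metric value per module
errors     = np.array([1.0, 1.0, 3.0, 4.0, 8.0])     # errors found per module

slope, intercept = np.polyfit(complexity, errors, 1)
predicted = slope * complexity + intercept
ss_res = np.sum((errors - predicted) ** 2)
ss_tot = np.sum((errors - errors.mean()) ** 2)
r_square = 1 - ss_res / ss_tot

print(f"slope = {slope:.3f}, r^2 = {r_square:.3f}")
```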

Journal ArticleDOI
TL;DR: A decision procedure to determine when computer software should be released is described, based upon the cost-benefit for the entire company that has developed the software.
Abstract: A decision procedure to determine when computer software should be released is described. This procedure is based upon the cost-benefit for the entire company that has developed the software. This differs from the common practice of only minimizing the repair costs for the data processing division. Decision rules are given to determine at what time the system should be released based upon the results of testing the software. Necessary and sufficient conditions are identified which determine when the system should be released (immediately, before the deadline, at the deadline, or after the deadline). No assumptions are made about the relationship between any of the model's parameters. The model can be used whether the software was developed by a first or second party. The case where future costs are discounted is also considered.

Journal ArticleDOI
TL;DR: The function point method of measuring application development productivity developed by Albrecht is reviewed and a productivity improvement measure introduced and an analysis of the statistical significance of results is presented.
Abstract: The function point method of measuring application development productivity developed by Albrecht is reviewed and a productivity improvement measure introduced. The measurement methodology is then applied to 24 development projects. Size, environment, and language effects on productivity are examined. The concept of a productivity index which removes size effects is defined and an analysis of the statistical significance of results is presented.

Journal ArticleDOI
TL;DR: This work proposes a straightforward pragmatic approach to software fault tolerance which takes advantage of the structure of real-time systems to simplify error recovery, and a classification scheme for errors is introduced.
Abstract: Real-time systems often have very high reliability requirements and are therefore prime candidates for the inclusion of fault tolerance techniques. In order to provide tolerance to software faults, some form of state restoration is usually advocated as a means of recovery. State restoration can be expensive and the cost is exacerbated for systems which utilize concurrent processes. The concurrency present in most real-time systems and the further difficulties introduced by timing constraints suggest that providing tolerance for software faults may be inordinately expensive or complex. We believe that this need not be the case, and propose a straightforward pragmatic approach to software fault tolerance which is believed to be applicable to many real-time systems. The approach takes advantage of the structure of real-time systems to simplify error recovery, and a classification scheme for errors is introduced. Responses to each type of error are proposed which allow service to be maintained.

Journal ArticleDOI
TL;DR: A mathematical framework is developed that provides a mechanism for comparing the power of methods of testing programs based on the degree to which the methods approximate program verification, and provides a reasonable and useful interpretation of the notion that successful tests increase one's confidence in the program's correctness.
Abstract: Testing has long been in need of mathematical underpinnings to explain its value as well as its limitations. This paper develops and applies a mathematical framework that 1) unifies previous work on the subject, 2) provides a mechanism for comparing the power of methods of testing programs based on the degree to which the methods approximate program verification, and 3) provides a reasonable and useful interpretation of the notion that successful tests increase one's confidence in the program's correctness.

Journal ArticleDOI
TL;DR: It is shown that graph traversal techniques have fundamental differences between serial and distributed computations in their behaviors, computational complexities, and effects on the design of graph algorithms.
Abstract: This paper shows that graph traversal techniques have fundamental differences between serial and distributed computations in their behaviors, computational complexities, and effects on the design of graph algorithms. It has three major parts. Section I describes the computational environment for the design and description of distributed graph algorithms in terms of an architectural model for message exchanges. The computational complexity is measured in terms of the number of messages transmitted. Section II presents several distributed algorithms for the pure traversal, depth-first search, and breadth-first search techniques. Their complexities are also given. Through these descriptions are brought out some of the intrinsic differences in the behaviors and complexities of the fundamental traversal techniques between a serial and a distributed computation environment. Section III gives the distributed version of the Ford and Fulkerson algorithm for the maximum flow problem by means of depth-first search, the largest-augmentation search, and breadth-first search. The complexities of these methods are found to be O(f*|A|), O((1 + log_{M/(M-1)} f*)|V||A|), and O(|V|^6), respectively, where f* is the maximum flow value of the problem, M is the maximum number of arcs in a cut, |V| is the number of vertices, and |A| is the number of arcs. Lastly, it is shown that the largest-augmentation search may be a better method than the other two. This is contrary to the known results in serial computation.

Journal ArticleDOI
TL;DR: The model is based on a modified form of Petri net, and enables one to represent both the structure and the behavior of a distributed software system at a desired level of design.
Abstract: A model for representing and analyzing the design of a distributed software system is presented. The model is based on a modified form of Petri net, and enables one to represent both the structure and the behavior of a distributed software system at a desired level of design. Behavioral properties of the design representation can be verified by translating the modified Petri net into an equivalent ordinary Petri net and then analyzing that resulting Petri net. The model emphasizes the unified representation of control and data flows, partially ordered software components, hierarchical component structure, abstract data types, data objects, local control, and distributed system state. At any design level, the distributed software system is viewed as a collection of software components. Software components are externally described in terms of their input and output control states, abstract data types, data objects, and a set of control and data transfer specifications. They are interconnected through the shared control states and through the shared data objects. A system component can be viewed internally as a collection of subcomponents, local control states, local abstract data types, and local data objects.

Journal ArticleDOI
TL;DR: A programming system has been implemented in which annotated Petri nets are used as machine-processable high-level design representations that can be used to express the parallelism and the dynamic sequential dependencies found in complex software.
Abstract: A programming system has been implemented in which annotated Petri nets are used as machine-processable high-level design representations. The nets can be used to express the parallelism and the dynamic sequential dependencies found in complex software. They can then be interactively fired to facilitate debugging of the design. The nets are processed into a procedure language, called XL/1, to which a variety of transformations are applied in order to produce more efficient programs. These programs are generated for either a serial or a parallel processing environment. Finally, the XL/1 programs may be translated into PL/I or PL/S. The serial processing versions have been compiled and run successfully, but the parallel processing versions have not yet been run in a parallel processing environment.
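The firing rule such design representations build on can be stated in a few lines of code: a transition is enabled when each of its input places holds a token, and firing moves a token from every input place to every output place. The net below is a made-up two-step pipeline; the annotations, XL/1 translation, and parallel code generation described in the paper sit on top of this basic mechanism.

```python
marking = {"ready": 1, "buffered": 0, "done": 0}

# transition name -> (input places, output places)
transitions = {
    "produce": (["ready"], ["buffered"]),
    "consume": (["buffered"], ["done"]),
}

def enabled(name):
    ins, _ = transitions[name]
    return all(marking[p] >= 1 for p in ins)

def fire(name):
    assert enabled(name), f"{name} is not enabled"
    ins, outs = transitions[name]
    for p in ins:
        marking[p] -= 1
    for p in outs:
        marking[p] += 1

# Firing the net step by step, as one might when debugging a design interactively.
for t in ["produce", "consume"]:
    fire(t)
    print(t, "->", marking)
```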

Journal ArticleDOI
TL;DR: The architecture and design of the software system being produced as the focus of the Toolpack project is discussed, and the basic requirements that an integrated system of tools must satisfy to be successful and to remain useful both in practice and as an experimental object are explained.
Abstract: This paper discusses the goals and methods of the Toolpack project and in this context discusses the architecture and design of the software system being produced as the focus of the project. Toolpack is presented as an experimental activity in which a large software tool environment is being created for the purpose of general distribution and then careful study and analysis. The paper begins by explaining the motivation for building integrated tool sets. It then proceeds to explain the basic requirements that an integrated system of tools must satisfy in order to be successful and to remain useful both in practice and as an experimental object. The paper then summarizes the tool capabilities that will be incorporated into the environment. It then goes on to present a careful description of the actual architecture of the Toolpack integrated tool system. Finally the Toolpack project experimental plan is presented, and future plans and directions are summarized.

Journal ArticleDOI
TL;DR: The use and acceptance of the term "software engineer" is investigated, and the functions and background of persons identified as software engineers are reported.
Abstract: The results of a survey of software development practice are reported and analyzed. The problems encountered in various phases of the software life cycle are measured and correlated with characteristics of the responding installations. The use and acceptance of the term "software engineer" is investigated, and the functions and background of persons identified as software engineers are reported. The usage of a wide variety of software engineerilng tools and methods is measured; conclusions are drawn concerning the usefulness of these techniques.

Journal ArticleDOI
TL;DR: It is shown that probabilistic control is sub-optimal for minimizing the mean number of customers in the system, and an approximation to the optimum policy is analyzed that is computationally simple and suffices for most operational applications.
Abstract: A dynamic control policy known as "threshold queueing" is defined for scheduling customers from a Poisson source on a set of two exponential servers with dissimilar service rates. The slower server is invoked in response to instantaneous system loading as measured by the length of the queue of waiting customers. In a threshold queueing policy, a specific queue length is identified as a "threshold," beyond which the slower server is invoked. The slower server remains busy until it completes service on a customer and the queue length is less than its invocation threshold. Markov chain analysis is employed to analyze the performance of the threshold queueing policy and to develop optimality criteria. It is shown that probabilistic control is sub-optimal for minimizing the mean number of customers in the system. An approximation to the optimum policy is analyzed which is computationally simple and suffices for most operational applications.
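A rough feel for the policy can be had from a small event-driven simulation on assumed rates: Poisson arrivals, exponential services, the fast server always used when work is waiting, and the slow server taking a waiting customer only while the queue is at or above the threshold. This is a simulation sketch for intuition, not the Markov chain analysis or optimality results of the paper.

```python
import random

def simulate(lam=0.9, mu_fast=1.0, mu_slow=0.4, threshold=3, horizon=200_000.0, seed=1):
    rng = random.Random(seed)
    exp = lambda rate: rng.expovariate(rate)
    t, queue, fast_busy, slow_busy = 0.0, 0, False, False
    next_arrival, fast_done, slow_done = exp(lam), float("inf"), float("inf")
    area = 0.0  # time-integral of the number in system

    while t < horizon:
        t_next = min(next_arrival, fast_done, slow_done)
        area += (queue + fast_busy + slow_busy) * (t_next - t)
        t = t_next
        if t == next_arrival:
            queue += 1
            next_arrival = t + exp(lam)
        elif t == fast_done:
            fast_busy, fast_done = False, float("inf")
        else:
            slow_busy, slow_done = False, float("inf")
        # Dispatch under the threshold policy.
        if not fast_busy and queue > 0:
            queue -= 1
            fast_busy, fast_done = True, t + exp(mu_fast)
        if not slow_busy and queue >= threshold:
            queue -= 1
            slow_busy, slow_done = True, t + exp(mu_slow)

    return area / t

print(f"estimated mean number in system: {simulate():.2f}")
```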

Journal ArticleDOI
TL;DR: In this paper, a probabilistic model of transactions (queries, updates, insertions, and deletions) to a file is presented and an algorithm that obtains a near optimal solution to the index selection problem in polynomial time is developed.
Abstract: A problem of considerable interest in the design of a database is the selection of indexes. In this paper, we present a probabilistic model of transactions (queries, updates, insertions, and deletions) to a file. An evaluation function, which is based on the cost saving (in terms of the number of page accesses) attributable to the use of an index set, is then developed. The maximization of this function would yield an optimal set of indexes. Unfortunately, algorithms known to solve this maximization problem require an order of time exponential in the total number of attributes in the file. Consequently, we develop the theoretical basis which leads to an algorithm that obtains a near optimal solution to the index selection problem in polynomial time. The theoretical result consists of showing that the index selection problem can be solved by solving a properly chosen instance of the knapsack problem. A theoretical bound for the amount by which the solution obtained by this algorithm deviates from the true optimum is provided. This result is then interpreted in the light of evidence gathered through experiments.
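The knapsack flavor of the problem can be illustrated with a standard benefit-per-cost greedy heuristic on made-up numbers: each candidate index has an estimated page access saving and a storage/maintenance cost, and the chosen set must fit a budget. This is only the generic knapsack approximation, not the paper's evaluation function, reduction, or deviation bound.

```python
candidates = {
    # index name: (estimated benefit in saved page accesses, cost in pages)
    "idx_customer_name": (900, 300),
    "idx_order_date":    (700, 200),
    "idx_region":        (400, 150),
    "idx_status":        (150, 250),
}

def select_indexes(candidates, budget):
    chosen, remaining = [], budget
    ranked = sorted(candidates.items(),
                    key=lambda kv: kv[1][0] / kv[1][1], reverse=True)
    for name, (benefit, cost) in ranked:
        if cost <= remaining:       # take the index if it still fits the budget
            chosen.append(name)
            remaining -= cost
    return chosen

# Benefit/cost ratios: order_date 3.5, customer_name 3.0, region ~2.7, status 0.6.
print(select_indexes(candidates, budget=600))  # ['idx_order_date', 'idx_customer_name']
```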

Journal ArticleDOI
TL;DR: An approach to producing abstract requirements specifications that applies to a significant class of real-world systems, including any system that must reconstruct data that have undergone a sequence of transformations is introduced.
Abstract: An abstract requirements specification states system requirements precisely without describing a real or a paradigm implementation. Although such specifications have important advantages, they are difficult to produce for complex systems and hence are seldom seen in the "real" programming world. This paper introduces an approach to producing abstract requirements specifications that applies to a significant class of real-world systems, including any system that must reconstruct data that have undergone a sequence of transformations. It also describes how the approach was used to produce a requirements document for SCP, a small, but nontrivial Navy communications system. The specification techniques used in the SCP requirements document are introduced and illustrated with examples.

Journal ArticleDOI
TL;DR: Three conflicting issues discussed in this paper are 1) limitations of short-term memory and number of sub-routine parameters, 2) searches in human memory and programming effort, and 3) psychological time and programming time.
Abstract: Halstead proposed a methodology for studying the process of programming known as software science. This methodology merges theories from cognitive psychology with theories from computer science. There is evidence that some of the assumptions of software science incorrectly apply the results of cognitive psychology studies. Halstead proposed theories relative to human memory models that appear to be without support from psychologists. Other software scientists, however, report empirical evidence that may support some of those theories. This anomaly places aspects of software science in a precarious position. The three conflicting issues discussed in this paper are 1) limitations of short-term memory and number of sub-routine parameters, 2) searches in human memory and programming effort, and 3) psychological time and programming time.

Journal ArticleDOI
TL;DR: This work investigates the question of when two entity-relationship diagrams (ERD's) should be considered equivalent, in the sense of representing the same information, and gives three natural and increasingly stricter criteria for developing concepts of equivalence for ERD's.
Abstract: We investigate the question of when two entity-relationship diagrams (ERD's) should be considered equivalent, in the sense of representing the same information. This question is very important for a database design process which uses the ERD model, and can be interpreted in various ways. We give three natural and increasingly stricter criteria for developing concepts of equivalence for ERD's. We first give a notion of "domain data compatibility" which ensures that the ERD's in question represent the same universe of data in an aggregate sense. Then we define the set of functional dependencies which are naturally embedded in each ERD, and use it to develop a concept of "data dependency equivalence" which ensures that the ERD's satisfy the same constraints (functional dependencies) among the represented data. Finally, we give our strongest criterion, instance data equivalence, which requires the ERD's to have the same power to represent instances of data. We develop several alternate forms of this third notion, including some giving efficient tableaux tests for its occurrence. Indeed, for each type of equivalence, we give a polynomial-time algorithm to test for it.
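The "data dependency equivalence" criterion reduces to a familiar implication test on functional dependencies, which can be sketched with Armstrong-style attribute closure: two FD sets are equivalent when each implies every dependency of the other. The FDs below are hypothetical; deriving the embedded FDs from an actual ERD (entity keys, relationship cardinalities) is the part the paper formalizes.

```python
def closure(attrs, fds):
    """Attributes determined by `attrs` under `fds`, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def implies(fds, lhs, rhs):
    return rhs <= closure(lhs, fds)

def equivalent(fds1, fds2):
    return (all(implies(fds1, lhs, rhs) for lhs, rhs in fds2) and
            all(implies(fds2, lhs, rhs) for lhs, rhs in fds1))

fd = lambda lhs, rhs: (frozenset(lhs), frozenset(rhs))
erd1 = [fd({"emp_id"}, {"dept_id"}), fd({"dept_id"}, {"manager_id"})]
erd2 = erd1 + [fd({"emp_id"}, {"manager_id"})]   # extra FD is implied by transitivity

print(equivalent(erd1, erd2))  # True: the two embedded FD sets are equivalent
```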

Journal ArticleDOI
TL;DR: The reduction of the space of potential undetected errors is proposed as a criterion for test path selection, and an analysis of the undetected perturbations for sequential programs operating on integers and real numbers is presented.
Abstract: Many testing methods require the selection of a set of paths on which tests are to be conducted. Errors in arithmetic expressions within program statements can be represented as perturbing functions added to the correct expression. It is then possible to derive the set of errors in a chosen functional class which cannot possibly be detected using a given test path. For example, test paths which pass through an assignment statement "X := f(Y)" are incapable of revealing if the expression "X - f(Y)" has been added to later statements. In general, there are an infinite number of such undetectable error perturbations for any test path. However, when the chosen functional class of error expressions is a vector space, a finite characterization of all undetectable expressions can be found for one test path, or for combined testing along several paths. An analysis of the undetected perturbations for sequential programs operating on integers and real numbers is presented which permits the detection of multinomial error terms. The reduction of the space of potential undetected errors is proposed as a criterion for test path selection.
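The assignment example from the abstract can be made concrete with a toy program: the "perturbed" version adds the term x - f(y), which is identically zero on any path that passes through the assignment x := f(y), so a test confined to such paths cannot distinguish the two programs. The function and values below are arbitrary choices for illustration.

```python
def f(y):
    return 2 * y + 1

def correct(y, take_branch):
    x = f(y) if take_branch else 0
    return 3 * x + y                    # intended expression

def perturbed(y, take_branch):
    x = f(y) if take_branch else 0
    return 3 * x + y + (x - f(y))       # added error term x - f(y)

# Test path through the assignment x := f(y): the perturbation is invisible.
print(correct(4, True), perturbed(4, True))    # 31 31
# Path that skips the assignment: the outputs differ and the error is exposed.
print(correct(4, False), perturbed(4, False))  # 4 -5
```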

Journal ArticleDOI
TL;DR: Three notations for concurrent programming are compared, namely CSP, Ada, and monitors: "lower-level" communication, synchronization, and nondeterminism in CSP and Ada are compared first, and then "higher-level" module interface properties of Ada tasks and monitors are examined.
Abstract: Three notations for concurrent programming are compared, namely CSP, Ada, and monitors. CSP is an experimental language for exploring structuring concepts in concurrent programming. Ada is a general-purpose language with concurrent programming facilities. Monitors are a construct for managing access by concurrent processes to shared resources. We start by comparing "lower-level" communication, synchronization, and nondeterminism in CSP and Ada and then examine "higher-level" module interface properties of Ada tasks and monitors.
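To make the comparison concrete, here is a monitor-style bounded buffer sketched with Python's threading primitives, where the shared lock and condition variables stand in for a monitor's implicit mutual exclusion and wait/signal queues; in CSP or Ada the same coordination would instead be phrased as message passing or a rendezvous with accept alternatives. The buffer itself is an invented example, not drawn from the paper.

```python
import threading
from collections import deque

class BoundedBuffer:
    """Monitor-like shared resource: callers block inside put/get as needed."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()
        lock = threading.Lock()
        self.not_full = threading.Condition(lock)
        self.not_empty = threading.Condition(lock)

    def put(self, item):
        with self.not_full:                       # enter the "monitor"
            while len(self.items) >= self.capacity:
                self.not_full.wait()              # wait until there is room
            self.items.append(item)
            self.not_empty.notify()               # wake a waiting consumer

    def get(self):
        with self.not_empty:
            while not self.items:
                self.not_empty.wait()
            item = self.items.popleft()
            self.not_full.notify()                # wake a waiting producer
            return item

buf = BoundedBuffer(2)
consumer = threading.Thread(target=lambda: print([buf.get() for _ in range(3)]))
consumer.start()
for i in range(3):
    buf.put(i)
consumer.join()   # prints [0, 1, 2]
```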

Journal ArticleDOI
TL;DR: A language-independent syntax-directed pretty printer has been implemented as the first step towards building a language- independent syntax- directed editor.
Abstract: A language-independent syntax-directed pretty printer has been implemented as the first step towards building a language-independent syntax-directed editor. The syntax-directed pretty printer works in two phases: the grammar processing phase and the program processing phase. In the grammar processing phase, a grammar which contains a context-free grammar and information for the parser and pretty printer is processed and all files needed by the second phase are written. With these files, the syntax-directed pretty printer works for the language of the grammar. The syntax-directed editor would use the same grammar processing phase to construct the files needed to make it work for a specific language. In the program processing phase, programs in the language of the grammar are parsed and parse trees are built. If syntax errors are found, error messages are produced and error recovery is done. The parse trees are pretty printed according to the pretty printer specifications given in the grammar, resulting in well-indented, syntactically clear programs.
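A toy version of the two-phase idea: a tiny table of per-nonterminal layout rules plays the role of the processed grammar information, and a recursive walk over an already-built parse tree does the printing. Parsing, error recovery, and the real system's grammar and file formats are omitted; the grammar, layout rules, and tree here are made up.

```python
# Layout rule per nonterminal, standing in for the pretty-printer specifications
# that the grammar processing phase would emit.
LAYOUT = {"program": "stack", "block": "indent", "stmt": "line", "if": "line"}

def pretty(node, depth=0):
    """node is either a token string or a pair (nonterminal, [children])."""
    if isinstance(node, str):
        return node
    kind, children = node
    rule = LAYOUT.get(kind)
    if rule == "stack":    # children on separate lines at the same depth
        return "\n".join(pretty(c, depth) for c in children)
    if rule == "indent":   # children on separate lines, one level deeper
        return "\n".join(pretty(c, depth + 1) for c in children)
    if rule == "line":     # children joined on one indented line
        return "    " * depth + " ".join(pretty(c, depth) for c in children)
    return " ".join(pretty(c, depth) for c in children)

tree = ("program", [
    ("stmt", ["x", ":=", "1"]),
    ("if", ["if", "x", ">", "0", "then"]),
    ("block", [("stmt", ["y", ":=", "x"])]),
])
print(pretty(tree))
# x := 1
# if x > 0 then
#     y := x
```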

Journal ArticleDOI
TL;DR: The framework and model of a network operating system (NOS) called MIKE, which stands for Multicomputer Integrator KErnel, provides system-transparent operation for users and maintains cooperative autonomy among local hosts.
Abstract: This paper presents the framework and model of a network operating system (NOS) called MIKE for use in distributed systems in general and for use in the Distributed Double-Loop Computer Network (DDLCN) in particular. MIKE, which stands for Multicomputer Integrator KErnel, provides system-transparent operation for users and maintains cooperative autonomy among local hosts.