
Showing papers on "Overhead (computing) published in 1991"


Proceedings ArticleDOI
01 Apr 1991
TL;DR: This paper shows that, under the usual assumptions in automatic parallelization, most transformations on loop nests can be expressed as affine transformations on integer sets defined by polyhedra and that the new loop bounds can be computed with algorithms using Fourier’s pairwise elimination method although it is not exact for integer sets.
Abstract: Supercompilers perform complex program transformations which often result in new loop bounds. This paper shows that, under the usual assumptions in automatic parallelization, most transformations on loop nests can be expressed as affine transformations on integer sets defined by polyhedra and that the new loop bounds can be computed with algorithms using Fourier's pairwise elimination method although it is not exact for integer sets. Sufficient conditions to use pairwise elimination on integer sets and to extend it to pseudo-linear constraints are also given. A tradeoff has to be made between dynamic overhead due to some bound slackness and compilation complexity but the resulting code is always correct. These algorithms can be used to interchange or block loops regardless of the loop bounds or the blocking strategy and to safely exchange array parts between two levels of a memory hierarchy or between neighboring processors in a distributed memory machine.
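
One step this abstract relies on is Fourier's pairwise elimination: to remove a loop index from a system of affine inequalities, every lower bound on that index is combined with every upper bound, projecting the polyhedron onto the remaining variables. The sketch below is a minimal Python illustration with invented names (not the paper's algorithm as published); as the abstract notes, the projection is exact over the rationals but may be slack over the integers.

```python
from fractions import Fraction

def eliminate(constraints, k):
    """Fourier-Motzkin elimination of variable k.

    Each constraint is a list [a0, a1, ..., an] encoding
    a0 + a1*x1 + ... + an*xn >= 0.  The result no longer mentions xk."""
    lowers, uppers, rest = [], [], []
    for c in constraints:
        if c[k] > 0:
            lowers.append(c)          # gives a lower bound on xk
        elif c[k] < 0:
            uppers.append(c)          # gives an upper bound on xk
        else:
            rest.append(c)
    out = list(rest)
    for lo in lowers:
        for up in uppers:
            l, u = Fraction(lo[k]), -Fraction(up[k])
            # multiply so the xk coefficients cancel, then add
            out.append([u * lo[i] + l * up[i] for i in range(len(lo))])
    return out

# Toy nest 1 <= i <= n, i <= j <= n over the vector (1, i, j, n):
system = [
    [-1, 1, 0, 0],   # i - 1 >= 0
    [0, -1, 0, 1],   # n - i >= 0
    [0, -1, 1, 0],   # j - i >= 0
    [0, 0, -1, 1],   # n - j >= 0
]
# Eliminating j (index 2) leaves the bounds the outer loop on i must satisfy.
print(eliminate(system, 2))
```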

359 citations


Proceedings Article
20 Feb 1991
TL;DR: In this article, the authors present a technique called ISDN-Mixes, showing that untraceable communication for services like telephony, often considered infeasible in the near future because of bandwidth limitations, is in fact achievable.
Abstract: Untraceable communication for services like telephony is often considered infeasible in the near future because of bandwidth limitations. We present a technique, called ISDN-Mixes, which shows that this is not the case.

253 citations


Proceedings ArticleDOI
01 Aug 1991
TL;DR: This paper deals with the problem of maintaining a distributed directory server that enables us to keep track of mobile users in a distributed network in the presence of concurrent requests, and uses the graph-theoretic concept of regional matching for implementing efficient tracking mechanisms.
Abstract: This paper deals with the problem of maintaining a distributed directory server that enables us to keep track of mobile users in a distributed network in the presence of concurrent requests. The paper uses the graph-theoretic concept of regional matching for implementing efficient tracking mechanisms. The communication overhead of our tracking mechanism is within a polylogarithmic factor of the lower bound.

191 citations


Patent
17 Jul 1991
TL;DR: In this paper, a vehicle primarily for unmanned operation is equipped with a video camera pointed up, and progressively records in an on-board computer memory the observed locations of points derived from overhead features such as pre-existing overhead lights in a building during a manually driven teaching mode trip along a desired course.
Abstract: A vehicle primarily for unmanned operation is equipped with a video camera pointed up, and progressively records in an on-board computer memory the observed locations of points derived from overhead features such as pre-existing overhead lights in a building during a manually driven teaching mode trip along a desired course. On subsequent unmanned trips, the vehicle directs itself along the chosen course by again observing the overhead features, deriving points from them and comparing the locations of those points to the locations of the points that were recorded during the teaching mode trip. It then corrects its steering to bring the errors between the observed locations and the recorded locations to zero, thereby directing the vehicle to automatically follow the course that was followed during the teaching mode trip.

183 citations


01 Jan 1991
TL;DR: This paper presents the experimental validation of the distributed execution of recovery blocks, in short distributed recovery block (DRB), in a tightly coupled real-time distributed computing environment.
Abstract: This paper presents the experimental validation of the distributed execution of recovery blocks, in short distributed recovery block (DRB). The experiment was conducted on a tightly coupled network, and a time-critical application was used as a testbed for the experiment. The concept of the DRB is the combination of distributed processing and RB (Recovery Block). It is also an approach for unified treatment of both hardware and software faults. The primary objective of this experiment was to demonstrate the efficiency of the DRB scheme in a tightly coupled real-time distributed computing environment. This experiment required employing the DRB scheme in an application program and measuring its impact on the system performance. The main interest in the measurements is to find the time overhead of the DRB scheme, i.e., the overhead of the error detection and recovery activity.
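
For readers unfamiliar with the construct, a recovery block pairs a primary routine with one or more alternates and an acceptance test; the DRB scheme distributes the primary and the alternate across nodes so that recovery after a failure is nearly immediate. Below is a minimal single-node sketch of the control flow whose cost such an experiment measures (function names are invented for illustration; this is not the testbed code):

```python
import time

def recovery_block(primary, alternate, acceptance_test, *args):
    """Run the primary version; if it raises or its result fails the
    acceptance test, fall back to the alternate version.  Returns the
    accepted result plus the total elapsed time, from which the error
    detection and recovery overhead can be derived."""
    start = time.perf_counter()
    try:
        result = primary(*args)
        ok = acceptance_test(result)
    except Exception:
        ok = False
    if not ok:
        result = alternate(*args)                 # recovery path
        if not acceptance_test(result):
            raise RuntimeError("both versions rejected by the acceptance test")
    return result, time.perf_counter() - start

# Example: a deliberately faulty primary, with sorting as the task.
buggy_sort = lambda xs: xs                        # fast but wrong
safe_sort = lambda xs: sorted(xs)                 # alternate version
is_sorted = lambda xs: all(a <= b for a, b in zip(xs, xs[1:]))

print(recovery_block(buggy_sort, safe_sort, is_sorted, [3, 1, 2]))
```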

147 citations


Proceedings ArticleDOI
08 Apr 1991
TL;DR: A new, simple, extremely fast, locally adaptive data compression algorithm of the LZ77 class is presented, which almost halves the size of text files, uses 16 K of memory, and requires about 13 machine instructions to compress and about 4 instructions to decompress each byte.
Abstract: A new, simple, extremely fast, locally adaptive data compression algorithm of the LZ77 class is presented. The algorithm, called LZRW1, almost halves the size of text files, uses 16 K of memory, and requires about 13 machine instructions to compress and about 4 instructions to decompress each byte. This results in speeds of about 77 K and 250 K bytes per second on a one-MIPS machine. The algorithm runs in linear time and has a good worst-case running time. It adapts quickly and has a negligible initialization overhead, making it fast and efficient for small as well as large blocks of data.
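
LZ77-class algorithms replace repeated byte strings with (offset, length) references to recently seen data; LZRW1 keeps this cheap by using a single small hash table that maps a short prefix of the upcoming bytes to the most recent position where it occurred. The sketch below follows that idea in simplified form (the token layout, window size, and length cap are illustrative choices, not the published LZRW1 format):

```python
def compress(data: bytes):
    """Greedy LZ77 with a 3-byte lookup table, in the spirit of LZRW1.
    Emits a list of tokens: ('lit', byte) or ('copy', offset, length)."""
    table = {}                                   # 3-byte prefix -> last position
    out, i, n = [], 0, len(data)
    while i < n:
        key = data[i:i + 3]
        j = table.get(key) if len(key) == 3 else None
        if j is not None and i - j <= 4095:      # stay inside a 12-bit window
            length = 3
            while (i + length < n and length < 16 and
                   data[j + length] == data[i + length]):
                length += 1
            out.append(('copy', i - j, length))
            table[key] = i
            i += length
        else:
            out.append(('lit', data[i]))
            if len(key) == 3:
                table[key] = i
            i += 1
    return out

def decompress(tokens) -> bytes:
    buf = bytearray()
    for t in tokens:
        if t[0] == 'lit':
            buf.append(t[1])
        else:
            _, offset, length = t
            for _ in range(length):              # byte-by-byte: copies may overlap
                buf.append(buf[-offset])
    return bytes(buf)

text = b"abcabcabcabc the the the"
assert decompress(compress(text)) == text
```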

125 citations


Journal ArticleDOI
TL;DR: Experimentation aimed at determining the potential benefit of mixed-mode SIMD/MIMD parallel architectures is reported, based on timing measurements made on the PASM system prototype at Purdue utilizing carefully coded synthetic variations of a well-known algorithm.

105 citations


01 Jan 1991
TL;DR: This work has developed and implemented techniques that double the performance of dynamically-typed object-oriented languages and dynamically compiles multiple versions of a source method, each customized according to its receiver's map.
Abstract: We have developed and implemented techniques that double the performance of dynamically-typed object-oriented languages. Our SELF implementation runs twice as fast as the fastest Smalltalk implementation, despite SELF's lack of classes and explicit variables. To compensate for the absence of classes, our system uses implementation-level maps to transparently group objects cloned from the same prototype, providing data type information and eliminating the apparent space overhead for prototype-based systems. To compensate for dynamic typing, user-defined control structures, and the lack of explicit variables, our system dynamically compiles multiple versions of a source method, each customized according to its receiver's map. Within each version the type of the receiver is fixed, and thus the compiler can statically bind and inline all messages sent to self. Message splitting and type prediction extract and preserve even more static type information, allowing the compiler to inline many other messages. Inlining dramatically improves performance and eliminates the need to hard-wire low-level methods such as +, ==, and ifTrue:. Despite inlining and other optimizations, our system still supports interactive programming environments. The system traverses internal dependency lists to invalidate all compiled methods affected by a programming change. The debugger reconstructs inlined stack frames from compiler- generated debugging information, making inlining invisible to the SELF programmer.
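
The "maps" mentioned above are shared, implementation-level layout descriptors: every object cloned from the same prototype points to one map recording slot names and offsets, so each clone stores only its slot values. A rough sketch of that grouping idea follows, with invented class names (this is not the SELF implementation):

```python
class Map:
    """Shared layout descriptor for all clones of one prototype."""
    def __init__(self, slot_names):
        self.slot_names = tuple(slot_names)
        self.offsets = {name: i for i, name in enumerate(slot_names)}

class Obj:
    __slots__ = ('map', 'values')
    def __init__(self, map_, values):
        self.map = map_                  # shared: the "type" information
        self.values = list(values)       # per-object: only the slot values

    def get(self, name):
        return self.values[self.map.offsets[name]]

    def clone(self, **overrides):
        vals = list(self.values)
        for name, v in overrides.items():
            vals[self.map.offsets[name]] = v
        return Obj(self.map, vals)        # a clone shares its prototype's map

point_map = Map(('x', 'y'))
prototype = Obj(point_map, [0, 0])
p1 = prototype.clone(x=3)
p2 = prototype.clone(y=4)
assert p1.map is p2.map                   # one map, many clones: no per-object
print(p1.get('x'), p2.get('y'))           # space overhead for layout info
```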

94 citations


Proceedings Article
03 Sep 1991
TL;DR: A framework for optimization of multiway join queries in multiprocessor computer systems is presented; the optimizer usually generates optimal or near-optimal plans when the number of joins is relatively small, and the optimization overhead is much lower than that of exhaustive search.
Abstract: Most of the existing relational database query optimizers generate multi-way join plans only from linear ones to reduce the optimization overhead. For multiprocessor computer systems, this strategy seems inadequate since it may reduce the search space too much to generate near-optimal plans. In this paper we present a framework for optimization of multiway join queries in multiprocessor computer systems. The optimization process not only determines the order and method in which each join should be performed, but also the number of joins that should be executed in parallel and the number of processors that should be allocated to each join. The preliminary performance study shows that the optimizer usually generates optimal or near-optimal plans when the number of joins is relatively small. Even when the number of joins increases, the algorithm still gives reasonably good performance. Furthermore, the optimization overhead is much lower than that of exhaustive search.

88 citations


Proceedings ArticleDOI
01 Apr 1991
TL;DR: The methodology improves upon previous analytical studies and complements previous simulation studies by developing a common high-level workload model that is used to derive separate sets of low-level workload parameters for the two schemes, allowing an equitable comparison of the two schemes for a specific workload.
Abstract: We use mean value analysis models to compare representative hardware and software cache coherence schemes for a large-scale shared-memory system. Our goal is to identify the workloads for which either of the schemes is significantly better. Our methodology improves upon previous analytical studies and complements previous simulation studies by developing a common high-level workload model that is used to derive separate sets of low-level workload parameters for the two schemes. This approach allows an equitable comparison of the two schemes for a specific workload. A software scheme is attractive because the overhead of detecting stale data is transferred from runtime to compile time, and the design complexity is transferred from hardware to software. However, software schemes may perform poorly because compile-time analysis may need to be conservative, leading to unnecessary cache misses and main memory updates. In this paper, we use approximate Mean Value Analysis [U88] to compare the performance of a representative software scheme with a directory-based hardware scheme on a large-scale shared-memory system. Our results show that software schemes are comparable (in terms of processor efficiency) to hardware schemes for a wide class of programs; the only cases for which software schemes perform significantly worse than ... In a previous study comparing the performance of hardware and software coherence, Cheong and Veidenbaum used a parallelizing compiler to implement three different software coherence schemes [Che90]. For selected subroutines of seven programs, they show that the hit ratio of their most sophisticated software scheme (version con...)

88 citations


Journal ArticleDOI
01 Jul 1991
TL;DR: This work has developed and implemented techniques that double the performance of dynamically-typed object-oriented languages and dynamically compiles multiple versions of a source method, each customized according to its receiver's map.
Abstract: We have developed and implemented techniques that double the performance of dynamically-typed object-oriented languages. Our SELF implementation runs twice as fast as the fastest Smalltalk implementation, despite SELF's lack of classes and explicit variables. To compensate for the absence of classes, our system uses implementation-level maps to transparently group objects cloned from the same prototype, providing data type information and eliminating the apparent space overhead for prototype-based systems. To compensate for dynamic typing, user-defined control structures, and the lack of explicit variables, our system dynamically compiles multiple versions of a source method, each customized according to its receiver's map. Within each version the type of the receiver is fixed, and thus the compiler can statically bind and inline all messages sent to self. Message splitting and type prediction extract and preserve even more static type information, allowing the compiler to inline many other messages. Inlining dramatically improves performance and eliminates the need to hard-wire low-level methods such as +, ==, and ifTrue:. Despite inlining and other optimizations, our system still supports interactive programming environments. The system traverses internal dependency lists to invalidate all compiled methods affected by a programming change. The debugger reconstructs inlined stack frames from compiler-generated debugging information, making inlining invisible to the SELF programmer.

Journal ArticleDOI
01 Dec 1991
TL;DR: This paper studies the relative performance of three high availability data replication strategies, chained declustering, mirrored disks, and interleaved declustering, in a shared nothing database machine environment, and examines the tradeoff between the benefit of intra-query parallelism and the overhead of activating and scheduling extra operator processes.
Abstract: The paper studies the relative performance of chained declustering, mirrored disks, and interleaved declustering in a shared nothing database machine environment. In particular, it examines the relative performance of the three strategies when no failures have occurred, the effect of load imbalance caused by a disk or processor failure on system throughput and response time, and the tradeoff between the benefit of intra-query parallelism and process scheduling overhead.
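
Chained declustering places the primary copy of each data fragment on one disk and its backup on the next disk in a logical chain, so that after a failure the lost disk's read load can be spread over several survivors instead of doubling on a single mirror. A small placement sketch under those assumptions (illustrative only, not the paper's implementation):

```python
def chained_declustering(num_fragments, num_disks):
    """Return {disk: {'primary': [...], 'backup': [...]}}.
    Fragment i's primary lives on disk i mod D and its backup on the
    next disk in the chain, (i + 1) mod D."""
    disks = {d: {'primary': [], 'backup': []} for d in range(num_disks)}
    for frag in range(num_fragments):
        disks[frag % num_disks]['primary'].append(frag)
        disks[(frag + 1) % num_disks]['backup'].append(frag)
    return disks

layout = chained_declustering(num_fragments=8, num_disks=4)
for disk, contents in layout.items():
    print(disk, contents)
# If disk 2 fails, reads of its primary fragments go to the backups on
# disk 3, and query work can be rebalanced along the chain so no single
# surviving disk absorbs the whole load.
```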

Proceedings Article
08 Apr 1991
TL;DR: A method to map multidimensional objects into points in a 1-dimensional space based on fractals is proposed; thus, traditional primary-key access methods can be applied, with very few extensions on the part of the DBMS.
Abstract: Existing Database Management Systems (DBMSs) do not handle multi-dimensional data such as boxes, polygons, or even points in a multi-dimensional space efficiently. We examine access methods for these data with two design goals in mind: (a) efficiency in terms of search speed and space overhead and (b) ability to be integrated in a DBMS easily. We propose a method to map multidimensional objects into points in a 1-dimensional space; thus, traditional primary-key access methods can be applied, with very few extensions on the part of the DBMS. We propose such mappings based on fractals; we implemented the whole method on top of a B+-tree, along with several mappings. Simulation experiments on several distributions of the input data show that a modified Hilbert curve gives the best results, even when compared to R-trees [7].
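
The point of such a mapping is to linearize multidimensional points into a single key that a standard B+-tree can index, while preserving enough locality that range queries touch few leaf pages. The classic bit-manipulation routine for the 2-D Hilbert curve is shown below as an illustration (a standard formulation, not necessarily the authors' code):

```python
def hilbert_d(order, x, y):
    """Distance of the point (x, y) along the Hilbert curve that fills a
    2**order by 2**order grid.  Requires 0 <= x, y < 2**order."""
    d = 0
    s = 1 << (order - 1)
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # rotate/reflect the quadrant so the recursion proceeds canonically
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        s >>= 1
    return d

# Nearby points tend to receive nearby keys, which is what makes a B+-tree
# over these keys behave well for spatial queries.
points = [(0, 0), (0, 1), (1, 1), (1, 0), (2, 0)]
print([(p, hilbert_d(4, *p)) for p in points])
```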

Journal ArticleDOI
TL;DR: A method for the derivation of fault signatures for the detection of faults in single-output combinational networks is described, which uses the arithmetic spectrum instead of the Rademacher-Walsh spectrum as a form of data compression to reduce the volume of response data at test time.
Abstract: A method for the derivation of fault signatures for the detection of faults in single-output combinational networks is described. The approach uses the arithmetic spectrum instead of the Rademacher-Walsh spectrum. It is a form of data compression that serves to reduce the volume of the response data at test time. The price which is paid for the reduction in the storage requirements is that some of the knowledge of exact fault location is lost. The derived signatures are short and easily tested using very simple test equipment. The test circuitry could be included on the chip since the overhead involved is comparatively small. The test procedure requires a high-speed counter cycling at maximum speed through selected subsets of all input combinations. Hence, the network under test is exercised at speed, and a number of dynamic errors that are not testable by means of conventional test-set approaches will be detected.

Journal ArticleDOI
TL;DR: This paper addresses issues central to the design and operation of an ultrareliable, Byzantine resilient parallel computer by treating connectivity as a resource that is shared among many processing elements, allowing flexibility in their configuration and reducing complexity.
Abstract: This paper addresses issues central to the design and operation of an ultrareliable, Byzantine resilient parallel computer. Interprocessor connectivity requirements are met by treating connectivity as a resource that is shared among many processing elements, allowing flexibility in their configuration and reducing complexity. Redundant groups are synchronized solely by message transmissions and receptions, which also provide input data consistency and output voting. Reliability analysis results are presented that demonstrate the reduced failure probability of such a system. Performance analysis results are presented that quantify the temporal overhead involved in executing such fault-tolerance-specific operations. Empirical performance measurements of prototypes of the architecture are presented.

Journal ArticleDOI
TL;DR: This paper proposes a method called the vertically layered allocation scheme which utilizes heuristic rules in finding a compromise between computation and communication costs in a static data flow environment.

01 Jan 1991
TL;DR: The implementation of complete exchange on the circuit switched Intel iPSC-860 hypercube is described and results indicate that the programmer needs to evaluate several possibilities before finalizing an implementation - a careful choice can lead to very significant savings in time.
Abstract: The implementation of complete exchange on the circuit switched Intel iPSC-860 hypercube is described. This pattern, also known as all-to-all personalized communication, is the densest requirement that can be imposed on a network. On the iPSC-860, care needs to be taken to avoid edge contention, which can have a disastrous impact on communication time. There are basically two classes of algorithms that achieve contention-free complete exchange. The first contains the classical standard exchange algorithm that is generally useful for small message sizes. The second includes a number of optimal or near-optimal algorithms that are best for large messages. Measurements of communication overhead on the iPSC-860 are given and a notation for analyzing communication link usage is developed. It is shown that for the two classes of algorithms, there is substantial variation in performance with synchronization technique and choice of message protocol. Timings of six implementations are given; each of these is useful over a particular range of message size and cube dimension. Since the complete exchange is a superset of communication patterns, these timings represent upper bounds on the time required by an arbitrary communication requirement. These results indicate that the programmer needs to evaluate several possibilities before finalizing an implementation - a careful choice can lead to very significant savings in time.
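
The "standard exchange" algorithm mentioned above completes the all-to-all personalized exchange on a d-dimensional hypercube in d steps: at step i every node pairs with its neighbour across dimension i and hands over all messages whose destination differs from it in that bit. A small correctness sketch of that schedule follows (message bookkeeping only; link contention and timing are deliberately not modeled):

```python
def standard_exchange(dim):
    """Simulate the d-step standard exchange on a 2**dim node hypercube.
    Every node starts with one message for every destination."""
    n = 1 << dim
    held = {u: [(u, v) for v in range(n)] for u in range(n)}   # (src, dst)
    for i in range(dim):
        bit = 1 << i
        nxt = {u: [] for u in range(n)}
        for u in range(n):
            partner = u ^ bit
            for src, dst in held[u]:
                # forward across dimension i if the destination differs
                # from the current node in that bit, else keep the message
                if (dst ^ u) & bit:
                    nxt[partner].append((src, dst))
                else:
                    nxt[u].append((src, dst))
        held = nxt
    assert all(dst == u for u, msgs in held.items() for _, dst in msgs)
    return held

done = standard_exchange(3)
print(len(done[5]))   # node 5 ends up holding the 8 messages addressed to it
```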

Proceedings ArticleDOI
14 Oct 1991
TL;DR: A built-in self-test (BIST) hardware insertion technique for register-transfer-level designs is addressed that utilizes not only the circuit structure but also the module functionality in reducing test hardware overhead.
Abstract: A built-in self-test (BIST) hardware insertion technique is addressed. Applied to register-transfer-level designs, this technique utilizes not only the circuit structure but also the module functionality in reducing test hardware overhead. Experimental results have shown up to 38% reduction in area overhead over other system level BIST techniques.

Patent
28 Feb 1991
TL;DR: An oval or elliptically shaped overhead conductor that is twisted along its length to provide a continuously varying profile to the wind is described, in which each layer of strands is helically wound in a direction opposite to the underlying layers.
Abstract: An oval or elliptical shaped overhead conductor that is twisted along its length to provide a continuously varying profile to the wind. A single core is comprised of a circular center wire wrapped by circular wires and the core is surrounded or encased by one or more layers of wire strands, including strands of different size from that of the core wires. Each layer is helically wound in a direction opposite to the underlying layers. The surrounding strands may be circular, with the strand sizes symmetrically arranged to result in a substantially oval or elliptical cross-section. Alternatively, the strands may be shaped into symmetrically arranged non-circular arcuate cross-sections, which when wound together result in an oval or elliptical conductor cross-section. The core strands are each circular or round wires having the same diameter, with the result that the core is easy and inexpensive to manufacture. The conductor is capable of manufacture in one step by winding the core and outer layers at the same time. A helical winding along the length of the conductor results in altering the profile that is presented to the wind, thereby substantially cancelling wind-induced forces in adjacent conductor segments or regions, thus damping conductor vibrations.

Journal ArticleDOI
TL;DR: A software tool, TCAS (time cost analysis system), that uses both the analytic and the simulation approaches is designed and implemented to aid users in determining the time cost behavior of their parallel computations.
Abstract: The authors investigate the modeling and analysis of time cost behavior of parallel computations. It is assumed parallel computations reside in a computer system in which there is a limited number of processors, all the processors have the same speed, and they communicate with each other through a shared memory. It has been found that the time costs of parallel computations depend on the input, the algorithm, the data structure, the processor speed, the number of processors, the processing power allocation, the communication, the execution overhead, and the execution environment. The authors define time costs of parallel computations as a function of the first seven factors as listed. The computation structure model is modified to describe the impact of these seven factors on time cost. Techniques based on the modified computation structure model are developed to analyze time cost. A software tool, TCAS (time cost analysis system), that uses both the analytic and the simulation approaches is designed and implemented to aid users in determining the time cost behavior of their parallel computations.

Patent
06 Feb 1991
TL;DR: In this paper, an improved branch prediction cache (BPC) is proposed to share significant portions of hardware cost and design complexity overhead for dynamic branch prediction for target address, branch direction, and target instructions.
Abstract: The present invention provides an improved branch prediction cache (BPC) structure that combines various separate structures into one integrated structure. In conjunction with doing this, the present invention is able to share significant portions of hardware cost and design complexity overhead. As a result, the cost-performance trade-off for implementing dynamic branch prediction for target address, branch direction, and target instructions aspects of branches shifts to where 'full' branch prediction is now more practical.

Journal ArticleDOI
TL;DR: The REDUCE-OR process model is derived from the tree representation by providing a process interpretation of tree development, and devising efficient bookkeeping mechanisms and algorithms, and extracts full OR parallelism.
Abstract: A method for parallel execution of logic programs is presented. It uses REDUCE-OR trees instead of AND-OR or SLD trees. The REDUCE-OR trees represent logic-program computations in a manner suitable for parallel interpretation. The REDUCE-OR process model is derived from the tree representation by providing a process interpretation of tree development, and devising efficient bookkeeping mechanisms and algorithms. The process model is complete—it produces any particular solution eventually—and extracts full OR parallelism. This is in contrast to most other schemes that extract AND parallelism. It does this by solving the problem of interaction between AND and OR parallelism effectively. An important optimization that effectively controls the apparent overhead in the process model is given. Techniques that trade parallelism for reducing overhead are also described.

Journal ArticleDOI
TL;DR: In this paper, the extinction of flames is predicted using the limiting oxygen index concept linked with a well-stirred model of the fire environment, which is used to predict preflashover temperatures in forced ventilation fires.
Abstract: Compartment fire data with either no ventilation or forced overhead ventilation are successfully modeled as well-stirred fire environments rather than two-layer fire environments. The extinction of flames is predicted using the limiting oxygen index concept linked with a well-stirred model of the fire environment. While the fire environment in compartments with overhead ventilation is quite different from naturally-ventilated fires or fires ventilated from floor level, a temperature model previously developed by Beyler and Deal is shown to predict preflashover temperatures in forced ventilation fires.

Journal ArticleDOI
TL;DR: It is shown that for a given hardware overhead, more reliable systems can be designed using bigger FTBBs without full spare utilization than using smaller FTBBs with full spare utilization.
Abstract: Consideration is given to fault tolerant systems that are built from modules called fault tolerant basic blocks (FTBBs), where each module contains some primary nodes and some spare nodes. Full spare utilization is achieved when each spare within an FTBB can replace any other primary or spare node in that FTBB. This, however, may be prohibitively expensive for larger FTBBs. Therefore, it is shown that for a given hardware overhead more reliable systems can be designed using bigger FTBBs without full spare utilization than using smaller FTBBs with full spare utilization. Sufficient conditions for maximizing the reliability of a spare allocation strategy in an FTBB for a given hardware overhead are presented. The proposed spare allocation strategy is applied to two fault tolerant reconfiguration schemes for binary hypercubes. One scheme uses hardware switches to replace a faulty node, and the other scheme uses fault tolerant routing to bypass faulty nodes in the system and deliver messages to the destination node.
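
The reliability claim can be illustrated with a simple binomial model: with full spare utilization, an FTBB with m primaries and s spares survives if at least m of its m + s nodes are fault-free, and a system of several FTBBs survives only if every block does. The numbers below are made-up and the model ignores switching hardware, so this is only a hedged sketch of the kind of comparison the paper makes:

```python
from math import comb

def block_reliability(primaries, spares, p):
    """P(at least `primaries` of the primaries+spares nodes are up),
    assuming independent node reliability p and full spare utilization."""
    n = primaries + spares
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(primaries, n + 1))

def system_reliability(num_blocks, primaries, spares, p):
    return block_reliability(primaries, spares, p) ** num_blocks

p = 0.95   # illustrative per-node reliability
# The same 16 primaries and 4 spares, organized two ways:
small = system_reliability(num_blocks=4, primaries=4, spares=1, p=p)
big = system_reliability(num_blocks=1, primaries=16, spares=4, p=p)
print(f"four blocks of 4+1: {small:.4f}")
print(f"one block of 16+4:  {big:.4f}")
# The bigger block pools its spares and tolerates any four faults, while the
# smaller blocks fail as soon as two faults land in the same block; the paper
# argues this pooling advantage can outweigh giving up full spare utilization
# inside the larger block.
```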

Proceedings ArticleDOI
01 Sep 1991
TL;DR: A genetic algorithm for MDAP is developed and the effects of varying the communication cost matrix representing the interprocessor communication topology and the uniformity of the distribution of documents to the clusters are studied.
Abstract: Information retrieval is the selection of documents that are potentially relevant to a user's information need. Given the vast volume of data stored in modern information retrieval systems, searching the document database requires vast computational resources. To meet these computational demands, various researchers have developed parallel information retrieval systems. As efficient exploitation of parallelism demands fast access to the documents, data organization and placement significantly affect the total processing time. We describe and evaluate a data placement strategy for distributed memory, distributed I/O multicomputers. Initially, a formal description of the Multiprocessor Document Allocation Problem (MDAP) and a proof that MDAP is NP Complete are presented. A document allocation algorithm for MDAP based on Genetic Algorithms is developed. This algorithm assumes that the documents are clustered using any one of the many clustering algorithms. We define a cost function for the derived allocation and evaluate the performance of our algorithm using this function. As part of the experimental analysis, the effects of varying the number of documents and their distribution across the clusters as well as the exploitation of various differing architectural interconnection topologies are studied. We also experiment with the several parameters common to Genetic Algorithms, e.g., the probability of mutation and the population size. 1.0 Introduction An efficient multiprocessor information retrieval system must maintain a low system response time and require relatively little storage overhead. As the volume of stored data continues to increase daily, the multiprocessor engines must likewise scale to a large number of processors. This demand for system scalability necessitates a distributed memory architecture, as a large number of processors is not currently possible in a shared-memory configuration. A distributed memory system, however, introduces the problem associated with the proper placement of data onto the given architecture. We refer to this problem as the Multiprocessor Document Allocation Problem (MDAP), a derivative of the Mapping Problem originally described by Bokhari [Bok81]. We assume a clustered document database. A clustered approach is taken since an index file organization can introduce vast storage overhead (up to roughly 300% according to Haskin [Has81]) and a full-text or signature analysis technique results in lengthy search times. In this context, a proper solution to MDAP is any mapping of the documents onto the processors such that the average cluster diameter is kept to a minimum while still providing for an even document distribution across the nodes. To achieve a significant reduction in the total query processing time using parallelism, the allocation of data among the processors should be distributed as evenly as possible and the interprocessor communication among the nodes should be minimized.
Achieving such an allocation is NP Complete. Thus, it is necessary to use heuristics to obtain satisfactory mappings, which may indeed be suboptimal. Genetic Algorithms [DeJ89, Gol89, Gre85, Gre87, Hol87, Rag87] approximate optimal solutions to computationally intractable problems. We develop a genetic algorithm for MDAP and examine the effects of varying the communication cost matrix representing the interprocessor communication topology and the uniformity of the distribution of documents to the clusters. 1.1 Mapping Problem Approximations As the Mapping Problem and some of its derivatives are NP complete, heuristic algorithms are commonly employed to approximate the optimal solutions. Some of these approaches [Bok81, Bol88, Lee87] deal, in some manner, ...
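
As a concrete illustration of the approach, a genetic algorithm for a placement problem like MDAP encodes an allocation as a chromosome (one gene per document cluster naming its processor), scores it with the cost function, and evolves a population through selection, crossover, and mutation. The following compact sketch uses an invented, purely load-balance cost function and made-up parameters; the paper's cost function additionally charges interprocessor communication distance:

```python
import random

def evolve(num_clusters, num_procs, cost, pop_size=40, generations=200,
           p_mutation=0.02):
    """Minimize cost(assignment) over assignments of clusters to processors."""
    pop = [[random.randrange(num_procs) for _ in range(num_clusters)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=cost)
        survivors = pop[:pop_size // 2]              # truncation selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, num_clusters)  # one-point crossover
            child = a[:cut] + b[cut:]
            for i in range(num_clusters):            # mutation
                if random.random() < p_mutation:
                    child[i] = random.randrange(num_procs)
            children.append(child)
        pop = survivors + children
    return min(pop, key=cost)

cluster_sizes = [5, 9, 3, 7, 4, 6, 2, 8]

def imbalance(assignment, procs=4):
    load = [0] * procs
    for size, proc in zip(cluster_sizes, assignment):
        load[proc] += size
    return max(load) - min(load)

best = evolve(num_clusters=len(cluster_sizes), num_procs=4, cost=imbalance)
print(best, imbalance(best))
```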

Patent
15 Mar 1991
TL;DR: In this paper, a method for transmitting signals having a bit-rate higher than a predetermined bit rate in a digital data transmission system normally providing discrete data channels having the predetermined base bit rate, the channels being subject to different propagation delay characteristics through the system is disclosed.
Abstract: A method is disclosed for transmitting signals having a bit-rate higher than a predetermined bit-rate in a digital data transmission system normally providing discrete data channels having the predetermined base bit-rate, the channels being subject to different propagation delay characteristics through the system. A group of k predetermined bit-rate data channels are allocated as a virtual channel for the transmission of said high bit-rate signals. An overhead channel is defined in one of the group of channels. The high bit-rate signals are divided into n sub-signals, where n ≤ k, having a bit-rate equal to or less than the predetermined bit-rate. The n sub-signals are transmitted over the data channels, and overhead signals are transmitted at intervals over the channels in slots normally allocated for data. These data slots are transmitted over the overhead channel while the overhead signals are transmitted in their place. The overhead signals are used at the far end to reassemble the n sub-signals into said original high bit-rate data signal.

ReportDOI
01 Jan 1991
TL;DR: TYPESETTER is a programming system that utilizes profile data to select implementations of program abstractions and integrates the development, evaluation, and selection of alternative implementations of programming abstractions into a package that is transparent to the programmer.
Abstract: As the size and complexity of software continues to grow, it will be necessary for software construction systems to collect, maintain, and utilize much more information about programs than systems do now. This dissertation explores compiler utilization of profile data. Several widely held assumptions about collecting profile data are not true. It is not true that the optimal instrumentation problem has been solved, and it is not true that counting traversals of the arcs of a program flow graph is more expensive and complex than counting executions of basic blocks. There are simple program flow graphs for which finding optimal instrumentations is possibly exponential. An algorithm is presented that computes instrumentations of a program to count arc traversals (and therefore basic block counts also). Such instrumentations impose 10% to 20% overhead on the execution of a program, often less than the overhead required for collecting basic block execution counts. An algorithm called Greedy Sewing improves the behavior of programs on machines with instruction caches. By moving basic blocks physically closer together if they are executed close together in time, miss rates in instruction caches can be reduced up to 50%. Arc-count profile data not only allows the compiler to know which basic blocks to move closer together, it also allows those situations that will have little or no effect on the final performance of the reorganized program to be ignored. Such a low-level compiler optimization would be difficult to do without arc-count profile data. The primary contribution of this work is the development of TYPESETTER, a programming system that utilizes profile data to select implementations of program abstractions. The system integrates the development, evaluation, and selection of alternative implementations of programming abstractions into a package that is transparent to the programmer. Unlike previous systems, TYPESETTER does not require programmers to know details of the compiler implementation. Experience indicates that the TYPESETTER approach to system synthesis has considerable benefit, and will continue to be a promising avenue of research.
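
One plausible reading of the Greedy Sewing idea is a layout pass driven by arc counts: walk the flow-graph arcs from hottest to coldest and splice the two endpoint blocks into a chain whenever neither end is already committed, so the hottest control-flow paths become contiguous fall-throughs in the instruction cache. The sketch below is that generic interpretation with invented names, not the dissertation's implementation:

```python
def greedy_sew(blocks, arc_counts):
    """Chain basic blocks using arc-count profile data.

    blocks: iterable of block names; arc_counts: {(src, dst): count}.
    Returns a list of chains; hot arcs become intra-chain fall-throughs."""
    next_of, prev_of = {}, {}
    for (src, dst), _ in sorted(arc_counts.items(),
                                key=lambda kv: kv[1], reverse=True):
        if src in next_of or dst in prev_of or src == dst:
            continue                      # an end is already committed
        head = src
        while head in prev_of:
            head = prev_of[head]
        if head == dst:
            continue                      # sewing here would close a cycle
        next_of[src], prev_of[dst] = dst, src
    chains, seen = [], set()
    for b in blocks:
        if b in prev_of or b in seen:
            continue                      # not the head of a chain
        chain = []
        while b is not None and b not in seen:
            chain.append(b)
            seen.add(b)
            b = next_of.get(b)
        chains.append(chain)
    return chains

counts = {('A', 'B'): 90, ('A', 'C'): 10, ('B', 'D'): 85, ('C', 'D'): 10}
print(greedy_sew(['A', 'B', 'C', 'D'], counts))   # [['A', 'B', 'D'], ['C']]
```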

Proceedings ArticleDOI
01 May 1991
TL;DR: A compile-time analysis called reference escape analysis is presented for higher-order functional languages that determines whether the lifetime of a reference exceeds the lifetime of the environment in which the reference was created.
Abstract: In reference counting schemes for automatically reclaiming storage, each time a reference to an object is created or destroyed, the reference count of the object needs to be updated. This may involve expensive inter-processor message exchanges in distributed environments. This overhead can be reduced by analyzing the lifetimes of references to avoid unnecessary updates. This paper describes a technique for reducing the runtime reference counting overhead through compile-time optimization. We present a compile-time analysis called reference escape analysis for higher-order functional languages that determines whether the lifetime of a reference exceeds the lifetime of the environment in which the reference was created. Using this statically inferred information, a method for optimizing reference counting schemes is described. Our method can be applied to reference counting schemes in both uniprocessor and multiprocessor environments.
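
The payoff of the analysis is that a matched increment/decrement pair on an object's count can be dropped whenever the compiler proves the reference dies before the environment that created it. The toy sketch below only illustrates that effect at runtime; the classes and the explicit escape flag are invented for the example, whereas in the paper the decision is made statically:

```python
class RefCounted:
    def __init__(self, payload):
        self.payload = payload
        self.count = 1                    # the creating reference

    def inc(self):
        self.count += 1                   # in a distributed heap this may be
                                          # a remote message: expensive
    def dec(self):
        self.count -= 1
        if self.count == 0:
            print("reclaimed", self.payload)

def consume(obj, reference_escapes):
    """Use obj locally.  If the analysis proved the local reference cannot
    outlive this call (reference_escapes=False), the inc/dec pair is elided;
    otherwise it must be kept."""
    if reference_escapes:
        obj.inc()
    total = sum(obj.payload)              # purely local use
    if reference_escapes:
        obj.dec()
    return total

o = RefCounted([1, 2, 3])
print(consume(o, reference_escapes=False))   # no reference-count traffic
o.dec()                                       # the creator drops its reference
```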

Proceedings ArticleDOI
07 Apr 1991
TL;DR: A novel hierarchical scheme for arbitrary networks is proposed that features a trade-off between the communication overhead and the buffer requirements of the routing; this trade-off can be shown to be close to optimal.
Abstract: Store-and-forward deadlock prevention in communication networks is addressed. The approach adopted is that of establishing buffer classes in order to prevent cyclic waiting chains. This type of solution usually requires many buffers. The main contribution of the current study is in showing that the number of required buffers can be reduced considerably by using a hierarchical organization of the network. A novel hierarchical scheme for arbitrary networks is proposed that features a trade-off between the communication overhead and the buffer requirements of the routing. This trade-off can be shown to be close to optimal.
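
For intuition, the classical (non-hierarchical) buffer-class construction that this line of work improves on gives a packet that has traveled h hops access only to buffers of class h; any wait chain then strictly increases the class, so no cyclic wait, and hence no store-and-forward deadlock, can form. The price is one buffer class per hop of the longest route, which is what the hierarchical scheme reduces. A small sketch of the basic rule (names and slot counts are invented):

```python
class Node:
    """Keeps one small buffer pool per class 0..max_hops; a packet that has
    made h hops may only be accepted into a class-h buffer."""
    def __init__(self, name, max_hops, slots_per_class=2):
        self.name = name
        self.free = {c: slots_per_class for c in range(max_hops + 1)}

    def try_accept(self, packet_hops):
        if self.free.get(packet_hops, 0) > 0:
            self.free[packet_hops] -= 1   # claim a class-`packet_hops` buffer
            return True
        return False                      # packet waits upstream for now

    def release(self, packet_hops):
        self.free[packet_hops] += 1

# A packet waiting for a class-h buffer sits in a class-(h-1) buffer upstream,
# so every edge in a wait chain goes from a lower class to a higher one and
# the chain can never close into a cycle.
n = Node("v3", max_hops=4)
print(n.try_accept(packet_hops=2))        # True: a class-2 buffer was free
```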

DOI
01 Jan 1991
TL;DR: A practical look at the Compare-And-Swap operation in the context of contemporary shared memory multiprocessors, showing that the common techniques for reducing the overhead of lock-oriented synchronization in the presence of contention are inappropriate when used as the basis for lock-free synchronization.
Abstract: "An important class of concurrent objects are those that are lock-free, that is, whose operations are not contained within mutually exclusive critical sections. A lock-free object can be accessed by many threads at a time, yet clever update protocols based on atomic Compare-And-Swap operations guarantee the object's consistency. In this paper we take a practical look at the Compare-And-Swap operation in the context of contemporary shared memory multiprocessors. We first describe an operating system-based solution that permits the construction of a non-blocking Compare-And-Swap function on processor architectures that only support lock-oriented atomic primitives. We then evaluate several locking strategies that can be used to synthesize a Compare-And-Swap operation. We show that the common techniques for reducing the overhead of lock-oriented synchronization in the presence of contention are inappropriate when used as the basis for lock-free synchronization. We then describe a simple modification to an existing synchronization protocol which allows us to avoid much of the overhead normally associated with contention."