
Showing papers on "Data flow diagram published in 1997"


Proceedings ArticleDOI
01 May 1997
TL;DR: This paper extends previous work on efficient path profiling to flow sensitive profiling, which associates hardware performance metrics with a path through a procedure, and describes a data structure, the calling context tree, that efficiently captures calling contexts for procedure-level measurements.
Abstract: A program profile attributes run-time costs to portions of a program's execution. Most profiling systems suffer from two major deficiencies: first, they only apportion simple metrics, such as execution frequency or elapsed time, to static, syntactic units, such as procedures or statements; second, they aggressively reduce the volume of information collected and reported, although aggregation can hide striking differences in program behavior. This paper addresses both concerns by exploiting the hardware counters available in most modern processors and by incorporating two concepts from data flow analysis--flow and context sensitivity--to report more context for measurements. This paper extends our previous work on efficient path profiling to flow sensitive profiling, which associates hardware performance metrics with a path through a procedure. In addition, it describes a data structure, the calling context tree, that efficiently captures calling contexts for procedure-level measurements. Our measurements show that the SPEC95 benchmarks execute a small number (3--28) of hot paths that account for 9--98% of their L1 data cache misses. Moreover, these hot paths are concentrated in a few routines, which have complex dynamic behavior.

557 citations
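
The calling context tree admits a compact illustration. This hypothetical Python sketch keeps one node per distinct call path from the root, so a metric accumulates per calling context rather than per procedure; the procedure names and the cost metric are invented, not taken from the paper.

```python
# Hypothetical sketch of a calling context tree (CCT): each distinct call
# path gets its own node, so costs are attributed per context, not per
# procedure. Names and costs are invented for illustration.

class CCTNode:
    def __init__(self, proc):
        self.proc = proc          # procedure name
        self.metric = 0           # cost accumulated in this context
        self.children = {}        # callee name -> CCTNode

class CCT:
    def __init__(self):
        self.root = CCTNode("<root>")
        self.stack = [self.root]  # mirrors the dynamic call stack

    def enter(self, proc):
        top = self.stack[-1]
        self.stack.append(top.children.setdefault(proc, CCTNode(proc)))

    def leave(self, cost):
        self.stack.pop().metric += cost

# 'main' calls 'f' directly and again through 'g'; the two activations of
# 'f' stay separate because their calling contexts differ.
cct = CCT()
cct.enter("main")
cct.enter("f"); cct.leave(10)                                # main -> f
cct.enter("g"); cct.enter("f"); cct.leave(25); cct.leave(5)  # main -> g -> f
cct.leave(1)

def dump(node, depth=0):
    if node.proc != "<root>":
        print("  " * (depth - 1) + f"{node.proc}: {node.metric}")
    for child in node.children.values():
        dump(child, depth + 1)

dump(cct.root)
```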


Proceedings ArticleDOI
01 Dec 1997
TL;DR: The results affirm that significant instruction-level parallelism can be exploited in integer programs (2 to 6 instructions per cycle) and quantify the value of successively doubling the number of distributed elements.
Abstract: Traces are dynamic instruction sequences constructed and cached by hardware. A microarchitecture organized around traces is presented as a means for efficiently executing many instructions per cycle. Trace processors exploit both control flow and data flow hierarchy to overcome complexity and architectural limitations of conventional superscalar processors by (1) distributing execution resources based on trace boundaries and (2) applying control and data prediction at the trace level rather than individual branches or instructions. Three sets of experiments using the SPECInt95 benchmarks are presented. (i) A detailed evaluation of trace processor configurations: the results affirm that significant instruction-level parallelism can be exploited in integer programs (2 to 6 instructions per cycle). We also isolate the impact of distributed resources, and quantify the value of successively doubling the number of distributed elements. (ii) A trace processor with data prediction applied to inter-trace dependences: potential performance improvement with perfect prediction is around 45% for all benchmarks. With realistic prediction, gcc achieves an actual improvement of 10%. (iii) Evaluation of aggressive control flow: some benchmarks benefit from control independence by as much as 10%.

374 citations
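
The trace abstraction can be made concrete with a short sketch. This hypothetical Python fragment cuts a dynamic instruction stream into traces bounded by a maximum length and a maximum branch count, which is roughly what a hardware trace constructor does; the limits and the stream encoding are invented.

```python
# Minimal sketch of trace construction: the dynamic instruction stream is
# chopped into traces bounded by a maximum length and a maximum number of
# branches. The limits below are hypothetical, not the paper's.

MAX_LEN, MAX_BRANCHES = 16, 3

def build_traces(stream):
    """stream: list of (pc, is_branch, taken) tuples."""
    traces, cur, branches = [], [], 0
    for pc, is_branch, taken in stream:
        cur.append(pc)
        if is_branch:
            branches += 1
        if len(cur) == MAX_LEN or branches == MAX_BRANCHES:
            traces.append(tuple(cur))
            cur, branches = [], 0
    if cur:
        traces.append(tuple(cur))
    return traces

# A toy stream: every fourth instruction is a taken branch.
stream = [(pc, pc % 4 == 3, True) for pc in range(40)]
for t in build_traces(stream):
    print(t)
```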


Patent
07 Nov 1997
TL;DR: In this article, a transformation description language (TDL) is proposed for specifying how data is to be manipulated in a data warehousing application. The TDL comprises a source for storing raw data, one or more transformation objects for processing the raw data according to predefined instructions, and a target for storing the processed data.
Abstract: A transformation description language (TDL) for specifying how data is to be manipulated in a data warehousing application. The TDL comprises a source for storing raw data, one or more transformation objects for processing the raw data according to predefined instructions, and a target for storing the processed data. A mapping is used for directing the data flow between the I/O ports corresponding to the source, the plurality of transformation objects, and the target. The mapping specifies the connectivity between the source, transformation, and target objects as well as the order of these connections. There are a number of different transformations which can be performed to manipulate the data. Some such transformations include: an aggregator transformation, an expression transformation, a filter transformation, a lookup transformation, a query transformation, a sequence transformation, a stored procedure transformation, and an update strategy transformation.

291 citations
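
The source-to-target wiring can be reduced to function composition. This hypothetical Python sketch implements three of the transformation types named in the abstract and a fixed mapping between them; the data, port model, and transformation bodies are invented, not the patent's.

```python
# Sketch of the source -> transformations -> target dataflow, with a few
# of the named transformation types rendered as composable generators.

def source():
    yield {"dept": "a", "amount": 10}
    yield {"dept": "a", "amount": 5}
    yield {"dept": "b", "amount": 7}

def filter_transform(rows, predicate):
    return (r for r in rows if predicate(r))

def expression_transform(rows, fn):
    return (fn(dict(r)) for r in rows)

def aggregator_transform(rows, key, field):
    totals = {}
    for r in rows:
        totals[r[key]] = totals.get(r[key], 0) + r[field]
    return ({key: k, field: v} for k, v in totals.items())

def target(rows):
    for r in rows:
        print(r)

# The "mapping": connect the I/O ports in a fixed order.
rows = source()
rows = filter_transform(rows, lambda r: r["amount"] > 4)
rows = expression_transform(rows, lambda r: {**r, "amount": r["amount"] * 2})
rows = aggregator_transform(rows, "dept", "amount")
target(rows)
```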


Patent
14 Mar 1997
TL;DR: In this paper, the authors propose an approach that allows the network to discover the nature of the service for each traffic flow, classify it dynamically, and exercise traffic conditioning by means of such techniques as admission control and scheduling when delivering the traffic downstream to support the service appropriately.
Abstract: Multi-media networks will require that a data flow be given a certain quality-of-service (QOS) for a network connection, but pre-negotiation of this sort is foreign to the current data networking model. Real-time traffic flow in the data network requires distinct limits on the tolerance to delay and the variations in that delay. Interactive voice and video demand that the total delay not exceed the threshold beyond which human interaction is unacceptably impaired. The present invention allows the network to discover the nature of the service for each traffic flow, classify it dynamically, and exercise traffic conditioning by means of such techniques as admission control and scheduling when delivering the traffic downstream to support the service appropriately.

238 citations
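
The discover-classify-condition loop might look like the following sketch: a flow is classified from observed packet statistics, then admission control gates it against the remaining bandwidth. All classes, thresholds, and numbers are invented for illustration.

```python
# Hedged sketch of flow classification plus admission control. The service
# classes and heuristics below are hypothetical, not the patent's.

CLASSES = {
    "voice": {"max_delay_ms": 150, "bandwidth": 0.064},
    "video": {"max_delay_ms": 250, "bandwidth": 2.0},
    "bulk":  {"max_delay_ms": None, "bandwidth": 0.0},  # best effort
}

def classify(avg_pkt_size, inter_arrival_ms):
    # Small, regularly spaced packets look like voice; large regular
    # packets look like video; everything else is bulk data.
    if avg_pkt_size < 200 and inter_arrival_ms < 30:
        return "voice"
    if avg_pkt_size >= 1000 and inter_arrival_ms < 40:
        return "video"
    return "bulk"

def admit(flow_class, available_mbps):
    return CLASSES[flow_class]["bandwidth"] <= available_mbps

available = 2.5
for size, gap in [(160, 20), (1400, 33), (900, 500)]:
    cls = classify(size, gap)
    ok = admit(cls, available)
    if ok:
        available -= CLASSES[cls]["bandwidth"]
    print(f"{cls}: admitted={ok}, remaining={available:.3f} Mb/s")
```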


Proceedings ArticleDOI
01 Oct 1997
TL;DR: The paper shows how the popular data flow approach to visualization can be extended to allow multiple users to collaborate, each running their own visualization pipeline but with the opportunity to connect in data generated by a colleague.
Abstract: Current visualization systems are designed around a single-user model, making it awkward for large research teams to collectively analyse large data sets. The paper shows how the popular data flow approach to visualization can be extended to allow multiple users to collaborate, each running their own visualization pipeline but with the opportunity to connect in data generated by a colleague. Thus collaborative visualizations are 'programmed' in exactly the same 'plug-and-play' style as is now customary for single-user mode. The paper describes a system architecture that can act as a basis for the collaborative extension of any data flow visualization system, and the ideas are demonstrated through a particular implementation in terms of IRIS Explorer.

144 citations
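
The 'plug-and-play' extension can be sketched in a few lines. Here, hypothetical pipeline modules connect within one user's pipeline, and a shared queue stands in for the link that feeds one user's intermediate data into a colleague's pipeline; IRIS Explorer's actual module API is not modeled.

```python
# Toy collaborative dataflow pipeline: user A exports an intermediate
# result that user B imports into their own pipeline. All module names
# and the transport (a local queue) are invented.

from queue import Queue

class Module:
    def __init__(self, fn):
        self.fn = fn
        self.downstream = []

    def connect(self, other):
        self.downstream.append(other)
        return other

    def fire(self, data):
        out = self.fn(data)
        for m in self.downstream:
            m.fire(out)

shared = Queue()  # stands in for the network link between two users

# User A's pipeline: read -> isosurface -> export to colleague.
read = Module(lambda d: d)
iso = Module(lambda d: f"isosurface({d})")
export = Module(lambda d: (shared.put(d), d)[1])
read.connect(iso).connect(export)

# User B imports A's intermediate result into their own render module.
render = Module(lambda d: print("B renders:", d))
read.fire("volume.dat")
render.fire(shared.get())
```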


Proceedings Article
14 Aug 1997
TL;DR: MineSet supports the knowledge discovery process from data access and preparation through iterative analysis and visualization to deployment, and third party vendors can interface to the MineSet tools for model deployment and for integration with other packages.
Abstract: MineSet™, Silicon Graphics' interactive system for data mining, integrates three powerful technologies: database access, analytical data mining, and data visualization. It supports the knowledge discovery process from data access and preparation through iterative analysis and visualization to deployment. MineSet is based on a client-server architecture that scales to large databases. The database access component provides a rich set of operators that can be used to preprocess and transform the stored data into forms appropriate for visualization and analytical mining. The 3D visualization capabilities allow direct data visualization for exploratory analysis, including tools for displaying high-dimensional data containing geographical and hierarchical information. The analytical mining algorithms help identify potentially interesting models of the data, which can be viewed using visualization tools specialized for the learned models. Third party vendors can interface to the MineSet tools for model deployment and for integration with other packages.

135 citations


Journal ArticleDOI
TL;DR: This work applies program slicing, a program decomposition method, to the problem of extracting reusable functions from ill-structured programs, and extends the definition of a program slice to a transform slice, one that includes statements which contribute directly or indirectly to transform a set of input variables into a set of output variables.
Abstract: An alternative approach to developing reusable components from scratch is to recover them from existing systems. We apply program slicing, a program decomposition method, to the problem of extracting reusable functions from ill structured programs. As with conventional slicing first described by M. Weiser (1984), a slice is obtained by iteratively solving data flow equations based on a program flow graph. We extend the definition of program slice to a transform slice, one that includes statements which contribute directly or indirectly to transform a set of input variables into a set of output variables. Unlike conventional program slicing, these statements do not include either the statements necessary to get input data or the statements which test the binding conditions of the function. Transform slicing presupposes the knowledge that a function is performed in the code and its partial specification, only in terms of input and output data. Using domain knowledge we discuss how to formulate expectations of the functions implemented in the code. In addition to the input/output parameters of the function, the slicing criterion depends on an initial statement, which is difficult to obtain for large programs. Using the notions of decomposition slice and concept validation we show how to produce a set of candidate functions, which are independent of line numbers but must be evaluated with respect to the expected behavior. Although human interaction is required, the limited size of candidate functions makes this task easier than looking for the last function instruction in the original source code.

119 citations
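
The flavor of transform slicing can be shown on a toy, flattened program: starting from the output variables, statements are pulled in by iterating to a fixpoint, and input statements are excluded as the abstract describes. The program encoding below is invented and ignores control flow, so it is a sketch of the idea, not the authors' algorithm.

```python
# Backward slicing sketch: keep any statement whose defined variable is
# needed by an already-included statement, iterating to a fixpoint, and
# (mimicking transform slicing) drop input statements.

# (lineno, defined_var, used_vars, is_input)
program = [
    (1, "n",   set(),      True),   # n = read()
    (2, "i",   set(),      False),  # i = 0
    (3, "s",   set(),      False),  # s = 0
    (4, "s",   {"s", "i"}, False),  # s = s + i   (loop body, flattened)
    (5, "i",   {"i"},      False),  # i = i + 1
    (6, "avg", {"s", "n"}, False),  # avg = s / n
]

def transform_slice(program, outputs):
    needed, included = set(outputs), set()
    changed = True
    while changed:                       # fixpoint over the equations
        changed = False
        for lineno, d, uses, is_input in reversed(program):
            if d in needed and not is_input and lineno not in included:
                included.add(lineno)
                needed |= uses
                changed = True
    return sorted(included)

print(transform_slice(program, {"avg"}))   # -> [2, 3, 4, 5, 6]
```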


Journal ArticleDOI
TL;DR: This article presents a general framework for developing demand-driven interprocedural data flow analyzers and reports the experience in evaluating the performance of this approach.
Abstract: The high cost and growing importance of interprocedural data flow analysis have led to an increased interest in demand-driven algorithms. In this article, we present a general framework for developing demand-driven interprocedural data flow analyzers and report our experience in evaluating the performance of this approach. A demand for data flow information is modeled as a set of queries. The framework includes a generic demand-driven algorithm that determines the response to a query by iteratively applying a system of query propagation rules. The propagation rules yield precise responses for the class of distributive finite data flow problems. We also describe a two-phase framework variation to accurately handle nondistributive problems. A performance evaluation of our demand-driven approach is presented for two data flow problems, namely, reaching definitions and copy constant propagation. Our experiments show that demand-driven analysis performs well in practice, reducing both time and space requirements when compared with exhaustive analysis.

96 citations
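
A minimal sketch of query propagation, using reaching definitions as the example problem: rather than computing facts everywhere, a single query is pushed backward through the control flow graph until definitions answer it. The CFG and statement encoding are invented; the paper's interprocedural rules are not modeled.

```python
# Demand-driven flavor of reaching definitions: the query "which defs of
# v reach node n?" propagates backward until it hits definitions of v.

# node -> variable defined there (or None), plus predecessor edges
defs  = {1: "x", 2: "y", 3: "x", 4: None, 5: None}
preds = {1: [], 2: [1], 3: [2], 4: [2, 3], 5: [4]}

def reaching_defs(var, node, seen=None):
    """Return the set of nodes whose definition of `var` reaches `node`."""
    seen = set() if seen is None else seen
    result = set()
    for p in preds[node]:
        if defs[p] == var:
            result.add(p)           # query answered along this path
        elif p not in seen:
            seen.add(p)             # keep propagating the query upward
            result |= reaching_defs(var, p, seen)
    return result

print(reaching_defs("x", 5))   # -> {1, 3}
```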


Journal ArticleDOI
TL;DR: This paper addresses the problem of allocating (assigning and scheduling) periodic task modules to processing nodes in distributed real-time systems subject to task precedence and timing constraints using the branch-and-bound technique to find an "optimal" allocation.
Abstract: This paper addresses the problem of allocating (assigning and scheduling) periodic task modules to processing nodes in distributed real-time systems subject to task precedence and timing constraints. Using the branch-and-bound technique, a module allocation scheme is proposed to find an "optimal" allocation that maximizes the probability of meeting task deadlines. The task system within a planning cycle is first modeled with a task flow graph which describes computation and communication modules, as well as the precedence constraints among them. To incorporate both timing and logical correctness into module allocation, the probability of meeting task deadlines is used as the objective function. The module allocation scheme is then applied to find an optimal allocation of task modules in a distributed system. The timing aspects embedded in the objective function drive the scheme not only to assign task modules to processing nodes, but also to use a module scheduling algorithm (with polynomial time complexity) for scheduling all modules assigned to each node, so that all tasks may be completed in time. In order to speed up the branch-and-bound process and to reduce the computational complexity, a dominance relation is derived. Several numerical examples are presented to demonstrate the effectiveness and practicality of the proposed scheme.

91 citations
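
The branch-and-bound structure is easy to sketch. The fragment below assigns modules depth-first, using a toy deadline-meeting probability that only decreases as modules are added, so a partial assignment's value is a valid optimistic bound; the objective function and problem sizes are invented, not the paper's.

```python
# Branch-and-bound sketch for module allocation: maximize a toy
# probability of meeting deadlines, pruning with an optimistic bound.

MODULES = ["m1", "m2", "m3"]
NODES = ["n1", "n2"]

def p_meet(assignment):
    """Toy objective: probability decays as modules pile onto one node."""
    load = {n: 0 for n in NODES}
    for m, n in assignment.items():
        load[n] += 1
    p = 1.0
    for l in load.values():
        p *= 0.95 ** (l * l)        # quadratic penalty for imbalance
    return p

best = {"p": 0.0, "assign": None}

def search(i, assignment):
    if i == len(MODULES):
        p = p_meet(assignment)
        if p > best["p"]:
            best.update(p=p, assign=dict(assignment))
        return
    # p_meet never increases as modules are added, so a partial value
    # is an optimistic bound: prune when it cannot beat the incumbent.
    if p_meet(assignment) <= best["p"]:
        return
    for n in NODES:
        assignment[MODULES[i]] = n
        search(i + 1, assignment)
        del assignment[MODULES[i]]

search(0, {})
print(best["assign"], round(best["p"], 4))
```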


Patent
10 Jan 1997
TL;DR: In this article, an object-oriented real-time transaction processing system and a method for improving the control of data flow and the regulation of data transfer among multicomputers, including a control and scheduling device for looking ahead and predicting the execution path for processing and communications.
Abstract: An object-oriented real-time transaction processing system and a method for improving the control of data flow and the regulation of data transfer among multicomputers, including a control and scheduling device for looking ahead and predicting the execution path for processing and communications. Said system is adaptive to allow changes of the run-time environment, including bandwidth for internal processing and/or storage, as well as external communications/transmission. Said system can be self-directive for allowing autonomous operation of knowledge/information processing.

74 citations


Patent
05 Dec 1997
TL;DR: In this article, a self-directed transaction processing system for controlling data flow and regulating data transfer among multiple computing devices, wherein the computing devices comprise a predictor means for predicting the forthcoming subject of interest for a particular user/application according to the selection, correlation, and interpretation of the past history of individual work flow and/or data entry of said user or application, the system further allowing said computing devices to proceed with searching and retrieving relevant information from a remote or local database/transaction information source.
Abstract: A self-directed transaction processing system for controlling data flow and regulating data transfer among multiple computing devices, wherein said computing devices comprise a predictor means for predicting the forthcoming subject of interest for a particular user/application according to the selection, correlation, and/or interpretation of the past history of individual work flow and/or data entry of said user/application; said system further allows said computing devices to proceed with searching and retrieving relevant information from a remote or local database/transaction information source and forwarding it to said user/application.

Patent
01 Dec 1997
TL;DR: In this paper, the authors present an architecture that involves an embedded Digital Signal Processor (DSP), a DSP interface and memory architecture, a micro-controller interface, DSP operating system (OS), a data flow model, and an interface for hardware blocks.
Abstract: The present invention comprises an architecture that involves an embedded Digital Signal Processor (DSP), a DSP interface and memory architecture, a micro-controller interface, a DSP operating system (OS), a data flow model, and an interface for hardware blocks. The design allows software to control much of the configuration of the architecture while using hardware to provide efficient data flow, signal processing, and memory access. In devices with embedded DSPs, memory access is often the bottleneck and is tightly coupled to the efficiency of the design. The platform architecture involves a method that allows the sharing of the DSP memory with other custom hardware blocks or the micro-controller. The DSP can operate at full millions-of-instructions-per-second (MIPS) while another function is transferring data to and from memory. This allows for an efficient use of the memory and for a partitioning of the DSP tasks between software and hardware.

Patent
31 Jan 1997
TL;DR: In this paper, bandwidth requirements for each pre-compressed information frame are sent to a concentrator so that the pre-compressed data and other information are efficiently concentrated without cropping any information.
Abstract: In a predictive manner, bandwidth requirements for each pre-compressed information frame are sent to a concentrator so that the pre-compressed information and other information is efficiently concentrated without cropping any information. Specifically, the bandwidth requirement for each one of multiple data frames is obtained and stored with the information data frames. The multiple data frames are concentrated with additional data into one data stream, and concentration is controlled with the aggregate bandwidth requirement of the bandwidth requirements for each one of the multiple data frames. A processor with an information data input, a processed data output and a rate data output for each frame of the processed data output is coupled to a storage device. A concentrator operatively connects to the storage device. The concentrator is capable of receiving at least a first and second frame of information data and has a concentrated output. A controller operatively connects to the storage device and is capable of receiving at least one bandwidth requirement. The controller has a data flow control output connected to the concentrator.

Journal ArticleDOI
TL;DR: Decisions on the profiles of dynamic constructs within a macro actor, such as a conditional and a data-dependent iteration, are shown to be optimal under some bold assumptions, and expected to be near-optimal in most cases.
Abstract: Scheduling dataflow graphs onto processors consists of assigning actors to processors, ordering their execution within the processors, and specifying their firing time. While all scheduling decisions can be made at runtime, the overhead is excessive for most real systems. To reduce this overhead, compile-time decisions can be made for assigning and/or ordering actors on processors. Compile-time decisions are based on known profiles available for each actor at compile time. The profile of an actor is the information necessary for scheduling, such as the execution time and the communication patterns. However, a dynamic construct within a macro actor, such as a conditional and a data-dependent iteration, makes the profile of the actor unpredictable at compile time. For those constructs, we propose to assume some profile at compile-time and define a cost to be minimized when deciding on the profile under the assumption that the runtime statistics are available at compile-time. Our decisions on the profiles of dynamic constructs are shown to be optimal under some bold assumptions, and expected to be near-optimal in most cases. The proposed scheduling technique has been implemented as one of the rapid prototyping facilities in Ptolemy. This paper presents the preliminary results on the performance with synthetic examples.

Patent
Ray Hsu
06 Jun 1997
TL;DR: In this article, a method for detecting differences between two graphical programs is disclosed, where objects of the two programs are heuristically matched together using a scoring approach, and the scores are stored in a match matrix and indicate a degree of similarity between an object in the first graphical program and an object of the second graphical program according to one or more criteria.
Abstract: A method for detecting differences between two graphical programs is disclosed. The graphical programs include objects, preferably arranged as a user interface panel, including controls and indicators, and a block diagram, including graphical code function blocks connected together as a data flow program. Directed graph data structures are created to represent the graphical programs, wherein the vertices of the graphs are the objects of the graphical programs and the edges of the graphs are data flow signals of the block diagram and/or hierarchical relationships of the user interface panel objects. The objects of the two graphical programs are heuristically matched together using a scoring approach. The scores are stored in a match matrix and indicate a degree of similarity between an object in the first graphical program and an object in the second graphical program according to one or more criteria. The matching criteria include object type, object connectivity and object attributes. The match matrix is resolved to generate a 1:1 or 1:0 correspondence between the objects in the first and second graphical programs based on the match scores. The matching information is used to determine differences in the two graphical programs. First, using the matching information and a compare engine, the objects are grouped into exact matching subgraphs and then into non-exact matching subgraphs. Non-exact matching subgraphs are matched and merged where possible using transitivity. Objects in the non-exact matching subgraphs are compared using the compare engine to detect additional differences. All detected differences are stored and displayed for the user. The differences may be displayed in various manners such as drawing a circle around the differences, highlighting the differences by color, and/or displaying a textual description of the differences.
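
The scoring-and-resolution step can be sketched compactly. Below, hypothetical object records are scored on the three criteria named in the abstract (type, connectivity, attributes) and the match matrix is resolved greedily to a 1:1/1:0 correspondence; the weights and encodings are invented, not the patent's.

```python
# Toy match matrix between objects of two graphical programs, resolved
# greedily. Scoring weights and object encodings are hypothetical.

def score(a, b):
    s = 0
    if a["type"] == b["type"]:
        s += 3                                           # object type
    s += 2 - min(2, abs(a["degree"] - b["degree"]))      # connectivity
    s += len(set(a["attrs"]) & set(b["attrs"]))          # attributes
    return s

prog1 = [{"id": "A", "type": "add", "degree": 2, "attrs": {"i32"}},
         {"id": "B", "type": "mul", "degree": 3, "attrs": {"f64"}}]
prog2 = [{"id": "X", "type": "mul", "degree": 3, "attrs": {"f64"}},
         {"id": "Y", "type": "add", "degree": 1, "attrs": {"i32"}}]

matrix = {(a["id"], b["id"]): score(a, b) for a in prog1 for b in prog2}

# Greedy resolution: repeatedly take the best remaining pair.
matched, used1, used2 = [], set(), set()
for (i, j), s in sorted(matrix.items(), key=lambda kv: -kv[1]):
    if i not in used1 and j not in used2 and s > 0:
        matched.append((i, j, s))
        used1.add(i); used2.add(j)
print(matched)   # -> [('B', 'X', 6), ('A', 'Y', 5)]
```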

Journal ArticleDOI
TL;DR: An algorithm is given to minimize the system's power; the algorithm finds an optimal schedule, and experimental results for several high-level synthesis benchmarks show a considerable reduction in power consumption.

Journal ArticleDOI
01 Apr 1997
TL;DR: All the tools in the MAD environment follow an extensible and modular debugging strategy based on a graphical user interface that helps the user in monitoring and analyzing message passing programs.
Abstract: Debugging parallel programs can be tedious and difficult. Therefore the programmer needs support from tools that provide features for error detection and performance analysis. The MAD environment is such a toolset. It helps the user in monitoring and analyzing message passing programs. Communication errors and performance bottlenecks are visualized based on an event graph. Source code connection provides a combination between visualized events and the original lines of code or a control and data flow representation. A main part of the environment is dedicated to race conditions. After evaluation of events, which might be reordered during successive program runs, localization of message races can be performed by means of trace-driven simulation. All the tools in the MAD environment follow an extensible and modular debugging strategy based on a graphical user interface.

Proceedings Article
15 Oct 1997
TL;DR: The MAGIK system provides mechanisms that implementors can use to incorporate application semantics into compilation, thereby enabling both optimizations and semantic checking impossible by other means.
Abstract: Programmers have traditionally been passive users of compilers, rather than active exploiters of their transformational abilities. This paper presents MAGIK, a system that allows programmers to easily and modularly incorporate application-specific extensions into the compilation process. The MAGIK system gives programmers two significant capabilities. First, it provides mechanisms that implementors can use to incorporate application semantics into compilation, thereby enabling both optimizations and semantic checking impossible by other means. Second, since extensions are invoked during the translation from source to machine code, code transformations (such as software fault isolation [14]) can be performed with full access to the symbol and data flow information available to the compiler proper, allowing them both to exploit source semantics and to have their transformations (automatically) optimized as any other code.

Proceedings ArticleDOI
21 Jun 1997
TL;DR: This paper documents the experience with building a highly efficient array data flow analyzer which is based on guarded array regions and which runs faster, by one or two orders of magnitude, than other similarly powerful tools.
Abstract: Array data flow analysis is known to be crucial to the success of array privatization, one of the most important techniques for program parallelization. It is clear that array data flow analysis should be performed interprocedurally and symbolically, and that it often needs to handle the predicates represented by IF conditions. Unfortunately, such a powerful program analysis can be extremely time-consuming if not carefully designed. How to enhance the efficiency of this analysis to a practical level remains an issue largely untouched to date. This paper documents our experience with building a highly efficient array data flow analyzer which is based on guarded array regions and which runs faster, by one or two orders of magnitude, than other similarly powerful tools.

Proceedings ArticleDOI
05 Nov 1997
TL;DR: A testability measurement based on the controllability/observability pair of attributes allows detection of weaknesses and appraisal of improvements in terms of testability during the specification stage.
Abstract: The paper focuses on data flow designs. It presents a testability measurement based on the controllability/observability pair of attributes. A case study provided by AEROSPATIALE illustrates the testability analysis of an embedded data flow design. Applying such an analysis during the specification stage allows detection of weaknesses and appraisal of improvements in terms of testability.

Journal Article
TL;DR: This paper presents a flexible and generic software architecture for describing and performing language-independent data flow analysis which allows transparent multi-language analysis.
Abstract: Data flow analysis is a process for collecting run-time information about data in programs without actually executing them. In this paper, we focus on the use of data flow analysis to support program understanding and reverse engineering. Data flow analysis is beneficial for these applications since the information obtained can be used to compute relationships between data objects in programs. These relations play a key role, for example, in the determination of the logical components of a system and their interaction. The general support of program understanding and reverse engineering requires the ability to analyse a variety of source languages and the ability to combine the results of analysing multiple languages. We present a flexible and generic software architecture for describing and performing language-independent data flow analysis which allows such transparent multi-language analysis. All components of this architecture were formally specified.

Proceedings ArticleDOI
R. Nair, G. Ryan, F. Farzaneh
19 Oct 1997
TL;DR: A symbolic simulation-based algorithm to derive optimized Boolean equations for a parameterizable data width CRC generator/checker is described and compared with a conventional loop iteration technique.
Abstract: Describes a symbolic simulation-based algorithm to derive optimized Boolean equations for a parameterizable data width CRC generator/checker. The equations are then used to implement a data flow representation of the CRC circuit in VHDL. The VHDL description is subsequently synthesized to gates. The area and timing results of the hardware implementation are presented and compared with a conventional loop iteration technique (also described in this paper). The CRC-32 polynomial, commonly used for most computer network protocol standards, was chosen to implement the algorithm.
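
The symbolic-simulation idea can be shown in a few lines: run the serial CRC shift register with symbolic bits, where each bit is the XOR of a set of named inputs, so the final state holds the parallel Boolean equations. The sketch below uses CRC-8 over 8 data bits to keep the output small; the same code handles CRC-32 and other widths, but the encoding is mine, not the authors' VHDL flow.

```python
# Symbolic simulation of a serial CRC LFSR. Each register bit is a
# frozenset of symbol names whose XOR it equals; XOR of two bits is the
# symmetric difference of their sets. Polynomial: x^8 + x^2 + x + 1.

POLY_TAPS = [2, 1, 0]    # feedback taps of the degree-8 polynomial
WIDTH, DATA_BITS = 8, 8

def symbolic_crc():
    # Start with the old CRC register contents c0..c7 as symbols.
    state = [frozenset({f"c{i}"}) for i in range(WIDTH)]   # index 0 = LSB
    for k in range(DATA_BITS - 1, -1, -1):                 # MSB first
        fb = state[WIDTH - 1] ^ frozenset({f"d{k}"})       # feedback bit
        state = [frozenset()] + state[:-1]                 # shift left
        for t in POLY_TAPS:
            state[t] = state[t] ^ fb                       # apply taps
    return state

# Each printed line is the Boolean equation for one next-state CRC bit.
for i, eq in enumerate(symbolic_crc()):
    print(f"crc{i}' = " + " ^ ".join(sorted(eq)))
```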

Proceedings ArticleDOI
09 Jun 1997
TL;DR: The author identifies inherent real-time properties of nodes in a PGM data flow graph, and demonstrates how these properties can be exploited to perform useful and important system-level analyses such as schedulability analysis, end-to-end latency analysis, and memory requirements analysis.
Abstract: Real-time signal processing applications are commonly designed using a data flow software architecture. The author attempts to understand fundamental real-time properties of such an architecture: the Navy's coarse-grain processing graph method (PGM). By applying recent results in real-time scheduling theory to the subset of PGM employed by the ARPA RASSP Synthetic Aperture Radar benchmark application, he identifies inherent real-time properties of nodes in a PGM data flow graph, and demonstrates how these properties can be exploited to perform useful and important system-level analyses such as schedulability analysis, end-to-end latency analysis, and memory requirements analysis. More importantly, he develops relationships between properties such as latency and buffer bounds, and shows how one may be traded off for the other. The results assume only the existence of a simple EDF scheduler and thus can be easily applied in practice.
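
The paper builds its analyses on a simple EDF scheduler. As a loose illustration of that style of reasoning (not the paper's actual equations), the sketch below treats each graph node as a periodic task, applies the classic EDF utilization test, and derives crude latency and hyperperiod figures from invented parameters.

```python
# EDF-style analysis sketch for a chain of dataflow nodes. Execution
# times and periods are invented; deadlines are assumed equal to periods.

from math import gcd
from functools import reduce

# (name, execution_time_ms, period_ms) for each graph node
nodes = [("filter", 2.0, 10), ("fft", 6.0, 20), ("detect", 5.0, 40)]

# Classic preemptive-EDF test with deadlines equal to periods:
# schedulable iff U = sum(C_i / T_i) <= 1.
U = sum(c / t for _, c, t in nodes)
print(f"utilization = {U:.3f}, schedulable under EDF: {U <= 1.0}")

# Crude end-to-end latency bound for the chain: each stage finishes
# within its own period when the task set is schedulable.
latency_bound = sum(t for _, _, t in nodes)
print(f"end-to-end latency bound = {latency_bound} ms")

# The hyperperiod bounds how far the schedule must be unrolled when
# checking buffer occupancy on the edges between nodes.
hyper = reduce(lambda a, b: a * b // gcd(a, b), [t for _, _, t in nodes])
print(f"hyperperiod = {hyper} ms")
```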

Patent
12 Mar 1997
TL;DR: In this article, a method and system for controlling flow of output data between computers sharing an application program is presented, where each computer has a sharing system for coordinating the sharing of the application program.
Abstract: A method and system for controlling flow of output data between computers sharing an application program. The application program is executed on a host computer and shared with shadow computers. Each computer has a sharing system for coordinating the sharing of the application program. The sharing system of the host computer requests a flow control system of the host computer for permission to transmit output data. The flow control system of the host computer, upon receiving the request for permission, determines whether the amount of output data currently in transit from the host computer to the shadow computers exceeds the amount that can be in transit. When the amount is not exceeded, the flow control system grants permission to the sharing system of the host computer; and when the amount is exceeded, the flow control system denies permission to the sharing system of the host computer. Periodically, the flow control system calculates a shadow display time that represents time needed to transmit a certain amount of output data to the shadow computers and to process the certain amount of output data at the shadow computers. The flow control system also adjusts the amount of data that can be in transit when the calculated shadow display time is not acceptable so that the host computer and shadow computers can be displaying output data at approximately the same time. The sharing system transmits the output data to the shadow computers when permission is granted.

Proceedings ArticleDOI
01 Dec 1997
TL;DR: Data flow algorithms are developed for partial dead code elimination and partial redundancy elimination in which opportunities for PRE and PDE enabled by hoisting and sinking are exploited.
Abstract: Instruction schedulers employ code motion as a means of instruction reordering to enable scheduling of instructions at points where the resources required for their execution are available. In addition, driven by the profiling data, schedulers take advantage of predication and speculation for aggressive code motion across conditional branches. Optimization algorithms for partial dead code elimination (PDE) and partial redundancy elimination (PRE) employ code sinking and hoisting to enable optimization. However, unlike instruction scheduling, these optimization algorithms are unaware of resource availability and are incapable of exploiting profiling information, speculation, and predication. In this paper we develop data flow algorithms for performing the above optimizations with the following characteristics: (i) opportunities for PRE and PDE enabled by hoisting and sinking are exploited; (ii) hoisting and sinking of a code statement is driven by availability of functional unit resources; (iii) predication and speculation is incorporated to allow aggressive hoisting and sinking; and (iv) path profile information guides predication and speculation to enable optimization.

Patent
22 Dec 1997
TL;DR: In this paper, the authors propose a method for dynamic reconfiguration of FPGA, in which one or more switching tables consisting of one/more controls and one/ more configuration storages are integrated in the module or connected thereto.
Abstract: The invention relates to a method for dynamic reconfiguration of FPGA, in which one or more switching tables consisting of one or more controls and one or more configuration storages are integrated in the module or connected thereto. Configuration words of a switching table are transferred to a configurable element or to multiple configurable elements of the module which then set a valid configuration. The load logic or the configurable elements of the module or modules can write data in the configuration storage or storages of the switching table or tables. The control of the switching table or tables can recognize individual inputs as instructions and execute them. The control can also recognize and distinguish different events and execute a relevant defined action. Upon reacting to the occurrence of an event or a combination of events, the control moves the position pointer or pointers. Whenever configuration data and not control instructions are concerned, the control sends said configuration data to the configurable element or elements declared in the configuration data.

Book ChapterDOI
TL;DR: The SysLab system model serves as an abstract mathematical model for information systems and their components that is used to formalize the semantics of all used description techniques, such as object diagrams, state automata, sequence charts or data-flow diagrams.
Abstract: In the SysLab project, we develop a software engineering method based on a thorough mathematical foundation. The SysLab system model serves as an abstract mathematical model for information systems and their components. It is used to formalize the semantics of all used description techniques, such as object diagrams, state automata, sequence charts or data-flow diagrams. The system model hence is the key to the semantic integration of the method. Based on the requirements for such a reference model, we define the system model including its different views and their relationships.

Journal ArticleDOI
TL;DR: A new graph-theoretic framework based on a data structure called the synchronization graph is introduced for analyzing and optimizing synchronization overhead in self-timed, iterative dataflow programs in which synchronization overhead can be significant.
Abstract: This paper is concerned with multiprocessor implementations of embedded applications specified as iterative dataflow programs in which synchronization overhead can be significant. We develop techniques to alleviate this overhead by determining a minimal set of processor synchronizations that are essential for correct execution. Our study is based in the context of self-timed execution of iterative dataflow programs. An iterative dataflow program consists of a dataflow representation of the body of a loop that is to be iterated an indefinite number of times; dataflow programming in this form has been studied and applied extensively, particularly in the context of signal processing software. Self-timed execution refers to a combined compile-time/run-time scheduling strategy in which processors synchronize with one another based only on interprocessor communication requirements, and thus, synchronization of processors at the end of each loop iteration does not generally occur. We introduce a new graph-theoretic framework based on a data structure called the synchronization graph for analyzing and optimizing synchronization overhead in self-timed, iterative dataflow programs. We show that the comprehensive techniques that have been developed for removing redundant synchronizations in noniterative programs can be extended in this framework to optimally remove redundant synchronizations in our context. We also present an optimization that converts a feedforward dataflow graph into a strongly connected graph in such a way as to reduce synchronization overhead without slowing down execution.
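
The redundancy notion at the heart of the synchronization graph can be sketched with plain reachability: a synchronization edge is redundant when another directed path already enforces the same ordering. The fragment below does exactly that on an invented graph; unlike the paper's framework, it ignores token delays and does not re-check as edges are removed.

```python
# Redundant-synchronization detection sketch: edge (u, v) is redundant if
# some other directed path from u to v exists. Delays are ignored here;
# the paper's framework accounts for them and for iterative removal.

sync_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]

def reachable(src, dst, edges):
    frontier, seen = [src], set()
    while frontier:
        n = frontier.pop()
        if n == dst:
            return True
        if n in seen:
            continue
        seen.add(n)
        frontier.extend(v for u, v in edges if u == n)
    return False

essential = []
for e in sync_edges:
    others = [x for x in sync_edges if x != e]
    if reachable(e[0], e[1], others):
        print(f"redundant: {e}")     # ordering already implied
    else:
        essential.append(e)
print("essential:", essential)       # -> [('A','B'), ('B','C'), ('C','D')]
```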

Proceedings ArticleDOI
16 Apr 1997
TL;DR: Methods of implementation and performance for several common operations using the wormhole RTR paradigm are outlined, serving as indicators of the diversity of algorithms that can be instantiated through the high-speed run-time reconfiguration that these devices make possible.
Abstract: The wormhole run-time reconfiguration (RTR) computing paradigm is a method for creating high performance computational pipelines. The scalability, distributed control and data flow features of the paradigm allow it to fit neatly into the configurable computing machine (CCM) domain. To date, the field has been dominated by large bit-oriented devices whose flexibility can lead to lowered silicon utilization efficiencies. In an effort to raise this efficiency, the Colt CCM has been created based on the wormhole RTR paradigm. This paper outlines methods of implementation and performance for several common operations using these concepts. They serve as indicators of the diversity of algorithms that can be instantiated through the high-speed run-time reconfiguration that these devices make possible. Particular attention is paid to floating point multiplication. Also discussed is the topic of data dependent computation which would seem to be counter intuitive to the wormhole RTR paradigm. The paper concludes with a summary of performance of the three computations.
