
Showing papers on "Data flow diagram published in 1997"


Proceedings ArticleDOI
01 May 1997
TL;DR: This paper extends previous work on efficient path profiling to flow sensitive profiling, which associates hardware performance metrics with a path through a procedure, and describes a data structure, the calling context tree, that efficiently captures calling contexts for procedure-level measurements.
Abstract: A program profile attributes run-time costs to portions of a program's execution. Most profiling systems suffer from two major deficiencies: first, they only apportion simple metrics, such as execution frequency or elapsed time, to static, syntactic units, such as procedures or statements; second, they aggressively reduce the volume of information collected and reported, although aggregation can hide striking differences in program behavior. This paper addresses both concerns by exploiting the hardware counters available in most modern processors and by incorporating two concepts from data flow analysis--flow and context sensitivity--to report more context for measurements. This paper extends our previous work on efficient path profiling to flow sensitive profiling, which associates hardware performance metrics with a path through a procedure. In addition, it describes a data structure, the calling context tree, that efficiently captures calling contexts for procedure-level measurements. Our measurements show that the SPEC95 benchmarks execute a small number (3--28) of hot paths that account for 9--98% of their L1 data cache misses. Moreover, these hot paths are concentrated in a few routines, which have complex dynamic behavior.

557 citations
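
The calling context tree admits a compact illustration. This hypothetical Python sketch keeps one node per distinct call path from the root, so a metric accumulates per calling context rather than per procedure; the procedure names and the cost metric are invented, not taken from the paper.

```python
# Hypothetical sketch of a calling context tree (CCT): each distinct call
# path gets its own node, so costs are attributed per context, not per
# procedure. Names and costs are invented for illustration.

class CCTNode:
    def __init__(self, proc):
        self.proc = proc          # procedure name
        self.metric = 0           # cost accumulated in this context
        self.children = {}        # callee name -> CCTNode

class CCT:
    def __init__(self):
        self.root = CCTNode("<root>")
        self.stack = [self.root]  # mirrors the dynamic call stack

    def enter(self, proc):
        top = self.stack[-1]
        self.stack.append(top.children.setdefault(proc, CCTNode(proc)))

    def leave(self, cost):
        self.stack.pop().metric += cost

# 'main' calls 'f' directly and again through 'g'; the two activations of
# 'f' stay separate because their calling contexts differ.
cct = CCT()
cct.enter("main")
cct.enter("f"); cct.leave(10)                                # main -> f
cct.enter("g"); cct.enter("f"); cct.leave(25); cct.leave(5)  # main -> g -> f
cct.leave(1)

def dump(node, depth=0):
    if node.proc != "<root>":
        print("  " * (depth - 1) + f"{node.proc}: {node.metric}")
    for child in node.children.values():
        dump(child, depth + 1)

dump(cct.root)
```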


Proceedings ArticleDOI
01 Dec 1997
TL;DR: The results affirm that significant instruction-level parallelism can be exploited in integer programs (2 to 6 instructions per cycle) and quantify the value of successively doubling the number of distributed elements.
Abstract: Traces are dynamic instruction sequences constructed and cached by hardware. A microarchitecture organized around traces is presented as a means for efficiently executing many instructions per cycle. Trace processors exploit both control flow and data flow hierarchy to overcome complexity and architectural limitations of conventional superscalar processors by (1) distributing execution resources based on trace boundaries and (2) applying control and data prediction at the trace level rather than individual branches or instructions. Three sets of experiments using the SPECInt95 benchmarks are presented. (i) A detailed evaluation of trace processor configurations: the results affirm that significant instruction-level parallelism can be exploited in integer programs (2 to 6 instructions per cycle). We also isolate the impact of distributed resources, and quantify the value of successively doubling the number of distributed elements. (ii) A trace processor with data prediction applied to inter-trace dependences: potential performance improvement with perfect prediction is around 45% for all benchmarks. With realistic prediction, gcc achieves an actual improvement of 10%. (iii) Evaluation of aggressive control flow: some benchmarks benefit from control independence by as much as 10%.

374 citations
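
The trace abstraction can be made concrete with a short sketch. This hypothetical Python fragment cuts a dynamic instruction stream into traces bounded by a maximum length and a maximum branch count, which is roughly what a hardware trace constructor does; the limits and the stream encoding are invented.

```python
# Minimal sketch of trace construction: the dynamic instruction stream is
# chopped into traces bounded by a maximum length and a maximum number of
# branches. The limits below are hypothetical, not the paper's.

MAX_LEN, MAX_BRANCHES = 16, 3

def build_traces(stream):
    """stream: list of (pc, is_branch, taken) tuples."""
    traces, cur, branches = [], [], 0
    for pc, is_branch, taken in stream:
        cur.append(pc)
        if is_branch:
            branches += 1
        if len(cur) == MAX_LEN or branches == MAX_BRANCHES:
            traces.append(tuple(cur))
            cur, branches = [], 0
    if cur:
        traces.append(tuple(cur))
    return traces

# A toy stream: every fourth instruction is a taken branch.
stream = [(pc, pc % 4 == 3, True) for pc in range(40)]
for t in build_traces(stream):
    print(t)
```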


Patent
07 Nov 1997
TL;DR: In this article, a transformation description language (TDL) is proposed for specifying how data is to be manipulated in a data warehousing application. The TDL comprises a source for storing raw data, one or more transformation objects for processing the raw data according to predefined instructions, and a target for storing the processed data.
Abstract: A transformation description language (TDL) for specifying how data is to be manipulated in a data warehousing application. The TDL comprises a source for storing raw data, one or more transformation objects for processing the raw data according to predefined instructions, and a target for storing the processed data. A mapping is used for directing the data flow between the I/O ports corresponding to the source, the plurality of transformation objects, and the target. The mapping specifies the connectivity between the source, transformation, and target objects as well as the order of these connections. There are a number of different transformations which can be performed to manipulate the data. Some such transformations include: an aggregator transformation, an expression transformation, a filter transformation, a lookup transformation, a query transformation, a sequence transformation, a stored procedure transformation, and an update strategy transformation.

291 citations
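
The source-to-target wiring can be reduced to function composition. This hypothetical Python sketch implements three of the transformation types named in the abstract and a fixed mapping between them; the data, port model, and transformation bodies are invented, not the patent's.

```python
# Sketch of the source -> transformations -> target dataflow, with a few
# of the named transformation types rendered as composable generators.

def source():
    yield {"dept": "a", "amount": 10}
    yield {"dept": "a", "amount": 5}
    yield {"dept": "b", "amount": 7}

def filter_transform(rows, predicate):
    return (r for r in rows if predicate(r))

def expression_transform(rows, fn):
    return (fn(dict(r)) for r in rows)

def aggregator_transform(rows, key, field):
    totals = {}
    for r in rows:
        totals[r[key]] = totals.get(r[key], 0) + r[field]
    return ({key: k, field: v} for k, v in totals.items())

def target(rows):
    for r in rows:
        print(r)

# The "mapping": connect the I/O ports in a fixed order.
rows = source()
rows = filter_transform(rows, lambda r: r["amount"] > 4)
rows = expression_transform(rows, lambda r: {**r, "amount": r["amount"] * 2})
rows = aggregator_transform(rows, "dept", "amount")
target(rows)
```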


Patent
14 Mar 1997
TL;DR: In this paper, the authors propose an approach that allows the network to discover the nature of the service for each traffic flow, classify it dynamically, and exercise traffic conditioning by means of such techniques as admission control and scheduling when delivering the traffic downstream to support the service appropriately.
Abstract: Multi-media networks will require that a data flow be given a certain quality-of-service (QOS) for a network connection, but pre-negotiation of this sort is foreign to the current data networking model. Real-time traffic flow in the data network requires distinct limits on the tolerance to delay and the variations in that delay. Interactive voice and video demand that the total delay not exceed the threshold beyond which human interaction is unacceptably impaired. The present invention allows the network to discover the nature of the service for each traffic flow, classify it dynamically, and exercise traffic conditioning by means of such techniques as admission control and scheduling when delivering the traffic downstream to support the service appropriately.

238 citations
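
The discover-classify-condition loop might look like the following sketch: a flow is classified from observed packet statistics, then admission control gates it against the remaining bandwidth. All classes, thresholds, and numbers are invented for illustration.

```python
# Hedged sketch of flow classification plus admission control. The service
# classes and heuristics below are hypothetical, not the patent's.

CLASSES = {
    "voice": {"max_delay_ms": 150, "bandwidth": 0.064},
    "video": {"max_delay_ms": 250, "bandwidth": 2.0},
    "bulk":  {"max_delay_ms": None, "bandwidth": 0.0},  # best effort
}

def classify(avg_pkt_size, inter_arrival_ms):
    # Small, regularly spaced packets look like voice; large regular
    # packets look like video; everything else is bulk data.
    if avg_pkt_size < 200 and inter_arrival_ms < 30:
        return "voice"
    if avg_pkt_size >= 1000 and inter_arrival_ms < 40:
        return "video"
    return "bulk"

def admit(flow_class, available_mbps):
    return CLASSES[flow_class]["bandwidth"] <= available_mbps

available = 2.5
for size, gap in [(160, 20), (1400, 33), (900, 500)]:
    cls = classify(size, gap)
    ok = admit(cls, available)
    if ok:
        available -= CLASSES[cls]["bandwidth"]
    print(f"{cls}: admitted={ok}, remaining={available:.3f} Mb/s")
```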


Proceedings ArticleDOI
01 Oct 1997
TL;DR: The paper shows how the popular data flow approach to visualization can be extended to allow multiple users to collaborate, each running their own visualization pipeline but with the opportunity to connect in data generated by a colleague.
Abstract: Current visualization systems are designed around a single-user model, making it awkward for large research teams to collectively analyse large data sets. The paper shows how the popular data flow approach to visualization can be extended to allow multiple users to collaborate, each running their own visualization pipeline but with the opportunity to connect in data generated by a colleague. Thus collaborative visualizations are 'programmed' in exactly the same 'plug-and-play' style as is now customary for single-user mode. The paper describes a system architecture that can act as a basis for the collaborative extension of any data flow visualization system, and the ideas are demonstrated through a particular implementation in terms of IRIS Explorer.

144 citations
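
The 'plug-and-play' extension can be sketched in a few lines. Here, hypothetical pipeline modules connect within one user's pipeline, and a shared queue stands in for the link that feeds one user's intermediate data into a colleague's pipeline; IRIS Explorer's actual module API is not modeled.

```python
# Toy collaborative dataflow pipeline: user A exports an intermediate
# result that user B imports into their own pipeline. All module names
# and the transport (a local queue) are invented.

from queue import Queue

class Module:
    def __init__(self, fn):
        self.fn = fn
        self.downstream = []

    def connect(self, other):
        self.downstream.append(other)
        return other

    def fire(self, data):
        out = self.fn(data)
        for m in self.downstream:
            m.fire(out)

shared = Queue()  # stands in for the network link between two users

# User A's pipeline: read -> isosurface -> export to colleague.
read = Module(lambda d: d)
iso = Module(lambda d: f"isosurface({d})")
export = Module(lambda d: (shared.put(d), d)[1])
read.connect(iso).connect(export)

# User B imports A's intermediate result into their own render module.
render = Module(lambda d: print("B renders:", d))
read.fire("volume.dat")
render.fire(shared.get())
```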


Proceedings Article
14 Aug 1997
TL;DR: MineSet supports the knowledge discovery process from data access and preparation through iterative analysis and visualization to deployment, and third party vendors can interface to the MineSet tools for model deployment and for integration with other packages.
Abstract: MineSet™, Silicon Graphics' interactive system for data mining, integrates three powerful technologies: database access, analytical data mining, and data visualization. It supports the knowledge discovery process from data access and preparation through iterative analysis and visualization to deployment. MineSet is based on a client-server architecture that scales to large databases. The database access component provides a rich set of operators that can be used to preprocess and transform the stored data into forms appropriate for visualization and analytical mining. The 3D visualization capabilities allow direct data visualization for exploratory analysis, including tools for displaying high-dimensional data containing geographical and hierarchical information. The analytical mining algorithms help identify potentially interesting models of the data, which can be viewed using visualization tools specialized for the learned models. Third party vendors can interface to the MineSet tools for model deployment and for integration with other packages.

135 citations


Journal ArticleDOI
TL;DR: This work applies program slicing, a program decomposition method, to the problem of extracting reusable functions from ill-structured programs, and extends the definition of a program slice to a transform slice, one that includes statements which contribute directly or indirectly to transform a set of input variables into a set of output variables.
Abstract: An alternative approach to developing reusable components from scratch is to recover them from existing systems. We apply program slicing, a program decomposition method, to the problem of extracting reusable functions from ill structured programs. As with conventional slicing first described by M. Weiser (1984), a slice is obtained by iteratively solving data flow equations based on a program flow graph. We extend the definition of program slice to a transform slice, one that includes statements which contribute directly or indirectly to transform a set of input variables into a set of output variables. Unlike conventional program slicing, these statements do not include either the statements necessary to get input data or the statements which test the binding conditions of the function. Transform slicing presupposes the knowledge that a function is performed in the code and its partial specification, only in terms of input and output data. Using domain knowledge we discuss how to formulate expectations of the functions implemented in the code. In addition to the input/output parameters of the function, the slicing criterion depends on an initial statement, which is difficult to obtain for large programs. Using the notions of decomposition slice and concept validation we show how to produce a set of candidate functions, which are independent of line numbers but must be evaluated with respect to the expected behavior. Although human interaction is required, the limited size of candidate functions makes this task easier than looking for the last function instruction in the original source code.

119 citations
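
The flavor of transform slicing can be shown on a toy, flattened program: starting from the output variables, statements are pulled in by iterating to a fixpoint, and input statements are excluded as the abstract describes. The program encoding below is invented and ignores control flow, so it is a sketch of the idea, not the authors' algorithm.

```python
# Backward slicing sketch: keep any statement whose defined variable is
# needed by an already-included statement, iterating to a fixpoint, and
# (mimicking transform slicing) drop input statements.

# (lineno, defined_var, used_vars, is_input)
program = [
    (1, "n",   set(),      True),   # n = read()
    (2, "i",   set(),      False),  # i = 0
    (3, "s",   set(),      False),  # s = 0
    (4, "s",   {"s", "i"}, False),  # s = s + i   (loop body, flattened)
    (5, "i",   {"i"},      False),  # i = i + 1
    (6, "avg", {"s", "n"}, False),  # avg = s / n
]

def transform_slice(program, outputs):
    needed, included = set(outputs), set()
    changed = True
    while changed:                       # fixpoint over the equations
        changed = False
        for lineno, d, uses, is_input in reversed(program):
            if d in needed and not is_input and lineno not in included:
                included.add(lineno)
                needed |= uses
                changed = True
    return sorted(included)

print(transform_slice(program, {"avg"}))   # -> [2, 3, 4, 5, 6]
```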


Journal ArticleDOI
TL;DR: This article presents a general framework for developing demand-driven interprocedural data flow analyzers and reports the experience in evaluating the performance of this approach.
Abstract: The high cost and growing importance of interprocedural data flow analysis have led to an increased interest in demand-driven algorithms. In this article, we present a general framework for developing demand-driven interprocedural data flow analyzers and report our experience in evaluating the performance of this approach. A demand for data flow information is modeled as a set of queries. The framework includes a generic demand-driven algorithm that determines the response to a query by iteratively applying a system of query propagation rules. The propagation rules yield precise responses for the class of distributive finite data flow problems. We also describe a two-phase framework variation to accurately handle nondistributive problems. A performance evaluation of our demand-driven approach is presented for two data flow problems, namely, reaching definitions and copy constant propagation. Our experiments show that demand-driven analysis performs well in practice, reducing both time and space requirements when compared with exhaustive analysis.

96 citations
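
A minimal sketch of query propagation, using reaching definitions as the example problem: rather than computing facts everywhere, a single query is pushed backward through the control flow graph until definitions answer it. The CFG and statement encoding are invented; the paper's interprocedural rules are not modeled.

```python
# Demand-driven flavor of reaching definitions: the query "which defs of
# v reach node n?" propagates backward until it hits definitions of v.

# node -> variable defined there (or None), plus predecessor edges
defs  = {1: "x", 2: "y", 3: "x", 4: None, 5: None}
preds = {1: [], 2: [1], 3: [2], 4: [2, 3], 5: [4]}

def reaching_defs(var, node, seen=None):
    """Return the set of nodes whose definition of `var` reaches `node`."""
    seen = set() if seen is None else seen
    result = set()
    for p in preds[node]:
        if defs[p] == var:
            result.add(p)           # query answered along this path
        elif p not in seen:
            seen.add(p)             # keep propagating the query upward
            result |= reaching_defs(var, p, seen)
    return result

print(reaching_defs("x", 5))   # -> {1, 3}
```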


Journal ArticleDOI
TL;DR: This paper addresses the problem of allocating (assigning and scheduling) periodic task modules to processing nodes in distributed real-time systems subject to task precedence and timing constraints using the branch-and-bound technique to find an "optimal" allocation.
Abstract: This paper addresses the problem of allocating (assigning and scheduling) periodic task modules to processing nodes in distributed real-time systems subject to task precedence and timing constraints. Using the branch-and-bound technique, a module allocation scheme is proposed to find an "optimal" allocation that maximizes the probability of meeting task deadlines. The task system within a planning cycle is first modeled with a task flow graph which describes computation and communication modules, as well as the precedence constraints among them. To incorporate both timing and logical correctness into module allocation, the probability of meeting task deadlines is used as the objective function. The module allocation scheme is then applied to find an optimal allocation of task modules in a distributed system. The timing aspects embedded in the objective function drive the scheme not only to assign task modules to processing nodes, but also to use a module scheduling algorithm (with polynomial time complexity) for scheduling all modules assigned to each node, so that all tasks may be completed in time. In order to speed up the branch-and-bound process and to reduce the computational complexity, a dominance relation is derived. Several numerical examples are presented to demonstrate the effectiveness and practicality of the proposed scheme.

91 citations
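
The branch-and-bound structure is easy to sketch. The fragment below assigns modules depth-first, using a toy deadline-meeting probability that only decreases as modules are added, so a partial assignment's value is a valid optimistic bound; the objective function and problem sizes are invented, not the paper's.

```python
# Branch-and-bound sketch for module allocation: maximize a toy
# probability of meeting deadlines, pruning with an optimistic bound.

MODULES = ["m1", "m2", "m3"]
NODES = ["n1", "n2"]

def p_meet(assignment):
    """Toy objective: probability decays as modules pile onto one node."""
    load = {n: 0 for n in NODES}
    for m, n in assignment.items():
        load[n] += 1
    p = 1.0
    for l in load.values():
        p *= 0.95 ** (l * l)        # quadratic penalty for imbalance
    return p

best = {"p": 0.0, "assign": None}

def search(i, assignment):
    if i == len(MODULES):
        p = p_meet(assignment)
        if p > best["p"]:
            best.update(p=p, assign=dict(assignment))
        return
    # p_meet never increases as modules are added, so a partial value
    # is an optimistic bound: prune when it cannot beat the incumbent.
    if p_meet(assignment) <= best["p"]:
        return
    for n in NODES:
        assignment[MODULES[i]] = n
        search(i + 1, assignment)
        del assignment[MODULES[i]]

search(0, {})
print(best["assign"], round(best["p"], 4))
```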


Patent
10 Jan 1997
TL;DR: In this article, an object-oriented real-time transaction processing system and a method for improving the control of data flow and the regulation of data transfer among multicomputers, including a control and scheduling device for looking ahead and predicting the execution path for processing and communications.
Abstract: An object-oriented real-time transaction processing system and a method for improving the control of data flow and the regulation of data transfer among multicomputers, including a control and scheduling device for looking ahead and predicting the execution path for processing and communications. Said system is adaptive to allow changes of the run-time environment, including bandwidth for internal processing and/or storage, as well as external communications/transmission. Said system can be self-directive for allowing autonomous operation of knowledge/information processing.

74 citations


Patent
05 Dec 1997
TL;DR: In this article, a self-directed transaction processing system for controlling data flow and regulating data transfer among multiple computing devices, wherein the computing devices comprise a predictor means for predicting the forthcoming subject of interest for a particular user/application according to the selection, correlation, and interpretation of the past history of individual work flow and/or data entry of said user or application, the system further allowing said computing devices to proceed with searching and retrieving relevant information from a remote or local database/transaction information source.
Abstract: A self-directed transaction processing system for controlling data flow and regulating data transfer among multiple computing devices, wherein said computing devices comprise a predictor means for predicting the forthcoming subject of interest for a particular user/application according to the selection, correlation, and/or interpretation of the past history of individual work flow and/or data entry of said user/application; said system further allows said computing devices to proceed with searching and retrieving relevant information from a remote or local database/transaction information source and forwarding it to said user/application.

Patent
01 Dec 1997
TL;DR: In this paper, the authors present an architecture that involves an embedded Digital Signal Processor (DSP), a DSP interface and memory architecture, a micro-controller interface, DSP operating system (OS), a data flow model, and an interface for hardware blocks.
Abstract: The present invention comprises an architecture that involves an embedded Digital Signal Processor (DSP), a DSP interface and memory architecture, a micro-controller interface, a DSP operating system (OS), a data flow model, and an interface for hardware blocks. The design allows software to control much of the configuration of the architecture while using hardware to provide efficient data flow, signal processing, and memory access. In devices with embedded DSPs, memory access is often the bottleneck and is tightly coupled to the efficiency of the design. The platform architecture involves a method that allows the sharing of the DSP memory with other custom hardware blocks or the micro-controller. The DSP can operate at full millions-of-instructions-per-second (MIPS) while another function is transferring data to and from memory. This allows for an efficient use of the memory and for a partitioning of the DSP tasks between software and hardware.

Patent
31 Jan 1997
TL;DR: In this paper, bandwidth requirements for each pre-compressed information frame are sent to a concentrator so that the pre-compressed data and other information are efficiently concentrated without cropping any information.
Abstract: In a predictive manner, bandwidth requirements for each pre-compressed information frame are sent to a concentrator so that the pre-compressed information and other information is efficiently concentrated without cropping any information. Specifically, the bandwidth requirement for each one of multiple data frames is obtained and stored with the information data frames. The multiple data frames are concentrated with additional data into one data stream, and concentration is controlled with the aggregate bandwidth requirement of the bandwidth requirements for each one of the multiple data frames. A processor with an information data input, a processed data output and a rate data output for each frame of the processed data output is coupled to a storage device. A concentrator operatively connects to the storage device. The concentrator is capable of receiving at least a first and second frame of information data and has a concentrated output. A controller operatively connects to the storage device and is capable of receiving at least one bandwidth requirement. The controller has a data flow control output connected to the concentrator.

Journal ArticleDOI
TL;DR: Decisions on the profiles of dynamic constructs within a macro actor, such as a conditional and a data-dependent iteration, are shown to be optimal under some bold assumptions, and expected to be near-optimal in most cases.
Abstract: Scheduling dataflow graphs onto processors consists of assigning actors to processors, ordering their execution within the processors, and specifying their firing time. While all scheduling decisions can be made at runtime, the overhead is excessive for most real systems. To reduce this overhead, compile-time decisions can be made for assigning and/or ordering actors on processors. Compile-time decisions are based on known profiles available for each actor at compile time. The profile of an actor is the information necessary for scheduling, such as the execution time and the communication patterns. However, a dynamic construct within a macro actor, such as a conditional and a data-dependent iteration, makes the profile of the actor unpredictable at compile time. For those constructs, we propose to assume some profile at compile-time and define a cost to be minimized when deciding on the profile under the assumption that the runtime statistics are available at compile-time. Our decisions on the profiles of dynamic constructs are shown to be optimal under some bold assumptions, and expected to be near-optimal in most cases. The proposed scheduling technique has been implemented as one of the rapid prototyping facilities in Ptolemy. This paper presents the preliminary results on the performance with synthetic examples.

Patent
Ray Hsu
06 Jun 1997
TL;DR: In this article, a method for detecting differences between two graphical programs is disclosed, where objects of the two programs are heuristically matched together using a scoring approach, and the scores are stored in a match matrix and indicate a degree of similarity between an object in the first graphical program and an object of the second graphical program according to one or more criteria.
Abstract: A method for detecting differences between two graphical programs is disclosed. The graphical programs include objects, preferably arranged as a user interface panel, including controls and indicators, and a block diagram, including graphical code function blocks connected together as a data flow program. Directed graph data structures are created to represent the graphical programs, wherein the vertices of the graphs are the objects of the graphical programs and the edges of the graphs are data flow signals of the block diagram and/or hierarchical relationships of the user interface panel objects. The objects of the two graphical programs are heuristically matched together using a scoring approach. The scores are stored in a match matrix and indicate a degree of similarity between an object in the first graphical program and an object in the second graphical program according to one or more criteria. The matching criteria include object type, object connectivity and object attributes. The match matrix is resolved to generate a 1:1 or 1:0 correspondence between the objects in the first and second graphical programs based on the match scores. The matching information is used to determine differences in the two graphical programs. First, using the matching information and a compare engine, the objects are grouped into exact matching subgraphs and then into non-exact matching subgraphs. Non-exact matching subgraphs are matched and merged where possible using transitivity. Objects in the non-exact matching subgraphs are compared using the compare engine to detect additional differences. All detected differences are stored and displayed for the user. The differences may be displayed in various manners such as drawing a circle around the differences, highlighting the differences by color, and/or displaying a textual description of the differences.
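
The scoring-and-resolution step can be sketched compactly. Below, hypothetical object records are scored on the three criteria named in the abstract (type, connectivity, attributes) and the match matrix is resolved greedily to a 1:1/1:0 correspondence; the weights and encodings are invented, not the patent's.

```python
# Toy match matrix between objects of two graphical programs, resolved
# greedily. Scoring weights and object encodings are hypothetical.

def score(a, b):
    s = 0
    if a["type"] == b["type"]:
        s += 3                                           # object type
    s += 2 - min(2, abs(a["degree"] - b["degree"]))      # connectivity
    s += len(set(a["attrs"]) & set(b["attrs"]))          # attributes
    return s

prog1 = [{"id": "A", "type": "add", "degree": 2, "attrs": {"i32"}},
         {"id": "B", "type": "mul", "degree": 3, "attrs": {"f64"}}]
prog2 = [{"id": "X", "type": "mul", "degree": 3, "attrs": {"f64"}},
         {"id": "Y", "type": "add", "degree": 1, "attrs": {"i32"}}]

matrix = {(a["id"], b["id"]): score(a, b) for a in prog1 for b in prog2}

# Greedy resolution: repeatedly take the best remaining pair.
matched, used1, used2 = [], set(), set()
for (i, j), s in sorted(matrix.items(), key=lambda kv: -kv[1]):
    if i not in used1 and j not in used2 and s > 0:
        matched.append((i, j, s))
        used1.add(i); used2.add(j)
print(matched)   # -> [('B', 'X', 6), ('A', 'Y', 5)]
```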

Journal ArticleDOI
TL;DR: An algorithm is given to minimize the system's power; the algorithm finds an optimal schedule, and experimental results for several high-level synthesis benchmarks show a considerable reduction in power consumption.

Journal ArticleDOI
01 Apr 1997
TL;DR: All the tools in the MAD environment follow an extensible and modular debugging strategy based on a graphical user interface that helps the user in monitoring and analyzing message passing programs.
Abstract: Debugging parallel programs can be tedious and difficult. Therefore the programmer needs support from tools that provide features for error detection and performance analysis. The MAD environment is such a toolset. It helps the user in monitoring and analyzing message passing programs. Communication errors and performance bottlenecks are visualized based on an event graph. Source code connection provides a combination between visualized events and the original lines of code or a control and data flow representation. A main part of the environment is dedicated to race conditions. After evaluation of events, which might be reordered during successive program runs, localization of message races can be performed by means of trace-driven simulation. All the tools in the MAD environment follow an extensible and modular debugging strategy based on a graphical user interface.

Proceedings Article
15 Oct 1997
TL;DR: The MAGIK system provides mechanisms that implementors can use to incorporate application semantics into compilation, thereby enabling both optimizations and semantic checking impossible by other means.
Abstract: Programmers have traditionally been passive users of compilers, rather than active exploiters of their transformational abilities. This paper presents MAGIK, a system that allows programmers to easily and modularly incorporate application-specific extensions into the compilation process. The MAGIK system gives programmers two significant capabilities. First, it provides mechanisms that implementors can use to incorporate application semantics into compilation, thereby enabling both optimizations and semantic checking impossible by other means. Second, since extensions are invoked during the translation from source to machine code, code transformations (such as software fault isolation [14]) can be performed with full access to the symbol and data flow information available to the compiler proper, allowing them both to exploit source semantics and to have their transformations (automatically) optimized as any other code.

Proceedings ArticleDOI
21 Jun 1997
TL;DR: This paper documents the experience with building a highly efficient array data flow analyzer which is based on guarded array regions and which runs faster, by one or two orders of magnitude, than other similarly powerful tools.
Abstract: Array data flow analysis is known to be crucial to the success of array privatization, one of the most important techniques for program parallelization. It is clear that array data flow analysis should be performed interprocedurally and symbolically, and that it often needs to handle the predicates represented by IF conditions. Unfortunately, such a powerful program analysis can be extremely time-consuming if not carefully designed. How to enhance the efficiency of this analysis to a practical level remains an issue largely untouched to date. This paper documents our experience with building a highly efficient array data flow analyzer which is based on guarded array regions and which runs faster, by one or two orders of magnitude, than other similarly powerful tools.

Proceedings ArticleDOI
05 Nov 1997
TL;DR: A testability measurement based on the controllability/observability pair of attributes allows detection of weaknesses and appraisal of improvements in terms of testability during the specification stage.
Abstract: The paper focuses on data flow designs. It presents a testability measurement based on the controllability/observability pair of attributes. A case study provided by AEROSPATIALE illustrates the testability analysis of an embedded data flow design. Applying such an analysis during the specification stage allows detection of weaknesses and appraisal of improvements in terms of testability.

Journal Article
TL;DR: This paper presents a flexible and generic software architecture for describing and performing language-independent data flow analysis which allows transparent multi-language analysis.
Abstract: Data flow analysis is a process for collecting run-time information about data in programs without actually executing them. In this paper, we focus on the use of data flow analysis to support program understanding and reverse engineering. Data flow analysis is beneficial for these applications since the information obtained can be used to compute relationships between data objects in programs. These relations play a key role, for example, in the determination of the logical components of a system and their interaction. The general support of program understanding and reverse engineering requires the ability to analyse a variety of source languages and the ability to combine the results of analysing multiple languages. We present a flexible and generic software architecture for describing and performing language-independent data flow analysis which allows such transparent multi-language analysis. All components of this architecture were formally specified.

Proceedings ArticleDOI
R. Nair, G. Ryan, F. Farzaneh
19 Oct 1997
TL;DR: A symbolic simulation-based algorithm to derive optimized Boolean equations for a parameterizable data width CRC generator/checker is described and compared with a conventional loop iteration technique.
Abstract: Describes a symbolic simulation-based algorithm to derive optimized Boolean equations for a parameterizable data width CRC generator/checker. The equations are then used to implement a data flow representation of the CRC circuit in VHDL. The VHDL description is subsequently synthesized to gates. The area and timing results of the hardware implementation are presented and compared with a conventional loop iteration technique (also described in this paper). The CRC-32 polynomial, commonly used for most computer network protocol standards, was chosen to implement the algorithm.
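
The symbolic-simulation idea can be shown in a few lines: run the serial CRC shift register with symbolic bits, where each bit is the XOR of a set of named inputs, so the final state holds the parallel Boolean equations. The sketch below uses CRC-8 over 8 data bits to keep the output small; the same code handles CRC-32 and other widths, but the encoding is mine, not the authors' VHDL flow.

```python
# Symbolic simulation of a serial CRC LFSR. Each register bit is a
# frozenset of symbol names whose XOR it equals; XOR of two bits is the
# symmetric difference of their sets. Polynomial: x^8 + x^2 + x + 1.

POLY_TAPS = [2, 1, 0]    # feedback taps of the degree-8 polynomial
WIDTH, DATA_BITS = 8, 8

def symbolic_crc():
    # Start with the old CRC register contents c0..c7 as symbols.
    state = [frozenset({f"c{i}"}) for i in range(WIDTH)]   # index 0 = LSB
    for k in range(DATA_BITS - 1, -1, -1):                 # MSB first
        fb = state[WIDTH - 1] ^ frozenset({f"d{k}"})       # feedback bit
        state = [frozenset()] + state[:-1]                 # shift left
        for t in POLY_TAPS:
            state[t] = state[t] ^ fb                       # apply taps
    return state

# Each printed line is the Boolean equation for one next-state CRC bit.
for i, eq in enumerate(symbolic_crc()):
    print(f"crc{i}' = " + " ^ ".join(sorted(eq)))
```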

Proceedings ArticleDOI
09 Jun 1997
TL;DR: The author identifies inherent real-time properties of nodes in a PGM data flow graph, and demonstrates how these properties can be exploited to perform useful and important system-level analyses such as schedulability analysis, end-to-end latency analysis, and memory requirements analysis.
Abstract: Real-time signal processing applications are commonly designed using a data flow software architecture. The author attempts to understand fundamental real-time properties of such an architecture: the Navy's coarse-grain processing graph method (PGM). By applying recent results in real-time scheduling theory to the subset of PGM employed by the ARPA RASSP Synthetic Aperture Radar benchmark application, he identifies inherent real-time properties of nodes in a PGM data flow graph, and demonstrates how these properties can be exploited to perform useful and important system-level analyses such as schedulability analysis, end-to-end latency analysis, and memory requirements analysis. More importantly, he develops relationships between properties such as latency and buffer bounds, and shows how one may be traded off for the other. The results assume only the existence of a simple EDF scheduler and thus can be easily applied in practice.
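
The paper builds its analyses on a simple EDF scheduler. As a loose illustration of that style of reasoning (not the paper's actual equations), the sketch below treats each graph node as a periodic task, applies the classic EDF utilization test, and derives crude latency and hyperperiod figures from invented parameters.

```python
# EDF-style analysis sketch for a chain of dataflow nodes. Execution
# times and periods are invented; deadlines are assumed equal to periods.

from math import gcd
from functools import reduce

# (name, execution_time_ms, period_ms) for each graph node
nodes = [("filter", 2.0, 10), ("fft", 6.0, 20), ("detect", 5.0, 40)]

# Classic preemptive-EDF test with deadlines equal to periods:
# schedulable iff U = sum(C_i / T_i) <= 1.
U = sum(c / t for _, c, t in nodes)
print(f"utilization = {U:.3f}, schedulable under EDF: {U <= 1.0}")

# Crude end-to-end latency bound for the chain: each stage finishes
# within its own period when the task set is schedulable.
latency_bound = sum(t for _, _, t in nodes)
print(f"end-to-end latency bound = {latency_bound} ms")

# The hyperperiod bounds how far the schedule must be unrolled when
# checking buffer occupancy on the edges between nodes.
hyper = reduce(lambda a, b: a * b // gcd(a, b), [t for _, _, t in nodes])
print(f"hyperperiod = {hyper} ms")
```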

Patent
12 Mar 1997
TL;DR: In this article, a method and system for controlling flow of output data between computers sharing an application program is presented, where each computer has a sharing system for coordinating the sharing of the application program.
Abstract: A method and system for controlling flow of output data between computers sharing an application program. The application program is executed on a host computer and shared with shadow computers. Each computer has a sharing system for coordinating the sharing of the application program. The sharing system of the host computer requests a flow control system of the host computer for permission to transmit output data. The flow control system of the host computer, upon receiving the request for permission, determines whether the amount of output data currently in transit from the host computer to the shadow computers exceeds the amount that can be in transit. When the amount is not exceeded, the flow control system grants permission to the sharing system of the host computer; and when the amount is exceeded, the flow control system denies permission to the sharing system of the host computer. Periodically, the flow control system calculates a shadow display time that represents time needed to transmit a certain amount of output data to the shadow computers and to process the certain amount of output data at the shadow computers. The flow control system also adjusts the amount of data that can be in transit when the calculated shadow display time is not acceptable so that the host computer and shadow computers can be displaying output data at approximately the same time. The sharing system transmits the output data to the shadow computers when permission is granted.

Proceedings ArticleDOI
01 Dec 1997
TL;DR: Data flow algorithms are developed for partial dead code elimination and partial redundancy elimination in which opportunities for PRE and PDE enabled by hoisting and sinking are exploited.
Abstract: Instruction schedulers employ code motion as a means of instruction reordering to enable scheduling of instructions at points where the resources required for their execution are available. In addition, driven by the profiling data, schedulers take advantage of predication and speculation for aggressive code motion across conditional branches. Optimization algorithms for partial dead code elimination (PDE) and partial redundancy elimination (PRE) employ code sinking and hoisting to enable optimization. However, unlike instruction scheduling, these optimization algorithms are unaware of resource availability and are incapable of exploiting profiling information, speculation, and predication. In this paper we develop data flow algorithms for performing the above optimizations with the following characteristics: (i) opportunities for PRE and PDE enabled by hoisting and sinking are exploited; (ii) hoisting and sinking of a code statement is driven by availability of functional unit resources; (iii) predication and speculation is incorporated to allow aggressive hoisting and sinking; and (iv) path profile information guides predication and speculation to enable optimization.

Patent
22 Dec 1997
TL;DR: In this paper, the authors propose a method for dynamic reconfiguration of FPGA, in which one or more switching tables consisting of one/more controls and one/ more configuration storages are integrated in the module or connected thereto.
Abstract: The invention relates to a method for dynamic reconfiguration of FPGA, in which one or more switching tables consisting of one or more controls and one or more configuration storages are integrated in the module or connected thereto. Configuration words of a switching table are transferred to a configurable element or to multiple configurable elements of the module which then set a valid configuration. The load logic or the configurable elements of the module or modules can write data in the configuration storage or storages of the switching table or tables. The control of the switching table or tables can recognize individual inputs as instructions and execute them. The control can also recognize and distinguish different events and execute a relevant defined action. Upon reacting to the occurrence of an event or a combination of events, the control moves the position pointer or pointers. Whenever configuration data and not control instructions are concerned, the control sends said configuration data to the configurable element or elements declared in the configuration data.

Book ChapterDOI
TL;DR: The SysLab system model serves as an abstract mathematical model for information systems and their components that is used to formalize the semantics of all used description techniques, such as object diagrams, state automata, sequence charts or data-flow diagrams.
Abstract: In the SysLab project, we develop a software engineering method based on a thorough mathematical foundation. The SysLab system model serves as an abstract mathematical model for information systems and their components. It is used to formalize the semantics of all used description techniques, such as object diagrams, state automata, sequence charts or data-flow diagrams. The system model hence is the key to the semantic integration of the method. Based on the requirements for such a reference model, we define the system model including its different views and their relationships.

Journal ArticleDOI
TL;DR: A new graph-theoretic framework based on a data structure called the synchronization graph is introduced for analyzing and optimizing synchronization overhead in self-timed, iterative dataflow programs in which synchronization overhead can be significant.
Abstract: This paper is concerned with multiprocessor implementations of embedded applications specified as iterative dataflow programs in which synchronization overhead can be significant. We develop techniques to alleviate this overhead by determining a minimal set of processor synchronizations that are essential for correct execution. Our study is based in the context of self-timed execution of iterative dataflow programs. An iterative dataflow program consists of a dataflow representation of the body of a loop that is to be iterated an indefinite number of times; dataflow programming in this form has been studied and applied extensively, particularly in the context of signal processing software. Self-timed execution refers to a combined compile-time/run-time scheduling strategy in which processors synchronize with one another based only on interprocessor communication requirements, and thus, synchronization of processors at the end of each loop iteration does not generally occur. We introduce a new graph-theoretic framework based on a data structure called the synchronization graph for analyzing and optimizing synchronization overhead in self-timed, iterative dataflow programs. We show that the comprehensive techniques that have been developed for removing redundant synchronizations in noniterative programs can be extended in this framework to optimally remove redundant synchronizations in our context. We also present an optimization that converts a feedforward dataflow graph into a strongly connected graph in such a way as to reduce synchronization overhead without slowing down execution.
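
The redundancy notion at the heart of the synchronization graph can be sketched with plain reachability: a synchronization edge is redundant when another directed path already enforces the same ordering. The fragment below does exactly that on an invented graph; unlike the paper's framework, it ignores token delays and does not re-check as edges are removed.

```python
# Redundant-synchronization detection sketch: edge (u, v) is redundant if
# some other directed path from u to v exists. Delays are ignored here;
# the paper's framework accounts for them and for iterative removal.

sync_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]

def reachable(src, dst, edges):
    frontier, seen = [src], set()
    while frontier:
        n = frontier.pop()
        if n == dst:
            return True
        if n in seen:
            continue
        seen.add(n)
        frontier.extend(v for u, v in edges if u == n)
    return False

essential = []
for e in sync_edges:
    others = [x for x in sync_edges if x != e]
    if reachable(e[0], e[1], others):
        print(f"redundant: {e}")     # ordering already implied
    else:
        essential.append(e)
print("essential:", essential)       # -> [('A','B'), ('B','C'), ('C','D')]
```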

Proceedings ArticleDOI
16 Apr 1997
TL;DR: Methods of implementation and performance for several common operations using the wormhole RTR paradigm are outlined, serving as indicators of the diversity of algorithms that can be instantiated through the high-speed run-time reconfiguration that these devices make possible.
Abstract: The wormhole run-time reconfiguration (RTR) computing paradigm is a method for creating high performance computational pipelines. The scalability, distributed control and data flow features of the paradigm allow it to fit neatly into the configurable computing machine (CCM) domain. To date, the field has been dominated by large bit-oriented devices whose flexibility can lead to lowered silicon utilization efficiencies. In an effort to raise this efficiency, the Colt CCM has been created based on the wormhole RTR paradigm. This paper outlines methods of implementation and performance for several common operations using these concepts. They serve as indicators of the diversity of algorithms that can be instantiated through the high-speed run-time reconfiguration that these devices make possible. Particular attention is paid to floating point multiplication. Also discussed is the topic of data dependent computation which would seem to be counter intuitive to the wormhole RTR paradigm. The paper concludes with a summary of performance of the three computations.
