Author

Stephen Neuendorffer

Bio: Stephen Neuendorffer is an academic researcher from Xilinx. The author has contributed to research in topics: Field-programmable gate array & High-level synthesis. The author has an h-index of 16 and has co-authored 45 publications receiving 1692 citations. Previous affiliations of Stephen Neuendorffer include University of California, Berkeley.


Papers
Journal Article
Jason Cong, Bin Liu, Stephen Neuendorffer, Juanjo Noguera, Kees Vissers, Zhiru Zhang
TL;DR: AutoESL's AutoPilot HLS tool, coupled with domain-specific system-level implementation platforms developed by Xilinx, is used as an example to demonstrate the effectiveness of state-of-art C-to-FPGA synthesis solutions targeting multiple application domains.
Abstract: Escalating system-on-chip design complexity is pushing the design community to raise the level of abstraction beyond register transfer level. Despite the unsuccessful adoptions of early generations of commercial high-level synthesis (HLS) systems, we believe that the tipping point for transitioning to HLS methodology is happening now, especially for field-programmable gate array (FPGA) designs. The latest generation of HLS tools has made significant progress in providing wide language coverage and robust compilation technology, platform-based modeling, advancement in core HLS algorithms, and a domain-specific approach. In this paper, we use AutoESL's AutoPilot HLS tool coupled with domain-specific system-level implementation platforms developed by Xilinx as an example to demonstrate the effectiveness of state-of-art C-to-FPGA synthesis solutions targeting multiple application domains. Complex industrial designs targeting Xilinx FPGAs are also presented as case studies, including comparison of HLS solutions versus optimized manual designs. In particular, the experiment on a sphere decoder shows that the HLS solution can achieve an 11-31% reduction in FPGA resource usage with improved design productivity compared to hand-coded design.

728 citations

Journal Article
TL;DR: It is argued that model-based design and platform-based design are two views of the same thing, and that a platform is equivalently a set of designs.
Abstract: In this paper, we argue that model-based design and platform-based design are two views of the same thing. A platform is an abstraction layer in the design flow. For example, a core-based architecture and an instruction set architecture are platforms. We focus on the set of designs induced by this abstraction layer. For example, the set of all ASICs based on a particular core-based architecture and the set of all x86 programs are induced sets. Hence, a platform is equivalently a set of designs. Model-based design is about using platforms with useful modeling properties to specify designs, and then synthesizing implementations from these specifications. Hence model-based design is the view from above (more abstract, closer to the problem domain) and platform-based design is the view from below (less abstract, closer to the implementation technology). One way to define a platform is to provide a design language. Any valid expression in the language is an element of the set. A platform provides a set of cons...

258 citations

01 Apr 2008
TL;DR: This volume describes how to construct Ptolemy II models for web-based modeling or building applications with a brief description of each of the models of computation that have been implemented.
Abstract: This volume describes how to construct Ptolemy II models for web-based modeling or building applications. The first chapter includes an overview of Ptolemy II software, and a brief description of each of the models of computation that have been implemented. It describes the package structure of the software, and includes as an appendix a brief tutorial on UML notation, which is used throughout the documentation to explain the structure of the software. The second chapter is a tutorial on building models using Vergil, a graphical user interface where models are built pictorially. The third chapter discusses the Ptolemy II expression language, which is used to set parameter values. The next chapter gives an overview of actor libraries. These three chapters, plus one of the domain chapters, will be sufficient for users to start building interesting models in the selected domain. The fifth chapter gives a tutorial on designing actors in Java. The sixth chapter describes the Ptolemy coding style. The seventh chapter explains MoML, the XML schema used by Vergil to store models. The eighth chapter, the final one in this part, explains how to construct custom applets. Volume 2 describes the software architecture of Ptolemy II, and volume 3 describes the domains, each of which implements a model of computation.

150 citations

01 Apr 2008
TL;DR: This volume describes the software architecture of Ptolemy II, which provides a set of Java classes supporting clustered graph topologies for models and provides a mechanism to systematically transform models by means of graph rewriting.
Abstract: This volume describes the software architecture of Ptolemy II. The first chapter covers the kernel package, which provides a set of Java classes supporting clustered graph topologies for models. Clustered graphs provide a very general abstract syntax for component-based modeling, without assuming or imposing any semantics on the models. The actor package begins to add semantics by providing basic infrastructure for data transport between components. The data package provides classes to encapsulate the data that is transported. It also provides an extensible type system and an interpreted expression language. The graph package provides graph-theoretic algorithms that are used in the type system and by schedulers in the individual domains. The model transformation package provides a mechanism to systematically transform models by means of graph rewriting. The plot package provides a visual data plotting utility that is used in many of the applets and applications. The codegen package is a template-based code generator similar to the Ptolemy Classic code generators. The copernicus package is a code generator that performs static analysis on Java class files to produce smaller, faster executable models. Volume 1 gives an introduction to Ptolemy II, including tutorials on the use of the software, and volume 3 describes the domains, each of which implements a model of computation.

88 citations

Journal Article
TL;DR: This paper introduces a design exploration methodology that identifies the lowest cost FPGA pipelined implementation of an untimed synchronous data-flow graph by combined module selection with resource sharing under the context of pipeline scheduling.
Abstract: The primary goal during synthesis of digital signal processing (DSP) circuits is to minimize the hardware area while meeting a minimum throughput constraint. In field-programmable gate array (FPGA) implementations, significant area savings can be achieved by using slower, more area-efficient circuit modules and/or by time-multiplexing faster, larger circuit modules. Unfortunately, manual exploration of this design space is impractical. In this paper, we introduce a design exploration methodology that identifies the lowest cost FPGA pipelined implementation of an untimed synchronous data-flow graph by combined module selection with resource sharing under the context of pipeline scheduling. These techniques are applied together to minimize the area cost of the FPGA implementation while meeting a user-specified minimum throughput constraint. Two different algorithms are introduced for exploring the large design space. We show that even for small DSP algorithms, combining these techniques can offer significant area savings relative to applying any of them alone.

83 citations
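
The area/throughput trade-off described in the abstract above can be illustrated with a toy sketch. This is not either of the paper's two algorithms: it ignores data dependencies, pipeline scheduling, and multiplexer overhead, and simply picks, per operation type, the module whose time-multiplexed instances cost the least area within a fixed cycle budget. The function name, the module library, and all area and cycle figures are hypothetical.

```python
from math import ceil

# Hypothetical workload: executions of each operation per graph iteration.
OPS = {"mul": 8, "add": 8}

# Hypothetical module library: (module_name, area, cycles_per_execution).
LIBRARY = {
    "mul": [("fast_mul", 100, 1), ("slow_mul", 30, 4)],
    "add": [("fast_add", 20, 1), ("slow_add", 12, 2)],
}


def cheapest_implementation(op_counts, module_library, period):
    """Return (total_area, plan) meeting a throughput constraint.

    period is the number of cycles available per graph iteration.
    plan maps each operation type to (module_name, instance_count).
    Simplification: operation types are treated independently.
    """
    total_area = 0
    plan = {}
    for op_type, count in op_counts.items():
        best = None
        for name, area, cycles in module_library[op_type]:
            # Resource sharing: one instance serves this many executions
            # within a single period when time-multiplexed.
            per_instance = period // cycles
            if per_instance == 0:
                continue  # too slow to finish even one execution in time
            instances = ceil(count / per_instance)
            cost = instances * area
            if best is None or cost < best[0]:
                best = (cost, name, instances)
        if best is None:
            raise ValueError(f"no module for {op_type!r} fits period {period}")
        total_area += best[0]
        plan[op_type] = (best[1], best[2])
    return total_area, plan


relaxed = cheapest_implementation(OPS, LIBRARY, period=16)
tight = cheapest_implementation(OPS, LIBRARY, period=4)
print(relaxed)
print(tight)
```

With the relaxed 16-cycle budget the slower, smaller modules win; with the tight 4-cycle budget the faster modules win because fewer instances are needed. A real exploration would also have to account for scheduling and sharing overheads, which this sketch leaves out.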


Cited by
Proceedings Article
05 May 2008
TL;DR: It is concluded that it will not be sufficient to improve design processes, raise the level of abstraction, or verify designs that are built on today's abstractions to realize the full potential of Cyber-Physical Systems.
Abstract: Cyber-Physical Systems (CPS) are integrations of computation and physical processes. Embedded computers and networks monitor and control the physical processes, usually with feedback loops where physical processes affect computations and vice versa. The economic and societal potential of such systems is vastly greater than what has been realized, and major investments are being made worldwide to develop the technology. There are considerable challenges, particularly because the physical components of such systems introduce safety and reliability requirements qualitatively different from those in general-purpose computing. Moreover, physical components are qualitatively different from object-oriented software components. Standard abstractions based on method calls and threads do not work. This paper examines the challenges in designing such systems, and in particular raises the question of whether today's computing and networking technologies provide an adequate foundation for CPS. It concludes that it will not be sufficient to improve design processes, raise the level of abstraction, or verify (formally or otherwise) designs that are built on today's abstractions. To realize the full potential of CPS, we will have to rebuild computing and networking abstractions. These abstractions will have to embrace physical dynamics and computation in a unified way.

3,309 citations

Journal Article
TL;DR: For concurrent programming to become mainstream, threads must be discarded as a programming model, and nondeterminism should be judiciously and carefully introduced where needed, and it should be explicit in programs.
Abstract: For concurrent programming to become mainstream, we must discard threads as a programming model. Nondeterminism should be judiciously and carefully introduced where needed, and it should be explicit in programs. In general-purpose software engineering practice, we have reached a point where one approach to concurrent programming dominates all others, namely threads: sequential processes that share memory. They represent a key concurrency model supported by modern computers, programming languages, and operating systems. In scientific computing, where performance requirements have long demanded concurrent programming, data-parallel language extensions and message-passing libraries such as PVM, MPI, and OpenMP dominate over threads for concurrent programming. Computer architectures intended for scientific computing often differ significantly from so-called general-purpose architectures.

956 citations

Journal Article
Jason Cong, Bin Liu, Stephen Neuendorffer, Juanjo Noguera, Kees Vissers, Zhiru Zhang
TL;DR: AutoESL's AutoPilot HLS tool, coupled with domain-specific system-level implementation platforms developed by Xilinx, is used as an example to demonstrate the effectiveness of state-of-art C-to-FPGA synthesis solutions targeting multiple application domains.
Abstract: Escalating system-on-chip design complexity is pushing the design community to raise the level of abstraction beyond register transfer level. Despite the unsuccessful adoptions of early generations of commercial high-level synthesis (HLS) systems, we believe that the tipping point for transitioning to HLS methodology is happening now, especially for field-programmable gate array (FPGA) designs. The latest generation of HLS tools has made significant progress in providing wide language coverage and robust compilation technology, platform-based modeling, advancement in core HLS algorithms, and a domain-specific approach. In this paper, we use AutoESL's AutoPilot HLS tool coupled with domain-specific system-level implementation platforms developed by Xilinx as an example to demonstrate the effectiveness of state-of-art C-to-FPGA synthesis solutions targeting multiple application domains. Complex industrial designs targeting Xilinx FPGAs are also presented as case studies, including comparison of HLS solutions versus optimized manual designs. In particular, the experiment on a sphere decoder shows that the HLS solution can achieve an 11-31% reduction in FPGA resource usage with improved design productivity compared to hand-coded design.

728 citations