Showing papers by "Xilinx" published in 2009

Open Access

Journal Article•DOI•

OpenDF: a dataflow toolset for reconfigurable hardware and multicore systems

Shuvra S. Bhattacharyya¹, Gordon J. Brebner², Jorn W. Janneck², Johan Eker³, Carl Von Platen³, Marco Mattavelli⁴, Mickaël Raulet⁵•Institutions (5)

University of Maryland, College Park¹, Xilinx², Ericsson³, École Polytechnique Fédérale de Lausanne⁴, Intelligence and National Security Alliance⁵

20 Jun 2009-ACM SIGARCH Computer Architecture News

TL;DR: The authors present the OpenDF framework, recall that dataflow programming was originally invented to address the problem of parallel computing, discuss the problems with imperative-style von Neumann programs, and present what they believe are the advantages of a dataflow programming model.

Abstract: This paper presents the OpenDF framework and recalls that dataflow programming was once invented to address the problem of parallel computing. We discuss the problems with an imperative style, von Neumann programs, and present what we believe are the advantages of using a dataflow programming model. The CAL actor language is briefly presented and its role in the ISO/MPEG standard is discussed. The Dataflow Interchange Format (DIF) and related tools can be used for analysis of actors and networks, demonstrating the advantages of a dataflow approach. Finally, an overview of a case study implementing an MPEG-4 decoder is given.

81 citations

Proceedings Article•DOI•

Exploiting statically schedulable regions in dataflow programs

Ruirui Gu¹, Jorn W. Janneck², Mickaël Raulet³, Shuvra S. Bhattacharyya¹•Institutions (3)

University of Maryland, College Park¹, Xilinx², Centre national de la recherche scientifique³

19 Apr 2009

TL;DR: This paper focuses on detection of SDF-like regions in dynamic dataflow descriptions, in particular in the generalized specification framework of CAL, which is an important step for applying static scheduling techniques within a dynamic dataflow framework.

Abstract: Dataflow descriptions have been used in a wide range of Digital Signal Processing (DSP) applications, such as multi-media processing, and wireless communications. Among various forms of dataflow modeling, Synchronous Dataflow (SDF) is geared towards static scheduling of computational modules, which improves system performance and predictability. However, many DSP applications do not fully conform to the restrictions of SDF modeling. More general dataflow models, such as CAL [1], have been developed to describe dynamically-structured DSP applications. Such generalized models can express dynamically changing functionality, but lose the powerful static scheduling capabilities provided by SDF. This paper focuses on detection of SDF-like regions in dynamic dataflow descriptions — in particular, in the generalized specification framework of CAL. This is an important step for applying static scheduling techniques within a dynamic dataflow framework. Our techniques combine the advantages of different dataflow languages and tools, including CAL [1], DIF [2] and CAL2C [3]. The techniques are demonstrated on the IDCT module of MPEG Reconfigurable Video Coding (RVC).
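The static scheduling that SDF enables rests on the standard SDF balance equations: for every edge, the tokens produced per schedule iteration must equal the tokens consumed. The following is a minimal sketch of that idea only; it is not the region-detection technique of the paper, and the function name `repetition_vector`, the edge tuple layout, and the example rates are assumptions made for illustration.

```python
from fractions import Fraction
from math import lcm  # multi-argument lcm needs Python 3.9+


def repetition_vector(actors, edges):
    """Solve the SDF balance equations q[src] * produced == q[dst] * consumed.

    `edges` is a list of (src, produced, dst, consumed) tuples (hypothetical layout).
    Returns the smallest positive integer firing counts per connected graph,
    or None if the token rates are inconsistent (no static schedule exists).
    """
    neighbours = {a: [] for a in actors}
    for src, produced, dst, consumed in edges:
        neighbours[src].append((dst, Fraction(produced, consumed)))
        neighbours[dst].append((src, Fraction(consumed, produced)))

    rate = {}
    for start in actors:
        if start in rate:
            continue
        rate[start] = Fraction(1)  # arbitrary reference rate for this component
        stack = [start]
        while stack:
            a = stack.pop()
            for b, ratio in neighbours[a]:
                expected = rate[a] * ratio
                if b not in rate:
                    rate[b] = expected
                    stack.append(b)
                elif rate[b] != expected:
                    return None  # balance equations have no solution

    # Scale the rational rates to integers.
    scale = lcm(*(r.denominator for r in rate.values()))
    return {a: int(r * scale) for a, r in rate.items()}


# Illustrative three-actor chain: A produces 2, B consumes 3; B produces 1, C consumes 2.
q = repetition_vector(["A", "B", "C"], [("A", 2, "B", 3), ("B", 1, "C", 2)])
print(q)
```

A region whose actors have fixed token rates like these admits such a solution, which is why identifying SDF-like regions inside a dynamic description makes compile-time scheduling applicable to them.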

43 citations

Patent•

Apparatus and method for testing of stacked die structure

Arifur Rahman¹, Hong-Tsz Pan¹, Bang-Thu Nguyen¹•Institutions (1)

Xilinx¹

17 Jul 2009

TL;DR: An integrated circuit device with a stacked die and a base die is presented in which probe pads (306, 111-116) are coupled directly to test logic (305, 104) to implement a scan chain, such that configuration of the programmable logic (550, 314, 105) is not required.

Abstract: An integrated circuit device includes a stacked die (102) and a base die (101) having probe pads (306, 111-116) that directly couple to test logic (305, 104) of the base die to implement a scan chain for testing of the integrated circuit device. The base die further includes contacts (107) disposed on a back side of the base die and through-die vias (310, 121-128) coupled to the contacts and coupled to programmable logic (550, 314, 105) of the base die. The base die also includes a first probe pad (111) configured to couple test input, a second probe pad (112) configured to couple test output, and a third probe pad (113) configured to couple control signals. Test logic (305) of the base die is configured to couple to additional test logic (405) of the stacked die to implement the scan chain. The probe pads (306, 111-116) are coupled directly to the test logic (305, 104) such that configuration of the programmable logic (550, 314, 105) is not required to implement the scan chain.

43 citations

Proceedings Article•DOI•

Clock power reduction for Virtex-5 FPGAs

Qiang Wang¹, Subodh Gupta¹, Jason H. Anderson²•Institutions (2)

Xilinx¹, University of Toronto²

22 Feb 2009

TL;DR: In this paper, two complementary approaches for clock power reduction in the Xilinx Virtex-5 FPGA are presented, where clock enable signals on flip-flops are selectively migrated to use the dedicated clock enable available on the built-in clock network.

Abstract: Clock network power in field-programmable gate arrays (FPGAs) is considered and two complementary approaches for clock power reduction in the Xilinx Virtex-5 FPGA are presented. The approaches are unique in that they leverage specific architectural aspects of Virtex-5 to achieve reductions in dynamic power consumed by the clock network. The first approach comprises a placement-based technique to reduce interconnect resource usage on the clock network, thereby reducing capacitance and power (up to 12%). The second approach borrows the "clock gating" notion from the ASIC domain and applies it to FPGAs. Clock enable signals on flip-flops are selectively migrated to use the dedicated clock enable available on the FPGA's built-in clock network, leading to reduced toggling on the clock interconnect and lower power (up to 28%). Power reductions are achieved without any performance penalty, on average.

39 citations

Proceedings Article•DOI•

Performance and power of cache-based reconfigurable computing

Andrew Putnam¹, Susan J. Eggers¹, Dave Bennett², Eric F. Dellinger², Jeffrey M. Mason², Henry E. Styles², Prasanna Sundararajan², Ralph D. Wittig²•Institutions (2)

University of Washington¹, Xilinx²

20 Jun 2009

TL;DR: The analyses and optimizations of the CHiMPS compiler that construct many-cache caches are presented, showing a performance advantage of 7.8x over CPU-only execution of the same source code, FPGA power usage that is on average 4.1x less, and consequently performance per watt that is also greater.

Abstract: Many-cache is a memory architecture that efficiently supports caching in commercially available FPGAs. It facilitates FPGA programming for high-performance computing (HPC) developers by providing them with memory performance that is greater and power consumption that is less than their current CPU platforms, but without sacrificing their familiar, C-based programming environment. Many-cache creates multiple, multi-banked caches on top of an FPGA's small, independent memories, each targeting a particular data structure or region of memory in an application and each customized for the memory operations that access it. The caches are automatically generated from C source by the CHiMPS C-to-FPGA compiler. This paper presents the analyses and optimizations of the CHiMPS compiler that construct many-cache caches. An architectural evaluation of CHiMPS-generated FPGAs demonstrates a performance advantage of 7.8x (geometric mean) over CPU-only execution of the same source code, FPGA power usage that is on average 4.1x less, and consequently performance per watt that is also greater, by a geometric mean of 21.3x.

37 citations

Proceedings Article•DOI•

Scalable don't-care-based logic optimization and resynthesis

Alan Mishchenko¹, Robert Brayton¹, Jie-Hong R. Jiang², Stephen Jang³•Institutions (3)

University of California, Berkeley¹, National Taiwan University², Xilinx³

22 Feb 2009

TL;DR: The proposed resynthesis is capable of substantial logic restructuring, is customizable to solve a variety of optimization tasks, and has reasonable runtime on industrial designs.

Abstract: We describe an optimization method for combinational and sequential logic networks, with emphasis on scalability and the scope of optimization. The proposed resynthesis (a) is capable of substantial logic restructuring, (b) is customizable to solve a variety of optimization tasks, and (c) has reasonable runtime on industrial designs. The approach uses don't cares computed for a window surrounding a node and can take into account external don't cares (e.g. unreachable states). It uses a SAT solver and interpolation to find a new representation for a node. This representation can be in terms of inputs from other nodes in the window, thus effecting Boolean re-substitution. Experimental results on 6-input LUT networks after high-effort synthesis show substantial reductions in area and delay. When applied to 20 large academic benchmarks, the LUT count and logic level are reduced by 45.0% and 12.2%, respectively. The longest runtime for synthesis and mapping is about two minutes. When applied to a set of 14 industrial benchmarks ranging up to 83K 6-LUTs, the LUT count and logic level are reduced by 11.8% and 16.5%, respectively. The longest runtime is about 30 minutes.

36 citations

Journal Article•DOI•

Exploring the Concurrency of an MPEG RVC Decoder Based on Dataflow Program Analysis

Ruirui Gu¹, Jorn W. Janneck², Shuvra S. Bhattacharyya¹, Mickaël Raulet, Matthieu Wipliez, William Plishker¹•Institutions (2)

University of Maryland, College Park¹, Xilinx²

01 Nov 2009-IEEE Transactions on Circuits and Systems for Video Technology

TL;DR: An in-depth case study on dataflow-based analysis and exploitation of parallelism in the design and implementation of an MPEG reconfigurable video coding decoder and the systematic exploitation of concurrency in CAL networks that are targeted to multicore platforms is presented.

Abstract: This paper presents an in-depth case study on dataflow-based analysis and exploitation of parallelism in the design and implementation of an MPEG reconfigurable video coding decoder. Dataflow descriptions have been used in a wide range of digital signal processing (DSP) applications, such as applications for multimedia processing and wireless communications. Because dataflow models are effective in exposing concurrency and other important forms of high level application structure, dataflow techniques are promising for implementing complex DSP applications on multicore systems, and other kinds of parallel processing platforms. In this paper, we use the CAL language as a concrete framework for representing and demonstrating dataflow design techniques. Furthermore, we also describe our application of the dataflow interchange format (DIF) package (TDP), a software tool for analyzing dataflow networks, to the systematic exploitation of concurrency in CAL networks that are targeted to multicore platforms. Using TDP, one is able to automatically process regions that are extracted from the original network, and exhibit properties similar to synchronous dataflow (SDF) models. This is important in our context because powerful techniques, based on static scheduling, are available for exploiting concurrency in SDF descriptions. Detection of SDF-like regions is an important step for applying static scheduling techniques within a dynamic dataflow framework. Furthermore, segmenting a system into SDF-like regions also allows us to explore cross-actor concurrency that results from dynamic dependences among different regions. Using SDF-like region detection as a preprocessing step to software synthesis generally provides an efficient way for mapping tasks to multicore systems, and improves the system performance of video processing applications on multicore platforms.

33 citations

Journal Article•DOI•

Guest Editors’ Introduction to Security in Reconfigurable Systems Design

Patrick Schaumont¹, Alex K. Jones², Steve Trimberger³•Institutions (3)

Virginia Tech¹, University of Pittsburgh², Xilinx³

01 Mar 2009-ACM Transactions on Reconfigurable Technology and Systems

TL;DR: This special issue on Security in Reconfigurable Systems Design reports on recent research results in the design and implementation of trustworthy reconfigurable systems.

Abstract: This special issue on Security in Reconfigurable Systems Design reports on recent research results in the design and implementation of trustworthy reconfigurable systems. Five articles cover topics including power-efficient implementation of public-key cryptography, side-channel analysis of electromagnetic radiation, side-channel resistant design, design of robust unclonable functions on an FPGA, and Trojan detection in an FPGA bitstream.

32 citations

Proceedings Article•DOI•

Multiaccess Channels with State Known to One Encoder: Another Case of Degraded Message Sets

Abdellatif Zaidi¹, Shiva Prasad Kotagiri², J. Nicholas Laneman³, Luc Vandendorpe¹•Institutions (3)

Université catholique de Louvain¹, Xilinx², University of Notre Dame³

08 Jun 2009-arXiv: Information Theory

TL;DR: Inner and outer bounds are derived on the capacity region of a two-user state-dependent multiaccess channel in which only one of the encoders is informed, non-causally, of the channel states.

Abstract: We consider a two-user state-dependent multiaccess channel in which only one of the encoders is informed, non-causally, of the channel states. Two independent messages are transmitted: a common message transmitted by both the informed and uninformed encoders, and an individual message transmitted by only the uninformed encoder. We derive inner and outer bounds on the capacity region of this model in the discrete memoryless case as well as the Gaussian case. Further, we show that the bounds for the Gaussian case are tight in some special cases.

29 citations

Journal Article•DOI•

Classes and inheritance in actor-oriented design

Edward A. Lee¹, Xiaojun Liu², Stephen Neuendorffer³•Institutions (3)

University of California, Berkeley¹, Sun Microsystems², Xilinx³

24 Jul 2009-ACM Transactions on Embedded Computing Systems

TL;DR: This article shows a form that such mechanisms can take in actor-oriented components, gives a formal structure, and describes a prototype implementation, which permits a disciplined form of multiple inheritance with unambiguous inheritance and overriding behavior.

Abstract: Actor-oriented components emphasize concurrency and temporal semantics and are used for modeling and designing embedded software and hardware. Actors interact with one another through ports via a messaging schema that can follow any of several concurrent semantics. Domain-specific actor-oriented languages and frameworks are common (Simulink, LabVIEW, SystemC, etc.). However, they lack many modularity and abstraction mechanisms that programmers have become accustomed to in object-oriented components, such as classes, inheritance, interfaces, and polymorphism, except as inherited from the host language. This article shows a form that such mechanisms can take in actor-oriented components, gives a formal structure, and describes a prototype implementation. The mechanisms support actor-oriented class definitions, subclassing, inheritance, and overriding. The formal structure imposes structural constraints on a model (mainly the “derivation invariant”) that lead to a policy to govern inheritance. In particular, the structural constraints permit a disciplined form of multiple inheritance with unambiguous inheritance and overriding behavior. The policy is based formally on a generalized ultrametric space with some remarkable properties. In this space, inheritance is favored when actors are “closer” (in the generalized ultrametric), and we show that when inheritance can occur from multiple sources, one source is always unambiguously closer than the other.

27 citations

Proceedings Article•DOI•

Generic Software Framework for Adaptive Applications on FPGAs

Suhaib A. Fahmy¹, Jorg Lotze¹, Juanjo Noguera², Linda Doyle¹, Robert P. Esser²•Institutions (2)

Trinity College, Dublin¹, Xilinx²

05 Apr 2009

TL;DR: This work presents a system model and software architecture for implementing runtime adaptive applications on FPGAs, separating the control and processing planes and abstracting away the details of hardware reconfiguration from the system designer.

Abstract: Adaptive systems are set to become more mainstream, as numerous practical applications in the communications domain emerge. FPGAs offer an ideal implementation platform, combining high performance with flexibility. While significant research has been undertaken in the area of FPGA partial reconfiguration, it has focussed primarily on low-level architecture-specific implementations. Building upon this previous work, we present a system model and software architecture for implementing runtime adaptive applications on FPGAs, separating the control and processing planes and abstracting away the details of hardware reconfiguration from the system designer. Hardware processing components appear as software components in the runtime system, enabling their inclusion in adaptive applications. We present an adaptive wireless application, demonstrating the use of the model and software architecture.

Proceedings Article•DOI•

Development Framework for Implementing FPGA-Based Cognitive Network Nodes

Jorg Lotze¹, Suhaib A. Fahmy¹, Juanjo Noguera², Baris Ozgul¹, Linda Doyle¹, Robert P. Esser²•Institutions (2)

Trinity College, Dublin¹, Xilinx²

30 Nov 2009

TL;DR: This paper identifies important features a cognitive radio framework should provide, namely a virtual architecture for hardware abstraction, an adaptive run-time system for managing cognition, and high level design tools for cognitive radio development, and proposes a novel FPGA-based framework that provides all these features.

Abstract: This paper identifies important features a cognitive radio framework should provide, namely a virtual architecture for hardware abstraction, an adaptive run-time system for managing cognition, and high level design tools for cognitive radio development. We evaluate a range of existing frameworks with respect to these, and propose a novel FPGA-based framework that provides all these features. By abstracting away the details of hardware reconfiguration, radio designers can implement FPGA-based cognitive nodes much as they would do for a software implementation. We apply the proposed framework to the design and implementation of a receiver node that works in two modes: discovery, where it uses spectrum sensing to find a radio transmission, and communication, in which it receives and demodulates the said transmission. We show how the whole design process does not require any hardware experience on behalf of the radio designer.

Patent•

Method and apparatus for a process, voltage, and temperature variation tolerant semiconductor device

Guo Jun Ren¹, Qi Zhang¹, Ketan Sodha¹•Institutions (1)

Xilinx¹

29 Jan 2009

TL;DR: In this paper, an adaptive feedback mechanism is employed to sense and correct performance degradation, while simultaneously facilitating configurability within integrated circuits (ICs) such as programmable logic devices (PLDs).

Abstract: A method and apparatus to reduce the degradation in performance of semiconductor-based devices due to process, voltage, and temperature (PVT) and/or other causes of variation. Adaptive feedback mechanisms are employed to sense and correct performance degradation, while simultaneously facilitating configurability within integrated circuits (ICs) such as programmable logic devices (PLDs). A voltage-feedback mechanism is employed to detect PVT variation and mirrored current references are adaptively adjusted to track and substantially eliminate the PVT variation. More than one voltage-feedback mechanism may instead be utilized to detect PVT-based variations within a differential device, whereby a first voltage-feedback mechanism is utilized to detect common-mode voltage variation and a second voltage-feedback mechanism produces mirrored reference currents to substantially remove the common-mode voltage variation and facilitate symmetrical operation of the differential device.

Patent•

System and method for using reconfiguration ports for power management in integrated circuits

Juan J. Noguera Serra¹, Tim Tuan¹•Institutions (1)

Xilinx¹

05 Mar 2009

TL;DR: In this paper, a method of operating an integrated circuit having a circuit block configurable by a configuration memory is disclosed, which includes determining whether to operate the circuit block in a normal operation mode or a low power mode.

Abstract: A method of operating an integrated circuit having a circuit block configurable by a configuration memory is disclosed. The method includes determining whether to operate the circuit block in a normal operation mode or a low power mode. The configuration memory is loaded with normal operation mode configuration data for the circuit block if the normal operation mode is determined. If the low power mode is determined, the configuration memory is loaded with low power mode configuration data for the circuit block.

Patent•

Self-checking and self-correcting internal configuration port circuitry

Chen Wei Tseng¹, Weiguang Lu¹, Matthew Pond Baker¹•Institutions (1)

Xilinx¹

03 Apr 2009

TL;DR: A method and apparatus for self-checking and self-correcting memory states of a programmable resource is described, where the programmable resource has a first core and a second core instantiated therein.

Abstract: Method and apparatus for self-checking and self-correcting memory states of a programmable resource is described. Programmable resource of an integrated circuit has a first core and a second core instantiated therein. A first internal configuration port and a second internal configuration port of the integrated circuit are respectively connected to the first core and the second core. The second core is coupled to the first core for monitoring operation of the first core with the second core, and the second core is configured to obtain control responsive to a failure of the first core or the first internal configuration port for a self-correcting mode.

Patent•

Characterizing circuit performance by separating device and interconnect impact on signal delay

Xiao-Jie Yuan¹, Michael J. Hart¹, Gary Ling Zicheng¹, Steven P. Young¹•Institutions (1)

Xilinx¹

19 Jan 2009

TL;DR: In this paper, a model equation is defined for each embedded test circuit, with each model equation specifying the output delay of its associated embedded test circuit as a function of Front End Of the Line (FEOL) and Back End Of the Line (BEOL) parameters.

Abstract: An integrated circuit (IC) includes multiple embedded test circuits that all include a ring oscillator coupled to a test load. The test load either is a direct short in the ring oscillator or else is an interconnect load that is representative of one of the interconnect layers in the IC. A model equation is defined for each embedded test circuit, with each model equation specifying the output delay of its associated embedded test circuit as a function of Front End Of the Line (FEOL) and Back End Of the Line (BEOL) parameters. The model equations are then solved for the various FEOL and BEOL parameters as functions of the test circuit output delays. Finally, measured output delay values are substituted into these parameter equations to generate actual values for the various FEOL and BEOL parameters, thereby allowing any areas of concern to be quickly and accurately identified.
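The idea of writing one delay equation per test circuit and then inverting the system can be illustrated with a deliberately simplified sketch. The linear model below, the coefficient names `k_feol` and `k_beol`, the layer labels, and all numeric values are assumptions made for illustration; the abstract does not state the form of the model equations.

```python
def solve_feol_beol(short_delay, layer_delays, k_feol, k_beol):
    """Invert a hypothetical linear delay model for the test circuits.

    Assumed model (not taken from the patent):
        short_delay          = k_feol * feol                       (direct-short load)
        layer_delays[layer]  = k_feol * feol + k_beol[layer] * beol[layer]

    Returns (feol, beol) where beol maps each interconnect layer to its parameter.
    """
    # The direct-short oscillator isolates the device (FEOL) contribution.
    feol = short_delay / k_feol
    # Subtracting the device contribution leaves each layer's interconnect term.
    beol = {
        layer: (delay - short_delay) / k_beol[layer]
        for layer, delay in layer_delays.items()
    }
    return feol, beol


# Illustrative measured delays (arbitrary units) for one short and two layer loads.
feol, beol = solve_feol_beol(
    short_delay=100.0,
    layer_delays={"M1": 130.0, "M2": 120.0},
    k_feol=50.0,
    k_beol={"M1": 10.0, "M2": 5.0},
)
print(feol, beol)
```

Under this assumed model, a layer whose solved parameter drifts from its expected value can be flagged separately from a device-level shift, which is the separation of device and interconnect impact that the title refers to.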

Patent•

Analog front-end having built-in equalization and applications thereof

William C. Black¹, Charles W. Boecker¹, Eric D. Groen¹•Institutions (1)

Xilinx¹

09 Jan 2009

TL;DR: In this article, an analog front-end having built-in equalization includes a control module and a tunable gain stage, where the control module is operably coupled to provide a frequency response setting based on a channel response of a channel providing high-speed serial data to the analog front end.

Abstract: An analog front-end having built-in equalization includes a control module and a tunable gain stage. The control module is operably coupled to provide a frequency response setting based on a channel response of a channel providing high-speed serial data to the analog front-end. The tunable gain stage includes a frequency dependent load and an amplifier input section. The frequency dependent load is adjusted based on the frequency response setting. The amplifier input section is operably coupled to the frequency dependent load and receives the high-speed serial data. In conjunction with the frequency dependent load, the amplifier input section amplifies and equalizes the high-speed serial data to produce an amplified and equalized serial data.

Patent•

Method of and circuit for reducing distortion in a power amplifier

Stephen Summerfield¹, Christopher H. Dick¹•Institutions (1)

Xilinx¹

13 May 2009

TL;DR: In this article, an integrated circuit having a circuit for reducing distortion in a power amplifier (302) is disclosed, which comprises a predistortion circuit (304, 402), coupled to receive a signal (x(n)) to be amplified; sample capture buffers (306, 406) coupled to an output (z(n)), and an estimator circuit (308, 520, 412, 612).

Abstract: An integrated circuit having a circuit for reducing distortion in a power amplifier (302) is disclosed. The integrated circuit comprises a predistortion circuit (304, 402) coupled to receive a signal (x(n)) to be amplified; sample capture buffers (306, 406) coupled to an output (z(n)) of the predistortion circuit and an input/output port of the integrated circuit; and an estimator circuit (308, 520, 412, 612) coupled to the sample capture buffers, wherein the estimator circuit generates parameters for the predistortion circuit based upon the output of the predistortion circuit and an output of the power amplifier received at the input/output port of the integrated circuit. A method of reducing distortion in a power amplifier is also disclosed.

Journal Article•DOI•

WireMap: FPGA Technology Mapping for Improved Routability and Enhanced LUT Merging

Stephen Jang¹, Billy Chan¹, Kevin Chung¹, Alan Mishchenko²•Institutions (2)

Xilinx¹, University of California, Berkeley²

01 Jun 2009-ACM Transactions on Reconfigurable Technology and Systems

TL;DR: A new technology mapper, WireMap, uses an edge flow heuristic to improve the routability of a mapped design and has an additional advantage of reducing an average number of inputs of LUTs without increasing the total LUT count and depth.

Abstract: This article presents a new technology mapper, WireMap. The mapper uses an edge flow heuristic to improve the routability of a mapped design. The heuristic is applied during the iterative mapping optimization to reduce the total number of pin-to-pin connections (or edges). On academic benchmarks (ISCAS, MCNC, and ITC designs), an average edge reduction of 9.3% is achieved while maintaining depth and LUT count compared to state-of-the-art technology mapping. Placing and routing the resulting netlists leads to an 8.5% reduction in the total wirelength, a 6.0% reduction in minimum channel width, and a 2.3% reduction in critical path delay. This technique is applied in the Xilinx ISE Design tool to evaluate its effect on industrial Virtex-5 circuits. In a set of 20 large designs, we find the edge reduction is 6.8% while total wirelength measured in the placer is reduced by 3.6%. Applying WireMap has an additional advantage of reducing the average number of inputs of LUTs without increasing the total LUT count and depth. The percentages of 5- and 6-LUTs in a typical design are reduced, while the percentages of 2-, 3-, and 4-LUTs are increased. These smaller LUTs can be merged into pairs and implemented using the dual-output LUT structure found in commercial FPGAs. For academic benchmarks, WireMap leads to 9.4% fewer dual-output LUTs after merging. For the industrial designs, WireMap leads to 6.3% fewer dual-output Virtex-5 LUTs.

Patent•

Integrated circuit with MOSFET fuse element

Hsung Jai Im¹, Paak Sunhom¹, Boon Y. Ang¹•Institutions (1)

Xilinx¹

20 Feb 2009

TL;DR: In this paper, a MOS fuse (200) is programmed by applying a programming signal to the fuse terminals (204, 206) so that programming current flows through the fuse link (202).

Abstract: At least one MOS parameter of a MOS fuse (200) is characterized to provide at least one MOS parameter reference value. Then, the MOS fuse (200) is programmed by applying a programming signal to the fuse terminals (204, 206) so that programming current flows through the fuse link (202). The fuse resistance is measured to provide a measured fuse resistance associated with a first logic value. A MOS parameter of the programmed MOS fuse is measured to provide a measured MOS parameter value. The measured MOS parameter value is compared to the reference MOS parameter value to determine a second logic value of the MOS fuse, and a bit value is output based on the comparison.

Patent•

Direct memory access technique for use with PCIe endpoints

Kiran S. Puranik¹•Institutions (1)

Xilinx¹

19 May 2009

TL;DR: In this paper, an integrated circuit (IC) includes a peripheral component interconnect express (PCIe) root complex having a central processing unit (CPU), a memory controller configured to control a main memory of a PCIe system, and a PCIe port coupled to a PCIe endpoint device through a PCIe switch.

Abstract: An integrated circuit (“IC”) includes a peripheral component interconnect express (“PCIe”) root complex having a central processing unit (“CPU”), a memory controller configured to control a main memory of a PCIe system, and a PCIe port coupled to a PCIe endpoint device through a PCIe switch. The PCIe endpoint device is configured to initiate data transfer between the main memory and the PCIe endpoint device.

Journal Article•DOI•

Packing Techniques for Virtex-5 FPGAs

Taneem Ahmed¹, Paul D. Kundarewich¹, Jason H. Anderson¹•Institutions (1)

Xilinx¹

01 Sep 2009-ACM Transactions on Reconfigurable Technology and Systems

TL;DR: This article considers packing in the commercial FPGA context and discusses packing techniques for large IP blocks, namely, block RAMs and DSPs, and presents techniques for dual-output LUT packing that lead to improved area-efficiency, with minimal performance degradation.

Abstract: Packing is a key step in the FPGA tool flow that straddles the boundaries between synthesis, technology mapping and placement. Packing strongly influences circuit speed, density, and power, and in this article, we consider packing in the commercial FPGA context and examine the area and performance trade-offs associated with packing in a state-of-the-art FPGA: the Xilinx® Virtex™-5 FPGA. In addition to look-up-table (LUT)-based logic blocks, modern FPGAs also contain large IP blocks. We discuss packing techniques for both types of blocks. Virtex-5 logic blocks contain dual-output 6-input LUTs. Such LUTs can implement any single logic function of up to 6 inputs, or any two logic functions requiring no more than 5 distinct inputs. The second LUT output has reduced speed, and therefore, must be used judiciously. We present techniques for dual-output LUT packing that lead to improved area-efficiency, with minimal performance degradation. We then describe packing techniques for large IP blocks, namely, block RAMs and DSPs. We pack circuits into the large blocks in a way that leverages the unique block RAM and DSP layout/architecture in Virtex-5, achieving significantly improved design performance.
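The input constraint stated in the abstract (two functions may share one dual-output LUT only if together they use no more than 5 distinct inputs) can be illustrated with a minimal Python sketch. The function names and the representation of a logic function as a set of input signal names are assumptions made for illustration. The sketch checks only the input count and ignores the reduced speed of the second output, which the article says must also be weighed.

```python
def distinct_inputs(*functions):
    """Return the set of distinct input names used by the given functions."""
    return set().union(*functions)


def can_share_dual_output_lut(inputs_a, inputs_b, max_inputs=5):
    """Check the input-count condition for packing two functions together.

    Each argument is a set of input signal names (a hypothetical
    representation). The limit of 5 distinct inputs comes from the abstract.
    """
    return len(distinct_inputs(inputs_a, inputs_b)) <= max_inputs


# Two functions that overlap on inputs c and d use 5 distinct inputs in total.
print(can_share_dual_output_lut({"a", "b", "c", "d"}, {"c", "d", "e"}))
# Two functions with 7 distinct inputs between them cannot share a LUT.
print(can_share_dual_output_lut({"a", "b", "c", "d"}, {"e", "f", "g"}))
```

A real packer would combine this feasibility test with timing information before committing a function to the slower second output.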

Patent•

Interposer for redistributing signals

Robert O. Conn¹•Institutions (1)

Xilinx¹

19 Jun 2009

TL;DR: In this paper, a capacitive interposer (caposer) is disposed inside an integrated circuit package between a die and an inside surface of the package, and conductive layers within the caposer constitute a bypass capacitor.

Abstract: A capacitive interposer (caposer) is disposed inside an integrated circuit package between a die and an inside surface of the package. Conductive layers within the caposer constitute a bypass capacitor. In a through-hole caposer, micro-bumps on the die pass through through-holes in the caposer and contact corresponding landing pads on the package. As they pass through the caposer, power and ground micro-bumps make contact with the plates of the bypass capacitor. In a via caposer, power and ground micro-bumps on the die are coupled to power and ground landing pads on the package as well as to the plates of the bypass capacitor by power and ground vias that extend through the caposer. In a signal redistribution caposer, conductors within the caposer redistribute signals between die micro-bumps and package landing pads. In an impedance matching caposer, termination structures within the caposer provide impedance matching to a printed circuit board trace.

Proceedings Article•DOI•

Performance and power of cache-based reconfigurable computing

Andrew Putnam¹, Susan J. Eggers¹, Dave Bennett², Eric F. Dellinger², Jeffrey M. Mason², Henry E. Styles², Prasanna Sundararajan², Ralph D. Wittig²•Institutions (2)

University of Washington¹, Xilinx²

22 Feb 2009

TL;DR: This poster presents the analyses and optimizations of the CHiMPS compiler that construct many-cache caches, and presents the details of the cache parameters on a Xilinx Virtex-5 LX110T FPGA.

Abstract: CHiMPS is a C-based compiler for high-performance computing (HPC) on heterogeneous CPU-FPGA computing platforms. CHiMPS efficiently supports random accesses to main memory through the many-cache memory model, enabling a broader range of applications to take advantage of FPGA-based acceleration. Many-cache creates multiple caches on top of an FPGA's small, independent memories, each targeting a particular data structure or region of memory in an application and each customized for the memory operations that access it. This poster presents the analyses and optimizations of the CHiMPS compiler that construct many-cache caches, and presents the details of the cache parameters on a Xilinx Virtex-5 LX110T FPGA. Detailed simulation results on HPC kernels demonstrate a 7.8x (geometric mean) performance boost over CPU-only execution of the same source code, FPGA power usage that is on average 4.1x less, and consequently performance per watt that is also greater, by a geometric mean of 21.3x.

Proceedings Article•DOI•

Improving logic density through synthesis-inspired architecture

Jason H. Anderson¹, Qiang Wang²•Institutions (2)

University of Toronto¹, Xilinx²

29 Sep 2009

TL;DR: Properties of the logic synthesis netlist are used to define both a logic element architecture and an associated technology mapping algorithm that together provide improved logic density. The paper shows that 6-LUT optimal mapping depths can be achieved with a small fraction of the LUTs in hardware being 6-LUTs and the remainder being extended 5-LUTs, suggesting that a heterogeneous logic block architecture may prove to be advantageous.

Abstract: We leverage properties of the logic synthesis netlist to define both a logic element architecture and an associated technology mapping algorithm that together provide improved logic density. We demonstrate that an “extended” logic element with slightly modified K-input LUTs achieves much of the benefit of an architecture with K+1-input LUTs, while consuming silicon area close to a K-LUT (a K-LUT requires half the area of a K+1-LUT). We introduce the notion of “non-inverting paths” in a circuit's AND-inverter graph (AIG) and show their utility in mapping into the proposed logic element. Results show that while circuits mapped to a traditional 5-LUT architecture need 14% more LUTs and have 12% more depth than a 6-LUT architecture, our extended 5-LUT architecture requires only 7% more LUTs and 2.5% more depth than 6-LUTs, on average. Nearly all of the depth reduction associated with moving from K-input to K+1-input LUTs can be achieved with considerably less area using extended K-LUTs. We further show that 6-LUT optimal mapping depths can be achieved with a small fraction of the LUTs in hardware being 6-LUTs and the remainder being extended 5-LUTs, suggesting that a heterogeneous logic block architecture may prove to be advantageous.

Patent•

Method of evaluating an architecture for an integrated circuit device

Jorn W. Janneck¹, David B. Parlour¹, Ian D. Miller¹•Institutions (1)

Xilinx¹

11 Jun 2009

TL;DR: In this paper, a method for evaluating an architecture for an integrated circuit device is presented, which comprises generating a library of primitives for a predetermined architecture, transforming an original dataflow program into an intermediate format, transforming the intermediate format to a dataflow program defined in terms of the predefined library of primitives, and generating an implementation profile comprising information related to an implementation of the original dataflow program in an integrated circuit having the predetermined architecture.

Abstract: A method of evaluating an architecture for an integrated circuit device is disclosed. The method comprises generating a library of primitives for a predetermined architecture; transforming an original dataflow program into an intermediate format; transforming the intermediate format to a dataflow program defined in terms of the predefined library of primitives; and generating an implementation profile comprising information related to an implementation of the original dataflow program in an integrated circuit having the predetermined architecture. A method of evaluating an architecture for an integrated circuit device is also disclosed.

Journal Article•DOI•

On the Complexity of the Verification of the Costas Property

L. Barker¹, Konstantinos Drakakis², Scott Rickard²•Institutions (2)

Xilinx¹, University College Dublin²

24 Mar 2009

TL;DR: In this paper, the authors show that only a restricted subset among the totality of pairs of entries in the same row of the difference triangle needs to be checked to verify the Costas property, and such a subset is explicitly described.

Abstract: In this paper, we show that in order to ascertain whether a permutation has the Costas property, only a restricted subset among the totality of pairs of entries in the same row of the difference triangle needs to be checked, and we explicitly describe such a subset. This represents a further refinement on the definition of a Costas permutation. This observation can be used to speed up algorithms that exhaustively search for Costas permutations. Asymptotically, the savings approach 43% for large orders when compared with the previous standard efficient method.
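For orientation, the check that the abstract refines can be sketched in Python. This is the plain baseline, which compares every pair of entries in each row of the difference triangle. It is not the restricted subset described in the paper, whose exact form is not given here. The function names are assumptions made for illustration.

```python
def difference_triangle(perm):
    """Row k (k = 1 .. n-1) holds perm[i + k] - perm[i] for every valid i."""
    n = len(perm)
    return [[perm[i + k] - perm[i] for i in range(n - k)] for k in range(1, n)]


def is_costas(perm):
    """Baseline test: a permutation of 1..n is Costas if no row of its
    difference triangle contains a repeated entry."""
    if sorted(perm) != list(range(1, len(perm) + 1)):
        return False  # not a permutation of 1..n
    return all(len(set(row)) == len(row) for row in difference_triangle(perm))


print(difference_triangle([1, 2, 4, 3]))  # rows: [1, 2, -1], [3, 1], [2]
print(is_costas([1, 2, 4, 3]))            # all rows have distinct entries
print(is_costas([1, 2, 3, 4]))            # first row repeats the value 1
```

An exhaustive search calls a test like this once per candidate permutation, which is why reducing the number of pairs examined per row translates directly into search time.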

Proceedings Article•DOI•

Packets everywhere: The great opportunity for field programmable technology

Gordon J. Brebner¹•Institutions (1)

Xilinx¹

01 Dec 2009

TL;DR: The packet is the atom of the digital revolution: the unit of data communication that leads ultimately to the Internet as the authors know it today.

Abstract: The packet is the atom of the digital revolution: the unit of data communication. The use of packets in networking was first proposed almost 50 years ago, leading ultimately to the Internet as we know it today. As networking has scaled down towards networks on chip, so packets feature for digital communication in the small. As applications have gone digital, so their data has become packetised. Streams of packets, and the processing of these packets, are characteristic of the digital age.

Patent•

Specific memory controller implemented using reconfiguration

Stephen Neuendorffer¹•Institutions (1)

Xilinx¹

17 Jul 2009

TL;DR: In this paper, a processor selects a particular partial reconfiguration data-set based on a characteristic of the memory arrangement and uses it to implement a portion of a specific memory controller that differs from the general memory controller.

Abstract: A circuit controls a memory arrangement and includes an array of programmable resources and interconnect resources, a reconfiguration port, and a processor. The programmable resources and interconnect resources in the array are initially configured with a reference configuration data-set. The reference configuration data-set configures the programmable resources and interconnect resources to implement a general memory controller. The processor obtains a characteristic of the memory arrangement and selects a particular partial reconfiguration data-set based on the characteristic of the memory arrangement. The processor reconfigures the programmable resources and interconnect resources in the array via the reconfiguration port. The processor reconfigures the programmable resources and interconnect resources with the particular partial reconfiguration data-set. The particular partial reconfiguration data-set partially reconfigures the programmable resources and interconnect resources to implement a portion of a specific memory controller that differs from the general memory controller.

Patent•

Semiconductor stack assembly having reduced thermal spreading resistance and methods of making same

Arifur Rahman¹•Institutions (1)

Xilinx¹

20 Feb 2009

TL;DR: In this paper, the authors describe semiconductor assemblies with reduced thermal spreading resistance and methods of making the same, in which at least one secondary integrated circuit (IC) die is mounted on a primary IC die and sits between the primary IC die and a heat extraction element.

Abstract: Semiconductor assemblies having reduced thermal spreading resistance and methods of making the same are described. In an example, a semiconductor device (101) includes a primary integrated circuit (IC) die (102) and at least one secondary IC die (104) mounted on the primary IC die (102). A heat extraction element (110) includes a base (109) mounted to the semiconductor device (101) such that each of the at least one secondary IC die (104) is between the primary IC die (102) and the heat extraction element (110). At least one dummy fill (106) is adjacent the at least one secondary IC die (104), and each thermally couples the primary IC die (102) to the heat extraction element (110).
