
Showing papers by "Xilinx" published in 2009


Journal ArticleDOI
TL;DR: In this paper, the authors present the OpenDF framework and recall that dataflow programming was once invented to address the problem of parallel computing, and discuss the problems with an imperative style, von Neumann programs, and present what they believe are the advantages of using a data flow programming model.
Abstract: This paper presents the OpenDF framework and recalls that dataflow programming was once invented to address the problem of parallel computing. We discuss the problems with an imperative style, von Neumann programs, and present what we believe are the advantages of using a dataflow programming model. The CAL actor language is briefly presented and its role in the ISO/MPEG standard is discussed. The Dataflow Interchange Format (DIF) and related tools can be used for analysis of actors and networks, demonstrating the advantages of a dataflow approach. Finally, an overview of a case study implementing an MPEG-4 decoder is given.

81 citations


Proceedings ArticleDOI
19 Apr 2009
TL;DR: This paper focuses on detection of SDF-like regions in dynamic dataflow descriptions, in particular in the generalized specification framework of CAL, which is an important step for applying static scheduling techniques within a dynamic dataflow framework.
Abstract: Dataflow descriptions have been used in a wide range of Digital Signal Processing (DSP) applications, such as multi-media processing, and wireless communications. Among various forms of dataflow modeling, Synchronous Dataflow (SDF) is geared towards static scheduling of computational modules, which improves system performance and predictability. However, many DSP applications do not fully conform to the restrictions of SDF modeling. More general dataflow models, such as CAL [1], have been developed to describe dynamically-structured DSP applications. Such generalized models can express dynamically changing functionality, but lose the powerful static scheduling capabilities provided by SDF. This paper focuses on detection of SDF-like regions in dynamic dataflow descriptions — in particular, in the generalized specification framework of CAL. This is an important step for applying static scheduling techniques within a dynamic dataflow framework. Our techniques combine the advantages of different dataflow languages and tools, including CAL [1], DIF [2] and CAL2C [3]. The techniques are demonstrated on the IDCT module of MPEG Reconfigurable Video Coding (RVC).
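
To illustrate why SDF regions lend themselves to static scheduling, the following minimal sketch solves the SDF balance equations for a toy graph and returns how often each actor fires per schedule iteration. It is not taken from the paper: the function name `repetition_vector`, the edge format, and the example rates are all hypothetical.

```python
from fractions import Fraction
from math import gcd, lcm  # multi-argument gcd/lcm need Python 3.9+


def repetition_vector(actors, edges):
    """Solve the SDF balance equations for a connected graph.

    edges is a list of (src, dst, produced, consumed) tuples, where each
    firing of src produces `produced` tokens and each firing of dst
    consumes `consumed` tokens on that edge (fixed rates: the SDF assumption).
    """
    # Propagate relative firing rates outward from the first actor.
    rate = {actors[0]: Fraction(1)}
    changed = True
    while changed:
        changed = False
        for src, dst, produced, consumed in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * produced / consumed
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * consumed / produced
                changed = True
    if len(rate) != len(actors):
        raise ValueError("graph is not connected")
    # Every edge must balance: tokens produced == tokens consumed per iteration.
    for src, dst, produced, consumed in edges:
        if rate[src] * produced != rate[dst] * consumed:
            raise ValueError("inconsistent rates: no static schedule exists")
    # Scale the rational rates to the smallest positive integers.
    scale = lcm(*(r.denominator for r in rate.values()))
    counts = {a: int(rate[a] * scale) for a in actors}
    common = gcd(*counts.values())
    return {a: n // common for a, n in counts.items()}


# Hypothetical three-actor chain: A -(2:3)-> B -(1:2)-> C
print(repetition_vector(["A", "B", "C"], [("A", "B", 2, 3), ("B", "C", 1, 2)]))
```

A consistent solution means a finite, repeating firing order can be fixed at compile time, which a dynamic dataflow description does not guarantee in general.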

43 citations


Patent
17 Jul 2009
TL;DR: In this article, a scan-chain implementation for an integrated circuit device with programmable logic is presented, where the probe pads (306, 111-116) are coupled directly to the test logic (305, 104) such that configuration of the programmable logic (550, 314, 105) is not required.
Abstract: An integrated circuit device includes a stacked die (102) and a base die (101) having probe pads (306, 111-116) that directly couple to test logic (305, 104) of the base die to implement a scan chain for testing of the integrated circuit device. The base die further includes contacts (107) disposed on a back side of the base die and through-die vias (310, 121-128) coupled to the contacts and coupled to programmable logic (550, 314, 105) of the base die. The base die also includes a first probe pad (111) configured to couple test input, a second probe pad (112) configured to couple test output, and a third probe pad (113) configured to couple control signals. Test logic (305) of the base die is configured to couple to additional test logic (405) of the stacked die to implement the scan chain. The probe pads (306, 111-116) are coupled directly to the test logic (305, 104) such that configuration of the programmable logic (550, 314, 105) is not required to implement the scan chain.

43 citations


Proceedings ArticleDOI
22 Feb 2009
TL;DR: In this paper, two complementary approaches for clock power reduction in the Xilinx Virtex-5 FPGA are presented, where clock enable signals on flip-flops are selectively migrated to use the dedicated clock enable available on the built-in clock network.
Abstract: Clock network power in field-programmable gate arrays (FPGAs) is considered and two complementary approaches for clock power reduction in the Xilinx Virtex-5 FPGA are presented. The approaches are unique in that they leverage specific architectural aspects of Virtex-5 to achieve reductions in dynamic power consumed by the clock network. The first approach comprises a placement-based technique to reduce interconnect resource usage on the clock network, thereby reducing capacitance and power (up to 12%). The second approach borrows the "clock gating" notion from the ASIC domain and applies it to FPGAs. Clock enable signals on flip-flops are selectively migrated to use the dedicated clock enable available on the FPGA's built-in clock network, leading to reduced toggling on the clock interconnect and lower power (up to 28%). Power reductions are achieved without any performance penalty, on average.

39 citations


Proceedings ArticleDOI
20 Jun 2009
TL;DR: The analyses and optimizations of the CHiMPS compiler that construct many-cache caches are presented, showing a performance advantage of 7.8x over CPU-only execution of the same source code, FPGA power usage that is on average 4.1x less, and consequently performance per watt that is also greater.
Abstract: Many-cache is a memory architecture that efficiently supports caching in commercially available FPGAs. It facilitates FPGA programming for high-performance computing (HPC) developers by providing them with memory performance that is greater and power consumption that is less than their current CPU platforms, but without sacrificing their familiar, C-based programming environment. Many-cache creates multiple, multi-banked caches on top of an FPGA's small, independent memories, each targeting a particular data structure or region of memory in an application and each customized for the memory operations that access it. The caches are automatically generated from C source by the CHiMPS C-to-FPGA compiler. This paper presents the analyses and optimizations of the CHiMPS compiler that construct many-cache caches. An architectural evaluation of CHiMPS-generated FPGAs demonstrates a performance advantage of 7.8x (geometric mean) over CPU-only execution of the same source code, FPGA power usage that is on average 4.1x less, and consequently performance per watt that is also greater, by a geometric mean of 21.3x.

37 citations


Proceedings ArticleDOI
22 Feb 2009
TL;DR: The proposed resynthesis is capable of substantial logic restructuring, is customizable to solve a variety of optimization tasks, and has reasonable runtime on industrial designs.
Abstract: We describe an optimization method for combinational and sequential logic networks, with emphasis on scalability and the scope of optimization. The proposed resynthesis (a) is capable of substantial logic restructuring, (b) is customizable to solve a variety of optimization tasks, and (c) has reasonable runtime on industrial designs. The approach uses don't cares computed for a window surrounding a node and can take into account external don't cares (e.g. unreachable states). It uses a SAT solver and interpolation to find a new representation for a node. This representation can be in terms of inputs from other nodes in the window, thus effecting Boolean re-substitution. Experimental results on 6-input LUT networks after high-effort synthesis show substantial reductions in area and delay. When applied to 20 large academic benchmarks, the LUT count and logic level are reduced by 45.0% and 12.2%, respectively. The longest runtime for synthesis and mapping is about two minutes. When applied to a set of 14 industrial benchmarks ranging up to 83K 6-LUTs, the LUT count and logic level are reduced by 11.8% and 16.5%, respectively. The longest runtime is about 30 minutes.

36 citations


Journal ArticleDOI
TL;DR: An in-depth case study on dataflow-based analysis and exploitation of parallelism in the design and implementation of an MPEG reconfigurable video coding decoder and the systematic exploitation of concurrency in CAL networks that are targeted to multicore platforms is presented.
Abstract: This paper presents an in-depth case study on dataflow-based analysis and exploitation of parallelism in the design and implementation of an MPEG reconfigurable video coding decoder. Dataflow descriptions have been used in a wide range of digital signal processing (DSP) applications, such as applications for multimedia processing and wireless communications. Because dataflow models are effective in exposing concurrency and other important forms of high level application structure, dataflow techniques are promising for implementing complex DSP applications on multicore systems, and other kinds of parallel processing platforms. In this paper, we use the CAL actor language as a concrete framework for representing and demonstrating dataflow design techniques. Furthermore, we also describe our application of the dataflow interchange format (DIF) package (TDP), a software tool for analyzing dataflow networks, to the systematic exploitation of concurrency in CAL networks that are targeted to multicore platforms. Using TDP, one is able to automatically process regions that are extracted from the original network, and exhibit properties similar to synchronous dataflow (SDF) models. This is important in our context because powerful techniques, based on static scheduling, are available for exploiting concurrency in SDF descriptions. Detection of SDF-like regions is an important step for applying static scheduling techniques within a dynamic dataflow framework. Furthermore, segmenting a system into SDF-like regions also allows us to explore cross-actor concurrency that results from dynamic dependences among different regions. Using SDF-like region detection as a preprocessing step to software synthesis generally provides an efficient way for mapping tasks to multicore systems, and improves the system performance of video processing applications on multicore platforms.

33 citations


Journal ArticleDOI
TL;DR: This special issue on Security in Reconfigurable Systems Design reports on recent research results in the design and implementation of trustworthy reconfigurable systems.
Abstract: This special issue on Security in Reconfigurable Systems Design reports on recent research results in the design and implementation of trustworthy reconfigurable systems. Five articles cover topics including power-efficient implementation of public-key cryptography, side-channel analysis of electromagnetic radiation, side-channel resistant design, design of robust unclonable functions on an FPGA, and Trojan detection in an FPGA bitstream.

32 citations


Proceedings ArticleDOI
TL;DR: In this article, the capacity region of a two-user state-dependent multiaccess channel in which only one encoder is informed, non-causally, of the channel states is investigated.
Abstract: We consider a two-user state-dependent multiaccess channel in which only one of the encoders is informed, non-causally, of the channel states. Two independent messages are transmitted: a common message transmitted by both the informed and uninformed encoders, and an individual message transmitted by only the uninformed encoder. We derive inner and outer bounds on the capacity region of this model in the discrete memoryless case as well as the Gaussian case. Further, we show that the bounds for the Gaussian case are tight in some special cases.

29 citations


Journal ArticleDOI
TL;DR: This article shows a form that such mechanisms can take in actor-oriented components, gives a formal structure, and describes a prototype implementation, which permits a disciplined form of multiple inheritance with unambiguous inheritance and overriding behavior.
Abstract: Actor-oriented components emphasize concurrency and temporal semantics and are used for modeling and designing embedded software and hardware. Actors interact with one another through ports via a messaging schema that can follow any of several concurrent semantics. Domain-specific actor-oriented languages and frameworks are common (Simulink, LabVIEW, SystemC, etc.). However, they lack many modularity and abstraction mechanisms that programmers have become accustomed to in object-oriented components, such as classes, inheritance, interfaces, and polymorphism, except as inherited from the host language. This article shows a form that such mechanisms can take in actor-oriented components, gives a formal structure, and describes a prototype implementation. The mechanisms support actor-oriented class definitions, subclassing, inheritance, and overriding. The formal structure imposes structural constraints on a model (mainly the “derivation invariant”) that lead to a policy to govern inheritance. In particular, the structural constraints permit a disciplined form of multiple inheritance with unambiguous inheritance and overriding behavior. The policy is based formally on a generalized ultrametric space with some remarkable properties. In this space, inheritance is favored when actors are “closer” (in the generalized ultrametric), and we show that when inheritance can occur from multiple sources, one source is always unambiguously closer than the other.

27 citations


Proceedings ArticleDOI
05 Apr 2009
TL;DR: This work presents a system model and software architecture for implementing runtime adaptive applications on FPGAs, separating the control and processing planes and abstracting away the details of hardware reconfiguration from the system designer.
Abstract: Adaptive systems are set to become more mainstream, as numerous practical applications in the communications domain emerge. FPGAs offer an ideal implementation platform, combining high performance with flexibility. While significant research has been undertaken in the area of FPGA partial reconfiguration, it has focussed primarily on low-level architecture-specific implementations. Building upon this previous work, we present a system model and software architecture for implementing runtime adaptive applications on FPGAs, separating the control and processing planes and abstracting away the details of hardware reconfiguration from the system designer. Hardware processing components appear as software components in the runtime system, enabling their inclusion in adaptive applications. We present an adaptive wireless application, demonstrating the use of the model and software architecture.

Proceedings ArticleDOI
30 Nov 2009
TL;DR: This paper identifies important features a cognitive radio framework should provide, namely a virtual architecture for hardware abstraction, an adaptive run-time system for managing cognition, and high level design tools for cognitive radio development, and proposes a novel FPGA-based framework that provides all these features.
Abstract: This paper identifies important features a cognitive radio framework should provide, namely a virtual architecture for hardware abstraction, an adaptive run-time system for managing cognition, and high level design tools for cognitive radio development. We evaluate a range of existing frameworks with respect to these, and propose a novel FPGA-based framework that provides all these features. By abstracting away the details of hardware reconfiguration, radio designers can implement FPGA-based cognitive nodes much as they would do for a software implementation. We apply the proposed framework to the design and implementation of a receiver node that works in two modes: discovery, where it uses spectrum sensing to find a radio transmission, and communication, in which it receives and demodulates the said transmission. We show how the whole design process does not require any hardware experience on behalf of the radio designer.

Patent
Guo Jun Ren1, Qi Zhang1, Ketan Sodha1
29 Jan 2009
TL;DR: In this paper, an adaptive feedback mechanism is employed to sense and correct performance degradation, while simultaneously facilitating configurability within integrated circuits (ICs) such as programmable logic devices (PLDs).
Abstract: A method and apparatus to reduce the degradation in performance of semiconductor-based devices due to process, voltage, and temperature (PVT) and/or other causes of variation. Adaptive feedback mechanisms are employed to sense and correct performance degradation, while simultaneously facilitating configurability within integrated circuits (ICs) such as programmable logic devices (PLDs). A voltage-feedback mechanism is employed to detect PVT variation and mirrored current references are adaptively adjusted to track and substantially eliminate the PVT variation. More than one voltage-feedback mechanism may instead be utilized to detect PVT-based variations within a differential device, whereby a first voltage-feedback mechanism is utilized to detect common-mode voltage variation and a second voltage-feedback mechanism produces mirrored reference currents to substantially remove the common-mode voltage variation and facilitate symmetrical operation of the differential device.

Patent
Juan J. Noguera Serra1, Tim Tuan1
05 Mar 2009
TL;DR: In this paper, a method of operating an integrated circuit having a circuit block configurable by a configuration memory is disclosed, which includes determining whether to operate the circuit block in a normal operation mode or a low power mode.
Abstract: A method of operating an integrated circuit having a circuit block configurable by a configuration memory is disclosed. The method includes determining whether to operate the circuit block in a normal operation mode or a low power mode. The configuration memory is loaded with normal operation mode configuration data for the circuit block if the normal operation mode is determined. If the low power mode is determined, the configuration memory is loaded with low power mode configuration data for the circuit block.

Patent
03 Apr 2009
TL;DR: In this paper, a method and apparatus for self-checking and self-correcting memory states of a programmable resource are described, where the programmable resource has a first core and a second core instantiated therein.
Abstract: Method and apparatus for self-checking and self-correcting memory states of a programmable resource is described. Programmable resource of an integrated circuit has a first core and a second core instantiated therein. A first internal configuration port and a second internal configuration port of the integrated circuit are respectively connected to the first core and the second core. The second core is coupled to the first core for monitoring operation of the first core with the second core, and the second core is configured to obtain control responsive to a failure of the first core or the first internal configuration port for a self-correcting mode.

Patent
19 Jan 2009
TL;DR: In this paper, a model equation is defined for each embedded test circuit, with each model equation specifying the output delay of its associated embedded test circuit as a function of Front End Of the Line (FEOL) and Back End Of the Line (BEOL) parameters.
Abstract: An integrated circuit (IC) includes multiple embedded test circuits that all include a ring oscillator coupled to a test load. The test load either is a direct short in the ring oscillator or else is an interconnect load that is representative of one of the interconnect layers in the IC. A model equation is defined for each embedded test circuit, with each model equation specifying the output delay of its associated embedded test circuit as a function of Front End Of the Line (FEOL) and Back End Of the Line (BEOL) parameters. The model equations are then solved for the various FEOL and BEOL parameters as functions of the test circuit output delays. Finally, measured output delay values are substituted into these parameter equations to generate actual values for the various FEOL and BEOL parameters, thereby allowing any areas of concern to be quickly and accurately identified.
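
The solve-then-substitute step described above can be sketched as follows, under the loud assumption that each model equation is linear in the process parameters (the abstract does not state the form of the equations). The sensitivity matrix, the parameter names, and the delay values are hypothetical illustration data, not figures from the patent.

```python
def solve_linear(coefficients, delays):
    """Solve coefficients * params = delays by Gauss-Jordan elimination.

    Row i of `coefficients` holds the assumed sensitivities of test circuit i's
    output delay to each process parameter; `delays` holds measured delays.
    """
    n = len(delays)
    # Build the augmented matrix [coefficients | delays].
    m = [list(map(float, row)) + [float(d)] for row, d in zip(coefficients, delays)]
    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("model equations are not independent")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


# Hypothetical setup: circuit 0 has a direct short (delay depends on the FEOL
# parameter only); circuit 1 adds an interconnect load (FEOL plus BEOL).
sensitivities = [[1.0, 0.0],
                 [1.0, 2.0]]
measured_delays = [10.0, 16.0]
feol, beol = solve_linear(sensitivities, measured_delays)
print(feol, beol)
```

With one test circuit per unknown parameter and independent equations, each measured set of delays maps to exactly one set of parameter values, which is what lets an out-of-range parameter be singled out.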

Patent
09 Jan 2009
TL;DR: In this article, an analog front-end having built-in equalization includes a control module and a tunable gain stage, where the control module is operably coupled to provide a frequency response setting based on a channel response of a channel providing high-speed serial data to the analog front end.
Abstract: An analog front-end having built-in equalization includes a control module and a tunable gain stage. The control module is operably coupled to provide a frequency response setting based on a channel response of a channel providing high-speed serial data to the analog front-end. The tunable gain stage includes a frequency dependent load and an amplifier input section. The frequency dependent load is adjusted based on the frequency response setting. The amplifier input section is operably coupled to the frequency dependent load and receives the high-speed serial data. In conjunction with the frequency dependent load, the amplifier input section amplifies and equalizes the high-speed serial data to produce an amplified and equalized serial data.

Patent
13 May 2009
TL;DR: In this article, an integrated circuit having a circuit for reducing distortion in a power amplifier (302) is disclosed, which comprises a predistortion circuit (304, 402), coupled to receive a signal (x(n)) to be amplified; sample capture buffers (306, 406) coupled to an output (z(n)), and an estimator circuit (308, 520, 412, 612).
Abstract: An integrated circuit having a circuit for reducing distortion in a power amplifier (302) is disclosed. The integrated circuit comprises a predistortion circuit (304, 402) coupled to receive a signal (x(n)) to be amplified; sample capture buffers (306, 406) coupled to an output (z(n)) of the predistortion circuit and an input/output port of the integrated circuit; and an estimator circuit (308, 520, 412, 612) coupled to the sample capture buffers, wherein the estimator circuit generates parameters for the predistortion circuit based upon the output of the predistortion circuit and an output of the power amplifier received at the input/output port of the integrated circuit. A method of reducing distortion in a power amplifier is also disclosed.

Journal ArticleDOI
TL;DR: A new technology mapper, WireMap, uses an edge flow heuristic to improve the routability of a mapped design and has an additional advantage of reducing an average number of inputs of LUTs without increasing the total LUT count and depth.
Abstract: This article presents a new technology mapper, WireMap. The mapper uses an edge flow heuristic to improve the routability of a mapped design. The heuristic is applied during the iterative mapping optimization to reduce the total number of pin-to-pin connections (or edges). On academic benchmarks (ISCAS, MCNC, and ITC designs), the average edge reduction of 9.3% is achieved while maintaining depth and LUT count compared to state-of-the-art technology mapping. Placing and routing the resulting netlists leads to an 8.5% reduction in the total wirelength, a 6.0% reduction in minimum channel width, and a 2.3% reduction in critical path delay. This technique is applied in the Xilinx ISE Design tool to evaluate its effect on industrial Virtex5 circuits. In a set of 20 large designs, we find the edge reduction is 6.8% while total wirelength measured in the placer is reduced by 3.6%. Applying WireMap has an additional advantage of reducing an average number of inputs of LUTs without increasing the total LUT count and depth. The percentages of 5- and 6-LUTs in a typical design are reduced, while the percentages of 2-, 3-, and 4-LUTs are increased. These smaller LUTs can be merged into pairs and implemented using the dual-output LUT structure found in commercial FPGAs. For academic benchmarks, WireMap leads to 9.4% fewer dual-output LUTs after merging. For the industrial designs, WireMap leads to 6.3% fewer dual-output Virtex5 LUTs.

Patent
Hsung Jai Im1, Paak Sunhom1, Boon Y. Ang1
20 Feb 2009
TL;DR: In this paper, a MOS fuse (200) is programmed by applying a programming signal to the fuse terminals (204, 206) so that programming current flows through the fuse link (202).
Abstract: At least one MOS parameter of a MOS fuse (200) is characterized to provide at least one MOS parameter reference value. Then, the MOS fuse (200) is programmed by applying a programming signal to the fuse terminals (204, 206) so that programming current flows through the fuse link (202). The fuse resistance is measured to provide a measured fuse resistance associated with a first logic value. A MOS parameter of the programmed MOS fuse is measured to provide a measured MOS parameter value. The measured MOS parameter value is compared to the reference MOS parameter value to determine a second logic value of the MOS fuse, and a bit value is output based on the comparison.

Patent
Kiran S. Puranik1
19 May 2009
TL;DR: In this paper, an integrated circuit (IC) includes a peripheral component interconnect express (PCIe) root complex having a central processing unit (CPU), a memory controller configured to control a main memory of a PCIe system, and a PCIe port coupled to a PCIe endpoint device through a PCIe switch.
Abstract: An integrated circuit (“IC”) includes a peripheral component interconnect express (“PCIe”) root complex having a central processing unit (“CPU”), a memory controller configured to control a main memory of a PCIe system, and a PCIe port coupled to a PCIe endpoint device through a PCIe switch. The PCIe endpoint device is configured to initiate data transfer between the main memory and the PCIe endpoint device.

Journal ArticleDOI
TL;DR: This article considers packing in the commercial FPGA context and discusses packing techniques for large IP blocks, namely, block RAMs and DSPs, and presents techniques for dual-output LUT packing that lead to improved area-efficiency, with minimal performance degradation.
Abstract: Packing is a key step in the FPGA tool flow that straddles the boundaries between synthesis, technology mapping and placement. Packing strongly influences circuit speed, density, and power, and in this article, we consider packing in the commercial FPGA context and examine the area and performance trade-offs associated with packing in a state-of-the-art FPGA, the Xilinx® Virtex™-5 FPGA. In addition to look-up-table (LUT)-based logic blocks, modern FPGAs also contain large IP blocks. We discuss packing techniques for both types of blocks. Virtex-5 logic blocks contain dual-output 6-input LUTs. Such LUTs can implement any single logic function of up to 6 inputs, or any two logic functions requiring no more than 5 distinct inputs. The second LUT output has reduced speed, and therefore, must be used judiciously. We present techniques for dual-output LUT packing that lead to improved area-efficiency, with minimal performance degradation. We then describe packing techniques for large IP blocks, namely, block RAMs and DSPs. We pack circuits into the large blocks in a way that leverages the unique block RAM and DSP layout/architecture in Virtex-5, achieving significantly improved design performance.
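
The dual-output feasibility rule stated above (two functions may share one LUT if together they need no more than 5 distinct inputs) can be sketched as a simple check plus a naive greedy pairing. This is only an illustration of the input-count constraint; it is not the article's packing algorithm and ignores the speed penalty of the second output. The names `can_share_lut` and `greedy_pairs` and the example functions are hypothetical.

```python
def can_share_lut(inputs_a, inputs_b, max_shared_inputs=5):
    """Return True if two functions together use at most 5 distinct inputs."""
    return len(set(inputs_a) | set(inputs_b)) <= max_shared_inputs


def greedy_pairs(functions):
    """Pair functions into dual-output LUTs with a naive first-fit pass.

    `functions` maps a function name to the list of its input signal names.
    Returns (pairs, singles); singles each occupy a LUT on their own.
    """
    # Visit small functions first; ties are broken by name for determinism.
    unpaired = sorted(functions, key=lambda name: (len(set(functions[name])), name))
    pairs, singles = [], []
    while unpaired:
        first = unpaired.pop(0)
        partner = next(
            (name for name in unpaired
             if can_share_lut(functions[first], functions[name])),
            None,
        )
        if partner is None:
            singles.append(first)
        else:
            unpaired.remove(partner)
            pairs.append((first, partner))
    return pairs, singles


# Hypothetical mapped functions and their input signals.
example = {
    "f": ["a", "b", "c"],
    "g": ["b", "c", "d", "e"],
    "h": ["u", "v", "w", "x", "y", "z"],
    "k": ["a", "b"],
}
print(greedy_pairs(example))
```

A 6-input function such as `h` can never be paired under this rule, since the union of its inputs with any partner already exceeds five.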

Patent
Robert O. Conn1
19 Jun 2009
TL;DR: In this paper, a capacitive interposer (caposer) is disposed inside an integrated circuit package between a die and an inside surface of the package, and conductive layers within the caposer constitute a bypass capacitor.
Abstract: A capacitive interposer (caposer) is disposed inside an integrated circuit package between a die and an inside surface of the package. Conductive layers within the caposer constitute a bypass capacitor. In a through-hole caposer, micro-bumps on the die pass through through-holes in the caposer and contact corresponding landing pads on the package. As they pass through the caposer, power and ground micro-bumps make contact with the plates of the bypass capacitor. In a via caposer, power and ground micro-bumps on the die are coupled to power and ground landing pads on the package as well as to the plates of the bypass capacitor by power and ground vias that extend through the caposer. In a signal redistribution caposer, conductors within the caposer redistribute signals between die micro-bumps and package landing pads. In an impedance matching caposer, termination structures within the caposer provide impedance matching to a printed circuit board trace.

Proceedings ArticleDOI
22 Feb 2009
TL;DR: This poster presents the analyses and optimizations of the CHiMPS compiler that construct many-cache caches, and presents the details of the cache parameters on a Xilinx Virtex-5 LX110T FPGA.
Abstract: CHiMPS is a C-based compiler for high-performance computing (HPC) on heterogeneous CPU-FPGA computing platforms. CHiMPS efficiently supports random accesses to main memory through the many-cache memory model, enabling a broader range of applications to take advantage of FPGA-based acceleration. Many-cache creates multiple caches on top of an FPGA's small, independent memories, each targeting a particular data structure or region of memory in an application and each customized for the memory operations that access it. This poster presents the analyses and optimizations of the CHiMPS compiler that construct many-cache caches, and presents the details of the cache parameters on a Xilinx Virtex-5 LX110T FPGA. Detailed simulation results on HPC kernels demonstrate a 7.8x (geometric mean) performance boost over CPU-only execution of the same source code, FPGA power usage that is on average 4.1x less, and consequently performance per watt that is also greater, by a geometric mean of 21.3x.

Proceedings ArticleDOI
29 Sep 2009
TL;DR: Properties of the logic synthesis netlist are used to define both a logic element architecture and an associated technology mapping algorithm that together provide improved logic density, and it is shown that 6-LUT optimal mapping depths can be achieved with a small fraction of the LUTs in hardware being 6-LUTs and the remainder being extended 5-LUTs, suggesting that a heterogeneous logic block architecture may prove to be advantageous.
Abstract: We leverage properties of the logic synthesis netlist to define both a logic element architecture and an associated technology mapping algorithm that together provide improved logic density. We demonstrate that an “extended” logic element with slightly modified K-input LUTs achieves much of the benefit of an architecture with K+1-input LUTs, while consuming silicon area close to a K-LUT (a K-LUT requires half the area of a K+1-LUT). We introduce the notion of “non-inverting paths” in a circuit's AND-inverter graph (AIG) and show their utility in mapping into the proposed logic element. Results show that while circuits mapped to a traditional 5-LUT architecture need 14% more LUTs and have 12% more depth than a 6-LUT architecture, our extended 5-LUT architecture requires only 7% more LUTs and 2.5% more depth than 6-LUTs, on average. Nearly all of the depth reduction associated with moving from K-input to K+1-input LUTs can be achieved with considerably less area using extended K-LUTs. We further show that 6-LUT optimal mapping depths can be achieved with a small fraction of the LUTs in hardware being 6-LUTs and the remainder being extended 5-LUTs, suggesting that a heterogeneous logic block architecture may prove to be advantageous.

Patent
11 Jun 2009
TL;DR: In this paper, a method for evaluating an architecture for an integrated circuit device is presented, which consists of generating a library of primitives for a predetermined architecture, transforming an original dataflow program into an intermediate format, transforming the intermediate format to a data-flow program defined in terms of the predefined library, and generating an implementation profile comprising information related to an implementation of the original data flow program in the integrated circuit having the predetermined architecture.
Abstract: A method of evaluating an architecture for an integrated circuit device is disclosed. The method comprises generating a library of primitives for a predetermined architecture; transforming an original dataflow program into an intermediate format; transforming the intermediate format to a dataflow program defined in terms of the predefined library of primitives; and generating an implementation profile comprising information related to an implementation of the original dataflow program in an integrated circuit having the predetermined architecture. A method of evaluating an architecture for an integrated circuit device is also disclosed.

Journal ArticleDOI
24 Mar 2009
TL;DR: To ascertain whether a permutation has the Costas property, only a restricted subset of the pairs of entries in the same row of the difference triangle needs to be checked, and such a subset is explicitly described.
Abstract: In this paper, we show that in order to ascertain whether a permutation has the Costas property, only a restricted subset among the totality of pairs of entries in the same row of the difference triangle needs to be checked, and we explicitly describe such a subset. This represents a further refinement on the definition of a Costas permutation. This observation can be used to speed up algorithms that exhaustively search for Costas permutations. Asymptotically, the savings approach 43% for large orders when compared with the previous standard efficient method.
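
For orientation, the unrefined definition that the abstract builds on can be sketched as follows: row k of the difference triangle holds the differences between entries k positions apart, and a permutation is Costas when no row repeats a value. This is the plain check over every row and every pair, not the restricted subset the paper describes (the abstract does not spell that subset out); the function names are hypothetical.

```python
def difference_triangle(perm):
    """Return the rows of the difference triangle of `perm`.

    Row k (k = 1 .. n-1) holds perm[i + k] - perm[i] for every valid i.
    """
    n = len(perm)
    return [[perm[i + k] - perm[i] for i in range(n - k)] for k in range(1, n)]


def is_costas(perm):
    """Check the Costas property: every row has pairwise distinct entries."""
    return all(len(set(row)) == len(row) for row in difference_triangle(perm))


print(difference_triangle([1, 2, 4, 3]))  # [[1, 2, -1], [3, 1], [2]]
print(is_costas([1, 2, 4, 3]))            # True
print(is_costas([1, 2, 3, 4]))            # False: row 1 is [1, 1, 1]
```

An exhaustive search calls a check like this on every candidate permutation, which is why reducing the number of pairs compared per row matters.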

Proceedings ArticleDOI
Gordon J. Brebner
01 Dec 2009
TL;DR: The packet is the atom of the digital revolution: the unit of data communication, whose use in networking led ultimately to the Internet as we know it today.
Abstract: The packet is the atom of the digital revolution: the unit of data communication. The use of packets in networking was first proposed almost 50 years ago, leading ultimately to the Internet as we know it today. As networking has scaled down towards networks on chip, so packets feature for digital communication in the small. As applications have gone digital, so their data has become packetised. Streams of packets, and the processing of these packets, are characteristic of the digital age.

Patent
Stephen Neuendorffer
17 Jul 2009
TL;DR: A processor obtains a characteristic of a memory arrangement, selects a particular partial reconfiguration data-set based on that characteristic, and uses it to partially reconfigure programmable resources that initially implement a general memory controller, so that they implement a portion of a specific memory controller that differs from the general one.
Abstract: A circuit controls a memory arrangement and includes an array of programmable resources and interconnect resources, a reconfiguration port, and a processor. The programmable resources and interconnect resources in the array are initially configured with a reference configuration data-set. The reference configuration data-set configures the programmable resources and interconnect resources to implement a general memory controller. The processor obtains a characteristic of the memory arrangement and selects a particular partial reconfiguration data-set based on the characteristic of the memory arrangement. The processor reconfigures the programmable resources and interconnect resources in the array via the reconfiguration port. The processor reconfigures the programmable resources and interconnect resources with the particular partial reconfiguration data-set. The particular partial reconfiguration data-set partially reconfigures the programmable resources and interconnect resources to implement a portion of a specific memory controller that differs from the general memory controller.

Patent
Arifur Rahman
20 Feb 2009
TL;DR: Semiconductor assemblies with reduced thermal spreading resistance, and methods of making them, are described. At least one secondary IC die is mounted on a primary IC die, a heat extraction element is mounted so that each secondary IC die lies between the primary IC die and the heat extraction element, and dummy fill adjacent to the secondary IC die thermally couples the primary IC die to the heat extraction element.
Abstract: Semiconductor assemblies having reduced thermal spreading resistance and methods of making the same are described. In an example, a semiconductor device (101) includes a primary integrated circuit (IC) die (102) and at least one secondary IC die (104) mounted on the primary IC die (102). A heat extraction element (110) includes a base (109) mounted to the semiconductor device (101) such that each of the at least one secondary IC die (104) is between the primary IC die (102) and the heat extraction element (110). At least one dummy fill (106) is adjacent the at least one secondary IC die (104), and each thermally couples the primary IC die (102) to the heat extraction element (110).