
Showing papers on "Field-programmable gate array published in 1996"


Book ChapterDOI
23 Sep 1996
TL;DR: RaPiD, a new coarse-grained FPGA architecture optimized for highly repetitive, computation-intensive tasks, is presented; its deep application-specific pipelines make much more efficient use of silicon than traditional FPGAs and yield much higher performance for a wide range of applications.
Abstract: Configurable computing has captured the imagination of many architects who want the performance of application-specific hardware combined with the reprogrammability of general-purpose computers. Unfortunately, configurable computing has had rather limited success, largely because the FPGAs on which such machines are built are more suited to implementing random logic than computing tasks. This paper presents RaPiD, a new coarse-grained FPGA architecture that is optimized for highly repetitive, computation-intensive tasks. Very deep application-specific computation pipelines can be configured in RaPiD. These pipelines make much more efficient use of silicon than traditional FPGAs and also yield much higher performance for a wide range of applications.

459 citations


Dissertation
01 Jan 1996
TL;DR: MATRIX is developed, the first architecture to defer the binding of instruction resources until run-time, allowing the application to organize resources according to its needs, and it is shown that MATRIX yields 10-20× the computational density of conventional processors.
Abstract: General-purpose computing devices allow us to (1) customize computation after fabrication and (2) conserve area by reusing expensive active circuitry for different functions in time. We define RP-space, a restricted domain of the general-purpose architectural space focussed on reconfigurable computing architectures. Two dominant features differentiate reconfigurable from special-purpose architectures and account for most of the area overhead associated with RP devices: (1) instructions which tell the device how to behave, and (2) flexible interconnect which supports task dependent dataflow between operations. We can characterize RP-space by the allocation and structure of these resources and compare the efficiencies of architectural points across broad application characteristics. Conventional FPGAs fall at one extreme end of this space and their efficiency ranges over two orders of magnitude across the space of application characteristics. Understanding RP-space and its consequences allows us to pick the best architecture for a task and to search for more robust design points in the space. Our DPGA, a fine-grained computing device which adds small, on-chip instruction memories to FPGAs, is one such design point. For typical logic applications and finite-state machines, a DPGA can implement tasks in one-third the area of a traditional FPGA. TSFPGA, a variant of the DPGA which focuses on heavily time-switched interconnect, achieves circuit densities close to the DPGA, while reducing typical physical mapping times from hours to seconds. Rigid, fabrication-time organization of instruction resources significantly narrows the range of efficiency for conventional architectures. To avoid this performance brittleness, we developed MATRIX, the first architecture to defer the binding of instruction resources until run-time, allowing the application to organize resources according to its needs. Our focus MATRIX design point is based on an array of 8-bit ALU and register-file building blocks interconnected via a byte-wide network. With today's silicon, a single chip MATRIX array can deliver over 10 Gop/s (8-bit ops). On sample image processing tasks, we show that MATRIX yields 10-20× the computational density of conventional processors. Understanding the cost structure of RP-space helps us identify these intermediate architectural points and may provide useful insight more broadly in guiding our continual search for robust and efficient general-purpose computing structures. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)

435 citations


Journal ArticleDOI
TL;DR: This tutorial surveys commercially available, high-capacity field-programmable devices and describes the three main categories of FPDs: simple and complex programmable logic devices, and field-programmable gate arrays.
Abstract: This tutorial surveys commercially available, high-capacity field-programmable devices. The authors describe the three main categories of FPDs: simple and complex programmable logic devices, and field-programmable gate arrays. They then give architectural details of the most important chips and example applications of each type of device.

382 citations


Journal ArticleDOI
TL;DR: This work exhibits a dozen applications where PAM technology proves superior, both in performance and cost, to every other existing technology, including supercomputers, massively parallel machines, and conventional custom hardware.
Abstract: Programmable active memories (PAMs) are a novel form of universal reconfigurable hardware coprocessor. Based on field-programmable gate array (FPGA) technology, a PAM is a virtual machine, controlled by a standard microprocessor, which can be dynamically and indefinitely reconfigured into a large number of application-specific circuits. PAMs offer a new mixture of hardware performance and software versatility. We review the important architectural features of PAMs through the example of DECPeRLe-1, an experimental device built in 1992. PAM programming is presented, in contrast to classical gate-array and full-custom circuit design. Our emphasis is on large, code-generated synchronous systems descriptions; no compromise is made with regard to the performance of the target circuits. We exhibit a dozen applications where PAM technology proves superior, both in performance and cost, to every other existing technology, including supercomputers, massively parallel machines, and conventional custom hardware. The fields covered include computer arithmetic, cryptography, error correction, image analysis, stereo vision, video compression, sound synthesis, neural networks, high-energy physics, thermodynamics, biology, and astronomy. At comparable cost, the computing power virtually available in a PAM exceeded that of conventional processors by a factor of 10 to 1000, depending on the specific application, as of 1992. A technology shrink increases the performance gap between conventional processors and PAMs. By Noyce's law, we predict by how much the performance gap will widen with time.

359 citations


Patent
Jocelyn Cloutier
12 Nov 1996
TL;DR: In this article, a multi-dimensional array of field programmable gate arrays (FPGAs), each FPGA having its own local memory, is used for image processing, pattern recognition, and neural network applications.
Abstract: A multiprocessor having an input/output controller, a process controller, and a multidimensional array of field programmable gate arrays (FPGAs), each FPGA having its own local memory. The multiprocessor may be programmed to function as a single-instruction, multiple-data (SIMD) parallel processor having a matrix of processing elements (PEs), where each FPGA may be programmed to operate as a submatrix array of PEs. The multiprocessor is especially useful for image processing, pattern recognition, and neural network applications.

312 citations


Book ChapterDOI
07 Oct 1996
TL;DR: A detailed case-study of the first such application of evolution directly to the configuration of a Field Programmable Gate Array (FPGA), resulting in a highly efficient circuit with a richer structure and dynamics and a greater respect for the natural properties of the implementation medium than is usual.
Abstract: ‘Intrinsic’ Hardware Evolution is the use of artificial evolution — such as a Genetic Algorithm — to design an electronic circuit automatically, where each fitness evaluation is the measurement of a circuit's performance when physically instantiated in a real reconfigurable VLSI chip. This paper makes a detailed case-study of the first such application of evolution directly to the configuration of a Field Programmable Gate Array (FPGA). Evolution is allowed to explore beyond the scope of conventional design methods, resulting in a highly efficient circuit with a richer structure and dynamics and a greater respect for the natural properties of the implementation medium than is usual. The application is a simple, but not toy, problem: a tone-discrimination task. Practical details are considered throughout.
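The loop this case-study runs is a conventional generational GA whose fitness function is a physical measurement rather than a simulation. A minimal sketch of such a loop in Python, under stated assumptions: the fitness function below is a runnable stand-in (the paper instead downloads each genome to the FPGA and scores the configured circuit's response to the two tones), and the population size, genome length, and mutation rate are illustrative, not the paper's settings.

```python
import random

def measure_fitness(genome):
    # Stand-in so the sketch runs end to end: in the paper, the genome is
    # loaded onto the reconfigurable chip and fitness is the measured
    # quality of its tone discrimination. Here we just reward ones.
    return sum(genome)

def mutate(genome, rate=1e-3):
    # Flip each bit independently with small probability.
    return [bit ^ (random.random() < rate) for bit in genome]

def evolve(pop_size=50, genome_bits=1800, generations=50):
    # Generational GA over raw configuration bitstrings.
    pop = [[random.randint(0, 1) for _ in range(genome_bits)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=measure_fitness, reverse=True)
        elite = pop[:pop_size // 5]               # keep the fittest fifth
        pop = elite + [mutate(random.choice(elite))
                       for _ in range(pop_size - len(elite))]
    return max(pop, key=measure_fitness)

best = evolve()
```

The point of the paper is what sits inside measure_fitness: because evaluation is a physical measurement, evolution is free to exploit analogue properties of the silicon that no simulator models.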

312 citations


Proceedings ArticleDOI
Chow
17 Apr 1996
TL;DR: A processor architecture called OneChip, which combines a fixed-logic processor core with reconfigurable logic resources into a MIPS-like processor, which eliminates the shortcomings of other custom compute machines.
Abstract: This paper describes a processor architecture called OneChip, which combines a fixed-logic processor core with reconfigurable logic resources. Using the programmable components of this architecture, the performance of speed-critical applications can be improved by customizing OneChip's execution units, or flexibility can be added to the glue logic interfaces of embedded controller applications. OneChip eliminates the shortcomings of other custom compute machines by tightly integrating its reconfigurable resources into a MIPS-like processor. Speedups of close to 50 over strict software implementations on a MIPS R4400 are achievable for computing the DCT.

306 citations


Proceedings Article
01 Jan 1996
TL;DR: OneChip as discussed by the authors is a processor architecture that combines a fixed logic processor core and reconfigurable logic resources, which can be used to improve the performance of speed-critical applications by customizing OneChip's execution units or flexibility can be added to the glue logic interfaces of embedded controller type applications.
Abstract: This thesis describes a processor architecture called OneChip, which combines a fixed-logic processor core and reconfigurable logic resources. Using the variable components of this architecture, the performance of speed-critical applications can be improved by customizing OneChip's execution units, or flexibility can be added to the glue logic interfaces of embedded controller-type applications. This work eliminates the shortcomings of other custom compute machines by tightly integrating the reconfigurable resources into a MIPS-like processor. The details of the core processor, the fixed-to-reconfigurable logic interface, and the actual reconfigurable structures are described. To study OneChip's feasibility, a 32-bit processor as well as several performance enhancement and embedded controller-type applications are implemented on the Transmogrifier-1 field programmable system. It is shown that application speedups of over 40 are achievable. However, the design flexibility introduced with the use of less dense, reconfigurable structures carries an area penalty of no less than 3.5 times the size of the custom silicon design implementation.

298 citations


Patent
28 May 1996
TL;DR: The architecture described in this patent uses a programmable logic structure called an Adaptive Logic Processor (ALP), which is similar to an extendible field programmable gate array (FPGA) and is optimized for the implementation of program-specific pipeline functions.
Abstract: An architecture for information processing devices which allows the construction of low cost, high performance systems for specialized computing applications involving sensor data processing. The reconfigurable processor architecture of the invention uses a programmable logic structure called an Adaptive Logic Processor (ALP). This structure is similar to an extendible field programmable gate array (FPGA) and is optimized for the implementation of program specific pipeline functions, where the function may be changed any number of times during the progress of a computation. A Reconfigurable Pipeline Instruction Control (RPIC) unit is used for loading the pipeline functions into the ALP during the configuration process and coordinating the operations of the ALP with other information processing structures, such as memory, I/O devices, and arithmetic processing units. Multiple components having the reconfigurable architecture of the present invention may be combined to produce high performance parallel processing systems based on the Single Instruction Multiple Data (SIMD) architecture concept.

232 citations


Patent
21 Aug 1996
TL;DR: A technique for configuring arrays of programmable logic cells, including those associated with FPGA devices, through a novel DRAM-based configuration control structure that enables not only "on-the-fly" alterable chip reconfiguration but also, where desired, self-modifying reconfiguration for differing device functionalities, while providing significantly enhanced system performance at low cost.
Abstract: A technique for configuring arrays of programmable logic cells, including those associated with FPGA devices, through a novel DRAM-based configuration control structure that enables not only "on-the-fly" alterable chip and similar device reconfigurations, but, where desired, self-modifying reconfigurations for differing functionalities of the devices, eliminating current serious reconfigurability limitations and related problems, while providing significantly enhanced system performance at low cost. A large amount of memory is available internal to the FPGA and is accessed with a small number of pins such that the reconfiguration time is, for example, four orders of magnitude faster than the traditional approaches and at notably low cost.

220 citations


Proceedings ArticleDOI
15 Feb 1996
TL;DR: Several usage patterns for DPGAs including temporal pipelining, utility functions, multiple function accommodation, and state-dependent logic are examined, offering insight into the application and technology space where DPGA-style reuse techniques are most beneficial.
Abstract: Dynamically Programmable Gate Arrays (DPGAs) are programmable arrays which allow the strategic reuse of limited resources. In so doing, DPGAs promise greater capacity, and in some cases higher performance, than conventional programmable device architectures where all array resources are dedicated to a single function for an entire operational epoch. This paper examines several usage patterns for DPGAs including temporal pipelining, utility functions, multiple function accommodation, and state-dependent logic. In the process, it offers insight into the application and technology space where DPGA-style reuse techniques are most beneficial.

Journal ArticleDOI
TL;DR: This article summarizes the research results on combinational logic synthesis for LUT-based FPGAs under a coherent framework; it classifies the basic techniques into two categories, namely logic optimization and technology mapping, and describes the existing algorithms and systems in terms of how they use the classified basic techniques.
Abstract: The increasing popularity of field-programmable gate array (FPGA) technology has generated a great deal of interest in the algorithmic study and tool development for FPGA-specific design automation problems. The most widely used FPGAs are LUT-based FPGAs, in which the basic logic element is a K-input, one-output lookup table (LUT) that can implement any Boolean function of up to K variables. This unique feature of the LUT has brought new challenges to logic synthesis and optimization, resulting in many new techniques reported in recent years. This article summarizes the research results on combinational logic synthesis for LUT-based FPGAs under a coherent framework. These results were dispersed in various conference proceedings and journals and under various formulations and terminologies. We first present general problem formulations, various optimization objectives and measurements, then focus on a set of commonly used basic concepts and techniques, and finally summarize existing synthesis algorithms and systems. We classify and summarize the basic techniques into two categories, namely, logic optimization and technology mapping, and describe the existing algorithms and systems in terms of how they use the classified basic techniques. A comprehensive list of references is compiled in the attached bibliography.
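As background for the LUT model this survey uses throughout: a K-input LUT is simply a 2^K-entry truth table addressed by its K inputs, which is why it can implement any Boolean function of up to K variables. A minimal illustrative sketch in Python (our own illustration; the names are not from the article):

```python
# Illustrative model of a K-input lookup table (LUT): the "configuration"
# is a 2^K-bit truth table, and evaluation is a single indexed read.
def make_lut(truth_table):
    """truth_table: list of 2^K output bits, indexed by the K input bits."""
    def lut(*inputs):
        index = 0
        for bit in inputs:             # pack the input bits into a table index
            index = (index << 1) | bit
        return truth_table[index]
    return lut

# A 4-LUT programmed as a 4-input XOR: any other function of up to
# 4 variables fits in the same 16-entry table; only the contents change.
xor4 = make_lut([bin(i).count("1") & 1 for i in range(16)])
assert xor4(1, 0, 1, 1) == 1
```

Technology mapping, in these terms, is the problem of covering a Boolean network with such tables so that some cost (table count or depth) is minimized.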

Book ChapterDOI
23 Sep 1996
TL;DR: The issues involved in presenting a software-oriented user with a larger virtual hardware resource that is implemented using smaller physical FPGA hardware are explored.
Abstract: Computer operating systems relieve a user of the responsibility for managing physical resources, such as processors, memory and input/output devices. The evolution of FPGA technology means that a new resource is available — it is accessed like memory, but can behave like a flexible processor or input/output device. There is a role for an operating system in making this resource easy to use, by managing it on behalf of a user. This paper explores the issues involved in such management, in particular the problems involved in presenting a software-oriented user with a larger virtual hardware resource that is implemented using smaller physical FPGA hardware. A prototype operating system that demonstrates operational solutions to these problems, using the Xilinx XC6200 FPGA, is described.

01 Jan 1996
TL;DR: DPGAs are less computationally dense than FPGAs, but allow most applications to achieve greater yielded computational density; there is good reason to believe that much less than 200 bits can be used to describe each 4-LUT computation, making even greater densities achievable in practice.
Abstract: Field-Programmable Gate Arrays are interesting, general-purpose computational devices because (1) they have high computational density and (2) they have fine-grained control of their computational resources, since each gate is independently controlled. The former provides them with a potential 10× advantage in raw peak performance density versus modern microprocessors. The latter can afford a 32× advantage on random bit-level computations. Nonetheless, typical FPGA usage seldom extracts this full density advantage. DPGAs are less computationally dense than FPGAs, but allow most applications to achieve greater yielded computational density. The key to unraveling this potential paradox lies in distinguishing instruction density from active computing density. Since the storage space for a single instruction is inherently smaller than the computational element it controls, packing several instructions per computational unit increases the aggregate instruction capacity of the device without a significant reduction in computational density. The number of different instructions executed per computational task often limits the effective computational density. As a result, DPGAs can meet the throughput requirements of many computing tasks with 3-4× less area than conventional FPGAs.

1 Computational Area

"How big is a computation?" The design goal for "general-purpose" computing devices is to develop a device which can: implement desired computational tasks; perform the computation at the desired latency or throughput; and realize the implementation at minimal cost, usually silicon area. As device designers we are concerned with the area which a computational element occupies and its latency or throughput. We know, for example, that a four-input lookup table (4-LUT) occupies roughly 640Kλ² (e.g. 0.16mm² in a 1μ CMOS process, λ = 0.5μ) [1] [9]. Thus, we get a 4-LUT density of 1.6 4-LUTs per one million λ² of area. At the same time, we notice that the descriptive density of 4-LUT designs can be much greater than the 4-LUT density just observed. That is, the LUT configuration is small compared to the network area, so that an idle LUT can occupy much less space than an active one. For illustrative purposes, let us assume that it takes 200 bits to describe the configuration for one 4-LUT, which is typical of commercial FPGA devices. A 64Mb DRAM would hold 335K such configurations. Since a typical 64Mb DRAM is 6Gλ², we can pack 56 4-LUT descriptions per one million λ² of area, or about 35× the density at which we can pack 4-LUTs. In fact, there is good reason to believe that we can use much less than 200 bits to describe each 4-LUT computation [3], making even greater densities achievable in practice. Returning to our original question, we see that there are two components which combine to define the requisite area for our general-purpose device:

1. Nd – the total number of 4-LUTs in the design – the descriptive complexity
2. Na – the total number of 4-LUTs which must be evaluated simultaneously in order to achieve the desired task time or computational throughput – the parallelism required to achieve the temporal requirements

[Figure 1: DPGA LUT and Interconnect Primitives]

In an ideal packing, a computation requiring Na active compute elements and Nd total 4-LUTs can be implemented in area

A_compute = Na · A_LUT + Nd · A_config_mem    (1)

In practice, a perfect packing is difficult to achieve due to connectivity and dependency requirements, such that N′d > Nd configuration memories are required.
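The density arithmetic above is easy to check. A short sketch reproducing the abstract's numbers, using only the constants it quotes (640Kλ² per 4-LUT, 200 configuration bits per LUT, a 64Mb DRAM occupying 6Gλ²):

```python
# Reproduce the density arithmetic quoted in the abstract (areas in lambda^2).
A_LUT = 640e3              # area of one active 4-LUT
CONFIG_BITS = 200          # assumed bits to describe one 4-LUT configuration
DRAM_BITS = 64 * 2**20     # a 64Mb DRAM
DRAM_AREA = 6e9            # its typical area

lut_density = 1e6 / A_LUT                          # ~1.6 4-LUTs per million lambda^2
configs_held = DRAM_BITS // CONFIG_BITS            # ~335K configurations
config_density = configs_held / (DRAM_AREA / 1e6)  # ~56 descriptions per million lambda^2
print(lut_density, configs_held, config_density,
      config_density / lut_density)                # density ratio, ~35x

# Ideal-packing area model, equation (1):
def a_compute(n_active, n_total, a_config_mem):
    return n_active * A_LUT + n_total * a_config_mem
```

Running it gives 1.5625 LUTs and 55.9 configurations per million λ², a ratio of about 35.8, matching the rounded figures in the text.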

Patent
19 Apr 1996
TL;DR: An application specific field programmable gate array (ASFPGA) as discussed by the authors includes at least two fixed functional units in a single IC chip and can be configured to perform all the functions specified for a particular ASIC design.
Abstract: An application specific field programmable gate array ("ASFPGA") includes at least two fixed functional units in a single IC chip. Depending upon a specific application for the ASFPGA, the fixed functional units may include one or more bus interfaces, event timers, an interrupt controller, a Direct Memory Access ("DMA") controller, system timers, a real-time clock, a Random Access Memory ("RAM"), a clock synthesizer, a RAM Digital-to-Analog Converter ("DAC"), a display interface, a register file, a compressed image encoder/decoder ("CODEC"), or similar functional units. The ASFPGA also includes a general purpose field programmable gate array ("FPGA"). The FPGA is configurable to effect a specific digital logic circuit interconnection between fixed functional units. After the FPGA has been configured, the fixed functional units together with the FPGA perform all the functions specified for a particular ASIC design.

Patent
23 Jul 1996
TL;DR: In this paper, the authors propose a processor-like device capable of performing the computations necessary to reconfigure the FPGAs in the array in accordance with the next algorithm to be performed.
Abstract: An array of FPGAs changes its configuration successively during the performance of successive user-defined algorithms. Adjacent FPGAs are connected through external field programmable interconnection devices (FPINs) or cross-bar switches. The array includes a processor-like device capable of performing the computations necessary to reconfigure the FPGAs in the array in accordance with the next algorithm to be performed. Preferably, this processor-like device is itself a "control" array of interconnected FPGAs configured to emulate a selected microprocessor architecture, which accepts user-defined primitives corresponding to an algorithm to be performed or a logic architecture to be emulated and reconfigures the FPGAs and the FPINs accordingly.

Proceedings ArticleDOI
28 Apr 1996
TL;DR: A new approach for Field Programmable Gate Array (FPGA) testing is presented that exploits the reprogrammability of FPGAs to create Built-In Self-Test (BIST) logic only during off-line test, achieving BIST without any area overhead or performance penalties to the system function implemented by the FPGA.
Abstract: We present a new approach for Field Programmable Gate Array (FPGA) testing that exploits the reprogrammability of FPGAs to create Built-In Self-Test (BIST) logic only during off-line test. As a result, BIST is achieved without any area overhead or performance penalties to the system function implemented by the FPGA. Our approach is applicable to all levels of testing, achieves maximal fault coverage, and all tests are applied at-speed. We describe the BIST architecture used to test all the programmable logic blocks in an FPGA and the configurations required to implement our approach using a commercial FPGA. We also discuss implementation problems caused by CAD tool limitations and limited architectural resources, and we describe techniques which overcome these limitations.

Proceedings ArticleDOI
04 Nov 1996
TL;DR: The design and implementation of a tactile sensor system, the sensor suit, that covers the entire body of a robot, and its application with a full-body humanoid, are presented.
Abstract: We present the design and implementation of a tactile sensor system, the sensor suit, that covers the entire body of a robot. The sensor suit is designed to be soft and flexible and to have a large number of sensing regions. We have built the sensor suit using electrically conductive fabric and string. The current version of the sensor suit has 192 sensing regions, each of which works as a binary switch. All of the signals from the sensor suit are gathered and superimposed on a visual image of the robot. The video multiplexer for the sensor signals is built on a field programmable gate array set. The construction of the sensor suit system and its application with a full-body humanoid are presented.

Proceedings ArticleDOI
Louca, Cook, Johnson
17 Apr 1996
TL;DR: This work explores FPGA implementations of addition and multiplication for IEEE single precision floating-point numbers; prototypes have been implemented on Altera FLEX8000s, and peak rates of 7 MFlops for 32-bit addition and 2.3 MFlops for 32-bit multiplication have been obtained.
Abstract: Floating point operations are hard to implement on FPGAs because of the complexity of their algorithms. On the other hand, many scientific problems require floating point arithmetic with high levels of accuracy in their calculations. Therefore, we have explored FPGA implementations of addition and multiplication for IEEE single precision floating-point numbers. Customizations were performed where this was possible in order to save chip area, or get the most out of our prototype board. The implementations tradeoff area and speed for accuracy. The adder is a bit-parallel adder, and the multiplier is a digit-serial multiplier. Prototypes have been implemented on Altera FLEX8000s, and peak rates of 7 MFlops for 32-bit addition and 2.3 MFlops for 32-bit multiplication have been obtained.
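For context on the multiplier design point: a digit-serial multiplier consumes one narrow digit of an operand per clock, so the datapath is only a digit wide, trading cycles for area. A behavioural sketch of the idea (the digit and operand widths here are our illustration, not the paper's exact design):

```python
# Behavioural model of digit-serial multiplication: one d-bit digit of the
# multiplier is consumed per "cycle", accumulating shifted partial products,
# so the hardware datapath is d bits wide rather than full width.
def digit_serial_mul(a, b, digit_bits=4, width=24):
    mask = (1 << digit_bits) - 1
    acc = 0
    for cycle in range(width // digit_bits):
        digit = (b >> (cycle * digit_bits)) & mask  # next digit of b
        acc += (a * digit) << (cycle * digit_bits)  # shifted partial product
    return acc

# Matches ordinary multiplication for operands that fit in `width` bits,
# e.g. the 24-bit significands of IEEE single precision.
assert digit_serial_mul(0xABCDE, 0x12345) == 0xABCDE * 0x12345
```

This is the classic area-for-speed tradeoff the abstract mentions: the serial multiplier needs many cycles per product but only a small slice of FPGA logic, which is why its peak rate is lower than the bit-parallel adder's.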

Journal ArticleDOI
TL;DR: This paper reports on an investigation of new CAD tools and the development of a new simulation technique, called dynamic circuit switching (DCS), for dynamically reconfigurable systems.
Abstract: The emergence of static memory-based field programmable gate arrays (FPGAs) that are capable of being dynamically reconfigured, i.e., partially reconfigured while active, has initiated research into new methods of digital systems synthesis. At present, however, there are virtually no specific CAD tools to support the design and investigation of digital systems using dynamic reconfiguration. This paper reports on an investigation of new CAD tools and the development of a new simulation technique, called dynamic circuit switching (DCS), for dynamically reconfigurable systems. The principles of DCS are presented and examples of its application are described.

Proceedings ArticleDOI
Gregory Ray Goslin
TL;DR: The benefits of using an FPGA as a DSP co-processor, as well as a stand-alone DSP engine, are described in detail.
Abstract: FPGAs have become a competitive alternative for high performance DSP applications, previously dominated by general purpose DSP and ASIC devices. This paper describes the benefits of using an FPGA as a DSP co-processor, as well as a stand-alone DSP engine. Two case studies, a Viterbi decoder and a 16-tap FIR filter, are used to illustrate how the FPGA can radically accelerate system performance and reduce component count in a DSP application. Finally, different implementation techniques for reducing hardware requirements and increasing performance are described in detail.

Journal ArticleDOI
TL;DR: An efficient algorithm for technology mapping targeting table look-up (TLU) blocks capable of minimizing either the number of TLUs used or the depth of the produced circuit is proposed.
Abstract: This paper proposes an efficient algorithm for technology mapping targeting table look-up (TLU) blocks. It is capable of minimizing either the number of TLUs used or the depth of the produced circuit. Our approach consists of two steps. First, a network of super nodes is created. Next, the Boolean function of each super node, with an appropriate don't-care set, is decomposed into a network of TLUs. To minimize the circuit's depth, several rules are applied to the critical portion of the mapped circuit.
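One way to picture the decomposition step: recursively Shannon-expand the function until every node has at most K inputs, at which point each node is one TLU. A simplified sketch under that reading (the authors' actual algorithm works on super nodes with don't-care sets, which this toy version ignores):

```python
import random

# Toy decomposition of a Boolean function (truth table over n variables,
# with x1 as the most significant index bit) into K-input TLUs via Shannon
# expansion: f = x1 ? f1 : f0. Each leaf fits in one TLU, and each
# expansion adds a 3-input multiplexer node (itself one TLU for K >= 3).
def decompose(truth_table, n, first_var=1, K=4):
    if n <= K:
        return {"type": "tlu", "inputs": n, "table": truth_table}
    half = len(truth_table) // 2
    f0 = decompose(truth_table[:half], n - 1, first_var + 1, K)  # x1 = 0 cofactor
    f1 = decompose(truth_table[half:], n - 1, first_var + 1, K)  # x1 = 1 cofactor
    return {"type": "mux", "select": f"x{first_var}", "lo": f0, "hi": f1}

# A 6-variable function becomes four 4-input leaves plus three mux nodes.
tt = [random.randint(0, 1) for _ in range(2**6)]
network = decompose(tt, 6)
```

Blind expansion like this ignores the structure the paper exploits; choosing good decomposition variables and using don't-cares is what lets the real algorithm reduce TLU count and depth.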

Book ChapterDOI
22 Sep 1996
TL;DR: This paper describes function-level Evolvable Hardware: hardware built on programmable logic devices whose architecture can be reconfigured through genetic learning to adapt to new, unknown environments in real time.
Abstract: This paper describes function-level Evolvable Hardware (EHW). EHW is hardware which is built on programmable logic devices (e.g. PLDs and FPGAs) and whose architecture can be reconfigured through genetic learning to adapt to new, unknown environments in real time. It is demonstrated that function-level hardware evolution can attain much higher performance than gate-level evolution in neural network applications (e.g. the two-spiral problem). The VLSI architecture of a function-based FPGA dedicated to function-level evolution is also described.

Proceedings ArticleDOI
28 Apr 1996
TL;DR: This paper presents a new general technique for testing field programmable gate arrays (FPGAs) by fully exploiting their programmable and configurable characteristics by introducing a hybrid fault model based on a physical and behavioral characterization.
Abstract: This paper presents a new general technique for testing field programmable gate arrays (FPGAs) by fully exploiting their programmable and configurable characteristics. A hybrid fault model is introduced based on a physical and behavioral characterization; this permits the detection of a single fault as either a stuck-at or a functional fault. A general approach, which regards testing as an application for the reconfigurable FPGA, is then proposed. It is shown that different arrangements of disjoint one-dimensional arrays with unilateral horizontal connections and common vertical input lines provide a very good solution. A further feature that is considered for array testing is the relation between the configuration of the logic blocks and the number of I/O pins in the chip. As an example, the proposed approach is applied to testing the Xilinx 4000 family of FPGAs.

Patent
13 Nov 1996
TL;DR: Structures using SAVE STATE bits to selectively save or restore FPGA memory elements during reconfiguration, including flip flops, lookup tables, and blocks of RAM, are described.
Abstract: Structures for saving states of memory cells in an FPGA while the FPGA is being configured are shown. Structures for saving flip flop states, lookup table configurations, and block RAM states are specifically described. Structures are described having (1) a SAVE STATE bit for saving the state of each flip flop, each lookup table RAM, and each block RAM, with which each storage unit can be selectively restored; and (2) a SAVE STATE bit for each row (column) of logic blocks in the FPGA, with which it is possible, using a single SAVE STATE signal, to selectively save or restore every memory element in the row, possibly including flip flops, lookup tables, and blocks of RAM. Several structures and methods for providing the SAVE STATE signal are also described. These include: (1) bits in the bitstream of a first configuration which indicate which memory units of the first configuration are to be retained during a second configuration; (2) bits at the beginning of the bitstream of a second configuration which indicate which memory units of the first configuration are to be retained during the second configuration; and (3) a circuit, loadable during operation of a first configuration, which indicates which memory units of the first configuration are to be retained during a second configuration.

Patent
John E. McGowan, William C. Plants, Joel Landry, Sinan Kaptanoglu, Warren K. Miller
16 Feb 1996
TL;DR: A field programmable gate array architecture comprising horizontal and vertical routing channels, each including a plurality of interconnect conductors, with an array of rows and columns of logic function modules, each having at least one input and one output, superimposed on the routing channels.
Abstract: A field programmable gate array architecture comprises a plurality of horizontal and vertical routing channels each including a plurality of interconnect conductors. Some interconnect conductors are segmented by user-programmable interconnect elements, and some horizontal and vertical interconnect conductors are connectable by user-programmable interconnect elements located at selected intersections between them. An array of rows and columns of logic function modules each having at least one input and one output is superimposed on the routing channels. The inputs and outputs of the logic function modules are connectable to ones of the interconnect conductors in either or both of the horizontal and vertical routing channels. At least one column of random access memory blocks is disposed in the array. Each random access memory block spans a distance of more than one row of the array such that more than one horizontal routing channel passes therethrough and is connectable to adjacent logic function modules on either side thereof. Each of the random access memory blocks has address inputs, control inputs, data inputs, and data outputs. User-programmable interconnect elements are connected between the address inputs, control inputs, data inputs, and data outputs of the random access memory blocks and selected ones of the interconnect conductors in the horizontal routing channels passing therethrough. Programming circuitry is provided for programming selected ones of the user-programmable interconnect conductors to connect the inputs and outputs of the logic function modules to one another and to the address inputs, control inputs, data inputs, and data outputs of the random access memory blocks.

Proceedings ArticleDOI
12 Feb 1996
TL;DR: This paper presents the architecture and implementation of the Virtual Image Processor (VIP), an SIMD multiprocessor built with large FPGAs that is well suited for image processing, pattern recognition, and neural network algorithms.
Abstract: We present in this paper the architecture and implementation of the Virtual Image Processor (VIP), which is an SIMD multiprocessor built with large FPGAs. The SIMD architecture, together with a 2D torus connection topology, is well suited for image processing, pattern recognition and neural network algorithms. The VIP board can be programmed on-line at the logic level, allowing optimal hardware dedication to any given algorithm.

Proceedings ArticleDOI
15 Feb 1996
TL;DR: Preliminary results indicate that compared to LUT-based FPGAs the Hybrid offers savings of more than a factor of two in terms of chip area.
Abstract: This paper proposes a new field-programmable architecture that is a combination of two existing technologies: Field Programmable Gate Arrays (FPGAs) based on LookUp Tables (LUTs), and Complex Programmable Logic Devices based on PALs/PLAs. The methodology used for development of the new architecture, called Hybrid FPGA, is based on analysis of a large set of benchmark circuits, in which we determine what types of logic resources best match the needs of the circuits. The proposed Hybrid FPGA is evaluated by manually technology mapping a set of circuits into the new architecture and estimating the total chip area needed for each circuit, compared to the area that would be required if only LUTs were available. Preliminary results indicate that compared to LUT-based FPGAs the Hybrid offers savings of more than a factor of two in terms of chip area.

Proceedings ArticleDOI
15 Feb 1996
TL;DR: RASP as discussed by the authors is a general synthesis system for SRAM-based FPGAs, which consists of a core with a set of synthesis and optimization algorithms for technology independent logic synthesis and technology mapping.
Abstract: In this paper, we present a general synthesis system for SRAM-based FPGAs named RASP. RASP consists of a core with a set of synthesis and optimization algorithms for technology independent logic synthesis and technology mapping for generating generic look-up tables (LUTs), together with a set of architecture-specific technology mapping routines to map the generic LUT network to programmable logic blocks (PLBs) for various SRAM-based FPGA architectures. Via a set of design representation converter routines, these architecture-independent and dependent synthesis algorithms are easily linked, and the entire system is seamlessly integrated into the design flow of commercial FPGA design systems. As a result, RASP can produce highly optimized designs for various SRAM-based FPGA architectures, and can be quickly adapted for new SRAM-based FPGA architectures. We compare RASP performance with that of several commercial synthesis systems on the MCNC logic synthesis benchmarks and a video compressor/decompressor. For almost all cases, RASP produces mapping solutions with significantly smaller critical path delay after place and route than current commercial synthesis systems.

Patent
F. Erich Goetting
03 Jun 1996
TL;DR: In this paper, an FPGA combines antifuse and static memory cell programming technologies, and configuration control units are used for applying programming voltages to antifuses in the interconnect structure, storing configuration information which configures the cell during normal operation, and allowing a user to capture the status of all signals on interconnect lines and shift these out of the chip to be examined by the user.
Abstract: An FPGA combines antifuse and static memory cell programming technologies. Static memory cells determine the functions of the FPGA logic cells. Antifuses establish routing through the interconnect structure. Associated with each logic cell are configuration control units which store configuration information which configures the cell during normal operation. Each configuration control unit includes an SRAM memory cell. For each input terminal of a logic cell, an SRAM configuration control unit selects whether an input signal is inverted or not. Other SRAM cells control whether a signal is cascaded into the logic cell from an adjacent cell, whether the cell operates as a combinational element or a latch, and whether the cell performs NOR or NAND functions. In a preferred embodiment, the configuration control units are used for three purposes: first, for applying programming voltages to antifuses in the interconnect structure; second, for storing configuration information which configures the cell during normal operation; and third, for allowing a user to capture the status of all signals on interconnect lines and shift these out of the chip to be examined by the user.