
Showing papers in "IBM Journal of Research and Development in 2000"


Journal ArticleDOI
TL;DR: Two simple, but representative, models of bistable devices are subjected to a more detailed analysis of switching kinetics to yield the relationship between speed and energy dissipation, and to estimate the effects of errors induced by thermal fluctuations.
Abstract: It is argued that computing machines inevitably involve devices which perform logical functions that do not have a single-valued inverse. This logical irreversibility is associated with physical irreversibility and requires a minimal heat generation, per machine cycle, typically of the order of kT for each irreversible function. This dissipation serves the purpose of standardizing signals and making them independent of their exact logical history. Two simple, but representative, models of bistable devices are subjected to a more detailed analysis of switching kinetics to yield the relationship between speed and energy dissipation, and to estimate the effects of errors induced by thermal fluctuations.
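
For reference, the minimal dissipation "of the order of kT" argued for above is usually written as Landauer's bound, the heat that must be dissipated per irreversible one-bit operation:

```latex
% Landauer's bound: minimum heat dissipated per irreversible (one-bit) logical operation,
% with k_B Boltzmann's constant and T the absolute temperature.
E_{\min} \;=\; k_{B}\,T\,\ln 2
```

At room temperature (T ≈ 300 K) this is roughly 3 × 10⁻²¹ J per operation.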

3,629 citations


Journal ArticleDOI
TL;DR: Enough work has been done to verify the fact that a computer can be programmed so that it will learn to play a better game of checkers than can be played by the person who wrote the program.
Abstract: Two machine-learning procedures have been investigated in some detail using the game of checkers. Enough work has been done to verify the fact that a computer can be programmed so that it will learn to play a better game of checkers than can be played by the person who wrote the program. Furthermore, it can learn to do this in a remarkably short period of time (8 or 10 hours of machine-playing time) when given only the rules of the game, a sense of direction, and a redundant and incomplete list of parameters which are thought to have something to do with the game, but whose correct signs and relative weights are unknown and unspecified. The principles of machine learning verified by these experiments are, of course, applicable to many other situations.
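
The scheme the abstract alludes to, a scoring polynomial over hand-chosen board features whose signs and weights are tuned from experience, can be illustrated with a minimal sketch. The feature names, learning rate, and update rule below are hypothetical and are not Samuel's actual procedure:

```python
import random

# Hypothetical board features; Samuel's program used a longer, redundant list.
FEATURES = ["piece_advantage", "king_advantage", "mobility", "center_control"]

def evaluate(weights, features):
    """Score a position as a weighted sum of hand-chosen board features."""
    return sum(weights[f] * features[f] for f in FEATURES)

def update_weights(weights, features, predicted, backed_up, lr=0.01):
    """Nudge each weight so the static score moves toward the value found by deeper search."""
    error = backed_up - predicted
    for f in FEATURES:
        weights[f] += lr * error * features[f]
    return weights

# Toy usage with made-up feature values for one position.
weights = {f: random.uniform(-1, 1) for f in FEATURES}
features = {"piece_advantage": 2, "king_advantage": 0, "mobility": 5, "center_control": 1}
predicted = evaluate(weights, features)
weights = update_weights(weights, features, predicted, backed_up=1.5)
print(weights)
```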

1,191 citations


Journal ArticleDOI
TL;DR: An overview of the research effort on volume holographic digital data storage is presented, highlighting new insights gained in the design and operation of working storage platforms, novel optical components and techniques, data coding and signal processing algorithms, systems tradeoffs, materials testing and tradeoffs, and photon-gated storage materials.
Abstract: We present an overview of our research effort on volume holographic digital data storage. Innovations, developments, and new insights gained in the design and operation of working storage platforms, novel optical components and techniques, data coding and signal processing algorithms, systems tradeoffs, materials testing and tradeoffs, and photon-gated storage materials are summarized.

773 citations


Journal ArticleDOI
TL;DR: In addition to data storage in polymers or other media, and not excluding magnetics, this work envisions areas in nanoscale science and technology such as lithography, high-speed/large-scale imaging, molecular and atomic manipulation, and many others in which Millipede may open up new perspectives and opportunities.
Abstract: We report on a new atomic force microscope (AFM)-based data storage concept called the “Millipede” that has a potentially ultrahigh density, terabit capacity, small form factor, and high data rate. Its potential for ultrahigh storage density has been demonstrated by a new thermomechanical local-probe technique to store and read back data in very thin polymer films. With this new technique, 30–40-nm-sized bit indentations of similar pitch size have been made by a single cantilever/tip in a thin (50-nm) polymethylmethacrylate (PMMA) layer, resulting in a data storage density of 400–500 Gb/in². High data rates are achieved by parallel operation of large two-dimensional (2D) AFM arrays that have been batch-fabricated by silicon surface-micromachining techniques. The very large scale integration (VLSI) of micro/nanomechanical devices (cantilevers/tips) on a single chip leads to the largest and densest 2D array of 32 × 32 (1024) AFM cantilevers with integrated write/read storage functionality ever built. Time-multiplexed electronics control the write/read storage cycles for parallel operation of the Millipede array chip. Initial areal densities of 100–200 Gb/in² have been achieved with the 32 × 32 array chip, which has potential for further improvements. In addition to data storage in polymers or other media, and not excluding magnetics, we envision areas in nanoscale science and technology such as lithography, high-speed/large-scale imaging, molecular and atomic manipulation, and many others in which Millipede may open up new perspectives and opportunities.
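
The quoted density follows directly from the bit pitch; a back-of-the-envelope check, assuming a square grid at the 40-nm end of the quoted pitch range:

```python
# Back-of-the-envelope areal density for a square grid of indentations at 40-nm pitch.
NM_PER_INCH = 25.4e6                     # 1 inch = 25.4 mm = 25.4e6 nm
pitch_nm = 40                            # bit spacing from the abstract (30-40 nm)
bits_per_inch = NM_PER_INCH / pitch_nm
areal_density_b_per_in2 = bits_per_inch ** 2
print(f"{areal_density_b_per_in2 / 1e9:.0f} Gb/in^2")   # ~400 Gb/in^2, consistent with the quoted 400-500
```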

607 citations


Journal ArticleDOI
Charles H. Bennett1
TL;DR: The history of the thermodynamics of information processing is reviewed, beginning with the paradox of Maxwell's demon, continuing through the efforts of Szilard, Brillouin, and others to demonstrate a thermodynamic cost of information acquisition, and ending with a brief survey of recent work on quantum reversible computation.
Abstract: We review the history of the thermodynamics of information processing, beginning with the paradox of Maxwell's demon; continuing through the efforts of Szilard, Brillouin, and others to demonstrate a thermodynamic cost of information acquisition; the discovery by Landauer of the thermodynamic cost of information destruction; the development of the theory of and classical models for reversible computation; and ending with a brief survey of recent work on quantum reversible computation.

309 citations


Journal ArticleDOI
David A. Thompson1, J. S. Best1
TL;DR: The evolutionary path of magnetic data storage is reviewed and the physical phenomena that will prevent the use of those scaling processes which have served us in the past are examined, finding that the first problem will arise from the storage medium, whose grain size cannot be scaled much below a diameter of ten nanometers without thermal self-erasure.
Abstract: In this paper, we review the evolutionary path of magnetic data storage and examine the physical phenomena that will prevent us from continuing the use of those scaling processes which have served us in the past. It is concluded that the first problem will arise from the storage medium, whose grain size cannot be scaled much below a diameter of ten nanometers without thermal self-erasure. Other problems will involve head-to-disk spacings that approach atomic dimensions, and switching-speed limitations in the head and medium. It is likely that the rate of progress in areal density will decrease substantially as we develop drives with ten to a hundred times current areal densities. Beyond that, the future of magnetic storage technology is unclear. However, there are no alternative technologies which show promise for replacing hard disk storage in the next ten years.
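
The thermal self-erasure argument is usually stated in terms of a grain's anisotropy energy barrier relative to thermal energy; a commonly cited retention criterion (a standard rule of thumb, not a figure quoted from this paper) is:

```latex
% Thermal stability factor for a magnetic grain:
% K_u = uniaxial anisotropy energy density, V = grain volume, k_B T = thermal energy.
\frac{K_{u}\,V}{k_{B}\,T} \;\gtrsim\; 40\text{--}60
```

Since V shrinks with the cube of the grain diameter, scaling grains much below roughly ten nanometers drives this ratio down toward the superparamagnetic regime unless the anisotropy is raised.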

303 citations


Journal ArticleDOI
TL;DR: The architecture* of the newly announced IBM System/360 features four innovations: An approach to storage which permits and exploits very large capacities, hierarchies of speeds, read-only storage for microprogram control, flexible storage protection, and simple program relocation.
Abstract: The architecture* of the newly announced IBM System/360 features four innovations: 1. An approach to storage which permits and exploits very large capacities, hierarchies of speeds, read-only storage for microprogram control, flexible storage protection, and simple program relocation. 2. An input/output system offering new degrees of concurrent operation, compatible channel operation, data rates approaching 5,000,000 characters/second, integrated design of hardware and software, a new low-cost, multiple-channel package sharing main-frame hardware, new provisions for device status information, and a standard channel interface between central processing unit and input/output devices. 3. A truly general-purpose machine organization offering new supervisory facilities, powerful logical processing operations, and a wide variety of data formats. 4. Strict upward and downward machine-language compatibility over a line of six models having a performance range factor of 50. This paper discusses in detail the objectives of the design and the rationale for the main features of the architecture. Emphasis is given to the problems raised by the need for compatibility among central processing units of various sizes and by the conflicting demands of commercial, scientific, real-time, and logical information processing. A tabular summary of the architecture is shown in the Appendices.

169 citations


Journal ArticleDOI
Enrico Clementi1
TL;DR: The present status of ab initio computations for atomic and molecular wave functions is analyzed in this paper, with special emphasis on the work done at the IBM Research Laboratory, San Jose.
Abstract: The present status of ab initio computations for atomic and molecular wave functions is analyzed in this paper, with special emphasis on the work done at the IBM Research Laboratory, San Jose. The Roothaan-Hartree-Fock method has been described in detail for atomic systems. A systematic tabulation of atomic Hartree-Fock functions has been made available in an extended supplement to this paper. Techniques for computing many-center, two-electron matrix elements have been discussed for Slater or Gaussian basis sets. It is concluded that the two possibilities are comparable in efficiency. We have advanced a few suggestions for the extension of the self-consistent field technique to macromolecules. The validity of the suggestions has not been tested. Following the Bethe and Salpeter formalism, the relativistic correction has been discussed and illustrated with numerical results for closed-shell atoms. A brief analysis of the relativistic correction for molecular systems shows that the relativistic effects cannot be neglected in ionic systems containing third-row atoms. The correlation energy is discussed from an experimental starting point. The relativistic and Hartree-Fock energies are used for determining the correlation energy for the elements of the first three periods of the atomic system. A preliminary analysis of the data brings about a "simple pairing" model. Data from the third period force us to consider the "simple pairing" model as a first-order approximation to the "complex pairing" model. The latter model is compared with the geminals method and limitations of the latter are pointed out. A semiempirical model, where use is made of a pseudopotential that represents a Coulomb hole, is advanced and preliminary results are presented. This model gives some hope for the practical formulation of a Coulomb-Hartree-Fock technique where the correlation effects are accounted for and the one-particle approximation is retained.
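
For readers unfamiliar with the method, the Roothaan-Hartree-Fock procedure mentioned above expands each orbital in a finite basis (Slater or Gaussian) and solves the resulting generalized matrix eigenvalue problem self-consistently:

```latex
% Roothaan-Hartree-Fock equations in a finite basis:
% F = Fock matrix, S = basis overlap matrix, C = orbital expansion coefficients,
% epsilon = diagonal matrix of orbital energies. F depends on C, so the system
% is iterated to self-consistency (the self-consistent field, SCF).
F\,C \;=\; S\,C\,\varepsilon
```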

161 citations


Journal ArticleDOI
Thomas N. Theis1
TL;DR: A simple model is developed and used to estimate future wiring requirements and to examine the value of further innovations in materials and architecture, and it is found that wiring need not be a performance limiter for at least another decade.
Abstract: Continuing advances in interconnection technology are seen as essential to continued improvements in integrated circuit performance. The recent introduction of copper metallization, dual-damascene processing, and fully articulated hierarchical wiring structures, along with the imminent introduction of low-dielectric-constant insulating materials, indicates an accelerating pace of innovation. Nevertheless, some authors have argued that such innovations will sustain chip-level performance improvements for only another generation or two. In light of this pessimism, current trends and probable paths in the future evolution of interconnection technology are reviewed. A simple model is developed and used to estimate future wiring requirements and to examine the value of further innovations in materials and architecture. As long as current trends continue, with memory arrays filling an increasing fraction of the total area of high-performance microprocessor chips, wiring need not be a performance limiter for at least another decade. Alternative approaches, such as optical interconnections on chip, have little to offer while the incremental elaboration of the traditional wiring systems is still rapidly advancing.
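
The motivation for copper and low-dielectric-constant insulators can be seen from the simplest wire-delay estimate; treating a wire of length L as a lumped RC with a parallel-plate capacitance (fringing fields and refined distributed-RC prefactors are ignored in this sketch):

```latex
% Lumped RC delay of an on-chip wire:
% rho = conductor resistivity, epsilon = insulator permittivity,
% L = wire length, w = wire width, t = wire thickness, h = dielectric thickness.
\tau \;\approx\; R\,C \;=\; \frac{\rho\,L}{w\,t}\cdot\frac{\epsilon\,w\,L}{h}
\;=\; \frac{\rho\,\epsilon\,L^{2}}{t\,h}
```

The width cancels, the delay grows as L², and the material levers are ρ (hence copper) and ε (hence low-k dielectrics), consistent with the innovations listed above.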

152 citations


Journal ArticleDOI
John A. Darringer1, Daniel Brand1, John V. Gerbi1, William H. Joyner1, Louise H. Trevillyan1
TL;DR: The evolution of the Logic Synthesis System is described from an experimental tool to a production system for the synthesis of masterslice chip implementations and the primary reasons for this success are the use of local transformations to simplify logic representations at several levels of abstraction.
Abstract: For some time we have been exploring methods of transforming functional specifications into hardware implementations that are suitable for production. The complexity of this task and the potential value have continued to grow with the increasing complexity of processor design and the mounting pressure to shorten machine design times. This paper describes the evolution of the Logic Synthesis System from an experimental tool to a production system for the synthesis of masterslice chip implementations. The system was used by one project in IBM Poughkeepsie to produce 90 percent of its more than one hundred chip parts. The primary reasons for this success are the use of local transformations to simplify logic representations at several levels of abstraction, and a highly cooperative effort between logic designers and synthesis system designers to understand the logic design process practiced in Poughkeepsie and to incorporate this knowledge into the synthesis system.
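
The idea of "local transformations", small correctness-preserving rewrites applied repeatedly to a logic representation, can be illustrated with a toy sketch; the expression encoding and the two rules below are hypothetical and far simpler than the transformations used in the Logic Synthesis System:

```python
# Toy local-transformation pass: simplify a gate-level expression tree by
# repeatedly applying small, correctness-preserving rewrite rules.
def simplify(node):
    # Recurse into children first so rules see already-simplified subtrees.
    if isinstance(node, tuple):
        node = (node[0],) + tuple(simplify(child) for child in node[1:])
    # Rule 1: NOT(NOT(x)) -> x
    if isinstance(node, tuple) and node[0] == "NOT" \
            and isinstance(node[1], tuple) and node[1][0] == "NOT":
        return node[1][1]
    # Rule 2: AND(x, 1) -> x ; AND(x, 0) -> 0
    if isinstance(node, tuple) and node[0] == "AND":
        a, b = node[1], node[2]
        if b == 1:
            return a
        if b == 0:
            return 0
    return node

expr = ("AND", ("NOT", ("NOT", "a")), 1)
print(simplify(expr))   # -> 'a'
```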

149 citations


Journal ArticleDOI
TL;DR: A hybrid recursive algorithm that outperforms the LAPACK algorithm DGEQRF by about 20% for large square matrices and up to almost a factor of 3 for tall thin matrices is introduced.
Abstract: We present new recursive serial and parallel algorithms for QR factorization of an m by n matrix that improve performance over standard block algorithms. The recursion leads to an automatic variable blocking, and it also replaces a Level 2 part in a standard block algorithm with Level 3 operations. However, there are significant additional costs for creating and performing the updates, which prohibit the efficient use of the recursion for large n. We present a quantitative analysis of these extra costs. This analysis leads us to introduce a hybrid recursive algorithm that outperforms the LAPACK algorithm DGEQRF by about 20% for large square matrices and up to almost a factor of 3 for tall thin matrices. Uniprocessor performance results are presented for two IBM RS/6000® SP nodes: a 120-MHz IBM POWER2 node and one processor of a four-way 332-MHz IBM PowerPC® 604e SMP node. The hybrid recursive algorithm reaches more than 90% of the theoretical peak performance of the POWER2 node. Compared to standard block algorithms, the recursive approach also shows a significant advantage in the automatic tuning obtained from its automatic variable blocking. A successful parallel implementation on a four-way 332-MHz IBM PPC604e SMP node based on dynamic load balancing is presented. For two, three, and four processors it shows speedups of up to 1.97, 2.99, and 3.97.
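
How recursion produces automatic variable blocking and turns Level 2 work into Level 3 matrix-matrix operations can be sketched as follows. This is a block Gram-Schmidt formulation built on library kernels, not the Householder-based LAPACK-style algorithm of the paper; the block size nb is an arbitrary illustrative choice:

```python
import numpy as np

def recursive_qr(A, nb=32):
    """Recursive QR via block Gram-Schmidt: factor the left half of the columns,
    update the right half with a matrix-matrix product (Level 3), then recurse."""
    m, n = A.shape
    if n <= nb:                                        # base case: library kernel
        return np.linalg.qr(A, mode="reduced")
    n1 = n // 2
    Q1, R11 = recursive_qr(A[:, :n1], nb)              # factor left block of columns
    R12 = Q1.T @ A[:, n1:]                             # Level 3 update
    Q2, R22 = recursive_qr(A[:, n1:] - Q1 @ R12, nb)   # factor the residual block
    Q = np.hstack([Q1, Q2])
    R = np.block([[R11, R12],
                  [np.zeros((R22.shape[0], n1)), R22]])
    return Q, R

A = np.random.rand(500, 200)
Q, R = recursive_qr(A)
print(np.allclose(Q @ R, A))                           # True up to rounding
```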

Journal ArticleDOI
TL;DR: The most significant of these is the use of coarse-grained multithreading to enable the processor to perform useful instructions during cache misses, which provides a significant throughput increase while adding less than 5% to the chip area and having very little impact on cycle time.
Abstract: This paper describes the microarchitecture of the RS64 IV, a multithreaded PowerPC® processor, and its memory system. Because this processor is used only in IBM iSeries™ and pSeries™ commercial servers, it is optimized solely for commercial server workloads. Increasing miss rates because of trends in commercial server applications and increasing latency of cache misses because of rapidly increasing clock frequency are having a compounding effect on the portion of execution time that is wasted on cache misses. As a result, several optimizations are included in the processor design to address this problem. The most significant of these is the use of coarse-grained multithreading to enable the processor to perform useful instructions during cache misses. This provides a significant throughput increase while adding less than 5% to the chip area and having very little impact on cycle time. When compared with other performance-improvement techniques, multithreading yields an excellent ratio of performance gain to implementation cost. Second, the miss rate of the L2 cache is reduced by making it four-way associative. Third, the latency of cache-to-cache movement of data is minimized. Fourth, the size of the L1 caches is relatively large. In addition to addressing cache misses, pipeline "holes" caused by branches are minimized with large instruction buffers, large L1 I-cache fetch bandwidth, and optimized resolution of the branch direction. In part, the branches are resolved quickly because of the short but efficient pipeline. To minimize pipeline holes due to data dependencies, the L1 D-cache access is optimized to yield a one-cycle load-to-use penalty.
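
A toy utilization model (with made-up numbers, not figures from the paper) shows why covering cache-miss stalls with a second thread pays off so well relative to its small area cost:

```python
# Toy model: fraction of cycles a single thread keeps the core busy versus stalled
# on cache misses, and the idealized effect of running a second thread during the
# stalls (coarse-grained multithreading). All numbers are hypothetical.
busy_cycles = 60           # useful work per 100-cycle window (assumed)
stall_cycles = 40          # cycles lost to cache misses (assumed)

single_thread_util = busy_cycles / (busy_cycles + stall_cycles)
# Idealized two-thread case: the second thread's work fills the first thread's
# stalls, capped at full utilization; thread-switch overhead is ignored.
two_thread_util = min(1.0, 2 * busy_cycles / (busy_cycles + stall_cycles))

print(f"1 thread: {single_thread_util:.0%}, 2 threads: {two_thread_util:.0%}")
```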

Journal ArticleDOI
Rudolf M. Tromp1
TL;DR: The principles of LEEM and its application to problems in science and technology are discussed; the resulting video-rate movies contain an unprecedented amount of information that is amenable to detailed, quantitative analysis.
Abstract: Low-energy electron microscopy (LEEM) is a relatively new microscopy technique, capable of high-resolution (5 nm) video-rate imaging of surfaces and interfaces. This opens up the possibility of studying dynamic processes at surfaces, such as thin-film growth, strain relief, etching and adsorption, and phase transitions in real time, in situ, as they occur. The resulting video movies contain an unprecedented amount of information that is amenable to detailed, quantitative analysis. In this paper we discuss the principles of LEEM and its application to problems in science and technology.

Journal ArticleDOI
C.H. Stapper1
TL;DR: It is shown that, although part of the yield losses are due to the clustering of defects, most product loss is from random failures, and the yield model shows good agreement with actual product yields.
Abstract: This paper describes an analytical technique for quantifying and modeling the frequency of occurrence of integrated circuit failures. The method is based on the analysis of random and clustered defects on wafers with defect monitors. Results from pilot line data of photolithographic defects, insulator short circuits, and leaky pn junctions are presented to support the practicality of the approach. It is shown that, although part of the yield losses are due to the clustering of defects, most product loss is from random failures. The yield model shows good agreement with actual product yields.
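
The clustered-defect analysis described here is commonly summarized by the negative-binomial yield formula, given below for reference (the fitted parameter values are, of course, product-dependent):

```latex
% Negative-binomial yield model:
% A = critical area, D_0 = mean defect density, alpha = clustering parameter.
% As alpha -> infinity the model reduces to the Poisson (purely random defects)
% limit Y = exp(-A D_0).
Y \;=\; \left(1 + \frac{A\,D_{0}}{\alpha}\right)^{-\alpha}
```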

Journal ArticleDOI
TL;DR: In this paper, a newly developed optical method for noninvasively measuring the switching activity of operating CMOS integrated circuit chips is described, which can be used to characterize the gate-level performance of such chips and identify the locations and nature of their operational faults.
Abstract: A newly developed optical method for noninvasively measuring the switching activity of operating CMOS integrated circuit chips is described. The method, denoted as picosecond imaging circuit analysis (PICA) can be used to characterize the gate-level performance of such chips and identify the locations and nature of their operational faults. The principles underlying PICA and examples of its use are discussed.

Journal ArticleDOI
TL;DR: The principles of this new spectro-microscopy approach are reviewed and selected applications to the study of materials of interest in information technology are presented.
Abstract: The detailed understanding of complex materials used in information technology requires the use of state-of-the-art experimental techniques that provide information on the electronic and magnetic properties of the materials. The increasing miniaturization of components furthermore demands the use of techniques with spatial resolution down to the nanometer range. A means to satisfy both requirements is to combine the capabilities of conventional X-ray absorption spectroscopy with those of electron microscopy in a new technique designated as X-ray photoemission electron microscopy. This paper reviews the principles of this new spectro-microscopy approach and presents selected applications to the study of materials of interest in information technology.

Journal ArticleDOI
R. D. Isaac1
TL;DR: Challenges in lithography, transistor scaling, interconnections, circuit families, computer memory, and circuit design are outlined and ways in which these challenges will affect future growth in the industry are considered.
Abstract: The performance of integrated circuits has been improving exponentially for more than thirty years. During the next decade, the industry must overcome several technological challenges to sustain this remarkable pace of improvement. Challenges in lithography, transistor scaling, interconnections, circuit families, computer memory, and circuit design are outlined. Possible solutions are briefly discussed. The ways in which these challenges will affect future growth in the industry are considered.

Journal ArticleDOI
Frances M. Ross1
TL;DR: The use of in situ microscopy is described for the observation of reactions in silicides and the formation of semiconductor "quantum dots" to understand reaction mechanisms and to suggest improvements to growth and processing techniques.
Abstract: In situ transmission electron microscopy allows us to study growth processes and phase transitions which are important in semiconductor processing. It provides a unique view of dynamic reactions as they occur. In this paper we describe the use of in situ microscopy for the observation of reactions in silicides and the formation of semiconductor "quantum dots." The dynamic information obtained from these experiments enables us to understand reaction mechanisms and to suggest improvements to growth and processing techniques. We conclude with a discussion of the use of in situ microscopy for studying reactions such as electrodeposition which occur at liquid/solid interfaces.

Journal ArticleDOI
Rolf Allenspach1
TL;DR: A review is presented of a powerful technique for studying magnetic microstructures: spin-polarized scanning electron microscopy, denoted as spin-SEM, or SEMPA, which describes the main features of the technique, such as its very high surface sensitivity, its suitability for achieving complete separation of relevant magnetic and topographic information, and its high lateral resolution.
Abstract: In this paper, a review is presented of a powerful technique for studying magnetic microstructures: spin-polarized scanning electron microscopy, denoted as spin-SEM, or SEMPA. When the beam of a scanning electron microscope traverses a ferromagnetic sample, secondary electrons are emitted whose spin polarization contains information on the magnitude and direction of the magnetization of the surface. Various illustrative examples are presented which describe the main features of the technique, such as its very high surface sensitivity, its suitability for achieving complete separation of relevant magnetic and topographic information, and its high lateral resolution.

Journal ArticleDOI
TL;DR: Techniques for evaluating the performance of each of these key contributors so as to optimize the overall performance and cost/performance of commercial servers are presented.
Abstract: This paper discusses a methodology for analyzing and optimizing the performance of commercial servers. Commercial server workloads are shown to have unique characteristics which expand the elements that must be optimized to achieve good performance and require a unique performance methodology. The steps in the process of server performance optimization are described and include the following: 1. Selection of representative commercial workloads and identification of key characteristics to be evaluated. 2. Collection of performance data. Various instrumentation techniques are discussed in light of the requirements placed by commercial server workloads on the instrumentation. 3. Creation of input data for performance models on the basis of measured workload information. This step in the methodology must overcome the operating environment differences between the instance of the measured system under test and the target system design to be modeled. 4. Creation of performance models. Two general types are described: high-level models and detailed cycle-accurate simulators. These types are applied to model the processor, memory, and I/O system. 5. System performance optimization. The tuning of the operating system and application software is described. Optimization of performance among commercial applications is not simply an exercise in using traces to maximize the processor MIPS. Equally significant are items such as the use of probabilities to reflect future workload characteristics, software tuning, cache miss rate optimization, memory management, and I/O performance. The paper presents techniques for evaluating the performance of each of these key contributors so as to optimize the overall performance and cost/performance of commercial servers.
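
A common way to quantify the cache-miss contribution that the paper emphasizes is a simple additive CPI decomposition (a standard modeling identity, not a formula quoted from the paper):

```latex
% Cycles per instruction decomposed into a core (infinite-cache) component plus
% stall contributions: MPI_i = misses per instruction at cache level i,
% penalty_i = average stall cycles per miss at that level.
\mathrm{CPI} \;=\; \mathrm{CPI}_{\infty} \;+\; \sum_{i} \mathrm{MPI}_{i}\cdot \mathrm{penalty}_{i}
```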

Journal ArticleDOI
Bernard S. Meyerson1
TL;DR: This paper reviews the application-driven origins of silicon-germanium (SiGe) heterojunction bipolar transistors, how the technology has evolved, and how limits to conventional silicon bipolar scaling have enhanced its adoption in the semiconductor industry.
Abstract: The need to serve the explosion in data bandwidth demand for fixed and mobile applications has driven transistor performance requirements beyond the reach of conventional silicon devices. Scaling limits of silicon-based bipolar transistors have been encountered, confining further performance gains by traditional means, but cost considerations favor the continued use of silicon-derived technology solutions. Silicon-germanium (SiGe) heterojunction bipolar transistors (HBTs) and subsequent generations of highly integrated SiGe BiCMOS processes stem from long-term efforts initiated at IBM to develop such a silicon-derived technology. This paper reviews the application-driven origins of this SiGe technology, how it has evolved, and how limits to conventional silicon bipolar scaling have enhanced its adoption in the semiconductor industry. Examples of the entry of this technology into commercial applications in the wired and wireless marketplace are discussed.

Journal ArticleDOI
TL;DR: Measured nucleotide levels and estimates of central carbon metabolic fluxes point to UTP depletion as the cause of decreased UDP-GNAc during glucose limitation, and it is confirmed that UDP-sugar concentrations are correlated with UTP levels in the absence of glutamine limitation.
Abstract: Asparagine-linked (N-linked) glycosylation is an important modification of recombinant proteins, because the attached oligosaccharide chains can significantly alter protein properties. Potential glycosylation sites are not always occupied with oligosaccharide, and site occupancy can change with the culture environment. To investigate the relationship between metabolism and glycosylation site occupancy, we studied the glycosylation of recombinant human interferon-γ (IFN-γ) produced in continuous culture of Chinese hamster ovary cells. Intracellular nucleotide sugar levels and IFN-γ glycosylation were measured at different steady states which were characterized by central carbon metabolic fluxes estimated by material balances and extracellular metabolite rate measurements. Although site occupancy varied over a rather narrow range, we found that differences correlated with the intracellular pool of UDP-N-acetylglucosamine + UDP-N-acetylgalactosamine (UDP-GNAc). Measured nucleotide levels and estimates of central carbon metabolic fluxes point to UTP depletion as the cause of decreased UDP-GNAc during glucose limitation. Glucose-limited cells preferentially utilized available carbon for energy production, causing reduced nucleotide biosynthesis. Lower nucleoside triphosphate pools in turn led to lower nucleotide sugar pools and reduced glycosylation site occupancy. Subsequent experiments in batch and fed-batch culture have confirmed that UDP-sugar concentrations are correlated with UTP levels in the absence of glutamine limitation. Glutamine limitation appears to influence glycosylation by reducing amino sugar formation and hence UDP-GNAc concentration. The influence of nucleotide sugars on site occupancy may only be important during periods of extreme starvation, since relatively large changes in nucleotide sugar pools led to only minor changes in glycosylation. © 1999 John Wiley & Sons, Inc. Biotechnol Bioeng 62: 336-347, 1999.

Journal ArticleDOI
TL;DR: The microarchitectural features of the POWER3 processor, particularly those which are unique or significant to the performance of the chip, such as the data prefetch engine, nonblocking and interleaved data cache, and dual multiply-add-fused floating-point execution units are described.
Abstract: The POWER3 processor is a high-performance microprocessor which excels at technical computing. Designed by IBM and deployed in various IBM RS/6000® systems, the superscalar RISC POWER3 processor boasts many advanced features which give it exceptional performance on challenging applications from the workstation to the supercomputer level. In this paper, we describe the microarchitectural features of the POWER3 processor, particularly those which are unique or significant to the performance of the chip, such as the data prefetch engine, nonblocking and interleaved data cache, and dual multiply-add-fused floating-point execution units. Additionally, the performance of specific instruction sequences and kernels is described to quantify and further illuminate the performance attributes of the POWER3 processor.

Journal ArticleDOI
TL;DR: In future microprocessor designs the floorplan and wire plan will be as important as the microarchitecture, more control logic will be structured and become indistinguishable from dataflow elements, and more circuits will be designed and analyzed at the level of single transistors and wires.
Abstract: This paper presents a survey of some of the most aggressive custom designs for CMOS processor products and prototypes in IBM. We argue that microprocessor performance growth, which has traditionally been driven primarily by CMOS technology and microarchitectural improvements, can receive a substantial contribution from improvements in circuit design and physical organization. We predict that in future microprocessor designs the floorplan and wire plan will be as important as the microarchitecture, more control logic will be structured and become indistinguishable from dataflow elements, and more circuits will be designed and analyzed at the level of single transistors and wires.

Journal ArticleDOI
M. Copel1
TL;DR: This paper reviews the application of medium-energy ion scattering (MEIS) to the study of materials problems relevant to microelectronics fabrication and reliability; three examples of MEIS studies are discussed in detail.
Abstract: This paper reviews the application of medium-energy ion scattering (MEIS) to the study of materials problems relevant to microelectronics fabrication and reliability. Associated physical mechanisms and techniques are described. Three examples of MEIS studies are discussed in detail: Studies of the nucleation of silicon nitride on silicon dioxide, the interfacial segregation of Cu from Al(Cu), and the structure of hydrogen-terminated silicon surfaces created by various wet etching techniques.

Journal ArticleDOI
TL;DR: A novel practical algorithm for Cholesky factorization when the matrix is stored in packed format is presented by combining blocking and recursion and it is shown that blocking combined with recursion reduces all overheads to a tiny, acceptable level.
Abstract: We present a novel practical algorithm for Cholesky factorization when the matrix is stored in packed format by combining blocking and recursion. The algorithm simultaneously obtains Level 3 performance, conserves about half the storage, and avoids the production of Level 3 BLAS for packed format. We use recursive packed format, which was first described by Andersen et al. [1]. Our algorithm uses only DGEMM and Level 3 kernel routines; it first transforms standard packed format to packed recursive lower row format. Our new algorithm outperforms the Level 3 LAPACK routine DPOTRF even when we include the cost of data transformation. (This is true for three IBM platforms--the POWER3, the POWER2, and the PowerPC 604e.) For large matrices, blocking is not required for acceptable Level 3 performance. However, for small matrices the overhead of pure recursion and/or data transformation is too high. We analyze these costs analytically and provide detailed cost estimates. We show that blocking combined with recursion reduces all overheads to a tiny, acceptable level. However, a new problem of nonlinear addressing arises. We use two-dimensional mappings (tables) or data copying to overcome the high costs of directly computing addresses that are nonlinear functions of i and j.
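
The role of recursion in exposing Level 3 operations can be seen in a compact sketch of recursive Cholesky factorization. For clarity this operates on a full (unpacked) symmetric matrix and uses NumPy routines in place of DGEMM and the Level 3 kernels; it does not show the packed-recursive storage transformation that is the subject of the paper:

```python
import numpy as np

def recursive_cholesky(A, nb=64):
    """Lower-triangular Cholesky by recursion on the leading dimension.
    The off-diagonal solve and the trailing update are Level 3 operations."""
    n = A.shape[0]
    if n <= nb:                                     # base case: library kernel
        return np.linalg.cholesky(A)
    n1 = n // 2
    L11 = recursive_cholesky(A[:n1, :n1], nb)
    # Solve L21 @ L11.T = A21 (a triangular system; a real code would call a
    # triangular solver such as DTRSM rather than a general solve).
    L21 = np.linalg.solve(L11, A[n1:, :n1].T).T
    S22 = A[n1:, n1:] - L21 @ L21.T                 # trailing update (GEMM-like)
    L22 = recursive_cholesky(S22, nb)
    L = np.zeros_like(A, dtype=float)
    L[:n1, :n1], L[n1:, :n1], L[n1:, n1:] = L11, L21, L22
    return L

# Quick check on a random symmetric positive-definite matrix.
X = np.random.rand(300, 300)
A = X @ X.T + 300 * np.eye(300)
L = recursive_cholesky(A)
print(np.allclose(L @ L.T, A))                      # True up to rounding
```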

Journal ArticleDOI
Robert L. Wisnieff1, John J. Ritsko1
TL;DR: The principal channel of interactive communication from a computer to a person is an electronic display, and the amount of information shown and the way in which it can be exhibited depend on successfully matching the capabilities of the display to the human visual system.
Abstract: The principal channel of interactive communication from a computer to a person is an electronic display. The amount of information shown and the way in which it can be exhibited depend on successfully matching the capabilities of the display to the human visual system. Making this channel as wide, as fast, and as effective as possible has been the goal of electronic display development for the last fifty years. The cathode ray tube (CRT), which has been the dominant display device used in offices and homes, is the display device on which the personal computer and the graphical user interface were developed. Today, the capabilities of information technology are brought to new environments by new display technologies. Active-matrix liquid crystal displays (AMLCDs) have freed the personal computer from the desktop, projection displays bring the power of information technology into meetings, small liquid crystal displays have allowed the development of hand-held computers, and head-mounted displays are bringing wearable computer technology onto the factory and warehouse floor.

Journal ArticleDOI
TL;DR: This study addresses crosstalk in planar microstrip lines by evaluating micromachined packages as a means to reduce coupling; inclusion of a monolithic package reduces coupling by as much as 30 dB and offers the requisite electrical and environmental protection in addition to shielding of individual elements from parasitic radiation.
Abstract: High-frequency planar circuits experience large electromagnetic (EM) coupling in dense circuit environments. As a result, individual components exhibit performance degradation that ultimately limits overall circuit response. This paper addresses crosstalk in planar microstrip lines by evaluating micromachined packages as a means to reduce coupling. Microstrip lines with straight and meandering paths can exhibit crosstalk coupling as high as -20 dB (i.e., when placed in a side-by-side arrangement). From our study, inclusion of a monolithic package reduces this effect by as much as -30 dB and, consequently, offers the requisite electrical and environmental protection in addition to shielding of individual elements from parasitic radiation. Presented herein is the development of the micromachined package for microstrip geometries. Included in the discussion are crosstalk effects between straight and bending geometries in open and packaged configurations and an evaluation of package noise characteristics. A packaged antenna element is also included as a demonstration of the potential use of micromachined packaging in array applications.
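
To put the quoted figures in perspective, simple decibel arithmetic (not additional data from the paper): a coupling of -20 dB means 1% of the power leaks to the neighboring line, and a further 30-dB reduction brings the coupling to -50 dB, i.e., 0.001%:

```latex
% Power-ratio interpretation of the quoted coupling levels.
10^{-20/10} = 10^{-2} = 1\%, \qquad 10^{-(20+30)/10} = 10^{-5} = 0.001\%
```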

Journal ArticleDOI
TL;DR: Designs and characteristics of experimental devices of 500- and 1000-Å gate insulator thicknesses are presented, with particular attention to the effects of source-drain spacing.
Abstract: An n-channel insulated-gate field-effect transistor technology established at IBM Research has served as the basis for further development leading to FET memory. Designs and characteristics of experimental devices of 500- and 1000-Å gate insulator thicknesses are presented, with particular attention to the effects of source-drain spacing.

Journal ArticleDOI
Jean Jordan-Sweet1
TL;DR: The capabilities of the IBM/MIT X-ray beamlines at the National Synchrotron Light Source (NSLS), Brookhaven National Laboratory (BNL) are described and a range of techniques are introduced, and examples of their applicability to the study of microelectronics-related materials phenomena are described.
Abstract: X-ray diffraction techniques using synchrotron radiation play a vital role in the understanding of structural behavior for a wide range of materials important in microelectronics. The extremely high flux of X-rays produced by synchrotron storage rings makes it possible to probe layers and interfaces in complicated stacked structures, characterize low-atomic-weight materials such as polymers, and study in situ phase transformations, to name only a few applications. In this paper, following an introduction to synchrotron radiation, we describe the capabilities of the IBM/MIT X-ray beamlines at the National Synchrotron Light Source (NSLS), Brookhaven National Laboratory (BNL). A range of techniques are introduced, and examples of their applicability to the study of microelectronics-related materials phenomena are described.