
Showing papers in "IBM Journal of Research and Development in 2008"


Journal ArticleDOI
TL;DR: This work discusses the critical aspects that may affect the scaling of PCRAM, including materials properties, power consumption during programming and read operations, thermal cross-talk between memory cells, and failure mechanisms, and discusses experiments that directly address the scaling properties of the phase-change materials themselves.
Abstract: Nonvolatile RAM using resistance contrast in phase-change materials [or phase-change RAM (PCRAM)] is a promising technology for future storage-class memory. However, such a technology can succeed only if it can scale smaller in size, given the increasingly tiny memory cells that are projected for future technology nodes (i.e., generations). We first discuss the critical aspects that may affect the scaling of PCRAM, including materials properties, power consumption during programming and read operations, thermal cross-talk between memory cells, and failure mechanisms. We then discuss experiments that directly address the scaling properties of the phase-change materials themselves, including studies of phase transitions in both nanoparticles and ultrathin films as a function of particle size and film thickness. This work in materials directly motivated the successful creation of a series of prototype PCRAM devices, which have been fabricated and tested at phase-change material cross-sections with extremely small dimensions as low as 3 nm × 20 nm. These device measurements provide a clear demonstration of the excellent scaling potential offered by this technology, and they are also consistent with the scaling behavior predicted by extensive device simulations. Finally, we discuss issues of device integration and cell design, manufacturability, and reliability.

1,018 citations


Journal ArticleDOI
Geoffrey W. Burr1, B. N. Kurdi1, J. C. Scott1, Chung H. Lam1, Kailash Gopalakrishnan1, R. S. Shenoy1 
TL;DR: In this article, the authors review the candidate solid-state nonvolatile memory technologies that potentially could be used to construct a storage-class memory (SCM) and compare the potential for practical scaling to ultrahigh effective areal density for each of these candidate technologies.
Abstract: Storage-class memory (SCM) combines the benefits of a solid-state memory, such as high performance and robustness, with the archival capabilities and low cost of conventional hard-disk magnetic storage. Such a device would require a solid-state nonvolatile memory technology that could be manufactured at an extremely high effective areal density using some combination of sublithographic patterning techniques, multiple bits per cell, and multiple layers of devices. We review the candidate solid-state nonvolatile memory technologies that potentially could be used to construct such an SCM. We discuss evolutionary extensions of conventional flash memory, such as SONOS (silicon-oxide-nitride-oxide-silicon) and nanotraps, as well as a number of revolutionary new memory technologies. We review the capabilities of ferroelectric, magnetic, phase-change, and resistive random-access memories, including perovskites and solid electrolytes, and finally organic and polymeric memory. The potential for practical scaling to ultrahigh effective areal density for each of these candidate technologies is then compared.

659 citations


Journal ArticleDOI
TL;DR: Using SCM as a disk drive replacement, storage system products will have random and sequential I/O performance that is orders of magnitude better than that of comparable disk-based systems and require much less space and power in the data center.
Abstract: The dream of replacing rotating mechanical storage, the disk drive, with solid-state, nonvolatile RAM may become a reality in the near future. Approximately ten new technologies--collectively called storage-class memory (SCM)--are currently under development and promise to be fast, inexpensive, and power efficient. Using SCM as a disk drive replacement, storage system products will have random and sequential I/O performance that is orders of magnitude better than that of comparable disk-based systems and require much less space and power in the data center. In this paper, we extrapolate disk and SCM technology trends to 2020 and analyze the impact on storage systems. The result is a 100- to 1,000-fold advantage for SCM in terms of the data center space and power required.

518 citations


Journal ArticleDOI
TL;DR: The architecture of Qbox, a parallel, scalable first-principles molecular dynamics (FPMD) code, is described; Qbox is a C++/Message Passing Interface implementation of FPMD based on the plane-wave, pseudopotential method for electronic structure calculations.
Abstract: We describe the architecture of Qbox, a parallel, scalable first-principles molecular dynamics (FPMD) code. Qbox is a C++/Message Passing Interface implementation of FPMD based on the plane-wave, pseudopotential method for electronic structure calculations. It is built upon well-optimized parallel numerical libraries, such as Basic Linear Algebra Communication Subprograms (BLACS) and Scalable Linear Algebra Package (ScaLAPACK), and also features an Extensible Markup Language (XML) interface built on the Apache Xerces-C library. We describe various choices made in the design of Qbox that led to excellent scalability on large parallel computers. In particular, we discuss the case of the IBM Blue Gene/L™ platform on which Qbox was run using up to 65,536 nodes. Future design challenges for upcoming petascale computers are also discussed. Examples of applications of Qbox to a variety of first-principles simulations of solids, liquids, and nanostructures are briefly described.
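As a toy illustration of the two-dimensional data distributions that BLACS/ScaLAPACK-style layouts provide, the sketch below assigns blocks of a plane-wave-coefficients-by-states matrix to a process grid. It is not Qbox code; the grid shape and problem sizes are invented.

```python
# Illustrative 2D block distribution of a (plane-wave coefficients x states)
# matrix over a small process grid, in the spirit of BLACS/ScaLAPACK layouts.
# This is a toy sketch, not Qbox code; grid shape and sizes are hypothetical.

def block_range(n, p, rank):
    """Return the [start, stop) range of n items owned by process `rank` of p."""
    base, extra = divmod(n, p)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop

n_coeffs, n_states = 100_000, 512      # hypothetical problem size
Pr, Pc = 4, 2                          # hypothetical 4x2 process grid

for pr in range(Pr):
    for pc in range(Pc):
        rows = block_range(n_coeffs, Pr, pr)
        cols = block_range(n_states, Pc, pc)
        # Each process stores and works on only its local block.
        print(f"process ({pr},{pc}) owns coefficients {rows} x states {cols}")
```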

217 citations


Journal ArticleDOI
TL;DR: This is the first time a scanning-probe recording technology has reached this level of technical maturity, demonstrating the joint operation of all building blocks of a storage device.
Abstract: Ultrahigh storage densities can be achieved by using a thermomechanical scanning-probe-based data-storage approach to write, read back, and erase data in very thin polymer films. High data rates are achieved by parallel operation of large two-dimensional arrays of cantilevers that can be batch fabricated by silicon-surface micromachining techniques. The very high precision required to navigate the storage medium relative to the array of probes is achieved by microelectromechanical system (MEMS)-based x and y actuators. The ultrahigh storage densities offered by probe-storage devices pose a significant challenge in terms of both control design for nanoscale positioning and read-channel design for reliable signal detection. Moreover, the high parallelism necessitates new dataflow architectures to ensure high performance and reliability of the system. In this paper, we present a small-scale prototype system of a storage device that we built based on scanning-probe technology. Experimental results of multiple sectors, recorded using multiple levers at 840 Gb/in² and read back without errors, demonstrate the functionality of the prototype system. This is the first time a scanning-probe recording technology has reached this level of technical maturity, demonstrating the joint operation of all building blocks of a storage device.
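For orientation, a quick back-of-the-envelope conversion of the demonstrated areal density into an equivalent bit pitch (assuming square bit cells):

\[
\frac{1\ \text{in}^2}{840\times 10^{9}\ \text{bits}}
= \frac{(25.4\times 10^{6}\ \text{nm})^{2}}{8.4\times 10^{11}}
\approx 768\ \text{nm}^{2}\ \text{per bit}
\quad\Longrightarrow\quad
\text{bit pitch} \approx \sqrt{768}\ \text{nm} \approx 28\ \text{nm}.
\]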

139 citations


Journal ArticleDOI
TL;DR: An overview of a neuronal network model of layers II/III of the neocortex built with biophysical model neurons and simulated on an IBM Blue Gene/L supercomputer, in some of the largest simulations of this type ever performed.
Abstract: Biologically detailed large-scale models of the brain can now be simulated thanks to increasingly powerful massively parallel supercomputers. We present an overview, for the general technical reader, of a neuronal network model of layers II/III of the neocortex built with biophysical model neurons. These simulations, carried out on an IBM Blue Gene/L™ supercomputer, comprise up to 22 million neurons and 11 billion synapses, which makes them the largest simulations of this type ever performed. Such model sizes correspond to the cortex of a small mammal. The SPLIT library, used for these simulations, runs on single-processor as well as massively parallel machines. Performance measurements show good scaling behavior on the Blue Gene/L supercomputer up to 8,192 processors. Several key phenomena seen in the living brain appear as emergent phenomena in the simulations. We discuss the role of this kind of model in neuroscience and note that full-scale models may be necessary to preserve natural dynamics. We also discuss the need for software tools for the specification of models as well as for analysis and visualization of output data. Combining models that range from abstract connectionist type to biophysically detailed will help us unravel the basic principles underlying neocortical function.
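As a rough back-of-the-envelope check of the per-processor load, assuming the reported model size is spread evenly over the 8,192 processors:

\[
\frac{22\times 10^{6}\ \text{neurons}}{8{,}192} \approx 2{,}700\ \text{neurons per processor},
\qquad
\frac{11\times 10^{9}\ \text{synapses}}{8{,}192} \approx 1.3\times 10^{6}\ \text{synapses per processor},
\]

or roughly 500 synapses per neuron on average.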

122 citations


Journal ArticleDOI
TL;DR: The error detection and correction capability of the IBM POWER6™ processor enables high tolerance to single-event upsets, and the soft-error resilience was tested with proton beam- and neutron beam-induced fault injection.
Abstract: The error detection and correction capability of the IBM POWER6™ processor enables high tolerance to single-event upsets. The soft-error resilience was tested with proton beam- and neutron beam-induced fault injection. Additionally, statistical fault injection was performed on a hardware-emulated POWER6 processor simulation model. The error resiliency is described in terms of the proportion of latch upset events that result in vanished errors, corrected errors, checkstops, and incorrect architected states.

116 citations


Journal ArticleDOI
TL;DR: In this paper, possible areas that need development in order to overcome some of the size-scaling challenges of flash memories are discussed.
Abstract: Flash memory grew from a simple concept in the early 1980s to a technology that generated close to $23 billion in worldwide revenue in 2007, and this represents one of the many success stories in the semiconductor industry. This success was made possible by the continuous innovation of the industry along many different fronts. In this paper, the history, the basic science, and the successes of flash memories are briefly presented. Flash memories have followed the Moore's Law scaling trend for which finer line widths, achieved by improved lithographic resolution, enable more memory bits to be produced for the same silicon area, reducing cost per bit. When looking toward the future, significant challenges exist to the continued scaling of flash memories. In this paper, I discuss possible areas that need development in order to overcome some of the size-scaling challenges. Innovations are expected to continue in the industry, and flash memories will continue to follow the historical trend in cost reduction of semiconductor memories through the rest of this decade.
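The cost argument behind this scaling trend can be summarized in one relation. If a memory cell occupies an area proportional to the square of the minimum feature size F (roughly 4F² per cell is the commonly cited figure for NAND flash) and wafer cost stays approximately constant, then

\[
\text{cost per bit} \;\propto\; \frac{C_{\text{wafer}}}{\text{bits per wafer}} \;\propto\; A_{\text{cell}} \;\propto\; F^{2},
\]

so each lithographic generation (a roughly 0.7× reduction in F) approximately halves the cost per bit, even before multiple bits per cell are considered.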

98 citations


Journal ArticleDOI
TL;DR: Various compiler optimization techniques that use single-instruction, multiple-data (SIMD) instructions to obtain good sequential performance with NAMD on the embedded IBM PowerPC® 440 processor core are presented.
Abstract: NAMD (nanoscale molecular dynamics) is a production molecular dynamics (MD) application for biomolecular simulations that include assemblages of proteins, cell membranes, and water molecules. In a biomolecular simulation, the problem size is fixed and a large number of iterations must be executed in order to understand interesting biological phenomena. Hence, we need MD applications to scale to thousands of processors, even though the individual timestep on one processor is quite small. NAMD has demonstrated its performance on several parallel computer architectures. In this paper, we present various compiler optimization techniques that use single-instruction, multiple-data (SIMD) instructions to obtain good sequential performance with NAMD on the embedded IBM PowerPC® 440 processor core. We also present several techniques to scale the NAMD application to 20,480 nodes of the IBM Blue Gene/L™ (BG/L) system. These techniques include topology-specific optimizations to localize communication, new messaging protocols that are optimized for the BG/L torus, topology-aware load balancing, and overlap of computation and communication. We also present performance results of various molecular systems with sizes ranging from 5,570 to 327,506 atoms.
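The kernel that SIMD work targets in an MD code is the nonbonded inner loop over particle pairs. Below is a toy, NumPy-vectorized analogue of such a kernel (a cutoff Lennard-Jones force) intended only to show the data-parallel structure that maps onto SIMD units; it is not NAMD code, and all parameters are arbitrary.

```python
import numpy as np

# Toy cutoff Lennard-Jones force kernel, vectorized over neighbor particles.
# Illustrates the data-parallel inner loop that SIMD instructions accelerate;
# this is not NAMD's implementation, and the parameters are arbitrary.

def lj_forces(pos, epsilon=1.0, sigma=1.0, cutoff=2.5):
    n = len(pos)
    forces = np.zeros_like(pos)
    for i in range(n):                         # outer loop over particles
        d = pos[i] - pos                       # vectors from all particles to i
        r2 = np.einsum('ij,ij->i', d, d)       # squared distances
        mask = (r2 < cutoff**2) & (r2 > 0.0)   # neighbors within the cutoff
        r2 = r2[mask]
        s6 = (sigma**2 / r2) ** 3
        # dU/dr expressed via r^2 to avoid square roots in the inner loop
        coeff = 24.0 * epsilon * (2.0 * s6**2 - s6) / r2
        forces[i] = (coeff[:, None] * d[mask]).sum(axis=0)
    return forces

pos = np.random.rand(1000, 3) * 10.0           # hypothetical particle positions
print(lj_forces(pos).shape)
```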

98 citations


Journal ArticleDOI
TL;DR: Results of x-ray absorption experiments prove that the microscopic origin of the electroforming, that is, the insulator-to-metal transition, is the creation of oxygen vacancies in Cr-doped SrTiO3 memory cells.
Abstract: We provide a status report on the development of perovskite-based transition-metal-oxide resistance-change memories. We focus on bipolar resistance switching observed in Cr-doped SrTiO3 memory cells with dimensions ranging from bulk single crystals to CMOS integrated nanoscale devices. We also discuss electronic and ionic processes during electroforming and resistance switching, as evidenced from electron paramagnetic resonance (EPR), x-ray absorption spectroscopy, electroluminescence spectroscopy, thermal imaging, and transport experiments. EPR in combination with electroluminescence reveals electron trapping and detrapping processes at the Cr site. Results of x-ray absorption experiments prove that the microscopic origin of the electroforming, that is, the insulator-to-metal transition, is the creation of oxygen vacancies. Cr-doped SrTiO3 memory cells exhibit short programming times (≤100 ns) and low programming currents (<100 µA) with up to 10⁵ write and erase cycles.

84 citations


Journal ArticleDOI
TL;DR: It is suggested that widespread adoption of an open dataset model of high-performance computing is likely to result in significant advantages for the scientific computing community, in much the same way that the widespread adoption of open-source software has produced similar gains over the last 10 years.
Abstract: Understanding the nature of turbulent flows remains one of the outstanding questions in classical physics. Significant progress has been recently made using computer simulation as an aid to our understanding of the rich physics of turbulence. Here, we present both the computer science and the scientific features of a unique terascale simulation of a weakly compressible turbulent flow that includes tracer particles. (Terascale refers to performance and dataset storage use in excess of a teraflop and terabyte, respectively.) The simulation was performed on the Lawrence Livermore National Laboratory IBM Blue Gene/L™ system, using version 3 of the FLASH application framework. FLASH3 is a modular, publicly available code designed primarily for astrophysical simulations, which scales well to massively parallel environments. We discuss issues related to the analysis and visualization of such a massive simulation and present initial scientific results. We also discuss challenges related to making the database available for public release. We suggest that widespread adoption of an open dataset model of high-performance computing is likely to result in significant advantages for the scientific computing community, in much the same way that the widespread adoption of open-source software has produced similar gains over the last 10 years.

Journal ArticleDOI
TL;DR: The causes of UDEs and their effects on data integrity are discussed, some of the basic techniques that have been applied to address this problem at various software layers in the I/O stack are described and a family of solutions that can be integrated into the RAID subsystem are described.
Abstract: Though remarkably reliable, disk drives do fail occasionally. Most failures can be detected immediately; moreover, such failures can be modeled and addressed using technologies such as RAID (Redundant Arrays of Independent Disks). Unfortunately, disk drives can experience errors that are undetected by the drive--which we refer to as undetected disk errors (UDEs). These errors can cause silent data corruption that may go completely undetected (until a system or application malfunction) or may be detected by software in the storage I/O stack. Continual increases in disk densities or in storage array sizes and, more significantly, the introduction of desktop-class drives in enterprise storage systems are increasing the likelihood of UDEs in a given system. Therefore, the incorporation of UDE detection (and correction) into storage systems is necessary to prevent increasing numbers of data corruption and data loss events. In this paper, we discuss the causes of UDEs and their effects on data integrity. We describe some of the basic techniques that have been applied to address this problem at various software layers in the I/O stack and describe a family of solutions that can be integrated into the RAID subsystem.
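One of the basic software-layer techniques alluded to, writing a checksum alongside each block and verifying it on read, can be sketched as follows. The layout and function names are hypothetical and are meant only to show where silent corruption would be caught.

```python
import zlib

# Toy illustration of block-level checksumming against undetected disk errors:
# store a CRC alongside each block on write, verify it on read.
# The layout and names are hypothetical, not from any particular product.

BLOCK_SIZE = 4096

def write_block(store, lba, data):
    assert len(data) == BLOCK_SIZE
    store[lba] = (data, zlib.crc32(data))

def read_block(store, lba):
    data, crc = store[lba]
    if zlib.crc32(data) != crc:
        raise IOError(f"undetected disk error caught at LBA {lba}")
    return data

store = {}
write_block(store, 7, bytes(BLOCK_SIZE))
read_block(store, 7)                         # verifies cleanly

# Simulate silent corruption of the payload without updating the checksum.
data, crc = store[7]
store[7] = (b"\xff" + data[1:], crc)
try:
    read_block(store, 7)
except IOError as err:
    print(err)                               # the corruption is detected
```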

Journal ArticleDOI
TL;DR: This paper presents a system and methodology for model-driven discovery of end-to-end application-data relationships spanning multiple tiers, from the applications to the lowest levels of the storage hierarchy.
Abstract: Modern business information systems are typically multitiered distributed systems comprising Web services, application services, databases, enterprise information systems, file systems, storage controllers, and other storage systems. In such environments, data is stored in different forms at multiple tiers, with each tier associated with some level of data abstraction. An information entity owned by an application generally maps to several data entities, logically associated across tiers and related to the application. Discovery of such relationships in a distributed system is a challenging problem, complicated by the widespread adoption of virtualization technologies and by the traditional tendency to manage each tier as an independent domain. In this paper, we present a system and methodology for model-driven discovery of end-to-end application-data relationships spanning multiple tiers, from the applications to the lowest levels of the storage hierarchy. The key to our methodology involves modeling how data is used and transformed by distributed software components. An important benefit of our system, which we call Galapagos, is the ability to reflect business decisions expressed at the application level to the level of storage.
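Conceptually, the discovery problem amounts to composing per-tier mappings and traversing them from an application entity down to physical storage. The sketch below uses invented entities and does not reflect Galapagos internals.

```python
# Hypothetical illustration of end-to-end application-to-storage discovery:
# each tier contributes a mapping, and discovery composes them by traversal.
# Entity names are invented; they do not reflect Galapagos internals.

tier_maps = {
    "application": {"OrdersDB": ["tablespace_orders"]},
    "database":    {"tablespace_orders": ["/db/orders/data01.dbf"]},
    "filesystem":  {"/db/orders/data01.dbf": ["lv_db01"]},
    "volume_mgr":  {"lv_db01": ["LUN_0x42"]},
    "storage":     {"LUN_0x42": ["array7:rank3"]},
}

def discover(entity, tiers=("application", "database", "filesystem",
                            "volume_mgr", "storage")):
    """Follow the per-tier mappings from an application entity downward."""
    path, frontier = [entity], [entity]
    for tier in tiers:
        frontier = [child for e in frontier
                    for child in tier_maps[tier].get(e, [])]
        path.extend(frontier)
    return path

print(" -> ".join(discover("OrdersDB")))
```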

Journal ArticleDOI
TL;DR: It is argued that controlling the head-tape interaction is key to achieving high linear density, whereas track-following and reel-to-reel servomechanisms as well as transverse dimensional stability are key for achieving high track density.
Abstract: We examine the issue of scaling magnetic tape recording to higher areal densities, focusing on the challenges of achieving 100 Gb/in² in the linear tape format. The current highest achieved areal density demonstrations of 6.7 Gb/in² in the linear tape and 23.0 Gb/in² in the helical scan format provide a reference for this assessment. We argue that controlling the head-tape interaction is key to achieving high linear density, whereas track-following and reel-to-reel servomechanisms as well as transverse dimensional stability are key for achieving high track density. We envision that advancements in media, data-detection techniques, reel-to-reel control, and lateral motion control will enable much higher areal densities. An achievable goal is a linear density of 800 Kb/in and a track pitch of 0.2 µm, resulting in an areal density of 100 Gb/in².
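The stated target is internally consistent, as a quick check shows:

\[
\frac{25{,}400\ \mu\text{m/in}}{0.2\ \mu\text{m per track}} = 127{,}000\ \text{tracks/in},
\qquad
800\ \text{Kb/in} \times 127{,}000\ \text{tracks/in} \approx 1.0\times 10^{11}\ \text{b/in}^{2} \approx 100\ \text{Gb/in}^{2}.
\]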

Journal ArticleDOI
TL;DR: Algorithmic and IBM Blue Gene/L™ system-specific optimizations are employed to scale the CPAIMD method to at least 30 times the number of electronic states in small systems consisting of 24 to 768 atoms in order to demonstrate fine-grained parallelism.
Abstract: Important scientific problems can be treated via ab initio-based molecular modeling approaches, wherein atomic forces are derived from an energy function that explicitly considers the electrons. The Car-Parrinello ab initio molecular dynamics (CPAIMD) method is widely used to study small systems containing on the order of 10 to 10³ atoms. However, the impact of CPAIMD has been limited until recently because of difficulties inherent to scaling the technique beyond processor numbers about equal to the number of electronic states. CPAIMD computations involve a large number of interdependent phases with high interprocessor communication overhead. These phases require the evaluation of various transforms and non-square matrix multiplications that require large interprocessor data movement when efficiently parallelized. Using the Charm++ parallel programming language and runtime system, the phases are discretized into a large number of virtual processors, which are, in turn, mapped flexibly onto physical processors, thereby allowing interleaving of work. Algorithmic and IBM Blue Gene/L™ system-specific optimizations are employed to scale the CPAIMD method to at least 30 times the number of electronic states in small systems consisting of 24 to 768 atoms (32 to 1,024 electronic states) in order to demonstrate fine-grained parallelism. The largest systems studied scaled well across the entire machine (20,480 nodes).
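The over-decomposition idea, creating many more virtual processors than physical ones so that work can be interleaved, can be sketched in a few lines. The round-robin placement below is only a stand-in for the flexible, topology-aware mappings the paper describes, and all sizes are hypothetical.

```python
# Toy sketch of Charm++-style over-decomposition: the problem is cut into many
# virtual processors, which are then mapped onto far fewer physical processors.
# The round-robin placement here is a stand-in for the flexible, topology-aware
# mappings described in the paper; all sizes are hypothetical.

n_states   = 1024          # hypothetical number of electronic states
n_planes   = 64            # hypothetical decomposition of each state's data
n_physical = 20_480        # physical processors (BG/L-scale)

virtual_procs = [(s, p) for s in range(n_states) for p in range(n_planes)]
placement = {vp: i % n_physical for i, vp in enumerate(virtual_procs)}

per_pe = {}
for vp, pe in placement.items():
    per_pe.setdefault(pe, []).append(vp)

print(f"{len(virtual_procs)} virtual processors on {n_physical} physical ones;")
print(f"max work units per processor: {max(map(len, per_pe.values()))}")
```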

Journal ArticleDOI
TL;DR: The current status of single-event upsets caused by alpha-particles in IBM circuits and technology is reviewed to assess the importance of this issue for microprocessors requiring both high performance and high reliability.
Abstract: In this paper, we review the current status of single-event upsets caused by alpha-particles in IBM circuits and technology. While both alpha-particles and cosmic radiation can induce upsets, the alpha-particle-induced upset rate has become an increasingly important issue because alpha-particle-induced upsets are no longer limited to memory circuits. Latch circuits have become highly sensitive to alpha-particles. The alpha-particle-induced upset rate of latch circuits is one of the most critical issues for microprocessors requiring both high performance and high reliability.

Journal ArticleDOI
TL;DR: The experiment provides important evidence for the necessity of discriminating between horizontal and vertical conductivities for maximally consistent 3D CSEM inversions, and confirms that improved broadside data fits can be achieved by considering anisotropic electrical conductivities.
Abstract: Large-scale controlled-source electromagnetic (CSEM) three-dimensional (3D) geophysical imaging is now receiving considerable attention for electrical-conductivity mapping of potential offshore oil and gas reservoirs. To cope with the typically large computational requirements of the 3D CSEM imaging problem, our strategies exploit computational parallelism and optimized finite-difference meshing. We report on an imaging experiment utilizing 32,768 tasks (and processors) on the IBM Blue Gene/L™ (BG/L) supercomputer at the IBM T. J. Watson Research Center. Over a 24-hour period, we were able to image a large-scale marine CSEM field dataset that previously required more than 4 months of computing time on distributed clusters utilizing 1,024 tasks on an InfiniBand® fabric. The total initial data-fitting errors (i.e., "misfits") could be decreased by 67% within 72 completed inversion iterations, indicating the existence of an electrically resistive region in the southern survey area below a depth of 1,500 m underneath the seafloor. The major part of the residual misfit stems from transmitter-parallel receiver components that have an offset from the transmitter sail line (broadside configuration). Modeling confirms that improved broadside data fits can be achieved by considering anisotropic electrical conductivities. While delivering a satisfactory gross-scale image for the depths of interest, the experiment provides important evidence for the necessity of discriminating between horizontal and vertical conductivities for maximally consistent 3D CSEM inversions.

Journal ArticleDOI
TL;DR: This paper describes how anatomical structure is modeled by identifying, tabulating, and analyzing contacts between 10⁴ neurons in a morphologically precise model of a column in the Blue Brain Project.
Abstract: Simulating neural tissue requires the construction of models of the anatomical structure and physiological function of neural microcircuitry. The Blue Brain Project is simulating the microcircuitry of a neocortical column with very high structural and physiological precision. This paper describes how we model anatomical structure by identifying, tabulating, and analyzing contacts between 10⁴ neurons in a morphologically precise model of a column. A contact occurs when one element touches another, providing the opportunity for the subsequent creation of a simulated synapse. The architecture of our application divides the problem of detecting and analyzing contacts among thousands of processors on the IBM Blue Gene/L™ supercomputer. Data required for contact tabulation is encoded with geometrical data for contact detection and is exchanged among processors. Each processor selects a subset of neurons and then iteratively 1) divides the points that represent each neuron among column subvolumes, 2) detects contacts in a subvolume, 3) tabulates arbitrary categories of local contacts, 4) aggregates and analyzes global contacts, and 5) revises the contents of a column to achieve a statistical objective. Computing, analyzing, and optimizing local data in parallel across distributed global data objects involve problems common to other domains (such as three-dimensional image processing and registration). Thus, we discuss the generic nature of the application architecture.
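A drastically simplified, single-process sketch of subvolume-based contact detection follows: points belonging to different neurons that lie closer than a touch distance within the same subvolume are counted as candidate contacts. The data and thresholds are synthetic, and contacts straddling subvolume boundaries are ignored; the real application distributes this work across Blue Gene/L processors.

```python
import itertools, random
from collections import defaultdict

# Simplified serial sketch of subvolume-based contact detection: bin the points
# of each neuron's morphology into subvolumes, then test point pairs from
# different neurons only within a subvolume.  Data and thresholds are synthetic,
# and pairs that straddle subvolume boundaries are ignored in this toy version.

TOUCH = 0.05           # hypothetical touch distance
SUBVOL = 0.25          # hypothetical subvolume edge length

def subvolume(p):
    return tuple(int(c // SUBVOL) for c in p)

# Synthetic morphologies: each neuron is just a cloud of points here.
random.seed(1)
neurons = {n: [(random.random(), random.random(), random.random())
               for _ in range(200)] for n in range(20)}

bins = defaultdict(list)
for n, pts in neurons.items():
    for p in pts:
        bins[subvolume(p)].append((n, p))

contacts = defaultdict(int)
for members in bins.values():
    for (na, pa), (nb, pb) in itertools.combinations(members, 2):
        if na != nb and sum((a - b) ** 2 for a, b in zip(pa, pb)) < TOUCH ** 2:
            contacts[frozenset((na, nb))] += 1      # tabulate by neuron pair

print(f"{sum(contacts.values())} candidate contacts "
      f"between {len(contacts)} neuron pairs")
```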

Journal ArticleDOI
TL;DR: The Gyrokinetic Toroidal Code (GTC) was developed to study the global influence of microturbulence on particle and energy confinement, and has been optimized on the IBM Blue Gene/L™ (BG/L) computer, achieving essentially linear scaling on more than 30,000 processors.
Abstract: As the global energy economy makes the transition from fossil fuels toward cleaner alternatives, nuclear fusion becomes an attractive potential solution for satisfying growing needs. Fusion, the power source of the stars, has been the focus of active research since the early 1950s. While progress has been impressive--especially for magnetically confined plasma devices called tokamaks--the design of a practical power plant remains an outstanding challenge. A key topic of current interest is microturbulence, which is believed to be responsible for the unacceptably large leakage of energy and particles out of the hot plasma core. Understanding and controlling this process is of utmost importance for operating current devices and designing future ones. In addressing such issues, the Gyrokinetic Toroidal Code (GTC) was developed to study the global influence of microturbulence on particle and energy confinement. It has been optimized on the IBM Blue Gene/L™ (BG/L) computer, achieving essentially linear scaling on more than 30,000 processors. A full simulation of unprecedented phase-space resolution was carried out with 32,768 processors on the BG/L supercomputer located at the IBM T. J. Watson Research Center, providing new insights on the influence of collisions on microturbulence.

Journal ArticleDOI
H. H. K. Tang1
TL;DR: The IBM soft-error Monte Carlo model SEMM-2 is a new general-purpose simulation platform developed for single-event-effect (SEE) analysis of advanced CMOS (complementary metal-oxide semiconductor) technologies.
Abstract: The IBM soft-error Monte Carlo model SEMM-2 is a new general-purpose simulation platform developed for single-event-effect (SEE) analysis of advanced CMOS (complementary metal-oxide semiconductor) technologies. The current status and major features of this system are presented in this paper, including the physics model modules for the relevant atomic and nuclear processes, the construction and application of databases, and the simulation methodologies used to solve general transport problems. SEE analysis can be carried out for a large class of subatomic radiation particles in arbitrarily complex geometries and material compositions of integrated-circuit designs.

Journal ArticleDOI
TL;DR: An overview of Phaser, a toolset and methodology for modeling the effects of soft errors on the architectural and microarchitectural functionality of a system, and Phaser/M1, the early stage of the predictive modeling of behavior are presented.
Abstract: This paper presents an overview of Phaser, a toolset and methodology for modeling the effects of soft errors on the architectural and microarchitectural functionality of a system. The Phaser framework is used to understand the system-level effects of soft-error rates of a microprocessor chip as its design evolves through the phases of preconcept, concept, high-level design, and register-transfer-level design implementation. Phaser represents a strategic research vision that is being proposed as a next-generation toolset for predicting chip-level failure rates and studying reliability-performance tradeoffs during the phased design process. This paper primarily presents Phaser/M1, the early stage of the predictive modeling of behavior.

Journal ArticleDOI
TL;DR: Automated planners can assist administrators in making intelligent placement and resiliency decisions when provisioning for both new and existing applications.
Abstract: Introducing an application into a data center involves complex interrelated decision-making for the placement of data (where to store it) and resiliency in the event of a disaster (how to protect it). Automated planners can assist administrators in making intelligent placement and resiliency decisions when provisioning for both new and existing applications. Such planners take advantage of recent improvements in storage resource management and provide guided recommendations based on monitored performance data and storage models. For example, the IBM Provisioning Planner provides intelligent decision-making for the steps involved in allocating and assigning storage for workloads. It involves planning for the number, size, and location of volumes on the basis of workload performance requirements and hierarchical constraints, planning for the appropriate number of paths, and enabling access to volumes using zoning, masking, and mapping. The IBM Disaster Recovery (DR) Planner enables administrators to choose and deploy appropriate replication technologies spanning servers, the network, and storage volumes to provide resiliency to the provisioned application. The DR Planner begins with a list of high-level application DR requirements and creates an integrated plan that is optimized on criteria such as cost and solution homogeneity. The Planner deploys the selected plan using orchestrators that are responsible for failover and failback.
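The volume-placement step can be viewed as a constrained assignment problem. The sketch below is a deliberately naive greedy placement that checks only capacity and a crude performance headroom figure; the pool parameters and the headroom model are invented and much simpler than what the Provisioning Planner actually uses.

```python
# Toy greedy volume placement: for each requested volume, choose the storage
# pool with the most remaining performance headroom that also has the capacity.
# Pool figures and the headroom model are invented for illustration only.

pools = {                       # name: remaining capacity (GB), IOPS budget
    "pool_a": {"free_gb": 2000, "iops_left": 15000},
    "pool_b": {"free_gb": 500,  "iops_left": 40000},
}

requests = [                    # (volume name, size GB, expected IOPS)
    ("vol_logs", 100, 8000),
    ("vol_db",   400, 20000),
    ("vol_home", 800, 1000),
]

def place(requests, pools):
    plan = {}
    for name, size_gb, iops in requests:
        candidates = [p for p, s in pools.items()
                      if s["free_gb"] >= size_gb and s["iops_left"] >= iops]
        if not candidates:
            plan[name] = None                  # needs a new pool or replanning
            continue
        best = max(candidates, key=lambda p: pools[p]["iops_left"])
        pools[best]["free_gb"] -= size_gb      # reserve capacity and headroom
        pools[best]["iops_left"] -= iops
        plan[name] = best
    return plan

print(place(requests, pools))
```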

Journal ArticleDOI
TL;DR: Issues involved in utilizing such machines efficiently with the Rosetta code are discussed, including an overview of recent results of the Critical Assessment of Techniques for Protein Structure Prediction 7 (CASP7) in which the computationally demanding structure-refinement process was run on 16 racks of the IBM Blue Gene/L™ system.
Abstract: One of the key challenges in computational biology is prediction of three-dimensional protein structures from amino-acid sequences. For most proteins, the "native state" lies at the bottom of a free-energy landscape. Protein structure prediction involves varying the degrees of freedom of the protein in a constrained manner until it approaches its native state. In the Rosetta protein structure prediction protocols, a large number of independent folding trajectories are simulated, and several lowest-energy results are likely to be close to the native state. The availability of hundred-teraflop, and shortly, petaflop, computing resources is revolutionizing the approaches available for protein structure prediction. Here, we discuss issues involved in utilizing such machines efficiently with the Rosetta code, including an overview of recent results of the Critical Assessment of Techniques for Protein Structure Prediction 7 (CASP7) in which the computationally demanding structure-refinement process was run on 16 racks of the IBM Blue Gene/L™ system at the IBM T. J. Watson Research Center. We highlight recent advances in high-performance computing and discuss future development paths that make use of the next-generation petascale (>10¹⁵ floating-point operations per second) machines.
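The parallel pattern described, many independent folding trajectories from which the lowest-energy results are kept, is embarrassingly parallel. The toy sketch below uses a placeholder one-dimensional "energy" and random moves; it shows only the orchestration, not anything resembling Rosetta's score function or move set.

```python
import heapq, random
from multiprocessing import Pool

# Toy version of the "many independent trajectories, keep the lowest-energy
# results" pattern.  The energy function and move set are placeholders; real
# Rosetta trajectories sample torsion angles against a physical score function.

def trajectory(seed, steps=5000):
    rng = random.Random(seed)
    x = rng.uniform(-5.0, 5.0)
    best = float("inf")
    for _ in range(steps):
        x += rng.uniform(-0.1, 0.1)                    # random "move"
        score = (x - 3.0) ** 2 + rng.gauss(0.0, 0.01)  # stand-in energy
        best = min(best, score)
    return best, seed

if __name__ == "__main__":
    with Pool() as pool:
        results = pool.map(trajectory, range(1000))    # independent runs
    print(heapq.nsmallest(5, results))                 # keep the best decoys
```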

Journal ArticleDOI
TL;DR: As semiconductor devices decrease in size, soft errors are becoming a major issue that must be addressed at all stages of product definition, and circuit models are designed and modeled as needed.
Abstract: As semiconductor devices decrease in size, soft errors are becoming a major issue that must be addressed at all stages of product definition. Even before prototype silicon chips are available for measuring, modeling must be able to predict soft-error rates with reasonable accuracy. As the technology matures, circuit test sites are produced and experimentally tested to determine representative fail rates of critical SRAM and flip-flop circuits. Circuit models are then fit to these experimental results and further test-site and product circuits are designed and modeled as needed.

Journal ArticleDOI
T. J. Dell1
TL;DR: This paper describes some of the history of DRAM soft-error discovery and the subsequent development of mitigation strategies, and examines some architectural considerations that can exacerbate the effect ofDRAM soft errors and may have system-level implications for today's standard fault-tolerance schemes.
Abstract: While attention in the realm of computer design has shifted away from the classic DRAM soft-error rate (SER) and focused instead on SRAM and microprocessor latch sensitivities as sources of potential errors, DRAM SER nonetheless remains a challenging problem. This is true even though both cosmic ray-induced and alpha-particle-induced DRAM soft errors have been well modeled and, to a certain degree, well understood. However, the often-overlooked alignment of a DRAM hard error and a random soft error can have major reliability, availability, and serviceability (RAS) implications for systems that require an extremely long mean time between failures. The net of this effect is that what appears to be a well-behaved, single-bit soft error ends up overwhelming a seemingly state-of-the-art mitigation technique. This paper describes some of the history of DRAM soft-error discovery and the subsequent development of mitigation strategies. It then examines some architectural considerations that can exacerbate the effect of DRAM soft errors and may have system-level implications for today's standard fault-tolerance schemes.
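The alignment effect is easy to see in a toy model: with per-word SEC-DED (single-error-correct, double-error-detect) protection, a random soft error is harmless on its own but becomes uncorrectable when it strikes the one word that already carries a stuck (hard) bit. The simulation below abstracts away the code itself and simply counts bit errors per word; the word count and event count are arbitrary.

```python
import random

# Toy model of why a DRAM hard error plus a random soft error defeats per-word
# SEC-DED: the code corrects one bad bit per word, so any soft error striking
# the word that already has a stuck bit produces an uncorrectable double error.
# Word and event counts are arbitrary.

random.seed(0)
N_WORDS = 1 << 20                    # words, each covered by its own SEC-DED code
hard_faulty_word = 12345             # one word with a stuck (hard) bit

uncorrectable = 0
N_SOFT_EVENTS = 2_000_000
for _ in range(N_SOFT_EVENTS):
    word = random.randrange(N_WORDS)             # soft error hits a random word
    errors_in_word = 1 + (word == hard_faulty_word)
    if errors_in_word > 1:                       # SEC-DED corrects only one bit
        uncorrectable += 1

print(f"{uncorrectable} uncorrectable events out of {N_SOFT_EVENTS} soft errors")
print(f"expected about {N_SOFT_EVENTS / N_WORDS:.1f}")
```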

Journal ArticleDOI
TL;DR: Methods that are used to measure alpha-particle emissivity from semiconductor and packaging materials are discussed, as well as methods that were used and the results for life testing and accelerated SEU testing of modern devices.
Abstract: The susceptibility of modern integrated-circuit devices to single-event upsets (SEUs) depends on both the alpha-particle emission rate and the energy of the alpha-particles emitted. In addition, the terrestrial neutron energy and flux, which produce secondary charged fragments in the device and circuit at the location of operation, contribute to the SEU rate. In this paper, we discuss methods that are used to measure alpha-particle emissivity from semiconductor and packaging materials, as well as methods that we used and our results for life testing and accelerated SEU testing of modern devices.

Journal ArticleDOI
TL;DR: The key issues involved in achieving ultrastrong scaling of methodologically correct biomolecular simulations are reviewed, particularly the treatment of the long-range electrostatic forces present in simulations of proteins in water and membranes.
Abstract: N-body simulations present some of the most interesting challenges in the area of massively parallel computing, especially when the object is to improve the time to solution for a fixed-size problem. The Blue Matter molecular simulation framework was developed specifically to address these challenges, to explore programming models for massively parallel machine architectures in a concrete context, and to support the scientific goals of the IBM Blue Gene® Project. This paper reviews the key issues involved in achieving ultrastrong scaling of methodologically correct biomolecular simulations, particularly the treatment of the long-range electrostatic forces present in simulations of proteins in water and membranes. Blue Matter computes these forces using the particle-particle particle-mesh Ewald (P3ME) method, which breaks the problem up into two pieces, one that requires the use of three-dimensional fast Fourier transforms with global data dependencies and another that involves computing interactions between pairs of particles within a cutoff distance. We summarize our exploration of the parallel decompositions used to compute these finite-ranged interactions, describe some of the implementation details involved in these decompositions, and present the evolution of strong-scaling performance achieved over the course of this exploration, along with evidence for the quality of simulation achieved.
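The P3ME decomposition referred to here is, at its core, the standard Ewald splitting of the Coulomb interaction: a screening parameter β divides each 1/r term into a short-range piece summed directly within the cutoff and a smooth long-range piece evaluated on a mesh with three-dimensional FFTs:

\[
\frac{1}{r} \;=\; \underbrace{\frac{\operatorname{erfc}(\beta r)}{r}}_{\text{real space, within cutoff}}
\;+\; \underbrace{\frac{\operatorname{erf}(\beta r)}{r}}_{\text{reciprocal space, via 3D FFTs}} .
\]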

Journal ArticleDOI
D. Nagle1, M. E. Factor2, S. Iren, D. Naor2, Erik Riedel, Ohad Rodeh2, J. Satran2 
TL;DR: The rationale for OSDs is described, the ANSI T10 OSD V1.0 interface is highlighted, and three OSD implementations are presented: an OSD from Seagate, an IBM object-based storage prototype (ObjectStone), and the Panasas object- based distributed storage system.
Abstract: Object-based storage is the natural evolution of the block storage interface, aimed at efficiently and effectively meeting the performance, reliability, security, and service requirements demanded by current and future applications. The object-based storage interface provides an organizational container, called an object, into which higher-level software (e.g., file systems, databases, and user applications) can store both data and related attributes. In 2004, the ANSI (American National Standards Institute) T10 Standards body ratified an Object-based Storage Device (OSD) command set for SCSI (Small Computer System Interface) storage devices that implements the OSD interface. This paper describes the rationale for OSDs, highlights the ANSI T10 OSD V1.0 interface, and presents three OSD implementations: an OSD from Seagate, an IBM object-based storage prototype (ObjectStone), and the Panasas object-based distributed storage system.
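To convey the abstraction (not the T10 command set itself): an object store exposes operations on objects that carry both data and attributes, addressed by object identifier rather than block number. The minimal in-memory sketch below uses invented method names.

```python
# Minimal in-memory sketch of the object-storage abstraction: objects hold data
# plus attributes and are addressed by (partition, object id) rather than block
# number.  Method names are invented and do not mirror the T10 OSD command set.

class ObjectStore:
    def __init__(self):
        self._objects = {}

    def create(self, partition, oid, attributes=None):
        self._objects[(partition, oid)] = {"data": bytearray(),
                                           "attrs": dict(attributes or {})}

    def write(self, partition, oid, offset, data):
        buf = self._objects[(partition, oid)]["data"]
        buf[offset:offset + len(data)] = data

    def read(self, partition, oid, offset, length):
        buf = self._objects[(partition, oid)]["data"]
        return bytes(buf[offset:offset + length])

    def get_attr(self, partition, oid, key):
        return self._objects[(partition, oid)]["attrs"].get(key)

store = ObjectStore()
store.create(0, 42, {"owner": "db2", "quota_bytes": 1 << 30})
store.write(0, 42, 0, b"hello object storage")
print(store.read(0, 42, 0, 5), store.get_attr(0, 42, "owner"))
```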

Journal ArticleDOI
TL;DR: This paper describes ongoing research to develop ways to simplify and automate the day-to-day administrative tasks of monitoring, configuring, provisioning, managing change, analyzing configuration, managing performance, and determining problems and provides details of SMART (storage management analytics and reasoning technology) as a library that provides a collection of data-aggregation functions and optimization algorithms.
Abstract: Exponential growth in storage requirements and an increasing number of heterogeneous devices and application policies are making enterprise storage management a nightmare for administrators. Back-of-the-envelope calculations, rules of thumb, and manual correlation of individual device data are too error prone for the day-to-day administrative tasks of resource provisioning, problem determination, performance management, and impact analysis. Storage management tools have evolved over the past several years from standardizing the data reported by storage subsystems to providing intelligent planners. In this paper, we describe that evolution in the context of the IBM Total Storage® Productivity Center (TPC)--a suite of tools to assist administrators in the day-to-day tasks of monitoring, configuring, provisioning, managing change, analyzing configuration, managing performance, and determining problems. We describe our ongoing research to develop ways to simplify and automate these tasks by applying advanced analytics on the performance statistics and raw configuration and event data collected by TPC using the popular Storage Management Initiative-Specification (SMI-S). In addition, we provide details of SMART (storage management analytics and reasoning technology) as a library that provides a collection of data-aggregation functions and optimization algorithms.

Journal ArticleDOI
TL;DR: The soft-error resilience of the IBM POWER6™ processor I/O (input/output) subsystem was measured using proton beam irradiation to accelerate the effect of single-event upsets.
Abstract: The soft-error resilience of the IBM POWER6™ processor I/O (input/output) subsystem was measured using proton beam irradiation to accelerate the effect of single-event upsets. Test programs exercised each of the adapters on the chip. Error rates were measured for various cases ranging from idle to high I/O bandwidth and utilization. The POWER6 processor and I/O hub subsystem work together to maintain resiliency even under strenuous irradiation conditions.