
Showing papers in "Ibm Journal of Research and Development in 2007"


Journal ArticleDOI
TL;DR: Target applications including surface-roughening for on-chip decoupling capacitors, patterning nanocrystal floating gates for FLASH devices, and defining FET channel arrays are discussed.
Abstract: We are inspired by the beauty and simplicity of self-organizing materials and the promise they hold for enabling continued improvements in semiconductor technology. Self-assembly is the spontaneous arrangement of individual elements into regular patterns; under suitable conditions, certain materials self-organize into useful nanometer-scale patterns of importance to high-performance microelectronics applications. Polymer self-assembly is a nontraditional approach to patterning integrated circuit elements at dimensions and densities inaccessible to traditional lithography methods. We review here our efforts at IBM to develop and integrate self-assembly processes as high-resolution patterning alternatives and to demonstrate targeted applications in semiconductor device fabrication. We also provide a framework for understanding key requirements for the adoption of polymer self-assembly processes into semiconductor technology, as well as a discussion of the ultimate dimensional scalability of the technique.

450 citations


Journal ArticleDOI
TL;DR: It is shown that the Cell/B.E., or Cell Broadband Engine, processor can outperform other modern processors by approximately an order of magnitude and by even more in some cases.
Abstract: The Cell Broadband Engine™ (Cell/B.E.) processor is the first implementation of the Cell Broadband Engine Architecture (CBEA), developed jointly by Sony, Toshiba, and IBM. In addition to use of the Cell/B.E. processor in the Sony Computer Entertainment PLAYSTATION® 3 system, there is much interest in using it for workstations, media-rich electronics devices, and video and image processing systems. The Cell/B.E. processor includes one PowerPC® processor element (PPE) and eight synergistic processor elements (SPEs). The CBEA is designed to be well suited for a wide variety of programming models, and it allows for partitioning of work between the PPE and the eight SPEs. In this paper we show that the Cell/B.E. processor can outperform other modern processors by approximately an order of magnitude and by even more in some cases.

401 citations


Journal ArticleDOI
TL;DR: Key extensions to the coherence protocol enable POWER6 microprocessor-based systems to achieve better SMP scalability while enabling reductions in system packaging complexity and cost.
Abstract: This paper describes the implementation of the IBM POWER6™ microprocessor, a two-way simultaneous multithreaded (SMT) dual-core chip whose key features include binary compatibility with IBM POWER5™ microprocessor-based systems; increased functional capabilities, such as decimal floating-point and vector multimedia extensions; significant reliability, availability, and serviceability enhancements; and robust scalability with up to 64 physical processors. Based on a new industry-leading high-frequency core architecture with enhanced SMT and driven by a high-throughput symmetric multiprocessing (SMP) cache and memory subsystem, the POWER6 chip achieves a significant performance boost compared with its predecessor, the POWER5 chip. Key extensions to the coherence protocol enable POWER6 microprocessor-based systems to achieve better SMP scalability while enabling reductions in system packaging complexity and cost.

255 citations


Journal ArticleDOI
TL;DR: An overview of CellSs and a newly implemented scheduling algorithm is presented and an analysis of the results--both performance measures and a detailed analysis with performance analysis tools--was performed and is presented here.
Abstract: With the appearance of new multicore processor architectures, there is a need for new programming paradigms, especially for heterogeneous devices such as the Cell Broadband Engine™ (Cell/B.E.) processor. CellSs is a programming model that addresses the automatic exploitation of functional parallelism from a sequential application with annotations. The focus is on the flexibility and simplicity of the programming model. Although the concept and programming model are general enough to be extended to other devices, its current implementation has been tailored to the Cell/B.E. device. This paper presents an overview of CellSs and a newly implemented scheduling algorithm. An analysis of the results--both performance measures and a detailed analysis with performance analysis tools--was performed and is presented here.

128 citations


Journal ArticleDOI
TL;DR: Together, the sensing, actuation, and management support available in the POWER6 processor enables higher performance, greater energy efficiency, and new power management capabilities such as power and thermal capping and power savings with explicit performance control.
Abstract: The IBM POWER6™ microprocessor chip supports advanced, dynamic power management solutions for managing not just the chip but the entire server. The design facilitates a programmable power management solution for greater flexibility and integration into system- and data-center-wide management solutions. The design of the POWER6 microprocessor provides real-time access to detailed and accurate information on power, temperature, and performance. Together, the sensing, actuation, and management support available in the POWER6 processor, known as the EnergyScale™ architecture, enables higher performance, greater energy efficiency, and new power management capabilities such as power and thermal capping and power savings with explicit performance control. This paper provides an overview of the innovative design of the POWER6 processor that enables these advanced, dynamic system power management solutions.

113 citations


Journal ArticleDOI
TL;DR: The organization of the architecture is described, as well as the instruction set, commands, and facilities defined in the architecture, and the motivation for these facilities is explained and examples are provided to illustrate their intended use.
Abstract: This paper provides an overview of the Cell Broadband Engine™ Architecture (CBEA). The CBEA defines a revolutionary extension to a more conventional processor organization and serves as the basis for the development of microprocessors targeted at the computer entertainment, multimedia, and real-time market segments. In this paper, the organization of the architecture is described, as well as the instruction set, commands, and facilities defined in the architecture. In many cases, the motivation for these facilities is explained and examples are provided to illustrate their intended use. In addition, this paper introduces the Software Development Kit and the software standards for a CBEA-compliant processor.

107 citations


Journal ArticleDOI
TL;DR: The IBM POWER6 processor--on which a substantial amount of area has been devoted to increasing performance of both scientific and commercial workloads--is the first commercial hardware implementation of the decimal format defined in the IEEE 754R Floating-Point Arithmetic Standard.
Abstract: The IBM POWER6™ microprocessor core includes two accelerators for increasing performance of specific workloads. The vector multimedia extension (VMX) provides a vector acceleration of graphic and scientific workloads. It provides single instructions that work on multiple data elements. The instructions separate a 128-bit vector into different components that are operated on concurrently. The decimal floating-point unit (DFU) provides acceleration of commercial workloads, more specifically, financial transactions. It provides a new number system that performs implicit rounding to decimal radix points, a feature essential to monetary transactions. The IBM POWER™ processor instruction set is substantially expanded with the addition of these two accelerators. The VMX architecture contains 176 instructions, while the DFU architecture adds 54 instructions to the base architecture. The IEEE 754R Floating-Point Arithmetic Standard defines decimal floating-point formats, and the POWER6 processor--on which a substantial amount of area has been devoted to increasing performance of both scientific and commercial workloads--is the first commercial hardware implementation of this format.
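The value of a decimal radix for monetary arithmetic can be illustrated in software. The sketch below uses Python's `decimal` module as a stand-in for the hardware DFU: binary floating point cannot represent 0.10 exactly, a decimal representation can, and rounding to a decimal radix point is a single operation. The amounts are invented; the POWER6 DFU implements the decimal formats in hardware rather than in a library.

```python
from decimal import Decimal, ROUND_HALF_EVEN

# Binary floating point cannot represent 0.10 exactly, so summing
# monetary amounts accumulates representation error:
binary_total = sum([0.10] * 3)               # not exactly 0.3

# A decimal radix represents 0.10 exactly, so the same sum is exact:
decimal_total = sum([Decimal("0.10")] * 3)   # exactly 0.30

# Rounding to a decimal radix point (two places, banker's rounding) --
# the kind of operation a decimal floating-point unit accelerates:
price = Decimal("2.675")
rounded = price.quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN)
```

In binary the three-way sum differs from 0.3 in the last bit, which is exactly the class of error that is unacceptable in financial bookkeeping.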

101 citations


Journal ArticleDOI
TL;DR: By applying constraint programming, a subfield of artificial intelligence, this paper is able to deal successfully with the complex constraints encountered in the field and reach near-optimal assignments that take into account all resources and positions in the pool.
Abstract: Matching highly skilled people to available positions is a high-stakes task that requires careful consideration by experienced resource managers. A wrong decision may result in significant loss of value due to understaffing, underqualification or overqualification of assigned personnel, and high turnover of poorly matched workers. While the importance of quality matching is clear, dealing with pools of hundreds of jobs and resources in a dynamic market generates a significant amount of pressure to make decisions rapidly. We present a novel solution designed to bridge the gap between the need for high-quality matches and the need for timeliness. By applying constraint programming, a subfield of artificial intelligence, we are able to deal successfully with the complex constraints encountered in the field and reach near-optimal assignments that take into account all resources and positions in the pool. The considerations include constraints on job role, skill level, geographical location, language, potential retraining, and many more. Constraints are applied at both the individual and team levels. This paper introduces the technology and then describes its use by IBM Global Services, where large numbers of service and consulting employees are considered when forming teams assigned to customer projects.
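As a rough illustration of the constraint-programming idea, the sketch below assigns candidates to positions by backtracking search over hard constraints. The positions, candidates, and constraints are invented and far simpler than the job-role, skill-level, language, and team-level constraints the paper describes.

```python
# Toy constraint-satisfaction matcher: assign each open position a
# distinct candidate satisfying hard constraints (role and location).
positions = {
    "arch-paris": {"role": "architect", "location": "Paris"},
    "dev-berlin": {"role": "developer", "location": "Berlin"},
}
candidates = {
    "alice": {"role": "architect", "location": "Paris"},
    "bob":   {"role": "developer", "location": "Berlin"},
    "carol": {"role": "developer", "location": "Paris"},
}

def feasible(pos, cand):
    """Hard constraints: role must match and locations must agree."""
    return (positions[pos]["role"] == candidates[cand]["role"]
            and positions[pos]["location"] == candidates[cand]["location"])

def solve(open_positions, used=frozenset()):
    """Backtracking search: one complete feasible assignment, or None."""
    if not open_positions:
        return {}
    pos, rest = open_positions[0], open_positions[1:]
    for cand in candidates:
        if cand not in used and feasible(pos, cand):
            sub = solve(rest, used | {cand})
            if sub is not None:
                return {pos: cand, **sub}
    return None

assignment = solve(list(positions))
```

Real constraint-programming engines add propagation and optimization over soft constraints; this shows only the search skeleton.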

91 citations


Journal ArticleDOI
TL;DR: The physical and logical implementation of eFUSEs has resulted in improved yield at wafer, module, and final assembly test levels, and has provided additional flexibility in logic function and in system use.
Abstract: IBM System z9™ is the first zSeries® product to use electronic fuses (eFUSEs). The blowing of the fuse does not involve a physical rupture of the fuse element, but rather causes electromigration of the silicide layer, substantially increasing the resistance. The fuse is "blown" with the application of a higher-than-nominal voltage. eFUSEs provide several compelling advantages over the laser fuses they have replaced: the blow process does not risk damage to adjacent devices, eFUSEs can be blown by a logic process instead of a physical laser ablation method, eFUSEs are substantially smaller than laser fuses, and they scale better with process improvements. Finally, since no specialized equipment or separate product flow is required, eFUSEs can be blown at multiple test and application stages. We discuss circuit design, fuse programming, test considerations, and z9™ system applications. The physical and logical implementation of eFUSEs has resulted in improved yield at wafer, module, and final assembly test levels, and has provided additional flexibility in logic function and in system use.

89 citations


Journal ArticleDOI
TL;DR: An overview of this implementation of the newly defined decimal floating-point (DFP) format is presented and some measurement of the performance gained using hardware assists is provided.
Abstract: Although decimal arithmetic is widely used in commercial and financial applications, the related computations are handled in software. As a result, applications that use decimal data may experience performance degradations. Use of the newly defined decimal floating-point (DFP) format instead of binary floating-point is expected to significantly improve the performance of such applications. System z9™ is the first IBM machine to support the DFP instructions. We present an overview of this implementation and provide some measurement of the performance gained using hardware assists. Various tools and techniques employed for the DFP verification on unit, element, and system levels are presented in detail. Several groups within IBM collaborated on the verification of the new DFP facility, using a common reference model to predict DFP results.

66 citations


Journal ArticleDOI
TL;DR: This work presents a network-flow-based crew-optimization model that can be applied at the tactical, planning, and strategic levels of crew scheduling, and develops several highly efficient algorithms using problem decomposition and relaxation techniques that use the special structure of the underlying network model to obtain significant increases in speed.
Abstract: We present our solution to the crew-scheduling problem for North American railroads. (Crew scheduling in North America is very different from scheduling in Europe, where it has been well studied.) The crew-scheduling problem is to assign operators to scheduled trains over a time horizon at minimal cost while honoring operational and contractual requirements. Currently, decisions related to crew are made manually. We present our work developing a network-flow-based crew-optimization model that can be applied at the tactical, planning, and strategic levels of crew scheduling. Our network flow model maps the assignment of crews to trains as the flow of crews on an underlying network, where different crew types are modeled as different commodities in this network. We formulate the problem as an integer programming problem on this network, which allows it to be solved to optimality. We also develop several highly efficient algorithms using problem decomposition and relaxation techniques, in which we use the special structure of the underlying network model to obtain significant increases in speed. We present very promising computational results of our algorithms on the data provided by a major North American railroad. Our network flow model is likely to form a backbone for a decision-support system for crew scheduling.
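The assignment objective can be illustrated on a toy instance. The sketch below exhaustively searches one-to-one crew-to-train assignments for the minimum total cost; the paper's model instead treats crew types as commodities flowing on a network and solves an integer program, which scales far beyond this enumeration. All costs are invented.

```python
from itertools import permutations

# cost[(crew, train)]: cost of having that crew cover that train.
crews = ["crew_A", "crew_B", "crew_C"]
trains = ["T1", "T2", "T3"]
cost = {
    ("crew_A", "T1"): 4, ("crew_A", "T2"): 2, ("crew_A", "T3"): 8,
    ("crew_B", "T1"): 3, ("crew_B", "T2"): 7, ("crew_B", "T3"): 5,
    ("crew_C", "T1"): 6, ("crew_C", "T2"): 4, ("crew_C", "T3"): 3,
}

def best_assignment():
    """Brute-force minimum-cost one-to-one assignment of crews to trains."""
    best, best_cost = None, float("inf")
    for perm in permutations(crews):
        c = sum(cost[(crew, train)] for crew, train in zip(perm, trains))
        if c < best_cost:
            best, best_cost = dict(zip(trains, perm)), c
    return best, best_cost

assignment, total = best_assignment()
```

A network-flow formulation replaces this factorial search with a polynomially sized model, which is what makes realistic instances tractable.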

Journal ArticleDOI
TL;DR: The proposed approach, which includes enhanced data mining methodology and state-of-the-art optimization technology, is applicable to settings in which a large amount of data must be analyzed in order to discover relevant relationships.
Abstract: We introduce a simulation optimization approach that is effective in guiding the search for optimal values of input parameters to a simulation model. Our proposed approach, which includes enhanced data mining methodology and state-of-the-art optimization technology, is applicable to settings in which a large amount of data must be analyzed in order to discover relevant relationships. Our approach makes use of optimization technology not only for data mining but also for optimizing the underlying simulation model itself. A market research application embodying agent-based simulation is used to illustrate our proposed approach.
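A simulation-optimization loop of this kind treats the simulation as a noisy black box whose input parameters are searched. The sketch below uses random search over a toy "market" simulator with an interior optimum; the simulator, noise model, and search strategy are stand-ins for the paper's agent-based model and optimization technology.

```python
import random

def simulate_market(price, rng):
    """Toy simulator: noisy revenue with an interior optimum near price=5."""
    demand = max(0.0, 10.0 - price) + rng.gauss(0.0, 0.1)
    return price * demand

def optimize(n_iters=200, seed=7):
    """Random search over the input parameter, averaging replications
    to dampen simulation noise before comparing candidates."""
    rng = random.Random(seed)
    best_price, best_rev = None, float("-inf")
    for _ in range(n_iters):
        price = rng.uniform(0.0, 10.0)        # candidate input parameter
        rev = sum(simulate_market(price, rng) for _ in range(20)) / 20
        if rev > best_rev:
            best_price, best_rev = price, rev
    return best_price, best_rev

best_price, best_rev = optimize()
```

Smarter search (metaheuristics, surrogate models) replaces the uniform sampling here when evaluations are expensive.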

Journal ArticleDOI
TL;DR: This work presents a methodology developed for the GBS organization of IBM to enable automated generation of staffing plans involving specific job roles, skill sets, and employee experience levels, and presents results of applying the clustering and staffing plan generation methodologies.
Abstract: In order to successfully deliver a labor-based professional service, the right people with the right skills must be available to deliver the service when it is needed. Meeting this objective requires a systematic, repeatable approach for determining the staffing requirements that enable informed staffing management decisions. We present a methodology developed for the Global Business Services (GBS) organization of IBM to enable automated generation of staffing plans involving specific job roles, skill sets, and employee experience levels. The staffing plan generation is based on key characteristics of the expected project as well as selection of a project type from a project taxonomy that maps to staffing requirements. The taxonomy is developed using statistical clustering techniques applied to labor records from a large number of historical GBS projects. We describe the steps necessary to process the labor records so that they are in a form suitable for analysis, as well as the clustering methods used for analysis, and the algorithm developed to dynamically generate a staffing plan based on a selected group. We also present results of applying the clustering and staffing plan generation methodologies to a variety of GBS projects.
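The clustering step can be sketched with a minimal k-means over invented project feature vectors, here the fraction of hours charged to each job role. The real taxonomy is built from many labor records with richer features; this shows only the mechanics of the grouping.

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain k-means: alternate nearest-center assignment and mean update."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda j: sum((a - b) ** 2
                                      for a, b in zip(p, centers[j])))
            clusters[i].append(p)
        # Update step: move each center to its cluster mean.
        for j, cl in enumerate(clusters):
            if cl:
                centers[j] = tuple(sum(dim) / len(cl) for dim in zip(*cl))
    return centers, clusters

# (architect_share, developer_share) per historical project -- invented:
projects = [(0.8, 0.2), (0.75, 0.25), (0.1, 0.9), (0.15, 0.85)]
centers, clusters = kmeans(projects, k=2)
```

Each resulting cluster center is a prototype staffing mix, the kind of group a new project is matched against when generating its staffing plan.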

Journal ArticleDOI
TL;DR: The primary focus of this paper is on the algorithms and the firmware structure used in the EnergyScale architecture, although it also provides the system design considerations needed to support performance-aware power management.
Abstract: With increasing processor speed and density, denser system packaging, and other technology advances, system power and heat have become important design considerations. The introduction of new technology, including denser circuits, improved lithography, and higher clock speeds, means that power consumption and heat generation, which are already significant problems with older systems, are significantly greater with IBM POWER6™ processor-based designs, including both standalone servers and those implemented as blades for the IBM BladeCenter® product line. In response, IBM has developed the EnergyScale™ architecture, a system-level power management implementation for POWER6 processor-based machines. The EnergyScale architecture uses the basic power control facilities of the POWER6 chip, together with additional board-level hardware, firmware, and systems software, to provide a complete power and thermal management solution. The EnergyScale architecture is performance aware, taking into account the characteristics of the executing workload to ensure that it meets the goals specified by the user while reducing power consumption. This paper introduces the EnergyScale architecture and describes its implementation in two representative platform designs: an eight-way, rack-mounted machine and a server blade. The primary focus of this paper is on the algorithms and the firmware structure used in the EnergyScale architecture, although it also provides the system design considerations needed to support performance-aware power management. In addition, it describes the extensions and modifications to power management that are necessary to span the range of POWER6 processor-based system designs.
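A power-capping control loop of the general kind described can be sketched as follows: firmware periodically compares measured power against a user-set cap, steps frequency down when over the cap, and steps it back up when there is headroom. The power model, step size, and dead band below are invented for illustration and are not the actual EnergyScale algorithm.

```python
def step_frequency(freq_ghz, measured_power_w, cap_w,
                   f_min=2.0, f_max=4.0, step=0.1):
    """One iteration of the control loop: return the next frequency."""
    if measured_power_w > cap_w:
        return max(f_min, freq_ghz - step)   # throttle to respect the cap
    if measured_power_w < 0.95 * cap_w:
        return min(f_max, freq_ghz + step)   # reclaim performance headroom
    return freq_ghz                          # inside the dead band: hold

def power_model(freq_ghz):
    """Toy model: power grows roughly as f^3 when voltage scales with f."""
    return 2.0 * freq_ghz ** 3

def run_to_steady_state(cap_w, freq_ghz=4.0, iters=50):
    """Iterate the loop until the frequency settles under the cap."""
    for _ in range(iters):
        freq_ghz = step_frequency(freq_ghz, power_model(freq_ghz), cap_w)
    return freq_ghz

steady = run_to_steady_state(cap_w=60.0)
```

The dead band keeps the loop from oscillating around the cap; real firmware works from measured sensor data and coordinates with board-level actuators.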

Journal ArticleDOI
TL;DR: The problem prior to the development of NPIV, the concept ofNPIV, and the first implementation of this technique in the FCP channel of the IBM System z9 are described.
Abstract: The IBM System z9™ and its predecessors pioneered server virtualization, including the sharing of data storage subsystems among the virtual servers of a host computer using the channel-sharing capabilities of FICON® channels in Fibre Channel (FC) fabrics. Now industry-standard Small Computer System Interface (SCSI) devices in storage area networks must be shared among host computers using the Fibre Channel Protocol (FCP), and this has been problematic with virtual servers in a host computer. To apply the power of server virtualization to this environment, the IBM System z9 implements a new FC standard called N_Port Identifier Virtualization (NPIV). IBM invented NPIV and offered it as a standard to enable the sharing of host adapters in IBM servers and FC fabrics. With NPIV, a host FC adapter is shared in such a way that each virtual adapter is assigned to a virtual server and is separately identifiable within the fabric. Connectivity and access privileges within the fabric are controlled by identification of each virtual adapter and, hence, the virtual server using each virtual adapter. This paper describes the problem prior to the development of NPIV, the concept of NPIV, and the first implementation of this technique in the FCP channel of the IBM System z9.

Journal ArticleDOI
TL;DR: The Cell Broadband Engine (Cell/B.E.) processor was developed by Sony, Toshiba, and IBM engineers to deliver a high-speed, high-performance, multicore processor that brings supercomputer performance via a custom system-on-a-chip (SoC) implementation.
Abstract: The Cell Broadband Engine™ (Cell/B.E.) processor was developed by Sony, Toshiba, and IBM engineers to deliver a high-speed, high-performance, multicore processor that brings supercomputer performance via a custom system-on-a-chip (SoC) implementation. To achieve its goals, the Cell/B.E. processor uses an innovative architecture, new circuit design styles, and hierarchical integration and verification techniques. The Cell/B.E. processor design point was also targeted at high-volume manufacturing. To meet high-volume manufacturing requirements, the chip was designed so that it could be completely tested in less than 26 seconds. In addition to the above items, the Cell/B.E. processor was designed with the "triple design constraints" of maximizing performance while minimizing area and power consumed. The initial application was targeted at real-time systems that require high-speed data movement for both on-chip and off-chip transfers. This application also required very high speed compute and real-time response processes.

Journal ArticleDOI
TL;DR: Emphasis is placed on aspects of the design methodology, technology, clock distribution, integration, chip analysis, power and performance, random logic macro (RLM), and design data management processes that enabled the design to be completed and the project goals to be met.
Abstract: The IBM POWER6™ microprocessor is a 790-million-transistor chip that runs at a clock frequency of greater than 4 GHz. The complexity and size of the POWER6 microprocessor, together with its high operating frequency, present a number of significant challenges. This paper describes the physical design and design methodology of the POWER6 processor. Emphasis is placed on aspects of the design methodology, technology, clock distribution, integration, chip analysis, power and performance, random logic macro (RLM), and design data management processes that enabled the design to be completed and the project goals to be met.

Journal ArticleDOI
Jon Lee1
TL;DR: This paper examines various aspects of modeling and solution via mixed-integer nonlinear programming, concentrating on an aspect of a classical facility location problem that is well-known in the MILP literature, although here the authors consider a nonlinear objective function.
Abstract: We examine various aspects of modeling and solution via mixed-integer nonlinear programming (MINLP). MINLP has much to offer as a powerful modeling paradigm. Recently, significant advances have been made in MINLP solution software. To fully realize the power of MINLP to solve complex business optimization problems, we need to develop knowledge and expertise concerning MINLP modeling and solution methods. Some of this can be drawn from conventional wisdom of mixed-integer linear programming (MILP) and nonlinear programming (NLP), but theoretical and practical issues exist that are specific to MINLP. This paper discusses some of these, concentrating on an aspect of a classical facility location problem that is well-known in the MILP literature, although here we consider a nonlinear objective function.
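A tiny instance in the spirit of the facility-location example: binary variables decide which facilities open, and the objective mixes fixed opening costs with a nonlinear (square-root) service cost. The data are invented, and the exhaustive enumeration below stands in for a real MINLP solver's branch-and-bound.

```python
from itertools import product
import math

open_cost = [10.0, 12.0]            # fixed cost to open facility j
dist = [[1.0, 4.0],                 # dist[i][j]: customer i to facility j
        [3.0, 1.0],
        [4.0, 2.0]]

def total_cost(open_flags):
    """Each customer is served by its nearest open facility; the sqrt
    service cost makes the objective nonlinear in the distances."""
    if not any(open_flags):
        return math.inf               # infeasible: no facility open
    fixed = sum(c for c, y in zip(open_cost, open_flags) if y)
    service = sum(min(math.sqrt(dist[i][j])
                      for j, y in enumerate(open_flags) if y)
                  for i in range(len(dist)))
    return fixed + service

# Enumerate all 0/1 choices of the integer variables.
best = min(product([0, 1], repeat=2), key=total_cost)
```

With a linear objective this is a classical MILP facility-location model; the nonlinear service term is exactly the kind of twist that moves it into MINLP territory.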

Journal ArticleDOI
TL;DR: The Cell Broadband Engine™ (Cell/B.E.) processor, developed jointly by Sony, Toshiba, and IBM primarily for next-generation gaming consoles, packs a level of floating-point, vector, and integer streaming performance in one chip that is an order of magnitude greater than that of traditional commodity microprocessors.
Abstract: The Cell Broadband Engine™ (Cell/B.E.) processor, developed jointly by Sony, Toshiba, and IBM primarily for next-generation gaming consoles, packs a level of floating-point, vector, and integer streaming performance in one chip that is an order of magnitude greater than that of traditional commodity microprocessors. Cell/B.E. blades are server and supercomputer building blocks that use the Cell/B.E. processor, the high-volume IBM BladeCenter® server platform, high-speed commodity networks, and open-system software. In this paper we present the design of the Cell/B.E. blades and discuss several early application prototypes and results.

Journal ArticleDOI
TL;DR: This paper describes the state-of-the-art reliability features of the IBM POWER6™ microprocessor, which includes a high degree of detection of soft and hard errors in both dataflow and control logic, as well as a feature--instruction retry recovery (IRR)--usually available only on mainframe systems.
Abstract: This paper describes the state-of-the-art reliability features of the IBM POWER6™ microprocessor. The POWER6 microprocessor includes a high degree of detection of soft and hard errors in both dataflow and control logic, as well as a feature--instruction retry recovery (IRR)--usually available only on mainframe systems. IRR provides full hardware error recovery of those registers that are defined by the instruction set architecture. This is accomplished by taking a checkpoint of the defined state for both of the core threads and recovering the machine state back to a known good point. To allow changing memory accessibility without using different page table entries, the POWER6 microprocessor implements virtual page class keys, a new architectural extension that enables the OS (operating system) to manage eight classes of memory with efficiently modifiable access authority for each class. With this feature, malfunctioning kernel extensions can be prevented from destroying OS data that may, in turn, bring an OS down.

Journal ArticleDOI
TL;DR: The Cell Broadband Engine™ (Cell/B.E.) processor security architecture has three core features that are well suited for this purpose: hardware-enforced process isolation in which code and data can execute in physically isolated memory space, and a hardware key to act as the root of an encryption chain.
Abstract: Current data protection technologies such as those based on public-key encryption and broadcast encryption focus on the secure control and protection of data. Although these protection schemes are effective and mathematically sound, they are susceptible to systematic attacks that utilize any underlying platform weakness, bypassing the cryptographic strengths of the actual schemes. Thus, ensuring that the computing platform supports the cryptographic data protection layers is a critical issue. The Cell Broadband Engine™ (Cell/B.E.) processor security architecture has three core features that are well suited for this purpose. It provides hardware-enforced process isolation in which code and data can execute in physically isolated memory space. It also provides the ability to perform hardware-supported authentication of any software stack (i.e., "secure boot") during runtime. Finally, the architecture provides a hardware key to act as the root of an encryption chain. Data encrypted directly or indirectly by this key can be decrypted and provided only to an application that is running in the isolated memory and that has been verified. This significantly reduces an adversary's chances of manipulating software to expose the key that is fundamental to a data protection or authentication scheme. Furthermore, it provides a foundation for an application to attest itself to a remote party by demonstrating access to a secret.

Journal ArticleDOI
TL;DR: Two RCP models are described and two approaches to solving each of them are presented, which solve two core resource planning problems, gap/glut analysis and resource action planning.
Abstract: The IBM Research Division has developed the Resource Capacity Planning (RCP) Optimizer to support the Workforce Management Initiative (WMI) of IBM. RCP applies supply chain management techniques to the problem of planning the needs of IBM for skilled labor in order to satisfy service engagements, such as consulting, application development, or customer support. This paper describes two RCP models and presents two approaches to solving each of them. We also describe the motivation for using one approach over another. The models are built using the Watson Implosion Technology toolkit, which consists of a supply chain model, solvers for analysis and optimization, and an Application Programming Interface (API) for developing a solution. The models that we built solve two core resource planning problems, gap/glut analysis and resource action planning. The gap/glut analysis is similar to material requirements planning (MRP), in which shortages (gaps) and excesses (gluts) of resources are determined on the basis of expected demand. The goal of the resource action planning problem is to determine what resource actions to take in order to fill the gaps and reduce the gluts. The gap/glut analysis engine is currently deployed within the IBM service organization to report gaps and gluts in personnel.
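The gap/glut computation can be stated very compactly: for each job role, compare expected demand with available supply; unmet demand is a gap and excess supply is a glut. The sketch below uses invented numbers and omits the time-phased, supply-chain aspects of the deployed engine.

```python
# Headcount by job role -- illustrative data only.
supply = {"java_dev": 40, "project_mgr": 12, "db_admin": 5}
demand = {"java_dev": 55, "project_mgr": 9, "tester": 7}

def gap_glut(supply, demand):
    """MRP-style netting: positive net demand is a gap (shortage),
    negative net demand is a glut (excess)."""
    roles = set(supply) | set(demand)
    net = {r: demand.get(r, 0) - supply.get(r, 0) for r in roles}
    gaps = {r: v for r, v in net.items() if v > 0}
    gluts = {r: -v for r, v in net.items() if v < 0}
    return gaps, gluts

gaps, gluts = gap_glut(supply, demand)
```

The resource-action planning step then decides how to close each gap (hiring, retraining, reassignment) and work down each glut.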

Journal ArticleDOI
A. Labbi1, C. Berrospi1
TL;DR: A new tool is described, the IBM Customer Equity Lifetime Management Solution (CELM), that helps to determine long-term customer value by means of dynamic programming algorithms in order to identify which marketing actions are the most effective in improving customer loyalty and hence increasing revenue.
Abstract: Many companies have no reliable way to determine whether their marketing money has been spent effectively, and their return on investment is often not evaluated in a systematic manner. Thus, a compelling need exists for computational tools that help companies to optimize their marketing strategies. For this purpose, we have developed computational models of customer buying behavior in order to determine and leverage the value generated by a customer within a given time frame. The term "customer value" refers to the revenue generated from a customer's buying behavior in relation to the costs of marketing campaigns. We describe a new tool, the IBM Customer Equity Lifetime Management Solution (CELM), that helps to determine long-term customer value by means of dynamic programming algorithms in order to identify which marketing actions are the most effective in improving customer loyalty and hence increasing revenue. Simulation of marketing scenarios may be performed in order to assess budget requirements and the expected impact of marketing policies. We present a case study of a pilot program with a leading European airline, and we show how this company optimized its frequent flyer program to reduce its marketing budget and increase customer value.
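The dynamic-programming core can be illustrated with value iteration on a toy customer model: loyalty states, marketing actions with costs, transition probabilities, and per-state revenue. All numbers are invented; the point is that the optimal policy maximizes discounted revenue net of marketing cost.

```python
states = ["casual", "loyal"]
actions = ["none", "promo"]
revenue = {"casual": 10.0, "loyal": 50.0}      # per-period revenue by state
action_cost = {"none": 0.0, "promo": 5.0}      # cost of each marketing action
# P[(state, action)][next_state]: transition probability.
P = {
    ("casual", "none"):  {"casual": 0.9, "loyal": 0.1},
    ("casual", "promo"): {"casual": 0.6, "loyal": 0.4},
    ("loyal",  "none"):  {"casual": 0.5, "loyal": 0.5},
    ("loyal",  "promo"): {"casual": 0.1, "loyal": 0.9},
}

def customer_value(gamma=0.9, iters=500):
    """Value iteration: long-term discounted customer value per state,
    plus the marketing action that achieves it."""
    V = {s: 0.0 for s in states}
    for _ in range(iters):
        V = {s: max(revenue[s] - action_cost[a]
                    + gamma * sum(p * V[t] for t, p in P[(s, a)].items())
                    for a in actions)
             for s in states}
    policy = {s: max(actions,
                     key=lambda a: revenue[s] - action_cost[a]
                     + gamma * sum(p * V[t] for t, p in P[(s, a)].items()))
              for s in states}
    return V, policy

V, policy = customer_value()
```

On this invented model the promotion pays for itself in both states because it raises the probability of staying in (or reaching) the high-revenue loyal state.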

Journal ArticleDOI
D. W. Plass1, Y.H. Chan1
TL;DR: Key elements of the POWER6 processor chip arrays include paradigm shifts such as thin memory cell layout, large signal read (without a sense amplifier), segmented bitline structure, unclamped column-half-select scheme, multidimensional programmable timing control, and separate elevated static random access memory (SRAM) power supply.
Abstract: The IBM POWER6™ microprocessor presented new challenges to array design because of its high-frequency requirement and its use of 65-nm silicon-on-insulator (SOI) technology. Advancements in performance (2X to 3X improvement over the 90-nm generation) and design margins (cell stability, writability, and redundancy coverage) were major focus areas. Key elements of the POWER6 processor chip arrays include paradigm shifts such as thin memory cell layout, large signal read (without a sense amplifier), segmented bitline structure, unclamped column-half-select scheme, multidimensional programmable timing control, and separate elevated static random access memory (SRAM) power supply. There are two main array categories on the POWER6 microprocessor chip: core and nest. Processor core arrays use a single-port, 0.75-µm2, six-transistor (6T) cell and operate at full frequency, whereas the surrounding nest arrays use a smaller 0.65-µm2 cell that operates at half or one-quarter of the core frequency in order to achieve better density and power efficiency. The core arrays include the 96-KB instruction cache (I-cache) and the 64-KB data cache (D-cache), with associate lookup-path SRAM macros. The I-cache is a four-way set-associative, single-port design, whereas the D-cache is an eight-way design with dual read ports to handle multithreading capability. The lookup-path arrays contain content-addressable memory (CAM) and RAM macros with integrated dynamic hit logic circuitry. In the nest portion, an 8-MB level 2 (L2) D-cache and a level 3 (L3) directory (1.2 MB) make up the largest arrays. The latter macro designs use longer bitlines and orthogonal word-decode layouts to achieve high array-area efficiency.

Journal ArticleDOI
Derrin M. Berger1, Jonathan Y. Chen1, F. D. Ferraiolo1, J. A. Magee1, G. A. Van Hubert1 
TL;DR: The EI is a generic high-speed, source-synchronous interface used to transfer addresses, controls, and data between CPUs, L2 caches, memory subsystems, switches, and I/O hubs and has single-ended data lines, resulting in twice the performance of similar buses operating with two differential lines per signal.
Abstract: As mainframes evolve and deliver higher performance, technologists are focusing less on processor speed and more on overall system performance to create optimized systems. One important area of focus for performance improvement involves chip-to-chip interconnects, with their associated bandwidths and latencies. IBM and related computer manufacturers are optimizing the characteristics of interconnects between processors as well as between processors and their supporting chip sets (local cache, memory, I/O bridge). This paper describes the IBM proprietary high-speed interface known as Elastic Interface (EI), which is used for nearly all chip-to-chip communication in the IBM System z9™. In particular, EI is a generic high-speed, source-synchronous interface used to transfer addresses, controls, and data between CPUs, L2 caches, memory subsystems, switches, and I/O hubs. The EI has single-ended data lines, resulting in twice the performance (bandwidth per pin) of similar buses operating with two differential lines per signal.
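The "twice the performance" claim follows directly from pin counting, as this sketch shows. The per-line data rate used here is purely illustrative and not taken from the paper; the ratio is independent of it.

```python
# Sketch of the bandwidth-per-pin argument: at an equal per-line data rate
# (the 2.7-Gb/s figure is illustrative, not from the paper), a single-ended
# bus needs one pin per signal while a differential bus needs two,
# halving bandwidth per pin.
rate_gbps = 2.7                    # assumed per-line data rate
single_ended_pins_per_signal = 1
differential_pins_per_signal = 2

bw_per_pin_se = rate_gbps / single_ended_pins_per_signal
bw_per_pin_diff = rate_gbps / differential_pins_per_signal
print(bw_per_pin_se / bw_per_pin_diff)  # -> 2.0
```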

Journal ArticleDOI
TL;DR: This paper describes a behavior-anomaly-based system for detecting insider attacks that uses peer-group profiling, composite feature modeling, and real-time statistical data mining and describes an implementation of this detection approach in the form of the IBM Identity Risk and Investigation Solution (IRIS).
Abstract: Early detection of employees' improper access to sensitive or valuable data is critical to limiting negative financial impacts to an organization, including regulatory penalties for misuse of customer data that results from these insider attacks. Implementing a system for detecting insider attacks is a technical challenge that also involves business-process changes and decision making that prioritizes the value of enterprise data. This paper focuses primarily on the techniques for detecting insider attacks, but also discusses the processes required to implement a solution. In particular, we describe a behavior-anomaly-based system for detecting insider attacks. The system uses peer-group profiling, composite feature modeling, and real-time statistical data mining. The analytical models are refined and used to update the real-time monitoring process. This continues in a cyclical manner as the system self-tunes. Finally, we describe an implementation of this detection approach in the form of the IBM Identity Risk and Investigation Solution (IRIS).
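Peer-group profiling, one of the techniques the abstract names, can be illustrated with a minimal sketch: score a user's behavior against the statistics of their peer group. The field values and the alerting threshold below are hypothetical; the actual IRIS models are not described at this level of detail.

```python
# Minimal sketch of peer-group anomaly scoring. All counts and the
# threshold are hypothetical illustrations, not details from the paper.
from statistics import mean, stdev

def peer_anomaly_score(user_count, peer_counts):
    """z-score of a user's access count against their peer group."""
    mu, sigma = mean(peer_counts), stdev(peer_counts)
    if sigma == 0:
        return 0.0
    return (user_count - mu) / sigma

# A clerk who opened 48 customer records while peers opened around 10:
peers = [9, 11, 10, 12, 8, 10]
score = peer_anomaly_score(48, peers)
flagged = score > 3.0  # hypothetical alerting threshold
```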

Journal ArticleDOI
TL;DR: An optimization tool for a multistage production process for rectangular steel plates is described, which combines mathematical programming models with search techniques from artificial intelligence and produces a production design for rectangular plate products in a steel plant.
Abstract: We describe an optimization tool for a multistage production process for rectangular steel plates. The problem we solve yields a production design (or plan) for rectangular plate products in a steel plant, i.e., a detailed list of operational steps and intermediate products on the way to producing steel plates. We decompose this problem into subproblems that correspond to the production stages, where one subproblem requires the design of casts by sequencing slabs which, in turn, have to be designed from mother plates. The design of mother plates consists of a two-dimensional packing problem. We develop a solution approach which combines mathematical programming models with search techniques from artificial intelligence. The use of these tools provides two types of benefits: improvements in the productivity of the plant and an approach to making the key business performance indicators, such as available-to-promise at a production level, operational.
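The mother-plate design stage is a two-dimensional packing problem. The sketch below uses a standard first-fit "shelf" heuristic purely to illustrate the subproblem; it is not the paper's actual algorithm, which combines mathematical programming with AI search techniques, and all dimensions are made up.

```python
# Hedged sketch of the 2D packing subproblem: placing rectangular plate
# orders onto a mother plate using a first-fit shelf heuristic.
def shelf_pack(plates, mother_w, mother_h):
    """Pack (w, h) rectangles into horizontal shelves; return (x, y, w, h) placements."""
    placements, shelves = [], []          # shelves: [y, height, x_cursor]
    y_cursor = 0
    for w, h in sorted(plates, key=lambda p: -p[1]):  # tallest first
        for shelf in shelves:
            if h <= shelf[1] and shelf[2] + w <= mother_w:
                placements.append((shelf[2], shelf[0], w, h))
                shelf[2] += w
                break
        else:
            if y_cursor + h > mother_h:
                continue                   # order does not fit on this mother plate
            shelves.append([y_cursor, h, w])
            placements.append((0, y_cursor, w, h))
            y_cursor += h
    return placements

# Three plate orders on a 10 x 6 mother plate (dimensions are illustrative):
placed = shelf_pack([(4, 3), (5, 3), (6, 2)], mother_w=10, mother_h=6)
```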

Journal ArticleDOI
S. Müller1, Chonawee Supatgiat
TL;DR: This paper proposes an approach to compliance management based on a quantitative risk-based optimization model that allows dynamic selection of the optimal set of feasible measures for attaining an adequate level of compliance with a given set of regulatory requirements.
Abstract: The changing nature of regulation forces businesses to continuously reevaluate the measures taken to comply with regulatory requirements. To prepare for compliance audits, businesses must also implement an effective internal inspection policy that identifies and rectifies instances of noncompliance. In this paper, we propose an approach to compliance management based on a quantitative risk-based optimization model. Our model allows dynamic selection of the optimal set of feasible measures for attaining an adequate level of compliance with a given set of regulatory requirements. The model is designed to minimize the expected total cost of compliance, including the costs of implementing a set of measures, the cost of carrying out periodic inspections, and the audit outcome cost for various compliance levels. Our approach is based on dynamic programming and naturally accounts for the dynamic nature of the regulatory environment. Our method can be used either as a scenario-based management support system or, depending on the availability of reliable input data, as a comprehensive tool for optimally selecting the needed compliance measures and inspection policy. We illustrate our approach in a hypothetical case study.
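The core trade-off in the model, selecting measures to minimize implementation cost plus expected audit-outcome cost, can be sketched in a few lines. All costs and compliance probabilities below are hypothetical, and this brute-force search omits the paper's dynamic-programming treatment of the periodic-inspection policy.

```python
# Sketch of the risk-based idea: choose the measure set minimizing
# implementation cost plus expected audit-failure cost. All numbers are
# hypothetical; the paper's full model is a dynamic program that also
# optimizes the inspection policy.
from itertools import combinations

# (name, implementation cost, resulting probability of passing the audit)
measures = [("training", 20, 0.30), ("logging", 35, 0.45), ("audit_tool", 50, 0.60)]
AUDIT_FAILURE_COST = 200  # hypothetical penalty on a failed audit

def expected_cost(chosen):
    impl = sum(c for _, c, _ in chosen)
    p_fail = 1.0
    for _, _, p_pass in chosen:           # assume independent effects
        p_fail *= (1.0 - p_pass)          # (an illustrative simplification)
    return impl + p_fail * AUDIT_FAILURE_COST

best = min(
    (subset for r in range(len(measures) + 1)
     for subset in combinations(measures, r)),
    key=expected_cost,
)
```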

Journal ArticleDOI
Brian W. Curran1, Eric Fluhr1, Jose Angel Paredes1, L. Sigal1, Joshua Friedrich1, Y.H. Chan1, C. Hwang1 
TL;DR: The circuit, physical design, clocking, timing, power, and hardware characterization challenges faced in the pursuit of this industry-leading frequency are described.
Abstract: The IBM POWER6™ microprocessor is a high-frequency (>5-GHz) microprocessor fabricated in the IBM 65-nm silicon-on-insulator (SOI) complementary metal-oxide semiconductor (CMOS) process technology. This paper describes the circuit, physical design, clocking, timing, power, and hardware characterization challenges faced in the pursuit of this industry-leading frequency. Traditional high-power, high-frequency techniques were abandoned in favor of more-power-efficient circuit design methodologies. The hardware frequency and power characterization are reviewed.

Journal ArticleDOI
TL;DR: The architecture and implementation of the original gaming-oriented synergistic processor element (SPE) in both 90-nm and 65-nm silicon-on-insulator (SOI) technology is described and a new SPE implementation targeted for the high-performance computing community is introduced.
Abstract: This paper describes the architecture and implementation of the original gaming-oriented synergistic processor element (SPE) in both 90-nm and 65-nm silicon-on-insulator (SOI) technology and introduces a new SPE implementation targeted for the high-performance computing community. The Cell Broadband Engine™ processor contains eight SPEs. The dual-issue, four-way single-instruction multiple-data processor is designed to achieve high performance per area and power and is optimized to process streaming data, simulate physical phenomena, and render objects digitally. Most aspects of data movement and instruction flow are controlled by software to improve the performance of the memory system and the core performance density. The SPE was designed as an 11-FO4 (fan-out-of-4 inverter delay) processor using 20.9 million transistors within 14.8 mm² using the IBM 90-nm SOI low-k process. CMOS (complementary metal-oxide semiconductor) static gates implement the majority of the logic. Dynamic circuits are used in critical areas and occupy 19% of the non-static random access memory (SRAM) area. Instruction set architecture, microarchitecture, and physical implementation are tightly coupled to achieve a compact and power-efficient design. Correct operation has been observed at up to 5.6 GHz and 7.3 GHz, respectively, in 90-nm and 65-nm SOI technology.
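The 11-FO4 cycle-time figure can be turned into a rough frequency estimate. The per-stage FO4 delay assumed below is an illustrative value for a 90-nm SOI process, not a number from the paper; real delays vary with voltage, temperature, and process corner.

```python
# Back-of-the-envelope check on the 11-FO4 cycle-time claim above.
# The ~16-ps FO4 delay for 90-nm SOI is an assumed illustrative value.
fo4_delays_per_cycle = 11
fo4_delay_ps = 16.0               # assumed per-stage FO4 delay

cycle_time_ps = fo4_delays_per_cycle * fo4_delay_ps
freq_ghz = 1000.0 / cycle_time_ps  # 1 GHz corresponds to a 1000-ps cycle
print(round(freq_ghz, 2))          # in the multi-GHz range observed in hardware
```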