
Showing papers in "Ibm Journal of Research and Development in 1994"


Journal ArticleDOI
Don Coppersmith1
TL;DR: Some of the safeguards against differential cryptanalysis that were built into the DES system from the beginning are shown, with the result that more than 10¹⁵ bytes of chosen plaintext are required for this attack to succeed.
Abstract: The Data Encryption Standard (DES) was developed by an IBM team around 1974 and adopted as a national standard in 1977. Since that time, many cryptanalysts have attempted to find shortcuts for breaking the system. In this paper, we examine one such attempt, the method of differential cryptanalysis, published by Biham and Shamir. We show some of the safeguards against differential cryptanalysis that were built into the system from the beginning, with the result that more than 10¹⁵ bytes of chosen plaintext are required for this attack to succeed.

560 citations


Journal ArticleDOI
F. D. Egitto1, L. J. Matienzo1
TL;DR: This paper addresses the interaction of organic surfaces with the various components of a plasma, with examples taken from a review of the pertinent literature.
Abstract: Polymers have wide-ranging applications in food packaging and decorative products, and as insulation for electronic devices. For these applications, the adhesion of materials deposited onto polymer substrates is of primary importance. Not all polymer surfaces possess the required physical and/or chemical properties for good adhesion. Plasma treatment is one means of modifying polymer surfaces to improve adhesion while maintaining the desirable properties of the bulk material. This paper addresses the interaction of organic surfaces with the various components of a plasma, with examples taken from a review of the pertinent literature.

264 citations


Journal ArticleDOI
Jonathan R. M. Hosking1
TL;DR: Some of the properties of the four-parameter kappa distribution are described, and an example in which it is applied to modeling the distribution of annual maximum precipitation data is given.
Abstract: Many common probability distributions, including some that have attracted recent interest for flood-frequency analysis, may be regarded as special cases of a four-parameter distribution that generalizes the three-parameter kappa distribution of P.W. Mielke. This four-parameter kappa distribution can be fitted to experimental data or used as a source of artificial data in simulation studies. This paper describes some of the properties of the four-parameter kappa distribution, and gives an example in which it is applied to modeling the distribution of annual maximum precipitation data.
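As an illustrative sketch (not code from the paper), the quantile function of the four-parameter kappa distribution, x(F) = ξ + (α/κ)[1 − ((1 − F^h)/h)^κ] in Hosking's parameterization, makes the generation of artificial data by inverse-transform sampling straightforward; the parameter names below follow that formula, and the κ = 0 / h = 0 limiting cases are deliberately left out:

```python
import numpy as np

def kappa_quantile(F, xi, alpha, k, h):
    """Quantile function of the four-parameter kappa distribution,
    x(F) = xi + (alpha/k) * (1 - ((1 - F**h)/h)**k).
    Assumes k != 0 and h != 0; the k->0 and h->0 limits need
    separate (logarithmic) formulas."""
    F = np.asarray(F, dtype=float)
    return xi + (alpha / k) * (1.0 - ((1.0 - F**h) / h) ** k)

def kappa_sample(n, xi, alpha, k, h, rng=None):
    """Draw artificial data by inverse-transform sampling of uniform
    variates, as one would in a simulation study."""
    rng = np.random.default_rng(rng)
    return kappa_quantile(rng.uniform(size=n), xi, alpha, k, h)
```

Setting h = 1 recovers the generalized Pareto distribution and the limit h → 0 the GEV, which is one sense in which this family generalizes distributions used in flood-frequency analysis.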

163 citations



Journal ArticleDOI
TL;DR: This work presents a hybrid algorithm which automates this process in real time using neural networks and a knowledge-based system to vowelize Arabic, applicable to a wide variety of purposes, including visa processing and document processing by border patrols.
Abstract: An Arabic name can be written in English with many different spellings. For example, the name Sulayman is written only one way in Arabic. In English, this name is written in as many as forty different ways, such as Salayman, Seleiman, Solomon, Suleiman, and Sylayman. Currently, Arabic linguists manually transliterate these names—a slow, laborious, error-prone, and time-consuming process. We present a hybrid algorithm which automates this process in real time using neural networks and a knowledge-based system to vowelize Arabic. A supervised neural network filters out unreliable names, passing the reliable names on to the knowledge-based system for romanization. This approach, developed at the IBM Federal Systems Company, is applicable to a wide variety of purposes, including visa processing and document processing by border patrols.

127 citations


Journal ArticleDOI
TL;DR: A scheme for matrix-matrix multiplication on a distributed-memory parallel computer that hides almost all of the communication cost with the computation and uses the standard, optimized Level-3 BLAS operation on each node to achieve peak performance for parallel BLAS.
Abstract: In this paper, we propose a scheme for matrix-matrix multiplication on a distributed-memory parallel computer. The scheme hides almost all of the communication cost with the computation and uses the standard, optimized Level-3 BLAS operation on each node. As a result, the overall performance of the scheme is nearly equal to the performance of the Level-3 optimized BLAS operation times the number of nodes in the computer, which is the peak performance obtainable for parallel BLAS. Another feature of our algorithm is that it can give peak performance for larger matrices, even if the underlying communication network of the computer is slow.
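The overlap the abstract describes can be sketched in miniature: double-buffer the panels so that fetching the next panel (a nonblocking communication on a real distributed machine, a thread-driven local copy in this stand-in) proceeds while the Level-3 update on the current panel runs. This is an assumption-laden sketch of the pattern, not the paper's algorithm:

```python
import threading
import numpy as np

def overlapped_matmul(A, B, nb=64):
    """Blocked C = A @ B via rank-nb panel updates, with the 'fetch' of
    the next panel overlapped with the matrix product on the current one.
    On a distributed-memory machine the fetch would be a nonblocking
    receive of a remote panel; here it is a local copy done on a thread,
    standing in for communication hidden behind Level-3 BLAS work."""
    m, k = A.shape
    _, n = B.shape
    C = np.zeros((m, n))
    # Prefetch the first panel before the loop starts.
    buf = (A[:, 0:min(nb, k)].copy(), B[0:min(nb, k), :].copy())
    for start in range(0, k, nb):
        cur_A, cur_B = buf
        nxt = start + nb
        fetcher = None
        if nxt < k:
            out = {}
            def fetch(s=nxt):
                out['panel'] = (A[:, s:s+nb].copy(), B[s:s+nb, :].copy())
            fetcher = threading.Thread(target=fetch)
            fetcher.start()          # "communication" proceeds...
        C += cur_A @ cur_B           # ...while the Level-3 update runs here
        if fetcher is not None:
            fetcher.join()
            buf = out['panel']
    return C
```

The double buffer `buf` is the whole trick: as long as one panel's worth of arithmetic takes longer than one panel's worth of transfer, the transfer cost disappears from the critical path.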

92 citations


Journal ArticleDOI
TL;DR: The paper gives two examples that illustrate how the algorithms and architectural features interplay to produce high-performance codes and included in ESSL (Engineering and Scientific Subroutine Library); an overview of ESSL is also given.
Abstract: We describe the algorithms and architecture approach to produce high-performance codes for numerically intensive computations. In this approach, for a given computation, we design algorithms so that they perform optimally when run on a target machine, in this case the new POWER2™ machines from the RISC System/6000 family of RISC processors. The algorithmic features that we emphasize are functional parallelism, cache/register blocking, algorithmic prefetching, loop unrolling, and algorithmic restructuring. The architectural features of the POWER2 machine that we describe and that lead to high performance are multiple functional units; high bandwidth between registers, cache, and memory; a large number of fixed- and floating-point registers; and a large cache and TLB (translation lookaside buffer). The paper gives two examples that illustrate how the algorithms and architectural features interplay to produce high-performance codes. They are BLAS (basic linear algebra subroutines) and narrow-band matrix routines. These routines are included in ESSL (Engineering and Scientific Subroutine Library); an overview of ESSL is also given in the paper.

78 citations


Journal ArticleDOI
TL;DR: Sufficient conditions are studied under which a neural network having a single hidden layer consisting of n neurons, each with an activation function φ, can be constructed so as to give a mean square approximation to f within a given accuracy, independent of the number of variables.
Abstract: Let φ be a univariate 2π-periodic function. Suppose that s ≥ 1 and f is a 2π-periodic function of s real variables. We study sufficient conditions in order that a neural network having a single hidden layer consisting of n neurons, each with an activation function φ, can be constructed so as to give a mean square approximation to f within a given accuracy ε_n, independent of the number of variables. We also discuss the case in which the activation function φ is not 2π-periodic.
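One concrete instance of such a construction (a hypothetical illustration, not the paper's proof technique): take n random shifts of the 2π-periodic activation φ = cos as hidden units and solve for the output weights by least squares. Since cos(x − t) = cos t · cos x + sin t · sin x, the target f = sin x lies exactly in the span, so the fit is essentially exact:

```python
import numpy as np

def fit_periodic_network(f, n, rng=None):
    """Fit a single-hidden-layer network sum_j c_j * phi(x - t_j), with
    phi(x) = cos(x) as the 2*pi-periodic activation, to a 2*pi-periodic f.
    Hidden-unit shifts t_j are random; output weights c_j come from a
    least-squares fit on a sample grid. Illustrative only."""
    rng = np.random.default_rng(rng)
    t = rng.uniform(0, 2 * np.pi, size=n)          # hidden-unit shifts
    x = np.linspace(0, 2 * np.pi, 8 * n, endpoint=False)
    Phi = np.cos(x[:, None] - t[None, :])          # design matrix
    c, *_ = np.linalg.lstsq(Phi, f(x), rcond=None)
    return lambda xq: np.cos(np.asarray(xq)[:, None] - t[None, :]) @ c

net = fit_periodic_network(np.sin, n=8, rng=0)
xs = np.linspace(0, 2 * np.pi, 200)
mse = np.mean((net(xs) - np.sin(xs)) ** 2)
```

The interesting question the paper addresses is how the achievable accuracy ε_n behaves as n grows, independent of the number of input variables s; this one-variable sketch only shows the shape of the construction.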

74 citations


Journal ArticleDOI
S. W. White1, S. Dhawan1
TL;DR: The architectural extensions which improve storage reference bandwidth, allow hardware square-root computation, and speed floating-point-to-integer conversion are described, which demonstrate that superscalar capabilities are an attractive alternative to aggressive clock rates.

70 citations


Journal ArticleDOI
K W Lee1, Alfred Viehbeck1
TL;DR: Wet-process modifications of dielectric polymer surfaces are reviewed and unpublished results related to fluorinated polyimides are presented; entanglement of polymer chains dominates polyimide/polyimide adhesion, while chemical reactions are the major contributors to PCTFE/glass adhesion strength.
Abstract: For many electronic applications, the surface of a dielectric polymer must be modified to obtain the desired surface properties, such as wetting, adhesion, and moisture barrier, without altering the bulk properties. This paper reviews wet-process modifications of dielectric polymer surfaces and also presents unpublished results related to fluorinated polyimides. In a typical wet process, a substrate is immersed in or sprayed with a chemical solution, rinsed with a solvent to remove the excess reagents, and then dried if necessary. Wet processing can provide greatly enhanced adhesion and reliability of the adherate (top) layer to the modified polymer surface (adherend). We discuss a) the wet-process modification of various polymers (e.g., PMDA-ODA, BPDA-PDA, 6FDA-ODA, PTFE, PCTFE); b) polyimide/polyimide and PCTFE/glass adhesion, and c) the surface chemistry and the adhesion at a fluorinated polyimide (6FDA-ODA) surface. Entanglement of polymer chains plays an important role in polyimide/polyimide adhesion, while chemical reactions are the major contributors to PCTFE/glass adhesion strength. The metallization of dielectric substrates often requires surface pretreatments or conditioning by wet processes to sensitize a polymer surface for deposition of a metal seed layer. After seeding, a thick layer of a conducting metal (e.g., Cu) is deposited by electroless or electrolytic plating. Unlike dry or high-vacuum processing of polymer surfaces, the chemistry of a wet-processed polymer surface can be well characterized and often defined at a molecular level. A relationship can be established between a polymer's surface chemical (or morphological) structure and its surface properties such as adhesion and metallization.

61 citations


Journal ArticleDOI
M.D. Thouless1
TL;DR: In this paper, the essential elements of the mechanics of delamination are reviewed and their implications for design are discussed, and two important concepts for the prediction of the reliability of thin-film systems are emphasized: 1) limiting solutions for the crack-driving force that are independent of flaw size, and 2) mixed-mode fracture.
Abstract: The essential elements of the mechanics of delamination are reviewed and their implications for design are discussed. Two important concepts for the prediction of the reliability of thin-film systems are emphasized: 1) limiting solutions for the crack-driving force that are independent of flaw size, and 2) “mixed-mode fracture.” Consideration of the first concept highlights the possibility of flaw-tolerant design in which the statistical effects associated with flaw distributions can be eliminated. The significance of mode-mixedness includes its effect on crack trajectories and on the interface toughness, two key variables in determining failure mechanisms. Theoretical predictions are given for some cases of delamination of thin films under compressive stresses, and the results are compared with experimental observations to illustrate appropriate design criteria for the model systems studied.

Journal ArticleDOI
Hugh R. Brown1
TL;DR: Advances in the understanding of cracks at bimaterial interfaces are considered, with particular emphasis on their implications in the interpretation of adhesion tests.
Abstract: This paper is concerned with recent work that relates to the adhesion between nonreacting polymers. Advances in the understanding of cracks at bimaterial interfaces are considered, with particular emphasis on their implications in the interpretation of adhesion tests. An interpretation of the peel and blister tests is then discussed. Consideration is given to mechanisms of polymer failure as they relate to adhesion, with an emphasis on the distinctions between the properties of glassy and elastomeric materials. Polymer self-adhesion and its relation to interdiffusion are reviewed and compared with the adhesion of miscible polymers. In considering the adhesion between immiscible polymers, emphasis is given to the use of copolymers as coupling agents at the interface.

Journal ArticleDOI
TL;DR: The analyses provide a good fit and insight into previously obtained laser-induced fluorescence results for Cu, Cu⁺, and Cu₂, and the surface shadowing and plasma heating beginning in the 5-J/cm² region are clearly illustrated.
Abstract: Excimer laser ablation of metals starts as a thermal process in the ~1-J/cm² fluence range and makes a rapid transition near 5 J/cm² to a highly ionized plume (for 10-ns pulses). The 1–5-J/cm² range is of particular interest because it overlaps the irradiance range used to fabricate high-temperature superconductors, diamondlike carbon films, and conducting Cu films. Covered here are analyses aimed at a quantitative evaluation of the transition using a previously described model. The model is based primarily on the thermal (diffusivity and vapor pressure) properties of copper, along with electron heating by inverse bremsstrahlung due to electrons scattering off both neutrals and ions. The analyses provide a good fit and insight into previously obtained laser-induced fluorescence results for Cu, Cu⁺, and Cu₂. Also, the surface shadowing and plasma heating beginning in the 5-J/cm² (500-MW/cm²) region are clearly illustrated.

Journal ArticleDOI
TL;DR: This paper describes a machine organization suitable for RISC and CISC architectures that reduces hardware complexity in parallel instruction fetch and issue logic by minimizing possible increases in cycle time caused by parallel instruction issue decisions in the instruction buffer and improves instruction-level parallelism by means of special features.
Abstract: In this paper we describe a machine organization suitable for RISC and CISC architectures. The proposed organization reduces hardware complexity in parallel instruction fetch and issue logic by minimizing possible increases in cycle time caused by parallel instruction issue decisions in the instruction buffer. Furthermore, it improves instruction-level parallelism by means of special features. The improvements are achieved by analyzing instruction sequences and deciding which instructions will issue and execute in parallel prior to actual instruction fetch and issue, by incorporating preprocessed information for parallel issue and execution of instructions in the cache, by categorizing instructions for parallel issue and execution on the basis of hardware utilization rather than opcode description, by attempting to avoid memory interlocks through the preprocessing mechanism, and by eliminating execution interlocks with specialized hardware.

Introduction: Improvements in the performance of computer systems relate to circuit-level or technology improvements and to organizational techniques such as pipelining, cache memories, out-of-order execution, multiple functional units, and exploitation of instruction-level parallelism. One increasingly popular approach for exploiting instruction-level parallelism, i.e., allowing multiple instructions to be issued and executed in one machine cycle, is the so-called superscalar machine organization [1]. A number of such machines with varying degrees of parallelism have recently been described [2, 3]. The increasing popularity of superscalar machine organizations may be attributed to the increased instruction execution rate such systems may offer, concomitant with technology improvements that have made their organizations more feasible.


Journal ArticleDOI
R. R. Heisch1
TL;DR: Using the prototype tools developed for this effort on a selection of both user-level application programs and operating system (kernel) code, improvements in execution time of up to 73% and reduced instruction memory requirements of up to 61% were measured.
Abstract: This paper presents the design and implementation of trace-directed program restructuring (TDPR) for AIX® executable programs. TDPR is the process of reordering the instructions in an executable program, using an actual execution profile (or instruction address trace) for a selected workload, to improve utilization of the existing hardware architecture. Generally, the application of TDPR results in faster programs, programs that use less real memory, or both. Previous similar work [1–6] regarding profile-guided or feedback-directed program optimization has demonstrated significant improvements for various architectures. TDPR applies these concepts to AIX executable programs at a global level (i.e., independent of procedure or other structural boundaries) running on the POWER, POWER2™, and PowerPC 601™ machines and adds the methodology to preserve correctness and debuggability for reordered executables. Using the prototype tools developed for this effort on a selection of both user-level application programs and operating system (kernel) code, improvements in execution time of up to 73% and reduced instruction memory requirements of up to 61% were measured. The techniques used to restructure AIX executables are discussed, and the performance improvements and memory reductions measured for several application programs are presented.

Journal ArticleDOI
TL;DR: Algorithmic prefetching denotes changing algorithm A to algorithm B, which contains additional steps to move data from slower levels of memory to faster levels, with the aim that algorithm B outperform algorithm A.
Abstract: In this paper, we introduce a concept called algorithmic prefetching, for exploiting some of the features of the IBM RISC System/6000® computer. Algorithmic prefetching denotes changing algorithm A to algorithm B, which contains additional steps to move data from slower levels of memory to faster levels, with the aim that algorithm B outperform algorithm A. The objective of algorithmic prefetching is to minimize any penalty due to cache misses in the innermost loop of an algorithm. This concept, along with “cache blocking,” can be exploited to improve the performance of linear algebra algorithms for dense matrices. We experimentally demonstrated the impact of prefetching on two dense-matrix operations. For one operation, the performance was improved from 74% of peak to 89% of peak by algorithmic prefetching; for the second operation, it was improved from 73% to 87% of the peak performance.
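The idea can be sketched as follows, assuming nothing from the paper beyond the concept itself: a cache-blocked multiply whose innermost update touches only small blocks, with a comment marking where algorithm B's extra data-movement steps (the prefetch of the *next* blocks) would be interleaved so the innermost loop never waits on a cache miss:

```python
import numpy as np

def blocked_matmul(A, B, nb=64):
    """Cache-blocked C = A @ B. Each block update touches a working set
    of roughly 3*nb*nb elements, small enough to stay cache-resident.
    Algorithmic prefetching, as described in the abstract, would add
    steps here that touch the blocks needed by the NEXT iteration while
    the current block's arithmetic is in flight."""
    m, k = A.shape
    _, n = B.shape
    C = np.zeros((m, n))
    for i in range(0, m, nb):
        for j in range(0, n, nb):
            for p in range(0, k, nb):
                # <-- a prefetch of A[i:, p+nb:] and B[p+nb:, j:] goes here
                C[i:i+nb, j:j+nb] += A[i:i+nb, p:p+nb] @ B[p:p+nb, j:j+nb]
    return C
```

Pure Python cannot issue real prefetch instructions, so this only shows the loop structure; the 74%-to-89%-of-peak improvement the abstract reports comes from placing such touch loads in compiled inner loops.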


Journal ArticleDOI
TL;DR: ABC continually compensates for off-specification manufacturing steps by feedforward-and-feedback corrective actions which keep the product on target, and its capacity to minimize scrap and rework by compensating for out-of-control conditions is demonstrated in an example.
Abstract: ABC is a generic methodology to improve the quality of manufacturing. It can optimize operation of a single process or an entire factory to meet or exceed product specifications. ABC is based on three nets which continually interact to model processes and to provide local process control and global product optimization. Significant process variables are identified, evaluated, and ranked according to their contributions to product quality. Process performance, which determines product quality, is characterized by a sensitive parameter, the Q-factor, which is used for local control and for global optimization. Real-time response maps capture process behavior and identify current status, improved operating points, and expected improvement in relation to design targets. ABC continually compensates for off-specification manufacturing steps by feedforward-and-feedback corrective actions which keep the product on target. ABC also evaluates and ranks the effects of non-numeric manufacturing variables, such as specific tools and vendors, on product quality. Total quality control can be achieved by optimizing all variables, both sensor-based and non-numeric, which control the product. Some of ABC's capabilities are demonstrated in a multistep fabrication of a semiconductor capacitor in which the electrical properties of the product are optimized by controlling the individual chemical process steps. ABC's capacity to minimize scrap and rework by compensating for out-of-control conditions is demonstrated in this example. A functional subset of ABC currently exists as a menu-driven tool, implemented in APL® on VM/CMS for mainframe computers and in the C language for workstation platforms: RS/6000 running under AIX® and PS/2® under OS/2®. ABC is available, in the workstation version, as an IBM Program Offering under the name QuMAP-A Better Control, and is currently used in the semiconductor, pharmaceutical, chemical, and consumer goods industries.

Journal ArticleDOI
B. McNutt1
TL;DR: The key conclusion of the analysis is that in environments with the scattered update patterns typical of database I/O, the utilization of storage must be controlled in order to achieve the high write throughput of which the log-structured disk subsystem is capable.

Journal ArticleDOI
J. G. Rudd1, R. A. Marsh1, J. A. Roecker1
TL;DR: An overview of system requirements and design issues that must be considered in the design of algorithms and software for the surveillance and tracking of ballistic missile launches and original work that has not been previously reported in the technical literature is described.
Abstract: This paper begins with an overview of system requirements and design issues that must be considered in the design of algorithms and software for the surveillance and tracking of ballistic missile launches. Detection and tracking algorithms and approaches are then described for the processing of data from a single satellite and from multiple satellites. We cover track formation, missile detection, track extension, and global arbitration, and indicate how these functions fit together coherently. We include both profile-dependent and profile- free aspects of detection, tracking, and estimation of tactical parameters. In some instances, particularly in the area of track monitoring and in a discussion of how we accommodate intersatellite bias errors in line-of-sight measurements, we describe original work that has not been previously reported in the technical literature.

Journal ArticleDOI
J. E. E. Baglin1
TL;DR: Ion beam technologies provide a variety of well-proven means for creating or enhancing strong, stable, direct adhesion of thin films deposited on substrates as discussed by the authors, including reactive and nonreactive ion beam sputtering, ion-beam assisted deposition, ion implantation, and ion beam stitching.
Abstract: Ion beam technologies provide a variety of well-proven means for creating or enhancing strong, stable, direct adhesion of thin films deposited on substrates. Interface chemical bonding and structure are critical. Yet success with such approaches has been reported for a great variety of systems that have little or no bulk chemical affinity, including metals, polymers, ceramics, and semiconductors. This review paper describes the established techniques of reactive and nonreactive ion beam sputtering, ion-beam-assisted deposition, ion implantation, and ion beam stitching. It then presents representative examples of adhesion enhancement selected from the current literature, in order to clarify the roles of interface chemistry, morphology, contaminants, and stability. The review offers a basis upon which interface tailoring for adhesion may be planned in order to optimize both performance and fabrication of specific materials systems.

Journal ArticleDOI
R. J. Blainey1
TL;DR: This paper describes the parts of the TOBEY compiler which address the instruction scheduling issue and proposes a suitable architecture for this purpose.
Abstract: The high performance of pipelined, superscalar processors such as the POWER2™ and PowerPC™ is achieved in large part through the parallel execution of instructions. This fine-grain parallelism cannot always be achieved by the processor alone, but relies to some extent on the ordering of the instructions in a program. This dependence implies that optimizing compilers for these processors must generate or schedule the instructions in an order that maximizes the possible parallelism. This paper describes the parts of the TOBEY compiler which address the instruction scheduling issue.

Journal ArticleDOI
TL;DR: It is shown, using elementary considerations, that a modified barrier function method for the solution of convex programming problems converges for any fixed positive setting of the barrier parameter.
Abstract: We show, using elementary considerations, that a modified barrier function method for the solution of convex programming problems converges for any fixed positive setting of the barrier parameter. With mild conditions on the primal and dual feasible regions, we show how to use the modified barrier function method to obtain primal and dual optimal solutions, even in the presence of degeneracy. We illustrate the argument for convergence in the case of linear programming, and then generalize it to the convex programming case.
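A minimal sketch of the fixed-μ behavior on a one-dimensional toy problem (my own example, not the paper's analysis): minimize x² subject to x − 1 ≥ 0 using the modified barrier F(x, λ) = x² − μλ ln(1 + (x − 1)/μ) with multiplier update λ ← λ/(1 + (x − 1)/μ). Unlike a classical barrier method, μ is never driven to zero, yet the iterates converge to the optimum x* = 1 with multiplier λ* = 2:

```python
def solve_modified_barrier(mu=0.5, lam=1.0, outer=30):
    """Modified barrier method (Polyak-style) for
        minimize x**2  subject to  x - 1 >= 0,
    whose KKT solution is x* = 1 with multiplier lambda* = 2.
    Inner step: minimize x**2 - mu*lam*log(1 + (x-1)/mu), done here by
    bisection on its derivative (the function is strictly convex on its
    domain). Outer step: lam <- lam / (1 + (x-1)/mu). mu stays FIXED."""
    x = 2.0
    for _ in range(outer):
        dphi = lambda z: 2*z - lam / (1 + (z - 1)/mu)
        lo = 1 - mu + 1e-12          # domain boundary: 1 + (x-1)/mu > 0
        hi = max(x, 10.0)            # dphi > 0 for large enough x
        for _ in range(200):         # bisection for dphi(x) = 0
            mid = 0.5 * (lo + hi)
            if dphi(mid) < 0:
                lo = mid
            else:
                hi = mid
        x = 0.5 * (lo + hi)
        lam = lam / (1 + (x - 1)/mu) # multiplier update
    return x, lam
```

At the fixed point, 1 + (x − 1)/μ = 1, so the multiplier update is stationary exactly when the KKT conditions hold; that is the mechanism behind convergence at a fixed positive barrier parameter.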

Journal ArticleDOI
TL;DR: In this method, different precursors of the same polyimide (PMDA-ODA) are blended, and this novel technique reduces the extent of chain orientation, gives rise to topographical and morphological surface heterogeneities, and produces a discontinuous ordered skin.
Abstract: Semiflexible polyimide structures are not amenable to good adhesion because of their a) spontaneous orientation of the polymer chains parallel to the film substrate during curing, b) formation of an ordered skin, and c) smooth surface topography. We briefly discuss these structural features with regard to metal-on-polyimide (metal/PI) adhesion. A method is proposed to improve adhesion by tailoring the surface and bulk morphology of the PI to circumvent these properties. In this method, different precursors of the same polyimide (PMDA-ODA) are blended. Phase separation induces spontaneous roughening of the PI surface. This novel technique reduces the extent of chain orientation, gives rise to topographical and morphological surface heterogeneities, and produces a discontinuous ordered skin. A variety of topographical features with nanoscale dimensions are produced that range from “mounds” to “dimples.” The process does not alter the overall chemical composition of the PI, occurs spontaneously, and is extendable to other polyimides or polymer systems. Threefold and ninefold enhancements of adhesion over that of a conventionally cured PMDA-ODA film are obtainable for electroless and vapor-deposited Cu on PI, respectively.

Journal ArticleDOI
M. T. Franklin1, W. P. Alexander1, R. Jauhari1, A. M. G. Maynard1, B. R. Olszewski1 
TL;DR: Key features of the POWER2 processor and memory subsystem that enhance RISC System/6000® performance on commercial workloads are described, along with the authors' analysis methods.
Abstract: We describe features of the POWER2™ processor and memory subsystem that enhance RISC System/6000® performance on commercial workloads. We explain the performance characteristics of commercial workloads and some of the common benchmarks used to measure them. Our own analysis methods are also described.

Journal ArticleDOI
M. D. Pritt1
TL;DR: An algorithm is described for the automated registration of remotely sensed imagery that registers 6000 × 6000-pixel images in 8–18 minutes on an IBM RISC System/6000® workstation and is accurate to the subpixel level even in the presence of noise and large areas of change in the images.
Abstract: An algorithm is described for the automated registration of remotely sensed imagery that registers 6000 × 6000-pixel images in 8–18 minutes on an IBM RISC System/6000® workstation. The resulting registration is accurate to the subpixel level even in the presence of noise and large areas of change in the images. It is shown that the registration-mapping function for parallel projections has the form F(x,y) = A(x,y) + h(x,y)e, where A(x,y) is an affine transformation, h(x,y) is a function that depends on the topographic heights, and e is a vector that defines the epipolar lines. The algorithm determines the parameters of this equation using only the image data, without knowledge of the viewing orientations or scene point coordinates. The search for match points is then a one-dimensional search along the epipolar lines, which greatly increases the speed and accuracy of the registration.
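Once its pieces are estimated, the mapping can be evaluated directly; the following sketch (parameter handling assumed, not taken from the paper, and with the height field h supplied rather than estimated) applies F(x,y) = A(x,y) + h(x,y)e to arrays of points:

```python
import numpy as np

def registration_map(x, y, A, b, h, e):
    """Evaluate F(x, y) = A(x, y) + h(x, y) * e: an affine part
    (2x2 matrix A plus offset b) and a topographic displacement of
    magnitude h(x, y) along the fixed epipolar direction e.
    Here h is an array of per-point height terms; the paper's algorithm
    estimates A, h, and e from the image data alone."""
    p = np.stack([np.asarray(x, float), np.asarray(y, float)], axis=-1)
    return (p @ np.asarray(A).T
            + np.asarray(b)
            + np.asarray(h, float)[..., None] * np.asarray(e))
```

The structure explains the abstract's last claim: with A, b, and e fixed, two matching points can differ only along e, so the correspondence search collapses to one dimension along the epipolar lines.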

Journal ArticleDOI
P. N. Stiles1, I. S. Glickstein1
TL;DR: An overview is presented of the work on a highly parallelizable route planner that efficiently finds an optimal route between two points and the advantages and disadvantages of the associated search algorithm relative to other search algorithms.
Abstract: An overview is presented of our work on a highly parallelizable route planner that efficiently finds an optimal route between two points; both serial and massively parallel implementations are described. We compare the advantages and disadvantages of the associated search algorithm relative to other search algorithms, and conclude with a discussion of future extensions and related applications.

Journal ArticleDOI
TL;DR: The virtues of replacing CRC10 with CRC32 or with a degree-10 polynomial that has no factors in common with the scrambler are explored and extensive results are presented concerning the capability of CRC10, P2055, and CRC32 to detect various error patterns.
Abstract: This paper covers four topics: 1) the operation and performance of cyclic redundancy checks (CRCs); 2) the shortest error patterns of various weights that are undetectable by the ANSI/IEEE-standard 32-bit CRC (CRC32); 3) the general interaction of data scramblers with CRCs; and 4) the specific problems that arise in ATM communication due to the interaction of the scrambler with the degree-10 CRC polynomial (CRC10). Elaborating 4), we explore the virtues of replacing CRC10 with CRC32 or with a degree-10 polynomial (P2055) that has no factors in common with the scrambler. Extensive results are presented concerning the capability of CRC10, P2055, and CRC32 to detect various error patterns.
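For reference, the CRC32 the paper analyzes is the ANSI/IEEE 32-bit CRC; a bit-serial implementation makes the polynomial-division structure explicit (a sketch for illustration, since production code would use a table-driven or hardware version):

```python
def crc32(data: bytes) -> int:
    """Bit-serial ANSI/IEEE CRC-32 in its common reflected form
    (polynomial 0xEDB88320, initial value and final XOR all-ones).
    Each inner step is one shift of the long division of the message
    by the degree-32 generator polynomial over GF(2)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0xEDB88320 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF
```

An error pattern is undetectable exactly when, viewed as a polynomial over GF(2), it is divisible by the generator; that algebraic fact is what the paper's shortest-undetectable-pattern results and the scrambler-interaction analysis both rest on.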

Journal ArticleDOI
David Shippy1, T. W. Griffith1
TL;DR: The POWER2™ fixed-point, data cache, and storage control units provide a tightly integrated subunit for a second-generation high-performance superscalar RISC processor.
Abstract: The POWER2™ fixed-point, data cache, and storage control units provide a tightly integrated subunit for a second-generation high-performance superscalar RISC processor. These functional units provide dual fixed-point execution units and a large multiported data cache, as well as high-performance interfaces to memory, I/O, and the other execution units in the processor. These units provide the following features: dual fixed-point execution units, improved fixed-point/floating-point synchronization, new floating-point load and store quadword instructions, improved address translation, improved fixed-point multiply/divide, large multiported D-cache, increased bandwidth into and out of the caches through wider data buses, an improved external interrupt mechanism, and an improved I/O DMA mechanism to support multiple-streaming Micro Channels®.