Showing papers on "Overhead (computing) published in 1997"

PDF

Open Access

Proceedings Article•DOI•

Two-dimensional pilot-symbol-aided channel estimation by Wiener filtering

[...]

Peter Adam Hoeher, Susanna Kaiser, Patrick Robertson

21 Apr 1997

TL;DR: The discrete shift-variant 2-D Wiener filter is derived and analyzed given an arbitrary sampling grid, an arbitrary (but possibly optimized) selection of observations, and the possibility of model mismatch to reveal the potential of pilot-symbol-aided channel estimation in two dimensions.

...read moreread less

Abstract: The potential of pilot-symbol-aided channel estimation in two dimensions are explored. In order to procure this goal, the discrete shift-variant 2-D Wiener filter is derived and analyzed given an arbitrary sampling grid, an arbitrary (but possibly optimized) selection of observations, and the possibility of model mismatch. Filtering in two dimensions is revealed to outperform filtering in just one dimension with respect to overhead and mean-square error performance. However, two cascaded orthogonal 1-D filters are simpler to implement and shown to be virtually as good as true 2-D filters.

...read moreread less

724 citations

Patent•

Dynamic mapping of broadcast resources

[...]

Robert G. Arsenault, Tam T. Leminh, Thomas H. James

04 Sep 1997

TL;DR: In this paper, a dynamic mapping of broadcast resources (BR) is provided to exploit occasional redundancy in the program content of two or more input data streams (IN), freeing at least one broadcast resource to carry alternate bitstreams, such as additional programs or existing programs at higher quality.

...read moreread less

Abstract: In a data communication system such as a high capacity DBS system, dynamic mapping of broadcast resources (BR) is provided to exploit occasional redundancy in the program content of two or more input data streams (IN), freeing at least one broadcast resource to carry alternate bitstreams, such as additional programs or existing programs at higher quality. Transmission maps (30) defining the correspondence between input data streams and broadcast resources, and reception maps (40) defining the correspondence between broadcast resources and output data streams, are updated as needed to dynamically modify broadcast resource mapping to increase effective utilization of available bandwidth. Beneficial n : n-y : m mapping in a high capacity consumer DBS entertainment system is provided. Apparatus and methods for efficiently generating, maintaining and updating allocation maps (30, 40) with reduced overhead requirements, are disclosed.

...read moreread less

319 citations

Book Chapter•DOI•

Pilot-symbol-aided channel estimation in time and frequency

[...]

Peter Adam Hoeher, Stefan Kaiser, Patrick Robertson

14 Aug 1997

TL;DR: Filtering in two dimensions is revealed to outperform filtering in just one dimension with respect to overhead, mean-square error performance and latency.

...read moreread less

Abstract: The potentials of pilot-symbol-aided channel estimation in two dimensions are explored for mobile radio and broadcasting applications. In order to procure this goal, the discrete shift-variant 2-D Wiener filter is analyzed given an arbitrary sampling grid, an arbitrary (but possibly optimized) selection of observations, and the possibility of model mismatch. Filtering in two dimensions is revealed to outperform filtering in just one dimension with respect to overhead, mean-square error performance and latency. Conceptually, the discrete shiftvariant 2-D Wiener filter is the optimal linear estimator for the given problem, however, two cascaded orthogonal 1-D filters are simpler to implement and virtually as good as true 2-D filters. Analytical results are presented, verified by Monte-Carlo simulations.

...read moreread less

282 citations

Proceedings Article•DOI•

The multicluster architecture: reducing cycle time through partitioning

[...]

Keith Farkas, Paul Chow¹, Norman P. Jouppi, Zvonko G. Vranesic¹•Institutions (1)

University of Toronto¹

01 Dec 1997

TL;DR: A static instruction scheduling algorithm is developed that for the configurations considered the multicluster architecture may have significant performance advantages at feature sizes below 0.35 /spl mu/m, and warrants further investigation.

...read moreread less

Abstract: The multicluster architecture that we introduce offers a decentralized, dynamically scheduled architecture, in which the register files, dispatch queue, and functional units of the architecture are distributed across multiple clusters, and each cluster is assigned a subset of the architectural registers. The motivation for the multicluster architecture is to reduce the clock cycle time, relative to a single-cluster architecture with the same number of hardware resources, by reducing the size and complexity of components on critical timing paths. Resource partitioning, however, introduces instruction-execution overhead and may reduce the number of concurrently executing instructions. To counter these two negative by-products of partitioning, we developed a static instruction scheduling algorithm. We describe this algorithm, and using trace-driven simulations of SPEC92 benchmarks, evaluate its effectiveness. This evaluation indicates that for the configurations considered the multicluster architecture may have significant performance advantages at feature sizes below 0.35 /spl mu/m, and warrants further investigation.

...read moreread less

275 citations

Proceedings Article•

Fast Computation of Sparse Datacubes

[...]

Kenneth A. Ross, Divesh Srivastava¹•Institutions (1)

AT&T¹

25 Aug 1997

TL;DR: This work proposes a novel algorithm for the fast computation of datacubes over sparse relations, and demonstrates the efficiency of the algorithm using synthetic, benchmark and real-world data sets.

...read moreread less

Abstract: Datacube queries compute aggregates over database relations at a variety of granularities, and they constitute an important class of decision support queries. Real-world data is frequently sparse, and hence efficiently computing datacubes over large sparse relations is important. We show that current techniques for computing datacubes over sparse relations do not scale well with the number of CUBE BY attributes, especially when the relation is much larger than main memory. We propose a novel algorithm for the fast computation of datacubes over sparse relations, and demonstrate the efficiency of our algorithm using synthetic, benchmark and real-world data sets. When the relation fits in memory, our technique performs multiple in-memory sorts, and does not incur any I/O beyond the input of the relation and the output of the datacube itself. When the relation does not fit in memory, a divideand-conquer strategy divides the problem of computing the datacube into several simpler computations of sub-datacubes. Often, all but one of the sub-datacubes can be computed in memory and our in-memory solution applies. In that case, the total I/O overhead is linear in the number of CUBE BY attributes. We demonstrate with an implementation that the CPU cost of our algorithm is dominated by the I/O cost for sparse relations. ‘The research of Kenneth A.

...read moreread less

234 citations

Proceedings Article•DOI•

Automated low-power technique exploiting multiple supply voltages applied to a media processor

[...]

Kimiyoshi Usami¹, Kazutaka Nogami¹, Mutsunori Igarashi¹, Fumihiro Minami¹, Yukio Kawasaki¹, Takashi Ishikawa¹, Masahiro Kanazawa¹, Takao Aoki¹, Midori Takano¹, Chiharu Mizuno¹, Makoto Ichida¹, Shinji Sonoda¹, M. Takahashi¹, N. Hatanaka¹ - Show less +10 more•Institutions (1)

Toshiba¹

05 May 1997

TL;DR: An automated design technique to reduce power by making use of two supply voltages by combining structure synthesis, placement and routing and random logic modules of a media processor chip.

...read moreread less

Abstract: This paper describes an automated design technique to reduce power by making use of two supply voltages. The technique consists of structure synthesis, placement and routing. The structure synthesizer clusters the gates off the critical paths so as to supply the reduced voltage to save power. The placement and routing tool assigns either the reduced voltage or the unreduced one to each row so as to minimize the area overhead. Combining these techniques together, we applied it to the random logic modules of a media processor chip. The combined technique reduced the power by 47% on average with an area overhead of 15% at the random logic, while keeping the performance,.

...read moreread less

211 citations

Patent•

Method and apparatus for protecting public key schemes from timing and fault attacks

[...]

Adi Shamir¹•Institutions (1)

Weizmann Institute of Science¹

12 May 1997

TL;DR: In this paper, improved methods and apparatus are provided for protecting public key schemes based on modular exponentiation (including RSA and Diffie-Hellman) from indirect cryptanalytic techniques such as timing and fault attacks.

...read moreread less

Abstract: Improved methods and apparatus are provided for protecting public key schemes based on modular exponentiation (including RSA and Diffie-Hellman) from indirect cryptanalytic techniques such as timing and fault attacks. Known methods for making the implementation of number-theoretic schemes resistant to such attacks typically double their running time, whereas the novel methods and apparatus described in this patent add only negligible overhead. This improvement is particularly significant in smart card and software-based implementations, in which the modular exponentiation operation is quite slow, and doubling its time may be an unacceptable solution.

...read moreread less

173 citations

Patent•

Method and apparatus for providing electronic program guide information from a single electronic program guide server

[...]

Saiprasad V. Naimpally¹•Institutions (1)

Panasonic¹

05 Feb 1997

TL;DR: In this article, a method and apparatus for requesting, receiving, processing, and providing information from a single source to a television viewer is presented, where an information provider is accessed via a communications link and specific data, which is separate and distinct from video signals received by the television receiver, is downloaded to the receiver.

...read moreread less

Abstract: A method and apparatus for requesting, receiving, processing, and providing information from a single source to a television viewer. An information provider is accessed via a communications link and specific data, which is separate and distinct from video signals received by the television receiver, is downloaded to the television receiver. The data provided by the information provider is database information with minimal formatting and does not contain any graphical overhead. Requests for information from the information provider may be on demand or at a predetermined time. The information provided may be filtered by the information provider and/or television receiver based on selected program categories and/or a user provided profile.

...read moreread less

161 citations

Patent•

System and method for automatically and dynamically changing an address associated with a device disposed in a fire channel environment

[...]

James F. Spring McCarty, Richard E. Gunlock, Michael E. Spring McGowan¹•Institutions (1)

Hewlett-Packard¹

11 Feb 1997

TL;DR: In this paper, a computer system with a plurality of devices compatible with the Fibre Channel Protocol (FCP) is provided with the capability to dynamically alter the configuration of the plurality of the devices without a system reset, or without additional software overhead.

...read moreread less

Abstract: A computer system with a plurality of devices compatible with the Fibre Channel Protocol, which computer system is provided with the capability to dynamically alter the configuration of the plurality of devices without a system reset, or without additional software overhead. This capability is realized by providing unique mapping relationships between low-level Fibre Channel information structures related to the devices and upper-level link elements compatible with an Operating System associated with the computer system.

...read moreread less

156 citations

Proceedings Article•DOI•

Embedded program timing analysis based on path clustering and architecture classification

[...]

Rolf Ernst¹, W. Ye¹•Institutions (1)

Braunschweig University of Technology¹

13 Nov 1997

TL;DR: An approach which combines simulation and formal techniques in a safe way to improve analysis precision and tighten the timing bounds is presented, which shows an unprecedented analysis precision allowing us to reduce performance overhead for provably correct system or interface timing.

...read moreread less

Abstract: Formal Program running time verification is an important issue in system design required for performance optimization under "first-time-right" design constraints and for real-time system verification. Simulation based approaches or simple instruction counting are not appropriate and risky for more complex architectures in particular with data dependent execution paths. Formal analysis techniques have suffered from loose timing bounds leading to significant performance penalties when strictly adhered to. We present an approach which combines simulation and formal techniques in a safe way to improve analysis precision and tighten the timing bounds. Using a set of processor parameters, it is adaptable to arbitrary processor architectures. The results show an unprecedented analysis precision allowing to reduce performance overhead for provably correct system or interface timing.

...read moreread less

156 citations

Proceedings Article•DOI•

On the analysis of indexing schemes

[...]

Joseph M. Hellerstein¹, Elias Koutsoupias², Christos H. Papadimitriou¹•Institutions (2)

University of California, Berkeley¹, University of California, Los Angeles²

01 May 1997

TL;DR: A framework for measuring the efficiency of an indexing scheme for a workload based on two characterizations: storage redundancy and access overhead is defined.

...read moreread less

Abstract: We consider the problem of indexing general database workloads (combinations of data sets and sets of potential queries). We define a framework for measuring the efficiency of an indexing scheme for a workload based on two characterizations: storage redundancy (how many times each item in the data set is stored), and access overhead (how many times more blocks than necessary does a query retrieve). Using this framework we present some initial results, showing upper and lower bounds and trade-offs between them in the case of multi-dimensional range queries and set queries.

...read moreread less

Proceedings Article•DOI•

Free transactions with Rio Vista

[...]

David E. Lowell¹, Peter M. Chen¹•Institutions (1)

University of Michigan¹

01 Oct 1997

TL;DR: A system that improves transaction overhead by a factor of 2000 for working sets that fit in main memory and lowers transaction overhead to 5 μsec by using no redo log, no system calls, and only one memory-to-memory copy is presented.

...read moreread less

Abstract: Transactions and recoverable memories are powerful mechanisms for handling failures and manipulating persistent data. Unfortunately, standard recoverable memories incur an overhead of several milliseconds per transaction. This paper presents a system that improves transaction overhead by a factor of 2000 for working sets that fit in main memory. Of this factor of 2000, a factor of 20 is due to the Rio file cache, which absorbs synchronous writes to disk without losing data during system crashes. The remaining factor of 100 is due to Vista, a 720-line, recoverable-memory library tailored for Rio. Vista lowers transaction overhead to 5 μsec by using no redo log, no system calls, and only one memory-to-memory copy. This drastic reduction in overhead leads to a overall speedup of 150-556x for benchmarks based on TPC-B and TPC-C.

...read moreread less

Patent•

Multi-state flash memory defect management

[...]

Robert D. Norman¹•Institutions (1)

Micron Technology¹

01 Dec 1997

TL;DR: In this article, a defect location table for the row of the memory array is provided to identify when a defective memory cell is address;ed for either a read or write access operation.

...read moreread less

Abstract: A system is described which stores data intended for defective memory cells in a row of a memory array in an overhead location of the memory row. The data is stored in the overhead packet during a write operation, and is read from the overhead packet during a read operation. A defect location table for the row of the memory array is provided to identify when a defective memory cell is address;ed for either a read or write access operation. During a write operation, the correct data is stripped from incoming data for storing into the overhead packet. During a read operation, the correct data is inserted into an output data stream from the overhead packet. Data written to defective cells can be either a custom setting, a default setting, or the original data. Shift registers are described for holding good data during either a read or write operation. The number of shift registers used is determined by the number of states stored in a memory cell. The shift registers use a marker for alignment ofdata bits in a data stream.

...read moreread less

Patent•

Transparent management at host interface of flash-memory overhead-bytes using flash-specific DMA having programmable processor-interrupt of high-level operations

[...]

Ricardo H. Bruce, Rolando H. Bruce, Earl T. Cohen

29 Sep 1997

TL;DR: In this article, a DRAM cache stores the pages of data as enlarged pages with the overhead bytes, even though the enlarged pages are not aligned to a power of 2. The overhead bytes store system information such as address pointers for bad-block replacement and write counters used for wear-leveling.

...read moreread less

Abstract: A flash-memory system adds system-overhead bytes to each page of data stored in flash memory chips. The overhead bytes store system information such as address pointers for bad-block replacement and write counters used for wear-leveling. The overhead bytes also contain an error-correction (ECC) code when stored in the flash-memory chips. A DRAM cache stores the pages of data as enlarged pages with the overhead bytes, even though the enlarged pages are not aligned to a power of 2. When an enlarged page is read out of a flash-memory chip, its ECC code is immediately checked and the ECC code in the overhead bytes is replaced with a syndrome code and stored in the DRAM cache. A local processor for the flash-memory system then reads the syndrome code in the overhead bytes and repairs any error using repair information in the syndrome. The overhead bytes are stripped off when pages are transferred from the DRAM cache to a host. The host can be notified early by an intermediate interrupt after a programmable number of pages have been read. This improves performance since the host does not have to wait for an entire block of pages to be read.

...read moreread less

Proceedings Article•DOI•

Dynamic feedback: an effective technique for adaptive computing

[...]

Pedro C. Diniz¹, Martin Rinard¹•Institutions (1)

University of California, Santa Barbara¹

01 May 1997

TL;DR: This paper presents dynamic feedback, a technique that enables computations to adapt dynamically to different execution environments, and performs a theoretical analysis which provides a guaranteed optimality bound for dynamic feedback relative to a hypothetical (and unrealizable) optimal algorithm.

...read moreread less

Abstract: This paper presents dynamic feedback, a technique that enables computations to adapt dynamically to different execution environments. A compiler that uses dynamic feedback produces several different versions of the same source code; each version uses a different optimization policy. The generated code alternately performs sampling phases and production phases. Each sampling phase measures the overhead of each version in the current environment. Each production phase uses the version with the least overhead in the previous sampling phase. The computation periodically resamples to adjust dynamically to changes in the environment.We have implemented dynamic feedback in the context of a parallelizing compiler for object-based programs. The generated code uses dynamic feedback to automatically choose the best synchronization optimization policy. Our experimental results show that the synchronization optimization policy has a significant impact on the overall performance of the computation, that the best policy varies from program to program, that the compiler is unable to statically choose the best policy, and that dynamic feedback enables the generated code to exhibit performance that is comparable to that of code that has been manually tuned to use the best policy. We have also performed a theoretical analysis which provides, under certain assumptions, a guaranteed optimality bound for dynamic feedback relative to a hypothetical (and unrealizable) optimal algorithm that uses the best policy at every point during the execution.

...read moreread less

Proceedings Article•DOI•

I/O optimal isosurface extraction

[...]

Yi-Jen Chiang¹, Cláudio T. Silva•Institutions (1)

State University of New York System¹

19 Oct 1997

TL;DR: The algorithms improve the performance of isosurface extraction by speeding up the active-cell searching process so that it is no longer a bottleneck, and this search time is independent of the main memory available.

...read moreread less

Abstract: The authors give I/O-optimal techniques for the extraction of isosurfaces from volumetric data, by a novel application of the I/O-optimal interval tree of Arge and Vitter (1996). The main idea is to preprocess the data set once and for all to build an efficient search structure in disk, and then each time one wants to extract an isosurface, they perform an output-sensitive query on the search structure to retrieve only those active cells that are intersected by the isosurface. During the query operation, only two blocks of main memory space are needed, and only those active cells are brought into the main memory, plus some negligible overhead of disk accesses. This implies that one can efficiently visualize very large data sets on workstations with just enough main memory to hold the isosurfaces themselves. The implementation is delicate but not complicated. They give the first implementation of the I/O-optimal interval tree, and also implement their methods as an I/O filter for Vtk's isosurface extraction for the case of unstructured grids. They show that, in practice, the algorithms improve the performance of isosurface extraction by speeding up the active-cell searching process so that it is no longer a bottleneck. Moreover, this search time is independent of the main memory available. The practical efficiency of the techniques reflects their theoretical optimality.

...read moreread less

Journal Article•DOI•

An on-line algorithm for checkpoint placement

[...]

Avi Ziv¹, Jehoshua Bruck²•Institutions (2)

Advanced Technology Center¹, California Institute of Technology²

01 Sep 1997-IEEE Transactions on Computers

TL;DR: The proposed algorithm uses knowledge of the current cost of a checkpoint when it decides whether or not to place a checkpoint, and its behavior is close to the off-line optimal algorithm that uses a complete knowledge of checkpointing cost.

...read moreread less

Abstract: Checkpointing enables us to reduce the time to recover from a fault by saving intermediate states of the program in a reliable storage. The length of the intervals between checkpoints affects the execution time of programs. On one hand, long intervals lead to long reprocessing time, while, on the other hand, too frequent checkpointing leads to high checkpointing overhead. In this paper, we present an on-line algorithm for placement of checkpoints. The algorithm uses knowledge of the current cost of a checkpoint when it decides whether or not to place a checkpoint. The total overhead of the execution time when the proposed algorithm is used is smaller than the overhead when fixed intervals are used. Although the proposed algorithm uses only on-line knowledge about the cost of checkpointing, its behavior is close to the off-line optimal algorithm that uses a complete knowledge of checkpointing cost.

...read moreread less

Journal Article•DOI•

An adaptive protocol for synchronizing media streams

[...]

Kurt Rothermel¹, Tobias Helbig¹•Institutions (1)

University of Stuttgart¹

01 Sep 1997-Multimedia Systems

TL;DR: An adaptive stream synchronization protocol that supports any kind of distribution of the sources and sinks of the streams to be synchronized, based on a buffer-level control mechanism, allowing immediate corrections when the danger of a buffer overflow or underflow is recognized.

...read moreread less

Abstract: Stream synchronization is widely regarded as a fundamental problem in the field of multimedia systems. Solutions to this problem can be divided into adaptive and rigid mechanisms. While rigid mechanisms are based on worst case assumptions, adaptive ones monitor the underlying network and are able to adapt themselves to changing network conditions. In this paper, we will present an adaptive stream synchronization protocol. This protocol supports any kind of distribution of the sources and sinks of the streams to be synchronized. It is based on a buffer-level control mechanism, allowing immediate corrections when the danger of a buffer overflow or underflow is recognized. Moreover, the proposed protocol is flexible enough to support a wide variety of synchronization policies, which can be dynamically changed while synchronization is in progress. Finally, the message overhead of this protocol is low, because control messages are only exchanged when network conditions change.

...read moreread less

Proceedings Article•DOI•

Testing embedded cores using partial isolation rings

[...]

Nur A. Touba, B. Pouya¹•Institutions (1)

University of Texas at Austin¹

27 Apr 1997

TL;DR: This paper presents a systematic method for designing a partial isolation ring that provides the same fault coverage as a full isolation ring, but avoids adding MUXes on critical timing paths and reduces area overhead.

...read moreread less

Abstract: Intellectual property cores pose a significant test challenge. The core supplier may not give any information about the internal logic of the core, but simply provide a set of test vectors for the core which guarantees a particular fault coverage. If the core is embedded within a larger design, then the problem is how to apply the specified test vectors to the core and how to test the user-defined logic around the core. A simple and fast solution is to place a full isolation ring (i.e., boundary scan) around the core, however, the area and performance overhead for this may not be acceptable in many applications. This paper presents a systematic method for designing a partial isolation ring that provides the same fault coverage as a full isolation ring, but avoids adding MUXes on critical timing paths and reduces area overhead. Efficient ATPG techniques are used to analyze the user-defined logic surrounding the core and identify a maximal set of core inputs and outputs (that includes the critical timing paths) that do not need to be included in the partial isolation ring. Several different partial isolation ring selection strategies that vary in computational complexity are described. Experimental results are shown comparing the different strategies.

...read moreread less

Journal Article•DOI•

IDAMN: an intrusion detection architecture for mobile networks

[...]

D. Samfat¹, Refik Molva¹•Institutions (1)

Institut Eurécom¹

01 Sep 1997-IEEE Journal on Selected Areas in Communications

TL;DR: The main novelty of the IDAMN architecture is its ability to perform intrusion detection in the visited location and within the duration of a typical call, as opposed to existing designs that require the reporting of all call data to the home location in order to perform the actual detection.

...read moreread less

Abstract: We present IDAMN (intrusion detection architecture for mobile networks), a distributed system whose main functionality is to track and detect mobile intruders in real time. IDAMN includes two algorithms which model the behavior of users in terms of both telephony activity and migration pattern. The main novelty of our architecture is its ability to perform intrusion detection in the visited location and within the duration of a typical call, as opposed to existing designs that require the reporting of all call data to the home location in order to perform the actual detection. The algorithms and the components of IDAMN have been designed in order to minimize the overhead incurred in the fixed part of the cellular network.

...read moreread less

Patent•

Overhead console having flip-down monitor

[...]

Christopher J. Vitito

27 Feb 1997

TL;DR: An overhead console for a motor vehicle, boat, or aircraft is described in this paper, which includes an elongated console housing having a leading end and a trailing end, a monitor mounted in the leading end of the console housing, and a compartment for storing a source of video signals.

...read moreread less

Abstract: An overhead console for a motor vehicle, boat, or aircraft, is disclosed. The console includes an elongated console housing having a leading end and a trailing end, a monitor mounted in the leading end of the console housing, and a compartment for storing a source of video signals.

...read moreread less

Journal Article•DOI•

Direct phase-domain modelling of frequency-dependent overhead transmission lines

[...]

H.V. Nguyen¹, Hermann W. Dommel¹, Jose R. Marti¹•Institutions (1)

University of British Columbia¹

01 Jul 1997-IEEE Transactions on Power Delivery

TL;DR: In this article, a new wideband transmission line model, based on synthesizing the line functions directly in the phase domain, is presented, which includes the complete frequency-dependent nature of untransposed overhead transmission lines by means of recursive convolutions over a wide frequency range.

...read moreread less

Abstract: A new wideband transmission line model, based on synthesizing the line functions directly in the phase domain, is presented. It includes the complete frequency-dependent nature of untransposed overhead transmission lines by means of recursive convolutions over a wide frequency range. The model belongs to the class of time-domain models and is designed to be implemented in general electromagnetic transients programs such as the EMTP. Because the synthesis of the frequency-dependent modal transformation matrix is avoided, the method requires fewer convolutions at each time step than full frequency-dependent modal-domain models. Simulations have been performed comparing the proposed model with existing line models, with field recordings, and with calculations from a frequency-domain program. The new model provides accurate answers in both steady-state and transient conditions.

...read moreread less

Journal Article•DOI•

Simple randomized mergesort on parallel disks

[...]

Rakesh D. Barve¹, Edward F. Grove¹, Jeffrey Scott Vitter¹•Institutions (1)

Duke University¹

01 Jun 1997

TL;DR: The upper bound derived on expected I/O performance of SRM indicates that SRM is provably better than disk-striped mergesort (DSM) for realistic parameter values D, M and B.

...read moreread less

Abstract: We consider the problem of sorting a file of N records on the D-disk model of parallel I/O in which there are two sources of parallelism. Records are transferred to and from disk concurrently in blocks of B contiguous records. In each I/O operation, up to one block can be transferred to or from each of the D-disks in parallel. We propose a simple, efficient, randomized mergesort algorithm called SRM that uses a forecast-and-flush approach to overcome the inherent difficulties of simple merging on parallel disks. SRM exhibits a limited use of randomization and also has a useful deterministic version. Generalizing the technique of forecasting, our algorithm is able to read in, at any time, the ‘right’ block from any disk and using the technique of flushing, our algorithm evicts, without any I/O overhead, just the ‘right’ blocks from memory to make space for new ones to be read in. The disk layout of SRM is such that it enjoys perfect write parallelism, avoiding fundamental inefficiencies of previous mergesort algorithms. By analysis of generalized maximum occupancy problems we are able to derive an analytical upper bound on SRM's expected overhead valid for arbitrary inputs. The upper bound derived on expected I/O performance of SRM indicates that SRM is provably better than disk-striped mergesort (DSM) for realistic parameter values D, M and B. Average-case simulations show further improvement on the analytical upper bound. Unlike previously proposed optimal sorting algorithms, SRM outper-forms DSM even when the number D of parallel disks is small.

...read moreread less

Journal Article•DOI•

New concept in fault location for overhead distribution systems using superimposed components

[...]

Raj Aggarwal¹, Y. Aslan¹, A.T. Johns¹•Institutions (1)

University of Bath¹

01 May 1997

TL;DR: In this paper, a single-ended fault location approach for overhead distribution systems based on the concept of superimposed voltages and currents is presented, which is highly robust to changes in local and remote source capacities and to the presence of load taps.

...read moreread less

Abstract: Due to the challenges facing many utilities worldwide as a result of deregulation, the demand and importance of accurate fault location in distribution Systems has increased, principally to minimise line outages through effecting repairs expeditiously. The paper presents a novel approach in single-ended fault location for overhead distribution systems based on the concept of superimposed voltages and currents. It is clearly shown that the technique, which is interactive in nature, is highly robust to changes in local and remote source capacities and to the presence of load taps.

...read moreread less

Proceedings Article•DOI•

Embedded program timing analysis based on path clustering and architecture classification

[...]

Ernst, Ye

01 Jan 1997

TL;DR: In this paper, the authors present an approach which combines simulation and formal techniques in a safe way to improve analysis precision and tighten the timing bounds, using a set of processor parameters, which is adaptable to arbitrary processor architectures.

...read moreread less

Abstract: Formal program running time verification is an important issue in system design required for performance optimization under "first-time-right" design constraints and for real time system verification. Simulation based approaches or simple instruction counting are not appropriate and risky for more complex architectures in particular with data dependent execution paths. Formal analysis techniques have suffered from loose timing bounds leading to significant performance penalties when strictly adhered to. We present an approach which combines simulation and formal techniques in a safe way to improve analysis precision and tighten the timing bounds. Using a set of processor parameters, it is adaptable to arbitrary processor architectures. The results show an unprecedented analysis precision allowing us to reduce performance overhead for provably correct system or interface timing.

...read moreread less

Proceedings Article•DOI•

Automatic inline allocation of objects

[...]

Julian Dolby¹•Institutions (1)

University of Illinois at Urbana–Champaign¹

01 May 1997

TL;DR: This paper introduces object inlining, an optimization that automatically inline allocates objects within containers (as is done by hand in C++) within a uniform model, and presents the technique, which includes novel program analyses that track how inlinable objects are used throughout the program.

...read moreread less

Abstract: Object-oriented languages like Java and Smalltalk provide a uniform object model that simplifies programming by providing a consistent, abstract model of object behavior. But direct implementations introduce overhead, removal of which requires aggressive implementation techniques (e.g. type inference, function specialization); in this paper, we introduce object inlining, an optimization that automatically inline allocates objects within containers (as is done by hand in C++) within a uniform model. We present our technique, which includes novel program analyses that track how inlinable objects are used throughout the program. We evaluated object inlining on several object-oriented benchmarks. It produces performance up to three times as fast as a dynamic model without inlining and roughly equal to that of manually-inlined codes.

...read moreread less

Patent•

Parallel file system and method for independent metadata loggin

[...]

Frank B. Schmuck¹, Radha R. Kandadai¹, Anthony J. Zlotek¹, Robert J. Curran¹, William A. Kish¹ - Show less +1 more•Institutions (1)

IBM¹

11 Jul 1997

TL;DR: In this paper, locking techniques reduce the overhead of a token manager which is also used in the file system recovery if a computer participating in the management of shared disks becomes unavailable or failed.

...read moreread less

Abstract: A computer system having a shared disk file system running on multiple computers each having their own instance of an operating system and being coupled for parallel data sharing access to files residing on network attached shared disks. Locking techniques reduce the overhead of a token manager which is also used in the file system recovery if a computer participating in the management of shared disks becomes unavailable or failed. Synchronous and asynchronous takeover of a metadata node occurs for correction of metadata which was under modification and a new computer node to be a metadata node for that file. Locks are not constantly required to allocate new blocks on behalf of a user.

...read moreread less

Proceedings Article•DOI•

Distributed recovery with K-optimistic logging

[...]

Yi-Min Wang¹, Om P. Damani², Vijay K. Garg³•Institutions (3)

Bell Labs¹, AT&T², Microsoft³

27 May 1997

TL;DR: This paper introduces the concept of K-optimistic logging where K is the degree of optimism that can be used to fine-tune the tradeoff between failure-free overhead and recovery efficiency, and proves that only dependencies on those states that may be lost upon a failure need to be tracked on-line.

...read moreread less

Abstract: Fault-tolerance techniques based on checkpointing and message logging have been increasingly used in real-world applications to reduce service downtime. Most industrial applications have chosen pessimistic logging because it allows fast and localized recovery. The price that they must pay, however, is the higher failure-free overhead. In this paper, we introduce the concept of K-optimistic logging where K is the degree of optimism that can be used to fine-tune the tradeoff between failure-free overhead and recovery efficiency. Traditional pessimistic logging and optimistic logging then become the two extremes in the entire spectrum spanned by K-optimistic logging. Our approach is to prove that only dependencies on those states that may be lost upon a failure need to be tracked on-line, and so transitive dependency tracking can be performed with a variable-size vector. The size of the vector piggybacked on a message then indicates the number of processes whose failures may revoke the message, and K corresponds to the system-imposed upper bound on the vector size.

...read moreread less

Proceedings Article•DOI•

Procedure based program compression

[...]

Darko Kirovski¹, Johnson Kin¹, William H. Mangione-Smith¹•Institutions (1)

University of California, Los Angeles¹

01 Dec 1997

TL;DR: This paper will discuss a new approach to implementing transparent program compression that requires little or no hardware support, and results in an average memory reduction of 40% with a runtime performance overhead of 10%.

...read moreread less

Abstract: Cost and power consumption are two of the most important design factors for many embedded systems, particularly consumer devices. Products such as personal digital assistants, pagers with integrated data services and smart phones have fixed performance requirements but unlimited appetites for reduced cost and increased battery life. Program compression is one technique that can be used to attack both of these problems. Compressed programs require less memory, thus reducing the cost of both direct materials and manufacturing. Furthermore, by relying on compressed memory, the total number of memory references is reduced. This reduction saves power by lowering the traffic on high-capacitance buses. This paper discusses a new approach to implementing transparent program compression that requires little or no hardware support. Procedures are compressed individually, and a directory structure is used to bind them together at run-time. Decompressed procedures are explicitly cached in ordinary RAM as complete units, thus resolving references within each procedure. This approach has been evaluated on a set of 25 embedded multimedia and communications applications, and results in an average memory reduction of 40% with a run-time performance overhead of 10%.

...read moreread less

Proceedings Article•DOI•

The design and verification of a high-performance low-control-overhead asynchronous differential equation solver

[...]

K.Y. Yun¹, Peter A. Beerel², Vida Vakilotojar², A.E. Dooply¹, Juan Carlos Arceo¹ - Show less +1 more•Institutions (2)

University of California, San Diego¹, University of Southern California²

07 Apr 1997

TL;DR: This paper describes the design and verification of a high-performance asynchronous differential equation solver that has low control overhead which allows the average-case delay to be 48% faster than any comparable synchronous design.

...read moreread less

Abstract: This paper describes the design and verification of a high-performance asynchronous differential equation solver. The design has low control overhead which allows the average-case delay to be 48% faster (tested at 22/spl deg/C and 3.3 V) than any comparable synchronous design (simulated at 100/spl deg/C and 3 V). The techniques to reduce completion sensing overhead and hide control overhead at the circuit, architectural, and protocol levels are discussed. In addition, symbolic model checking techniques are described that were used to gain higher confidence in the correctness of the timed distributed control.

...read moreread less

Collapse