
Showing papers on "Cache published in 1998"


Patent
15 May 1998
TL;DR: In this paper, a technique is presented for automatic, transparent, distributed, scalable, and robust caching, prefetching, and replication in a computer network in which request messages for a particular document follow paths from the clients to a home server that form a routing graph.
Abstract: A technique for automatic, transparent, distributed, scalable and robust caching, prefetching, and replication in a computer network in which request messages for a particular document follow paths from the clients to a home server that form a routing graph. Client request messages are routed up the graph towards the home server as would normally occur in the absence of caching. However, cache servers are located along the route, and may intercept requests if they can be serviced. In order to be able to service requests in this manner without departing from standard network protocols, the cache server needs to be able to insert a packet filter into the router associated with it, and also needs to proxy for the home server from the perspective of the client. Cache servers may cooperate to service client requests by caching and discarding documents based on their local load, the load on neighboring caches, attached communication path load, and document popularity. The cache servers can also implement security schemes and other document transformation features.

463 citations


Proceedings ArticleDOI
01 Oct 1998
TL;DR: This paper proposes a new protocol called "Summary Cache"; each proxy keeps a summary of the URLs of cached documents of each participating proxy and checks these summaries for potential hits before sending any queries, which enables cache sharing among a large number of proxies.
Abstract: The sharing of caches among Web proxies is an important technique to reduce Web traffic and alleviate network bottlenecks. Nevertheless, it is not widely deployed due to the overhead of existing protocols. In this paper we propose a new protocol called "Summary Cache"; each proxy keeps a summary of the URLs of cached documents of each participating proxy and checks these summaries for potential hits before sending any queries. Two factors contribute to the low overhead: the summaries are updated only periodically, and the summary representations are economical --- as low as 8 bits per entry. Using trace-driven simulations and a prototype implementation, we show that compared to the existing Internet Cache Protocol (ICP), Summary Cache reduces the number of inter-cache messages by a factor of 25 to 60, reduces the bandwidth consumption by over 50%, and eliminates between 30% and 95% of the CPU overhead, while at the same time maintaining almost the same hit ratio as ICP. Hence Summary Cache enables cache sharing among a large number of proxies.
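
The compact per-proxy summaries described above behave like Bloom filters over cached URLs. The sketch below is a minimal illustration of that idea, not the Summary Cache implementation; the class and function names are hypothetical.

```python
import hashlib

class UrlSummary:
    """Compact summary of one proxy's cached URLs (a simple Bloom filter)."""

    def __init__(self, num_bits=8 * 1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, url):
        # Derive several bit positions from independent hashes of the URL.
        for i in range(self.num_hashes):
            digest = hashlib.md5(f"{i}:{url}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % self.num_bits

    def add(self, url):
        for pos in self._positions(url):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def maybe_contains(self, url):
        # False positives are possible; false negatives are not.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(url))

def peers_to_query(url, peer_summaries):
    """Query only those peers whose summary suggests a potential hit."""
    return [peer for peer, summary in peer_summaries.items()
            if summary.maybe_contains(url)]
```

Because summaries are updated only periodically, a stale summary can cause an occasional false hit or miss, which the protocol tolerates in exchange for far fewer inter-cache messages.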

446 citations


Proceedings ArticleDOI
29 Mar 1998
TL;DR: Bounds and a close approximation for the expected cardinality of the maximum matching in a random graph are derived and outlined, and the two-phase ISCOD algorithms are presented.
Abstract: We present the informed source coding on-demand (ISCOD) approach for efficiently supplying non-identical data from a central server to multiple caching clients through a broadcast channel. The key idea underlying ISCOD is the joint exploitation of the data already cached by each client, the server's full awareness of client cache contents and client requests, and the fact that each client only needs to be able to derive the items requested by it rather than all the items ever transmitted or even the union of the items requested by the different clients. We present a set of two-phase ISCOD algorithms. The server uses these algorithms to assemble ad-hoc error correction sets based on its knowledge of every client's cache content and of the items requested by it; next, it uses error-correction codes to construct the data that is actually transmitted. Each client uses its cached data and the received supplemental data to derive the items that it has requested. This technique achieves a reduction of up to tens of percent in the amount of data that must be transmitted in order for every client to be able to derive the data requested by it. Finally, we define k-partial cliques in a directed graph, and cast the two-phase approach in terms of partial clique covers. As a byproduct of this work, bounds and a close approximation for the expected cardinality of the maximum matching in a random graph have been derived and are outlined.
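
The benefit of coding over the broadcast channel can be seen in a toy two-client example (an illustration of the idea only, not the paper's two-phase algorithms): if client A caches item Y and requests X, while client B caches X and requests Y, a single XOR-coded block lets both clients derive their requests.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    # Assumes both blocks have been padded to the same length.
    return bytes(x ^ y for x, y in zip(a, b))

# Server side: X is requested by client A (which caches Y),
# and Y is requested by client B (which caches X).
item_x = b"data block X ..."
item_y = b"data block Y ..."
broadcast = xor_blocks(item_x, item_y)   # one coded transmission instead of two

# Client A recovers X using its cached copy of Y.
assert xor_blocks(broadcast, item_y) == item_x
# Client B recovers Y using its cached copy of X.
assert xor_blocks(broadcast, item_x) == item_y
```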

415 citations


Patent
Eric Horvitz
15 Jan 1998
TL;DR: In this article, a probabilistic user model is proposed for prefetching a web page of future interest to a user whenever that page exhibits a greater current incremental benefit, in terms of a discounted expected rate of refinement in value with time for the future page, than the current incremental benefit being obtained for that user by continuing the current download.
Abstract: A technique that, through continual computation, harnesses available computer resources during periods of low processing activity and low network activity, such as idle time, for prefetching, e.g., web pages, or pre-selected portions thereof, into local cache of a client computer. This technique utilizes, e.g., a probabilistic user model, which specifies, at any one time, those pages or portions of pages that are likely to be prefetched given, e.g., a web page currently being rendered to a user; these pages being those which promise to provide the largest benefit (expected utility) to the user. Advantageously, this technique prematurely terminates or retards a current information download for a user in favor of prefetching a web page of future interest to that user whenever the latter page exhibits greater current incremental benefit to that user, in terms of a discounted expected rate of refinement in value with time for the future page, than a current incremental benefit being obtained for that user by continuing the current download.

391 citations


Patent
15 Apr 1998
TL;DR: In this paper, a method for consistently storing cached objects in the presence of failures is provided, where objects are indexed by a directory table that is stored in main memory and mapped to non-volatile storage.
Abstract: A method for consistently storing cached objects in the presence of failures is provided. This method ensures atomic object consistency--in the event of failure and restart, an object will either be completely present or completely absent from the cache, never truncated or corrupted. Furthermore, this consistency comes without any time-consuming data structure reconstruction on restart. In this scheme, objects are indexed by a directory table that is stored in main memory and mapped to non-volatile storage, and changes to the directory table are buffered into an open directory that is stored in main memory. Cache objects are either stored in volatile aggregation buffers or in segments of non-volatile disk storage called arenas. Objects are first coalesced into memory-based aggregation buffers, and later committed to disk. Locking is used to control parallel storage to aggregation buffers. Directory entries pointing to objects are only permitted to be written to persistent disk storage after the target objects are themselves committed to disk, preventing dangling pointers. Periodically, when the contents of open directory entries point to objects that are stably stored on disk, the open directory entries are copied into the directory table and committed to non-volatile storage. The disclosure also encompasses a computer program product, computer apparatus, and computer data signal configured similarly.
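
The ordering constraint stated above (a directory entry may reach persistent storage only after the object it points to is committed) can be sketched as a short routine. The helper objects and method names below are assumptions for illustration, not the patented structures.

```python
def commit_object(key, data, arena, open_directory, directory_table):
    """Illustrative write ordering: data first, pointer second.

    arena            -- non-volatile segment storage (assumed append/flush API)
    open_directory   -- in-memory buffer of pending directory changes
    directory_table  -- main-memory table mapped to non-volatile storage
    """
    # 1. Force the object itself to stable storage.
    offset = arena.append(data)
    arena.flush()

    # 2. Buffer the directory change in memory only.
    open_directory[key] = offset

    # 3. Later (periodically in the patent; immediately here for brevity),
    #    copy entries whose objects are stably stored into the directory
    #    table and commit it, so no persistent entry can dangle.
    directory_table.write_entry(key, offset)
    directory_table.flush()
```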

367 citations


Patent
Eric Horvitz
06 Feb 1998
TL;DR: In this paper, as the browser prefetches and stores each web page (or component thereof) in its local cache, it provides a suitable and preferably visual indication, through its graphical user interface, to a user that this item has been fetched and stored.
Abstract: A technique, specifically apparatus and accompanying methods for use therein, that, through continual computation, harnesses available computer resources during periods of low processing activity and low network activity, such as idle time, for prefetching, e.g., web pages, or pre-selected portions thereof, into local cache of a client computer. As the browser prefetches and stores each web page (or component thereof) in its local cache, the browser provides a suitable and preferably visual indication, through its graphical user interface, to a user that this item has been fetched and stored. Consequently, the user can quickly and visually perceive that a particular item (i.e., a “fresh” page or portion) has just been prefetched and which (s)he can now quickly access from local cache. As such additional items are cached, the browser can change the color of the displayed hotlink associated with each of the items then stored in cache so as, through color coding, to reflect their relative latency (“aging”) in cache.

355 citations


Proceedings ArticleDOI
01 Nov 1998
TL;DR: An architecture is presented that features dynamic multithreading execution of a single program; it minimizes the impact of ICache misses and branch mispredictions by fetching and dispatching instructions out-of-order, and uses a novel value prediction and recovery mechanism to reduce artificial data dependencies created by the use of a stack to manage run-time storage.
Abstract: We present an architecture that features dynamic multithreading execution of a single program. Threads are created automatically by hardware at procedure and loop boundaries and executed speculatively on a simultaneous multithreading pipeline. Data prediction is used to alleviate dependency constraints and enable lookahead execution of the threads. A two-level hierarchy significantly enlarges the instruction window. Efficient selective recovery from the second level instruction window takes place after a mispredicted input to a thread is corrected. The second level is slower to access but has the advantage of large storage capacity. We show several advantages of this architecture: (1) it minimizes the impact of ICache misses and branch mispredictions by fetching and dispatching instructions out-of-order, (2) it uses a novel value prediction and recovery mechanism to reduce artificial data dependencies created by the use of a stack to manage run-time storage, and (3) it improves the execution throughput of a superscalar by 15% without increasing the execution resources or cache bandwidth, and by 30% with one additional ICache fetch port. The speedup was measured on the integer SPEC95 benchmarks, without any compiler support, using a detailed performance simulator.

339 citations


Patent
23 Jul 1998
TL;DR: The NI Cache as discussed by the authors is a network infrastructure cache that provides proxy file services to a plurality of client workstations concurrently requesting access to file data stored on a server through a network interface.
Abstract: A network-infrastructure cache ("NI Cache") transparently provides proxy file services to a plurality of client workstations concurrently requesting access to file data stored on a server. The NI Cache includes a network interface that connects to a digital computer network. A file-request service-module of the NI Cache receives and responds to network-file-services-protocol requests from workstations through the network interface. A cache, also included in the NI Cache, stores data that is transmitted back to the workstations. A file-request generation-module, also included in the NI Cache, transmits requests for data to the server, and receives responses from the server that include data missing from the cache.

331 citations


Proceedings ArticleDOI
31 Jan 1998
TL;DR: This proposal uses distributed caches to eliminate the latency and bandwidth problems of the ARB and conceptually unifies cache coherence and speculative versioning by using an organization similar to snooping bus-based coherent caches.
Abstract: Dependences among loads and stores whose addresses are unknown hinder the extraction of instruction level parallelism during the execution of a sequential program. Such ambiguous memory dependences can be overcome by memory dependence speculation which enables a load or store to be speculatively executed before the addresses of all preceding loads and stores are known. Furthermore, multiple speculative stores to a memory location create multiple speculative versions of the location. Program order among the speculative versions must be tracked to maintain sequential semantics. A previously proposed approach, the address resolution buffer (ARB), uses a centralized buffer to support speculative versions. Our proposal, called the speculative versioning cache (SVC), uses distributed caches to eliminate the latency and bandwidth problems of the ARB. The SVC conceptually unifies cache coherence and speculative versioning by using an organization similar to snooping bus-based coherent caches. A preliminary evaluation for the multiscalar architecture shows that hit latency is an important factor affecting performance, and private cache solutions trade off hit rate for hit latency.

317 citations


Journal ArticleDOI
TL;DR: Equipped with the URL routing table and neighbor cache contents, a cache in the revised design can now search the local group, and forward all missing queries quickly and efficiently, thus eliminating both the waiting delay and the overhead associated with multicast queries.
Abstract: An adaptive, highly scalable, and robust web caching system is needed to effectively handle the exponential growth and extreme dynamic environment of the World Wide Web. Our work presented last year sketched out the basic design of such a system. This sequel paper reports our progress over the past year. To assist caches making web query forwarding decisions, we sketch out the basic design of a URL routing framework. To assist fast searching within each cache group, we let neighbor caches share content information. Equipped with the URL routing table and neighbor cache contents, a cache in the revised design can now search the local group, and forward all missing queries quickly and efficiently, thus eliminating both the waiting delay and the overhead associated with multicast queries. The paper also presents a proposal for incremental deployment that provides a smooth transition from the currently deployed cache infrastructure to the new design.
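
A rough sketch of the revised lookup path, using hypothetical data structures: the cache checks its own store, then the shared content information of its group neighbors, and only on a group-wide miss forwards the query along the URL routing table, so no multicast query (and its waiting delay) is needed.

```python
def handle_request(url, local_cache, neighbor_contents, url_routing_table):
    """Illustrative forwarding decision for one cache in a cache group."""
    # 1. Local hit.
    if url in local_cache:
        return local_cache[url]

    # 2. Search the local group via shared neighbor content information.
    for neighbor, contents in neighbor_contents.items():
        if url in contents:
            return neighbor.fetch(url)        # assumed peer-fetch primitive

    # 3. Group-wide miss: forward toward the origin using the URL routing
    #    table, assumed here to map URL prefixes to a next-hop cache.
    next_hop = url_routing_table.lookup(url)
    return next_hop.forward(url)
```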

315 citations


Patent
06 Oct 1998
TL;DR: In this paper, a scrambled data transmission is descrambled by communicating encrypted program information and authentication information between an external storage device and block buffers of a secure circuit, where the program information is communicated in block chains to reduce the overhead of the authentication information.
Abstract: A scrambled data transmission is descrambled by communicating encrypted program information and authentication information between an external storage device and block buffers of a secure circuit. The program information is communicated in block chains to reduce the overhead of the authentication information. The program information is communicated a block at a time, or even a chain at a time, and stored temporarily in block buffers and a cache, then provided to a CPU to be processed. The blocks may be stored in the external storage device according to a scrambled address signal, and the bytes, blocks, and chains may be further randomly re-ordered and communicated to the block buffers non-sequentially to obfuscate the processing sequence of the program information. Program information may also be communicated from the secure circuit to the external memory. The program information need not be encrypted but only authenticated for security.

Proceedings ArticleDOI
01 Oct 1998
TL;DR: Results show that profile-driven data placement significantly reduces the data miss rate, by 24% on average, and a compiler-directed approach is presented that creates an address placement for the stack, global variables, heap objects, and constants in order to reduce data cache misses.
Abstract: As the gap between memory and processor speeds continues to widen, cache efficiency is an increasingly important component of processor performance. Compiler techniques have been used to improve instruction cache performance by mapping code with temporal locality to different cache blocks in the virtual address space, eliminating cache conflicts. These code placement techniques can be applied directly to the problem of placing data for improved data cache performance. In this paper we present a general framework for Cache Conscious Data Placement. This is a compiler directed approach that creates an address placement for the stack (local variables), global variables, heap objects, and constants in order to reduce data cache misses. The placement of data objects is guided by a temporal relationship graph between objects generated via profiling. Our results show that profile driven data placement significantly reduces the data miss rate by 24% on average.
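
One hedged way to picture the profile-guided placement (a toy heuristic, not the paper's framework): objects that the temporal relationship graph shows are accessed close together in time are packed greedily into the same cache-block-sized group.

```python
def greedy_placement(affinity, object_size, block_size=64):
    """Toy sketch: pack objects with high temporal affinity into one block.

    affinity     -- {(obj_a, obj_b): co-access count from profiling}
    object_size  -- {obj: size in bytes}
    """
    block_of = {}        # object -> block index
    blocks = []          # block index -> list of objects

    def fits(block, obj):
        return sum(object_size[o] for o in blocks[block]) + object_size[obj] <= block_size

    # Visit pairs from most to least frequently co-accessed.
    for (a, b), _count in sorted(affinity.items(), key=lambda kv: -kv[1]):
        for obj, partner in ((a, b), (b, a)):
            if obj in block_of:
                continue
            target = block_of.get(partner)
            if target is None or not fits(target, obj):
                blocks.append([])
                target = len(blocks) - 1
            blocks[target].append(obj)
            block_of[obj] = target
    return blocks

# Illustrative profile: x and y are frequently touched together,
# so they end up in the same cache-block-sized group.
print(greedy_placement({("x", "y"): 120, ("y", "z"): 15},
                       {"x": 16, "y": 16, "z": 48}))
```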

Patent
09 Oct 1998
TL;DR: In this article, a request can be forwarded to a cooperating cache server if the requested object cannot be found locally, and the load is balanced by shifting some or all of the forwarded requests from an overloaded cache server to a less loaded one.
Abstract: In a system including a collection of cooperating cache servers, such as proxy cache servers, a request can be forwarded to a cooperating cache server if the requested object cannot be found locally. An overload condition is detected if for example, due to reference skew, some objects are in high demand by all the clients and the cache servers that contain those hot objects become overloaded due to forwarded requests. In response, the load is balanced by shifting some or all of the forwarded requests from an overloaded cache server to a less loaded one. Both centralized and distributed load balancing environments are described.
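
The shifting policy can be sketched in a few lines; the load representation and threshold below are assumptions for illustration, and the centralized and distributed variants described in the patent would feed this decision differently.

```python
def choose_target(url, local_cache, peers, overload_threshold=0.8):
    """Illustrative forwarding decision for one cooperating cache server.

    peers -- {peer: (has_object, current_load in [0, 1])}
    """
    if url in local_cache:
        return "local"

    holders = [(peer, load) for peer, (has_obj, load) in peers.items() if has_obj]
    if not holders:
        return "origin-server"

    # Forward to the least-loaded holder of the (possibly hot) object; if
    # even that cache is overloaded, shift the request to the origin server.
    peer, load = min(holders, key=lambda pl: pl[1])
    return peer if load < overload_threshold else "origin-server"
```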

Patent
10 Jun 1998
TL;DR: In this article, a configuration tree is used to partition a single multiprocessor into multiple partitions, each running a distinct copy, or instance, of an operating system; each of the partitions has access to its own physical resources plus resources designated as shared.
Abstract: Multiple instances of operating systems execute cooperatively in a single multiprocessor computer wherein all processors and resources are electrically connected together. The single physical machine with multiple physical processors and resources is subdivided by software into multiple partitions, each running a distinct copy, or instance, of an operating system. Each of the partitions has access to its own physical resources plus resources designated as shared. The partitioning is performed by assigning all resources with a configuration tree. None, some, or all resources may be designated as shared among multiple partitions. Each individual operating instance will generally be assigned the resources it needs to execute independently and these resources will be designated as “private.” Other resources, particularly memory, can be assigned to more than one instance and shared. Shared memory is cache coherent so that instances may be tightly coupled, and may share resources that are normally allocated to a single instance. This allows previously distributed user or operating system applications, which usually must pass messages via an external interconnect, to operate cooperatively in the shared memory without the need for either an external interconnect or message passing. Examples of applications that could take advantage of this capability include distributed lock managers and cluster interconnects. Newly-added resources, such as CPUs and memory, can be dynamically assigned to different partitions and used by instances of operating systems running within the machine by modifying the configuration.

Journal ArticleDOI
TL;DR: The structure and functionality of the Internet cache protocol (ICP) and its implementation in the Squid web caching software are described and successes, failures, and lessons learned from using ICP to deploy a global Web cache hierarchy are cataloged.
Abstract: We describe the structure and functionality of the Internet cache protocol (ICP) and its implementation in the Squid web caching software. ICP is a lightweight message format used for communication among Web caches. Caches exchange ICP queries and replies to gather information to use in selecting the most appropriate location from which to retrieve an object. We present background on the history of ICP, and discuss issues in ICP deployment, efficiency, security, and interaction with other aspects of Web traffic behavior. We catalog successes, failures, and lessons learned from using ICP to deploy a global Web cache hierarchy.

Proceedings ArticleDOI
01 May 1998
TL;DR: Experiments on a range of programs indicate PADLITE can eliminate conflicts for benchmarks, but PAD is more effective over a range of cache and problem sizes, with some SPEC95 programs improving up to 15%.
Abstract: Many cache misses in scientific programs are due to conflicts caused by limited set associativity. We examine two compile-time data-layout transformations for eliminating conflict misses, concentrating on misses occurring on every loop iteration. Inter-variable padding adjusts variable base addresses, while intra-variable padding modifies array dimension sizes. Two levels of precision are evaluated. PADLITE only uses array and column dimension sizes, relying on assumptions about common array reference patterns. PAD analyzes programs, detecting conflict misses by linearizing array references and calculating conflict distances between uniformly-generated references. The Euclidean algorithm for computing the gcd of two numbers is used to predict conflicts between different array columns for linear algebra codes. Experiments on a range of programs indicate PADLITE can eliminate conflicts for benchmarks, but PAD is more effective over a range of cache and problem sizes. Padding reduces cache miss rates by 16% on average for a 16K direct-mapped cache. Execution times are reduced by 6% on average, with some SPEC95 programs improving up to 15%.
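
As a toy illustration of inter-variable padding (not the PAD analysis itself): if two arrays referenced in the same loop iteration map their base addresses to the same set of a direct-mapped cache, inserting whole cache lines of padding before one of them separates the sets.

```python
def cache_set(addr, line_size=32, num_sets=512):
    return (addr // line_size) % num_sets

def inter_variable_pad(base_a, base_b, line_size=32, num_sets=512):
    """Return padding bytes to place before array B so that A[i] and B[i],
    accessed in the same iteration, no longer map to the same cache set
    of a direct-mapped cache (toy version of inter-variable padding)."""
    pad = 0
    while cache_set(base_a, line_size, num_sets) == \
          cache_set(base_b + pad, line_size, num_sets):
        pad += line_size
    return pad

# Both bases map to set 0 of a 16K direct-mapped cache (512 x 32-byte lines),
# so one line of padding breaks the per-iteration conflict.
assert inter_variable_pad(0, 512 * 32) == 32
```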

Patent
Eric Horvitz
15 Jan 1998
TL;DR: In this paper, a probabilistic user model is used to specify, at any one time, those pages or portions of pages that are likely to be prefetched given, e.g., a web page currently being rendered to a user, which promise to provide the largest benefit to the user.
Abstract: A technique that, through continual computation, harnesses available computer resources during periods of low processing activity and low network activity, such as idle time, for prefetching, e.g., web pages, or pre-selected portions thereof, into local cache of a client computer. This technique utilizes a probabilistic user model to specify, at any one time, those pages or portions of pages that are likely to be prefetched given, e.g., a web page currently being rendered to a user, which promise to provide the largest benefit (expected utility) to the user. Specifically, once a user, at a client computer, enters an address of a desired web page, a set containing web addresses of web pages that, based on the user model, are each likely to be accessed next by that user is determined, with corresponding files therefor prefetched, in order of their expected utility to the user, by the client computer during intervals of low processing activity and low network activity. Expected utility of a page or portion is assessed as a product of rate of refinement in utility of that page or portion to the user multiplied by its transition probability. Once prefetched, these pages or portions are stored in local cache at the client computer for ready access should the user next select any such page or portion.
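
The prefetch ordering rule stated above (expected utility as the product of the rate of refinement in value and the transition probability) translates directly into a small ranking routine; the candidate structure and numbers below are purely illustrative.

```python
def order_prefetch_candidates(candidates):
    """candidates -- {url: (refinement_rate, transition_probability)}

    Pages are prefetched in decreasing order of expected utility, modeled
    here, as in the abstract, as refinement_rate * transition_probability."""
    def expected_utility(item):
        _url, (rate, prob) = item
        return rate * prob
    return [url for url, _ in sorted(candidates.items(),
                                     key=expected_utility, reverse=True)]

# Illustrative numbers only.
queue = order_prefetch_candidates({
    "news.html":   (0.9, 0.40),   # EU = 0.36
    "sports.html": (0.5, 0.50),   # EU = 0.25
    "misc.html":   (0.8, 0.10),   # EU = 0.08
})
assert queue == ["news.html", "sports.html", "misc.html"]
```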

Patent
09 Apr 1998
TL;DR: In this paper, a method and apparatus are disclosed for providing mirrored site administrators (480) with the number of hits from a proxy's document cache (465) and for dispatching document requests in a proxy (410) to more efficiently allocate the document cache space within the proxy.
Abstract: A method and apparatus are disclosed for providing mirrored site administrators (480) with the number of hits from a proxy's document cache (465) and for dispatching document requests in a proxy (410) to more efficiently allocate the document cache space (465) within the proxy (410). A proxy (410) includes a document cache (465) storing recently requested documents, and maintains information regarding requests from the client (1) that are serviced from the proxy's document cache (465) such as the Uniform Resource Locator (URL) of the requested document and the number of cached responses. This information is provided by the proxy (410) to a remote site administrator (480). In this manner, remote site administrators (480) can more accurately track total hits. According to another aspect of the present invention, the proxy (410) implements a dispatching scheme for client requests that results in a more efficient allocation of the proxy's document cache space (465).

Proceedings ArticleDOI
01 Jun 1998
TL;DR: Chunk-based caching allows fine granularity caching and allows queries to partially reuse the results of previous queries with which they overlap; a new organization for relational tables, called a "chunked file", is also proposed.
Abstract: Caching has been proposed (and implemented) by OLAP systems in order to reduce response times for multidimensional queries. Previous work on such caching has considered table level caching and query level caching. Table level caching is more suitable for static schemes. On the other hand, query level caching can be used in dynamic schemes, but is too coarse for "large" query results. Query level caching has the further drawback for small query results in that it is only effective when a new query is subsumed by a previously cached query. In this paper, we propose caching small regions of the multidimensional space called "chunks". Chunk-based caching allows fine granularity caching, and allows queries to partially reuse the results of previous queries with which they overlap. To facilitate the computation of chunks required by a query but missing from the cache, we propose a new organization for relational tables, which we call a "chunked file". Our experiments show that for workloads that exhibit query locality, chunked caching combined with the chunked file organization performs better than query level caching. An unexpected benefit of the chunked file organization is that, due to its multidimensional clustering properties, it can significantly improve the performance of queries that "miss" the cache entirely as compared to traditional file organizations.
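
A minimal sketch of the chunk mechanism, under an assumed uniform chunk size and hypothetical helpers: the query's multidimensional range is converted to chunk coordinates, cached chunks are reused, and only the missing chunks are computed from the backend.

```python
from itertools import product

def chunks_for_range(ranges, chunk_size=10):
    """Chunk coordinates covered by a query range; 'ranges' is a list of
    (lo, hi) per dimension with hi exclusive."""
    per_dim = [range(lo // chunk_size, (hi - 1) // chunk_size + 1)
               for lo, hi in ranges]
    return set(product(*per_dim))

def answer_query(ranges, chunk_cache, compute_chunk):
    """Reuse cached chunks; compute and cache only the missing ones."""
    needed = chunks_for_range(ranges)
    missing = needed - chunk_cache.keys()
    for coord in missing:
        chunk_cache[coord] = compute_chunk(coord)   # assumed backend call
    return {coord: chunk_cache[coord] for coord in needed}
```

A query that overlaps an earlier one then pays only for the chunks outside the overlap, which is the fine-granularity reuse the paper argues for.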

Proceedings ArticleDOI
01 Oct 1998
TL;DR: This paper studies a technique for using a generational garbage collector to reorganize data structures to produce a cache-conscious data layout, in which objects with high temporal affinity are placed next to each other, so that they are likely to reside in the same cache block.
Abstract: The cost of accessing main memory is increasing. Machine designers have tried to mitigate the consequences of the processor and memory technology trends underlying this increasing gap with a variety of techniques to reduce or tolerate memory latency. These techniques, unfortunately, are only occasionally successful for pointer-manipulating programs. Recent research has demonstrated the value of a complementary approach, in which pointer-based data structures are reorganized to improve cache locality. This paper studies a technique for using a generational garbage collector to reorganize data structures to produce a cache-conscious data layout, in which objects with high temporal affinity are placed next to each other, so that they are likely to reside in the same cache block. The paper explains how to collect, with low overhead, real-time profiling information about data access patterns in object-oriented languages, and describes a new copying algorithm that utilizes this information to produce a cache-conscious object layout. Preliminary results show that this technique reduces cache miss rates by 21--42%, and improves program performance by 14--37% over Cheney's algorithm. We also compare our layouts against those produced by the Wilson-Lam-Moher algorithm, which attempts to improve program locality at the page level. Our cache-conscious object layouts reduce cache miss rates by 20--41% and improve program performance by 18--31% over their algorithm, indicating that improving locality at the page level is not necessarily beneficial at the cache level.

Journal ArticleDOI
25 Nov 1998
TL;DR: This paper presents Cache Digest, a novel protocol and optimization technique for cooperative Web caching that allows proxies to make information about their cache contents available to peers in a compact form and shows that Cache Digest outperforms ICP in several categories.
Abstract: This paper presents Cache Digest, a novel protocol and optimization technique for cooperative Web caching. Cache Digest allows proxies to make information about their cache contents available to peers in a compact form. A peer uses digests to identify neighbors that are likely to have a given document. Cache Digest is a promising alternative to traditional per-request query/reply schemes such as ICP. We discuss the design ideas behind Cache Digest and its implementation in the Squid proxy cache. The performance of Cache Digest is compared to ICP using real-world Web caches operated by NLANR. Our analysis shows that Cache Digest outperforms ICP in several categories. Finally, we outline improvements to the techniques we are currently working on.

Patent
07 Jan 1998
TL;DR: In this article, a cache for use with a network filter that receives, stores, and ejects local rule bases dynamically is proposed, where the cache stores a rule that was derived from a rule base in the filter.
Abstract: A cache for use with a network filter that receives, stores and ejects local rule bases dynamically. The cache stores a rule that was derived from a rule base in the filter. The cache rule is associated in the cache with a rule base indicator indicating from which rule base the cache rule was derived, and a rule base version number indicating the version of the rule base from which the cache rule was derived. When the filter receives a packet, the cache is searched for a rule applicable to a received packet. If no such rule is found, the filter rule base is found, and an applicable rule is carried out and copied to the cache along with a rule base indicator and version number. If a cache rule is found, it is implemented if its version number matches the version number of the rule base from which it was derived. Otherwise, the cache rule is deleted. The cache provides an efficient way of accurately implementing the rules of a dynamic rule base without having to search the entire rule base for each packet.
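
The lookup-and-validate path reads naturally as a short routine; the packet key, rule-base structures, and method names below are assumptions for illustration rather than the patented design.

```python
def filter_packet(packet, rule_cache, rule_bases):
    """rule_cache -- {packet_key: (rule, base_id, base_version)}
    rule_bases -- {base_id: (current_version, rule_base)}"""
    key = (packet.src, packet.dst, packet.port)
    entry = rule_cache.get(key)

    if entry is not None:
        rule, base_id, cached_version = entry
        current_version, _base = rule_bases[base_id]
        if cached_version == current_version:
            return rule.apply(packet)      # cached rule is still valid
        del rule_cache[key]                # stale: its rule base has changed

    # Miss (or stale entry): search the full rule bases, then cache the
    # derived rule together with its rule-base indicator and version.
    for base_id, (version, base) in rule_bases.items():
        rule = base.lookup(key)            # assumed search primitive
        if rule is not None:
            rule_cache[key] = (rule, base_id, version)
            return rule.apply(packet)
    return "drop"
```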

Patent
Keith P. Loring, Paritosh D. Patel
27 Mar 1998
TL;DR: In this paper, a network spoken language vocabulary system for a speech application comprises: a network server and a plurality of network clients communicating with one another over a network; a central vocabulary list in the server for recognizable words; a speech recognition engine and a local vocabulary list cache in each of the clients.
Abstract: A network spoken language vocabulary system for a speech application comprises: a network server and a plurality of network clients communicating with one another over a network; a central vocabulary list in the server for recognizable words; and a speech recognition engine and a local vocabulary list cache in each of the clients. The cache can have therein previously recognized words communicated from the central vocabulary list in the server and new words recognized by the speech application. Each of the new words can be communicated to the server and added to the central vocabulary list, and each of the new words added to the central vocabulary list can be communicated to at least one other of the clients for addition to the cache. The new words can be automatically communicated to and from the server.

Patent
18 Aug 1998
TL;DR: In this article, a system prefetches most frequently used domain names and stores the domain name data at local cache servers, generating validity codes to enable error checking for valid domain names without accessing root servers.
Abstract: A system prefetches most frequently used domain names and stores the domain name data at local cache servers. It generates validity codes to enable error checking for valid domain names at the local cache servers without accessing root servers. A cache server obtains, stores, and propagates updates or new DNS data to local cache servers at predetermined intervals. Users can obtain internet protocol addresses of domain names directly from local cache servers, thus eliminating processing delays over the Internet.

Proceedings ArticleDOI
16 Apr 1998
TL;DR: Examination of database performance on SMT processors using traces of the Oracle database management system shows that while DBMS workloads have large memory footprints, there is substantial data reuse in a small, cacheable "critical" working set.
Abstract: Simultaneous multithreading (SMT) is an architectural technique in which the processor issues multiple instructions from multiple threads each cycle. While SMT has been shown to be effective on scientific workloads, its performance on database systems is still an open question. In particular, database systems have poor cache performance, and the addition of multithreading has the potential to exacerbate cache conflicts. This paper examines database performance on SMT processors using traces of the Oracle database management system. Our research makes three contributions. First, it characterizes the memory-system behavior of database systems running on-line transaction processing and decision support system workloads. Our data show that while DBMS workloads have large memory footprints, there is substantial data reuse in a small, cacheable "critical" working set. Second, we show that the additional data cache conflicts caused by simultaneous multithreaded instruction scheduling can be nearly eliminated by the proper choice of software-directed policies for virtual-to-physical page mapping and per-process address offsetting. Our results demonstrate that with the best policy choices, D-cache miss rates on an 8-context SMT are roughly equivalent to those on a single-threaded superscalar. Multithreading also leads to better interthread instruction cache sharing, reducing I-cache miss rates by up to 35%. Third, we show that SMT's latency tolerance is highly effective for database applications. For example, using a memory-intensive OLTP workload, an 8-context SMT processor achieves a 3-fold increase in instruction throughput over a single-threaded superscalar with similar resources.

Patent
03 Nov 1998
TL;DR: In this paper, the authors propose a dynamic redirection service (DRS) module that extracts therefrom pairs of client workstations and services, employs a performance metric to order those pairs, and compiles a list (138) of workstations (24) and services that are assigned to the proxy; and a name resolution filter ('NRF') module (136) that receives the list and network-name-resolution requests and, when enabled by the list, resolves requests by sending network addresses for the proxy to clients.
Abstract: Generally, a computer network includes a file server (22), a network (26), and several client workstations (24). Specific network software provides a name server ('NS') (122) to resolve network-name requests. The computer network can also include a proxy for a network service, e.g. a network infrastructure cache (72) that stores files copied from the server (22). Automatic network-name-services configuration adds to this: 1) a traffic-monitor module (132) that identifies shared network services, and collects service use data; 2) a dynamic redirection service ('DRS') module (126) that receives the collected data, extracts therefrom pairs of client workstations (24) and services, employs a performance metric to order those pairs, and compiles a list (138) of workstations (24) and services that are assigned to the proxy; and 3) a name resolution filter ('NRF') module (136) that receives the list (138) and network-name-resolution requests and, when enabled by the list, resolves requests by sending network addresses for the proxy to client workstations (24).

Patent
15 Apr 1998
TL;DR: In this article, a method is provided for caching and delivering an alternate version from among a plurality of alternate versions of information objects, which is customized for the requesting client, without requiring access to the original object server.
Abstract: A method is provided for caching and delivering an alternate version from among a plurality of alternate versions of information objects. One or more alternate versions of an information object, for example, versions of the information object that are prepared in different languages or compatible with different systems, are stored in an object cache database. In the cache, a vector of alternates is associated with a key value that identifies the information object. The vector of alternates stores information that describes the alternate, the context and constraints of the object's use, and a reference to the location of the alternate's object content. When a subsequent client request for the information object is received, the cache extracts information from the client request, and attempts to select an acceptable and optimal alternate from the vector by matching the request information to the cached contextual information in the vector of alternates. This selection is performed in a time- and space-efficient manner. Accordingly, the cache can deliver different versions of an information object based on the metadata and criteria specified in a request to the cache. As a result, the information delivered by the cache is customized for the requesting client, without requiring access to the original object server.
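
The selection step resembles HTTP-style content negotiation; the sketch below uses an illustrative attribute match rather than the patent's actual metadata model.

```python
def select_alternate(url, request_attrs, alternates_by_url):
    """Pick a cached alternate whose constraints all match the request.

    alternates_by_url -- {url: [{"constraints": {...}, "location": ...}, ...]}
    """
    best = None
    for alt in alternates_by_url.get(url, []):
        constraints = alt["constraints"]
        if all(request_attrs.get(k) == v for k, v in constraints.items()):
            # Prefer the most specific acceptable alternate.
            if best is None or len(constraints) > len(best["constraints"]):
                best = alt
    return best   # None: no acceptable alternate, fall back to the origin server

cache = {"/doc": [
    {"constraints": {"language": "en"}, "location": "blob-en"},
    {"constraints": {"language": "fr"}, "location": "blob-fr"},
]}
assert select_alternate("/doc", {"language": "fr"}, cache)["location"] == "blob-fr"
```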

Patent
Anne Wright, James Randal Sargent, Carl Witty, Brian K. Moran, David Feinlieb
16 Sep 1998
TL;DR: In this paper, a webcast system has a server unit that gathers Web pages from sites on the Internet and stores the pages in a cache, and a broadcast unit that retrieves the package files from the package store and delivers the packages files to the clients over the broadcast medium.
Abstract: A webcast system delivers Web content from a webcast center over a broadcast medium to many clients. The webcast center has a server unit that gathers Web pages from sites on the Internet and stores the pages in a cache. The server unit bundles the pages from the cache into package files and stores the package files in a package store. The webcast center also has a broadcast unit that retrieves the package files from the package store and delivers the package files to the clients over the broadcast medium. Each client is equipped with a receiver to receive the broadcast package files. The client maintains a subscription database to store a directory of the Web content gathered by the webcast center. A subscriber user interface enables a user to select preferred Web content from the directory of the subscription database. The client creates a filter based on the user's preferences which is used to direct the receiver to collect only the package files carrying the preferred Web content, while rejecting packages carrying unwanted Web content.

Proceedings ArticleDOI
01 Oct 1998
TL;DR: The results show that the combination of out-of-order execution and multiple instruction issue is effective in improving performance of database workloads, providing gains of 1.5 and 2.6 times over an in-order single-issue processor for OLTP and DSS, respectively.
Abstract: Database applications such as online transaction processing (OLTP) and decision support systems (DSS) constitute the largest and fastest-growing segment of the market for multiprocessor servers. However, most current system designs have been optimized to perform well on scientific and engineering workloads. Given the radically different behavior of database workloads (especially OLTP), it is important to re-evaluate key system design decisions in the context of this important class of applications. This paper examines the behavior of database workloads on shared-memory multiprocessors with aggressive out-of-order processors, and considers simple optimizations that can provide further performance improvements. Our study is based on detailed simulations of the Oracle commercial database engine. The results show that the combination of out-of-order execution and multiple instruction issue is indeed effective in improving performance of database workloads, providing gains of 1.5 and 2.6 times over an in-order single-issue processor for OLTP and DSS, respectively. In addition, speculative techniques enable optimized implementations of memory consistency models that significantly improve the performance of stricter consistency models, bringing the performance to within 10--15% of the performance of more relaxed models. The second part of our study focuses on the more challenging OLTP workload. We show that an instruction stream buffer is effective in reducing the remaining instruction stalls in OLTP, providing a 17% reduction in execution time (approaching a perfect instruction cache to within 15%). Furthermore, our characterization shows that a large fraction of the data communication misses in OLTP exhibit migratory behavior; our preliminary results show that software prefetch and writeback/flush hints can be used for this data to further reduce execution time by 12%.

Patent
08 Dec 1998
TL;DR: In this article, a cache-based compaction technique is used to reduce the amount of information that must be transmitted from an Internet server to a user's computer or workstation when the user requests an Internet object, for example by clicking on a URL in a web browser application.
Abstract: The amount of information that must be transmitted from an Internet server to a user's computer or workstation when the user requests an Internet object, for example, by clicking on a URL in a web browser application, is reduced using a cache-based compaction technique in which the requested object is encoded in the server using information relating to similar objects that were previously supplied to the user. Similar objects available in both a client side cache and a server side cache are selected by comparing the URL of the requested object to the URLs of stored objects. Differential encoding is performed in the server such that the server transmits to the client information indicative of the differences between the requested object and the reference (similar) objects available in the server cache. A corresponding decoding operation is performed in the client, using the encoded version and reference objects available in the client cache.
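
A minimal sketch of the differential encoding step, here using Python's standard difflib rather than whatever encoding the patent specifies: the server expresses the requested object as copy-and-insert operations against a reference object that both caches hold, and the client replays them.

```python
import difflib

def encode_delta(reference: str, requested: str):
    """Server side: describe 'requested' as edits against 'reference'."""
    delta = []
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(
            None, reference, requested).get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))              # reuse reference bytes
        else:
            delta.append(("insert", requested[j1:j2]))  # send literal data
    return delta

def decode_delta(reference: str, delta):
    """Client side: rebuild the requested object from its cached reference."""
    out = []
    for op in delta:
        out.append(reference[op[1]:op[2]] if op[0] == "copy" else op[1])
    return "".join(out)

old_page = "<html><body>Yesterday's headlines ...</body></html>"
new_page = "<html><body>Today's headlines ...</body></html>"
delta = encode_delta(old_page, new_page)
assert decode_delta(old_page, delta) == new_page
```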