
Showing papers on "Latency (engineering)" published in 2008


Proceedings Article (DOI)
08 Dec 2008
TL;DR: This work presents a protocol for general state machine replication (a method that provides strong consistency) that performs well in a wide-area network: it achieves high throughput under high client load and low latency under low client load, even as the wide-area network environment and client load change.
Abstract: We present a protocol for general state machine replication (a method that provides strong consistency) that has high performance in a wide-area network. In particular, our protocol Mencius has high throughput under high client load and low latency under low client load, even as the wide-area network environment and client load change. We develop our protocol as a derivation from the well-known protocol Paxos. Such a development can be changed or further refined to take advantage of specific network or application requirements.
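The abstract does not say how Mencius departs from Paxos. The published design is generally described as follows. Consensus instances are pre-assigned to servers in round-robin order. A server with nothing to propose skips its turns, so no single leader becomes a wide-area bottleneck.

The Python sketch below is a toy, single-process model of only that ordering rule:

* Instance i belongs to server i mod n.
* Skipped instances become no-ops.
* Commands are delivered strictly in instance order.

The class and method names are hypothetical. Ballots, failures and revocation are all omitted.

```python
from typing import Dict, List, Optional

SKIP = None  # marks an instance whose owner had nothing to propose (a no-op)

class RotatingLog:
    """Toy model of rotating instance ownership: instance i belongs to server i mod n."""

    def __init__(self, n_servers: int):
        self.n = n_servers
        self.slots: Dict[int, Optional[str]] = {}  # instance -> command, or SKIP
        self.next_own = list(range(n_servers))      # next unused instance owned by each server
        self.delivered = 0                          # first instance not yet delivered

    def owner(self, instance: int) -> int:
        return instance % self.n

    def _claim(self, server: int, value: Optional[str]) -> int:
        inst = self.next_own[server]
        self.slots[inst] = value
        self.next_own[server] += self.n             # this server's next turn is n instances later
        return inst

    def propose(self, server: int, command: str) -> int:
        return self._claim(server, command)

    def skip(self, server: int) -> int:
        return self._claim(server, SKIP)

    def deliver(self) -> List[str]:
        """Deliver commands in instance order, stopping at the first undecided instance."""
        out: List[str] = []
        while self.delivered in self.slots:
            cmd = self.slots[self.delivered]
            if cmd is not SKIP:
                out.append(cmd)
            self.delivered += 1
        return out
```

In this toy, a command in instance 2 cannot be delivered until server 1 either proposes or skips instance 1. Prompt skipping is therefore what keeps latency low when some servers are idle.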

292 citations


Patent
Guotong Feng, John W. Barrus
13 Jun 2008
TL;DR: A line drawing module in the electronic paper display driver determines at least one pixel to activate from the received pen input information, and that pixel is updated independently of the display update rate.
Abstract: A system and a method are disclosed for fast pen tracking and low-latency display updates on an electronic paper display. Pen input information is received on an electronic paper display that updates at a predetermined display update rate. A line drawing module of the electronic paper display driver determines at least one pixel to activate based on the received pen input information. The at least one pixel is updated independently of the display update rate of the electronic paper display. Active pixel state information is maintained separately for each pixel in real time until the pixel update is complete and the pixel is deactivated. In some embodiments, a future pixel to activate is determined based on the received pen input information. The future pixel is deactivated if pen input information is not received on the activated pixel for a predetermined amount of time.
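The abstract does not say how the line drawing module chooses pixels. The sketch below assumes two things:

* Bresenham's integer line algorithm fills the pixels between consecutive pen samples.
* A per-pixel record of activation times is kept apart from the panel refresh, so pixels can be deactivated after a timeout. This loosely mirrors the abstract's rule for predicted future pixels.

All names (`line_pixels`, `PixelTracker`, the timeout value) are hypothetical and are not taken from the patent.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]

def line_pixels(x0: int, y0: int, x1: int, y1: int) -> List[Pixel]:
    """Bresenham's line algorithm (all octants): pixels from (x0, y0) to (x1, y1)."""
    pixels = []
    dx, sx = abs(x1 - x0), (1 if x0 < x1 else -1)
    dy, sy = -abs(y1 - y0), (1 if y0 < y1 else -1)
    err = dx + dy
    while True:
        pixels.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:   # step in x
            err += dy
            x0 += sx
        if e2 <= dx:   # step in y
            err += dx
            y0 += sy
    return pixels

class PixelTracker:
    """Hypothetical per-pixel activation state, kept independently of the display refresh."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        self.active: Dict[Pixel, float] = {}  # pixel -> time it was activated

    def activate(self, pixels: List[Pixel], now: float) -> None:
        for p in pixels:
            self.active[p] = now

    def expire(self, now: float) -> List[Pixel]:
        """Deactivate and return pixels activated longer ago than the timeout."""
        stale = [p for p, t in self.active.items() if now - t > self.timeout]
        for p in stale:
            del self.active[p]
        return stale
```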

90 citations


Journal Article (DOI)
Ruidong Li, Jie Li, Kui Wu, Yang Xiao, Jiang Xie
TL;DR: Evaluation results show that the proposed enhanced fast handover scheme achieves low handover latency and low packet delay.
Abstract: One of the most important challenges in Mobile IPv6 is to let a mobile node maintain its connectivity to the Internet when it moves from one domain to another, which is referred to as handover. Here we deal with the fast handover problem, which is to provide rapid handover service for delay-sensitive and real-time applications. In this paper, we propose an enhanced fast handover scheme for Mobile IPv6. In our scheme, each AR (Access Router) maintains a CoA (Care-of Address) table and generates the new CoA for an MN (Mobile Node) that will move into its domain. At the same time, the binding updates to the home agent and correspondent node are performed as soon as the PAR (Previous AR) knows the MN's new CoA. A localized authentication procedure that works with the proposed scheme is also provided. For comparison with the existing fast handover scheme, a detailed performance evaluation is performed. The evaluation results show that the proposed enhanced fast handover scheme can achieve low handover latency and low packet delay.

83 citations


Proceedings Article (DOI)
27 Oct 2008
TL;DR: A dependent link padding scheme is proposed to protect anonymity systems from traffic analysis attacks while providing a strict delay bound; for Pareto traffic, the rate of the covering traffic converges to a constant as the number of flows goes to infinity.
Abstract: Low-latency anonymity systems are susceptible to traffic analysis attacks. In this paper, we propose a dependent link padding scheme to protect anonymity systems from traffic analysis attacks while providing a strict delay bound. The covering traffic generated by our scheme uses the minimum sending rate needed to provide full anonymity for a given set of flows. The relationship between user anonymity and the minimum covering traffic rate is then studied via analysis and simulation. When user flows are Poisson processes with the same sending rate, the minimum covering traffic rate to provide full anonymity to m users is O(log m). For Pareto traffic, we show that the rate of the covering traffic converges to a constant as the number of flows goes to infinity. Finally, we use real Internet trace files to study the behavior of our algorithm when user flows have different rates.
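To illustrate why a strict delay bound Δ limits how sparse covering traffic can be, the sketch below merges the packets of all flows. It then computes the fewest output instants that still send every packet within Δ of its arrival.

The sketch rests on two assumptions:

* At each instant, every flow emits one packet (real or dummy), so the instant rate is the per-flow covering rate.
* No flow has more than one real packet per instant.

This is not the paper's dependent link padding algorithm, and it does not reproduce the O(log m) result. The function names are hypothetical.

```python
import random
from typing import List

def covering_instants(arrivals: List[float], delta: float) -> List[float]:
    """Fewest send instants such that every arrival a is sent at some t in [a, a + delta].

    Greedy: the earliest not-yet-covered arrival schedules an instant at its own
    deadline, which also covers every later arrival up to that deadline. This is
    optimal for this interval-covering problem."""
    instants: List[float] = []
    for a in sorted(arrivals):
        if not instants or a > instants[-1]:
            instants.append(a + delta)
    return instants

def poisson_arrivals(m: int, rate: float, horizon: float, seed: int = 0) -> List[float]:
    """Merged arrival times of m independent Poisson flows, each with the given rate."""
    rng = random.Random(seed)
    times: List[float] = []
    for _ in range(m):
        t = rng.expovariate(rate)
        while t < horizon:
            times.append(t)
            t += rng.expovariate(rate)
    return times
```

For example, `len(covering_instants(poisson_arrivals(m, 1.0, 1000.0), 0.5)) / 1000.0` estimates the covering rate for m flows under this simplified model.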

65 citations


Patent
Vivek Gupta, Pouya Taaghol
04 Sep 2008
TL;DR: A virtual eNB is proposed to facilitate low-latency handover between Mobile WiMAX and 2G/3G/LTE networks with only a single radio transmitting at any given time. An L2 tunnel between the 3GPP MME and the WiMAX ASN carries control-plane signaling for pre-registration, pre-authentication, and context transfer to the target network while the UE maintains its connection to the source network.
Abstract: An example of this invention provides low-latency handovers between Mobile WiMAX and 2G/3G/LTE networks with only a single radio transmitting at any given point in time. It does so by establishing an L2 tunnel between the 3GPP MME and the WiMAX ASN for control-plane signaling to perform pre-registration, pre-authentication, and context transfer to the target network while the UE maintains its connection to the source network. It also sets up a bearer path for packet forwarding between the Serving Gateway and the WiMAX ASN. An example of this invention uses a virtual eNB to facilitate low-latency L2 handoffs to legacy 2G/3G networks with minimal impact on the SGSN and MME.

56 citations


Patent
15 Feb 2008
TL;DR: A network on chip (NOC) includes integrated processor (IP) blocks, routers, memory communications controllers, and network interface controllers, with each IP block adapted to a router through a memory communications controller and a network interface controller.
Abstract: Data processing on a network on chip (‘NOC’) that includes integrated processor (‘IP’) blocks, routers, memory communications controllers, and network interface controllers. Each IP block is adapted to a router through a memory communications controller and a network interface controller. Each memory communications controller controls communications between an IP block and memory, and each network interface controller controls inter-IP-block communications through routers. Each IP block is also adapted to the network by a low-latency, high-bandwidth application messaging interconnect comprising an inbox and an outbox.

52 citations


Proceedings Article (DOI)
01 Oct 2008
TL;DR: A new bit-level algorithm and new circuit techniques for the design of programmable priority arbiters that offer significantly more efficient implementations compared to already-known solutions are presented.
Abstract: The need for efficient implementation of simple crossbar schedulers has increased in recent years due to the advent of on-chip interconnection networks that require low-latency message delivery. The core function of any crossbar scheduler is arbitration, which resolves conflicting requests for the same output. Since the delay of the arbiters directly determines the operating speed of the scheduler, the design of faster arbiters is of paramount importance. In this paper, we present a new bit-level algorithm and new circuit techniques for the design of programmable priority arbiters that offer significantly more efficient implementations than previously known solutions. The experimental results show that the proposed circuits are more than 15% faster than the most efficient previous implementations. Under equal-delay comparisons, this translates to 40% less energy.
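The abstract does not give the new bit-level algorithm. The behavior that any programmable priority arbiter implements is easy to state, though. It takes an n-bit request vector and a one-hot priority vector, and grants the first requester at or after the priority position, wrapping around.

The Python model below computes that grant with a well-known doubled-vector subtraction trick. It is a behavioral reference only, not the paper's circuit, and the function name is hypothetical.

```python
def pp_arbiter(req: int, priority: int, n: int) -> int:
    """Behavioral model of an n-input programmable priority arbiter.

    req:      n-bit request vector (bit j set means input j requests).
    priority: one-hot n-bit vector marking the highest-priority position.
    Returns a one-hot grant for the first requester at or after the priority
    position (circularly), or 0 if nothing is requested."""
    mask = (1 << n) - 1
    req &= mask
    double = (req << n) | req               # request vector concatenated with itself (2n bits)
    grant2 = double & ~(double - priority)  # isolates the lowest set bit at or above the priority bit
    return (grant2 | (grant2 >> n)) & mask  # fold a grant in the upper copy back into n bits
```

For example, with requests `0b0101` and priority `0b0010`, input 1 has no request, so the grant goes to input 2 (`0b0100`).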

48 citations


Proceedings Article (DOI)
18 Nov 2008
TL;DR: A covert communication model based on least significant bit (LSB) steganography in Voice over IP (VoIP), aimed at providing good security for secret messages and the real-time performance that is vital for VoIP.
Abstract: Steganography, as an alternative technique for secure communications, has drawn increasing attention. This paper presents a covert communication model based on least significant bit (LSB) steganography in Voice over IP (VoIP). The model aims to provide good security for secret messages and the real-time performance that is vital for VoIP. To this end, we employ a simple encryption of secret messages before embedding them. This encryption strikes a good balance between adequate short-term protection for secret messages and low latency for VoIP. Furthermore, we design a structure for embedded messages that supports flexible length and effectively resists both extraction attacks and deception attacks. We evaluate the model with ITU-T G.729a as the codec of the cover speech in StegTalk, our platform for studying covert communications theory in VoIP. In this case, the proposed model can provide two optional covert transmission speeds, 0.8 kb/s and 2.6 kb/s, with a maximum payload ratio of 99.98%. The experimental results show that our method has negligible effects on speech quality and fully meets the real-time requirement of VoIP.
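As a rough illustration of the two steps the abstract names, the sketch first applies a lightweight encryption pass and then performs LSB embedding. It hides an XOR-encrypted message in the least significant bits of a byte buffer.

The sketch simplifies in several ways:

* The cover is treated as plain bytes. It does not use real G.729a frame parameters.
* The paper's choice of bits, its encryption scheme and its embedded-message structure are not reproduced.
* The repeating-key XOR is illustrative only and is not secure.

Function names are hypothetical.

```python
from typing import List

def keystream(key: bytes, n: int) -> bytes:
    # Toy repeating-key stream: illustrative only, NOT a secure cipher.
    return bytes(key[i % len(key)] for i in range(n))

def to_bits(data: bytes) -> List[int]:
    return [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]

def from_bits(bits: List[int]) -> bytes:
    out = bytearray()
    for i in range(0, len(bits), 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)

def embed(cover: bytes, secret: bytes, key: bytes) -> bytes:
    """XOR-encrypt the secret, then write one ciphertext bit into the LSB of each cover byte."""
    cipher = bytes(s ^ k for s, k in zip(secret, keystream(key, len(secret))))
    bits = to_bits(cipher)
    if len(bits) > len(cover):
        raise ValueError("cover too small for secret")
    stego = bytearray(cover)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit  # clear the LSB, then set it to the message bit
    return bytes(stego)

def extract(stego: bytes, secret_len: int, key: bytes) -> bytes:
    """Read the LSBs back and undo the XOR."""
    cipher = from_bits([b & 1 for b in stego[:secret_len * 8]])
    return bytes(c ^ k for c, k in zip(cipher, keystream(key, secret_len)))
```

Because only the LSB of each carrier byte changes, the distortion per byte is at most one unit. Real speech-quality impact depends on which codec bits are chosen.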

44 citations


Proceedings Article (DOI)
22 Apr 2008
TL;DR: This work presents Alert, a MAC protocol for collecting event-triggered urgent messages from a group of sensor nodes with minimum latency and without requiring any cooperation or pre-scheduling among the senders or between senders and receiver during protocol execution.
Abstract: Collection of rare but delay-critical messages from a group of sensor nodes is a key process in many wireless sensor network applications. This is particularly important for security-related applications such as intrusion detection and fire alarm systems. An event sensed by multiple sensor nodes in the network can trigger many messages to be sent simultaneously. We present Alert, a MAC protocol for collecting event-triggered urgent messages from a group of sensor nodes with minimum latency and without requiring any cooperation or pre-scheduling among the senders or between senders and receiver during protocol execution. Alert is designed to handle multiple simultaneous messages efficiently and reliably, minimizing both the overall delay to collect all messages and the delay to get the first message. Moreover, the ability to handle a large number of simultaneous messages does not come at the cost of excessive delays when only a few messages need to be handled. We analyze Alert and evaluate its feasibility and performance with an implementation on commodity hardware. We further compare Alert with existing approaches through simulations and show the performance improvement that Alert makes possible.

41 citations


Proceedings Article (DOI)
07 Oct 2008
TL;DR: In this paper, a silicon photonic WDM point-to-point network enabled by optical proximity communications is proposed; it provides scalable interconnectivity between chips with low latency and high bisection bandwidth.
Abstract: We introduce a silicon photonic WDM point-to-point network enabled by novel optical proximity communications. This strictly non-blocking network provides scalable interconnectivity between chips with low latency and high bisection bandwidth.

35 citations


Proceedings Article (DOI)
09 Sep 2008
TL;DR: A novel stateless, virtualized communication engine for sub-microsecond latency: a field-programmable gate array (FPGA) based prototype of this virtualized engine for low overhead (VELO) shows a latency of 970 ns between two machines.
Abstract: This paper presents a novel stateless, virtualized communication engine for sub-microsecond latency. Using a field-programmable gate array (FPGA) based prototype, we show a latency of 970 ns between two machines with our virtualized engine for low overhead (VELO). The FPGA device is directly connected to the CPUs by a HyperTransport link. The described hardware architecture is optimized for small messages and avoids the overhead typically found with direct memory access (DMA) controlled transfers. The stateless approach allows the hardware unit to be used directly by many threads and processes simultaneously. It provides secure user-level communication with an extremely optimized start-up phase. Microbenchmark results are reported for both a proprietary API and Open MPI.

Book Chapter (DOI)
26 Mar 2008
TL;DR: This paper presents an implementation of the double-precision exponential function with a novel table-based architecture. The design provides low latency (30 clock cycles), comparable to 32-bit implementations, and aims primarily to meet the strict precision and speed requirements of quantum chemistry.
Abstract: This paper presents an implementation of the double-precision exponential function. A novel table-based architecture, together with a short Taylor expansion, provides low latency (30 clock cycles), comparable to 32-bit implementations. The low area consumption of a single exp() module (roughly 4% of an XC4LX200) allows several parallel modules to be implemented on a single FPGA. The exp() function was implemented on the SGI RASC platform, where external memory interface limitations allowed only two modules in parallel. Each module runs at 200 MHz with a maximum error of 1 ulp and an RMSE of 0.62. This implementation aims primarily to meet the huge and strict precision and speed requirements of quantum chemistry.
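The combination of a table lookup and a short Taylor expansion can be sketched in software. First reduce x to x = k·ln 2 + i·STEP + r. Then take e^(i·STEP) from a precomputed table, approximate e^r with a few Taylor terms (r is tiny), and scale by 2^k.

The table size (256 entries) and the number of Taylor terms are assumptions. This is not the paper's hardware datapath. A production design would also need a more careful range reduction than the single-constant subtraction used here.

```python
import math

LN2 = math.log(2.0)
TABLE_BITS = 8                      # assumption: 2**8 = 256 table entries
STEP = LN2 / (1 << TABLE_BITS)
TABLE = [math.exp(i * STEP) for i in range(1 << TABLE_BITS)]  # TABLE[i] = e^(i*STEP)

def table_exp(x: float, taylor_terms: int = 5) -> float:
    """e^x = 2^k * TABLE[i] * e^r, where x = k*ln2 + i*STEP + r and r is small."""
    k = math.floor(x / LN2)
    rem = x - k * LN2                           # approximately in [0, ln2)
    i = min(int(rem / STEP), len(TABLE) - 1)    # table index from the high-order part
    r = rem - i * STEP                          # |r| is at most about STEP (~0.0027)
    term, poly = 1.0, 1.0                       # short Taylor series for e^r
    for n in range(1, taylor_terms + 1):
        term *= r / n
        poly += term
    return math.ldexp(TABLE[i] * poly, k)       # multiply by 2^k exactly
```

With STEP ≈ 0.0027, the first omitted Taylor term (r^6/720) is far below double-precision rounding. This is why a short expansion suffices once a table absorbs most of the range.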

Journal Article (DOI)
Ilias Iliadis, Cyriel Minkenberg
TL;DR: An analytical model is presented to investigate the efficiency of the speculative transmission scheme employed in a non-blocking N × NR input-queued crossbar switch with R receivers per output and shows that the control-path latency can be almost entirely eliminated for loads up to 50%.
Abstract: Low latency is a critical requirement in some switching applications, specifically in parallel computer interconnection networks. The minimum latency in switches with centralized scheduling comprises two components, namely, the control-path latency and the data-path latency, which in a practical high-capacity, distributed switch implementation can be far greater than the cell duration. We introduce a speculative transmission scheme to significantly reduce the average control-path latency by allowing cells to proceed without waiting for a grant, under certain conditions. It operates in conjunction with any centralized matching algorithm to achieve a high maximum utilization and incorporates a reliable delivery mechanism to deal with failed speculations. An analytical model is presented to investigate the efficiency of the speculative transmission scheme employed in a non-blocking N × NR input-queued crossbar switch with R receivers per output. Using this model, performance measures such as the mean delay and the rate of successful speculative transmissions are derived. The results demonstrate that the control-path latency can be almost entirely eliminated for loads up to 50%. Our simulations confirm the analytical results.

Proceedings Article (DOI)
08 Jun 2008
TL;DR: A hybrid MoT-BF network that combines the MoT network with the area-efficient butterfly network is introduced, and the hybrid network is proven to reduce the MoT network's area cost.
Abstract: Single-chip parallel processing requires high bandwidth between processors and on-chip memory modules. A recently proposed Mesh-of-Trees (MoT) network provides high throughput and low latency at a relatively high area cost. In this paper, we introduce a hybrid MoT-BF network that combines the MoT network with the area-efficient butterfly network. We prove that the hybrid network reduces the MoT network's area cost. Cycle-accurate simulation and post-layout results both show that significant area reduction can be achieved with negligible performance degradation when operating at the same clock rate.

Proceedings Article (DOI)
23 Apr 2008
TL;DR: A new survivor memory unit (SMU) architecture that combines the concepts of trace-forward and trace-back; its power consumption is slightly higher than that of the 3-pointer even trace-back (TB) architecture.
Abstract: The Viterbi decoder is a common module in communication systems, where power and decoding latency are constrained. The register exchange (RE) architecture has the lowest decoding latency, L. However, it is not suitable for such communication systems because of its high power consumption. In this paper, we propose a new SMU architecture that combines the concepts of trace-forward and trace-back. The decoding latency of the proposed SMU algorithm is only L+M. We also present a power-efficient architecture for the proposed SMU algorithm and implement it in TSMC 0.13 µm technology. The power consumption of the proposed architecture is slightly higher than that of the 3-pointer even TB architecture.

Proceedings Article (DOI)
09 Sep 2008
TL;DR: A novel algorithm that minimizes the latency of workflows while satisfying strict throughput requirements, together with an investigation of how task duplication alleviates communication overheads in the pipelined schedule for different workflow characteristics.
Abstract: Scheduling, in many application domains, involves the optimization of multiple performance metrics. For example, application workflows with real-time constraints have strict throughput requirements and also desire a low latency or response time. In this paper, we present a novel algorithm for the scheduling of workflows that act on a stream of input data. Our algorithm focuses on two performance metrics, latency and throughput, and minimizes the latency of workflows while satisfying strict throughput requirements. We leverage pipelined, task, and data parallelism in a coordinated manner to meet these objectives and investigate the benefit of task duplication in alleviating communication overheads in the pipelined schedule for different workflow characteristics. The proposed algorithm is designed for a realistic k-port communication model, where each processor can simultaneously communicate with at most k distinct processors. Evaluation using synthetic and application benchmarks shows that our algorithm consistently produces lower-latency schedules and meets throughput requirements, even when previously proposed schemes fail.
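The two metrics in this abstract pull in different directions. Throughput is set by the slowest pipeline stage, while latency is the time one input item takes to traverse every stage.

A minimal sketch of that distinction uses a simplified model. This model is an assumption, not the paper's k-port model:

* A stage replicated r times serves items round-robin, so its period becomes t / r.
* Inter-stage communication adds to latency but is overlapped with computation for throughput.

The function name and all times are hypothetical.

```python
from typing import List, Tuple

def pipeline_metrics(stage_times: List[float], replicas: List[int],
                     comm_times: List[float]) -> Tuple[float, float]:
    """Return (throughput, latency) for a linear pipeline under the simplified model above."""
    assert len(stage_times) == len(replicas) == len(comm_times) + 1
    period = max(t / r for t, r in zip(stage_times, replicas))  # slowest effective stage
    throughput = 1.0 / period                                   # items per time unit
    latency = sum(stage_times) + sum(comm_times)                # one item, end to end
    return throughput, latency
```

Take stage times 2, 4 and 1 with communication times 1 and 0.5. Replicating the middle stage twice raises throughput from 0.25 to 0.5 items per time unit, while latency stays at 8.5 time units. A scheduler can therefore meet a throughput target and still have latency left to minimize.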

Proceedings Article (DOI)
24 Oct 2008
TL;DR: The proposed mobile WiMAX architecture meets the WiMAX radio system profile release 1.0 and RNM (reference network model) requirements when the class 2 group real time applications (VoIP & video conference) are deployed.
Abstract: This article presents a mobile WiMAX network deployment as a candidate for a broadband, low-latency V2I communication architecture. First, it reviews the current state of development of the standards involved, outlining the newest trends (e.g., 802.16m support for speeds of 500 km/h). Second, it highlights the opportunities and main characteristics that this technology offers. Third, by stressing this network in two highly demanding scenarios, it identifies the challenges that this WiMAX network deployment faces. Network simulation modeling techniques have been used to carry out the corresponding performance analysis. Inter-ASN handover is identified as the critical point to be tackled so that the proposed mobile WiMAX architecture meets the WiMAX radio system profile release 1.0 and RNM (reference network model) requirements when class 2 group real-time applications (VoIP and video conference) are deployed.

Proceedings Article (DOI)
22 Apr 2008
TL;DR: A novel improvement to existing frameworks, based on fast handover for hierarchical mobile IPv6 and optimistic duplicate address detection, optimizes the address configuration stage to further reduce handover latency.
Abstract: A seamless handover scheme with low latency and low packet loss is important for maintaining the TCP performance of mobile users. Several solutions have been proposed, such as hierarchical mobile IPv6 (HMIPv6), the fast handover protocol for mobile IPv6 (FMIPv6), and the seamless handoff architecture for mobile IP (S-MIP), among others; however, the overall handover latency is still too high for time-sensitive services. This paper proposes a novel improvement to existing frameworks that optimizes the address configuration stage to further reduce handover latency. The proposed framework is based on fast handover for hierarchical mobile IPv6 and optimistic duplicate address detection.

Proceedings Article (DOI)
15 Oct 2008
TL;DR: Four selected hardware implementations of a 5 × 5 median filter are compared on power efficiency, and their latency, maximum clock rates, and resource utilization are analyzed; the Batcher network is the clear winner in power efficiency.
Abstract: The two-dimensional spatial median filter is a core algorithm for impulse noise removal in digital image processing and computer vision. The literature presents several analyses of median filters optimized for the standard 3 × 3 pixel neighborhood. A 5 × 5 neighborhood, useful for imagery whose noise does not conform to the classic "salt and pepper" pattern, has received little analysis. Research on hardware implementations of median filters has focused primarily on low latency and high throughput. We are developing a system that includes stereo visible and near-infrared sensors; both require a 5 × 5 median filter to handle intensifier noise. Since the system is a battery-powered unit, optimal power usage is a critical requirement in addition to low latency and high throughput. However, optimal power usage for median filtering has received little attention in the literature. In this paper, we investigate four selected hardware implementations of a 5 × 5 median filter and compare them on the basis of power efficiency. We also analyze the latency, maximum clock rates, and resource utilization of these implementations. Two designs implement merge sort and radix sort-based elimination algorithms, which are common in software implementations of median filters. The other two are a systolic sorting array and a Batcher sorting network, which are common hardware sorting techniques. All designs were created in the Altera Quartus II environment for Stratix II field-programmable gate arrays. They were designed to be fully pipelined, accepting input sets and generating median filter output values every pixel clock pulse. Of the four designs considered, the Batcher network is the clear winner in power efficiency. The Batcher network also exceeds the functional and performance requirements for resource usage, latency, and clock rate.
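A Batcher network is a fixed sequence of compare-and-swap operations that does not depend on the data. This is what lets it be laid out as fully pipelined hardware.

The software sketch below assumes Batcher's odd-even merge sort; the abstract does not say which Batcher variant was used. The 25 window values are padded to 32 with +infinity and sorted by the network, and element 12 is the median.

A hardware median filter would typically prune comparators that cannot affect the middle output, but this sketch keeps the full sorter. Function names are hypothetical.

```python
from typing import List, Tuple

def odd_even_merge_sort_pairs(n: int) -> List[Tuple[int, int]]:
    """Comparator list (i, j), i < j, of Batcher's odd-even merge sort; n must be a power of two."""
    pairs = []
    p = 1
    while p < n:
        k = p
        while k >= 1:
            for j in range(k % p, n - k, 2 * k):
                for i in range(min(k, n - j - k)):
                    # only compare elements that lie in the same 2p-sized merge block
                    if (i + j) // (2 * p) == (i + j + k) // (2 * p):
                        pairs.append((i + j, i + j + k))
            k //= 2
        p *= 2
    return pairs

PAIRS_32 = odd_even_merge_sort_pairs(32)

def apply_network(values: List[float], pairs: List[Tuple[int, int]]) -> List[float]:
    """Run the comparators in order; each one is a compare-and-swap."""
    v = list(values)
    for a, b in pairs:
        if v[a] > v[b]:
            v[a], v[b] = v[b], v[a]
    return v

def median_5x5(window: List[float]) -> float:
    """Median of a 5x5 window: pad to 32 with +inf, sort with the network, take element 12."""
    assert len(window) == 25
    return apply_network(list(window) + [float("inf")] * 7, PAIRS_32)[12]
```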

Proceedings Article (DOI)
18 Nov 2008
TL;DR: The key requirements for MAC protocol design are summarized, the advantages and disadvantages of existing MAC protocols are analyzed, and promising directions for future work are outlined.
Abstract: Due to low sensing range, limited power capacity, and high node density, more efficient medium access control (MAC) protocols are needed to achieve low latency, low power, and high throughput in wireless sensor networks. This paper summarizes the key requirements for MAC protocol design, analyzes the advantages and disadvantages of existing MAC protocols, and outlines promising directions for future work.

Proceedings Article (DOI)
21 Jan 2008
TL;DR: This paper proposes a variation-tolerant design technique, block remap with turnoff (BRT), to minimize performance loss and leakage energy consumption in an on-chip data cache affected by process variations.
Abstract: As feature sizes shrink, the effects of process variations are becoming increasingly pronounced. Memory components such as on-chip caches are more susceptible to such variations because of their high density and small transistors. Process variations can result in high access latency and leakage energy dissipation. This may lead to a functionally correct chip being rejected, reducing chip yield. In this paper, we consider an on-chip data cache affected by process variations. We first analyze the performance loss of worst-case design techniques, such as accessing the entire cache with the worst-case access latency or turning off the affected cache blocks. We show that these worst-case techniques result in significant performance loss and/or high leakage energy. Then, exploiting the fact that not all applications require full set-level associativity, we propose a variation-tolerant design technique, block remap with turnoff (BRT), to minimize performance loss and leakage energy consumption. In BRT, we selectively turn off a few blocks after rearranging them so that all sets get an almost equal number of process-variation-affected blocks. By turning off the affected blocks of a set, leakage energy can be minimized and the set can be accessed with low latency, at the cost of reduced set associativity. We validate our technique by running the SPEC CPU2000 benchmark suite on the SimpleScalar simulator and show that it significantly reduces the performance loss and leakage energy consumption caused by process variations.
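The key BRT step is rearranging blocks so that every set ends up with nearly the same number of affected blocks before they are turned off. A minimal, idealized sketch of that balancing step follows.

The sketch assumes any physical block can be remapped to any logical set; real hardware would constrain which blocks can move. The function names are hypothetical and the remap policy (round-robin dealing) is an assumption, not the paper's mechanism.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # (physical set, physical way)

def balanced_remap(faulty: List[List[bool]]) -> List[List[Block]]:
    """Assign physical blocks to logical sets so that per-set counts of
    variation-affected blocks differ by at most one (idealized, fully flexible remap)."""
    n_sets, n_ways = len(faulty), len(faulty[0])
    bad = [(s, w) for s in range(n_sets) for w in range(n_ways) if faulty[s][w]]
    good = [(s, w) for s in range(n_sets) for w in range(n_ways) if not faulty[s][w]]
    layout: List[List[Block]] = [[] for _ in range(n_sets)]
    for i, blk in enumerate(bad):          # deal affected blocks round-robin across sets
        layout[i % n_sets].append(blk)
    for s in range(n_sets):                # fill the remaining ways with healthy blocks
        while len(layout[s]) < n_ways:
            layout[s].append(good.pop())
    return layout

def ways_after_turnoff(layout: List[List[Block]], faulty: List[List[bool]]) -> List[int]:
    """Effective associativity of each logical set once its affected blocks are turned off."""
    return [sum(1 for (s, w) in blocks if not faulty[s][w]) for blocks in layout]
```

Consider a 4-set, 4-way cache in which set 0 has two affected blocks. Without remapping, set 0 would drop to 2 ways. After balancing, no set drops below 3 ways.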

Proceedings Article (DOI)
19 May 2008
TL;DR: The impetus of the present study is to describe a cross-layer approach enabling techniques such as beamforming, MIMO antennas, OFDM, and low-latency MAC operation in IEEE 802.11n wireless access networks.
Abstract: With the advent of low-cost WLAN devices, the delivery of multimedia content is highly desirable. Such applications require high throughput and near-real-time delivery for quality viewing. The use of next-generation WLAN 802.11n and physical-layer multicast transmission for a wide range of wireless terminals introduces many significant challenges. The impetus of the present study is to describe a cross-layer approach enabling techniques such as beamforming, MIMO antennas, OFDM, and low-latency MAC operation. Our simulation results show a substantial improvement in network performance for our proposed strategies in IEEE 802.11n wireless access networks.

Proceedings Article (DOI)
01 Sep 2008
TL;DR: This paper presents an implementation of an extended version of MaxNet, which extends the original algorithm to give both provable stability and rate fairness. It also introduces the MaxStart algorithm, which allows new MaxNet connections to reach their fair rates quickly.
Abstract: MaxNet TCP is a congestion control protocol that uses explicit multi-bit signalling from routers to achieve desirable properties such as high throughput and low latency. In this paper we present an implementation of an extended version of MaxNet. Our contributions are threefold. First, we extend the original algorithm to give both provable stability and rate fairness. Second, we introduce the MaxStart algorithm, which allows new MaxNet connections to reach their fair rates quickly. Third, we provide a Linux kernel implementation of the protocol. With no overhead beyond 24-bit price signals, our implementation scales from 32 bit/s to 1 Pbit/s with 0.001% rate accuracy. We confirm the theoretically predicted properties by performing a range of experiments at speeds up to 1 Gbit/s and delays up to 180 ms on the WAN-in-Lab facility.
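In MaxNet, as the protocol is generally described, each router on the path overwrites the packet's price field only if its own link price is higher. The source therefore reacts to the maximum (bottleneck) price rather than a sum of prices.

The sketch below models that stamping with a 24-bit field. It also checks one assumed way the quoted range could fit. If rates were encoded on a logarithmic scale with 0.001% steps, spanning 32 bit/s to 1 Pbit/s would take about 3.1 million levels, well under 2^24. The abstract does not say which encoding the implementation actually uses, and all names here are hypothetical.

```python
import math
from typing import Iterable

PRICE_BITS = 24
PRICE_MAX = (1 << PRICE_BITS) - 1   # largest value a 24-bit field can carry

def stamp(header_price: int, link_price: int) -> int:
    """A router overwrites the field only if its own price is higher (saturating at 24 bits)."""
    return max(header_price, min(link_price, PRICE_MAX))

def path_price(link_prices: Iterable[int]) -> int:
    """Price the receiver sees after every router on the path has stamped the packet."""
    price = 0
    for p in link_prices:
        price = stamp(price, p)
    return price

def log_levels(r_min: float, r_max: float, rel_step: float) -> int:
    """Levels needed to span [r_min, r_max] if adjacent levels differ by a factor (1 + rel_step)."""
    return math.ceil(math.log(r_max / r_min) / math.log1p(rel_step))
```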

Proceedings Article (DOI)
25 Nov 2008
TL;DR: All-optical encryption and anti-jamming have been experimentally demonstrated in an optical CDMA system using four-wave mixing in a 32-cm highly-nonlinear bismuth-oxide fiber, providing a high-speed, compact, and low latency approach for network security.
Abstract: All-optical encryption and anti-jamming have been experimentally demonstrated in an optical CDMA system using four-wave mixing in a 32-cm highly-nonlinear bismuth-oxide fiber. The scheme provides a high-speed, compact, and low latency approach for network security.

Journal Article
TL;DR: L-MAC adopts a gathering tree to reduce latency, and theoretical analysis and simulation show that it does so efficiently.
Abstract: A simple analysis of MAC protocols for wireless sensor networks is presented. Taking into account the latency drawbacks of existing MAC protocols, a low-latency MAC protocol (L-MAC) is put forward. The protocol adopts a gathering tree to reduce latency while remaining energy efficient. The theoretical analysis and the simulation show that L-MAC reduces latency efficiently.

Patent
28 Apr 2008
TL;DR: A system, method, and apparatus for logging data are provided: network data is acquired in transmission to a destination and converted to loggable data, which is output in a machine-readable format.
Abstract: A system, method and apparatus for logging data are provided. Network data is acquired in transmission to a destination. The network data is converted to loggable data. The loggable data is output in a machine-readable format. As a result, the system, method, and apparatus provide low-latency acquisition of data, which may also be used for recoverability.

Proceedings Article (DOI)
28 Aug 2008
TL;DR: The vertical handover architecture is designed to be easily extended to support more than two heterogeneous networks, so it can easily be applied to terminals that have multiple heterogeneous wireless network interfaces.
Abstract: This paper describes a vertical handover architecture with low-latency handover for mobile terminals that supports interworking between heterogeneous networks. The architecture defines handover-related modules for the mobile terminal and the handover switching flows among them. We adapt a low-latency handover method to Mobile IP to reduce handover time. The architecture is designed to be easily extended to support more than two heterogeneous networks, so it can easily be applied to terminals that have multiple heterogeneous wireless network interfaces. We analyze the performance of the handover method in terms of handover delay using a testbed system.

Patent
10 Dec 2008
TL;DR: A system may include a memory storing trading-application instructions and a processor that executes them, resulting in a trading application that can cause a display device to show the user a visualized trading tool corresponding to the user's trade position in a financial instrument.
Abstract: In some embodiments, a system may include a memory having stored thereon trading-application instructions; and a processor to execute the trading-application instructions resulting in a trading application, wherein the trading application is able to cause a display device to display to a user a visualized trading tool corresponding to a trade position of the user with relation to a financial instrument, wherein the visualized trading tool includes at least one graphical element representing the trade position, and one or more user-controllable graphical indicators representing one or more respective position-related parameters of the trade position, and wherein the trading application is able to receive an input responsive to movement of at least one of the graphical indicators and to dynamically update the trade position based on the input. Other embodiments are described and claimed.

Proceedings Article (DOI)
TL;DR: A substantial performance improvement is realized by a new memory-communication model that incorporates the data dependencies of the image-processing functions and exploits the locality of the signal-processing functions to streamline the memory communication.
Abstract: In cardiovascular minimally invasive interventions, physicians require low-latency X-ray imaging applications, as their actions must be directly visible on the screen. The image-processing system should enable the simultaneous execution of a plurality of functions. Because dedicated hardware lacks flexibility, there is growing interest in using off-the-shelf computer technology. Because memory bandwidth is a scarce resource, we focus on optimization methods for bandwidth reduction within multiprocessor systems at the chip level. We create a practical, realistic model of the compute and memory bandwidth required for a given set of image-processing functions. Similar modeling is applied to the available system resources. We concentrate in particular on X-ray image processing based on multi-resolution decomposition, noise reduction, and image-enhancement techniques. We derive formulas with which we can optimize the mapping of the application onto processors, cache, and memory for different configurations. The data-block granularity is matched to the memory hierarchy so that caching is optimized for low latency. More specifically, we exploit the locality of the signal-processing functions to streamline the memory communication. A substantial performance improvement is realized by a new memory-communication model that incorporates the data dependencies of the image-processing functions. Results show a memory-bandwidth reduction on the order of 60% and a latency reduction on the order of 30-60% compared to straightforward implementations.

DOI
G Iles
01 Jan 2008
TL;DR: The design for a low latency, high speed serial interface between the GCT and GT, based upon a Xilinx Virtex 5 FPGA is presented.
Abstract: The CMS Global Calorimeter Trigger (GCT) has been designed, manufactured and commissioned on a short time schedule of approximately two years. The GCT system has gone through extensive testing on the bench and in situ, and its performance is well understood. This paper describes problems encountered during the project, the solutions to them and possible lessons for future designs, particularly for high-speed serial links. The input links have been upgraded from 1.6 Gb/s synchronous links to 2.0 Gb/s asynchronous links. The existing output links to the Global Trigger (GT) are being replaced. The design for a low-latency, high-speed serial interface between the GCT and GT, based upon a Xilinx Virtex 5 FPGA, is presented.