scispace - formally typeset

Showing papers on "Latency (engineering)" published in 2012


Posted Content
01 Jan 2012
TL;DR: This paper presents a block cipher that is optimized with respect to latency when implemented in hardware; decryption for one key corresponds to encryption with a related key, a property (α-reflection) that is of independent interest and whose soundness against generic attacks is proven.
Abstract: This paper presents a block cipher that is optimized with respect to latency when implemented in hardware. Such ciphers are desirable for many future pervasive applications with real-time security needs. Our cipher, named PRINCE, allows encryption of data within one clock cycle with a very competitive chip area compared to known solutions. The fully unrolled fashion in which such algorithms need to be implemented calls for innovative design choices. The number of rounds must be moderate and rounds must have short delays in hardware. At the same time, the traditional need that a cipher has to be iterative with very similar round functions disappears, an observation that increases the design space for the algorithm. An important further requirement is that realizing decryption and encryption results in minimum additional costs. PRINCE is designed in such a way that the overhead for decryption on top of encryption is negligible. More precisely for our cipher it holds that decryption for one key corresponds to encryption with a related key. This property we refer to as α-reflection is of independent interest and we prove its soundness against generic attacks.
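The α-reflection property admits a compact illustration. The toy construction below is not PRINCE itself: the involution, key mixing, and reflection constant are invented for illustration. It builds a cipher E_k(x) = I(x ⊕ k) ⊕ (k ⊕ α) from an involution I and checks that decryption under k coincides with encryption under the related key k ⊕ α:

```python
ALPHA = 0xC5  # arbitrary reflection constant (illustrative, not PRINCE's)

def involution(x):
    # Nibble swap on a byte: applying it twice restores x.
    return ((x << 4) | (x >> 4)) & 0xFF

def encrypt(key, x):
    # Toy cipher with the alpha-reflection structure:
    # E_k(x) = I(x ^ k) ^ (k ^ ALPHA)
    return involution(x ^ key) ^ key ^ ALPHA

def decrypt(key, y):
    # Direct inversion of encrypt; it equals encrypt(key ^ ALPHA, y).
    return involution(y ^ key ^ ALPHA) ^ key

# alpha-reflection: decryption under k == encryption under k ^ ALPHA
for key in range(256):
    for x in (0x00, 0x3A, 0xFF):
        assert decrypt(key, encrypt(key, x)) == x
        assert decrypt(key, x) == encrypt(key ^ ALPHA, x)
```

The practical payoff sketched here is the one the abstract claims: a single datapath plus a key XOR serves for both directions, so decryption adds almost no area on top of encryption.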

439 citations


Proceedings Article
28 May 2012
TL;DR: In this article, the authors discuss large scale data analysis using different MapReduce implementations and then present a performance analysis of high performance parallel applications on virtualized resources, including MPI and CGL-MapReduce.
Abstract: Infrastructure services (Infrastructure-as-a-service), provided by cloud vendors, allow any user to provision a large number of compute instances fairly easily. Whether leased from public clouds or allocated from private clouds, utilizing these virtual resources to perform data/compute intensive analyses requires employing different parallel runtimes to implement such applications. Among many parallelizable problems, most “pleasingly parallel” applications can be performed using MapReduce technologies such as Hadoop, CGL-MapReduce, and Dryad, in a fairly easy manner. However, many scientific applications, which have complex communication patterns, still require low latency communication mechanisms and rich set of communication constructs offered by runtimes such as MPI. In this paper, we first discuss large scale data analysis using different MapReduce implementations and then, we present a performance analysis of high performance parallel applications on virtualized resources.

214 citations


Proceedings ArticleDOI
07 Oct 2012
TL;DR: This work proposes a hybrid system that provides low-fidelity feedback immediately, followed by high-fidelity visuals at standard levels of latency, and shows that users greatly prefer lower latencies, with noticeable improvement continuing well below 10 ms.
Abstract: Software designed for direct-touch interfaces often utilizes a metaphor of direct physical manipulation of pseudo "real-world" objects. However, current touch systems typically take 50-200ms to update the display in response to a physical touch action. Utilizing a high performance touch demonstrator, subjects were able to experience touch latencies ranging from current levels down to about 1ms. Our tests show that users greatly prefer lower latencies, and noticeable improvement continued well below 10ms. This level of performance is difficult to achieve in commercial computing systems using current technologies. As an alternative, we propose a hybrid system that provides low-fidelity visual feedback immediately, followed by high-fidelity visuals at standard levels of latency.

178 citations


Patent
04 Oct 2012
TL;DR: In this paper, the authors propose decoding at least a portion of the media data of a second segment relative to a first segment in order to achieve a Low Latency Live profile for dynamic adaptive streaming over HTTP (DASH).
Abstract: In one example, a device includes one or more processors configured to receive a first segment of media data, wherein the media data of the first segment comprises a stream access point, receive a second segment of media data, wherein the media data of the second segment lacks a stream access point at the beginning of the second segment, and decode at least a portion of the media data of the second segment relative to at least a portion of data for the first segment. In this manner, the techniques of this disclosure may be used to achieve a Low Latency Live profile for, e.g., dynamic adaptive streaming over HTTP (DASH).

97 citations


Journal ArticleDOI
TL;DR: This article proposes a virtual infrastructure and a data dissemination protocol exploiting this infrastructure, which considers dynamic conditions of multiple sinks and sources and is fault-tolerant, meaning it can bypass routing holes created by imperfect conditions of wireless communication in the network.
Abstract: A new category of intelligent sensor network applications emerges where motion is a fundamental characteristic of the system under consideration. In such applications, sensors are attached to vehicles or people that move around large geographic areas. For instance, in mission critical applications of wireless sensor networks (WSNs), sinks can be associated with first responders. In such scenarios, reliable data dissemination of events is very important, as is efficiency in handling the mobility of both sinks and event sources. For this kind of application, reliability means real-time data delivery with a high data delivery ratio. In this article, we propose a virtual infrastructure and a data dissemination protocol exploiting this infrastructure, which considers dynamic conditions of multiple sinks and sources. The architecture consists of 'highways' in a honeycomb tessellation, which are the three main diagonals of the honeycomb where the data flow is directed and event data is cached. The highways act as rendezvous regions for events and queries. Our protocol, namely hexagonal cell-based data dissemination (HexDD), is fault-tolerant, meaning it can bypass routing holes created by imperfect conditions of wireless communication in the network. We analytically evaluate the communication cost and hot region traffic cost of HexDD and compare it with other approaches. Additionally, with extensive simulations, we evaluate the performance of HexDD in terms of data delivery ratio, latency, and energy consumption. We also analyze the hot spot zones of HexDD and other virtual infrastructure based protocols. To overcome the hot region problem in HexDD, we propose to resize the hot regions and evaluate the performance of this method. Simulation results show that our approach significantly reduces overall energy consumption while maintaining a comparably high data delivery ratio and low latency.
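A basic building block of any honeycomb-tessellation protocol is mapping a node's position to the hexagonal cell containing it. The sketch below uses standard hex-grid geometry (axial coordinates with cube rounding); it is generic textbook math, not code from the paper, and the pointy-top orientation and cell radius are assumptions:

```python
import math

def cube_round(q, r):
    # Round fractional axial coords via cube coordinates (x + y + z = 0),
    # fixing the component with the largest rounding error.
    x, z = q, r
    y = -x - z
    rx, ry, rz = round(x), round(y), round(z)
    dx, dy, dz = abs(rx - x), abs(ry - y), abs(rz - z)
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return rx, rz

def point_to_hex_cell(x, y, cell_radius):
    """Map a sensor position (x, y) to the axial (q, r) index of the
    pointy-top hexagonal cell that contains it."""
    q = (math.sqrt(3) / 3 * x - y / 3) / cell_radius
    r = (2 * y / 3) / cell_radius
    return cube_round(q, r)

# A node at the origin falls in the central cell.
assert point_to_hex_cell(0.0, 0.0, 10.0) == (0, 0)
```

In a HexDD-like scheme, a cell index computed this way would determine whether a node sits on one of the diagonal "highways" that cache and forward event data.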

80 citations


Patent
Heng Zhang, Mehdi Khanpour, Jun Cao, Chang Liu, Afshin Momtaz
07 Nov 2012
TL;DR: In this paper, a transceiver includes a high latency communication channel and a low latency communication channel that is configured to bypass the high latency channel in low latency applications.
Abstract: Methods, systems, and apparatuses are described for reducing the latency in a transceiver. A transceiver includes a high latency communication channel and a low latency communication channel that is configured to be a bypass channel for the high latency communication channel. The low latency communication channel may be utilized when the transceiver is used in low latency applications. By bypassing the high latency communication channel, the high latency that is introduced therein (due to the many stages of de-serialization used to reduce the data rate for digital processing) can be avoided. An increase in data rate is realized when the low latency communication channel is used to pass data. A delay-locked loop (DLL) may be used to phase align the transmitter clock of the transceiver with the receiver clock of the transceiver to compensate for a limited tolerance of phase offset between these clocks.

72 citations


Journal ArticleDOI
TL;DR: A new wake-up receiver is proposed to reduce energy consumption and latency through the adoption of two different data rates for the transmission of wake-up packets; it achieves a sensitivity of -73 dBm while dissipating an average power of 8.5 μW from a 1.8 V supply.
Abstract: A new wake-up receiver is proposed to reduce energy consumption and latency through adoption of two different data rates for the transmission of wake-up packets. To reduce the energy consumption, the start frame bits (SFBs) of a wake-up packet are transmitted at a low data rate of 1 kbps, and a bit-level duty cycle is employed for detection of SFBs. To reduce both energy consumption and latency, duty cycling is halted upon detection of the SFB sequence, and the rest of the wake-up packet is transmitted at a higher data rate of 200 kbps. The proposed wake-up receiver is designed and fabricated in a 0.18 μm CMOS technology with a core size of 1850×1560 μm for the target frequency range of 902-928 MHz. The measured results show that the proposed design achieves a sensitivity of -73 dBm, while dissipating an average power of 8.5 μW from a 1.8 V supply.
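The latency benefit of the two-rate packet format can be seen with back-of-envelope arithmetic. Only the two data rates (1 kbps and 200 kbps) come from the abstract; the bit counts below are assumed for illustration:

```python
# Illustrative timing for a two-rate wake-up packet: start-frame bits at
# 1 kbps so a heavily duty-cycled receiver can catch them, then the rest
# at 200 kbps once the receiver is fully awake. Bit counts are assumed.

SFB_BITS = 8          # start-frame bits at the low rate (assumed)
PAYLOAD_BITS = 32     # remaining wake-up bits at the high rate (assumed)
LOW_RATE = 1e3        # 1 kbps
HIGH_RATE = 200e3     # 200 kbps

def packet_time(sfb_bits, payload_bits):
    return sfb_bits / LOW_RATE + payload_bits / HIGH_RATE

two_rate = packet_time(SFB_BITS, PAYLOAD_BITS)
single_low = (SFB_BITS + PAYLOAD_BITS) / LOW_RATE  # everything at 1 kbps

print(f"two-rate packet:   {two_rate * 1e3:.2f} ms")
print(f"all at 1 kbps:     {single_low * 1e3:.2f} ms")
```

Under these assumed bit counts the on-air time drops from 40 ms to about 8.2 ms, which is the kind of latency reduction the paper attributes to switching rates after the start-frame bits are detected.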

48 citations



Patent
05 Apr 2012
TL;DR: In this article, various techniques for distributing data, particularly real-time data such as financial market data, to data consumers at low latency are described, including adaptive data distribution techniques and a multi-class distribution engine.
Abstract: Various techniques are disclosed for distributing data, particularly real-time data such as financial market data, to data consumers at low latency. Exemplary embodiments include embodiments that employ adaptive data distribution techniques and embodiments that employ a multi-class distribution engine.

35 citations


Proceedings ArticleDOI
09 May 2012
TL;DR: This paper assesses network partitioning options and bandwidth scalability techniques with deep technology and layout awareness; the main contribution lies in the characterization and precise quantification of the interaction effects between the technology platform, the layout constraints, and the network-level quality metrics of a passive optical NoC.
Abstract: The performance of future chip multi-processors will only scale with the number of integrated cores if there is a corresponding increase in memory access efficiency. The focus of this paper on a 3D-stacked wavelength-routed optical layer for high bandwidth and low latency processor-memory communication goes in this direction and complements ongoing efforts on photonically integrated bandwidth-rich DRAM devices. This target environment dictates layout constraints that make the difference in discriminating between alternative design choices of the optical layer. This paper assesses network partitioning options and bandwidth scalability techniques with deep technology and layout awareness, the main contribution lying in the characterization and precise quantification of such interaction effects between the technology platform, the layout constraints and the network-level quality metrics of a passive optical NoC.

31 citations


Proceedings ArticleDOI
04 Mar 2012
TL;DR: This work demonstrates for the first time 40 Gb/s operation of a modular large port count optical packet switch with highly distributed control, with 25 ns latency and record low energy consumption.
Abstract: We demonstrate for the first time 40 Gb/s operation of a modular large port count optical packet switch with highly distributed control. The switch shows 25 ns latency and record low energy consumption of 76.5 pJ/bit.

Proceedings ArticleDOI
26 Mar 2012
TL;DR: An energy efficient multi-token based MAC protocol is presented that not only extends network lifetime and maintains network connectivity but also achieves congestion-free, fault-tolerant, and reliable data transmission.
Abstract: Wireless sensor networks (WSNs) have accelerated tremendous research efforts aimed at maximizing the lifetime of battery-powered sensor nodes and, by extension, the overall network lifetime. With the objective of prolonging the lifetime of a WSN, reducing energy consumption turns out to be the most crucial factor for almost all WSN protocols, particularly for the MAC protocol, which directly controls the state of the main energy-consuming component, i.e., the radio module. In order to minimize energy consumption, the RMAC and HEMAC protocols allow a node to transmit data packets over a multi-hop WSN in a single duty cycle. At the same time, each node remains in low power sleep mode and wakes up periodically to sense for channel activity, i.e., data transmission. In a token-based MAC protocol, however, depending on token availability, end-to-end communication between source and sink occurs one at a time, so latency remains high. Hence, MAC protocols for WSNs face significant challenges in conserving energy, maintaining low latency, and tolerating node failures. To overcome these problems, we present an energy efficient multi-token based MAC protocol that not only extends the network lifetime and maintains network connectivity but also achieves congestion-free, fault-tolerant, and reliable data transmission. Simulation studies of the proposed MAC protocol have been carried out using the Castalia simulator, and its performance has been compared with that of SMAC, RMAC, and a token based MAC protocol. Simulation results show that the proposed approach has lower energy consumption and a higher delivery ratio.

Proceedings ArticleDOI
16 Sep 2012
TL;DR: A novel datacenter network architecture utilizing OFDM and parallel signal detection technologies and efficient subcarrier allocation algorithms is proposed and fast, low latency, fine granularity, bandwidth flexible, and low power consumption MIMO switching is demonstrated experimentally.
Abstract: We propose a novel datacenter network architecture utilizing OFDM and parallel signal detection technologies and efficient subcarrier allocation algorithms. Fast, low latency, fine granularity, bandwidth flexible, and low power consumption MIMO switching is demonstrated experimentally.

Proceedings ArticleDOI
13 Aug 2012
TL;DR: This work designed and constructed a 24x24-port optical circuit switch (OCS) prototype with a programming time of 68.5 μs, a switching time of 2.8 μs, and a receiver electronics initialization time of 8.7 μs, and demonstrates the operation of this prototype switch in a data center testbed under various workloads.
Abstract: We designed and constructed a 24x24-port optical circuit switch (OCS) prototype with a programming time of 68.5 μs, a switching time of 2.8 μs, and a receiver electronics initialization time of 8.7 μs [1]. We demonstrate the operation of this prototype switch in a data center testbed under various workloads.

Patent
Philip L. Northcott1
22 May 2012
TL;DR: In this article, the authors present an approach and methods that provide relatively low uncorrectable bit error rates, low write amplification, long life, fast and efficient retrieval, and efficient storage density such that a solid-state drive (SSD) can be implemented using relatively inexpensive MLC Flash for an enterprise storage application.
Abstract: Apparatus and methods provide relatively low uncorrectable bit error rates, low write amplification, long life, fast and efficient retrieval, and efficient storage density such that a solid-state drive (SSD) can be implemented using relatively inexpensive MLC Flash for an enterprise storage application.

Proceedings ArticleDOI
20 May 2012
TL;DR: A novel and fast 4-2 compressor is proposed that needs no extra buffers in low latency paths to equalize delays; as a result, power dissipation is decreased and the output waveforms are free of glitches.
Abstract: This paper discusses the design of a novel and fast 4-2 compressor. To enhance the speed performance, some changes are made to the truth table of the conventional 4-2 compressor, which led to a reduction of the gate level delay to 2 XOR logic gates plus 1 transistor for all parameters. Because of similar paths, there is no need for extra buffers in low latency paths to equalize the delays. Therefore, power dissipation is decreased and the output waveforms are free of any glitch. The delay of the proposed architecture, simulated in HSPICE using TSMC 0.35µm CMOS technology, is 340 ps.
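For reference, the arithmetic contract that any 4-2 compressor must satisfy can be checked against a behavioral model of the conventional design (two cascaded full adders), which is the baseline such papers optimize; this is not the proposed circuit:

```python
from itertools import product

def full_adder(a, b, c):
    # One-bit full adder: sum and carry of three input bits.
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)

def compressor_4_2(x1, x2, x3, x4, cin):
    """Behavioral model of a conventional 4-2 compressor built from two
    cascaded full adders. cout depends only on x1..x3, so it can ripple
    to the neighboring compressor without waiting for cin."""
    s1, cout = full_adder(x1, x2, x3)       # first CSA stage
    total, carry = full_adder(s1, x4, cin)  # second stage folds in x4, cin
    return total, carry, cout

# Defining identity: x1+x2+x3+x4+cin == sum + 2*(carry + cout),
# checked exhaustively over all 32 input combinations.
for bits in product((0, 1), repeat=5):
    s, c, co = compressor_4_2(*bits)
    assert sum(bits) == s + 2 * (c + co)
```

Any modified truth table, such as the one the paper proposes, must preserve exactly this identity; only the gate-level realization (and hence delay and power) changes.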

Journal ArticleDOI
23 May 2012
TL;DR: HartOS is a hardware-implemented, micro-kernel-structured RTOS targeted for hard real-time embedded applications running on FPGA based platforms and has up to 3 orders of magnitude less mean error in generating the correct period for a periodic task, while having up to 100% less overhead depending on the tick frequency.
Abstract: This paper introduces HartOS, a hardware-implemented, micro-kernel-structured RTOS targeted for hard real-time embedded applications running on FPGA based platforms. Historically hardware RTOSs have been too inflexible and have had limited features and resources. HartOS is designed to be flexible and supports most of the features normally found in a software-based RTOS. To ensure fast, low latency and jitter-free communication between the CPU and RTOS, HartOS uses the ARM AXI4-Stream bus recently supported by the MicroBlaze softcore processor. Compared to μC/OS-II, HartOS has up to 3 orders of magnitude less mean error in generating the correct period for a periodic task, and around 1 order of magnitude less jitter, while having up to 100% less overhead depending on the tick frequency.

Book ChapterDOI
05 Nov 2012
TL;DR: A close-to-sensor low latency visual processing system that shows that by adaptively sampling visual information, low level tracking can be achieved at high temporal frequencies with no increase in bandwidth and using very little memory.
Abstract: In this paper we describe a close-to-sensor low latency visual processing system. We show that by adaptively sampling visual information, low level tracking can be achieved at high temporal frequencies with no increase in bandwidth and using very little memory. By having close-to-sensor processing, image regions can be captured and processed at millisecond sub-frame rates. If spatiotemporal regions have little useful information in them they can be discarded without further processing. Spatiotemporal regions that contain 'interesting' changes are further processed to determine what the interesting changes are. Close-to-sensor processing enables low latency programming of the image sensor such that interesting parts of a scene are sampled more often than less interesting parts. Using a small set of low level rules to define what is interesting, early visual processing proceeds autonomously. We demonstrate system performance with two applications. Firstly, to test the absolute performance of the system, we show low level visual tracking at millisecond rates and secondly, a more general recursive Bayesian tracker.

Journal ArticleDOI
TL;DR: A novel high speed 4-2 compressor using static and pass-transistor logic has been designed in a 0.35µm CMOS technology in order to reduce gate level delay and increase speed.
Abstract: A novel high speed 4-2 compressor using static and pass-transistor logic has been designed in a 0.35µm CMOS technology. In order to reduce gate level delay and increase speed, some changes are made to the truth table of the conventional 4-2 compressor, which led to the simplification of the logic function for all parameters. Therefore, power dissipation is decreased. In addition, because of similar paths from all inputs to the outputs, the delays are the same, so there is no need for extra buffers in low latency paths to equalize the delays.

Journal ArticleDOI
TL;DR: A field-programmable gate array (FPGA)-based label processor for in-band optical labels with a processing time independent of the number of label bits is presented, which allows for implementing an optical packet switching architecture that scales to a large port count without compromising the latency.
Abstract: We present a field-programmable gate array (FPGA)-based label processor for in-band optical labels with a processing time independent of the number of label bits. This allows for implementing an optical packet switching architecture that scales to a large port count without compromising the latency. As a proof of concept, we have employed an FPGA board with a 100 MHz clock to validate the operation of the label processor in a 160 Gb/s optical packet switching system. Experimental results show successful processing of three label bits and 160 Gb/s packet switching with 1 dB power penalty and 470 ns of latency. Projections on the label processor performance by using more powerful FPGAs indicate that 60 label bits (optical addresses) can be processed within 31 ns.

Journal ArticleDOI
TL;DR: In this article, the authors describe how Facebook analyzes big data.
Abstract: How Facebook is analyzing big data.

Proceedings ArticleDOI
29 Nov 2012
TL;DR: LA-MAC is a low-latency asynchronous access method for efficient forwarding in wireless sensor networks suitable for current and future sensor networks that increasingly provide support for multiple applications, handle heterogeneous traffic, and become organized according to some complex structure.
Abstract: The paper presents LA-MAC, a low-latency asynchronous access method for efficient forwarding in wireless sensor networks. It is suitable for current and future sensor networks that increasingly provide support for multiple applications, handle heterogeneous traffic, and become organized according to some complex structure (tree, DAG, partial mesh). It takes advantage of the network structure so that a parent of some nodes becomes a coordinator that schedules transmissions in a localized region. Allowing burst transmissions improves the network capacity so that the network can handle load fluctuations. At the same time, the method reduces energy consumption by decreasing the overhead of node coordination per frame. The paper reports on the results of extensive simulations that compare LA-MAC with B-MAC and X-MAC, two representative methods based on preamble sampling. They show excellent performance of LA-MAC with respect to latency, delivery ratio, and consumed energy.

Patent
11 Apr 2012
TL;DR: In this paper, a network of processing devices includes a medium for low-latency interfaces for providing point-to-point connections between each of the processing devices, and a switch within each processing device is arranged to facilitate communications in any combination between the processing resources and the local point to point interfaces within each processor device.
Abstract: A network of processing devices includes a medium for low-latency interfaces for providing point-to-point connections between each of the processing devices. A switch within each processing device is arranged to facilitate communications in any combination between the processing resources and the local point-to-point interfaces within each processing device. A networking layer is provided above the low-latency interface stack, which facilitates re-use of software and exploits existing protocols for providing the point-to-point connections. Higher speeds are achieved for switching between the relatively low numbers of processor resources within each processing device, while low-latency point-to-point communications are achieved using the low-latency interfaces for accessing processor resources that are external to a processing device.

Patent
10 Jul 2012
TL;DR: In this article, a technique for securing transmit opening helps enhance the operation of a station that employs the technique, which may facilitate low latency response to a protocol data requester, for instance.
Abstract: A technique for securing transmit opening helps enhance the operation of a station that employs the technique. The technique may facilitate low latency response to a protocol data requester, for instance. In one aspect, the technique provides a way for the protocol data responder to hold its transmit opening to transmit the protocol response data to the protocol data requester. The technique may allow the protocol data responder to hold the transmit opening until the protocol response data is ready and available for the protocol data responder to send.

Journal ArticleDOI
Yoonho Park, Richard Pervin King, Senthil Nathan, Wesley Most, Henrique Andrade
TL;DR: This work determines the effectiveness of each system optimization that the hardware and software infrastructure makes available and shows that a stock market data processing system can be built with general‐purpose middleware and run on commodity hardware.
Abstract: A stock market data processing system that can handle high data volumes at low latencies is critical to market makers. Such systems play a critical role in algorithmic trading, risk analysis, market surveillance, and many other related areas. The current systems tend to use specialized software and custom processors. We show that such a system can be built with general-purpose middleware and run on commodity hardware. The middleware we use is IBM System S which includes transport technology from IBM WebSphere MQ Low Latency Messaging (LLM). Our performance evaluation consists of two parts. First, we determined the effectiveness of each system optimization that the hardware and software infrastructure makes available. These optimizations were implemented at all software levels: application, middleware, and operating system. Second, we evaluated our system on different hardware platforms.

Proceedings ArticleDOI
25 Jun 2012
TL;DR: This paper presents a novel set of collective operations implemented using point to point messages, shared memory and accelerator hardware to exploit the hierarchical organization of the P7IH for providing low latency, high bandwidth operations.
Abstract: The Power7 IH (P7IH) is one of IBM's latest generation of supercomputers. Like most modern parallel machines, it has a hierarchical organization consisting of simultaneous multithreading (SMT) within a core, multiple cores per processor, multiple processors per node (SMP), and multiple SMPs per cluster. A low latency/high bandwidth network with specialized accelerators is used to interconnect the SMP nodes. System software is tuned to exploit the hierarchical organization of the machine. In this paper we present a novel set of collective operations that take advantage of the P7IH hardware. We discuss non blocking collective operations implemented using point to point messages, shared memory and accelerator hardware. We show how collectives can be composed to exploit the hierarchical organization of the P7IH for providing low latency, high bandwidth operations. We demonstrate the scalability of the collectives we designed by including experimental results on a P7IH system with up to 4096 cores.
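The idea of composing collectives along the machine hierarchy can be sketched in a few lines: reduce within each node first (shared memory on the real machine), then across one leader per node (the network), then broadcast back down. This is purely illustrative; it assumes a sum reduction over plain Python lists rather than real MPI ranks:

```python
def hierarchical_allreduce(values, ranks_per_node):
    """Two-level sum allreduce mirroring a hierarchical collective:
    node-local step, inter-node step over leaders, then broadcast.
    Illustrative only; a real implementation would overlap these
    phases and use shared memory plus the interconnect."""
    # Step 1: node-local reduction (shared memory on an SMP node)
    node_sums = [
        sum(values[i:i + ranks_per_node])
        for i in range(0, len(values), ranks_per_node)
    ]
    # Step 2: inter-node reduction over one leader per node (network)
    global_sum = sum(node_sums)
    # Step 3: broadcast the result back to every rank
    return [global_sum] * len(values)

result = hierarchical_allreduce([1, 2, 3, 4, 5, 6, 7, 8], ranks_per_node=4)
assert result == [36] * 8
```

The point of the composition is that only one message per node crosses the network, while the cheap intra-node traffic stays in shared memory, which is what makes the hierarchy-aware version low latency at scale.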

Patent
14 Mar 2012
TL;DR: In this paper, a dynamically reconfigurable asynchronous arbitration node for use in an adaptive asynchronous interconnection network is provided, which includes a circuit, an output channel and two input channels.
Abstract: A dynamically reconfigurable asynchronous arbitration node for use in an adaptive asynchronous interconnection network is provided. The arbitration node includes a circuit, an output channel and two input channels—a first input channel and a second input channel. The circuit supports a default-arbitration mode and a biased-input mode. The circuit is configured to generate data for the output channel by mediating between input traffic including data received at the first and second input channels, if the arbitration node is operating in the default-arbitration mode, or by providing a direct path to the output channel for one of the first input channel and the second input channel that is biased, if the arbitration node is operating in the biased-input mode. The circuit is further configured to monitor the input traffic and implement a mode change based on a history of the observed input traffic in accordance with a mode-change policy.
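The two-mode behavior described above can be sketched as a small state machine. The bias threshold and the specific mode-change policy below are invented for illustration; the patent leaves these configurable:

```python
from collections import deque

class ArbitrationNode:
    """Behavioral sketch of a two-mode arbiter: 'default' mediates
    fairly between two inputs, 'biased:N' gives input N a direct
    low-latency path. The policy details here are assumptions."""

    def __init__(self, bias_threshold=8):
        self.mode = "default"              # or "biased:0" / "biased:1"
        self.history = deque(maxlen=bias_threshold)
        self.rr = 0                        # round-robin pointer

    def forward(self, in0, in1):
        """Pick the flit for the output channel this cycle.
        in0/in1 are flits, or None when a channel is idle."""
        if self.mode.startswith("biased"):
            ch = int(self.mode.split(":")[1])
            chosen = (in0, in1)[ch]        # direct path, no arbitration
        elif in0 is not None and in1 is not None:
            chosen = (in0, in1)[self.rr]   # mediate: alternate fairly
            self.rr ^= 1
        else:
            chosen = in0 if in0 is not None else in1
        self._observe(in0, in1)
        return chosen

    def _observe(self, in0, in1):
        # Assumed mode-change policy: if the recent window saw traffic
        # on only one input, bias toward it; mixed traffic restores
        # default arbitration.
        if (in0 is None) != (in1 is None):
            self.history.append(0 if in0 is not None else 1)
        if len(self.history) == self.history.maxlen:
            if len(set(self.history)) == 1:
                self.mode = f"biased:{self.history[0]}"
            else:
                self.mode = "default"

node = ArbitrationNode(bias_threshold=4)
for _ in range(4):
    node.forward("flit", None)             # all traffic on input 0
assert node.mode == "biased:0"
```

The latency win in the biased mode comes from skipping the mediation step entirely on the hot path, at the cost of monitoring traffic to know when arbitration must be restored.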

Posted Content
TL;DR: The DNP provides inter-tile services for both on-chip and off-chip communications with a uniform RDMA style API, over a multi-dimensional direct network with a (possibly) hybrid topology.
Abstract: One of the most demanding challenges for the designers of parallel computing architectures is to deliver an efficient network infrastructure providing low latency, high bandwidth communications while preserving scalability. Besides off-chip communications between processors, recent multi-tile (i.e. multi-core) architectures face the challenge of providing an efficient on-chip interconnection network between processor tiles. In this paper, we present a configurable and scalable architecture, based on our Distributed Network Processor (DNP) IP Library, targeting systems ranging from single MPSoCs to massive HPC platforms. The DNP provides inter-tile services for both on-chip and off-chip communications with a uniform RDMA style API, over a multi-dimensional direct network with a (possibly) hybrid topology.

Patent
07 Dec 2012
TL;DR: In this paper, a first portion of the packet is written into a first cell of a plurality of cells of a buffer in the network device, each of the cells has a size that is less than a minimum size of packets received by the device.
Abstract: Buffer designs and write/read configurations for a buffer in a network device are provided. According to one aspect, a first portion of the packet is written into a first cell of a plurality of cells of a buffer in the network device. Each of the cells has a size that is less than a minimum size of packets received by the network device. The first portion of the packet can be read from the first cell while concurrently writing a second portion of the packet to a second cell.
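The cut-through behavior enabled by sub-packet cells can be sketched as follows; the cell size and the read-while-write simulation are illustrative, not taken from the patent:

```python
class CellBuffer:
    """Sketch of a cell-based packet buffer: each cell is smaller than
    the minimum packet size, so a packet always spans multiple cells
    and its head cells can be read out while the tail is still being
    written (modeled here with a generator)."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = []

    def write_packet(self, packet):
        # Split the packet across cells; yield after each cell write so
        # a reader can start draining before the packet is fully stored.
        for off in range(0, len(packet), self.cell_size):
            self.cells.append(packet[off:off + self.cell_size])
            yield len(self.cells) - 1      # index of the cell just written

    def read_cell(self, idx):
        return self.cells[idx]

buf = CellBuffer(cell_size=16)
packet = bytes(range(64))
out = bytearray()
for idx in buf.write_packet(packet):
    # "Concurrent" read: drain each cell as soon as it is written.
    out += buf.read_cell(idx)
assert bytes(out) == packet
```

Because a reader never has to wait for the full packet to land in the buffer, per-packet store-and-forward latency shrinks to roughly one cell time, which is the motivation for sizing cells below the minimum packet size.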

Journal Article
TL;DR: This paper provides a design and implementation of an on-chip router architecture that allows a routing function for each input port and distributed arbiters, which gives a high level of parallelism.
Abstract: Technology scaling continuously increases the number of components and the complexity of System on Chip designs [1]. For effective global on-chip communication, on-chip routers provide essential routing functionality with low complexity and relatively high performance [1]. Low latency and high speed are achieved by allowing a routing function for each input port and distributed arbiters, which gives a high level of parallelism [4]. This paper provides a design and implementation of an on-chip router architecture.