Author

Matt Crawford

Bio: Matt Crawford is an academic researcher from Fermilab. The author has contributed to research in topics: Network packet & Scheduling (computing). The author has an h-index of 8 and has co-authored 15 publications receiving 217 citations.

Papers
Journal ArticleDOI
Wenji Wu, Matt Crawford, M. Bowden
TL;DR: A mathematical model is developed to characterize the Linux packet receiving process from NIC to application, and key factors that affect Linux systems' network performance are analyzed.

71 citations
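
To make the staged receive path concrete, here is a toy Python model in the spirit of the paper's pipeline view (NIC ring buffer, softirq protocol processing, application read). All stage names, rates, and buffer sizes below are illustrative assumptions, not values from the paper.

```python
# Toy model of a staged Linux packet-receive path: NIC DMA -> softirq ->
# socket read. Numbers are assumptions for illustration only.

RING_SIZE = 256          # NIC ring buffer slots (assumed)
STAGES = {               # packets each stage can service per tick (assumed)
    "nic_dma": 120,
    "softirq": 100,      # kernel protocol processing
    "socket_recv": 80,   # application-side read rate
}

def sustainable_rate(stages):
    """The end-to-end rate is bounded by the slowest stage in the pipeline."""
    return min(stages.values())

def simulate(arrival_rate, ticks=1000):
    """Count packets dropped when the ring buffer overflows under load."""
    backlog, dropped = 0, 0
    service = sustainable_rate(STAGES)
    for _ in range(ticks):
        backlog += arrival_rate
        backlog -= min(backlog, service)
        if backlog > RING_SIZE:
            dropped += backlog - RING_SIZE
            backlog = RING_SIZE
    return dropped

print(sustainable_rate(STAGES))   # 80: socket_recv is the bottleneck stage
print(simulate(arrival_rate=90))  # sustained overload eventually drops packets
```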

Journal ArticleDOI
Wenji Wu, Phil DeMar, Matt Crawford
TL;DR: Sorting Reordered Packets with Interrupt Coalescing (SRPIC) works in the network device driver; it makes use of the interrupt coalescing mechanism to sort the reordered packets belonging to the same TCP stream in a block of packets before delivering them upward.

26 citations
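
A minimal sketch of the SRPIC idea follows, assuming a simplified Packet type: within one interrupt-coalesced batch, packets of each TCP flow are re-sorted by sequence number before being delivered upward.

```python
# Minimal sketch of the SRPIC idea: within one interrupt-coalesced batch,
# re-sort packets of the same TCP flow by sequence number before handing
# them up the stack. The Packet type and its fields are assumptions.
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Packet:
    flow: tuple   # (src_ip, src_port, dst_ip, dst_port)
    seq: int      # TCP sequence number
    payload: bytes = b""

def srpic_sort(batch):
    """Return the batch with each flow's packets in sequence order,
    keeping every flow in the batch positions it originally occupied."""
    by_flow = defaultdict(list)
    for pkt in batch:
        by_flow[pkt.flow].append(pkt)
    for pkts in by_flow.values():
        pkts.sort(key=lambda p: p.seq)
    iters = {flow: iter(pkts) for flow, pkts in by_flow.items()}
    return [next(iters[pkt.flow]) for pkt in batch]

batch = [Packet(("a", 1, "b", 2), s) for s in (3000, 1000, 2000)]
print([p.seq for p in srpic_sort(batch)])  # [1000, 2000, 3000]
```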

Journal IssueDOI
Wenji Wu, Matt Crawford
TL;DR: This paper systematically describes the trip of a TCP packet from its ingress into a Linux network end system to its final delivery to the application, and proposes and test one possible solution to resolve this performance bottleneck in Linux TCP.
Abstract: Transmission control protocol (TCP) is the most widely used transport protocol on the Internet today. Over the years, especially recently, due to requirements of high bandwidth transmission, various approaches have been proposed to improve TCP performance. The Linux 2.6 kernel is now preemptible. It can be interrupted mid-task, making the system more responsive and interactive. However, we have noticed that Linux kernel preemption can interact badly with the performance of the networking subsystem. In this paper, we investigate the performance bottleneck in Linux TCP. We systematically describe the trip of a TCP packet from its ingress into a Linux network end system to its final delivery to the application; we study the performance bottleneck in Linux TCP through mathematical modelling and practical experiments; finally, we propose and test one possible solution to resolve this performance bottleneck in Linux TCP. Copyright © 2007 John Wiley & Sons, Ltd.

24 citations

Proceedings ArticleDOI
24 Sep 2007
TL;DR: Using SRM in a large international high energy physics collaboration, called WLCG, to prepare to handle the large volume of data expected when the Large Hadron Collider goes online at CERN is described.
Abstract: Storage management is one of the most important enabling technologies for large-scale scientific investigations. Having to deal with multiple heterogeneous storage and file systems is one of the major bottlenecks in managing, replicating, and accessing files in distributed environments. Storage resource managers (SRMs), named after their Web services control protocol, provide the technology needed to manage the rapidly growing distributed data volumes that result from faster and larger computational facilities. SRMs are grid storage services providing interfaces to storage resources, as well as advanced functionality such as dynamic space allocation and file management on shared storage systems. They call on transport services to bring files into their space transparently and provide effective sharing of files. SRMs are based on a common specification that emerged over time and evolved into an international collaboration. This approach of an open specification that can be used by various institutions to adapt to their own storage systems has proven to be a remarkable success: the challenge has been to provide a consistent homogeneous interface to the grid, while allowing sites to have diverse infrastructures. In particular, supporting optional features while preserving interoperability is one of the main challenges we describe in this paper. We also describe using SRM in a large international high energy physics collaboration, called WLCG, to prepare to handle the large volume of data expected when the Large Hadron Collider (LHC) goes online at CERN. This intense collaboration led to refinements and additional functionality in the SRM specification, and the development of multiple interoperating implementations of SRM for various complex multi-component storage systems.

22 citations
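
The sketch below illustrates the flavor of service an SRM provides (space reservation plus transparent staging behind a uniform interface); the class and method names are hypothetical and do not reproduce the actual SRM specification.

```python
# Hypothetical sketch of an SRM-like interface: dynamic space allocation
# and file management over shared storage. Names are illustrative only.
class StorageResourceManager:
    def __init__(self):
        self.reserved = {}   # token -> (bytes reserved, lifetime in seconds)
        self.files = {}      # logical name -> site-local path

    def reserve_space(self, token, nbytes, lifetime_s):
        """Dynamic space allocation: pin nbytes for lifetime_s seconds."""
        self.reserved[token] = (nbytes, lifetime_s)
        return token

    def prepare_to_get(self, logical_name):
        """Stage a file and hand back a transfer URL; the underlying
        storage system (disk, tape, ...) stays hidden from the client."""
        path = self.files.get(logical_name)
        if path is None:
            raise FileNotFoundError(logical_name)
        return f"gsiftp://example.org{path}"

srm = StorageResourceManager()
srm.files["/lhc/run1/evt.root"] = "/pnfs/lhc/run1/evt.root"
srm.reserve_space("tok-1", 10**9, lifetime_s=3600)
print(srm.prepare_to_get("/lhc/run1/evt.root"))
```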

Posted Content
TL;DR: An NIC with a data steering mechanism to remedy the RSS and Flow Director limitations, termed "A Transport-Friendly NIC" (A-TFN), is proposed, and experimental results demonstrate the effectiveness of A-TFN in accelerating TCP/IP performance.
Abstract: Receive side scaling (RSS) is a network interface card (NIC) technology. It provides the benefits of parallel receive processing in multiprocessing environments. However, existing RSS-enabled NICs lack a critical data steering mechanism that would automatically steer incoming network data to the same core on which its application process resides. This absence causes inefficient cache usage if an application is not running on the core on which RSS has scheduled the received traffic to be processed. In Linux systems, it cannot even ensure that packets in a TCP flow are processed by a single core, even if the interrupts for the flow are pinned to a specific core. This results in degraded performance. In this paper, we develop such a data steering mechanism in the NIC for multicore or multiprocessor systems. This data steering mechanism is mainly targeted at TCP, but it can be extended to other transport layer protocols. We term a NIC with such a data steering mechanism "A Transport Friendly NIC" (A-TFN). Experimental results have proven the effectiveness of A-TFN in accelerating TCP/IP performance.

21 citations
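
The following sketch captures the steering rule A-TFN argues for, under assumed names: a flow table maps each TCP 4-tuple to the core where the consuming process runs, with an RSS-style hash as the fallback for unknown flows.

```python
# Sketch of the data-steering idea behind A-TFN: remember which core the
# application reads each flow on, and steer later packets there. The RSS
# fallback and all names here are assumptions for illustration.
NUM_CORES = 4
flow_table = {}   # 4-tuple -> core id of the application process

def rss_hash(flow):
    """Stand-in for the NIC's Toeplitz-style receive hash."""
    return hash(flow) % NUM_CORES

def steer(flow):
    """Deliver to the application's core when known, else hash."""
    return flow_table.get(flow, rss_hash(flow))

def on_socket_read(flow, core):
    """Transport-layer feedback: record where the reader runs."""
    flow_table[flow] = core

f = ("10.0.0.1", 5001, "10.0.0.2", 80)
print(steer(f))            # unknown flow: RSS hash picks the core
on_socket_read(f, core=2)  # the app reads this socket on core 2
print(steer(f))            # 2: subsequent packets follow the app
```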


Cited by
Proceedings ArticleDOI
01 Apr 2010
TL;DR: It is shown that the implementation of least-attained-service thread prioritization reduces the time the cores spend stalling and significantly improves system throughput, and ATLAS's performance benefit increases as the number of cores increases.
Abstract: Modern chip multiprocessor (CMP) systems employ multiple memory controllers to control access to main memory. The scheduling algorithm employed by these memory controllers has a significant effect on system throughput, so choosing an efficient scheduling algorithm is important. The scheduling algorithm also needs to be scalable: as the number of cores increases, the number of memory controllers shared by the cores should also increase to provide sufficient bandwidth to feed the cores. Unfortunately, previous memory scheduling algorithms are inefficient with respect to system throughput and/or are designed for a single memory controller and do not scale well to multiple memory controllers, requiring significant fine-grained coordination among controllers. This paper proposes ATLAS (Adaptive per-Thread Least-Attained-Service memory scheduling), a fundamentally new memory scheduling technique that improves system throughput without requiring significant coordination among memory controllers. The key idea is to periodically order threads based on the service they have attained from the memory controllers so far, and to prioritize those threads that have attained the least service over others in each period. The idea of favoring threads with least-attained-service is borrowed from the queueing theory literature, where, in the context of a single-server queue, it is known that least-attained-service optimally schedules jobs, assuming a Pareto (or any decreasing hazard rate) workload distribution. After verifying that our workloads have this characteristic, we show that our implementation of least-attained-service thread prioritization reduces the time the cores spend stalling and significantly improves system throughput. Furthermore, since the periods over which we accumulate the attained service are long, the controllers coordinate very infrequently to form the ordering of threads, thereby making ATLAS scalable to many controllers. We evaluate ATLAS on a wide variety of multiprogrammed SPEC 2006 workloads and systems with 4–32 cores and 1–16 memory controllers, and compare its performance to five previously proposed scheduling algorithms. Averaged over 32 workloads on a 24-core system with 4 controllers, ATLAS improves instruction throughput by 10.8% and system throughput by 8.4% compared to PAR-BS, the best previous CMP memory scheduling algorithm. ATLAS's performance benefit increases as the number of cores increases.

439 citations
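
A compact sketch of the least-attained-service rule, with simplified request and accounting structures: each (long) period, threads are ranked by the service they have attained so far, and queued requests from the least-served thread win.

```python
# Sketch of ATLAS's core rule: rank threads by attained memory service
# and prioritize the least-served. Request format and service accounting
# are simplifications, not the paper's hardware implementation.
attained = {"t0": 0.0, "t1": 0.0, "t2": 0.0}  # service attained per thread

def new_period_ranking():
    """Least-attained-service first; because periods are long, controllers
    rarely need to exchange these totals, which keeps the scheme scalable."""
    return sorted(attained, key=lambda t: attained[t])

def schedule(request_queue):
    """Pick the queued request whose thread ranks highest (least served)."""
    rank = {t: i for i, t in enumerate(new_period_ranking())}
    return min(request_queue, key=lambda req: rank[req[0]])

attained.update({"t0": 5.0, "t1": 1.0, "t2": 3.0})
queue = [("t0", "row 3"), ("t2", "row 9"), ("t1", "row 7")]
print(schedule(queue))   # ('t1', 'row 7'): t1 has attained the least service
```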

Journal ArticleDOI
TL;DR: This evaluation shows how NetVM can compose complex network functionality from multiple pipelined VMs and still obtain throughputs up to 10 Gbps, an improvement of more than 250% compared to existing techniques that use SR-IOV for virtualized networking.
Abstract: NetVM brings virtualization to the Network by enabling high bandwidth network functions to operate at near line speed, while taking advantage of the flexibility and customization of low cost commodity servers. NetVM allows customizable data plane processing capabilities such as firewalls, proxies, and routers to be embedded within virtual machines, complementing the control plane capabilities of Software Defined Networking. NetVM makes it easy to dynamically scale, deploy, and reprogram network functions. This provides far greater flexibility than existing purpose-built, sometimes proprietary hardware, while still allowing complex policies and full packet inspection to determine subsequent processing. It does so with dramatically higher throughput than existing software router platforms. NetVM is built on top of the KVM platform and Intel DPDK library. We detail many of the challenges we have solved such as adding support for high-speed inter-VM communication through shared huge pages and enhancing the CPU scheduler to prevent overheads caused by inter-core communication and context switching. NetVM allows true zero-copy delivery of data to VMs both for packet processing and messaging among VMs within a trust boundary. Our evaluation shows how NetVM can compose complex network functionality from multiple pipelined VMs and still obtain throughputs up to 10 Gbps, an improvement of more than 250% compared to existing techniques that use SR-IOV for virtualized networking.

399 citations
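
The zero-copy pattern NetVM relies on can be sketched as descriptor passing over a shared region; real NetVM uses DPDK and shared huge pages between VMs, so the single-process Python below is only a shape-preserving illustration.

```python
# Conceptual sketch of zero-copy delivery: a packet lives once in shared
# memory, and only small (offset, length) descriptors move between stages.
from multiprocessing import shared_memory

shm = shared_memory.SharedMemory(create=True, size=4096)

def write_packet(offset, payload):
    """Producer places the packet payload once in shared memory."""
    shm.buf[offset:offset + len(payload)] = payload
    return (offset, len(payload))          # descriptor, not a copy

def read_packet(desc):
    """Any stage in the pipeline reads via the descriptor: no copy."""
    offset, length = desc
    return bytes(shm.buf[offset:offset + length])

desc = write_packet(0, b"example-payload")
print(read_packet(desc))
shm.close(); shm.unlink()
```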

Proceedings ArticleDOI
17 Aug 2015
TL;DR: A soft-edge load balancing scheme called Presto is designed and implemented; its performance closely tracks that of a single, non-blocking switch over many workloads, and it is adaptive to failures and topology asymmetry.
Abstract: Datacenter networks deal with a variety of workloads, ranging from latency-sensitive small flows to bandwidth-hungry large flows. Load balancing schemes based on flow hashing, e.g., ECMP, cause congestion when hash collisions occur and can perform poorly in asymmetric topologies. Recent proposals to load balance the network require centralized traffic engineering, multipath-aware transport, or expensive specialized hardware. We propose a mechanism that avoids these limitations by (i) pushing load-balancing functionality into the soft network edge (e.g., virtual switches) such that no changes are required in the transport layer, customer VMs, or networking hardware, and (ii) load balancing on fine-grained, near-uniform units of data (flowcells) that fit within end-host segment offload optimizations used to support fast networking speeds. We design and implement such a soft-edge load balancing scheme, called Presto, and evaluate it on a 10 Gbps physical testbed. We demonstrate the computational impact of packet reordering on receivers and propose a mechanism to handle reordering in the TCP receive offload functionality. Presto's performance closely tracks that of a single, non-blocking switch over many workloads and is adaptive to failures and topology asymmetry.

250 citations
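
A small sketch of the flowcell idea, with assumed path count and cell size: consecutive near-uniform chunks of a flow are spread across equal-cost paths instead of hashing the whole flow onto one.

```python
# Sketch of Presto-style flowcells: slice a flow into near-uniform units
# (64 KB here, matching typical segment-offload sizes) and spread the
# cells round-robin across paths. Cell size and path count are assumed.
FLOWCELL = 64 * 1024
PATHS = 4

def flowcells(flow_bytes):
    """Yield (path, start, end) for each flowcell of a flow, assigning
    consecutive cells to paths instead of hashing the whole flow."""
    for i, start in enumerate(range(0, flow_bytes, FLOWCELL)):
        end = min(start + FLOWCELL, flow_bytes)
        yield i % PATHS, start, end

# A 200 KB flow crosses four paths rather than pinning to one hash bucket.
for path, start, end in flowcells(200 * 1024):
    print(f"path {path}: bytes [{start}, {end})")
```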

Proceedings Article
25 Apr 2012
TL;DR: This paper presents the eXpressive Internet Architecture (XIA), an architecture with native support for multiple principals and the ability to evolve its functionality to accommodate new, as yet unforeseen, principals over time.
Abstract: Motivated by limitations in today's host-centric IP network, recent studies have proposed clean-slate network architectures centered around alternate first-class principals, such as content, services, or users. However, much like the host-centric IP design, elevating one principal type above others hinders communication between other principals and inhibits the network's capability to evolve. This paper presents the eXpressive Internet Architecture (XIA), an architecture with native support for multiple principals and the ability to evolve its functionality to accommodate new, as yet unforeseen, principals over time. We describe key design requirements, and demonstrate how XIA's rich addressing and forwarding semantics facilitate flexibility and evolvability, while keeping core network functions simple and efficient. We describe case studies that demonstrate key functionality XIA enables.

156 citations
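
The rich addressing the abstract refers to can be sketched as a small DAG of typed identifiers with fallback edges; the node names and the routing test below are illustrative assumptions, not XIA's wire format.

```python
# Sketch of DAG-style addressing: nodes are typed identifiers (content,
# service, host). A router follows the first edge whose principal type it
# understands and otherwise takes the fallback (last) edge, so legacy
# routers that only know hosts can still make progress.
address = {
    "start":        ["CID:deadbeef", "SID:video"],  # prefer content intent
    "CID:deadbeef": [],                             # intent reached
    "SID:video":    ["HID:host42"],                 # fall back to a host
    "HID:host42":   [],
}

def forward(addr, understands):
    """Walk the DAG until a node with no outgoing edges is reached."""
    node = "start"
    while addr[node]:
        node = next((n for n in addr[node] if n.split(":")[0] in understands),
                    addr[node][-1])   # last edge acts as the fallback
    return node

print(forward(address, understands={"CID", "SID", "HID"}))  # CID:deadbeef
print(forward(address, understands={"HID"}))                # HID:host42
```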