Implementation of a Simplified Network Processor
Qiang Wu, Danai Chasaki and Tilman Wolf
Department of Electrical and Computer Engineering
University of Massachusetts
Amherst, MA, USA
{qwu,dchasaki,wolf}@ecs.umass.edu
Abstract—Programmable packet processors have replaced tra-
ditional fixed-function custom logic in the data path of routers.
Programmability of these systems allows the introduction of new
packet processing functions, which is essential for today’s Internet
as well as for next-generation network architectures. Software
development for many existing implementations of these network
processors requires a deep understanding of the architecture and
careful resource management by the software developer. Resource
management that is tied to application development makes it
difficult for packet processors to adapt to changes in the workload
that are based on traffic conditions and the deployment of new
functionality. Therefore, we present a network processor design
that separates programming from resource management, which
simplifies the software development process and improves the
system’s ability to adapt to network conditions. Based on our
initial system design, we present a prototype implementation of
a 4-core network processor using the NetFPGA platform. We
demonstrate the operation of the system using header-processing
and payload-processing applications. For packet forwarding, our
simplified network processor can achieve a throughput of 2.79
Gigabits per second at a clock rate of only 62.5 MHz. Our results
indicate that the proposed design can scale to configurations with
many more processors that operate at much higher clock rates
and thus can achieve considerably higher throughput while using
modest amounts of hardware resources.
Index Terms—Router design, network processor, next-
generation Internet, parallel processor, prototype
I. INTRODUCTION
Modern routers use programmable packet processors on
each port to implement packet forwarding and other advanced
protocol functionality. This programmability in the data path
is an important aspect of router designs in the current Inter-
net, in contrast to the traditional approach where custom
application-specific integrated circuits with fixed functionality
are used. The ability to change a router’s operation by simply
changing the software running on router ports makes it pos-
sible to introduce new functions (e.g., monitoring, accounting,
anomaly detection, blocking, etc.) without changing router
hardware. An essential requirement for these systems is the
availability of a high-performance packet processor that can
deliver packet processing at data rates of multiple Gigabits per
second. Such network processors (NPs) have been developed
and deployed over the last decade as systems-on-a-chip based
on multi-core architectures.
One of the key challenges in using a network processor to
implement advanced packet processing functionality is software
development. Many programming environments for NPs use very
low levels of abstraction. While this approach helps with
achieving high throughput performance, it also poses considerable
challenges to the software developer. Distributing the processing
workload between processor cores, coordinating shared resources,
and manually allocating data structures to different memory types
and banks are difficult tasks. In environments where network
functionality does not change frequently, it is conceivable to
dedicate considerable resources to such software development.
However, this software development approach becomes less
practical in highly dynamic systems. The next-generation Internet
is envisioned to be such a dynamic environment.
(This material is based upon work supported by the National Science
Foundation under Grant Nos. CNS-0626690 and CNS-0447873.)
The next-generation Internet architecture is expected to rely
on programmability in the infrastructure substrate to provide
isolated network “slices” with functionally different protocol
stacks [1], [2]. In such a network, the processing workload on a
router changes dynamically as slices are added or removed, or
as the amount of traffic within a slice changes. These dynamics
require that the software on a network processor adapt at
runtime without the involvement of a software developer.
Thus, it is necessary to develop network processing systems
where these dynamics can be handled by the system. The
performance demands of packet processing do not allow the
use of a completely general operating system. An operating
system would use a considerable fraction of the network
processor’s resources. Instead, we focus on a solution where
resource management is built into the network processor
hardware and in turn allows a much simplified programming
process.
The simplified network processor that we present in this
paper attempts to hide the complexity of resource management
in the network processor hardware. The software developer
merely programs the functionality of packet processing. This
approach contrasts other network processors, where func-
tionality and resource management are tightly coupled (e.g.,
programmers need to explicitly choose allocation of data in
SRAM or DRAM). By separating functionality from resource
management, the system can more readily adapt to runtime
conditions that could not have been predicted by the software
developer.
We have described the general architecture of a simplified
network processor in [3], [4], and we review the main aspects
of the design in Section III. In this paper, we present a
prototype implementation of our simplified network processor.
Specifically, the contributions of our paper are:

• FPGA-based prototype implementation of a simplified net-
work processor: To demonstrate the feasibility of our
simplified network processor design, we present a 4-
core prototype implementation based on the NetFPGA
system [5]. This implementation shows that the proposed
architecture can be realized with a moderate amount of
resources.
• Functional operation with header-processing and payload-
processing applications: We present the results of two
applications operating on the prototype system to demon-
strate that the simplified network processor operates cor-
rectly and is able to process packet headers and payloads.
We also illustrate the simplicity of software development
for the system.
• Performance results to demonstrate scalability: We
present results on system throughput to show how well
the system performs and how workload can be distributed
over all processor cores. We also present results that
indicate system scalability to larger numbers of cores and
higher clock rates in ASIC-based implementations.
Overall, the results presented in this paper demonstrate
that the simplified network processor architecture is feasible,
efficient, and easier to program than conventional network
processors. We believe that these results present an important
step towards developing an efficient, easy-to-use infrastructure
for packet processing in today’s networks and the future
Internet.
The remainder of this paper is organized as follows. Sec-
tion II discusses related work. The overall system design of
the simplified network processor is introduced in Section III.
Specific details on the prototype implementation are presented
in Section IV. Results from the prototype implementation
and its performance are presented in Section V. Section VI
summarizes and concludes this paper.
II. RELATED WORK
Programmability in the data path of routers has been in-
troduced as software extensions to workstation-based routers
(e.g., Click modular router [6], dynamically extensible router
[7]) as well as multi-core embedded network processors (e.g.,
Intel IXP platform [8], Cisco QuantumFlow processor [9],
EZchip NP-3 [10], and AMCC nP series [11]). Programmabil-
ity in the data path can be used to implement additional packet
processing functions beyond simple IPv4-forwarding [12] or
in-network data path service for next-generation networks
[13], [14].
Software development environments for data path program-
ming support general-purpose programmability [15], provide
a modular structure (e.g., NP-Click [16], router plugins [17]),
or implement abstraction layers to hide underlying hardware
details [18]. While the management of packet I/O is largely
simplified by software abstractions or libraries, control of
packet transmission is still tied to packet processing in these
approaches. Programming environments for network proces-
sors require software support for program partitioning (e.g.,
how to distribute workload over multiple processor cores) and
resource management (e.g., how to allocate program state to
different memories).
Fig. 1. Resource management in simplified network processor design.
In prior work, we have shown that it is possible to perform
automated program partitioning of workloads [19] as well
as to dynamically manage resources on multi-core network
processors [20], [21]. Based on these results, we have proposed
the basic architecture of the simplified network processor in
[3], [4]. While we envision a specific implementation based
on our prior work on workload partitioning and resource man-
agement, it is possible to use different runtime management
approaches (e.g., [22], [23]).
III. SYSTEM ARCHITECTURE
We provide a brief overview of the system architecture
of the simplified network processor before discussing details
on the prototype implementation in Section IV. In principle,
the network processor consists of a grid of packet processors
that are locally connected to each other. A control system
determines how packets are moved between processors and
what processing is performed.
A. Resource Management
The system architecture of the simplified network processor
is based on the idea of removing explicit resource man-
agement from the software development process and instead
implementing this feature in the network processor hardware.
Figure 1 shows this difference. The idea of not having the
programmer handle resource management is not new; it has
been deployed very successfully in practically every operating
system. However, network processors (and many other em-
bedded systems) do not use operating systems for reasons
of performance. Network processors may use an embedded
operating system on their control processor, but the processors in
the data path, where performance really matters, are programmed
directly.
Since network processors are used for a very specific task
(i.e., processing packets), it is possible to provide resource
management operations as part of the network processor
system. Using a combination of special-purpose hardware
resources and software on the control processor, it is possible
to perform the following actions:

Fig. 2. Processing context in simplified network processor design.
• Move packets between processor cores,
• Switch processing contexts between different applications,
• Allocate processing resources to applications based on
processing requirements, and
• Allocate memories to data structures based on access
patterns.
We have discussed algorithms for the latter two functions
in prior work (see [20] for dynamic mapping of tasks to
processing resources and [21] for dynamic mapping of data
structures to memories). These algorithms are implemented
on the control processor of the system and thus are not
immediately related to the network processor hardware. In this
paper, we focus on the first two issues since they are an integral
part of the hardware design.
B. Software Development
Using hardware support for moving packets between proces-
sor cores and switching processing contexts enables us to sig-
nificantly simplify the software development process. Since the
software developer does not need to explicitly manage packets
or processing context, a much simpler processing environment
can be presented. Through careful memory management, pro-
cessing instructions, data structures, and the current packet can
all be mapped to (virtually) fixed memory locations. Thus,
the program can simply access them through static references.
Figure 2 illustrates this simplified environment.
To make this approach practical, the network processor
hardware needs to handle the necessary context switching and
packet movement operations.
C. Packet Processor
The packet processor unit in the simplified network pro-
cessor design is shown in Figure 3. The address shifters are
used to map the most significant bits of memory addresses to
the processing context that is currently in use by the processor.
Similarly, access to the packet is mapped to one out of several packets
in the packet buffer. For more details on the operation of the
address shifter, see [3].
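
As a minimal software model of this remapping, consider the C sketch below; the bit positions and region size are illustrative assumptions rather than the exact layout of the prototype hardware (see [3]).

#include <stdint.h>

/* Illustrative layout: the upper address bits select the active
 * processing context, the lower bits are the offset within that
 * context's region. The field widths are assumptions of this sketch. */
#define OFFSET_BITS 12u
#define OFFSET_MASK ((1u << OFFSET_BITS) - 1u)

/* Model of the address shifter: the context selected by the resource
 * manager replaces the upper address bits, so application code can
 * always use the same (virtually fixed) addresses. */
static uint32_t shift_address(uint32_t program_addr, uint32_t active_context)
{
    uint32_t offset = program_addr & OFFSET_MASK;
    return (active_context << OFFSET_BITS) | offset;
}
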
A critical aspect of the system is determining which processing
context memory accesses should be mapped to at any time. This step is
handled by the resource management system on the control
processor. Each packet is classified when entering the network
processor to determine what processing steps are required
(e.g., depending on the virtual slice to which a packet is sent).
The packet then carries control information that determines
its path and the processing steps that need to be performed
on different processors. Using this control information, each
packet processor can determine what context needs to be
mapped using the address shifters. Thus, it can be ensured
that the correct processing steps are performed on the packet.
IV. PROTOTYPE IMPLEMENTATION
The high-level architecture of our four-core prototype sys-
tem, which we have implemented on a NetFPGA [5], is
shown in Figure 4. Packets enter through the I/O interface,
get classified into flows, and are then distributed into the grid of
packet processing units (PPUs). Each processing unit has a
set of packet processing applications preloaded (as determined
by the runtime system), and is able to select the requested
application based on control information determined during
packet classification. After the processing steps have been
completed, the packet is sent through the output arbiter to
the outgoing interface. The processor core is a 32-bit Plasma
processor [24], which uses the MIPS instruction set and
operates at 62.5 MHz.
One of the key design aspects of this system is that
packet processors only use local memory. Avoiding the use
of a global, shared memory interface helps in preserving
the scalability of the design. As we show in the results in
Section V, the prototype implementation can scale to a 7×7
grid configuration with a linear increase in chip resources.
A. Packet Processing Unit
The setup of the packet buffer system is also illustrated
in Figure 3. The packet buffers are used to store packets
that are received from neighboring processor units (or from
the flow classification unit). The processor can switch its
local context to the packet that is being processed. Completed
packets are stored until they can be passed to the neighboring
processor units (or to the output arbiter). Bypass buffers are
used for packets that do not need processing on the local
processor, but need to be passed to a neighbor. By using

Fig. 3. Packet processing unit with support for context mapping and packet handling.
Fig. 4. Network processing platform design.
separate buffers, blocking due to processor overload can be
avoided. Our prototype uses four packet buffers for processing
and two packet buffers for bypass. Larger numbers of buffers
would help in reducing potential packet drops due to blocking,
but limited on-chip memory resources impede larger buffer
designs.
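
A rough C model of the buffer organization in a single PPU is sketched below; the buffer counts match the prototype, while the structure fields and the buffer size are illustrative assumptions.

#include <stdint.h>

#define NUM_PROC_BUFFERS   4    /* packets queued for local processing */
#define NUM_BYPASS_BUFFERS 2    /* packets that only transit this PPU  */
#define MAX_PKT_BYTES      2048 /* illustrative buffer size            */

struct packet_buffer {
    uint8_t  data[MAX_PKT_BYTES];
    uint16_t length;
    uint16_t tag;               /* control information from classification */
    uint8_t  valid;
};

/* Per-PPU buffer state: keeping processing and bypass buffers separate
 * prevents an overloaded processor from blocking packets that merely
 * pass through it on the way to a neighbor. */
struct ppu_buffers {
    struct packet_buffer processing[NUM_PROC_BUFFERS];
    struct packet_buffer bypass[NUM_BYPASS_BUFFERS];
};
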
B. Flow Routing Mechanism
The flow classification unit of the system determines the
transmission path of a packet through the system as well as
the set of applications that are executed along the way. Each
packet is augmented by control information that contains two
pieces of information for each processor that is traversed in
the grid:
Fig. 5. Example of flow-based packet routing/processing and control
information used by the system.
• Service tag: The service tag contains an indicator of whether an
application is to be executed on the packet on the current
processor. If processing is required, the additional bits
in the service tag determine which application is used.
In the current prototype, we support processing of one
application per processor, but this design can be extended
to support multiple applications per packet per processor.
• Routing information: The routing information indicates
to which neighbor the packet should be passed after
processing is completed.
Figure 5 shows an example of this control information for
a 2×2 processor example with two active flows. The flows
are routed as shown in the figure and processing occurs when
the flow encounters an application illustrated by a circle. The
tags that are kept in the tag table and that are added to the

packet are shown at the bottom of the figure. There are three
triples of bit sequences. Each triple is used by one of the
processors that are traversed. Note that the number of valid
triples may change with different routes. Also, the triples
are processed from right to left. Within a triple, the first bit
indicates if an application is to operate on the packet. If so,
the second bit sequence indicates the application identifier.
The last bit sequence indicates the routing according to the
directions shown in the lower right of the figure.
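
The following C helper illustrates how a PPU could decode one such triple; the field widths (one processing bit, a 2-bit application identifier, and a 2-bit routing direction, as in the Figure 5 example) are assumptions of this sketch.

#include <stdint.h>

struct service_tag {
    uint8_t process; /* 1 if an application runs on this PPU          */
    uint8_t app_id;  /* which preloaded application to select         */
    uint8_t route;   /* neighbor to forward to after (any) processing */
};

/* Decode one 5-bit triple of the packet's control information.
 * Assumed bit layout for this sketch (matching the 2x2 example):
 *   [4]   processing flag
 *   [3:2] application identifier
 *   [1:0] routing direction
 */
static struct service_tag decode_triple(uint8_t bits)
{
    struct service_tag t;
    t.process = (bits >> 4) & 0x1;
    t.app_id  = (bits >> 2) & 0x3;
    t.route   =  bits       & 0x3;
    return t;
}
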
To set up (or change) the route of a flow or its processing
steps, the runtime system of the network processor simply
rewrites the control information in the tag table. This approach
allows for very easy control of the system without the need
to communicate with individual packet processing units.
Identification of flows is achieved through lookup operations
on a flow table stored in the classification unit. Thus, by altering
entries of the flow table, a flow is able to access any service
inside the processing grid. In addition, the bypass path of each
PPU is isolated from the processing path to avoid blocking of
bypass packet transmission. Thus, the flow routing mechanism
allows for significant flexibility in the utilization of the pro-
cessing grid. For example, all PPUs can be chained together to
form a pipeline, or they can be logically parallelized (i.e., each
flow can only be served by exactly one PPU). More details
about application mapping on PPUs and the flow routing
algorithm can be found in [25].
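
In software terms, classification amounts to a flow-table lookup that attaches the flow's current tag-table entry to the packet, and the runtime system reconfigures the grid simply by rewriting that entry. The sketch below illustrates this; the key fields, table sizes, and linear search are assumptions of the sketch, not the hardware implementation.

#include <stdint.h>

#define FLOW_TABLE_SIZE 256          /* illustrative table size */

struct flow_key {                    /* fields assumed for this sketch */
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  proto;
};

struct flow_entry {
    struct flow_key key;
    uint16_t tag;                    /* index into the tag table */
    uint8_t  valid;
};

static struct flow_entry flow_table[FLOW_TABLE_SIZE];
static uint32_t tag_table[FLOW_TABLE_SIZE];   /* per-flow control info */

/* The runtime system changes a flow's route and processing steps by
 * rewriting its tag table entry; no PPU needs to be contacted. */
static void set_flow_control_info(uint16_t tag, uint32_t control_info)
{
    tag_table[tag] = control_info;
}

static int key_equal(const struct flow_key *a, const struct flow_key *b)
{
    return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
           a->src_port == b->src_port && a->dst_port == b->dst_port &&
           a->proto == b->proto;
}

/* Classification: find the flow entry and return the control
 * information attached to the packet before it enters the grid. */
static uint32_t classify(const struct flow_key *k)
{
    for (int i = 0; i < FLOW_TABLE_SIZE; i++)
        if (flow_table[i].valid && key_equal(&flow_table[i].key, k))
            return tag_table[flow_table[i].tag];
    return 0;   /* default: forward without processing */
}
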
C. Simplified Programming Abstraction
As discussed in [3], [4], one of the goals of our design
is to simplify code development for the network service
processing platform. To achieve the desired simplicity, the
packet processor is able to directly access on-chip memories,
in which instructions (program code for multiple services),
data and packets have been stored. As shown in Figure 2,
the packet processor has an interface for reading program
instructions and data memory and an interface for access
to packet memory. In the instruction memory, the code for
running a particular service is placed at a fixed, well-known
offset. In the data memory we have placed the stack and global
pointers at well-known offsets as well. With this design, packet
processing and code development for packet processing are
simplified. Packet data can be accessed by referencing the data
memory at the (fixed) packet offset. Moreover, the program
code is placed in a fixed location in the instruction memory
and thus can be accessed easily by the processor.
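
As a concrete illustration, this fixed layout could be captured in a small header such as the one below; the base addresses are hypothetical placeholders (only the packet region is suggested by the IP_TTL constant in Figure 6), not the prototype's actual memory map.

/* Hypothetical memory map for the simplified programming model. All
 * addresses are placeholders for this sketch; the address shifters map
 * them to whichever physical context is currently active. */
#define INSTR_BASE 0x00000000u  /* code of the selected service          */
#define DATA_BASE  0x08000000u  /* stack and globals at known offsets    */
#define PKT_BASE   0x10000000u  /* current packet (cf. IP_TTL in Fig. 6) */

/* Example: the TTL constant of Figure 6 corresponds to a fixed byte
 * offset within the packet region. */
#define IP_TTL_OFFSET 0x1E
#define IP_TTL_ADDR   (PKT_BASE + IP_TTL_OFFSET)   /* = 0x1000001E */
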
An example of a piece of C code that accesses packet
memory is shown in Figure 6. The code reads the time-to-
live (TTL) field in the IP header and decrements it. Since the
context is automatically mapped, the IP header can simply be
accessed by a static reference. The hardware of the system
ensures that this memory access is mapped to the correct
physical address in the packet buffer that is currently in
use. Similarly, data memory (and instruction memory) can be
accessed. For example, to count the number of packets handled
by an application, a simple counter can be declared:
static int packet_count;
This counter can be incremented once per packet:
#define IP_TTL 0x1000001E

#define pkt_get8(addr, data) \
    data = *((volatile unsigned char *) (addr))
#define pkt_put8(addr, data) \
    *((volatile unsigned char *) (addr)) = (data)

typedef unsigned char _u8;

_u8 ip_ttl;
pkt_get8(IP_TTL, ip_ttl);
if (ip_ttl != 0) {
    ip_ttl--;                  /* decrement TTL */
    pkt_put8(IP_TTL, ip_ttl);
} else {
    /* ... handle TTL expiration ... */
}
Fig. 6. Simple C program for accessing and decrementing the time-to-live
field in the IP header.
packet_count++;
The automated context handling ensures that the memory state
is maintained for the application across packets, and thus
correct counting is possible.
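
Putting these pieces together, a complete per-packet handler in this programming model is plain C with static state. The sketch below assumes the macros and the _u8 type from Figure 6 and a hypothetical entry-point convention (the function name and invocation model are not specified by the prototype).

/* Uses IP_TTL, pkt_get8, pkt_put8, and _u8 as defined in Figure 6. */

static int packet_count;             /* persists across packets */

void handle_packet(void)             /* hypothetical per-packet entry point */
{
    _u8 ip_ttl;

    packet_count++;                  /* per-application statistics */

    pkt_get8(IP_TTL, ip_ttl);
    if (ip_ttl != 0) {
        ip_ttl--;                    /* decrement TTL */
        pkt_put8(IP_TTL, ip_ttl);
    } else {
        /* ... handle TTL expiration ... */
    }
}
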
To program other network processors, a programmer has
to specify the exact memory offset and memory bank (e.g.,
SRAM vs. DRAM) each and every time a data structure is
accessed. Compared to this complex method of referencing
memory, our programming model is considerably easier.
For our prototype implementation, we have implemented
two specific applications:
• IP forwarding: This application implements IP forward-
ing [26] using a simple destination IP lookup algorithm.
• IPsec encryption: This application implements the cryp-
tographic processing to encrypt IP headers and payload
for VPN transmission [27].
These two applications represent two extremes in the spec-
trum of processing complexity. IP forwarding implements the
minimum amount of processing that is necessary to forward
a packet. IPsec is extremely processing-intensive since each
byte of the packet has to be processed and since cryptographic
processing is very compute-intensive.
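
As an illustration of the header-processing end of this spectrum, a destination-based lookup could be written against the same packet-access macros; the route table below and the linear first-match search are assumptions of this sketch, not the lookup algorithm of [26] (the destination address would be read from packet memory, e.g., with a 32-bit variant of pkt_get8).

#include <stdint.h>

struct route_entry {
    uint32_t prefix;
    uint32_t mask;
    uint8_t  out_port;
};

/* Tiny example routing table; a real table would be larger and use a
 * faster lookup structure (e.g., a trie) instead of this linear scan. */
static const struct route_entry routes[] = {
    { 0x0A000000u, 0xFF000000u, 1 },   /* 10.0.0.0/8      -> port 1 */
    { 0xC0A80000u, 0xFFFF0000u, 2 },   /* 192.168.0.0/16  -> port 2 */
};

static uint8_t lookup_next_hop(uint32_t dst_ip)
{
    for (unsigned i = 0; i < sizeof(routes) / sizeof(routes[0]); i++)
        if ((dst_ip & routes[i].mask) == routes[i].prefix)
            return routes[i].out_port;
    return 0;   /* default output port */
}
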
V. EVALUATION
In this section, we discuss performance results obtained
from our prototype system. These results focus on functional-
ity, throughput performance, and scalability.
A. Experimental Setup and Correctness
Using three of the Ethernet ports on the NetFPGA system,
we connect the network processor to three workstation com-
puters for traffic generation and trace collection. The routing
and processing steps for flows on the network processor are set
up statically for each experiment. The IP forwarding and IPsec
applications are instantiated as necessary on the processing
units.
The first important result is that the system operates cor-
rectly. Using network monitoring on the workstation comput-
ers, we can verify that IP forwarding is implemented correctly
