IBM PowerNP network processor: Hardware, software, and applications
Introduction
- The convergence of telecommunications and computer networking into next-generation networks poses challenging demands for high performance and flexibility.
- In addition, more sophisticated end user services lead to further demands on edge devices, calling for high flexibility to support evolving high-level services as well as performance to deal with associated high packet rates.
- Traditional hardware design, in which ASICs perform the bulk of the processing load, is not suited to the complex operations required or to the new and evolving protocols that must be processed.
- Not only is the instruction set customized for packet processing and forwarding; the entire design of the network processor, including execution environment, memory, hardware accelerators, and bus architecture, is optimized for high-performance packet handling.
System architecture
- From a system architecture viewpoint, network processors can be divided into two general models: the run-to-completion model and the pipeline model.
- In this paper, the abbreviated term PowerNP designates the IBM PowerNP NP4GS3, which is a high-end member of the IBM network processor family.
- The model is based on the symmetric multiprocessor (SMP) architecture, in which multiple CPUs share the same memory [5].
- Even when processing is identical for every packet, the code path must be partitioned according to the number of pipeline stages required.
- It provides an interface to multiple large data memories for buffering data traffic as it flows through the network processor.
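The contrast between the two models can be sketched in toy form; everything below is illustrative Python, not PowerNP picocode, and all names are invented for the example:

```python
# Toy contrast between the two programming models (illustrative only).

def parse(pkt):       # stage 1: parse headers
    pkt["parsed"] = True
    return pkt

def lookup(pkt):      # stage 2: route table lookup
    pkt["next_hop"] = "portA"
    return pkt

def forward(pkt):     # stage 3: forward the packet
    pkt["done"] = True
    return pkt

def run_to_completion(pkt):
    # One thread runs the entire code path for a packet; adding a step
    # requires no repartitioning of the code.
    return forward(lookup(parse(pkt)))

PIPELINE = [parse, lookup, forward]   # stage count fixed by the hardware

def pipelined(pkt):
    # The same code path cut into fixed stages, one per pipeline processor;
    # a code change may force the cuts to be redone.
    for stage in PIPELINE:
        pkt = stage(pkt)
    return pkt
```

Both functions produce the same result for a packet; the difference is that the pipelined version must keep each stage within its processor's budget, which is the partitioning burden discussed above.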
Functional blocks
- Figure 4 shows the main functional blocks that make up the PowerNP architecture.
- In the following sections the authors discuss each functional block within the PowerNP.
Physical MAC multiplexer
- The physical MAC multiplexer (PMM) moves data between physical layer devices and the PowerNP.
- The PMM interfaces with the external ports of the network processor in the ingress PMM and egress PMM directions.
- When a DMU is configured for Ethernet, it can support either one port of 1 Gigabit Ethernet or ten ports of Fast Ethernet (10/100 Mb/s).
- To provide an OC-48 clear channel (OC-48c) link, DMU A is configured to attach to a 32-bit framer and the other three DMUs are disabled, providing only interface pins for the data path.
Switch interface
- The switch interface (SWI) supports two high-speed data-aligned synchronous link (DASL) interfaces, labeled A and B, supporting standalone operation (wrap), dual-mode operation (two PowerNPs interconnected), or connection to an external switch fabric.
- The DASL links A and B can be used in parallel, with one acting as the primary switch interface and the other as an alternate switch interface for increased system availability.
- The ingress SWI side sends data to the switch fabric, and the egress SWI side receives data from the switch fabric.
- The egress SDM is the logical interface between the switch fabric cell data flow and the packet data flow of the egress EDS, also designated as the egress DF.
- There is also an “internal wrap” link which enables traffic generated by the ingress side of the PowerNP to move to the egress side without going out of the chip.
Data flow and traffic management
- The ingress DF interfaces with the ingress PMM, the EPC, and the SWI.
- After it selects a packet, the ingress DF passes the packet to the ingress SWI.
- When the ingress DS is sufficiently congested, the flow-control actions discard packets.
- The egress DF enqueues the packet to the EPC for processing.
- The scheduler manages bandwidth on a per-packet basis by determining the bandwidth required by a packet (i.e., the number of bytes to be transmitted) and comparing this against the bandwidth permitted by the configuration of the packet flow queue.
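The per-packet bandwidth decision described above can be sketched as a simple credit scheme; the class and field names here are illustrative, not PowerNP structures:

```python
# Hedged sketch of a per-flow bandwidth check: a packet may be sent only
# when the flow's configured rate has accumulated enough credit to cover
# the packet's length in bytes.

class FlowQueue:
    def __init__(self, rate_bytes_per_tick):
        self.rate = rate_bytes_per_tick   # bandwidth permitted by configuration
        self.credits = 0                  # accumulated transmission allowance

    def tick(self):
        # Each scheduler tick grants the flow its configured allowance.
        self.credits += self.rate

    def try_send(self, packet_len):
        # Bandwidth required by a packet is the number of bytes to be
        # transmitted; compare it against what the configuration permits.
        if packet_len <= self.credits:
            self.credits -= packet_len
            return True
        return False
```

A flow configured for 100 bytes per tick can send one 64-byte packet per tick but must defer a second one until more credit accrues.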
Embedded processor complex
- The embedded processor complex (EPC) performs all processing functions for the PowerNP.
- Within the EPC, eight dyadic protocol processor units containing processors, coprocessors, and hardware accelerators support functions such as packet parsing and classification, high-speed pattern search, and internal chip management.
- The GPH-Resp thread processes responses from the embedded PowerPC.
- The TSE coprocessor provides hardware search operations for full match (FM) trees, longest prefix match (LPM) trees, and software-managed trees (SMTs).
- Each thread has read and write access to the ingress and egress DS through a DS coprocessor.
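Of the three search types the TSE supports, longest prefix match is the most familiar; a dict-based toy version (not the actual tree hardware, and with an invented route table) looks like this:

```python
# Hedged sketch of a longest-prefix-match lookup in the spirit of the
# TSE's LPM trees: among all prefixes containing the address, the most
# specific one wins.

import ipaddress

ROUTES = {
    "10.0.0.0/8":  "core",
    "10.1.0.0/16": "edge",
    "10.1.2.0/24": "access",
}

def lpm(dst):
    best_len, best_hop = -1, None
    addr = ipaddress.ip_address(dst)
    for prefix, next_hop in ROUTES.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and net.prefixlen > best_len:
            best_len, best_hop = net.prefixlen, next_hop
    return best_hop
```

The hardware trees give the same answer without the linear scan, which is what makes the lookup feasible at line speed.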
Ingress side
- The ingress PMM receives a packet from an external physical layer device and forwards it to the ingress DF; the ingress DF enqueues the packet to the EPC.
- The code examines the information from the HC and may examine the data further; it assembles search keys and launches the TSE.
- Packet data moves into the memory buffer of the DS coprocessor.
- Forwarding and packet alteration information is identified by the results of the search.
- With the help of the ingress DF, the ingress switch data mover (I-SDM) segments the packets from the switch interface queues into 64-byte cells and inserts cell header and packet header bytes as they are transmitted to the SWI.
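The segmentation step can be sketched as follows; the 4-byte header layout is invented for illustration, while the 64-byte cell size comes from the text above:

```python
# Hedged sketch of segmenting a packet into fixed 64-byte cells with a
# small per-cell header, as the I-SDM does on the way to the SWI.

CELL_SIZE = 64
HDR_SIZE = 4                       # assumed per-cell header size
PAYLOAD = CELL_SIZE - HDR_SIZE     # bytes of packet data per cell

def segment(packet: bytes):
    cells = []
    n_cells = (len(packet) + PAYLOAD - 1) // PAYLOAD
    for i in range(n_cells):
        chunk = packet[i * PAYLOAD:(i + 1) * PAYLOAD]
        header = bytes([i, n_cells, 0, 0])   # toy header: index, count, pad
        # Last cell is padded out to the fixed cell size.
        cells.append((header + chunk).ljust(CELL_SIZE, b"\x00"))
    return cells
```

A 100-byte packet therefore becomes two 64-byte cells, the second partially padded.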
Egress side
- The egress SWI receives a packet from a switch fabric, from another PowerNP processor, or from the ingress SWI of the device.
- The code examines the information from the HC and may examine the data further; it assembles search keys and launches the TSE.
- In flexible packet alteration, the code allocates additional buffers, and the DS coprocessor places data in these buffers.
- The enqueue coprocessor develops the necessary information to enqueue the packet to the egress DF and provides it to the CU, which guarantees the packet order as the data moves from the 32 threads of the DPPUs to the egress DF queues.
- The egress DF selects packets for transmission from the target port queue and moves their data to the egress PMM.
System software architecture
- The PowerNP system software architecture is defined around the concept of partitioning control and data planes, as shown in Figure 6.
- This is consistent with industry and standards directions, for example, the Network Processing Forum (NPF) and the ForCES working group of the Internet Engineering Task Force (IETF).
- The non-performance-critical functions of the control plane run on a GPP, while the performance-critical data plane functions run on the PowerNP processing elements (i.e., CLPs).
- There is no need to use the instruction memory on the network processor for IP options processing, given that options are associated with only a small percentage of IP packets.
- The software architecture and programming model describes the data plane functions and APIs, the control plane functions and APIs, and the communication model between these components.
Data plane
- The data plane is structured as two major components: ● System library.
- These functions provide a hardware abstraction layer that can be used either from the control plane, using a message-passing interface, or from the data plane software, using API calls.
- These components, plus the overall software design, help a programmer develop PowerNP networking applications quickly.
- Figure: PowerNP software architecture, showing the management application and middleware over the NPAS core, application library, and system library, alongside the PowerNP software development toolkit.
- The MBC model provides scalability in configuration, supporting multiple network processors and control processors, and achieves a scalable system architecture.
Control plane
- The control plane networking applications interface with the data plane functions using the network processor application services (NPAS) layer that exposes two types of APIs: ● Protocol services API (such as IPv4 and MPLS).
- This set of services handles hardware-independent protocol objects.
- The control plane software provides a way to manage the PowerNP, and it also provides a set of APIs which can be used to support generic control plane protocol stacks.
- The look and feel of the APIs is consistent inside the NPAS.
- The NPAS is designed to run on any operating system and to connect to the network processor via any transport mechanism, so it can easily be ported and used with any application.
Software development toolkit
- The PowerNP software development toolkit provides a set of tightly integrated development tools which address each phase of the software development process.
- Through the use of Tcl/Tk scripts, developers can interact with the simulation model and perform a wide range of functions in support of packet traffic generation and analysis.
- Figure 8 shows the PowerNP software development toolkit: NPAsm, NPSim, NPScope, NPTest, and NPProfile work against the chip-level simulation model, while a RISCWatch probe and the RISCWatch application connect to the embedded PowerPC 405 on the PowerNP via the JTAG interface and Ethernet.
- NPProfile analyzes simulation event information contained in a message log file produced by NPScope to accumulate relevant data regarding the performance of picocode execution.
Networking applications support
- The power and flexibility of the PowerNP is useful in supporting a wide range of existing and emerging applications.
- A number of networking applications have been implemented on the PowerNP chip, and the following sections discuss two of them.
Small group multicast
- Small group multicast (SGM) [9] is a new approach to IP multicast that makes multicast practical for applications such as IP telephony, videoconferencing, and multimedia “e-meetings.”
- Like today’s multicast schemes, SGM sends at most one copy of any given packet across any network link, thus minimizing the use of network bandwidth.
- A router performs a route table lookup to determine the “next hop” for each of the destinations.
- Programmability that allows the above functions to be combined in nontraditional ways.
- A preliminary version of SGM has been implemented on the PowerNP chip.
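The core SGM forwarding idea, a route lookup per listed destination followed by one copy per distinct next hop, can be sketched like this (the route table and host names are invented):

```python
# Hedged sketch of SGM forwarding: the packet carries an explicit list of
# destinations; the router groups them by next hop and sends at most one
# copy per outgoing link.

from collections import defaultdict

ROUTE = {"h1": "if0", "h2": "if0", "h3": "if1"}   # toy unicast route table

def sgm_forward(destinations):
    by_hop = defaultdict(list)
    for dst in destinations:
        by_hop[ROUTE[dst]].append(dst)    # route table lookup per destination
    # One copy per outgoing link, each carrying only the destinations
    # reachable through that link.
    return dict(by_hop)
```

This is what keeps bandwidth use minimal: a link never carries more than one copy of a given packet, regardless of how many group members lie beyond it.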
GPRS tunneling protocol
- General packet radio service (GPRS), a set of protocols for converging mobile data with IP packet data, presents new challenges to equipment manufacturers of GPRS support nodes (GSNs) as the bandwidth available to mobile terminals increases significantly with advances in wireless technology.
- The authors consider the support of GTP as a typical networking application that requires a high memory/bandwidth product and deeper packet processing than the common packet forwarding of an IP router.
- Traffic counters associated with the context are incremented to account for the data transmission.
- The decapsulation process requires the retrieval of the GTP context from the IP address of the inner IP header.
- Packet reordering based on the GTP sequence number requires the temporary storage of misordered packets in a per-context associated reordering queue.
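Per-context reordering on GTP sequence numbers can be sketched as follows; the class is a toy model, not the PowerNP implementation:

```python
# Hedged sketch of GTP packet reordering: out-of-order packets wait in a
# per-context reordering queue until the expected sequence number arrives,
# then everything contiguous is released in order.

class GtpContext:
    def __init__(self):
        self.expected = 0       # next GTP sequence number to deliver
        self.pending = {}       # seq -> packet: the reordering queue

    def receive(self, seq, pkt):
        self.pending[seq] = pkt
        delivered = []
        # Drain the queue as long as the expected packet is present.
        while self.expected in self.pending:
            delivered.append(self.pending.pop(self.expected))
            self.expected += 1
        return delivered
```

A packet arriving with sequence number 1 before 0 is held; when 0 arrives, both are delivered in order. In the actual design, the semaphore coprocessor serializes these queue operations across threads.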
Performance
- The PowerNP picoprocessors provide 2128 MIPS of aggregate processing capability.
- Similarly, some cycles are not usable for instruction execution (i.e., both threads waiting for search result).
- To quantify expected performance for specific applications, associated code paths are profiled in terms of memory accesses, coprocessor use, program flow (i.e., branch instructions), and overlap of coprocessor operations with instructions or other coprocessor operations.
- The following code paths from the PowerNP software package achieve OC-48 line speed at minimum packet size: ● Border gateway protocol (BGP) layer-3 routing.
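As a back-of-the-envelope check using figures from this summary (2128 aggregate MIPS, and roughly 6.1 million minimum-size packets per second at OC-48), the average per-packet instruction budget a code path must fit within can be estimated as:

```python
# Rough headroom estimate: aggregate instruction rate divided by packet
# arrival rate gives the average instruction budget per packet.

def instructions_per_packet(aggregate_mips, mpackets_per_sec):
    return aggregate_mips / mpackets_per_sec

budget = instructions_per_packet(2128, 6.1)   # roughly 350 instructions/packet
```

This is only an average ceiling; the profiling described above (memory accesses, coprocessor overlap, branches) determines whether a real code path fits within it.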
Summary
- As the demand on network edge equipment increases to provide more services on behalf of the core and the end user, the role of flexible and programmable network processors becomes more critical.
- The authors have discussed the challenges and demands posed by next-generation networks and have described how network processors can address these issues by performing highly sophisticated packet processing at line speed.
- Its hardware and software design characteristics make it an ideal component for a wide range of networking applications.
- Its run-to-completion model supports a simple programming model and a scalable system architecture, which provide abundant functionality and headroom at line speed.
- Because of the availability of associated advanced development and simulation tools, combined with extensive reference implementations, rapid prototyping and development of new high-performance applications are significantly easier than with either GPPs or ASICs.
Acknowledgments
- The authors gratefully acknowledge the significant contributions of a large number of their colleagues at IBM who, through years of research, design, and development, have helped to create and document the PowerNP network processor.
- The processor would not have been possible without the appreciable efforts and contributions by those colleagues.
- *Trademark or registered trademark of International Business Machines Corporation.
Frequently Asked Questions
Q2. What is the importance of a unified packet-based network?
Scalability for traffic engineering, quality of service (QoS), and the integration of wireless networks in a unified packet-based next-generation network requires traffic differentiation and aggregation.
Q3. How many cycles per second can the egress DS run?
To sustain media speed with 48-byte packets, 6.1 million packets per second, the egress DS must run with a 10-clock cycle data store access window.
Q4. What are the main components of the PowerNP?
The PowerNP has the following main components: embedded processor complex (EPC), data flow (DF), scheduler, MACs, and coprocessors.
Q5. What is the function of the control store arbiter?
The control store arbiter (CSA) controls access to the control store (CS), which allocates memory bandwidth among the threads of all DPPUs.
Q6. What is the way to ensure consistency in the assignment and verification of GTP sequence numbers?
Consistency in the assignment and verification of GTP sequence numbers, and in operations on the reordering queues, is ensured by using the semaphore coprocessor.
Q7. How many threads can be executed simultaneously?
Although there are 32 independent threads, each CLP can execute the instructions of only one of its threads at a time, so at any instant up to 16 threads are executing simultaneously.
Q8. What is the purpose of the DS interface and arbiters?
The ingress and egress DS interface and arbiters are for controlling accesses to the DS, since only one thread at a time can access either DS.
Q9. What is the way to configure a DMU for Gigabit Ethernet?
To support 1 Gigabit Ethernet, a DMU can be configured as either a gigabit media-independent interface (GMII) or a ten-bit interface (TBI).
Q10. Why is it easier to develop new high-performance applications?
Because of the availability of associated advanced development and simulation tools, combined with extensive reference implementations, rapid prototyping and development of new high-performance applications are significantly easier than with either GPPs or ASICs.
Q11. What is the software architecture and programming model?
The software architecture and programming model describes the data plane functions and APIs, the control plane functions and APIs, and the communication model between these components.
Q12. What are the types of threads supported in the EPC?
Five types of threads are supported, including the general data handler (GDH): seven DPPUs contain GDH threads, for a total of 28 GDH threads.
Q13. What is the abbreviation of the term PowerNP?
In this paper the abbreviated term PowerNP is used to designate the IBM PowerNP NP4GS3, which is a high-end member of the IBM network processor family.
Q14. What is the assembler for creating images designed to execute on the PowerNP?
NPAsm generates files used to execute picocode on the chip-level simulation model or the PowerNP, as well as files that picocode programmers can use for debugging.
Q15. What is the main structure that manages the CS?
The lookup definition table (LuDefTable), an internal memory structure that contains 128 entries to define 128 trees, is the main structure that manages the CS.
Q16. What is the significant challenge of a pipelined programming model?
Perhaps a more significant challenge of a pipelined programming model is in dealing with changes, since a relatively minor code change may require a programmer to start from scratch with code partitioning.