Proceedings ArticleDOI

Packet Classification with Limited Memory Resources

01 Aug 2017, pp. 179-183
TL;DR: A new hardware architecture for packet classification is designed that balances processing speed against hardware resources and can scale the processing rate to wire-speed throughput on a 100 Gbps line at the cost of additional memory resources.
Abstract: Network security and monitoring devices use packet classification to match packet header fields against a set of rules. Many hardware architectures have been designed to accelerate packet classification and achieve wire-speed throughput for 100 Gbps networks. These architectures are designed for high throughput even for the shortest packets. However, FPGA SoCs and Intel Xeon with FPGA platforms have limited resources for multiple accelerators, so it is usually necessary to balance the available resources against the level of acceleration. Therefore, we have designed a new hardware architecture for packet classification that can balance processing speed against hardware resources. To achieve 10 Gbps average throughput, the architecture needs only 20 BlockRAMs for 5500 rules. Moreover, the architecture can scale the processing speed to wire-speed throughput on a 100 Gbps line at the cost of additional memory resources.
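A back-of-the-envelope sketch of the speed/memory trade-off described in the abstract. The clock frequency, per-engine BlockRAM count, memory accesses per packet, and minimum frame size below are illustrative assumptions, not figures from the paper: replicating the rule memory lets more lookups finish per clock cycle, so worst-case throughput grows roughly linearly with the memory spent.

```python
# Back-of-the-envelope model of the speed/memory trade-off described in
# the abstract. All numbers are illustrative assumptions, not figures
# measured in the paper.

CLOCK_HZ = 200e6            # assumed FPGA clock frequency
BRAMS_PER_ENGINE = 20       # memory footprint of one classification engine
LOOKUPS_PER_PACKET = 12     # assumed sequential memory accesses per packet
MIN_FRAME_BITS = (64 + 20) * 8  # shortest Ethernet frame incl. overhead

def throughput_gbps(engines: int) -> float:
    """Worst-case throughput when `engines` replicated copies of the rule
    memory each serve one memory access per clock cycle."""
    packets_per_s = engines * CLOCK_HZ / LOOKUPS_PER_PACKET
    return packets_per_s * MIN_FRAME_BITS / 1e9

def brams_needed(engines: int) -> int:
    """Memory cost grows linearly with the number of replicated engines."""
    return engines * BRAMS_PER_ENGINE

if __name__ == "__main__":
    for engines in (1, 2, 4, 9):
        print(f"{engines:2d} engine(s): {throughput_gbps(engines):6.1f} Gbps, "
              f"{brams_needed(engines)} BlockRAMs")
```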
Citations
Journal ArticleDOI
TL;DR: A unique parallel hardware architecture for hash-based exact match classification of multiple packets in each clock cycle that reduces memory replication requirements and maintains a high throughput of multiple packets matched per clock cycle even without fully replicated memory resources in the matching tables.

8 citations

Proceedings ArticleDOI
27 Oct 2020
TL;DR: This paper proposes a flow table representation designed for easy translation into NPU search trees, evaluates it on patterns of flow tables used by the RUNOS OpenFlow controller, and shows that this approach effectively reduces program size.
Abstract: The paper considers an OpenFlow 1.3 switch based on a programmable network processing unit (NPU). An OpenFlow switch performs flow entry lookup in a flow table using the values of packet header fields to determine the actions to apply to an incoming packet (classification). In the considered NPU, the lookup operation is based on search trees implemented in the NPU assembly language. However, these trees cannot be used directly for OpenFlow classification because of a limitation on the width of compared operands. In this paper, we propose a flow table representation designed for easy translation into NPU search trees. Another goal of our research is to create a compact program that fits in the NPU memory. A further NPU limitation requires a program update after each modification of the flow table data. Consequently, the switch must maintain the currently installed flows of the flow tables to provide a fast NPU program update. We developed algorithms for incremental update of the flow table representation (flow addition and removal). To evaluate the proposed flow table translation approach, we used patterns of flow tables used by the RUNOS OpenFlow controller. A set of flow tables based on these patterns was translated into NPU assembly language using a simple algorithm (based on related work) and an improved algorithm (our proposal). The evaluation was performed on an NPU simulation model and showed that our approach effectively reduces program size.
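A hedged sketch of the operand-width issue the abstract mentions: if the NPU can only compare operands up to a fixed width, a wide header-field match must be split into a chain of narrower comparisons. The 32-bit chunk limit and the Python modelling below are assumptions for illustration, not the paper's actual encoding.

```python
# Illustrative only: split a wide exact-match key into fixed-width chunks
# so each search-tree node compares an operand the NPU can handle.
# The 32-bit chunk width is an assumed limit, not the real one.

CHUNK_BITS = 32

def split_key(value: int, width: int, chunk_bits: int = CHUNK_BITS):
    """Split a `width`-bit value into most-significant-first chunks."""
    chunks = []
    for shift in range(width - chunk_bits, -1, -chunk_bits):
        chunks.append((value >> shift) & ((1 << chunk_bits) - 1))
    return chunks

def match(packet_field: int, rule_value: int, width: int) -> bool:
    """Emulate a chain of narrow comparisons instead of one wide compare."""
    return split_key(packet_field, width) == split_key(rule_value, width)

if __name__ == "__main__":
    ipv6_dst = 0x20010DB8000000000000000000000001
    rule     = 0x20010DB8000000000000000000000001
    print(match(ipv6_dst, rule, width=128))  # True
```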

2 citations


Cites methods from "Packet Classification with Limited ..."

  • ...The papers [6], [7] investigate an approach based on the decomposition of the classification by many fields into several classifications by one field....


Proceedings ArticleDOI
01 Aug 2018
TL;DR: This paper proposes a novel parallel hardware architecture for hash-based exact match classification of multiple packets per clock cycle with reduced memory replication requirements and shows that the proposed approach can use memory very efficiently and scales exceptionally well with increased record capacities.
Abstract: Packet classification is a crucial operation for many different networking tasks ranging from switching or routing to monitoring and security devices like firewalls or IDS. Generally, accelerated architectures implementing packet classification must be used to satisfy the ever-growing demands of current high-speed networks. Furthermore, to keep up with rising network throughputs, accelerated architectures for FPGAs must be able to classify more than one packet in each clock cycle. This is mainly achieved by using multiple processing pipelines in parallel, which brings replication of FPGA logic and, more importantly, of scarce on-chip memory resources. Therefore, in this paper, we propose a novel parallel hardware architecture for hash-based exact match classification of multiple packets per clock cycle with reduced memory replication requirements. The basic idea is to leverage the fact that modern FPGAs offer hundreds of BlockRAM tiles that can be accessed (addressed) independently to maintain a high matching throughput even without a fully replicated memory architecture. Our results show that the proposed approach can use memory very efficiently and scales exceptionally well with increased record capacities. For example, the designed architecture is able to achieve a throughput of more than 2 Tbps (over 3 000 Mpps) with an effective capacity of more than 40 000 IPv4 flow records at the cost of only 366 BlockRAM tiles and around 57 000 LUTs.
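A small software model of the idea in the abstract, assuming a toy hash function and bank count (not the proposed architecture itself): records are spread over many independently addressable memory banks, so several packets can be looked up in the same clock cycle as long as their keys fall into different banks.

```python
import hashlib

NUM_BANKS = 8  # assumed; real designs use hundreds of BlockRAM tiles

def bank_of(key: bytes) -> int:
    """Pick a memory bank from a hash of the flow key (illustrative hash)."""
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % NUM_BANKS

# One small hash table per bank, standing in for one BlockRAM tile.
banks = [dict() for _ in range(NUM_BANKS)]

def insert(key: bytes, record):
    banks[bank_of(key)][key] = record

def lookup_cycle(keys):
    """Serve one 'clock cycle': each bank answers at most one lookup, so
    keys colliding on the same bank would have to stall in real hardware."""
    used = set()
    results = {}
    for key in keys:
        b = bank_of(key)
        if b in used:
            results[key] = "stall (bank conflict)"
            continue
        used.add(b)
        results[key] = banks[b].get(key, "miss")
    return results

if __name__ == "__main__":
    insert(b"10.0.0.1->10.0.0.2:80", "flow-A")
    insert(b"10.0.0.3->10.0.0.4:443", "flow-B")
    print(lookup_cycle([b"10.0.0.1->10.0.0.2:80", b"10.0.0.3->10.0.0.4:443"]))
```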

1 citation


Cites background from "Packet Classification with Limited ..."

  • ...This idea was pushed further to build scalable architecture through memory duplication in [18]....


  • ...The architecture was shown to be scalable even to higher throughputs [18], but only by using multiple copies of the memories....


Proceedings ArticleDOI
03 May 2023
TL;DR: In this paper, the authors propose optimizations to the DCFL algorithm and to the overall packet processing hardware architecture that maximize throughput and minimize resource strain, achieving up to a 76% increase in packet classification throughput.
Abstract: Packet classification is a crucial time-critical operation for many different networking tasks ranging from switching or routing to monitoring and security devices like firewalls or IDS. Accelerated architectures implementing packet classification must satisfy the ever-growing demands of current high-speed networks. However, packet classification is generally used together with other packet processing algorithms, which decreases the available hardware resources on the FPGA chip. The introduction of the P4 language requires packet classification to be even more flexible while maintaining high throughput with limited resources. Thus, we need flexible and high-performance architectures that balance processing speed and hardware resources for specific types of rules. The DCFL algorithm provides high performance and flexibility. Therefore, we propose optimizations to the DCFL algorithm and to the overall packet processing hardware architecture. The goal is to maximize throughput and minimize resource strain. The main idea of the approach is to analyze the ruleset, identify conflicting rules, and offload these rules to other hardware modules. This allows us to process packets faster, even in worst-case scenarios. Moreover, we can fit more packet processing into the FPGA and fine-tune the packet processing architecture to meet a specific network application's throughput and resource demands. With the proposed optimizations we can achieve up to a 76% increase in the throughput of packet classification, or alternatively up to a 37% decrease in the resources needed.
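A hedged sketch of the decomposition behind DCFL as summarized above: each header field is looked up independently to a set of candidate rule labels, and the final match is the intersection of those sets. The two-field rule format and the flat intersection are simplifying assumptions; real DCFL aggregates field labels through a network of pairwise combination stages.

```python
# Illustrative DCFL-style decomposition: per-field lookups produce sets of
# candidate rule IDs, which are then intersected. Rule set is a toy example.

rules = {
    1: {"src": "10.0.0.0/8", "dport": (80, 80)},
    2: {"src": "10.0.0.0/8", "dport": (0, 1023)},
    3: {"src": "0.0.0.0/0",  "dport": (443, 443)},
}

def to_int(addr: str) -> int:
    return int.from_bytes(bytes(map(int, addr.split("."))), "big")

def prefix_matches(prefix: str, addr: str) -> bool:
    net, length = prefix.split("/")
    length = int(length)
    mask = ((1 << length) - 1) << (32 - length) if length else 0
    return (to_int(net) & mask) == (to_int(addr) & mask)

def classify(src: str, dport: int):
    """Intersect the per-field candidate sets; lowest rule ID wins (assumed)."""
    src_set = {rid for rid, r in rules.items() if prefix_matches(r["src"], src)}
    port_set = {rid for rid, r in rules.items()
                if r["dport"][0] <= dport <= r["dport"][1]}
    matches = src_set & port_set
    return min(matches) if matches else None

if __name__ == "__main__":
    print(classify("10.1.2.3", 80))    # rule 1
    print(classify("192.0.2.1", 443))  # rule 3
```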
Journal ArticleDOI
TL;DR: A bit vector-based IP lookup engine is presented that implements parallel units to achieve 4.3 Billion Packets Per Second (BPPS) lookup speeds for 5 fields and consumes much less memory, facilitating multiple engines on a single chip whilst maintaining a very low overall power profile.
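A minimal sketch of the bit-vector idea behind such an engine (the field set and rule encoding are assumptions): each field lookup returns a bit vector with one bit per rule, and AND-ing the per-field vectors yields the set of matching rules; in hardware the per-field lookups and the AND proceed in parallel.

```python
# Illustrative bit-vector classification: one bit per rule, per field.
# A hardware engine evaluates the per-field lookups and the AND in parallel.

RULES = [
    {"proto": 6,  "dport": range(80, 81)},    # rule 0: TCP/80
    {"proto": 6,  "dport": range(0, 1024)},   # rule 1: TCP well-known ports
    {"proto": 17, "dport": range(53, 54)},    # rule 2: UDP/53
]

def field_vector(predicate) -> int:
    """Bit i is set if rule i satisfies the given field predicate."""
    vec = 0
    for i, rule in enumerate(RULES):
        if predicate(rule):
            vec |= 1 << i
    return vec

def classify(proto: int, dport: int):
    vec = (field_vector(lambda r: r["proto"] == proto)
           & field_vector(lambda r: dport in r["dport"]))
    if vec == 0:
        return None
    return (vec & -vec).bit_length() - 1  # lowest set bit = best-priority rule

if __name__ == "__main__":
    print(classify(6, 80))   # 0
    print(classify(17, 53))  # 2
```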
References
Proceedings ArticleDOI
30 Aug 1999
TL;DR: It is found that a simple multi-stage classification algorithm, called RFC (recursive flow classification), can classify 30 million packets per second in pipelined hardware, or one million packets per second in software.
Abstract: Routers classify packets to determine which flow they belong to, and to decide what service they should receive. Classification may, in general, be based on an arbitrary number of fields in the packet header. Performing classification quickly on an arbitrary number of fields is known to be difficult, and has poor worst-case performance. In this paper, we consider a number of classifiers taken from real networks. We find that the classifiers contain considerable structure and redundancy that can be exploited by the classification algorithm. In particular, we find that a simple multi-stage classification algorithm, called RFC (recursive flow classification), can classify 30 million packets per second in pipelined hardware, or one million packets per second in software.
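A hedged sketch of the RFC idea from the abstract: header chunks are first mapped to small equivalence-class IDs, and those IDs are then combined through further precomputed tables until a single class remains. The chunking and the table contents below are toy assumptions; real RFC preprocessing derives the tables from the rule set.

```python
# Illustrative RFC-style lookup: phase 0 maps each header chunk to an
# equivalence-class ID, a later phase combines the IDs through a
# precomputed table. Table contents here are toy values.

# Phase 0: one table per header chunk (e.g. 16-bit slices of the header).
phase0 = [
    {0x0A00: 0, 0x0A01: 0, 0xC000: 1},   # src-address high bits -> class
    {80: 0, 443: 1, 53: 2},              # destination port -> class
]

# Phase 1: combine the two phase-0 classes into the final rule/class.
phase1 = {
    (0, 0): "rule-web-internal",
    (0, 1): "rule-tls-internal",
    (1, 2): "rule-dns-external",
}

def classify(chunks):
    ids = [tbl.get(chunk, -1) for tbl, chunk in zip(phase0, chunks)]
    return phase1.get(tuple(ids), "default")

if __name__ == "__main__":
    print(classify([0x0A00, 80]))  # rule-web-internal
    print(classify([0xC000, 53]))  # rule-dns-external
```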

822 citations


Additional excerpts

  • ...[13] proposed using recursive flow classification (RFC)....


Proceedings ArticleDOI
01 Oct 1998
TL;DR: New packet classification schemes are presented that, with a worst-case and traffic-independent performance metric, can classify packets, by checking amongst a few thousand filtering rules, at rates of a million packets per second using range matches on more than 4 packet header fields.
Abstract: The ability to provide differentiated services to users with widely varying requirements is becoming increasingly important, and Internet Service Providers would like to provide these differentiated services using the same shared network infrastructure. The key mechanism, that enables differentiation in a connectionless network, is the packet classification function that parses the headers of the packets, and after determining their context, classifies them based on administrative policies or real-time reservation decisions. Packet classification, however, is a complex operation that can become the bottleneck in routers that try to support gigabit link capacities. Hence, many proposals for differentiated services only require classification at lower speed edge routers and also avoid classification based on multiple fields in the packet header even if it might be advantageous to service providers. In this paper, we present new packet classification schemes that, with a worst-case and traffic-independent performance metric, can classify packets, by checking amongst a few thousand filtering rules, at rates of a million packets per second using range matches on more than 4 packet header fields. For a special case of classification in two dimensions, we present an algorithm that can handle more than 128K rules at these speeds in a traffic independent manner. We emphasize worst-case performance over average case performance because providing differentiated services requires intelligent queueing and scheduling of packets that precludes any significant queueing before the differentiating step (i.e., before packet classification). The presented filtering or classification schemes can be used to classify packets for security policy enforcement, applying resource management decisions, flow identification for RSVP reservations, multicast look-ups, and for source-destination and policy based routing. The scalability and performance of the algorithms have been demonstrated by implementation and testing in a prototype system.

741 citations


"Packet Classification with Limited ..." refers background in this paper

  • ...[9], is a practical implementation that leverages the fact that rule updates are infrequent compared to search operations....


Proceedings ArticleDOI
01 Oct 1998
TL;DR: Two new algorithms for solving the least-cost matching filter problem at high speeds are described; the first is based on a grid-of-tries construction and works optimally for processing filters consisting of two prefix fields using linear space.
Abstract: In Layer Four switching, the route and resources allocated to a packet are determined by the destination address as well as other header fields of the packet such as source address, TCP and UDP port numbers. Layer Four switching unifies firewall processing, RSVP style resource reservation filters, QoS Routing, and normal unicast and multicast forwarding into a single framework. In this framework, the forwarding database of a router consists of a potentially large number of filters on key header fields. A given packet header can match multiple filters, so each filter is given a cost, and the packet is forwarded using the least-cost matching filter. In this paper, we describe two new algorithms for solving the least-cost matching filter problem at high speeds. Our first algorithm is based on a grid-of-tries construction and works optimally for processing filters consisting of two prefix fields (such as destination-source filters) using linear space. Our second algorithm, cross-producting, provides fast lookup times for arbitrary filters but potentially requires large storage. We describe a combination scheme that combines the advantages of both schemes. The combination scheme can be optimized to handle pure destination prefix filters in 4 memory accesses, destination-source filters in 8 memory accesses worst case, and all other filters in 11 memory accesses in the typical case.
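A minimal sketch of the cross-producting scheme mentioned above, under assumed toy prefix tables and costs: each field is resolved independently by longest-prefix match, and the pair of per-field labels indexes a precomputed cross-product table holding the least-cost matching filter.

```python
# Illustrative cross-producting: per-field longest prefix match, then a
# lookup of the (dst_class, src_class) pair in a precomputed table.
# Prefixes, labels, and costs are toy assumptions.

dst_prefixes = {"10.0.0.0/8": "D1", "0.0.0.0/0": "D0"}
src_prefixes = {"192.168.0.0/16": "S1", "0.0.0.0/0": "S0"}

# Precomputed cross-product table: best (least-cost) filter per label pair.
crossproduct = {
    ("D1", "S1"): "filter-3 (cost 1)",
    ("D1", "S0"): "filter-7 (cost 4)",
    ("D0", "S1"): "filter-9 (cost 6)",
    ("D0", "S0"): "default",
}

def to_int(addr: str) -> int:
    return int.from_bytes(bytes(map(int, addr.split("."))), "big")

def lpm(prefixes: dict, addr: str) -> str:
    """Longest prefix match over a tiny prefix table (linear scan here)."""
    best_len, best_label = -1, None
    for prefix, label in prefixes.items():
        net, length = prefix.split("/")
        length = int(length)
        mask = ((1 << length) - 1) << (32 - length)
        if (to_int(net) & mask) == (to_int(addr) & mask) and length > best_len:
            best_len, best_label = length, label
    return best_label

def classify(dst: str, src: str) -> str:
    return crossproduct[(lpm(dst_prefixes, dst), lpm(src_prefixes, src))]

if __name__ == "__main__":
    print(classify("10.1.2.3", "192.168.5.6"))  # filter-3 (cost 1)
    print(classify("8.8.8.8", "172.16.0.1"))    # default
```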

625 citations


"Packet Classification with Limited ..." refers background in this paper

  • ...Several of the possible approaches to support multiple dimensions are described in [12]....


Proceedings ArticleDOI
25 Aug 2003
TL;DR: This paper introduces a classification algorithm called HyperCuts, which can provide an order of magnitude improvement over existing classification algorithms, can be fully pipelined to provide one classification result every packet arrival time, and allows fast updates.
Abstract: This paper introduces a classification algorithm called HyperCuts. Like the previously best known algorithm, HiCuts, HyperCuts is based on a decision tree structure. Unlike HiCuts, however, in which each node in the decision tree represents a hyperplane, each node in the HyperCuts decision tree represents a k-dimensional hypercube. Using this extra degree of freedom and a new set of heuristics to find optimal hypercubes for a given amount of storage, HyperCuts can provide an order of magnitude improvement over existing classification algorithms. HyperCuts uses 2 to 10 times less memory than HiCuts optimized for memory, while the worst-case search time of HyperCuts is 50-500% better than that of HiCuts optimized for speed. Compared with another recent scheme, EGT-PC, HyperCuts uses 1.8-7 times less memory space while the worst-case search time is up to 5 times smaller. More importantly, unlike EGT-PC, HyperCuts can be fully pipelined to provide one classification result every packet arrival time, and also allows fast updates.
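A hedged sketch of the decision-tree idea behind HiCuts/HyperCuts: the rule space is cut into equal-sized regions, here along two dimensions at once in the HyperCuts style, and each leaf keeps only the rules overlapping its region. The cut counts, the 8-bit toy address space, and the rule set are illustrative assumptions; real HyperCuts chooses the number of cuts per node with heuristics.

```python
# Illustrative HyperCuts-style node: cut the 2-D (src, dst) space into a
# grid of equal cells, each cell (leaf) holding only the rules that
# overlap it. Toy 8-bit address space for readability.

FIELD_MAX = 256
CUTS = (2, 2)  # number of cuts per dimension at this node (assumed)

rules = [  # (rule_id, src_range, dst_range)
    (0, (0, 127),   (0, 255)),
    (1, (0, 255),   (128, 255)),
    (2, (200, 255), (0, 63)),
]

def build_leaves():
    """Distribute rules into the grid cells they overlap."""
    sw, dw = FIELD_MAX // CUTS[0], FIELD_MAX // CUTS[1]
    leaves = {}
    for si in range(CUTS[0]):
        for di in range(CUTS[1]):
            cell = ((si * sw, (si + 1) * sw - 1), (di * dw, (di + 1) * dw - 1))
            leaves[(si, di)] = [
                rid for rid, s, d in rules
                if s[0] <= cell[0][1] and s[1] >= cell[0][0]
                and d[0] <= cell[1][1] and d[1] >= cell[1][0]
            ]
    return leaves

def classify(src: int, dst: int):
    leaves = build_leaves()
    cell = (src // (FIELD_MAX // CUTS[0]), dst // (FIELD_MAX // CUTS[1]))
    for rid, s, d in rules:          # linear search inside the small leaf
        if rid in leaves[cell] and s[0] <= src <= s[1] and d[0] <= dst <= d[1]:
            return rid
    return None

if __name__ == "__main__":
    print(classify(100, 200))  # rule 0 (lowest matching ID)
    print(classify(220, 10))   # rule 2
```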

572 citations


"Packet Classification with Limited ..." refers background or methods in this paper

  • ...First algorithms and hardware architectures HiCuts [3] and HyperCuts [2] use the geometric representation of the classification problem, where packet header fields are dimensions and the classification is the searching in the n-dimensional space....


  • ...They have observed that HyperCuts and similar decision-tree-based algorithms do not efficiently deal with rules that have too much overlap with each other....


  • ...HiCuts [3] and HyperCuts [2] are examples of such algorithms....


  • ...A way to increase throughput of HyperCuts was introduced by Luo et al. [14]....


  • ...Kennedy et al. [15] implemented simplified version of HyperCuts algorithm with the goal of reducing power consumption and increase power efficiency....


Journal ArticleDOI
TL;DR: This work presents ClassBench, a suite of tools for benchmarking packet classification algorithms and devices and seeks to eliminate the significant access barriers to realistic test vectors for researchers and initiate a broader discussion to guide the refinement of the tools and codification of a formal benchmarking methodology.
Abstract: Packet classification is an enabling technology for next generation network services and often a performance bottleneck in high-performance routers. The performance and capacity of many classification algorithms and devices, including TCAMs, depend upon properties of filter sets and query patterns. Despite the pressing need, no standard performance evaluation tools or filter sets are publicly available. In response to this problem, we present ClassBench, a suite of tools for benchmarking packet classification algorithms and devices. ClassBench includes a filter set generator that produces synthetic filter sets that accurately model the characteristics of real filter sets. Along with varying the size of the filter sets, we provide high-level control over the composition of the filters in the resulting filter set. The tool suite also includes a trace generator that produces a sequence of packet headers to exercise packet classification algorithms with respect to a given filter set. Along with specifying the relative size of the trace, we provide a simple mechanism for controlling locality of reference. While we have already found ClassBench to be very useful in our own research, we seek to eliminate the significant access barriers to realistic test vectors for researchers and initiate a broader discussion to guide the refinement of the tools and codification of a formal benchmarking methodology. (The ClassBench tools are publicly available at the following site: http://www.arl.wustl.edu/~det3/ClassBench/.)

478 citations