Front end device for content networking
10 Mar 2008, pp. 1456-1461
TL;DR: This paper proposes an architecture for a device that will utilize hardware-level string matching to distribute incoming requests for a server farm, implemented in VHDL, synthesized, and laid out on an Altera FPGA.
Abstract: The bandwidth and speed of network connections are continually increasing. The speed increase in network technology is set to soon outpace the speed increase in CMOS technology. This asymmetrical growth is beginning to cause software applications that once worked with then-current levels of network traffic to flounder under the new high data rates. Processes that were once executed in software now have to be executed partially, if not wholly, in hardware. One such application that could benefit from hardware implementation is high layer routing. By allowing a network device to peer into higher layers of the OSI model, the device can scan for viruses, provide higher quality-of-service (QoS), and efficiently route packets. This paper proposes an architecture for a device that will utilize hardware-level string matching to distribute incoming requests for a server farm. The proposed architecture is implemented in VHDL, synthesized, and laid out on an Altera FPGA.
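As a rough software model of the idea in this abstract, the sketch below picks a server pool by looking for substrings in an incoming request line. It is only an illustration: the route table, the pool names, and the `route` function are hypothetical and do not come from the paper, which implements the matching in hardware (VHDL on an FPGA).

```python
# Hypothetical route table: (substring to look for, server pool name).
ROUTES = [
    ("/images/", "image-servers"),
    ("/cgi-bin/", "app-servers"),
]


def route(request_line, routes=ROUTES, default="general-servers"):
    """Return the pool for the first route whose substring occurs in the request."""
    for needle, pool in routes:
        if needle in request_line:
            return pool
    return default


print(route("GET /images/logo.png HTTP/1.1"))
print(route("GET /index.html HTTP/1.1"))
```

In hardware, the comparisons for all route strings could run in parallel on each incoming byte, whereas this loop checks them one after another.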
Citations
TL;DR: This special purpose processor is a parallel and pipelined architecture which can deal with the regular expression semantics and can achieve 200-400 times speedup over traditional CPU implementations and up to 7.9Gbps in processing throughput.
Abstract: The expressive power of regular expressions has often been adopted in network intrusion detection systems, virus scanners, and spam filtering applications. However, in CPU-based systems, pattern matching is one of the most computation-intensive parts. In this paper, we present the design, implementation and evaluation of a regular expression string matching processing unit (SMPU). This special purpose processor is a parallel and pipelined architecture which can deal with the regular expression semantics. Two hardware stacks are implemented in SMPU to support fast branches when a non-match occurs. Our implementation processes four characters per clock cycle (maximum performance of state of the art solutions) and occupies only O(n) memory (where n is the length of the regular expression). Based on synthesizing the Verilog description and analyzing area/time constraints, SMPU can achieve 200-400 times speedup over traditional CPU implementations and up to 7.9Gbps in processing throughput. Moreover, it greatly outperforms its counterparts as the complexity of regular expressions increases.
3 citations
Journal Article
TL;DR: An extensible firewall has been implemented that performs packet filtering, content scanning, and per-flow queuing of Internet packets at Gigabit/second rates, using layered protocol wrappers to parse the content of Internet data.
Abstract: An extensible firewall has been implemented that performs packet filtering, content scanning, and per-flow queuing of Internet packets at Gigabit/second rates. The firewall uses layered protocol wrappers to parse the content of Internet data. Packet payloads are scanned for keywords using parallel regular expression matching circuits. Packet headers are compared to rules specified in Ternary Content Addressable Memories (TCAMs). Per-flow queuing is performed to mitigate the effect of Denial of Service attacks. All packet processing operations were implemented with reconfigurable hardware and fit within a single Xilinx Virtex XCV2000E Field Programmable Gate Array (FPGA). The single-chip firewall has been used to filter Internet SPAM and to guard against several types of network intrusion. Additional features were implemented in extensible hardware modules deployed using run-time reconfiguration.
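The header comparison against TCAM rules mentioned in this abstract can be pictured with a small software model of ternary matching. This is a sketch under assumptions: the rule list, the action names, and the helper functions are hypothetical, and the key is limited to a 32-bit IPv4 address. A real TCAM compares all entries in parallel and a priority encoder selects the first hit, whereas the loop below walks the rules in order.

```python
import ipaddress


def ip(addr):
    """Convert a dotted-quad IPv4 address to a 32-bit integer."""
    return int(ipaddress.IPv4Address(addr))


def prefix_mask(bits):
    """Mask with the top `bits` bits set; cleared bits act as 'don't care'."""
    return (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF


# Hypothetical rules in priority order: (value, mask, action).
RULES = [
    (ip("10.1.2.3"), prefix_mask(32), "drop"),
    (ip("10.0.0.0"), prefix_mask(8), "queue"),
]


def tcam_lookup(key, rules=RULES, default="forward"):
    """Return the action of the first rule that matches key on its masked bits."""
    for value, mask, action in rules:
        if (key & mask) == (value & mask):
            return action
    return default


print(tcam_lookup(ip("10.1.2.3")), tcam_lookup(ip("10.9.9.9")))
```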
3 citations
01 Jan 2014
TL;DR: This paper implements a search process that performs compressed pattern matching in binary Huffman-encoded texts, applying a brute-force search algorithm and evaluating the pattern matching processes in terms of clock cycles.
Abstract: High speed and always-on network access is becoming commonplace around the world, creating a demand for increased network security. Network Intrusion Detection Systems (NIDS) attempt to detect and prevent attacks from the network using pattern-matching rules. Data compression methods are used to reduce the data storage requirement. Searching for a compressed pattern in compressed text reduces the internal storage requirement and computation resources. In this paper we implement a search process that performs compressed pattern matching in binary Huffman-encoded texts. A brute-force search algorithm is applied in two variants: comparing a single bit per clock cycle and comparing an encoded character per clock cycle. The pattern matching processes are evaluated in terms of clock cycles.
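The bit-per-comparison variant described in this abstract can be approximated in software as follows. This is a minimal sketch, not the authors' design: the code table `EXAMPLE_CODE` is a hypothetical prefix code chosen for illustration, and each inner-loop iteration stands in for one single-bit comparison. A bit-level hit need not fall on a codeword boundary, so a fuller design would likely add an alignment check, which is omitted here.

```python
# Hypothetical prefix (Huffman-style) code table for a three-symbol alphabet.
EXAMPLE_CODE = {"a": "0", "b": "10", "c": "11"}


def encode(text, code=EXAMPLE_CODE):
    """Concatenate the codewords of each character into one bit string."""
    return "".join(code[ch] for ch in text)


def brute_force_bits(text_bits, pattern_bits):
    """Return every bit offset where pattern_bits occurs in text_bits."""
    hits = []
    for start in range(len(text_bits) - len(pattern_bits) + 1):
        k = 0
        # One bit is compared per iteration, stopping at the first mismatch.
        while k < len(pattern_bits) and text_bits[start + k] == pattern_bits[k]:
            k += 1
        if k == len(pattern_bits):
            hits.append(start)
    return hits


print(brute_force_bits(encode("abcab"), encode("ab")))
```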
References
TL;DR: A simple, efficient algorithm to locate all occurrences of any of a finite number of keywords in a string of text that has been used to improve the speed of a library bibliographic search program by a factor of 5 to 10.
Abstract: This paper describes a simple, efficient algorithm to locate all occurrences of any of a finite number of keywords in a string of text. The algorithm consists of constructing a finite state pattern matching machine from the keywords and then using the pattern matching machine to process the text string in a single pass. Construction of the pattern matching machine takes time proportional to the sum of the lengths of the keywords. The number of state transitions made by the pattern matching machine in processing the text string is independent of the number of keywords. The algorithm has been used to improve the speed of a library bibliographic search program by a factor of 5 to 10.
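The two phases named in this abstract (build a finite state pattern matching machine from the keywords, then process the text in a single pass) can be sketched as below. This is a compact illustration rather than the paper's own pseudocode: the function names and the table layout (`goto`, `fail`, `out`) are choices made for this example.

```python
from collections import deque


def build_machine(keywords):
    """Build goto, failure, and output tables for a set of keywords."""
    goto = [{}]      # goto[state][char] -> next state
    fail = [0]       # failure link of each state
    out = [set()]    # keywords recognized on entering each state
    for word in keywords:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(word)
    # Breadth-first pass: failure links of shallower states are known first.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] |= out[fail[nxt]]
    return goto, fail, out


def search(text, machine):
    """Return sorted (start_index, keyword) pairs for every occurrence in text."""
    goto, fail, out = machine
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in out[state]:
            hits.append((i - len(word) + 1, word))
    return sorted(hits)


print(search("ushers", build_machine(["he", "she", "his", "hers"])))
```

The text is scanned once from left to right; failure links let the machine fall back to the longest usable suffix instead of re-reading earlier characters.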
3,270 citations
TL;DR: The algorithm has the unusual property that, in most cases, not all of the first i characters of the searched string are inspected.
Abstract: An algorithm is presented that searches for the location, “i,” of the first occurrence of a character string, “pat,” in another string, “string.” During the search operation, the characters of pat are matched starting with the last character of pat. The information gained by starting the match at the end of the pattern often allows the algorithm to proceed in large jumps through the text being searched. Thus the algorithm has the unusual property that, in most cases, not all of the first i characters of string are inspected. The number of characters actually inspected (on the average) decreases as a function of the length of pat. For a random English pattern of length 5, the algorithm will typically inspect i/4 characters of string before finding a match at i. Furthermore, the algorithm has been implemented so that (on the average) fewer than i + patlen machine instructions are executed. These conclusions are supported with empirical evidence and a theoretical analysis of the average behavior of the algorithm. The worst case behavior of the algorithm is linear in i + patlen, assuming the availability of array space for tables linear in patlen plus the size of the alphabet.
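The right-to-left matching and the large jumps described in this abstract can be illustrated with the Horspool simplification, which keeps only a single shift table keyed on the text character under the pattern's last position. Note that this is a swapped-in, simpler variant and not the full algorithm of the paper, which combines two shift heuristics. The function name is hypothetical, and the returned index is 0-based.

```python
def horspool_search(pat, string):
    """Return the 0-based index of the first occurrence of pat in string, or -1."""
    m, n = len(pat), len(string)
    if m == 0:
        return 0
    # For each character except the last one in pat, the distance from its
    # rightmost occurrence to the end of the pattern.
    shift = {ch: m - 1 - k for k, ch in enumerate(pat[:-1])}
    pos = 0
    while pos <= n - m:
        j = m - 1
        # Compare from the last character of pat backwards.
        while j >= 0 and string[pos + j] == pat[j]:
            j -= 1
        if j < 0:
            return pos
        # Characters absent from the table allow a jump of the full pattern length.
        pos += shift.get(string[pos + m - 1], m)
    return -1


print(horspool_search("needle", "haystack with a needle inside"))
```

When the text character aligned with the end of the pattern does not occur in the pattern at all, the window advances by the whole pattern length, which is why many characters of the text are never inspected.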
2,542 citations
01 Oct 1998
TL;DR: A simple, practical strategy for locality-aware request distribution (LARD), in which the front-end distributes incoming requests in a manner that achieves high locality in the back-ends' main memory caches as well as load balancing.
Abstract: We consider cluster-based network servers in which a front-end directs incoming requests to one of a number of back-ends. Specifically, we consider content-based request distribution: the front-end uses the content requested, in addition to information about the load on the back-end nodes, to choose which back-end will handle this request. Content-based request distribution can improve locality in the back-ends' main memory caches, increase secondary storage scalability by partitioning the server's database, and provide the ability to employ back-end nodes that are specialized for certain types of requests. As a specific policy for content-based request distribution, we introduce a simple, practical strategy for locality-aware request distribution (LARD). With LARD, the front-end distributes incoming requests in a manner that achieves high locality in the back-ends' main memory caches as well as load balancing. Locality is increased by dynamically subdividing the server's working set over the back-ends. Trace-based simulation results and measurements on a prototype implementation demonstrate substantial performance improvements over state-of-the-art approaches that use only load information to distribute requests. On workloads with working sets that do not fit in a single server node's main memory cache, the achieved throughput exceeds that of the state-of-the-art approach by a factor of two to four. With content-based distribution, incoming requests must be handed off to a back-end in a manner transparent to the client, after the front-end has inspected the content of the request. To this end, we introduce an efficient TCP handoff protocol that can hand off an established TCP connection in a client-transparent manner.
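A toy model of a front-end that combines requested content with back-end load, in the spirit of this abstract, might look like the sketch below. It is an assumption-laden illustration and not the paper's exact policy: the class name, the thresholds `t_low` and `t_high`, their default values, and the rule for when to reassign a target are all hypothetical choices for this example.

```python
class LocalityAwareFrontEnd:
    """Toy dispatcher: keep each target on one back-end unless it is overloaded."""

    def __init__(self, num_backends, t_low=2, t_high=4):
        self.load = [0] * num_backends  # active requests per back-end
        self.assigned = {}              # requested target -> back-end index
        self.t_low = t_low              # hypothetical "lightly loaded" threshold
        self.t_high = t_high            # hypothetical "overloaded" threshold

    def _least_loaded(self):
        return min(range(len(self.load)), key=self.load.__getitem__)

    def dispatch(self, target):
        """Choose a back-end for target and count the request against it."""
        node = self.assigned.get(target)
        if node is None:
            # First request for this target: start it on the least loaded node.
            node = self._least_loaded()
        else:
            imbalance = (self.load[node] > self.t_high
                         and min(self.load) < self.t_low)
            if imbalance or self.load[node] >= 2 * self.t_high:
                # Give up locality for this target to restore load balance.
                node = self._least_loaded()
        self.assigned[target] = node
        self.load[node] += 1
        return node

    def complete(self, node):
        """Record that a request on the given back-end has finished."""
        self.load[node] -= 1


front_end = LocalityAwareFrontEnd(num_backends=2)
print([front_end.dispatch(t) for t in ["/a", "/b", "/a", "/a"]])
```

Repeated requests for the same target go to the same back-end, which is what would keep that target in one node's main memory cache; the load check only overrides this when the imbalance becomes large.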
643 citations
22 Sep 2002
TL;DR: A module generator is developed that extracts strings from the Snort NIDS rule-set, generates a regular expression that matches all extracted strings, synthesizes an FPGA-based string matching circuit, and generates an EDIF netlist that can be processed by Xilinx software to create an FPGA bitstream.
Abstract: String matching is used by Network Intrusion Detection Systems (NIDS) to inspect incoming packet payloads for hostile data. String-matching speed is often the main factor limiting NIDS performance. String-matching performance can be dramatically improved by using Field-Programmable Gate Arrays (FPGAs); accordingly, a "regular-expression to FPGA circuit" module generator has been developed. The module generator extracts strings from the Snort NIDS rule-set, generates a regular expression that matches all extracted strings, synthesizes an FPGA-based string matching circuit, and generates an EDIF netlist that can be processed by Xilinx software to create an FPGA bitstream. The feasibility of this approach is demonstrated by comparing the performance of the FPGA-based string matcher against the software-based GNU regex program. The FPGA-based string matcher exceeds the performance of the software-based system by 600x for large patterns.
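The step of generating one regular expression that matches all extracted strings can be mimicked in software with Python's `re` module. This sketch covers only that step and makes several assumptions: the sample strings are invented stand-ins rather than real Snort rule contents, the function name is hypothetical, and nothing here relates to circuit synthesis or EDIF output.

```python
import re


def build_union_regex(strings):
    """Compile one regex that matches any of the given literal strings."""
    # Longer strings first, so a longer literal wins over a shorter prefix of it.
    ordered = sorted(set(strings), key=len, reverse=True)
    # re.escape keeps characters such as '+' or '.' from acting as operators.
    return re.compile("|".join(re.escape(s) for s in ordered))


# Hypothetical payload strings used only for illustration.
matcher = build_union_regex(["cmd.exe", "/etc/passwd", "a+b"])
print(matcher.search("GET /scripts/cmd.exe").group())
```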
380 citations
"Front end device for content networ..." refers methods in this paper
...Because of the large number of threats faced by networks and computers and increasing line speeds, software based NIDS have become unusable [3, 4, 14]....
20 Apr 2004
TL;DR: The efficiency of the technique enables a current-generation FPGA device to support pattern-matching at network rates from 1 Gbps to 100 Gbps and beyond and offers flexible trade-offs between character capacity, throughput, and data bus width and rate.
Abstract: In this paper, we present a scalable FPGA design methodology for searching network packet payloads for a large number of patterns, including complex regular expressions. The efficiency of the technique enables a current-generation FPGA device to support pattern-matching at network rates from 1 Gbps to 100 Gbps and beyond. It offers flexible trade-offs between character capacity, throughput, and data bus width and rate. This allows the approach to be used in a wide range of devices from low-end home network appliances to high-end backbone routers. Suitable network applications for the FPGA pattern-matcher include firewalls, network intrusion detection, email virus scanning, and junk-email identification. In this work, we use a standard set of patterns from an intrusion detection system to demonstrate the performance and scalability of our design with a real-world application.
347 citations