scispace - formally typeset
Search or ask a question
Author

Sarang Dharmapurikar

Other affiliations: University of Washington
Bio: Sarang Dharmapurikar is an academic researcher from Washington University in St. Louis. The author has contributed to research in topics: Bloom filter & Network packet. The author has an hindex of 19, co-authored 25 publications receiving 3807 citations. Previous affiliations of Sarang Dharmapurikar include University of Washington.

Papers
More filters
Journal ArticleDOI
TL;DR: This work describes a hardware-based technique using Bloom filters, which can detect strings in streaming data without degrading network throughput and queries a database of strings to check for the membership of a particular string.
Abstract: There is a class of packet processing applications that inspect packets deeper than the protocol headers to analyze content. For instance, network security applications must drop packets containing certain malicious Internet worms or computer viruses carried in a packet payload. Content forwarding applications look at the hypertext transport protocol headers and distribute the requests among the servers for load balancing. Packet inspection applications, when deployed at router ports, must operate at wire speeds. With networking speeds doubling every year, it is becoming increasingly difficult for software-based packet monitors to keep up with the line rates. We describe a hardware-based technique using Bloom filters, which can detect strings in streaming data without degrading network throughput. A Bloom filter is a data structure that stores a set of signatures compactly by computing multiple hash functions on each member of the set. This technique queries a database of strings to check for the membership of a particular string. The answer to this query can be false positive but never a false negative. An important property of this data structure is that the computation time involved in performing the query is independent of the number of strings in the database provided the memory used by the data structure scales linearly with the number of strings stored in it. Furthermore, the amount of storage required by the Bloom filter for each string is independent of its length.

707 citations

Journal ArticleDOI
11 Aug 2006
TL;DR: This paper introduces a new representation for regular expressions, called the Delayed Input DFA (D2FA), which substantially reduces space equirements as compared to a DFA, and describes an efficient architecture that can perform deep packet inspection at multi-gigabit rates.
Abstract: There is a growing demand for network devices capable of examining the content of data packets in order to improve network security and provide application-specific services. Most high performance systems that perform deep packet inspection implement simple string matching algorithms to match packets against a large, but finite set of strings. owever, there is growing interest in the use of regular expression-based pattern matching, since regular expressions offer superior expressive power and flexibility. Deterministic finite automata (DFA) representations are typically used to implement regular expressions. However, DFA representations of regular expression sets arising in network applications require large amounts of memory, limiting their practical application.In this paper, we introduce a new representation for regular expressions, called the Delayed Input DFA (D2FA), which substantially reduces space equirements as compared to a DFA. A D2FA is constructed by transforming a DFA via incrementally replacing several transitions of the automaton with a single default transition. Our approach dramatically reduces the number of distinct transitions between states. For a collection of regular expressions drawn from current commercial and academic systems, a D2FA representation reduces transitions by more than 95%. Given the substantially reduced space equirements, we describe an efficient architecture that can perform deep packet inspection at multi-gigabit rates. Our architecture uses multiple on-chip memories in such a way that each remains uniformly occupied and accessed over a short duration, thus effectively distributing the load and enabling high throughput. Our architecture can provide ostffective packet content scanning at OC-192 rates with memory requirements that are consistent with current ASIC technology.

553 citations

Proceedings ArticleDOI
22 Aug 2005
TL;DR: This work presents a novel hash table data structure and lookup algorithm which improves the performance over a naive hash table by reducing the number of memory accesses needed for the most time-consuming lookups, which allows designers to achieve higher lookup performance for a given memory bandwidth.
Abstract: Hash tables are fundamental components of several network processing algorithms and applications, including route lookup, packet classification, per-flow state management and network monitoring. These applications, which typically occur in the data-path of high-speed routers, must process and forward packets with little or no buffer, making it important to maintain wire-speed throughout. A poorly designed hash table can critically affect the worst-case throughput of an application, since the number of memory accesses required for each lookup can vary. Hence, high throughput applications require hash tables with more predictable worst-case lookup performance. While published papers often assume that hash table lookups take constant time, there is significant variation in the number of items that must be accessed in a typical hash table search, leading to search times that vary by a factor of four or more.We present a novel hash table data structure and lookup algorithm which improves the performance over a naive hash table by reducing the number of memory accesses needed for the most time-consuming lookups. This allows designers to achieve higher lookup performance for a given memory bandwidth, without requiring large amounts of buffering in front of the lookup engine. Our algorithm extends the multiple-hashing Bloom Filter data structure to support exact matches and exploits recent advances in embedded memory technology. Through a combination of analysis and simulations we show that our algorithm is significantly faster than a naive hash table using the same amount of memory, hence it can support better throughput for router applications that use hash tables.

410 citations

Proceedings ArticleDOI
25 Aug 2003
TL;DR: This work introduces the first algorithm that is aware of to employ Bloom filters for longest prefix matching (LPM), and shows that use of this algorithm for Internet Protocol (IP) routing lookups results in a search engine providing better performance and scalability than TCAM-based approaches.
Abstract: We introduce the first algorithm that we are aware of to employ Bloom filters for Longest Prefix Matching (LPM). The algorithm performs parallel queries on Bloom filters, an efficient data structure for membership queries, in order to determine address prefix membership in sets of prefixes sorted by prefix length. We show that use of this algorithm for Internet Protocol (IP) routing lookups results in a search engine providing better performance and scalability than TCAM-based approaches. The key feature of our technique is that the performance, as determined by the number of dependent memory accesses per lookup, can be held constant for longer address lengths or additional unique address prefix lengths in the forwarding table given that memory resources scale linearly with the number of prefixes in the forwarding table.Our approach is equally attractive for Internet Protocol Version 6 (IPv6) which uses 128-bit destination addresses, four times longer than IPv4. We present a basic version of our approach along with optimizations leveraging previous advances in LPM algorithms. We also report results of performance simulations of our system using snapshots of IPv4 BGP tables and extend the results to IPv6. Using less than 2Mb of embedded RAM and a commodity SRAM device, our technique achieves average performance of one hash probe per lookup and a worst case of two hash probes and one array access per lookup.

377 citations

Journal ArticleDOI
TL;DR: This work introduces the first algorithm that is aware of to employ Bloom filters for longest prefix matching (LPM), and shows that use of this algorithm for Internet Protocol (IP) routing lookups results in a search engine providing better performance and scalability than TCAM-based approaches.
Abstract: We introduce the first algorithm that we are aware of to employ Bloom filters for longest prefix matching (LPM). The algorithm performs parallel queries on Bloom filters, an efficient data structure for membership queries, in order to determine address prefix membership in sets of prefixes sorted by prefix length. We show that use of this algorithm for Internet Protocol (IP) routing lookups results in a search engine providing better performance and scalability than TCAM-based approaches. The key feature of our technique is that the performance, as determined by the number of dependent memory accesses per lookup, can be held constant for longer address lengths or additional unique address prefix lengths in the forwarding table given that memory resources scale linearly with the number of prefixes in the forwarding table. Our approach is equally attractive for Internet Protocol Version 6 (IPv6) which uses 128-bit destination addresses, four times longer than IPv4. We present a basic version of our approach along with optimizations leveraging previous advances in LPM algorithms. We also report results of performance simulations of our system using snapshots of IPv4 BGP tables and extend the results to IPv6. Using less than 2 Mb of embedded RAM and a commodity SRAM device, our technique achieves average performance of one hash probe per lookup and a worst case of two hash probes and one array access per lookup.

290 citations


Cited by
More filters
Patent
29 Oct 2007
TL;DR: In this article, a flow processing facility for inspecting payloads of network traffic packets detects security threats and intrusions across accessible layers of the IP-stack by applying content matching and behavioral anomaly detection techniques based on regular expression matching and self-organizing maps.
Abstract: A flow processing facility, which uses a set of artificial neurons for pattern recognition, such as a self-organizing map, in order to provide security and protection to a computer or computer system supports unified threat management based at least in part on patterns relevant to a variety of types of threats that relate to computer systems, including computer networks. Flow processing for switching, security, and other network applications, including a facility that processes a data flow to address patterns relevant to a variety of conditions are directed at internal network security, virtualization, and web connection security. A flow processing facility for inspecting payloads of network traffic packets detects security threats and intrusions across accessible layers of the IP-stack by applying content matching and behavioral anomaly detection techniques based on regular expression matching and self-organizing maps. Exposing threats and intrusions within packet payload at or near real-time rates enhances network security from both external and internal sources while ensuring security policy is rigorously applied to data and system resources. Intrusion Detection and Protection (IDP) is provided by a flow processing facility that processes a data flow to address patterns relevant to a variety of types of network and data integrity threats that relate to computer systems, including computer networks.

1,428 citations

01 Jan 2010
TL;DR: A global center for commercial innovation, PARC, a Xerox company, works closely with enterprises, entrepreneurs, government program partners and other clients to discover, develop, and deliver new business opportunities.
Abstract: A global center for commercial innovation, PARC, a Xerox company, works closely with enterprises, entrepreneurs, government program partners and other clients to discover, develop, and deliver new business opportunities. PARC was incorporated in 2002 as a wholly owned subsidiary of Xerox Corporation (NYSE: XRX).

1,072 citations

Journal ArticleDOI
TL;DR: This paper presents a comprehensive discussion on state-of-the-art big data technologies based on batch and stream data processing based on structuralism and functionalism paradigms and strengths and weaknesses of these technologies are analyzed.

964 citations

Proceedings ArticleDOI
17 May 2004
TL;DR: A lattice-based static analysis algorithm derived from type systems and typestate is created, and its soundness is addressed, thus securing Web applications in the absence of user intervention and reducing potential runtime overhead by 98.4%.
Abstract: Security remains a major roadblock to universal acceptance of the Web for many kinds of transactions, especially since the recent sharp increase in remotely exploitable vulnerabilities have been attributed to Web application bugs. Many verification tools are discovering previously unknown vulnerabilities in legacy C programs, raising hopes that the same success can be achieved with Web applications. In this paper, we describe a sound and holistic approach to ensuring Web application security. Viewing Web application vulnerabilities as a secure information flow problem, we created a lattice-based static analysis algorithm derived from type systems and typestate, and addressed its soundness. During the analysis, sections of code considered vulnerable are instrumented with runtime guards, thus securing Web applications in the absence of user intervention. With sufficient annotations, runtime overhead can be reduced to zero. We also created a tool named.WebSSARI (Web application Security by Static Analysis and Runtime Inspection) to test our algorithm, and used it to verify 230 open-source Web application projects on SourceForge.net, which were selected to represent projects of different maturity, popularity, and scale. 69 contained vulnerabilities. After notifying the developers, 38 acknowledged our findings and stated their plans to provide patches. Our statistics also show that static analysis reduced potential runtime overhead by 98.4%.

655 citations

Proceedings ArticleDOI
02 Dec 2014
TL;DR: Cuckoo filters support adding and removing items dynamically while achieving even higher performance than Bloom filters, and have lower space overhead than space-optimized Bloom filters.
Abstract: In many networking systems, Bloom filters are used for high-speed set membership tests. They permit a small fraction of false positive answers with very good space efficiency. However, they do not permit deletion of items from the set, and previous attempts to extend "standard" Bloom filters to support deletion all degrade either space or performance. We propose a new data structure called the cuckoo filter that can replace Bloom filters for approximate set membership tests. Cuckoo filters support adding and removing items dynamically while achieving even higher performance than Bloom filters. For applications that store many items and target moderately low false positive rates, cuckoo filters have lower space overhead than space-optimized Bloom filters. Our experimental results also show that cuckoo filters outperform previous data structures that extend Bloom filters to support deletions substantially in both time and space.

593 citations