Author

Sarang Dharmapurikar

Other affiliations: University of Washington

Bio: Sarang Dharmapurikar is an academic researcher from Washington University in St. Louis. The author has contributed to research on the topics of Bloom filters and network packets. The author has an h-index of 19 and has co-authored 25 publications receiving 3807 citations. Previous affiliations of Sarang Dharmapurikar include University of Washington.

Papers

Journal Article•DOI•

Deep packet inspection using parallel bloom filters

Sarang Dharmapurikar¹, Praveen Krishnamurthy¹, Todd Sproull¹, John W. Lockwood¹•Institutions (1)

Washington University in St. Louis¹

01 Jan 2004-IEEE Micro

TL;DR: This work describes a hardware-based technique using Bloom filters, which can detect strings in streaming data without degrading network throughput and queries a database of strings to check for the membership of a particular string.

Abstract: There is a class of packet processing applications that inspect packets deeper than the protocol headers to analyze content. For instance, network security applications must drop packets containing certain malicious Internet worms or computer viruses carried in a packet payload. Content forwarding applications look at the hypertext transport protocol headers and distribute the requests among the servers for load balancing. Packet inspection applications, when deployed at router ports, must operate at wire speeds. With networking speeds doubling every year, it is becoming increasingly difficult for software-based packet monitors to keep up with the line rates. We describe a hardware-based technique using Bloom filters, which can detect strings in streaming data without degrading network throughput. A Bloom filter is a data structure that stores a set of signatures compactly by computing multiple hash functions on each member of the set. This technique queries a database of strings to check for the membership of a particular string. The answer to this query can be false positive but never a false negative. An important property of this data structure is that the computation time involved in performing the query is independent of the number of strings in the database provided the memory used by the data structure scales linearly with the number of strings stored in it. Furthermore, the amount of storage required by the Bloom filter for each string is independent of its length.

707 citations
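
The membership query described in the abstract above can be illustrated with a minimal software sketch. This is not the paper's hardware design: the class name `BloomFilter`, the parameters `num_bits` and `num_hashes`, and the use of salted SHA-256 as the hash family are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: k hash functions over an m-bit array (illustrative only)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: bytes):
        # Derive k bit positions by salting SHA-256 with the hash index.
        # This is an assumption for the sketch; hardware would use cheaper hashes.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "big") + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: bytes) -> None:
        # Programming a string sets k bits, regardless of the string's length.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: bytes) -> bool:
        # All k bits set: "possibly present" (may be a false positive).
        # Any bit clear: "definitely absent" (never a false negative).
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


signatures = BloomFilter(num_bits=1024, num_hashes=3)
signatures.add(b"worm-signature")
signatures.add(b"x" * 1000)  # a long string costs the same k bits as a short one
```

A query always computes `num_hashes` positions, so its cost does not depend on how many strings are stored. A positive answer still needs an exact check against the real string set if false positives matter.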

Journal Article•DOI•

Algorithms to accelerate multiple regular expressions matching for deep packet inspection

Sailesh Kumar¹, Sarang Dharmapurikar¹, Fang Yu², Patrick Crowley¹, Jonathan S. Turner¹•Institutions (2)

University of Washington¹, University of California, Berkeley²

11 Aug 2006

TL;DR: This paper introduces a new representation for regular expressions, called the Delayed Input DFA (D2FA), which substantially reduces space requirements as compared to a DFA, and describes an efficient architecture that can perform deep packet inspection at multi-gigabit rates.

Abstract: There is a growing demand for network devices capable of examining the content of data packets in order to improve network security and provide application-specific services. Most high performance systems that perform deep packet inspection implement simple string matching algorithms to match packets against a large, but finite set of strings. However, there is growing interest in the use of regular expression-based pattern matching, since regular expressions offer superior expressive power and flexibility. Deterministic finite automata (DFA) representations are typically used to implement regular expressions. However, DFA representations of regular expression sets arising in network applications require large amounts of memory, limiting their practical application. In this paper, we introduce a new representation for regular expressions, called the Delayed Input DFA (D2FA), which substantially reduces space requirements as compared to a DFA. A D2FA is constructed by transforming a DFA via incrementally replacing several transitions of the automaton with a single default transition. Our approach dramatically reduces the number of distinct transitions between states. For a collection of regular expressions drawn from current commercial and academic systems, a D2FA representation reduces transitions by more than 95%. Given the substantially reduced space requirements, we describe an efficient architecture that can perform deep packet inspection at multi-gigabit rates. Our architecture uses multiple on-chip memories in such a way that each remains uniformly occupied and accessed over a short duration, thus effectively distributing the load and enabling high throughput. Our architecture can provide cost-effective packet content scanning at OC-192 rates with memory requirements that are consistent with current ASIC technology.

553 citations
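
The idea of replacing several transitions with a single default transition, as described in the abstract above, can be sketched on a toy automaton. The DFA below (alphabet `abcd`, accepting once the substring "ab" has been seen), the helper names `compress_state`, `step`, and `run`, and the hand-picked default transition are all hypothetical. The sketch does not attempt the paper's procedure for choosing default transitions.

```python
# Hypothetical DFA over {a, b, c, d}; state 2 is reached once "ab" has been seen.
dfa = {
    0: {"a": 1, "b": 0, "c": 0, "d": 0},
    1: {"a": 1, "b": 2, "c": 0, "d": 0},
    2: {"a": 2, "b": 2, "c": 2, "d": 2},
}
ACCEPTING = {2}


def compress_state(dfa, state, default_state):
    """Keep only the transitions of `state` that differ from those of `default_state`."""
    return {
        sym: dst
        for sym, dst in dfa[state].items()
        if dfa[default_state].get(sym) != dst
    }


# State 1 agrees with state 0 on 'a', 'c' and 'd', so it keeps only 'b'
# and gains a default transition to state 0 (choice made by hand here).
labeled = {0: dict(dfa[0]), 1: compress_state(dfa, 1, 0), 2: dict(dfa[2])}
defaults = {1: 0}


def step(state, symbol):
    # Follow default transitions without consuming input until some state
    # on the chain has a labeled transition for `symbol`.
    while symbol not in labeled[state]:
        state = defaults[state]
    return labeled[state][symbol]


def run(text):
    """Return True if the compressed automaton ends in an accepting state."""
    state = 0
    for ch in text:
        state = step(state, ch)
    return state in ACCEPTING
```

In this toy case the labeled transitions drop from 12 to 9, at the cost of one extra hop whenever state 1 has to fall back on its default. That extra hop is the "delayed input" trade-off: memory shrinks, but a symbol may need more than one state visit.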

Proceedings Article•DOI•

Fast hash table lookup using extended bloom filter: an aid to network processing

Haoyu Song¹, Sarang Dharmapurikar¹, Jonathan S. Turner¹, John W. Lockwood¹•Institutions (1)

Washington University in St. Louis¹

22 Aug 2005

TL;DR: This work presents a novel hash table data structure and lookup algorithm which improves the performance over a naive hash table by reducing the number of memory accesses needed for the most time-consuming lookups, which allows designers to achieve higher lookup performance for a given memory bandwidth.

Abstract: Hash tables are fundamental components of several network processing algorithms and applications, including route lookup, packet classification, per-flow state management and network monitoring. These applications, which typically occur in the data-path of high-speed routers, must process and forward packets with little or no buffer, making it important to maintain wire-speed throughout. A poorly designed hash table can critically affect the worst-case throughput of an application, since the number of memory accesses required for each lookup can vary. Hence, high throughput applications require hash tables with more predictable worst-case lookup performance. While published papers often assume that hash table lookups take constant time, there is significant variation in the number of items that must be accessed in a typical hash table search, leading to search times that vary by a factor of four or more. We present a novel hash table data structure and lookup algorithm which improves the performance over a naive hash table by reducing the number of memory accesses needed for the most time-consuming lookups. This allows designers to achieve higher lookup performance for a given memory bandwidth, without requiring large amounts of buffering in front of the lookup engine. Our algorithm extends the multiple-hashing Bloom Filter data structure to support exact matches and exploits recent advances in embedded memory technology. Through a combination of analysis and simulations we show that our algorithm is significantly faster than a naive hash table using the same amount of memory, hence it can support better throughput for router applications that use hash tables.

410 citations

Proceedings Article•DOI•

Longest prefix matching using bloom filters

Sarang Dharmapurikar¹, Praveen Krishnamurthy¹, David E. Taylor¹•Institutions (1)

Washington University in St. Louis¹

25 Aug 2003

TL;DR: This work introduces what the authors believe to be the first algorithm to employ Bloom filters for longest prefix matching (LPM), and shows that using this algorithm for Internet Protocol (IP) routing lookups results in a search engine providing better performance and scalability than TCAM-based approaches.

Abstract: We introduce the first algorithm that we are aware of to employ Bloom filters for Longest Prefix Matching (LPM). The algorithm performs parallel queries on Bloom filters, an efficient data structure for membership queries, in order to determine address prefix membership in sets of prefixes sorted by prefix length. We show that use of this algorithm for Internet Protocol (IP) routing lookups results in a search engine providing better performance and scalability than TCAM-based approaches. The key feature of our technique is that the performance, as determined by the number of dependent memory accesses per lookup, can be held constant for longer address lengths or additional unique address prefix lengths in the forwarding table given that memory resources scale linearly with the number of prefixes in the forwarding table. Our approach is equally attractive for Internet Protocol Version 6 (IPv6) which uses 128-bit destination addresses, four times longer than IPv4. We present a basic version of our approach along with optimizations leveraging previous advances in LPM algorithms. We also report results of performance simulations of our system using snapshots of IPv4 BGP tables and extend the results to IPv6. Using less than 2 Mb of embedded RAM and a commodity SRAM device, our technique achieves average performance of one hash probe per lookup and a worst case of two hash probes and one array access per lookup.

377 citations
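
The lookup flow described in the abstract above (one Bloom filter per prefix length, queried together, followed by hash probes from the longest candidate length down) can be sketched in software. Everything named here is an assumption for illustration: the classes `BloomFilter` and `BloomLPM`, the filter sizes, the salted SHA-256 hashing, and the Python dict standing in for the off-chip hash table. The sketch handles IPv4 only and omits the paper's optimizations.

```python
import hashlib
import ipaddress


class BloomFilter:
    """Tiny Bloom filter over integer keys (illustrative parameters)."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits, self.num_hashes, self.bits = num_bits, num_hashes, 0

    def _positions(self, key: int):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: int) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: int) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class BloomLPM:
    """Longest prefix match: Bloom filters per prefix length guard a hash table."""

    def __init__(self) -> None:
        self.filters = {}  # prefix length -> BloomFilter
        self.table = {}    # (prefix length, prefix bits) -> next hop

    def insert(self, cidr: str, next_hop: str) -> None:
        net = ipaddress.ip_network(cidr)  # IPv4 assumed throughout
        length = net.prefixlen
        key = int(net.network_address) >> (32 - length)
        self.filters.setdefault(length, BloomFilter()).add(key)
        self.table[(length, key)] = next_hop

    def lookup(self, address: str):
        addr = int(ipaddress.ip_address(address))
        # Hardware would query every filter in parallel; a loop stands in here.
        candidates = [
            length
            for length, bf in self.filters.items()
            if bf.might_contain(addr >> (32 - length))
        ]
        # Probe the hash table from the longest candidate length downwards.
        for length in sorted(candidates, reverse=True):
            hop = self.table.get((length, addr >> (32 - length)))
            if hop is not None:  # a miss here was a Bloom filter false positive
                return hop
        return None


routes = BloomLPM()
routes.insert("0.0.0.0/0", "default-gw")
routes.insert("10.0.0.0/8", "hop-A")
routes.insert("10.1.0.0/16", "hop-B")
```

Because every candidate is confirmed against the hash table, a false positive costs only an extra probe and never yields a wrong next hop.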

Journal Article•DOI•

Longest prefix matching using bloom filters

Sarang Dharmapurikar¹, Praveen Krishnamurthy¹, David E. Taylor¹•Institutions (1)

Washington University in St. Louis¹

01 Apr 2006-IEEE ACM Transactions on Networking

Abstract: We introduce the first algorithm that we are aware of to employ Bloom filters for longest prefix matching (LPM). The algorithm performs parallel queries on Bloom filters, an efficient data structure for membership queries, in order to determine address prefix membership in sets of prefixes sorted by prefix length. We show that use of this algorithm for Internet Protocol (IP) routing lookups results in a search engine providing better performance and scalability than TCAM-based approaches. The key feature of our technique is that the performance, as determined by the number of dependent memory accesses per lookup, can be held constant for longer address lengths or additional unique address prefix lengths in the forwarding table given that memory resources scale linearly with the number of prefixes in the forwarding table. Our approach is equally attractive for Internet Protocol Version 6 (IPv6) which uses 128-bit destination addresses, four times longer than IPv4. We present a basic version of our approach along with optimizations leveraging previous advances in LPM algorithms. We also report results of performance simulations of our system using snapshots of IPv4 BGP tables and extend the results to IPv6. Using less than 2 Mb of embedded RAM and a commodity SRAM device, our technique achieves average performance of one hash probe per lookup and a worst case of two hash probes and one array access per lookup.

290 citations

Cited by

Patent•

Systems and methods for processing data flows

Harsh Kapoor, Moisey Akerman¹, Stephen D. Justus², John C. Ferguson², Yevgeny Korsunsky², Paul S. Gallo¹, Charles Lee², Timothy M. Martin², Chunsheng Fu², Weidong Xu²•Institutions (2)

Symantec¹, CA Technologies²

29 Oct 2007

TL;DR: In this article, a flow processing facility for inspecting payloads of network traffic packets detects security threats and intrusions across accessible layers of the IP-stack by applying content matching and behavioral anomaly detection techniques based on regular expression matching and self-organizing maps.

Abstract: A flow processing facility, which uses a set of artificial neurons for pattern recognition, such as a self-organizing map, in order to provide security and protection to a computer or computer system supports unified threat management based at least in part on patterns relevant to a variety of types of threats that relate to computer systems, including computer networks. Flow processing for switching, security, and other network applications, including a facility that processes a data flow to address patterns relevant to a variety of conditions are directed at internal network security, virtualization, and web connection security. A flow processing facility for inspecting payloads of network traffic packets detects security threats and intrusions across accessible layers of the IP-stack by applying content matching and behavioral anomaly detection techniques based on regular expression matching and self-organizing maps. Exposing threats and intrusions within packet payload at or near real-time rates enhances network security from both external and internal sources while ensuring security policy is rigorously applied to data and system resources. Intrusion Detection and Protection (IDP) is provided by a flow processing facility that processes a data flow to address patterns relevant to a variety of types of network and data integrity threats that relate to computer systems, including computer networks.

1,428 citations

Named Data Networking (NDN) Project

Lixia Zhang, Deborah Estrin, Jeff Burke, Van Jacobson, James D. Thornton, Diana K. Smetters, Beichuan Zhang, Gene Tsudik

01 Jan 2010

TL;DR: A global center for commercial innovation, PARC, a Xerox company, works closely with enterprises, entrepreneurs, government program partners and other clients to discover, develop, and deliver new business opportunities.

Abstract: A global center for commercial innovation, PARC, a Xerox company, works closely with enterprises, entrepreneurs, government program partners and other clients to discover, develop, and deliver new business opportunities. PARC was incorporated in 2002 as a wholly owned subsidiary of Xerox Corporation (NYSE: XRX).

1,072 citations

Journal Article•DOI•

Big data

Ibrar Yaqoob¹, Ibrahim Abaker Targio Hashem¹, Abdullah Gani¹, Salimah Binti Mokhtar¹, Ejaz Ahmed¹, Nor Badrul Anuar¹, Athanasios V. Vasilakos²•Institutions (2)

Information Technology University¹, Luleå University of Technology²

01 Dec 2016-International Journal of Information Management

TL;DR: This paper presents a comprehensive discussion of state-of-the-art big data technologies for batch and stream data processing, based on structuralism and functionalism paradigms, and analyzes the strengths and weaknesses of these technologies.

964 citations

Proceedings Article•DOI•

Securing web application code by static analysis and runtime protection

Yao-Wen Huang¹, Fang Yu², Christian Hang³, Chung-Hung Tsai¹, Der-Tsai Lee², Sy-Yen Kuo¹•Institutions (3)

National Taiwan University¹, Academia Sinica², RWTH Aachen University³

17 May 2004

TL;DR: A lattice-based static analysis algorithm derived from type systems and typestate is created, and its soundness is addressed, thus securing Web applications in the absence of user intervention and reducing potential runtime overhead by 98.4%.

Abstract: Security remains a major roadblock to universal acceptance of the Web for many kinds of transactions, especially since the recent sharp increase in remotely exploitable vulnerabilities has been attributed to Web application bugs. Many verification tools are discovering previously unknown vulnerabilities in legacy C programs, raising hopes that the same success can be achieved with Web applications. In this paper, we describe a sound and holistic approach to ensuring Web application security. Viewing Web application vulnerabilities as a secure information flow problem, we created a lattice-based static analysis algorithm derived from type systems and typestate, and addressed its soundness. During the analysis, sections of code considered vulnerable are instrumented with runtime guards, thus securing Web applications in the absence of user intervention. With sufficient annotations, runtime overhead can be reduced to zero. We also created a tool named WebSSARI (Web application Security by Static Analysis and Runtime Inspection) to test our algorithm, and used it to verify 230 open-source Web application projects on SourceForge.net, which were selected to represent projects of different maturity, popularity, and scale. 69 contained vulnerabilities. After notifying the developers, 38 acknowledged our findings and stated their plans to provide patches. Our statistics also show that static analysis reduced potential runtime overhead by 98.4%.

655 citations

Proceedings Article•DOI•

Cuckoo Filter: Practically Better Than Bloom

Bin Fan¹, Dave G. Andersen¹, Michael Kaminsky², Michael Mitzenmacher³•Institutions (3)

Carnegie Mellon University¹, Intel², Harvard University³

02 Dec 2014

TL;DR: Cuckoo filters support adding and removing items dynamically while achieving even higher performance than Bloom filters, and have lower space overhead than space-optimized Bloom filters.

Abstract: In many networking systems, Bloom filters are used for high-speed set membership tests. They permit a small fraction of false positive answers with very good space efficiency. However, they do not permit deletion of items from the set, and previous attempts to extend "standard" Bloom filters to support deletion all degrade either space or performance. We propose a new data structure called the cuckoo filter that can replace Bloom filters for approximate set membership tests. Cuckoo filters support adding and removing items dynamically while achieving even higher performance than Bloom filters. For applications that store many items and target moderately low false positive rates, cuckoo filters have lower space overhead than space-optimized Bloom filters. Our experimental results also show that cuckoo filters outperform previous data structures that extend Bloom filters to support deletions substantially in both time and space.

593 citations
