Proceedings Article

Accelerated spam filtering with enhanced KMP algorithm on GPU

01 Feb 2017, pp. 1-7
TL;DR: This paper presents a GPU-accelerated spam filtering mechanism based on an enhanced version of the Knuth-Morris-Pratt (KMP) pattern matching algorithm, which outperforms serial versions by up to 12x and runs more efficiently than other parallel versions.
Abstract: Spam filtering is one of the most important applications in email services, and it has become increasingly sophisticated due to the enormous usage of the Internet. Traditionally, spam filters have been implemented on the CPU with a pattern matching algorithm. In this paper, an accelerated spam filtering mechanism that uses GPUs is presented. The filtering process utilizes an enhanced version of the Knuth-Morris-Pratt pattern matching algorithm that outperforms the serial versions by up to 12x and also performs more efficiently than other parallel versions. The parallel algorithm is used to develop an advanced keyword-based Naive Bayesian classifier that speeds up spam filtering by up to 2 times compared to the CPU.
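The abstract does not spell out how the keyword-based Naive Bayesian classifier works, so the following is only a minimal sketch of one common formulation (a Bernoulli Naive Bayes with add-one smoothing). The function names `train` and `spam_log_odds`, the keyword list, and the toy messages are all hypothetical and are not taken from the paper. The `kw in lowered` test marks the keyword-lookup step where a pattern matcher such as KMP would plug in.

```python
import math
from collections import Counter


def train(messages, keywords):
    """Count, per class, how many messages contain each keyword.

    messages: list of (text, is_spam) pairs (toy data, assumed for illustration).
    """
    counts = {True: Counter(), False: Counter()}
    totals = {True: 0, False: 0}
    for text, is_spam in messages:
        totals[is_spam] += 1
        lowered = text.lower()
        for kw in keywords:
            if kw in lowered:  # keyword lookup: the pattern-matching step
                counts[is_spam][kw] += 1
    return counts, totals


def spam_log_odds(text, keywords, counts, totals):
    """Return log P(spam | text) - log P(ham | text); positive means 'spam'."""
    lowered = text.lower()
    # Smoothed class prior.
    score = math.log((totals[True] + 1) / (totals[False] + 1))
    for kw in keywords:
        # Add-one (Laplace) smoothed probability that a message contains kw.
        p_spam = (counts[True][kw] + 1) / (totals[True] + 2)
        p_ham = (counts[False][kw] + 1) / (totals[False] + 2)
        if kw in lowered:
            score += math.log(p_spam / p_ham)
        else:
            score += math.log((1 - p_spam) / (1 - p_ham))
    return score


# Hypothetical toy training set and keyword list.
KEYWORDS = ["free", "prize", "winner"]
TRAINING = [
    ("Free prize for every winner", True),
    ("You are a winner, claim your free gift", True),
    ("Meeting agenda for Monday", False),
    ("Lunch is free for staff on Friday", False),
]
COUNTS, TOTALS = train(TRAINING, KEYWORDS)
```

With this toy data, a message such as "claim your prize, winner" scores above zero and "Agenda for the staff meeting" scores below zero. Because the per-keyword lookup dominates the cost, speeding up the string search is what would speed up a classifier of this shape.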
Citations
Journal Article
TL;DR: This paper proposes a high performance parallel KMP algorithm on the Heterogeneous High Performance Computing (HPC) architecture based on the general-purpose multicore microprocessor and the manycore graphic processing unit (GPU) using OpenCL as the General Purpose computing using Graphic Processing Unit (GPGPU) platform.
Abstract: String matching algorithms are widely used in many application areas such as bioinformatics, network intrusion detection, and computer virus scanning, among many others. The KMP (Knuth-Morris-Pratt) algorithm is commonly used for its fast execution time compared with many other string matching algorithms when applied to large input texts. However, the performance of the KMP algorithm is limited when the input text size increases significantly beyond a certain limit. In this paper, we propose a high performance parallel KMP algorithm on the Heterogeneous High Performance Computing (HPC) architecture based on the general-purpose multicore microprocessor and the Graphic Processing Unit (GPU) using OpenCL as the GPGPU (General Purpose computing using Graphic Processing Unit) platform. The proposed parallel KMP algorithm mainly focuses on optimizing the CPU-GPU memory hierarchy by overlapping the data transfer between the CPU memory and the GPU memory with the string matching operations on the GPU. It also optimizes the allocation of the work-groups and the work-items, and places the pattern data and the Failure Table in the on-chip shared memory of the GPU. The experimental results show that the optimized parallel KMP algorithm achieves up to ~8 times faster execution.

6 citations

Book Chapter
01 Jan 2022
TL;DR: This paper reviews various methods for translating one language into another using a Natural Language Processing (NLP) toolkit and proposes a method to enable people in various regions to understand the basic ideas of the amendments, drafts, or bills passed by the Government.
Abstract: Diverse languages are spoken all over the world and within individual countries. India is a diverse country whose people follow various cultures and speak various languages. Although there are many languages, the amendments, bills, and drafts made under the Constitution are written only in Hindi and English. Most people in India know only their mother tongue, that is, their regional language. As a result, when a draft is passed, the majority of people will not clearly understand the gist of the draft, its uses, and its impacts. They need a human translator to convey the pros and cons of the drafts. Thus, to help people easily understand the essence of the drafts or bills passed by the Government, an automatic regional language translator is required. This paper reviews various methods for translating one language into another. The proposed method uses a Natural Language Processing (NLP) toolkit to process the string and translate it into another language. Thus, this project enables people in various regions to understand the basic ideas of the amendments, drafts, or bills passed by the Government.
References
Proceedings Article
03 Dec 2012
TL;DR: A large, deep convolutional neural network, consisting of five convolutional layers (some of which are followed by max-pooling layers) and three fully-connected layers with a final 1000-way softmax, achieved state-of-the-art image classification results on ImageNet.
Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.

73,978 citations

01 Jan 2005

19,250 citations

Book Chapter
01 Jan 2014
TL;DR: This chapter provides an overview of the fundamentals of algorithms and their links to self-organization, exploration, and exploitation.
Abstract: Algorithms are important tools for solving problems computationally. All computation involves algorithms, and the efficiency of an algorithm largely determines its usefulness. This chapter provides an overview of the fundamentals of algorithms and their links to self-organization, exploration, and exploitation. A brief history of recent nature-inspired algorithms for optimization is outlined in this chapter.

8,285 citations

Posted Content
TL;DR: NLTK, the Natural Language Toolkit, is a suite of open source program modules, tutorials and problem sets, providing ready-to-use computational linguistics courseware that covers symbolic and statistical natural language processing.
Abstract: NLTK, the Natural Language Toolkit, is a suite of open source program modules, tutorials and problem sets, providing ready-to-use computational linguistics courseware. NLTK covers symbolic and statistical natural language processing, and is interfaced to annotated corpora. Students augment and replace existing components, learn structured programming by example, and manipulate sophisticated models from the outset.

3,345 citations


"Accelerated spam filtering with enh..." refers to methods in this paper

  • ...NLTK, is toolkit available in Python that provides easy to use interfaces and consists of a basket of text classification libraries including classification, tokenization, lemmatising, parsing, and semantic reasoning [29] [30]....

    [...]

Journal Article
TL;DR: An algorithm is presented which finds all occurrences of one given string within another, in running time proportional to the sum of the lengths of the strings, showing that the set of concatenations of even palindromes, i.e., the language $\{\alpha \alpha ^R\}^*$, can be recognized in linear time.
Abstract: An algorithm is presented which finds all occurrences of one given string within another, in running time proportional to the sum of the lengths of the strings. The constant of proportionality is low enough to make this algorithm of practical use, and the procedure can also be extended to deal with some more general pattern-matching problems. A theoretical application of the algorithm shows that the set of concatenations of even palindromes, i.e., the language $\{\alpha \alpha ^R\}^*$, can be recognized in linear time. Other algorithms which run even faster on the average are also considered.
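To make the linear-time claim concrete, here is a minimal sketch of the textbook form of the Knuth-Morris-Pratt search. The function names `build_failure_table` and `kmp_search` are chosen for illustration and do not come from the paper. The failure table is built in time proportional to the pattern length, and the scan never moves backwards in the text, which is where the bound proportional to the sum of the two lengths comes from.

```python
def build_failure_table(pattern: str) -> list[int]:
    """failure[i] is the length of the longest proper prefix of
    pattern[:i + 1] that is also a suffix of it."""
    failure = [0] * len(pattern)
    k = 0  # length of the currently matched prefix
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]  # fall back to the next shorter border
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    return failure


def kmp_search(text: str, pattern: str) -> list[int]:
    """Return the start index of every (possibly overlapping) occurrence."""
    if not pattern:
        return []
    failure = build_failure_table(pattern)
    matches = []
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = failure[k - 1]  # reuse the prefix already known to match
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = failure[k - 1]  # keep going to find overlapping matches
    return matches
```

For example, `kmp_search("abababa", "aba")` reports the overlapping occurrences at positions 0, 2, and 4.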

3,156 citations


"Accelerated spam filtering with enh..." refers methods in this paper

  • ...In this paper, an enhanced version of Knuth Morris Pratt pattern matching algorithm [11] that divides the text to be searched into chunks of smaller texts and utilizes the power of GPU to search in parallel is proposed....

    [...]
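The chunking idea in the excerpt above can be sketched on the CPU as follows. This is an assumption-laden illustration, not the paper's GPU implementation: a thread pool stands in for GPU threads, Python's `str.find` stands in for the per-chunk KMP matcher, and the names `find_all`, `chunked_search`, and `chunk_size` are hypothetical. The one detail any chunked search must handle is a match that straddles a chunk boundary; here each chunk is extended by `len(pattern) - 1` characters so that every match is found exactly once, by the chunk in which it starts.

```python
from concurrent.futures import ThreadPoolExecutor


def find_all(text: str, pattern: str) -> list[int]:
    """Stand-in per-chunk matcher: all (overlapping) occurrences in text."""
    hits = []
    start = text.find(pattern)
    while start != -1:
        hits.append(start)
        start = text.find(pattern, start + 1)
    return hits


def chunked_search(text: str, pattern: str, chunk_size: int = 8) -> list[int]:
    """Search text chunk by chunk and merge the per-chunk results."""
    m = len(pattern)
    if m == 0 or chunk_size <= 0:
        return []
    # Chunk starting at `begin` owns match starts in [begin, begin + chunk_size).
    # It reads m - 1 extra characters so boundary-crossing matches are seen.
    bounds = [
        (begin, min(begin + chunk_size + m - 1, len(text)))
        for begin in range(0, len(text), chunk_size)
    ]

    def search_chunk(bound):
        begin, end = bound
        # Translate chunk-local positions back to positions in the full text.
        return [begin + i for i in find_all(text[begin:end], pattern)]

    with ThreadPoolExecutor() as pool:
        per_chunk = list(pool.map(search_chunk, bounds))
    return sorted(i for hits in per_chunk for i in hits)
```

Because a chunk sees only `chunk_size + m - 1` characters, a match found in it must start within the first `chunk_size` positions, so neighbouring chunks never report the same occurrence twice.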