Author

R. Prabhakar

Bio: R. Prabhakar is an academic researcher. The author has contributed to research in the topics Bag-of-words model and Document clustering. The author has an h-index of 1 and has co-authored 1 publication receiving 70 citations.

Papers
Journal Article
TL;DR: This research paper presents a novel method for efficient spam mail classification using clustering techniques, which separates spam from non-spam email and detects spam efficiently.
Abstract: A novel method of efficient spam mail classification using clustering techniques is presented in this research paper. E-mail spam is one of the major problems of today's internet, bringing financial damage to companies and annoying individual users. Among the approaches developed to stop spam, filtering is an important and popular one. A new spam detection technique using text clustering based on the vector space model is proposed in this research paper. By using this method, one can separate spam from non-spam email and detect spam efficiently. The data is represented using a vector space model. Clustering is the technique used for data reduction: it divides the data into groups based on pattern similarities such that each group is abstracted by one or more representatives. Recently, there has been a growing emphasis on exploratory analysis of very large datasets to discover useful patterns, which is called data mining. Each cluster is
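The vector space representation and clustering step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the helper names (`tf_vector`, `cosine`, `centroid`, `cluster`, `nearest`), the toy emails, the seed choice, and the use of raw term frequencies with cosine similarity are all assumptions made for the example.

```python
import math
from collections import Counter

def tf_vector(text):
    """Map a text to a sparse term-frequency vector (bag of words)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def centroid(vectors):
    """Average a group of vectors into one representative."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return Counter({t: c / len(vectors) for t, c in total.items()})

def nearest(vector, centroids):
    """Index of the representative most similar to the vector."""
    return max(range(len(centroids)), key=lambda k: cosine(vector, centroids[k]))

def cluster(vectors, seeds, iterations=10):
    """K-means-style clustering; seeds are indices of the initial representatives."""
    centroids = [Counter(vectors[i]) for i in seeds]
    labels = []
    for _ in range(iterations):
        labels = [nearest(v, centroids) for v in vectors]
        for k in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:  # keep the old representative if a cluster empties
                centroids[k] = centroid(members)
    return labels, centroids

# Hypothetical toy corpus: two spam-like and two non-spam messages.
emails = [
    "win free money now claim your free prize",
    "free prize waiting claim money now",
    "meeting agenda for the project review tomorrow",
    "please review the project agenda before the meeting",
]
vectors = [tf_vector(e) for e in emails]
labels, centroids = cluster(vectors, seeds=(0, 2))

# A new message is assigned to the cluster of its nearest representative.
new_label = nearest(tf_vector("claim your free money"), centroids)
```

Each cluster is summarized by its centroid, so a new message only needs to be compared against the representatives rather than against every stored email, which is the data-reduction role the abstract attributes to clustering.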

78 citations


Cited by
01 Jan 2002

9,314 citations

Journal Article
TL;DR: A focused literature survey of Artificial Intelligence (AI) and Machine Learning (ML) methods for intelligent spam email detection, which can help in developing appropriate countermeasures.
Abstract: The rapidly growing problem of phishing e-mail, also known as spam, including spear phishing and spam-borne malware, has created a need for reliable, intelligent anti-spam e-mail filters. This survey paper presents a focused literature survey of Artificial Intelligence (AI) and Machine Learning (ML) methods for intelligent spam email detection, which we believe can help in developing appropriate countermeasures. In this paper, we consider four parts of the email's structure that can be used for intelligent analysis: (A) the headers, which provide routing information and record the mail transfer agents (MTAs), giving the email and IP address of each sender and recipient, where the email originated, its stopovers, and its final destination; (B) the SMTP envelope, containing the mail exchangers' identification and the originating source and destination domains/users; (C) the first part of the SMTP data, containing information such as from, to, date, and subject, which appears in most email clients; and (D) the second part of the SMTP data, containing the email body, including text content and attachments. Based on the relevance of each emerging intelligent method, papers representing that method were identified, read, and summarized. Insightful findings, challenges, and research problems are disclosed in this paper. This comprehensive survey paves the way for future research endeavors addressing theoretical and empirical aspects of intelligent spam email detection.
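As a rough illustration of how parts (A), (C), and (D) can be separated before analysis, the sketch below uses Python's standard `email` package. It is an assumption-laden example, not the survey's method: the sample message, the addresses, and the `split_parts` helper are hypothetical. Part (B), the SMTP envelope, is exchanged between mail servers during delivery and is not stored in the message text itself, so it is not recovered here.

```python
from email import message_from_string
from email.policy import default

# Hypothetical raw message; addresses use reserved example domains.
RAW = """\
Received: from mail.example.org (mail.example.org [192.0.2.10]) by mx.example.com with ESMTP; Mon, 1 Jan 2024 10:00:00 +0000
From: Alice <alice@example.org>
To: Bob <bob@example.com>
Date: Mon, 1 Jan 2024 09:59:58 +0000
Subject: Quarterly report
Content-Type: text/plain; charset="utf-8"

Hi Bob, the report is attached to my next message.
"""

def split_parts(raw):
    """Split a raw message into routing headers, display fields, body, attachments."""
    msg = message_from_string(raw, policy=default)
    # (A) Routing information: one Received header per relay the message crossed.
    routing = [str(h) for h in msg.get_all("Received", [])]
    # (C) Fields that most email clients display.
    display = {name: str(msg[name]) for name in ("From", "To", "Date", "Subject")}
    # (D) Body text and attachment file names.
    body_part = msg.get_body(preferencelist=("plain",))
    body = body_part.get_content() if body_part is not None else ""
    attachments = [part.get_filename() for part in msg.iter_attachments()]
    return {"routing": routing, "display": display, "body": body, "attachments": attachments}

parts = split_parts(RAW)
```

Keeping the parts separate lets a filter build different features from each one, for example relay addresses from the routing headers and word statistics from the body.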

124 citations

Journal Article
TL;DR: A modified version of the negative selection algorithm (NSA), a machine learning technique modeled on the human immune system, is presented under the code name NSA-DE. It generates detectors at the random detector generation phase of NSA and uses local outlier factor (LOF) as the fitness function to maximize the distance of the generated spam detectors from the non-spam space.
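For orientation, the basic negative selection idea can be sketched as follows. This is a plain, textbook-style NSA over points in the unit square, not the NSA-DE variant: detector candidates are drawn uniformly at random, and no LOF fitness function is used. The feature vectors, radii, and function names are hypothetical choices for the example.

```python
import math
import random

def generate_detectors(self_points, n_detectors, self_radius, dim, rng, max_tries=10000):
    """Keep random candidates that do not lie within self_radius of any self (non-spam) point."""
    detectors = []
    tries = 0
    while len(detectors) < n_detectors and tries < max_tries:
        tries += 1
        candidate = [rng.random() for _ in range(dim)]
        # Negative selection: discard candidates that match the self space.
        if all(math.dist(candidate, s) > self_radius for s in self_points):
            detectors.append(candidate)
    return detectors

def is_spam(x, detectors, detect_radius):
    """Flag a point as non-self (spam) if any detector covers it."""
    return any(math.dist(x, d) <= detect_radius for d in detectors)

# Hypothetical 2-D feature vectors for known non-spam messages.
ham = [[0.20, 0.20], [0.25, 0.18], [0.18, 0.27], [0.22, 0.24]]

rng = random.Random(0)  # seeded so the run is reproducible
detectors = generate_detectors(ham, n_detectors=50, self_radius=0.15, dim=2, rng=rng)
ham_flags = [is_spam(p, detectors, detect_radius=0.15) for p in ham]
```

Because every detector is farther than `self_radius` from each non-spam training point, using a detection radius no larger than `self_radius` guarantees that none of those training points is flagged. Points elsewhere in the space are flagged only where some detector happens to cover them, which is why the quality of detector placement matters.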

73 citations

Proceedings Article
01 Dec 2013
TL;DR: A novel spam classification method that uses features based on email content-language and readability combined with the previously used content-based task features to outperform a number of state-of-the-art methods proposed in previous studies.
Abstract: Supervised machine learning methods for classifying spam emails are long-established. Most of these methods use either header-based or content-based features. Spammers, however, can bypass these methods easily, especially the ones that deal with header features. In this paper, we report a novel spam classification method that uses features based on email content-language and readability combined with the previously used content-based task features. The features are extracted from four benchmark datasets, viz. CSDMC2010, Spam Assassin, Ling Spam, and Enron-Spam. We use five well-known algorithms to induce our spam classifiers: Random Forest (RF), BAGGING, ADABOOSTM1, Support Vector Machine (SVM), and Naive Bayes (NB). We evaluate the classifier performances and find that BAGGING performs the best. Moreover, its performance surpasses that of a number of state-of-the-art methods proposed in previous studies. Although the method was applied only to English-language emails, the results indicate that it may be an excellent means of classifying spam emails in other languages as well.

62 citations

Journal Article
TL;DR: In this article, the authors review the applications of ML in aerodynamic shape optimization (ASO) and provide a perspective on the state-of-the-art and future directions.

44 citations