Author

Hee-Sun Kim

Bio: Hee-Sun Kim is an academic researcher from Konkuk University. The author has contributed to research in topics: Bag-of-words model & Fingerprint (computing). The author has an h-index of 1, co-authored 1 publication receiving 13 citations.

Papers
Proceedings ArticleDOI
05 Oct 2014
TL;DR: A novel similarity-based method is proposed that implements the fingerprinting technique on a parallel processing framework and uses a meet-in-the-middle approach to achieve higher accuracy in spam email detection.
Abstract: Currently, there are many effective techniques for filtering spam emails. However, spammers have identified the weaknesses of most of those methods and use them to bypass current detection systems. In this paper, we propose a novel similarity-based method that implements the fingerprinting technique on a parallel processing framework. Furthermore, a meet-in-the-middle approach is used in our method to achieve higher accuracy in the spam email detection system. Our experimental results demonstrate the improved efficiency of the proposed method.

16 citations


Cited by
Journal ArticleDOI
TL;DR: Malytics is a novel malware detection scheme that does not depend on any particular tool or operating system and outperforms a wide range of learning-based techniques, as well as individual state-of-the-art models, on both Android and Windows platforms.
Abstract: An important problem in cyber-security is malware analysis. Besides good precision and recognition rate, a malware detection scheme ideally needs to generalize well to novel malware families (a.k.a. zero-day attacks). It is important that the system does not require excessive computation, particularly for deployment on mobile devices. In this paper, we propose a novel scheme to detect malware, which we call Malytics. It is not dependent on any particular tool or operating system. It extracts static features of any given binary file to distinguish malware from benign files. Malytics consists of three stages: feature extraction, similarity measurement, and classification. The three stages are implemented by a neural network with two hidden layers and an output layer. We show that feature extraction, which is performed by tf-simhashing, is equivalent to the first layer of a particular neural network. We evaluate Malytics' performance on both Android and Windows platforms. Malytics outperforms a wide range of learning-based techniques and also individual state-of-the-art models on both platforms. We also show that Malytics is resilient and robust in addressing zero-day malware samples. The F1-score of Malytics is 97.21% and 99.45% on Android dex files and Windows PE files, respectively, in the applied datasets. The speed and efficiency of Malytics are also evaluated.
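
As a rough illustration of the feature-extraction and similarity-measurement stages named in the abstract, the sketch below computes a generic term-frequency-weighted SimHash over byte n-grams and compares two fingerprints by Hamming distance. It is not the paper's formulation: the function names, the use of MD5 as the per-feature hash, the 2-byte n-grams, and the 64-bit fingerprint width are all assumptions made for the example.

```python
import hashlib
from collections import Counter


def byte_ngrams(data: bytes, n: int = 2):
    """Return all overlapping n-byte windows of the input (assumed feature set)."""
    return [data[i:i + n] for i in range(len(data) - n + 1)]


def tf_simhash(data: bytes, bits: int = 64, n: int = 2) -> int:
    """Term-frequency-weighted SimHash; bits must be a multiple of 8, at most 128."""
    tf = Counter(byte_ngrams(data, n))
    acc = [0] * bits
    for gram, count in tf.items():
        # Hash each n-gram to a fixed-width integer (MD5 is an arbitrary choice).
        h = int.from_bytes(hashlib.md5(gram).digest()[:bits // 8], "big")
        for i in range(bits):
            # Each set bit votes +tf, each clear bit votes -tf.
            acc[i] += count if (h >> i) & 1 else -count
    fingerprint = 0
    for i, vote in enumerate(acc):
        if vote > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_similarity(a: int, b: int, bits: int = 64) -> float:
    """Fraction of fingerprint bits on which a and b agree."""
    return 1 - bin(a ^ b).count("1") / bits
```

Two files with mostly shared n-grams would be expected to yield fingerprints that agree on most bits, which is what makes the fingerprint usable as input to a similarity or classification stage.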

28 citations

Proceedings ArticleDOI
09 Apr 2016
TL;DR: This work proposes association rule mining to improve the Naïve Bayes Classifier, a well-known algorithm in big data classification that relies on an assumption of independence between features.
Abstract: Nowadays, big data contains vast business opportunities. Companies have begun to analyze their data to predict their potential customers and business decisions using the Naive Bayes Classifier, Association Rule Mining, Decision Trees and other well-known algorithms. An accurate classification result may help a company lead its industry. Companies seek feasible business intelligence methods to obtain reliable prediction results. In this paper we propose association rule mining to improve the Naive Bayes Classifier. The Naive Bayes Classifier is one of the best-known algorithms in big data classification, but it is based on an assumption of independence between features. Association rule mining is popular and useful for discovering relations between inputs in big data analysis. We use a bank marketing data set as an illustration in this work. In general, this work is applicable to any business data set.

19 citations

Proceedings ArticleDOI
01 Jan 2017
TL;DR: A system is proposed that uses the SVM technique along with the map-reduce paradigm to achieve higher accuracy in spam email detection and to overcome the two hurdles of the SVM.
Abstract: Phishing is a criminal scheme to steal a user's personal data and other credential information. It is a fraud that acquires a victim's confidential information, such as passwords, bank account details, credit card numbers, and financial usernames and passwords, which can later be misused by the attacker. We aim to use fundamental visual features of a web page's appearance as the basis for detecting page similarities. We propose a novel solution to efficiently detect phishing web pages. Note that page layouts and contents are fundamental features of a web page's appearance. Since the standard way to specify page layouts is through the style sheet (CSS), we develop an algorithm to detect similarities in key elements related to CSS. In this paper, we propose a system that uses the SVM technique along with the map-reduce paradigm to achieve higher accuracy in the detection of spam email. By using the map-reduce technique we also try to overcome the two hurdles of the SVM.

15 citations

Proceedings Article
John H. Reif
19 May 2002
TL;DR: The topics included algorithms and computational complexity bounds for classical problems in algebra, geometry, topology, graph theory, game theory, logic and machine learning, as well as theoretical aspects of security, databases, information retrieval and networks, the web, computational biology, and alternative models of computation including quantum computation and self-assembly.
Abstract: The papers in this volume were presented at the Thirty-Fourth Annual ACM Symposium on Theory of Computing (STOC 2002), held in Montreal, Quebec, Canada, May 19-21, 2002. The Symposium was sponsored by the ACM Special Interest Group on Algorithms and Computation Theory (SIGACT). In response to a call for papers, 287 paper submissions were received. All were submitted electronically. The program committee conducted its deliberations electronically, via an on-line meeting that ran from January 10 to January 19. The committee selected 91 papers from among the submissions. The submissions were not refereed, and many of these papers represented reports of continuing research. It is expected that most of them will appear in a more polished and complete form in scientific journals. The papers encompassed a wide variety of areas of theoretical computer science. The topics included algorithms and computational complexity bounds for classical problems in algebra, geometry, topology, graph theory, game theory, logic and machine learning, as well as theoretical aspects of security, databases, information retrieval, and networks, the web, computational biology, and alternative models of computation including quantum computation and self-assembly.

14 citations

Proceedings ArticleDOI
30 Mar 2015
TL;DR: An anti-spam filter is developed that employs the Naïve Bayesian classifier, an effective engine for picking out spam emails, trained on the Enron Spam Dataset, a well-known spam/legitimate email dataset.
Abstract: Electronic mail has become a convenient and inexpensive means of communication regardless of distance. However, an increasing volume of unsolicited emails is reducing productivity dramatically. There is a need for reliable anti-spam filters to separate such messages from legitimate ones. The Naive Bayesian classifier is suggested as an effective engine for picking out spam emails. We have developed an anti-spam filter that employs this content-based classifier. This statistics-based classifier was trained on the Enron Spam Dataset, a well-known spam/legitimate email dataset. We developed this filter as a Web Service, which consumes the emails a user uploads and returns the predicted probability that the given email is spam. This engine was implemented with RESTEasy technology and consists of three phases that train on pre-labeled emails and then apply Bayes' theorem to calculate an email's spamicity.
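
The train-then-score flow described in this abstract can be sketched as follows. This is a minimal illustration of a multinomial Naive Bayes spam score, not the authors' implementation: the whitespace tokenizer, the add-one (Laplace) smoothing, and the names `train` and `spamicity` are assumptions for the example, and the training set is assumed to contain at least one spam and one legitimate email.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumed, deliberately simple choice)."""
    return text.lower().split()


def train(labeled_emails):
    """Count words per class from (text, is_spam) pairs."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = {True: 0, False: 0}
    for text, is_spam in labeled_emails:
        doc_counts[is_spam] += 1
        word_counts[is_spam].update(tokenize(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab


def spamicity(text, model):
    """Posterior probability that text is spam, via Bayes' theorem."""
    word_counts, doc_counts, vocab = model
    total_docs = doc_counts[True] + doc_counts[False]
    log_score = {}
    for label in (True, False):
        # Log prior; assumes both classes appeared in training.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        log_score[label] = score
    # Normalize in log space for numerical stability.
    top = max(log_score.values())
    spam = math.exp(log_score[True] - top)
    ham = math.exp(log_score[False] - top)
    return spam / (spam + ham)
```

Working in log space keeps the product of many small word probabilities from underflowing, which matters once emails contain more than a handful of words.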

10 citations