Author

Phuc-Tran Ho

Bio: Phuc-Tran Ho is an academic researcher from Konkuk University. The author has contributed to research in topics: Spambot & Bag-of-words model. The author has an h-index of 2 and has co-authored 2 publications receiving 20 citations.

Papers
Proceedings Article
05 Oct 2014
TL;DR: A novel similarity-based method is proposed that implements the fingerprinting technique on a parallel processing framework and uses a meet-in-the-middle approach to achieve higher accuracy in spam email detection.
Abstract: Currently, there are many effective techniques for filtering spam emails. However, spammers have largely identified the weaknesses of those methods and use them to bypass current detection systems. In this paper, we propose a novel similarity-based method that implements the fingerprinting technique on a parallel processing framework. Furthermore, a meet-in-the-middle approach is used in our method to achieve higher accuracy in the spam email detection system. Our experimental results demonstrate the improved efficiency of this method.

16 citations

Journal Article
TL;DR: This paper proposes a similarity-based method that combines the fingerprinting technique with a trie data structure and a meet-in-the-middle approach in order to achieve higher accuracy in spam comment detection.
Abstract: Social networking is used by millions of people around the world. It has become the most popular way for people to connect and interact online with their friends. Currently, there are many social networking sites, for instance, Facebook, Myspace, and Twitter, with a huge number of active users. They are therefore also attractive places for spammers or cheaters who want to steal users' personal information or advertise their products. Recently, many methods using different techniques have been proposed to detect spam comments on social networks. In this paper, we propose a similarity-based method that combines the fingerprinting technique with a trie data structure and a meet-in-the-middle approach in order to achieve higher accuracy in spam comment detection. Using our proposed approach, we are able to detect around 98% of the spam comments in our dataset.

7 citations


Cited by
Journal Article
TL;DR: This work integrated co-author features, institution features, and text fingerprints to provide semantic fingerprints for disambiguating author names, achieving better performance on the F-measure.
Abstract: Author name disambiguation is an important problem that needs to be resolved in bibliometric analysis and tech mining. Many techniques have been presented; however, most of them require a long run time or additional information. A new method based on semantic fingerprints is presented to disambiguate author names without external data. A manually annotated dataset was built to test the efficiency of the presented method. Experiments using co-author features, institution features, and text fingerprints were conducted separately. We found that the first two methods had higher precision but low recall, while the text fingerprint method had higher recall and satisfactory precision. Based on these results, we integrated co-author features, institution features, and text fingerprints to provide semantic fingerprints for disambiguating author names, achieving better performance on the F-measure.

29 citations

Journal Article
TL;DR: Malytics is a novel malware detection scheme that does not depend on any particular tool or operating system and outperforms a wide range of learning-based techniques, as well as individual state-of-the-art models, on both Android and Windows platforms.
Abstract: An important problem in cyber-security is malware analysis. Besides good precision and recognition rate, a malware detection scheme ideally needs to generalize well to novel malware families (a.k.a. zero-day attacks). It is important that the system does not require excessive computation, particularly for deployment on mobile devices. In this paper, we propose a novel scheme to detect malware, which we call Malytics. It is not dependent on any particular tool or operating system. It extracts static features of any given binary file to distinguish malware from benign files. Malytics consists of three stages: feature extraction, similarity measurement, and classification. The three stages are implemented by a neural network with two hidden layers and an output layer. We show that feature extraction, which is performed by tf-simhashing, is equivalent to the first layer of a particular neural network. We evaluate Malytics' performance on both Android and Windows platforms. Malytics outperforms a wide range of learning-based techniques and also individual state-of-the-art models on both platforms. We also show that Malytics is resilient and robust in addressing zero-day malware samples. The F1-score of Malytics is 97.21% and 99.45% on Android dex files and Windows PE files, respectively, in the applied datasets. The speed and efficiency of Malytics are also evaluated.

28 citations

Proceedings Article
09 Apr 2016
TL;DR: This work proposes using association rule mining to improve the Naïve Bayes classifier, a well-known algorithm in big data classification that relies on an assumption of independence between features.
Abstract: Nowadays, big data contains vast business opportunities. Companies have begun to analyze their data to predict potential customers and inform business decisions using the Naive Bayes classifier, association rule mining, decision trees, and other well-known algorithms. An accurate classification result may help a company lead its industry. Companies seek feasible business intelligence methods to obtain reliable prediction results. In this paper, we propose using association rule mining to improve the Naive Bayes classifier. The Naive Bayes classifier is one of the well-known algorithms in big data classification, but it is based on an assumption of independence between features. Association rule mining is popular and useful for discovering relations between inputs in big data analysis. We use a bank marketing data set for illustration. In general, this work is helpful for business data sets of all kinds.

19 citations

Proceedings Article
01 Jan 2017
TL;DR: A system is proposed that uses the SVM technique along with the map-reduce paradigm to achieve higher accuracy in the detection of spam email, and that tries to overcome the two hurdles of the SVM.
Abstract: Phishing is a criminal scheme to steal a user's personal data and other credential information. It is a fraud that acquires a victim's confidential information, such as passwords, bank account details, credit card numbers, and financial usernames and passwords, which can later be misused by an attacker. We aim to use fundamental visual features of a web page's appearance as the basis for detecting page similarities. We propose a novel solution to efficiently detect phishing web pages. Note that page layouts and contents are fundamental features of a web page's appearance. Since the standard way to specify page layouts is through the style sheet (CSS), we develop an algorithm to detect similarities in key elements related to CSS. In this paper, we propose a system that uses the SVM technique along with the map-reduce paradigm to achieve higher accuracy in the detection of spam email. By using the map-reduce technique, we also try to overcome the two hurdles of the SVM.

15 citations