
Showing papers by "Hiroki Takakura" published in 2013


Journal Article
TL;DR: A new anomaly detection method that automatically tunes and optimizes its parameter values without predefining them is proposed and evaluated on real traffic data obtained from Kyoto University honeypots.

95 citations


Proceedings Article
22 Jul 2013
TL;DR: Experimental results show that this method can detect the names of APIs used in malware, which existing methods cannot; that it is useful for identifying inserted code used to generate variants that evade pattern detection by anti-virus software; and that it reduces the time needed to process malware programs without degrading classification accuracy.
Abstract: The first step of malware analysis is to determine whether a given malware program is a variant of a known one. If it is clearly not a variant, it must be analyzed manually. However, manual analysis is very costly, so it cannot be applied to the enormous number of newly found malware programs. An automatic and accurate malware classification method would help address this situation. Existing methods suffer from problems such as the cost of calculating similarity between every pair of malware programs in a database and the inability to present precisely the similarities and differences between programs. In our approach, known malware programs are classified into families. A given malware program is determined to be a variant if it is classified into an existing family. Incremental clustering is then performed on the new program and the family, which reduces the cost of re-training and similarity calculation. Accurate comparison is enabled by evaluating the difference between programs using the longest common subsequences (LCSs) of their instructions. To reduce the number of costly LCS calculations, numeric features of the code, such as cyclomatic complexity and the number of function calls, are used to filter out dissimilar code. Subsequences in the LCS of two pieces of code are presented to malware analysts as their similarity, while those outside it are presented as their difference. Experimental results show that this method can detect the names of APIs used in malware, which existing methods cannot; that it is useful for identifying inserted code used to generate variants that evade pattern detection by anti-virus software; and that it reduces the time needed to process malware programs without degrading classification accuracy.

13 citations
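
The comparison step described in the abstract above (a cheap numeric prefilter followed by an LCS over instruction sequences) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `lcs`, `features_close`, and `compare`, the relative `tolerance` threshold, and the representation of instructions as plain strings are all assumptions made for illustration only.

```python
from typing import List, Optional, Sequence, Tuple


def lcs(a: Sequence[str], b: Sequence[str]) -> List[str]:
    """Return one longest common subsequence of two instruction lists."""
    n, m = len(a), len(b)
    # dp[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table from the start to recover the subsequence itself.
    out: List[str] = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def features_close(f1: Sequence[float], f2: Sequence[float],
                   tolerance: float = 0.5) -> bool:
    """Hypothetical prefilter: each numeric feature (e.g. cyclomatic
    complexity, call count) must agree within a relative tolerance."""
    return all(abs(x - y) <= tolerance * max(x, y, 1) for x, y in zip(f1, f2))


def compare(known: Sequence[str], sample: Sequence[str],
            f_known: Sequence[float], f_sample: Sequence[float]
            ) -> Optional[Tuple[List[str], List[str]]]:
    """Return (similarity, difference) or None if the prefilter rejects."""
    if not features_close(f_known, f_sample):
        return None  # skip the costly LCS for clearly dissimilar code
    common = lcs(known, sample)
    # Instructions of `sample` that fall outside the LCS are the difference.
    extra: List[str] = []
    k = 0
    for ins in sample:
        if k < len(common) and ins == common[k]:
            k += 1
        else:
            extra.append(ins)
    return common, extra


known = ["push ebp", "mov ebp, esp", "call CreateFileA", "ret"]
variant = ["push ebp", "nop", "mov ebp, esp", "xor eax, eax",
           "call CreateFileA", "ret"]
result = compare(known, variant, (1, 1), (1, 1))
```

In this toy example the variant differs from the known code only by two inserted instructions, so the LCS equals the known sequence and the difference lists the inserted `nop` and `xor eax, eax`. The quadratic dynamic-programming table is what makes LCS expensive on long instruction sequences, which is why a prefilter of the kind the abstract mentions is attractive.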