Proceedings Article

Semantics-aware malware detection

TL;DR: Experimental evaluation demonstrates that the malware-detection algorithm can detect variants of malware with a relatively low run-time overhead and the semantics-aware malware detection algorithm is resilient to common obfuscations used by hackers.
Abstract: A malware detector is a system that attempts to determine whether a program has malicious intent. In order to evade detection, malware writers (hackers) frequently use obfuscation to morph malware. Malware detectors that use a pattern-matching approach (such as commercial virus scanners) are susceptible to obfuscations used by hackers. The fundamental deficiency in the pattern-matching approach to malware detection is that it is purely syntactic and ignores the semantics of instructions. In this paper, we present a malware-detection algorithm that addresses this deficiency by incorporating instruction semantics to detect malicious program traits. Experimental evaluation demonstrates that our malware-detection algorithm can detect variants of malware with a relatively low run-time overhead. Moreover our semantics-aware malware detection algorithm is resilient to common obfuscations used by hackers.
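
The gap between syntactic and semantic matching described in the abstract can be illustrated with a toy sketch. Everything here is an assumption made up for illustration: the textual instruction encoding, the `SIGNATURE` and `VARIANT` sequences, and the helpers `syntactic_match`, `normalize`, and `normalized_match`. It is not the algorithm from the paper, only a minimal demonstration of why exact pattern matching breaks under junk insertion and register renaming, and how even a crude normalization can recover the match.

```python
import re

# Hypothetical signature: a short instruction sequence written as text.
SIGNATURE = ["mov eax, [ebx]", "xor eax, 5", "mov [ebx], eax"]

# Hypothetical variant: same behavior, but with junk nops and a renamed register.
VARIANT = ["mov ecx, [ebx]", "nop", "xor ecx, 5", "nop", "mov [ebx], ecx"]

# Matches the register names eax, ebx, ecx, edx (an assumption of this toy encoding).
REGISTER = re.compile(r"\be[a-d]x\b")


def syntactic_match(program, signature):
    """Exact, contiguous match on instruction text (pattern-matching style)."""
    n = len(signature)
    return any(program[i:i + n] == signature for i in range(len(program) - n + 1))


def normalize(program):
    """Drop nops and rename registers by order of first use (r0, r1, ...)."""
    names = {}

    def rename(match):
        # Assign the next abstract name the first time a register is seen.
        return names.setdefault(match.group(0), f"r{len(names)}")

    return [REGISTER.sub(rename, ins) for ins in program if ins.strip() != "nop"]


def normalized_match(program, signature):
    """Compare programs after the crude normalization above."""
    return syntactic_match(normalize(program), normalize(signature))


if __name__ == "__main__":
    print("syntactic: ", syntactic_match(VARIANT, SIGNATURE))
    print("normalized:", normalized_match(VARIANT, SIGNATURE))
```

The normalization is deliberately simplistic: it renames registers across the whole program and knows only one kind of junk instruction, so it would miss most real obfuscations. It is meant only to show the direction of the idea, comparing what instructions do rather than how they are spelled.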


M. Christodorescu, S. Jha, S. A. Seshia, D. Song, and R. E. Bryant, “Semantics-Aware Malware Detection,”
IEEE Symposium on Security and Privacy, May 2005, pp. 32–46. Available as
http://www.cs.cmu.edu/~bryant/pubdir/oakland05.pdf.

Citations
Proceedings Article
01 Dec 2007
TL;DR: A binary obfuscation scheme that relies on opaque constants, which are primitives that allow us to load a constant into a register such that an analysis tool cannot determine its value, demonstrates that static analysis techniques alone might no longer be sufficient to identify malware.
Abstract: Malicious code is an increasingly important problem that threatens the security of computer systems. The traditional line of defense against malware is composed of malware detectors such as virus and spyware scanners. Unfortunately, both researchers and malware authors have demonstrated that these scanners, which use pattern matching to identify malware, can be easily evaded by simple code transformations. To address this shortcoming, more powerful malware detectors have been proposed. These tools rely on semantic signatures and employ static analysis techniques such as model checking and theorem proving to perform detection. While it has been shown that these systems are highly effective in identifying current malware, it is less clear how successful they would be against adversaries that take into account the novel detection mechanisms. The goal of this paper is to explore the limits of static analysis for the detection of malicious code. To this end, we present a binary obfuscation scheme that relies on the idea of opaque constants, which are primitives that allow us to load a constant into a register such that an analysis tool cannot determine its value. Based on opaque constants, we build obfuscation transformations that obscure program control flow, disguise access to local and global variables, and interrupt tracking of values held in processor registers. Using our proposed obfuscation approach, we were able to show that advanced semantics-based malware detectors can be evaded. Moreover, our opaque constant primitive can be applied in a way such that is provably hard to analyze for any static code analyzer. This demonstrates that static analysis techniques alone might no longer be sufficient to identify malware.

838 citations


Cites background or methods from "Semantics-aware malware detection"

  • ...However, based on their description, a more powerful static analyzer such as the one introduced by the same authors in [3] can undo these obfuscations....

    [...]

  • ...According to [3], such a template consists of “(1) a loop that processes data from a source memory area and writes data to a destination memory area, and (2) a jump that targets the destination area....

    [...]

  • ...These detectors [3, 10, 11] operate with abstract models, or templates, that describe the behavior of malicious code....

    [...]

  • ...Semantics-Aware Malware Detection: Another system that uses code templates instead of patterns to specify malicious code was presented in [3]....

    [...]

Proceedings Article
28 Oct 2007
TL;DR: This work proposes a system, Panorama, to detect and analyze malware by capturing malicious information access and processing behavior, which separates these malicious applications from benign software.
Abstract: Malicious programs spy on users' behavior and compromise their privacy. Even software from reputable vendors, such as Google Desktop and Sony DRM media player, may perform undesirable actions. Unfortunately, existing techniques for detecting malware and analyzing unknown code samples are insufficient and have significant shortcomings. We observe that malicious information access and processing behavior is the fundamental trait of numerous malware categories breaching users' privacy (including keyloggers, password thieves, network sniffers, stealth backdoors, spyware and rootkits), which separates these malicious applications from benign software. We propose a system, Panorama, to detect and analyze malware by capturing this fundamental trait. In our extensive experiments, Panorama successfully detected all the malware samples and had very few false positives. Furthermore, by using Google Desktop as a case study, we show that our system can accurately capture its information access and processing behavior, and we can confirm that it does send back sensitive information to remote servers in certain settings. We believe that a system such as Panorama will offer indispensable assistance to code analysts and malware researchers by enabling them to quickly comprehend the behavior and inner workings of an unknown sample.

796 citations


Cites background from "Semantics-aware malware detection"

  • ...Although semantic-aware signature checking [11] improves its resilience to polymorphic and metamorphic variants, the inherent limitation of the signature based approach is its incapability of detecting previously unseen malware instances....

    [...]

Proceedings Article
27 Oct 2008
TL;DR: Ether, a transparent and external approach to malware analysis, is proposed, which is motivated by the intuition that for a malware analyzer to be transparent, it must not induce any side-effects that are unconditionally detectable by malware.
Abstract: Malware has become the centerpiece of most security threats on the Internet. Malware analysis is an essential technology that extracts the runtime behavior of malware, and supplies signatures to detection systems and provides evidence for recovery and cleanup. The focal point in the malware analysis battle is how to detect versus how to hide a malware analyzer from malware during runtime. State-of-the-art analyzers reside in or emulate part of the guest operating system and its underlying hardware, making them easy to detect and evade. In this paper, we propose a transparent and external approach to malware analysis, which is motivated by the intuition that for a malware analyzer to be transparent, it must not induce any side-effects that are unconditionally detectable by malware. Our analyzer, Ether, is based on a novel application of hardware virtualization extensions such as Intel VT, and resides completely outside of the target OS environment. Thus, there are no in-guest software components vulnerable to detection, and there are no shortcomings that arise from incomplete or inaccurate system emulation. Our experiments are based on our study of obfuscation techniques used to create 25,000 recent malware samples. The results show that Ether remains transparent and defeats the obfuscation tools that evade existing approaches.

756 citations


Cites background from "Semantics-aware malware detection"

  • ...The focal point in the malware analysis battle is how to detect versus how to hide a malware analyzer from malware during runtime....

    [...]

Journal Article
TL;DR: An incremental approach for behavior-based analysis, capable of processing the behavior of thousands of malware binaries on a daily basis, is proposed; it significantly reduces the run-time overhead of current analysis methods while providing accurate discovery and discrimination of novel malware variants.
Abstract: Malicious software - so called malware - poses a major threat to the security of computer systems. The amount and diversity of its variants render classic security defenses ineffective, such that millions of hosts in the Internet are infected with malware in the form of computer viruses, Internet worms and Trojan horses. While obfuscation and polymorphism employed by malware largely impede detection at file level, the dynamic analysis of malware binaries during run-time provides an instrument for characterizing and defending against the threat of malicious software. In this article, we propose a framework for the automatic analysis of malware behavior using machine learning. The framework allows for automatically identifying novel classes of malware with similar behavior (clustering) and assigning unknown malware to these discovered classes (classification). Based on both, clustering and classification, we propose an incremental approach for behavior-based analysis, capable of processing the behavior of thousands of malware binaries on a daily basis. The incremental analysis significantly reduces the run-time overhead of current analysis methods, while providing accurate discovery and discrimination of novel malware variants.

675 citations


Cites methods from "Semantics-aware malware detection"

  • ...Finally, semantics-aware analysis of malware binaries has been devised by [12] and later on extended by [48,49]....

    [...]

Book Chapter
10 Jul 2008
TL;DR: The effectiveness of the proposed method for learning and discrimination of malware behavior is demonstrated, especially in detecting novel instances of malware families previously not recognized by commercial anti-virus software.
Abstract: Malicious software in form of Internet worms, computer viruses, and Trojan horses poses a major threat to the security of networked systems. The diversity and amount of its variants severely undermine the effectiveness of classical signature-based detection. Yet variants of malware families share typical behavioral patterns reflecting its origin and purpose. We aim to exploit these shared patterns for classification of malware and propose a method for learning and discrimination of malware behavior. Our method proceeds in three stages: (a) behavior of collected malware is monitored in a sandbox environment, (b) based on a corpus of malware labeled by an anti-virus scanner a malware behavior classifier is trained using learning techniques and (c) discriminative features of the behavior models are ranked for explanation of classification decisions. Experiments with different heterogeneous test data collected over several months using honeypots demonstrate the effectiveness of our method, especially in detecting novel instances of malware families previously not recognized by commercial anti-virus software.

648 citations


Cites background or methods from "Semantics-aware malware detection"

  • ...These techniques are especially effective against byte-level content analysis [18, 20] and static malware analysis methods [8, 10, 12]....

    [...]

  • ...Extensive literature exists on static analysis of malicious binaries, e.g. [8, 10, 19, 21]....

    [...]

References
Proceedings Article
01 Jun 2000
TL;DR: An integrated collection of program analysis and transformation components, called Bandera, that enables the automatic extraction of safe, compact finite-state models from program source code.
Abstract: Finite-state verification techniques, such as model checking, have shown promise as a cost-effective means for finding defects in hardware designs. To date, the application of these techniques to software has been hindered by several obstacles. Chief among these is the problem of constructing a finite-state model that approximates the executable behavior of the software system of interest. Current best-practice involves hand-construction of models which is expensive (prohibitive for all but the smallest systems), prone to errors (which can result in misleading verification results), and difficult to optimize (which is necessary to combat the exponential complexity of verification algorithms). In this paper, we describe an integrated collection of program analysis and transformation components, called Bandera, that enables the automatic extraction of safe, compact finite-state models from program source code. Bandera takes as input Java source code and generates a program model in the input language of one of several existing verification tools; Bandera also maps verifier outputs back to the original source code. We discuss the major components of Bandera and give an overview of how it can be used to model check correctness properties of Java programs.

1,135 citations

Journal Article
Fred Cohen
TL;DR: This paper introduces ''computer viruses'' and examines their potential for causing widespread damage to computer systems and the infeasibility of viral defense in large classes of systems.

916 citations


"Semantics-aware malware detection" refers to methods in this paper

  • ...Cohen [10] and Chess-White [6] propose a virus detection model that executes code in a sandbox....

    [...]

  • ...Cohen [10] and Chess-White [6] showed that in general the problem of virus detection is undecidable....

    [...]

01 Jan 2001

820 citations

Proceedings Article
27 Oct 2003
TL;DR: Experimental results indicate that significant portions of executables that have been obfuscated using the techniques described are disassembled incorrectly, thereby showing the efficacy of the methods.
Abstract: A great deal of software is distributed in the form of executable code. The ability to reverse engineer such executables can create opportunities for theft of intellectual property via software piracy, as well as security breaches by allowing attackers to discover vulnerabilities in an application. The process of reverse engineering an executable program typically begins with disassembly, which translates machine code to assembly code. This is then followed by various decompilation steps that aim to recover higher-level abstractions from the assembly code. Most of the work to date on code obfuscation has focused on disrupting or confusing the decompilation phase. This paper, by contrast, focuses on the initial disassembly phase. Our goal is to disrupt the static disassembly process so as to make programs harder to disassemble correctly. We describe two widely used static disassembly algorithms, and discuss techniques to thwart each of them. Experimental results indicate that significant portions of executables that have been obfuscated using our techniques are disassembled incorrectly, thereby showing the efficacy of our methods.

694 citations

Report
04 Aug 2003
TL;DR: An architecture for detecting malicious patterns in executables that is resilient to common obfuscation transformations is presented, and experimental results demonstrate the efficacy of the prototype tool, SAFE (a static analyzer for executables).
Abstract: Malicious code detection is a crucial component of any defense mechanism. In this paper, we present a unique viewpoint on malicious code detection. We regard malicious code detection as an obfuscation-deobfuscation game between malicious code writers and researchers working on malicious code detection. Malicious code writers attempt to obfuscate the malicious code to subvert the malicious code detectors, such as anti-virus software. We tested the resilience of three commercial virus scanners against code-obfuscation attacks. The results were surprising: the three commercial virus scanners could be subverted by very simple obfuscation transformations! We present an architecture for detecting malicious patterns in executables that is resilient to common obfuscation transformations. Experimental results demonstrate the efficacy of our prototype tool, SAFE (a static analyzer for executables).

691 citations


"Semantics-aware malware detection" refers to methods in this paper

  • ...SAFE can only handle very simple obfuscations (only nops can appear between matching instructions), e.g., the example shown in Figure 1 cannot be handled by SAFE....

    [...]

  • ...In this area, we previously described a malware-detection algorithm called SAFE [7]....

    [...]

Frequently Asked Questions
Q1. What are the contributions in this paper?

The paper presents a malware-detection algorithm that incorporates instruction semantics to detect malicious program traits, addressing the purely syntactic nature of pattern-matching detectors. Experimental evaluation shows that the algorithm detects variants of malware with relatively low run-time overhead and is resilient to common obfuscations.