Abstraction-Based Malware Analysis Using Rewriting and Model Checking
Summary
1 Introduction
- These dynamic abstraction-based approaches, though they can detect unknown viruses whose execution traces exhibit known malicious behaviors, only deal with a single execution trace.
- Static behavior analysis by abstraction is more challenging than its dynamic counterpart because, precisely, this approach needs to abstract a program behavior potentially representing an infinite set of execution traces.
- An interesting application of static behavior analysis is the audit of programs in high-level technologies, like mobile applications, browser extensions, web page scripts, .NET or Java programs.
Previous work.
- In [4], the authors had already proposed abstracting a program's set of traces with respect to behavior patterns, for detection and analysis.
- These samples belonged to known malware families, like Allaple, Virut, Agent, Rbot, Afcore and Mimail.
- But patterns were defined by string rewriting systems, which did not allow the actions composing a trace to have parameters, precluding dataflow analysis.
- The formalism proposed in this paper addresses both issues: first, the authors handle interleaved patterns by keeping the identified patterns when abstracting them.
- Second, the authors extend the rewriting framework to express data constraints on action parameters by using term rewriting systems.
2 Background
- The elements of T_Trace(F) are called traces, and the elements of T_Action(F) are called actions.
- The authors distinguish the sort Action from the sort Trace but, for the sake of readability, they may denote by a the trace (a, ε), for some action a.
- Similarly, the authors use the symbol with infix notation and right associativity, and ε is understood when the context is unambiguous.
- Σ therefore represents the finite set of library calls, while terms built on F_d identify the arguments and the return values of these calls.
- Using FOLTL on finite traces strikes a good balance between behavior expressivity and decidability.
3 Behavior Patterns
- The problem under study can be formalized in the following way.
- The authors' goal is then to find an effective and efficient method for solving this problem.
- The authors describe a functionality by an FOLTL formula, such that traces satisfying this formula are traces carrying out the functionality.
- For example, one way to realize a ping functionality is to call the socket function with the parameter IPPROTO_ICMP, which describes the network protocol, and then to call the sendto function with the parameter ICMP_ECHOREQ, which describes the data to be sent.
- Between these two calls, the socket should not be freed.
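The ping pattern described above can be illustrated with a minimal Python sketch. This is not the authors' formalism (which uses FOLTL formulas over terms), only an illustration of the same constraint on a single finite trace. The encoding of actions as (name, arguments) pairs, the argument order (socket handle first), the use of closesocket as the call that frees a socket, and the function name realizes_ping are all assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical encoding: an action is (call name, arguments), and the socket
# handle is always the first argument. This is an assumption of the sketch.
Action = Tuple[str, tuple]


def realizes_ping(trace: List[Action]) -> bool:
    """Return True if some socket opened with IPPROTO_ICMP is used by sendto
    to send ICMP_ECHOREQ before that socket is freed."""
    open_icmp = set()  # handles opened with IPPROTO_ICMP and not yet freed
    for name, args in trace:
        if name == "socket" and args[1] == "IPPROTO_ICMP":
            open_icmp.add(args[0])
        elif name == "closesocket":  # assumed name of the freeing call
            open_icmp.discard(args[0])
        elif name == "sendto" and args[0] in open_icmp and args[1] == "ICMP_ECHOREQ":
            return True
    return False
```

Tracking the handle is what the parameterized actions make possible: a sendto on a different socket, or on a socket that was freed in between, does not count as a realization of the pattern.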
5 Detection Problem
- Then the detection problem can be formalized as follows.
- The authors want to exclude traces unreliably realizing the abstract behavior in R ≤n (L), while not having to reach normal forms.
- The following propositions show that the (m, n)-completeness property is realistic for abstract behaviors considered in practice.
- The first step computes the abstract forms of the program traces while the second step applies usual verification techniques in order to decide whether one of the computed traces verifies the FOLTL formula defining the abstract behavior.
- The authors therefore show that, in the previous proposition, (m, n)-completeness allows us to nonetheless preserve that decomposition, so that the abstraction step now becomes decidable.
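The two-step decomposition (abstraction, then verification) can be sketched in Python on a finite set of finite traces. This is a simplification: the paper deals with a potentially infinite trace language and uses term rewriting and model checking, neither of which is reproduced here. The action encoding, the closesocket call, the symbol name lambda_ping and the helper names are assumptions made for illustration.

```python
from typing import Callable, Iterable, List, Tuple

Action = Tuple[str, tuple]  # hypothetical encoding: (name, arguments)


def abstract_ping(trace: List[Action]) -> List[Action]:
    """Step 1 (abstraction): insert a lambda_ping symbol after each sendto
    that completes the ping pattern. Original actions are kept, so other
    patterns interleaved with this one could still be recognized."""
    out: List[Action] = []
    open_icmp = set()
    for name, args in trace:
        out.append((name, args))
        if name == "socket" and args[1] == "IPPROTO_ICMP":
            open_icmp.add(args[0])
        elif name == "closesocket":  # assumed name of the freeing call
            open_icmp.discard(args[0])
        elif name == "sendto" and args[0] in open_icmp and args[1] == "ICMP_ECHOREQ":
            out.append(("lambda_ping", (args[0],)))
    return out


def eventually_ping(abstract_trace: List[Action]) -> bool:
    """A toy abstract behavior: the trace eventually contains lambda_ping."""
    return any(name == "lambda_ping" for name, _ in abstract_trace)


def detect(traces: Iterable[List[Action]],
           behavior: Callable[[List[Action]], bool]) -> bool:
    """Step 2 (verification): does some abstracted trace satisfy the behavior?"""
    return any(behavior(abstract_ping(t)) for t in traces)
```

The behavior predicate only looks at abstract symbols, so a different concrete realization of the same functionality would need a new abstraction rule but no change to the behavior itself.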
6 Detection Complexity
- The detection problem, like the more general problem of program analysis, requires computing a partial abstraction of the set of analyzed traces.
- 4, the abstraction relation is rational, which entails the decidability of detection.
- Using the set of traces n-reliably realizing M, when T_Action(F) is finite, the authors get the following detection complexity, which is linear in the size of the automaton recognizing the program set of traces, a major improvement on the exponential complexity bound of [17].
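To give some intuition for why a bound linear in the program automaton is plausible, here is a sketch of a standard product-reachability check in Python. It is not the authors' algorithm. It assumes that the program's traces are the paths of a finite automaton whose edges are already labeled with abstract symbols, that every path from the initial state counts as a trace, and that the behavior is given as a small deterministic monitor; all names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Program automaton: state -> list of (symbol, next state).
Program = Dict[str, List[Tuple[str, str]]]
# Behavior monitor: (monitor state, symbol) -> next monitor state.
# Missing entries leave the monitor state unchanged.
Monitor = Dict[Tuple[str, str], str]


def some_trace_accepted(program: Program, p_init: str,
                        monitor: Monitor, m_init: str,
                        m_accept: Set[str]) -> bool:
    """Breadth-first search over pairs (program state, monitor state).
    Returns True if some program path drives the monitor into an
    accepting state."""
    seen = {(p_init, m_init)}
    queue = deque(seen)
    while queue:
        p, m = queue.popleft()
        if m in m_accept:
            return True
        for symbol, p_next in program.get(p, []):
            nxt = (p_next, monitor.get((m, symbol), m))
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False
```

Each pair is expanded at most once, so the work is bounded by the number of program transitions times the number of monitor states. For a fixed behavior, this grows linearly with the size of the program automaton.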
7 Information Leak Behaviors
- Such a leak can be decomposed into two steps: capturing sensitive information and sending this information to an exogenous location.
- The captured data can be keystrokes, passwords or data read from a sensitive network location, while the exogenous location can be the network, a removable device, etc.
- Moreover, since the captured data must not be invalidated before being leaked, the authors define a behavior pattern λ_inval(x), which represents such an invalidation.
- The authors define the four behavior patterns involved after looking at several malware samples, such as keyloggers, SMS-leaking applications and mobile applications that steal personal information.
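The shape of the information leak behavior (capture, then send, with no invalidation in between) can be sketched over already-abstracted traces. The following Python example is a simplification under stated assumptions: abstract symbols are encoded as (name, data identifier) pairs, the names capture, send and inval are hypothetical stand-ins for the paper's behavior patterns, and only three patterns are modeled because the summary does not describe the fourth.

```python
from typing import List, Tuple

# Hypothetical encoding of an abstract symbol: (pattern name, data identifier).
Symbol = Tuple[str, str]


def leaks(abstract_trace: List[Symbol]) -> bool:
    """Return True if some data item x is captured and later sent,
    with no invalidation of x between the capture and the send."""
    live = set()  # captured data items that have not been invalidated
    for name, x in abstract_trace:
        if name == "capture":
            live.add(x)
        elif name == "inval":
            live.discard(x)
        elif name == "send" and x in live:
            return True
    return False
```

Because the check is keyed on the data identifier x, invalidating one captured item does not hide the leak of another, which is the kind of dataflow constraint that parameterized actions are meant to express.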
8 Experiments
- The authors' goal is to detect the information leak behavior M defined in the previous section.
- In order to perform behavior pattern abstraction and behavior detection in the presence of data, the authors use the CADP toolbox [14], which makes it possible to manipulate and model-check communicating processes written in the LOTOS language.
- First, approximation of conditional branches by nondeterministic branches may result in false positives, especially when the program code is obfuscated.
- The first one comes from a study on the detection rate of keylogger programs by existing antivirus products [13], which shows a high failure rate.
- It then requests, through its file metadata, that the Android system execute OnReceive on each SMS received or sent.
9 Conclusion
- The authors presented an original approach for detecting high-level behaviors in programs, describing combinations of functionalities and defined by first-order temporal logic formulas.
- Behavior patterns, expressing concrete realizations of functionalities, are also defined by first-order temporal logic formulas.
- Validation of the abstracted traces with respect to some high-level behavior is performed via usual model checking techniques.
- Moreover, high-level behaviors and behavior patterns are easy to update since they are expressed in terms of basic blocks.
- Applicability of their detection technique could be further enhanced by automating construction of reference behavior patterns, for example using mining techniques as in [9].
Citations
185 citations
Cites background from "Abstraction-Based Malware Analysis ..."
...represented rewriting and model checking which capture high-level malware behaviors when detecting malware [89]....
[...]
177 citations
Cites background from "Abstraction-Based Malware Analysis ..."
...In the same direction, several static filters and tools are proposed in the literature to speed up the detection of similar malware samples [31, 9, 23, 60, 39, 44, 61, 41, 17, 27, 36, 48, 49]....
[...]
36 citations
Cites methods from "Abstraction-Based Malware Analysis ..."
...FO-LTL was used for malware detection in [3]....
[...]
34 citations
29 citations
Cites background from "Abstraction-Based Malware Analysis ..."
...F.3.1 [Theory of Computation]: Specifying and Verifying and Reasoning about Programs—Malware Detection...
[...]
References
[...]
916 citations
"Abstraction-Based Malware Analysis ..." refers methods in this paper
...Behavior analysis was introduced by Cohen’s seminal work [11] to detect malware and in particular unknown malware....
[...]
791 citations
"Abstraction-Based Malware Analysis ..." refers methods in this paper
...Finally, given some program p coming with an infinite set of traces L (static analysis scenario, for instance by using the control flow graph, see our previous work [4] and [10, 20]), we formulate the detection problem in the following way....
[...]
...We first represent the program set of traces as a CADP process, using a program control flow graph obtained by static analysis (see [4] and [10, 20])....
[...]
697 citations
"Abstraction-Based Malware Analysis ..." refers background in this paper
...In [23] and in [3], a captured execution trace is transformed into a higher-level representation capturing its semantic meaning, i....
[...]
624 citations
"Abstraction-Based Malware Analysis ..." refers background in this paper
...In general, a behavior is described by a sequence of system calls and recognition uses the formalism of finite state automata [22, 26, 24, 6]....
[...]
Frequently Asked Questions (12)
Q2. What is the function that allows on-the-fly model checking of formulas?
CADP features a verification tool that allows on-the-fly model checking of formulas expressed in the MCL language, a fragment of the modal mu-calculus extended with data variables, of which the FOLTL logic used in this paper is a subset.
Q3. What is the main idea behind the behavior analysis?
Underpinned by language theory, term rewriting and first-order temporal logic, it allows us to determine whether a program exhibits a high-level behavior.
Q4. What is the interesting application of static behavior analysis?
An interesting application of static behavior analysis is the audit of programs in high-level technologies, like mobile applications, browser extensions, web page scripts, .NET or Java programs.
Q5. What is the behavior pattern that is used to represent the data?
Since the captured data must not be invalidated before being leaked, the authors define a behavior pattern λ_inval(x), which represents such an invalidation.
Q6. How do the authors describe a functionality by an FOLTL formula?
The authors describe a functionality by an FOLTL formula, such that traces satisfying this formula are traces carrying out the functionality.
Q7. What is the general definition of a behavior?
In general, a behavior is described by a sequence of system calls and recognition uses the formalism of finite state automata [22, 26, 24, 6].
Q8. What is the key to the problem of constructing the normal form trace set?
In order to address the general intractability of the problem of constructing the normal form trace set for a given program, the authors have identified a property of practical high-level behaviors allowing us to avoid computing normal forms and yielding a linear time detection algorithm.
Q9. How is the ping behavior pattern in Example 1 defined?
The ping behavior pattern in Example 1 is abstracted in traces by inserting the λ_ping symbol after the send action or after the IcmpSendEcho action.
Q10. What is the purpose of the abstract behavior analysis framework?
Their abstraction framework can be used in two scenarios, the first being the detection of given behaviors: signatures of given high-level behaviors are expressed in terms of abstract functionalities.
Q11. How do the authors show that the abstraction relation is rational?
The authors show that this is sufficient, with termination of the set of rules, to ensure that the abstraction relation is realizable by a tree transducer, in other words that it is a rational tree transduction.
Q12. What is the motivation behind the behavior analysis approach?
This has motivated yet another approach where a malicious behavior is specified as a combination of high-level actions, in order to be independent from the way these actions are realized and to only consider their effect on a system.