
Abstraction-Based Malware Analysis Using Rewriting and Model Checking

TL;DR: This work uses a rewriting-based abstraction mechanism, producing abstracted forms of program traces, independent of the program implementation, which allows it to handle similar behaviors in a generic way and thus to be robust with respect to variants.
Abstract: We propose a formal approach for the detection of high-level malware behaviors. Our technique uses a rewriting-based abstraction mechanism, producing abstracted forms of program traces, independent of the program implementation. It then allows us to handle similar behaviors in a generic way and thus to be robust with respect to variants. These behaviors, defined as combinations of patterns given in a signature, are detected by model-checking on the high-level representation of the program. We work on unbounded sets of traces, which makes our technique useful not only for dynamic analysis, considering one trace at a time, but also for static analysis, considering a set of traces inferred from a control flow graph. Abstracting traces with rewriting systems on first order terms with variables allows us in particular to model dataflow and to detect information leak.

Summary (3 min read)

1 Introduction

  • These dynamic abstraction-based approaches, though they can detect unknown viruses whose execution traces exhibit known malicious behaviors, only deal with a single execution trace.
  • Static behavior analysis by abstraction is more challenging than its dynamic counterpart because, precisely, this approach needs to abstract a program behavior potentially representing an infinite set of execution traces.
  • An interesting application of static behavior analysis is the audit of programs in high-level technologies, like mobile applications, browser extensions, web page scripts, .NET or Java programs.

Previous work.

  • In [4], the authors already proposed to abstract program sets of traces with respect to behavior patterns, for detection and analysis.
  • These samples belonged to known malware families, like Allaple, Virut, Agent, Rbot, Afcore and Mimail.
  • But patterns were defined by string rewriting systems, which did not allow the actions composing a trace to have parameters, precluding dataflow analysis.
  • The formalism proposed in this paper addresses both issues: first, the authors handle interleaved patterns by keeping the identified patterns when abstracting them.
  • Second, the authors extend the rewriting framework to express data constraints on action parameters by using term rewriting systems.

2 Background

  • The elements of $T_{Trace}(F)$ are called traces, the elements of $T_{Action}(F)$ are called actions.
  • The authors distinguish the sort Action from the sort Trace but, for the sake of readability, they may denote by a the trace ·(a, ε), for some action a.
  • Similarly, the authors use the · symbol with infix notation and right associativity, and ε is understood when the context is unambiguous.
  • Σ therefore represents the finite set of library calls, while terms built on $F_d$ identify the arguments and the return values of these calls.
  • Using FOLTL on finite traces strikes a balance between behavior expressivity and decidability.

3 Behavior Patterns

  • The problem under study can be formalized in the following way.
  • The authors' goal is then to find an effective and efficient method for solving this problem.
  • The authors describe a functionality by an FOLTL formula, such that traces satisfying this formula are traces carrying out the functionality.
  • One way of realizing it consists in calling the socket function with the parameter IPPROTO_ICMP describing the network protocol and then calling the sendto function with the parameter ICMP_ECHOREQ describing the data to be sent.
  • Between these two calls, the socket should not be freed.
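As an illustration, the ping pattern described above can be evaluated on a single finite trace with a small temporal-logic evaluator. This is a minimal sketch under stated assumptions, not the authors' formalization: actions are encoded as `(name, args)` tuples, the socket handle is passed as the first argument of every call, the name `closesocket` for the freeing call is hypothetical, and the formula encoded in `ping` is one possible reading of "socket, then sendto, with no free in between".

```python
from typing import Callable, List, Tuple

Action = Tuple[str, Tuple[str, ...]]
Formula = Callable[[List[Action], int], bool]  # (trace, position) -> truth value


def atom(name: str, *args: str) -> Formula:
    """True at position i when the action at i is exactly name(args)."""
    return lambda t, i: i < len(t) and t[i] == (name, args)


def neg(f: Formula) -> Formula:
    return lambda t, i: not f(t, i)


def conj(f: Formula, g: Formula) -> Formula:
    return lambda t, i: f(t, i) and g(t, i)


def nxt(f: Formula) -> Formula:
    """Strong next: there must be a following position."""
    return lambda t, i: i + 1 < len(t) and f(t, i + 1)


def until(f: Formula, g: Formula) -> Formula:
    """f U g on a finite trace: g eventually holds, f holds before that."""
    def holds(t: List[Action], i: int) -> bool:
        for j in range(i, len(t)):
            if g(t, j):
                return True
            if not f(t, j):
                return False
        return False
    return holds


def eventually(f: Formula) -> Formula:
    return until(lambda t, i: True, f)


def ping(x: str) -> Formula:
    """Assumed encoding: F(socket(x, ICMP) and X(not closesocket(x) U sendto(x, ECHOREQ)))."""
    return eventually(conj(
        atom("socket", x, "IPPROTO_ICMP"),
        nxt(until(neg(atom("closesocket", x)),
                  atom("sendto", x, "ICMP_ECHOREQ")))))


def realizes_ping(trace: List[Action]) -> bool:
    """Existential quantification over x, instantiated with first arguments seen in the trace."""
    candidates = {args[0] for _, args in trace if args}
    return any(ping(x)(trace, 0) for x in candidates)
```

The quantifier is handled by enumerating the values that occur in the trace, which is enough for a single finite trace but does not address the unbounded sets of traces treated in the paper.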

5 Detection Problem

  • Then the detection problem can be formalized as follows.
  • The authors want to exclude traces unreliably realizing the abstract behavior in R ≤n (L), while not having to reach normal forms.
  • The following propositions show that the (m, n)-completeness property is realistic for abstract behaviors considered in practice.
  • The first step computes the abstract forms of the program traces while the second step applies usual verification techniques in order to decide whether one of the computed traces verifies the FOLTL formula defining the abstract behavior.
  • The authors therefore show that, in the previous proposition, (m, n)-completeness allows us to nonetheless preserve that decomposition, so that the abstraction step now becomes decidable.

6 Detection Complexity

  • The detection problem, like the more general problem of program analysis, requires computing a partial abstraction of the set of analyzed traces.
  • 4, the abstraction relation is rational, which entails the decidability of detection.
  • Using the set of traces n-reliably realizing M, when $T_{Action}(F)$ is finite, the authors get the following detection complexity, which is linear in the size of the automaton recognizing the program set of traces, a major improvement on the exponential complexity bound of [17].
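The linear bound can be made concrete with a product construction. The sketch below is a simplification made for illustration, not the authors' algorithm: it assumes the program's traces are given as a finite automaton whose states are all accepting, that the behavior is given as a deterministic monitor over abstract symbols, and that the abstraction step and data parameters are ignored. All state and symbol names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Edges = Dict[str, List[Tuple[str, str]]]  # state -> [(symbol, next state)]
Monitor = Dict[Tuple[str, str], str]      # (state, symbol) -> next state


def exhibits(program: Edges, start: str,
             monitor: Monitor, m_start: str, m_accept: Set[str]) -> bool:
    """Breadth-first search of the product of the program automaton and the monitor.

    Each product state is visited once, so for a fixed monitor the cost is
    linear in the number of program states and edges.
    """
    seen = {(start, m_start)}
    queue = deque(seen)
    while queue:
        p, q = queue.popleft()
        if q in m_accept:
            return True
        for symbol, p_next in program.get(p, []):
            # Symbols the monitor does not mention leave its state unchanged.
            q_next = monitor.get((q, symbol), q)
            if (p_next, q_next) not in seen:
                seen.add((p_next, q_next))
                queue.append((p_next, q_next))
    return False


# Hypothetical monitor: CAPTURE, then SEND, with no INVAL in between.
LEAK: Monitor = {
    ("q0", "CAPTURE"): "q1",
    ("q1", "INVAL"): "q0",
    ("q1", "SEND"): "q2",
}

# A program with a loop, hence an unbounded set of traces.
LOOPING: Edges = {
    "n0": [("CAPTURE", "n1")],
    "n1": [("LOG", "n1"), ("SEND", "n2")],
}

# A program that always invalidates the captured data before sending.
SAFE: Edges = {
    "n0": [("CAPTURE", "n1")],
    "n1": [("INVAL", "n2")],
    "n2": [("SEND", "n3")],
}
```

With these definitions, `exhibits(LOOPING, "n0", LEAK, "q0", {"q2"})` holds while the same query on `SAFE` does not, even though `LOOPING` has infinitely many traces.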

7 Information Leak Behaviors

  • Such a leak can be decomposed into two steps: capturing sensitive information and sending this information to an exogenous location.
  • The captured data can be keystrokes, passwords or data read from a sensitive network location, while the exogenous location can be the network, a removable device, etc.
  • Moreover, since the captured data must not be invalidated before being leaked, the authors define a behavior pattern λ inval (x), which represents such an invalidation.
  • After looking at several malware samples, like keyloggers, SMS-leaking applications or mobile applications stealing personal information, the authors define the four behavior patterns involved, starting with the keystroke capture functionality.
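The two-step structure of a leak, with the invalidation constraint, can be sketched as a scan over an already abstracted trace. This is an illustrative simplification: the abstract symbols `CAPTURE`, `INVAL` and `SEND` are hypothetical names and do not reproduce the four pattern definitions of the paper, and each abstract action carries a single data identifier.

```python
from typing import List, Set, Tuple

AbstractAction = Tuple[str, str]  # (abstract symbol, data identifier)


def leaked_data(abstract_trace: List[AbstractAction]) -> Set[str]:
    """Return the identifiers captured, not invalidated, and then sent."""
    live: Set[str] = set()    # captured data that is still valid
    leaked: Set[str] = set()
    for symbol, x in abstract_trace:
        if symbol == "CAPTURE":
            live.add(x)
        elif symbol == "INVAL":
            live.discard(x)
        elif symbol == "SEND" and x in live:
            leaked.add(x)
    return leaked
```

Tracking the identifier `x` across actions is what ties the capture to the send; without parameters on actions, the two steps could not be related to the same data.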

8 Experiments

  • The authors' goal is to detect the information leak behavior M defined in the previous section.
  • In order to perform behavior pattern abstraction and behavior detection in the presence of data, the authors use the CADP toolbox [14], which allows them to manipulate and model-check communicating processes written in the LOTOS language.
  • First, approximation of conditional branches by nondeterministic branches may result in false positives, especially when the program code is obfuscated.
  • The first one comes from a study on the detection rate of keylogger programs by existing antivirus products [13], which shows a high failure rate.
  • It then requests Android systems through its file metadata, to execute OnReceive on each SMS received or sent.

9 Conclusion

  • The authors presented an original approach for detecting high-level behaviors in programs, describing combinations of functionalities and defined by first-order temporal logic formulas.
  • Behavior patterns, expressing concrete realizations of functionalities, are also defined by first-order temporal logic formulas.
  • Validation of the abstracted traces with respect to some high-level behavior is performed via usual model checking techniques.
  • Moreover, high-level behaviors and behavior patterns are easy to update since they are expressed in terms of basic blocks.
  • Applicability of their detection technique could be further enhanced by automating construction of reference behavior patterns, for example using mining techniques as in [9].


HAL Id: hal-00762252
https://hal.inria.fr/hal-00762252
Submitted on 10 Dec 2012
Abstraction-based Malware Analysis Using Rewriting
and Model Checking
Philippe Beaucamps, Isabelle Gnaedig, Jean-Yves Marion
To cite this version:
Philippe Beaucamps, Isabelle Gnaedig, Jean-Yves Marion. Abstraction-based Malware Analysis Using
Rewriting and Model Checking. ESORICS - 17th European Symposium on Research in Computer
Security - 2012, Sep 2012, Pisa, Italy. pp.806-823, 10.1007/978-3-642-33167-1. hal-00762252

Abstraction-based Malware Analysis Using Rewriting and Model Checking
Philippe Beaucamps¹, Isabelle Gnaedig², Jean-Yves Marion¹
¹ Université de Lorraine, LORIA, UMR 7503, Vandoeuvre-lès-Nancy, F-54506, France
² Inria, Villers-lès-Nancy, F-54600, France
{Philippe.Beaucamps,Isabelle.Gnaedig,Jean-Yves.Marion}@loria.fr
Abstract. We propose a formal approach for the detection of high-level
malware behaviors. Our technique uses a rewriting-based abstraction
mechanism, producing abstracted forms of program traces, independent
of the program implementation. It then allows us to handle similar be-
haviors in a generic way and thus to be robust with respect to variants.
These behaviors, defined as combinations of patterns given in a signa-
ture, are detected by model-checking on the high-level representation of
the program. We work on unbounded sets of traces, which makes our
technique useful not only for dynamic analysis, considering one trace at
a time, but also for static analysis, considering a set of traces inferred
from a control flow graph. Abstracting traces with rewriting systems on
first order terms with variables allows us in particular to model dataflow
and to detect information leak.
Keywords: Malware, behavioral detection, behavior abstraction, trace,
term rewriting, model checking, first order temporal logic, finite state
automaton, formal language.
1 Introduction
Behavior analysis was introduced by Cohen’s seminal work [11] to detect mal-
ware and in particular unknown malware. In general, a behavior is described by
a sequence of system calls and recognition uses the formalism of finite state au-
tomata [22, 26, 24, 6]. New approaches have been proposed recently. In [18, 27],
malicious behaviors are specified by temporal logic formulas with parameters
and detection is carried out by model-checking. However, these approaches are
tightly dependent on the way malicious actions are realized: using any other
system facility to realize an action allows a malware to go undetected. This has
motivated yet another approach where a malicious behavior is specified as a
combination of high-level actions, in order to be independent from the way these
actions are realized and to only consider their effect on a system. In [23] and
in [3], a captured execution trace is transformed into a higher-level
representation capturing its semantic meaning, i.e., the trace is first abstracted before
being compared to a malicious behavior. In [17], the authors propose to use
attribute automata, at the price of an exponential time complexity detection.

These dynamic abstraction-based approaches, though they can detect unknown
viruses whose execution traces exhibit known malicious behaviors, only deal with
a single execution trace.
In this paper, we propose a formal approach for high-level behavior analysis,
with the following features. Underpinned by language theory, term rewriting and
first-order temporal logic, it allows us to determine whether a program exhibits
a high-level behavior. Detection is achieved in two steps. First, traces of the pro-
gram are abstracted in order to reveal the sequences of high-level functionalities
they realize. Then, abstracted traces are compared with the behavior formula,
using usual model-checking techniques. Functionalities have parameters repre-
senting the manipulated data, so our formalism is adapted to the protection
against generic threats like the leak of sensitive information.
Our goal here is not to provide a ready-made software to detect behaviors, but
to propose a formal framework emphasizing fundamental detection mechanisms,
which are independent of implementation-based solutions.
Our approach has two main characteristics. First, we work on an unbounded
set of traces representing the behavior of a program, in order to consider a more
complete representation of the program than with a single trace. To deal with
the infinity of the set of traces, we restrict to regular sets and safely approximate
the set of abstract traces, so that we detect in linear time whether a program
exhibits a given behavior. Second, we work on abstract forms of traces, in or-
der to only keep the essence of the functions performed by the program, to be
independent of their possible implementations and to be generic with respect
to behavior mutations. Behavior components are abstracted in program traces,
by identifying known functionalities and marking them by inserting abstract
functionality symbols.
By working on sets of traces, which may consist of a single trace as well
as of an unbounded number of traces, our approach may be used not only for
classical, dynamic behavior analysis, but also for static behavior analysis i.e.,
behavior analysis in a static analysis setting.
Static behavior analysis by abstraction is more challenging than its dynamic
counterpart because, precisely, this approach needs to abstract a program behav-
ior potentially representing an infinite set of execution traces. The construction
of an exhaustive representation of a program behavior is an intractable prob-
lem in general: in particular, a program flow may not be easily followed due to
indirect jumps, and a program may use complex code protection, for instance
by dynamically modifying its code or by using obfuscation. Self modification is
usually tackled by emulating the program long enough to deactivate most code
protections. Indirect jumps and obfuscation are usually handled by abstract in-
terpretation [25, 19] or symbolic execution [7].
Static behavior analysis has many advantages and applications. First, it al-
lows us to analyze the behavior of a program in a more exhaustive way, as it
analyzes the unbounded set of the program execution traces, or an approximation
of it. Second, static behavior analysis can complement classical, dynamic,
behavior analysis with an analysis of the future behavior, to prevent damages
when some critical point is reached in an execution.
An interesting application of static behavior analysis is the audit of pro-
grams in high-level technologies, like mobile applications, browser extensions,
web page scripts, .NET or Java programs. Auditing these programs is complex
and mostly manual, resulting in highly publicized infections [2, 1]. In this con-
text, static analysis can provide an appropriate help, because it is usually easier
than for usual programs, especially when additionally enforcing a security policy
(e.g. prohibiting self-modification [28]) or when enforcing strict development
guidelines (e.g. for iPhone applications).
To our knowledge, the use of behavior abstraction on top of static behavior
analysis has not been investigated so far. As our detection mechanism relies on
satisfaction of temporal logic formulas, it is akin to model checking [21], for which
there already exist numerous frameworks and tools [16, 14, 8]. The specificity of
our approach, however, is that, rather than being applied on the set of program
traces, verification is applied on the set of abstract forms of these traces, which is
not computable in general. Accordingly, we identify a property of practical high-
level behaviors allowing us to approximate this set, in a sound and complete way
with respect to detection, and then to apply classical verification techniques.
Our abstraction framework can be used in two scenarios:
Detection of given behaviors: signatures of given high-level behaviors are ex-
pressed in terms of abstract functionalities. Given some program, we then
assess whether one of its execution traces exhibits a sequence of known functionalities,
in a way specific to one of the given behaviors. This can be applied
to detection of suspicious behaviors. Although detection of such suspicious
behaviors may not suffice to label a program as malicious, it can be used to
supplement existing detection techniques with additional decision criteria.
Analysis of programs: abstraction provides a simple and high-level represen-
tation of a program behavior, which is more suitable than the original traces
for manual analysis, or for analysis of behavior similarity with known be-
haviors, etc. For instance, it could be used to detect not necessarily harmful
behaviors, in order to get a basic understanding of the program and to fur-
ther investigate if deemed necessary. It could also be used to automatically
discover sequences of high-level functionalities and their dataflow dependen-
cies, exhibited by a program.
Previous work. In [4], we already proposed to abstract program sets of traces
with respect to behavior patterns, for detection and analysis. We tested our
approach on samples of malicious programs collected using a honeypot³ and
identified using Kaspersky Antivirus. These samples belonged to known malware
families, like Allaple, Virut, Agent, Rbot, Afcore and Mimail. Most of them were
successfully matched to our malware database.
³ The honeypot of the Loria's High Security Lab: http://lhs.loria.fr

But patterns were defined by string rewriting systems, which did not allow
the actions composing a trace to have parameters, precluding dataflow analysis.
Moreover, abstraction rules replaced identified patterns by abstraction symbols
in the original trace, precluding a further detection of patterns interleaved with
the rewritten ones.
The formalism proposed in this paper addresses both issues: first, we handle
interleaved patterns by keeping the identified patterns when abstracting them.
Second, we extend the rewriting framework to express data constraints on action
parameters by using term rewriting systems. An important consequence is that,
unlike in [4], using the dataflow, we can detect information leaks in order to
prevent unauthorized disclosure or modifications of information.
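
To illustrate the first point, here is a minimal Python sketch of abstraction by insertion rather than by replacement. It is not the term rewriting system of the paper: the two-call patterns, the abstract symbols `PING` and `READ_FILE`, and the convention that the calls of one occurrence share their first argument are assumptions made for the example.

```python
from typing import List, Set, Tuple

Action = Tuple[str, Tuple[str, ...]]

# Hypothetical two-step patterns: (opening call, closing call, abstract symbol).
# Both calls must share their first argument, which models the dataflow.
PATTERNS = [
    ("socket", "sendto", "PING"),
    ("open", "read", "READ_FILE"),
]


def abstract(trace: List[Action]) -> List[Action]:
    """Insert an abstract action after each completed pattern occurrence.

    Concrete actions are kept in place, so occurrences of different
    patterns can interleave without hiding each other.
    """
    result: List[Action] = []
    pending: Set[Tuple[str, str]] = set()  # (abstract symbol, shared argument)
    for name, args in trace:
        result.append((name, args))
        if not args:
            continue
        for first, last, symbol in PATTERNS:
            if name == first:
                pending.add((symbol, args[0]))
            elif name == last and (symbol, args[0]) in pending:
                pending.discard((symbol, args[0]))
                result.append((symbol, (args[0],)))
    return result
```

On the interleaved trace `socket(s) · open(f) · sendto(s) · read(f)`, both abstract symbols are inserted, because the concrete actions of the first occurrence are still present when the second one is matched.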
2 Background
Term Algebras. Let $S = \{Trace, Action, Data\}$ be a set of sorts and $F = F_t \cup F_a \cup F_d$ be a finite $S$-sorted signature, where $F_t$, $F_a$, $F_d$ are mutually distinct and:
  • $F_t = \{\epsilon, \cdot\}$ is the set of the trace constructors, where $\epsilon : Trace$ denotes the empty trace and $\cdot$ has profile $Action \times Trace \to Trace$;
  • $F_a$ is a set of function symbols or constants, with profile $Data^n \to Action$, $n \in \mathbb{N}$, describing actions;
  • $F_d$ is a set of data constructors, with profile $Data$ or $Data^n \to Data$, $n \in \mathbb{N}$.
Let $\mathbb{N}_+^*$ be the set of finite strings of positive natural numbers, called positions. The empty string is denoted by $\lambda$, and $u \preceq v$ means that $u$ is a prefix of $v$. Let $X$ be a set of $S$-sorted variables. An $S$-sorted term over $(F, X)$ is a partial function $t : \mathbb{N}_+^* \to F \cup X$, such that the domain of definition of $t$, denoted by $Pos(t)$, is finite and satisfies, for $w \in \mathbb{N}_+^*$ and $i \in \mathbb{N}$: (1) $wi \in Pos(t) \Rightarrow w \in Pos(t)$, (2) $w \in Pos(t) \Rightarrow t(w) \in F \cup X$. $Pos(t)$ is called the set of positions of $t$. We denote by $T(F, X)$ (resp. $T(F)$) the set of $S$-sorted terms over $(F, X)$ (resp. the set of finite ground terms over $F$). For any sort $s \in S$, and any of the above sets of terms $T$, we denote by $T_s$ the restriction of $T$ to terms of sort $s$ and by $X_s$ the subset of variables of $X$ of sort $s$. For a term $t$ with $p \in Pos(t)$, we denote by $t|_p$ the subterm of $t$ at position $p$. We denote by $t[t']_p$ the term obtained by replacing by $t'$ the subterm at position $p$ in $t$. We use the abbreviated notation $\bar{x}$ for variables $x_1, \ldots, x_n$. So $\bar{x} \in X$ stands for $x_1, \ldots, x_n \in X$, and if $f \in F$ is a symbol of arity $n \in \mathbb{N}$, we denote by $f(\bar{x})$ the term $f(x_1, \ldots, x_n)$.
The elements of $T_{Trace}(F)$ are called traces, the elements of $T_{Action}(F)$ are called actions. We distinguish the sort Action from the sort Trace but, for the sake of readability, we may denote by $a$ the trace $\cdot(a, \epsilon)$, for some action $a$. Similarly, we use the $\cdot$ symbol with infix notation and right associativity, and $\epsilon$ is understood when the context is unambiguous. For instance, if $a, b, c$ are actions, $a \cdot b \cdot c$ denotes the trace $\cdot(a, \cdot(b, \cdot(c, \epsilon)))$.
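
The trace notation can be mirrored by a small data structure. The following Python sketch is only an illustration of the right-associative reading of the trace constructor; the class names `Action`, `Empty` and `Cons` and the helper `trace` are hypothetical, and data arguments are simplified to strings.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Action:
    """An action term f(d1, ..., dn), with data arguments given as strings."""
    symbol: str
    args: Tuple[str, ...] = ()


@dataclass(frozen=True)
class Empty:
    """The empty trace."""


@dataclass(frozen=True)
class Cons:
    """The trace constructor applied to an action and a trace."""
    head: Action
    tail: "Trace"


Trace = Union[Empty, Cons]


def trace(*actions: Action) -> Trace:
    """Build a . b . c as .(a, .(b, .(c, empty))), i.e. with right associativity."""
    result: Trace = Empty()
    for action in reversed(actions):
        result = Cons(action, result)
    return result
```

For actions `a`, `b`, `c`, the call `trace(a, b, c)` builds the same nested term as `Cons(a, Cons(b, Cons(c, Empty())))`, and `trace(a)` is the one-action trace that the text abbreviates as `a`.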
We partition $F_a$ into a set $\Sigma$ of symbols, denoting concrete program-level actions, and a set $\Gamma$, denoting abstract actions identifying abstracted functionalities. To construct purely concrete (resp. abstract) terms, we use $F_\Sigma = F \setminus \Gamma$


References
Book ChapterDOI
29 Mar 2004
TL;DR: This work introduces a temporal logic of calls and returns (CaRet) for specification and algorithmic verification of correctness requirements of structured programs and presents a tableau construction that reduces the model checking problem to the emptiness problem for a Buchi pushdown system.
Abstract: Model checking of linear temporal logic (LTL) specifications with respect to pushdown systems has been shown to be a useful tool for analysis of programs with potentially recursive procedures. LTL, however, can specify only regular properties, and properties such as correctness of procedures with respect to pre and post conditions, that require matching of calls and returns, are not regular. We introduce a temporal logic of calls and returns (CaRet) for specification and algorithmic verification of correctness requirements of structured programs. The formulas of CaRet are interpreted over sequences of propositional valuations tagged with special symbols call and ret. Besides the standard global temporal modalities, CaRet admits the abstract-next operator that allows a path to jump from a call to the matching return. This operator can be used to specify a variety of non-regular properties such as partial and total correctness of program blocks with respect to pre and post conditions. The abstract versions of the other temporal modalities can be used to specify regular properties of local paths within a procedure that skip over calls to other procedures. CaRet also admits the caller modality that jumps to the most recent pending call, and such caller modalities allow specification of a variety of security properties that involve inspection of the call-stack. Even though verifying context-free properties of pushdown systems is undecidable, we show that model checking CaRet formulas against a pushdown model is decidable. We present a tableau construction that reduces our model checking problem to the emptiness problem for a Buchi pushdown system. The complexity of model checking CaRet formulas is the same as that of checking LTL formulas, namely, polynomial in the model and singly exponential in the size of the specification.

3,516 citations

Book
01 May 2011
TL;DR: The SPIN Model Checker as mentioned in this paper is used for both teaching software verification techniques, and for validating large scale applications, and it has been estimated that up to three-quarters of the $400 billion spent annually to hire programmers in the United States is ultimately spent on debugging.
Abstract: The SPIN Model Checker is used both for teaching software verification techniques and for validating large scale applications. The growing number of users has created a need for a more comprehensive user guide and a standard reference manual that describes the most recent version of the tool. This book fills that need. SPIN is used in over 40 countries. The official SPIN web site, spinroot.com, receives between 2500 and 3000 hits per day. It has been estimated that up to three-quarters of the $400 billion spent annually to hire programmers in the United States is ultimately spent on debugging.

2,530 citations

Book
01 Jan 1997
TL;DR: The goal of this book is to provide a textbook which presents the basics of tree automata and several variants of tree automata which have been devised for applications in the aforementioned domains.
Abstract: Acknowledgments: Many people gave substantial suggestions to improve the contents of this book. These are, in alphabetic order, [...] Introduction: During the past few years, several of us have been asked many times about references on finite tree automata. On one hand, this is the witness of the liveness of this field. On the other hand, it was difficult to answer. Besides several excellent survey chapters on more specific topics, there is only one monograph devoted to tree automata by Gécseg and Steinby. Unfortunately, it is now impossible to find a copy of it and a lot of work has been done on tree automata since the publication of this book. Actually using tree automata has proved to be a powerful approach to simplify and extend previously known results, and also to find new results. For instance recent works use tree automata for application in abstract interpretation using set constraints, rewriting, automated theorem proving and program verification, databases and XML schema languages. Tree automata have been designed a long time ago in the context of circuit verification. Many famous researchers contributed to this school which was headed by A. Church in the late 50's and the early 60's: B. Trakhtenbrot, [...] Many new ideas came out of this program. For instance the connections between automata and logic. Tree automata also appeared first in this framework, following the work of Doner, Thatcher and Wright. In the 70's many new results were established concerning tree automata, which lose a bit their connections with the applications and were studied for their own. In particular, a problem was the very high complexity of decision procedures for the monadic second order logic. Applications of tree automata to program verification revived in the 80's, after the relative failure of automated deduction in this field. It is possible to verify temporal logic formulas (which are particular Monadic Second Order Formulas) on simpler (small) programs.
Automata, and in particular tree automata, also appeared as an approximation of programs on which fully automated tools can be used. New results were obtained connecting properties of programs or type systems or rewrite systems with automata. Our goal is to fill in the existing gap and to provide a textbook which presents the basics of tree automata and several variants of tree automata which have been devised for applications in the aforementioned domains. We shall discuss only finite tree automata, and the …

1,492 citations


"Abstraction-Based Malware Analysis ..." refers to this book as background

  • ...Tree automata and tree transducers are defined as usual [12]....

    [...]

Book ChapterDOI
11 Jan 2004
TL;DR: Given a finite-state abstraction of a sequential program with potentially recursive procedures and input from the environment, this paper checks statically whether there are input sequences that can drive the system into "bad/good" executions.
Abstract: Given a finite-state abstraction of a sequential program with potentially recursive procedures and input from the environment, we wish to check statically whether there are input sequences that can drive the system into “bad/good” executions. Pushdown games have been used in recent years for such analyses and there is by now a very rich literature on the subject. (See, e.g., [BS92,Tho95,Wal96,BEM97,Cac02a,CDT02].)

1,144 citations

Journal Article
TL;DR: Given a finite-state abstraction of a sequential program with potentially recursive procedures and input from the environment, whether there are input sequences that can drive the system into “bad/good” executions is checked.
Abstract: Given a finite-state abstraction of a sequential program with potentially recursive procedures and input from the environment, we wish to check statically whether there are input sequences that can drive the system into bad/good executions. Pushdown games have been used in recent years for such analyses and there is by now a very rich literature on the subject. (See, e.g., [BS92,Tho95,Wal96,BEM97,Cac02a,CDT02].) In this paper we use recursive game graphs to model such interprocedural control flow in an open system. These models are intimately related to pushdown systems and pushdown games, but more directly capture the control flow graphs of recursive programs ([AEY01,BGR01,ATM03b]). We describe alternative algorithms for the well-studied problems of determining both reachability and Büchi winning strategies in such games. Our algorithms are based on solutions to second-order data flow equations, generalizing the Datalog rules used in [AEY01] for analysis of recursive state machines. This offers what we feel is a conceptually simpler view of these well-studied problems and provides another example of the close links between the techniques used in program analysis and those of model checking. There are also some technical advantages to the equational approach. Like the approach of Cachat [Cac02a], our solution avoids the necessarily exponential-space blow-up incurred by Walukiewicz's algorithms for pushdown games. However, unlike [Cac02a], our approach does not rely on a representation of the space of winning configurations of a pushdown graph by (alternating) automata. Only minimal sets of exits that can be forced need to be maintained, and this provides the potential for greater space efficiency. In a sense, our algorithms can be viewed as an automaton-free version of the algorithms of [Cac02a].

1,038 citations

Frequently Asked Questions (12)
Q1. What contributions have the authors mentioned in the paper "Abstraction-based malware analysis using rewriting and model checking" ?

The authors propose a formal approach for the detection of high-level malware behaviors. The authors work on unbounded sets of traces, which makes their technique useful not only for dynamic analysis, considering one trace at a time, but also for static analysis, considering a set of traces inferred from a control flow graph.

CADP features a verification tool that allows on-the-fly model checking of formulas expressed in the MCL language, a fragment of the modal mu-calculus extended with data variables, of which the FOLTL logic used in this paper is a subset.

Underpinned by language theory, term rewriting and first-order temporal logic, the approach makes it possible to determine whether a program exhibits a high-level behavior.

An interesting application of static behavior analysis is the audit of programs in high-level technologies, like mobile applications, browser extensions, web page scripts, .NET or Java programs. 

Since the captured data must not be invalidated before being leaked, the authors define a behavior pattern λinval (x), which represents such an invalidation.
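
The role of the invalidation pattern can be illustrated on a single finite abstracted trace. This is a minimal sketch under stated assumptions, not the authors' detection algorithm (which works by model checking on possibly unbounded sets of traces): the trace is assumed to be a list of `(symbol, data)` pairs, and the symbol names `"capture"` and `"send"` are hypothetical stand-ins for the capture and leak patterns; only `"inval"` corresponds to the λinval (x) pattern named above.

```python
def leaked_data(trace):
    """Return the set of data identifiers captured and later sent
    without an intervening invalidation.

    trace: list of (symbol, data) pairs, where symbol is one of the
    hypothetical abstract symbols "capture", "inval" or "send".
    """
    live = set()     # data captured and not invalidated since
    leaked = set()   # data sent while still live
    for symbol, x in trace:
        if symbol == "capture":
            live.add(x)
        elif symbol == "inval":
            # An invalidation of x cancels any earlier capture of x.
            live.discard(x)
        elif symbol == "send" and x in live:
            leaked.add(x)
    return leaked
```

For example, on a trace that captures `a` and `b`, invalidates `a`, and then sends both, only `b` is reported, because the capture of `a` was invalidated before the send.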

The authors describe a functionality by an FOLTL formula, such that traces satisfying this formula are traces carrying out the functionality. 

In general, a behavior is described by a sequence of system calls and recognition uses the formalism of finite state automata [22, 26, 24, 6]. 

To address the general intractability of constructing the normal form trace set for a given program, the authors have identified a property of practical high-level behaviors that avoids computing normal forms and yields a linear-time detection algorithm.

The ping behavior pattern in Example 1 is abstracted in traces by inserting the λping symbol after the send action or after the IcmpSendEcho action. 
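
The insertion step described above can be sketched on a finite trace of action names. This is only an illustration under simplifying assumptions: a trace is taken to be a flat list of strings, the function name `abstract_ping` is hypothetical, and the sketch ignores action arguments and any further conditions the pattern may place on the surrounding trace. The paper itself defines abstraction through rewriting on terms, which this does not reproduce.

```python
# Actions after which the abstract symbol is inserted (from the text above).
PING_TRIGGERS = {"send", "IcmpSendEcho"}
LAMBDA_PING = "λping"

def abstract_ping(trace):
    """Return a copy of the trace with λping inserted after each
    occurrence of a triggering action; the original actions are kept."""
    abstracted = []
    for action in trace:
        abstracted.append(action)
        if action in PING_TRIGGERS:
            abstracted.append(LAMBDA_PING)
    return abstracted
```

Both realizations of the behavior end up carrying the same abstract symbol, so a signature written in terms of λping does not depend on which call the program used.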

Their abstraction framework can be used in two scenarios:
– Detection of given behaviors: signatures of given high-level behaviors are expressed in terms of abstract functionalities.

The authors show that this is sufficient, with termination of the set of rules, to ensure that the abstraction relation is realizable by a tree transducer, in other words that it is a rational tree transduction. 

This has motivated yet another approach where a malicious behavior is specified as a combination of high-level actions, in order to be independent from the way these actions are realized and to only consider their effect on a system.