Formal Verification of Smart Contracts: Short Paper

doi:10.1145/2993600.2993611

Short Paper: Formal Verification of Smart Contracts

Karthikeyan Bhargavan², Antoine Delignat-Lavaud¹, Cédric Fournet¹, Anitha Gollamudi³, Georges Gonthier¹, Nadim Kobeissi², Aseem Rastogi¹, Thomas Sibut-Pinote², Nikhil Swamy¹, Santiago Zanella-Béguelin¹

¹ Microsoft Research, ² Inria, ³ Harvard University

{antdl,fournet,gonthier,aseemr,nswamy,santiago}@microsoft.com

{karthikeyan.bhargavan,nadim.kobeissi,thomas.sibut-pinote}@inria.fr

agollamudi@g.harvard.edu

Abstract

Ethereum is a cryptocurrency framework that uses blockchain technology to provide an open distributed computing platform, called the Ethereum Virtual Machine (EVM). EVM programs are written in bytecode which operates on a simple stack machine. Programmers do not usually write EVM code; instead, they can program in a JavaScript-like language called Solidity that compiles to bytecode. Since the main application of EVM programs is as smart contracts that manage and transfer digital assets, security is of paramount importance. However, writing trustworthy smart contracts can be extremely difficult due to the intricate semantics of EVM and its openness: both programs and pseudonymous users can call into the public methods of other programs.

This problem is best illustrated by the recent attack on TheDAO contract, which allowed roughly $50M USD worth of Ether to be transferred into the control of an attacker. Recovering the funds required a hard fork of the blockchain, contrary to the "code is law" premise of the system. In this paper, we outline a framework to analyze and verify both the runtime safety and the functional correctness of Solidity contracts in F*, a functional programming language aimed at program verification.

Categories and Subject Descriptors F.3 [F.3.1 Specifying and Verifying and Reasoning about Programs]

Keywords Ethereum, Solidity, EVM, smart contracts

1. Introduction

Blockchain technology, pioneered by Bitcoin [7], provides a globally-consistent append-only ledger that does not rely on a central trusted authority. In Bitcoin, this ledger records transactions of a virtual currency, which is created by a process called mining. In the proof-of-work mining scheme, each node of the network can earn the right to append the next block of transactions to the ledger by finding a formatted value (which includes all transactions to appear in the block) whose SHA256 digest is below some difficulty threshold. The system is designed to ensure that blocks are mined at a constant rate: when too many blocks are submitted too quickly, the difficulty increases, thus raising the computational cost of mining.
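The search for a value whose digest falls below a threshold can be sketched in a few lines of Python. This is only an illustration of the idea: the block encoding (transaction bytes followed by an 8-byte nonce), the single SHA-256 pass, and the function name mine are assumptions made for the example and do not reproduce Bitcoin's actual block format.

```python
import hashlib

def mine(block_data: bytes, target: int, max_nonce: int = 1_000_000):
    """Search for a nonce such that SHA-256(block_data || nonce) < target.

    Hypothetical encoding: the nonce is appended as 8 big-endian bytes.
    Returns (nonce, hex digest), or None if no nonce was found in range.
    """
    for nonce in range(max_nonce):
        candidate = block_data + nonce.to_bytes(8, "big")
        digest = hashlib.sha256(candidate).digest()
        # Interpret the digest as a 256-bit integer and compare to the target.
        if int.from_bytes(digest, "big") < target:
            return nonce, digest.hex()
    return None

# A lower target means fewer acceptable digests, hence more expected work.
EASY_TARGET = 2**248   # roughly 1 digest in 256 qualifies
found = mine(b"tx1;tx2;tx3", EASY_TARGET)
```

Lowering the target (for example to 2**240) makes acceptable digests rarer, which is how a difficulty increase raises the expected cost of mining.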

Ethereum is similarly built on a blockchain based on proof-of-work; however, its ledger is considerably more expressive than Bitcoin's: it stores Turing-complete programs in the form of Ethereum Virtual Machine (EVM) bytecode, while transactions are construed as function calls and can carry additional data in the form of arguments. Furthermore, contracts may also use non-volatile storage and log events, both of which are recorded in the ledger.

The initiator of a transaction pays a fee for its execution, measured in units of gas. The miner who manages to append a block including the transaction gets to claim the fee, converted to Ether at a specified gas price. Some operations are more expensive than others: for instance, writing to storage and initiating a transaction are four orders of magnitude more expensive than an arithmetic operation on stack values. Therefore, Ethereum can be thought of as a distributed computing platform where anyone can run code by paying for the associated gas charges.
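As a small worked example of the fee computation, the sketch below multiplies gas used by a gas price. The numeric costs (3 gas for an addition, 20000 gas for writing a fresh storage word, 21000 gas for a transaction) are taken as assumptions from the Ethereum fee schedule in use at the time; the gas price of 20 gwei is purely hypothetical.

```python
WEI_PER_ETHER = 10**18
WEI_PER_GWEI = 10**9

# Assumed per-operation gas costs (not stated in this paper).
GAS_COST = {"ADD": 3, "SSTORE_NEW": 20000, "TRANSACTION": 21000}

def fee_in_wei(gas_used: int, gas_price_wei: int) -> int:
    """Fee claimed by the miner: gas consumed times the offered gas price."""
    return gas_used * gas_price_wei

# A plain transaction at a hypothetical price of 20 gwei per unit of gas.
plain_fee = fee_in_wei(GAS_COST["TRANSACTION"], 20 * WEI_PER_GWEI)

# Relative cost of a storage write compared to a stack addition.
storage_vs_add = GAS_COST["SSTORE_NEW"] // GAS_COST["ADD"]
```

Under these assumed values a storage write costs several thousand times as much as an addition, which is the gap the paragraph above refers to.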

The integrity of the system relies on the honesty of a majority of miners: a miner may try to cheat by not running the program, or running it incorrectly, but honest miners will reject the block and fork the chain. Since the longest chain is the one that is considered valid, miners are incentivized not to cheat and to verify that others do not cheat either, since their block reward may be lost unless malicious miners can supply the majority of new blocks to the network.

As Ethereum's adoption has led to smart contracts managing millions of dollars in currency, the security of these contracts has become highly sensitive. For instance, a variant of a well-documented reentrancy attack was recently exploited in TheDAO [2], a contract that implements a decentralized autonomous venture capital fund, leading to the theft of more than $50M worth of Ether, and raising the question of whether similar bugs could be found by static analysis [6].

In this paper, we outline a framework to analyze and formally verify Ethereum smart contracts using F* [9], a functional programming language aimed at program verification. Such contracts are generally written in Solidity [3], a JavaScript-like language, and compiled down to bytecode for the EVM. We consider the Solidity compiler as untrusted and develop a language-based approach for verifying smart contracts. Namely, we present two tools based on F*:

Solidity*: a tool to translate Solidity programs to shallow-embedded F* programs (Section 2).

EVM*: a decompiler for EVM bytecode that produces equivalent shallow-embedded F* programs that operate on a simpler machine without a stack (Section 3).

These tools enable three different forms of veriﬁcation:

1. Given a Solidity program, we can use Solidity* to translate it to F* and verify at the source level functional correctness specifications such as contract invariants, as well as safety with respect to runtime errors.

2. Given EVM bytecode, we can use EVM* to decompile it and analyze low-level properties, such as bounds on the amount of gas consumed by calls.

3. Given a Solidity program and allegedly functionally equivalent EVM bytecode, we can verify their equivalence by translating each into F*. Thus, we can check the correctness of the output of the Solidity compiler on a case-by-case basis using relational reasoning [1].

1.1 Architecture of the Framework

[Diagram. Labels: Solidity source code, translated by Solidity* into a subset of F* (verified translation); EVM compiled bytecode, decompiled by EVM* into a subset of F* (verified decompilation); F* verifies functional correctness and runtime safety; an equivalence proof relates the two F* programs.]

Figure 1. Overview of the architecture of our framework

Our smart contract verification framework is a two-pronged approach (Figure 1) based on F*. F* comes with a type system that includes dependent types and monadic effects, which we apply to generate automated queries to statically verify properties on EVM bytecode and Solidity sources.

While it is clearly favorable to obtain both the Solidity source code and EVM bytecode of a target smart contract, we design our architecture with the assumption that the verifier may only have the bytecode. At the time of writing, only 396 out of 112,802 contracts have their source code available on http://etherscan.io. Therefore we provide separate tools for decompiling EVM bytecode (EVM*), and analyzing Solidity source code (Solidity*).

⟨solidity⟩ ::= (⟨contract⟩)*
⟨contract⟩ ::= ‘contract’ @identifier ‘{’ (⟨st⟩)* ‘}’
⟨st⟩ ::= ⟨typedef⟩ | ⟨statedef⟩ | ⟨method⟩
⟨typedef⟩ ::= ‘struct’ @identifier ‘{’ (⟨type⟩ @identifier ‘;’)* ‘}’
⟨type⟩ ::= ‘uint’ | ‘address’ | ‘bool’
    | ‘mapping (’ ⟨type⟩ ‘=>’ ⟨type⟩ ‘)’
    | @identifier
⟨statedef⟩ ::= ⟨type⟩ @identifier
⟨method⟩ ::= ‘function’ (@identifier)? ‘()’ (⟨qualifier⟩)* ‘{’
    (‘var’ (@identifier (‘=’ ⟨expression⟩)? ‘,’)+)?
    (⟨statement⟩ ‘;’)* ‘}’
⟨qualifier⟩ ::= ‘private’ | ‘public’ | ‘internal’
    | ‘returns (’ ⟨type⟩ (@identifier)? ‘)’
⟨statement⟩ ::= ε
    | ⟨type⟩ @identifier (‘=’ ⟨expression⟩)?    (*decl*)
    | ‘if(’ ⟨expression⟩ ‘)’ ⟨statement⟩ (‘else’ ⟨statement⟩)?
    | ‘{’ (⟨statement⟩ ‘;’)* ‘}’
    | ‘return’ (⟨expression⟩)?
    | ‘throw’
    | ⟨expression⟩
⟨expression⟩ ::= ⟨literal⟩
    | ⟨lhs expression⟩ ‘(’ (⟨expression⟩ ‘,’)* ‘)’
    | ⟨expression⟩ ⟨binop⟩ ⟨expression⟩
    | ⟨unop⟩ ⟨expression⟩
    | ⟨lhs expression⟩ ‘=’ ⟨expression⟩
    | ⟨lhs expression⟩
⟨lhs expression⟩ ::= @identifier
    | ⟨lhs expression⟩ ‘[’ ⟨lhs expression⟩ ‘]’
    | ⟨lhs expression⟩ ‘.’ @identifier
⟨literal⟩ ::= ⟨function⟩
    | ‘{’ (@identifier ‘:’ ⟨expression⟩ ‘,’)* ‘}’
    | ‘[’ (⟨expression⟩ ‘,’)* ‘]’
    | @number | @address | @boolean
⟨binop⟩ ::= ‘+’ | ‘-’ | ‘*’ | ‘/’ | ‘%’
    | ‘&&’ | ‘||’ | ‘==’ | ‘!=’ | ‘>’ | ‘<’ | ‘>=’ | ‘<=’
⟨unop⟩ ::= ‘+’ | ‘-’ | ‘!’

Figure 2. Syntax of the translated Solidity subset

2. Translating Solidity to F*

In the spirit of previous work on type-based analysis of JavaScript programs [8], we advocate an approach where the programmer can verify high-level goals of a contract using F*. In this section, we present a tool to translate Solidity to F*, and a simple automated analysis of extracted F* contracts.

Solidity programs consist of a number of contract declarations. Once compiled to EVM, contracts are installed using a special kind of account-creating transaction, which allocates an address to the contract. Unlike Bitcoin, where an address is the hash of the public key of an account, Ethereum addresses can refer indistinguishably to a contract or a user public key. Similarly, there is no distinction between transactions and method calls: sending Ether to a contract implicitly calls its fallback function (the unnamed method of the Solidity contract). In fact, compiled contracts in the blockchain consist of a single entry point that decides, depending on the incoming transaction, which method code to invoke. The methods of a Solidity contract have access to ambient global variables that contain information about the contract (such as the balance in this.balance), the transaction used to invoke the contract's method (such as the source address in msg.sender and the amount of ether sent in msg.value), or the block in which the invocation transaction is mined (such as the miner's timestamp in block.timestamp).

In this exploratory work, we consider a restricted subset of Solidity, shown in Figure 2. Notably, the fragment we consider does not include loops. The three main types of declarations within a contract are type declarations, property declarations and methods. Type declarations consist of C-like structs and enums, and mappings (associative arrays implemented as hash tables). Although properties and methods are reminiscent of object-oriented programming, the analogy is somewhat confusing: contracts are "instantiated" by the account-creating transaction, which allocates the properties of the contract in the global storage and calls the constructor (the method with the same name as the contract). Despite the C++/Java-like access modifiers, all properties of a contract are stored in the Ethereum ledger, and as such, the internal state of all contracts is completely public. Methods are compiled in EVM into a single function that runs when a transaction is sent to the contract's address. This transaction handler matches the requested method signature with the list of non-internal methods, and calls the relevant one. If no match is found, a fallback handler is called instead (in Solidity, this is the unnamed method).
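The single-entry-point dispatch described above can be modeled roughly as follows. This is a simplified sketch: real contracts match on a hash of the method signature, whereas the example uses the signature string itself as the key, and the names make_dispatcher and handle are hypothetical.

```python
def make_dispatcher(methods, fallback):
    """Build the single entry point of a contract.

    methods: dict mapping a method signature to a callable
             (only non-internal methods would be listed here).
    fallback: callable run when no signature matches.
    """
    def handle(signature, *args):
        method = methods.get(signature)
        if method is None:
            # No matching method: run the unnamed fallback method.
            return fallback()
        return method(*args)
    return handle

# Toy contract with two public methods and a fallback that records calls.
calls = []
handle = make_dispatcher(
    {
        "Balance()": lambda: 42,
        "Withdraw(uint256)": lambda amount: calls.append(("withdraw", amount)),
    },
    fallback=lambda: calls.append(("fallback",)),
)

balance = handle("Balance()")   # dispatched to the Balance method
handle("Withdraw(uint256)", 7)  # dispatched with one argument
handle("")                      # plain Ether transfer: runs the fallback
```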

2.1 Translation to F*

We perform a shallow translation of Solidity to F* as follows:

1. contracts are translated to F* modules;

2. type declarations are translated to type declarations: enums become sums of nullary data constructors, structs become records, and mappings become F* maps;

3. all contract properties are packaged together within a state record, where each property is a reference;

4. each method gets translated to a function; no defunctionalization is required since Solidity is first-order only;

5. we rewrite if statements that have a continuation, depending on whether one branch ends in return or throw (we then move the continuation into the other branch) or not (we then duplicate the continuation in each branch);

6. to translate assignments, we keep an environment of local, state, and ambient global variable names: local variable declarations and assignments are translated to let bindings; globals are replaced with library calls; state properties are replaced with updates on the state type;

7. built-in method calls (e.g. address.send()) are replaced by library calls.
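The rewriting of if statements in step 5 can be sketched on a toy statement representation. The sketch below is not the authors' implementation (their tools are written in OCaml); the classes Stmt and If and the function push_continuation are hypothetical names, and statements are kept as plain strings for brevity.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Stmt:
    text: str              # atomic statement, e.g. "x = 1", "return x", "throw"

@dataclass(frozen=True)
class If:
    cond: str
    then: tuple            # statements of the then-branch
    orelse: tuple = ()     # statements of the else-branch (possibly empty)

def terminates(block) -> bool:
    """True if the block always ends in return or throw."""
    if not block:
        return False
    last = block[-1]
    if isinstance(last, If):
        return terminates(last.then) and terminates(last.orelse)
    return last.text == "throw" or last.text.startswith("return")

def push_continuation(block) -> tuple:
    """Rewrite a block so that nothing follows an if statement."""
    block = tuple(block)
    for i, s in enumerate(block):
        if isinstance(s, If):
            rest = block[i + 1:]   # the continuation of the if
            # A branch that returns or throws never reaches the continuation;
            # any other branch receives its own copy of it.
            then = s.then if terminates(s.then) else s.then + rest
            orelse = s.orelse if terminates(s.orelse) else s.orelse + rest
            rewritten = If(s.cond, push_continuation(then), push_continuation(orelse))
            return block[:i] + (rewritten,)
    return block

# One branch returns: the continuation moves into the other branch.
early_return = (If("c", (Stmt("return 1"),)), Stmt("x = 2"), Stmt("return x"))
# Neither branch returns: the continuation is duplicated.
no_return = (If("c", (Stmt("a"),), (Stmt("b"),)), Stmt("k"))
```

After this rewriting every if is in tail position, so each branch can be translated directly into a branch of a functional conditional expression.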

We show a minimalistic Solidity contract and its F* translation in Figure 3. The only type annotation added by the translation is a custom Eth effect on the contract's methods, which we describe in Section 2.2. The Solidity library defines the mapping type (a reference to a map) and the associated functions update_map and lookup. Furthermore, it defines the numeric types used in Solidity, which are unsigned 256-bit by default.

2.2 An effect for detecting vulnerable patterns

The example in Figure 3 captures two major pitfalls of Solidity programming. First, many contracts fail to account for the fact that send and its variants are not guaranteed to succeed (send returns a bool). This is highly surprising for Solidity programmers because all other runtime errors (such as running out of gas or call stack overflows) trigger an exception. Such exceptions (including the ones triggered by throw) revert all transactions and all changes to the contract's properties. This is not the case for send: the programmer needs to undo side effects manually when it returns false, e.g. if(!addr.send(x)) throw.

The other problem illustrated in MyBank is reentrancy. Since transactions are also method calls, calling send is a transfer of program control. Consider the following malicious contract:

contract Malicious {
  uint balance;
  MyBank bank = MyBank(0xdeadbeef8badf00d...);

  function Malicious(){
    balance = msg.value;
    bank.Deposit.value(balance)();
    bank.Withdraw.value(0)(balance); // forwarding gas
  }

  function (){ // fallback function
    bank.Withdraw.value(0)(balance);
  }
}

It attacks the Withdraw method of MyBank by calling recursively into it at the point where it does its send. The if condition in the second Withdraw call is still satisfied (because the balances are updated after send, and there is no check that it was successful). Even though the send in the second call to Withdraw is guaranteed to fail (because unlike method calls, send allocates only 2300 gas for the call), it still corrupts the balance by decreasing it twice, causing an unsigned integer underflow. After corrupting the balance, the malicious contract can freely withdraw any remaining funds in the bank.

contract MyBank {
  mapping (address => uint) balances;

  function Deposit() {
    balances[msg.sender] += msg.value;
  }

  function Withdraw(uint amount) {
    if(balances[msg.sender] >= amount) {
      msg.sender.send(amount);
      balances[msg.sender] -= amount;
    }
  }

  function Balance() constant returns(uint) {
    return balances[msg.sender];
  }
}

module MyBank
open Solidity

type state = { balances: mapping address uint; }
val store : state = {balances = ref empty_map}

let deposit () : Eth unit =
  update_map store.balances msg.sender
    (add (lookup store.balances msg.sender) msg.value)

let withdraw (amount:uint) : Eth unit =
  if (ge (lookup store.balances msg.sender) amount) then
    send msg.sender amount;
    update_map store.balances msg.sender
      (sub (lookup store.balances msg.sender) amount)

let balance () : Eth uint =
  lookup store.balances msg.sender

Figure 3. A simple bank contract in Solidity translated to F*
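The reentrancy scenario described above can be replayed in a small Python model. It is a deliberately simplified sketch: gas is not modeled, actual Ether transfers are omitted, and the failure of the nested send is hard-coded through a hypothetical in_send marker rather than derived from the 2300 gas allowance. Only the bank's bookkeeping is simulated.

```python
UINT256 = 2**256   # Solidity's uint arithmetic wraps around modulo 2^256

class MyBank:
    def __init__(self):
        self.balances = {}       # account -> recorded balance
        self.in_send = False     # True while a send's callback is running

    def deposit(self, sender, value):
        self.balances[sender] = (self.balances.get(sender, 0) + value) % UINT256

    def send(self, recipient, amount):
        """Model of send: runs the recipient's fallback, may fail."""
        if self.in_send:
            return False         # assumption: a nested send always fails
        self.in_send = True
        try:
            recipient.fallback() # control is transferred to the recipient
        finally:
            self.in_send = False
        return True

    def withdraw(self, sender, amount):
        if self.balances.get(sender, 0) >= amount:
            self.send(sender, amount)   # result ignored, as in MyBank
            # The balance is updated only after the send returns.
            self.balances[sender] = (self.balances[sender] - amount) % UINT256

class Malicious:
    def __init__(self, bank, value):
        self.bank = bank
        self.balance = value
        bank.deposit(self, value)
        bank.withdraw(self, value)

    def fallback(self):
        # Re-enter Withdraw while the first call has not updated the balance.
        self.bank.withdraw(self, self.balance)

bank = MyBank()
attacker = Malicious(bank, 10)
corrupted = bank.balances[attacker]   # decremented twice: wraps below zero
```

In this model the recorded balance is decremented once by the nested call and once more by the outer call, so it wraps around to a huge value even though only 10 units were deposited.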

Using the effect system of F*, we now show how to detect some vulnerable patterns such as unchecked send results in translated contracts. The base construction is a combined exception and state monad (see [9] for details) with the following signature:

EST (a:Type) = h0:heap             // input heap
  -> send_failed:bool              // send failure flag
  -> Tot (option (a * heap)        // result and new heap, or exception
          * bool)                  // new failure flag

return (a:Type) (x:a) : EST a =
  fun h0 b0 -> Some (x, h0), b0

bind (a:Type) (b:Type) (f:EST a) (g:a -> EST b) : EST b =
  fun h0 b0 ->
    match f h0 b0 with
    | None, b1 -> None, b1            // exception in f: no output heap
    | Some (x, h1), b1 -> g x h1 b1   // run g, carry failure flag
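To make the behavior of return and bind concrete, the same structure can be mimicked in Python, with a computation represented as a function from a heap and a flag to a pair of an optional (result, heap) and a new flag. This is an untyped illustration only; the names unit, throw and mark_send_failed are hypothetical, and Python's None plays the role of the exceptional outcome.

```python
def unit(x):
    """return: yield x, leaving the heap and the failure flag unchanged."""
    return lambda h0, b0: ((x, h0), b0)

def bind(f, g):
    """Run f, then feed its result to g, threading heap and failure flag."""
    def run(h0, b0):
        r, b1 = f(h0, b0)
        if r is None:
            return None, b1      # exception in f: no output heap
        x, h1 = r
        return g(x)(h1, b1)      # run g, carrying the failure flag
    return run

def throw(h0, b0):
    """Exceptional outcome: no result, no heap, flag unchanged."""
    return None, b0

def mark_send_failed(h0, b0):
    """Hypothetical primitive that only raises the failure flag."""
    return ((), h0), True

incremented = bind(unit(1), lambda x: unit(x + 1))({}, False)
aborted = bind(throw, lambda _: unit(0))({}, False)
flagged = bind(mark_send_failed, lambda _: unit("done"))({}, False)
```

The three runs show a normal result, an exception that skips the continuation, and a raised flag that survives the rest of the computation.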

The monad carries a send_failed flag to record whether or not a send() or external call may have failed so far. It is possible to enforce several different styles based on this monad; for instance, one may want to enforce that a contract always throws when a send fails. As an example, we defined the following effect based on EST:

effect Eth (a:Type) = EST a
  (fun b0 -> not b0)                  // Start in non-failure state
  (fun h0 b0 r b1 ->
    // What to do when a send failed
    b1 ==> (match r with
            | None -> True                       // exception
            | Some (_, h1) -> no_mods h0 h1))    // no writes

The standard library then defines the post-condition of throw as fun h0 b0 r b1 -> b0=b1 /\ is_None r and the post-condition of send as fun h0 b0 r b1 -> r == Some (b1, h0).

Simply by typechecking extracted methods in the Eth effect, we can detect dangerous patterns such as the send() followed by an unconditional write to the balances table in MyBank. Note that the safety condition imposed by Eth is not sufficient to prevent reentrancy attacks, as there is no guarantee that the state modifications before and after send preserve the functional invariant of the contract. Therefore, this analysis is useful for detecting dangerous patterns and enforcing a failure handling style, but it doesn't replace a manual F* proof that the contract is correct.
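The condition imposed by the Eth effect can be illustrated dynamically on a toy model: start with the flag cleared, and if the flag is set at the end, require either an exception or an unmodified heap. The sketch below checks single runs rather than proving the property statically as F* does; the helper names and the convention that a failing send returns False are assumptions made for the example.

```python
def failing_send(heap, flag):
    """A send that fails: result False, heap untouched, flag raised."""
    return (False, heap), True

def throw(heap, flag):
    """Exception: no result and no output heap."""
    return None, flag

def write(key, value):
    """Store value under key in a fresh copy of the heap."""
    def run(heap, flag):
        updated = dict(heap)
        updated[key] = value
        return ((), updated), flag
    return run

def then(f, g):
    """Sequence two computations, discarding the result of the first."""
    def run(heap, flag):
        r, flag1 = f(heap, flag)
        if r is None:
            return None, flag1
        return g(r[1], flag1)
    return run

def satisfies_eth(computation, heap) -> bool:
    """Check the Eth post-condition on one run started with a clear flag."""
    r, failed = computation(heap, False)
    if not failed:
        return True                      # no send failed: nothing to check
    return r is None or r[1] == heap     # exception, or no writes

initial = {"alice": 5}
unchecked = then(failing_send, write("alice", 0))   # MyBank-style pattern
checked = then(failing_send, throw)                 # throws when send fails
```

Here the unchecked variant writes to the heap after a failed send and is rejected, while the variant that throws is accepted.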

Evaluation. Despite the limitations of our tool (in particular, it doesn't support many syntactic features of Solidity), we are able to translate and typecheck 46 out of the 396 contracts we collected on https://etherscan.io. Out of these, only a handful are valid in the Eth effect. This is a clear sign that a large-scale analysis of published contracts is likely to uncover widespread vulnerabilities; we leave such analysis to future work.

3. Decompiling EVM Bytecode to F*

In this section we present EVM*, a decompiler for EVM bytecode that we use to analyze contracts for which the Solidity source is unavailable (as is the case for the majority of live contracts in the Ethereum blockchain), as well as low-level properties of contracts. A third use case of the decompiler that we do not further explore in this paper is to use EVM* together with Solidity* to check the equivalence between a Solidity program and the bytecode output by the Solidity compiler, thus ensuring not only that the compiler did not introduce bugs, but also that any properties verified at the source level are preserved. This equivalence proof could be done, for instance, using rF* [1], a version of F* with relational refinement types.


EVM* takes as input the bytecode of a contract as stored in the blockchain and translates it into a representation in F*. The decompiler performs a stack analysis to identify jump destinations in the program and detect stack under- and overflows. The result is an equivalent F* program that, morally, operates on a machine with infinite single-assignment registers which we translate as let bindings.

The EVM is a stack-based machine with a word size of 256 bits [10]. Bytecode programs have access to a word-addressed non-volatile storage modeled as a word array, a word-addressed volatile memory modeled as an array of bytes, and an append-only non-readable event log. The instruction set includes the usual arithmetic and logic operations (e.g. ADD, XOR), stack and memory operations (e.g. PUSH, POP, MSTORE, MLOAD, SSTORE, SLOAD), control flow operations (e.g. JUMP, CALL, RETURN), instructions to inspect the environment and blockchain (e.g. BALANCE, TIMESTAMP), as well as specialized instructions unique to EVM (e.g. SHA3, CREATE, SUICIDE). As a peculiarity, the instruction JUMPDEST is used to mark valid jump destinations in the code section of a contract, but behaves as a NOP at runtime. This is convenient for identifying potential jump destinations during decompilation, as jumping to an invalid address halts execution.

The static analysis done by EVM* marks stack cells as one of three types: 1. Void for initialized cells, 2. Local for results of operations, and 3. Constant for immediate arguments of PUSH operations. The analysis identifies jumpable addresses and blocks, that is, contiguous sections of code starting at a jumpable address and ending in a halting or control flow instruction (we treat branches of conditionals as independent blocks). A block summary consists of the address of its entry point, its final instruction, and a representation of the initial and final stacks summarizing the block's effects on the stack. An entry point may be either the 0 address, an address marked with JUMPDEST, an immediate argument of a PUSH used in a jump, or a fall-through address of a conditional.
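One ingredient of this analysis, collecting the addresses marked with JUMPDEST, can be sketched as a linear scan over the bytecode. The sketch assumes the standard EVM opcode values (0x5B for JUMPDEST, 0x60 to 0x7F for PUSH1 to PUSH32) and must skip the immediate bytes of PUSH instructions, since a data byte equal to 0x5B is not a jump destination. It does not perform the stack analysis or compute block summaries.

```python
JUMPDEST = 0x5B
PUSH1, PUSH32 = 0x60, 0x7F

def jump_destinations(code: bytes) -> list:
    """Return the offsets of all JUMPDEST instructions in the bytecode."""
    destinations = []
    pc = 0
    while pc < len(code):
        opcode = code[pc]
        if opcode == JUMPDEST:
            destinations.append(pc)
        if PUSH1 <= opcode <= PUSH32:
            # PUSHn is followed by n immediate bytes that are data, not code.
            pc += opcode - PUSH1 + 1
        pc += 1
    return destinations

# PUSH1 03; JUMP; JUMPDEST; STOP
simple = jump_destinations(bytes.fromhex("6003565b00"))
# PUSH1 5b; JUMPDEST  (the first 0x5b is an immediate, not an instruction)
with_data = jump_destinations(bytes.fromhex("605b5b"))
```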

As a result of the static analysis, EVM* emits F* code, using variables bound in let bindings instead of stack cells. Many instructions can be eliminated in this way; the analysis keeps an accurate account of the offsets of instructions in the remaining code. Because the instructions eliminated may incur gas charges, we keep track of the fuel consumption by instrumenting the code with calls to burn, a library function whose sole effect is to accumulate gas charges. Figure 4 shows the F* code decompiled from the Balance method of the MyBank contract in Fig. 3.

We wrote a reference cost model for bytecode operations that can be used to prove bounds on the gas consumption of contract methods. As an example, Fig. 5 shows a type annotation for the entry point of the MyBank contract decompiled to F* that proves that a method call to the Balance function will consume at most 390 units of gas.
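The way eliminated instructions are charged through burn can be illustrated with a small per-opcode cost table. The costs below (2 gas for CALLER, 3 gas for PUSH1, DUP2, SWAP1, SWAP2, SUB and AND, 10 gas for JUMPI) are assumptions taken from the EVM fee schedule of the time, not from the authors' cost model; they happen to reproduce the burn amounts that appear in Figures 4 and 5. The GasMeter class is a hypothetical stand-in for the burn library function.

```python
# Assumed gas cost per opcode (see lead-in).
GAS = {
    "CALLER": 2,
    "PUSH1": 3, "DUP2": 3, "SWAP1": 3, "SWAP2": 3, "SUB": 3, "AND": 3,
    "JUMPI": 10,
}

def burn_amount(opcodes) -> int:
    """Total gas charged for a group of eliminated instructions."""
    return sum(GAS[op] for op in opcodes)

class GasMeter:
    """Accumulates gas charges, like repeated calls to burn."""
    def __init__(self):
        self.gas = 0

    def burn(self, amount: int) -> None:
        self.gas += amount

    def within(self, bound: int) -> bool:
        return self.gas <= bound

meter = GasMeter()
meter.burn(burn_amount(["SUB", "CALLER", "AND", "PUSH1", "SWAP1", "DUP2"]))
meter.burn(burn_amount(["PUSH1", "DUP2", "DUP2"]))
meter.burn(burn_amount(["PUSH1", "SWAP1", "SWAP2"]))
```

This only accounts for the eliminated instructions; operations that remain in the decompiled code, such as mstore, sha3 and sload, would charge their own cost inside the library.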

let x_29 = pow [0x02uy] [0xA0uy] in
let x_30 = sub x_29 [0x01uy] in
let x_31 = get_caller () in
let x_32 = land x_31 x_30 in
burn 17 (* opcodes: SUB, CALLER, AND, PUSH1 00, SWAP1, DUP2 *);
mstore [0x00uy] x_32;
burn 9 (* opcodes: PUSH1 20, DUP2, DUP2 *);
mstore [0x20uy] [0x00uy];
burn 9 (* opcodes: PUSH1 40, SWAP1, SWAP2 *);
let x_33 = sha3 [0x00uy] [0x40uy] in
let x_34 = sload x_33 in
burn 9 (* opcodes: PUSH1 60, SWAP1, DUP2 *);
mstore [0x60uy] x_34;
loadLocal [0x60uy] [0x20uy] (* returned value *)

Figure 4. Decompiled version of the Balance method of the MyBank contract, instrumented with gas consumption.

val myBank: unit -> ST word
  (requires (fun h -> sel h mem = 0 /\ sel h gas = 0 /\
    nonZero (eqw
      (div (get_calldataload [0x00uy]) (pow [0x02uy] [0xE0uy]))
      [0xF8uy; 0xF8uy; 0xA9uy; 0x12uy])))  // hash of Balance method
  (ensures (fun h0 h1 -> sel h1 gas <= 390))
let myBank () =
  burn 6 (* opcodes: PUSH1 60, PUSH1 40 *);
  mstore [0x40uy] [0x60uy];
  ...
  let x_28 = eqw [0xF8uy; 0xF8uy; 0xA9uy; 0x12uy] x_3 in
  burn 10 (* opcode JUMPI *);
  if nonZero x_28 then
    begin (* offset: 165 *)
      // decompiled code of Balance method
    end

Figure 5. A proof of a bound on the gas consumed by a call to the Balance method of MyBank.

4. Conclusion

Our preliminary experiments in using F* to verify smart contracts show that the type and effect system of F* is flexible enough to express and prove non-trivial properties. In parallel, Luu et al. [6] used symbolic execution to detect flaws in EVM bytecode programs, and an experimental Why3 [5] formal verification backend is now available from the Solidity web IDE [4].

The examples we considered are simple enough that we did not have to write a full implementation of EVM bytecode. We plan to complete a verified reference implementation and use it to verify that the output of the Solidity compiler is functionally equivalent to the sources.

We implemented EVM* and Solidity* in OCaml. It would be interesting to implement and verify parts of these tools using F* instead. For instance, we could prove that the stack and control flow analysis done in EVM* is sound with respect to a stack machine semantics.

5 2016/8/11
