Open access book chapter

The Lean Theorem Prover (System Description)

Leonardo de Moura¹, Soonho Kong², Jeremy Avigad², Floris van Doorn², and Jakob von Raumer²*
¹ Microsoft Research
leonardo@microsoft.com
² Carnegie Mellon University
soonhok@cs.cmu.edu, {avigad, fpv, javra}@andrew.cmu.edu
Abstract. Lean is a new open source theorem prover being developed at Microsoft Research and Carnegie Mellon University, with a small trusted kernel based on dependent type theory. It aims to bridge the gap between interactive and automated theorem proving, by situating automated tools and methods in a framework that supports user interaction and the construction of fully specified axiomatic proofs. Lean is an ongoing and long-term effort, but it already provides many useful components, integrated development environments, and a rich API which can be used to embed it into other systems. It is currently being used to formalize category theory, homotopy type theory, and abstract algebra. We describe the project goals, system architecture, and main features, and we discuss applications and continuing work.
1 Introduction
Formal verification involves the use of logical and computational methods to establish claims that are expressed in precise mathematical terms. These can include ordinary mathematical theorems, as well as claims that pieces of hardware or software, network protocols, and mechanical and hybrid systems meet their specifications. In practice, there is not a sharp distinction between verifying a piece of mathematics and verifying the correctness of a system: formal verification requires describing hardware and software systems in mathematical terms, at which point establishing claims as to their correctness becomes a form of theorem proving. Conversely, the proof of a mathematical theorem may require a lengthy computation, in which case verifying the truth of the theorem requires verifying that the computation does what it is supposed to do.
Automated theorem proving focuses on the “finding” aspect, and strives for power and efficiency, often at the expense of guaranteed soundness. Such systems can have bugs, and typically there is little more than the author’s good intentions to guarantee that the results they deliver are correct. In contrast, interactive theorem proving focuses on the verification aspect of theorem proving, requiring that every claim is supported by a proof in a suitable axiomatic foundation. This sets a very high standard: every rule of inference and every step of a calculation has to be justified by appealing to prior definitions and theorems, all the way down to basic axioms and rules. In fact, most such systems provide fully elaborated proof objects that can be communicated to other systems and checked independently. Constructing such proofs typically requires much more input and interaction from users, but it allows us to obtain deeper and more complex proofs.
* Visiting student from Karlsruhe Institute of Technology, sponsored by the Baden-Württemberg-Stipendium.
The Lean Theorem Prover³ aims to bridge the gap between interactive and automated theorem proving, by situating automated tools and methods in a framework that supports user interaction and the construction of fully specified axiomatic proofs. The goal is to support both mathematical reasoning and reasoning about complex systems, and to verify claims in both domains. Lean is released under the Apache 2.0 license, a permissive open source license that permits others to use and extend the code and mathematical libraries freely. At Carnegie Mellon University, Lean is already being used to formalize category theory, homotopy type theory, and abstract algebra. Lean is an ongoing, long-term effort, and much of the potential for automation will be realized only gradually over time.
Lean’s small, trusted kernel is based on dependent type theory, with several configuration options. It can be instantiated with an impredicative sort of propositions, Prop, to provide a version of the Calculus of Inductive Constructions (CIC) [5,6]. Moreover, Prop can be marked proof-irrelevant if desired. Without an impredicative Prop, the kernel implements a version of Martin-Löf type theory [12,23]. In both cases, Lean provides a sequence of non-cumulative type universes, with universe polymorphism.
Lean is meant to be used both as a standalone system and as a software
library. SMT solvers can use the Lean API to create proof terms that can be
independently checked. The API can be used to export Lean proofs to other
systems based on similar foundations (e.g., Coq [3] and Matita [1]). Lean can
also be used as an efficient proof checker, and definitions and theorems can be
checked in parallel using all available cores on the host machine. When used as a
proof assistant, Lean provides a powerful elaborator that can handle higher-order
unification, definitional reductions, coercions, overloading, and type classes, in
an integrated way. Lean allows users to provide definitions and theorems using a
declarative style resembling Mizar [20] and Isabelle/Isar [24]. Lean also provides
tactics as an alternative (more imperative) approach to constructing (proof)
terms as in Coq, HOL-Light [10], Isabelle [17] and PVS [19]. Moreover, the
declarative and tactic styles can be freely mixed together.
Lean includes two libraries of formally verified mathematics and basic data structures. The standard library uses a kernel instantiated with an impredicative and proof-irrelevant Prop. This library supports both constructive and classical users, and the following axioms can optionally be used: propositional completeness, function extensionality, and strong indefinite description. Lean also contains a library tailored for Homotopy Type Theory (HoTT) [23], using a predicative and proof-relevant instantiation of the kernel. Future plans to support HoTT include higher inductive types (HITs) and sorts for fibrant type universes.
³ http://leanprover.github.io
2 The Kernel
Lean’s trusted kernel is implemented in two layers. The first layer contains the
type checker and APIs for creating and manipulating terms, declarations, and
the environment. This layer consists of 6k lines of C++ code. The second layer
provides additional components such as inductive families (700 additional lines
of code). When the kernel is instantiated, one selects which of these components
should be used. We have tried to keep the number of objects manipulated by
the kernel to a minimum: the list consists of terms, universe terms, declarations,
and environments. Identifiers are encoded as hierarchical names [14], i.e. lists of
strings/numbers, such as x.y.1.
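As an illustration of hierarchical names, the following Python sketch treats a name such as x.y.1 as an immutable sequence of string and numeric components. It is not Lean's C++ implementation; the class `Name` and its methods `parse`, `append`, and `prefix` are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple, Union

Component = Union[str, int]


@dataclass(frozen=True)
class Name:
    """Hierarchical name: a sequence of string/number components (illustrative)."""
    components: Tuple[Component, ...] = ()

    @staticmethod
    def parse(text: str) -> "Name":
        # Purely numeric components become ints; all others stay strings.
        parts = text.split(".") if text else []
        return Name(tuple(int(p) if p.isdigit() else p for p in parts))

    def append(self, component: Component) -> "Name":
        # Names are immutable: extending one returns a new name.
        return Name(self.components + (component,))

    def prefix(self) -> "Name":
        # Drop the last component, e.g. x.y.1 -> x.y
        return Name(self.components[:-1])

    def __str__(self) -> str:
        return ".".join(str(c) for c in self.components)


n = Name.parse("x.y.1")
print(n.components)           # ('x', 'y', 1)
print(n.prefix())             # x.y
print(n.prefix().append(2))   # x.y.2
```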
Terms. The term language is a dependent λ-calculus. A term can be a free variable (also called a local constant), a bound variable, a constant (parameterized by universe terms), a function application f t, a lambda abstraction λx : A, t, a function space Πx : A, B, a sort Type u (where u is a universe term), a metavariable, or a macro $m[t_1, \ldots, t_n]$.
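The term grammar maps directly onto an algebraic data type. The Python sketch below shows one possible encoding with dataclasses. The constructor and field names are assumptions, universe terms are simplified to strings, and the helper `has_metavar` is hypothetical. It performs the kind of traversal needed to reject declarations that still contain metavariables.

```python
from dataclasses import dataclass
from typing import Tuple


class Term:
    """Base class for terms of the dependent lambda-calculus (illustrative)."""


@dataclass(frozen=True)
class BVar(Term):      # bound variable, a de Bruijn index
    index: int


@dataclass(frozen=True)
class Local(Term):     # free variable / local constant: unique id plus its type
    uid: str
    type: Term


@dataclass(frozen=True)
class Const(Term):     # reference to a declaration, with universe arguments
    name: str
    levels: Tuple[str, ...] = ()


@dataclass(frozen=True)
class App(Term):       # function application f t
    fn: Term
    arg: Term


@dataclass(frozen=True)
class Lam(Term):       # λx : A, t  (the body refers to x as BVar(0))
    binder_type: Term
    body: Term


@dataclass(frozen=True)
class Pi(Term):        # Πx : A, B
    binder_type: Term
    body: Term


@dataclass(frozen=True)
class Sort(Term):      # Type u (universe term simplified to a string here)
    level: str


@dataclass(frozen=True)
class MVar(Term):      # metavariable ?m with its type
    uid: str
    type: Term


@dataclass(frozen=True)
class Macro(Term):     # macro application m[t1, ..., tn]
    name: str
    args: Tuple[Term, ...]


def has_metavar(t: Term) -> bool:
    """True if t contains a metavariable anywhere, including inside types."""
    if isinstance(t, MVar):
        return True
    if isinstance(t, Local):
        return has_metavar(t.type)
    if isinstance(t, App):
        return has_metavar(t.fn) or has_metavar(t.arg)
    if isinstance(t, (Lam, Pi)):
        return has_metavar(t.binder_type) or has_metavar(t.body)
    if isinstance(t, Macro):
        return any(has_metavar(a) for a in t.args)
    return False  # BVar, Const and Sort contain no metavariables


identity_on_A = Lam(Const("A"), BVar(0))                            # λx : A, x
print(has_metavar(identity_on_A))                                   # False
print(has_metavar(App(identity_on_A, MVar("m", Const("A")))))       # True
```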
Sorts. The sorts Type u are used to encode the infinite sequence of universes $\mathrm{Type}_0, \mathrm{Type}_1, \mathrm{Type}_2, \ldots$. An explicit universe term is of the form $s^k\,z$ (for $k \ge 0$), where z denotes the base universe zero, and s denotes the successor universe operator. We use Type z to represent Prop in kernel instantiations that support it. To support universe polymorphism, we also have universe parameters (identifiers), and the operators $\max\,u_1\,u_2$ and $\mathrm{imax}\,u_1\,u_2$. The universe term $\max\,u_1\,u_2$ denotes the universe that is greater than or equal to both $u_1$ and $u_2$, and is equal to one of them. The universe term $\mathrm{imax}\,u_1\,u_2$ denotes the universe zero if $u_2$ denotes zero, and $\max\,u_1\,u_2$ otherwise. The operator imax is only needed for kernel instantiations that have an impredicative Prop. In these kernels, given $A : \mathrm{Type}\ u_1$ and $B : \mathrm{Type}\ u_2$, the type of Πx : A, B is $\mathrm{Type}\,(\mathrm{imax}\,u_1\,u_2)$. The imax operator makes sure that Πx : A, B is a proposition when B is a proposition.
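To make max and imax concrete, the Python sketch below evaluates closed universe terms (those without universe parameters) to natural numbers, reading z as 0 and s as +1. The representation and the function names are assumptions. The sketch omits universe parameters, which the kernel must treat symbolically.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Zero:
    pass


@dataclass(frozen=True)
class Succ:
    pred: "Level"


@dataclass(frozen=True)
class Max:
    lhs: "Level"
    rhs: "Level"


@dataclass(frozen=True)
class IMax:
    lhs: "Level"
    rhs: "Level"


Level = Union[Zero, Succ, Max, IMax]


def evaluate(l: Level) -> int:
    """Value of a closed universe term, with z = 0 and s = +1."""
    if isinstance(l, Zero):
        return 0
    if isinstance(l, Succ):
        return evaluate(l.pred) + 1
    if isinstance(l, Max):
        return max(evaluate(l.lhs), evaluate(l.rhs))
    if isinstance(l, IMax):
        # imax u1 u2 is zero when u2 is zero, and max u1 u2 otherwise.
        rhs = evaluate(l.rhs)
        return 0 if rhs == 0 else max(evaluate(l.lhs), rhs)
    raise TypeError(f"not a universe term: {l!r}")


def of_nat(k: int) -> Level:
    """The explicit universe term s^k z."""
    l: Level = Zero()
    for _ in range(k):
        l = Succ(l)
    return l


def pi_universe(u1: Level, u2: Level) -> int:
    """Universe of Πx : A, B for A : Type u1 and B : Type u2 (impredicative Prop)."""
    return evaluate(IMax(u1, u2))


print(pi_universe(of_nat(2), of_nat(0)))  # 0: B is a proposition, so the Π is one too
print(pi_universe(of_nat(0), of_nat(1)))  # 1
```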
Free and bound variables. Free variables have a unique identifier and a type,
and bound variables are just a number (a de Bruijn index). By storing the type
with each free variable, we do not need to carry around contexts in the type
checker and normalizer. As described in [14], this representation simplifies the
implementation considerably, and it also minimizes the number of places where
calculations with de Bruijn indices must be performed.
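The representation above is often called "locally nameless". The Python sketch below shows its two central operations under assumed names. `instantiate` replaces a bound variable with a term, such as a fresh free variable when going under a binder. `abstract` turns a free variable back into a bound one. Only these two functions touch de Bruijn indices, and because each free variable records its own type, no context is passed around. All names are illustrative.

```python
from dataclasses import dataclass
import itertools


@dataclass(frozen=True)
class BVar:          # bound variable: just a de Bruijn index
    index: int


@dataclass(frozen=True)
class Free:          # free variable: unique identifier plus its type
    uid: int
    name: str
    type: object


@dataclass(frozen=True)
class Const:
    name: str


@dataclass(frozen=True)
class App:
    fn: object
    arg: object


@dataclass(frozen=True)
class Lam:           # λx : A, body   (x is BVar(0) inside body)
    name: str
    binder_type: object
    body: object


_fresh_ids = itertools.count()


def instantiate(t, value, depth=0):
    """Replace bound variable `depth` by `value` (a closed term, e.g. a Free).

    Assumes t has no loose bound variables other than the one being replaced."""
    if isinstance(t, BVar):
        return value if t.index == depth else t
    if isinstance(t, App):
        return App(instantiate(t.fn, value, depth), instantiate(t.arg, value, depth))
    if isinstance(t, Lam):
        return Lam(t.name, instantiate(t.binder_type, value, depth),
                   instantiate(t.body, value, depth + 1))
    return t  # Free and Const contain no bound variables


def abstract(t, free, depth=0):
    """Inverse of instantiate: turn occurrences of `free` into bound variable `depth`."""
    if t == free:
        return BVar(depth)
    if isinstance(t, App):
        return App(abstract(t.fn, free, depth), abstract(t.arg, free, depth))
    if isinstance(t, Lam):
        return Lam(t.name, abstract(t.binder_type, free, depth),
                   abstract(t.body, free, depth + 1))
    return t


def open_binder(lam):
    """Go under a binder: the fresh free variable carries the binder's type,
    so no separate typing context is needed."""
    x = Free(next(_fresh_ids), lam.name, lam.binder_type)
    return x, instantiate(lam.body, x)


term = Lam("x", Const("A"), App(Const("f"), BVar(0)))   # λx : A, f x
x, body = open_binder(term)
print(x.type)                                          # Const(name='A')
print(Lam("x", x.type, abstract(body, x)) == term)     # True
```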
Metavariables. In Lean, users may provide partial constructions, i.e., constructions containing “holes” that must be filled by the system. These holes (also known as placeholders) are internally represented as metavariables that must be replaced by closed terms that are synthesized by the system. Since only closed terms can be assigned to metavariables, a metavariable that occurs in a context records the parameters it depends on. For example, we encode a hole in the context (x : nat) (y : bool) as ?m x y, where ?m is a fresh metavariable. As with free variables, every metavariable has a type. We also have universe metavariables to represent “holes” in universe terms.
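The ?m x y encoding can be illustrated with the Python sketch below. For readability it uses named variables, which is an assumption, not the kernel's representation. The hole is a metavariable applied to the locals in scope. Because any assignment must be closed, the sketch assigns ?m a lambda abstraction over those parameters, and filling the hole then amounts to a beta reduction. The constant `pair` and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Var:       # local variable from the context, e.g. x or y
    name: str


@dataclass(frozen=True)
class Const:
    name: str


@dataclass(frozen=True)
class MVar:      # metavariable ?m
    name: str


@dataclass(frozen=True)
class App:       # f a1 ... an
    fn: object
    args: Tuple[object, ...]


@dataclass(frozen=True)
class Lam:       # λ params, body
    params: Tuple[str, ...]
    body: object


def free_vars(t) -> set:
    if isinstance(t, Var):
        return {t.name}
    if isinstance(t, App):
        return free_vars(t.fn).union(*(free_vars(a) for a in t.args))
    if isinstance(t, Lam):
        return free_vars(t.body) - set(t.params)
    return set()


def subst(t, env: Dict[str, object]):
    """Simultaneous substitution; assumes no inner binder captures a variable
    occurring in the substituted terms (true in the example below)."""
    if isinstance(t, Var):
        return env.get(t.name, t)
    if isinstance(t, App):
        return App(subst(t.fn, env), tuple(subst(a, env) for a in t.args))
    if isinstance(t, Lam):
        inner = {k: v for k, v in env.items() if k not in t.params}
        return Lam(t.params, subst(t.body, inner))
    return t


def assign(assignment: Dict[str, object], mvar: MVar, value) -> None:
    if free_vars(value):
        raise ValueError("metavariables can only be assigned closed terms")
    assignment[mvar.name] = value


def instantiate_mvars(t, assignment: Dict[str, object]):
    """Replace assigned metavariables; ?m a1 ... an then beta-reduces."""
    if isinstance(t, MVar):
        return assignment.get(t.name, t)
    if isinstance(t, App):
        fn = instantiate_mvars(t.fn, assignment)
        args = tuple(instantiate_mvars(a, assignment) for a in t.args)
        if isinstance(fn, Lam) and len(fn.params) == len(args):
            return subst(fn.body, dict(zip(fn.params, args)))
        return App(fn, args)
    if isinstance(t, Lam):
        return Lam(t.params, instantiate_mvars(t.body, assignment))
    return t


# A hole in the context (x : nat) (y : bool), encoded as ?m x y.
hole = App(MVar("m"), (Var("x"), Var("y")))
assignment: Dict[str, object] = {}
# Suppose the hole should become `pair y x`; the closed solution is λ x y, pair y x.
assign(assignment, MVar("m"), Lam(("x", "y"), App(Const("pair"), (Var("y"), Var("x")))))
print(instantiate_mvars(hole, assignment))
# App(fn=Const(name='pair'), args=(Var(name='y'), Var(name='x')))

try:
    assign(assignment, MVar("n"), Var("x"))   # an open term is rejected
except ValueError as err:
    print(err)   # metavariables can only be assigned closed terms
```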
Macros. Macros, which can be viewed as procedural attachments, provide more efficient ways of storing and working with terms. Each macro must provide two procedures, namely, type inference and macro expansion. The type inference procedure minfer is responsible for computing the type of a macro application $m[t_1, \ldots, t_n]$, and the macro expansion procedure mexpand must expand/eliminate the macro application. The point is that, given a term t of the form $m[t_1, \ldots, t_n]$, minfer(t) may be able to infer the type of mexpand(t) more efficiently than the kernel type checker, and t may be more compact than mexpand(t).
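The trade-off between minfer and mexpand can be illustrated with a hypothetical numeral macro. The Python sketch below is not Lean's macro interface. It assumes a made-up `Numeral` macro and made-up constants `nat`, `zero`, and `succ`. The macro stores a number compactly and knows its type immediately, while its expansion is a unary term that a plain checker must walk node by node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Const:
    name: str


@dataclass(frozen=True)
class App:
    fn: object
    arg: object


NAT, ZERO, SUCC = Const("nat"), Const("zero"), Const("succ")


def kernel_infer(t):
    """Stand-in for the kernel checker on unary numerals: it must walk the
    whole term, returning the type and the number of succ nodes visited."""
    steps = 0
    while isinstance(t, App):
        if t.fn != SUCC:
            raise TypeError("only succ applications are handled in this sketch")
        t, steps = t.arg, steps + 1
    if t != ZERO:
        raise TypeError("not a numeral")
    return NAT, steps


@dataclass(frozen=True)
class Numeral:
    """Hypothetical macro m[n] storing a numeral compactly.
    (Its argument is a Python int rather than a term, to keep the sketch short.)"""
    value: int

    def minfer(self):
        # The type is known without building the expanded term.
        return NAT

    def mexpand(self):
        # Expansion: succ (succ (... zero)) with `value` applications of succ.
        t = ZERO
        for _ in range(self.value):
            t = App(SUCC, t)
        return t


m = Numeral(1000)
ty, steps = kernel_infer(m.mexpand())
print(m.minfer() == ty, steps)   # True 1000
```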
We also use macros to store annotations and hints used by automation such as rewriters and decision procedures. Each macro has a trust level represented by a natural number. When the Lean kernel is initialized, the user must provide a trust level, and the kernel then refuses any term that contains a macro whose trust level is greater than or equal to the one provided. A kernel initialized with trust level zero does not accept any macro, forcing any macro occurring in declarations to be expanded. The idea is that macros are not part of the trusted code base, but users may choose to trust them “most of the time” when formalizing a system and/or theorem. Note that an independent type checker for Lean does not need to implement support for metavariables or macros.
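The trust-level rule amounts to a comparison performed before a term is accepted, plus the option to expand untrusted macros away. The Python sketch below is a hypothetical illustration. The names `Macro`, `accepts`, `expand_untrusted`, and `expand_double`, and the trust values used, are assumptions, not Lean's API.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, Optional, Tuple


@dataclass(frozen=True)
class Const:
    name: str


@dataclass(frozen=True)
class App:
    fn: object
    arg: object


@dataclass(frozen=True)
class Macro:
    name: str
    trust: int                            # natural-number trust level
    args: Tuple[object, ...]
    expand: Optional[Callable] = None     # mexpand: builds an equivalent term


def macros_in(t) -> Iterator[Macro]:
    if isinstance(t, Macro):
        yield t
        for a in t.args:
            yield from macros_in(a)
    elif isinstance(t, App):
        yield from macros_in(t.fn)
        yield from macros_in(t.arg)


def accepts(t, kernel_trust: int) -> bool:
    """Kernel check: refuse t if some macro has trust >= kernel_trust."""
    return all(m.trust < kernel_trust for m in macros_in(t))


def expand_untrusted(t, kernel_trust: int):
    """Expand every macro the kernel would refuse (expansions may contain macros)."""
    if isinstance(t, Macro):
        if t.trust >= kernel_trust:
            if t.expand is None:
                raise ValueError(f"macro {t.name} cannot be expanded")
            return expand_untrusted(t.expand(*t.args), kernel_trust)
        return Macro(t.name, t.trust,
                     tuple(expand_untrusted(a, kernel_trust) for a in t.args), t.expand)
    if isinstance(t, App):
        return App(expand_untrusted(t.fn, kernel_trust), expand_untrusted(t.arg, kernel_trust))
    return t


def expand_double(x):
    # Hypothetical macro double[x] expanding to add x x.
    return App(App(Const("add"), x), x)


term = App(Const("f"), Macro("double", 5, (Const("c"),), expand_double))
print(accepts(term, 10))   # True:  5 < 10
print(accepts(term, 5))    # False: 5 >= 5
print(accepts(term, 0))    # False: trust level zero accepts no macro
print(accepts(expand_untrusted(term, 0), 0))   # True
```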
Environments. An environment stores a sequence of declarations. The kernel
currently supports three different kinds of declarations: axioms, definitions and
inductive families. Each has a unique identifier, and can be parameterized by a
sequence of universe parameters. Every axiom has a type, and every definition
has a type and a value.
A constant in Lean is just a reference to a declaration. The main task of
the kernel is to type check these declarations and reject ill-typed ones. The
kernel does not allow declarations containing metavariables and/or free variables
to be added to an environment. Environments are never destructively updated,
and are implemented using pure red-black trees.
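Because environments are never updated destructively, adding a declaration produces a new environment and leaves the old one intact. A purely functional red-black tree provides exactly this. The Python sketch below uses Okasaki's insertion with rebalancing on tuples. The `Environment` wrapper, its methods, and the use of strings as stand-in declarations are assumptions, not the Lean implementation.

```python
from typing import Any, Optional

# A node is (color, left, key, value, right); None is an empty (black) leaf.
RED, BLACK = "R", "B"


def _balance(color, left, key, value, right):
    """Okasaki's rebalancing: fix a red node with a red child under a black node."""
    if color == BLACK:
        if left and left[0] == RED:
            if left[1] and left[1][0] == RED:
                _, (_, a, xk, xv, b), yk, yv, c = left
                return (RED, (BLACK, a, xk, xv, b), yk, yv, (BLACK, c, key, value, right))
            if left[4] and left[4][0] == RED:
                _, a, xk, xv, (_, b, yk, yv, c) = left
                return (RED, (BLACK, a, xk, xv, b), yk, yv, (BLACK, c, key, value, right))
        if right and right[0] == RED:
            if right[1] and right[1][0] == RED:
                _, (_, b, yk, yv, c), zk, zv, d = right
                return (RED, (BLACK, left, key, value, b), yk, yv, (BLACK, c, zk, zv, d))
            if right[4] and right[4][0] == RED:
                _, b, yk, yv, (_, c, zk, zv, d) = right
                return (RED, (BLACK, left, key, value, b), yk, yv, (BLACK, c, zk, zv, d))
    return (color, left, key, value, right)


def _insert(root, key, value):
    """Return a new tree containing key -> value; `root` itself is never modified."""
    def ins(node):
        if node is None:
            return (RED, None, key, value, None)
        color, left, k, v, right = node
        if key < k:
            return _balance(color, ins(left), k, v, right)
        if key > k:
            return _balance(color, left, k, v, ins(right))
        return (color, left, key, value, right)
    _, left, k, v, right = ins(root)
    return (BLACK, left, k, v, right)   # the root is always black


def _lookup(node, key):
    while node is not None:
        _, left, k, v, right = node
        if key == k:
            return v
        node = left if key < k else right
    return None


class Environment:
    """Immutable environment: add() returns a new environment (hypothetical API)."""

    def __init__(self, root=None, size: int = 0):
        self._root, self._size = root, size

    def add(self, name: str, declaration: Any) -> "Environment":
        if _lookup(self._root, name) is not None:
            raise ValueError(f"declaration {name!r} already exists")
        return Environment(_insert(self._root, name, declaration), self._size + 1)

    def find(self, name: str) -> Optional[Any]:
        return _lookup(self._root, name)

    def __len__(self) -> int:
        return self._size


env0 = Environment()
env1 = env0.add("nat", "inductive family")
env2 = env1.add("nat.add", "definition")
print(len(env0), len(env1), len(env2))              # 0 1 2
print(env1.find("nat.add"), env2.find("nat.add"))   # None definition
```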
Inductive families. Inductive families [8] are simultaneously defined collections of algebraic data structures which can be parameterized over values as well as types. Each inductive family definition produces introduction rules, elimination rules, and computational rules as described in [8]. As in the CIC, the instances of an inductive family can be in Prop, and special rules are used to make sure the eliminator is compatible with proof irrelevance. Finally, when proof irrelevance is enabled in the kernel, axiom K [22] “computes” in Lean (a similar feature is available in Agda [18]). In contrast to Coq, Lean does not have fixpoint expressions, match expressions, or a termination checker in the kernel. Instead, recursive definitions and pattern matching are compiled into eliminators outside of the kernel.
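To show what compiling recursion into eliminators means, consider the natural numbers. Their eliminator takes a value for zero and a step function for succ. A recursive definition such as addition then becomes a single application of the eliminator, with no self-reference. The Python sketch below models this on unary numerals. The names `Zero`, `Succ`, `nat_rec`, `add`, `to_int`, and `of_int` are illustrative assumptions, and Python's own recursion inside `nat_rec` stands in for the kernel's computation rules.

```python
from dataclasses import dataclass
from typing import Callable, TypeVar, Union

C = TypeVar("C")


@dataclass(frozen=True)
class Zero:
    pass


@dataclass(frozen=True)
class Succ:
    pred: "Nat"


Nat = Union[Zero, Succ]


def nat_rec(base: C, step: Callable[[Nat, C], C], n: Nat) -> C:
    """Eliminator for nat, following the computation rules
    nat_rec(base, step, zero)   = base
    nat_rec(base, step, succ k) = step(k, nat_rec(base, step, k))"""
    if isinstance(n, Zero):
        return base
    return step(n.pred, nat_rec(base, step, n.pred))


def add(m: Nat, n: Nat) -> Nat:
    # add m n := nat_rec m (λ k r, succ r) n   -- add never calls itself
    return nat_rec(m, lambda _k, r: Succ(r), n)


def to_int(n: Nat) -> int:
    return nat_rec(0, lambda _k, r: r + 1, n)


def of_int(k: int) -> Nat:
    n: Nat = Zero()
    for _ in range(k):
        n = Succ(n)
    return n


print(to_int(add(of_int(2), of_int(3))))  # 5
```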
The type checker. To minimize the amount of code duplication, the type checker
plays two roles. First, it is used to validate any declaration sent to the kernel
before adding it to an environment. Second, it is used by elaboration procedures
that try to synthesize holes in terms provided by the user. Consequently, the type
checker is capable of processing terms containing metavariables. When a term
contains metavariables, the type checker may produce unification constraints, in
which case the resultant type is correct only if the unification constraints can be
resolved.
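The dual role of the type checker can be illustrated on a deliberately simplified language with only non-dependent function types, a simplification assumed here for brevity. In the Python sketch below, comparing a type that contains a metavariable does not fail. Instead it records a unification constraint, and the inferred type is correct only if those constraints are later solved. When validating a declaration, any pending constraint causes rejection. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class TConst:      # base type such as nat
    name: str


@dataclass(frozen=True)
class Arrow:       # non-dependent function type A → B
    dom: object
    cod: object


@dataclass(frozen=True)
class TMVar:       # type metavariable ?m
    name: str


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class App:
    fn: object
    arg: object


Constraint = Tuple[object, object]


def has_mvar(ty) -> bool:
    if isinstance(ty, TMVar):
        return True
    if isinstance(ty, Arrow):
        return has_mvar(ty.dom) or has_mvar(ty.cod)
    return False


def is_def_eq(a, b, constraints: List[Constraint]) -> bool:
    """Compare types; anything involving a metavariable is postponed as a constraint."""
    if a == b:
        return True
    if isinstance(a, Arrow) and isinstance(b, Arrow):
        return is_def_eq(a.dom, b.dom, constraints) and is_def_eq(a.cod, b.cod, constraints)
    if has_mvar(a) or has_mvar(b):
        constraints.append((a, b))
        return True
    return False


def infer(term, ctx: Dict[str, object], constraints: List[Constraint]):
    if isinstance(term, Var):
        return ctx[term.name]
    if isinstance(term, App):
        fn_ty = infer(term.fn, ctx, constraints)
        arg_ty = infer(term.arg, ctx, constraints)
        if not isinstance(fn_ty, Arrow):
            raise TypeError("function expected")
        if not is_def_eq(fn_ty.dom, arg_ty, constraints):
            raise TypeError(f"type mismatch: {fn_ty.dom} vs {arg_ty}")
        return fn_ty.cod
    raise TypeError(f"unknown term {term!r}")


def check_declaration(term, ctx):
    """Validation role: the term must type check with no pending constraints."""
    constraints: List[Constraint] = []
    ty = infer(term, ctx, constraints)
    if constraints or has_mvar(ty):
        raise TypeError("declarations may not contain metavariables")
    return ty


nat = TConst("nat")
# h stands for a hole whose type is still the metavariable ?m.
ctx = {"succ": Arrow(nat, nat), "zero": nat, "h": TMVar("m")}

# Elaboration role: succ h checks, provided the constraint nat =?= ?m is solved.
cs: List[Constraint] = []
print(infer(App(Var("succ"), Var("h")), ctx, cs))   # TConst(name='nat')
print(cs)                                            # [(TConst(name='nat'), TMVar(name='m'))]

# Validation role: succ zero is accepted outright.
print(check_declaration(App(Var("succ"), Var("zero")), ctx))   # TConst(name='nat')
```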
3 Elaboration
The task of the elaborator is to convert a partially specified expression into a fully
specified, type-correct term. When typing in a term, users can leave arguments
implicit by entering them with an underscore (i.e., a “hole”), leaving it to the
elaborator to infer a suitable value. One can also mark arguments implicit by
putting them in curly brackets when defining a function, to indicate that they
should generally be inferred rather than entered explicitly. For example, the
standard library defines the identity function as:
definition id {A : Type} (a : A) : A := a
As a result, the user can write id a rather than id A a. It is fairly routine to
infer the type A given a : A. Often the elaborator needs to infer an element of
a Π-type, which constitutes a higher-order problem. For example, if e : a = b
is a proof of the equality of two terms of some type A, and H : P is a proof of
some expression involving a, the term subst e H denotes a proof of the result
of replacing some or all of the occurrences of a in P with b. Here not just the type
A is inferred, but also an expression C : A → Prop denoting the context for the
substitution, that is, the expression with the property that C a “reduces” to P.
Such expressions can be ambiguous. For example, if H has type R (f a a) a,
then with subst e H the user may have in mind R (f b b) b or R (f a b)
a among other interpretations, and the elaborator has to rely on context and a
backtracking search to find an interpretation that fits. Similar issues arise with
proofs by induction, which require the system to infer an induction predicate.
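The ambiguity in the subst example can be made concrete by listing the candidate contexts C. Each choice of which occurrences of a to abstract gives a different C with C a equal to P, and therefore a different conclusion C b. The choices include the trivial one that replaces no occurrence. The Python sketch below represents terms as nested tuples (an assumption for brevity) and enumerates every candidate for H : R (f a a) a. The function names are hypothetical. The elaborator described here does not enumerate blindly; it relies on context and backtracking search to select one candidate.

```python
from itertools import product

# Terms: a string is an atom; a tuple (head, arg1, ..., argn) is an application.
P = ("R", ("f", "a", "a"), "a")


def occurrences(t, target, path=()):
    """Paths (argument positions) to every occurrence of `target` in t."""
    if t == target:
        return [path]
    if isinstance(t, tuple):
        return [p for i, sub in enumerate(t[1:], start=1)
                for p in occurrences(sub, target, path + (i,))]
    return []


def replace_at(t, path, new):
    if not path:
        return new
    i = path[0]
    return t[:i] + (replace_at(t[i], path[1:], new),) + t[i + 1:]


def show(t):
    if isinstance(t, tuple):
        return " ".join([t[0]] + [s if isinstance(s, str) else f"({show(s)})" for s in t[1:]])
    return t


def candidate_conclusions(p, a, b):
    """For each subset of occurrences of a, replace exactly those by b."""
    paths = occurrences(p, a)
    results = []
    for mask in product([False, True], repeat=len(paths)):
        t = p
        for chosen, path in zip(mask, paths):
            if chosen:
                t = replace_at(t, path, b)
        results.append(show(t))
    return results


for conclusion in candidate_conclusions(P, "a", "b"):
    print(conclusion)
```

With three occurrences of a, there are 2³ = 8 candidates, ranging from R (f a a) a (nothing replaced) to R (f b b) b. Both interpretations mentioned above are among them.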
The elaborator should also respect the computational interpretation of terms.
It should recognize the equivalence of terms (λx, t)s and t[s/x] under beta
reduction, as well as (s, t).1 and s under the reduction rule for pairs. (Terms
that are equivalent modulo such reductions are said to be definitionally equal.)
Unfolding definitions and reducing projections is especially crucial when working
with algebraic structures, where many basic expressions cannot even be seen to
be type correct without carrying out such reductions.
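One simple way to test definitional equality is to reduce both sides with beta reduction and the pair-projection rule, then compare the results syntactically. The Python sketch below does this for closed terms. It uses named variables and does not reduce under λ, so every substituted argument is closed and variable capture cannot occur. It also omits unfolding of definitions. All names are assumptions, and this is not Lean's algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Const:
    name: str


@dataclass(frozen=True)
class Lam:
    var: str
    body: object


@dataclass(frozen=True)
class App:
    fn: object
    arg: object


@dataclass(frozen=True)
class Pair:
    fst: object
    snd: object


@dataclass(frozen=True)
class Proj:          # t.1 or t.2
    index: int
    pair: object


def subst(t, name, value):
    """t[value/name]; `value` is closed, so no variable capture can occur."""
    if isinstance(t, Var):
        return value if t.name == name else t
    if isinstance(t, Lam):
        return t if t.var == name else Lam(t.var, subst(t.body, name, value))
    if isinstance(t, App):
        return App(subst(t.fn, name, value), subst(t.arg, name, value))
    if isinstance(t, Pair):
        return Pair(subst(t.fst, name, value), subst(t.snd, name, value))
    if isinstance(t, Proj):
        return Proj(t.index, subst(t.pair, name, value))
    return t


def normalize(t):
    """Reduce a closed term by beta and pair projection, without going under λ."""
    if isinstance(t, App):
        fn, arg = normalize(t.fn), normalize(t.arg)
        if isinstance(fn, Lam):                     # (λx, t) s  ~>  t[s/x]
            return normalize(subst(fn.body, fn.var, arg))
        return App(fn, arg)
    if isinstance(t, Proj):
        p = normalize(t.pair)
        if isinstance(p, Pair):                     # (s, t).1 ~> s,  (s, t).2 ~> t
            return p.fst if t.index == 1 else p.snd
        return Proj(t.index, p)
    if isinstance(t, Pair):
        return Pair(normalize(t.fst), normalize(t.snd))
    return t                                        # Var, Const, Lam


def def_eq(a, b) -> bool:
    return normalize(a) == normalize(b)


s, u = Const("s"), Const("u")
print(def_eq(App(Lam("x", Pair(Var("x"), u)), s), Pair(s, u)))   # True
print(def_eq(Proj(1, Pair(s, u)), s))                            # True
print(def_eq(Proj(2, Pair(s, u)), s))                            # False
```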

References
Isabelle/HOL: A Proof Assistant for Higher-Order Logic. Book.
Intuitionistic type theory. Book.
The calculus of constructions. Journal article.
The Coq proof assistant: reference manual, version 6.1.