
A Solution to Wiehagen's Thesis

Timo Kötzing
- 01 Apr 2017
- Vol. 60, Iss. 3, pp. 498-520


A Solution to Wiehagen’s Thesis
Timo Kötzing
Friedrich-Schiller-Universität Jena, Jena, Germany
timo.koetzing@uni-jena.de
Abstract
Wiehagen’s Thesis in Inductive Inference (1991) essentially states that, for each learning criterion,
learning can be done in a normalized, enumerative way. The thesis was not a formal statement
and thus did not allow for a formal proof, but support was given by examples of a number of
different learning criteria that can be learned enumeratively.
Building on recent formalizations of learning criteria, we are now able to formalize Wiehagen’s
Thesis. We prove the thesis for a wide range of learning criteria, including many popular criteria
from the literature. We also show the limitations of the thesis by giving four learning criteria for
which the thesis does not hold (and, in two cases, was probably not meant to hold). Beyond the
original formulation of the thesis, we also prove stronger versions which allow for many corollaries
relating to strongly decisive and conservative learning.
1998 ACM Subject Classification I.2.6 Learning
Keywords and phrases Algorithmic Learning Theory, Wiehagen’s Thesis, Enumeration Learning
Digital Object Identifier 10.4230/LIPIcs.STACS.2014.494
1 Introduction
In Gold-style learning [10] (also known as inductive inference) a learner tries to learn an infinite sequence, given more and more finite information about this sequence. For example, a learner h might be presented longer and longer initial segments of the sequence g = 1, 4, 9, 16, .... After each new datum of g, h may output a description of a function (for example, a Turing machine program computing that function) as its conjecture. h might output a program for the constantly-1 function after seeing the first element of this sequence g, and then, as soon as more data is available, a program for the squaring function. Many criteria for saying whether h is successful on g have been proposed in the literature. Gold, in his seminal paper [10], gave a first, simple learning criterion, later called Ex-learning¹, where a learner is successful iff it eventually stops changing its conjectures, and its final conjecture is a correct program (computing the input sequence).

Trivially, each single, describable sequence g has a suitable constant function as an Ex-learner (this learner constantly outputs a description for g). Thus, we are interested in sets of total computable functions S for which there is a single learner h learning each member of S (those sets S are then called Ex-learnable).

Gold [10] showed an important class of sets of functions to be Ex-learnable:² each uniformly computable set of total functions is Ex-learnable; a set of functions S is uniformly computable iff there is a computable function e such that S = {ϕ_{e(n)} | n ∈ N}. The corresponding learner learns by enumeration: in every iteration, it finds the first index n such that ϕ_{e(n)} is consistent with all known data, and outputs e(n) as the conjecture.

∗ We would like to thank Sandra Zilles for bringing Wiehagen's Thesis in connection with the approach of abstractly defining learning criteria, as well as the anonymous reviewers for their friendly and helpful suggestions.
¹ "Ex" stands for explanatory.
² We let N = {0, 1, 2, ...} be the set of natural numbers and we fix a coding for programs based on Turing machines letting, for any program (code) p ∈ N, ϕ_p be the function computed by the Turing machine coded by p.
However, it is well-known that there are sets which are not uniformly computable, yet Ex-learnable. Blum and Blum [6] gave the following example. Let e be a total computable listing of programs such that the predicate ϕ_{e(n)}(x) = y is decidable in n, x and y. Crucially, some of the ϕ_{e(n)} may be undefined on some arguments; these functions are not required to be learned, but the set of all the total functions enumerated is Ex-learnable. This uses the same strategy as for uniformly computable sets of functions, but this learning already goes beyond enumeration of all and only the learned functions, as there are sets which are so learnable, but not uniformly computable. The price is that the learner may give intermediate conjectures e(n) which are programs for partial functions; this is necessarily so, as noted in [9].
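To make the identification-by-enumeration strategy concrete, here is a minimal sketch in Python. It covers only the simple case of an explicitly given, uniformly computable family (standing in for the listing e); it does not model the partial functions of the Blum and Blum example, and all names in it are illustrative rather than taken from the paper.

FAMILY = [
    lambda x: 1,              # stands in for phi_{e(0)}: the constantly-1 function
    lambda x: x,              # stands in for phi_{e(1)}: the identity
    lambda x: (x + 1) ** 2,   # stands in for phi_{e(2)}: the sequence 1, 4, 9, 16, ...
]

def enumeration_learner(segment):
    # Given the initial segment g(0), ..., g(i-1), output the least index n such
    # that phi_{e(n)} is consistent with all known data.
    for n, f in enumerate(FAMILY):
        if all(f(x) == y for x, y in enumerate(segment)):
            return n
    return None  # cannot happen if the learnee belongs to the family

g = [(i + 1) ** 2 for i in range(6)]       # the learnee g = 1, 4, 9, 16, ...
print([enumeration_learner(g[:i]) for i in range(len(g) + 1)])
# prints [0, 0, 2, 2, 2, 2, 2]: after finitely many mind changes the learner
# settles on a correct index and never changes again, which is Ex-learning.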
As already shown by Wiehagen [16], there are Ex-learnable sets of functions that cannot be learned while always having a hypothesis that is consistent with the known data. Thus, the above strategy for learning employed by Blum and Blum [6] is not applicable for all learning tasks. In [17, 18] Wiehagen asked whether there is a more general strategy which also enumerates a list of candidate conjectures and is applicable to all Ex-learnable sets. He showed that this is indeed possible, giving an insightful characterization of Ex-learning.
A main focus of the research in inductive inference defines learning criteria that are different from (but usually similar in flavor to) Ex-learning. For example, consistent learning requires that each conjecture is consistent with the known data; monotone learning requires the sequence of conjectures to be monotone with respect to inclusion of the graphs of the computed functions. Wiehagen also gives characterizations for these learning criteria and more. Other researchers give similar characterizations; recent work in this area includes, for example, [1]. For any learning criterion I we are again interested in sets of total computable functions S for which there is a single learner h which learns every function in S in the sense specified by I; we call such S I-learnable.
Wiehagen was inspired by his work to conjecture a general structure of learning, as stated in his Thesis in Inductive Inference [18], which we rephrase in the language of this paper:

    Let I be any learning criterion. Then for any I-learnable class S, an enumeration of programs e can be constructed such that S is I-learnable with respect to e by an enumerative learner.

Note that [18] called a learning criterion an "inference type" and a learner an "inference strategy". About his thesis, Wiehagen [18] wrote that "We do not exclude that one nice day a formal proof of this thesis will be presented. This would require 'only' to formalize the notions of 'inference type' and 'enumerative inference strategy' which does not seem to be hopeless. But up to this moment we prefer 'verifying' our thesis analogously as it has been done with 'verifying' Church's thesis, namely by formally proving it for 'real', reasonable, distinct inference types."
Recently, the notion of a learning criterion was formalized in [13] (see Section 2.1 for the formal notions relevant to this paper). Our first contribution in this paper is a formalization of "enumeration learner" in Definition 2. It is in the nature of the very general thesis that any formalization may be too broad in some respects and too narrow in others. For example, our formalizations exclude some learning criteria, such as finite learning, learning by non-total learners, and criteria featuring global restrictions on the learner. However, for the scope of our definitions, we already get very strong and insightful results in this paper.
In Theorem 3 we discuss four different learning criteria for which the thesis does not hold.
The first one is prediction, which attaches a totally different meaning to the “conjectures”
than Ex-learning (the thesis was probably never meant to hold for such learning criteria).
The second criterion involves mandatory oscillation between (correct) conjectures, which is in
immediate contradiction to enumerative learning. The third learning criterion is transductive
learning, where the learner has very little information in each iteration. The fourth is
learning in a non-standard hypothesis space. The last two learning criteria do not contradict
enumerative learning directly, but still demand too much for learning by enumeration.
In Section 4 we show that there is a broad core of learning criteria for which Wiehagen’s
Thesis holds. For this we introduce the notion of a pseudo-semantic restriction, where only
the semantics of conjectures and possibly the occurrence of mind changes matter, but not
other parts of their syntax. Theorem 10 shows that Wiehagen’s Thesis holds in the case of
full information learning (like in Ex-learning given above, where the learner only gets more
information in each iteration) when all restrictions are pseudo-semantic, and in Theorem 16
we see that the same holds in the case of iterative learning (a learning model in which a
learner has a restricted memory). Note that these two theorems already cover a very wide
range of learning criteria from the literature, including all given by Wiehagen [18].
Finally, going beyond the scope of Wiehagen's Thesis, we show that we can assume the enumeration e of programs to be semantically 1-1 (each e(n) codes for a different function) if we assume a little bit more about the learning criteria, namely that their restrictions allow for patching and erasing (see Definition 11). This is formally shown in Theorem 13 (for the case of full information learning) and in Theorem 17 (for the case of iterative learning). Example criteria to which these theorems apply include Ex-learning, as well as consistent and monotone learning. Wiehagen [18] already pointed out in special cases that one can get such semantically 1-1 enumerations. From these results on learning with a semantically 1-1 enumeration we can derive corollaries to conclude that the learning criteria to which the theorems apply allow for strongly decisive and conservative learning (see Definition 1); for example, for plain Ex-learning, this proves (a stronger version of) a result from [15] (which showed that Ex-learning can be done decisively). Note that all positive results are sufficient conditions for enumerative learnability; except for the (weak) condition given in Remark 9, we could not find interesting necessary conditions.
The benefits of this work are threefold. First, we address a long-open problem in its
essential parts. Second, we derive results about (strongly) decisive and conservative learning
in many different settings. Finally, we further develop general techniques to derive powerful
theorems applicable to many different learning criteria, thanks to general notions such as
“pseudo-semantic restriction”.
Note that we omit a number of nontrivial proofs due to space constraints.
2 Mathematical Preliminaries
We fix any computable 1-1 and onto pairing function ⟨·, ·⟩ : N × N → N; whenever we consider tuples of natural numbers as input to a function, it is understood that the general coding function ⟨·, ·⟩ is used to code the tuples into a single natural number. We similarly fix a coding for finite sets and sequences, so that we can use those as input as well. We use ∅ to denote the empty sequence; for every non-empty sequence σ, we let σ⁻ denote the sequence derived from σ by dropping the last listed element.
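As an aside, one standard concrete choice for such a pairing function is the Cantor pairing function; the sketch below (Python) is purely illustrative, since the paper only fixes some computable 1-1 and onto coding, not this particular one.

def pair(x, y):
    # Cantor pairing: <x, y> = (x + y)(x + y + 1)/2 + y, a 1-1 and onto map N x N -> N.
    return (x + y) * (x + y + 1) // 2 + y

def unpair(z):
    # Invert the pairing: find the diagonal w with w(w+1)/2 <= z < (w+1)(w+2)/2.
    w = int(((8 * z + 1) ** 0.5 - 1) / 2)
    while (w + 1) * (w + 2) // 2 <= z:
        w += 1
    while w * (w + 1) // 2 > z:
        w -= 1
    y = z - w * (w + 1) // 2
    return w - y, y

assert all(unpair(pair(x, y)) == (x, y) for x in range(50) for y in range(50))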

If a function f is not defined for some argument x, then we denote this fact by f(x)↑, and we say that f on x diverges; the opposite is denoted by f(x)↓, and we say that f on x converges. If f on x converges to p, then we denote this fact by f(x)↓ = p. For any total computable predicate P, we use µx P(x) to denote the minimal x such that P(x) (undefined, if no such x exists). The special symbol ? is used as a possible hypothesis (meaning "no change of hypothesis").
Unintroduced notation for computability theory follows [14]. P and R denote, respectively, the set of all partial computable and the set of all computable functions (mapping N → N). For any function f : N → N and all i, we use f[i] to denote the sequence f(0), ..., f(i - 1) (undefined, if any one of these values is undefined).
We will use a number of basic computability-theoretic results in this paper. First, we fix a padding function, a 1-1 function pad ∈ R such that ∀p, n, x : ϕ_{pad(p,n)}(x) = ϕ_p(x). Intuitively, pad generates infinitely many syntactically different copies of the semantically same program. We require that pad is monotone increasing in both arguments. The S-m-n Theorem states that there is a 1-1 function s ∈ R such that ∀p, n, x : ϕ_{s(p,n)}(x) = ϕ_p(n, x). Intuitively, s-m-n allows for "hard-coding" arguments to a program.
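To convey the intuition behind padding and s-m-n, the following analogy sketch (Python) treats "programs" as source strings that are evaluated with eval; this is only an informal illustration of the two tools, not the formal setting of the paper.

def pad(source, n):
    # Padding analogy: appending n blank lines yields syntactically different
    # program texts that all compute the same function.
    return source + "\n" * n

def s(source_p, n):
    # s-m-n analogy: from a two-argument program p, uniformly build a one-argument
    # program s(p, n) with the first argument hard-coded to n.
    return "lambda x: (" + source_p + ")(" + str(n) + ", x)"

p = "lambda n, x: n + x"
q = s(p, 5)
print(eval(q)(7))                               # 12, illustrating phi_{s(p,n)}(x) = phi_p(n, x)
print(eval(pad(p, 3))(5, 7) == eval(p)(5, 7))   # True: same semantics, different program text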
2.1 Learning Criteria
In this section we formally introduce our setting of learning in the limit and associated learning criteria. We follow [13] in its "building-blocks" approach for defining learning criteria. A learner is a partial computable function from N to N ∪ {?}. A sequence generating operator is a function β taking as arguments a function h (the learner) and a function g (the learnee) and that outputs a function p. We call p the conjecture sequence of h given g. Intuitively, β defines how a learner can interact with a given learnee to produce a sequence of conjectures.
The most important sequence generating operator is G (which stands for "Gold", who first studied it [10]), which gives the learner full information about the learning process so far; this corresponds to the examples of learning criteria given in the introduction. Formally, G is defined such that ∀h, g, i : G(h, g)(i) = h(g[i]).
We define two additional sequence generating operators It (iterative learning, [16]) and Td (transductive learning, [8]) as follows. For all learners h, learnees g and all i,

  It(h, g)(i) = h(∅)³, if i = 0;
                h(It(h, g)(i - 1), i - 1, g(i - 1)), otherwise.

  Td(h, g)(i) = h(∅), if i = 0;
                Td(h, g)(i - 1), else, if h(i - 1, g(i - 1)) = ?;
                h(i - 1, g(i - 1)), otherwise.

³ h(∅) denotes the initial conjecture (based on no data) made by h.
For both of iterative and transductive learning, the learner is presented with a new datum
each turn (argument/value pair from the learnee in complete and argument-increasing order).
Furthermore, in iterative learning, the learner has access to the previous conjecture, but not
so in transductive learning; however, in transductive learning, the learner can implicitly take
over the previous conjecture by outputting ?.
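A small simulation may help to compare the three operators. The sketch below (Python) computes finite prefixes of G(h, g), It(h, g) and Td(h, g); it simplifies the formal setting in that learners are ordinary Python functions, the initial input is None instead of the empty sequence, coded tuples are Python tuples, and ? is a string. All names are illustrative, not from the paper.

def G(h, g, n):
    # Full information: the i-th conjecture is h applied to the segment g[0..i-1].
    return [h(tuple(g[:i])) for i in range(n)]

def It(h, g, n):
    # Iterative: h sees only its previous conjecture, the position and the newest datum.
    seq = []
    for i in range(n):
        if i == 0:
            seq.append(h(None))                       # initial conjecture on no data
        else:
            seq.append(h((seq[-1], i - 1, g[i - 1])))
    return seq

def Td(h, g, n):
    # Transductive: h sees only the position and the newest datum; answering '?'
    # implicitly keeps the previous conjecture.
    seq = []
    for i in range(n):
        if i == 0:
            seq.append(h(None))
        else:
            c = h((i - 1, g[i - 1]))
            seq.append(seq[-1] if c == '?' else c)
    return seq

# Example: an iterative learner that conjectures the maximum datum seen so far.
h = lambda inp: 0 if inp is None else max(inp[0], inp[2])
print(It(h, [1, 4, 9, 16], 5))   # [0, 1, 4, 9, 16]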
Successful learning requires the learner to observe certain restrictions, for example convergence to a correct index. These restrictions are formalized in our next definition. A sequence acceptance criterion is a predicate δ on a learning sequence and a learnee. The most important sequence acceptance criterion is denoted Ex (which stands for "Explanatory"), already studied by Gold [10]. The requirement is that the conjecture sequence converges (in the limit) to a correct hypothesis for the learnee (we met this requirement already in the introduction). Formally, for any programming system⁴ ψ, we define Ex_ψ as a predicate such that

  Ex_ψ = {(p, g) ∈ R² | ∃n₀, q ∀n ≥ n₀ : p(n) = q ∧ ψ_q = g}.
Standardly we use Ex = Ex_ϕ. We will meet many more sequence acceptance criteria below. We combine any two sequence acceptance criteria δ and δ′ by intersecting them; we denote this by juxtaposition (for example, the sequence acceptance criteria given below are meant to be always used together with Ex).
For any set C ⊆ P of possible learners, any sequence generating operator β and any sequence acceptance criterion δ, (C, β, δ) (or, for short, Cβδ) is a learning criterion. A learner h ∈ C Cβδ-learns the set Cβδ(h) = {g ∈ R | δ(β(h, g), g)}. A set S ⊆ R of possible learnees is called Cβδ-learnable iff there is a function h ∈ C which Cβδ-learns all elements of S (possibly more). Abusing notation, we also use Cβδ to denote the set of all Cβδ-learnable sets (learnable by some learner).
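The following sketch (Python, illustrative names only) shows this building-blocks composition on a finite prefix: a sequence generating operator produces the conjecture sequence, and a sequence acceptance predicate judges it against the learnee. The formal definitions quantify over the whole infinite sequence, and conjectures here are Python functions rather than programs, so this is only a finite analogue.

def G(h, g, n):
    # Prefix version of the full-information operator from the sketch above.
    return [h(tuple(g[:i])) for i in range(n)]

def ex_like(p, g_prefix):
    # Toy acceptance predicate in the spirit of Ex, restricted to a prefix: the final
    # conjecture is correct on all data seen so far.  (The formal Ex condition instead
    # requires syntactic convergence on the infinite conjecture sequence.)
    return all(p[-1](x) == y for x, y in enumerate(g_prefix))

def learns_on_prefix(h, g, n, beta=G, delta=ex_like):
    # A learning criterion pairs an operator beta with a predicate delta:
    # h succeeds on g iff delta(beta(h, g), g).
    return delta(beta(h, g, n), g[:n])

# Example: a learner that always conjectures the squaring sequence.
h = lambda seg: (lambda x: (x + 1) ** 2)
print(learns_on_prefix(h, [1, 4, 9, 16], 4))   # True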
Next we define a number of further sequence acceptance criteria which are of interest for
this paper.
▶ Definition 1. With Cons we denote the restriction of consistent learning [4, 6] (being correct on all known data); with Conf the restriction of conformal learning [17] (being correct or divergent on known data); with Conv we denote the restriction of conservative learning [2] (never abandoning a conjecture which is correct on all known data); with Mon we denote the restriction of monotone learning [12] (conjectures make all the outputs that previous conjectures made; monotonicity in the graphs); finally, with PMon we denote the restriction of pseudo-monotone learning [18] (conjectures make all the correct outputs that previous conjectures made). The following definitions formalize these restrictions.
  Conf = {(p, g) ∈ R² | ∀n ∀x < n : ϕ_{p(n)}(x)↓ → ϕ_{p(n)}(x) = g(x)};
  Cons = {(p, g) ∈ R² | ∀n ∀x < n : ϕ_{p(n)}(x) = g(x)};
  Conv = {(p, g) ∈ R² | ∀n : p(n) ≠ p(n + 1) → ∃x < n + 1 : ϕ_{p(n)}(x) ≠ g(x)};
  Mon  = {(p, g) ∈ R² | ∀i ≤ j ∀x : ϕ_{p(i)}(x)↓ → ϕ_{p(j)}(x) = ϕ_{p(i)}(x)};
  PMon = {(p, g) ∈ R² | ∀i ≤ j ∀x : ϕ_{p(i)}(x) = g(x) → ϕ_{p(j)}(x) = ϕ_{p(i)}(x)}.
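To illustrate how such restrictions constrain a conjecture sequence, the following sketch (Python) checks prefix versions of Cons and Conv, with conjectures given as explicit finite partial functions (dicts) so that everything is trivially decidable; in the paper, conjectures are program indices and the predicates range over infinite sequences. All names and the example data are illustrative.

def consistent(p, g):
    # Cons on a prefix: the n-th conjecture agrees with g on all x < n.
    return all(p[n].get(x) == g[x] for n in range(len(p)) for x in range(n))

def conservative(p, g):
    # Conv on a prefix: a mind change at n is only allowed if the n-th conjecture
    # was already wrong (or undefined) on some argument x < n + 1.
    for n in range(len(p) - 1):
        if p[n] != p[n + 1] and all(p[n].get(x) == g[x] for x in range(n + 1)):
            return False
    return True

g = [1, 4, 9, 16]                                                      # known prefix of the learnee
p = [{}, {0: 1}, {0: 1, 1: 4}, {x: (x + 1) ** 2 for x in range(4)}]    # conjecture sequence
print(consistent(p, g), conservative(p, g))                            # True True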
An example of a well-studied learning criterion is RGConsEx, requiring convergence of the learner to a correct conjecture, as well as consistent conjectures along the way.
Furthermore, we are interested in a number of restrictions which disallow certain kinds of returning to abandoned conjectures. We say that a learner exhibits a U-shape when it first outputs a correct conjecture, abandons this, and then returns to a correct conjecture. We distinguish between syntactic U-shapes (returning to the syntactically same conjecture), semantic U-shapes (returning to the semantically same conjecture, after semantically abandoning it; note that we drop the qualifier "semantic" in this case) and strong U-shapes (outputting a semantically same conjecture after syntactically abandoning it; this is called strong, because it leads to the stronger restriction). Forbidding these kinds of U-shapes leads

⁴ We call ψ a programming system iff, for all p, ψ_p is a computable function, and the function mapping any p and x to ψ_p(x) is also (partial) computable.
