
# A Solution to Wiehagen's Thesis

01 Apr 2017, Vol. 60, Iss. 3, pp. 498-520

TL;DR: This work proves Wiehagen’s Thesis in Inductive Inference for a wide range of learning criteria, including many popular criteria from the literature, and shows the limitations of the thesis by giving four learning criteria for which the thesis does not hold.

Abstract: Wiehagen's Thesis in Inductive Inference (1991) essentially states that, for each learning criterion, learning can be done in a normalized, enumerative way. The thesis was not a formal statement and thus did not allow for a formal proof, but support was given by examples of a number of different learning criteria that can be learned by enumeration. Building on recent formalizations of learning criteria, we are now able to formalize Wiehagen's Thesis. We prove the thesis for a wide range of learning criteria, including many popular criteria from the literature. We also show the limitations of the thesis by giving four learning criteria for which the thesis does not hold (and, in two cases, was probably not meant to hold). Beyond the original formulation of the thesis, we also prove stronger versions which allow for many corollaries relating to strongly decisive and conservative learning.


## Summary (1 min read)


### Introduction

• In Gold-style learning [10] (also known as inductive inference) a learner tries to learn an infinite sequence, given more and more finite information about this sequence.
• Gold, in his seminal paper [10] , gave a first, simple learning criterion, later called Ex-learning 1 , where a learner is successful iff it eventually stops changing its conjectures, and its final conjecture is a correct program (computing the input sequence).
• In Theorem 3 the authors discuss four different learning criteria in which the thesis does not hold.
• From these results on learning with a semantically 1-1 enumeration the authors can derive corollaries to conclude that the learning criteria, to which the theorems apply, allow for strongly decisive and conservative learning (see Definition 1); for example, for plain Ex-learning, this proves (a stronger version of) a result from [15] (which showed that Ex-learning can be done decisively).

### Definition 1.

• The authors say that a learner exhibits a U-shape when it first outputs a correct conjecture, abandons this, and then returns to a correct conjecture.
• Forbidding these kinds of U-shapes leads to the respective non-U-shapedness restrictions SynNU, NU and SNU.
• If the authors consider forbidding returning to abandoned conjectures more generally, they get three corresponding restrictions of decisiveness.
• Note that the literature knows many more learning criteria than those constructible from the parts given in this section (see the text book [11] or the survey [19] for an overview).

### 3 Learning by Enumeration

• From the wealth of (theoretically possible) learning criteria the authors quickly see that there are learning criteria which do not allow for learning by enumeration.
• With these definitions, the authors get the following theorem.


• The following learning criteria do not allow for learning by enumeration.
• The authors can see the deep power and versatility of Theorem 13 in connection with Remark 3 and the various examples of sequence acceptance criteria fulfilling the prerequisites of Theorem 13, which leads, for example, to the following corollary.
• Then RItδ allows for learning by enumeration.

### Definition 5.

• That is (by taking the contrapositive), different pre-images under e not only give different images, but even semantically different images.
• The authors say that a learning criterion I allows for learning by semantically 1-1 enumeration iff each I-learnable set S is I-learnable by semantically 1-1 enumeration.
• Let h learn by semantically 1-1 enumeration.
• In particular, for any learning criterion I allowing for learning by semantically 1-1 enumeration, every I-learnable set is I-learnable by a strongly decisive learner.


A Solution to Wiehagen’s Thesis
Timo Kötzing
Friedrich-Schiller-Universität Jena, Jena, Germany
timo.koetzing@uni-jena.de
Abstract
Wiehagen's Thesis in Inductive Inference (1991) essentially states that, for each learning criterion, learning can be done in a normalized, enumerative way. The thesis was not a formal statement and thus did not allow for a formal proof, but support was given by examples of a number of different learning criteria that can be learned enumeratively.

Building on recent formalizations of learning criteria, we are now able to formalize Wiehagen's Thesis. We prove the thesis for a wide range of learning criteria, including many popular criteria from the literature. We also show the limitations of the thesis by giving four learning criteria for which the thesis does not hold (and, in two cases, was probably not meant to hold). Beyond the original formulation of the thesis, we also prove stronger versions which allow for many corollaries relating to strongly decisive and conservative learning.
1998 ACM Subject Classification I.2.6 Learning
Keywords and phrases Algorithmic Learning Theory, Wiehagen's Thesis, Enumeration Learning
Digital Object Identifier 10.4230/LIPIcs.STACS.2014.494
1 Introduction

In Gold-style learning [10] (also known as inductive inference) a learner tries to learn an infinite sequence, given more and more finite information about this sequence. For example, a learner h might be presented longer and longer initial segments of the sequence g = 1, 4, 9, 16, .... After each new datum of g, h may output a description of a function (for example a Turing machine program computing that function) as its conjecture. h might output a program for the constantly-1 function after seeing the first element of this sequence g, and then, as soon as more data is available, a program for the squaring function. Many criteria for saying whether h is successful on g have been proposed in the literature. Gold, in his seminal paper [10], gave a first, simple learning criterion, later called Ex-learning,¹ where a learner is successful iff it eventually stops changing its conjectures, and its final conjecture is a correct program (computing the input sequence).

Trivially, each single, describable sequence g has a suitable constant function as an Ex-learner (this learner constantly outputs a description for g). Thus, we are interested in sets of total computable functions S for which there is a single learner h learning each member of S (those sets S are then called Ex-learnable).

Gold [10] showed an important class of sets of functions to be Ex-learnable:² each

∗ We would like to thank Sandra Zilles for bringing Wiehagen's Thesis in connection with the approach of abstractly defining learning criteria, as well as the anonymous reviewers for their friendly and helpful suggestions.
¹ "Ex" stands for explanatory.
² We let N = {0, 1, 2, ...} be the set of natural numbers and we fix a coding for programs based on Turing machines letting, for any program (code) p ∈ N, ϕ_p be the function computed by the Turing machine coded by p.

© Timo Kötzing; licensed under Creative Commons License CC-BY
31st Symposium on Theoretical Aspects of Computer Science (STACS'14).
Editors: Ernst W. Mayr and Natacha Portier; pp. 494-505
Leibniz International Proceedings in Informatics
Schloss Dagstuhl - Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany

uniformly computable set of total functions is Ex-learnable; a set of functions S is uniformly computable iff there is a computable function e such that S = {ϕ_{e(n)} | n ∈ N}. The corresponding learner learns by enumeration: in every iteration, it finds the first index n such that ϕ_{e(n)} is consistent with all known data, and outputs e(n) as the conjecture.
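This enumerative strategy is easy to make concrete. The following is a minimal Python sketch (an illustration, not the paper's formalism), assuming every e(n) is a total Python function so that the consistency check always terminates; for genuinely partial programs the search would have to dovetail.

```python
# Identification by enumeration for a uniformly computable class.
# Assumption: every e(n) is total, so the consistency test halts.

def enumerative_learner(e, data):
    """Return the least index n such that e(n) agrees with all known data,
    where data = [g(0), ..., g(k-1)]."""
    n = 0
    while True:
        if all(e(n)(x) == y for x, y in enumerate(data)):
            return n
        n += 1

# Toy enumeration: e(n) computes x -> (x + 1) ** n, so e(0) is the
# constantly-1 function and e(2) the squaring function from the intro.
e = lambda n: (lambda x: (x + 1) ** n)

# On growing prefixes of g = 1, 4, 9, 16, ... the conjectures are
# 0, 2, 2, 2, ...: one mind change, then convergence in the Ex sense.
g = [(x + 1) ** 2 for x in range(6)]
print([enumerative_learner(e, g[:k]) for k in range(1, 6)])
```

Note how the sketch reproduces the introduction's narrative: the constantly-1 program is conjectured first and abandoned as soon as the datum g(1) = 4 contradicts it.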
However, it is well-known that there are sets which are not uniformly computable, yet Ex-learnable. Blum and Blum [6] gave the following example. Let e be a total computable listing of programs such that the predicate ϕ_{e(n)}(x) = y is decidable in n, x and y. Crucially, some of the ϕ_{e(n)} may be undefined on some arguments; these functions are not required to be learned, but the set of all the total functions enumerated is Ex-learnable. This uses the same strategy as for uniformly computable sets of functions, but this learning already goes beyond enumeration of all and only the learned functions, as there are sets which are so learnable, but not uniformly computable. The price is that the learner may give intermediate conjectures e(n) which are programs for partial functions; this is necessarily so, as noted in [9].
As already shown by Wiehagen [16], there are Ex-learnable sets of functions that cannot be learned while always having a hypothesis that is consistent with the known data. Thus, the above strategy for learning employed by Blum and Blum [6] is not applicable for all learning tasks. In [17, 18] Wiehagen investigated whether there is a more general strategy which also enumerates a list of candidate conjectures and is applicable to all Ex-learnable sets. He showed that this is indeed possible, giving an insightful characterization of Ex-learning.
A main focus of the research in inductive inference defines learning criteria that are different from (but usually similar in flavor to) Ex-learning. For example, consistent learning requires that each conjecture is consistent with the known data; monotone learning requires the sequence of conjectures to be monotone with respect to inclusion of the graphs of the computed functions. Wiehagen also gives characterizations for these learning criteria and more. Other researchers give similar characterizations; recent work in this area includes, for example, [1]. For any learning criterion I we are again interested in sets of total computable functions S for which there is a single learner h which learns every function in S in the sense specified by I; we call such S I-learnable.
Wiehagen was inspired by his work to conjecture a general structure of learning, as stated in his Thesis in Inductive Inference [18], which we rephrase in the language of this paper:

  Let I be any learning criterion. Then for any I-learnable class S, an enumeration of programs e can be constructed such that S is I-learnable with respect to e by an enumerative learner.
Note that [18] called a learning criterion an "inference type" and a learner an "inference strategy". About his thesis, Wiehagen [18] wrote that "We do not exclude that one nice day a formal proof of this thesis will be presented. This would require 'only' to formalize the notions of 'inference type' and 'enumerative inference strategy' which does not seem to be hopeless. But up to this moment we prefer 'verifying' our thesis analogously as it has been done with 'verifying' Church's thesis, namely by formally proving it for 'real', reasonable, distinct inference types."

Recently, the notion of a learning criterion was formalized in [13] (see Section 2.1 for the formal notions relevant to this paper). Our first contribution in this paper is a formalization of "enumeration learner" in Definition 2. It is in the nature of the very general thesis that any formalization may be too broad in some respects and too narrow in others. For example, our formalizations exclude some learning criteria, such as finite learning, learning by non-total

learners, and criteria featuring global restrictions on the learner. However, for the scope of our definitions, we already get very strong and insightful results in this paper.

In Theorem 3 we discuss four different learning criteria in which the thesis does not hold. The first one is prediction, which attaches a totally different meaning to the "conjectures" than Ex-learning (the thesis was probably never meant to hold for such learning criteria). The second criterion involves mandatory oscillation between (correct) conjectures, which is in immediate contradiction to enumerative learning. The third learning criterion is transductive learning, where the learner has very little information in each iteration. The fourth is learning in a non-standard hypothesis space. The last two learning criteria do not contradict enumerative learning directly, but still demand too much for learning by enumeration.

In Section 4 we show that there is a broad core of learning criteria for which Wiehagen's Thesis holds. For this we introduce the notion of a pseudo-semantic restriction, where only the semantics of conjectures and possibly the occurrence of mind changes matter, but not other parts of their syntax. Theorem 10 shows that Wiehagen's Thesis holds in the case of full information learning (like in Ex-learning given above, where the learner only gets more information in each iteration) when all restrictions are pseudo-semantic, and in Theorem 16 we see that the same holds in the case of iterative learning (a learning model in which a learner has a restricted memory). Note that these two theorems already cover a very wide range of learning criteria from the literature, including all given by Wiehagen [18].
Finally, going beyond the scope of Wiehagen's Thesis, we show that we can assume the enumeration e of programs to be semantically 1-1 (each e(n) codes for a different function) if we assume a little bit more about the learning criteria, namely that their restrictions allow for patching and erasing (see Definition 11). This is formally shown in Theorem 13 (for the case of full information learning) and in Theorem 17 (for the case of iterative learning). Example criteria to which these theorems apply include Ex-learning, as well as consistent and monotone learning. Wiehagen [18] already pointed out in special cases that one can get such semantically 1-1 enumerations. From these results on learning with a semantically 1-1 enumeration we can derive corollaries to conclude that the learning criteria, to which the theorems apply, allow for strongly decisive and conservative learning (see Definition 1); for example, for plain Ex-learning, this proves (a stronger version of) a result from [15] (which showed that Ex-learning can be done decisively). Note that all positive results are sufficient conditions for enumerative learnability; except for the (weak) condition given in Remark 9, we could not find interesting necessary conditions.
The benefits of this work are threefold. First, we address a long-open problem in its essential parts. Second, we derive results about (strongly) decisive and conservative learning in many different settings. Finally, we further develop general techniques to derive powerful theorems applicable to many different learning criteria, thanks to general notions such as "pseudo-semantic restriction".

Note that we omit a number of nontrivial proofs due to space constraints.
2 Mathematical Preliminaries

We fix any computable 1-1 and onto pairing function ⟨·, ·⟩ : N × N → N; whenever we consider tuples of natural numbers as input to a function, it is understood that the general coding function ⟨·, ·⟩ is used to code the tuples into a single natural number. We similarly fix a coding for finite sets and sequences, so that we can use those as input as well. We use ∅ to denote the empty sequence; for every non-empty sequence σ we let σ⁻ denote the sequence derived from σ by dropping the last listed element.
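One concrete choice for such a pairing function is the classic Cantor pairing, sketched below; this particular choice is an assumption for illustration only, as the text merely fixes some computable 1-1 and onto pairing.

```python
# Cantor pairing: a computable bijection N x N -> N, one possible
# instantiation of the pairing function <.,.> fixed in the text.

def pair(x, y):
    return (x + y) * (x + y + 1) // 2 + y

def unpair(z):
    # Find the diagonal w = x + y containing z, then read off y and x.
    w = 0
    while (w + 1) * (w + 2) // 2 <= z:
        w += 1
    y = z - w * (w + 1) // 2
    return (w - y, y)

# 1-1 and onto: unpair inverts pair on every tested input.
assert all(unpair(pair(x, y)) == (x, y) for x in range(20) for y in range(20))
```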

If a function f is not defined for some argument x, then we denote this fact by f(x)↑, and we say that f on x diverges; the opposite is denoted by f(x)↓, and we say that f on x converges. If f on x converges to p, then we denote this fact by f(x)↓ = p. For any total computable predicate P, we use μx P(x) to denote the minimal x such that P(x) (undefined, if no such x exists). The special symbol ? is used as a possible hypothesis (meaning "no change of hypothesis").

Unintroduced notation for computability theory follows [14]. P and R denote, respectively, the set of all partial computable and the set of all computable functions (mapping N → N). For any function f : N → N and all i, we use f[i] to denote the sequence (f(0), ..., f(i − 1)) (undefined, if any one of these values is undefined).
We will use a number of basic computability-theoretic results in this paper. First, we fix a padding function, a 1-1 function pad ∈ R such that ∀p, n, x : ϕ_{pad(p,n)}(x) = ϕ_p(x). Intuitively, pad generates infinitely many syntactically different copies of the semantically same program. We require that pad is monotone increasing in both arguments. The S-m-n Theorem states that there is a 1-1 function s ∈ R such that ∀p, n, x : ϕ_{s(p,n)}(x) = ϕ_p(n, x). Intuitively, s-m-n allows for "hard-coding" arguments to a program.
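A minimal sketch of padding and s-m-n, in a toy programming system where a "program" is a Python source string defining a function f; this concrete system is an assumption for illustration, since ϕ in the text is an arbitrary fixed numbering.

```python
# Toy numbering: a "program" is a Python source string defining f;
# phi(p) interprets it. pad and s mimic padding and s-m-n.

def phi(p):
    env = {}
    exec(p, env)
    return env["f"]

def pad(p, n):
    # Append n + 1 comment lines: syntactically new, semantically equal.
    return p + ("\n# padding" * (n + 1))

def s(p, n):
    # s-m-n: hard-code the first argument of the binary program p to n.
    renamed = p.replace("def f(", "def f_orig(")
    return renamed + f"\ndef f(x):\n    return f_orig({n}, x)"

square = "def f(x):\n    return x * x"
assert pad(square, 3) != square and phi(pad(square, 3))(7) == 49

add = "def f(n, x):\n    return n + x"
assert phi(s(add, 5))(10) == 15  # phi_{s(p,5)}(x) = phi_p(5, x)
```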
2.1 Learning Criteria

In this section we formally introduce our setting of learning in the limit and associated learning criteria. We follow [13] in its "building-blocks" approach for defining learning criteria. A learner is a partial computable function from N to N ∪ {?}. A sequence generating operator is a function β taking as arguments a function h (the learner) and a function g (the learnee) and that outputs a function p. We call p the conjecture sequence of h given g. Intuitively, β defines how a learner can interact with a given learnee to produce a sequence of conjectures.

The most important sequence generating operator is G (which stands for "Gold", who first studied it [10]), which gives the learner full information about the learning process so far; this corresponds to the examples of learning criteria given in the introduction. Formally, G is defined such that ∀h, g, i : G(h, g)(i) = h(g[i]).

We define two additional sequence generating operators It (iterative learning, [16]) and Td (transductive learning, [8]) as follows. For all learners h, learnees g and all i,

It(h, g)(i) = h(∅),³ if i = 0; h(It(h, g)(i − 1), i − 1, g(i − 1)), otherwise.

Td(h, g)(i) = h(∅), if i = 0; Td(h, g)(i − 1), else, if h(i − 1, g(i − 1)) = ?; h(i − 1, g(i − 1)), otherwise.

For both of iterative and transductive learning, the learner is presented with a new datum each turn (argument/value pair from the learnee in complete and argument-increasing order). Furthermore, in iterative learning, the learner has access to the previous conjecture, but not so in transductive learning; however, in transductive learning, the learner can implicitly take over the previous conjecture by outputting ?.
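On finite prefixes, the three operators can be sketched directly. The toy Python below is a hedged illustration: learners are total Python functions rather than partial computable ones, the coded tuples of the text are plain Python tuples, and EMPTY stands for the empty sequence.

```python
# G, It and Td evaluated at stage i. The learner h receives: under G the
# whole data prefix g[i]; under It its previous conjecture plus the newest
# datum; under Td only the newest datum, where '?' means "keep the
# previous conjecture".

EMPTY = ()

def G(h, g, i):
    return h(tuple(g(x) for x in range(i)))          # h(g[i])

def It(h, g, i):
    if i == 0:
        return h(EMPTY)
    return h((It(h, g, i - 1), i - 1, g(i - 1)))

def Td(h, g, i):
    if i == 0:
        return h(EMPTY)
    c = h((i - 1, g(i - 1)))
    return Td(h, g, i - 1) if c == "?" else c

g = lambda x: x * x
h_sum = lambda sigma: sum(sigma)                      # a toy G-learner
print(G(h_sum, g, 3))                                 # g[3] = (0, 1, 4)
```

An iterative learner carrying a running sum in its conjecture computes the same values as h_sum under G, illustrating that It-learners can sometimes simulate G-learners by encoding memory into conjectures.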
Successful learning requires the learner to observe certain restrictions, for example
convergence to a correct index. These restrictions are formalized in our next deﬁnition. A
³ h(∅) denotes the initial conjecture (based on no data) made by h.

sequence acceptance criterion is a predicate δ on a learning sequence and a learnee. The most important sequence acceptance criterion is denoted Ex (which stands for "Explanatory"), already studied by Gold [10]. The requirement is that the conjecture sequence converges (in the limit) to a correct hypothesis for the learnee (we met this requirement already in the introduction). Formally, for any programming system⁴ ψ, we define Ex_ψ as a predicate such that

Ex_ψ = {(p, g) ∈ R² | ∃n₀, q ∀n ≥ n₀ : p(n) = q ∧ ψ_q = g}.

Standardly we use Ex = Ex_ϕ. We will meet many more sequence acceptance criteria below. We combine any two sequence acceptance criteria δ and δ′ by intersecting them; we denote this by juxtaposition (for example, the sequence acceptance criteria given below are meant to be always used together with Ex).
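Since Ex quantifies over the limit, it cannot be decided from finitely many conjectures; the following finite-horizon sketch is therefore only an approximation, assuming a toy programming system ψ given as a Python dict of total functions.

```python
# Finite-horizon approximation of Ex: report the last mind-change stage n0
# within the horizon, and whether the final conjecture q satisfies
# psi_q = g on all tested arguments. True Ex additionally requires that no
# mind change occurs after n0 and that psi_q = g everywhere.

def ex_on_prefix(p, g, psi, horizon):
    seq = [p(n) for n in range(horizon)]
    n0 = max((i for i in range(1, horizon) if seq[i] != seq[i - 1]), default=0)
    q = seq[-1]
    correct = all(psi[q](x) == g(x) for x in range(horizon))
    return n0, correct

# Two toy programs: 0 computes constantly 1, 1 computes squaring.
psi = {0: (lambda x: 1), 1: (lambda x: (x + 1) ** 2)}
g = lambda x: (x + 1) ** 2
p = lambda n: 0 if n == 0 else 1   # one mind change, then correct forever
print(ex_on_prefix(p, g, psi, 6))  # -> (1, True)
```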
For any set C ⊆ P of possible learners, any sequence generating operator β and any sequence acceptance criterion δ, (C, β, δ) (or, for short, Cβδ) is a learning criterion. A learner h ∈ C Cβδ-learns the set Cβδ(h) = {g ∈ R | δ(β(h, g), g)}. A set S ⊆ R of possible learnees is called Cβδ-learnable iff there is a function h ∈ C which Cβδ-learns all elements of S (possibly more). Abusing notation, we also use Cβδ to denote the set of all Cβδ-learnable sets (learnable by some learner).

Next we define a number of further sequence acceptance criteria which are of interest for this paper.
▸ Definition 1. With Cons we denote the restriction of consistent learning [4, 6] (being correct on all known data); with Conf the restriction of conformal learning [17] (being correct or divergent on known data); with Conv we denote the restriction of conservative learning [2] (never abandoning a conjecture which is correct on all known data); with Mon we denote the restriction of monotone learning [12] (conjectures make all the outputs that previous conjectures made: monotonicity in the graphs); finally, with PMon we denote the restriction of pseudo-monotone learning [18] (conjectures make all the correct outputs that previous conjectures made). The following definitions formalize these restrictions.

Conf = {(p, g) ∈ R² | ∀n ∀x < n : ϕ_{p(n)}(x)↓ → ϕ_{p(n)}(x) = g(x)};
Cons = {(p, g) ∈ R² | ∀n ∀x < n : ϕ_{p(n)}(x) = g(x)};
Conv = {(p, g) ∈ R² | ∀n : p(n) ≠ p(n + 1) → ∃x < n + 1 : ϕ_{p(n)}(x) ≠ g(x)};
Mon = {(p, g) ∈ R² | ∀i ≤ j ∀x : ϕ_{p(i)}(x)↓ → ϕ_{p(j)}(x) = ϕ_{p(i)}(x)};
PMon = {(p, g) ∈ R² | ∀i ≤ j ∀x : ϕ_{p(i)}(x) = g(x) → ϕ_{p(j)}(x) = ϕ_{p(i)}(x)}.

An example of a well-studied learning criterion is RGConsEx, requiring convergence of the learner to a correct conjecture, as well as consistent conjectures along the way.
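Two of these restrictions, Cons and Conv, can be checked on finite prefixes; the sketch below mirrors the formulas of Definition 1 in a toy system ψ of total Python functions. That totality is an assumption: in the text conjectures may be partial, which is exactly what separates Cons from Conf.

```python
# Finite-prefix checks for two restrictions of Definition 1, in a toy
# programming system psi of total Python functions (an assumption).

def cons_on_prefix(p, g, psi, horizon):
    # Cons: each conjecture p(n) is correct on all data known at stage n.
    return all(psi[p(n)](x) == g(x) for n in range(horizon) for x in range(n))

def conv_on_prefix(p, g, psi, horizon):
    # Conv: a conjecture may only be abandoned if it errs on a known datum
    # (contrapositive of the implication in the formula above).
    return all(
        p(n) == p(n + 1) or any(psi[p(n)](x) != g(x) for x in range(n + 1))
        for n in range(horizon - 1)
    )

psi = {0: (lambda x: 1), 1: (lambda x: (x + 1) ** 2)}
g = lambda x: (x + 1) ** 2
p = lambda n: 0 if n <= 1 else 1   # abandon 0 once it contradicts g(1) = 4
print(cons_on_prefix(p, g, psi, 6), conv_on_prefix(p, g, psi, 6))
```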
Furthermore, we are interested in a number of restrictions which disallow certain kinds of returning to abandoned conjectures. We say that a learner exhibits a U-shape when it first outputs a correct conjecture, abandons this, and then returns to a correct conjecture. We distinguish between syntactic U-shapes (returning to the syntactically same conjecture), semantic U-shapes (returning to the semantically same conjecture, after semantically abandoning it; note that we drop the qualifier "semantic" in this case) and strong U-shapes (outputting a semantically same conjecture after syntactically abandoning it; this is called strong, because it leads to the stronger restriction). Forbidding these kinds of U-shapes leads to the respective non-U-shapedness restrictions SynNU, NU and SNU.

⁴ We call ψ a programming system iff, for all p, ψ_p is a computable function, and the function mapping any p and x to ψ_p(x) is also (partial) computable.


##### References

- D. Angluin. Inductive inference of formal languages from positive data. Information and Control, 45(2):117-135, 1980.
- L. Blum and M. Blum. Toward a mathematical theory of inductive inference. Information and Control, 28(2):125-155, 1975.
- E. M. Gold. Language identification in the limit. Information and Control, 10(5):447-474, 1967.
- R. I. Soare. Recursively Enumerable Sets and Degrees. Springer-Verlag, 1987.
##### Frequently Asked Questions (1)
###### Q1. What are the contributions in "A solution to wiehagen’s thesis∗" ?

The authors prove the thesis for a wide range of learning criteria, including many popular criteria from the literature. The authors also show the limitations of the thesis by giving four learning criteria for which the thesis does not hold ( and, in two cases, was probably not meant to hold ). Beyond the original formulation of the thesis, the authors also prove stronger versions which allow for many corollaries relating to strongly decisive and conservative learning.