scispace - formally typeset
Search or ask a question

Showing papers in "arXiv: Learning in 2001"


Posted Content
TL;DR: This article studied the problem of combining the outcomes of several different classifiers in a way that provides a coherent inference that satisfies some constraints, and developed two general approaches for an important subproblem-identifying phrase structure.
Abstract: We study the problem of combining the outcomes of several different classifiers in a way that provides a coherent inference that satisfies some constraints. In particular, we develop two general approaches for an important subproblem-identifying phrase structure. The first is a Markovian approach that extends standard HMMs to allow the use of a rich observation structure and of general classifiers to model state-observation dependencies. The second is an extension of constraint satisfaction formalisms. We develop efficient combination algorithms under both models and study them experimentally in the context of shallow parsing.

204 citations


Journal Article
TL;DR: This thesis introduces a new unsupervised learning framework, called Alignment-Based Learning, which is based on the alignment of sentences and Harris's (1951) notion of substitutability, and can be applied to an untagged, unstructured corpus of natural language sentences, resulting in a labelled, bracketed version of that corpus.
Abstract: refined and abstract meanings largely grow out of more concrete meanings. Bloomfield (1933) This thesis introduces a new unsupervised learning framework, called Alignment-Based Learning, which is based on the alignment of sentences and Harris's (1951) notion of substitutability . Instances of the framework can be applied to an untagged, unstructured corpus of natural language sentences, resulting in a labelled, bracketed version of that corpus. Firstly, the framework aligns all sentences in the corpus in pairs, resulting in a partition of the sentences consisting of parts of the sentences that are equal in both sentences and parts that are unequal. Unequal parts of sen tences can be seen as being substitutable for each other, since substituting one unequal part for the other results in another valid sentence. The unequal parts of the sentences are thus considered to be possible (possibly overlapping) constituents, called hypotheses. Secondly , the selection learning phase considers all hypotheses found by the alignment learning phase and selects the best of these. The hypotheses are selected based on the order in which they were found, or based on a probabilistic function. The framework can be extended with a grammar extraction phase. This extended framework is called parseABL. Instead of returning a structured version of the unstructured input corpus, like the ABL system, this system also returns a stochastic context-free or tree substitution grammar. Different instances of the framework have been tested on the English ATIS corpus, the Dutch OVIS corpus and the Wall Street Journal corpus. One of the interesting results, apart from the encouraging numerical results, is that all instances can (and do) learn recursive structures.

99 citations


Posted Content
TL;DR: In this paper, a new paradigm and computational framework for identification of correspondences between sub-structures of distinct composite systems is proposed, termed coupled clustering, which simultaneously identifies corresponding clusters within two data sets.
Abstract: This paper proposes a new paradigm and computational framework for identification of correspondences between sub-structures of distinct composite systems. For this, we define and investigate a variant of traditional data clustering, termed coupled clustering, which simultaneously identifies corresponding clusters within two data sets. The presented method is demonstrated and evaluated for detecting topical correspondences in textual corpora.

37 citations


Posted Content
TL;DR: In this article, it is shown that the average number of prediction errors made by the universal universal prediction scheme rapidly converges to those made by a best possible informed computation scheme in a conditional mean squared sense.
Abstract: Solomonoff's uncomputable universal prediction scheme $\xi$ allows to predict the next symbol $x_k$ of a sequence $x_1...x_{k-1}$ for any Turing computable, but otherwise unknown, probabilistic environment $\mu$. This scheme will be generalized to arbitrary environmental classes, which, among others, allows the construction of computable universal prediction schemes $\xi$. Convergence of $\xi$ to $\mu$ in a conditional mean squared sense and with $\mu$ probability 1 is proven. It is shown that the average number of prediction errors made by the universal $\xi$ scheme rapidly converges to those made by the best possible informed $\mu$ scheme. The schemes, theorems and proofs are given for general finite alphabet, which results in additional complications as compared to the binary case. Several extensions of the presented theory and results are outlined. They include general loss functions and bounds, games of chance, infinite alphabet, partial and delayed prediction, classification, and more active systems.

28 citations


Posted Content
TL;DR: In this article, the authors develop value estimators that utilize data gathered when using one policy to estimate the value of using another policy, resulting in much more data-efficient algorithms, and give PAC-style bounds.
Abstract: Reinforcement learning means finding the optimal course of action in Markovian environments without knowledge of the environment's dynamics. Stochastic optimization algorithms used in the field rely on estimates of the value of a policy. Typically, the value of a policy is estimated from results of simulating that very policy in the environment. This approach requires a large amount of simulation as different points in the policy space are considered. In this paper, we develop value estimators that utilize data gathered when using one policy to estimate the value of using another policy, resulting in much more data-efficient algorithms. We consider the question of accumulating a sufficient experience and give PAC-style bounds.

19 citations


Posted Content
TL;DR: This paper explores a {\it stigmergic} approach, in which the agent's actions include the ability to set and clear bits in an external memory, and the external memory is included as part of the input to the agent.
Abstract: In order for an agent to perform well in partially observable domains, it is usually necessary for actions to depend on the history of observations. In this paper, we explore a {\it stigmergic} approach, in which the agent's actions include the ability to set and clear bits in an external memory, and the external memory is included as part of the input to the agent. In this case, we need to learn a reactive policy in a highly non-Markovian domain. We explore two algorithms: SARSA(\lambda), which has had empirical success in partially observable domains, and VAPS, a new algorithm due to Baird and Moore, with convergence guarantees in partially observable domains. We compare the performance of these two algorithms on benchmark problems.

5 citations


Posted Content
TL;DR: In this paper, the authors show that the computational overhead of cross-validation can be reduced significantly by integrating the crossvalidation with the normal decision tree induction process, and provide an analysis of the speedups these adaptations may yield.
Abstract: Cross-validation is a useful and generally applicable technique often employed in machine learning, including decision tree induction. An important disadvantage of straightforward implementation of the technique is its computational overhead. In this paper we show that, for decision trees, the computational overhead of cross-validation can be reduced significantly by integrating the cross-validation with the normal decision tree induction process. We discuss how existing decision tree algorithms can be adapted to this aim, and provide an analysis of the speedups these adaptations may yield. The analysis is supported by experimental results.

5 citations


Posted Content
TL;DR: It is shown that the batch learning algorithm is never worse than the memoryless learning algorithm (at least asymptotically), and its performance vis-a-vis learning with full memory is less clearcut, and depends on certain Probabilistic assumptions.
Abstract: We study the convergence speed of the batch learning algorithm, and compare its speed to that of the memoryless learning algorithm and of learning with memory (as analyzed in joint work with N. Komarova). We obtain precise results and show in particular that the batch learning algorithm is never worse than the memoryless learning algorithm (at least asymptotically). Its performance vis-a-vis learning with full memory is less clearcut, and depends on certainprobabilistic assumptions. These results necessitate theintroduction of the moment zeta function of a probability distribution and the study of some of its properties.

4 citations


Posted Content
TL;DR: A new type of unsupervised learning algorithm, based on the alignment of sentences and Harris’s (1951) notion of interchangeability is introduced, which results in a labelled, bracketed version of the corpus of natural language sentences.
Abstract: This paper introduces a new type of unsupervised learning algorithm, based on the alignment of sentences and Harris's (1951) notion of interchangeability. The algorithm is applied to an untagged, unstructured corpus of natural language sentences, resulting in a labelled, bracketed version of the corpus. Firstly, the algorithm aligns all sentences in the corpus in pairs, resulting in a partition of the sentences consisting of parts of the sentences that are similar in both sentences and parts that are dissimilar. This information is used to find (possibly overlapping) constituents. Next, the algorithm selects (non-overlapping) constituents. Several instances of the algorithm are applied to the ATIS corpus (Marcus et al., 1993) and the OVIS (Openbaar Vervoer Informatie Systeem (OVIS) stands for Public Transport Information System.) corpus (Bonnema et al., 1997). Apart from the promising numerical results, the most striking result is that even the simplest algorithm based on alignment learns recursion.

4 citations


Posted Content
TL;DR: The authors introduced a new type of grammar learning algorithm, inspired by string edit distance (Wagner and Fischer, 1974), which works on pairs of unstructured sentences that have one or more words in common.
Abstract: This paper introduces a new type of grammar learning algorithm, inspired by string edit distance (Wagner and Fischer, 1974). The algorithm takes a corpus of flat sentences as input and returns a corpus of labelled, bracketed sentences. The method works on pairs of unstructured sentences that have one or more words in common. When two sentences are divided into parts that are the same in both sentences and parts that are different, this information is used to find parts that are interchangeable. These parts are taken as possible constituents of the same type. After this alignment learning step, the selection learning step selects the most probable constituents from all possible constituents. This method was used to bootstrap structure on the ATIS corpus (Marcus et al., 1993) and on the OVIS (Openbaar Vervoer Informatie Systeem (OVIS) stands for Public Transport Information System.) corpus (Bonnema et al., 1997). While the results are encouraging (we obtained up to 89.25 % non-crossing brackets precision), this paper will point out some of the shortcomings of our approach and will suggest possible solutions.

3 citations


Posted Content
TL;DR: A new similarity-based learning algorithm, inspired by string edit-distance, is applied to the problem of bootstrapping structure from scratch and compared the resulting structured sentences to the structured sentences in the ATIS corpus.
Abstract: In this paper a new similarity-based learning algorithm, inspired by string edit-distance (Wagner and Fischer, 1974), is applied to the problem of bootstrapping structure from scratch. The algorithm takes a corpus of unannotated sentences as input and returns a corpus of bracketed sentences. The method works on pairs of unstructured sentences or sentences partially bracketed by the algorithm that have one or more words in common. It finds parts of sentences that are interchangeable (i.e. the parts of the sentences that are different in both sentences). These parts are taken as possible constituents of the same type. While this corresponds to the basic bootstrapping step of the algorithm, further structure may be learned from comparison with other (similar) sentences. We used this method for bootstrapping structure from the flat sentences of the Penn Treebank ATIS corpus, and compared the resulting structured sentences to the structured sentences in the ATIS corpus. Similarly, the algorithm was tested on the OVIS corpus. We obtained 86.04 % non-crossing brackets precision on the ATIS corpus and 89.39 % non-crossing brackets precision on the OVIS corpus.