
Showing papers on "Active learning (machine learning)" published in 2002


Journal ArticleDOI
01 Mar 2002
TL;DR: This presentation discusses the design and implementation of machine learning algorithms in Java, as well as some of the techniques used to develop and implement these algorithms.
Abstract: 1. What's It All About? 2. Input: Concepts, Instances, Attributes 3. Output: Knowledge Representation 4. Algorithms: The Basic Methods 5. Credibility: Evaluating What's Been Learned 6. Implementations: Real Machine Learning Schemes 7. Moving On: Engineering The Input And Output 8. Nuts And Bolts: Machine Learning Algorithms In Java 9. Looking Forward

5,936 citations


Journal ArticleDOI
TL;DR: Experimental results showing that employing the active learning method can significantly reduce the need for labeled training instances in both the standard inductive and transductive settings are presented.
Abstract: Support vector machines have met with significant success in numerous real-world learning tasks. However, like most machine learning algorithms, they are generally applied using a randomly selected training set classified in advance. In many settings, we also have the option of using pool-based active learning. Instead of using a randomly selected training set, the learner has access to a pool of unlabeled instances and can request the labels for some number of them. We introduce a new algorithm for performing active learning with support vector machines, i.e., an algorithm for choosing which instances to request next. We provide a theoretical motivation for the algorithm using the notion of a version space. We present experimental results showing that employing our active learning method can significantly reduce the need for labeled training instances in both the standard inductive and transductive settings.

3,212 citations
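The selection rule at the heart of this pool-based approach can be sketched as simple margin-based querying: among the unlabeled pool, request the label of the point closest to the current decision boundary. The linear scoring function below is an illustrative stand-in for the paper's version-space criterion; `w` and `b` would come from an SVM trained on the labels gathered so far.

```python
def margin_query(w, b, pool):
    """Return the unlabeled point whose decision value w.x + b is closest
    to zero, i.e. the point the current classifier is least certain about.
    (Illustrative sketch, not the paper's exact version-space algorithm.)"""
    return min(pool, key=lambda x: abs(sum(wi * xi for wi, xi in zip(w, x)) + b))
```

For example, with `w = (1.0, 0.0)` and `b = 0.0`, the point `(0.1, 0.0)` is chosen from a pool also containing `(5.0, 2.0)` and `(-3.0, 1.0)`.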


Proceedings Article
01 Jan 2002
TL;DR: The proposed extensions of the Support Vector Machine learning approach lead to mixed integer quadratic programs that can be solved heuristically, and a generalization of SVMs makes a state-of-the-art classification technique, including non-linear classification via kernels, available to an area that up to now has been largely dominated by special purpose methods.
Abstract: This paper presents two new formulations of multiple-instance learning as a maximum margin problem. The proposed extensions of the Support Vector Machine (SVM) learning approach lead to mixed integer quadratic programs that can be solved heuristically. Our generalization of SVMs makes a state-of-the-art classification technique, including non-linear classification via kernels, available to an area that up to now has been largely dominated by special purpose methods. We present experimental results on a pharmaceutical data set and on applications in automated image indexing and document categorization.

1,556 citations


Journal ArticleDOI
TL;DR: This article introduces the WoLF principle, “Win or Learn Fast”, for varying the learning rate, and examines this technique theoretically, proving convergence in self-play on a restricted class of iterated matrix games.

807 citations
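The WoLF principle itself is easy to state: use a small step size when the current policy is doing better than the average (historical) policy, and a larger one when it is doing worse. A minimal sketch, with illustrative step-size values:

```python
def wolf_step_size(current_payoff, average_payoff,
                   delta_win=0.01, delta_lose=0.04):
    """'Win or Learn Fast': adapt cautiously while winning (doing better
    than the average policy) and quickly while losing. The two step
    sizes here are illustrative; the analysis requires delta_lose > delta_win."""
    return delta_win if current_payoff > average_payoff else delta_lose
```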


Book ChapterDOI
TL;DR: This paper formalizes the principal learning tasks and describes the methods that have been developed within the machine learning research community for addressing these problems, including sliding window methods, recurrent sliding windows, hidden Markov models, conditional random fields, and graph transformer networks.
Abstract: Statistical learning problems in many fields involve sequential data. This paper formalizes the principal learning tasks and describes the methods that have been developed within the machine learning research community for addressing these problems. These methods include sliding window methods, recurrent sliding windows, hidden Markov models, conditional random fields, and graph transformer networks. The paper also discusses some open research issues.

698 citations


Proceedings Article
01 Jan 2002
TL;DR: A framework for sparse Gaussian process (GP) methods is presented that uses forward selection with criteria based on information-theoretic principles, allows for Bayesian model selection, and is less complex in implementation.
Abstract: We present a framework for sparse Gaussian process (GP) methods which uses forward selection with criteria based on information-theoretic principles, previously suggested for active learning. Our goal is not only to learn d-sparse predictors (which can be evaluated in O(d) rather than O(n), d ≪ n, n the number of training points), but also to perform training under strong restrictions on time and memory requirements. The scaling of our method is at most O(n · d2), and in large real-world classification experiments we show that it can match prediction performance of the popular support vector machine (SVM), yet can be significantly faster in training. In contrast to the SVM, our approximation produces estimates of predictive probabilities ('error bars'), allows for Bayesian model selection and is less complex in implementation.

590 citations


Proceedings ArticleDOI
07 Aug 2002
TL;DR: This paper introduces a framework for reinforcement learning on mobile robots and describes the experiments using it to learn simple tasks.
Abstract: Programming mobile robots can be a long, time-consuming process. Specifying the low-level mapping from sensors to actuators is prone to programmer misconceptions, and debugging such a mapping can be tedious. The idea of having a robot learn how to accomplish a task, rather than being told explicitly, is an appealing one. It seems easier and much more intuitive for the programmer to specify what the robot should be doing, and to let it learn the fine details of how to do it. In this paper, we introduce a framework for reinforcement learning on mobile robots and describe our experiments using it to learn simple tasks.

409 citations


Journal ArticleDOI
01 Dec 2002
TL;DR: An attempt has been made to bring together the main ideas involved in a unified framework of learning automata and provide pointers to relevant references.
Abstract: Automata models of learning systems introduced in the 1960s were popularized as learning automata (LA) in a survey paper by Narendra and Thathachar (1974). Since then, there have been many fundamental advances in the theory as well as applications of these learning models. In the past few years, the structure of LA has been modified in several directions to suit different applications. Concepts such as parameterized learning automata (PLA), generalized learning automata (GLA), and continuous action-set learning automata (CALA) have been proposed, analyzed, and applied to solve many significant learning problems. Furthermore, groups of LA forming teams and feedforward networks have been shown to converge to desired solutions under appropriate learning algorithms. Modules of LA have been used for parallel operation with consequent increase in speed of convergence. All of these concepts and results are relatively new and are scattered in technical literature. An attempt has been made in this paper to bring together the main ideas involved in a unified framework and provide pointers to relevant references.

379 citations
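A classic member of this family is the linear reward-inaction automaton, which nudges its action probabilities toward a rewarded action and leaves them unchanged on penalty. A minimal sketch (the step size `a` is illustrative):

```python
def lri_update(p, action, rewarded, a=0.1):
    """Linear reward-inaction (L_RI) update for a learning automaton:
    on reward, shift probability mass toward the chosen action; on
    penalty, leave the distribution unchanged."""
    if not rewarded:
        return list(p)
    return [pi + a * (1.0 - pi) if i == action else (1.0 - a) * pi
            for i, pi in enumerate(p)]
```

The update keeps the probabilities normalized: the chosen action gains `a * (1 - p_action)`, exactly the total mass removed from the other actions.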


Proceedings ArticleDOI
07 May 2002
TL;DR: A wrapper-learning system called WL2 that can exploit several different representations of a document, including DOM-level and token-level representations, as well as two-dimensional geometric views of the rendered page and representations of the visual appearance of text as it will be rendered.
Abstract: A program that makes an existing website look like a database is called a wrapper. Wrapper learning is the problem of learning website wrappers from examples. We present a wrapper-learning system called WL2 that can exploit several different representations of a document. Examples of such different representations include DOM-level and token-level representations, as well as two-dimensional geometric views of the rendered page (for tabular data) and representations of the visual appearance of text as it will be rendered. Additionally, the learning system is modular, and can be easily adapted to new domains and tasks. The learning system described is part of an "industrial-strength" wrapper management system that is in active use at WhizBang Labs. Controlled experiments show that the learner has broader coverage and a faster learning rate than earlier wrapper-learning systems.

279 citations


Proceedings Article
08 Jul 2002
TL;DR: The multiple-instance (MI) learning model is applied to use a small number of training images to learn what images from the database are of interest to the user.
Abstract: We explore the application of machine learning techniques to the problem of content-based image retrieval (CBIR). Unlike most existing CBIR systems in which only global information is used or in which a user must explicitly indicate what part of the image is of interest, we apply the multiple-instance (MI) learning model to use a small number of training images to learn what images from the database are of interest to the user.

274 citations


Journal ArticleDOI
TL;DR: This tutorial surveys this subject with a principal focus on the most well-known models based on kernel substitution, namely, support vector machines.

Journal ArticleDOI
TL;DR: Support vector machines (SVM) as a recent approach to classification implement classifiers of an adjustable flexibility, which are automatically and in a principled way optimised on the training data for a good generalisation performance.

Journal ArticleDOI
TL;DR: This work proposes a general active learning framework for content-based information retrieval and uses this framework to guide hidden annotations in order to improve the retrieval performance.
Abstract: We propose a general active learning framework for content-based information retrieval. We use this framework to guide hidden annotations in order to improve the retrieval performance. For each object in the database, we maintain a list of probabilities, each indicating the probability of this object having one of the attributes. During training, the learning algorithm samples objects in the database and presents them to the annotator to assign attributes. For each sampled object, each probability is set to be one or zero depending on whether or not the corresponding attribute is assigned by the annotator. For objects that have not been annotated, the learning algorithm estimates their probabilities with biased kernel regression. Knowledge gain is then defined to determine, among the objects that have not been annotated, the one about which the system is most uncertain. The system then presents this object to the annotator as the next sample to be assigned attributes. During retrieval, the list of probabilities works as a feature vector for us to calculate the semantic distance between two objects, or between the user query and an object in the database. The overall distance between two objects is determined by a weighted sum of the semantic distance and the low-level feature distance. The algorithm is tested on both synthetic databases and real databases of 3D models. In both cases, the retrieval performance of the system improves rapidly with the number of annotated samples. Furthermore, we show that active learning outperforms learning based on random sampling.
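The sampling step can be approximated by a simple uncertainty rule: among unannotated objects, pick the one whose estimated attribute probability is closest to 0.5. This is a stand-in sketch for the paper's knowledge-gain criterion, not its exact definition:

```python
def most_uncertain(probs):
    """Index of the unannotated object whose attribute probability is
    closest to 0.5, i.e. where a label would be most informative.
    (A simple stand-in for the paper's knowledge-gain criterion.)"""
    return max(range(len(probs)), key=lambda i: min(probs[i], 1.0 - probs[i]))
```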

Proceedings ArticleDOI
10 Dec 2002
TL;DR: The proposed transformation is based on simplifying the original problem and employing the Kesler construction, which can be carried out using only a properly defined kernel; the resulting method is comparable with the one-against-all decomposition solved by the state-of-the-art sequential minimal optimizer algorithm.
Abstract: We propose a transformation from the multi-class support vector machine (SVM) classification problem to the single-class SVM problem which is more convenient for optimization. The proposed transformation is based on simplifying the original problem and employing the Kesler construction, which can be carried out using only a properly defined kernel. The experiments conducted indicate that the proposed method is comparable with the one-against-all decomposition solved by the state-of-the-art sequential minimal optimizer algorithm.

Journal ArticleDOI
Malcolm Ware1, Eibe Frank1, Geoffrey Holmes1, Mark Hall1, Ian H. Witten1 
TL;DR: It is shown that appropriate techniques can empower users to create models that compete with classifiers built by state-of-the-art learning algorithms, and that small expert-defined models offer the additional advantage that they will generally be more intelligible than those generated by automatic techniques.
Abstract: According to standard procedure, building a classifier using machine learning is a fully automated process that follows the preparation of training data by a domain expert. In contrast, interactive machine learning engages users in actually generating the classifier themselves. This offers a natural way of integrating background knowledge into the modelling stage—as long as interactive tools can be designed that support efficient and effective communication. This paper shows that appropriate techniques can empower users to create models that compete with classifiers built by state-of-the-art learning algorithms. It demonstrates that users—even users who are not domain experts—can often construct good classifiers, without any help from a learning algorithm, using a simple two-dimensional visual interface. Experiments on real data demonstrate that, not surprisingly, success hinges on the domain: if a few attributes can support good predictions, users generate accurate classifiers, whereas domains with many high-order attribute interactions favour standard machine learning techniques. We also present an artificial example where domain knowledge allows an “expert user” to create a much more accurate model than automatic learning algorithms. These results indicate that our system has the potential to produce highly accurate classifiers in the hands of a domain expert who has a strong interest in the domain and therefore some insights into how to partition the data. Moreover, small expert-defined models offer the additional advantage that they will generally be more intelligible than those generated by automatic techniques.

Journal ArticleDOI
TL;DR: Kernel methods, a new generation of learning algorithms, utilize techniques from optimization, statistics, and functional analysis to achieve maximal generality, flexibility, and performance, and are considered the state of the art in several machine learning tasks.
Abstract: Kernel methods, a new generation of learning algorithms, utilize techniques from optimization, statistics, and functional analysis to achieve maximal generality, flexibility, and performance. These algorithms are different from earlier techniques used in machine learning in many respects: For example, they are explicitly based on a theoretical model of learning rather than on loose analogies with natural learning systems or other heuristics. They come with theoretical guarantees about their performance and have a modular design that makes it possible to separately implement and analyze their components. They are not affected by the problem of local minima because their training amounts to convex optimization. In the last decade, a sizable community of theoreticians and practitioners has formed around these methods, and a number of practical applications have been realized. Although the research is not concluded, already now kernel methods are considered the state of the art in several machine learning tasks. Their ease of use, theoretical appeal, and remarkable performance have made them the system of choice for many learning problems. Successful applications range from text categorization to handwriting recognition to classification of gene-expression data.
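The core idea, substituting a kernel evaluation for an explicit inner product in feature space, can be checked directly for the degree-2 polynomial kernel on 2-D inputs:

```python
import math

def poly_kernel(x, y):
    """Degree-2 polynomial kernel: computes an inner product in a
    3-dimensional feature space without ever constructing it."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2

def phi(x):
    """The explicit feature map that poly_kernel evaluates implicitly."""
    return [x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2]
```

For `x = (1, 2)` and `y = (3, 4)`, both the kernel and the explicit dot product `phi(x) . phi(y)` give 121.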

Proceedings Article
01 Jan 2002
TL;DR: A procedure for rule extraction from support vector machines is proposed: the SVM+Prototypes method, which gives explanation ability to SVMs.
Abstract: Support vector machines (SVMs) are learning systems based on statistical learning theory that exhibit good generalization ability on real data sets. Nevertheless, a possible limitation of SVMs is that they generate black-box models. In this work, a procedure for rule extraction from support vector machines is proposed: the SVM+Prototypes method. This method makes it possible to give explanation ability to SVMs. Once the decision function has been determined by means of an SVM, a clustering algorithm is used to determine prototype vectors for each class. These points are combined with the support vectors using geometric methods to define ellipsoids in the input space, which are later transferred into if-then rules. By using the support vectors we can establish the limits of these regions.

Proceedings ArticleDOI
07 Aug 2002
TL;DR: A fast iterative algorithm for identifying the support vectors of a given set of points using a greedy approach to pick points for inclusion in the candidate set, which is extremely competitive as compared to other conventional iterative algorithms like SMO and the NPA.
Abstract: We present a fast iterative algorithm for identifying the support vectors of a given set of points. Our algorithm works by maintaining a candidate support vector set. It uses a greedy approach to pick points for inclusion in the candidate set. When the addition of a point to the candidate set is blocked because of other points already present in the set, we use a backtracking approach to prune away such points. To speed up convergence we initialize our algorithm with the nearest pair of points from opposite classes. We then use an optimization based approach to increase or prune the candidate support vector set. The algorithm makes repeated passes over the data to satisfy the KKT constraints. The memory requirements of our algorithm scale as O(|S|^2) in the average case, where |S| is the size of the support vector set. We show that the algorithm is extremely competitive as compared to other conventional iterative algorithms like SMO and the NPA. We present results on a variety of real life datasets to validate our claims.

Journal ArticleDOI
TL;DR: RLS methods are used to solve reinforcement learning problems, where two new reinforcement learning algorithms using linear value function approximators are proposed and analyzed and it is shown that the data efficiency of learning control can also be improved by using RLS methods in the learning-prediction process of the critic.
Abstract: The recursive least-squares (RLS) algorithm is one of the most well-known algorithms used in adaptive filtering, system identification and adaptive control. Its popularity is mainly due to its fast convergence speed, which is considered to be optimal in practice. In this paper, RLS methods are used to solve reinforcement learning problems, where two new reinforcement learning algorithms using linear value function approximators are proposed and analyzed. The two algorithms are called RLS-TD(λ) and Fast-AHC (Fast Adaptive Heuristic Critic), respectively. RLS-TD(λ) can be viewed as the extension of RLS-TD(0) from λ = 0 to general 0 ≤ λ ≤ 1, so it is a multi-step temporal-difference (TD) learning algorithm using RLS methods. The convergence with probability one and the limit of convergence of RLS-TD(λ) are proved for ergodic Markov chains. Compared to the existing LS-TD(λ) algorithm, RLS-TD(λ) has advantages in computation and is more suitable for online learning. The effectiveness of RLS-TD(λ) is analyzed and verified by learning prediction experiments of Markov chains with a wide range of parameter settings. The Fast-AHC algorithm is derived by applying the proposed RLS-TD(λ) algorithm in the critic network of the adaptive heuristic critic method. Unlike the conventional AHC algorithm, Fast-AHC makes use of RLS methods to improve the learning-prediction efficiency in the critic. Learning control experiments of the cart-pole balancing and the acrobot swing-up problems are conducted to compare the data efficiency of Fast-AHC with conventional AHC. From the experimental results, it is shown that the data efficiency of learning control can also be improved by using RLS methods in the learning-prediction process of the critic. The performance of Fast-AHC is also compared with that of the AHC method using LS-TD(λ).
Furthermore, it is demonstrated in the experiments that different initial values of the variance matrix in RLS-TD(λ) are required to get better performance not only in learning prediction but also in learning control. The experimental results are analyzed based on the existing theoretical work on the transient phase of forgetting factor RLS methods.
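For a single scalar feature, the RLS-TD(0) recursion reduces to a few lines. This is a sketch only: the paper's RLS-TD(λ) additionally carries eligibility traces and vector-valued features, and the initial "covariance" `p0` is illustrative.

```python
def rls_td0(transitions, gamma=0.9, p0=100.0):
    """Scalar-feature sketch of RLS-TD(0): theta is the single
    value-function weight, P the scalar inverse-covariance term."""
    theta, P = 0.0, p0
    for x, r, x_next in transitions:        # (feature, reward, next feature)
        d = x - gamma * x_next              # temporal-difference feature
        k = P * x / (1.0 + d * P * x)       # RLS gain
        theta += k * (r - d * theta)        # innovation update
        P -= k * d * P                      # covariance update
    return theta
```

On a one-state chain with constant reward 1 and γ = 0.5, the estimate converges to the true value 1 / (1 - γ) = 2.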

01 Jan 2002
TL;DR: Several techniques for learning statistical models have been developed recently by researchers in machine learning and data mining; all of them must address a similar set of representational and algorithmic choices and face a set of statistical challenges unique to learning from relational data.
Abstract: Several techniques for learning statistical models have been developed recently by researchers in machine learning and data mining. All of these techniques must address a similar set of representational and algorithmic choices and must face a set of statistical challenges unique to learning from relational data.

Journal ArticleDOI
TL;DR: A complexity analysis shows that considerable efficiency improvements can be achieved through the use of this query pack execution mechanism, and this claim is supported by empirical results obtained by incorporating support for query pack execution in two existing learning systems.
Abstract: Inductive logic programming, or relational learning, is a powerful paradigm for machine learning or data mining. However, in order for ILP to become practically useful, the efficiency of ILP systems must improve substantially. To this end, the notion of a query pack is introduced: it structures sets of similar queries. Furthermore, a mechanism is described for executing such query packs. A complexity analysis shows that considerable efficiency improvements can be achieved through the use of this query pack execution mechanism. This claim is supported by empirical results obtained by incorporating support for query pack execution in two existing learning systems.

Patent
30 Mar 2002
TL;DR: In this paper, a computer-aided learning method and apparatus for a learning user to learn materials inexpensively are proposed, providing the user the freedom as to where and when to learn, and the guidance as to what to learn.
Abstract: A computer-aided learning method and apparatus for a learning user to learn materials inexpensively. Not only does the apparatus provide the user freedom as to where and when to learn, and guidance as to what to learn; it also reduces a significant hurdle to learning: tuition. The apparatus retrieves a user identifier entered by the user, and determines whether the user is a learning user or an institute user. If the user is a learning user, the apparatus allows the user to access information regarding learning materials. If the user is an institute user, the apparatus permits the user to access information regarding at least one learning user. The institute user might be interested in using the apparatus to recruit employees to fill job openings. A learning user pays significantly less than an institute user to access information, so as to encourage the learning user to work on learning materials. The apparatus can also track and update information regarding the users.

Proceedings ArticleDOI
07 Aug 2002
TL;DR: An application of a gradient ascent algorithm for reinforcement learning to a complex domain of packet routing in network communication is presented and the performance of this algorithm is compared to other routing methods on a benchmark problem.
Abstract: Reinforcement learning means learning a policy-a mapping of observations into actions-based on feedback from the environment. The learning can be viewed as browsing a set of policies while evaluating them by trial through interaction with the environment. We present an application of a gradient ascent algorithm for reinforcement learning to a complex domain of packet routing in network communication and compare the performance of this algorithm to other routing methods on a benchmark problem.
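Stripped of the routing domain, the gradient-ascent update can be sketched on a two-armed bandit with a softmax policy (a REINFORCE-style update; the function name, step count, and learning rate are illustrative):

```python
import math, random

def gradient_ascent_bandit(payoffs, steps=2000, lr=0.1, seed=0):
    """Policy-gradient sketch: gradient ascent on a softmax policy over
    bandit arms with deterministic payoffs. Returns the final policy."""
    rng = random.Random(seed)
    theta = [0.0] * len(payoffs)
    for _ in range(steps):
        z = [math.exp(t) for t in theta]
        total = sum(z)
        probs = [v / total for v in z]
        # sample an arm from the softmax policy
        u, a, acc = rng.random(), len(probs) - 1, 0.0
        for i, p in enumerate(probs):
            acc += p
            if u < acc:
                a = i
                break
        r = payoffs[a]
        # REINFORCE update: grad of log softmax is 1{a=i} - probs[i]
        for i in range(len(theta)):
            theta[i] += lr * r * ((1.0 if a == i else 0.0) - probs[i])
    z = [math.exp(t) for t in theta]
    total = sum(z)
    return [v / total for v in z]
```

With payoffs `[1.0, 0.0]`, the policy concentrates on the first arm.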

Journal ArticleDOI
TL;DR: It is concluded that real-time learning for complex motor systems such as humanoid robots is possible with appropriately tailored algorithms, such that increasingly autonomous robots with massive learning abilities should be achievable in the near future.
Abstract: The complexity of the kinematic and dynamic structure of humanoid robots makes conventional analytical approaches to control increasingly unsuitable for such systems. Learning techniques offer a possible way to aid controller design if insufficient analytical knowledge is available, and learning approaches seem mandatory when humanoid systems are supposed to become completely autonomous. While recent research in neural networks and statistical learning has focused mostly on learning from finite data sets without stringent constraints on computational efficiency, learning for humanoid robots requires a different setting, characterized by the need for real-time learning performance from an essentially infinite stream of incrementally arriving data. This paper demonstrates how even high-dimensional learning problems of this kind can successfully be dealt with by techniques from nonparametric regression and locally weighted learning. As an example, we describe the application of one of the most advanced of such algorithms, Locally Weighted Projection Regression (LWPR), to the on-line learning of three problems in humanoid motor control: the learning of inverse dynamics models for model-based control, the learning of inverse kinematics of redundant manipulators, and the learning of oculomotor reflexes. All these examples demonstrate fast, i.e., within seconds or minutes, learning convergence with highly accurate final performance. We conclude that real-time learning for complex motor systems such as humanoid robots is possible with appropriately tailored algorithms, such that increasingly autonomous robots with massive learning abilities should be achievable in the near future.

Journal ArticleDOI
TL;DR: A new, more powerful competitive learning algorithm, self-splitting competitive learning (SSCL), that is able to find the natural number of clusters based on the one-prototype-take-one-cluster (OPTOC) paradigm and a self-splitting validity measure is presented.
Abstract: Clustering in the neural-network literature is generally based on the competitive learning paradigm. The paper addresses two major issues associated with conventional competitive learning, namely, sensitivity to initialization and difficulty in determining the number of prototypes. In general, selecting the appropriate number of prototypes is a difficult task, as we do not usually know the number of clusters in the input data a priori. It is therefore desirable to develop an algorithm that has no dependency on the initial prototype locations and is able to adaptively generate prototypes to fit the input data patterns. We present a new, more powerful competitive learning algorithm, self-splitting competitive learning (SSCL), that is able to find the natural number of clusters based on the one-prototype-take-one-cluster (OPTOC) paradigm and a self-splitting validity measure. It starts with a single prototype randomly initialized in the feature space and splits adaptively during the learning process until all clusters are found; each cluster is associated with a prototype at its center. We have conducted extensive experiments to demonstrate the effectiveness of the SSCL algorithm. The results show that SSCL has the desired ability for a variety of applications, including unsupervised classification, curve detection, and image segmentation.
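The conventional competitive learning step that SSCL builds on is a winner-take-all update: move the nearest prototype toward the input. A minimal sketch (the OPTOC property and split criterion of the paper are not shown; the learning rate is illustrative):

```python
def competitive_step(prototypes, x, lr=0.5):
    """One winner-take-all competitive learning update: find the
    prototype nearest to input x and pull it toward x. Returns the
    index of the winning prototype; updates prototypes in place."""
    win = min(range(len(prototypes)),
              key=lambda i: sum((p - xi) ** 2
                                for p, xi in zip(prototypes[i], x)))
    prototypes[win] = [p + lr * (xi - p)
                       for p, xi in zip(prototypes[win], x)]
    return win
```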

Journal ArticleDOI
TL;DR: With the learning history recording and inquiry available to the users of the OT simulator, a better performance was obtained during the learning process itself, and when the use of the history mechanism was removed after 2 weeks, the better performance still remained.
Abstract: Simulations are recognized as an efficient and effective way of teaching and learning complex, dynamic systems. A new concept of simulation-based teaching with a built-in learning history is introduced in several simulation-based teaching tools. The user of these systems obtains access to past states and decisions and to the consequences of these decisions. To date, there has been very little research on the effectiveness and efficiency of the learning history in simulation-based teaching. In this paper we report the results of a controlled experiment to evaluate the effectiveness and efficiency of a learning process that takes place in a dynamic simulation. This was done with and without recording and accessing the history of the learning process, along with the ability to restart the simulation from any point. The experiment was based on the simulation teaching tool called the Operations Trainer (OT) that simulates the order fulfillment process in a manufacturing organization, implementing an Enterprise Resource Planning (ERP) system. The findings show that with the learning history recording and inquiry available to the users of the OT simulator, a better performance was obtained during the learning process itself. Moreover, when the use of the history mechanism was removed after 2 weeks, the better performance still remained. In addition, performance was similarly better in a different context than the one used in the original learning with access to the learning history. The findings are discussed with respect to the self-learning process in simulation-based teaching environments and the practical implications of using simulators in the growing field of Electronic Learning (E-Learning).

Proceedings Article
01 Jan 2002
TL;DR: Improved performance was achieved by ensemble learning in all experiments and the best result was obtained in the third task, in which the overall correct rate increases from 84.26% to 87.17%.
Abstract: In this study, we applied ensemble machine learning to predict pitch accents. With decision tree as the baseline algorithm, two popular ensemble learning methods, bagging and boosting, were evaluated across different experiment conditions: using acoustic features only, using text-based features only; using both acoustic and text-based features. F0 related acoustic features are derived from underlying pitch targets. Models of four ToBI pitch accent types (High, Down-stepped high, Low, and Unaccented) are built at the syllable level. Results showed that in all experiments improved performance was achieved by ensemble learning. The best result was obtained in the third task, in which the overall correct rate increases from 84.26% to 87.17%.
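The bagging half of the study can be sketched in a few lines: train one base model per bootstrap sample and majority-vote their predictions. The base learner here is a nearest-class-mean classifier on 1-D data, a deliberately simple stand-in for the paper's decision trees so the ensemble logic stays visible:

```python
import random

def bag_predict(train, x, n_models=15, seed=0):
    """Bagging sketch: fit a nearest-class-mean classifier on each
    bootstrap sample of (value, label) pairs and majority-vote the
    binary predictions for input x."""
    rng = random.Random(seed)
    votes = 0
    for _ in range(n_models):
        boot = [train[rng.randrange(len(train))] for _ in train]  # bootstrap
        c0 = [v for v, y in boot if y == 0]
        c1 = [v for v, y in boot if y == 1]
        if not c0 or not c1:            # degenerate sample: vote its only class
            votes += 1 if c1 else 0
            continue
        m0, m1 = sum(c0) / len(c0), sum(c1) / len(c1)
        votes += 1 if abs(x - m1) < abs(x - m0) else 0
    return 1 if votes > n_models / 2 else 0
```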

Journal ArticleDOI
TL;DR: A system that accelerates reinforcement learning by using transfer from related tasks that achieves much of its power by transferring parts of previously learned solutions rather than a single complete solution.
Abstract: This paper discusses a system that accelerates reinforcement learning by using transfer from related tasks. Without such transfer, even if two tasks are very similar at some abstract level, an extensive re-learning effort is required. The system achieves much of its power by transferring parts of previously learned solutions rather than a single complete solution. The system exploits strong features in the multi-dimensional function produced by reinforcement learning in solving a particular task. These features are stable and easy to recognize early in the learning process. They generate a partitioning of the state space and thus the function. The partition is represented as a graph. This is used to index and compose functions stored in a case base to form a close approximation to the solution of the new task. Experiments demonstrate that function composition often produces more than an order of magnitude increase in learning rate compared to a basic reinforcement learning algorithm.

Journal ArticleDOI
01 Jun 2002
TL;DR: It is suggested that genetic algorithms are probably the most general approach for adding generalization, although they might not be the only solution.
Abstract: We analyze learning classifier systems in the light of tabular reinforcement learning. We note that although genetic algorithms are the most distinctive feature of learning classifier systems, it is not clear whether genetic algorithms are important to learning classifier systems. In fact, there are models which are strongly based on evolutionary computation (e.g., Wilson's XCS) and others which do not exploit evolutionary computation at all (e.g., Stolzmann's ACS). To find some clarifications, we try to develop learning classifier systems "from scratch", i.e., starting from one of the best-known reinforcement learning techniques, Q-learning. We first consider the basics of reinforcement learning: a problem modeled as a Markov decision process and tabular Q-learning. We introduce a formal framework to define a general purpose rule-based representation which we use to implement tabular Q-learning. We formally define generalization within rules and discuss the possible approaches to extend our rule-based Q-learning with generalization capabilities. We suggest that genetic algorithms are probably the most general approach for adding generalization, although they might not be the only solution.
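The paper's starting point, plain tabular Q-learning, can be sketched directly (the environment, episode count, and hyperparameters below are illustrative; the paper layers a rule-based representation on top of this):

```python
import random

def q_learning(step, n_states, n_actions, episodes=500,
               alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy policy. `step(s, a)`
    returns (next_state, reward, done); episodes all start in state 0."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if rng.random() < eps:                      # explore
                a = rng.randrange(n_actions)
            else:                                       # exploit
                a = max(range(n_actions), key=lambda i: Q[s][i])
            s2, r, done = step(s, a)
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])       # TD update
            s = s2
    return Q

def demo_step(s, a):
    """Toy episodic task: action 1 ends the episode with reward 1,
    action 0 yields no reward and stays in the single state."""
    return (0, 1.0, True) if a == 1 else (0, 0.0, False)
```

On the toy task, the learned value of the terminating action converges to its true value of 1.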

Proceedings ArticleDOI
Manabu Sassano1
06 Jul 2002
TL;DR: It is found that in the early stage of training with a larger pool, more labeled examples are required to achieve a given level of accuracy than with a smaller pool.
Abstract: We explore how active learning with Support Vector Machines works well for a non-trivial task in natural language processing. We use Japanese word segmentation as a test case. In particular, we discuss how the size of a pool affects the learning curve. It is found that in the early stage of training with a larger pool, more labeled examples are required to achieve a given level of accuracy than with a smaller pool. In addition, we propose a novel technique to use a large number of unlabeled examples effectively by adding them gradually to a pool. The experimental results show that our technique requires fewer labeled examples than the technique used in previous research. To achieve 97.0% accuracy, the proposed technique needs only 59.3% of the labeled examples required by the previous technique, and only 17.4% of the labeled examples required with random sampling.