Journal ArticleDOI

Pattern Recognition and Machine Learning

01 Aug 2007-Technometrics (Taylor & Francis)-Vol. 49, Iss: 3, pp 366-366
TL;DR: This book provides a comprehensive, self-contained introduction to pattern recognition and machine learning from a Bayesian perspective, covering topics such as probabilistic graphical models, kernel methods, and approximate inference, and has become a standard reference for researchers and graduate students in the field.
Abstract: (2007). Pattern Recognition and Machine Learning. Technometrics: Vol. 49, No. 3, pp. 366-366.
Citations
Posted Content
TL;DR: This paper proposes a regularization formulation for learning the relationships between tasks in multi-task learning, called MTRL, which models positive task correlation and, on the same underlying principle, can also describe negative task correlation and identify outlier tasks.
Abstract: Multi-task learning is a learning paradigm which seeks to improve the generalization performance of a learning task with the help of some other related tasks. In this paper, we propose a regularization formulation for learning the relationships between tasks in multi-task learning. This formulation can be viewed as a novel generalization of the regularization framework for single-task learning. Besides modeling positive task correlation, our method, called multi-task relationship learning (MTRL), can also describe negative task correlation and identify outlier tasks based on the same underlying principle. Under this regularization framework, the objective function of MTRL is convex. For efficiency, we use an alternating method to learn the optimal model parameters for each task as well as the relationships between tasks. We study MTRL in the symmetric multi-task learning setting and then generalize it to the asymmetric setting as well. We also study the relationships between MTRL and some existing multi-task learning methods. Experiments conducted on a toy problem as well as several benchmark data sets demonstrate the effectiveness of MTRL.
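
To make the alternating scheme concrete, here is a minimal NumPy/SciPy sketch of an MTRL-style update, assuming a squared loss and the closed-form task-covariance step Omega proportional to (WᵀW)^(1/2); the variable names, learning rate, and gradient solver for W are illustrative choices, not the authors' implementation.

```python
import numpy as np
from scipy.linalg import sqrtm

def mtrl_fit(Xs, ys, lam1=0.1, lam2=0.1, n_outer=20, n_inner=50, lr=0.01):
    """Hedged sketch of MTRL-style alternating optimization.

    Xs, ys: lists of per-task design matrices / targets (squared loss assumed).
    Returns W (d x m task weights) and Omega (m x m task covariance).
    """
    m = len(Xs)
    d = Xs[0].shape[1]
    W = np.zeros((d, m))
    Omega = np.eye(m) / m                          # start with uncorrelated tasks

    for _ in range(n_outer):
        # Step 1: fix Omega, update W by gradient descent on the convex objective.
        Omega_inv = np.linalg.inv(Omega + 1e-8 * np.eye(m))
        for _ in range(n_inner):
            grad = np.zeros_like(W)
            for i in range(m):
                resid = Xs[i] @ W[:, i] - ys[i]            # squared-loss residual
                grad[:, i] = Xs[i].T @ resid / len(ys[i])
            grad += lam1 * W + lam2 * W @ Omega_inv        # regularization terms
            W -= lr * grad

        # Step 2: fix W, closed-form update of the task covariance.
        S = np.real(sqrtm(W.T @ W + 1e-8 * np.eye(m)))     # (W^T W)^(1/2)
        Omega = S / np.trace(S)                            # normalize so tr(Omega) = 1

    return W, Omega
```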

363 citations


Cites background from "Pattern Recognition and Machine Learning"

  • ...It follows that the posterior distribution for W, which is proportional to the product of the prior and the likelihood function [6], is given by:...

    [...]
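
For reference, the relation this excerpt invokes is the standard Bayesian identity, written here in generic notation (W for the parameters, D for the data; not necessarily the citing paper's exact symbols):

```latex
p(W \mid \mathcal{D}) \;=\; \frac{p(\mathcal{D} \mid W)\, p(W)}{p(\mathcal{D})}
\;\propto\; p(\mathcal{D} \mid W)\, p(W)
```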

Journal ArticleDOI
TL;DR: A patient‐specific algorithm for seizure prediction using multiple features of spectral power from electroencephalogram (EEG) and support vector machine (SVM) classification is proposed.
Abstract: Summary Purpose: We propose a patient-specific algorithm for seizure prediction using multiple features of spectral power from electroencephalogram (EEG) and support vector machine (SVM) classification. Methods: The proposed patient-specific algorithm consists of preprocessing, feature extraction, SVM classification, and postprocessing. Preprocessing removes artifacts from the intracranial EEG (iEEG) recordings, which are then further processed with bipolar and/or time-differential methods. Spectral power features in nine bands are extracted from the raw, bipolar, and/or time-differential iEEG recordings using a sliding 20-s window with 50% overlap. The nine bands are based on the standard EEG frequency bands, with the wide gamma band split into four sub-bands. Cost-sensitive SVMs are used to classify preictal and interictal samples, and double cross-validation is used for in-sample optimization and out-of-sample testing. SVM outputs are postprocessed with a Kalman filter to remove sporadic, isolated false alarms. The algorithm was tested on iEEG from 18 of the 20 patients in the Freiburg EEG database who had three or more seizure events. Kernel Fisher discriminant analysis is used to assess how well the features discriminate preictal from interictal states. Key findings: The proposed patient-specific algorithm achieved a high sensitivity of 97.5% over a total of 80 seizure events, a low false alarm rate of 0.27 per hour, and a total false prediction time of 13.0% across 433.2 interictal hours with bipolar preprocessing (92.5% sensitivity, a false positive rate of 0.20 per hour, and 9.5% false prediction time with time-differential preprocessing). This high prediction rate demonstrates that seizures can be predicted with a patient-specific approach using linear spectral power features and nonlinear classifiers. Bipolar and/or time-differential preprocessing significantly improves sensitivity and specificity. Spectral power in the high gamma bands is the most discriminating feature between preictal and interictal states. Significance: High sensitivity and specificity are achieved by nonlinear classification of linear spectral power features. Power changes in certain frequency bands have previously been shown to be candidate seizure prediction indicators, but combining these spectral power features and classifying them in a multivariate approach leads to much higher prediction rates. Employing only linear features is advantageous, especially for an implantable device, because they can be computed rapidly with low power consumption.
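
A minimal sketch of the band-power feature extraction and cost-sensitive SVM classification described above, using SciPy's Welch estimator and scikit-learn's class-weighted SVC as stand-ins; the band edges, window handling, and the note on Kalman-filter postprocessing are simplified assumptions rather than the authors' exact pipeline.

```python
import numpy as np
from scipy.signal import welch
from sklearn.svm import SVC

# Nine illustrative bands (Hz): standard EEG bands with the gamma range split
# into sub-bands. Edges are assumptions, not the paper's exact choices.
BANDS = [(0.5, 4), (4, 8), (8, 13), (13, 30),
         (30, 50), (50, 70), (70, 90), (90, 110), (110, 128)]

def band_powers(window, fs):
    """Spectral power of one iEEG channel window in each band."""
    freqs, psd = welch(window, fs=fs, nperseg=min(len(window), int(4 * fs)))
    return [psd[(freqs >= lo) & (freqs < hi)].sum() for lo, hi in BANDS]

def extract_features(signal, fs, win_sec=20):
    """Slide a 20-s window with 50% overlap over one channel."""
    win = int(win_sec * fs)
    step = win // 2
    feats = [band_powers(signal[s:s + win], fs)
             for s in range(0, len(signal) - win + 1, step)]
    return np.asarray(feats)

# Cost-sensitive SVM: weight the (rare) preictal class more heavily.
clf = SVC(kernel='rbf', class_weight={0: 1.0, 1: 10.0})
# clf.fit(X_train, y_train); scores = clf.decision_function(X_test)
# A Kalman filter (or simple smoother) over the scores would then suppress
# sporadic, isolated alarms before thresholding.
```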

362 citations


Cites background from "Pattern Recognition and Machine Learning"

  • ...The SVM is considered the most powerful and favorable classifier in the statistical learning community (Alpaydin, 2004; Bishop, 2006; Cherkassky & Mulier, 2007)....

    [...]

Journal ArticleDOI
TL;DR: Modulation spectral features are proposed for the automatic recognition of human affective information from speech; they yield a substantial improvement in recognition performance when used to augment prosodic features, which have been used extensively for emotion recognition.
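
One common way to compute modulation spectral features is to take a spectrogram, treat each acoustic-frequency band's magnitude trajectory as a temporal envelope, and Fourier-analyze that envelope. The sketch below follows this generic recipe; the band counts, frame sizes, and 0-20 Hz modulation range are assumptions, not necessarily the authors' front end.

```python
import numpy as np
from scipy.signal import stft

def modulation_spectral_features(x, fs, n_acoustic_bands=8, n_mod_bins=6):
    """Generic modulation-spectrum features: energy per (acoustic band, modulation bin)."""
    # 1) Spectrogram: acoustic frequency vs. time.
    freqs, times, Z = stft(x, fs=fs, nperseg=int(0.032 * fs), noverlap=int(0.024 * fs))
    mag = np.abs(Z)                                     # magnitude envelope per frequency bin
    frame_rate = 1.0 / (times[1] - times[0])            # envelope sampling rate (Hz)

    # 2) Group acoustic frequency bins into coarse bands.
    band_edges = np.linspace(0, len(freqs), n_acoustic_bands + 1, dtype=int)
    feats = []
    for b in range(n_acoustic_bands):
        env = mag[band_edges[b]:band_edges[b + 1]].mean(axis=0)   # band envelope over time
        env = env - env.mean()                                    # remove DC before FFT
        mod_spec = np.abs(np.fft.rfft(env))                       # modulation spectrum
        mod_freqs = np.fft.rfftfreq(len(env), d=1.0 / frame_rate)
        # 3) Pool modulation energy into a few low-frequency bins (0-20 Hz assumed).
        edges = np.linspace(0, 20, n_mod_bins + 1)
        feats += [mod_spec[(mod_freqs >= lo) & (mod_freqs < hi)].sum()
                  for lo, hi in zip(edges[:-1], edges[1:])]
    return np.asarray(feats)
```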

359 citations


Cites background or methods from "Pattern Recognition and Machine Learning"

  • ...Using all the features for machine learning might deteriorate recognition performance due to the curse of dimensionality (Bishop, 2006)....

    [...]

  • ...It is also interesting to see that in these tests, using six LDA transformed features delivers even higher accuracy than using dozens of SFS features, indicating that the effective reduction of feature dimensionality offered by LDA indeed contributes to recognition performance....

    [...]

  • ...Similar to the case where spectral features are evaluated individually, the MSFs achieve the highest overall accuracy when combined with prosodic features, and up to 91.6% recognition rate can be obtained using LDA with SN. Applying LDA with SN also gives the best recognition performance....

    [...]

  • ...The numeric results of applying SFS and LDA techniques to the FDR screened feature pools are detailed in Table 2 with SVMs employed for classification and the results averaged over the 10 cross-validation trials....

    [...]

  • ...Since the maximum rank of Sb for a C-class problem is C − 1, the maximum number of LDA features is also C − 1....

    [...]
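
On the last excerpt above: the between-class scatter matrix Sb is built from C class means constrained by the overall mean, so its rank is at most C − 1, and LDA can therefore produce at most C − 1 discriminant features. The hedged snippet below illustrates this cap with scikit-learn; the class count and feature dimension are invented for illustration.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
C, d, n = 7, 40, 700                       # 7 emotion classes, 40-dim features (illustrative)
X = rng.normal(size=(n, d))
y = rng.integers(0, C, size=n)

lda = LinearDiscriminantAnalysis(n_components=C - 1)   # C - 1 is the maximum allowed
Z = lda.fit_transform(X, y)
print(Z.shape)                             # (700, 6): at most C - 1 = 6 LDA features
```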

Journal ArticleDOI
Yu Zheng
TL;DR: High-level principles of each category of methods are introduced, and examples in which these techniques are used to handle real big data problems are given, to help a wide range of communities find a solution for data fusion in big data projects.
Abstract: Traditional data mining usually deals with data from a single domain. In the big data era, we face a diversity of datasets from different sources in different domains. These datasets consist of multiple modalities, each of which has a different representation, distribution, scale, and density. How to unlock the power of knowledge from multiple disparate (but potentially connected) datasets is paramount in big data research, essentially distinguishing big data from traditional data mining tasks. This calls for advanced techniques that can fuse knowledge from various datasets organically in a machine learning and data mining task. This paper summarizes the data fusion methodologies, classifying them into three categories: stage-based, feature-level-based, and semantic meaning-based data fusion methods. The last category of data fusion methods is further divided into four groups: multi-view learning-based, similarity-based, probabilistic dependency-based, and transfer learning-based methods. These methods focus on knowledge fusion rather than schema mapping and data merging, significantly distinguishing cross-domain data fusion from the traditional data fusion studied in the database community. This paper not only introduces the high-level principles of each category of methods but also gives examples in which these techniques are used to handle real big data problems. In addition, this paper positions existing works in a framework, exploring the relationships and differences between the data fusion methods. This paper will help a wide range of communities find a solution for data fusion in big data projects.
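
As a toy illustration of the simplest of these strategies, feature-level fusion, one can standardize and concatenate features from two domains before training a single model; the dataset names and shapes below are invented.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Invented example: a per-region label predicted from two data domains.
rng = np.random.default_rng(1)
X_traffic = rng.normal(size=(500, 10))     # e.g., traffic-flow features per region
X_poi = rng.normal(size=(500, 25))         # e.g., point-of-interest features per region
y = rng.integers(0, 2, size=500)

# Feature-level fusion: scale each domain, then concatenate into one feature vector.
X_fused = np.hstack([StandardScaler().fit_transform(X_traffic),
                     StandardScaler().fit_transform(X_poi)])
clf = LogisticRegression(max_iter=1000).fit(X_fused, y)
```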

356 citations

Journal ArticleDOI
TL;DR: The Bayesian update of dialogue state framework is shown to be a feasible and effective approach to building real-world POMDP-based dialogue systems, and a method for learning in spoken dialogue systems is presented that uses a component-based policy with the episodic Natural Actor Critic algorithm.

355 citations


Cites background or methods from "Pattern Recognition and Machine Learning"

  • ...An explanation of the reasoning involved in obtaining the algorithm is beyond the scope of this paper but may be found in (Bishop, 2006, Chapter 8)....

    [...]

  • ...If observations do not distinguish between different concept-values one would expect the messages for those values to be the same....

    [...]

  • ...The technique is based on the loopy belief propagation (LBP) algorithm (Bishop, 2006, Chapter 8)....

    [...]
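
A compact sketch of loopy belief propagation on a discrete pairwise model, in the spirit of the sum-product algorithm the excerpts reference (Bishop, 2006, Chapter 8); the message schedule, potential layout, and normalization are illustrative choices, not the dialogue system's actual factor graph.

```python
import numpy as np

def loopy_bp(unaries, edges, pairwise, n_iters=50):
    """Sum-product loopy BP on a pairwise model with K-state variables.

    unaries:  {node: array of shape (K,)}        node potentials
    edges:    list of (i, j) pairs                undirected edges
    pairwise: {(i, j): array of shape (K, K)}     edge potentials, indexed [x_i, x_j]
    Returns approximate marginals {node: array of shape (K,)}.
    """
    # Directed messages, initialized uniformly.
    msgs = {}
    for i, j in edges:
        K = len(unaries[i])
        msgs[(i, j)] = np.ones(K) / K
        msgs[(j, i)] = np.ones(K) / K

    def neighbors(n):
        return [b for a, b in edges if a == n] + [a for a, b in edges if b == n]

    for _ in range(n_iters):
        new_msgs = {}
        for (i, j) in msgs:
            # Product of node potential and all incoming messages except the one from j.
            prod = unaries[i].copy()
            for k in neighbors(i):
                if k != j:
                    prod *= msgs[(k, i)]
            psi = pairwise[(i, j)] if (i, j) in pairwise else pairwise[(j, i)].T
            m = psi.T @ prod                      # sum over x_i
            new_msgs[(i, j)] = m / m.sum()        # normalize for numerical stability
        msgs = new_msgs

    # Beliefs: node potential times all incoming messages, renormalized.
    beliefs = {}
    for n, phi in unaries.items():
        b = phi.copy()
        for k in neighbors(n):
            b *= msgs[(k, n)]
        beliefs[n] = b / b.sum()
    return beliefs
```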