Batch and Online Learning Algorithms for Nonconvex Neyman-Pearson Classification
Citations
Adaptation and learning in automatic systems
DC programming and DCA: thirty years of developments
Optimization Methods for Large-Scale Machine Learning
Incremental Learning From Stream Data
Distributed Online One-Class Support Vector Machine for Anomaly Detection Over Networks
References
Statistical learning theory
Statistical significance for genomewide studies
Advances in kernel methods: support vector learning
Fast training of support vector machines using sequential minimal optimization
Related Papers (5)
Frequently Asked Questions (13)
Q2. What dataset was used for the q-value optimization experiments?
The q-value optimization experiments were carried out using a proteomics dataset consisting of 139410 samples with positive and negative samples equally represented [Spivak et al. 2009].
Q3. How many pairs of parameters are used for ONP-SVM?
Because the authors tested 100 pairs of parameters (C+, C−) for AC-SVM but only 36 pairs (γ, ν) for ONP-SVM, the gain in computing time is substantial.
Q4. What is the third flavor of generative approach?
A third flavor of generative approach addresses the estimation of class-conditional distributions with Parzen windows [Bounsiar et al. 2008].
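As a minimal sketch of the Parzen-window idea (not the authors' code; the Gaussian kernel, bandwidth, and toy data are illustrative assumptions), a class-conditional density can be estimated by averaging a kernel centered on each training sample of that class:

```python
import numpy as np

def parzen_density(x, samples, h):
    """Parzen-window estimate of a 1-D density at point x,
    using a Gaussian kernel of bandwidth h (illustrative sketch)."""
    z = (x - samples) / h
    kernel = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return kernel.mean() / h

# Toy example: estimate one class-conditional density from its samples.
rng = np.random.default_rng(0)
pos = rng.normal(loc=1.0, scale=0.5, size=500)
print(parzen_density(1.0, pos, h=0.2))  # density estimate near the class mean
```

With class-conditional estimates for both classes, a Neyman-Pearson rule can then threshold their likelihood ratio.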
Q5. How many values of (C+, C) were searched in the hyperbox?
In the case of AC-SVM, 15×15 values of (C+, C−) were searched in the hyperbox [0.001, 1000]² for Pageblocks and [0.01, 100]² for the other datasets.
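Such a grid can be built as follows; logarithmic spacing is an assumption here (conventional for SVM cost parameters, though the paper does not state the spacing), shown for the [0.01, 100]² hyperbox:

```python
import numpy as np

# Hypothetical reconstruction of the 15x15 search grid over the
# hyperbox [0.01, 100]^2 for the (C+, C-) parameters of AC-SVM.
c_values = np.logspace(np.log10(0.01), np.log10(100), 15)
grid = [(c_pos, c_neg) for c_pos in c_values for c_neg in c_values]
print(len(grid))  # 225 candidate (C+, C-) pairs
```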
Q6. What is the earliest attempt to estimate class-conditional distributions?
One of the earliest attempts [Streit 1990] uses a multilayer neural network to estimate class-conditional distributions as mixtures of Gaussians.
Q7. How many documents are used as the training set?
To make the problem large-scale, the authors use the official testing set (781,265 documents) as the training set and the official training set (23,149 documents) as the testing set.
Q8. Why do the authors not report results obtained with SVMPerf?
The authors do not report results obtained with SVMPerf [Joachims 2005] because running the algorithm for plain kernel machines is "painfully slow," according to the SVMPerf web site.
Q9. What is the algorithm for calculating the cost of asymmetric SVM?
repeat:
• Solve the asymmetric-cost SVM problem with the current β and λ.
• Compute βi ← 1 if yif(xi) < −η, and 0 otherwise.
• Update λ ← λ(1 + ν(P̂fa − ρ)).
until convergence.
Since iterations start with β = 0, the first iteration solves the classical asymmetric-cost SVM problem, whose solution is then progressively improved by taking the nonconvexity into account and updating λ so as to achieve the target Pfa.
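The iteration can be sketched on a toy 1-D problem. This is not the authors' solver: the asymmetric-cost SVM step is replaced by a grid search over a scalar threshold b in f(x) = x − b, and the data, constants (η, ρ, ν), and grid are illustrative assumptions; only the β flags and the multiplicative λ update follow the scheme above.

```python
import numpy as np

rng = np.random.default_rng(0)
neg = rng.normal(0.0, 1.0, 2000)          # class y = -1
pos = rng.normal(2.0, 1.0, 2000)          # class y = +1
x = np.concatenate([neg, pos])
y = np.concatenate([-np.ones(2000), np.ones(2000)])

eta, rho, nu = 1.0, 0.05, 0.5             # ramp parameter, target Pfa, step size
lam = 1.0                                 # weight on false alarms
beta = np.zeros_like(y)                   # flags for the nonconvex correction
thresholds = np.linspace(-3.0, 5.0, 401)

for _ in range(100):
    # Stand-in "solver": pick the bias b of f(x) = x - b minimizing the
    # beta-corrected hinge objective, with false alarms weighted by lam.
    margin = y[None, :] * (x[None, :] - thresholds[:, None])
    hinge = np.maximum(0.0, (eta - margin) / (2 * eta))
    weight = np.where(y < 0, lam, 1.0)
    obj = (weight * (hinge + beta * margin / (2 * eta))).sum(axis=1)
    b = thresholds[np.argmin(obj)]
    z = y * (x - b)
    beta = (z < -eta).astype(float)       # beta_i = 1 if y_i f(x_i) < -eta
    pfa_hat = np.mean(neg > b)            # empirical false-alarm rate
    lam *= 1 + nu * (pfa_hat - rho)       # drive Pfa toward the target rho

print(f"Pfa = {pfa_hat:.3f}, threshold b = {b:.2f}, lambda = {lam:.2f}")
```

With β = 0 the first pass is a plain asymmetric-cost fit; the β flags then neutralize far-outlying points (the flat part of the ramp loss), and λ grows or shrinks until the empirical false-alarm rate settles near ρ.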
Q10. What is the approach to replace the 0–1 loss in Pnd and P?
Their approach consists in replacing the 0–1 loss in P̃nd and P̃fa by a continuous nonconvex approximation, such as the sigmoid loss
ℓ(z) = 1/(1 + e^{ηz}) (4)
or the ramp loss
ℓ(z) = max{0, (η − z)/(2η)} − max{0, −(η + z)/(2η)} . (5)
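The two surrogates can be written out directly; a minimal sketch (η = 1 and the sample points are arbitrary choices) shows that both stay within [0, 1] like the 0–1 loss they approximate:

```python
import numpy as np

def sigmoid_loss(z, eta=1.0):
    # Eq. (4): smooth nonconvex surrogate for the 0-1 loss.
    return 1.0 / (1.0 + np.exp(eta * z))

def ramp_loss(z, eta=1.0):
    # Eq. (5): difference of two convex hinge-type functions,
    # equal to 1 for z <= -eta, 0 for z >= eta, linear in between.
    return (np.maximum(0.0, (eta - z) / (2 * eta))
            - np.maximum(0.0, -(eta + z) / (2 * eta)))

z = np.linspace(-3.0, 3.0, 7)
print(ramp_loss(z))     # clipped to [0, 1]
print(sigmoid_loss(z))  # smooth, also in (0, 1)
```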
Q11. What is the difference between the two convex functions?
Since the analytical expression of the ramp loss (5) is a difference ℓ1(z) − ℓ2(z) of two convex functions, the full Lagrangian can also be expressed as the difference of two convex functions, amenable to DC programming [Tao and An 1998].
Q12. How many units are used in the neural network?
The neural network replicates the structure of the state-of-the-art model (qRanker) [Spivak et al. 2009], with a single hidden layer of 5 units.
Q13. What is the classifier for each method?
The best classifier for each method was selected using the validation criterion [Davenport et al. 2010]:
Jval = Pnd + max(0, Pfa − ρ)/ρ .
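The criterion is straightforward to compute; a minimal sketch (the numeric inputs are illustrative, not values from the paper):

```python
def validation_score(pnd, pfa, rho):
    """J_val = Pnd + max(0, Pfa - rho) / rho: the miss rate plus a
    penalty for exceeding the target false-alarm rate rho."""
    return pnd + max(0.0, pfa - rho) / rho

# A classifier just under the Pfa budget scores only its miss rate,
# while one that overshoots the budget is heavily penalized.
print(round(validation_score(0.10, 0.04, rho=0.05), 4))
print(round(validation_score(0.05, 0.08, rho=0.05), 4))
```

Dividing the excess false-alarm rate by ρ makes the penalty scale-free, so a small overshoot of a tight budget counts as much as a large overshoot of a loose one.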