A Tutorial on the Cross-Entropy Method
Frequently Asked Questions (12)
Q2. What is the optimum sequence of mean vectors?
During the course of the algorithm, the sequence of mean vectors ideally tends to the maximizer x∗, while the vector of standard deviations tends to the zero vector.
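This behaviour can be illustrated with a minimal sketch of the CE method for continuous maximization with normal sampling densities; the objective S, the parameter values, and the variable names below are illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def S(x):
    # Toy objective to maximize; its maximizer is x* = 2.0.
    return -(x - 2.0) ** 2

mu, sigma = 0.0, 10.0   # initial mean and standard deviation
N, rho = 100, 0.1       # sample size and rarity parameter
n_elite = int(rho * N)

for t in range(50):
    x = rng.normal(mu, sigma, N)             # sample from N(mu, sigma^2)
    elite = x[np.argsort(S(x))[-n_elite:]]   # keep the rho*N best samples
    mu, sigma = elite.mean(), elite.std()    # update sampling parameters

# mu should approach the maximizer 2.0 while sigma shrinks towards 0.
```

The elite-sample update is the standard normal-updating rule for continuous CE; in practice a smoothing step (mixing old and new parameters) is often added to avoid premature shrinking of sigma.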
Q3. What is the important part of the algorithm?
Apart from specifying the family of sampling probability densities, the initial vector v̂0, the sample size N, and the rarity parameter ϱ (typically between 0.01 and 0.1), the algorithm is completely self-tuning.
Q4. What is the idea of the CE method?
The idea of the CE method is to choose the importance sampling density g in a specified class of densities such that the cross-entropy or Kullback-Leibler divergence between the optimal importance sampling density g∗ and g is minimal.
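In symbols, the quantity being minimized is the Kullback-Leibler divergence from g to g∗; the exact typography below is a standard rendering, assumed to match the paper's notation for g∗ and g:

```latex
D(g^*, g) = \int g^*(\mathbf{x}) \ln \frac{g^*(\mathbf{x})}{g(\mathbf{x})}\, d\mathbf{x}
          = \mathbb{E}_{g^*}\!\left[\ln \frac{g^*(\mathbf{X})}{g(\mathbf{X})}\right],
```

which is non-negative and equals zero exactly when g = g∗ almost everywhere.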
Q5. What is the penalty function for the i-th constraint?
the i-th penalty function Pi (corresponding to the i-th constraint) is defined as Pi(x) = Hi max(Gi(x), 0), (12) and Hi > 0 measures the importance (cost) of the i-th penalty.
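A small sketch of how such a penalty evaluates, under assumed constraints and weights (the functions G and weights H below are hypothetical, chosen only for illustration):

```python
def penalty(x, G, H):
    # P_i(x) = H_i * max(G_i(x), 0): zero when the i-th constraint
    # G_i(x) <= 0 holds, positive (scaled by H_i > 0) when violated.
    return [h * max(g(x), 0.0) for g, h in zip(G, H)]

# Hypothetical constraints G_1(x) = x - 3 <= 0 and G_2(x) = -x <= 0.
G = [lambda x: x - 3.0, lambda x: -x]
H = [10.0, 10.0]

print(penalty(2.0, G, H))  # [0.0, 0.0]: both constraints satisfied
print(penalty(5.0, G, H))  # [20.0, 0.0]: first constraint violated by 2
```

Adding these penalties to the objective turns the constrained program into an unconstrained one that the CE method can treat directly.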
Q6. What is the CE method for estimating the importance of a random variable?
The CE minimization procedure then reduces to finding an optimal reference parameter vector, v∗ say, by cross-entropy minimization.
Q7. What is the way to estimate the importance of a random variable?
In most cases of interest the sample performance function H is non-negative, and the “nominal” probability density f is parameterized by a finite-dimensional vector u; that is, f(x) = f(x; u).
Q8. What is the possible stopping criterion for the CE algorithm?
A possible stopping criterion is to stop when all standard deviations are smaller than some ε. Constrained optimization problems can be put in the framework (6) by taking X a (non-linear) region defined by some system of inequalities: Gi(x) ≤ 0, i = 1, . . . , L. (10) To solve the program (6) with constraints (10), two approaches can be adopted.
Q9. What is the CE method for determining a good reference parameter?
The CE method for optimization produces a sequence of levels {γ̂t} and reference parameters {v̂t} such that the former tends to the optimal γ∗ and the latter to the optimal reference vector v∗ corresponding to the point mass at x∗; see, e.g., [46, page 251].
Q10. What is the simplest method for estimating rare events?
Generate a sample X1, . . . , XN1 according to the probability density f(·; v̂T) and estimate ℓ via importance sampling, as in (3).
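A minimal sketch of this final estimation step, assuming one-dimensional normal densities and a tilted reference parameter v_T as if returned by the CE phase (the value v_T = 4.0 and all names below are illustrative):

```python
import math
import numpy as np

rng = np.random.default_rng(1)

def npdf(x, mu, sigma):
    # Density of N(mu, sigma^2).
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Rare event {X >= gamma} under the nominal density f(.; u) = N(0, 1).
gamma = 4.0
v_T = 4.0  # assumed reference parameter from the CE phase (illustrative)

N1 = 100_000
X = rng.normal(v_T, 1.0, N1)                  # sample from f(.; v_T)
W = npdf(X, 0.0, 1.0) / npdf(X, v_T, 1.0)     # likelihood ratios f(X;u)/f(X;v_T)
ell_hat = np.mean((X >= gamma) * W)           # importance-sampling estimate of ell

exact = 0.5 * math.erfc(gamma / math.sqrt(2))  # exact P(N(0,1) >= 4) ≈ 3.17e-5
```

Sampling from the tilted density places most draws near the rare region, so the weighted estimate ell_hat attains a far smaller relative error than crude Monte Carlo would at this sample size.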
Q11. What is the way to solve a combinatorial optimization problem?
In particular, it is shown that with appropriate smoothing the CE method converges and finds the optimal solution with probability arbitrarily close to 1.
Q12. What types of problems can be used for the CE method?
This setting includes many types of optimization problems: discrete (combinatorial), continuous, mixed, and constrained problems.