Alternating optimization

Background

What is alternating optimization?

Alternating optimization is an iterative procedure for optimizing some function jointly over all parameters by alternating restricted optimization over individual parameter subsets.

More precisely, consider minimizing or maximizing \(f:\mathbb{R}^n \to \mathbb{R}\) over the set of feasible points \(x \in X \subseteq \mathbb{R}^n\). The underlying algorithm of alternating optimization is as follows:

Assign an initial value for \(x\).
Optimize \(f\) with respect to a subset of parameters \(\tilde{x}\) while holding the other parameters constant. Note that alternating optimization is a generalization of joint optimization, where the only parameter subset would be the whole set of parameters.
Replace the values in \(x\) by the optimal values for \(\tilde{x}\) found in step 2.
Repeat from step 2 with another parameter subset.
Stop when the process has converged or reached the iteration limit.

When is alternating optimization a good idea?

When the joint optimization is numerically difficult or not feasible.
When there is a natural division of the parameters. That is the case for likelihood functions, where the parameter space naturally divides into parameter subsets, e.g. corresponding to linear effects, variances and covariances with different influence on the likelihood value.
To improve optimization time in some cases, see (Hu and Hathaway 2002) for an example.
Compared to joint optimization, alternating optimization may be better in bypassing local optima, see (James C. Bezdek and Hathaway 2002).

What are the properties of alternating optimization?

Alternating optimization under certain conditions on \(f\) can convergence to the global optimum. However, the set of possible solutions also contains saddle points (James C. Bezdek et al. 1987). J. Bezdek and Hathaway (2003) provides additional details on the convergence rate.

Application

As an application, we consider minimizing the Himmelblau’s function in \(n = 2\) dimensions with parameter constraints \(-5 \leq x_1, x_2 \leq 5\):

library("ao")
#> Loading required package: optimizeR
#> Thanks for using {ao} version 0.3.3, happy alternating optimization!
#> Documentation: https://loelschlaeger.de/ao
himmelblau <- function(x) (x[1]^2 + x[2] - 11)^2 + (x[1] + x[2]^2 - 7)^2
ao(
  f = himmelblau, p = c(0, 0), partition = list(1, 2),
  base_optimizer = optimizeR::Optimizer$new(
    which = "stats::optim", lower = -5, upper = 5, method = "L-BFGS-B"
  )
)
#> $value
#> [1] 1.940035e-12
#> 
#> $estimate
#> [1]  3.584428 -1.848126
#> 
#> $sequence
#>    iteration partition        value      seconds       p1        p2
#> 1          0        NA 1.700000e+02 0.0000000000 0.000000  0.000000
#> 2          1         1 1.327270e+01 0.0093390942 3.395691  0.000000
#> 3          1         2 1.743666e+00 0.0009489059 3.395691 -1.803183
#> 4          2         1 2.847292e-02 0.0007188320 3.581412 -1.803183
#> 5          2         2 4.687472e-04 0.0006120205 3.581412 -1.847412
#> 6          3         1 7.368063e-06 0.0011599064 3.584381 -1.847412
#> 7          3         2 1.157612e-07 0.0004951954 3.584381 -1.848115
#> 8          4         1 1.900153e-09 0.0004329681 3.584427 -1.848115
#> 9          4         2 4.221429e-11 0.0003859997 3.584427 -1.848126
#> 10         5         1 3.598278e-12 0.0003240108 3.584428 -1.848126
#> 11         5         2 1.940035e-12 0.0003900528 3.584428 -1.848126
#> 
#> $seconds
#> [1] 0.01480699

The call minimizes f by alternating optimizing with respect to each parameter separately, where the parameters all are initialized at the value 0.

References

Bezdek, James C., and Richard J. Hathaway. 2002. “Some Notes on Alternating Optimization.” Proceedings of the 2002 AFSS International Conference on Fuzzy Systems. Calcutta: Advances in Soft Computing. https://doi.org/10.1007/3-540-45631-7_39.

Bezdek, James C, Richard J Hathaway, Michael J Sabin, and William T Tucker. 1987. “Convergence Theory for Fuzzy c-Means: Counterexamples and Repairs.” IEEE Transactions on Systems, Man, and Cybernetics 17 (5): 873–77. https://doi.org/10.1109/TSMC.1987.6499296.

Bezdek, James, and Richard Hathaway. 2003. “Convergence of Alternating Optimization.” Neural, Parallel and Scientific Computations 11 (December): 351–68.

Hu, Yingkang, and Richard J Hathaway. 2002. “On Efficiency of Optimization in Fuzzy c-Means.” Neural, Parallel and Scientific Computations 10.