Momentum for the Frank Wolfe Method

Published Date

2022-05

Type

Thesis or Dissertation

Abstract

Modern machine learning tasks that learn from data can typically be formulated as optimization problems. The large volume of data creates a pressing need for efficient and scalable iterative algorithms, designed specifically to accommodate the computational resources at hand and the requirement of structured (e.g., sparse) solutions. Conditional gradient algorithms, also known as Frank Wolfe (FW) algorithms, have well-documented merits in machine learning and signal processing applications that involve minimizing a loss function subject to constraints. Compared to projection-based methods, a key benefit of FW is that it avoids the projection step, which can be computationally heavy. Unlike in projection-based methods, however, momentum cannot improve the convergence rate of FW in general. For this reason, momentum is relatively understudied in the FW literature. This limitation motivates the work in this dissertation. Chapter 2 deals with heavy ball momentum and its impact on FW. Specifically, it is established that heavy ball offers a unifying perspective on primal-dual (PD) convergence, and enjoys a tighter per-iteration PD error rate for multiple choices of step sizes, where the PD error can serve as a stopping criterion in practice. In addition, it is shown that restart, a scheme typically employed jointly with Nesterov's momentum, can further tighten this PD error bound. Going beyond heavy ball momentum, Chapter 3 establishes the connections between the subproblem in FW and Nesterov's momentum. On the negative side, these connections show why momentum is unlikely to be effective for FW-type algorithms on general problems. The encouraging message behind this link, on the other hand, is that Nesterov's momentum does accelerate FW on a class of problems encountered in many signal processing and machine learning applications.
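To make the heavy ball idea concrete, the following is a minimal, hypothetical sketch of FW over an $\ell_1$-ball constraint in which the linear minimization oracle is fed a momentum-averaged gradient rather than the instantaneous one. The function names, the $\ell_1$ constraint, and the averaging weight `delta` are illustrative assumptions, not the dissertation's exact HFW update.

```python
import numpy as np

def lmo_l1(grad, radius):
    # Linear minimization oracle over the l1 ball:
    # argmin_{||v||_1 <= radius} <grad, v> is a signed vertex.
    i = np.argmax(np.abs(grad))
    v = np.zeros_like(grad)
    v[i] = -radius * np.sign(grad[i])
    return v

def fw_heavy_ball(grad_fn, x0, radius, iters=100, delta=0.5):
    # FW where the LMO sees a momentum-averaged gradient g
    # (heavy-ball flavor) instead of grad_fn(x) directly.
    x, g = x0.copy(), np.zeros_like(x0)
    for k in range(iters):
        eta = 2.0 / (k + 2)                       # parameter-free FW step size
        g = (1 - delta) * g + delta * grad_fn(x)  # momentum averaging
        v = lmo_l1(g, radius)
        x = (1 - eta) * x + eta * v               # convex-combination update
    return x
```

Because every iterate is a convex combination of $\ell_1$-ball vertices, feasibility holds automatically, with no projection needed.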
In particular, we prove that a momentum variant of FW, which we term accelerated Frank Wolfe (AFW), converges at the faster rate ${\cal O}(\frac{1}{k^2})$ on such a family of problems, while retaining the same ${\cal O}(\frac{1}{k})$ rate as FW in the general case. Our faster rates rely on parameter-free step sizes, which distinguishes them from most existing faster rates for FW variants. Chapter 4 introduces and analyzes a variant of FW termed ExtraFW. The distinct feature of ExtraFW is the pair of gradients leveraged per iteration, thanks to which the decision variable is updated in a prediction-correction (PC) format. Relying on no problem-dependent parameters in its step sizes, ExtraFW converges at the faster rate ${\cal O}\big(\frac{1}{k^2} \big)$ on a class of machine learning problems. Compared with other parameter-free FW variants that achieve faster rates on the same problems, such as AFW, ExtraFW has improved rates and a more fine-grained analysis thanks to its PC update. Numerical tests on binary classification with different sparsity-promoting constraints demonstrate that the empirical performance of HFW (FW with heavy ball momentum), AFW, and ExtraFW is significantly better than that of FW. We also observe that AFW and ExtraFW are even faster than Nesterov's accelerated gradient on certain datasets, even though they rely on no problem-dependent parameters. For matrix completion, the solutions found by HFW, AFW, and ExtraFW enjoy a smaller optimality gap and lower rank than those found by FW.
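As a rough illustration of the prediction-correction pattern, the sketch below evaluates two gradients per iteration: one at the current iterate to form a predicted point, and one at that predicted point to produce the corrected update. This is a generic extra-gradient-style template under illustrative assumptions (an $\ell_1$-ball constraint); ExtraFW's actual update employs a specific gradient-averaging scheme that this sketch does not reproduce.

```python
import numpy as np

def l1_vertex(grad, radius):
    # argmin_{||v||_1 <= radius} <grad, v>: a signed vertex of the
    # l1 ball (the zero vector if the gradient vanishes).
    i = np.argmax(np.abs(grad))
    v = np.zeros_like(grad)
    v[i] = -radius * np.sign(grad[i])
    return v

def extra_fw(grad_fn, x0, radius, iters=100):
    # Prediction-correction FW sketch: two gradient calls per iteration.
    x = x0.copy()
    for k in range(iters):
        eta = 2.0 / (k + 2)                        # parameter-free step size
        v = l1_vertex(grad_fn(x), radius)          # prediction direction
        x_pred = (1 - eta) * x + eta * v           # tentative (predicted) iterate
        v = l1_vertex(grad_fn(x_pred), radius)     # gradient at predicted point
        x = (1 - eta) * x + eta * v                # corrected update
    return x
```

The second oracle call lets the update react to the gradient at roughly the point the iterate is heading toward, which is the intuition behind the PC format described above.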

Description

University of Minnesota Ph.D. dissertation. May 2022. Major: Electrical Engineering. Advisor: Georgios Giannakis. 1 computer file (PDF); x, 116 pages.

Suggested citation

Li, Bingcong. (2022). Momentum for the Frank Wolfe Method. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/241422.
