A Stochastic Modified Limited Memory BFGS for Training Deep Neural Networks
Yousefi M.; Martinez Calomardo A.
2022-01-01
Abstract
In this work, we study stochastic quasi-Newton methods for solving the non-linear, non-convex optimization problems arising in the training of deep neural networks. We consider the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) update within a trust-region framework. We provide a broad overview of recent improvements in quasi-Newton based training algorithms, such as careful selection of the initial Hessian approximation, efficient solution of the trust-region subproblem to high accuracy with a direct method, and an overlap sampling strategy that ensures stable quasi-Newton updating by computing gradient differences on the overlap between consecutive batches. We compare the standard L-BFGS method with a variant based on a modified secant condition, which is theoretically shown to approximate the curvature of the objective function to a higher order of accuracy. In our experiments, both quasi-Newton updates exhibit comparable performance. Our results show that, within a fixed computational time budget, the proposed quasi-Newton methods provide testing accuracy comparable to or better than that of the state-of-the-art first-order Adam optimizer.
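To make the abstract's ingredients concrete, the block below sketches the trust-region subproblem solved at each iteration, the standard secant condition behind BFGS, and one widely used modified secant condition (the Zhang-Deng-Chen form, which incorporates function values as well as gradients). This is an illustration drawn from the standard literature, not an excerpt from the paper; the exact modified condition the authors adopt may differ.

```latex
% Trust-region subproblem at iterate w_k: g_k is the (stochastic)
% batch gradient, B_k the L-BFGS Hessian approximation, delta_k the radius.
\min_{p \in \mathbb{R}^n} \ g_k^\top p + \tfrac{1}{2}\, p^\top B_k p
\quad \text{subject to} \quad \|p\|_2 \le \delta_k

% Standard secant condition defining the BFGS update:
B_{k+1} s_k = y_k, \qquad s_k = w_{k+1} - w_k, \qquad y_k = g_{k+1} - g_k

% Modified secant condition of Zhang-Deng-Chen type (with u_k = s_k),
% which also uses the function values f_k and f_{k+1}:
B_{k+1} s_k = \bar{y}_k, \qquad
\bar{y}_k = y_k + \frac{\psi_k}{s_k^\top s_k}\, s_k, \qquad
\psi_k = 6\,(f_k - f_{k+1}) + 3\,(g_k + g_{k+1})^\top s_k

% Gain of one order of accuracy in the curvature along s_k:
s_k^\top \nabla^2 f(w_{k+1})\, s_k - s_k^\top y_k = O(\|s_k\|^3),
\qquad
s_k^\top \nabla^2 f(w_{k+1})\, s_k - s_k^\top \bar{y}_k = O(\|s_k\|^4)
```

The overlap sampling strategy can likewise be sketched in a few lines of Python. This is a minimal sketch, assuming a hypothetical helper `grad_fn(w, idx)` that returns the minibatch gradient of the loss at weights `w` over the sample indices `idx`:

```python
def overlap_curvature_pair(grad_fn, w_prev, w_next, overlap_idx):
    """Curvature pair (s, y) for stable stochastic quasi-Newton updating.

    grad_fn(w, idx) is a hypothetical helper returning the minibatch
    gradient of the loss at weights w over the sample indices idx.
    Evaluating the gradient difference on the SAME overlap samples at
    both iterates removes sampling noise from y, so the pair reflects
    genuine curvature information rather than batch-to-batch variation.
    """
    s = w_next - w_prev
    y = grad_fn(w_next, overlap_idx) - grad_fn(w_prev, overlap_idx)
    return s, y


def accept_pair(s, y, eps=1e-8):
    """Cautious-update test: keep (s, y) only if it carries positive
    curvature, preserving positive definiteness of the L-BFGS matrix."""
    return float(y @ s) > eps * float(s @ s)


# Toy usage with the quadratic loss f(w) = 0.5 * ||w||^2, whose gradient
# is w regardless of the batch:
import numpy as np
grad_fn = lambda w, idx: w
s, y = overlap_curvature_pair(grad_fn, np.zeros(3), np.ones(3), [0, 1])
assert accept_pair(s, y)
```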
File | Access | Type | License | Size | Format
---|---|---|---|---|---
518007_1_En_2_Chapter_Author.pdf | Open Access from 08/07/2023 | Final post-refereeing draft (post-print) | Publisher copyright | 1.3 MB | Adobe PDF
Intelligent Computing_ Proceedings of the 2022 Computing Conference,.pdf | Closed access | Published version | Publisher copyright | 2.35 MB | Adobe PDF