Hyperparameter Optimization: Distributed Hardware-Aware and Homotopy-Based Strategies

Keywords: machine learning, hyperparameters, optimization, distributed computing, homotopy
Spring 2017 - Present

Description

Computer vision is experiencing an AI renaissance, in which machine learning models are expediting important breakthroughs in academic research and commercial applications. Effectively training these models, however, is not trivial due in part to hyperparameters: user-configured values that control a model’s ability to learn from data. Existing hyperparameter optimization methods are highly parallel but make no effort to balance the search across heterogeneous hardware or to prioritize searching high-impact spaces.

In this work, we introduce a framework for massively Scalable Hardware-Aware Distributed Hyperparameter Optimization (SHADHO). Our framework calculates the relative complexity of each search space and monitors performance on the learning task over all trials. These metrics are then used as heuristics to assign hyperparameters to distributed workers based on their hardware. We first demonstrate that our framework achieves double the throughput of a standard distributed hyperparameter optimization framework by optimizing an SVM for MNIST across 150 distributed workers. We then conduct a model search with SHADHO over the course of one week, using 74 GPUs across two compute clusters to optimize U-Net for a cell segmentation task and discovering 515 models that achieve a lower validation loss than standard U-Net.
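
The following is a minimal illustrative sketch of the scheduling idea described above; it is not the SHADHO API, and the class names, fields, and priority formula are assumptions chosen only to convey how complexity and observed performance can steer work toward more capable hardware.

    # Illustrative sketch only -- not the SHADHO API. Names, fields, and the
    # priority heuristic below are assumptions used to convey the idea.
    from dataclasses import dataclass

    @dataclass
    class SearchSpace:
        name: str
        complexity: float   # e.g., grows with the number and size of hyperparameter domains
        best_loss: float    # best validation loss observed so far for this space

    @dataclass
    class WorkerClass:
        name: str
        throughput: float   # relative hardware capability, e.g., trials completed per hour

    def assign_spaces(spaces, workers):
        """Pair high-priority search spaces with high-throughput hardware classes."""
        # Heuristic: complex spaces that are also performing well get the best hardware.
        ranked_spaces = sorted(spaces, key=lambda s: s.complexity / (s.best_loss + 1e-8), reverse=True)
        ranked_workers = sorted(workers, key=lambda w: w.throughput, reverse=True)
        return {s.name: w.name for s, w in zip(ranked_spaces, ranked_workers)}

    if __name__ == "__main__":
        spaces = [SearchSpace("svm_rbf", complexity=3.0, best_loss=0.08),
                  SearchSpace("svm_linear", complexity=1.0, best_loss=0.12)]
        workers = [WorkerClass("gpu_nodes", 40.0), WorkerClass("cpu_nodes", 10.0)]
        print(assign_spaces(spaces, workers))  # {'svm_rbf': 'gpu_nodes', 'svm_linear': 'cpu_nodes'}

In SHADHO itself, analogous signals are computed continuously during the search, so assignments can shift as some spaces prove more promising than others.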

This project also investigates hyperparameter optimization strategies that can enhance existing approaches. Many hyperparameter optimization techniques rely on naive search methods or assume that the loss function is smooth and continuous, which may not always be the case. Traditional methods, like grid search and Bayesian optimization, often struggle to adapt quickly and search the loss landscape efficiently: grid search is computationally expensive, while Bayesian optimization can be slow to warm up, requiring many evaluations before its surrogate becomes informative. Because the search space for hyperparameter optimization is frequently high-dimensional and non-convex, finding a global minimum efficiently is challenging. Moreover, optimal hyperparameters can be sensitive to the specific dataset or task, further complicating the search. To address these issues, we propose a new hyperparameter optimization method, HomOpt, which uses a data-driven approach based on a generalized additive model (GAM) surrogate combined with homotopy optimization. This strategy augments established optimization methodologies to boost the performance and effectiveness of any given method through faster convergence to the optimum on continuous, discrete, and categorical domains.
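
The sketch below illustrates the homotopy idea under simplifying assumptions: it blends an inexpensive surrogate of the loss with the true objective and tracks the minimizer as the blend shifts toward the true objective. The published HomOpt method fits a GAM surrogate to observed trials; a simple quadratic surrogate stands in here to keep the example self-contained, and every function name and data value is hypothetical.

    # Schematic sketch of homotopy optimization with a surrogate (assumptions noted above).
    import numpy as np
    from scipy.optimize import minimize

    def true_objective(x):
        # Stand-in for an expensive validation-loss evaluation (hypothetical).
        return np.sin(3.0 * x[0]) + 0.1 * x[0] ** 2

    def fit_quadratic_surrogate(xs, ys):
        # Least-squares fit of y ~ a*x^2 + b*x + c to previously observed trials.
        A = np.column_stack([xs ** 2, xs, np.ones_like(xs)])
        coeffs, *_ = np.linalg.lstsq(A, ys, rcond=None)
        return lambda x: coeffs[0] * x[0] ** 2 + coeffs[1] * x[0] + coeffs[2]

    def homotopy_minimize(surrogate, objective, x0, steps=5):
        """Minimize H(x, t) = (1 - t) * surrogate(x) + t * objective(x),
        warm-starting each subproblem from the previous solution."""
        x = np.asarray(x0, dtype=float)
        for t in np.linspace(0.0, 1.0, steps):
            blended = lambda z, t=t: (1.0 - t) * surrogate(z) + t * objective(z)
            x = minimize(blended, x, method="Nelder-Mead").x
        return x

    if __name__ == "__main__":
        # Observations from a handful of earlier trials (hypothetical data).
        xs = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
        ys = np.array([true_objective(np.array([v])) for v in xs])
        surrogate = fit_quadratic_surrogate(xs, ys)
        x_star = homotopy_minimize(surrogate, true_objective, x0=[0.5])
        print("candidate hyperparameter:", x_star, "loss:", true_objective(x_star))

Starting from the easier surrogate problem and gradually deforming it toward the true objective is what allows the method to cope with non-smooth, non-convex loss landscapes better than optimizing the true objective directly from an arbitrary initial point.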

This work was supported by IARPA contract #D16PC00002, the NVIDIA Corporation, DEVCOM Army Research Laboratory under cooperative agreement W911NF-20-2-0218, and the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357.

Publications

  • "Multi-Objective Hyperparameter Optimization with Homotopy-Based Strategies for
    Enhanced Automatic Target Recognition Models,"
    Sophia Abraham, Steve Cruz, Suya You,
    Jonathan Hauenstein, Walter Scheirer,
    Proceedings of the SPIE Defense + Commercial Sensing Symposium
    (Best Student Paper Award),
    April 2024.
  • "NCQS: Nonlinear Convex Quadrature Surrogate Hyperparameter Optimization,"
    Sophia Abraham, Kehelwala Maduranga, Jeffery Kinnison, Jonathan Hauenstein,
    Walter Scheirer,
    Proceedings of the 1st Workshop on Resource Efficient Deep Learning for Computer Vision,
    October 2023.
  • "Efficient Hyperparameter Optimization for ATR Using Homotopy Parametrization,"
    Sophia Abraham, Jeffery Kinnison, Zachary Miksis, Domenick Poster, Suya You,
    Jonathan Hauenstein, Walter Scheirer,
    Proceedings of the SPIE Defense + Commercial Sensing Symposium
    (Best Student Paper Award),
    April 2023.
  • "Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance
    for Low-Resource Machine Translation,"
    Kenton Murray, Jeff Kinnison, Toan Nguyen, Walter J. Scheirer, David Chiang,
    Proceedings of the Workshop on Neural Generation and Translation (WNGT),
    November 2019.
  • "SHADHO: Massively Scalable Hardware-Aware Distributed Hyperparameter
    Optimization,"
    Jeff Kinnison, Nathaniel Kremer-Herman, Douglas Thain, Walter J. Scheirer,
    Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV),
    March 2018.