AdamMCMC: Bayesian Deep Learning

Figure: Uncertainty decomposition via AdamMCMC on the CIFAR-10 test set and its impact on prediction quality.

Uncertainty quantification is essential for deploying deep learning in safety-critical applications such as autonomous systems, medicine, and scientific discovery, yet existing methods sacrifice accuracy through parametric approximations.

We develop AdamMCMC, a novel Markov chain Monte Carlo (MCMC) algorithm that bridges stochastic optimization and Bayesian inference. AdamMCMC wraps the popular Adam optimizer inside a Metropolis-adjusted Langevin algorithm (MALA) framework, using a *prolate proposal distribution* that is elongated along the Adam update direction. This keeps acceptance rates high at the step sizes commonly used in deep learning, while the Metropolis-Hastings correction guarantees convergence to the exact Gibbs posterior.
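As a rough illustration of the idea, here is a minimal NumPy sketch of a Metropolis-Hastings step whose Gaussian proposal is centered on an Adam-like update and stretched along that update direction. It is not the authors' implementation. The toy Gaussian target, the momentum-free Adam step (beta1 = beta2 = 0, so the step depends only on the current position), the covariance form `sigma^2 I + s^2 u u^T`, and all function and parameter names are assumptions made for this example.

```python
import numpy as np

def log_target(theta):
    # Hypothetical toy target: unnormalized log-density of a standard Gaussian.
    # In the setting above this role is played by the log of the Gibbs posterior.
    return -0.5 * float(theta @ theta)

def adam_like_step(theta, lr=0.1, eps=1e-8):
    # Assumption: Adam with beta1 = beta2 = 0, so the step is a function of theta only.
    grad = -theta  # gradient of log_target (we ascend the log-density)
    return lr * grad / (np.abs(grad) + eps)

def log_q(x, theta, sigma, s):
    # Log-density at x of N(theta + u, sigma^2 I + s^2 u u^T), with u = adam_like_step(theta).
    # The inverse and determinant use the rank-one (Sherman-Morrison) closed form.
    u = adam_like_step(theta)
    r = x - theta - u
    d = theta.size
    denom = sigma**2 + s**2 * float(u @ u)
    quad = (float(r @ r) - s**2 * float(u @ r) ** 2 / denom) / sigma**2
    log_det = 2 * (d - 1) * np.log(sigma) + np.log(denom)
    return -0.5 * (d * np.log(2 * np.pi) + log_det + quad)

def mh_step(theta, rng, sigma=0.5, s=1.0):
    u = adam_like_step(theta)
    # Isotropic noise plus extra noise along u yields the prolate covariance.
    prop = (theta + u
            + sigma * rng.standard_normal(theta.size)
            + s * u * rng.standard_normal())
    # Metropolis-Hastings log-ratio; the reverse proposal uses the step taken at prop.
    log_alpha = (log_target(prop) + log_q(theta, prop, sigma, s)
                 - log_target(theta) - log_q(prop, theta, sigma, s))
    if np.log(rng.uniform()) < log_alpha:
        return prop, True
    return theta, False

def run_chain(n_steps=20000, dim=2, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.ones(dim)
    samples = np.empty((n_steps, dim))
    n_accepted = 0
    for i in range(n_steps):
        theta, accepted = mh_step(theta, rng)
        samples[i] = theta
        n_accepted += accepted
    return samples, n_accepted / n_steps
```

Calling `samples, acc_rate = run_chain()` returns the chain and its empirical acceptance rate. Because the accept/reject test includes both the forward and the reverse proposal density, the asymmetric, direction-dependent proposal still leaves the target invariant. A real Adam step carries momentum state, which this sketch deliberately leaves out.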

We cover the following aspects:

  • Theory: Because AdamMCMC uses a Metropolis-Hastings correction, the chain converges to the exact Gibbs posterior, unlike uncorrected stochastic gradient methods, which only approximate it at finite step sizes. We characterize this convergence in total variation distance.
  • Scaling: The algorithm is adapted for multi-GPU training via Distributed Data Parallel (DDP), adding only about 15% training-time overhead relative to standard Adam on 8 GPUs.
  • Uncertainty disentanglement: We apply AdamMCMC to a large-scale benchmark that separates aleatoric (data) from epistemic (model) uncertainty. The results suggest that how flat the loss landscape is around a minimum plays an important role in how well a method can disentangle the two uncertainty types.
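To make the aleatoric/epistemic split concrete, the sketch below applies one common information-theoretic decomposition to class probabilities predicted by several posterior weight samples. The text above does not say which decomposition the benchmark uses, so the entropy-based choice here is an assumption, as are the function names and the array layout `(n_samples, n_inputs, n_classes)`.

```python
import numpy as np

def entropy(p, axis=-1, eps=1e-12):
    # Shannon entropy in nats; eps guards against log(0).
    return -np.sum(p * np.log(p + eps), axis=axis)

def decompose_uncertainty(probs):
    # probs[s, n, c]: probability of class c for input n under posterior sample s.
    mean_probs = probs.mean(axis=0)          # posterior-predictive distribution
    total = entropy(mean_probs)              # entropy of the averaged prediction
    aleatoric = entropy(probs).mean(axis=0)  # average entropy of each sample's prediction
    epistemic = total - aleatoric            # mutual information between label and weights
    return total, aleatoric, epistemic
```

When all posterior samples agree on the same uncertain prediction, the epistemic term vanishes and the total is purely aleatoric. When they make confident but conflicting predictions, the aleatoric term vanishes and the total is purely epistemic.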

AdamMCMC makes principled Bayesian inference over neural network weights practical at scale, offering a drop-in replacement for Adam with rigorous uncertainty estimates.