The Cold Posterior Effect in Random Features Models: A Theoretical Explanation

Jun Li1, Xiao Bai1, Jin Zheng1
1School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing, China
DOI: https://doi.org/10.71448/bcds2452-3
Published: 30/06/2024
Cite this article as: Jun Li, Xiao Bai, Jin Zheng. The Cold Posterior Effect in Random Features Models: A Theoretical Explanation. Bulletin of Computer and Data Sciences, Volume 5 Issue 2. Page: 30-46.

Abstract

The “cold posterior effect”—where raising the Bayesian posterior to a power greater than 1 improves predictive performance—remains one of the most puzzling empirical phenomena in Bayesian deep learning. While numerous heuristic explanations have been proposed, a rigorous theoretical understanding remains elusive. In this paper, we provide the first theoretical analysis of this effect through the lens of random features regression. We prove that in the overparameterized regime, the posterior predictive distribution becomes systematically over-dispersed relative to the true risk of the maximum a posteriori (MAP) estimator. This miscalibration naturally suggests tempering the posterior to achieve better uncertainty quantification. Using recent asymptotic results for Bayesian random features models, we derive explicit conditions under which cold tempering improves frequentist coverage of credible sets and characterize the optimal temperature parameter. Our theoretical results are validated by numerical experiments and provide a mathematically grounded explanation for why cold posteriors work in practice.

Keywords: cold posterior effect, Bayesian deep learning, random features regression, posterior tempering, uncertainty calibration
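
For concreteness, the tempering convention the abstract refers to can be sketched as follows (the notation here is ours, not necessarily the paper's). The tempered posterior at temperature $T$ raises the Bayesian posterior to the power $1/T$, so a "cold" temperature $T < 1$ corresponds to a power greater than 1:

$$p_T(\theta \mid D) \;\propto\; \big[\, p(D \mid \theta)\, p(\theta) \,\big]^{1/T} \;\propto\; p(\theta \mid D)^{1/T}, \qquad T < 1 \ \text{(cold)}.$$

In the conjugate Gaussian setting of random features regression, if $p(\theta \mid D) = \mathcal{N}(\hat\theta_{\mathrm{MAP}}, \Sigma)$, then $p_T(\theta \mid D) = \mathcal{N}(\hat\theta_{\mathrm{MAP}}, T\Sigma)$: tempering leaves the MAP estimator unchanged and rescales the posterior covariance by $T$, so cold tempering shrinks credible sets and can counteract the over-dispersion described above.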

Copyright © 2024 Jun Li, Xiao Bai, Jin Zheng. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
