We propose adaptive weight decay, which automatically tunes the weight-decay hyper-parameter at every training iteration. For classification problems, we propose changing the value of the weight-decay hyper-parameter on the fly based on the strength of updates from the classification loss (i.e., the gradient of the cross-entropy) and the regularization loss (i.e., the ℓ2-norm of the weights). We show that this simple modification can result in large improvements in adversarial robustness, an area which suffers from robust overfitting, without requiring extra data, across various datasets and architecture choices. For example, our reformulation yields a 20% relative robustness improvement on CIFAR-100 and a 10% relative robustness improvement on CIFAR-10 compared to the best-tuned hyper-parameters of traditional weight decay, resulting in models with performance comparable to SOTA robustness methods. In addition, this method has other desirable properties, such as lower sensitivity to the learning rate and smaller weight norms; the latter contributes to robustness against overfitting to label noise, and to pruning.
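The idea of balancing the two update terms can be illustrated with a minimal sketch. Here, purely as an assumption of ours, the adaptive coefficient at each step is set proportional to the ratio between the cross-entropy gradient norm and the weight norm, so that the decay update stays in proportion to the classification update; the toy data, the meta hyper-parameter name `lambda_awd`, and the logistic-regression setting are all illustrative stand-ins, not the paper's exact experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linearly-generated binary classification data (illustrative only).
X = rng.normal(size=(200, 10))
true_w = rng.normal(size=10)
y = (X @ true_w > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy_grad(w, X, y):
    # Gradient of the mean binary cross-entropy for logistic regression.
    p = sigmoid(X @ w)
    return X.T @ (p - y) / len(y)

lambda_awd = 0.1  # assumed meta hyper-parameter controlling the ratio (our name)
lr = 0.5
w = np.zeros(10)

for step in range(500):
    g = cross_entropy_grad(w, X, y)
    # Adaptive weight-decay coefficient: rescale the penalty by the ratio of
    # the classification-gradient norm to the current weight norm, so the two
    # update terms keep comparable magnitudes throughout training.
    w_norm = np.linalg.norm(w)
    lam_t = lambda_awd * np.linalg.norm(g) / w_norm if w_norm > 0 else 0.0
    w -= lr * (g + lam_t * w)
```

Compared to a fixed decay coefficient, the penalty here shrinks automatically as the classification gradients shrink, which is one way to read the reduced sensitivity to tuning described above.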