On page 24 of The Elements of Statistical Learning (ESL) by Hastie et al., the bias-variance decomposition is stated but not derived. The derivation turns out to be straightforward, if a bit tedious. I present it here using notation similar to ESL's, in the hope that it saves someone some time.
I'd also like to credit these notes, which provided the trick needed for the derivation but unfortunately left out the gory details.
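To recap the notation used in ESL: $\mathcal{T}$ is the training set, $x_0$ is a test point, $f(x_0)$ is the true (noise-free) value at that point, and $\hat{y}_0$ is the prediction at $x_0$ from a model fit to $\mathcal{T}$. Because $\hat{y}_0$ depends on the randomly drawn training set it is a random variable, while $f(x_0)$ is a constant. $\mathrm{E}_\mathcal{T}$ denotes expectation over training sets.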
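Recall the definitions of the variance and the squared bias (as in ESL, $\mathrm{E}_\mathcal{T}[\cdot]^2$ means the expectation of the squared quantity):

$$\mathrm{Var}_\mathcal{T}(\hat{y}_0) = \mathrm{E}_\mathcal{T}\left[\hat{y}_0 - \mathrm{E}_\mathcal{T}(\hat{y}_0)\right]^2, \qquad \mathrm{Bias}^2(\hat{y}_0) = \left[\mathrm{E}_\mathcal{T}(\hat{y}_0) - f(x_0)\right]^2.$$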
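To make the two definitions concrete, here is a minimal Python sketch that estimates both quantities from simulated predictions. It assumes you already have one prediction $\hat{y}_0$ per simulated training set and the true value $f(x_0)$; the function name `variance_and_bias_squared` is hypothetical, and $\mathrm{E}_\mathcal{T}$ is approximated by a plain average (dividing by the number of training sets, not one less).

```python
from statistics import fmean


def variance_and_bias_squared(preds, truth):
    """Monte Carlo estimates of Var_T(y0_hat) and Bias^2(y0_hat).

    preds: predictions y0_hat at x0, one per simulated training set T.
    truth: the true value f(x0), a constant.
    (Hypothetical helper, for illustration only.)
    """
    mean_pred = fmean(preds)  # estimate of E_T(y0_hat)
    # Var_T(y0_hat) = E_T[y0_hat - E_T(y0_hat)]^2, averaging over all R sets
    variance = fmean((p - mean_pred) ** 2 for p in preds)
    # Bias^2(y0_hat) = [E_T(y0_hat) - f(x0)]^2
    bias_squared = (mean_pred - truth) ** 2
    return variance, bias_squared


var, bias2 = variance_and_bias_squared([1.0, 2.0, 3.0], truth=2.5)
print(var, bias2)
```

For predictions 1, 2 and 3 with $f(x_0) = 2.5$, the average prediction is 2, so this gives a variance of 2/3 and a squared bias of 0.25.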
The mean-squared error at $x_0$ is

$$\mathrm{MSE}(x_0) = \mathrm{E}_\mathcal{T}\left[f(x_0) - \hat{y}_0\right]^2.$$
The big trick required to get the result is to simultaneously add and subtract
Note that the underlined terms used in the third step are combined as the first two lines of the fourth step and are the terms that make up the variance and bias squared.
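As a numerical sanity check, the following Python sketch simulates the 1-nearest-neighbor setup that ESL discusses around this decomposition (training points uniform on $[-1, 1]^p$, $f(X) = e^{-8\|X\|^2}$ with no measurement error, prediction at $x_0 = 0$). The sample size, dimension and number of repetitions are arbitrary choices kept small for speed (ESL uses 1000 training points), and the helper names are hypothetical. Replacing $\mathrm{E}_\mathcal{T}$ with an average over simulated training sets leaves the algebra of the derivation unchanged, so the MSE should match variance plus squared bias up to floating-point rounding.

```python
import math
import random
from statistics import fmean


def f(x):
    # True function in ESL's 1-NN example: f(X) = exp(-8 ||X||^2), no noise.
    return math.exp(-8.0 * sum(xi * xi for xi in x))


def one_nn_prediction(n, p, rng):
    """Draw one training set T of n points uniform on [-1, 1]^p and return
    the 1-nearest-neighbor prediction y0_hat at x0 = 0 (hypothetical helper)."""
    train = [[rng.uniform(-1.0, 1.0) for _ in range(p)] for _ in range(n)]
    nearest = min(train, key=lambda x: sum(xi * xi for xi in x))
    return f(nearest)  # noiseless, so the neighbor's response is f(nearest)


def decompose(n=100, p=2, reps=500, seed=0):
    """Monte Carlo MSE, variance and squared bias of y0_hat over `reps` sets."""
    rng = random.Random(seed)
    truth = f([0.0] * p)  # f(x0) = 1 at x0 = 0
    preds = [one_nn_prediction(n, p, rng) for _ in range(reps)]
    mean_pred = fmean(preds)  # estimate of E_T(y0_hat)
    mse = fmean((truth - y) ** 2 for y in preds)
    variance = fmean((y - mean_pred) ** 2 for y in preds)
    bias_squared = (mean_pred - truth) ** 2
    return mse, variance, bias_squared


mse, variance, bias_squared = decompose()
print(f"MSE = {mse:.6f}, Var + Bias^2 = {variance + bias_squared:.6f}")
```

Because every training point other than the origin itself has $f < 1$, the 1-NN prediction always underestimates $f(0) = 1$, so the squared bias is strictly positive here.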