11 (Gradient) Boosting

11.1 Exercises

Exercise 11.1: First Step of Gradient Boosting (Regression)

You observe four data points:
\[ (x_i, y_i) = (1,2), (2,3), (3,2.5), (4,5) \]

Assume squared error loss and decision stumps as weak learners (one split). No learning rate (i.e. \(\nu = 1\)).

  1. Briefly state the main idea of (gradient) boosting in regression.
  2. Compute the initial model \(f_0(x)\), residuals \(r_i\), fit the optimal first stump \(h_1(x)\) (try split points at midpoints 1.5, 2.5, 3.5), and write the updated model \(f_1(x)=f_0(x)+h_1(x)\).

Please note the following: in the lecture, I presented the boosting algorithm with \(F_0(x) = 0\), in line with what you can find in ISL. However, most implementations (like sklearn) use the mean \(\bar y\) as the initial model. My solution below uses this more widespread initialization.
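A quick way to see this convention in practice (a minimal sketch, assuming a recent scikit-learn): the default initial estimator of `GradientBoostingRegressor` is a `DummyRegressor` that predicts the sample mean, while `init='zero'` reproduces the lecture/ISL choice.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 3.0, 2.5, 5.0])

# Default initialization: a constant model predicting the mean of y.
gbr = GradientBoostingRegressor(n_estimators=1).fit(X, y)
print(gbr.init_.predict(X[:1]))  # [3.125] == mean(y)

# The lecture/ISL convention f_0(x) = 0 corresponds to init='zero'.
gbr_zero = GradientBoostingRegressor(n_estimators=1, init='zero').fit(X, y)
```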

Exam-level solution outline

For squared error loss the best constant is the sample mean of \(y\).

For each candidate split, compute the mean residual in the left and right regions, then choose the split that minimizes the total residual SSE.

With \(\nu=1\): \(f_1 = f_0 + h_1\). If a learning rate \(\nu<1\) were used, you would scale \(h_1\) by \(\nu\).
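This recipe is easy to carry out in a few lines. Here is a minimal from-scratch sketch (plain NumPy; all variable names are my own choosing):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 3.0, 2.5, 5.0])
nu = 1.0  # learning rate; with nu < 1 the stump's contribution would be shrunk

f0 = y.mean()  # best constant under squared error
r = y - f0     # residuals = negative gradient of squared error loss

best = None
for s in [1.5, 2.5, 3.5]:  # candidate split points (midpoints)
    left, right = r[x <= s], r[x > s]
    sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
    if best is None or sse < best[0]:
        best = (sse, s, left.mean(), right.mean())

sse, s, c_left, c_right = best
h1 = np.where(x <= s, c_left, c_right)  # stump predictions on the data
f1 = f0 + nu * h1                       # updated model
print(s, sse)  # 3.5 0.5
print(f1)      # [2.5 2.5 2.5 5. ]
```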

  1. Boosting builds an additive model \(f_B(x)=f_0(x)+\sum_{b=1}^{B} h_b(x)\) in which each new weak learner \(h_b\) is fit to (an approximation of) the negative gradient of the loss at the current model; under squared error this is exactly the residual. (A runnable sketch of this generic loop follows the solution below.)

  2. \[ f_0(x)=\bar y=\tfrac{2+3+2.5+5}{4}=3.125 \] Residuals: \[ r=(-1.125,\,-0.125,\,-0.625,\,1.875) \] Candidate split points: 1.5, 2.5, 3.5, with residual SSEs \(3.5\), \(3.625\), and \(0.5\), respectively. Optimal: \(x=3.5\) (groups: first three points vs. the last).

Region means: \[ \bar r_{x\le 3.5}=\tfrac{-1.125-0.125-0.625}{3}=-0.625,\qquad \bar r_{x>3.5}=1.875 \] So \[ h_1(x)=\begin{cases}-0.625 & x\le 3.5\\ 1.875 & x>3.5\end{cases} \] Update: \[ f_1(x)=f_0(x)+h_1(x)= \begin{cases} 3.125-0.625=2.5 & x\le 3.5\\ 3.125+1.875=5.0 & x>3.5 \end{cases} \]
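As a cross-check (a sketch, assuming a recent scikit-learn), one boosting round with a depth-1 tree and learning rate 1 reproduces \(f_1\) exactly. Note that sklearn's trees use the Friedman MSE split criterion by default, which here selects the same split \(x \le 3.5\):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 3.0, 2.5, 5.0])

gbr = GradientBoostingRegressor(n_estimators=1, learning_rate=1.0, max_depth=1)
gbr.fit(X, y)
print(gbr.predict(X))  # [2.5 2.5 2.5 5. ] (matches f_1 above)
```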
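Finally, the generic loop from point 1, written out as a short sketch (squared error loss, stump weak learners via sklearn's `DecisionTreeRegressor`; function and variable names are hypothetical):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost(X, y, B=100, nu=0.1):
    """Fit B stumps by gradient boosting with squared error loss."""
    f0 = y.mean()                 # initial constant model f_0
    pred = np.full(len(y), f0)
    stumps = []
    for _ in range(B):
        r = y - pred              # residuals = negative gradient
        h = DecisionTreeRegressor(max_depth=1).fit(X, r)
        stumps.append(h)
        pred += nu * h.predict(X) # f_b = f_{b-1} + nu * h_b
    return f0, stumps

def predict(f0, stumps, X, nu=0.1):
    return f0 + nu * sum(h.predict(X) for h in stumps)
```

With `B=1` and `nu=1.0` this reduces to exactly the first step computed above.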