f(x, y) = (1.5 - x + xy)² + (2.25 - x + xy²)² + (2.625 - x + xy³)²
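
This objective is the Beale function, a standard test surface for optimizers, with its global minimum at (3, 0.5). As a minimal sketch (the names beale and beale_grad are illustrative, not from the original), the function and its analytic gradient can be written as:

    # Beale function and its analytic gradient.
    def beale(x, y):
        return ((1.5 - x + x * y) ** 2
                + (2.25 - x + x * y ** 2) ** 2
                + (2.625 - x + x * y ** 3) ** 2)

    def beale_grad(x, y):
        # Partial derivatives from applying the chain rule to each squared term.
        t1 = 1.5 - x + x * y
        t2 = 2.25 - x + x * y ** 2
        t3 = 2.625 - x + x * y ** 3
        dfdx = 2 * t1 * (y - 1) + 2 * t2 * (y ** 2 - 1) + 2 * t3 * (y ** 3 - 1)
        dfdy = 2 * t1 * x + 2 * t2 * (2 * x * y) + 2 * t3 * (3 * x * y ** 2)
        return dfdx, dfdy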
Gradient Descent Parameters
Learning Rate: Controls the size of each step taken toward the minimum of the function.
A small learning rate produces small, cautious steps; a large one produces big steps that
can overshoot the minimum. Too large, and the algorithm may diverge (move away from the
minimum); too small, and convergence takes a very long time.
Iterations: The maximum number of steps the algorithm will take. If it converges first (in
practice, detected when the gradient or the change between steps falls below a small
tolerance), it stops early; otherwise it stops after this many iterations.
Start X and Start Y: The initial coordinates (x, y) from which gradient descent begins its
search for the minimum. The starting point affects the path the algorithm takes and, on
functions with multiple minima, whether it settles in a local or the global minimum. All
three parameters appear together in the sketch below.
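
Putting the three parameters together, here is a minimal gradient descent sketch. It assumes the beale_grad function defined earlier; the tolerance value and the default parameter values are illustrative choices, not taken from the original:

    # Minimal gradient descent loop tying the three parameters together.
    # tol is an assumed convergence threshold, not specified in the text above.
    def gradient_descent(start_x, start_y, learning_rate=0.01,
                         iterations=10_000, tol=1e-8):
        x, y = start_x, start_y
        for i in range(iterations):           # Iterations: hard cap on steps
            dfdx, dfdy = beale_grad(x, y)
            step_x = learning_rate * dfdx     # Learning Rate: scales each step
            step_y = learning_rate * dfdy
            x, y = x - step_x, y - step_y     # Move against the gradient
            # Stop early once the steps become negligibly small (converged).
            if max(abs(step_x), abs(step_y)) < tol:
                break
        return x, y

    # Example: start at (1.0, 1.0). The Beale function's global minimum
    # is at (3, 0.5); a much larger learning rate (e.g. 0.1) can diverge.
    x_min, y_min = gradient_descent(start_x=1.0, start_y=1.0)
    print(x_min, y_min, beale(x_min, y_min))

With a small learning rate the iterates should drift slowly along the function's flat valley toward (3, 0.5); raising the rate speeds this up until, past some threshold, the steps overshoot and the values blow up, which is the divergence described above.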