One of my other hobbies, besides Music and Film, are maths, so last night I was a little bored and decided to take a look into some and get back to my good-old-school days. I especially like nonlinear equations.
Here are some close connections between finding a local minimum and solving a set of nonlinear equations. Given a set of n equations in n unknowns, seeking a solution is equivalent to minimizing the sum of squares when the residual is zero at the minimum, so there is a particularly close connection to the Gauss-Newton methods. In fact, the Gauss-Newton step for local minimization and the Newton step for nonlinear equations are exactly the same. Also, for a smooth function, Newton’s method for local minimization is the same as Newton’s method for the nonlinear equations . Not surprisingly many aspects of the algorithms are similar. Nonetheless there are also important differences.
Another thing in common with minimization algorithms is the need for some kind of step control. Typically, step control is based on the same methods as minimization except that they are applied to a merit function, usually the smooth 2-norm squared, r(x).r(x)
- Newton: Newton’s method is an efficient algorithm for finding approximations to the zeros (or roots) of a real-valued function. As such, it is an example of a root-finding algorithm. It produces iteratively a sequence of approximations to the root, their rate of convergence to the root is quadratic. The method generalizes to complex and multidimensional versions.
- Secant: Works without derivates by constructing a secant approximation to the Jacobian using n past steps. It requires two starting conditions in each dimension.
- Brent: Method in one dimension that maintains bracketing of roots. It requires two starting conditions which bracket a root.
Newton’s method
The idea of the method is as follows: one starts with an initial guess which is reasonably close to the true root, then the function is approximated by its tangent line, and one computes the x-intercept of this tangent line (which is easily done with elementary algebra). This x-intercept will typically be a better approximation to the function’s root than the original guess, and the method can be iterated.
Suppose f : [a, b] → R is a differentiable function defined on the interval [a, b] with values in the real numbers R. The formula for converging on the root can be easily derived. Suppose we have some current approximation xn. Then we can derive the formula for a better approximation, xn+1. We know from the definition of the derivative at a given point that it is the slope of a tangent at that point.
That is:

Here, f ‘ denotes the derivative of the function f. Then by simple algebra we can derive:

We start the process off with some arbitrary initial value x0 (The closer to the zero, the better. But, in the absence of any intuition about where the zero might lie, a “guess and check” method might narrow the possibilities to a reasonably small interval by appealing to the intermediate value theorem).
The method will usually converge, provided this initial guess is close enough to the unknown zero, and that:

Furthermore, for a zero of multiplicity 1, the convergence is at least quadratic (see rate of convergence) in a neighbourhood of the zero, which intuitively means that the number of correct digits roughly at least doubles in every step.