Any clues about this Newton iteration formula with Jacobian matrix?

A while ago I wrote about using Boltzmann sampling to generate random instances of algebraic data types, and mentioned that I have some code I inherited for doing the core computations. There is one part of the code that I still don’t understand, having to do with a variant of Newton’s method for finding a fixed point of a mutually recursive system of equations. It seems to work, but I don’t like using code I don’t understand—for example, I’d like to be sure I understand the conditions under which it does work, to be sure I am not misusing it. I’m posting this in the hopes that someone reading this may have an idea.

Let \Phi : \mathbb{R}^n \to \mathbb{R}^n be a vector function, defined elementwise in terms of functions \Phi_1, \dots, \Phi_n : \mathbb{R}^n \to \mathbb{R}:

\displaystyle \Phi(\mathbf{X}) = (\Phi_1(\mathbf{X}), \dots, \Phi_n(\mathbf{X}))

where \mathbf{X} = (X_1, \dots, X_n) is a vector in \mathbb{R}^n. We want to find the fixed point \mathbf{Y} such that \Phi(\mathbf{Y}) = \mathbf{Y}.

The algorithm (you can see the code here) works as follows. First, define \mathbf{J} as the n \times n Jacobian matrix of partial derivatives of the \Phi_i, that is,

\displaystyle \mathbf{J} = \begin{bmatrix} \frac{\partial}{\partial X_1} \Phi_1 & \dots & \frac{\partial}{\partial X_n} \Phi_1 \\ \vdots & \ddots & \vdots \\ \frac{\partial}{\partial X_1} \Phi_n & \dots & \frac{\partial}{\partial X_n} \Phi_n \end{bmatrix}

Now let \mathbf{Y}_0 = (0, \dots, 0) and let \mathbf{U}_0 = I_n be the n \times n identity matrix. Then for each i \geq 0 define

\displaystyle \mathbf{U}_{i+1} = \mathbf{U}_i + \mathbf{U}_i(\mathbf{J}(\mathbf{Y}_i)\mathbf{U}_i - (\mathbf{U}_i - I_n))

and also

\displaystyle \mathbf{Y}_{i+1} = \mathbf{Y}_i + \mathbf{U}_{i+1}(\Phi(\mathbf{Y}_i) - \mathbf{Y}_i).
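
For concreteness, here is a minimal numpy transcription of the iteration exactly as written above. This is only a sketch: phi and jac stand for implementations of \Phi and \mathbf{J}, and the fixed iteration count and the scalar test case at the end are illustrative choices, not taken from the inherited code.

```python
import numpy as np

def fixed_point(phi, jac, n, iters=20):
    Y = np.zeros(n)  # Y_0 = (0, ..., 0)
    U = np.eye(n)    # U_0 = I_n
    I = np.eye(n)
    for _ in range(iters):
        # U_{i+1} = U_i + U_i (J(Y_i) U_i - (U_i - I_n))
        U = U + U @ (jac(Y) @ U - (U - I))
        # Y_{i+1} = Y_i + U_{i+1} (Phi(Y_i) - Y_i)
        Y = Y + U @ (phi(Y) - Y)
    return Y

# Scalar sanity check (n = 1): Phi(X) = z + X^2 with z = 0.2 has
# smallest fixed point (1 - sqrt(1 - 4z))/2 = 0.27639...
z = 0.2
print(fixed_point(lambda Y: np.array([z + Y[0] ** 2]),
                  lambda Y: np.array([[2 * Y[0]]]),
                  n=1))
```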

Somehow, magically (under appropriate conditions on \Phi, I presume), the sequence of \mathbf{Y}_i converges to the fixed point \mathbf{Y}. But I don’t understand where this is coming from, especially the equation for \mathbf{U}_{i+1}. Most generalizations of Newton’s method that I can find seem to involve multiplying by the inverse of the Jacobian matrix. So what’s going on here? Any ideas/pointers to the literature/etc.?


8 Responses to Any clues about this Newton iteration formula with Jacobian matrix?

  1. Ingo Blechschmidt says:

    You are quite right that, in theory, the update formula for Y should involve (J - I)^{-1} (not J^{-1} since we’re trying to find a solution of the equation \Phi(Y) - Y = 0). But how should we calculate this inverse?
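
    To spell out the first point: Newton’s method applied to G(\mathbf{Y}) = \Phi(\mathbf{Y}) - \mathbf{Y} = 0, whose derivative is \mathbf{J}(\mathbf{Y}) - I_n, reads

    \displaystyle \mathbf{Y}_{i+1} = \mathbf{Y}_i - (\mathbf{J}(\mathbf{Y}_i) - I_n)^{-1}(\Phi(\mathbf{Y}_i) - \mathbf{Y}_i) = \mathbf{Y}_i + (I_n - \mathbf{J}(\mathbf{Y}_i))^{-1}(\Phi(\mathbf{Y}_i) - \mathbf{Y}_i),

    which is exactly your update for \mathbf{Y}_{i+1}, with \mathbf{U}_{i+1} standing in for (I_n - \mathbf{J}(\mathbf{Y}_i))^{-1}.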

    The algorithm you present solves this problem in a clever way. Namely, it calculates a sequence U_i of approximations to the inverse. It does so using a well-known method which is itself just a special instance of the Newton iteration. The first Google result for this is the short note http://www4.ncsu.edu/~aalexan3/articles/mat-inv-rep.pdf, but it should also be covered in any textbook on numerical linear algebra.
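
    In symbols, that iteration is \mathbf{U}_{k+1} = \mathbf{U}_k(2 I_n - \mathbf{A}\mathbf{U}_k), which converges quadratically to \mathbf{A}^{-1} given a good enough starting guess. Taking \mathbf{A} = I_n - \mathbf{J}(\mathbf{Y}_i) and expanding \mathbf{U}_i(2 I_n - (I_n - \mathbf{J}(\mathbf{Y}_i))\mathbf{U}_i) gives exactly the update for \mathbf{U}_{i+1} in the post. A quick numerical sanity check (just a sketch; the matrix A below is an arbitrary example, not from the code):

    ```python
    import numpy as np

    # Newton iteration for a matrix inverse: U_{k+1} = U_k (2I - A U_k).
    A = np.array([[0.7, -0.2],
                  [0.1,  0.9]])  # arbitrary test matrix, close to I
    U = np.eye(2)                # starting guess
    for _ in range(8):
        U = U @ (2 * np.eye(2) - A @ U)
    print(np.allclose(U, np.linalg.inv(A)))  # True
    ```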

    Note that, with each update of Y, the relevant Jacobian and the relevant inverse matrix change. We don’t let the sub-algorithm for computing the inverse run to completion (convergence); we just let it run for a single step. In numerical practice, modifications of the Newton iteration like this often still work extremely well.

  2. Olivier says:

    Have a look here: Algorithms for Combinatorial Systems: Well-Founded Systems and Newton Iterations, 2012, by Pivoteau, Salvy, Soria: https://arxiv.org/abs/1109.2688
    Specifically, pages 27 and 35, but the whole paper is relevant.
    You will find the full explanation of how the two sequences are defined, along with all the proofs and conditions of convergence.
    (Actually, this whole codebase is based on this paper.)

    • Brent says:

      Thank you! I think I read that paper a long time ago but obviously I didn’t remember what was in it. I’ll give it another read.

  3. Jan Van lent says:

    Ingo Blechschmidt is right.

    The Newton method applied to A - X^{-1} = 0 for finding A^{-1} is sometimes called Smith iteration.
    For generalisations to matrix roots and other functions, as well as analysis, the work of Nick Higham may be of interest [1,2,3].

    [1] http://eprints.ma.man.ac.uk/637/
    [2] http://eprints.ma.man.ac.uk/310/
    [3] http://www.maths.manchester.ac.uk/~higham/fm/index.php

    • Brent says:

      Thanks for the links!

      • Jan Van lent says:

        Those links are definitely interesting, but I hadn’t seen the previous comment when I posted. The Pivoteau et al. paper looks like it is specifically relevant for your application. In general it is not obvious to show convergence for Newton’s method; I guess your Phi function has some special properties that make it work (I didn’t look at the paper in detail).

        • Brent says:

          OK, makes sense. Yes, the Phi function arises from a combinatorial system — and the kinds of sensible combinatorial systems one cares about definitely have some special structure one can take advantage of.
