A while ago I wrote about using Boltzmann sampling to generate random instances of algebraic data types, and mentioned that I have some code I inherited for doing the core computations. There is one part of the code that I still don’t understand, having to do with a variant of Newton’s method for finding a fixed point of a mutually recursive system of equations. It seems to work, but I don’t like using code I don’t understand—for example, I’d like to be sure I understand the conditions under which it does work, to be sure I am not misusing it. I’m posting this in the hopes that someone reading this may have an idea.

Let $\Phi : \mathbb{R}^n \to \mathbb{R}^n$ be a vector function, defined elementwise in terms of functions $\Phi_1, \dots, \Phi_n : \mathbb{R}^n \to \mathbb{R}$:

$\displaystyle \Phi(\mathbf{X}) = (\Phi_1(\mathbf{X}), \dots, \Phi_n(\mathbf{X}))$

where $\mathbf{X} = (X_1, \dots, X_n)$ is a vector in $\mathbb{R}^n$. We want to find the fixed point $\mathbf{Y}$ such that $\Phi(\mathbf{Y}) = \mathbf{Y}$.

The algorithm (you can see the code here) now works as follows. First, define $\mathbf{J}$ as the $n \times n$ Jacobian matrix of partial derivatives of the $\Phi_i$, that is,

$\displaystyle \displaystyle \mathbf{J} = \begin{bmatrix} \frac{\partial}{\partial X_1} \Phi_1 & \dots & \frac{\partial}{\partial X_n} \Phi_1 \\ \vdots & \ddots & \vdots \\ \frac{\partial}{\partial X_1} \Phi_n & \dots & \frac{\partial}{\partial X_n} \Phi_n\end{bmatrix}$

Now let $\mathbf{Y}_0 = (0, \dots, 0)$ and let $\mathbf{U}_0 = I_n$ be the $n \times n$ identity matrix. Then for each $i \geq 0$ define

$\displaystyle \mathbf{U}_{i+1} = \mathbf{U}_i + \mathbf{U}_i(\mathbf{J}(\mathbf{Y}_i)\mathbf{U}_i - (\mathbf{U}_i - I_n))$

and also

$\displaystyle \mathbf{Y}_{i+1} = \mathbf{Y}_i + \mathbf{U}_{i+1}(\Phi(\mathbf{Y}_i) - \mathbf{Y}_i).$

Somehow, magically (under appropriate conditions on $\Phi$, I presume), the sequence of $\mathbf{Y}_i$ converge to the fixed point $\mathbf{Y}$. But I don’t understand where this is coming from, especially the equation for $\mathbf{U}_{i+1}$. Most generalizations of Newton’s method that I can find seem to involve multiplying by the inverse of the Jacobian matrix. So what’s going on here? Any ideas/pointers to the literature/etc?

Assistant Professor of Computer Science at Hendrix College. Functional programmer, mathematician, teacher, pianist, follower of Jesus.
This entry was posted in math and tagged , , , , , , . Bookmark the permalink.

1. Ingo Blechschmidt says:

You are quite right that, in theory, the update formula for $Y$ should involve $(J - I)^{-1}$ (not $J^{-1}$ since we’re trying to find a solution of the equation $\Phi(Y) - Y = 0$). But how should we calculate this inverse?

The algorithm you present solves this problem in a clever way. Namely, it calculates a sequence $U_i$ of approximations to the inverse. It does so using a well-known method which is itself just a special instance of the Newton iteration. The first Google result for this is the short note http://www4.ncsu.edu/~aalexan3/articles/mat-inv-rep.pdf, but it should also be covered in any textbook on numerical linear algebra.

Note that, with each update of $Y$, the relevant gradient and the relevant inverse matrix change. We don’t let the sub-algorithm for computing the inverse run to completion (convergence), we just let in run for a single step. In numerical practice it’s often the case that modifications like this of the Newton iteration still work extremely well.

• Brent says:

Ah, thank you, this makes a lot more sense now!

2. Olivier says:

Have a look here: Algorithms for Combinatorial Systems: Well-Founded Systems and Newton Iterations, 2012, by Pivoteau, Salvy, Soria : https://arxiv.org/abs/1109.2688
Specifically, pages 27 and 35. But the whole paper in general.
You will find the whole explanation for the definition of the two series, and all proofs & conditions of convergence.
(Actually, this whole codebase is based in this paper.)

• Brent says:

Thank you! I think I read that paper a long time ago but obviously I didn’t remember what was in it. I’ll give it another read.

3. Jan Van lent says:

Ingo Blechschmidt is right.

The Newton method applied to $A - X^{-1}=0$ for finding $A^{-1}$ is sometimes called Smith iteration.
For generalisations to matrix roots and other functions as well as analysis, the work of Nick Higham may be of interest [1,2,3].

• Brent says:

Those links are definitely interesting, but I hadn’t seen the previous comment. The Pivoteau et al. paper looks like it is specifically relevant for your application. It is in general not obvious to show convergence for Newton. I guess your $Phi$ function has some special properties that allow this (didn’t look at the paper in detail).