A while ago I wrote about using Boltzmann sampling to generate random instances of algebraic data types, and mentioned that I have some code I inherited for doing the core computations. There is one part of the code that I still don’t understand, having to do with a variant of Newton’s method for finding a fixed point of a mutually recursive system of equations. It seems to work, but I don’t like using code I don’t understand—for example, I’d like to be sure I understand the conditions under which it does work, to be sure I am not misusing it. I’m posting this in the hopes that someone reading this may have an idea.

Let be a vector function, defined elementwise in terms of functions :

where is a vector in . We want to find the fixed point such that .

The algorithm (you can see the code here) now works as follows. First, define as the Jacobian matrix of partial derivatives of the , that is,

Now let and let be the identity matrix. Then for each define

and also

Somehow, magically (under appropriate conditions on , I presume), the sequence of converge to the fixed point . But I don’t understand where this is coming from, especially the equation for . Most generalizations of Newton’s method that I can find seem to involve multiplying by the *inverse* of the Jacobian matrix. So what’s going on here? Any ideas/pointers to the literature/etc?

##
About Brent

Assistant Professor of Computer Science at Hendrix College. Functional programmer, mathematician, teacher, pianist, follower of Jesus.

You are quite right that, in theory, the update formula for should involve (not since we’re trying to find a solution of the equation ). But how should we calculate this inverse?

The algorithm you present solves this problem in a clever way. Namely, it calculates a sequence of approximations to the inverse. It does so using a well-known method which is itself just a special instance of the Newton iteration. The first Google result for this is the short note http://www4.ncsu.edu/~aalexan3/articles/mat-inv-rep.pdf, but it should also be covered in any textbook on numerical linear algebra.

Note that, with each update of , the relevant gradient and the relevant inverse matrix change. We don’t let the sub-algorithm for computing the inverse run to completion (convergence), we just let in run for a single step. In numerical practice it’s often the case that modifications like this of the Newton iteration still work extremely well.

Ah, thank you, this makes a lot more sense now!

Have a look here: Algorithms for Combinatorial Systems: Well-Founded Systems and Newton Iterations, 2012, by Pivoteau, Salvy, Soria : https://arxiv.org/abs/1109.2688

Specifically, pages 27 and 35. But the whole paper in general.

You will find the whole explanation for the definition of the two series, and all proofs & conditions of convergence.

(Actually, this whole codebase is based in this paper.)

Thank you! I think I read that paper a long time ago but obviously I didn’t remember what was in it. I’ll give it another read.

Ingo Blechschmidt is right.

The Newton method applied to for finding is sometimes called Smith iteration.

For generalisations to matrix roots and other functions as well as analysis, the work of Nick Higham may be of interest [1,2,3].

[1] http://eprints.ma.man.ac.uk/637/

[2] http://eprints.ma.man.ac.uk/310/

[3] http://www.maths.manchester.ac.uk/~higham/fm/index.php

Thanks for the links!

Those links are definitely interesting, but I hadn’t seen the previous comment. The Pivoteau et al. paper looks like it is specifically relevant for your application. It is in general not obvious to show convergence for Newton. I guess your function has some special properties that allow this (didn’t look at the paper in detail).

OK, makes sense. Yes, the Phi function arises from a combinatorial system — and the kinds of sensible combinatorial systems one cares about definitely have some special structure one can take advantage of.