Let be a vector function, defined elementwise in terms of functions :
where is a vector in . We want to find the fixed point such that .
The algorithm (you can see the code here) now works as follows. First, define as the Jacobian matrix of partial derivatives of the , that is,
Now let and let be the identity matrix. Then for each define
and also
Somehow, magically (under appropriate conditions on , I presume), the sequence of converge to the fixed point . But I don’t understand where this is coming from, especially the equation for . Most generalizations of Newton’s method that I can find seem to involve multiplying by the inverse of the Jacobian matrix. So what’s going on here? Any ideas/pointers to the literature/etc?
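For comparison, here is a minimal sketch of the more familiar approach in the one-dimensional case, where the Jacobian is just the derivative and "multiplying by its inverse" becomes a division. It applies the standard Newton iteration to the function y ↦ phi(y) − y, whose roots are exactly the fixed points of phi. The names `newtonFixedPoint`, `phi`, and `dphi` are my own and are not taken from the linked code; this is the textbook method, not the variant being asked about.

```haskell
-- Standard Newton iteration for a fixed point of a scalar function.
-- We look for a root of g(y) = phi(y) - y, so g'(y) = phi'(y) - 1 and
--   y_{i+1} = y_i - g(y_i) / g'(y_i).
newtonFixedPoint
  :: (Double -> Double)  -- ^ phi
  -> (Double -> Double)  -- ^ derivative of phi (the 1x1 "Jacobian")
  -> Double              -- ^ starting guess
  -> Int                 -- ^ number of iterations
  -> Double
newtonFixedPoint phi dphi y0 steps = iterate step y0 !! steps
  where
    step y = y - (phi y - y) / (dphi y - 1)

-- Example: the fixed point of cos, starting from 0.
cosFixedPoint :: Double
cosFixedPoint = newtonFixedPoint cos (negate . sin) 0 20
```

In higher dimensions the division becomes a linear solve against the Jacobian of g, which is exactly the step the algorithm above seems to avoid.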
This fall, I will be teaching an undergraduate course on programming languages. It’s eminently sensible to ask a new hire to take on a course in their specialty, and one might think I would be thrilled. But in a way, I am dreading it.
It’s my own fault, really. In my hubris, I have decided that I don’t like the ways that PL courses are typically taught. So this summer I have to buckle down and actually design the course I do want to teach. It’s not that I’m dreading the course itself, but rather the amount of work it will take to create it!
I’m not a big fan of the sort of “survey of programming languages” course that gets taught a lot, where you spend three or four weeks on each of three or four different languages. I am not sure that students really learn much from the experience (though I would be happy to hear any reports to the contrary). At best it feels sort of like making students “eat their vegetables”—it’s not much fun but it will make them grow big and strong in some general sense.^{1} It’s unlikely that students will ever use the surveyed languages again. You might hope that students will think to use the surveyed languages later in their career because they were exposed to them in the course; but I doubt it, because three or four weeks is hardly enough to get any real sense for a language and where it might be useful. I think the only real argument for this sort of course is that it “exposes students to new ways of thinking”. While that is certainly true, and exposing students to new ways of thinking is important—essentially every class should be doing it, in one way or another—I think there are better ways to go about it.
In short, I want to design a course that will not only expose students to new ideas and ways of thinking, but will also give them some practical skills that they might actually use in their career. I started by considering the question: what does the field of programming languages uniquely have to offer to students that is both intellectually worthwhile (by my own standards) and valuable to them? Specifically, I want to consider students who go on to do something other than be an academic in PL: what do I want the next generation of software developers and academics in other fields to understand about programming languages?
A lightbulb finally turned on for me when I realized that while the average software developer will probably never use, say, Prolog, they almost certainly will develop a domain-specific language at some point—quite possibly without even realizing they are doing it! In fact, if we include embedded domain-specific languages, then in essence, anyone developing any API at all is creating a language. Even if you don’t want to extend the idea of “embedded domain-specific language” quite that far, the point is that the tools and ideas of language design are widely applicable. Giving students practice designing and implementing languages will make them better programmers.
So I want my course to focus on language design, encompassing both big ideas (type systems, semantics) and concrete tools (parsing, ASTs, type checking, interpreters). We will use a functional programming language (specifically, Haskell) for several reasons: to expose the students to a programming paradigm very different from the languages they already know (mainly Java and Python); because FP languages make a great platform for starting to talk about types; and because FP languages also make a great platform for building language-related tools like parsers, type checkers, etc. and for building embedded domain-specific languages. Notably, however, we will only use Haskell: though we will probably study other types of languages, we will use Haskell as a medium for our study, e.g. by implementing simplified versions of them in Haskell. So while the students will be exposed to a number of ideas, there is really only one concrete language they will be exposed to. The hope is that by working in a single language all semester, the students may actually end up with enough experience in the language that they really do go on to use it again later.
As an aside, an interesting challenge/opportunity comes from the fact that approximately half the students in the class will have already taken my functional programming class this past spring, and will therefore be familiar with Haskell. On the challenge side, how do I teach Haskell to the other half of the class without boring the half that already knows it? Part of the answer might lie in emphasis: I will be highlighting very different aspects of the language from those I covered in my FP course, though of course there will necessarily be overlap. On the opportunity side, however, I can also ask: how can I take advantage of the fact that half the class will already know Haskell? For example, can I design things in such a way that they help the other half of the class get up to speed more quickly?
In any case, here’s my current (very!) rough outline for the semester:
My task for the rest of the summer is to develop a more concrete curriculum, and to design some projects. This will likely be a project-based course, where the majority of the points will be concentrated in a few big projects—partly because the nature of the course lends itself well to larger projects, and partly to keep me sane (I will be teaching two other courses at the same time, and having lots of small assignments constantly due is like death by a thousand cuts).
I would love feedback of any kind. Do you think this is a great idea, or a terrible one? Have you, or anyone you know of, ever run a similar course? Do you have any appropriate assignments you’d like to share with me?
^{1} Actually, I love vegetables, but anyway.
The main motivation for writing the page is to explain the (to my knowledge, novel) Möbius method for printing and reading double-sided, like this:
I actually now use this in practice. As compared to the usual method of printing double-sided, this has several advantages:
But there are even new things to say about traditional double-sided printing, as well. I now know of several different algorithms for reading double-sided, each with its pros and cons; previously I had not even considered that there might be more than one way to do it.
First, what is Beeminder? Here’s what I wrote three and a half years ago, which I think is still a good description:
The basic idea is that it helps you keep track of progress on any quantifiable goals, and gives you short-term incentive to stay on track: if you don’t, Beeminder takes your money. But it’s not just about the fear of losing money. Shiny graphs tracking your progress coupled with helpfully concrete short-term goals (“today you need to write 1.3 pages of that paper”) make for excellent positive motivation, too.
The key property that makes Beeminder work so well for me is that it makes long-term goals into short-term ones. I am a terrible procrastinator—due to hyperbolic discounting I can be counted on to pretty much ignore anything with long-term rewards or consequences. A vague sense that I ought to take better care of my bike is not enough to compel me to action in the present; but “inflate your tires and grease your chain before midnight or else pay $5” is.
So, what have I accomplished over the past three years?
There are lots of other things I use Beeminder for, but these are the accomplishments I am proudest of. If you want to do awesome things but can never quite seem to find the time or motivation to do them, give it a try!
Several commenters pointed out the connection to Bayesian networks. I think they are right, and the network reliability problem is a very special case of Bayesian inference. However, so far this hasn’t seemed to help very much, since the things I can find about algorithms for Bayesian inference are either too general (e.g. allowing arbitrary functions at nodes) or too specific (e.g. only working for certain kinds of trees). So I’m going to put aside Bayesian inference for now; perhaps later I can come back to it.
In any case, Derek Elkins also made a comment which pointed to exactly what I wanted to talk about next.
Consider the related problem of computing the reliability of the single most reliable path from $s$ to $t$ in a network. This is really just a disguised version of the shortest path problem, so one can solve it using Dijkstra’s algorithm. But I want to discuss a more general way to think about solving it, using the theory of star semirings. Recall that a semiring is a set with two associative binary operations, “addition” and “multiplication”, which is a commutative monoid under addition, a monoid under multiplication, and where multiplication distributes over addition and $a \cdot 0 = 0 \cdot a = 0$. A star semiring is a semiring with an additional operation $(-)^*$ satisfying $a^* = 1 + a a^* = 1 + a^* a$. Intuitively, $a^* = 1 + a + a^2 + a^3 + \dots$ (though $a^*$ can still be well-defined even when this infinite sum is not; we can at least say that if the infinite sum is defined, they must be equal). If $S$ is a star semiring, then the semiring of $n \times n$ matrices over $S$ is also a star semiring; for details see Dolan (2013), O’Connor (2011), Penaloza (2005), and Lehmann (1977). In particular, there is a very nice functional algorithm for computing $M^*$, with time complexity $O(n^3)$ (Dolan 2013). (Of course, this is slower than Dijkstra’s algorithm, but unlike Dijkstra’s algorithm it also works for finding shortest paths in the presence of negative edge weights, in which case it is essentially the Floyd-Warshall algorithm.)
Now, given a graph and labelling , define the adjacency matrix to be the matrix of edge probabilities, that is, . Let be the star semiring of probabilities under maximum and multiplication (where , since ). Then we can solve the single most reliable path problem by computing over this semiring, and finding the largest entry. If we want to find the actual most reliable path, and not just its reliability, we can instead work over the semiring , i.e. probabilities paired with paths. You might enjoy working out what the addition, multiplication, and star operations should be, or see O’Connor (2011).
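To make this concrete, here is a minimal sketch of the max-times star semiring and a matrix star over it. It is not Dolan's functional algorithm. It uses the simpler Floyd-Warshall-style elimination (Lehmann 1977) over nested lists, so the `!!` indexing makes it slower than cubic. The class and function names (`Semiring`, `StarSemiring`, `MaxProb`, `starMatrix`) and the three-node example network are my own assumptions for illustration.

```haskell
infixl 6 <+>
infixl 7 <.>

class Semiring a where
  zero, one    :: a
  (<+>), (<.>) :: a -> a -> a

-- Law (not checked by the compiler): star x == one <+> x <.> star x
class Semiring a => StarSemiring a where
  star :: a -> a

-- Probabilities under maximum ("addition") and multiplication.
newtype MaxProb = MaxProb Double deriving (Eq, Show)

instance Semiring MaxProb where
  zero = MaxProb 0
  one  = MaxProb 1
  MaxProb p <+> MaxProb q = MaxProb (max p q)
  MaxProb p <.> MaxProb q = MaxProb (p * q)

instance StarSemiring MaxProb where
  star _ = one  -- max of 1, p, p^2, ... is always 1 for 0 <= p <= 1

type Matrix a = [[a]]

-- Star of a square matrix: eliminate intermediate nodes k = 0 .. n-1,
--   a[i][j] := a[i][j] + a[i][k] * star(a[k][k]) * a[k][j],
-- then add the identity matrix for the empty paths.
starMatrix :: StarSemiring a => Matrix a -> Matrix a
starMatrix m = addIdentity (foldl step m [0 .. n - 1])
  where
    n = length m
    step a k =
      [ [ (a !! i !! j) <+> ((a !! i !! k) <.> star (a !! k !! k) <.> (a !! k !! j))
        | j <- [0 .. n - 1] ]
      | i <- [0 .. n - 1] ]
    addIdentity a =
      [ [ if i == j then one <+> x else x | (j, x) <- zip [0 :: Int ..] row ]
      | (i, row) <- zip [0 :: Int ..] a ]

-- Hypothetical network: 0 -> 1 (0.9), 1 -> 2 (0.8), 0 -> 2 (0.5).
example :: Matrix MaxProb
example = map (map MaxProb)
  [ [0, 0.9, 0.5]
  , [0, 0,   0.8]
  , [0, 0,   0  ] ]
```

Here the entry of `starMatrix example` at row 0, column 2 is approximately 0.72, because going through node 1 (0.9 × 0.8) beats the direct edge (0.5).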
In fact, as shown by O’Connor and Dolan, there are many algorithms that can be recast as computing the star of a matrix, for an appropriate choice of semiring: for example, (reflexive-)transitive closure; all-pairs shortest paths; Gaussian elimination; dataflow analysis; and solving certain knapsack problems. One might hope that there is similarly an appropriate semiring for the network reliability problem. But I have spent some time thinking about this and I do not know of one.
Consider again the simple example given at the start of the previous post:
For this example, we computed the reliability of the network to be , by computing the probability of the upper path, , and the lower path, , and then combining them as , the probability of success on either path less the double-counted probability of simultaneous success on both.
Inspired by this example, one thing we might try would be to define operations $p \oplus q = p + q - pq$ and $p \otimes q = pq$. But when we go to check the semiring laws, we run into a problem: distributivity does not hold! $p \otimes (q \oplus r) = p(q + r - qr) = pq + pr - pqr$, but $(p \otimes q) \oplus (p \otimes r) = pq + pr - p^2qr$. The problem is that the addition operation implicitly assumes that the events with probabilities $p$ and $q$ are independent: otherwise the probability that they both happen is not actually equal to $pq$. The events with probabilities $pq$ and $pr$, however, are not independent. In graph terms, they represent two paths with a shared subpath. In fact, our example computation at the beginning of the post was only correct since the two paths from $s$ to $t$ were completely independent.
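The failure of distributivity is easy to check numerically. The sketch below assumes the two candidate operations are "probability of either event, assuming independence" (p + q − pq) and ordinary multiplication; the names `oplus`, `otimes`, `lhs`, and `rhs` are mine.

```haskell
-- Candidate "addition": probability that at least one of two
-- independent events happens.
oplus :: Double -> Double -> Double
oplus p q = p + q - p * q

-- Candidate "multiplication": probability that both of two
-- independent events happen.
otimes :: Double -> Double -> Double
otimes p q = p * q

-- The two sides of the distributive law p * (q + r) = p*q + p*r.
lhs, rhs :: Double -> Double -> Double -> Double
lhs p q r = p `otimes` (q `oplus` r)
rhs p q r = (p `otimes` q) `oplus` (p `otimes` r)
```

With all three probabilities equal to 0.5, `lhs 0.5 0.5 0.5` is 0.375 while `rhs 0.5 0.5 0.5` is 0.4375. The right-hand side treats the shared first edge as if it were two independent edges.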
We can at least compute the reliability of series-parallel graphs whose terminals correspond with and :
In the second case, having a parallel composition of graphs ensures that there are no shared edges between them, so and are indeed independent.
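As a small illustration, the series-parallel case can be sketched as a recursive data type with one reliability rule per constructor. The type `SP` and its constructor names are hypothetical, and the sketch assumes that every edge fails independently and that the two terminals are the endpoints of the whole composition.

```haskell
-- A series-parallel network between two terminals.
data SP
  = Edge Double      -- a single link that works with the given probability
  | Series SP SP     -- both parts must work
  | Parallel SP SP   -- at least one part must work; the parts share no edges

reliability :: SP -> Double
reliability (Edge p)       = p
reliability (Series g h)   = reliability g * reliability h
reliability (Parallel g h) = p + q - p * q
  where
    p = reliability g
    q = reliability h

-- Two links in series, in parallel with a single direct link.
exampleSP :: SP
exampleSP = Parallel (Series (Edge 0.5) (Edge 0.5)) (Edge 0.5)
```

For `exampleSP` the upper branch has reliability 0.25, so the whole network has reliability 0.25 + 0.5 − 0.125 = 0.625.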
Of course, many interesting graphs are not series-parallel. The simplest graph for which the above does not work looks like this:
Suppose all the edges have probability . Can you find the reliability of this network?
More in a future post!
Dolan, Stephen. 2013. “Fun with Semirings: A Functional Pearl on the Abuse of Linear Algebra.” ACM SIGPLAN Notices 48 (9). ACM: 101–10.
Lehmann, Daniel J. 1977. “Algebraic Structures for Transitive Closure.” Theoretical Computer Science 4 (1). Elsevier: 59–76.
O’Connor, Russell. 2011. “A Very General Method for Computing Shortest Paths.” http://r6.ca/blog/20110808T035622Z.html.
Penaloza, Rafael. 2005. “Algebraic Structures for Transitive Closure.” http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.71.7650.
I make no particular guarantees about anything; e.g. there is a crufty, complicated shake script that builds everything, but it probably doesn’t even compile with the latest version of Shake.
There are some obvious next steps, for which I have not the time:
All the material is licensed under a Creative Commons Attribution 4.0 International License, so go wild using it however you like, or working on the above next steps. Pull requests are very welcome, and I will likely give out commit access like candy.
This morning Kenny Foner pointed out to me this tweet by Gabriel Gonzalez, asking why there isn’t a default Arbitrary instance for types implementing Generic. It reminded me that I’ve been meaning for a while now (years, in fact!) to get around to packaging up some code that does this.
As several pointed out on Twitter, this seems obvious, but it isn’t. It’s easy to write a generic Arbitrary instance, but hard to write one that generates a good distribution of values. The basic idea is clear: randomly pick a constructor, and then recursively generate random subtrees. The problem is that this is very likely to either blow up and generate gigantic (even infinite) trees, or to generate almost all tiny trees, or both. I wrote a post about this three years ago which illustrates the problem. It also explains half of the solution: generate random trees with a target size in mind, and throw out any which are not within some epsilon of the target size (crucially, stopping the generation early as soon as the tree being generated gets too big).
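Here is a minimal sketch of that first half (rejection sampling with early abort) for plain binary trees. To stay self-contained it uses a tiny hand-rolled linear congruential generator instead of a real random library, and it fixes the leaf probability at 1/2, which I believe is the critical value for this particular type. All names (`Tree`, `genTree`, `sampleTree`, `nextDouble`) are my own, and this is not the generic code mentioned in this post.

```haskell
data Tree = Leaf | Node Tree Tree deriving Show

-- Size counts every constructor, leaves included.
size :: Tree -> Int
size Leaf       = 1
size (Node l r) = 1 + size l + size r

type Seed = Int

-- Toy linear congruential generator: a Double in [0,1) and the next seed.
-- Assumes a 64-bit Int so the multiplication does not overflow.
nextDouble :: Seed -> (Double, Seed)
nextDouble s = (fromIntegral s' / 2147483648, s')
  where s' = (1103515245 * s + 12345) `mod` 2147483648

-- Try to generate a tree using at most `budget` constructors.
-- Gives up (Nothing) as soon as the budget is exhausted; on success
-- also returns the unused part of the budget.
genTree :: Int -> Seed -> (Maybe (Tree, Int), Seed)
genTree budget s
  | budget <= 0 = (Nothing, s)
  | x < 0.5     = (Just (Leaf, budget - 1), s1)
  | otherwise   =
      case genTree (budget - 1) s1 of
        (Nothing, s2)      -> (Nothing, s2)
        (Just (l, b1), s2) ->
          case genTree b1 s2 of
            (Nothing, s3)      -> (Nothing, s3)
            (Just (r, b2), s3) -> (Just (Node l r, b2), s3)
  where (x, s1) = nextDouble s

-- Keep sampling until the tree's size lands in [lo, hi].
sampleTree :: Int -> Int -> Seed -> (Tree, Seed)
sampleTree lo hi s =
  case genTree hi s of
    (Just (t, _), s') | size t >= lo -> (t, s')
    (_, s')                          -> sampleTree lo hi s'
```

For example, `fst (sampleTree 90 110 42)` keeps retrying until it produces a tree whose size is between 90 and 110. Sizes of these trees are always odd, so the interval needs to contain an odd number.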
However, I never got around to explaining the other half of the solution: it’s crucially important to use the right probabilities when picking a constructor. With the wrong probabilities, you will spend too much time generating trees that are either too small or too big. The surprising thing is that with exactly the right probabilities, you can expect to wait only $O(n)$ time before generating a tree of size (approximately^{1}) $n$.^{2}
So, how does one pick the right probabilities? Essentially, you turn the generic description of your data type into a mutually recursive system of generating functions, and (numerically) find their radii of convergence, when thought of as functions in the complex plane. Using these values it is straightforward to compute the right probabilities to use. For the intrepid, this is explained in Duchon et al.^{3}
I have some old Haskell code from Alexis Darrasse which already does a bunch of the work. It would have to be updated a bit to work with modern libraries and with GHC.Generics, and packaged up to go on Hackage. I won’t really have time to work on this until the summer, but if anyone else is interested in working on this, let me know! I’d be happy to send you the code and provide some guidance in figuring it out.
^{1} The constant factor depends on how approximate you are willing to be.
^{2} I wanted to put an exclamation point at the end of that sentence, because this is really surprising. But it looked like factorial. So, here is the exclamation point: !
^{3} Duchon, Philippe, et al. “Boltzmann samplers for the random generation of combinatorial structures.” Combinatorics Probability and Computing 13.4-5 (2004): 577-625.
Suppose that when a router receives a message on an incoming connection, it immediately resends it on all outgoing connections. For , let denote the probability that, under this “flooding” scenario, at least one copy of a message originating at will eventually reach .
For example, consider the simple network shown below.
A message sent from along the upper route through has an probability of arriving at . By definition a message sent along the bottom route has an probability of arriving at . One way to think about computing the overall probability is to compute the probability that it is not the case that the message fails to traverse both links, that is, . Alternatively, in general we can see that , so as well. Intuitively, since the two events are not mutually exclusive, if we add them we are double-counting the situation where both links work, so we subtract the probability of both working.
The question is, given some graph and some specified nodes and , how can we efficiently compute ? For now I am calling this the “network reliability problem” (though I fully expect someone to point out that it already has a name). Note that it might make the problem a bit easier to restrict to directed acyclic graphs; but the problem is still well-defined even in the presence of cycles.
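As a baseline, the definition can be turned directly into code by enumerating every combination of working and failed links and summing the probabilities of the combinations in which the destination is reachable. This is a brute-force sketch that takes time exponential in the number of edges, so it is only useful for checking small examples; it is not an efficient solution. It assumes links fail independently, and the names (`Node`, `Edge`, `reachable`, `reliability`) and the four-node example are my own.

```haskell
type Node = Int
type Edge = (Node, Node, Double)  -- (from, to, probability the link works)

-- Is t reachable from s using the given directed links?
reachable :: [(Node, Node)] -> Node -> Node -> Bool
reachable es s t = go [s] []
  where
    go [] seen = t `elem` seen
    go (x : xs) seen
      | x `elem` seen = go xs seen
      | otherwise     = go ([v | (u, v) <- es, u == x] ++ xs) (x : seen)

-- Sum the probability of every up/down assignment in which t is reachable.
reliability :: [Edge] -> Node -> Node -> Double
reliability edges s t =
  sum [ prob w | w <- worlds edges, reachable (live w) s t ]
  where
    -- every way of marking each edge as up (True) or down (False)
    worlds es = mapM (\e -> [(e, True), (e, False)]) es
    prob w    = product [ if up then p else 1 - p | ((_, _, p), up) <- w ]
    live w    = [ (u, v) | ((u, v, _), True) <- w ]

-- Hypothetical two-route network from node 0 to node 3:
-- an upper route through node 1 and a lower route through node 2.
twoPaths :: [Edge]
twoPaths = [(0, 1, 0.5), (1, 3, 0.5), (0, 2, 0.5), (2, 3, 0.5)]
```

With every link at probability 0.5, each route works with probability 0.25, and `reliability twoPaths 0 3` is 0.25 + 0.25 − 0.0625 = 0.4375, matching the "add both, subtract the double-counted overlap" reasoning.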
This problem turned out to be surprisingly more difficult and interesting than it first appeared. In a future post or two I will explain my solution, with a Haskell implementation. In the meantime, feel free to chime in with thoughts, questions, solutions, or pointers to the literature.