The network reliability problem and star semirings

In a previous post I defined the network reliability problem. Briefly, we are given a directed graph whose edges are labelled with probabilities, which we can think of as giving the likelihood of a message successfully traversing a link in a network. The problem is then to compute the probability that a message will successfully traverse the network from a given source node to a given target node.

Several commenters pointed out the connection to Bayesian networks. I think they are right, and the network reliability problem is a very special case of Bayesian inference. However, so far this hasn’t seemed to help very much, since the things I can find about algorithms for Bayesian inference are either too general (e.g. allowing arbitrary functions at nodes) or too specific (e.g. only working for certain kinds of trees). So I’m going to put aside Bayesian inference for now; perhaps later I can come back to it.

In any case, Derek Elkins also made a comment which pointed to exactly what I wanted to talk about next.

Star semirings and path independence

Consider the related problem of computing the reliability of the single most reliable path from s to t in a network. This is really just a disguised version of the shortest path problem, so one can solve it using Dijkstra’s algorithm. But I want to discuss a more general way to think about solving it, using the theory of star semirings. Recall that a semiring is a set with two associative binary operations, “addition” and “multiplication”, which is a commutative monoid under addition, a monoid under multiplication, and where multiplication distributes over addition and 0a = a0 = 0. A star semiring is a semiring with an additional operation (-)^* satisfying a^* = 1 + aa^* = 1 + a^*a. Intuitively, a^* = 1 + a + a^2 + a^3 + \dots (though a^* can still be well-defined even when this infinite sum is not; we can at least say that if the infinite sum is defined, they must be equal). If S is a star semiring, then the semiring of n \times n matrices over S is also a star semiring; for details see Dolan (2013), O’Connor (2011), Penaloza (2005), and Lehmann (1977). In particular, there is a very nice functional algorithm for computing M^*, with time complexity O(n^3) (Dolan 2013). (Of course, this is slower than Dijkstra’s algorithm, but unlike Dijkstra’s algorithm it also works for finding shortest paths in the presence of negative edge weights—in which case it is essentially the Floyd-Warshall algorithm.)

Now, given a graph G = (V,E) and labelling \varphi : E \to \mathbb{P}, define the |V| \times |V| adjacency matrix M_G to be the matrix of edge probabilities, that is, (M_G)_{uv} = \varphi(u,v). Let (\mathbb{P}, \max, 0, \times, 1) be the star semiring of probabilities under maximum and multiplication (where a^* = 1, since 1 = \max(1,a \times 1)). Then we can solve the single most reliable path problem by computing M_G^* over this semiring, and finding the largest entry. If we want to find the actual most reliable path, and not just its reliability, we can instead work over the semiring \mathbb{P} \times E^*, i.e. probabilities paired with paths. You might enjoy working out what the addition, multiplication, and star operations should be, or see O’Connor (2011).

In fact, as shown by O’Connor and Dolan, there are many algorithms that can be recast as computing the star of a matrix, for an appropriate choice of semiring: for example, (reflexive-)transitive closure; all-pairs shortest paths; Gaussian elimination; dataflow analysis; and solving certain knapsack problems. One might hope that there is similarly an appropriate semiring for the network reliability problem. But I have spent some time thinking about this and I do not know of one.

Consider again the simple example given at the start of the previous post:

For this example, we computed the reliability of the network to be 0.835, by computing the probability of the upper path, p = 0.45, and the lower path, q = 0.7, and then combining them as p + q - pq, the probability of success on either path less the double-counted probability of simultaneous success on both.

Inspired by this example, one thing we might try would be to define operations p \land q = pq and p \lor q = p + q - pq. But when we go to check the semiring laws, we run into a problem: distributivity does not hold! p \land (q \lor r) = p(q + r - qr) = pq + pr - pqr, but (p \land q) \lor (p \land r) = pq \lor pr = pq + pr - p^2qr. The problem is that the addition operation p \lor q = p + q - pq implicitly assumes that the events with probabilities p and q are independent: otherwise the probability that they both happen is not actually equal to pq. The events with probabilities pq and pr, however, are not independent. In graph terms, they represent two paths with a shared subpath. In fact, our example computation at the beginning of the post was only correct since the two paths from s to t were completely independent.

Graph reduction

We can at least compute the reliability of series-parallel graphs whose terminals correspond with s and t:

  • If G consists of a single edge, return that edge’s probability.
  • Otherwise, G is a composition of two subgraphs, whose reliabilities we recursively compute. Then:
    • If G is a sequential composition of graphs, return the product of their reliabilities.
    • If G is a parallel composition of two graphs with reliabilities p and q, return p + q - pq.

In the second case, having a parallel composition of graphs ensures that there are no shared edges between them, so p and q are indeed independent.

Of course, many interesting graphs are not series-parallel. The simplest graph for which the above does not work looks like this:

Suppose all the edges have probability 1/3. Can you find the reliability of this network?

More in a future post!


Dolan, Stephen. 2013. “Fun with Semirings: A Functional Pearl on the Abuse of Linear Algebra.” In ACM SIGPLAN Notices, 48:101–10. 9. ACM.

Lehmann, Daniel J. 1977. “Algebraic Structures for Transitive Closure.” Theoretical Computer Science 4 (1). Elsevier: 59–76.

O’Connor, Russell. 2011. “A Very General Method for Computing Shortest Paths.”

Penaloza, Rafael. 2005. “Algebraic Structures for Transitive Closure.”


About Brent

Associate Professor of Computer Science at Hendrix College. Functional programmer, mathematician, teacher, pianist, follower of Jesus.
This entry was posted in math and tagged , , , , , . Bookmark the permalink.

9 Responses to The network reliability problem and star semirings

  1. Pete McAllister says:

    I made it $\frac{169}{729}$ by just bashing out a decision tree, but I’m looking forward to reading about the right way to do it…

  2. If I’m not missing something the formula is
    (ab&ac) & (bd|cd) + (ab&!ac) & (bd | bc&cd)) + (!ab&ac) & cd
    which should be 55/243?

    • Pete McAllister says:

      I believe the discrepancy is due to interpretation of the puzzle statement. You are right if for every edge e with weight w, that edge works with probability w. I think the correct interpretation (from Brent’s commentary) is that whenever a node without outgoing edges e_i, w_i receives a packet, it forwards a packet onto e_i with probability w_i.

      My extra 4/729 is from the possibility that packets are forwarded a->b->c->d and a->c, but the a->c packet is dropped. Your term (bd|cd) conflates this case with a->b->c->d and a->c->d.

  3. Min-Sheng says:

    By factoring theorem,
    Pr(a->d)= Pr(a->d| work)*Pr( work)+ Pr(a->d| fail)*Pr( fail)
    There are four parts to be computed:
    1. Pr(
    2. Pr(
    3. Pr(a->d|
    work) is the reliability of the serial-parallel graph which nodes b, c are merged into one node and its value can be computed as {1/3+1/3-1/3*1/3}^2=25/81.
    4. Pr(a->d|
    fail) is the reliability of the serial-parallel graph which edge is removed and its value can be computed as 1/9+1/9-1/9*1/9=17/81.
    Thus, Pr(a->d)=(1/3)*(25/81) + (2/3)*(17/81)= 59/243.

    • Min-Sheng says:

      Sorry! I made a wrong computation. Since nodes b and c can be merged as one node only if both b->c and c->b work.

  4. Thomas Fritz says:

    Same result as Sjoerd, with a slightly different procedure.
    Let XY denote the event that a message passes from node x to node y. Event AD can be written in disjunctive normal form in terms of events for adjacent nodes
    AD = (AB ∩ BD) ∪ (AB ∩ BC ∩ CD) ∪ (AC ∩ CD)
    (In contrast to the formula by Sjoerd, there are no complements but the disjunctions are not disjoint)
    Using the inclusion-exclusion principle, a generalized version of the formula P(N ∪ M) = P(M) + P(N) – P(M ∩ N), the probability for AD is
    P(AD) = P(AB ∩ BD) + P(AB ∩ BC ∩ CD) + P(AC ∩ CD) – P((AB ∩ BD) ∩ (AB ∩ BC ∩ CD)) – P((AB ∩ BD) ∩ (AC ∩ CD)) – P((AB ∩ BC ∩ CD) ∩ (AC ∩ CD)) + P((AB ∩ BD) ∩ (AB ∩ BC ∩ CD) ∩ (AC ∩ CD)) =
    P(AB ∩ BD) + P(AB ∩ BC ∩ CD) + P(AC ∩ CD) – P(AB ∩ BC ∩ BD ∩ CD) – P(AB ∩ AC ∩ BD ∩ CD) – P(AB ∩ AC ∩ BC ∩ CD) + P(AB ∩ AC ∩ BC ∩ BD ∩ CD)
    and since the events are independent (adjacent nodes!), the result is
    P(AD) = 2(1/3)^2 + (1/3)^3 – 3(1/3)^4 + (1/3)^5 = 55/243

  5. Klaus Draeger says:

    Trying a slightly different approach (which unfortunately requires G to be acyclic), I am getting a different value. In order to deal with the possibility of nodes receiving several messages, I associate explicit distributions on numbers of messages with each node and edge:

    At a, this distribution is (0,1) since we have 0 messages with probability 0 and 1 message with probability 1.

    For each edge, multiply the vector on the source node with the matrix M whose coefficient $m_{i,j}$ represents the probability that i out of j messages survive; we thus get (2/3,1/3) on both ab and ac.

    For each node, the distribution is the convolution of the incoming edges. So for b, we also get the distribution (2/3,1/3); then (8/9,1/9) on bc and (16/27,10/27,1/27) for c (the coefficients in the convolution being 8/27=2/3*8/9, 10/27=2/3*1/9+1/3*8/9, 1/27=1/3*1/9).

    On edge bd, we again have (8/9,1/9) and on cd, (208/243,34/243,1/243). The final convolution at d is nasty, but we only need the first coefficient, 1664/2187, which is the probability that no messages arrive. The reliability then is 523/2187.

    • Klaus Draeger says:

      I noticed too late that this whole approach can be described much more neatly in terms of polynomials; given a distribution d on {0,…,n}, we represent d using the polynomial f(x)=d(0)+d(1)x+…+d(n)x^n. The approach can then be described as follows in terms of polynomials f_v and f_e for all nodes v and edges e:
      – For the initial node a, f_a(x) = x.
      – For any edge e out of a node v with reliability p, f_e(x) = f_v(p*x + (1-p)).
      – For any node v with incoming edges e_1,…,e_n, f_v(x) = f_{e_1}(x)*…*f_{e_n}(x).
      The reliability is then 1 – f_d(0), where d is the target node.
      In principle, this idea could also be applied for graphs with cycles, but then you would need to compute a fixpoint which will consist of power series rather than polynomials.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.