Competitive programming in Haskell: BFS, part 4 (implementation via STUArray)

In a previous post, we saw one way to implement our BFS API, but I claimed that it is not fast enough to solve Modulo Solitaire. Today, I want to demonstrate a faster implementation. (It’s almost certainly possible to make it faster still; I welcome suggestions!)

Once again, the idea is to replace the HashMaps from last time with mutable arrays, but in such a way that we get to keep the same pure API—almost. In order to allow arbitrary vertex types, while storing the vertices efficiently in a mutable array, we will require one extra argument to our bfs function, namely, an Enumeration specifying a way to map back and forth between vertices and array indices.

So why not instead just restrict vertices to some type that can be used as keys of a mutable array? That would work, but would unnecessarily restrict the API. For example, it is very common to see competitive programming problems that are “just” a standard graph algorithm, but on a non-obvious graph where the vertices are conceptually some more complex algebraic type, or on a graph where the vertices are specified as strings. Typically, competitive programmers just implement a mapping between vertices to integers on the fly—using either some math or some lookup data structures on the side—but wouldn’t it be nicer to be able to compositionally construct such a mapping and then have the graph search algorithm automatically handle the conversion back and forth? This is exactly what the Enumeration abstraction gives us.

This post is literate Haskell; you can obtain the source from the darcs repo. The source code (without accompanying blog post) can also be found in my comprog-hs repo.

{-# LANGUAGE FlexibleContexts    #-}
{-# LANGUAGE RankNTypes          #-}
{-# LANGUAGE RecordWildCards     #-}
{-# LANGUAGE ScopedTypeVariables #-}

module Graph where

import Enumeration

import           Control.Arrow       ((>>>))
import           Control.Monad
import           Control.Monad.ST
import qualified Data.Array.IArray   as IA
import           Data.Array.ST
import           Data.Array.Unboxed  (UArray)
import qualified Data.Array.Unboxed  as U
import           Data.Array.Unsafe   (unsafeFreeze)
import           Data.Sequence       (Seq (..), ViewL (..), (<|), (|>))
import qualified Data.Sequence       as Seq

infixl 0 >$>
(>$>) :: a -> (a -> b) -> b
(>$>) = flip ($)
{-# INLINE (>$>) #-}

exhaustM is like exhaust from the last post, but in the context of an arbitrary Monad. Each step will now be able to have effects (namely, updating mutable arrays) so needs to be monadic.

exhaustM :: Monad m => (a -> m (Maybe a)) -> a -> m a
exhaustM f = go
  where
    go a = do
      ma <- f a
      maybe (return a) go ma

The BFSResult type is the same as before.

data BFSResult v =
  BFSR { getLevel :: v -> Maybe Int, getParent :: v -> Maybe v }

Instead of using HashMaps in our BFSState as before, we will use STUArrays.¹ These are unboxed, mutable arrays which we can use in the ST monad. Note we also define V as a synonym for Int, just as a mnemonic way to remind ourselves which Int values are supposed to represent vertices.

type V = Int
data BFSState s =
  BS { level :: STUArray s V Int, parent :: STUArray s V V, queue :: Seq V }

To initialize a BFS state, we allocate new mutable level and parent arrays (initializing them to all $-1$ values), and fill in the level array and queue with the given start vertices. Notice how we need to be explicitly given the size of the arrays we should allocate; we will get this size from the Enumeration passed to bfs.

initBFSState :: Int -> [V] -> ST s (BFSState s)
initBFSState n vs = do
  l <- newArray (0,n-1) (-1)
  p <- newArray (0,n-1) (-1)

  forM_ vs $ \v -> writeArray l v 0
  return $ BS l p (Seq.fromList vs)

The bfs' function implements the BFS algorithm itself. Notice that it is not polymorphic in the vertex type; we will fix that with a wrapper function later. If you squint, the implementation looks very similar to the implementation of bfs from my previous post, with the big difference that everything has to be in the ST monad now.

bfs' :: Int -> [V] -> (V -> [V]) -> (V -> Bool) -> ST s (BFSState s)
bfs' n vs next goal = do
  st <- initBFSState n vs
  exhaustM bfsStep st
  where
    bfsStep st@BS{..} = case Seq.viewl queue of
      EmptyL -> return Nothing
      v :< q'
        | goal v -> return Nothing
        | otherwise -> v >$> next >>> filterM (fmap not . visited st)
            >=> foldM (upd v) (st{queue=q'}) >>> fmap Just

    upd p b@BS{..} v = do
      lp <- readArray level p
      writeArray level v (lp + 1)
      writeArray parent v p
      return $ b{queue = queue |> v}

visited :: BFSState s -> V -> ST s Bool
visited BS{..} v = (/= -1) <$> readArray level v
{-# INLINE visited #-}

The bfs function is a wrapper around bfs'. It presents the same API as before, with the exception that it requires an extra Enumeration v argument, and uses it to convert vertices to integers for the inner bfs' call, and then back to vertices for the final result. It also handles freezing the mutable arrays returned from bfs' and constructing level and parent lookup functions that index into them. Note, the use of unsafeFreeze seems unavoidable, since runSTUArray only allows us to work with a single mutable array; in any case, it is safe for the same reason the use of unsafeFreeze in the implementation of runSTUArray itself is safe: we can see from the type of toResult that the s parameter cannot escape, so the type system will not allow any further mutation to the arrays after it completes.

bfs :: forall v. Enumeration v -> [v] -> (v -> [v]) -> (v -> Bool) -> BFSResult v
bfs Enumeration{..} vs next goal
  = toResult $ bfs' card (map locate vs) (map locate . next . select) (goal . select)
  where
    toResult :: (forall s. ST s (BFSState s)) -> BFSResult v
    toResult m = runST $ do
      st <- m
      (level' :: UArray V Int) <- unsafeFreeze (level st)
      (parent' :: UArray V V) <- unsafeFreeze (parent st)
      return $
        BFSR
          ((\l -> guard (l /= -1) >> Just l) . (level' IA.!) . locate)
          ((\p -> guard (p /= -1) >> Just (select p)) . (parent' IA.!) . locate)

Incidentally, instead of adding an Enumeration v argument, why don’t we just make a type class Enumerable, like this?

class Enumerable v where
  enumeration :: Enumeration v

bfs :: forall v. Enumerable v => [v] -> ...

This would allow us to keep the same API for BFS, up to only different type class constraints on v. We could do this, but it doesn’t particularly seem worth it. It would typically require us to make a newtype for our vertex type (necessitating extra code to map in and out of the newtype) and to declare an Enumerable instance; in comparison, the current approach with an extra argument to bfs requires us to do nothing other than constructing the Enumeration itself.

Using this implementation, bfs is finally fast enough to solve Modulo Solitaire, like this:

main = C.interact $ runScanner tc >>> solve >>> format

data Move = Move { a :: !Int, b :: !Int } deriving (Eq, Show)
data TC = TC { m :: Int, s0 :: Int, moves :: [Move] } deriving (Eq, Show)

tc :: Scanner TC
tc = do
  m <- int
  n <- int
  TC m <$> int <*> n >< (Move <$> int <*> int)

type Output = Maybe Int

format :: Output -> ByteString
format = maybe "-1" showB

solve :: TC -> Output
solve TC{..} = getLevel res 0
  where
    res = bfs (finiteE m) [s0] (\v -> map (step m v) moves) (==0)

step :: Int -> Int -> Move -> Int
step m v (Move a b) = (a*v + b) `mod` m
{-# INLINE step #-}

It’s pretty much unchanged from before, except for the need to pass an Enumeration to bfs (in this case we just use finiteE m, which is the identity on the interval $[0 .. m)$ ).

Some remaining questions

This is definitely not the end of the story.

Submitting all this code (BFS, Enumeration, and the above solution itself) as a single file gives a 2x speedup over submitting them as three separate modules. That’s annoying—why is that?
Can we make this even faster? My solution to Modulo Solitaire runs in 0.57s. There are faster Haskell solutions (for example, Anurudh Peduri has a solution that runs in 0.32s), and there are Java solutions as fast as 0.18s, so it seems to me there ought to be ways to make it much faster. If you have an idea for optimizing this code I’d be very interested to hear it! I am far from an expert in Haskell optimization.
Can we generalize this nicely to other kinds of graph search algorithms (at a minimum, DFS and Dijkstra)? I definitely plan to explore this question in the future.

For next time: Breaking Bad

Next time, I want to look at a few other applications of this BFS code (and perhaps see if we can improve it along the way); I challenge you to solve Breaking Bad.

Why not use Vector, you ask? It’s probably even a bit faster, but the vector library is not supported on as many platforms.↩︎

5 Responses to Competitive programming in Haskell: BFS, part 4 (implementation via STUArray)

Anurudh Peduri says:

November 18, 2021 at 4:56 am

Hey, so the only difference I see between my solution and yours is that I used an `STUArray` to simulate the queue, with head & tail indices. I feel this is much faster than `Seq`, as we know the total size required (`|V|`) ahead of time.

my code – https://github.com/anurudhp/CPHaskell/blob/master/contests/kattis/modulosolitaire.hs

- Brent says:
  
  November 18, 2021 at 3:45 pm
  
  Ah, of course! Great idea.
  
  Unfortunately, I tried implementing it in a general way and my solution is now *slower*! I really don’t know why. I guess I should try to profile it.
  
Aaron Allen says:

November 24, 2021 at 2:11 pm

Just finished solving “Breaking Bad” and found it very enjoyable. As of now, I’ve got the fastest Haskell solution at 0.27s. Here is my code: https://gist.github.com/aaronallen8455/8558594a214786b916a14b82d24f9494

Pingback: Dynamic programming in Haskell: lazy immutable arrays | blog :: Brent -> [String]
Pingback: Dynamic programming in Haskell: automatic memoization | blog :: Brent -> [String]