Competitive programming in Haskell: BFS, part 4 (implementation via STUArray)

In a previous post, we saw one way to implement our BFS API, but I claimed that it is not fast enough to solve Modulo Solitaire. Today, I want to demonstrate a faster implementation. (It’s almost certainly possible to make it faster still; I welcome suggestions!)

Once again, the idea is to replace the HashMaps from last time with mutable arrays, but in such a way that we get to keep the same pure API—almost. In order to allow arbitrary vertex types, while storing the vertices efficiently in a mutable array, we will require one extra argument to our bfs function, namely, an Enumeration specifying a way to map back and forth between vertices and array indices.

So why not instead just restrict vertices to some type that can be used as keys of a mutable array? That would work, but would unnecessarily restrict the API. For example, it is very common to see competitive programming problems that are “just” a standard graph algorithm, but on a non-obvious graph where the vertices are conceptually some more complex algebraic type, or on a graph where the vertices are specified as strings. Typically, competitive programmers just implement a mapping between vertices to integers on the fly—using either some math or some lookup data structures on the side—but wouldn’t it be nicer to be able to compositionally construct such a mapping and then have the graph search algorithm automatically handle the conversion back and forth? This is exactly what the Enumeration abstraction gives us.

This post is literate Haskell; you can obtain the source from the darcs repo. The source code (without accompanying blog post) can also be found in my comprog-hs repo.

{-# LANGUAGE FlexibleContexts    #-}
{-# LANGUAGE RankNTypes          #-}
{-# LANGUAGE RecordWildCards     #-}
{-# LANGUAGE ScopedTypeVariables #-}

module Graph where

import Enumeration

import           Control.Arrow       ((>>>))
import           Control.Monad
import           Control.Monad.ST
import qualified Data.Array.IArray   as IA
import           Data.Array.ST
import           Data.Array.Unboxed  (UArray)
import qualified Data.Array.Unboxed  as U
import           Data.Array.Unsafe   (unsafeFreeze)
import           Data.Sequence       (Seq (..), ViewL (..), (<|), (|>))
import qualified Data.Sequence       as Seq

infixl 0 >$>
(>$>) :: a -> (a -> b) -> b
(>$>) = flip ($)
{-# INLINE (>$>) #-}

exhaustM is like exhaust from the last post, but in the context of an arbitrary Monad. Each step will now be able to have effects (namely, updating mutable arrays) so needs to be monadic.

exhaustM :: Monad m => (a -> m (Maybe a)) -> a -> m a
exhaustM f = go
  where
    go a = do
      ma <- f a
      maybe (return a) go ma

The BFSResult type is the same as before.

data BFSResult v =
  BFSR { getLevel :: v -> Maybe Int, getParent :: v -> Maybe v }

Instead of using HashMaps in our BFSState as before, we will use STUArrays.1 These are unboxed, mutable arrays which we can use in the ST monad. Note we also define V as a synonym for Int, just as a mnemonic way to remind ourselves which Int values are supposed to represent vertices.

type V = Int
data BFSState s =
  BS { level :: STUArray s V Int, parent :: STUArray s V V, queue :: Seq V }

To initialize a BFS state, we allocate new mutable level and parent arrays (initializing them to all -1 values), and fill in the level array and queue with the given start vertices. Notice how we need to be explicitly given the size of the arrays we should allocate; we will get this size from the Enumeration passed to bfs.

initBFSState :: Int -> [V] -> ST s (BFSState s)
initBFSState n vs = do
  l <- newArray (0,n-1) (-1)
  p <- newArray (0,n-1) (-1)

  forM_ vs $ \v -> writeArray l v 0
  return $ BS l p (Seq.fromList vs)

The bfs' function implements the BFS algorithm itself. Notice that it is not polymorphic in the vertex type; we will fix that with a wrapper function later. If you squint, the implementation looks very similar to the implementation of bfs from my previous post, with the big difference that everything has to be in the ST monad now.

bfs' :: Int -> [V] -> (V -> [V]) -> (V -> Bool) -> ST s (BFSState s)
bfs' n vs next goal = do
  st <- initBFSState n vs
  exhaustM bfsStep st
  where
    bfsStep st@BS{..} = case Seq.viewl queue of
      EmptyL -> return Nothing
      v :< q'
        | goal v -> return Nothing
        | otherwise -> v >$> next >>> filterM (fmap not . visited st)
            >=> foldM (upd v) (st{queue=q'}) >>> fmap Just

    upd p b@BS{..} v = do
      lp <- readArray level p
      writeArray level v (lp + 1)
      writeArray parent v p
      return $ b{queue = queue |> v}

visited :: BFSState s -> V -> ST s Bool
visited BS{..} v = (/= -1) <$> readArray level v
{-# INLINE visited #-}

The bfs function is a wrapper around bfs'. It presents the same API as before, with the exception that it requires an extra Enumeration v argument, and uses it to convert vertices to integers for the inner bfs' call, and then back to vertices for the final result. It also handles freezing the mutable arrays returned from bfs' and constructing level and parent lookup functions that index into them. Note, the use of unsafeFreeze seems unavoidable, since runSTUArray only allows us to work with a single mutable array; in any case, it is safe for the same reason the use of unsafeFreeze in the implementation of runSTUArray itself is safe: we can see from the type of toResult that the s parameter cannot escape, so the type system will not allow any further mutation to the arrays after it completes.

bfs :: forall v. Enumeration v -> [v] -> (v -> [v]) -> (v -> Bool) -> BFSResult v
bfs Enumeration{..} vs next goal
  = toResult $ bfs' card (map locate vs) (map locate . next . select) (goal . select)
  where
    toResult :: (forall s. ST s (BFSState s)) -> BFSResult v
    toResult m = runST $ do
      st <- m
      (level' :: UArray V Int) <- unsafeFreeze (level st)
      (parent' :: UArray V V) <- unsafeFreeze (parent st)
      return $
        BFSR
          ((\l -> guard (l /= -1) >> Just l) . (level' IA.!) . locate)
          ((\p -> guard (p /= -1) >> Just (select p)) . (parent' IA.!) . locate)

Incidentally, instead of adding an Enumeration v argument, why don’t we just make a type class Enumerable, like this?

class Enumerable v where
  enumeration :: Enumeration v

bfs :: forall v. Enumerable v => [v] -> ...

This would allow us to keep the same API for BFS, up to only different type class constraints on v. We could do this, but it doesn’t particularly seem worth it. It would typically require us to make a newtype for our vertex type (necessitating extra code to map in and out of the newtype) and to declare an Enumerable instance; in comparison, the current approach with an extra argument to bfs requires us to do nothing other than constructing the Enumeration itself.

Using this implementation, bfs is finally fast enough to solve Modulo Solitaire, like this:

main = C.interact $ runScanner tc >>> solve >>> format

data Move = Move { a :: !Int, b :: !Int } deriving (Eq, Show)
data TC = TC { m :: Int, s0 :: Int, moves :: [Move] } deriving (Eq, Show)

tc :: Scanner TC
tc = do
  m <- int
  n <- int
  TC m <$> int <*> n >< (Move <$> int <*> int)

type Output = Maybe Int

format :: Output -> ByteString
format = maybe "-1" showB

solve :: TC -> Output
solve TC{..} = getLevel res 0
  where
    res = bfs (finiteE m) [s0] (\v -> map (step m v) moves) (==0)

step :: Int -> Int -> Move -> Int
step m v (Move a b) = (a*v + b) `mod` m
{-# INLINE step #-}

It’s pretty much unchanged from before, except for the need to pass an Enumeration to bfs (in this case we just use finiteE m, which is the identity on the interval [0 .. m)).

Some remaining questions

This is definitely not the end of the story.

  • Submitting all this code (BFS, Enumeration, and the above solution itself) as a single file gives a 2x speedup over submitting them as three separate modules. That’s annoying—why is that?

  • Can we make this even faster? My solution to Modulo Solitaire runs in 0.57s. There are faster Haskell solutions (for example, Anurudh Peduri has a solution that runs in 0.32s), and there are Java solutions as fast as 0.18s, so it seems to me there ought to be ways to make it much faster. If you have an idea for optimizing this code I’d be very interested to hear it! I am far from an expert in Haskell optimization.

  • Can we generalize this nicely to other kinds of graph search algorithms (at a minimum, DFS and Dijkstra)? I definitely plan to explore this question in the future.

For next time: Breaking Bad

Next time, I want to look at a few other applications of this BFS code (and perhaps see if we can improve it along the way); I challenge you to solve Breaking Bad.


  1. Why not use Vector, you ask? It’s probably even a bit faster, but the vector library is not supported on as many platforms.↩︎

Posted in competitive programming, haskell | Tagged , , , , , , | 3 Comments

Competitive programming in Haskell: Enumeration

I’m in the middle of a multi-part series on implementing BFS in Haskell. In my last post, we saw one implementation, but I claimed that it is not fast enough to solve Modulo Solitaire, and I promised to show off a faster implementation in this post, but I lied; we have to take a small detour first.

The main idea to make a faster BFS implementation is to replace the HashMaps from last time with mutable arrays, but hopefully in such a way that we get to keep the same pure API. Using mutable arrays introduces a few wrinkles, though.

  1. The API we have says we get to use any type v for our vertices, as long as it is an instance of Ord and Hashable. However, this is not going to work so well for mutable arrays. We still want the external API to allow us to use any type for our vertices, but we will need a way to convert vertices to and from Int values we can use to index the internal mutable array.

  2. A data structre like HashMap is dynamically sized, but we don’t have this luxury with arrays. We will have to know the size of the array up front.

In other words, we need to provide a way to bijectively map vertices to a finite prefix of the natural numbers; that is, we need what I call invertible enumerations. This idea has come up for me multiple times: in 2016, I wrote about using such an abstraction to solve another competitive programming problem, and in 2019 I published a library for working with invertible enumerations. I’ve now put together a lightweight version of that library for use in competitive programming. I’ll walk through the code below, and you can also find the source code in my comprog-hs repository.

First, some extensions and imports.

{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeApplications #-}

module Enumeration where

import qualified Data.List as L
import Data.Hashable
import qualified Data.Array as A
import qualified Data.HashMap.Strict as HM

An Enumeration a consists of a cardinality, and two functions, select and locate, which together form a bijection between (some subset of) values of type a and a finite prefix of the natural numbers. We can convert an Enumeration into a list just by mapping the select function over that finite prefix.

data Enumeration a = Enumeration
  { card   :: !Int
  , select :: Int -> a
  , locate :: a -> Int
  }

enumerate :: Enumeration a -> [a]
enumerate e = map (select e) [0 .. card e-1]

Since a occurs both positively and negatively, Enumeration is not a Functor, but we can map over Enumerations as long as we provide both directions of a bijection a <-> b.

mapE :: (a -> b) -> (b -> a) -> Enumeration a -> Enumeration b
mapE f g (Enumeration c s l) = Enumeration c (f . s) (l . g)

We have various fundamental ways to build enumerations: empty and unit enumerations, and an identity enumeration on a finite prefix of natural numbers.

voidE :: Enumeration a
voidE = Enumeration 0 (error "select void") (error "locate void")

unitE :: Enumeration ()
unitE = singletonE ()

singletonE :: a -> Enumeration a
singletonE a = Enumeration 1 (const a) (const 0)

finiteE :: Int -> Enumeration Int
finiteE n = Enumeration n id id

We can automatically enumerate all the values of a Bounded Enum instance. This is useful, for example, when we have made a custom enumeration type.

boundedEnum :: forall a. (Enum a, Bounded a) => Enumeration a
boundedEnum = Enumeration
  { card   = hi - lo + 1
  , select = toEnum . (+lo)
  , locate = subtract lo . fromEnum
  }
  where
    lo, hi :: Int
    lo = fromIntegral (fromEnum (minBound @a))
    hi = fromIntegral (fromEnum (maxBound @a))

We can also build an enumeration from an explicit list. We want to make sure this is efficient, since it is easy to imagine using this e.g. on a very large list of vertex values given as part of the input of a problem. So we build an array and a hashmap to allow fast lookups in both directions.

listE :: forall a. (Hashable a, Eq a) => [a] -> Enumeration a
listE as = Enumeration n (toA A.!) (fromA HM.!)
  where
    n = length as
    toA :: A.Array Int a
    toA = A.listArray (0,n-1) as
    fromA :: HM.HashMap a Int
    fromA = HM.fromList (zip as [0 :: Int ..])

Finally, we have a couple ways to combine enumerations into more complex ones, via sum and product.

(>+<) :: Enumeration a -> Enumeration b -> Enumeration (Either a b)
a >+< b = Enumeration
  { card   = card a + card b
  , select = \k -> if k < card a then Left (select a k) else Right (select b (k - card a))
  , locate = either (locate a) ((+card a) . locate b)
  }

(>*<) :: Enumeration a -> Enumeration b -> Enumeration (a,b)
a >*< b = Enumeration
  { card = card a * card b
  , select = \k -> let (i,j) = k `divMod` card b in (select a i, select b j)
  , locate = \(x,y) -> card b * locate a x + locate b y
  }

There are a few more combinators in the source code but I don’t know whether I’ll ever use them. You can read about them if you want. For now, let’s try using this to solve a problem!

…ah, who am I kidding, I can’t find any problems that can be directly solved using this framework. Invertibility is a double-edged sword—we absolutely need it for creating an efficient BFS with arbitrary vertices, and the combinators will come in quite handy if we want to use some complex type for vertices. However, requiring invertibility also limits the expressiveness of the library. For example, there is no Monad instance. This is why my simple-enumeration library has both invertible and non-invertible variants.

Posted in competitive programming, haskell | Tagged , , , , | 1 Comment

Competitive programming in Haskell: BFS, part 3 (implementation via HashMap)

In a previous post, I showed how we can solve Modulo Solitaire (and hopefully other BFS problems as well) using a certain API for BFS, and we also explored some alternatives. I had a very interesting discussion with Andrey Mokhov in the comments about potential designs for an even more general API; more on that in a future post, perhaps!

For today, though, I want to finally show one way to implement this API efficiently. Spoiler alert: this implementation ultimately won’t be fast enough for us, but it will be a helpful stepping stone on our way to a yet faster implementation (which will of course get its own post in due time).

This post is literate Haskell; you can obtain the source from the darcs repo. We begin with a few LANGUAGE pragmas and imports.

{-# LANGUAGE RecordWildCards            #-}
{-# LANGUAGE ScopedTypeVariables        #-}
{-# LANGUAGE TupleSections              #-}

module BFS where

import           Control.Arrow               ((>>>))
import           Data.Hashable               (Hashable)
import           Data.HashMap.Strict         (HashMap, (!))
import qualified Data.HashMap.Strict         as HM
import           Data.List                   (foldl')
import           Data.Sequence               (Seq (..), ViewL (..), (|>))
import qualified Data.Sequence               as Seq

Now a couple utility functions: (>$>) is just flipped function application, and exhaust iterates an (a -> Maybe a) function as many times as possible, returning the last non-Nothing value.

infixl 0 >$>
(>$>) :: a -> (a -> b) -> b
(>$>) = flip ($)
{-# INLINE (>$>) #-}

exhaust :: (a -> Maybe a) -> a -> a
exhaust f = go
  where
    go a = maybe a go (f a)

Here is the BFSResult record that we ultimately want to produce; it should be familiar from previous posts.

data BFSResult v =
  BFSR { getLevel :: v -> Maybe Int, getParent :: v -> Maybe v }

While running our BFS, we’ll keep track of three things: the level of each vertex that has been encountered; a mapping from each encountered vertex to its parent; and a queue of vertices that have been encountered but yet to be processed. We use a Seq from Data.Sequence to represent the queue, since it supports efficient (amortized constant-time) insertion and removal from either end of the sequence. There are certainly other potential ways to represent a queue in Haskell (and this probably deserves its own blog post) but Data.Sequence seems to give good performance for minimal effort (and in any case, as we’ll see, it’s not the performance bottleneck here). We use a pair of HashMaps to represent the level and parent maps.

data BFSState v =
  BS { level :: HashMap v Int, parent :: HashMap v v, queue :: Seq v }

Given a list of starting vertices, we can create an initial state, with a queue containing the starting vertices and all of them set to level 0.

initBFSState :: (Eq v, Hashable v) => [v] -> BFSState v
initBFSState vs = BS (HM.fromList (map (,0) vs)) HM.empty (Seq.fromList vs)

Now, here is our imeplementation of BFS, using the API discussed previously: it takes a list of starting vertices, a function giving the neighbors of each vertex, and a function identifying “target vertices” (so we can stop early), and returns a BFSResult record. We create an initial state, run bfsStep as much as possible, and convert the end state into a result.

bfs :: forall v. (Eq v, Hashable v) => [v] -> (v -> [v]) -> (v -> Bool) -> BFSResult v
bfs vs next goal = toResult $ exhaust bfsStep (initBFSState vs)
  where

Converting the final BFSState into a BFSResult is easy: just return functions that do a lookup into the relevant map.

    toResult BS{..} = BFSR (`HM.lookup` level) (`HM.lookup` parent)

To do a single step of BFS, try to remove the next vertex v from the queue. If the queue is empty, or the next vertex is a goal vertex, return Nothing to signal that we are done.

    bfsStep st@BS{..} = case Seq.viewl queue of
      EmptyL -> Nothing
      v :< q'
        | goal v    -> Nothing

Otherwise, use the next function to find the neighbors of v, keep only those we haven’t encountered before (i.e. those which are not keys in the level map), and use each one to update the BFS state (being sure to first set the queue to the new one with v removed).

        | otherwise ->
          v >$> next >>> filter (not . (`HM.member` level)) >>>
            foldl' (upd v) (st{queue=q'}) >>> Just

To update the BFS state based on a newly visited vertex, we record its parent, insert it into the level map with a level one greater than its parent, and add it to the end of the queue.

    upd p BS{..} v = BS
      (HM.insert v l level)
      (HM.insert v p parent)
      (queue |> v)
      where
        l = level!p + 1

And that’s it! This is good enough to solve many BFS problems on Open Kattis, such as Breaking Bad, ARMPIT Computations, and Folding a Cube. (I will leave you the pleasure of solving these problems yourself; I am especially fond of my Haskell solution to Folding a Cube.)

Unfortunately, it is not fast enough to solve Modulo Solitaire, which I picked specifically because it seems to be one of the most computationally demanding BFS problems I’ve seen. My solution using this HashMap-based implementation solves a bunch of initial test cases, but exceeds the 2 second time limit on one of the later test cases. Next time, I’ll show how to adapt this into an even faster implementation which is actually fast enough to solve Modulo Solitaire.

Posted in competitive programming, haskell | Tagged , , , , , | 4 Comments

Competitive programming in Haskell: BFS, part 2 (alternative APIs)

In my last post, I showed how we can solve Modulo Solitaire (and hopefully other BFS problems as well) using a certain API for BFS, which returns two functions: one, level :: v -> Maybe Int, gives the level (i.e. length of a shortest path to) of each vertex, and parent :: v -> Maybe v gives the parent of each vertex in the BFS forest. Before showing an implementation, I wanted to talk a bit more about this API and why I chose it.

In particular, Andrey Mokhov left a comment on my previous post with some alternative APIs:

bfsForest :: Ord a => [a] -> AdjacencyMap a -> Forest a
bfs :: Ord a => [a] -> AdjacencyMap a -> [[a]]

Of course, as Andrey notes, AdjacencyMap is actually a reified graph data structure, which we don’t want here, but that’s not essential; presumably the AdjacencyMap arguments in Andrey’s functions could easily be replaced by an implicit graph description instead. (Note that an API requiring an implicit representation is strictly more powerful, since if you have an explicit representation you can always just pass in a function which does lookups into your explicit representation.) However, Andrey raises a good point. Both these APIs return information which is not immediately available from my API.

  • bfsForest returns an actual forest we can traverse, giving the children of each node. My API only returns a parent function which gives the parent of each node. These contain equivalent information, however, and we can convert back and forth efficiently (where by “efficiently” in this context I mean “in O(n \lg n) time or better”) as long as we have a list of all vertices. To convert from a Forest to a parent function, just traverse the forest and remember all the parent-child pairs we see, building e.g. a Map that can be used for lookup. To convert back, first iterate over the list of all vertices, find the parent of each, and build an inverse mapping from parents to sets of children. If we want to proceed to building an actual Forest data structure, we can unfold one via repeated lookups into our child mapping.

    However, I would argue that in typical applications, having the parent function is more useful than having a Forest. For example, the parent function allows us to efficiently answer common, classic queries such as “Is vertex v reachable from vertex s?” and “What is a shortest path from s to v?” Answering these questions with a Forest would require traversing the entire Forest to look for the target vertex v.

  • bfs returns a list of levels: that is, the first list is the starting vertices, the next list is all vertices one step away from any starting vertex, the next list is all vertices two steps away, and so on. Again, given a list of all vertices, we can recover a list of levels from the level function: just traverse the list of all vertices, looking up the level of each and adding it to an appropriate mapping from levels to sets of vertices. Converting in the other direction is easy as well.

    A level list lets us efficiently answer a queries such as “how many vertices are exactly 5 steps away from s”?, whereas with the level function we can efficiently answer queries such as “What is the length of a shortest path from s to v?” In practice, the latter form of query seems more common.

In the final version of this BFS API, I will probably include some functions to recover forests and level sets as described above. Some benchmarking will be needed to see whether it’s more efficient to recover them after the fact or to actually keep track of them along the way.

Posted in competitive programming, haskell | Tagged , , , , | 6 Comments

Competitive programming in Haskell: BFS, part 1

In a previous post, I challenged you to solve Modulo Solitaire. In this problem, we are given a starting number s_0 and are trying to reach 0 in as few moves as possible. At each move, we may pick one of up to 10 different rules (a_i,b_i) that say we can transform s into (a_i s + b_i) \bmod m.

In one sense, this is a straightforward search problem. Conceptually, the numbers 0 through m-1 form the vertices of a graph, with a directed edge from s to t whenever there is some allowed (a_i, b_i) such that t = (a_i s + b_i) \bmod m; we want to do a breadth first search in this graph to find the length of a shortest path from s_0 to 0. However, m can be up to 10^6 and there can be up to 10 rules, giving a total of up to 10^7 edges. In the case that 0 is unreachable, we may have to explore every single edge. So we are going to need a pretty fast implementation; we’ll come back to that later.

Haskell actually has a nice advantage here. This is exactly the kind of problem in which we want to represent the graph implicitly. There is no reason to actually reify the graph in memory as a data structure; it would only waste memory and time. Instead, we can specify the graph implicitly using a function that gives the neighbors of each vertex, which means BFS itself will be a higher-order function. Higher-order functions are very awkward to represent in a language like Java or C++, so when I solve problems like this in Java, I tend to just write the whole BFS from scratch every single time, and I doubt I’m the only one. However, in Haskell we can easily make an abstract interface to BFS which takes a function as input specifying an implicit graph, allowing us to nicely separate out the graph search logic from the task of specifying the graph itself.

What would be my ideal API for BFS in Haskell? I think it might look something like this (but I’m happy to hear suggestions as to how it could be made more useful or general):

data BFSResult v =
  BFSR { level :: v -> Maybe Int, parent :: v -> Maybe v }

bfs ::
  (Ord v, Hashable v) =>
  [v] ->                      -- Starting vertices
  (v -> [v]) ->               -- Neighbors
  (v -> Bool) ->              -- Goal predicate
  BFSResult v

bfs takes a list of vertices to search from (which could be a singleton if there is a single specific starting vertex), a function specifying the out-neighbors of each vertex, and a predicate specifying which vertices are “goal” vertices (so we can stop early if we reach one), and returns a BFSResult record, which tells us the level at which each vertex was encountered, if at all (i.e. how many steps were required to reach it), and the parent of each vertex in the search. If we just want to know whether a vertex was reachable at all, we can see if level returns Just; if we want to know the shortest path to a vertex, we can just iterate parent. Vertices must be Ord and Hashable to facilitate storing them in data structures.

Using this API, the solution is pretty short.

main = C.interact $ runScanner tc >>> solve >>> format

data Move = Move { a :: !Int, b :: !Int } deriving (Eq, Show)
data TC = TC { m :: Int, s0 :: Int, moves :: [Move] } deriving (Eq, Show)

tc :: Scanner TC
tc = do
  m <- int
  n <- int
  TC m <$> int <*> n >< (Move <$> int <*> int)

format :: Maybe Int -> ByteString
format = maybe "-1" showB

solve :: TC -> Maybe Int
solve TC{..} = level res 0
  where
    res = bfs [s0] (\v -> map (step v) moves) (==0)
    step v (Move a b) = (a*v + b) `mod` m

We run a BFS from s_0, stopping when we reach 0, and then look up the level of 0 to see the minimum number of steps needed to reach it.

In part 2, I’ll talk about how to implement this API. There are many viable implementation strategies, but the trick is getting it to run fast enough.

Posted in competitive programming, haskell | Tagged , , , , | 5 Comments

Swarm: a lot can happen in a week

It’s been about a week since I put out an announcement and call for collaboration on a new game, Swarm. Since then, the response has been fantastic: lots of people have tried it out, a few have even streamed themselves playing it on Twitch, and there has been lots of development activity.

There’s still a long, long way to go before the game comes anywhere close to the vision for it, but we’ve made great progress! Some notable new features added since the initial announcement include:

  • New scan, upload, and install commands
  • Semicolons are no longer required beetween consecutive defs
  • Basic help panel, and panel shortcut keys
  • Dramatically reduced CPU usage when idle
  • An overhaul of parsing and pretty-printing of constants (makes adding new constants easier, and an important prerequisite for saving definitions and games)
  • Better handling of water (you can make curry now)!

A couple more exciting things in progress that should land very soon:

  • ASCII art recipes

  • Basic editor integration via LSP, so you can write Swarm programs in your favorite editor with automatically highlighted syntax and type errors.

And of course there are many other exciting things planned or in the works. Come join us!

Posted in haskell, projects | Tagged , , , , | Leave a comment

Swarm: preview and call for collaboration

For about a month now I have been working on building a game1, tentatively titled Swarm. It’s nowhere near finished, but it has at least reached a point where I’m not embarrassed to show it off. I would love to hear feedback, and I would especially love to have others contribute! Read on for more details.

Swarm is a 2D tile-based resource gathering game, but with a twist: the only way you can interact with the world is by building and programming robots. And there’s another twist: the kinds of commands your robots can execute, and the kinds of programming language features they can interpret, depends on what devices they have installed; and you can create new devices only by gathering resources. So you start out with only very basic capabilities and have to bootstrap your way into more sophisticated forms of exploration and resource collection.

I guess you could say it’s kind of like a cross between Minecraft, Factorio, and Karel the Robot, but with a much cooler programming language (lambda calculus + polymorphism + recursion + exceptions + a command monad for first-class imperative programs + a bunch of other stuff).

The game is far from complete, and especially needs a lot more depth in terms of the kinds of devices and levels of abstraction you can build. But for me at least, it has already crossed the line into something that is actually somewhat fun to play.

If it sounds interesting to you, give it a spin! Take a look at the README and the tutorial. If you’re interested in contributing to development, check out the CONTRIBUTING file and the GitHub issue tracker, which I have populated with a plethora of tasks of varying difficulty. This could be a great project to contribute to especially if you’re relatively new to Haskell; I try to keep everything well-organized and well-commented, and am happy to help guide new contributors.


  1. Can you tell I am on sabbatical?

Posted in haskell, projects | Tagged , , , , | 6 Comments

Competitive programming in Haskell: Codeforces Educational Round 114

Yesterday morning I competed in Educational Round 114 on codeforces.com, using only Haskell. It is somewhat annoying since it does not support as many Haskell libraries as Open Kattis (e.g. no unordered-containers, split, or vector); but on the other hand, a lot of really top competitive programmers are active there, and I enjoy occasionally participating in a timed contest like this when I am able.

WARNING: here be spoilers! Stop reading now if you’d like to try solving the contest problems yourself. (However, Codeforces has an editorial with explanations and solutions already posted, so I’m not giving anything away that isn’t already public.) I’m going to post my (unedited) code for each problem, but without all the imports and LANGUAGE extensions and whatnot; hopefully that stuff should be easy to infer.

Problem A – Regular Bracket Sequences

In this problem, we are given a number n and asked to produce any n distinct balanced bracket sequences of length 2n. I immediately just coded up a simple recursive function to generate all possible bracket sequences of length 2n, and then called take n on it. Thanks to laziness this works great. I missed that there is an even simpler solution: just generate the list ()()()()..., (())()()..., ((()))()..., i.e. where the kth bracket sequence starts with k nested pairs of brackets followed by n-k singleton pairs. However, I solved it in only four minutes anyway so it didn’t really matter!

readB = C.unpack >>> read

main = C.interact $
  C.lines >>> drop 1 >>> concatMap (readB >>> solve) >>> C.unlines

bracketSeqs 0 = [""]
bracketSeqs n =
  [ "(" ++ s1 ++ ")" ++ s2
  | k <- [0 .. n-1]
  , s1 <- bracketSeqs k
  , s2 <- bracketSeqs (n - k - 1)
  ]

solve n = map C.pack . take n $ bracketSeqs n

Problem B – Combinatorics Homework

In this problem, we are given numbers a, b, c, and m, and asked whether it is possible to create a string of a A’s, b B’s, and c C’s, such that there are exactly m adjacent pairs of equal letters. This problem requires doing a little bit of combinatorial analysis to come up with a simple Boolean expression in terms of a, b, c, and m; there’s not much to say about it from a Haskell point of view. You can refer to the editorial posted on Codeforces if you want to understand the solution.

readB = C.unpack >>> read

main = C.interact $
  C.lines >>> drop 1 >>> map (C.words >>> map readB >>> solve >>> bool "NO" "YES") >>> C.unlines

solve :: [Int] -> Bool
solve [a,b,c,m] = a + b + c - m >= 3 && m >= z - (x+y) - 1
  where
    [x,y,z] = sort [a,b,c]

Problem C – Slay the Dragon

This problem was super annoying and I still haven’t solved it. The idea is that you have a bunch of “heroes”, each with a numeric strength, and there is a dragon described by two numbers: its attack level and its defense level. You have to pick one hero to fight the dragon, whose strength must be greater than or equal to the dragon’s defense; all the rest of the heroes will stay behind to defend your castle, and their combined strength must be greater than the dragon’s attack. This might not be possible, of course, so you can first spend money to level up any of your heroes, at a rate of one coin per strength point; the task is to find the minimum amount of money you must spend.

The problem hinges on doing some case analysis. It took me a good while to come up with something that I think is correct. I spent too long trying to solve it just by thinking hard; I really should have tried formal program derivation much earlier. It’s easy to write down a formal specification of the correct answer which involves looping over every hero and taking a minimum, and this can be manipulated into a form that doesn’t need to do any looping.

In the end it comes down to (for example) finding the hero with the smallest strength greater than or equal to the dragon’s defense, and the hero with the largest strength less than or equal to it (though one of these may not exist). The intended way to solve the problem is to sort the heroes by strength and use binary search; instead, I put all the heroes in an IntSet and used the lookupGE and lookupLE functions.

However, besides my floundering around getting the case analysis wrong at first, I got tripped up by two other things: first, it turns out that on the Codeforces judging hardware, Int is only 32 bits, which is not big enough for this problem! I know this because my code was failing on the third test case, and when I changed it to use Int64 instead of Int (which means I also had to switch to Data.Set instead of Data.IntSet), it failed on the sixth test case instead. The other problem is that my code was too slow: in fact, it timed out on the sixth test case rather than getting it wrong per se. I guess Data.Set and Int64 just have too much overhead.

Anyway, here is my code, which I think is correct, but is too slow.

data TC = TC { heroes :: ![Int64], dragons :: ![Dragon] }
data Dragon = Dragon { defense :: !Int64, attack :: !Int64 }

main = C.interact $
  runScanner tc >>> solve >>> map (show >>> C.pack) >>> C.unlines

tc :: Scanner TC
tc = do
  hs <- numberOf int64
  ds <- numberOf (Dragon <$> int64 <*> int64)
  return $ TC hs ds

solve :: TC -> [Int64]
solve (TC hs ds) = map fight ds
  where
    heroSet = S.fromList hs
    total = foldl' (+) 0 hs
    fight (Dragon df atk) = minimum $
      [ max 0 (atk - (total - hero)) | Just hero <- [mheroGE] ]
      ++
      [ df - hero + max 0 (atk - (total - hero)) | Just hero <- [mheroLE]]
      where
        mheroGE = S.lookupGE df heroSet
        mheroLE = S.lookupLE df heroSet

I’d like to come back to this later. Using something like vector to sort and then do binary search on the heroes would probably be faster, but vector is not supported on Codeforces. I’ll probably end up manually implementing binary search on top of something like Data.Array.Unboxed. Doing a binary search on an array also means we can get away with doing only a single search, since the two heroes we are looking for must be right next to each other in the array.

Edited to add: I tried creating an unboxed array and implementing my own binary search over it; however, my solution is still too slow. At this point I think the problem is the sorting. Instead of calling sort on the list of heroes, we probably need to implement our own quicksort or something like that over a mutable array. That doesn’t really sound like much fun so I’m probably going to forget about it for now.

Problem D – The Strongest Build

In this problem, we consider a set of k-tuples, where the value for each slot in a tuple is chosen from among a list of possible values unique to that slot (the values for a slot are given to us in sorted order). For example, perhaps the first slot has the possible values 1, 2, 3, the second slot has possible values 5, 8, and the third slot has possible values 4, 7, 16. In this case there would be 3 \times 2 \times 3 possible tuples, ranging from (1,5,4) up to (3,8,16). We are also given a list of forbidden tuples, and then asked to find a non-forbidden tuple with the largest possible sum.

If the list of slot options is represented as a list of lists, with the first list representing the choices for the first slot, and so on, then we could use sequence to turn this into the list of all possible tuples. Hence, a naive solution could look like this:

solve :: Set [Int] -> [[Int]] -> [Int]
solve forbidden =
  head . filter (`S.notMember` forbidden) . sortOn (Down . sum) . sequence

Of course, this is much too slow. The problem is that although k (the size of the tuples) is limited to at most 10, there can be up to 2 \cdot 10^5 choices for each slot (the choices themselves can be up to 10^8). The list of all possible tuples could thus be truly enormous; in theory, there could be up to (2 \cdot 10^5)^{10} \approx 10^{53}), and generating then sorting them all is out of the question.

We can think of the tuples as forming a lattice, where the children of a tuple t are all the tuples obtained by downgrading exactly one slot of t to the next smaller choice. Then the intended solution is to realize that the largest non-forbidden tuple must either be the top element of the lattice (the tuple with the maximum possible value for every slot), OR a child of one of the forbidden tuples (it is easy to see this by contradiction—any tuple which is not the child of a forbidden tuple has at least one parent which has a greater total value). So we can just iterate over all the forbidden tuples (there are at most 10^5), generate all possible children (at most 10) for each one, and take the maximum.

However, that’s not how I solved it! I started thinking from the naive solution above, and wondered whether there is a way to do sortOn (Down . sum) . sequence more efficiently, by interleaving the sorting and the generation. If it can be done lazily enough, then we could just search through the beginning of the generated ordered list of tuples for the first non-forbidden one, without having to actually generate the entire list. Indeed, this reminded me very much of Richard Bird’s implementation of the Sieve of Eratosthenes (see p. 11 of that PDF). The basic idea is to make a function which takes a list of choices for a slot, and a (recursively generated) list of tuples sorted by decreasing sum, and combines each choice with every tuple, merging the results so they are still sorted. However, the key is that when combining the best possible choice for the slot with the largest tuple in the list, we can just immediately return the resulting tuple as the first (best) tuple in the output list, without needing to involve it in any merging operation. This affords just enough laziness to get the whole thing off the ground. I’m not going to explain it in more detail than that; you can study the code below if you like.

I’m quite pleased that this worked, though it’s definitely an instance of me making things more complicated than necessary.


data TC = TC { slots :: [[Choice]], banned :: [[Int]] }

tc = do
  n <- int
  TC <$> (n >< (zipWith Choice [1 ..] <$> numberOf int)) <*> numberOf (n >< int)

main = C.interact $
  runScanner tc >>> solve >>> map (show >>> C.pack) >>> C.unwords

solve :: TC -> [Int]
solve TC{..} = choices . fromJust $ find ((`S.notMember` bannedSet) . choices) bs
  where
    bannedSet = S.fromList banned
    revSlots = map reverse slots
    bs = builds revSlots

data Choice = Choice { index :: !Int, value :: !Int }

data Build = Build { strength :: !Int, choices :: [Int] }
  deriving (Eq, Show, Ord)

singletonBuild :: Choice -> Build
singletonBuild (Choice i v) = Build v [i]

mkBuild xs = Build (sum xs) xs

-- Pre: all input lists are sorted descending.
-- All possible builds, sorted in descending order of strength.
builds :: [[Choice]] -> [Build]
builds []     = []
builds (i:is) = chooseFrom i (builds is)

chooseFrom :: [Choice] -> [Build] -> [Build]
chooseFrom [] _  = []
chooseFrom xs [] = map singletonBuild xs
chooseFrom (x:xs) (b:bs) = addToBuild x b : mergeBuilds (map (addToBuild x) bs) (chooseFrom xs (b:bs))

addToBuild :: Choice -> Build -> Build
addToBuild (Choice i v) (Build s xs) = Build (v+s) (i:xs)

mergeBuilds xs [] = xs
mergeBuilds [] ys = ys
mergeBuilds (x:xs) (y:ys) = case compare (strength x) (strength y) of
  GT -> x : mergeBuilds xs (y:ys)
  _  -> y : mergeBuilds (x:xs) ys

Problems E and F

I didn’t even get to these problems during the contest; I spent too long fighting with problem C and implementing my overly complicated solution to problem D. I might attempt to solve them in Haskell too; if I do, I’ll write about them in another blog post!

Posted in competitive programming, haskell | Tagged , | 2 Comments

Automatically updated, cached views with lens

Recently I discovered a nice way to deal with records where certain fields of the record cache some expensive function of other fields, using the lens library. I very highly doubt I am the first person to ever think of this, but I don’t think I’ve seen it written down anywhere. I’d be very happy to be learn of similar approaches elsewhere.

The problem

Suppose we have some kind of record data structure, and an expensive-to-calculate function which computes some kind of “view”, or summary value, for the record. Like this:

data Record = Record
  { field1 :: A, field2 :: B, field3 :: C }

expensiveView :: A -> B -> C -> D
expensiveView = ...

(Incidentally, I went back and forth on whether to put real code or only pseudocode in this post; in the end, I decided on pseudocode. Hopefully it should be easy to apply in real situations.)

If we need to refer to the summary value often, we might like to cache the result of the expensive function in the record:

data Record = Record
  { field1 :: A, field2 :: B, field3 :: C, cachedView :: D }

expensiveView :: A -> B -> C -> D
expensiveView = ...

However, this has several drawbacks:

  1. Every time we produce a new Record value by updating one or more fields, we have to remember to also update the cached view. This is easy to miss, especially in a large codebase, and will most likely result in bugs that are very difficult to track down.

  2. Actually, it gets worse: what if we already have a large codebase that is creating updated Record values in various places? We now have to comb through the codebase looking for such places and modifying them to update the cachedExpensive field too. Then we cross our fingers and hope we didn’t miss any.

  3. Finally, there is nothing besides comments and naming conventions to prevent us from accidentally modifying the cachedExpensive field directly.

The point is that our Record type now has an associated invariant, and invariants which are not automatically enforced by the API and/or type system are Bad ™.

Lens to the rescue

If you don’t want to use lens, you can stop reading now. (Honestly, given the title, I’m not even sure why you read this far.) In my case, I was already using it heavily, and I had a lightbulb moment when I realized how I could leverage it to add a safe cached view to a data type without modifying the rest of my codebase at all!

The basic idea is this:

  1. Add a field to hold the cached value as before.
  2. Don’t use lens’s TemplateHaskell utilites to automatically derive lenses for all the fields. Instead, declare them manually, such that they automatically update the cached field on every set operation.
  3. For the field with the cached value itself, declare a Getter, not a Lens.
  4. Do not export the constructor or field projections for your data type; export only the type and the lenses.

In pseudocode, it looks something like this:

module Data.Record
  (Record, field1, field2, field3, cachedView)
  where

import Control.Lens

data Record = Record
  { _field1 :: A, _field2 :: B, _field3 :: C, _cachedView :: D }

expensiveView :: A -> B -> C -> D
expensiveView = ...

recache :: Record -> Record
recache r = r { _cachedView = expensiveView (_field1 r) (_field2 r) (_field3 r) }

cachingLens :: (Record -> a) -> (Record -> a -> Record) -> Lens' Record a
cachingLens get set = lens get (\r a -> recache $ set r a)

field1 :: Lens' Record A
field1 = cachingLens _field1 (\r x -> r { _field1 = x })

field2 :: Lens' Record B
field2 = cachingLens _field2 (\r x -> r { _field2 = x })

field3 :: Lens' Record C
field3 = cachingLens _field3 (\r x -> r { _field3 = x })

cachedView :: Getter Record D
cachedView = to _cachedView

This solves all the problems! (1) We never have to remember to update the cached field; using a lens to modify the value of another field will automatically cause the cached view to be recomputed as well. (3) We can’t accidentally set the cached field, since it only has a Getter, not a Lens. In fact, this even solves (2), the problem of having to update the rest of our codebase: if we are already using lens to access fields in the record (as I was), then the rest of the codebase doesn’t have to change at all! And if we aren’t using lens already, then the typechecker will infallibly guide us to all the places we have to fix; once our code typechecks again, we know we have caught every single access to the record in the codebase.

Variant for only a few fields

What if we have a large record, and the cached summary value only depends on a few of the fields? In that case, we can save a bit of work for ourselves by getting lens to auto-generate lenses for the other fields, and only handcraft lenses for the fields that are actually involved. Like this:

{-# LANGUAGE TemplateHaskell #-}

data Record = Record
  { _field1 :: A, _field2 :: B, _cachedView :: C, ... }

expensiveView :: A -> B -> C
expensiveView = ...

let exclude = ['_field1, '_field2, '_cachedView] in
  makeLensesWith
    (lensRules & lensField . mapped . mapped %~ \fn n ->
      if n `elem` exclude then [] else fn n)
  ''Record

field1 :: Lens' Record A
field1 = ... similar to before ...

field2 :: Lens' Record B
field2 = ...

cachedView :: Getter Record C
cachedView = to _cachedView

But what about the lens laws?

You might worry that having a lens for one field automatically update the value of another field might break the lens laws somehow, but it’s perfectly legal, as we can check.

  1. view l (set l v s) ≡ v clearly holds: setting the cachedView on the side doesn’t change the fact that we get back out whatever we put into, say, field1.
  2. set l v' (set l v s) ≡ set l v' s also clearly holds. On the left-hand side, the cached summary value will simply get overwritten in the same way that the other field does.
  3. set l (view l s) s ≡ s is actually a bit more subtle. If we view the value of field1, then set it with the same value again, how do we know the value of the overall record s doesn’t change? In particular, could we end up with a different cachedView even though field1 is the same? But in fact, in this specific scenario (putting the same value back into a field that we just read), the value of the cachedView won’t change. This depends on two facts: first, that the expensiveView is a deterministic function which always returns the same summary value for the same input record. Of course this is guaranteed by the fact that it’s a pure function. Second, we must maintain the invariant that the cachedView is always up-to-date, so that recomputing the summary value after setting a field to the same value it already had will simply produce the same summary value again, because we know the summary value was correct to begin with. And of course, maintaining this invariant is the whole point; it’s guaranteed by the way we only export the lenses (and only a Getter for the cachedView) and not the record constructor.

And that’s it! I’ve been using this approach very successfully in a current project (the same project that got me to implement Hindley-Milner with unification-fd—watch this space for an announcement soon!). If you know of similar approaches that have been written about elsewhere, or if you end up using this technique in your own project, I’d love to hear about it.

Posted in haskell | Tagged , , , | 10 Comments

Competitive programming in Haskell: Kadane’s algorithm

I will be giving a talk on competitive programming in Haskell tomorrow, September 10, at Haskell Love. You should attend! It’s free!

In my last competitive programming post, I challenged you to solve Purple Rain. We are presented with a linear sequence of cells, each colored either red or blue, and we are supposed to find the (contiguous) segment of cells with the maximal absolute difference between the number of red and blue. For example, below is shown one of the sample inputs, with the solution highlighted: the segment from cell 3 to cell 7 (the cells are numbered from 1) has four red cells compared to only one blue, for an absolute difference of three. You can verify that no other segment does better.

Transforming the problem

The obvious way to do this is to generate a list of all segments, compute the absolute difference between the number of red and blue cells for each, and take the maximum. However, that approach is doomed to exceed the time limit in any programming language: it would take O(n^3) time (O(n^2) possible segments times O(n) to sum each one), and the problem states that n can be up to 10^5. With 10^8 operations per second as a good rule of thumb, O(n^3) with n = 10^5 is clearly too slow. (In fact, any time you see an input size of 10^5, it is a dead giveaway that you are expected to find an O(n) or O(n \lg n) solution. 10^5 is big enough to make an O(n^2) solution prohibitively slow, but not so big that I/O itself becomes the bottleneck.)

The first insight is that we can transform this into the classic problem of finding the maximum sum subarray (also known as the maximum segment sum; either way I will abbreviate it as MSS) in two steps: first, turn each red cell into a 1, and each blue into -1. The sum of a segment then tells us how many more red than blue cells there are. Now, we actually want the biggest absolute difference between red and blue; but if we have an algorithm to find the MSS we can just run it twice: once to find the maximum excess of red over blue, and again with 1 and -1 flipped to find the maximum excess of blue over red.

The MSS problem has a long history in the functional programming community, being one of the flagship problems to demonstrate the techniques of program derivation in the style of the Bird-Meertens Formalism, aka Squiggol and The Algebra of Programming. It is possible to start out with a naive-but-obviously-correct O(n^3) implementation, and do a series of equational transformations to turn it into an efficient O(n) algorithm! If you’ve never seen that kind of thing before, I highly recommend checking it out; the Wikipedia page on the Bird-Meertens Formalism, linked above, is a good place to start. Certainly getting good at such derivations can be a handy skill when doing competitive programming in Haskell. But in any case, today I want to approach the problem from a different point of view, namely, coming up with a good functional equivalent to an existing imperative algorithm.

Kadane’s algorithm

Kadane’s algorithm, first proposed by Jay Kadane sometime in the late 1970s, is a linear-time algorithm for solving the MSS problem. It is actually quite simple to implement (the tricky part is understanding why it works!).

The idea is to loop through an array while keeping track of two things: a current value cur, and a best value. The best value is just the greatest value cur has ever taken on, so keeping it updated is easy: on every loop, we compare cur to best, and save the value of cur into best if it is higher. To keep cur updated, we simply add each new array value to it—but if it ever falls below zero, we just reset cur to zero. Here is some Java code:

public static int kadane(int[] a) {
    int best = 0, cur = 0;
    for (int i = 0; i < a.length; i++) {
        cur += a[i];
        if (cur < 0) cur = 0;
        if (cur > best) best = cur;
    }
   return best;
}

Again, it is not at all obvious why this works, though putting in the effort to understand a proof is well worth the time. That is not the purpose of this blog post, however, so I’ll leave you to read about it on your own!

Translating Kadane’s algorithm to Haskell

In the imperative version, we iterate through a list, keep track of a current value, and also keep track of the best value we have seen so far. It is possible to translate this directly to Haskell: create a record with two fields, one for the current thing and one for the best thing, then iterate through the list with foldl', doing the appropriate update at each step:

data State s = State { curThing :: s, bestThing :: s }

-- Given a way to update the current s value with the next list
-- element of type a, update a State s value which keeps track of the
-- current s value as well as the best s value seen so far.
step :: Ord s => (s -> a -> s) -> (State s -> a -> State s)
step update (State cur best) a = State next (max best next)
  where
    next = update cur a

bestIntermediate :: Ord s => (s -> a -> s) -> s -> [a] -> s
bestIntermediate update init = bestThing . foldl' (step update) (State init init)

But there’s a much better way! Note that the update function has the right type to be used with foldl'. But if we just computed foldl' update init directly, we would get only the single s value at the very end. But our goal is to get the best out of all the intermediate values. No problem: a scan is just a fold that returns all the intermediate values instead of only the final one! So instead of all this complicated and quasi-imperative State stuff, we just do a scanl' followed by taking the maximum:

bestIntermediate :: Ord s => (s -> a -> s) -> s -> [a] -> s
bestIntermediate update init = maximum . scanl' update init

Ah, much better! Using bestIntermediate, we can now translate Kadane’s algorithm as follows:

kadane1 :: [Int] -> Int
kadane1 = bestIntermediate next 0
  where
    next s a = max 0 (s + a)

Whenever I write down an algorithm like this in Haskell—especially if I have “translated” it from an existing, imperative algorithm—I like to figure out how I can generalize it as much as possible. What structure is assumed of the inputs that makes the algorithm work? Can we replace some concrete monomorphic types with polymorphic ones? What type class constraints are needed? Often these sorts of generalizations make for supposedly tricky competitive programming problems. For example, if you have only ever seen Dijkstra’s algorithm presented in terms of finding shortest paths by summing edge weights, it takes quite a bit of insight to realize that the algorithm really just finds the “best” path with respect to operations that play nicely with each other in certain ways. For example, if we use the operations max and min in place of min and (+), Dijkstra’s algorithm finds the path with the maximum bottleneck. (This will probably end up being its own blog post at some point…)

Anyway, how can we generalize kadane1? The obvious starting point is that if we just let GHC infer a type for kadane1, we would get something more general:

kadane2 :: (Num a, Ord a) => [a] -> a
kadane2 = bestIntermediate next 0
  where
    next s a = max 0 (s + a)

The only thing we do with the input list elements is add them and compare them; we also need 0 to have the same type as the input list elements. So the algorithm works for anything that has Ord and Num instances.

But wait—do we really need Num if all we are using is 0 and +? Really we are using the monoid of integers under addition. So we can generalize again, to any ordered monoid:

kadane :: (Monoid a, Ord a) => [a] -> a
kadane = bestIntermediate next mempty
  where
    next s a = max mempty (s <> a)

In fact, if you study the proof of Kadane’s algorithm, you will see that this works just so long as the monoid operation interacts nicely with the ordering, that is, if x < y implies x \diamond z < y \diamond z and z \diamond x < z \diamond y for all z (this is what is usually meant by an “ordered monoid”).

Finding the best segment

So far, our code finds the best segment sum, but it doesn’t tell us which segment it was that was best—and for this problem we are actually supposed to output the starting and ending indices of the best segment, not the maximum red-blue difference itself.

If I were doing this in Java, I would probably just add several more variables: one to record where the segment currently being considered starts (which gets reset to i+1 when cur is reset to 0), and two to record the start and end indices of the best segment so far. This gets kind of ugly. Conceptually, the values actually belong in triples representing a segment: the start and end index together with the sum. In Java, it would be too heavyweight to construct a class to store these three values together, so in practice I would just do it with a mess of individual variables as described above. Fortunately, in Haskell, this is very lightweight, and we should of course create a data type to represent a segment.

It’s also worth noting that in Haskell, we were naturally led to make a polymorphic bestIntermediate function, which will work just as well with a segment type as it does Int. Only our kadane function itself will have to change. We will make a data type to represent segments, with an appropriate Ord instance to specify when one segment is better than another, and we will update the next helper function to update a segment instead of just updating a sum.

The solution

So let’s get started! First, some LANGUAGE pragmas and imports we will need, and a basic solution framework.

{-# LANGUAGE DerivingStrategies         #-}
{-# LANGUAGE DerivingVia                #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE LambdaCase                 #-}
{-# LANGUAGE ViewPatterns               #-}

import           Control.Arrow         ((>>>))
import           Data.ByteString.Char8 (ByteString)
import qualified Data.ByteString.Char8 as C
import           Data.List             (scanl')
import           Data.Semigroup

showB :: Show a => a -> C.ByteString
showB = show >>> C.pack

main = C.interact $ solve >>> format

Next, a data type to represent a segment, along with an Ord instance, and a function to format the answer for this problem. Note that we make all the fields of Segment strict (this makes a big difference in runtime—you should pretty much always use strict fields unless there is a really good reason not to). Notice also that the Ord instance is where we encode the problem’s specific instructions about how to break ties: “If there are multiple possible answers, print the one that has the Westernmost (smallest-numbered) starting section. If there are multiple answers with the same Westernmost starting section, print the one with the Westernmost ending section.” So we first compare Segments by sum, then by left index, then by right index, being careful to reverse the order of comparison for the indices, since better for indices means smaller.

-- The segment [l,r) (i.e. l inclusive, r exclusive), with its sum
data Segment a = I { l :: !Int, r :: !Int, s :: !a } deriving (Eq, Show)

instance Ord a => Ord (Segment a) where
  compare (I l1 r1 s1) (I l2 r2 s2)
    = compare s1 s2 <> compare l2 l1 <> compare r2 r1

format :: Segment a -> ByteString
format (I l r _) = C.unwords [showB l, showB (r-1)]
  -- (r-1) since r is exclusive but we are supposed to show
  -- the index of the last element

And now for Kadane’s algorithm itself. The bestIntermediate function is unchanged. kadane changes to start with an appropriate “empty” segment, and to use a next function that updates the current segment [l,r) based on the next list element a (the one at index r+1). The right index always gets incremented to r+1. If the current sum combined with a is “negative” (that is, less than mempty), we reset the left index to r+1 as well (making the segment empty), and the running sum to mempty. Otherwise, we leave the left index unchanged and add a to the running sum. I also added an argument to indicate the index of the first list element, since in this problem the list is supposed to be indexed from 1, but future problems might be indexed from 0, and I would never remember which it is. (I suppose we could also take a list of pairs (i,a) where i is the index of a. That would even work for non-consecutive indices.)

bestIntermediate :: Ord s => (s -> a -> s) -> s -> [a] -> s
bestIntermediate update init = maximum . scanl' update init

kadane :: (Monoid a, Ord a) => Int -> [a] -> Segment a
kadane ix = bestIntermediate next (I ix ix mempty)
  where
    next (I l r s) a
      | s<>a < mempty = I (r+1) (r+1) mempty
      | otherwise     = I l     (r+1) (s<>a)

Finally, we can write the solve function itself. We map R’s and B’s in the input to 1 and -1, respectively, to generate a list of 1’s and -1’s called mountain. Then we call kadane on mountain and on map negate mountain, and pick whichever gives us a better answer. But wait, not quite! Remember that kadane needs a Monoid instance for the list elements, and Int has none. So we can whip up a quick newtype, Step, that has the right instances (note how deriving Num also allows us to use the literal values 1 and -1 as Step values). DerivingVia is quite handy in situations like this.

solve :: ByteString -> Segment Step
solve (C.init -> b) = max (kadane 1 mountain) (kadane 1 (map negate mountain))
  where
    mountain = map (\case { 'R' -> 1; 'B' -> -1 }) (C.unpack b)

newtype Step = Step Int
  deriving (Semigroup, Monoid) via Sum Int
  deriving (Eq, Ord, Num)

Next time: Modulo Solitaire and fast BFS

For next time, I invite you to solve Modulo Solitaire. Warning: this is a straightforward BFS problem; the issue is getting a Haskell solution to run fast enough! I struggled with this for quite some time before coming up with something that worked. My ultimate goal in this case is to develop a polished library with an elegant, functional API for doing BFS, but that runs fast under the hood. I’m very curious to see how others might approach the problem.

Posted in competitive programming, haskell | Tagged , , , , | 1 Comment