Competitive programming in Haskell: Infinite 2D array, Levels 2 and 3

In a previous post, I challenged you to solve Infinite 2D Array using Haskell. As a reminder, the problem specifies a two-parameter recurrence F_{x,y}, given by

  • F_{0,0} = 0
  • F_{0,1} = F_{1,0} = 1
  • F_{i,0} = F_{i-1,0} + F_{i-2,0} for i \geq 2
  • F_{0,i} = F_{0,i-1} + F_{0,i-2} for i \geq 2
  • F_{i,j} = F_{i-1,j} + F_{i,j-1} for i,j \geq 1.

Last time, we derived a formula for F_{x,y} that involves only a linear number of terms:

\displaystyle F_{x,y} = \left(\sum_{1 \leq k \leq x} F_k \binom{x-k+y-1}{x-k}\right) + \left(\sum_{1 \leq k \leq y} F_k \binom{y-k+x-1}{y-k}\right) \pmod{P}

While the number of terms may be linear, it can still be on the order of a million terms, so computing each term is going to have to be pretty quick in order to fit the whole thing within the one second time limit.

Fibonacci numbers modulo a prime

Computing Fibonacci numbers modulo a prime is not hard, especially since we want all the Fibonacci numbers from 1 up to \max(x,y): just compute each one by adding the previous two modulo P. We could also precompute a table of Fibonacci numbers mod P this way. And any of the fast methods for computing individual Fibonacci numbers (for example, using 2×2 matrix exponentiation) also work just fine if you reduce everything modulo P at each step, since they only involve addition, subtraction, and multiplication.

Binomial coefficients modulo a prime

What about binomial coefficients? Since n and k are not too large, and in particular since they will both be smaller than P, we can use the usual formula:

\displaystyle \binom n k = \frac{n!}{k!(n-k)!}

(If n and k could be much larger, or if they could be larger than P, we would have to use something like Lucas’s Theorem or other techniques; that might make for another interesting blog post sometime.) But how do we handle division in modular arithmtic? Since we’re working modulo a prime, every value a other than zero must have a modular inverse, that is, a value a^{-1} such that a \cdot a^{-1} \equiv 1 \pmod p (this is a corollary of Bézout’s Theorem). To compute the modular inverse for a given a, we have a couple options. One simple way is to use Fermat’s Little Theorem: if a is not divisible by a prime p, then a^{p-2} \cdot a = a^{p-1} \equiv 1 \pmod p, hence a^{p-2} is the modular inverse of a modulo p, and we can compute it efficiently using repeated squaring modulo p. Another option is to use the extended Euclidean algorithm to find the x and y (guaranteed to exist by Bézout’s Theorem) such that ax + py = 1; then x is the inverse of a modulo p.

Both of these methods take O(\lg p) time. In my experience, computing the p-2 power is easier to code (especially in Haskell where we get exponentiation by repeated squaring for free!), but using the extended Euclidean algorithm can be a bit faster when it’s well-optimized. (Note the extended Euclidean algorithm can be faster when a is small, but raising to the p-2 power always takes the same number of steps no matter what a is.)

Factorials modulo a prime

Since we’re going to be repeatedly using the same factorials, one thing we absolutely must do is precompute a table of factorials mod P, from 0 up to some maximum. In this case, since our formula involves things like \binom {x-k+y-1}{x-k}, we may need factorials up to x + y, so a table of size 2 \times 10^6 will do (x and y can be up to 10^6).

We could also precompute a table of modular inverses of factorials; to compute the inverse of k!, we just find the inverse of each k and multiply it by the (previously computed) inverse of (k-1)!. (Or we could just invert the value for k! stored in the other table.) Making a table of inverse factorials like this turns out not to help too much for this particular problem, but it can be an important optimization in some cases.

The end?

So we can compute each additional Fibonacci number in O(1); we can also now compute binomial coefficients modulo P in O(\lg P), with a few O(1) table lookups for factorials and an O(\lg P) inversion operation. (Again, we could achieve O(1) if we also stored a table of inverse factorials, but for this problem it seems the additional time needed to construct the table in the first place outweighs the time saved computing binomial coefficients.) In theory, we have everything we need to solve this problem efficiently.

However, for this problem, constant factors matter! There’s still quite a bit of nontrivial work I had to do to get my code fast enough. In my next and final post on this problem, we’ll walk through a few different ideas for implementing this concretely in Haskell.

Posted in competitive programming, haskell | Tagged , , | Leave a comment

Binary search over floating point representations

I got some good feedback on my last post about binary search, and thought it was worth a follow-up post.

An important fix

First things first: commenter Globules pointed out that doing (l+r) `div` 2 can overflow; my initial, glib answer was “sure, you have to be careful when doing arithmetic with fixed-sized integers”, but then I realized that it was easily fixable in this case! So I replaced that formula with l + (r-l) `div` 2 which won’t overflow, even when searching for values near the top of the range of a fixed-sized integer type.

An entirely frivolous distraction that I spent way too much time on

With that out of the way, let’s get to the more interesting discussion. Several commenters took issue with my use of unsafeCoerce to convert Double into Word64 and do binary search over bit representations for floating-point numbers, and they raised some good points:

  1. Even if, in the context of competitive programming, we’re not particularly concerned with maintainability or safety, there’s no guarantee that unsafeCoerce will behave the same on our local machine as on the judging machine! For example, the endianness might be different.
  2. Worse, it just plain doesn’t work: my worry about the bit representation of negative numbers looking bigger than that of positive numbers, because of the sign bit, was actually a very valid worry. It just so happens to work when searching for positive numbers, but when searching for negative numbers it goes into an infinite loop!

In my defense, I got this idea from Jules Jacobs’s original article… but looking at it again, there is a key footnote that I hadn’t noticed before:

I’m assuming that f2b respects ordering, that is, comparing f2b(x) < f2b(y) gives the same result as comparing the floats x < y. Depending on the bit representation of floats, one would have to shuffle the mantissa and exponent and sign bits around to ensure this.

Oops, of course! It turns out there are a lot of things about the IEEE-754 standard which make this work nicely for positive numbers: the exponent is stored first, with a bias so we don’t have to deal with signed exponents, and the mantissa is always understood to have a leading 1 bit which is not stored. For positive floating-point numbers x and y, it’s already the case that x < y if and only if their IEEE-754 representations, considered as unsigned integers, are in the same order! This is very clever, and it seems it was done this way on purpose, so that hardware comparison of floating-point numbers can be fast. And this is what made my example in the previous post work at all.

However, for negative numbers this doesn’t quite work. First of all, the high-order bit is the sign bit, so negative numbers all appear larger when interpreted as unsigned integers. Interpreting them as signed integers doesn’t work either, because they are just stored as a sign bit and a magnitude, as opposed to signed integers which are typically stored using 2’s complement, so negative floating point numbers are “backwards” compared to the interpretation of their bit pattern as (signed or unsigned) integers. But this is not hard to fix; commenter babel linked to a reference explaining exactly how to do the required bit-twiddling. Essentially, we always flip the sign bit, and flip all the other bits too if the sign bit was set.

So I could have just done this bit-twiddling on the result of unsafeCoerce. However, goaded by some other commenters, I wanted to try using encodeFloat/decodeFloat instead of unsafeCoerce to make it a little more platform-independent. I ended up spending many hours on this. I fixed about 17 gazillion bugs and gave up several times in frustration, only to realize another bug later and come back to it. In the end, though, I got it to work! You can see my f2b :: Double -> Word64 and b2f :: Word64 -> Double functions here. I do make some assumptions about the size of Double values, so it’s not completely platform-independent, but at least it should be independent of the endianness.

How do I know it’s correct? Well, I can’t be 100% sure, but I verified each of the following properties by running QuickCheck on a million random inputs (and I used these properties to find lots of bugs!):

  • f2b is monotonic: for all x :: Double, x < y if and only if f2b x < f2b y.
  • b2f is left inverse to f2b: for all x :: Double, b2f (f2b x) == x.
  • b2f is almost a right inverse to f2b; this direction is made more complicated by the fact that some Word64 values correspond to Infinity or NaN. Also, there are some Word64 values that correspond to really tiny floating-point values on the very edge of what is representable, where f2b (b2f w) is one more or less than the original w.
  • It’s actually not enough that x < y implies f2b x < f2b y; we also need the fact that the midpoint between f2b x and f2b y will correspond to a floating-point number between x and y. I was worried about this for a while, until I finally understood the fact that the mantissa is always assumed to have a leading 1 which is not stored. That makes everything work out nicely, and I checked this property with QuickCheck as well.

So, let’s see it in action! We can search for negative values now, or values that don’t exist, etc.

λ> search floating (> (-3.2934)) (-100) 100
λ> search floating (\x -> x**7 >= 1e34) (-1e100) (1e100)
λ> search floating (\x -> x**2 >= 150) 0 100
λ> search floating (\x -> x**2 >= (-150)) (-1e308) (1e308)

So, was it worth it? From a competitive programming point of view, probably not! I can think of one or two times I’ve really struggled with precision issues where this might have helped. But 99.9% of the time you can just use a normal binary search on floating-point values until you get within the required tolerance. Overall, though, despite the extreme frustration, this was a fun detour through some things I didn’t understand very well before. I now know a lot more about IEEE-754 encoding and Haskell’s support for floating-point values!

Posted in haskell | Tagged , , , | Leave a comment

Competitive programming in Haskell: better binary search

Binary search is a workhorse of competitive programming. There are occasional easy problems where binary search is the solution in and of itself; more often, it’s used as a primitive building block of more complex algorithms. It is often presented as a way to find the index of something in a sorted array in O(\lg n) time, and many languages have such a thing in their standard library (for example, see Arrays.binarySearch in Java, the bisect library in Python, or the binary_search function in C++). However, the idea of binary search is more general than searching in a sorted array; we’re doing binary search any time we repeatedly halve a search interval. For example, we can use it to find the smallest or largest number with a given property, or to find an optimal, “just right” measurement that is neither too small nor too big.

Generic binary search with first-class functions, take 1

A language with easy access to first-class functions provides a great opportunity to generalize binary search properly. For example, here’s a version of binary search that has lived in my competitive programming solution template for a long time:

-- Discrete binary search.  Find the smallest integer in [lo,hi] such
-- that monotone predicate p holds.
binarySearchD :: Int -> Int -> (Int -> Bool) -> Int
binarySearchD lo hi p
  | lo == hi = lo
  | p mid     = binarySearchD lo mid p
  | otherwise = binarySearchD (mid+1) hi p
    mid = (lo + hi) `div` 2

The key generalization is that it takes a predicate of type Int -> Bool as an argument. Note that in order for binary search to work, the predicate p must be monotonic. This means, intuitively, that p starts out False, and once it switches to True it never goes back. (Formally, p being monotonic means that for all x and y, if x <= y then p x <= p y, where False <= True). This is how we can tell if we’re “too low” or “too high”: we’re “too low” when p is False and “too high” when it is True.

This is definitely an improvement over array-specific versions of binary search. We can still use it to search in an array by providing a predicate that does an array lookup, but we can use it for other things as well.

I should note at this point that there is a very nice binary-search package published on Hackage, which I definitely recommend if you need binary search in some kind of project. However, for the purposes of competitive programming, we can’t rely on that package being available, and we’d also like something a bit simpler, so we don’t have to read the documentation every time we want to use it.

…can we do better?

So my binarySearchD function works fine as far as it goes, and I have used it regularly, but there are still several things about it that always annoyed me:

  • What if we want a slight variation, such as the largest integer such that something holds? Or the last integer where the predicate doesn’t hold? etc.? It is possible to use binarySearchD in these situations, but I find it tricky and error-prone to figure out how. And when I’m reaching for some function as a building block of a bigger algorithm I definitely don’t want to waste time and brainpower having to think carefully about small details like this.

  • Getting the implementation right in the first place was kind of tricky. Should we use mid+1? mid-1? Should we think in terms of a closed interval [lo,hi], or a half-open interval [lo,hi), or…? How can we convince ourselves our implementation is completely correct, and won’t get stuck in infinite recursion?

  • What if we want to do binary search over a continuous domain, like Double? We have to make a completely separate function, for example, like this:

-- Continuous binary search.  Given a tolerance eps, an interval
-- [a,b], a continuous, monotonically increasing function f, and a
-- target value tgt, find c ∈ [a,b] such that f(c) = tgt.
binarySearch :: (Fractional t, Ord t, Ord a) => t -> t -> t -> (t -> a) -> a -> t
binarySearch eps a b f tgt = go a b
    go lo hi
      | hi-lo < eps = mid
      | f mid < tgt = go mid hi
      | otherwise   = go lo mid
        mid = (lo + hi)/2

(Actually, I’m not sure why I wrote that version in terms of finding a “target” value. In practice I suppose continuous binary search often comes up that way, but looking at it now it seems less general. In any case, we’re going to throw this function away very shortly so it doesn’t really matter!)

Recently I came across a lovely article, Binary Search a Little Simpler & More Generic by Jules Jacobs. Jules explains a really elegant API for binary search that is so much better than anything I’d seen before, and solves all the above issues! I immediately went to implement it in Haskell, and I want to share it with you. As I’ve reflected on Jules’s presentation, I have identified three key ideas:

  1. Rather than looking for some index with a certain property, we’re really looking for the place where p switches from False to True. That actually happens in between two indices… so let’s return the pair of indices bracketing the change, rather than just a single index! This means we get both the last index that does not have property p and the first one that does, and we can use whichever one we want.

    This is a simple change, but in my experience, it helps a lot to reduce the cognitive load. Previously, if I wanted something like “the last index that does not have property p” I’d have to think hard about what the index I get out of the search represents, and figure out that I needed to subtract one. Now I only have to think “OK, I want the thing right before the predicate changes from False to True, so I can project it out with fst”.

  2. The second important idea is that we’re going to insist that p switches from False to True, not at most once, but exactly once. (If necessary, we can add special “virtual” -∞ and/or +∞ indices such that p (-∞) = False and p (+∞) = True.) Then as we narrow down our current search interval [l, r], we will maintain the invariant that p l = False and p r = True.

    This invariant makes everything so much cleaner, and it also ties in with the first important idea of returning a pair instead of a single index. Previously I always thought of binary search in terms of searching for a specific index, but that makes the semantics of the interval tricky. For example, do we maintain the invariant that the index we’re looking for is somewhere in the closed interval [l,r]? Somewhere in the half-open interval [l,r)? …? But I find it so much more elegant and natural to say “l always stays in the False part, and r always stays in the True part, and we just slide them closer until we find the exact dividing line between False and True.”

    I will note that there are a couple tradeoffs: first of all, our search function of course takes starting values for l and r as inputs, and it will now have as a prerequisite that p l = False and p r = True, so we have to think a little harder when calling it. We also have to work a little harder to figure out when e.g. a value we’re looking for was not found at all. Typically, if we use some sort of initial special +∞ value for r, if the returned r value is still +∞ it means nothing at all was found that made the predicate True.

  3. The final important idea is to abstract out a function mid to compute a potential next index to look at, given the current interval. We’ll insist that when mid l r returns a value, it must be strictly in between l and r (there’s no point in returning l or r because we already know p l = False and p r = True), and we’ll stop when it returns Nothing. This lets us cleanly separate out the logic of the recursion and keeping track of the current search interval from the details of the arithmetic needed for each step. In particular, it will allow us to unify binary search over both integral and floating-point domains.

Here’s the final form of our search function. Unlike, say, binarySearchD, it pretty much writes itself at this point:

search :: (a -> a -> Maybe a) -> (a -> Bool) -> a -> a -> (a,a)
search mid p = go
    go l r = case mid l r of
      Nothing -> (l,r)
      Just m
        | p m       -> go l m
        | otherwise -> go m r

We check our mid function to tell us what to look at next. If it returns Nothing, we stop and return the pair of the current (l,r). If it returns a “midpoint” m then we test the predicate on m and recurse. No tricky +1’s or -1’s to think about; given our invariant regarding l and r, it’s obvious which one we should replace with m depending on the outcome of the predicate, and we can’t get stuck in an infinite loop since m is always strictly between l and r.

(As an aside, I love that this is polymorphic in a with no class constraints! That’s another hint that this is really quite general. The class constraints will come with particular mid functions.)

So what about those mid functions? Here’s one for doing binary search over integers:

binary :: Integral a => a -> a -> Maybe a
binary l r
  | r - l > 1 = Just ((l+r) `div` 2)
  | otherwise = Nothing

Pretty straightforward! We stop when l and r are exactly one apart; otherwise we return their midpoint (you should convince yourself that (l+r) `div` 2 is always strictly in between l and r when r - l > 1).

For example, we can use this to take an integer square root:

λ> search binary (\x -> x^2 >= 150) 0 100

This tells us that 12 is the biggest integer whose square is less than 150, and 13 is the smallest integer whose square is greater.

But we needn’t limit ourselves to integers; as hinted previously, we can also do binary search over Fractional domains:

continuous :: (Fractional a, Ord a) => a -> a -> a -> Maybe a
continuous eps l r
  | r - l > eps = Just ((l+r) / 2)
  | otherwise = Nothing

Given an eps value, we stop when r - l <= eps, and otherwise return their midpoint. For example, we can use this to find the square root of 150 to 6 decimal places:

λ> search (continuous 1e-6) (\x -> x^2 >= 150) 0 100

We can even write some functions to do linear search! Why might we want to do that, you ask? Well, with some care, these can be used even with non-monotonic predicates, to find the first or last place the predicate switches from False to True (though using something like find or findIndex is typically easier than using search fwd).

fwd :: (Num a, Ord a) => a -> a -> Maybe a
fwd l r
  | r - l > 1 = Just (l+1)
  | otherwise = Nothing

bwd :: (Num a, Ord a) => a -> a -> Maybe a
bwd l r
  | r - l > 1 = Just (r-1)
  | otherwise = Nothing

I don’t have any great examples of using these off the top of my head, but we might as well include them.

[WARNING: this section about binary search on bit representations of floating-point numbers is completely wrong, but I’m leaving it here for context. See the discussion in the comments to this post and the follow-up post!]

But there’s more: we can also do exact binary search on the bit representations of floating-point numbers! That is, we do binary search as if the bit representations of l and r were unsigned integers. This is possibly more efficient than “continuous” binary search, and lets us find the two precisely adjacent floating-point numbers where our predicate switches from False to True.

binaryFloat :: Double -> Double -> Maybe Double
binaryFloat l r = decode <$> binary (encode l) (encode r)
    encode :: Double -> Word64
    encode = unsafeCoerce

    decode :: Word64 -> Double
    decode = unsafeCoerce

For example, we can find the closest possible floating-point approximation to the square root of 150:

λ> search binaryFloat (\x -> x^2 >= 150) 0 100
λ> sqrt 150

This honestly seems like black magic to me, and I don’t know enough about floating-point representation to have a good idea of how this works and what the caveats might be, but it’s worked for all the examples I’ve tried. It even works when l is negative and r is positive (it seems like in that case the bit representation of l would correspond to a larger unsigned integer than r, but somehow it all works anyway!).

λ> search binaryFloat (\x -> x^2 >= 150) (-100) 100


I’ve added the code from this post to my comprog-hs repository on GitHub. The source for this blog post itself is available on


And here are some problems for you to solve! I’ll discuss some of them in an upcoming post.

Posted in competitive programming, haskell | Tagged , , | 22 Comments

Swarm virtual hackathon

This Wednesday, December 14, we will have the first (annual? monthly? probably somewhere in between…) Swarm swarm, i.e. collaborative virtual hackathon. Details can be found here on the Swarm wiki.

As a reminder, Swarm is a 2D, open-world programming and resource gathering game, implemented in Haskell, with a strongly-typed, functional programming language and a unique upgrade system. Unlocking language features is tied to collecting resources, making it an interesting challenge to bootstrap your way into the use of the full language.

Posted in haskell, projects | Tagged , , , , | Leave a comment

Competitive programming in Haskell: Infinite 2D array, Level 1

In my previous post, I challenged you to solve Infinite 2D Array using Haskell. As a reminder, the problem specifies a two-parameter recurrence F_{x,y}, given by

  • F_{0,0} = 0
  • F_{0,1} = F_{1,0} = 1
  • F_{i,0} = F_{i-1,0} + F_{i-2,0} for i \geq 2
  • F_{0,i} = F_{0,i-1} + F_{0,i-2} for i \geq 2
  • F_{i,j} = F_{i-1,j} + F_{i,j-1} for i,j \geq 1.

We are given particular values of x and y, and asked to compute F_{x,y} \bmod (10^9 + 7). The problem is that x and y could be as large as 10^6, so simply computing the entire x \times y array is completely out of the question: it would take almost 4 terabytes of memory to store a 10^6 \times 10^6 array of 32-bit integer values. In this post, I’ll answer the Level 1 challenge: coming up with a general formula for F_{x,y}.

We need to be more clever about computing a given F_{x,y} without computing every entry in the entire 2D array, so we look for some patterns. It’s pretty obvious that the array has Fibonacci numbers along both the top two rows and the first two columns, though it’s sadly just as obvious that we don’t get Fibonacci numbers anywhere else. The last rule, the rule that determines the interior entries, says that each interior cell is the sum of the cell above it and the cell to the left. This looks a lot like the rule for generating Pascal’s triangle, i.e. binomial coefficients; in fact, if the first row and column were specified to be all 1’s instead of Fibonacci numbers, then we would get exactly binomial coefficients.

I knew that binomial coefficients can also be thought of as counting the number of paths from one point in a grid to another which can only take east or south steps, and this finally gave me the right insight. Each interior cell is a sum of other cells, which are themselves sums of other cells, and so on until we get to the edges, and so ultimately each interior cell can be thought of as a sum of a bunch of copies of numbers on the edges, i.e. Fibonacci numbers. How many copies? Well, the number of times each Fibonacci number on an edge contributes to a particular interior cell is equal to the number of paths from the Fibonacci number to the interior cell (with the restriction that the paths’ first step must immediately be into the interior of the grid, instead of taking a step along the first row or column). For example, consider F_{3,2} = 11. The two 1’s along the top row contribute 3 times and 1 time, respectively, whereas the 1’s and 2 along the first column contribute 3 times, 2 times, and once, respectively, for a total of 11:

The number of paths from F_{0,k} to F_{x,y} is the number of grid paths from (1,k) to (x,y), which is \binom{(x-1) + (y-k)}{y-k}. Likewise the number of paths from F_{k,0} to F_{x,y} is \binom{(x-k) + (y-1)}{x-k}. All together, this yields the formula

\displaystyle F_{x,y} = \left(\sum_{1 \leq k \leq x} F_k \binom{x-k+y-1}{x-k}\right) + \left(\sum_{1 \leq k \leq y} F_k \binom{y-k+x-1}{y-k}\right) \pmod{P}

Commenter Soumik Sarkar found a different formula,

\displaystyle F_{x,y} = F_{x+2y} + \sum_{1 \leq k \leq y} (F_k - F_{2k}) \binom{y-k+x-1}{y-k} \pmod{P}

which clearly has some similarity to mine, but I have not been able to figure out how to derive it, and Soumik did not explain how they found it. Any insights welcome!

In any case, both of these formulas involve a sum of only O(x+y) terms, instead of O(xy), although the individual terms are going to be much more work to compute. The question now becomes how to efficiently compute Fibonacci numbers and binomial coefficients modulo a prime. I’ll talk about that in the next post!

Posted in competitive programming, haskell | Tagged , , | 1 Comment

Swarm alpha release!

The Swarm development team is very proud to announce the very first alpha release of the game. There are still many missing features (for example, saving games is not yet possible) and known bugs, but at this point it’s quite playable (and, dare we say, fun!) and ready for some intrepid souls to try it out and give us some feedback.

What is it?

Swarm is a 2D, open-world programming and resource gathering game with a strongly-typed, functional programming language and a unique upgrade system. Unlocking language features is tied to collecting resources, making it an interesting challenge to bootstrap your way into the use of the full language.

Notable changes since the last progress update include:

  • An all-new in-game tutorial consisting of a sequence of guided challenges
  • Several new challenge scenarios (mazes! towers of hanoi!), and documentation on how to make your own
  • Lots more in-game help and info, including help on currently available commands + recipes, and a dialog showing all live robots
  • Many more entities, recipes, and language features to explore and collect
  • Better mouse support
  • Backwards incremental search and tab completion in the REPL
  • Many, many small bug fixes and improvements!

Give it a try!

To install, check out the installation instructions: you can download a binary release (for now, Linux only, but MacOS binaries should be on the horizon), or install from Hackage. Give it a try and send us your feedback, either via a github issue or via IRC!

Future plans & getting involved

We’re still hard at work on the game, and will next turn our attention to some big features, such as:

Of course, there are also tons of small things that need fixing and polishing too! If you’re interested in getting involved, check out our contribution guide, come join us on IRC (#swarm on Libera.Chat), or take a look at the list of issues marked “low-hanging fruit”.

Brought to you by the Swarm development team:

  • Brent Yorgey
  • Ondřej Šebek
  • Tristan de Cacqueray

With contributions from:

  • Alexander Block
  • Daniel Díaz Carrete
  • Huw Campbell
  • Ishan Bhanuka
  • Jacob
  • Jens Petersen
  • José Rafael Vieira
  • Joshua Price
  • lsmor
  • Noah Yorgey
  • Norbert Dzikowski
  • Paul Brauner
  • Ryan Yates
  • Sam Tay

…not to mention many others who gave valuable suggestions and feedback. Want to see your name listed here in the next release? See how you can contribute!

Posted in haskell, projects | Tagged , , , , | Leave a comment

Competitive programming in Haskell: Infinite 2D array

If you like number theory, combinatorics, and/or optimizing Haskell code, I challenge you to solve Infinite 2D Array using Haskell.

  • Level 1: can you come up with a general formula to compute F_{x,y}?
  • Level 2: In general, how can you efficiently compute F_{x,y} \pmod {10^9 + 7}?
  • Level 3: Now implement the above ideas in Haskell so your solution actually fits within the 1 second time limit.

I have solved it but it was definitely challenging. In a subsequent blog post I’ll talk about my solution and ask for other optimization ideas.

Posted in competitive programming, haskell | Tagged , , | 7 Comments

Types for top-level definitions

I’ve come up with idea for a type system for first-class (global) definitions, which can serve as a very lightweight alternative to a proper module system. I’m posting it here in the hopes of getting some feedback and pointers to related work.

Commands and expressions

The programming language of Swarm (for lack of a better term I will hereafter refer to it as Swarmlang) has a bunch of imperative commands, and standard monadic sequencing constructs. For example,

move; move

does two move commands in sequence, and

thing <- grab; give friend thing

first executes grab, binding the variable thing to the result, then executes give friend thing. Of course, there is also a rich language of pure expressions, with things like arithmetic, strings, lambdas and function application, pairs, sums, and so on.

Some languages make a syntactic distinction between statements and expressions, but Swarmlang does not: everything is an expression, and some expressions happen to have a command type. If t is a type, then cmd t is the type of an imperative command which, when executed, can have some effects and then returns a result of type t. (Of course this should feel very familiar to Haskell programmers; cmd has many similarities to IO.) This approach makes many things simpler and means that commands are first-class values.

Typechecking definitions

Swarmlang has definitions, which are just expressions with a command type. If e is an expression, then

def x = e end

has type cmd (). When executed, it should have the effect of binding the name x to the expression e, and bringing x into scope for all subsequent commands. Thus, it is valid to sequence this first definition with a second definition that mentions x, like so:

def x = e end;
def y = foo bar x end

Of course, this means that while typechecking the definition of y, we must be able to look up the type of x. However, the type of the first def command is simply cmd (), which does not tell us anything about x or its type. Normally, the typing rule for sequencing of commands would be something like

\displaystyle \frac{\Gamma \vdash c_1 : \mathrm{cmd}\; \tau_1 \qquad \Gamma \vdash c_2 : \mathrm{cmd}\; \tau_2}{\Gamma \vdash c_1 ; c_2 : \mathrm{cmd}\;\tau_2}

but this does not work for def commands, since it does not take into account the new names brought into scope. Up until now, I have dealt with this in a somewhat ad-hoc manner, with some special typechecking rules for def and some ad-hoc restrictions to ensure that def can only syntactically show up at the top level. However, I would really like to put everything on a more solid theoretical basis (which will hopefully simplify the code as well).

Decorating command types

The basic idea is to decorate the \mathrm{cmd} type with extra information about names bound by definitions. As usual, let \Gamma denote a generic context, that is, a finite mapping from variable names to their types. Then we extend the cmd type by adding a context to it:

\mathrm{cmd}\; \tau \Rightarrow \Gamma

is the type of a command which yields a result of type \tau and produces global bindings for some names whose types are recorded in \Gamma. (Of course, we can continue to use \mathrm{cmd}\; \tau as an abbreviation for \mathrm{cmd}\; \tau \Rightarrow \varnothing.) So, for example, def x = 3 end no longer has type \mathrm{cmd}\; (), but rather something like \mathrm{cmd}\; () \Rightarrow \{x : \mathrm{int}\}, representing the fact that although def x = 3 end does not result in an interesting value, it does bind a name, x, whose type is int.

This is slightly unusual in the fact that types and contexts are now mutually recursive, but that doesn’t seem like a big problem. We can now write down a proper typing rule for sequencing that takes definitions into account, something like this:

\displaystyle \frac{\Gamma \vdash c_1 : \mathrm{cmd} \; \tau_1 \Rightarrow \Gamma_1 \qquad \Gamma, \Gamma_1 \vdash c_2 : \mathrm{cmd} \; \tau_2 \Rightarrow \Gamma_2}{\Gamma \vdash c_1 ; c_2 : \mathrm{cmd} \; \tau_2 \Rightarrow \Gamma, \Gamma_1, \Gamma_2}

And of course the typing rule for def looks like this:

\displaystyle \frac{\Gamma \vdash e : \tau}{\Gamma \vdash \texttt{def}\; x = e\; \texttt{end} : \mathrm{cmd}\; () \Rightarrow \{x : \tau\}}

These rules together can now correctly typecheck an expression like

def x = 3 end;
def y = 2 + x end

where the second definition refers to the name defined by the first. The whole thing would end up having type \mathrm{cmd}\; () \Rightarrow \{ x : \mathrm{int}, y : \mathrm{int} \}.

…with polymorphism?

All this seems straightforward with only first-order types, as in my example typing rules above. But once you add parametric polymorphism my brain starts to hurt. Clearly, the context associated to a command type could bind variables to polytypes. For example, def id = \x.x end has type \mathrm{cmd}\; () \Rightarrow \{id : \forall \alpha. \alpha \to \alpha\}. But should the context associated to a command type always contain polytypes, or only when the command type is itself a polytype? In other words, how do we deal with the associated contexts in the monotypes that show up during type inference? And what would it mean to unify two command types with their contexts (and would that ever even be necessary)? I hope it’s actually simple and I just need to think about it some more, but I haven’t wrapped my brain around it yet.

Ideas and pointers welcome!

I’d be very happy to hear anyone’s ideas, or (especially) pointers to published work that seems related or relevant! Feel free to comment either here, or on the relevant github issue.

Posted in projects | Tagged , , | 7 Comments

Swarm: status report

Swarm is a 2D programming and resource gathering game, written in Haskell. I announced it last September and gave an update one week after that, but haven’t written anything since then. However, that doesn’t mean development has stopped! Since last October, the repo has grown by an additional 4K lines of Haskell code (to 12K). Notable changes since then include:

  • Many UI improvements, like a main menu, ASCII art recipes, and mouse support
  • Many new entities and recipes (multiple types of motors, drills, and mines; new materials like iron, silver, quartz, and glass; new devices like circuits, clocks, cameras, comparators, compasses, …)
  • The game now supports scenarios, generically describing how to initialize the game and what the winning conditions should be: scenarios can be used to describe open-world games, tutorials, tricky programming challenges, … etc.
  • Improvements to the programming language, like a new dedicated robot type, and a “delay” type for things that should be evaluated lazily
  • Increased emphasis on exploration, with the ability to scan unknown entities in the world

Development has picked up considerably in the past few weeks, and we’re making good progress toward a planned alpha release (though no concrete plans in terms of a release date yet). If you’re interested in getting involved, check out our contribution guide, come join us on IRC (#swarm on Libera.Chat) and take a look at the list of issues marked “low-hanging fruit”—as of this writing there are 28 such issues, so plenty of tasks for everyone!

Posted in haskell, projects | Tagged , , , , | 1 Comment

Competitive programming in Haskell: BFS, part 4 (implementation via STUArray)

In a previous post, we saw one way to implement our BFS API, but I claimed that it is not fast enough to solve Modulo Solitaire. Today, I want to demonstrate a faster implementation. (It’s almost certainly possible to make it faster still; I welcome suggestions!)

Once again, the idea is to replace the HashMaps from last time with mutable arrays, but in such a way that we get to keep the same pure API—almost. In order to allow arbitrary vertex types, while storing the vertices efficiently in a mutable array, we will require one extra argument to our bfs function, namely, an Enumeration specifying a way to map back and forth between vertices and array indices.

So why not instead just restrict vertices to some type that can be used as keys of a mutable array? That would work, but would unnecessarily restrict the API. For example, it is very common to see competitive programming problems that are “just” a standard graph algorithm, but on a non-obvious graph where the vertices are conceptually some more complex algebraic type, or on a graph where the vertices are specified as strings. Typically, competitive programmers just implement a mapping between vertices to integers on the fly—using either some math or some lookup data structures on the side—but wouldn’t it be nicer to be able to compositionally construct such a mapping and then have the graph search algorithm automatically handle the conversion back and forth? This is exactly what the Enumeration abstraction gives us.

This post is literate Haskell; you can obtain the source from the darcs repo. The source code (without accompanying blog post) can also be found in my comprog-hs repo.

{-# LANGUAGE FlexibleContexts    #-}
{-# LANGUAGE RankNTypes          #-}
{-# LANGUAGE RecordWildCards     #-}
{-# LANGUAGE ScopedTypeVariables #-}

module Graph where

import Enumeration

import           Control.Arrow       ((>>>))
import           Control.Monad
import           Control.Monad.ST
import qualified Data.Array.IArray   as IA
import           Data.Array.ST
import           Data.Array.Unboxed  (UArray)
import qualified Data.Array.Unboxed  as U
import           Data.Array.Unsafe   (unsafeFreeze)
import           Data.Sequence       (Seq (..), ViewL (..), (<|), (|>))
import qualified Data.Sequence       as Seq

infixl 0 >$>
(>$>) :: a -> (a -> b) -> b
(>$>) = flip ($)
{-# INLINE (>$>) #-}

exhaustM is like exhaust from the last post, but in the context of an arbitrary Monad. Each step will now be able to have effects (namely, updating mutable arrays) so needs to be monadic.

exhaustM :: Monad m => (a -> m (Maybe a)) -> a -> m a
exhaustM f = go
    go a = do
      ma <- f a
      maybe (return a) go ma

The BFSResult type is the same as before.

data BFSResult v =
  BFSR { getLevel :: v -> Maybe Int, getParent :: v -> Maybe v }

Instead of using HashMaps in our BFSState as before, we will use STUArrays.1 These are unboxed, mutable arrays which we can use in the ST monad. Note we also define V as a synonym for Int, just as a mnemonic way to remind ourselves which Int values are supposed to represent vertices.

type V = Int
data BFSState s =
  BS { level :: STUArray s V Int, parent :: STUArray s V V, queue :: Seq V }

To initialize a BFS state, we allocate new mutable level and parent arrays (initializing them to all -1 values), and fill in the level array and queue with the given start vertices. Notice how we need to be explicitly given the size of the arrays we should allocate; we will get this size from the Enumeration passed to bfs.

initBFSState :: Int -> [V] -> ST s (BFSState s)
initBFSState n vs = do
  l <- newArray (0,n-1) (-1)
  p <- newArray (0,n-1) (-1)

  forM_ vs $ \v -> writeArray l v 0
  return $ BS l p (Seq.fromList vs)

The bfs' function implements the BFS algorithm itself. Notice that it is not polymorphic in the vertex type; we will fix that with a wrapper function later. If you squint, the implementation looks very similar to the implementation of bfs from my previous post, with the big difference that everything has to be in the ST monad now.

bfs' :: Int -> [V] -> (V -> [V]) -> (V -> Bool) -> ST s (BFSState s)
bfs' n vs next goal = do
  st <- initBFSState n vs
  exhaustM bfsStep st
    bfsStep st@BS{..} = case Seq.viewl queue of
      EmptyL -> return Nothing
      v :< q'
        | goal v -> return Nothing
        | otherwise -> v >$> next >>> filterM (fmap not . visited st)
            >=> foldM (upd v) (st{queue=q'}) >>> fmap Just

    upd p b@BS{..} v = do
      lp <- readArray level p
      writeArray level v (lp + 1)
      writeArray parent v p
      return $ b{queue = queue |> v}

visited :: BFSState s -> V -> ST s Bool
visited BS{..} v = (/= -1) <$> readArray level v
{-# INLINE visited #-}

The bfs function is a wrapper around bfs'. It presents the same API as before, with the exception that it requires an extra Enumeration v argument, and uses it to convert vertices to integers for the inner bfs' call, and then back to vertices for the final result. It also handles freezing the mutable arrays returned from bfs' and constructing level and parent lookup functions that index into them. Note, the use of unsafeFreeze seems unavoidable, since runSTUArray only allows us to work with a single mutable array; in any case, it is safe for the same reason the use of unsafeFreeze in the implementation of runSTUArray itself is safe: we can see from the type of toResult that the s parameter cannot escape, so the type system will not allow any further mutation to the arrays after it completes.

bfs :: forall v. Enumeration v -> [v] -> (v -> [v]) -> (v -> Bool) -> BFSResult v
bfs Enumeration{..} vs next goal
  = toResult $ bfs' card (map locate vs) (map locate . next . select) (goal . select)
    toResult :: (forall s. ST s (BFSState s)) -> BFSResult v
    toResult m = runST $ do
      st <- m
      (level' :: UArray V Int) <- unsafeFreeze (level st)
      (parent' :: UArray V V) <- unsafeFreeze (parent st)
      return $
          ((\l -> guard (l /= -1) >> Just l) . (level' IA.!) . locate)
          ((\p -> guard (p /= -1) >> Just (select p)) . (parent' IA.!) . locate)

Incidentally, instead of adding an Enumeration v argument, why don’t we just make a type class Enumerable, like this?

class Enumerable v where
  enumeration :: Enumeration v

bfs :: forall v. Enumerable v => [v] -> ...

This would allow us to keep the same API for BFS, up to only different type class constraints on v. We could do this, but it doesn’t particularly seem worth it. It would typically require us to make a newtype for our vertex type (necessitating extra code to map in and out of the newtype) and to declare an Enumerable instance; in comparison, the current approach with an extra argument to bfs requires us to do nothing other than constructing the Enumeration itself.

Using this implementation, bfs is finally fast enough to solve Modulo Solitaire, like this:

main = C.interact $ runScanner tc >>> solve >>> format

data Move = Move { a :: !Int, b :: !Int } deriving (Eq, Show)
data TC = TC { m :: Int, s0 :: Int, moves :: [Move] } deriving (Eq, Show)

tc :: Scanner TC
tc = do
  m <- int
  n <- int
  TC m <$> int <*> n >< (Move <$> int <*> int)

type Output = Maybe Int

format :: Output -> ByteString
format = maybe "-1" showB

solve :: TC -> Output
solve TC{..} = getLevel res 0
    res = bfs (finiteE m) [s0] (\v -> map (step m v) moves) (==0)

step :: Int -> Int -> Move -> Int
step m v (Move a b) = (a*v + b) `mod` m
{-# INLINE step #-}

It’s pretty much unchanged from before, except for the need to pass an Enumeration to bfs (in this case we just use finiteE m, which is the identity on the interval [0 .. m)).

Some remaining questions

This is definitely not the end of the story.

  • Submitting all this code (BFS, Enumeration, and the above solution itself) as a single file gives a 2x speedup over submitting them as three separate modules. That’s annoying—why is that?

  • Can we make this even faster? My solution to Modulo Solitaire runs in 0.57s. There are faster Haskell solutions (for example, Anurudh Peduri has a solution that runs in 0.32s), and there are Java solutions as fast as 0.18s, so it seems to me there ought to be ways to make it much faster. If you have an idea for optimizing this code I’d be very interested to hear it! I am far from an expert in Haskell optimization.

  • Can we generalize this nicely to other kinds of graph search algorithms (at a minimum, DFS and Dijkstra)? I definitely plan to explore this question in the future.

For next time: Breaking Bad

Next time, I want to look at a few other applications of this BFS code (and perhaps see if we can improve it along the way); I challenge you to solve Breaking Bad.

  1. Why not use Vector, you ask? It’s probably even a bit faster, but the vector library is not supported on as many platforms.↩︎

Posted in competitive programming, haskell | Tagged , , , , , , | 3 Comments