I had fun this past December solving Advent of Code problems in Haskell. I was especially proud of my solution to one problem involving generating and processing large bitstrings, which I’d like to share here. I think it really shows off the power of an algebraic, DSL-based approach to problem solving.

This post is literate Haskell—download it and play along!

```
> {-# LANGUAGE GADTs #-}
>
> import Control.Arrow ((***))
> import Data.Bits (xor)
> import Data.List (unfoldr)
> import Data.List.Split (chunksOf)
> import Data.Maybe (fromJust)
```

# The problem

You can go read the problem description if you like, but it’s rather verbose—I’ll try to give a more concise description here, illustrated with Haskell code.

The problem is concerned with strings of bits:

```
> type BitString = [Bool]
```

We’ll start just by defining a few utility functions to view and input bitstrings conveniently.

```
> readbits :: String -> BitString
> readbits = map (=='1')
>
> showbits :: BitString -> String
> showbits = map (\b -> if b then '1' else '0')
>
> withbits :: (BitString -> BitString) -> String -> String
> withbits f = showbits . f . readbits
```

Now on to the problem proper. There is a central operation, which I’ll call the “dragon transform”, that makes a longer bitstring from a shorter one. Given a bitstring $s$, append a 0 to the end, and then append a reversed and inverted version of $s$ (where “invert” means to flip all the bits). Like so:

```
> invert :: BitString -> BitString
> invert = map not
>
> dragon :: BitString -> BitString
> dragon s = s ++ [False] ++ invert (reverse s)
```

For example,

```
ghci> withbits dragon "1"
"100"
ghci> withbits dragon "1101111"
"110111100000100"
```

(Incidentally, this operation is called `dragon` since it is related to the classic dragon curve. Hint: interpret 0 as a left turn and 1 as a right turn.)

Given a starting bitstring and a target length, we are supposed to iterate `dragon` until we have at least the target number of bits, and then truncate the string to the desired length:

```
> fill :: Int -> BitString -> BitString
> fill len = take len . head . dropWhile ((< len) . length) . iterate dragon
```

For example, if we start with `1`, we have to iterate `dragon` three times to end up with at least ten bits.

```
ghci> map showbits . take 4 $ iterate dragon [True]
["1","100","1000110","100011001001110"]
ghci> withbits (fill 10) "1"
"1000110010"
```
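The number of iterations can be predicted without building any strings, since each dragon step turns a string of length $n$ into one of length $2n + 1$. Here is a small standalone sketch (not part of the literate module; `dragonLen`, `lengthAfter`, and `dragonSteps` are hypothetical helper names introduced only for illustration):

```haskell
-- Length after one dragon transform: the string, one separator bit,
-- and the reversed, inverted copy.
dragonLen :: Integer -> Integer
dragonLen n = 2 * n + 1

-- Closed form for the length after k dragon steps starting from n bits.
lengthAfter :: Int -> Integer -> Integer
lengthAfter k n = 2 ^ k * (n + 1) - 1

-- Number of dragon steps needed to reach at least `target` bits
-- when starting from a string of `start` bits.
dragonSteps :: Integer -> Integer -> Int
dragonSteps target start =
  length (takeWhile (< target) (iterate dragonLen start))
```

For instance, `dragonSteps 10 1` evaluates to `3`, matching the example above.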

Finally, after extending an initial bitstring to a given length, we perform a checksum operation:

- If there are an odd number of bits, we are done.
- Otherwise, take the bits two at a time and compute the negation of their exclusive or: that is, 1 if the bits are the same and 0 if they are different (otherwise known as `(==)`). This results in a bitstring half as long. Now repeat the process, continuing to halve the length until we have an odd number of bits remaining.

In code:

```
> checksum :: BitString -> BitString
> checksum a
>   | odd (length a) = a
>   | otherwise      = checksum . map xnor . chunksOf 2 $ a
>   where
>     xnor [x,y] = x == y
```
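To see the halving rounds one at a time, here is a standalone sketch that records every intermediate string. It is not part of the literate module; `halve`, `checksumRounds`, `parse`, and `render` are hypothetical names, and it avoids the `split` package so that it only needs `base`.

```haskell
-- One halving round: each adjacent pair becomes True if the two bits
-- are equal and False otherwise. A trailing unpaired bit is dropped,
-- but this is only ever called on even-length input below.
halve :: [Bool] -> [Bool]
halve (x : y : rest) = (x == y) : halve rest
halve _              = []

-- Every intermediate string, ending with the first odd-length one.
-- The empty string is returned as is, to avoid looping forever.
checksumRounds :: [Bool] -> [[Bool]]
checksumRounds a
  | null a || odd (length a) = [a]
  | otherwise                = a : checksumRounds (halve a)

parse :: String -> [Bool]
parse = map (== '1')

render :: [Bool] -> String
render = map (\b -> if b then '1' else '0')
```

For example, `map render (checksumRounds (parse "110010110100"))` gives `["110010110100","110101","100"]`: twelve bits, then six, then an odd-length result of three.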

# The first task

So, we now have a simple reference implementation that directly follows the specification. We can use this to solve the first task, which just asks to start with a given short bitstring, extend it to length 272, and then compute the checksum. I think different logged-in users get different starting strings, but mine was `01000100010010111`:

```
> input = "01000100010010111"
```

```
ghci> withbits (checksum . fill 272) input
"10010010110011010"
```

Notice that $272 = 17 \cdot 2^4$, so after expanding to that length and then repeatedly halving the length, we end up with a checksum of length 17.
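The length of the final checksum depends only on the target length: it is what remains after dividing out every factor of two. A minimal standalone sketch of that arithmetic (`checksumLength` is a hypothetical name; `powTwo` here is written as a free-standing helper):

```haskell
-- Largest power of two dividing n. Assumes n is positive
-- (the recursion would not terminate for n = 0).
powTwo :: Integer -> Integer
powTwo n
  | odd n     = 1
  | otherwise = 2 * powTwo (n `div` 2)

-- Length of the checksum of an n-bit string: the odd part of n.
checksumLength :: Integer -> Integer
checksumLength n = n `div` powTwo n
```

Here `powTwo 272` is `16` and `checksumLength 272` is `17`.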

# The second task

That was easy. Bring on the second task! Well… of course, it is much bigger. It asks to use the same starting bitstring, but this time extend it to length $35651584 = 17 \cdot 2^{21}$ before computing the checksum (which will again end up having length 17). Using this naive, unoptimized implementation completely blows up: it turns out that generating a list of 35 million booleans is really not a good idea. Using actual lists with a cons cell for each bit incurs a whole lot of memory and allocation overhead; it just made my computer grind to a halt.

As you may realize, there is a lot of low-hanging fruit here: for example, we can use an unboxed `Vector` instead of a list, or even do some deforestation to avoid allocation (the former code is by Eric Mertens aka `glguy`, the latter by Daniel Wagner aka `dmwit`). Using techniques like that, it’s possible to get the runtime and memory requirements down to something reasonable. But that’s not what I want to talk about. Though more efficient, those solutions are still actually computing every single bit. It seemed to me we shouldn’t have to do that: the computation has a lot of nice structure, and seemingly a lot of opportunity for sharing intermediate results. I went off in search of a way to compute the correct checksum *without* actually generating the entire intermediate bitstring.

# Interlude: xnor

The first order of business was to work out an algebraic understanding of the `xnor` operation, which I will denote $\overline{x \oplus y}$ (the circled plus operator $\oplus$ denotes `xor`, and the overbar denotes logical negation). One fundamental fact is that

$$\overline{x \oplus y} = \overline{x} \oplus y$$

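(checking whether $x$ and $y$ are equal is the same as first negating one and then checking whether they are unequal). From this, and the fact that $\oplus$ is associative, we can prove associativity of `xnor`:

$$\overline{\overline{x \oplus y} \oplus z} = \overline{(\overline{x} \oplus y) \oplus z} = \overline{\overline{x} \oplus (y \oplus z)} = \overline{x \oplus \overline{y \oplus z}}$$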

Associativity, along with the fact that $1$ is an identity for the operation, means it forms a monoid. When we repeatedly take the `xnor` of adjacent bits, we are therefore basically doing an `mconcat` using a strictly balanced combining scheme. But associativity means we can be freer about the order in which we do the combining. If we start with a bitstring of length $n \cdot 2^k$ (with $n$ odd), the checksumming operation iterates $k$ times, and each consecutive sequence of $2^k$ bits gets folded down into a single bit via `mconcat`. In other words, the `checksum` operation can be reimplemented like this:

```
> checksum2 :: BitString -> BitString
> checksum2 a = map combine . chunksOf (powTwo (length a)) $ a
>   where
>     combine = foldr (==) True
>
>     -- Find the biggest power of two that divides n
>     powTwo n
>       | odd n     = 1
>       | otherwise = 2 * powTwo (n `div` 2)
```
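The monoid view can be made explicit with a `newtype` wrapper. The following standalone sketch (not part of the literate module; `Xnor`, `combineBalanced`, `combineFlat`, and `agree` are hypothetical names) compares the strictly balanced pairwise scheme with a plain `mconcat` on every bitstring of length 1, 2, 4, and 8:

```haskell
import Control.Monad (replicateM)

-- Booleans under xnor, with True as the identity element.
newtype Xnor = Xnor { getXnor :: Bool } deriving (Eq, Show)

instance Semigroup Xnor where
  Xnor x <> Xnor y = Xnor (x == y)

instance Monoid Xnor where
  mempty = Xnor True

-- Balanced scheme: repeatedly combine adjacent pairs.
-- Assumes the input length is a power of two.
combineBalanced :: [Bool] -> Bool
combineBalanced []  = True
combineBalanced [x] = x
combineBalanced xs  = combineBalanced (pairUp xs)
  where
    pairUp (a : b : rest) = (a == b) : pairUp rest
    pairUp _              = []

-- Flat scheme: a single mconcat in the Xnor monoid.
combineFlat :: [Bool] -> Bool
combineFlat = getXnor . mconcat . map Xnor

-- Exhaustive comparison on all bitstrings of length 1, 2, 4, and 8.
agree :: Bool
agree = and
  [ combineBalanced bs == combineFlat bs
  | n  <- [1, 2, 4, 8]
  , bs <- replicateM n [False, True]
  ]
```

Because the operation is associative, the two schemes are interchangeable, which is exactly what lets `checksum2` fold each chunk in one pass.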

Let’s check that this works:

```
ghci> withbits (checksum2 . fill 272) input
"10010010110011010"
ghci> let bits = fill 272 (readbits input) in checksum bits == checksum2 bits
True
```

Now, this isn’t really any faster yet; but this idea will be important later!

There’s one more thing we can observe about `xnor`: if we fold an *odd* number of bits with `xnor`, it’s the same as taking the `xor` of all the bits; if we fold an *even* number of bits, it’s the same as taking the `xor` of all the bits and then negating the result. That is,

$$\overline{\overline{x_1 \oplus x_2} \oplus x_3} = x_1 \oplus x_2 \oplus x_3$$

$$\overline{\overline{\overline{x_1 \oplus x_2} \oplus x_3} \oplus x_4} = \overline{x_1 \oplus x_2 \oplus x_3 \oplus x_4}$$

and so on. The proof is a simple induction argument, making use of the relation $\overline{x \oplus y} = \overline{x} \oplus y$ we noted before. So when folding `xnor`, as a simple optimization, we can avoid doing a lot of negations by just computing the `xor` and then negating appropriately based on the parity of the number of bits.
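This parity rule is easy to confirm by brute force. A small standalone sketch (not part of the literate module; `xnorFold`, `xorFold`, `parityLaw`, and `parityLawUpTo` are hypothetical names) that checks it on every bitstring up to a given length:

```haskell
import Control.Monad (replicateM)
import Data.Bits (xor)

-- Fold a list of bits with xnor, starting from the identity True.
xnorFold :: [Bool] -> Bool
xnorFold = foldr (==) True

-- Fold a list of bits with xor, starting from the identity False.
xorFold :: [Bool] -> Bool
xorFold = foldr xor False

-- Odd length: xnor fold equals the xor fold.
-- Even length: xnor fold equals the negated xor fold.
parityLaw :: [Bool] -> Bool
parityLaw bs
  | odd (length bs) = xnorFold bs == xorFold bs
  | otherwise       = xnorFold bs == not (xorFold bs)

-- Check the law on every bitstring of length 0 through n.
parityLawUpTo :: Int -> Bool
parityLawUpTo n = and
  [ parityLaw bs | k <- [0 .. n], bs <- replicateM k [False, True] ]
```

An exhaustive check over small lengths is of course not a proof, but it is a cheap guard against getting the parity the wrong way round.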

# The algebra of bitstrings

With that under our belts, we can move on to the real meat of the solution. The central idea is that instead of representing bitstrings directly as lists (or vectors, or whatever) of bits, we represent them using a *deep embedding* of a little bitstring algebra (aka DSL). That is, we represent each bitstring operation as a constructor of an algebraic data type, which allows us to directly manipulate bitstring *expressions*. The point is that this algebra/DSL has a lot of nice structure that allows us to work at an abstract, algebraic level instead of working directly with bits.

There’s one more twist to note before actually seeing the data type definition. We know that we will need to talk about the *length* of bitstrings as well as their *xnor/xor*. Instead of having to recalculate these every time we need them, we can *cache* them at each node of a bitstring expression. We’ll see how these cached values come in handy later.

```
> data BitExpr where
```

So, what does our algebra of bitstrings need? First, it’s useful to have an explicit representation of the empty bitstring, as well as a singleton bit. We don’t need to cache length or `xor` values here, since they are obvious and can be computed in constant time.

```
>   Emp :: BitExpr
>   Bit :: Bool -> BitExpr
```

Next, we need to be able to append bitstrings. Notice the `Bool`, which represents the cached `xor` of the entire bitstring, as well as the `Integer` which represents the length.

```
>   App :: !Bool -> !Integer -> BitExpr -> BitExpr -> BitExpr
```

Finally, we need three unary operations on bitstrings: invert, reverse, and `dragon`. Each also carries a cached length and `xor`.

```
>   Inv :: !Bool -> !Integer -> BitExpr -> BitExpr
>   Rev :: !Bool -> !Integer -> BitExpr -> BitExpr
>   Drg :: !Bool -> !Integer -> BitExpr -> BitExpr
>
>   deriving Show
```

Note that `Drg` is redundant in some sense, since the dragon transform can be encoded in terms of append, invert, and reverse. However, it’s critical that we include it explicitly: since the dragon transform uses the input bitstring twice, expanding an iterated application of `Drg` in terms of the other constructors would result in an exponential blowup in the size of the expression.
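To put rough numbers on that blowup, here is a standalone sketch that only counts nodes. It assumes the expression is measured as a tree (ignoring any sharing), that we start from a single leaf, and that one expanded dragon step adds two append nodes, one bit, one invert, and one reverse around two copies of its argument; `expandedSize` and `drgSize` are hypothetical names.

```haskell
-- Node count after k dragon steps when every step is expanded:
-- two copies of the argument plus five new nodes per step.
expandedSize :: Int -> Integer
expandedSize k = iterate (\n -> 2 * n + 5) 1 !! k

-- Node count after k dragon steps when each step is a single
-- dedicated node wrapped around its argument.
drgSize :: Int -> Integer
drgSize k = 1 + fromIntegral k
```

Under these assumptions, `expandedSize 21` is `12582907` while `drgSize 21` is `22`.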

To be concrete, let’s write a straightforward interpreter which formally connects a bitstring expression with its intended semantics as a bitstring. This comes in handy for testing, but other than testing, the whole point is that we will *not* use this—we want to solve the problem at the level of bitstring expressions, without ever actually generating their corresponding bitstrings.

```
> toBits :: BitExpr -> BitString
> toBits Emp = []
> toBits (Bit b) = [b]
> toBits (App _ _ s1 s2) = toBits s1 ++ toBits s2
> toBits (Inv _ _ s) = invert (toBits s)
> toBits (Rev _ _ s) = reverse (toBits s)
> toBits (Drg _ _ s) = dragon (toBits s)
```

Next, let’s write some simple utility functions to extract the cached length or `xor` from the root of a bitstring expression:

```
> bsLen :: BitExpr -> Integer
> bsLen Emp = 0
> bsLen (Bit _) = 1
> bsLen (App _ l _ _) = l
> bsLen (Inv _ l _) = l
> bsLen (Rev _ l _) = l
> bsLen (Drg _ l _) = l
>
> bsXor :: BitExpr -> Bool
> bsXor Emp = False
> bsXor (Bit b) = b
> bsXor (App b _ _ _) = b
> bsXor (Inv b _ _) = b
> bsXor (Rev b _ _) = b
> bsXor (Drg b _ _) = b
```

Next, we’ll write some smart constructors which automatically take care of properly computing the cached length and `xor`.

```
> bit :: Bool -> BitExpr
> bit = Bit
```

Appending combines `xor` values with `xor` and adds lengths. `app` also does a bit of optimization when appending with the empty bitstring. For convenience, we can also use `app` to create a function `bits` to convert a literal bitstring into a `BitExpr`.

```
> app :: BitExpr -> BitExpr -> BitExpr
> app s1 Emp = s1
> app s1 s2 = App (bsXor s1 `xor` bsXor s2) (bsLen s1 + bsLen s2) s1 s2
>
> bits :: String -> BitExpr
> bits = foldr (app . bit . (=='1')) Emp
```

Inverting a bitstring preserves the `xor` when it has even length, and inverts the `xor` when it has odd length. Note how we make use of *both* the cached `xor` and length values to compute the new cached `xor`.

```
> inv :: BitExpr -> BitExpr
> inv s = Inv (if even (bsLen s) then bsXor s else not (bsXor s))
>             (bsLen s)
>             s
```

Reversing preserves `xor` and length.

```
> rev :: BitExpr -> BitExpr
> rev s = Rev (bsXor s) (bsLen s) s
```

Finally, the `dragon` operation: the `xor` of `dragon s` is the `xor` of `s` combined with the `xor` of `inv s` (the appended 0 contributes nothing, and reversing does not change the `xor`); the length is one more than twice the length of `s`.

```
> drg :: BitExpr -> BitExpr
> drg s = Drg (bsXor s `xor` bsXor (inv s)) (2*(bsLen s) + 1) s
```

We can test these:

```
ghci> let t = drg (bits "11" `app` inv (bits "10000"))
ghci> showbits . toBits $ t
"110111100000100"
ghci> bsLen t
15
```
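The caching rules used by the smart constructors can also be checked directly on plain lists of bits. The following standalone sketch (not part of the literate module; `xorAll`, `cacheLaws`, `appendLaw`, and `cacheLawsUpTo` are hypothetical names) tests them exhaustively on all short bitstrings:

```haskell
import Control.Monad (replicateM)
import Data.Bits (xor)

xorAll :: [Bool] -> Bool
xorAll = foldr xor False

dragon :: [Bool] -> [Bool]
dragon s = s ++ [False] ++ map not (reverse s)

-- The rules for invert, reverse, and dragon on a single bitstring.
cacheLaws :: [Bool] -> Bool
cacheLaws s = and
  [ xorAll (map not s) == (if even (length s) then xorAll s else not (xorAll s))
  , xorAll (reverse s) == xorAll s
  , xorAll (dragon s)  == (xorAll s `xor` xorAll (map not s))
  , length (dragon s)  == 2 * length s + 1
  ]

-- The rule for append: xor values combine with xor, lengths add.
appendLaw :: [Bool] -> [Bool] -> Bool
appendLaw s1 s2 =
  xorAll (s1 ++ s2) == (xorAll s1 `xor` xorAll s2)
    && length (s1 ++ s2) == length s1 + length s2

-- Check every bitstring (and every pair) of length 0 through n.
cacheLawsUpTo :: Int -> Bool
cacheLawsUpTo n = all cacheLaws strings && and [appendLaw a b | a <- strings, b <- strings]
  where
    strings = [s | k <- [0 .. n], s <- replicateM k [False, True]]
```

A call such as `cacheLawsUpTo 5` covers all 63 bitstrings of length at most five and all pairs of them.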

# Splitting

Remember that our high-level goal is to take the expanded version of our bitstring, split it into blocks of length $2^k$ (the largest power of two dividing its length), and then separately reduce each block with `xnor`. It turns out that we have enough information to split a bitstring *expression* into two bitstring expressions which correspond to splitting off a block of a given size from the beginning of the corresponding bitstring. That is, we will write a function `splitBits :: Integer -> BitExpr -> (BitExpr, BitExpr)` which works like `splitAt`, but on bitstring expressions instead of bitstrings. In other words, it will satisfy the property

`splitAt n . toBits == (toBits *** toBits) . splitBits n`

We’ll go through the implementation case by case. You might like to try implementing `splitBits` yourself before peeking at mine; it makes for a nice exercise.

```
> splitBits :: Integer -> BitExpr -> (BitExpr, BitExpr)
```

In the base cases, when we are asked to split zero bits off the front of a bitstring, or to split off at least as many bits as there are, one side of the split is just the empty bitstring expression.

```
> splitBits 0 s = (Emp, s)
> splitBits n s | n >= bsLen s = (s, Emp)
```

To split an `App` node, compare the number of bits we want to split off with the length of the first bitstring, and recursively split in either the left or right side appropriately, remembering to subtract the length of the first bitstring from the number of bits to split if we recurse on the right side.

```
> splitBits n (App _ _ s1 s2)
>   | n < bsLen s1
>     = let (s1a, s1b) = splitBits n s1 in (s1a, s1b `app` s2)
>   | otherwise
>     = let (s2a, s2b) = splitBits (n - bsLen s1) s2 in (s1 `app` s2a, s2b)
```

Inverting commutes with splitting, so to split an `Inv` node, we can just split recursively and then rewrap the results with `inv`.

```
> splitBits n (Inv _ _ s) = (inv *** inv) $ splitBits n s
```
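At the level of plain lists, the claim that inverting commutes with splitting looks like this. It is a standalone sketch, not part of the literate module; `both`, `invertCommutesWithSplit`, and `checkInvertSplit` are hypothetical names.

```haskell
import Control.Monad (replicateM)

-- Apply a function to both components of a pair.
both :: (a -> b) -> (a, a) -> (b, b)
both f (x, y) = (f x, f y)

-- Splitting an inverted string gives the same result as
-- splitting first and inverting both pieces.
invertCommutesWithSplit :: Int -> [Bool] -> Bool
invertCommutesWithSplit n s =
  splitAt n (map not s) == both (map not) (splitAt n s)

-- Check every bitstring of length 0 through maxLen at every split point,
-- including one past the end.
checkInvertSplit :: Int -> Bool
checkInvertSplit maxLen = and
  [ invertCommutesWithSplit n s
  | len <- [0 .. maxLen]
  , s   <- replicateM len [False, True]
  , n   <- [0 .. len + 1]
  ]
```

The same does not hold for reverse, which is why `Rev` nodes need separate treatment.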

To split `Rev` and `Drg` nodes, we expand the expressions a bit to get rid of the top-level constructor before re-calling `splitBits`.

```
> splitBits n (Rev _ _ s) = splitBits n (pushRev s)
> splitBits n (Drg _ _ s) = splitBits n (expandDragon s)
```

In the case of `Rev`, we can “push the reverse through” one level, transforming it into an equivalent expression which no longer has a `Rev` node at the top. We make use of some nice algebraic properties governing the interaction of reverse with the other operations:

- Reversing an empty or singleton bitstring does nothing.
- `reverse (s1 ++ s2) == reverse s2 ++ reverse s1`
- `reverse . invert = invert . reverse`
- `reverse . reverse = id`
- Finally, `reverse . dragon = dragon . invert`, which can be easily proved by expanding `dragon` in terms of the other operations and then applying the above algebraic laws.

Using these properties, we can implement `pushRev` as follows:

```
> pushRev :: BitExpr -> BitExpr
> pushRev Emp = Emp
> pushRev (Bit b) = Bit b
> pushRev (App _ _ s1 s2) = rev s2 `app` rev s1
> pushRev (Inv _ _ s) = inv (rev s)
> pushRev (Rev _ _ s) = s
> pushRev (Drg _ _ s) = drg (inv s)
```
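Each case of `pushRev` relies on one of the laws listed earlier, and those laws can be checked on plain lists. A standalone sketch (not part of the literate module; `reverseLaws` and `reverseLawsUpTo` are hypothetical names) that tests them on all pairs of short bitstrings:

```haskell
import Control.Monad (replicateM)

invert :: [Bool] -> [Bool]
invert = map not

dragon :: [Bool] -> [Bool]
dragon s = s ++ [False] ++ invert (reverse s)

-- The four laws about reverse, instantiated at concrete bitstrings.
reverseLaws :: [Bool] -> [Bool] -> Bool
reverseLaws s1 s2 = and
  [ reverse (s1 ++ s2)   == reverse s2 ++ reverse s1
  , reverse (invert s1)  == invert (reverse s1)
  , reverse (reverse s1) == s1
  , reverse (dragon s1)  == dragon (invert s1)
  ]

-- Check the laws on every pair of bitstrings of length 0 through n.
reverseLawsUpTo :: Int -> Bool
reverseLawsUpTo n = and [reverseLaws a b | a <- strings, b <- strings]
  where
    strings = [s | k <- [0 .. n], s <- replicateM k [False, True]]
```

The last law is the least obvious one: reversing `s ++ [0] ++ invert (reverse s)` gives `invert s ++ [0] ++ reverse s`, which is exactly `dragon (invert s)`.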

Finally, `expandDragon` just expands a dragon operation in terms of the other operations. Although this approximately doubles the size of the bitstring expression, we only do this lazily, when we are actually trying to split the result of a dragon transform. It’s only natural that splitting an expression results in somewhat larger expressions.

```
> expandDragon :: BitExpr -> BitExpr
> expandDragon s = s `app` (bit False `app` inv (rev s))
```

# Filling and checksumming

We’re almost there! We can now implement the `fill` and `checksum` operations at the level of bitstring expressions.

`fill` is straightforward: keep applying the `drg` smart constructor until the cached length is sufficient, then use `splitBits` to create an expression corresponding to only the first $n$ bits.

```
> fillE :: Integer -> String -> BitExpr
> fillE n str = fst . splitBits n $ go (bits str)
>   where
>     go s | bsLen s >= n = s
>          | otherwise    = go (drg s)
```

Finally, we can implement `checksumE` using the same pattern as `checksum2`, where we break up the string into chunks of size $2^k$ (the largest power of two dividing the length) and then reduce each chunk. The only difference is that now we use `splitBits` to split, and the cached `xor` to compute the reduction. We know each of the blocks has an even length (the total length is even in both tasks, so $k \geq 1$), so the `xnor` is just the negation of the cached `xor`.

```
> checksumE :: BitExpr -> BitString
> checksumE s = map (not . bsXor) . unfoldr doSplit $ s
>   where
>     doSplit Emp = Nothing
>     doSplit s   = Just (splitBits blockLen s)
>     blockLen = powTwo (bsLen s)
>     powTwo n
>       | odd n     = 1
>       | otherwise = 2 * powTwo (n `div` 2)
```

Let’s check that we get the same answer for the first task:

```
ghci> showbits $ checksumE (fillE 272 input)
"10010010110011010"
ghci> withbits (checksum . fill 272) input
"10010010110011010"
```

Great! And now for the second task:

```
ghci> showbits $ checksumE (fillE (17 * 2^21) input)
"01010100101011100"
```

On my machine this finishes pretty much instantaneously, taking only 0.02 seconds. In order to generate enough bits, the dragon transform must be applied 21 times, but that just generates a small expression with 21 `Drg` constructors. Splitting into chunks of length $2^{21}$ certainly expands the size of the expressions a bit, but everything stays nice and logarithmic since many of the `Drg` constructors can remain unexpanded.

In fact, this can easily handle MUCH larger problem instances. For example:

```
ghci> showbits $ checksumE (fillE (17 * 2^80) input)
"10000100010001100"
ghci> showbits $ checksumE (fillE (17 * 2^81) input)
"01010100101011100"
```

Semantically, this corresponds to generating *yottabytes* worth of bits (I had to look up the proper prefix) and then checksumming them; operationally, though, these are still basically instantaneous. (Interestingly, I also tried another length, and it instantaneously printed the first 11 bits of the answer and then segfaulted. Perhaps I have found a bug in GHC 8.0.2.)

Notice that the checksum for $17 \cdot 2^{81}$ is actually the same as that for $17 \cdot 2^{21}$. After playing around with it a bit, the checksums for $17 \cdot 2^k$ seem to have a period of 12, but I’m not sure how to prove it!