Higher-dimensional enumeration

The other day in the #haskell IRC channel, some of us were playing around with generalizing Haskell list enumeration syntax:

Prelude> [1..5]
[1,2,3,4,5]

First, we defined the (…) operator to perform enumeration, so it can be partially applied. (Unfortunately, we can’t define a (..) operator for this purpose, since it’s reserved syntax.)

Prelude> let x ... y = [x..y]
Prelude> map (...4) [1,3,7]
[[1,2,3,4],[3,4],[]]

So, how to generalize? Someone typed this:

Prelude> let x .... y = map (x...) (x...y)
Prelude> 1 .... 3
[[1],[1,2],[1,2,3]]

Interesting! And what about five dots? Unfortunately, this doesn’t work:

Prelude> let x ..... y = map (x...) (x....y)

<interactive>:1:28:
    Occurs check: cannot construct the infinite type: a = [a]
      Expected type: [a]
      Inferred type: [[a]]
    In the second argument of `map', namely `(x .... y)'
    In the expression: map ((x ...)) (x .... y)

Of course, the problem is that (x …. y) has type [[a]], so we can’t just use map (x…), we have to map twice in order to reach the proper depth:

Prelude> let x ..... y = map (map (x...)) (x .... y)
Prelude> 1 ..... 3
[[[1]],[[1],[1,2]],[[1],[1,2],[1,2,3]]]

We could keep going, although it gets increasingly difficult to tell apart the number of dots.

x ...... y = (map . map . map) (x...) (x ..... y)
x ....... y = (map . map . map . map) (x...) (x ...... y)
...

Let’s take a step back and think about what is going on here. First, (1…3) just means we are counting from 1 up to 3: [1,2,3]. At the next level, (1 .... 3) means we are counting up to (1 … 3) = [1,2,3]: first [1], then [1,2], then [1,2,3]. So (1 .... 3) = [[1],[1,2],[1,2,3]]. We can illustrate it thus:

    1
  1    2
1    2    3

Then (1 ..... 3) means we’re counting up to THAT: first [[1]], then [[1],[1,2]], then [[1],[1,2],[1,2,3]]. Any guesses on a nice visualization? That’s right, it’s a tetrahedron!

In general, continuing this procedure gives us higher- and higher-dimensional simplices. From this perspective, an expression like [1..5] is really giving us discrete points on a 1-simplex, i.e. a line. Now, the big question: can we generalize the operators (…), (….), (…..), etc. into a single higher-order function which can compute this “simplicial enumeration of order n”, for any given n > 0?

At first blush this doesn’t seem possible. For one thing, there’s a problem with assigning a type to this hypothetical higher-order enumeration function: each of the enumeration operators we made returns a different type! (…) returns [a], (....) returns [[a]], and so on. So we’d have to unify the infinite family of types [a], [[a]], [[[a]]], etc. into a single type. Secondly, there’s the problem that each specific enumeration operator uses a different number of calls to map in its implementation. We could try to copy and paste some code from Oleg involving overlapping instances and crazy typeclasses… but (in this instance) there’s a better way.

What we’d like to do is create a type which encompasses [a], [[a]], [[[a]]], and so on. A good first try might be:

data DeepList a = List [a] | Deep [DeepList a]

which says that a DeepList of ‘a’ is either a list of ‘a’, or a list of DeepLists of ‘a’. Sounds good, right? Unfortunately, there is a subtle problem with this definition: lists in different parts of a DeepList value may have different depths. For example, the following is a perfectly legitimate value of type DeepList Int:

Deep [List [1,2,3], Deep [List [1,6], List [4]]]

If we remove the Deep and List constructors, however, this corresponds to the ill-typed list value [[1,2,3], [[1,6], [4]]]. Since we only want values corresponding to nested list types, this ability to have different-depth lists in different parts of the tree won’t do. How can we force different branches to all have the same depth?

To the rescue come non-regular types — that is, recursive parameterized types where the type parameters on the right side are not the same as on the left. In particular, we can use a Bush type defined as follows:

data Bush a = Leaves [a] | Trunk (Bush [a])

This definition is very similar to that of DeepList, but the key difference is that a Bush may consist of a Bush of lists of ‘a’, rather than a list of Bushes of ‘a’. Every occurrence of the Trunk constructor adds one more nested layer to the type parameter, until finally a Leaves constructor is reached, followed by a suitably-nested list. For example, here are some values of type (Bush Int):

Leaves [1,2,3]
Trunk (Leaves [[1,2], [6,7]])
Trunk (Trunk (Leaves [[[5],[6,8,9],[]],[[6,19],[20]]]))

You can check that something like Trunk (Leaves [1,2,3]) gives a type error.

So, our higher-order enumeration function can return a Bush. But how will we deal with the (map . map …)’s? Actually, this is easier than it looks: all we need to do is make Bush a Functor, since that’s really what’s going on: we just want to be able to map (x…) over the result of the enumeration one dimension lower, whatever that might happen to mean. So, here we go:

import Data.List

data Bush a = Trunk (Bush [a]) | Leaves [a]

instance (Show a) => Show (Bush a) where   -- show Bush values without the constructors
  show (Leaves l) = show l
  show (Trunk b)  = show b

instance Functor Bush where
  fmap f (Trunk b)  = Trunk $ fmap (map f) b    -- add another map
  fmap f (Leaves l) = Leaves $ map f l

flatten :: Bush a -> [a]                  -- flatten a Bush into a single list
flatten (Leaves l) = l
flatten (Trunk b)  = concat $ flatten b

x ... y = [x..y]

And now our higher-order enumeration function (which we’ll call simplex, for reasons which are hopefully clear) is easy to write:

simplex :: (Enum t, Integral a) => a -> t -> t -> Bush t
simplex n _ _ | n < 1 = Leaves []
simplex 1 x y = Leaves (x...y)
simplex n x y = Trunk $ fmap (x...) (simplex (n-1) x y)

Let’s try it out:

*Main> simplex 1 1 3
[1,2,3]
*Main> simplex 2 1 3
[[1],[1,2],[1,2,3]]
*Main> simplex 3 1 3
[[[1]],[[1],[1,2]],[[1],[1,2],[1,2,3]]]

It works! Finally, I’ll leave the explanation of the following QuickCheck properties as an exercise for the reader:

simplex' :: (Integral a, Integral t) => a -> t -> Bush t
simplex' n x = simplex n 1 x

fac :: (Integral t) => t -> t
fac n = product [1..n]
binom :: Integer -> Integer -> Integer
binom n k | n < k     = 0
          | k < 0     = 0
          | otherwise = (fac n) `div` (fac k * fac (n-k))

prop_simplex k n = (k > 0) ==>
    (genericLength . flatten . simplex' k $ n) == binom (n + k - 1) k

prop_simplex_components k n = (k > 0) ==>
    (map genericLength . group . sort . flatten . simplex' k $ n) == map (flip binom (k-1)) [k+n-2,k+n-3..(k-1)]

6 Responses to Higher-dimensional enumeration

dmwit says:

October 1, 2007 at 11:31 pm

Pretty neat… but what is that “Plus” you threw in there?

David R. MacIver says:

October 2, 2007 at 4:02 am

It doesn’t really require Oleg levels of type hackery, ‘though it does need multi-parameter type classes and incoherent instances. See http://hpaste.org/3031 for a demonstration.

I think the requirement for incoherent instances is probably removable, but I haven’t thought about it too carefully.

Brent says:

October 2, 2007 at 5:26 am

@dmwit: oops, thanks for the catch!

@David: neat, thanks.

Miles says:

October 2, 2007 at 4:22 pm

Nice. Though I was amused to read it immediately after Mark-Jason Dominus’ post on why you really shouldn’t use that algorithm to calculate binomial coefficients :-)

Brent says:

October 2, 2007 at 4:39 pm

Miles: hehe, indeed. I certainly know better ways to calculate binomial coefficients, but couldn’t be bothered when writing some quick proof-of-concept quickcheck tests. =)

Miles says:

October 3, 2007 at 6:09 am

Strikes me as the kind of thing that should be in a library somewhere.