## Catsters guide is complete!

About a year and a half ago I announced that I had started creating a guide to the excellent series of category theory YouTube videos by the Catsters (aka Eugenia Cheng and Simon Willerton). I am happy to report that as of today, the guide is finally complete!

As far as possible, I have tried to arrange the order so that each video only depends on concepts from earlier ones. (If you have any suggestions for improving the ordering, I would love to hear them!) Along with each video you can also find my cryptic notes; I make no guarantee that they will be useful to anyone (even me!), but hopefully they will at least give you an idea of what is in each video.

If and when they post any new videos (pretty please?) I will try to keep it updated.

## The Species of Bracelets

Summary: The species package now has support for bracelets, i.e. equivalence classes of lists up to rotation and reversal. I show some examples of their use and then explain the (very interesting!) mathematics behind their implementation.

I recently released a new version of my species package which adds support for the species of bracelets. A bracelet is a (nonempty) sequence of items which is considered equivalent up to rotation and reversal. For example, the two structures illustrated below are considered equivalent as bracelets, since you can transform one into the other by a rotation and a flip:

In other words, a bracelet has the same symmetry as a regular polygon—that is, its symmetry group is the dihedral group $D_{2n}$. (Actually, this is only true for $n \geq 3$—I’ll say more about this later.)

Bracelets came up for me recently in relation to a fun side project (more on that soon), and I am told they also show up in applications in biology and chemistry (for example, bracelet symmetry shows up in molecules with cycles, which are common in organic chemistry). There was no way to derive the species of bracelets from what was already in the library, so I added them as a new primitive.

Let’s see some examples (later I discuss how they work). First, we set some options and imports.

ghci> :set -XNoImplicitPrelude
ghci> :m +NumericPrelude
ghci> :m +Math.Combinatorics.Species

Unlabelled bracelets, by themselves, are completely uninteresting: there is only a single unlabelled bracelet shape of any positive size. (Unlabelled species built using bracelets can be interesting, however; we’ll see an example in just a bit). We can ask the library to tell us how many distinct size-$n$ unlabelled bracelets there are for $n \in \{0, \dots, 9\}$:

ghci> take 10 $unlabelled bracelets [0,1,1,1,1,1,1,1,1,1]  Labelled bracelets are a bit more interesting. For $n \geq 3$ there are $(n-1)!/2$ labelled bracelets of size $n$: there are $(n-1)!$ cycles of size $n$ (there are $n!$ lists, which counts each cycle $n$ times, once for each rotation), and counting cycles exactly double counts bracelets, since each bracelet can be flipped in one of two ways. For example, there are $(5-1)!/2 = 24/2 = 12$ labelled bracelets of size $5$. ghci> take 10$ labelled bracelets
[0,1,1,1,3,12,60,360,2520,20160]


In addition to counting these, we can exhaustively generate them (this is a bit annoying with the current API; I hope to improve it):

ghci> enumerate bracelets [0,1] :: [Bracelet Int]
[<<0,1>>]

ghci> enumerate bracelets [0..2] :: [Bracelet Int]
[<<0,1,2>>]

ghci> enumerate bracelets [0..3] :: [Bracelet Int]
[<<0,1,2,3>>,<<0,1,3,2>>,<<0,2,1,3>>]


And here are all $12$ of the size-$5$ bracelets, where I’ve used a different color to represent each label (see here for the code used to generate them):

As a final example, consider the species $B \times E^2$, the Cartesian product of bracelets with ordered pairs of sets. That is, given a set of labels, we simultaneously give the labels a bracelet structure and also partition them into two (distinguishable) sets. Considering unlabelled structures of this species—that is, equivalence classes of labelled structures under relabelling—means that we can’t tell the labels apart, other than the fact that we can still tell which are in the first set and which are in the second. So, if we call the first set “purple” and the second “green”, we are counting the number of bracelets made from (otherwise indistinguishable) purple and green beads. Let’s call these binary bracelets. Here’s how many there are of sizes $0$ through $14$:

ghci> let biBracelets = bracelet >< (set * set)
ghci> take 15 $unlabelled biBracelets [0,2,3,4,6,8,13,18,30,46,78,126,224,380,687]  Let’s use the OEIS to check that we’re on the right track: ghci> :m +Math.OEIS ghci> let res = lookupSequence (drop 1 . take 10$ unlabelled biBracelets)
ghci> fmap description res
Just "Number of necklaces with n beads of 2 colors, allowing turning over."


Unfortunately the species library can’t currently enumerate unlabelled structures of species involving Cartesian product, though I hope to fix that. But for now we can draw these purple-green bracelets with some custom enumeration code. You can see the numbers $2, 3, 4, 6, 8, 13$ show up here, and it’s not too hard to convince yourself that each row contains all possible binary bracelets of a given size.

If you’re just interested in what you can do with bracelets, you can stop reading now. If you’re interested in the mathematical and algorithmic details of how they are implemented, read on!

## Exponential generating functions

The exponential generating function (egf) associated to a combinatorial species $F$ is defined by

$\displaystyle F(x) = \sum_{n \geq 0} |F[n]| \frac{x^n}{n!}.$

That is, the egf is an (infinite) formal power series where the coefficient of $x^n/n!$ is the number of distinct labelled $F$-structures on $n$ labels. We saw above that for $n \geq 3$ there are $(n-1)!/2$ labelled bracelets of size $n$, and there is one bracelet each of sizes $1$ and $2$. The egf for bracelets is thus:

$\displaystyle B(x) = x + \frac{x^2}{2} + \sum_{n \geq 3} \frac{(n-1)!}{2} \frac{x^n}{n!} = x + \frac{x^2}{2} + \sum_{n \geq 3} \frac{x^n}{2n}.$

(Challenge: show this is also equivalent to $\frac{1}{2}(x + x^2/2 - \ln(1-x))$.) This egf is directly encoded in the species library, and this is what is being used to evaluate labelled bracelets in the example above.

Incidentally, the reason $(n-1)!/2$ only works for $n \geq 3$ is in some sense due to the fact that the dihedral groups $D_2 = Z_2$ and $D_4 = K_4$ are a bit weird: every dihedral group $D_{2n}$ is a subgroup of the symmetric group $\mathcal{S}_n$ except for $D_2$ and $D_4$. The problem is that for $n < 3$, “flips” actually have no effect, as you can see below:

So, for example, $D_4$ has $4$ elements, corresponding to the identity, a 180 degree rotation, a flip, and a rotation + flip; but the symmetric group $\mathcal{S}_2$ only has two elements, in this case corresponding to the identity and a 180 degree rotation. The reason $(n-1)!/2$ doesn’t work, then, is that the division by two is superfluous: for $n < 3$, counting cycles doesn’t actually overcount bracelets, because every cycle is already a flipped version of itself. So it would also be correct (if rather baroque) to say that for $n < 3$ there are actually $(n-1)!$ bracelets.

I find this fascinating; it’s almost as if for bigger $n$ the dihedral symmetry has “enough room to breathe” whereas for small $n$ it doesn’t have enough space and gets crushed and folded in on itself, causing weird things to happen. It makes me wonder whether there are other sorts of symmetry with a transition from irregularity to regularity at even bigger $n$. Probably this is an easy question for a group theorist to answer but I’ve never thought about it before.

## Ordinary generating functions

The ordinary generating function (ogf) associated to a species $F$ is defined by

$\displaystyle \tilde F(x) = \sum_{n \geq 0} |F[n]/\mathord{\sim}| x^n$

where $\sim$ is the equivalence relation on $F$-structures induced by permuting the labels. That is, the coefficient of $x^n$ is the number of equivalence classes of $F$-structures on $n$ labels up to relabelling. There is only one unlabelled bracelet of any size $n \geq 1$, that is, any bracelet of size $n$ can be transformed into any other just by switching labels around. The unique unlabelled bracelet of a given size can be visualized as a bracelet of uniform beads:

though it’s occasionally important to keep in mind the more formal definition as an equivalence class of labelled bracelets. Since there’s just one unlabelled bracelet of each size, the ogf for bracelets is rather boring:

$\displaystyle \tilde B(x) = x + x^2 + x^3 + \dots = \frac{x}{x - 1}$.

This is encoded in the species library too, and was used to compute unlabelled bracelets above.

## egfs, ogfs, and homomorphisms

egfs are quite natural (in fact, species can be seen as a categorification of egfs), and the mapping from species to their associated egf is a homomorphism that preserves many operations such as sum, product, Cartesian product, composition, and derivative. ogfs, however, are not as nice. The mapping from species to ogfs preserves sum and product but does not, in general, preserve other operations like Cartesian product, composition or derivative. In some sense ogfs throw away too much information. Here’s a simple example to illustrate this: although the ogfs for bracelets and cycles are the same, namely, $x/(1-x)$ (there is only one unlabelled bracelet or cycle of each size), the ogfs for binary bracelets and binary cycles are different:

ghci> -- recall biBracelets = bracelet >< (set * set)
ghci> let biCycles = cycles >< (set * set)
ghci> take 15 $unlabelled biBracelets [0,2,3,4,6,8,13,18,30,46,78,126,224,380,687] ghci> take 15$ unlabelled biCycles
[0,2,3,4,6,8,14,20,36,60,108,188,352,632,1182]


(Puzzle: why are these the same up through $n=5$? Find the unique pair of distinct binary $6$-cycles which are equivalent as bracelets.)

Clearly, there is no way to take equal ogfs, apply the same operation to both, and get different results out. So the species library cannot be working directly with ogfs in the example above—something else must be going on. That something else is cycle index series, which generalize both egfs and ogfs, and retain enough information that they once again preserve many of the operations we care about.

## Cycle index series

Let $\mathcal{S}_n$ denote the symmetric group of order $n$, that is, the group of permutations on $\{1, \dots, n\}$ under composition. It is well-known that every permutation $\sigma \in \mathcal{S}_n$ can be uniquely decomposed as a product of disjoint cycles. The cycle type of $\sigma$ is the sequence of natural numbers $\sigma_1, \sigma_2, \sigma_3, \dots$ where $\sigma_i$ is the number of $i$-cycles in the cycle decomposition of $\sigma$. For example, the permutation $(132)(45)(78)(6)$ has cycle type $1,2,1,0,0,0,\dots$ since it has one $1$-cycle, two $2$-cycles, and one $3$-cycle.

For a species $F$ and a permutation $\sigma \in \mathcal{S}_n$, let $\mathrm{fix}\ F[\sigma]$ denote the number of $F$-structures that are fixed by the action of $\sigma$, that is,

$\displaystyle \mathrm{fix}\ F[\sigma] = \#\{ f \in F[n] \mid F[\sigma] f = f \}.$

The cycle index series of a combinatorial species $F$ is a formal power series in an infinite set of variables $x_1, x_2, x_3, \dots$ defined by

$\displaystyle Z_F(x_1, x_2, x_3, \dots) = \sum_{n \geq 0} \frac{1}{n!} \sum_{\sigma \in \mathcal{S}_n} \mathrm{fix}\ F[\sigma] x_1^{\sigma_1} x_2^{\sigma_2} x_3^{\sigma_3} \dots$

We also sometimes write $x^\sigma$ as an abbreviation for $x_1^{\sigma_1} x_2^{\sigma_2} x_3^{\sigma_3} \dots$. As a simple example, consider the species of lists, i.e. linear orderings. For each $n$, the identity permutation (with cycle type $n,0,0,\dots$) fixes all $n!$ lists of length $n$, whereas all other permutations do not fix any lists. Therefore

$\displaystyle Z_L(x_1, x_2, x_3, \dots) = \sum_{n \geq 0} \frac{1}{n!} n! x_1^n = \sum_{n \geq 0} x_1^n = \frac{1}{1 - x_1}.$

(This is not really that great of an example, though—since lists are regular species, that is, they have no nontrivial symmetry, their cycle index series, egf, and ogf are all essentially the same.)

Cycle index series are linked to both egfs and ogfs by the identities

$\displaystyle \begin{array}{rcl}F(x) &=& Z_F(x,0,0,\dots) \\ \tilde F(x) &=& Z_F(x,x^2,x^3, \dots)\end{array}$

To show the first, note that setting all $x_i$ to $0$ other than $x_1$ means that the only terms that survive are terms with only $x_1$ raised to some power. These correspond to permutations with only $1$-cycles, that is, identity permutations. Identity permutations fix all $F$-structures of a given size, so we have

$\begin{array}{rcl} Z_F(x,0,0,\dots) &=& \displaystyle \sum_{n \geq 0} \frac{1}{n!} \mathrm{fix}\ F[\mathit{id}] x^n \\ &=& \displaystyle \sum_{n \geq 0} |F[n]| \frac{x^n}{n!}. \end{array}$

To prove the link to ogfs, note first that for any permutation $\sigma \in \mathcal{S}_n$ with cycle type $\sigma_1,\sigma_2,\sigma_3,\dots$ we have $\sigma_1 + 2\sigma_2 + 3\sigma_3 + \dots = n$. Thus:

$\begin{array}{rcl} Z_F(x,x^2,x^3,\dots) &=& \displaystyle \sum_{n \geq 0} \frac{1}{n!} \sum_{\sigma \in \mathcal{S}_n} \mathrm{fix}\ F[\sigma] x^{\sigma_1}x^{2\sigma_2}x^{3\sigma_3}\dots \\ &=& \displaystyle \sum_{n \geq 0} \frac{1}{n!} \sum_{\sigma \in \mathcal{S}_n} \mathrm{fix}\ F[\sigma] x^n \\ &=& \displaystyle \sum_{n \geq 0} |F[n]/\mathord{\sim}| x^n \end{array}$

where the final step is an application of Burnside’s Lemma.

The important point is that the mapping from species to cycle index series is again a homomorphism for many of the operations we care about, including Cartesian product and composition. So in order to compute an ogf for some species defined in terms of operations that are not compatible with ogfs, one can start out computing with cycle index series and then project down to an ogf at the end.

## Cycle index series for bracelets

Let’s now see how to work out the cycle index series for the species of bracelets. For $n=1$, the single bracelet is fixed by the only element of $\mathcal{S}_1$, giving a term of $x_1$. For $n=2$, the single bracelet is fixed by both elements of $\mathcal{S}_2$, one of which has cycle type $2,0,0,\dots$ and the other $0,1,0,\dots$. Bracelets of size $n \geq 3$, as discussed previously, have the dihedral group $D_{2n}$ as their symmetry group. That is, every one of the $(n-1)!/2$ size-$n$ bracelets is fixed by the action of each element of $D_{2n}$, and no bracelets are fixed by the action of any other permutation. Putting this all together, we obtain

$\begin{array}{rcl} Z_B(x_1, x_2, x_3, \dots) &=& \displaystyle \sum_{n \geq 0} \frac{1}{n!} \sum_{\sigma \in \mathcal{S}_n} \mathrm{fix}\ B[\sigma] x_1^{\sigma_1}x_2^{\sigma_2}x_3^{\sigma_3}\dots \\ &=& \displaystyle x_1 + \frac{1}{2}(x_1^2 + x_2) + \sum_{n \geq 3} \frac{1}{n!} \sum_{\sigma \in D_{2n}} \frac{(n-1)!}{2} x^\sigma \\ &=& \displaystyle x_1 + \frac{1}{2}(x_1^2 + x_2) + \sum_{n \geq 3} \frac{1}{2n} \sum_{\sigma \in D_{2n}} x^\sigma. \end{array}$

Our remaining task is thus to compute $\sum_{\sigma \in D_{2n}} x^\sigma$, that is, to compute the cycle types of elements of $D_{2n}$ for $n \geq 3$. I don’t know whether there’s a nice closed form for $Z_B$, but for our purposes it doesn’t matter: it suffices to come up with a finite algorithm to generate all its terms with their coefficients. A closed form might be important if we want to compute with $Z_B$ symbolically, but if we just want to generate coefficients, an algorithm is good enough.

In general, $D_{2n}$ has $n$ elements corresponding to rotations (including the identity element, which we think of as a rotation by $0$ degrees) and $n$ elements corresponding to reflections across some axis. Below I’ve drawn illustrations showing the symmetries of bracelets of size $5$ and $6$; each symmetry corresponds to an element of $D_{2n}$.

The lines indicate reflections. You can see that in general there are $n$ lines of reflection. The curved arrows indicate clockwise rotations; taking any number of consecutive arrows from $0$ to $n-1$ gives a distinct rotational symmetry. Let’s label the rotations $R_k$ (for $k \in \{0, \dots, n-1\}$), where $R_k$ indicates a rotation by $k/n$ of a turn (so $R_0$ is the identity element). We won’t bother labelling the reflections since it’s not clear how we would choose canonical names for them, and in any case (as we’ll see) we don’t have as much of a need to give them names as we do for the rotations. The only thing we will note is that for even $n$ there are two distinct types of reflections, as illustrated by the dark and light blue lines on the right: the dark blue lines pass through two vertices, and the light blue ones pass through two edges. In the odd case, on the other hand, every line of reflection passes through one vertex and one edge. If you haven’t studied dihedral groups before, you might want to take a minute to convince yourself that this covers all the possible symmetries. It’s clear that a rotation followed by a rotation is again a rotation; what may be less intuitively clear is that a reflection followed by a reflection is a rotation, and that a rotation followed by a reflection is a reflection.

So the name of the game is to consider each group element as a permutation of the labels, and compute the cycle type of the permutation. Let’s tackle the reflections first; we have to separately consider the cases when $n$ is odd and even. We saw above that when $n = 2m+1$ is odd, each line of reflection passes through exactly one vertex. As a permutation, that means the reflection will fix the label at the vertex it passes through, and swap the labels on other vertices in pairs, as shown in the leftmost diagram below:

So the permutation has cycle type $1,m,0,0,\dots$. There is one $1$-cycle, and the remaining $n-1 = 2m$ elements are paired off in $2$-cycles. There are $n$ of these reflections in total, yielding a term of $nx_1x_2^m$ (where $m = \lfloor n/2 \rfloor$).

When $n = 2m$ is even, half of the reflections (the light blue ones) have no fixed points, as in the middle diagram above; they put everything in $2$-cycles. The other half of the even reflections fix two vertices, with the rest in $2$-cycles, as in the rightmost diagram above. In all, this yields terms $mx_1^2x_2^{m-1} + mx_2^m$.

Now let’s tackle the rotations. One could be forgiven for initially assuming that each rotation will just yield one big $n$-cycle… a rotation is just cycling the vertices, right? But it is a bit more subtle than that. Let’s look at some examples. In each example below, the green curved arrow indicates a rotation $R_k$ applied to the bracelet. As you can check, the other arrows show the resulting permutation on the labels, that is, each arrow points from one node to the node where it ends up under the action of the rotation.

Do you see the pattern? In the case when $k=1$ (the first example above), or more generally when $n$ and $k$ are relatively prime (the second example above, with $n=5$ and $k=2$), $R_k$ indeed generates a single $n$-cycle. But when $n$ and $k$ are not relatively prime, it generates multiple cycles. By symmetry the cycles must all be the same size; in general, the rotation $R_k$ generates $(n,k)$ cycles of size $n/(n,k)$ (where $(n,k)$ denotes the greatest common divisor of $n$ and $k$). So, for example, $2$ cycles are generated when $n=6$ and $k=2$ or $k=4$ (the next two examples above). The last example shows $n=12$ and $k=3$; we can see that three $4$-cycles are generated. Note this even works when $k=0$: we have $(n,0) = n$, so we get $n$ cycles of size $n/n = 1$, i.e. the identity permutation.

So $R_k$ contributes a term $x_{n/(n,k)}^{(n,k)}$. However, we can say something a bit more concise than this. Note, for example, when $n=12$, as the contribution of all the $R_k$ we get

$x_1^{12} + x_{12} + x_6^2 + x_4^3 + x_3^4 + x_{12} + x_2^6 + x_{12} + x_3^4 + x_4^3 + x_6^2 + x_{12}$

but we can collect like terms to get

$x_1^{12} + x_2^6 + 2x_3^4 + 2x_4^3 + 2x_6^2 + 4x_{12}$

For a given divisor $d \mid n$, the coefficient of $x_{n/d}^d$ is the number of nonnegative integers less than $n$ whose $\gcd$ with $n$ is equal to $d$. For example, the coefficient of $x_{6}^2$ is $2$, since there are two values of $k$ for which $(n,k) = 12/6 = 2$ and hence generate a six-cycle, namely, $2$ and $10$. So as the contribution of the $R_k$ we could write something like

$\displaystyle \sum_{d \mid n} \left( \#\{ 1 \leq k \leq n \mid (k,n) = d \} \right) x_{n/d}^d$

but there is a better way. Note that

$\displaystyle \#\{ 1 \leq k \leq n \mid (k,n) = d \} = \#\{ 1 \leq j \leq n/d \mid (j,n/d) = 1 \}$

since multiplying and dividing by $d$ establishes a bijection between the two sets. For example, we saw that $2$ and $10$ are the two numbers whose $\gcd$ with $12$ is $2$; this corresponds to the fact that $2/2 = 1$ and $10/2 = 5$ are relatively prime to $12/2 = 6$.

But counting relatively prime numbers is precisely what Euler’s totient function (usually written $\varphi$) does. So we can rewrite the coefficient of $x_{n/d}^d$ as

$\displaystyle \varphi(n/d) x_{n/d}^d$.

Finally, since we are adding up these terms for all divisors $d \mid n$, we can swap $d$ and $n/d$ (divisors of $n$ always come in pairs whose product is $n$), and rewrite this as

$\displaystyle \varphi(d) x_d^{n/d}$.

To sum up, then, we have for each $n \geq 3$:

1. When $n$ is odd, $n x_1 x_2^{\lfloor n/2 \rfloor}$.
2. When $n$ is even, $(n/2) x_1^2 x_2^{n/2 - 1} + (n/2)x_2^{n/2}$.
3. For each $d \mid n$, we have $\varphi(d) x_d^{n/d}$.

The only overlap is between (2) and (3): when $d = 2$ both generate $x_2^{n/2}$ terms. Using Iverson brackets (the notation $[P]$ is equal to $1$ if the predicate $P$ is true, and $0$ if it is false), we can thus write the sum of the above for a particular $n$ as

$\displaystyle [n\text{ odd}](n x_1 x_2^{\lfloor n/2 \rfloor}) + [n\text{ even}]\left(\frac n 2 x_1^2 x_2^{n/2 - 1}\right) + \sum_{d \mid n} \left(\varphi(d) + [d = 2]\frac n 2\right) x_d^{n/d}$.

Substituting this for $\displaystyle \sum_{\sigma \in D_{2n}} x^\sigma$ yields a full definition of $Z_B$. You can see the result encoded in the species library here. Here’s the beginning of the full expanded series:

ghci> :m +Math.Combinatorics.Species.Types
ghci> take 107 $show (bracelets :: CycleIndex) "CI x1 + 1 % 2 x2 + 1 % 2 x1^2 + 1 % 3 x3 + 1 % 2 x1 x2 + 1 % 6 x1^3 + 1 % 4 x4 + 3 % 8 x2^2 + 1 % 4 x1^2 x2"  This, then, is how unlabelled biBracelets (for example) is calculated, where biBracelets = bracelet >< (set * set). The cycle index series for bracelet and set are combined according to the operations on cycle index series corresponding to * and ><, and then the resulting cycle index series is mapped down to an ogf by substituting $x^k$ for each $x_k$. ## Bracelet generation The final thing to mention is how bracelet generation works. Of course we can’t really generate actual bracelets, but only lists. Since bracelets can be thought of as equivalence classes of lists (under rotation and reversal), the idea is to pick a canonical representative element of each equivalence class, and generate those. A natural candidate is the lexicographically smallest among all rotations and reversals (assuming the labels have an ordering; if they don’t we can pick an ordering arbitrarily). One easy solution would be to generate all possible lists and throw out the redundant ones, but that would be rather inefficient. It is surprisingly tricky to do this efficiently. Fortunately, there is a series of papers by Joe Sawada (Generating bracelets with fixed content; A fast algorithm to generate necklaces with fixed content; Generating bracelets in constant amortized time) describing (and proving correct) some efficient algorithms for generating things like cycles and bracelets. In fact, they are as efficient as possible, theoretically speaking: they do only $O(1)$ work per cycle or bracelet generated. One problem is that the algorithms are very imperative, so they cannot be directly transcribed into Haskell. But I played around with benchmarking various formulations in Haskell and got it as fast as I could. (Interestingly, using STUArray was a lot slower in practice than a simple functional implementation, even though the imperative solution is asymptotically faster in theory—my functional implementation is at least $O(\lg n)$ per bracelet, and quite possibly $O(n)$, though since $n$ is typically quite small it doesn’t really matter very much. Of course it’s also quite possible that there are tricks to make the array version go faster that I don’t know about.) The result is released in the multiset-comb package; you can see the bracelet generation code here. Posted in combinatorics, haskell, math, species | | 5 Comments ## Ally Skills Tutorial at ICFP I just signed up for the Ally Skills Tutorial at ICFP, and if you are (1) a man who (2) will be at ICFP in Vancouver, I encourage you to sign up, too! From the website: The Ally Skills Tutorial teaches men simple, everyday ways to support women in their workplaces and communities. Participants learn techniques that work at the office, in classrooms, at conferences, and online. The skills we teach are relevant everywhere, including skills particularly relevant to open technology and culture communities. At the end of the tutorial, participants will feel more confident in speaking up to support women, be more aware of the challenges facing women in their workplaces and communities, and have closer relationships with the other participants. This sounds super helpful—I suspect there is often a large gap between the extent to which I want to support women and the extent to which I actually know, practically, how to do so. The workshop will be taught by Valerie Aurora, Linux filesystem developer and Ada Initiative co-founder; I expect it will be high quality! Posted in haskell | Tagged , , , , | 2 Comments ## Crystal Ball Connection Patterns via Species and Generating Functions A couple weeks ago, Denise posted Puzzle: Crystal Ball Connection Patterns on her blog, Let’s Play Math. I had fun playing with it and thought I would demonstrate how to apply some high-powered combinatorial techniques to it (probably not what Denise had in mind!). The setup is that there are $n$ (distinct) friends who can talk to each other on the phone. Only two people can talk at a time (no conference calls). The question is to determine how many different “configurations” there are. Not everyone has to talk, so a configuration consists of some subset of the friends arranged in (unordered) conversational pairs. Warning: spoilers ahead! If you’d like to play around with this yourself (and it is indeed a nice, accessible combinatorics problem to play with), stop reading now. My goal in this post is to have fun applying some advanced tools to this (relatively) simple problem. # Telephone numbers Let’s start by visualizing some configurations. In her post, Denise illustrated the complete set of configurations for $n = 4$, which I will visualize like this: Notice how I’ve arranged them: in the first row is the unique configuration where no one is talking (yes, that counts). In the second row are the six possible configurations with just a single conversation. The last row has the three possible configurations with two conversations. One good approach at this point would be to derive some recurrences. This problem does indeed admit a nice recurrence, but I will let you ponder it. Instead, let’s see if we can just “brute-force” our way to a general formula, using our combinatorial wits. Later I will demonstrate a much more principled, mechanical way to derive a general formula. Let’s start by coming up with a formula for $T_{n,k}$, the number of configurations with $n$ people and $k$ conversations. The number of ways of choosing $k$ pairs out of a total of $n$ is the multinomial coefficient $\displaystyle \binom{n}{2,2,\dots,2,n-2k} = \frac{n!}{(2!)^k(n-2k)!}$. However, that overcounts things: it actually distinguishes the first pair, second pair, and so on, but we don’t want to have any ordering on the pairs. So we have to divide by $k!$, the number of distinct orderings of the pairs. Thus, $\displaystyle T_{n,k} = \frac{n!}{2^k (n-2k)! k!}.$ Let’s do a few sanity checks. First, when $k=0$, we have $T_{n,0} = \frac{n!}{n!} = 1$. We can also try some other small numbers we’ve already enumerated by hand: for example, $T_{4,1} = \frac{4!}{2 \cdot 2 \cdot 1} = 6$, and $T_{4,2} = \frac{4!}{4 \cdot 1 \cdot 2} = 3$. So this seems to work. For $n$ people, there can be at most $\lfloor n/2 \rfloor$ conversations. So, the total number of configurations is going to be $\displaystyle T_n = \sum_{k=0}^{\lfloor n/2 \rfloor} T_{n,k}$. We can use this to compute $T_n$ for the first few values of $n$: $\begin{array}{rcl}T_0 &=& 1\\T_1 &=& 1 \\ T_2 &=& 1 + 1 = 2 \\ T_3 &=& 1 + 3!/2 = 4 \\ T_4 &=& 1 + 6 + 3 = 10 \\ T_5 &=& 1 + 5!/(2 \cdot 3!) + 5!/(4 \cdot 2) = 1 + 10 + 15 = 26 \\ T_6 &=& 1 + 6!/(2 \cdot 4!) + 6!/(4 \cdot 2 \cdot 2) + 6!/(8 \cdot 3!) = 1 + 15 + 45 + 15 = 76 \end{array}$ At this point we could look up the sequence 1,1,2,4,10,26,76 on the OEIS and find out all sorts of fun things: e.g. that we are also counting self-inverse permutations, i.e. involutions, that these numbers are also called “restricted Stirling numbers of the second kind”, some recurrence relations, etc., as well as enough references to keep us busy reading for a whole year. # Species We can describe configurations as elements of the combinatorial species $C = E \circ (E_2 + X)$. That is, a configuration is an unordered set ($E$) of ($\circ$) things ($E_2 + X$), where each thing can either be an unordered pair ($E_2$) of people talking on the phone, or ($+$) a single person ($X$) who is not talking. We can now use the Haskell species library to automatically generate some counts and see whether they agree with our manual enumerations. First, some boilerplate setup: ghci> :set -XNoImplicitPrelude ghci> :m +NumericPrelude ghci> :m +Math.Combinatorics.Species Now we define the species of configurations: ghci> let configurations = set o (set ofSizeExactly 2 + singleton) We can ask the library to count the number of configurations for different $n$: ghci> take 10 (labelled configurations) [1,1,2,4,10,26,76,232,764,2620]  Oh good, those numbers look familiar! Now, I wonder how many configurations there are for $n = 100$? ghci> labelled configurations !! 100 24053347438333478953622433243028232812964119825419485684849162710512551427284402176  Yikes! We can also use the library to generate exhaustive lists of configurations, and draw them using diagrams. For example, here are all $76$ configurations for $n=6$. (If you want to see the code used to generate this diagram, you can find it here.) And just for fun, let’s draw all $232$ configurations for $n = 7$: Whee! # A general formula, via generating functions Finally, I want to show how to use the species definition given above and the theory of generating functions to (somewhat) mechanically derive a general formula for the number of configurations. (Hopefully it will end up being equivalent to the formula we came up with near the beginning of the post!) Of course, this is also what the species library is doing, but only numerically—we will do things symbolically. First, note that we are counting labelled configurations (the friends are all distinct), so we want to consider exponential generating functions (egfs). Recall that the egf for a species $F$ is given by $\displaystyle F(x) = \sum_{n \geq 0} |F[n]| \frac{x^n}{n!}$, that is, a (possibly infinite) formal power series where the coefficient of $x^n/n!$ is the number of distinct labelled $F$-structures of size $n$. In our case, we need $\displaystyle E(x) = \sum_{n \geq 0} 1 \cdot \frac{x^n}{n!} = e^x$, since there is exactly one set structure of any size, and $\displaystyle E_2(x) = \frac{x^2}{2}$, which is just the restriction of $E(x)$ to only the $x^2$ term. Of course, we also have $X(x) = x$. Putting this together, we calculate $\begin{array}{rcl}\displaystyle (E \circ (E_2 + X))(x) &=& e^{(x^2/2 + x)} \\[1em] &=& \displaystyle \sum_{n \geq 0} \frac{(x^2/2 + x)^n}{n!} \\[1em] &=& \displaystyle \sum_{n \geq 0} \sum_{k = 0}^n \frac{1}{n!} \binom{n}{k} \left(\frac{x^2}{2}\right)^k x^{n-k} \\[1em] &=& \displaystyle \sum_{n \geq 0} \sum_{k=0}^n \frac{1}{n!} \binom{n}{k} \frac{x^{n+k}}{2^k} \end{array}$ Ultimately, we want something of the form $\displaystyle \sum_{m \geq 0} f_m \frac{x^m}{m!}$, so we’ll need to collect up like powers of $x$. To do that, we can do a bit of reindexing. Right now, the double sum is adding up a bunch of terms that can be thought of as making a triangle: Each ordered pair in the triangle corresponds to a single term being added. Each column corresponds to a particular value of $n$, with $n$ increasing to the right. Within each column, $k$ goes from $0$ up to $n$. The powers of $x$ in our double sum are given by $n+k$. If we draw in lines showing terms that have the same power of $x$, it looks like this: So let’s choose a new variable $m$, defined by $m = n + k$. We can see that we will have terms for every $m \geq 0$. We will also keep the variable $k$ for our other index, and substitute $n = m - k$ to get rid of $n$. In other words, instead of adding up the triangle by columns, we are going to add it up by diagonals. Previously we had $k \leq n$; substituting for $n$ that now turns into $k \leq m - k$. Adding $k$ to both sides and dividing by $2$ yields $k \leq \lfloor m/2 \rfloor$ (we can round down since $k$ is an integer). Looking at the diagram above, this makes sense: the height of each diagonal line is indeed half its index. Rewriting our indices of summation and substituting $m - k$ for $n$, we now have: $\begin{array}{rcl}\displaystyle \sum_{n \geq 0} \sum_{k=0}^n \frac{1}{n!} \binom{n}{k} \frac{x^{n+k}}{2^k} &=& \displaystyle \sum_{m \geq 0} \sum_{k=0}^{\lfloor m/2 \rfloor} \frac{1}{(m-k)!} \binom{m-k}{k} \frac{x^m}{2^k} \\[1em] &=& \displaystyle \sum_{m \geq 0} \sum_{k=0}^{\lfloor m/2 \rfloor} \frac{1}{k!(m-2k)!} \frac{x^m}{2^k} \\[1em] &=& \displaystyle \sum_{m \geq 0} \frac{x^m}{m!} \sum_{k=0}^{\lfloor m/2 \rfloor} \frac{m!}{k!(m-2k)!2^k} \end{array}$ And hey, look at that! The coefficient of $x^m/m!$ is exactly what we previously came up with for $T_m$. Math works! Posted in combinatorics, haskell, math, species | | 1 Comment ## BlogLiterately 0.8.1, now with HTTPS! I’ve just released version 0.8.1 of BlogLiterately, a tool for formatting and posting stuff (especially Haskelly stuff) to blogs. This is in conjunction with the release of haxr-3000.11. After much blood, sweat, and tears, I was able to rip the HTTP package out of the guts of haxr, replace it with http-streams, and carefully sew everything back together around the edges. The result is that haxr now finally supports making XML-RPC calls via HTTPS, which in turn means that BlogLiterately once again works with WordPress, which no longer supports XML-RPC over HTTP. Happy blogging! Posted in haskell | Tagged , , , , | 1 Comment ## Thoughts on grade inflation, part I: is grade inflation bad? Grade inflation. It’s terrible, horrible, no good, very bad, and ruining everything. …right? Well… I’m not so sure. What I do know is that the typical conversation around grade inflation frustrates me. At best, it often leaves many important assumptions unstated and unquestioned. Is grade inflation really bad? If so, why? What are the underlying assumptions and values that drive us to think of it in one way or another? At worst, the conversation is completely at the wrong level. Grade inflation is actually a symptom pointing at a much deeper question, one that gets at the heart of education and pedagogy: what do grades mean? Or, put another way, what do grades measure? This will be a two-part series. In this first post, I consider the first question: is grade inflation bad? In most conversations I have been a part of, this is taken as given, but I think it deserves more careful thought. I don’t know of any reasons to think that grade inflation is good, but I also don’t buy many of the common arguments (often implied rather than explicitly stated) as to why it is bad; in this post I consider three common ones. ## What is grade inflation? Just to make sure everyone is on the same page: by grade inflation I mean the phenomenon where average student grades are increasing over time, that is, the average student now receives higher grades than the average student of n years ago. (You could also think of it as the average value of a given grade going down over time.) This phenomenon is widespread in the US. I am only really familiar with the educational system in the US, so this post will of necessity be rather US-centric; I would be interested to hear about similarities and differences with other countries. Let’s now consider some common arguments as to why grade inflation is bad. ## The “Back in my day…” argument This is not so much an “argument” as an attitude, and it goes something like this: “Back in MY day, a C really meant a C! These ungrateful, entitled young whippersnappers don’t understand the true value of grades…” This is a caricature, of course, but I have definitely encountered variants of this attitude. This makes about as much sense to me as complaining “back in MY day, a dollar was really worth a dollar! And now my daughter is asking me for twenty dollars to go to a movie with her friends. TWENTY DOLLARS! These ungrateful, entitled young whippersnappers don’t understand the true value of money…” Nonsense, of course they do. It just costs$20 to go to a movie these days. A dollar is worth what it is now worth; a C is worth what it is now worth. Get over it.  It’s not that students don’t understand “the true value of grades”, it’s just that the value of grades is different than it used to be.

If I may hazard a guess, I think what this often boils down to is that people blame grade inflation on student attitudes of entitlement. As a potential contributing factor to grade inflation (and insofar as we would like to teach students different attitudes), that is certainly worth thinking about. But grade inflation potentially being caused by something one dislikes is not an argument that grade inflation itself is bad.

## The compression argument

Of course, there’s one important difference between money and grades: amounts of money have no upper limit, whereas grades are capped at A+.  This brings us to what I often hear put forth as the biggest argument against grade inflation, that it compresses grades into a narrower and narrower band, squeezed from above by that highest possible A+.  The problem with this, some argue, is that grade compression causes information to be lost.  The “signal” of grades becomes noisier, and it becomes harder for, say, employers and grad schools to be able to distinguish between different students.

My first, more cynical reaction is this: well, cry me a river for those poor, poor employers and grad schools, who will now have to assess students on real accomplishments, skills, and personal qualities, or (more likely) find some other arbitrary measurement to use. Do we really think grades are such a high-quality signal in the first place? Do they really measure something important and intrinsic about a student? (More on this in my next post.)  If the signal is noisy or arbitrary in the first place then compressing it really doesn’t matter that much.

Less cynically, let’s suppose the grade-signal really is that high-quality and important, and we are actually worried about the possibility of losing information. Consider the extreme situation, where grade inflation has progressed to such a degree that professors only give one of two possible grades: A (“outstandingly excellent”) or A+ (“superlatively superb”). An A- is so insultingly low that professors never give it (for fear of lawsuits, perhaps); for simplicity’s sake let’s suppose that no one ever fails, either. In this hypothetical scenario, at an institution like Williams where students take 32 courses, there are only 33 possible GPAs: you could get 32 A+’s, or one A and 31 A+’s, or two A’s and 30 A+’s… all the way down to getting all A’s (“straight-A student” means something rather different in this imaginary universe!).

But here’s the thing: I think 33 different GPAs would still be enough! I honestly don’t think companies or grad schools can meaningfully care about distinctions finer than having 33 different buckets of students. (If you think differently, I’d love to hear your argument.) If student GPAs are normally distributed, this even means that the top few buckets have much less than 1/33 of all the students. So if the top grad schools and companies want to only consider the top 1% of all students (or whatever), they can just look at the top bucket or two. You might say this is unfair for the students, but really, I can’t see how this would be any more or less fair than the current system.

Of course, under this hypothetical two-grade system, GPAs might not be normally distributed. For one thing, if grade inflation kept going, the distribution might become more and more skewed to the right, until, for example, half of all students were getting straight A+’s, or, in the theoretical limit, all students get only A+’s. But I really don’t think this would actually happen; I think you would see some regulating effects kick in far before this theoretical limit was reached. Professors would not actually be willing to give all A+’s (or even, for that matter, all A’s and A+’s).

The GPAs could also be very bimodal, if, for example, students are extremely consistent: a student who consistently scores in the top 40% of every class would get the same grades (all A+’s) as a student who consistently scores in the top 10%. However, I doubt this is how it would work (as any professor knows, “consistent” and “student” are a rare pairing). It would be interesting to actually work out what GPA distributions would result from various assumptions about student behavior.

## The moving target argument

The final argument against grade inflation that I sometimes hear goes like this: the problem is not so much that the average GPA is going up but simply that it is moving at all, which makes it harder for grad schools and employers to know how to calibrate their interpretations. But I don’t really buy this one either. The value of money is moving too, and yes, in some grand sense I suppose that makes it slightly harder for people to figure out how much things are worth. But somehow, everyone seems to manage just fine. I think employers and grad schools will manage just fine too. I don’t think GPAs are changing anywhere near fast enough for it to make much difference. And in any case, most of the time, the only thing employers and grad schools really care about is comparing the GPAs of students who graduated around the same time, in which case the absolute average GPA doesn’t matter at all. (One can make an argument about the difficulties caused by different schools having different average GPAs, but that is always going to be an issue, grade inflation or no.)

In the end, then, I am not so sure that grade inflation per se is such a terrible thing. However, it is well worth pondering the causes of grade inflation, and the deeper questions it leads to: what are grades? Why do we give them? What purposes do they serve, and what do they measure? I’ll take up these questions in a subsequent post.

Posted in teaching | Tagged , , , | 4 Comments

Summary: given an injective function $A \times N \hookrightarrow B \times N$, it is possible to constructively “divide by $N$” to obtain an injection $A \hookrightarrow B$, as shown recently by Peter Doyle and Cecil Qiu and expounded by Richard Schwartz. Their algorithm is nontrivial to come up with—this had been a longstanding open question—but it’s not too difficult to explain. I exhibit some Haskell code implementing the algorithm, and show some examples.

# Introduction: division by two

Suppose someone hands you the following:

• A Haskell function f :: (A, Bool) -> (B, Bool), where A and B are abstract types (i.e. their constructors are not exported, and you have no other functions whose types mention A or B).

• A promise that the function f is injective, that is, no two values of (A, Bool) map to the same (B, Bool) value. (Thus (B, Bool) must contain at least as many inhabitants as (A, Bool).)

• A list as :: [A], with a promise that it contains every value of type A exactly once, at a finite position.

Can you explicitly produce an injective function f' :: A -> B? Moreover, your answer should not depend on the order of elements in as.

It really seems like this ought to be possible. After all, if (B, Bool) has at least as many inhabitants as (A, Bool), then surely B must have at least as many inhabitants as A. But it is not enough to reason merely that some injection must exist; we have to actually construct one. This, it turns out, is tricky. As a first attempt, we might try f' a = fst (f (a, True)). That is certainly a function of type A -> B, but there is no guarantee that it is injective. There could be a1, a2 :: A which both map to the same b, that is, one maps to (b, False) and the other to (b, True). The picture below illustrates such a situation: (a1, True) and (a2, True) both map to b2. So the function f may be injective overall, but we can’t say much about f restricted to a particular Bool value.

The requirement that the answer not depend on the order of as also makes things difficult. (Over in math-land, depending on a particular ordering of the elements in as would amount to the well-ordering principle, which is equivalent to the axiom of choice, which in turn implies the law of excluded middle—and as we all know, every time someone uses the law of excluded middle, a puppy dies. …I feel like I’m in one of those DirecTV commercials. “Don’t let a puppy die. Ignore the order of elements in as.”) Anyway, making use of the order of values in as, we could do something like the following:

• For each a :: A:
• Look at the B values generated by f (a,True) and f (a,False). (Note that there might only be one distinct such B value).
• If neither B value has been used so far, pick the one that corresponds to (a,True), and add the other one to a queue of available B values.
• If one is used and one unused, pick the unused one.
• If both are used, pick the next available B value from the queue.

It is not too hard I couldn’t be bothered to show that this will always successfully result in a total function A -> B, which is injective by construction. (One has to show that there will always be an available B value in the queue when you need it.) The only problem is that the particular function we get depends on the order in which we iterate through the A values. The above example illustrates this as well: if the A values are listed in the order $[a_1, a_2]$, then we first choose $a_1 \mapsto b_2$, and then $a_2 \mapsto b_3$. If they are listed in the other order, we end up with $a_2 \mapsto b_2$ and $a_1 \mapsto b_1$. Whichever value comes first “steals” $b_2$, and then the other one takes whatever is left. We’d like to avoid this sort of dependence on order. That is, we want a well-defined algorithm which will yield a total, injective function A -> B, which is canonical in the sense that the algorithm yields the same function given any permutation of as.

It is possible—you might enjoy puzzling over this a bit before reading on!

# Division by N

The above example is a somewhat special case. More generally, let $N = \{0, \dots, n-1\}$ denote a canonical finite set of size $n$, and let $A$ and $B$ be arbitrary sets. Then, given an injection $f : A \times N \hookrightarrow B \times N$, is it possible to effectively (that is, without excluded middle or the axiom of choice) compute an injection $A \hookrightarrow B$?

Translating down to the world of numbers representing set cardinalities—natural numbers if $A$ and $B$ are finite, or cardinal numbers in general—this just says that if $an \leq bn$ then $a \leq b$. This statement about numbers is obviously true, so it would be nice if we could say something similar about sets, so that this fact about numbers and inequalities can be seen as just a “shadow” of a more general theorem about sets and injections.

As hinted in the introduction, the interesting part of this problem is really the word “effectively”. Using the Axiom of Choice/Law of Excluded Middle makes the problem a lot easier, but either fails to yield an actual function that we can compute with, instead merely guaranteeing the existence of such a function, or gives us a function that depends on a particular ordering of $A$.

Apparently this has been a longstanding open question, recently answered in the affirmative by Peter Doyle and Cecil Qiu in their paper Division By Four. It’s a really great paper: they give some fascinating historical context for the problem, and explain their algorithm (which is conceptually not all that difficult) using an intuitive analogy to a card game with certain rules. (It is not a “game” in the usual sense of having winners and losers, but really just an algorithm implemented with “players” and “cards”. In fact, you could get some friends together and actually perform this algorithm in parallel (if you have sufficiently nerdy friends).) Richard Schwartz’s companion article is also great fun and easy to follow (you should read it first).

# A Game of Thrones Cards

Here’s a quick introduction to the way Doyle, Qiu, and Schwartz use a card game to formulate their algorithm. (Porting this framework to use “thrones” and “claimants” instead of “spots” and “cards” is left as an exercise to the reader.)

The finite set $N$ is to be thought of as a set of suits. The set $A$ will correspond to a set of players, and $B$ to a set of ranks or values (for example, Ace, 2, 3, …) In that case $B \times N$ corresponds to a deck of cards, each card having a rank and a suit; and we can think of $A \times N$ in terms of each player having in front of them a number of “spots” or “slots”, each labelled by a suit. An injection $A \times N \hookrightarrow B \times N$ is then a particular “deal” where one card has been dealt into each of the spots in front of the players. (There may be some cards left over in the deck, but the fact that the function is total means every spot has a card, and the fact that it is injective is encoded in the common-sense idea that a given card cannot be in two spots at once.) For example, the example function from before:

corresponds to the following deal:

Here each column corresponds to one player’s hand, and the rows correspond to suit spots (with the spade spots on top and the heart spots beneath). We have mapped $\{b_1, b_2, b_3\}$ to the ranks A, 2, 3, and mapped T and F to Spades and Hearts respectively. The spades are also highlighted in green, since later we will want to pay particular attention to what is happening with them. You might want to take a moment to convince yourself that the deal above really does correspond to the example function from before.

Of course, doing everything effectively means we are really talking about computation. Doyle and Qiu do talk a bit about computation, but it’s still pretty abstract, in the sort of way that mathematicians talk about computation, so I thought it would be interesting to actually implement the algorithm in Haskell.

The algorithm “works” for infinite sets, but only (as far as I understand) if you consider some notion of transfinite recursion. It still counts as “effective” in math-land, but over here in programming-land I’d like to stick to (finitely) terminating computations, so we will stick to finite sets $A$ and $B$.

First, some extensions and imports. Nothing too controversial.

> {-# LANGUAGE DataKinds                  #-}
> {-# LANGUAGE GeneralizedNewtypeDeriving #-}
> {-# LANGUAGE KindSignatures             #-}
> {-# LANGUAGE RankNTypes                 #-}
> {-# LANGUAGE ScopedTypeVariables        #-}
> {-# LANGUAGE StandaloneDeriving         #-}
> {-# LANGUAGE TypeOperators              #-}
>
> module PanGalacticDivision where
>
> import           Control.Arrow (second, (&&&), (***))
> import           Data.Char
> import           Data.List     (find, findIndex, transpose)
> import           Data.Maybe
>
> import           Diagrams.Prelude hiding (universe, value)
> import           Diagrams.Backend.Rasterific.CmdLine
> import           Graphics.SVGFonts


We’ll need some standard machinery for type-level natural numbers. Probably all this stuff is in a library somewhere but I couldn’t be bothered to find out. Pointers welcome.

> -- Standard unary natural number type
> data Nat :: * where
>   Z :: Nat
>   Suc :: Nat -> Nat
>
> type One = Suc Z
> type Two = Suc One
> type Three = Suc Two
> type Four = Suc Three
> type Six = Suc (Suc Four)
> type Eight = Suc (Suc Six)
> type Ten = Suc (Suc Eight)
> type Thirteen = Suc (Suc (Suc Ten))
>
> -- Singleton Nat-indexed natural numbers, to connect value-level and
> -- type-level Nats
> data SNat :: Nat -> * where
>   SZ :: SNat Z
>   SS :: Natural n => SNat n -> SNat (Suc n)
>
> -- A class for converting type-level nats to value-level ones
> class Natural n where
>   toSNat :: SNat n
>
> instance Natural Z where
>   toSNat = SZ
>
> instance Natural n => Natural (Suc n) where
>   toSNat = SS toSNat
>
> -- A function for turning explicit nat evidence into implicit
> natty :: SNat n -> (Natural n => r) -> r
> natty SZ r     = r
> natty (SS n) r = natty n r
>
> -- The usual canonical finite type.  Fin n has exactly n
> -- (non-bottom) values.
> data Fin :: Nat -> * where
>   FZ :: Fin (Suc n)
>   FS :: Fin n -> Fin (Suc n)
>
> finToInt :: Fin n -> Int
> finToInt FZ     = 0
> finToInt (FS n) = 1 + finToInt n
>
> deriving instance Eq (Fin n)


# Finiteness

Next, a type class to represent finiteness. For our purposes, a type a is finite if we can explicitly list its elements. For convenience we throw in decidable equality as well, since we will usually need that in conjunction. Of course, we have to be careful: although we can get a list of elements for a finite type, we don’t want to depend on the ordering. We must ensure that the output of the algorithm is independent of the order of elements.1 This is in fact true, although somewhat nontrivial to prove formally; I mention some of the intuitive ideas behind the proof below.

While we are at it, we give Finite instances for Fin n and for products of finite types.

> class Eq a => Finite a where
>   universe :: [a]
>
> instance Natural n => Finite (Fin n) where
>   universe = fins toSNat
>
> fins :: SNat n -> [Fin n]
> fins SZ     = []
> fins (SS n) = FZ : map FS (fins n)
>
> -- The product of two finite types is finite.
> instance (Finite a, Finite b) => Finite (a,b) where
>   universe = [(a,b) | a <- universe, b <- universe]


# Division, inductively

Now we come to the division algorithm proper. The idea is that panGalacticPred turns an injection $A \times N \hookrightarrow B \times N$ into an injection $A \times (N-1) \hookrightarrow B \times (N-1)$, and then we use induction on $N$ to repeatedly apply panGalacticPred until we get an injection $A \times 1 \hookrightarrow B \times 1$.

> panGalacticDivision
>   :: forall a b n. (Finite a, Eq b)
>   => SNat n -> ((a, Fin (Suc n)) -> (b, Fin (Suc n))) -> (a -> b)


In the base case, we are given an injection $A \times 1 \hookrightarrow B \times 1$, so we just pass a unit value in along with the $A$ and project out the $B$.

> panGalacticDivision SZ f = \a -> fst (f (a, FZ))


In the inductive case, we call panGalacticPred and recurse.

> panGalacticDivision (SS n') f = panGalacticDivision n' (panGalacticPred n' f)


# Pan-Galactic Predecessor

And now for the real meat of the algorithm, the panGalacticPred function. The idea is that we swap outputs around until the function has the property that every output of the form $(b,0)$ corresponds to an input also of the form $(a,0)$. That is, using the card game analogy, every spade in play should be in the leftmost spot (the spades spot) of some player’s hand (some spades can also be in the deck). Then simply dropping the leftmost card in everyone’s hand (and all the spades in the deck) yields a game with no spades. That is, we will have an injection $A \times \{1, \dots, n-1\} \hookrightarrow B \times \{1, \dots, n-1\}$. Taking predecessors everywhere (i.e. “hearts are the new spades”) yields the desired injection $A \times (N-1) \hookrightarrow B \times (N-1)$.

We need a Finite constraint on a so that we can enumerate all possible inputs to the function, and an Eq constraint on b so that we can compare functions for extensional equality (we iterate until reaching a fixed point). Note that whether two functions are extensionally equal does not depend on the order in which we enumerate their inputs, so far validating my claim that nothing depends on the order of elements returned by universe.

> panGalacticPred
>   :: (Finite a, Eq b, Natural n)
>   => SNat n
>   -> ((a, Fin (Suc (Suc n))) -> (b, Fin (Suc (Suc n))))
>   -> ((a, Fin (Suc n)) -> (b, Fin (Suc n)))


We construct a function f' which is related to f by a series of swaps, and has the property that it only outputs FZ when given FZ as an input. So given (a,i) we can call f' on (a, FS i) which is guaranteed to give us something of the form (b, FS j). Thus it is safe to strip off the FS and return (b, j) (though the Haskell type checker most certainly does not know this, so we just have to tell it to trust us).

> panGalacticPred n f = \(a,i) -> second unFS (f' (a, FS i))
>   where
>     unFS :: Fin (Suc n) -> Fin n
>     unFS FZ = error "impossible!"
>     unFS (FS i) = i


To construct f' we iterate a certain transformation until reaching a fixed point. For finite sets $A$ and $B$ this is guaranteed to terminate, though it is certainly not obvious from the Haskell code. (Encoding this in Agda so that it is accepted by the termination checker would be a fun (?) exercise.)

One round of the algorithm consists of two phases called “shape up” and “ship out” (to be described shortly).

>     oneRound = natty n $shipOut . shapeUp > > -- iterate 'oneRound' beginning with the original function... > fs = iterate oneRound f > -- ... and stop when we reach a fixed point. > f' = fst . head . dropWhile (uncurry (=/=))$ zip fs (tail fs)
>     f1 =/= f2 = all (\x -> f1 x == f2 x) universe


# Encoding Card Games

Recall that a “card” is a pair of a value and a suit; we think of $B$ as the set of values and $N$ as the set of suits.

> type Card v s = (v, s)
>
> value :: Card v s -> v
> value = fst
>
> suit :: Card v s -> s
> suit = snd


Again, there are a number of players (one for each element of $A$), each of which has a “hand” of cards. A hand has a number of “spots” for cards, each one labelled by a different suit (which may not have any relation to the actual suit of the card in that position).

> type PlayerSpot p s = (p, s)
> type Hand v s = s -> Card v s


A “game” is an injective function from player spots to cards. Of course, the type system is not enforcing injectivity here.

> type Game p v s = PlayerSpot p s -> Card v s


Some utility functions. First, a function to project out the hand of a given player.

> hand :: p -> Game p v s -> Hand v s
> hand p g = \s -> g (p, s)


A function to swap two cards, yielding a bijection on cards.

> swap :: (Eq s, Eq v) => Card v s -> Card v s -> (Card v s -> Card v s)
> swap c1 c2 = f
>   where
>     f c
>       | c == c1   = c2
>       | c == c2   = c1
>       | otherwise = c


leftmost finds the leftmost card in a player’s hand which has a given suit.

> leftmost :: Finite s => s -> Hand v s -> Maybe s
> leftmost targetSuit h = find (\s -> suit (h s) == targetSuit) universe


# Playing Rounds

playRound abstracts out a pattern that is used by both shapeUp and shipOut. The first argument is a function which, given a hand, produces a function on cards; that is, based on looking at a single hand, it decides how to swap some cards around.2 playRound then applies that function to every hand, and composes together all the resulting permutations.

Note that playRound has both Finite s and Finite p constraints, so we should think about whether the result depends on the order of elements returned by any call to universe—I claimed it does not. Finite s corresponds to suits/spots, which corresponds to $N$ in the original problem formulation. $N$ explicitly has a canonical ordering, so this is not a problem. The Finite p constraint, on the face of it, is more problematic. We will have to think carefully about each of the rounds implemented in terms of playRound and make sure they do not depend on the order of players. Put another way, it should be possible for all the players to take their turn simultaneously.

> playRound :: (Finite s, Finite p, Eq v) => (Hand v s -> Card v s -> Card v s) -> Game p v s -> Game p v s
> playRound withHand g = foldr (.) id swaps . g
>   where
>     swaps = map (withHand . flip hand g) players
>     players = universe


# Shape Up and Ship Out

Finally, we can describe the “shape up” and “ship out” phases, beginning with “shape up”. A “bad” card is defined as one having the lowest suit; make sure every hand with any bad cards has one in the leftmost spot (by swapping the leftmost bad card with the card in the leftmost spot, if necessary).

> shapeUp :: (Finite s, Finite p, Eq v) => Game p v s -> Game p v s
> shapeUp = playRound shapeUp1
>   where
>     shapeUp1 theHand =
>       case leftmost badSuit theHand of
>         Nothing      -> id


And now for the “ship out” phase. Send any “bad” cards not in the leftmost spot somewhere else, by swapping with a replacement, namely, the card whose suit is the same as the suit of the spot, and whose value is the same as the value of the bad card in the leftmost spot. The point is that bad cards in the leftmost spot are OK, since we will eventually just ignore the leftmost spot. So we have to keep shipping out bad cards not in the leftmost spot until they all end up in the leftmost spot. For some intuition as to why this is guaranteed to terminate, consult Schwartz; note that columns tend to acquire more and more cards that have the same rank as a spade in the top spot (which never moves).

> shipOut :: (Finite s, Finite p, Eq v) => Game p v s -> Game p v s
> shipOut = playRound shipOutHand
>   where
>     spots = universe
>     shipOutHand theHand = foldr (.) id swaps
>       where
>         swaps = map (shipOut1 . (theHand &&& id)) (drop 1 spots)
>         shipOut1 ((_,s), spot)
>           | s == badSuit = swap (theHand spot) (value (theHand badSuit), spot)
>           | otherwise    = id


And that’s it! Note that both shapeUp and shipOut are implemented by composing a bunch of swaps; in fact, in both cases, all the swaps commute, so the order in which they are composed does not matter. (For proof, see Schwartz.) Thus, the result is independent of the order of the players (i.e. the set A).

Enough code, let’s see an example! This example is taken directly from Doyle and Qiu’s paper, and the diagrams are being generated literally (literately?) by running the code in this blog post. Here’s the starting configuration:

Again, the spades are all highlighted in green. Recall that our goal is to get them all to be in the first row, but we have to do it in a completely deterministic, canonical way. After shaping up, we have:

Notice how the 6, K, 5, A, and 8 of spades have all been swapped to the top of their column. However, there are still spades which are not at the top of their column (in particular the 10, 9, and J) so we are not done yet.

Now, we ship out. For example, the 10 of spades is in the diamonds position in the column with the Ace of spades, so we swap it with the Ace of diamonds. Similarly, we swap the 9 of spades with the Queen of diamonds, and the Jack of spades with the 4 of hearts.

Shaping up does nothing at this point so we ship out again, and then continue to alternate rounds.

In the final deal above, all the spades are at the top of a column, so there is an injection from the set of all non-spade spots to the deck of cards with all spades removed. This example was, I suspect, carefully constructed so that none of the spades get swapped out into the undealt portion of the deck, and so that we end up with only spades in the top row. In general, we might end up with some non-spades also in the top row, but that’s not a problem. The point is that ignoring the top row gets rid of all the spades.

Anyway, I hope to write more about some “practical” examples and about what this has to do with combinatorial species, but this post is long enough already. Doyle and Qiu also describe a “short division” algorithm (the above is “long division”) that I hope to explore as well.

# The rest of the code

For completeness, here’s the code I used to represent the example game above, and to render all the card diagrams (using diagrams 1.3).

> type Suit = Fin
> type Rank = Fin
> type Player = Fin
>
> readRank :: SNat n -> Char -> Rank n
> readRank n c = fins n !! (fromJust $findIndex (==c) "A23456789TJQK") > > readSuit :: SNat n -> Char -> Suit n > readSuit (SS _) 'S' = FZ > readSuit (SS (SS _)) 'H' = FS FZ > readSuit (SS (SS (SS _))) 'D' = FS (FS FZ) > readSuit (SS (SS (SS (SS _)))) 'C' = FS (FS (FS FZ)) > > readGame :: SNat a -> SNat b -> SNat n -> String -> Game (Player a) (Rank b) (Suit n) > readGame a b n str = \(p, s) -> table !! finToInt p !! finToInt s > where > table = transpose . map (map readCard . words) . lines$ str
>
> -- Example game from Doyle & Qiu
> exampleGameStr :: String
> exampleGameStr = unlines
>   [ "4D 6H QD 8D 9H QS 4C AD 6C 4S"
>   , "JH AH 9C 8H AS TC TD 5H QC JS"
>   , "KC 6S 4H 6D TS 9S JC KD 8S 8C"
>   , "5C 5D KS 5S TH JD AC QH 9D KH"
>   ]
>
> exampleGame :: Game (Player Ten) (Rank Thirteen) (Suit Four)
> exampleGame = readGame toSNat toSNat toSNat exampleGameStr
>
> suitSymbol :: Suit n -> String
> suitSymbol = (:[]) . ("♠♥♦♣"!!) . finToInt  -- Huzzah for Unicode
>
> suitDia :: Suit n -> Diagram B
> suitDia = (suitDias!!) . finToInt
>
> suitDias = map mkSuitDia (fins (toSNat :: SNat Four))
> mkSuitDia s = text' (suitSymbol s) # fc (suitColor s) # lw none
>
> suitColor :: Suit n -> Colour Double
> suitColor n
>   | finToInt n elem [0,3] = black
>   | otherwise               = red
>
> rankStr :: Rank n -> String
> rankStr n = rankStr' (finToInt n + 1)
>   where
>     rankStr' 1 = "A"
>     rankStr' i | i <= 10    = show i
>                | otherwise = ["JQK" !! (i - 11)]
>
> text' t = stroke (textSVG' (TextOpts lin INSIDE_H KERN False 1 1) t)
>
> renderCard :: (Rank b, Suit n) -> Diagram B
> renderCard (r, s) = mconcat
>   [ mirror label
>   , cardContent (finToInt r + 1)
>   , back
>   ]
>   where
>     cardWidth  = 2.25
>     cardHeight = 3.5
>     cardCorners = 0.1
>     mirror d = d  d # rotateBy (1/2)
>     back  = roundedRect cardWidth cardHeight cardCorners # fc white
>           # lc (case s of { FZ -> green; _ -> black })
>     label = vsep 0.1 [text' (rankStr r), text' (suitSymbol s)]
>           # scale 0.6 # fc (suitColor s) # lw none
>           # translate ((-0.9) ^& 1.5)
>     cardContent n
>       | n <= 10   = pips n
>       | otherwise = face n # fc (suitColor s) # lw none
>                            # sized (mkWidth (cardWidth * 0.6))
>     pip = suitDia s # scale 1.1
>     pips 1 = pip # scale 2
>     pips 2 = mirror (pip # up 2)
>     pips 3 = pips 2  pip
>     pips 4 = mirror (pair pip # up 2)
>     pips 5 = pips 4  pip
>     pips 6 = mirror (pair pip # up 2)  pair pip
>     pips 7 = pips 6  pip # up 1
>     pips 8 = pips 6  mirror (pip # up 1)
>     pips 9 = mirror (pair (pip # up (2/3)  pip # up 2))  pip # up (case finToInt s of {1 -> -0.1; 3 -> 0; _ -> 0.1})
>     pips 10 = mirror (pair (pip # up (2/3)  pip # up 2)  pip # up (4/3))
>     pips _ = mempty
>     up n = translateY (0.5*n)
>     pair d = hsep 0.4 [d, d] # centerX
>     face 11 = squares # frame 0.1
>     face 12 = loopyStar
>     face 13 = burst # centerXY
>     squares
>       = strokeP (mirror (square 1 # translate (0.2 ^& 0.2)))
>       # fillRule EvenOdd
>     loopyStar
>       = regPoly 7 1
>       # star (StarSkip 3)
>       # pathVertices
>       # map (cubicSpline True)
>       # mconcat
>       # fillRule EvenOdd
>     burst
>       = [(1,5), (1,-5)] # map r2 # fromOffsets
>       # iterateN 13 (rotateBy (-1/13))
>       # mconcat # glueLine
>       # strokeLoop
>
> renderGame :: (Natural n, Natural a) => Game (Player a) (Rank b) (Suit n) -> Diagram B
> renderGame g = hsep 0.5 $map (\p -> renderHand p$ hand p g) universe
>
> renderHand :: Natural n => Player a -> Hand (Rank b) (Suit n) -> Diagram B
> renderHand p h = vsep 0.2 \$ map (renderCard . h) universe


1. If we could program in Homotopy Type Theory, we could make this very formal by using the notion of cardinal-finiteness developed in my dissertation (see section 2.4).

2. In practice this function on cards will always be a permutation, though the Haskell type system is not enforcing that at all. An early version of this code used the Iso type from lens, but it wasn’t really paying its way.