```
> {-# OPTIONS_GHC -fno-warn-missing-methods #-}
> module Virtual where
>
> import Control.Applicative
> import Test.QuickCheck
```

Yesterday on #haskell, augur asked me to explain how subtraction works for combinatorial species. (For an introduction to species, see my paper from the 2010 Haskell Symposium.) For example, given the equation

defining the species (which corresponds to the Haskell type constructor `Maybe`

), our algebraic sensibilities tell us this can be rewritten as

but can we make sense of this combinatorially? Addition has an obvious combinatorial interpretation (disjoint union), and so does multiplication (pairing), but what does subtraction mean? Once we allow subtraction, we have to assign meaning to expressions like `Void - Bool`

, which apparently must be a species with negative two inhabitants. What on earth could that mean!? Combinatorics is all about *counting* things; we can have zero structures, or one structure, or seventy-three structures, but surely we can’t have a negative number of structures, can we?

## Building the Integers

Let’s start with a slight detour to talk about natural numbers and integers. Here’s a standard Haskell definition of the naturals:

```
> data Nat = Z | S Nat
> deriving (Show, Eq, Ord)
```

Note that the auto-derived `Eq`

and `Ord`

instances have exactly the right behavior (in the latter case due to the fact that we happened to list the `Z`

constructor first).

```
*Virtual> S (S Z) == S (S Z)
True
*Virtual> S Z == S (S Z)
False
*Virtual> S (S Z) < S Z
False
```

We can do arithmetic on natural numbers, implemented via a `Num`

instance:

```
> instance Num Nat where
> Z + n = n
> S m + n = S (m + n)
>
> Z * n = Z
> S m * n = n + m * n
>
> (-) = error "Subtraction! What's that?"
>
> fromInteger n | n < 0 = error "No can do."
> fromInteger 0 = Z
> fromInteger n = S (fromInteger (n-1))
>
> negate = error "Huh?"
> abs = id
```

It seems to work:

```
*Virtual> 1 + 3 :: Nat
S (S (S (S Z)))
```

As much as I dislike `Num`

, in this particular instance it is actually useful to illustrate an important point about `Nat`

, namely, that subtraction (and, more fundamentally, negation) are not defined for natural numbers. (Of course, we could implement a *partial* subtraction operation, but we’d like to stick to total functions as much as possible.) We have no way to implement `(-)`

or `negate`

, and we even have problems with `fromInteger`

if the input is negative.

But what if we were crazy enough to actually want negation and subtraction? How should we generalize the natural numbers? Our first instinct might be to define something like

```
> data I' = Pos Nat | Neg Nat
```

but this is annoying for a number of reasons. There are two different representations of zero (and having `Neg Z`

represent `-1`

, and so on, is too hideous to even mention (whoops)). Code to work with values of this type would also be ugly: lots of case analysis on `Pos`

and `Neg`

constructors with very similar code duplicated in each case. And we’re going to have to implement subtraction pretty much from scratch; there’s no good way to reuse the machinery we’ve already built for working with natural numbers.

However, there’s a much better way. No subtraction? No problem! We’ll just treat subtraction as a *new syntactic construct*.

```
> data I = Nat :- Nat
> deriving Show
```

That is, a value of type `I`

is a pair of `Nat`

s, thought of as their difference. Why call it `I`

? Well, observe that any integer, positive or negative, can be represented in this way, as a difference of two natural numbers. For example, `-2`

can be represented by the pair `3 :- 5`

. So this *is* our representation of integers!

Of course, `-2`

can also be represented by `4 :- 6`

, or `0 :- 2`

, or a countably infinite number of other choices. We really want to deal with *equivalence classes* of these pairs, so the first task is to decide on the proper equivalence. When are the pairs `p1 :- n1`

and `p2 :- n2`

equivalent? Well… when the difference between `p1`

and `n1`

is the same as the difference between `p2`

and `n2`

! This is circular, of course, since we are in the process of defining what "difference" means. But a moment’s thought reveals that we can reexpress this using only addition:

```
> instance Eq I where
> p1 :- n1 == p2 :- n2 = p1 + n2 == p2 + n1
```

In fact, our equivalence extends naturally to an ordering as well:

```
> instance Ord I where
> compare (p1 :- n1) (p2 :- n2) = compare (p1 + n2) (p2 + n1)
```

How about arithmetic? To add two integers, we just add their positive and negative parts pointwise; multiplication works by distributing out the product and collecting up the positive and negative parts; negation is just a flip:

```
> instance Num I where
> (p1 :- n1) + (p2 :- n2) = (p1 + p2) :- (n1 + n2)
> (p1 :- n1) * (p2 :- n2) = (p1 * p2 + n1 * n2) :- (p1 * n2 + p2 * n1)
> negate (p :- n) = n :- p
> fromInteger n | n < 0 = Z :- fromInteger (-n)
> | otherwise = fromInteger n :- Z
> abs i | i > 0 = i
> | otherwise = -i
```

Let’s see if it works:

```
*Virtual> (-4) :: I
Z :- S (S (S (S Z)))
*Virtual> 2 + 4 + (-3) :: I
S (S (S (S (S (S Z))))) :- S (S (S Z))
*Virtual> canonicalize it
S (S (S Z)) :- Z
*Virtual> 2 - 5 :: I
S (S Z) :- S (S (S (S (S Z))))
*Virtual> canonicalize it
Z :- S (S (S Z))
```

Seems to! Another important point is that we can inject the naturals into this new setting,

```
> n2i :: Nat -> I
> n2i = (:- 0)
```

and this injection is a *semiring homomorphism*: that is, adding (or multiplying) two `Nat`

s and applying `n2i`

to the result is the same (up to equality on `I`

) as first applying `n2i`

to each and then adding (or multiplying) the resulting `I`

values. But you don’t have to take my word for it:

```
> instance Arbitrary Nat where
> arbitrary = (fromInteger . unNN) `fmap` arbitrary
> where unNN (NonNegative x) = x
>
> instance Arbitrary I where
> arbitrary = (:-) <$> arbitrary <*> arbitrary
>
> prop_n2i_hom_add i j = n2i (i+j) == n2i i + n2i j
> prop_n2i_hom_mul i j = n2i (i*j) == n2i i * n2i j
```

```
*Virtual> quickCheck prop_n2i_hom_add
+++ OK, passed 100 tests.
*Virtual> quickCheck prop_n2i_hom_mul
+++ OK, passed 100 tests.
```

It is also not hard to choose a canonical representative from each equivalence class of `I`

values, namely, the pairs where at least one of the elements is zero. Here’s a function that returns the canonical representative corresponding to any value:

```
> canonicalize :: I -> I
> canonicalize i@(Z :- _) = i
> canonicalize i@(_ :- Z) = i
> canonicalize (S m :- S n) = canonicalize (m :- n)
```

We can use `canonicalize`

to write a (partial) projection from integers back to naturals. To check whether an integer actually corresponds to a natural, just canonicalize it and check whether its negative component is zero:

```
> i2n :: I -> Maybe Nat
> i2n i = case canonicalize i of
> (n :- Z) -> Just n
> _ -> Nothing
```

Although `i2n`

is partial, it is *complete*:

```
> prop_i2n_complete n = i2n (n2i n) == Just n
```

```
*Virtual> quickCheck prop_i2n_complete
+++ OK, passed 100 tests.
```

The result of all this is that we can assign meaning to arbitrary expressions involving natural numbers, addition, multiplication, *and subtraction*, by first injecting all the naturals into our type for integers. What’s more, we have a method for deciding whether the final result is actually the image of a natural number.

```
*Virtual> 7 - 3 :: Nat
*** Exception: Subtraction! What's that?
*Virtual> i2n (7 - 3) :: Maybe Nat
Just (S (S (S (S Z))))
```

## Virtual Species

So what was the point of talking about natural numbers and integers? We were supposed to be talking about combinatorial species, right?

Well, it turns out that we *were* talking about combinatorial species, since they include the natural numbers as a sub-semiring. The point is that the whole construction outlined above can be generalized to work over the entire semiring of species. In particular, we define *virtual species* to be equivalence classes of pairs of species , where is considered "positive" and "negative". and are equivalent if (where denotes species isomorphism). Addition and multiplication for virtual species are defined in exactly the same way as we did for integers.

And what about `canonicalize`

? Does something similar exist for virtual species? The answer, as it turns out, is *yes*, and this is a rather big deal! This blog post is too short to contain the details (they could probably make another post or three), but the key theorem is that every species can be decomposed uniquely into a sum of species of a special type known as *molecular* species. To canonicalize a virtual species we need only decompose both its positive and negative parts, and cancel any molecular species we find on both sides. Unlike integers, canonical virtual species do not necessarily have the zero species on one side or the other (that is, there exist "complex" virtual species which are not purely positive or negative). However, injection of species into the ring of virtual species does of course work by pairing with the zero species, and this injection is a semiring homomorphism as we would expect. Other species operations (such as differentiation and cartesian product) can also be extended to virtual species in a way compatible with this injection.

So to our original species expression we can assign the virtual species . Of course, this is not canonical; unsurprisingly, the molecular decomposition of is , so canceling the gives us the canonical (and strictly positive) species , as expected.

Here, however, is where the similarity to naturals and integers ends. Virtual species give us a lot more additional structure over species than integers do over naturals. For example, integers don’t give us any sort of multiplicative inverses for naturals; for that we need another round of completion to obtain the rationals. Surprisingly, it turns out that virtual species already provide multiplicative inverses for "most" species! This allows us to assign meaning to expressions such as . Perhaps this should be the subject of another blog post…

The approach of defining Nat :- Nat as a new data constructor reminds me a bit of difference lists used in Prolog. Is that perhaps an application of virtual species? (though, it seems a naive encoding in this scheme would just cancel out and leave you with Void, which is certainly not what happens in practice).

I don’t know, I’m not familiar with difference lists used in Prolog.

Difference lists work by subtracting values ([1,2,3] – [3] = [1,2]), not by subtracting types (List – List = Void). But maybe you can do something analogous with _type-level_ difference lists.

It reminds me of the construction of rationals from the integers. Basically it

isthe same construction.It is not quite the same. Note that the rules for addition and multiplication are different. E.g. for integers-as-pairs-of-naturals, (p1,n1) * (p2,n2) = (p1*p2 + n1*n2, p1*n2 + p2*n1), but for rationals-as-pairs-of-integers, (p1,q1)*(p2,q1) = (p1*p1, q1*q2). But you’re right that they are very similar.

No, I really meant they are the same construction. Whereas to construct integers from naturals you are using the + operation, for integers to rationals you use the * operation. The integers are (isomorphic to) equivalence classes of pairs of naturals (a, b) under the equivalence a_1 + b_2 = a_2 + b_1, the rationals are (isomorphic to) equivalence classes of pairs of integers (a, b) under the equivalence a_1 * b_2 = a_2 * b_1. See? the only complication here is that you have to disallow 0 to be in the denominator because you connect these two structures by the distributive law. So the rationals-as-pairs-of-integers (p_1, q_1) * (p_2, q_2) = (p_1 * p_1, q_1 * q_2) has the exact counterpart for integers-as-pairs-of-naturals for the operation * replaced by +,… (p_1, q_1) + (p_2, q_2) = (p_1 + p_2, q_1 + q_2). but I understand that only the imprecision of natural language is responsible for the seeming contradiction. ;)

Oh, I do see now, thanks for explaining! It makes sense in retrospect, since in the case of integers we want to create additive inverses, and in the case of rationals we want to create multiplicative inverses.

Precisely :) I sort of bumped into this while thinking in the shower… didn’t notice half-an-hour pass…

Definitely the same construction. To give a name to it, it’s the Grothendieck contruction, which given an abelian semigroup A, gives its unique universal group G(A); that is, the unique group G(A) and inclusion map as you mentioned such that for any homomorphism (of semigroups) from A to a group B, that homomorphism can be written as the composition of the inclusion map from A to G(A), and some group homomorphism from G(A) to B.

Noting this raises some other questions. The inclusion map from the Grothendieck construction is one-to-one iff the semigroup is cancellative. So before you rush to claim that subtraction is meaningful, one should first ask whether species together with addition satisfy the cancellation property… given species x, y, and z such that x + z = y + z, do we necessarily have x = y?

In case species don’t turn out to be cancellative, you generally want to modify your equivalence relation a bit to the following: (P_1, N_1) ~ (P_2, N_2) iff there exists some species Q such that P_1 + N_2 + Q = P_2 + N_1 + Q.

Amazing stuff. I didn’t know that this construction had a name either.

Yeah I figured they needed to be commutative monoid as well. But what about zero for the multiplicative case? It does not cancel! So we almost have ourselves a group, except for the zero, as is well-lamented. Does that case (from a commutative monoid that is almost cancellative you can build up a thing that is almost a group) have a name as well?

Thanks Chris. Someone else already did point me to the Grothendieck construction (which I hadn’t heard of before) but I appreciate the extra exposition!

Good point re: cancellation. It turns out that species do satisfy additive cancellation: whenever there is a natural isomorphism from F + H to G + H, there is a natural isomorphism from F to G. The proof of this is via (a simplified version of) the Garsia-Milne involution principle (of which there is a nice explanation in Doron Zeilberger’s introduction to enumerative combinatorics in the Princeton Companion to Mathematics).

By the way, just to make it clear, I didn’t come up with any of this, it’s all in Bergeron et al. I’m just trying to explain it in a way more accessible to Haskell hackers.

Here’s a post on Grothendieck group: http://www.reddit.com/r/haskell/comments/99je0

Pingback: Unordered tuples and type algebra | blog :: Brent -> [String]

Pingback: And now, back to your regularly scheduled combinatorial species | blog :: Brent -> [String]