Idempotent Template Haskell

August 16, 2011

Yesterday I was looking for a way to have some Template Haskell code generate a definition for a certain identifier if and only if such a definition did not already exist. Essentially I want to be able to call the code multiple times but have it generate something only once. This turns out to be extremely easy but non-obvious, so I thought I’d write it here for posterity. The trick is that reify throws an error if called on an identifier that doesn’t exist, and recover can be used to catch TH errors. Thus:


recover (generateBinding n) (reify n >> return [])

which uses generateBinding n to “recover” from the error of n‘s non-existence.


trapd

May 11, 2011


Species subtraction made simple

November 24, 2010
> {-# OPTIONS_GHC -fno-warn-missing-methods #-}
> module Virtual where
> 
> import Control.Applicative
> import Test.QuickCheck

Yesterday on #haskell, augur asked me to explain how subtraction works for combinatorial species. (For an introduction to species, see my paper from the 2010 Haskell Symposium.) For example, given the equation

M = 1 + X

defining the species M (which corresponds to the Haskell type constructor Maybe), our algebraic sensibilities tell us this can be rewritten as

X = M - 1

but can we make sense of this combinatorially? Addition has an obvious combinatorial interpretation (disjoint union), and so does multiplication (pairing), but what does subtraction mean? Once we allow subtraction, we have to assign meaning to expressions like Void - Bool, which apparently must be a species with negative two inhabitants. What on earth could that mean!? Combinatorics is all about counting things; we can have zero structures, or one structure, or seventy-three structures, but surely we can’t have a negative number of structures, can we?

Building the Integers

Let’s start with a slight detour to talk about natural numbers and integers. Here’s a standard Haskell definition of the naturals:

> data Nat = Z | S Nat
>   deriving (Show, Eq, Ord)

Note that the auto-derived Eq and Ord instances have exactly the right behavior (in the latter case due to the fact that we happened to list the Z constructor first).

*Virtual> S (S Z) == S (S Z)
True
*Virtual> S Z == S (S Z)
False
*Virtual> S (S Z) < S Z
False

We can do arithmetic on natural numbers, implemented via a Num instance:

> instance Num Nat where
>   Z   + n = n
>   S m + n = S (m + n) 
> 
>   Z   * n = Z
>   S m * n = n + m * n
> 
>   (-) = error "Subtraction! What's that?"
> 
>   fromInteger n | n < 0 = error "No can do."
>   fromInteger 0 = Z
>   fromInteger n = S (fromInteger (n-1))
> 
>   negate = error "Huh?"
>   abs = id

It seems to work:

*Virtual> 1 + 3 :: Nat
S (S (S (S Z)))

As much as I dislike Num, in this particular instance it is actually useful to illustrate an important point about Nat, namely, that subtraction (and, more fundamentally, negation) are not defined for natural numbers. (Of course, we could implement a partial subtraction operation, but we’d like to stick to total functions as much as possible.) We have no way to implement (-) or negate, and we even have problems with fromInteger if the input is negative.

But what if we were crazy enough to actually want negation and subtraction? How should we generalize the natural numbers? Our first instinct might be to define something like

> data I' = Pos Nat | Neg Nat

but this is annoying for a number of reasons. There are two different representations of zero (and having Neg Z represent -1, and so on, is too hideous to even mention (whoops)). Code to work with values of this type would also be ugly: lots of case analysis on Pos and Neg constructors with very similar code duplicated in each case. And we’re going to have to implement subtraction pretty much from scratch; there’s no good way to reuse the machinery we’ve already built for working with natural numbers.

However, there’s a much better way. No subtraction? No problem! We’ll just treat subtraction as a new syntactic construct.

> data I = Nat :- Nat
>   deriving Show

That is, a value of type I is a pair of Nats, thought of as their difference. Why call it I? Well, observe that any integer, positive or negative, can be represented in this way, as a difference of two natural numbers. For example, -2 can be represented by the pair 3 :- 5. So this is our representation of integers!

Of course, -2 can also be represented by 4 :- 6, or 0 :- 2, or a countably infinite number of other choices. We really want to deal with equivalence classes of these pairs, so the first task is to decide on the proper equivalence. When are the pairs p1 :- n1 and p2 :- n2 equivalent? Well… when the difference between p1 and n1 is the same as the difference between p2 and n2! This is circular, of course, since we are in the process of defining what "difference" means. But a moment’s thought reveals that we can reexpress this using only addition:

> instance Eq I where
>   p1 :- n1 == p2 :- n2  =  p1 + n2 == p2 + n1

In fact, our equivalence extends naturally to an ordering as well:

> instance Ord I where
>   compare (p1 :- n1) (p2 :- n2) = compare (p1 + n2) (p2 + n1)

How about arithmetic? To add two integers, we just add their positive and negative parts pointwise; multiplication works by distributing out the product and collecting up the positive and negative parts; negation is just a flip:

> instance Num I where
>   (p1 :- n1) + (p2 :- n2) = (p1 + p2) :- (n1 + n2)
>   (p1 :- n1) * (p2 :- n2) = (p1 * p2 + n1 * n2) :- (p1 * n2 + p2 * n1)
>   negate (p :- n) = n :- p
>   fromInteger n | n < 0     = Z :- fromInteger (-n)
>                 | otherwise = fromInteger n :- Z
>   abs i | i > 0     =  i
>         | otherwise = -i

Let’s see if it works:

*Virtual> (-4) :: I
Z :- S (S (S (S Z)))
*Virtual> 2 + 4 + (-3) :: I
S (S (S (S (S (S Z))))) :- S (S (S Z))
*Virtual> canonicalize it
S (S (S Z)) :- Z
*Virtual> 2 - 5 :: I
S (S Z) :- S (S (S (S (S Z))))
*Virtual> canonicalize it
Z :- S (S (S Z))                      

Seems to! Another important point is that we can inject the naturals into this new setting,

> n2i :: Nat -> I
> n2i = (:- 0)

and this injection is a semiring homomorphism: that is, adding (or multiplying) two Nats and applying n2i to the result is the same (up to equality on I) as first applying n2i to each and then adding (or multiplying) the resulting I values. But you don’t have to take my word for it:

> instance Arbitrary Nat where
>   arbitrary = (fromInteger . unNN) `fmap` arbitrary
>     where unNN (NonNegative x) = x
> 
> instance Arbitrary I where
>   arbitrary = (:-) <$> arbitrary <*> arbitrary
> 
> prop_n2i_hom_add i j = n2i (i+j) == n2i i + n2i j
> prop_n2i_hom_mul i j = n2i (i*j) == n2i i * n2i j
*Virtual> quickCheck prop_n2i_hom_add
+++ OK, passed 100 tests.
*Virtual> quickCheck prop_n2i_hom_mul
+++ OK, passed 100 tests.

It is also not hard to choose a canonical representative from each equivalence class of I values, namely, the pairs where at least one of the elements is zero. Here’s a function that returns the canonical representative corresponding to any value:

> canonicalize :: I -> I
> canonicalize i@(Z :- _)   = i
> canonicalize i@(_ :- Z)   = i
> canonicalize (S m :- S n) = canonicalize (m :- n)

We can use canonicalize to write a (partial) projection from integers back to naturals. To check whether an integer actually corresponds to a natural, just canonicalize it and check whether its negative component is zero:

> i2n :: I -> Maybe Nat
> i2n i = case canonicalize i of
>           (n :- Z) -> Just n
>           _        -> Nothing

Although i2n is partial, it is complete:

> prop_i2n_complete n = i2n (n2i n) == Just n
*Virtual> quickCheck prop_i2n_complete 
+++ OK, passed 100 tests.

The result of all this is that we can assign meaning to arbitrary expressions involving natural numbers, addition, multiplication, and subtraction, by first injecting all the naturals into our type for integers. What’s more, we have a method for deciding whether the final result is actually the image of a natural number.

*Virtual> 7 - 3 :: Nat
*** Exception: Subtraction! What's that?
*Virtual> i2n (7 - 3) :: Maybe Nat
Just (S (S (S (S Z))))

Virtual Species

So what was the point of talking about natural numbers and integers? We were supposed to be talking about combinatorial species, right?

Well, it turns out that we were talking about combinatorial species, since they include the natural numbers as a sub-semiring. The point is that the whole construction outlined above can be generalized to work over the entire semiring of species. In particular, we define virtual species to be equivalence classes of pairs of species (P, N), where P is considered "positive" and N "negative". (P_1, N_1) and (P_2, N_2) are equivalent if P_1 + N_2 \cong P_2 + N_1 (where \cong denotes species isomorphism). Addition and multiplication for virtual species are defined in exactly the same way as we did for integers.

And what about canonicalize? Does something similar exist for virtual species? The answer, as it turns out, is yes, and this is a rather big deal! This blog post is too short to contain the details (they could probably make another post or three), but the key theorem is that every species can be decomposed uniquely into a sum of species of a special type known as molecular species. To canonicalize a virtual species we need only decompose both its positive and negative parts, and cancel any molecular species we find on both sides. Unlike integers, canonical virtual species do not necessarily have the zero species on one side or the other (that is, there exist "complex" virtual species which are not purely positive or negative). However, injection of species into the ring of virtual species does of course work by pairing with the zero species, and this injection is a semiring homomorphism as we would expect. Other species operations (such as differentiation and cartesian product) can also be extended to virtual species in a way compatible with this injection.

So to our original species expression M - 1 we can assign the virtual species (M,1). Of course, this is not canonical; unsurprisingly, the molecular decomposition of M is 1 + X, so canceling the 1 gives us the canonical (and strictly positive) species (X,0), as expected.

Here, however, is where the similarity to naturals and integers ends. Virtual species give us a lot more additional structure over species than integers do over naturals. For example, integers don’t give us any sort of multiplicative inverses for naturals; for that we need another round of completion to obtain the rationals. Surprisingly, it turns out that virtual species already provide multiplicative inverses for "most" species! This allows us to assign meaning to expressions such as \frac{1}{1 - X}. Perhaps this should be the subject of another blog post…


Back from Baltimore

October 2, 2010

I’m finally back in Philly after attending ICFP, the Haskell Symposium, and the Haskell Implementors’ Workshop in Baltimore. I enjoyed meeting every person I met, so if I met you please feel free to instantiate the universal quantifier.

  • Here’s a link to my Haskell Symposium paper, Species and Functors and Types, Oh My!. My talk seemed to be well-received, and it was videotaped, but unfortunately I think it’s going into the ACM Digital Library, which you need a subscription to access. Here are the slides, at least, although they don’t make all that much sense by themselves. I had several people ask me what I used to make the slides; it was a combination of inkscape and my diagrams EDSL.
  • Here are links to the slides and video of my talk from the Haskell Implementors’ Workshop, Typed type-level functional programming with GHC.

The Haskell Alphabet

September 18, 2010

Here’s a little something I’ve been working on in bits of spare time here and there for the past five months or so:

The Haskell Alphabet

I seem to recall it was inspired by a conversation in #haskell around the beginning of April, but I can’t find it in the logs. In any case, enjoy, and feel free to post comments, suggestions for additional links, or alternate stanzas as comments here!


Typed type-level programming in Haskell, part IV: collapsing types and kinds

August 5, 2010

In Part III, we saw how the current state of the art in Haskell type-level programming leaves some things to be desired: it requires duplicating both data declarations and code, and even worse, it’s untyped. What to do?

Currently, GHC’s core language has three “levels”:

  1. Expressions: these include things like term variables, lambdas, applications, and case expressions.
  2. Types: e.g. type variables, base types, forall, function types.
  3. Kinds: * and arrow kinds.

Types classify expressions (the compiler accepts only well-typed expressions), and kinds classify types (the compiler accepts only well-kinded types). For example,

3 :: Int,
Int :: *,
Maybe :: * -> *,

and so on.

The basic idea is to allow the automatic lifting of types to kinds, and their data constructors to type constructors. For example, assuming again that we have

data Nat = Z | S Nat

we can view Z :: Nat as either the data constructor Z with type Nat, or as the type Z with kind Nat. Likewise, S :: Nat -> Nat can be viewed either as a data constructor and a type, or a type constructor and a kind.

One obvious question: if Z is a type, and types classify expressions, then what expressions have type Z? The answer: there aren’t any. But this makes sense. We want to be able to use Z as a type-level “value”, and don’t really care about whether it classifies any expressions. And indeed, without this auto-lifting, if we wanted to have a type-level Z we would have declared an empty data type data Z.

Notice we have much richer kinds now, since we are basically importing an entire copy of the type level into the kind level. But that also means we will need something to classify kinds as well, so we need another level… and what’s to stop us from lifting kinds up to the next level, and so on? We would end up with an infinite hierarchy of levels. In fact, this is exactly what Omega does.

But in our case we can do something much simpler: we simply collapse the type and kind levels into a single level so that types and kinds are now the same thing, which I will call typekinds (for lack of a better term). We just take the ordinary syntax of types that we already had, and the only things we need to add are lifted data constructors and *. (There are still some questions about whether we represent arrow kinds using the arrow type constructor or forall, but I’ll leave that aside for the moment.) To tie the knot, we add the axiom that the typekind * is classified by itself. It is well-known that this allows the encoding of set-theoretic paradoxes that render the type system inconsistent when viewed as a logic — but Haskell’s type system is already an inconsistent logic anyway, because of general recursion, so who cares?

So, what are the difficult issues remaining?

  • Coercions: GHC’s core language includes a syntax of coercions for explicitly casting between equivalent types. Making the type system richer requires more sophisticated types of coercions and makes it harder to prove that everything still works out. But I think we have this mostly ironed out.
  • Surface syntax: Suppose in addition to Z :: Nat we also declare a type called Z. This is legal, since expressions and types inhabit different namespaces. But now suppose GHC sees Z in a type. How does it know which Z we want? Is it the type Z, or is it the data constructor Z lifted to a type? There has to be a way for the programmer to specify what they mean. I think we have a solution to this that is simple to understand and not too heavyweight — I can write about this in more detail if anyone wants.
  • Type inference: This probably makes type inference a lot harder. But I wouldn’t know for sure since that’s the one thing we haven’t really thought too hard about yet.

One final question that may be bothering some: why not just go all the way and collapse all the levels, and have a true dependently-typed language? It’s a valid question, and there are of course languages, notably Agda, Coq, and Epigram, which take this approach. However, one benefit of maintaining a separation between the expression and typekind levels is that it enables a simple erasure semantics: everything at the expression level is needed at runtime, and everything at the typekind level can be erased at compile time since it has no computational significance. Erasure analysis for languages with collapsed expression and type levels is still very much an active area of research.

There’s more to say, but at this point it’s probably easiest to just open things up to questions/comments/feature requests and I’ll write more about whatever comes up! I should probably give more examples as well, which I’ll try to do soon.


Species and Functors and Types, Oh My!

August 1, 2010

My paper on combinatorial species and the species library (an improved version of my previous ICFP submission) has been accepted to the 2010 Haskell Symposium! I look forward to seeing people in Baltimore in September, and in the meantime the impatient can read the final version here.


Typed type-level programming in Haskell, part III: I can haz typs plz?

July 19, 2010

In Part II, I showed how type families can be used to do type-level programming in a functional style. For example, here is addition of natural numbers again:

  data Z
  data S n

  type family Plus m n :: *
  type instance Plus Z n = n
  type instance Plus (S m) n = S (Plus m n)

Now, why might we want to do such a thing? One example (I know, I know, this is always the example… but hey, it’s a good example) is if we wanted to have a type of polymorphic length-indexed vectors (or as they are sometimes known, “Length-Observed Lists”) where the type of a vector includes its length. Using a generalized algebraic data type (GADT), we can write something like this:

  data LOL :: * -> * -> * where
    KThxBye :: LOL Z a
    Moar    :: a -> LOL n a -> LOL (S n) a

This says that

  1. LOL is a type constructor of kind * -> * -> *, that is, it takes two type arguments of kind * and produces a type of kind *. The intention is that the first argument records the length, and the second records the type of the elements.
  2. KThxBye constructs a vector of length zero.
  3. Given an element of type a and a vector of as of length n, Moar constructors a vector of length S n.

The type-level function Plus comes in when we implement an append function for our length-indexed vectors: in order to express the type of append we have to add the lengths of the input vectors.

  append :: LOL m a -> LOL n a -> LOL (Plus m n) a
  append KThxBye     v = v
  append (Moar x xs) v = Moar x (append xs v)

If you haven't already seen things like this, it's a good exercise to figure out why this definition of append typechecks (and why it wouldn't typecheck if we put anything other than Plus m n as the length of the output).

OK, great! We can make GHC check the lengths of our lists at compile time. So what's the problem? Well, there are (at least) three obvious things which this code leaves to be desired:

  1. It doesn't matter whether we have already declared a Nat type with constructors Z and S; we have to redeclare some empty types Z and S to represent our type-level natural number "values". And declaring empty types to use like "values" seems silly anyway.
  2. It also doesn't matter whether we've already implemented a plus function for our Nat values; we must re-code the addition algorithm at the type level with the type family Plus. Especially irksome is the fact that these definitions will be virtually identical.
  3. Finally, and most insidiously, LOL is essentially untyped. Look again at the kind of LOL :: * -> * -> *. There's nothing in the kind of LOL that tells us the first argument is supposed to be a type-level number. Nothing prevents us from accidentally writing the type LOL Int (S Z) -- we'll only run into (potentially confusing) problems later when we try to write down a value with this type.

Wouldn't it be nice if we could reuse (1) values and (2) functions at the type level, and (3) get more informative kinds in the bargain? Indeed, inspired by Conor McBride's SHE, our work aims precisely to enable (1) and (3) in GHC as a start, and hopefully eventually (2) (and other features) as well. Hopefully soon, you'll be able to write this:

  data Nat = Z | S Nat

  type family Plus (m::Nat) (n::Nat) :: Nat
  type instance Plus Z n = n
  type instance Plus (S m) n = S (Plus m n)

  data LOL :: Nat -> * -> * where
    KThxBye :: LOL Z a
    Moar    :: a -> LOL n a -> LOL (S n) a

  append :: ...  -- exactly the same as before

...or even this:

  data Nat = Z | S Nat

  plus :: Nat -> Nat -> Nat
  plus Z n = n
  plus (S m) n = S (plus m n)

  data LOL :: Nat -> * -> * where ... -- same as above

  append :: LOL m a -> LOL n a -> LOL (plus m n) a
  append = ...  -- same as before

In another post I'll explain what the above fantasy code would be doing in a bit more detail, talk about precisely how we propose to accomplish this, and discuss why we might want to do things this way, rather than introducing full dependent types (or just chucking Haskell and all moving to Agda).


Improving GHC’s constraint solving

June 17, 2010

So I’ve been here at MSR Cambridge for almost two weeks now (!), working in the Programming Principles and Tools Group with Simon Peyton-Jones and Dimitrios Vytiniotis — and also collaborating with Stephanie Weirich and Steve Zdancewic, who are also in Cambridge. So, what have I been doing?

This week, Simon, Dimitris, and I have been having a major GHC hacking session, implementing the new constraint-solving type inference engine described in their new OutsideIn(X) paper. It’s been a lot of fun — I’ve never hacked on GHC before, and it’s quite a new experience hacking on such a large (and amazing) piece of software. I’ve been working on the constraint canonicaliser, which puts constraints into canonical forms appropriate for the constraint simplifier to work with. As a simple example, the equality constraint (Int, a) ~ (b, Char) gets decomposed into the primitive constraints Int ~ b and a ~ Char. It’s also responsible for flipping and flattening equality constraints so that type function applications only happen on the left-hand side: for example, the constraint F a b ~ G Int (where both F and G are type families), gets rewritten to a pair of constraints F a b ~ c, G Int ~ c, where c is a fresh variable. If we didn’t do this it would lead to problems with termination: for example, the constraint a ~ [F a] could lead to infinite rewriting of a to [F a] to [F [F a]] to… (And before you protest that we ought to just throw out a ~ [F a] on the grounds that it is recursive, note that F a may not mention a at all; for example, perhaps F is defined by F [x] = Int.)

Constraints also include type class constraints and implicit parameter constraints, although there’s much less work to do with those as far as canonicalisation is concerned.

Next week, I’ll likely get back to our other, more ambitious research project, but I’ll write more about that next time. In the meantime, if you’re in or near Cambridge and want to meet up, just drop me a note!


Typeclassopedia in Japanese!

October 20, 2009

Satoshi Nakamura has published a Japanese translation of the Typeclassopedia. I don’t read any Japanese, but it sure looks cool, and I hope this will be a great resource for Japanese learners of Haskell. A big thank you to Satoshi for his hard work; the Typeclassopedia is certainly not short!


Follow

Get every new post delivered to your Inbox.