In Part III, we saw how the current state of the art in Haskell type-level programming leaves some things to be desired: it requires duplicating both data declarations and code, and even worse, it’s untyped. What to do?
Currently, GHC’s core language has three “levels”:
- Expressions: these include things like term variables, lambdas, applications, and case expressions.
- Types: e.g. type variables, base types, forall, function types.
- Kinds:
*
and arrow kinds.
Types classify expressions (the compiler accepts only well-typed expressions), and kinds classify types (the compiler accepts only well-kinded types). For example,
3 :: Int
,
Int :: *
,
Maybe :: * -> *
,
and so on.
The basic idea is to allow the automatic lifting of types to kinds, and their data constructors to type constructors. For example, assuming again that we have
data Nat = Z | S Nat
we can view Z :: Nat
as either the data constructor Z
with type Nat
, or as the type Z
with kind Nat
. Likewise, S :: Nat -> Nat
can be viewed either as a data constructor and a type, or a type constructor and a kind.
One obvious question: if Z
is a type, and types classify expressions, then what expressions have type Z
? The answer: there aren’t any. But this makes sense. We want to be able to use Z
as a type-level “value”, and don’t really care about whether it classifies any expressions. And indeed, without this auto-lifting, if we wanted to have a type-level Z
we would have declared an empty data type data Z
.
Notice we have much richer kinds now, since we are basically importing an entire copy of the type level into the kind level. But that also means we will need something to classify kinds as well, so we need another level… and what’s to stop us from lifting kinds up to the next level, and so on? We would end up with an infinite hierarchy of levels. In fact, this is exactly what Omega does.
But in our case we can do something much simpler: we simply collapse the type and kind levels into a single level so that types and kinds are now the same thing, which I will call typekinds (for lack of a better term). We just take the ordinary syntax of types that we already had, and the only things we need to add are lifted data constructors and *
. (There are still some questions about whether we represent arrow kinds using the arrow type constructor or forall, but I’ll leave that aside for the moment.) To tie the knot, we add the axiom that the typekind *
is classified by itself. It is well-known that this allows the encoding of set-theoretic paradoxes that render the type system inconsistent when viewed as a logic — but Haskell’s type system is already an inconsistent logic anyway, because of general recursion, so who cares?
So, what are the difficult issues remaining?
- Coercions: GHC’s core language includes a syntax of coercions for explicitly casting between equivalent types. Making the type system richer requires more sophisticated types of coercions and makes it harder to prove that everything still works out. But I think we have this mostly ironed out.
- Surface syntax: Suppose in addition to
Z :: Nat
we also declare a type calledZ
. This is legal, since expressions and types inhabit different namespaces. But now suppose GHC seesZ
in a type. How does it know whichZ
we want? Is it the typeZ
, or is it the data constructorZ
lifted to a type? There has to be a way for the programmer to specify what they mean. I think we have a solution to this that is simple to understand and not too heavyweight — I can write about this in more detail if anyone wants. - Type inference: This probably makes type inference a lot harder. But I wouldn’t know for sure since that’s the one thing we haven’t really thought too hard about yet.
One final question that may be bothering some: why not just go all the way and collapse all the levels, and have a true dependently-typed language? It’s a valid question, and there are of course languages, notably Agda, Coq, and Epigram, which take this approach. However, one benefit of maintaining a separation between the expression and typekind levels is that it enables a simple erasure semantics: everything at the expression level is needed at runtime, and everything at the typekind level can be erased at compile time since it has no computational significance. Erasure analysis for languages with collapsed expression and type levels is still very much an active area of research.
There’s more to say, but at this point it’s probably easiest to just open things up to questions/comments/feature requests and I’ll write more about whatever comes up! I should probably give more examples as well, which I’ll try to do soon.
How does this approach relate to UHC’s kind polymorphism,
as described in http://monoidal.blogspot.com/2010/07/kind-polymorphism-in-action.html ?
Great question. Off the top of my head I am not sure, but I will think about it and try to write something about it soon.
Doesn’t kind polymorphism just mean allowing kind variables?
I’m wondering about your use of the word “expression”.
Wouldn’t it be better to call the lowest level “value”? I thought something like `Maybe Int` is also called a (type) application _expression_.
It might be my misunderstanding of the English word, but I thought you can have expressions on any level (value, type, kind, sort).
Well, you’re right, “expression” is open to misinterpretation. The problem is that “value” is sometimes used to mean only “fully evaluated things” (i.e. 7 is a value but 3+4 is not) which I was trying to avoid. I’m not sure what the best/most standard terminology is.
I think its fair to say 3+4 is an expression at the value level. In the same way that `Maybe Int` is an expression at the type level.
An expression at a certain level is something that will evaluate to the thing of that level. E.g. 3+4 will evaluate to the value “7” and ([] Int) will evaluate to the type “list of int”.
This is a non-theoretical, minor and nearly *unfair* problem, but collapsing value-level and type-level expressions would also break the Haskell syntax for product types : what would ((), ()) mean at the type level ?
You can “collapse” expressions and types and still have type erasure. Cayenne does that. The cost is that you need to stratify the kinds again.
Interesting, thanks for pointing this out. So you can erase anything classified by a kind, but not things classified by other values/types?
That’s right. You have to be careful how you classify compound data structures that contain parts at different levels. Unfortunately, the stratification needed for type erasure is not the same as the one needed to make the logic consistent.
There are also schemes for type-like erasure that don’t involve applying it to a certain universe. For instance, in a function of the type:
forall (n : Nat) (t u : *). (t -> u) -> Vec n t -> Vec n u
it may be that n appears only in the types, and so can be erased same as t and u. In something like GHC, this happens automatically, because we must stick our type naturals into * like everything else type-level.
But, you can instead distinguish erasable things from non-erasable things via the quantifier. In a pure type system, this means you have two pis, two lambdas and two applications (or two annotations on each). Typing rules for the different quantifiers ensure that you can’t use static arguments in runtime-relevant positions (the reverse is fine).
I was going to say the distinction is even more natural in Haskell, because there’s already ‘forall’ which only does static stuff, and binds a variable, and (->), which is non-dependent, and types the dynamic stuff. But that’s probably not true if we’re aiming for type-level programming and datatypes that look like they’re value-indexed. Nevertheless, ‘forall’ is a natural name for the static quantifier, while something else could be used for a dynamic dependent one (if such a thing exists).
This doesn’t cover all types of erasure (like runtime erasure of proof-irrelevant terms, not that those would be available if your logic is inconsistent), but it handles the static-dynamic type of erasability (and likely, the parametricity theorems that go with it) that is normally baked into types.
For more information, there’s Erasure and Polymorphism in Pure Type Systems by Nathan Mishra-Linger and Tim Sheard, as well as the former’s thesis.
does * :: * also mean that typechecking becomes undecidable? (even without UndecidableInstances i mean)
Hmm, why would it? (I’m genuinely curious.)
It most likely does. If the type level has enough power you can write type expression that don’t have a normal form.
Does it help that you can’t write explicit type lambdas? I’m not familiar enough with the exact way that * : * leads to nontermination. Perhaps I should be.
I’m just guessing. But since H-M type checking is already on the brink of undecidability (equivalent to a Turing machine with a finite tape), it takes very little to push it over.
You need to watch out for Russel’s paradox and Girard’s paradox (which is very complex to encode).
Are you referring to the “simplified” version or the previous one? I’m thinking of:
“A simplification of Girard’s paradox”
http://www.springerlink.com/content/w718604jn467672h/
So given your description it seems that (*, Int) is a valid typekind?
And (Bool, 5) would be a valid expression of this type.
But you don’t want to allow the latter.
In that case I would find it disturbing if the former was allowed.
Yes, (*, Int) is a valid typekind, but it is uninhabited. (Bool, 5) is not a valid expression from a purely syntactic point of view, since Bool is not a value-level expression.
I meant (Bool, 5) where Bool is the normal type and 5 is a normal Int.
I don’t think it will cause any problems that you can’t write (Bool, 5) in your program. But is it really uninhabited? You have Bool::* and 5::Int, and normally forming a pair of inhabited types creates a new inhabited type. If you want it uninhabited, how are you going to set up your type system to make it so?
You’re right, (*, Int) _as a kind_ is inhabited by the _type_ (Bool, {5}) (where I have used curly braces to explicitly mark the lifting of 5 from the value to the type level). But can you explain why we wouldn’t want to allow (Bool, {5})? (*, Int) _as a type_ is not inhabited by any value-level expressions, since there is no value-level expression with type *.
You’re making statements like “there is no value-level expression with type *”, but I’m not sure how you will reflect that in the typing rules. Normally the typing rule of pairs is something like this:
a::A b::B (A,B)::*
——————-
(a,b) :: (A,B)
Instantiating this:
Bool::* 5::Int (*,Int)::*
————————–
(Bool,5)::(*,Int)
So you’ll have to come up with some different typing rule for pairs if you want to make (*,Int) uninhabited.
I think I see the confusion. Note there are *two different* judgements involved: a typing judgement which relates value-level and typekind-level things, and a kinding judgement which relates typekind-level things to other typekind-level things. I’ll write the first as ::T and the second as ::K. So we have (Bool, 5) ::K (*, Int) in just the way you describe (since Bool ::K * and 5 ::K Int), and this is perfectly fine. However, we do not have (Bool, 5) ::T (*, Int), since there are no rules of the form ? ::T *. Even if there were, Bool is not syntactically a value-level thing so it cannot appear on the left-hand side of the ::T judgement anyway.
I’ll have to try that out on paper to see if I believe it works. :) It probably does.
BTW, there’s nothing bad about allowing (Bool, 5) ::T (*, Int).
Er, sorry, I guess it (*, Int) is inhabited by (Bool, 5) considered as a _type_ (where the 5 has been auto-lifted). But I see no problem with that. (The type (Bool, 5) is definitely uninhabited.)
Sigh, I should think longer before I post. Of course (Bool, 5) is inhabited by things like (True, _|_). Anyway, I still don’t think any of this is a problem.
Thanks for this quite exciting work!
“plus :: Nat -> Nat -> Nat” is closed, no new cases can be added to this in a separate module. type families aren’t; presumably, “type instance Plus (S m) n ..” could be given in a different module than the one where “type instance Plus Z n ..” is defined. Maybe then it would make more sense to autolift the function “plus” to a closed type family, say something defined with a syntax like
type family Plus (m::Nat) (n::Nat) :: Nat where
Plus Z n = n
Plus (S m) n = S (Plus m n)
or, if this were possible, more analogously as:
Plus :: Nat -> Nat -> Nat
Plus Z n = n
Plus (S m) n = S (Plus m n)
if autolifted functions are to be added later on, this would be especially practical in the meantime, since a closed type family could be less restrictive about say overlap..
If there exist closed type families, this suggests also closed data/newtype families, say with analogous “data/newtype family .. where ..” syntax (if the second, more direct syntax is supportable, it should be limited to the most useful, type family case only).
Given that families are also created by the associated types syntax, closed families also suggest closed classes…
Pingback: Typed type-level programming: status report « blog :: Brent -> [String]