Types for top-level definitions

I’ve come up with idea for a type system for first-class (global) definitions, which can serve as a very lightweight alternative to a proper module system. I’m posting it here in the hopes of getting some feedback and pointers to related work.

Commands and expressions

The programming language of Swarm (for lack of a better term I will hereafter refer to it as Swarmlang) has a bunch of imperative commands, and standard monadic sequencing constructs. For example,

move; move

does two move commands in sequence, and

thing <- grab; give friend thing

first executes grab, binding the variable thing to the result, then executes give friend thing. Of course, there is also a rich language of pure expressions, with things like arithmetic, strings, lambdas and function application, pairs, sums, and so on.

Some languages make a syntactic distinction between statements and expressions, but Swarmlang does not: everything is an expression, and some expressions happen to have a command type. If t is a type, then cmd t is the type of an imperative command which, when executed, can have some effects and then returns a result of type t. (Of course this should feel very familiar to Haskell programmers; cmd has many similarities to IO.) This approach makes many things simpler and means that commands are first-class values.

Typechecking definitions

Swarmlang has definitions, which are just expressions with a command type. If e is an expression, then

def x = e end

has type cmd (). When executed, it should have the effect of binding the name x to the expression e, and bringing x into scope for all subsequent commands. Thus, it is valid to sequence this first definition with a second definition that mentions x, like so:

def x = e end;
def y = foo bar x end

Of course, this means that while typechecking the definition of y, we must be able to look up the type of x. However, the type of the first def command is simply cmd (), which does not tell us anything about x or its type. Normally, the typing rule for sequencing of commands would be something like

\displaystyle \frac{\Gamma \vdash c_1 : \mathrm{cmd}\; \tau_1 \qquad \Gamma \vdash c_2 : \mathrm{cmd}\; \tau_2}{\Gamma \vdash c_1 ; c_2 : \mathrm{cmd}\;\tau_2}

but this does not work for def commands, since it does not take into account the new names brought into scope. Up until now, I have dealt with this in a somewhat ad-hoc manner, with some special typechecking rules for def and some ad-hoc restrictions to ensure that def can only syntactically show up at the top level. However, I would really like to put everything on a more solid theoretical basis (which will hopefully simplify the code as well).

Decorating command types

The basic idea is to decorate the \mathrm{cmd} type with extra information about names bound by definitions. As usual, let \Gamma denote a generic context, that is, a finite mapping from variable names to their types. Then we extend the cmd type by adding a context to it:

\mathrm{cmd}\; \tau \Rightarrow \Gamma

is the type of a command which yields a result of type \tau and produces global bindings for some names whose types are recorded in \Gamma. (Of course, we can continue to use \mathrm{cmd}\; \tau as an abbreviation for \mathrm{cmd}\; \tau \Rightarrow \varnothing.) So, for example, def x = 3 end no longer has type \mathrm{cmd}\; (), but rather something like \mathrm{cmd}\; () \Rightarrow \{x : \mathrm{int}\}, representing the fact that although def x = 3 end does not result in an interesting value, it does bind a name, x, whose type is int.

This is slightly unusual in the fact that types and contexts are now mutually recursive, but that doesn’t seem like a big problem. We can now write down a proper typing rule for sequencing that takes definitions into account, something like this:

\displaystyle \frac{\Gamma \vdash c_1 : \mathrm{cmd} \; \tau_1 \Rightarrow \Gamma_1 \qquad \Gamma, \Gamma_1 \vdash c_2 : \mathrm{cmd} \; \tau_2 \Rightarrow \Gamma_2}{\Gamma \vdash c_1 ; c_2 : \mathrm{cmd} \; \tau_2 \Rightarrow \Gamma, \Gamma_1, \Gamma_2}

And of course the typing rule for def looks like this:

\displaystyle \frac{\Gamma \vdash e : \tau}{\Gamma \vdash \texttt{def}\; x = e\; \texttt{end} : \mathrm{cmd}\; () \Rightarrow \{x : \tau\}}

These rules together can now correctly typecheck an expression like

def x = 3 end;
def y = 2 + x end

where the second definition refers to the name defined by the first. The whole thing would end up having type \mathrm{cmd}\; () \Rightarrow \{ x : \mathrm{int}, y : \mathrm{int} \}.

…with polymorphism?

All this seems straightforward with only first-order types, as in my example typing rules above. But once you add parametric polymorphism my brain starts to hurt. Clearly, the context associated to a command type could bind variables to polytypes. For example, def id = \x.x end has type \mathrm{cmd}\; () \Rightarrow \{id : \forall \alpha. \alpha \to \alpha\}. But should the context associated to a command type always contain polytypes, or only when the command type is itself a polytype? In other words, how do we deal with the associated contexts in the monotypes that show up during type inference? And what would it mean to unify two command types with their contexts (and would that ever even be necessary)? I hope it’s actually simple and I just need to think about it some more, but I haven’t wrapped my brain around it yet.

Ideas and pointers welcome!

I’d be very happy to hear anyone’s ideas, or (especially) pointers to published work that seems related or relevant! Feel free to comment either here, or on the relevant github issue.

About Brent

Assistant Professor of Computer Science at Hendrix College. Functional programmer, mathematician, teacher, pianist, follower of Jesus.
This entry was posted in projects and tagged , , . Bookmark the permalink.

7 Responses to Types for top-level definitions

  1. Brent says:

    Right after publishing this I started wondering whether we could make the “output context” part of the typing *judgment* and not part of the types themselves. I have not thought through this too carefully yet but it would certainly solve the issues with worrying about unification and so on. Glad to hear feedback on that idea too.

    • This is how Benjamin handles top-level definitions in his TAPL interpreters; there’s a statement-level typing relation with signature Γ ⊦ S ⇒ Γ.

      But if you really want these to be expressions and they can show up in higher order functions and the definitions are global/dynamic, I think tracking things statically is going to be quite challenging. Any time you can put a definition under control flow (a lambda, an if) and then rely on it elsewhere, you have to handle merging, i.e., your type system does def/use analysis.

      • Brent says:

        Cool, glad to know I was on the right track re: handling top-level definitions.

        I actually have no particularly compelling reason to let defs show up anywhere, I just (naively) thought that it would be easy if we could embed contexts into their types. But I now get what you’re saying about def/use analysis. That sounds tricky and definitely not worth it.

  2. Nikolaos Bezirgiannis says:

    These inference rules are interesting but assume a strict sequence of ‘def’initions. How could you ‘def’ine (mutually) recursive commands?

    • Brent says:

      Yes, good point, that has occurred to me as well. Currently there’s no way to define mutually recursive commands, but perhaps in the future we will generalize `def` so you can define multiple things at once.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.