I’ve come up with idea for a type system for first-class (global) definitions, which can serve as a very lightweight alternative to a proper module system. I’m posting it here in the hopes of getting some feedback and pointers to related work.
Commands and expressions
The programming language of Swarm (for lack of a better term I will hereafter refer to it as Swarmlang) has a bunch of imperative commands, and standard monadic sequencing constructs. For example,
move; move
does two move
commands in sequence, and
thing <- grab; give friend thing
first executes grab
, binding the variable thing
to the result, then executes give friend thing
. Of course, there is also a rich language of pure expressions, with things like arithmetic, strings, lambdas and function application, pairs, sums, and so on.
Some languages make a syntactic distinction between statements and expressions, but Swarmlang does not: everything is an expression, and some expressions happen to have a command type. If t
is a type, then cmd t
is the type of an imperative command which, when executed, can have some effects and then returns a result of type t
. (Of course this should feel very familiar to Haskell programmers; cmd
has many similarities to IO
.) This approach makes many things simpler and means that commands are first-class values.
Typechecking definitions
Swarmlang has definitions, which are just expressions with a command type. If e
is an expression, then
def x = e end
has type cmd ()
. When executed, it should have the effect of binding the name x
to the expression e
, and bringing x
into scope for all subsequent commands. Thus, it is valid to sequence this first definition with a second definition that mentions x
, like so:
def x = e end;
def y = foo bar x end
Of course, this means that while typechecking the definition of y
, we must be able to look up the type of x
. However, the type of the first def
command is simply cmd ()
, which does not tell us anything about x
or its type. Normally, the typing rule for sequencing of commands would be something like
but this does not work for def
commands, since it does not take into account the new names brought into scope. Up until now, I have dealt with this in a somewhat ad-hoc manner, with some special typechecking rules for def
and some ad-hoc restrictions to ensure that def
can only syntactically show up at the top level. However, I would really like to put everything on a more solid theoretical basis (which will hopefully simplify the code as well).
Decorating command types
The basic idea is to decorate the type with extra information about names bound by definitions. As usual, let
denote a generic context, that is, a finite mapping from variable names to their types. Then we extend the
cmd
type by adding a context to it:
is the type of a command which yields a result of type and produces global bindings for some names whose types are recorded in
. (Of course, we can continue to use
as an abbreviation for
.) So, for example,
def x = 3 end
no longer has type , but rather something like
, representing the fact that although
def x = 3 end
does not result in an interesting value, it does bind a name, x
, whose type is int
.
This is slightly unusual in the fact that types and contexts are now mutually recursive, but that doesn’t seem like a big problem. We can now write down a proper typing rule for sequencing that takes definitions into account, something like this:
And of course the typing rule for def
looks like this:
These rules together can now correctly typecheck an expression like
def x = 3 end;
def y = 2 + x end
where the second definition refers to the name defined by the first. The whole thing would end up having type .
…with polymorphism?
All this seems straightforward with only first-order types, as in my example typing rules above. But once you add parametric polymorphism my brain starts to hurt. Clearly, the context associated to a command type could bind variables to polytypes. For example, def id = \x.x end
has type . But should the context associated to a command type always contain polytypes, or only when the command type is itself a polytype? In other words, how do we deal with the associated contexts in the monotypes that show up during type inference? And what would it mean to unify two command types with their contexts (and would that ever even be necessary)? I hope it’s actually simple and I just need to think about it some more, but I haven’t wrapped my brain around it yet.
Ideas and pointers welcome!
I’d be very happy to hear anyone’s ideas, or (especially) pointers to published work that seems related or relevant! Feel free to comment either here, or on the relevant github issue.
Right after publishing this I started wondering whether we could make the “output context” part of the typing *judgment* and not part of the types themselves. I have not thought through this too carefully yet but it would certainly solve the issues with worrying about unification and so on. Glad to hear feedback on that idea too.
This is how Benjamin handles top-level definitions in his TAPL interpreters; there’s a statement-level typing relation with signature Γ ⊦ S ⇒ Γ.
But if you really want these to be expressions and they can show up in higher order functions and the definitions are global/dynamic, I think tracking things statically is going to be quite challenging. Any time you can put a definition under control flow (a lambda, an if) and then rely on it elsewhere, you have to handle merging, i.e., your type system does def/use analysis.
Cool, glad to know I was on the right track re: handling top-level definitions.
I actually have no particularly compelling reason to let defs show up anywhere, I just (naively) thought that it would be easy if we could embed contexts into their types. But I now get what you’re saying about def/use analysis. That sounds tricky and definitely not worth it.
As a shell lover, I’m definitely into the idea of static disciplines for managing dynamic scope. But you may not be so into that for a language that’s supposed to be easy to use… 😁
Hah! Yeah, not so much…
These inference rules are interesting but assume a strict sequence of ‘def’initions. How could you ‘def’ine (mutually) recursive commands?
Yes, good point, that has occurred to me as well. Currently there’s no way to define mutually recursive commands, but perhaps in the future we will generalize `def` so you can define multiple things at once.