Avoiding the axiom of choice, part I

I’m hard at work on my dissertation, and plan to get back to doing a bit of blogging based on stuff I’m writing and thinking about, as a way of forcing myself to explain things clearly and to potentially get some useful feedback. Yes, this means you can expect more on species! But first, I’ll write a few posts about the axiom of choice and anafunctors, which have turned out to be fundamental to some of the work I’ve been doing. If you look at the nLab page for anafunctors, linked above, you could be forgiven for thinking this is a pretty esoteric corner of category theory, but in fact it’s not too bad once you grasp the essentials, and is quite relevant for anyone interested in category theory and constructive/computational foundations (such as homotopy type theory (HoTT)).

The Axiom of Choice

The (in)famous Axiom of Choice (hereafter, AC) can be formulated in a number of equivalent ways. Perhaps the most well-known is:

The Cartesian product of any collection of non-empty sets is non-empty.

Given a family of sets \{X_i \mid i \in I\}, an element of their Cartesian product is some I-indexed tuple \{x_i \mid i \in I\} where x_i \in X_i for each i. Such a tuple can be thought of as a function (called a choice function) which picks out some particular x_i from each X_i.

The Axiom of Choice in type theory, take 1

We can express this in type theory as follows. First, we assume we have some type I which indexes the collection of sets; that is, there will be one set for each value of type I. Given some type A, we can define a subset of the values of type A using a predicate, that is, a function P : A \to \star (where \star denotes the universe of types). For some particular a : A, applying P to a yields a type, which can be thought of as the type of evidence that a is in the subset P; a is in the subset if and only if P\ a is inhabited. An I-indexed collection of subsets of A can then be expressed as a function C : I \to A \to \star. In particular, C(i,a) is the type of evidence that a is in the subset indexed by i. (Note that we could also make A into a family of types indexed by I, that is, A : I \to \star, but it wouldn’t add anything to this discussion.)

A set is nonempty if it has at least one element, so the fact that all the sets in C are nonempty can be modeled by a dependent function which yields an element of A for each index, along with a proof that it is contained in the corresponding subset.

(i : I) \to (a : A) \times C(i,a)

(Note I’m using the notation (x:X) \to T(x) for dependent function types instead of \prod_{x:X} T(x), and (x:X) \times T(x) for dependent pairs instead of \sum_{x:X} T(x).) An element of the Cartesian product of C can be expressed as a function I \to A that picks out an element for each I (the choice function), together with a proof that the chosen elements are in the appropriate sets:

(g : I \to A) \times ((i : I) \to C(i, g(i)))

Putting these together, apparently the axiom of choice can be modelled by the type

((i : I) \to (a : A) \times C(i,a)) \to (g : I \to A) \times ((i : I) \to C(i, g(i)))

Converting back to \Pi and \Sigma notation and squinting actually gives some good insight into what is going on here:

\left( \prod_{i : I} \sum_{a : A} C(i,a) \right) \to \left( \sum_{g : I \to A} \prod_{i : I} C(i, g(i)) \right)

Essentially, this says that we can “turn a (dependent) product of sums into a (dependent) sum of products”. This sounds a lot like distributivity, and indeed, the strange thing is that this is simply true: implementing a function of this type is a simple exercise! If you aren’t familiar with dependent type theory, you can get the intuitive idea by implementing a non-dependent Haskell analogue, namely something of type

(i -> (a,c)) -> (i -> a, i -> c).

Not too hard, is it? (The implementation of the dependent version is essentially the same; it’s only the types that get more complicated, not the implementation.) So what’s going on here? Why is AC so controversial if it is simply true in type theory?

The Axiom of Choice in type theory, take 2

This is not the axiom of choice you’re looking for. — Obi-Wan Funobi

The problem, it turns out, is that we’ve modelled the axiom of choice improperly, and it all boils down to how non-empty is defined. When a mathematician says “S is non-empty”, they typically don’t actually mean “…and here is an element of S to prove it”; instead, they literally mean “it is not the case that S is empty”, that is, assuming S is empty leads to a contradiction. (Actually, it is a bit more subtle yet, but this is a good first approximation.) In classical logic, these viewpoints are equivalent; in constructive logic, however, they are very different! In constructive logic, knowing that it is a contradiction for S to be empty does not actually help you find an element of S. We modelled the statement “this collection of non-empty sets” essentially by saying “here is an element in each set”, but in constructive logic that is a much stronger statement than simply saying that each set is not empty.

(I should mention at this point that when working in HoTT, the best way to model what classical mathematicians mean when they say “S is non-empty” is probably not with a negation, but instead with the propositional truncation of the statement that S contains an element. Explaining this would take us too far afield; if you’re interested, you can find details in Chapter 3 of the HoTT book, where all of this and much more is explained in great detail.)

From this point of view, we can see why the “AC” in the previous section was easy to implement: it had to produce a function choosing a bunch of elements, but it was given a bunch of elements to start! All it had to do was shuffle them around a bit. The “real” AC, on the other hand, has a much harder job: it is told some sets are non-empty, but without any actual elements being mentioned, and it then has to manufacture a bunch of elements out of thin air. This is why it has to be taken as an axiom; we can also see that it doesn’t fit very well in a constructive/computational context. Although it is logically consistent to assume it as an axiom, it has no computational interpretation, so anything we define using it will just get stuck operationally.

So, we’ll just avoid using AC. No problem, right?

Don’t look now, but AC is behind you

The problem is that AC is really sneaky. It tends to show up all over the place, but disguised so that you don’t even realize it’s there. You really have to train yourself to think in a fundamentally constructive way before you start to notice the places where it is used. Next time I’ll explain one place it shows up a lot, namely, when defining functors in category theory (though thankfully, not when defining Functor instances in Haskell).

About these ads
This entry was posted in category theory, math, species and tagged , , , , , . Bookmark the permalink.

17 Responses to Avoiding the axiom of choice, part I

  1. tikitu says:

    The category theory is way over my head, but I took a peek at your LaTeX: you might find \includeonly{} useful instead of commenting in/out chapters.

  2. shebang23 says:

    Wow, now AC actually starts to make sense! Looking forward to reading the sequel.

    But are you saying that AC is trivial in classical logic? Isn’t ZFC based on such a logic?

    • Brent says:

      Glad it is making sense! And no, I’m certainly not saying AC is trivial in classical logic, sorry if that wasn’t clear. Yes, ZF is based on classical logic, and in fact it has been shown that AC and its negation are both consistent with ZF, which in particular means that neither one is derivable in ZF (because then adding the other would lead to inconsistency). So typically mathematicians work in ZFC, i.e. ZF extended with AC as an extra axiom. (Also, do you have a suggestion how I could improve the text of the post to make this more clear?)

      • shebang23 says:

        So here’s how I understood your post:

        1. Proving that the product set is non-empty is straightforward when you know that the original sets are non-empty in the sense that they are inhabited. OTOH, it’s impossible if all you know about them is that assuming their emptiness leads to contradiction.

        2. In constructive logic, the above interpretations are different, so AC is not provable. OTOH, in classical logic, they are equivalent, so the proof in your post should work, right?

        Or is it maybe the case that the logic you used to prove AC is more expressive than the first-order logic used in ZF?

        • Brent says:

          Ah, thanks, I see the confusion now. Indeed, I think I hadn’t properly undstood the relationship of AC and ZF / constructive logic. In particular, saying “this is why it has to be taken as an axiom” (i.e. because it has to manufacture elements out of thin air) is bogus. That’s why it doesn’t make sense from a constructive point of view but it isn’t why it’s not derivable in ZF. Indeed, the statement that a set is non-empty and that it has an element are equivalent in classical logic. So consider this statement, which I’ll call the Axiom of Finite Choice (AFC):

          The Cartesian product of any *finite* collection of non-empty sets is non-empty.

          It turns out that AFC is straightforward to prove in ZF: you just use induction plus the fact that non-emptiness is the same as having an element. So the real reason AC is not a theorem of ZF is the fact that it applies to *any* set of non-empty sets, even infinite ones, where induction can’t be applied.

          • shebang23 says:

            Yeah, that’s exactly how I’d understood AC.

            I hoped that constructivist view would shed new light on AC, but it looks like we agree in that it doesn’t. Oh well.

          • shebang23 says:

            Also, you *did* manage to prove the weaker AC (which in classical logic is equivalent to normal AC) in the type theory. Is it the case that your interpretation of set-theoretic concepts in type theory is not precise enough to be able to translate the proof back into set theory?

          • Brent says:

            I am not entirely sure, but yes, I think something like that may be going on. I may be wrong about any of the following but here’s some informed speculation on the difference. First, the concept of “all types” is a fairly nicely behaved thing, since the *set* of all types is defined inductively. So we may e.g. prove that certain things hold for all types by structural induction on their syntax. On the other hand, the concept of “all sets” is not very well-behaved at all. It is not a set, and you cannot use induction to prove things about it. (I am not quite sure this fully explains the apparent discrepancy, though, since we did not actually need any sort of induction to prove the weaker AC in type theory.)

  3. Most mathematicians accept AC since so much interesting mathematics would have to be thrown in the garbage otherwise. It does lead to some great paradoxes though. Among my favorites is: http://en.wikipedia.org/wiki/Banach–Tarski_paradox

  4. Pingback: Unique isomorphism and generalized “the” | blog :: Brent -> [String]

  5. Dominic Steinitz says:

    I have been reading a few books on functional analysis (Brezis, Robinson, Teschl) and all them use Zorn’s lemma which is well known to be equivalent to AC. I wonder how Zorn’s lemma would look in type theory? It seems from what you write that in type theory one can’t just say something exists but one has to give a proof of existence along with it? My knowledge of type theory is sadly limited. Should I read the HoTT book? I am familiar with homotopy (although how type theory and continuous deformation of functions relate is mysterious). I have a long plane journey coming up :-)

    • Brent says:

      Right, in classical logic you can prove that a Foo exists using double negation elimination (or equivalently, the law of excluded middle), i.e. that it can’t be the case that a Foo does not exist. But obviously this does not actually help you find a Foo. Intuitionistic/constructive logic rejects double negation elimionation/LEM, so the only way to prove an existence statement is to actually construct a witness.

      It should be quite possible to encode Zorn’s lemma in type theory, though it’s a bit more involved since you first have to encode the concept of a poset, and of a chain, etc. etc.

      You should definitely read the HoTT book; in particular it is specifically aimed at mathematicians who don’t know a lot of type theory (though knowing some certainly doesn’t hurt; the fact that you know Haskell will help a bit).

  6. Pingback: AC and equivalence of categories | blog :: Brent -> [String]

  7. Pingback: Anafunctors | blog :: Brent -> [String]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s