Suppose someone hands you the following:
A Haskell function f :: (A, Bool) -> (B, Bool), where A and B are abstract types (i.e. their constructors are not exported, and you have no other functions whose types mention A or B).
A promise that the function f is injective, that is, no two values of (A, Bool) map to the same (B, Bool) value. (Thus (B, Bool) must contain at least as many inhabitants as (A, Bool).)
A list as :: [A], with a promise that it contains every value of type A exactly once, at a finite position.
Can you explicitly produce an injective function f' :: A -> B? Moreover, your answer should not depend on the order of elements in as.
It really seems like this ought to be possible. After all, if (B, Bool) has at least as many inhabitants as (A, Bool), then surely B must have at least as many inhabitants as A. But it is not enough to reason merely that some injection must exist; we have to actually construct one. This, it turns out, is tricky. As a first attempt, we might try f' a = fst (f (a, True)). That is certainly a function of type A -> B, but there is no guarantee that it is injective. There could be a1, a2 :: A which both map to the same b, that is, one maps to (b, False) and the other to (b, True). The picture below illustrates such a situation: (a1, True) and (a2, True) both map to b2. So the function f may be injective overall, but we can’t say much about f restricted to a particular Bool value.
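To make the failure concrete, here is a minimal sketch in Haskell. The types A and B, their constructors, and this particular f are hypothetical stand-ins (the original picture is not reproduced here); they are chosen only so that (A1, True) and (A2, True) both land on B2, as in the situation described above.

```haskell
-- Hypothetical concrete stand-ins for the abstract types A and B.
data A = A1 | A2 deriving (Eq, Show)
data B = B1 | B2 | B3 deriving (Eq, Show)

-- An injective function on pairs: all four outputs are distinct.
f :: (A, Bool) -> (B, Bool)
f (A1, True)  = (B2, False)
f (A1, False) = (B1, True)
f (A2, True)  = (B2, True)
f (A2, False) = (B3, False)

-- The naive first attempt: fix the Bool to True and project out the B.
naive :: A -> B
naive a = fst (f (a, True))
```

Here naive sends both A1 and A2 to B2, so it is not injective even though f is.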
The requirement that the answer not depend on the order of as also makes things difficult. (Over in math-land, depending on a particular ordering of the elements in as would amount to the well-ordering principle, which is equivalent to the axiom of choice, which in turn implies the law of excluded middle. And as we all know, every time someone uses the law of excluded middle, a puppy dies. …I feel like I’m in one of those DirecTV commercials. “Don’t let a puppy die. Ignore the order of elements in as.”) Anyway, making use of the order of values in as, we could do something like the following:
For each a :: A:
Look at the B values generated by f (a,True) and f (a,False). (Note that there might only be one distinct such B value.)
If neither B value has been used so far, pick the one that corresponds to (a,True), and add the other one to a queue of available B values.
If one is used and one unused, pick the unused one.
If both are used, pick the next available B value from the queue.
It is not too hard (read: I couldn’t be bothered) to show that this will always successfully result in a total function A -> B, which is injective by construction. (One has to show that there will always be an available B value in the queue when you need it.) The only problem is that the particular function we get depends on the order in which we iterate through the A values. The above example illustrates this as well: if the A values are listed in the order a1, a2, then a1 gets b2 and a2 has to settle for a different value. If they are listed in the other order, a2 gets b2 and a1 has to take something else. Whichever value comes first “steals” b2, and then the other one takes whatever is left. We’d like to avoid this sort of dependence on order. That is, we want a well-defined algorithm which will yield a total, injective function A -> B, which is canonical in the sense that the algorithm yields the same function given any permutation of as.
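The order dependence is easy to observe by running a greedy strategy of this kind. The following is only a sketch under assumptions: the types A and B and the function f are hypothetical, "used" is taken to mean "already assigned to some earlier a", and the name greedy is made up for illustration.

```haskell
-- Hypothetical concrete stand-ins for the abstract types A and B.
data A = A1 | A2 deriving (Eq, Show)
data B = B1 | B2 | B3 deriving (Eq, Show)

-- A hypothetical injective function in which (A1, True) and (A2, True)
-- both land on B2.
f :: (A, Bool) -> (B, Bool)
f (A1, True)  = (B2, False)
f (A1, False) = (B1, True)
f (A2, True)  = (B2, True)
f (A2, False) = (B3, False)

-- Greedy, order-dependent assignment: walk the list of inputs, tracking
-- which B values are already assigned and a queue of spare B values.
greedy :: Eq b => ((a, Bool) -> (b, Bool)) -> [a] -> [(a, b)]
greedy g = go [] []
  where
    go _ _ [] = []
    go used queue (a : rest) =
      case (bT `notElem` used, bF `notElem` used) of
        (True, True)   -> assign bT [bF | bF /= bT]  -- queue the other one
        (True, False)  -> assign bT []
        (False, True)  -> assign bF []
        (False, False) ->
          case filter (`notElem` used) queue of
            b : _ -> assign b []
            []    -> error "greedy: no B value available"
      where
        bT = fst (g (a, True))
        bF = fst (g (a, False))
        assign b extra = (a, b) : go (b : used) (queue ++ extra) rest
```

With this f, greedy f [A1, A2] assigns A1 to B2 and A2 to B3, while greedy f [A2, A1] assigns A2 to B2 and A1 to B1: the same input function yields two different injections depending on the listing order.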
It is possible—you might enjoy puzzling over this a bit before reading on!
The above example is a somewhat special case. More generally, let $[n]$ denote a canonical finite set of size $n$, and let $A$ and $B$ be arbitrary sets. Then, given an injection $f : A \times [n] \hookrightarrow B \times [n]$, is it possible to effectively (that is, without excluded middle or the axiom of choice) compute an injection $f' : A \hookrightarrow B$?
Translating down to the world of numbers representing set cardinalities (natural numbers if $A$ and $B$ are finite, or cardinal numbers in general), this just says that if $n|A| \leq n|B|$ then $|A| \leq |B|$ (for $n \geq 1$). This statement about numbers is obviously true, so it would be nice if we could say something similar about sets, so that this fact about numbers and inequalities can be seen as just a “shadow” of a more general theorem about sets and injections.
As hinted in the introduction, the interesting part of this problem is really the word “effectively”. Using the Axiom of Choice/Law of Excluded Middle makes the problem a lot easier, but either fails to yield an actual function that we can compute with, instead merely guaranteeing the existence of such a function, or gives us a function that depends on a particular ordering of $A$.
Apparently this has been a longstanding open question, recently answered in the affirmative by Peter Doyle and Cecil Qiu in their paper Division By Four. It’s a really great paper: they give some fascinating historical context for the problem, and explain their algorithm (which is conceptually not all that difficult) using an intuitive analogy to a card game with certain rules. (It is not a “game” in the usual sense of having winners and losers, but really just an algorithm implemented with “players” and “cards”. In fact, you could get some friends together and actually perform this algorithm in parallel (if you have sufficiently nerdy friends).) Richard Schwartz’s companion article is also great fun and easy to follow (you should read it first).
Here’s a quick introduction to the way Doyle, Qiu, and Schwartz use a card game to formulate their algorithm. (Porting this framework to use “thrones” and “claimants” instead of “spots” and “cards” is left as an exercise to the reader.)
The finite set $[n]$ is to be thought of as a set of suits. The set $A$ will correspond to a set of players, and $B$ to a set of ranks or values (for example, Ace, 2, 3, …). In that case $B \times [n]$ corresponds to a deck of cards, each card having a rank and a suit; and we can think of $A \times [n]$ in terms of each player having in front of them a number of “spots” or “slots”, each labelled by a suit. An injection $A \times [n] \hookrightarrow B \times [n]$ is then a particular “deal” where one card has been dealt into each of the spots in front of the players. (There may be some cards left over in the deck, but the fact that the function is total means every spot has a card, and the fact that it is injective is encoded in the common-sense idea that a given card cannot be in two spots at once.) For example, the example function from before:
corresponds to the following deal:
Here each column corresponds to one player’s hand, and the rows correspond to suit spots (with the spade spots on top and the heart spots beneath). We have mapped b1, b2, b3 to the ranks A, 2, 3, and mapped T and F to Spades and Hearts respectively. The spades are also highlighted in green, since later we will want to pay particular attention to what is happening with them. You might want to take a moment to convince yourself that the deal above really does correspond to the example function from before.
Of course, doing everything effectively means we are really talking about computation. Doyle and Qiu do talk a bit about computation, but it’s still pretty abstract, in the sort of way that mathematicians talk about computation, so I thought it would be interesting to actually implement the algorithm in Haskell.
The algorithm “works” for infinite sets, but only (as far as I understand) if you consider some notion of transfinite recursion. It still counts as “effective” in math-land, but over here in programming-land I’d like to stick to (finitely) terminating computations, so we will stick to finite sets $A$ and $B$.
First, some extensions and imports. Nothing too controversial.
> {-# LANGUAGE DataKinds #-}
> {-# LANGUAGE GADTs #-}
> {-# LANGUAGE GeneralizedNewtypeDeriving #-}
> {-# LANGUAGE KindSignatures #-}
> {-# LANGUAGE RankNTypes #-}
> {-# LANGUAGE ScopedTypeVariables #-}
> {-# LANGUAGE StandaloneDeriving #-}
> {-# LANGUAGE TypeOperators #-}
>
> module PanGalacticDivision where
>
> import Control.Arrow (second, (&&&), (***))
> import Data.Char
> import Data.List (find, findIndex, transpose)
> import Data.Maybe
>
> import Diagrams.Prelude hiding (universe, value)
> import Diagrams.Backend.Rasterific.CmdLine
> import Graphics.SVGFonts
We’ll need some standard machinery for type-level natural numbers. Probably all this stuff is in a library somewhere but I couldn’t be bothered to find out. Pointers welcome.
> -- Standard unary natural number type
> data Nat :: * where
>   Z   :: Nat
>   Suc :: Nat -> Nat
>
> type One = Suc Z
> type Two = Suc One
> type Three = Suc Two
> type Four = Suc Three
> type Six = Suc (Suc Four)
> type Eight = Suc (Suc Six)
> type Ten = Suc (Suc Eight)
> type Thirteen = Suc (Suc (Suc Ten))
>
> -- Singleton Nat-indexed natural numbers, to connect value-level and
> -- type-level Nats
> data SNat :: Nat -> * where
>   SZ :: SNat Z
>   SS :: Natural n => SNat n -> SNat (Suc n)
>
> -- A class for converting type-level nats to value-level ones
> class Natural n where
>   toSNat :: SNat n
>
> instance Natural Z where
>   toSNat = SZ
>
> instance Natural n => Natural (Suc n) where
>   toSNat = SS toSNat
>
> -- A function for turning explicit nat evidence into implicit
> natty :: SNat n -> (Natural n => r) -> r
> natty SZ     r = r
> natty (SS n) r = natty n r
>
> -- The usual canonical finite type. Fin n has exactly n
> -- (non-bottom) values.
> data Fin :: Nat -> * where
>   FZ :: Fin (Suc n)
>   FS :: Fin n -> Fin (Suc n)
>
> finToInt :: Fin n -> Int
> finToInt FZ = 0
> finToInt (FS n) = 1 + finToInt n
>
> deriving instance Eq (Fin n)
Next, a type class to represent finiteness. For our purposes, a type a is finite if we can explicitly list its elements. For convenience we throw in decidable equality as well, since we will usually need that in conjunction. Of course, we have to be careful: although we can get a list of elements for a finite type, we don’t want to depend on the ordering. We must ensure that the output of the algorithm is independent of the order of elements.^{1} This is in fact true, although somewhat nontrivial to prove formally; I mention some of the intuitive ideas behind the proof below.
While we are at it, we give Finite instances for Fin n and for products of finite types.
> class Eq a => Finite a where
>   universe :: [a]
>
> instance Natural n => Finite (Fin n) where
>   universe = fins toSNat
>
> fins :: SNat n -> [Fin n]
> fins SZ     = []
> fins (SS n) = FZ : map FS (fins n)
>
> -- The product of two finite types is finite.
> instance (Finite a, Finite b) => Finite (a,b) where
>   universe = [(a,b) | a <- universe, b <- universe]
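As a small illustration of what such a class buys us, here is a standalone sketch (it redeclares the class so that it can be loaded on its own). The Bool instance and the helpers extEq and isInjective are assumptions added for illustration and are not part of the post's code. Both helpers quantify over every element of universe, so their answers do not depend on the order in which universe lists the elements.

```haskell
-- Standalone copy of the class, so this sketch loads by itself.
class Eq a => Finite a where
  universe :: [a]

-- Hypothetical extra instance, for illustration only.
instance Finite Bool where
  universe = [False, True]

-- The product of two finite types is finite.
instance (Finite a, Finite b) => Finite (a, b) where
  universe = [(a, b) | a <- universe, b <- universe]

-- Extensional equality: two functions agree on every input.
extEq :: (Finite a, Eq b) => (a -> b) -> (a -> b) -> Bool
extEq f g = all (\x -> f x == g x) universe

-- Injectivity: whenever two inputs have equal outputs, the inputs are equal.
isInjective :: (Finite a, Eq b) => (a -> b) -> Bool
isInjective f = and [x == y | x <- universe, y <- universe, f x == f y]
```

For example, isInjective not holds, while isInjective (const True :: Bool -> Bool) does not, and universe :: [(Bool, Bool)] has four elements.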
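Now we come to the division algorithm proper. The idea is that panGalacticPred turns an injection $A \times [n+1] \hookrightarrow B \times [n+1]$ into an injection $A \times [n] \hookrightarrow B \times [n]$ (where $[n]$ is the canonical finite set of size $n$), and then we use induction on $n$ to repeatedly apply panGalacticPred until we get an injection $A \times [1] \hookrightarrow B \times [1]$.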
> panGalacticDivision
>   :: forall a b n. (Finite a, Eq b)
>   => SNat n -> ((a, Fin (Suc n)) -> (b, Fin (Suc n))) -> (a -> b)
In the base case, we are given an injection $A \times [1] \hookrightarrow B \times [1]$, so we just pass a unit value in along with the $A$ and project out the $B$.
> panGalacticDivision SZ f = \a -> fst (f (a, FZ))
In the inductive case, we call panGalacticPred
and recurse.
> panGalacticDivision (SS n') f = panGalacticDivision n' (panGalacticPred n' f)
And now for the real meat of the algorithm, the panGalacticPred function. The idea is that we swap outputs around until the function has the property that every output of the form $(b, 0)$ corresponds to an input also of the form $(a, 0)$. That is, using the card game analogy, every spade in play should be in the leftmost spot (the spades spot) of some player’s hand (some spades can also be in the deck). Then simply dropping the leftmost card in everyone’s hand (and all the spades in the deck) yields a game with no spades. That is, we will have an injection $A \times \{1, \dots, n\} \hookrightarrow B \times \{1, \dots, n\}$. Taking predecessors everywhere (i.e. “hearts are the new spades”) yields the desired injection $A \times [n] \hookrightarrow B \times [n]$, where $[n] = \{0, \dots, n-1\}$.
We need a Finite
constraint on a
so that we can enumerate all possible inputs to the function, and an Eq
constraint on b
so that we can compare functions for extensional equality (we iterate until reaching a fixed point). Note that whether two functions are extensionally equal does not depend on the order in which we enumerate their inputs, so far validating my claim that nothing depends on the order of elements returned by universe
.
> panGalacticPred
>   :: (Finite a, Eq b, Natural n)
>   => SNat n
>   -> ((a, Fin (Suc (Suc n))) -> (b, Fin (Suc (Suc n))))
>   -> ((a, Fin (Suc n)) -> (b, Fin (Suc n)))
We construct a function f'
which is related to f
by a series of swaps, and has the property that it only outputs FZ
when given FZ
as an input. So given (a,i)
we can call f'
on (a, FS i)
which is guaranteed to give us something of the form (b, FS j)
. Thus it is safe to strip off the FS
and return (b, j)
(though the Haskell type checker most certainly does not know this, so we just have to tell it to trust us).
> panGalacticPred n f = \(a,i) -> second unFS (f' (a, FS i))
>   where
>     unFS :: Fin (Suc n) -> Fin n
>     unFS FZ = error "impossible!"
>     unFS (FS i) = i
To construct f' we iterate a certain transformation until reaching a fixed point. For finite sets $A$ and $B$ this is guaranteed to terminate, though it is certainly not obvious from the Haskell code. (Encoding this in Agda so that it is accepted by the termination checker would be a fun (?) exercise.)
One round of the algorithm consists of two phases called “shape up” and “ship out” (to be described shortly).
>     oneRound = natty n $ shipOut . shapeUp
>
>     -- iterate 'oneRound' beginning with the original function...
>     fs = iterate oneRound f
>     -- ... and stop when we reach a fixed point.
>     f' = fst . head . dropWhile (uncurry (=/=)) $ zip fs (tail fs)
>     -- (=/=) is extensional inequality: True if the functions differ
>     -- on at least one input.
>     f1 =/= f2 = any (\x -> f1 x /= f2 x) universe
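The "iterate until nothing changes" pattern can be isolated from the card game. The following standalone sketch uses hypothetical helper names (fixpointBy, sameOn) that do not appear in the post's code; like the iteration above, it only terminates if a fixed point is actually reached.

```haskell
-- Apply 'step' repeatedly until the result stops changing, as judged
-- by the supplied notion of sameness.
fixpointBy :: (a -> a -> Bool) -> (a -> a) -> a -> a
fixpointBy same step x
  | same x x' = x
  | otherwise = fixpointBy same step x'
  where
    x' = step x

-- Extensional equality of two functions over an explicitly listed
-- finite domain. The answer does not depend on the order of 'dom'.
sameOn :: Eq b => [a] -> (a -> b) -> (a -> b) -> Bool
sameOn dom g h = all (\x -> g x == h x) dom

-- Toy example on functions: keep halving the outputs until the
-- function no longer changes on the domain [0 .. 5].
halved :: Int -> Int
halved = fixpointBy (sameOn [0 .. 5]) (\g -> (`div` 2) . g) id
```

On plain numbers, fixpointBy (==) (`div` 2) (100 :: Int) walks 100, 50, 25, 12, 6, 3, 1, 0 and stops at 0; halved likewise ends up sending every element of [0 .. 5] to 0.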
Recall that a “card” is a pair of a value and a suit; we think of $B$ as the set of values and $[n]$ as the set of suits.
> type Card v s = (v, s)
>
> value :: Card v s -> v
> value = fst
>
> suit :: Card v s -> s
> suit = snd
Again, there are a number of players (one for each element of $A$), each of which has a “hand” of cards. A hand has a number of “spots” for cards, each one labelled by a different suit (which may not have any relation to the actual suit of the card in that position).
> type PlayerSpot p s = (p, s)
> type Hand v s = s -> Card v s
A “game” is an injective function from player spots to cards. Of course, the type system is not enforcing injectivity here.
> type Game p v s = PlayerSpot p s -> Card v s
Some utility functions. First, a function to project out the hand of a given player.
> hand :: p -> Game p v s -> Hand v s
> hand p g = \s -> g (p, s)
A function to swap two cards, yielding a bijection on cards.
> swap :: (Eq s, Eq v) => Card v s -> Card v s -> (Card v s -> Card v s)
> swap c1 c2 = f
>   where
>     f c
>       | c == c1   = c2
>       | c == c2   = c1
>       | otherwise = c
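A few properties of swap are worth checking on a tiny deck, since the algorithm leans on them: each swap is its own inverse, and two swaps on disjoint pairs of cards commute, while swaps that share a card need not. This is a standalone sketch; the deck and the card names x, y, u, w are made up for illustration.

```haskell
type Card v s = (v, s)

-- Same behaviour as the swap in the post, written with three arguments.
swap :: (Eq s, Eq v) => Card v s -> Card v s -> (Card v s -> Card v s)
swap c1 c2 c
  | c == c1   = c2
  | c == c2   = c1
  | otherwise = c

-- A hypothetical six-card deck: three ranks, two suits.
deck :: [Card Char Int]
deck = [(r, s) | r <- "A23", s <- [0, 1]]

x, y, u, w :: Card Char Int
x = ('A', 0)
y = ('2', 1)
u = ('3', 0)
w = ('3', 1)

-- Swapping twice is the identity on every card in the deck.
involutive :: Bool
involutive = all (\c -> (swap x y . swap x y) c == c) deck

-- Swaps on disjoint pairs commute.
disjointCommute :: Bool
disjointCommute =
  all (\c -> (swap x y . swap u w) c == (swap u w . swap x y) c) deck

-- Swaps sharing the card y do not commute: the two orders disagree on x.
overlapDiffer :: Bool
overlapDiffer = (swap x y . swap y u) x /= (swap y u . swap x y) x
```

All three of involutive, disjointCommute, and overlapDiffer evaluate to True.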
leftmost
finds the leftmost card in a player’s hand which has a given suit.
> leftmost :: Finite s => s -> Hand v s -> Maybe s
> leftmost targetSuit h = find (\s -> suit (h s) == targetSuit) universe
playRound
abstracts out a pattern that is used by both shapeUp
and shipOut
. The first argument is a function which, given a hand, produces a function on cards; that is, based on looking at a single hand, it decides how to swap some cards around.^{2} playRound
then applies that function to every hand, and composes together all the resulting permutations.
Note that playRound has both Finite s and Finite p constraints, so we should think about whether the result depends on the order of elements returned by any call to universe (I claimed it does not). Finite s corresponds to suits/spots, which corresponds to the canonical finite set $[n]$ in the original problem formulation. $[n]$ explicitly has a canonical ordering, so this is not a problem. The Finite p constraint, on the face of it, is more problematic. We will have to think carefully about each of the rounds implemented in terms of playRound and make sure they do not depend on the order of players. Put another way, it should be possible for all the players to take their turn simultaneously.
> playRound :: (Finite s, Finite p, Eq v) => (Hand v s -> Card v s -> Card v s) -> Game p v s -> Game p v s
> playRound withHand g = foldr (.) id swaps . g
>   where
>     swaps = map (withHand . flip hand g) players
>     players = universe
Finally, we can describe the “shape up” and “ship out” phases, beginning with “shape up”. A “bad” card is defined as one having the lowest suit; make sure every hand with any bad cards has one in the leftmost spot (by swapping the leftmost bad card with the card in the leftmost spot, if necessary).
> shapeUp :: (Finite s, Finite p, Eq v) => Game p v s -> Game p v s
> shapeUp = playRound shapeUp1
>   where
>     badSuit = head universe
>     shapeUp1 theHand =
>       case leftmost badSuit theHand of
>         Nothing      -> id
>         Just badSpot -> swap (theHand badSuit) (theHand badSpot)
And now for the “ship out” phase. Send any “bad” cards not in the leftmost spot somewhere else, by swapping with a replacement, namely, the card whose suit is the same as the suit of the spot, and whose value is the same as the value of the bad card in the leftmost spot. The point is that bad cards in the leftmost spot are OK, since we will eventually just ignore the leftmost spot. So we have to keep shipping out bad cards not in the leftmost spot until they all end up in the leftmost spot. For some intuition as to why this is guaranteed to terminate, consult Schwartz; note that columns tend to acquire more and more cards that have the same rank as a spade in the top spot (which never moves).
> shipOut :: (Finite s, Finite p, Eq v) => Game p v s -> Game p v s
> shipOut = playRound shipOutHand
>   where
>     badSuit = head universe
>     spots = universe
>     shipOutHand theHand = foldr (.) id swaps
>       where
>         swaps = map (shipOut1 . (theHand &&& id)) (drop 1 spots)
>         shipOut1 ((_,s), spot)
>           | s == badSuit = swap (theHand spot) (value (theHand badSuit), spot)
>           | otherwise    = id
And that’s it! Note that both shapeUp
and shipOut
are implemented by composing a bunch of swaps; in fact, in both cases, all the swaps commute, so the order in which they are composed does not matter. (For proof, see Schwartz.) Thus, the result is independent of the order of the players (i.e. the set A
).
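For readers who want to run the whole pipeline without the type-level machinery, here is a simplified standalone sketch. It is not the code above: suits are plain Ints (0 plays the role of spades), the list of players is passed explicitly, and the names divide and example are assumptions made for illustration. The example deal is a hypothetical two-player, two-suit game in which both players' spade spots initially hold cards of rank B2.

```haskell
import Data.List (find)

type Card v = (v, Int)              -- (rank, suit); suit 0 is the "bad" suit
type Game p v = (p, Int) -> Card v  -- (player, spot) to the card dealt there

swap :: Eq v => Card v -> Card v -> Card v -> Card v
swap c1 c2 c
  | c == c1   = c2
  | c == c2   = c1
  | otherwise = c

-- Each player inspects their own hand and proposes a permutation of cards;
-- all proposals are composed and applied to the deal.
playRound :: [p] -> ((Int -> Card v) -> Card v -> Card v) -> Game p v -> Game p v
playRound players withHand g =
  foldr (.) id [withHand (\s -> g (p, s)) | p <- players] . g

-- Shape up: bring the leftmost bad card (if any) to spot 0.
shapeUp :: Eq v => Int -> [p] -> Game p v -> Game p v
shapeUp n players = playRound players $ \h ->
  case find (\s -> snd (h s) == 0) [0 .. n - 1] of
    Nothing -> id
    Just s  -> swap (h 0) (h s)

-- Ship out: swap each bad card outside spot 0 with its replacement.
shipOut :: Eq v => Int -> [p] -> Game p v -> Game p v
shipOut n players = playRound players $ \h ->
  foldr (.) id [swap (h s) (fst (h 0), s) | s <- [1 .. n - 1], snd (h s) == 0]

-- Divide by n: iterate rounds to a fixed point, drop spot 0, renumber, recurse.
divide :: Eq v => Int -> [p] -> Game p v -> p -> v
divide n players g
  | n <= 1    = \p -> fst (g (p, 0))
  | otherwise = divide (n - 1) players g''
  where
    spots  = [(p, s) | p <- players, s <- [0 .. n - 1]]
    rounds = iterate (shipOut n players . shapeUp n players) g
    differ g1 g2 = any (\x -> g1 x /= g2 x) spots
    g' = fst . head . dropWhile (uncurry differ) $ zip rounds (tail rounds)
    g'' (p, s) = let (v, t) = g' (p, s + 1) in (v, t - 1)

-- Hypothetical example deal.
data Player = A1 | A2 deriving (Eq, Show)
data Rank = B1 | B2 | B3 deriving (Eq, Show)

example :: Game Player Rank
example (A1, 0) = (B2, 1)
example (A1, _) = (B1, 0)
example (A2, 0) = (B2, 0)
example (A2, _) = (B3, 1)
```

On this deal, divide 2 [A1, A2] example sends A1 to B2 and A2 to B3, and listing the players as [A2, A1] gives the same function.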
Enough code, let’s see an example! This example is taken directly from Doyle and Qiu’s paper, and the diagrams are being generated literally (literately?) by running the code in this blog post. Here’s the starting configuration:
Again, the spades are all highlighted in green. Recall that our goal is to get them all to be in the first row, but we have to do it in a completely deterministic, canonical way. After shaping up, we have:
Notice how the 6, K, 5, A, and 8 of spades have all been swapped to the top of their column. However, there are still spades which are not at the top of their column (in particular the 10, 9, and J) so we are not done yet.
Now, we ship out. For example, the 10 of spades is in the diamonds position in the column with the Ace of spades, so we swap it with the Ace of diamonds. Similarly, we swap the 9 of spades with the Queen of diamonds, and the Jack of spades with the 4 of hearts.
Shaping up does nothing at this point so we ship out again, and then continue to alternate rounds.
In the final deal above, all the spades are at the top of a column, so there is an injection from the set of all non-spade spots to the deck of cards with all spades removed. This example was, I suspect, carefully constructed so that none of the spades get swapped out into the undealt portion of the deck, and so that we end up with only spades in the top row. In general, we might end up with some non-spades also in the top row, but that’s not a problem. The point is that ignoring the top row gets rid of all the spades.
Anyway, I hope to write more about some “practical” examples and about what this has to do with combinatorial species, but this post is long enough already. Doyle and Qiu also describe a “short division” algorithm (the above is “long division”) that I hope to explore as well.
For completeness, here’s the code I used to represent the example game above, and to render all the card diagrams (using diagrams 1.3).
> type Suit = Fin
> type Rank = Fin
> type Player = Fin
>
> readRank :: SNat n -> Char -> Rank n
> readRank n c = fins n !! (fromJust $ findIndex (==c) "A23456789TJQK")
>
> readSuit :: SNat n -> Char -> Suit n
> readSuit (SS _) 'S' = FZ
> readSuit (SS (SS _)) 'H' = FS FZ
> readSuit (SS (SS (SS _))) 'D' = FS (FS FZ)
> readSuit (SS (SS (SS (SS _)))) 'C' = FS (FS (FS FZ))
>
> readGame :: SNat a -> SNat b -> SNat n -> String -> Game (Player a) (Rank b) (Suit n)
> readGame a b n str = \(p, s) -> table !! finToInt p !! finToInt s
>   where
>     table = transpose . map (map readCard . words) . lines $ str
>     readCard [r,s] = (readRank b r, readSuit n s)
>
> -- Example game from Doyle & Qiu
> exampleGameStr :: String
> exampleGameStr = unlines
>   [ "4D 6H QD 8D 9H QS 4C AD 6C 4S"
>   , "JH AH 9C 8H AS TC TD 5H QC JS"
>   , "KC 6S 4H 6D TS 9S JC KD 8S 8C"
>   , "5C 5D KS 5S TH JD AC QH 9D KH"
>   ]
>
> exampleGame :: Game (Player Ten) (Rank Thirteen) (Suit Four)
> exampleGame = readGame toSNat toSNat toSNat exampleGameStr
>
> suitSymbol :: Suit n -> String
> suitSymbol = (:[]) . ("♠♥♦♣"!!) . finToInt -- Huzzah for Unicode
>
> suitDia :: Suit n -> Diagram B
> suitDia = (suitDias!!) . finToInt
>
> suitDias = map mkSuitDia (fins (toSNat :: SNat Four))
> mkSuitDia s = text' (suitSymbol s) # fc (suitColor s) # lw none
>
> suitColor :: Suit n -> Colour Double
> suitColor n
>   | finToInt n `elem` [0,3] = black
>   | otherwise               = red
>
> rankStr :: Rank n -> String
> rankStr n = rankStr' (finToInt n + 1)
>   where
>     rankStr' 1 = "A"
>     rankStr' i | i <= 10   = show i
>                | otherwise = ["JQK" !! (i - 11)]
>
> text' t = stroke (textSVG' (TextOpts lin INSIDE_H KERN False 1 1) t)
>
> renderCard :: (Rank b, Suit n) -> Diagram B
> renderCard (r, s) = mconcat
>   [ mirror label
>   , cardContent (finToInt r + 1)
>   , back
>   ]
>   where
>     cardWidth   = 2.25
>     cardHeight  = 3.5
>     cardCorners = 0.1
>     mirror d = d <> d # rotateBy (1/2)
>     back  = roundedRect cardWidth cardHeight cardCorners # fc white
>       # lc (case s of { FZ -> green; _ -> black })
>     label = vsep 0.1 [text' (rankStr r), text' (suitSymbol s)]
>       # scale 0.6 # fc (suitColor s) # lw none
>       # translate ((-0.9) ^& 1.5)
>     cardContent n
>       | n <= 10   = pips n
>       | otherwise = face n # fc (suitColor s) # lw none
>                            # sized (mkWidth (cardWidth * 0.6))
>     pip = suitDia s # scale 1.1
>     pips 1 = pip # scale 2
>     pips 2 = mirror (pip # up 2)
>     pips 3 = pips 2 <> pip
>     pips 4 = mirror (pair pip # up 2)
>     pips 5 = pips 4 <> pip
>     pips 6 = mirror (pair pip # up 2) <> pair pip
>     pips 7 = pips 6 <> pip # up 1
>     pips 8 = pips 6 <> mirror (pip # up 1)
>     pips 9 = mirror (pair (pip # up (2/3) <> pip # up 2)) <> pip # up (case finToInt s of {1 -> -0.1; 3 -> 0; _ -> 0.1})
>     pips 10 = mirror (pair (pip # up (2/3) <> pip # up 2) <> pip # up (4/3))
>     pips _ = mempty
>     up n = translateY (0.5*n)
>     pair d = hsep 0.4 [d, d] # centerX
>     face 11 = squares # frame 0.1
>     face 12 = loopyStar
>     face 13 = burst # centerXY
>     squares
>       = strokeP (mirror (square 1 # translate (0.2 ^& 0.2)))
>       # fillRule EvenOdd
>     loopyStar
>       = regPoly 7 1
>       # star (StarSkip 3)
>       # pathVertices
>       # map (cubicSpline True)
>       # mconcat
>       # fillRule EvenOdd
>     burst
>       = [(1,5), (1,-5)] # map r2 # fromOffsets
>       # iterateN 13 (rotateBy (-1/13))
>       # mconcat # glueLine
>       # strokeLoop
>
> renderGame :: (Natural n, Natural a) => Game (Player a) (Rank b) (Suit n) -> Diagram B
> renderGame g = hsep 0.5 $ map (\p -> renderHand p $ hand p g) universe
>
> renderHand :: Natural n => Player a -> Hand (Rank b) (Suit n) -> Diagram B
> renderHand p h = vsep 0.2 $ map (renderCard . h) universe
If we could program in Homotopy Type Theory, we could make this very formal by using the notion of cardinal-finiteness developed in my dissertation (see section 2.4).↩
In practice this function on cards will always be a permutation, though the Haskell type system is not enforcing that at all. An early version of this code used the Iso type from lens, but it wasn’t really paying its way.↩
Polynomial Functors Constrained by Regular Expressions
Here’s the 2-minute version: certain operations or restrictions on functors can be described by regular expressions, where the elements of the alphabet correspond to type arguments. The idea is to restrict to only those structures for which an inorder traversal yields a sequence of types matching the regular expression. For example, $(aa)^*$ gives you even-size things; $a^*ha^*$ gives you the derivative (the structure has a bunch of values of type $a$, a single hole of type $h$, and then more values of type $a$), and $b^*ha^*$ the dissection.
The punchline is that we show how to use the machinery of semirings, finite automata, and some basic matrix algebra to automatically derive an algebraic description of any functor constrained by any regular expression. This gives a nice unified way to view differentiation and dissection; we also draw some connections to the theory of divided differences.
I’m still open to discussion, suggestions, typo fixes, etc., though at this point they won’t make it into the proceedings. There’s certainly a lot more that could be said or ways this could be extended further.
For starters, here are a few of the major events that have happened in the meantime, that readers of this blog might care about:
Some statistics^{2}:
[I was planning to also make a visualization of my TagTime data showing when I was sleeping, working, or not-working, but putting together the video and this blog post has taken long enough already! Perhaps I’ll get around to it later.]
Overall, I would call the experiment a huge success—although as you can see, I was a full 2.5 hours per day off my target of 13.5 hours of productive work each day. What with eating, showering, making lunch, getting dinner, taking breaks (both intentional breaks as well as slacking off), and a few miscellaneous things I had to take care of like taking the car to get the tire pressure adjusted… it all adds up surprisingly fast. I think this was one of the biggest revelations for me; going into it I thought 3 hours of not-work per day was extremely generous. I now think three hours of not-work per day is probably within reach for me but would be extremely difficult, and would probably require things like planning out meals ahead of time. In any case, 55 hours of actual, focused work is still fantastic.
Some random observations/thoughts:
Having multiple projects to work on was really valuable; when I got tired of working on one thing I could often just switch to something else instead of taking an actual break. I can imagine this might be different if I were working on a big coding project (as most of the other maniac weeks have been). The big project would itself provide multiple different subtasks to work on, but more importantly, coding provides immediate feedback that is really addictive. Code a new feature, and you can actually run the new code! And it does something cool! That it didn’t do before! In contrast, when I write another page of my dissertation I just have… another page of my dissertation. I am, in fact, relatively excited about my dissertation, but it can’t provide that same sort of immediate reinforcing feedback, and it was difficult to keep going at times.
I found that having music playing really helped me get into a state of “flow”. The first few days I would play some album and then it would stop and I wouldn’t think to put on more. Later in the week I would just queue up many hours of music at a time and that worked great.
I was definitely feeling worn out by the end of the week—the last two days in particular, it felt a lot harder to get into a flow. I think I felt so good the first few days that I became overconfident—which is good to keep in mind if I do this again. The evening of 12 August was particularly bad; I just couldn’t focus. It might have been better in the long run to just go home and read a book or something; I’m just not sure how to tell in the moment when I should push through and when it’s better to cut my losses.
Blocking Facebook, turning off email notifications, etc. was really helpful. I did end up allowing myself to check email using my phone (I edited the rules a few hours before I started) and I think it was a good idea—I ended up still needing to communicate with some people, so it was very convenient and not too distracting.
Note there are two places on Tuesday afternoon where you can see the clock jump ahead by an hour or so; of course those are times when I turned off the recording. One corresponded to a time when I needed to read and write some sensitive emails; during the other, I was putting student pictures into an anki deck, and turned off the recording to avoid running afoul of FERPA.
That’s all I can think of for now; questions or comments, of course, are welcome.
Some technical notes (don’t try this at home; see http://expost.padm.us/maniactech for some recommendations on making your own timelapse). To record and create the video I used a homegrown concoction of scrot, streamer, ImageMagick, ffmpeg, with some zsh and Haskell scripts to tie it all together, and using diagrams to generate the clock and tag displays. I took about 3GB worth of raw screenshots, and it takes probably about a half hour to process all of it into a video.↩
These statistics are according to TagTime, i.e. gathered via random sampling, so there is a bit of inherent uncertainty. I leave it as an exercise for the reader to calculate the proper error bars on these times (given that I use a standard ping interval of 45 minutes).↩
Computed as 74/(171 – 9) pings multiplied by 24 hours; 9 pings occurred on Sunday morning which I did not count as part of the maniac week.↩
This is somewhat inflated by Saturday night/Sunday morning, when I both slept in and got a higher-than-average number of pings; the average excluding that night is 6.75 hours, which sounds about right.↩
Over the past year I’ve had several people say things along the lines of, “let me know if you want me to read through your thesis”. I never took them all that seriously (it’s easy to say you are willing to read a 200-page document…), but it never hurts to ask, right?
My thesis defense is scheduled for October 14, and I’m currently undertaking a massive writing/editing push to try to get as much of it wrapped up as I can before classes start on September 4. So, if there’s anyone out there actually interested in reading a draft and giving feedback, now is your chance!
The basic idea of my dissertation is to put combinatorial species and related variants (including a port of the theory to HoTT) in a common categorical framework, and then be able to use them for working with/talking about data types. If you’re brave enough to read it, you’ll find lots of category theory and type theory, and very little code—but I can promise lots of examples and pretty pictures. I’ve tried to make it somewhat self-contained, so it may be a good way to learn a bit of category theory or homotopy type theory, if you’ve been curious to learn more about those topics.
You can find the latest draft here (auto-updated every time I commit); more generally, you can find the git repo here. If you notice any typos or grammatical errors, feel free to open a pull request. For anything more substantial—thoughts on the organization, notes or questions about things you found confusing, suggestions for improvement, pointers to other references—please send me an email (first initial last name at gmail). And finally, please send me any feedback by September 9 at the latest (but the earlier the better). I need to have a final version to my committee by September 23.
Last but not least, if you’re interested to read it but don’t have the time or inclination to provide feedback on a draft, never fear—I’ll post an announcement when the final version is ready for your perusal!
Here are the rules:
And no, I’m not crazy. You (yes, you) could do this too.
In my previous post, we considered the “Axiom of Protoequivalence” (that is, the statement that every fully faithful, essentially surjective functor, i.e. every protoequivalence, is an equivalence), and I claimed that in a traditional setting this is equivalent to the axiom of choice. However, intuitively it feels like AP “ought to” be true, whereas AC must be rejected in constructive logic.
One way around this is by generalizing functors to anafunctors, which were introduced by Makkai (1996). The original paper is difficult going, since it is full of tons of detail, poorly typeset, and can only be downloaded as seven separate postscript files. There is also quite a lot of legitimate depth to the paper, which requires significant categorical sophistication (more than I possess) to fully understand. However, the basic ideas are not too hard to grok, and that’s what I will present here.
It’s important to note at the outset that anafunctors are much more than just a technical device enabling the Axiom of Protoequivalence. More generally, if everything in category theory is supposed to be done “up to isomorphism”, it is a bit suspect that functors have to be defined for objects on the nose. Anafunctors can be seen as a generalization of functors, where each object in the source category is sent not just to a single object, but to an entire isomorphism class of objects, without privileging any particular object in the class. In other words, anafunctors are functors whose “values are specified only up to unique isomorphism”.
Such functors represent a many-to-many relationship between objects of $\mathbb{C}$ and objects of $\mathbb{D}$. Normal functors, as with any function, may of course map multiple objects of $\mathbb{C}$ to the same object in $\mathbb{D}$. The novel aspect is the ability to have a single object of $\mathbb{C}$ correspond to multiple objects of $\mathbb{D}$. The key idea is to add a class of “specifications” which mediate the relationship between objects in the source and target categories, in exactly the same way that a “junction table” must be added to support a many-to-many relationship in a database schema, as illustrated below:
On the left is a many-to-many relation between a set of shapes and a set of numbers. On the right, this relation has been mediated by a “junction table” containing a set of “specifications” (in this case, each specification is simply a pair of a shape and a number), together with two mappings (one-to-many relations) from the specifications to both of the original sets, such that a specification maps to a given shape and a given number if and only if that shape and that number were originally related.
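The junction-table idea can be sketched in a few lines of Haskell. This is only an illustration: the `Shape` type, the particular pairs in `relation`, and the names `shapeOf`, `numberOf`, `related`, and `numbersFor` are all made up for this sketch and are not taken from the picture.

```haskell
-- Hypothetical example data; the shapes and numbers are illustrative only.
data Shape = Circle | Square | Triangle
  deriving (Eq, Show)

-- The original many-to-many relation, as a list of related pairs.
relation :: [(Shape, Int)]
relation = [(Circle, 1), (Circle, 2), (Square, 2), (Triangle, 3)]

-- The junction table: each specification is simply a related pair.
type Spec = (Shape, Int)

specs :: [Spec]
specs = relation

-- The two mappings out of the set of specifications.
shapeOf :: Spec -> Shape
shapeOf = fst

numberOf :: Spec -> Int
numberOf = snd

-- A shape and a number are related iff some specification maps to both.
related :: Shape -> Int -> Bool
related x n = any (\s -> shapeOf s == x && numberOf s == n) specs

-- All numbers corresponding to a given shape (possibly several).
numbersFor :: Shape -> [Int]
numbersFor x = [numberOf s | s <- specs, shapeOf s == x]
```

The point of the indirection is that each of the two mappings is an ordinary function, even though the relation they jointly encode is many-to-many.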
In particular, an anafunctor is defined as follows.
, , and together define a many-to-many relationship between objects of and objects of . is called a specified value of at if there is some specification such that and , in which case we write . Moreover, is a value of at (not necessarily a specified one) if there is some for which .
The idea now is to impose additional conditions which ensure that “acts like” a regular functor .
Our initial intuition was that an anafunctor should map objects of to isomorphism classes of objects in . This may not be immediately apparent from the definition, but is in fact the case. In particular, the identity morphism maps to isomorphisms between specified values of ; that is, under the action of an anafunctor, an object together with its identity morphism “blow up” into an isomorphism class (aka a clique). To see this, let be two different specifications corresponding to , that is, . Then by preservation of composition and identities, we have , so and constitute an isomorphism between and .
There is an alternative, equivalent definition of anafunctors, which is somewhat less intuitive but usually more convenient to work with: an anafunctor $F : \mathbb{C} \to \mathbb{D}$ is a category $\mathbb{S}$ of specifications together with a span of functors $\mathbb{C} \xleftarrow{\sigma} \mathbb{S} \xrightarrow{\tau} \mathbb{D}$ where $\sigma$ is fully faithful and (strictly) surjective on objects.
Note that in this definition, $\sigma$ must be strictly (as opposed to essentially) surjective on objects, that is, for every $C \in \mathbb{C}$ there is some $S \in \mathbb{S}$ such that $\sigma(S) = C$, rather than only requiring $\sigma(S) \cong C$. Given this strict surjectivity on objects, it is equivalent to require $\sigma$ to be full, as in the definition above, or to be (strictly) surjective on the class of all morphisms.
We are punning on notation a bit here: in the original definition of anafunctor, is a set and and are functions on objects, whereas in this more abstract definition is a category and and are functors. Of course, the two are closely related: given a span of functors , we may simply take the objects of as the class of specifications , and the actions of the functors and on objects as the functions from specifications to objects of and . Conversely, given a class of specifications and functions and , we may construct the category with and with morphisms in acting as morphisms in . From to , we construct the functor given by on objects and the identity on morphisms, and the other functor maps in to in .
Every functor can be trivially turned into an anafunctor . Anafunctors also compose. Given compatible anafunctors and , consider the action of their composite on objects: each object of may map to multiple objects of , via objects of . Each such mapping corresponds to a zig-zag path . In order to specify such a path it suffices to give the pair , which determines , , and . Note, however, that not every pair in corresponds to a valid path, but only those which agree on the middle object . Thus, we may take as the set of specifications for the composite , with and . On morphisms, . It is not hard to check that this satisfies the anafunctor laws.
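The object-level part of this composition can be sketched in Haskell. This is a simplified model under stated assumptions: specifications are kept in a finite list, morphisms are ignored entirely, and the names `Ana`, `specsOf`, `source`, `target`, `compose`, and `values` are hypothetical, chosen for this sketch only.

```haskell
-- An object-level view of an anafunctor: a collection of specifications
-- together with maps to source and target objects.
-- (Assumption: morphisms are ignored in this sketch.)
data Ana s c d = Ana
  { specsOf :: [s]      -- the specifications
  , source  :: s -> c   -- which source object a specification belongs to
  , target  :: s -> d   -- which target object it specifies
  }

-- Specifications of the composite are pairs of specifications that
-- agree on the middle object.
compose :: Eq d => Ana s c d -> Ana t d e -> Ana (s, t) c e
compose f g = Ana
  { specsOf = [ (s, t) | s <- specsOf f, t <- specsOf g
                       , target f s == source g t ]
  , source  = source f . fst
  , target  = target g . snd
  }

-- All specified values at a given source object.
values :: Eq c => Ana s c d -> c -> [d]
values f c = [ target f s | s <- specsOf f, source f s == c ]
```

The filter in `compose` is where pairs that do not correspond to a valid zig-zag path are discarded.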
If you know what a pullback is, note that the same thing can also be defined at a higher level in terms of spans. , the category of all (small) categories, is complete, and in particular has pullbacks, so we may construct a new anafunctor from to by taking a pullback of and and then composing appropriately.
One can go on to define ananatural transformations between anafunctors, and show that together these constitute a 2-category $\mathbf{AnaCat}$ which is analogous to the usual 2-category $\mathbf{Cat}$ of (small) categories, functors, and natural transformations; in particular, there is a fully faithful embedding of $\mathbf{Cat}$ into $\mathbf{AnaCat}$, which moreover is an equivalence if AC holds.
To work in category theory based on set theory and classical logic, while avoiding AC, one is therefore justified in “mixing and matching” functors and anafunctors as convenient, but discussing them all as if they were regular functors (except when defining a particular anafunctor). Such usage can be formalized by turning everything into an anafunctor, and translating functor operations and properties into corresponding operations and properties of anafunctors.
However, as I will argue in some future posts, there is a better solution, which is to throw out set theory as a foundation of category theory and start over with homotopy type theory. In that case, thanks to a generalized notion of equality, regular functors act like anafunctors, and in particular AP holds.
Makkai, Michael. 1996. “Avoiding the Axiom of Choice in General Category Theory.” Journal of Pure and Applied Algebra 108 (2). Elsevier: 109–73.
In my previous post, I explained one place where the axiom of choice often shows up in category theory, namely, when defining certain functors whose action on objects is specified only up to unique isomorphism. In this post, I’ll explain another place AC shows up, when talking about equivalence of categories. (Actually, as we’ll see, it’s really the same underlying issue, namely defining a functor whose action is specified only up to unique isomorphism; this is just a particularly important instantiation of that issue.)
When are two categories “the same”? In traditional category theory, founded on set theory, there are quite a few different definitions of “sameness” for categories. Ultimately, this comes down to the fact that set theory does not make a very good foundation for category theory! There are lots of different ideas of equivalence, and they often do not correspond to the underlying equality on sets, so one must carefully pick and choose which notions of equality to use in which situations (and some choices might be better than others!). Every concept, it seems, comes with “strict” and “weak” variants, and often many others besides. Maintaining the principle of equivalence requires hard work and vigilance.
As an example, consider the following definition, our first candidate for the definition of “sameness” of categories:
Two categories $\mathbb{C}$ and $\mathbb{D}$ are isomorphic if there are functors $F : \mathbb{C} \to \mathbb{D}$ and $G : \mathbb{D} \to \mathbb{C}$ such that $GF = 1_{\mathbb{C}}$ and $FG = 1_{\mathbb{D}}$.
Seems pretty straightforward, right? Well, this is the right idea in general, but it is subtly flawed. In fact, it is somewhat “evil”, in that it talks about equality of functors ($GF$ and $FG$ must be equal to the identity). However, two functors $H$ and $J$ can be isomorphic without being equal, if there is a natural isomorphism between them, that is, a pair of natural transformations $\phi : H \to J$ and $\psi : J \to H$ such that $\phi \circ \psi$ and $\psi \circ \phi$ are both equal to the identity natural transformation.^{1} For example, consider the Haskell functors given by
data Rose a = Node a [Rose a]
data Fork a = Leaf a | Fork (Fork a) (Fork a)
These are obviously not equal, but they are isomorphic, in the sense that there are natural transformations (i.e. polymorphic functions) rose2fork :: forall a. Rose a -> Fork a
and fork2rose :: forall a. Fork a -> Rose a
such that rose2fork . fork2rose === id
and fork2rose . rose2fork === id
(showing this is left as an exercise for the interested reader).
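For readers who would like to check their answer, here is one possible pair of witnesses. It is only a sketch of one solution (there may be others), and it assumes finite trees; the `deriving` clauses are added here purely so that the round trips can be tested.

```haskell
data Rose a = Node a [Rose a]
  deriving (Eq, Show)

data Fork a = Leaf a | Fork (Fork a) (Fork a)
  deriving (Eq, Show)

-- Peel the first child off a node: it becomes the left branch, and the
-- node with its remaining children becomes the right branch.
rose2fork :: Rose a -> Fork a
rose2fork (Node a [])       = Leaf a
rose2fork (Node a (t : ts)) = Fork (rose2fork t) (rose2fork (Node a ts))

-- The inverse: convert the right branch to a node, then push the
-- converted left branch onto the front of its list of children.
fork2rose :: Fork a -> Rose a
fork2rose (Leaf a)   = Node a []
fork2rose (Fork l r) =
  case fork2rose r of
    Node a ts -> Node a (fork2rose l : ts)
```

Both functions are polymorphic in `a` and never inspect the elements, which is what makes them natural transformations. That they are mutually inverse follows by a short induction on the structure of the tree.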
Here, then, is a better definition:
Categories $\mathbb{C}$ and $\mathbb{D}$ are equivalent if there are functors $F : \mathbb{C} \to \mathbb{D}$ and $G : \mathbb{D} \to \mathbb{C}$ which are inverse up to natural isomorphism, that is, there are natural isomorphisms $GF \cong 1_{\mathbb{C}}$ and $FG \cong 1_{\mathbb{D}}$.
So the compositions of the functors $F$ and $G$ do not literally have to be the identity functor, but only (naturally) isomorphic to it. This does turn out to be a well-behaved notion of sameness for categories (although you’ll have to take my word for it).
The story doesn’t end here, however. In set theory, a function is a bijection—that is, an isomorphism of sets—if and only if it is both injective and surjective. By analogy, one might wonder what properties a functor must have in order to be one half of an equivalence. This leads to the following definition:
$\mathbb{C}$ is proto-equivalent^{2} to $\mathbb{D}$ if there is a functor $F : \mathbb{C} \to \mathbb{D}$ which is full and faithful (i.e., a bijection on each hom-set) as well as essentially surjective, that is, for every object $D \in \mathbb{D}$ there exists some object $C \in \mathbb{C}$ such that $F(C) \cong D$.
Intuitively, this says that $F$ “embeds” an entire copy of $\mathbb{C}$ into $\mathbb{D}$ (that’s the “full and faithful” part), and that every object of $\mathbb{D}$ which is not directly in the image of $F$ is isomorphic to one that is. So every object of $\mathbb{D}$ is “included” in the image of $\mathbb{C}$, at least up to isomorphism (which, remember, is supposed to be all that matters).
So, are equivalence and protoequivalence the same thing? In one direction, it is not too hard to show that every equivalence is a protoequivalence: if $F$ and $G$ are inverse-up-to-natural-isomorphism, then they must be fully faithful and essentially surjective. It would be nice if the converse were also true: in that case, in order to prove two categories equivalent, it would suffice to construct a single functor $F$ from one to the other, and show that $F$ has the requisite properties. This often ends up being more convenient than explicitly constructing two functors and showing they are inverse. However, it turns out that the converse is provable only if one accepts the axiom of choice!
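To get an intuitive sense for why this is, suppose $F : \mathbb{C} \to \mathbb{D}$ is fully faithful and essentially surjective. To construct an equivalence between $\mathbb{C}$ and $\mathbb{D}$, we must define a functor $G : \mathbb{D} \to \mathbb{C}$ and show it is inverse to $F$ (up to natural isomorphism). However, to define $G$ we must give its action on each object $D \in \mathbb{D}$, that is, we must exhibit a function $\mathrm{Ob}\,\mathbb{D} \to \mathrm{Ob}\,\mathbb{C}$. We know that for each $D \in \mathbb{D}$ there exists some object $C \in \mathbb{C}$ such that $F(C) \cong D$. That is,
$$\{\, \{ C \in \mathbb{C} \mid F(C) \cong D \} \mid D \in \mathbb{D} \,\}$$
is a collection of non-empty sets. However, in a non-constructive logic, knowing these sets are nonempty does not actually give us any objects! Instead, we have to use the axiom of choice, which gives us a choice function $\mathrm{Ob}\,\mathbb{D} \to \mathrm{Ob}\,\mathbb{C}$, and we can use this function as the object mapping of the functor $G$.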
So AC is required to prove that every protoequivalence is an equivalence. In fact, the association goes deeper yet: it turns out that the statement “every protoequivalence is an equivalence” (let’s call this the Axiom of Protoequivalence, or AP for short) not only requires AC, but is equivalent to it—that is, you can also derive AC given AP as an axiom!
On purely intuitive grounds, however, I would wager that to (almost?) anyone with sufficient category theory experience, it “feels” like AP “ought to be” true. If there is a full, faithful, and essentially surjective functor $F : \mathbb{C} \to \mathbb{D}$, then $\mathbb{C}$ and $\mathbb{D}$ “ought to be” equivalent. The particular choice of functor $G : \mathbb{D} \to \mathbb{C}$ “doesn’t matter”, since it makes no difference up to isomorphism. On the other hand, we certainly don’t want to accept the axiom of choice. This puts us in the very awkward and inconsistent position of having two logically equivalent statements which we want to respectively affirm and reject. A fine pickle indeed! What to do?
There are four options (that I know of, at least):
This is a perfectly sensible and workable approach. It’s important to highlight, therefore, that the “problem” is in some sense more a philosophical problem than a technical one. One can perfectly well adopt the above solution and continue to do category theory; it just may not be the “nicest” (a philosophical rather than technical notion!) way to do it.
We can therefore also consider some more creative solutions!
In a classical setting, one can avoid AC and affirm (an analogue of) AP by generalizing the notion of functor to that of anafunctor (Makkai 1996). Essentially, an anafunctor is a functor “defined only up to unique isomorphism”. It turns out that the appropriate analogue of AP, where “functor” has been replaced by “anafunctor”, is indeed true—and neither requires nor implies AC. Anafunctors “act like” functors in a sufficiently strong sense that one can simply do category theory using anafunctors in place of functors. However, one also has to replace natural transformations with “ananatural transformations”, etc., and it quickly gets rather fiddly.
In a constructive setting, a witness of essential surjectivity is necessarily a function which gives an actual witness $C \in \mathbb{C}$, along with a proof that $F(C) \cong D$, for each $D \in \mathbb{D}$. In other words, a constructive witness of essential surjectivity is already a “choice function”, and an inverse functor $G$ can be defined directly, with no need to invoke AC and no need for anafunctors. So in constructive logic, AP is simply true. However, this version of “essential surjectivity” is rather strong, in that it forces you to make choices you might prefer not to make: for each $D$ there might be many isomorphic $C$ to choose from, with no “canonical” choice, and it is annoying (again, a philosophical rather than technical consideration!) to be forced to choose one.
Instead of generalizing functors, a more direct solution is to generalize the notion of equality. After all, what really seems to be at the heart of all these problems is differing notions of equality (i.e. equality of sets vs isomorphism vs equivalence…). This is precisely what is done in homotopy type theory (Univalent Foundations Program 2013).^{3} It turns out that if one builds up suitable notions of category theory on top of HoTT instead of set theory, then (a) AP is true, (b) without the need for AC, (c) even with a weaker version of essential surjectivity that corresponds more closely to essential surjectivity in classical logic.^{4} This is explained in Chapter 9 of the HoTT book.
I plan to continue writing about these things in upcoming posts, particularly items (2) and (4) above. (If you haven’t caught on by now, I’m essentially blogging parts of my dissertation; we’ll see how far I get before graduating!) In the meantime, feedback and discussion are very welcome!
Makkai, Michael. 1996. “Avoiding the Axiom of Choice in General Category Theory.” Journal of Pure and Applied Algebra 108 (2). Elsevier: 109–73.
Univalent Foundations Program, The. 2013. Homotopy Type Theory: Univalent Foundations of Mathematics. Institute for Advanced Study: http://homotopytypetheory.org/book.
The astute reader may well ask: but how do we know this is a non-evil definition of isomorphism between functors? Is it turtles all the way down (up)? This is a subtle point, but it turns out that it is not evil to talk about equality of natural transformations, since for the usual notion of category there is no higher structure after natural transformations, i.e. no nontrivial morphisms (and hence no nontrivial isomorphisms) between natural transformations. (However, you can have turtles all the way up if you really want.)↩
I made this term up, since there is no term in standard use: of course, if you accept AC, there is no need for a separate term at all!↩
As a historical note, it seems that the original work on anafunctors is part of the same intellectual thread that led to the development of HoTT.↩
That is, using propositional truncation to encode the classical notion of “there exists”.↩
In category theory, one is typically interested in specifying objects only up to unique isomorphism. In fact, definitions which make use of actual equality on objects are sometimes referred to (half-jokingly) as evil. More positively, the principle of equivalence states that properties of mathematical structures should be invariant under equivalence. This principle leads naturally to speaking of “the” object having some property, when in fact there may be many objects with the given property, but all such objects are uniquely isomorphic; this cannot cause confusion if the principle of equivalence is in effect.
This phenomenon should be familiar to anyone who has seen simple universal constructions such as terminal objects or categorical products. For example, an object $T \in \mathbb{C}$ is called terminal if there is a unique morphism $X \to T$ from each object $X \in \mathbb{C}$. In general, there may be many objects satisfying this criterion. For example, in $\mathbf{Set}$, the category of sets and functions, every singleton set is terminal: there is always a unique function from any set $S$ to a singleton set $\{\star\}$, namely, the function that sends each element of $S$ to $\star$. However, it is not hard to show that any two terminal objects must be uniquely isomorphic^{1}. Thus it “does not matter” which terminal object we use (they all have the same properties, as long as we don’t do anything “evil”), and one therefore speaks of “the” terminal object of $\mathbb{C}$. As another example, a product of two objects $A, B \in \mathbb{C}$ is a diagram $A \xleftarrow{\pi_1} C \xrightarrow{\pi_2} B$ with the universal property that any other $C'$ with morphisms to $A$ and $B$ uniquely factors through $C$. Again, there may be multiple such products, but they are all uniquely isomorphic, and one speaks of “the” product $A \times B$.
Note that in some cases, there may be a canonical choice among isomorphic objects. For example, this is the case with products in $\mathbf{Set}$, where we may always pick the Cartesian product $A \times B$ as a canonical product of $A$ and $B$ (even though there are also other products, such as $B \times A$). In such cases use of “the”, as in “the product of $A$ and $B$”, is even more strongly justified, since we may take it to mean “the canonical product of $A$ and $B$”. However, in many cases (for example, with terminal objects in $\mathbf{Set}$), there is no canonical choice, and “the terminal object” simply means something like “some terminal object, it doesn’t matter which”.
Beneath this seemingly innocuous use of “the” (often referred to as generalized “the”), however, lurks the axiom of choice! For example, if a category $\mathbb{C}$ has all products, we can define a functor $P : \mathbb{C} \times \mathbb{C} \to \mathbb{C}$^{2} which picks out “the” product of any two objects $A$ and $B$; indeed, $P$ may be taken as the definition of the product of $A$ and $B$. But how is $P$ to be defined? Consider $\{ \mathrm{Prod}(A,B) \mid A, B \in \mathbb{C} \}$, where $\mathrm{Prod}(A,B)$ denotes the set of all possible products of $A$ and $B$, i.e. all suitable diagrams $A \leftarrow C \rightarrow B$ in $\mathbb{C}$. Since $\mathbb{C}$ has all products, this is a collection of nonempty sets; therefore we may invoke AC to obtain a choice function, which is precisely $P_0$, the action of $P$ on objects. The action of $P$ on morphisms may then be defined straightforwardly.
The axiom of choice really is necessary to construct $P$: as has already been noted, there is, in general, no way to make some canonical choice of object from each equivalence class. On the other hand, this seems like a fairly “benign” use of AC. If we have a collection of equivalence classes, where the elements in each class are all uniquely isomorphic, then using AC to pick one representative from each really “does not matter”, in the sense that we cannot tell the difference between different choices (as long as we refrain from evil). Unfortunately, even such “benign” use of AC still poses a problem for computation.
If you have never seen this proof before, I highly recommend working it out for yourself. Given two terminal objects $T_1$ and $T_2$, what morphisms must exist between them? What can you say about their composition? You will need to use both the existence and uniqueness of morphisms to terminal objects.↩
Note that we have made use here of “the” product category $\mathbb{C} \times \mathbb{C}$; fortunately $\mathbf{Cat}$, like $\mathbf{Set}$, has a suitably canonical notion of products.↩
The (in)famous Axiom of Choice (hereafter, AC) can be formulated in a number of equivalent ways. Perhaps the most well-known is:
The Cartesian product of any collection of non-empty sets is non-empty.
Given a family of sets $\{X_i \mid i \in I\}$, an element of their Cartesian product is some $I$-indexed tuple $\{x_i \mid i \in I\}$ where $x_i \in X_i$ for each $i \in I$. Such a tuple can be thought of as a function (called a choice function) which picks out some particular $x_i$ from each $X_i$.
We can express this in type theory as follows. First, we assume we have some type $I$ which indexes the collection of sets; that is, there will be one set for each value of type $I$. Given some type $A$, we can define a subset of the values of type $A$ using a predicate, that is, a function $P : A \to \mathcal{U}$ (where $\mathcal{U}$ denotes the universe of types). For some particular $a : A$, applying $P$ to $a$ yields a type, which can be thought of as the type of evidence that $a$ is in the subset $P$; $a$ is in the subset if and only if $P\ a$ is inhabited. An $I$-indexed collection of subsets of $A$ can then be expressed as a function $P : I \to A \to \mathcal{U}$. In particular, $P\ i\ a$ is the type of evidence that $a$ is in the subset indexed by $i$. (Note that we could also make $A$ into a family of types indexed by $I$, that is, $A : I \to \mathcal{U}$, but it wouldn’t add anything to this discussion.)
A set is nonempty if it has at least one element, so the fact that all the sets in the collection $P$ are nonempty can be modeled by a dependent function which yields an element of $A$ for each index, along with a proof that it is contained in the corresponding subset:
$$(i : I) \to (a : A) \times P\ i\ a$$
(Note I’m using the notation $(x : X) \to T(x)$ for dependent function types instead of $\prod_{x : X} T(x)$, and $(x : X) \times T(x)$ for dependent pairs instead of $\sum_{x : X} T(x)$.) An element of the Cartesian product of $P$ can be expressed as a function $I \to A$ that picks out an element for each $i : I$ (the choice function), together with a proof that the chosen elements are in the appropriate sets:
$$(g : I \to A) \times \big((i : I) \to P\ i\ (g\ i)\big)$$
Putting these together, apparently the axiom of choice can be modelled by the type
$$\big((i : I) \to (a : A) \times P\ i\ a\big) \to (g : I \to A) \times \big((i : I) \to P\ i\ (g\ i)\big)$$
Converting back to $\Pi$ and $\Sigma$ notation and squinting actually gives some good insight into what is going on here:
$$\left(\prod_{i : I} \sum_{a : A} P\ i\ a\right) \to \left(\sum_{g : I \to A} \prod_{i : I} P\ i\ (g\ i)\right)$$
Essentially, this says that we can “turn a (dependent) product of sums into a (dependent) sum of products”. This sounds a lot like distributivity, and indeed, the strange thing is that this is simply true: implementing a function of this type is a simple exercise! If you aren’t familiar with dependent type theory, you can get the intuitive idea by implementing a non-dependent Haskell analogue, namely something of type
(i -> (a,c)) -> (i -> a, i -> c)
.
Not too hard, is it? (The implementation of the dependent version is essentially the same; it’s only the types that get more complicated, not the implementation.) So what’s going on here? Why is AC so controversial if it is simply true in type theory?
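In case you want to compare notes, here is one way the non-dependent Haskell analogue might be written. The names `choice` and `unchoice` are made up for this sketch; the second function is not part of the exercise, and is included only to show that the shuffling is reversible.

```haskell
-- Turn a function returning pairs into a pair of functions.
-- The first component plays the role of the "choice function".
choice :: (i -> (a, c)) -> (i -> a, i -> c)
choice f = (fst . f, snd . f)

-- The other direction: a pair of functions yields a function
-- returning pairs. No information is created or lost either way.
unchoice :: (i -> a, i -> c) -> (i -> (a, c))
unchoice (g, h) = \i -> (g i, h i)
```

Notice that `choice` never has to come up with an element of `a` on its own; every element it returns was already handed to it by `f`.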
This is not the axiom of choice you’re looking for. — Obi-Wan Funobi
The problem, it turns out, is that we’ve modelled the axiom of choice improperly, and it all boils down to how non-empty is defined. When a mathematician says “$X$ is non-empty”, they typically don’t actually mean “…and here is an element of $X$ to prove it”; instead, they literally mean “it is not the case that $X$ is empty”, that is, assuming $X$ is empty leads to a contradiction. (Actually, it is a bit more subtle yet, but this is a good first approximation.) In classical logic, these viewpoints are equivalent; in constructive logic, however, they are very different! In constructive logic, knowing that it is a contradiction for $X$ to be empty does not actually help you find an element of $X$. We modelled the statement “this collection of non-empty sets” essentially by saying “here is an element in each set”, but in constructive logic that is a much stronger statement than simply saying that each set is not empty.
(I should mention at this point that when working in HoTT, the best way to model what classical mathematicians mean when they say “$X$ is non-empty” is probably not with a negation, but instead with the propositional truncation of the statement that $X$ contains an element. Explaining this would take us too far afield; if you’re interested, you can find details in Chapter 3 of the HoTT book, where all of this and much more is explained in great detail.)
From this point of view, we can see why the “AC” in the previous section was easy to implement: it had to produce a function choosing a bunch of elements, but it was given a bunch of elements to start! All it had to do was shuffle them around a bit. The “real” AC, on the other hand, has a much harder job: it is told some sets are non-empty, but without any actual elements being mentioned, and it then has to manufacture a bunch of elements out of thin air. This is why it has to be taken as an axiom; we can also see that it doesn’t fit very well in a constructive/computational context. Although it is logically consistent to assume it as an axiom, it has no computational interpretation, so anything we define using it will just get stuck operationally.
So, we’ll just avoid using AC. No problem, right?
The problem is that AC is really sneaky. It tends to show up all over the place, but disguised so that you don’t even realize it’s there. You really have to train yourself to think in a fundamentally constructive way before you start to notice the places where it is used. Next time I’ll explain one place it shows up a lot, namely, when defining functors in category theory (though thankfully, not when defining Functor instances in Haskell).