Thoughts on grade inflation, part I: is grade inflation bad?

Grade inflation.  It’s terrible, horrible, no good, very bad, and ruining everything.  …right?

Well… I’m not so sure.  What I do know is that the typical conversation around grade inflation frustrates me. At best, it often leaves many important assumptions unstated and unquestioned.  Is grade inflation really bad? If so, why? What are the underlying assumptions and values that drive us to think of it in one way or another? At worst, the conversation is completely at the wrong level.  Grade inflation is actually a symptom pointing at a much deeper question, one that gets at the heart of education and pedagogy: what do grades mean? Or, put another way, what do grades measure?

This will be a two-part series.  In this first post, I consider the first question: is grade inflation bad?  In most conversations I have been a part of, this is taken as given, but I think it deserves more careful thought. I don’t know of any reasons to think that grade inflation is good, but I also don’t buy many of the common arguments (often implied rather than explicitly stated) as to why it is bad; in this post I consider three common ones.

What is grade inflation?

Just to make sure everyone is on the same page: by grade inflation I mean the phenomenon where average student grades are increasing over time, that is, the average student now receives higher grades than the average student of n years ago. (You could also think of it as the average value of a given grade going down over time.) This phenomenon is widespread in the US. I am only really familiar with the educational system in the US, so this post will of necessity be rather US-centric; I would be interested to hear about similarities and differences with other countries.

Let’s now consider some common arguments as to why grade inflation is bad.

The “Back in my day…” argument

This is not so much an “argument” as an attitude, and it goes something like this: “Back in MY day, a C really meant a C! These ungrateful, entitled young whippersnappers don’t understand the true value of grades…”

This is a caricature, of course, but I have definitely encountered variants of this attitude. This makes about as much sense to me as complaining “back in MY day, a dollar was really worth a dollar! And now my daughter is asking me for twenty dollars to go to a movie with her friends. TWENTY DOLLARS! These ungrateful, entitled young whippersnappers don’t understand the true value of money…” Nonsense, of course they do.  It just costs $20 to go to a movie these days. A dollar is worth what it is now worth; a C is worth what it is now worth. Get over it.  It’s not that students don’t understand “the true value of grades”, it’s just that the value of grades is different than it used to be.

There are a couple important caveats here: first, one can, of course, argue about what the value of a C ought to be, based on some ideas or assumptions about what grades (should) mean. I will talk about this at length in my next post. But you cannot blame students for not understanding your idea of what grades ought to mean! Second, it is certainly possible (even likely) that student attitudes towards grades have changed, and one can (and I do!) complain about those attitudes as compared to student attitudes in the past. But that is different than claiming that students don’t understand the value of grades.

If I may hazard a guess, I think what this often boils down to is that people blame grade inflation on student attitudes of entitlement. As a potential contributing factor to grade inflation (and insofar as we would like to teach students different attitudes), that is certainly worth thinking about. But grade inflation potentially being caused by something one dislikes is not an argument that grade inflation itself is bad.

The compression argument

Of course, there’s one important difference between money and grades: amounts of money have no upper limit, whereas grades are capped at A+.  This brings us to what I often hear put forth as the biggest argument against grade inflation, that it compresses grades into a narrower and narrower band, squeezed from above by that highest possible A+.  The problem with this, some argue, is that grade compression causes information to be lost.  The “signal” of grades becomes noisier, and it becomes harder for, say, employers and grad schools to be able to distinguish between different students.

My first, more cynical reaction is this: well, cry me a river for those poor, poor employers and grad schools, who will now have to assess students on real accomplishments, skills, and personal qualities, or (more likely) find some other arbitrary measurement to use. Do we really think grades are such a high-quality signal in the first place? Do they really measure something important and intrinsic about a student? (More on this in my next post.)  If the signal is noisy or arbitrary in the first place then compressing it really doesn’t matter that much.

Less cynically, let’s suppose the grade-signal really is that high-quality and important, and we are actually worried about the possibility of losing information. Consider the extreme situation, where grade inflation has progressed to such a degree that professors only give one of two possible grades: A (“outstandingly excellent”) or A+ (“superlatively superb”). An A- is so insultingly low that professors never give it (for fear of lawsuits, perhaps); for simplicity’s sake let’s suppose that no one ever fails, either. In this hypothetical scenario, at an institution like Williams where students take 32 courses, there are only 33 possible GPAs: you could get 32 A+’s, or one A and 31 A+’s, or two A’s and 30 A+’s… all the way down to getting all A’s (“straight-A student” means something rather different in this imaginary universe!).

But here’s the thing: I think 33 different GPAs would still be enough! I honestly don’t think companies or grad schools can meaningfully care about distinctions finer than having 33 different buckets of students. (If you think differently, I’d love to hear your argument.) If student GPAs are normally distributed, this even means that the top few buckets have much less than 1/33 of all the students. So if the top grad schools and companies want to only consider the top 1% of all students (or whatever), they can just look at the top bucket or two. You might say this is unfair for the students, but really, I can’t see how this would be any more or less fair than the current system.

Of course, under this hypothetical two-grade system, GPAs might not be normally distributed. For one thing, if grade inflation kept going, the distribution might pile up more and more at the top, until, for example, half of all students were getting straight A+’s, or, in the theoretical limit, all students got only A+’s. But I really don’t think this would actually happen; I think you would see some regulating effects kick in far before this theoretical limit was reached. Professors would not actually be willing to give all A+’s (or even, for that matter, all A’s and A+’s).

The GPAs could also be very bimodal, if, for example, students are extremely consistent: a student who consistently scores in the top 40% of every class would get the same grades (all A+’s) as a student who consistently scores in the top 10%. However, I doubt this is how it would work (as any professor knows, “consistent” and “student” are a rare pairing). It would be interesting to actually work out what GPA distributions would result from various assumptions about student behavior.
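
As a first step in that direction, here is a small sketch (my addition, with a deliberately toy assumption of independent, identically-difficult courses): if each of the 32 grades is independently an A+ with probability q, then the number of A+’s is binomial, and with q = 0.8 only about 0.08% of students would have straight A+’s.

-- Toy model (my assumption, not the post's): each of n courses
-- independently earns an A+ with probability q, else an A.  A student's
-- GPA "bucket" is the number of A+'s, distributed as Binomial(n, q).
binomPMF :: Int -> Double -> [(Int, Double)]
binomPMF n q =
  [ (k, fromIntegral (choose n k) * q ^ k * (1 - q) ^ (n - k)) | k <- [0 .. n] ]
  where
    choose :: Int -> Int -> Integer
    choose m j = product [fromIntegral (m - j + 1) .. fromIntegral m]
           `div` product [1 .. fromIntegral j]

-- For example: lookup 32 (binomPMF 32 0.8) gives Just 7.9e-4, about 0.08%.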

The moving target argument

The final argument against grade inflation that I sometimes hear goes like this: the problem is not so much that the average GPA is going up but simply that it is moving at all, which makes it harder for grad schools and employers to know how to calibrate their interpretations. But I don’t really buy this one either. The value of money is moving too, and yes, in some grand sense I suppose that makes it slightly harder for people to figure out how much things are worth. But somehow, everyone seems to manage just fine. I think employers and grad schools will manage just fine too. I don’t think GPAs are changing anywhere near fast enough for it to make much difference. And in any case, most of the time, the only thing employers and grad schools really care about is comparing the GPAs of students who graduated around the same time, in which case the absolute average GPA doesn’t matter at all. (One can make an argument about the difficulties caused by different schools having different average GPAs, but that is always going to be an issue, grade inflation or no.)

In the end, then, I am not so sure that grade inflation per se is such a terrible thing. However, it is well worth pondering the causes of grade inflation, and the deeper questions it leads to: what are grades? Why do we give them? What purposes do they serve, and what do they measure? I’ll take up these questions in a subsequent post.


Pan-Galactic Division in Haskell

Summary: given an injective function A \times N \hookrightarrow B \times N, it is possible to constructively “divide by N” to obtain an injection A \hookrightarrow B, as shown recently by Peter Doyle and Cecil Qiu and expounded by Richard Schwartz. Their algorithm is nontrivial to come up with—this had been a longstanding open question—but it’s not too difficult to explain. I exhibit some Haskell code implementing the algorithm, and show some examples.

Introduction: division by two

Suppose someone hands you the following:

  • A Haskell function f :: (A, Bool) -> (B, Bool), where A and B are abstract types (i.e. their constructors are not exported, and you have no other functions whose types mention A or B).

  • A promise that the function f is injective, that is, no two values of (A, Bool) map to the same (B, Bool) value. (Thus (B, Bool) must contain at least as many inhabitants as (A, Bool).)

  • A list as :: [A], with a promise that it contains every value of type A exactly once, at a finite position.

Can you explicitly produce an injective function f' :: A -> B? Moreover, your answer should not depend on the order of elements in as.

It really seems like this ought to be possible. After all, if (B, Bool) has at least as many inhabitants as (A, Bool), then surely B must have at least as many inhabitants as A. But it is not enough to reason merely that some injection must exist; we have to actually construct one. This, it turns out, is tricky. As a first attempt, we might try f' a = fst (f (a, True)). That is certainly a function of type A -> B, but there is no guarantee that it is injective. There could be a1, a2 :: A which both map to the same b, that is, one maps to (b, False) and the other to (b, True). The picture below illustrates such a situation: (a1, True) and (a2, True) both map to b2. So the function f may be injective overall, but we can’t say much about f restricted to a particular Bool value.

[Figure: the example injection f, in which (a1, True) and (a2, True) both map to b2.]

The requirement that the answer not depend on the order of as also makes things difficult. (Over in math-land, depending on a particular ordering of the elements in as would amount to the well-ordering principle, which is equivalent to the axiom of choice, which in turn implies the law of excluded middle—and as we all know, every time someone uses the law of excluded middle, a puppy dies. …I feel like I’m in one of those DirecTV commercials. “Don’t let a puppy die. Ignore the order of elements in as.”) Anyway, making use of the order of values in as, we could do something like the following:

  • For each a :: A:
    • Look at the B values generated by f (a,True) and f (a,False). (Note that there might only be one distinct such B value).
    • If neither B value has been used so far, pick the one that corresponds to (a,True), and add the other one to a queue of available B values.
    • If one is used and one unused, pick the unused one.
    • If both are used, pick the next available B value from the queue.
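
Concretely, here is a rough Haskell sketch of this order-dependent procedure (my code, not from the original post); it assumes Ord b for the bookkeeping and returns the graph of the resulting function as a list of pairs:

import qualified Data.Set as Set

-- Order-dependent "division by two": walk through the A values in the
-- order given, assigning each one a B value according to the rules above.
divideDependingOnOrder :: Ord b => ((a, Bool) -> (b, Bool)) -> [a] -> [(a, b)]
divideDependingOnOrder f = go Set.empty []
  where
    go _    _     []       = []
    go used queue (a : as) =
      let bT = fst (f (a, True))
          bF = fst (f (a, False))
          pick b q = (a, b) : go (Set.insert b used) q as
      in case (bT `Set.member` used, bF `Set.member` used) of
           (False, False) -> pick bT (queue ++ [bF])  -- prefer the True copy
           (False, True ) -> pick bT queue            -- take the unused one
           (True , False) -> pick bF queue
           (True , True ) ->                          -- both taken: consult the queue
             case dropWhile (`Set.member` used) queue of
               (b : q') -> pick b q'
               []       -> error "no B value available (ruled out by the promises)"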

It is not too hard to show that this will always successfully result in a total function A -> B, which is injective by construction. (One has to show that there will always be an available B value in the queue when you need it.) The only problem is that the particular function we get depends on the order in which we iterate through the A values. The above example illustrates this as well: if the A values are listed in the order [a_1, a_2], then we first choose a_1 \mapsto b_2, and then a_2 \mapsto b_3. If they are listed in the other order, we end up with a_2 \mapsto b_2 and a_1 \mapsto b_1. Whichever value comes first “steals” b_2, and then the other one takes whatever is left. We’d like to avoid this sort of dependence on order. That is, we want a well-defined algorithm which will yield a total, injective function A -> B, which is canonical in the sense that the algorithm yields the same function given any permutation of as.

It is possible—you might enjoy puzzling over this a bit before reading on!

Division by N

The above example is a somewhat special case. More generally, let N = \{0, \dots, n-1\} denote a canonical finite set of size n, and let A and B be arbitrary sets. Then, given an injection f : A \times N \hookrightarrow B \times N, is it possible to effectively (that is, without excluded middle or the axiom of choice) compute an injection A \hookrightarrow B?

Translating down to the world of numbers representing set cardinalities—natural numbers if A and B are finite, or cardinal numbers in general—this just says that if an \leq bn then a \leq b. This statement about numbers is obviously true, so it would be nice if we could say something similar about sets, so that this fact about numbers and inequalities can be seen as just a “shadow” of a more general theorem about sets and injections.

As hinted in the introduction, the interesting part of this problem is really the word “effectively”. Using the Axiom of Choice/Law of Excluded Middle makes the problem a lot easier, but either fails to yield an actual function that we can compute with, instead merely guaranteeing the existence of such a function, or gives us a function that depends on a particular ordering of A.

Apparently this has been a longstanding open question, recently answered in the affirmative by Peter Doyle and Cecil Qiu in their paper Division By Four. It’s a really great paper: they give some fascinating historical context for the problem, and explain their algorithm (which is conceptually not all that difficult) using an intuitive analogy to a card game with certain rules. (It is not a “game” in the usual sense of having winners and losers, but really just an algorithm implemented with “players” and “cards”. In fact, you could get some friends together and actually perform this algorithm in parallel (if you have sufficiently nerdy friends).) Richard Schwartz’s companion article is also great fun and easy to follow (you should read it first).

A Game of Thrones Cards

Here’s a quick introduction to the way Doyle, Qiu, and Schwartz use a card game to formulate their algorithm. (Porting this framework to use “thrones” and “claimants” instead of “spots” and “cards” is left as an exercise to the reader.)

The finite set N is to be thought of as a set of suits. The set A will correspond to a set of players, and B to a set of ranks or values (for example, Ace, 2, 3, …). In that case B \times N corresponds to a deck of cards, each card having a rank and a suit; and we can think of A \times N in terms of each player having in front of them a number of “spots” or “slots”, each labelled by a suit. An injection A \times N \hookrightarrow B \times N is then a particular “deal” where one card has been dealt into each of the spots in front of the players. (There may be some cards left over in the deck, but the fact that the function is total means every spot has a card, and the fact that it is injective is encoded in the common-sense idea that a given card cannot be in two spots at once.) For example, the example function from before:

[Figure: the example injection f from the introduction.]

corresponds to the following deal:

[Figure: the corresponding deal of cards.]

Here each column corresponds to one player’s hand, and the rows correspond to suit spots (with the spade spots on top and the heart spots beneath). We have mapped \{b_1, b_2, b_3\} to the ranks A, 2, 3, and mapped T and F to Spades and Hearts respectively. The spades are also highlighted in green, since later we will want to pay particular attention to what is happening with them. You might want to take a moment to convince yourself that the deal above really does correspond to the example function from before.

A Haskell implementation

Of course, doing everything effectively means we are really talking about computation. Doyle and Qiu do talk a bit about computation, but it’s still pretty abstract, in the sort of way that mathematicians talk about computation, so I thought it would be interesting to actually implement the algorithm in Haskell.

The algorithm “works” for infinite sets, but only (as far as I understand) if you consider some notion of transfinite recursion. It still counts as “effective” in math-land, but over here in programming-land I’d like to stick to (finitely) terminating computations, so we will stick to finite sets A and B.

First, some extensions and imports. Nothing too controversial.

> {-# LANGUAGE DataKinds                  #-}
> {-# LANGUAGE GADTs                      #-}
> {-# LANGUAGE GeneralizedNewtypeDeriving #-}
> {-# LANGUAGE KindSignatures             #-}
> {-# LANGUAGE RankNTypes                 #-}
> {-# LANGUAGE ScopedTypeVariables        #-}
> {-# LANGUAGE StandaloneDeriving         #-}
> {-# LANGUAGE TypeOperators              #-}
> 
> module PanGalacticDivision where
> 
> import           Control.Arrow (second, (&&&), (***))
> import           Data.Char
> import           Data.List     (find, findIndex, transpose)
> import           Data.Maybe
> 
> import           Diagrams.Prelude hiding (universe, value)
> import           Diagrams.Backend.Rasterific.CmdLine
> import           Graphics.SVGFonts

We’ll need some standard machinery for type-level natural numbers. Probably all this stuff is in a library somewhere but I couldn’t be bothered to find out. Pointers welcome.

> -- Standard unary natural number type
> data Nat :: * where
>   Z :: Nat
>   Suc :: Nat -> Nat
> 
> type One = Suc Z
> type Two = Suc One
> type Three = Suc Two
> type Four = Suc Three
> type Six = Suc (Suc Four)
> type Eight = Suc (Suc Six)
> type Ten = Suc (Suc Eight)
> type Thirteen = Suc (Suc (Suc Ten))
> 
> -- Singleton Nat-indexed natural numbers, to connect value-level and
> -- type-level Nats
> data SNat :: Nat -> * where
>   SZ :: SNat Z
>   SS :: Natural n => SNat n -> SNat (Suc n)
> 
> -- A class for converting type-level nats to value-level ones
> class Natural n where
>   toSNat :: SNat n
> 
> instance Natural Z where
>   toSNat = SZ
> 
> instance Natural n => Natural (Suc n) where
>   toSNat = SS toSNat
> 
> -- A function for turning explicit nat evidence into implicit
> natty :: SNat n -> (Natural n => r) -> r
> natty SZ r     = r
> natty (SS n) r = natty n r
> 
> -- The usual canonical finite type.  Fin n has exactly n
> -- (non-bottom) values.
> data Fin :: Nat -> * where
>   FZ :: Fin (Suc n)
>   FS :: Fin n -> Fin (Suc n)
> 
> finToInt :: Fin n -> Int
> finToInt FZ     = 0
> finToInt (FS n) = 1 + finToInt n
> 
> deriving instance Eq (Fin n)

Finiteness

Next, a type class to represent finiteness. For our purposes, a type a is finite if we can explicitly list its elements. For convenience we throw in decidable equality as well, since we will usually need that in conjunction. Of course, we have to be careful: although we can get a list of elements for a finite type, we don’t want to depend on the ordering. We must ensure that the output of the algorithm is independent of the order of elements.1 This is in fact true, although somewhat nontrivial to prove formally; I mention some of the intuitive ideas behind the proof below.

While we are at it, we give Finite instances for Fin n and for products of finite types.

> class Eq a => Finite a where
>   universe :: [a]
> 
> instance Natural n => Finite (Fin n) where
>   universe = fins toSNat
> 
> fins :: SNat n -> [Fin n]
> fins SZ     = []
> fins (SS n) = FZ : map FS (fins n)
> 
> -- The product of two finite types is finite.
> instance (Finite a, Finite b) => Finite (a,b) where
>   universe = [(a,b) | a <- universe, b <- universe]
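
As a quick sanity check (a small addition of mine, not in the original post), we can enumerate a product of small types and see all four pairs appear:

> exampleUniverse :: [(Int, Int)]
> exampleUniverse = map (\(i, j) -> (finToInt i, finToInt j))
>                       (universe :: [(Fin Two, Fin Two)])
> -- exampleUniverse = [(0,0),(0,1),(1,0),(1,1)]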

Division, inductively

Now we come to the division algorithm proper. The idea is that panGalacticPred turns an injection A \times N \hookrightarrow B \times N into an injection A \times (N-1) \hookrightarrow B \times (N-1), and then we use induction on N to repeatedly apply panGalacticPred until we get an injection A \times 1 \hookrightarrow B \times 1.

> panGalacticDivision
>   :: forall a b n. (Finite a, Eq b)
>   => SNat n -> ((a, Fin (Suc n)) -> (b, Fin (Suc n))) -> (a -> b)

In the base case, we are given an injection A \times 1 \hookrightarrow B \times 1, so we just pass a unit value in along with the A and project out the B.

> panGalacticDivision SZ f = \a -> fst (f (a, FZ))

In the inductive case, we call panGalacticPred and recurse.

> panGalacticDivision (SS n') f = panGalacticDivision n' (panGalacticPred n' f)

Pan-Galactic Predecessor

And now for the real meat of the algorithm, the panGalacticPred function. The idea is that we swap outputs around until the function has the property that every output of the form (b,0) corresponds to an input also of the form (a,0). That is, using the card game analogy, every spade in play should be in the leftmost spot (the spades spot) of some player’s hand (some spades can also be in the deck). Then simply dropping the leftmost card in everyone’s hand (and all the spades in the deck) yields a game with no spades. That is, we will have an injection A \times \{1, \dots, n-1\} \hookrightarrow B \times \{1, \dots, n-1\}. Taking predecessors everywhere (i.e. “hearts are the new spades”) yields the desired injection A \times (N-1) \hookrightarrow B \times (N-1).

We need a Finite constraint on a so that we can enumerate all possible inputs to the function, and an Eq constraint on b so that we can compare functions for extensional equality (we iterate until reaching a fixed point). Note that whether two functions are extensionally equal does not depend on the order in which we enumerate their inputs, so far validating my claim that nothing depends on the order of elements returned by universe.

> panGalacticPred
>   :: (Finite a, Eq b, Natural n)
>   => SNat n
>   -> ((a, Fin (Suc (Suc n))) -> (b, Fin (Suc (Suc n))))
>   -> ((a, Fin (Suc n)) -> (b, Fin (Suc n)))

We construct a function f' which is related to f by a series of swaps, and has the property that it only outputs FZ when given FZ as an input. So given (a,i) we can call f' on (a, FS i) which is guaranteed to give us something of the form (b, FS j). Thus it is safe to strip off the FS and return (b, j) (though the Haskell type checker most certainly does not know this, so we just have to tell it to trust us).

> panGalacticPred n f = \(a,i) -> second unFS (f' (a, FS i))
>   where
>     unFS :: Fin (Suc n) -> Fin n
>     unFS FZ = error "impossible!"
>     unFS (FS i) = i

To construct f' we iterate a certain transformation until reaching a fixed point. For finite sets A and B this is guaranteed to terminate, though it is certainly not obvious from the Haskell code. (Encoding this in Agda so that it is accepted by the termination checker would be a fun (?) exercise.)

One round of the algorithm consists of two phases called “shape up” and “ship out” (to be described shortly).

>     oneRound = natty n $ shipOut . shapeUp
> 
>     -- iterate 'oneRound' beginning with the original function...
>     fs = iterate oneRound f
>     -- ... and stop when we reach a fixed point.
>     f' = fst . head . dropWhile (uncurry (=/=)) $ zip fs (tail fs)
>     f1 =/= f2 = any (\x -> f1 x /= f2 x) universe

Encoding Card Games

Recall that a “card” is a pair of a value and a suit; we think of B as the set of values and N as the set of suits.

> type Card v s = (v, s)
> 
> value :: Card v s -> v
> value = fst
> 
> suit :: Card v s -> s
> suit = snd

Again, there are a number of players (one for each element of A), each of which has a “hand” of cards. A hand has a number of “spots” for cards, each one labelled by a different suit (which may not have any relation to the actual suit of the card in that position).

> type PlayerSpot p s = (p, s)
> type Hand v s = s -> Card v s

A “game” is an injective function from player spots to cards. Of course, the type system is not enforcing injectivity here.

> type Game p v s = PlayerSpot p s -> Card v s

Some utility functions. First, a function to project out the hand of a given player.

> hand :: p -> Game p v s -> Hand v s
> hand p g = \s -> g (p, s)

A function to swap two cards, yielding a bijection on cards.

> swap :: (Eq s, Eq v) => Card v s -> Card v s -> (Card v s -> Card v s)
> swap c1 c2 = f
>   where
>     f c
>       | c == c1   = c2
>       | c == c2   = c1
>       | otherwise = c

leftmost finds the leftmost card in a player’s hand which has a given suit.

> leftmost :: Finite s => s -> Hand v s -> Maybe s
> leftmost targetSuit h = find (\s -> suit (h s) == targetSuit) universe

Playing Rounds

playRound abstracts out a pattern that is used by both shapeUp and shipOut. The first argument is a function which, given a hand, produces a function on cards; that is, based on looking at a single hand, it decides how to swap some cards around.2 playRound then applies that function to every hand, and composes together all the resulting permutations.

Note that playRound has both Finite s and Finite p constraints, so we should think about whether the result depends on the order of elements returned by any call to universe—I claimed it does not. Finite s corresponds to suits/spots, which corresponds to N in the original problem formulation. N explicitly has a canonical ordering, so this is not a problem. The Finite p constraint, on the face of it, is more problematic. We will have to think carefully about each of the rounds implemented in terms of playRound and make sure they do not depend on the order of players. Put another way, it should be possible for all the players to take their turn simultaneously.

> playRound :: (Finite s, Finite p, Eq v) => (Hand v s -> Card v s -> Card v s) -> Game p v s -> Game p v s
> playRound withHand g = foldr (.) id swaps . g
>   where
>     swaps = map (withHand . flip hand g) players
>     players = universe

Shape Up and Ship Out

Finally, we can describe the “shape up” and “ship out” phases, beginning with “shape up”. A “bad” card is defined as one having the lowest suit; make sure every hand with any bad cards has one in the leftmost spot (by swapping the leftmost bad card with the card in the leftmost spot, if necessary).

> shapeUp :: (Finite s, Finite p, Eq v) => Game p v s -> Game p v s
> shapeUp = playRound shapeUp1
>   where
>     badSuit = head universe
>     shapeUp1 theHand =
>       case leftmost badSuit theHand of
>         Nothing      -> id
>         Just badSpot -> swap (theHand badSuit) (theHand badSpot)

And now for the “ship out” phase. Send any “bad” cards not in the leftmost spot somewhere else, by swapping with a replacement, namely, the card whose suit is the same as the suit of the spot, and whose value is the same as the value of the bad card in the leftmost spot. The point is that bad cards in the leftmost spot are OK, since we will eventually just ignore the leftmost spot. So we have to keep shipping out bad cards not in the leftmost spot until they all end up in the leftmost spot. For some intuition as to why this is guaranteed to terminate, consult Schwartz; note that columns tend to acquire more and more cards that have the same rank as a spade in the top spot (which never moves).

> shipOut :: (Finite s, Finite p, Eq v) => Game p v s -> Game p v s
> shipOut = playRound shipOutHand
>   where
>     badSuit = head universe
>     spots = universe
>     shipOutHand theHand = foldr (.) id swaps
>       where
>         swaps = map (shipOut1 . (theHand &&& id)) (drop 1 spots)
>         shipOut1 ((_,s), spot)
>           | s == badSuit = swap (theHand spot) (value (theHand badSuit), spot)
>           | otherwise    = id

And that’s it! Note that both shapeUp and shipOut are implemented by composing a bunch of swaps; in fact, in both cases, all the swaps commute, so the order in which they are composed does not matter. (For proof, see Schwartz.) Thus, the result is independent of the order of the players (i.e. the set A).

Enough code, let’s see an example! This example is taken directly from Doyle and Qiu’s paper, and the diagrams are being generated literally (literately?) by running the code in this blog post. Here’s the starting configuration:

[Figure: the starting deal.]

Again, the spades are all highlighted in green. Recall that our goal is to get them all to be in the first row, but we have to do it in a completely deterministic, canonical way. After shaping up, we have:

[Figure: the deal after shaping up.]

Notice how the 6, K, 5, A, and 8 of spades have all been swapped to the top of their column. However, there are still spades which are not at the top of their column (in particular the 10, 9, and J) so we are not done yet.

Now, we ship out. For example, the 10 of spades is in the diamonds position in the column with the Ace of spades, so we swap it with the Ace of diamonds. Similarly, we swap the 9 of spades with the Queen of diamonds, and the Jack of spades with the 4 of hearts.

[Figure: the deal after shipping out.]

Shaping up does nothing at this point so we ship out again, and then continue to alternate rounds.

[Figure: the final deal, after alternating rounds of shaping up and shipping out.]

In the final deal above, all the spades are at the top of a column, so there is an injection from the set of all non-spade spots to the deck of cards with all spades removed. This example was, I suspect, carefully constructed so that none of the spades get swapped out into the undealt portion of the deck, and so that we end up with only spades in the top row. In general, we might end up with some non-spades also in the top row, but that’s not a problem. The point is that ignoring the top row gets rid of all the spades.

Anyway, I hope to write more about some “practical” examples and about what this has to do with combinatorial species, but this post is long enough already. Doyle and Qiu also describe a “short division” algorithm (the above is “long division”) that I hope to explore as well.

The rest of the code

For completeness, here’s the code I used to represent the example game above, and to render all the card diagrams (using diagrams 1.3).

> type Suit = Fin
> type Rank = Fin
> type Player = Fin
> 
> readRank :: SNat n -> Char -> Rank n
> readRank n c = fins n !! (fromJust $ findIndex (==c) "A23456789TJQK")
> 
> readSuit :: SNat n -> Char -> Suit n
> readSuit (SS _) 'S'                = FZ
> readSuit (SS (SS _)) 'H'           = FS FZ
> readSuit (SS (SS (SS _))) 'D'      = FS (FS FZ)
> readSuit (SS (SS (SS (SS _)))) 'C' = FS (FS (FS FZ))
> 
> readGame :: SNat a -> SNat b -> SNat n -> String -> Game (Player a) (Rank b) (Suit n)
> readGame a b n str = \(p, s) -> table !! finToInt p !! finToInt s
>   where
>     table = transpose . map (map readCard . words) . lines $ str
>     readCard [r,s] = (readRank b r, readSuit n s)
> 
> -- Example game from Doyle & Qiu
> exampleGameStr :: String
> exampleGameStr = unlines
>   [ "4D 6H QD 8D 9H QS 4C AD 6C 4S"
>   , "JH AH 9C 8H AS TC TD 5H QC JS"
>   , "KC 6S 4H 6D TS 9S JC KD 8S 8C"
>   , "5C 5D KS 5S TH JD AC QH 9D KH"
>   ]
> 
> exampleGame :: Game (Player Ten) (Rank Thirteen) (Suit Four)
> exampleGame = readGame toSNat toSNat toSNat exampleGameStr
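> 
> -- (My addition, not from the original post.)  A usage sketch: running
> -- the pan-galactic division on the example game.  Suit Four is
> -- Fin (Suc Three), so dividing by four means passing an SNat Three.
> examplePlayerRank :: Player Ten -> Rank Thirteen
> examplePlayerRank = panGalacticDivision (toSNat :: SNat Three) exampleGame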
> 
> suitSymbol :: Suit n -> String
> suitSymbol = (:[]) . ("♠♥♦♣"!!) . finToInt  -- Huzzah for Unicode
> 
> suitDia :: Suit n -> Diagram B
> suitDia = (suitDias!!) . finToInt
> 
> suitDias = map mkSuitDia (fins (toSNat :: SNat Four))
> mkSuitDia s = text' (suitSymbol s) # fc (suitColor s) # lw none
> 
> suitColor :: Suit n -> Colour Double
> suitColor n
>   | finToInt n `elem` [0,3] = black
>   | otherwise               = red
> 
> rankStr :: Rank n -> String
> rankStr n = rankStr' (finToInt n + 1)
>   where
>     rankStr' 1 = "A"
>     rankStr' i | i <= 10    = show i
>                | otherwise = ["JQK" !! (i - 11)]
> 
> text' t = stroke (textSVG' (TextOpts lin INSIDE_H KERN False 1 1) t)
> 
> renderCard :: (Rank b, Suit n) -> Diagram B
> renderCard (r, s) = mconcat
>   [ mirror label
>   , cardContent (finToInt r + 1)
>   , back
>   ]
>   where
>     cardWidth  = 2.25
>     cardHeight = 3.5
>     cardCorners = 0.1
>     mirror d = d <> d # rotateBy (1/2)
>     back  = roundedRect cardWidth cardHeight cardCorners # fc white
>           # lc (case s of { FZ -> green; _ -> black })
>     label = vsep 0.1 [text' (rankStr r), text' (suitSymbol s)]
>           # scale 0.6 # fc (suitColor s) # lw none
>           # translate ((-0.9) ^& 1.5)
>     cardContent n
>       | n <= 10   = pips n
>       | otherwise = face n # fc (suitColor s) # lw none
>                            # sized (mkWidth (cardWidth * 0.6))
>     pip = suitDia s # scale 1.1
>     pips 1 = pip # scale 2
>     pips 2 = mirror (pip # up 2)
>     pips 3 = pips 2 <> pip
>     pips 4 = mirror (pair pip # up 2)
>     pips 5 = pips 4 <> pip
>     pips 6 = mirror (pair pip # up 2) <> pair pip
>     pips 7 = pips 6 <> pip # up 1
>     pips 8 = pips 6 <> mirror (pip # up 1)
>     pips 9 = mirror (pair (pip # up (2/3) <> pip # up 2)) <> pip # up (case finToInt s of {1 -> -0.1; 3 -> 0; _ -> 0.1})
>     pips 10 = mirror (pair (pip # up (2/3) <> pip # up 2) <> pip # up (4/3))
>     pips _ = mempty
>     up n = translateY (0.5*n)
>     pair d = hsep 0.4 [d, d] # centerX
>     face 11 = squares # frame 0.1
>     face 12 = loopyStar
>     face 13 = burst # centerXY
>     squares
>       = strokeP (mirror (square 1 # translate (0.2 ^& 0.2)))
>       # fillRule EvenOdd
>     loopyStar
>       = regPoly 7 1
>       # star (StarSkip 3)
>       # pathVertices
>       # map (cubicSpline True)
>       # mconcat
>       # fillRule EvenOdd
>     burst
>       = [(1,5), (1,-5)] # map r2 # fromOffsets
>       # iterateN 13 (rotateBy (-1/13))
>       # mconcat # glueLine
>       # strokeLoop
> 
> renderGame :: (Natural n, Natural a) => Game (Player a) (Rank b) (Suit n) -> Diagram B
> renderGame g = hsep 0.5 $ map (\p -> renderHand p $ hand p g) universe
> 
> renderHand :: Natural n => Player a -> Hand (Rank b) (Suit n) -> Diagram B
> renderHand p h = vsep 0.2 $ map (renderCard . h) universe

  1. If we could program in Homotopy Type Theory, we could make this very formal by using the notion of cardinal-finiteness developed in my dissertation (see section 2.4).

  2. In practice this function on cards will always be a permutation, though the Haskell type system is not enforcing that at all. An early version of this code used the Iso type from lens, but it wasn’t really paying its way.


Polynomial Functors Constrained by Regular Expressions

I’ve now finished revising the paper that Dan Piponi and I had accepted to MPC 2015; you can find a PDF here:

Polynomial Functors Constrained by Regular Expressions

Here’s the 2-minute version: certain operations or restrictions on functors can be described by regular expressions, where the elements of the alphabet correspond to type arguments. The idea is to restrict to only those structures for which an inorder traversal yields a sequence of types matching the regular expression. For example, (aa)^* gives you even-size things; a^*ha^* gives you the derivative (the structure has a bunch of values of type a, a single hole of type h, and then more values of type a), and b^*ha^* the dissection.
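
To make these shapes concrete, here is a rough Haskell rendering of the derivative and dissection of a list (my gloss; the paper derives such descriptions automatically rather than postulating them):

-- a^* h a^*: the derivative of a list: elements of type a to the left
-- of the hole, the hole of type h, then more a's to the right.
data Deriv a h = Deriv [a] h [a]

-- b^* h a^*: the dissection of a list: already-processed elements of
-- type b on the left, the hole, then unprocessed a's on the right.
data Dissect b a h = Dissect [b] h [a]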

[Figure: a dissected tree.]

The punchline is that we show how to use the machinery of semirings, finite automata, and some basic matrix algebra to automatically derive an algebraic description of any functor constrained by any regular expression. This gives a nice unified way to view differentiation and dissection; we also draw some connections to the theory of divided differences.

I’m still open to discussion, suggestions, typo fixes, etc., though at this point they won’t make it into the proceedings. There’s certainly a lot more that could be said or ways this could be extended further.


Blogging again, & some major life events

It’s been a long time since I’ve written anything here; the blog was on hold while I was finishing my PhD and on the academic job market. Now that things have settled down a bit I plan to get back to blogging.

For starters, here are a few of the major events that have happened in the meantime, that readers of this blog might care about:

  • I successfully defended my PhD dissertation in October, officially graduated in December, and got an actual diploma in the mail a few weeks ago. I’ll be back in Philadelphia for the official graduation ceremony in May.
  • I accepted a tenure-track position at Hendrix College in Conway, Arkansas, and will be moving there this summer.
  • Dan Piponi and I had a paper accepted to MPC 2015. Here’s the github repo, and I plan to post a PDF copy here soon (once I get around to incorporating feedback from the reviewers). I look forward to seeing a bunch of folks (Volk?) in Königswinter this summer; I already have my plane tickets (CIU -> DTW -> AMS -> CGN, it’s a long story).
  • Work on diagrams continues strong (no thanks to me!), and we are aiming for a big new release soon—I will certainly post about that here as well.

Maniac week postmortem

My maniac week was a great success! First things first: here’s a time-lapse video1 (I recommend watching it at the full size, 1280×720).

Some statistics2:

  • Total hours of productive work: 55.5 (74 pings)
  • Average hours of work per day3: 11
  • Average hours of sleep per night: 7.8 (52 pings over 5 nights)4
  • Total hours not working or sleeping: 27.75 (37 pings)
  • Average hours not working per day: 5.5
  • Pages of dissertation written: 24 (157 to 181)

[I was planning to also make a visualization of my TagTime data showing when I was sleeping, working, or not-working, but putting together the video and this blog post has taken long enough already! Perhaps I’ll get around to it later.]

Overall, I would call the experiment a huge success—although as you can see, I was a full 2.5 hours per day off my target of 13.5 hours of productive work each day. What with eating, showering, making lunch, getting dinner, taking breaks (both intentional breaks as well as slacking off), and a few miscellaneous things I had to take care of like taking the car to get the tire pressure adjusted… it all adds up surprisingly fast. I think this was one of the biggest revelations for me; going into it I thought 3 hours of not-work per day was extremely generous. I now think three hours of not-work per day is probably within reach for me but would be extremely difficult, and would probably require things like planning out meals ahead of time. In any case, 55 hours of actual, focused work is still fantastic.

Some random observations/thoughts:

  • Having multiple projects to work on was really valuable; when I got tired of working on one thing I could often just switch to something else instead of taking an actual break. I can imagine this might be different if I were working on a big coding project (as most of the other maniac weeks have been). The big project would itself provide multiple different subtasks to work on, but more importantly, coding provides immediate feedback that is really addictive. Code a new feature, and you can actually run the new code! And it does something cool! That it didn’t do before! In contrast, when I write another page of my dissertation I just have… another page of my dissertation. I am, in fact, relatively excited about my dissertation, but it can’t provide that same sort of immediate reinforcing feedback, and it was difficult to keep going at times.

  • I found that having music playing really helped me get into a state of “flow”. The first few days I would play some album and then it would stop and I wouldn’t think to put on more. Later in the week I would just queue up many hours of music at a time and that worked great.

  • I was definitely feeling worn out by the end of the week—the last two days in particular, it felt a lot harder to get into a flow. I think I felt so good the first few days that I became overconfident—which is good to keep in mind if I do this again. The evening of 12 August was particularly bad; I just couldn’t focus. It might have been better in the long run to just go home and read a book or something; I’m just not sure how to tell in the moment when I should push through and when it’s better to cut my losses.

  • Blocking Facebook, turning off email notifications, etc. was really helpful. I did end up allowing myself to check email using my phone (I edited the rules a few hours before I started) and I think it was a good idea—I ended up still needing to communicate with some people, so it was very convenient and not too distracting.

  • Note there are two places on Tuesday afternoon where you can see the clock jump ahead by an hour or so; of course those are times when I turned off the recording. One corresponded to a time when I needed to read and write some sensitive emails; during the other, I was putting student pictures into an anki deck, and turned off the recording to avoid running afoul of FERPA.

That’s all I can think of for now; questions or comments, of course, are welcome.


  1. Some technical notes (don’t try this at home; see http://expost.padm.us/maniactech for some recommendations on making your own timelapse). To record and create the video I used a homegrown concoction of scrot, streamer, ImageMagick, ffmpeg, with some zsh and Haskell scripts to tie it all together, and using diagrams to generate the clock and tag displays. I took about 3GB worth of raw screenshots, and it takes probably about a half hour to process all of it into a video.

  2. These statistics are according to TagTime, i.e. gathered via random sampling, so there is a bit of inherent uncertainty. I leave it as an exercise for the reader to calculate the proper error bars on these times (given that I use a standard ping interval of 45 minutes).

  3. Computed as 74/(171 – 9) pings multiplied by 24 hours; 9 pings occurred on Sunday morning which I did not count as part of the maniac week.

  4. This is somewhat inflated by Saturday night/Sunday morning, when I both slept in and got a higher-than-average number of pings; the average excluding that night is 6.75 hours, which sounds about right.


Readers wanted!

tl;dr: Read a draft of my thesis and send me your feedback by September 9!

Over the past year I’ve had several people say things along the lines of, “let me know if you want me to read through your thesis”. I never took them all that seriously (it’s easy to say you are willing to read a 200-page document…), but it never hurts to ask, right?

My thesis defense is scheduled for October 14, and I’m currently undertaking a massive writing/editing push to try to get as much of it wrapped up as I can before classes start on September 4. So, if there’s anyone out there actually interested in reading a draft and giving feedback, now is your chance!

The basic idea of my dissertation is to put combinatorial species and related variants (including a port of the theory to HoTT) in a common categorical framework, and then be able to use them for working with/talking about data types. If you’re brave enough to read it, you’ll find lots of category theory and type theory, and very little code—but I can promise lots of examples and pretty pictures. I’ve tried to make it somewhat self-contained, so it may be a good way to learn a bit of category theory or homotopy type theory, if you’ve been curious to learn more about those topics.

You can find the latest draft here (auto-updated every time I commit); more generally, you can find the git repo here. If you notice any typos or grammatical errors, feel free to open a pull request. For anything more substantial—thoughts on the organization, notes or questions about things you found confusing, suggestions for improvement, pointers to other references—please send me an email (first initial last name at gmail). And finally, please send me any feedback by September 9 at the latest (but the earlier the better). I need to have a final version to my committee by September 23.

Last but not least, if you’re interested to read it but don’t have the time or inclination to provide feedback on a draft, never fear—I’ll post an announcement when the final version is ready for your perusal!


Maniac week

Inspired by Bethany Soule (and indirectly by Nick Winter, and also by the fact that my dissertation defense and the start of the semester are looming), I am planning a “maniac week” while Joyia and Noah will be at the beach with my family (I will join them just for the weekend). The idea is to eliminate as many distractions as possible and to do a ton of focused work. Publicly committing (like this) to a time frame, ground rules, and to putting up a time-lapse video of it afterwards are what actually make it work—if I don’t succeed I’ll have to admit it here on my blog; if I waste time on Facebook the whole internet will see it in the video; etc. (There’s actually no danger of wasting time on Facebook in particular since I have it blocked, but you get the idea.)

Here are the rules:

  • I will start at 6pm (or thereabouts) on Friday, August 8.
  • I will continue until 10pm on Wednesday, August 13, with the exception of the morning of Sunday, August 10 (until 2pm).
  • I will get at least 7.5 hours of sleep each night.
  • I will not eat cereal for any meal other than breakfast.
  • I will reserve 3 hours per day for things like showering, eating, and just plain resting.  Such things will be tracked by the TagTime tag “notwork”.
  • I will spend the remaining 13.5 hours per day working productively. Things that will count as productive work:
    • Working on my dissertation
    • Course prep for CS 354 (lecture and assignment planning, etc.) and CS 134 (reading through the textbook); making anki decks with names and faces for both courses
    • Updating my academic website (finish converting to Hakyll 4; add potential research and independent study topics for undergraduates)
    • Processing FogBugz tickets
    • I may work on other research or coding projects (e.g. diagrams) each day, but only after spending at least 7 hours on my dissertation.
  • I will not go on IRC at all during the week.  I will disable email notifications on my phone (but keep the phone around for TagTime), and close and block gmail in my browser.  I will also disable the program I use to check my UPenn email account.
  • For FogBugz tickets which require responding to emails, I will simply write the email in a text file and send it later.
  • I may read incoming email and write short replies on my phone, but will keep it to a bare minimum.
  • I will not read any RSS feeds during the week.  I will block feedly in my browser.
  • On August 18 I will post a time-lapse video of August 8-13.  I’ll probably also write a post-mortem blog post, if I feel like I have anything interesting to say.
  • I reserve the right to tweak these rules (by editing this post) up until August 8 at 6pm.  After that point it’s shut up and work time, and I cannot change the rules any more.

And no, I’m not crazy. You (yes, you) could do this too.
