This semester I’m teaching a Discrete Mathematics course. Recently, I assigned them a homework problem from the textbook that asked them to prove that the binary operator on the real numbers is associative, that is, for all real numbers , , and ,
.
You might like to pause for a minute to think about how you would prove this! Of course, how you prove it depends on how you define , so you might like to think about that too.
The book expected them to do a proof by cases, with some sort of case split on the order of , , and . What they turned in was mostly pretty good, actually, but while grading it I became disgusted with the whole thing and thought there has to be a better way.
I was reminded of an example of Dijkstra’s that I remember reading. So I asked myself—what would Dijkstra do? The thing I remember reading may have, in fact, been this exact proof, but I couldn’t remember any details and I still can’t find it now, so I had to (re)work out the details, guided only by some vague intuitions.
Dijkstra would certainly advocate proving associativity of using a calculational approach. Dijkstra would also advocate using a symmetric infix operator symbol for a commutative and associative operation, so let’s adopt the symbol for . ( would also be a reasonable choice, though I find it less mnemonic.)
How can we calculate with ? We have to come up with some way to characterize it that allows us to transform expressions involving into something else more fundamental. The most obvious definition would be “ if , and otherwise”. However, although this is a fantastic implementation of if you actually want to run it, it is not so great for reasoning about , precisely because it involves doing a case split on whether . This is the definition that leads to the ugly proof by cases.
How else could we define it? The usual more mathematically sophisticated way to define it would be as a greatest lower bound, that is, “ if and only if and and is the greatest such number, that is, for any other such that and , we have .” However, this is a bit roundabout and also not so conducive to calculation.
My first epiphany was that the best way to characterize is by its relationship to . After one or two abortive attempts, I hit upon the right idea:
That is, an arbitrary is less than or equal to the minimum of and precisely when it is less than or equal to both. In fact, this completely characterizes , and is equivalent to the second definition given above.^{1} (You should try convincing yourself of this!)
But how do we get anywhere from by itself? We need to somehow introduce a thing which is less than or equal to it, so we can apply our characterization. My second epiphany was that equality of real numbers can also be characterized by having the same “downsets”, i.e. two real numbers are equal if and only if the sets of real numbers less than or equal to them are the same. That is,
Now the proof almost writes itself. Let be arbitrary; we calculate as follows:
Of course this uses our characterization of via its relationship to , along with the fact that is associative. Since we have proven that if and only if for arbitrary , therefore .

Thinking about it later, I realized that this should not be surprising: it’s just characterizing as the categorical product, i.e. meet, i.e. greatest lower bound, in the poset of real numbers ordered by the usual .↩
I’m so soaked in category theory that when I “paused” my *immediate* thought was to use the fact that it was a categorical product from which the proof follows readily as you describe. By just swapping out the symbols, your proof works just as well for proving the associativity (up to isomorphism) of any categorical product.
I often prefer/recommend figuring out how a concept should be characterized categorically and then using that characterization. My first answer on StackExchange (with an unregistered account) takes this view for integer division: https://math.stackexchange.com/a/731909 At this point, I tend to view logical and ordertheoretic concepts as (0,1)category theory (https://ncatlab.org/nlab/show/%280%2C1%29category+theory check out the sidebar). For example, a directed set is a filtered (0,1)category and a Heyting algebra is a (0,1)topos. Relevantly, downsets are (0,1)presheaves. This gets me more value out of the investment I’ve made in learning category theory. For example, we now immediately know that the poset of downsets is a Heyting algebra and that the exponential downset is defined by x in Q^P if and only if for all z <= x, z in P implies z in Q.
Of course, this decategorification can help with understanding/motivating the general categorical concepts too.
Incidentally, your formula involving equality is the decategorification of the fact that full and faithful functors reflect isomorphisms and the fact that the Yoneda embedding is full and faithful(, and, of course, the fact that all functors preserve isomorphisms).
Thanks for the insights! I hadn’t thought of the connection to the Yoneda embedding but it makes perfect sense now that you mention it.
At this point in my development it seems like I often tend to have the right ideas or intuitions because of my experience with category theory, but don’t actually make the explicit connection until later. But I think I need to work on consciously trying to characterize things categorically in the first place.
This drives me mad! After reading the first half of the article I thought “I bet thats easy. Just use the identity . Apply that definition to and and you will get two simple algebraic terms. It _should_ be easy to show that those are equal.”
I was a bit stunned that I can’t get it to work. Perhaps it’s too late in the evening for me.
Some additional thoughts:
– It is funny that you characterize a number by it’s downsets. But I imagine that there are sets of numbers where this doesn’t work.The bugeyed line for example.
– My own approach won’t work in any fields with characteristic 2.
That sounds awful. =) I imagine in order to simplify those you’d have to split into cases to deal with the absolute values. And that formula only makes sense over fields like Q or R, it wouldn’t work for min over the natural numbers, for example. However, if you do get it to work out, I would be interested to see the result.
Characterizing a number by its downsets works in any partially ordered set. If you define a lessthanorequal relation on the bugeyed line which is reflexive, transitive, and antisymmetric, then it will work. And the obvious way of extending the usual ordering on the reals to the bugeyed line (everything is ordered the way you would expect, except that the two zeros are incomparable to each other) satisfies these properties. Maybe you are thinking that the two zeros have the same downsets, but they do not, since each number is included in its own downset.
It works on ℕ or ℤ too, because if (+ x y) is odd, then ( x y) is odd too, therefore ( (+ x y) (abs ( x y))) is even and can be divided by 2 in the original set.
Of course, you don’t have to split cases for abs, since (abs x) = (sqrt (* x x))…
(which too work in ℤ).
Sometimes the opposite is true; the easiest way to work with expressions involving absolute values is often to express them in terms of min and max and then rearrange them using associativity. For example here: https://math.stackexchange.com/a/555286/1149
Your second definition characterizes \downarrow using the universal property of the product, while your third characterizes it as the right adjoint to the delta functor.
… Adjoint functors really do allow for a more elegant definition, don’t they?
The Yonesa embedding is another one of those stunningly useful tools that category theory gives us. The fact that CT can apply itself so broadly is amazing (but I’m still upset that the variably based logarithm is a functor).
(why would you ban anonymous comments?)
Page EWD124013:
Re: comments, I think it’s been set like that for a long time, when I used to get a lot of spam. I think spam filtering is much better now. Maybe I’ll try enabling anonymous/unmoderated comments and see how it goes.
Thanks for the link!
I find your proof quite interesting. However, I think it is possible to write a much simpler proof, with less machinery and also without cases :
Let a, b, c be reals. Then, at least one of a, b, c has to satisfy x <= a /\ x <= b /\ x <= c. Without loss of generality, suppose x = a.
Then it follows that for all y in {a, b, c}, min(a, y) = a. Therefore :
min(a, min(b, c))
= a
= min(a, c)
= min(min(a, b), c)
What do you think?
Hmm, that’s a nice idea. However, I’m not sure I believe the “without loss of generality” bit: the smallest element of {a,b,c} need not be the one that comes first. There’s no a priori reason to believe that just because min(a, min(b,c)) = min(min(a,b),c) when a is the smallest, it will necessarily hold when b or c is the smallest instead. Perhaps you could make this work using the fact that min is commutative, I’m not sure off the top of my head.
Also, in order to conclude min(a, min(b,c)) = a from the fact that (for all y in {a,b,c}. min(a,y) = a), you would have to show that min(b,c) \in {a,b,c}. Granted, this should not be difficult, but it requires reasoning about the definition of min. I guess am not convinced your approach is really all that much simpler in the end.
I think you’re right. I glossed over a bunch of things, so the argument isn’t very convincing. I kind of implicitly assumed that min was commutative, and I feel the w.l.o.g. bit follows from that but I can’t come up with a compelling proof. Without that we’re back to cases, which is what I was trying to avoid in the first place.
I think a better proof would be to suppose min isn’t associative and to derive a contradiction, since then you really can suppose a <= b <= c without any additional assumptions.
How would this work? You can easily start like so:
No problems there. You can also say
But you can’t then assume that your specific a,b,c, obeying a ≤ b ≤ c, satisfy the inequality min(min(a,b), c) ≠ min(a, min(b,c)). The assumption lets you state that a particular triple satisfies that inequality, not that ALL triples do.
I posted another reply, but it looks like it wasn’t allowed. So restating without the formatting:
“I think a better proof would be to suppose min isn’t associative and to derive a contradiction, since then you really can suppose a <= b <= c without any additional assumptions."
I don't see how this would work. You can definitely say "without loss of generality, suppose a ≤ b ≤ c". But that doesn't get you anywhere; when you assume that min is not associative, that assumption lets you state that a _particular_ triple a,b,c fails to satisfy min(a, min(b,c)) = min(min(a,b), c); it doesn't let you state that ALL triples so fail. You can't plug your fully general triple a ≤ b ≤ c into the specifically qualified inequality min(min(a,b), c) ≠ min(a, min(b,c)) — when the inequality holds, it may be the case that a is greater than c.
Here is another method, but it is easier to present with max(.). A similar method works for min(.).
We can define max(a,b) as the limit of (a^n + b^n)^(1/n) when n > infinity. This also says that the infinity norm is the limit of the pnorms. Now we have:
max(a,max(b,c)) = lim{n>inf} (a^n + (lim{n’>inf} (b^n’ + c^n’)^(1/n’))^n )^(1/n)
now, since both limits are well defined (in a sense that needs proof, of course), we can take the diagonal limit:
max(a,max(b,c)) = lim{n>inf} (a^n + (b^n+c^n)^(1/n)^n )^(1/n)
by algebra:
(a^n + (b^n+c^n)^(1/n)^n )^(1/n) = (a^n + b^n + c^n)^(1/n)
which is the same expression we get for max(max(a,b),c)).
Takeaway from proof would be equality of any nested tree of max(.) expressions with even special many parameter max(.) such as max(a,b,c) would all be equal, as they reach the same expression.
Observe that min(a,b) = inf({a} ∪ {b}). Observe that inf(A ∪ B) = inf(A ∪ {inf(B)}) = inf({inf(A)} ∪ B).
Then min(a, min(b,c)) = inf({a} ∪ {inf({b} ∪ {c})}) = inf({a} ∪ {b} ∪ {c}) = inf({inf({a} ∪ {b})} ∪ {c}) = min(min(a,b), c).
This is basically the same proof you provide above, but without the need to invoke an arbitrary z ≤ a so much. We just call z the infimum of {a, b, c}.
It makes more sense to think of min as a unary operator (which we do in the middle here, where we prove in passing that min(a,b,c) = inf({a} ∪ {b} ∪ {c}) ) than to think of it as a binary operator. It is a category error to ask whether a unary operator is associative. ;)