This is certainly not a new observation, and I’m sure it is folklore and/or contained in various textbooks, but it was amusing to derive it independently.
Suppose we have a finite set of real numbers $x_1, \dots, x_n$, and we want to pick a value $m$ which is somehow “in the middle” of the $x_i$. The punchline is that

- if we want to minimize the sum of the squared distances from $m$ to each $x_i$, we should pick $m$ to be the average of the $x_i$;
- if we want to minimize the sum of the absolute distances from $m$ to each $x_i$, we should pick $m$ to be the median of the $x_i$.
The first of these is tricky to understand intuitively but easy to derive; the second is intuitively straightforward but trying to derive it leads to an interesting twist.
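Before diving into the derivations, here is a quick numerical sanity check of both claims. This is just a minimal sketch, not part of the argument: the five-element data set, the search grid, and the helper names (`sum_sq`, `sum_abs`) are mine, chosen only for illustration.

```python
# Brute-force check: which value of m minimizes each objective?
xs = [1.0, 2.0, 4.0, 7.0, 11.0]  # made-up data; mean = 5, median = 4

def sum_sq(m):   # sum of squared distances from m to each x
    return sum((m - x) ** 2 for x in xs)

def sum_abs(m):  # sum of absolute distances from m to each x
    return sum(abs(m - x) for x in xs)

# Search a fine grid of candidate values of m.
grid = [i / 100 for i in range(0, 1501)]
print(min(grid, key=sum_sq))   # 5.0 = the mean of xs
print(min(grid, key=sum_abs))  # 4.0 = the median of xs
```

Brute-force search over a grid is of course not how you would compute these in practice; it just confirms which value each objective prefers.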
Average = minimizing sum of squared distances
Let’s not worry about why we would want to minimize the sum of squared distances; there are good reasons and it’s not the point. I don’t know about you, but I find it difficult to reason intuitively about how and why to pick $m$ to minimize this sum of squared differences. If you know of an intuitive way to explain this, I would love to hear about it! But in any case, it is easy to derive using some straightforward calculus.
Let $S(m) = \sum_i (m - x_i)^2$ denote the sum of squared distances from a given $m$ to each of the $x_i$. Taking the derivative of $S$ with respect to $m$, we find

$\frac{dS}{dm} = \sum_i 2(m - x_i).$
Setting the derivative equal to zero, we can first divide through by the factor of 2, yielding

$\sum_i (m - x_i) = 0.$

Since $m$ does not depend on $i$, this is just $n$ copies of $m$ less the sum of the $x_i$, that is, $nm - \sum_i x_i = 0$. Hence, solving for $m$ yields

$m = \frac{1}{n} \sum_i x_i,$

as expected: the value of $m$ which minimizes the sum of squared distances to the $x_i$ is their average, that is, the sum of the $x_i$ divided by the size of the set.
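As a quick check on the calculus (again only a sketch, reusing the same made-up data set as before), the derivative $\sum_i 2(m - x_i)$ is negative to the left of the mean, zero at the mean, and positive to the right:

```python
# The derivative of the sum of squared distances vanishes exactly at the mean.
xs = [1.0, 2.0, 4.0, 7.0, 11.0]
mean = sum(xs) / len(xs)

def dS(m):  # derivative: sum of 2*(m - x_i)
    return sum(2 * (m - x) for x in xs)

print(dS(mean))        # 0.0 (up to floating-point error)
print(dS(mean - 0.5))  # -5.0: S is still decreasing to the left of the mean
print(dS(mean + 0.5))  # +5.0: S is increasing to the right of the mean
```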
Median = minimizing sum of absolute distances
Now suppose we want to minimize the sum of absolute distances instead, that is,

$S(m) = \sum_i |m - x_i|.$
In this scenario, it is much easier to reason out the correct answer. Start with some arbitrary $m$, and imagine nudging it by some small amount $\varepsilon$, say, to the right. $m$’s distances to any points on its left will each increase by $\varepsilon$, and its distances to any points on its right will each decrease by the same amount. Therefore, if there are more $x_i$ to the left of $m$, then the overall sum of distances will increase; if there are more $x_i$ to the right, then the overall sum will decrease. So, to find $m$ which minimizes the sum of absolute differences, we want the same number of $x_i$ on the left and the right, that is, we want the median. Note that if $n$ is odd, then we must pick $m$ to be exactly equal to the $x_i$ in the middle; if $n$ is even, then we can pick $m$ to be anywhere inside the interval between the middle two $x_i$.
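The nudging argument is easy to simulate. In this sketch (with the same made-up data set and a choice of $m = 6$ that is mine, not from the post), moving $m$ slightly right past a point with three values on its left and two on its right increases the total by about $\varepsilon$, and moving left decreases it by the same amount:

```python
# Nudging m by eps changes the total by roughly
# eps * (#points to the left) - eps * (#points to the right),
# as long as no data point is crossed.
xs = [1.0, 2.0, 4.0, 7.0, 11.0]
eps = 0.01

def sum_abs(m):
    return sum(abs(m - x) for x in xs)

m = 6.0  # three points lie to the left of 6, two to the right
print(sum_abs(m + eps) - sum_abs(m))  # ~ +eps * (3 - 2) = +0.01
print(sum_abs(m - eps) - sum_abs(m))  # ~ -eps * (3 - 2) = -0.01: moving left helps
```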
Just for fun, can we derive this answer using calculus, like we did for minimizing squared differences? There is a wrinkle, of course, which is that the absolute value function is not differentiable everywhere: it has a sharp corner at zero. But we won’t let that stop us! Clearly the derivative of $|x|$ is $1$ when $x > 0$ and $-1$ when $x < 0$. So it seems reasonable to just assign the derivative a value of $0$ at $x = 0$. Algebraically, we can define

$\frac{d}{dx} |x| = [x > 0] - [x < 0]$

where $[P]$ is equal to $1$ when the proposition $P$ is true, and $0$ when it is false (this notation is called the Iverson bracket). So when $x > 0$ we get $1 - 0 = 1$; when $x < 0$ we get $0 - 1 = -1$; and when $x = 0$ both propositions are false so we get $0 - 0 = 0$.
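In code, the Iverson brackets translate directly into boolean comparisons; here is a tiny sketch (relying on the fact that Python treats `True` and `False` as 1 and 0 in arithmetic; the name `dabs` is mine):

```python
# The Iverson-bracket derivative of |x|: [x > 0] - [x < 0], i.e. the sign function.
def dabs(x):
    return (x > 0) - (x < 0)

print(dabs(3.5))   # 1
print(dabs(-2.0))  # -1
print(dabs(0.0))   # 0 -- the value we chose to assign at the corner
```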
Armed with this definition, we can differentiate $S$ with respect to $m$:

$\frac{dS}{dm} = \sum_i \left( [m > x_i] - [m < x_i] \right).$

Clearly, this is zero when $\sum_i [m > x_i] = \sum_i [m < x_i]$, that is, when there are the same number of $x_i$ on either side of $m$.
The curious thing to me is that even though the derivative of $|x|$ is undefined when $x = 0$, it seems like it “wants” to be 0 here. In general, if we assign the value $c$ to the derivative at $x = 0$, the derivative of $S$ becomes

$\frac{dS}{dm} = \sum_i \left( [m > x_i] - [m < x_i] + c\,[m = x_i] \right).$

When $c$ is nonzero and $n$ is odd, there are no values of $m$ for which this derivative is zero, making it more difficult to find the minimum.
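To see this concretely, here is a sketch with $c = 1$ and the same five-element (odd $n$) data set: the derivative jumps from $-1$ to $+1$ across the median without ever passing through zero.

```python
# With a nonzero value c assigned to the derivative at the corner, and an odd
# number of points, the derivative of S skips over zero at the median.
xs = [1.0, 2.0, 4.0, 7.0, 11.0]  # n = 5, odd; median = 4
c = 1.0

def dS_c(m):
    return sum((m > x) - (m < x) + c * (m == x) for x in xs)

# Just left of the median, exactly at it, and just right of it:
print(dS_c(3.999))  # -1
print(dS_c(4.0))    # +1 (would be 0 with c = 0)
print(dS_c(4.001))  # +1
```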
Your $[x > 0] - [x < 0]$ is known as the _sign function_ or _signum function_: https://en.wikipedia.org/wiki/Sign_function . I’d hoped Wikipedia would have some explanation of why $\mathrm{sgn}(0)$, considered as the derivative of the absolute value function, “ought” to be $0$, but I didn’t see anything compelling.
Still, I wonder if there’s some theorem about how, when $f$ is discontinuous at $x$, the function $\hat f(x) = \frac12 \left( \lim_{t \to x^-} f(t) + \lim_{t \to x^+} f(t) \right)$ is somehow well-behaved. For example, isn’t it the case that a Fourier series that converges to $f$ everywhere except at the discontinuities must actually converge to $\hat f$ everywhere?