*This is certainly not a new observation, and I’m sure it is folklore and/or contained in various textbooks, but it was amusing to derive it independently.*

Suppose we have a finite set of real numbers , and we want to pick a value which is somehow “in the middle” of the . The punchline is that

- if we want to minimize the sum of the
*squared*distances from to each , we should pick to be the*average*of the ; - if we want to minimize the sum of the
*absolute*distances from to each , we should pick to be the*median*of the .

The first of these is tricky to understand intuitively but easy to derive; the second is intuitively straightforward but trying to derive it leads to an interesting twist.

## Average = minimizing sum of squared distances

Let’s not worry about *why* we would want to minimize the sum of *squared* distances; there are good reasons and it’s not the point. I don’t know about you, but I find it difficult to reason intuitively about how and why to pick to minimize this sum of squared differences. If you know of an intuitive way to explain this, I would love to hear about it! But in any case, it is easy to derive using some strightforward calculus.

Let denote the sum of squared distances from a given to each of the . Taking the derivative of with respect to , we find

.

Setting the derivative equal to zero, we can first divide through by the factor of 2, yielding

Since does not depend on , this is just copies of less the sum of the . Hence, solving for yields

as expected: the value of which minimizes the sum of squared distances to the is their average, that is, the sum of the divided by the size of the set.

## Median = minimizing sum of absolute distances

Now suppose we want to minimize the sum of *absolute* distances instead, that is,

In this scenario, it is much easier to reason out the correct answer. Start with some arbitrary , and imagine nudging it by some small amount , say, to the right. ’s distances to any points on its left will each increase by , and its distances to any points on its right will each decrease by the same amount. Therefore, if there are more to the left of , then the overall sum of distances distances will increase; if there are more to the right, then the overall sum will decrease. So, to find which minimizes the sum of absolute differences, we want the same number of on the left and the right, that is, we want the *median*. Note that if is odd, then we must pick to be exactly equal to the in the middle; if is even, then we can pick to be anywhere inside the interval between the middle two .

Just for fun, can we derive this answer using calculus, like we did for minimizing squared differences? There is a wrinkle, of course, which is that the absolute value function is not differentiable everywhere: it has a sharp corner at zero. But we won’t let that stop us! Clearly the derivative of is when and when . So it seems reasonable to just assign the derivative a value of at . Algebraically, we can define

where is equal to when the proposition is true, and when it is false (this notation is called the *Iverson bracket*). So when we get ; when we get ; and when both propositions are false so we get .

Armed with this definition, we can differentiate with respect to :

Clearly, this is zero when , that is, when there are the same number of on either side of .

The curious thing to me is that even though the derivative of is undefined when , it seems like it “wants” to be 0 here. In general, if we assign the value to the derivative at , the derivative of becomes

When is nonzero and is odd, there are no values of for which this derivative is zero, making it more difficult to find the minimum.

Your $[x>0] – [x-0]$ is known as the _sign function_ or _signum function_: https://en.wikipedia.org/wiki/Sign_function . I’d hoped Wikipedia would have some explanation of why $sgn(0)$, considered as the derivative of the absolute value function, “ought” to be $0$, but I didn’t see anything compelling.

Still, I wonder if there’s some theorem about how, when $f$ is discontinuous at $x$, the function $ \hat f(x) = \frac12 \left(\lim_{x\to 0^-} f(x)+ \lim_{x\to 0^+} f(x) \right)$ is somehow well-behaved. For example, isn’t it the case that a Fourier series that converges to $f$ everywhere except at the discontinuities must actually converge to $\hat f$ everywhere?