Binary search over floating point representations

I got some good feedback on my last post about binary search, and thought it was worth a follow-up post.

An important fix

First things first: commenter Globules pointed out that doing (l+r) `div` 2 can overflow; my initial, glib answer was “sure, you have to be careful when doing arithmetic with fixed-sized integers”, but then I realized that it was easily fixable in this case! So I replaced that formula with l + (r-l) `div` 2 which won’t overflow, even when searching for values near the top of the range of a fixed-sized integer type.
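For instance, here's a quick illustration at a small integer type (midOverflowing and midSafe are hypothetical names, just for this example):

import Data.Int (Int8)

-- Hypothetical midpoint functions, at Int8 so the overflow is easy to see.
midOverflowing, midSafe :: Int8 -> Int8 -> Int8
midOverflowing l r = (l + r) `div` 2      -- l + r can wrap around
midSafe        l r = l + (r - l) `div` 2  -- never leaves the range [l, r]

-- λ> midOverflowing 100 120
-- -18     (100 + 120 wraps around to -36 in Int8)
-- λ> midSafe 100 120
-- 110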

An entirely frivolous distraction that I spent way too much time on

With that out of the way, let’s get to the more interesting discussion. Several commenters took issue with my use of unsafeCoerce to convert Double into Word64 and do binary search over bit representations for floating-point numbers, and they raised some good points:

  1. Even if, in the context of competitive programming, we’re not particularly concerned with maintainability or safety, there’s no guarantee that unsafeCoerce will behave the same on our local machine as on the judging machine! For example, the endianness might be different.
  2. Worse, it just plain doesn’t work: my worry that the bit representations of negative numbers would look bigger than those of positive numbers, because of the sign bit, was entirely justified. The scheme happens to work when searching for positive numbers, but when searching for negative numbers it goes into an infinite loop!

In my defense, I got this idea from Jules Jacobs’s original article… but looking at it again, there is a key footnote that I hadn’t noticed before:

    I’m assuming that f2b respects ordering, that is, comparing f2b(x) < f2b(y) gives the same result as comparing the floats x < y. Depending on the bit representation of floats, one would have to shuffle the mantissa and exponent and sign bits around to ensure this.

Oops, of course! It turns out there are a lot of things about the IEEE-754 standard which make this work nicely for positive numbers: the exponent is stored first, with a bias so we don’t have to deal with signed exponents, and the mantissa is always understood to have a leading 1 bit which is not stored. For positive floating-point numbers x and y, it’s already the case that x < y if and only if their IEEE-754 representations, considered as unsigned integers, are in the same order! This is very clever, and it seems it was done this way on purpose, so that hardware comparison of floating-point numbers can be fast. And this is what made my example in the previous post work at all.
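We can check this directly in GHCi, using GHC.Float.castDoubleToWord64 (which reinterprets the bits of a Double as a Word64) instead of unsafeCoerce:

λ> import GHC.Float (castDoubleToWord64)
λ> (1.5 < 2.5, castDoubleToWord64 1.5 < castDoubleToWord64 2.5)
(True,True)
λ> (pi < 100, castDoubleToWord64 pi < castDoubleToWord64 100)
(True,True)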

However, for negative numbers this doesn’t quite work. First of all, the high-order bit is the sign bit, so negative numbers all appear larger when interpreted as unsigned integers. Interpreting them as signed integers doesn’t work either, because they are just stored as a sign bit and a magnitude, as opposed to signed integers which are typically stored using 2’s complement, so negative floating point numbers are “backwards” compared to the interpretation of their bit pattern as (signed or unsigned) integers. But this is not hard to fix; commenter babel linked to a reference explaining exactly how to do the required bit-twiddling. Essentially, we always flip the sign bit, and flip all the other bits too if the sign bit was set.
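In Haskell, that bit-twiddling might look something like the following sketch (the names are mine, and it uses GHC.Float's cast functions rather than unsafeCoerce; this is not yet the platform-independent version discussed below):

import Data.Bits (clearBit, complement, setBit, testBit)
import Data.Word (Word64)
import GHC.Float (castDoubleToWord64, castWord64ToDouble)

-- Map a Double to a Word64 whose unsigned ordering matches the Double
-- ordering.  Bit 63 is the sign bit: if it is set, the number is
-- negative, so flip all the bits; otherwise flip just the sign bit.
-- (Caveat: -0.0 and 0.0 are equal as Doubles but map to adjacent,
-- distinct Word64 values.)
orderedBits :: Double -> Word64
orderedBits x
  | testBit w 63 = complement w  -- negative: flip sign bit and the rest
  | otherwise    = setBit w 63   -- non-negative: flip just the sign bit
  where
    w = castDoubleToWord64 x

-- The inverse transformation.
unorderedBits :: Word64 -> Double
unorderedBits w
  | testBit w 63 = castWord64ToDouble (clearBit w 63)
  | otherwise    = castWord64ToDouble (complement w)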

So I could have just done this bit-twiddling on the result of unsafeCoerce. However, goaded by some other commenters, I wanted to try using encodeFloat/decodeFloat instead of unsafeCoerce to make it a little more platform-independent. I ended up spending many hours on this. I fixed about 17 gazillion bugs and gave up several times in frustration, only to realize another bug later and come back to it. In the end, though, I got it to work! You can see my f2b :: Double -> Word64 and b2f :: Word64 -> Double functions here. I do make some assumptions about the size of Double values, so it’s not completely platform-independent, but at least it should be independent of the endianness.
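To give a flavor of the decodeFloat direction, here is a sketch of the easy part, handling only normalized, finite, nonzero values; the tricky cases are zeros, subnormals, Infinity, and NaN, and the order-fixing sign-bit twiddling from above still has to be layered on top of these raw bits:

import Data.Bits (bit, shiftL, (.|.))
import Data.Word (Word64)
import Numeric (showHex)

-- Sketch: raw IEEE-754 bits of a normalized, finite, nonzero Double.
-- decodeFloat x gives (m, e) with x == fromInteger m * 2^e, where abs m
-- is a 53-bit integer whose high bit is the implicit leading 1.
rawBits :: Double -> Word64
rawBits x = (s `shiftL` 63) .|. (ex `shiftL` 52) .|. frac
  where
    (m, e) = decodeFloat x
    s      = if m < 0 then 1 else 0
    ex     = fromIntegral (e + 52 + 1023)  -- biased exponent field
    frac   = fromInteger (abs m) - bit 52  -- drop the implicit leading 1

-- λ> showHex (rawBits 2.5) ""
-- "4004000000000000"    (the actual IEEE-754 bit pattern of 2.5)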

How do I know it’s correct? Well, I can’t be 100% sure, but I verified each of the following properties by running QuickCheck on a million random inputs (I used these properties to find lots of bugs, and a code sketch of the first two appears after the list):

  • f2b is monotonic: for all x, y :: Double, x < y if and only if f2b x < f2b y.
  • b2f is left inverse to f2b: for all x :: Double, b2f (f2b x) == x.
  • b2f is almost a right inverse to f2b; this direction is made more complicated by the fact that some Word64 values correspond to Infinity or NaN. Also, for some Word64 values corresponding to really tiny floating-point values on the very edge of what is representable, f2b (b2f w) is off by one from the original w.
  • It’s actually not enough that x < y implies f2b x < f2b y; we also need the fact that the midpoint between f2b x and f2b y will correspond to a floating-point number between x and y. I was worried about this for a while, until I finally understood the fact that the mantissa is always assumed to have a leading 1 which is not stored. That makes everything work out nicely, and I checked this property with QuickCheck as well.
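For concreteness, the first two properties might be written like this (a sketch using the orderedBits/unorderedBits functions from above as stand-ins for f2b and b2f; the property names are mine):

import Test.QuickCheck

-- x < y exactly when the mapped bits compare the same way
-- (modulo the -0.0 caveat noted earlier).
prop_monotonic :: Double -> Double -> Property
prop_monotonic x y = (x < y) === (orderedBits x < orderedBits y)

-- Round-tripping through the bit representation is the identity.
prop_roundTrip :: Double -> Property
prop_roundTrip x = unorderedBits (orderedBits x) === x

main :: IO ()
main = do
  quickCheck (withMaxSuccess 1000000 prop_monotonic)
  quickCheck (withMaxSuccess 1000000 prop_roundTrip)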

So, let’s see it in action! We can search for negative values now, or values that don’t exist, etc.

λ> search floating (> (-3.2934)) (-100) 100
(-3.2934,-3.2933999999999997)
λ> search floating (\x -> x**7 >= 1e34) (-1e100) (1e100)
(71968.56730011519,71968.5673001152)
λ> search floating (\x -> x**2 >= 150) 0 100
(12.247448713915889,12.24744871391589)
λ> search floating (\x -> x**2 >= (-150)) (-1e308) (1e308)
(-1.0e308,-9.999999999999998e307)

So, was it worth it? From a competitive programming point of view, probably not! I can think of one or two times I’ve really struggled with precision issues where this might have helped. But 99.9% of the time you can just use a normal binary search on floating-point values until you get within the required tolerance. Overall, though, despite the extreme frustration, this was a fun detour through some things I didn’t understand very well before. I now know a lot more about IEEE-754 encoding and Haskell’s support for floating-point values!
