Competitive programming in Haskell: tries

In my previous post, I challenged you to solve Alien Math, which is about reading numbers in some base $B$ , but with a twist. We are given a list of $B$ strings representing the names of the digits $0$ through $B-1$ , and a single string describing a number, consisting of concatenated digit names. For example, if $B = 3$ and the names of the digits are zero, one, two, then we might be given a string like twotwozerotwoone, which we should interpret as $22021_3 = 223_{10}$ . Crucially, we are also told that the digit names are prefix-free, that is, no digit name is a prefix of any other. But other than that, the digit names could be really weird: they could be very different lengths, some digit names could occur as substrings (just not prefixes) of others, digit names could share common prefixes, and so on. So this is really more of a parsing problem than a math problem; once we have parsed the string as a list of digits, converting from base $B$ is the easy part.

One simple way we can do this is to define a map from digit names to digits, and simply look up each prefix of the given string until we find a hit, then chop off that prefix and start looking at successive prefixes of the remainder. This takes something like $O(n^2 \lg n)$ time in the worst case (I think)—but this is actually fine since $n$ is at most 300. This solution is accepted and runs in 0.00 seconds for me.

Tries

However, I want to talk about a more sophisticated solution that has better asymptotic time complexity and generalizes nicely to other problems. Reading a sequence of strings from a prefix-free set should make you think of Huffman coding, if you’ve ever seen that before. In general, the idea is to define a trie containing all the digit names, with each leaf storing the corresponding digit. We can then scan through the input one character at a time, keeping track of our current position in trie, and emit a digit (and restart at the root) every time we reach a leaf. This should run in $O(n)$ time.

Let’s see some generic Haskell code for tries (this code can also be found at byorgey/comprog-hs/Trie.hs on GitHub). First, some imports, a data type definition, and emptyTrie and foldTrie for convenience:

module Trie where

import           Control.Monad              ((>=>))
import qualified Data.ByteString.Lazy.Char8 as C
import           Data.List                  (foldl')
import           Data.Map                   (Map, (!))
import qualified Data.Map                   as M
import           Data.Maybe                 (fromMaybe)

data Trie a = Trie
  { trieSize :: !Int
  , value    :: !(Maybe a)
  , children :: !(Map Char (Trie a))
  }
  deriving Show

emptyTrie :: Trie a
emptyTrie = Trie 0 Nothing M.empty

-- | Fold a trie into a summary value.
foldTrie :: (Int -> Maybe a -> Map Char r -> r) -> Trie a -> r
foldTrie f (Trie n b m) = f n b (M.map (foldTrie f) m)

A trie has a cached size (we could easily generalize this to store any sort of monoidal annotation), a possible value (i.e. the value associated with the empty string key, if any), and a map from characters to child tries. The cached size is not needed for this problem, but is included since I needed it for some other problems.

Now for inserting a key/value pair into a Trie. This code honestly took me a while to get right! We fold over the given string key, producing for each key suffix a function which will insert that key suffix into a trie. We have to be careful to correctly update the size, which depends on whether the key being inserted already exists—so the recursive go function actually returns a pair of a new Trie and an Int representing the change in size.

-- | Insert a new key/value pair into a trie, updating the size
--   appropriately.
insert :: C.ByteString -> a -> Trie a -> Trie a
insert w a t = fst (go w t)
  where
    go = C.foldr
      (\c insSuffix (Trie n v m) ->
         let (t', ds) = insSuffix (fromMaybe emptyTrie (M.lookup c m))
         in  (Trie (n+ds) v (M.insert c t' m), ds)
      )
      (\(Trie n v m) ->
         let ds = if isJust v then 0 else 1
         in  (Trie (n+ds) (Just a) m, ds)
      )

Now we can create an entire Trie in one go by folding over a list of key/value pairs with insert:

-- | Create an initial trie from a list of key/value pairs.  If there
--   are multiple pairs with the same key, later pairs override
--   earlier ones.
mkTrie :: [(C.ByteString, a)] -> Trie a
mkTrie = foldl' (flip (uncurry insert)) emptyTrie

A few lookup functions: one to look up a single character and return the corresponding child trie, and then on top of that we can build one to look up the value associated to an entire string key.

-- | Look up a single character in a trie, returning the corresponding
--   child trie (if any).
lookup1 :: Char -> Trie a -> Maybe (Trie a)
lookup1 c = M.lookup c . children

-- | Look up a string key in a trie, returning the corresponding value
--   (if any).
lookup :: C.ByteString -> Trie a -> Maybe a
lookup = C.foldr ((>=>) . lookup1) value

Finally, a function that often comes in handy for using a trie to decode a prefix-free code. It takes an input string and looks it up character by character; every time it encounters a key which exists in the trie, it emits the corresponding value and then starts over at the root of the trie.

decode :: Trie a -> C.ByteString -> [a]
decode t = reverse . snd . C.foldl' step (t, [])
  where
    step (s, as) c =
      let Just s' = lookup1 c s
      in  maybe (s', as) (\a -> (t, a:as)) (value s')

These tries are limited to string keys, since that is most useful in a competitive programming context, but it is of course possible to make much more general sorts of tries — see Hinze, Generalizing Generalized Tries.

Solution

Finally, we can use our generic tries to solve the problem: read the input, build a trie mapping digit names to values, use the decode function to read the given number, and finally interpret the resulting list of digits in the given base.

import Control.Arrow ((>>>))
import ScannerBS
import Trie

main = C.interact $ runScanner tc >>> solve >>> showB

data TC = TC { base :: Integer, digits :: [C.ByteString], number :: C.ByteString }
  deriving (Eq, Show)

tc :: Scanner TC
tc = do
  base <- integer
  TC base <$> (fromIntegral base >< str) <*> str

solve :: TC -> Integer
solve TC{..} = foldl' (\n d -> n*base + d) 0 (decode t number)
  where
    t = mkTrie (zip digits [0 :: Integer ..])

Practice problems

Here are a few other problems where you can profitably make use of tries. Some of these can be solved directly using the Trie code given above; others may require some modifications or enhancements to the basic concept.

For next time

For next time, I challenge you to solve Chemist’s vows!

2 Responses to Competitive programming in Haskell: tries

Jan-Willem Maessen says:

May 3, 2023 at 5:26 pm

Interestingly, the solution here is similar in principle to something I coded up in Haskell years ago to take the output of addr2line and a larger-than-memory log file, and stream through the log file replacing code addresses with symbolic names. Very useful if you have limited memory (tens of megs at the time) and several gig of logs.

Pingback: Competitive programming in Haskell: parsing with an NFA | blog :: Brent -> [String]