Many parser combinator libraries in Haskell (such as parsec) have both a `Monad`

interface as well as an `Applicative`

interface. (Actually, to be really useful, you also need `MonadPlus`

along with `Monad`

, or `Alternative`

along with `Applicative`

, in order to encode choice; from now on when I talk about `Monad`

and `Applicative`

note that I really have `MonadPlus`

or `Alternative`

in mind as well.) The `Applicative`

interface is often nicer, but it is less powerful than the `Monad`

interface: in particular, using `Applicative`

you can only parse context-free languages, whereas `Monad`

lets you parse arbitrary context-sensitive languages. Intuitively, this is because the structure of `Applicative`

computations cannot depend on intermediate results, whereas `Monad`

computations allow you to choose which computation to run next based on intermediate results.

This is a good bit of intuition, with only one minor caveat: it isn’t true! I believe it was two years ago, during the second Hac phi, when I first learned from Edward Kmett how `Applicative`

(by which I mean, of course, `Alternative`

) can be used to parse arbitrary context-sensitive languages. The question just came up again in the `#haskell`

IRC channel, and I figured it would be useful to have this all written down somewhere. In particular, Reid Barton gave a nice sketch which I decided to turn into some working code.

Here’s the key insight: normally, grammars are defined as *finite* objects: a finite set of terminals, a finite set of nonterminals, and a finite set of productions. However, Haskell’s general recursion means that we can write down a "grammar" with an *infinite* set of production rules. This is what lets us get away with parsing context-sensitive languages with `Applicative`

: we just make a different production rule for every possible input!

First, some imports. Notice that I do *not* import `Control.Monad`

.

```
> import Text.Parsec
> import Text.Parsec.String
> import Control.Arrow ((&&&))
> import Control.Applicative hiding ((<|>))
> import Data.List (group)
```

The usual `guard`

function is for `MonadPlus`

, but we can make something equivalent for `Alternative`

.

```
> guard' :: Alternative f => Bool -> f ()
> guard' True = pure ()
> guard' False = empty
```

And now for the meat of the example. `parseArbitrary`

takes an arbitrary predicate on `Strings`

built from lowercase letters and turns it into a parser. The created parser will accept `Strings`

for which the predicate evaluates to `True`

(returning `()`

) and fail for any other string.

```
> parseArbitrary :: (String -> Bool) -> Parser ()
> parseArbitrary p =
```

If we encounter `eof`

, we simply ensure that the predicate holds of the empty string.

```
> (eof <* guard' (p []))
```

Otherwise, we choose between 26 alternatives based on the next character in the input. If the character `c`

is encountered, we make a recursive call to `parseArbitrary (p . (c:))`

. The remainder of the input must satisfy `(p . (c:))`

, that is, it must consist of some `String`

`s`

such that `(c:s)`

satisfies the predicate `p`

.

```
> <|> foldr (<|>) parserZero
> (map (\c -> char c *>
> parseArbitrary (p . (c:))
> )
> ['a'..'z']
> )
```

For any given predicate `p`

, you can think of `parseArbitrary p`

as an infinite tree with a 26-way branch at each node. Each node "remembers" the path taken to reach it from the root of the tree, in the form of prepend functions composed with the original predicate. We have constructed an infinite grammar: each node in the tree corresponds to a production, one for every possible input prefix.

Let’s try it out. Here’s a function which only accepts `String`

s of the form `"aaabbbccc"`

, with an equal number of a’s, b’s, and c’s. This is a well-known example of a language which is not context-free (easily shown using the pumping lemma for context-free languages).

```
> f :: String -> Bool
> f s
> | [('a',na), ('b',nb), ('c',nc)]
> <- map (head &&& length). group $ s
>
> = na == nb && nb == nc
>
> | otherwise = False
```

Now we make `f`

into a parser and test it on some example inputs:

```
> p = parseArbitrary f
>
> main = do
> parseTest p "aaa"
> parseTest p "aaabbbcc"
> parseTest p "aaaabcccc"
> parseTest p "aaaaabbbbbccccc"
```

The last test succeeds by returning `()`

. For the first three, we get error messages like this:

```
parse error at (line 1, column 4):
unexpected end of input
expecting "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y" or "z"
```

Obviously, these are not very helpful. But what were you expecting? ~~After all, this is one of those things that is interesting in theory, but in practice amounts to an awful hack that no one would ever want to use in real code.~~

~~
~~~~In the end, it’s still true that ~~`Applicative`

can only parse context-free languages *as long as we restrict ourselves to finite grammars*—which any sensible programmer would do anyway.

[**ETA**: it looks like using infinite grammars is not as impractical as I thought—see the comments below!]