## The generic-random library, part 1: simple generic Arbitrary instances

(Note, 9/21/2017: The `generic-random` package has been changed and updated quite a bit since I wrote this post. For more up-to-date information, see e.g. the tutorial included with the package.)

In a previous post I pointed out that we know all the theory to make nice, principled, practical random generators for recursive algebraic data types; someone just needed to step up and do the work. Well, Li-yao Xia took up the challenge and produced a brilliant package, generic-random, available on Hackage right now for you to use!

However, although the package does include some Haddock documentation, it is probably difficult for someone with no experience or background in this area to navigate. So I thought it would be worth writing a few blog posts by way of a tutorial and introduction to the package.

``````> {-# LANGUAGE GADTSyntax           #-}
> {-# LANGUAGE DeriveGeneric        #-}
> {-# LANGUAGE FlexibleContexts     #-}
> {-# LANGUAGE UndecidableInstances #-}
>
> import GHC.Generics
> import Test.QuickCheck
>
> import Generic.Random.Generic
``````

## The problem

First, a quick recap of the problem we are trying to solve: the obvious, naive way of generating random instances of some recursive algebraic data type often produces really terrible distributions. For example, one might generate really tiny structures most of the time and then occasionally generate a humongous one. For more background on the problem, see this post or this one.

## A first example: generating generic `Arbitrary` instances

As a first example, consider the following algebraic data type:

``````> data Foo where
>   Bar  :: Char -> Int -> String -> Foo
>   Baz  :: Bool -> Bool -> Foo
>   Quux :: [Woz] -> Foo
>   deriving (Show, Generic)
>
> data Woz where
>   Wiz :: Int -> Woz
>   Waz :: Bool -> Woz
>   deriving (Show, Generic)
``````

You have probably noticed by now that this is not recursive (well, except for the embedded lists). Patience! We’ll get to recursive ADTs in due time, but it turns out the library has some nice things to offer for non-recursive ADTs as well, and it makes for an easier introduction.

Now, suppose we wanted to use QuickCheck to test some properties of a function that takes a `Foo` as an argument. We can easily make our own instances of `Arbitrary` for `Foo` and `Woz`, like so:

``````instance Arbitrary Foo where
arbitrary = oneof
[ Bar <\$> arbitrary <*> arbitrary <*> arbitrary
, Baz <\$> arbitrary <*> arbitrary
, Quux <\$> arbitrary
]

instance Arbitrary Woz where
arbitrary = oneof
[ Wiz <\$> arbitrary
, Waz <\$> arbitrary
]``````

This works reasonably well:

``````λ> sample (arbitrary :: Gen Foo)
Baz True True
Baz False True
Baz True True
Quux []
Baz False True
Bar '<' 3 "zy\\\SOHpO_"
Baz False True
Bar '\SOH' 0 "\"g\NAKm"
Bar 'h' (-9) "(t"
Quux [Wiz (-2),Waz False]
Baz False True``````

The only problem is that writing those instances is quite tedious. There is no thought required at all. Isn’t this exactly the sort of thing that is supposed to be automated with generic programming?

Why yes, yes it is. And the `generic-random` package can do exactly that. Notice that we have derived `Generic` for `Foo` and `Woz`. We can now use the `genericArbitrary` function from `Generic.Random.Generic` to derive completely standard `Arbitrary` instances, just like the ones we wrote above:

``````> instance Arbitrary Foo where
>   arbitrary = genericArbitrary
>
> instance Arbitrary Woz where
>   arbitrary = genericArbitrary
``````
``````λ> sample (arbitrary :: Gen Foo)
Quux []
Bar '\159' (-2) ""
Baz True True
Baz False False
Baz True True
Baz True False
Quux [Wiz 9,Wiz 7,Waz True,Waz True,Waz False]
Quux [Wiz (-10),Waz False,Waz False,Waz True,Waz True,Wiz (-14),Wiz 13,Waz True,Wiz (-8),Wiz 12,Wiz (-13)]
Bar '\130' 10 "FN\222j?\b=\237(\NULW\231+ts\245"
Bar 'n' 14 ""
Bar '\205' 4 "\SYN"``````

Seems about the same, except we wrote way less code! Huzzah!

If we want certain constructors to occur more frequently, we can also control that using `genericArbitraryFrequency`, which takes a list of `Int`s (each `Int` specifies the weight for one constructor).

A few notes:

• Using the `Generic.Random.Generic` module is the quickest and simplest way to generate random instances of your data type, if it works for your use case.

• It has some limitations, namely:

• It only generates `Arbitrary` instances for QuickCheck. It can’t create more general random generators.

• It probably won’t work very well for recursive data types.

However, these limitations are addressed by other parts of the library. Intrigued? Read on!

## Recursive types, the simple way

Let’s now consider a simple recursive type:

``````> data Tree a where
>   Leaf   :: a                -> Tree a
>   Branch :: Tree a -> Tree a -> Tree a
>   deriving (Show, Generic)
>
> treeSize :: Tree a -> Int
> treeSize (Leaf _)     = 1
> treeSize (Branch l r) = 1 + treeSize l + treeSize r
``````

We can try using `genericArbitrary`:

``````instance Arbitrary a => Arbitrary (Tree a) where
arbitrary = genericArbitrary``````

The problem is that this tends to generate some tiny trees and some enormous trees, with not much in between:

``````λ> map treeSize  replicateM 50 (generate (arbitrary :: Gen (Tree Int)))
[1,1,1,269,1,1,1,1,1,11,7,3,5,1,1,1,7,1,1,1,3,3,83,5,1,1,3,111,265,47,1,3,19,1,11,1,5,3,15,15,1,91,1,13,4097,119,1,15,5,3]``````

And this is not a problem specific to trees; this kind of thing is likely to happen for any recursive type.

Before we get to more interesting/complicated tools, it’s worth noting that `random-generics` provides a simple mechanism to limit the size of the generated structures: the `genericArbitrary'` function works like `genericArbitrary` but uses QuickCheck’s `sized` mechanism to cut off the recursion when it gets too big. The available size is partitioned among recursive calls, so it does not suffer from the exponential growth you might see if only the depth was limited. When the size counter reaches zero, the generator tries to terminate the recursion by picking some finite, non-recursive value(s). The parameter to `genericArbitrary'` is a natural number specifying how deep the finite, recursion-terminating values can be. `Z` (i.e zero) means the generator will only be willing to terminate the recursion with nullary constructors. In our case, `Tree` does not have any nullary constructors, so we should not use `Z`: if we do, the generator will be unable to terminate the recursion when the size reaches zero and we will get the same behavior as `genericArbitrary`. Instead, we should use `S Z`, which means it will be able to pick the depth-1 term `Leaf x` (for some arbitrary `x`) to terminate the recursion.

Let’s try it:

``````> instance (Arbitrary a, Generic a, BaseCases Z (Rep a)) => Arbitrary (Tree a) where
>   arbitrary = genericArbitrary' (S Z)
``````
``````λ> sample (arbitrary :: Gen (Tree Int))
Leaf 0
Branch (Leaf 0) (Branch (Leaf 0) (Branch (Leaf 0) (Leaf 0)))
Branch (Leaf (-1)) (Leaf 1)
Leaf (-3)
Leaf 7
Branch (Leaf (-4)) (Branch (Branch (Leaf 1) (Leaf (-1))) (Leaf (-1)))
Branch (Leaf (-2)) (Branch (Leaf 1) (Branch (Leaf 0) (Branch (Leaf 0) (Leaf 0))))
Leaf 14
Branch (Branch (Leaf 2) (Leaf 2)) (Branch (Branch (Branch (Leaf 1) (Branch (Branch (Leaf 0) (Branch (Leaf 0) (Leaf 0))) (Branch (Leaf 0) (Leaf 0)))) (Branch (Branch (Branch (Leaf 0) (Leaf 0)) (Leaf 0)) (Leaf 0))) (Leaf (-3)))
Leaf 4
Leaf 9``````

Ah, that’s much better.

Finally, `genericArbitraryFrequency'` is the same as `genericArbitraryFrequency` but limits the recursion depth as `genericArbitrary'` does.

If you have a recursive data type you want to use with QuickCheck, it’s worth trying this, since it is quick and simple. The main problem with this approach is that it does not generate a uniform distribution of values. (Also, it is limited in that it is specifically tied to QuickCheck.) In this example, although you can’t necessarily tell just by looking at the sample random trees, I guarantee you that some kinds of trees are much more likely to be generated than others. (Though I couldn’t necessarily tell you which kinds.) This can be bad if the specific trees that will trigger a bug are in fact unlikely to be generated.

Next time, we’ll look at how we can actually have efficient, size-limited, uniform random generators using Boltzmann samplers. Assistant Professor of Computer Science at Hendrix College. Functional programmer, mathematician, teacher, pianist, follower of Jesus.
This entry was posted in combinatorics, haskell and tagged , , , . Bookmark the permalink.

### 4 Responses to The generic-random library, part 1: simple generic Arbitrary instances

1. Li-yao Xia says:

Thank you for this well-written piece!

• Brent says:

And thank you for the nice library! =D

This site uses Akismet to reduce spam. Learn how your comment data is processed.