For another project I’m working on, I needed a way to enumerate and randomly sample values from various potentially infinite collections. There are plenty of packages in this space, but none of them quite fit my needs:
universe
(and related packages) is very nice, but it’s focused on enumerating values of Haskell data types, not arbitrary sets: since it uses type classes, you have to make a new Haskell type for each thing you want to enumerate. It also uses actual Haskell lists of values, which doesn’t play nicely with sampling.enumerable
has not been updated in a long time and seems to be superseded byuniverse
.enumerate
is likewise focused on generating values of Haskell data types, with accompanying generic deriving machinery.size-based
is used as the basis for the venerabletesting-feat
library, but these are again focused on generating values of Haskell data types. I’m also not sure I need the added complexity of size-indexed enumerations.enumeration
looks super interesting, and I might be able to use it for what I want, but (a) I’m not sure whether it’s maintained anymore, and (b) it seems rather more complex than I need.
I really want something like Racket’s nice data/enumerate
package, but nothing like that seems to exist in Haskell. So, of course, I made my own! For now you can find it on GitHub.1 Here’s the package in a nutshell:
- Enumerations are represented by the parameterized type
Enumeration
, which is an instance ofFunctor
,Applicative
, andAlternative
(but notMonad
). - Enumerations keep track of their cardinality, which could be either countably infinite or a specific natural number.
- Enumerations are represented as functions from index to value, so they can be efficiently indexed (which also enables efficient random sampling).
- The provided combinators will always do something sensible so that every value in the resulting enumeration occurs at a finite index. For example, if you take the disjoint union of two infinite enumerations, the resulting enumeration will alternate between values from the two inputs.
I wrote about something similar a few years ago. The main difference is that in that post I limited myself to only finite enumerations. There’s a lot more I could say but for now I think I will just show some examples:
>>> enumerate empty
[]
>>> enumerate unit
[()]
>>> enumerate $ empty <|> unit <|> unit
[(),()]
>>> enumerate $ finite 4 >< finiteList [27,84,17]
[(0,27),(0,84),(0,17),(1,27),(1,84),(1,17),(2,27),(2,84),(2,17),(3,27),(3,84),(3,17)]
>>> select (finite 4000000000000 >< finite 123456789) 0
(0,0)
>>> select (finite 4000000000000 >< finite 123456789) 196598723084073
(1592449,82897812)
>>> card (finite 4000000000000 >< finite 123456789)
Finite 493827156000000000000
>>> :set -XTypeApplications
>>> enumerate $ takeE 26 . dropE 65 $ boundedEnum @Char
"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
>>> take 10 . enumerate $ nat >< nat
[(0,0),(0,1),(1,0),(0,2),(1,1),(2,0),(0,3),(1,2),(2,1),(3,0)]
>>> take 10 . enumerate $ cw
[1 % 1,1 % 2,2 % 1,1 % 3,3 % 2,2 % 3,3 % 1,1 % 4,4 % 3,3 % 5]
>>> take 15 . enumerate $ listOf nat
[[],[0],[0,0],[1],[0,0,0],[1,0],[2],[0,1],[1,0,0],[2,0],[3],[0,0,0,0],[1,1],[2,0,0],[3,0]]
data Tree = L | B Tree Tree
deriving (Eq, Show)
trees :: Enumeration Tree
trees = infinite $ singleton L <|> B <$> trees <*> trees
>>> take 3 . enumerate $ trees
[L,B L L,B L (B L L)]
>>> select trees 87239862967296
B (B (B (B (B L L) (B (B (B L L) L) L)) (B L (B L (B L L)))) (B (B (B L (B L (B L L))) (B (B L L) (B L L))) (B (B L (B L (B L L))) L))) (B (B L (B (B (B L (B L L)) (B L L)) L)) (B (B (B L (B L L)) L) L))
treesOfDepthUpTo :: Int -> Enumeration Tree
treesOfDepthUpTo 0 = singleton L
treesOfDepthUpTo n = singleton L <|> B <$> t' <*> t'
where t' = treesOfDepthUpTo (n-1)
>>> card (treesOfDepthUpTo 0)
Finite 1
>>> card (treesOfDepthUpTo 1)
Finite 2
>>> card (treesOfDepthUpTo 3)
Finite 26
>>> card (treesOfDepthUpTo 10)
Finite
14378219780015246281818710879551167697596193767663736497089725524386087657390556152293078723153293423353330879856663164406809615688082297859526620035327291442156498380795040822304677
>>> select (treesOfDepthUpTo 10) (2^50)
B L (B L (B L (B (B L (B (B L (B (B L L) L)) (B (B (B (B L L) (B L L)) (B L (B L L))) (B (B (B L L) L) (B (B L L) L))))) (B (B (B (B (B (B L L) L) (B (B L L) L)) (B L L)) (B (B (B (B L L) L) (B L (B L L))) (B (B (B L L) (B L L)) L))) (B (B (B (B L L) (B L L)) (B (B (B L L) L) L)) (B (B L L) (B (B (B L L) L) (B (B L L) L))))))))
Comments, questions, suggestions for additional features, etc. are all very welcome!
-
I chose the name
enumeration
before I realized there was already a package of that name on Hackage! So now I have to come up with another name that’s not already taken. Suggestions welcome…↩
You should also look at the embedded enumerations in the [http://hackage.haskell.org/package/gencheck](gencheck) package.
Ah, thanks, I’ll take a look!
Nice! Your package seems to be compatible with my Unfoldable package. Your Enumeration is an Unfolder instance.
Ah, very cool, I didn’t know about that package! For the Unfolder instance, should chooseInt = finite?
Yes
I guess there is no Monad instance, because there is no efficient way to sample and compute the cardinality of resulting enumerations. Presumably, this is a known/studied issue — if anyone could point me to a paper/blog post on this topic that would be great.
Sadly, it looks like there is no efficient Selective instance either. To implement it, we’d need a way to compute the cardinality of Lefts and Rights in an Enumeration (Either a b).
The inefficiency of a Monad instance is certainly one reason. But also, because of the way I decided to do disjoint unions (doing interleaving only when necessary, but when possible simply listing all elements of a finite enumeration first, followed by the elements of the other) it actually becomes impossible to implement a Monad instance, because for join it may take you an infinite amount of time to decide what to produce first. If you have an infinite enumeration of infinite enumerations, you will spend eternity searching for a finite enumeration in there because if you ever found one it would have to come first. With something that always does interleaving no matter what it could work, but the math becomes more complicated and the resulting enumerations are (I consider) less “natural”. I chose the more “natural” order since due to inefficiency the Monad instance was not a great loss anyway.
I would be interested in anything on this topic as well, but I am not aware of anything.
I’m still trying to understand why there’s no Selective instance — could you elaborate?
Aha, I see. Interesting!
Speaking of Selective, we need to implement this function (excuse the name clash):
select :: Enumeration (Either a b) -> Enumeration (a -> b) -> Enumeration b
select x y = …
Let’s look at just computing the Cardinality of the result. The answer is:
#x.Left * #y + #x.Right
where #x.Left and #x.Right are the numbers of occurrences of Left and Right values in the first enumeration x, and #y is the cardinality of the second enumeration y. Why? Because whenever we get a value of type Left a, we need to apply one possible function to it. But we have no way of extracting #x.Left and #x.Right from x in the current representation of the Enumeration data type. And I’m not sure how this could be fixed.
Note: I’m being a bit imprecise here, because every Applicative instance can be given a lawful Selective instance simply by using the function selectA, which performs both effects regardless of whether the value is Left or Right. This would essentially duplicate values of type Right b #y times, essentially behaving in an Applicative way. But I think such an implementation would not be very useful; we really want to avoid duplicating values in the result in order to have a useful Selective instance.
Ah, OK, I see now. Yes, I agree this does not seem possible. Although I think as with join, if you make some assumptions about which enumerations must be finite or infinite, you could implement it, but it would be terribly inefficient. Basically anything that creates any kind of dependency from the *values* of an enumeration to the *structure* of a resulting enumeration is going to be a source of inefficiency, because you can no longer just do some arithmetic to do indexing; you have to actually start, well, enumerating.
Pingback: Lightweight invertible enumerations in Haskell | blog :: Brent -> [String]
Pingback: Resumen de lecturas compartidas durante mayo de 2019 | Vestigium