Data.List.Split

December 21, 2008

Have you ever had a string like this

"abc;def;ghijk;lm"

and wanted to turn it into a list of strings, like this?

["abc", "def", "ghijk", "lm"]

Of course, you could always use a parsing library, or a regular expression library, but sometimes you just want something a little more lightweight. Perl and Ruby both have library functions called “split” to do just this. Haskell’s standard libraries, on the other hand, have no such function, much to the consternation of many a newbie and experienced Haskeller alike. There have been many proposals to add such a thing to the standard Data.List module in the past, but nothing ever came of it, primarily because there are many slightly different ways to split a list, and no one could ever agree on the One True Splitting Interface.
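To see why agreement is hard, here is a minimal sketch of two equally reasonable ways to split on a single-element delimiter, using only the Prelude. The names splitStrict and splitLikeWords are hypothetical and are not part of any library. Both give the same answer for the example string above. They differ as soon as the input has leading, trailing, or repeated delimiters:

```haskell
-- Hypothetical helpers for comparison; neither is part of the split package.

-- Policy 1: every delimiter separates two chunks, so blank chunks are kept.
splitStrict :: Eq a => a -> [a] -> [[a]]
splitStrict d xs = case break (== d) xs of
  (chunk, [])       -> [chunk]                       -- no delimiter left
  (chunk, _ : rest) -> chunk : splitStrict d rest    -- skip the delimiter

-- Policy 2: runs of delimiters act as a single separator and blank chunks
-- are dropped, the way Prelude's words treats whitespace.
splitLikeWords :: Eq a => a -> [a] -> [[a]]
splitLikeWords d = filter (not . null) . splitStrict d
```

For instance, splitStrict ';' ";a;;b;" gives ["","a","","b",""], while splitLikeWords ';' ";a;;b;" gives ["a","b"]. Neither choice is wrong, which is exactly the problem.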

I decided we’ve been Doing It Wrong. Instead of bickering about the one true interface and going through the stringent library proposal process, let’s just get some useful code together and release it on Hackage. (Of course, there are advantages to inclusion in the standard libraries, but that can come later.) So I solicited contributions on a wiki page, took some of the resulting ideas and bits of code, added some ideas of my own, and created Data.List.Split.

Instead of talking about it more, I’ll just show some examples:


*Data.List.Split> splitOn ";" "abc;def;ghijk;lm"
["abc","def","ghijk","lm"]
*Data.List.Split> splitWhen (<0) [1,4,-8,4,-3,-2,9]
[[1,4],[4],[],[9]]
*Data.List.Split> split (startsWith "app") "applyappicativeapplaudapproachapple"
["apply","appicative","applaud","approach","apple"]
*Data.List.Split> split (dropDelims $ oneOf ":;") "::abc;:;;fg:h;;ij;"
["","","abc","","","","fg","h","","ij",""]
*Data.List.Split> split (condense . dropInitBlank $ oneOf ":;") "::abc;:;;fg:h;;ij;"
["::","abc",";:;;","fg",":","h",";;","ij",";",""]
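One way to read the last few examples is as two stages. First, the input is cut into alternating text and delimiter pieces. Then a policy is applied to that list: drop the delimiters, keep them, condense runs of them, or drop an initial blank. The sketch below models only that reading, using the Prelude alone. Chunk, splitRaw, and the ...M functions are hypothetical names, not the package’s API or implementation, and the model handles only single-element delimiters (the oneOf and splitWhen cases).

```haskell
-- Hypothetical model of the splitting policies above; the names and the
-- Chunk type are illustrative and not taken from the split package.
data Chunk a = Delim [a] | Text [a]
  deriving (Show, Eq)

-- Break the input into alternating Text and Delim chunks, treating each
-- element that satisfies p as a one-element delimiter. The result always
-- begins and ends with a Text chunk, which may be empty; that is where the
-- blank strings at the ends of the output come from.
splitRaw :: (a -> Bool) -> [a] -> [Chunk a]
splitRaw p xs = case break p xs of
  (t, [])       -> [Text t]
  (t, d : rest) -> Text t : Delim [d] : splitRaw p rest

-- Discard the delimiters, keeping every text chunk (blanks included).
dropDelimsM :: [Chunk a] -> [[a]]
dropDelimsM cs = [t | Text t <- cs]

-- Keep the delimiters as chunks of their own.
keepDelimsM :: [Chunk a] -> [[a]]
keepDelimsM = map fromChunk
  where
    fromChunk (Text t)  = t
    fromChunk (Delim d) = d

-- Merge runs of consecutive delimiters, removing the empty text between them.
condenseM :: [Chunk a] -> [Chunk a]
condenseM (Delim a : Text [] : Delim b : rest) = condenseM (Delim (a ++ b) : rest)
condenseM (c : rest) = c : condenseM rest
condenseM [] = []

-- Drop a blank text chunk at the very start.
dropInitBlankM :: [Chunk a] -> [Chunk a]
dropInitBlankM (Text [] : rest) = rest
dropInitBlankM cs = cs
```

In this model, dropDelimsM (splitRaw (< 0) [1,4,-8,4,-3,-2,9]) gives [[1,4],[4],[],[9]], matching the splitWhen example. keepDelimsM (dropInitBlankM (condenseM (splitRaw (`elem` ":;") "::abc;:;;fg:h;;ij;"))) gives the same ten chunks as the condense example.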

Detailed documentation can be found in the package itself. Install it from Hackage:

cabal install split

You can also check out the darcs repo. Comments, suggestions, and patches welcome!


QuickCheck rocks my socks

December 16, 2008

Over the past few days I’ve been hacking on a Data.List.Split module, to be used when you just want to quickly split up a list without going to the trouble of using a real parsing or regular expression library. For example, suppose you are writing a one-off script that needs to read in strings like “abc;def;gh;i” and split each one on the semicolons to yield a list of Strings. Of course, such a thing isn’t in the standard libraries, since no one can agree on the right interface. The idea is to provide a whole module with lots of different ways to split instead of a single function, and to just put it on Hackage instead of going through the much more difficult process of getting it included in the standard libraries. Anyway, more on that when it’s released, hopefully in a few days.

Like any good Haskell programmer writing a nice pure library, today I started adding a suite of QuickCheck properties. I set up a framework and added a couple of basic properties: 200 tests passed! I added another: 100 tests passed! Now I was on a roll, and added three more. This time… it hung after checking the fourth property only 8 times. OK, no problem, I’ve seen this sort of thing before when there’s some sort of combinatorial explosion in the size of the randomly generated test data… except that the test data is so simple that this definitely isn’t what’s happening here. Hmm… maybe it’s infinite recursion? But I really can’t see where infinite recursion could crop up. Oh, unless… hmm, yes, if function A ever returns an empty list, it would cause an infinite loop. But surely function A can never return an empty list! Well, let’s try it: prop_A_nonempty x = (not . null) (funcA x). And… Falsified! Whoops.

If you’re curious, the case I forgot was when you specify the empty list as a delimiter — obvious in retrospect, perhaps, but without QuickCheck’s assistance, I probably would have ended up releasing the library with this latent infinite recursion bug!
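The post doesn’t show function A, so the following is only a hypothetical reconstruction of this kind of bug, not the library’s code. It is a sublist splitter whose loop advances by the length of each delimiter match. The names matchDelim, splitOnSub, and prop_matchNonempty are made up for illustration. An empty delimiter matches at every position without consuming any input, which is exactly the situation a property like prop_A_nonempty catches.

```haskell
import Data.List (isPrefixOf)

-- Hypothetical stand-in for "function A": if the delimiter occurs at the
-- front of the input, return the part of the input it matched.
matchDelim :: Eq a => [a] -> [a] -> Maybe [a]
matchDelim delim xs
  | delim `isPrefixOf` xs = Just delim
  | otherwise             = Nothing

-- Split on every occurrence of a sublist delimiter. The loop skips past a
-- match by dropping (length m) elements, so it only makes progress when the
-- match is non-empty. An empty delimiter would match everywhere and the loop
-- would emit empty chunks forever, so it is handled up front (here,
-- arbitrarily, by returning the whole input as one chunk).
splitOnSub :: Eq a => [a] -> [a] -> [[a]]
splitOnSub [] xs = [xs]
splitOnSub delim xs = go [] xs
  where
    go acc [] = [reverse acc]
    go acc ys@(y : rest) = case matchDelim delim ys of
      Just m  -> reverse acc : go [] (drop (length m) ys)
      Nothing -> go (y : acc) rest

-- Property in the spirit of prop_A_nonempty: any reported match must be
-- non-empty. It is false whenever delim is "".
prop_matchNonempty :: String -> String -> Bool
prop_matchNonempty delim xs = maybe True (not . null) (matchDelim delim xs)
```

With QuickCheck installed, quickCheck prop_matchNonempty should report a counterexample almost immediately, since QuickCheck starts with small inputs such as the empty string. The guard clause in splitOnSub is one possible fix; treating an empty delimiter differently is a design choice this sketch does not settle.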