Decomposing data structures

So, what are combinatorial species? As a very weak first approximation, you can think of them as a generalization of algebraic data types.1 That doesn’t really say much about what they are, but at least it does explain why programmers might be interested in them.

The goal of species is to have a unified theory of structures, or containers. By a structure we mean some sort of “shape” containing locations (or positions). Here are two different structures, each with eight locations:

One thing that’s important to get straight from the beginning is that we are talking about structures with labeled locations. The numbers in the picture above are not data being stored in the structures, but names or labels for the locations. To talk about a data structure (i.e. a structure filled with data), we would have to also specify a mapping from locations to data, like \{ 0 \mapsto \texttt{'s'}, 1 \mapsto \texttt{'p'}, 2 \mapsto \texttt{'e'} \dots \}

Now go reread the above paragraph! For programmers I find that this is one of the most difficult things to grasp at first—or at least one of the things that is easiest to forget. The fact that the labels are often natural numbers (which are often also used as sample data) does not help.

One useful intuition is to think of the labels as memory addresses, which point off to some location where a data value is stored. This intuition has some particularly interesting consequences when we get to talking about operations like Cartesian product and functor composition, since it gives us a way to model sharing (albeit only in limited ways).

Why have labels at all? In the tree shown above, we can uniquely identify each location by a path from the root of the tree, without referencing their labels at all. However, the other structure illustrates one reason why labels are needed. The circle is supposed to indicate that the structure has rotational symmetry, so there would be no way to uniquely refer to any location other than by giving them labels.

The idea of decomposing data structures as shapes with locations combined with data is not unique to species. In the computer science community, the idea goes back, I think, to Jay and Cockett (1994) in their work on “shapely types” (their “locations” are always essentially natural numbers, since they work in terms of shapes and lists of data) and more recently Abbott, Altenkirch, and Ghani (2003) with their definition of “containers” (which, like the theory of species, has a much more general notion of locations). However, it should be noted that the literature on species never actually talks about mappings from labels to data: combinatorialists don’t care about data structures, they only care about structures!

Now that we have some motivation, and with the requisite disclaimers about labels out of the way, in my next post I’ll motivate and explain the formal definition of species.

References

Abbott, Michael, Thorsten Altenkirch, and Neil Ghani. 2003. “Categories of Containers.” In Foundations of Software Science and Computation Structures, 23–38. http://dx.doi.org/10.1007/3-540-36576-1_2.

Jay, C. Barry, and J. Robin B. Cockett. 1994. “Shapely Types and Shape Polymorphism.” In ESOP ’94: Proceedings of the 5th European Symposium on Programming, 302–316. London, UK: Springer-Verlag.

About Brent

Associate Professor of Computer Science at Hendrix College. Functional programmer, mathematician, teacher, pianist, follower of Jesus.
This entry was posted in math, species and tagged , , , . Bookmark the permalink.

9 Responses to Decomposing data structures

  1. Ruben says:

    This is so cool. I was looking for some relation combinatorics and data structures
    Thanks for the references, I will look at them

  2. Harley Eades says:

    This is really cool. I plan to follow your posts.

    Algebraic datatypes can be made modular using some well-known ways. For example, using disjoint union and interpreting the type as a polynomial functor then defining the data type as as the least fixed point of the disjoint union of the polynomial functors. Can data types based on this idea be made modular in the same way? Could it possibly be done in a different way?

    How do we interpret species categorically?

    Anyway just some thoughts I had while reading your post.

  3. Pingback: Combinatorial species definition | blog :: Brent -> [String]

  4. Prior art: Colin Banger and David Skillicorn worked on a Bird-Meertens Formalism approach to arrays in terms of shape and data before Jay and Cockett in 1994; eg in a paper in ATABLE 1992, “multidimensional arrays can be modelled as a shape descriptor paired with an indexed sequence of items” (http://www.hains.eu/papers/atable92_proceedings.pdf). Although it’s quite possible that each knew about the others’ work.

  5. Pingback: Species definition clarification and exercises | blog :: Brent -> [String]

  6. Pingback: The algebra of species: primitives | blog :: Brent -> [String]

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.