Decomposing data structures

So, what are combinatorial species? As a very weak first approximation, you can think of them as a generalization of algebraic data types.¹ That doesn’t really say much about what they are, but at least it does explain why programmers might be interested in them.

The goal of species is to have a unified theory of structures, or containers. By a structure we mean some sort of “shape” containing locations (or positions). Here are two different structures, each with eight locations:

One thing that’s important to get straight from the beginning is that we are talking about structures with labeled locations. The numbers in the picture above are not data being stored in the structures, but names or labels for the locations. To talk about a data structure (i.e. a structure filled with data), we would have to also specify a mapping from locations to data, like $\{ 0 \mapsto \texttt{'s'}, 1 \mapsto \texttt{'p'}, 2 \mapsto \texttt{'e'} \dots \}$

Now go reread the above paragraph! For programmers I find that this is one of the most difficult things to grasp at first—or at least one of the things that is easiest to forget. The fact that the labels are often natural numbers (which are often also used as sample data) does not help.

One useful intuition is to think of the labels as memory addresses, which point off to some location where a data value is stored. This intuition has some particularly interesting consequences when we get to talking about operations like Cartesian product and functor composition, since it gives us a way to model sharing (albeit only in limited ways).

Why have labels at all? In the tree shown above, we can uniquely identify each location by a path from the root of the tree, without referencing their labels at all. However, the other structure illustrates one reason why labels are needed. The circle is supposed to indicate that the structure has rotational symmetry, so there would be no way to uniquely refer to any location other than by giving them labels.

The idea of decomposing data structures as shapes with locations combined with data is not unique to species. In the computer science community, the idea goes back, I think, to Jay and Cockett (1994) in their work on “shapely types” (their “locations” are always essentially natural numbers, since they work in terms of shapes and lists of data) and more recently Abbott, Altenkirch, and Ghani (2003) with their definition of “containers” (which, like the theory of species, has a much more general notion of locations). However, it should be noted that the literature on species never actually talks about mappings from labels to data: combinatorialists don’t care about data structures, they only care about structures!

Now that we have some motivation, and with the requisite disclaimers about labels out of the way, in my next post I’ll motivate and explain the formal definition of species.

References

Abbott, Michael, Thorsten Altenkirch, and Neil Ghani. 2003. “Categories of Containers.” In Foundations of Software Science and Computation Structures, 23–38. http://dx.doi.org/10.1007/3-540-36576-1_2.

Jay, C. Barry, and J. Robin B. Cockett. 1994. “Shapely Types and Shape Polymorphism.” In ESOP ’94: Proceedings of the 5th European Symposium on Programming, 302–316. London, UK: Springer-Verlag.

No relation to Generalized Algebraic Data Types.↩

About Brent

Associate Professor of Computer Science at Hendrix College. Functional programmer, mathematician, teacher, pianist, follower of Jesus.

View all posts by Brent →

9 Responses to Decomposing data structures

Ruben says:

October 28, 2012 at 9:10 am

This is so cool. I was looking for some relation combinatorics and data structures
Thanks for the references, I will look at them

Harley Eades says:

October 28, 2012 at 11:44 am

This is really cool. I plan to follow your posts.

Algebraic datatypes can be made modular using some well-known ways. For example, using disjoint union and interpreting the type as a polynomial functor then defining the data type as as the least fixed point of the disjoint union of the polynomial functors. Can data types based on this idea be made modular in the same way? Could it possibly be done in a different way?

How do we interpret species categorically?

Anyway just some thoughts I had while reading your post.

- Brent says:
  
  October 28, 2012 at 11:55 am
  
  The answer to all your questions is “yes”, as will be explained in future posts. =)
  
  - Harley Eades says:
    
    October 28, 2012 at 11:58 am
    
    Great! I look forward to reading them. :-)
    
Pingback: Combinatorial species definition | blog :: Brent -> [String]
Jeremy Gibbons says:

November 29, 2012 at 6:58 am

Prior art: Colin Banger and David Skillicorn worked on a Bird-Meertens Formalism approach to arrays in terms of shape and data before Jay and Cockett in 1994; eg in a paper in ATABLE 1992, “multidimensional arrays can be modelled as a shape descriptor paired with an indexed sequence of items” (http://www.hains.eu/papers/atable92_proceedings.pdf). Although it’s quite possible that each knew about the others’ work.

- Brent says:
  
  November 29, 2012 at 7:41 am
  
  Ah, thanks for the reference!
  
Pingback: Species definition clarification and exercises | blog :: Brent -> [String]
Pingback: The algebra of species: primitives | blog :: Brent -> [String]