## CCSC-Midsouth conference and programming contest

I’m in Memphis (again!) this weekend attending the 2016 CCSC Midsouth Regional Conference, which also has a student programming contest attached. I’m very proud of the two teams from Hendrix, who placed first and fifth out of 17 teams. This is now the third year in a row that a team from Hendrix has won the contest (though I had nothing to do with the first two!). The members of the winning team are all graduating this year, but each of the members of the other team will be around at least one more year, and I have talked to quite a few other students who are interested. I’m excited to continue building up a programming team with a tradition of excellence.

## CIS 194 materials now on github

I’ve been meaning for a while to put the source files for my CIS 194 materials in a publicly accessible place, and I’ve finally gotten around to it: you can now find everything in byorgey/haskell-course on GitHub.

I make no particular guarantees about anything; for example, there is a crufty, complicated Shake script that builds everything, but it probably doesn’t even compile against the latest version of Shake.

There are some obvious next steps, for which I have not the time:

1. Get everything building again
2. Fix any outstanding typos, confusions, etc.
3. Update it to target general-audience learning rather than a specific course at Penn
4. Find a permanent web home for the material. It’s sort of a happy accident that it has been accessible on the Penn site for so long, but it’s not a stable situation; I don’t even know who would have access to that site anymore.

All the material is licensed under a Creative Commons Attribution 4.0 International License, so go wild using it however you like, or working on the above next steps. Pull requests are very welcome, and I will likely give out commit access like candy.


## Boltzmann sampling for generic Arbitrary instances

Update, 7/17/2017: this now exists; see https://byorgey.wordpress.com/2016/09/20/the-generic-random-library-part-1-simple-generic-arbitrary-instances/ .

tl;dr: I know how to generate random instances of data types in a generic way, and even have some old code that already does all the hard work, but won’t have time to polish and package it until this summer. If you’re interested in helping, let me know!

This morning Kenny Foner pointed out to me this tweet by Gabriel Gonzalez, asking why there isn’t a default Arbitrary instance for types implementing Generic. It reminded me that I’ve been meaning for a while now (years, in fact!) to get around to packaging up some code that does this.

As several people pointed out on Twitter, this seems obvious, but it isn’t. It’s easy to write a generic Arbitrary instance, but hard to write one that generates a good distribution of values. The basic idea is clear: randomly pick a constructor, and then recursively generate random subtrees. The problem is that this is very likely either to blow up and generate gigantic (even infinite) trees, or to generate almost all tiny trees, or both. I wrote a post about this three years ago which illustrates the problem. It also explains half of the solution: generate random trees with a target size in mind, and throw out any which are not within some epsilon of the target size (crucially, stopping the generation early as soon as the tree being generated gets too big).
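To make the “target size with early abort” idea concrete, here is a dependency-free sketch for a toy binary tree type. Everything here is made up for illustration: a real implementation would live in QuickCheck’s Gen monad, and the hand-rolled LCG is only a stand-in for a proper source of randomness.

```haskell
-- A tiny LCG stands in for a real RNG so this sketch has no dependencies.
type Seed = Int

nextDouble :: Seed -> (Double, Seed)
nextDouble s =
  let s' = (s * 1103515245 + 12345) `mod` 2147483648
  in (fromIntegral s' / 2147483648, s')

data Tree = Leaf | Node Tree Tree deriving Show

size :: Tree -> Int
size Leaf       = 1
size (Node l r) = 1 + size l + size r

-- Generate a random tree, giving up (Nothing) as soon as the size
-- budget is exhausted.  p is the probability of picking Node.
genTree :: Double -> Int -> Seed -> (Maybe Tree, Int, Seed)
genTree p budget s
  | budget <= 0 = (Nothing, 0, s)
  | otherwise =
      let (x, s1) = nextDouble s
      in if x >= p
           then (Just Leaf, 1, s1)
           else case genTree p (budget - 1) s1 of
                  (Nothing, n, s2)  -> (Nothing, n + 1, s2)
                  (Just l, nl, s2) ->
                    case genTree p (budget - 1 - nl) s2 of
                      (Nothing, nr, s3) -> (Nothing, nl + nr + 1, s3)
                      (Just r, nr, s3)  -> (Just (Node l r), nl + nr + 1, s3)

-- Rejection loop: retry until the size lands in [lo, hi].
genSized :: Double -> Int -> Int -> Seed -> (Tree, Seed)
genSized p lo hi s =
  case genTree p hi s of
    (Just t, n, s') | n >= lo -> (t, s')
    (_, _, s')                -> genSized p lo hi s'
```

Note that the budget threading means an oversized attempt is abandoned after only O(budget) work, rather than being generated in full and then discarded.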

However, I never got around to explaining the other half of the solution: it’s crucially important to use the right probabilities when picking a constructor. With the wrong probabilities, you will spend too much time generating trees that are either too small or too big. The surprising thing is that with exactly the right probabilities, you can expect to wait only $O(n)$ time before generating a tree of size (approximately¹) $n$.²

So, how does one pick the right probabilities? Essentially, you turn the generic description of your data type into a mutually recursive system of generating functions, and (numerically) find their radii of convergence, thinking of them as functions in the complex plane. From these values it is straightforward to compute the right probabilities to use. For the intrepid, this is explained in Duchon et al.³
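As a small hand-computed instance of this recipe (no numerical solver needed in this case), consider plain binary trees, counting every constructor toward the size:

```latex
% Binary trees: T = Leaf | Node T T.  Each constructor contributes x,
% and Node carries two recursive subtrees, giving the system (here a
% single equation):
T(x) = x + x\,T(x)^2
\quad\Longrightarrow\quad
T(x) = \frac{1 - \sqrt{1 - 4x^2}}{2x}
```

The dominant singularity is at $\rho = 1/2$, where $T(\rho) = 1$. A Boltzmann sampler at parameter $x$ picks Leaf with probability $x / T(x)$ and Node with probability $x\,T(x)^2 / T(x) = x\,T(x)$; at the critical parameter $x = \rho$ both probabilities come out to exactly $1/2$.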

I have some old Haskell code from Alexis Darrasse which already does a bunch of the work. It would have to be updated a bit to work with modern libraries and with GHC.Generics, and packaged up to go on Hackage. I won’t really have time to work on this until the summer—but if anyone else is interested in working on this, let me know! I’d be happy to send you the code and provide some guidance in figuring it out.

1. The constant factor depends on how approximate you are willing to be.
2. I wanted to put an exclamation point at the end of that sentence, because this is really surprising. But it looked like $n$ factorial. So, here is the exclamation point: !
3. Duchon, Philippe, et al. “Boltzmann samplers for the random generation of combinatorial structures.” Combinatorics Probability and Computing 13.4-5 (2004): 577-625.

## At SIGCSE 2016 in Memphis

This weekend I’m in Memphis for SIGCSE 2016. This is my first time at SIGCSE, so I’m looking forward to picking up some new ideas in CS education, and more importantly to meeting lots of new people. If you’re here too, feel free to say hi!

## The network reliability problem

Let $G = (V,E)$ be a directed graph with vertices $V$ and edges $E$. Multiple edges between the same pair of vertices are allowed. For concreteness’ sake, think of the vertices as routers, and the edges as (one-way) connections. Let $\mathbb{P} = [0,1]$ denote the set of probabilities, and $\varphi : E \to \mathbb{P}$ be a function which assigns some probability to each edge. Think of $\varphi(e)$ as the probability that a single message sent along the edge $e$ from the source router will successfully reach the target router on the other end.

Suppose that when a router receives a message on an incoming connection, it immediately resends it on all outgoing connections. For $s,t \in V$, let $P(s,t)$ denote the probability that, under this “flooding” scenario, at least one copy of a message originating at $s$ will eventually reach $t$.

For example, consider the simple network shown below, with two routes from $s$ to $t$: an upper route through an intermediate router $r$, whose two edges have probabilities $0.5$ and $0.9$, and a lower route consisting of a single edge with probability $0.7$.

A message sent from $s$ along the upper route through $r$ has a $0.5 \times 0.9 = 0.45$ probability of arriving at $t$; a message sent along the lower route, being a single edge, has a $0.7$ probability of arriving at $t$. One way to compute the overall probability $P(s,t)$ is as the probability that the message does not fail along both routes, that is, $1 - (1 - 0.45)(1 - 0.7) = 1 - 0.165 = 0.835$. Alternatively, since in general $1 - (1 - p)(1 - q) = p + q - pq$, we can compute $0.45 + 0.7 - 0.45 \times 0.7 = 0.835$ as well. Intuitively, since the two events are not mutually exclusive, adding them double-counts the situation where both routes work, so we subtract the probability of both working.
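This two-route arithmetic can be packaged up directly (the helper names here are made up for illustration). Note that this series/parallel reasoning only applies when the routes share no edges; handling overlapping routes is part of what makes the general problem interesting.

```haskell
-- Probability that every link in a chain of independent links works.
series :: [Double] -> Double
series = product

-- Probability that at least one of several independent routes works.
parallel :: [Double] -> Double
parallel ps = 1 - product [1 - p | p <- ps]

-- The example network: s -> r -> t (0.5 then 0.9), and s -> t (0.7).
example :: Double
example = parallel [series [0.5, 0.9], series [0.7]]  -- ≈ 0.835
```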

The question is, given some graph $G$ and some specified nodes $s$ and $t$, how can we efficiently compute $P(s,t)$? For now I am calling this the “network reliability problem” (though I fully expect someone to point out that it already has a name). Note that it might make the problem a bit easier to restrict to directed acyclic graphs; but the problem is still well-defined even in the presence of cycles.
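Under the simplifying assumption that each edge independently either works or fails once and for all (so flooding cannot “retry” a failed link), the definition can at least be transcribed directly into an exponential-time algorithm: sum, over all $2^{|E|}$ ways the edges can work or fail, the probability of each outcome in which $t$ is reachable from $s$. A naive sketch, with made-up names, purely to pin down the semantics:

```haskell
import Data.List (nub)

type Vertex = Int
type Edge   = (Vertex, Vertex, Double)

-- Vertices reachable from s using only the given (working) edges.
reachable :: [(Vertex, Vertex)] -> Vertex -> [Vertex]
reachable es s = go [s]
  where
    go vs =
      let vs' = nub (vs ++ [v | (u, v) <- es, u `elem` vs])
      in if vs' == vs then vs else go vs'

-- Branch on each edge working (probability p) or failing (1 - p),
-- and total up the probability of the outcomes where t is reachable.
netRel :: [Edge] -> Vertex -> Vertex -> Double
netRel edges s t = go edges []
  where
    go [] up = if t `elem` reachable up s then 1 else 0
    go ((u, v, p) : rest) up =
      p * go rest ((u, v) : up) + (1 - p) * go rest up
```

On the example network, `netRel [(0,1,0.5),(1,2,0.9),(0,2,0.7)] 0 2` agrees with the $0.835$ computed by hand; the interesting question is how to do better than $O(2^{|E|})$.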

This problem turned out to be surprisingly more difficult and interesting than it first appeared. In a future post or two I will explain my solution, with a Haskell implementation. In the meantime, feel free to chime in with thoughts, questions, solutions, or pointers to the literature.


## A strange representation of Z6

On my other blog I am writing about a proof of the Lucas-Lehmer test, and today in the course of working up some examples I stumbled across this little gem.

Let $M$ be a monoid, and let $M^*$ denote the subset of elements of $M$ which actually have an inverse. Then it is not hard to show that $M^*$ is a group: the identity is its own inverse and hence is in $M^*$; it is closed under the monoid operation since if $a$ and $b$ have inverses then so does $ab$ (namely, $b^{-1}a^{-1}$); and clearly the inverse of every element in $M^*$ is also in $M^*$, because being an inverse also implies having one.

Now let $M = \{a + b\sqrt{3} \mid 0 \leq a,b < 3\}$, where the operation is multiplication, but the coefficients $a$ and $b$ are reduced modulo 3. For example, $(2 + \sqrt 3)(2 + 2 \sqrt 3) = (4 + 6) + (4 + 2) \sqrt 3 = 1 + 0 \sqrt 3$. This does turn out to be associative, and is clearly commutative; and $1 = 1 + 0\sqrt 3$ is the identity. I wrote a little program to see which elements have inverses, and it turns out that the three elements with $a = 0$ do not, but the other six do. So this is an Abelian group of order 6; but there’s only one such group, namely, the cyclic group $\mathbb{Z}_6$. And, sure enough, $M^*$ turns out to be generated by $2 + \sqrt 3$ and $2 + 2 \sqrt 3$.
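The “little program” in question might look something like the following sketch (names invented here, not the original code). The multiplication rule falls out of $(a + b\sqrt 3)(c + d\sqrt 3) = (ac + 3bd) + (ad + bc)\sqrt 3$; note that modulo 3 the $3bd$ term vanishes, which is why the elements with $a = 0$ can never reach an inverse.

```haskell
import Data.List (sort)

-- Elements a + b*sqrt 3 with coefficients mod 3, represented as (a, b).
type M = (Int, Int)

-- (a + b√3)(c + d√3) = (ac + 3bd) + (ad + bc)√3, coefficients mod 3.
mul :: M -> M -> M
mul (a, b) (c, d) = ((a*c + 3*b*d) `mod` 3, (a*d + b*c) `mod` 3)

elements :: [M]
elements = [(a, b) | a <- [0..2], b <- [0..2]]

-- The invertible elements: those with some y such that x*y = 1.
units :: [M]
units = [x | x <- elements, any (\y -> mul x y == (1, 0)) elements]

-- Successive powers of 2 + √3, which sweep out all six units.
powers :: [M]
powers = take 6 (iterate (mul (2, 1)) (2, 1))
```

Evaluating `units` gives the six elements with $a \neq 0$, and `powers` hits each of them exactly once before cycling back to the identity, confirming that $M^*$ is cyclic of order 6.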


## ICFP roundup

Well, ICFP 2015 in Vancouver is already two months in the past… I’ve been meaning for a while to write a post reflecting on the experience. Better late than never!

## The Place

I had never been to Vancouver before; it seems like a beautiful and fun city. One afternoon I skipped all the talks and went for a long hike—I ended up walking around the entire perimeter of the Stanley Park seawall, which was gorgeous. The banquet was at the (really cool) aquarium—definitely the first time I have eaten dinner while being watched by an octopus.

## The People

Instead of staying in the conference hotel, four of us (me, Ryan Yates, Ryan Trinkle, and Michael Sloan) rented an apartment through Airbnb. The apartment was really great: it ended up being cheaper per person than sharing two hotel rooms, and it was a lot of fun to have a comfortable place to hang out in the evenings, where we could sit around in our pajamas and talk, or write code, or whatever, without having to be “on”.

I met some new people, including Aahlad Gogineni from Tufts (along with another Tufts student whose name I unfortunately forget), Zac Slade and Boyd Smith from my new home state of Arkansas, and some folks from Vancouver whose names I am also blanking on at the moment. I also met in person, for the first time, a few people with whom I had previously only communicated electronically, like Rein Henrichs and Chris Smith.

I also saw lots of old friends—way too many to list. It once again reminded me how thankful I am to be part of such a great community. Of course, the community is also far from perfect; towards that end I really enjoyed and appreciated the ally skills tutorial taught by Valerie Aurora (which probably deserves its own post).

## The Content

Here are just a few of my favorite talks:

I had a lot of great discussions relating to diagrams. For example:

• I talked with Alan Zimmerman about using his and Matthew Pickering’s great work on ghc-exactprint with an eye towards shipping future diagrams releases along with an automated refactoring tool for updating users’ diagrams code.

• After talking a bit with Michael Sloan I got a much better sense for the ways stack can support our development and CI testing process.

• I had a lot of fun talking with Ryan Yates about various things, including projecting diagrams from 3D into 2D, and reworking the semantics and API for diagrams’ paths and trails to be more elegant and consistent. We gave a presentation at FARM which seemed to be well-received.

I got another peek at how well Idris is coming along, including a few personal demonstrations from David Christiansen (thanks David!). I am quite impressed, and plan to look into using it during the last few weeks of my functional programming course this spring (in the past I have used Agda).

If I had written this as soon as I got back, I probably could have remembered a lot more; oh well. All in all, a wonderful week, and I’m looking forward to Japan next year!