Recently I discovered a nice way to deal with records where certain fields of the record cache some expensive function of other fields, using the lens
library. I very highly doubt I am the first person to ever think of this, but I don’t think I’ve seen it written down anywhere. I’d be very happy to be learn of similar approaches elsewhere.
The problem
Suppose we have some kind of record data structure, and an expensive-to-calculate function which computes some kind of “view”, or summary value, for the record. Like this:
data Record = Record
{ field1 :: A, field2 :: B, field3 :: C }
expensiveView :: A -> B -> C -> D
expensiveView = ...
(Incidentally, I went back and forth on whether to put real code or only pseudocode in this post; in the end, I decided on pseudocode. Hopefully it should be easy to apply in real situations.)
If we need to refer to the summary value often, we might like to cache the result of the expensive function in the record:
data Record = Record
{ field1 :: A, field2 :: B, field3 :: C, cachedView :: D }
expensiveView :: A -> B -> C -> D
expensiveView = ...
However, this has several drawbacks:
-
Every time we produce a new
Record
value by updating one or more fields, we have to remember to also update the cached view. This is easy to miss, especially in a large codebase, and will most likely result in bugs that are very difficult to track down. -
Actually, it gets worse: what if we already have a large codebase that is creating updated
Record
values in various places? We now have to comb through the codebase looking for such places and modifying them to update thecachedExpensive
field too. Then we cross our fingers and hope we didn’t miss any. -
Finally, there is nothing besides comments and naming conventions to prevent us from accidentally modifying the
cachedExpensive
field directly.
The point is that our Record
type now has an associated invariant, and invariants which are not automatically enforced by the API and/or type system are Bad ™.
Lens to the rescue
If you don’t want to use lens
, you can stop reading now. (Honestly, given the title, I’m not even sure why you read this far.) In my case, I was already using it heavily, and I had a lightbulb moment when I realized how I could leverage it to add a safe cached view to a data type without modifying the rest of my codebase at all!
The basic idea is this:
- Add a field to hold the cached value as before.
- Don’t use
lens
’s TemplateHaskell utilites to automatically derive lenses for all the fields. Instead, declare them manually, such that they automatically update the cached field on everyset
operation. - For the field with the cached value itself, declare a
Getter
, not aLens
. - Do not export the constructor or field projections for your data type; export only the type and the lenses.
In pseudocode, it looks something like this:
module Data.Record
(Record, field1, field2, field3, cachedView)
where
import Control.Lens
data Record = Record
{ _field1 :: A, _field2 :: B, _field3 :: C, _cachedView :: D }
expensiveView :: A -> B -> C -> D
expensiveView = ...
recache :: Record -> Record
recache r = r { _cachedView = expensiveView (_field1 r) (_field2 r) (_field3 r) }
cachingLens :: (Record -> a) -> (Record -> a -> Record) -> Lens' Record a
cachingLens get set = lens get (\r a -> recache $ set r a)
field1 :: Lens' Record A
field1 = cachingLens _field1 (\r x -> r { _field1 = x })
field2 :: Lens' Record B
field2 = cachingLens _field2 (\r x -> r { _field2 = x })
field3 :: Lens' Record C
field3 = cachingLens _field3 (\r x -> r { _field3 = x })
cachedView :: Getter Record D
cachedView = to _cachedView
This solves all the problems! (1) We never have to remember to update the cached field; using a lens to modify the value of another field will automatically cause the cached view to be recomputed as well. (3) We can’t accidentally set the cached field, since it only has a Getter
, not a Lens
. In fact, this even solves (2), the problem of having to update the rest of our codebase: if we are already using lens
to access fields in the record (as I was), then the rest of the codebase doesn’t have to change at all! And if we aren’t using lens
already, then the typechecker will infallibly guide us to all the places we have to fix; once our code typechecks again, we know we have caught every single access to the record in the codebase.
Variant for only a few fields
What if we have a large record, and the cached summary value only depends on a few of the fields? In that case, we can save a bit of work for ourselves by getting lens
to auto-generate lenses for the other fields, and only handcraft lenses for the fields that are actually involved. Like this:
{-# LANGUAGE TemplateHaskell #-}
data Record = Record
{ _field1 :: A, _field2 :: B, _cachedView :: C, ... }
expensiveView :: A -> B -> C
expensiveView = ...
let exclude = ['_field1, '_field2, '_cachedView] in
makeLensesWith
(lensRules & lensField . mapped . mapped %~ \fn n ->
if n `elem` exclude then [] else fn n)
''Record
field1 :: Lens' Record A
field1 = ... similar to before ...
field2 :: Lens' Record B
field2 = ...
cachedView :: Getter Record C
cachedView = to _cachedView
But what about the lens laws?
You might worry that having a lens for one field automatically update the value of another field might break the lens laws somehow, but it’s perfectly legal, as we can check.
view l (set l v s) ≡ v
clearly holds: setting thecachedView
on the side doesn’t change the fact that we get back out whatever we put into, say,field1
.set l v' (set l v s) ≡ set l v' s
also clearly holds. On the left-hand side, the cached summary value will simply get overwritten in the same way that the other field does.set l (view l s) s ≡ s
is actually a bit more subtle. If we view the value offield1
, thenset
it with the same value again, how do we know the value of the overall records
doesn’t change? In particular, could we end up with a differentcachedView
even thoughfield1
is the same? But in fact, in this specific scenario (putting the same value back into a field that we just read), the value of thecachedView
won’t change. This depends on two facts: first, that theexpensiveView
is a deterministic function which always returns the same summary value for the same input record. Of course this is guaranteed by the fact that it’s a pure function. Second, we must maintain the invariant that thecachedView
is always up-to-date, so that recomputing the summary value after setting a field to the same value it already had will simply produce the same summary value again, because we know the summary value was correct to begin with. And of course, maintaining this invariant is the whole point; it’s guaranteed by the way we only export the lenses (and only aGetter
for thecachedView
) and not the record constructor.
And that’s it! I’ve been using this approach very successfully in a current project (the same project that got me to implement Hindley-Milner with unification-fd
—watch this space for an announcement soon!). If you know of similar approaches that have been written about elsewhere, or if you end up using this technique in your own project, I’d love to hear about it.
Nice. I had all the ingredients in my head and yet they did not come together to see this. Someone else had mentioned a link between lens and ornaments, and I had not seen the link… until now.
Nice! This is worth remembering.
If you are worried about large existing codebases, switching everything to use lenses might be a big hammer…
I think you can achieve the original goal also with pattern synonyms, in an “even more transparent way”, can you?
Ok, you write “If you don’t want to use lens, you can stop reading now”, that’s fair :-)
Sure, switching everything to use lenses would indeed be a big hammer. In my case I was *already* using lenses for other reasons, so the cost was very low.
I am curious though, how would you achieve this using pattern synonyms? I don’t see how it would work well in the case where the view only depends on a few of the record’s fields, and you don’t want to recompute the cached value every time anything at all changes, only when something it depends on changes. It also seems like you would have to give up the ability to easily modify a specific field, if you had to write everything in terms of decomposing the entire record by pattern-matching on it and then reassembling. But maybe I’m just failing to imagine what you have in mind.
You are right about the selective cache update!
I was assuming that one can define an explicitly bidirectional record pattern synonym that allows record update syntax as before, making the whole change _completely_ invisible to users. But reading https://ghc.gitlab.haskell.org/ghc/doc/users_guide/exts/pattern_synonyms.html it’s not clear if that would work. I should try it out, but not today :-)
As a complete newcomer to Haskell, is laziness what makes this cache viable? What I mean by this: is it correct that _cachedView is not actually evaluated (=is an unevaluated thunk?) unless it is used somewhere else? Otherwise, I’d expect that updating it whenever a dependent field is updated, would result in a lot of unnecessary evaluations and it would make updating dependent fields too expensive?
If this is the case, why do you put such a big emphasis on recomputing cached values when it comes to the pattern synonyms approach (whatever that is)?
Thanks for helping me to understand!
Hey there, good questions, thanks. Laziness does help; in particular, if you do many updates without consulting the cached view, those intermediate cached values will never actually be computed (you are correct about that). However, I don’t know if I’d say that is what makes it viable. In general, having a field to store a cached view is probably only going to be helpful if you tend to do updates infrequently and consult the cached view often (in which case the laziness isn’t buying you much anyway). But I guess you could say that with laziness, you are much more free to add a cached field “speculatively”, even if you’re not sure about the access pattern. If you do end up doing updates a lot as compared to consulting the cached value, thanks to laziness you are no worse off than if you had just computed the view directly every time without caching it.
The problem with the pattern synonyms approach is that it would recompute the cached value even when a field changes which the cached value does not depend on. Now, if you never actually look at the cached value, it won’t make a difference since it won’t be computed due to laziness. However, consider a scenario in which you alternately update a field which the cached value shouldn’t depend on, then consult the cached value. If the cached value isn’t updated at all, then consulting it every time will be very fast. However, if it is updated, then it must actually be recomputed every single time, even though the ultimate value will be the same. Laziness doesn’t help you when you replace a value by an unevaluated expression which just so happens to eventually have the same value.
As for what “the pattern synonyms approach” would be, GHC provides “pattern synonyms” which allow you to define custom construction and pattern-matching methods for a data type that don’t directly correspond to constructors. For example, if you define `pattern Singleton x = [x]` then you can build a singleton list with `Singleton 3` or use it to pattern match on a list, as in `f (Singleton x) = …` You can also get much fancier than that and have them do arbitrary computation. So the idea here would be to make a custom pattern synonym which automatically fills in the cached value every time you use it to construct a record.
Thanks a lot! This also makes clear, how this approach is superior to a more invasive approach like composing the record and the view into a new record `RecordWithView`. This seems to be a valuable tool, in particular when refactoring for performance reasons, but I can imagine it would be easy to overuse it, so I’ll be cautious. 🙂
One more question: this post does not provide a solution to atomically enforce the invariant that the _cachedView is correct and up-to-date after construction? So typically you would insert a dummy value for _cachedView and then call `recache`? Or use a custom function `A -> B -> C -> Record` for construction? Or even those pattern synonyms I still need to read up?
It does actually provide a solution: you do not export the record constructor or any of its fields from the module. The only things you export are the name of the record, the lenses, and some function(s) for constructing record values initially (which of course must call `recache`). This way, you control the only means of constructing and updating the record, and as long as you ensure that all the lenses and construction functions preserve the cachedView invariant, then other code outside the module cannot break the invariant, because those are the only ways to construct or modify record values.