clojure tools

» fine made & hand crafted

The Rule of Three

As Joshua Bloch says in his talk „API Design and why it matters“:

Write multiple plug-ins before release!

If you write one, it probably won't support another.

If you write two, it will support more with difficulty.

If you write three, it will work fine.

In a recent post of mine on the Clojure group I encountered exactly this rule. So let's explore it using the example of clojure.core/memoize.

Note: You might also be interested in the follow-up.

`memoize`

There is a function in clojure.core which is called memoize. You provide it another function f and memoize will give you a function, which calls f with its arguments and caches the result. Calling the function again with some previous set of arguments will not call f again , but will return the cached result.

Caching Strategies

However, memoize uses a quite naive strategy to cache values: once a value is in the cache, it will be there forever. This can lead to some serious memory usage depending the arguments and the return values of f.

One way to remedy this it to provide different strategies as is done eg. here for a time-to-live strategy or in the thread where I posted my above mentioned message. However one thing both approaches have in common, is that they have to define their own version of memoize. So I thought about how memoize could be made more general.

Taking out the Strategy

So first let us make the strategy an option in a backward compatible way.

(defn memoize
  ([f] (memoize f (fn [c _] c)))
  ([f strategy]
   (let [cache (ref {})]
     (fn [& args]
       (if-let [e (find @cache args)]
         (val e)
         (dosync
           (if-let [e (find @cache args)]
             (val e)
             (let [result (apply f args)]
               (alter cache assoc args result)
               (alter cache strategy args)
               result))))))))

This implements the naive strategy. The strategy is now a separate function which acts on the cache and the function arguments. In the default cache we simply return the original cache, so we get the naive behaviour of the current memoize as it is in clojure.core.

Adding our second Strategy: first-in-first-out

Now let's add the bound FIFO strategy from the above mentioned thread.

(defn fifo-cache
  [bound]
  (let [values (ref clojure.lang.PersistentQueue/EMPTY)]
    (fn [cache args]
      (alter values conj args)
      (if (> (count @values) bound)
        (let [k (peek @values)]
          (alter values pop)
          (dissoc cache k))
        cache))))

We store all encountered argument lists in a queue. Whenever we hit the given limit we remove items in FIFO order from the cache. So the cache will never grow beyond the given limit. To use this strategy we can now simply pass it to the new memoize.

(memoize f (fifo-cache 5))

But before we are jumping too high in our joy, let's remember the title of our post: „The Rule of Three“. So let's add another strategy.

The third Strategy: least-recently-used

Here we keep track of the items, which were requested recently. If we hit the given limit the longest unused items are removed from the cache. But wait. How can we implement this? We need to store the time when an item was last requested. But this is currently not possible. To implement such a strategy we need another hook which also keeps track of the read accesses to the cache. So let's modify our already modified memoize and then implement the new strategy.

(defn memoize
  ([f] (memoize f [find (fn [c _] c)]))
  ([f [cache-lookup cache-update]]
   (let [cache (ref {})]
     (fn [& args]
       (if-let [e (cache-lookup @cache args)]
         (val e)
         (dosync
           (if-let [e (cache-lookup @cache args)]
             (val e)
             (let [result (apply f args)]
               (alter cache assoc args result)
               (alter cache cache-update args)
               result))))))))

(defn lru-cache
  [bound]
  (let [values (ref {})]
    [(fn lru-cache-lookup
       [cache args]
       (when-let [e (find cache args)]
         (dosync (alter values assoc args (System/currentTimeMillis)))
         e))
     (fn lru-cache-update
       [cache args]
       (alter values assoc args (System/currentTimeMillis))
       (if (> (count @values) bound)
         (let [k (apply min-key @values (keys @values))]
           (alter values dissoc k)
           (dissoc cache k))
         cache))]))

; Usage
(memoize f (lru-cache 5))

Of course the fifo-cache must be altered also to fit the new interface. In this case we can just use clojure.core/find as lookup function.

Summary

Ok. Let's summarise the whole thing and implement some more example strategies:

first-in-first-out
least-recently-used
time-to-live
least-used

So here we go:

(defn memoize
  "Returns a memoized version of a referentially transparent function.
  The memoized version of the function keeps a cache of the mapping from
  arguments to results and, when calls with the same arguments are repeated
  often, has higher performance at the expense of higher memory use.

  Optionally takes a cache strategy consisting of a lookup function and
  a cache update function. The lookup function should return the map entry
  from the cache or nil of the argument list is not registered in the cache
  (cf. clojure.core/find)."
  ([f] (memoize f [find (fn [c _] c)]))
  ([f [cache-lookup cache-update]]
   (let [cache (ref {})]
     (fn [& args]
       (if-let [e (cache-lookup @cache args)]
         (val e)
         (dosync
           (if-let [e (cache-lookup @cache args)]
             (val e)
             (let [result (apply f args)]
               (alter cache assoc args result)
               (alter cache cache-update args)
               result))))))))

(defn fifo-cache-strategy
  "Implements a first-in-first-out cache strategy. When the given limit
  is reached, items from the cache will be dropped in insertion order."
  [limit]
  [find
   (let [values (ref clojure.lang.PersistentQueue/EMPTY)]
     (fn fifo-cache-update
       [cache args]
       (dosync
         (alter values conj args)
         (if (> (count @values) limit)
           (let [k (peek @values)]
             (alter values pop)
             (dissoc cache k))
           cache))))])

(defn lru-cache-strategy
  "Implements a LRU cache strategy, which drops the least recently used
  argument lists from the cache. If the given limit of items in the cache
  is reached, the longest unaccessed item is removed from the cache. In
  case there is a tie, the removal order is unspecified."
  [limit]
  (let [values (ref {})]
    [(fn lru-cache-lookup
       [cache args]
       (when-let [e (find cache args)]
         (dosync (alter values assoc args (System/currentTimeMillis)))
         e))
     (fn lru-cache-update
       [cache args]
       (dosync
         (alter values assoc args (System/currentTimeMillis))
         (if (> (count @values) limit)
           (let [k (apply min-key @values (keys @values))]
             (alter values dissoc k)
             (dissoc cache k))
           cache)))]))

(defn ttl-cache-strategy
  "Implements a time-to-live cache strategy. Upon access to the cache
  all expired items will be removed. The time to live is defined by
  the given expiry time span. Items will only be removed when the
  function is called. No background activity is done."
  [ttl]
  (let [values (ref {})]
    [(fn ttl-cache-lookup
       [cache args]
       (dosync
         (let [now   (System/currentTimeMillis)
               ks    (map key (filter #(> (- now (val %)) ttl) @values))
               cache (apply dissoc cache ks)]
           (alter values #(apply dissoc % ks))
           (find cache args))))
     (fn ttl-cache-update
       [cache args]
       (dosync (alter values assoc args (System/currentTimeMillis)))
       cache)]))

(defn lu-cache-strategy
  "Implements a least-used cache strategy. Upon access to the cache
  it will be tracked which items are requested. If the cache size reaches
  the given limit, items with the lowest usage count will be removed. In
  case of ties the removal order is unspecified."
  [limit]
  (let [values (ref {})]
    [(fn lu-cache-lookup
       [cache args]
       (when-let [e (find cache args)]
         (dosync (alter values update-in [args] inc))
         e))
     (fn lu-cache-update
       [cache args]
       (dosync
         (alter values assoc args 1)
         (if (> (count @values) limit)
           (let [k (apply min-key @values (keys @values))]
             (alter values dissoc k)
             (dissoc cache k))
           cache)))]))