As Joshua Bloch says in his talk „API Design and why it matters“:
Write multiple plug-ins before release!
- If you write one, it probably won't support another.
- If you write two, it will support more with difficulty.
- If you write three, it will work fine.
In a recent post of mine on the Clojure group I encountered
exactly this rule. So let's explore it using the example of
clojure.core/memoize
.
Note: You might also be interested in the follow-up.
There is a function in clojure.core
which is called memoize
.
You provide it another function f
and memoize
will give you a
function, which calls f
with its arguments and caches the result.
Calling the function again with some previous set of arguments will not
call f
again , but will return the cached result.
However, memoize
uses a quite naive strategy to cache values: once a
value is in the cache, it will be there forever. This can lead to some
serious memory usage depending the arguments and the return values of
f
.
One way to remedy this it to provide different strategies as is done eg.
here for a time-to-live strategy
or in the thread
where I posted my above mentioned message. However one thing both
approaches have in common, is that they have to define their own version
of memoize
. So I thought about how memoize
could be made more
general.
So first let us make the strategy an option in a backward compatible way.
(defn memoize
([f] (memoize f (fn [c _] c)))
([f strategy]
(let [cache (ref {})]
(fn [& args]
(if-let [e (find @cache args)]
(val e)
(dosync
(if-let [e (find @cache args)]
(val e)
(let [result (apply f args)]
(alter cache assoc args result)
(alter cache strategy args)
result))))))))
This implements the naive strategy. The strategy is now a separate
function which acts on the cache and the function arguments. In the
default cache we simply return the original cache, so we get the naive
behaviour of the current memoize
as it is in clojure.core
.
Now let's add the bound FIFO strategy from the above mentioned thread.
(defn fifo-cache
[bound]
(let [values (ref clojure.lang.PersistentQueue/EMPTY)]
(fn [cache args]
(alter values conj args)
(if (> (count @values) bound)
(let [k (peek @values)]
(alter values pop)
(dissoc cache k))
cache))))
We store all encountered argument lists in a queue. Whenever we hit the
given limit we remove items in FIFO order from the cache. So the cache
will never grow beyond the given limit. To use this strategy we can now
simply pass it to the new memoize
.
(memoize f (fifo-cache 5))
But before we are jumping too high in our joy, let's remember the title of our post: „The Rule of Three“. So let's add another strategy.
Here we keep track of the items, which were requested recently. If we
hit the given limit the longest unused items are removed from the cache.
But wait. How can we implement this? We need to store the time when an
item was last requested. But this is currently not possible. To implement
such a strategy we need another hook which also keeps track of the read
accesses to the cache. So let's modify our already modified memoize
and then implement the new strategy.
(defn memoize
([f] (memoize f [find (fn [c _] c)]))
([f [cache-lookup cache-update]]
(let [cache (ref {})]
(fn [& args]
(if-let [e (cache-lookup @cache args)]
(val e)
(dosync
(if-let [e (cache-lookup @cache args)]
(val e)
(let [result (apply f args)]
(alter cache assoc args result)
(alter cache cache-update args)
result))))))))
(defn lru-cache
[bound]
(let [values (ref {})]
[(fn lru-cache-lookup
[cache args]
(when-let [e (find cache args)]
(dosync (alter values assoc args (System/currentTimeMillis)))
e))
(fn lru-cache-update
[cache args]
(alter values assoc args (System/currentTimeMillis))
(if (> (count @values) bound)
(let [k (apply min-key @values (keys @values))]
(alter values dissoc k)
(dissoc cache k))
cache))]))
; Usage
(memoize f (lru-cache 5))
Of course the fifo-cache
must be altered also to fit the new interface.
In this case we can just use clojure.core/find
as lookup function.
Ok. Let's summarise the whole thing and implement some more example strategies:
So here we go:
(defn memoize
"Returns a memoized version of a referentially transparent function.
The memoized version of the function keeps a cache of the mapping from
arguments to results and, when calls with the same arguments are repeated
often, has higher performance at the expense of higher memory use.
Optionally takes a cache strategy consisting of a lookup function and
a cache update function. The lookup function should return the map entry
from the cache or nil of the argument list is not registered in the cache
(cf. clojure.core/find)."
([f] (memoize f [find (fn [c _] c)]))
([f [cache-lookup cache-update]]
(let [cache (ref {})]
(fn [& args]
(if-let [e (cache-lookup @cache args)]
(val e)
(dosync
(if-let [e (cache-lookup @cache args)]
(val e)
(let [result (apply f args)]
(alter cache assoc args result)
(alter cache cache-update args)
result))))))))
(defn fifo-cache-strategy
"Implements a first-in-first-out cache strategy. When the given limit
is reached, items from the cache will be dropped in insertion order."
[limit]
[find
(let [values (ref clojure.lang.PersistentQueue/EMPTY)]
(fn fifo-cache-update
[cache args]
(dosync
(alter values conj args)
(if (> (count @values) limit)
(let [k (peek @values)]
(alter values pop)
(dissoc cache k))
cache))))])
(defn lru-cache-strategy
"Implements a LRU cache strategy, which drops the least recently used
argument lists from the cache. If the given limit of items in the cache
is reached, the longest unaccessed item is removed from the cache. In
case there is a tie, the removal order is unspecified."
[limit]
(let [values (ref {})]
[(fn lru-cache-lookup
[cache args]
(when-let [e (find cache args)]
(dosync (alter values assoc args (System/currentTimeMillis)))
e))
(fn lru-cache-update
[cache args]
(dosync
(alter values assoc args (System/currentTimeMillis))
(if (> (count @values) limit)
(let [k (apply min-key @values (keys @values))]
(alter values dissoc k)
(dissoc cache k))
cache)))]))
(defn ttl-cache-strategy
"Implements a time-to-live cache strategy. Upon access to the cache
all expired items will be removed. The time to live is defined by
the given expiry time span. Items will only be removed when the
function is called. No background activity is done."
[ttl]
(let [values (ref {})]
[(fn ttl-cache-lookup
[cache args]
(dosync
(let [now (System/currentTimeMillis)
ks (map key (filter #(> (- now (val %)) ttl) @values))
cache (apply dissoc cache ks)]
(alter values #(apply dissoc % ks))
(find cache args))))
(fn ttl-cache-update
[cache args]
(dosync (alter values assoc args (System/currentTimeMillis)))
cache)]))
(defn lu-cache-strategy
"Implements a least-used cache strategy. Upon access to the cache
it will be tracked which items are requested. If the cache size reaches
the given limit, items with the lowest usage count will be removed. In
case of ties the removal order is unspecified."
[limit]
(let [values (ref {})]
[(fn lu-cache-lookup
[cache args]
(when-let [e (find cache args)]
(dosync (alter values update-in [args] inc))
e))
(fn lu-cache-update
[cache args]
(dosync
(alter values assoc args 1)
(if (> (count @values) limit)
(let [k (apply min-key @values (keys @values))]
(alter values dissoc k)
(dissoc cache k))
cache)))]))
Comments and feedback appreciated.
Published by Meikel Brandmeyer on .
I'm a long-time Clojure user and the developer of several open source projects mostly involving Clojure. I try to actively contribute to the Clojure community.
My most active projects are at the moment VimClojure, Clojuresque and ClojureCheck.
Copyright © 2009-2014 All Right Reserved. Meikel Brandmeyer