clojure tools

» fine made & hand crafted

Taming the Bound Seq

On the Clojure group every once in a while someone posts a question concerning the interaction of binding and lazy sequences. Let's have a short look at what the problem is, why it arises and how a solution can be implemented.

The Problem

People happily create lazy sequences depending on dynamic variables.

;; Since Clojure 1.3 a Var must be marked ^:dynamic to be rebound with binding.
(def ^:dynamic x 1)

(defn make-seq
  [n]
  (lazy-seq
    (cons (+ x n) (make-seq (inc n)))))

Now they create an instance of the sequence in the scope of a binding call.

(def s (binding [x 100] (make-seq 5)))

Question: What is the result of (first s)? Right. 6. Huh? Yes. Quite a few Clojure newcomers have been puzzled when they first saw this phenomenon.

So, why is that so?

The Root Cause

binding changes the value a Var is bound to for the code executed inside its scope. The + call, where x shows up, is textually inside the scope of the binding. So why is it not affected? Because it is not executed inside the scope of the binding. The lazy-seq defers the execution until first retrieves the first item. Only then does the + call actually happen. But by then x has its old value again. When the first call is inside the binding, everything is fine.

(let [s (binding [x 100]
          (let [s (make-seq 5)]
            (println (first s))
            s))]
  (println (first s))
  (println (second s)))

Question: What is printed? Right. 105, 105, and 7. Huh? The first first realises the first item of the sequence. Since the call happens inside the binding, the binding is still in effect. So we print 105. The second first happens outside the binding, but the previous value was cached. So we get 105 again. Last but not least, the second call realises the second item of the sequence, but since the binding is now gone, we get the original value of x and 7 is printed.
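The deferred evaluation can also be observed directly. The following is a minimal sketch, assuming Clojure 1.3 or later (for ^:dynamic and realized?); it only shows that nothing is computed while the binding is in effect.

```clojure
(def ^:dynamic x 1)

(defn make-seq
  [n]
  (lazy-seq
    (cons (+ x n) (make-seq (inc n)))))

;; Creating the sequence inside the binding computes nothing yet.
(def s (binding [x 100] (make-seq 5)))

(realized? s)
;; => false

;; The first item is computed here, outside the binding, so x is 1.
(first s)
;; => 6

(realized? s)
;; => true
```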

How can the problem be solved?

Possible Remedies

As we saw above, we can exploit the caching of the lazy sequence! The simplest thing is to wrap the sequence in a doall before we pass it out of the binding. However, this is not practical if the sequence is very large, and it does not terminate for an infinite sequence such as the one produced by make-seq.
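A minimal sketch of the doall approach, assuming Clojure 1.3 or later. Since make-seq is infinite, the sequence is cut down with take before it is forced; the name realised is only chosen for this illustration.

```clojure
(def ^:dynamic x 1)

(defn make-seq
  [n]
  (lazy-seq
    (cons (+ x n) (make-seq (inc n)))))

;; Force the (finite) sequence while the binding is still in effect.
(def realised
  (binding [x 100]
    (doall (take 3 (make-seq 5)))))

;; All items were computed with x bound to 100 and are now cached.
realised
;; => (105 106 107)
```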

Then we have to deploy bigger guns and dive a little into the thread bindings interface of Clojure. The solution is to install the required bindings every time we realise an item of the sequence. To do this, we first capture the required bindings with get-thread-bindings.

(defn make-bound-seq
  [n]
  (let [bindings (get-thread-bindings)
        step     (fn step [n]
                   (lazy-seq
                     (push-thread-bindings bindings)
                     (try
                       (cons (+ x n) (step (inc n)))
                       (finally
                         (pop-thread-bindings)))))]
    (step n)))

Always, always, always, ALWAYS follow this style! First a push-thread-bindings, then a try with your code and a finally with a pop-thread-bindings. This ensures that every push-thread-bindings is complemented by a pop-thread-bindings, even if an Exception is thrown. Failing to do so will mess up your bindings completely, leaving the running program in a broken state.
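The capture and re-install steps can be tried in isolation. The sketch below assumes Clojure 1.3 or later; captured and call-with-captured are hypothetical names introduced only for this illustration.

```clojure
(def ^:dynamic x 1)

;; Capture the thread bindings while x is bound to 100.
;; The result is a map from Vars to their current values.
(def captured
  (binding [x 100]
    (get-thread-bindings)))

(get captured #'x)
;; => 100

;; Hypothetical helper: re-install the captured bindings around a call,
;; following the push / try / finally / pop style.
(defn call-with-captured
  [f]
  (push-thread-bindings captured)
  (try
    (f)
    (finally
      (pop-thread-bindings))))

(call-with-captured (fn [] x))
;; => 100

;; Outside the helper the root value is visible again.
x
;; => 1
```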

Update: As suggested by Chouser this can be simplified with with-bindings. Together with the very good comment by Graham Fawcett we can actually define a helper, which turns any input sequence into a bound sequence.

(defn bound-seq*
  [bind-map inner-seq]
  (lazy-seq
    (with-bindings bind-map
      (when-let [s (seq inner-seq)]
        (cons (first s) (bound-seq* bind-map (rest s)))))))

(defmacro bound-seq
  ([inner-seq]
   `(bound-seq* (get-thread-bindings) ~inner-seq))
  ([bind-map inner-seq]
   `(bound-seq* (hash-map ~@(mapcat (fn [[k v]] [`(var ~k) v]) bind-map))
                ~inner-seq)))

Now we can take our original sequence and easily turn it into a bound sequence.

(def bs (bound-seq {x 100} (make-seq 5)))
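To check the behaviour end to end, here is a sketch assuming Clojure 1.3 or later. The definitions are repeated so that the snippet runs on its own.

```clojure
(def ^:dynamic x 1)

(defn make-seq
  [n]
  (lazy-seq
    (cons (+ x n) (make-seq (inc n)))))

;; Definitions repeated from above so that the snippet is self-contained.
(defn bound-seq*
  [bind-map inner-seq]
  (lazy-seq
    (with-bindings bind-map
      (when-let [s (seq inner-seq)]
        (cons (first s) (bound-seq* bind-map (rest s)))))))

(defmacro bound-seq
  ([inner-seq]
   `(bound-seq* (get-thread-bindings) ~inner-seq))
  ([bind-map inner-seq]
   `(bound-seq* (hash-map ~@(mapcat (fn [[k v]] [`(var ~k) v]) bind-map))
                ~inner-seq)))

(def bs (bound-seq {x 100} (make-seq 5)))

;; Every item is realised with x bound to 100, although no binding
;; is in effect at the point of consumption.
(take 3 bs)
;; => (105 106 107)

x
;; => 1
```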

Upshot

One has to understand the interactions between dynamic variables and laziness. As with everything else in life, you have to understand first. And then you have to do the Right Thing. The above seems noisy. But as Rich once stated: running into a lot of such trouble is a sign that you are misusing dynamic variables. Use them wisely.

Post Scriptum

What was said above also applies to new threads. Bindings of dynamic variables are thread-local. It is the "Good Kirk - Bad Kirk" scenario of multithreading. So when you start a new thread by virtue of a function, the function does not have access to the bindings of the old thread. This can be easily remedied by using the bound-fn* helper. Simply wrap the function in a bound-fn* and you'll be fine. Or define your function directly with bound-fn.
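A small sketch of the difference, assuming Clojure 1.3 or later. It uses a raw java.lang.Thread on purpose, since future conveys bindings by itself in current Clojure versions. run-in-thread is a hypothetical helper introduced only for this illustration.

```clojure
(def ^:dynamic x 1)

;; Hypothetical helper: run f on a fresh thread and return its result.
(defn run-in-thread
  [f]
  (let [result (promise)
        t      (Thread. ^Runnable (fn [] (deliver result (f))))]
    (.start t)
    (.join t)
    @result))

(def results
  (binding [x 100]
    [(run-in-thread (fn [] x))              ; plain fn: sees the root value
     (run-in-thread (bound-fn [] x))        ; bound-fn: carries the binding
     (run-in-thread (bound-fn* (fn [] x)))])) ; bound-fn*: wraps an existing fn

results
;; => [1 100 100]
```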

Published by Meikel Brandmeyer on 23 November 2009, 22:36.