The promise of transducers

Transducers promise to separate the element transformation from the actual traversal of the input data. This allows the very same transducer pipeline to be used in reduce (resp. the transducer specific variant transduce) and at the same time apply the exact same transformation logic to asynchronous channels as provided by core.async.

Just as well you could create a traditional lazy sequence from the transducer. The promise is to provide the laziness we know from sequences, but save the cons cells created in each step of the transformation pipeline. This is more efficient and reduces the load on the garbage collector.

A very attractive promise indeed. But do transducers deliver on that promise?

An example

To illustrate this principle let's have a look at an example. Assume the following transducer:

(defn neighbourhood
  [n]
  (list (dec n) n (inc n)))

(def example-t
  (comp
    (mapcat neighbourhood)
    (map str)))

This is a very simple transducer pipeline. It specifies a transformation, which first turns a given natural number into a vector of its neighbours and itself. Since we use mapcat this list is flattened into the further pipeline steps, which – in this case – just plainly turns the numbers into strings.

To get a lazy sequence from this transducer we use the sequence function which was augmented to understand transducers.

user=> (sequence example-t (list 1 4 7))
("0" "1" "2" "3" "4" "5" "6" "7" "8")

This looks quite similar to our traditional approach using the sequence functions.

(defn example-s
  [coll]
  (->> coll
    (mapcat neighbourhood)
    (map str)))

This result is the same:

user=> (example-s (list 1 4 7))
("0" "1" "2" "3" "4" "5" "6" "7" "8")

However, the latter is tied to lazy sequences while we may use the transducer in various other scenarios.

user=> (transduce example-t str (list 1 4 7))
"012345678"
user=> (async/chan 0 example-t)
#<channel>

A counter example

As stated above this is a very tempting promise transducers give here. We get all the niceties of the functions we are used to plus some real improvements in terms of efficiency and reuse.

However, there is a little bit of a problem. The transducer version is not as lazy as the traditional sequence form:

user=> (first (example-s (list 1 4 7)))
">0<"
"0"
user=> (first (sequence example-t (list 1 4 7)))
">0<"
">1<"
">2<"
"0"

Note: I replaced the call to str in the map pipeline step with a different function which also prints the result in order to show the effect.

As you can see the sequence is realized further than actually needed. In fact the sequence is realized up to the length of the first return value in the mapcat step. To understand why this happens, we have to have look under the hood.

Here are – roughly – the implementations of the two mapcats.

(defn lazy-mapcat
  [f coll]
  (->> coll
    (map f)
    (apply concat)))

(defn transducer-mapcat
  [f]
  (comp
    (map f)
    (cat)))

This looks quite similar. And a quick confirmation in the REPL shows that the map is not the problem. So the culprit must lie in the difference between the lazy concat and the transducer cat.

Push vs. pull

The main difference between the concatenating flavours is, that the lazy concat is based on the pull principle while the transducer cat is based on the push principle.

cat is called with an input. It has no way to delay processing the input to a later point it time. So it has no choice but immediately walk the input and push its elements further downstream.

The lazy concat can do that however. It keeps the input in mind, but the work is only done, when specifically being asked for it. That way concat can also choose to process the input only step by step.

Upshot

When using sequence to apply a transducer pipeline to a collection the result may not be as lazy as it is when resorting to the traditional sequence functions.

Beware the edge cases!

Published by Meikel Brandmeyer on .