Transducers promise to separate the element transformation from the
actual traversal of the input data. This allows the very same transducer
pipeline to be used in
reduce (resp. the transducer specific
transduce) and at the same time apply the exact same
transformation logic to asynchronous channels as provided by
Just as well you could create a traditional lazy sequence from the transducer. The promise is to provide the laziness we know from sequences, but save the cons cells created in each step of the transformation pipeline. This is more efficient and reduces the load on the garbage collector.
A very attractive promise indeed. But do transducers deliver on that promise?
To illustrate this principle let's have a look at an example. Assume the following transducer:
(defn neighbourhood [n] (list (dec n) n (inc n))) (def example-t (comp (mapcat neighbourhood) (map str)))
This is a very simple transducer pipeline. It specifies a transformation,
which first turns a given natural number into a vector of its neighbours
and itself. Since we use
mapcat this list is flattened into
the further pipeline steps, which – in this case – just plainly turns the
numbers into strings.
To get a lazy sequence from this transducer we use the
sequence function which was augmented to understand
user=> (sequence example-t (list 1 4 7)) ("0" "1" "2" "3" "4" "5" "6" "7" "8")
This looks quite similar to our traditional approach using the sequence functions.
(defn example-s [coll] (->> coll (mapcat neighbourhood) (map str)))
This result is the same:
user=> (example-s (list 1 4 7)) ("0" "1" "2" "3" "4" "5" "6" "7" "8")
However, the latter is tied to lazy sequences while we may use the transducer in various other scenarios.
user=> (transduce example-t str (list 1 4 7)) "012345678" user=> (async/chan 0 example-t) #<channel>
As stated above this is a very tempting promise transducers give here. We get all the niceties of the functions we are used to plus some real improvements in terms of efficiency and reuse.
However, there is a little bit of a problem. The transducer version is not as lazy as the traditional sequence form:
user=> (first (example-s (list 1 4 7))) ">0<" "0" user=> (first (sequence example-t (list 1 4 7))) ">0<" ">1<" ">2<" "0"
Note: I replaced the call to
str in the
map pipeline step with a different function which also prints
the result in order to show the effect.
As you can see the sequence is realized further than actually needed. In
fact the sequence is realized up to the length of the first return value in
mapcat step. To understand why this happens, we have to
have look under the hood.
Here are – roughly – the implementations of the two mapcats.
(defn lazy-mapcat [f coll] (->> coll (map f) (apply concat))) (defn transducer-mapcat [f] (comp (map f) (cat)))
This looks quite similar. And a quick confirmation in the REPL shows
map is not the problem. So the culprit must lie in
the difference between the lazy
concat and the transducer
The main difference between the concatenating flavours is, that the lazy
concat is based on the pull principle while the transducer
cat is based on the push principle.
cat is called with an input. It has no way to delay
processing the input to a later point it time. So it has no choice but
immediately walk the input and push its elements further downstream.
concat can do that however. It keeps the input in
mind, but the work is only done, when specifically being asked for it. That
concat can also choose to process the input only step by
sequence to apply a transducer pipeline to a
collection the result may not be as lazy as it is when resorting to the
traditional sequence functions.
Beware the edge cases!
Published by Meikel Brandmeyer on .