Beware the unnamed seq

At the moment there is a discussion going on in the Google group where the different ways of telling whether a collection is empty are benchmarked. The bone of contention is the advice to use `seq` to test for (non-)emptiness. However, the discussion misses the point, at least to a certain degree.

When to use `seq`

An important question to consider is: when are you using `seq`? Usually you use it in sequence-handling functions. Called on a sequence (or a collection), it tells you whether there is “something” (a sequence with a first element and a rest) or “nothing” (nil).

In particular, `seq` has a return value, and in general `seq` is not the identity. So you should capture its return value, because it carries useful information.
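A quick REPL sketch of what `seq` returns. The var `v` is only for illustration, and results are shown as `;=>` comments:

```clojure
;; v is an illustrative vector, not part of any API.
(def v [1 2 3])

(seq v)                 ;=> (1 2 3), a new seq object over v
(identical? v (seq v))  ;=> false: seq is not the identity here
(seq [])                ;=> nil: "nothing"

;; Capture the result once and work with it.
(let [s (seq v)]
  (when s
    (first s)))         ;=> 1
```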

```clojure
(when-let [s (seq s)]
  (let [fst (first s)]
    ...))
```
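To see the pattern in a complete function, here is a minimal sketch of a single-collection `map`. The name `my-map` is hypothetical; `clojure.core/map` follows the same idea but additionally handles chunked sequences and several collections, which this sketch leaves out.

```clojure
(defn my-map
  "Hypothetical single-collection map, built on the when-let/seq pattern.
  Returns a lazy sequence."
  [f coll]
  (lazy-seq
    ;; Test for emptiness and keep the realised seq in one step.
    (when-let [s (seq coll)]
      ;; From here on, work with the captured seq s, not with coll.
      (cons (f (first s))
            (my-map f (rest s))))))

(my-map inc [1 2 3]) ;=> (2 3 4)
(my-map inc [])      ;=> ()
```

Because the body is wrapped in `lazy-seq`, the function also works on infinite input, e.g. `(take 2 (my-map identity (range)))`.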

This is a common pattern you will encounter when studying idiomatic Clojure code. With a few lines of code we achieve quite a bit:

  • We get the test for emptiness.
  • We get the return value of `seq`.

Uh. Scary. Why is the last point of interest? Because we likely have to call `seq` anyway to realise the sequence in the non-empty branch; `first` and `rest` call it internally. By using `seq` for the emptiness test and capturing its return value, we kill two birds with one stone. Calling `seq` on the result of a previous `seq` call is actually the identity, so it will be fast.
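The identity claim can be checked with `identical?`. This sketch assumes Clojure on the JVM, where the seqs produced below are `clojure.lang.ASeq` instances that `seq` hands back unchanged (an implementation detail, not a documented guarantee):

```clojure
;; Seq over a vector: a second seq call returns the very same object.
(let [s (seq [1 2 3])]
  (identical? s (seq s)))   ;=> true

;; Seq over a lazy sequence: the first seq call realises it,
;; the second one just returns the realised seq.
(let [s (seq (map inc [1 2 3]))]
  (identical? s (seq s)))   ;=> true
```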

*Any other way of determining emptiness and realising the sequence will likely be slower!*

When *not* to use `seq`

So seq is the way to test for emptiness, right?

Wrong! It's the way to go when you are in sequence world. However, when working with data structures, you are not in sequence world. Using `seq` then is expensive, because for a non-empty collection it allocates a new seq object, which is immediately thrown away. Wasteful.
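A small sketch that makes the allocation visible: `identical?` compares object identity, and each call to `seq` on a non-empty vector or map yields a fresh object. The vars `v` and `m` are illustrative only.

```clojure
(def v [1 2 3])
(def m {:a 1})

;; Each call builds a new seq object over the same data structure.
(identical? (seq v) (seq v))  ;=> false
(identical? (seq m) (seq m))  ;=> false
```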

Here the right approach is to use `zero?` together with `count`. You probably won't get faster than that in this scenario. (Note: all of Clojure's data structures are `Counted`, so `count` on them takes constant time!)
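A minimal sketch of the count-based test. The helper name `coll-empty?` is hypothetical; `counted?` is the core predicate for `clojure.lang.Counted`.

```clojure
(defn coll-empty?
  "Hypothetical helper: emptiness test for data structures via zero? and count."
  [coll]
  (zero? (count coll)))

(coll-empty? [])        ;=> true
(coll-empty? {:a 1})    ;=> false
(coll-empty? #{})       ;=> true

;; The built-in data structures implement clojure.lang.Counted ...
(map counted? [[1] {:a 1} #{1} '(1)])  ;=> (true true true true)
;; ... but a lazy sequence does not, so count would walk (and realise) it.
(counted? (map inc [1 2 3]))           ;=> false
```

The last line is why this test belongs to the data-structure world only: on a lazy sequence, `count` has to realise every element.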

Why not `empty?`

At the moment, choosing `empty?` would be rather unfortunate due to its implementation: `#(not (seq %))`. Its implementation should be `#(zero? (count %))`. Hopefully protocols will remedy this and put `empty?` where it belongs. The docstring of `empty?`, which recommends the idiom `(seq x)` rather than `(not (empty? x))`, should then be fixed to point out the right way as well.
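One way protocols could address this, sketched under stated assumptions: the protocol `Emptiness` and its function `empty-coll?` are hypothetical and not part of Clojure. `Counted` data structures answer via `count`; everything else falls back to `seq`, so lazy (possibly infinite) sequences are never walked completely.

```clojure
(defprotocol Emptiness
  "Hypothetical protocol: how a value answers the emptiness question."
  (empty-coll? [coll] "True if coll has no items."))

(extend-protocol Emptiness
  nil
  (empty-coll? [_] true)

  ;; Data structures: count is constant time, no seq is allocated.
  clojure.lang.Counted
  (empty-coll? [coll] (zero? (count coll)))

  ;; Everything else (lazy seqs, strings, Java collections, ...):
  ;; fall back to seq, which realises at most the first element or chunk.
  Object
  (empty-coll? [coll] (not (seq coll))))

(empty-coll? [])                 ;=> true
(empty-coll? {:a 1})             ;=> false
(empty-coll? (filter odd? [2 4])) ;=> true
```

Protocol dispatch prefers the interface implementation (`Counted`) over the `Object` fallback, so vectors, maps and sets take the `count` path.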

Upshot

We really have to distinguish the two different worlds in Clojure: sequences and data structures. Use the right tools where they are appropriate. Conflating the two worlds is likely to go wrong. Unfortunately, Clojure itself is not clean in this area.

So pay close attention to what you do, and consider a call to `seq` whose return value is not captured a code smell!
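To illustrate the smell, here are two hypothetical summing loops (`sum-smelly` and `sum-captured` are illustrative names, not library functions). Both return the same result; the difference is whether the value returned by `seq` is kept.

```clojure
;; Smell: seq's result is used only as a truth value and thrown away.
;; On a data structure such as a vector, first and rest then each call
;; seq on it again internally.
(defn sum-smelly [coll]
  (loop [coll coll, acc 0]
    (if (seq coll)
      (recur (rest coll) (+ acc (first coll)))
      acc)))

;; Captured: seq is called once up front; next returns either a seq or nil,
;; so s doubles as the emptiness test on every iteration.
(defn sum-captured [coll]
  (loop [s (seq coll), acc 0]
    (if s
      (recur (next s) (+ acc (first s)))
      acc)))

(sum-smelly [1 2 3])    ;=> 6
(sum-captured [1 2 3])  ;=> 6
(sum-captured [])       ;=> 0
```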

Published by Meikel Brandmeyer on .