Paul Khuong mostly on Lisp

rss feed

Wed, 03 Sep 2008


SWAR implementation of (some #'zerop ...)

SWAR (SIMD Within A Register) codes can portably express short-vector parallelism for operations on small packed data (e.g. byte or even nybble vectors). A trivial application of the technique is when we test whether any bit in a word is set (equal to 1) by comparing the whole word against 0. Obviously, that also work to test whether any field (of arbitrary width) is not filled with 0. This document, from 1997, provides a fairly clear and complete overview.

Just like testing whether any bit is set, it is easy to find whether some bit is not set (it's simply the opposite of whether every bit in the word is set). Things are more complex when the data are wider than a single bit (but obviously narrower than a full word). I found a short implementation (and barely tested it), but it might be possible to do even shorter. Skip to the series of asterisks if you want to solve that puzzle (to efficiently find whether any field in a sequence of data, itself packed into a single word, is 0) yourself.

To simplify the description, I'll assume that we're working with 4-bit-wide fields. It should be clear how the code can be adapted to other widths or even mixed widths.

Let x = aaaabbbbccccdddd... be the input.

1. x' = x | (x >> 1). The first bit in each field is now, for our purposes, noise. However, some of the remaining 3 bits are now non-zero iff some the original 4 were.

2. ones = x' & ~0x8888.... The noise bits are masked out.

3. borrow = 0x8888... - ones. The first bit of each field in borrow is 0 iff some of the 3 other bits in ones aren't (iff some of the 4 bits in x weren't).

4. result = borrow & 0x8888... is zero iff the first bit of every field in borrow is 0 (iff every field in x was non-null).

And, finally, that is easy to test for, a word at a time. In the end, it takes 5 (hopelessly serial) operations (>>, |, &, - and &) and a conditional branch.

**** Testing whether any field in a word is filled with 0 may seem incredibly obscure. Apart from (some #'zerop [packed-unboxed-vector]), what use is there for such code sequences? One trick is to exploit xor. xor lets us compare multiple fields at a time: a field is 0 in a xor b iff the corresponding fields in a and b are identical (bit for bit). Now that we can determine when at least one pair of fields is equal, it's simple to implement, e.g., default FIND on specialised vectors without having to test each datum separately (until the very end, when we know that one of the pairs is equal, but not which). As usual, a full implementation, capable of dealing with :start, :end and displaced vectors is a lot more work.

posted at: 02:07 | /LowLevel | permalink

Made with PyBlosxom Contact me by email: