Paul Khuong mostly on Lisp

Sun, 06 Apr 2008

On Magic Pixie Dust

(Thank you xkcd)

Common Lisp isn't one of them scripting languages in which lists have O(1) random dereferencing or arrays O(1) insertion at both ends, but it's still possible to forget about invisible costs.

Still on the topic of random shuffling, Leslie Polzer lists three alternatives, and suggests another, one that feels like a "Good Solution". Unfortunately, the only one not to work is the latter.

It seems to me the best solution is the one that looks the worst to some. Fisher-Yates is a nice, simple shuffling algorithm. It could have been cuter, but I don't see what's wrong with imperative code.

(defun fisher-yates (sequence)
  (let ((vector (coerce sequence 'vector)))
    (loop for i downfrom (1- (length vector)) to 0
          do (rotatef (aref vector i)
                      (aref vector (random (1+ i)))))
    (replace sequence vector)))

The other sort-based solutions also work, except the final cleverer one that uses memoisation. It looks good. Unfortunately, it's not thread-safe and, contrary to the advertisement, it conses (probably more so than the other implementations). Worse, it doesn't work.

It's obviously not thread-safe, since each call to seqrnd reinitialises the global memoisation table. A better solution would use dynamic scoping instead of assignment. If the code isn't meant to be used in a threaded environment, then CLRHASH would probably be a better idea.

Generic memoisation is often based on a hash table. Such a table must save the key-value association somewhere. It needs at least two words for the key and the value. Usually, the overhead is greater than that, for various implementation-level reasons. Compare that to the solutions that sort a sequence of conses: with a vector, the space overhead's usually a bit more than 3 words/value (4 for a list). Specialised data structures can often bring the cost down, by being lossy or using some attribute of the set of associations. Generic memoisation is a nice hack, but is often replaced by specialised code as development goes on.

The shuffling code also does not work. A memoisation table will associate random numbers to values, not (initial) positions. If the sequence to shuffle contains the same element several times, all the references/copies will be bundled sequentially! For example, the list (a a b b) can only be shuffled two ways: (a a b b) or (b b a a). There are also issues with numbers and characters, for which EQ is free to return NIL regardless of the arguments, but that's not exactly common.

(Mis)Using existing code is a very good idea, especially for quick hacks or boilerplate that's only in the way of the essence of a project. However, it's still important to be aware of what's going on underneath it all. Excessive use of opaque magic leads to beautiful broken code.

Extra! Extra! We can abuse CLISP's arbitrary precision floats to generate a lot of digits of Pi for us:

[1]> (setf (long-float-digits) 3500)
3500
[2]> (* 4 (atan 1L0))
3.14159265358979323846264338327950288419...

posted at: 17:50 | /Lisp | permalink

Contact me by email: pvk@pvk.ca.