### Constraint sets in SBCL: preliminary exploration

Part of what makes SBCL’s static analyses powerful is their flow-sensitivity: the system can express the fact that something is true only in one of the branches of a conditional. At the heart of that capability are sets of constraints that represent that knowledge; e.g. `(eql x y)` to assert that `x` and `y` hold the same value in a given branch.

Until September 2008, we implemented these sets as specialised hash tables. In commit 1.0.20.4, Richard M. Kreuter replaced the hash tables with bit vectors. On most workloads, the change was beneficial, if only because the compiler now spends less time in GC. However, in some cases (I would expect, with a large number of variables and conditional branches), users have reported severe performance regressions (e.g. `https://bugs.launchpad.net/sbcl/+bug/394206`), regarding not only compilation times, but also memory usage.

Exploring alternative representations for constraint sets (consets) looks like a good way to get started with SBCL: it’s interesting as a data-structure challenge, involves “real world” engineering trade-offs, and will have an immediately observable impact on the performance of the compiler. I’ve said as much before. Maybe this post will spur some interest.

As a first step, I instrumented the conset implementation to output some profiling information on each operation: the size of the constraint universe (approximately, the number of unique constraints in the compilation unit), and, for each conset argument, the size of the bit vector and the size of the set. I’m only using a random sample of ≈ 1% of the output from a full SBCL build.

Before graphing out histograms for the population and size of the consets, I counted the operations that were actually performed to see what was important:

| operation      | frequency (%) |
|----------------|---------------|
| `member`       | 53            |
| `do`           | 19            |
| `adjoin`       | 14            |
| `copy`         | 4.4           |
| `do-intersect` | 3.1           |
| `equal`        | 2.7           |
| `union`        | 1.4           |
| `empty`        | 0.98          |
| `intersection` | 0.88          |
| `difference`   | 0.38          |

All the operations should be obvious, except maybe `do`, which iterates over the constraints in the set, `do-intersect`, which iterates over the intersection of two sets, and `empty`, which tests for emptiness.

Of the 53% of calls to `member`, 38% come from `do-intersect`, and another 14% from `adjoin`; there are actually very few calls to `member` per se. A large number of the calls to `member` returned `T`: 29%. That’s unfortunate, since failure is much easier to fast-path. `do` is also slightly over-represented, since calls to `do-intersect` also count as calls to `do`.

### Graphs galore

The spikes are powers of 2, and the last non-zero value is 4096. In an email, Kreuter reported that the largest universe he’d seen was 8k elements. We don’t really have to worry about enormous universes, except to avoid disastrous blow-ups.

We can also see that the representation as a bit vector isn’t very dense. However, keep in mind that a bit is much smaller than a word. It takes very sparse sets for alternatives such as hash tables to save space.
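A quick back-of-the-envelope computation (my numbers, assuming a 64-bit host and an optimistic two words per hash-table entry) makes the point:

```python
WORD_BYTES = 8               # 64-bit host (assumption)
UNIVERSE = 4096              # largest universe in the sample

bitvector_bytes = UNIVERSE // 8   # one bit per constraint -> 512 bytes

# Optimistic hash-table cost: ~2 words per entry (key + bucket slot),
# ignoring load factor, headers and tombstones.
def table_bytes(population):
    return 2 * WORD_BYTES * population

# Break-even population: below this, the table is smaller.
break_even = bitvector_bytes // (2 * WORD_BYTES)   # 32 elements
```

So, for a 4096-constraint universe, the hash table only saves space below roughly 32 elements, and real tables carry more overhead than assumed here.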

The very long tails here mean that, while there are some fairly large consets, the vast majority of the operations only touch sets that have fewer than a half-dozen elements.

That’s also true for calls to `member`, but even more so for `adjoin` and `empty`.

The profile for binary operations (`union`, `intersection`, `difference` and `equal`) is very similar to that of `adjoin`. In detail, we have:

`equal` only shows a heavier tail. `intersection` and `difference`, however, both peak around 20 elements, instead of 1 to 3. Finally, we can see that `union` is pretty much a non-issue.

It’s also interesting to see the difference between the size of the largest and smallest argument for binary operations.

The heavier tails on binary operations probably come from the large arguments, while the smaller arguments seem to follow the overall distribution.

Finally, iteration through the set’s elements (`do` or `do-intersect`) is interesting because the ratio between the set’s population and the bit vector’s size is a good indicator of that operation’s performance. For `do-intersect`, only the set that’s iterated over is counted (not the one on which `member` tests are then executed).

### Making sense of the data

I’m not sure what information we can extract from this data, and especially not how to extrapolate to the cases reported by users who experience serious regressions.

One obvious way to speed `do` up is to work with vectors of fixnums or machine words, and do the rest ourselves. That would help with long spans of 0s, and would simplify the inner loop. We could do something similar for `empty` on implementations that don’t special-case `FIND` on bit vectors (e.g. SBCL).
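Sketching that idea in Python, with machine words simulated as 64-bit integers (the function names are mine; the real thing would read the bit vector a word at a time):

```python
def do_bits(words, word_bits=64):
    """Iterate over the indices of set bits, skipping all-zero words cheaply."""
    for w, word in enumerate(words):
        if word == 0:                 # a long span of 0s costs one test per word
            continue
        base = w * word_bits
        while word:
            low = word & -word        # isolate the lowest set bit
            yield base + low.bit_length() - 1
            word ^= low               # clear it

def empty(words):
    """Emptiness test without FIND on a bit vector: just scan for a non-zero word."""
    return all(word == 0 for word in words)
```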

This does nothing to reduce memory usage, though. For really sparse sets, we probably want to switch to another representation, such as a hash table or a (sorted) vector. But if we do that, we run the risk of switching to a sparse implementation only because we have constraints with very high and very low indices, but nothing in the middle. A fat and shallow (two-level?) tree would help avoid that situation, and would let us choose locally between a few representations.
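One way to picture the two-level idea (entirely hypothetical; nothing like this exists in SBCL): split the universe into fixed-size chunks, and give each chunk its own representation, so that a set with only very high and very low indices stays dense where it matters and costs nothing in between.

```python
CHUNK = 256  # constraints per chunk (arbitrary choice)

class HybridConset:
    """Top-level array of chunks; an absent chunk is an empty range.
    Present chunks are small bit vectors here; a sparse chunk could
    instead hold a sorted list of indices (dense form shown for brevity)."""

    def __init__(self, universe_size):
        self.chunks = [None] * ((universe_size + CHUNK - 1) // CHUNK)

    def adjoin(self, i):
        c, off = divmod(i, CHUNK)
        if self.chunks[c] is None:            # allocate lazily
            self.chunks[c] = bytearray(CHUNK // 8)
        self.chunks[c][off // 8] |= 1 << (off % 8)

    def member(self, i):
        c, off = divmod(i, CHUNK)
        chunk = self.chunks[c]
        return chunk is not None and bool(chunk[off // 8] & (1 << (off % 8)))
```

With a 4096-constraint universe, a set holding only constraints 1 and 4000 allocates two 32-byte chunks instead of a 512-byte vector, and iteration can skip the empty chunks wholesale.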