Early conclusion: the buffer is a Good Thing.
Issues: The buffer works by loading the entire file into RAM and doing all of its manipulation there. However, the vast majority of our levels are in four-direction mode, which packs four sets of textures into a single source file, and we only need to read one of them at a time.
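To make the "only one of four" point concrete, here's a minimal sketch of a partial read, under a made-up layout where the file starts with a table of (offset, length) pairs, one per direction. The real file format is different; the point is just that a seek plus a bounded read pulls a quarter of the data instead of the whole file.

```python
import io
import struct

def read_one_direction(f, direction):
    """Read a single direction's texture set from a four-direction file.

    Hypothetical layout (not the real format): a header of four
    little-endian (offset, length) uint32 pairs, then the four blobs.
    """
    f.seek(direction * 8)
    offset, length = struct.unpack("<II", f.read(8))
    f.seek(offset)
    return f.read(length)  # only ~1/4 of the file is actually read

# Build a fake four-direction file in memory to demonstrate.
blobs = [bytes([i]) * 100 for i in range(4)]
header = b""
offset = 32  # 4 directions * 8 header bytes each
for b in blobs:
    header += struct.pack("<II", offset, len(b))
    offset += len(b)
fake = io.BytesIO(header + b"".join(blobs))

north = read_one_direction(fake, 0)
```

The catch, of course, is that this requires knowing the file format well enough to find the offset table in the first place.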
With the buffer off in four-direction mode, the loading phase takes 7.8 seconds; with the buffer on, it takes 7.2 seconds (likely the same number in reality - these aren't even remotely close to controlled test conditions).
In theory, I should be able to get the same performance with the buffer on as I do in single-tile mode. In practice, though, the processing phase takes about 42 seconds either way. Is it really worth the extra coding work for such a small overall speedup?
More things to think about:
The computer I'm running this test on is probably one of the slower computers in the company at this point. Part of that "loading" phase is actually decompression - about 3 seconds of it, in fact. Most of the other computers are about twice as fast as mine . . . so instead of 50 seconds versus 47 seconds, we're looking at something more like 25 seconds versus 22 seconds (since the speed increase in buffered mode comes from network use and linux box speed, not local computer speed).
As mentioned, with the buffer on, the program loads the entire file from the hard drive. When you've got thirty CPUs reading gigantic work units off a hard drive, disk bandwidth becomes a problem - before adding the tier-1 compression step, we were maxing out our linux box whenever I turned the network on. Turning the buffer on without setting up partial reads really isn't practical.
I wonder if I can hack out something "easy" that gets good performance without needing intimate knowledge of the file format. Maybe have it load blocks of 256k or so on demand . . . most of the speed hits are due to random file seeks over Samba. (Ick.)
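That load-on-demand idea can be sketched without knowing anything about the file format: wrap the file handle, fetch fixed-size blocks only when a read touches them, and cache them so clustered seeks don't each turn into a Samba round trip. Everything below (names, the 256k block size) is illustrative, not the actual implementation.

```python
import io

BLOCK_SIZE = 256 * 1024  # 256k, per the guess above

class BlockCachedFile:
    """Wraps a file object; fetches fixed-size blocks on demand and caches them.

    Turns many small random reads into a few large block reads, which is
    the access pattern that behaves well over a network share. Knows
    nothing about the underlying file format.
    """
    def __init__(self, f, block_size=BLOCK_SIZE):
        self.f = f
        self.block_size = block_size
        self.cache = {}   # block index -> bytes
        self.fetches = 0  # how many real reads we performed

    def _block(self, index):
        if index not in self.cache:
            self.f.seek(index * self.block_size)
            self.cache[index] = self.f.read(self.block_size)
            self.fetches += 1
        return self.cache[index]

    def read_at(self, offset, length):
        out = bytearray()
        while length > 0:
            index, within = divmod(offset, self.block_size)
            chunk = self._block(index)[within:within + length]
            if not chunk:
                break  # past end of file
            out += chunk
            offset += len(chunk)
            length -= len(chunk)
        return bytes(out)

# Demo: two small reads that land in the same 256k block cost one real fetch.
backing = io.BytesIO(bytes(range(256)) * 4096)  # 1 MB of repeating data
cached = BlockCachedFile(backing)
first = cached.read_at(10, 4)
second = cached.read_at(5000, 4)  # same block as the first read
```

The nice property here is that a run of nearby seeks (the common case when pulling one texture set out of a file) hits the cache after the first block fetch, while truly random access degrades to at worst one block read per seek.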