July 15th, 2003


(no subject)

So I've been working on the distributed network quite a lot at work. One of our build steps is rather slow and involves compressing a lot of data, and I've been working on farming that out to the network. I got it working at one point, but ended up turning it off again due to a bottleneck I hadn't considered (thirty computers reading uncompressed bitmaps use a *lot* of hard drive bandwidth). After going back and solving the bottleneck (by bzipping all the bitmaps :P) I had some more work that needed to be done, plus one or two glitches that needed to be fixed. Turning it on produced *another* batch of glitches (so it went off again), but I'd fixed those and felt it was ready for another real-world test . . . which was good timing, all things considered.

boss> This is *incredibly* slow. I've been running this build for half an hour now, and it's not even a quarter done.
me> You know, if I turned the distributed network back on, I bet it'd be faster to restart the build than to leave it running. I think I've got the latest batch of bugs worked out.
boss> You think so?
me> Yeah. There we go, it's enabled. Go for it.
boss> Okay, let's give it a try.

It finished in ten minutes.

Of course, with my luck it'll be turned off again by the time I get in tomorrow, but maybe I finally got the thing working entirely :)

Well . . . until the next set of improvements . . . I think I can decrease disk bandwidth by another factor of eight. *grin*

(no subject)

With the buffer off in single-tile mode, the loading phase on this test chunk takes 7.2 seconds. With the buffer on in single-tile mode, the loading phase takes 4.2 seconds.

Early conclusion: the buffer is a Good Thing.

Issues: The buffer works by loading the entire file into RAM, then doing manipulations on it in RAM. However, the vast majority of our levels are in four-direction mode, which holds four sets of textures in a single source file. We only need to read one of them at a time.

With the buffer off in four-direction mode, the loading phase takes 7.8 seconds (likely the same number in reality - this isn't even remotely close to perfect test conditions). With the buffer on, the loading phase takes 7.2 seconds.

In theory, I should be capable of the same performance with the buffer on as I am in single-tile mode. However, the processing phase, in either case, takes about 42 seconds. Is it really worth the extra coding work for such a small amount of speed increase overall?

More things to think about:

The computer I'm running this test on is probably one of the slower computers in the company at this point. Part of that "loading" phase is actually decompression - about 3 seconds of it, in fact. Most of the other computers are about twice as fast as mine . . . so instead of 50-seconds-to-47-seconds, we're looking at something more like 25-seconds-to-22-seconds (since the speed increase in buffered mode is due to network use and linux box speed, not local computer speed.)

As mentioned, with the buffer on, the program loads the entire file from hard drive. When you've got thirty CPUs reading gigantic work units off a hard drive, disk bandwidth is a problem - before adding the tier-1 compression step, we were maxing out our linux box whenever I turned the network on. Turning the buffer on without setting up partial reads really isn't very practical.

I wonder if I can hack out something "easy" that gets good performance without needing to have intimate knowledge of the file format. Maybe if I have it load-on-demand blocks of 256k or so . . . most of the speed hits are due to random file seeks over Samba. (Ick.)
  • Current Mood: productive

(no subject)

Okay. I think some people here are involved in Big Corporate Programming. So here's a question:

This. There's only two things I'm going to point out (besides the great concept of "bizarre sock-related accidents"), and those are (1) three YEARS of data has been delayed, and (2) the group is getting paid TWO HUNDRED MILLION POUNDS.

So here's my question - What do you do with two hundred million pounds, over a period of three years, that still doesn't result in people being able to add things to the database?

For that matter, WTF does this database need two hundred million pounds of funding for? I find it rather hard to believe it gets heavier load than, say, Livejournal, and LJ certainly doesn't have that much funding behind it. I mean, I'm probably missing something here, but it's . . . a database. We're looking at, what, fifty thousand for hardware, maybe a few thousand a month for colocation, and then just a lot of employees for data entry . . . which they can't do because they can't write to the database, so where is all this money going?

I realize this is an entirely different universe from the one I live in, but if my boss wanted, say, a bug tracker database, I could do it on our existing hardware in a month. That's $4000 of salary (although I get, like, half of that, grrr, stupid tax, but that's not the point). $4000. If it needed to be really big and corporate and scalable I could still get it done in a year with some good hardware. We're looking at $100k at most. Not two hundred million pounds. (Which converts out to about 350 million dollars, incidentally.)

So what am I missing? There's gotta be something, even big businesses can't possibly be so incompetent as to have a 0.05% return on investment.