Today's entry: Going All The Way. There's a classic problem known as the 8-queens problem. Put 8 queens on a chessboard so none of them can attack each other. (normal definition of queen - they can attack diagonally and along lines for any number of squares.) It's not hard to write a program to solve this, though it might take a few minutes for it to run.

A variation of this problem is the n-queens problem. For some (positive) integer n, place n queens on an n by n chessboard. I'm pretty sure it's possible for all numbers but 2 and 3. However, the amount of CPU time it takes to calculate increases dramatically.

For a variety of reasons (one of which being the sheer challenge), I wanted to see if I could manage to calculate really high numbers on the queens problem - you know, 30 or 40 or so. So I browsed around sites and found some interesting things - someone had set up a program on his computer to calculate 20-queens, and it had taken a few days. But there was a solution. (I only really wanted one.) A few days? For a measly 20 queens? I could do better than that . . .

So I did. It took a lot of optimization, but I ended up with a program that would go up through 28-queens in under a second per puzzle, as I remember. It only started going really *slowly* around 34 queens . . . from there it started quadrupling runtime per 2 queens (odd-numbered puzzles are really easy, it's the even-numbered puzzles that are tough.) I got up to 40 queens - which took a few hours, though I was doing other stuff at the time - and stopped it.

And it was mainly because I didn't give up. Okay, this bit is slow - how can I speed it up? How can I make this test run faster? Why am I doing this loop repeatedly - can't I do it only once?

There was a puzzle at Tira's that I beat - the family had had it for six months or so, and nobody's managed it. I did it in maybe three minutes. How? I figured out a simple necessity of the solution and just started *doing* it. Talking to Tira later on, I realized she had been on the right track, she just hadn't kept on going. And later, watching her, she got stuck at a point that I recognized as being a point on the way to finishing it.

See, if you start from the easy end, there's this bit where, in order to keep going, you have to kinda take apart half of what you've done in order to swing one of the wooden bits into the right place - it threads through the middle, and you've already boxed off the middle pretty efficiently. When I was doing it, I just sorta said "okay, this needs to happen, so I'll make it happen" and did it. But most people seem to get to that point and say "well, this is tough, so it can't be the right way". Part of me being able to do these things appears to be the ability to just turn that part of my brain off and bull straight through even when it seems like a bad idea.

I'm not sure what this says about me though.

Since I know there are gonna be a few curious people, I'll now write instructions on how to make the Possibly World's Fastest Brute-Force N-Queens Solver. And I'll do it in a way that will hopefully be understandable to non-CS people. However, I will include it

Still with me?

(wonder if anyone's going to be.)

Okay, first I have to explain a little about "big-O" notation. Big-O is a way of describing how complex a calculation gets when the data it's run on gets larger. Here, take an analogy - you have a bunch of colored links in a chain. How long does it take you to find a certain color? Well, let's say you start at the end and check each link as you go. This is O( n ) - the amount of time it takes you to find the link is linear compared to the number of links. If you double the number of links, then on average, it'll take you twice as long to find the link you're looking for.

Now, instead, let's use links with three connectors, and let's make a "tree" out of them. We'll put one link at the "bottom", then arrange the links upwards from there one at a time, so that of any three links - parent, left child, and right child - the name of the color of the left child will be alphabetically before the name of the color of the parent, and the name of the color of the right child will be alphabetically after the name of the color of the parent. This is an O( log n ) structure. An exponential growth in the number of links only gives you a linear increase in how long it takes to find the link, since each "tier" of the structure can hold twice as many links as the one below it. (I'm explaining this badly. Bear with me.)

Now, let's do a different structure - thousands of numbered bins, in order. Hey, let's throw a robotic delivery system into the mix - now we can access any bin in a certain amount of time. We come up with a formula that can be applied to the color of a link to give us a number, and we put each link in that appropriately numbered bin. Perhaps purple links end up in bin 198, and red links end up in bin 5. Whatever. It doesn't matter. This is an O( 1 ) system, because it doesn't matter how many colors of links we have, it takes the same amount of time to find the right color - the time to calculate the number and tell the robot to get the right link. (At least, until we start running out of bins.)

There are, of course, many others, and this is one of those subjects one could devote a lifetime to - sort functions, for example, end up with complexities like O( n log n ) or O( n^2 ). Bogosort is O( n! ). But in general, you get the idea. Note that multipliers are ignored - O( 2n ) == O( n ).

(For those who are curious, the O( n ) structure I referred to is a linked list - the O( log n ) structure is known as a binary tree, and the O( 1 ) structure is a hash table, all well-known data collection structures.)

Okay. To the one person who's probably still reading this . . . ;)

The most obvious way to write an n-queens program is to do it brute-force. Test every possible placement of queens. It's simple combinatorics - you end up with ( n^2 choose n ) possible combinations and possible tests. Which, as anyone who knows combinatorics knows, is far far far more than you even want to comprehend.

So let's start making it easier. If two queens can't attack each other, then by definition, they're not on the same row. Well that makes it easier. We can restrict each queen to its own row, and that yields a mere n^n combinations. (Which is a lot less than ( n^2 choose n )). So I started writing it recursively. The function would check each possible position for its queen and see if its queen could attack anything. If it couldn't, it would move to the next queen and see if there were any workable positions for it, and so on. If it found a branch that didn't work, it would just drop down as low as it needed to find another branch to try. (I hope this makes *some* sense. I'm trying to write this to be comprehensible to non-CS people, but I fear I might just be making it more confusing for everyone involved.)

Well, I tried this, and it was slow.

So - how can I speed it up?

Well, the next (successful) thing I tried was to scrap the entire idea of a chessboard. It doesn't really matter. There are only three things that matter - the vertical columns, the northwest-to-southeast diagonals, and the northeast-to-southwest diagonals. Remember that we already got rid of the rows, in effect.

So from here it was even easier. Instead of checking intersections with queens, I'd just see if the appropriate column and diagonals were flagged as "having queens in them already". If they weren't, I'd "place" the queen (i.e. flag the columns and diagonals) and move on to the next. Removing queens was trivial also - I'd just unflag the columns and diagonals. Since I could only have one queen in each column/diagonal anyway, I didn't have to worry about accidentally unflagging something that still had a queen.

20-queens in under a second.

Woo.

Still wasn't fast enough.

Another idea: I could do something similar with columns that I'd already done with rows! I made a queue to keep track of free columns. A queue is a standard data structure where you push things in one end and pull them out the other end (and can be implemented in many ways, but I'll get to this in a minute.) In this case, I would push all the columns in order into it at the beginning. The testing process was simple. I would pop the next column out and test it. If it worked, I would call recursively with the next queen. The thing that worried me was, could my queue get unordered during the call? This would be devastating - I might end up, say, testing 4 twice and 19 not at all. But, no, it wouldn't, because each call would rotate the queue around completely! It would pull all the items out and push them all back in. Of course, its subcalls would be doing the same thing, but after each call returned it would be in the same state as it was originally. Problem solved.

The way I set this up was a little array of the appropriate size (at this point I was using 30-queens for testing, so it was a 30-number sized array). It would keep track of its position, but it would wrap around the array, so each time something was pushed or popped it would do a modulus calculation. (Modulus, for those who don't know, is also known as remainder - it's the division remainder, that's all. 5 mod 2 == 1. 8 mod 3 == 2.) This would make it wrap around, and since I'd never have more than 30 items, it would never overflow.

30 queens, in 17 seconds.

Wonder if I can make it faster . . .

And yes, I could. Modulus is slow - it's a byproduct of division, basically, and division is slow. But - if you're using constants (which I was) sometimes the compiler can do some interesting optimizations.

Division is slow. Division by 32 is slow. However, a right bitshift by 5 is equivalent to a division by 32, so if it's a *constant* division by 32 it is instead compiled as a right bitshift by 5. Which is extremely fast.

Modulus 32, on the other hand, would end up compiled as a bitmask with 31. (those who know binary and don't already know why this works, play with the numbers a little, or ask me. But it does :) ) Bitmasks are also fast.

So I made the buffer 32 items large instead of 30.

30-queens in 2.4 seconds.

And no, I never expected it to be that much :)

So there it is - how to write a blindingly fast n-queens algorithm.

I wonder if anyone's still reading this, and I wonder if any of it makes any sense :P