Category Archives: machine learning

Genetic vs. Exhaustive

Genetic algorithms are basically a guided trial-and-error methodology. The only advantage I can think of for a GA over an exhaustive search is that since a GA optimizes a fitness function in steps, you might reach an optimal solution faster, because the GA will favor solutions that are incrementally better. An exhaustive search that’s guaranteed to find a solution might go the long way around.

  • If “computation resources” means CPU but not memory, then the GA’s gene pool might give it a smaller memory footprint.
  • On the other hand, because of hill climbing, a GA might get stuck on a local maximum & any mutations might not be sufficient to shake it loose.
  • Also, the search time for a GA grows exponentially with the size of the input, so an exhaustive search may end up being faster after all, depending on the size of the space you’re searching, and on whether you can constrain that space by excluding possibilities.
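To make the comparison concrete, here’s a toy sketch (my own example, not from any particular GA library): an exhaustive search and a bare-bones GA both maximizing the number of 1-bits in a 16-bit string.

```python
import random

# Toy problem (my own example): maximize the number of 1-bits in a
# 16-bit string. Fitness is just the count of ones, so the optimum is 16.
N_BITS = 16

def fitness(bits):
    return sum(bits)

def exhaustive_search():
    # Enumerate all 2**16 candidates; guaranteed to find the optimum,
    # but it visits every state no matter what.
    best = None
    for i in range(2 ** N_BITS):
        bits = [(i >> b) & 1 for b in range(N_BITS)]
        if best is None or fitness(bits) > fitness(best):
            best = bits
    return best

def genetic_search(pop_size=20, generations=200, seed=0):
    # Bare-bones GA: one-point crossover + single-bit mutation,
    # replacing the worst member when the child scores higher.
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(N_BITS)] for _ in range(pop_size)]
    for _ in range(generations):
        a, b = rng.sample(pop, 2)
        cut = rng.randrange(1, N_BITS)
        child = a[:cut] + b[cut:]
        child[rng.randrange(N_BITS)] ^= 1
        worst = min(pop, key=fitness)
        if fitness(child) > fitness(worst):
            pop[pop.index(worst)] = child
    return max(pop, key=fitness)

print(fitness(exhaustive_search()))  # 16, always
print(fitness(genetic_search()))     # usually near 16, but not guaranteed
```

The exhaustive version grinds through all 65,536 states regardless; the GA typically closes in on the optimum after a few hundred combine-and-score steps, which is the “guided” part of the trial and error.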

Lately I’ve been thinking about GAs in terms of “entropy per second”, measuring the progress of my GA by how many distinct possibilities it runs through per second. Then I can compare that to the total entropy of the problem space. If a brute-force search can beat that entropy per second with parallelization or faster processors, then the GA’s evolved “best score” isn’t any better than the “best score” brute force would discover.

But I’ve just been noodling that over; I haven’t actually instrumented a GA that way yet.
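A minimal sketch of what that instrumentation might look like (the class and all names here are hypothetical; “distinct possibilities” is just the set of unique genomes evaluated so far):

```python
import math
import time

# Hypothetical instrumentation: count the distinct candidate solutions a
# search has evaluated, and compare against the total entropy of the
# space (ln of the number of possible states).
class EntropyMeter:
    def __init__(self, total_states):
        self.total_nats = math.log(total_states)
        self.seen = set()
        self.start = time.monotonic()

    def record(self, genome):
        # Call once per fitness evaluation; duplicates don't count twice.
        self.seen.add(tuple(genome))

    def report(self):
        elapsed = time.monotonic() - self.start
        explored_nats = math.log(len(self.seen)) if self.seen else 0.0
        return {
            "nats_explored": explored_nats,
            "nats_total": self.total_nats,
            "states_per_sec": len(self.seen) / elapsed if elapsed > 0 else 0.0,
        }
```

You’d create one meter per run, `record()` each genome as it’s scored, and watch how far `nats_explored` gets toward `nats_total` before the score plateaus.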

(Entropy is ln(N) for N equally likely states, or ln(N) - sum(n * ln(n) for all outcomes) / N, where n is the number of ways to achieve a given outcome out of N total states.)
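That parenthetical translates directly into a function, where counts[i] is the number of ways to land on outcome i:

```python
import math

def entropy(counts, base=math.e):
    # counts[i] = number of ways to reach outcome i; N = total states.
    # entropy = log(N) - (1/N) * sum(n * log(n))
    N = sum(counts)
    return (math.log(N, base)
            - sum(n * math.log(n, base) for n in counts if n > 0) / N)

# A uniform space (every outcome reachable exactly one way) gives the
# maximum, log(N), because every n*log(n) term is zero:
print(entropy([1] * 8))          # ln(8) ≈ 2.079 nats
print(entropy([1] * 8, base=2))  # 3.0 bits
```

Passing `base=2` gives bits; the default natural log gives nats.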


Filed under computer algorithms, machine learning, opinionizing

Entropy and Learning to Play the Game

I’m speculating with this post — I would really love to hear your opinions and thoughts!

John Cook has an interesting blog post on calculating entropy: http://www.johndcook.com/blog/2013/08/17/calculating-entropy/

Entropy is a curious thing. There are different variations on it. There is the original thermodynamic version of entropy, which is a measure of the heat loss of a system as it does useful work. There is the statistical mechanics interpretation of entropy, which is the amount of information needed to express the state of a physical system. And there is the information theory (i.e. computing) version of entropy, which is the amount of information needed to specify the state of an information system.

There is disagreement on whether the different versions are fundamentally related to each other, but a common term keeps popping up: “n log(n)”

John Cook’s post looks at entropy based on a number of probabilities that add up to 1.0. The equation he comes up with is:

entropy = log(N) - (1/N) * sum(n * log(n))

Now, if we pretend for a moment that all the probabilities are equally likely, then the right-hand term goes away, and we’re left with “log N”. That’s the upper bound on the amount of entropy. If we could pin the probabilities down, we could reduce that, but that’s good enough for the moment.

If the calculation is done with log base 2, then the result is the number of “bits” of entropy. That means representing the space with the familiar binary series of 0s and 1s. If you use the natural log, then the result is the number of “nats”. I’m not sure what representation that corresponds to. Maybe frequencies or wavelengths?

Taking that, we can track the progress of Machine Learning as it applies to an area that people are interested in. Games.

Let’s start with Tic Tac Toe.

Tic Tac Toe is a fun game if you’re young, but it doesn’t take long before we consider it to be “solved”. That is, two players who know what they’re doing will always play to a draw. Every time.

So let’s consider Tic Tac Toe as a Machine Learning problem:

  • If we look at the number of possible Tic-Tac-Toe games, we find about 255,000 games. That represents an entropy of 18 bits, or 12 nats. That means you could represent each Tic-Tac-Toe game with 3 bytes apiece if you really scrunched it down.
  • But we can reduce that even more by observing that the board has symmetry: there are 8 ways to rotate and reflect it onto itself. That means there’s really only about 32,000 distinct games. That’s 15 bits, or 10.4 nats.
  • However, my source tells me that of those, only 1145 are games that someone who knows what they’re doing will play; all the others involve at least one losing mistake. So an expert Tic Tac Toe player — a player who will never lose — only has to remember 1145 games. That’s 10 bits, or 7 nats.
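Those bit and nat figures are easy to check; the counts come straight from the bullets above:

```python
import math

# Counts from the bullets above; "symmetric" is the naive count divided
# by the 8 board symmetries.
games = {"naive": 255168, "symmetric": 255168 // 8, "expert": 1145}

for name, count in games.items():
    print(f"{name}: {math.log2(count):.1f} bits, {math.log(count):.1f} nats")
# naive: 18.0 bits, 12.4 nats
# symmetric: 15.0 bits, 10.4 nats
# expert: 10.2 bits, 7.0 nats
```

These are the same numbers that appear in the summary table at the end of the post.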

Clearly a computer can handle that much information. So at 7 nats, we consider the game to be “trivial” or “solved”. It’s not even interesting to play.

Next consider backgammon.

Backgammon introduces the roll of dice, so instead of games I put in the number of board positions. If a computer knew the value of each board position, then it could roll the dice, evaluate each possible move, and then pick the one that was most likely to result in a win.

  • If we play Backgammon naively, by just memorizing board positions, it is significantly more complicated, requiring 66 bits, or 46 nats to master.

But computer backgammon programs have all but solved backgammon: aside from the luck of the dice, they can beat expert players more often than not. So humans still consider backgammon a challenging game, but computer players can master the 46 nats required to come out ahead.

How about Chess?

There is the Shannon number, which is an estimate of the number of possible board positions. And, with some estimates of the number of possible moves and length of games, you can estimate the number of chess games.

  • The space of possible Chess games comes out to be 398 bits, or 276 nats.

The best humans have not beaten the best chess programs since 2006. So even at 276 nats, computers can master the space of chess games better than human intelligence.

Jeopardy, anyone?

What about a game show? Recently Watson competed on Jeopardy, and beat its human opponents. I’m not sure how many bits or nats Jeopardy represents, but it has to be pretty big. The contestants are given a study book that contains all the answers, so at least a computer player must be able to handle the entropy contained in that book.

The game of Life

So the entropy contained in a game progresses from “trivial” to “interesting” to “captivating”. Captivating in the sense that people will spend their whole lives mastering that game. And people will play on Jeopardy, hoping their ability to learn and remember will outpace their opponents’. But in all these areas, machine learning has surpassed human performance.

What’s next? Clearly at some point, machine learning isn’t strong enough to master the entropy of the given space. Shakespeare wrote Hamlet, but writing a compelling tragedy is more than a computer can muster. Even a solid romance novel, for that matter. Or a pop song that climbs the charts.

But given how rapidly storage capacity and processing power are growing, how long will it be before a big processor *can* handle the nats required to write a sonnet?

At what point does an intelligence move from “trivial” to “interesting” to “captivating” to “self-deterministic”? That is, how many nats does a machine need to command before its thinking is so complex that we declare that it has free will?

Space                     Size        bits   nats
Tic Tac Toe – expert      1145        10.2   7.0
Tic Tac Toe – symmetric   31896       15.0   10.4
Tic Tac Toe – naive       255168      18.0   12.4
Backgammon – naive        1.10E+20    66.6   46.1
Chess – naive             1.00E+120   398.6  276.3
Jeopardy                  ?           ?      ?
Hamlet                    ?           ?      ?

Here’s a comparison with byte streams to see how much data you can pack into them.

Space               Size        bits   nats
1 byte              256         8.0    5.5
4 bytes – 32 bits   4294967296  32.0   22.2
8 bytes – 64 bits   1.84E+19    64.0   44.4
16 bytes            3.40E+38    128.0  88.7
32 bytes            1.16E+77    256.0  177.4
64 bytes            1.34E+154   512.0  354.9

Source of game numbers:

Tic Tac Toe: http://www.mathrec.org/old/2002jan/solutions.html
Backgammon: http://www.bkgm.com/articles/Levner/BruteForceBackgammon/
Chess: http://en.wikipedia.org/wiki/Shannon_number


Filed under machine learning, opinionizing

What is a Genetic Algorithm?

“Genetic algorithm” is also called “genetic search”. It’s modeled after an idealized view of natural selection, but very, very loosely. Really it’s a method of making guesses at an answer to a question, based on combinations of previous answers and how well they did.

 A genetic search starts with two things:

 1) A population of solutions encoded as “gene” strings. These are usually randomly generated to start with.

 2) One or more methods for combining and mutating those strings.

Implicit in both of those is a way to “score” the strings, so that you can tell good answers from bad ones.

The genetic algorithm is just a process:

a) select two gene strings from the population

b) combine them into a new solution

c) score that solution to see how good it is

d) put that solution into the population & remove one that has a lower score
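A minimal sketch of steps a) through d), assuming a bit-string gene encoding and a stand-in scoring function (both are problem-specific in a real application):

```python
import random

# A sketch of steps a) through d). The gene encoding and "score" are
# problem-specific; here a gene string is a list of bits scored by its sum.
def score(genes):
    return sum(genes)

def ga_step(population, rng):
    a, b = rng.sample(population, 2)         # a) select two gene strings
    cut = rng.randrange(1, len(a))
    child = a[:cut] + b[cut:]                # b) combine them (crossover)
    child[rng.randrange(len(child))] ^= 1    #    ...plus a one-bit mutation
    child_score = score(child)               # c) score the new solution
    worst = min(population, key=score)
    if child_score > score(worst):           # d) replace a lower-scoring one
        population[population.index(worst)] = child
    return population
```

Run `ga_step` in a loop and the best score in the population can only go up, because the best member is never the one being replaced.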

So, over time, the population contains strings that reflect the best parts of their ancestors & have the highest scores. If there’s a single “right” solution, eventually the algorithm will probably find it. If there’s no “right” solution, the algorithm will ruthlessly optimize the scores to find ones that work best.

It all sounds vaguely racist, maybe not so vaguely, but you have to remember that a) it’s just a computer algorithm and not a metaphor for natural processes, and b) animals and humans don’t have a “score” associated with them.

The search can get hung up on a local maximum, so sometimes you have to blow away the population and start over. But there’s no rule against keeping a log of the best solutions so you can see how they did over time.

One of the neat things about a genetic algorithm is that, because it’s mindlessly and ruthlessly optimizing that score, it will often discover solutions you didn’t consider, or even don’t like, but which meet the criteria best. That can be fun, and it can also be frustrating. But I think it’s more fun than frustrating.

 


Filed under computer algorithms, machine learning