## Friday, December 13, 2002

### Learning via Occam's Razor

I have been asked to expand upon the spam example from yesterday's Foundations of Complexity lesson. To do this let me go into a little background on computational learning theory.

The basic model for learning theory has examples given as strings and labelled positive or negative. From these labelled examples, one comes up with a hypothesis that hopefully will classify future examples well.

In PAC (Probably Approximately Correct) learning you want an algorithm that takes labelled examples generated by some distribution and will output some hypothesis that will high confidence will usually classify future examples drawn from the same distribution.

One simple approach is the Occam algorithm based on the Occam's Razor principle "when you have two competing theories which make exactly the same predictions, the one that is simpler is the better."

The Occam algorithm works by finding the smallest representation of a hypothesis consistent with the labelled examples and using that hypothesis to predict future examples. There is a theorem in learning theory that says this algorithm works well with the number of samples roughly the size of the smallest representation.

Let us focus on the problem of identifying spam using general circuits, a collection of AND, OR and NOT gates that capture computation. We'll talk more about circuits in a future lesson but just think of the running time of an algorithm as roughly the size of the equivalent circuit.

Given a collection of emails each labelled either as SPAM or NOT SPAM, the Occam algorithm requires us to find the smallest circuit that correctly labels all of these emails. In general finding such a circuit could be hard but under the assumption that P=NP this is an easy task. By the result mentioned above this circuit will likely classify SPAM correctly most of the time in the future.