
Restaurant Example

In this chapter we will look at examples where input and output are discrete and where the output has exactly 2 possible values (boolean classification).

Goal predicate / output: the boolean WillWait.

Attributes / input vector: here all attributes take discrete values.

Attributes

Training set

Decision-Tree-Learning Algorithm

Greedy divide-and-conquer algorithm: at each node, split on the "most important attribute" first.

/*
- examples: (x, y) pairs
- attributes: attributes with values x_1, x_2, ..., x_n in each example's x
- classification: y_j = true / false
*/
function DTL(examples, attributes, parent_examples) returns a decision tree
    if examples is empty then return MAJORITY(parent_examples.classifications)
    else if all examples have the same classification then return the classification
    else if attributes is empty then return MAJORITY(examples.classifications)
    else
        A ← the attribute A' in attributes with maximal IMPORTANCE(A', examples)
        tree ← a new decision tree with root test A
        for each value v of A do // all possible values from the domain of A
            exs ← {e : e ∈ examples and e.A = v} // all examples where this value occurs
            subtree ← DTL(exs, attributes \ {A}, examples) // recursive call
            add a new branch to tree with label "A = v" and subtree subtree
        return tree
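The pseudocode above can be sketched as a runnable Python function. The data representation is my own assumption: each example is a `(dict, bool)` pair, and a tree is either a classification or a `(attribute, branches)` pair; IMPORTANCE is implemented here as information gain, as described later in the notes.

```python
import math
from collections import Counter

def majority(examples):
    # most common classification among the examples (ties broken arbitrarily)
    return Counter(y for _, y in examples).most_common(1)[0][0]

def entropy(examples):
    counts = Counter(y for _, y in examples)
    total = len(examples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def importance(a, examples):
    # information gain of splitting the examples on attribute a
    total = len(examples)
    remainder = 0.0
    for v in {x[a] for x, _ in examples}:
        exs = [(x, y) for x, y in examples if x[a] == v]
        remainder += len(exs) / total * entropy(exs)
    return entropy(examples) - remainder

def dtl(examples, attributes, parent_examples):
    if not examples:
        return majority(parent_examples)         # case 1: no examples left
    if len({y for _, y in examples}) == 1:
        return examples[0][1]                    # case 2: all same classification
    if not attributes:
        return majority(examples)                # case 3: no attributes left
    a = max(attributes, key=lambda a_: importance(a_, examples))
    branches = {}
    # for simplicity we only branch on values that occur in the examples,
    # not on the full domain of the attribute
    for v in {x[a] for x, _ in examples}:
        exs = [(x, y) for x, y in examples if x[a] == v]
        branches[v] = dtl(exs, attributes - {a}, examples)
    return (a, branches)
```

For instance, `dtl(examples, {"Patrons", "Hungry"}, examples)` returns a nested tuple tree that can be walked by repeatedly looking up the tested attribute's value in the branch dict.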

The algorithm is recursive: every subtree is an independent learning problem with fewer examples and one attribute less.

  1. No examples left

     There is no example with this combination of attribute values → we return a default value: the plurality classification of the parent's examples.

  2. Remaining examples are all positive or all negative

     We are done and return that classification.

  3. No attributes left, but both positive and negative examples

     These examples have the same description but different classifications; we return the plurality classification. (Reasons: noise in the data, nondeterminism.)

  4. A mix of positive and negative examples, with attributes remaining

     Choose the most important attribute to split on and repeat this process for each subtree.

Solution

Notice how it ignores the Price and Type attributes.

Importance

We choose the attribute with the highest importance. Which attribute is the best predictor of the classification?

The one with the highest information gain.

low importance: Type

It has four possible values, each of which has the same number of positive and negative examples, so splitting on it gains no information.

high importance: Patrons

Most of its values lead to example sets that are entirely positive or entirely negative.
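This difference can be checked numerically on the standard 12-example restaurant training set (6 positive, 6 negative); the per-value counts below are assumptions taken from that example.

```python
import math

def b(q):
    # entropy (in bits) of a Boolean variable that is true with probability q
    if q in (0.0, 1.0):
        return 0.0
    return -(q * math.log2(q) + (1 - q) * math.log2(1 - q))

def gain(splits, total_pos, total_neg):
    # splits: list of (positives, negatives) per attribute value
    total = total_pos + total_neg
    remainder = sum((p + n) / total * b(p / (p + n)) for p, n in splits)
    return b(total_pos / total) - remainder

# Patrons: None -> 0+/2-, Some -> 4+/0-, Full -> 2+/4-
gain_patrons = gain([(0, 2), (4, 0), (2, 4)], 6, 6)
# Type: French -> 1+/1-, Italian -> 1+/1-, Thai -> 2+/2-, Burger -> 2+/2-
gain_type = gain([(1, 1), (1, 1), (2, 2), (2, 2)], 6, 6)
print(round(gain_patrons, 3), round(gain_type, 3))  # 0.541 0.0
```

Splitting on Patrons gains about 0.541 bits, while splitting on Type gains exactly 0 bits, which is why the learned tree tests Patrons at the root and ignores Type entirely.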