Exam on 2021-02-11

Part 1

a) Construct a neural network

Inputs I_1, I_2

Outputs O_1, O_2, O_3

You can also draw a separate network for every output.

\begin{array}{c|c||c|c|c}I_{1} & I_{2} & O_{1} & O_{2} & O_{3} \\\hline 1 & 1 & 1 & 1 & 1 \\1 & 0 & 0 & 0 & 0 \\0 & 1 & 1 & 1 & 0 \\0 & 0 & 0 & 1 & 1\end{array}

Activation function:

g(x)=\left\{\begin{array}{ll}1 & \text{if } \frac{1}{3} \cdot x \geq 2 \\0 & \text{otherwise}\end{array}\right.
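One possible solution sketch in Python, assuming each unit may use a bias weight and that the given g is applied to the weighted input sum. The weights below are one assignment of many; note that O_3 is an XNOR, which is not linearly separable and therefore needs a hidden layer:

```python
def g(x):
    # threshold activation from the exam: fires when (1/3)*x >= 2, i.e. x >= 6
    return 1 if x / 3 >= 2 else 0

def unit(weights, inputs, bias=0):
    # single neuron: activation of the weighted input sum
    # (bias = weight of an implicit, constant +1 input)
    return g(bias + sum(w * a for w, a in zip(weights, inputs)))

def network(i1, i2):
    # O1 = I2: a single unit suffices
    o1 = unit([0, 6], [i1, i2])
    # O2 = (not I1) or I2: still linearly separable
    o2 = unit([-6, 6], [i1, i2], bias=6)
    # O3 = XNOR(I1, I2): not linearly separable -> two-layer network
    h_and = unit([3, 3], [i1, i2])            # fires only for (1, 1)
    h_nor = unit([-6, -6], [i1, i2], bias=6)  # fires only for (0, 0)
    o3 = unit([6, 6], [h_and, h_nor])         # OR of the two hidden units
    return o1, o2, o3
```

Running all four input rows reproduces the truth table above.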

b) Explain goal-based agents - in what way are they different from utility-based agents?

3. Goal-based agents

Add on: explicit goals, ability to plan into the future, model outcomes for predictions

4. Utility-based agents

Add on: take happiness into account (current state) → Humans are utility-based agents

c) Explain the terms uninformed search, overfitting, activation function and deep learning.

Uninformed search

basic algorithms that use no information beyond the problem definition - no heuristics estimating solution costs


Overfitting

The learned function fits itself too closely to the training set and no longer generalizes well to unseen data.

Tradeoff between complexity and simplicity

precision, fitting training data vs. generalizing, finding patterns

complex hypotheses - fit the data well vs. simpler hypotheses - might generalize better


The DTL algorithm will use any pattern it finds in the examples, even if the pattern is not actually correlated with the target attribute.

\uparrow likelihood of overfitting if \uparrow hypothesis space, \uparrow number of inputs

\downarrow likelihood of overfitting if \uparrow number of examples.


Activation function

The function that defines when the (mathematical model of a) neuron-unit fires.

It uses a threshold, linear or logistic function on the weighted sum of all inputs.

a_j = g(in_j) = g\biggl( \sum_{i=0}^n w_{i,j}\, a_i \biggr)

where g is an activation function:

perceptron (hard threshold): g(x)=\left\{\begin{array}{ll}1 & \text{ if } x \geq 0 \\0 & \text{ otherwise}\end{array}\right.

sigmoid perceptron (logistic function): g(x) = \frac{1}{1+e^{-x}}
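As a quick sketch, the two activation functions side by side (the function names are mine):

```python
import math

def hard_threshold(x):
    # perceptron activation: fires iff the weighted input sum is >= 0
    return 1 if x >= 0 else 0

def sigmoid(x):
    # logistic activation: a smooth, differentiable version of the threshold
    return 1.0 / (1.0 + math.exp(-x))
```

The sigmoid approaches 0 and 1 at the extremes but is differentiable everywhere, which is what gradient-based learning needs.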


Deep learning

Hierarchical structure learning: NNs with many layers

Feature extraction and learning (un)supervised

Problem: requires lots of data

Push for hardware (GPUs, designated chips): “Data-Driven” vs. “Model-Driven” computing

d) Multiple choice

1. Local beam search amounts to a locally confined run of k random-restart searches. ✅

Idea: keep k states instead of just 1

Not the same as running k searches in parallel: searches that find good states recruit other searches to join them - choose the top k of all their successors

Problem: often, all k states end up on the same local hill

Solution: choose the k successors randomly, biased towards good ones
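A minimal sketch of the idea in Python, on a made-up one-dimensional objective; note how pooling all successors of the whole beam makes the searches converge onto the same hill:

```python
def local_beam_search(starts, neighbors, value, k, steps):
    # keep the best k states; each step pools ALL successors of the whole
    # beam and keeps the top k, so good states "recruit" the other searches
    beam = list(starts)
    for _ in range(steps):
        pool = set(beam) | {s for state in beam for s in neighbors(state)}
        beam = sorted(pool, key=value, reverse=True)[:k]
    return max(beam, key=value)

# hypothetical toy objective: maximize -(x - 7)^2 over the integers
value = lambda x: -(x - 7) ** 2
neighbors = lambda x: [x - 1, x + 1]
best = local_beam_search([0, 20, 40], neighbors, value, k=3, steps=50)
```

The stochastic variant from the "Solution" line would sample the k survivors randomly, weighted by value, instead of taking the deterministic top k.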

2. Neural networks are generally bad in object recognition. ❌

No - they are particularly good at it.

3. A rational agent does not necessarily always select the best possible action. ✅

Being rational does not mean being:

omniscient (knowing everything) - percepts may lack relevant information

clairvoyant (seeing into the future) - predictions are not always accurate

👉 successful (ideal outcome)

4. A game of poker is a discrete environment ✅

discrete vs. continuous

a property of the values describing the state of the world - poker has a finite number of distinct states

e) Construct a decision Tree

A decision tree uses the following example set:

\begin{array}{c||c|c|c|c} & A & B & C & \text { Output } \\\hline 1 & a_{1} & b_{1} & c_{2} & T \\2 & a_{2} & b_{1} & c_{1} & F \\3 & a_{2} & b_{2} & c_{2} & T \\4 & a_{2} & b_{3} & c_{1} & F \\5 & a_{2} & b_{3} & c_{1} & T \\6 & a_{2} & b_{3} & c_{1} & T \\7 & a_{2} & b_{3} & c_{2} & F \\8 & a_{1} & b_{2} & c_{2} & T\end{array}

Construct a decision tree from the given data, using the algorithm from the lecture. The next attribute is always chosen according to the predetermined order: A, B, C. Which special case occurs? How does the algorithm deal with it?


We have a predefined order, therefore we do not have to calculate the importance of each attribute individually.

The domains of these attributes:

D_A = \{a_1, a_2\}

D_B = \{b_1, b_2, b_3\}

D_C = \{c_1, c_2\}

/*
- examples: (x, y) pairs
- attributes: attributes of the values x_1, x_2, ..., x_n in x in each example
- classification: y_j = true / false
*/
function DTL(examples, attributes, parent_examples) returns a decision tree
    if examples is empty then return MAJORITY(parent_examples.classifications)
    else if all examples have the same classification then return the classification
    else if attributes is empty then return MAJORITY(examples.classifications)
    else
        A ← empty
        for each attribute A' do
            A ← maxImportance(A, A', examples)
        tree ← a new decision tree with root test A
        for each value v of A do                         // all possible values from the domain of A
            exs ← {e : e ∈ examples | e.A = v}           // all examples where this value occurred
            subtree ← DTL(exs, attributes \ {A}, examples)   // recursive call
            add a new branch to tree with label "A = v" and subtree subtree
        return tree

The algorithm is recursive: every subtree is an independent problem with fewer examples and one attribute less.

The returned tree is not identical to the correct function/tree f. It is a hypothesis h based on the examples.


In this special case, using the predefined order, we run out of attributes while both positive and negative examples remain.

These examples have the same description but different classifications, so we return the plurality classification of the remaining examples (possible reasons: noise in the data, nondeterminism).
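The recursion and the plurality fallback can be sketched in Python on exactly this example set; `dtl`, `plurality`, and the dict-based tree encoding are my own names, not from the lecture:

```python
from collections import Counter

# the example set from the exercise
EXAMPLES = [
    {"A": "a1", "B": "b1", "C": "c2", "y": "T"},
    {"A": "a2", "B": "b1", "C": "c1", "y": "F"},
    {"A": "a2", "B": "b2", "C": "c2", "y": "T"},
    {"A": "a2", "B": "b3", "C": "c1", "y": "F"},
    {"A": "a2", "B": "b3", "C": "c1", "y": "T"},
    {"A": "a2", "B": "b3", "C": "c1", "y": "T"},
    {"A": "a2", "B": "b3", "C": "c2", "y": "F"},
    {"A": "a1", "B": "b2", "C": "c2", "y": "T"},
]
DOMAINS = {"A": ["a1", "a2"], "B": ["b1", "b2", "b3"], "C": ["c1", "c2"]}

def plurality(examples):
    # majority classification of the given examples
    return Counter(e["y"] for e in examples).most_common(1)[0][0]

def dtl(examples, attributes, parent_examples):
    if not examples:
        return plurality(parent_examples)
    if len({e["y"] for e in examples}) == 1:
        return examples[0]["y"]
    if not attributes:
        # the special case: identical descriptions, different classifications
        return plurality(examples)
    a = attributes[0]  # predetermined order A, B, C instead of IMPORTANCE
    return {a: {v: dtl([e for e in examples if e[a] == v],
                       attributes[1:], examples)
                for v in DOMAINS[a]}}

def classify(tree, x):
    # follow the branches until a leaf (a classification string) is reached
    while isinstance(tree, dict):
        a = next(iter(tree))
        tree = tree[a][x[a]]
    return tree

tree = dtl(EXAMPLES, ["A", "B", "C"], EXAMPLES)
```

The branch A = a2, B = b3, C = c1 contains examples 4 (F), 5 (T), 6 (T) with no attributes left, so the leaf becomes the plurality classification T.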

f) Search problem - A* Search

f(n) = g(n) + h(n)

Starting node S

Goal node G

h(S)=10,\ h(a)=7,\ h(b)=8,\ h(c)=5,\ h(d)=3,\ h(e)=2,\ h(f)=2,\ h(h)=0,\ h(G)=0

\begin{array}{|c|l|l|c|}
\hline \text{Step} & \text{Priority queue before step} & \text{Path after step} & \text{Cost} \\
\hline\hline 0 & S(10) & S & 0 \\
\hline 1 & a(10), b(12) & S \rightarrow a & 3 \\
\hline 2 & \textcolor{grey}{b(12)}, c(10), e(9), G(13) & S \rightarrow a \rightarrow e & 7 \\
\hline 3 & \textcolor{grey}{b(12)}, \textcolor{grey}{c(10)}, f(11), \textcolor{grey}{G(13)} & S \rightarrow a \rightarrow c & 5 \\
\hline 4 & \textcolor{grey}{b(12)}, d(13), \textcolor{grey}{f(11)}, \textcolor{grey}{G(13)} & S \rightarrow a \rightarrow e \rightarrow f & 9 \\
\hline 5 & \textcolor{grey}{b(12)}, \textcolor{grey}{d(13)}, h(10), G(11) & S \rightarrow a \rightarrow e \rightarrow f \rightarrow h & 10 \\
\hline 6 & \textcolor{grey}{b(12)}, \textcolor{grey}{d(13)}, \textcolor{grey}{G(11)} & S \rightarrow a \rightarrow e \rightarrow f \rightarrow G & 11 \\
\hline
\end{array}

Entries marked grey were already in the queue from an earlier step.

We're using a graph-search algorithm with an explored set.

This is the explored set (in expansion order): \text{explored}=\{S,a,e,c,f,h\}
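The expansion scheme can be reproduced with a generic graph-search A*. A minimal Python sketch; the edge list below is a made-up toy graph, since the exam's actual edge costs are not reproduced in these notes - only the mechanics carry over:

```python
import heapq

def a_star(start, goal, edges, h):
    # graph-search A*: always expand the node with minimal f = g + h,
    # and keep an explored set so each node is expanded at most once
    frontier = [(h(start), 0, start, [start])]  # (f, g, node, path)
    explored = set()
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        if node in explored:
            continue
        explored.add(node)
        for succ, cost in edges.get(node, []):
            if succ not in explored:
                heapq.heappush(
                    frontier,
                    (g + cost + h(succ), g + cost, succ, path + [succ]))
    return None, float("inf")

# hypothetical toy graph with an admissible heuristic
edges = {"S": [("a", 1), ("b", 4)], "a": [("G", 5)], "b": [("G", 1)]}
h = {"S": 4, "a": 3, "b": 1, "G": 0}.get
path, cost = a_star("S", "G", edges, h)
```

Here a looks better at first (f = 4), but the cheaper route through b wins once its goal entry reaches the front of the queue.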

g) Describe how hill climbing functions. Which problem can occur? What's the solution to it?

Also called "greedy local search": the search repeatedly moves to the neighbouring state with the highest value and stops when no neighbour is better.

+ escaping shoulders (when sideways moves are allowed)

- getting stuck in a loop at a flat local maximum

Solution: random-restart hill climbing - restart from a random state after a step limit; this overcomes local maxima.
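A sketch of steepest-ascent hill climbing with restarts in Python. The objective below is a made-up example with a local maximum at x = 0 and the global maximum at x = 42; the fixed start list stands in for random restarts to keep the run reproducible:

```python
def hill_climb(start, neighbors, value, max_steps=10_000):
    # steepest ascent: always move to the best neighbour, stop at a peak
    current = start
    for _ in range(max_steps):
        best = max(neighbors(current), key=value)
        if value(best) <= value(current):
            return current          # no uphill neighbour: local maximum
        current = best
    return current

def restart_hill_climb(starts, neighbors, value):
    # random-restart idea: climb from several start states
    # (a fixed list here, sampled randomly in the real algorithm)
    return max((hill_climb(s, neighbors, value) for s in starts), key=value)

# hypothetical objective: local maximum at x = 0, global maximum at x = 42
value = lambda x: -abs(x - 42) if x > 20 else -abs(x) - 20
neighbors = lambda x: [x - 1, x + 1]
best = restart_hill_climb([0, 5, 30, 77], neighbors, value)
```

Starts 0 and 5 get trapped at the local maximum x = 0; the restarts from 30 and 77 reach the global maximum.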

h) DFS vs. IDS - are they optimal with infinite branching factors?

No. DFS is not optimal even with a finite branching factor, and with an infinite branching factor IDS loses its optimality guarantee as well: a node with infinitely many successors can never be fully expanded, so not even a single depth level can be completed.

Both explore the tree to its full depth first before continuing in breadth.

3) Depth-first search DFS

Frontier as stack.

Much faster than BFS if solutions are dense.

Backtracking variant: store one successor node at a time - generate the child node with a \Delta change.

completeness No - only complete with explored set and in finite spaces.

time complexity O(b^m) - terrible results if m is much larger than d.

space complexity O(bm)

solution optimality No

4) Depth-limited search DLS

DFS limited to depth \ell - returns \text{cutoff} when the search is cut off at the depth limit.

completeness No - the goal may lie beyond the depth limit \ell

time complexity O(b^\ell)

space complexity O(b\ell)

solution optimality No

5) Iterative deepening search IDS

Iteratively increasing \ell in DLS to gain completeness.

completeness Yes

time complexity O(b^d)

  • in depth

    Only \frac{b}{b-1} times more inefficient than BFS, therefore O(b^d)

    IDS(d) = \sum_{i=0}^{d} DFS(i) = \sum_{i=0}^{d} \Big( \sum_{j=0}^{i} b^j \Big) = \sum_{i=0}^{d} \frac{b^{i+1}-1}{b-1} \leqslant \sum_{i=0}^{d} \frac{b^{i+1}}{b-1} = \frac{b}{b-1} \cdot \sum_{i=0}^{d} b^i = \frac{b}{b-1} \cdot \frac{b^{d+1}-1}{b-1} \leqslant \frac{b}{b-1} \cdot \Big( \frac{b}{b-1} \cdot b^d \Big) = O(b^d)

space complexity O(bd)

  • in depth

    The total number of nodes generated in the worst case is:

    (d)b + (d-1)b^2 + \cdots + (1)b^d = O(b^d)

    Most of the nodes are in the bottom level, so it does not matter much that the upper levels are generated multiple times. In IDS, the nodes on the bottom level (depth dd ) are generated once, those on the next-to-bottom level are generated twice, and so on, up to the children of the root, which are generated dd times.

    Therefore asymptotically the same as DFS.

solution optimality Yes, if step cost ≥ 1
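DLS and IDS can be sketched as follows; the integer tree below is a hypothetical example, and returning the string "cutoff" mirrors the cutoff value from the notes:

```python
def dls(node, goal, children, limit):
    # depth-limited DFS: returns a path, "cutoff", or None (dead end)
    if node == goal:
        return [node]
    if limit == 0:
        return "cutoff"
    cutoff = False
    for child in children(node):
        result = dls(child, goal, children, limit - 1)
        if result == "cutoff":
            cutoff = True
        elif result is not None:
            return [node] + result
    return "cutoff" if cutoff else None

def ids(root, goal, children, max_depth=50):
    # iterative deepening: grow the depth limit until the goal is found
    for limit in range(max_depth + 1):
        result = dls(root, goal, children, limit)
        if result != "cutoff":
            return result
    return None

# hypothetical binary tree over integers: node n has children 2n and 2n+1
children = lambda n: [2 * n, 2 * n + 1] if n < 100 else []
path = ids(1, 11, children)
```

IDS finds the shallowest occurrence of the goal (depth 3 here), which is why it is optimal for unit step costs.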

Part 2

a) Cryptographic puzzle as a CSP

  E G G
+ O I L
-------
M A Y O

Defined as a constraint satisfaction problem:

X = \{E, G, O, I, L, A, Y, M, X_1, X_2, X_3\}

D=\{0,1,2,3,4,5,6,7,8,9\} for the letters; the carry variables X_1, X_2, X_3 only need the domain \{0,1\}

Constraints:

Each letter stands for a different digit: \operatorname{Alldiff}(E,G,O,I,L,A,Y,M)

G+L=O+10X1X1+G+I=Y+10X2,X2+E+O=A+10X3,X3=M \begin{array}{l} G+L=O+10 \cdot X_{1} \\ X_{1}+G+I=Y+10 \cdot X_{2}, \\ X_{2}+E+O=A+10 \cdot X_{3}, \\ X_{3}=M \end{array} 

The variables X_1, X_2, X_3 stand for the digit that gets carried over into the next column by the addition: X_1 is the carry into the tens column, X_2 the carry into the hundreds column, and X_3 the carry into the thousands column.
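The formalization can be sanity-checked by brute force (a real CSP solver would propagate constraints instead of enumerating). Since EGG + OIL < 2000 but MAYO has four digits, the leading carry forces M = 1:

```python
from itertools import permutations

def solve_egg_oil_mayo():
    # EGG + OIL < 2000, so M must be 1; enumerate the remaining 7 letters
    m = 1
    letters = "EGOILAY"
    for digits in permutations([d for d in range(10) if d != m], len(letters)):
        v = dict(zip(letters, digits))
        if v["E"] == 0 or v["O"] == 0:
            continue  # no leading zeros
        egg = 100 * v["E"] + 11 * v["G"]
        oil = 100 * v["O"] + 10 * v["I"] + v["L"]
        mayo = 1000 * m + 100 * v["A"] + 10 * v["Y"] + v["O"]
        if egg + oil == mayo:
            return {**v, "M": m}
    return None

solution = solve_egg_oil_mayo()
```

Using permutations enforces the Alldiff constraint for free; the column equations with the carries X_1, X_2, X_3 are folded into the single sum check here.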

b) 3-color-problem and backtracking-search heuristics

Least constraining value

Partial assignment: 1=\text{green}, 2=\text{red}

Question: Which value would be assigned to the variable 3 by the least-constraining-value heuristic?

Answer: 3=\text{red}

Minimum remaining values

Partial assignment: 2=\text{blue}, 5=\text{red}, 7=\text{green}

Question: Which variable would be assigned next using the minimum-remaining-values heuristic?

Answer: 4

Degree heuristic

Question: Which variable would be assigned a value from its domain first by the degree heuristic?

Answer: 4

Explanation

Variable ordering: fail first

  • Minimum remaining (legal) values MRV
  • degree heuristic - used as a tie breaker; choose the variable involved in the largest number of constraints on other unassigned variables

Value ordering: fail last

  • least-constraining-value - a value that rules out the fewest choices for the neighbouring variables in the constraint graph.
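Both orderings can be sketched for graph colouring. The constraint graph below is hypothetical (the exam's figure is not reproduced in these notes) and is unrelated to the answers above; `legal_values`, `mrv`, and `lcv` are my own names:

```python
def legal_values(var, assignment, neighbors, colors):
    # values not already used by an assigned neighbour
    return [c for c in colors
            if all(assignment.get(n) != c for n in neighbors[var])]

def mrv(assignment, neighbors, colors):
    # minimum remaining values: fewest legal values first,
    # with the degree heuristic (most neighbours) as tie-breaker
    unassigned = [v for v in neighbors if v not in assignment]
    return min(unassigned,
               key=lambda v: (len(legal_values(v, assignment, neighbors, colors)),
                              -len(neighbors[v])))

def lcv(var, assignment, neighbors, colors):
    # least constraining value: order values by how many options
    # they would rule out for the unassigned neighbours
    def ruled_out(c):
        return sum(c in legal_values(n, assignment, neighbors, colors)
                   for n in neighbors[var] if n not in assignment)
    return sorted(legal_values(var, assignment, neighbors, colors), key=ruled_out)

# hypothetical constraint graph: 1-2, 1-3, 2-3, 2-4, 3-4
neighbors = {1: [2, 3], 2: [1, 3, 4], 3: [1, 2, 4], 4: [2, 3]}
colors = ["red", "green", "blue"]
next_var = mrv({1: "green", 2: "red"}, neighbors, colors)
```

In this toy graph, variable 3 neighbours both assigned variables and has only one legal value left, so MRV (fail first) picks it before variable 4.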

c) Multiple Choice

1. A CSP can only contain binary constraints ❌

Backtracking search algorithm

Depth-first search with single-variable assignments:

  1. chooses values for one unassigned variable at a time - it does not matter which one, because of commutativity.
  2. tries all values in the domain of that variable
  3. backtracks when a variable has no legal values left to assign.

2. The backtracking search is an uninformed search algorithm. ✅

3. The axiom of continuity is: A \succ B \succ C \Rightarrow \exists p\,[p, A ; 1-p, B] \sim C ❌

The right answer would be ABCp[p,A;1p,C]B A \succ B \succ C \Rightarrow \exists p\space[p, A ; 1-p, C] \sim B 

d) Axiom of monotonicity from utility theory

Monotonicity

Agent prefers a higher probability of the state that it prefers.

A \succ B \Rightarrow (p>q \Leftrightarrow [p, A ; 1-p, B] \succ [q, A ; 1-q, B])

e) Multiple Choice

1. Classical planning environments are static ✅

Classical Planning Environments

Used in classical planning:

  • fully observable
  • deterministic, finite
  • static (change happens only when the agent acts)
  • discrete in time, actions, objects, and effects

2. \exists x \operatorname{Meets}(x, bob) is syntactically correct in STRIPS ❌

No, STRIPS does not support quantifiers.

3. Unlike STRIPS, ADL supports equality ✅

4. In partial order planning, the constraint A \prec B means that A must be executed immediately before B ❌

It means that AA must be executed some time before BB .

f) Formalizing a STRIPS action

\text{Action}(\text{AttendingMeeting}(emp, room),

\text{PRECOND: } Employee(emp) \wedge Invited(emp) \wedge Room(room) \wedge In(emp, room) \wedge Mask(emp),

\text{EFFECT: } \neg BreathesWell(emp) \wedge Stressed(emp))
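A toy encoding of the STRIPS semantics in Python; the set-based state, the ground constants bob and r1, and the function names are my own illustration, not part of the exam:

```python
# the action above, grounded for a hypothetical employee bob and room r1;
# the negated effect literal becomes the delete list, the positive one the add list
action = {
    "name": "AttendingMeeting(bob, r1)",
    "precond": {"Employee(bob)", "Invited(bob)", "Room(r1)",
                "In(bob, r1)", "Mask(bob)"},
    "add": {"Stressed(bob)"},
    "delete": {"BreathesWell(bob)"},
}

def applicable(state, action):
    # an action is applicable when all precondition literals hold
    return action["precond"] <= state

def apply(state, action):
    # STRIPS semantics: remove the delete list, then add the add list
    return (state - action["delete"]) | action["add"]

state = {"Employee(bob)", "Invited(bob)", "Room(r1)",
         "In(bob, r1)", "Mask(bob)", "BreathesWell(bob)"}
```

Applying the action deletes BreathesWell(bob) and adds Stressed(bob); all literals not mentioned in the effect persist unchanged (the STRIPS solution to the frame problem).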