Exam on 2021-02-11

Part 1

a) Construct a neural network

Inputs I_1, I_2

Outputs O_1, O_2, O_3

You can also draw a separate network for every output.

\begin{array}{c|c||c|c|c}I_{1} & I_{2} & O_{1} & O_{2} & O_{3} \\\hline 1 & 1 & 1 & 1 & 1 \\1 & 0 & 0 & 0 & 0 \\0 & 1 & 1 & 1 & 0 \\0 & 0 & 0 & 1 & 1\end{array}

Activation function:

g(x)=\left\{\begin{array}{ll}1 & \text{if } \frac{1}{3} \cdot x \geq 2 \\0 & \text{otherwise}\end{array}\right.
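One possible solution sketch in Python, assuming each unit may use a bias weight and that the given g is applied to the weighted input sum. The weights below are one assignment of many; note that O_3 is an XNOR, which is not linearly separable and therefore needs a hidden layer:

```python
def g(x):
    # threshold activation from the exam: fires when (1/3)*x >= 2, i.e. x >= 6
    return 1 if x / 3 >= 2 else 0

def unit(weights, inputs, bias=0):
    # single neuron: activation of the weighted input sum
    # (bias = weight of an implicit, constant +1 input)
    return g(bias + sum(w * a for w, a in zip(weights, inputs)))

def network(i1, i2):
    # O1 = I2: a single unit suffices
    o1 = unit([0, 6], [i1, i2])
    # O2 = (not I1) or I2: still linearly separable
    o2 = unit([-6, 6], [i1, i2], bias=6)
    # O3 = XNOR(I1, I2): not linearly separable -> two-layer network
    h_and = unit([3, 3], [i1, i2])            # fires only for (1, 1)
    h_nor = unit([-6, -6], [i1, i2], bias=6)  # fires only for (0, 0)
    o3 = unit([6, 6], [h_and, h_nor])         # OR of the two hidden units
    return o1, o2, o3
```

Running all four input rows reproduces the truth table above.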

b) Explain goal-based agents - in what way are they different from utility-based agents?

3. Goal-based agents

Add on: explicit goals, ability to plan into the future, model outcomes for predictions

4. Utility-based agents

Add on: take happiness into account (current state) → Humans are utility-based agents

c) Explain the terms uninformed search, overfitting, activation function and deep learning.

Uninformed search

basic algorithms that use no information beyond the problem definition - no heuristics estimating solution costs


Overfitting

The learned function fits itself too closely to the training set and no longer generalizes well to unseen data.

Tradeoff between complexity and simplicity

precision, fitting training data vs. generalizing, finding patterns

complex hypotheses - fit the data well vs. simpler hypotheses - might generalize better


The DTL algorithm will use any pattern it finds in the examples, even if the pattern is not actually correlated with the target attribute.

\uparrow likelihood of overfitting if \uparrow hypothesis space, \uparrow number of inputs

\downarrow likelihood of overfitting if \uparrow number of examples.


Activation function

The function that defines when the (mathematical model of a) neuron-unit fires.

It uses a threshold, linear or logistic function on the weighted sum of all inputs.

a_j = g(in_j) = g\biggl( \sum_{i=0}^n w_{i,j}\, a_i \biggr)

where g is an activation function:

perceptron (hard threshold): g(x)=\left\{\begin{array}{ll}1 & \text{ if } x \geq 0 \\0 & \text{ otherwise}\end{array}\right.

sigmoid perceptron (logistic function): g(x) = \frac{1}{1+e^{-x}}
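As a quick sketch, the two activation functions side by side (the function names are mine):

```python
import math

def hard_threshold(x):
    # perceptron activation: fires iff the weighted input sum is >= 0
    return 1 if x >= 0 else 0

def sigmoid(x):
    # logistic activation: a smooth, differentiable version of the threshold
    return 1.0 / (1.0 + math.exp(-x))
```

The sigmoid approaches 0 and 1 at the extremes but is differentiable everywhere, which is what gradient-based learning needs.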


Deep learning

Hierarchical structure learning: NNs with many layers

Feature extraction and learning (un)supervised

Problem: requires lots of data

Push for hardware (GPUs, designated chips): “Data-Driven” vs. “Model-Driven” computing

d) Multiple choice

1. Local beam search amounts to a locally confined run of k random-restart searches. ✅

Idea: keep k states instead of just 1

Not the same as running k searches in parallel: searches that find good states recruit other searches to join them - choose the top k of all their successors

Problem: often, all k states end up on the same local hill

Solution: choose the k successors randomly, biased towards good ones
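A minimal sketch of the idea in Python, on a made-up one-dimensional objective; note how pooling all successors of the whole beam makes the searches converge onto the same hill:

```python
def local_beam_search(starts, neighbors, value, k, steps):
    # keep the best k states; each step pools ALL successors of the whole
    # beam and keeps the top k, so good states "recruit" the other searches
    beam = list(starts)
    for _ in range(steps):
        pool = set(beam) | {s for state in beam for s in neighbors(state)}
        beam = sorted(pool, key=value, reverse=True)[:k]
    return max(beam, key=value)

# hypothetical toy objective: maximize -(x - 7)^2 over the integers
value = lambda x: -(x - 7) ** 2
neighbors = lambda x: [x - 1, x + 1]
best = local_beam_search([0, 20, 40], neighbors, value, k=3, steps=50)
```

The stochastic variant from the "Solution" line would sample the k survivors randomly, weighted by value, instead of taking the deterministic top k.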

2. Neural networks are generally bad in object recognition. ❌

No - they are particularly good at it.

3. A rational agent does not necessarily always select the best possible action. ✅

Being rational does not mean being:

omniscient (knowing everything) - percepts may lack relevant information

clairvoyant (seeing into the future) - predictions are not always accurate

👉 successful (ideal outcome)

4. A game of poker is a discrete environment ✅

discrete vs. continuous

a property of the values describing the state of the world - poker has a finite number of distinct states

e) Construct a decision Tree

A decision tree uses the following example set:

\begin{array}{c||c|c|c|c} & A & B & C & \text { Output } \\\hline 1 & a_{1} & b_{1} & c_{2} & T \\2 & a_{2} & b_{1} & c_{1} & F \\3 & a_{2} & b_{2} & c_{2} & T \\4 & a_{2} & b_{3} & c_{1} & F \\5 & a_{2} & b_{3} & c_{1} & T \\6 & a_{2} & b_{3} & c_{1} & T \\7 & a_{2} & b_{3} & c_{2} & F \\8 & a_{1} & b_{2} & c_{2} & T\end{array}

Construct a decision tree from the given data, using the algorithm from the lecture. The next attribute is always chosen according to the predetermined order: A, B, C. Which special case occurs? How does the algorithm deal with it?


We have a predefined order, therefore we do not have to calculate the importance of each attribute individually.

The domains of these attributes:

D_A = \{a_1, a_2\}

D_B = \{b_1, b_2, b_3\}

D_C = \{c_1, c_2\}

/*
- examples: (x, y) pairs
- attributes: attributes of the values x_1, x_2, ..., x_n in x in each example
- classification: y_j = true / false
*/
function DTL(examples, attributes, parent_examples) returns a decision tree
    if examples is empty then return MAJORITY(parent_examples.classifications)
    else if all examples have the same classification then return the classification
    else if attributes is empty then return MAJORITY(examples.classifications)
    else
        A ← empty
        for each attribute A' do
            A ← maxImportance(A, A', examples)
        tree ← a new decision tree with root test A
        for each value v of A do                         // all possible values from the domain of A
            exs ← {e : e ∈ examples | e.A = v}           // all examples where this value occurred
            subtree ← DTL(exs, attributes \ {A}, examples)   // recursive call
            add a new branch to tree with label "A = v" and subtree subtree
        return tree

The algorithm is recursive: every subtree is an independent problem with fewer examples and one attribute less.

The returned tree is not identical to the correct function/tree f. It is a hypothesis h based on the examples.


In this special case, using the predefined order, we run out of attributes while both positive and negative examples remain.

These examples have the same description but different classifications, so we return the plurality classification of the remaining examples (possible reasons: noise in the data, nondeterminism).
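The recursion and the plurality fallback can be sketched in Python on exactly this example set; `dtl`, `plurality`, and the dict-based tree encoding are my own names, not from the lecture:

```python
from collections import Counter

# the example set from the exercise
EXAMPLES = [
    {"A": "a1", "B": "b1", "C": "c2", "y": "T"},
    {"A": "a2", "B": "b1", "C": "c1", "y": "F"},
    {"A": "a2", "B": "b2", "C": "c2", "y": "T"},
    {"A": "a2", "B": "b3", "C": "c1", "y": "F"},
    {"A": "a2", "B": "b3", "C": "c1", "y": "T"},
    {"A": "a2", "B": "b3", "C": "c1", "y": "T"},
    {"A": "a2", "B": "b3", "C": "c2", "y": "F"},
    {"A": "a1", "B": "b2", "C": "c2", "y": "T"},
]
DOMAINS = {"A": ["a1", "a2"], "B": ["b1", "b2", "b3"], "C": ["c1", "c2"]}

def plurality(examples):
    # majority classification of the given examples
    return Counter(e["y"] for e in examples).most_common(1)[0][0]

def dtl(examples, attributes, parent_examples):
    if not examples:
        return plurality(parent_examples)
    if len({e["y"] for e in examples}) == 1:
        return examples[0]["y"]
    if not attributes:
        # the special case: identical descriptions, different classifications
        return plurality(examples)
    a = attributes[0]  # predetermined order A, B, C instead of IMPORTANCE
    return {a: {v: dtl([e for e in examples if e[a] == v],
                       attributes[1:], examples)
                for v in DOMAINS[a]}}

def classify(tree, x):
    # follow the branches until a leaf (a classification string) is reached
    while isinstance(tree, dict):
        a = next(iter(tree))
        tree = tree[a][x[a]]
    return tree

tree = dtl(EXAMPLES, ["A", "B", "C"], EXAMPLES)
```

The branch A = a2, B = b3, C = c1 contains examples 4 (F), 5 (T), 6 (T) with no attributes left, so the leaf becomes the plurality classification T.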

f) Search problem - A* Search

f(n) = g(n) + h(n)

Starting node S

Goal node G

h(S)=10,\ h(a)=7,\ h(b)=8,\ h(c)=5,\ h(d)=3,\ h(e)=2,\ h(f)=2,\ h(h)=0,\ h(G)=0

\begin{array}{|c|l|l|c|}
\hline \text{Step} & \text{Priority queue before step} & \text{Path after step} & \text{Cost} \\
\hline\hline 0 & S(10) & S & 0 \\
\hline 1 & a(10), b(12) & S \rightarrow a & 3 \\
\hline 2 & \textcolor{grey}{b(12)}, c(10), e(9), G(13) & S \rightarrow a \rightarrow e & 7 \\
\hline 3 & \textcolor{grey}{b(12)}, \textcolor{grey}{c(10)}, f(11), \textcolor{grey}{G(13)} & S \rightarrow a \rightarrow c & 5 \\
\hline 4 & \textcolor{grey}{b(12)}, d(13), \textcolor{grey}{f(11)}, \textcolor{grey}{G(13)} & S \rightarrow a \rightarrow e \rightarrow f & 9 \\
\hline 5 & \textcolor{grey}{b(12)}, \textcolor{grey}{d(13)}, h(10), G(11) & S \rightarrow a \rightarrow e \rightarrow f \rightarrow h & 10 \\
\hline 6 & \textcolor{grey}{b(12)}, \textcolor{grey}{d(13)}, \textcolor{grey}{G(11)} & S \rightarrow a \rightarrow e \rightarrow f \rightarrow G & 11 \\
\hline
\end{array}

Entries marked grey were already in the queue from an earlier step.

We're using a graph-search algorithm with an explored set.

This is the explored set (in expansion order): \text{explored}=\{S,a,e,c,f,h\}
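The expansion scheme can be reproduced with a generic graph-search A*. A minimal Python sketch; the edge list below is a made-up toy graph, since the exam's actual edge costs are not reproduced in these notes - only the mechanics carry over:

```python
import heapq

def a_star(start, goal, edges, h):
    # graph-search A*: always expand the node with minimal f = g + h,
    # and keep an explored set so each node is expanded at most once
    frontier = [(h(start), 0, start, [start])]  # (f, g, node, path)
    explored = set()
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        if node in explored:
            continue
        explored.add(node)
        for succ, cost in edges.get(node, []):
            if succ not in explored:
                heapq.heappush(
                    frontier,
                    (g + cost + h(succ), g + cost, succ, path + [succ]))
    return None, float("inf")

# hypothetical toy graph with an admissible heuristic
edges = {"S": [("a", 1), ("b", 4)], "a": [("G", 5)], "b": [("G", 1)]}
h = {"S": 4, "a": 3, "b": 1, "G": 0}.get
path, cost = a_star("S", "G", edges, h)
```

Here a looks better at first (f = 4), but the cheaper route through b wins once its goal entry reaches the front of the queue.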

g) Describe how hill climbing functions. Which problem can occur? What's the solution to it?

Also called "greedy local search": the search repeatedly moves to the neighbouring state with the highest value and stops when no neighbour is better.

+ escaping shoulders (when sideways moves are allowed)

- getting stuck in a loop at a flat local maximum

Solution: random-restart hill climbing - restart from a random state after a step limit; this overcomes local maxima.
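A sketch of steepest-ascent hill climbing with restarts in Python. The objective below is a made-up example with a local maximum at x = 0 and the global maximum at x = 42; the fixed start list stands in for random restarts to keep the run reproducible:

```python
def hill_climb(start, neighbors, value, max_steps=10_000):
    # steepest ascent: always move to the best neighbour, stop at a peak
    current = start
    for _ in range(max_steps):
        best = max(neighbors(current), key=value)
        if value(best) <= value(current):
            return current          # no uphill neighbour: local maximum
        current = best
    return current

def restart_hill_climb(starts, neighbors, value):
    # random-restart idea: climb from several start states
    # (a fixed list here, sampled randomly in the real algorithm)
    return max((hill_climb(s, neighbors, value) for s in starts), key=value)

# hypothetical objective: local maximum at x = 0, global maximum at x = 42
value = lambda x: -abs(x - 42) if x > 20 else -abs(x) - 20
neighbors = lambda x: [x - 1, x + 1]
best = restart_hill_climb([0, 5, 30, 77], neighbors, value)
```

Starts 0 and 5 get trapped at the local maximum x = 0; the restarts from 30 and 77 reach the global maximum.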

h) DFS vs. IDS - are they optimal with infinite branching factors?

No. DFS is not optimal even with a finite branching factor, and with an infinite branching factor IDS loses its optimality guarantee as well: a node with infinitely many successors can never be fully expanded, so not even a single depth level can be completed.

Both explore the tree to its full depth first before continuing in breadth.

3) Depth-first search DFS

Frontier as stack.

Much faster than BFS if solutions are dense.

Backtracking variant: store one successor node at a time - generate the child node with a \Delta change.

completeness No - only complete with explored set and in finite spaces.

time complexity O(b^m) - terrible results if m is much larger than d.

space complexity O(bm)

solution optimality No

4) Depth-limited search DLS

DFS limited to depth \ell - returns \text{cutoff} when the search is cut off at the depth limit.

completeness No - the goal may lie beyond the depth limit \ell

time complexity O(b^\ell)

space complexity O(b\ell)

solution optimality No

5) Iterative deepening search IDS

Iteratively increasing \ell in DLS to gain completeness.

completeness Yes

time complexity O(b^d)

  • in depth

    Only \frac{b}{b-1} times more inefficient than BFS, therefore O(b^d)

    IDS(d) = \sum_{i=0}^{d} DFS(i) = \sum_{i=0}^{d} \Big( \sum_{j=0}^{i} b^j \Big) = \sum_{i=0}^{d} \frac{b^{i+1}-1}{b-1} \leqslant \sum_{i=0}^{d} \frac{b^{i+1}}{b-1} = \frac{b}{b-1} \cdot \sum_{i=0}^{d} b^i = \frac{b}{b-1} \cdot \frac{b^{d+1}-1}{b-1} \leqslant \frac{b}{b-1} \cdot \Big( \frac{b}{b-1} \cdot b^d \Big) = O(b^d)

space complexity O(bd)

  • in depth

    The total number of nodes generated in the worst case is:

    (d)b + (d-1)b^2 + \cdots + (1)b^d = O(b^d)

    Most of the nodes are in the bottom level, so it does not matter much that the upper levels are generated multiple times. In IDS, the nodes on the bottom level (depth dd ) are generated once, those on the next-to-bottom level are generated twice, and so on, up to the children of the root, which are generated dd times.

    Therefore asymptotically the same as DFS.

solution optimality Yes, if step cost ≥ 1
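DLS and IDS can be sketched as follows; the integer tree below is a hypothetical example, and returning the string "cutoff" mirrors the cutoff value from the notes:

```python
def dls(node, goal, children, limit):
    # depth-limited DFS: returns a path, "cutoff", or None (dead end)
    if node == goal:
        return [node]
    if limit == 0:
        return "cutoff"
    cutoff = False
    for child in children(node):
        result = dls(child, goal, children, limit - 1)
        if result == "cutoff":
            cutoff = True
        elif result is not None:
            return [node] + result
    return "cutoff" if cutoff else None

def ids(root, goal, children, max_depth=50):
    # iterative deepening: grow the depth limit until the goal is found
    for limit in range(max_depth + 1):
        result = dls(root, goal, children, limit)
        if result != "cutoff":
            return result
    return None

# hypothetical binary tree over integers: node n has children 2n and 2n+1
children = lambda n: [2 * n, 2 * n + 1] if n < 100 else []
path = ids(1, 11, children)
```

IDS finds the shallowest occurrence of the goal (depth 3 here), which is why it is optimal for unit step costs.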

Part 2

a) Cryptographic puzzle as a CSP

  E G G
+ O I L
-------
M A Y O

Defined as a constraint satisfaction problem:

X = \{E, G, O, I, L, A, Y, M, X_1, X_2, X_3\}

D=\{0,1,2,3,4,5,6,7,8,9\} for the letters; the carry variables X_1, X_2, X_3 only need the domain \{0,1\}

Constraints:

Each letter stands for a different digit: \operatorname{Alldiff}(E,G,O,I,L,A,Y,M)

G+L=O+10X1X1+G+I=Y+10X2,X2+E+O=A+10X3,X3=M \begin{array}{l} G+L=O+10 \cdot X_{1} \\ X_{1}+G+I=Y+10 \cdot X_{2}, \\ X_{2}+E+O=A+10 \cdot X_{3}, \\ X_{3}=M \end{array} 

The variables X_1, X_2, X_3 stand for the digit that gets carried over into the next column by the addition: X_1 is the carry into the tens column, X_2 the carry into the hundreds column, and X_3 the carry into the thousands column.
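The formalization can be sanity-checked by brute force (a real CSP solver would propagate constraints instead of enumerating). Since EGG + OIL < 2000 but MAYO has four digits, the leading carry forces M = 1:

```python
from itertools import permutations

def solve_egg_oil_mayo():
    # EGG + OIL < 2000, so M must be 1; enumerate the remaining 7 letters
    m = 1
    letters = "EGOILAY"
    for digits in permutations([d for d in range(10) if d != m], len(letters)):
        v = dict(zip(letters, digits))
        if v["E"] == 0 or v["O"] == 0:
            continue  # no leading zeros
        egg = 100 * v["E"] + 11 * v["G"]
        oil = 100 * v["O"] + 10 * v["I"] + v["L"]
        mayo = 1000 * m + 100 * v["A"] + 10 * v["Y"] + v["O"]
        if egg + oil == mayo:
            return {**v, "M": m}
    return None

solution = solve_egg_oil_mayo()
```

Using permutations enforces the Alldiff constraint for free; the column equations with the carries X_1, X_2, X_3 are folded into the single sum check here.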

b) 3-color-problem and backtracking-search heuristics

Least constraining value

Partial assignment: 1=\text{green}, 2=\text{red}

Question: Which value would be assigned to the variable 3 by the least-constraining-value heuristic?

Answer: 3=\text{red}

Minimum remaining values

Partial assignment: 2=\text{blue}, 5=\text{red}, 7=\text{green}

Question: Which variable would be assigned next using the minimum-remaining-values heuristic?

Answer: 4

Degree heuristic

Question: Which variable would be assigned a value from its domain first by the degree heuristic?

Answer: 4

Explanation

Variable ordering: fail first

  • Minimum remaining (legal) values MRV
  • degree heuristic - used as a tie breaker; choose the variable involved in the largest number of constraints on other unassigned variables

Value ordering: fail last

  • least-constraining-value - a value that rules out the fewest choices for the neighbouring variables in the constraint graph.
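Both orderings can be sketched for graph colouring. The constraint graph below is hypothetical (the exam's figure is not reproduced in these notes) and is unrelated to the answers above; `legal_values`, `mrv`, and `lcv` are my own names:

```python
def legal_values(var, assignment, neighbors, colors):
    # values not already used by an assigned neighbour
    return [c for c in colors
            if all(assignment.get(n) != c for n in neighbors[var])]

def mrv(assignment, neighbors, colors):
    # minimum remaining values: fewest legal values first,
    # with the degree heuristic (most neighbours) as tie-breaker
    unassigned = [v for v in neighbors if v not in assignment]
    return min(unassigned,
               key=lambda v: (len(legal_values(v, assignment, neighbors, colors)),
                              -len(neighbors[v])))

def lcv(var, assignment, neighbors, colors):
    # least constraining value: order values by how many options
    # they would rule out for the unassigned neighbours
    def ruled_out(c):
        return sum(c in legal_values(n, assignment, neighbors, colors)
                   for n in neighbors[var] if n not in assignment)
    return sorted(legal_values(var, assignment, neighbors, colors), key=ruled_out)

# hypothetical constraint graph: 1-2, 1-3, 2-3, 2-4, 3-4
neighbors = {1: [2, 3], 2: [1, 3, 4], 3: [1, 2, 4], 4: [2, 3]}
colors = ["red", "green", "blue"]
next_var = mrv({1: "green", 2: "red"}, neighbors, colors)
```

In this toy graph, variable 3 neighbours both assigned variables and has only one legal value left, so MRV (fail first) picks it before variable 4.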

c) Multiple Choice

1. A CSP can only contain binary constraints ❌

Backtracking search algorithm

Depth-first search with single-variable assignments:

  1. chooses values for one unassigned variable at a time - it does not matter which one, because of commutativity.
  2. tries all values in the domain of that variable
  3. backtracks when a variable has no legal values left to assign.

2. The backtracking search is an uninformed search algorithm. ✅

3. The axiom of continuity is: A \succ B \succ C \Rightarrow \exists p\,[p, A ; 1-p, B] \sim C ❌

The right answer would be ABCp[p,A;1p,C]B A \succ B \succ C \Rightarrow \exists p\space[p, A ; 1-p, C] \sim B 

d) Axiom of monotonicity from utility theory

Monotonicity

Agent prefers a higher probability of the state that it prefers.

A \succ B \Rightarrow (p>q \Leftrightarrow [p, A ; 1-p, B] \succ [q, A ; 1-q, B])

e) Multiple Choice

1. Classical planning environments are static ✅

Classical Planning Environments

Used in classical planning:

  • fully observable
  • deterministic, finite
  • static (change happens only when the agent acts)
  • discrete in time, actions, objects, and effects

2. \exists x \operatorname{Meets}(x, bob) is syntactically correct in STRIPS ❌

No, STRIPS does not support quantifiers.

3. Unlike STRIPS, ADL supports equality ✅

4. In partial order planning, the constraint A \prec B means that A must be executed immediately before B ❌

It means that AA must be executed some time before BB .

f) Formalizing a STRIPS action

\text{Action}(\text{AttendingMeeting}(emp, room),

\text{PRECOND: } Employee(emp) \wedge Invited(emp) \wedge Room(room) \wedge In(emp, room) \wedge Mask(emp),

\text{EFFECT: } \neg BreathesWell(emp) \wedge Stressed(emp))
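A toy encoding of the STRIPS semantics in Python; the set-based state, the ground constants bob and r1, and the function names are my own illustration, not part of the exam:

```python
# the action above, grounded for a hypothetical employee bob and room r1;
# the negated effect literal becomes the delete list, the positive one the add list
action = {
    "name": "AttendingMeeting(bob, r1)",
    "precond": {"Employee(bob)", "Invited(bob)", "Room(r1)",
                "In(bob, r1)", "Mask(bob)"},
    "add": {"Stressed(bob)"},
    "delete": {"BreathesWell(bob)"},
}

def applicable(state, action):
    # an action is applicable when all precondition literals hold
    return action["precond"] <= state

def apply(state, action):
    # STRIPS semantics: remove the delete list, then add the add list
    return (state - action["delete"]) | action["add"]

state = {"Employee(bob)", "Invited(bob)", "Room(r1)",
         "In(bob, r1)", "Mask(bob)", "BreathesWell(bob)"}
```

Applying the action deletes BreathesWell(bob) and adds Stressed(bob); all literals not mentioned in the effect persist unchanged (the STRIPS solution to the frame problem).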