Exam on 2018-06-26

Part 1)

a) Construct a neural network

Properties:

  • max 1 inner layer
  • 2 binary inputs $I_1, I_2$
  • 3 outputs $O_1, O_2, O_3$

$$\begin{array}{c|c||c|c|c}I_{1} & I_{2} & O_{1} & O_{2} & O_{3} \\\hline 0 & 0 & 0 & 0 & 0 \\0 & 1 & 1 & 1 & 0 \\1 & 0 & 1 & 1 & 1 \\1 & 1 & 1 & 0 & 0\end{array}$$

Activation function:

$$g(x)=\left\{\begin{array}{ll}1 & \text{if } x \geq 1, \\0 & \text{otherwise}\end{array}\right.$$
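One possible construction (a sketch, not the official solution): route both inputs through a single hidden layer with units $h_1 = g(2I_1)$, $h_2 = g(2I_2)$ and $h_3 = g(0.5\,I_1 + 0.5\,I_2)$ (an AND unit), and compute the outputs as $O_1 = g(h_1 + h_2)$, $O_2 = g(h_1 + h_2 - 2h_3)$, $O_3 = g(h_1 - h_2)$. The small Python check below verifies these assumed weights against the truth table:

def g(x):
    # threshold activation from the exercise: 1 if x >= 1, else 0
    return 1 if x >= 1 else 0

def network(i1, i2):
    # hidden layer (assumed weights; one of many possible choices)
    h1 = g(2 * i1)               # copies I1
    h2 = g(2 * i2)               # copies I2
    h3 = g(0.5 * i1 + 0.5 * i2)  # AND(I1, I2)
    # output layer
    o1 = g(h1 + h2)              # O1 = I1 OR I2
    o2 = g(h1 + h2 - 2 * h3)     # O2 = I1 XOR I2
    o3 = g(h1 - h2)              # O3 = I1 AND NOT I2
    return o1, o2, o3

# compare against the truth table from the exercise
for i1 in (0, 1):
    for i2 in (0, 1):
        print(i1, i2, network(i1, i2))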

b) Describe the properties of DFIDS and DFS

Depth-first search DFS

Frontier as stack.

Much faster than BFS if solutions are dense.

completeness: No - only with an explored set and in finite spaces.

time complexity: $O(b^m)$ - terrible results if $m \geq d$.

space complexity: $O(bm)$ - backtracking variant: store one successor node at a time.

solution optimality: No

Depth-limited search DLS

DFS with depth limit $\ell$ - returns $\text{cutoff}$ if no solution is found in the subtree/subgraph up to depth $\ell$.

completeness: No - incomplete if $\ell < d$ (the shallowest goal lies beyond the depth limit).

time complexity: $O(b^\ell)$

space complexity: $O(b\ell)$

solution optimality: No

Iterative deepening search IDS

Iteratively increasing $\ell$ in DLS to gain completeness.

completeness: Yes

time complexity: $O(b^d)$

  • in depth

    Only $\frac{b}{b-1}$ times more inefficient than BFS, therefore $O(b^d)$:

    IDS(d)=∑i=0dDFS(i)=∑i=0d(∑j=0ibj)=∑i=0dbd+1−1b−1⩽∑i=0dbd+1b−1=bb−1⋅∑i=0dbi=bb−1⋅bd+1−1b−1=bb−1⋅(bb−1⋅bd)=O(bd)IDS(d) = \sum_{i=0}^{d} DFS(i)=\sum_{i=0}^{d} \big( \sum_{j=0}^{i} b^j \big) = \\\sum_{i=0}^{d} \frac{b^{d+1}-1}{b-1} \leqslant \sum_{i=0}^{d} \frac{b^{d+1}}{b-1} = \frac{b}{b-1} \cdot \sum_{i=0}^{d} b^i = \frac{b}{b-1} \cdot \frac{b^{d+1}-1}{b-1} = \\ \frac{b}{b-1} \cdot (\frac{b}{b-1} \cdot b^d) = O(b^d)IDS(d)=i=0∑d​DFS(i)=i=0∑d​(j=0∑i​bj)=i=0∑d​b−1bd+1−1​⩽i=0∑d​b−1bd+1​=b−1b​⋅i=0∑d​bi=b−1b​⋅b−1bd+1−1​=b−1b​⋅(b−1b​⋅bd)=O(bd)

space complexity: $O(bd)$

  • in depth

    The total number of nodes generated in the worst case is:

    $$(d)b + (d-1)b^{2} + \ldots + (1)b^{d} = O(b^{d})$$

    Most of the nodes are in the bottom level, so it does not matter much that the upper levels are generated multiple times. In IDS, the nodes on the bottom level (depth $d$) are generated once, those on the next-to-bottom level are generated twice, and so on, up to the children of the root, which are generated $d$ times.

    Therefore the number of generated nodes (i.e., the time complexity) is asymptotically the same as for BFS, while the memory needed at any moment remains $O(bd)$.

solution optimality: Yes, if step cost = 1 (more generally, if path cost is a nondecreasing function of depth)
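A minimal Python sketch of DLS/IDS (my own illustration; it assumes a successors(state) function and a goal_test, and omits explored sets):

def depth_limited_search(state, goal_test, successors, limit):
    # returns a goal state, None (failure) or the marker "cutoff"
    if goal_test(state):
        return state
    if limit == 0:
        return "cutoff"
    cutoff_occurred = False
    for succ in successors(state):
        result = depth_limited_search(succ, goal_test, successors, limit - 1)
        if result == "cutoff":
            cutoff_occurred = True
        elif result is not None:
            return result
    return "cutoff" if cutoff_occurred else None

def iterative_deepening_search(start, goal_test, successors, max_depth=100):
    # iteratively increase the depth limit until the search no longer cuts off
    for limit in range(max_depth + 1):
        result = depth_limited_search(start, goal_test, successors, limit)
        if result != "cutoff":
            return result  # goal state, or None if the whole space was searched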

c) Name the 4 components of an agent's task description.

PEAS: task description of agents, e.g., an autonomous taxi:

Performance measure: safety, destination, profits, legality, comfort, ...

Environment: streets/freeways, traffic, pedestrians, weather, ...

Actuators: steering, accelerator, brake, horn, speaker/display, ...

Sensors: video, accelerometers, gauges, engine sensors, keyboard, GPS, ...

d) What is deep learning, when does it work well, where are its shortcomings?

Deep Learning

Deep learning: Hierarchical structure learning - Artificial neural network with many layers.

Feature extraction and learning (un)supervised

Problem: requires lots of data

Advantages and Disadvantages of Neural Networks

Pros

  • Less need for determining relevant input factors
  • Inherent parallelism
  • Easier to develop than statistical methods
  • Capability to learn and improve
  • Quite good for complex pattern recognition tasks
  • Usable for "unstructured" and "difficult" input (e.g., images)
  • Fault tolerance

Cons

  • Choice of parameters (layers, units) requires skill
  • Requires sufficient training material
  • Resulting hypotheses cannot be understood easily
  • Knowledge implicit (subsymbolic representation)
  • Verification, behavior prediction difficult (if not impossible)

Challenges

e) Which functions can be represented with single and multi-layer perceptrons?

No hidden layers: all linearly separable functions

One hidden layer: all continuous functions, with arbitrary precision

Two hidden layers: all functions

Single-layer feed-forward neural networks (Perceptrons)

No hidden layers: Always represents a linear separator in the input space - therefore cannot represent the XOR function.

Multilayer feed-forward neural networks

With a single, sufficiently large hidden layer (2 layers in total), it is possible to represent any continuous function of the inputs with arbitrary accuracy.

  • in depth

    The output of such a network is a nested non-linear function of the inputs (see the worked expression below).

    With the sigmoid function as gg and a hidden layer, each output unit computes a soft-thresholded linear combination of several sigmoid functions.

    For example, by adding two opposite-facing soft threshold functions and thresholding the result, we can obtain a "ridge" function.

    Combining two such ridges at right angles to each other (i.e., combining the outputs from four hidden units), we obtain a "bump".

    With more hidden units, we can produce more bumps of different sizes in more places.
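As a worked illustration of this nesting (using an assumed AIMA-style toy network with inputs 1, 2, hidden units 3, 4 and a single output 5):

$$a_5 = g\big(w_{3,5}\, a_3 + w_{4,5}\, a_4\big) = g\big(w_{3,5}\, g(w_{1,3}\, a_1 + w_{2,3}\, a_2) + w_{4,5}\, g(w_{1,4}\, a_1 + w_{2,4}\, a_2)\big)$$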

With two hidden layers, even discontinuous functions can be represented.

f) Prove or refute the following

$h(n)$ is consistent for all consistent heuristics $h_1, h_2$ if: $h(n) := \max\{h_1(n), h_2(n)\}$

Consistency

For all neighbours $n'$ of $n$:

$$h(n) \leq c(n, a, n') + h(n')$$

  • This also implies admissibility and that $f$ is non-decreasing along every path:

    $$\begin{aligned}f(n') &= g(n') + h(n') \\&= g(n) + c(n, a, n') + h(n') \\&\geq g(n) + h(n) = f(n)\end{aligned}$$

    $f(n') \geq f(n)$

Proof

$h(n) = \max\{h_1(n), h_2(n)\}$

We know that $h_1$ and $h_2$ are consistent and therefore:

$$h_1(n) \leq c(n, a, n') + h_1(n')$$

$$h_2(n) \leq c(n, a, n') + h_2(n')$$

This means that:

$$\begin{aligned}h(n) &= \max\{h_1(n), h_2(n)\}\\&\leq \max\{c(n, a, n') + h_1(n'),\; c(n, a, n') + h_2(n')\}\\&\leq \underbrace{\max\{h_1(n'), h_2(n')\}}_{h(n')} + c(n, a, n')\end{aligned}$$

This proves the consistency of $h(n)$.

g) Describe the working principle of Hill Climbing. What solutions are generated? When can solutions not be found?

We have 2 local variables, current and neighbour; current is chosen randomly in the beginning (random-restart hill climbing).

while(true){
    neighbour := highest valued neighbour of current
    if(neighbour.value ≤ current.value) return current.state
    else current := neighbour
}

We repeatedly move to the neighbour with the highest value until no neighbour has a higher value - then we cannot climb any further and return the best state found.

We can get stuck in local maxima (and on ridges/plateaux); shoulders can only be escaped if sideways moves are allowed.
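The same loop as runnable Python (a sketch; the problem object with random_state(), neighbours(state) and value(state) is an assumed interface, not from the lecture):

def hill_climbing(problem):
    current = problem.random_state()   # start from a randomly chosen state
    while True:
        # highest-valued neighbour of the current state
        neighbour = max(problem.neighbours(current), key=problem.value)
        if problem.value(neighbour) <= problem.value(current):
            return current             # cannot climb any further
        current = neighbour

def random_restart_hill_climbing(problem, restarts=10):
    # rerun from several random states and keep the best local maximum
    return max((hill_climbing(problem) for _ in range(restarts)), key=problem.value)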

Part 2)

a) Three-colorability problem CSP → Constraint Graph

Variables: $\{1,2,3,4,5,6,7,8,9,10,11,12,13\}$

Domains: $D_{i}=\{\text{red, green, blue}\}, \quad 1 \leq i \leq 13$

Constraints: Adjacent states shall have different colors.

Because we only have binary constraints, we can construct the constraint graph as follows: one node per variable and one edge for each constraint between the two variables it involves.

b) Three-colorability problem

Same variables, domains, constraints as before.

Least-constraining-value heuristic

Partial assignment: 3 = green, 4 = blue — which value would 5 get?

The value blue would be the least constraining for all neighbours.

Minimum-remaining-values heuristic

Partial assignment: 1 = blue, 2 = red, 3 = green

The variable 4 has the fewest remaining values in its domain (only one).

Degree heuristic

Which variable would be assigned first?

The variable 5 has the most neighbours.

c) Explain Backtracking Search for CSPs - why isn't it efficient?

CSPs as standard search problems

If we do not consider commutativity we have $n! \cdot d^n$ leaves, otherwise only $d^n$ leaves.

The order of the assignments has no effect on the outcome - so we choose a single variable at each level of the tree.

Backtracking search algorithm

Uninformed search algorithm (not effective for large trees).

Depth-first search:

  1. chooses values for one unassigned variable at a time - does not matter which one because of commutativity.
  1. tries all values in the domain of that variable
  1. backtracks when a variable has no legal values left to assign.

Because it is inefficient on its own, we use heuristics / general-purpose methods for backtracking search.
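A minimal backtracking-search sketch for a 3-coloring CSP (the graph below is a made-up toy example, not the exam's graph; variables are picked in a fixed order, i.e., without the heuristics from part b):

# made-up toy constraint graph: variable -> set of adjacent variables
NEIGHBOURS = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
DOMAIN = ["red", "green", "blue"]

def consistent(var, value, assignment):
    # a value is legal if no already-assigned neighbour has the same color
    return all(assignment.get(n) != value for n in NEIGHBOURS[var])

def backtracking_search(assignment=None):
    if assignment is None:
        assignment = {}
    if len(assignment) == len(NEIGHBOURS):
        return assignment                                     # complete assignment found
    var = next(v for v in NEIGHBOURS if v not in assignment)  # one variable per level
    for value in DOMAIN:                                      # try all values of its domain
        if consistent(var, value, assignment):
            assignment[var] = value
            result = backtracking_search(assignment)
            if result is not None:
                return result
            del assignment[var]                               # backtrack
    return None                                               # no legal value left

print(backtracking_search())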

d) Planning: What 3 problems occur in reasoning about actions?

Reasoning about the results of actions

frame problem ... which things stay unchanged after an action?

ramification problem ... implicit effects of actions (if a person is in a car, they both move)

qualification problem ... what are the preconditions of an action? correct conceptualisation

e) Formalizing an action in STRIPS syntax

Install gets a requested program from a trusted source server and then installs it on the target server.

Predicates:

$\text{Program}(x)$: $x$ is a program

$\text{Server}(x)$: $x$ is a server

$\text{Trusted}(x)$: $x$ is trusted

$\text{Requested}(x)$: $x$ is requested

$\text{Available}(x,y)$: $x$ is only available on $y$

$$\begin{aligned}\operatorname{Action}(&\text{Install}(p, ss, ts),\\&\text{PRECOND: } \text{Program}(p) \wedge \text{Server}(ss) \wedge \text{Server}(ts) \wedge \text{Trusted}(ss) \wedge \text{Requested}(p) \wedge \text{Available}(p,ss),\\&\text{EFFECT: } \neg\text{Requested}(p) \wedge \text{Available}(p,ts)\,)\end{aligned}$$
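As an illustration of the STRIPS semantics (positive effects form the add list, negated effects the delete list), a small Python sketch for an assumed ground instance $\text{Install}(p_1, s_1, s_2)$; the set-of-atoms state encoding is my own:

# state = set of ground atoms, here for the ground action Install(p1, s1, s2)
state = {"Program(p1)", "Server(s1)", "Server(s2)",
         "Trusted(s1)", "Requested(p1)", "Available(p1,s1)"}

precond = {"Program(p1)", "Server(s1)", "Server(s2)",
           "Trusted(s1)", "Requested(p1)", "Available(p1,s1)"}
add_list = {"Available(p1,s2)"}    # positive literals of EFFECT
delete_list = {"Requested(p1)"}    # negated literals of EFFECT

def apply_action(state, precond, add_list, delete_list):
    # STRIPS semantics: applicable iff all preconditions hold in the state;
    # the successor state drops the delete list and adds the add list
    if not precond <= state:
        raise ValueError("action not applicable in this state")
    return (state - delete_list) | add_list

print(apply_action(state, precond, add_list, delete_list))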

f) Describe the node types in decision networks

= influence diagrams.

General framework for rational decisions: return the action with the highest expected utility.

Decision networks are an extension of Bayesian networks.

Represents:

  1. current state
  1. possible actions
  1. resulting state from actions
  1. utility of each state

Node types:

Chance nodes (ovals): random variables, uncertainty - Bayesian network

Decision nodes (rectangles): the decision maker has a choice of actions

Utility nodes (diamonds): utility function
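A tiny sketch of how such a network is evaluated (pick the action with the highest expected utility); the probabilities and utilities below are made-up placeholders:

# made-up placeholder numbers: P(outcome | action) and U(outcome)
P = {
    "action_a": {"good": 0.9, "bad": 0.1},
    "action_b": {"good": 0.5, "bad": 0.5},
}
U = {"good": 10, "bad": -20}

def expected_utility(action):
    # EU(a) = sum over outcomes s of P(s | a) * U(s)
    return sum(p * U[s] for s, p in P[action].items())

# the decision node is resolved by maximising expected utility
best_action = max(P, key=expected_utility)
print(best_action, expected_utility(best_action))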