Exam on 2012-10-02

1. Formalize a STRIPS action

$\text{Action}(\text{Drive}(car, from, to),$

$\text{PRECOND: } \text{At}(car, from) \wedge \text{Car}(car) \wedge \text{Location}(from) \wedge \text{Location}(to)$

$\text{EFFECT: } \text{At}(car, to) \wedge \neg\text{At}(car, from)~)$
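
As a rough illustration of how such a schema is used, here is a minimal STRIPS-style encoding in Python (the representation and all class/function names are my own, not part of the question): a state is a set of ground atoms, and applying an action removes its delete list and adds its add list.

```python
from dataclasses import dataclass

# Hypothetical, minimal STRIPS-style representation: a state is a set of
# ground atoms (strings); an action has preconditions, an add list and a delete list.
@dataclass
class Action:
    name: str
    precond: frozenset
    add: frozenset
    delete: frozenset

def applicable(state, action):
    return action.precond <= state          # all preconditions hold in the state

def apply(state, action):
    return (state - action.delete) | action.add

# Ground instance of Drive(car, from, to) for car=C1, from=A, to=B
drive = Action(
    name="Drive(C1, A, B)",
    precond=frozenset({"At(C1,A)", "Car(C1)", "Location(A)", "Location(B)"}),
    add=frozenset({"At(C1,B)"}),
    delete=frozenset({"At(C1,A)"}),
)

state = frozenset({"At(C1,A)", "Car(C1)", "Location(A)", "Location(B)"})
if applicable(state, drive):
    print(apply(state, drive))   # At(C1,B) replaces At(C1,A)
```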

2. What is a consistent plan? What is a solution?

Solution

A solution to a given planning problem is a plan:

Any action sequence that, when executed in the initial state, results in a state that satisfies the goal.

In partial-order planning, every action sequence that respects the partial order (every linearization of the plan) is a solution.

(In the POP algorithm the goal is therefore defined simply as having no open preconditions.)

Consistent actions

Chosen actions should not undo the progress made.

Consistent plans (in partial order planning)

Must be free of contradictions / cycles in the ordering-constraint set, e.g. $(A \prec B) \wedge (B \prec A)$.

Must be free of conflicts in the causal-link set: for a causal link $A \stackrel{p}{\longrightarrow} B$, the precondition $p$ must remain true from $A$ until $B$.
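
To make the "no cycles" condition concrete, the ordering constraints can be viewed as a directed graph and checked for cycles; a small sketch (names and graph format are my own):

```python
def has_cycle(constraints):
    """constraints: iterable of (earlier, later) pairs, i.e. earlier ≺ later."""
    graph = {}
    for a, b in constraints:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {node: WHITE for node in graph}

    def dfs(node):
        colour[node] = GREY
        for nxt in graph[node]:
            if colour[nxt] == GREY:          # back edge -> cycle in the ordering
                return True
            if colour[nxt] == WHITE and dfs(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and dfs(n) for n in graph)

print(has_cycle([("A", "B"), ("B", "A")]))   # True  -> inconsistent plan
print(has_cycle([("A", "B"), ("B", "C")]))   # False -> consistent ordering
```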

3. Describe the two possible ways to search in the state space search

  1. Totally ordered plan search

    Search in strictly linear order.

    • progression planning: initial state → goal
    • regression planning: initial state ← goal (searching backwards from the goal)

      (regression is possible because of the declarative representation of PDDL and STRIPS)

  2. Partial-order planning

    The order of the actions is not fully specified.

4. What are the differences between ADL and STRIPS?

STRIPS

  • only positive literals in states
  • closed-world-assumption: unmentioned literals are false
  • Effect $P \wedge \neg Q$ means add $P$ and delete $Q$
  • only ground atoms in goals
  • Effects are conjunctions
  • No support for equality or types

ADL

  • positive and negative literals in states
  • open-world-assumption: unmentioned literals are unknown
  • Effect $P \wedge \neg Q$ means add $P$ and $\neg Q$, delete $Q$ and $\neg P$
  • goals may contain quantified variables ($\exists$)
  • goals allow conjunction and disjunction
  • conditional effects are allowed: $P : E$ means $E$ is an effect only if $P$ is satisfied
  • equality and types built in

5. Risk aversion

Given:

  • lottery $L$
  • the expected monetary value $\text{EMV}(L)$
  • utility function $U(L)$

Which would the risk-averse agent prefer: $U(L)$ or $U(S_{\text{EMV}(L)})$?

Answer:

As we gain more money, its utility does not rise proportionally.

The utility of getting your first million dollars is very high, but the utility of an additional million is smaller.

For the risk averse agent:

$U(L) < U(S_{\text{EMV}(L)})$

The utility of being faced with that lottery is less than the utility of being handed the expected monetary value of the lottery with absolute certainty.
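
A tiny worked example (the lottery and the concave utility $U(x) = \sqrt{x}$ are my own choices, just to illustrate the inequality):

```python
import math

# Hypothetical concave utility function of a risk-averse agent
def U(x):
    return math.sqrt(x)

lottery = [(0.5, 0), (0.5, 2_000_000)]       # (probability, prize)

emv = sum(p * prize for p, prize in lottery)             # 1,000,000
eu_lottery = sum(p * U(prize) for p, prize in lottery)   # ~707.1
u_emv = U(emv)                                           # 1000.0

print(emv, eu_lottery, u_emv)
print(eu_lottery < u_emv)   # True: U(L) < U(S_EMV(L)) -> the agent is risk averse
```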

6. The axioms of utility theory

  1. Orderability

    The agent must have a preference.

    $(A \succ B) \ \textsf{xor} \ (B \succ A) \ \textsf{xor} \ (A \sim B)$

  2. Transitivity

    $(A \succ B) \wedge (B \succ C) \Rightarrow (A \succ C)$

  3. Continuity

    If $A \succ B \succ C$, there is a probability $p$ for which the agent is indifferent between

    getting $B$ with absolute certainty

    or $A$ with probability $p$ and $C$ with probability $1-p$:

    $A \succ B \succ C \Rightarrow \exists p\ [p, A;\ 1-p, C] \sim B$

  4. Substitutability

    If the agent is indifferent between $A$ and $B$, then it is indifferent between complex lotteries that are identical except that one substitutes $B$ for $A$ (same probabilities).

    $A \sim B \Rightarrow [p, A;\ 1-p, C] \sim [p, B;\ 1-p, C]$

  5. Monotonicity

    The agent prefers a higher probability of the outcome it prefers.

    $A \succ B \Rightarrow (p > q \Leftrightarrow [p, A;\ 1-p, B] \succ [q, A;\ 1-q, B])$

  6. Decomposability

    Compound lotteries can be reduced to simpler ones.

    $[p, A;\ 1-p, [q, B;\ 1-q, C]] \sim [p, A;\ (1-p)q, B;\ (1-p)(1-q), C]$

If an agent violates these axioms it will exhibit irrational behaviour.
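
The decomposability axiom, for instance, can be verified numerically by expanding both lotteries (the probabilities and utility values below are arbitrary illustration values):

```python
# Arbitrary example values
p, q = 0.3, 0.6
U = {"A": 10.0, "B": 4.0, "C": 1.0}

# Compound lottery [p, A; 1-p, [q, B; 1-q, C]]
inner = q * U["B"] + (1 - q) * U["C"]
compound = p * U["A"] + (1 - p) * inner

# Flattened lottery [p, A; (1-p)q, B; (1-p)(1-q), C]
flat = p * U["A"] + (1 - p) * q * U["B"] + (1 - p) * (1 - q) * U["C"]

print(compound, flat)            # identical expected utilities
assert abs(compound - flat) < 1e-12
```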

7. Components of a decision network

Chance nodes (ovals): random variables

Decision nodes (rectangles): points where the decision maker has a choice of actions

Utility nodes (diamonds): the agent's utility function
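
Evaluating such a network means fixing the decision node to each possible action, computing the resulting distribution over the chance nodes, and choosing the action with maximum expected utility. A toy sketch (the umbrella scenario, numbers and names are my own):

```python
# Chance node Weather, decision node take_umbrella, utility node U(Weather, take_umbrella)
P_weather = {"rain": 0.3, "sun": 0.7}

def utility(weather, take_umbrella):
    table = {
        ("rain", True): 20, ("rain", False): -50,
        ("sun",  True): 60, ("sun",  False): 100,
    }
    return table[(weather, take_umbrella)]

def expected_utility(take_umbrella):
    return sum(p * utility(w, take_umbrella) for w, p in P_weather.items())

best = max([True, False], key=expected_utility)
print({a: expected_utility(a) for a in (True, False)}, "->", best)
# EU(True) = 0.3*20 + 0.7*60 = 48, EU(False) = 0.3*(-50) + 0.7*100 = 55 -> best is False
```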

8. How do the nodes get expanded in uninformed search algorithms?

Node expansion is determined by when the goal test is applied (at expansion or at generation) and by the data structure used for the frontier; a rough common skeleton is sketched after the list below.

  1. Breadth-first search (BFS): frontier is a FIFO queue.
  2. Uniform-cost search (UCS): frontier is a priority queue ordered by path cost.
  3. Depth-first search (DFS): frontier is a LIFO stack.
  4. Depth-limited search (DLS): DFS with a depth limit $\ell$; returns $\text{cutoff}$ when the limit truncates the subtree/subgraph.
  5. Iterative deepening search (IDS): iteratively increases $\ell$ in DLS to gain completeness.
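
A simplified common skeleton for these strategies, differing only in the frontier data structure (goal test at expansion for brevity; graph format and names are my own):

```python
import heapq

def search(graph, start, goal, strategy):
    """graph: dict node -> list of (neighbour, step_cost); strategy: 'bfs', 'dfs' or 'ucs'."""
    frontier = [(0, start, [start])]      # entries are (path_cost, node, path)
    explored = set()
    while frontier:
        if strategy == "bfs":             # FIFO queue
            cost, node, path = frontier.pop(0)
        elif strategy == "dfs":           # LIFO stack
            cost, node, path = frontier.pop()
        else:                             # 'ucs': priority queue ordered by path cost
            cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if node in explored:
            continue
        explored.add(node)
        for nbr, step in graph.get(node, []):
            child = (cost + step, nbr, path + [nbr])
            if strategy == "ucs":
                heapq.heappush(frontier, child)
            else:
                frontier.append(child)
    return None

graph = {"A": [("B", 1), ("C", 5)], "B": [("C", 1)], "C": []}
print(search(graph, "A", "C", "ucs"))   # (2, ['A', 'B', 'C']) - cheapest path
print(search(graph, "A", "C", "bfs"))   # (5, ['A', 'C'])      - fewest edges, not cheapest
```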

9. Given admissible heuristics $(h_1, h_2, \dots, h_n)$, which one is the best?

Admissibility

Optimistic, never overestimating, never negative.

$h(n) \leq h^*(n)$, where $h^*(n)$ is the true cost of reaching the goal from $n$

$h(n) \geq 0$

$\forall g \in \text{goals}: h(g) = 0$ (follows from the above)

Dominance

For two admissible heuristics, we say $h_2$ dominates $h_1$ if:

$\forall n: h_2(n) \geq h_1(n)$

Then $h_2$ is better for search.

Therefore we must choose the admissible heuristic that dominates all the others.
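
A standard illustration, assuming the familiar 8-puzzle heuristics: both misplaced tiles and Manhattan distance are admissible, and the Manhattan-distance heuristic dominates, so it is the one to choose.

```python
# 8-puzzle states as tuples of length 9; 0 is the blank. Goal here: tiles 1..8, blank last.
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)

def h1(state):
    """Number of misplaced tiles (admissible)."""
    return sum(1 for i, tile in enumerate(state) if tile != 0 and tile != GOAL[i])

def h2(state):
    """Sum of Manhattan distances of the tiles (admissible, dominates h1)."""
    total = 0
    for i, tile in enumerate(state):
        if tile == 0:
            continue
        goal_i = GOAL.index(tile)
        total += abs(i // 3 - goal_i // 3) + abs(i % 3 - goal_i % 3)
    return total

state = (7, 2, 4, 5, 0, 6, 8, 3, 1)   # an example start state
print(h1(state), h2(state))            # h2(n) >= h1(n) here (and for every n)
```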

10. Admissible heuristics have a problem with graph search. Which one is it? How can this problem be solved?

For the optimality of the A* solution in graph search, the heuristic $h(n)$ must be consistent.

With a heuristic that is only admissible, a node on the optimal path can be reached first via a more expensive path; because graph search discards repeated states, the optimal path to the goal is then thrown away.

(In tree search this is not a problem, since there is only a single path to each node.)

Solutions to this problem (a small consistency-check sketch follows the list):

  • require consistency of $h$ (non-decreasing $f$ on every path)
  • additional book-keeping (e.g., re-open a node when a cheaper path to it is found)
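
For reference, consistency means $h(n) \leq c(n, n') + h(n')$ for every successor $n'$; a small checker sketch (graph format and the example values are my own):

```python
def is_consistent(graph, h):
    """graph: dict node -> list of (successor, step_cost); h: dict node -> heuristic value.
    Returns True iff h(n) <= c(n, n') + h(n') for every edge, i.e. h is consistent."""
    return all(h[n] <= cost + h[succ]
               for n, edges in graph.items()
               for succ, cost in edges)

graph = {"S": [("A", 1), ("B", 4)], "A": [("G", 3)], "B": [("G", 1)], "G": []}
h_consistent   = {"S": 3, "A": 3, "B": 1, "G": 0}
h_inconsistent = {"S": 3, "A": 0, "B": 1, "G": 0}   # admissible on this graph, but not consistent

print(is_consistent(graph, h_consistent))    # True
print(is_consistent(graph, h_inconsistent))  # False: h(S)=3 > c(S,A)+h(A)=1
```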

11. What is inductive learning?

Types of learning

analytical / deductive learning : going from general rule to more specific and efficient rule

Inductive Learning : discovering the general rule from examples

  • Ignores prior knowledge
  • Assumes a deterministic, observable "environment"
  • Assumes examples are given (selection of examples for training set is challenging)
  • In its simplest form: learning a function from examples
  • Assumes agent wants to learn

Learning a function from examples (= supervised learning)

The aim is to find a simple hypothesis that is approximately consistent with the training examples.

Training set (of data points)

Example input-output pairs: $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$

where each $y_i$ was generated by an unknown function $f$, i.e. $y_i = f(x_i)$

Task of agent

Discover a function $h$ (the hypothesis) that approximates the true function $f$.

Learning: search through the hypothesis space $\mathcal{H}$ for a hypothesis that performs well even on examples beyond the training set.

We say a problem is realizable if $f \in \mathcal{H}$. (We don't always know this, because the true function is unknown.)

Variations

  • If the function is stochastic, we have to learn a conditional probability distribution $P(Y \mid x)$.
  • If the output is one of a finite / discrete set of values, the problem is called classification (Boolean / binary classification if there are only two values).
  • If the output is a number, the problem is called regression.

Test set

Measures the accuracy of the hypothesis.

If the hypothesis performs well on it, one says the hypothesis generalizes well.

If the hypothesis agrees with all the data, it is called a consistent hypothesis.
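
A minimal sketch of this search through a hypothesis space (the linear hypothesis family, the data and all names are my own illustration): fit $h(x) = wx + b$ to the training examples and measure how well it does on the test set.

```python
# Training examples (x_i, y_i) generated by an unknown f; here roughly f(x) = 2x + 1 with noise
train = [(0, 1.1), (1, 2.9), (2, 5.2), (3, 6.8)]
test  = [(4, 9.1), (5, 10.9)]

def fit_linear(examples):
    """Least-squares fit of the hypothesis h(x) = w*x + b."""
    n = len(examples)
    mx = sum(x for x, _ in examples) / n
    my = sum(y for _, y in examples) / n
    w = sum((x - mx) * (y - my) for x, y in examples) / sum((x - mx) ** 2 for x, _ in examples)
    b = my - w * mx
    return lambda x: w * x + b

h = fit_linear(train)                      # search H = {linear functions} for a good hypothesis
test_error = sum((h(x) - y) ** 2 for x, y in test) / len(test)
print(h(6), test_error)                    # small test error -> h generalizes well
```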

12. What is a learning curve?

How do we know that $h \approx f$? (Hume's Problem of Induction)

Learning curve = % correct on the test set when using $h$

Learning curve depends on realizability of the target function.

Non-realizability can be due to

  • missing attributes
  • too restrictive hypothesis space (e.g., linear functions)
  • redundant expressiveness (e.g., loads of irrelevant attributes)

(x-axis: size of training set, y-axis: % correct on test set)
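
A rough sketch of how such a curve is produced (the hand-rolled 1-nearest-neighbour learner and the synthetic data are my own illustration): train on growing prefixes of the training set and record the accuracy on a fixed test set.

```python
import random

random.seed(0)

# Synthetic binary classification data: the unknown f labels x with 1 iff x > 0.5
def make_examples(n):
    xs = [random.random() for _ in range(n)]
    return [(x, int(x > 0.5)) for x in xs]

train, test = make_examples(100), make_examples(200)

def predict_1nn(examples, x):
    return min(examples, key=lambda ex: abs(ex[0] - x))[1]   # label of the nearest neighbour

# Learning curve: accuracy on the test set as the training set grows
for m in (1, 5, 10, 25, 50, 100):
    subset = train[:m]
    correct = sum(predict_1nn(subset, x) == y for x, y in test)
    print(m, correct / len(test))
```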

13. Learning agents and their components

Goal: making agents improve themselves by studying their own experiences

Learning Agent = performance element + learning element

Performance element

Everything between the sensors and the actuators, abstracted into one component.

Takes in percepts and chooses actions.

Critic

Feedback on performance (based on a fixed performance standard).

Learning element LE

Changes the performance element to improve it.

Changes the "knowledge components" of the agent (problem generator and performance element), independent of the agent's architecture.

Problem generator

Suggests actions that lead to new experiences to learn from, based on the learning goals.

Example: Taxi

Performance element: knowledge about driving

Critic: reactions of customers / other drivers / pedestrians / bikers / police

Learning element: creates rules, e.g. do not turn the steering wheel too quickly, brake softly, etc.

Problem generator: try the brakes on a slippery road, push back using the side (wing) mirror only

14. Construct a neural network that represents a half adder

Version 1:

Activation function:

$$g(x)=\begin{cases}1 & \text{if } x \geq 1,\\0 & \text{otherwise}\end{cases}$$

Version 2:

Activation function:

$$g(x)=\begin{cases}1 & \text{if } x \geq 0,\\0 & \text{otherwise}\end{cases}$$

This could be done more elegantly, but we simply took the previous solution and gave every node a bias of −1 (subtracting 1 from its net input), so that the previous weights still work with the new activation function.
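
Since the network diagrams are not reproduced here, the following is one possible Version-1 network written out in code (threshold-at-1 activation; the particular weights are my own choice, not necessarily the ones from the exam): carry is an AND unit, and sum is the OR unit minus the AND unit.

```python
def g(x):
    """Version 1 activation: threshold at 1."""
    return 1 if x >= 1 else 0

def half_adder(a, b):
    # Hidden layer
    or_unit  = g(1.0 * a + 1.0 * b)             # fires if at least one input is 1
    and_unit = g(0.5 * a + 0.5 * b)             # fires only if both inputs are 1
    # Output layer
    carry = and_unit                            # carry = a AND b
    s     = g(1.0 * or_unit - 1.0 * and_unit)   # sum = a XOR b = OR but not AND
    return s, carry

for a in (0, 1):
    for b in (0, 1):
        print(a, b, half_adder(a, b))
# (0,0)->(0,0)  (0,1)->(1,0)  (1,0)->(1,0)  (1,1)->(0,1)
```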

15. Advantages and disadvantages of neural networks / learning by observation

Advantages and Disadvantages of Neural Networks

Pros

  • Less need for determining relevant input factors
  • Inherent parallelism
  • Easier to develop than statistical methods
  • Capability to learn and improve
  • Useful for complex pattern recognition tasks, "unstructured" and "difficult" input (e.g., images)
  • Fault tolerance

Cons

Challenges