Utility

Utility Theory

Utility theory and probability theory (for decision-theoretic agents)

deal with uncertainty, conflicting goals, and maximizing utility.

Decision theory

deals with the desirability of immediate outcomes in an episodic environment.

Non-deterministic, in partially observable environments

Outcome defined by the random variable $\text{RESULT}(a)$.

Probability of outcome $s'$ given observations $\mathbf{e}$:

$$P(\text{RESULT}(a) = s' \mid a, \mathbf{e})$$

Utility function

$U(s)$: desirability of state $s$

Expected Utility (average utility value)

It is implicit that $s'$ can follow from the current state $s$.

$$EU(a \mid \mathbf{e}) = \sum_{s'} P(\text{RESULT}(a) = s' \mid a, \mathbf{e}) \cdot U(s')$$

Sum over all outcome states: probability of the state occurring after the action, times its utility.

Principle of maximum expected utility MEU

A rational agent should choose the action that maximizes the agent's expected utility.

$$\text{action} = \underset{a}{\operatorname{argmax}}\; EU(a \mid \mathbf{e})$$
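The two formulas above can be sketched in a few lines of Python. This is only an illustration: the umbrella scenario, the state names, and all probabilities and utilities are made-up assumptions, and `expected_utility` and `meu_action` are hypothetical helper names.

```python
from typing import Dict

# Hypothetical toy model: for each action a, the distribution
# P(RESULT(a) = s' | a, e) over outcome states s'.
model: Dict[str, Dict[str, float]] = {
    "take_umbrella": {"dry": 1.0},
    "leave_umbrella": {"dry_unburdened": 0.7, "wet": 0.3},
}

# Hypothetical utility function U(s') over outcome states.
utility: Dict[str, float] = {"dry": 0.8, "dry_unburdened": 1.0, "wet": 0.0}


def expected_utility(outcome_probs: Dict[str, float],
                     utility: Dict[str, float]) -> float:
    """EU(a | e): sum over s' of P(RESULT(a) = s' | a, e) * U(s')."""
    return sum(p * utility[s] for s, p in outcome_probs.items())


def meu_action(model: Dict[str, Dict[str, float]],
               utility: Dict[str, float]) -> str:
    """Principle of maximum expected utility: argmax over actions of EU."""
    return max(model, key=lambda a: expected_utility(model[a], utility))


best = meu_action(model, utility)
```

With these numbers, taking the umbrella has expected utility 0.8 and leaving it has 0.7, so `best` is `"take_umbrella"`.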

Axioms of Utility theory

Constraints on rational preferences of an agent - MEU can be derived from these constraints.

Notation:

$A \succ B$: agent prefers state $A$ over state $B$

$A \sim B$: agent is indifferent between state $A$ and state $B$

$A \succsim B$: agent prefers $A$ over $B$ or is indifferent between them

Lottery

Set of possible outcomes of an action, each with its probability.

$$L = [p_1, S_1;\; p_2, S_2;\; \ldots;\; p_n, S_n]$$

$L$: lottery

$S_i$: outcome (an atomic state or another lottery; in the latter case $L$ is a complex lottery)

$p_i$: probability of outcome $S_i$

Constraints

  1. Orderability

    The agent must have a preference: exactly one of the following holds.

    $$(A \succ B) \;\textsf{xor}\; (B \succ A) \;\textsf{xor}\; (A \sim B)$$

  2. Transitivity

    $$(A \succ B) \wedge (B \succ C) \Rightarrow (A \succ C)$$

  3. Continuity

    If $A \succ B \succ C$, there is a probability $p$ for which the agent is indifferent between

    getting $B$ with absolute certainty

    and a lottery that yields $A$ with probability $p$ and $C$ with probability $1-p$.

    $$A \succ B \succ C \Rightarrow \exists p\; [p, A;\; 1-p, C] \sim B$$

  4. Substitutability

    If the agent is indifferent between $A$ and $B$, then it is indifferent between two complex lotteries that are identical except that $B$ is substituted for $A$.

    $$A \sim B \Rightarrow [p, A;\; 1-p, C] \sim [p, B;\; 1-p, C]$$

  5. Monotonicity

    Agent prefers a higher probability of the state that it prefers.

    $$A \succ B \Rightarrow (p > q \Leftrightarrow [p, A;\; 1-p, B] \succ [q, A;\; 1-q, B])$$

  6. Decomposability

    Compound lotteries can be reduced to simpler ones.

    $$[p, A;\; 1-p, [q, B;\; 1-q, C]] \sim [p, A;\; (1-p)q, B;\; (1-p)(1-q), C]$$

If an agent violates these axioms it will exhibit irrational behaviour.
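Decomposability is the most mechanical of the axioms, so it lends itself to a small sketch. The representation below (a lottery as a list of `(probability, outcome)` pairs, where an outcome is either a string or a nested list) and the function name `flatten` are assumptions made for this illustration.

```python
def flatten(lottery):
    """Reduce a compound lottery to a simple one (decomposability).

    A lottery is a list of (probability, outcome) pairs. An outcome is
    either an atomic state (a string) or another lottery (a list).
    """
    flat = {}
    for p, outcome in lottery:
        if isinstance(outcome, list):
            # Nested lottery: multiply probabilities along the path.
            for q, inner in flatten(outcome):
                flat[inner] = flat.get(inner, 0.0) + p * q
        else:
            flat[outcome] = flat.get(outcome, 0.0) + p
    return [(prob, state) for state, prob in flat.items()]


# [p, A; 1-p, [q, B; 1-q, C]] with p = 0.5 and q = 0.4
compound = [(0.5, "A"), (0.5, [(0.4, "B"), (0.6, "C")])]
simple = flatten(compound)
```

Here `simple` assigns probability 0.5 to `A`, (1 - 0.5) · 0.4 = 0.2 to `B`, and (1 - 0.5) · 0.6 = 0.3 to `C`, matching the right-hand side of the decomposability formula.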

Preference constraints → Utility Function

Existence of utility function

If the agent is rational (its preferences obey the axioms above), there exists a real-valued function $U$ such that

$$U(A) > U(B) \Leftrightarrow A \succ B$$

$$U(A) = U(B) \Leftrightarrow A \sim B$$

The agent's behavior would not change if $U$ were replaced by

$U'(S) = a\,U(S) + b$ (affine transformation) with constants $a > 0$ and arbitrary $b$.

The utility function is therefore not unique.

In a deterministic environment only the preference ranking matters, not the numbers - such a function is called a value function or ordinal utility function.
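A small experiment can make the difference between an affine and a merely order-preserving transformation concrete. Everything here is an assumption chosen for illustration: the three states, their utilities, the constants a = 3 and b = -7, and the fourth-power transformation.

```python
BASE = {"A": 1.0, "B": 0.6, "C": 0.0}  # hypothetical utilities, A > B > C


def u(state):
    return BASE[state]


def u_affine(state):
    # a = 3 > 0 and b = -7: an affine transformation of u
    return 3.0 * BASE[state] - 7.0


def u_monotone(state):
    # Keeps the ranking A > B > C but is not affine
    return BASE[state] ** 4


def expected_utility(lottery, utility):
    return sum(p * utility(s) for p, s in lottery)


def best(lotteries, utility):
    """Name of the lottery with maximum expected utility."""
    return max(lotteries, key=lambda n: expected_utility(lotteries[n], utility))


lotteries = {
    "safe": [(1.0, "B")],
    "risky": [(0.5, "A"), (0.5, "C")],
}

choice_original = best(lotteries, u)          # "safe"  (0.6 vs 0.5)
choice_affine = best(lotteries, u_affine)     # "safe"  (-5.2 vs -5.5)
choice_monotone = best(lotteries, u_monotone) # "risky" (0.1296 vs 0.5)
```

The affine version picks the same lottery as the original. The fourth-power version still ranks the sure outcomes identically, yet flips the choice between lotteries, which is why only the ranking is meaningful when outcomes are certain.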

Expected utility of a lottery

is the sum of the probability of each outcome times its utility.

$$U([p_1, S_1;\; \ldots;\; p_n, S_n]) = \sum_i p_i \cdot U(S_i)$$
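Because an outcome $S_i$ may itself be a lottery, the formula applies recursively. A minimal sketch, assuming lotteries are lists of `(probability, outcome)` pairs and atomic outcomes are strings; the function name `lottery_utility` and the numbers are hypothetical.

```python
def lottery_utility(outcome, U):
    """U of an atomic state, or sum_i p_i * U(S_i) for a lottery."""
    if isinstance(outcome, list):
        # Outcome is a lottery: weight each branch by its probability.
        return sum(p * lottery_utility(s, U) for p, s in outcome)
    return U[outcome]


U = {"A": 1.0, "B": 0.5, "C": 0.0}  # assumed utilities of atomic states

# Complex lottery [0.5, A; 0.5, [0.4, B; 0.6, C]]
L = [(0.5, "A"), (0.5, [(0.4, "B"), (0.6, "C")])]
value = lottery_utility(L, U)
```

The inner lottery is worth 0.4 · 0.5 + 0.6 · 0 = 0.2, so `value` is 0.5 · 1 + 0.5 · 0.2 = 0.6.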

Utility assessment and Utility scales

We want to build a decision-theoretic system that helps the agent make decisions.

Preference elicitation

Testing or observing the agent to find out its underlying utility function.

There is no absolute scale for utilities, so we fix one by assigning values to two reference outcomes:

$U(S_i) = \mu_{\top}$ for the best possible prize

$U(S_i) = \mu_{\bot}$ for the worst possible catastrophe

Normalized utilities

$\mu_{\top} = 1$: best possible prize

$\mu_{\bot} = 0$: worst possible catastrophe
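Normalizing is an affine rescaling, so it leaves the agent's behavior unchanged. The sketch below assumes some raw utilities are already known; the state names, the raw values, and the helper name `normalize` are all made up for illustration.

```python
def normalize(raw):
    """Rescale utilities so the best outcome maps to 1 and the worst to 0."""
    worst, best = min(raw.values()), max(raw.values())
    if best == worst:
        raise ValueError("need at least two distinct utility values")
    return {s: (u - worst) / (best - worst) for s, u in raw.items()}


# Hypothetical raw utilities on an arbitrary scale
raw = {"jackpot": 250.0, "status_quo": 100.0, "catastrophe": -50.0}
normalized = normalize(raw)
```

Here `status_quo` lands at (100 + 50) / 300 = 0.5. On this scale, a utility of 0.5 means the agent would be indifferent between `status_quo` for certain and a lottery giving the best prize with probability 0.5 and the worst catastrophe otherwise; reading utilities off such a reference lottery is a standard elicitation procedure, though it is not spelled out in these notes.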

Utility of Money

Candidate utility measure: the agent's total net assets.

Agents have monotonic preference for more money - they prefer having more.

That says nothing about preferences between lotteries involving money.

Expected monetary value EMV

The EMV of a lottery (money made on average) is generally not equal to its utility. This shows in the:

Certainty equivalent

The amount an agent will accept instead of a lottery. Most people will accept about $400 instead of playing a gamble that gives $1000 half the time and $0 the other half.

In this case:

  • certainty equivalent of the lottery: $400
  • expected monetary value (EMV): $500

Insurance premium

= EMV of a lottery minus its certainty equivalent.

It is positive for risk-averse agents, and risk aversion is what the insurance business is based on.
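The numbers from the gamble above can be checked directly. The certainty equivalent of $400 is taken from the text as given; the helper name `expected_monetary_value` is hypothetical.

```python
def expected_monetary_value(lottery):
    """EMV: sum of probability times monetary payoff."""
    return sum(p * amount for p, amount in lottery)


# $1000 half the time, $0 the other half
gamble = [(0.5, 1000.0), (0.5, 0.0)]

certainty_equivalent = 400.0  # stated in the text, not computed
emv = expected_monetary_value(gamble)
insurance_premium = emv - certainty_equivalent
```

This gives an EMV of 500 and an insurance premium of 100: the agent gives up $100 of expected value to avoid the risk.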

Risk neutral

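For small changes in wealth relative to the current wealth, almost any utility curve will be approximately linear.

An agent that has a linear curve is said to be risk-neutral. For gambles with small sums we therefore expect risk neutrality, which in a sense justifies using small gambles to assess probabilities and to motivate the axioms of probability.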
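The local-linearity claim can be checked numerically. The sketch assumes a logarithmic utility of total wealth, which is one common risk-averse choice and not something these notes prescribe; the wealth level and both gambles are made-up values.

```python
import math


def certainty_equivalent(wealth, gamble):
    """Sure change in wealth with the same utility as the gamble.

    Assumes U(w) = log(w), so the inverse utility is exp.
    """
    eu = sum(p * math.log(wealth + x) for p, x in gamble)
    return math.exp(eu) - wealth


def premium(wealth, gamble):
    """Insurance premium: EMV minus certainty equivalent."""
    emv = sum(p * x for p, x in gamble)
    return emv - certainty_equivalent(wealth, gamble)


wealth = 100_000.0
small = [(0.5, 10.0), (0.5, -10.0)]          # stakes tiny relative to wealth
large = [(0.5, 50_000.0), (0.5, -50_000.0)]  # stakes half of wealth

small_premium = premium(wealth, small)
large_premium = premium(wealth, large)
```

Both gambles have an EMV of zero. For the small one the premium is a fraction of a cent, so the agent behaves almost exactly as a risk-neutral one would; for the large one the premium exceeds $13,000.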