Informed Search
Algorithms
Heuristic / Best First Search (BFS) algorithms use problem-/domain-specific knowledge during search.
$h^*(n)$ (unknown) true total cost from $n$ to the goal
Evaluation function $f(n)$ - estimate of this cost, used for node expansion
$g(n)$ - path cost from start to $n$.
$h(n)$ - heuristic function: estimated cost of the cheapest path from $n$ to a goal.
Requirements on $h$:
- It cannot be computed from the problem statement itself; it encodes extra domain knowledge.
- Computing it must have a low cost.
- It must be $0$ at goal nodes.
Greedy BFS
Just like UCS, but uses $f(n) = h(n)$ instead of $g(n)$ for the priority-queue ordering - it does not take already-spent costs into account.
completeness No - only complete in finite state spaces with an explored set
time complexity $O(b^m)$ ($m$ = maximum depth)
space complexity $O(b^m)$
optimality of solution No
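A minimal sketch of greedy best-first search (the graph and the heuristic values below are a made-up toy example, with $h$ playing the role of a straight-line-distance estimate):

```python
import heapq

def greedy_best_first(start, goal, neighbors, h):
    """Greedy best-first search: order the frontier by h(n) alone.

    neighbors(state) yields (successor, step_cost) pairs; the step costs
    are ignored here -- greedy search only looks at the heuristic."""
    frontier = [(h(start), start, [start])]
    explored = set()  # the explored set makes it complete on finite graphs
    while frontier:
        _, state, path = heapq.heappop(frontier)
        if state == goal:
            return path
        if state in explored:
            continue
        explored.add(state)
        for succ, _cost in neighbors(state):
            if succ not in explored:
                heapq.heappush(frontier, (h(succ), succ, path + [succ]))
    return None

# Hypothetical graph and heuristic values.
graph = {'S': [('A', 1), ('B', 2)], 'A': [('G', 9)], 'B': [('G', 1)], 'G': []}
h = {'S': 3, 'A': 4, 'B': 1, 'G': 0}
print(greedy_best_first('S', 'G', graph.__getitem__, h.__getitem__))  # ['S', 'B', 'G']
```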
A* Search
Solves the non-optimality and possible non-termination of greedy search.
Just like UCS, but uses $f(n) = g(n) + h(n)$ for the priority queue,
where the heuristic must be admissible: $h(n) \le h^*(n)$.
completeness Yes - with finitely many nodes of cost $\le C^*$ and step-costs $\ge \varepsilon > 0$
time complexity $O((b^*)^d)$, where $b^*$ is the effective branching factor, which grows with the relative error of $h$.
space complexity Exponential, keeps all nodes in memory
optimality of solution Yes
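A minimal A* sketch on the same kind of toy graph (graph and heuristic values are my own example; the heuristic is admissible here):

```python
import heapq

def astar(start, goal, neighbors, h):
    """A* search: order the frontier by f(n) = g(n) + h(n)."""
    frontier = [(h(start), 0, start, [start])]  # (f, g, state, path)
    best_g = {}                                 # cheapest g found per state
    while frontier:
        f, g, state, path = heapq.heappop(frontier)
        if state == goal:
            return g, path
        if state in best_g and best_g[state] <= g:
            continue  # a cheaper path to this state was already expanded
        best_g[state] = g
        for succ, cost in neighbors(state):
            g2 = g + cost
            heapq.heappush(frontier, (g2 + h(succ), g2, succ, path + [succ]))
    return None

graph = {'S': [('A', 1), ('B', 2)], 'A': [('G', 9)], 'B': [('G', 1)], 'G': []}
h = {'S': 3, 'A': 4, 'B': 1, 'G': 0}
print(astar('S', 'G', graph.__getitem__, h.__getitem__))  # (3, ['S', 'B', 'G'])
```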
non-admissible $h$ for tree search is not optimal (proof sketch)
$S$ start node
$G$ (optimal), $G_2$ (non-optimal) goal nodes
We immediately expand the non-optimal goal $G_2$ because the overestimating $h$ inflates the $f$-values of the nodes on the optimal path above $f(G_2)$, although $g(G_2) > g(G)$.
Then we check $G_2$, find it is a goal, and terminate the search.
admissible $h$ for tree search is optimal (proof)
$S$ start node
$G$ optimal goal node
$G_2$ non-optimal goal node
$n$ unexpanded node on the optimal path to $G$
Suppose $G_2$ is on the frontier. Since $G_2$ is a goal, $h(G_2) = 0$, so $f(G_2) = g(G_2)$.
$g(G_2) > g(G)$ because $G_2$ is suboptimal.
$f(n) = g(n) + h(n)$ (cost to $n$, plus estimated cost from $n$ to the goal)
$f(n) \le g(G)$ because the heuristic is admissible.
Therefore $f(G_2) > f(n)$, and A* will never select $G_2$ for expansion before $n$.
consistent $h$ for graph search is optimal
If $f$ decreases along the optimal path to the goal, a node on that path can be discarded by graph search, because a worse path to it was expanded first.
(In tree search this is not a problem, since each node is reached by a single path.)
Solutions to the problem:
- consistency of $h$ ($f$ non-decreasing on every path)
- additional book-keeping (re-open a node when a cheaper path to it is found)
optimal efficiency Yes
A* expands the fewest nodes possible - no other optimal algorithm can expand fewer nodes than A*, because not expanding nodes with $f(n) < C^*$ runs the risk of missing the optimal solution.
Relationship to Dijkstra's Algorithm
Shortest path from one node to all nodes in a graph. (There are 2 variants: shortest path from one node to all nodes - or - all nodes to all nodes)
Uses no heuristics, made for a finite and explicit graph.
Uses $f(n) = g(n)$ - can be seen as a UCS algorithm, or as a special case of A* where $h(n) = 0$.
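A minimal one-to-all Dijkstra sketch (same hypothetical graph as above; note there is no heuristic term, only $g$):

```python
import heapq

def dijkstra(graph, source):
    """Shortest-path costs from `source` to every node, ordering by g(n) only."""
    dist = {source: 0}
    frontier = [(0, source)]
    while frontier:
        d, u = heapq.heappop(frontier)
        if d > dist[u]:
            continue  # stale queue entry; a shorter path was already found
        for v, w in graph[u]:
            if v not in dist or d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(frontier, (d + w, v))
    return dist

graph = {'S': [('A', 1), ('B', 2)], 'A': [('G', 9)], 'B': [('G', 1)], 'G': []}
print(dijkstra(graph, 'S'))  # {'S': 0, 'A': 1, 'B': 2, 'G': 3}
```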
Contours in the state space
Because the $f$-costs are nondecreasing along any path (with a consistent heuristic), we can draw contours in the state space, just like the contours in a topographic map.
With uniform-cost search, the bands will be "circular" around the start state.
With more accurate heuristics, the bands will stretch toward the goal state and become more narrowly focused around the optimal path.
Heuristics
Admissibility
Optimistic, never overestimating, never negative: $0 \le h(n) \le h^*(n)$.
At goals: $h(\text{goal}) = 0$ (follows from the above, since $h^*(\text{goal}) = 0$)
Deriving admissible heuristics
from the exact solution cost of a relaxed version of the problem.
Because the optimal solution cost of a relaxed problem is never greater than the optimal solution cost of the real problem. The relaxation should be polynomially computable.
Examples
Example: 8-puzzle
- Rules are relaxed so that a tile can move anywhere: gives $h_1$ = number of misplaced tiles (length of the shortest relaxed solution)
- Rules are relaxed so that a tile can move to any adjacent square: gives $h_2$ = total Manhattan distance (length of the shortest relaxed solution)
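Both relaxed-problem heuristics fit in a few lines; the 3x3 board is encoded here as a flat tuple with 0 for the blank (the encoding and the example state are my own):

```python
def h1_misplaced(state, goal):
    """h1: number of misplaced tiles (relaxation: a tile may move anywhere)."""
    return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)

def h2_manhattan(state, goal):
    """h2: sum of Manhattan distances (relaxation: move to any adjacent square)."""
    pos = {tile: divmod(i, 3) for i, tile in enumerate(goal)}
    total = 0
    for i, tile in enumerate(state):
        if tile == 0:
            continue  # the blank does not count
        r, c = divmod(i, 3)
        gr, gc = pos[tile]
        total += abs(r - gr) + abs(c - gc)
    return total

goal  = (1, 2, 3, 4, 5, 6, 7, 8, 0)
state = (1, 2, 3, 4, 5, 6, 0, 7, 8)   # bottom row shifted by one
print(h1_misplaced(state, goal), h2_manhattan(state, goal))  # 2 2
```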
Example: Traveling salesman problem TSP
"Find the shortest tour visiting all cities exactly once."
View the cities as a graph in which every vertex must be connected (visited).
The minimum spanning tree (MST) is a lower bound for this problem; it can be calculated with Kruskal's or Prim's algorithm.
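A minimal sketch of the MST lower bound using Prim's algorithm, on a hypothetical 4-city distance matrix. The bound holds because deleting one edge from any tour yields a spanning tree, so the optimal tour costs at least the MST weight:

```python
import heapq

def mst_weight(n, dist):
    """Prim's algorithm: weight of a minimum spanning tree over n cities.

    dist is a symmetric n x n distance matrix."""
    in_tree = [False] * n
    total, frontier = 0, [(0, 0)]     # (edge weight, city), start from city 0
    while frontier:
        w, u = heapq.heappop(frontier)
        if in_tree[u]:
            continue
        in_tree[u] = True
        total += w
        for v in range(n):
            if not in_tree[v]:
                heapq.heappush(frontier, (dist[u][v], v))
    return total

# Hypothetical 4-city instance (symmetric distances).
d = [[0, 1, 4, 3],
     [1, 0, 2, 5],
     [4, 2, 0, 6],
     [3, 5, 6, 0]]
print(mst_weight(4, d))  # 6, a lower bound on the optimal tour length (12 here)
```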
Example: shortest path
straight-line (airline) distance between two destinations
Consistency / Monotonicity
Every consistent heuristic is also admissible. Consistency is stricter than admissibility.
Implies that the $f$-value is non-decreasing on every path.
For every node $n$ and its successor $n'$ generated by action $a$: $h(n) \le c(n, a, n') + h(n')$
This is a form of the general triangle inequality.
$f$ is then non-decreasing along every path:
$f(n') = g(n') + h(n') = g(n) + c(n, a, n') + h(n') \ge g(n) + h(n) = f(n)$ (because of consistency)
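The triangle inequality is easy to check mechanically. A sketch on a toy graph (graph and heuristic values are invented; the second heuristic violates the inequality on the edge S-B):

```python
def is_consistent(graph, h):
    """Check h(n) <= c(n, n') + h(n') for every edge (triangle inequality)."""
    return all(h[n] <= c + h[s] for n, edges in graph.items() for s, c in edges)

graph = {'S': [('A', 1), ('B', 2)], 'A': [('G', 9)], 'B': [('G', 1)], 'G': []}
consistent_h   = {'S': 2, 'A': 4, 'B': 1, 'G': 0}
inconsistent_h = {'S': 3, 'A': 4, 'B': 0.5, 'G': 0}   # 3 > 2 + 0.5 on S-B
print(is_consistent(graph, consistent_h), is_consistent(graph, inconsistent_h))  # True False
```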
Dominance
For two admissible heuristics, we say $h_2$ dominates $h_1$ if: $h_2(n) \ge h_1(n)$ for all $n$.
Then $h_2$ is better for search: A* with $h_2$ never expands more nodes than A* with $h_1$.
Local Search
For many (also optimization) problems, paths are irrelevant, only the solution matters.
The aim is iterative improvement.
We define our state space as the set of "complete" but not necessarily optimal configurations:
- keep a single "current" state - (constant space usage)
- try to improve it until goal state is reached - (optimal configuration)
Examples
unknown environments (online search), TSP, constraint satisfaction problems, ...
Traveling Salesman Problem TSP
Variants of this approach get within 1% of optimal very quickly with thousands of cities.
Pairwise exchange of crossing paths:
- start with any complete tour
- perform pairwise exchanges
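The pairwise exchange of crossing paths is the classic 2-opt move; a minimal sketch, using a made-up 4-city distance matrix:

```python
def two_opt(tour, dist):
    """Pairwise-exchange improvement: repeatedly reverse a tour segment
    whenever swapping its boundary edges shortens the tour (uncrosses them)."""
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            # skip j values that would pick two adjacent (shared-node) edges
            for j in range(i + 2, n - (i == 0)):
                a, b = tour[i], tour[i + 1]
                c, e = tour[j], tour[(j + 1) % n]
                if dist[a][c] + dist[b][e] < dist[a][b] + dist[c][e]:
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                    improved = True
    return tour

d = [[0, 1, 4, 3],
     [1, 0, 2, 5],
     [4, 2, 0, 6],
     [3, 5, 6, 0]]
print(two_opt([0, 2, 1, 3], d))  # [0, 1, 2, 3]: tour length drops from 14 to 12
```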
n-queens
Move a queen to reduce the number of conflicts (attacks), e.g., within its column.
Terminates in a few steps (at most quadratic, but much fewer in practice)
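A sketch of the conflict count and of one within-column improvement move. The board encoding (cols[c] = row of the queen in column c) is my own choice:

```python
import random

def conflicts(cols):
    """Number of attacking queen pairs; cols[c] = row of the queen in column c."""
    n = len(cols)
    return sum(1 for a in range(n) for b in range(a + 1, n)
               if cols[a] == cols[b] or abs(cols[a] - cols[b]) == b - a)

def min_conflicts_step(cols):
    """Move one randomly chosen queen within its column to the row with
    the fewest conflicts (the current row is a candidate, so conflicts
    never increase)."""
    c = random.randrange(len(cols))
    best = min(range(len(cols)),
               key=lambda r: conflicts(cols[:c] + [r] + cols[c + 1:]))
    cols[c] = best
    return cols

print(conflicts([1, 3, 0, 2]))  # 0: a known 4-queens solution
print(conflicts([0, 1, 2, 3]))  # 6: all queens on one diagonal
```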
Greedy local search
Also called "hill-climbing". Problems:
- escaping shoulders (flat regions)
- getting stuck in a loop at a local maximum
Solution: random-restart hill climbing - restarting after a step limit overcomes local maxima
Pseudocode
function HILL-CLIMBING(problem) returns a state that is a local maximum
  inputs: problem // a problem
  local variables: current, // a node
                   neighbor // a node
  current ← MAKE-NODE(INITIAL-STATE[problem])
  loop do
    neighbor ← a highest-valued successor of current
    if VALUE[neighbor] ≤ VALUE[current] then return STATE[current]
    current ← neighbor
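The pseudocode translates almost line by line to Python; the toy objective below (maximize $-(x-3)^2$ over the integers, stepping by ±1) is my own example:

```python
def hill_climbing(initial, successors, value):
    """Steepest-ascent hill climbing: move to the best successor until
    no successor improves on the current state."""
    current = initial
    while True:
        neighbor = max(successors(current), key=value, default=current)
        if value(neighbor) <= value(current):
            return current  # local maximum (or shoulder edge)
        current = neighbor

f = lambda x: -(x - 3) ** 2          # single global maximum at x = 3
succ = lambda x: [x - 1, x + 1]      # integer neighbors
print(hill_climbing(0, succ, f))     # climbs 0 -> 1 -> 2 -> 3
```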
Simulated annealing
Devised in the 1950s for physical process modeling.
Corresponds to "cooling off" process of materials
In depth
- value = energy
- Based on the Boltzmann distribution: if $T$ is decreased slowly enough, the best (lowest-energy) state is reached with probability approaching $1$ for small $T$
- The slower this process is, the better the results
Idea: Escaping local maxima by gradually allowing "bad" moves less often and of smaller size.
Problem: "Slowly enough" can be worse than exhaustive search.
Pseudocode
function SIMULATED-ANNEALING(problem, schedule) returns a solution state
  inputs: problem // a problem
          schedule // a mapping from time to “temperature”
  local variables: current, // a node
                   next, // a node
                   T // a “temperature” controlling prob. of downward steps
  current ← MAKE-NODE(INITIAL-STATE[problem])
  for t ← 1 to ∞ do
    T ← schedule[t]
    if T = 0 then return current
    next ← a randomly selected successor of current
    ∆E ← VALUE[next] – VALUE[current]
    if ∆E > 0 then current ← next
    else current ← next only with probability e^(∆E/T)
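A direct Python rendering of the pseudocode; the geometric cooling schedule and the toy objective are my own choices:

```python
import math
import random

def simulated_annealing(initial, successors, value, schedule):
    """Always accept uphill moves; accept downhill moves with
    probability exp(dE / T), where T comes from the cooling schedule."""
    current = initial
    for t in range(1, 10**6):
        T = schedule(t)
        if T == 0:
            return current
        nxt = random.choice(successors(current))
        dE = value(nxt) - value(current)
        if dE > 0 or random.random() < math.exp(dE / T):
            current = nxt
    return current

random.seed(0)
f = lambda x: -(x - 3) ** 2
succ = lambda x: [x - 1, x + 1]
schedule = lambda t: 2.0 * 0.95 ** t if t < 300 else 0   # geometric cooling
result = simulated_annealing(0, succ, f, schedule)
print(result)
```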
Local beam search
Idea
- keep $k$ states instead of just $1$ to start searching from
- not the same as running $k$ searches in parallel: searches that find good states recruit other searches to join them
- choose the top $k$ of all their successors
Problem
often, all $k$ states end up on the same local hill
Solution
choose $k$ successors randomly, biased towards good ones (stochastic beam search)
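A minimal (deterministic, non-stochastic) local beam search sketch, reusing the toy integer objective from the hill-climbing example; all names and parameters are illustrative:

```python
import heapq

def local_beam_search(k, initial_states, successors, value, steps=100):
    """Keep the k best states; each round, pool all successors (plus the
    current beam) and keep the top k by value."""
    beam = list(initial_states)
    for _ in range(steps):
        pool = {s for state in beam for s in successors(state)} | set(beam)
        beam = heapq.nlargest(k, pool, key=value)
    return max(beam, key=value)

f = lambda x: -(x - 3) ** 2
succ = lambda x: [x - 1, x + 1]
print(local_beam_search(3, [-10, 0, 10], succ, f, steps=20))  # 3
```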
Genetic algorithms GA
Stochastic local beam search + successors generated from pairs of states
- GAs require states encoded as strings (e.g., lists of instructions as individuals)
- Crossover can produce solutions quite distant from their parents → Crossover helps if and only if substrings are meaningful components (building blocks), not random shuffling
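A minimal GA sketch on bit strings. The "OneMax" objective (maximize the number of 1-bits), the mutation rate, and all function names are my own illustrative choices:

```python
import random

def crossover(parent_a, parent_b):
    """Single-point crossover of two equal-length strings."""
    point = random.randrange(1, len(parent_a))
    return (parent_a[:point] + parent_b[point:],
            parent_b[:point] + parent_a[point:])

def genetic_step(population, fitness, mutation_rate=0.1):
    """One generation: fitness-weighted selection, crossover, point mutation."""
    weights = [fitness(s) for s in population]
    next_gen = []
    while len(next_gen) < len(population):
        a, b = random.choices(population, weights=weights, k=2)
        for child in crossover(a, b):
            bits = list(child)
            for i in range(len(bits)):
                if random.random() < mutation_rate:
                    bits[i] = '1' if bits[i] == '0' else '0'
            next_gen.append(''.join(bits))
    return next_gen[:len(population)]

random.seed(1)
pop = ['0000', '0011', '1100', '0101']
for _ in range(30):
    pop = genetic_step(pop, lambda s: 1 + s.count('1'))  # strictly positive fitness
print(max(pop, key=lambda s: s.count('1')))
```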