# Operational Research

## Value Iteration

In this section, we discuss a method that approximates the solution of (10.4) to any desired accuracy. This method is known as value iteration or successive approximation.
Algorithm 10.1 (Value iteration algorithm).

Step 1. (Initialization) Specify the desired accuracy $\epsilon>0$, set $n=1$, and compute
$$V_1(i)=\max_{a \in D(i)}\{r(i, a)\} \text { for all } i \in S .$$

Step 2. (Iteration) Set $n:=n+1$ and determine $V_n(i)$ using
$$V_n(i)=\max_{a \in D(i)}\left\{r(i, a)+\beta \sum_{j \in S} p(j \mid i, a) V_{n-1}(j)\right\} \text { for all } i \in S . \tag{10.7}$$

Step 3. (Stop criterion) If $\left\|V_n-V_{n-1}\right\|<\epsilon(1-\beta) /(2 \beta)$, then stop; otherwise, return to Step 2. Here, we use the maximum norm, that is, $\left\|V_n-V_{n-1}\right\|=\max_{i \in S}\left\{\left|V_n(i)-V_{n-1}(i)\right|\right\}$.
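
The three steps of the algorithm can be sketched in plain Python as follows. This is only an illustrative sketch: the dictionary-based data layout, the function name `value_iteration`, and the two-state MDP (states `good` and `bad`, actions `keep` and `repair`, with made-up rewards and transition probabilities) are assumptions for the sake of the example and are not the machine maintenance data of Example 10.2.

```python
def value_iteration(states, actions, reward, trans, beta, eps):
    """Sketch of value iteration with the maximum-norm stopping rule.

    actions[i]     : list of decisions D(i)
    reward[i][a]   : r(i, a)
    trans[i][a][j] : p(j | i, a) (missing j means probability 0)
    """
    # Step 1: V_1(i) = max_a r(i, a)
    v_prev = {i: max(reward[i][a] for a in actions[i]) for i in states}
    n = 1
    threshold = eps * (1 - beta) / (2 * beta)
    while True:
        # Step 2: one dynamic programming backup for every state
        n += 1
        v = {
            i: max(
                reward[i][a]
                + beta * sum(p * v_prev[j] for j, p in trans[i][a].items())
                for a in actions[i]
            )
            for i in states
        }
        # Step 3: stop when the maximum-norm difference is small enough
        if max(abs(v[i] - v_prev[i]) for i in states) < threshold:
            return v, n
        v_prev = v


# Hypothetical two-state MDP (illustrative numbers only)
states = ["good", "bad"]
actions = {"good": ["keep"], "bad": ["keep", "repair"]}
reward = {"good": {"keep": 10.0}, "bad": {"keep": 4.0, "repair": 0.0}}
trans = {
    "good": {"keep": {"good": 0.7, "bad": 0.3}},
    "bad": {"keep": {"bad": 1.0}, "repair": {"good": 1.0}},
}

v, n = value_iteration(states, actions, reward, trans, beta=0.8, eps=1e-6)
print(n, v)
```

For this toy model the optimality equations can be solved by hand (repairing in state `bad` is optimal, which gives $V(\text{good})=10/0.248$ and $V(\text{bad})=8/0.248$), so the returned values can be checked against the exact solution.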
If we interpret $V_n(i)$ as the maximum expected present value, beginning in state $i \in S$, when $n$ additional decisions are to be made, then (10.7), the recursion in Step 2, is the dynamic programming recursion of a familiar finite-horizon problem. Intuitively, it is plausible that $V_n(i) \rightarrow V(i)$ for all $i \in S$ as $n \rightarrow \infty$, so that $V_n$ converges to the solution of the optimality equations. We will not prove this, but we will illustrate it for our machine maintenance example.
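
As a supplement, the stopping rule in Step 3 can be motivated by a standard contraction argument. The sketch below assumes $0<\beta<1$ and that (10.4) is the optimality equation $V=TV$, where the operator notation $T$ is introduced here only for this sketch:
$$(T W)(i)=\max_{a \in D(i)}\left\{r(i, a)+\beta \sum_{j \in S} p(j \mid i, a) W(j)\right\}, \quad i \in S .$$
With this notation $V_n=T V_{n-1}$, and $T$ satisfies $\|T W-T W^{\prime}\| \leq \beta\|W-W^{\prime}\|$ in the maximum norm. Hence
$$\left\|V-V_n\right\|=\left\|T V-T V_{n-1}\right\| \leq \beta\left\|V-V_{n-1}\right\| \leq \beta\left(\left\|V-V_n\right\|+\left\|V_n-V_{n-1}\right\|\right),$$
which rearranges to
$$\left\|V-V_n\right\| \leq \frac{\beta}{1-\beta}\left\|V_n-V_{n-1}\right\| .$$
So when $\left\|V_n-V_{n-1}\right\|<\epsilon(1-\beta) /(2 \beta)$, the current iterate satisfies $\left\|V-V_n\right\|<\epsilon / 2$.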

Example 10.2 (Continued). For our example, with $\beta=0.8$, we get the following.

## Policy Iteration

Another method for finding an optimal policy and the corresponding optimal value function is policy iteration (also called strategy iteration), which constructs a sequence of successively better stationary policies. If the state space $S$ is finite, the method converges in finitely many steps.

Let $\pi=(\delta, \delta, \ldots, \delta, \ldots)$ be a stationary policy; that is, for any $i \in S$, the same decision $\delta(i) \in D(i)$ is made at every time $t \in T$. Let $V_\pi: S \rightarrow \mathbb{R}$ be the corresponding value function. It satisfies the following equations:
$$V_\pi(i)=r(i, \delta(i))+\beta \sum_{j \in S} p(j \mid i, \delta(i)) V_\pi(j), \quad i \in S$$
This is a system of linear equations with a unique solution; it is finite if $|S|<\infty$. The policy iteration method is based on the fact that the class of stationary policies contains an optimal policy provided that $|D(i)|<\infty$ for all $i \in S$.
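
To make the linear system concrete, the following sketch evaluates a fixed stationary policy by solving $(I-\beta P_\delta) V_\pi=r_\delta$ with a small Gaussian elimination routine. The helper names `solve_linear` and `evaluate_policy`, the data layout, and the two-state MDP are hypothetical and chosen only for illustration; in practice a numerical library would normally do the linear solve.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def evaluate_policy(states, delta, reward, trans, beta):
    """Return V_pi for the stationary policy that always uses delta[i] in state i."""
    idx = {s: k for k, s in enumerate(states)}
    n = len(states)
    a = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for i in states:
        act = delta[i]
        a[idx[i]][idx[i]] += 1.0                 # identity part
        for j, p in trans[i][act].items():
            a[idx[i]][idx[j]] -= beta * p        # minus beta * p(j | i, delta(i))
        b[idx[i]] = reward[i][act]               # r(i, delta(i))
    x = solve_linear(a, b)
    return {s: x[idx[s]] for s in states}


# Hypothetical two-state MDP (illustrative numbers only)
states = ["good", "bad"]
reward = {"good": {"keep": 10.0}, "bad": {"keep": 4.0, "repair": 0.0}}
trans = {
    "good": {"keep": {"good": 0.7, "bad": 0.3}},
    "bad": {"keep": {"bad": 1.0}, "repair": {"good": 1.0}},
}

# Policy that never repairs
v_keep = evaluate_policy(states, {"good": "keep", "bad": "keep"}, reward, trans, beta=0.8)
print(v_keep)
```

For this policy the system can be checked by hand: $V_\pi(\text{bad})=4 /(1-0.8)=20$ and $V_\pi(\text{good})=(10+0.8 \cdot 0.3 \cdot 20) /(1-0.8 \cdot 0.7)=14.8 / 0.44$.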

Theorem 10.4. Let $\pi^*=\left(\delta^*, \delta^*, \ldots\right)$ be a stationary policy that, for every $i \in S$, prescribes a decision $\delta^*(i) \in D(i)$ such that
$$r\left(i, \delta^*(i)\right)+\beta \sum_{j \in S} p\left(j \mid i, \delta^*(i)\right) V(j)=\max_{a \in D(i)}\left\{r(i, a)+\beta \sum_{j \in S} p(j \mid i, a) V(j)\right\} .$$
Then
$$V_{\pi^*}(i)=V(i) \quad \text { for all } i \in S$$
and $\pi^*$ is optimal.
This theorem says that any stationary policy whose decisions attain the maximum in (10.4) in every state is optimal.
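
The theorem suggests a simple recipe once $V$ is known: in each state, pick a decision that attains the maximum. A minimal sketch follows, assuming a hypothetical two-state MDP whose optimal value function can be computed by hand; the function name `greedy_policy` and all numbers are illustrative assumptions, not part of the text.

```python
def greedy_policy(states, actions, reward, trans, beta, v):
    """Pick, in every state, a decision attaining the maximum in the optimality equation."""
    def q(i, a):
        # r(i, a) + beta * sum_j p(j | i, a) V(j)
        return reward[i][a] + beta * sum(p * v[j] for j, p in trans[i][a].items())

    policy = {i: max(actions[i], key=lambda a: q(i, a)) for i in states}
    # Largest violation of V(i) = max_a q(i, a); close to 0 if v solves the equations
    residual = max(abs(max(q(i, a) for a in actions[i]) - v[i]) for i in states)
    return policy, residual


# Hypothetical two-state MDP (illustrative numbers only)
states = ["good", "bad"]
actions = {"good": ["keep"], "bad": ["keep", "repair"]}
reward = {"good": {"keep": 10.0}, "bad": {"keep": 4.0, "repair": 0.0}}
trans = {
    "good": {"keep": {"good": 0.7, "bad": 0.3}},
    "bad": {"keep": {"bad": 1.0}, "repair": {"good": 1.0}},
}
beta = 0.8

# Hand-computed solution of the optimality equations for this toy model
v_opt = {"good": 10 / 0.248, "bad": 8 / 0.248}

delta_star, residual = greedy_policy(states, actions, reward, trans, beta, v_opt)
print(delta_star, residual)
```

In this toy model the greedy decision in state `bad` is `repair`, since $0.8 \cdot V(\text{good})$ exceeds $4+0.8 \cdot V(\text{bad})$, and the residual confirms that `v_opt` satisfies the optimality equations up to rounding.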
