# Numerical Analysis

## Simulated Annealing

Simulated annealing $[203,226]$ was developed as a means of global discrete optimization. The physical insight is that a physical system that is cooled very rapidly often goes only part way toward the minimum-energy configuration, and often ends up in a local but not global minimum of the total potential energy. Cooling the same system slowly, on the other hand, allows it to come close to the global minimum of the total potential energy.

Temperature here relates to thermal energy, which is kinetic energy at a microscopic level. Thermal energy allows increases in the potential energy at the level of individual molecules and atoms. This, like the stochastic gradient method, is a nonmonotone optimization method. That is, the objective function value can sometimes increase, even though we wish to minimize this value.

Simulated annealing is also a stochastic optimization process. However, it differs from the stochastic gradient method in some respects. While the step lengths $s_k$ in the stochastic gradient method are typically chosen so that $\sum_{k=0}^{\infty} s_k=+\infty$ while $\sum_{k=0}^{\infty} s_k^2$ is finite, in simulated annealing the corresponding step lengths $s_k$ decrease much more slowly.

The starting point for simulated annealing is the Metropolis-Hastings algorithm (Algorithm 71 of Section 7.4.2.3). Let $X$ be the state space we wish to optimize over. We choose the function $q$ generating the probability distribution $p(z)=q(z) / \sum_{x \in X} q(x)$ to be
$$q(x)=\exp (-\beta f(x))$$
where $f$ is the function we wish to minimize. Physically $\beta$ corresponds to $1 /\left(k_B T\right)$ where $k_B$ is Boltzmann’s constant and $T$ the absolute temperature. The larger the value of $\beta$ the more concentrated the distribution is near the global minimum; at the other extreme, if $\beta=0$ simulated annealing becomes essentially a random walk.
If we take $\beta=+\infty$, then the only steps of the Metropolis-Hastings algorithm that are accepted are steps where the candidate iterate $x^{\prime}$ satisfies $f\left(x^{\prime}\right)<f\left(x_k\right)$; this is a randomized descent algorithm: pick a neighbor of $x_k$ at random. If the neighbor has a smaller value of $f$, this neighbor becomes $x_{k+1}$. Otherwise $x_{k+1}=x_k$. Clearly, this method will become stuck in local minima.
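The scheme above can be sketched as follows. The toy objective, the neighbor function, and the slow cooling schedule are illustrative assumptions, not taken from the text; only the Metropolis acceptance rule with $q(x)=\exp(-\beta f(x))$ comes from the discussion above.

```python
import math
import random

def simulated_annealing(f, x0, neighbors, betas, rng=random):
    """Minimize f over a discrete state space.

    betas is an increasing sequence of inverse temperatures;
    slowly increasing beta corresponds to slow cooling.
    """
    x = best = x0
    for beta in betas:
        x_new = rng.choice(neighbors(x))
        delta = f(x_new) - f(x)
        # Metropolis acceptance: always accept a decrease in f;
        # accept an increase with probability exp(-beta * delta).
        if delta <= 0 or rng.random() < math.exp(-beta * delta):
            x = x_new
        if f(x) < f(best):
            best = x
    return best

# Toy example (illustrative): minimize f(x) = (x - 7)^2 over the
# integers 0..20, with neighbors x - 1 and x + 1.
f = lambda x: (x - 7) ** 2
nbrs = lambda x: [y for y in (x - 1, x + 1) if 0 <= y <= 20]
betas = [0.01 * (k + 1) for k in range(2000)]  # slow cooling schedule
random.seed(0)  # fixed seed for reproducibility
best = simulated_annealing(f, 0, nbrs, betas)  # global minimizer is x = 7
```

With $\beta$ fixed at $+\infty$, the acceptance test degenerates to `delta <= 0`, which is exactly the randomized descent method described next.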

## Second Derivatives and Newton’s Method

The standard first-order conditions for the unconstrained minimization of a function $f: \mathbb{R}^n \rightarrow \mathbb{R}$ are $\nabla f(\boldsymbol{x})=\mathbf{0}$. This is a system of $n$ equations in $n$ unknowns. We can apply the multivariate Newton method (Algorithm 43 in Section 3.3.4) and many of its variations to the problem of solving $\nabla f(\boldsymbol{x})=\mathbf{0}$. This requires solving the linear system $(\operatorname{Hess} f(\boldsymbol{x}_k))\, \boldsymbol{d}_k=-\nabla f(\boldsymbol{x}_k)$, where $\operatorname{Hess} f(\boldsymbol{x})$ is the Hessian matrix of $f$ at $\boldsymbol{x}$.

Newton’s method has the advantage of rapid quadratic convergence when started close to a solution at which the Hessian matrix is invertible. However, the cost of each iteration can be substantial if the dimension $n$ is large; in that case, sparse solution methods (see Section 2.3) or iterative methods (see Section 2.4) can be used to solve the linear systems that arise in Newton’s method.
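As a concrete sketch, the following applies Newton's method to $\nabla f(\boldsymbol{x})=\mathbf{0}$. The Rosenbrock test function, the starting point, and the tolerances are illustrative assumptions, not taken from the text.

```python
import numpy as np

def newton_minimize(grad, hess, x0, tol=1e-10, max_iter=50):
    """Newton's method applied to the system grad f(x) = 0."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        # Newton step: solve (Hess f(x_k)) d_k = -grad f(x_k).
        d = np.linalg.solve(hess(x), -g)
        x = x + d
    return x

# Rosenbrock function f(x, y) = (1 - x)^2 + 100 (y - x^2)^2,
# whose unique minimizer is (1, 1).
grad = lambda v: np.array([
    -2.0 * (1 - v[0]) - 400.0 * v[0] * (v[1] - v[0] ** 2),
    200.0 * (v[1] - v[0] ** 2),
])
hess = lambda v: np.array([
    [2.0 + 1200.0 * v[0] ** 2 - 400.0 * v[1], -400.0 * v[0]],
    [-400.0 * v[0], 200.0],
])
x_star = newton_minimize(grad, hess, [1.2, 1.2])
```

Started this close to the minimizer, where the Hessian is positive definite, the iteration converges in a handful of steps, illustrating the quadratic local convergence described above.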

Newton’s method in this form converges to solutions of $\nabla f(\boldsymbol{x})=\mathbf{0}$. This is a necessary but not sufficient condition for a local minimizer, except under special assumptions. For example, if $f$ is smooth and convex, then $\nabla f(\boldsymbol{x})=\mathbf{0}$ is both a necessary and sufficient condition for a global minimizer. In general, there are saddle points (where $\operatorname{Hess} f(\boldsymbol{x})$ has both negative and positive eigenvalues) and local maximizers (where $\operatorname{Hess} f(\boldsymbol{x})$ has only negative eigenvalues). Newton’s method regards saddle points and local maximizers as equally valid solutions of $\nabla f(\boldsymbol{x})=\mathbf{0}$. Yet, for optimization purposes, we wish to avoid these points. A consequence of this behavior is that the Newton step $\boldsymbol{d}_k$ satisfying $\operatorname{Hess} f(\boldsymbol{x}_k)\, \boldsymbol{d}_k=-\nabla f(\boldsymbol{x}_k)$ might not be a descent direction: $\boldsymbol{d}_k^T \nabla f(\boldsymbol{x}_k)=-\nabla f(\boldsymbol{x}_k)^T (\operatorname{Hess} f(\boldsymbol{x}_k))^{-1} \nabla f(\boldsymbol{x}_k)$ could be positive for an indefinite Hessian matrix. This means that any line search method based on the sufficient decrease criterion (8.3.4) can fail. This includes the Armijo/backtracking, Goldstein, and Wolfe-based line search methods.

In order to accommodate Hessian matrices that are not positive definite, we need to modify the Newton method. We can do this by modifying the Hessian matrix used [190, Sec. 3.4]. We can also use a different globalization strategy than using line searches, such as trust region methods [190, Chap. 4].
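A minimal sketch of the first strategy, modifying the Hessian, adds a multiple of the identity until a Cholesky factorization succeeds, which guarantees a positive definite system and hence a descent direction. The shift schedule and the small indefinite example below are illustrative assumptions, not prescribed by the text.

```python
import numpy as np

def modified_newton_step(H, g, tau0=1e-3):
    """Return a descent direction d solving (H + shift*I) d = -g,
    where shift >= 0 is increased until H + shift*I is positive definite."""
    H = np.asarray(H, dtype=float)
    g = np.asarray(g, dtype=float)
    shift = 0.0
    while True:
        try:
            # Cholesky succeeds exactly when H + shift*I is positive definite.
            L = np.linalg.cholesky(H + shift * np.eye(len(g)))
            break
        except np.linalg.LinAlgError:
            shift = tau0 if shift == 0.0 else 10.0 * shift
    # Solve (H + shift*I) d = -g via the triangular Cholesky factor L.
    return np.linalg.solve(L.T, np.linalg.solve(L, -g))

# An indefinite Hessian for which the raw Newton step points uphill:
H = np.array([[1.0, 0.0], [0.0, -2.0]])
g = np.array([0.5, 1.0])
d_raw = np.linalg.solve(H, -g)   # d_raw . g = 0.25 > 0: ascent direction
d_mod = modified_newton_step(H, g)  # d_mod . g < 0: descent direction
```

This kind of shifted step interpolates between the pure Newton step (shift zero) and a short steepest-descent-like step (large shift), which is why it combines naturally with the line search methods mentioned above.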
