Optimization for Machine Learning | CSCl4961

Stochastic Gradient Descent with Averaging

Stochastic gradient descent is slow because the step size $\tau_k$ decays quickly toward zero. To improve the convergence speed, one can average the past iterates, i.e. run a "classical" SGD on auxiliary variables $\left(\tilde{x}_k\right)_k$
$$\tilde{x}_{k+1}=\tilde{x}_k-\tau_k \nabla f_{i(k)}\left(\tilde{x}_k\right)$$
and output as the estimated weight vector the Cesàro average
$$x_k \stackrel{\text { def. }}{=} \frac{1}{k} \sum_{\ell=1}^k \tilde{x}_{\ell} .$$
This defines the Stochastic Gradient Descent with Averaging (SGA) algorithm. Note that one can avoid storing all the iterates explicitly by updating a running average as follows:
$$x_{k+1}=\frac{1}{k+1} \tilde{x}_{k+1}+\frac{k}{k+1} x_k .$$
In this case, a typical choice of decay is rather of the form
$$\tau_k \stackrel{\text { def. }}{=} \frac{\tau_0}{1+\sqrt{k / k_0}}$$
Notice that the step size now goes to 0 much more slowly, at rate $k^{-1 / 2}$.
Typically, because the averaging stabilizes the iterates, the choice of $\left(k_0, \tau_0\right)$ is less important than for SGD.
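The averaging scheme and the step size decay described above can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions: the names `step_size`, `sga` and `grad_fi`, the least-squares toy problem, and the values chosen for $\tau_0$ and $k_0$ are hypothetical and do not come from the text. The running average is updated so that it always equals the mean of all auxiliary iterates seen so far, including the newest one.

```python
import numpy as np

def step_size(k, tau0, k0):
    """Step size decay tau_k = tau0 / (1 + sqrt(k / k0))."""
    return tau0 / (1.0 + np.sqrt(k / k0))

def sga(grad_fi, n, x0, tau0, k0, n_iter, seed=0):
    """SGD on auxiliary iterates, returning (running average, last iterate).

    grad_fi(x, i) is assumed to return the gradient of f_i at x.
    """
    rng = np.random.default_rng(seed)
    x_tilde = np.array(x0, dtype=float)  # auxiliary SGD iterate
    x_avg = x_tilde.copy()               # average of the iterates so far
    for k in range(1, n_iter + 1):
        i = rng.integers(n)              # index i(k), drawn uniformly
        x_tilde = x_tilde - step_size(k, tau0, k0) * grad_fi(x_tilde, i)
        # x_avg holds the mean of k iterates; fold in the (k+1)-th one
        x_avg = (k * x_avg + x_tilde) / (k + 1)
    return x_avg, x_tilde

# Hypothetical toy problem: f_i(x) = 0.5 * (a_i . x - y_i)^2
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 3))
y = A @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(200)
grad_fi = lambda x, i: (A[i] @ x - y[i]) * A[i]

x_avg, x_last = sga(grad_fi, n=200, x0=np.zeros(3),
                    tau0=0.1, k0=10.0, n_iter=5000)
x_star = np.linalg.lstsq(A, y, rcond=None)[0]  # exact minimizer, for reference
print("error of averaged iterate:", np.linalg.norm(x_avg - x_star))
print("error of last iterate:    ", np.linalg.norm(x_last - x_star))
```

Only two vectors are kept in memory, the current auxiliary iterate and the running average, whatever the number of iterations.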

Bach proves that for logistic classification, SGA converges faster than SGD (the constants involved are smaller), since, in contrast to SGD, SGA adapts to the local strong convexity of $E$.

Stochastic Averaged Gradient Descent

When the problem size $n$ is such that the dataset (of size $n \times p$) fits fully into memory, the SGA method can be improved further by keeping track of the previously computed gradients. This gives rise to the Stochastic Averaged Gradient Descent (SAG) algorithm.

We store all the previously computed gradients in $\left(G^i\right)_{i=1}^n$, which requires $O(n \times p)$ memory. The iterates are defined using a proxy $g$ for the batch gradient, which is progressively refined over the iterations. The algorithm reads
$$x_{k+1}=x_k-\tau g \quad \text { where }\left\{\begin{array}{l} h \leftarrow \nabla f_{i(k)}\left(x_k\right), \\ g \leftarrow g-G^{i(k)}+h, \\ G^{i(k)} \leftarrow h . \end{array}\right.$$
Note that in contrast to SGD and SGA, this method uses a fixed step size $\tau$. As with batch gradient descent (BGD), to ensure convergence the step size $\tau$ should be of the order of $1 / L$, where $L$ is the Lipschitz constant of $\nabla f$. This algorithm improves over SGA and SGD since it has a convergence rate of $O(1 / k)$, as does BGD. Furthermore, in the presence of strong convexity (for instance when $X$ is injective for logistic classification), it has a linear convergence rate, i.e.
$$\mathbb{E}\left(f\left(x_k\right)\right)-f\left(x^{\star}\right)=O\left(\rho^k\right),$$
for some $0<\rho<1$.
Note that this improvement over SGD and SGA is made possible only because SAG explicitly uses the fact that $n$ is finite (while SGD and SGA can be extended to infinite $n$ and more general minimization of expectations (43)).
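The SAG bookkeeping can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the names `sag` and `grad_fi` and the least-squares toy problem are hypothetical. The sketch keeps $g$ as the running sum of the stored gradients and divides by $n$ when stepping, which assumes that $f$ is the average of the $f_i$. It also takes $L$ to be the largest per-sample constant $\max_i \|a_i\|^2$, an assumption made for this toy problem only.

```python
import numpy as np

def sag(grad_fi, n, x0, tau, n_iter, seed=0):
    """Stochastic averaged gradient with a fixed step size tau.

    grad_fi(x, i) is assumed to return the gradient of f_i at x.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    G = np.zeros((n, x.size))  # stored gradients G^i, O(n * p) memory
    g = np.zeros(x.size)       # running sum of the stored gradients
    for _ in range(n_iter):
        i = rng.integers(n)    # index i(k), drawn uniformly
        h = grad_fi(x, i)      # fresh gradient of f_i at the current iterate
        g += h - G[i]          # swap the stale gradient for the fresh one
        G[i] = h
        x = x - (tau / n) * g  # step along the averaged stored gradients
    return x

# Hypothetical toy problem: f_i(x) = 0.5 * (a_i . x - y_i)^2
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 3))
y = A @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(200)
grad_fi = lambda x, i: (A[i] @ x - y[i]) * A[i]

L = np.max(np.sum(A**2, axis=1))  # assumed constant: max_i ||a_i||^2
x_sag = sag(grad_fi, n=200, x0=np.zeros(3), tau=1.0 / L, n_iter=20000)
x_star = np.linalg.lstsq(A, y, rcond=None)[0]  # exact minimizer, for reference
print("error of SAG iterate:", np.linalg.norm(x_sag - x_star))
```

The table `G` is what makes the fixed step size workable: every stored gradient is eventually refreshed, so the proxy `g` approaches the true batch gradient as the iterates settle, instead of remaining a noisy single-sample estimate.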
Figure 18 shows a comparison of SGD, SGA and SAG.

