Matrix Methods in Data Analysis, Signal Processing, and Machine Learning | CSC2321

Stochastic Gradient Descent

We can obtain better results for more restrictive classes of functions. Recall that we can only take a small step because the gradient changes as we change the solution. A function is smooth if the change in the gradient is bounded in terms of how far the solution moves. More precisely, a function $f$ is $\beta$-smooth if for any two points $x, y$, we have
$$\|\nabla f(x)-\nabla f(y)\| \leq \beta\|x-y\|$$
Note that this implies that for any $x, y$,
$$f(y) \leq f(x)+\langle\nabla f(x), y-x\rangle+\frac{\beta}{2}\|y-x\|^2$$
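The step from the smoothness condition to the quadratic upper bound is not spelled out above. One standard way to see it, sketched here under the added assumption that $f$ is continuously differentiable, is to integrate the gradient along the segment from $x$ to $y$ and then apply the Cauchy-Schwarz inequality together with $\beta$-smoothness:
$$\begin{aligned}
f(y)-f(x)-\langle\nabla f(x), y-x\rangle &= \int_0^1\langle\nabla f(x+s(y-x))-\nabla f(x),\, y-x\rangle\, ds \\
&\leq \int_0^1\|\nabla f(x+s(y-x))-\nabla f(x)\|\,\|y-x\|\, ds \\
&\leq \int_0^1 \beta s\|y-x\|^2\, ds = \frac{\beta}{2}\|y-x\|^2
\end{aligned}$$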
Thus, we can always upper-bound the objective by a quadratic function. To find the next iterate, we minimize this quadratic upper bound taken at the current point $x^{(t-1)}$. Setting its gradient with respect to $y$ to zero gives $x^{(t)}=x^{(t-1)}-\frac{1}{\beta} \nabla f\left(x^{(t-1)}\right)$, that is, a gradient step with step size $\eta_t=1 / \beta$. With this choice, we have
$$f\left(x^{(t)}\right) \leq f\left(x^{(t-1)}\right)-\frac{1}{2 \beta}\left\|\nabla f\left(x^{(t-1)}\right)\right\|^2$$
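As a sanity check of the per-step decrease, here is a minimal sketch in plain Python. The test problem is an assumption made for illustration and does not come from the text: a quadratic $f(x)=\frac{1}{2} x^{\top} A x-b^{\top} x$ with a hypothetical symmetric positive definite $2 \times 2$ matrix `A`, for which $\beta$ is the largest eigenvalue of `A`. The helper names (`matvec`, `smoothness_constant`, `gradient_descent`) are likewise made up for this example.

```python
import math

# Hypothetical test problem (not from the text): f(x) = 0.5 x^T A x - b^T x
# for a symmetric positive definite 2x2 matrix A. Its gradient is A x - b,
# and f is beta-smooth with beta = largest eigenvalue of A.
A = [[3.0, 1.0], [1.0, 2.0]]
b = [1.0, -1.0]


def matvec(M, x):
    return [M[0][0] * x[0] + M[0][1] * x[1], M[1][0] * x[0] + M[1][1] * x[1]]


def f(x):
    Ax = matvec(A, x)
    return 0.5 * (x[0] * Ax[0] + x[1] * Ax[1]) - (b[0] * x[0] + b[1] * x[1])


def grad(x):
    Ax = matvec(A, x)
    return [Ax[0] - b[0], Ax[1] - b[1]]


def smoothness_constant(M):
    # Largest eigenvalue of a symmetric 2x2 matrix, in closed form.
    mean = 0.5 * (M[0][0] + M[1][1])
    half_diff = 0.5 * (M[0][0] - M[1][1])
    return mean + math.sqrt(half_diff ** 2 + M[0][1] ** 2)


def gradient_descent(x0, steps):
    """Run x <- x - (1/beta) * grad f(x); return final x and per-step records."""
    beta = smoothness_constant(A)
    x = list(x0)
    records = []  # pairs (f after the step, upper bound promised for it)
    for _ in range(steps):
        g = grad(x)
        bound = f(x) - (g[0] ** 2 + g[1] ** 2) / (2.0 * beta)
        x = [x[0] - g[0] / beta, x[1] - g[1] / beta]
        records.append((f(x), bound))
    return x, records


if __name__ == "__main__":
    x_final, records = gradient_descent([5.0, 5.0], steps=100)
    print("final iterate:", x_final)
    print("descent inequality held at every step:",
          all(value <= bound + 1e-9 for value, bound in records))
```

For this particular `A` and `b`, solving $A x=b$ by hand gives the minimizer $(0.6,-0.8)$, so the final iterate should be close to that point. The small tolerance in the comparison only absorbs floating-point rounding.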
We analyze this algorithm using a potential function and the change in that potential from one step to the next. [The formulas for the potential and for its change are missing from the source.]

Two Notes on Proof

We now extend our results to the constrained case $\min_{x \in S} f(x)$ for a convex set $S$. In the unconstrained case, we take the quadratic approximation of the function at the current solution, and the next solution is the minimizer of that approximation. Notice that this step is still meaningful when we have constraints: we minimize the same approximation over $S$. Thus our algorithm is
$$x^{(t)} \leftarrow \underset{z \in S}{\operatorname{argmin}}\; f\left(x^{(t-1)}\right)+\left\langle\nabla f\left(x^{(t-1)}\right), z-x^{(t-1)}\right\rangle+\frac{\beta}{2}\left\|z-x^{(t-1)}\right\|^2$$
Another idea is to take a gradient step (moving against the gradient), which might take us out of the feasible region, and then project back onto the feasible region. In other words, our algorithm is
$$y^{(t)} \leftarrow x^{(t-1)}-\eta_t \nabla f\left(x^{(t-1)}\right), \quad x^{(t)} \leftarrow \underset{x \in S}{\operatorname{argmin}}\left\|y^{(t)}-x\right\|$$
It turns out that for step size $\eta_t=1 / \beta$, these two algorithms are identical: completing the square shows that the quadratic objective in the first algorithm equals $\frac{\beta}{2}\left\|z-y^{(t)}\right\|^2$ plus a term that does not depend on $z$. In order to analyze this algorithm, we need a property of the projection operation.

Lemma 6.1. Given a convex set $S$, let $a \in S$ and $b^{\prime} \in \mathbb{R}^n$. Let $b=\operatorname{argmin}_{x \in S} \frac{1}{2}\left\|x-b^{\prime}\right\|^2$. Then $\left\langle a-b, b-b^{\prime}\right\rangle \geq 0$ and therefore $\|a-b\|^2 \leq\left\|a-b^{\prime}\right\|^2$.

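Proof. The first inequality follows from the optimality of $b$. The gradient of $\frac{1}{2}\left\|x-b^{\prime}\right\|^2$ at $x=b$ is $b-b^{\prime}$. Since $S$ is convex, the segment from $b$ to $a$ lies in $S$, so if $\left\langle a-b, b-b^{\prime}\right\rangle$ were negative, moving slightly from $b$ toward $a$ would decrease the objective, contradicting the optimality of $b$. Hence $\left\langle a-b, b-b^{\prime}\right\rangle \geq 0$. For the second inequality, expand $\left\|a-b^{\prime}\right\|^2=\|a-b\|^2+2\left\langle a-b, b-b^{\prime}\right\rangle+\left\|b-b^{\prime}\right\|^2 \geq\|a-b\|^2$.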
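The two inequalities of Lemma 6.1 can be spot-checked numerically. The sketch below is only an illustration under assumptions not made in the text: it takes $S$ to be the Euclidean unit ball, whose projection has the closed form $b^{\prime} / \max \left(1,\left\|b^{\prime}\right\|\right)$, and tests random points. The function names are hypothetical.

```python
import math
import random


def project_onto_unit_ball(p):
    """Euclidean projection of p onto S = {x : ||x|| <= 1}."""
    norm = math.sqrt(sum(v * v for v in p))
    scale = 1.0 if norm <= 1.0 else 1.0 / norm
    return [v * scale for v in p]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def sub(u, v):
    return [ui - vi for ui, vi in zip(u, v)]


def check_lemma(trials=1000, dim=3, seed=0, tol=1e-9):
    """Return True if both inequalities of the lemma hold on random samples."""
    rng = random.Random(seed)
    for _ in range(trials):
        b_prime = [rng.uniform(-3.0, 3.0) for _ in range(dim)]
        # a must lie in S, so sample a point and project it into the ball.
        a = project_onto_unit_ball([rng.uniform(-3.0, 3.0) for _ in range(dim)])
        b = project_onto_unit_ball(b_prime)
        # First claim: <a - b, b - b'> >= 0 (up to rounding).
        if dot(sub(a, b), sub(b, b_prime)) < -tol:
            return False
        # Second claim: ||a - b||^2 <= ||a - b'||^2 (up to rounding).
        if dot(sub(a, b), sub(a, b)) > dot(sub(a, b_prime), sub(a, b_prime)) + tol:
            return False
    return True


if __name__ == "__main__":
    print("lemma held on all samples:", check_lemma())
```

A passing check is not a proof; it only confirms that the statement behaves as expected on this one choice of convex set.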
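Using this property, the projection step does not move the iterate farther from any feasible point: for example, taking $a=x^*$ (a minimizer of $f$ over $S$), $b^{\prime}=y^{(t)}$, and $b=x^{(t)}$ gives $\left\|x^{(t)}-x^*\right\|^2 \leq\left\|y^{(t)}-x^*\right\|^2$. [The exact inequality stated at this point is missing from the source.] We can then observe that the rest of the original proof goes through in the constrained setting.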
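The gradient-step-then-project form of the algorithm can be sketched as follows. Everything concrete here is an assumption chosen for illustration rather than taken from the text: the objective is a hypothetical quadratic $f(x)=\frac{1}{2} x^{\top} A x-b^{\top} x$, the feasible set $S$ is the box $[0,1]^2$ (whose projection is coordinate-wise clipping), and the step size is $1 / \beta$ with $\beta$ the largest eigenvalue of `A`.

```python
import math

# Hypothetical problem data (not from the text).
A = [[3.0, 1.0], [1.0, 2.0]]
b = [1.0, -1.0]

# Largest eigenvalue of the symmetric 2x2 matrix A, in closed form.
BETA = 0.5 * (A[0][0] + A[1][1]) + math.sqrt(
    (0.5 * (A[0][0] - A[1][1])) ** 2 + A[0][1] ** 2
)


def grad(x):
    """Gradient A x - b of f(x) = 0.5 x^T A x - b^T x."""
    return [
        A[0][0] * x[0] + A[0][1] * x[1] - b[0],
        A[1][0] * x[0] + A[1][1] * x[1] - b[1],
    ]


def project_onto_box(y, lo=0.0, hi=1.0):
    """Euclidean projection onto the box [lo, hi]^n: clip each coordinate."""
    return [max(lo, min(hi, v)) for v in y]


def projected_gradient_descent(x0, steps):
    x = project_onto_box(x0)  # start from a feasible point
    for _ in range(steps):
        g = grad(x)
        # Gradient step with step size 1/beta; y may leave the box.
        y = [x[0] - g[0] / BETA, x[1] - g[1] / BETA]
        # Project back onto the feasible region.
        x = project_onto_box(y)
    return x


if __name__ == "__main__":
    print(projected_gradient_descent([1.0, 1.0], steps=200))
```

For this data the unconstrained minimizer $(0.6,-0.8)$ lies outside the box, and the constrained minimizer is $(1 / 3,0)$: there the gradient is $(0,4 / 3)$, which has a nonnegative inner product with every feasible direction. The iterates should therefore approach $(1 / 3,0)$.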


