# Variational Methods | CSE393

## Stochastic Variational Inference

Stochastic Variational Inference (SVI) optimizes the ELBO using noisy estimates of its gradient $g$ [11], hence the name. Stochastic optimization is ubiquitous in modern ML because a noisy gradient computed from a small subset of the data is much cheaper than one computed over a massive data set, which is commonplace nowadays.
The major requirements for the approximation to be valid are:

1. The gradient estimator $\hat{g}$ should be unbiased: $\mathbb{E}[\hat{g}]=\mathbb{E}[g]$.
2. The step size sequence $\left\{\alpha_i \mid i \in \mathcal{N}\right\}$ (learning rate) that nudges the parameters toward the optimum should be annealed so that
$$\sum_{i=0}^{\infty} \alpha_i=\infty \quad \text { and } \quad \sum_{i=0}^{\infty} \alpha_i^2<\infty$$
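
As a worked check of these two conditions (an illustrative choice, not one prescribed by the text), take $\alpha_i = 1/(i+1)$:
$$\sum_{i=0}^{\infty} \frac{1}{i+1}=\infty \quad \text { and } \quad \sum_{i=0}^{\infty} \frac{1}{(i+1)^2}=\frac{\pi^2}{6}<\infty,$$
so both hold. A constant step size $\alpha_i=c>0$ satisfies the first condition but not the second, while $\alpha_i=1/(i+1)^2$ satisfies the second but not the first.
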
Intuitively, the first condition on the step size relates to exploration capacity: the steps never shrink so fast that the algorithm cannot reach good solutions, no matter where it is initialized. The second guarantees that its energy is bounded so that it can converge to the solution.

Instead of computing the expectation step in Eq. (3.18) for all $N$ data points at every iteration, we compute it for a subset of the desired size $n$, sampled uniformly with replacement. From these new variational parameters, we compute the maximization step (or the expectation of the global variational parameters) as though each sampled data point had been observed $N / n$ times, and update the estimate as the weighted average of the previous estimate and the subset optimum, according to Eq. (3.19).
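
The subsample-and-average update can be sketched on a toy problem. The example below is a minimal illustration, not the book's Eqs. (3.18) and (3.19), which are not reproduced here. It assumes a Beta-Bernoulli model with only a global variable, so the expectation step over local variables is trivial. The function names, the step-size schedule $(t+\tau)^{-\kappa}$, and all constants are hypothetical choices made for the illustration.

```python
import random


def step_size(t, tau=1.0, kappa=0.7):
    """Annealed step size (t + tau)^(-kappa); kappa in (0.5, 1] meets both sum conditions."""
    return (t + tau) ** (-kappa)


def svi_beta_bernoulli(data, a0=1.0, b0=1.0, batch_size=100, num_steps=2000, seed=0):
    """Stochastic updates of a Beta(a, b) approximation to the posterior of a Bernoulli rate."""
    rng = random.Random(seed)
    N = len(data)
    a, b = a0, b0  # start the global variational parameters at the prior
    for t in range(num_steps):
        # Uniformly sample a subset of size n = batch_size, with replacement.
        batch = rng.choices(data, k=batch_size)
        ones = sum(batch)
        # Subset optimum: treat each sampled point as if it were observed N / n times.
        scale = N / batch_size
        a_hat = a0 + scale * ones
        b_hat = b0 + scale * (batch_size - ones)
        # Weighted average of the previous estimate and the subset optimum.
        alpha = step_size(t)
        a = (1.0 - alpha) * a + alpha * a_hat
        b = (1.0 - alpha) * b + alpha * b_hat
    return a, b


# Synthetic data: 10,000 coin flips with an assumed true rate of 0.3.
data_rng = random.Random(1)
data = [1 if data_rng.random() < 0.3 else 0 for _ in range(10_000)]

a, b = svi_beta_bernoulli(data)
svi_mean = a / (a + b)

# Exact conjugate posterior, available only because this model is a toy.
exact_a = 1.0 + sum(data)
exact_b = 1.0 + len(data) - sum(data)
exact_mean = exact_a / (exact_a + exact_b)

print(f"SVI posterior mean:   {svi_mean:.4f}")
print(f"exact posterior mean: {exact_mean:.4f}")
```

Each iteration touches only `batch_size` points instead of all $N$, and the shrinking step size averages out the subsampling noise, so the estimate should settle near the exact posterior.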

Theoretically, this process should go on forever with increasingly smaller step sizes, according to the constraints stated above. In practice, however, it ends when a stopping criterion is met, which should indicate that the ELBO has converged.
SVI is a stochastic optimization algorithm originally developed for fully factorized approximations (MFVI) [11] and later extended to support models with arbitrary dependencies between global and local variables [10].

## Minimizing the Forward KL Divergence

Unlike in Sect. 3.2.1, we now employ the forward KL divergence $D_{K L}(p \| q)$ to measure the quality of the approximation. This change in the ordering of the arguments is the reason why ADF (and EP in Sect. 3.2.3) behaves so differently from VI. KL is a divergence and not a distance, so the symmetry property does not hold, and exchanging the arguments leads to a distinct functional with distinct properties.

The reverse KL divergence $D_{K L}(q \| p)$ used in VI severely penalizes the approximating distribution $q$ for placing mass in regions where $p$ has low probability. Rewriting Eq. (3.4) as

$$D_{K L}(q \| p)=\mathbb{E}_q[\log q(x)]-\mathbb{E}_q[\log p(x)],$$

we can note that the term $\log p(x)$ rapidly tends to $-\infty$ in such regions. Conversely, by exchanging $p$ and $q$ in Eqs. (3.4) and (3.34) we get

$$D_{K L}(p \| q)=\mathbb{E}_p[\log p(x)]-\mathbb{E}_p[\log q(x)]$$

The forward KL has the opposite behavior, that is, it favors spreading the mass of $q$ over the support of $p$. Even low-probability regions of $p$ must be assigned mass in $q$; otherwise, samples from $p(x)$ would fall where $\log q(x)$ tends to $-\infty$. Figure 3.9 neatly illustrates this property for both KL forms.
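
The contrast between the two orderings can be checked numerically. The sketch below is illustrative only and does not reproduce Figure 3.9. It assumes a bimodal target $p$, an equal mixture of $\mathcal{N}(-3,1)$ and $\mathcal{N}(3,1)$, and two hypothetical Gaussian candidates for $q$: a narrow one sitting on a single mode and a wide, moment-matched one covering both. The divergences are estimated with a simple midpoint sum on a grid.

```python
import math


def log_normal(x, mu, sigma):
    """Log-density of a univariate Gaussian N(mu, sigma^2)."""
    return -0.5 * math.log(2.0 * math.pi * sigma**2) - 0.5 * ((x - mu) / sigma) ** 2


def log_p(x):
    """Assumed bimodal target: equal mixture of N(-3, 1) and N(3, 1)."""
    la = math.log(0.5) + log_normal(x, -3.0, 1.0)
    lb = math.log(0.5) + log_normal(x, 3.0, 1.0)
    m = max(la, lb)  # log-sum-exp for numerical stability
    return m + math.log(math.exp(la - m) + math.exp(lb - m))


def log_q_narrow(x):
    """Candidate that sits on the right-hand mode only."""
    return log_normal(x, 3.0, 1.0)


def log_q_wide(x):
    """Candidate matching the mean (0) and variance (10) of the mixture."""
    return log_normal(x, 0.0, math.sqrt(10.0))


def kl(log_a, log_b, lo=-15.0, hi=15.0, steps=6000):
    """Midpoint-rule estimate of D_KL(a || b) = E_a[log a(x) - log b(x)]."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        la = log_a(x)
        total += math.exp(la) * (la - log_b(x)) * dx
    return total


reverse_narrow = kl(log_q_narrow, log_p)  # D_KL(q || p), q on one mode
reverse_wide = kl(log_q_wide, log_p)      # D_KL(q || p), q covering both modes
forward_narrow = kl(log_p, log_q_narrow)  # D_KL(p || q), q on one mode
forward_wide = kl(log_p, log_q_wide)      # D_KL(p || q), q covering both modes

print(f"reverse KL: narrow={reverse_narrow:.3f}  wide={reverse_wide:.3f}")
print(f"forward KL: narrow={forward_narrow:.3f}  wide={forward_wide:.3f}")
```

Under these assumptions the reverse KL should score the narrow candidate better, since the wide one places mass in the low-probability valley between the modes. The forward KL should strongly prefer the wide candidate, since the narrow one leaves an entire mode of $p$ almost uncovered.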
