# Variational Methods (Math595)

## Black Box Variational Inference

As seen in Sect. 3.2.1.5, SVI computes the distribution updates in closed form, which requires model-specific knowledge and implementation; moreover, the gradient of the ELBO must have a closed-form analytical expression. Black Box Variational Inference (BBVI) [25] avoids both problems by estimating the gradient instead of computing it analytically.
BBVI uses the score function estimator [34]
$$\nabla_\phi \mathbb{E}_{q(\mathbf{z} ; \phi)}[f(\mathbf{z} ; \theta)]=\mathbb{E}_{q(\mathbf{z} ; \phi)}\left[f(\mathbf{z} ; \theta) \nabla_\phi \log q(\mathbf{z} ; \phi)\right],$$
where the approximating distribution $q(\mathbf{z} ; \phi)$ is a continuous function of $\phi$ (see Appendix A.1). Using this estimator to compute the gradient of the ELBO in Eq. (3.7) gives us
$$\nabla_\phi \mathrm{ELBO}=\mathbb{E}_{q}\left[\left(\nabla_\phi \log q(\mathbf{z} ; \phi)\right)(\log p(\mathbf{x}, \mathbf{z})-\log q(\mathbf{z} ; \phi))\right] .$$
The expectation in Eq. (3.77) is approximated by Monte Carlo integration.
The gradient estimator in Eq. (3.77) makes a single assumption about the model: that the log of the joint $p(\mathbf{x}, \mathbf{z})$ can be evaluated at the samples $\mathbf{z}_s$. Both the sampling method and the gradient of the log depend only on the variational distribution $q$. Thus, we can derive them once for each approximating family $q$ and reuse them across different models $p(\mathbf{x}, \mathbf{z})$. Hence the name black box: we only need to specify the model $p(\mathbf{x}, \mathbf{z})$ and can directly perform VI on it. In fact, $p(\mathbf{x}, \mathbf{z})$ need not even be normalized, since the log of the normalization constant does not contribute to the gradient in Eq. (3.77).
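As a minimal sketch of this Monte Carlo score-function estimator, consider a toy conjugate-Gaussian model of our own choosing (not from the text): a $\mathcal{N}(0,1)$ prior on $z$, a $\mathcal{N}(z,1)$ likelihood for $x$, and the approximating family $q(z;\phi)=\mathcal{N}(\phi,1)$, whose score is simply $z-\phi$:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_joint(x, z):
    # Toy model (our assumption): z ~ N(0,1) prior, x ~ N(z,1) likelihood.
    return -0.5 * (z**2 + (x - z)**2) - np.log(2 * np.pi)

def elbo_grad_score(x, phi, n_samples=10_000):
    # q(z; phi) = N(phi, 1), so the score is d/dphi log q = z - phi.
    z = rng.normal(phi, 1.0, size=n_samples)
    log_q = -0.5 * (z - phi)**2 - 0.5 * np.log(2 * np.pi)
    score = z - phi
    # Monte Carlo estimate of Eq. (3.77).
    return np.mean(score * (log_joint(x, z) - log_q))

# With x = 2 the exact posterior is N(1, 0.5); at phi = 0 the exact
# ELBO gradient is +2, pushing phi toward the posterior mean.
g = elbo_grad_score(x=2.0, phi=0.0)
```

Only `log_joint` is model-specific; the sampler and the score depend on $q$ alone, which is the black-box property described above. Note also the estimator's high variance: in practice BBVI relies on variance-reduction techniques such as control variates and Rao-Blackwellization [25].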

## Black Box α Minimization

Black Box $\alpha$ minimization (BB-$\alpha$) [9] optimizes an approximation of the power EP energy function $[19,20]$. Instead of maintaining a different local compatibility function $\widetilde{f}_i$ per factor, it ties them together so that all $\widetilde{f}_i$ are equal, that is, $\widetilde{f}_i=\widetilde{f}$. We may view $\widetilde{f}$ as an average factor approximation, used to capture the average effect of the original factors $f_i$ [9].

Further restricting these factors to the exponential family amounts to tying their natural parameters. Consequently, BB-$\alpha$ no longer needs to store one approximating site per likelihood factor, which yields significant memory savings on large data sets. Its fixed points differ from those of power EP, though the two coincide in the limit of infinite data.
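The tying step can be sketched as follows; the prior $p_0$ and the factor count $N$ are notation assumed here, not taken from this excerpt:

```latex
% Power EP: one approximating site per likelihood factor
q(\mathbf{z}) \propto p_0(\mathbf{z}) \prod_{i=1}^{N} \widetilde{f}_i(\mathbf{z})
% BB-\alpha: tie the sites, \widetilde{f}_i = \widetilde{f}, so that
q(\mathbf{z}) \propto p_0(\mathbf{z})\, \widetilde{f}(\mathbf{z})^{N}
```

With $\widetilde{f}$ in the exponential family, the tied form stores a single set of natural parameters instead of $N$, which is the memory saving noted above.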
BB-$\alpha$ dispenses with double-loop algorithms: it minimizes the energy directly with gradient-descent methods, in contrast to the iterative update scheme of Sect. 3.2.3. Like other modern methods designed for large-scale learning, it employs stochastic optimization to avoid cycling through the whole data set. In addition, it estimates the expectation over the approximating distribution $q$ in the energy function by Monte Carlo sampling.

Differently from BBVI [25], BB-$\alpha$ uses the pathwise derivative estimator [24] to estimate the gradient (see Appendix A.1). We must be able to express the random variable $\mathbf{z} \sim q(\mathbf{z} ; \phi)$ as an invertible deterministic transformation $g(\cdot ; \phi)$ of a base random variable $\epsilon \sim p(\epsilon)$, so that we can write
$$\nabla_\phi \mathbb{E}_{q(\mathbf{z} ; \phi)}[f(\mathbf{z} ; \theta)]=\mathbb{E}_{p(\epsilon)}\left[\nabla_\phi f(g(\epsilon ; \phi) ; \theta)\right] .$$
The approach requires not only that the distribution $q(\mathbf{z} ; \phi)$ be reparameterizable but also that $f(\mathbf{z} ; \theta)$ be known and a continuous function of $\phi$ for all values of $\mathbf{z}$. Note that the estimator requires, in addition to the likelihood function, its gradients. Still, we can readily obtain these with automatic differentiation tools whenever the likelihood is analytically defined and differentiable.
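A minimal sketch of the pathwise estimator, reusing the same toy Gaussian model assumed earlier (our choice, not from the text): with $q(z;\phi)=\mathcal{N}(\phi,1)$, the transformation is $z = g(\epsilon;\phi) = \phi + \epsilon$ with $\epsilon \sim \mathcal{N}(0,1)$, so $\nabla_\phi f(g(\epsilon;\phi)) = f'(z)$:

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_log_joint(z, x=2.0):
    # d/dz of the toy log-joint log N(z; 0, 1) + log N(x; z, 1):
    # the estimator needs the model's gradient, as noted in the text.
    return -z + (x - z)

def elbo_grad_pathwise(phi, n_samples=10_000):
    # Reparameterize z ~ N(phi, 1) as z = g(eps; phi) = phi + eps,
    # eps ~ N(0, 1). Since dz/dphi = 1, the pathwise gradient of
    # E_q[log p(x, z)] is just E_eps[f'(phi + eps)]. (The entropy of
    # q does not depend on phi here, so it drops out of the gradient.)
    eps = rng.normal(size=n_samples)
    z = phi + eps
    return np.mean(grad_log_joint(z))

# Same setup as before: exact gradient at phi = 0 is +2.
g = elbo_grad_pathwise(phi=0.0)
```

The estimate agrees with the score-function estimator in expectation but typically has much lower variance, which is why reparameterization is preferred whenever the model's gradients are available.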

As observed in Sect. 3.2.3, the parameter $\alpha$ in Eq. (3.68) controls the divergence function. Hence, the method interpolates between VI $(\alpha \rightarrow-1)$ and an algorithm similar to EP $(\alpha \rightarrow 1)$. Interestingly, the authors [9] report usually obtaining the best results by setting $\alpha=0$, halfway between VI and EP. This value corresponds to the so-called Hellinger distance, the only symmetric member of the $\alpha$-family.
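One common parameterization consistent with the limits quoted above is Amari's $\alpha$-divergence; this is a sketch, since whether Eq. (3.68) uses exactly this convention cannot be checked from this excerpt:

```latex
D_\alpha(p \,\|\, q)
  = \frac{4}{1-\alpha^2}
    \left(1 - \int p(\mathbf{z})^{\frac{1+\alpha}{2}}\,
                   q(\mathbf{z})^{\frac{1-\alpha}{2}}\, d\mathbf{z}\right)
```

In the limit $\alpha \to -1$ this recovers $\mathrm{KL}(q \,\|\, p)$, the VI objective; $\alpha \to 1$ gives $\mathrm{KL}(p \,\|\, q)$, as in EP; and $\alpha = 0$ yields $2\int(\sqrt{p}-\sqrt{q})^2\, d\mathbf{z}$, proportional to the squared Hellinger distance, which is symmetric in $p$ and $q$.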
