# Machine learning (CS7641)

## Exponential family factor analysis

So far we have assumed the observed data is real-valued, so $x_n \in \mathbb{R}^D$. If we want to model other kinds of data (e.g., binary or categorical), we can simply replace the Gaussian output distribution with a suitable member of the exponential family, where the natural parameters are given by a linear function of $\boldsymbol{z}_n$. That is, we use
$$p\left(\boldsymbol{x}_n \mid \boldsymbol{z}_n\right)=\exp \left(\mathcal{T}\left(\boldsymbol{x}_n\right)^{\top} \boldsymbol{\theta}_n+h\left(\boldsymbol{x}_n\right)-g\left(\boldsymbol{\theta}_n\right)\right)$$
where the $N \times D$ matrix of natural parameters is assumed to be given by the low rank decomposition $\Theta=\mathbf{Z W}$, where $\mathbf{Z}$ is $N \times L$ and $\mathbf{W}$ is $L \times D$. The resulting model is called exponential family factor analysis.
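As a concrete instance, here is a minimal sketch of this likelihood for binary data, i.e., a Bernoulli output, for which $\mathcal{T}(x)=x$, $h(x)=0$, and $g(\theta)=\log(1+e^{\theta})$. The natural parameters are given by the low-rank product $\boldsymbol{\Theta}=\mathbf{Z}\mathbf{W}$; all sizes and the random data below are illustrative, not from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

N, D, L = 100, 20, 3
Z = rng.normal(size=(N, L))   # latent factors (one row per data case)
W = rng.normal(size=(L, D))   # factor loadings
Theta = Z @ W                 # N x D matrix of natural parameters

# Bernoulli member of the exponential family:
#   log p(x | theta) = x * theta - log(1 + exp(theta))
# i.e. T(x) = x, h(x) = 0, g(theta) = log(1 + exp(theta)).
def bernoulli_log_lik(X, Theta):
    return np.sum(X * Theta - np.logaddexp(0.0, Theta))

# Sample binary data from the model via the mean parameters sigmoid(Theta)
probs = 1.0 / (1.0 + np.exp(-Theta))
X = (rng.uniform(size=(N, D)) < probs).astype(float)
ll = bernoulli_log_lik(X, Theta)
```

Note the use of `np.logaddexp(0.0, Theta)` for a numerically stable $\log(1+e^{\theta})$.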

Unlike in the linear-Gaussian FA model, we cannot compute the exact posterior $p\left(\boldsymbol{z}_n \mid \boldsymbol{x}_n, \mathbf{W}\right)$, due to the lack of conjugacy between the expfam likelihood and the Gaussian prior. Furthermore, we cannot compute the exact marginal likelihood either, which prevents us from computing the MLE of the parameters.
[CDS02] proposed a coordinate ascent method for a deterministic variant of this model, known as exponential family PCA. It alternates between computing point estimates of $\boldsymbol{z}_n$ and $\mathbf{W}$. This can be regarded as a degenerate version of variational EM, where the E step uses a delta function posterior for $\boldsymbol{z}_n$. [GS08] presents an improved algorithm that finds the global optimum, and [Ude+16] presents an extension called generalized low rank models, which covers many different kinds of loss function.
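The alternating scheme can be sketched as follows for the Bernoulli case. For simplicity this uses alternating gradient steps on $\mathbf{Z}$ and $\mathbf{W}$ rather than exact coordinate-wise maximization; the data, step size, and iteration count are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
N, D, L = 50, 10, 2
X = (rng.uniform(size=(N, D)) < 0.5).astype(float)  # synthetic binary data

Z = 0.01 * rng.normal(size=(N, L))   # point estimates of latent factors
W = 0.01 * rng.normal(size=(L, D))   # point estimate of loadings

def neg_log_lik(Z, W):
    Theta = Z @ W
    return np.sum(np.logaddexp(0.0, Theta) - X * Theta)

lr = 0.01
losses = []
for it in range(300):
    # gradient of the negative log likelihood w.r.t. Theta is sigmoid(Theta) - X
    G = 1.0 / (1.0 + np.exp(-(Z @ W))) - X
    Z = Z - lr * G @ W.T                 # update Z with W held fixed
    G = 1.0 / (1.0 + np.exp(-(Z @ W))) - X
    W = W - lr * Z.T @ G                 # update W with Z held fixed
    losses.append(neg_log_lik(Z, W))
```

The negative log likelihood decreases monotonically for a small enough step size, mirroring the E-step/M-step alternation described above.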

However, it is often preferable to use a probabilistic version of the model, rather than computing point estimates of the latent factors. In this case, we must represent the posterior using a non-degenerate distribution, to avoid overfitting, since the number of latent variables is proportional to the number of data cases [WCS08]. Fortunately, we can use a non-degenerate posterior, such as a Gaussian, by optimizing the variational lower bound. We give some examples of this below.
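For instance, with a Gaussian variational posterior $q(\boldsymbol{z}_n)=\mathcal{N}(\boldsymbol{\mu}_n, \mathrm{diag}(\boldsymbol{\sigma}_n^2))$ and a Bernoulli likelihood, a Monte Carlo estimate of the variational lower bound (ELBO) for one data case can be sketched as follows; the loadings and data are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
D, L = 10, 2
W = rng.normal(size=(L, D))                    # loadings (illustrative)
x = (rng.uniform(size=D) < 0.5).astype(float)  # one binary data case

mu = np.zeros(L)          # variational mean
log_sigma = np.zeros(L)   # log of variational standard deviations

def elbo_estimate(mu, log_sigma, n_samples=2000):
    sigma = np.exp(log_sigma)
    eps = rng.normal(size=(n_samples, L))
    z = mu + sigma * eps                       # reparameterization trick
    theta = z @ W                              # natural parameters
    # Bernoulli log likelihood log p(x | z), summed over dimensions
    log_lik = np.sum(x * theta - np.logaddexp(0.0, theta), axis=1)
    # KL(q || p) between diagonal Gaussian q and standard normal prior (closed form)
    kl = 0.5 * np.sum(sigma ** 2 + mu ** 2 - 1.0 - 2.0 * log_sigma)
    return log_lik.mean() - kl

elbo = elbo_estimate(mu, log_sigma)
```

Maximizing this bound over $(\boldsymbol{\mu}_n, \log \boldsymbol{\sigma}_n)$ and $\mathbf{W}$ (e.g., by stochastic gradient ascent) keeps the posterior non-degenerate, unlike the delta-function E step of exponential family PCA.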

## Bottleneck autoencoders

We start by considering the special case of a linear autoencoder, in which there is one hidden layer, the hidden units are computed using $\boldsymbol{z}=\mathbf{W}_1 \boldsymbol{x}$, and the output is reconstructed using $\hat{\boldsymbol{x}}=\mathbf{W}_2 \boldsymbol{z}$, where $\mathbf{W}_1$ is an $L \times D$ matrix, $\mathbf{W}_2$ is a $D \times L$ matrix, and $L<D$. If we train this model to minimize the squared reconstruction error, $\mathcal{L}(\mathbf{W})=\sum_{n=1}^N\left\|\boldsymbol{x}_n-\mathbf{W} \boldsymbol{x}_n\right\|_2^2$, where $\mathbf{W}=\mathbf{W}_2 \mathbf{W}_1$, one can show [BH89; KJ95] that $\hat{\mathbf{W}}$ is an orthogonal projection onto the first $L$ eigenvectors of the empirical covariance matrix of the data. This is therefore equivalent to PCA.
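This equivalence is easy to check numerically. The following sketch (with illustrative random data) builds the optimal rank-$L$ linear autoencoder from the top $L$ eigenvectors of the empirical covariance, and verifies that its reconstruction error equals $N$ times the sum of the discarded eigenvalues, as PCA predicts:

```python
import numpy as np

rng = np.random.default_rng(3)
N, D, L = 200, 5, 2
X = rng.normal(size=(N, D)) @ rng.normal(size=(D, D))  # correlated data
X = X - X.mean(axis=0)                                 # center the data

# Empirical covariance and its top-L eigenvectors
C = X.T @ X / N
eigvals, eigvecs = np.linalg.eigh(C)    # eigenvalues in ascending order
U = eigvecs[:, ::-1][:, :L]             # D x L matrix of leading eigenvectors

W1 = U.T                # encoder weights, L x D
W2 = U                  # decoder weights, D x L
W = W2 @ W1             # rank-L orthogonal projection

recon_err = np.sum((X - X @ W.T) ** 2)
# PCA predicts the error equals N times the sum of the discarded eigenvalues
expected = N * np.sum(eigvals[: D - L])
```

Because $\mathbf{W}=\mathbf{U}\mathbf{U}^{\top}$ is an orthogonal projection, it is symmetric and idempotent, and no other rank-$L$ linear map achieves a lower squared error.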

If we introduce nonlinearities into the autoencoder, we get a model that is strictly more powerful than PCA, as proved in [JHG00]. Such methods can learn very useful low dimensional representations of data.

Consider fitting an autoencoder to the Fashion MNIST dataset. We consider both an MLP architecture (with 2 layers and a bottleneck of size 30) and a CNN-based architecture (with 3 layers and a 3d bottleneck with 64 channels). We use a Bernoulli likelihood model, with binary cross entropy as the loss. Figure $20.17$ shows some test images and their reconstructions. We see that the CNN model reconstructs the images more accurately than the MLP model. However, both models are small, and were only trained for 5 epochs; results can be improved by using larger models and training for longer.
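To make the setup concrete, here is a minimal sketch of such an autoencoder: a one-hidden-layer bottleneck network with a tanh encoder, a sigmoid decoder, and binary cross entropy loss, trained by plain gradient descent. The small synthetic binary "images" stand in for Fashion MNIST, and the sizes, learning rate, and epoch count are all illustrative, not those of the experiment above:

```python
import numpy as np

rng = np.random.default_rng(4)
N, D, L = 100, 64, 8      # 100 tiny 8x8 binary "images", bottleneck of size 8
X = (rng.uniform(size=(N, D)) < rng.uniform(size=D)).astype(float)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-np.clip(a, -30.0, 30.0)))

W_enc = 0.01 * rng.normal(size=(D, L))   # encoder weights
W_dec = 0.01 * rng.normal(size=(L, D))   # decoder weights

def bce(X, P, eps=1e-9):
    # mean binary cross entropy between data X and reconstruction probs P
    return -np.mean(X * np.log(P + eps) + (1 - X) * np.log(1 - P + eps))

lr = 0.5
losses = []
for epoch in range(300):
    Z = np.tanh(X @ W_enc)       # nonlinear bottleneck codes
    P = sigmoid(Z @ W_dec)       # Bernoulli reconstruction probabilities
    losses.append(bce(X, P))
    G = (P - X) / (N * D)        # gradient of mean BCE w.r.t. decoder logits
    dW_dec = Z.T @ G
    dW_enc = X.T @ ((G @ W_dec.T) * (1.0 - Z ** 2))  # backprop through tanh
    W_dec -= lr * dW_dec
    W_enc -= lr * dW_enc
```

The sigmoid-plus-BCE pairing makes the logit gradient simply $P-X$, which is why the Bernoulli likelihood view and the cross entropy loss view coincide.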

Figure $20.18$ visualizes the first 2 (of 30) latent dimensions produced by the MLP-AE. More precisely, we plot the tSNE embeddings (see Section 20.4.10), color coded by class label. We also show some corresponding images from the dataset, from which the embeddings were derived. We see that the method has done a good job of separating the classes in a fully unsupervised way. We also see that the latent spaces of the MLP and CNN models are very similar (at least when viewed through this 2d projection).
