Sparse autoencoders

Yet another way to regularize autoencoders is to add a sparsity penalty to the latent activations of the form $\Omega(\boldsymbol{z})=\lambda\|\boldsymbol{z}\|_1$. (This is called activity regularization.)
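As a minimal sketch of activity regularization, assume the latent codes of a minibatch are stored in a NumPy array `Z` of shape `(N, L)`. The function name `l1_activity_penalty` and the choice to average over the batch are illustrative assumptions, not something the text specifies.

```python
import numpy as np

def l1_activity_penalty(Z, lam):
    """Return lam * ||z||_1, averaged over the N examples in the minibatch.

    Z   : array of shape (N, L), latent activations (assumed layout).
    lam : non-negative penalty strength (the lambda in the text).
    """
    Z = np.asarray(Z, dtype=float)
    # ||z_n||_1 for each example, then the batch mean.
    return lam * np.abs(Z).sum(axis=1).mean()

Z = np.array([[0.5, -1.0, 0.0],
              [0.0,  2.0, 0.5]])
penalty = l1_activity_penalty(Z, lam=0.1)
```

In training, this value would be added to the reconstruction loss, so gradient descent pushes individual activations toward exactly zero.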

An alternative way to implement sparsity, which often gives better results, is to use logistic units, and then to compute the expected fraction of time each unit $k$ is on within a minibatch (call this $q_k$), and ensure that this is close to a desired target value $p$, as proposed in [GBB11]. In particular, we use the regularizer $\Omega\left(\boldsymbol{z}_{1: L, 1: N}\right)=\lambda \sum_k D_{\mathrm{KL}}\left(\boldsymbol{p} \,\|\, \boldsymbol{q}_k\right)$ for latent dimensions $1: L$ and examples $1: N$, where $\boldsymbol{p}=(p, 1-p)$ is the desired target distribution, and $\boldsymbol{q}_k=\left(q_k, 1-q_k\right)$ is the empirical distribution for unit $k$, computed using $q_k=\frac{1}{N} \sum_{n=1}^N \mathbb{I}\left(z_{n, k}=1\right)$.
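The KL regularizer can be sketched as follows. One assumption is made loudly here: instead of the indicator-based count in the definition of $q_k$, the code uses the mean sigmoid activation of each unit as a soft, differentiable stand-in. The function names and the clipping constant `eps` are also illustrative.

```python
import numpy as np

def kl_bernoulli(p, q, eps=1e-8):
    """KL divergence between Bernoulli(p) and Bernoulli(q), elementwise in q."""
    q = np.clip(q, eps, 1.0 - eps)  # avoid log(0)
    return p * np.log(p / q) + (1.0 - p) * np.log((1.0 - p) / (1.0 - q))

def kl_sparsity_penalty(Z, p=0.1, lam=1.0):
    """lam * sum_k KL(p || q_k), with q_k the mean activation of unit k.

    Z : array of shape (N, L) holding logistic (sigmoid) activations in [0, 1].
    p : target firing rate, assumed to lie strictly between 0 and 1.
    """
    q = np.asarray(Z, dtype=float).mean(axis=0)  # q_k, one value per latent unit
    return lam * kl_bernoulli(p, q).sum()

rng = np.random.default_rng(0)
Z = 1.0 / (1.0 + np.exp(-rng.normal(size=(64, 5))))  # toy sigmoid activations
penalty = kl_sparsity_penalty(Z, p=0.1, lam=1.0)

# When every unit already fires at the target rate, the penalty vanishes.
on_target = kl_sparsity_penalty(np.full((64, 5), 0.1), p=0.1)
```

Unlike the $\ell_1$ penalty, this term is minimized when each unit's average activation equals $p$, not when it is zero, so it penalizes units that never fire as well as units that fire too often.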

Figure 20.21 shows the results when fitting an AE-MLP (with 300 hidden units) to Fashion MNIST. If we set $\lambda=0$ (i.e., if we don't impose a sparsity penalty), we see that the average activation value is about 0.4, with most neurons being partially activated most of the time. With the $\ell_1$ penalty, we see that most units are off all the time, which means they are not being used at all. With the KL penalty, we see that about 70% of neurons are off on average, but unlike the $\ell_1$ case, we don't see units being permanently turned off (the average activation level is 0.1). This latter kind of sparse firing pattern is similar to that observed in biological brains (see e.g., [Bey+19]).

Variational autoencoders

In this section, we discuss the variational autoencoder or VAE [KW14; RMW14; KW19a], which can be thought of as a probabilistic version of a deterministic autoencoder (Section 20.3). The principal advantage is that a VAE is a generative model that can create new samples, whereas an autoencoder just computes embeddings of input vectors.

We discuss VAEs in detail in the sequel to this book, [Mur22]. In brief, however, the VAE combines two key ideas. First, we create a non-linear extension of the factor analysis generative model, i.e., we replace $p(\boldsymbol{x} \mid \boldsymbol{z})=\mathcal{N}\left(\boldsymbol{x} \mid \mathbf{W} \boldsymbol{z}, \sigma^2 \mathbf{I}\right)$ with
$$p_{\boldsymbol{\theta}}(\boldsymbol{x} \mid \boldsymbol{z})=\mathcal{N}\left(\boldsymbol{x} \mid f_d(\boldsymbol{z} ; \boldsymbol{\theta}), \sigma^2 \mathbf{I}\right)$$

where $f_d$ is the decoder. For binary observations we should use a Bernoulli likelihood:
$$p(\boldsymbol{x} \mid \boldsymbol{z}, \boldsymbol{\theta})=\prod_{i=1}^D \operatorname{Ber}\left(x_i \mid f_d(\boldsymbol{z} ; \boldsymbol{\theta})_i\right)$$
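To make the two decoder likelihoods concrete, here is a minimal sketch of their log-densities. The arrays `mean` and `logits` are hypothetical stand-ins for the decoder output $f_d(\boldsymbol{z} ; \boldsymbol{\theta})$; for the Bernoulli case the code assumes the decoder emits one logit per dimension, which is passed through a sigmoid to obtain a probability.

```python
import numpy as np

def gaussian_log_lik(x, mean, sigma2):
    """log N(x | mean, sigma2 * I) for a D-dimensional vector x."""
    D = x.shape[-1]
    sq_err = np.sum((x - mean) ** 2, axis=-1)
    return -0.5 * (D * np.log(2.0 * np.pi * sigma2) + sq_err / sigma2)

def bernoulli_log_lik(x, logits):
    """sum_i log Ber(x_i | sigmoid(logits_i)) for binary x."""
    # log sigmoid(a) = -log(1 + exp(-a)), computed stably with logaddexp.
    log_p = -np.logaddexp(0.0, -logits)
    log_1mp = -np.logaddexp(0.0, logits)
    return np.sum(x * log_p + (1.0 - x) * log_1mp, axis=-1)

x = np.array([1.0, 0.0, 1.0])
ll_gauss = gaussian_log_lik(x, mean=x, sigma2=1.0)   # perfect reconstruction
ll_bern = bernoulli_log_lik(x, logits=np.zeros(3))   # every probability is 0.5
```

With fixed $\sigma^2$, maximizing the Gaussian log-likelihood amounts to minimizing squared reconstruction error, while the Bernoulli case corresponds to the binary cross-entropy loss.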
Second, we create another model, $q(\boldsymbol{z} \mid \boldsymbol{x})$, called the recognition network or inference network, that is trained simultaneously with the generative model to do approximate posterior inference. If we assume the posterior is Gaussian, with diagonal covariance, we get
$$q_{\boldsymbol{\phi}}(\boldsymbol{z} \mid \boldsymbol{x})=\mathcal{N}\left(\boldsymbol{z} \mid f_{e, \mu}(\boldsymbol{x} ; \boldsymbol{\phi}), \operatorname{diag}\left(f_{e, \sigma}(\boldsymbol{x} ; \boldsymbol{\phi})\right)\right)$$
where $f_e$ is the encoder. See Figure 20.22 for a sketch.
The idea of training an inference network to “invert” a generative network, rather than running an optimization algorithm to infer the latent code, is called amortized inference.
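The following sketch illustrates what amortized inference looks like in code: a single forward pass through an encoder yields the parameters of $q_{\boldsymbol{\phi}}(\boldsymbol{z} \mid \boldsymbol{x})$ for any input, with no per-example optimization. Everything specific here is an assumption made for illustration: the one-hidden-layer MLP, the parameter names in `phi`, the random untrained weights, and the use of `exp` so that the second head outputs positive variances.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H, L = 8, 16, 2  # input, hidden, and latent sizes (arbitrary toy values)

# Hypothetical encoder parameters phi for a one-hidden-layer MLP (untrained).
phi = {
    "W1": rng.normal(scale=0.1, size=(D, H)), "b1": np.zeros(H),
    "W_mu": rng.normal(scale=0.1, size=(H, L)), "b_mu": np.zeros(L),
    "W_s": rng.normal(scale=0.1, size=(H, L)), "b_s": np.zeros(L),
}

def encode(x, phi):
    """Map x to the mean and diagonal variances of q(z | x)."""
    h = np.tanh(x @ phi["W1"] + phi["b1"])
    mu = h @ phi["W_mu"] + phi["b_mu"]
    # exp keeps the variances positive (an assumed parameterization).
    var = np.exp(h @ phi["W_s"] + phi["b_s"])
    return mu, var

def sample_posterior(x, phi, rng):
    """Draw z ~ N(mu(x), diag(var(x))) using one forward pass."""
    mu, var = encode(x, phi)
    return mu + np.sqrt(var) * rng.normal(size=mu.shape)

x = rng.normal(size=(4, D))       # a toy minibatch of 4 inputs
mu, var = encode(x, phi)
z = sample_posterior(x, phi, rng)
```

The cost of inference for a new input is just the cost of `encode`, which is what "amortized" refers to: the expense of fitting the posterior is paid once, when the encoder is trained.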
