# Machine Learning (CS446)

## Mixture of linear experts

In this section, we consider a simple example in which we use linear regression experts and a linear classification gating function, i.e., the model has the form:
$$
\begin{aligned}
p(y \mid \boldsymbol{x}, z=k, \boldsymbol{\theta}) &= \mathcal{N}\left(y \mid \boldsymbol{w}_k^{\top} \boldsymbol{x}, \sigma_k^2\right) \\
p(z=k \mid \boldsymbol{x}, \boldsymbol{\theta}) &= \operatorname{Cat}\left(z \mid \mathcal{S}_k(\mathbf{V} \boldsymbol{x})\right)
\end{aligned}
$$
where $\mathcal{S}_k$ is the $k$'th output of the softmax function. The individual weighting term $p(z=k \mid \boldsymbol{x})$ is called the responsibility of expert $k$ for input $\boldsymbol{x}$. In Figure 13.23b, we see how the gating network softly partitions the input space amongst the $K=3$ experts.

Each expert $p(y \mid \boldsymbol{x}, z=k)$ corresponds to a linear regression model with different parameters. These are shown in Figure 13.23c.

If we take a weighted combination of the experts as our output, we get the red curve in Figure 13.23a, which is clearly a bad predictor. If instead we only predict using the most active expert (i.e., the one with the highest responsibility), we get the discontinuous black curve, which is a much better predictor.
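The two prediction rules can be compared in a few lines of NumPy. This is only a minimal sketch: the parameter values `W` and `V`, the helper names `softmax` and `moe_predict`, and the choice of inputs of the form $[1, x]$ are assumptions made for illustration, and it does not reproduce the data behind Figure 13.23.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

def moe_predict(X, W, V):
    """X: (N, D) inputs; W: (K, D) expert weights w_k; V: (K, D) gating weights.

    Returns the responsibilities (N, K), the responsibility-weighted
    combination of expert means (N,), and the mean of the most active
    expert (N,).
    """
    resp = softmax(X @ V.T)                      # p(z=k | x) = S_k(V x)
    expert_means = X @ W.T                       # w_k^T x for every expert k
    weighted = (resp * expert_means).sum(axis=1)
    top = resp.argmax(axis=1)                    # index of the most active expert
    hard = expert_means[np.arange(len(X)), top]
    return resp, weighted, hard

# Hypothetical parameters (not from the source): each input is [1, x],
# so every expert and every gating row has a bias and a slope.
x = np.linspace(-1.0, 1.0, 5)
X = np.stack([np.ones_like(x), x], axis=1)
W = np.array([[0.0, -1.0], [0.5, 0.0], [0.0, 1.0]])    # K = 3 linear experts
V = np.array([[0.0, -10.0], [2.0, 0.0], [0.0, 10.0]])  # gating weights

resp, weighted, hard = moe_predict(X, W, V)
```

With these made-up parameters, the middle expert has the highest responsibility at $x=0$, so `hard` returns its mean of 0.5 there, while `weighted` is pulled towards the other two experts' means of 0.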

## Neural Networks for Images

To see why it is not a good idea to apply MLPs directly to image data, recall that the core operation in an MLP at each hidden layer is computing the activations $\boldsymbol{z}=\varphi(\mathbf{W} \boldsymbol{x})$, where $\boldsymbol{x}$ is the input to a layer, $\mathbf{W}$ are the weights, and $\varphi(\cdot)$ is the nonlinear activation function. Thus the $j$'th element of the hidden layer has value $z_j=\varphi\left(\boldsymbol{w}_j^{\top} \boldsymbol{x}\right)$. We can think of this inner product operation as comparing the input $\boldsymbol{x}$ to a learned template or pattern $\boldsymbol{w}_j$; if the match is good (large positive inner product), the activation of that unit will be large (assuming a ReLU nonlinearity), signalling that the $j$'th pattern is present in the input.
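The template-matching view of a hidden layer can be made concrete with a small NumPy sketch. The weight values, the input, and the helper names `relu` and `hidden_layer` are hypothetical, chosen only so that the arithmetic is easy to check by hand.

```python
import numpy as np

def relu(a):
    """Elementwise ReLU nonlinearity."""
    return np.maximum(a, 0.0)

def hidden_layer(W, x):
    """Compute z = relu(W x); row j of W plays the role of the template w_j."""
    return relu(W @ x)

# Three hypothetical templates over a 4-dimensional input.
W = np.array([[ 1.0,  1.0, 0.0, 0.0],   # matches mass in the first two entries
              [ 0.0,  0.0, 1.0, 1.0],   # matches mass in the last two entries
              [-1.0, -1.0, 0.0, 0.0]])  # anti-matches the first template
x = np.array([2.0, 3.0, 0.0, 0.0])

z = hidden_layer(W, x)  # inner products are [5, 0, -5]; ReLU gives [5, 0, 0]
```

Only the first unit fires: its template has a large positive inner product with `x`, the second template does not overlap the input at all, and the third has a negative inner product that the ReLU clips to zero.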

However, this does not work well if the input is a variable-sized image, $\boldsymbol{x} \in \mathbb{R}^{W H C}$, where $W$ is the width, $H$ is the height, and $C$ is the number of input channels (e.g., $C=3$ for RGB color). The problem is that we would need to learn a different-sized weight matrix $\mathbf{W}$ for every size of input image. In addition, even if the input were of fixed size, the number of parameters needed would be prohibitive for reasonably sized images, since the weight matrix would have size $(W \times H \times C) \times D$, where $D$ is the number of outputs (hidden units). The final problem is that a pattern that occurs in one location may not be recognized when it occurs in a different location (that is, the model may not exhibit translation invariance), because the weights are not shared across locations (see Figure 14.1).
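To get a feel for the $(W \times H \times C) \times D$ count, the arithmetic can be written out directly. The image sizes and the number of hidden units below are illustrative assumptions rather than values from the text, and biases are ignored.

```python
def dense_weight_count(width, height, channels, hidden_units):
    """Number of entries in a (W*H*C) x D weight matrix (biases ignored)."""
    return width * height * channels * hidden_units

# Hypothetical settings: two RGB image sizes, D = 1000 hidden units.
small = dense_weight_count(32, 32, 3, 1000)    # 3,072 inputs per image
large = dense_weight_count(224, 224, 3, 1000)  # 150,528 inputs per image

print(f"32x32 RGB:   {small:,} weights")
print(f"224x224 RGB: {large:,} weights")
```

Under these assumptions the first fully connected layer alone needs about 3 million weights for the small image and about 150 million for the larger one, and the two weight matrices have different shapes, so neither can be reused for the other image size.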

To solve these problems, we will use convolutional neural networks (CNNs), in which we replace matrix multiplication with a convolution operation. We explain this in detail in Section 14.2, but the basic idea is to divide the input into overlapping 2d image patches, and to compare each patch with a set of small weight matrices, or filters, which represent parts of an object; this is illustrated in Figure 14.2. We can think of this as a form of template matching. We will learn these templates from data, as we explain below. Because the templates are small (often just $3 \times 3$ or $5 \times 5$), the number of parameters is significantly reduced. And because we use convolution to do the template matching, instead of matrix multiplication, the model will be translation invariant. This is useful for tasks such as image classification, where the goal is to classify if an object is present, regardless of its location.
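The patch-by-patch template matching can be sketched with explicit loops in NumPy. This is an illustrative sketch, not an efficient or library implementation: the plus-shaped filter, the helper names `correlate2d_valid` and `image_with_plus`, and the 8x8 image size are all assumptions. Like most deep learning code, it computes cross-correlation (no kernel flip), which is what is usually meant by "convolution" in this setting.

```python
import numpy as np

def correlate2d_valid(image, kernel):
    """Slide `kernel` over every fully contained patch of `image` and
    return the map of inner products (cross-correlation, no padding)."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

# Hypothetical 3x3 template: a plus sign (9 weights, shared by all locations).
kernel = np.array([[0.0, 1.0, 0.0],
                   [1.0, 1.0, 1.0],
                   [0.0, 1.0, 0.0]])

def image_with_plus(top, left, size=8):
    """Blank size x size image with the plus pattern pasted at (top, left)."""
    img = np.zeros((size, size))
    img[top:top + 3, left:left + 3] = kernel
    return img

resp_a = correlate2d_valid(image_with_plus(1, 1), kernel)
resp_b = correlate2d_valid(image_with_plus(4, 3), kernel)

# The location of the strongest response follows the pattern...
peak_a = tuple(int(v) for v in np.unravel_index(resp_a.argmax(), resp_a.shape))
peak_b = tuple(int(v) for v in np.unravel_index(resp_b.argmax(), resp_b.shape))
# ...while the strongest response itself is the same in both images.
max_a, max_b = resp_a.max(), resp_b.max()
```

Here `peak_a` is `(1, 1)` and `peak_b` is `(4, 3)`, while `max_a` and `max_b` are both 5.0: the response map shifts with the pattern, and taking the maximum over the map gives a score that does not depend on where the pattern sits.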

CNNs have many other applications besides image classification, as we will discuss later in this chapter. They can also be applied to 1d inputs (see Section 15.3) and 3d inputs; however, we mostly focus on the 2d case in this chapter.
