# Math assignment help | Machine learning assignment and exam help | COMP3670

## MAP estimation with a Laplace prior

There are many ways to compute such sparse estimates (see e.g., [Bha+19]). In this section we focus on MAP estimation using the Laplace distribution (which we discussed in Section 11.6.1) as the prior:
$$p(\boldsymbol{w} \mid \lambda)=\prod_{d=1}^D \operatorname{Lap}\left(w_d \mid 0,1 / \lambda\right) \propto \prod_{d=1}^D e^{-\lambda\left|w_d\right|}$$
where $\lambda$ is the sparsity parameter, and
$$\operatorname{Lap}(w \mid \mu, b) \triangleq \frac{1}{2 b} \exp \left(-\frac{|w-\mu|}{b}\right)$$
Here $\mu$ is a location parameter and $b>0$ is a scale parameter. Figure 2.15 shows that $\operatorname{Lap}(w \mid 0, b)$ puts more density on 0 than $\mathcal{N}\left(w \mid 0, \sigma^2\right)$, even when we fix the variance to be the same.
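The density comparison can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and it relies on the standard fact that a Laplace distribution with scale $b$ has variance $2b^2$, so the matching Gaussian has $\sigma = \sqrt{2}\,b$.

```python
import math

def laplace_pdf(w, mu=0.0, b=1.0):
    """Lap(w | mu, b) = exp(-|w - mu| / b) / (2 b)."""
    return math.exp(-abs(w - mu) / b) / (2.0 * b)

def gaussian_pdf(w, mu=0.0, sigma=1.0):
    """N(w | mu, sigma^2)."""
    z = (w - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

b = 1.0
sigma = math.sqrt(2.0) * b  # match variances: Var[Laplace] = 2 b^2 = sigma^2

# Ratio of the two densities at w = 0; analytically this equals sqrt(pi).
ratio_at_zero = laplace_pdf(0.0, b=b) / gaussian_pdf(0.0, sigma=sigma)
print(ratio_at_zero)  # about 1.77
```

With equal variances, the Laplace density at zero is larger than the Gaussian density by a factor of $\sqrt{\pi}$, regardless of the value of $b$.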
To perform MAP estimation of a linear regression model with this prior, we just have to minimize the following objective:
$$\operatorname{PNLL}(\boldsymbol{w})=-\log p(\mathcal{D} \mid \boldsymbol{w})-\log p(\boldsymbol{w} \mid \lambda)=\|\mathbf{X} \boldsymbol{w}-\boldsymbol{y}\|_2^2+\lambda\|\boldsymbol{w}\|_1$$
where $\|\boldsymbol{w}\|_1 \triangleq \sum_{d=1}^D\left|w_d\right|$ is the $\ell_1$ norm of $\boldsymbol{w}$. This method is called lasso, which stands for “least absolute shrinkage and selection operator” [Tib96]. (We explain the reason for this name below.) More generally, MAP estimation with a Laplace prior is called $\ell_1$-regularization.
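As a minimal sketch of how this objective might be minimized, the code below uses proximal gradient descent (ISTA) with soft thresholding. The text does not prescribe a solver, so this choice is an assumption, as are the names `pnll`, `soft_threshold`, and `lasso_ista` and the synthetic data.

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft thresholding: shrink toward 0 by t, clipping at 0."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def pnll(w, X, y, lam):
    """Lasso objective ||Xw - y||_2^2 + lam * ||w||_1."""
    r = X @ w - y
    return r @ r + lam * np.abs(w).sum()

def lasso_ista(X, y, lam, n_iters=500):
    """Proximal gradient descent (ISTA) on the lasso objective (illustrative)."""
    # The gradient of ||Xw - y||^2 is Lipschitz with constant 2 * sigma_max(X)^2.
    L = 2.0 * np.linalg.norm(X, 2) ** 2
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        grad = 2.0 * X.T @ (X @ w - y)             # gradient of the smooth part
        w = soft_threshold(w - grad / L, lam / L)  # prox step for lam * ||w||_1
    return w

# Synthetic example: an orthonormal design makes the answer easy to check.
rng = np.random.default_rng(0)
X, _ = np.linalg.qr(rng.normal(size=(50, 5)))  # columns satisfy X^T X = I
w_true = np.array([3.0, 0.0, -2.0, 0.0, 0.1])
y = X @ w_true

w_hat = lasso_ista(X, y, lam=1.0)
print(w_hat)  # close to [2.5, 0, -1.5, 0, 0]
```

Because the design here is orthonormal, the minimizer is the soft-thresholded least-squares solution with threshold $\lambda / 2$: large weights shrink by 0.5 and the small weight 0.1 is set exactly to zero.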

## Why does $\ell_1$ regularization yield sparse solutions?

We now explain why $\ell_1$ regularization results in sparse solutions, whereas $\ell_2$ regularization does not. We focus on the case of linear regression, although similar arguments hold for other models.

The lasso objective is the following non-smooth objective (see Section 8.1.4 for a discussion of smoothness):
$$\min_{\boldsymbol{w}} \mathrm{NLL}(\boldsymbol{w})+\lambda\|\boldsymbol{w}\|_1$$
This is the Lagrangian for the following quadratic program (see Section 8.5.4):
$$\min_{\boldsymbol{w}} \operatorname{NLL}(\boldsymbol{w}) \text { s.t. }\|\boldsymbol{w}\|_1 \leq B$$
where $B$ is an upper bound on the $\ell_1$-norm of the weights: a small (tight) bound $B$ corresponds to a large penalty $\lambda$, and vice versa.
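The inverse relationship between $B$ and $\lambda$ can be illustrated in a special case. The sketch below assumes an orthonormal design ($\mathbf{X}^{\top}\mathbf{X}=\mathbf{I}$) and $\mathrm{NLL}(\boldsymbol{w})=\|\mathbf{X}\boldsymbol{w}-\boldsymbol{y}\|_2^2$, for which the penalized solution is the soft-thresholded least-squares solution with threshold $\lambda/2$. The vector `c` is a made-up least-squares solution, and the $\ell_1$ norm of each penalized solution plays the role of the bound $B$.

```python
import numpy as np

def soft_threshold(c, t):
    """Elementwise soft thresholding: shrink toward 0 by t, clipping at 0."""
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

c = np.array([3.0, -2.0, 0.5, 0.1])  # hypothetical least-squares solution X^T y
lams = [0.0, 0.5, 2.0, 5.0, 8.0]

# Under the orthonormal-design assumption, w(lam) = soft_threshold(c, lam / 2).
# Its l1 norm is the bound B for which the constrained problem has the same solution.
l1_norms = [float(np.abs(soft_threshold(c, lam / 2.0)).sum()) for lam in lams]
print(l1_norms)  # approximately [5.6, 4.75, 3.0, 0.5, 0.0]
```

As $\lambda$ increases, the implied bound shrinks monotonically, reaching zero once $\lambda / 2$ exceeds the largest entry of `c` in absolute value.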

Similarly, we can write the ridge regression objective $\min_{\boldsymbol{w}} \mathrm{NLL}(\boldsymbol{w})+\lambda\|\boldsymbol{w}\|_2^2$ in bound-constrained form:
$$\min_{\boldsymbol{w}} \operatorname{NLL}(\boldsymbol{w}) \text { s.t. }\|\boldsymbol{w}\|_2^2 \leq B$$

In Figure 11.8, we plot the contours of the NLL objective function, as well as the contours of the $\ell_2$ and $\ell_1$ constraint surfaces. From the theory of constrained optimization (Section 8.5) we know that the optimal solution occurs at the point where the lowest level set of the objective function intersects the constraint surface (assuming the constraint is active). It should be geometrically clear that as we relax the constraint $B$, we “grow” the $\ell_1$ “ball” until it meets the objective; the corners of the ball are more likely to intersect the ellipse than one of the sides, especially in high dimensions, because the corners “stick out” more. The corners correspond to sparse solutions, which lie on the coordinate axes. By contrast, when we grow the $\ell_2$ ball, it can intersect the objective at any point; there are no “corners”, so there is no preference for sparsity.
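The geometric argument can be made concrete in a simplified setting. The sketch below assumes $\mathrm{NLL}(\boldsymbol{w})=\|\boldsymbol{w}-\boldsymbol{c}\|_2^2$, so the contours are circles rather than general ellipses; under that assumption the constrained optimum is the Euclidean projection of the unconstrained solution $\boldsymbol{c}$ onto the constraint set. The function names and the point `c` are illustrative, and the $\ell_1$ projection uses a standard sort-based routine that does not come from the text.

```python
import numpy as np

def project_l1_ball(c, B):
    """Euclidean projection onto {w : ||w||_1 <= B} (sort-based method)."""
    if np.abs(c).sum() <= B:
        return c.copy()  # constraint inactive
    u = np.sort(np.abs(c))[::-1]          # magnitudes, descending
    css = np.cumsum(u)
    k = np.arange(1, len(u) + 1)
    # Largest index at which the candidate threshold keeps that entry positive.
    rho = np.nonzero(u - (css - B) / k > 0)[0][-1]
    tau = (css[rho] - B) / (rho + 1)
    return np.sign(c) * np.maximum(np.abs(c) - tau, 0.0)

def project_l2_ball(c, B):
    """Euclidean projection onto {w : ||w||_2^2 <= B}."""
    norm = np.linalg.norm(c)
    radius = np.sqrt(B)
    return c.copy() if norm <= radius else c * (radius / norm)

c = np.array([2.0, 0.5])            # hypothetical unconstrained optimum
w_l1 = project_l1_ball(c, B=1.0)    # lands on a corner of the l1 ball: (1, 0)
w_l2 = project_l2_ball(c, B=1.0)    # rescaled copy of c: both entries nonzero
print(w_l1, w_l2)
```

Here the $\ell_1$ solution sits on the corner $(1, 0)$, with the second coordinate exactly zero, while the $\ell_2$ solution only rescales $\boldsymbol{c}$ and keeps both coordinates nonzero.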
