Regularized Linear Multiple Regression Model

Ridge regression, originally proposed as a method to combat multicollinearity, is also a common approach for controlling overfitting in a multiple linear regression (MLR) model (Christensen 2011). It replaces the ordinary least squares (OLS) problem with the minimization of the penalized residual sum of squares, defined as
$$\operatorname{PRSS}_\lambda(\boldsymbol{\beta})=\sum_{i=1}^n\left(y_i-\beta_0-\sum_{j=1}^p x_{ij} \beta_j\right)^2+\lambda \sum_{j=1}^p \beta_j^2,$$
where $\lambda \geq 0$ is the regularization (or tuning) parameter, which controls how strongly the beta coefficients are shrunk toward zero. When $\lambda=0$, the minimizer is the OLS solution; when $\lambda$ is large, $\operatorname{PRSS}_\lambda(\boldsymbol{\beta})$ is dominated by the penalty term, and the coefficients $\beta_1,\ldots,\beta_p$ are shrunk toward 0 (Christensen 2011). In general, when the number of parameters to be estimated is larger than the number of observations, the OLS estimator is not unique and can be highly variable. Ridge regression alleviates this by constraining the sum of squares of the beta coefficients. Note that $\operatorname{PRSS}_\lambda(\boldsymbol{\beta})$ can be expressed as
$$\operatorname{PRSS}_\lambda(\boldsymbol{\beta})=\operatorname{RSS}(\boldsymbol{\beta})+\lambda \boldsymbol{\beta}^{\mathrm{T}} \boldsymbol{D} \boldsymbol{\beta},$$
where $\operatorname{RSS}(\boldsymbol{\beta})=(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta})^{\mathrm{T}}(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta})$, $\boldsymbol{X}$ is the $n \times (p+1)$ design matrix whose first column is a column of ones, and $\boldsymbol{D}=\operatorname{diag}(0,1, \ldots, 1)$ is the $(p+1) \times(p+1)$ identity matrix with its first diagonal entry set to zero, so that the intercept $\beta_0$ is not penalized. Then the gradient of $\operatorname{PRSS}_\lambda(\boldsymbol{\beta})$, that is, its first derivative with respect to $\boldsymbol{\beta}$, is
$$\nabla \operatorname{PRSS}_\lambda(\boldsymbol{\beta})=2\left(\boldsymbol{X}^{\mathrm{T}} \boldsymbol{X} \boldsymbol{\beta}-\boldsymbol{X}^{\mathrm{T}} \boldsymbol{y}\right)+2 \lambda \boldsymbol{D} \boldsymbol{\beta}.$$
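To make the matrix form and the gradient concrete, here is a small NumPy sketch. NumPy and all function names (`penalty_matrix`, `prss_sum`, `prss_matrix`, `prss_grad`, `finite_diff_grad`) are illustrative choices, not from the text. On random data, it checks that the summation form and the matrix form of $\operatorname{PRSS}_\lambda$ agree, and that the analytic gradient matches a central finite-difference approximation, assuming $\boldsymbol{X}$ is built as a column of ones followed by the $p$ predictors.

```python
import numpy as np

def penalty_matrix(m):
    """D = diag(0, 1, ..., 1) of size m x m: identity with the intercept entry zeroed."""
    D = np.eye(m)
    D[0, 0] = 0.0
    return D

def prss_sum(beta, Z, y, lam):
    """Summation form; Z holds only the p predictor columns (no column of ones)."""
    r = y - beta[0] - Z @ beta[1:]
    return r @ r + lam * np.sum(beta[1:] ** 2)

def prss_matrix(beta, X, y, lam):
    """Matrix form RSS(beta) + lam * beta^T D beta, with X = [1, Z]."""
    r = y - X @ beta
    return r @ r + lam * beta @ penalty_matrix(X.shape[1]) @ beta

def prss_grad(beta, X, y, lam):
    """Analytic gradient 2 (X^T X beta - X^T y) + 2 lam D beta."""
    return 2.0 * (X.T @ X @ beta - X.T @ y) + 2.0 * lam * penalty_matrix(X.shape[1]) @ beta

def finite_diff_grad(f, beta, h=1e-6):
    """Central-difference approximation of the gradient of f at beta."""
    g = np.zeros_like(beta)
    for k in range(beta.size):
        e = np.zeros_like(beta)
        e[k] = h
        g[k] = (f(beta + e) - f(beta - e)) / (2.0 * h)
    return g

rng = np.random.default_rng(0)
n, p, lam = 20, 3, 2.5
Z = rng.normal(size=(n, p))
X = np.column_stack([np.ones(n), Z])   # first column of ones for the intercept
y = rng.normal(size=n)
beta = rng.normal(size=p + 1)

print(prss_sum(beta, Z, y, lam), prss_matrix(beta, X, y, lam))
fd = finite_diff_grad(lambda b: prss_matrix(b, X, y, lam), beta)
print(np.max(np.abs(prss_grad(beta, X, y, lam) - fd)))
```

Because $\operatorname{PRSS}_\lambda$ is quadratic in $\boldsymbol{\beta}$, the central difference has no truncation error, so any remaining discrepancy is floating-point rounding.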
Setting $\nabla \operatorname{PRSS}_\lambda(\boldsymbol{\beta})=\mathbf{0}$ and solving for $\boldsymbol{\beta}$ gives the Ridge solution $$\hat{\boldsymbol{\beta}}^R(\lambda)=\underset{\boldsymbol{\beta}}{\operatorname{argmin}}\, \operatorname{PRSS}_\lambda(\boldsymbol{\beta})=\left(\boldsymbol{X}^{\mathrm{T}} \boldsymbol{X}+\lambda \boldsymbol{D}\right)^{-1} \boldsymbol{X}^{\mathrm{T}} \boldsymbol{y} .$$
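The closed-form Ridge solution translates directly into a few lines of NumPy. The sketch below is an illustration under stated assumptions (simulated data, a hypothetical helper `ridge_fit`, and $\boldsymbol{X}$ whose first column is all ones). It solves the linear system $(\boldsymbol{X}^{\mathrm{T}}\boldsymbol{X}+\lambda\boldsymbol{D})\boldsymbol{\beta}=\boldsymbol{X}^{\mathrm{T}}\boldsymbol{y}$ instead of forming the inverse, and prints how the slope coefficients shrink as $\lambda$ grows.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form Ridge estimate (X^T X + lam D)^{-1} X^T y.

    X is assumed to be the n x (p+1) design matrix whose first column is all ones,
    so D = diag(0, 1, ..., 1) leaves the intercept unpenalized.
    """
    D = np.eye(X.shape[1])
    D[0, 0] = 0.0
    # Solving the system is cheaper and more stable than computing the inverse.
    return np.linalg.solve(X.T @ X + lam * D, X.T @ y)

rng = np.random.default_rng(1)
n, p = 50, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 2.0, -1.0, 0.5, 0.0]) + rng.normal(scale=0.3, size=n)

# With lam = 0 the Ridge estimate should coincide with the OLS fit.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

for lam in [0.0, 1.0, 10.0, 100.0]:
    b = ridge_fit(X, y, lam)
    print(lam, np.round(b, 3), "slope norm:", round(float(np.linalg.norm(b[1:])), 3))
```

Only the norm of the slope vector $(\beta_1,\ldots,\beta_p)$ is expected to decrease with $\lambda$; the intercept is left free by the zero in $\boldsymbol{D}$.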

Lasso Regression

Like Ridge regression, Lasso regression modifies the OLS problem, but it penalizes the residual sum of squares in a different way. With standardized variables, the Lasso estimator of $\boldsymbol{\beta}_s=(\mu,\boldsymbol{\beta}_{0s}^{\mathrm{T}})^{\mathrm{T}}$, where $\boldsymbol{\beta}_{0s}=(\beta_{1s},\ldots,\beta_{ps})^{\mathrm{T}}$, is defined as $$\widetilde{\boldsymbol{\beta}}_s^L(\lambda)=\underset{\mu,\, \boldsymbol{\beta}_{0s}}{\arg\min}\, \operatorname{PRSS}_\lambda\left(\boldsymbol{\beta}_s\right),$$

where now $\operatorname{PRSS}_\lambda\left(\boldsymbol{\beta}_s\right)=\sum_{i=1}^n\left(y_i-\mu-\sum_{j=1}^p x_{ijs} \beta_{js}\right)^2+\lambda \sum_{j=1}^p\left|\beta_{js}\right|$ is the residual sum of squares penalized by the sum of the absolute values of the regression coefficients. For $\lambda=0$, the solution is the OLS estimator, while for large $\lambda$ the coefficients are shrunk toward 0 (Tibshirani 1996); unlike Ridge regression, the absolute-value penalty can set some of them exactly to 0.
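The effect of $\lambda$ on the Lasso objective can be seen in the simplest possible case, a single predictor. The sketch below assumes (our choice, for illustration) that the predictor is centered and scaled to unit Euclidean norm, fixes $\mu$ at the sample mean of $y$, and finds the minimizing slope by brute-force grid search rather than by any particular Lasso algorithm. For this objective the minimizer is the OLS slope moved toward 0 by $\lambda/2$, and it becomes exactly 0 once $\lambda/2$ exceeds the absolute OLS slope.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
x = rng.normal(size=n)
x = x - x.mean()
x = x / np.linalg.norm(x)        # assumed standardization: centered, unit Euclidean norm
y = 3.0 * x + rng.normal(scale=0.1, size=n)
mu = y.mean()                    # optimal intercept because x is centered

# Candidate slopes in [-5, 5] with step 1e-4; index 50_000 is exactly 0.0.
grid = np.arange(-50_000, 50_001) * 1e-4

def lasso_prss_1d(lam):
    """PRSS_lambda evaluated at every candidate slope for the single predictor x."""
    resid = y[None, :] - mu - grid[:, None] * x[None, :]
    return np.sum(resid ** 2, axis=1) + lam * np.abs(grid)

def grid_minimizer(lam):
    """Slope on the grid with the smallest penalized residual sum of squares."""
    return grid[np.argmin(lasso_prss_1d(lam))]

b_ols = x @ y                    # OLS slope when x is centered with unit norm
for lam in [0.0, 2.0, 4.0, 10.0]:
    print(lam, grid_minimizer(lam))
```

A grid search is deliberately naive here; it only confirms the shrinkage behavior described above, not how Lasso estimates are computed in practice.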

Note that, because the standardized covariates have zero mean, for any given value of $\boldsymbol{\beta}_{0s}$ the value of $\mu$ that minimizes $\operatorname{PRSS}_\lambda\left(\boldsymbol{\beta}_s\right)$ is the sample mean of the responses, $\widetilde{\mu}=\frac{1}{n} \sum_{i=1}^n y_i$, exactly as for the Ridge estimator. However, the remaining part of the Lasso estimator, $\widetilde{\boldsymbol{\beta}}_{0s}$, cannot in general be obtained analytically, so numerical methods are used.
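The statement about $\mu$ follows from setting the partial derivative with respect to $\mu$ to zero. A short derivation, assuming the standardized covariates are centered so that $\sum_{i=1}^n x_{ijs}=0$ for every $j$:

$$
\begin{aligned}
\frac{\partial \operatorname{PRSS}_\lambda(\boldsymbol{\beta}_s)}{\partial \mu}
&= -2\sum_{i=1}^n\left(y_i-\mu-\sum_{j=1}^p x_{ijs}\beta_{js}\right) \\
&= -2\left(\sum_{i=1}^n y_i - n\mu - \sum_{j=1}^p \beta_{js}\sum_{i=1}^n x_{ijs}\right)
 = -2\left(\sum_{i=1}^n y_i - n\mu\right).
\end{aligned}
$$

This vanishes at $\mu=\frac{1}{n}\sum_{i=1}^n y_i$ whatever the value of $\boldsymbol{\beta}_{0s}$. Since the penalty term does not involve $\mu$, the same argument applies to the Ridge objective with centered covariates.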

Although there are efficient algorithms for computing the entire regularization path of the Lasso regression coefficients (Efron et al. 2004; Friedman et al. 2008), here we describe the coordinate-wise descent algorithm given in Friedman et al. (2007). The idea is to minimize $\operatorname{PRSS}_\lambda\left(\boldsymbol{\beta}_s\right)$ successively over one parameter (beta coefficient) at a time. Here the standardized covariates are taken to satisfy $\sum_{i=1}^n x_{ijs}=0$ and $\sum_{i=1}^n x_{ijs}^2=1$. Holding $\beta_{js}$, $j \neq k$, fixed at their current values $\widetilde{\beta}_{js}(\lambda)$, the value of $\beta_{ks}$ that minimizes $\operatorname{PRSS}_\lambda\left(\boldsymbol{\beta}_s\right)$ is
$$
\begin{aligned}
\widetilde{\beta}_{ks}^*(\lambda)&=S\left(\sum_{i=1}^n x_{iks}\left(y_i-\widetilde{y}_i^{(k)}\right), \frac{\lambda}{2}\right)\\
&=S\left(\widetilde{\beta}_{ks}(\lambda)+\sum_{i=1}^n x_{iks}\left(y_i-\widetilde{y}_i\right), \frac{\lambda}{2}\right),
\end{aligned}
$$
where $\widetilde{y}_i=\bar{y}+\sum_{j=1}^p x_{ijs}\widetilde{\beta}_{js}(\lambda)$ is the current fitted value, $\widetilde{y}_i^{(k)}=\bar{y}+\sum_{j\neq k} x_{ijs} \widetilde{\beta}_{js}(\lambda)$ is the fitted value without the contribution of covariate $k$, and $S$ is the soft-thresholding operator
$$
S(z, \gamma)=\begin{cases}
z-\gamma & \text{if } z>0 \text{ and } \gamma<|z|,\\
z+\gamma & \text{if } z<0 \text{ and } \gamma<|z|,\\
0 & \text{if } \gamma \geq |z|.
\end{cases}
$$
The threshold is $\lambda/2$ because $\operatorname{PRSS}_\lambda$ as defined above has no factor $1/2$ on the residual sum of squares; if the objective were written as $\frac{1}{2}\operatorname{RSS}+\lambda\sum_{j=1}^p|\beta_{js}|$, the threshold would be $\lambda$. To obtain the Lasso estimate of $\boldsymbol{\beta}_{0s}$, this update is cycled repeatedly through all the coefficients until a convergence criterion is met.
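The coordinate-wise update can be sketched in NumPy as follows. This is a minimal illustration, not the implementation of Friedman et al. (2007): the function names, the tolerance, and the stopping rule (largest coefficient change in a full sweep) are our assumptions. It assumes each column of `Xs` is centered with unit Euclidean norm. Under that assumption, minimizing $\operatorname{RSS}+\lambda\sum_j|\beta_{js}|$ (no factor $1/2$) over one coordinate gives soft-thresholding of $\widetilde{\beta}_{ks}+\sum_i x_{iks}(y_i-\widetilde{y}_i)$ at $\lambda/2$. Residuals are updated incrementally so each coordinate step costs $O(n)$.

```python
import numpy as np

def standardize(X):
    """Center each column and scale it to unit Euclidean norm (assumed standardization)."""
    Xc = X - X.mean(axis=0)
    return Xc / np.linalg.norm(Xc, axis=0)

def soft_threshold(z, gamma):
    """S(z, gamma): move z toward 0 by gamma, returning 0 when |z| <= gamma."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)

def lasso_cd(Xs, y, lam, tol=1e-10, max_iter=10_000):
    """Cyclic coordinate descent for sum_i (y_i - mu - x_i^T beta)^2 + lam * sum_j |beta_j|."""
    n, p = Xs.shape
    mu = y.mean()                  # optimal intercept for centered columns
    beta = np.zeros(p)
    resid = y - mu                 # y_i - tilde y_i for the starting value beta = 0
    for _ in range(max_iter):
        max_change = 0.0
        for k in range(p):
            old = beta[k]
            # old + x_k^T resid equals sum_i x_ik (y_i - tilde y_i^(k)) for unit-norm x_k.
            z = old + Xs[:, k] @ resid
            new = soft_threshold(z, lam / 2.0)
            if new != old:
                resid -= Xs[:, k] * (new - old)   # keep residuals in sync with beta
                beta[k] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return mu, beta

rng = np.random.default_rng(3)
n, p = 100, 6
Xs = standardize(rng.normal(size=(n, p)))
y = 1.0 + Xs @ np.array([4.0, -3.0, 0.0, 0.0, 2.0, 0.0]) + rng.normal(scale=0.1, size=n)

for lam in [0.0, 1.0, 4.0]:
    mu, beta = lasso_cd(Xs, y, lam)
    print(lam, round(mu, 3), np.round(beta, 3))
```

With `lam = 0` the soft-thresholding step is the identity, so the loop reduces to Gauss-Seidel iterations for the OLS fit; as `lam` grows, more coefficients are set exactly to zero.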
