# Statistics Assignment Help | Statistical and Machine Learning | ECE6254

## Fitting a Linear Multiple Regression Model via the Ordinary Least Squares (OLS) Method

In a general context, we have a covariate vector $X=\left(X_1, \ldots, X_p\right)^{\mathrm{T}}$ and we want to use this information to predict a real-valued response $Y$ or to explain how these variables affect it. The linear multiple regression model assumes a relationship given by
$$Y=\beta_0+\sum_{j=1}^p X_j \beta_j+\epsilon,$$
where $\epsilon$ is a random error with mean zero, $E(\epsilon)=0$, that is independent of $X$. This error is included in the model to capture measurement errors and the effects of other unrecorded explanatory variables that could help to explain the mean response.

Because $\epsilon$ has mean zero and is independent of $X$, the conditional mean of this model is $E(Y \mid X)=\beta_0+\sum_{j=1}^p X_j \beta_j$, and $X$ affects the conditional distribution of $Y$ only through this mean.
To estimate the parameters $\boldsymbol{\beta}=\left(\beta_0, \beta_1, \ldots, \beta_p\right)^{\mathrm{T}}$, we usually have a set of data $\left(\boldsymbol{x}_i^{\mathrm{T}}, y_i\right), i=1, \ldots, n$, often known as training data, where $\boldsymbol{x}_i=\left(x_{i 1}, \ldots, x_{i p}\right)^{\mathrm{T}}$ is the vector of feature measurements and $y_i$ is the response measurement for the $i$th individual drawn. The most common method for estimating $\boldsymbol{\beta}$ is ordinary least squares (OLS), which takes the value of $\boldsymbol{\beta}$ that minimizes the residual sum of squares, defined as
$$\operatorname{RSS}(\boldsymbol{\beta})=\sum_{i=1}^n\left(y_i-\beta_0-\boldsymbol{x}_i^{\mathrm{T}} \boldsymbol{\beta}_0\right)^2=(\boldsymbol{y}-\boldsymbol{X} \boldsymbol{\beta})^{\mathrm{T}}(\boldsymbol{y}-\boldsymbol{X} \boldsymbol{\beta}),$$
where $\boldsymbol{\beta}_0=\left(\beta_1, \ldots, \beta_p\right)^{\mathrm{T}}$ (bold, to distinguish it from the scalar intercept $\beta_0$), $\boldsymbol{y}=\left(y_1, \ldots, y_n\right)^{\mathrm{T}}$ is the vector with the response values of all individuals, and $\boldsymbol{X}$ is an $n \times(p+1)$ matrix that contains the measured features of all individuals, with a first column of ones for the intercept:
$$\boldsymbol{X}=\left[\begin{array}{cccc} 1 & x_{11} & \cdots & x_{1 p} \\ \vdots & \vdots & \vdots & \vdots \\ 1 & x_{n 1} & \cdots & x_{n p} \end{array}\right] .$$
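The two forms of the residual sum of squares can be compared numerically. The following is a minimal sketch, assuming NumPy and simulated data; the sample size, coefficient values, noise level, and the helper names `rss_sum` and `rss_matrix` are illustrative choices, not part of the original text.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
n, p = 50, 3                    # hypothetical sample size and number of covariates

features = rng.normal(size=(n, p))            # row i holds x_i^T
X = np.column_stack([np.ones(n), features])   # n x (p+1) design matrix, first column of ones
beta = np.array([1.0, 2.0, -1.0, 0.5])        # illustrative (beta_0, beta_1, ..., beta_p)
y = X @ beta + rng.normal(scale=0.3, size=n)  # response with mean-zero noise


def rss_sum(b, features, y):
    """RSS as a sum over individuals: sum_i (y_i - b_0 - x_i^T b_{1:p})^2."""
    return sum((y_i - b[0] - x_i @ b[1:]) ** 2 for x_i, y_i in zip(features, y))


def rss_matrix(b, X, y):
    """RSS in matrix form: (y - X b)^T (y - X b)."""
    r = y - X @ b
    return r @ r


candidate = np.array([0.8, 1.5, -0.7, 0.2])   # any coefficient vector can be plugged in
print(rss_sum(candidate, features, y), rss_matrix(candidate, X, y))
```

Both functions should print the same value up to floating-point rounding, since the column of ones in `X` absorbs the intercept into the matrix product.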
If $\boldsymbol{X}$ has full column rank, we can find the $\boldsymbol{\beta}$ that minimizes $\operatorname{RSS}(\boldsymbol{\beta})$ by differentiating the residual sum of squares with respect to the coefficients:
$$\frac{\partial \operatorname{RSS}(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}=\frac{\partial\left[(\boldsymbol{y}-\boldsymbol{X} \boldsymbol{\beta})^{\mathrm{T}}(\boldsymbol{y}-\boldsymbol{X} \boldsymbol{\beta})\right]}{\partial \boldsymbol{\beta}}=\frac{\partial\left[\boldsymbol{y}^{\mathrm{T}} \boldsymbol{y}-2 \boldsymbol{y}^{\mathrm{T}} \boldsymbol{X} \boldsymbol{\beta}+\boldsymbol{\beta}^{\mathrm{T}}\left(\boldsymbol{X}^{\mathrm{T}} \boldsymbol{X}\right) \boldsymbol{\beta}\right]}{\partial \boldsymbol{\beta}}=2\left[\left(\boldsymbol{X}^{\mathrm{T}} \boldsymbol{X}\right) \boldsymbol{\beta}-\boldsymbol{X}^{\mathrm{T}} \boldsymbol{y}\right] .$$
Setting this derivative to zero gives the normal equations $\left(\boldsymbol{X}^{\mathrm{T}} \boldsymbol{X}\right) \boldsymbol{\beta}=\boldsymbol{X}^{\mathrm{T}} \boldsymbol{y}$, and because $\boldsymbol{X}^{\mathrm{T}} \boldsymbol{X}$ is invertible when $\boldsymbol{X}$ has full column rank, the minimizer is $\widehat{\boldsymbol{\beta}}=\left(\boldsymbol{X}^{\mathrm{T}} \boldsymbol{X}\right)^{-1} \boldsymbol{X}^{\mathrm{T}} \boldsymbol{y}$.
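As an illustration, setting the gradient $2\left[\left(\boldsymbol{X}^{\mathrm{T}} \boldsymbol{X}\right) \boldsymbol{\beta}-\boldsymbol{X}^{\mathrm{T}} \boldsymbol{y}\right]$ to zero leads to the linear system $\left(\boldsymbol{X}^{\mathrm{T}} \boldsymbol{X}\right) \boldsymbol{\beta}=\boldsymbol{X}^{\mathrm{T}} \boldsymbol{y}$, which can be solved directly. The sketch below assumes NumPy and simulated data; the coefficient values and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed
n, p = 100, 2                   # hypothetical dimensions

X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # design matrix with intercept column
beta_true = np.array([0.5, 1.0, -2.0])                      # illustrative coefficients
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# Solve (X^T X) beta = X^T y; a linear solve avoids forming the inverse explicitly.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Gradient of RSS at beta_hat, 2[(X^T X) beta - X^T y]; it should vanish up to rounding.
gradient = 2 * (X.T @ X @ beta_hat - X.T @ y)

# Cross-check against NumPy's least-squares routine.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print("beta_hat:   ", beta_hat)
print("beta_lstsq: ", beta_lstsq)
print("max |gradient|:", np.abs(gradient).max())
```

In practice `np.linalg.lstsq` (or a QR-based solver) is usually preferred over the normal equations when $\boldsymbol{X}^{\mathrm{T}} \boldsymbol{X}$ is poorly conditioned; the direct solve is used here only because it mirrors the derivation.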

## Fitting the Linear Multiple Regression Model via the Maximum Likelihood (ML) Method

Maximum likelihood (ML) estimation is a more general and popular method for estimating the parameters of a model (Casella and Berger 2002). It consists of finding the parameter value that maximizes the "probability" of the observed values in the sample under the adopted model. Specifically, if $\left(\boldsymbol{x}_i^{\mathrm{T}}, y_i\right), i=1, \ldots, n$, is a set of observations from the multiple linear regression model (3.1) with homoscedastic, uncorrelated, normally distributed errors of variance $\sigma^2$, the maximum likelihood estimates (MLEs) of $\boldsymbol{\beta}$ and $\sigma^2$, denoted $\widehat{\boldsymbol{\beta}}$ and $\widehat{\sigma}^2$, are defined as
$$\left(\widehat{\boldsymbol{\beta}}^{\mathrm{T}}, \hat{\sigma}^2\right)=\underset{\boldsymbol{\beta}, \sigma^2}{\arg \max } L\left(\boldsymbol{\beta}, \sigma^2 ; \boldsymbol{y}, \boldsymbol{X}\right),$$
where $L\left(\boldsymbol{\beta}, \sigma^2 ; \boldsymbol{y}, \boldsymbol{X}\right)$ is the likelihood function of the parameters, that is, the joint probability density of the observed response values viewed as a function of the parameters:
$$L\left(\boldsymbol{\beta}, \sigma^2 ; \boldsymbol{y}, \boldsymbol{X}\right)=\left(\frac{1}{\sqrt{2 \pi \sigma^2}}\right)^n \exp \left[-\frac{1}{2 \sigma^2}(\boldsymbol{y}-\boldsymbol{X} \boldsymbol{\beta})^{\mathrm{T}}(\boldsymbol{y}-\boldsymbol{X} \boldsymbol{\beta})\right] .$$
The log-likelihood, $\log \left(L\left(\boldsymbol{\beta}, \sigma^2 ; \boldsymbol{y}, \boldsymbol{X}\right)\right)$, is then
$$\log \left(L\left(\boldsymbol{\beta}, \sigma^2 ; \boldsymbol{y}, \boldsymbol{X}\right)\right)=-\frac{n}{2} \log (2 \pi)-n \log (\sigma)-\frac{1}{2 \sigma^2}(\boldsymbol{y}-\boldsymbol{X} \boldsymbol{\beta})^{\mathrm{T}}(\boldsymbol{y}-\boldsymbol{X} \boldsymbol{\beta})$$
To find the values of $\boldsymbol{\beta}$ and $\sigma^2$ that maximize it, we differentiate $\log \left(L\left(\boldsymbol{\beta}, \sigma^2 ; \boldsymbol{y}, \boldsymbol{X}\right)\right)$ with respect to these parameters:
$$\begin{gathered} \frac{\partial \log \left(L\left(\boldsymbol{\beta}, \sigma^2 ; \boldsymbol{y}, \boldsymbol{X}\right)\right)}{\partial \boldsymbol{\beta}}=-\frac{\left[\left(\boldsymbol{X}^{\mathrm{T}} \boldsymbol{X}\right) \boldsymbol{\beta}-\boldsymbol{X}^{\mathrm{T}} \boldsymbol{y}\right]}{\sigma^2} \\ \frac{\partial \log \left(L\left(\boldsymbol{\beta}, \sigma^2 ; \boldsymbol{y}, \boldsymbol{X}\right)\right)}{\partial \sigma^2}=-\frac{n}{2 \sigma^2}+\frac{1}{2 \sigma^4}(\boldsymbol{y}-\boldsymbol{X} \boldsymbol{\beta})^{\mathrm{T}}(\boldsymbol{y}-\boldsymbol{X} \boldsymbol{\beta}) \end{gathered}$$
Setting these derivatives equal to zero and solving the resulting equations for $\boldsymbol{\beta}$ and $\sigma^2$, we find that the estimates of these parameters are
$$\begin{gathered} \widehat{\boldsymbol{\beta}}=\left(\boldsymbol{X}^{\mathrm{T}} \boldsymbol{X}\right)^{-1} \boldsymbol{X}^{\mathrm{T}} \boldsymbol{y} \\ \widehat{\sigma}^2=\frac{1}{n}(\boldsymbol{y}-\boldsymbol{X} \widehat{\boldsymbol{\beta}})^{\mathrm{T}}(\boldsymbol{y}-\boldsymbol{X} \widehat{\boldsymbol{\beta}}) . \end{gathered}$$
From this we can see that, for each fixed value of $\sigma^2$, the value of $\boldsymbol{\beta}$ that maximizes the likelihood is the one that maximizes $-\frac{1}{2 \sigma^2}(\boldsymbol{y}-\boldsymbol{X} \boldsymbol{\beta})^{\mathrm{T}}(\boldsymbol{y}-\boldsymbol{X} \boldsymbol{\beta})$, which in turn minimizes $(\boldsymbol{y}-\boldsymbol{X} \boldsymbol{\beta})^{\mathrm{T}}(\boldsymbol{y}-\boldsymbol{X} \boldsymbol{\beta})$; this is precisely the OLS estimator $\widehat{\boldsymbol{\beta}}$. Then, equating the derivative of $\log \left(L\left(\widehat{\boldsymbol{\beta}}, \sigma^2 ; \boldsymbol{y}, \boldsymbol{X}\right)\right)$ with respect to $\sigma^2$ to zero and solving for $\sigma^2$, the value of $\sigma^2$ that maximizes $L\left(\widehat{\boldsymbol{\beta}}, \sigma^2 ; \boldsymbol{y}, \boldsymbol{X}\right)$ is $\widehat{\sigma}^2=\frac{1}{n}(\boldsymbol{y}-\boldsymbol{X} \widehat{\boldsymbol{\beta}})^{\mathrm{T}}(\boldsymbol{y}-\boldsymbol{X} \widehat{\boldsymbol{\beta}})$.
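The closed-form estimates can be checked numerically by confirming that both partial derivatives of the log-likelihood vanish at $\left(\widehat{\boldsymbol{\beta}}, \widehat{\sigma}^2\right)$. This is a minimal sketch, assuming NumPy and data simulated with normal errors; the dimensions, coefficients, and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed
n, p = 200, 3                   # hypothetical dimensions

X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 0.5, -1.5, 2.0]) + rng.normal(scale=0.7, size=n)

# MLE of beta coincides with the OLS solution of (X^T X) beta = X^T y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# MLE of sigma^2 is the residual sum of squares divided by n.
residuals = y - X @ beta_hat
rss = residuals @ residuals
sigma2_hat = rss / n

# Derivative of the log-likelihood with respect to sigma^2, evaluated at the estimates:
# -n / (2 sigma^2) + RSS / (2 sigma^4).
score_sigma2 = -n / (2 * sigma2_hat) + rss / (2 * sigma2_hat**2)

# X^T (y - X beta_hat) / sigma^2 is zero exactly when the derivative in beta is zero.
score_beta = X.T @ residuals / sigma2_hat

print("sigma2_hat:", sigma2_hat)
print("score for sigma^2:", score_sigma2)
print("max |score for beta|:", np.abs(score_beta).max())
```

Note that $\widehat{\sigma}^2$ divides by $n$; the familiar unbiased variance estimator divides the same residual sum of squares by $n-p-1$ instead, so the two differ slightly in small samples.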
Finally, for any $\boldsymbol{\beta}$ and any $\sigma^2>0$,
$$L\left(\boldsymbol{\beta}, \sigma^2 ; \boldsymbol{y}, \boldsymbol{X}\right) \leq L\left(\widehat{\boldsymbol{\beta}}, \sigma^2 ; \boldsymbol{y}, \boldsymbol{X}\right) \leq L\left(\widehat{\boldsymbol{\beta}}, \widehat{\sigma}^2 ; \boldsymbol{y}, \boldsymbol{X}\right),$$
so the MLEs of $\boldsymbol{\beta}$ and $\sigma^2$ are $\widehat{\boldsymbol{\beta}}$ and $\widehat{\sigma}^2$. It can be shown that these maximizers of the likelihood are unique when the design matrix $\boldsymbol{X}$ has full column rank.
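The chain of inequalities can be spot-checked by evaluating the likelihood at randomly perturbed parameter values. The sketch below assumes NumPy and simulated data, and works with the log-likelihood, which preserves the ordering because the logarithm is increasing; the helper `log_likelihood`, the perturbation scales, and the small tolerance are illustrative choices.

```python
import numpy as np


def log_likelihood(beta, sigma2, X, y):
    """Gaussian log-likelihood: -(n/2) log(2 pi sigma^2) - RSS / (2 sigma^2)."""
    n = len(y)
    r = y - X @ beta
    return -0.5 * n * np.log(2 * np.pi * sigma2) - (r @ r) / (2 * sigma2)


rng = np.random.default_rng(3)  # arbitrary seed
n, p = 80, 2                    # hypothetical dimensions
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=1.0, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
residuals = y - X @ beta_hat
sigma2_hat = residuals @ residuals / n
best = log_likelihood(beta_hat, sigma2_hat, X, y)

tol = 1e-9  # slack for floating-point rounding
chain_holds = True
for _ in range(1000):
    beta = beta_hat + rng.normal(scale=0.5, size=p + 1)  # arbitrary beta
    sigma2 = sigma2_hat * np.exp(rng.normal())           # arbitrary positive sigma^2
    lower = log_likelihood(beta, sigma2, X, y)
    middle = log_likelihood(beta_hat, sigma2, X, y)
    chain_holds = chain_holds and (lower <= middle + tol) and (middle <= best + tol)

print("inequality chain held for all draws:", chain_holds)
```

A random search like this is only a sanity check, not a proof; the uniqueness of the maximizer rests on the full-column-rank argument in the text.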
