# Least Squares Theory

Definition 11.13. Estimating equations are used to find estimators of unknown parameters. The least squares criterion and log likelihood for maximum likelihood estimators are important examples.

Estimating equations are often used with a model, like $\boldsymbol{Y}=\boldsymbol{X} \boldsymbol{\beta}+\boldsymbol{e}$, and often have a variable $\boldsymbol{\beta}$ that is used in the equations to find the estimator $\hat{\boldsymbol{\beta}}$ of the vector of parameters in the model. For example, the log likelihood $\log \left(L\left(\boldsymbol{\beta}, \sigma^2\right)\right)$ has $\boldsymbol{\beta}$ and $\sigma^2$ as variables for a parametric statistical model where $\boldsymbol{\beta}$ and $\sigma^2$ are fixed unknown parameters; maximizing the log likelihood with respect to these variables gives the maximum likelihood estimators of the parameters $\boldsymbol{\beta}$ and $\sigma^2$. So the term $\boldsymbol{\beta}$ plays two roles: it is a variable in the estimating equations, which could be replaced by another variable such as $\boldsymbol{\eta}$, and it is a vector of parameters in the model. In the theorem below, we could replace $\boldsymbol{\eta}$ by $\boldsymbol{\beta}$, where $\boldsymbol{\beta}$ is a vector of parameters in the linear model and a variable in the least squares criterion, which is an estimating equation.
Theorem 11.16. Let $\boldsymbol{\theta}=\boldsymbol{X} \boldsymbol{\eta} \in C(\boldsymbol{X})$ where $Y_i=\boldsymbol{x}_i^T \boldsymbol{\eta}+r_i(\boldsymbol{\eta})$ and the residual $r_i(\boldsymbol{\eta})$ depends on $\boldsymbol{\eta}$. The least squares estimator $\hat{\boldsymbol{\beta}}$ is the value of $\boldsymbol{\eta} \in \mathbb{R}^p$ that minimizes the least squares criterion $\sum_{i=1}^n r_i^2(\boldsymbol{\eta})=\|\boldsymbol{Y}-\boldsymbol{X} \boldsymbol{\eta}\|^2$.
Proof. Following Seber and Lee (2003, p. 36), let $\hat{\boldsymbol{Y}}=\hat{\boldsymbol{\theta}}=\boldsymbol{P}_{\boldsymbol{X}} \boldsymbol{Y} \in C(\boldsymbol{X})$, $\boldsymbol{r}=\left(\boldsymbol{I}-\boldsymbol{P}_{\boldsymbol{X}}\right) \boldsymbol{Y} \in[C(\boldsymbol{X})]^{\perp}$, and $\boldsymbol{\theta} \in C(\boldsymbol{X})$. Then
$$(\boldsymbol{Y}-\hat{\boldsymbol{\theta}})^T(\hat{\boldsymbol{\theta}}-\boldsymbol{\theta})=\left(\boldsymbol{Y}-\boldsymbol{P}_{\boldsymbol{X}} \boldsymbol{Y}\right)^T\left(\boldsymbol{P}_{\boldsymbol{X}} \boldsymbol{Y}-\boldsymbol{P}_{\boldsymbol{X}} \boldsymbol{\theta}\right)=\boldsymbol{Y}^T\left(\boldsymbol{I}-\boldsymbol{P}_{\boldsymbol{X}}\right) \boldsymbol{P}_{\boldsymbol{X}}(\boldsymbol{Y}-\boldsymbol{\theta})=0$$
since $\boldsymbol{P}_{\boldsymbol{X}} \boldsymbol{\theta}=\boldsymbol{\theta}$. Thus
$$\|\boldsymbol{Y}-\boldsymbol{\theta}\|^2=(\boldsymbol{Y}-\hat{\boldsymbol{\theta}}+\hat{\boldsymbol{\theta}}-\boldsymbol{\theta})^T(\boldsymbol{Y}-\hat{\boldsymbol{\theta}}+\hat{\boldsymbol{\theta}}-\boldsymbol{\theta})=\|\boldsymbol{Y}-\hat{\boldsymbol{\theta}}\|^2+\|\hat{\boldsymbol{\theta}}-\boldsymbol{\theta}\|^2+2(\boldsymbol{Y}-\hat{\boldsymbol{\theta}})^T(\hat{\boldsymbol{\theta}}-\boldsymbol{\theta}) \geq\|\boldsymbol{Y}-\hat{\boldsymbol{\theta}}\|^2$$
with equality iff $\|\hat{\boldsymbol{\theta}}-\boldsymbol{\theta}\|^2=0$ iff $\hat{\boldsymbol{\theta}}=\boldsymbol{\theta}=\boldsymbol{X} \boldsymbol{\eta}$. Since $\hat{\boldsymbol{\theta}}=\boldsymbol{X} \hat{\boldsymbol{\beta}}$, the result follows.
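The orthogonality argument in the proof can be checked numerically. The sketch below (using NumPy, with simulated data and hypothetical coefficient values, not an example from the text) forms the projection matrix $\boldsymbol{P}_{\boldsymbol{X}}=\boldsymbol{X}(\boldsymbol{X}^T\boldsymbol{X})^{-1}\boldsymbol{X}^T$ and verifies that $\hat{\boldsymbol{\theta}}=\boldsymbol{P}_{\boldsymbol{X}}\boldsymbol{Y}=\boldsymbol{X}\hat{\boldsymbol{\beta}}$ and that the residual vector lies in $[C(\boldsymbol{X})]^{\perp}$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 3
# Simulated full-rank design: intercept column plus two random predictors
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)  # hypothetical true beta

# Projection matrix onto C(X): P_X = X (X^T X)^{-1} X^T
P = X @ np.linalg.solve(X.T @ X, X.T)
Y_hat = P @ Y        # theta_hat = P_X Y, the fitted values
r = Y - Y_hat        # r = (I - P_X) Y, the residual vector

# Least squares estimator beta_hat
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

# theta_hat = X beta_hat, and r is orthogonal to every column of X
assert np.allclose(Y_hat, X @ beta_hat)
assert np.allclose(X.T @ r, 0)
```

The final assertions mirror the proof: the fitted vector is the projection of $\boldsymbol{Y}$ onto $C(\boldsymbol{X})$, so the residual is orthogonal to the column space, which is exactly why the cross term in the criterion vanishes.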

## Hypothesis Testing

Suppose $\boldsymbol{Y}=\boldsymbol{X} \boldsymbol{\beta}+\boldsymbol{e}$ where $\operatorname{rank}(\boldsymbol{X})=p$, $E(\boldsymbol{e})=\mathbf{0}$, and $\operatorname{Cov}(\boldsymbol{e})=\sigma^2 \boldsymbol{I}$. Let $\boldsymbol{L}$ be an $r \times p$ constant matrix with $\operatorname{rank}(\boldsymbol{L})=r$, let $\boldsymbol{c}$ be an $r \times 1$ constant vector, and consider testing $H_0: \boldsymbol{L} \boldsymbol{\beta}=\boldsymbol{c}$. Theory will first be given for the case $\boldsymbol{e} \sim N_n\left(\mathbf{0}, \sigma^2 \boldsymbol{I}\right)$; large sample theory will then be given for the case where the iid zero mean $e_i$ have $V\left(e_i\right)=\sigma^2$. Note that the normal model satisfies the large sample theory conditions.

The partial $F$ test and its special cases, the ANOVA $F$ test and the Wald $t$ test, use $\boldsymbol{c}=\mathbf{0}$. Let the full model use $Y, x_1 \equiv 1, x_2, \ldots, x_p$, and let the reduced model use $Y, x_1=x_{j_1} \equiv 1, x_{j_2}, \ldots, x_{j_k}$ where $\left\{j_1, \ldots, j_k\right\} \subset \{1, \ldots, p\}$ and $j_1=1$. Here $1 \leq k<p$, and if $k=1$, then the model is $Y_i=\beta_1+e_i$. Hence the full model is $Y_i=\beta_1+\beta_2 x_{i, 2}+\cdots+\beta_p x_{i, p}+e_i$, while the reduced model is $Y_i=\beta_1+\beta_{j_2} x_{i, j_2}+\cdots+\beta_{j_k} x_{i, j_k}+e_i$. In matrix form, the full model is $\boldsymbol{Y}=\boldsymbol{X} \boldsymbol{\beta}+\boldsymbol{e}$ and the reduced model is $\boldsymbol{Y}=\boldsymbol{X}_R \boldsymbol{\beta}_R+\boldsymbol{e}_R$ where the columns of $\boldsymbol{X}_R$ are a proper subset of the columns of $\boldsymbol{X}$.

i) The partial $F$ test has $H_0: \beta_{j_{k+1}}=\cdots=\beta_{j_p}=0$, or $H_0$: the reduced model is good, or $H_0: \boldsymbol{L} \boldsymbol{\beta}=\mathbf{0}$ where $\boldsymbol{L}$ is a $(p-k) \times p$ matrix whose $i$th row has a 1 in the $j_{k+i}$th position and zeroes elsewhere. In particular, if $\beta_1, \ldots, \beta_k$ are the only $\beta_i$ in the reduced model, then $\boldsymbol{L}=\left[\begin{array}{ll}\mathbf{0} & \boldsymbol{I}_{p-k}\end{array}\right]$ where $\mathbf{0}$ is a $(p-k) \times k$ matrix. Hence $r=p-k=$ the number of predictors in the full model but not in the reduced model.

ii) The ANOVA $F$ test is the special case of the partial $F$ test where the reduced model is $Y_i=\beta_1+e_i$. Hence $H_0: \beta_2=\cdots=\beta_p=0$, or $H_0$: none of the nontrivial predictors $x_2, \ldots, x_p$ are needed in the linear model, or $H_0: \boldsymbol{L} \boldsymbol{\beta}=\mathbf{0}$ where $\boldsymbol{L}=\left[\begin{array}{ll}\mathbf{0} & \boldsymbol{I}_{p-1}\end{array}\right]$ and $\mathbf{0}$ is a $(p-1) \times 1$ vector. Hence $r=p-1$.
iii) The Wald $t$ test uses the reduced model that deletes the $j$th predictor from the full model. Hence $H_0: \beta_j=0$, or $H_0$: the $j$th predictor $x_j$ is not needed in the linear model given that the other predictors are in the model, or $H_0: \boldsymbol{L}_j \boldsymbol{\beta}=0$ where $\boldsymbol{L}_j=[0, \ldots, 0,1,0, \ldots, 0]$ is a $1 \times p$ row vector with a 1 in the $j$th position for $j=1, \ldots, p$. Hence $r=1$.
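The three choices of $\boldsymbol{L}$ above can be sketched in NumPy; the dimensions $p$, $k$, and the tested index $j$ below are illustrative values, not taken from the text:

```python
import numpy as np

p, k = 5, 2  # hypothetical: full model has p coefficients, reduced model keeps the first k

# i) Partial F test with beta_1, ..., beta_k in the reduced model:
#    L = [0  I_{p-k}], where 0 is (p-k) x k
L_partial = np.hstack([np.zeros((p - k, k)), np.eye(p - k)])

# ii) ANOVA F test (reduced model is Y_i = beta_1 + e_i):
#     L = [0  I_{p-1}], where 0 is (p-1) x 1
L_anova = np.hstack([np.zeros((p - 1, 1)), np.eye(p - 1)])

# iii) Wald t test for H0: beta_j = 0: a 1 x p row vector
#      with a 1 in the j-th position (1-based j, as in the text)
j = 3
L_wald = np.zeros((1, p))
L_wald[0, j - 1] = 1.0

# Each L has full row rank r, as required for H0: L beta = 0
assert np.linalg.matrix_rank(L_partial) == p - k
assert np.linalg.matrix_rank(L_anova) == p - 1
assert np.linalg.matrix_rank(L_wald) == 1
```

Each matrix simply picks out the coefficients set to zero under $H_0$, so $r$ equals the number of rows of $\boldsymbol{L}$.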

A way to get the test statistic $F_R$ for the partial $F$ test is to fit the full model and the reduced model. Let $R S S$ be the RSS of the full model, and let $R S S(R)$ be the RSS of the reduced model. Similarly, let $M S E$ and $M S E(R)$ be the MSE of the full and reduced models. Let $d f_R=n-k$ and $d f_F=n-p$ be the degrees of freedom for the reduced and full models.
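From these quantities, the standard partial $F$ statistic is $F_R=\left[\left(RSS(R)-RSS\right)/\left(d f_R-d f_F\right)\right]/MSE$, where $d f_R-d f_F=p-k=r$. A minimal NumPy sketch, assuming simulated data in which the reduced model actually holds:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, k = 50, 4, 2  # hypothetical sizes: full model p predictors, reduced model k
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
# Generate Y from the reduced model (only the first k coefficients nonzero)
Y = X[:, :k] @ np.array([1.0, 2.0]) + rng.normal(size=n)

def rss(X, Y):
    """Residual sum of squares from a least squares fit."""
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return np.sum((Y - X @ beta) ** 2)

RSS_full = rss(X, Y)          # RSS of the full model
RSS_red = rss(X[:, :k], Y)    # RSS(R): reduced model uses a subset of columns
MSE = RSS_full / (n - p)      # df_F = n - p

# F_R = [(RSS(R) - RSS) / (p - k)] / MSE
F_R = ((RSS_red - RSS_full) / (p - k)) / MSE
```

Since the reduced model is nested in the full model, $RSS(R) \geq RSS$ and $F_R \geq 0$. Under $H_0$ and the normal model, $F_R \sim F(p-k,\, n-p)$, so a p-value could be obtained from that distribution (e.g. via `scipy.stats.f.sf`).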
