# The EE Plot for Variable Selection

## The EE Plot for Variable Selection

Variable selection is the search for a subset of variables that can be deleted without important loss of information. Olive and Hawkins (2005) make an EE plot of $E S P(I)$ versus $E S P$ where $E S P(I)$ is for a submodel $I$ and $E S P$ is for the full model. This plot can also be used to complement the hypothesis test that the reduced model $I$ (which is selected before gathering data) can be used instead of the full model. The obvious extension to GAMs is to make the EE plot of $E A P(I)$ versus $E A P$. If the fitted full model and submodel $I$ are good, then the plotted points should follow the identity line with high correlation (use correlation $\geq 0.95$ as a benchmark).
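As a concrete sketch of this idea (not from the text; all data and variable names here are hypothetical), one can simulate a regression where only two of four predictors matter, use least-squares fitted values as the ESPs, and check the EE-plot correlation between the submodel and full-model ESPs:

```python
import numpy as np

# Hypothetical illustration: only x1 and x2 matter; x3 and x4 are irrelevant.
# Compare ESP(I) from the submodel I = {x1, x2} with ESP from the full model.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 4))                 # four candidate predictors
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=n)

def esp(X_sub, y):
    """Estimated sufficient predictor: least-squares fitted values."""
    A = np.column_stack([np.ones(len(y)), X_sub])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ beta

esp_full = esp(X, y)                        # ESP from the full model
esp_I = esp(X[:, :2], y)                    # ESP(I) from submodel I

# In an EE plot one would plot esp_I (horizontal) versus esp_full (vertical);
# points near the identity line with correlation >= 0.95 support submodel I.
r = np.corrcoef(esp_I, esp_full)[0, 1]
print(f"corr(ESP(I), ESP) = {r:.3f}")
```

Since the extra predictors are pure noise here, the correlation comes out near 1, consistent with the benchmark above.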

To justify this claim, assume that there exists a subset $S$ of predictor variables such that if $\boldsymbol{x}_S$ is in the model, then none of the other predictors is needed in the model. Write $E$ for these ("extraneous") variables not in $S$, partitioning $\boldsymbol{x}=\left(\boldsymbol{x}_S^T, \boldsymbol{x}_E^T\right)^T$. Then
$$A P=\alpha+\sum_{j=1}^p S_j\left(x_j\right)=\alpha+\sum_{j \in S} S_j\left(x_j\right)+\sum_{k \in E} S_k\left(x_k\right)=\alpha+\sum_{j \in S} S_j\left(x_j\right).$$

The extraneous terms that can be eliminated given that the subset $S$ is in the model have $S_k\left(x_k\right)=0$ for $k \in E$.

Now suppose that $I$ is a candidate subset of predictors and that $S \subseteq I$. Then
$$A P=\alpha+\sum_{j=1}^p S_j\left(x_j\right)=\alpha+\sum_{j \in S} S_j\left(x_j\right)=\alpha+\sum_{k \in I} S_k\left(x_k\right)=A P(I),$$
(if $I$ includes predictors from $E$, these will have $S_k\left(x_k\right)=0$ ). For any subset $I$ that includes all relevant predictors, the correlation $\operatorname{corr}(\mathrm{AP}, \mathrm{AP}(\mathrm{I}))=1$. Hence if the full model and submodel are reasonable and if EAP and EAP(I) are good estimators of AP and AP(I), then the plotted points in the EE plot of $\operatorname{EAP}(\mathrm{I})$ versus EAP will follow the identity line with high correlation.
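The chain of equalities above can be checked numerically. The following is a hedged sketch with a made-up additive predictor $AP = 1 + \sin(x_1) + x_2^2$, so $S = \{1, 2\}$, while $x_3$ is extraneous with $S_3(x_3) \equiv 0$:

```python
import numpy as np

# Hypothetical check that AP(I) = AP when S is a subset of I: here S = {1, 2}
# and the candidate subset I = {1, 2, 3} includes the extraneous x3.
rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, size=(500, 3))

AP = 1.0 + np.sin(x[:, 0]) + x[:, 1] ** 2                     # full model AP
AP_I = 1.0 + np.sin(x[:, 0]) + x[:, 1] ** 2 + 0.0 * x[:, 2]   # AP(I), S_3 = 0

# The extraneous term contributes S_3(x3) = 0, so AP(I) equals AP exactly
# and corr(AP, AP(I)) = 1.
print(np.allclose(AP, AP_I))
print(np.corrcoef(AP, AP_I)[0, 1])
```

In practice AP and AP(I) are unknown and the EE plot compares their estimates EAP and EAP(I), so the plotted correlation is high but not exactly 1.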

## Overdispersion

Definition 13.23. Overdispersion occurs when the actual conditional variance function $V(Y \mid \boldsymbol{x})$ is larger than the model conditional variance function $V_M(Y \mid \boldsymbol{x})$.

Overdispersion can occur if the model is missing factors, if the response variables are correlated, if the population follows a mixture distribution, or if outliers are present. Typically it is assumed that the model is correct, so $V(Y \mid \boldsymbol{x})=V_M(Y \mid \boldsymbol{x})$; hence the subscript $M$ is usually suppressed. A GAM has conditional mean and variance functions $E_M(Y \mid A P)$ and $V_M(Y \mid A P)$, where the subscript $M$ indicates that the function depends on the model. Overdispersion occurs if $V(Y \mid \boldsymbol{x})>V_M(Y \mid A P)$, where $E(Y \mid \boldsymbol{x})$ and $V(Y \mid \boldsymbol{x})$ denote the actual conditional mean and variance functions. Hence the assumptions that $E(Y \mid \boldsymbol{x})=E_M(Y \mid \boldsymbol{x}) \equiv m(A P)$ and $V(Y \mid \boldsymbol{x})=V_M(Y \mid A P) \equiv v(A P)$ need to be checked.

First check that the assumption $E(Y \mid \boldsymbol{x})=m(S P)$ is a reasonable approximation to the data using the response plot with lowess and the estimated conditional mean function $\hat{E}_M(Y \mid \boldsymbol{x})=\hat{m}(S P)$ added as visual aids. Overdispersion can occur even if the model conditional mean function $E(Y \mid S P)$ is a good approximation to the data. For example, for many data sets where $E\left(Y_i \mid \boldsymbol{x}_i\right)=m_i \rho\left(S P_i\right)$, the binomial regression model is inappropriate since $V\left(Y_i \mid \boldsymbol{x}_i\right)>m_i \rho\left(S P_i\right)\left(1-\rho\left(S P_i\right)\right)$. Similarly, for many data sets where $E(Y \mid \boldsymbol{x})=\mu(\boldsymbol{x})=\exp (S P)$, the Poisson regression model is inappropriate since $V(Y \mid \boldsymbol{x})>\exp (S P)$. If the conditional mean function is adequate, then we suggest checking for overdispersion using the $O D$ plot.
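As a minimal numerical sketch of such a check (an assumed example, not the text's OD plot itself), one can simulate counts from a gamma mixture of Poissons, so that the Poisson mean $\exp(SP)$ is correct but the variance exceeds it, and estimate a Pearson-type dispersion:

```python
import numpy as np

# Hypothetical sketch: detect Poisson overdispersion from a gamma mixture.
# If Y|x ~ Poisson(exp(SP) * g) with g a mean-1 gamma frailty, then
# E(Y|x) = exp(SP) still holds but V(Y|x) = exp(SP) + Var(g) * exp(SP)^2.
rng = np.random.default_rng(2)
n = 5000
sp = rng.uniform(0.0, 2.0, size=n)            # sufficient predictor values
mu = np.exp(sp)                               # Poisson model mean exp(SP)
g = rng.gamma(shape=2.0, scale=0.5, size=n)   # mean 1, variance 0.5 frailty
y = rng.poisson(mu * g)                       # overdispersed counts

# Pearson-type dispersion: mean of (Y - mu)^2 / mu. Near 1 for a correct
# Poisson model; well above 1 signals overdispersion (here V/mu = 1 + mu/2).
disp = np.mean((y - mu) ** 2 / mu)
print(f"dispersion estimate = {disp:.2f}")    # should exceed 1
```

The OD plot mentioned above graphs the estimated conditional variance $\hat{V}(Y \mid \boldsymbol{x})=(Y-\hat{E}(Y \mid \boldsymbol{x}))^2$ against the model variance $\hat{V}_M(Y \mid \boldsymbol{x})$; the dispersion statistic here summarizes the same comparison in a single number.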

