## Variable Selection

When you are contemplating which variables you should use to predict $Y$ in your regression model, consider the following three questions:

1. Global Variables: Which variables $V_1, V_2, \ldots$ are possibly related to $Y$? (This set of variables is likely to be infinite.)
2. Measurable Variables: Which subset of variables $\{W_1, \ldots, W_K\} \subseteq \{V_1, V_2, \ldots\}$ can you actually measure and get into a data set amenable to estimating regression models?
3. Variables to Use in Your Estimated Model: Which subset of the measured variables $\{X_1, \ldots, X_k\} \subseteq \{W_1, \ldots, W_K\}$ should you ultimately use in your estimated model?

Question 1 is purely conceptual; it is a “think about it using your knowledge of the subject matter” kind of question. But such “brainstorming” about potential variables will be very useful when you frame your causality arguments, and it will also help you identify good predictive variables on which you might decide to collect data.

Question 2 is more practical. When you design your study to collect measurements on $W_1, \ldots, W_K$, consider not only your answers to question 1 but also the following:

- Precedent. Which variables have been used in the past for similar prediction models?
- Availability. Which variables are actually available?

In addition, consider the goals of your analysis. Are you interested in predictive modeling without regard to causality, or are you interested in establishing whether there is a causal link between a particular $X$ variable and $Y$? In the predictive case, the candidate set $W_1, \ldots, W_K$ should include everything you can possibly lay your hands on: you are looking for the biggest, richest data set you can get.

On the other hand, if you wish to establish a causal connection, then you need to ask yourself, “What are the possible confounding variables?” Do some literature review, or simply think about it very hard. Use the two examples discussed earlier in this book for guidance: (i) the drownings/ice cream sales example had a possible confounding variable, “Temperature,” and (ii) the charitable contributions/number of dependents example had a possible confounding variable, “Religiosity.” You need to identify as many confounding variables as possible, or perhaps their surrogates, and include them in the measured set $W_1, \ldots, W_K$ in order to make a stronger argument for a causal effect of $W_1$ (say) on $Y$.
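
To see why leaving a confounder out of the measured set matters, here is a small simulation in the spirit of the drownings/ice cream example. It assumes a hypothetical data-generating model (all coefficients invented) in which temperature drives both ice cream sales and drownings while ice cream sales have no effect on drownings. The temperature-adjusted coefficient is obtained with simple regressions via the Frisch-Waugh-Lovell theorem, so no matrix library is needed; treat it as an illustrative sketch, not as the book's analysis.

```python
import random
from statistics import mean


def slope(x, y):
    """Least-squares slope from the simple regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """Residuals from the simple least-squares regression of y on x."""
    b = slope(x, y)
    mx, my = mean(x), mean(y)
    return [yi - (my + b * (xi - mx)) for xi, yi in zip(x, y)]


# HYPOTHETICAL data-generating model (all coefficients invented for illustration):
# temperature drives both ice cream sales and drownings, and ice cream sales
# have NO effect on drownings.
rng = random.Random(1)
n = 5000
temp = [rng.gauss(25, 5) for _ in range(n)]
ice = [10 + 2.0 * t + rng.gauss(0, 5) for t in temp]
drown = [1 + 0.3 * t + rng.gauss(0, 2) for t in temp]

# Confounder omitted: ice cream appears to "affect" drownings.
naive = slope(ice, drown)

# Confounder included: by the Frisch-Waugh-Lovell theorem, the coefficient on
# ice cream in the regression of drownings on (temperature, ice cream) equals
# the slope between the two sets of temperature-adjusted residuals.
adjusted = slope(residuals(temp, ice), residuals(temp, drown))

print(f"ice cream coefficient, temperature omitted:  {naive:.3f}")
print(f"ice cream coefficient, temperature included: {adjusted:.3f}")
```

Under this made-up model the first coefficient should land near $15/125 = 0.12$ (the covariance of sales and drownings induced by temperature, divided by the variance of sales), while the second should be near zero.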

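Now consider question 3 above, and notice the “$K$” versus “$k$” distinction: $K$ is the number of measured candidate variables, and $k \le K$ is the number you ultimately use in the estimated model. This distinction is not a “random vs. fixed” distinction as in the case of $\mathbf{X}$ vs.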
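
Question 3 is a genuine selection problem: every nonempty subset of the $K$ measured variables is a candidate model, so there are $2^K - 1$ of them. A minimal sketch that enumerates them, using placeholder variable names, shows why exhaustive search is only practical for small $K$:

```python
from itertools import combinations


def candidate_models(measured):
    """Every nonempty subset {X_1, ..., X_k} of the measured set {W_1, ..., W_K}."""
    return [
        subset
        for k in range(1, len(measured) + 1)
        for subset in combinations(measured, k)
    ]


# Hypothetical measured set with K = 4 variables (names are placeholders).
measured = ["W1", "W2", "W3", "W4"]
models = candidate_models(measured)
print(len(models))  # 15, i.e. 2**4 - 1
```

With $K = 20$ the same function would return $2^{20} - 1 = 1{,}048{,}575$ subsets.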

## Theory Versus Practice

Hans is applying for graduate school at Calisota Tech University (CTU). He sends CTU his quantitative score on the GRE entrance examination ($X_1 = 140$), his verbal score on the GRE ($X_2 = 160$), and his undergraduate GPA ($X_3 = 2.7$). What will his final graduate GPA at CTU be?

Of course, no one can say. But what we do know, from the Law of Total Variance discussed in Chapter 6, is that the variance of the conditional distribution of $Y$ = final CTU GPA is, on average, no larger (and typically smaller) when you condition on additional variables. Specifically,
$$\mathrm{E}\{\operatorname{Var}(Y \mid X_1, X_2, X_3)\} \leq \mathrm{E}\{\operatorname{Var}(Y \mid X_1, X_2)\} \leq \mathrm{E}\{\operatorname{Var}(Y \mid X_1)\}$$
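
The step from the Law of Total Variance to these inequalities can be filled in with a short derivation. Apply the law conditionally on $X_1$, then take expectations over $X_1$ (using $\mathrm{E}[\mathrm{E}\{\cdot \mid X_1\}] = \mathrm{E}\{\cdot\}$); the last term is an average of variances and so cannot be negative, which gives the right-hand inequality. Conditioning on $(X_1, X_2)$ and adding $X_3$ gives the left-hand inequality in the same way.

$$
\begin{aligned}
\operatorname{Var}(Y \mid X_1) &= \mathrm{E}\{\operatorname{Var}(Y \mid X_1, X_2) \mid X_1\} + \operatorname{Var}\{\mathrm{E}(Y \mid X_1, X_2) \mid X_1\}, \\
\mathrm{E}\{\operatorname{Var}(Y \mid X_1)\} &= \mathrm{E}\{\operatorname{Var}(Y \mid X_1, X_2)\} + \mathrm{E}[\operatorname{Var}\{\mathrm{E}(Y \mid X_1, X_2) \mid X_1\}] \\
&\geq \mathrm{E}\{\operatorname{Var}(Y \mid X_1, X_2)\}.
\end{aligned}
$$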

Figure 11.1 shows how these inequalities might appear as they relate to Hans. The top panel shows the variation in potentially observable GPAs among students who are like Hans in that they have GRE Math $= 140$. Some of that variation is explained by differences in verbal ability among students, and the second panel removes that source of variation by considering GPA variation among students who, like Hans, have GRE Math $= 140$ and GRE Verbal $= 160$. But some of the remaining variation is explained by differences in general student diligence. Assuming undergraduate GPA is a reasonable measure of such “diligence,” the final panel removes that source of variation by considering GPA variation among students who, like Hans, have GRE Math $= 140$, GRE Verbal $= 160$, and undergraduate GPA $= 2.7$. Of course, this could go on and on if additional variables were available, with each additional variable removing a source of variation and leading to distributions with smaller and smaller variances.
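
The same ordering holds exactly for sample data when "conditioning" means grouping observations: splitting groups into finer subgroups can only lower the pooled within-group sum of squares. The sketch below checks this on simulated data; the three predictors are hypothetical 0-4 score bands, and the GPA-like response and its coefficients are invented for illustration, not taken from Figure 11.1.

```python
import random
from collections import defaultdict


def mean_within_group_variance(ys, keys):
    """Pooled within-group variance: total within-group sum of squares over n.

    Equivalently, the group-size-weighted average of within-group variances
    (divisor n_g), a sample analogue of E{Var(Y | group)}.
    """
    groups = defaultdict(list)
    for y, key in zip(ys, keys):
        groups[key].append(y)
    ss = 0.0
    for g in groups.values():
        m = sum(g) / len(g)
        ss += sum((y - m) ** 2 for y in g)
    return ss / len(ys)


# HYPOTHETICAL data (all numbers invented for illustration): three predictors
# coded as score bands 0-4 and a GPA-like response that depends on all three.
rng = random.Random(2024)
n = 20000
x1 = [rng.randint(0, 4) for _ in range(n)]
x2 = [rng.randint(0, 4) for _ in range(n)]
x3 = [rng.randint(0, 4) for _ in range(n)]
y = [2.5 + 0.10 * a + 0.08 * b + 0.12 * c + rng.gauss(0, 0.25)
     for a, b, c in zip(x1, x2, x3)]

v1 = mean_within_group_variance(y, x1)                      # ~ E{Var(Y | X1)}
v12 = mean_within_group_variance(y, list(zip(x1, x2)))       # ~ E{Var(Y | X1, X2)}
v123 = mean_within_group_variance(y, list(zip(x1, x2, x3)))  # ~ E{Var(Y | X1, X2, X3)}
print(f"{v1:.4f} >= {v12:.4f} >= {v123:.4f}")
```

The three values shrink as each variable is added, mirroring the panels of the figure. With real data and many variables, the finer groups quickly contain very few students, which is one reason, in practice, to estimate these quantities with a regression model rather than by raw grouping.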

The means of the distributions shown in Figure 11.1 are 3.365, 3.5, and 3.44, respectively. If you were to use one of these distributions to predict Hans' GPA, which one would you pick? Clearly, you should pick the one with the smallest variance. His ultimate GPA is a single number regardless of which distribution you use, and since the third distribution has the smallest variance, his GPA will likely be closer to its mean (3.44) than to the other distributions' means (3.365 or 3.5).
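
A worked version of this argument, assuming (as the text suggests) that the third distribution, with mean 3.44 and some variance $\sigma_3^2$ that the text does not report, is the one that describes Hans: for any guess $c$, the expected squared prediction error splits into a variance term and a squared-bias term,

$$\mathrm{E}\{(Y - c)^2 \mid X_1 = 140, X_2 = 160, X_3 = 2.7\} = \sigma_3^2 + (3.44 - c)^2.$$

Guessing $c = 3.44$ leaves only $\sigma_3^2$, whereas $c = 3.5$ adds $(0.06)^2 = 0.0036$ and $c = 3.365$ adds $(0.075)^2 = 0.005625$. Averaged over all applicants, the same decomposition shows that predicting with a true conditional mean has mean squared error equal to the average conditional variance,

$$\mathrm{E}[\{Y - \mathrm{E}(Y \mid X_1, X_2, X_3)\}^2] = \mathrm{E}\{\operatorname{Var}(Y \mid X_1, X_2, X_3)\},$$

so the inequalities above translate directly into smaller average prediction errors, in theory, when more variables are used.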

While Figure 11.1 gives the right answer in theory, which is to use $\mathrm{E}(\mathrm{GPA} \mid X_1 = 140, X_2 = 160, X_3 = 2.7) = 3.44$ to predict Hans' GPA, the problem is that you do not know the number 3.44, since it is the mean of infinitely many potentially observable GPA values among students who have the combination $X_1 = 140$, $X_2 = 160$, and $X_3 = 2.7$. Instead, you have to estimate $\mathrm{E}(\mathrm{GPA} \mid X_1 = 140, X_2 = 160, X_3 = 2.7)$ using data. By the same logic, you do not know $\mathrm{E}(\mathrm{GPA} \mid X_1 = 140, X_2 = 160) = 3.5$ or $\mathrm{E}(\mathrm{GPA} \mid X_1 = 140) = 3.365$ either, and you would have to estimate them using data as well.
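
One common way to carry out that estimation, assuming the conditional mean is linear in the predictors (an assumption, not something this passage states), is ordinary least squares. The sketch below fits the three nested models to hypothetical applicant data generated from an invented linear model and evaluates each fit at Hans' values; the normal equations are solved with a small Gaussian-elimination helper to keep it dependency-free.

```python
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols_fit(rows, y):
    """Least-squares coefficients (intercept first) from the normal equations X'X b = X'y."""
    X = [[1.0, *row] for row in rows]
    p = len(X[0])
    xtx = [[sum(xi[i] * xi[j] for xi in X) for j in range(p)] for i in range(p)]
    xty = [sum(xi[i] * yi for xi, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(beta, row):
    """Fitted value beta_0 + beta_1 x_1 + ... at a single point."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))


# HYPOTHETICAL applicant data from a made-up linear model (not CTU data):
# E(GPA | q, v, u) = 0.5 + 0.008 q + 0.006 v + 0.3 u, so the true mean at
# Hans' values is 3.39 under this invented model.
rng = random.Random(7)
rows, gpa = [], []
for _ in range(500):
    q = rng.uniform(130, 170)  # GRE quantitative
    v = rng.uniform(130, 170)  # GRE verbal
    u = rng.uniform(2.0, 4.0)  # undergraduate GPA
    rows.append((q, v, u))
    gpa.append(0.5 + 0.008 * q + 0.006 * v + 0.3 * u + rng.gauss(0, 0.2))

hans = (140, 160, 2.7)
estimates = {}
for k in (1, 2, 3):  # models using X1; X1, X2; and X1, X2, X3
    beta = ols_fit([row[:k] for row in rows], gpa)
    estimates[k] = predict(beta, hans[:k])
    print(f"estimated mean GPA using first {k} variable(s): {estimates[k]:.3f}")
```

The printed numbers estimate the three conditional means under the invented model, not the values 3.365, 3.5, and 3.44 from Figure 11.1; with real data the true means are unknown, which is exactly the practical problem described above. Solving the normal equations directly is fine for a few predictors, though a QR-based solver is numerically safer in general.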
