Elementary Data Analysis | STAT280

Warnings

Some caveats are in order.

1. All of these model selection methods aim at getting models which will generalize well to new data, if it follows the same distribution as old data. Generalizing well even when distributions change is a much harder and much less well-understood problem (Quiñonero-Candela et al., 2009). It is particularly troublesome for a lot of applications involving large numbers of human beings, because society keeps changing all the time – variables vary by definition, but the relationships between variables also change. (That’s history.)
2. All of the standard theory of statistical inference you have learned so far presumes that you have a model which was fixed in advance of seeing the data. If you use the data to select the model, that theory becomes invalid, and it will no longer give you correct $p$-values for hypothesis tests, confidence sets for parameters, etc., etc. Typically, using the same data both to select a model and to do inference leads to too much confidence that the model is correct, significant, and estimated precisely.
3. All the model selection methods we have discussed aim at getting models which predict well. This is not necessarily the same as getting the true theory of the world. Presumably the true theory will also predict well, but the converse does not necessarily follow. We will see examples later where false but low-capacity models, because they have such low variance of estimation, actually out-predict correctly specified models.

The last two items – combining selection with inference, and parameter interpretation – deserve elaboration.
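The third caveat can be made concrete with a small simulation. What follows is only a sketch under assumptions of my own choosing, not an example from the text: the true regression is linear with a very small slope (`beta = 0.01`), the noise is standard Gaussian, and there are only 20 training points. The "false" model predicts a constant (the training mean); the "correct" model fits a line. The function name and all parameter values are hypothetical.

```python
import numpy as np

def compare_out_of_sample(beta=0.01, n_train=20, n_test=2000,
                          n_sims=1000, seed=0):
    """Average test MSE of a constant model and of a linear model.

    Assumed data-generating process (an illustration, not from the text):
    y = beta * x + noise, with x and noise standard Gaussian.
    """
    rng = np.random.default_rng(seed)
    mse_const = 0.0
    mse_lin = 0.0
    for _ in range(n_sims):
        # Small training set drawn from the true (linear) model
        x = rng.standard_normal(n_train)
        y = beta * x + rng.standard_normal(n_train)
        # Fresh test data from the same distribution
        x_new = rng.standard_normal(n_test)
        y_new = beta * x_new + rng.standard_normal(n_test)
        # Mis-specified, low-capacity model: ignore x entirely
        const_pred = y.mean()
        # Correctly specified model: least-squares line
        slope, intercept = np.polyfit(x, y, 1)
        lin_pred = intercept + slope * x_new
        mse_const += np.mean((y_new - const_pred) ** 2)
        mse_lin += np.mean((y_new - lin_pred) ** 2)
    return mse_const / n_sims, mse_lin / n_sims

if __name__ == "__main__":
    const_mse, lin_mse = compare_out_of_sample()
    print(f"constant model: {const_mse:.3f}   linear model: {lin_mse:.3f}")
```

With these settings the constant model should come out ahead on average, because the bias it incurs by ignoring a slope of 0.01 is far smaller than the extra estimation variance from fitting a slope on 20 points. Make `beta` or `n_train` large enough and the ordering reverses.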

Inference after Selection

You have, by this point, learned a lot of inferential statistics – how to test various hypotheses, calculate $p$-values, find confidence regions, etc. Most likely, you have been taught procedures or calculations which all presume that the model you are working with is fixed in advance of seeing the data. But, of course, if you do model selection, the model you do inference within is not fixed in advance, but is actually a function of the data. What happens then?

This depends on whether you do inference with the same data used to select the model, or with another, independent data set. If it’s the same data, then all of the inferential statistics become invalid – none of the calculations of probabilities on which they rest are right any more. Typically, if you select a model so that it fits the data well, what happens is that confidence regions become too small, as do $p$-values for testing hypotheses about parameters. Nothing can be trusted as it stands.

The essential difficulty is this: Your data are random variables. Since you’re doing model selection, making your model a function of the data, that means your model is random too. That means there is some extra randomness in your estimated parameters (and everything else), which isn’t accounted for by formulas which assume a fixed model (Exercise 4). This is not just a problem with formal model-selection devices like cross-validation. If you do an initial, exploratory data analysis before deciding which model to use – and that’s generally a good idea – you are, yourself, acting as a noisy, complicated model-selection device.
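To see how badly the fixed-model formulas can fail, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: 100 observations, 20 candidate predictors, a response that is pure noise (so every null hypothesis is true), and "model selection" that simply picks the predictor with the largest absolute $t$-statistic. The names `slope_t_stats` and `naive_rejection_rate` are hypothetical helpers, and the critical value is hard-coded for 98 degrees of freedom.

```python
import numpy as np

N = 100           # sample size (assumed for this illustration)
T_CRIT = 1.9845   # two-sided 5% critical value of Student's t, N - 2 = 98 d.f.

def slope_t_stats(X, y):
    """t-statistic of the slope in each simple regression of y on a column of X."""
    n = len(y)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Sample correlation of each column with y
    r = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return r * np.sqrt((n - 2) / (1.0 - r ** 2))

def naive_rejection_rate(p=20, n_sims=2000, seed=0):
    """How often a nominal 5% test rejects after selecting on the same data."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        X = rng.standard_normal((N, p))
        y = rng.standard_normal(N)        # independent of X: no real effects
        t = slope_t_stats(X, y)
        best = int(np.argmax(np.abs(t)))  # selection uses the data...
        rejections += int(abs(t[best]) > T_CRIT)  # ...and so does the test
    return rejections / n_sims

if __name__ == "__main__":
    print(f"rejection rate at nominal 5%: {naive_rejection_rate():.3f}")
```

If the test were valid, the rejection rate would be about 0.05. Here the selected predictor is by construction the most extreme of 20 draws, so the rate should land near $1 - 0.95^{20} \approx 0.64$, which is the sense in which $p$-values become too small.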

There are three main ways of dealing with this issue of post-selection inference.

1. Ignore it. This can actually make sense if you don’t really care about doing inference within your selected model, you just care about what model is selected. Otherwise, I can’t recommend it.
2. Beat it with more statistical theory. There is, currently, a lot of interest among statisticians in working out exactly what happens to sampling distributions under various combinations of models, model-selection methods, and assumptions about the true, data-generating process. Since this is an active area of research in statistical theory, I will pass it by, with some references in §3.6.
3. Evade it with an independent data set. Remember that if the events $A$ and $B$ are probabilistically independent, then $\operatorname{Pr}(A \mid B)=\operatorname{Pr}(A)$. Now set $A=$ “the confidence set we calculated from this new data covers the truth” and $B=$ “the model selected from this old data was such-and-such”. So long as the old and the new data are independent, it doesn’t matter that the model was selected using data, rather than being fixed in advance.
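The third option can be checked by simulation. The sketch below rests on illustrative assumptions, not on anything specified in the text: the response is pure noise, there are 20 candidate predictors, the "old" and "new" data sets each have 100 independent observations, and selection means picking the predictor with the largest absolute $t$-statistic on the old data. The helper names and the hard-coded critical value (98 degrees of freedom) are hypothetical choices for this example.

```python
import numpy as np

N = 100           # size of each of the old and new data sets (assumed)
T_CRIT = 1.9845   # two-sided 5% critical value of Student's t, N - 2 = 98 d.f.

def slope_t_stats(X, y):
    """t-statistic of the slope in each simple regression of y on a column of X."""
    n = len(y)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return r * np.sqrt((n - 2) / (1.0 - r ** 2))

def split_rejection_rate(p=20, n_sims=2000, seed=0):
    """Select a predictor on old data, then test it on independent new data."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        # Old data: used only to choose the model (event B)
        X_old = rng.standard_normal((N, p))
        y_old = rng.standard_normal(N)
        best = int(np.argmax(np.abs(slope_t_stats(X_old, y_old))))
        # New data: independent of the old, used only for inference (event A)
        X_new = rng.standard_normal((N, p))
        y_new = rng.standard_normal(N)
        t_new = slope_t_stats(X_new[:, [best]], y_new)[0]
        rejections += int(abs(t_new) > T_CRIT)
    return rejections / n_sims

if __name__ == "__main__":
    print(f"rejection rate at nominal 5%: {split_rejection_rate():.3f}")
```

Because the new data are independent of whatever was selected, the rejection rate should sit close to the nominal 0.05. The price is that each stage sees only part of the data, so the selected model and the inference within it are both noisier than they would be with the full sample.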

