OPM has experimented with two closely related methods: “double selection” (DS) and “double machine learning” (DML). Both build on the idea that, when the ultimate objective is to estimate causal effects, good covariate selection means building a model with variables that are related to participation status and to the outcome of interest. Otherwise, differences in outcomes between participants and non-participants might be due to those variables rather than to the policy being evaluated.

Hence, in a first stage, machine learning algorithms should be applied to two separate estimation problems: how covariates are related to the treatment assignment variable and how they are related to the outcome variable. In a second stage, the results of these first two analyses can be used to estimate the impact of the policy.
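
One common way to write down these two estimation problems is the partially linear model. The notation is an assumption introduced here for illustration and does not come from the source: $Y$ is the outcome, $D$ the participation (treatment assignment) variable, $X$ the covariates, and $\theta$ the policy effect of interest.

$$
\begin{aligned}
Y &= \theta D + g(X) + \varepsilon, & \mathbb{E}[\varepsilon \mid D, X] &= 0,\\
D &= m(X) + v, & \mathbb{E}[v \mid X] &= 0.
\end{aligned}
$$

Under this reading, the first stage learns how $X$ relates to $Y$ and to $D$ (the functions $g$ and $m$), and the second stage uses those fits to recover $\theta$.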

The key insight here is that the first stage in this approach can be interpreted as a prediction problem that machine learning can help to address. DS does so by employing machine learning-driven regularized regression methods, such as LASSO, to automatically select a limited set of variables that turn out to be related to both participation status and the outcome variable, and to then feed those variables into causally interpreted second-stage estimations. Simply put, machine learning in the first stage here is used for variable selection alone.
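
The DS procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not OPM's implementation: the data are simulated, the function names `lasso_cd` and `double_selection` are hypothetical, and the penalty levels are fixed by hand, whereas in practice they would be chosen by cross-validation or a plug-in rule. The sketch keeps the union of the variables selected in the two LASSO steps, as in the post-double-selection procedure of Belloni, Chernozhukov and Hansen; whether OPM used exactly this variant is an assumption.

```python
import numpy as np

def lasso_cd(X, y, alpha, n_iter=100):
    """Coordinate-descent LASSO minimising (1/2n)||y - Xb||^2 + alpha*||b||_1.

    Columns of X are standardised and y is centred inside the function.
    """
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    resid = y - y.mean()
    n, p = Xs.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            resid = resid + Xs[:, j] * beta[j]   # add back column j's current fit
            rho = Xs[:, j] @ resid / n           # covariance with the partial residual
            beta[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0)  # soft-threshold
            resid = resid - Xs[:, j] * beta[j]
    return beta

def double_selection(y, d, X, alpha_y=0.1, alpha_d=0.05):
    """Select controls with two LASSOs, then run OLS of y on d and the selected controls."""
    sel_y = np.flatnonzero(lasso_cd(X, y, alpha_y))   # covariates that predict the outcome
    sel_d = np.flatnonzero(lasso_cd(X, d, alpha_d))   # covariates that predict participation
    keep = np.union1d(sel_y, sel_d)
    Z = np.column_stack([np.ones(len(y)), d, X[:, keep]])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef[1], keep                              # coefficient on d, selected columns

# Simulated example (hypothetical data): the true effect of participation is 2.0,
# and covariate 0 drives both participation and the outcome.
rng = np.random.default_rng(0)
n, p = 1000, 30
X = rng.normal(size=(n, p))
d = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n) > 0).astype(float)
y = 2.0 * d + 1.5 * X[:, 0] + X[:, 2] + rng.normal(size=n)

naive = y[d == 1].mean() - y[d == 0].mean()
theta_ds, selected = double_selection(y, d, X)
print(f"naive difference in means: {naive:.2f}")
print(f"double-selection estimate: {theta_ds:.2f}")
print("selected covariate columns:", selected)
```

In this simulation the raw comparison of means overstates the effect because participants have systematically higher values of covariate 0, while the second-stage regression on the selected controls should land close to the true value.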

DML exploits the predictive power of machine learning approaches more comprehensively. In the first stage, a flexible set of machine learning methods (e.g. random forests or neural networks) can be used to model separately how covariates relate to the outcome variable and how they relate to the participation assignment variable. These models are used to predict both outcomes and participation status. The prediction errors, that is, the residuals, are recorded and used in the second stage to estimate the effect of the policy. The intuition behind this approach is that if the relationships between covariates and the outcome, on the one hand, and between covariates and the participation assignment, on the other, are modelled well in the first stage, the remaining errors will capture information that cannot be explained by the covariates controlled for. Hence, this information should reveal whether, once covariates are taken into account, participation assignment can explain the remaining variation in the outcome variable.
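
The residual-based logic of DML can be sketched as follows. This is a minimal illustration under stated assumptions, not OPM's implementation: the data are simulated, the names `poly_ridge_predict` and `dml_partial_out` are hypothetical, and a ridge regression on a quadratic basis stands in for the random forests or neural networks mentioned above so that the example needs only NumPy. The sketch also uses cross-fitting (predicting each observation from a model trained on the other folds), which is standard in the DML literature but is not described in the text above.

```python
import numpy as np

def poly_ridge_predict(X_train, y_train, X_test, lam=1.0):
    """Stand-in learner: ridge regression on a degree-2 polynomial basis."""
    def basis(X):
        n, p = X.shape
        cross = [X[:, i] * X[:, j] for i in range(p) for j in range(i, p)]
        return np.column_stack([np.ones(n), X] + cross)
    B_train, B_test = basis(X_train), basis(X_test)
    A = B_train.T @ B_train + lam * np.eye(B_train.shape[1])
    coef = np.linalg.solve(A, B_train.T @ y_train)
    return B_test @ coef

def dml_partial_out(y, d, X, learner, n_folds=5, seed=0):
    """Cross-fitted residual-on-residual estimate of the effect of d on y."""
    n = len(y)
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    y_res = np.empty(n)
    d_res = np.empty(n)
    for k in range(n_folds):
        test = folds == k
        train = ~test
        # First stage: predict outcome and participation from covariates only,
        # using a model trained on the other folds, and keep the prediction errors.
        y_res[test] = y[test] - learner(X[train], y[train], X[test])
        d_res[test] = d[test] - learner(X[train], d[train], X[test])
    # Second stage: regress outcome residuals on participation residuals.
    theta = (d_res @ y_res) / (d_res @ d_res)
    eps = y_res - theta * d_res
    se = np.sqrt(np.sum((d_res * eps) ** 2)) / (d_res @ d_res)  # robust standard error
    return theta, se

# Simulated example (hypothetical data): the true effect is 1.0 and the
# covariates affect participation and the outcome non-linearly.
rng = np.random.default_rng(1)
n, p = 2000, 5
X = rng.normal(size=(n, p))
d = (np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 - 0.5 + rng.normal(size=n) > 0).astype(float)
y = 1.0 * d + X[:, 0] + X[:, 1] ** 2 + rng.normal(size=n)

naive = y[d == 1].mean() - y[d == 0].mean()
theta_dml, se_dml = dml_partial_out(y, d, X, poly_ridge_predict)
print(f"naive difference in means: {naive:.2f}")
print(f"DML estimate: {theta_dml:.2f} (se {se_dml:.2f})")
```

Because the learner is passed in as a function, any other prediction method with the same `(X_train, y_train, X_test)` interface could be swapped in without changing the second stage.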

## What Was the Benefit of Using Machine Learning in QIEs?

There are two main benefits of using machine learning for variable selection and modelling purposes in QIEs. First, employing such algorithms allows for a fully systematic search over the set of baseline covariates (including their transformations and interactions) to identify and control for relationships in the data that might be biasing raw comparisons of outcomes between participant and non-participant groups. Hence, assuming that these machine learning algorithms are employed correctly, this adds substantive robustness to the assumption underlying many QIEs: that all relevant covariates driving systematic differences between the two groups are controlled for appropriately, so that treatment is as good as random.

Second, employing these methods removes researcher discretion from the process of covariate selection and modelling, thereby increasing the rigour of QIE estimation processes. Even though, as described above, conventionally this process is pre-specified and theory driven, disagreement often still persists among researchers about the exact specifications to choose in practice. Machine learning allows researchers to be systematic about this process:

> There are many disadvantages to the traditional process, including but not limited to the fact that researchers would find it difficult to be systematic or comprehensive in checking alternative specifications (…). The regularisation and systematic model selection have many advantages over traditional approaches, and for this reason will become a standard part of empirical practice in economics.


