The value added of machine learning to causal inference: evidence from revisited studies

Introduction

One of the key goals of empirical research in economics is to estimate the causal effect of a variable of interest on a targeted outcome. To avoid omitted-variable bias in the coefficients of interest, particularly in observational studies, it is often desirable to include a large number of controls in the regressions. Machine learning (ML) methods can potentially be useful in such settings. However, standard ML models are built for prediction, a fundamentally different problem from the one addressed by most empirical work in economics. This paper aims to present empirical researchers with evidence on the merits of causal machine learning methods in realistic settings.

Methods

We revisit a number of influential papers by applying causal ML methods and compare the results with those obtained from the traditional methods used in the original studies. We focus on both the average treatment effect (ATE) and heterogeneous treatment effects (HTE). Our main contribution is to illustrate how causal ML methods can be implemented in a variety of settings, and to highlight the gains they offer relative to the standard econometric approaches. We employ the double/debiased machine learning (DML) method for ATE and the causal forest method for HTE.
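
To make the DML step concrete, the estimator can be written out for a partially linear model. This is only a sketch under assumptions that are not stated in the text: the symbols (Y for the outcome, D for the treatment, X for the controls, and the nuisance functions g, m and \ell) and the choice of the partialling-out form are illustrative, and a given revisited study may call for a different specification, such as a fully interactive model for a binary treatment.

```latex
\begin{align}
Y &= \theta D + g(X) + \varepsilon, & \mathbb{E}[\varepsilon \mid D, X] &= 0, \\
D &= m(X) + v, & \mathbb{E}[v \mid X] &= 0, \\
\hat{\theta} &= \frac{\sum_{i=1}^{n} \bigl(D_i - \hat{m}_{-k(i)}(X_i)\bigr)\bigl(Y_i - \hat{\ell}_{-k(i)}(X_i)\bigr)}
                     {\sum_{i=1}^{n} \bigl(D_i - \hat{m}_{-k(i)}(X_i)\bigr)^{2}}, & \ell(X) &= \mathbb{E}[Y \mid X].
\end{align}
```

Here \hat{m}_{-k(i)} and \hat{\ell}_{-k(i)} denote ML fits of the two conditional means trained on all folds except the one containing observation i (cross-fitting), so that each residual is formed from a model that did not see that observation.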

Results

Based on our results from the sample of revisited papers, we derive and systematize four main reasons why causal machine learning methods are relevant for causal analysis and add value relative to the traditional methods. These are the ability to recover complex interactions among variables, suitability for high-dimensional settings, systematic model selection, and improved estimation of heterogeneous treatment effects. Our findings are supported by several Monte Carlo simulations, which show that DML outperforms OLS when the true nuisance relationship is nonlinear.
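
The kind of Monte Carlo comparison described above can be illustrated with a small, self-contained sketch. Everything in it is an assumption made for illustration and not the paper's actual design: the data-generating process (a single confounder with quadratic nuisance functions), the true effect `THETA = 1.0`, the sample sizes, and in particular the nuisance learner, which is a simple binned-mean regression standing in for a real ML method such as a lasso or random forest. OLS is computed through the Frisch-Waugh-Lovell residual regression so that both estimators share the same final step.

```python
import math
import random

THETA = 1.0  # true treatment effect (assumed for this illustration)

def simulate(n, rng):
    """Draw (x, d, y) from a partially linear model with nonlinear nuisance."""
    xs = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    ds = [x * x + rng.gauss(0.0, 1.0) for x in xs]            # m(x) = x^2
    ys = [THETA * d + x * x + rng.gauss(0.0, 1.0)              # g(x) = x^2
          for x, d in zip(xs, ds)]
    return xs, ds, ys

def fit_linear(xs, ts):
    """Least-squares line t ~ a + b*x; returns a prediction function."""
    n = len(xs)
    mx, mt = sum(xs) / n, sum(ts) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (t - mt) for x, t in zip(xs, ts)) / sxx
    return lambda x: mt + b * (x - mx)

def fit_binned(xs, ts, bins=20, lo=-2.0, hi=2.0):
    """Binned-mean regression: a simple flexible stand-in for an ML learner."""
    def index(x):
        return min(bins - 1, max(0, int((x - lo) / (hi - lo) * bins)))
    sums, counts = [0.0] * bins, [0] * bins
    for x, t in zip(xs, ts):
        sums[index(x)] += t
        counts[index(x)] += 1
    overall = sum(ts) / len(ts)  # fallback for empty bins
    means = [s / c if c else overall for s, c in zip(sums, counts)]
    return lambda x: means[index(x)]

def residual_slope(res_d, res_y):
    """Slope from regressing outcome residuals on treatment residuals."""
    return sum(a * b for a, b in zip(res_d, res_y)) / sum(a * a for a in res_d)

def ols_estimate(xs, ds, ys):
    """OLS coefficient on d with a linear control for x (Frisch-Waugh-Lovell)."""
    m_hat, l_hat = fit_linear(xs, ds), fit_linear(xs, ys)
    res_d = [d - m_hat(x) for x, d in zip(xs, ds)]
    res_y = [y - l_hat(x) for x, y in zip(xs, ys)]
    return residual_slope(res_d, res_y)

def dml_estimate(xs, ds, ys, folds=2):
    """Cross-fitted partialling-out estimate with binned-mean nuisance fits."""
    n = len(xs)
    res_d, res_y = [0.0] * n, [0.0] * n
    for k in range(folds):
        train = [i for i in range(n) if i % folds != k]
        test = [i for i in range(n) if i % folds == k]
        m_hat = fit_binned([xs[i] for i in train], [ds[i] for i in train])
        l_hat = fit_binned([xs[i] for i in train], [ys[i] for i in train])
        for i in test:  # residualize held-out observations only
            res_d[i] = ds[i] - m_hat(xs[i])
            res_y[i] = ys[i] - l_hat(xs[i])
    return residual_slope(res_d, res_y)

def monte_carlo(reps=50, n=400, seed=0):
    """Repeat the simulation and summarize both estimators."""
    rng = random.Random(seed)
    ols, dml = [], []
    for _ in range(reps):
        xs, ds, ys = simulate(n, rng)
        ols.append(ols_estimate(xs, ds, ys))
        dml.append(dml_estimate(xs, ds, ys))
    def rmse(est):
        return math.sqrt(sum((e - THETA) ** 2 for e in est) / len(est))
    return {"ols_mean": sum(ols) / reps, "dml_mean": sum(dml) / reps,
            "ols_rmse": rmse(ols), "dml_rmse": rmse(dml)}

if __name__ == "__main__":
    print(monte_carlo())
```

In this design the linear control cannot absorb the quadratic confounding, so the OLS coefficient stays biased no matter how large the sample is, while the cross-fitted estimate should land close to the assumed true effect. With a linear nuisance function the two estimators would be expected to behave similarly.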

Discussion

The econometric theory literature on adapting standard machine learning techniques to causal inference questions is growing rapidly. Despite the advances in causal ML methods, the empirical economics literature has not yet fully exploited their strengths. Our analysis suggests that causal ML methods provide robust estimates and insights into treatment effects that traditional methods may overlook, particularly in high-dimensional settings where confounding factors are prevalent.