On the role of private data in research

Note: this is a reposted version of a verbal intervention at the Economic Policy Panel meeting in 2021. It discusses some institutional considerations raised by research presented at the meeting that leverages private data.

The COVID-19 pandemic led to an unprecedented economic policy response across much of the world. A broad range of conventional and unconventional economic policy instruments were deployed, and the research community will spend the coming years working through the wealth of research questions that can be studied with the data generated by these policy interventions. The present paper is a nice illustration of the type of research that can be carried out in real time. It is first and foremost a cleanly executed descriptive analysis leveraging confidential micro data from one of Spain’s largest banks. Motivated by concerns about COVID-19’s impact on inequality, it suggests that the significant expansion of government transfers mostly offset income losses that arose due to the public health crisis. The authors carefully document how the fiscal response in Spain notably limited the first-order negative impacts on income and, to a significant extent, on inequality, in particular among the young, the foreign-born and individuals residing in regions dependent on tourism. Many important auxiliary questions cannot be studied with the data available because auxiliary information on the individual bank account holders is lacking, for example the extent to which the fiscal support measures interact with highly dual labour markets. The paper nevertheless provides a glimpse of the type of work that private data will enable researchers to carry out in the future. This, however, raises questions going beyond this particular contribution.

1. On the social value of privately held data

The pandemic highlighted notable deficiencies across countries in their ability to draw on data to arrive at real-time evidence to inform policy-making. Much of this data is increasingly controlled by private firms, which raises important questions about whether the underlying data resources are underutilized, resulting in inefficiently low levels of production of knowledge public goods. Such knowledge public goods could have broad societal benefit, but may yield little monetary benefit to the respective data owners, who may simply lack the human capital required to carry out basic research. The present paper captures an interesting use case in which researchers from universities teamed up with economists within a bank to conduct important, timely, cutting-edge research. But it also highlights a central challenge: the research benefits of data access typically grow non-linearly in the ability to merge said data with other data sources. Developing an institutional environment and positive incentives for private firms to donate data and make them accessible for research use, while maintaining confidentiality and ensuring the consent of individuals, could prove an important avenue to boost the production of knowledge public goods and, through it, a broad range of social and economic outcomes.

2. Institutional Considerations

The research community is already using private data to produce knowledge, as the present paper illustrates, and the pandemic appears to have accelerated this trend (see Figure 1). Yet this also raises further challenges that may warrant an institutional intervention championed by the professional associations. For example, it is not a given that the allocation of data to researchers or research ideas is efficient. Even in the present paper, social capital was a vital input to ensure data could be opened up for academic use. Such social capital may help in navigating legal issues, which are often challenging, but it naturally limits access. Similarly, efforts to increase research transparency may be curtailed by the use of private data shared on exclusive terms with individual researchers, creating notable barriers to the reproducibility of research. All these factors may contribute to further inequalities in the profession that are not desirable. Professional associations publishing some of the leading academic journals could step in to shape incentives and ensure public recognition, for example by developing a secure research infrastructure on which said data could be hosted as a precondition to publication, which in turn could be used to broaden access to data. At the same time, such an infrastructure could be used to sharpen incentives to ensure that individuals who worked hard to secure access to certain pieces of private data can maintain some of the associated academic returns.

Figure 1: Number of Working Papers Submitted by Quarter to CEPR and NBER that mention keywords related to “private data”

Notes: The figure plots the number of papers in the CEPR and NBER working paper series that mention a range of keywords indicating that private proprietary data were leveraged for the analysis. The plot was developed jointly with Abi Adams-Prassl (Oxford).