ERC Consolidator Grant: High Dimensional Inference for Panel and Network Data


Team

Motivation and Goal

The classic data structures that are discussed extensively in the Econometrics literature are cross-sectional, time-series, and panel data (also called longitudinal data). The basic statistical methods that we use to analyze those data structures were developed in the 20th century, driven by the availability of corresponding datasets. For example, the systematic collection of stock market data started in the early 20th century, accurate national accounts have become increasingly available since the 1930s, microeconomic survey data have been collected systematically since the 1940s, and large longitudinal surveys of households were started in the 1960s.

The trend toward increased data availability in the Social Sciences has accelerated in the past decades: better computing and storage capabilities allow us to record and manage much larger datasets, and to access them more easily. We all create digital data footprints on a daily basis, from bank transactions to social network data. New machine learning methods make it possible to quantify everything from satellite images to legal documents, thus creating structured information out of unstructured raw data. And of course, scientists and policy makers have made many conscious efforts to collect larger and better datasets.

Those modern datasets often have a more complicated internal structure that cannot be accurately classified simply as cross-sectional data, time-series data, or panel data. Instead, the structure of the data and of the models that we use to analyze them increasingly have the following characteristics:

The main goal of this research project is to develop robust inference methods for the sparse panel and network datasets described above. This requires establishing a mathematical representation of the network that allows asymptotic inference results to be formalized for sequences of growing networks. In addition, new bias correction and robust standard error estimation methods will be developed that account for the sparsity structure of the data. We will also advance more parsimonious modeling and estimation approaches for situations where the data are otherwise uninformative about the parameters of interest.

A first step towards understanding the connection between the network structure of the underlying dataset and the precision of statistical inference was made in this paper: Fixed-effect regressions on network data. This project aims to extend those results to more general data structures and models. For some models, this means first establishing more credible inference and improved bias correction results for classical (non-sparse) panel and network datasets, but ultimately our goal is to tackle sparse panel and network data.

Publications

Working Papers

Statistical software packages based on the above publications and working papers

Selected Presentations

Conferences