RandNLA for Generalized Linear Models with Big Datasets

Published in UPF/UAB Public Online Repository, 2017

Recommended citation: Lange, Robert Tjarko. (2017). "Randomized Numerical Linear Algebra for Generalized Linear Models with Big Datasets." UPF/UAB Public Online Repository.

Download paper here.

This working paper is the result of my Master's project at Barcelona GSE, supervised by Omiros Papaspiliopoulos (UPF) and Ioannis Kosmidis (Warwick). Building on original work by Michael Mahoney and Petros Drineas, we extend the framework of Randomized Numerical Linear Algebra (RandNLA) to Generalized Linear Models (GLMs). We show how powerful ideas such as randomized subsampling and projections can be used to quickly approximate a GLM estimator in the large-data regime. All code and materials needed to reproduce the results are available in the GitHub repository. You can also check out my poster, presented at the DS^3 Data Initiative Summer School in Paris, France.
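To give a rough feel for the subsampling idea, here is a minimal sketch in Python. It is not the code from the paper or the repository. It fits a logistic regression (one example of a GLM) by iteratively reweighted least squares (IRLS) and, optionally, solves each weighted least-squares step on a uniformly drawn subset of rows. Uniform sampling is chosen here purely for simplicity, and the paper's own sampling and projection schemes may differ. The function name `fit_logistic_irls`, its parameters, and the synthetic data are all hypothetical.

```python
import numpy as np


def fit_logistic_irls(X, y, n_iter=15, sample_size=None, rng=None):
    """Logistic regression via IRLS (hypothetical helper, illustration only).

    If sample_size is given, each IRLS step uses a fresh uniform subsample
    of rows instead of the full dataset.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    beta = np.zeros(d)
    for _ in range(n_iter):
        if sample_size is None:
            Xs, ys = X, y
        else:
            # Assumption: uniform sampling without replacement.
            idx = rng.choice(n, size=sample_size, replace=False)
            Xs, ys = X[idx], y[idx]
        eta = Xs @ beta                            # linear predictor
        mu = 1.0 / (1.0 + np.exp(-eta))            # mean under the logit link
        w = np.clip(mu * (1.0 - mu), 1e-8, None)   # IRLS weights, kept away from 0
        z = eta + (ys - mu) / w                    # working response
        sw = np.sqrt(w)
        # Weighted least-squares step on the (sub)sampled rows.
        beta, *_ = np.linalg.lstsq(sw[:, None] * Xs, sw * z, rcond=None)
    return beta


# Synthetic data, made up for this illustration.
rng = np.random.default_rng(0)
n, d = 20000, 5
X = rng.standard_normal((n, d))
beta_true = np.array([1.0, -1.0, 0.5, 0.0, -0.5])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ beta_true))).astype(float)

beta_full = fit_logistic_irls(X, y)
beta_sub = fit_logistic_irls(X, y, sample_size=2000, rng=rng)
rel_err = np.linalg.norm(beta_sub - beta_full) / np.linalg.norm(beta_full)
print("relative difference between subsampled and full estimate:", rel_err)
```

With `sample_size=m`, each least-squares step in this sketch works on an m-by-d matrix instead of the full n-by-d one, which is where the speed-up comes from. The price is extra variance in the estimate.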