Tales of Multiple Regression: Informative Missingness, Recommender Systems, and R2-D2
by Professor Howard Bondell
Abstract: In this talk, I discuss two loosely related projects that both fall under the umbrella of high-dimensional regression.
The first part of the talk investigates informative missingness in the framework of recommender systems. In this setting, we envision a potential rating for every object-user pair. The goal of a recommender system is to predict the unobserved ratings in order to recommend an object that the user is likely to rate highly.
A frequently overlooked point is that the unobserved object-user combinations are not missing at random. For example, in movie ratings, a relationship between a user's ratings and their viewing history is expected, since users tend to seek out movies that they anticipate enjoying. We model this informative missingness and place the recommender system in a shared-variable regression framework, which can improve prediction quality.
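To make the effect of informative missingness concrete, the following toy simulation shows how the average of the observed ratings drifts away from the average over all object-user pairs when the chance of observing a rating rises with the rating itself. It is only an illustrative sketch, not the shared-variable regression model from the talk: the standard normal ratings, the logistic observation rule, and the name `simulate_ratings` are all assumptions introduced here.

```python
import math
import random


def simulate_ratings(n_pairs=50_000, seed=0):
    """Toy model (assumed): a user is more likely to view, and hence
    rate, an object they would rate highly."""
    rng = random.Random(seed)
    all_ratings, observed = [], []
    for _ in range(n_pairs):
        # Potential rating for one object-user pair (assumed N(0, 1)).
        rating = rng.gauss(0.0, 1.0)
        # Assumed logistic link: higher rating, higher chance of being seen.
        p_observe = 1.0 / (1.0 + math.exp(-2.0 * rating))
        all_ratings.append(rating)
        if rng.random() < p_observe:
            observed.append(rating)
    return all_ratings, observed


all_ratings, observed = simulate_ratings()
mean_all = sum(all_ratings) / len(all_ratings)
mean_observed = sum(observed) / len(observed)
print(f"mean over all pairs:      {mean_all:.3f}")
print(f"mean over observed pairs: {mean_observed:.3f}")
```

Under these assumptions the observed mean sits well above the mean over all pairs, so a method that treats the observed ratings as a random sample would systematically overestimate the unobserved ones.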
The second part of the talk deals with a new class of prior distributions for shrinkage regularization in sparse linear regression, particularly in the high-dimensional case. Instead of placing a prior on the coefficients themselves, we place a prior on the regression R-squared. This prior is then distributed across the coefficients by decomposing it via a Dirichlet distribution.
We call the new prior R2-D2 in light of its R-Squared Dirichlet Decomposition. Compared to existing shrinkage priors, we show that the R2-D2 prior can simultaneously achieve both high prior concentration at zero and heavier tails. These two properties combine to provide a higher degree of shrinkage on the irrelevant coefficients, along with less bias in the estimation of the larger signals.
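The construction described above, a prior on R-squared that is split across the coefficients by a Dirichlet decomposition, can be sketched as a sampling procedure. This is a minimal sketch under stated assumptions rather than the exact specification used in the talk: the Beta(a, b) prior on R-squared, the symmetric Dirichlet `concentration`, the Laplace (double-exponential) kernel for each coefficient, and the function name `sample_r2d2_coefficients` are all illustrative choices not given in the abstract.

```python
import random


def sample_r2d2_coefficients(p, a=1.0, b=1.0, concentration=0.5,
                             sigma=1.0, seed=0):
    """Draw one coefficient vector from an R2-D2-style prior (sketch)."""
    rng = random.Random(seed)

    # 1. Prior on the regression R-squared (assumed Beta(a, b)).
    r2 = rng.betavariate(a, b)
    # Signal-to-noise ratio implied by R-squared.
    omega = r2 / (1.0 - r2)

    # 2. Dirichlet decomposition: normalized gamma draws give shares
    #    phi_j >= 0 that sum to one.
    gammas = [rng.gammavariate(concentration, 1.0) for _ in range(p)]
    total = sum(gammas)
    phi = [g / total for g in gammas]

    # 3. Each coefficient gets prior variance sigma^2 * phi_j * omega.
    #    Assumed Laplace kernel: a Laplace with scale s has variance
    #    2 * s^2, and the difference of two Exp(1) draws is Laplace(0, 1).
    betas = []
    for share in phi:
        scale = sigma * (share * omega / 2.0) ** 0.5
        betas.append(scale * (rng.expovariate(1.0) - rng.expovariate(1.0)))
    return r2, phi, betas


r2, phi, betas = sample_r2d2_coefficients(p=20)
print(f"R-squared draw: {r2:.3f}")
print("largest |beta|:", max(abs(x) for x in betas))
```

With a small Dirichlet concentration, most of the shares are close to zero and a few are large, which is one way to see how a single prior can shrink many coefficients heavily while leaving room for a handful of large signals.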