Assistant Professor Andrew Womack, Department of Statistics, Indiana University
A Bayesian Considers Sampling
Recently, I have had cause to consider two sampling problems: 1) how to design an optimal sample from a population under ignorable sampling and 2) how to account for non-ignorable sampling during data analysis. In this talk, I will present my thoughts on both problems from a Bayesian and information-theoretic perspective. In the case of sampling design, I derive a concave utility function on the simplex whose maximizer defines the optimal sampling design for a given inferential problem. A first result from this utility is the derivation of simple random sampling as an optimal sampling scheme for certain inferential problems. In order to address non-ignorable sampling, I derive an MCMC approach to the inferential problem. Under certain conditions on the sampling bias, a modification of the methodology can be made reasonably computationally tractable.
Research Assistant Professor Suriya Gunasekar, Toyota Technological Institute at Chicago
Implicit Regularization in Matrix Factorization
This talk will explore ideas on "implicit regularization" in underdetermined problems where the optimization objective has multiple global minima. We specifically study optimization of a quadratic loss over a matrix with gradient descent on the factorized space. We conjecture, and provide empirical and theoretical evidence, that with small enough step sizes and initialization close enough to the origin, gradient descent on a full-dimensional factorization converges to the minimum nuclear norm solution.
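The setting described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the speaker's code: it assumes a symmetric positive semidefinite parameterization X = U U^T, a quadratic loss over linear measurements <A_k, X>, and random symmetric measurement matrices. The function names, step size, and initialization scale are hypothetical choices made for the example.

```python
import numpy as np

def random_symmetric_measurements(m, n, rng):
    """Draw m random symmetric n x n measurement matrices (an illustrative choice)."""
    B = rng.standard_normal((m, n, n))
    return (B + B.transpose(0, 2, 1)) / 2.0

def factored_gradient_descent(A, y, step=1e-3, init_scale=1e-3, iters=5000, seed=0):
    """Minimize 0.5 * sum_k (<A_k, U U^T> - y_k)^2 by gradient descent on U.

    A has shape (m, n, n) and y has shape (m,). With m smaller than the number
    of free entries of X, the loss has many global minima.
    """
    n = A.shape[1]
    rng = np.random.default_rng(seed)
    # Full-dimensional (n x n) factor, initialized close to the origin.
    U = init_scale * rng.standard_normal((n, n))
    for _ in range(iters):
        residual = np.einsum("kij,ij->k", A, U @ U.T) - y
        G = np.einsum("k,kij->ij", residual, A)  # gradient with respect to X = U U^T
        U -= step * (G + G.T) @ U                # chain rule through X = U U^T
    return U @ U.T
```

Because the returned matrix is positive semidefinite, its nuclear norm equals its trace, so one way to probe the conjecture numerically is to compare `np.trace` of the result against the trace of other matrices that fit the same measurements.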
Visiting Assistant Professor Adam Jaeger, Department of Statistics, Indiana University
Split Sample Empirical Likelihood
Empirical likelihood offers a nonparametric approach to estimation and inference, which replaces the probability density based likelihood function with a function defined by estimating equations. While this eliminates the need for a parametric specification, the reliance on numerical optimization greatly decreases the applicability of empirical likelihood for large data problems. We introduce the split sample empirical likelihood; this variant utilizes a divide and conquer approach, allowing for parallel computation of the empirical likelihood function. We provide theoretical results on the asymptotic distribution of the estimators and test statistics derived from the split sample empirical likelihood, and demonstrate the reduced computation time through a simulation study.
Assistant Professor Daniel McDonald, Department of Statistics, Indiana University
Compressed and penalized linear regression
Modern applications require methods that are computationally feasible on large datasets but also preserve statistical efficiency. Frequently, these two concerns are seen as contradictory: approximation methods that enable computation are assumed to degrade statistical performance relative to exact methods. In applied mathematics, where much of the current theoretical work on approximation resides, the inputs are considered to be observed exactly. The prevailing philosophy is that while the exact problem is, regrettably, unsolvable, any approximation error should be as small as possible. However, from a statistical perspective, an approximate or regularized solution may be preferable to the exact one. Regularization formalizes a trade-off between fidelity to the data and adherence to prior knowledge about the data-generating process, such as smoothness or sparsity. The resulting estimator tends to be more useful, interpretable, and suitable as an input to other methods.
In this work, we propose new methodology for estimation and prediction under a linear model, borrowing insights from the approximation literature. We explore these procedures from a statistical perspective and find that in many cases they improve both computational and statistical performance.
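One way to picture combining compression with penalization is a sketched ridge regression. The abstract does not specify the estimator, so this is only a minimal sketch under assumptions: a Gaussian random sketching matrix Q compresses the n observations to q rows, and a ridge penalty is applied to the compressed problem. The function name `compressed_ridge` and its parameters are hypothetical.

```python
import numpy as np

def compressed_ridge(X, y, q, lam, seed=0):
    """Ridge regression on a randomly compressed data set (illustrative only).

    Solves min_b ||Q y - Q X b||^2 + lam * ||b||^2, where Q is a q x n
    Gaussian sketching matrix (an assumed choice of compression).
    """
    n, p = X.shape
    rng = np.random.default_rng(seed)
    Q = rng.standard_normal((q, n)) / np.sqrt(q)  # compress n rows down to q
    QX, Qy = Q @ X, Q @ y
    # Normal equations of the penalized, compressed least-squares problem.
    return np.linalg.solve(QX.T @ QX + lam * np.eye(p), QX.T @ Qy)
```

The linear solve then involves only the q compressed rows rather than all n observations, while `lam` controls the amount of shrinkage; both q and `lam` act as tuning parameters in this sketch.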