4/22/19
Raghu Pasupathy, Associate Professor, Department of Statistics, Purdue University
Second-Order Adaptive Sampling Recursion and the Effect of Curvature on Convergence Rate
The Stochastic Approximation (SA) recursion $X_{k+1} = X_k - \gamma_k \nabla F(X_k),\ k=1,2,\ldots$, also known as Stochastic Gradient Descent (SGD), is the ``workhorse'' recursion for solving stochastic optimization problems. For smooth, strongly convex objectives, it is well known that SA achieves the optimal convergence rate (the so-called information-theoretic Cramer-Rao lower bound) to the stationary point when the step size $\gamma_k = \theta k^{-1}$, assuming the constant $\theta$ is chosen to be larger than the inverse of the smallest eigenvalue of the Hessian of the objective function at the optimum. When this condition on $\theta$ is violated, SA's convergence rate deteriorates rapidly, a situation that seems to manifest frequently in practice because the eigenvalues of the Hessian at the optimum are rarely known in advance. While the natural remedy to this conundrum is to estimate the eigenvalues of the Hessian, this is a computationally expensive step in high dimensions. We remedy this impasse using an adaptive sampling quasi-Newton recursion where the inverse Hessian and gradient estimates are constructed using prescribed but different sample sizes, and potentially updated on different timescales. A fine analysis of the resulting recursion reveals that the Cramer-Rao information-theoretic bound is attained with only a ``light updating'' of the Hessian inverse, as long as the sample sizes used to update the gradient increase at a specific rate. Moreover, unlike in optimal SGD, none of the optimal parameter prescriptions within the proposed procedure depend on unknown curvature constants. To fully illuminate the effect of (a) dimension, (b) computation, and (c) sampling, we express all convergence rates in terms of a work measure that includes all such costs. Our theory and illustrative numerical examples are consistent with the success reported by a number of recent adaptive sampling heuristics.
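A minimal numerical sketch of the sensitivity described above (this is plain SA/SGD on an assumed quadratic objective with illustrative parameter choices, not the proposed adaptive sampling quasi-Newton recursion):

```python
import numpy as np

# Toy sketch (not the speaker's method): classical SA/SGD with step size
# gamma_k = theta / k on a strongly convex quadratic F(x) = 0.5 x'Hx, whose
# gradient H x is observed with additive Gaussian noise. The Hessian, noise
# level, and theta values below are illustrative assumptions.
rng = np.random.default_rng(0)
H = np.diag([0.2, 1.0])              # Hessian; smallest eigenvalue is 0.2
d = H.shape[0]

def run_sa(theta, iters=100_000, noise_sd=1.0):
    """Run X_{k+1} = X_k - (theta/k)(H X_k + noise) and return ||X_K||."""
    x = np.ones(d)
    for k in range(1, iters + 1):
        grad_est = H @ x + noise_sd * rng.standard_normal(d)   # noisy gradient
        x = x - (theta / k) * grad_est
    return np.linalg.norm(x)

# 1 / lambda_min(H) = 5: theta = 6 satisfies the condition, theta = 1 violates it.
print("theta = 6:", run_sa(6.0))     # error decays at the optimal O(k^{-1/2}) rate
print("theta = 1:", run_sa(1.0))     # markedly slower decay
```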
***
4/15/19
Todd Kuffner, Assistant Professor, Department of Mathematics and Statistics, Washington University in St. Louis
Principled Statistical Inference in Data Science
Statistical reasoning and statistical inference have strong historical connections with philosophy of science. In this talk, the new paradigm of data-driven science is examined through comparison with principled statistical approaches. I will review the merits and shortcomings of principled statistical inference. Time permitting, I will offer some comments on post-selection inference, inference for black box algorithms, and a survey of future challenges.
***
4/8/19
Xixi Hu, Ph.D. Student, Department of Statistics, Indiana University
Two-sample tests for latent space model graphs
Two-sample graph comparison arises in neuroscience, e.g., when comparing the brain networks of persons from two populations. Standard practices include testing aggregated univariate measures, which may not fully characterize a graph, or testing each edge separately, which does not exploit graph structure. Assuming a latent space model (LSM) for random graphs, we propose an inferential framework that both compares entire graphs and utilizes the underlying structure. Graphs in an LSM are described by a set of latent positions and a function that maps distance to edge probability. Our method estimates two sets of common latent positions by multidimensional scaling, then constructs a test statistic by comparing interpoint distance matrices. Simulations show that our test has greater power than either the standard practices or a direct comparison of sample mean graphs. The talk will end with a discussion of potential improvements to the estimation procedure.
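A hedged sketch of the general recipe (the dissimilarity 1 minus mean adjacency, the embedding dimension, and the permutation test below are illustrative assumptions, not the speaker's exact estimator or test statistic):

```python
import numpy as np

def classical_mds(D, dim=2):
    """Classical MDS: embed an n x n dissimilarity matrix into `dim` dimensions."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                   # double-centered squared dissimilarities
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:dim]            # top eigenpairs
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

def interpoint_distances(graphs, dim=2):
    """MDS-embed a sample's mean graph and return its interpoint-distance matrix."""
    D = 1.0 - np.mean(graphs, axis=0)             # illustrative dissimilarity choice
    X = classical_mds(D, dim)
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

def two_sample_stat(sample1, sample2, dim=2):
    """Frobenius distance between the two samples' interpoint-distance matrices."""
    return np.linalg.norm(interpoint_distances(sample1, dim) -
                          interpoint_distances(sample2, dim))

def permutation_test(sample1, sample2, n_perm=500, dim=2, seed=0):
    """Permutation p-value for the two-sample statistic above."""
    rng = np.random.default_rng(seed)
    observed = two_sample_stat(sample1, sample2, dim)
    pooled = np.concatenate([sample1, sample2])
    n1 = len(sample1)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(pooled))
        count += two_sample_stat(pooled[perm[:n1]], pooled[perm[n1:]], dim) >= observed
    return (count + 1) / (n_perm + 1)

# Usage: pass two arrays of adjacency matrices, each of shape (num_graphs, n, n).
```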
***
4/1/19
Memo Dalkilic, Associate Professor of Informatics and Computing, Indiana University
Using Data Analysis to Optimize Public Transportation on a College Campus
Using a large volume of bus data in the form of GPS coordinates (over 100 million data points) and automated passenger count data (over 1 million data points), we have developed (1) a system for analyzing and predicting future public transportation demand and (2) a new model, based on concepts specific to college campuses, that maximizes passenger satisfaction. Using these concepts, we improve the service of a model college public transportation system, specifically the Indiana University Campus Bus Service (IUCBS).
***
3/18/19
Zikun Yang, Ph.D. Student, Department of Statistics, Indiana University
Bayesian Locally Adaptive Models
Locally adaptive shrinkage in the Bayesian framework is achieved through the use of local-global prior distributions that model both the global level of sparsity and individual shrinkage parameters for the mean structure. The most popular of these models are the Horseshoe prior and its variants, owing to their spike-and-slab behavior: an asymptote at the origin combined with heavy tails. In this talk, we present an alternative Horseshoe prior that exhibits both a sharper asymptote at the origin and heavier tails, which we call the Heavy-tailed Horseshoe prior. We prove that mixing on the shape parameters provides improved spike-and-slab behavior and better reconstruction properties than other Horseshoe variants. A simulation study shows the advantage of the Heavy-tailed Horseshoe in terms of absolute error with respect to both the true mean structure and the oracle.
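For reference, the standard local-global Horseshoe hierarchy for a normal-means problem (background only, not the speaker's new prior) is
\[
y_i \mid \beta_i \sim N(\beta_i, \sigma^2), \qquad \beta_i \mid \lambda_i, \tau \sim N(0, \lambda_i^2 \tau^2), \qquad \lambda_i \sim C^{+}(0,1), \qquad \tau \sim C^{+}(0,1),
\]
where the local scales $\lambda_i$ allow individual means to escape shrinkage while the global scale $\tau$ controls the overall level of sparsity.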
***
3/4/19
Professor Roni Khardon, Department of Computer Science, Indiana University
Bounding the Expected Error of Approximate Bayesian Learning Algorithms
A significant portion of work in machine learning uses Bayesian models and their inference algorithms to solve data analysis problems. However, such work often uses misspecified models. Moreover, the need to analyze large-scale data often precludes exact inference and requires approximations. While such methods have been shown to be effective in practice, theoretical analysis of their performance is not well developed.
The talk will discuss the use of approximations and what kind of guarantees we can hope to have on their performance. It will then show how a perspective from learning theory can yield some initial answers. More concretely, our recent results show that the expected error of certain variational approximations can be quantitatively bounded relative to the error of the best parameter estimate in the same model. This gives a strong justification for the use of such methods as a guard against overfitting.
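As background on the approximations discussed (standard material, not part of the new results): variational inference replaces the intractable posterior $p(\theta \mid y)$ with a tractable distribution $q(\theta)$ chosen to maximize the evidence lower bound,
\[
\log p(y) \ \ge\ E_{q}\left[\log p(y, \theta)\right] - E_{q}\left[\log q(\theta)\right],
\]
with equality exactly when $q(\theta) = p(\theta \mid y)$.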
Based on joint work with Rishit Sheth.
***
2/18/19
Dr. Daniel Manrique Vallier, Dr. Andrew Womack, Lei Ding, Miguel Pebes Trujillo, Department of Statistics, Indiana University
The Deloitte & Touche Project for Statistical Decision Analysis in Auditing
Audits attempt to discover financial errors and improprieties. Standard auditing practices, which have not advanced substantially since the 1980s, rely heavily on random sampling and lack principled ways of incorporating additional information (e.g., historical data, auditor intuition) into audits. The Project for Statistical Decision Analysis in Auditing, a collaboration between the accounting firm Deloitte & Touche and Indiana University, seeks to advance the development of new auditing methods, leading to a new generation of auditing practices.
An audit results in a statement that expresses the extent of the auditing firm's belief in the accuracy of a collection of financial records. In the language of statistics, such a statement describes a subjective probability, a hallmark feature of the Bayesian approach to statistical inference. Bayesian inference provides an intellectual framework in which multiple sources of information can be combined to update prior beliefs about a given proposition, e.g., that the collection of financial records contains no substantive errors. This framework depends on modeling the sources of information as probability distributions, then using the mathematical properties of probability to draw conclusions about the propositions of interest. While several papers in the 1980s noted the relevance of Bayesian inference to the concerns of auditing, there have been no serious attempts to implement Bayesian methods in the field.
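A toy numerical illustration of this kind of updating (a conjugate Beta-Binomial calculation with made-up numbers, not the project's model):

```python
from scipy import stats

# Toy illustration (not the project's model): Bayesian updating of belief about
# the error rate p in a population of financial records, using a conjugate
# Beta prior and a hypothetical random sample of audited records.
prior_a, prior_b = 1.0, 19.0         # prior belief that errors are rare (mean 5%)
n_sampled, n_errors = 200, 3         # made-up audit sample
post_a = prior_a + n_errors
post_b = prior_b + n_sampled - n_errors

materiality = 0.05                   # proposition: error rate is below 5%
posterior = stats.beta(post_a, post_b)
print("P(p < 5% | data) =", posterior.cdf(materiality))
print("posterior mean error rate =", posterior.mean())
```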
To develop a Bayesian approach to auditing, it is necessary to construct a mathematical model of auditing procedures and the phenomena they investigate. Doing so is a highly nontrivial undertaking that requires substantial involvement from statisticians with special expertise in developing such models.
***
2/11/19
Amy Willis, Assistant Professor, Department of Biostatistics, University of Washington
Estimating diversity and relative abundance in microbial communities
High-throughput sequencing has advanced our understanding of the role that bacteria and archaea play in marine, terrestrial and host-associated health. Microbial community ecology differs in many ways from macroecology, and therefore new statistical methods are required to analyze microbiome data. In this talk I will present two new statistical methods for the analysis of microbiome data. The first, DivNet, estimates the diversity of microbial communities, and the second, corncob, estimates the relative abundance of microbial strains, metabolites, or genes. Both methods explicitly model microbe-microbe interactions, resulting in larger (but more accurate) estimates of variance compared to classical models. The methods will be illustrated with an analysis of the effects of wildfire on soil microbial communities in the northwestern Canadian boreal forest.
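For reference on the quantities being estimated, a naive plug-in calculation on a made-up count table (not DivNet or corncob):

```python
import numpy as np

# Naive plug-in estimates (for reference only; not DivNet or corncob):
# relative abundances and Shannon diversity from a hypothetical table of
# taxon counts for one community sample.
counts = np.array([120, 45, 30, 5, 0, 1])        # reads per taxon (made up)
props = counts / counts.sum()                     # plug-in relative abundances

nonzero = props[props > 0]
shannon = -np.sum(nonzero * np.log(nonzero))      # plug-in Shannon diversity
print("relative abundances:", np.round(props, 3))
print("plug-in Shannon diversity:", round(shannon, 3))
# Such plug-in estimates ignore microbe-microbe interactions; modeling those
# interactions is what the talk's methods add, yielding larger (but more
# accurate) variance estimates than classical models.
```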