**Speaker:** Raghu Pasupathy, Associate Professor, Department of Statistics, Purdue University

**Title:** Second-Order Adaptive Sampling Recursion and the Effect of Curvature on Convergence Rate

**Abstract:** The Stochastic Approximation (SA) recursion $X_{k+1} = X_k - \gamma_k \nabla F(X_k), k=1,2,\ldots$, also known as Stochastic Gradient Descent (SGD), is the "workhorse" recursion for solving stochastic optimization problems. For smooth, strongly convex objectives, it is well known that SA achieves the optimal convergence rate (the so-called information-theoretic Cramér-Rao lower bound) to the stationary point when the step size is $\gamma_k = \theta k^{-1}$, assuming the constant $\theta$ is chosen to be larger than the inverse of the smallest eigenvalue of the Hessian of the objective function at the optimum. When this condition on $\theta$ is violated, SA's convergence rate deteriorates rapidly, a situation that seems to manifest frequently in practice because the eigenvalues of the Hessian at the optimum are rarely known in advance. While the natural remedy to this conundrum is to estimate the eigenvalues of the Hessian, this is a computationally expensive step in high dimensions. We remedy this impasse using an adaptive sampling quasi-Newton recursion where the inverse Hessian and gradient estimates are constructed using prescribed but different sample sizes, and potentially updated on different timescales. A fine analysis of the resulting recursion reveals that the Cramér-Rao information-theoretic bound is attained with only a "light updating" of the Hessian inverse, as long as the sample sizes used to update the gradient increase at a specific rate. Moreover, unlike in optimal SGD, none of the optimal parameter prescriptions within the proposed procedure depend on unknown curvature constants. To fully illuminate the effect of (a) dimension, (b) computation, and (c) sampling, we express all convergence rates in terms of a work measure that includes all such costs. Our theory and illustrative numerical examples are consistent with the success reported by a number of recent adaptive sampling heuristics.
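The curvature sensitivity described in the abstract can be seen in a minimal simulation. The sketch below (not the speaker's procedure; the function name `sgd_mse` and all parameter values are illustrative choices) runs SGD with step size $\gamma_k = \theta/k$ on the scalar quadratic $F(x) = \mu x^2/2$, whose Hessian eigenvalue is $\mu$, comparing a $\theta$ well above the curvature threshold against one well below it:

```python
import numpy as np

def sgd_mse(theta, mu=1.0, sigma=0.5, x0=1.0, n_iter=2000, n_rep=200, seed=0):
    """Run SGD x_{k+1} = x_k - (theta/k) * g_k on F(x) = mu*x^2/2, where
    g_k = mu*x_k + noise is an unbiased gradient estimate, and return the
    mean-squared error E[x_N^2] estimated over n_rep independent replications."""
    rng = np.random.default_rng(seed)
    x = np.full(n_rep, x0)
    for k in range(1, n_iter + 1):
        grad = mu * x + sigma * rng.standard_normal(n_rep)
        x = x - (theta / k) * grad
    return float(np.mean(x ** 2))

# theta * mu > 1: the O(1/k) optimal rate applies
mse_good = sgd_mse(theta=2.0)
# theta * mu = 0.1: the rate degrades to roughly k^{-2*theta*mu}
mse_bad = sgd_mse(theta=0.1)
print(mse_good, mse_bad)
```

With these (arbitrary) settings the under-sized $\theta$ leaves an error orders of magnitude larger after the same number of iterations, which is the deterioration the adaptive sampling quasi-Newton recursion is designed to avoid without knowledge of $\mu$.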