Fall 2017 Colloquium Series

Monday, September 18, 2017

4:00 PM



David Friedenberg, Principal Research Statistician at Battelle

Developing brain-computer interface algorithms to meet user performance expectations

Brain-computer interface (BCI) systems that directly link neural activity to assistive devices have shown great promise for improving the daily lives of individuals with paralysis.  Using BCIs, individuals with tetraplegia have demonstrated control of computer cursors, robotic arms, communication devices, and their own paralyzed limbs through imagined movements.  In anticipation of these technologies transitioning from the laboratory setting to everyday usage, several groups have surveyed potential BCI users to identify patient priorities and desired characteristics for a clinically viable BCI. This talk will focus on the neural decoding algorithms used in BCIs and how careful algorithm choices can help meet patient requirements, moving the field closer to devices that can help patients. We will discuss recent results using statistical and machine learning methods to decode neural activity, enabling a paralyzed man to regain partial control of his hand. This work is part of a clinical trial that is jointly led and funded by Battelle and The Ohio State University Wexner Medical Center.



Assistant Professor Mauricio Sadinle, Department of Biostatistics, School of Public Health, University of Washington

Nonparametric Identified Methods to Handle Nonignorable Missing Data

There has recently been a lot of interest in developing approaches to handle missing data that go beyond the traditional assumptions of the missing data being missing at random and the nonresponse mechanism being ignorable.  Of particular interest are approaches that have the property of being nonparametric identified, because these approaches do not impose parametric restrictions on the observed-data distribution (what we can estimate from the observed data) while allowing the estimation of a full-data distribution (what we would ideally want to estimate).  When comparing inferences obtained from different nonparametric identified approaches, we can be sure that any discrepancies are the result of the different identifying assumptions imposed on the parts of the full-data distribution that cannot be estimated from the observed data, and consequently these approaches are especially useful for sensitivity analysis.  In this talk I will present some recent developments in this area of research and discuss current challenges.



Assistant Professor Benjamin Risk, Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University

Statistical impacts of simultaneous multislice imaging and implications for experimental design in fMRI

In fMRI, simultaneous multislice (i.e., multiband) imaging collects data from multiple slices at the same time, which decreases the time between acquisition of volumes. Simultaneous multislice (SMS) is gaining popularity because it can increase statistical power by boosting the effective sample size and facilitating the removal of higher-frequency artifacts. The technique requires an additional processing step in which the slices are separated, or unaliased, to recover the whole brain volume. However, this may result in signal “leakage” between aliased locations. SMS can also lead to noise amplification, which reduces power benefits. Previous studies have generally found that SMS imaging results in higher test statistics. We disentangle this phenomenon into true and false positives. Studies optimizing power may inadvertently optimize signal leakage, and in particular, increasing the sample size in an fMRI time series with SMS can inflate false positives. We examine signal leakage and noise amplification in data from the Human Connectome Project. We discuss how to choose acquisition protocols and reconstruction algorithms to improve experimental design in brain mapping.



Luther Dana Waterman Professor Richard Shiffrin and Ph.D. Student Suyog Chandramouli, Department of Psychological & Brain Sciences, Indiana University

Bayesian Inference and Bayesian Model Selection

Bayesian inference is the best method we now have to draw conclusions about the state of the world from what is always noisy scientific data. Scientific models are always wrong and are always based on data. Yet present Bayesian methods treat the models as primary (and ‘true’) and the data as secondary. We describe an extended form of Bayesian inference that reverses this approach and carries out inference on data, actually on possible probability distributions for data outcomes of an experiment. Priors are placed on these distributions and posteriors calculated based on the observed data. Models are assessed by the degree to which their predicted data distributions match the posterior data distributions. The theory is remarkably simple, but the number of possible data distributions makes computation impossible. We end by describing ways to constrain and parameterize possible data distributions that ought to make the approach practical.



Assistant Professor Franco Pestilli, Department of Psychological & Brain Sciences, Indiana University

A sparse tensor decomposition method for approximation of linear models of diffusion-weighted MRI and tractography

Recently, linear formulations and convex optimization methods have been proposed to predict diffusion-weighted Magnetic Resonance Imaging (dMRI) data given estimates of brain connections generated using tractography algorithms. The size of the linear models comprising such methods grows with both dMRI data and connectome resolution, and can become very large for application to modern data. In this paper, we introduce a method to predict dMRI signals for potentially very large connectomes, i.e., those composed of hundreds of thousands to millions of fascicles (bundles of neuronal axons), by using a sparse tensor decomposition. We show that this tensor decomposition accurately approximates the Linear Fascicle Evaluation (LiFE) model, one of the recently developed linear models. We provide a theoretical analysis of the accuracy of the sparse decomposed model, LiFESD, and demonstrate that it can reduce the size of the model significantly. We also develop algorithms that implement the optimization solver efficiently using the tensor representation.



Associate Professor Keli Xu, Department of Economics, Indiana University

Inference of Long-Horizon Predictability

Examination over multiple horizons has been routine in testing asset return predictability in finance and macroeconomics. In a simple predictive regression model, we find that the popular scaled test for multiple-horizon predictability has zero null rejection rate if the forecast horizon increases at a faster rate than the inverse of the proximity of the predictor's autoregressive root to unity. Correspondingly, the scaled test has zero power for long horizons, e.g., if the horizon increases faster than n^{1/2}, where n is the sample size, when the predictor is stationary. The t-test based on an implication of the short-run model, with the Bonferroni correction we suggest, is shown to have controlled size regardless of the persistence of the predictor, and is uniformly more powerful than the robust scaled test. Simulation experiments support the asymptotic results and show substantial power gain of the implied test over various other tests. In the empirical application, we re-examine the predictive ability of the short interest and the dividend-price ratio for the aggregate equity premium.



Assistant Professor Andrew Womack, Department of Statistics, Indiana University

A Bayesian Considers Sampling

Recently, I have had cause to consider two sampling problems: 1) how to design an optimal sample from a population under ignorable sampling and 2) how to account for non-ignorable sampling during data analysis. In this talk, I will present my thoughts on both problems from a Bayesian and information-theoretic perspective. In the case of sampling design, I derive a concave utility function on the simplex whose maximizer defines the optimal sampling design for a given inferential problem. A first result from this utility is the derivation of simple random sampling as an optimal sampling scheme for certain inferential problems. In order to address non-ignorable sampling, I derive an MCMC approach to the inferential problem. Under certain conditions on the sampling bias, a modification of the methodology can be made reasonably computationally tractable.



Research Assistant Professor Suriya Gunasekar, Toyota Technological Institute at Chicago

Implicit Regularization in Matrix Factorization

This talk will explore ideas on "implicit regularization" in under-determined problems where the optimization objective has multiple global minima. We specifically study optimization of a quadratic loss over a matrix with gradient descent on the factorized space. We conjecture, and provide empirical and theoretical evidence, that with small enough step sizes and initialization close enough to the origin, gradient descent on a full-dimensional factorization converges to the minimum nuclear norm solution.



Visiting Assistant Professor Adam Jaeger, Department of Statistics, Indiana University

Split Sample Empirical Likelihood

Empirical likelihood offers a nonparametric approach to estimation and inference, which replaces the probability density based likelihood function with a function defined by estimating equations. While this eliminates the need for a parametric specification, the need for numerical optimization greatly decreases the applicability of empirical likelihood for large data problems. We introduce the split sample empirical likelihood; this variant utilizes a divide and conquer approach, allowing for parallel computation of the empirical likelihood function. We provide theoretical results on the asymptotic distribution of the estimators and test statistics derived from the split sample empirical likelihood, and through a simulation study demonstrate the reduced computation time.



Assistant Professor Daniel McDonald, Department of Statistics, Indiana University

Compressed and penalized linear regression

Modern applications require methods that are computationally feasible on large datasets but also preserve statistical efficiency. Frequently, these two concerns are seen as contradictory: approximation methods that enable computation are assumed to degrade statistical performance relative to exact methods. In applied mathematics, where much of the current theoretical work on approximation resides, the inputs are considered to be observed exactly. The prevailing philosophy is that while the exact problem is, regrettably, unsolvable, any approximation should be as small as possible. However, from a statistical perspective, an approximate or regularized solution may be preferable to the exact one. Regularization formalizes a trade-off between fidelity to the data and adherence to prior knowledge about the data-generating process such as smoothness or sparsity. The resulting estimator tends to be more useful, interpretable, and suitable as an input to other methods.

In this work, we propose new methodology for estimation and prediction under a linear model, borrowing insights from the approximation literature. We explore these procedures from a statistical perspective and find that in many cases they improve both computational and statistical performance.