Fall 2016 Colloquium Series

Monday, October 10, 2016

4:00 PM



Assistant Professor Minje Kim, Department of Intelligent Systems Engineering, School of Informatics and Computing, Indiana University

Bitwise Neural Networks

In Bitwise Neural Networks (BNNs), all input, hidden, and output nodes are binary (+1 and -1), and so are all the weights and biases. BNNs are efficient in both space and computation, since (a) each real-valued sample or parameter is represented with a single bit, and (b) multiplication and addition correspond to bitwise XNOR and bit-counting, respectively. BNNs can therefore be used to deploy a deep learning system on small, resource-constrained devices without exhausting power, memory, or CPU cycles. In this talk, a training scheme for BNNs will be presented; it is a straightforward extension of backpropagation that relies on careful initialization and quantization noise injection. BNNs show comparable classification accuracies for the MNIST handwritten digit recognition task. A bitwise denoising autoencoder can also be trained to produce a cleaned-up speech spectrum from a noisy input speech spectrum.
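To illustrate point (b), here is a minimal sketch of how a dot product of +1/-1 vectors can be computed with XNOR and bit-counting. It is not the speaker's implementation. The function names, the encoding (+1 as bit 1, -1 as bit 0), and the rule that a zero pre-activation maps to +1 are all assumptions made for this example.

```python
def pack_bits(values):
    """Pack a sequence of +1/-1 values into an int (assumed encoding: +1 -> 1, -1 -> 0)."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
        elif v != -1:
            raise ValueError("values must be +1 or -1")
    return word


def bitwise_dot(a_bits, b_bits, n):
    """Dot product of two packed +1/-1 vectors of length n.

    XNOR marks the positions where the signs agree (product +1);
    every other position contributes -1, so the sum is
    matches - (n - matches) = 2 * matches - n.
    """
    mask = (1 << n) - 1                 # keep only the n valid bit positions
    xnor = ~(a_bits ^ b_bits) & mask    # 1 where the two bits are equal
    matches = bin(xnor).count("1")      # bit-counting (popcount)
    return 2 * matches - n


def bitwise_neuron(x, w, bias):
    """One bitwise unit: sign of (w . x + bias); a tie at zero maps to +1 (assumption)."""
    pre_activation = bitwise_dot(pack_bits(x), pack_bits(w), len(x)) + bias
    return 1 if pre_activation >= 0 else -1
```

For example, `bitwise_dot(pack_bits([1, -1, 1, 1]), pack_bits([1, 1, -1, 1]), 4)` returns 0, the same value as the ordinary dot product of the two vectors.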



Visiting Assistant Professor Sayed Mostafa, Department of Statistics, Indiana University

Kernel Density Estimation Using Auxiliary Information from Complex Surveys

Auxiliary information is widely used in sampling surveys to enhance the precision of estimators of finite population parameters, such as the finite population mean, percentiles, and distribution function. In this talk, I will present some attempts towards using auxiliary information effectively in kernel density estimation from complex survey data. Two approaches are used to develop three new kernel density estimators that use both auxiliary information and sample information in the framework of complex surveys. The statistical properties of these estimators are studied under a combined design-model-based inference framework which accounts for the underlying superpopulation model as well as the randomization distribution induced by the sampling design. The asymptotic normality of each estimator is derived. A global error criterion is used to determine the asymptotically optimal smoothing parameter for each estimator. Data-driven bandwidth estimators are obtained using the plug-in technique. The finite sample properties of the proposed estimators are addressed via a simulation study. The performance of the new estimators is compared with that of standard estimators which ignore the auxiliary information.
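For orientation, a standard design-weighted kernel density estimator, the kind of baseline that ignores auxiliary information, can be sketched as below. This is not one of the three new estimators from the talk. The Gaussian kernel, the function name, and the use of survey weights normalized by their sum are assumptions made for this example.

```python
import math


def weighted_kde(x, sample, weights, bandwidth):
    """Design-weighted Gaussian kernel density estimate at the point x.

    sample    -- observed values from the survey
    weights   -- survey weights, e.g. inverse inclusion probabilities (assumption)
    bandwidth -- smoothing parameter h > 0
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    if len(sample) != len(weights):
        raise ValueError("sample and weights must have the same length")
    total_weight = sum(weights)
    acc = 0.0
    for xi, wi in zip(sample, weights):
        u = (x - xi) / bandwidth
        # Standard normal density evaluated at u, scaled by the survey weight.
        acc += wi * math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    # Dividing by the total weight makes the estimate integrate to one.
    return acc / (total_weight * bandwidth)
```

With equal weights this reduces to the usual unweighted kernel density estimate, and rescaling all weights by a common factor leaves the estimate unchanged.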

Bio: Dr. Mostafa earned his Ph.D. in Statistics from Oklahoma State University. He has research interests in the areas of sampling surveys and nonparametric curve fitting.



Professor Mario Peruggia, Department of Statistics, Ohio State University

Reconciling Two Popular Approaches for Summarizing Case Influence in Bayesian Models

Methods for summarizing case influence in Bayesian models take essentially two forms: (1) use common divergence measures for calculating distances between full-data posteriors and case-deleted posteriors, and (2) measure the impact of infinitesimal perturbations to the likelihood to gain information about local case influence.  Methods based on approach (1) lead naturally to considering the behavior of case-deletion importance sampling weights (the weights used to approximate samples from the case-deleted posterior using samples from the full posterior).  Methods based on approach (2) lead naturally to considering the curvature of the Kullback-Leibler divergence of the full posterior from a geometrically perturbed quasi-posterior.  By examining the connections between the two approaches, we establish a rationale for employing low-dimensional summaries of case influence that are obtained entirely via the variance-covariance matrix of the log importance sampling weights.
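As a rough illustration of the quantities involved, the sketch below computes log case-deletion importance weights from posterior draws and then their sample variance-covariance matrix. It assumes conditionally independent observations, so that the weight for deleting case i is proportional to 1 / p(y_i | theta). The normal likelihood with known standard deviation and the function names are hypothetical choices for this example, and the talk's actual low-dimensional summaries are not reproduced here.

```python
import math


def log_case_deletion_weights(y, theta_draws, sigma=1.0):
    """Return rows[s][i] = -log p(y_i | theta_s) for a Normal(theta, sigma^2) likelihood.

    These are unnormalized log importance weights for approximating the
    posterior with case i deleted, using draws from the full posterior.
    """
    const = 0.5 * math.log(2.0 * math.pi * sigma ** 2)
    return [[const + 0.5 * ((yi - th) / sigma) ** 2 for yi in y]
            for th in theta_draws]


def covariance_matrix(rows):
    """Sample variance-covariance matrix of the columns of rows (one row per draw)."""
    s = len(rows)
    if s < 2:
        raise ValueError("need at least two draws")
    n = len(rows[0])
    means = [sum(r[i] for r in rows) / s for i in range(n)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (s - 1)
             for j in range(n)]
            for i in range(n)]
```

In practice `theta_draws` would come from an MCMC run on the full data. A case whose log weights vary strongly across draws, that is, a large diagonal entry of the matrix, is one whose deletion would shift the posterior noticeably.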

This is joint work with Zachary Thomas and Steven MacEachern.



Assistant Professor Tor Lattimore, Department of Computer Science, Indiana University

An Instance Optimal Algorithm for Finite-Armed Stochastic Bandits

Finite-armed stochastic bandits are the simplest model of sequential optimisation exhibiting the exploration-exploitation dilemma, where an agent wishing to maximise her cumulative payoff must carefully balance exploring infrequently chosen actions and exploiting those that appeared best in the past. Although this problem has been studied for nearly a century, there remain open problems. In this talk I present the first strategy that (in terms of its regret) is simultaneously optimal asymptotically, optimal up to constant factors in the minimax sense, and even instance-optimal in finite time.



Associate Professor Chris Hans, Department of Statistics, Ohio State University

Science-driven Regression with R-prior Distributions

We investigate prior distributions that are designed to incorporate information about the strength of a regression relationship.  The most commonly used prior distributions for regression models typically assume that coefficients are a priori independent or induce dependence via the empirical design matrix.  While these standard priors (and recently refined versions of them) may exhibit desirable behavior with respect to targeted inferential goals, we should not expect them to distribute probability throughout the entire parameter space in a way that is consistent with all of our prior beliefs.  Examination reveals that when we focus on the strength of the regression relationship, standard priors place nearly all of their mass in regions of the parameter space that are not only inconsistent with reasonable prior belief but are nearly certain to clash so greatly with the likelihood that we might question the validity of particular inferences.

We describe a new class of priors that allows one to directly incorporate information about the strength of the regression relationship.  We compare the Bayesian model-uncertainty properties of our priors with those of standard priors, highlighting the consequences of inappropriately ignoring prior information when it is indeed available, and highlighting the consequences of unintentionally incorporating strong prior information when it does not exist.  We describe MCMC algorithms that scale well with model size and require minimal storage by using a fixed-dimensional parameterization across models of different sizes.  We discuss several strategies for improving MCMC output-based estimation using the structure of the posterior.

This is joint work with Steven MacEachern and Agniva Som.