Visiting Assistant Professor Qingsong Shan, Department of Statistics, Indiana University
The Measures of Dependence through Copulas
Traditional measures of association between random variables, such as Pearson's correlation coefficient, Spearman's ρ, and Kendall's τ, capture only linear or monotonic relationships and are not suitable for nonlinear cases. Here, measures of functional relationship, whether linear or nonlinear, will be presented. These measures are constructed from copulas. To apply them to sample data, several estimators are suggested.
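As a rough illustration of the gap the abstract points to, the sketch below contrasts Pearson's correlation with one well-known copula-based measure, the Schweizer–Wolff σ (a grid approximation of my own; the speaker's proposed measures and estimators may differ), on a relationship that is perfectly functional but non-monotonic:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(-1.0, 1.0, n)
y = x ** 2                          # exact functional dependence, non-monotonic

def schweizer_wolff(x, y):
    """Grid approximation of Schweizer-Wolff sigma:
    12 * integral of |C(u, v) - u*v| over the unit square,
    with C replaced by the empirical copula of (x, y)."""
    n = len(x)
    u = np.argsort(np.argsort(x))   # ranks 0..n-1
    v = np.argsort(np.argsort(y))
    M = np.zeros((n, n))
    M[u, v] = 1.0                   # one point mass per observation
    C = M.cumsum(axis=0).cumsum(axis=1) / n   # empirical copula on the rank grid
    grid = np.arange(1, n + 1) / n
    return 12.0 * np.abs(C - np.outer(grid, grid)).mean()

r = np.corrcoef(x, y)[0, 1]         # Pearson: near zero by symmetry
sigma_sw = schweizer_wolff(x, y)    # clearly positive: detects the dependence
```

Pearson's r is essentially zero here, while the copula-based σ is far from zero, which is the motivation for copula-based functional-dependence measures.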
Mladen Kolar, Assistant Professor of Econometrics and Statistics at the University of Chicago Booth School of Business
ROCKET: Robust Confidence Intervals via Kendall's Tau for Transelliptical Graphical Models
Undirected graphical models are used extensively in the biological and social sciences to encode a pattern of conditional independences between variables, where the absence of an edge between two nodes a and b indicates that the corresponding two variables X_a and X_b are believed to be conditionally independent, after controlling for all other measured variables. In the Gaussian case, conditional independence corresponds to a zero entry in the precision matrix Ω (the inverse of the covariance matrix Σ). Real data often exhibit heavy-tailed dependence between variables, which cannot be captured by the commonly used Gaussian or nonparanormal (Gaussian copula) graphical models. In this paper, we study the transelliptical model, an elliptical copula model that generalizes the Gaussian and nonparanormal models to a broader family of distributions. We propose the ROCKET method, which constructs an estimator of Ω_ab that we prove to be asymptotically normal under mild assumptions. Empirically, ROCKET outperforms the nonparanormal and Gaussian methods in terms of achieving accurate inference on simulated data. We also compare the three methods on real data (daily stock returns), and find that the ROCKET estimator is the only method whose behavior across subsamples agrees with the distribution predicted by the theory.
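A key ingredient behind Kendall's-tau-based methods for elliptical copulas is the classical sine-transform identity: the latent correlation satisfies Σ_ab = sin(π τ_ab / 2), and τ is invariant to monotone marginal transforms, so heavy-tailed or warped margins do no harm. The sketch below illustrates that identity only (my own toy example, not the ROCKET estimator itself):

```python
import numpy as np

rng = np.random.default_rng(1)
n, rho = 1000, 0.6
cov = np.array([[1.0, rho], [rho, 1.0]])
z = rng.multivariate_normal(np.zeros(2), cov, size=n)
# distort the margins with arbitrary monotone maps: tau is unchanged
x, y = np.exp(z[:, 0]), z[:, 1] ** 3

def kendall_tau(x, y):
    """Sample Kendall's tau via pairwise concordance signs."""
    sx = np.sign(np.subtract.outer(x, x))
    sy = np.sign(np.subtract.outer(y, y))
    n = len(x)
    return (sx * sy).sum() / (n * (n - 1))  # ordered pairs / n(n-1)

tau = kendall_tau(x, y)
rho_hat = np.sin(np.pi * tau / 2)  # sine transform recovers the latent rho
```

Even though the observed margins are lognormal and cubed-normal, rho_hat estimates the latent correlation 0.6; ROCKET builds inference for precision-matrix entries on top of this kind of rank-based correlation estimate.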
Assistant Professor Claudio Fuentes, Department of Statistics, Oregon State University
Joint Confidence Intervals for the Selected Population Means
Consider an experiment in which p independent treatments or populations π_i, with corresponding unknown means θ_i, are available, and suppose that from every population we can obtain a random sample. In this context, researchers are sometimes interested in selecting the populations that yield the largest sample means as a result of the experiment, and in estimating the corresponding population means θ_i. In this talk, we present a frequentist approach to the problem and discuss how to construct confidence intervals for the means of the selected populations, assuming the populations π_i are independent and normally distributed with a common variance σ².
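The difficulty is selection bias: conditioning on a population having the largest sample mean shifts the distribution of its estimator, so the naive z-interval undercovers. A quick Monte Carlo sketch (the all-means-equal setup is my own illustration of the problem, not the talk's construction):

```python
import numpy as np

rng = np.random.default_rng(2)
p, n, sigma, reps = 10, 25, 1.0, 4000
z = 1.96                                  # nominal 95% normal quantile
half = z * sigma / np.sqrt(n)             # naive half-width
cover = 0
for _ in range(reps):
    # sample means of p populations, all with true mean theta_i = 0
    means = rng.normal(0.0, sigma / np.sqrt(n), size=p)
    m = means.max()                       # mean of the selected population
    cover += (m - half <= 0.0 <= m + half)
naive_coverage = cover / reps             # well below the nominal 0.95
```

With all θ_i equal, the selected sample mean is the maximum of p normals, so the naive interval's coverage drops to roughly Φ(1.96)^p ≈ 0.78 for p = 10; intervals for selected means must account for this.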
Associate Professor Minh Tang, Department of Applied Mathematics & Statistics, Johns Hopkins University
Two-sample hypothesis testing for random graphs
Two-sample hypothesis testing for random graphs arises naturally in neuroscience, social networks, and machine learning. In this talk, we examine semiparametric and nonparametric problems of two-sample inference for a class of latent position random graphs. We formulate a pair of test statistics and then demonstrate how our test procedure can be applied to identify similarities in neural connectome graphs and to perform community detection and classification in large social networks.
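One standard ingredient for inference on latent position graphs (an assumption of mine about the machinery, in the spirit of adjacency spectral embedding, rather than the talk's exact test statistics) is that spectrally embedding the adjacency matrix recovers the latent positions, which can then be compared across two graphs:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 400
x = rng.uniform(0.2, 0.8, n)               # scalar latent positions
P = np.outer(x, x)                         # edge probabilities P_ij = x_i * x_j
A = (rng.random((n, n)) < P).astype(float)
A = np.triu(A, 1)
A = A + A.T                                # symmetric, hollow adjacency matrix

w, V = np.linalg.eigh(A)                   # eigenvalues in ascending order
xhat = np.abs(np.sqrt(w[-1]) * V[:, -1])   # 1-d adjacency spectral embedding
# a two-sample test would embed a second graph the same way and
# compare the two embeddings (after aligning for non-identifiability)
```

The embedded values xhat track the true latent positions closely, which is what makes embedding-based two-sample statistics workable.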
Hao Zhang, Professor and Head of Statistics, Purdue University
The Role of Covariances in Spatial Statistics
Covariances play a vital role in spatial statistics because spatial data are correlated in most situations. It is well understood how covariances affect kriging (i.e., the best linear unbiased prediction). Such an understanding helps with the development of statistically and computationally efficient algorithms for estimation and prediction in spatial statistics. However, relatively little is known for cokriging in the multivariate case. In this talk, I will review some interesting and key facts that are known for kriging and present a new result for cokriging and some open problems.
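For intuition on how the covariance enters prediction, here is a minimal simple-kriging sketch (zero-mean field and an exponential covariance with parameters invented for illustration): the weights solve Σλ = c, and with no nugget the predictor interpolates the data exactly.

```python
import numpy as np

rng = np.random.default_rng(3)
s = np.array([0.0, 1.0, 2.5, 4.0])        # observation locations
y = rng.normal(size=4)                    # observed values of a zero-mean field

def expcov(h, range_=1.5):
    """Exponential covariance function with unit sill (no nugget)."""
    return np.exp(-np.abs(h) / range_)

Sigma = expcov(np.subtract.outer(s, s))   # covariance among observations

def simple_krige(s0):
    c = expcov(s0 - s)                    # covariances to the prediction site
    lam = np.linalg.solve(Sigma, c)       # kriging weights: Sigma @ lam = c
    return lam @ y                        # best linear unbiased predictor

pred_at_data = simple_krige(2.5)          # reproduces the observation there
```

The covariance model fully determines the weights, which is why its choice (and, in the multivariate case, the cross-covariance model for cokriging) matters so much.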
Assistant Professor Martha White, Department of Computer Science and Informatics, Indiana University
Stochastic approximation for representation learning
Learning new representations in machine learning is often tackled using a factorization of the data. Recently, there have been some insights that alternating minimization for certain factorization problems yields global solutions; however, these insights have not been investigated for stochastic gradient descent, where samples are processed incrementally. Incremental algorithms are key to processing large amounts of data in practice. In this talk, I will discuss both theoretical and empirical evidence that a certain class of such models, called regularized factor models, has the nice property that local alternating minimization yields global solutions. I will also discuss insights into how to carefully specify an objective for effective incremental estimation with stochastic approximation.
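The alternating-minimization idea can be seen in the simplest factorization setting (plain least-squares matrix factorization, my own illustration rather than the regularized factor models of the talk): each subproblem is an ordinary least-squares problem solved exactly, so the objective never increases.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, k = 30, 20, 3
# approximately rank-k data: low-rank signal plus small noise
X = rng.normal(size=(n, k)) @ rng.normal(size=(k, m)) + 0.05 * rng.normal(size=(n, m))

U = rng.normal(size=(n, k))        # random initialization
V = rng.normal(size=(m, k))
losses = []
for _ in range(25):
    U = np.linalg.lstsq(V, X.T, rcond=None)[0].T   # fix V, exact LS for U
    V = np.linalg.lstsq(U, X, rcond=None)[0].T     # fix U, exact LS for V
    losses.append(np.linalg.norm(X - U @ V.T) ** 2)
```

The loss sequence is monotonically non-increasing by construction; the open question the talk addresses is what happens when these exact batch solves are replaced by incremental stochastic-gradient updates.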
Professor Michael Trosset, Department of Statistics, Indiana University
Professor Carey Priebe, Department of Applied Mathematics & Statistics, Johns Hopkins University
Learning Statistical Manifolds for Subsequent Inference: A Duet
We describe 1-sample tests that exploit the Riemannian structure of parametric statistical manifolds. This structure is induced by Fisher information, or, equivalently, by Hellinger distance. Thus, the information distance between two distributions is the geodesic distance between them, and our test statistic is the information distance between the null distribution and the distribution of minimal Hellinger distance from the empirical distribution. We describe some asymptotic properties of this test, then consider problems for which the parametric statistical manifold is unknown. If the manifold can be sampled, then it may be possible to learn about its Riemannian structure. We use a variant of Isomap to obtain regularized estimates of the information distance. Examples demonstrate that one can increase power by learning an unknown low-dimensional manifold instead of relying on a known manifold of higher dimension.
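The link between Hellinger distance and information distance can be made concrete in the unit-variance normal family, where the Fisher information is 1 and the information (geodesic) distance between N(θ0, 1) and N(θ1, 1) is |θ1 − θ0|. Chaining short Hellinger-based hops along the manifold, in the spirit of Isomap, recovers the geodesic distance, while a single chordal step underestimates it (a toy illustration of mine, not the talk's estimator):

```python
import numpy as np

def hellinger(a, b):
    """Hellinger distance between N(a,1) and N(b,1); the Bhattacharyya
    coefficient for equal-variance normals is exp(-(a-b)^2 / 8)."""
    return np.sqrt(1.0 - np.exp(-(a - b) ** 2 / 8.0))

theta0, theta1 = 0.0, 3.0   # true information distance = |theta1 - theta0| = 3

# single-step (chordal) approximation: 2*sqrt(2)*H matches the Fisher
# metric only locally, so it undershoots over a long distance
direct = 2 * np.sqrt(2) * hellinger(theta0, theta1)

# Isomap-style approximation: sum many short hops along the manifold
grid = np.linspace(theta0, theta1, 201)
geodesic = sum(2 * np.sqrt(2) * hellinger(a, b)
               for a, b in zip(grid[:-1], grid[1:]))
```

The summed short hops recover the information distance 3 almost exactly, while the single chordal step falls well short, which is the basic reason a graph-based (Isomap-like) construction can estimate information distances on a sampled manifold.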