Department of Statistics Colloquium Series

Tuesday, January 16, 2018

11:00 AM12:00 PM

Sycamore Hall, Room 0008

Speaker:   Jason Klusowski, Department of Statistics and Data Science, Yale University

Title:  Counting motifs and connected components of large graphs via subgraph sampling

Abstract:  Learning properties of large graphs from samples is an important problem in statistical network analysis. We revisit the problem of estimating the number of connected components in a graph of $N$ vertices based on the subgraph sampling model, where we observe the subgraph induced by $n$ vertices drawn uniformly at random. The key question is whether it is possible to achieve accurate estimation, i.e., vanishing normalized mean-square error, by sampling a vanishing fraction of the vertices. We show that it is possible by accessing only sublinear number of samples if the graph does not contains high-degree vertices or long induced cycles; otherwise it is impossible. We obtain optimal sample complexity bounds for several classes of graphs including forests, cliques, and more generally, chordal graphs, achieved by linear-time estimators. The methodology relies on topological identities of graph homomorphism numbers. They, in turn, also play a key role in proving minimax lower bounds based on constructing random instances of graphs with matching structures of small subgraphs. We will also discuss results for the neighborhood sampling model, where we additionally observe the edges between the sampled vertices and their neighbors. In this setting, we will show how to construct optimal estimators of motif counts that are adaptive to certain unknown graph parameters