Consistent Cross Validation for Community Detection

Thesis or Dissertation


The stochastic block model (Holland et al., 1983) is one of the most popular models for analyzing networks with complicated community structures. While most research in this area focuses on finding robust and efficient algorithms to detect those communities, we are interested in determining how many communities a network contains, using cross validation. In this dissertation, we introduce two new cross-validation methods for community detection that use different network splitting strategies, and we study their consistency. We prove that, under some conditions on the network and on the clustering algorithm, and with a proper choice of the splitting ratio, our cross-validation methods consistently choose the correct number of communities in probability. Several prevailing clustering algorithms for network analysis are known to meet these conditions, so our consistency result applies to those algorithms. In addition to pursuing theoretical properties, we use simulations to show that the two new methods achieve a good success rate when the network contains a small to moderate number of communities. We find that the success rate depends on two other factors: how sparse the network is and how imbalanced the community sizes are. Regardless of these factors, our new methods outperform the existing network cross-validation method (Chen and Lei, 2018) in simulations under the stochastic block model. Furthermore, we apply our new methods to two real-life networks: the international trade network and the U.S. Congress network. We obtain interesting results that can be easily interpreted from a practical standpoint.
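As a rough illustration of the model the abstract refers to, the following minimal sketch samples an undirected network from a stochastic block model, where each pair of nodes is connected independently with a probability that depends only on the two nodes' communities. It is not taken from the dissertation: the function name `sample_sbm` and its parameters are hypothetical, and it does not attempt the two splitting strategies or the cross-validation procedure, which the abstract does not describe in detail.

```python
import random


def sample_sbm(sizes, probs, seed=0):
    """Sample an undirected stochastic block model network (illustrative only).

    sizes: number of nodes in each community, e.g. [3, 2]
    probs: square matrix; probs[a][b] is the edge probability between
           a node in community a and a node in community b
    Returns (adj, labels): a symmetric 0/1 adjacency matrix without
    self-loops, and the community label of every node.
    """
    rng = random.Random(seed)
    # Node i belongs to community labels[i]; nodes are ordered by community.
    labels = [k for k, size in enumerate(sizes) for _ in range(size)]
    n = len(labels)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Each pair is connected independently given the labels.
            if rng.random() < probs[labels[i]][labels[j]]:
                adj[i][j] = adj[j][i] = 1
    return adj, labels


# Two communities, dense within and sparse between (hypothetical values).
adj, labels = sample_sbm([30, 20], [[0.5, 0.05], [0.05, 0.4]], seed=42)
```

In a model-selection setting such as the one the abstract describes, the number of communities (here, the length of `sizes`) is the unknown quantity to be chosen from the observed adjacency matrix alone.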



University of Minnesota Ph.D. dissertation. December 2019. Major: Statistics. Advisor: Yuhong Yang. 1 computer file (PDF); viii, 112 pages.

Suggested citation

Zhang, Lin. (2019). Consistent Cross Validation for Community Detection. Retrieved from the University Digital Conservancy,
