Consistent Cross Validation for Community Detection

2019-12

Title

Consistent Cross Validation for Community Detection

Published Date

2019-12

Type

Thesis or Dissertation

Abstract

The stochastic block model (Holland et al., 1983) is one of the most popular models for analyzing networks with complicated community structures. While most research in this area focuses on finding robust and efficient algorithms to detect those communities, we are interested in determining how many communities a network contains, using cross validation. In this dissertation, we introduce two new cross validation methods for community detection that use different network splitting strategies, and we study their consistency. We prove that, under some conditions on the network and on the clustering algorithm, and with a proper choice of the splitting ratio, our cross validation methods consistently select the correct number of communities in probability. Several prevailing clustering algorithms for network analysis are known to meet these conditions, so our consistency result applies to those algorithms. In addition to establishing this theoretical property, we use simulations to show that the two new methods achieve a good success rate when the network contains a small to moderate number of communities. We find that the success rate depends on two other factors: how sparse the network is and how imbalanced the community sizes are. Regardless of these factors, our new methods outperform the existing network cross validation method (Chen and Lei, 2018) in simulations under the stochastic block model. Furthermore, we apply our new methods to two real-life networks: the international trade network and the U.S. Congress network. The results are interesting and easy to interpret from a practical standpoint.
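For readers unfamiliar with the model named in the abstract, the following is a minimal sketch of sampling a network from a stochastic block model. It illustrates only the generic model, not the dissertation's cross validation methods or splitting strategies, which this record does not describe. The function name `sample_sbm`, its parameters, and the example block sizes and edge probabilities are all hypothetical choices made for illustration.

```python
import numpy as np


def sample_sbm(sizes, probs, seed=0):
    """Sample an undirected network from a stochastic block model.

    sizes: number of nodes in each community (illustrative input).
    probs: K x K symmetric matrix; probs[a][b] is the probability of an
           edge between a node in community a and a node in community b.
    Returns the symmetric 0/1 adjacency matrix and the community labels.
    """
    rng = np.random.default_rng(seed)
    probs = np.asarray(probs, dtype=float)

    # Community label of every node, e.g. [0, 0, ..., 1, 1, ...].
    labels = np.repeat(np.arange(len(sizes)), sizes)
    n = labels.size

    # n x n matrix of edge probabilities implied by the labels.
    edge_prob = probs[labels][:, labels]

    # Draw each pair once (strict upper triangle), then mirror it so the
    # network is undirected and has no self-loops.
    upper = np.triu(rng.random((n, n)) < edge_prob, k=1)
    adj = (upper | upper.T).astype(int)
    return adj, labels


# Hypothetical example: two communities of unequal size, dense within
# communities and sparse between them.
adj, labels = sample_sbm([30, 20], [[0.5, 0.05], [0.05, 0.4]])
```

In this setting, the number of communities (here 2) is the quantity a model selection procedure such as cross validation would try to recover from `adj` alone.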

Description

University of Minnesota Ph.D. dissertation. December 2019. Major: Statistics. Advisor: Yuhong Yang. 1 computer file (PDF); viii, 112 pages.

Suggested citation

Zhang, Lin. (2019). Consistent Cross Validation for Community Detection. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/211826.
