Statistics for Effective Document Clustering for Large Heterogeneous Law Firm Collections