A new class of parameter estimation methods is proposed for data mining, analysis, and modeling of massive datasets. With the expansion of information technology, many scientists now face the problem of analyzing and modeling extremely large databases, sometimes referred to as data mining or knowledge discovery in databases. Many attempts to solve this problem have been based on classical approaches such as regression, classification, and multivariate techniques, and even summary statistics such as the mean and standard deviation remain difficult to estimate with extremely large datasets. Because classical statistical approaches were developed historically to cater to the limited availability of data, they were not intended to solve problems involving massive datasets. In this study, certain properties of sub-totaling and repeated estimation of population parameters were used to establish a new statistical method for estimating summary characteristics of populations, and relationships between variables, with extremely large datasets. While the method has straightforward applications in data mining and the analysis of large databases, it also points to the need for further statistical research.
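The abstract does not spell out the method itself, so the following is only an illustrative sketch of the general sub-totaling idea it mentions, not the PREM procedure. It assumes the data arrive in chunks that fit in memory, summarizes each chunk by its count, mean, and sum of squared deviations, and combines those sub-totals using the standard pairwise update for merging sample variances. The function names (`chunk_summary`, `merge_summaries`, `chunked_mean_std`) are hypothetical and chosen for this example.

```python
import math
from typing import Iterable, Sequence, Tuple

# (count, mean, sum of squared deviations from the mean)
Summary = Tuple[int, float, float]


def chunk_summary(chunk: Sequence[float]) -> Summary:
    """Summarize one non-empty chunk by its count, mean, and squared deviations."""
    n = len(chunk)
    mean = sum(chunk) / n
    m2 = sum((x - mean) ** 2 for x in chunk)
    return n, mean, m2


def merge_summaries(a: Summary, b: Summary) -> Summary:
    """Combine two chunk summaries into the summary of their union."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta ** 2 * n_a * n_b / n
    return n, mean, m2


def chunked_mean_std(chunks: Iterable[Sequence[float]]) -> Tuple[float, float]:
    """Return the mean and sample standard deviation over all chunks."""
    total = None
    for chunk in chunks:
        if len(chunk) == 0:
            continue  # skip empty chunks rather than dividing by zero
        summary = chunk_summary(chunk)
        total = summary if total is None else merge_summaries(total, summary)
    if total is None:
        raise ValueError("no data supplied")
    n, mean, m2 = total
    std = math.sqrt(m2 / (n - 1)) if n > 1 else 0.0
    return mean, std


if __name__ == "__main__":
    # Three chunks standing in for pieces of a dataset too large to load at once.
    chunks = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0], [8.0, 9.0, 10.0]]
    print(chunked_mean_std(chunks))
```

Because each chunk is reduced to three numbers, only one chunk needs to be held in memory at a time, and the combined result agrees with a single pass over the full data up to floating-point rounding.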
Institute for Mathematics and Its Applications>IMA Preprints Series
On Parameters Repeated Estimation Methods (PREM's Method) and its applications in data mining.
Retrieved from the University of Minnesota Digital Conservancy,