Browsing by Subject "random forest"
Item: Aquifer and Stratigraphy Code Prediction Using a Random Forest Classifier: An Exploration of Minnesota’s County Well Index (2021-05) Thielsen, Chris
We live in an era of big data, brought on by the advent of automatic large-scale data acquisition in many industries. Machine learning can be used to take advantage of large data sets, predicting otherwise unknown information from them. The Minnesota County Well Index (CWI) database contains information about wells and borings in Minnesota. While a plethora of information is recorded in CWI, some objective codes are missing. A random forest classifier is used to predict aquifer and stratigraphy codes in CWI based on the data provided in drillers’ logs; i.e., before the strata are interpreted by a geologist. We find that by learning from the information written down by the well driller, stratigraphic codes can be predicted with an accuracy of 92.15%. There are 2,600,000 strata recorded in CWI; these codes are not only useful in understanding the geologic history of Minnesota, but also directly inform groundwater models.
Item: Optimal Treatment Regimes Estimation with Censored Data and Related Topics (2021-06) Sengupta, Sanhita
The thesis is divided into three sections of interconnected topics. Motivated by applications from precision medicine, we consider the problem of estimating an optimal treatment regime (or individual optimal decision rule) based on right-censored survival data. We consider a non-parametric approach that maximizes the expected mean restricted survival time of the potential outcome distribution. Compared with existing methods, our approach does not need to assume that the decision rule belongs to a restricted class (e.g., the class of index rules) and can accommodate high-dimensional covariates. We investigate the theory of the estimated optimal treatment regime. Monte Carlo studies and a real data example are used to demonstrate the performance of our proposed method.
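As a rough illustration of the quantity being maximized, a restricted mean survival time can be computed from right-censored data as the area under a Kaplan-Meier curve up to a cutoff. The sketch below is not the thesis's estimator: the function names, the cutoff argument `tau`, and the choice of Kaplan-Meier as the survival estimate are all assumptions made for this example, and it handles a single group with no covariates or decision rule.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) pairs at each distinct event time.

    events[i] is 1 if the outcome was observed at times[i], 0 if right-censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects that share the same observed time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        # Both events and censored subjects leave the risk set after time t.
        n_at_risk -= removed
    return curve


def restricted_mean_survival_time(
    times: Sequence[float], events: Sequence[int], tau: float
) -> float:
    """Area under the Kaplan-Meier step function on [0, tau] (tau is a hypothetical cutoff)."""
    area = 0.0
    prev_t, prev_s = 0.0, 1.0
    for t, s in kaplan_meier(times, events):
        if t >= tau:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    # Last step runs from the final event time before tau up to tau itself.
    area += prev_s * (tau - prev_t)
    return area
```

Without censoring, the result reduces to the sample mean of min(T, tau), which gives a simple sanity check on the implementation.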
Random forests are widely used today for purposes such as regression, classification, and survival analysis; however, their theoretical properties are not yet completely explored. We propose a quantile random forest estimator that uses sub-sampling instead of the complete bootstrap samples used in Meinshausen [2006]. We study the pointwise asymptotics of the proposed quantile random forest estimator by rendering it in the framework of U-statistics. We prove pointwise weak convergence to normality and also propose a consistent estimator of the variance. We further explore the asymptotic behavior of the proposed estimator via a simulation study.
Measuring the efficacy of a treatment or policy can involve data heterogeneity. In such cases, the entire conditional distributional impact of the treatment is important, rather than just a single metric such as the average treatment effect. Quantiles inform more about the distribution than an average, and multiple quantiles can be used together to get an idea of the entire distribution. In the context of survival analysis with censored data, we propose a quantile regression model estimated using a survival random forest. We further extend this to estimate quantile treatment effects under censoring. We show the efficacy of the proposed method via simulations. We also demonstrate how to use this method and interpret quantile effects by analysing a colon cancer dataset.
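The sub-sampling idea in the second section can be illustrated in a much simpler setting. The sketch below is an assumption-laden toy, not the proposed estimator: it involves no trees and no covariates, and the function name and parameters are hypothetical. It draws subsamples without replacement (rather than bootstrap samples with replacement), computes an empirical quantile on each, and averages the results, which has the form of an incomplete U-statistic.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple


def subsampled_quantile(
    sample: Sequence[float],
    q: float,
    subsample_size: int,
    n_subsamples: int,
    seed: int = 0,
) -> Tuple[float, List[float]]:
    """Average the q-th empirical quantile over random subsamples.

    Each subsample is drawn WITHOUT replacement, unlike a bootstrap sample.
    Returns the averaged estimate and the per-subsample quantiles.
    """
    if not 0.0 < q <= 1.0:
        raise ValueError("q must lie in (0, 1]")
    if not 0 < subsample_size <= len(sample):
        raise ValueError("subsample_size must be between 1 and len(sample)")
    rng = random.Random(seed)
    pool = list(sample)
    # Index of the order statistic used as the empirical q-th quantile.
    idx = math.ceil(q * subsample_size) - 1
    estimates = []
    for _ in range(n_subsamples):
        sub = sorted(rng.sample(pool, subsample_size))
        estimates.append(sub[idx])
    return statistics.fmean(estimates), estimates
```

When `subsample_size` equals the sample size, every subsample is the full sample and the estimate collapses to the ordinary empirical quantile; smaller subsample sizes trade some bias for the averaging structure that makes U-statistic arguments applicable.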