-------------------
GENERAL INFORMATION
-------------------

1. Title of Dataset
Data for: Tree-planting programs in Himachal Pradesh India 2019

2. Author Information

Principal Investigator Contact Information
Name: Pushpendra Rana
Institution: Indian Forest Service
Address: Himachal Pradesh Forest Department
Email: pranaifs27@gmail.com
ORCID: 0000-0001-8626-3351

Associate or Co-investigator Contact Information
Name: Forrest Fleischman
Institution: University of Minnesota
Address: Department of Forest Resources
Email: ffleisch@umn.edu
ORCID: 0000-0001-6060-4031

Associate or Co-investigator Contact Information
Name: Vijay Ramprasad
Institution: University of Minnesota
Address: Department of Forest Resources
Email: vrampras@umn.edu
ORCID: 0000-0003-2636-0090

Associate or Co-investigator Contact Information
Name: Kangjae Lee
Institution: University of Seoul
Address: 163, Seoulsiripdae-ro, Dongdaemun-gu
Email: kasbiss@gmail.com
ORCID: 0000-0002-2857-6496

3. Date of data collection (single date, range, approximate date)
June 2019 to Oct 2019

4. Geographic location of data collection (where was data collected?):
Plantation and forest polygon data were requested from, and publicly released by, the Himachal Pradesh Forest Department. All other data are open-access public data.

5. Information about funding sources that supported the collection of the data:
The participation of PR (a portion of his time), FF and VR on this project was funded by a grant from the NASA LCLUC program (NNX17AK14G).

--------------------------
SHARING/ACCESS INFORMATION
--------------------------

1. Licenses/restrictions placed on the data: CC0 1.0 Universal (CC0 1.0): Public Domain Dedication

2. Links to publications that cite or use the data: Publication under review

3. Links to other publicly accessible locations of the data: NA

4. Links/relationships to ancillary data sets: NA

5. Was data derived from another source?
If yes, list source(s): Part of this data is derived from the Himachal Pradesh Forest Department's plantation database

6. Recommended citation for the data:
Rana, Pushpendra; Fleischman, Forrest; Ramprasad, Vijay; Lee, Kangjae. 2020. Data for: Tree-planting programs in Himachal Pradesh India 2019. Retrieved from the Data Repository for the University of Minnesota, https://doi.org/10.13020/8x0d-gb23.

---------------------
DATA & FILE OVERVIEW
---------------------

1. File List

A. Filename: forest_polygons_data_2019
Short description: Comprises 16,674 forest polygons and covers 33 forest divisions of Himachal Pradesh

B. Filename: Test_data_2147plantations_2019
Short description: Comprises 2,147 forest polygons where plantations took place from 1 January 2016 to 31 July 2019

C. Filename: plantation_prediction_RcodeSubmitted.R
Short description: R code for data analysis and processing

--------------------------
METHODOLOGICAL INFORMATION
--------------------------

1. Description of methods used for collection/generation of data:

Overview of the prediction procedure

We develop a predictive algorithm that forecasts three potentials: [1] probabilities of tree cover loss based on [2] fit of an area for plantation activity, both of which assist in estimating [3] wasteful expenditure. We build this algorithm using data on government tree plantations in the western Indian Himalayan state of Himachal Pradesh, which has experienced decades of afforestation programs within a heavily forested landscape. This region provides a wide range of plantation areas in varying biophysical contexts in which to test how well the algorithm predicts tree cover loss and fit. For example, elevations range from 350 to 6975 meters, and rainfall varies from a high of 1035 mm on the lower slopes and plains to a low of 395 mm in the high-altitude deserts in the rain shadow of the Great Himalaya range (ref. 1).
Himachal Pradesh has also spent an estimated US $248.24 million on afforestation since 2002, covering an area of 236,686 ha (ref. 2), making it an excellent location to study the effectiveness of afforestation expenditures (Supplementary Figs. 3 and 4). Specifically, we apply machine learning to 16,674 georeferenced forest polygons to predict probable tree cover loss for each polygon. We use tree cover loss as a proxy measure for evaluating plantation survival potential. We find tree cover loss a useful measure because it 1) can be used in more generalized contexts worldwide, 2) reflects the presence of enabling site conditions that support tree establishment, and 3) captures management practices and human use effectively on a large scale.

Data and Variables

We apply machine learning to 16,674 georeferenced forest polygons to predict probable tree mortality for each forest polygon, and we operationalize tree cover loss as the decline in tree canopy cover observed between 2003 and 2015 using Forest Survey of India data (ref. 3). The Himachal Pradesh Forest Department GIS Lab provided all 18,672 forest polygons belonging to 33 of the state's 43 forest divisions. We removed 1,998 polygons because of missing data, leaving 16,674. According to Himachal Pradesh Forest Department records, 2,809 plantations were planted between 2016 and 2019. Of these, 785 have missing data for afforestation spending. We could therefore use 2,024 plantations with complete budget information to compare predicted plantation mortality with afforestation spending for the purposes of this study. We note that budgetary documents of this kind are generally beyond the reach of researchers and are kept confidential. The expenditure data on the studied plantations became public only when a local Member of the Legislative Assembly (MLA) asked a question on the floor of the Himachal Pradesh Legislative Assembly.
Our study shows how the availability of similar data in other states and in other parts of the world can make analysis of tree-planting efforts more effective. We call for such data to be made available more widely, in India and across the world.

This dataset does not contain any plantation carried out in the cold and dry desert regions of Himachal Pradesh, although we know from other data that such plantations were carried out. For example, trees were planted in Loser beat (8 ha), Pagma beat (3 ha) and Kee beat (4.41 ha) in Spiti Valley, a cold desert mountain valley of Himachal Pradesh, in 2016 (ref. 4). No exact boundaries exist for these 2,024 tree plantations, which can introduce some error into the analysis. We do, however, have the spatial boundaries of the Forest Department forest polygons within which these plantations occur. We believe that predicting tree cover loss in the entire forest polygon reasonably predicts tree cover loss in the plantation area (that is, plantation mortality).

Tree cover loss estimates for the studied forest polygons are based on the fit of plantation activity to an area, which varies with forest dependence, soil and biophysical characteristics, canopy cover before planting, and management practices. In the model, we included data on population, forest dependents, farmers, literates, road density, grazing density and economic activity as indicators of higher forest dependence. These social indicators were calculated from the values of the census villages that fall within each forest polygon under study. Values for population, forest dependents, farmers and literates were summed, whereas values for road density, grazing density and economic activity were averaged across the villages within each forest polygon. Baseline data on forest cover, cropland, grassland and bare-land area within each forest polygon were also included.
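The village-to-polygon aggregation described above can be illustrated with a minimal Python sketch. It is not the authors' R code: the record layout, the field names (polygon, population, farmers, road_density) and the values are hypothetical, chosen only to show count-type indicators being summed and density-type indicators being averaged within each polygon.

```python
from statistics import mean

# Hypothetical village records; field names and values are illustrative only.
villages = [
    {"polygon": "P1", "population": 120, "farmers": 40, "road_density": 0.5},
    {"polygon": "P1", "population": 80, "farmers": 25, "road_density": 1.5},
    {"polygon": "P2", "population": 200, "farmers": 90, "road_density": 2.0},
]

SUMMED = ("population", "farmers")   # count-type indicators are summed
AVERAGED = ("road_density",)         # density-type indicators are averaged


def aggregate_by_polygon(rows):
    """Group village rows by polygon, then sum or average each indicator."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row["polygon"], []).append(row)

    result = {}
    for polygon_id, members in grouped.items():
        values = {key: sum(m[key] for m in members) for key in SUMMED}
        values.update({key: mean(m[key] for m in members) for key in AVERAGED})
        result[polygon_id] = values
    return result


polygon_values = aggregate_by_polygon(villages)
```

In this sketch a polygon with a single village simply inherits that village's values; how the original analysis handled polygons containing no census village is not stated here.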
Soil quality factors included in the model are soil depth, soil carbon, soil organic carbon, bulk density, cation exchange capacity, soil pH and available soil water capacity. In addition, we included information on altitude, slope, area, precipitation, temperature and forest fires in the predictive model. More details about the model predictors are provided in Supplementary Table 1.

We find that our predicted tree cover loss varies in expected ways with individual forest polygon characteristics. For example, our predicted tree cover loss probabilities vary linearly with the proportion of area under southern aspect in each plantation polygon (Supplementary Fig. 5). Areas on the southern aspect are directly exposed to the sun and therefore lack adequate moisture to support long-term tree growth. Moreover, the performance of our algorithm is comparable to other recent prediction algorithms that explain social-ecological phenomena such as poverty (refs. 5, 6).

Fitting the ensemble predictor

We use an ensemble of Extreme Gradient Boosting, Random Forest and Naïve Bayes to generate tree cover loss predictions for the studied forests (n = 16,674). In the model, we code tree cover loss as the positive class and tree cover gain as the negative class, and then randomly split the data into a "training" dataset (70%) and a "test" dataset (30%). We develop the predictive algorithm on the training dataset and then use the resulting algorithm to generate tree cover loss predictions for the test dataset. We use 10-fold cross-validation on the training dataset with three different models (Extreme Gradient Boosting, Random Forest and Naïve Bayes). To improve performance, we center and scale the variables, reduce dimensionality using principal component analysis (PCA), and exclude near-zero-variance and highly correlated predictors. We also optimize ROC (Receiver Operating Characteristic) for our three machine-learning models.
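As an illustration of the resampling scheme described above, the following minimal Python sketch splits 16,674 polygon indices 70/30 and builds 10 cross-validation folds on the training portion. It is a sketch under stated assumptions, not the authors' R code: the seed, the rounding of the split size, the unstratified split, and the helper names train_test_split and k_folds are all assumptions, and the preprocessing steps (centering, scaling, PCA, predictor filtering) are omitted.

```python
import random


def train_test_split(ids, train_fraction=0.7, seed=0):
    """Shuffle ids reproducibly and cut them into training and test parts."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # seed value is an assumption
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def k_folds(ids, k=10):
    """Yield (fit_ids, validation_ids) pairs for k-fold cross-validation."""
    folds = [ids[i::k] for i in range(k)]  # k near-equal, disjoint folds
    for i, held_out in enumerate(folds):
        fit = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield fit, held_out


polygon_ids = list(range(16674))            # one index per forest polygon
train_ids, test_ids = train_test_split(polygon_ids)
folds = list(k_folds(train_ids, k=10))
```

Each candidate model would be fitted on fit_ids and scored on validation_ids in every fold, and the test portion would be touched only once, for the final evaluation.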
Our chosen parameters for each model are:

(i) Extreme Gradient Boosting: We use 10-fold cross-validation and select the optimal model as the one with the largest ROC value (0.64). Sensitivity of the model is 0.78. The final values for the selected model are: nrounds = 100; max_depth = 2; eta = 0.3; colsample_bytree = 0.8; min_child_weight = 1; subsample = 0.75.

(ii) Random Forest: We use 10-fold cross-validation and select the optimal model as the one with the largest ROC value (0.64). Sensitivity is 0.75. The final value is mtry = 2.

(iii) Naïve Bayes: We use 10-fold cross-validation and select the optimal model as the one with the largest ROC value (0.62). Sensitivity is 0.73. The final values for the model are: laplace = 0; usekernel = TRUE; adjust = 1.

Then, we train a stacked ensemble model on these three models using a boosted decision-tree algorithm, with the objective of maximizing recall. Our model places more value on recall because missing a true positive (tree cover loss) may have serious ramifications for biodiversity and forest cover in the area. Our chosen stacked ensemble model resulted in higher values for balanced accuracy (relevant given the unbalanced nature of our test set), recall and specificity. The results for the chosen model are:

Stacked Ensemble Model: Predictive accuracy is 64% (95% confidence interval: 62 to 65%). Kappa = 0.24; Sensitivity = 0.74; Specificity = 0.50; Precision = 0.66; Recall = 0.74; F1 = 0.69.

Finally, we use our selected ensemble model to estimate predicted tree cover loss probabilities for a new set of 2,024 plantation polygons (planted between January 2016 and July 2019) and compare these predicted probabilities with afforestation spending and tree canopy densities. We also created an interpolated tree cover loss surface from the predicted tree cover loss probabilities of the 2,024 plantation polygons using Kriging.
We used Ordinary Kriging with the stable model in the Geostatistical Analyst tool in ArcMap (10.7.1) to generate the interpolated tree cover loss shown in Fig. 2(b). We chose the model on the basis of normality and anisotropy parameters. Our Kriging model semivariogram has 12 lags with a lag size of 651.27 meters and a standard neighborhood type (maximum neighbors = 4, minimum neighbors = 2). The prediction model has a root mean square error of 0.14 and a root mean square standardized error of 1.01.

More details about the model predictors are provided below, and this information is also included in the Supplementary Table, which is attached to the DRUM record.

Number of households: Total number of HHs in villages that are inside a forest polygon. Source: Census (2001), India, http://censusindia.gov.in/
Total population: Total population of the villages that fall inside a forest polygon. Source: Census (2001), India, http://censusindia.gov.in/
Number of cultivators (farmers): Total number of farmers in villages that fall inside a forest polygon. Source: Census (2001), India, http://censusindia.gov.in/
Scheduled caste population: Total SC population in villages that fall inside a forest polygon. Source: Census (2001), India, http://censusindia.gov.in/
Total number of literates: Total number of literates in villages that fall inside a forest polygon. Source: Census (2001), India, http://censusindia.gov.in/
Total marginal workers: Total number of marginal workers in villages that fall inside a forest polygon. Source: Census (2001), India, http://censusindia.gov.in/
Economic activity: 2003–2008, 0.56 km spatial resolution. Unit: values 1 to 63 (average for villages that fall inside a forest polygon). Source: Version 4 DMSP-OLS Nighttime Lights Time Series
Road density: Unit: km/km2 (average for villages that fall inside a forest polygon). Source: CIESIN (Data Center in NASA's Earth Observing System Data and Information System (EOSDIS)) (https://sedac.ciesin.columbia.edu/data/sets/browse)
Number of small land-holdings less than 0.5 ha: Number of smallholdings less than 0.5 ha in Census Tehsils where that forest polygon falls. Source: Agricultural census (2005), India
Grazing density: Number of grazing animals (buffaloes, goats, sheep, cattle)/area of the tehsil in ha. Unit: number/ha (average for villages that fall inside a forest polygon). Source: Livestock census (2007), India
Area of the forest polygon: Unit: ha. Source: Forest records, HP Forest Department, India
Area under crop acreage: 2000, 30 m resolution. Unit: ha. Source: J. Chen, J. Chen, A. Liao, X. Cao, L. Chen, X. Chen, C. He, G. Han, S. Peng, M. Lu, Global land cover mapping at 30 m resolution: A POK-based operational approach. ISPRS Journal of Photogrammetry and Remote Sensing. 103, 7–27 (2015).
Area under grass coverage: 2000, 30 m resolution. Unit: ha. Source: W. R. Wieder, J. Boehnert, G. B. Bonan, M. Langseth, Regridded harmonized world soil database v1.2. Data set. Available on-line [http://daac.ornl.gov] from Oak Ridge National Laboratory Distributed Active Archive Center, Oak Ridge, Tennessee, USA (2014).
Area under bare land acreage: 2000, 30 m resolution. Unit: ha. Source: Wieder et al. (2014)
Soil depth: 2000, reference soil depth, average. Unit: cm. Source: Wieder et al. (2014)
Available soil water capacity: 2000, available soil water storage capacity, average. Unit: coded values 1 to 7; 1 = 15 cm water per m of the soil unit, 2 = 12.5 cm, 3 = 10 cm, 4 = 7.5 cm, 5 = 5 cm, 6 = 1.5 cm, 7 = 0 cm. Source: Wieder et al. (2014)
Topsoil Carbon Content: Topsoil and subsoil carbon content (T_C and S_C) are based on the carbon content of the dominant soil type in each regridded cell rather than a weighted average. Unit: kg C m-2. Source: Wieder et al. (2014)
Subsoil Carbon Content: Unit: kg C m-2. Source: Wieder et al. (2014)
Topsoil Organic Carbon: Unit: % weight. Source: Wieder et al. (2014)
Subsoil Organic Carbon: Unit: % weight. Source: Wieder et al. (2014)
pH (Top Soil): Topsoil pH (in H2O). Unit: -log(H+). Source: Wieder et al. (2014)
Top Soil Bulk Density: Reference bulk density values are calculated from equations developed by Saxton et al. (1986) that relate to the texture of the soil only. Unit: kg dm-3. Source: Wieder et al. (2014)
Top Soil Cation Exchange Capacity: Cation exchange capacity of the clay fraction in the topsoil. Unit: cmol per kg. Source: Wieder et al. (2014)
Sub Soil Cation Exchange Capacity: Cation exchange capacity of the clay fraction in the subsoil. Unit: cmol per kg. Source: Wieder et al. (2014)
Location (altitude): 2000, 90 m resolution. Unit: m. Source: SRTM (Shuttle Radar Topography Mission), 90 m resolution, 2000; SRTM 90m Digital Elevation Database v4.1, CGIAR-CSI (2017) (available at https://cgiarcsi.community/data/srtm-90m-digital-elevation-database-v4-1/).
Slope: 2000, 90 m resolution. Unit: degree. Source: SRTM (Shuttle Radar Topography Mission), 90 m resolution, 2000; SRTM 90m Digital Elevation Database v4.1, CGIAR-CSI (2017) (available at https://cgiarcsi.community/data/srtm-90m-digital-elevation-database-v4-1/).
Baseline forest cover: 2003, 24 m resolution. Forest cover = Open forest + Moderately dense forest + Very dense forest. Source: Forest Survey of India, 2005, http://www.fsi.nic.in/publications
Number of forest fires: 2003–2008. Unit: number. Source: NASA, active fire data, MODIS C6 FIRMS (available at https://firms.modaps.eosdis.nasa.gov/map).
Temperature: 2001–2008, 30 km resolution, average. Unit: °C. Source: CRU (Climatic Research Unit) TS dataset, version 4.0, gridded dataset of monthly terrestrial surface climate, http://www.cru.uea.ac.uk/
Precipitation: 2001–2008, 30 km resolution, average. Unit: mm. Source: CRU (Climatic Research Unit) TS dataset, version 4.0, gridded dataset of monthly terrestrial surface climate, http://www.cru.uea.ac.uk/; I. Harris, P. D. Jones, T. J. Osborn, D. H. Lister, Updated high-resolution grids of monthly climatic observations – the CRU TS3.10 Dataset. International Journal of Climatology. 34, 623–642 (2014)
Land surface temperature: 2001–2008, 5.5 km spatial resolution, average. Unit: K. Source: MODIS/Aqua Land Surface Temperature/Emissivity Monthly L3 Global CMG V005, Global Change Master Directory (GCMD) (available at https://gcmd.gsfc.nasa.gov/).
Outcomes (O)
Tree cover loss/Mortality: 24 m resolution. FC_CHANGE15_03 = FC_2015HA - FC_2003HA; if FC_CHANGE15_03 < 0, MORTALITY = 1, otherwise = 0. Source: Forest Survey of India (2005); Forest Survey of India (2017)

2. Methods for processing the data:

3. Instrument- or software-specific information needed to interpret the data:
R statistical software, Excel

4. Standards and calibration information, if appropriate:

5. Environmental/experimental conditions:

6. Describe any quality-assurance procedures performed on the data:

7. People involved with sample collection, processing, analysis and/or submission:
All coauthors and no one else.

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: forest_polygons_data_2019
-----------------------------------------

1. Number of variables: 33

2. Number of cases/rows: 16,674

3. Missing data codes:
Code/symbol Definition
Code/symbol Definition

4. Variable List
Column: Variable Name, description, unit of measurement

A: COMPT, forest polygon ID, Unique ID number assigned to each forest polygon
B: CompAreaGISha, Area of forest polygon, ha
C: Number_of_Households, Number of households, Total number of HHs in villages that are inside a forest polygon
D: Total_Population, Total population, Total population of the villages that fall inside a forest polygon
E: Forest_Dependents, Number of marginal people (scheduled caste population), Total SC population in villages that fall inside a forest polygon
F: Literates, Number of literates, Total number of literates in villages that fall inside a forest polygon
G: Number_of_Farmers, Number of cultivators (farmers), Total number of farmers in villages that fall inside a forest polygon
H: Number_of_Unemployed_Persons, Total marginal workers, Total number of marginal workers in villages that fall inside a forest polygon
I: Grazing_animals_density, Number of grazing animals (buffaloes, goats, sheep, cattle)/area of the tehsil in ha, Number/ha (Average for villages that fall inside a forest polygon)
J: Number_of_Smallholdings, Number of small land-holdings less than 0.5 ha, Number of smallholdings less than 0.5 ha in Census Tehsils where that forest polygon falls
K: Altitude, Location (altitude), 2000, 90 m resolution, m
L: Slope, Slope, 2000, 90 m resolution, degree
M: Av_temp03_08, Temperature, 2001–2008, 30 km resolution, average, °C
N: Av_preci03_08, Precipitation, 2001–2008, 30 km resolution, average, mm
O: Av_lst03_08, Land surface temperature, 2001–2008, 5.5 km spatial resolution, average, K
P: Av_nit03_08, Economic activity, 2003–2008, 0.56 km spatial resolution, 1 to 63 (values) (Average for villages that fall inside a forest polygon)
Q: AvailableSWC, Available soil water capacity, 2000, average, Coded values 1 to 7; 1 = 15 cm water per m of the soil unit, 2 = 12.5 cm, 3 = 10 cm, 4 = 7.5 cm, 5 = 5 cm, 6 = 1.5 cm, 7 = 0 cm
R: Soil_depth, Soil depth, 2000, reference soil depth, average, cm
S: TopSoil_Carbon, Topsoil Carbon Content, based on the carbon content of the dominant soil type in each regridded cell rather than a weighted average, kg C m-2
T: SubSoil_Carbon, Subsoil Carbon Content, based on the carbon content of the dominant soil type in each regridded cell rather than a weighted average, kg C m-2
U: TopSoil_OC, Topsoil Organic Carbon, % weight
V: SubSoil_OC, Subsoil Organic Carbon, % weight
W: TopSoil_PH, pH (Top Soil), Topsoil pH (in H2O), -log(H+)
X: TopSoil_BulkDen, Top Soil Bulk Density, Reference bulk density values are calculated from equations developed by Saxton et al. (1986) that relate to the texture of the soil only, kg dm-3
Y: TopSoil_CEC, Top Soil Cation Exchange Capacity, Cation exchange capacity of the clay fraction in the topsoil, cmol per kg
Z: SubSoil_CEC, Sub Soil Cation Exchange Capacity, Cation exchange capacity of the clay fraction in the subsoil, cmol per kg
AA: number_of_fires, Number of forest fires, 2003–2008, Number
AB: Road_Density, Road density, km/km2 (Average for villages that fall inside a forest polygon)
AC: GL2000_Crop_area, Area under crop acreage, 2000, 30 m resolution, ha
AD: GL2000_Grass_area, Area under grass coverage, 2000, 30 m resolution, ha
AE: GL2000_Bareland_area, Area under bare land acreage, 2000, 30 m resolution, ha
AF: FC_2003HA, Baseline forest cover, 2003, 24 m resolution, Forest cover = Open forest + Moderately dense forest + Very dense forest
AG: MORTALITY, Tree cover loss/Mortality, 24 m resolution, If FC_CHANGE15_03 < 0, MORTALITY = 1, otherwise = 0; FC_CHANGE15_03 = FC_2015HA - FC_2003HA

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: Test_data_2147plantations_2019
-----------------------------------------

1. Number of variables: 32

2. Number of cases/rows: 2147

3. Missing data codes:
Code/symbol Definition
Code/symbol Definition

4. Variable List
Column: Variable Name, description, unit of measurement

A: PlantationID, Unique ID number assigned to each forest plantation (n=2024)
B: CompAreaGISha, Area of forest polygon, ha
C: Number_of_Households, Number of households, Total number of HHs in villages that are inside a forest polygon
D: Total_Population, Total population, Total population of the villages that fall inside a forest polygon
E: Forest_Dependents, Number of marginal people (scheduled caste population), Total SC population in villages that fall inside a forest polygon
F: Literates, Number of literates, Total number of literates in villages that fall inside a forest polygon
G: Number_of_Farmers, Number of cultivators (farmers), Total number of farmers in villages that fall inside a forest polygon
H: Number_of_Unemployed_Persons, Total marginal workers, Total number of marginal workers in villages that fall inside a forest polygon
I: Grazing_animals_density, Number of grazing animals (buffaloes, goats, sheep, cattle)/area of the tehsil in ha, Number/ha (Average for villages that fall inside a forest polygon)
J: Number_of_Smallholdings, Number of small land-holdings less than 0.5 ha, Number of smallholdings less than 0.5 ha in Census Tehsils where that forest polygon falls
K: Altitude, Location (altitude), 2000, 90 m resolution, m
L: Slope, Slope, 2000, 90 m resolution, degree
M: Av_temp03_08, Temperature, 2001–2008, 30 km resolution, average, °C
N: Av_preci03_08, Precipitation, 2001–2008, 30 km resolution, average, mm
O: Av_lst03_08, Land surface temperature, 2001–2008, 5.5 km spatial resolution, average, K
P: Av_nit03_08, Economic activity, 2003–2008, 0.56 km spatial resolution, 1 to 63 (values) (Average for villages that fall inside a forest polygon)
Q: AvailableSWC, Available soil water capacity, 2000, average, Coded values 1 to 7; 1 = 15 cm water per m of the soil unit, 2 = 12.5 cm, 3 = 10 cm, 4 = 7.5 cm, 5 = 5 cm, 6 = 1.5 cm, 7 = 0 cm
R: Soil_depth, Soil depth, 2000, reference soil depth, average, cm
S: TopSoil_Carbon, Topsoil Carbon Content, based on the carbon content of the dominant soil type in each regridded cell rather than a weighted average, kg C m-2
T: SubSoil_Carbon, Subsoil Carbon Content, based on the carbon content of the dominant soil type in each regridded cell rather than a weighted average, kg C m-2
U: TopSoil_OC, Topsoil Organic Carbon, % weight
V: SubSoil_OC, Subsoil Organic Carbon, % weight
W: TopSoil_PH, pH (Top Soil), Topsoil pH (in H2O), -log(H+)
X: TopSoil_BulkDen, Top Soil Bulk Density, Reference bulk density values are calculated from equations developed by Saxton et al. (1986) that relate to the texture of the soil only, kg dm-3
Y: TopSoil_CEC, Top Soil Cation Exchange Capacity, Cation exchange capacity of the clay fraction in the topsoil, cmol per kg
Z: SubSoil_CEC, Sub Soil Cation Exchange Capacity, Cation exchange capacity of the clay fraction in the subsoil, cmol per kg
AA: number_of_fires, Number of forest fires, 2003–2008, Number
AB: Road_Density, Road density, km/km2 (Average for villages that fall inside a forest polygon)
AC: GL2000_Crop_area, Area under crop acreage, 2000, 30 m resolution, ha
AD: GL2000_Grass_area, Area under grass coverage, 2000, 30 m resolution, ha
AE: GL2000_Bareland_area, Area under bare land acreage, 2000, 30 m resolution, ha
AF: FC_2003HA, Baseline forest cover, 2003, 24 m resolution, Forest cover = Open forest + Moderately dense forest + Very dense forest

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: plantation_prediction_Rcode
-----------------------------------------
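The R script listed in the file overview is the authors' implementation and is not reproduced here. As a separate, minimal illustration of how the MORTALITY column is defined in the variable list for forest_polygons_data_2019, the rule can be sketched in Python as follows. The function name mortality_label is hypothetical; the argument names mirror the FC_2003HA and FC_2015HA quantities used in the definition.

```python
def mortality_label(fc_2003_ha, fc_2015_ha):
    """Return 1 if forest cover declined between 2003 and 2015, else 0.

    Mirrors the stated rule: FC_CHANGE15_03 = FC_2015HA - FC_2003HA,
    and MORTALITY = 1 when FC_CHANGE15_03 < 0, otherwise 0.
    """
    fc_change_15_03 = fc_2015_ha - fc_2003_ha
    return 1 if fc_change_15_03 < 0 else 0


# Illustrative values only (hectares of forest cover in one polygon).
loss = mortality_label(10.0, 8.0)       # cover fell, so this is a loss
unchanged = mortality_label(10.0, 10.0) # no change is not counted as loss
gain = mortality_label(5.0, 7.0)        # cover rose
```

Note that under this rule a polygon with unchanged forest cover falls in the same class (0) as a polygon that gained cover.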