//  Opening Commands

* close open log file
capture log close

* allows Stata to run without stops
set more off

 
//  do-file:     GovtRecords-cleaning.do
//  author:      Forrest Fleischman
//  date:        February 8, 2018
 
//  #0
//  program setup

* create logfile/begin logging output 
log using "GovtRecords-Cleaning.log", text replace
version 13
clear 
set linesize 80
macro drop _all
set scheme s2mono

*change to the working directory (may need to be modified on different machines)
cd "M:\Team Drives\NASA-Kangra\Papers\PlantationHistories\Data"
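
* Optional sanity check (a sketch, not part of the original routine): confirm
* that the first input file is visible from the working directory set above,
* so that a wrong path fails here rather than at the first insheet.
* The error message text and the choice of return code 601 are illustrative.
capture confirm file "Govt-data-plantations-FF-2018-12-30-1979-state.csv"
if _rc {
    display as error "input CSV not found; check the cd path above"
    exit 601
}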

//  Middle Commands
//  #1
//  open data & save with new name
* We were provided with an Excel file with data in several separate sheets.
* Here I combine these sheets into single data sets using the append command.
* I first saved each sheet in the Excel file as a separate CSV.
* Here I create 3 separate data sets:
* the first dataset consists of statewide data (statewide.csv)
* the second dataset consists of data for Dharamsala circle (d-circle.csv)
* the third dataset consists of range-level data (but note that this dataset
* was created from state-level records, i.e. this is not data collected from
* the range office itself, but from the statewide records)
 
*step 1 - import each period's csv file, rename a few variables that were named 
*inconsistently, drop missing observation rows, and save as a data file
insheet using Govt-data-plantations-FF-2018-12-30-1979-state.csv
rename sehid schid
rename sehname schname

drop if schid == .
save Govt-data-plantations-FF-2018-12-30-1979-state.dta, replace
clear

insheet using Govt-data-plantations-FF-2018-12-30-1980-89-state.csv
rename sehid schid
drop if schid == .
save Govt-data-plantations-FF-2018-12-30-1980-89-state.dta, replace
clear

insheet using Govt-data-plantations-FF-2018-12-30-1990-99-state.csv
rename sehid schid
drop if schid == .
save Govt-data-plantations-FF-2018-12-30-1990-99-state.dta, replace
clear 

insheet using Govt-data-plantations-FF-2018-12-30-2000-09-state.csv
rename sehid schid
drop if schid == .
save Govt-data-plantations-FF-2018-12-30-2000-09-state.dta, replace
clear

*note that the Changer data sheet is all for one project, which was listed separately
*in govt. records, and uses different data labels.
*In discussion with Rajesh Rana, Vijay Guleria, and Vijay Ramprasad, we determined
*that data from the Changer project files were also duplicated in the other data
*files. I therefore starred out this routine, as it is not needed (i.e. adding the
*Changer data would duplicate data); however, I left the routine rather than
*deleting it in case future information indicates that this was a mistake.
*insheet using Govt-data-plantations-FF-2018-12-30-Changer.csv
*drop if schid == .
*rename othbl notidentified
*rename ttpltrs totaltreesplanted
*save Govt-data-plantations-FF-2018-12-30-Changer.dta, replace
*clear

insheet using Govt-data-plantations-FF-2018-12-30-2010-17-state.csv
rename sehid schid
drop if schid == .
save Govt-data-plantations-FF-2018-12-30-2010-17-state.dta, replace

*step 2 - use the append command to combine all the .dta files
append using Govt-data-plantations-FF-2018-12-30-1979-state.dta
append using Govt-data-plantations-FF-2018-12-30-1980-89-state.dta
append using Govt-data-plantations-FF-2018-12-30-1990-99-state.dta
append using Govt-data-plantations-FF-2018-12-30-2000-09-state.dta
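
* Optional check (a sketch, not part of the original routine): because the
* Changer records turned out to duplicate rows in the other files, it may be
* worth confirming that the appended periods contain no fully identical rows.
* With no varlist, duplicates report compares observations on all variables.
duplicates report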

*step 3 - save as an intermediate file
save plantation-history.dta, replace

*step 4 - drop all circle level variables, drop all empty variables
drop cr*
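
* Optional check (a sketch, not part of the original routine): confirm that the
* placeholder columns v104-v123 really are empty before they are dropped.
* missing() works for both numeric and string variables; any column that holds
* data is reported in the log but does not stop the do-file.
foreach v of varlist v104 v105 v106 v107 v108 v109 v110 v111 v112 v113 ///
        v114 v115 v116 v117 v118 v119 v120 v121 v122 v123 {
    capture assert missing(`v')
    if _rc display as error "`v' contains non-missing values"
}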
drop v104 v105 v106 v107 v108 v109 v110 v111 v112 v113 v114 v115 v116 v117 v118 v119 v120 v121 v122 v123

*rename all variables (based on code book provided in original database)
* the goal here is to have easy to understand variable names
* note that the tree species variables all refer to the number of individuals
rename ttarea area
rename ttdeo deodar
rename ttkail kail
rename ttfrsp firspruce
rename ttchil chil
rename ttkhr khair
rename ttshsm shisham
rename ttbam bamboo
rename ttrbna robinia
rename ttwilw willow
rename ttpplr poplar
rename ttwln walnut
rename ttbnok banoak
rename ttdro daroo
rename ttrta ritha
rename ttjtr jatropha
rename ttamla amla
rename ttknr kachnar
rename ttdrk drake
rename ttlcna lucinia
rename ttjmn jamun
rename ttajn arjun
rename ttbhr behra
rename ttteak teak
rename ttduri duri
rename tthar harar
rename tttuni tuni
rename tteulp eucalyptus
rename tttosh tosh
rename ttbchy blackcherry
rename ttmlby mulberry
rename ttmple maple
rename ttsiris siris
rename ttoie oie
rename ttpaja paja
rename ttknor khanor
rename ttdhon dehoon
rename ttkhrk khirak
rename ttjkrd jakrinda
rename ttdlo dlo
rename ttbsmbl bansimbal
rename ttslok slock
rename ttbuel buel
rename ttbill bill
rename ttpnut picknut
rename ttasn alsan
rename ttoth notidentified
rename ttpltrs totaltreesplanted
rename ttgrstuft grasstufts
rename ttgrass totalgrass



*the Changer datasheet would be appended here, as its data labels are the
*same as the renamed variable names above
*this command is starred out, as we are no longer adding this data file (see discussion above)
*append using Govt-data-plantations-FF-2018-12-30-Changer.dta

*step 5 - save the state level data file as a .dta 
save plantation-history-HP.dta, replace

*step 6: merge with program information file
*clean and drop variables from scheme list
*dropped variables were Rajesh Rana & Vijay Guleria's coding of which years each scheme occurred.
*presumably we can go back and recalculate this once we have each scheme matched to actual planting that occurred
clear
insheet using Govt-data-plantations-FF-2018-12-30-schemelist.csv
rename code schid
sort schid
drop if schid ==.
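* variable names are limited to 32 characters, so the two long column headers
* are referenced by the same shortened names used in the rename commands in step 8
keep schid nameofschemes schemedescription confidencelevel statefunded centralfunded donorfunded participatory watershedproject_focusedonecosys catchmentareaplantation_related compensatory labor_nrega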
save schemelistshort.dta, replace

*merge the plantation history (many records per scheme) with the scheme list (one record per scheme)
use plantation-history-HP.dta
sort schid
merge m:1 schid using schemelistshort.dta
gen teststring = schname == nameofschemes
label var teststring "this variable is used for data cleaning to check that scheme names match across data files"
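
* Optional diagnostic (a sketch, not part of the original routine): inspect the
* merge before _merge is dropped. _merge == 1 marks plantation records with no
* entry in the scheme list, _merge == 2 marks schemes with no plantation
* records, and _merge == 3 marks matched records.
tabulate _merge
* count matched records whose scheme names differ between the two files
count if _merge == 3 & teststring == 0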
drop _merge

*label variables (species names are not given an additional label, they all 
*refer to the number of trees planted of that species)
label var schid "Scheme ID number"
label var schname "Scheme name"
label var area "area in hectares"
label var nameofschemes "name of scheme (should match schname)"
label var schemedescription "description of the scheme based on knowledge of Dr. Pushpendra Rana"
label var confidencelevel "Dr. Pushpendra Rana's confidence in his knowledge of this scheme"
label var statefunded "Scheme is funded by the state government according to Dr. Rana"
label var centralfunded "scheme is funded by the central govt. according to Dr. Rana"
label var donorfunded "scheme is funded by a foreign donor according to Dr. Rana"
label var participatory "scheme has a substantial participatory component according to Dr. Rana"
rename watershedproject_focusedonecosys watershed
label var watershed "scheme is a watershed project focused on ecosystem restoration"
rename catchmentareaplantation_related catchment
label var catchment "scheme is a catchment area plantation related to hydro development"
label var compensatory "scheme is part of a compensatory afforestation program"
label var labor_nrega "scheme is part of an NREGA program"


*step 7 - save the merged state-level file as .dta and .csv
save plantationhistory-HP-merge.dta, replace
outsheet using plantation-history-hp.csv , comma replace

*step 8 - in this sequence, we recall plantation-history.dta and create a data
* file with only the data from Dharamshala circle, following the same sequence
* in steps 4-6 above
clear
use plantation-history.dta
drop tt*
drop v104 v105 v106 v107 v108 v109 v110 v111 v112 v113 v114 v115 v116 v117 v118 v119 v120 v121 v122 v123
rename crarea area
rename crdeo deodar
rename crkail kail
rename crfrsp firspruce
rename crchil chil
rename crkhr khair
rename crshsm shisham
rename crbam bamboo
rename crrbna robinia
rename crwilw willow
rename crpplr poplar
rename crwln walnut
rename crbnok banoak
rename crdro daroo
rename crrta ritha
rename crjtr jatropha
rename cramla amla
rename crknr kachnar
rename crdrk drake
rename crlcna lucinia
rename crjmn jamun
rename crajn arjun
rename crbhr behra
rename crteak teak
rename crduri duri
rename crhar harar
rename crtuni tuni
rename creulp eucalyptus
rename crtosh tosh
rename crbchy blackcherry
rename crmlby mulberry
rename crmple maple
rename crsiris siris
rename croie oie
rename crpaja paja
rename crknor khanor
rename crdhon dehoon
rename crkhrk khirak
rename crjkrd jakrinda
rename crdlo dlo
rename crbsmbl bansimbal
rename crslok slock
rename crbuel buel
rename crbill bill
rename crpnut picknut
rename crasn alsan
rename croth notidentified
rename crttpltrs totaltreesplanted
rename crgrstuft grasstufts
rename crgrass totalgrass

sort schid
merge m:1 schid using schemelistshort.dta
gen teststring = schname == nameofschemes
drop _merge

*label variables (species names are not given an additional label, they all 
*refer to the number of trees planted of that species)
label var schid "Scheme ID number"
label var schname "Scheme name"
label var area "area in hectares"
label var nameofschemes "name of scheme (should match schname)"
label var schemedescription "description of the scheme based on knowledge of Dr. Pushpendra Rana"
label var confidencelevel "Dr. Pushpendra Rana's confidence in his knowledge of this scheme"
label var statefunded "Scheme is funded by the state government according to Dr. Rana"
label var centralfunded "scheme is funded by the central govt. according to Dr. Rana"
label var donorfunded "scheme is funded by a foreign donor according to Dr. Rana"
label var participatory "scheme has a substantial participatory component according to Dr. Rana"
rename watershedproject_focusedonecosys watershed
label var watershed "scheme is a watershed project focused on ecosystem restoration"
rename catchmentareaplantation_related catchment
label var catchment "scheme is a catchment area plantation related to hydro development"
label var compensatory "scheme is part of a compensatory afforestation program"
label var labor_nrega "scheme is part of an NREGA program"

save plantation-history-dhm.dta, replace
outsheet using plantation-history-dhm.csv, comma replace

//  Closing Commands
//  save data & close log

save, replace
log close
exit