// Opening Commands
* close open log file
capture log close
* allows Stata to run without stops
set more off

// do-file: GovtRecords-cleaning.do
// author: Forrest Fleischman
// date: February 8, 2018

// #0
// program setup
* create logfile/begin logging output
log using "GovtRecords-Cleaning.log", text replace
version 13
clear
set linesize 80
macro drop _all
set scheme s2mono
*change to the working directory (may need to be modified on different machines)
cd "M:\Team Drives\NASA-Kangra\Papers\PlantationHistories\Data"

// Middle Commands
// #1
// open data & save with new name
* We were provided with an excel file with data provided in several separate sheets
* here I combine these sheets into single data sets using the append command
* I first saved each sheet in the excel file as a separate CSV
* Here I create 3 separate data sets:
* the first dataset consists of statewide data (statewide.csv)
* the second dataset consists of data for Dharamsala circle (d-circle.csv)
* the third dataset consists of range level data (but
* note that this dataset was created from state-level records - i.e. this is
* not data collected from the range office itself, but from the statewide records)

*step 1 - import each period's csv file, rename a few variables that were named
*inconsistently, drop missing observation rows, and save as a data file
insheet using Govt-data-plantations-FF-2018-12-30-1979-state.csv
rename sehid schid
rename sehname schname
drop if schid == .
save Govt-data-plantations-FF-2018-12-30-1979-state.dta, replace
clear
insheet using Govt-data-plantations-FF-2018-12-30-1980-89-state.csv
rename sehid schid
drop if schid == .
save Govt-data-plantations-FF-2018-12-30-1980-89-state.dta, replace
clear
insheet using Govt-data-plantations-FF-2018-12-30-1990-99-state.csv
rename sehid schid
drop if schid == .
save Govt-data-plantations-FF-2018-12-30-1990-99-state.dta, replace
clear
insheet using Govt-data-plantations-FF-2018-12-30-2000-09-state.csv
rename sehid schid
drop if schid == .
save Govt-data-plantations-FF-2018-12-30-2000-09-state.dta, replace
clear

*note that changer data sheet is all for one project, which was listed separately
*in govt. records, and uses different data labels
*In discussion with Rajesh Rana, Vijay Guleria, and Vijay Ramprasad, we determined
*that data from the Changer project files were also duplicated in the other data
*files, thus I starred out this routine, as it is not needed (i.e. adding the
*Changer data would duplicate data), however I left the routine rather than
*deleting it in case future information indicates that this was a mistake.
*insheet using Govt-data-plantations-FF-2018-12-30-Changer.csv
*drop if schid == .
*rename othbl notidentified
*rename ttpltrs totaltreesplanted
*save Govt-data-plantations-FF-2018-12-30-Changer.dta, replace
*clear

insheet using Govt-data-plantations-FF-2018-12-30-2010-17-state.csv
rename sehid schid
drop if schid == .
save Govt-data-plantations-FF-2018-12-30-2010-17-state.dta, replace

*step 2 - use the append command to combine all the .dta files
append using Govt-data-plantations-FF-2018-12-30-1979-state.dta
append using Govt-data-plantations-FF-2018-12-30-1980-89-state.dta
append using Govt-data-plantations-FF-2018-12-30-1990-99-state.dta
append using Govt-data-plantations-FF-2018-12-30-2000-09-state.dta

*step 3 - save as an intermediate file
save plantation-history.dta, replace

*step 4 - drop all circle level variables, drop all empty variables
drop cr*
drop v104 v105 v106 v107 v108 v109 v110 v111 v112 v113 v114 v115 v116 v117 v118 v119 v120 v121 v122 v123

*rename all variables (based on code book provided in original database)
* the goal here is to have easy to understand variable names
* note that tree species all refer to the number of individuals
rename ttarea area
rename ttdeo deodar
rename ttkail kail
rename ttfrsp firspruce
rename ttchil chil
rename ttkhr khair
rename ttshsm shisham
rename ttbam bamboo
rename ttrbna robinia
rename ttwilw willow
rename ttpplr poplar
rename ttwln walnut
rename ttbnok banoak
rename ttdro daroo
rename ttrta ritha
rename ttjtr jatropha
rename ttamla amla
rename ttknr kachnar
rename ttdrk drake
rename ttlcna lucinia
rename ttjmn jamun
rename ttajn arjun
rename ttbhr behra
rename ttteak teak
rename ttduri duri
rename tthar harar
rename tttuni tuni
rename tteulp eucalyptus
rename tttosh tosh
rename ttbchy blackcherry
rename ttmlby mulberry
rename ttmple maple
rename ttsiris siris
rename ttoie oie
rename ttpaja paja
rename ttknor khanor
rename ttdhon dehoon
rename ttkhrk khirak
rename ttjkrd jakrinda
rename ttdlo dlo
rename ttbsmbl bansimbal
rename ttslok slock
rename ttbuel buel
rename ttbill bill
rename ttpnut picknut
rename ttasn alsan
rename ttoth notidentified
rename ttpltrs totaltreesplanted
rename ttgrstuft grasstufts
rename ttgrass totalgrass

*we would append the Changer datasheet to this one, as it uses data labels that are
*the same as the renamed data labels
*starred out this command, as we are no longer adding this data file, see discussion above
*append using Govt-data-plantations-FF-2018-12-30-Changer.dta

*step 5 - save the state level data file as a .dta
save plantation-history-HP.dta, replace

*step 6: merge with program information file
*clean and drop variables from scheme list
*dropped variables were Rajesh Rana & Vijay Guleria's coding of which years each scheme occurred.
*presumably we can go back and recalculate this once we have each scheme matched to actual planting that occurred
clear
insheet using Govt-data-plantations-FF-2018-12-30-schemelist.csv
rename code schid
sort schid
drop if schid ==.
keep schid nameofschemes schemedescription confidence statefunded centralfunded donorfunded participatory watershedproject_focusedonecosystemservices catchementareaplantation_relatedtodams compensatory labor_nrega
save schemelistshort.dta, replace

*merge the plantation history with the scheme list (many-to-one on schid)
use plantation-history-HP.dta
sort schid
merge m:1 schid using schemelistshort.dta
gen teststring = schname == nameofschemes
label var teststring "this variable is used for data cleaning to check that scheme names match across data files"
drop _merge

*label variables (species names are not given an additional label, they all
*refer to the number of trees planted of that species)
label var schid "Scheme ID number"
label var schname "Scheme name"
label var area "area in hectares"
label var nameofschemes "name of scheme (should match schname)"
label var schemedescription "description of the scheme based on knowledge of Dr. Pushpendra Rana"
label var confidencelevel "Dr. Pushpendra Rana's confidence in his knowledge of this scheme"
label var statefunded "Scheme is funded by the state government according to Dr. Rana"
label var centralfunded "scheme is funded by the central govt. according to Dr. Rana"
label var donorfunded "scheme is funded by a foreign donor according to Dr. Rana"
label var participatory "scheme has a substantial participatory component according to Dr. Rana"
rename watershedproject_focusedonecosys watershed
label var watershed "scheme is a watershed project focused on ecosystem restoration"
rename catchmentareaplantation_related catchment
label var catchment "scheme is a catchment area plantation related to hydro development"
label var compensatory "scheme is part of a compensatory afforestation program"
label var labor_nrega "scheme is part of an NREGA program"

*step 7 - save!
save plantationhistory-HP-merge.dta, replace
outsheet using plantation-history-hp.csv , comma replace

*step 8 - in this sequence, we recall plantation-history.dta and create a data
* file with only the data from Dharamshala circle, following the same sequence
* as in steps 4-6 above
clear
use plantation-history.dta
drop tt*
drop v104 v105 v106 v107 v108 v109 v110 v111 v112 v113 v114 v115 v116 v117 v118 v119 v120 v121 v122 v123
rename crarea area
rename crdeo deodar
rename crkail kail
rename crfrsp firspruce
rename crchil chil
rename crkhr khair
rename crshsm shisham
rename crbam bamboo
rename crrbna robinia
rename crwilw willow
rename crpplr poplar
rename crwln walnut
rename crbnok banoak
rename crdro daroo
rename crrta ritha
rename crjtr jatropha
rename cramla amla
rename crknr kachnar
rename crdrk drake
rename crlcna lucinia
rename crjmn jamun
rename crajn arjun
rename crbhr behra
rename crteak teak
rename crduri duri
rename crhar harar
rename crtuni tuni
rename creulp eucalyptus
rename crtosh tosh
rename crbchy blackcherry
rename crmlby mulberry
rename crmple maple
rename crsiris siris
rename croie oie
rename crpaja paja
rename crknor khanor
rename crdhon dehoon
rename crkhrk khirak
rename crjkrd jakrinda
rename crdlo dlo
rename crbsmbl bansimbal
rename crslok slock
rename crbuel buel
rename crbill bill
rename crpnut picknut
rename crasn alsan
rename croth notidentified
rename crttpltrs totaltreesplanted
rename crgrstuft grasstufts
rename crgrass totalgrass
sort schid
merge m:1 schid using schemelistshort.dta
gen teststring = schname == nameofschemes
drop _merge

*label variables (species names are not given an additional label, they all
*refer to the number of trees planted of that species)
label var schid "Scheme ID number"
label var schname "Scheme name"
label var area "area in hectares"
label var nameofschemes "name of scheme (should match schname)"
label var schemedescription "description of the scheme based on knowledge of Dr. Pushpendra Rana"
label var confidencelevel "Dr. Pushpendra Rana's confidence in his knowledge of this scheme"
label var statefunded "Scheme is funded by the state government according to Dr. Rana"
label var centralfunded "scheme is funded by the central govt. according to Dr. Rana"
label var donorfunded "scheme is funded by a foreign donor according to Dr. Rana"
label var participatory "scheme has a substantial participatory component according to Dr. Rana"
rename watershedproject_focusedonecosys watershed
label var watershed "scheme is a watershed project focused on ecosystem restoration"
rename catchmentareaplantation_related catchment
label var catchment "scheme is a catchment area plantation related to hydro development"
label var compensatory "scheme is part of a compensatory afforestation program"
label var labor_nrega "scheme is part of an NREGA program"
save plantation-history-dhm.dta, replace
outsheet using plantation-history.dhm.csv, comma replace

// Closing Commands
// save data & close log
save, replace
log close
exit