This readme.txt file was generated on 2023-06-16 by

Recommended citation for the data: Pardey, Philip; Wang, Shanchao; Alston, Julian M. (2023). R&D Lags in Economic Models. Retrieved from the Data Repository for the University of Minnesota. https://conservancy.umn.edu/handle/11299/254756.

-------------------
GENERAL INFORMATION
-------------------

1. Title of Dataset: R&D Lags in Economic Models

2. Author Information

   Author Contact: Philip Pardey (ppardey@umn.edu)

   Name: Philip Pardey
   Institution: University of Minnesota
   Email: ppardey@umn.edu
   ORCID:

   Name: Shanchao Wang
   Institution: Meta, Seattle
   Email: scwang@meta.com
   ORCID:

   Name: Julian M. Alston
   Institution: University of California, Davis
   Email:
   ORCID:

3. Date published or finalized for release: 2023-06-15

4. Date of data collection (single date, range, approximate date): 1890 to 2007

5. Geographic location of data collection (where was data collected?): Not applicable

6. Information about funding sources that supported the collection of the data: The work for this project was partially supported by the California Agricultural Experiment Station; the Giannini Foundation of Agricultural Economics; the Minnesota Agricultural Experiment Station (MIN-14-171); the University of Minnesota’s GEMS Informatics Center; and the USDA National Research Initiative.

7. Overview of the data (abstract): The data files include primary and processed data that underpin the analysis reported in the paper "R&D Lags in Economic Models". The R files include all the code required to conduct the analysis reported in the paper.

--------------------------
SHARING/ACCESS INFORMATION
--------------------------

1. Licenses/restrictions placed on the data: Attribution-NonCommercial-NoDerivs 3.0 United States (http://creativecommons.org/licenses/by-nc-nd/3.0/us/)

2. Links to publications that cite or use the data: 10.22004/ag.econ.330085
   Plus the additional links to the source information found here

3. Was data derived from another source? If yes, list source(s): See section 2 “Links to publications that cite or use the data”

4. Terms of Use: Data Repository for the U of Minnesota (DRUM). By using these files, users agree to the Terms of Use. https://conservancy.umn.edu/pages/drum/policies/#terms-of-use

---------------------
DATA & FILE OVERVIEW
---------------------

1. File List

Filename: Wang et al. _R&D lags_Regression-Data (2023-06-02).xlsx
Short description: Primary and processed data

Filename: USDACropData.R
Short description: R file that obtains yield data from USDA Quick Stats. To download the USDA data, go to https://www.nass.usda.gov/datasets/ and find the file named qs.crops_20230701.txt.gz. Note that this file is updated each weekday, so the date in the filename will differ when you download it. The output of this file is stored in the "3. Yield Data" tab of "Wang et al. _R&D lags_Regression-Data (2023-06-02)".

Filename: WeatherIndex.R
Short description: R file that reads data from the "3. Yield Data" tab of "Wang et al. _R&D lags_Regression-Data (2023-06-02)" to produce weather shocks. The output of this file is stored in the "4. Weather Index" tab of "Wang et al. _R&D lags_Regression-Data (2023-06-02)".

Filename: 1_ConstructStock.R
Short description: This R file reads "1. MFP", "2. Ag R&D Spending", and "4. Weather Index" from "Wang et al. _R&D lags_Regression-Data (2023-06-02)" and constructs knowledge stocks. The outputs from this R file are:
- weights.RData: weights for lag years for each distribution.
- Distribution.RData: data output used for regression and time-series analysis.

Filename: 2_TimeSeriesTests.R (requires Stata)
Short description: This R file performs time series analysis for knowledge stocks. It calls Stata (version 17) do files:
- 2.Table2.do
- 2.TwoSeriesTests.do
- 2.TwoSeriesTests_Bloom.do
The output of this file is:
- timeseries_test_results.RData
which is used to construct the time series analysis tables.

Filename: 3_ResidualAdjust.R (requires Stata)
Short description: This R file runs different models and produces regression results. All outputs can be found in the "3_outputs" folder.

Filename: 4_BCRcalculation.R
Short description: Uses results from step 3 to calculate benefit-cost ratios under different scenarios. Results are saved in the folder "4_outputs".

Filename: VA_State_US (2021-04-03).xlsx
Short description: Value added to the U.S. economy by the agricultural sector, 1910-2021F

Filename: BEA Implicit Price Index for GDP (2014-04-03).xls
Short description: Table 1.1.4. Price Indexes for Gross Domestic Product from the Bureau of Economic Analysis

Filename: CSV Data Archive.zip
Short description: Zip folder of CSV versions of the data files

2. Relationship between files:

1. Data from "MFP Estimation Data (2017-12-31)". Under column MFP USDA-InSTePP, the 1940 to 1948 AgMFP estimates are from USDA (1983); the 1949 to 2007 AgMFP estimates are from the InSTePP Production Accounts.

   USDA-ERS (United States Department of Agriculture, Economic Research Service). Economic Indicators of the Farm Sector: Production and Efficiency Statistics, 1981. ECIFS 1-3, Washington, DC: USDA, Economic Research Service, 1983.

   Pardey, P.G., M.A. Andersen, B.J. Craig, and J.M. Alston. “InSTePP United States Production Accounts, Version 5.” St. Paul, MN: International Science and Technology Practice and Policy, 2014. Available from https://wayback.archive-it.org/4111/20220209030454/https:/www.instepp.umn.edu/products/instepp-us-production-accounts-version-5-rev-input-quantity-index.

2. Public agricultural R&D represents the sum of SAES (State Agricultural Experiment Station) and intramural USDA research spending. The SAES R&D series (excluding forestry) are compiled from unpublished USDA, CRIS data files. The USDA intramural series for years prior to 2001 are also from the USDA sources cited in Alston, J. M., Pardey, P. G., & Rao, X. (2020). The Payoff to Investing in CGIAR Research. Arlington, VA: SoAR Foundation. 10.22004/ag.econ.337029

3. Data from USDA Quick Stats. ftp://ftp.nass.usda.gov/quickstats/. The top ten crops based on average area are: CORN, HAY, WHEAT, SOYBEANS, OATS, COTTON, SORGHUM, BARLEY, RICE, FLAXSEED. Note: Sunflower has the 9th largest average area. However, it only has data from 1975. Hence, we use data from rice (10th in area) and flaxseed (11th in area). Detailed code is included in "USDACropData.R".

4. Weighted average yield calculated from the top 10 field crops for the years 1940-2007. Each crop's annual share of the total value of production was used as its weight. Data are from tab 3, which is downloaded from USDA Quick Stats. Detailed code is included in "WeatherIndex.R".

5. Weights constructed using code in "1_ConstructStock.R" and other scripts.

6. Knowledge stock was constructed using code in "1_ConstructStock.R" and aggregate (SAES plus USDA) public Ag R&D spending from 1890-2007. See Note 2 above for source details.

--------------------------
METHODOLOGICAL INFORMATION
--------------------------

1. Description of methods used for collection/generation of data: See the information provided in “File list” and “Relationship between files”

2. Methods for processing the data: See the information provided in “File list” and “Relationship between files”

3. Instrument- or software-specific information needed to interpret the data: Not applicable

4. Standards and calibration information, if appropriate: Not applicable

5. Environmental/experimental conditions: Not applicable

6. Describe any quality-assurance procedures performed on the data: Not applicable. Primary data obtained from sources described in “Source information” cited here.

7. People involved with sample collection, processing, analysis and/or submission: Not applicable

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: Wang et al. _R&D lags_Regression-Data (2023-06-02).xlsx
Sheet: 1. MFP
-----------------------------------------

rows: 68
cols: 2

A. Name: YEAR
   Description: Designates year of associated data

B. Name: MFP
   Description: Multi-factor Productivity Index (base year 1910 = 100)

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: Wang et al. _R&D lags_Regression-Data (2023-06-02).xlsx
Sheet: 2. Ag R&D Spending
-----------------------------------------

rows: 119
cols: 2

A. Name: Year
   Description: Designates year of associated data

B. Name: Public AgRD
   Description: Public agricultural R&D represents the sum of SAES (State Agricultural Experiment Station) and intramural USDA research spending. The SAES R&D series (excluding forestry) are compiled from unpublished USDA, CRIS data files. The USDA intramural series for years prior to 2001 are also from the USDA sources cited in Alston, J. M., Pardey, P. G., & Rao, X. (2020). The Payoff to Investing in CGIAR Research. Arlington, VA: SoAR Foundation.

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: Wang et al. _R&D lags_Regression-Data (2023-06-02).xlsx
Sheet: 3. Yield Data
-----------------------------------------

rows: 1377
cols: 6

A. Name: COMMODITY
   Description: Designates crop of associated data

B. Name: YEAR
   Description: Designates year of associated data

C. Name: YIELD
   Description: Designates national average yield of associated crop in lb/acre

D. Name: AREA
   Description: Designates national area harvested of associated crop in acres

E. Name: VOP
   Description: Designates national value of production of associated crop in nominal US dollars

F. Name: PRODUCTION
   Description: Designates national quantity produced of associated crop in lbs

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: Wang et al. _R&D lags_Regression-Data (2023-06-02).xlsx
Sheet: 4. Weather Index
-----------------------------------------

rows: 68
cols: 4

A. Name: Year
   Description: Designates year of associated data

B. Name: Weighted Sum Yield (Standardized)
   Description: Weighted average yield calculated from the top 10 field crops for the years 1940-2007. Each crop's annual share of the total value of production was used as its weight. Data are from tab 3, which is downloaded from USDA Quick Stats. Detailed code is included in "WeatherIndex.R".

C. Name: Predicted Weighted Sum Yield (Standardized)
   Description: The fitted values of crop yield obtained from regressing weighted sum yield on linear and cubic time trends

D. Name: Weather Index
   Description: The difference between the weighted sum yield and the predicted weighted sum yield, reflecting crop yield deviations from the long-term yield trend

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: Wang et al. _R&D lags_Regression-Data (2023-06-02).xlsx
Sheet: 5. Knowledge Stock Weights
-----------------------------------------

rows: 51
cols: 8

A. Name: Lags
   Description: Designated lag periods used for agricultural R&D investments to accumulate to knowledge stock

B. Name: Gamma=0.75, Lambda=0.8
   Description: Weights assigned to each of the lagged investments in agricultural R&D to construct knowledge stock. Weights are based on a gamma distribution with gamma value equal to 0.75 and lambda value equal to 0.8.

C. Name: Gamma=0.75, Lambda=0.85
   Description: Weights assigned to each of the lagged investments in agricultural R&D to construct knowledge stock. Weights are based on a gamma distribution with gamma value equal to 0.75 and lambda value equal to 0.85.

D. Name: Gamma=0.85, Lambda=0.8
   Description: Weights assigned to each of the lagged investments in agricultural R&D to construct knowledge stock. Weights are based on a gamma distribution with gamma value equal to 0.85 and lambda value equal to 0.8.

E. Name: Gamma=0.9, Lambda=0.7
   Description: Weights assigned to each of the lagged investments in agricultural R&D to construct knowledge stock. Weights are based on a gamma distribution with gamma value equal to 0.9 and lambda value equal to 0.7.

F. Name: Trapezoid
   Description: Weights assigned to each of the lagged investments in agricultural R&D to construct knowledge stock. Weights are based on a 35-year trapezoidal distribution.

G. Name: Geom10
   Description: Weights assigned to each of the lagged investments in agricultural R&D to construct knowledge stock. Weights are based on a geometric distribution with 0.1 depreciation rate.

H. Name: Geom15
   Description: Weights assigned to each of the lagged investments in agricultural R&D to construct knowledge stock. Weights are based on a geometric distribution with 0.15 depreciation rate.

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: Wang et al. _R&D lags_Regression-Data (2023-06-02).xlsx
Sheet: 6. Constructed knowledge Stock
-----------------------------------------

rows: 68
cols: 9

A. Name: YEAR
   Description: Designates year of associated data

B. Name: Gamma=0.75, Lambda=0.8
   Description: Constructed knowledge stock, the weighted sum of lagged investments in agricultural R&D. Weights are based on a gamma distribution with gamma value equal to 0.75 and lambda value equal to 0.8.

C. Name: Gamma=0.75, Lambda=0.85
   Description: Constructed knowledge stock, the weighted sum of lagged investments in agricultural R&D. Weights are based on a gamma distribution with gamma value equal to 0.75 and lambda value equal to 0.85.

D. Name: Gamma=0.85, Lambda=0.8
   Description: Constructed knowledge stock, the weighted sum of lagged investments in agricultural R&D. Weights are based on a gamma distribution with gamma value equal to 0.85 and lambda value equal to 0.8.

E. Name: Gamma=0.9, Lambda=0.7
   Description: Constructed knowledge stock, the weighted sum of lagged investments in agricultural R&D. Weights are based on a gamma distribution with gamma value equal to 0.9 and lambda value equal to 0.7.

F. Name: Trapezoid
   Description: Constructed knowledge stock, the weighted sum of lagged investments in agricultural R&D. Weights are based on a 35-year trapezoidal distribution.

G. Name: Geom10
   Description: Constructed knowledge stock, the weighted sum of lagged investments in agricultural R&D. Weights are based on a geometric distribution with 0.1 depreciation rate.

H. Name: Geom15
   Description: Constructed knowledge stock, the weighted sum of lagged investments in agricultural R&D. Weights are based on a geometric distribution with 0.15 depreciation rate.

I. Name: BLOOMLINEAR
   Description: Constructed knowledge stock using the Romer-Bloom model.

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: VA_State_US (2021-04-03).xlsx
-----------------------------------------

Tabs include a Document Map with links; data for the United States; then data for each US state individually. Column information is the same for each data tab.

rows: 76
cols: 62

A. Name: Agricultural Sector
   Description: From USDA-ERS Farm Income and Wealth Statistics

B-BI. Names: Individual Year
   Description: US dollar amounts in thousands

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: BEA Implicit Price Index for GDP (2014-04-03).xls
-----------------------------------------

rows: 33
cols: 94

A. Name: Line
   Description: Line number

B. Name: [blank]
   Description: Category of GDP

C-CP. Names: Years
   Description: GDP values

###### Session information for all packages

R version 4.2.1 (2022-06-23)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Ventura 13.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
 [1] prais_1.1.2 pcse_1.9.1.1 sandwich_3.0-2 lmtest_0.9-40 zoo_1.8-12 RStata_1.1.1
 [7] vroom_1.6.1 readxl_1.4.2 data.table_1.14.8 lubridate_1.9.2 forcats_1.0.0 dplyr_1.1.1
[13] purrr_1.0.1 readr_2.1.4 tidyr_1.3.0 tibble_3.2.1 ggplot2_3.4.2 tidyverse_2.0.0
[19] stringr_1.5.0 RCurl_1.98-1.12

loaded via a namespace (and not attached):
 [1] cellranger_1.1.0 pillar_1.8.1 compiler_4.2.1 bitops_1.0-7 tools_4.2.1 bit_4.0.5
 [7] lattice_0.20-45 lifecycle_1.0.3 gtable_0.3.3 timechange_0.2.0 pkgconfig_2.0.3 rlang_1.1.0
[13] cli_3.6.1 rstudioapi_0.14 withr_2.5.0 generics_0.1.3 vctrs_0.6.1 hms_1.1.3
[19] bit64_4.0.5 grid_4.2.1 tidyselect_1.2.0 glue_1.6.2 R6_2.5.1 fansi_1.0.3
[25] foreign_0.8-82 tzdb_0.3.0 magrittr_2.0.3 scales_1.2.1 colorspace_2.1-0 utf8_1.2.2
[31] stringi_1.7.12 munsell_0.5.0 crayon_1.5.2
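
###### Illustrative sketch: knowledge stock as a weighted sum of lagged spending

The knowledge stock columns in sheet "6. Constructed knowledge Stock" are described in this readme as weighted sums of lagged agricultural R&D spending, with the weights listed in sheet "5. Knowledge Stock Weights". The Python sketch below illustrates that calculation only; it is not the authors' code, which is in "1_ConstructStock.R". The function names, the flat spending series, and the geometric weight formula (weights proportional to (1 - depreciation)^k over lags 0 to 50, normalized to sum to one) are assumptions made for illustration and may differ from what the R script does.

```python
def geometric_weights(depreciation, n_lags):
    """Return n_lags weights for lags 0 .. n_lags - 1.

    ASSUMPTION (not taken from the dataset): the weight on lag k is
    proportional to (1 - depreciation) ** k, normalized to sum to one.
    """
    raw = [(1.0 - depreciation) ** k for k in range(n_lags)]
    total = sum(raw)
    return [r / total for r in raw]


def knowledge_stock(spending, weights):
    """Weighted sum of current and lagged spending.

    spending: dict mapping year -> R&D spending
    weights:  list where weights[k] applies to spending lagged k years
    Returns a dict year -> stock, only for years with a full lag history.
    """
    max_lag = len(weights) - 1
    stocks = {}
    for year in sorted(spending):
        lagged_years = [year - k for k in range(max_lag + 1)]
        if all(y in spending for y in lagged_years):
            stocks[year] = sum(
                w * spending[y] for w, y in zip(weights, lagged_years)
            )
    return stocks


# Hypothetical flat spending series covering 1890-2007 (not the real data).
spending = {year: 100.0 for year in range(1890, 2008)}

# 51 weights (lags 0 to 50) with a 0.1 depreciation rate, as an assumed
# stand-in for the "Geom10" column.
weights = geometric_weights(0.10, 51)

stocks = knowledge_stock(spending, weights)
print(min(stocks), max(stocks), len(stocks))
```

With 51 weights and spending that starts in 1890, the first year with a full lag history is 1940, which is consistent with the 1940-2007 span of the knowledge stock series. In practice the weights would be read from sheet 5 rather than generated.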