This readme.txt file was generated on 2024-04-11 by Dorothy Sweet.

Recommended citation for the data: Sweet, Dorothy D; Hirsch, Candice N; Hirsch, Cory D. (2024). Hirsch Lab UAV Commercial Maize Phenotyping Project at UMN SROC Waseca: 2020, 2021, and 2022. Retrieved from the Data Repository for the University of Minnesota, https://doi.org/10.13020/7t39-h236.

-------------------
GENERAL INFORMATION
-------------------

1. Title of Dataset: Hirsch Lab UAV Commercial Maize Phenotyping Project at UMN SROC Waseca: 2020, 2021, and 2022

2. Author Information

  Principal Investigator Contact Information
    Name: Candice N. Hirsch
    Institution: University of Minnesota
    Address: 518 Borlaug Hall, 1991 Upper Buford Circle, Saint Paul, MN 55108
    Email: cnhirsch@umn.edu
    ORCID: 0000-0002-8833-3023

  Associate or Co-investigator Contact Information
    Name: Cory Hirsch
    Institution: University of Minnesota
    Address: 495 Borlaug Hall, 1991 Upper Buford Circle, St. Paul, MN 55108
    Email: cdhirsch@umn.edu
    ORCID: 0000-0002-3409-758X

  Associate or Co-investigator Contact Information
    Name: Dorothy Sweet
    Institution: University of Minnesota
    Address: 520 Borlaug Hall, 1991 Upper Buford Circle, Saint Paul, MN 55108
    Email: kirsc168@umn.edu
    ORCID: 0000-0002-9614-5436

3. Date published or finalized for release:

4. Date of data collection (single date, range, approximate date): 2020-05-21 to 2022-07-18

5. Geographic location of data collection (where was data collected?): University of Minnesota Southern Research and Outreach Center, Waseca, Minnesota

6. Information about funding sources that supported the collection of the data: Minnesota Corn Research and Promotion Council

7. Overview of the data (abstract): This dataset provides a valuable resource for evaluating the ability of unoccupied aerial vehicles (UAVs) to collect plant height information from commercial agricultural fields and to predict within-field variation in yield using temporal traits, including plant height, growth rate, and vegetative indices.
Many flights were conducted over commercial maize fields using a UAV equipped with an RGB camera. This dataset includes the orthomosaics and digital elevation models generated from those flights, as well as the plot boundary shapefiles used to extract data from them. Data in this repository also include extracted plant height, extracted RGB vegetative indices, manual height measurements, weather data, soil data, and grain yield. The experiment consisted of three commercial fields, each planted with a single maize hybrid, which makes it useful for assessing how well UAV-extracted values identify within-field variation for yield prediction. Because it includes manual height measurements for evaluation, it can also be used to test different methods of extracting plant height from commercial fields.

--------------------------
SHARING/ACCESS INFORMATION
--------------------------

1. Licenses/restrictions placed on the data: CC0 1.0 Universal

2. Terms of Use: Data Repository for the U of Minnesota (DRUM). By using these files, users agree to the Terms of Use. https://conservancy.umn.edu/pages/drum/policies/#terms-of-use

---------------------
DATA & FILE OVERVIEW
---------------------

1. File List

A. Filename: mmddyyyy_geotiffDEM_LZW.tif
   Short description: Digital Elevation Model (DEM)

B. Filename: mmddyyyy_geotiff_LZW.tif
   Short description: orthomosaic (ortho)

C. Filename: Plot_Boundary_Shapefiles.zip
   Short description: compressed folder holding all plot boundary shapefiles for each flight

D. Filename: Plot_Order_Files.zip
   Short description: compressed folder holding MATLAB output that describes the order in which plot data were extracted

E. Filename: Segmentation_Mask_Files.zip
   Short description: compressed folder containing the .mat files used to mask out background for each flight in which plant material was sufficient to extract RGB values

F. Filename: Extracted_Plant_Height.zip
   Short description: compressed folder containing the extracted plant height for each flight

G. Filename: Extracted_RGB_Files.zip
   Short description: compressed folder containing files with the extracted red, green, and blue values for each plot

H. Filename: Vegetative_Indices_Files.zip
   Short description: compressed folder containing files with vegetative indices calculated from RGB values for each plot and flight

I. Filename: Grain_Yield_Files.zip
   Short description: compressed folder containing all files pertaining to yield, including .csv files (Production_yield_YYYY.csv) with the output from the combine yield monitor and the shapefiles associated with that data for each year

J. Filename: Hand_Height_Files.zip
   Short description: compressed folder containing .csv files with the manual height measurements taken in the manually identified plots

K. Filename: Field_Management_Information.zip
   Short description: compressed folder containing .txt files describing how each field was managed during the growing season, including hybrid information, planting date, and fertilizer application

L. Filename: Weather_Information.zip
   Short description: compressed folder containing weather files, including precipitation (Weather_YYYY.csv), solar radiation (Solar_Radiation_Data_YYYY.csv), and temperature (Temperature_Data_Waseca_YYYY.csv)

M. Filename: Soil_Information_Files.zip
   Short description: compressed folder containing soil survey information such as a soil map and soil description

2. Relationship between files:
mmddyyyy_geotiff_LZW.tif and mmddyyyy_geotiffDEM_LZW.tif are the orthomosaics and digital elevation models (DEMs), respectively, saved with LZW compression. Both the orthomosaics and the DEMs are outputs from Agisoft Metashape Pro.
Plot_Boundary_Shapefiles.zip contains shapefiles defining the boundaries of each plot used for data extraction (Extracted_Plant_Height.zip).
Segmentation_Mask_Files.zip contains the masks used to separate canopy from background pixels when extracting RGB values (Extracted_RGB_Files.zip) and calculating RGB vegetative indices (Vegetative_Indices_Files.zip).
Extracted_Plant_Height.zip and Extracted_RGB_Files.zip contain the raw extracted data (plant height and RGB values, respectively).
Hand_Height_Files.zip contains manual plant height data collected on plots throughout the season for quality control.
Weather_Information.zip contains weather information such as temperature, solar radiation, and precipitation, used to calculate growing degree days so that data can be compared across years.
Grain_Yield_Files.zip contains the yield information from the combine harvest, Field_Management_Information.zip contains the planting date, hybrid name, and fertilizer inputs, and Soil_Information_Files.zip contains the results of a soil survey of the land.

--------------------------
METHODOLOGICAL INFORMATION
--------------------------

1. Description of methods used for collection/generation of data:

The protocol for height extraction can be found in Tirado et al. 2020 (https://onlinelibrary.wiley.com/doi/10.1002/pld3.230).

Experimental Field Design: A single corn hybrid was grown in each of the summers of 2020, 2021, and 2022. The hybrids (Channel 199-11 STXRIB in 2020, Becks 5077 V2P in 2021, and LG 5525 VT2 Pro in 2022) were planted at a density of 35,500 seeds/acre. The fields were planted on May 2, 2020, April 30, 2021, and May 7, 2022 at the Southern Research and Outreach Center in Waseca, Minnesota.

UAV Data Collection and Processing: The experiment was imaged weekly from planting until plants reached terminal height using a DJI Phantom 4 RTK drone. Images were collected at an altitude of 30 m above ground, giving a ground sampling distance (GSD) of approximately 0.82 cm/pixel, with 80% front overlap and 80% side overlap to maximize reconstruction efficiency.
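As a consistency check on the 0.82 cm figure, the usual GSD relation (sensor width × altitude ÷ (focal length × image width)) can be worked through in Python. The camera values are assumptions taken from DJI's published Phantom 4 RTK specifications, not from this readme, and the footprint and spacing lines further assume the long image side is oriented across the flight line. The function names are hypothetical.

```python
# Worked ground sampling distance (GSD) and photo-spacing arithmetic.
# ASSUMPTION: camera values below are DJI's published Phantom 4 RTK specs
# (1-inch sensor 13.2 x 8.8 mm, 8.8 mm lens, 5472 x 3648 px images);
# they are not given in this readme.
SENSOR_WIDTH_MM = 13.2
FOCAL_LENGTH_MM = 8.8
IMAGE_WIDTH_PX = 5472
IMAGE_HEIGHT_PX = 3648


def gsd_cm_per_px(altitude_m: float) -> float:
    """GSD = sensor width * altitude / (focal length * image width), in cm/px."""
    return SENSOR_WIDTH_MM * (altitude_m * 100.0) / (FOCAL_LENGTH_MM * IMAGE_WIDTH_PX)


def spacing_m(footprint_m: float, overlap: float) -> float:
    """Distance between exposures (or flight lines) for a fractional overlap."""
    return footprint_m * (1.0 - overlap)


if __name__ == "__main__":
    gsd = gsd_cm_per_px(30.0)
    # ASSUMPTION: the long image side is oriented across the flight line.
    across_m = IMAGE_WIDTH_PX * gsd / 100.0
    along_m = IMAGE_HEIGHT_PX * gsd / 100.0
    print(f"GSD at 30 m: {gsd:.2f} cm/px")
    print(f"Exposure spacing at 80% front overlap: {spacing_m(along_m, 0.80):.1f} m")
    print(f"Flight-line spacing at 80% side overlap: {spacing_m(across_m, 0.80):.1f} m")
```

Under these assumptions the script reports a GSD of 0.82 cm/px, matching the value stated above, with roughly 6 m between exposures and 9 m between flight lines.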
Flights were conducted at 6 timepoints in 2020, 7 in 2021, and 8 in 2022. Ground targets of known height and one foot wide were placed around the border of the area of interest for use as ground control points (GCPs): 20 GCPs in 2020, 22 in 2021, and 22 in 2022. The real-world coordinates of these GCPs were collected by real-time kinematic (RTK) positioning with a Swift Console (v2.3.17) base station and rover (GNSS compass configuration).

Weather Data and Growing Degree Days Calculation: Daily minimum and maximum temperature data were extracted from the Southern Research and Outreach Center weather station in Waseca, MN (Station ID 218692). Daily growing degree units (GDUs) were calculated from each date's maximum and minimum temperatures, and their cumulative sum from the planting date was assigned to each date of data collection.

2. Methods for processing the data: Images from each flight were processed as previously described in Tirado et al. 2020 (https://onlinelibrary.wiley.com/doi/10.1002/pld3.230) up to the height extraction step, for which a different custom MATLAB script was used. Briefly, Agisoft software (Agisoft Metashape Professional v1.7.5) was used to generate crop surface models (CSMs) and RGB orthomosaics for each flight, including the initial ground flight. QGIS software (QGIS v3.16, 2021) was used to create plot boundaries by overlaying a grid of squares based on the width of 6 rows (120 inches) and exporting the plot coordinates. A 120-inch by 120-inch grid square was chosen as a realistic size for management intervention. Similarly, plots 120 inches long and 29 inches wide were created for the areas defined as manual plots in the field. Custom MATLAB scripts were used to extract plant height estimates for each square of the plot overlay using the difference-based method.
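The difference-based method named above takes a low percentile of the ground-flight DSM and a high percentile of the data-flight DSM within each grid square and subtracts the two (the 3rd and 97th percentiles, per the method description). A minimal Python sketch of that calculation follows. It assumes the DSM pixels have already been clipped to one grid square and are supplied as plain lists of elevations; the percentile uses linear interpolation between closest ranks (NumPy's default), which may differ slightly from the authors' MATLAB scripts. `percentile`, `plot_height`, and the sample elevations are hypothetical.

```python
"""Sketch of the difference-based plot height calculation.

ASSUMPTIONS: DSM pixels are already clipped to one grid square and supplied
as plain lists of elevations (m). Percentiles use linear interpolation between
closest ranks (NumPy's default), which may differ slightly from the custom
MATLAB scripts used for this dataset. Function names are hypothetical.
"""
import math
from typing import Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """Return the q-th percentile (0-100) of values, ignoring NaNs."""
    data = sorted(v for v in values if not math.isnan(v))
    if not data:
        return math.nan
    pos = (len(data) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


def plot_height(ground_dsm: Sequence[float], flight_dsm: Sequence[float],
                ground_q: float = 3.0, canopy_q: float = 97.0) -> float:
    """Canopy percentile of the data flight minus ground percentile of the ground flight."""
    return percentile(flight_dsm, canopy_q) - percentile(ground_dsm, ground_q)


if __name__ == "__main__":
    # Hypothetical elevations (m) for one grid square.
    ground = [300.02, 300.00, 300.05, 300.01, 300.03]
    flight = [301.10, 301.85, 302.05, 301.60, 300.40]
    print(f"Estimated plant height: {plot_height(ground, flight):.2f} m")
```

Taking a low percentile of the ground flight and a high percentile of the data flight, rather than the minimum and maximum, makes each estimate less sensitive to isolated noisy pixels.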
In the difference-based method, height is extracted for each grid square from a DSM of the initial ground flight (the 3rd percentile of all pixel values) and from a DSM of the data flight (the 97th percentile of all pixel values). Crop height at the time of the data flight is then calculated by subtracting the ground-flight value from the data-flight value. The same difference-based method was used to extract height values for the manual plots and for plots encapsulating the GCPs.

MATLAB scripts for k-means classification were used to create masks that remove all background and soil pixels from the RGB orthomosaics. Red, green, and blue (RGB) values were extracted from the masked orthomosaics on a plot basis in MATLAB, using the same plot overlay as for plant height. Values were then averaged across each plot to give a single value per color channel for each plot at each flight. An additional 13 vegetative indices were calculated from the RGB values.

3. Instrument- or software-specific information needed to interpret the data: MATLAB was used to extract plant height and vegetative indices from the TIFF images; the extracted (raw) data are also included in this dataset. Plot boundary files can be viewed in QGIS.

4. Standards and calibration information, if appropriate: Ground targets of known height and one foot wide were placed around the border of the area of interest for use as ground control points (GCPs): 20 GCPs in 2020, 22 in 2021, and 22 in 2022. The real-world coordinates of these GCPs were collected by real-time kinematic (RTK) positioning with a Swift Console (v2.3.17) base station and rover (GNSS compass configuration).

5. Environmental/experimental conditions: Weather information, including daily minimum and maximum temperatures (°F), daily total precipitation (in.), and daily total solar radiation (cal/cm²), was collected from the University of Minnesota Southern Research and Outreach Center weather station in Waseca, Minnesota (Station ID: 218692).

6. Describe any quality-assurance procedures performed on the data: All extracted plant heights (both the whole-field grid and the defined manual plots) were normalized to real-world measurements by comparing the extracted heights of reference objects (GCPs or the UAV base station) with their known heights. One of three methods was used:
  (1) a ground (g) value from bare ground to set zero, with the UAV base station (b) as a precisely known upper height;
  (2) a g value, with the extracted GCP heights as the precisely known upper height;
  (3) a b value without a g value, when a ground adjustment was unnecessary.
The method used for each data flight depended on the challenges specific to that flight and on whether the height of the UAV base station could be extracted. In each method, heights were first adjusted to the ground value by subtraction (when required); the known real-world upper height was then divided by the extracted upper height (UAV base station or GCPs, depending on the flight), and this ratio was multiplied by the extracted plant height.
Individual data points (i.e., single grid squares within a single flight date) were removed if they were classified as a dip (height less than 80% of the previous flight day's height and also less than the next flight day's height) or a peak (height more than 120% of the next flight day's height while still greater than the previous flight day's height).

7. People involved with sample collection, processing, analysis and/or submission: Dorothy Sweet, Julian Cooper, Cory Hirsch, and Candice Hirsch

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: Manual_Matlab_Roi_YYYY.csv
-----------------------------------------

1. Number of variables: 10

2. Number of cases/rows: 40 in 2020, 20 in 2021, and 20 in 2022

3. Missing data codes:
   Code/symbol: NA  Definition: not available (missing value)

4. Variable List

A. Name: Fields
   Description: The order in which the plots were extracted (1-40 in 2020; 1-20 in 2021 and 2022)

B. Name: Geometry
   Description: The geometry type of the plot boundary in the shapefile used to create this file
   Value label: ‘Polygon’

C. Name: BoundingBox
   Description: Largest and smallest latitude and longitude values of the bounding box

D. Name: X
   Description: List of longitude coordinates for the corners and center of the bounding box

E. Name: Y
   Description: List of latitude coordinates for the corners and center of the bounding box

F. Name: id
   Description: The order in which the plots were created in QGIS

G. Name: left
   Description: The lowest longitude coordinate

H. Name: top
   Description: The highest latitude coordinate

I. Name: right
   Description: The highest longitude coordinate

J. Name: bottom
   Description: The lowest latitude coordinate

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: Plots_Manual.txt
-----------------------------------------

1. Number of variables: 1

2. Number of cases/rows: 40 in 2020, 20 in 2021, and 20 in 2022

3. Missing data codes:
   Code/symbol: NA  Definition: not available (missing value)

4. Variable List

A. Name: Plot
   Description: The plots in order of extraction, aligned with the manual height measurements

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: Plots_Matlab_Roi_YYYY.csv
-----------------------------------------

1. Number of variables: 10

2. Number of cases/rows: 1340 in 2020, 1120 in 2021, and 1147 in 2022

3. Missing data codes:
   Code/symbol: NA  Definition: not available (missing value)

4. Variable List

A. Name: Fields
   Description: The order in which the plots were extracted

B. Name: Geometry
   Description: The geometry type of the plot boundary in the shapefile used to create this file
   Value label: ‘Polygon’

C. Name: BoundingBox
   Description: Largest and smallest latitude and longitude values of the bounding box

D. Name: X
   Description: List of longitude coordinates for the corners and center of the bounding box

E. Name: Y
   Description: List of latitude coordinates for the corners and center of the bounding box

F. Name: id
   Description: The order in which the plots were created in QGIS

G. Name: left
   Description: The lowest longitude coordinate

H. Name: top
   Description: The highest latitude coordinate

I. Name: right
   Description: The highest longitude coordinate

J. Name: bottom
   Description: The lowest latitude coordinate

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: PlotMeans95Perc_MMDDYYYY_shapefile_plot_xdim20_ydim20.txt
-----------------------------------------

1. Number of variables: 1

2. Number of cases/rows: Varies with the shapefile (WholeField, Manual, GCP, Zero, or BaseStation) and the year

3. Missing data codes:
   Code/symbol: NA  Definition: not available (missing value)

4. Variable List

A. Name: No header
   Description: Extracted height value for each plot boundary

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: RGB_MMDDYYYY_WholeField.txt
-----------------------------------------

1. Number of variables: 3

2. Number of cases/rows: 1340 in 2020, 1120 in 2021, and 1147 in 2022

3. Missing data codes:
   Code/symbol: NA  Definition: not available (missing value)

4. Variable List

A. Name: No header (column 1)
   Description: Extracted red value

B. Name: No header (column 2)
   Description: Extracted green value

C. Name: No header (column 3)
   Description: Extracted blue value

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: All_Dates_RGB_Indices.csv
-----------------------------------------

1. Number of variables: 22

2. Number of cases/rows: 17316

3. Missing data codes:
   Code/symbol: NA  Definition: not available (missing value)

4. Variable List

A. Name: Plot
   Description: Plot name assignment

B. Name: Year
   Description: Year of data
   Value labels: 2020, 2021, or 2022

C. Name: Date
   Description: Date of data collection (MMDDYYYY)

D. Name: Red
   Description: Extracted red value

E. Name: Green
   Description: Extracted green value

F. Name: Blue
   Description: Extracted blue value

G. Name: BI
   Description: Brightness Index

H. Name: GLI
   Description: Green Leaf Index

I. Name: NGRDI
   Description: Normalized Green-Red Difference Index

J. Name: VARI
   Description: Visible Atmospherically Resistant Index

K. Name: BGI
   Description: Blue Green Pigment Index

L. Name: ExG
   Description: Excess Green Index

M. Name: ExR
   Description: Excess Red Vegetation Index

N. Name: ExB
   Description: Excess Blue Vegetation Index

O. Name: ExGR
   Description: Excess Green Minus Excess Red

P. Name: MGRVI
   Description: Modified Green Red Vegetation Index

Q. Name: RGBVI
   Description: Red Green Blue Vegetation Index

R. Name: GRRI
   Description: Green-Red Ratio Index

S. Name: VEG
   Description: Vegetative Index

T. Name: Range
   Description: Number of the range the plot is in

U. Name: Row
   Description: Number of the row the plot is in

V. Name: Mean.Yld.bu.ac
   Description: Mean dry yield in bushels per acre

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: All_Vegetative_Indices_Across_Years_InTimepoints.csv
-----------------------------------------

1. Number of variables: 68

2. Number of cases/rows: 3356

3. Missing data codes:
   Code/symbol: NA  Definition: not available (missing value)

4. Variable List

A. Name: Plot
   Description: Plot assignment

B. Name: Year
   Description: Year of plot data

C. Name: Range
   Description: Range of plot data

D. Name: Row
   Description: Row of plot data

E. Name: Mean.Yld.bu.ac
   Description: Mean dry yield in bushels per acre

F - U. Name: early_Red - early_VEG
   Description: RGB values and vegetation indices from All_Dates_RGB_Indices.csv averaged across the early growth period, for comparison across years

V - AL. Name: early.exp_Red - early.exp_VEG
   Description: RGB values and vegetation indices from All_Dates_RGB_Indices.csv averaged across the early exponential growth period, for comparison across years

AM - BB. Name: late.exp_Red - late.exp_VEG
   Description: RGB values and vegetation indices from All_Dates_RGB_Indices.csv averaged across the late exponential growth period, for comparison across years

BC - BR. Name: late_Red - late_VEG
   Description: RGB values and vegetation indices from All_Dates_RGB_Indices.csv averaged across the late growth period, for comparison across years

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: All_Vegetative_Indices_FPCA.csv
-----------------------------------------

1. Number of variables: 5

2. Number of cases/rows: 53696

3. Missing data codes:
   Code/symbol: NA  Definition: not available (missing value)

4. Variable List

A. Name: Plot
   Description: Plot assignment

B. Name: Year
   Description: Year of data

C. Name: veg.Ind
   Description: Vegetation index name

D. Name: eigen.1
   Description: Score on the first functional principal component of the vegetation index

E. Name: eigen.2
   Description: Score on the second functional principal component of the vegetation index

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: Production_yield_YYYY.csv
-----------------------------------------

1. Number of variables: 25

2. Number of cases/rows: 31781 in 2020, 27973 in 2021, and 13630 in 2022

3. Missing data codes:
   Code/symbol: NA  Definition: not available (missing value)

4. Variable List

A. Name: Longitude
   Description: Longitude of the harvest location

B. Name: Latitude
   Description: Latitude of the harvest location

C. Name: Field
   Description: Field identification number

D. Name: Dataset
   Description: Harvest identification information, in the form “harvest_1_Corn_YYYY_MM_DD_000”

E. Name: Product
   Description: Crop harvested
   Value label: “CORN”

F. Name: Obj. Id
   Description: Harvest point identifier

G. Name: Swth Wdth(ft)
   Description: Width of the area harvested for this data point, in feet

H. Name: Distance(ft)
   Description: Length of the area harvested for this data point, in feet

I. Name: Duration(s)
   Description: Meaning unknown; not used in this analysis

J. Name: Track(deg)
   Description: Heading (direction of approach) of the combine, in degrees

K. Name: Elevation(ft)
   Description: Elevation of the location harvested for this data point, in feet

L. Name: Area Count
   Description: Whether the area was being recorded
   Value labels: “On”

M. Name: Time
   Description: Date of data collection

N. Name: Y Offset(ft)
   Description: Meaning unknown; not used in this analysis

O. Name: Pass Num
   Description: Harvest pass number

P. Name: Moisture(%)
   Description: Percent grain moisture of the area harvested

Q. Name: Crop Flw(M)(lb/s)
   Description: Meaning unknown; not used in this analysis

R. Name: Speed (mph)
   Description: Speed of the combine when the data point was collected, in miles per hour

S. Name: Crop Flw(V)(bu/h)
   Description: Meaning unknown; not used in this analysis

T. Name: Yld Mass(Wet)(lb/ac)
   Description: Wet yield in pounds per acre (not adjusted for moisture)

U. Name: Yld Mass(Dry)(lb/ac)
   Description: Dry yield in pounds per acre (adjusted for moisture content)

V. Name: Yld Vol(Wet)(bu/ac)
   Description: Wet yield in bushels per acre (not adjusted for moisture)

W. Name: Yld Vol(Dry)(bu/ac)
   Description: Dry yield in bushels per acre (adjusted for moisture content)

X. Name: Prod(ac/h)
   Description: Meaning unknown; not used in this analysis

Y. Name: Date
   Description: Date of collection (MM/DD/YY)

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: MMDDYYYY_manual.csv
-----------------------------------------

1. Number of variables: 5

2. Number of cases/rows: 40 in 2020, 20 in 2021, and 20 in 2022

3. Missing data codes:
   Code/symbol: NA  Definition: not available (missing value)

4. Variable List

A. Name: Plot
   Description: Number of the plot for manual measurement

B. Name: P1
   Description: Manual measurement 1 for the plot

C. Name: P2
   Description: Manual measurement 2 for the plot

D. Name: P3
   Description: Manual measurement 3 for the plot

E. Name: P4
   Description: Manual measurement 4 for the plot

F. Name: P5
   Description: Manual measurement 5 for the plot

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: GDD_Accumulation_YYYY.csv
-----------------------------------------

1. Number of variables: 6

2. Number of cases/rows: 366 in 2020, 365 in 2021, and 365 in 2022

3. Missing data codes:
   Code/symbol: NA  Definition: not available (missing value)

4. Variable List

A. Name: No header
   Description: Row name

B. Name: Date
   Description: MMDDYYYY

C. Name: Max.Deg.F
   Description: Maximum temperature for the day (Fahrenheit)

D. Name: Min.Deg.F
   Description: Minimum temperature for the day (Fahrenheit)

E. Name: GDD
   Description: Growing degree day value for each day

F. Name: cum.GDD
   Description: Accumulated growing degree days after the planting date

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: Solar_Radiation_Data_YYYY.csv
-----------------------------------------

1. Number of variables: 2

2. Number of cases/rows: 366 in 2020, 207 in 2021, and 365 in 2022

3. Missing data codes:
   Code/symbol: NA  Definition: not available (missing value)

4. Variable List

A. Name: Date
   Description: Date (MM/DD/YY)

B. Name: Solar (cal/cm^2)
   Description: Solar radiation accumulated that day

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: Temperature_Data_Waseca_YYYY.csv
-----------------------------------------

1. Number of variables: 3

2. Number of cases/rows: 366 in 2020, 365 in 2021, and 365 in 2022

3. Missing data codes:
   Code/symbol: NA  Definition: not available (missing value)

4. Variable List

A. Name: Date
   Description: Date of data (M/DD/YY)

B. Name: Max Deg F
   Description: Maximum temperature for the day (Fahrenheit)

C. Name: Min Deg F
   Description: Minimum temperature for the day (Fahrenheit)

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: Weather_YYYY.csv
-----------------------------------------

1. Number of variables: 6

2. Number of cases/rows: 366 in 2020, 365 in 2021, and 365 in 2022

3. Missing data codes:
   Code/symbol: T  Definition: Trace (precipitation too small to measure)

4. Variable List

A. Name: Date
   Description: Date of data (M/DD/YY)

B. Name: Maximum Temperature degrees (F)
   Description: Maximum temperature for the date (Fahrenheit)

C. Name: Minimum Temperature degrees (F)
   Description: Minimum temperature for the date (Fahrenheit)

D. Name: Precipitation (inches)
   Description: Rainfall in inches

E. Name: Snow (inches)
   Description: Snowfall in inches

F. Name: Snow Depth (inches)
   Description: Accumulated snow depth in inches
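The Methods section states that daily GDUs were computed from each day's maximum and minimum temperatures and summed from the planting date, which corresponds to the GDD and cum.GDD columns of GDD_Accumulation_YYYY.csv. Because the temperature thresholds are not stated, the Python sketch below assumes the widely used 86/50 °F corn method (both temperatures clamped to 50 to 86 °F, base 50 °F) and starts accumulating on the day after planting. `parse_mdy`, `daily_gdd`, `cumulative_gdd`, and the sample temperatures are hypothetical, not the authors' code or data.

```python
"""Sketch: daily and cumulative growing degree days (GDD) for corn.

ASSUMPTIONS: the readme does not state the thresholds used for
GDD_Accumulation_YYYY.csv. This uses the common 86/50 °F corn method and
starts accumulating on the day after planting. Names are hypothetical.
"""
from datetime import date, datetime
from typing import Dict, Iterable, Tuple

BASE_F = 50.0  # lower threshold and base temperature (°F)
CAP_F = 86.0   # upper threshold (°F)


def parse_mdy(text: str) -> date:
    """Parse dates such as '5/21/20' (M/DD/YY), as in Weather_YYYY.csv."""
    return datetime.strptime(text.strip(), "%m/%d/%y").date()


def daily_gdd(tmax_f: float, tmin_f: float) -> float:
    """Clamp both temperatures to [50, 86] °F, average them, subtract the base."""
    tmax = min(max(tmax_f, BASE_F), CAP_F)
    tmin = min(max(tmin_f, BASE_F), CAP_F)
    return (tmax + tmin) / 2.0 - BASE_F


def cumulative_gdd(records: Iterable[Tuple[date, float, float]],
                   planting: date) -> Dict[date, float]:
    """Running GDD total for each date after planting, keyed by date."""
    total = 0.0
    out: Dict[date, float] = {}
    for day, tmax, tmin in sorted(records):
        if day <= planting:  # ASSUMPTION: accumulation begins the day after planting
            continue
        total += daily_gdd(tmax, tmin)
        out[day] = total
    return out


if __name__ == "__main__":
    planting = date(2020, 5, 2)  # 2020 planting date from the Methods section
    # Hypothetical daily records: (M/DD/YY date, max °F, min °F).
    rows = [("5/02/20", 70, 45), ("5/03/20", 80, 60), ("5/04/20", 90, 60)]
    records = [(parse_mdy(d), tmax, tmin) for d, tmax, tmin in rows]
    for day, gdd in cumulative_gdd(records, planting).items():
        print(day.isoformat(), gdd)
```

If these values do not reproduce the published cum.GDD column, the temperature thresholds and the accumulation start day are the first assumptions to revisit.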