3D Computer Vision Algorithms for Semantic Reconstruction of Agricultural Environments

Thesis or Dissertation


Vision sensors mounted on mobile robotic platforms hold great promise for automated agricultural management. However, established computer vision techniques often perform poorly in agricultural environments because of the environmental complexity, which makes automation difficult. To address this problem, we have designed and developed three-dimensional (3D) computer vision algorithms that improve the accuracy of imaging devices, suppress undesirable environmental interference, and generate accurate and precise 3D models of plants, with detailed information automatically extracted for farmers. This dissertation is roughly divided into three main parts. In the first part, we study the problem of extrinsic calibration of a 2D laser rangefinder and a camera. We present a novel method for extrinsically calibrating a camera and a 2D laser rangefinder whose beams are invisible in the camera image. We show that the point-to-plane constraints from a single observation of a V-shaped calibration pattern composed of two non-coplanar triangles suffice to uniquely constrain the relative pose between the two sensors. We propose an approach to obtain analytical solutions using point-to-plane constraints from single or multiple observations. Along the way, we also show that previous solutions, in contrast to our method, have inherent ambiguities and therefore must rely on a good initial estimate from a large number of observations. In the second part, we study the problem of building coherent 3D reconstructions of orchard rows to improve the accuracy of measuring semantic traits for phenotyping and to automate such measurements. Even though 3D reconstructions of side views can be obtained using standard mapping techniques, merging the two side views is difficult due to the lack of overlap between the two partial reconstructions.
We propose a novel method that uses global features and semantic information to obtain an initial solution aligning the two sides. Our merging technique then refines the 3D model of the entire tree row by integrating semantic information that is common to both sides and extracted using our novel robust detection and fitting algorithms. The proposed vision system automatically measures the semantic traits (i.e., canopy volume, trunk diameter, tree height, and fruit count) of the optimized 3D model, which is built from RGB or RGB-D data in real orchard environments. In the third part, we study two problems of suppressing undesirable environmental interference during sensing and mapping. In the first problem, we present a novel method to estimate the linear velocity of an unmanned aerial vehicle (UAV) from a downward-facing stereo camera, even in the presence of disorderly motion of image features. In the second problem, we study how to detect and localize each elliptical object in clustered and occluded scenarios, such as fruit clusters in trees. We propose the first convolutional neural network (CNN)-based ellipse detector, called Ellipse R-CNN, to represent and infer occluded objects as ellipses. We first design a robust and compact ellipse regression that is able to infer the parameters of multiple elliptical objects even when they are occluded by neighboring objects. For better occlusion handling, we exploit refined feature regions for the regression stage and integrate an encoder-decoder structure to learn different occlusion patterns. To further boost the accuracy of 3D object estimation, we propose a novel ellipse regression loss to learn the uncertainties of the regressed parameters and predict the geometric quality of each detection in 2D. These multi-view detections and geometric uncertainties are integrated into our probabilistic framework to accurately localize the enclosing ellipsoid of each occluded object in 3D.
This dissertation makes progress towards automated agricultural practices by building 3D semantic maps of farmlands, crop fields, and orchards, and advances the state of the art in automation techniques for precision agriculture. We also demonstrate the feasibility and applicability of our methods through system implementation and with results from synthetic and extensive real experiments.
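
The point-to-plane constraint mentioned in the abstract can be illustrated with a small sketch. This is a generic textbook formulation, not the dissertation's calibration method: a laser point p, once mapped into the camera frame by a candidate pose (R, t), should lie on a known plane n . x = d of the calibration target. The function name, the frame conventions, and the toy numbers below are all assumptions made for illustration.

```python
def point_to_plane_residual(R, t, p_laser, n, d):
    """Signed distance of a transformed laser point from the plane n . x = d.

    R (3x3 nested list) and t (length 3) are a candidate laser-to-camera pose;
    n is assumed to be a unit normal expressed in the camera frame.
    """
    # Map the laser point into the camera frame: q = R p + t.
    q = [sum(R[i][j] * p_laser[j] for j in range(3)) + t[i] for i in range(3)]
    # A correct pose places q on the plane, so the residual is zero.
    return sum(n[i] * q[i] for i in range(3)) - d


# Hypothetical toy setup: the target plane is z = 2 in the camera frame.
n, d = [0.0, 0.0, 1.0], 2.0
# A 2D laser measures points in its own scanning plane (z = 0 here).
p = [1.0, 2.0, 0.0]

# Assumed "true" pose: a 90-degree rotation about the x axis, no translation.
R_true = [[1.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
t_true = [0.0, 0.0, 0.0]

# A wrong guess: identity rotation.
R_bad = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

residual_true = point_to_plane_residual(R_true, t_true, p, n, d)
residual_bad = point_to_plane_residual(R_bad, t_true, p, n, d)
```

In this toy setup the correct pose gives a zero residual and the wrong one does not. A calibration procedure would combine many such residuals, one per laser point and target plane, to constrain the six unknowns of the relative pose.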



University of Minnesota Ph.D. dissertation. June 2020. Major: Computer Science. Advisor: Volkan Isler. 1 computer file (PDF); xxv, 168 pages.

Suggested citation

Dong, Wenbo. (2020). 3D Computer Vision Algorithms for Semantic Reconstruction of Agricultural Environments. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/224609.
