Yang, Yang
2022-09-26
2022-09-26
2022-07
https://hdl.handle.net/11299/241770
University of Minnesota Ph.D. dissertation. 2022. Major: Computer Science. Advisors: Changhyun Choi, Paul Schrater. 1 computer file (PDF); 118 pages.

As robots move from factories into our daily lives, robotic manipulation for ordinary users is attracting more attention from the robotics community. Target-driven manipulation is a necessary function for robots to enter people’s everyday spaces because it enables robots to perform tasks (such as grasping a specific object) driven by user inputs. Three key challenges, however, hinder the development of target-driven robotic manipulation: (1) ambiguity caused by a mismatch between human referring and robot understanding; (2) clutter formed by a target object and its surrounding objects; (3) domain shift, the change in data distribution between training and deployment environments. In this thesis, we address these challenges by equipping target-driven robotic manipulation with visual attribute reasoning. People recognize and grasp a target object in daily scenes by remembering the critical properties of the target. Visual attribute reasoning, or the ability to perceive and reason about essential attributes of a target item, enables humans to understand the target object and its surroundings, as well as plan their actions accordingly. In this thesis, we present the categorization of visual attributes of objects and their crucial functions in robotic manipulation. We develop robotic manipulation systems that integrate object attributes in the form of appearances, spatial locations, and local relations. Integrating these object attributes allows the robotic systems to accomplish target-driven tasks in challenging and unconstrained environments. As a result, our research advances target-driven robotic manipulation, particularly in terms of clutter handling, model generalization, and human-robot interaction.
Our long-term goal is to develop intelligent robots that connect human users with their surrounding environments. Future robots are predicted to interact with humans and accomplish complex target-driven tasks that benefit users. This thesis makes a step towards the goal by leveraging visual attribute reasoning. We present robotic grasping systems that can locate an invisible target occluded in clutter, grasp a never-seen object based on appearance attributes, and disambiguate unclear commands guided by object attributes.

en
Deep Learning in Grasping and Manipulation
Machine Learning
Robotic Manipulation
Target-Driven Robotic Manipulation
Target-Driven Robotic Manipulation with Visual Attribute Reasoning
Thesis or Dissertation