Target-Driven Robotic Manipulation with Visual Attribute Reasoning







Thesis or Dissertation


As robots move from factories into our daily lives, robotic manipulation for ordinary users is attracting more attention from the robotics community. Target-driven manipulation is a necessary function for robots entering people’s everyday spaces because it enables them to perform tasks, such as grasping a specific object, in response to user inputs. Three key challenges, however, hinder the development of target-driven robotic manipulation: (1) ambiguity caused by a mismatch between how humans refer to objects and how robots interpret those references; (2) clutter formed by a target object and its surrounding objects; and (3) domain shift, that is, the change in data distribution between training and deployment environments. In this thesis, we address these challenges by equipping target-driven robotic manipulation with visual attribute reasoning. People recognize and grasp a target object in daily scenes by remembering the critical properties of the target. Visual attribute reasoning, the ability to perceive and reason about the essential attributes of a target item, enables humans to understand the target object and its surroundings and to plan their actions accordingly. We present a categorization of the visual attributes of objects and their crucial functions in robotic manipulation. We develop robotic manipulation systems that integrate object attributes in the form of appearances, spatial locations, and local relations. Because they integrate these object attributes, the robotic systems can accomplish target-driven tasks in challenging and unconstrained environments. As a result, our research advances target-driven robotic manipulation in terms of clutter handling, model generalization, and human-robot interaction. Our long-term goal is to develop intelligent robots that connect human users with their surrounding environments. Future robots are expected to interact with humans and accomplish complex target-driven tasks that benefit users. This thesis takes a step toward that goal by leveraging visual attribute reasoning. We present robotic grasping systems that can locate an invisible target occluded in clutter, grasp a never-seen object based on appearance attributes, and disambiguate unclear commands using object attributes.


University of Minnesota Ph.D. dissertation. 2022. Major: Computer Science. Advisors: Changhyun Choi, Paul Schrater. 1 computer file (PDF); 118 pages.


Suggested citation

Yang, Yang. (2022). Target-Driven Robotic Manipulation with Visual Attribute Reasoning. Retrieved from the University Digital Conservancy,
