Browsing by Subject "Human Vision"
Now showing 1 - 2 of 2
Item: Bridging Visual Perception and Reasoning: A Visual Attention Perspective (2023-06), Chen, Shi

One of the fundamental goals of Artificial Intelligence (AI) is to develop visual systems that can reason about the complexity of the world. Advances in machine learning have revolutionized many fields in computer vision, achieving human-level performance on several benchmark tasks and in industrial applications. While the performance gap between machines and humans appears to be closing, recent debates on the discrepancies between machine and human intelligence have also received considerable attention. Studies argue that existing vision models tend to use tactics different from those of human perception and are vulnerable to even small shifts in visual domain. Evidence also suggests that they commonly exploit statistical priors instead of genuinely reasoning about their visual observations, and have yet to develop the capability to overcome issues resulting from spurious data biases. These contradictory observations strike at the very heart of AI research and raise the question: how can AI systems understand the comprehensive range of visual concepts and reason with them to accomplish various real-life tasks, as we do on a daily basis?

Humans learn much from little. With just a few relevant experiences, we are able to adapt to different situations. We also take advantage of inductive biases that generalize easily, and avoid distraction from all kinds of statistical biases. This innate generalizability results not only from our profound understanding of the world but also from the ways we perceive and reason with visual information. For instance, unlike machines that develop holistic understanding by scanning the whole visual scene, humans prioritize their attention with a sequence of eye fixations. Guided by visual stimuli and a structured reasoning process, we progressively locate regions of interest and understand their semantic relationships as well as their connections to the overall task. Despite the lack of a comprehensive understanding of human vision, research on humans' visual behavior can provide abundant insights into the development of vision models and has the potential to contribute to AI systems that are practical for real-world scenarios.

With the overarching goal of building visual systems with human-like reasoning capability, we focus on understanding and enhancing the integration between visual perception and reasoning. We leverage visual attention as an interface for studying how humans and machines prioritize their focus when reasoning with diverse visual scenes. We tackle the challenges from three distinct perspectives. From the visual perception perspective, we study the relationship between the accuracy of attention and performance on visual understanding. From the reasoning perspective, we examine the connections between reasoning and visual perception, and study the roles of attention throughout the continuous decision-making process. Finally, because humans not only capture and reason about important information with high accuracy but can also justify their rationales with supporting evidence, from the explainability perspective we explore the use of multi-modal explanations for justifying the rationales behind models' decisions.
Our efforts provide an extensive collection of observations for demystifying the integration between perception and reasoning and, more importantly, offer insights into the development of trustworthy AI systems with the help of human vision.

Item: Measurements of Malleable Visual Mechanisms Through High-resolution fMRI and Perceptual Learning (2023-05), Navarro, Karen

For many decades, low-level properties of the visual system, such as orientation selectivity, were considered stable. However, advances in methodologies and theoretical frameworks have challenged this belief; we now know that external and internal factors can influence many low-level visual properties. I conducted three separate studies that examined different visual properties using both behavioral and high-resolution neuroimaging methods, focusing on mechanisms in the primary visual cortex.

The first study explored laminar profiles of the magnocellular and parvocellular pathways in human V1. This neuroimaging study used achromatic checkerboards with low spatial frequency and high temporal frequency to target the color-insensitive magnocellular pathway, and chromatic checkerboards with higher spatial frequency and low temporal frequency to target the color-selective parvocellular pathway of V1. This work yielded three main findings. First, responses driven by chromatic stimuli had a laminar profile biased towards the superficial layers of V1 compared with responses driven by achromatic stimuli. Second, we found a stronger preference for chromatic stimuli in parafoveal V1 than in peripheral V1. Finally, we found alternating stimulus-selective bands extending from the V1 border into V2 and V3.

The second study explored the orientation dependence of neural activity in human V1. This study measured responses to stimuli at different orientations to capture the orientation-tuning properties of V1 both across cortical space and through cortical depth. This work yielded two main findings. First, we validated previous work showing that orientation preference can be predicted by retinotopic location (i.e., radial bias). Second, we observed weak orientation selectivity across all cortical depths and attribute this finding to the fact that fMRI responses reflect neural activity averaged over a finite volume of cortex.

The last study explored whether the temporal dynamics of sensory eye dominance could be altered using a perceptual learning technique. This study used orthogonal gratings during binocular rivalry to influence the temporal dynamics of sensory eye dominance through repeated training. Participants completed 12 days of a task designed to increase the representation of one stimulus over the other, and temporal dynamics before and after training were compared. We found an increase in the total time participants spent seeing the grating from the trained eye and concluded that temporal dynamics can be changed through perceptual learning. However, this effect was relatively weak and varied in strength across participants.