Hierarchical visual processing of stimuli with varying complexity

A DISSERTATION SUBMITTED TO THE FACULTY OF THE UNIVERSITY OF MINNESOTA BY

Yijun Ge

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

Advisor: Dr. Sheng He
Co-advisor: Dr. Daniel Kersten

June 2021

© Yijun Ge 2021 ALL RIGHTS RESERVED

Acknowledgements

Foremost, I would like to express my sincere gratitude to my primary advisor Dr. Sheng He and co-advisor Dr. Daniel Kersten for all their guidance and support. I have learned so much from Sheng throughout the years. He has inspired me to pursue my academic goals and sharpen my thinking, and has encouraged me to explore the unknown. I am also very honored to have had the opportunity to work with Daniel. He has unique insights and a great passion for scientific research and has provided me with enormous support and insightful advice during the past two years. I could not have imagined having better advisors for my Ph.D. study.

Besides my advisors, I am very grateful to my other committee members, Dr. Stephen Engel and Dr. Yang Zhang, for their helpful feedback and insightful comments.

My sincere thanks also go to Hongru Zhu (Johns Hopkins University) and Dr. Alexander Bratch, who made great contributions to the project in Chapter 4. I would not have been able to complete this study without their help and assistance.

I would like to thank all my colleagues and friends Chen Chen, Dr. Chencan Qian, Dr. Ruyuan Zhang, Dr. Quan Lei, Dr. Nilsu Atilgan, Dr. Yingzi Xiong, Yanjun Li, Dr. Juraj Mesik, Walter Wu, Xinyu Liu and Dr. Hao Zhou for all their valuable help and collaboration.

Finally, my deep and sincere gratitude goes to my parents and my wife, Zhouyuan Sun, for their unconditional love and continuous encouragement. Without their support behind me, I would never have reached where I am today.
Abstract

Our visual system samples external information, adjusts its sensitivity and constructs a stable representation of the world that allows us to perceive and interact with objects in our environments. Visual information with different levels of complexity is processed through the hierarchically organized visual cortical areas. This dissertation presents three studies exploring the neural processing of visual stimuli at different cortical levels associated with feedforward and feedback processes. Study 1 investigates whether cortical neurons adjust their sensitivity based on stimulus-driven feedforward or perception-related feedback signals when they are discrepant, using psychophysical and neuroimaging techniques. We found that feedback signals associated with perception dominantly contribute to neural sensitivity control. Study 2 explores the properties of viewpoint-independent spatiotopic reference frame transformation for simple and complex visual stimuli using a trans-saccade adaptation paradigm. The results showed that both simple (orientation) and complex (face gender) visual features could be transformed into the viewpoint-independent spatiotopic reference frame, even in the absence of visual awareness of the target objects. Study 3 examines the viewpoint-dependent and viewpoint-independent neural representation of a complex stimulus feature (human pose information in natural images) in distinct dimensions (2D vs. 3D) using representational similarity analysis of 7T-fMRI data. The results revealed a distributed neural representation encoding different aspects of human pose features, with the 3D viewpoint-independent pose information captured at the posterior superior temporal sulcus, and body viewpoint information mainly encoded near the extrastriate visual cortex.
Together, these studies help us to understand the importance of feedback signals in cortical sensitivity control, the awareness-independent transformation of visual objects from a retinotopic to a spatiotopic reference frame, and the distributed representation of body pose features in the visual cortical hierarchy.

Table of Contents

List of Tables
List of Figures
Chapter 1. Overview
Chapter 2. Adaptation to feedback representation of illusory orientation produced from flash grab effect
    Introduction
    Results
    Discussion
    Methods
Chapter 3. Spatiotopic updating across saccades in the absence of awareness
    Introduction
    Methods
    Results
    Discussion
Chapter 4. Neural representation of human pose information in natural images
    Introduction
    Results
    Discussion
    Conclusion
    Methods
Bibliography
Appendix 1. Supplemental Information for Chapter 2
Appendix 2. Supplemental Information for Chapter 3
Appendix 3. Supplemental Information for Chapter 4

List of Tables

Tables A3.1 to A3.4

List of Figures

Figures 2.1 to 2.6
Figures 3.1 to 3.3
Figures 4.1 to 4.5
Figures A1.1 to A1.3
Figures A2.1 to A2.4
Figure A3.1

Chapter 1. Overview

The human visual system obtains rich and dynamic information from our environment that enables our perception and supports our actions. The visual cortical areas are hierarchically organized both anatomically and functionally, from lower-level cortical areas (such as primary visual cortex (V1), specialized for simple stimulus features like orientation and spatial frequency), to intermediate-level areas (such as V4, tuned to shape and form) (Nandy, Sharpee, Reynolds, & Mitchell, 2013) and higher-level cortical areas (including inferotemporal (IT) cortex, which is sensitive to complex stimulus features like faces and bodies).
Across hierarchical cortical areas, visual information processing involves feedforward (bottom-up) and feedback (top-down) connections. Feedforward visual cortical processing begins in V1, which receives subcortical input from the lateral geniculate nucleus (LGN), and ascends through a ventral pathway into the temporal lobe (‘what/perception pathway’, associated with object recognition) and through a dorsal pathway into the parietal and prefrontal cortex (‘where/action pathway’, associated with spatial locations, visually guided actions, and attentional control). On the other hand, the reciprocal feedback connections carry information about top-down predictions (Kveraga, Ghuman, & Bar, 2007), influences of attention (Noudoost, Chang, Steinmetz, & Moore, 2010), awareness (Ro, Breitmeyer, Burton, Singhal, & Lane, 2003) and behavioral context (Gilbert & Li, 2013).

Although accumulating evidence indicates that feedforward and feedback processes play important roles in visual processing, how they interact with each other in supporting our visual perception remains unclear. It is also unclear how our brain constructs a stable and viewpoint-independent representation of objects in our environment. The neural representation of different visual features (from simple to complex) also requires critical attention.

This dissertation project investigated three (among many) impressive feats achieved by the visual system. First, a ubiquitous feature of the sensory nervous system is its ability to adapt to the state of the environment. We asked whether the feedforward- or feedback-driven representation determines the outcome of cortical neuronal adaptation when they are discrepant. Second, we have a stable representation of the visual world, despite the constant motion of our eyes and body.
We studied whether orientation and face information could be transformed into a viewpoint-independent reference frame and whether visual awareness is a prerequisite for this reference frame transformation. Third, as social animals, humans need to quickly estimate the poses of others around us. We performed representational similarity analysis using natural scene stimuli to delineate the viewpoint-dependent and viewpoint-independent neural representation of human pose information in two- and three-dimensional space.

In general, the feedforward signal, which more directly represents the sensory input, is consistent with the feedback signal, which is more tightly linked with the perceptual representation of the stimulus. However, the feedforward- and feedback-driven representations of the stimulus can sometimes be dissociated from each other. To study the relative contributions of feedforward and feedback signals to various aspects of cortical neural processing, we need tools and paradigms to probe and measure the corresponding neural responses. In the project reported in Chapter 2, we addressed the question of whether cortical neurons adjust their sensitivity based on stimulus-driven feedforward or perception-related feedback signals. More specifically, we adopted the orientation adaptation paradigm to investigate whether adaptation would be based on the original retinal or the perceived stimulus orientation. A visual illusion, the flash-grab effect (FGE), was used to dissociate the perceived and retinal orientation of the adapting stimulus. Results showed that orientation adaptation is exclusively dependent on the perceived rather than the retinal orientation of the adaptor. The combined fMRI and EEG results also indicated that the perceived orientation of the FGE is indeed supported by feedback signals in the visual cortex.
With rich visual inputs, our brain builds representations of the external world which allow us to navigate through and interact with our environment. Despite continuous and frequent eye movements (up to three times per second), our perceptual representation of the visual world remains stable. Given the retinotopic (coordinates centered on the retina) representation in the early visual cortex, the neural representation of visual objects in their environment needs to be transformed into a viewpoint-independent spatiotopic (coordinates centered on the outside world) reference frame. In Chapter 3, we report a project investigating the properties of spatiotopic reference frame transformation for simple (orientation) and complex (face gender) visual stimuli using a trans-saccade adaptation paradigm. Results showed that both orientation and face gender adaptation occurred at the same spatiotopic location (but a different retinotopic location). We further asked whether the reference frame transformation requires awareness of the target object. Interestingly, when the adapting stimuli were rendered invisible by continuous flash suppression (CFS), both tilt and face gender aftereffects could still be observed at the spatiotopic location. Thus, our results indicated that visual awareness of objects is not a prerequisite for their transformation to the spatiotopic reference frame.

Understanding visual processing eventually amounts to understanding the processing of daily visual scenes. In the past, the majority of vision research relied on simplified artificial laboratory stimuli, based on the assumption that neural processing of visual stimuli could be understood from the responses to simple constituent patterns of stimuli (Nelken, 2004). But recent studies showed that the responses to natural visual scenes might not simply be described by the combination of responses to simplified stimuli (Hasson & Honey, 2012).
Using a large set of natural scene stimuli also has an advantage in studying complex and high-level visual information (like human pose), compared to using a limited set of simplified stimuli. Understanding human pose information is crucial for understanding other people’s actions, emotions, and social interactions, but is also challenging because of high variations between body parts, and appearance changes due to occlusion, viewpoint, and lighting. In Chapter 4, we investigate the neural representation of viewpoint-dependent and viewpoint-independent human pose information in two- and three-dimensional spaces using representational similarity analysis with the 7T-fMRI Natural Scenes Dataset (NSD). The results showed that the posterior superior temporal sulcus (pSTS) and supramarginal gyrus specifically encode the 3D viewpoint-independent pose information. We also found explicit encodings of body viewpoint information mainly near the extrastriate visual cortex.

To summarize, the experimental projects reported in this thesis addressed questions related to neural representations of visual stimuli at different cortical levels and associated with feedforward and feedback processes. Three major conclusions emerge from this thesis: 1) When the perceptual representation of a stimulus is dissociated from the retinal representation, cortical neurons recalibrate their sensitivity primarily based on the feedback signals associated with perception; 2) Both simple (orientation) and complex (face gender) visual features could be transformed from a retinotopic to a spatiotopic reference frame, even in the absence of visual awareness of the target objects; 3) Distributed neural representations encode the different aspects of human pose information (including 2D/3D viewpoint-dependent and viewpoint-independent).
Collectively, these results shed light on how feedforward and feedback visual processing contribute to neural sensitivity control, facilitate the interpretation of visual scenes, and enable the construction of a stable object representation in our environment.

Chapter 2. Adaptation to the feedback representation of illusory orientation produced from flash grab effect

Adaptation is a ubiquitous property of sensory systems. It is typically considered that neurons adapt to the dominant energy in the ambient environment to function optimally. However, the perceptual representation of the stimulus, often modulated by feedback signals, sometimes does not correspond to the input state of the stimulus, which tends to be more linked with feedforward signals. Here we investigated the relative contributions to cortical adaptation from feedforward and feedback signals, taking advantage of a visual illusion, the Flash-Grab Effect, to dissociate the feedforward and feedback representations of an adaptor. Results reveal that orientation adaptation is exclusively dependent on the perceived rather than the retinal orientation of the adaptor. Combined fMRI and EEG measurements demonstrate that the perceived orientation of the Flash-Grab Effect is indeed supported by feedback signals in the cortex. These findings highlight the important contribution of feedback signals for cortical neurons to recalibrate their sensitivity.

This chapter is a reproduction of Ge, Y., Zhou, H., Qian, C., Zhang, P., Wang, L., & He, S. (2020). Adaptation to feedback representation of illusory orientation produced from flash grab effect. Nature Communications, 11(1), 1-12.

INTRODUCTION

Though adaptation is typically considered to be neurons adjusting their sensitivity to accommodate the state of the “world” (Colin W.G. Clifford & Rhodes, 2005; Schwartz, Hsu, & Dayan, 2007), it is necessarily the case that the state of the “world” is reflected in neural representations.
However, neural processing involves both feedforward and feedback signals, typically with the feedforward signal more directly representing the proximal stimulus (Pizlo, 2001), while the feedback signal, influenced by spatiotemporal contextual factors, leads to the perceptual representation of the distal stimulus. In sensory information processing, contextual modulation and feedforward-feedback interactions are very common (Albright & Stoner, 2002; Gilbert & Li, 2013; Lamme & Roelfsema, 2000). An important unresolved question is whether the feedforward- or feedback-driven representation determines the outcome of cortical neuronal adaptation, especially when they are discrepant.

To address this question, it is necessary to dissociate the input feedforward signals from cortical feedback signals in the brain. A recently discovered visual illusion, the Flash-Grab Effect (FGE) (Cavanagh & Anstis, 2013), provides such an opportunity. The FGE occurs when a bar is briefly flashed on the light-dark boundary of a sectored background moving back and forth, at the time-point of direction reversal of the background motion. The “flashed” bar could be perceived as tilted by more than 10 degrees away from its original orientation, that is, the orientation that would be perceived without the moving background inducer (Cavanagh & Anstis, 2013).

Since the FGE can alter perceived orientation, an orientation-specific adaptation paradigm was adopted to investigate whether adaptation would be based on the original retinal or the perceived orientation. The tilt-aftereffect (TAE) is a robust visual phenomenon that results from orientation-selective adaptation of visual neurons (Jin, Dragoi, Sur, & Seung, 2005). After prolonged exposure to an adaptor slightly tilted from vertical, a vertical test is perceived as tilted away from the adapting orientation (Gibson & Radner, 1937).
The underlying mechanism of this aftereffect was thought to be that cortical orientation-selective neurons in the visual system adjust or recalibrate their sensitivity based on the prevalent orientation and contrast of incoming signals, often in a population coding context, and with the goal of achieving more efficient coding (Benucci, Saleem, & Carandini, 2013; Blakemore & Tobin, 1972; C. W.G. Clifford, Wenderoth, & Spehar, 2000; Colin W.G. Clifford, 2014; Colin W.G. Clifford & Rhodes, 2005; Fang, Murray, Kersten, & He, 2005; Forte & Clifford, 2005; Jin et al., 2005; Liu, Larsson, & Carrasco, 2007; Schwartz et al., 2007; Thompson & Burr, 2009).

Testing the tilt aftereffect with the flash grab effect will inform us about the relative contributions to orientation adaptation from the input retinal orientation and the contextually modulated perceived orientation. However, for our goal, we would also need to establish a close link between the perceived orientation of FGE and the feedback signals. While previous neuroimaging experiments showed that the perceived orientation in the FGE could be decoded in the retinotopic cortex (Kohler, Cavanagh, & Tse, 2017), it remains unclear how the neural signals dynamically support the perceived orientation of the flashed bar (Hogendoorn, Verstraten, & Cavanagh, 2015). Thus, we performed high spatial and temporal resolution human brain imaging experiments to delineate the dynamic contribution of feedforward and feedback signals to the perceived orientation in FGE. As shown in the results section, we obtained strong evidence that the perceived orientation in FGE was indeed supported by feedback signals. With this link established, a demonstration of a tilt-aftereffect from the perceived orientation would indicate that the feedback signals dominate cortical adaptation.

In the following sections, we first present behavioral data showing that perceived orientation dominates the tilt-aftereffect.
Then, we show results from high spatial-temporal resolution measurements of the cortical representation of the perceived orientation in FGE. The time-resolved EEG data and layer-resolved fMRI data provide clear evidence that the perceived tilt in FGE is driven by late-onset feedback signals, primarily targeting the superficial layers of the retinotopic cortex. These results together strongly suggest that perceived orientation in FGE is supported by feedback signals in the early visual cortex, which dominate orientation-selective adaptation in spite of the available feedforward signal corresponding to the original orientation of the flashed bar stimulus on the retina.

RESULTS

TAE depends on the perceived rather than retinal input orientation

In two psychophysics experiments, we investigated the relative contribution of the perceived vs. retinal orientation of FGE to the tilt-aftereffect. In the first experiment, the adapting bars, vertical at the retinal level, were perceived as tilted away from the vertical orientation; in the second experiment, the adapting bars, tilted at the retinal level, were perceived as vertical. In both experiments, the testing bars were presented around the vertical orientation.

Subjects viewed a pair of vertical bars that were repeatedly and briefly flashed on top of two patterned disks that oscillated clockwise and counter-clockwise, with the flashed bars presented at the moment of the rotation reversals. The adapting bars, which would be perceived as vertical if presented without the moving background inducer, were perceived as tilted away from vertical due to the FGE (Figure 2.1a, and left column of Figure 2.1b). On each trial of the main experimental condition, subjects were presented with 11 flashes (10.6 s) of adaptation, followed by 33.3 ms of blank screen, then the test bars for 33.3 ms (Figure 2.1a). Subjects were asked to judge whether the two test bars converged upward or downward using a two-alternative forced choice (2AFC) method.
Three control adaptation conditions were also included in the experiment: (a) the vertical flashed bars only, without the rotating background disks; (b) the rotating background disks only; (c) tilted (5.71 degrees from vertical) flashed bars only, without the rotating background disks. The four conditions were presented in separate blocks.

Figure 2.1. Psychophysics Stimuli and Results. (a) Stimulus presentation sequence for the TAE measurement. (b) The stimuli used in the two orientation adaptation conditions and demonstrations of subjects’ illusory perception. Left part: original vertical but perceived tilt with flash grab inducer (the tilt angle is 15.55 degrees on average (SD = 7.54)); Right part: original tilt but perceived vertical with flash grab inducer. (c) Fitted psychometric functions using logistic regression in experiment 1; (d) Averaged TAE sizes across subjects (n=8) for conventional tilted bars condition (p<0.001 Holm-corrected) and flash grab condition (p=0.010 Holm-corrected) in experiment 1 (two-sided paired t-test, p=0.013, d=1.17); (e) Fitted psychometric functions using logistic regression in experiment 2; (f) Averaged TAE sizes across subjects (n=8) for conventional tilted bars condition (p<0.001 Holm-corrected) and flash grab condition (p=0.730 Holm-corrected) in experiment 2 (two-sided paired t-test, p<0.001, d=2.80). Black curve: vertical bars only condition; grey curve: rotating background only condition; green curves/bars: flashed tilted bars only condition (conventional tilt adaptation); orange curves/bars: flash grab illusion condition. Error bars indicate standard errors of the mean (n=8 individual subjects). Source data are provided as a Source Data file.

Results show that a significant TAE was generated by perceptually tilted bars: both the tilted bars without moving background and the FGE-induced tilted bars. Figure 2.1c shows the psychometric functions for adaptation to the FGE and the other three control conditions.
Not surprisingly, there was no TAE in either the no-background, vertical-bar-only condition (p = 0.956) or the background-only condition (p = 0.534). The strength of the TAE could be measured as half the difference on the x-axis between the two points of subjective equality (PSEs) following adaptation in two opposite orientations, i.e., the distance between the two green or two orange fitted curves (Equation (1)). Figure 2.1d plots the magnitude of the TAE from the flashed tilted bar adaptors (conventional TAE) and the TAE from the FGE condition. As expected, the conventional tilt adaptation condition generated a strong TAE (M = 3.72 deg, SD = 0.97, t(7) = 10.80, Holm-corrected p < 0.001, d = 3.84). The key result here is that a significant TAE was observed in the flash grab condition (M = 1.67 deg, SD = 1.34, t(7) = 3.53, p = 0.010 corrected, d = 1.25), though it was weaker than the conventional TAE (two-sided paired sample t-test, t(7) = 3.31, p = 0.013, d = 1.17).

The first experiment demonstrates that perceived tilted orientation could induce a TAE even though the input retinal orientation was vertical. Does the input orientation contribute to the TAE separately from the perceived orientation? To address this question, we tested subjects who adapted to bars that had a tilted input orientation but were perceptually vertical due to FGE (Figure 2.1b, right panel). At the beginning of this experiment, each individual subject adjusted the orientation of the flashed bars in the FGE condition so that the bars were perceived as vertical. The adjusted retinal orientation was then set as the input orientation of the adapting condition under FGE. Similar to the first experiment, we also included two control conditions: the vertical bars only condition and the tilted bars only condition. Figure 2.1e/f shows the results of this experiment.
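The TAE measure defined above (half the distance between the two PSEs obtained after adapting to opposite orientations) can be illustrated with a minimal sketch. It assumes that each psychometric function is a two-parameter logistic in test orientation, so that the PSE is the orientation at which the fitted probability equals 0.5. Equation (1) is not reproduced in this section, so the sign convention, the function names `pse_from_logistic` and `tae_magnitude`, and the coefficient values are hypothetical illustrations rather than the analysis code used in the study.

```python
def pse_from_logistic(intercept, slope):
    """PSE of p(x) = 1 / (1 + exp(-(intercept + slope * x))).

    The probability equals 0.5 where intercept + slope * x = 0.
    """
    if slope == 0:
        raise ValueError("slope must be non-zero to define a PSE")
    return -intercept / slope


def tae_magnitude(pse_adapt_a, pse_adapt_b):
    """Half the distance along the x-axis between the two PSEs (assumed sign convention)."""
    return (pse_adapt_a - pse_adapt_b) / 2.0


# Made-up logistic coefficients for two opposite adaptor orientations (not study data).
pse_cw = pse_from_logistic(intercept=-3.0, slope=1.5)
pse_ccw = pse_from_logistic(intercept=3.0, slope=1.5)
tae = tae_magnitude(pse_cw, pse_ccw)
```

With these made-up coefficients the two PSEs sit symmetrically around vertical, so the TAE equals the shift of either curve from zero.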
The two control conditions generated results as expected: without the moving background inducer, the vertical bars by themselves did not generate the TAE (p = 0.367), and the tilted bars generated a very robust TAE (M = 6.02 deg, SD = 1.69, t(7) = 10.08, p < 0.001 corrected, d = 3.56). However, with the moving background inducer, the key result is that the originally tilted but perceptually vertical bars (due to FGE) generated no measurable TAE (M = -0.43 deg, SD = 3.36, t(7) = -0.36, p = 0.730 corrected), which is significantly weaker than the conventional TAE (two-sided paired sample t-test, t(7) = 7.91, p < 0.001, d = 2.80) as shown in Figure 2.1f.

Results from the two psychophysics experiments clearly show that, when the adaptor’s perceived orientation is dissociated from its input orientation, the TAE is induced by the perceived rather than the input orientation itself. In other words, orientation-selective adaptation seems to be primarily based on the eventual perceptual representation of the stimuli rather than simply on the neural representation directly linked to the input signals. To further understand the contribution of feedforward and feedback signals to FGE and in turn to orientation-selective adaptation, we conducted fMRI and ERP studies investigating the spatial and temporal neural correlates of the FGE.

Representation of FGE in the retinotopic visual cortex

We investigated the neural representation of the FGE in retinotopic visual areas in two fMRI experiments. The first experiment was conducted on a 3T scanner, with a focus on the retinotopic representation of the flashed bar under FGE. The second experiment was performed at high spatial resolution on a 7T scanner, which allowed us to obtain layer-resolved response signals to FGE in the retinotopic visual cortex.
With known biases of feedforward and feedback signals in different cortical layers, the 7T data could inform us about the relationship between feedback signals and perceptual representation.

In the 3T fMRI experiment, we obtained the BOLD signal activated by the flashed bar in the FGE with block-designed fMRI scans (Figure 2.2a shows the stimuli and procedure of the experiment). Subjects’ retinotopic maps were also obtained using the standard rotating wedge and expanding/contracting ring stimuli (Engel, Glover, & Wandell, 1997) in two separate scans. The retinotopic map provides, for each voxel in the early visual cortex, the polar angle coordinate of its population receptive field. fMRI responses to the flashed bar for voxels with the same polar angle preference were averaged and used as the radial coordinate, plotted as a function of polar angle across the visual field (Figure 2.2b). From V1 to V3, the fMRI response to the clockwise FGE was stronger in the upper right and lower left quadrants of the visual field in comparison with the counter-clockwise illusion, which showed stronger responses in the upper left and lower right quadrants. Therefore, the retinotopic representation of FGE in the early visual cortex is qualitatively consistent with the perceived tilt of the flashed bar.

We further estimated the angular difference between the two polar angle representations of fMRI signals in the visual cortex. Note that the angular difference represents the summed effect of clockwise and counter-clockwise tilts. The estimated angular difference was smaller in V1 (17 and 13 degrees for upper and lower visual field, respectively) compared to V2 (41 and 27 degrees) and V3 (36 and 37 degrees) (Figure 2.2b). One-way ANOVA showed that the illusory effect significantly varied across visual cortical areas (F(2, 16) = 22.24, p < 0.001, $\eta_p^2 = 0.735$).
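As a reading aid for the statistics in this chapter, the sketch below recovers effect sizes from the reported test statistics. It rests on two assumptions that are not stated in this section: that the effect size given with each ANOVA is partial eta squared, computed from the F value and its degrees of freedom, and that Cohen's d for the paired and one-sample t-tests equals t divided by the square root of the number of subjects. The function names are hypothetical.

```python
import math


def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic: F*df1 / (F*df1 + df2)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def cohens_d_paired(t_value, n_subjects):
    """Cohen's d for a paired or one-sample t-test, assuming d = t / sqrt(n)."""
    return t_value / math.sqrt(n_subjects)


# 3T experiment: F(2, 16) = 22.24 across V1, V2 and V3.
eta_3t = partial_eta_squared(22.24, 2, 16)

# Experiment 1, flash grab condition: t(7) = 3.53 with n = 8 subjects.
d_flash_grab = cohens_d_paired(3.53, 8)
```

Under these assumptions, the two values round to 0.735 and 1.25, matching the figures reported in the text for the same tests.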
Post hoc analysis showed that the illusory effect was significantly stronger in extra-striate than in striate visual cortex (for V2, t(8) = 5.47, p < 0.002, d = 1.824; for V3, t(8) = 5.50, p = 0.002, d = 1.834), while no significant difference was observed between V2 and V3 (t(8) = 2.07, p = 0.072). An important consideration is that BOLD responses reflected both the feedforward and feedback influences, and the reason for the smaller estimated tilt representation in V1 could be that V1 activity had a greater contribution from feedforward input signals (corresponding to the retinal orientation). The relative contribution of feedforward vs. feedback signals in different areas was investigated further with layer-resolved imaging (De Martino et al., 2015; Klein et al., 2018; Kok, Bains, Van Mourik, Norris, & De Lange, 2016; Muckli et al., 2015) as described in the following 7T high-resolution fMRI experiment.

In the follow-up 7T fMRI experiment, we obtained a high-resolution layer-specific representation of the Flash Grab Effect in different layers of V1 to V3. The paradigm was essentially the same as the 3T experiment, with the exception that the flashed bar was presented on the horizontal rather than vertical meridian due to the limited vertical field of view imposed by the 7T coil (Appendix Figure A1.2). In three independent scans, subjects were presented with a rotating bar (centered on the fixation point) to map the polar angle retinotopy of early visual areas (Engel et al., 1997). In the following layer-resolved analysis, the original fMRI data were resampled from 0.85 or 0.8 mm to 0.4 mm isotropic voxel size. Voxels were separated based on their distances from cortical surfaces into three separate layers: from 0% to 40% the superficial layers (S), from 40% to 80% the middle layers (M), and from 80% to 100% the deep layers (D) (Kok et al., 2016; Muckli et al., 2015; Wagstyl et al., 2018).
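The depth-based grouping of voxels described above can be sketched as a simple binning rule. The sketch assumes that relative cortical depth is expressed as a fraction between 0 (pial surface) and 1 (white matter boundary), and that a voxel lying exactly on a boundary goes to the deeper bin; neither detail is specified in the text, and the name `assign_layer` is hypothetical.

```python
def assign_layer(depth_fraction):
    """Map relative cortical depth to a layer label.

    Assumed convention: 0.0 is the pial surface and 1.0 is the white matter boundary.
    0-40% -> superficial (S), 40-80% -> middle (M), 80-100% -> deep (D).
    """
    if not 0.0 <= depth_fraction <= 1.0:
        raise ValueError("depth_fraction must lie between 0 and 1")
    if depth_fraction < 0.4:
        return "S"
    if depth_fraction < 0.8:
        return "M"
    return "D"


# Made-up depths for three voxels (not study data).
labels = [assign_layer(depth) for depth in (0.10, 0.45, 0.95)]
```

Because the bins are unequal in width (40%, 40%, 20%), the deep bin holds fewer voxels than the other two when voxels are spread evenly across depth.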
Responses in each ROI to the clockwise and counter-clockwise tilted illusory orientations under FGE were plotted for voxels tuned to different orientations. For each layer (S, M, or D), there were two response curves, one corresponding to the perceived clockwise tilted and the other to the counter-clockwise tilted bars (Appendix Figure A1.3). To alleviate the bias of BOLD response towards superficial layers, the response curves were normalized across conditions within each cortical layer. We calculated indexes that reflect the signal strength corresponding to the input meridian orientation and the perceived tilted orientation respectively, for different layers and separately for V1, V2, and V3, based on the normalized response curves. Specifically, the index for the perceived orientation was calculated based on the mean BOLD response differences between two experimental conditions (clockwise vs. counter-clockwise) over the range of -14 to -6 and 6 to 14 degrees polar angles. The index for the input meridian orientation signal was calculated based on the mean BOLD response between -4 and 4 degrees. As shown in Figure 2.3, the main effects are: 1) The representation index for the “illusory orientation” was significantly stronger in V2 and V3 than in V1 (F(2, 32) = 9.72, p < 0.001, $\eta_p^2 = 0.378$); in contrast, the strength of signal corresponding to the input horizontal orientation was much more robust in V1 than in V2 and V3 (F(2, 32) = 27.76, p < 0.001, $\eta_p^2 = 0.634$). 2) More importantly, when signals were analyzed from different layers, the illusory representation varied significantly across layers in V1 (F(2, 32) = 3.91, p = 0.030, $\eta_p^2 = 0.196$).

Figure 2.2. Stimuli and results of the 3T fMRI. (a) Schematic diagram of stimuli and procedures for the 3T fMRI experiment. A red bar flashed repeatedly for 12 seconds at the reversal point of the background motion, alternating with 12 seconds background only stimulation.
The bar was presented at the vertical meridian but would be perceived as tilted clockwise or counter-clockwise from the vertical, depending on the direction of motion reversal. Red solid lines indicate the original position of the bar, while red dotted lines illustrate subjects’ perception of the bar. (b) Polar angle representation of the flashed bar in FGE in the early visual cortex. Normalized fMRI responses to the clockwise and counter-clockwise tilted illusions were plotted as a function of polar angle coordinates across the visual field. The minimum and maximum polar responses of each subject were normalized to 0 and 1 (by first subtracting the minimum and then dividing by the maximum). Red and blue curves show the average polar response across subjects (low-pass filtered by convolving with a 60-degree-wide Hamming window for illustration purposes). Shaded areas indicate standard errors of the mean (n = 9 individual subjects). Red and blue bars illustrate the estimated average tilt from the curves, whereas the dots indicate the estimated tilt for individual subjects. Source data are provided as a Source Data file.

Figure 2.3. Representations corresponding to retinal input and illusory percepts across different layers of early visual cortices. fMRI responses to the clockwise and counter-clockwise tilted illusions were calculated as a function of bar angle coordinates across the field of bar rotation. The computed response index for input representation at the horizontal meridian was based on the mean BOLD responses between -4 to 4 degrees polar angles, while the illusory index was based on the mean BOLD response differences between two experimental conditions (clockwise vs. counter-clockwise) at both -14 to -6 and 6 to 14 degrees polar angles. The representation index for the “illusory orientation” was significantly stronger in V2 and V3 than in V1 (two-way repeated-measures ANOVA, F(2, 32)=9.72, p<0.001; Post-hoc Holm corrected, V1 vs.
V2: t(16)=-3.23, p=0.010, d=-0.783, V1 vs. V3: t(16)=-4.14, p=0.002, d=-1.005, V2 vs. V3: t(16)=0.31, p=0.755), consistent with the 3T fMRI results; in contrast, the strength of signal corresponding to the input orientation (horizontal) was much stronger in V1 than in V2 and V3 (two-way repeated-measures ANOVA, F(2, 32)=27.76, p<0.001; Holm corrected post-hoc, V1 vs. V2: t(16)=4.84, p<0.001, d=1.176, V1 vs. V3: t(16)=-6.94, p<0.001, d=1.685, V2 vs. V3: t(16)=2.61, p=0.019, d=0.634). When signals were analyzed from different layers, the illusory representation varied significantly across layers in V1 (F(2, 32)=3.91, p=0.030). Significant illusory representation was observed in V1 superficial layer (t(16)=3.16, p=0.018 Bonferroni corrected), but not in V1 middle layer (t(16)=0.88, p=0.392), and post hoc comparison showed that the illusory effect was significantly stronger in V1 superficial layer than in the middle layer (t(16)=2.81, p=0.037 Holm corrected). The statistical comparison across layers in V1 (one-way repeated-measures ANOVA) used a within-subject design. All the error bars represent within-subject 95% confidence intervals of the mean index (n=17 individual subjects). Source data are provided as a Source Data file.

Significant illusory representation was observed in V1 superficial layer (t(16) = 3.16, p = 0.018 Bonferroni corrected, Cohen’s d = 0.766), but not in V1 middle layer (t(16) = 0.88, p = 0.392), and post hoc comparison showed that the illusory effect was significantly stronger in the superficial layer than in the middle layer (t(16) = 2.81, p = 0.037 Holm corrected, Cohen’s d = 0.682). These layer-specific results indicate that the neural representation of FGE in V1 is primarily localized in the superficial layer rather than the middle layer.
This is consistent with previous studies showing that responses in the V1 middle layer reflect mainly bottom-up input signals, while responses in the V1 superficial layers are more related to feedback signals (Bastos et al., 2012; Felleman & Van Essen, 1991; Self, van Kerkoerle, Goebel, & Roelfsema, 2019; Self, van Kerkoerle, Supèr, & Roelfsema, 2013; van Kerkoerle, Self, & Roelfsema, 2017). In other words, the layer-resolved 7T data of FGE suggest that the representation of the perceived tilt was likely driven by feedback signals.

FGE correlates with late visual evoked potential signals

While the fMRI results suggest that early visual areas are closely involved in FGE representation, with the 7T layer-resolved data suggesting a dominant feedback contribution to the FGE, the temporal dynamics of feedforward and feedback processing in FGE remain unclear. Thus, we adopted EEG measurements to address this question. Considering the limited spatial resolution of EEG, the flashed bar was only presented in the lower visual field so that, under the influence of FGE, the flashed bar would be perceived to fall in either the left or right visual field (Figure 2.4b). This meant that an evoked ERP signal corresponding to the perceptual representation would be lateralized. In essence, the timing of the lateralized component in the ERP signal should indicate the timing of the neural representation of the perceptual effect. Trials with only a rotating background were included as a baseline condition, and trials with only a retinally tilted flashed bar without the rotating background were also included as a control condition. The orientation of the retinally tilted flashed bar was individually adjusted to roughly match the perceived orientation in the FGE condition (Figure 2.4a). Figures 2.4c and 2.4d show the differential ERP from posterior electrodes evoked by the contralateral versus ipsilateral bar in all three conditions.
As expected, we observed a clear lateralized C1 component in the retinally-tilted condition (Figure 2.4c), in response to the lateralized feedforward input. The cluster-based permutation test revealed an early positive peak (46-98 ms) within C1 latency and a later negative peak (110-217 ms). In contrast, after subtracting the background-only condition, no corresponding lateralized C1 was found in the illusory condition (Figure 2.4d), but only the later negative peak (118-161 ms) remained, at which time window the rotating background generated a positive deflection. This is consistent with the lack of lateralized representation in the early visual cortex during the feedforward sweep. We then performed multivariate pattern analysis to uncover the dynamic change of lateralized representation for the retinally or illusorily tilted stimuli from beyond the posterior electrodes. Linear classifiers were trained to predict whether the flashed bar was perceived to be tilted left or right at each time point (and for background-only trials, we were effectively predicting rotation direction). Retinally-tilted trials could be decoded significantly above chance about 50 ms after stimulus onset, reaching peak performance at C1 latency (Figure 2.4e). Illusory trials could also be successfully decoded starting from about 70 ms after stimulus onset. However, decoding of illusory trials outperformed the baseline condition (rotating background alone) only at a later stage, about 178 ms after stimulus onset (Figure 2.4f). We further characterized the nature of the lateralization information in the illusory condition using a cross-decoding method. If the early lateralized representation before 100 ms reflected a mislocalized bar, similar to a retinally-tilted one, then a classifier trained using data from the illusory condition in this time period should be able to decode data from the retinally-tilted condition during C1 latency. The observed results did not support this hypothesis.
Classifiers trained using illusory trials between 50-100 ms could not predict retinally-tilted trials in the same period (Figure 2.5a; the decoding accuracy was in fact significantly below chance level), but they did predict background-only trials significantly above chance level from 0 to around 150 ms (Figure 2.5b). Importantly, stimulus side in retinally-tilted trials between 50-100 ms could be predicted by classifiers trained using illusory trials between 180-220 ms (Figure 2.5a). This suggests that the lateralized representation for the illusory tilt appeared at a relatively late stage, and the early information about stimulus side was more closely associated with the rotating background. We further asked whether and when lateralized EEG signals could predict the magnitude of the tilt perception in FGE. The inter-subject Pearson correlation between the instantaneous amplitude of the difference wave (contralateral minus ipsilateral) in the illusory condition (with the background-only condition subtracted) and illusion size was calculated at each time point. A significant positive correlation emerged about 177 ms after stimulus onset (Figure 2.6a). Figure 2.6b, based on the same data as the shaded region in Figure 2.6a, more explicitly shows the clear relationship between the size of the perceptual illusion and the mean amplitude of the differential wave (r(9) = 0.77, p = 0.004). Notably, the onset of significant correlation matched well with the time when decoding performance in the illusory condition overtook the background-only condition, as well as when the scalp topography pattern in the illusory condition became similar to that of the retinally-tilted condition in C1 latency. These findings converge to indicate that the main relevant component for the illusory effect appears rather late, consistent with the typical timing of feedback signals.
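The time-resolved inter-subject correlation described above can be sketched as follows. This is only an illustration: the function names, the data layout (one amplitude time course per subject), and the toy numbers are assumptions, and the cluster-based permutation test used to assess significance is not reproduced here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def correlation_time_course(amplitudes, illusion_sizes):
    """Correlate amplitude with illusion size across subjects at each time point.

    amplitudes[s][t] is the difference-wave amplitude (contralateral minus
    ipsilateral, background-only subtracted) of subject s at time point t.
    illusion_sizes[s] is the behavioral illusion size of subject s.
    """
    n_times = len(amplitudes[0])
    return [
        pearson_r([subject[t] for subject in amplitudes], illusion_sizes)
        for t in range(n_times)
    ]


# Hypothetical data: three subjects, two time points.
toy_amplitudes = [[3.0, 2.0], [2.0, 4.0], [1.0, 6.0]]
toy_illusion_sizes = [1.0, 2.0, 3.0]
toy_r = correlation_time_course(toy_amplitudes, toy_illusion_sizes)
print(toy_r)
```

In this toy example the amplitude decreases with illusion size at the first time point and increases with it at the second, so the correlation changes sign between the two.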
In contrast to a robust and clearly lateralized C1 signal from the retinally-tilted condition, no such lateralized signal was observed in the typical time window of C1 from the illusorily tilted bars under FGE. Only at a relatively late stage did the lateralized signal become prominent in the illusory condition, with its amplitude strongly correlated with the illusory effect size across individual subjects. These results support that the perceived tilt in FGE emerged later, likely a result of feedback processing.

Figure 2.4. Stimuli and results of the EEG experiment. (a, b) Visual stimulus and perception (dashed bar) in tilted bar only (a) and illusory (b) conditions. (c, d) The differential ERPs (contralateral minus ipsilateral) averaged over five posterior electrodes and all subjects (n=12 individual subjects) in the tilted bar only (green), illusory (orange), and background-only (gray) conditions. Insets show corresponding topography snapshots in three diagnostic time points, assuming the bar flashed to the left of the vertical meridian. Note the topography in (d) corresponds to the differential wave between illusory and background-only conditions. (e, f) Cross-validation performance of linear classifiers trained at each time point to predict to which side the bar was flashed (green), was perceived (orange), or would have been perceived if flashed (gray), averaged across subjects. Gray bars indicate the significant time period after multiple comparison correction (p threshold = 0.05). Error bands indicate 95% confidence intervals obtained by bootstrap. Source data are provided as a Source Data file.

Figure 2.5. Cross-decoding analysis results. The color in each cell of the matrix indicates decoding accuracy if the classifier was trained with data from one time in one condition, and tested with data from another time in another condition. (a) Trained with illusory and tested with retinally tilted condition. (b) Trained with illusory and tested with background.
The highlighted cells were significantly different (p threshold = 0.05) from chance level (50%) according to cluster-based permutation test, corrected for multiple comparisons. Source data are provided as a Source Data file.

Figure 2.6. Inter-subject correlation analysis. (a) The Pearson correlation between instantaneous amplitude of the difference wave (contralateral minus ipsilateral) in illusory condition (with background-only condition subtracted) and illusion size was calculated at each time point. Dark gray bars indicate period with significant correlation (p < 0.05, cluster-based permutation test). Light gray bars indicate period when the absolute value of correlation was greater than the cluster defining threshold r = 0.5. (b) The size of perceptual illusion (6.25 ± 2.35 deg) was well predicted by the mean amplitude, averaged within the shaded interval in (a). Source data are provided as a Source Data file.

Taken together, the fMRI results show that the distribution of fMRI BOLD signals in retinotopic visual cortical areas represented both the perceived and the input positions of the flashed bars. The 7T fMRI data further reveal that signals in the superficial layers were more influenced by the perceived illusory location of the flashed bars, especially in V1. Finally, a robust and behaviorally relevant lateralized EEG signature was only observed late in time, at around 170-180 ms after the onset of the flashed bars in the illusory condition. The combined spatio-temporal imaging results strongly suggest that the perceived tilt of the flashed bars in FGE was instigated by feedback signals.

DISCUSSION

The combined psychophysics, fMRI, and EEG results jointly support that cortical adaptation can be tuned to feedback-driven representations. In the case of orientation-selective adaptation investigated here, the tilt aftereffect was mainly dependent on the perceived illusory orientation from the FGE rather than the input orientation of the flashed bar.
With spatiotemporal imaging results supporting a feedback origin of the perceived orientation in FGE, these results suggest that feedback signals play an important role in orientation adaptation and provide evidence that in the presence of discrepant feedforward and feedback supported representation of visual input, the feedback signal determines the adaptation outcome. A recent fMRI decoding study showed that patterns of activation in early visual cortex could be used to classify the direction of perceived position shift of FGE (Kohler et al., 2017). Our study went beyond decoding and 1) generated direct estimates of the angular representations of FGE in early visual cortex (3T fMRI), 2) identified the relative contributions from different cortical layers to the perceptual illusion (7T fMRI), and 3) revealed that the neural correlates of the perceptual illusion arose relatively late (EEG). In addition, a noticeable aspect of the 3T fMRI results is that BOLD signals showed stronger representation of the FGE in dorsal compared to ventral visual cortex (see Appendix Figure A1.1). This might have resulted from asymmetric representation across the meridian of the visual field (Liu, Heeger, & Carrasco, 2006). Perception has long been considered an inferential process (Hiebert, 1996; Pizlo, 2001), in which retinal inputs are modulated by spatiotemporal context and other priors to generate our perceptual experience. A number of neuroimaging studies have examined whether the neural signals in early visual cortex reflected the input properties or the perceived quality of the stimuli, with mixed results. Some studies showed that the BOLD signal in V1 reflected the perceived stimulus rather than the retinal input, such as activation reflecting distance scaling of perceived object size (Murray, Boyaci, & Kersten, 2006) and activation along apparent motion trajectory where there was no direct stimulation (Muckli, Kohler, Kriegeskorte, & Singer, 2005).
Other studies have shown that local signals in V1 did not necessarily correspond to perceived brightness and color changes induced by modulating a surround field (Cornelissen, Wade, Vladusich, Dougherty, & Wandell, 2006). To reconcile the conflicting findings, an important point to consider is that BOLD responses are driven by both feedforward and feedback neural signals. In our study of FGE, the smaller estimated tilt angle based on fMRI signals in V1 could be due to a greater contribution from feedforward input signals in V1. In this regard, the layer-resolved 7T fMRI has a particular advantage, as shown in our results, in which the superficial layers tend to have more robust representations of the illusory tilt, compared to the middle layers that are more dominated by feedforward signals (De Martino et al., 2015; Kok et al., 2016; Muckli et al., 2015). Across individuals, EEG signal lateralization about 180 ms after flash onset closely correlated with the magnitude of FGE. But while all subjects showed illusory tilt effect in consistent directions, the corresponding (contra-ipsi) lateralized ERP was not always positive, with subjects experiencing weaker illusion tending to have little or reversed lateralization (Figure 2.6b). This is likely because the observed ERP during that interval was also influenced by other sensory and cognitive processes. For example, a stronger feedforward representation may induce a larger negative component in the P1/N1 range, reducing the potential lateralized ERP signals in the time window. Another interesting observation is that the background by itself induced a significant lateralized EEG signal at around 120 ms (Figure 2.4d), which was not observed when a vertically flashed bar was added to this background in the FGE condition. It is possible that the abruptly flashed bar attracted attention and reduced the signal from the rotating wedge background.
Alternatively, it may have been canceled out by an oppositely lateralized signal from the perceived tilted bar, which means the illusory representation could have emerged as early as 120 ms after bar onset. The fact that the lateralized signal around 120 ms was not correlated with illusion size and did not outperform the background-only condition in decoding implies that this signal was not intrinsically linked to the FGE. In any case, 120 ms is not typically considered to fall within the temporal window of feedforward processing in early visual cortex. Overall, the temporal data strongly support a feedback interpretation of FGE. An interesting observation is the below-chance level cross-decoding performance (from illusory to retinally-tilted condition) shown in Figure 2.5a. This was observed during a very early time window for the training stimulus. The implication is that the activity patterns of illusory (centered around 80 ms) and retinally-tilted (centered around 100 ms) trials were likely oppositely lateralized. It is possible that the two patterns represented different features of the stimuli. Indeed, the activity patterns of the illusory condition around the same time window cross-decoded the background-only condition significantly above chance, suggesting that the former was more related to the moving background wedge (note that the wedge would always be at the opposite side of the perceived location of the flashed bar, Figure 2.4b). This below-chance decoding performance in the early time window of the illusory condition forms a clear contrast to the above-chance decoding in a later time window (~200 ms). Together, they point to cross-decoding performance that was based on the background early and on the illusory bar position late.
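The logic of cross-decoding across conditions and time points, including how oppositely lateralized patterns yield below-chance accuracy, can be illustrated with a minimal sketch. A nearest-class-mean classifier (which has a linear decision boundary for two classes) is swapped in here purely for illustration; the specific classifier, the data layout, and the toy two-channel patterns are assumptions and do not reproduce the analysis pipeline of this study.

```python
def fit_centroids(patterns, labels):
    """Return the mean channel pattern for each class label."""
    sums, counts = {}, {}
    for pattern, label in zip(patterns, labels):
        if label not in sums:
            sums[label] = [0.0] * len(pattern)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], pattern)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(centroids, pattern):
    """Assign the label whose centroid is closest in Euclidean distance."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, pattern))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


def cross_decode(train_data, train_labels, test_data, test_labels):
    """Train at each time point of one condition, test at each of another.

    Data layout (assumed): data[trial][time][channel].
    Returns accuracy[train_time][test_time].
    """
    accuracy = []
    for t_train in range(len(train_data[0])):
        centroids = fit_centroids([trial[t_train] for trial in train_data], train_labels)
        row = []
        for t_test in range(len(test_data[0])):
            hits = sum(
                predict(centroids, trial[t_test]) == label
                for trial, label in zip(test_data, test_labels)
            )
            row.append(hits / len(test_labels))
        accuracy.append(row)
    return accuracy


# Toy example with two channels (left, right hemisphere) and two time points.
# Early "illusory" patterns are lateralized opposite to the retinal ones,
# late "illusory" patterns are lateralized the same way.
illusory = [[[-1.0, 1.0], [1.0, -1.0]], [[1.0, -1.0], [-1.0, 1.0]]]
retinal = [[[1.0, -1.0]], [[-1.0, 1.0]]]
sides = ["left", "right"]
toy_accuracy = cross_decode(illusory, sides, retinal, sides)
print(toy_accuracy)
```

In this toy case the classifier trained on the early illusory patterns systematically mislabels the retinal trials (accuracy below the 50% chance level), whereas the classifier trained on the late illusory patterns labels them correctly.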
With the results from spatiotemporal imaging supporting a feedback interpretation of the FGE, the behavioral data showing that the perceived tilt in FGE could generate a TAE imply that the visual cortical neurons adapted to orientation representation driven by the feedback signals. Given that the goal of adaptation is to adjust the system’s sensitivity based on the statistics of the environment to process information more efficiently, this point becomes more interesting when the input-driven feedforward representation and the feedback-driven perceptual representation are in conflict and both are available in cortex. When input signals and perceptual representation agree, it is difficult to distinguish between adaptation to feedforward or feedback signals. Our previous demonstration that orientation-selective adaptation could occur to invisible gratings (S. He, Cavanagh, & Intriligator, 1996; Sheng He & MacLeod, 2001) constitutes support for adaptation to feedforward-dominated cortical representation of orientation. Our current results show that when the feedforward input orientation is different from perception, adaptation is primarily driven by the feedback-driven neural representation of the perceived property. These results also go beyond the demonstration of TAE from mentally generated bars (Mohr, Linder, Dennis, & Sireteanu, 2011; Mohr, Linder, Linden, Kaiser, & Sireteanu, 2009). Since no feedforward inputs were presented in those studies, there was no competition between the feedforward and feedback signals. There were early experiments investigating the potential influence on adaptation effect resulting from dissociation between input and perceived properties of stimuli, with mixed results. For example, in the so-called flash-drag effect, where the perceived position of a flashed stimulus appears to be shifted in the direction of a nearby moving object, the perceived location biased the effectiveness of adaptation (Kosovicheva et al., 2012).
However, other studies showed that those motion-induced position changes had little contribution to the adaptation aftereffect (Fukiage & Murakami, 2010, 2013). The lack of clear results from these early studies could be due to weak adaptation effect (Fukiage & Murakami, 2010) or rather small size of perceptual mislocalization (Fukiage & Murakami, 2013). The FGE could induce a 10 times larger position shift compared with the flash-drag effect (Cavanagh & Anstis, 2013), by presenting the flashed target on top of the moving background at the time it reverses its motion trajectory, rather than adjacent to the moving object. The current results, with complete dissociation between retinal input and perceived orientation of the adapting stimuli, combined with the clear demonstration of the feedback origin of the perceptual effect, provide unequivocal evidence for neural adaptation to feedback representations. Since information processing networks consist of both hierarchical stages and parallel pathways, naturally adaptation could occur at multiple stages of processing. Consequences of adaptation observed at later stages of processing could be based on inherited signals from other parts of the neural networks, or the adaptation effect could be itself inherited (Solomon & Kohn, 2014). For example, contrast adaptation effect could be observed in MT neurons or from the inheritance of contrast adaptation effect at early stages of processing (Kohn, 2007; Kohn & Movshon, 2003). Early studies have also demonstrated adaptation effect to biases in appearance in color and motion, which allowed the authors to conclude that these adaptation effects were cortical in origin (Goddard, Solomon, & Clifford, 2010; Krauskopf & Zaidi, 1986; Zaidi & Sachtler, 1991). In addition, attention could modulate the representational strength of attended features and in turn enhance its adaptation.
While it is common that many factors modify the retinal input to generate perception, and these results are certainly consistent with adaptation to perception-linked neural representations, our current study has the advantage of explicitly contrasting the feedforward representation and feedback representation in their effectiveness for adaptation. Specifically, our study adds to the understanding of adaptation that when input signal and feedback representation are clearly different, the visual system can adjust its sensitivity based on the feedback-driven neural representation despite the discrepant feedforward representation. Although this point is demonstrated with just one perceptual phenomenon here, our study prompts future neural adaptation models to take into account the different roles of feedforward and feedback signals, especially when they are discrepant. In summary, our spatiotemporal imaging results reveal that the illusory orientation representation was temporally late and spatially biased to the superficial cortical layers, thus pointing to a feedback origin of the FGE. Combined with psychophysical results, this study provides evidence that when perceived and input stimulus orientations of the adapting bars are dissociated from each other, the orientation adaptation mainly depends on the feedback supported neural representation linked to perception. These results highlight the important contribution of feedback signals for cortical neurons to recalibrate their sensitivity.

METHODS

Participants

Eight healthy subjects (5 female, ages 21-27) participated in the psychophysics experiments; eleven (2 female, ages 21-27) participated in the 3T fMRI experiment (two subjects were excluded due to head movement or failure to obtain clear retinotopy); seventeen (9 female, ages 22-35) participated in the 7T fMRI experiment; and twelve (4 female, ages 21-27) participated in the EEG experiment (one subject was excluded due to excessive eye movements or blinks).
Subjects were unaware of the purpose of the experiments. All observers had normal or corrected-to-normal vision and gave written consent. The protocol was approved by the Institutional Review Panel at the Institute of Biophysics (IBP), Chinese Academy of Sciences (CAS).

Psychophysics stimuli and procedures

Subjects’ head position was stabilized with a chin-rest at a viewing distance of 57 cm. Stimuli were presented in a dark room on a CRT monitor (NESO FS210A, Nanchang, China), with a resolution of 1024×768 and a refresh rate of 120 Hz. The experiment was programmed in MATLAB (The MathWorks, Inc.) using the Psychophysics Toolbox (Brainard, 1997; Pelli, 1997) extensions. During the experiment, a small black fixation dot was presented at the center of the screen and a pair of rotating disks of 3.9 dva (degrees of visual angle) radius were presented at the two sides of the fixation point, on a uniform gray background. The disks were patterned with 6 sectors (each sector spanning 60 degrees). The distance between the fixation point and the center of each disk was 10.2 dva. The sectors had 25% Michelson contrast (Michelson, 1995), which was defined by $C_m = (L_{\max} - L_{\min}) / (L_{\max} + L_{\min})$, where $L_{\max}$ and $L_{\min}$ represent the luminance of the brighter and darker sectors, respectively. The disks rotated 250° (degrees of rotation) every second and reversed direction every 240 ms (covering 60°, 1 sector, in that time). On each reversal a light-dark edge would be at the vertical orientation, and for every other rotation reversal (480 ms/cycle) two red vertical bars (0.3 dva width) were flashed on for 33 ms, aligned with the light-dark edges. In the first experiment, we tested the tilt aftereffect in the perceived tilted but retinally vertical condition. We first measured the size of the flash-grab effect. Subjects were presented with a pair of rotating sectored disks and two vertical bars were flashed briefly at the direction reversals.
A pair of green pointers (0.3 dva) was presented around each of the two disks. Using the keyboard, the subjects adjusted the angles between the pointers until the pointers and bars appeared to be aligned. They had unlimited time to adjust the angles, and were asked to press the spacebar when they were satisfied with the angle alignment to record the setting and to start the next trial. The two rotation directions (left clockwise with right counter-clockwise, and vice versa) were tested 5 times each for each subject. The mean perceived tilt (away from vertical) across subjects was 15.55° (n = 8, SD = 7.54).

The adaptation trial sequence is depicted in Figure 2.1a. On each trial, subjects were presented with the same patterned disks as in the flash grab measurement part of the experiment and adapted to the two flashed bars. The bars were perceived to be tilted due to the flash grab effect. The adaptation period included 11 flashes (5.3 s) in each trial, followed by a 33.3 ms blank period. Then a pair of test bars was presented for 33.3 ms. The test bars were the same as the pair of red bars presented during the adaptation period except that the angle between the two bars varied from -6.9° to +6.9° (7 variations: -6.9°, -2.3°, -1.1°, 0°, +1.1°, +2.3°, +6.9°; positive values represent the two bars converging upward). Subjects were asked to judge whether the two test bars were converging upward or downward using a 2AFC method. The 7 different angular conditions of bars were tested 20 times each (selected in random order across trials). Three control adaptation conditions were included in the experiment: (a) the vertical flashed bars only without the rotating background disks; (b) the rotating background disks only; (c) tilted flashed bars as in conventional TAE experiments (the bars were tilted 5.7 degrees away from vertical). The tilted flashed bar conditions and the flash grab conditions were counterbalanced between blocks among the subjects.
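The stimulus parameters quoted above for the psychophysics experiment can be cross-checked with a few lines of arithmetic. The sketch below is only illustrative: the function names are hypothetical, and the luminance values passed to the contrast function are made-up numbers chosen to give 25% contrast, not measured display values.

```python
def michelson_contrast(l_max, l_min):
    """Michelson contrast: (Lmax - Lmin) / (Lmax + Lmin)."""
    return (l_max - l_min) / (l_max + l_min)


def sweep_angle_deg(speed_deg_per_s, reversal_interval_ms):
    """Angle covered between two successive motion reversals."""
    return speed_deg_per_s * reversal_interval_ms / 1000


def adaptation_duration_s(n_flashes, flash_cycle_ms):
    """Duration of an adaptation period with one flash per cycle."""
    return n_flashes * flash_cycle_ms / 1000


# Hypothetical luminances (arbitrary units) that yield 25% contrast.
print(michelson_contrast(50.0, 30.0))

# 250 deg/s with a reversal every 240 ms sweeps one 60-degree sector.
print(sweep_angle_deg(250, 240))

# 11 flashes at one flash per 480 ms last about 5.3 s.
print(round(adaptation_duration_s(11, 480), 1))
```

The last two checks agree with the values given in the text: each half-cycle of rotation covers exactly one sector, and the adaptation period of 11 flashes lasts approximately 5.3 s.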
In the second experiment, we tested the tilt aftereffect in the perceived vertical but retinally tilted condition. The conditions were similar to those described above, except that subjects needed to adjust the reversal angle of the disks using the keyboard until the two flashed bars appeared vertical. Subjects had unlimited time to make the adjustment. When they were satisfied with the adjustment, they pressed the spacebar to start another trial. Two rotation directions were tested 20 times each for each subject. The mean orientation away from vertical across subjects was 16.02° (n = 8, SD = 7.34). The adaptation stimulus used in this experiment is demonstrated in Figure 2.1b (right column). The tilt aftereffect was measured with a procedure similar to that described above, except that the adapting stimuli were retinally tilted but perceived as vertical for each subject. Two control conditions were included as well: one was the vertical flashed bars without the background, and the other was the retinally tilted bars without the background as in conventional TAE experiments.

3T fMRI procedures and data acquisition

Stimuli were presented with an MRI-safe projector (1024×768 at 60 Hz) on a translucent screen behind the head coil. For the FGE experiment, the rotating pinwheel background (Figure 2.2a) was presented at 3.12% contrast, 36.87 degrees of visual angle in diameter, rotating at 180 degrees per second and changing motion direction every 0.67 seconds (120 degrees per rotation). A red vertical bar (36.87 and 0.96 degrees in length and width, respectively) was briefly presented for 67 ms at the boundary of two disc sectors, at the moment of background motion reversal. Subjects were instructed to maintain fixation while passively viewing the stimuli. Four runs of functional data were collected for the FGE experiment, each consisting of 144 image volumes. Retinotopic localizers were rotating wedge and expanding ring checkerboard stimuli reversing contrast at 5 Hz.
The wedge stimulus had a center angle of 22.5 degrees, rotating clockwise across the full visual field in 32 seconds. The ring stimulus expanded from fixation to the edge of the viewing aperture (47.93 degrees in diameter) in 32 seconds. Two runs of functional images were collected for the retinotopic localizer, 128 image volumes for each run. MRI data were acquired with a 3T MRI scanner (Siemens Trio) using a 12-channel receive head coil at Beijing MRI Center for Brain Research (BMCBR), IBP, CAS. Functional images were acquired with a gradient echo planar imaging sequence (3 mm isotropic voxels, 30 axial slices of 3 mm thickness, 64×64 matrix with 3 mm in-plane resolution, TR/TE = 2000/28 ms, flip angle = 90°). High-resolution anatomical volume was obtained with a T1-MPRAGE sequence (1 mm isotropic voxels, 192 sagittal slices of 1 mm thickness, 256×256 matrix with 1 mm in-plane resolution, TR/TE = 2600/3.02 ms, flip angle = 8°).

7T fMRI procedures and data acquisition

Viewing aperture of the 7T screen was 26.27 degrees horizontally and 19.85 degrees vertically. A full-field rotating pinwheel background (Appendix Figure A1.2) was presented at 2.91% contrast, rotating at 240 degrees per second and changing motion direction every 0.5 seconds (120 degrees per rotation). A red horizontal bar (26.27° and 0.52° visual angle in length and width, respectively) was briefly presented for 67 ms at the boundary of two disc sectors, at the moment of reversal of background motion. Subjects were instructed to maintain fixation while passively viewing the stimuli. Nine runs of functional images were collected for the FGE experiment, 144 volumes of images for each run. The retinotopic localizer was a rotating bar stimulus with checkerboard patterns reversing contrast at 5 Hz (26.27° and 0.52° visual angle in length and width, respectively). Centered on the fixation, the bar rotated counter-clockwise from -16 to +15 degrees in 32 seconds.
Three runs of functional images were collected for the retinotopic localizer, each consisting of 128 image volumes. MRI data were acquired with a 7T whole-body MRI scanner (Siemens Healthineers GmbH, Erlangen, Germany) using a 32-channel head coil (Nova Medical, Wilmington, USA) at BMCBR, IBP, CAS. For the first seven subjects, a reduced-FOV gradient-echo EPI sequence was used to acquire functional images (0.85 mm isotropic voxels, 21 coronal slices of 0.85 mm thickness, 126 × 96 matrix with 0.85 mm in-plane resolution, TR/TE = 2000/21 ms, flip angle = 80°, 6/8 phase partial Fourier, GRAPPA acceleration factor 3). A high-resolution anatomical volume was obtained with a T1-weighted MPRAGE sequence (0.7 mm isotropic voxels, 256 sagittal slices of 0.7 mm thickness, 320 × 320 matrix with 0.7 mm in-plane resolution, TR/TE = 3100/3.56 ms, TI = 1200 ms, flip angle = 5°) and a proton density (PD)-weighted MPRAGE sequence (0.7 mm isotropic voxels, 256 sagittal slices of 0.7 mm thickness, 320 × 320 matrix with 0.7 mm in-plane resolution, TR/TE = 2340/3.56 ms, flip angle = 5°). For the remaining ten subjects, functional images were collected with a GE-EPI sequence with a larger FOV (TR = 2000 ms, TE = 23 ms, 80° flip angle, voxel size 0.8 × 0.8 × 0.8 mm, FOV 128 × 128 mm, 31 oblique-coronal slices, 6/8 phase partial Fourier, GRAPPA acceleration factor 3). A high-resolution anatomical volume was obtained with a T1-weighted MP2RAGE sequence (TR = 4000 ms, TE = 3.05 ms, voxel size 0.7 × 0.7 × 0.7 mm, field of view 224 × 224 mm, 256 sagittal slices, receiver bandwidth 240 Hz/pix, 7/8 phase partial Fourier, 7/8 slice partial Fourier, TI1 = 750 ms, 4° flip angle, TI2 = 2500 ms, 5° flip angle).

EEG procedures and data acquisition

Observers were tested individually in a dark testing room. Head position was stabilized with a chin rest at a viewing distance of 57 cm.
Stimuli were presented on a CRT monitor (NESO FS210A, Nanchang, China) with a resolution of 800 × 600 and a refresh rate of 100 Hz. The experiment script was written in MATLAB (The MathWorks, Inc.) using the Psychophysics Toolbox extensions (Brainard, 1997; Pelli, 1997). As shown in Figure 2.4a and 2.4b, the screen was filled with a uniform gray background. A small black fixation dot was placed 5.9 dva (degrees of visual angle) above the screen center, and a 60-degree sector (6.3% contrast with the background) of 15.6 dva radius rotated back and forth below the fixation point. The sector rotated at 80° (degrees of rotation) per second and reversed direction every 1500 ms (covering 120°, from -60° to 60° around the vertical meridian). When a reversal occurred, a green vertical bar (0.3 dva in width) could flash for 30 ms (3 frames) at the vertical meridian, aligned with one of the two edges of the sector. To match the illusorily tilted and retinally tilted conditions, we first ran a psychophysical experiment to measure the size of the flash-grab effect. Within each trial, the flashed bar was always illusorily tilted toward one direction. Subjects could rotate the oscillating sector described above clockwise or counter-clockwise using the keyboard, and were instructed to adjust the display so that the flashed bar appeared subjectively vertical. They had unlimited time to make this “subjective vertical” adjustment. When they were satisfied with the adjustment, they pressed the spacebar to move on to the next trial. The two reversal directions were tested 20 times each for each subject. In the EEG experiment, subjects were presented with the same rotating sector as in the psychophysics session, except that the bar always flashed at the vertical meridian (see Figure 2.4). The green vertical bar had a 50% chance of flashing for 30 ms at each reversal. The flash-grab effect biased the perceived location of the flashed bar in the direction of the sector’s motion after the reversal.
There were four situations after a reversal: (1) the sector rotated to the left without a bar flash; (2) the sector rotated to the right without a bar flash; (3) the sector rotated to the left with the flashed bar perceived as tilted to the left; (4) the sector rotated to the right with the flashed bar perceived as tilted to the right. Situations (1) and (2) were termed the “background-only” condition, whereas (3) and (4) were termed the “illusory” condition. Stimuli were presented in runs that lasted ~120 s. Data from 5 runs were collected, yielding 200 repetitions of each situation. In the control experiment, termed the “retinally-tilted” condition, only the retinally tilted flashed bar was presented (at the angle obtained in the psychophysics session, with a 50% chance of flashing), without the rotating background sector. EEG data were acquired from 64 scalp electrodes (Neuroscan) and digitized at 1000 Hz. The vertical electro-oculogram (VEO) was recorded by electrodes placed above and below the left eye. The horizontal electro-oculogram (HEO) was recorded by electrodes placed at the left and right outer canthi. The reference electrode was placed on the midline between electrodes CZ and CPZ.

Psychophysics data analysis

Psychophysical data were analyzed using custom MATLAB scripts (MathWorks Inc.). The average behavioral performance was plotted separately for each condition as the percentage of upward responses against the intersection angle of the test bars (Figure 2.1c/e). Data points were fitted with the following logistic function to estimate the PSE (point of subjective equality), where the test bars appeared parallel (both vertical).

(1)  $p(x) = \gamma + \dfrac{1 - \gamma - \lambda}{1 + e^{-\beta (x - \alpha)}}$

Here x is the intersection angle and p(x) is the percentage of upward responses. α, β, λ and γ are free parameters that were fitted using least squares estimation. The magnitude of the TAE was measured as half the distance between the PSEs following adaptation to the two opposite orientations.
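As an illustration of how a PSE and a TAE magnitude follow from a fitted curve, the sketch below assumes that equation (1) has the standard four-parameter logistic form (midpoint α, slope β, lower asymptote γ, upper lapse λ) and that the PSE is read off at the 50% response level. The function names and the parameter values in the usage lines are hypothetical and are not taken from the data; the least-squares fit itself is left to any standard optimizer.

```python
import math


def logistic(x, alpha, beta, gamma, lam):
    """Assumed form of equation (1): proportion of 'upward' responses at angle x."""
    return gamma + (1.0 - gamma - lam) / (1.0 + math.exp(-beta * (x - alpha)))


def pse(alpha, beta, gamma, lam, level=0.5):
    """Angle at which the fitted curve crosses `level` (assumed PSE criterion)."""
    if not gamma < level < 1.0 - lam:
        raise ValueError("level lies outside the range of the fitted curve")
    # Invert the logistic: exp(-beta * (x - alpha)) = (1 - gamma - lam) / (level - gamma) - 1
    ratio = (1.0 - gamma - lam) / (level - gamma) - 1.0
    return alpha - math.log(ratio) / beta


def tae_magnitude(pse_a, pse_b):
    """Half the distance between the PSEs after the two opposite adaptors."""
    return abs(pse_a - pse_b) / 2.0


# Hypothetical fitted parameters for the two adaptation directions.
pse_cw = pse(alpha=2.0, beta=1.5, gamma=0.02, lam=0.02)
pse_ccw = pse(alpha=-1.0, beta=1.5, gamma=0.02, lam=0.02)
tae = tae_magnitude(pse_cw, pse_ccw)
```

When γ and λ are equal, the 50% point coincides with α; with unequal asymptotes the two differ slightly, which is why the inversion is written out explicitly.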
fMRI data analysis

3T MRI data were analyzed with the BrainVoyager QX software package (Goebel, Esposito, & Formisano, 2006) and MATLAB (MathWorks Inc.). Functional images were motion corrected, low- and high-pass temporally filtered, and slice-timing corrected. The high-resolution T1 volume was co-registered to the first volume of the functional images and transformed to Talairach space. A general linear model was used to estimate fMRI responses to the flashed bars with clockwise and counter-clockwise illusions. The retinotopic mapping data were analyzed using a cross-correlation method embedded in the BrainVoyager QX software package. Sixteen phase lags (one every 2 seconds) were used to find the best fit of the polar angle or eccentricity representation for each voxel. ROIs of early visual cortices (V1, V2, V3d/VP) were defined according to the retinotopic maps on the inflated cortical surface. For each ROI, voxels were sorted and resampled into 360 bins according to their polar angle representations. The BOLD response to the flashed bar was then plotted as a function of polar angle. From this response curve, the angular representation of a flashed bar was estimated separately for the upper and lower visual fields, defined as the polar angle that splits the area under the curve into two equal halves.

7T MRI data were analyzed with AFNI (Cox, 1996), Freesurfer (Fischl, 2012), and custom MATLAB/Python code. Functional images were corrected for motion and EPI distortion. The high-resolution T1 volume was co-registered to the mean volume of the functional images. A general linear model was used to estimate fMRI responses to the red bars with clockwise and counter-clockwise illusions. A cross-correlation method with 32 phase lags (one every second) was used to generate the polar angle retinotopic map of early visual areas V1/V2/V3. Pial and white matter surfaces were reconstructed based on the PD-corrected T1 volume (Van de Moortele et al., 2009).
An equi-distance method was used to estimate the relative cortical depth of each voxel. The voxels in an ROI were sorted and resampled into three depth bins: superficial (0-0.4), middle (0.4-0.8), and deep (0.8-1.0). The partition ratio was selected based on the thickness of the cortical layers of human visual cortex (De Sousa et al., 2010). As in the 3T data analysis, the BOLD response to the flashed bar was plotted as a function of polar angle representation. To alleviate the effect of draining veins on the BOLD signal across cortical layers, the minimum and maximum values of the polar angle response curve were normalized to 0 and 1. The FGE illusory effect was calculated as the difference in normalized response between the two illusory conditions (clockwise vs. counterclockwise), averaged across two polar angle windows (voxels identified through an independent localizer scan with preferred orientation tuning of -14 to -6 degrees and 6 to 14 degrees). The input representation index was calculated as the mean of the normalized responses centered on the horizontal meridian (where voxels had preferred orientation tuning ranging from -4 to 4 degrees). The polar angle windows were chosen to maximize the sensitivity of the index, because when pooling across all subjects, areas, and layers, the difference between the CW and CCW illusory conditions was most prominent around ±10 degrees (i.e., for voxels with preferred orientation tuning around 10 or -10 degrees). A small gap was left between these orientation windows to mitigate potential cross talk, and a slightly different gap did not qualitatively change the final results. Data with error bars are displayed as mean ± SEM. P values < 0.05 were considered statistically significant. Within-subject confidence intervals were estimated according to the method described by Cousineau (2005).
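The within-subject error bars mentioned above can be illustrated with a short sketch of the Cousineau (2005) normalization: each subject's mean across conditions is removed and the grand mean is added back before the per-condition standard error is computed. The function names and the toy data are hypothetical, and the sketch does not apply any later bias correction to the normalized variance.

```python
import math


def cousineau_normalize(data):
    """data: one list per subject, holding one value per condition."""
    n_subj, n_cond = len(data), len(data[0])
    grand_mean = sum(sum(row) for row in data) / (n_subj * n_cond)
    # Remove each subject's own mean, then restore the grand mean.
    return [[v - sum(row) / n_cond + grand_mean for v in row] for row in data]


def within_subject_sem(data):
    """Per-condition standard error computed on the normalized data."""
    norm = cousineau_normalize(data)
    n_subj = len(norm)
    sems = []
    for c in range(len(norm[0])):
        col = [row[c] for row in norm]
        mean = sum(col) / n_subj
        var = sum((v - mean) ** 2 for v in col) / (n_subj - 1)
        sems.append(math.sqrt(var / n_subj))
    return sems


# Toy data: three subjects who differ only by a constant offset.
toy = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
sems = within_subject_sem(toy)  # between-subject offsets no longer inflate the error
```

A confidence interval half-width is then the standard error multiplied by the critical t value for n - 1 degrees of freedom.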
EEG data analysis

Data were analyzed using EEGLAB v13.3.2 (http://www.sccn.ucsd.edu/eeglab) and MNE v0.16.2 (https://martinos.org/mne/) (Gramfort et al., 2013). Raw data were first filtered off-line with a 1-35 Hz bandpass filter. Epochs (-100 to +300 ms) with excursions exceeding 75 μV at electrode VEO were excluded from analysis. The remaining epochs were averaged separately for each stimulus condition. To select electrodes for the C1 amplitude and latency analysis, grand-averaged ERPs were computed for each electrode and each condition, pooling across all subjects. The five electrodes showing the largest C1 amplitudes were chosen for further analysis (posterior electrodes P3, P5, PO5, PO7, and O1). To quantify the C1 amplitude and latency for each stimulus and each subject, the waveforms at these five electrodes were first averaged to obtain a mean waveform. Multivariate pattern analysis (Grootswagers, Wardle, & Carlson, 2017) was conducted using scikit-learn 0.16.0 (http://scikit-learn.org/) (Pedregosa et al., 2015). Linear support vector machine classifiers were trained at each time point for each subject to predict the side to which the flashed bar was retinally tilted or perceived to be tilted, using preprocessed EEG data from all electrodes as features. For the background-only condition, we predicted the side to which a bar would have been illusorily tilted had it been flashed as in the illusory condition, although no bar was actually displayed. The decoding accuracy was estimated using a stratified 10-fold cross-validation procedure, and the regularization parameter C was set to 1.0. Each feature (electrode) was normalized to have zero mean and unit standard deviation. To reduce the impact of random noise in single trials, we employed a mini-ERP approach. From all trials sharing the same label in the training set, k trials were randomly selected and averaged into a mini-ERP, which served as one training sample.
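The mini-ERP construction just described can be sketched as follows. This is a minimal illustration and not the analysis code: the function name, the cycling over classes to keep the samples balanced, and drawing the k trials without replacement within each draw are assumptions, and each trial is represented as a plain list of electrode values at a single time point.

```python
import random


def make_mini_erps(trials, labels, k, n_samples, seed=0):
    """Average k randomly chosen same-label trials into each mini-ERP sample."""
    rng = random.Random(seed)
    by_label = {}
    for features, label in zip(trials, labels):
        by_label.setdefault(label, []).append(features)
    classes = sorted(by_label)

    samples, sample_labels = [], []
    for i in range(n_samples):
        label = classes[i % len(classes)]        # cycle classes (assumed balancing)
        chosen = rng.sample(by_label[label], k)  # k distinct trials from this class
        mini_erp = [sum(values) / k for values in zip(*chosen)]
        samples.append(mini_erp)
        sample_labels.append(label)
    return samples, sample_labels


# Toy example: two electrodes, three trials per class.
toy_trials = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [5.0, 2.0], [5.0, 2.0], [5.0, 2.0]]
toy_labels = [0, 0, 0, 1, 1, 1]
X, y = make_mini_erps(toy_trials, toy_labels, k=2, n_samples=4)
```

If the noise is independent across trials, averaging k trials leaves the signal unchanged and divides the noise standard deviation by the square root of k, which is the motivation for training on averaged samples.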
The sampling process was repeated until 1000 samples had been generated and used to train the classifier. A similar procedure was used at test time, except that the mini-ERP samples were derived from the test set. We chose k = 9 in the current analysis, leading to a 3-fold boost in SNR (averaging k trials with independent noise reduces the noise amplitude by a factor of √k) and hence more accurate and robust decoding. Cross decoding was performed across different conditions and different time points. A separate SVM was trained using all trials in condition A at time tA and tested using all trials in condition B at time tB. The average prediction accuracy across all subjects was recorded in a matrix at row tA and column tB. To reduce the computational burden, the EEG time series were decimated in time, and raw trial data instead of mini-ERPs were used (i.e., k = 1) in this analysis. The inter-subject correlation between either the instantaneous or the time-averaged ERP amplitude and the TAE effect size was quantified with Pearson's linear correlation coefficient. The lateralization potential evoked by the vertical bar was calculated by first subtracting the ERP signals at ipsilateral electrodes from those at the corresponding contralateral electrodes, and then contrasting the illusory condition with the background-only condition. The same set of posterior electrodes was selected as in the ERP analysis. The illusion size for each subject was obtained by pooling all measurements for both directions from the adjustment experiment. For visualization purposes, the mean ERP amplitude was averaged within the interval between 177 ms and 400 ms after bar onset. This time interval was chosen according to the onset of significant instantaneous correlation and the interval of significantly higher decoding accuracy in the illusory condition compared with the background-only condition. Differences in time series were tested for statistical significance at the population level using a cluster-based permutation test (Maris & Oostenveld, 2007; Nichols & Holmes, 2003), which corrects for multiple comparisons.
Values at individual time points were first subjected to a mass univariate t-test with the cluster-defining threshold set to p < 0.05 (or |r| > 0.5 for the correlation analysis). The resulting contiguous suprathreshold intervals, in which the statistics were of the same sign, were defined as clusters. For the cross-decoding matrix, 2D clusters were defined on a regular lattice. These clusters further had to exceed a critical value in “cluster mass” before being reported as significant. Cluster mass is the sum of the t values in the cluster. The critical values were obtained with the following procedure: 1) randomly permute the left or right labels for each subject, apply the mass univariate t-test, calculate the cluster mass for each cluster, and record the maximum and minimum cluster mass values; 2) repeat the above 10000 times, or for all possible permutations, and construct the empirical distributions of the maximum and minimum values; 3) take the 97.5th and 2.5th percentiles of the maximum and minimum distributions, respectively, as the critical values for a two-tailed test. The confidence intervals of the population mean time courses and of the instantaneous inter-subject correlation were estimated with a bootstrap technique, resampling the subjects with replacement 1000 times.

Chapter 3

Spatiotopic updating across the saccades in the absence of awareness

Despite the continuously changing visual input caused by eye movements, our perceptual representation of the visual world remains remarkably stable. Visual stability has been a major area of interest within the field of visual neuroscience. The early visual cortical areas are retinotopically organized, and presumably there is a retinotopic-to-spatiotopic transformation process that supports the stable representation of the visual world.
In this study, we used a cross-saccadic adaptation paradigm to show that both orientation adaptation and face gender adaptation could still be observed at the same spatiotopic (but different retinotopic) locations even when the adapting stimuli were rendered invisible. These results suggest that awareness of a visual object is not required for its transformation from the retinotopic to the spatiotopic reference frame.

This chapter is a reproduction of Ge, Y., Sun, Z., Qian, C., & He, S. (2021). Spatiotopic updating across saccades in the absence of awareness. Journal of Vision, 21(5), 7-7.

INTRODUCTION

Despite the continuous movements of the eyes and body, our visual world remains stable. In other words, an object can be imaged at very different positions on the retina (when the eyes move), yet our perceptual representation of that object remains stable in the visual world. Key to this visual stability is the transformation of visual object representations from the retinotopic (coordinates centered on the retina) to the spatiotopic (coordinates centered on the outside world) reference frame across saccades (Cicchini, Binda, Burr, & Morrone, 2013; Crapse & Sommer, 2012; Fabius, Fracasso, Nijboer, & Van Der Stigchel, 2019). Previous studies showed that neurons in the extrastriate visual cortex (such as V4) and the lateral intraparietal cortex (LIP) can temporarily remap their receptive fields to compensate for an impending saccadic eye movement (Duhamel, Colby, & Goldberg, 1992; Tolias et al., 2001; Wurtz, Joiner, & Berman, 2011). Other studies have indicated an explicit spatiotopic neural representation in the middle temporal area (MT) and parietal areas (D’Avossa et al., 2007; Duhamel, Bremmer, BenHamed, & Graf, 1997), although this has remained a topic of debate (Gardner, Merriam, Movshon, & Heeger, 2008; Merriam, Gardner, Movshon, & Heeger, 2013).
In any case, either by continuously updating or remapping the retinotopic maps, or by transforming the retinotopic representation into an explicit spatiotopic representation, our brain would be able to keep track of the salient objects in the scene and achieve visual stability. Although visual input is suppressed during saccades, our conscious representation of the visual scene across saccades seems smooth and continuous, yet we typically do not keep track of the whole visual scene. Selective attention is one of the potential mechanisms that help to maintain visual stability (Crespi et al., 2011; Melcher, 2008, 2011; Szinte, Jonikaitis, Rangelov, & Deubel, 2018). Attentional selection contributes to visual stability by restricting information processing to salient or task-relevant objects. Thus, the trans-saccadic spatiotopic updating of salient objects would allow the brain to track important features or items in the scene. With multiple objects, the allocation of selective attention would influence spatiotopic updating, and previous results showed that unattended stimuli could induce a decreased but still measurable adaptation aftereffect at the spatiotopic location (Melcher, 2009; Melcher & Colby, 2008). However, while attention plays an important role in gating information to awareness, attention and awareness are not the same. Here we ask: if the visual stimulus is invisible, can the spatiotopic updating process still happen? In other words, is spatiotopic updating so critical to our visual function that it occurs even when we are not aware of the objects in the visual scene? Previous studies have shown that attention can be drawn to unconscious stimuli (Cohen, Cavanagh, Chun, & Nakayama, 2012; Jiang, Costello, Fang, Huang, & He, 2006) and that unconscious stimuli can still be processed to a certain level in the neural pathway (Axelrod, Bar, & Rees, 2015; Fang & He, 2005; Z.
Lin & He, 2009; Sterzer, Stein, Ludwig, Rothkirch, & Hesselmann, 2014). Thus, the key question addressed in this study is: is awareness of a visual object necessary for its reference frame transformation from retinotopic to spatiotopic across saccades? Retinotopic and spatiotopic representations are dissociated by the object's locations before and after a saccadic eye movement. To investigate the question raised above, in addition to using eye movements to dissociate the object’s retinotopic and spatiotopic locations, we also need a tool to probe the neural representation at the corresponding locations before and after the saccade. Adaptation paradigms are effective for studying neural representations in different reference frames because they allow a relatively long temporal delay in measuring the adaptation effect. If an object has achieved representation in the spatiotopic reference frame, we would expect to see an adaptation effect when the test probe is presented at the same spatiotopic location (even if its retinotopic location is different from that of the adapting stimulus). Adaptation paradigms also have the advantage of being able to target specific levels of neural representation in the visual pathway by selectively adapting to properties with different levels of complexity (Boynton & Finney, 2003; Colin W.G. Clifford & Rhodes, 2005; Georgeson, 2004; Kohn, 2007; Rushton, 1965). In our study, we took advantage of two forms of visual aftereffects that were previously shown to be capable of generating spatiotopic aftereffects, namely the tilt aftereffect (TAE) and the face gender aftereffect (FGAE) (Cha & Chong, 2014; D. He, Mo, & Fang, 2017; T. He, Fritsche, & Lange de, 2018; Melcher, 2005, 2009; Nakashima & Sugita, 2017; Wolfe & Whitney, 2015; Zimmermann, Morrone, Fink, & Burr, 2013; Zirnsak, Gerhards, Kiani, Lappe, & Hamker, 2011).
We first verified that both the TAE and the FGAE could be observed at the spatiotopic location, which implied that the adapting stimulus had undergone retinotopic-to-spatiotopic transformation. Next, we adopted the continuous flash suppression (CFS) approach to render the adapting stimulus invisible, so that we could investigate whether aftereffects from the invisible adaptor could still be observed at the spatiotopic location. CFS is an effective way to render adapting stimuli in one eye invisible by presenting a stream of rapidly changing noise to the other eye. CFS has the advantage of achieving prolonged suppression and of being less influenced by the visual properties of the to-be-suppressed stimulus (Fang & He, 2005; Kim & Blake, 2005; Tsuchiya & Koch, 2005). There is evidence that different types of adaptation aftereffects are differentially influenced by interocular suppression. Not surprisingly, more complex stimulus properties, such as face gender and identity information, are more vulnerable to suppression than simple stimulus features such as flicker, motion, or orientation (Alais & Melcher, 2007; Kaunitz, Fracasso, & Melcher, 2011; Tsuchiya & Koch, 2005; Yang, Hong, & Blake, 2010). In this study, we investigated the role of awareness in the retinotopic-to-spatiotopic reference frame transformation by using CFS to suppress awareness of the target visual objects. Our results show that, for visual targets that are not consciously perceived, both local orientation information and face gender information can be transformed from the retinotopic to the spatiotopic reference frame.

METHODS

Participants

Twelve participants (7 females, mean age = 23.2) took part in the main experiment. Half of the participants (n = 6) also took part in the eye movement recording experiment. All participants had normal or corrected-to-normal vision.
All participants provided written informed consent and were paid to take part in the study, which was approved by the Institutional Review Panel at the Institute of Biophysics (IBP), Chinese Academy of Sciences (CAS).

Stimuli

Stimuli were displayed on two synchronized 23.8-inch LCD displays (Dell U2414H, 1920 × 1080 at a 60 Hz refresh rate) and viewed from a distance of 80 cm through stereo mirrors. All visual stimuli were generated using the MATLAB Psychophysics Toolbox (Brainard, 1997). A frame (18 × 12 dva) drawn with dashed lines facilitated stable convergence of the images in the two eyes and also provided background coordinate information for the saccade task. A cross (0.56 × 0.56 dva) presented in the left or right part of the frame served as the fixation point. The adaptor for the tilt aftereffect was a tilted (±15°) Gaussian-windowed sinusoidal luminance grating (Gabor) that subtended 5 dva (Figure 3.1b). The spatial frequency of the Gabor was 0.8 c/deg. The test stimuli were similar to the adaptor, tilted from -4.5 to 4.5 degrees. For the face adaptation, male and female faces subtending 5 dva were used as adaptors. The morphs were generated using Morph 3.0 (Gryphon Software, San Diego, CA) with 100 intervening morphs. Morph number 50 was regarded as the neutral center point of the morphing space.

Procedure

For each adaptation stimulus type there were two conditions, visible and invisible, which were run in separate sessions to limit task complexity. A total of 2688 trials were obtained for each participant across all conditions. In the visible condition, after the initial adaptation period (25 s), the participants first fixated the left cross for 0.8 s. Then the top-up adaptor was presented to the participant’s non-dominant eye for 2 s at the upper-middle location of the monitor.
Following a 0.8 s (SD = 0.1 s) preview of the next fixation cross on the right side, during which participants maintained fixation on the left cross, the participants made a saccade to the right fixation cross (6 dva from the left cross), prompted by the extinction of the left fixation cross. Then a test probe was presented for 100 ms at one of four possible locations (retinotopic, spatiotopic, retinotopic-control, or spatiotopic-control), pseudo-randomly selected with equal probability (Figure 3.1). Participants reported the direction of tilt of the Gabor or the gender of the face. The invisible condition was the same as the visible condition, except that dynamic Mondrian patterns (10 Hz, subtending 5 dva) were simultaneously presented to the participant’s dominant eye in both the initial and the top-up adaptation periods. To ensure that the dynamic Mondrian patterns could effectively suppress the adapting stimuli (Stein & Sterzer, 2014; Yang, Brascamp, Kang, & Blake, 2014), we first presented the adaptor at 80% contrast and tested, for each participant, whether it remained suppressed during both the initial adaptation (25 s) and 20 trials of top-up adaptation (2 s per trial). Participants were asked to press a button if they detected the adaptor in the initial adapting period or in any trial. If the adaptor broke suppression in more than 5% of trials, we reduced the contrast of the adaptor by 5% and tested again. This process resulted in the adaptor being seen under CFS in no more than 5% of the trials. The resulting adaptor contrast was recorded and used in the formal experiment (average contrast for the Gabor patch: 79.7% ± 0.8%; average contrast for the face: 78.3% ± 2.3%). During the adaptation period, if participants could see the Gabor or tell the gender of the face, they pressed a button (spacebar) to indicate that the Mondrian patterns did not fully suppress awareness of the adaptor. These trials were excluded from further analysis.
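The contrast calibration used to secure suppression can be summarized as a simple loop. The sketch below is illustrative only: `break_rate_at` is a hypothetical callback standing in for one test block (initial adaptation plus top-up trials) that returns the fraction of trials in which the adaptor broke suppression, and the text does not say whether the 5% contrast reduction was relative or absolute, so the `relative` flag is an assumption.

```python
def calibrate_contrast(break_rate_at, start=0.80, step=0.05,
                       max_break_rate=0.05, relative=True, min_contrast=0.01):
    """Lower the adaptor contrast until it breaks CFS in at most 5% of trials."""
    contrast = start
    while contrast > min_contrast and break_rate_at(contrast) > max_break_rate:
        if relative:
            contrast *= 1.0 - step   # assumed: reduce by 5% of the current value
        else:
            contrast -= step         # alternative: reduce by 5 percentage points
    return max(contrast, min_contrast)


def simulated_observer(contrast):
    """Hypothetical observer: sees the adaptor in 10% of trials above 75% contrast."""
    return 0.10 if contrast > 0.75 else 0.0


final_contrast = calibrate_contrast(simulated_observer)
```

The `min_contrast` floor is not part of the described procedure; it only guarantees that the loop terminates for an observer who always breaks suppression.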
In addition, we included a full adaptation condition in which participants maintained fixation on the left cross without making a saccade during the whole period, and the test stimulus was presented at the same location as the adaptor. The logic of the experiment is that if an aftereffect is observed at the spatiotopic location, then the adapting stimulus must have achieved a spatiotopic representation; in other words, it must have undergone retinotopic-to-spatiotopic transformation.

Eye movement measurements

To verify that the participants were generally able to follow the instructions, half of the participants (n = 6) took part in an eye movement experiment, which was the same as the main experiment but with half the number of trials (1344 trials). Eye movements were monitored with an Eyelink 1000 Plus system (SR Research), which sampled gaze position at 1000 Hz. Only the left eye was recorded. The system detected the start and end of a saccade when eye velocity exceeded or fell below 22°/s and acceleration was above or below 3800°/s². At the beginning of each session, a 9-point calibration and validation procedure was conducted. If the calibration did not meet the defined requirements, it was repeated until successful. The averaged horizontal eye positions over the time course of the trial are shown for each participant in Appendix Figure A2.1. The eye position traces were aligned to the midpoint of the saccade.

Analysis

MATLAB was used to analyze the data. To measure the aftereffects, the psychometric response curve was fitted with a Bayesian-based cumulative Gaussian function (psignifit toolbox in MATLAB) (Schütt, Harmeling, Macke, & Wichmann, 2016). The magnitude of the TAE was defined as half the difference between the tilts required to annul the effects of adapting to clockwise and to counter-clockwise gratings. The FGAE was calculated with a similar method.
Example fitting results for one participant are shown in Figure 3.2, which shows the tilt aftereffect at four different locations when the adaptor was visible. One-half of the distance between the two fitted curves was taken as the magnitude of the aftereffect.

RESULTS

Participants were able to maintain fixation and execute the required eye movements (Appendix Figure A2.1). The mean distances between eye position and fixation center were 0.11° (SD = 0.09°) before and 0.36° (SD = 0.28°) after the saccades. Saccades, which needed to be executed within 500 ms of the extinction of the left fixation cross, were on average accurate and prompt, with a mean saccade latency of 143.6 ms (SD = 117.3). In only 1.15% of all trials was the saccade not executed before the test stimulus presentation. Because of the very small proportion of these delayed saccades, our results were not affected by whether or not these trials were excluded from the following statistical analysis. For participants who completed separate sessions with and without eye movement recording, no significant differences were found between the two sessions (dependent-sample t-tests for all conditions, p > 0.05, Appendix Figure A2.2). There were also no significant differences between participants with and without eye movement recording (independent-sample t-tests for all conditions, p > 0.05, Appendix Figure A2.3). We therefore combined these data in the further statistical analysis. The strength of the TAE and FGAE for each participant was calculated as half of the difference along the x-axis between the two points of subjective equality (PSEs), based on the psychometric functions following adaptation to the two opposite orientations (TAE) or genders (FGAE) (see Figure 3.2 for an example). Statistics were then performed on the group data. We performed two-way ANOVAs to examine the effects of two factors (two levels of adaptor awareness and five different adapt-test relationships) on the magnitude of the TAE and FGAE.
For the TAE, the main effects of both adaptor awareness and adapt-test relationship were significant (adaptor awareness: F(1,11) = 61.48, p < 0.001, $\eta_p^2$ = 0.848; adapt-test relationship: F(4,44) = 61.61, p < 0.001, $\eta_p^2$ = 0.849). The interaction between adaptor awareness and adapt-test relationship was also significant (F(4,44) = 11.71, p < 0.001, $\eta_p^2$ = 0.516), indicating that the impact of adaptor awareness depended on the relationship between the adapt and test locations. Post hoc analysis showed that the TAE at the spatiotopic location was significantly larger than that at the control-spatiotopic location in both the visible (t = 5.91, p < 0.001) and the invisible condition (t = 3.26, p < 0.01), suggesting the existence of a spatially specific adaptation effect at the spatiotopic location, regardless of the awareness state of the adapting stimulus.

Figure 3.1. Experimental paradigms for the different conditions. (a) The locations of the adaptation and test stimuli before and after the saccade. The cross represents the fixation point. The black arrow represents the saccade direction (from left to right). The letters in the squares: A - adaptation location (also the full-adaptation test location); S - spatiotopic location; R - retinotopic location; Cs - control spatiotopic location; Cr - control retinotopic location. (b) Adaptor and test stimuli for the tilt aftereffect and the face gender aftereffect. (c) Time sequence of the experiment. The adaptor was presented for 2 s (top-up adaptation) after 0.8 s of fixation on the left cross. After a 0.8 s preview of the right cross, participants made a saccade to the right cross upon the extinction of the left cross. Then a test stimulus was presented for 0.1 s at 1 of 4 locations, chosen randomly.

For the FGAE, the main effects of both adaptor awareness and adapt-test relationship were again significant (adaptor awareness: F(1,11) = 14.49, p = 0.003, $\eta_p^2$ = 0.568; adapt-test relationship: F(4,44) = 12.15, p < 0.001, $\eta_p^2$ = 0.525).
However, the interaction between adaptor awareness and adapt-test relationship was not significant (F(4,44)=1.83, p=0.141, $\eta_p^2$ = 0.142), suggesting that the impact of adaptor awareness did not depend on the relationship between adapt-test locations. Post hoc analysis showed that the FGAE at the spatiotopic location was not significantly larger than that at the control-spatiotopic location in either the visible or the invisible condition (p>0.05).

Figure 3.2. Fitted curves of tilt aftereffect results for one participant at four test locations without CFS stimuli (spatiotopic (a), retinotopic (b), control-spatiotopic (c), and control-retinotopic location (d)). Similar results were found for the other 11 participants. Red and blue curves represent clockwise and counterclockwise adaptors, respectively. The vertical bars represent the estimated 50% threshold.

For the visible condition (without CFS), one-sample t-tests with Holm correction (N=10 tests: 5 locations × 2 awareness states, with and without CFS, applied separately to the TAE and FGAE) indicated that both the TAE and FGAE could be induced at the spatiotopic location (TAE: M=0.93°, p<0.001; FGAE: M=7.56%, p<0.001) and, not surprisingly, at the retinotopic location (TAE: M=2.26°, p<0.001; FGAE: M=16.67%, p<0.001). The TAE and FGAE also partially transferred to the control-retinotopic location (TAE: M=0.48°, p<0.01; FGAE: M=5.46%, p<0.05) and the control-spatiotopic location (TAE: M=0.27°, p<0.05; FGAE: M=7.85%, p<0.01). The full adaptation condition (no saccade) revealed the strength of the TAE (M=2.38°, p<0.001) and FGAE (M=10.31%, p<0.001) in the classic condition (Figure 3.3, left panels; see also the normalized results in Appendix Figure A2.4).
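The Holm correction applied to the one-sample t-tests is a step-down procedure: p-values are sorted in ascending order, the smallest is multiplied by the number of tests, the next by one fewer, and so on, with the adjusted values forced to be non-decreasing. The following is a minimal sketch of that general procedure, not the analysis code used for this study; the function name `holm_adjust` is hypothetical.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p-values in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p is scaled by m, the next by m - 1, and so on (capped at 1).
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted p never drops below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted
```

In the design described above, a family would contain the 10 p-values (5 locations × 2 awareness states) for one aftereffect, and an effect would be reported as significant when its adjusted p-value falls below 0.05.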
For the invisible condition (with CFS), interestingly, both stimuli still generated robust aftereffects at the retinotopic (TAE: M=0.85°, p<0.02; FGAE: M=6.62%, p<0.02) and spatiotopic locations (TAE: M=0.25°, p<0.02; FGAE: M=3.88%, p<0.03), whereas no aftereffect was observed at the control-spatiotopic location (TAE: M=0.02°, p=0.97; FGAE: M=1.09%, p=0.88) or at the control-retinotopic location (TAE: M=0.02°, p=0.97; FGAE: M=0.19%, p=0.88). For the full adaptation condition without saccade, significant TAE and FGAE were observed (TAE: M=0.69°, p<0.01; FGAE: M=6.83%, p<0.05) (Figure 3.3, right panels). Compared with the results in the visible adaptation condition, the spread of aftereffects to control locations did not occur when participants had no awareness of the adaptation stimulus; however, the adaptation effect remained robust at the spatiotopic location.

DISCUSSION

We used the adaptation paradigm to investigate whether visual objects could be transformed from a retinotopic to a spatiotopic reference frame while observers were not aware of their presence. We first established that both orientation and face gender adaptation were capable of generating tilt and face gender aftereffects, respectively, when tested at a different retinotopic but the same spatiotopic location. The critical observation is that when the adapting stimulus was rendered invisible, both aftereffects could still be observed at the spatiotopic location.

Although awareness is not necessary for spatiotopic updating, the buildup of spatiotopic neural representation requires spatial attention (Crespi et al., 2011; Melcher, 2008, 2009, 2011; Melcher & Colby, 2008; Szinte et al., 2018). Crespi et al. (2011) found that when participants were performing a demanding attention task on foveal stimuli, BOLD responses evoked by moving stimuli unrelated to the foveal task were mainly tuned in retinotopic coordinates.
But the BOLD responses were tuned in spatiotopic coordinates when subjects could easily attend to the motion stimuli. In our study, when the adaptors were visible, spatial attention to the adaptor location might have helped the buildup of the adaptation effect at the spatiotopic location. Previous studies showed that stimuli under CFS could still influence spatial attention (Jiang et al., 2006), which may explain our observation that both the TAE and FGAE could occur at the spatiotopic location without visual awareness.

Figure 3.3. Adaptation aftereffects (a, TAE; b, FGAE) for the No-CFS and CFS conditions at different locations. Average results from 12 participants show significant TAE and FGAE effects at spatiotopic locations when the adaptors were visible. The effect partially transferred to the two control locations. For invisible adaptors, robust adaptation effects were observed at spatiotopic locations, but not at the two control locations. Error bars show ±1 SE of the mean. Multiple comparisons were Holm corrected (* adjusted p<0.05; ** adjusted p<0.01; *** adjusted p<0.001).

Attentional facilitation at the saccade destination may also influence the adaptation effects. In our study, the saccade target did not overlap with the test locations, and eccentricity-matched control locations were included for both spatiotopic and retinotopic conditions. Thus, possible effects of attentional facilitation at the saccade target were avoided because the test was equally likely to appear at each of the four locations (Afraz & Cavanagh, 2009). In addition, since the adaptation and test stimuli were always presented in the periphery, there was no switch between foveal and peripheral locations in testing the aftereffects, presumably generating more stable aftereffect measurements.

It has been debated whether visual feature information or just the spatial information is transferred in the trans-saccadic remapping.
Recent studies demonstrated that feature information such as orientation (Ganmor, Landy, & Simoncelli, 2015; Wutz, Drewes, & Melcher, 2016; Zimmermann, Weidner, & Fink, 2017), shape (Demeyer, De Graef, Wagemans, & Verfaillie, 2009), motion (Fabius, Fracasso, & Van Der Stigchel, 2016; Fracasso, Caramazza, & Melcher, 2010; Melcher & Fracasso, 2012; Turi & Burr, 2012), and facial expression (Wolfe & Whitney, 2015) could be remapped across saccades. Our results provide further support that trans-saccadic remapping takes place at the feature level. The process of feature remapping would enable the construction of spatiotopic representations of visual features.

The time course of spatiotopic updating might also influence the adaptation effects at different locations across saccades (Burr, Tozzi, & Morrone, 2007; Melcher & Morrone, 2003). There is evidence that a sufficient preview duration is necessary for the spatiotopic representation to fully build up (Golomb, Marino, Chun, & Mazer, 2011; Golomb, Nguyen-Phuc, Mazer, McCarthy, & Chun, 2010; Golomb, Pulido, Albrecht, Chun, & Mazer, 2010; Mathôt & Theeuwes, 2010; Morrone, Cicchini, & Burr, 2010; Zimmermann, Morrone, & Burr, 2015, 2014; Zimmermann et al., 2013). Thus, the relatively long target-preview duration (0.8 s) used in our study likely contributed to a stronger object representation at the spatiotopic location. It is also possible that spatiotopic updating has different temporal dynamics for different stimulus types and states of awareness. For example, a recent study using a rotating motion illusion suggested that spatiotopic updating could occur rapidly (e.g., within 150 ms) (Fabius et al., 2019).
Recent fMRI adaptation studies showed a reduced BOLD response in the extrastriate visual cortex when two repeated gratings were presented at the same spatiotopic location before and after a saccade (Dunkley, Baltaretu, & Crawford, 2016; Fairhall, Schwarzbach, Lingnau, Van Koningsbruggen, & Melcher, 2017; Zimmermann, Weidner, Abdollahi, & Fink, 2016). These repetition suppression effects indicate a transfer of representation (and consequently of the adaptation effect) from a retinotopic to a spatiotopic reference frame, which is in accord with our finding of a spatiotopic adaptation effect with visible grating adaptors.

Our results show that when the adaptor was visible, a robust tilt aftereffect could be observed at the spatiotopic location (with the largest effect at the retinotopic location and smaller effects at the control locations). For face gender adaptation, the magnitude of the aftereffect was similar at the spatiotopic location and the two control locations (and smaller than at the retinotopic location), which is consistent with a previous study that showed no significant difference between spatiotopic and control locations (Afraz & Cavanagh, 2009). Such results indicate that, in addition to the transformation from the retinotopic to the spatiotopic reference frame, there was a spatially non-local adaptation effect when the adapting face was visible. In other words, there was a more spatially invariant representation when an object was consciously perceived, in contrast to a more spatially local object representation in the absence of awareness. The role of awareness in spatially invariant representation was also revealed for object viewpoint in a recent study using Necker cubes as stimuli (Cho & He, 2019). With awareness, the spatially non-specific effect was also observed for the TAE, but it was considerably weaker, presumably due to the intrinsically local nature of orientation processing in the visual cortex.
More interestingly, when the adaptor was rendered invisible, our results show that there was still a significant representation of the adaptor at its spatiotopic location for both orientation and face gender information, but not at the two eccentricity-matched control locations. In other words, both local orientation and face gender information could be transformed from the retinotopic to the spatiotopic reference frame without awareness. The spatiotopic updating of an object from its retinotopic reference frame, a process that is critical for achieving a stable perceptual representation of the visual world, can occur even when the object is not explicitly perceived.

Chapter 4

Neural representation of human pose information in natural images

The human body is a stimulus that occurs frequently in real life, and the pose, defined as the spatial relationships between body parts, carries a great deal of information about the underlying motion and action of a person. While there has been literature on the neural representation of some human pose variations, the enormous pose space experienced in natural images is largely unexplored. Here we examined the cortical sensitivity to a broad range of natural poses with a high degree of appearance variation from complex natural images of people. With recent advances in 3D human pose recovery from natural images, we developed several pose models to parameterize natural pose images and characterize the structure of the natural pose space from different aspects (viewpoint-dependent vs. viewpoint-independent) in distinct dimensions (2D vs. 3D). Using representational similarity analysis of fMRI data, we found several cortical regions, including areas of lateral occipital-temporal cortex (LOTC), fusiform gyrus, and superior parietal cortex, that captured the structures of the pose space from both viewpoint-independent and viewpoint-dependent parameterizations.
We also found that the right superior temporal sulcus captured only the intrinsic, viewpoint-independent 3D pose dissimilarity structure. Together, our results revealed distributed representations of different aspects of human pose information from a broad range of natural poses and appearances.

* This study was done in collaboration with Hongru Zhu and Alexander Bratch. Hongru Zhu developed the various model parameterizations, including the application of computer vision to extend the annotations to 3D. He also drafted the Introduction and Discussion sections. Alex Bratch provided advice on the localization of EBA/FBA ROIs and other cortical areas. In particular, Alex worked with Kendrick Kay of the CMRR to identify and standardize these ROIs for the NSD project. My primary role was in the analysis of the NSD fMRI data reported in the results, figures, and tables, including the searchlight analysis and comparisons of voxel RDMs with model RDMs. The development of the hypotheses and the interpretation of the data were the synergistic result of the collaboration between all three of us and our advisor.

INTRODUCTION

For highly social creatures such as humans, the visual world is filled with a prevalent and complex stimulus: the human body in the natural world. The perception of the human body provides crucial support for the understanding of other people's emotions, actions, and social interactions. More specifically, pose, defined as the spatial relationships between body parts, carries a great deal of information about the underlying motion and action of a person. Further, human vision can draw inferences about both motion and action from even a single glance. However, computing human pose from a single natural image is computationally challenging (Wang, Wang, Lin, & Yuille, 2019). For one thing, human bodies have non-rigid forms with various joint articulations, making them prone to self-occlusion.
For another, there is inherently a high degree of appearance variation in natural body stimuli due to changes in occlusion, clothing, lighting, and viewpoint. Given the complexity and importance of body pose information, we investigated the cortical representation of static, natural human poses defined by the local body parts and their spatial configurations in two and three dimensions.

An important line of research has revealed specialized neural mechanisms for processing human body stimuli. Early fMRI studies found distinct cortical regions that are preferentially activated by human bodies, including the extrastriate body area (EBA) (Downing & Kanwisher, 2001) and the fusiform body area (FBA) (Peelen & Downing, 2005). Subsequent studies identified body part maps in the occipitotemporal cortex (OTC) with dissociable responses to individual body parts, and suggested that their organization was related to the action-related properties of body parts (Bracci, Caramazza, & Peelen, 2015; Orlov, Makin, & Zohary, 2010). Following from these previous findings, which connect representations of individual body parts with action-related information, we focused on the cortical representation of human poses defined as the spatial configurations of body parts. Such intermediate pose representations have been relatively little studied but are effective for motion and action understanding from a computational perspective (Campbell & Bobick, 1995; Wang, Wang, & Yuille, 2013; Yacoob & Black, 1999).

Regarding human pose representations, previous work has investigated several human brain regions and their roles in pose discrimination using static images of a few pre-selected poses. Studies found that repetitive transcranial magnetic stimulation (rTMS) of EBA disrupts the perception of bodily form, while rTMS of the premotor cortex disrupts the perception of bodily action (Urgesi, Candidi, Ionta, & Aglioti, 2007).
Another fMRI study suggested viewpoint-independent encoding of contorted and ordinary postures in the fusiform gyrus, posterior superior temporal sulcus (pSTS), inferior frontal gyrus (IFG), and inferior parietal lobule (IPL), including regions classically associated with action observation (Cross, MacKie, Wolford, & Antonia, 2010). Together, these studies measured viewpoint-independent cortical responses to pose variations in static images.

However, given the vast range of legitimate, natural pose variations, prior work has not addressed how the enormous pose space experienced in natural images is represented. Further, findings from the use of simplified stimuli may not generalize to complex, real visual scenes (Hasson & Honey, 2012). The high degree of appearance variation for human poses in real life, together with the highly simplified pose stimuli used in prior work, raises the question of cortical sensitivity to the broad range of human poses in complex natural images of people. In light of this, we developed several pose models that capture the structure of the pose space over a large range of natural poses.

From a computational point of view, different pose parameterizations are arguably utilized to extract different aspects of pose information as needed. For example, viewpoint-dependent 3D pose representations make explicit body part depth and body orientation information with respect to the viewer, and thus are useful in the computation of relationships between a person and other objects or people. In contrast, viewpoint-independent 3D pose representations are likely to be computationally more efficient for action categorization because they make explicit the configuration of a person's body parts independent of viewpoint. Other possible pose parameterizations include viewpoint-dependent 2D pose representations, which, though requiring less computation, ignore relative depth information.
These different pose parameterizations present trade-offs between the computations and representations required for different tasks. In this work, we investigated cortical representations of pose information given three different parameterizations: (1) viewpoint-independent 3D pose representations, (2) viewpoint-dependent 3D pose representations, and (3) viewpoint-dependent 2D pose representations. As a direct comparison, we also investigated (4) viewpoint representations that were purely and explicitly based on body orientation with respect to the viewer.

To parameterize poses, we need to solve the problem of extracting pose information from natural scene images. Such information usually takes the form of joint locations in three dimensions. Traditionally, it has been complicated to obtain three-dimensional pose information from natural images because human subjects have to wear markers for motion capture (mo-cap) systems when the images are taken. Even with existing natural image datasets that include three-dimensional pose annotations, it is still hard to extract the viewpoint of human body images, which is necessary to produce viewpoint-independent pose parameters. Benefitting from recent advances in computer vision, we made use of an off-the-shelf human 3D mesh reconstruction model (Kanazawa, Black, Jacobs, & Malik, 2018) to extract a corresponding 3D human mesh for each human body in natural images. The 3D human mesh comes with 3D body joint rotation parameters and 3D body global rotation parameters (namely, the viewpoint parameters), and can be transformed into 3D joint locations. With 3D joint locations and global rotation parameters, we produced the desired parameterizations for each pose. We subsequently built separate pose models to parameterize the broad range of human poses and characterized the pose space structures with different parameterizations.
Our pose parameterization approach enabled us to extract 2D and 3D pose information from a large set of natural human images, allowing for our analysis of cortical activations obtained from the Natural Scene Dataset (NSD) (Allen et al., 2021). This is a massive high-resolution dataset containing 7T fMRI responses to natural scene images. For the scope of our analysis, we selected a subset of 4,450 natural scene images containing only single persons engaged in different activities, including sports, household activities, eating and drinking, etc. Despite the additional complexity, variations, and nuisance factors inherent to natural images, the use of this large set of NSD images complements previous studies, which have used highly simplified body images with much smaller variations in pose articulations and appearances.

To compare model predictions with the patterns of cortical activity, we used representational similarity analysis (RSA) and searchlight mapping (Kriegeskorte, Goebel, & Bandettini, 2006; Kriegeskorte, Mur, & Bandettini, 2008). RSA enabled us to identify cortical regions whose responses correlate with the pose dissimilarity structure characterized by different pose parameterizations. This allows for a flexible form of pattern analysis and the plug-in use of different representational dissimilarity matrices (RDMs) from different models. Furthermore, RSA can also benefit from a data-driven perspective, as we used searchlight mapping to discover spatial clusters of voxels that may be distributed across the whole brain. We tested four different RDMs: three built on the dissimilarity measurements from three different pose parameterizations, and a fourth built on the dissimilarities of the associated viewpoints from pairs of natural pose images. If any cortical region is sensitive to the auxiliary relative depth information, we would expect distinct results from 2D and 3D viewpoint-dependent pose parameterizations.
If viewpoint-independent pose information is automatically computed by NSD subjects during the continuous recognition task, in which they indicated whether they had seen each presented image at any point in the past, we would expect some cortical regions to show greater sensitivity to 3D viewpoint-independent pose parameterizations.

RESULTS

Pose parameterization and RDM construction

With the off-the-shelf human 3D mesh reconstruction model, we extracted human 3D meshes to further parameterize poses. Figure 4.1 shows examples of natural pose images sampled from the Natural Scene Dataset together with reconstructed meshes. These reconstructed meshes were reasonable and captured the major characteristics of different poses. It is thus feasible to make use of such mesh reconstruction results to parameterize complex natural poses.

Figure 4.1. Example natural single human pose images from the Natural Scene Dataset that were used in our analysis (first row), together with reconstructed 3D meshes (second row).

With reconstructed meshes, we built different pose models to capture the structure of the pose space. We first obtained 3D joint locations and global rotation parameters, which were subsequently converted into (1) viewpoint-independent aligned 3D joint locations by reversing the global rotation in three dimensions, (2) viewpoint-dependent 3D joint locations, (3) viewpoint-dependent 2D joint locations by discarding depth coordinates, and (4) explicit viewpoint information from global rotations. Four different models were built with these different aspects of pose information, and different RDMs were subsequently constructed from dissimilarity measurements on the different parameterizations (Figure 4.2). These RDMs showed that our pose models can capture dissociable pose and viewpoint information.
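The conversion from one reconstructed pose to the four parameterizations, and from there to an RDM, can be sketched as follows. Several choices here are assumptions for illustration and are not stated in this section: the global rotation is taken to map a canonical body frame into the viewer frame (so its transpose reverses it), the third coordinate is treated as depth, pose dissimilarity is taken as the Euclidean distance between joint lists, and viewpoint dissimilarity as the angle of the relative rotation. All function names are hypothetical.

```python
import math


def transpose(matrix):
    return [list(row) for row in zip(*matrix)]


def rotate(rotation, joints):
    """Apply a 3x3 rotation matrix to every [x, y, z] joint."""
    return [[sum(rotation[i][j] * p[j] for j in range(3)) for i in range(3)]
            for p in joints]


def parameterize(joints_3d, rotation):
    """Build the three pose parameterizations from viewer-frame joints."""
    return {
        # Reverse the global rotation (assumed convention: viewer = R * canonical).
        "view_independent_3d": rotate(transpose(rotation), joints_3d),
        "view_dependent_3d": [list(p) for p in joints_3d],
        # Discard depth (assumed to be the third coordinate).
        "view_dependent_2d": [list(p[:2]) for p in joints_3d],
    }


def pose_distance(joints_a, joints_b):
    """Assumed dissimilarity: Euclidean distance over all joint coordinates."""
    return math.sqrt(sum((x - y) ** 2
                         for pa, pb in zip(joints_a, joints_b)
                         for x, y in zip(pa, pb)))


def rotation_angle(rot_a, rot_b):
    """Assumed viewpoint dissimilarity: angle of the relative rotation, in radians."""
    trace = sum(rot_a[k][i] * rot_b[k][i] for i in range(3) for k in range(3))
    return math.acos(max(-1.0, min(1.0, (trace - 1.0) / 2.0)))


def build_rdm(items, distance):
    """Symmetric matrix of pairwise dissimilarities with a zero diagonal."""
    n = len(items)
    return [[distance(items[i], items[j]) for j in range(n)] for i in range(n)]
```

Under these assumptions, two images of the same pose seen from different directions are close in the view-independent RDM, far apart in the view-dependent RDMs, and separated in the viewpoint RDM by the angle between the two global rotations.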
RSA Searchlight

To investigate the spatial organization of cortical regions encoding different types of pose information, we performed searchlight-based representational similarity analyses using four different RDMs: (1) viewpoint-independent pose RDM, (2) viewpoint-dependent 3D pose RDM, (3) viewpoint-dependent 2D pose RDM, and (4) viewpoint RDM. Results were compared with a cortical parcellation atlas (Desikan et al., 2006) as well as several regions of interest (ROIs) from functional localizers in NSD. RSA results from the different RDMs are shown in Figure 4.3 and Figure 4.4. Using the viewpoint RDM, we identified significant clusters that correlate with body viewpoint dissimilarity structures in the lateral occipital cortex, right fusiform gyrus, inferior parietal cortex, and superior parietal cortex. For the 3D viewpoint-independent pose RDM, we found distributed clusters across lateral occipital-temporal cortex (LOTC), fusiform gyrus, and temporal-parietal junction (including posterior superior temporal sulcus (pSTS) and supramarginal gyrus).

Figure 4.2. RDMs from (1) the view-independent 3D pose model, (2) the view-dependent 3D pose model, (3) the view-dependent 2D pose model, and (4) the viewpoint model. RDMs were calculated using 4450 natural pose images from NSD with the same image ordering.

For the 3D viewpoint-dependent pose RDM, we found few or no significant clusters around the pSTS and supramarginal gyrus, but we found a more distributed pattern in LOTC, fusiform gyrus, posterior frontal cortex, and cingulate cortex.
In particular, the activated areas included the pericalcarine cortex, lateral occipital cortex, lingual gyrus, fusiform gyrus, parahippocampal gyrus, inferior and middle temporal gyrus, anterior supramarginal gyrus, inferior and superior parietal cortex, precuneus cortex, left precentral and paracentral gyrus, left caudal middle frontal gyrus, and posterior cingulate cortex. The 2D viewpoint-dependent pose model showed activation similar to that of the 3D viewpoint-dependent pose model, with overlapping clusters in LOTC, fusiform gyrus, and inferior parietal cortex (Figure 4.5). For the viewpoint RDM, significant clusters were found mainly near the extrastriate visual cortex and posterior parietal cortex. Overall, 2D and 3D pose RDMs produced overlapping clusters mainly in areas near LOTC, fusiform gyrus, and the superior parietal cortex. Appendix Tables A3.1-A3.4 provide further details about cluster size, location, and other information from the use of different RDMs in the cortical parcellation atlas (Desikan et al., 2006). We further compared the RSA searchlight results with NSD functional localizer results, as shown separately in Appendix Figure A3.1. Results show that our distributed pose clusters also overlap with several ROIs that are associated with body or face processing, including OFA, FFA, FBA, and EBA.

Figure 4.3. The group results of the searchlight-based RSA for the (a) 3D viewpoint-independent pose, (b) 3D viewpoint-dependent pose, (c) 2D viewpoint-dependent pose, and (d) viewpoint models. Color maps show the t values of significant clusters that survived a cluster-based nonparametric Monte Carlo permutation test (cluster statistic: max sum; initial threshold p<0.001, n=10000). The correlation maps for each participant were first Fisher transformed toward a normal distribution, and the t scores were then computed. After the correction for multiple comparisons, the results were converted into z scores (z values > 1.65 were defined as significant).

Figure 4.4.
Conjunction plots for the significant clusters (z value > 1.65) in view-independent 3D pose (red), view-dependent 3D pose (blue), and viewpoint (green) models. The results showed significant clusters in pSTS for the 3D view-independent pose model only. For the viewpoint model, the significant clusters were mainly near the extrastriate visual cortex and posterior parietal cortex. The results for 3D viewpoint-independent and 3D viewpoint-dependent pose models also revealed significant clusters that overlapped in LOTC, fusiform gyrus, and superior parietal cortex.

Figure 4.5. Conjunction plots for the significant clusters (z value > 1.65) in view-dependent 3D pose (blue) and view-dependent 2D pose (orange) models. The 3D and 2D view-dependent pose models overlap extensively in LOTC, fusiform gyrus, and inferior parietal cortex.

DISCUSSION

The representation of the human pose is a central aspect of the computation and interpretation of body actions. While existing research has examined cortical responses to a limited range of human poses from simplified stimuli, here we focused on the spatial organization of cortical sensitivity to a broad range of human poses from complex natural scenes. By introducing an off-the-shelf human 3D mesh reconstruction model, we parameterized natural human poses in a large set of complex natural scene images and built 2D/3D viewpoint-independent and viewpoint-dependent pose models as well as a viewpoint model. RDM analysis showed that our pose models captured dissociable pose and viewpoint information. Using RSA searchlight, we showed that the dissimilarity structure of a broad range of natural poses was best captured in a set of distributed clusters across the brain, primarily including areas of lateral occipital-temporal cortex (LOTC), fusiform gyrus, and pSTS as well as supramarginal gyrus.
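For a single searchlight location, the comparison between a neural RDM and a model RDM, followed by the group-level Fisher transform and t score described in the Figure 4.3 caption, can be sketched as below. The use of Spearman rank correlation is an assumption (the correlation type is not stated in this section), the function names are hypothetical, and the cluster-based permutation test is not reproduced here.

```python
import math


def upper_triangle(rdm):
    """Off-diagonal upper-triangle entries of a square dissimilarity matrix."""
    n = len(rdm)
    return [rdm[i][j] for i in range(n) for j in range(i + 1, n)]


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rdm_similarity(neural_rdm, model_rdm):
    """Spearman correlation (assumed) between the two RDMs' upper triangles."""
    return pearson(average_ranks(upper_triangle(neural_rdm)),
                   average_ranks(upper_triangle(model_rdm)))


def group_t_score(correlations):
    """Fisher z-transform each participant's correlation, then a one-sample t against 0."""
    z = [math.atanh(r) for r in correlations]
    n = len(z)
    mean = sum(z) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in z) / (n - 1))
    return mean / (sd / math.sqrt(n))
```

In a full searchlight analysis, `rdm_similarity` would be evaluated at every location for every participant, and `group_t_score` would give the per-location statistic that enters the cluster-level correction.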
Distributed representation of pose information

The distributed clusters encoding natural pose dissimilarity structure found in our analysis converge with the previously reported cortical network encoding viewpoint-independent postures from a limited range of poses (Cross et al., 2010; Urgesi et al., 2007). For example, the 3D viewpoint-independent pose model produced significant clusters in LOTC, covering regions traditionally thought to be specialized for body parts and bodies (Bracci et al., 2015; Orlov et al., 2010; Peelen & Downing, 2005). We observed right-lateralized pose clusters in the fusiform gyrus, consistent with prior work that reported right-lateralized FBA responses to bodies (Hodzic, Kaas, Muckli, Stirn, & Singer, 2009).

However, our results diverge from previous work regarding the type of pose information encoded in the cortical network. Because both 2D and 3D viewpoint-dependent pose models produced clusters that overlapped with those of the 3D viewpoint-independent pose model in LOTC, fusiform gyrus, and superior parietal cortex, the pose information encoded in these cortical regions is not necessarily viewpoint independent. In contrast, pSTS may indeed encode viewpoint-independent 3D pose information, as it captured only the dissimilarity structure from the 3D viewpoint-independent pose model. This is in contrast to the previous work that suggested viewpoint-independent encoding of postures across multiple regions including fusiform gyrus, posterior superior temporal sulcus, inferior frontal gyrus, and inferior parietal lobule. For one thing, our results suggested several candidate cortical regions that are likely to encode structured information about human poses.
These regions include areas that are traditionally associated with the processing of bodies and body parts as well as the processing of motion and actions (Grèzes & Decety, 2000; Isik, Koldewyn, Beeler, & Kanwisher, 2017; Peelen & Downing, 2005; Pelphrey et al., 2003; Saxe, Xiao, Kovacs, Perrett, & Kanwisher, 2004). For another, our results also suggested that within this likely distributed cortical network encoding pose structures, 3D viewpoint-independent pose information is likely to be automatically computed and that different aspects of pose information (viewpoint-dependent vs. viewpoint-independent) were encoded in different regions.

Representation of viewpoint information

Besides the distributed representation of pose information, our RSA searchlight using the viewpoint RDM identified cortical encoding of viewpoint for bodies mainly near the extrastriate visual cortex, with a few clusters extending into the posterior parietal cortex. These clusters bearing explicit viewpoint information are rather localized compared to the distributed pose clusters. Although we found clusters encoding 2D and 3D viewpoint-dependent pose information in some distributed pose clusters, they do not seem to explicitly encode body viewpoint information. Further, neither 2D nor 3D viewpoint-dependent pose clusters emerged near pSTS, where only the 3D viewpoint-independent pose clusters were situated. In line with these findings, several behavioral studies have shown that human pose representations have more viewpoint invariance when crossing different poses and viewpoints (Sekunova, Black, Parkinson, & Barton, 2013). Our results add evidence suggesting a possible increase in view-tolerant representations over the course of human pose processing. Given the degree of articulation and the wide range of potential viewpoints, it seems plausible to maintain sensitivity to features irrespective of changes in viewpoint and orientation.
Note that we used the body trunk as the reference frame to determine viewpoint. Hence, two poses are deemed to share the same viewpoint as long as their trunks face the same direction. Future experiments will be needed to study different reference frames for assessing body orientation and viewpoints, and to determine the sensitivity to viewpoints across different cortical regions encoding pose structures.

Computational role of pose representation

One strength of our approach is that we structured the pose space with a vast range of parameterized poses covering different ways of parameterization. One future direction is to pin down the specific use of the different aspects of pose information in different perception tasks. As the computation of pose information serves as an essential step in the computation of the motion and action of a person, the distributed nature of pose representation may be attributed to the various perception tasks (motion, emotion, action, etc.) that pose information supports. An important direction will be to investigate the role of each local pose cluster in the computation of pose information, and the relationship between the type of pose information encoded and the subsequent computation it supports.

CONCLUSION

In conclusion, we present an approach to parameterize three-dimensional human poses from single static images, making explicit different aspects of pose information (e.g., viewpoint-dependent vs. viewpoint-independent). With different pose parameterizations, we built several pose models to capture pose dissimilarity structures from a broad range of natural poses. We applied our pose models to a large set of complex natural body images from the Natural Scene Dataset and used searchlight RSA to find cortical regions encoding pose dissimilarity structures. As a result, we found distributed pose clusters encoding pose information in LOTC, pSTS, and superior parietal cortex.
In particular, our results suggested that viewpoint-independent pose information is likely to be computed automatically and that pSTS specifically encodes such 3D viewpoint-independent aspects of pose information. Furthermore, we found explicit encoding of body viewpoint information mainly near the extrastriate visual cortex, suggesting a possible increase in view-tolerant representations over the course of human pose processing. Future experiments are needed to determine the differential contribution of each pose cluster to the computation of different aspects of pose information.

METHODS

Stimulus selection

The Natural Scenes Dataset (Allen et al., 2021) contains 73,000 cropped color natural scene images from the MS COCO dataset (T. Y. Lin et al., 2014). We aimed to select a subset of images that each contain a single person and that together cover a broad range of legitimate human body poses. To this end, we used the ground truth person keypoint annotations provided by the MS COCO dataset. For each person in each image, the annotations consist of an enclosing person bounding box together with two-dimensional image coordinates and visibility flags for 17 defined body keypoints, including 5 face keypoints (L/R eyes, nose, and L/R ears) and 12 limb keypoints (L/R shoulders, L/R elbows, L/R wrists, L/R hips, L/R knees, L/R ankles). We selected images with keypoint annotations for one and only one person inside the cropped image region. As a next step, we excluded single-person images in which the body was only partially present, namely, where the person was truncated by the image boundary. Specifically, we selected single-person images with all 12 limb keypoints annotated. Face keypoints (eyes, nose, and ears) were not considered because these annotations were sometimes missing for persons occupying smaller areas of the image. Finally, we selected a subset of 4450 images of full single persons in different poses.
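The selection criteria above can be illustrated with a minimal Python sketch that filters COCO-style person annotations. This is not the code used in the study. The function names are hypothetical, a keypoint is treated as annotated when its visibility flag is greater than zero (an assumption), and the annotations are assumed to have already been restricted to the cropped image region.

```python
from collections import defaultdict

# In the COCO keypoint order, indices 0..4 are the face keypoints and
# indices 5..16 are the 12 limb keypoints (shoulders, elbows, wrists,
# hips, knees, ankles).
LIMB_INDICES = range(5, 17)


def limb_keypoints_annotated(keypoints):
    """Return True if all 12 limb keypoints carry a non-zero visibility flag.

    `keypoints` is the flat COCO list [x1, y1, v1, ..., x17, y17, v17].
    Treating v > 0 as "annotated" is an assumption of this sketch.
    """
    return all(keypoints[3 * k + 2] > 0 for k in LIMB_INDICES)


def select_single_person_images(annotations):
    """Return sorted ids of images with exactly one keypoint-annotated
    person whose 12 limb keypoints are all annotated (hypothetical helper)."""
    by_image = defaultdict(list)
    for ann in annotations:
        # Count only persons that have at least one labelled keypoint.
        if any(v > 0 for v in ann["keypoints"][2::3]):
            by_image[ann["image_id"]].append(ann)
    return sorted(
        image_id
        for image_id, anns in by_image.items()
        if len(anns) == 1 and limb_keypoints_annotated(anns[0]["keypoints"])
    )
```

Face keypoints are deliberately ignored by `limb_keypoints_annotated`, mirroring the decision described above to disregard missing eye, nose, and ear annotations.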
Pose parameterization

To parameterize natural poses, we first extracted 3D pose information from complex natural scene images. The MS COCO dataset does not provide ground truth person keypoint annotations or viewpoint parameters in three dimensions. Therefore, we used an off-the-shelf human 3D mesh reconstruction model (Kanazawa et al., 2018) to extract 3D pose information. Given a single RGB image in the wild, this model can reconstruct a full 3D human body mesh. The model was quantitatively evaluated on standard 3D joint estimation benchmarks and outperformed previous approaches that output 3D meshes (Kanazawa et al., 2018).

The viewpoint parameter for the 3D human body mesh is an axis-angle representation of the 3D global body rotation in SMPL format (Loper, Mahmood, Romero, Pons-Moll, & Black, 2015). The 3D rotation was transformed into a rotation matrix $R \in \mathbb{R}^{3 \times 3}$ for further processing. For body pose parameters, we transformed the 3D body mesh into a list of 3D joint locations with a trained joint location regressor (Kanazawa et al., 2018). This joint list includes 19 joints (L/R ankles, L/R knees, L/R hips, L/R wrists, L/R elbows, L/R shoulders, neck, head, nose, L/R eyes, L/R ears). Thus, for each pose, we obtained a rotation matrix $R$ for the global body rotation and a list of $K = 19$ joint locations $p = [J_1, J_2, \ldots, J_K]$, where $J_k \in \mathbb{R}^3$. We did not perform additional normalization on these 3D joint coordinates because they were already in the same 3D body mesh reference frame.

To parameterize pose using 3D view-dependent joint locations $p^{(\mathrm{3D\_v})} = [J_1^{(\mathrm{3D\_v})}, J_2^{(\mathrm{3D\_v})}, \ldots, J_K^{(\mathrm{3D\_v})}]$, we simply used these 3D joint coordinates, $J_k^{(\mathrm{3D\_v})} = J_k \in \mathbb{R}^3$. To parameterize pose using 2D view-dependent joint locations $p^{(\mathrm{2D})} = [J_1^{(\mathrm{2D})}, J_2^{(\mathrm{2D})}, \ldots, J_K^{(\mathrm{2D})}]$, we simply discarded the depth coordinate, so that $J_k^{(\mathrm{2D})} \in \mathbb{R}^2$.
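The two operations described above, converting the axis-angle global rotation into a rotation matrix and discarding the depth coordinate, can be sketched as follows. This is a minimal illustration rather than the implementation used in the study. The function names are hypothetical, the conversion uses Rodrigues' rotation formula, and the depth coordinate is assumed to be the last column of the joint array.

```python
import numpy as np


def axis_angle_to_matrix(rotvec):
    """Convert an axis-angle vector (axis * angle, in radians) into a
    3x3 rotation matrix using Rodrigues' rotation formula."""
    rotvec = np.asarray(rotvec, dtype=float)
    theta = np.linalg.norm(rotvec)
    if theta < 1e-12:
        # A (near-)zero rotation vector is the identity rotation.
        return np.eye(3)
    kx, ky, kz = rotvec / theta
    # Skew-symmetric cross-product matrix of the unit rotation axis.
    K = np.array([[0.0, -kz, ky],
                  [kz, 0.0, -kx],
                  [-ky, kx, 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def view_dependent_2d(joints_3d, depth_axis=2):
    """Drop the depth coordinate from a (K, 3) joint array, giving (K, 2).

    Which column holds depth is an assumption of this sketch."""
    return np.delete(np.asarray(joints_3d, dtype=float), depth_axis, axis=1)
```

For example, `axis_angle_to_matrix([0, 0, np.pi / 2])` is a quarter turn about the z axis, and `view_dependent_2d` applied to a 19 × 3 joint array returns a 19 × 2 array.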
To parameterize pose using 3D view-independent aligned joint locations $p^{(\mathrm{3D\_vi})} = [J_1^{(\mathrm{3D\_vi})}, J_2^{(\mathrm{3D\_vi})}, \ldots, J_K^{(\mathrm{3D\_vi})}]$, we reversed the global rotation to align all poses to the same, original orientation, $J_k^{(\mathrm{3D\_vi})} = R^{-1} J_k$, where $R^{-1}$ is the inverse of the rotation matrix for the 3D global body rotation.

Construction of representational dissimilarity matrices (RDMs)

Once we had parameterized each natural pose and obtained 2D view-dependent joint locations, 3D view-dependent joint locations, 3D view-independent joint locations, and 3D global rotations, we constructed representational dissimilarity matrices by measuring dissimilarity under different metrics. To construct the 3D view-independent aligned pose RDM, we measured the dissimilarity between two aligned poses using the Mean Per Joint Position Error, which is used in much of the literature on 3D joint estimation. It measures the Euclidean distance averaged over all joints after aligning two poses. Specifically, the dissimilarity between two poses $p_i^{(\mathrm{3D\_vi})}$ and $p_j^{(\mathrm{3D\_vi})}$ is measured as

$$d^{(\mathrm{3D\_vi})}\left(p_i^{(\mathrm{3D\_vi})}, p_j^{(\mathrm{3D\_vi})}\right) = \frac{1}{K} \sum_{k=1}^{K} \left\lVert J_{ik}^{(\mathrm{3D\_vi})} - J_{jk}^{(\mathrm{3D\_vi})} \right\rVert_2$$

Similarly, we can construct the 2D and 3D view-dependent pose RDMs by measuring dissimilarity as

$$d^{(\mathrm{2D})}\left(p_i^{(\mathrm{2D})}, p_j^{(\mathrm{2D})}\right) = \frac{1}{K} \sum_{k=1}^{K} \left\lVert J_{ik}^{(\mathrm{2D})} - J_{jk}^{(\mathrm{2D})} \right\rVert_2$$

$$d^{(\mathrm{3D\_v})}\left(p_i^{(\mathrm{3D\_v})}, p_j^{(\mathrm{3D\_v})}\right) = \frac{1}{K} \sum_{k=1}^{K} \left\lVert J_{ik}^{(\mathrm{3D\_v})} - J_{jk}^{(\mathrm{3D\_v})} \right\rVert_2$$

To construct the viewpoint RDM, we measured the viewpoint dissimilarity between pairs of bodies as the distance between their global body rotations in three dimensions. We first transformed the associated 3D rotation matrix $R \in \mathbb{R}^{3 \times 3}$ into a unit quaternion $q$. Following Huynh (2009), we used the distance metric below to assess the dissimilarity of two 3D global body rotations:

$$d^{(\mathrm{v})}\left(q_i, q_j\right) = \cos^{-1}\left\lvert q_i \cdot q_j \right\rvert$$

Representational Similarity Analysis

We carried out the representational similarity analyses (RSA) using a searchlight approach in the individual volume space to investigate the relationship between the computed features and the brain activity.
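The pose alignment and the dissimilarity measures defined in the preceding section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the study. All function names are hypothetical, poses are stored as (K, 3) arrays with one joint per row, and the unit quaternion is obtained directly from the axis-angle vector as a shortcut, whereas the procedure described above goes through the rotation matrix.

```python
import numpy as np


def align_pose(joints, R):
    """Undo the global body rotation for a (K, 3) array of joints.

    Each joint J_k becomes R^{-1} J_k. For a rotation matrix R^{-1} = R^T,
    and with joints stored as rows, (R^T J_k)^T = J_k^T R.
    """
    return np.asarray(joints, dtype=float) @ R


def mpjpe(pose_i, pose_j):
    """Mean Per Joint Position Error: Euclidean distance averaged over joints."""
    diff = np.asarray(pose_i, dtype=float) - np.asarray(pose_j, dtype=float)
    return float(np.linalg.norm(diff, axis=1).mean())


def axis_angle_to_quaternion(rotvec):
    """Unit quaternion (w, x, y, z) for an axis-angle vector.

    Shortcut assumed for this sketch instead of converting from a matrix.
    """
    rotvec = np.asarray(rotvec, dtype=float)
    theta = np.linalg.norm(rotvec)
    if theta < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])
    axis = rotvec / theta
    return np.concatenate(([np.cos(theta / 2.0)], np.sin(theta / 2.0) * axis))


def rotation_distance(q_i, q_j):
    """Rotation distance arccos(|q_i . q_j|) between two unit quaternions."""
    dot = abs(float(np.dot(q_i, q_j)))
    # Clip to guard against rounding slightly above 1.
    return float(np.arccos(np.clip(dot, 0.0, 1.0)))


def build_rdm(items, dissimilarity):
    """Symmetric dissimilarity matrix with a zero diagonal."""
    n = len(items)
    rdm = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            rdm[i, j] = rdm[j, i] = dissimilarity(items[i], items[j])
    return rdm
```

Under these assumptions, `build_rdm(aligned_poses, mpjpe)` would give a view-independent pose RDM and `build_rdm(quaternions, rotation_distance)` a viewpoint RDM. The absolute value in `rotation_distance` makes `q` and `-q`, which describe the same rotation, have zero distance.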
A spherical neighborhood of 100 voxels (approximately 10 mm in radius) was used for each searchlight. The multivariate analyses were performed using CoSMoMVPA (Oosterhof, Connolly, & Haxby, 2016) and custom-written MATLAB functions (R2017b, The MathWorks Inc.). The beta maps for repeated presentations of the same image were first averaged, and the averaged beta maps were then normalized across features. The neural RDM was derived using 1 − correlation as the distance metric. We selected the 3D viewpoint-independent pose RDM, the 2D viewpoint-dependent pose RDM, and the viewpoint RDM as the three target RDMs in our analysis. The neural RDM was correlated with the normalized target RDM for each searchlight. The resulting beta values were then assigned to the central voxel of each searchlight in each participant, yielding a beta map for each model. The beta maps in individual volume space were then resampled to the standard MNI space and used in a group-level analysis that compared the individual beta maps against zero using a one-tailed t-test at each voxel. The t maps were then corrected with a nonparametric cluster-based Monte Carlo permutation test (initial threshold p < 0.001; 10,000 iterations).

Bibliography

Afraz, A., & Cavanagh, P. (2009). The gender-specific face aftereffect is based in retinotopic not spatiotopic coordinates across several natural image transformations. Journal of Vision, 9(10), 1–17. https://doi.org/10.1167/9.10.10
Alais, D., & Melcher, D. (2007). Strength and coherence of binocular rivalry depends on shared stimulus complexity. Vision Research, 47(2), 269–279. https://doi.org/10.1016/j.visres.2006.09.003
Albright, T. D., & Stoner, G. R. (2002). Contextual influences on visual processing. Annual Review of Neuroscience, 25(1), 339–379. https://doi.org/10.1146/annurev.neuro.25.112701.142900
Allen, E. J., St-Yves, G., Wu, Y., Breedlove, J. L., Dowdle, L. T., Caron, B., … Kay, K. N. (2021). A massive 7T fMRI dataset to bridge cognitive and computational neuroscience. BioRxiv, 1–70.
Retrieved from https://doi.org/10.1101/2021.02.22.432340
Axelrod, V., Bar, M., & Rees, G. (2015). Exploring the unconscious using faces. Trends in Cognitive Sciences, 19(1), 35–45. https://doi.org/10.1016/j.tics.2014.11.003
Bastos, A. M., Usrey, W. M., Adams, R. A., Mangun, G. R., Fries, P., & Friston, K. J. (2012). Canonical Microcircuits for Predictive Coding. Neuron, 76(4), 695–711. https://doi.org/10.1016/j.neuron.2012.10.038
Benucci, A., Saleem, A. B., & Carandini, M. (2013). Adaptation maintains population homeostasis in primary visual cortex. Nature Neuroscience, 16(6), 724–729. https://doi.org/10.1038/nn.3382
Blakemore, C., & Tobin, E. A. (1972). Lateral inhibition between orientation detectors in the cat’s visual cortex. Experimental Brain Research, 15(4), 439–440. https://doi.org/10.1007/BF00234129
Boynton, G. M., & Finney, E. M. (2003). Orientation-specific adaptation in human visual cortex. Journal of Neuroscience, 23(25), 8781–8787. https://doi.org/10.1523/jneurosci.23-25-08781.2003
Bracci, S., Caramazza, A., & Peelen, M. V. (2015). Representational similarity of body parts in human occipitotemporal cortex. Journal of Neuroscience, 35(38), 12977–12985. https://doi.org/10.1523/JNEUROSCI.4698-14.2015
Brainard, D. H. (1997). The Psychophysics Toolbox. Spatial Vision, 10(4), 433–436. https://doi.org/10.1163/156856897X00357
Burr, D., Tozzi, A., & Morrone, M. C. (2007). Neural mechanisms for timing visual events are spatially selective in real-world coordinates. Nature Neuroscience, 10(4), 423.
Campbell, L. W., & Bobick, A. F. (1995). Recognition of human body motion using phase space constraints. In IEEE International Conference on Computer Vision (pp. 624–630). IEEE. https://doi.org/10.1109/iccv.1995.466880
Cavanagh, P., & Anstis, S. (2013). The flash grab effect. Vision Research, 91, 8–20. https://doi.org/10.1016/j.visres.2013.07.007
Cha, O., & Chong, S. C. (2014). The background is remapped across saccades.
Experimental Brain Research, 232(2), 609–618. https://doi.org/10.1007/s00221-013-3769-9
Cho, S., & He, S. (2019). Size-invariant but location-specific object-viewpoint adaptation in the absence of awareness. Cognition, 192, 104035. https://doi.org/10.1016/j.cognition.2019.104035
Cicchini, G. M., Binda, P., Burr, D. C., & Morrone, M. C. (2013). Transient spatiotopic integration across saccadic eye movements mediates visual stability. Journal of Neurophysiology, 109(4), 1117–1125. https://doi.org/10.1152/jn.00478.2012
Clifford, C. W. G., Wenderoth, P., & Spehar, B. (2000). A functional angle on some after-effects in cortical vision. Proceedings of the Royal Society B: Biological Sciences, 267(1454), 1705–1710. https://doi.org/10.1098/rspb.2000.1198
Clifford, C. W. G. (2014). The tilt illusion: Phenomenology and functional implications. Vision Research, 104, 3–11. https://doi.org/10.1016/j.visres.2014.06.009
Clifford, C. W. G., & Rhodes, G. (2005). Fitting the Mind to the World: Adaptation and After-Effects in High-Level Vision (Vol. 2). Oxford University Press. https://doi.org/10.1093/acprof:oso/9780198529699.001.0001
Cohen, M. A., Cavanagh, P., Chun, M. M., & Nakayama, K. (2012). The attentional requirements of consciousness. Trends in Cognitive Sciences, 16(8), 411–417. https://doi.org/10.1016/j.tics.2012.06.013
Cornelissen, F. W., Wade, A. R., Vladusich, T., Dougherty, R. F., & Wandell, B. A. (2006). No functional magnetic resonance imaging evidence for brightness and color filling-in in early human visual cortex. Journal of Neuroscience, 26(14), 3634–3641. https://doi.org/10.1523/JNEUROSCI.4382-05.2006
Cousineau, D. (2005). Confidence intervals in within-subject designs: A simpler solution to Loftus and Masson’s method. Tutorials in Quantitative Methods for Psychology, 1(1), 42–45. https://doi.org/10.20982/tqmp.01.1.p042
Cox, R. W. (1996).
AFNI: Software for analysis and visualization of functional magnetic resonance neuroimages. Computers and Biomedical Research, 29(3), 162–173. https://doi.org/10.1006/cbmr.1996.0014
Crapse, T. B., & Sommer, M. A. (2012). Frontal eye field neurons assess visual stability across saccades. Journal of Neuroscience, 32(8), 2835–2845. https://doi.org/10.1523/JNEUROSCI.1320-11.2012
Crespi, S., Biagi, L., d’Avossa, G., Burr, D. C., Tosetti, M., & Morrone, M. C. (2011). Spatiotopic coding of BOLD signal in human visual cortex depends on spatial attention. PLoS ONE, 6(7), e21661. https://doi.org/10.1371/journal.pone.0021661
Cross, E. S., MacKie, E. C., Wolford, G., & Antonia, A. F. (2010). Contorted and ordinary body postures in the human brain. Experimental Brain Research, 204(3), 397–407. https://doi.org/10.1007/s00221-009-2093-x
D’Avossa, G., Tosetti, M., Crespi, S., Biagi, L., Burr, D. C., & Morrone, M. C. (2007). Spatiotopic selectivity of BOLD responses to visual motion in human area MT. Nature Neuroscience, 10(2), 249–255. https://doi.org/10.1038/nn1824
De Martino, F., Moerel, M., Ugurbil, K., Goebel, R., Yacoub, E., & Formisano, E. (2015). Frequency preference and attention effects across cortical depths in the human primary auditory cortex. Proceedings of the National Academy of Sciences of the United States of America, 112(52), 16036–16041. https://doi.org/10.1073/pnas.1507552112
De Sousa, A. A., Sherwood, C. C., Schleicher, A., Amunts, K., MacLeod, C. E., Hof, P. R., & Zilles, K. (2010). Comparative cytoarchitectural analyses of striate and extrastriate areas in hominoids. Cerebral Cortex, 20(4), 966–981. https://doi.org/10.1093/cercor/bhp158
Demeyer, M., De Graef, P., Wagemans, J., & Verfaillie, K. (2009). Transsaccadic identification of highly similar artificial shapes. Journal of Vision, 9(4), 28. https://doi.org/10.1167/9.4.28
Desikan, R. S., Ségonne, F., Fischl, B., Quinn, B. T., Dickerson, B. C., Blacker, D., … Killiany, R. J. (2006).
An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. NeuroImage, 31(3), 968–980. https://doi.org/10.1016/j.neuroimage.2006.01.021
Downing, P., & Kanwisher, N. (2001). A cortical area specialized for visual processing of the human body. Journal of Vision, 1(3), 2470–2473. https://doi.org/10.1167/1.3.341
Duhamel, J. R., Bremmer, F., BenHamed, S., & Graf, W. (1997). Spatial invariance of visual receptive fields in parietal cortex neurons. Nature, 389(6653), 845–848. https://doi.org/10.1038/39865
Duhamel, J. R., Colby, C. L., & Goldberg, M. E. (1992). The updating of the representation of visual space in parietal cortex by intended eye movements. Science, 255(5040), 90–92. https://doi.org/10.1126/science.1553535
Dunkley, B. T., Baltaretu, B., & Crawford, J. D. (2016). Trans-saccadic interactions in human parietal and occipital cortex during the retention and comparison of object orientation. Cortex, 82, 263–276. https://doi.org/10.1016/j.cortex.2016.06.012
Engel, S. A., Glover, G. H., & Wandell, B. A. (1997). Retinotopic organization in human visual cortex and the spatial precision of functional MRI. Cerebral Cortex, 7(2), 181–192. https://doi.org/10.1093/cercor/7.2.181
Fabius, J. H., Fracasso, A., Nijboer, T. C. W., & Van Der Stigchel, S. (2019). Time course of spatiotopic updating across saccades. Proceedings of the National Academy of Sciences of the United States of America, 116(6), 2027–2032. https://doi.org/10.1073/pnas.1812210116
Fabius, J. H., Fracasso, A., & Van Der Stigchel, S. (2016). Spatiotopic updating facilitates perception immediately after saccades. Scientific Reports, 6(1), 1–11. https://doi.org/10.1038/srep34488
Fairhall, S. L., Schwarzbach, J., Lingnau, A., Van Koningsbruggen, M. G., & Melcher, D. (2017). Spatiotopic updating across saccades revealed by spatially-specific fMRI adaptation. NeuroImage, 147, 339–345. https://doi.org/10.1016/j.neuroimage.2016.11.071
Fang, F., & He, S.
(2005). Cortical responses to invisible objects in the human dorsal and ventral pathways. Nature Neuroscience, 8(10), 1380–1385. https://doi.org/10.1038/nn1537
Fang, F., Murray, S. O., Kersten, D., & He, S. (2005). Orientation-tuned fMRI adaptation in human visual cortex. Journal of Neurophysiology, 94(6), 4188–4195. https://doi.org/10.1152/jn.00378.2005
Felleman, D. J., & Van Essen, D. C. (1991). Distributed hierarchical processing in the primate cerebral cortex. Cerebral Cortex, 1(1), 1–47. https://doi.org/10.1093/cercor/1.1.1-a
Forte, J. D., & Clifford, C. W. G. (2005). Inter-ocular transfer of the tilt illusion shows that monocular orientation mechanisms are colour selective. Vision Research, 45(20), 2715–2721. https://doi.org/10.1016/j.visres.2005.05.001
Fracasso, A., Caramazza, A., & Melcher, D. (2010). Continuous perception of motion and shape across saccadic eye movements. Journal of Vision, 10(13), 14. https://doi.org/10.1167/10.13.14
Fukiage, T., & Murakami, I. (2010). The tilt aftereffect occurs independently of the flash-lag effect. Vision Research, 50(19), 1949–1956. https://doi.org/10.1016/j.visres.2010.07.002
Fukiage, T., & Murakami, I. (2013). Adaptation to a spatial offset occurs independently of the flash-drag effect. Journal of Vision, 13(2), 7. https://doi.org/10.1167/13.2.7
Ganmor, E., Landy, M. S., & Simoncelli, E. P. (2015). Near-optimal integration of orientation information across saccades. Journal of Vision, 15(16), 8. https://doi.org/10.1167/15.16.8
Gardner, J. L., Merriam, E. P., Movshon, J. A., & Heeger, D. J. (2008). Maps of visual space in human occipital cortex are retinotopic, not spatiotopic. Journal of Neuroscience, 28(15), 3988–3999. https://doi.org/10.1523/JNEUROSCI.5476-07.2008
Georgeson, M. (2004). Visual aftereffects: Cortical neurons change their tune. Current Biology, 14(18), R751–R753. https://doi.org/10.1016/j.cub.2004.09.011
Gibson, J. J., & Radner, M. (1937).
Adaptation, after-effect and contrast in the perception of tilted lines. Journal of Experimental Psychology, 20(5), 453–467. https://doi.org/10.1037/h0059826
Gilbert, C. D., & Li, W. (2013). Top-down influences on visual processing. Nature Reviews Neuroscience, 14(5), 350–363. https://doi.org/10.1038/nrn3476
Goddard, E., Solomon, S., & Clifford, C. (2010). Adaptable mechanisms sensitive to surface color in human vision. Journal of Vision, 10(9), 17. https://doi.org/10.1167/10.9.17
Goebel, R., Esposito, F., & Formisano, E. (2006). Analysis of Functional Image Analysis Contest (FIAC) data with BrainVoyager QX: From single-subject to cortically aligned group General Linear Model analysis and self-organizing group Independent Component Analysis. Human Brain Mapping, 27(5), 392–401. https://doi.org/10.1002/hbm.20249
Golomb, J. D., Marino, A. C., Chun, M. M., & Mazer, J. A. (2011). Attention doesn’t slide: Spatiotopic updating after eye movements instantiates a new, discrete attentional locus. Attention, Perception, and Psychophysics, 73(1), 7–14. https://doi.org/10.3758/s13414-010-0016-3
Golomb, J. D., Nguyen-Phuc, A. Y., Mazer, J. A., McCarthy, G., & Chun, M. M. (2010). Attentional facilitation throughout human visual cortex lingers in retinotopic coordinates after eye movements. Journal of Neuroscience, 30(31), 10493–10506. https://doi.org/10.1523/JNEUROSCI.1546-10.2010
Golomb, J. D., Pulido, V. Z., Albrecht, A. R., Chun, M. M., & Mazer, J. A. (2010). Robustness of the retinotopic attentional trace after eye movements. Journal of Vision, 10(3), 1–12. https://doi.org/10.1167/10.3.19
Gramfort, A., Luessi, M., Larson, E., Engemann, D. A., Strohmeier, D., Brodbeck, C., … Hämäläinen, M. (2013). MEG and EEG data analysis with MNE-Python. Frontiers in Neuroscience, 7(7 DEC), 267. https://doi.org/10.3389/fnins.2013.00267
Grèzes, J., & Decety, J. (2000). Functional anatomy of execution, mental simulation, observation, and verb generation of actions: A meta-analysis.
Human Brain Mapping, 12(1), 1–19. https://doi.org/10.1002/1097-0193(200101)12:1<1::AID-HBM10>3.0.CO;2-V
Grootswagers, T., Wardle, S. G., & Carlson, T. A. (2017). Decoding dynamic brain patterns from evoked responses: A tutorial on multivariate pattern analysis applied to time series neuroimaging data. Journal of Cognitive Neuroscience, 29(4), 677–697. https://doi.org/10.1162/jocn_a_01068
Hasson, U., & Honey, C. J. (2012). Future trends in Neuroimaging: Neural processes as expressed within real-life contexts. NeuroImage, 62(2), 1272–1278. https://doi.org/10.1016/j.neuroimage.2012.02.004
He, D., Mo, C., & Fang, F. (2017). Predictive feature remapping before saccadic eye movements. Journal of Vision, 17(5), 14. https://doi.org/10.1167/17.5.14
He, S., Cavanagh, P., & Intriligator, J. (1996). Attentional resolution and the locus of visual awareness. Nature, 383(6598), 334–337. https://doi.org/10.1038/383334a0
He, S., & MacLeod, D. I. A. (2001). Orientation-selective adaptation and tilt after-effect from invisible patterns. Nature, 411(6836), 473–476. https://doi.org/10.1038/35078072
He, T., Fritsche, M., & de Lange, F. P. (2018). Predictive remapping of visual features beyond saccadic targets. BioRxiv, 18(13), 20. https://doi.org/10.1101/297481
Hiebert, E. N. (1996). Science and Culture: Popular and Philosophical Essays. Hermann von Helmholtz, David Cahan. Isis (Vol. 87). University of Chicago Press. https://doi.org/10.1086/357539
Hodzic, A., Kaas, A., Muckli, L., Stirn, A., & Singer, W. (2009). Distinct cortical networks for the detection and identification of human body. NeuroImage, 45(4), 1264–1271. https://doi.org/10.1016/j.neuroimage.2009.01.027
Hogendoorn, H., Verstraten, F. A. J., & Cavanagh, P. (2015). Strikingly rapid neural basis of motion-induced position shifts revealed by high temporal-resolution EEG pattern classification. Vision Research, 113(PA), 1–10. https://doi.org/10.1016/j.visres.2015.05.005
Huynh, D. Q. (2009).
Metrics for 3D rotations: Comparison and analysis. Journal of Mathematical Imaging and Vision, 35(2), 155–164. https://doi.org/10.1007/s10851-009-0161-2
Isik, L., Koldewyn, K., Beeler, D., & Kanwisher, N. (2017). Perceiving social interactions in the posterior superior temporal sulcus. Proceedings of the National Academy of Sciences of the United States of America, 114(43), E9145–E9152. https://doi.org/10.1073/pnas.1714471114
Jiang, Y., Costello, P., Fang, F., Huang, M., & He, S. (2006). A gender- and sexual orientation-dependent spatial attentional effect of invisible images. Proceedings of the National Academy of Sciences of the United States of America, 103(45), 17048–17052. https://doi.org/10.1073/pnas.0605678103
Jin, D. Z., Dragoi, V., Sur, M., & Seung, H. S. (2005). Tilt aftereffect and adaptation-induced changes in orientation tuning in visual cortex. Journal of Neurophysiology, 94(6), 4038–4050. https://doi.org/10.1152/jn.00571.2004
Kanazawa, A., Black, M. J., Jacobs, D. W., & Malik, J. (2018). End-to-End Recovery of Human Shape and Pose. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (pp. 7122–7131). https://doi.org/10.1109/CVPR.2018.00744
Kaunitz, L., Fracasso, A., & Melcher, D. (2011). Unseen complex motion is modulated by attention and generates a visible aftereffect. Journal of Vision, 11(13), 10. https://doi.org/10.1167/11.13.10
Kim, C. Y., & Blake, R. (2005). Psychophysical magic: Rendering the visible “invisible.” Trends in Cognitive Sciences, 9(8), 381–388. https://doi.org/10.1016/j.tics.2005.06.012
Klein, B. P., Fracasso, A., van Dijk, J. A., Paffen, C. L. E., te Pas, S. F., & Dumoulin, S. O. (2018). Cortical depth dependent population receptive field attraction by spatial attention in human V1. NeuroImage, 176, 301–312. https://doi.org/10.1016/j.neuroimage.2018.04.055
Kohler, P. J., Cavanagh, P., & Tse, P. U. (2017). Motion-induced position shifts activate early visual cortex.
Frontiers in Neuroscience, 11(APR), 168. https://doi.org/10.3389/fnins.2017.00168
Kohn, A. (2007). Visual adaptation: Physiology, mechanisms, and functional benefits. Journal of Neurophysiology, 97(5), 3155–3164. https://doi.org/10.1152/jn.00086.2007
Kohn, A., & Movshon, J. A. (2003). Neuronal adaptation to visual motion in area MT of the macaque. Neuron, 39(4), 681–691. https://doi.org/10.1016/S0896-6273(03)00438-0
Kok, P., Bains, L. J., Van Mourik, T., Norris, D. G., & De Lange, F. P. (2016). Selective activation of the deep layers of the human primary visual cortex by top-down feedback. Current Biology, 26(3), 371–376. https://doi.org/10.1016/j.cub.2015.12.038
Kosovicheva, A. A., Maus, G. W., Anstis, S., Cavanagh, P., Tse, P. U., & Whitney, D. (2012). The motion-induced shift in the perceived location of a grating also shifts its aftereffect. Journal of Vision, 12(8), 1–4. https://doi.org/10.1167/12.8.7
Krauskopf, J., & Zaidi, Q. (1986). Induced desensitization. Vision Research, 26(5), 759–762. https://doi.org/10.1016/0042-6989(86)90090-8
Kriegeskorte, N., Goebel, R., & Bandettini, P. (2006). Information-based functional brain mapping. Proceedings of the National Academy of Sciences of the United States of America, 103(10), 3863–3868. https://doi.org/10.1073/pnas.0600244103
Kriegeskorte, N., Mur, M., & Bandettini, P. (2008). Representational similarity analysis - connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience, 2(NOV), 4. https://doi.org/10.3389/neuro.06.004.2008
Kveraga, K., Ghuman, A. S., & Bar, M. (2007). Top-down predictions in the cognitive brain. Brain and Cognition, 65(2), 145–168. https://doi.org/10.1016/j.bandc.2007.06.007
Lamme, V. A. F., & Roelfsema, P. R. (2000). The distinct modes of vision offered by feedforward and recurrent processing. Trends in Neurosciences, 23(11), 571–579. https://doi.org/10.1016/S0166-2236(00)01657-X
Lin, T. Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., … Zitnick, C.
L. (2014). Microsoft COCO: Common objects in context. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8693 LNCS, pp. 740–755). Springer. https://doi.org/10.1007/978-3-319-10602-1_48
Lin, Z., & He, S. (2009). Seeing the invisible: The scope and limits of unconscious processing in binocular rivalry. Progress in Neurobiology, 87(4), 195–211. https://doi.org/10.1016/j.pneurobio.2008.09.002
Liu, T., Heeger, D. J., & Carrasco, M. (2006). Neural correlates of the visual vertical meridian asymmetry. Journal of Vision, 6(11), 12. https://doi.org/10.1167/6.11.12
Liu, T., Larsson, J., & Carrasco, M. (2007). Feature-Based Attention Modulates Orientation-Selective Responses in Human Visual Cortex. Neuron, 55(2), 313–323. https://doi.org/10.1016/j.neuron.2007.06.030
Loper, M., Mahmood, N., Romero, J., Pons-Moll, G., & Black, M. J. (2015). SMPL: A skinned multi-person linear model. ACM Transactions on Graphics, 34(6), 1–16. https://doi.org/10.1145/2816795.2818013
Maris, E., & Oostenveld, R. (2007). Nonparametric statistical testing of EEG- and MEG-data. Journal of Neuroscience Methods, 164(1), 177–190. https://doi.org/10.1016/j.jneumeth.2007.03.024
Mathôt, S., & Theeuwes, J. (2010). Gradual remapping results in early retinotopic and late spatiotopic inhibition of return. Psychological Science, 21(12), 1793–1798. https://doi.org/10.1177/0956797610388813
Melcher, D. (2005). Spatiotopic transfer of visual-form adaptation across saccadic eye movements. Current Biology, 15(19), 1745–1748. https://doi.org/10.1016/j.cub.2005.08.044
Melcher, D. (2008). Dynamic, object-based remapping of visual features in trans-saccadic perception. Journal of Vision, 8(14), 2. https://doi.org/10.1167/8.14.2
Melcher, D. (2009). Selective attention and the active remapping of object features in trans-saccadic perception. Vision Research, 49(10), 1249–1255. https://doi.org/10.1016/j.visres.2008.03.014
Melcher, D. (2011). Visual stability.
Philosophical Transactions of the Royal Society B: Biological Sciences. The Royal Society. https://doi.org/10.1098/rstb.2010.0277
Melcher, D., & Colby, C. L. (2008). Trans-saccadic perception. Trends in Cognitive Sciences, 12(12), 466–473.
Melcher, D., & Fracasso, A. (2012). Remapping of the line motion illusion across eye movements. Experimental Brain Research, 218(4), 503–514. https://doi.org/10.1007/s00221-012-3043-6
Melcher, D., & Morrone, C. (2003). Spatiotopic temporal integration of motion across saccades. Journal of Vision, 3(9), 877–881. https://doi.org/10.1167/3.9.172
Merriam, E. P., Gardner, J. L., Movshon, J. A., & Heeger, D. J. (2013). Modulation of visual responses by gaze direction in human visual cortex. Journal of Neuroscience, 33(24), 9879–9889. https://doi.org/10.1523/JNEUROSCI.0500-12.2013
Michelson, A. A. (1995). Studies in optics. Dover Books on Physics. Courier Corporation.
Mohr, H. M., Linder, N. S., Dennis, H., & Sireteanu, R. (2011). Orientation-specific aftereffects to mentally generated lines. Perception, 40(3), 272–290. https://doi.org/10.1068/p6781
Mohr, H. M., Linder, N. S., Linden, D. E. J., Kaiser, J., & Sireteanu, R. (2009). Orientation-specific adaptation to mentally generated lines in human visual cortex. NeuroImage, 47(1), 384–391. https://doi.org/10.1016/j.neuroimage.2009.03.045
Morrone, M. C., Cicchini, M., & Burr, D. C. (2010). Spatial maps for time and motion. Experimental Brain Research, 206(2), 121–128. https://doi.org/10.1007/s00221-010-2334-z
Muckli, L., De Martino, F., Vizioli, L., Petro, L. S., Smith, F. W., Ugurbil, K., … Yacoub, E. (2015). Contextual Feedback to Superficial Layers of V1. Current Biology, 25(20), 2690–2695. https://doi.org/10.1016/j.cub.2015.08.057
Muckli, L., Kohler, A., Kriegeskorte, N., & Singer, W. (2005). Primary visual cortex activity along the apparent-motion trace reflects illusory perception. PLoS Biology, 3(8), e265.
https://doi.org/10.1371/journal.pbio.0030265
Murray, S. O., Boyaci, H., & Kersten, D. (2006). The representation of perceived angular size in human primary visual cortex. Nature Neuroscience, 9(3), 429–434. https://doi.org/10.1038/nn1641
Nakashima, Y., & Sugita, Y. (2017). The reference frame of the tilt aftereffect measured by differential Pavlovian conditioning. Scientific Reports, 7(1), 1–11. https://doi.org/10.1038/srep40525
Nandy, A. S., Sharpee, T. O., Reynolds, J. H., & Mitchell, J. F. (2013). The Fine Structure of Shape Tuning in Area V4. Neuron, 78(6), 1102–1115. https://doi.org/10.1016/j.neuron.2013.04.016
Nelken, I. (2004). Processing of complex stimuli and natural scenes in the auditory cortex. Current Opinion in Neurobiology, 14(4), 474–480. https://doi.org/10.1016/j.conb.2004.06.005
Nichols, T., & Holmes, A. (2003). Nonparametric Permutation Tests for Functional Neuroimaging. Human Brain Function: Second Edition, 15(1), 887–910. https://doi.org/10.1016/B978-012264841-0/50048-2
Noudoost, B., Chang, M. H., Steinmetz, N. A., & Moore, T. (2010). Top-down control of visual attention. Current Opinion in Neurobiology, 20(2), 183–190. https://doi.org/10.1016/j.conb.2010.02.003
Oosterhof, N. N., Connolly, A. C., & Haxby, J. V. (2016). CoSMoMVPA: Multi-modal multivariate pattern analysis of neuroimaging data in Matlab/GNU Octave. Frontiers in Neuroinformatics, 10(JUL), 27. https://doi.org/10.3389/fninf.2016.00027
Orlov, T., Makin, T. R., & Zohary, E. (2010). Topographic Representation of the Human Body in the Occipitotemporal Cortex. Neuron, 68(3), 586–600. https://doi.org/10.1016/j.neuron.2010.09.032
Pedregosa, F., Varoquaux, G., Buitinck, L., Louppe, G., Grisel, O., & Mueller, A. (2015). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12, 19(1), 29–33.
Peelen, M. V., & Downing, P. E. (2005). Selectivity for the human body in the fusiform gyrus. Journal of Neurophysiology, 93(1), 603–608.
https://doi.org/10.1152/jn.00513.2004
Pelli, D. G. (1997). The VideoToolbox software for visual psychophysics: Transforming numbers into movies. Spatial Vision, 10(4), 437–442. https://doi.org/10.1163/156856897X00366
Pelphrey, K. A., Mitchell, T. V., McKeown, M. J., Goldstein, J., Allison, T., & McCarthy, G. (2003). Brain activity evoked by the perception of human walking: Controlling for meaningful coherent motion. Journal of Neuroscience, 23(17), 6819–6825. https://doi.org/10.1523/jneurosci.23-17-06819.2003
Pizlo, Z. (2001). Perception viewed as an inverse problem. Vision Research, 41(24), 3145–3161. https://doi.org/10.1016/S0042-6989(01)00173-0
Ro, T., Breitmeyer, B., Burton, P., Singhal, N. S., & Lane, D. (2003). Feedback contributions to visual awareness in human occipital cortex. Current Biology, 13(12), 1038–1041. https://doi.org/10.1016/S0960-9822(03)00337-3
Rushton, W. A. H. (1965). The Ferrier Lecture, 1962 Visual adaptation. Proceedings of the Royal Society of London. Series B. Biological Sciences, 162(986), 20–46. https://doi.org/10.1098/rspb.1965.0024
Saxe, R., Xiao, D. K., Kovacs, G., Perrett, D. I., & Kanwisher, N. (2004). A region of right posterior superior temporal sulcus responds to observed intentional actions. Neuropsychologia, 42(11), 1435–1446. https://doi.org/10.1016/j.neuropsychologia.2004.04.015
Schütt, H. H., Harmeling, S., Macke, J. H., & Wichmann, F. A. (2016). Painfree and accurate Bayesian estimation of psychometric functions for (potentially) overdispersed data. Vision Research, 122, 105–123. https://doi.org/10.1016/j.visres.2016.02.002
Schwartz, O., Hsu, A., & Dayan, P. (2007). Space and time in visual context. Nature Reviews Neuroscience, 8(7), 522–535. https://doi.org/10.1038/nrn2155
Sekunova, A., Black, M., Parkinson, L., & Barton, J. J. S. (2013). Viewpoint and pose in body-form adaptation. Perception, 42(2), 176–186. https://doi.org/10.1068/p7265
Self, M. W., van Kerkoerle, T., Goebel, R., & Roelfsema, P. R. (2019).
Benchmarking laminar fMRI: Neuronal spiking and synaptic activity during top-down and bottom-up processing in the different layers of cortex. NeuroImage, 197, 806–817. https://doi.org/10.1016/j.neuroimage.2017.06.045
Self, M. W., van Kerkoerle, T., Supèr, H., & Roelfsema, P. R. (2013). Distinct Roles of the Cortical Layers of Area V1 in Figure-Ground Segregation. Current Biology, 23(21), 2121–2129. https://doi.org/10.1016/j.cub.2013.09.013
Solomon, S. G., & Kohn, A. (2014). Moving sensory adaptation beyond suppressive effects in single neurons. Current Biology, 24(20), R1012–R1022. https://doi.org/10.1016/j.cub.2014.09.001
Stein, T., & Sterzer, P. (2014). Unconscious processing under interocular suppression: Getting the right measure. Frontiers in Psychology, 5(MAY), 387. https://doi.org/10.3389/fpsyg.2014.00387
Sterzer, P., Stein, T., Ludwig, K., Rothkirch, M., & Hesselmann, G. (2014). Neural processing of visual information under interocular suppression: A critical review. Frontiers in Psychology, 5(MAY), 453. https://doi.org/10.3389/fpsyg.2014.00453
Szinte, M., Jonikaitis, D., Rangelov, D., & Deubel, H. (2018). Pre-saccadic remapping relies on dynamics of spatial attention. Elife, 7, e37598.
Thompson, P., & Burr, D. (2009). Visual aftereffects. Current Biology, 19(1), R11–R14. https://doi.org/10.1016/j.cub.2008.10.014
Tolias, A. S., Moore, T., Smirnakis, S. M., Tehovnik, E. J., Siapas, A. G., & Schiller, P. H. (2001). Eye movements modulate visual receptive fields of V4 neurons. Neuron, 29(3), 757–767. https://doi.org/10.1016/S0896-6273(01)00250-1
Tsuchiya, N., & Koch, C. (2005). Continuous flash suppression reduces negative afterimages. Nature Neuroscience, 8(8), 1096–1101. https://doi.org/10.1038/nn1500
Turi, M., & Burr, D. (2012). Spatiotopic perceptual maps in humans: Evidence from motion adaptation. Proceedings of the Royal Society B: Biological Sciences, 279(1740), 3091–3097.
https://doi.org/10.1098/rspb.2012.0637
Urgesi, C., Candidi, M., Ionta, S., & Aglioti, S. M. (2007). Representation of body identity and body actions in extrastriate body area and ventral premotor cortex. Nature Neuroscience, 10(1), 30–31. https://doi.org/10.1038/nn1815
Van de Moortele, P. F., Auerbach, E. J., Olman, C., Yacoub, E., Uǧurbil, K., & Moeller, S. (2009). T1 weighted brain images at 7 Tesla unbiased for Proton Density, T2* contrast and RF coil receive B1 sensitivity with simultaneous vessel visualization. NeuroImage, 46(2), 432–446. https://doi.org/10.1016/j.neuroimage.2009.02.009
van Kerkoerle, T., Self, M. W., & Roelfsema, P. R. (2017). Erratum: Layer-specificity in the effects of attention and working memory on activity in primary visual cortex. Nature Communications, 8(1), 15555. https://doi.org/10.1038/ncomms15555
Wagstyl, K., Lepage, C., Bludau, S., Zilles, K., Fletcher, P. C., Amunts, K., & Evans, A. C. (2018). Mapping cortical laminar structure in the 3D bigbrain. Cerebral Cortex, 28(7), 2551–2562. https://doi.org/10.1093/cercor/bhy074
Wang, C., Wang, Y., Lin, Z., & Yuille, A. L. (2019). Robust 3D human pose estimation from single images or video sequences. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(5), 1227–1241. https://doi.org/10.1109/TPAMI.2018.2828427
Wang, C., Wang, Y., & Yuille, A. L. (2013). An approach to pose-based action recognition. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (pp. 915–922). https://doi.org/10.1109/CVPR.2013.123
Wolfe, B. A., & Whitney, D. (2015). Saccadic remapping of object-selective information. Attention, Perception, and Psychophysics, 77(7), 2260–2269. https://doi.org/10.3758/s13414-015-0944-z
Wurtz, R. H., Joiner, W. M., & Berman, R. A. (2011). Neuronal mechanisms for visual stability: Progress and problems. Philosophical Transactions of the Royal Society B: Biological Sciences, 366(1564), 492–503.
https://doi.org/10.1098/rstb.2010.0186
Wutz, A., Drewes, J., & Melcher, D. (2016). Nonretinotopic perception of orientation: Temporal integration of basic features operates in object-based coordinates. Journal of Vision, 16(10), 3. https://doi.org/10.1167/16.10.3
Yacoob, Y., & Black, M. J. (1999). Parameterized Modeling and Recognition of Activities. Computer Vision and Image Understanding, 73(2), 232–247. https://doi.org/10.1006/cviu.1998.0726
Yang, E., Brascamp, J., Kang, M. S., & Blake, R. (2014). On the use of continuous flash suppression for the study of visual processing outside of awareness. Frontiers in Psychology, 5(JUL), 724. https://doi.org/10.3389/fpsyg.2014.00724
Yang, E., Hong, S. W., & Blake, R. (2010). Adaptation aftereffects to facial expressions suppressed from visual awareness. Journal of Vision, 10(12), 1–13. https://doi.org/10.1167/10.12.24
Zaidi, Q., & Sachtler, W. L. (1991). Motion adaptation from surrounding stimuli. Perception, 20(6), 703–714. https://doi.org/10.1068/p200703
Zimmermann, E., Morrone, M. C., & Burr, D. (2015). Visual mislocalization during saccade sequences. Experimental Brain Research, 233(2), 577–585.
Zimmermann, E., Morrone, M. C., & Burr, D. C. (2014). Buildup of spatial information over time and across eye-movements. Behavioural Brain Research, 275, 281–287. https://doi.org/10.1016/j.bbr.2014.09.013
Zimmermann, E., Morrone, M. C., Fink, G. R., & Burr, D. (2013). Spatiotopic neural representations develop slowly across saccades. Current Biology, 23(5), R193–R194. https://doi.org/10.1016/j.cub.2013.01.065
Zimmermann, E., Weidner, R., Abdollahi, R. O., & Fink, G. R. (2016). Spatiotopic adaptation in visual areas. Journal of Neuroscience, 36(37), 9526–9534. https://doi.org/10.1523/JNEUROSCI.0052-16.2016
Zimmermann, E., Weidner, R., & Fink, G. R. (2017). Spatiotopic updating of visual feature information. Journal of Vision, 17(12), 6. https://doi.org/10.1167/17.12.6
Zirnsak, M., Gerhards, R. G.
K., Kiani, R., Lappe, M., & Hamker, F. H. (2011). Anticipatory saccade target processing and the presaccadic transfer of visual features. Journal of Neuroscience, 31(49), 17887–17891. https://doi.org/10.1523/JNEUROSCI.2465-11.2011

Appendix 1: Supplemental Information for Chapter 2

Figure A1.1. Hemi-visual field fMRI response to the flash grab illusion. The upper row shows data from the upper visual field (ventral part of visual cortex); the lower row shows data from the lower visual field (dorsal part of visual cortex). Compared to the counter-clockwise illusion, the fMRI response to the clockwise tilted illusion was stronger in the left ventral and right dorsal visual cortex, but weaker in the left dorsal and right ventral visual cortex. The left column shows predictions for the illusory representation in the retinotopic visual cortex. The right three columns show the fMRI responses to the clockwise and counter-clockwise tilted illusions from different quadrants of the visual field in early visual cortices from V1 to V3. A three-way repeated measures ANOVA revealed a significant three-way interaction in V1 across dorsal/ventral, left/right hemisphere, and clockwise/counter-clockwise illusion (F(1, 8) = 38.11, p < 0.001). In the ventral part of V1 (corresponding to the upper visual field), compared to the counter-clockwise condition, the clockwise illusion produced stronger fMRI signals in the left hemisphere (corresponding to the right visual field) and a weaker response in the right hemisphere, resulting in a significant interaction between left/right hemisphere and clockwise/counter-clockwise illusion (F(1, 8) = 5.55, p = 0.046) in a two-way repeated measures ANOVA. The opposite was true for the dorsal part of V1: the BOLD response to the clockwise condition was weaker in the left hemisphere and stronger in the right hemisphere (F(1, 8) = 14.16, p = 0.006).
Similar results were found for V2 (V2_v/upper: F(1, 8) = 39.36, p < 0.001; V2_d/lower: F(1, 8) = 11.60, p = 0.009; three-way interaction: F(1, 8) = 39.83, p < 0.001), and V3 (V3_v/upper: F(1, 8) = 12.50, p = 0.077; V3_d/lower: F(1, 8) = 23.09, p = 0.001; three-way interaction: F(1, 8) = 28.56, p < 0.001). Simple effects of CW vs. CCW conditions (stars in the figure, two-sided paired t-test) were not further corrected beyond the protection of a significant ANOVA. The error bars indicate the standard error of the mean (n = 9 individuals). Source data are provided as a Source Data file.

[Figure A1.1 panels: Predictions (Ventral, Dorsal); V1_v, V2_v, V3_v (upper row); V1_d, V2_d, V3_d (lower row). Horizontal axis: hemisphere (RH, LH). Vertical axis: BOLD Signal Change (%), 0 to 8. Legend: Counter-Clockwise, Clockwise.]

Figure A1.2. Schematic diagram of stimuli and procedures for the 7T fMRI experiment. In each block, a red bar was repeatedly flashed at the reversal point of a pinwheel disc that rotated back and forth as a constant background for 12 seconds, alternating with a 12-second section in which only the rotating background was shown. The bar was perceived as tilted clockwise or counter-clockwise from the horizontal meridian, depending on the direction of the motion reversal. Red solid lines indicate the presented position of the bar, while red dotted lines illustrate the perceived position. The bar rotation covered a section of both left and right visual fields between 16 degrees clockwise and 15 degrees counter-clockwise from the horizontal meridian.

Figure A1.3. Layer-specific bar angle representation of the flash grab illusion in each retinotopic visual area.
For each layer of V1, V2, and V3, fMRI responses to the clockwise and counter-clockwise tilted illusions are plotted as a function of bar angle coordinates across the field of bar rotation. The red and blue curves represent mean retinotopic responses for the clockwise and counter-clockwise conditions across seventeen subjects. The shaded regions indicate the between-subject standard error. Source data are provided as a Source Data file.

Appendix 2: Supplemental Information for Chapter 3

Figure A2.1. Averaged horizontal eye position over the time course of the trials (time relative to the test probe onset, in ms) for six subjects, aligned with the midpoint of the saccade. (a) TAE without CFS condition; (c) TAE with CFS condition; (b) FGAE without CFS condition; and (d) FGAE with CFS condition. Different colored curves represent horizontal eye positions for each individual (N = 6). Dark gray horizontal bars represent the positions of the fixation point (at 0 degrees) and the saccade target (at 6 degrees). The light gray vertical bar represents the time course of the test presentation (0–100 ms).

Figure A2.2. Scatter plots of the results from the two sessions, with and without eye movement recording, for six participants (a, TAE; b, FGAE). Dependent-sample t-tests showed no significant differences between the two sessions in any condition for either TAE or FGAE (p > 0.05, Holm corrected).

Figure A2.3. Scatter plots for the two groups of participants (n = 12) with and without eye movement recording (a, TAE; b, FGAE). Independent-sample t-tests showed no significant differences between the two groups in any condition for either TAE or FGAE (p > 0.05, Holm corrected).

Figure A2.4. Normalized adaptation aftereffects (a, TAE; b, FGAE) for different conditions. The data were normalized against the NoCFS retinotopic condition in TAE and FGAE for each participant (dividing the aftereffect value by that in the NoCFS retinotopic condition).
Following the normalization, the pattern of results is similar to that of the main results (Figure 3). Error bars show ±1 SE of the mean. Multiple comparisons were Holm corrected (* adjusted p < 0.05; ** adjusted p < 0.01; *** adjusted p < 0.001).

Appendix 3: Supplemental Information for Chapter 4

Table A3.1. List of ROI activation for viewpoint RDM

| Abbreviation | Full ROI name | Num of voxels | Total num of voxels | Voxel percent in the ROI (%) |
|---|---|---|---|---|
| L lateraloccipital | Left lateral occipital cortex | 644 | 6379 | 10.09562627 |
| R lateraloccipital | Right lateral occipital cortex | 436 | 5963 | 7.311755828 |
| L lingual | Left lingual gyrus | 61 | 4205 | 1.450653983 |
| R fusiform | Right fusiform gyrus | 119 | 4661 | 2.553100193 |
| R parahippocampal | Right parahippocampal gyrus | 134 | 1742 | 7.692307692 |
| L inferiorparietal | Left inferior parietal cortex | 81 | 7871 | 1.029094143 |
| R inferiorparietal | Right inferior parietal cortex | 176 | 9676 | 1.818933444 |
| L superiorparietal | Left superior parietal cortex | 123 | 10456 | 1.176358072 |
| R superiorparietal | Right superior parietal cortex | 84 | 10222 | 0.821756995 |

Table A3.2. List of ROI activation for 3D viewpoint-independent pose RDM

| Abbreviation | Full ROI name | Num of voxels | Total num of voxels | Voxel percent in the ROI (%) |
|---|---|---|---|---|
| L lateraloccipital | Left lateral occipital cortex | 1863 | 6379 | 29.20520458 |
| R lateraloccipital | Right lateral occipital cortex | 1794 | 5963 | 30.08552742 |
| L lingual | Left lingual gyrus | 419 | 4205 | 9.964328181 |
| R lingual | Right lingual gyrus | 304 | 3894 | 7.806882383 |
| L parahippocampal | Left parahippocampal gyrus | 324 | 1838 | 17.62785637 |
| R parahippocampal | Right parahippocampal gyrus | 395 | 1742 | 22.67508611 |
| L fusiform | Left fusiform gyrus | 1114 | 4714 | 23.63173526 |
| R fusiform | Right fusiform gyrus | 1838 | 4661 | 39.43359794 |
| L inferiortemporal | Left inferior temporal gyrus | 330 | 4415 | 7.474518686 |
| R inferiortemporal | Right inferior temporal gyrus | 792 | 4198 | 18.86612673 |
| L middletemporal | Left middle temporal gyrus | 160 | 4452 | 3.593890386 |
| R middletemporal | Right middle temporal gyrus | 432 | 5057 | 8.542614198 |
| L superiortemporal | Left superior temporal gyrus | 81 | 7271 | 1.114014578 |
| R bankssts | Right banks of the superior temporal sulcus | 148 | 2196 | 6.739526412 |
| L supramarginal | Left supramarginal gyrus | 361 | 8600 | 4.197674419 |
| R supramarginal | Right supramarginal gyrus | 227 | 8150 | 2.785276074 |
| L inferiorparietal | Left inferior parietal cortex | 1223 | 7871 | 15.53805107 |
| R inferiorparietal | Right inferior parietal cortex | 912 | 9676 | 9.425382389 |
| L superiorparietal | Left superior parietal cortex | 698 | 10456 | 6.675592961 |
| R superiorparietal | Right superior parietal cortex | 748 | 10222 | 7.317550382 |
| R precuneus | Right precuneus cortex | 64 | 7975 | 0.802507837 |

Table A3.3. List of ROI activation for 3D viewpoint-dependent pose RDM

| Abbreviation | Full ROI name | Num of voxels | Total num of voxels | Voxel percent in the ROI (%) |
|---|---|---|---|---|
| L pericalcarine | Left pericalcarine cortex | 101 | 1912 | 5.282426778 |
| L lateraloccipital | Left lateral occipital cortex | 2095 | 6379 | 32.84213827 |
| R lateraloccipital | Right lateral occipital cortex | 1425 | 5963 | 23.8973671 |
| L lingual | Left lingual gyrus | 836 | 4205 | 19.88109394 |
| R lingual | Right lingual gyrus | 574 | 3894 | 14.74062661 |
| L parahippocampal | Left parahippocampal gyrus | 293 | 1838 | 15.94124048 |
| R parahippocampal | Right parahippocampal gyrus | 620 | 1742 | 35.5912744 |
| L fusiform | Left fusiform gyrus | 437 | 4714 | 9.270258804 |
| R fusiform | Right fusiform gyrus | 1028 | 4661 | 22.05535293 |
| L inferiortemporal | Left inferior temporal gyrus | 82 | 4415 | 1.857304643 |
| R inferiortemporal | Right inferior temporal gyrus | 746 | 4198 | 17.77036684 |
| R middletemporal | Right middle temporal gyrus | 505 | 5057 | 9.986157801 |
| L supramarginal | Left supramarginal gyrus | 271 | 8600 | 3.151162791 |
| R supramarginal | Right supramarginal gyrus | 101 | 8150 | 1.239263804 |
| L inferiorparietal | Left inferior parietal cortex | 1128 | 7871 | 14.33108881 |
| R inferiorparietal | Right inferior parietal cortex | 1376 | 9676 | 14.22075238 |
| L superiorparietal | Left superior parietal cortex | 800 | 10456 | 7.651109411 |
| R superiorparietal | Right superior parietal cortex | 1586 | 10222 | 15.51555469 |
| L precuneus | Left precuneus cortex | 206 | 7308 | 2.818828681 |
| R precuneus | Right precuneus cortex | 515 | 7975 | 6.457680251 |
| L postcentral | Left postcentral gyrus | 54 | 9519 | 0.56728648 |
| L paracentral | Left paracentral lobule | 132 | 3294 | 4.007285974 |
| L precentral | Left precentral gyrus | 159 | 10740 | 1.480446927 |
| L caudalmiddlefrontal | Left caudal middle frontal gyrus | 69 | 3736 | 1.846895075 |
| L superiorfrontal | Left superior frontal gyrus | 85 | 12179 | 0.697922654 |
| L isthmuscingulate | Left isthmus cingulate cortex | 60 | 2531 | 2.370604504 |
| R isthmuscingulate | Right isthmus cingulate cortex | 199 | 2388 | 8.333333333 |
| L posteriorcingulate | Left posterior cingulate cortex | 145 | 3266 | 4.439681568 |

Table A3.4. List of ROI activation for 2D viewpoint-dependent pose RDM

| Abbreviation | Full ROI name | Num of voxels | Total num of voxels | Voxel percent in the ROI (%) |
|---|---|---|---|---|
| L pericalcarine | Left pericalcarine cortex | 91 | 1912 | 4.759414226 |
| L lateraloccipital | Left lateral occipital cortex | 2093 | 6379 | 32.81078539 |
| R lateraloccipital | Right lateral occipital cortex | 1505 | 5963 | 25.23897367 |
| L lingual | Left lingual gyrus | 485 | 4205 | 11.53388823 |
| R lingual | Right lingual gyrus | 468 | 3894 | 12.01848998 |
| L parahippocampal | Left parahippocampal gyrus | 204 | 1838 | 11.09902067 |
| R parahippocampal | Right parahippocampal gyrus | 392 | 1742 | 22.50287026 |
| L fusiform | Left fusiform gyrus | 545 | 4714 | 11.56130675 |
| R fusiform | Right fusiform gyrus | 1138 | 4661 | 24.41536151 |
| L inferiortemporal | Left inferior temporal gyrus | 123 | 4415 | 2.785956965 |
| R inferiortemporal | Right inferior temporal gyrus | 857 | 4198 | 20.41448309 |
| L middletemporal | Left middle temporal gyrus | 61 | 4452 | 1.37017071 |
| R middletemporal | Right middle temporal gyrus | 571 | 5057 | 11.29127941 |
| L supramarginal | Left supramarginal gyrus | 318 | 8600 | 3.697674419 |
| R supramarginal | Right supramarginal gyrus | 102 | 8150 | 1.251533742 |
| L inferiorparietal | Left inferior parietal cortex | 995 | 7871 | 12.64134163 |
| R inferiorparietal | Right inferior parietal cortex | 1363 | 9676 | 14.08639934 |
| L superiorparietal | Left superior parietal cortex | 931 | 10456 | 8.903978577 |
| R superiorparietal | Right superior parietal cortex | 1590 | 10222 | 15.55468597 |
| L precuneus | Left precuneus cortex | 232 | 7308 | 3.174603175 |
| R precuneus | Right precuneus cortex | 642 | 7975 | 8.05015674 |
| L postcentral | Left postcentral gyrus | 60 | 9519 | 0.630318311 |
| L precentral | Left precentral gyrus | 104 | 10740 | 0.968342644 |
| R precentral | Right precentral gyrus | 64 | 10705 | 0.597851471 |
| L paracentral | Left paracentral lobule | 139 | 3294 | 4.219793564 |
| R paracentral | Right paracentral lobule | 128 | 3831 | 3.341164187 |
| L superiorfrontal | Left superior frontal gyrus | 130 | 12179 | 1.067411117 |
| L caudalmiddlefrontal | Left caudal middle frontal gyrus | 90 | 3736 | 2.408993576 |
| R isthmuscingulate | Right isthmus cingulate cortex | 184 | 2388 | 7.70519263 |
| L posteriorcingulate | Left posterior cingulate cortex | 195 | 3266 | 5.970606246 |

Figure A3.1. Group-level functional localizer results for body-, face-, and place-selective areas across all subjects (the color map threshold is 62.5%, i.e., five out of eight subjects). The body-selective areas include EBA and FBA. The face-selective areas include FFA and OFA. The place-selective areas include PPA, OPA, and RSC.
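
The "Voxel percent in the ROI" column in Tables A3.1 to A3.4 appears to be the number of significant voxels divided by the total number of voxels in the ROI, and the 62.5% color map threshold in Figure A3.1 corresponds to five of eight subjects. The following is a minimal Python sketch of that arithmetic under this assumed reading; the function names `voxel_percent` and `overlap_threshold` are illustrative and are not taken from the original analysis code.

```python
def voxel_percent(n_significant: int, n_total: int) -> float:
    """Percentage of an ROI's voxels that are significant (assumed definition)."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * n_significant / n_total


def overlap_threshold(n_required: int, n_subjects: int) -> float:
    """Percentage of subjects that must overlap at a voxel (assumed definition)."""
    if n_subjects <= 0:
        raise ValueError("n_subjects must be positive")
    return 100.0 * n_required / n_subjects


# First row of Table A3.1: 644 significant voxels out of 6379 in the ROI.
print(round(voxel_percent(644, 6379), 4))

# Figure A3.1 threshold: five out of eight subjects.
print(overlap_threshold(5, 8))
```

Applied to the first row of Table A3.1, the first call reproduces the tabulated value to the printed precision, which is consistent with the assumed definition of the column.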