

Cosyne 2009 Workshops


March 2-3, 2009

Snowbird, Utah


Workshop Title

The role of spatial context in biological and computational vision

Organizer(s)

Ruben Coen Cagli & Odelia Schwartz, Albert Einstein College of Medicine

Abstract

A number of mechanisms are in place in biological systems to narrow down the huge amount of visual information available to the eyes. Nevertheless, in normal circumstances, the visual input at a given point is richly influenced by its context. Spatial context in a visual scene is known to affect perception, from visual attention, to proficiency in tasks such as object search and recognition, to pop-out effects and spatial illusions. Furthermore, the activity of early visual neurons is influenced by contextual stimuli in the so-called non-classical receptive field (nCRF), leading to striking nonlinear behaviors. This wealth of phenomena has provided a source of inspiration for innovative computer vision techniques, and an important challenge for computational models of vision.

The aim of this workshop is to foster discussion on the above-mentioned topics, present recent experimental data and computational approaches, and outline future research directions. The workshop will provide the ideal context for researchers in a wide range of areas such as neurophysiology, psychophysics, computational modeling, and computer vision, to exchange ideas and interact.

Schedule

Morning Session (8.00-11.00 AM)

8.00-8.10 Introduction

8.10-8.40 Colin Clifford (University of Sydney)

8.40-9.10 Alessandra Angelucci (Utah)

9.10-9.20 Coffee Break

9.20-9.50 Valerio Mante (Stanford)

9.50-10.20 Sophie Deneve (Ecole Normale Supérieure, Paris)

10.20-10.50 Jozsef Fiser (Brandeis)


Afternoon Session (4.30-7.30 PM)

4.30-4.50 Ruben Coen Cagli (Albert Einstein College of Medicine)

4.50-5.20 John Reynolds (The Salk Institute)

5.20-5.50 Sarah Harrison (SUNY)

5.50-6.00 Coffee Break

6.00-6.30 Ruth Rosenholtz (MIT)

6.30-7.00 Laurent Itti (University of Southern California)

7.00-7.30 Nuno Vasconcelos (UCSD)


Speakers

Laurent Itti (University of Southern California) Integrating context to computational models of saliency-based attention. To cope with the complexity of the natural world, biological systems have evolved attentional strategies which rapidly focus processing resources on the most important and relevant aspects of the incoming sensory data. Here I will describe several exciting new research directions that study the joint stimulus-driven (or bottom-up) and contextual or goal-driven (or top-down) influences on attentional allocation. I will describe a new computational model which processes video inputs and predicts where observers look under different contextual and task conditions. I will discuss results of testing this model against human eye movement recordings over several hours of video stimuli.

Alessandra Angelucci (Utah) Contextual effects in primary visual cortex: pathways and mechanisms. The response of V1 neurons to oriented stimuli inside their receptive field (RF) is modulated by oriented stimuli in the RF surround. What circuitry and mechanisms generate orientation-specific surround modulation (SM)? We previously proposed that there are two components to the surround operating at different spatio-temporal scales: a near surround, subserved by feedforward and slow intra-areal horizontal (HZ) connections, and a far surround, subserved by divergent and fast inter-areal feedback (FB) connections. We provide anatomical evidence for two FB systems: one diffuse and unspecific terminating in upper layer 1, the other patchy and orientation-specific terminating in deeper V1 layers. To investigate the circuitry and mechanisms for SM, we generated an anatomically and physiologically constrained recurrent network model of macaque V1. In the model, HZ and FB connections generate contrast-dependent size tuning and near SM, while FB connections mediate far SM by targeting horizontally-projecting V1 cells. I will present experiments designed to test several of the model’s predictions. One such prediction, supported by our recent data, is that the far surround can be facilitatory when the RF is weakly stimulated, e.g. by a stimulus of low contrast or of sub-optimal orientation. We have also examined the orientation tuning of surround facilitation and suppression and found them both to be tuned to the orientation “seen”, not “preferred” by the RF, with suppression being strongest and facilitation weakest for iso-oriented center and surround stimuli. In a 2D version of our model this property emerges from the interaction of orientation-specific HZ (and/or FB) connections and strong recurrent connections. Iso-orientation SM independent of the input selectivity of V1 neurons may serve the perceptual purpose of enhancing V1 responses to local orientation contrast.


Colin Clifford (University of Sydney) Orientation-specific contextual modulation in human visual cortex. The responses of orientation-selective neurons in primate visual cortex can be profoundly affected by the presence and orientation of stimuli falling outside the classical receptive field. Our perception of the orientation of a line or grating also depends upon the context in which it is presented. For example, the perceived orientation of a grating embedded in a surround tends to be repelled from the predominant orientation of the surround. Here, we used fMRI to investigate orientation-specific spatial interactions in human visual cortex. Subjects’ brains (n=8) were scanned at 3T while performing a dimming task at fixation. Stimuli consisted of a ‘test’ annulus 2-3° in radius embedded in an ‘inducing’ region. Each 15-second stimulus block contained a succession of gratings presented for 0.75s at 20 different orientations. Parallel and orthogonal blocks differed in the relative orientation of the test and inducing gratings, but not the distribution of absolute orientations. When test and inducer stimuli were either both luminance gratings or both red-green chromatic modulations, significantly lower BOLD activation was observed across the early retinotopic areas of visual cortex in response to gratings with parallel versus orthogonal inducers. This difference increased up the visual hierarchy. Colour-luminance interactions were much less orientation-selective. This pattern of results indicates orientation-specific lateral interactions beyond V1 that are selective for the colour/luminance congruence of the test and inducer.

Valerio Mante (Stanford) Validating functional models of the visual system with natural stimuli. Functional models of the computations performed by the early visual system are largely based on neural responses to artificial stimuli such as bars, dots, and gratings. How well do these models explain neural responses to the complex image sequences occurring during natural vision? To address this question, we extended existing models of responses in the lateral geniculate nucleus (LGN) to operate on arbitrary movies. Our model is based on a linear receptive field that is shaped by fast adaptation mechanisms sensitive to luminance and contrast. The model can fit the responses to large sets of artificial stimuli and, with the same parameters, predicts the bulk of responses to natural stimuli. This shows that in the LGN the same functional mechanisms shape neural responses to both artificial and natural stimuli. Despite the relative simplicity of our model, large data sets are required to fit its parameters. It might thus be impractical to use our approach to validate more complex models in areas downstream of the LGN. Nonetheless, our results suggest that insights gained from artificial stimuli are invaluable for understanding natural vision throughout the visual system.

Sophie Deneve (Ecole Normale Supérieure, Paris) Divisive inhibition reshapes receptive fields to detect visual objects in their context. Experimental studies have demonstrated that stimuli outside the classical receptive field (cRF) of a visual neuron can nevertheless strongly modulate the cell's response to stimuli inside the cRF. The extent and influence of the cRFs and ncRFs, and the amplitude of the response, also vary as a function of contrast and time. In particular, these observations hint at a coarse-to-fine coding continuum as more information becomes available. Here we show that cRF and ncRF spatial properties, their temporal dynamics and contextual dependencies can emerge naturally from detecting objects in movies. This approach draws a parallel between the spatio-temporal statistics of the visual input and the plasticity and dynamics of a network of adapting integrate-and-fire neurons. In particular, divisive inhibition (as opposed to subtractive inhibition) emerges as an essential component of visual processing since it is necessary to solve ambiguities between different possible interpretations of the visual input.

Nuno Vasconcelos (UCSD) A decision theoretic view of contextual effects in perception. We present a new interpretation of neural computations as the implementation of optimal decision rules (in the decision-theoretic sense of Wald), for perceptual stimuli that follow statistical distributions commonly found in the natural world. Under this interpretation, contextual influences of stimuli that fall outside the classical receptive field (CRF) can be interpreted as measures of stimulus likelihood, under the null hypothesis of a Bayes optimal classifier. This has a number of interesting consequences for both the understanding of biological perception and the development of new algorithms for computer vision. We will review some examples in the areas of visual saliency, object recognition, and visual tracking.

Sarah Harrison (SUNY) Skeletal axis structure and segmentation of texture-defined shapes. We studied the relationship between texture orientation and shape skeletal axes in two tasks related to texture perception. Shape axes are frequently presented as a computational solution to the gap between local featural representations and globally integrated object representations, but psychophysical evidence of shape axes’ influence on perceptual processes is limited. Our first series of experiments investigated discrimination of texture-defined shapes. We found that alignment between texture orientation and the skeletal axis of a figural region improved the segmentation strength, as did a perpendicular arrangement to a lesser extent. We attribute this effect to the orientation of the skeletal axis itself, not the orientation of the figure edges; these two factors were deconfounded by the use of shapes whose contours undulated relative to the main axis orientation. Discrimination of multi-part shapes additionally showed that local alignment of texture with the axis of the enclosing part gave superior segmentation performance when compared to the classically optimal case of uniform texture orientation. A second series of experiments investigated sensitivity to changes in texture orientation within texture patches. Texture orientation discrimination was heightened when texture was aligned with the axis of the patch shape, demonstrating that the "axis effect" also affects the encoding of texture orientation. Taken together, these findings point to a broad role of skeletal axes in influencing the processes by which texture elements are aggregated to form the object itself. Our results therefore provide evidence for the psychological reality of such representations and, more generally, demonstrate an influence of global configuration on the aggregation of local texture elements.

Jozsef Fiser (Brandeis) The effect of context in coding orientation of natural scenes and learning descriptors of unknown visual scenes. In the first part of the talk, I will revisit the age-old question of how orientation information is encoded in human perception. I will present a novel image manipulation technique that allowed us to assess sensitivity to local orientation structure while subjects viewed natural object images rather than simple displays with a single grating. We found that in natural context, the visual system involuntarily discounts substantial levels of orientation noise until it exceeds levels that are considerably higher than the smallest orientation change that humans can discriminate for a single contour. Moreover, we found that the level of discounting depends on the complexity and familiarity of natural images. These results do not fit the classic view of orientation coding based exclusively on bottom-up orientation information, but can be readily interpreted by a model that encodes orientation in context by combining sensory information with expectations derived from earlier experiences. In the second part of the talk, I will present evidence that when visual scenes contain complex statistical structures, the process by which humans develop internal representations of these scenes is best captured by Bayesian model comparison which selects a minimally sufficient representation, and not by traditionally proposed hierarchical pairwise associative learning. Together these results suggest that under natural conditions, the complexity of the context forces both immediate coding of simple attributes and high-level learning of new representations to utilize sophisticated mechanisms that rely on instantly combining sensory input with accumulated past knowledge.

John Reynolds (The Salk Institute) The normalization model of attention and context-dependent attentional modulation. An emerging view of the attentional system is that feedback signals impinge on sensory processing areas so as to enhance attended stimuli at the expense of unattended stimuli. An important goal is to understand the nature of the circuitry that transforms these feedback signals into improved sensory processing. One model of this circuitry, the normalization model of attention (Reynolds & Chelazzi, 2004; Reynolds & Heeger, 2009), posits that the brain has co-opted gain-control circuits that may originally have evolved to mediate context-dependent forms of response modulation such as surround suppression and to adapt sensory processing to changes in the strength of sensory input. According to this proposal, attentional feedback signals scale the inputs to normalization circuits in primary and extrastriate visual cortices. I will describe experiments designed to test predictions of the normalization model of attention. When a stimulus in a neuron's classical receptive field is paired with a second stimulus in the suppressive surround, the model predicts that directing attention to the stimulus in the center should diminish surround suppression, while directing attention to the surround stimulus should increase surround suppression. The model further predicts that attentional modulation will be stronger in the presence than in the absence of a suppressive surround stimulus. We tested these predictions in macaque area V4 and found clear support for them. These findings demonstrate that attention modulates the neural mechanisms that give rise to center-surround interactions, and provide support for the normalization model of attention.
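The three predictions listed in this abstract can be illustrated with a scalar caricature of divisive normalization. The sketch below is not the published model: the function names, the single-number drives, and the parameter values (`sigma`, `surround_weight`, an attentional gain of 2) are all assumptions chosen for illustration. Attention multiplies a stimulus drive before normalization, and the surround contributes only to the divisive pool.

```python
def normalized_response(center, surround, attn_center=1.0, attn_surround=1.0,
                        sigma=0.5, surround_weight=1.0):
    """Scalar caricature of divisive normalization with attentional gain.

    Attention scales each stimulus drive before normalization. The center
    drive is both excitatory and part of the suppressive pool; the surround
    drive enters the suppressive pool only. All parameters are hypothetical.
    """
    excitatory = attn_center * center
    suppressive = excitatory + surround_weight * attn_surround * surround
    return excitatory / (sigma + suppressive)


def suppression_index(**attention):
    """Fractional drop in response when a surround is added to the center."""
    alone = normalized_response(1.0, 0.0, **attention)
    paired = normalized_response(1.0, 1.0, **attention)
    return 1.0 - paired / alone


# Surround suppression under three attentional states.
neutral = suppression_index()
attend_center = suppression_index(attn_center=2.0)
attend_surround = suppression_index(attn_surround=2.0)

# Attentional modulation (attend-center response / neutral response),
# measured without and with a surround stimulus.
gain_without_surround = (normalized_response(1.0, 0.0, attn_center=2.0)
                         / normalized_response(1.0, 0.0))
gain_with_surround = (normalized_response(1.0, 1.0, attn_center=2.0)
                      / normalized_response(1.0, 1.0))

print(attend_center, neutral, attend_surround)
print(gain_without_surround, gain_with_surround)
```

With these assumed values, the suppression index is smallest when attention is on the center and largest when it is on the surround, and the attentional gain is larger when the surround is present, which matches the qualitative pattern described in the abstract.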

Ruth Rosenholtz (MIT) The visual system as statistician: Do contextual influences arise (in part) because the brain is collecting statistics? The visual system must make tradeoffs between the specificity of its responses and invariance to irrelevant variation. For example, ideally some part of our visual systems should respond not only to an "F" at a particular location, but also to "F"s at neighboring locations. This might be accomplished by detecting invariant features (e.g. the co-occurrence of a horizontal and vertical bar, for the "F"), followed by "pooling" over the region of location invariance. In normal object recognition, with a small pooling region, this operation appears to find co-occurrences. With a larger pooling region over a repeated pattern, this operation resembles instead a statistical correlation, and could be used for texture analysis. Visual crowding, in which neighboring items impair the recognition of a target, may be one "cost" of this statistical computation over "too large" a region. The lack of focal attention may also lead to large pooling regions, resulting in statistical-like computations and their associated costs and benefits. I will present recent work testing the feasibility of this statistical view of early visual information processing.

Ruben Coen Cagli (Albert Einstein College of Medicine) Generative modeling of natural scene statistics and spatial context effects. We consider contextual surround effects from the point of view of a well-founded model of natural scene statistics. We first review a class of generative models known as the Gaussian Scale Mixture (GSM) model, which can capture the statistical coordination amongst linear oriented filters, and is closely related to divisive gain control. We then focus on two issues in the model: a) which surround filters are statistically coordinated with a given center filter and therefore subject to a common divisive gain control; b) how surround filters that are in the gain pool can have either suppressive or facilitatory influence, depending on their covariance with the center filter. We discuss how the extended model can be applied to reproduce surround modulation in visual cortical neurons in response to naturalistic stimuli, as well as perceptual effects such as saliency pop-out and grouping.
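The link between the GSM and divisive gain control mentioned in this abstract can be sketched in a few lines. This is only a toy illustration under stated assumptions, not the extended model from the talk: it assumes one center and eight surround filters that share a single positive mixer, uses independent unit-variance Gaussian components, and the function names are hypothetical. Because every filter response is scaled by the same mixer, dividing the center response by the pooled root-mean-square response removes the mixer.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable


def gsm_sample(mixer, gaussians):
    """GSM filter responses: one shared positive mixer scales Gaussian variables."""
    return [math.sqrt(mixer) * g for g in gaussians]


def divisive_gain_control(responses, eps=1e-12):
    """Divide the center response (index 0) by the RMS over the whole gain pool."""
    pooled_rms = math.sqrt(sum(r * r for r in responses) / len(responses))
    return responses[0] / (pooled_rms + eps)


# Index 0 is the center filter, indices 1-8 are surround filters (assumed layout).
gaussians = [rng.gauss(0.0, 1.0) for _ in range(9)]

# Same underlying Gaussian structure, seen through a small and a large mixer.
low_mixer = gsm_sample(0.1, gaussians)
high_mixer = gsm_sample(10.0, gaussians)

# Raw center responses differ by the mixer; normalized responses agree.
print(low_mixer[0], high_mixer[0])
print(divisive_gain_control(low_mixer), divisive_gain_control(high_mixer))
```

The sketch covers only the case where all surround filters are in the gain pool; the questions raised in the abstract (which filters belong in the pool, and how covariance with the center changes the sign of the surround influence) are outside its scope.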

Retrieved from "http://cosyne.org/wiki/The_role_of_spatial_context_in_biological_and_computational_vision"

This page was last modified 03:04, 11 February 2009.

