
Individual differences in speech-driven gaze patterns in the visual world task

Eye movements over visual displays during apprehension of speech yield sensitive indicators of the time-course of speech comprehension [1]. On a moment-by-moment basis, gaze to display elements is closely connected to the meaning of the language being heard. This ‘visual world’ (VW) paradigm has been used to study various aspects of speech perception and comprehension ranging from phonetic characteristics of speech [2], to semantic characteristics of words [3], to integration of meaning across words [4]. However, the bulk of research using the VW task has focused on contingencies between variation in gaze responses and variation in stimulus characteristics. Our interest is different, lying not in the nomothetic effects of stimulus manipulations, but in individual variability around the nominal responses.
We use the VW task to examine associations between the ability to integrate semantic information across words and visual context, and individual differences in measures of language and literacy skill. We recruited native English speakers aged 16 to 24 years with wide-ranging literacy skills (n=64) from adult education centers and community colleges serving urban neighborhoods. This sample captures a wider range of language-related capacities than is found among university students, in service of a deeper understanding of the true range of variation in human language processing skill and its correlates [5]. Participants were assessed on vocabulary knowledge, verbal memory, and other measures of language and cognitive function.

Our VW task consists of a four-picture display and a simple instruction to the participant, e.g., "Point to the purple balloons." Four conditions instantiate a factorial manipulation of two variables. The first is early vs. late resolution: in the 'early' condition, targets can be identified by color alone (only one purple object onscreen); in the 'late' condition, targets cannot be identified until the noun is heard (more than one purple object onscreen). The second is the presence vs. absence of a name competitor for the target: competitor displays include an object of the same type as the target but of a different color; in other displays the target picture is unique with respect to name (cf. [4]). Adjectives are common color terms and nouns are names of common objects.
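The 2 x 2 design can be made concrete with a small sketch. The Python below is illustrative only: the names (`Picture`, `resolution_point`, `has_name_competitor`) and the example displays are assumptions made for this sketch, not the actual stimuli or software used in the study.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Picture:
    color: str
    noun: str


def resolution_point(display, target):
    """Return 'adjective' if the color word alone picks out the target
    (early resolution), otherwise 'noun' (late resolution)."""
    same_color = [p for p in display if p.color == target.color]
    return "adjective" if len(same_color) == 1 else "noun"


def has_name_competitor(display, target):
    """True if the display holds an object with the target's name
    but a different color."""
    return any(p.noun == target.noun and p.color != target.color
               for p in display)


# The four cells of the factorial design.
conditions = list(product(("early", "late"), ("competitor", "no competitor")))

# Hypothetical displays for "Point to the purple balloons."
target = Picture("purple", "balloons")
early_competitor = [target, Picture("red", "balloons"),
                    Picture("green", "shoes"), Picture("blue", "cars")]
late_no_competitor = [target, Picture("purple", "hats"),
                      Picture("green", "shoes"), Picture("blue", "cars")]
```

In the first display the adjective alone identifies the target while a differently colored balloon serves as name competitor; in the second, two purple objects defer resolution to the noun.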

Analyses use mixed-effects growth models to examine the proportion of looks to target pictures in each condition as a function of time [6]. We find associations between measures of verbal memory, visuo-spatial memory and vocabulary knowledge and the time course of gaze to target objects. This study demonstrates the existence of non-random variation in the ability of adult listeners to integrate meaning across words and with visual context, even in the case of simple adjective-noun composition.
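As a rough illustration of the dependent measure, the sketch below bins gaze samples by time and computes the proportion of looks to the target in each bin. The function name, the 50 ms bin width, the region labels, and the sample data are all assumptions made for this example; the growth model itself would be fit with a mixed-effects package and is not shown.

```python
from collections import defaultdict


def target_proportions(samples, bin_ms=50):
    """Compute the proportion of gaze samples on the target per time bin.

    samples: iterable of (time_ms, region) pairs, where region is a
    hypothetical label such as 'target' or 'distractor'.
    Returns a dict mapping each bin's start time to a proportion.
    """
    looks = defaultdict(int)   # samples on the target, per bin
    total = defaultdict(int)   # all samples, per bin
    for time_ms, region in samples:
        bin_start = (time_ms // bin_ms) * bin_ms
        total[bin_start] += 1
        if region == "target":
            looks[bin_start] += 1
    return {b: looks[b] / total[b] for b in sorted(total)}


# Made-up samples, time-locked to an arbitrary zero point.
samples = [(0, "distractor"), (20, "distractor"),
           (60, "target"), (80, "distractor"),
           (110, "target"), (130, "target")]
curve = target_proportions(samples)
```

Curves of this kind, computed per participant and condition, are the sort of time series a growth model would take as input.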


1. R. M. Cooper, Cognitive Psychology 6, 84 (1974).

2. D. Dahan, J. S. Magnuson, M. K. Tanenhaus, E. M. Hogan, Language and Cognitive Processes 16, 507 (2001).

3. D. Mirman, T. J. Strauss, J. A. Dixon, J. S. Magnuson, Cognitive Science 34, 161 (2010).

4. J. C. Sedivy, M. K. Tanenhaus, C. G. Chambers, G. N. Carlson, Cognition 71, 109 (1999).

5. D. Braze, W. Tabor, D. P. Shankweiler, W. E. Mencl, Journal of Learning Disabilities 40, 226 (2007).

6. D. M. Bates. (University of Wisconsin, Madison 2010).