Individual Submission Summary

Poster #103 - Not like us: The changing visual world of infants

Fri, March 22, 7:45 to 9:15am, Baltimore Convention Center, Floor: Level 1, Exhibit Hall B

Integrative Statement

At every moment, the visual input received by a perceiver depends on their posture, location, and behavior. Infants' postures, motor skills, locations, and interests change markedly over the first year of postnatal life, creating systematically different visual regularities at different points in development. The field has little understanding of these changes or of their implications for visual learning. Indeed, many experimental studies of infant visual learning are structured from an adult perspective, with the stimuli selected and created by adults with mature visual systems and bodies. For example, the photographs often used in studies of category learning by infants (as well as by adults and by machines) are taken by adults with mature bodies and visual systems who stand still and hold the camera to frame the picture.

Here we attempt to quantify differences in infant visual experiences over the first year of life relative to one adult standard. We analyzed images from head-mounted cameras worn by 21 infants in three age groups (1-3 months, 6-8 months, and 11-12 months) as they went about daily activities in their homes. A total of 1500 images (500 per age group) containing common objects (e.g., apple, car, hand, bowl, cat) that were comparably frequent at all ages were selected for analysis.

One set of analyses used state-of-the-art object recognition models from computer vision, Convolutional Neural Networks (CNNs). When trained with adult-selected photographs of common objects, these networks achieve human-adult-level performance in recognizing novel instances of those categories. Can a CNN trained on such photographs recognize the objects captured in infant head camera images? Does it recognize objects in the images of older infants better than in those of younger infants? The CNN was pre-trained on the MS-COCO dataset, a collection of well-framed, adult-view images containing common everyday objects (a sketch of this evaluation procedure appears below).

The main results are shown in Figure 2. The CNN was particularly poor at recognizing objects in the images of the youngest and the oldest infants, but did quite well for the 6- to 8-month-olds. Why is that the case? Very young infants have very little control over what they see, and their views and contexts are not at all like photographs taken by adults. The oldest infants have considerable control over what they see, but through their manual actions they create images that are very different from (and much more variable than) standard photographs of canonical views. Infants who are 6 to 8 months old sit stably and watch the world, yielding images not unlike the photographs used to train networks in state-of-the-art computer vision.

In support of these conclusions, we will present data on the image size of target objects (large for the oldest children), the similarity of images within the same category (using pixel histograms as the measure; a sketch of this measure also appears below), and clutter. The theoretical question is how these changing inputs support the development of infant object category learning.
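The abstract does not specify which detector architecture or evaluation criterion was used; as a minimal, hypothetical sketch, a COCO-pretrained Faster R-CNN from torchvision could be scored on annotated head-camera frames roughly as follows. The function names, score threshold, and `frames` structure are illustrative assumptions, not the authors' pipeline.

```python
# Hypothetical sketch: scoring a COCO-pretrained detector on head-camera frames.
# Assumes torchvision >= 0.13; the architecture and threshold are assumptions,
# since the abstract does not name the specific CNN used.
import torch
from PIL import Image
from torchvision.models.detection import (
    fasterrcnn_resnet50_fpn,
    FasterRCNN_ResNet50_FPN_Weights,
)

weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT  # trained on MS-COCO
model = fasterrcnn_resnet50_fpn(weights=weights).eval()
preprocess = weights.transforms()
categories = weights.meta["categories"]  # COCO label names, e.g. "apple", "bowl"

def detects_target(image_path: str, target: str, threshold: float = 0.5) -> bool:
    """Return True if the detector finds the target category in the frame."""
    image = Image.open(image_path).convert("RGB")
    with torch.no_grad():
        output = model([preprocess(image)])[0]
    for label, score in zip(output["labels"], output["scores"]):
        if score >= threshold and categories[int(label)] == target:
            return True
    return False

def recognition_rate(frames) -> float:
    """Fraction of annotated frames in which the target object is detected.

    `frames` is a hypothetical list of (image_path, target_category) pairs,
    one list per age group.
    """
    hits = sum(detects_target(path, target) for path, target in frames)
    return hits / len(frames)
```

Comparing `recognition_rate` across the three age groups would yield the kind of per-group recognition curve summarized in Figure 2.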
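The pixel-histogram similarity measure could likewise take several forms. The sketch below assumes per-channel RGB histograms compared by histogram intersection, averaged over all pairs of same-category images; the binning and comparison metric are our assumptions, not necessarily the authors' exact procedure.

```python
# Minimal sketch of a pixel-histogram similarity measure, assuming RGB
# histograms compared by histogram intersection.
from itertools import combinations

import numpy as np
from PIL import Image

def pixel_histogram(image_path: str, bins: int = 32) -> np.ndarray:
    """Concatenated, normalized per-channel histograms of an RGB image."""
    pixels = np.asarray(Image.open(image_path).convert("RGB"))
    hists = [
        np.histogram(pixels[..., c], bins=bins, range=(0, 256))[0]
        for c in range(3)
    ]
    hist = np.concatenate(hists).astype(float)
    return hist / hist.sum()

def histogram_similarity(path_a: str, path_b: str) -> float:
    """Histogram intersection: 1.0 for identical histograms, 0.0 for disjoint."""
    return float(np.minimum(pixel_histogram(path_a), pixel_histogram(path_b)).sum())

def within_category_similarity(paths) -> float:
    """Mean pairwise similarity over all frames sharing one object category."""
    pairs = list(combinations(paths, 2))
    return sum(histogram_similarity(a, b) for a, b in pairs) / len(pairs)
```

Higher within-category similarity for one age group would indicate that its views of a given object were more homogeneous, consistent with the claim that the oldest infants' manually generated views are the most variable.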

Authors