Before we can discuss why object perception is not an easy task for computer vision, we need to understand what computer vision is, and the importance and uses of object perception in human visual processing.
Computer vision is the field concerned with making computers that can "see." More specifically, it aims to build computers that see in the same way, or much the same way, as the human visual processing system (the eye and brain) does (Ballard & Brown, 1982). However, due to the complexity of the eye and the brain's role in vision, researchers in computer vision have found it difficult to implement the level of object perception present in humans.
The human visual system sees color, light, contrast, and varying viewpoints. It can also fill in the blanks in an object or scene when something cannot be seen, is blurred, or is otherwise impaired, because our memory allows us to insert the appropriate object there. This ability of our visual system to adapt and see so many things makes up object perception. From this standpoint, it is easy to see why a computer would have so much difficulty with object perception. After all, does a computer have eyes? Do those eyes have rods and cones to decipher different information? Does it have the ability to recognize an object from various viewpoints? Can it differentiate between an object and its background? And does it have the memory base with which to fill in information and create a whole picture when one is not available? At this point, no. Even in highly specialized robots, object perception remains a serious area of research.
The Difficulty of Object Perception for Computer Vision
Because computers lack many of the basic biological systems the human body uses to see, lighting is a major issue. You may not realize it, but differences in light are used every day by the human visual system to construct color and form. According to Ballard and Brown (1982), the "low-level capability" of seeing light and color allows the human eye to see black as black even when the light wavelengths reaching it are inconsistent. The computer's visual system, by contrast, struggles with inconsistent light and with interpreting what that light means. Depth can be a problem, for example, as can light reflecting off a surface such as water: a computer can find it difficult to decide whether a reflection is an object or merely the result of light on the water's surface. This goes hand in hand with the point that, with current technology, computers and robots still have difficulty distinguishing an object from its background, which again stems largely from the difficulty of interpreting color and lighting and the impact of both on a scene or object.
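The lighting problem described above can be sketched in a few lines of code. This is a minimal illustration with hypothetical values, not an actual vision algorithm: the value a camera records for a pixel depends on both the surface and the illumination, so the same black surface produces different pixel values under different lights, even though a human perceives it as black in both cases.

```python
# Minimal sketch (hypothetical values) of why lighting confounds a computer:
# the "color" a camera records is roughly the product of surface reflectance
# and the light falling on the surface.

def observed_rgb(reflectance, illuminant):
    """Pixel value = surface reflectance x light reaching the surface."""
    return tuple(round(r * i) for r, i in zip(reflectance, illuminant))

black_surface = (0.05, 0.05, 0.05)   # reflects ~5% of incoming light

daylight = (255, 255, 255)           # bright, balanced light
dim_warm = (120, 100, 60)            # dim indoor lighting

print(observed_rgb(black_surface, daylight))  # (13, 13, 13)
print(observed_rgb(black_surface, dim_warm))  # (6, 5, 3)

# A naive pixel comparison treats these as two different colors; the human
# visual system discounts the illuminant and perceives "black" in both cases.
```

The two recorded values disagree even though the surface is identical, which is exactly the inconsistency Ballard and Brown's "low-level capability" of human vision resolves automatically.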
Computer-based object perception also presents the issue of object recognition. When the human visual system sees an object, we can recognize it from various viewpoints. For example, no matter how many different viewpoints you see the Eiffel Tower from, you still know it is the Eiffel Tower. But what about a computer? A computer lacks a human brain, so the information entered into the computer must function as the brain does, allowing the computer to process visual input in much the same way the human brain does. An article on the challenges that both robots and infants face in object perception noted a "growing awareness of the importance of collecting and exploiting empirical knowledge about statistical combinations of materials, shapes, lighting, and viewpoints that actually occur in our world" (Fitzpatrick et al., n.d.). This research illustrates the movement within the scientific community to seek solutions to computer vision problems such as recognizing objects seen from various viewpoints.
This same reason leads into the next difference between computer vision and the human visual system. Lee, Bulthoff, and Poggio (2000) explain that because a computer or robot cannot connect an image seen from one viewpoint to an image seen from another, "it is required to generalize correctly from past experience and classify correctly the novel image." The human visual system, in contrast, learns the characteristics of objects and scenes, which allows us to comprehend the variability of an object or scene.
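The viewpoint problem can be illustrated with a toy example. This is a deliberately naive sketch with made-up data, not a real recognition system: a pixel-by-pixel template match scores the same shape very differently once the viewpoint changes (here, a simple 90-degree rotation), whereas a human recognizes the shape either way.

```python
# Minimal sketch (toy data) of the viewpoint problem: naive template
# matching fails to generalize across a change of viewpoint.

def rotate90(img):
    """Rotate a small binary image (list of rows) 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]

def match_score(a, b):
    """Fraction of pixels that agree between two equal-sized images."""
    pairs = [(pa, pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)]
    return sum(pa == pb for pa, pb in pairs) / len(pairs)

# An asymmetric "L" shape serves as the stored template.
template = [
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 1],
]

print(match_score(template, template))           # 1.0 -- perfect match
print(match_score(template, rotate90(template))) # lower -- same shape, poor score
```

The score drops well below 1.0 for the rotated view even though the object is unchanged, which is why, as Lee, Bulthoff, and Poggio note, a computer must be made to generalize from past experience rather than simply compare images.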
Another issue in computer vision is the inability of the robot or computer "eye" or "brain" to complete a visual scene when information is missing. Just as a computer cannot identify an image that varies from what it has stored, it also cannot instantly complete an image based on past experience unless it was specifically programmed to do so. A human being has knowledge of what he or she is looking at, generally has some experience with it, and can connect it with a life event or relate it to a similar one. With a computer, pathways have to be created to connect events, objects, and scenes, and that requires a human to program those pathways. For this reason, computers have difficulty putting various bits of information together to gain comprehension: they are only able to "see" the information, not understand its meaning in the way a human being would (Goldstein, 2010).
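The idea of explicitly programmed pathways can be sketched as a simple lookup. The associations below are hypothetical examples invented for illustration: the computer can "complete" a scene only for cue combinations a human has entered in advance, and any unprogrammed combination simply fails, where a person would draw on lifelong experience.

```python
# Minimal sketch (hypothetical associations) of programmed "pathways":
# scene completion works only for explicitly entered cue combinations.

programmed_pathways = {
    ("four legs", "flat top"): "table",
    ("trunk", "branches"): "tree",
}

def complete_scene(visible_cues):
    """Return the object the cues suggest, or admit no knowledge."""
    return programmed_pathways.get(tuple(visible_cues), "unrecognized")

print(complete_scene(["four legs", "flat top"]))  # table
print(complete_scene(["sail", "hull"]))           # unrecognized -- never programmed
```

The second query fails not because the cues are unclear but because no human ever created that pathway, which is the limitation the paragraph above describes.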
Ballard, D., & Brown, C. (1982). Computer vision. Englewood Cliffs: Prentice-Hall.
Fitzpatrick, P., Needham, A., Natale, L., & Metta, G. (n.d.). Shared challenges in object perception for robots and infants. Retrieved November 4, 2009, from http://www.robotcub.org/misc/review2/06_Fitzpatrick_Needham_Natale_Metta.pdf
Goldstein, B. (2010). Sensation and perception (8th ed.). Belmont: Wadsworth.
Lee, S., Bulthoff, H. H., & Poggio, T. (2000). Biologically motivated computer vision: First IEEE international workshop. Berlin: Springer-Verlag.