Analyzing Eye Movement During Geospatial Data Inspection
June 26, 2008 By: William S. Helton, Robert Liimakka, Gennady Gienko, Eugene Levin
Modern geospatial data acquisition systems deliver vast amounts of multi-domain remotely sensed data, such as multi- and hyperspectral imagery and LIDAR point clouds. Unfortunately, geospatial products automatically derived from source geospatial data are burdened with residual errors and artifacts, which should be manually inspected, cleaned, and corrected. These tasks become critical in many large-scale projects that require real-time processing of visual information, and usually require manual post-processing or visual inspection of the source and/or derived data.
The process of visual inspection can be divided into two general phases: perception and reaction. Perception comprises such steps as visual search, feature selection, and identification. Reaction reflects a decision made by an operator, and usually involves other types of modalities (e.g., physical action such as mouse movements or typing). Human analysts perceive visual data through intensive eye movements, which subconsciously select the most distinctive features in an image to reduce overall ambiguity about the observed scene.
Spatial and temporal data derived from those eye movements, compiled while the operator observes geospatial imagery, retain meaningful information that could be applied to geospatial image processing and interpretation. By deploying eye-tracking technology, it is possible to determine which features attract the attention of the analyst and to measure the analyst's reaction time. Specifically, eye fixations can be interpreted as the coordinates of a cloud of feature points on the object area observed before the final decision (such as a mouse-clicked measurement) is made.
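As a rough illustration of the idea above, the following Python sketch collects the fixations recorded between the moment an image is shown and the moment the analyst clicks, and reports the elapsed time as the reaction time. The `Fixation` record, the function name `fixations_before_click`, and the timing convention (seconds from the start of recording) are assumptions made for this example, not part of any particular eye-tracking system described in the article.

```python
from typing import List, NamedTuple, Tuple


class Fixation(NamedTuple):
    """Hypothetical fixation record: times in seconds, position in image pixels."""
    start: float
    end: float
    x: float
    y: float


def fixations_before_click(
    fixations: List[Fixation], onset: float, click_time: float
) -> Tuple[List[Tuple[float, float]], float]:
    """Return the fixation point cloud observed before a click, and the reaction time.

    Only fixations that lie entirely between the stimulus onset and the
    click are kept; the reaction time is simply click_time - onset.
    """
    points = [
        (f.x, f.y)
        for f in fixations
        if f.start >= onset and f.end <= click_time
    ]
    return points, click_time - onset


# Example usage with made-up data
recorded = [
    Fixation(0.0, 0.2, 10.0, 10.0),    # before the image appeared
    Fixation(0.3, 0.6, 50.0, 40.0),
    Fixation(0.9, 1.2, 55.0, 42.0),
    Fixation(1.5, 1.8, 200.0, 200.0),  # after the click
]
cloud, reaction_time = fixations_before_click(recorded, onset=0.25, click_time=1.3)
```

In this sketch the resulting `cloud` is just a list of (x, y) pairs; a real workflow would likely also carry fixation durations so that longer fixations can be weighted more heavily.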
Currently, geospatial data acquisition systems provide data in multiple domains. These data are acquired in different spectral domains, with various spatial and spectral resolutions, geometries (vertical, oblique, and panoramic), and sensor models. The complexity and variety of these data sets, and the processing of various types of terrain, require multiple human-computer interactions. Furthermore, results of automated processing usually are inspected by human analysts for blunder detection and quality assurance purposes. Thus, all these processes can be defined as "human-in-the-loop." Studying the mental workflow of human geospatial analysts may therefore help research and development efforts aimed at accelerating semi-automated and foundational automated image fusion systems.
Geospatial Data Perception and Eye Movement
We may term the technical approach of interactive human geospatial data inspection "eye-grammetry," a technology based on tracking human eye movements as the analyst perceives a visual scene (Gienko, Levin, 2005; Gienko, Levin, 2007). The virtual scene, imagined in the brain, is inherently related to neurophysiological features of the human visual system and differs from the real-world image or scene. The brain processes visual input by concentrating on specific components of the entire sensory area, so that greater attention may be paid to the interesting features of a scene. Visual attention serves as a "selective filter," interrupting the continuous process of ocular observations with visual fixations. Human vision is a piecemeal process relying on the perceptual integration of small regions to construct a coherent representation of the whole (Duchowski, 2007; Mishkin et al., 1983).
When the brain processes a visual scene, the elements of the scene are put in focus by various attention mechanisms, and the representations obtained from different domains must be combined. Information about the form and other features of a particular object can be obtained only when the object is foveated, that is, observed by the small area of the retina that provides acute vision. Different objects can therefore be attended to only through saccadic movements of the eye. Saccades are rapid eye movements, made at a rate of about three per second, that orient the foveal region of the eye over targets of interest in a visual scene (Posner, 1990). Saccades are naturally linked to fixations, which are eye positions that remain relatively stable. Past research indicates that visual and cognitive processing occurs during fixations (Just and Carpenter, 1984).
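To make the distinction between saccades and fixations concrete, here is a minimal Python sketch of a dispersion-threshold fixation detector, a commonly used way of grouping raw gaze samples into fixations. It is not the authors' method. The sample format `(t, x, y)`, the function name `detect_fixations`, and the threshold values (25 pixels of dispersion, 0.1 seconds minimum duration) are assumptions chosen for illustration only.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Sample = Tuple[float, float, float]  # (time in seconds, x in pixels, y in pixels)


@dataclass
class Fixation:
    start: float
    end: float
    x: float  # centroid of the gaze samples in the fixation
    y: float


def _dispersion(window: Sequence[Sample]) -> float:
    """Spread of a window: (max x - min x) + (max y - min y)."""
    xs = [s[1] for s in window]
    ys = [s[2] for s in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def detect_fixations(
    samples: Sequence[Sample],
    max_dispersion: float = 25.0,  # assumed threshold, in pixels
    min_duration: float = 0.1,     # assumed threshold, in seconds
) -> List[Fixation]:
    """Group time-ordered gaze samples into fixations by dispersion threshold."""
    fixations: List[Fixation] = []
    i, n = 0, len(samples)
    while i < n:
        # Find the first index j so that samples[i..j] spans min_duration.
        j = i
        while j < n and samples[j][0] - samples[i][0] < min_duration:
            j += 1
        if j >= n:
            break  # not enough remaining data for another fixation
        if _dispersion(samples[i:j + 1]) <= max_dispersion:
            # Grow the window while the gaze stays within the threshold.
            while j + 1 < n and _dispersion(samples[i:j + 2]) <= max_dispersion:
                j += 1
            window = samples[i:j + 1]
            fixations.append(Fixation(
                start=window[0][0],
                end=window[-1][0],
                x=sum(s[1] for s in window) / len(window),
                y=sum(s[2] for s in window) / len(window),
            ))
            i = j + 1
        else:
            i += 1  # gaze is moving (likely a saccade); slide forward
    return fixations


# Example usage: two stable gaze clusters separated by a jump, sampled at 100 Hz
gaze = [(k / 100, 100.0 + (k % 2), 100.0) for k in range(20)]
gaze += [(k / 100, 300.0, 200.0 + (k % 2)) for k in range(20, 40)]
found = detect_fixations(gaze)
```

Samples that fall between detected fixations are treated here as saccadic movement and discarded. Suitable thresholds depend on the eye tracker's sampling rate, its noise, and the viewing distance, so the defaults above should be tuned for any real data.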
Typical tasks in visual geospatial data analysis include, but are not limited to: retrieval of information, image interpretation, change detection, 3D surface reconstruction, and updating derived geospatial data such as GIS vector layers. In many application scenarios, such as risk management or military targeting, it is necessary to perform these tasks in real time. All these tasks require visual data matching and fusing performed by a human analyst, who at the same time can be a subject matter expert (SME) and, under certain circumstances, act as a decision maker. Thus, the solutions described herein constitute a useful technology for certain types of decision support systems, which can be characterized as human-computer symbiosis (HCS) in visual data analysis.
Table 1. The main stages of a typical image analysis process, which involves certain human intellectual and computerized resources, employed simultaneously or concurrently.




