Experiment

The head-mounted display (HMD) Oculus-DK2 was used for this test. It has a frame refresh rate of 75 Hz, a resolution of 960x1080 per eye, and a total viewing angle of 100° in both the horizontal and vertical directions. The gyroscopic sensors within the device transmit orientation data at a rate equal to the device frame refresh rate. A small eye-tracking camera from SensoMotoric Instruments (SMI) was integrated into the device and transmitted binocular eye-tracking data at 60 Hz.

The software setup included custom-built Unity software along with version 2.0 of the Oculus-DK2 driver. The software checked the calibration accuracy every two minutes and re-calibrated whenever necessary.

A total of 17 observers aged 25-52 participated in the test, of which 9 were experts who used a VR headset every day at work. Before the start of the test, all observers explicitly answered a questionnaire asking for their expertise and age.

Observers were tested for visual acuity using the Snellen test, and their dominant eye was determined using the cardboard technique. Data from the dominant eye was used for calibration and all further analysis. To maintain a natural (free-viewing) gaze pattern, subjects viewed the scenes normally and were not required to provide any explicit quantitative measurements. They were instructed to watch each scene as naturally as possible, using a combination of head and eye movements. Observers were also free to stop the test at any time if they felt fatigued or experienced vertigo. Five images were used as training for the observers before the actual test started.

A total of 60 stimuli were shown to the observers in sequence. Each stimulus lasted 25 seconds, with a 5-second gray screen between two consecutive stimuli. Every two minutes a calibration was performed to check the accuracy of the eye-tracker. The test itself lasted about 35 minutes, and the observers had a 5-minute pause at the halfway point of the experiment. The observers were seated comfortably in a swivel chair and were free to rotate the full 360 degrees and to move the chair within the room if necessary. The viewing position of each 360° image was reset to the equirectangular image center at the start of each viewing (irrespective of the observer's current orientation), so that all observers started from the same position in the panorama.
 

Detailed description of database contents

The dataset contains 60 images (360°), illustrated in the figure below, along with the associated eye-tracking data. The images belong to four different classes:

  • Indoor/outdoor natural scenes
  • Scenes containing human faces
  • Sports scenes
  • Computer graphics content

Figure: Three sample images from each of the five classes used for the test. (Top Row): Cityscapes - outdoor scenes, (Second Row): Small rooms - indoor scenes, (Third Row): Scenes containing human faces, (Fourth Row): Great halls - indoor scenes, (Bottom Row): Naturescapes - outdoor scenes.

The eye-tracking data are provided in three forms (in the respective sub-folders):

1 - Scan-path Data

It includes 40 images with the associated scan-path data from 48 observers (each of whom observed each image for a total of 25 seconds). The scan-paths are composed of individual fixations and are extracted from a combination of the raw head and eye movement data. The data for each image is stored in a text file named "SP<ImageNumber>.txt". Each line contains a quadruple that indicates the Fixation Number, Fixation Time, X-Position (equirectangular) and Y-Position (equirectangular), respectively. The fixation number increments serially for a particular observer and resets to 1 when the next observer's data begins, after all of the fixations of the previous observer have been listed. The fixation time is given in seconds, and the X and Y positions are given in pixels of the respective equirectangular image.
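As an illustration, the following is a minimal C++ sketch (not the parser shipped with the dataset) for reading one such file; it assumes the four values on each line are separated by whitespace.

    // Minimal sketch for reading a scan-path file "SP<ImageNumber>.txt".
    // Assumes whitespace-separated values: fixation number, time, X, Y.
    #include <fstream>
    #include <iostream>
    #include <vector>

    struct Fixation {
        int    number; // resets to 1 at the start of each new observer
        double time;   // fixation time in seconds
        double x, y;   // position in pixels of the equirectangular image
    };

    int main(int argc, char** argv) {
        if (argc < 2) { std::cerr << "usage: parse_sp <SP file>\n"; return 1; }
        std::ifstream in(argv[1]);
        std::vector<Fixation> fixations;
        Fixation f;
        while (in >> f.number >> f.time >> f.x >> f.y)
            fixations.push_back(f);
        std::cout << "Read " << fixations.size() << " fixations\n";
        return 0;
    }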

A simple illustration of the scan-path data from 3 different observers can be found below. The image shows the scan-path data for the observers, with each observer labelled in a particular color. The number next to each circle indicates the order of the fixation, and the circle itself indicates the fixated location.

2 - Head Motion based Saliency maps

We provide a total of 20 images and, for each image, the associated saliency map computed from the head movement data (yaw, pitch, roll) of 48 observers who each watched the image for 25 seconds. The data is organized into a binary file "SH<ImageNumber>.bin" containing double-precision values (8 bytes each), one saliency value per pixel. The saliency data is stored row-wise across the image pixels. The minimum saliency value is 0 and the sum of all pixel saliencies equals one.
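The following minimal C++ sketch (again, not the parser shipped with the dataset) shows one way to read such a binary file; it assumes native little-endian byte order and that the number of doubles equals the pixel count of the corresponding equirectangular image. The same layout applies to the "SHE_<ImageNumber>.bin" files described in the next sub-section.

    // Minimal sketch for reading a saliency file of raw 8-byte doubles.
    // Assumes native (little-endian) byte order and row-major pixel order.
    #include <fstream>
    #include <iostream>
    #include <vector>

    int main(int argc, char** argv) {
        if (argc < 2) { std::cerr << "usage: read_saliency <bin file>\n"; return 1; }
        std::ifstream in(argv[1], std::ios::binary | std::ios::ate);
        const std::streamsize bytes = in.tellg();
        in.seekg(0, std::ios::beg);

        std::vector<double> saliency(bytes / sizeof(double));
        in.read(reinterpret_cast<char*>(saliency.data()), bytes);

        double sum = 0.0;
        for (double v : saliency) sum += v; // should be close to 1.0
        std::cout << saliency.size() << " pixels, total saliency = " << sum << "\n";

        // Pixel (row r, column c) of a W-pixel-wide image is saliency[r * W + c].
        return 0;
    }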

A simple illustration of the saliency map for each of the images can be found below. More saturated red regions indicate areas of frequent attention, while more saturated blue regions were attended relatively sparsely.

3 - Head+Eye-Motion based Saliency maps

We provide a total of 40 images and, for each image, the associated saliency map computed from the combined head and eye movement data (yaw, pitch, roll plus X-gaze, Y-gaze) of 48 observers who each watched the image for 25 seconds. The data is organized into a binary file "SHE_<ImageNumber>.bin" containing double-precision values (8 bytes each), one saliency value per pixel. The saliency data is stored row-wise across the image pixels. The minimum saliency value is 0 and the sum of all pixel saliencies equals one.

A simple illustration of the saliency map for each of the images can be found below. More saturated red regions indicate areas of frequent attention, while more saturated blue regions were attended relatively sparsely.
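For a quick visual check of either the SH or SHE saliency buffers, the helper function below renders a buffer as a simple red/blue heatmap in PPM format. The linear color mapping is only a hypothetical stand-in for the scheme used in the figures of this document.

    // Illustrative only: writes a saliency buffer as a red/blue heatmap (PPM).
    // High saliency maps to red, low saliency to blue, via a linear blend.
    #include <algorithm>
    #include <fstream>
    #include <string>
    #include <vector>

    void write_heatmap_ppm(const std::vector<double>& saliency,
                           int width, int height, const std::string& path) {
        const double peak = *std::max_element(saliency.begin(), saliency.end());
        std::ofstream out(path, std::ios::binary);
        out << "P6\n" << width << " " << height << "\n255\n";
        for (double v : saliency) {
            const double t = (peak > 0.0) ? v / peak : 0.0; // normalize to [0, 1]
            unsigned char rgb[3] = {
                static_cast<unsigned char>(255 * t),          // red channel
                0,                                            // green channel
                static_cast<unsigned char>(255 * (1.0 - t))   // blue channel
            };
            out.write(reinterpret_cast<const char*>(rgb), 3);
        }
    }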

Detailed description of source code provided

Matlab and C++ functions to parse the data are provided in the respective sub-folders.

Links

Introduction

Download page