December 2022
Volume 22, Issue 14
Open Access
Vision Sciences Society Annual Meeting Abstract
A computer-vision based approach to co-register frames from egocentric video recordings
Author Affiliations
  • Jianing Mu
    Haverford College
    The University of Tokyo
  • Zixun Wei
    Waseda University
    The University of Tokyo
  • Margaret Moulson
    Ryerson University
  • Gabriel (Naiqi) Xiao
    McMaster University
  • Ming Bo Cai
    The University of Tokyo
Journal of Vision December 2022, Vol.22, 3954. doi:
Citation: Jianing Mu, Zixun Wei, Margaret Moulson, Gabriel (Naiqi) Xiao, Ming Bo Cai; A computer-vision based approach to co-register frames from egocentric video recordings. Journal of Vision 2022;22(14):3954.
      © ARVO (1962-2015); The Authors (2016-present)

Wearable eye-trackers enable us to record eye movement dynamics from an egocentric viewpoint. Although the data collected from wearable eye-trackers can index our looking patterns in naturalistic settings, current analyses focus on the correspondence between fixation locations and individual frames of the egocentric video recording. This approach separates the continuous eye movement dynamics into isolated frames, thereby hindering studies of the temporal dynamics of eye movement patterns, such as building computational models to predict eye movements during interpersonal interactions. These challenges arise largely because the recorded eye movement data reflect both eye movement and head/body motion, and there is no reliable method to isolate the head/body motion. Separating eye movement from head/body motion may therefore help computational models focus on features relevant for predicting fixation dynamics. To this end, we adopted computer vision methods to correct for observers’ head/body motion by co-registering frames of videos taken by head-mounted cameras. The end result is a series of frames that appear as if taken from a static camera. Toward this goal, we first used deep-learning-based semantic segmentation to identify stationary objects (e.g., a table or wall) in each frame. Next, we calculated the dense optical flow between each pair of consecutive frames using pixels automatically selected from the stationary objects. The global frame-by-frame movement was then estimated as a series of affine transformations and used to warp and align consecutive frames. We tested our method on eye-tracking data and egocentric videos simultaneously recorded by head-mounted cameras from infants aged 9-18 months exploring a lab environment. Ongoing work is testing existing saliency-prediction models on the aligned videos.
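The affine-estimation step described above can be illustrated with a minimal numpy sketch. This is not the authors' implementation: it assumes the segmentation and dense-flow steps have already produced a set of stationary-object pixel coordinates and their flow vectors, and it fits the 2x3 affine matrix by least squares from those correspondences.

```python
import numpy as np

def estimate_affine(src_pts, dst_pts):
    """Least-squares 2x3 affine transform mapping src_pts -> dst_pts.

    src_pts, dst_pts: (N, 2) arrays of pixel coordinates in two
    consecutive frames (dst = src + dense-flow vector at src).
    """
    n = src_pts.shape[0]
    # Design matrix [x, y, 1] so that X @ A.T approximates dst_pts
    X = np.hstack([src_pts, np.ones((n, 1))])
    A_T, *_ = np.linalg.lstsq(X, dst_pts, rcond=None)
    return A_T.T  # shape (2, 3): [[a, b, tx], [c, d, ty]]

# Toy example (hypothetical data): stationary-object pixels whose
# dense optical flow is a pure translation of (+2, -1) pixels
src = np.array([[10.0, 10.0], [50.0, 10.0], [10.0, 40.0], [50.0, 40.0]])
flow = np.array([[2.0, -1.0]] * 4)
A = estimate_affine(src, src + flow)
# A recovers identity rotation/scale with translation (2, -1):
# [[1, 0,  2],
#  [0, 1, -1]]
```

In practice one would invert this transform and warp the later frame back onto the earlier one (e.g., with `cv2.warpAffine`), chaining the per-pair transforms so every frame is registered to a common static reference.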
