Vision Sciences Society Annual Meeting Abstract  |  May 2008
Volume 8, Issue 6
Human-assisted motion annotation for real-world videos
Author Affiliations
  • Ce Liu
    Computer Science and Artificial Intelligence Laboratory (CSAIL), MIT
  • Edward Adelson
    Computer Science and Artificial Intelligence Laboratory (CSAIL), MIT, and Department of Brain and Cognitive Sciences, MIT
  • William Freeman
    Computer Science and Artificial Intelligence Laboratory (CSAIL), MIT
Journal of Vision May 2008, Vol.8, 679. doi:10.1167/8.6.679

      © ARVO (1962-2015); The Authors (2016-present)


The computations of the human visual system are presumably well matched to the statistics of natural scenes. What are those statistics? It would be appealing to analyze massive amounts of imagery with machine vision systems. However, it is often preferable to hand-label images for the variables of interest, since humans are far more accurate than machines. Useful hand-labeled databases include the continuity of contours (Geisler et al., 2001) and the segmentation of images (Martin et al., 2001). We wish to analyze motion in natural image sequences. Hand-labeling motion sequences poses a serious challenge: the sheer number of pixels is vast, and assigning a velocity to every pixel in every frame would drive away even the most patient labeler. Based on recent computer vision techniques, we have designed a computer system for efficient motion annotation. The image sequence is represented as a set of overlapping layers, each with smooth motion. The observer marks the boundary and provides a depth ordering of each object in a given frame. The computer system propagates the layer annotation to the other frames and estimates a set of flow fields. The user picks the best flow field that yields accurate matching between two adjacent frames and agrees with the smoothness and discontinuities of the image. When flow estimation fails, the user can label sparse correspondences between two frames, which the system automatically interpolates to a dense correspondence. We find that the mean absolute deviation of eight subjects' annotations of one sequence is around 0.1 pixels, and the human-labeled motion of a sequence with veridical ground truth also has a mean error of around 0.1 pixels. We have labeled 20 video sequences, with plans to label hundreds more. This will provide a useful motion database for researchers in both human and machine vision.
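The sparse-to-dense interpolation step mentioned above can be illustrated with a minimal sketch. The abstract does not specify which interpolation scheme the system uses; the inverse-distance weighting below, along with the function and parameter names, is a hypothetical stand-in chosen only to show the idea of spreading a handful of user-labeled correspondences to a flow vector at every pixel.

```python
import numpy as np

def densify_flow(sparse_pts, sparse_flow, h, w, power=2.0, eps=1e-8):
    """Interpolate sparse correspondences into a dense flow field.

    sparse_pts  : (N, 2) array of (x, y) locations the user labeled
    sparse_flow : (N, 2) array of (dx, dy) displacements at those points
    Returns an (h, w, 2) dense flow field via inverse-distance weighting.
    This is an illustrative sketch, not the annotation system's method.
    """
    ys, xs = np.mgrid[0:h, 0:w]
    grid = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)  # (h*w, 2)
    # Distance from every pixel to every labeled point
    d = np.linalg.norm(grid[:, None, :] - sparse_pts[None, :, :], axis=2)
    wts = 1.0 / (d ** power + eps)          # nearer labels get more weight
    wts /= wts.sum(axis=1, keepdims=True)   # normalize weights per pixel
    dense = wts @ sparse_flow               # weighted average of sparse flows
    return dense.reshape(h, w, 2)
```

A real system would respect the layer boundaries the observer marked (interpolating within each layer separately) so that flow discontinuities at object edges are preserved rather than smoothed over.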

Liu, C., Adelson, E., & Freeman, W. (2008). Human-assisted motion annotation for real-world videos [Abstract]. Journal of Vision, 8(6):679, 679a, doi:10.1167/8.6.679. [CrossRef]
