Abstract
Methods for detecting visual artifacts in digital motion pictures are usually limited to specialized algorithms, each designed to deal with a single type of artifact. Content providers and video archivists both assess the visual quality of the videos in their collections, which requires artifact identification and localization. Effective detection solutions do not exist for all visual artifact types, especially as evolving media introduces new artifacts. As such, development of a general approach to motion picture quality assessment would be a significant boon for both producers and consumers of motion picture content. We have developed a single model, based on basic principles of visual perception and models of naturalistic pictures, which can be trained to produce a highly accurate detector of either upscaling or combing artifacts in motion pictures without any need for a reference signal. This model uses a shallow convolutional neural network to identify distorted locations in an input image or video of any size. This local detection performed globally makes it possible to produce a dense detection map, which can then be used either to localize artifacts or to make a final overall prediction of the perceptual quality of the image or video. Using large (>100,000 samples) class-balanced test datasets, we observed an F1 score of 0.995 when distinguishing upscaled images from natural images, and an F1 score of 0.99 when distinguishing combed video frames from pristine video frames. The model yields state-of-the-art prediction power on upscaling and combing artifacts that occur in digital motion pictures. The areas of distortion need not be known a priori, since the model learns them using only global image/video labels. We envision that this general framework for motion picture artifact detection will provide the basis for powerful tools that will prove useful in the motion picture post-production and distribution industries.
Meeting abstract presented at VSS 2017
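To illustrate the architecture described above, the following is a minimal sketch of a shallow fully convolutional network that produces a dense detection map from an input of any spatial size and then aggregates it into a single global prediction. This is not the authors' implementation; the layer count, channel widths, kernel sizes, and the mean-pooling aggregation are illustrative assumptions, written here in PyTorch.

```python
# Sketch (not the authors' code): a shallow fully-convolutional network.
# Because every layer is convolutional, it accepts frames of any size and
# emits a per-location artifact map, which is pooled into a global score.
import torch
import torch.nn as nn

class ShallowArtifactDetector(nn.Module):
    def __init__(self):
        super().__init__()
        # Layer widths and kernel sizes are assumptions, not from the abstract.
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, padding=2),   # local feature extraction
            nn.ReLU(inplace=True),
            nn.Conv2d(16, 32, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, kernel_size=1),               # 1x1 conv -> per-location logit
        )

    def forward(self, x):
        detection_map = self.features(x)               # dense map, same H x W as input
        global_logit = detection_map.mean(dim=(2, 3))  # aggregate map -> global score
        return detection_map, global_logit

# Usage: any frame size works since the network has no fully connected layers.
frame = torch.rand(1, 3, 480, 720)                     # one RGB frame
det_map, score = ShallowArtifactDetector()(frame)
prob = torch.sigmoid(score)                            # probability frame is distorted
```

In a design like this, training against only a global binary label (distorted vs. pristine) still forces the convolutional features to respond at distorted locations, which is how the dense map can localize artifacts without per-pixel annotations, consistent with the abstract's claim that distortion areas need not be known a priori.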