Abstract
Various image-computable models have been proposed to describe the relationship between local luminance, visual context, and perceived luminance (brightness) or perceived reflectance (lightness). Classically, these models are tested on only a subset of the relevant stimuli. While specific failure cases have been demonstrated for most models, a systematic overview is lacking. As a consequence, it remains unclear on what grounds to favor any specific model of human brightness perception.
Our goal is to work towards such a comprehensive overview. To that end, we are developing a stimulus benchmark and evaluating various brightness models on these stimuli. Specifically, we provide publicly available re-implementations of several brightness models in Python, code for creating parameterized stimuli, and BRENCH, a framework to automate running and evaluating brightness models on (sets of) stimuli. With our framework, we can replicate previously published modeling results. Going beyond replication, the framework facilitates comparing models from multiple publications across all stimuli from those publications. BRENCH is flexible enough to accommodate new models, new model parameterizations, and new stimuli. Comparing a larger set of models and stimuli makes it possible to group models that perform similarly on certain stimuli, and to group stimuli on which models perform similarly. We hope BRENCH aids discussions about which stimuli should form a benchmark for brightness models, and what the interface and form of such models should be.
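To illustrate the kind of automation such a framework provides, the following minimal Python sketch runs every model on every stimulus and collects the outputs for later evaluation. All names here (make_sbc_stimulus, dummy_model, run_benchmark) are hypothetical illustrations under stated assumptions and do not reflect the actual BRENCH interface.

    import numpy as np

    # Hypothetical stand-ins for benchmark components; names are
    # illustrative assumptions, not the actual BRENCH API.

    def make_sbc_stimulus(size=64, contrast=0.5):
        """Simultaneous brightness contrast: two identical gray targets,
        one on a dark and one on a light surround (a classic stimulus)."""
        stim = np.zeros((size, 2 * size))
        stim[:, size:] = 1.0  # light right half
        c = size // 2
        stim[c - 8:c + 8, c - 8:c + 8] = contrast           # target on dark surround
        stim[c - 8:c + 8, size + c - 8:size + c + 8] = contrast  # target on light surround
        return stim

    def dummy_model(stimulus):
        """Placeholder 'brightness model': identity mapping from
        input luminance to predicted brightness."""
        return stimulus.copy()

    def run_benchmark(models, stimuli):
        """Run every model on every stimulus; return outputs keyed by
        (model name, stimulus name) for downstream evaluation."""
        return {(m, s): models[m](stimuli[s])
                for m in models for s in stimuli}

    results = run_benchmark(
        models={"identity": dummy_model},
        stimuli={"sbc": make_sbc_stimulus()},
    )
    print(results[("identity", "sbc")].shape)  # (64, 128)

In this sketch, adding a new model or stimulus only requires registering another entry in the corresponding dictionary, which mirrors the flexibility described above.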
Funding: Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy – EXC 2002/1 “Science of Intelligence” – project number 390523135.