We have shown that the shape of a 3D object can be accurately recovered from a single 2D orthographic or perspective image. The key to a successful recovery is the operation of a priori constraints: 3D symmetry, 3D compactness, and planarity of contours. Once all the 3D shapes in a scene are recovered, it becomes possible to recover the spaces between them and form a complete spatial map. In three experiments, we tested how well a human observer recovers the relative positions of objects (the shape of a 3D scene). In Experiment 1, the subject viewed a room containing several pieces of furniture placed on a textureless floor from a single viewing position, and drew a top view of the room on a tablet computer screen. Two transformations (the best rotation and size scaling in the least-squares sense) were applied in order to compare the relative positions of the recovered objects to those of the real objects. The error was less than 10% of the distance between the subject and the objects. Because the drawing task is likely to be affected by the subject's ability to scale sizes and distances, in the next experiment the subjects were asked to remember the positions of the objects and then, after the objects had been removed, to walk around the positions the objects had occupied. Performance in this task was as good as that in the drawing task. The contribution of short-term memory was tested in the third experiment. The subject viewed a computer screen containing several dots. After the dots disappeared, the subject waited for a period of time (0 to 40 sec) and then reconstructed the map of the dots on another screen. The results show that delays of up to 15 sec do not impair performance. The results of these three experiments show that human perception of 3D indoor scenes is very accurate.
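The comparison described above, finding the best rotation and uniform size scaling in the least-squares sense before measuring positional error, corresponds to a standard orthogonal Procrustes alignment. The following is a minimal sketch of such an alignment, not the code actually used in the experiments; the function name, point layout, and error measure are our own assumptions.

```python
import numpy as np

def similarity_align(recovered, actual):
    """Find the rotation and uniform scale that best map the `recovered`
    top-view positions onto the `actual` positions in the least-squares
    sense, and report the residual (RMS) positional error.
    Both inputs are (n, 2) arrays of 2D map coordinates."""
    # Center both point sets so only rotation and scale remain.
    A = recovered - recovered.mean(axis=0)
    B = actual - actual.mean(axis=0)
    # Best rotation from the SVD of the cross-covariance (Procrustes).
    U, S, Vt = np.linalg.svd(A.T @ B)
    R = U @ Vt
    # Guard against a reflection: force det(R) = +1.
    if np.linalg.det(R) < 0:
        U[:, -1] *= -1
        S[-1] *= -1
        R = U @ Vt
    # Least-squares uniform scale factor.
    s = S.sum() / (A ** 2).sum()
    aligned = s * (A @ R)
    # RMS distance between aligned and actual positions.
    rmse = np.sqrt(((aligned - B) ** 2).sum(axis=1).mean())
    return aligned + actual.mean(axis=0), rmse
```

The residual error after this alignment is invariant to the drawing's overall orientation and scale, so it isolates errors in the relative positions of the objects, which is the quantity of interest here.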
This project was supported by the NSF and AFOSR.