August 2009
Volume 9, Issue 8
Vision Sciences Society Annual Meeting Abstract  |   August 2009
ImageNet: Constructing a large-scale image database
Author Affiliations
  • Li Fei-Fei
    Computer Science Dept, Princeton University, and Psychology Dept, Princeton University
  • Jia Deng
    Computer Science Dept, Princeton University
  • Kai Li
    Computer Science Dept, Princeton University
Journal of Vision August 2009, Vol.9, 1037. doi:10.1167/9.8.1037

      Li Fei-Fei, Jia Deng, Kai Li; ImageNet: Constructing a large-scale image database. Journal of Vision 2009;9(8):1037.

      © ARVO (1962-2015); The Authors (2016-present)

Image datasets are a pivotal resource for vision research. We introduce here a preview of a new dataset called “ImageNet”, a large-scale ontology of images built upon the backbone of the WordNet structure. ImageNet aims to populate the majority of the 80,000 synsets (concrete and countable nouns and their synonym sets) of WordNet with an average of 500–1000 clean images each. This will result in tens of millions of annotated images organized by the semantic hierarchy of WordNet. To construct ImageNet, we first collect a large set of candidate images (about 10,000) for each synset from Internet image search engines; typically about 10% of these candidates are suitable. We then deploy an image annotation task on the online worker marketplace Amazon Mechanical Turk. To obtain a reliable rating, each image is evaluated by a dynamically determined number of online workers. At the time of writing, we have completed six sub-ImageNets with more than 2500 synsets and roughly 2 million images in total (Mammal-Net, Vehicle-Net, MusicalInstrument-Net, Tool-Net, Furniture-Net, and GeologicalFormation-Net). Our analyses show that ImageNet is much larger in scale and diversity, and much more accurate, than existing image datasets. A particularly interesting question arises in the construction of ImageNet: the degree to which a concept (within concrete and countable nouns) can be visually represented. We call this the “imageability” of a synset. While a German shepherd is an easy visual category, it is not clear how one could represent a “two-year-old horse” with reliable images. We show that “imageability” can be quantified as a function of human subject consensus. Given the large scale of our dataset, ImageNet can offer, for the first time, an “imageability” measurement for a large number of concrete and countable nouns in the English language.
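
The abstract does not specify how the number of workers per image is determined or how consensus is turned into an imageability score. As a rough illustration only, the Python sketch below assumes a simple stopping rule (keep collecting yes/no votes until one side leads by a fixed margin or a vote limit is reached) and a simple agreement score (the fraction of votes that side with the majority). The function names, the parameters `min_votes`, `max_votes` and `margin`, and both rules are hypothetical and are not taken from the authors' method.

```python
from typing import Callable, List, Sequence


def collect_votes(get_vote: Callable[[], bool],
                  min_votes: int = 3,
                  max_votes: int = 10,
                  margin: int = 3) -> List[bool]:
    """Ask workers one at a time whether an image shows the synset.

    Assumed rule (hypothetical): stop once at least `min_votes` votes are in
    and one side leads by `margin`, or once `max_votes` votes are collected.
    """
    votes: List[bool] = []
    while len(votes) < max_votes:
        votes.append(get_vote())
        yes = sum(votes)
        no = len(votes) - yes
        if len(votes) >= min_votes and abs(yes - no) >= margin:
            break
    return votes


def is_accepted(votes: Sequence[bool]) -> bool:
    """Accept the image only on a strict majority of 'yes' votes."""
    return sum(votes) > len(votes) / 2


def agreement(votes: Sequence[bool]) -> float:
    """Fraction of votes that side with the majority, between 0.5 and 1.0."""
    yes = sum(votes)
    return max(yes, len(votes) - yes) / len(votes)


def imageability(votes_per_image: Sequence[Sequence[bool]]) -> float:
    """Assumed score (hypothetical): mean worker agreement over a synset's images."""
    scores = [agreement(v) for v in votes_per_image if v]
    return sum(scores) / len(scores) if scores else 0.0


# Simulated workers: unanimous for a clear image, evenly split for an unclear one.
clear = iter([True] * 10)
unclear = iter([True, False] * 5)
clear_votes = collect_votes(lambda: next(clear))
unclear_votes = collect_votes(lambda: next(unclear))
print(len(clear_votes), len(unclear_votes))
print(imageability([clear_votes, unclear_votes]))
```

Under these assumptions, a clear image stops collecting votes early, while a contested one uses the full vote budget and lowers the synset's score. This is one way a concept such as “two-year-old horse” could end up with a lower imageability value than “German shepherd”.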

Fei-Fei, L., Deng, J., & Li, K. (2009). ImageNet: Constructing a large-scale image database [Abstract]. Journal of Vision, 9(8):1037, 1037a, doi:10.1167/9.8.1037.
Supported by a Microsoft Research New Faculty Fellowship Award and a Princeton Frank Moss gift.
