Research Article  |   April 2009
Comparing citations and downloads for individual articles at the Journal of Vision
Journal of Vision April 2009, Vol.9, i. doi:10.1167/9.4.i

      Andrew B. Watson; Comparing citations and downloads for individual articles at the Journal of Vision. Journal of Vision 2009;9(4):i. doi: 10.1167/9.4.i.

      © ARVO (1962-2015); The Authors (2016-present)

Introduction
Measuring the impact of scientific articles is of interest to authors and readers, as well as to tenure and promotion committees, grant proposal review committees, and officials involved in the funding of science. The number of citations by other articles is at present the gold standard for evaluating the impact of an individual scientific article. Online journals offer another measure of impact: the number of unique downloads of an article (by unique downloads we mean the first download of the PDF of an article by a particular individual). Since May 2007, Journal of Vision has published download counts for each individual article. So far as we know, we are the only scientific journal providing these numbers. In the most recent accounting, in July 2008, the top five articles were each downloaded between 1,993 and 3,478 times. While we cannot equate downloading an article with actually reading it, these are nonetheless remarkable numbers. The reader may wonder how total downloads of an article compare with the more traditional measure, citation count. Elsewhere, I and others have discussed the differences between download and citation counts, and their respective advantages and disadvantages (Watson, 2007; Brody, Harnad, & Carr, 2006; “Deciphering citation statistics,” 2008; Perneger, 2004). In this note, I discuss the degree of correlation between these two measures. 
Before proceeding to the analyses, it is worth contemplating potential outcomes. Since downloads and citations are in some respects complementary measures, we should not expect perfect correlation. But a substantial correlation, combined with the fact that downloads generally precede citations, would mean that downloads provide a useful early predictor of eventual citations. 
Methods
The data in this report were collected from two sources. The first is our own collection of log files at the Journal of Vision. The logs cover the interval from October 23, 2003 to July 1, 2008. The log files were analyzed to extract unique PDF downloads as a function of time since publication. A unique download is the first download of a particular paper by a particular reader. In the remainder of this paper, “downloads” refers to unique PDF downloads. 
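The extraction of unique downloads described above can be sketched in Python. This is a minimal illustration under stated assumptions: each log record is taken to be a hypothetical `(reader_id, article_id, timestamp)` tuple for one PDF request, and the actual log format and the way individual readers are distinguished at Journal of Vision are not specified here.

```python
from datetime import datetime

def unique_downloads(records):
    """Return {(reader_id, article_id): time of first PDF download}.

    records: iterable of (reader_id, article_id, timestamp) tuples, one per
    PDF request. reader_id is a hypothetical stand-in for however readers
    are actually distinguished in the server logs.
    """
    first = {}
    for reader, article, ts in records:
        key = (reader, article)
        # Keep only the earliest request by this reader for this article.
        if key not in first or ts < first[key]:
            first[key] = ts
    return first

def download_ages_days(first, pub_dates):
    """Group unique downloads by article as sorted ages in days since publication."""
    ages = {}
    for (_reader, article), ts in first.items():
        ages.setdefault(article, []).append((ts - pub_dates[article]).days)
    for article in ages:
        ages[article].sort()
    return ages

# Toy log (not real Journal of Vision data).
records = [
    ("r1", "a1", datetime(2005, 1, 10)),
    ("r1", "a1", datetime(2005, 3, 1)),   # repeat by r1: not a unique download
    ("r2", "a1", datetime(2005, 2, 1)),
    ("r1", "a2", datetime(2006, 6, 1)),
]
pub_dates = {"a1": datetime(2005, 1, 1), "a2": datetime(2006, 5, 1)}
first = unique_downloads(records)
print(download_ages_days(first, pub_dates))  # {'a1': [9, 31], 'a2': [31]}
```

Keeping the earliest timestamp per reader and article, rather than the first record encountered, makes the result independent of the order in which log files are read.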
The second data source is citation counts for all Journal of Vision papers collected from Scopus on July 18, 2008. Scopus is a large abstract and citation database of research literature (http://www.scopus.com/). Our Scopus data consist of counts of citations for each article occurring in each calendar year from 2001 to 2008. We excluded data corresponding to editorials and errata. 
Results
Total citations vs total downloads
Our first comparison is between total downloads and total citations. Figure 1 plots the two quantities against one another (we add 1 to citations so that articles with zero citations can be plotted on a log scale). The correlation between these two quantities is 0.74, indicating a strong positive relationship. 
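As a sketch of how a correlation like the 0.74 above can be computed, the following Python computes the Pearson correlation of per-article totals. The text does not say whether the correlation was computed on raw counts or on the log-transformed values shown in Figure 1, so both variants are shown as an assumption; the numbers are made-up toy values.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Toy per-article totals (illustrative only, not Journal of Vision data).
downloads = [120, 450, 800, 1500, 3000]
citations = [0, 3, 5, 14, 30]

r_raw = pearson(downloads, citations)
# Alternative: correlate on the log axes of Figure 1 (citations + 1).
r_log = pearson([math.log10(d) for d in downloads],
                [math.log10(c + 1) for c in citations])
print(round(r_raw, 3), round(r_log, 3))
```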
Figure 1
 
Total downloads vs total citations. We add 1 to citations so that articles with zero citations can be plotted on a log scale.
The data in Figure 1 correspond to papers that vary in age from 0 to 7 years. Since papers garner both downloads and citations over time, it is possible that much of the association shown in Figure 1 is due to growth with age. To examine this effect, we first looked at the growth of citations and downloads with time following publication. 
Citations vs age
The citation data are already binned into calendar years, so our analysis is coarse. In Figure 2, we plot the average cumulative number of citations per article as a function of article age. The number climbs steadily to about 18 citations after 6 years. Estimates are less certain for the oldest articles because fewer papers contribute to the estimate, but there is as yet no evidence of an asymptote. 
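The averaging behind Figure 2 can be sketched as follows. This assumes a hypothetical data layout (publication year plus a dictionary of citations per calendar year, as in the Scopus counts), an assumed age convention in which age a covers calendar years pub_year through pub_year + a - 1, and a standard error computed from the sample standard deviation across articles; the original analysis may differ in these details.

```python
import math

def cumulative_by_age(articles, last_year, max_age):
    """Mean cumulative citations per article, with 2 * SE, at each age in years.

    articles: list of (pub_year, {calendar_year: citations}) pairs.
    Age a counts citations in calendar years pub_year .. pub_year + a - 1
    (an assumed convention); only articles old enough contribute.
    """
    out = {}
    for age in range(1, max_age + 1):
        totals = [sum(cites.get(pub + k, 0) for k in range(age))
                  for pub, cites in articles
                  if pub + age - 1 <= last_year]
        n = len(totals)
        if n < 2:
            continue  # too few articles for a standard error
        mean = sum(totals) / n
        sd = math.sqrt(sum((t - mean) ** 2 for t in totals) / (n - 1))
        out[age] = (mean, 2 * sd / math.sqrt(n))
    return out

# Toy data (not Journal of Vision counts).
articles = [
    (2006, {2006: 1, 2007: 3, 2008: 2}),
    (2006, {2007: 5}),
    (2007, {2007: 0, 2008: 4}),
]
res = cumulative_by_age(articles, last_year=2008, max_age=4)
```

Fewer articles contribute at the oldest ages, which is why the error bars widen there.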
Figure 2
 
Average total citations per Journal of Vision article as a function of article age. Error bars show ±2 SE.
Downloads vs age
For the download data, we know the date and time of each download, so the analysis can be performed on a finer time scale. We counted unique downloads for each article in bins of one week from date of publication until the date of the last record (July 1, 2008). For each week, we averaged the count over all articles in existence in that week. The result is shown by the black curve in Figure 3. The remarkably smooth curve shows a rapid initial climb followed by a more gradual rise, reaching a value of about 1000 after 7 years. Elsewhere, we have noted a similar shape that characterizes the growth in downloads for individual articles (Watson, 2007). 
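A sketch of the weekly averaging for the black curve in Figure 3, assuming each article's unique downloads are already expressed as ages in days since publication. Here an article counts toward week w only if it had completed that week by the last log date, which is one reading of "all articles in existence in that week"; the function name and data are hypothetical.

```python
from datetime import date

def mean_cumulative_weekly(download_ages, pub_dates, last_date):
    """Entry w is the mean cumulative unique downloads at the end of week w,
    averaged over articles that had completed week w by last_date.

    download_ages: {article: [age_in_days, ...]} of unique downloads.
    pub_dates: {article: publication date}.
    """
    n_weeks = {a: (last_date - pub_dates[a]).days // 7 for a in pub_dates}
    curve = []
    for w in range(max(n_weeks.values())):
        vals = [sum(1 for d in download_ages.get(a, []) if d < 7 * (w + 1))
                for a in pub_dates if n_weeks[a] > w]
        curve.append(sum(vals) / len(vals))
    return curve

# Toy data (not Journal of Vision logs).
pub_dates = {"a1": date(2008, 6, 1), "a2": date(2008, 6, 15)}
download_ages = {"a1": [0, 3, 10, 20], "a2": [1, 8]}
curve = mean_cumulative_weekly(download_ages, pub_dates, date(2008, 7, 1))
print(curve)  # [1.5, 2.5, 4.0, 4.0]
```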
Figure 3
 
Average total unique downloads per article as a function of years after date of publication (black curve). The red points are transformed average cumulative citations, from Figure 2, as described in the text.
An obvious difference between downloads and citations is that the former can occur the moment the article is published, while citations inevitably lag by at least the time required to write and publish a citing article. That difference aside, both quantities rise systematically with article age. In fact, once the lag is accounted for, their rates of growth are quite comparable. To show this, in Figure 3 we also re-plot as red points the data from Figure 2, shifted 2 years earlier in time and multiplied by 45. Loosely described, on average, about 45 downloads correspond to one citation about 2 years later. 
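The transformation applied to the red points amounts to the rough relation downloads(t) ≈ 45 × citations(t + 2), with t in years since publication. A minimal Python sketch of that re-plotting follows; the 2-year lag and factor of 45 are taken from the text, while the helper names are hypothetical and the rule of thumb is not a fitted model.

```python
def align_citations(cite_curve, lag_years=2.0, scale=45.0):
    """Overlay mean cumulative citations on the download curve.

    cite_curve: {age_in_years: mean_cumulative_citations}.
    Each point is moved lag_years earlier and multiplied by scale, so
    citations at age t + lag are compared with downloads at age t.
    """
    return {age - lag_years: scale * c for age, c in cite_curve.items()}

def rough_citations_expected(downloads_now, scale=45.0):
    """Rule of thumb from the text: about one citation per `scale` downloads,
    arriving roughly two years later. Hypothetical helper."""
    return downloads_now / scale

print(align_citations({3: 2.0, 6: 18.0}))  # {1.0: 90.0, 4.0: 810.0}
```

For example, the roughly 18 citations at 6 years in Figure 2 would be drawn as about 810 at 4 years.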
Citations vs downloads for papers published in a given year
To neutralize the growth with age, we can compare total downloads and total citations (as of July 1, 2008) for papers published in a given year. Figure 4 shows the correlation between these two quantities for papers published in each year. The correlation is strongly positive in each year, with a high of around 0.8 in 2003. Because of the lag between downloads and citations noted above, we should not expect correlations to be as high for articles less than three years old. For articles at least three years old, the correlation is always above 0.6 (except for 2001, which is based on only 12 papers). Recall that total downloads are not accurate for papers published before 2004, because logs were available only from October 23, 2003 onward. We shall have to wait several more years to determine whether the correlation continues to climb with age, and at what age and value it might reach an asymptote.
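Per-year correlations with ±2 SE error bars, as in Figure 4, can be sketched as below. The text does not state how the standard errors were obtained; this sketch assumes the common large-sample approximation SE(r) ≈ sqrt((1 - r^2) / (n - 2)), and the input layout is hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation; returns NaN if either sequence is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    denom = math.sqrt(sxx * syy)
    return sxy / denom if denom > 0 else float("nan")

def correlation_by_year(papers):
    """papers: iterable of (pub_year, total_downloads, total_citations).

    Returns {year: (r, two_se, n)} for years with at least 3 papers,
    using the approximation SE(r) = sqrt((1 - r^2) / (n - 2)).
    """
    groups = {}
    for year, downloads, citations in papers:
        d, c = groups.setdefault(year, ([], []))
        d.append(downloads)
        c.append(citations)
    out = {}
    for year in sorted(groups):
        d, c = groups[year]
        n = len(d)
        if n < 3:
            continue  # the SE formula needs n > 2
        r = pearson(d, c)
        se = math.sqrt(max(0.0, 1 - r * r) / (n - 2))
        out[year] = (r, 2 * se, n)
    return out

# Toy data (not Journal of Vision totals).
papers = [(2004, 100, 1), (2004, 200, 2), (2004, 300, 3),
          (2005, 100, 5), (2005, 200, 1), (2005, 300, 4), (2005, 400, 8),
          (2006, 50, 0), (2006, 60, 1)]
out = correlation_by_year(papers)
```

Other choices, such as a Fisher z interval or a bootstrap, would give somewhat different error bars, especially for small years like 2001.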
Figure 4
 
Correlation between total downloads and citations for papers published in each year. The error bars indicate ±2 SE.
Figure 5
 
Distribution of CiteRate for Journal of Vision articles published in 2004.
The Journal of Vision is, of course, a very young journal. As can be seen in Figure 4, the number of papers published each year has changed markedly over our lifespan, and that lifespan coincides with a period of radical change in the methods and habits of publishing and consuming scientific articles. Consequently, patterns of submission, citation, and download have also shifted considerably over the eight years of our existence, and changes over time in the correlation between citations and downloads may partly reflect these broader changes. 
CiteRate vs DemandFactor
To this point we have compared two statistics for individual papers: total downloads and total citations. As we have noted, both of these grow over time subsequent to publication, which limits the usefulness of the raw statistics in comparing papers of different ages. However, both statistics can be normalized for age. In the case of downloads, we have proposed the DemandFactor, which corresponds to the number of downloads per day over the first 1000 days of an article's lifetime (Watson, 2007). In the case of citations, we can count citations in some interval of time following publication. This is reminiscent of the impact factor, but for individual articles, and generalized with respect to the interval in which citations are counted. 
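A minimal sketch of DemandFactor under the literal reading given here: unique downloads per day over the first 1000 days after publication. This is an assumption; the estimator in Watson (2007) may differ, for example in how it treats articles younger than 1000 days, which this sketch simply leaves undefined. The function name is hypothetical.

```python
from datetime import date

def demand_factor(ages_days, pub_date, last_date, window_days=1000):
    """Unique downloads per day over the first `window_days` days after publication.

    ages_days: ages (days since publication) of the article's unique downloads.
    Returns None if the article is younger than the window on last_date.
    """
    if (last_date - pub_date).days < window_days:
        return None
    count = sum(1 for d in ages_days if 0 <= d < window_days)
    return count / window_days

print(demand_factor([0, 10, 999, 1000, 1500], date(2004, 1, 1), date(2008, 7, 1)))  # 0.003
```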
Recall that in our dataset the citations are binned into counts per calendar year for each article. Thus the interval in which citations are counted must be an integer number of years. We characterized the interval by a lag (years after the year of publication) and a length (years included in the count) and explored lags of 0 to 3 years, and lengths of 1 to 5 years (where possible). We computed the number of citations within the interval, and divided by the length of the interval. We call this CiteRate, and it has units of citations/year. 
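CiteRate as defined above can be sketched as follows, with lag 0 meaning the year of publication. The `last_year` cutoff stands in for "where possible"; whether the partial 2008 counts (collected in July 2008) should be treated as a full year is left to the caller. Function and parameter names are hypothetical.

```python
def cite_rate(cites_by_year, pub_year, lag, length, last_year):
    """Citations per year over calendar years pub_year + lag .. pub_year + lag + length - 1.

    cites_by_year: {calendar_year: citations} for one article.
    Returns None if the counting interval extends past last_year.
    """
    start = pub_year + lag
    end = start + length - 1
    if end > last_year:
        return None
    return sum(cites_by_year.get(y, 0) for y in range(start, end + 1)) / length

# Toy citation history for an article published in 2004.
cites = {2004: 0, 2005: 2, 2006: 3, 2007: 5, 2008: 4}
print(cite_rate(cites, 2004, lag=2, length=3, last_year=2008))  # 4.0
```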
Since citation rates for individual articles have not been widely described or analyzed, Figure 5 shows the distribution of CiteRate at Journal of Vision for articles published in 2004, using a lag of 2 and a length of 3. The median of this distribution is 3, and the mean is 4.27. For comparison, the 2006 Impact Factor of Journal of Vision was 3.753; that figure is based on citations in 2006 to articles published in 2004 and 2005. 
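One caveat when comparing these numbers: the two-year Impact Factor for year Y is the number of citations received in Y by items published in Y-1 and Y-2, divided by the number of citable items published in those two years. It is therefore a mean, most comparable to the CiteRate mean rather than to its median. A small sketch with toy numbers (helper names hypothetical) illustrates that a right-skewed distribution has its mean above its median.

```python
import statistics

def summarize(rates):
    """Median and mean of per-article CiteRate values (citations/year)."""
    return statistics.median(rates), statistics.mean(rates)

def impact_factor(citations_in_year, citable_items_prev_two_years):
    """Two-year Impact Factor for year Y: citations received in Y by items
    published in Y-1 and Y-2, divided by the citable items from those years."""
    return citations_in_year / citable_items_prev_two_years

# Toy, right-skewed CiteRate values (not Journal of Vision data).
median, mean = summarize([1, 2, 3, 4, 10])
print(median, mean)
```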
Figure 6
 
Correlation between DemandFactor and CiteRate for papers published in 2004, as a function of article age (lag + length of the counting interval).
In Figure 6, we plot the correlation between DemandFactor and CiteRate for papers published in 2004 against article age at the end of the counting interval (lag + length). The red curve shows that the correlation grows steadily with article age, reaching a value of 0.62 after five years. The other curves show that it makes little difference how long after publication we wait to begin counting. This value of 0.62 is essentially the same as the correlation between total citations and total downloads for 2004 papers shown in Figure 4.
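The analysis behind Figure 6 can be sketched as a loop over the lags (0 to 3) and lengths (1 to 5) described above, correlating DemandFactor with CiteRate across papers for each interval and indexing the result by article age, lag + length. The input layout and names are hypothetical, and intervals that end after the last citation year are skipped.

```python
import math

def pearson(x, y):
    """Pearson correlation; returns NaN if either sequence is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    denom = math.sqrt(sxx * syy)
    return sxy / denom if denom > 0 else float("nan")

def correlation_grid(papers, last_year, lags=range(0, 4), lengths=range(1, 6)):
    """papers: list of (pub_year, demand_factor, {calendar_year: citations}).
    Returns {(lag, length): r} for intervals with at least 3 papers."""
    grid = {}
    for lag in lags:
        for length in lengths:
            xs, ys = [], []
            for pub_year, demand, cites in papers:
                start = pub_year + lag
                end = start + length - 1
                if end > last_year:
                    continue  # interval not yet complete for this paper
                xs.append(demand)
                ys.append(sum(cites.get(y, 0) for y in range(start, end + 1)) / length)
            if len(xs) >= 3:
                grid[(lag, length)] = pearson(xs, ys)
    return grid

def curves_by_lag(grid):
    """Regroup into one curve per lag: lag -> sorted [(age, r), ...]."""
    curves = {}
    for (lag, length), r in grid.items():
        curves.setdefault(lag, []).append((lag + length, r))
    return {lag: sorted(points) for lag, points in curves.items()}

# Toy 2004 papers whose citation rates are proportional to DemandFactor.
papers = [(2004, float(k), {y: k for y in range(2004, 2009)}) for k in (1, 2, 3)]
grid = correlation_grid(papers, last_year=2008)
curves = curves_by_lag(grid)
```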
To summarize, DemandFactor correlates strongly with CiteRate (r = 0.62), measured over an interval of five years after publication. It is possible that the correlation continues to climb for even larger values of article age. This is useful, since DemandFactor, unlike total downloads, can be used to compare articles irrespective of age. 
Discussion
Our study confirms and extends a number of previous reports relating online usage and citation statistics. The earliest report counted “hits” on the HTML full-text versions of articles published in a single volume (1999) of the British Medical Journal during the first week following publication, and compared those counts with citations of the same articles as of May 2004 (Perneger, 2004). For this set of 153 papers, the correlation between the log-transformed counts was 0.54. An analysis of downloads (from a UK mirror site) and later citations of physics articles deposited in a large preprint archive (arXiv.org) showed an asymptotic correlation of 0.46 (Brody et al., 2006). For Nature Neuroscience articles published in 2005, a correlation of 0.72 was found between citations as of March 2008 and PDF downloads in the first 180 days of an article's lifetime (“Deciphering citation statistics,” 2008). 
Conclusions
  1. The overall correlation between total downloads and total citations of Journal of Vision articles is 0.74.
  2. Citations and downloads increase with article age in a characteristic way, but relative to downloads, citations are delayed by about 2 years and reduced by a factor of about 45.
  3. For papers published in a single year, the correlation is as high as 0.8, and usually above 0.6.
  4. The correlation between the age-normalized statistics DemandFactor (a download rate) and CiteRate (citations/year) is about 0.62.
  5. Download statistics provide a useful indicator, about two years in advance, of eventual citations. Downloads are also a useful measure in their own right of the interest and significance of individual articles.
Acknowledgments
This work was supported in part by NASA's Space Human Factors Engineering Project, WBS 466199 and by NASA/FAA Interagency Agreement DTFAWA-08-X-80023. 
Commercial relationships: none. 
Corresponding author: Andrew B. Watson. 
Email: andrew.b.watson@nasa.gov. 
Address: MS 262-2 NASA Ames Research Center, Moffett Field, CA 94035, USA. 
References
Brody, T., Harnad, S., & Carr, L. (2006). Earlier Web usage statistics as predictors of later citation impact. Journal of the American Society for Information Science and Technology, 57, 1060–1072.
Deciphering citation statistics. (2008). Nature Neuroscience, 11, 619.
Perneger, T. V. (2004). Relation between online “hit counts” and subsequent citations: Prospective study of research papers in the BMJ. BMJ, 329, 546–547.
Watson, A. B. (2007). Measuring demand for online articles at the Journal of Vision. Journal of Vision, 7, (7):,