Abstract
Scientific progress relies on accurate inference about the presence (or absence) of an experimental effect. Failures to replicate high-profile studies have elevated concerns about the integrity of inference in psychology research (Nosek et al., 2015). One proposed solution is pre-registering experimental designs before data collection, both to prevent post-hoc changes that might inflate false positives and to increase publication of null findings. However, pre-registration does not always align with the inherently complex and unpredictable nature of research, particularly when a priori power estimates are insufficient to guide study design. Better a priori power estimates would also increase confidence in interpreting null results. The current study used a massive dataset of visual search performance (>11 million participants, >2.8 billion trials; Airport Scanner, Kedlin Co.; www.airportscannergame.com) to produce empirical estimates of the a priori power of various designs (i.e., numbers of trials and participants) and to estimate the impact of, and appropriate corrections for, various post-hoc changes (e.g., retaining pilot data). Dividing the dataset into many thousands of independent replications of various designs allowed estimation of the minimum effect size each design can reliably detect (i.e., a priori power). Applying common post-hoc changes to these thousands of replications yielded precise estimates of the individual and combined impact of post-hoc changes on false positive rates, which in some cases exceeded 30%. Critically, adjusted p-values that correct for post-hoc changes can also be derived. The approach and findings discussed here have the potential to significantly strengthen research practices by guiding study design, encouraging transparent reporting of all results, and providing corrections that allow flexibility without sacrificing integrity.
Meeting abstract presented at VSS 2017
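
Below is a minimal sketch of the replication-and-resampling logic the abstract describes, using simulated reaction-time data in place of the proprietary Airport Scanner dataset. Every function name, parameter value, and distributional choice here is an illustrative assumption, not the study's actual analysis pipeline; it only shows how empirical power and post-hoc false positive rates can be estimated from many independent replications.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def simulate_replication(n_subjects, n_trials, effect=0.0):
    """One 'replication' of a design: each subject contributes mean
    responses in two conditions; `effect` shifts condition B (SD units)."""
    a = rng.normal(0.0, 1.0, (n_subjects, n_trials)).mean(axis=1)
    b = rng.normal(effect, 1.0, (n_subjects, n_trials)).mean(axis=1)
    return stats.ttest_rel(a, b).pvalue

def empirical_power(n_subjects, n_trials, effect, n_reps=5000, alpha=0.05):
    """Fraction of independent replications detecting the effect --
    an empirical analogue of a priori power for this design."""
    p = [simulate_replication(n_subjects, n_trials, effect)
         for _ in range(n_reps)]
    return float(np.mean(np.array(p) < alpha))

def fpr_with_optional_stopping(n_initial, n_added, n_trials,
                               n_reps=5000, alpha=0.05):
    """False positive rate under one common post-hoc change: if the
    first test is not significant, add subjects and test again. No
    effect is present, so anything above alpha is inflation."""
    hits = 0
    for _ in range(n_reps):
        a = rng.normal(0, 1, (n_initial + n_added, n_trials)).mean(axis=1)
        b = rng.normal(0, 1, (n_initial + n_added, n_trials)).mean(axis=1)
        if (stats.ttest_rel(a[:n_initial], b[:n_initial]).pvalue < alpha
                or stats.ttest_rel(a, b).pvalue < alpha):
            hits += 1
    return hits / n_reps

# Example: power of a 20-subject, 40-trial design for a 0.3 SD effect,
# and the inflated false positive rate when 10 subjects are added post hoc.
print(empirical_power(n_subjects=20, n_trials=40, effect=0.3))
print(fpr_with_optional_stopping(n_initial=20, n_added=10, n_trials=40))
```

In the study itself, replications were drawn by partitioning real data rather than by simulation, but the estimator is the same: run the design many thousands of times independently and count detections (for power) or spurious detections under a post-hoc change (for false positive rates). Stacking several post-hoc changes in the same loop is what can drive false positive rates past 30%.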