Cohen (1990) and Schmidt (1996) both criticized psychology's overreliance on null hypothesis significance testing (NHST). Cohen (1990) argued that one major problem is how we as psychologists often misinterpret what a p-value tells us. A statistically significant finding (p < .05) is not necessarily important or even practically meaningful. If we test for an effect using a large enough sample, even the smallest or most trivial effect can appear “significant.” Conversely, if our sample is too small, we may fail to detect a real and substantive effect. When we treat statistical significance as a stand-in for importance, we end up emphasizing trivial findings, overstating our results, and dismissing real effects, and our work becomes less scientific.
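To make the sample-size point concrete, here is a minimal Python sketch (an illustration, not taken from Cohen's paper). It assumes a two-group design with equal group sizes and a known population standard deviation, so that an observed standardized difference d yields z = d·√(n/2). The helper name `two_sided_p` and the chosen values of d and n are hypothetical, picked only to show the pattern.

```python
import math


def two_sided_p(d: float, n_per_group: int) -> float:
    """Two-sided p-value for an observed standardized mean difference d
    between two equal-sized groups, ASSUMING a known population SD so that
    z = d * sqrt(n / 2) follows a standard normal under the null."""
    z = d * math.sqrt(n_per_group / 2)
    return math.erfc(abs(z) / math.sqrt(2))


# A trivial effect (d = 0.02) and a medium effect (d = 0.5),
# each evaluated at a small and a large sample size.
for d, n in [(0.02, 50), (0.02, 100_000), (0.5, 10), (0.5, 100)]:
    p = two_sided_p(d, n)
    verdict = "significant" if p < .05 else "non-significant"
    print(f"d = {d:<4}  n per group = {n:>7,}  p = {p:.4f}  {verdict}")
```

Under these assumptions, the trivial effect is non-significant with 50 participants per group but highly significant with 100,000 per group, while a medium effect observed exactly still misses p < .05 with only 10 per group. The effect size is identical within each pair; only the sample size changes the verdict.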

Another important critique from Cohen (1990) is that NHST says nothing about the size of an effect, that is, how large or meaningful it is. When we fail to report effect sizes and confidence intervals, our understanding of a finding remains incomplete. Schmidt (1996) added that overreliance on NHST is a major source of replication problems in our field. Because p-values depend heavily on sample size and fluctuate with sampling error, studies of the same real effect often disagree about whether it is “significant,” so many significant findings are not reproduced in later studies. He argued that this prevents our science from becoming cumulative, because we treat such “failures to replicate” as contradictions and discard them rather than combining the evidence. Schmidt (1996) therefore called for abandoning significance testing in favor of estimation: reporting effect sizes with confidence intervals and using meta-analysis to interpret findings by their magnitude and consistency, rather than as a binary “significant” versus “non-significant” outcome.
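Schmidt's point about unstable p-values can be illustrated with a small simulation, sketched here in Python under assumptions of our own: 20 hypothetical exact replications of a two-group study with a true effect of d = 0.4, 30 participants per group, and a known population SD of 1 (so a z-test and a z-based 95% confidence interval are exact). The function name, seed, and parameter values are illustrative and do not come from Schmidt (1996).

```python
import math
import random
import statistics


def simulate_replications(true_d=0.4, n_per_group=30, n_studies=20, seed=1):
    """Simulate hypothetical exact replications of a two-group study.
    ASSUMES a known population SD of 1, so the z-test and z-based CI are exact."""
    rng = random.Random(seed)
    se = math.sqrt(2 / n_per_group)  # SE of the mean difference when SD = 1
    results = []
    for _ in range(n_studies):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treatment = [rng.gauss(true_d, 1.0) for _ in range(n_per_group)]
        diff = statistics.fmean(treatment) - statistics.fmean(control)
        p = math.erfc(abs(diff / se) / math.sqrt(2))  # two-sided z-test
        ci = (diff - 1.96 * se, diff + 1.96 * se)     # 95% confidence interval
        results.append((diff, p, ci))
    return results


results = simulate_replications()
for diff, p, (lo, hi) in results:
    label = "significant" if p < .05 else "non-significant"
    print(f"diff = {diff:+.2f}  95% CI [{lo:+.2f}, {hi:+.2f}]  p = {p:.3f}  {label}")

# Share of replications reaching p < .05, and the pooled estimate.
share = sum(p < .05 for _, p, _ in results) / len(results)
pooled = statistics.fmean([diff for diff, _, _ in results])
print(f"share significant: {share:.2f}; mean of estimates: {pooled:+.2f}")
```

With these settings each study has only about 34% power, so roughly two-thirds of the replications would be expected to come out “non-significant” even though every one of them estimates the same real effect. Because all simulated studies share the same standard error, the simple average of their estimates coincides with a fixed-effect meta-analytic estimate, the kind of pooling that treats the studies as cumulative evidence rather than as conflicting verdicts.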

These points are critical for our field because if we continue to place too much weight on NHST, we may limit our progress as a science. For psychology to build reliable, generalizable knowledge, we need to look beyond the “reject or fail to reject” framework and focus more on estimation and replication. In the coming weeks, learning about alternative approaches such as effect size estimation and Bayesian methods should help us see how research and its interpretation can yield more meaningful and useful results.

References

Cohen, J. (1990). Things I have learned (so far). American Psychologist, 45(12), 1304–1312. https://doi.org/10.1037/0003-066X.45.12.1304

Schmidt, F. L. (1996). Statistical significance testing and cumulative knowledge in psychology: Implications for training of researchers. Psychological Methods, 1(2), 115–129. https://doi.org/10.1037/1082-989X.1.2.115