Flaws of null hypothesis significance testing in psychological research
Null hypothesis significance testing has long been the dominant method of analysis in psychological research, and despite serious flaws it is still widely used in academia, even by experienced researchers. Cohen (1990) and Schmidt (1996) argue that researchers, including graduate students, should focus on the magnitude of an effect rather than merely on its statistical significance. Cohen (1990) warns that using the .05 level for many tests escalates the experimentwise Type I error rate, "or in plain English, greatly increases the chances of discovering things that aren't so." Schmidt (1996) places the responsibility for this situation on methodologists themselves: "Those of us who are the keepers of the methodological and quantitative flame for the field of psychology bear the major responsibility for this failure because we have continued to emphasize significance testing in the training of graduate students despite clear demonstrations of the deficiencies of this approach to data analysis."
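Cohen's point about the experimentwise error rate can be made concrete with a little arithmetic. The sketch below assumes k independent tests, each run at level alpha, with every null hypothesis true; under those simplifying assumptions the probability of at least one false positive is 1 - (1 - alpha)^k. The function name `familywise_error_rate` is hypothetical, chosen only for illustration.

```python
def familywise_error_rate(alpha: float, k: int) -> float:
    """Probability of at least one Type I error across k tests.

    Assumes (for illustration) that the k tests are independent and that
    every null hypothesis is true.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if k < 0:
        raise ValueError("k must be non-negative")
    # Each test avoids a false positive with probability (1 - alpha),
    # so all k independent tests avoid one with probability (1 - alpha) ** k.
    return 1.0 - (1.0 - alpha) ** k


if __name__ == "__main__":
    for k in (1, 5, 10, 20):
        print(f"{k:>2} tests at .05 -> {familywise_error_rate(0.05, k):.3f}")
```

Under these assumptions, 20 tests at the .05 level give roughly a 64% chance of at least one spurious "discovery". Tests within a single study are often correlated, which would change the exact figure.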
Cohen (1990) and Schmidt (1996) give several reasons why null hypothesis significance testing should not be used, especially in the social sciences. Cohen (1990) recalls that "the statistics texts on which I was raised and their later editions to which I repeatedly turned in the 1950s and 1960s presented null hypothesis testing a la Fisher as a done deal, as the way to do statistical inference." Schmidt (1996) traces the method's persistence to misconceptions: "An essential part of the explanation is that researchers hold false beliefs about significance testing, beliefs that tell them that significance testing offers important benefits to researchers that it in fact does not." Cohen (1990) also observes that "in psychology, and especially in soft psychology, under the sway of the Fisherian scheme, there has been little consciousness of how big things are."
Both authors criticize the binary decision-making framework of null hypothesis significance testing, in which a result is declared either significant or not significant according to an arbitrary p-value threshold (usually .05). As Cohen (1990) points out, the Fisherian null hypothesis test does not tell us the probability that the null hypothesis is true, and it certainly cannot tell us anything about the probability that the research or alternative hypothesis is true. Both authors also highlight the neglect of effect sizes and confidence intervals, which are far more informative about the data than a significant/non-significant verdict. They recommend meta-analysis as a better way to accumulate evidence across studies and advise against relying on null hypothesis significance testing in future research. Schmidt (1996) puts the point bluntly: "We can no longer tolerate a situation in which our upcoming generation of researchers are being trained to use discredited data analysis methods while the broader research enterprise of which they are to become a part has moved toward improved methods." Reporting effect sizes and confidence intervals, rather than relying on significance tests alone, could substantially improve the quality of psychological research.
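To make the contrast with a bare significant/not-significant verdict concrete, here is a minimal Python sketch (standard library only) that reports an effect size and a confidence interval for a two-group comparison. It assumes Cohen's d with a pooled standard deviation as the effect-size measure and a normal-approximation interval for the difference in means. The function names and the sample scores are hypothetical, chosen only for illustration; for small samples a t critical value would give a somewhat wider and more appropriate interval.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance uses the sample (n - 1) denominator.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def mean_difference_ci(group_a, group_b, level=0.95):
    """Approximate confidence interval for mean(group_a) - mean(group_b).

    Illustrative simplification: uses a normal (z) critical value and an
    unpooled standard error; a t critical value suits small samples better.
    """
    diff = mean(group_a) - mean(group_b)
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    return diff - z * se, diff + z * se


if __name__ == "__main__":
    # Hypothetical scores for two small groups, for illustration only.
    treatment = [5.1, 6.3, 5.8, 7.0, 6.1, 5.5]
    control = [4.9, 5.2, 5.6, 4.8, 5.9, 5.0]
    d = cohens_d(treatment, control)
    low, high = mean_difference_ci(treatment, control)
    print(f"Cohen's d = {d:.2f}")
    print(f"95% CI for the mean difference: [{low:.2f}, {high:.2f}]")
```

The two outputs answer different questions: d expresses how big the difference is in standard-deviation units, while the interval shows which differences remain compatible with the data. Neither is conveyed by a p-value compared against .05.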
References
Cohen, J. (1990). Things I have learned (so far). American Psychologist, 45(12), 1304–1312. https://doi.org/10.1037/0003-066X.45.12.1304
Schmidt, F. L. (1996). Statistical significance testing and cumulative knowledge in psychology: Implications for training of researchers. Psychological Methods, 1(2), 115–129. https://doi.org/10.1037/1082-989X.1.2.115

