The meaning of significance, statistical routines, and methodological and cognitive biases
Keywords:
p-value, confidence interval, compatibility, biases
Abstract
Testing statistical null hypotheses, reporting p-values in abundance, mechanically sorting results into statistically "significant" and "non-significant", and marking results with asterisks based on threshold values (such as 0.05) are popular rituals in research reports in the health sciences and in many other fields. Testing rests on methodological background assumptions that affect the validity of the conclusions, yet researchers take these assumptions inadequately into account when analyzing their data and interpreting their results. The assumptions concern both the statistical models used in the analysis and the setup of data collection, and they must be sufficiently fulfilled for the nominal properties of the statistical quantities to hold. However, the realism of the assumptions often remains questionable. In addition, the analysis and valid interpretation of results are muddled by misconceptions about p-values and confidence intervals and by other cognitive biases. Testing is often excessive and unnecessary, and sometimes even impedes the progress of science. Misinterpretations sometimes lead to serious consequences. The increased reporting of confidence intervals has only partially improved the situation, because they too are misused and misinterpreted. The problems stem from historical and institutional factors, misaligned incentives, deficiencies in the methodological training of researchers, and especially the often quite superficial application of statistical methods in relation to the research object and context. Representatives of mainstream statistics have long raised these problems in various forums and presented recommendations for appropriate practices of statistical analysis and reporting of results.
The article reviews the international discussion on the topic, offers perspectives that complement the usual textbook presentations of the nature of p-values and confidence intervals and of their more thoughtful use and nuanced interpretation, and surveys the use of testing and p-values in recent issues of this journal.