Calibrating and Combining Evidence in Two-sample Comparisons
by Bob Staudte
Abstract: How much more evidence is there in a "highly significant" p-value of 0.01 than in one that is just "significant" at 0.05? To answer such a question, a calibration scale is proposed for the evidence in the p-value and compared with the Bayesian calibration scale proposed by Sellke, Bayarri and Berger (The American Statistician, 2001). In many routine problems the evidence in the p-value can be obtained directly by applying a variance-stabilizing transformation to the test statistic. Finding such transformations is a classical problem which many statisticians have studied, because it often leads to confidence intervals for model parameters. Variance stabilization also facilitates combining evidence from different studies. However, while straightforward in principle, it does not always work well in practice. We briefly consider the problem of finding the evidence in the one-sample Student-t statistic and then concentrate on the Welch two-sample statistic. Applications to combining evidence in two-sample comparisons from different studies, and to confidence intervals for an overall effect, are presented.
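To make the opening question concrete, the following is a minimal sketch of the Sellke, Bayarri and Berger calibration, which for p < 1/e bounds the Bayes factor in favour of the null hypothesis from below by -e p ln(p) and converts it to a calibrated error probability. The function names are hypothetical. The probit score shown alongside it is only an illustrative assumption of what a calibration scale for the p-value might look like; the abstract does not state the form of the scale proposed in the talk.

```python
import math
from statistics import NormalDist


def sbb_bayes_factor_bound(p):
    """Sellke-Bayarri-Berger lower bound -e * p * ln(p) on the Bayes factor
    in favour of the null hypothesis; the bound is valid only for p < 1/e."""
    if not 0.0 < p < 1.0 / math.e:
        raise ValueError("the bound requires 0 < p < 1/e")
    return -math.e * p * math.log(p)


def sbb_calibrated_error(p):
    """Calibrated error probability (1 + 1/B)^(-1), with B the bound above."""
    b = sbb_bayes_factor_bound(p)
    return b / (1.0 + b)


def probit_score(p):
    """Standard normal quantile of 1 - p. Assumption for illustration only:
    this is not taken from the abstract."""
    return NormalDist().inv_cdf(1.0 - p)


for p in (0.05, 0.01):
    print(
        f"p = {p:.2f}: "
        f"Bayes factor bound = {sbb_bayes_factor_bound(p):.3f}, "
        f"calibrated error = {sbb_calibrated_error(p):.3f}, "
        f"probit score = {probit_score(p):.3f}"
    )
```

On this calibration a p-value of 0.05 corresponds to an error probability of roughly 0.29 and a p-value of 0.01 to roughly 0.11, so the fivefold drop in the p-value yields a much smaller gain in evidence than the raw numbers suggest.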
For More Information: Owen Jones tel. 8344-6412 email: email@example.com