Definition: The probability of observing your data (or something more extreme) if there were truly no effect. Widely used, widely misunderstood, and never sufficient on its own to make a decision.
A p-value tells you how likely your observed result would be if the null hypothesis were true—that is, if there were no real difference or effect. A small p-value means the data would be surprising under the assumption of no effect.
If p = 0.03, there is a 3% chance of seeing a result at least this extreme if the null hypothesis is true. It does not mean there is a 3% chance the null hypothesis is true, a 97% chance the effect is real, or that the effect is large or important.
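This interpretation can be demonstrated by simulation. The sketch below (function names and numbers are hypothetical) repeatedly draws samples from a world where the null hypothesis is true, then counts how often a mean at least as extreme as the observed one appears by chance alone:

```python
import random
import statistics

random.seed(42)

def simulated_p_value(observed_mean, n, trials=10_000):
    """Estimate how often a sample mean at least as extreme as
    observed_mean arises when the true effect is zero (the null)."""
    extreme = 0
    for _ in range(trials):
        # Draw n values from a null distribution: mean 0, sd 1.
        sample = [random.gauss(0, 1) for _ in range(n)]
        if abs(statistics.mean(sample)) >= abs(observed_mean):
            extreme += 1
    return extreme / trials

# A sample mean of 0.5 with n = 20 is fairly surprising under the null:
# the simulated p-value comes out near 0.02-0.03.
p = simulated_p_value(0.5, 20)
print(round(p, 3))
```

Note what the simulation does and does not measure: it estimates how often chance alone produces a result this extreme, not the probability that the null hypothesis itself is true.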
The 0.05 threshold is a convention, not a natural law. Fisher originally proposed it as a rough guide, not a binary decision rule.
Report effect size alongside your p-value. A statistically significant result with a tiny effect size is not worth acting on. A non-significant result with a large effect size may warrant a larger study. Use confidence intervals to communicate the range of plausible values—far more informative than a single yes-or-no threshold.
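As a sketch of reporting all three numbers together, the snippet below computes Cohen's d and an approximate 95% confidence interval for a difference in means. The data and variant names are hypothetical, and the interval uses a normal approximation (z = 1.96), which assumes reasonably large samples:

```python
import math
import statistics

def cohens_d(a, b):
    """Standardized mean difference: (mean_a - mean_b) / pooled sd."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var)

def mean_diff_ci(a, b, z=1.96):
    """Approximate 95% CI for the difference in means (normal approximation)."""
    diff = statistics.mean(a) - statistics.mean(b)
    se = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    return diff - z * se, diff + z * se

# Hypothetical task-completion times (seconds) for two design variants.
variant_a = [32, 35, 30, 28, 33, 31, 29, 34, 36, 30]
variant_b = [38, 41, 37, 35, 40, 39, 36, 42, 38, 37]

d = cohens_d(variant_a, variant_b)
lo, hi = mean_diff_ci(variant_a, variant_b)
print(f"Cohen's d = {d:.2f}, 95% CI for difference = ({lo:.1f}, {hi:.1f})")
```

Here the interval excludes zero (variant A is faster), and the effect size says how much faster in standardized units, which is far more useful to a stakeholder than "p < 0.05" alone.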
Statistical significance: A determination that an observed result is unlikely to have occurred by random chance alone. Conventionally indicated by a p-value below 0.05, meaning a result at least this extreme would occur less than 5% of the time if there were no real effect.
Effect size: A measure of the magnitude of a finding—how big the difference is between conditions, not just whether it exists. Essential for determining practical significance beyond statistical significance.
Sample size: The number of participants in a research study. Appropriate sample size depends on research goals, method type (qualitative vs. quantitative), the precision required, and the number of distinct user segments being studied.