A determination that an observed result is unlikely to have occurred by random chance alone. Conventionally indicated by a p-value below 0.05, meaning there is less than a 5% chance of seeing a result at least this extreme if no real effect existed.
Statistical significance is a determination that an observed result—like "Design B got 15% more clicks than Design A"—is unlikely to have occurred by random chance alone.
When you measure a sample, there is always a chance that your findings are just due to random noise or the specific people you happened to recruit. A statistical test calculates the probability of observing your result if there were truly no real difference.
If this probability (the p-value) is very low—conventionally below 0.05 (5%)—the finding is called "statistically significant," and you can be more confident that the effect is real.
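To make this concrete, here is a minimal Python sketch of a two-proportion z-test for the hypothetical A/B click example above. The click counts are invented for illustration; a real analysis would plug in your own data, and in practice you might reach for a library routine such as statsmodels' proportions_ztest instead of computing by hand.

```python
# Minimal sketch: two-proportion z-test for a hypothetical A/B click test.
# All counts below are made up for illustration.
from scipy.stats import norm

clicks_a, n_a = 500, 5000   # Design A: 10.0% click rate
clicks_b, n_b = 575, 5000   # Design B: 11.5% click rate (a 15% relative lift)

p_a = clicks_a / n_a
p_b = clicks_b / n_b

# Pooled click rate under the null hypothesis of "no real difference"
p_pool = (clicks_a + clicks_b) / (n_a + n_b)
se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5

z = (p_b - p_a) / se
p_value = 2 * norm.sf(abs(z))   # two-sided p-value

print(f"z = {z:.2f}, p = {p_value:.4f}")
print("statistically significant" if p_value < 0.05 else "not significant")
```

With these numbers the test yields p of roughly 0.015, below the conventional 0.05 threshold; halve the sample sizes and the very same 15% lift is no longer significant, which is why sample size matters as much as the observed difference.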
A statistically significant result suggests:
- The observed difference is unlikely to be explained by random noise or by the specific people you happened to recruit.
- You can be more confident that the effect is real and would generalize beyond your sample.
Statistical significance does not tell you:
- How large or important the effect is; that is what effect size measures.
- Whether the difference is practically meaningful enough to justify action.
- The probability that your hypothesis is true; the p-value is computed assuming no real effect exists.
Always report effect size alongside significance to show whether a difference is large enough to justify action.
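The sketch below (again with made-up numbers) shows why: it computes Cohen's h, one common effect size for comparing two proportions, alongside the p-value. With a million users per arm, even a 0.3-percentage-point difference in click rate is highly significant yet negligibly small in magnitude.

```python
# Minimal sketch: report effect size alongside the p-value.
# Hypothetical numbers: with a very large sample, a tiny difference
# can be "significant" while being practically trivial.
from math import asin, sqrt
from scipy.stats import norm

def two_prop_test(c1, n1, c2, n2):
    """Two-sided z-test for two proportions; returns (p_value, cohens_h)."""
    p1, p2 = c1 / n1, c2 / n2
    pool = (c1 + c2) / (n1 + n2)
    se = sqrt(pool * (1 - pool) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    p_value = 2 * norm.sf(abs(z))
    # Cohen's h: a standard effect size for the gap between two proportions
    h = 2 * asin(sqrt(p2)) - 2 * asin(sqrt(p1))
    return p_value, h

# 10.0% vs 10.3% click rate with one million users per arm
p_val, h = two_prop_test(100_000, 1_000_000, 103_000, 1_000_000)
print(f"p = {p_val:.2e}  (significant)")
print(f"Cohen's h = {h:.3f}  (negligible; by convention h below 0.2 is 'small')")
```

Here the p-value is vanishingly small while Cohen's h is about 0.01, far below even the conventional "small" threshold of 0.2: the result is statistically significant but may not be worth acting on.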
Related terms:
P-value: The probability of observing your data (or something more extreme) if there were truly no effect. Widely used, widely misunderstood, and never sufficient on its own to make a decision.
Effect size: A measure of the magnitude of a finding, how big the difference is between conditions, not just whether it exists. Essential for determining practical significance beyond statistical significance.
Quantitative research: Research focused on numerical measurement with the goal of generalizing findings from a sample to a broader population. Answers 'how much,' 'how many,' and 'how often.'