t-value, F-value, and p-value in One Page
This page summarizes the core ideas in hypothesis testing:
- What a test statistic is
- How to interpret the t-value
- How to interpret the F-value
- What the p-value really means (and common pitfalls)
The goal is intuition first, formulas second.
1. What is a test statistic?
When we say “we tested whether the groups are different”, we almost always:
- Collect sample data.
- Compute some summary number from the data.
- Compare that number to a reference distribution to get a p-value.
That summary number is the test statistic, e.g.:
- t (t-value)
- F (F-value)
- z
- chi-square (χ²)
You can think of a test statistic as:
A processed version of your sample statistics that tells you
“How big is the effect, relative to how much noise (uncertainty) I have?”
Roughly:
- numerator = “effect” (difference we care about)
- denominator = “uncertainty / noise / variability”
The bigger the test statistic (in absolute value), the more “surprising” the result is if the null hypothesis were true.
2. t-value: “difference / uncertainty”
2.1 Basic idea
When comparing two sample means (e.g., treatment vs control), we are interested in:
- Effect: difference between sample means
- Example: mean blood pressure in group 1 minus group 2
- Uncertainty: how much sampling error that difference has
The t-value combines these as:
t = (difference in means) / (standard error of that difference)
Or in words:
“The difference is this big, and the noise level is this big.”
If the noise is small (large sample size, low variance), even a modest difference can give a large t.
If the noise is large, even a big-looking difference might not be statistically convincing.
2.2 Two-sample t-value (independent groups, general form)
Suppose we have:
- Group 1: sample mean X1_bar, standard deviation s1, sample size n1
- Group 2: sample mean X2_bar, standard deviation s2, sample size n2
A common form of the two-sample t is:
- Effect (numerator): X1_bar - X2_bar
- Standard error (denominator): SE = sqrt( (s1^2 / n1) + (s2^2 / n2) )
So the t-value is:
t = (X1_bar - X2_bar) / SE
This is exactly “difference / uncertainty”.
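The "difference / uncertainty" formula above can be checked directly against a library implementation. A minimal sketch with synthetic data (the group means, standard deviation, sizes, and random seed are all illustrative); the manual computation matches `scipy.stats.ttest_ind` with `equal_var=False` (Welch's t-test), which uses exactly this standard error:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group1 = rng.normal(loc=120, scale=10, size=30)  # hypothetical measurements
group2 = rng.normal(loc=115, scale=10, size=30)

# Manual t-value: difference in means / standard error of that difference
diff = group1.mean() - group2.mean()
se = np.sqrt(group1.var(ddof=1) / len(group1) + group2.var(ddof=1) / len(group2))
t_manual = diff / se

# The same test via scipy (Welch's t-test, unequal variances)
t_scipy, p = stats.ttest_ind(group1, group2, equal_var=False)

print(round(t_manual, 4), round(t_scipy, 4))  # the two values agree
```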
2.3 Why does n (sample size) appear?
The standard error shrinks as sample size grows:
- Larger n → more precise mean estimates → smaller standard error
- Smaller standard error → larger t-value (for the same effect size)
So with huge sample sizes, even tiny differences can produce large t-values and very small p-values. This is one key reason not to confuse “statistically significant” with “practically important”.
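To see this shrinking standard error concretely, here is a small sketch holding the effect (mean difference) and the per-group standard deviation fixed while n grows (the specific numbers are arbitrary):

```python
import math

mean_diff = 1.0   # fixed "effect": difference in the two group means
sd = 10.0         # fixed per-group standard deviation

for n in (10, 100, 10_000):
    # Standard error of the difference between two means of n points each
    se = math.sqrt(sd**2 / n + sd**2 / n)
    t = mean_diff / se
    print(f"n={n:>6}  SE={se:.3f}  t={t:.2f}")
```

The same 1-unit difference goes from a negligible t at n = 10 to a "highly significant" t at n = 10,000, purely because the denominator shrank.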
3. F-value: “between-group variance / within-group variance”
When comparing more than two groups (e.g., 3+ treatments), we use ANOVA and the F-value.
Conceptually, the F-value is:
F = (variance between groups) / (variance within groups)
- Between-group variance: how far the group means are spread out from the overall mean → reflects group differences (possible treatment effect).
- Within-group variance: how spread out the data are inside each group → reflects noise / random variation not explained by the grouping.
So F answers:
“Are the differences between group means big, relative to the noise inside each group?”
If all groups come from the same population (null is true):
- Group means should not differ more than what random sampling explains.
- So between-group variance and within-group variance should be similar
→ F should be around 1.
If at least one group really is different:
- Between-group variance becomes larger than within-group variance
- F becomes larger than 1
- A large F value (in the upper tail of the F distribution) suggests the null is unlikely.
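The between/within decomposition can be computed by hand and checked against `scipy.stats.f_oneway`. A sketch with three synthetic groups (the means, spread, sizes, and seed are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
groups = [rng.normal(loc=m, scale=5, size=20) for m in (50, 52, 58)]

k = len(groups)
n_total = sum(len(g) for g in groups)
grand_mean = np.concatenate(groups).mean()

# Between-group variance: spread of group means around the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ms_between = ss_between / (k - 1)

# Within-group variance: spread of the data inside each group
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
ms_within = ss_within / (n_total - k)

f_manual = ms_between / ms_within
f_scipy, p = stats.f_oneway(*groups)
print(round(f_manual, 4), round(f_scipy, 4))  # the two values agree
```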
3.1 Relationship between t and F: F = t² (for two groups)
If you compare only two groups, ANOVA and the two-sample t-test are actually the same test in disguise.
For two groups:
- The F-test for group differences
- The t-test for difference in means
are related by:
F = t²
So conceptually, t and F carry the same information in that setting:
- t focuses on difference / standard error
- F focuses on between-group variance / within-group variance
They both measure “signal vs noise”.
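The F = t² identity is easy to verify numerically. One caveat baked into the sketch below: ANOVA assumes equal variances, so the matching t-test is the pooled-variance version (`equal_var=True`), not Welch's. The data are synthetic and illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
a = rng.normal(0.0, 1.0, size=25)
b = rng.normal(0.5, 1.0, size=25)

# Pooled-variance t-test (ANOVA assumes equal variances)
t, p_t = stats.ttest_ind(a, b, equal_var=True)
f, p_f = stats.f_oneway(a, b)

print(round(t**2, 6), round(f, 6))  # identical, and the p-values match too
```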
4. p-value: “How surprising is this, if the null were true?”
Now that we have test statistics (t, F, etc.), how do we decide whether they are “big enough” to reject the null hypothesis?
That’s where the p-value comes in.
4.1 Formal definition (loosely stated)
Given a test statistic (t, F, z, etc.) and a null hypothesis (for example, “the two group means are equal”):
The p-value is the probability of obtaining a test statistic
at least as extreme as the one we observed,
assuming that the null hypothesis is true.
Important:
- The probability is computed under the assumption that the null is true.
- It is not the probability that the null hypothesis is true.
- It is about “how unusual is our data, if the null world were real?”
In plain language:
“Let’s pretend the two groups actually come from the same population.
Under that assumption, how likely would we be to see a t (or F)
as large as we got (or even more extreme) just by random chance?”
If that probability is very small (e.g., < 0.05), we say:
- “Our result is statistically significant at the 5% level.”
- Meaning: “In a world where the null is true, what we observed would be rare.”
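In code, this definition is just a tail probability of the reference distribution. A sketch using a hypothetical observed t-value and degrees of freedom (both numbers are made up for illustration):

```python
from scipy import stats

t_obs = 2.1   # hypothetical observed t-value
df = 58       # hypothetical degrees of freedom (e.g., n1 + n2 - 2)

# Two-sided p-value: probability of |t| >= t_obs if the null were true
p = 2 * stats.t.sf(abs(t_obs), df)
print(round(p, 4))
```

`stats.t.sf` is the survival function (upper-tail probability) of the t distribution; doubling it covers both tails for a two-sided test.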
4.2 Why p-values are convenient
- Different tests (t, F, chi-square, etc.) have different distributions.
- Instead of memorizing critical values for all degrees of freedom and distributions, we just convert everything into a probability (the p-value).
- Thresholds like 0.05 or 0.01 are easy to remember and communicate.
So p-values are popular because they compress:
- The magnitude of the test statistic
- The shape of its distribution
- The sample size (through the test statistic)
into a single number.
4.3 The downside: over-reliance and misuse
Because p-values compress a lot of information, they can be easily misused:
- Effect size and sample size are conflated.
  - A very small effect with a very large sample can yield a tiny p-value.
  - A moderate effect with a very small sample might yield a non-significant p-value.
  - So "p < 0.05" does not mean the effect is large or important. It only means: "given the sample size and variability, this result is unlikely under the null."
- The p-value is not:
  - the probability that the null hypothesis is true
  - the probability that your results are due to chance
  - the probability that the alternative hypothesis is true
  It is purely P( test statistic ≥ observed | null is true ).
- With huge datasets, almost anything becomes significant.
- If n is extremely large, even trivial differences can yield minuscule p-values.
- That’s why we must also report effect sizes and confidence intervals, not just p-values.
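The "everything is significant at huge n" problem is easy to reproduce. A sketch with a deliberately trivial true difference of 0.01 standard deviations between two synthetic samples of a million points each (sizes and seed are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 1_000_000
a = rng.normal(0.00, 1.0, size=n)  # mean 0
b = rng.normal(0.01, 1.0, size=n)  # mean shifted by a trivial 0.01 sd

t, p = stats.ttest_ind(a, b, equal_var=False)
effect = b.mean() - a.mean()
print(f"effect = {effect:.4f}, p = {p:.2e}")
```

The effect is practically negligible, yet the p-value is tiny; this is exactly why the effect size and a confidence interval should be reported alongside p.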
5. Big picture summary
You can think of the three concepts like this:
- t-value
  - "How many standard errors away is the difference in means from zero?"
  - t ≈ (difference) / (uncertainty)
- F-value
  - "How big is the variance between groups relative to the variance within groups?"
  - F ≈ (between-group variance) / (within-group variance)
  - When comparing 2 groups, F = t².
- p-value
- “If the null hypothesis were true, how likely is it to see a test statistic at least this extreme?”
- It is not the probability the null is true.
- It depends on both effect size and sample size.
When you interpret results in practice, a good habit is:
- Look at the effect size (difference in means, ratios, etc.).
- Look at the uncertainty (standard error, confidence interval).
- Then look at the p-value to understand how surprising the result is under the null.
t, F, and p-values are just tools.
They are powerful and useful—but only when we also think carefully about the data, the design, and the real-world meaning of the effect.