So, to recap:
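- the sample mean and sample standard deviation just happen to be the maximum-likelihood estimators of mu and sigma for a Gaussian distribution (strictly, the MLE of the variance divides by n; the usual n − 1 version gives up a little likelihood to become unbiased)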
- Gaussian distributions show up naturally (Central Limit Theorem): an effect that is the sum of many small, independent causes is approximately Gaussian, so we can often fall back on them
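- to construct a CI we need a probability statement about a pivotal quantity: something computed from the data and the parameter whose *distribution* does not depend on the unknown parameter we're trying to estimate (otherwise: circular dependency!)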
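- that's easy when sigma is known: (x̄ − mu)/(sigma/√n) is standard normal (exactly for Gaussian data, approximately via the CLT otherwise). When sigma is unknown as well and we plug in the sample standard deviation s, we need a bit more elbow grease: for Gaussian data the same ratio follows Student's t distribution with n − 1 degrees of freedom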
- when the data aren't Gaussian (and the sample is too small for the CLT to kick in), we need moar math
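A quick numerical check of the first point, as a minimal sketch using only Python's standard library (the data, the seed, and the helper `gaussian_log_likelihood` are made up for illustration): nudging mu or sigma away from the sample estimates can only lower the Gaussian log-likelihood, and averaging the divide-by-n variance over many tiny samples exposes its (n − 1)/n downward bias.

```python
import math
import random
import statistics

def gaussian_log_likelihood(data, mu, sigma):
    """Log-likelihood of i.i.d. data under N(mu, sigma^2)."""
    n = len(data)
    return (-n * math.log(sigma * math.sqrt(2 * math.pi))
            - sum((x - mu) ** 2 for x in data) / (2 * sigma ** 2))

rng = random.Random(0)
data = [rng.gauss(10.0, 2.0) for _ in range(50)]

mu_hat = statistics.fmean(data)      # MLE of mu: the sample mean
sigma_hat = statistics.pstdev(data)  # MLE of sigma: divides by n, not n - 1

best = gaussian_log_likelihood(data, mu_hat, sigma_hat)
# Moving either parameter away from the MLE lowers the likelihood.
for d_mu, d_sigma in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    assert gaussian_log_likelihood(data, mu_hat + d_mu, sigma_hat + d_sigma) < best

# Averaged over many tiny samples (true variance 4.0), the divide-by-n MLE
# lands near 4.0 * (n - 1) / n, while the n - 1 version lands near 4.0.
n, trials = 5, 20000
samples = [[rng.gauss(0.0, 2.0) for _ in range(n)] for _ in range(trials)]
mle_var = statistics.fmean([statistics.pvariance(s) for s in samples])
unbiased_var = statistics.fmean([statistics.variance(s) for s in samples])
print(f"MLE: {mle_var:.2f}, n - 1 version: {unbiased_var:.2f}")
```

With n = 5 the MLE average should sit around 3.2, which is why the n − 1 version is the one usually reported.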
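The CLT point can be seen in a few lines. This sketch averages k independent draws from a strongly skewed Exponential(1) distribution (skewness 2) and measures how the skewness of the average shrinks, roughly like 2/√k, toward the Gaussian's 0. The exponential "causes", the values of k, and the seed are arbitrary choices for illustration.

```python
import random
import statistics

def skewness(xs):
    """Sample skewness; 0 for any symmetric distribution, including the Gaussian."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean([((x - m) / s) ** 3 for x in xs])

rng = random.Random(1)
trials = 20000
skew = {}
for k in (1, 4, 16, 64):
    # Each value is the average of k independent, heavily skewed "causes".
    means = [statistics.fmean([rng.expovariate(1.0) for _ in range(k)])
             for _ in range(trials)]
    skew[k] = skewness(means)
    print(f"k={k:>2}: skewness {skew[k]:.2f} (theory {2 / k ** 0.5:.2f})")
```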
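The known-sigma vs unknown-sigma points can be checked by simulation. A minimal sketch, assuming Gaussian data, n = 5 and a 95% level (all numbers made up): it compares the known-sigma z interval, a naive interval that plugs the sample s into the z formula, and the Student t interval. To stay within the standard library, the t critical value comes from a small hand-rolled `t_quantile` (Simpson integration of the t density plus bisection, written just for this example); with SciPy installed, `scipy.stats.t.ppf(0.975, df)` gives the same number.

```python
import math
import random
import statistics

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """CDF for x >= 0: Simpson's rule on [0, x] plus 0.5 by symmetry."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_quantile(p, df):
    """Quantile for p in (0.5, 1), found by bisection on t_cdf."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

mu, sigma, n, trials = 5.0, 3.0, 5, 20000
z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96
t = t_quantile(0.975, n - 1)                # about 2.78 for 4 degrees of freedom

rng = random.Random(42)
hits = {"z, known sigma": 0, "z, plug-in s": 0, "t, plug-in s": 0}
for _ in range(trials):
    xs = [rng.gauss(mu, sigma) for _ in range(n)]
    xbar, s = statistics.fmean(xs), statistics.stdev(xs)
    # Pivots: (xbar - mu) / (sigma / sqrt(n)) ~ N(0, 1) and
    # (xbar - mu) / (s / sqrt(n)) ~ t(n - 1); neither distribution involves mu.
    hits["z, known sigma"] += abs(xbar - mu) <= z * sigma / math.sqrt(n)
    hits["z, plug-in s"] += abs(xbar - mu) <= z * s / math.sqrt(n)
    hits["t, plug-in s"] += abs(xbar - mu) <= t * s / math.sqrt(n)

coverage = {name: count / trials for name, count in hits.items()}
for name, c in coverage.items():
    print(f"{name}: {c:.3f}")
```

With only 5 observations the plug-in z interval should come out noticeably short of 95% (roughly 88%), while the other two sit near 95%: that gap is the extra elbow grease Student's t accounts for.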