1. Descriptive Statistics
Descriptive statistics summarize and describe the main features of a dataset. They form the foundation for all statistical analysis and inference.
Measures of Central Tendency
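Median = $\left(x_{(n/2)} + x_{(n/2+1)}\right) / 2$ for even $n$, and $x_{((n+1)/2)}$ for odd $n$, where $x_{(i)}$ is the $i$-th smallest observation.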
Mode: The most frequently occurring value in the dataset.
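As a small illustration of the measures above, the sketch below uses Python's standard `statistics` module on a made-up sample (the data values are hypothetical, chosen so that n is even and one value repeats).

```python
import statistics

# Hypothetical sample with n = 6 (even), so the median averages the two middle values.
data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)      # arithmetic average
median = statistics.median(data)  # (3 + 5) / 2 for this sorted sample
mode = statistics.mode(data)      # most frequently occurring value

print(mean, median, mode)
```

For this sample the median falls between two observed values, while the mode is always one of the observed values.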
Measures of Dispersion
Measures of Shape
2. Probability Theory
Axioms of Probability (Kolmogorov)
- P(A) ≥ 0 for any event A
- P(S) = 1 where S is the sample space
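- For mutually exclusive events A and B: P(A ∪ B) = P(A) + P(B); more generally, the probabilities of any countable collection of pairwise mutually exclusive events add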
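The axioms can be checked on a small finite example. The sketch below assumes a fair six-sided die with equally likely outcomes; the names `S`, `prob`, `A`, and `B` are illustrative only.

```python
from fractions import Fraction

# Hypothetical sample space: one roll of a fair six-sided die.
S = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event (a subset of S) under equally likely outcomes."""
    return Fraction(len(event & S), len(S))

A = {1, 2}  # roll is 1 or 2
B = {5, 6}  # roll is 5 or 6, so A and B are mutually exclusive

non_negative = all(prob({s}) >= 0 for s in S)  # axiom 1, checked on elementary events
normalized = prob(S) == 1                      # axiom 2
additive = prob(A | B) == prob(A) + prob(B)    # axiom 3 for the disjoint pair A, B

print(non_negative, normalized, additive)
```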
Conditional Probability
Bayes' Theorem
Law of Total Probability
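A worked example may help connect the three ideas above. The sketch below uses the standard forms P(B) = Σ P(B | Ai) P(Ai) and P(A | B) = P(B | A) P(A) / P(B) on a hypothetical screening test; the prevalence, sensitivity, and false-positive rate are invented numbers, not data from any study.

```python
from fractions import Fraction

# Hypothetical inputs (assumptions for illustration only).
p_disease = Fraction(1, 100)             # P(D)
p_pos_given_disease = Fraction(95, 100)  # P(+ | D)
p_pos_given_healthy = Fraction(5, 100)   # P(+ | not D)

# Law of total probability: split P(+) over the partition {D, not D}.
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes' theorem: reverse the conditioning.
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos

print(p_pos, p_disease_given_pos, float(p_disease_given_pos))
```

Under these assumed inputs, most positive results come from the much larger healthy group, so the posterior probability stays well below the test's sensitivity.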
3. Probability Distributions
Discrete Distributions
Binomial Distribution
Models the number of successes in n independent Bernoulli trials.
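A minimal sketch of the binomial probability mass function, P(X = k) = C(n, k) p^k (1-p)^(n-k), using only the standard library. The function name `binomial_pmf` and the example parameters are illustrative.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Hypothetical example: probability of exactly 3 heads in 10 fair coin flips.
p_three_heads = binomial_pmf(3, 10, 0.5)

# The probabilities over k = 0..n should sum to 1 (up to rounding).
total = sum(binomial_pmf(k, 10, 0.5) for k in range(11))

print(p_three_heads, total)
```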
Poisson Distribution
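Models the number of events occurring in a fixed interval of time or space when events happen independently at a constant average rate; often used for rare events.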
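A minimal sketch of the Poisson probability mass function, P(X = k) = e^(-λ) λ^k / k!, where λ is the average number of events per interval. The name `poisson_pmf` and the rate used below are illustrative.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), with lam the mean count per interval."""
    return exp(-lam) * lam**k / factorial(k)

# Hypothetical example: on average 2 events per interval.
p_none = poisson_pmf(0, 2.0)  # probability of no events
p_up_to_50 = sum(poisson_pmf(k, 2.0) for k in range(51))  # essentially all the mass

print(p_none, p_up_to_50)
```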
Continuous Distributions
Normal Distribution
Standard Normal Distribution
t-Distribution
Used when the population variance is unknown and is estimated from the sample. Has heavier tails than the normal distribution; approaches the normal as df → ∞.
Chi-Square Distribution
If $Z_1, \ldots, Z_k$ are independent standard normal, then $\sum_{i=1}^{k} Z_i^2 \sim \chi^2(k)$
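The definition can be checked by simulation. The sketch below draws sums of k squared standard normals and compares the sample mean with k, the known mean of a chi-square distribution with k degrees of freedom. The seed, k, and number of draws are arbitrary choices.

```python
import random

random.seed(0)    # fixed seed so the run is reproducible
k = 3             # degrees of freedom (arbitrary choice)
n_draws = 20000   # number of simulated chi-square values

# Each draw is a sum of k squared independent standard normal variates.
draws = [sum(random.gauss(0, 1) ** 2 for _ in range(k)) for _ in range(n_draws)]

sample_mean = sum(draws) / n_draws  # should be close to k

print(sample_mean)
```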
F-Distribution
Ratio of two independent chi-square variables, each divided by its own degrees of freedom. Used in ANOVA and regression.
4. Point Estimation
Properties of Estimators
- Unbiasedness: E[θ̂] = θ
- Efficiency: Minimum variance among unbiased estimators
- Consistency: θ̂ → θ in probability as n → ∞
- Sufficiency: Captures all information about θ in the sample
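Unbiasedness can be illustrated exactly on a tiny population. The sketch below enumerates every ordered sample of size 2 drawn with replacement from a hypothetical population and compares the average of two variance estimators (dividing by n-1 versus n) with the true variance. All names and the population are illustrative.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4, 5, 6]  # hypothetical population
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N  # true population variance

def sample_var(sample, ddof):
    """Sample variance with divisor n - ddof (ddof=1 gives the usual s^2)."""
    n = len(sample)
    m = Fraction(sum(sample), n)
    return sum((x - m) ** 2 for x in sample) / (n - ddof)

# All 36 equally likely ordered samples of size 2, drawn with replacement.
samples = list(product(population, repeat=2))

expected_unbiased = sum(sample_var(s, 1) for s in samples) / len(samples)
expected_biased = sum(sample_var(s, 0) for s in samples) / len(samples)

print(sigma2, expected_unbiased, expected_biased)
```

The n-1 divisor averages exactly to the population variance here, while the n divisor underestimates it by the factor (n-1)/n.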
Maximum Likelihood Estimation
5. Confidence Intervals
A 100(1-α)% confidence interval is a range computed by a procedure that, in repeated sampling, would contain the true parameter value 100(1-α)% of the time.
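As one concrete case, the sketch below computes a two-sided interval for a mean, x̄ ± z(1-α/2) · σ/√n, under the simplifying assumption that the population standard deviation σ is known. The function name, the sample, and σ = 0.2 are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean

def z_confidence_interval(sample, sigma, alpha=0.05):
    """(1 - alpha) interval for the mean, assuming a known population sigma."""
    n = len(sample)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # standard normal critical value
    half_width = z * sigma / sqrt(n)
    center = mean(sample)
    return center - half_width, center + half_width

sample = [4.8, 5.1, 5.0, 4.9, 5.2]  # hypothetical measurements
low, high = z_confidence_interval(sample, sigma=0.2)

print(low, high)
```

When σ is not known and must be estimated from the sample, the critical value would come from a t-distribution instead, which this sketch does not cover.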
6. Hypothesis Testing Framework
The Testing Process
- State null (H0) and alternative (H1) hypotheses
- Choose significance level α
- Select appropriate test statistic
- Determine critical region or compute p-value
- Make decision: reject or fail to reject H0
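The steps above can be sketched for one simple case: a two-sided z-test of H0: μ = μ0 with known σ. This is an assumed setting chosen for illustration; the function name and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided z-test of H0: mu = mu0, assuming a known population sigma."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))     # test statistic
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))    # two-sided p-value
    reject = p_value < alpha                        # decision at level alpha
    return z, p_value, reject

# Hypothetical data summary: n = 25 observations with mean 52, testing mu0 = 50.
z, p_value, reject = one_sample_z_test(sample_mean=52, mu0=50, sigma=5, n=25)

print(z, p_value, reject)
```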
Types of Errors
| | H0 True | H0 False |
|---|---|---|
| Reject H0 | Type I Error (α) | Correct Decision (Power) |
| Fail to Reject H0 | Correct Decision | Type II Error (β) |
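To make the error rates concrete, the sketch below computes β and power for an upper-tailed z-test with known σ against one specific alternative mean. The one-sided setup, the function name `z_test_power`, and all numbers are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(mu0, mu1, sigma, n, alpha=0.05):
    """Type I rate, Type II rate, and power of an upper-tailed z-test at mu1."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)       # reject H0 when z > z_crit
    shift = (mu1 - mu0) / (sigma / sqrt(n))      # mean of z when mu = mu1
    power = 1 - std_normal.cdf(z_crit - shift)   # P(reject H0 | mu = mu1)
    beta = 1 - power                             # P(fail to reject | mu = mu1)
    return alpha, beta, power

# Hypothetical scenario: H0 mean 50, true mean 52, sigma 5, n = 25.
alpha, beta, power = z_test_power(mu0=50, mu1=52, sigma=5, n=25)

print(alpha, beta, power)
```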
7. t-Tests
One-Sample t-Test
Independent Two-Sample t-Test
Paired t-Test
8. Analysis of Variance (ANOVA)
One-Way ANOVA
Tests whether means of k groups are equal: H0: μ1 = μ2 = ... = μk
$\mathrm{SSB} = \sum_{j} n_j (\bar{x}_j - \bar{x})^2$
$\mathrm{SSW} = \sum_{j} \sum_{i} (x_{ij} - \bar{x}_j)^2$
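The two sums of squares can be computed directly from grouped data. The sketch below also forms the usual F ratio, (SSB / (k-1)) / (SSW / (N-k)), which is the standard one-way ANOVA statistic but is not spelled out above. The function name and the data are hypothetical.

```python
def one_way_anova(groups):
    """Return (SSB, SSW, F) for a list of groups of numeric observations."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group sum of squares: group sizes times squared mean deviations.
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: deviations from each group's own mean.
    ssw = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ssb / df_between) / (ssw / df_within)
    return ssb, ssw, f_stat

groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]  # hypothetical data, k = 3 groups
ssb, ssw, f_stat = one_way_anova(groups)

print(ssb, ssw, f_stat)
```

Converting the F statistic to a p-value requires the F-distribution, which the standard library does not provide, so that step is left out.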
Assumptions of ANOVA
- Independence of observations
- Normality within groups
- Homogeneity of variances (homoscedasticity)
Post-Hoc Tests
When ANOVA rejects H0, post-hoc tests identify which means differ:
- Tukey's HSD: Controls family-wise error rate for all pairwise comparisons
- Bonferroni: Adjusts α for multiple comparisons (α/m)
- Scheffé: Most conservative; allows all contrasts
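Of the three procedures, the Bonferroni adjustment is simple enough to sketch in a few lines: each of the m comparisons is tested at level α/m. The function name and the p-values below are hypothetical.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Flag which hypotheses are rejected at the Bonferroni-adjusted level alpha/m."""
    m = len(p_values)
    threshold = alpha / m
    return [p <= threshold for p in p_values]

# Hypothetical p-values from three pairwise comparisons.
p_values = [0.001, 0.02, 0.04]
decisions = bonferroni_reject(p_values)

print(decisions)
```

All three p-values are below 0.05, but only the first survives the adjusted threshold of 0.05/3.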
9. Nonparametric Tests
Nonparametric tests make fewer assumptions about the underlying distribution and are appropriate when normality cannot be assumed or with ordinal data.
Mann-Whitney U Test
Nonparametric alternative to independent two-sample t-test. Tests whether one distribution is stochastically greater than the other.
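A minimal sketch of the U statistic, computed by counting, over all cross-sample pairs, how often a value from the first sample exceeds one from the second (ties count one half). The function name and data are hypothetical, and no p-value is computed.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y): pairwise 'wins' for each sample, ties counted as 0.5."""
    u_x = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u_x += 1.0
            elif xi == yj:
                u_x += 0.5
    u_y = len(x) * len(y) - u_x  # the two statistics always sum to n_x * n_y
    return u_x, u_y

x = [3, 5, 8]      # hypothetical sample 1
y = [1, 4, 6, 7]   # hypothetical sample 2
u_x, u_y = mann_whitney_u(x, y)

print(u_x, u_y)
```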
Wilcoxon Signed-Rank Test
Nonparametric alternative to paired t-test. Uses ranks of absolute differences.
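The ranking step can be sketched as follows: rank the absolute differences (averaging ranks for ties), then sum the ranks separately for positive and negative differences. The helper names and the paired data are hypothetical; zero differences are dropped, which is one common convention, and no p-value is computed.

```python
def average_ranks(values):
    """1-based ranks of values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def wilcoxon_signed_rank(before, after):
    """Return (W_plus, W_minus): rank sums of positive and negative differences."""
    diffs = [a - b for a, b in zip(after, before) if a != b]  # drop zero differences
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return w_plus, w_minus

before = [10.0, 12.0, 9.0, 11.0, 8.0]   # hypothetical paired measurements
after = [11.5, 11.5, 11.0, 10.0, 11.0]
w_plus, w_minus = wilcoxon_signed_rank(before, after)

print(w_plus, w_minus)
```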
Kruskal-Wallis Test
Nonparametric alternative to one-way ANOVA. Extends Mann-Whitney to k groups.
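A minimal sketch of the H statistic, H = 12 / (N(N+1)) · Σ Rj² / nj − 3(N+1), where Rj is the rank sum of group j in the pooled sample. The sketch assumes there are no tied values (no tie correction is applied); the function name and data are hypothetical.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic, assuming no tied observations."""
    pooled = sorted(x for g in groups for x in g)
    rank = {value: i + 1 for i, value in enumerate(pooled)}  # valid only without ties
    n_total = len(pooled)
    # Sum over groups of (rank sum)^2 / group size.
    rank_term = sum(sum(rank[x] for x in g) ** 2 / len(g) for g in groups)
    return 12 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)

groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # hypothetical, fully separated groups
h_stat = kruskal_wallis_h(groups)

print(h_stat)
```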
10. Correlation Analysis
Testing Correlation
Spearman Rank Correlation
Nonparametric correlation based on ranks. Measures monotonic (not necessarily linear) relationships.
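One way to compute it is to replace each value by its rank and apply the Pearson formula to the ranks. The sketch below does this under the assumption of no ties; the helper names and data are hypothetical. The data follow y = x², which is monotonic but not linear.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks_no_ties(values):
    """1-based ranks, valid only when all values are distinct."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return pearson(ranks_no_ties(x), ranks_no_ties(y))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # y = x**2: monotonic, not linear

rho = spearman(x, y)
r = pearson(x, y)

print(rho, r)
```

Here the rank correlation is exactly 1 because the relationship is perfectly monotonic, while the Pearson coefficient on the raw values is slightly below 1.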
11. Regression Analysis
Simple Linear Regression
$b_0 = \bar{y} - b_1 \bar{x}$
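The intercept formula can be sketched together with the usual least-squares slope, b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)², which is the standard estimator but is not written out above. The function name and data are hypothetical.

```python
def simple_linear_regression(x, y):
    """Least-squares slope b1 and intercept b0 for y = b0 + b1 * x."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx             # slope (standard OLS formula, assumed here)
    b0 = y_bar - b1 * x_bar    # intercept: the line passes through (x_bar, y_bar)
    return b0, b1

x = [1, 2, 3, 4]
y = [3, 5, 7, 9]  # hypothetical data lying exactly on y = 1 + 2x
b0, b1 = simple_linear_regression(x, y)

print(b0, b1)
```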
Coefficient of Determination
Regression Inference
12. Categorical Data Analysis
Chi-Square Test of Independence
df = (r-1)(c-1) for a contingency table with r rows and c columns
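A minimal sketch of the computation, using the standard statistic Σ (O − E)² / E with expected counts E = (row total × column total) / N; that formula is assumed here rather than taken from the text above. The function name and the table are hypothetical.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            chi2 += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

table = [[10, 20], [20, 10]]  # hypothetical 2 x 2 counts
chi2, df = chi_square_independence(table)

print(chi2, df)
```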
Chi-Square Goodness of Fit
Tests whether observed frequencies match expected frequencies from a hypothesized distribution.
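The same Σ (O − E)² / E statistic applies here, with expected counts taken from the hypothesized distribution. The sketch below assumes no parameters are estimated from the data, so df = k − 1 for k categories; the function name and counts are hypothetical.

```python
def chi_square_goodness_of_fit(observed, expected):
    """Return (chi-square statistic, df), with df = k - 1 (no estimated parameters)."""
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, len(observed) - 1

observed = [18, 22, 20]   # hypothetical counts in three categories
expected = [20, 20, 20]   # counts expected under a uniform hypothesis
gof_chi2, gof_df = chi_square_goodness_of_fit(observed, expected)

print(gof_chi2, gof_df)
```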
Fisher's Exact Test
Exact test for 2×2 tables, used when expected frequencies are small (any $E_{ij} < 5$) and the chi-square approximation is unreliable.
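A sketch of one common two-sided version: with the margins fixed, each possible table has a hypergeometric probability, and the p-value sums the probabilities of all tables no more likely than the observed one. The function name and the table are hypothetical, and exact fractions are used to avoid rounding issues when comparing probabilities.

```python
from fractions import Fraction
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    r1, r2 = a + b, c + d  # row totals
    c1 = a + c             # first column total
    n = r1 + r2

    def table_prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return Fraction(comb(r1, x) * comb(r2, c1 - x), comb(n, c1))

    lo, hi = max(0, c1 - r2), min(r1, c1)  # feasible values of the top-left cell
    p_obs = table_prob(a)
    return sum(table_prob(x) for x in range(lo, hi + 1) if table_prob(x) <= p_obs)

p_value = fisher_exact_two_sided(3, 1, 1, 3)  # hypothetical table [[3, 1], [1, 3]]

print(p_value, float(p_value))
```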