Part 1: Data Scientist Foundation
Statistics & Exploratory Analysis
Building on a sound data-handling foundation, this section focuses on statistical concepts and exploratory methods. Learners will quantify central tendencies, variability, and distribution shapes, then progress to formal hypothesis testing. By combining visual and numerical approaches, data scientists uncover meaningful patterns, relationships, and initial insights in real-world datasets.
DESCRIPTIVE & EXPLORATORY STATISTICS
Learning Objectives
Compute measures of central tendency (mean, median, mode) and spread (variance, standard deviation, IQR)
Interpret distribution shape (skewness, kurtosis) using histograms/boxplots
Explore bivariate relationships via scatterplots, correlation, simple linear regression
Indicative Content
Measures of Center & Spread
Mean, standard deviation, quantiles
Shape Analysis
Skewness, kurtosis, outliers (boxplots)
Bivariate Exploration
Correlation coefficients, scatterplots, single-regressor regression
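The center, spread, and shape measures listed above can be sketched with the Python standard library alone. The sample values below are hypothetical, chosen so that one large value produces a right tail. Skewness and excess kurtosis are computed here in their population-moment form, which is also the default form returned by scipy.stats.skew and scipy.stats.kurtosis; other conventions exist and give slightly different numbers.

```python
import statistics as st

# Hypothetical sample: nine typical values plus one large value (right tail).
data = [2, 3, 3, 4, 4, 4, 5, 5, 6, 14]

# Measures of center
mean = st.mean(data)
median = st.median(data)
mode = st.mode(data)

# Measures of spread
variance = st.variance(data)           # sample variance (n - 1 denominator)
std_dev = st.stdev(data)               # sample standard deviation
q1, q2, q3 = st.quantiles(data, n=4)   # quartiles, default "exclusive" method
iqr = q3 - q1


def central_moment(values, k):
    """k-th central moment with an n denominator (population form)."""
    m = st.fmean(values)
    return sum((v - m) ** k for v in values) / len(values)


# Shape: moment-based skewness and excess kurtosis
m2 = central_moment(data, 2)
skewness = central_moment(data, 3) / m2 ** 1.5
excess_kurtosis = central_moment(data, 4) / m2 ** 2 - 3

print(f"mean={mean}, median={median}, mode={mode}")
print(f"variance={variance:.2f}, std_dev={std_dev:.2f}, IQR={iqr:.2f}")
print(f"skewness={skewness:.2f}, excess kurtosis={excess_kurtosis:.2f}")
```

In this sample the mean sits above the median and the skewness is positive, which is the numerical signature of the right tail a histogram or boxplot would show.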
INFERENTIAL STATISTICS & HYPOTHESIS TESTING
Learning Objectives
Formulate null vs. alternative hypotheses; select suitable tests (parametric vs. non-parametric)
Perform t-tests (one-sample, paired, independent), ANOVA, chi-square
Interpret p-values, confidence intervals, and effect sizes for robust conclusions
Indicative Content
Hypothesis Testing Basics
Significance levels, Type I/II errors, test selection
Parametric Tests
T-tests, ANOVA variants (one-way, two-way)
Non-Parametric Tests
Mann-Whitney, Wilcoxon, Kruskal-Wallis; chi-square for categorical data
Implementation Insights
Using stats libraries for tests; reading p-values and confidence intervals
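As an illustration of running tests with a stats library, the sketch below applies one parametric test, one non-parametric test, and a chi-square test of independence using scipy.stats. The two groups are simulated and the contingency table counts are made up for the example; the 0.05 significance level is an assumption, not a recommendation.

```python
import numpy as np
from scipy import stats

# Simulated (hypothetical) measurements for two independent groups.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=50.0, scale=5.0, size=40)
group_b = rng.normal(loc=53.0, scale=5.0, size=40)

# Parametric: independent two-sample t-test (Welch's version, which does
# not assume equal variances).
t_stat, t_p = stats.ttest_ind(group_a, group_b, equal_var=False)

# Non-parametric alternative: Mann-Whitney U test on the same groups.
u_stat, u_p = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

# Categorical data: chi-square test of independence on a 2x2 table
# (rows = hypothetical groups, columns = hypothetical outcomes).
table = np.array([[30, 10],
                  [20, 20]])
chi2, chi_p, dof, expected = stats.chi2_contingency(table)

alpha = 0.05  # assumed significance level
for name, p in [("Welch t-test", t_p),
                ("Mann-Whitney U", u_p),
                ("Chi-square", chi_p)]:
    decision = "reject H0" if p < alpha else "fail to reject H0"
    print(f"{name}: p = {p:.4f} -> {decision}")
```

A p-value below the chosen significance level leads to rejecting the null hypothesis; it does not measure how large or how important the difference is, which is why effect sizes and confidence intervals are reported alongside it.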
TOOLS & METHODOLOGIES (STATISTICS & EXPLORATORY ANALYSIS)
Statistical & Exploratory Libraries
scipy.stats, statsmodels for hypothesis tests, descriptive stats
Matplotlib/Seaborn for advanced visualizations (boxplots, scatterplots)
Descriptive Measures
Mean, standard deviation, skewness/kurtosis checks
Identifying outliers with boxplots
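A boxplot conventionally flags points lying more than 1.5 times the IQR beyond the quartiles. The same rule can be applied numerically, as in this minimal sketch; the function name iqr_outliers and the sample data are hypothetical. Quartile definitions vary between libraries, so the fences may differ slightly from what a given plotting tool draws.

```python
import statistics as st


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # "inclusive" interpolates between data points, treating the sample
    # minimum and maximum as the 0th and 100th percentiles.
    q1, _, q3 = st.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


data = [2, 3, 3, 4, 4, 4, 5, 5, 6, 14]
print(iqr_outliers(data))  # [14]
```

Flagged points are candidates for inspection, not automatic deletions: an outlier may be a data-entry error or a genuine extreme observation.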
Hypothesis Testing Framework
Choice between parametric vs. non-parametric tests
T-tests, ANOVA, chi-square; interpreting p-values and confidence intervals
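Interpreting a test result usually means reporting an interval estimate and an effect size next to the p-value. The sketch below computes a t-based confidence interval for a mean and Cohen's d with a pooled standard deviation. The helper names mean_confidence_interval and cohens_d and the two small samples are hypothetical, and the t-based interval assumes roughly normal data.

```python
import numpy as np
from scipy import stats


def mean_confidence_interval(sample, confidence=0.95):
    """t-based confidence interval for the mean of one sample."""
    sample = np.asarray(sample, dtype=float)
    sem = stats.sem(sample)  # standard error of the mean (ddof=1)
    return stats.t.interval(confidence, sample.size - 1,
                            loc=sample.mean(), scale=sem)


def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled std dev."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    pooled_var = ((a.size - 1) * a.var(ddof=1) +
                  (b.size - 1) * b.var(ddof=1)) / (a.size + b.size - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)


# Hypothetical measurements for two independent groups.
treatment = [12.1, 11.4, 13.0, 12.7, 11.9, 12.5]
control = [10.8, 11.2, 10.5, 11.6, 10.9, 11.1]

low, high = mean_confidence_interval(treatment)
d = cohens_d(treatment, control)
print(f"95% CI for treatment mean: ({low:.2f}, {high:.2f})")
print(f"Cohen's d: {d:.2f}")
```

The interval conveys the precision of the estimated mean, while d expresses the group difference in standard-deviation units, independent of sample size.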
Bivariate Exploration
Correlation analysis, single-variable regression for initial relationship mapping