Part 1: Data Scientist Foundation

Statistics & Exploratory Analysis

Building on a sound data-handling foundation, this section covers statistical concepts and exploratory methods. Learners quantify central tendency, variability, and distribution shape, then progress to formal hypothesis testing. By combining visual and numerical approaches, they learn to identify patterns and relationships in real-world datasets and to form initial, testable insights.

DESCRIPTIVE & EXPLORATORY STATISTICS

Learning Objectives

  • Compute measures of central tendency (mean, median, mode) and spread (variance, standard deviation, IQR)

  • Interpret distribution shape (skewness, kurtosis) using histograms/boxplots

  • Explore bivariate relationships via scatterplots, correlation, and simple linear regression

Indicative Content

  • Measures of Center & Spread

    • Mean, median, mode; variance, standard deviation, quantiles (including IQR)

  • Shape Analysis

    • Skewness, kurtosis, outliers (boxplots)

  • Bivariate Exploration

    • Correlation coefficients, scatterplots, simple linear regression (one predictor)
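
The descriptive and bivariate measures listed above can be sketched in a few lines. This is a minimal illustration, assuming NumPy and SciPy are available; the `hours` and `scores` arrays and the `describe` helper are hypothetical and not part of the course materials.

```python
import numpy as np
from scipy import stats

# Hypothetical toy data (illustrative only): study hours and exam scores.
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
scores = np.array([52.0, 55.0, 61.0, 64.0, 70.0, 72.0, 79.0, 95.0])


def describe(x):
    """Return common measures of center, spread, and shape for a 1-D array."""
    q1, q3 = np.percentile(x, [25, 75])
    return {
        "mean": float(np.mean(x)),
        "median": float(np.median(x)),
        "std": float(np.std(x, ddof=1)),            # sample standard deviation
        "iqr": float(q3 - q1),                      # interquartile range
        "skewness": float(stats.skew(x)),           # > 0 means a longer right tail
        "excess_kurtosis": float(stats.kurtosis(x)),  # 0 for a normal distribution
    }


summary = describe(scores)

# Pearson correlation between the two variables.
r, r_pvalue = stats.pearsonr(hours, scores)

# Simple linear regression: scores ~ intercept + slope * hours.
fit = stats.linregress(hours, scores)

print(summary)
print(f"r = {r:.3f}, slope = {fit.slope:.3f}, intercept = {fit.intercept:.3f}")
```

With a single predictor, the correlation reported by `linregress` (`fit.rvalue`) is the same quantity as the Pearson coefficient, which is one way to sanity-check the two results against each other.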

INFERENTIAL STATISTICS & HYPOTHESIS TESTING

Learning Objectives

  • Formulate null vs. alternative hypotheses; select suitable tests (parametric vs. non-parametric)

  • Perform t-tests (one-sample, paired, independent), ANOVA, and chi-square tests

  • Interpret p-values, confidence intervals, and effect sizes for robust conclusions

Indicative Content

  • Hypothesis Testing Basics

    • Significance levels, Type I/II errors, test selection

  • Parametric Tests

    • T-tests, ANOVA variants (one-way, two-way)

  • Non-Parametric Tests

    • Mann-Whitney U, Wilcoxon signed-rank, Kruskal-Wallis; chi-square tests for categorical data

  • Implementation Insights

    • Using statistical libraries (e.g., scipy.stats, statsmodels) to run tests; reading p-values and confidence intervals
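
As a rough sketch of how the tests above look in practice, the following uses `scipy.stats` on made-up data. The three groups, the contingency table, and the 0.05 significance level are assumptions chosen for illustration; the confidence interval and Cohen's d are computed by hand rather than taken from a library helper.

```python
import numpy as np
from scipy import stats

ALPHA = 0.05  # assumed significance level

# Hypothetical measurements for three groups (illustrative only).
group_a = np.array([5.1, 4.9, 5.6, 5.8, 6.0, 5.5, 5.3, 5.7])
group_b = np.array([6.2, 6.8, 6.5, 7.0, 6.1, 6.9, 6.4, 6.6])
group_c = np.array([5.9, 6.3, 6.0, 6.4, 5.8, 6.2, 6.1, 6.5])

# H0: mean(A) == mean(B); H1: the means differ (two-sided).
# equal_var=False selects Welch's t-test, which does not assume equal variances.
t_stat, t_p = stats.ttest_ind(group_a, group_b, equal_var=False)

# Non-parametric alternative for two independent samples.
u_stat, u_p = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

# Effect size: Cohen's d using the pooled sample standard deviation.
n_a, n_b = len(group_a), len(group_b)
var_a, var_b = group_a.var(ddof=1), group_b.var(ddof=1)
pooled_sd = np.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
cohens_d = (group_a.mean() - group_b.mean()) / pooled_sd

# 95% confidence interval for the mean difference (Welch-Satterthwaite df).
diff = group_a.mean() - group_b.mean()
se = np.sqrt(var_a / n_a + var_b / n_b)
df = (var_a / n_a + var_b / n_b) ** 2 / (
    (var_a / n_a) ** 2 / (n_a - 1) + (var_b / n_b) ** 2 / (n_b - 1)
)
t_crit = stats.t.ppf(1 - ALPHA / 2, df)
ci_low, ci_high = diff - t_crit * se, diff + t_crit * se

# One-way ANOVA and its rank-based counterpart across three groups.
f_stat, f_p = stats.f_oneway(group_a, group_b, group_c)
h_stat, h_p = stats.kruskal(group_a, group_b, group_c)

# Chi-square test of independence on a hypothetical 2x2 contingency table.
observed = np.array([[30, 10], [20, 40]])
chi2, chi_p, dof, expected = stats.chi2_contingency(observed)

print(f"Welch t-test p = {t_p:.4g}, Mann-Whitney p = {u_p:.4g}")
print(f"Cohen's d = {cohens_d:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
print(f"ANOVA p = {f_p:.4g}, Kruskal-Wallis p = {h_p:.4g}")
print(f"Chi-square p = {chi_p:.4g}, dof = {dof}")
```

A p-value below `ALPHA` leads to rejecting the null hypothesis, but it says nothing about the size of the difference; the confidence interval and effect size are reported alongside it for that reason.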

TOOLS & METHODOLOGIES (STATISTICS & EXPLORATORY ANALYSIS)

  • Statistical & Exploratory Libraries

    • scipy.stats and statsmodels for hypothesis tests and descriptive statistics

    • Matplotlib/Seaborn for exploratory visualizations (histograms, boxplots, scatterplots)

  • Descriptive Measures

    • Mean, standard deviation, skewness/kurtosis checks

    • Identifying outliers with boxplots

  • Hypothesis Testing Framework

    • Choice between parametric vs. non-parametric tests

    • t-tests, ANOVA, and chi-square tests; interpreting p-values and confidence intervals

  • Bivariate Exploration

    • Correlation analysis and simple linear regression for a first look at relationships between variables
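
Two of the items above, boxplot-based outlier identification and the parametric vs. non-parametric choice, can be illustrated numerically. The sketch below assumes the common 1.5 × IQR rule (the default whisker length in Matplotlib boxplots) and uses a Shapiro-Wilk normality check as one possible heuristic for test selection. The helper names `tukey_outliers` and `suggest_two_sample_test` and the data are hypothetical.

```python
import numpy as np
from scipy import stats


def tukey_outliers(x, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR], the usual boxplot fences."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return x[(x < lower) | (x > upper)]


def suggest_two_sample_test(a, b, alpha=0.05):
    """Heuristic only: suggest a t-test if neither sample rejects normality."""
    _, p_a = stats.shapiro(a)
    _, p_b = stats.shapiro(b)
    if p_a > alpha and p_b > alpha:
        return "independent t-test"
    return "Mann-Whitney U"


# Hypothetical data with one unusually large value.
values = [12, 14, 15, 15, 16, 17, 18, 19, 20, 45]
outliers = tukey_outliers(values)
print("Outliers:", outliers)

sample_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.5, 5.3, 5.7]
sample_b = [6.2, 6.8, 6.5, 7.0, 6.1, 6.9, 6.4, 6.6]
print("Suggested test:", suggest_two_sample_test(sample_a, sample_b))
```

A normality test on a small sample has little power, so this kind of automatic selection is best treated as a prompt to inspect a histogram or boxplot rather than as a decision rule.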