Part 1: Data Scientist Foundation

Predictive Modeling Foundations

This section introduces the fundamentals of predictive modeling, with a focus on binary classification and logistic regression. Learners split data into training and test sets, evaluate models with standard classification metrics, and interpret regression coefficients, building the skills to develop initial predictive workflows that can inform business or research decisions.

FUNDAMENTALS OF PREDICTIVE MODELING & LOGISTIC REGRESSION

Learning Objectives

  • Understand binary classification metrics (accuracy, precision, recall, F1, ROC/AUC)

  • Build logistic regression models; interpret coefficients as log-odds/odds ratios

  • Split data (train/test) and assess performance with confusion matrices and ROC curves
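
As a rough illustration of how the three objectives above fit together, the following sketch splits a dataset, fits a logistic regression, and reports a confusion matrix, AUC, and odds ratios with scikit-learn. The synthetic dataset from make_classification, the 75/25 split, and the fixed random seeds are assumptions made for the example, not part of the course material.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

# Assumption: a synthetic binary dataset stands in for real course data.
X, y = make_classification(
    n_samples=500, n_features=4, n_informative=3, n_redundant=1, random_state=0
)

# Hold out 25% of the rows for testing; stratify keeps the class balance similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Hard class predictions (default 0.5 threshold) and positive-class probabilities.
y_pred = model.predict(X_test)
y_score = model.predict_proba(X_test)[:, 1]

# Rows are actual classes, columns are predicted classes: [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_test, y_pred)
auc = roc_auc_score(y_test, y_score)

# Coefficients are on the log-odds scale; exponentiating gives odds ratios.
odds_ratios = np.exp(model.coef_[0])

print("Confusion matrix:\n", cm)
print("AUC:", round(auc, 3))
print("Odds ratios:", np.round(odds_ratios, 3))
```

Metrics are computed on the held-out test rows only, so they estimate performance on data the model has not seen.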

Indicative Content

  • Classification Essentials

    • Confusion matrix, threshold choices, sensitivity/specificity

  • Logistic Regression

    • Sigmoid function, logit transformation, coefficient implications

  • Model Evaluation

    • Precision-recall, ROC curve, AUC, threshold tuning
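
The sigmoid function and logit transformation listed under Logistic Regression above are inverses of each other, and that relationship is what allows a coefficient to be read as a change in log-odds. The sketch below uses only the Python standard library; the intercept and coefficient values are hypothetical numbers chosen for illustration, not fitted results.

```python
import math


def sigmoid(z: float) -> float:
    """Map a log-odds value z to a probability in [0, 1]."""
    # Two branches avoid overflow in math.exp for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def logit(p: float) -> float:
    """Map a probability p (strictly between 0 and 1) back to log-odds."""
    return math.log(p / (1.0 - p))


# Hypothetical one-feature model: log-odds = intercept + coef * x
intercept = -1.0
coef = 0.7


def predict_proba(x: float) -> float:
    """Predicted probability of the positive class for feature value x."""
    return sigmoid(intercept + coef * x)


def odds(p: float) -> float:
    """Convert a probability to odds."""
    return p / (1.0 - p)


# A one-unit increase in x multiplies the odds by exp(coef).
odds_ratio = math.exp(coef)
observed_ratio = odds(predict_proba(3.0)) / odds(predict_proba(2.0))

print("Odds ratio from coefficient:", odds_ratio)
print("Odds ratio observed between x=2 and x=3:", observed_ratio)
```

The intercept sets the log-odds when x is 0, while the coefficient controls how quickly the log-odds change as x increases.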

TOOLS & METHODOLOGIES (PREDICTIVE MODELING FOUNDATIONS)

  • Python Libraries

    • Machine Learning: scikit-learn for train/test splits, logistic regression

    • Evaluation Tools: functions such as confusion_matrix, roc_curve, and roc_auc_score in sklearn.metrics for confusion matrices, ROC curves, and AUC calculation

  • Binary Classification

    • Key metrics (accuracy, precision, recall, F1)

    • Assessing threshold adjustments for sensitivity vs. specificity

  • Logistic Regression

    • Log-odds interpretation, intercept vs. coefficient meaning

    • Scope for threshold tuning and model calibration
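
To make the sensitivity vs. specificity trade-off mentioned above concrete, the following standard-library sketch computes the key metrics from confusion-matrix counts at two thresholds. The labels and scores are made-up values for illustration, and the helper names confusion_counts and classification_metrics are hypothetical, not part of any library.

```python
def confusion_counts(y_true, scores, threshold):
    """Count TP, FP, TN, FN when predicting 1 for scores >= threshold."""
    tp = fp = tn = fn = 0
    for actual, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(tp, fp, tn, fn):
    """Derive common binary classification metrics from the four counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # also called sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Made-up example: 4 positives and 6 negatives with predicted probabilities.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.35, 0.2, 0.1, 0.05]

results = {}
for threshold in (0.5, 0.3):
    counts = confusion_counts(y_true, scores, threshold)
    results[threshold] = classification_metrics(*counts)
    print(threshold, counts, results[threshold])
```

In this toy data, lowering the threshold from 0.5 to 0.3 catches every positive case (higher recall) but flags more negatives as positive (lower specificity and precision).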