Part 4 Predictive Analytics

Classification Models

This section addresses techniques for modeling discrete or categorical outcomes, such as binary, multi-class, or ordinal responses, relying heavily on logistic regression and maximum likelihood estimation (MLE). It underscores how log-odds coefficients and data encoding (e.g., dummy variables) are critical to accurately modeling categorical patterns. The objective is to enable analysts to segment or label entities effectively, providing actionable insights for organizational decision-making.

Binary Logistic Regression

Learning Objectives

Explain the rationale for using logistic regression when the outcome is binary, describe the statistical model and its MLE-based parameter estimation, and demonstrate how to implement it in Python using logit().

Indicative Content

  • Purpose of binary logistic regression:

    • Dependent variable: 0 or 1

    • Independent variables: Categorical or continuous

    • Logit link function to handle bounded probabilities

  • Statistical model:

    • Log-odds form: logit(p) = β₀ + β₁X₁ + … + βₖXₖ

    • Parameter estimation: Maximum Likelihood Estimation (MLE)

    • Interpretation of coefficients in terms of odds ratios

  • Python implementation:

    • logit() from statsmodels.formula.api

    • Retrieving summaries and parameter estimates (e.g., model.summary())

    • Obtaining predicted probabilities (e.g., model.predict())
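The steps above can be sketched end to end. The example below is a minimal illustration on synthetic data (the dataset and the true coefficients are invented for this sketch); it fits the model by MLE with logit() from statsmodels.formula.api, prints the summary, converts log-odds coefficients to odds ratios, and obtains predicted probabilities.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data for illustration: true model is logit(p) = -0.5 + 1.2*x
rng = np.random.default_rng(42)
n = 200
x = rng.normal(size=n)
p = 1 / (1 + np.exp(-(-0.5 + 1.2 * x)))
y = rng.binomial(1, p)
df = pd.DataFrame({"y": y, "x": x})

# Fit by maximum likelihood using the formula interface
model = smf.logit("y ~ x", data=df).fit(disp=False)

print(model.summary())        # coefficients on the log-odds scale
print(np.exp(model.params))   # exponentiated coefficients = odds ratios
probs = model.predict(df)     # predicted probabilities, bounded in (0, 1)
```

Note that the logit link keeps every predicted probability inside (0, 1), which is the main motivation for preferring this model over a linear probability model.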

Multinomial Logistic Regression

Learning Objectives

Describe the multinomial logistic regression model for non-ordered categorical outcomes, outline how (k-1) logit functions are defined for k categories, and demonstrate a Python-based approach to parameter estimation via MLE.

Indicative Content

  • Dependent variable with more than two mutually exclusive categories

  • Statistical model:

    • (k-1) logit equations relative to a base category

    • Each logit has its own intercept and coefficients

    • Parameters estimated by MLE

  • Interpretation of coefficients:

    • Log-odds of each category versus a chosen base category

    • Sign and magnitude of coefficients indicate direction and strength

  • Python implementation (example approaches):

    • MNLogit from statsmodels.discrete.discrete_model

    • Formula-based usage similar to binary logistic, but specifying multiple categories

    • Viewing parameter summaries with model.summary()

Ordinal Logistic Regression

Learning Objectives

Explain the structure of ordinal logistic regression for ordered categorical outcomes, discuss the proportional odds assumption, and demonstrate how to fit the model in Python using an appropriate function or library.

Indicative Content

  • Ordinal (ordered) dependent variable with k > 2 categories

  • Proportional odds model:

    • A single set of coefficients applies across thresholds

    • (k-1) intercepts vary, but slopes (coefficients) remain constant

    • Parameter estimation via MLE

  • Interpretation of coefficients:

    • Log-odds of being in or below a given category

    • Negative or positive coefficients indicate how predictors shift likelihood toward lower or higher categories

  • Python implementation (possible approaches):

    • OrderedModel from statsmodels.miscmodels.ordinal_model

    • Similar steps to fitting logistic models (define formula, fit, interpret coefficients)

Tools and Methodologies

  • statsmodels

    • logit() for binary logistic regression

    • MNLogit for multinomial logistic regression

    • OrderedModel or similar for ordinal logistic regression

  • Python libraries (e.g., pandas, numpy) for data manipulation

  • MLE (Maximum Likelihood Estimation) for fitting logistic-type models