Classification Models
This section addresses techniques for modeling discrete or categorical outcomes, such as binary, multi-class, or ordinal responses, relying heavily on logistic regression and maximum likelihood estimation (MLE). It underscores how log-odds coefficients and data encoding (e.g., dummy variables) are critical to accurately modeling categorical patterns. The objective is to enable analysts to segment or label entities effectively, providing actionable insights for organizational decision-making.
Binary Logistic Regression
Learning Objectives
Explain the rationale for using logistic regression when the outcome is binary, describe the statistical model and its MLE-based parameter estimation, and demonstrate how to implement it in Python using logit().
Indicative Content
Purpose of binary logistic regression:
Dependent variable: 0 or 1
Independent variables: Categorical or continuous
Logit link function to map probabilities, which are bounded between 0 and 1, onto an unbounded log-odds scale
Statistical model:
Log-odds form: \(\operatorname{logit}(p) = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k\)
Parameter estimation: Maximum Likelihood Estimation (MLE)
Interpretation of coefficients in terms of odds ratios
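Equivalently, exponentiating the log-odds form expresses each coefficient as a multiplicative effect on the odds:
\[
\frac{p}{1 - p} = e^{\beta_0 + \beta_1 X_1 + \dots + \beta_k X_k},
\qquad
e^{\beta_1} = \text{odds ratio for a one-unit increase in } X_1 \text{, other predictors held fixed}.
\]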
Python implementation:
logit() from statsmodels.formula.api
Retrieving summaries and parameter estimates (e.g., model.summary())
Obtaining predicted probabilities (e.g., model.predict())
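A minimal sketch of this workflow on a small simulated dataset (the purchased, age, and region variables are illustrative, not taken from the source):

    import numpy as np
    import pandas as pd
    from statsmodels.formula.api import logit

    # Illustrative data: a 0/1 outcome, one continuous and one categorical predictor
    rng = np.random.default_rng(0)
    df = pd.DataFrame({
        "age": rng.normal(40, 10, 200),
        "region": rng.choice(["north", "south"], 200),
    })
    true_p = 1 / (1 + np.exp(-(-4 + 0.1 * df["age"])))  # true probability is logistic in age
    df["purchased"] = rng.binomial(1, true_p)

    # Fit by maximum likelihood; C() dummy-codes the categorical predictor
    model = logit("purchased ~ age + C(region)", data=df).fit()

    print(model.summary())       # coefficients on the log-odds scale
    print(np.exp(model.params))  # exponentiated coefficients = odds ratios
    probs = model.predict(df)    # predicted probabilities that purchased = 1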
Multinomial Logistic Regression
Learning Objectives
Describe the multinomial logistic regression model for non-ordered categorical outcomes, outline how (k-1) logit functions are defined for k categories, and demonstrate a Python-based approach to parameter estimation via MLE.
Indicative Content
Dependent variable with more than two mutually exclusive categories
Statistical model:
(k-1) logit equations relative to a base category
Each logit has its own intercept and coefficients
Parameters estimated by MLE
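Written out for a response with k categories and predictors \(X_1, \dots, X_m\) (the choice of base category is a modeling decision), the model is a set of (k-1) equations:
\[
\log\!\left(\frac{P(Y = j)}{P(Y = \text{base})}\right)
= \beta_{0j} + \beta_{1j} X_1 + \dots + \beta_{mj} X_m,
\qquad j = 1, \dots, k - 1 .
\]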
Interpretation of coefficients:
Log-odds of each category versus a chosen base category
Sign and magnitude of coefficients indicate direction and strength
Python implementation (example approaches):
MNLogit from statsmodels.discrete.discrete_model
Formula-based usage similar to binary logistic regression, but with a multi-category dependent variable
Viewing parameter summaries with model.summary()
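A minimal sketch, assuming a simulated data frame with a three-category, unordered outcome (choice) and two continuous predictors (all names are illustrative):

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.discrete.discrete_model import MNLogit

    # Illustrative data: an unordered outcome with three categories
    rng = np.random.default_rng(1)
    df = pd.DataFrame({
        "income": rng.normal(50, 15, 300),
        "age": rng.normal(40, 12, 300),
        "choice": rng.choice(["bus", "car", "train"], 300),
    })

    X = sm.add_constant(df[["income", "age"]])
    model = MNLogit(df["choice"], X).fit()

    # One intercept and slope set per non-base category (k - 1 = 2 equations here);
    # the base is the first category in the internal encoding
    print(model.summary())

    # Predicted probabilities: one column per category
    print(model.predict(X)[:5])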
Ordinal Logistic Regression
Learning Objectives
Explain the structure of ordinal logistic regression for ordered categorical outcomes, discuss the proportional odds assumption, and demonstrate how to fit the model in Python using an appropriate function or library.
Indicative Content
Ordinal (ordered) dependent variable with k > 2 categories
Proportional odds model:
A single set of coefficients applies across thresholds
(k-1) intercepts vary, but slopes (coefficients) remain constant
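In cumulative-logit form, a common parameterization (sign conventions vary by software) is:
\[
\operatorname{logit}\big(P(Y \le j)\big) = \alpha_j - (\beta_1 X_1 + \dots + \beta_m X_m),
\qquad j = 1, \dots, k - 1,
\]
where only the thresholds \(\alpha_j\) differ across the cumulative splits while the slopes are shared.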
Parameter estimation via MLE
Interpretation of coefficients:
Log-odds of being in or below a given category
Negative or positive coefficients indicate how predictors shift likelihood toward lower or higher categories
Python implementation (possible approaches):
OrderedModel from statsmodels.miscmodels.ordinal_model
Similar steps to fitting logistic models (define formula, fit, interpret coefficients)
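A minimal sketch using OrderedModel on a simulated ordered outcome (the satisfaction and hours variables are illustrative):

    import numpy as np
    import pandas as pd
    from statsmodels.miscmodels.ordinal_model import OrderedModel

    # Illustrative data: an ordered outcome with three levels (low < medium < high)
    rng = np.random.default_rng(2)
    df = pd.DataFrame({"hours": rng.normal(6, 2, 300)})
    latent = 0.5 * df["hours"] + rng.logistic(size=300)
    df["satisfaction"] = pd.cut(latent, bins=[-np.inf, 2, 4, np.inf],
                                labels=["low", "medium", "high"])  # ordered categorical

    # distr="logit" gives the proportional odds (cumulative logit) model;
    # exog must not contain a constant because the thresholds play that role
    model = OrderedModel(df["satisfaction"], df[["hours"]], distr="logit")
    result = model.fit(method="bfgs", disp=False)

    print(result.summary())                   # one slope plus (k - 1) = 2 threshold parameters
    print(result.predict(df[["hours"]])[:5])  # probabilities for each ordered category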
Tools and Methodologies
statsmodels:
logit() for binary logistic regression
MNLogit for multinomial logistic regression
OrderedModel or similar for ordinal logistic regression
Python (e.g., pandas, numpy) for data manipulation
MLE (Maximum Likelihood Estimation) for fitting logistic-type models